Why I switched from PHP to Python

When it came time to start putting code to paper at CrashCorp I was faced with the decision of choosing both a language and a platform. After 11 years of coding solely in PHP I’d grown tired of the language and, to some extent, the community (not the people, who are great, but the way the community is organized).

First the language. What makes PHP, as a language, awesome is also what makes it horrible to work with, which is that it’s not really a language, but rather a giant plugin architecture for exposing lower level libraries in a high level fashion. Most of the functions that developers use are, in fact, thin wrappers around popular C libraries (curl, mysql, gd, etc.). Most of the time these libraries’ functions are simply exposed as-is. Anyone who’s coded curl in C will feel right at home while using curl from PHP. The problem with this is that it leads to wildly inconsistent APIs.

Another touchy problem with the language is actually a byproduct of the way PHP, the core language, is managed. It’s, essentially, designed by committee. Anyone who’s ever tried to design a large scale anything knows how problematic this can be. The second problem with this approach is that nobody from on high is setting any kind of recognizable standards. PEAR has its standards and PHP has its standards, while everyone else codes however they damn well please. This leads to SPL classes being more Java style, while PEAR classes look a lot different (e.g. ArrayObject vs. HTTP_Request2).

The ultimate problem of this committee approach is that, before a feature can be integrated, the whole committee has to be on board. This is especially true for core language functionality. For instance, PHP just recently got anonymous functions and short-hand array slicing. Don’t get me started on namespaces in PHP.

I know quite a few core PHP coders personally and, from what I understand, they have a number of problems when evolving PHP. Besides the committee issues and the fact that extensions are coded by a few thousand different people, there’s the fact that PHP is installed on just about every machine on the planet so backwards incompatible changes wreak havoc on code everywhere.

At the end of the day I was tired of PHP’s inconsistent language syntax and waiting for more modern language features. Enter Python.

Python’s approach to creating a language is about as far from PHP’s as you can get. First, and foremost, Python is led by its Benevolent Dictator for Life, Guido van Rossum. The result is that the language’s development takes its cues from a single person with a consistent long-term vision of how things should be. Guido and the core Python coders set standards, via PEPs, on everything from how common interfaces (e.g. DBs) should work to coding standards (the infamous PEP8). Furthermore, practices Guido thinks are poor coding practices are simply not supported at the language level (e.g. there is no ++ operator nor can you do assignment in comparison operators).
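Those two language-level restrictions are easy to check for yourself. Below is a minimal sketch, not anything official: the helper name is_valid_python is made up for illustration, and it simply asks the interpreter whether a snippet compiles.

```python
def is_valid_python(source):
    """Return True if `source` compiles as Python, False on a SyntaxError.

    Hypothetical helper, used here only to probe the grammar.
    """
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# There is no postfix increment; augmented assignment is the supported spelling.
increment_with_plus_plus = is_valid_python("counter = 0\ncounter++")
increment_with_augmented = is_valid_python("counter = 0\ncounter += 1")

# Plain assignment is a statement, so it cannot sit inside an `if` condition.
assignment_in_condition = is_valid_python("if x = 1:\n    pass")
comparison_in_condition = is_valid_python("if x == 1:\n    pass")

print(increment_with_plus_plus, increment_with_augmented)
print(assignment_in_condition, comparison_in_condition)
```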

The byproduct of this is that it permeates throughout the Python community. Due to the fact that Python has significant whitespace, combined with PEP8, you’d be hard pressed to find Python code that looks and feels drastically different between various projects.

But, overall, the thing I like most about Python is its explicitness. When you open a file in Python you know precisely what code is affecting that file. I can’t tell you how many times I got burned by spaghetti require/include code in PHP, so this is a welcome change.

On top of all of this Python has evolved significantly with regards to systems-level features. Want a daemon? Sure, just do import daemon followed by daemon.daemonize(). Want threading? Sure, it’s all there. How about CLI option parsing? Just do from optparse import OptionParser.
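To give a feel for how little ceremony is involved, here is a small sketch that combines option parsing and threading using only the standard library. The --workers option and the function names are invented for this example, and the argument list is passed in explicitly so the snippet behaves the same wherever it is run.

```python
import threading
from optparse import OptionParser


def build_parser():
    """Build a parser with a single, made-up --workers option."""
    parser = OptionParser(usage="usage: %prog [options]")
    parser.add_option("-w", "--workers", dest="workers", type="int",
                      default=2, help="number of worker threads to start")
    return parser


def run_workers(count):
    """Start `count` threads, let each record a value, and wait for them all."""
    results = []
    lock = threading.Lock()

    def work(worker_id):
        # The lock guards the shared list while threads append to it.
        with lock:
            results.append(worker_id * worker_id)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)


# Parse an explicit argument list instead of sys.argv to keep the demo predictable.
options, args = build_parser().parse_args(["--workers", "3"])
print(run_workers(options.workers))
```

optparse still ships with the standard library, although newer code generally reaches for argparse, which covers the same ground.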

Another thing I love about Python is its religious adherence to KISS. You want namespaces? Fine, the name of the file is the namespace. You want modules? Fine, just replace / with . and add an __init__.py file and you’re good to go. Would you like to rename that function to something else? Fine, just do new_func = old_func.
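A short sketch of those three ideas, using a package from the standard library so nothing needs to be created on disk. The function names old_func and new_func come from the paragraph above; everything else is illustrative.

```python
import xml.etree.ElementTree as ET


def old_func(name):
    return "Hello, " + name


# Functions are ordinary objects, so "renaming" one is just another assignment.
new_func = old_func

print(new_func("Digg"))
print(new_func is old_func)

# The dotted module name mirrors the path xml/etree/ElementTree.py, where the
# xml and xml/etree directories are packages (each has an __init__.py).
print(ET.__name__)

# Everything defined in that file lives in the module's namespace.
element = ET.fromstring("<story diggs='42'/>")
print(element.get("diggs"))
```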

Finally, a stark difference between PHP and Python is that Guido, essentially, treats the developers as adults while PHP puts significant effort into protecting developers from themselves (I’m looking at you, safe_mode). My favorite quote from Guido, while commenting on why Python doesn’t enforce private/protected/public variables, was, “Hey, we’re all consenting adults here.” In addition to this, as my friend and Python hacker Mike Malone puts it, you can mangle whatever you want in Python. For instance, at runtime you can automatically extend class Foo from class Bar by doing Foo.__bases__ += (Bar,), since __bases__ is a tuple (Tip: This is especially handy for extending Django’s base User functionality). Much like UNIX, Python gives you more than enough rope to hang yourself, but at least hanging yourself is an option.
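Here is a minimal sketch of that runtime trick. Foo and Bar are the names used above; the Account class and the greet method are hypothetical stand-ins so the example has something to show.

```python
class Account:
    """Stand-in for whatever Foo already inherits from (hypothetical)."""

    def __init__(self, name):
        self.name = name


class Foo(Account):
    pass


class Bar:
    def greet(self):
        return "Hello from Bar, " + self.name


# __bases__ is a tuple, so the new base is appended as a one-element tuple.
Foo.__bases__ += (Bar,)

foo = Foo("joe")
print(foo.greet())
print([cls.__name__ for cls in Foo.__mro__])
```

One caveat worth knowing: in CPython a class that inherits directly from object generally rejects this kind of reassignment with a TypeError, which is why Foo starts out with its own base class here.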

Overall, I’m really enjoying my decision to switch over and recommend you check out Python for your next project.

Year in Review

  1. The end of 2008 marks the end of my first year as Digg’s Lead Architect. In that time we’ve rewritten the majority of the site using frameworks that I built. We’re currently rewriting the underlying data access layer to be horizontally partitioned, elastic, services oriented and multi-homed.
  2. In early January, Digg Images launched and, with it, the result of months of work resulting in a completely rewritten submission framework for Digg. This project resulted in me writing and releasing Net_Gearman. I consider this project to be some of my best work at Digg.
  3. In early January I snuck off to Vail for one last snowboarding trip before back surgery. It was on this trip that I finally became comfortable with Western black diamonds, including an awkward drop off of an 8+ foot precipice into 3+ feet of fluffy powder.
  4. On January 23rd, 2008 I went in for back surgery. Two hours after surgery I was up and walking around without a hint of sciatica or back pain. I can’t thank Dr. Fred Naraghi enough for what I view as a second chance at life.
  5. 2008 will be known as the Year of the Conferences for me. I spoke at Future of Web Apps in Miami, on a panel at SXSW on scaling websites, at the MySQL Conference on Services Oriented Architecture, Web 2.0 Expo in New York City, Future of Web Apps in London with Blaine Cook, Future of Web Design on the friction between developers and designers, and Q-Con in San Francisco on Digg’s architecture.
  6. The summer brought another bout of triathlon training. Along with my friend Mark Lewandowski, I trained for my first Olympic distance triathlon, which I ended up finishing in 2 hours, 50 minutes and change. As part of our training Mark and I also did a 72 mile bike race around Lake Tahoe. The race included 3,900 feet of vertical gain over 72 miles and is, without a doubt, the most challenging endurance race of my life. I finished the race in 4 hours, 15 minutes and change.
  7. In June I was elected to the PEAR Group, which is the governing board of my favorite PHP project.
  8. In early September I launched PleaseDressMe with my friends AJ and Gary Vaynerchuk. The site continues to gain traction in the T-shirt arena and is, to date, my most successful side project.
  9. In October Aubrey, Kevin and I went on a whirlwind tour of Europe that included Oktoberfest in Munich, London, and Amsterdam.
  10. November brought big news at Digg with the hire of my friend and release manager for PHP6, Andrei Zmievski, as Digg’s first Open Source Fellow.
  11. November also brought about me finally diving into Python and Django for a side project. I’ve built an API for iPhone games that my friend Garren and I plan on releasing soon. More on this to come.
  12. December brought another trip to Thailand with my good friend Chris Lea. We’d originally planned to do Thailand, Cambodia and either Laos or Vietnam, however the islands of Koh Phangan and Koh Samui had other ideas. I type this sitting on Haad Lamai on Koh Samui. So far it’s been an epic trip with highlights including New Year’s Eve on Haad Rin Nok and a trip back to Haad Rin Nok tomorrow for another Full Moon Party.

This year I’m going to follow the year in cities theme that so many other blogs follow because I feel I really have done a ridiculous amount of travel this year.

  • San Francisco, CA
  • Miami, FL
  • Austin, TX
  • San Diego, CA
  • Seattle, WA
  • Vail, CO
  • East Jordan, MI
  • New York, NY
  • Munich, Germany
  • London, United Kingdom
  • Amsterdam, Netherlands
  • Los Angeles, CA
  • Bangkok, Thailand
  • Haad Leela, Koh Phangan, Thailand
  • Haad Lamai, Koh Samui, Thailand

I’m going to start a new theme here today. Below is my year in open source software. This is a list of projects I’ve released publicly and/or have contributed to. I’m not sure how many lines of code this is, but this is, by far, my most prolific year in FOSS contributions.

Giving back to the community

At Digg we use a lot of open source software. A short list includes PHP, Memcached, MogileFS, Gearman, Debian GNU/Linux, Python, Perl, MySQL, Apache, APC and PEAR. Something that may not be quite as well known is that Digg developers have been busy giving back to the community as well.

The best part, in my opinion, about all of this is that we release our code under the most liberal license possible given the circumstances – the New BSD License (We use New BSD to protect Digg’s trademarks).

Of course there are other companies that contribute significantly to FOSS. Flickr, Facebook, Yahoo!, IBM and Google are just a few and I’m more than happy to say that Digg is giving back as well.

A discussion on languages and frameworks

I start all of my talks at conferences on architecture and scaling with describing the distinct differences between scaling and performance. I define scaling in one word: specialization. I, somewhat jokingly, retort to the question of what performance is with “Who cares?”.

The reason for not caring (much) about actual performance is that whether or not you use single quotes, double quotes, objects, functions, Python, Ruby, PHP, foreach, etc. has nothing to do with whether or not your application and site will scale.

Scaling is entirely about IO. Fundamentally, it’s about whether your data is being stored in a manner that makes retrieving it at the rate of today’s high traffic websites possible. In other words, Ruby isn’t the reason you can’t store 250,000,000 records in MySQL and do range scans. It’s because MySQL (and most RDBMS’s) suck.

I’ve been playing with Django a lot lately for a side project. Despite being a pragmatic coder to a fault, I’ve decided to truly learn another language and I felt Django would ease the shock a bit. So far I love it. Django’s patterns make a lot of sense and I’m loving the goodies a true OO language like Python gives me. Since starting down this path I’ve been getting two questions over and over:

  1. Does this mean Digg is going to be using Django?
  2. Why didn’t you choose Ruby on Rails?

Digg will not be using Django or any other framework anytime soon. We deal with traffic that most Django developers will probably never see. Our stack receives billions of requests a month. That kind of traffic, as I stated earlier, requires specialization. Django is the exact opposite of this. It’s a generic web framework made to answer the majority of web programmers’ basic needs. The majority of web programmers don’t deal with the problems we deal with. I’m sure if you ripped out a lot of what makes Django so great (e.g. the models, admin, etc.) Django would be fine (e.g. if we used it only for mapping requests to views and templates), but then it wouldn’t really be Django, would it?

I chose Python and Django over Ruby and Ruby on Rails for a number of reasons. First and foremost is that we use Python here at Digg. Learning Python will only enhance my ability to perform my duties at Digg. I, personally, dislike the Perlisms in Ruby. Additionally, Ruby seems to skew towards implicitness in the language, while Python skews toward explicitness. I like explicitness.

To sum things up, I have nothing technically against Ruby as a language, I love frameworks for regular development work and Perl’s syntax kills small children.

Choose what you love and be happy coders.

Parsing PUT requests in PHP

I’m working on the next generation data access layer for Digg right now, which is basically a REST layer built on top of a partitioned and multihomed database setup. The general idea is that we’ll send GET, POST, PUT and DELETE requests to URIs on our services layer to access and manipulate data. PHP makes accessing GET and POST easy via $_GET and $_POST. DELETE isn’t an issue since what we’re deleting is just the entity defined by the URI (e.g. sending DELETE to /2.0/User/1234.xml will delete User 1234).

After a few days’ work I can create, fetch and delete entities from this setup. Today I started working on implementing the PUT method. I always knew PHP wasn’t exactly top notch when it came to PUT support, but I had no idea how annoying it would be to find a simple solution for parsing PUT information. After some digging around this is what I’ve figured out.

// php://input exposes the raw request body; parse it as a URL-encoded string.
$put = array();
parse_str(file_get_contents('php://input'), $put);

That should parse everything into a native PHP array, including arguments like foo[bar]=1&foo[baz]=2. If anyone knows of a more native way of doing this please let me know.
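For anyone who thinks better in Python, the sketch below mimics the shape of the result for bracketed keys. It is only an approximation for illustration, not a port of parse_str: the function name parse_form_body is made up, and it ignores edge cases such as the foo[]=1 append form or a key used both as a scalar and as an array.

```python
import re
from urllib.parse import parse_qsl


def parse_form_body(body):
    """Turn 'foo[bar]=1&foo[baz]=2' into {'foo': {'bar': '1', 'baz': '2'}}."""
    result = {}
    for raw_key, value in parse_qsl(body, keep_blank_values=True):
        # "foo[bar][baz]" becomes ["foo", "bar", "baz"]; "name" stays ["name"].
        parts = [raw_key.split("[", 1)[0]] + re.findall(r"\[([^\]]*)\]", raw_key)
        node = result
        for part in parts[:-1]:
            # Walk down, creating nested dicts as needed.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result


print(parse_form_body("foo[bar]=1&foo[baz]=2"))
print(parse_form_body("name=joe&prefs[theme]=dark"))
```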

Digg is hiring LAMP programmers

It’s 6PM on a Thursday night and I’m about ready to head over to the Open Web Awards presented by Mashable.com to celebrate Digg’s wins with a few of my fellow Diggers. The only downer is that we don’t have more Diggers to share the fruits of our labor with. It reminded me that I should probably tell all 10 people who read this site that we’re looking for talented people to work with us in our San Francisco office (in Potrero Hill). Below is a little insight into what you’d be doing if you worked at Digg.

  • Program for the 36th largest site on the intertubes according to Compete.com. Digg.com does 20,000,000+ unique visits a month. That’s a lot of zeros!
  • We use the LAMP stack (Debian GNU/Linux, Apache, MySQL and PHP with some Python thrown around from time to time) and expect you to be proficient with that.
  • Learn from some of the brightest minds in the PHP, design and operations communities.
  • Play with Memcache, Gearman, Mogile, PEAR, etc. in a high volume environment.
  • Create and contribute to open source projects while you work on high traffic and scalability problems at Digg.
  • 20 days of PTO a year and access to all sorts of great benefits (medical, dental, vision, etc.).

I love working at Digg. It’s fun, fast paced and I work with some of the brightest minds in the industry. If you have any questions or interest please email jobs@digg.com.

Framework 0.4.0

Thanks to Southwest delaying my 1.5 hour flight to Portland by 2 hours I’ve finally gotten time to package up the latest release of Framework. I jumped ahead to 0.4.0 from the 0.2.x series due to the fact that 0.4.0 isn’t really 100% backwards compatible and because I felt it was a major enough release to skip ahead a number. Lots of bug fixes, features, etc.

The biggest enhancement is that I switched to a pure exception model for error handling. I even went so far as to switch the default PEAR_Error handler to throw Framework_Exception instances. Also, as asked by a ton of people, I’ve created the ability to change your DB abstraction layer or not use one at all.

New iPhone version of Digg

Like many other interesting projects the Digg iPhone project started with a conversation over a few beers and a challenge: code it in 48 hours and Kevin would give me a free iPhone. Being the unabashed Apple fanboy I am and, also, being a self-respecting coder I set out to create Digg for the iPhone. After spending a short time whiteboarding the application, Daniel mocked up the design and I set off to code it.

Technically speaking, it’s no revolutionary application and I didn’t spend the entire 48 hours working on the application. I did, of course, find time to go see Transformers (awesome) over the weekend. The JavaScript was borrowed from Joe Hewitt and adapted a bit using jQuery. The application itself is based on our API using the Services_Digg PEAR package I maintain. I’ve been talking with the jQuery team about some limitations in the animations and plan on packaging up a more robust iPhone JavaScript library based on jQuery sometime in the near future (hopefully).

Yesterday Kevin announced the iPhone application and Daniel Burka, our designer at Digg, has covered the details about designing for the iPhone. So far the response to the application has been positive. It was fun and I’m glad everyone is enjoying it. I know I did while riding the bus to work today.

And to answer everyone’s questions. I got an 8GB version last night, the keyboard is interesting, it’s breathtakingly gorgeous and I’ll write YAiPR (Yet Another iPhone Review) soon.

Technical Background of Digg's new Comment System

UPDATE: As you might have noticed, we’re having some technical issues rolling out the new comments. Please bear with us as we work out the kinks.

UPDATE: We reworked a few things and the comments are now live again.

Today Digg launched its redesign of the comment system, which was programmed by yours truly. Daniel has written up a detailed overview of the design decisions so if you’re interested in the design aspects you’ll definitely want to check that out.

There were a few fairly complex technical changes to the comment system, which I’ll outline and then go into a little detail about.

  • We’ve been talking about moving towards a services oriented architecture on and off since I started in February. Steve had coded the API and the decision was made that the new comments system would use Digg’s public API.
  • Comment threads would be loaded dynamically using AJAX and JSON.
  • All commenting, editing, etc. would happen via AJAX.

The API ended up making the PHP code behind the scenes relatively painless. Each page loads using two small calls: one to figure out how many comments are on a story and another to fetch the first 50 or so comments (we use a fuzzy limit so it’ll just load all of them if there are, say, 55 comments on a story). Both our local proxy and the comments code use the Services_Digg package we released via PEAR when the API launched. In fact, the entire permalink page is built using the API now, which is pretty neat.
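The fuzzy limit is simple enough to sketch in a few lines. This is only an illustration of the idea, not Digg’s code: the function name and the slack of 10 are assumptions, with only the page size of roughly 50 and the 55-comment example coming from the description above.

```python
def comments_to_load(total_comments, page_size=50, slack=10):
    """Decide how many comments the first request should fetch.

    If the story has only slightly more comments than one page, load them
    all rather than leaving a tiny remainder behind a second request.
    `slack` is an assumed tolerance, not a documented value.
    """
    if total_comments <= page_size + slack:
        return total_comments
    return page_size


for total in (12, 55, 400):
    print(total, "->", comments_to_load(total))
```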

By far the most complex portion of the comments system was how dynamic it was going to be. Threads would be zipping in and out, we’d be creating 90% of the HTML dynamically in the DOM from JSON, posting and editing over AJAX, etc. It was during design that Micah and I also plotted to remove script.aculo.us and replace it with the smaller jQuery library. The entire comment system is, in fact, a series of jQuery plugins.

Probably the coolest, technically speaking, portion of the new comments is the manner in which most of the page is created. No longer do we create static HTML in PHP and send you a huge HTML page. Instead we give you the basics and, via AJAX/JSON, we make requests to the API and dynamically create the DOM using the FlyDOM jQuery plugin. The FlyDOM JSON templates are a stroke of genius if you’re looking at loading JSON dynamically into the DOM. The advantage of this is that initial page loads are much snappier and you can load the threads you wish to read on demand.

I really picked up the whole dynamically created DOM ball and ran with it. If you notice, on the initial page load there aren’t any forms anywhere in the DOM. Those, also, are created dynamically on request. An interesting side effect of this is that there’s about 4x as much JavaScript code in the new comments as there is PHP.

The major technical and design changes of comments should lead to faster load times, less bandwidth being eaten up and, hopefully, a better user experience. I hope you enjoy them and, as always, welcome comments and input.

As Kevin would say, “Digg on!”

Introducing correlate.us

As some of you know I’ve been working on a little side project and I’d like to officially announce that correlate.us is open for business. Whatever that means.

I’ve been kicking this idea around for quite some time. The basic idea is that you authenticate your various Web 2.0 accounts and then correlate.us goes out and aggregates that data, groups it by tags and makes it generally more browseable for everyone involved.

The best part about it is that you can add friends. Once you’ve added a few friends you can view your friends’ activity online in an aggregated view as well. Don’t forget to add correlate.us to your Facebook profile.