Distributed vs. Fault Tolerant Systems

I’ve been researching implementations of distributed search technology for some things we want to do at SimpleGeo. It was about this time that I ran across a project called Katta, which is a “distributed” implementation of Lucene indexes. While perusing the documentation I ran across a diagram detailing the architecture of Katta. What struck me as odd was that Katta was purporting to be a distributed system, yet had a master/slave setup managing it. This led me to tweet out:

Dear Engineers, It’s not a “distributed” system if you have a master/slave setup. That’s simply a “redundant” system.

Not long after this I got a few inquiries along with some great input from the ever-wise Jan Lehnardt from the CouchDB project along with Christopher Brown:

RT @joestump: “…It’s not a ‘distributed’ system if you have a master/slave setup. That’s simply ‘redundant’” — Same: distributed != sharding

And it’s not “distributed” if it’s all hub-n-spokes. Might as well have a central server with some extra storage.

Basically, Jan and I are making the argument that redundancy or sharding/partitioning doesn’t really add up to a truly distributed system. Well, not in the sense of what I think of when I think of distributed. Going back to my old friend, the CAP Theorem, I think we can see why.

Redundancy could be argued to be the A in CAP: always available. I’d even argue that partitioning, in the sense that most people think of (e.g. partitioning 300m users across 100 MySQL servers), is the A in CAP. The P in CAP, on the other hand, means that a system is highly tolerant of network partitions. Because of the master/slave setup of Katta, it really only implements the A in CAP.

Of course, CAP says that you can only have two of the three in any distributed system. I’d make the argument that no system is truly distributed unless it fully implements two of the three properties in CAP. I’d further argue that even a highly available (A), highly consistent (C) system wouldn’t be distributed, since it lacks the ability to be highly tolerant of network partitioning.

The problem that I have with Katta’s definition of distributed is that I can’t truly distribute Katta in the raw sense of the word. For instance, I can’t spread Katta across three data centers and not worry about a data center going down. If I lose access to the master then Katta is worthless to me.

To have a truly distributed system I’d argue you need the following characteristics in your software:

  • There can be no master (hub).
  • There must be multiple copies of data spread across multiple nodes in multiple data centers.
  • Software and systems must be tolerant of data center failures.
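To make the distinction concrete, here is a toy sketch in Python. It is not Katta's actual architecture; the `Node`, `MasterSlaveCluster`, and `MasterlessCluster` names are hypothetical and only reads are modeled. It assumes three data centers that each hold a full copy of the data, and shows that a hub-routed lookup fails once the master is unreachable while a masterless lookup keeps answering.

```python
class Node:
    """A hypothetical storage node living in one data center."""

    def __init__(self, datacenter, data):
        self.datacenter = datacenter
        self.data = dict(data)  # every node holds its own full copy
        self.up = True

    def get(self, key):
        if not self.up:
            raise ConnectionError(self.datacenter + " is unreachable")
        return self.data[key]


class MasterSlaveCluster:
    """Reads are routed through the master (the hub)."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def get(self, key):
        # Without the master nobody can locate the slaves, even if they are up.
        if not self.master.up:
            raise ConnectionError("master is unreachable")
        for slave in self.slaves:
            if slave.up:
                return slave.get(key)
        raise ConnectionError("no slave is reachable")


class MasterlessCluster:
    """Any node can answer; the client simply tries the next one."""

    def __init__(self, nodes):
        self.nodes = nodes

    def get(self, key):
        for node in self.nodes:
            try:
                return node.get(key)
            except ConnectionError:
                continue
        raise ConnectionError("every data center is unreachable")


def make_nodes():
    """Build one node per (made-up) data center, each with the same data."""
    data = {"doc:1": "hello"}
    return [Node(dc, data) for dc in ("dc-a", "dc-b", "dc-c")]
```

Taking down the first data center in each setup (`nodes[0].up = False`) leaves the masterless cluster serving `doc:1` from another node, while the master/slave cluster raises `ConnectionError` even though two healthy copies still exist.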

My point is, I think the word “distributed” is applied too freely to systems that are, more appropriately, called “redundant” or “fault tolerant”.

UPDATE: Seems I confused some people about my master/slave argument. You can build distributed systems that have master/slave pieces in the puzzle. Large distributed systems are rarely one cohesive piece of software, rather they tend to be a series of systems glued together. What I am saying is that a simple master/slave system is not distributed. Through a series of bolts, glue, add-ons, and duct tape you can make a master/slave system sufficiently distributed, but don’t confuse the totality of that system with the base master/slave system.

Year in Review

  1. The year began in Koh Phangan, Thailand with my friend Chris Lea. We spent a month lying on beaches, swinging in hammocks, and drinking booze out of buckets.
  2. While in Thailand I got some more bamboo work done on my left arm.
  3. In February I went down to Miami for Future of Web Apps to talk about scaling your tech teams.
  4. Around my birthday I was able to score a copy of Netscape Navigator 2.0, still in the box, signed by Marc Andreessen.
  5. March brought the usual trip down to Austin, TX for SXSW. I spoke on a panel titled, “Designers and Developers: Why can’t we all just get along?”
  6. In April I attended the Social Foo Camp, which is an invite-only nerdfest put on by O’Reilly.
  7. May was an insane month of travel in a year of insane travel. I spent a week in Michigan, a week in Prague, a day in Phoenix, and a few days in Boulder, CO.
  8. While I was in Michigan, Jonathan and I got our pictures taken by my high school sweetheart, Erica, for our mom for Mother’s Day.
  9. When I returned from Prague I’d made the big decision to leave Digg and build a startup with Matt Galligan. Matt and I created a company called Crash Corp. that was going to build augmented reality and location-based games.
  10. In June I got a new face.
  11. Matt and I agreed to each take a month off to clear our heads before jumping into startup mode. For unknown reasons he decided to spend his month in the Midwest. I, on the other hand, chose to go to Amsterdam, Denmark, Norway, Ireland, and London. This marked my second month off for the year, which was awesome.
  12. I spent about ten days in Norway with my buddy Arne Fismen (Side note: His last name means “fart man” in Norwegian, which is definitely worse than my last name) and was able to fulfill a childhood dream of mine by visiting the world famous fjords of Norway. I can’t express my appreciation enough for what Arne and his family did for me. It was truly a magical experience.
  13. When I returned from Europe I spent a few days in San Francisco before heading down to San Diego for my buddy Dana’s bachelor party.
  14. After Dana’s bachelor party I moved to Boulder, CO to get to work on Matt’s and my company.
  15. Soon after getting on the ground and starting to work through things Matt and I realized we needed to change direction. As a result SimpleGeo was born, which provides location services to developers.
  16. While building SimpleGeo I decided to, after 11 years, switch from PHP to Python as my language of choice.
  17. The change of direction was a watershed moment for the company. Things crystallized for both us and the investors we were pitching. It wasn’t long after this that First Round Capital agreed to be our lead investor.
  18. October was mostly spent flying around to New York City and San Francisco pitching investors, VCs, etc.
  19. In November we closed a $1.5m round of financing from some of tech’s most well-known investors. I consider this to be the greatest achievement of my career so far.
  20. Over Thanksgiving I spent a few days down in Tulum, Mexico.
  21. In December I flew up to Seattle, WA for a quick visit. It’s still home to me and I can’t wait to move back.

According to TripIt and Dopplr I spent 142 days traveling this year. I don’t have complete numbers, but I’m guessing I logged over 80,000 miles this year on various airlines. In the tradition of last year, I think it’s only appropriate that I create a list of my year in cities.

  • Koh Samui, Thailand
  • Koh Phangan, Thailand
  • Bangkok, Thailand
  • Seattle, Washington
  • Miami, Florida
  • Austin, Texas
  • Sebastopol, California
  • Ypsilanti, Michigan
  • Ann Arbor, Michigan
  • East Jordan, Michigan
  • San Francisco, California
  • Prague, Czech Republic
  • Phoenix, Arizona
  • Boulder, Colorado
  • Amsterdam, The Netherlands
  • Roskilde, Denmark
  • Oslo, Norway
  • Bergen, Norway
  • Askvoll, Norway
  • Dublin, Ireland
  • Cork, Ireland
  • London, United Kingdom
  • New York City, New York
  • San Diego, California
  • Minneapolis, Minnesota
  • Ashland, Wisconsin

Disk IO and throughput benchmarks on Amazon’s EC2

When I told people that we were going to run our infrastructure on Amazon’s EC2 most people recoiled in disgust. I heard lots and lots of horror stories about how you simply couldn’t run production environments on EC2. Disk IO was horrible, throughput was bad, etc. Someone should seriously tell Amazon, since a lot of their own infrastructure runs in EC2 and AWS. For the most part, the AWS services were internal tools that Amazon released publicly.

We’ve been ironing out kinks in our production environment for the last few weeks and one of the things that worried me was whether these assertions were true. So, I set out to run a fairly comprehensive test of disk IO and throughput. I ran hdparm -t, bonnie++, and iozone against ephemeral drives in various configurations along with EBS volumes in various configurations.

For all of my tests I tested the regular ephemeral drives as they were installed (“Normal”), LVM in a JBOD setup (“LVM”), the two ephemeral drives in a software RAID0 setup (“RAID0”), a single 100GB EBS volume (“EBS”), and two 100GB EBS volumes in a software RAID0 setup (“EBS RAID0”). All of the tests were run on large instances (m1.large). All tests were run using the XFS file system.

hdparm -t

While hdparm -t isn’t the most comprehensive test in the world, I think it’s a decent gut check for simple throughput. To give a little context, I remember Digg’s production servers, depending on setup, ranging from 180MB/sec. to 340MB/sec. I’m guessing if you upgraded to the XL instance type and did a RAID0 across the four drives it has you’d see even better numbers with the RAID0 ephemeral drives.

What I also found pretty interesting about these numbers is that the EBS volumes stacked up “okay” against the ephemeral drives and that the EBS volumes in a RAID0 didn’t gain us a ton of throughput. Considering that the EBS volumes run over the network, which I assume is gigabit ethernet, 94MB/sec. is pretty much saturating that network connection and, to say the least, impressive given the circumstances.
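The "pretty much saturating" claim can be sanity-checked with a bit of arithmetic. This is a rough sketch, assuming (as the post does) a 1 Gbit/s link and treating the 94MB/sec. figure as decimal megabytes. It ignores protocol overhead, which lowers the usable ceiling below the raw number, so the real utilization is somewhat higher than what this computes.

```python
def link_capacity_mb_per_sec(gigabits_per_sec):
    """Convert a raw link speed in Gbit/s to decimal megabytes per second."""
    bits_per_sec = gigabits_per_sec * 1_000_000_000
    return bits_per_sec / 8 / 1_000_000


capacity = link_capacity_mb_per_sec(1)  # raw ceiling of an assumed gigabit link
measured = 94                           # EBS figure quoted in the post
utilization = measured / capacity

print(f"{measured} MB/sec. is {utilization:.0%} of {capacity:.0f} MB/sec.")
# prints: 94 MB/sec. is 75% of 125 MB/sec.
```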

For most applications, I’d guess that EBS throughput is just fine. The raw throughput only becomes a serious requirement when you’re moving around lots of large files. Most applications move around lots of small files, which I’d likely use S3 for anyways. If your application needs to move around lots of large files I’d consider using a RAID0 ephemeral drive setup with redundancy (e.g. MogileFS spreading files across many nodes in various data centers).

bonnie++

Disregarding the Input/Output Block performance of the ephemeral RAID0 setup here, it’s extremely interesting to note that EBS IO performance is better than the ephemeral drives and that EBS in a RAID0 was better in almost every metric than the ephemeral drives.

That all being said, RAID0 ephemeral drives are the clear winner here. I do wonder, however, if you could set up a RAID0 EBS array that had, say, four or six or eight volumes that’d be faster than the RAID0 setup.

If your application is IO bound then I’d probably recommend using EBS volumes if you can afford it. Otherwise, I’d use RAID0. Again, the trick with the ephemeral drives is to ensure your data is replicated across multiple nodes. Of course, this is cloud computing we’re talking about, so you should be doing that anyways.

Here are the CPU numbers for the various configurations. One thing to note here is that EBS, LVM, and software RAID all come with CPU costs. Somewhat interesting to note is that EBS has substantially less CPU usage in all areas except Input/Output Per Char.

If your application is both CPU and IO bound then I’d probably recommend upgrading your instance to an XL.

The last bonnie++ results are the random seeks per second and, wow, was I surprised. A single EBS runs pretty much dead even with the LVM JBOD and the EBS RAID0 is on par with the RAID0 ephemeral drives.

To say I was surprised by these numbers would be an understatement. The lesson here is that, if your application does lots of random seeking, you’ll want to use either EBS RAID0 volumes or RAID0 ephemeral drives.

iozone

Before running these tests I’d never even heard of this application, but it seemed to be used by quite a few folks so I thought I’d give it a shot.

Again, some interesting numbers from EBS volumes. What I found pretty interesting here is that the EBS RAID0 setups actually ended up being slower in a few metrics than a single EBS volume. No idea why that may be.

The other thing to note is that the single EBS volume outperformed the ephemeral RAID0 setup in a few different metrics, most notably being random writes.

Conclusions

I think the overall conclusion here is that disk IO and throughput on EC2 are pretty darn good. I have a few other conclusions as well.

  • If you can replicate your data across multiple nodes then the RAID0 ephemeral drives are the clear winners.
  • If you are just looking to store lots of small files and serve them up, then definitely use S3 with CloudFront.
  • You’d very likely get even more impressive numbers using the XL instances with RAID0 striped across four ephemeral drives.
  • Another potential disk setup would be to put different datasets on different ephemeral drives. For instance, put one MySQL database on one ephemeral drive and your other one on another.
  • If your setups are IO bound and you’re looking for lots of redundancy, then EBS volumes are likely the way to go. If you don’t need the super redundancy on a single box then use RAID0 on ephemeral drives.

Welcome to stu.mp

Whoa! What happened to the old site?! Well, after many years of rolling my own code, design, HTML, CSS, etc. I finally gave up and had some professionals take care of the hard parts. I gave up rolling my own code in favor of using WordPress a while ago and am now using the Lifestream plugin for this site, which I hacked to bits.

The more exciting part, I think, though is the design. It’s been in my head for years, but I could never figure out a way to convert it into bits and bytes. Thankfully, the highly skilled Jake Mix was able to take care of the gorgeous illustrations. Once I had the illustrations in hand, I had the young protégé Julian Targowski put the various bits together into the amazing design you see now.

There are still a few bits and bytes that are not aligned properly and I’m 99% sure that the site is totally broken in Internet Explorer (a fact I care little about). If you like what you see I highly recommend contacting Jake and/or Julian.

SimpleGeo (formerly CrashCorp) is hiring!

Things are really starting to get crazy at SimpleGeo (formerly CrashCorp), the company I founded with Matt Galligan in June. We’ve been running around like crazy meeting with partners, building the platform, raising money and having a ton of fun.

The good news is we’re hiring! If you’re a Python, infrastructure, scaling, GIS, LBS, systems nerd who loves building massive infrastructure we’ve got some jobs that might interest you.

  • Infrastructure Engineer – Experience with distributed systems, EC2, Python, non-relational storage engines, Django, etc.
  • LBS/GIS Guru – In depth knowledge of LBS/GIS and the algorithms, technology, data, and systems surrounding the field.

If you or anyone you know would be interested in either of these jobs please contact me via email at joe at simplegeo dot com. Must be local to Boulder or willing to relocate.

Why I switched from PHP to Python

When it came time to start putting code to paper at CrashCorp I was faced with the decision of choosing both a language and a platform. After 11 years of coding solely in PHP I’d grown tired of the language and, to some extent, the community (not the people, who are great, but the way the community is organized).

First, the language. What makes PHP, as a language, awesome is also what makes it horrible to work with, which is that it’s not really a language, but rather a giant plugin architecture for exposing lower level libraries in a high level fashion. Most of the language features that developers use are, in fact, thin wrappers around popular C libraries (curl, mysql, gd, etc.). Most of the time these libraries’ functions are simply exposed as-is. Anyone who’s coded curl in C will feel right at home while using curl from PHP. The problem with this is it leads to wildly inconsistent APIs.

Another touchy problem with the language is actually a byproduct of the way PHP, the core language, is managed. It’s, essentially, designed by committee. Anyone who’s ever tried to design a large scale anything knows how problematic this can be. The second problem with this approach is that nobody from on high is setting any kind of recognizable standards. PEAR has its standards and PHP has its standards, while everyone else codes however they damn well please. This leads to SPL classes being more Java style, while PEAR classes look a lot different (e.g. ArrayObject vs. HTTP_Request2).

The ultimate problem of this committee approach is that, before a feature can be integrated, the whole committee has to be on board. This is especially true for core language functionality. For instance, PHP just recently got anonymous functions and short-hand array slicing. Don’t get me started on namespaces in PHP.

I know quite a few core PHP coders personally and, from what I understand, they have a number of problems when evolving PHP. Besides the committee issues and the fact that extensions are coded by a few thousand different people, there’s the fact that PHP is installed on just about every machine on the planet so backwards incompatible changes wreak havoc on code everywhere.

At the end of the day I was tired of PHP’s inconsistent language syntax and waiting for more modern language features. Enter Python.

Python’s approach to creating a language is about as completely opposite as you can get from PHP. First, and foremost, Python is led by Benevolent Dictator for Life, Guido van Rossum. The result is that the language’s development takes its cues from a single person with a consistent longterm vision of how things should be. Guido and the core Python coders set standards, via PEPs, on everything from how common interfaces (e.g. DBs) should work to coding standards (the infamous PEP8). Furthermore, practices Guido thinks are poor coding practices are simply not supported at the language level (e.g. there is no ++ operator nor can you do assignment inside a comparison).
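A quick way to see this for yourself is to ask the compiler. The helper below is just an illustration (the `is_valid_python` name is made up); it uses the built-in `compile()` to check whether a snippet is accepted as Python at all.

```python
def is_valid_python(source):
    """Return True if `source` compiles as Python, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_python("x = 1\nx++"))                  # False: no ++ operator
print(is_valid_python("x = 1\nif x = 2:\n    pass"))  # False: no = inside a condition
print(is_valid_python("x = 1\nx += 1"))               # True: the supported spelling
```

Python 3.8 and later did add a separate `:=` operator for assignment inside expressions, but a plain `=` in a condition is still rejected.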

The byproduct of this is that it permeates throughout the Python community. Due to the fact that Python has significant whitespace, combined with PEP8, you’d be hard pressed to find Python code that looks and feels drastically different between various projects.

But, overall, the thing I like most about Python is its explicitness. When you open a file in Python you know precisely what code is affecting that file. I can’t tell you how many times I got burned by spaghetti require/include code, so this is a welcome change.

On top of all of this Python has evolved significantly with regards to systems-level features. Want a daemon? Sure, just do import daemon \ daemon.daemonize(). Want threading? Sure, it’s all there. How about CLI option parsing? Just do from optparse import OptionParser.
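Here is a small sketch of the two standard-library pieces mentioned above, `optparse` and `threading`. The option names (`--port`, `--verbose`) and the helper functions are invented for the example. The daemon call is left out because it depends on a third-party module whose API is not shown here.

```python
import threading
from optparse import OptionParser


def parse_args(argv):
    """Parse a list of CLI arguments with made-up example options."""
    parser = OptionParser(usage="usage: %prog [options] FILE")
    parser.add_option("-p", "--port", dest="port", type="int", default=8000,
                      help="port to listen on")
    parser.add_option("-v", "--verbose", dest="verbose", action="store_true",
                      default=False, help="print extra output")
    return parser.parse_args(argv)  # returns (options, positional_args)


def run_workers(count):
    """Start `count` threads that each record one result, then wait for all."""
    results = []
    lock = threading.Lock()

    def work(n):
        with lock:  # guard the shared list
            results.append(n * n)

    threads = [threading.Thread(target=work, args=(n,)) for n in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)


options, args = parse_args(["--port", "9000", "-v", "input.txt"])
print(options.port, options.verbose, args)  # 9000 True ['input.txt']
print(run_workers(3))                       # [0, 1, 4]
```

`optparse` is still in the standard library; `argparse` was added later as an alternative with a similar feel.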

Another thing I love about Python is its religious adherence to KISS. You want namespaces? Fine, the name of the file is the namespace. You want modules? Fine, just replace / with . along with an __init__.py file and you’re good to go. Would you like to rename that function to something else? Fine, just do new_func = old_func.
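A tiny illustration of both points, using the standard library's `json` package as the example (the `old_func`/`new_func` names come from the paragraph above; the function body is made up). The file `json/decoder.py` is importable as `json.decoder`, and "renaming" a function is plain assignment.

```python
import json.decoder  # the file json/decoder.py, with / replaced by .


def old_func(name):
    return "hello, " + name


# "Renaming" is just binding a second name to the same function object.
new_func = old_func

print(new_func("world"))        # hello, world
print(new_func is old_func)     # True
print(json.decoder.__name__)    # json.decoder
```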

Finally, a stark difference between PHP and Python is that Guido, essentially, treats the developers as adults while PHP puts significant effort into protecting developers from themselves (I’m looking at you, safe_mode). My favorite quote from Guido, while commenting on why Python doesn’t enforce private/protected/public variables was, “Hey, we’re all consenting adults here.” In addition to this, as my friend and Python hacker Mike Malone puts it, you can mangle whatever you want in Python. For instance, at runtime you can automatically extend class Foo from class Bar by doing Foo.__bases__ += (Bar,) (Tip: This is especially handy for extending Django’s base User functionality). Much like UNIX, Python gives you more than enough rope to hang yourself, but at least hanging yourself is an option.
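A minimal sketch of that runtime trick follows. The class and method names are placeholders. Two details are worth knowing: `__bases__` is a tuple, so the right-hand side has to be a tuple too, and in this sketch `Foo` inherits from an intermediate `Base` class, since appending to a class whose only base is `object` fails with a method resolution order error.

```python
class Base:
    """Placeholder parent so Foo does not inherit directly from object."""


class Foo(Base):
    def greet(self):
        return "foo"


class Bar:
    def shout(self):
        return "BAR"


foo = Foo()                  # created before Foo knows anything about Bar

Foo.__bases__ += (Bar,)      # append Bar to Foo's parents at runtime

print(foo.shout())           # existing instances see the new methods too
print([cls.__name__ for cls in Foo.__mro__])
```

This is very much the "enough rope" end of the language: it changes the class for every instance in the process, so it is best kept to the kind of deliberate patching described above.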

Overall, I’m really enjoying my decision to switch over and recommend you check out Python for your next project.

Pass the lubricant as we're getting fucked by Apple too

Stories of developers being absolutely bent over the barrel and fucked hard aren’t new, but I’ve got no other recourse so I’m throwing Blunder Move’s story into the ring. What makes our story different? I’m lucky enough to personally know people at the iTunes store. People who actually work at Apple that I drink beers with. I’m guessing most iPhone developers are in a different boat, but it doesn’t matter (just look at the Facebook app, which was featured in an iPhone commercial, taking 10+ days to get approved) that I know people there. At least Apple are equal opportunity ass fuckers.

A couple of months ago we released Chess Wars, which allows you to play your Facebook friends via Facebook Connect on your iPhone. When it was released we found a few show stopping bugs that neither we nor Apple found which kept new users from playing the game. Whoops. We pushed 1.1 a few weeks later only to find that there were problems for other new users. Again none of this was caught by us or Apple. As they say, shit happens. We quickly put together a release and submitted it to Apple about 6 weeks ago.

Silence.

Finally, after weeks of waiting I did what I’d tried hard to avoid at all cost; I contacted the friends I knew at Apple who told me to email the submitters. Canned response.

So here, like so many other iPhone developers, we sit getting ass raped on 1-star reviews, which will haunt our application forever, and no recourse. None. Nobody at Apple will respond to us. My friends at Apple can’t do anything. I can’t respond to the 1-star reviews.

To our users affected by this, I’m truly sorry. There’s absolutely nothing I can do about your horrible user experience and, as a developer who loves his users, nothing pains me more.

To Apple, please kindly extend the world class customer service I’m so accustomed to as an Apple fanboy to your developers.

UPDATE: A lot of the feedback around this post has centered around us getting what we deserved for shipping buggy code. I should mention we have about 50 beta testers and over 200 unit tests for this specific application so it’s not like we’re not testing. The two bugs were show stoppers for cases we didn’t think to test, but nonetheless affect most of our new users.

Secondly, we did hear back from Apple. They said they were rejecting the application because our in-game chat looked too much like Apple’s SMS application. I’ve asked if we’d be allowed in if we changed our chat bubbles to look like Facebook’s. Our contact at Apple is going to be getting back to me soon.

What pisses me off most about this, and what I conveyed to our contact at Apple, was that it took a widely publicized profanity laced blog post to get their attention. I asked, specifically, why it took weeks to get such a simple response of “Hey, change the chat and we’re good.” back. To Apple’s credit they said I deserved an answer to that question and are looking into it.

UPDATE: Just got off the phone with Apple while I was writing this blog post and they told me, no joke, that the chat bubbles are, in fact, trademarked. Furthermore, they suggested I could, among other things, make them “less shiny.”

I wonder if they consider Facebook to be infringing on their trademark.

Creating the perfect bathroom

I do a fair amount of traveling and have been subjected to at least three, arguably four, cultural buckets (European, North American, Asian, and South American). One thing that I always find humor in is the drastic variations on bathrooms from one culture to the next. My experiences have led me to think about what would make the perfect bathroom, by taking bits and pieces from around the world to create a single bathroom.

  • Toilets in Europe and the UK have two flush mechanisms. One is a small button with a single dot on it and the second, larger button, has two dots on it. I find this to be an extremely simple and elegant solution to conserving water.
  • Speaking of toilets, have you ever crapped on a Japanese toilet?! Holy. Shit. Besides my Googler friends, who have been happily crapping on space age Japanese toilets for years, we’ve all been missing out. Seat warmers, bidets, music, automatic lids, and freaking medical sensors! I mean, why don’t they just add laser beams?
  • Public restrooms in Europe, the UK, and Japan have fully enclosed small rooms for their toilets. There are absolutely no cracks or open air around you. Total privacy while taking a crap in public. Pure genius.
  • Showers in every place I’ve been to in Europe and many in the UK have two knobs, as you’d expect, but they do totally different things. One knob is temperature (many have the actual temperature numbers on them) and the other is pressure. Never fumble around adjusting hot and cold until you get it just right!
  • In Thailand their plumbing systems weren’t made for flushing toilet paper and such so they have a small spray hose (think of the sprayer by your sink attached to a wall by the toilet). Toilet paper is merely used to dry off your clean bottom. I got used to this method pretty quickly and much prefer it over toilet paper.

If I ever do build my own home or renovate another bathroom I’ll be including all of these in my bathroom as I think they really do make the perfect bathroom all together.

Chess Wars is live in the App Store

It’s taken a long time to get here, but the first version of Chess Wars is live in the App Store. It’s a bit awkward, but the general story is that Crash Corp was started to make these games, but switched gears when Matt Galligan and I teamed up to seek funding for other, more interesting, games. As a result, Blunder Move was born.

Chess Wars allows you to play your Facebook friends in chess using Facebook Connect. Here’s a list of features for Chess Wars:

  • No signup process. Simply log in with your Facebook account using Facebook Connect.
  • Play against your Facebook friends! Send challenges and invite your friends to play against you.
  • Works over WiFi, EDGE, or 3G. Play using your iPhone or iPod Touch.
  • A little rusty at chess? Don’t worry, it highlights potential moves when you select a piece.
  • Play and keep track of dozens of games with our simple inbox. Games are organized by game state (e.g. My Turn, Their Turn, New Challenges, etc.).
  • Our in-game chat system allows you to taunt your friends easily. Everyone loves rubbing a good move in!
  • Get notified on Facebook when a friend makes a move or sends you a chat message.
  • Scroll back through move history easily.

If you have any questions please don’t hesitate to contact us with your issues. We’ve got a lot of plans for next versions and will be releasing checkers and reversi next.

Changing your artist name in iTunes Connect

The short story is it’s not possible to change your artist name in iTunes Connect. The longer story is that, when you sign up for an iPhone developer account, you enter in your artist name and, likely, forget all about it. That is, until your app goes live in the App Store and you notice “I dunno” is what your app is listed under.

What makes this so frustrating is that there is absolutely no way to find out what your artist name is. It’s not shown in any of the certificates, nor is it shown in iTunes Connect. Sure, your copyright holder, company name, etc. are all viewable, but that very important artist name, which your application actually gets listed under? Nope.

Luckily, I didn’t have to contact Apple through normal channels, which I hear takes 4 – 6 weeks to get a change done, but I did find plenty of things to be annoyed about.

  • You cannot preview your application in the App Store without actually publishing it into the App Store.
  • Your artist name is not listed anywhere in iTunes Connect so you cannot verify any changes.
  • You cannot view the status of a case number anywhere. You have no way of passively viewing the status of your case. You have to email them to find out if it’s closed or not.

Apple could easily fix all of these with very little effort. In fact, they offer a preview of what your applications will look like when you sign up, but not after you’ve submitted apps. I’d like to see three things:

  • The list of your applications in iTunes Connect should be switched to look exactly like they would be listed in iTunes.
  • In the applications overview it should show the artist name (even if I’m not allowed to edit it I should at least be able to see it).
  • Allow me to see a simple overview of any pending cases I have. Just something like a case number, when it was opened, and status (e.g. “In Review”, “Not Assigned”, “Waiting on Developer”, “Closed”).

Add this to the long list of things Apple should fix in iTunes Connect and the App Store for developers.