Why I’ll never own another server

Posted on Thursday, April 1st, 2010 at 1:18 pm in cloud computing, clustering

When Matt and I started SimpleGeo, I decided early on to run our infrastructure on Amazon’s AWS. A lot of people think I’m nuts for doing this, for a lot of reasons, but I generally hear two major concerns when I mention that we run on AWS/EC2.

  1. AWS is slow!
  2. AWS is expensive!

I’ve covered IO performance on EC2 in depth before and have compared the IO benchmarks, favorably, against numbers from Digg and Media Temple’s systems engineers. The notion that AWS is too slow for your application is, largely, not supported by the numbers and comparisons. The second point I often make with regard to performance on AWS is that Amazon uses it to run large portions of their own infrastructure. Trust me, if it’s good enough for the largest online retailer in the world, it’s good enough for you.

The second point is a bit harder to defend sometimes. AWS can be cheaper than running your own hardware, and vice versa. If you run a huge number of servers, AWS can come out a few hundred thousand dollars more expensive when you compare the raw cost of your own hardware to the cost of AWS. The problem with this vanilla comparison is that it forgets one extremely important cost for startups: opportunity cost.

I have a few rhetorical questions for people who are not using AWS.

  • How many people does it take to maintain your own DC? People have to wrangle hardware, travel around to various DCs, RMA hardware, etc. If they weren’t doing those things, or you didn’t need those people, what could you be doing with those resources instead?
  • How much time, money, effort, and overhead is it going to take to create multiple data centers? Have you negotiated bandwidth contracts before? Do your data centers have power from multiple providers? Do they have power and bandwidth failover? Amazon has amazing economies of scale and has spent thousands of man hours (years?) preparing for power/bandwidth failover, floods/natural disasters, etc.
  • Managing multiple data centers requires a small army of highly trained network operations people. Have you built DC failover before? Have you implemented load balancing across multiple DCs? It took me about 30 minutes to set up an Elastic Load Balancer that spread traffic across three Availability Zones (Amazon’s term for DCs).
  • Have you thought about building your own automation and self-service APIs for the DC you want to build? Fabric/Chef/Puppet/Capistrano combined with AWS’s automation API is an extremely potent combination for automating large clusters. For instance, we use Fabric and Boto to automate the creation of all nodes in our cluster. I can run a command in Fabric that creates an API server out of thin air, bootstraps it, and puts it into our ELB. This takes about five or so minutes.
  • Have you ever set up a DC in Europe? What about Asia? Would you even know where to start? I can spin up a server in Europe in a matter of seconds. How much might you spend on flying your network operations folks to and fro all of these DCs you plan on building?
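
As a rough illustration of the "create a server and put it into the ELB" step mentioned above, here is a minimal sketch. It is not our actual Fabric task. It assumes boto3 (the current successor to the Boto library named above), and the function name, its arguments, and the default instance type are all hypothetical. The bootstrapping step is left out entirely.

```python
def launch_api_server(ec2, elb, image_id, load_balancer_name,
                      instance_type="m1.large"):
    """Start one instance, wait until it is running, then add it to an ELB.

    `ec2` and `elb` are expected to behave like boto3's "ec2" and "elb"
    (classic load balancer) clients. Passing them in keeps the function
    easy to exercise without touching a real AWS account.
    """
    reservation = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    instance_id = reservation["Instances"][0]["InstanceId"]

    # Block until EC2 reports the instance as running.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # Register the new node with the load balancer so it starts taking traffic.
    elb.register_instances_with_load_balancer(
        LoadBalancerName=load_balancer_name,
        Instances=[{"InstanceId": instance_id}],
    )
    return instance_id
```

With boto3 installed and credentials configured, the two clients would come from `boto3.client("ec2")` and `boto3.client("elb")`. A Fabric task could then wrap this call and run whatever bootstrap commands the node needs before it is registered.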

These are just a few of the nooks and crannies that people often forget when comparing AWS against running their own data centers, and I think they are extremely important. The two biggest costs that people forget, in my opinion, are opportunity cost and the cost of creating automation systems.

To expound a bit on opportunity cost, I’d like to quote the ever-thoughtful Joi Ito.

“If you want to increase the pace of innovation, you need to lower the cost of failure.” — Joi Ito

I can fire up an entire DC for SimpleGeo with a 20-30 node cluster with a few commands, totally automated, run load/consumption/system tests against it, find flaws in my system, and iterate in a matter of hours at a cost of a few hundred dollars.

The simple fact is that, without leveraging the cloud, SimpleGeo wouldn’t be anywhere near as robust as it is; indeed, it might not even exist.

NoSQL vs. RDBMS: Let the flames begin!

Posted on Friday, March 26th, 2010 at 10:25 am in cloud computing, clustering, nosql

I’ve been getting solidly flamed recently, as have my former coworkers at Digg, my friends at Twitter, etc. about our adoption and promotion of various NoSQL storage systems. It seems that some DBAs are very, very upset that we internet kids are considering abandoning SQL’s ship. I’m not here to throw out a bunch of insane numbers, benchmarks, or flame back, but I did want to point out why SimpleGeo and others are jumping onto the NoSQL bandwagon.

First, and foremost, I haven’t heard of anyone saying MySQL or PostgreSQL on comparable hardware is faster than NoSQL options. The best I’ve heard is that MS SQL setups on SSD drives with lots of RAM could do 6,100 result sets a second. I guess, based on these posts, I’d like to ask a few questions to the people who honestly think RDBMSs can compete with NoSQL solutions at large scale.

  • Do you honestly think that the PhDs at Google, Amazon, Twitter, Digg, and Facebook created Cassandra, BigTable, Dynamo, etc. when they could have just used an RDBMS instead?
  • Has anyone run RDBMS benchmarks with highly heterogeneous datasets with lots of varying indexes on them? At Digg we had probably a hundred or so tables, and each table had varying indexes (a char here, an integer there, a date+time here). Disk IO becomes a serious problem when indexes for different tables are stored on different parts of disks and you have concurrent reads/writes. I know that people have found ways around this, such as the 37signals systems guy putting 15 x 15k RPM drives on his DB server. Assuming $500 a disk (15k disks range from $300 to $800 on Newegg) that’s $7,500 just for disks.
  • Anyone out there running an EC2 large instance with an RDBMS on it that’s doing 1,800 reads/second? I’ve got a Cassandra node that was getting hammered with a load of 6 serving that much traffic without falling over, which I think is pretty decent when you consider each node could easily do that and adding more nodes to handle more load is trivial.
  • How much are you spending on those MS SQL servers with SSD drives that serve up 6,100 results a second? MS SQL is $5,999 per processor. Windows Server 2008 is another $1,029. Decent 128GB SSDs appear to cost around $450 each. You see where I’m going with this. Nobody is arguing you can’t get RDBMSs to scale up to a few thousand reads/writes a second if you can afford to spend $50,000 or $100,000 per server. The problem is that very few startups can spend that much money on a single server.
  • How much time are your DBAs spending administering your RDBMSs? How much time are they in the data centers? How much do those data centers cost? How much do DBAs cost a year? Let’s say you have 10 monster DB servers and 1 DBA; you’re looking at about $500,000 in database costs.
  • How easy is it to add a new server to your cluster? If we identify a hot spot in our Cassandra cluster, we can have a new node bootstrapped into our cluster in about five minutes. And I mean it’s in production taking writes and serving reads.
  • Does your RDBMS automatically rebalance the entire cluster when a new node is bootstrapped into it?
  • I’m running a 50 node cluster, which spans three data centers, on Amazon’s EC2 service for about $10,000 a month. Furthermore, this is an operational expense as opposed to a capital expense, which is a bit nicer on the books. In order to scale an RDBMS to 6,000 reads/second I’d need to spend on the order of five months of operation of my 50 node cluster.
  • Has anyone run benchmarks with MySQL or PostgreSQL in an environment that sees 35,000 requests a second? IO contention becomes a huge issue when your stack needs to serve that many requests simultaneously. I know of one company that’s managing to scale portions of their PostgreSQL servers by purchasing $250,000 servers. This would cover my 50 node EC2 cluster for two years.
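
The dollar comparisons in the list above are simple arithmetic, and it can help to see them side by side. The sketch below uses only the figures quoted in the bullets; the variable and function names are just illustrative.

```python
# Figures quoted in the list above (all in US dollars).
CLUSTER_NODES = 50
CLUSTER_MONTHLY_COST = 10_000
BIG_POSTGRES_SERVER = 250_000
DISK_COUNT = 15
DISK_PRICE = 500


def months_covered(budget, monthly_cost):
    """How many months of cluster operation a one-off budget would pay for."""
    return budget / monthly_cost


cost_per_node_month = CLUSTER_MONTHLY_COST / CLUSTER_NODES
disk_array_cost = DISK_COUNT * DISK_PRICE
five_months_of_cluster = 5 * CLUSTER_MONTHLY_COST
months_for_big_server = months_covered(BIG_POSTGRES_SERVER, CLUSTER_MONTHLY_COST)

print(f"EC2 cost per node per month: ${cost_per_node_month:,.0f}")
print(f"15 x 15k RPM disks:          ${disk_array_cost:,}")
print(f"Five months of the cluster:  ${five_months_of_cluster:,}")
print(f"$250,000 server buys {months_for_big_server:.0f} months of the cluster")
```

The last line works out to 25 months, which is where the "two years" figure comes from, and five months of the cluster is the $50,000 low end of the per-server range mentioned earlier.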

I guess what I’m saying is that my decision to use NoSQL, and I’m guessing others’ decisions to do so, has less to do with the fact that we can’t squeeze a few thousand writes a second out of MySQL and more to do with management and cost overhead. NoSQL solutions allow us to serve absurd amounts of data for a really, really low price. I’m happy to put my $/write, $/read, and $/GB numbers for my NoSQL setup against anyone’s RDBMS numbers.

We’re not nearly as dumb as everyone thinks we are; I promise.

Disk IO and throughput benchmarks on Amazon’s EC2

Posted on Wednesday, December 9th, 2009 at 11:49 am in benchmarks, cloud computing, ec2

When I told people that we were going to run our infrastructure on Amazon’s EC2 most people recoiled in disgust. I heard lots and lots of horror stories about how you simply couldn’t run production environments on EC2. Disk IO was horrible, throughput was bad, etc. Someone should seriously tell Amazon, since a lot of their own infrastructure runs in EC2 and AWS. For the most part, the AWS services were internal tools that Amazon released publicly.

We’ve been ironing out kinks in our production environment for the last few weeks and one of the things that worried me was whether these assertions were true. So, I set out to run a fairly comprehensive test of disk IO and throughput. I ran hdparm -t, bonnie++, and iozone against ephemeral drives in various configurations along with EBS volumes in various configurations.

For all of my tests I tested the regular ephemeral drives as they were installed (“Normal”), LVM in a JBOD setup (“LVM”), the two ephemeral drives in a software RAID0 setup (“RAID0”), a single 100GB EBS volume (“EBS”), and two 100GB EBS volumes in a software RAID0 setup (“EBS RAID0”). All of the tests were run on large instances (m1.large). All tests were run using the XFS file system.
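
For anyone who wants to try something similar, here is a minimal sketch of how the “RAID0” configuration and the three benchmarks could be scripted. It only prints the commands (a dry run). The device names, mount point, and benchmark flags are assumptions for illustration, not the exact invocations used for the numbers in this post.

```python
import shlex

# Assumed names; adjust to whatever your instance actually exposes.
EPHEMERAL_DEVICES = ["/dev/sdb", "/dev/sdc"]
MD_DEVICE = "/dev/md0"
MOUNT_POINT = "/mnt/raid0"


def raid0_setup_commands(devices, md_device, mount_point):
    """Commands to stripe the devices with mdadm, format as XFS, and mount."""
    return [
        ["mdadm", "--create", md_device, "--level=0",
         f"--raid-devices={len(devices)}", *devices],
        ["mkfs.xfs", md_device],
        ["mkdir", "-p", mount_point],
        ["mount", md_device, mount_point],
    ]


def benchmark_commands(block_device, mount_point):
    """One plausible invocation of each of the three tools."""
    return [
        # Timed buffered sequential reads from the block device.
        ["hdparm", "-t", block_device],
        # bonnie++ needs an explicit user (-u) when started as root.
        ["bonnie++", "-d", mount_point, "-u", "root"],
        # iozone in automatic mode, using a scratch file on the volume.
        ["iozone", "-a", "-f", f"{mount_point}/iozone.tmp"],
    ]


commands = (raid0_setup_commands(EPHEMERAL_DEVICES, MD_DEVICE, MOUNT_POINT)
            + benchmark_commands(MD_DEVICE, MOUNT_POINT))
for command in commands:
    print(shlex.join(command))
```

The same `benchmark_commands` helper could be pointed at a single ephemeral drive or an EBS volume to cover the other configurations.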

hdparm -t

While hdparm -t isn’t the most comprehensive test in the world, I think it’s a decent gut check for simple throughput. To give a little context, I remember Digg’s production servers, depending on setup, ranging from 180MB/sec. to 340MB/sec. I’m guessing if you upgraded to the XL instance type and did a RAID0 across the four drives it has you’d see even better numbers with the RAID0 ephemeral drives.

What I also found pretty interesting about these numbers is that the EBS volumes stacked up “okay” against the ephemeral drives and that the EBS volumes in a RAID0 didn’t gain us a ton of throughput. Considering that the EBS volumes run over the network, which I assume is gigabit ethernet, 94MB/sec. is pretty much saturating that network connection and, to say the least, impressive given the circumstances.
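
To put the 94MB/sec. figure in context, here is the back-of-the-envelope math. It assumes a gigabit link as guessed above and decimal megabytes, and it ignores TCP/IP and storage protocol overhead, so the usable ceiling is somewhat lower than the raw line rate computed here.

```python
LINK_BITS_PER_SECOND = 1_000_000_000   # gigabit ethernet (assumed)
BITS_PER_BYTE = 8
BYTES_PER_MB = 1_000_000               # decimal megabytes (assumed)

OBSERVED_MB_PER_SECOND = 94            # EBS throughput quoted above

# Raw line rate of the link expressed in MB/sec.
line_rate_mb = LINK_BITS_PER_SECOND / BITS_PER_BYTE / BYTES_PER_MB

# Fraction of the raw line rate that the EBS volume was using.
utilization = OBSERVED_MB_PER_SECOND / line_rate_mb

print(f"Raw line rate: {line_rate_mb:.0f} MB/sec.")
print(f"EBS used {utilization:.0%} of it")
```

That is roughly three quarters of the raw line rate before any overhead is subtracted, on a network connection the instance also uses for everything else.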

For most applications, I’d guess that EBS throughput is just fine. The raw throughput only becomes a serious requirement when you’re moving around lots of large files. Most applications move around lots of small files, which I’d likely use S3 for anyways. If your application needs to move around lots of large files I’d consider using a RAID0 ephemeral drive setup with redundancy (e.g. MogileFS spreading files across many nodes in various data centers).

bonnie++

Disregarding the Input/Output Block performance of the ephemeral RAID0 setup here, it’s extremely interesting to note that EBS IO performance is better than that of the ephemeral drives and that EBS in a RAID0 was better than the ephemeral drives in almost every metric.

That all being said, RAID0 ephemeral drives are the clear winner here. I do wonder, however, if you could set up a RAID0 EBS array that had, say, four or six or eight volumes that’d be faster than the ephemeral RAID0 setup.

If your application is IO bound then I’d probably recommend using EBS volumes if you can afford it. Otherwise, I’d use RAID0. Again, the trick with the ephemeral drives is to ensure your data is replicated across multiple nodes. Of course, this is cloud computing we’re talking about, so you should be doing that anyways.

Here are the CPU numbers of the various configurations. One thing to note here is that EBS, LVM, and software RAID all come with CPU costs. Somewhat interesting to note is that EBS has substantially less CPU usage in all areas except Input/Output Per Char.

If your application is both CPU and IO bound then I’d probably recommend upgrading your instance to an XL.

The last bonnie++ results are the random seeks per second and, wow, was I surprised. A single EBS runs pretty much dead even with the LVM JBOD and the EBS RAID0 is on par with the RAID0 ephemeral drives.

To say I was surprised by these numbers would be an understatement. The lesson here is that, if your application does lots of random seeking, you’ll want to use either EBS RAID0 volumes or RAID0 ephemeral drives.

iozone

Before running these tests I’d never even heard of this application, but it seemed to be used by quite a few folks so I thought I’d give it a shot.

Again, some interesting numbers from EBS volumes. What I found pretty interesting here is that the EBS RAID0 setups actually ended up being slower in a few metrics than a single EBS volume. No idea why that may be.

The other thing to note is that the single EBS volume outperformed the ephemeral RAID0 setup in a few different metrics, most notably random writes.

Conclusions

I think the overall conclusion here is that disk IO and throughput on EC2 are pretty darn good. I have a few other conclusions as well.

  • If you can replicate your data across multiple nodes then the RAID0 ephemeral drives are the clear winners.
  • If you are just looking to store lots of small files and serve them up, then definitely use S3 with CloudFront.
  • You’d very likely get even more impressive numbers using the XL instances with RAID0 striped across four ephemeral drives.
  • Another potential disk setup would be to put different datasets on different ephemeral drives. For instance, put one MySQL database on one ephemeral drive and your other one on another.
  • If your setups are IO bound and you’re looking for lots of redundancy, then EBS volumes are likely the way to go. If you don’t need the super redundancy on a single box then use RAID0 on ephemeral drives.
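
The recommendations above boil down to a small decision rule, sketched here. The function name, its parameters, and the order in which the rules are checked are my own simplification for illustration; real workloads will have more nuance than three yes/no questions.

```python
def recommend_storage(serves_small_files, replicated_across_nodes,
                      needs_single_box_redundancy):
    """Map the conclusions above onto a storage choice for an EC2 node."""
    if serves_small_files:
        # Lots of small files to store and serve.
        return "S3 with CloudFront"
    if replicated_across_nodes:
        # Data already lives on multiple nodes, so losing one is survivable.
        return "RAID0 across ephemeral drives"
    if needs_single_box_redundancy:
        # IO bound and the data has to survive on this one box.
        return "EBS volumes"
    return "RAID0 across ephemeral drives"


print(recommend_storage(False, True, False))
```

Checking replication before single-box redundancy is an assumption on my part; it reflects the ephemeral RAID0 setup being the clear winner whenever replication is available.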
