It's not the language stupid

I’ve said it once, I’ve said it twice, I’ve screamed it from the top of mountains and yet nobody listens. I’m sitting in a session at the MySQL Conference and the person presenting just said, “You have to have well written code to avoid bottlenecks.” This is, put bluntly, stupid and patently false. Let me explain.

  • Your true bottlenecks when scaling are very rarely, if ever, because of your language. Sure Ruby is slower than PHP or Perl or Python, but only incrementally so and it’s only going to get faster. Even if your language is your problem it’s the easiest part of your architecture to scale; add more hardware.
  • Just because your code is well written doesn’t mean it will perform well and, conversely, just because you write shitty code doesn’t mean your code will not perform well. I’ve seen some seriously shitty PHP code that’s blazing fast because it’s so simple.
  • Depending on your application, as you grow you’ll find that your scaling issues come down to one fundamental problem: I/O. DB I/O, file system / disk I/O, network traffic, etc, etc. Ask anyone who’s written a large scale application where their growing pains were and I’ll bet my last dollar it wasn’t “PHP/Python/Ruby/Perl/Java/COBOL is slow”. I’m betting they’ll say something along the lines of “MySQL took a crap on us after we hit 200,000,000 records and had to do date range scans.” Or they’ll say, “I was storing user generated content and NFS couldn’t scale to the amount of requests for that content.”

I’m sick and tired of the language zealots who say PHP is slower than Perl or Ruby is slower than PHP or Java  sucks because which language you’re using has zero to do with that missing index on your table or the fact that you can’t store all of that user generated content.

It comes down to your architecture and, despite what the zealots would have you believe, the language you choose is only one component of your overall architecture. Choose what you know and run with it.

Wikipedia: cheap & exmplosive scaling with LAMP

  1. They’re now doing 3,500 page views per second and serving that off of about 100 servers.
  2. Wikipedia is LAMP (Linux, Apache, MySQL and PHP).
  3. They use 90% of memory in InnoDB buffer pool.
  4. This is laughable, their operational guideline is “general availability” as opposed to five nines. Their main operational goals are maximum efficiency and always be able to scale as people are always coming in. These are a direct result of them being a donation powered open source community site.
  5. They use aggregated tables for reports, which doubled overall performance. They also killed some metadata included on every page because it was too expensive to generate.

Overall, not very interesting. They don’t really tell how they’re scaling to that many requests with only 100 servers. I’ve found most of the short sessions missing any kind of meat as to exactly how these guys are scaling MySQL. The feeling I get is that most of them are doing exactly what I’ve been doing, which makes me rethink MySQL’s place in the “high performance” column of databases.