It's not the language, stupid

I’ve said it once, I’ve said it twice, I’ve screamed it from the top of mountains and yet nobody listens. I’m sitting in a session at the MySQL Conference and the person presenting just said, “You have to have well written code to avoid bottlenecks.” This is, put bluntly, stupid and patently false. Let me explain.

  • Your true bottlenecks when scaling are very rarely, if ever, because of your language. Sure, Ruby is slower than PHP or Perl or Python, but only incrementally so and it’s only going to get faster. Even if your language is your problem, it’s the easiest part of your architecture to scale: add more hardware.
  • Just because your code is well written doesn’t mean it will perform well and, conversely, just because you write shitty code doesn’t mean your code will not perform well. I’ve seen some seriously shitty PHP code that’s blazing fast because it’s so simple.
  • Depending on your application, as you grow you’ll find that your scaling issues come down to one fundamental problem: I/O. DB I/O, file system / disk I/O, network traffic, etc. Ask anyone who’s written a large-scale application where their growing pains were and I’ll bet my last dollar it wasn’t “PHP/Python/Ruby/Perl/Java/COBOL is slow”. I’m betting they’ll say something along the lines of “MySQL took a crap on us after we hit 200,000,000 records and had to do date range scans.” Or they’ll say, “I was storing user generated content and NFS couldn’t scale to the number of requests for that content.”

I’m sick and tired of the language zealots who say PHP is slower than Perl, or Ruby is slower than PHP, or Java sucks. Which language you’re using has zero to do with that missing index on your table or the fact that you can’t store all of that user generated content.
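
To make the missing index point concrete, here’s a minimal sketch. It’s Python with the bundled sqlite3 module rather than PHP and MySQL, purely so it runs with nothing installed, and the stories table, its columns and the index name are all made up for illustration. It asks the database how it plans to run a date range query before and after an index exists.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO stories (title, created_at) VALUES (?, ?)",
    [("story %d" % i, "2008-04-%02d" % (i % 28 + 1)) for i in range(1000)],
)

# A date range scan, the kind of query that hurts on a big table.
QUERY = (
    "SELECT COUNT(*) FROM stories "
    "WHERE created_at BETWEEN '2008-04-10' AND '2008-04-12'"
)


def query_plan(connection, sql):
    """Return the human-readable plan steps SQLite reports for a statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the description of the step.
    return [row[-1] for row in rows]


plan_without_index = query_plan(conn, QUERY)

# The "fix" lives in the schema, not in the application language.
conn.execute("CREATE INDEX idx_stories_created_at ON stories (created_at)")
plan_with_index = query_plan(conn, QUERY)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The first plan should describe a scan of the whole table and the second a lookup through idx_stories_created_at; the exact wording varies between SQLite versions. Rewriting the calling code in another language wouldn’t change which of the two plans you get.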

It comes down to your architecture and, despite what the zealots would have you believe, the language you choose is only one component of your overall architecture. Choose what you know and run with it.

Testing PHP/MySQL Applications with PHPUnit/DbUnit

  • You have until August 8, 2008 to move away from PHP 4.
  • We already use PHPUnit and PHPT at Digg, but this DBUnit thingy has me interested.
  • What should you test?
    • Backend: business logic, data access layers, reusable component/libraries, etc.
    • Frontend: form processing, templates, rich interfaces (AJAX/JSON), feed, web services, etc.
  • Use browser based testing (Selenium / Watir) for interfaces in your acceptance tests.
  • Use unit tests (PHPUnit or PHPT) for parseable results / libraries.
  • Developer tests are tests that developers make to ensure the code works as designed. Acceptance tests are tests that someone makes to ensure it works as the client expects it to.
  • Requirements for proper testing include a reusable test environment, automatic execution of test code, easy to learn/use, etc.
  • Unit tests: PHPUnit, SimpleTest
  • System tests: Selenium (PHPUnit + Selenium RC), Watir
  • Non-Functional tests: ab, httperf, JMeter, Grinder, OpenSTA, etc.
  • Security: Chorizo.
  • PHPUnit_Extensions_Database_TestCase is a port of the DBUnit extension for JUnit to PHPUnit.
    • Used for DB driven projects
    • Puts your DB into a known state between test runs.
    • Avoids problems with one test corrupting the database for other tests.
    • Has the ability to export and import your database to and from XML datasets.
    • You can use tableEquals() to compare a table created / modified by a test to either an array of expected records, an XML file or another control table. Hot.
    • It outputs diffs between the two tables for inspection. Hot.
    • Avoid testing against MySQL (use SQLite) to avoid using up server resources, inter-process communication, etc. Make sure your SQL is compatible with SQLite. Tests are much faster when run against SQLite.
    • Tests that only test one thing are more informative than tests where failure can come from many sources.
    • Check out the book xUnit Test Patterns.
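
The known-state idea is easier to see in code. What follows is not PHPUnit_Extensions_Database_TestCase or its API; it’s a rough sketch of the same pattern using Python’s unittest and sqlite3 so that it runs on its own. The users table, the seed rows and rename_user() are hypothetical stand-ins, and a plain list of rows plays the role of the XML dataset.

```python
import sqlite3
import unittest

# Hypothetical schema and seed rows; stand-ins for an XML dataset.
SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
SEED_ROWS = [(1, "alice"), (2, "bob")]


def rename_user(conn, user_id, new_name):
    """Code under test (hypothetical): rename a single user."""
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))


def table_rows(conn, table):
    """Return every row of a table, ordered by id, for comparison."""
    return conn.execute("SELECT * FROM %s ORDER BY id" % table).fetchall()


class UserTableTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory SQLite database per test puts the data into
        # a known state, so one test can't corrupt it for another.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(SCHEMA)
        self.conn.executemany("INSERT INTO users VALUES (?, ?)", SEED_ROWS)

    def tearDown(self):
        self.conn.close()

    def test_rename_changes_only_that_user(self):
        rename_user(self.conn, 2, "robert")
        # Compare the whole table against the expected records.
        expected = [(1, "alice"), (2, "robert")]
        self.assertEqual(table_rows(self.conn, "users"), expected)

    def test_seed_data_is_untouched_by_other_tests(self):
        self.assertEqual(table_rows(self.conn, "users"), SEED_ROWS)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserTableTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Each test checks one thing against the full expected table, so a failure points at either the rename logic or the seeding and not some mix of the two.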

Speeding up InnoDB table imports

We switched to InnoDB tables a while ago. They offer transactions, foreign key constraints and a few other goodies that are missing from MyISAM. We knew writes would be slower due to foreign key checks, etc., but we didn’t imagine that a table with about 160,000 records would take almost an hour to import when the same import took about 15 seconds using MyISAM. I did some digging and figured out the solution: add these lines to your dumps.

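-- Load everything in one transaction instead of committing per statement.
SET AUTOCOMMIT=0;
-- Skip uniqueness and foreign key checks while loading.
-- This assumes the data in the dump is already consistent.
SET UNIQUE_CHECKS=0;
SET FOREIGN_KEY_CHECKS=0;

-- ... Your dump here ...

COMMIT;
-- Put the session back to its normal settings.
SET AUTOCOMMIT=1;
SET UNIQUE_CHECKS=1;
SET FOREIGN_KEY_CHECKS=1;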

This sped up InnoDB imports to pretty much the same speed as MyISAM imports for the table in question.
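
If you’d rather not edit dumps by hand, a small script can add the lines for you. This is only a sketch: wrap_dump() is a hypothetical helper name, it treats the dump as one string (fine for a table of this size, but a really big dump would want streaming), and it assumes the data is already consistent since the checks are off while it loads.

```python
# Settings from the post, placed before and after the dump body.
HEADER = (
    "SET AUTOCOMMIT=0;\n"
    "SET UNIQUE_CHECKS=0;\n"
    "SET FOREIGN_KEY_CHECKS=0;\n"
)
FOOTER = (
    "COMMIT;\n"
    "SET AUTOCOMMIT=1;\n"
    "SET UNIQUE_CHECKS=1;\n"
    "SET FOREIGN_KEY_CHECKS=1;\n"
)


def wrap_dump(dump_sql):
    """Return the dump text wrapped in the import settings."""
    # Make sure the dump ends with a newline so COMMIT starts on its own line.
    body = dump_sql if dump_sql.endswith("\n") else dump_sql + "\n"
    return HEADER + "\n" + body + "\n" + FOOTER


# Hypothetical one-statement dump, just to show the shape of the result.
sample_dump = "INSERT INTO stories VALUES (1, 'hello');"
print(wrap_dump(sample_dump))
```

To use it on a real file, read the dump into a string, pass it through wrap_dump() and write the result to a new file before feeding that to mysql.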