Presentations from MySQL and FOWA

I’ve put up my slides for both MySQL and FOWA in a newly minted presentations directory on my site. I also have almost all three hours of audio from my FOWA workshop. You can also view and embed my slides by checking out my page on SlideShare.

I don’t have slides from SXSW because it was a panel without slides. If anyone has audio from my SXSW panel or my MySQL presentation please contact me at joe [at] joestump [dot] net.

Testing PHP/MySQL Applications with PHPUnit/DbUnit

  • You have until August 8, 2008 to move away fromPHP4.
  • We already use PHPUnit and PHPT at Digg, but this DBUnit thingy has me interested.
  • What should you test?
    • Backend: business logic, data access layers, reusable component/libraries, etc.
    • Frontend: form processing, templates, rich interfaces (AJAX/JSON), feed, web services, etc.
  • Use browser based testing (Selenium / Watir) for interfaces in your acceptance tests.
  • Use unit tests (PHPUnit or PHPT) for parseable results / libraries.
  • Developer tests are tests that developers make to ensure the code works as designed. Acceptance tess are tests that someone makes to ensure it works as the client expects it to.
  • Requirements for proper testing include a reusable test environment, automatic execution of test code, easy to learn/use, etc.
  • Unit tests: PHPUnit, SimpleTest
  • System tests: Selenium (PHPUnit + Selenium RC), Watir
  • Non-Functional tests: ab, httperf, JMeter, Grinder, OpenSTA, etc.
  • Security: Chorizo.
  • PHPUnit_Extensions_Database_TestCase is a port of the DBUnit extension for JUnit to PHPUnit.
    • Used for DB driven projects
    • Puts your DB into a known state between test runs.
    • Avoids problems with one test corrupting the database for other tests.
    • Has the ability to export and import your database to and from XML datasets.
    • You can use tableEquals() to compare a table created / modified by a test to either an array of expected records, an XML file or another control table. Hot.
    • It outputs diff’s between the two tables for inspection. Hot.
    • Avoid testing against MySQL (use SQLite) to avoid using up server resources, inter-process communication, etc. Make sure your SQL is compatible with SQLite. Tests are much faster when ran against SQLite.
    • Tests that only test one thing are more informative that tests where failure can come from many sources.
    • Check out the book xUnit Test Patterns.

Upcoming Speaking Engagements

I’ve got a few upcoming speaking engagements. If you’re going to be at either of the following events please let me know and we can meet up for drinks or whatever.

  • I’m going to be giving a workshop at FOWA in Miami titled Improving the Performance of Your Web App on Feb 28th.
  • I’ll be speaking at the MySQL Conference in April in Santa Clara, CA. My talk will be a short 45 minute tutorial on “Services Oriented Architecture with PHP and MySQL“. I’ll be covering how to make parallel and asynchronous data requests over HTTP. Code samples, best practices, etc.
  • I’ll be on a panel at SXSW titled “Considerations for Scalabale Web Ventures” with Cal Henderson of Flickr, Chris Lea of Media Temple and Matt Mullenweg of WordPress. It’ll be moderated by Digg’s own Kevin Rose. We’ll talk about caching, scaling out, best practices, etc.

Speeding up InnoDB table imports

We switched to InnoDB tables a while ago. They offer transactions, foreign key constraints and a few other goodies that are missing from MyISAM. We knew writes would be slower due to foreign key checks, etc, but we didn’t imagine that importing a table with about 160,000 records would take almost an hour to import, while it only took about 15 seconds to import using MyISAM. I did some digging and figured out the solution. Add these lines to your dumps.

SET AUTOCOMMIT=0;
SET UNIQUE_CHECKS=0;
SET FOREIGN_KEY_CHECKS=0;

... Your dump here ...

COMMIT;
SET AUTOCOMMIT=1;
SET UNIQUE_CHECKS=1;
SET FOREIGN_KEY_CHECKS=1;

This sped up InnoDB imports to pretty much the same speed as MyISAM imports for the table in question.

A warning about REPLACE INTO and InnoDB

From time to time I get lazy and use REPLACE INTO instead of checking for records and then only inserting the ones that don’t exist. It works really well for importing records where some records might be already in the database. Today, I ran into a little snag due to my laziness.

MySQL uses the following algorithm for
REPLACE (and LOAD DATA ...
REPLACE
):

  1. Try to insert the new row into the table
  2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
    1. Delete from the table the conflicting row that has the duplicate key value
    2. Try again to insert the new row into the table

When you combine this with foreign keys that are defined with ON DELETE CASDCADE you end up nuking child records when you REPLACE INTO a parent record. It only took me an hour and a half to debug this insanity.

Thoughts on MySQL UC

Ian and I ventured down to MySQL UC with one goal, which was to figure out a way to scale MySQL efficiently. Specifically, I was looking for a way to scale MySQL in a way that didn’t reduce feature sets and was transparent to my application. My second goal was to learn of ways to optimize MySQL and find out about new features of interest.

The answer to my first question simply; you can’t do that. MySQL offers two options for scaling out and high availability. One is replication, which is asynchronous and requires your application to know where to send reads, writes and “critical” reads. In other words, it’s not transparent to the application. The second option is clustering with NDB, which lacks foreign key constraints and comes with the suggestion that you avoid joins and denormalize your schemas to avoid joins. However, there is a glimmer of hope at the end of the tunnel, which is Continuent’s m/cluster solution, which was exactly what I was looking for. Unfortuantely, at $5k per socket per database node, it comes with a hefty price tag. They do have an open source version that only supports ANSI SQL though which I may look into.

One area which Ian and I have learned quite a bit while at MySQL UC is the abundance of new features in 5.0 and 5.1. Triggers, views, stored procedures and events are all finding their way into these versions. Ian and I have already discussed areas that we’ll be using triggers and stored procedures to create aggregate tables for caching purposes. We have a few areas that we’ll be using events in 5.1 as well.

Some other random thoughts …

  1. The lunches weren’t all that great. I wish I could opt out of the lunches, save some cash on that and buy my own.
  2. The weather in Seattle has been better than it has been in Santa Clara, which negates the purpose of having the conference in “sunny” California.
  3. In discussions with fellow attendees it’s become fairly evident that both Ian and myself are pushing the limits of PHP and MySQL far beyond what most people are doing. This makes it difficult to find people to turn to for help.
  4. There are a lot of people wearing SCAMP shirts. I can’t believe those dirty bastards have the guts to show up at a conference such as this.

As a side note to my non-techie friends, this is the last in a long string of wholly geek related entries. So, please, quit sending me email complaining about this.

ext/mysqli

  1. PHP5 comes with a new MySQL extension called mysqli. While the i stands for improved, interface and incompatible (some say incomplete – HA!).
  2. Supports MySQL versions starting with MySQL 4.1.
  3. The new function was basically a way to start over and clean things up to work with the new features in 4.1+.
  4. Includes SSL connections, stronger password algorithm, prepared statements prevent SQL injection, no default connection parameters. Overall, their goal was to make it safer.
  5. Can make use of new and more efficient MySQL binary protocol, prepared statements give massive performance improvements on large data sets, faster overall code, support for gzip compressed connections. Additionally, the MySQL server can be embedded into PHP (wtf?).
  6. You can use either OOP or procedural interfaces, prepared statements make certain operations easier and there’s less that can go wrong (which seems a bit ambiguous).
  7. Some redundant functions have been dropped, some new functions that support new features and persistent connections are no longer support (about damn time).
  8. The OOP interface is “marginally” slower than the procedural interface, but tiny compared to the cost of actually getting the data. In other words, use whichever you like.
  9. In PHP5, the OOP interface supports Exceptions (ie. ConnectException, etc.).
  10. Added autocommit(), commit() and rollback() functions to the OOP interface.
  11. Now supports multiple queries with the multi_query() function. This looks absurdly awkward. Not sure if I can even think of a reason to use this. This functionality was added specifically for stored procedures which can return multiple result sets.

Nevermind, Ian just noticed he’s reading the notes verbatim from the manual. That being said, this looks like an extremely interesting enhancement over the old MySQL client library. We’ll probably look into switching things over when we get back and start forming a larger MySQL strategy.

Wikipedia: cheap & exmplosive scaling with LAMP

  1. They’re now doing 3,500 page views per second and serving that off of about 100 servers.
  2. Wikipedia is LAMP (Linux, Apache, MySQL and PHP).
  3. They use 90% of memory in InnoDB buffer pool.
  4. This is laughable, their operational guideline is “general availability” as opposed to five nines. Their main operational goals are maximum efficiency and always be able to scale as people are always coming in. These are a direct result of them being a donation powered open source community site.
  5. They use aggregated tables for reports, which doubled overall performance. They also killed some metadata included on every page because it was too expensive to generate.

Overall, not very interesting. They don’t really tell how they’re scaling to that many requests with only 100 servers. I’ve found most of the short sessions missing any kind of meat as to exactly how these guys are scaling MySQL. The feeling I get is that most of them are doing exactly what I’ve been doing, which makes me rethink MySQL’s place in the “high performance” column of databases.

Speeding-up Queries: New Features of the MySQL 5.0 Query Engine (Part 1)

  1. Cost-based query optimization allows you to assign costs to operations and assign costs to partial plans, which allows them to find better search plans with lower costs.
  2. Possible because query plans are simple, they have data statistics and they have meta-data.
  3. Cost of an operation is proportional to disk accesses. A cost unit equals a random read of data page (4Kb).
  4. Variable length fields (BLOB, VARCHAR, etc.) require extra guess work and adds to the cost of that operation. I would assume this means variable length fields would be a bad idea when used in an index.
  5. MySQL does internal optimization on your queries to decide how it should do searches and joins based on the cost of each way to execute the query. Each choice is called a “plan” and each plan has a cost (see above).
  6. The problem with this approach is that the more joins you add the more plans (options) are added to the mix for optimization, which means that as you add joins the more time the “optimizer” takes to optimize your query. Evidently, this limit is somewhere around 5-7 joins in a single table.
  7. With 5.0 they’ve introduced a “Greedy search” that lets you suggest the plan, which is a trade off as the optimizer is skipped (at least this is how it sounds from his explanation).
  8. Users can influence the choice of indexes with the USE INDEX, FORCE INDEX and IGNORE INDEX sytnax. These should be used with extreme care as indexes can be added and dropped.
  9. You can also check out optimizer_search_depth whith tells the optimizer how much effort to put into looking for the best search plan. The default is automatic.
  10. Another option is optermizer_prune_level.
  11. The third way to force optimization on joins is to use the STRAIGHT_JOIN syntax to force join order.
  12. The range optimizer can detect ranges that can be merged, such as SELECT * FROM tb1 WHERE foo < 10 OR foo < 20. In doing so it finds the minimal sequence of smallest possible disjoint intervals.

MySQL Performance: 5.0 vs 4.1

  1. If your applications use features which were well optimized in 4.1 you might see performance degredation (wtf?).
  2. As of now 4.1 will only get critical bug fixes.
  3. Amazingly it appears that 5.0 is slightly slower than 4.1. Why this is I don’t know.

I’m leaving this one to go to the dispearsed storage engine tutorial as I’m bored to death with this. Nothing interesting.