I’ve put up my slides for both MySQL and FOWA in a newly minted presentations directory on my site. I also have almost all three hours of audio from my FOWA workshop. You can also view and embed my slides by checking out my page on SlideShare.
I don’t have slides from SXSW because it was a panel without slides. If anyone has audio from my SXSW panel or my MySQL presentation please contact me at joe [at] joestump [dot] net.
I’ve got a few upcoming speaking engagements. If you’re going to be at either of the following events please let me know and we can meet up for drinks or whatever.
- I’m going to be giving a workshop at FOWA in Miami titled Improving the Performance of Your Web App on Feb 28th.
- I’ll be speaking at the MySQL Conference in April in Santa Clara, CA. My talk will be a short 45 minute tutorial on “Services Oriented Architecture with PHP and MySQL“. I’ll be covering how to make parallel and asynchronous data requests over HTTP. Code samples, best practices, etc.
- I’ll be on a panel at SXSW titled “Considerations for Scalabale Web Ventures” with Cal Henderson of Flickr, Chris Lea of Media Temple and Matt Mullenweg of WordPress. It’ll be moderated by Digg’s own Kevin Rose. We’ll talk about caching, scaling out, best practices, etc.
We switched to InnoDB tables a while ago. They offer transactions, foreign key constraints and a few other goodies that are missing from MyISAM. We knew writes would be slower due to foreign key checks, etc, but we didn’t imagine that importing a table with about 160,000 records would take almost an hour to import, while it only took about 15 seconds to import using MyISAM. I did some digging and figured out the solution. Add these lines to your dumps.
SET AUTOCOMMIT=0;
SET UNIQUE_CHECKS=0;
SET FOREIGN_KEY_CHECKS=0;
... Your dump here ...
COMMIT;
SET AUTOCOMMIT=1;
SET UNIQUE_CHECKS=1;
SET FOREIGN_KEY_CHECKS=1;
This sped up InnoDB imports to pretty much the same speed as MyISAM imports for the table in question.
From time to time I get lazy and use REPLACE INTO instead of checking for records and then only inserting the ones that don’t exist. It works really well for importing records where some records might be already in the database. Today, I ran into a little snag due to my laziness.
MySQL uses the following algorithm for
REPLACE (and LOAD DATA ...
REPLACE):
- Try to insert the new row into the table
- While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
- Delete from the table the conflicting row that has the duplicate key value
- Try again to insert the new row into the table
When you combine this with foreign keys that are defined with ON DELETE CASDCADE you end up nuking child records when you REPLACE INTO a parent record. It only took me an hour and a half to debug this insanity.
Ian and I ventured down to MySQL UC with one goal, which was to figure out a way to scale MySQL efficiently. Specifically, I was looking for a way to scale MySQL in a way that didn’t reduce feature sets and was transparent to my application. My second goal was to learn of ways to optimize MySQL and find out about new features of interest.
The answer to my first question simply; you can’t do that. MySQL offers two options for scaling out and high availability. One is replication, which is asynchronous and requires your application to know where to send reads, writes and “critical” reads. In other words, it’s not transparent to the application. The second option is clustering with NDB, which lacks foreign key constraints and comes with the suggestion that you avoid joins and denormalize your schemas to avoid joins. However, there is a glimmer of hope at the end of the tunnel, which is Continuent’s m/cluster solution, which was exactly what I was looking for. Unfortuantely, at $5k per socket per database node, it comes with a hefty price tag. They do have an open source version that only supports ANSI SQL though which I may look into.
One area which Ian and I have learned quite a bit while at MySQL UC is the abundance of new features in 5.0 and 5.1. Triggers, views, stored procedures and events are all finding their way into these versions. Ian and I have already discussed areas that we’ll be using triggers and stored procedures to create aggregate tables for caching purposes. We have a few areas that we’ll be using events in 5.1 as well.
Some other random thoughts …
- The lunches weren’t all that great. I wish I could opt out of the lunches, save some cash on that and buy my own.
- The weather in Seattle has been better than it has been in Santa Clara, which negates the purpose of having the conference in “sunny” California.
- In discussions with fellow attendees it’s become fairly evident that both Ian and myself are pushing the limits of PHP and MySQL far beyond what most people are doing. This makes it difficult to find people to turn to for help.
- There are a lot of people wearing SCAMP shirts. I can’t believe those dirty bastards have the guts to show up at a conference such as this.
As a side note to my non-techie friends, this is the last in a long string of wholly geek related entries. So, please, quit sending me email complaining about this.
p>
- PHP5 comes with a new MySQL extension called mysqli. While the i stands for improved, interface and incompatible (some say incomplete – HA!).
- Supports MySQL versions starting with MySQL 4.1.
- The new function was basically a way to start over and clean things up to work with the new features in 4.1+.
- Includes SSL connections, stronger password algorithm, prepared statements prevent SQL injection, no default connection parameters. Overall, their goal was to make it safer.
- Can make use of new and more efficient MySQL binary protocol, prepared statements give massive performance improvements on large data sets, faster overall code, support for gzip compressed connections. Additionally, the MySQL server can be embedded into PHP (wtf?).
- You can use either OOP or procedural interfaces, prepared statements make certain operations easier and there’s less that can go wrong (which seems a bit ambiguous).
- Some redundant functions have been dropped, some new functions that support new features and persistent connections are no longer support (about damn time).
- The OOP interface is “marginally” slower than the procedural interface, but tiny compared to the cost of actually getting the data. In other words, use whichever you like.
- In PHP5, the OOP interface supports Exceptions (ie. ConnectException, etc.).
- Added autocommit(), commit() and rollback() functions to the OOP interface.
- Now supports multiple queries with the multi_query() function. This looks absurdly awkward.
Not sure if I can even think of a reason to use this. This functionality was added specifically for stored procedures which can return multiple result sets.
Nevermind, Ian just noticed he’s reading the notes verbatim from the manual. That being said, this looks like an extremely interesting enhancement over the old MySQL client library. We’ll probably look into switching things over when we get back and start forming a larger MySQL strategy.
- They’re now doing 3,500 page views per second and serving that off of about 100 servers.
- Wikipedia is LAMP (Linux, Apache, MySQL and PHP).
- They use 90% of memory in InnoDB buffer pool.
- This is laughable, their operational guideline is “general availability” as opposed to five nines. Their main operational goals are maximum efficiency and always be able to scale as people are always coming in. These are a direct result of them being a donation powered open source community site.
- They use aggregated tables for reports, which doubled overall performance. They also killed some metadata included on every page because it was too expensive to generate.
Overall, not very interesting. They don’t really tell how they’re scaling to that many requests with only 100 servers. I’ve found most of the short sessions missing any kind of meat as to exactly how these guys are scaling MySQL. The feeling I get is that most of them are doing exactly what I’ve been doing, which makes me rethink MySQL’s place in the “high performance” column of databases.
- If your applications use features which were well optimized in 4.1 you might see performance degredation (wtf?).
- As of now 4.1 will only get critical bug fixes.
- Amazingly it appears that 5.0 is slightly slower than 4.1. Why this is I don’t know.
I’m leaving this one to go to the dispearsed storage engine tutorial as I’m bored to death with this. Nothing interesting.