Hello? Is anybody paying attention?

I’ve been meaning to write up a few things about things that are currently going on in the news that absolutely shock me. Today I was tipped over the edge by news that Attorney General Alberto R. Gonzales says prosecuting journalists over leaks is a possibility.

  • Gonzales Says Prosecutions of Journalists Are Possible – I hope he plans on prosecuting Robert Novak first thing. The chilling effect and erosion to our first amendment rights if this comes to fruition is staggering and alarming.
  • Wired news publishes AT&T documents – The phone companies turned over your phone records to the NSA so the government could mine that data for suspicious activities without a warrant. In fact, they didn’t even bother to get a FISA warrant, which are handed out like candy. Fuck you, get a search warrant like the constitution tells you to do.
  • Some Republicans want to make crossing the border without papers a felony. Where the hell are we going to put these people? The prisons are so overcrowded that we regularly let convicted rapists and murderers out too early. This simply doesn’t make any sense.
  • Bush wants to use the National Guard to patrol the border. These are the same men and women are have spent months in Iraq away from their families. These are the same men and women that are trained to do the following: kill people, repair tanks, blow shit up, recover from national disasters. On top of it they’d be patrolling the border in 2-3 week shifts. I don’t know about you, but doesn’t it make more sense to, you know, hire real border personnel?

Does this make sense to anybody? Why aren’t people alarmed by the possibility of prosecuting journalists? Why aren’t we questioning stupid ideas like putting National Guardsmen on border patrol? Why aren’t we calling for the resignation and impeachment of a President that has repeatedly broken the law to spy on his fellow countrymen?

How about this one?

So Ian and I are walking from my place to Honey Hole to get some grub tonight when we see two guys walking in front of us. Both are pretty large rough looking banger type dudes. Both are wearing sweat pants and have their hats on backwards. One is wearing four sizes too small leopard print slippers.

I notice that both of them are checking out cars parked on the street and commenting about the cars to each other. At this point one of them walks up to an older Ford Taurus and kicks the tires and says to the other one, “This one looks like it’d be easy to take.” I think to myself, “There going to steal this fucking car!” Ian and I look at each other and start laughing because these two retards literally kicked the cars tires before contemplating stealing it in broad daylight.

Thoughts on MySQL UC

Ian and I ventured down to MySQL UC with one goal, which was to figure out a way to scale MySQL efficiently. Specifically, I was looking for a way to scale MySQL in a way that didn’t reduce feature sets and was transparent to my application. My second goal was to learn of ways to optimize MySQL and find out about new features of interest.

The answer to my first question simply; you can’t do that. MySQL offers two options for scaling out and high availability. One is replication, which is asynchronous and requires your application to know where to send reads, writes and “critical” reads. In other words, it’s not transparent to the application. The second option is clustering with NDB, which lacks foreign key constraints and comes with the suggestion that you avoid joins and denormalize your schemas to avoid joins. However, there is a glimmer of hope at the end of the tunnel, which is Continuent’s m/cluster solution, which was exactly what I was looking for. Unfortuantely, at $5k per socket per database node, it comes with a hefty price tag. They do have an open source version that only supports ANSI SQL though which I may look into.

One area which Ian and I have learned quite a bit while at MySQL UC is the abundance of new features in 5.0 and 5.1. Triggers, views, stored procedures and events are all finding their way into these versions. Ian and I have already discussed areas that we’ll be using triggers and stored procedures to create aggregate tables for caching purposes. We have a few areas that we’ll be using events in 5.1 as well.

Some other random thoughts …

  1. The lunches weren’t all that great. I wish I could opt out of the lunches, save some cash on that and buy my own.
  2. The weather in Seattle has been better than it has been in Santa Clara, which negates the purpose of having the conference in “sunny” California.
  3. In discussions with fellow attendees it’s become fairly evident that both Ian and myself are pushing the limits of PHP and MySQL far beyond what most people are doing. This makes it difficult to find people to turn to for help.
  4. There are a lot of people wearing SCAMP shirts. I can’t believe those dirty bastards have the guts to show up at a conference such as this.

As a side note to my non-techie friends, this is the last in a long string of wholly geek related entries. So, please, quit sending me email complaining about this.

ext/mysqli

  1. PHP5 comes with a new MySQL extension called mysqli. While the i stands for improved, interface and incompatible (some say incomplete – HA!).
  2. Supports MySQL versions starting with MySQL 4.1.
  3. The new function was basically a way to start over and clean things up to work with the new features in 4.1+.
  4. Includes SSL connections, stronger password algorithm, prepared statements prevent SQL injection, no default connection parameters. Overall, their goal was to make it safer.
  5. Can make use of new and more efficient MySQL binary protocol, prepared statements give massive performance improvements on large data sets, faster overall code, support for gzip compressed connections. Additionally, the MySQL server can be embedded into PHP (wtf?).
  6. You can use either OOP or procedural interfaces, prepared statements make certain operations easier and there’s less that can go wrong (which seems a bit ambiguous).
  7. Some redundant functions have been dropped, some new functions that support new features and persistent connections are no longer support (about damn time).
  8. The OOP interface is “marginally” slower than the procedural interface, but tiny compared to the cost of actually getting the data. In other words, use whichever you like.
  9. In PHP5, the OOP interface supports Exceptions (ie. ConnectException, etc.).
  10. Added autocommit(), commit() and rollback() functions to the OOP interface.
  11. Now supports multiple queries with the multi_query() function. This looks absurdly awkward. Not sure if I can even think of a reason to use this. This functionality was added specifically for stored procedures which can return multiple result sets.

Nevermind, Ian just noticed he’s reading the notes verbatim from the manual. That being said, this looks like an extremely interesting enhancement over the old MySQL client library. We’ll probably look into switching things over when we get back and start forming a larger MySQL strategy.

Wikipedia: cheap & exmplosive scaling with LAMP

  1. They’re now doing 3,500 page views per second and serving that off of about 100 servers.
  2. Wikipedia is LAMP (Linux, Apache, MySQL and PHP).
  3. They use 90% of memory in InnoDB buffer pool.
  4. This is laughable, their operational guideline is “general availability” as opposed to five nines. Their main operational goals are maximum efficiency and always be able to scale as people are always coming in. These are a direct result of them being a donation powered open source community site.
  5. They use aggregated tables for reports, which doubled overall performance. They also killed some metadata included on every page because it was too expensive to generate.

Overall, not very interesting. They don’t really tell how they’re scaling to that many requests with only 100 servers. I’ve found most of the short sessions missing any kind of meat as to exactly how these guys are scaling MySQL. The feeling I get is that most of them are doing exactly what I’ve been doing, which makes me rethink MySQL’s place in the “high performance” column of databases.

Speeding-up Queries: New Features of the MySQL 5.0 Query Engine (Part 1)

  1. Cost-based query optimization allows you to assign costs to operations and assign costs to partial plans, which allows them to find better search plans with lower costs.
  2. Possible because query plans are simple, they have data statistics and they have meta-data.
  3. Cost of an operation is proportional to disk accesses. A cost unit equals a random read of data page (4Kb).
  4. Variable length fields (BLOB, VARCHAR, etc.) require extra guess work and adds to the cost of that operation. I would assume this means variable length fields would be a bad idea when used in an index.
  5. MySQL does internal optimization on your queries to decide how it should do searches and joins based on the cost of each way to execute the query. Each choice is called a “plan” and each plan has a cost (see above).
  6. The problem with this approach is that the more joins you add the more plans (options) are added to the mix for optimization, which means that as you add joins the more time the “optimizer” takes to optimize your query. Evidently, this limit is somewhere around 5-7 joins in a single table.
  7. With 5.0 they’ve introduced a “Greedy search” that lets you suggest the plan, which is a trade off as the optimizer is skipped (at least this is how it sounds from his explanation).
  8. Users can influence the choice of indexes with the USE INDEX, FORCE INDEX and IGNORE INDEX sytnax. These should be used with extreme care as indexes can be added and dropped.
  9. You can also check out optimizer_search_depth whith tells the optimizer how much effort to put into looking for the best search plan. The default is automatic.
  10. Another option is optermizer_prune_level.
  11. The third way to force optimization on joins is to use the STRAIGHT_JOIN syntax to force join order.
  12. The range optimizer can detect ranges that can be merged, such as SELECT * FROM tb1 WHERE foo < 10 OR foo < 20. In doing so it finds the minimal sequence of smallest possible disjoint intervals.

MySQL Performance: 5.0 vs 4.1

  1. If your applications use features which were well optimized in 4.1 you might see performance degredation (wtf?).
  2. As of now 4.1 will only get critical bug fixes.
  3. Amazingly it appears that 5.0 is slightly slower than 4.1. Why this is I don’t know.

I’m leaving this one to go to the dispearsed storage engine tutorial as I’m bored to death with this. Nothing interesting.

MySQL Partitioning

MySQL’s new partitioning feature is a way to split tables up into chunks based on various criteria. For instance, you could break up a table based on a timestamp and rows older than a specific date are stored in MySQL’s archive storage engine.

  1. Partitioning will first appear in version 5.1.
  2. Partitioning is a “Divide and Conquer” solution for large tables.
  3. You can partition data based on year. For instance you can put data from before 1999 into /var/data/part1, before 2001 onto /var/data/part2, etc.
  4. You can move data from partition to partition as data becomes stale as well.
  5. Sits above the storage engine layer and works with all table types. Sits in its own layer.
  6. You can remove partitioning with ALTER TABLE t1 REMOVE PARTITIONING.
  7. Limits on partitioning include that the partition function must return an integer result, all partitions must use the same storage engine, and it increases response time with many partitions.
  8. You can partition by odd and even values. For instance, you can put even values into partitions 2, 4, 6 and 8 and odd values into partitions 1, 3, 5 and 7.
  9. You can partition by hashing, which lets MySQL decide where to put the data. There isn’t a logical distribution between partitions so it makes pruning more difficult. For instance, you never really know which partition the data is on (unlike the above example).
  10. You can create composit partitions as well, which are essentially sub-partitions.
  11. If a primary key is defined no fields outside of primary key is allowed in partition function (wtf?).
  12. You can reorganize partitions whenever you want. For instance you can roll partition 2 into partition 3 and move their locations. This, however, locks the table until the partition operations have completed.
  13. If you have both a PRIMARY KEY and a UNIQUE index in a single you cannot do partitioning on that table.
  14. If a storage engine has a table size limit of 64GB (ie. InnoDB) you can have 64GB on each partition. This essentially allows you to scale beyond your storage engine’s table size limit.
  15. Did I just hear something about foreign key constraints being put into the meta storage engine in the near future? This would allow foreign key constraints for all table types I believe.
  16. Creating, updating, altering, etc. of partitions is done atomically. Either the change to the partition happens or it’s rolled back and an error message is returned.
  17. If one of your partitions goes away all of your data goes away. Yikes!
  18. It doesn’t sound like you can partition federated tables.
  19. Paritioning does work with replication.
  20. You can use it with auto-increment primary keys.

This is a very cool feature for people who have very large data sets and are looking to break those data sets up into smaller data sets. For instance, if you have a table with 1m records in it you could split it up into two data sets of 500k records in each data set, which would reduce the number of rows MySQL needs to scan during queries.

MySQL Cluster: New Features and Enhancements

This specifically about new features in the 5.1 release. As much as I like Australia I think it’s retarded to be talking about that at a MySQL conference, though it is funny I suppose.

  1. They now support variable sized rows, which reduces memory usage. This equates to more rows per gigabyte.
  2. Add/Update Index has been optimized over 5.0. Before it copied the entire table, added the new index and then moved it back over the old table. This all happened over the wire so you can imagine how long that took. This optimization speeds things up about four times.
  3. 5.1 now allows you to replicate across clusters. Used for geographical redundancy, split the processing load (why not add more nodes to the cluster?), etc. You have to use the row-based replication to enable this feature.
  4. Failover of replication channels is manual.
  5. They’re adding support for data on disk in 5.1 and indexes on disk in the future.

This absolutely amazes me. The moral of the story at MySQL UC has been that clustering is a nightmare to set up, maintain and use. However, storing data on disk is a step in the right direction. The solution from Continuent seems to be infinitely more elegant than both clustering and replication.

MySQL Replication: New Features and Enhancements

  1. MySQL 5.0 supports auto-increment variables for bi-directional replication to avoid auto-increment collisions.
  2. MySQL 5.0 replicates character sets and time zones.
  3. 5.0 now replicates stored procedures, triggers and views.
  4. MySQL 5.1 introduces row-based logging and replication (RBR)
  5. Dyanmic switching of binary log format between ROW, STATEMENT and MIXED.
  6. 5.1 allows for cluster replication.
  7. Replication method cannot be configured per table, but since it’s dynamic you can change it from the client before a transaction, etc.
  8. With auto_increment_offset and auto_increment_increment you can change starting points and how many you step between. Works with most table types. Works with InnoDB and MyISAM. This is specifically for multi-master setups.
  9. RBR allows clusters to replicate and also allows the server to replicate non-deterministic statements such as LOAD_FILE(). That being said you can’t have different table definitions on the slave as you can with SBR.

Wow, nothing super impressive here that makes me excited about new features in MySQL’s 5.0/5.1 replication. Their answer to the possible auto_increment collisions seems a bit simplistic and short-sighted, but then again I’m not a maintainer for MySQL’s replication code.

What annoys me greatly is MySQL’s refusal to simply add features to storage engines in favor of simply adding new storage engines. MyISAM doesn’t support transactions or foreign key constraints? Use InnoDB. InnoDB doesn’t support FULLTEXT? Use MyISAM. You need synchronous replication of data? Use NDB, but you need to denormalize and reduce your use of JOIN’s.

It’s enough to make me switch to PostgreSQL.