Today is the 1st day of MySQL UC and I’m sitting in on the clustering tutorial, which promises to be three hours of information on MySQL’s clustering feature. I’m sure you’re thrilled that I’m, ahem, live blogging this. In reality, this is just so I have a central place for my notes.
- Broken up into three parts. The MySQL servers sit separate from the NDB Storage Engine, which are storage nodes (NDB nodes). The third part is called a management server. The management server, oddly enough, isn’t required once the cluster is up and running unless you want to add another storage node.
- Memory based storage engine. If you’re not using 5.1+ then you must have enough RAM in each storage node to store the data set. This means that if you have four machines with 4GB of RAM each you can store 8GB of data (16GB of total storage divided by two for two copies of the data set).
- Storage nodes are static and pre-allocate resources on startup.
- Supports transactions and row level locking.
- Should be noted this is a storage engine so you can’t create MyISAM or InnoDB tables inside of a cluster.
- Uses fixed sized records. This means if you have a varchar(255) and put a single byte into it that field is still using 255 bytes.
- No foreign key constraint support.
- Replication across nodes is syncronous across nodes. I assume this means that an INSERT happens once all of the nodes have completed the INSERT. This is different than regular replication which is asyncronous and introduces race conditions.
- Tables are divided into fragments (one fragment for each storage node). Each storage node is responsible for each fragment. Each fragment also has a secondary fragment, which is a copy of another node’s primary fragment. This data distribution happens automatically.
- NDB takes your primary key, creates a hash and then converts that to a fragment. So you’ll have various rows on each different storage node.
- A node group is a set of nodes that share the same fragment information. If you lose an entire node group you’ve lost half of the table and the cluster will not continue to operate. However, if one node in a node group fails there will still be enough data to keep the cluster up and running.
- If a node fails and it’s secondary counterpart takes over it will, essentially, have to perform the job of two nodes. Until a node has fully recovered it will not rejoin the cluster.
- Backups are hot and non-locking. Each node writes its own set of backup files. No support for incremental backups.
- Because it’s memory based you could lose data on a system crash (as you might have transactions sitting in RAM when a crash occurs). The COMMIT command does not write changes to disk meaning that you could have data sitting in memory that’s not on disk when a node crashes. This means the odd truth is that MySQL Clusters support syncronous replication, but are not atomic.
- NDB nodes will checkpoint data to disk (data + logs), which are used for system recovery. They write two logs, the UNDO and REDO logs.
- They recommend using TRUNCATE to delete all rows from a table.
- Modification operations are distributed to both the primary and secondary fragments (obviously).
- NDB will run on 64-bit machines. They recommend Dual CPU 64-bit machines. NDB is threaded. Application nodes (MySQL servers) can be whatever.
- SCI offers 30-100% better performance over gigabit.
- They actually recommend avoiding joins and to denormalize your schemas. Are you kidding me? He actually said “Performance for joins sucks.”
Overall, I’m underwhelmed by MySQL Clustering. You’re limited in storage with the RAM and you can’t optimize your schemas due to fixed field sizes. And any RDBMS “solution” that recommends you denormalize puts me off.
That being said the actual technology is pretty interesting and I suspect that in a few years we’ll see the clustering features in MySQL come into their own. As of now I suspect few people would be able to justify the sacrifices for the gains clustering allows.