Replication for Scaling and High Availability

The second session today is replication for scaling and high availability, which I’m extremely interested in hearing about as we’ve been having problems with race conditions on our current replication setup.

  1. Replication works with all table types.
  2. Any “critical” reads must be done on the master as replication is asynchronous.
  3. Master will rotate binary logs automatically for every 1G of log records.
  4. You must purge any old, unused logs yourself. See the PURGE MASTER LOGS command. Using file system commands is not recommended as there are index files that need to be updated.
  5. Master and slave must be similar hardware as slaves generally do as much work and, sometimes, more work than the master. In other words, don’t skimp on the slave.
  6. Keep a “spare” slave that you can take down and clone to bring up new slaves.
  7. Use a load balancer for managing access to slaves.
  8. There is no raw limit to the number of slaves that a master can host, but each slaves takes up one connection so you’re limited by your max connections on the master.
  9. You can create “relay” slaves that relay data from the master to slaves sitting below them. Doing this increases replication delays though.
  10. If a slave will also be a master (see above) you need to enable log-slave-updates.
  11. Relying on the master-* lines in my.cnf can be problematic over using CHANGE MASTER SQL statement. The slave rememebers master information.
  12. Datacenter failover involves the www connecting through a proxy, which then decides who the current writeable master is.
  13. Should avoid DNS when doing inter-datacenter failover. DNS caching, propagation, etc.
  14. Using proxy allows instantaneous and complete switching of traffic between masters. Another option is stunnel.
  15. A connection comes through A and goes directly to that DB server. On B it comes through and is forwarded to A instead of going to B. None of the connections connect directly to either A or B.
  16. Writes go to only one master at a time. The firewall rules ensures that.
  17. Both masters should use skip-slave-start and read-only. Entire setup should come up read-only at boot. Essentially, make sure they come up in the state they failed in.
  18. Need to handle “Server is read only” and “Connection refused” in your application when failover is occurring.
  19. Master<->Master replication is only a problem if you’re writing to both masters at the same time.
  20. They recommend switching traffic between datacenters manually as it’s difficult to determine when to actually failover and not actually automatically failing over.
  21. Steps for failing over
    1. Set current master to read-only
    2. Wait for writes to flush through. Checking how far behind the other master is up-to-date. A way to do this is by creating a table on the master every second and seeing how long it takes to show up on the slave.
    3. Remove forward rules with old settings
    4. Add fowarding rules with new settings
    5. Remove read-only from new master using SET GLOBAL read_only
  22. Make sure to STOP SLAVE on the new master when master fails. When you bring it back up check the data first. If all is well then allow replication on the failed master to catch up from the new master. If not then restore from the new master.
  23. Reverse the above steps to bring the master back up (usually done during a planned failover later). You’ll remain live against the new master until then.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.