By using MySQL replication, you can distribute MySQL queries over multiple servers to improve application performance, provide high availability (HA), and distribute data across diverse physical locations. The process involves one or more master servers, which send databases or tables asynchronously to slave servers. For all of its potential benefits, MySQL replication can cause serious trouble, especially in complex environments. Follow the advice here to get off to a healthy start.
In MySQL replication each participating server may be a master, a slave, or both. Master servers handle database transactions and write them to a binary log (binlog). Slaves connect to masters and request copies of their binary logs. Servers can act both as master and slave thanks to features such as different auto increment, which sets the interval between successive column values.
Theoretically, it is possible to have more than two masters. In such cases MySQL replicates changes in a circle. However, it is very hard to maintain and troubleshoot such circular replication. When the replication breaks (and such interruptions are unavoidable in the long term) it's hard to ensure the integrity of the information and resume fast operation with all the data. MySQL's developers recommend using no more than two masters. The number of MySQL slaves is not a concern, because slaves cannot make global changes and cannot cause deployment-wide problems.
A Replication Alternative
Distributed Replicated Block Device (DRBD) replication is a non-MySQL-specific replication alternative. It copies the underlying block device that stores the MySQL data files. Block device-level replication is officially supported in the Linux kernel, so it can be used for replicating files for different applications. In DRBD replication, there is always only one master, though this master can be dynamically changed through another service, such as Heartbeat, in case of failure. DRBD replication is suitable for high-availability solutions.
Replicated MySQL data represents a statement that is successfully executed on the master and recorded to its binlog, and is a statement that makes a change, such as insert, update, or delete. Select statements are not logged in the binlog nor replicated unless they invoke stored functions that make changes.
MySQL replication works in two modes. Commonly, statement-based updates are sent to slaves in the form of the original queries. Since version 5.1 MySQL also supports row-based replication, in which the master(s) informs the slave(s) which rows were affected and how they have been changed. This mode is considered more reliable because alerts appear even if a single row fails to be updated.
The simplest MySQL replication involves a single master/slave relationship between two servers. In this case the master MySQL configuration file (/etc/my.cnf by default) should include the following values:
[mysqld]log_bin = master-binlogserver-id = 1
These directives instruct MySQL to write its data-changing queries to the binlog file /var/log/mysq/master-binlog.XXX under a default Linux configuration, where XXX is a sequential number for previous binlog files. This file's events will be sent to the slave server. server-id distinguishes the server from which the queries come.
You must also create on the master server a MySQL user with privileges to get a dump of the binary log from the master:
master> CREATE USER replication_user;master> GRANT REPLICATION SLAVE ON *.* TO replication_user IDENTIFIED BY 'secret_password';
Next it's time to configure the replication on the slave server from its MySQL console:
slave> CHANGE MASTER TO MASTER_HOST='ipaddr_of_slave_server', MASTER_USER='replication_user',MASTER_PASSWORD='secret_password';
You can then start slave replication by executing start slave. To ensure it is running correctly, execute show slave status on the slave server and ensure there are no errors in Last_Error column. If you see any errors you must resolve them before you can get replication working.
show slave status
This oversimplified example shows how easy it is to get basic MySQL replication started. Complex problems are not likely to appear in such a simple master/slave relation, but real-life scenarios may cause significant headaches, as we will see in a moment.
Some of the most common problems MySQL administrators face are slowness and locked tables. Complex searches and updates can decrease performance and even cause downtime when an application's wait timeout exceeds the time for which a table is still locked from a previous query.
Of course, if you have a capital budget, you can scale-up your hardware with more RAM and faster CPUs, but the problems are not likely to disappear. That's why another strategy is usually preferred: MySQL scale-out through replication.
With this strategy, complex and resource-heavy select statements can be outsourced to slave servers, where they will not cause overhead to the main MySQL servers. Thanks to replication, the data on the slave servers mirrors that of the master(s). This technique is especially useful for generating regular reports or performing searches on big data.
Meanwhile, heavy update and replace statements can be performed on other master servers – sometimes. It can help only when row-based replication is used and the number of updates rows is little; otherwise, it will send a large amount of data, which will cause overhead too. Thus this technique can be a life-saver when complex statements compare values from millions of rows in order to update just a few. For simple update statements on numerous rows, row-based replication will cause more overhead, because it has to send all the rows to each slave instead of just a single command.
A general rule is that MySQL replication always adds a little bit of overhead to the master server, and it's definitely not a universal scale-out solution. The best solution is first to try optimizing the application's code for faster MySQL queries, but this is not always feasible. In order not to increase queries' execution times, replication implementation has to be carefully designed.
MySQL replication plays an essential role in avoiding unplanned downtime and ensuring data integrity during and after failover and failback. Recovery and failback requires a reliable, two-way MySQL replication strategy, usually in the form of master/master replication.
In master/master replication, each of two servers is both a master and a slave to the other. To ensure that values in auto increment columns don't overlap from different servers, master/master replication uses a different offset margin on each server.
Replication with two masters is a relatively safe and proven technique with one unique advantage that comes when you have to execute resource-intensive commands to change data, as happens for example when you alter the structure of large tables or update a large number of rows. These kinds of changes are usually associated with application updates. In such cases, under normal circumstances, tables will be locked and unavailable to applications, resulting in downtime. However, with a dual-master replication scheme, you can create a downtime-free scenario:
SET SESSION SQL_LOG_BIN=0
SET SESSION SQL_LOG_BIN=1
Many third-party solutions can enhance MySQL replication for HA and ease the burden of managing MySQL replication during failover and failback. Some are MySQL proxies such as MySQL proxy and GreenSQL; others are complete solutions, such as Tungsten replicator.
When working with MySQL replication, try to follow the KISS principle: Keep it simple, stupid! MySQL's powerful and extensive replication capabilities can be dangerous to your data's integrity. Even in the best case, with the simplest slave, master replication problems may render your reports inaccurate. It's relatively easy to fix such problems (as we will see shortly) or even start a new slave. But imagine a more complex case with load balancing and multiple masters. When the replication between the masters breaks, as it inevitably will, and load balancing continues working, corresponding (or even the same) records on different servers will have different values, even though you have configured the offset options correctly.
For example, imagine a customer makes a purchase at an online store. A record for the purchase is inserted correctly for the customer on one of the servers, but because of replication problems it is not replicated to the other server(s). When the client refreshes she may not see the purchase if her query is executed on a server with broken slave replication. The customer may decide to try purchasing the same item again, causing another record to be created. It's uncertain on which server this record will be created, but in any case it will be a problem later. Now imagine a busy store that handles a large number of purchases in the interval the business needs to fix the replication. Such a bad scenario can lead to refunds and chargebacks, and ruin a store's reputation.
When replication breaks you can see the exact error by running show slave status on the problematic server's MySQL console. The error will be in the column Last_Error in the result.
One common error says, "Could not parse relay log event entry." This error indicates that the relay log file at the slave server is corrupted. This is usually caused by filesystem problems or lack of free space. To fix it, execute on the slave server show slave status and write down the values from the columns Exec_Master_Log_Pos as X and Master_Log_File as Y. Go to the master server and use mysqlbinlog to dump the statements from Master_Log_File into a text file. Open the newly created text file and search for the previously found value X. Write down the next position (pos value) as Z, the value to use as the place to restart the replication from. Finally, on the slave server run change master to master_log_file='Y',master_log_pos=Z, substituting Y and Z with your values.
change master to master_log_file='Y',master_log_pos=Z
You may also see errors when a command cannot be executed on the slave server. The nature of such errors may vary greatly, from breaking constraints (foreign keys) to schema incompatibilities (different tables' structures). First you must address the reason for the error in order to preserve data consistency and integrity between the servers. You may decide that the command can be ignored; even though this is highly unrecommended it can be useful for temporary or non-critical data. In such cases there is a dirty trick to skip the failed statement: stop replication, execute on the slave server SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1, and start replication again. You can also ignore errors with same type number permanently by adding slave-skip-errors=number_of_error to your MySQL configuration file. This is rarely a good idea, but MySQL provides such an option. You can use it if you want to restore replication fast and plan to fix the errors later.
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1
MySQL replication is an extremely useful tool for implementing high availability and improving scalability. It is a complex and powerful tool, however, and has to be managed carefully.
Allowed tags: <a> link, <b> bold, <i> italics