Three threads are involved in replication: one on the master and two on the slave. When START SLAVE is issued, the I/O thread is created on the slave. It connects to the master and asks it to send the queries recorded in its binlogs. The master then creates one thread to send the binlog contents; this thread is identified by Binlog Dump in SHOW PROCESSLIST output on the master. The I/O thread reads what the master's Binlog Dump thread sends and simply copies it to local files in the slave's data directory called relay logs. The last thread, the SQL thread, is also created on the slave; it reads the relay logs and executes the queries they contain.
Note that the master has one thread for each currently connected slave server.
You can use SHOW PROCESSLIST to see what is happening on the master and on the slave with regard to replication.
The following example illustrates how the three threads show up in SHOW PROCESSLIST. The output format is that used by SHOW PROCESSLIST as of MySQL version 4.0.15, when the content of the State column was changed to be more meaningful compared to earlier versions.
On the master server, the output looks like this:
mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 2
   User: root
   Host: localhost:32931
     db: NULL
Command: Binlog Dump
   Time: 94
  State: Has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
On the slave server, the output looks like this:
mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 10
   User: system user
   Host:
     db: NULL
Command: Connect
   Time: 11
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 11
   User: system user
   Host:
     db: NULL
Command: Connect
   Time: 11
  State: Has read all relay log; waiting for the slave I/O thread to update it
   Info: NULL
Here thread 2 is on the master. Thread 10 is the I/O thread on the slave. Thread 11 is the SQL thread on the slave; note that the value in its Time column can indicate how far the slave lags behind the master (see Replication FAQ).
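As a rough illustration of reading this output programmatically, the following Python sketch splits vertical-format SHOW PROCESSLIST text into one dictionary per row and picks out the replication threads. The function names are hypothetical, and the selection rule (Command is Binlog Dump on the master, User is system user on the slave) is only a heuristic inferred from the sample output above.

```python
import re

# Matches separator lines such as "******** 1. row ********".
ROW_SEPARATOR = re.compile(r"^\*+ \d+\. row \*+$")


def parse_vertical_output(text):
    """Split vertical-format output into one dict per row (all values are strings)."""
    rows = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if ROW_SEPARATOR.match(line):
            current = {}
            rows.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so "Host: localhost:32931" stays intact.
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    return rows


def replication_threads(rows):
    """Heuristic (an assumption based on the sample output): keep replication threads."""
    return [
        row for row in rows
        if row.get("Command") == "Binlog Dump" or row.get("User") == "system user"
    ]


SLAVE_OUTPUT = """\
*************************** 1. row ***************************
     Id: 10
   User: system user
   Host:
     db: NULL
Command: Connect
   Time: 11
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 11
   User: system user
   Host:
     db: NULL
Command: Connect
   Time: 11
  State: Has read all relay log; waiting for the slave I/O thread to update it
   Info: NULL
"""

slave_rows = replication_threads(parse_vertical_output(SLAVE_OUTPUT))
for row in slave_rows:
    print(row["Id"], row["State"])
```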
The following list shows the most common states you will see in the State column for the master's Binlog Dump thread. If you don't see any Binlog Dump threads on a master server, replication is not running; that is, no slaves are currently connected.
Sending binlog event to slave | Binlogs consist of events, where an event is usually a query plus some other information. The thread has read an event from the binlog and is sending it to the slave. |
Finished reading one binlog; switching to next binlog | The thread has finished reading a binlog file and is opening the next one to send to the slave. |
Has sent all binlog to slave; waiting for binlog to be updated | The thread has read all binary log files and is idle. It is waiting for new events to appear in the binary log as a result of new update queries being executed on the master. |
Waiting to finalize termination | A very brief state that happens as the thread is stopping. |
Here are the most common states you will see in the State column for the I/O thread of a slave server. Beginning with MySQL 4.1.1, this state also appears in the Slave_IO_State column of SHOW SLAVE STATUS output. This means that you can get a good view of what is happening by using only SHOW SLAVE STATUS.
Connecting to master | The thread is attempting to connect to the master. |
Checking master version | A very brief state that happens just after the connection to the master is established. |
Registering slave on master | A very brief state that happens just after the connection to the master is established. |
Requesting binlog dump | A very brief state that happens just after the connection to the master is established. The thread sends to the master a request for the contents of its binlogs, starting from the requested binlog filename and position. |
Waiting to reconnect after a failed binlog dump request | If the binlog dump request failed (due to disconnection), the thread goes into this state while it sleeps. The thread sleeps for master-connect-retry seconds before retrying. |
Reconnecting after a failed binlog dump request | Then the thread tries to reconnect to the master. |
Waiting for master to send event | The thread has connected and is waiting for binlog events to arrive. This can last for a long time if the master is idle. If the wait lasts for slave_net_timeout seconds, a timeout will occur. At that point, the thread will consider the connection to be broken and make an attempt to reconnect. |
Queueing master event to the relay log | The thread has read an event and is copying it to the relay log so the SQL thread can process it. |
Waiting to reconnect after a failed master event read | An error occurred while reading (due to disconnection). The thread is sleeping for master-connect-retry seconds before attempting to reconnect. |
Reconnecting after a failed master event read | Then the thread tries to reconnect. When connection is established again, the state will become Waiting for master to send event. |
Waiting for the slave SQL thread to free enough relay log space | You are using a non-zero relay_log_space_limit value, and the relay logs have grown so much that their combined size exceeds this value. The I/O thread is waiting until the SQL thread frees enough space by processing relay log contents so that it can delete some relay log files. |
Waiting for slave mutex on exit | A very brief state that happens as the thread is stopping. |
Here are the most common states you will see in the State column for the SQL thread of a slave server:
Reading event from the relay log | The thread has read an event from the relay log so that it can process it. |
Has read all relay log; waiting for the slave I/O thread to update it | The thread has processed all events in the relay log files and is waiting for the I/O thread to write new events to the relay log. |
Waiting for slave mutex on exit | A very brief state that happens as the thread is stopping. |
The State column for the SQL thread may also show a query string. This indicates that the thread has read an event from the relay log, extracted the query from it, and is executing the query.
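The state lists above can be turned into a small lookup table, for example to label threads in a monitoring script. The following Python sketch is only an illustration: the set and function names are hypothetical, and the sets contain just the states listed in this section. Note that one state, Waiting for slave mutex on exit, is listed for both slave threads, so the function returns a list of candidates rather than a single answer.

```python
MASTER_DUMP_STATES = {
    "Sending binlog event to slave",
    "Finished reading one binlog; switching to next binlog",
    "Has sent all binlog to slave; waiting for binlog to be updated",
    "Waiting to finalize termination",
}

IO_THREAD_STATES = {
    "Connecting to master",
    "Checking master version",
    "Registering slave on master",
    "Requesting binlog dump",
    "Waiting to reconnect after a failed binlog dump request",
    "Reconnecting after a failed binlog dump request",
    "Waiting for master to send event",
    "Queueing master event to the relay log",
    "Waiting to reconnect after a failed master event read",
    "Reconnecting after a failed master event read",
    "Waiting for the slave SQL thread to free enough relay log space",
    "Waiting for slave mutex on exit",
}

SQL_THREAD_STATES = {
    "Reading event from the relay log",
    "Has read all relay log; waiting for the slave I/O thread to update it",
    "Waiting for slave mutex on exit",
}


def threads_for_state(state):
    """Return the thread kinds for which this section lists the given state."""
    candidates = []
    if state in MASTER_DUMP_STATES:
        candidates.append("master Binlog Dump")
    if state in IO_THREAD_STATES:
        candidates.append("slave I/O")
    if state in SQL_THREAD_STATES:
        candidates.append("slave SQL")
    # An empty list means the state is not in the lists above; it may be
    # a less common state or the text of a query being executed.
    return candidates


print(threads_for_state("Waiting for master to send event"))
print(threads_for_state("Waiting for slave mutex on exit"))
```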
Before MySQL 4.0.2, the slave I/O and SQL threads were combined as a single thread, and no relay log files were used. The advantage of using two threads is that it separates query reading and query execution into two independent tasks, so reading queries is not slowed down if query execution is slow. For example, if the slave server has not been running for a while, its I/O thread can quickly fetch all the binlog contents from the master when the slave starts, even if the SQL thread lags far behind and may take hours to catch up. If the slave stops before the SQL thread has executed all the fetched queries, the I/O thread has at least fetched everything, so a safe copy of the queries is stored locally in the slave's relay logs for execution the next time the slave starts. This allows the binlogs to be purged on the master, because the master no longer needs to wait for the slave to fetch their contents.
By default, relay logs are named using filenames of the form host_name-relay-bin.nnn, where host_name is the name of the slave server host, and nnn is a sequence number. Successive relay log files are created using successive sequence numbers, beginning with 001. The slave keeps track of relay logs currently in use in an index file. The default relay log index filename is host_name-relay-bin.index. By default these files are created in the slave's data directory. The default filenames may be overridden with the --relay-log and --relay-log-index server options.
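The default naming scheme can be sketched in a few lines of Python, for instance to predict which files to expect in the data directory. The function names and the host name db2 are hypothetical. Zero-padding the sequence number to three digits is an assumption inferred from the starting value 001; the text does not say what happens beyond 999.

```python
def relay_log_name(host_name, sequence):
    """Default relay log filename: host_name-relay-bin.nnn."""
    if sequence < 1:
        raise ValueError("sequence numbers begin with 1")
    # Assumption: three-digit zero padding, inferred from the first number 001.
    return f"{host_name}-relay-bin.{sequence:03d}"


def relay_log_index_name(host_name):
    """Default relay log index filename: host_name-relay-bin.index."""
    return f"{host_name}-relay-bin.index"


# "db2" is a made-up slave host name used only for illustration.
first_three = [relay_log_name("db2", n) for n in (1, 2, 3)]
print(first_three)
print(relay_log_index_name("db2"))
```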
Relay logs have the same format as binary logs, so they can be read with mysqlbinlog. A relay log is automatically deleted by the SQL thread as soon as it no longer needs it (that is, as soon as it has executed all its events). There is no command to delete relay logs, because the SQL thread takes care of doing so. However, from MySQL 4.0.14, FLUSH LOGS rotates relay logs, which will influence when the SQL thread deletes them.
A new relay log is created under the following conditions:
The first time the I/O thread starts after the slave server starts. (In MySQL 5.0, a new relay log will be created each time the I/O thread starts, not just the first time.)
A FLUSH LOGS statement is issued (4.0.14 and up only).
The size of the current relay log file becomes too big. The meaning of ``too big'' is determined as follows:
max_relay_log_size, if max_relay_log_size > 0
max_binlog_size, if max_relay_log_size = 0 or MySQL is older than 4.0.14
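The size rule above amounts to a small decision function. The following Python sketch restates it; the function name and the representation of the server version as a tuple are assumptions made for illustration, and the sizes in the usage lines are arbitrary example values.

```python
def relay_log_size_threshold(max_relay_log_size, max_binlog_size,
                             server_version=(4, 0, 14)):
    """Return the size that counts as "too big" for the current relay log.

    server_version is a (major, minor, patch) tuple; this representation
    is an assumption of the sketch, not something the server exposes this way.
    """
    if server_version >= (4, 0, 14) and max_relay_log_size > 0:
        return max_relay_log_size
    # max_relay_log_size = 0, or a server older than 4.0.14.
    return max_binlog_size


print(relay_log_size_threshold(0, 1073741824))
print(relay_log_size_threshold(104857600, 1073741824))
```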
A slave replication server creates two additional small files in the data directory. These files are named master.info and relay-log.info by default. They contain information like that shown in the output of the SHOW SLAVE STATUS statement (see Replication Slave SQL for a description of this statement). Because they are disk files, they survive a shutdown of the slave. The next time the slave starts up, it reads these files to determine how far it has proceeded in reading binlogs from the master and in processing its own relay logs.
The master.info file is updated by the I/O thread. The correspondence between the lines in the file and the columns displayed by SHOW SLAVE STATUS is as follows:
Line | Description |
1 | Master_Log_File |
2 | Read_Master_Log_Pos |
3 | Master_Host |
4 | Master_User |
5 | Password (not shown by SHOW SLAVE STATUS) |
6 | Master_Port |
7 | Connect_Retry |
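As an illustration of the table above, the following Python sketch maps the lines of a master.info file to the corresponding column names. It assumes exactly the seven-line layout described here; other server versions may use a different layout. The function name and all sample values are hypothetical.

```python
MASTER_INFO_FIELDS = [
    "Master_Log_File",
    "Read_Master_Log_Pos",
    "Master_Host",
    "Master_User",
    "Password",  # not shown by SHOW SLAVE STATUS
    "Master_Port",
    "Connect_Retry",
]


def parse_master_info(text):
    """Map the first seven lines of master.info to their column names.

    Values are returned as strings, exactly as they appear in the file.
    """
    lines = text.splitlines()
    if len(lines) < len(MASTER_INFO_FIELDS):
        raise ValueError("expected at least %d lines" % len(MASTER_INFO_FIELDS))
    return dict(zip(MASTER_INFO_FIELDS, lines))


# Made-up file contents, used only to exercise the function.
SAMPLE_MASTER_INFO = "master-bin.002\n873\nmaster.example.com\nrepl\nsecret\n3306\n60\n"

info = parse_master_info(SAMPLE_MASTER_INFO)
print(info["Master_Log_File"], info["Read_Master_Log_Pos"])
```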
The relay-log.info file is updated by the SQL thread. The correspondence between the lines in the file and the columns displayed by SHOW SLAVE STATUS is as follows:
Line | Description |
1 | Relay_Log_File |
2 | Relay_Log_Pos |
3 | Relay_Master_Log_File |
4 | Exec_Master_Log_Pos |
When you back up your slave's data, you should back up these two small files as well, along with the relay log files, because they are needed to resume replication after you restore the slave's data. If you lose the relay logs but still have the relay-log.info file, you can check it to determine how far the SQL thread has executed in the master binlogs. Then you can use CHANGE MASTER TO with the MASTER_LOG_FILE and MASTER_LOG_POS options to tell the slave to re-read the binlogs from that point. This requires that the binlogs still exist on the master server.
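The recovery step can be sketched as follows: read lines 3 and 4 of relay-log.info (Relay_Master_Log_File and Exec_Master_Log_Pos, per the table above) and build a CHANGE MASTER TO statement that uses its MASTER_LOG_FILE and MASTER_LOG_POS options. This is a minimal Python sketch that assumes the four-line layout described in this section; the function name and sample contents are hypothetical. It only produces the statement text and does not connect to any server.

```python
def change_master_statement(relay_log_info_text):
    """Build a CHANGE MASTER TO statement from relay-log.info contents."""
    lines = relay_log_info_text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected at least 4 lines in relay-log.info")
    master_log_file = lines[2].strip()      # line 3: Relay_Master_Log_File
    exec_master_log_pos = int(lines[3])     # line 4: Exec_Master_Log_Pos
    # Keep the sketch simple: refuse names that would need SQL escaping.
    if "'" in master_log_file or "\\" in master_log_file:
        raise ValueError("unexpected character in binlog filename")
    return (
        "CHANGE MASTER TO "
        f"MASTER_LOG_FILE='{master_log_file}', "
        f"MASTER_LOG_POS={exec_master_log_pos}"
    )


# Made-up file contents, used only to exercise the function.
SAMPLE_RELAY_LOG_INFO = "./db2-relay-bin.004\n1250\nmaster-bin.002\n873\n"

statement = change_master_statement(SAMPLE_RELAY_LOG_INFO)
print(statement)
```

Review the generated statement before running it on the slave, and confirm that the named binlog still exists on the master.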
If your slave replicates LOAD DATA INFILE statements, you should also back up any SQL_LOAD-* files that exist in the directory that the slave uses for this purpose. The slave needs these files to resume replication of any interrupted LOAD DATA INFILE statements. The directory location is specified using the --slave-load-tmpdir option. If the option is not specified, the default is the value of the tmpdir variable.
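A backup script might collect these files with something like the following Python sketch. The function name is hypothetical, the directory must be supplied by the caller (the sketch does not query the server for --slave-load-tmpdir or tmpdir), and the filenames in the usage part are made up for the demonstration.

```python
import tempfile
from pathlib import Path


def load_data_temp_files(slave_load_tmpdir):
    """Return the SQL_LOAD-* files in the given directory, sorted by name."""
    directory = Path(slave_load_tmpdir)
    return sorted(
        path for path in directory.iterdir()
        if path.is_file() and path.name.startswith("SQL_LOAD-")
    )


# Demonstration in a throwaway directory with made-up filenames.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("SQL_LOAD-1-2-3.data", "SQL_LOAD-1-2-3.info", "unrelated.tmp"):
        Path(tmp, name).write_text("")
    found_names = [path.name for path in load_data_temp_files(tmp)]

print(found_names)
```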