I recently started exploring the wonders of mk-table-checksum after spending too long dismissing the magic-like maatkit toolkit. What follows is not an exhaustive treatment of mk-table-checksum, but just an overview and some things I had to learn by trying the tool out.
The basic principle is based on the CHECKSUM TABLE table maintenance statement supported in the MySQL Server. The mk-table-checksum runs a CHECKSUM TABLE command on all (or some) tables. The results of that are stored in another table on the server.
This, by itself, isn't that interesting. What is interesting is the --replicate option, which replicates the statements that generate the checksums and inserts the data into a checksum table to all the slaves as well. The trick of this is that this is statement based replication; the statement replication tells each slave to calculate the CHECKSUM, independently. When the statements are fully replicated, each mysql server (master or slave) will have it's own version of the checksum table, based off the contents of the table on that mysql instance.
Hence, simply comparing that checksum table on all of the servers will quickly show you if there are differences in table contents. Here are some command line examples that I assembled.
mk-table-checksum -u failover -ppass --create-replicate --replicate mysql_health.checksums --ignore-databases mysql,mysql_health --empty-replicate-table --chunk-size=10M masterhost.domain.com
This command creates the checksum table in 'mysql_health.checksums'. The chunk-size option breaks the tables down into roughly 10M chunks and checksums those chunks. When this command finishes, the checksums are generated on the master. You'll need to wait around time_to_run_on_master * (hops_to_furthest_slave + 1) for it to be fully propagated. Then you can run this:
mk-table-checksum -u failover -ppass --create-replicate --replicate mysql_health.checksums --ignore-databases mysql,mysql_health --empty-replicate-table --chunk-size=10M --replicate-check 2 masterhost.domain.com
This command (with probably too many options) checks up to 2 levels deep on the replication tree; that is that all slaves up to 2 levels away from the given master are checked for their checksum table and compared with the master. Only differences are reported.
Now if this were a live master actively taking writes, I may be concerned about other writes being interspersed in between the checksums, potentially affecting the checksum results differently on the slave compared with the master. There are ways to deal with this, but the --lock option is at least one option, which locks the tables on the master as they are checksummed. The obvious downfall of this is that your master can't take writes on a locked table. The perldoc for mk-table-checksum has some good discussion on various ways to handle this.
Once you have some idea of what is wrong, you can fix it with some more maatkit magic:
# Print output onlymk-table-sync -u failover -ppass --print --sync-to-master --replicate mysql_health.checksums slavehost.domain.com# execute fixesmk-table-sync -u failover -ppass --execute --sync-to-master --replicate mysql_health.checksums slavehost.domain.com
Will this work perfectly in all circumstances? I have no idea. I do know that I did a random delete of about 100 rows on a test table of mine on a slave, and this seemed to fix it. It's magic!