I/O Thread delay trick

Submitted by jay on October 22, 2009 - 8:32am

  I was debugging some repl delay monitoring metrics and I noticed that the test I was doing (sysbench --test=oltp prepare) to generate replication data was far outstripping the slave.  The SQL thread was caught up to the IO thread, but the IO thread was way behind the master.
    Replicating from: 
    Master:                     a2_db_bcp_re1.000166/138395515
    Slave I/O:          Yes     a2_db_bcp_re1.000165/802640907  ???
    Slave Relay:        Yes     a2_db_bcp_re1.000165/802030586  596K
  198 secs


  In this case, the I/O thread was getting further and further behind as sysbench did bulk inserts into my master.  My theory is that a lot of relatively small binary log records simply don't transfer efficiently.  That leaves the SQL thread idle some of the time waiting for the IO thread, and leads to inefficient replication.
   I poked around the replication options manual page, looking for something to help and found this:  slave_compressed_protocol
  Hmm, looks promising.  I did 'SET GLOBAL slave_compressed_protocol=ON' on both the master and slave, did a 'SLAVE STOP; SLAVE START' and suddenly things looked a lot better:
    Replicating from: 
    Master:                     a2_db_bcp_re1.000166/145629546
    Slave I/O:          Yes     a2_db_bcp_re1.000165/???
    Slave Relay:        Yes     a2_db_bcp_re1.000165/819116830  186M

    Replicating from: 
    Master:                     a2_db_bcp_re1.000166/148889364
    Slave I/O:          Yes     a2_db_bcp_re1.000166/125999252  22M
    Slave Relay:        Yes     a2_db_bcp_re1.000165/829490621  ???

    Replicating from: 
    Master:                     a2_db_bcp_re1.000166/152763939
    Slave I/O:          Yes     a2_db_bcp_re1.000166/152634478  126K
    Slave Relay:        Yes     a2_db_bcp_re1.000165/841084858  ???
Suddenly the IO thread is on the same binlog and close to the same position as the master, while the SQL thread is behind.  This doesn't catch up the slave, since the INSERTS still need to run, but at least the SQL thread is running as fast as possible and not bumping into the I/O thread.
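
The size figures in the status snapshots above look like plain byte differences between two binlog coordinates, with ??? shown when the two sides are on different binlog files. Here is a minimal Python sketch of that arithmetic. The monitoring tool is not named in the post, so the function names, the 1024-based units and the rounding are all assumptions; they do reproduce the 596K, 22M and 126K figures.

```python
def parse_coord(coord):
    """Split 'binlog_file/position' into (file name, integer position)."""
    name, _, pos = coord.rpartition("/")
    return name, int(pos)


def byte_lag(ahead, behind):
    """Bytes between two coordinates, or None if they are in different binlog files."""
    ahead_file, ahead_pos = parse_coord(ahead)
    behind_file, behind_pos = parse_coord(behind)
    if ahead_file != behind_file:
        return None  # positions in different files cannot simply be subtracted
    return ahead_pos - behind_pos


def human(n):
    """Format a byte count the way the snapshots appear to (assumed: 1024-based, rounded)."""
    if n is None:
        return "???"
    for unit, size in (("M", 1024 ** 2), ("K", 1024)):
        if n >= size:
            return f"{round(n / size)}{unit}"
    return str(n)


# Last snapshot: master vs. slave I/O thread, both on binlog 000166
master = "a2_db_bcp_re1.000166/152763939"
slave_io = "a2_db_bcp_re1.000166/152634478"
print(human(byte_lag(master, slave_io)))
```

When the files differ, a real lag figure would also need the sizes of the intermediate binlogs, which the snapshots do not show.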
Now this does come at a cost of some CPU on the master and slave, but it doesn't seem like a huge amount.  How does the compression algorithm work?  I have no idea.  But it does seem to work.  
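
On the compression question: as far as I know, the MySQL compressed client/server protocol of that era is built on zlib. The Python sketch below is only a rough illustration of why a stream of small, similar records compresses well. The INSERT text is a hypothetical stand-in for sysbench-style rows, not real binlog events, and the whole stream is compressed in one call, whereas the real protocol compresses packet by packet.

```python
import zlib

# Hypothetical stand-in for many small, near-identical replication events.
rows = [
    (
        "INSERT INTO sbtest (id, k, c, pad) "
        f"VALUES ({i}, 0, '', 'qqqqqqqqqqwwwwwwwwwweeeeeeeeee');"
    ).encode()
    for i in range(1, 10001)
]
stream = b"\n".join(rows)

# zlib.compress uses DEFLATE, which replaces repeated byte sequences with back-references.
compressed = zlib.compress(stream)
ratio = len(compressed) / len(stream)

print(f"original:   {len(stream)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.3f}")
```

Real binlog traffic with less repetitive row data would compress less well, and the CPU cost is paid on both ends.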



How far away from the master

How far is the master from the slave? It looks like the latency between master and slave is high. With high latency, compression could help. But with low latency, e.g., when master and slave sit in the same place, I doubt it would help much, since in that case the CPU overhead outweighs the benefit of transferring less data.


Master to slave was 80ms in

Master to slave was 80ms in this case, I believe. Definitely agree that it could be only relevant in a high latency scenario.
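
One way to see why latency matters here: a single TCP connection can have at most one window of data in flight per round trip, so its throughput is capped at roughly window / RTT. Here is a back-of-the-envelope Python sketch, assuming the 80ms figure is a round-trip time and assuming a hypothetical 64 KiB window. Neither is stated in the thread, and the compression ratio used is illustrative only.

```python
def throughput_ceiling(window_bytes, rtt_ms):
    """Upper bound in bytes/second for one connection: one window per round trip."""
    return window_bytes * 1000 / rtt_ms


def effective_binlog_rate(window_bytes, rtt_ms, compression_ratio):
    """Uncompressed binlog bytes/second that fit through the same ceiling.

    compression_ratio is compressed size / original size (e.g. 0.25 for 4x).
    """
    return throughput_ceiling(window_bytes, rtt_ms) / compression_ratio


WINDOW = 64 * 1024  # assumed window size in bytes, not taken from the post

print(throughput_ceiling(WINDOW, 80))            # high-latency link
print(throughput_ceiling(WINDOW, 1))             # same-site link
print(effective_binlog_rate(WINDOW, 80, 0.25))   # high latency with 4x compression
```

Under these assumptions the 80ms link is window-limited long before a same-site link would be, which fits the idea that compression mainly pays off when latency is high.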


About Me

Jay Janssen
Yahoo!, Inc.
jayj at yahoo dash inc dot com
High Availability
Global Load Balancing