Archive for the ‘binary logs’ Category

Binary log file size matters (sometimes)

Май 24th, 2012

I used to think one should never look at max_binlog_size, however last year I had a couple of interesting cases which showed that sometimes it may be very important variable to tune properly. I meant to write about it earlier but never really had a chance to do it. I have it now!

One of our customers was complaining that the database would lock up at random times and then it would go back to normal in just a few seconds. This was MySQL 5.0 running MyISAM/InnoDB mix, not heavily loaded. We used pt-stalk (at that time it was aspersa stalk) trying to figure out what is happening, however all we found was a spike in writes, many queries piled up and looking at the system process list it was quite obvious that page flush daemon was acting out. I/O Pattern was rather awkward – here is an output from pt-diskstats (again, aspersa diskstats at that time):

 rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt    wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt busy in_prg
 24.0    13.0     0.2     0%    1.4    59.3  1089.2   195.8   104.1    95%  100.1    91.9 100%      1
 14.0     8.0     0.1     0%    0.1     5.4    10.0   246.4     1.2    97%    0.0     2.0   8%      0
 36.0     9.3     0.2     0%    0.2     4.2     6.0   109.3     0.3    93%    0.0     0.2  15%      0
 12.0     8.7     0.1     0%    0.1     4.2     6.0   101.3     0.3    92%    0.0     0.0   5%      0

You can see that (a) reads are affected – response time dropped significantly and (b) the reason appears to be enormous amount of writes. For a while I was really puzzled, looking for answers in MySQL status counters, innodb status, memory info and what not. It seemed that the problem was gone before pt-stalk would kick-off and all I could see was symptoms. Indeed the case it was. I was not allowed to run pmp or oprofile on that system due to potential performance impact. However, somehow I have noticed that pt-stalk kick-off time matched that of binary logs modification time:

-rw-r--r-- 1 root root 91 Nov 2 15:15 2011_11_02_15_15_13-trigger
-rw-r--r-- 1 root root 91 Nov 2 16:17 2011_11_02_16_17_20-trigger
-rw-r--r-- 1 root root 91 Nov 2 17:38 2011_11_02_17_38_22-trigger
-rw-r--r-- 1 root root 91 Nov 3 08:36 2011_11_03_08_36_00-trigger
...
-rw-rw---- 1 mysql mysql 1073742171 Nov 2 15:15 /var/lib/mysqllogs/db1-bin-log.003229
-rw-rw---- 1 mysql mysql 1073742976 Nov 2 16:17 /var/lib/mysqllogs/db1-bin-log.003230
-rw-rw---- 1 mysql mysql 1073742688 Nov 2 17:38 /var/lib/mysqllogs/db1-bin-log.003231
-rw-rw---- 1 mysql mysql 1073742643 Nov 2 20:06 /var/lib/mysqllogs/db1-bin-log.003232
-rw-rw---- 1 mysql mysql 1073742974 Nov 3 00:10 /var/lib/mysqllogs/db1-bin-log.003233
-rw-rw---- 1 mysql mysql 1073742566 Nov 3 08:35 /var/lib/mysqllogs/db1-bin-log.003234

What is happening here?

Apparently, every time current binary log gets close to max_binlog_size in size, new binary log is created and, if expire_logs_days is not zeroish, log files older than expire_logs_days are removed. &LOCK_log mutex is held for the duration of the binary log removal and if your file system is slow at deleting files (both customers systems ran on ext3, which can be very slow deleting large files), then write operations in MySQL will suffer since that very same mutex is used inside MYSQL_LOG::write() call. Even though it only lasts 1-2s, it still gives users unstable response time – here’s how that looks on one of the customer system – this one was doing ~10k qps at a time:

# mysqladmin ext -i1 | grep Threads_running
...
| Threads_running                   | 2              |
| Threads_running                   | 2              |
| Threads_running                   | 2              |
| Threads_running                   | 2              |
| Threads_running                   | 3              |
| Threads_running                   | 3              |
| Threads_running                   | 2              |
| Threads_running                   | 2              |
| Threads_running                   | 2              |
| Threads_running                   | 3              |
| Threads_running                   | 5              |
| Threads_running                   | 4              |
| Threads_running                   | 3              |
| Threads_running                   | 2              |
| Threads_running                   | 3              |
| Threads_running                   | 108            |
| Threads_running                   | 174            |
| Threads_running                   | 5              |
| Threads_running                   | 2              |
| Threads_running                   | 2              |
| Threads_running                   | 3              |
| Threads_running                   | 2              |

A small blip, yet over 100 users are backlogged for a couple of seconds – not great user experience, considering this happens every 10-15 minutes.

Solution?

Here’s what we did: we have reduced the size of binary log file from default 1GB (some systems have it set to 100MB in their default my.cnf) down to 50MB and we never saw this problem ever again. Now the files were removed much more often, they were smaller and didn’t take that long to remove.

Note that on ext4 and xfs you should not have this problem as they would remove such files much faster.


PlanetMySQL Voting: Vote UP / Vote DOWN

Setting up Master-Slave Replication with MySQL

Март 12th, 2011
Replication enables data from one MySQL server to be replicated on one or more other MySQL servers. Replication is mostly used as scale-out solution. In such a solution, all writes and updates take place on the master server, while reads take place on one or more slaves. This model is actually known as master-slave replication and this is the kind of replication that I will be setting up in this post.
PlanetMySQL Voting: Vote UP / Vote DOWN

Purging binary logs.

Январь 13th, 2010
Being a MySQL DBA , one faces a common issue in replication environment -> Disk space issue on master, since the number of binary logs have increased.
Now, one of the solution to this would be using expire_logs_days parameter in your mysql config file.
But what if, the slave is lagging by few hours or if the slave is broken since few days and the binary logs are removed due to the parameter set. Whenever the salve comes up, it will go bonkers, knowing that the binary log where it last stopped no more exists.

I faced this issue a couple of times until I decided to automate it using a script. Herewith I am attaching the URL to my python script which can run regularly in cron.
Features :
  • Checks the slaves connected to the master (I have limit it to 3 for now.)
  • Checks the last binary log file which is being used by the slave.
  • All the binary logs until the last bin log used by slave are purged.
  • If a slave is not connected, purging is aborted, so that important bin logs are not purged.
You can download this script from here .

Tips and Warnings to use this script
  • Test a couple of times, different test cases, before using this script on critical or production databases.
  • It is advisable to take backup of data before running this script.
  • You can send the output of this script to syslog or to different mail addresses.
  • You can embed into you alerting system, so that whenever there is a disk space warning on the machine, this scrip is fired.

PlanetMySQL Voting: Vote UP / Vote DOWN