Archive for the ‘production’ Category

Color code your performance numbers

Июнь 2nd, 2010

When analyzing how good or bad response time is it is not handy to look at the averages, min or max times - something what is easily computed using built in aggregate functions. We most likely would like to see some percentile numbers - 95 percentile or 99 percentile. The problem is computing these in SQL is ugly and resource consuming. There is however a simple way to get similar data, looking at it from the different point of view.

When we're speaking about application we may not always care about exact value of response time but rather we want to see response time to be within certain range. For example if we define page feels good if response time is below 50ms it is not as important if response was 40ms or 44ms - it is much more important how frequently this goal was reached.

In fact I prefer to define 2 performance level. One what users would consider good performance, another what is acceptable, and basically the third level is unacceptable. We can also set classical color codes to them - green, yellow, red.

For example we can define good performance as response time for search pages within 1 second, acceptable as response within 3 seconds and anything over that will be considered unacceptable:

SQL:
  1. mysql> SELECT sum(wtime<1.0)/count(*)*100 green_pct, sum(wtime>=1.0 AND wtime<3.0)/count(*)*100 yellow_pct, sum(wtime>=3.)/count(*)*100 red_pct FROM performance_log_100601 WHERE page_type='search';
  2. +-----------+------------+---------+
  3. | green_pct | yellow_pct | red_pct |
  4. +-----------+------------+---------+
  5. |   99.6701 |     0.30060.0293 |
  6. +-----------+------------+---------+
  7. 1 row IN SET (24.87 sec)

These are great numbers to look after. In this case we can say system has "two nines" in terms of performance - it responds with what we consider "good" performance in over 99% cases. I consider this a lot more meaningful number for business than speaking about 99% response time, especially as it also clearly clarifies how bad the problem is. If your goal is 99% response time is 1sec and you're seeing it 2 seconds what you really do not know is how many people fall in this 1-sec to 2 sec range. It could be very small number and so your performance is not that bad or it could be majority of response times are withing 1 to 2 sec range.

Next thing I would add is the average response time for our good requests. This eliminates slow outliers which we just count by their number and allows to see how good our performance really is. I consider it secondary value as our major goal is eliminating variance as defined by Six Sigma practices. However given variance is the same better average response time for good requests helps us to track performance changes closely and act before we get large number of yellow or red requests:

SQL:
  1. mysql> SELECT avg(IF(wtime<1.0,wtime,NULL)) avg_green,sum(wtime<1.0)/count(*)*100 green_pct, sum(wtime>=1.0 AND wtime<3.0)/count(*)*100 yellow_pct, sum(wtime>=3.0)/count(*)*100 red_pct FROM performance_log_100601 WHERE page_type='search';
  2. +-------------------+-----------+------------+---------+
  3. | avg_green         | green_pct | yellow_pct | red_pct |
  4. +-------------------+-----------+------------+---------+
  5. | 0.448726461517477 |   99.6701 |     0.30060.0293 |
  6. +-------------------+-----------+------------+---------+
  7. 1 row IN SET (6.84 sec)

Finally I find it helpful to track it among time period. For example in this case we're looking for data grouped by 15 min interval though you may be looking at different intervals too:

SQL:
  1. mysql> SELECT from_unixtime(unix_timestamp(logged) div 900 * 900) period, avg(IF(wtime<1.0,wtime,NULL)) avg_green,sum(wtime<1.0)/count(*)*100 green_pct, sum(wtime>=1.0 AND wtime<3.0)/count(*)*100 yellow_pct, sum(wtime>=3.0)/count(*)*100 red_pct FROM performance_log_100601 WHERE page_type='search' GROUP BY 1;
  2. +---------------------+-------------------+-----------+------------+---------+
  3. | period              | avg_green         | green_pct | yellow_pct | red_pct |
  4. +---------------------+-------------------+-----------+------------+---------+
  5. | 2010-05-31 20:00:00 | 0.454738099705166100.0000 |     0.00000.0000 |
  6. | 2010-05-31 20:15:00 | 0.477789820622135100.0000 |     0.00000.0000 |
  7. | 2010-05-31 20:30:00 | 0.463134796179216100.0000 |     0.00000.0000 |
  8. | 2010-05-31 20:45:00 | 0.397884420995359 |   99.4595 |     0.54050.0000 |
  9. | 2010-05-31 21:00:00 | 0.438939880065877100.0000 |     0.00000.0000 |
  10. | 2010-05-31 21:15:000.48698966072074100.0000 |     0.00000.0000 |
  11. | 2010-05-31 21:30:00 | 0.432073892663709 |   93.6364 |     6.36360.0000 |
  12. | 2010-05-31 21:45:00 | 0.483174517098134100.0000 |     0.00000.0000 |
  13. | 2010-05-31 22:00:00 | 0.449507013872618100.0000 |     0.00000.0000 |
  14. | 2010-05-31 22:15:00 | 0.327130372130607100.0000 |     0.00000.0000 |
  15. | 2010-05-31 22:30:00 | 0.431361319540545100.0000 |     0.00000.0000 |
  16. | 2010-05-31 22:45:00 | 0.480885540467242100.0000 |     0.00000.0000 |

Note response time does not always correlates with number of bad requests. For example we can see at one interval there were over 6% requests in yellow but average response time for good requests actually was better than in the other intervals.

The queries which produce this data may not be overly beautiful but they are pretty fast requiring simple scan of the data range.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Color code your performance numbers

Июнь 2nd, 2010

When analyzing how good or bad response time is it is not handy to look at the averages, min or max times - something what is easily computed using built in aggregate functions. We most likely would like to see some percentile numbers - 95 percentile or 99 percentile. The problem is computing these in SQL is ugly and resource consuming. There is however a simple way to get similar data, looking at it from the different point of view.

When we're speaking about application we may not always care about exact value of response time but rather we want to see response time to be within certain range. For example if we define page feels good if response time is below 50ms it is not as important if response was 40ms or 44ms - it is much more important how frequently this goal was reached.

In fact I prefer to define 2 performance level. One what users would consider good performance, another what is acceptable, and basically the third level is unacceptable. We can also set classical color codes to them - green, yellow, red.

For example we can define good performance as response time for search pages within 1 second, acceptable as response within 3 seconds and anything over that will be considered unacceptable:

SQL:
  1. mysql> SELECT sum(wtime<1.0)/count(*)*100 green_pct, sum(wtime>=1.0 AND wtime<3.0)/count(*)*100 yellow_pct, sum(wtime>=3.)/count(*)*100 red_pct FROM performance_log_100601 WHERE page_type='search';
  2. +-----------+------------+---------+
  3. | green_pct | yellow_pct | red_pct |
  4. +-----------+------------+---------+
  5. |   99.6701 |     0.30060.0293 |
  6. +-----------+------------+---------+
  7. 1 row IN SET (24.87 sec)

These are great numbers to look after. In this case we can say system has "two nines" in terms of performance - it responds with what we consider "good" performance in over 99% cases. I consider this a lot more meaningful number for business than speaking about 99% response time, especially as it also clearly clarifies how bad the problem is. If your goal is 99% response time is 1sec and you're seeing it 2 seconds what you really do not know is how many people fall in this 1-sec to 2 sec range. It could be very small number and so your performance is not that bad or it could be majority of response times are withing 1 to 2 sec range.

Next thing I would add is the average response time for our good requests. This eliminates slow outliers which we just count by their number and allows to see how good our performance really is. I consider it secondary value as our major goal is eliminating variance as defined by Six Sigma practices. However given variance is the same better average response time for good requests helps us to track performance changes closely and act before we get large number of yellow or red requests:

SQL:
  1. mysql> SELECT avg(IF(wtime<1.0,wtime,NULL)) avg_green,sum(wtime<1.0)/count(*)*100 green_pct, sum(wtime>=1.0 AND wtime<3.0)/count(*)*100 yellow_pct, sum(wtime>=3.0)/count(*)*100 red_pct FROM performance_log_100601 WHERE page_type='search';
  2. +-------------------+-----------+------------+---------+
  3. | avg_green         | green_pct | yellow_pct | red_pct |
  4. +-------------------+-----------+------------+---------+
  5. | 0.448726461517477 |   99.6701 |     0.30060.0293 |
  6. +-------------------+-----------+------------+---------+
  7. 1 row IN SET (6.84 sec)

Finally I find it helpful to track it among time period. For example in this case we're looking for data grouped by 15 min interval though you may be looking at different intervals too:

SQL:
  1. mysql> SELECT from_unixtime(unix_timestamp(logged) div 900 * 900) period, avg(IF(wtime<1.0,wtime,NULL)) avg_green,sum(wtime<1.0)/count(*)*100 green_pct, sum(wtime>=1.0 AND wtime<3.0)/count(*)*100 yellow_pct, sum(wtime>=3.0)/count(*)*100 red_pct FROM performance_log_100601 WHERE page_type='search' GROUP BY 1;
  2. +---------------------+-------------------+-----------+------------+---------+
  3. | period              | avg_green         | green_pct | yellow_pct | red_pct |
  4. +---------------------+-------------------+-----------+------------+---------+
  5. | 2010-05-31 20:00:00 | 0.454738099705166100.0000 |     0.00000.0000 |
  6. | 2010-05-31 20:15:00 | 0.477789820622135100.0000 |     0.00000.0000 |
  7. | 2010-05-31 20:30:00 | 0.463134796179216100.0000 |     0.00000.0000 |
  8. | 2010-05-31 20:45:00 | 0.397884420995359 |   99.4595 |     0.54050.0000 |
  9. | 2010-05-31 21:00:00 | 0.438939880065877100.0000 |     0.00000.0000 |
  10. | 2010-05-31 21:15:000.48698966072074100.0000 |     0.00000.0000 |
  11. | 2010-05-31 21:30:00 | 0.432073892663709 |   93.6364 |     6.36360.0000 |
  12. | 2010-05-31 21:45:00 | 0.483174517098134100.0000 |     0.00000.0000 |
  13. | 2010-05-31 22:00:00 | 0.449507013872618100.0000 |     0.00000.0000 |
  14. | 2010-05-31 22:15:00 | 0.327130372130607100.0000 |     0.00000.0000 |
  15. | 2010-05-31 22:30:00 | 0.431361319540545100.0000 |     0.00000.0000 |
  16. | 2010-05-31 22:45:00 | 0.480885540467242100.0000 |     0.00000.0000 |

Note response time does not always correlates with number of bad requests. For example we can see at one interval there were over 6% requests in yellow but average response time for good requests actually was better than in the other intervals.

The queries which produce this data may not be overly beautiful but they are pretty fast requiring simple scan of the data range.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

On Good Instrumentation

Июнь 1st, 2010

In so many cases troubleshooting applications I keep thinking how much more efficient things could be going if only there would be a good instrumentation available. Most of applications out there have very little code to help understand what is going on and if it is there it is frequently looking at some metrics which are not very helpful.

If you look at the system from bird eye view – system needs to process transactions and you want it to successfully complete large number of transactions it gets (this is what called availability) and we want it to serve them with certain response time, which is what is called performance. There could be many variables in environment which change – load, number of concurrent users, database, the way users use the system but in the nutshell all what you really care is having predictable response time within certain range. So if we care about response time – this is exactly what our instrumentation should measure

Response Time Summary We want to understand where exactly response time comes from. For example if we define transaction as the time it took to generate HTML page we want to understand how much time was spent waiting on the database, memcache, other external services, as well as how much CPU time it consumed.

Now what is important we need this information for individual transactions. It may be every transaction which is best and easily achievable for small-medium systems or at least for large enough sample. It is very important this information is available for individual transactions not the average. Average is useless because 100 transactions taking 1 sec and 99 transactions having 1ms and 1 taking 99.1 sec will have the same average while for sake of performance analyzes these are completely different. When you have transaction sample make sure it contains fair population of transaction – getting only transactions which are slow is not helpful as we might want to compare them to the fast transactions to understand why they are slow.

What kind of components do you need to have in response time summary – all components which are significant enough. If your instrumentation has 95% of response time unaccounted for it is useless. You also want blocks not to mix apples with oranges. For example “mysql and memcache” block would not be helpful. Even further I would prefer to split “mysql time” in the “connect time” and “query time” as there are situations when one but not other would be affected.

In is important for response time summary stored in the logs which are easy to query so you can analyze data in a lot of various ways. Sometimes you may find the response time is impacted by queries from certain user, in others it may be attributed to different application/web server.

The goal for Response Time Summary is to quickly point direction where problem happens. Whenever you have spike in response time or it is bad response time for certain kind of request you can quickly understand where does it come from ? Is it wasted CPU time Slow response from MySQL or Memcache.

I also like to see numbers of calls stored together with attributed response time. For example I’d like to see number of mysql calls in addition to MySQL response time. This helps to understand if it is the issue with number of queries or their performance. If I see 2 queries taking 30 seconds it is clearly slow queries. If it is 10.000 queries executed and total response time is 4 sec I know it is pretty much as good as it gets with standard Ethernet network and finding a ways to reduce number of queries is going to be more helpful.

The Glue Our applications involve multiple layers and typically higher layer can only report response time it took to call lower layer, but not the reason for that response time. For example we can report time it took to execute SQL query from PHP application side, but we can’t say why it has taken so long. Was it row level lock ? waiting on disk IO or was it simply question of burning a lot of CPU. On the other hand this information may be available in the instrumentation stats from that lower layer – for example in MySQL Query Log. What is important is however to be able to connect the data from these logs – glue them together. The easy way to do it is to provide an unique identifier to all requests and put it in the logs with request of the lower levels. With mySQL the simple way to do it is to put it in the comments for queries you execute.

Optional Tracing The information in lower layers logs is very helpful however it typically have two problems. First not every layer has good logs. For example if you’re running memcached you probably do not have the logs detailing all requests and their response time. Second – the lower layer may only know response time from its vantage point, which in many cases does not include network communication time which can be very important.

Tracing should be optional and normally applied to the small sample of requests, though it needs to be detailed. Typically you would include the calls to the lower level services together with timestamp, the response summary with timestamp again. The information about request has to be complete enough to identify target action and response completely. For example if I’m speaking about memcache I’d like to know which server:port request was issued to which key was requested, and on response I’d like to know if it was hit, miss or error.

The way I use it may be as follows. I see the increased response time for given kind of request. I see response time is coming from MySQL. I check the number of MySQL Queries and it is 5x when it usually is for this kind of request. Looking at memcache stats I can see high number of misses. Looking at some available traces shows the server memcache01 has very high miss ratio. Checking what is going on with memcache01 shows it just was restarted (and hence has almost empty cache). This is important example as it shows your increased response time from MySQL may not have anything to do with MySQL itself but you would not know unless you’re capturing the right data.

If you’re looking for nice example framework for instrumentation, check out instrumentation for PHP – It has everything mentioned by tracing which is trivial to add.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

On Good Instrumentation

Июнь 1st, 2010

In so many cases troubleshooting applications I keep thinking how much more efficient things could be going if only there would be a good instrumentation available. Most of applications out there have very little code to help understand what is going on and if it is there it is frequently looking at some metrics which are not very helpful.

If you look at the system from bird eye view – system needs to process transactions and you want it to successfully complete large number of transactions it gets (this is what called availability) and we want it to serve them with certain response time, which is what is called performance. There could be many variables in environment which change – load, number of concurrent users, database, the way users use the system but in the nutshell all what you really care is having predictable response time within certain range. So if we care about response time – this is exactly what our instrumentation should measure

Response Time Summary We want to understand where exactly response time comes from. For example if we define transaction as the time it took to generate HTML page we want to understand how much time was spent waiting on the database, memcache, other external services, as well as how much CPU time it consumed.

Now what is important we need this information for individual transactions. It may be every transaction which is best and easily achievable for small-medium systems or at least for large enough sample. It is very important this information is available for individual transactions not the average. Average is useless because 100 transactions taking 1 sec and 99 transactions having 1ms and 1 taking 99.1 sec will have the same average while for sake of performance analyzes these are completely different. When you have transaction sample make sure it contains fair population of transaction – getting only transactions which are slow is not helpful as we might want to compare them to the fast transactions to understand why they are slow.

What kind of components do you need to have in response time summary – all components which are significant enough. If your instrumentation has 95% of response time unaccounted for it is useless. You also want blocks not to mix apples with oranges. For example “mysql and memcache” block would not be helpful. Even further I would prefer to split “mysql time” in the “connect time” and “query time” as there are situations when one but not other would be affected.

In is important for response time summary stored in the logs which are easy to query so you can analyze data in a lot of various ways. Sometimes you may find the response time is impacted by queries from certain user, in others it may be attributed to different application/web server.

The goal for Response Time Summary is to quickly point direction where problem happens. Whenever you have spike in response time or it is bad response time for certain kind of request you can quickly understand where does it come from ? Is it wasted CPU time Slow response from MySQL or Memcache.

I also like to see numbers of calls stored together with attributed response time. For example I’d like to see number of mysql calls in addition to MySQL response time. This helps to understand if it is the issue with number of queries or their performance. If I see 2 queries taking 30 seconds it is clearly slow queries. If it is 10.000 queries executed and total response time is 4 sec I know it is pretty much as good as it gets with standard Ethernet network and finding a ways to reduce number of queries is going to be more helpful.

The Glue Our applications involve multiple layers and typically higher layer can only report response time it took to call lower layer, but not the reason for that response time. For example we can report time it took to execute SQL query from PHP application side, but we can’t say why it has taken so long. Was it row level lock ? waiting on disk IO or was it simply question of burning a lot of CPU. On the other hand this information may be available in the instrumentation stats from that lower layer – for example in MySQL Query Log. What is important is however to be able to connect the data from these logs – glue them together. The easy way to do it is to provide an unique identifier to all requests and put it in the logs with request of the lower levels. With mySQL the simple way to do it is to put it in the comments for queries you execute.

Optional Tracing The information in lower layers logs is very helpful however it typically have two problems. First not every layer has good logs. For example if you’re running memcached you probably do not have the logs detailing all requests and their response time. Second – the lower layer may only know response time from its vantage point, which in many cases does not include network communication time which can be very important.

Tracing should be optional and normally applied to the small sample of requests, though it needs to be detailed. Typically you would include the calls to the lower level services together with timestamp, the response summary with timestamp again. The information about request has to be complete enough to identify target action and response completely. For example if I’m speaking about memcache I’d like to know which server:port request was issued to which key was requested, and on response I’d like to know if it was hit, miss or error.

The way I use it may be as follows. I see the increased response time for given kind of request. I see response time is coming from MySQL. I check the number of MySQL Queries and it is 5x when it usually is for this kind of request. Looking at memcache stats I can see high number of misses. Looking at some available traces shows the server memcache01 has very high miss ratio. Checking what is going on with memcache01 shows it just was restarted (and hence has almost empty cache). This is important example as it shows your increased response time from MySQL may not have anything to do with MySQL itself but you would not know unless you’re capturing the right data.

If you’re looking for nice example framework for instrumentation, check out instrumentation for PHP – It has everything mentioned by tracing which is trivial to add.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Should I buy a Fast SSD or more memory?

Апрель 6th, 2010

While a scale-out solution has traditionally been popular for MySQL, it’s interesting to see what room we now have to scale up – cheap memory, fast storage, better power efficiency.  There certainly are a lot of options now – I’ve been meeting about a customer/week using Fusion-IO cards.  One interesting choice I’ve seen people make however, is buying an SSD when they still have a lot of pages read/second – I would have preferred to buy memory instead, and use the storage device for writes.

Here’s the benchmark I came up with to confirm if this is the case:

  • Percona-XtraDB-9.1 release
  • Sysbench OLTP workload with 80 million rows (about 18GB worth of data+indexes)
  • XFS Filesystem mounted with nobarrier option.
  • Tests run with:
    • RAID10 with BBU over 8 disks
    • Intel SSD X25-E 32GB
    • FusionIO 320GB MLC
  • For each test, run with a buffer pool of between 2G and 22G (to test performance compared to memory fit).
  • Hardware was our Dell 900 (specs here).

To start with, we have a test on the RAID10 storage to establish a baseline.  The Y axis is transactions/second (more is better), the X axis is the size of innodb_buffer_pool_size:

Let me point out three interesting characteristics about this benchmark:

  • The A arrow is when data fits completely in the buffer pool (best performance). It’s important to point out that once you hit this point, a further increase in memory at all.
  • The B arrow is where the data just started to exceed the size of the buffer pool.  This is the most painful point for many customers – because while memory decreased by only ~10% the performance dropped by 2.6 times!  In production this usually matches the description of “Last week everything was fine.. but it’s just getting slower and slower!”.  I would suggest that adding memory is by far the best thing to do here.
  • The C arrow shows where data is approximately three times the buffer pool.  This is an interesting point to zoom in on – since you may not be able to justify the cost of the memory, but an SSD might be a good fit:

Where the C arrow was, in this graph a Fusion-IO card improves performance by about five times (or 2x with an Intel SSD).  To get the same improvement with memory, you would have needed to add 60% more memory -or- 260% more memory for a 5x improvement.  Imagine a situation where your C point is when you have 32GB of RAM and 100GB of data.  Than it gets interesting:

  • Can you easily add another 32G RAM (are your memory slots already filled?)
  • Does your budget allow to install SSD cards? (You may still need more than one, since they are all relatively small.  There are already appliances on the market which use 8 Intel SSD devices).
  • Is a 2x or 5x improvement enough?  There are more wins to be had if you can afford to buy all the memory that is required.

The workload here is designed to keep as much of the data hot as possible, but I guess the main lesson here is not to underestimate the size of your “active set” of data.  For some people who just append data to some sort of logging table it may only need to be a small percentage – but in other cases it can be considerably higher.  If you don’t know what your working set is – ask us!

Important note: This graph and these results are valid only for sysbench uniform. In your particular workload the points B and C may be located in differently.

Raw results:

Buffer pool, GB FusionIO Intel SSD RAID 10
2 450.3 186.33 80.67
4 538.19 230.35 99.73
6 608.15 268.18 121.71
8 679.44 324.03 201.74
10 769.44 407.56 252.84
12 855.89 511.49 324.38
14 976.74 664.38 429.15
16 1127.23 836.17 579.29
18 1471.98 1236.9 934.78
20 2536.16 2485.63 2486.88
22 2433.13 2492.06 2448.88

Entry posted by Vadim | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Missleading Innodb message on recovery

Февраль 16th, 2010

As I wrote about 2 years ago the feature of Innodb to store copy of master’s position in Slave’s Innodb tablespace got broken. There is a lot of discussions at the corresponding bug report while outcome of the fix remained uncertain for me (the bug is market duplicate while the bugs it seems to be duplicate for describe different issues).
Anyway. The customer came to me today having the following message in the error log after Slave crash while running MySQL 5.1.41 (extra virgin version without XtraDB or Percona Patches)

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 10000000, file name mysql-bin.005000
InnoDB: Last MySQL binlog file position 0 20000000, file name ./mysql-bin.003000

He has tried to restart replication from mysql-bin.005000 position 10000000 which failed with “Could not find first log file name in binary log index file” error message

Looking at the Master for this slave I could see its binary log files going only to about 2000, so the binary log file mentioned did not ever exist on this master. What is going on ?
The thing is Innodb does not update this information any more however if it is stored in the tablespace the code is still there to print it. This database was running older MySQL version a while back which was updated to MySQL 5.1 months ago, moved to the new hardware by physical copy and the log file numbers restarted back from 1 but tablespace still contained the ancient data.

I reported the bug on this which should be easy to fix. Otherwise it is easy mistake to make. We also have a patch which restores this information and uses it on slave crash recovery.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Why Swapping is bad for MySQL Performance ?

Январь 19th, 2010

So you get MySQL or other applications using too much memory on the box or OS behaving funny and using more memory for cache and pushing application to the swap. This causes swapping and causes performance problems. This much is obvious. But how bad is it ? Should you count it same as normal Disk IO as the box is having or is it worse than that ?

Swapping is going to impact your performance more than just normal IO and here are 3 reasons why. If you know more please let me know, for my taste these 3 are bad enough so I have not been looking for more.

Cache in the Swap File will multiply IO compared to just having less cache. What happens when page in cache is replaced which is swapped out itself ? First you have to find space to swap in the page (we’re speaking about memory pressure right?) which means swapping out some page. This would normally happen in background but still it has to be done. When the pages is swapped in which is second IO and finally you get page being cached read from the hard drive to the cache. This gives us 3 IOs instead of one. Nice :)
Flushing dirty pages or even discarding the page will cause extra IO slowing things down.

Skewing up all Algorithms The database internals algorithms are tuned for things being in memory and if they start dealing with data which is on disk they just often stop working with any reasonable level of efficiently – when database deals with on disk data it often uses different set of algorithms which are optimized to limit number of IOs or make them more sequential. Most of these were designed before SSD era. For example Insert Buffer in MySQL makes a special effort to avoid (delay) IO. If it happens to go to swap file it will more than defeat its purpose. Number of background threads are designed with assumptions they can check page statuses in buffer pool very efficiently which again stops working as soon as page accesses cause disk IO.

Escalated Locking/Latching If breaking of internal operations is not bad enough on its own lets see what swapping does for concurrent (multi CPU, multi client) processing. Database Locks/Latches are typically designed to be held for as short time as possible. The less portion of execution time thread spends holding exclusive locks the better system will scale. It is a big no-no to hold any critical locks while you’re doing disk IO as IOs take a long time. When system is swapping all of these gets messed up – when database is thinking it is taking the locks for few instructions only it can be a long while while IO completes – if this is critical lock it is possible to see everything in the system waiting on this IO, even transactions which work with data which is not in the swap file.

The bottom line: You should configure system so no swapping activity is going on during normal operations. The swap file itself may be justified – if you have some unexpected memory consumption spike you may prefer to see slowdown instead of MySQL being killed because of out of memory but do react on them promptly and do not treat such situation as normal.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Production scripts: sign me up for obfuscation

Январь 2nd, 2010
If there’s one thing that will always make me angry, it’s people that should not be editing my code going and editing my code. If you want to change something on the server and you have sudo privileges please let the real admin know beforehand. I don’t mind people improving processes or scripts but if [...]
PlanetMySQL Voting: Vote UP / Vote DOWN

Verify master-master[||-slave] data consistency without locking or downtime

Август 31st, 2009

We all knew that we are risking with MMM. Risking, and placing availability as a more important like consistency.  But non of us can risk loosing data forever but we show using it, regarding to our conversations think:  "I can fix my data later on, but I can’t turn back time and prevent the downtime. (Pascal Hoffman@xaprb.com)".

As I wrote before about staying online, now let me write about how to stay consistent.

We all know, mmm is not like a key of salvation, but its getting close to it :) . While MySQL doesn't support multi-master-slave environments from it's source code, we will sleep badly wondering on the safety of our precious databases.

But its not just about MMM, a few days ago we ran in to a well known InnoDB "feature". Its about the auto increment counter determination on restart. InnoDB try to count the next auto increment value on MySQL restart what can screw up things in the replication as in your data integrity too (http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html) what could be risky when you are about to use mmm, because you can restart your masters whenever you just want to.

In this post I'm using maatkit to verify and restore the rows without locking or downtime.

Maatkit:

This is not a success of MMM, but success of maatkit.
I don't want to write about maatkit. Everybody who is able to use MMM were met with maatkit. Great stuff and if you didn't met with it yet, this is the right time. http://www.maatkit.org

Verification between your masters:

This is the hardest part, you CAN'T and/or you SHOULD NOT to avoid manual overview.

I'm using mk-table-sync for this scenerio. As the about say: mk-table-sync finds and fixes data differences between MySQL tables.
This is what we are going to do, but lets consider the possibilities. If we goes trough the help menu, we can see there is a lots of options. Worth going trough of them.

Here is the command I used to execute.

mk-table-sync --chunksize 100 --databases mytestdb --lock 1 --transaction --skipbinlog --print  db02 db01

And here is the explain:

--chunksize 100 : Chunksize means the size of the chunks maatkit compare to each other. Mean rows. In this command maatkit compare the tables by chunks which are contain 100 rows. Its an affordable size, Its wont be faster and as I benchmarked neider become slower, but as you will see later, its start a transaction and lock the table. More about locking and transactions later but to understand why I recommend a small chunksize, you have to know that this means that the locking of the table is it in will be a short lock.

--database mytestdb: Allows you to give a comma-separated list of databases should be taken care of. In this example I use only one database but as I mentioned you can pass a list of databases you want to scan or what you don't want to scan.

--lock 1: As I mentioned before, maatkit provide you two kind of "locking". The first one is a real table lock and the second one is the way of using transactions. With innodb tables using transaction is a good choice, more about later, but with MyIsam tables, I think the best way is lock 1 regarding to what I want to do here, syncing online masters to each other. mk-table-sync provide 3 different ways of locking here: "0=none, 1=per sync cycle, 2=per table, or 3=globally" as far as I know, 0 means no locks, thats not a good way if you are about to sync online masters under traffic. Locking each table could be a wrong decision too, in case of a larger table, and globally is the same under usage. So this is why I just picked lock 1 what will lock while it's take care of my chunk (this is why I lowered the chunk size to 100).

--transaction: This is what i was talking about above. Instead of locking, mk-table-sync able to use transactions . Starting a transaction with a larger chunk size could cause lock-wait-timeout messages as locking can. This is why there is a small chunk size again. You have to listen to your isolation level too. As you know, its could be unsafe under read-committed.

--skipbinlog: This is the most important part of it. In a multi-master environment syncing without this option is like a suicide. With this, mk-table-sync will execute the fixing queries without logging in to binlog. I should not to mention, that logging to binlog on the slave could really mess up your databases as the other master will replicate and execute it.

--print: mk-table-sync provide three different (and some of them able to combine) ways to run. First I would recommend is --print. Print will just print to the stdout (what you can forward to file with 1>/your/file/path) but when you redirect the stdout, don't forget when using with --ask-pass that this message won't show up for your :) . You can use --execute instead. That will execute the queries immediately. I'm always scared by that a little bit so I rather get the output, go over it and execute what I think I should.

db02/db01: In my example these two hosts are the writable master (at first) and the passive master as a second one.You can use whatever hosts you want. Commands will be executed on the second host.

What to do with your output: When its done and its find differences, you can find where ever your stdout goes. In thet file or on your screen you can see the queries what mk-table-sync would execute to make your passive master consistent. Now its up to you what to do with this information. You can execute each of the or re-run the sync with --execute instead of --print.

Syncing slaves

If you done with your masters, you can sync the slaves on the same way, but you can use the already consistent passive master of your to sync from. Both maatkit and mmm provide some very useful scripts to rebuild slaves and you always got a passive master to do from. With this you are free to rebuild your cluster whenever you want.


PlanetMySQL Voting: Vote UP / Vote DOWN