Archive for the ‘benchmarks’ Category

Testing Intel SSD 520

Май 2nd, 2012

Following my previous benchmarks of SATA SSD cards I got Intel SSD 520 240GB into my hands. In this post I show the results of raw IO performance of this card.

The benchmark methodology I described in previous posts, so let me jump directly to results.

First case is random write asynchronous 8 threads IO, the test is done just after a secure erase operation on the card.

The card is doing stable 380 MiB/sec level, but after around 4000 sec, as garbage collector kicks in, we see a performance drop to around 300 MiB/sec with some instability, which I will research in later charts.

Now, random reads, still asynchronous

It gives almost stable 370 MiB/sec throughput, with some strange small periodic drops.

To better understand response time ranges, we need to switch to synchronous IO and vary amount of threads.

Throughput:

And response times:

We still see small hiccups in throughput and response times even for small amount of threads.
For 8 threads the 95% response time is 0.69ms.

Now let me get back to random write case. I will try synchronous IO varying amount of threads and with measurements every 1 sec to see how bad are drops.

So there is more or less stable performance only for 1 thread. For 2 or more, the throughput varies a lot from second to second. I draw boxplots, which show 25-50-75 percentiles. So there is no grow in throughput after 2 threads, and the result averages at 300 MiB/sec.

I am still interesting in asynchronous IO, as MySQL 5.5 uses async IO for writes. Maybe 8 threads in the first graph is too much and we should go with 1 thread?

So even with 1 async write thread the throughput jumps a lot in range 200 – 400 MiB/sec.

As conclusion, I should say that 300 MiB/sec level for random reads and writes is very decent result for SATA card. I think with this performance SATA is getting closer to level of PCIe cards. Of course PCIe still provides better numbers, but the question is how much MySQL can use. In his keynote Mark Callaghan mentioned that Fusion-io cards they use are highly underutilized.

With the performance variance we see it is a good question how does it affect MySQL performance, and I am going to run some MySQL workloads on these cards to understand it better.

If you are interested more in SSD and MySQL questions – I will be giving a webinary “MySQL and SSD” on May-9. It will be the same as my talk on Percona Live MySQL Conference 2012, if you did not attend my talk – you are welcome to join the webinar.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Testing STEC SSD MACH16 200GB SLC

Апрель 26th, 2012

Following my previous benchmark of Samsung 830, today I want to show results for STEC MACH16 SATA card, 200GB size, this card is based on SLC, and regarding STEC website, it is an enterprise grade storage.

For tests I use sysbench fileio, 16KiB block size (to match workload from InnoDB, as this is primary usage for me), and recently I switched to use async IO mode. There are two reasons for that. First, MySQL/InnoDB uses async writes, so this will emulate database load, and second, async mode allows to see maximal possible throughput, it does not show reliable latency though, as it appears there is no a reliable way in the Linux asynchronous IO library to get time metrics for particular IO block.

so my testing command line looks like:

sysbench --test=fileio --file-total-size=${size}G --file-test-mode=rndwr --max-time=18000 --max-requests=0 --num-threads=$numthreads --rand-init=on --file-num=64 --file-io-mode=async --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=10 run

You may see I gather metrics every 10 sec to see how stable the performance is, and it really helps to observe some artifacts, as you will see in following graphs.

Hardware for tests: HP ProLiant DL380 G6, filesystem: ext4, mounted with nobarrier.

The results for random write case (8 async IO threads):

In general it shows stable throughput topping to 148 MiB/sec, but every 20 sec, there is small drop to 87 MiB/sec, which I guess is related to internal garbage collector activity.

The results for random read case:

Very stable throughput on line 222 MiB/sec

To understand better what kind of response time we should expect, I ran random read sync IO mode, now for 1-64 threads.

The throughput:

We are getting to the peak throughput at 8 threads.

And response time:

For 8 threads, we may expect 0.62ms response time.

In general I have very good experience with this card, and it seems suitable to work with MySQL. I will publish sysbench oltp benchmarks running MySQL on RAID10 over 4 STEC MACH16 cards.

If you are interested more in SSD and MySQL questions – I will be giving a webinary “MySQL and SSD” on May-9. It will be the same as my talk on Percona Live MySQL Conference 2012, if you did not attend my talk – you are welcome to join the webinar.

Disclaimer: This benchmark is done as part of consulting work for STEC, but this post is totally independent and fully reflects our opinion.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Testing Samsung SSD SATA 256GB 830 – not all SSD created equal

Апрель 25th, 2012

I personally like PCIe based Flash, but from a pricing point our customers are looking for cheaper alternatives. SATA SSD is an options. There is many products based on MLC technology, and Intel 320 I would say is the most popular. I do not particularly like its write performance – I wrote about it before, that’s why I am looking for comparable alternatives. Samsung 830 256GB looked like a good product, that’s why I decided to test it.

For tests I use sysbench fileio, 16KiB block size (to match workload from InnoDB, as this is primary usage for me), and recently I switched to use async IO mode. There are two reasons for that. First, MySQL/InnoDB uses async writes, so this will emulate database load, and second, async mode allows to see maximal possible throughput, it does not show reliable latency though, as it appears there is no a reliable way in the Linux asynchronous IO library to get time metrics for particular IO block.

so my testing command line looks like:

sysbench --test=fileio --file-total-size=${size}G --file-test-mode=rndwr --max-time=18000 --max-requests=0 --num-threads=$numthreads --rand-init=on --file-num=64 --file-io-mode=async --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=10 run

You may see I gather metrics every 10 sec to see how stable the performance is, and it really helps to observe some artifacts, as you will see in following graphs.

Hardware for tests: HP ProLiant DL380 G6, filesystem: ext4, mounted with nobarrier.

The results for random write case (8 async IO threads):

It seems that InnoDB is not alone with its flashing problems. You can see there periodical stalls in throughput (0 throughput for 20-30 sec period of time). When there is no drops, the drive keep write throughput on 323 MiB/sec level.

I really thought that these stalls are related, so I was totally surprised them in random reads also.
The results for random read case:

I do not have a good explanation for this. When there is no drop, the drive keeps 375 MiB/sec throughput. I may do a wild guess about drops – the drive periodically cleans an internal cache or something.

To understand better what kind of response time we should expect, I ran random read sync IO mode, now for 1-64 threads.

The throughput:

We are getting to the peak throughput at 16-32 threads.

And response time:

For 16 threads, we may expect 0.96ms response time, which increases to 1.62ms under 32 threads.

The periodic drops that I observe for both random reads and random writes do not allow me to recommend this drive for a database server usage, even in general this drive provides much better throughput than Intel 320 (some results for Intel 320).

If you are interested more in SSD and MySQL questions – I will be giving a webinary “MySQL and SSD” on May-9. It will be the same as my talk on Percona Live MySQL Conference 2012, if you did not attend my talk – you are welcome to join the webinar.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Benchmarks challenges of XtraDB Cluster

Апрель 20th, 2012

We are running internally a lot of benchmarks on our recently announced Percona XtraDB Cluster, and I am going to publish these results soon.
But before that I wanted to mention that proper benchmark of distributed system comes with a lot of challenges.
I am saying that not to complain, but to make sure, if you are going to benchmark XtraDB Cluster yourself, there is a lot of things to take into account.

And it seems that one component, which was not much important before, now appears as critical peace, which easily can became bottleneck in the benchmarks – this is network.

In case of simple client-server setup, the network is not fully utilized.

But as we start testing a cluster setup, the 1Gb network between client and switch is getting fully utilized by sysbench communication with 3 nodes.

In this setup it does not make sense to increase number of nodes, as we will not be able to load them properly.

The solution would be to increase network capacity or add additional client boxes.

Now take into account that there is an internal network communication between nodes also, and that makes a network tuning as the critical part of a cluster setup. This is not something we paid much attention before.

The main conclusion of this post is that if you are going to benchmark a Percona XtraDB Cluster or just use it under intensive communication workload, pay an attention to network component. It is very easy that a client or a client network becomes bottleneck.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Join Optimizations in MySQL 5.6 and MariaDB 5.5

Апрель 5th, 2012

This is the third blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is targeted at the join related optimizations introduced in the optimizer. These optimizations are available in both MySQL 5.6 and MariaDB 5.5, and MariaDB 5.5 has introduced some additional optimizations which we will also look at, in this post.

Now let me briefly explain these optimizations.

Batched Key Access

Traditionally, MySQL always uses Nested Loop Join to join two or more tables. What this means is that, select rows from first table participating in the joins are read, and then for each of these rows an index lookup is performed on the second table. This means many point queries, say for example if table1 yields 1000 rows then 1000 index lookups are performed on table2. Of course this could mean a lot of random lookups in case the dataset does not fit into memory or if the caches are not warm, this also does not take into account locality of access, as you could be reading the same page multiple times. Consider if you have a small buffer pool and a very large number of rows from table2 that do not fit into the buffer pool, then worst case you could be reading the same page multiple times into the buffer pool.
So considering this drawback, Batched Key Access (BKA) optimization was introduced. When BKA is being used then, after the selected rows are read from table1, the values of indexed columns that will be used to perform a lookup on table2 are batched together and sent to the storage engine. The storage engine then uses the MRR interface (which I explained in my previous post) to lookup the rows in table2. So this means that we have traded many point index lookups to one or more index range lookups. This means MySQL can employ many other optimizations like for example if columns other then the secondary key columns are required to be fetched, then MRR interface can instead access the PK by arranging the secondary key values by PK column and then performing the lookup, and then there are other possibilities like InnoDB doing read_ahead by noticing the sequential access pattern.
BKA is available in both MySQL 5.6 and MariaDB 5.5. You can read more about BKA in MySQL 5.6 here and BKA in MariaDB 5.5 here.

However, MariaDB 5.5 has one additional optimization that is used in conjunction with BKA and that is MRR Key-ordered Scan. Let’s see what this optimization actually is.

Key-ordered Scan for BKA

With this optimization the idea of MRR is further extended to improve join performance. As I told you above, when table t1 would be joined to table t2, then selected rows from t1 would be read and then for all rows, index lookup would be performed on t2. Here is where Key-ordered scan comes in. With BKA turned on, the index lookups to t2 are batched together and sent to the MRR interface. Now when the MRR interface is going to perform the lookup on t2 by a secondary key on the common join column, let’s suppose col1, then with Key-ordered scan, the secondary key on col1 would be sorted by col1 i.e. in index order and then the lookup will be performed. So key-ordered scan is basically an extension of MRR to secondary keys. You can read more about Key-ordered Scan for BKA here.

Now there is one additional join optimization in MariaDB that I would like to quickly explain here, and that is Hash Joins.

Hash Join

As I have told before MySQL has only supported one join algorithm and that is Nested Loop Join. MariaDB has introduced a new join algorithm Hash Join. This join algorithm only works with equi-joins. Now let me briefly explain how hash join algorithm works. Suppose you have two tables t1 and t2, that you wish to join together in an equi-join, then the way hash join algorithm will operate is that first a hash table will be created based on the columns of table t1 that are two be used to join rows with table t2. This step is called the build step. After the hash table has been created, rows from table t2 are read and hash function is applied to the columns participating in the join condition and then a hash table lookup is performed, on the hash table we created earlier. This step is known as probe step. This join algorithm works best for highly selective queries, and obviously when the first operand used in the build step is such that the hash table fits in memory. You can read more about the hash join algorithm here.

Now let’s move on to the benchmarks, to see the difference in numbers.

Benchmark results

For the purpose of this benchmark, I have used TPC-H Query #3 and ran it on TPC-H dataset (InnoDB tables) with a Scale Factor of 2 (InnoDB dataset size ~5G). Note that query cache is disabled during these benchmark runs and that the disks are 4 5.4K disks in Software RAID5.

Also note that the following changes were made on MySQL 5.6 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='batched_key_access=on'
join_buffer_size=6M
read_rnd_buffer_size=6M

Also note that the following changes were made on MariaDB 5.5 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=8
join_buffer_size=6M
mrr_buffer_size=6M

Note that I have turned off ICP because I would like to see the affect of BKA on the query times. But I also had to turn on MRR because BKA uses the MRR interface. Note also the additional optimizer switches and variables in case of MariaDB 5.5. You can read more about these variables here.

The query used is:

select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
from
        customer,
        orders,
        lineitem FORCE INDEX (i_l_orderkey)
where
        c_mktsegment = 'AUTOMOBILE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < '1995-03-09'
        and l_shipdate > '1995-03-09'
group by
        l_orderkey,
        o_orderdate,
        o_shippriority
order by
        revenue desc,
        o_orderdate
LIMIT 10;

In-memory workload

Now let’s see how effective are the join optimizations when the workload fits entirely in memory. For the purpose of benchmarking in-memory workload, the InnoDB buffer pool size is set to 6G and the buffer pool was warmed up, so that the relevant pages were already loaded in the buffer pool. Note that as mentioned at the start of the benchmark results section, the InnoDB dataset size is ~5G. Now let’s take a look at the graph:

In the graph above the labels mean as follows:
MySQL 5.6 (2) is for MySQL 5.6 w/ buffer size 6M
MariaDB 5.5 (2) is for MariaDB 5.5 w/ buffer size 6M
MariaDB 5.5 (3) is for MariaDB 5.5 w/ buffer size 6M (Hash Join and Key-ordered Scan disabled)

You can see that the lowest query time is for MySQL 5.6 which takes 0.16s less as compared to MySQL 5.5 While with join_buffer_size set to 6M and read_rnd_buffer_size set to 6M, the query time for MySQL 5.6 becomes approximately equal to that of MySQL 5.5. MariaDB 5.5 is quite slow as compared to both MySQL 5.5 and MySQL 5.6. For MariaDB 5.5 the query time decreases when the join_buffer_size is set to 6M and mrr_buffer_size is set to 6M. While the lowest query time for MariaDB 5.5 is when both the Hash Join and the Key-ordered Scan are disabled.

IO bound workload

Now let’s see how effective are the join optimizations when the workload is IO bound. For the purpose of benchmarking IO bound workload, the InnoDB buffer pool size is set to 1G and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already. Now let’s take a look at the graph:

The labels in the graph above are same as in the graph for in-memory workload.

BKA makes a huge difference when the workload is IO bound and the query time drops from 2534.41s down to under a minute. MySQL 5.6 has the smallest query time when the join_buffer_size is set to 6M and the read_rnd_buffer_size is set to 6M. MariaDB 5.5 is close, yet slower by ~10s. Another thing that is important to know is that, just increasing the join_buffer_size is not going to provide the best possible query performance. As I mentioned at the start that BKA uses the MRR interface so you should also adjust/increase the size of the buffer that MRR uses appropriately. In my tests I noticed that with just the join_buffer_size increased to 6M, the query time was ~300s while when both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size were set to 6M, the query time dropped to ~40s. So the maximum possible benefit is when size of both the buffers is increased appropriately. Also in the chart above the bottom two bars correspond to MariaDB with Hash Join and Key-ordered Scan enabled, and MariaDB with Hash Join and Key-ordered Scan disabled, and the only difference in query time is 48.78s vs 48.91s, so I don’t see Hash Join and Key-ordered Scan making much of a difference here.

Now let’s take a look at the status counters.

Counter Name MySQL 5.5 MySQL 5.6 MySQL 5.6 w/ join_buffer_size=6M & read_rnd_buffer_size=6M MariaDB 5.5 MariaDB 5.5 w/ join_buffer_size=6M & mrr_buffer_size=6M MariaDB 5.5 Hash Join Disabled w/ join_buffer_size=4M & mrr_buffer_size=4M
Created_tmp_disk_tables 0 0 0 0 0 0
Created_tmp_tables 1 1 1 1 1 1
Handler_mrr_init N/A 112 3 1133 1 1
Handler_mrr_key_refills N/A N/A N/A 0 0 0
Handler_mrr_rowid_refills N/A N/A N/A 22 0 0
Handler_read_key 1829910 4440511 4372096 1801917 1790641 1805995
Handler_read_next 2651500 2628103 2586036 2628103 2592816 2615612
Handler_read_rnd_next 23078 22780 22757 22867 23049 23221
Innodb_buffer_pool_read_ahead 1152 23231 130919 23228 130731 131497
Innodb_buffer_pool_read_requests 12947073 10228611 7816154 10251697 9319281 9393396
Innodb_buffer_pool_reads 327963 332092 12889 335080 14067 13384
Innodb_data_read 5G 5.4G 2.2G 5.4G 2.2G 2.2G
Innodb_data_reads 329115 355323 143808 335526 16164 15506
Innodb_pages_read 329115 355323 143808 358308 144798 144881
Innodb_rows_read 4127318 6718351 6611675 6707882 5479445 5527245
Select_scan 1 1 1 1 1 1
Sort_scan 1 1 1 1 1 1

The first obvious improvement is shown by the high numbers of Innodb_buffer_pool_read_ahead when the buffers are sized appropriately, which shows that the IO access pattern has been changed to become sequential. The two other most important numbers are values for Innodb_buffer_pool_reads and Innodb_data_read. We can see that with appropriately sized buffers less no. of Innodb_buffer_pool_reads are done, and less amount of data is read from disk, in fact half the amount of data is read from disk 2.2G vs 5G. However, there is one number in MariaDB 5.5 that is quite large as compared to MySQL 5.6 and that is ‘Handler_mrr_init’. Is it because of this that the query on MariaDB 5.5 is slow as compared to MySQL 5.6. Next interesting thing are the last two columns of the table above and the values for ‘Handler_read_key’, ‘Handler_read_next’ and ‘Handler_read_rnd_next’. MariaDB 5.5 with Hash Joins enabled is doing less Handler reads as compared to when the Hash Joins are disabled, but that did not make much of a difference in the query times.

Conclusion

BKA improves the query time by a huge margin for IO bound workload but does not make much of a difference to in-memory workload. Also BKA relies on both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size and both of these buffers should be increased appropriately for the best possible performance gain. This is not entirely visible in the manual either for MariaDB or MySQL, but you need to appropriately increase read_rnd_buffer_size/mrr_buffer_size because these have an impact on MRR performance, and BKA uses the MRR interface, so these buffers indirectly impact BKA performance. I did not find much of a performance improvement from using Hash Join in MariaDB 5.5 or from Key-ordered Scan for TPCH query #3, in fact disabling both of these provided the best result for MariaDB 5.5 for in-memory workload.

I intend to run tests to see what specific types of queries would benefit from Hash Join as compared to Nested Loop Join, but for now it looks like Nested Loop Join is a much better general purpose join algorithm.


PlanetMySQL Voting: Vote UP / Vote DOWN

Join Optimizations in MySQL 5.6 and MariaDB 5.5

Апрель 5th, 2012

This is the third blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is targeted at the join related optimizations introduced in the optimizer. These optimizations are available in both MySQL 5.6 and MariaDB 5.5, and MariaDB 5.5 has introduced some additional optimizations which we will also look at, in this post.

Now let me briefly explain these optimizations.

Batched Key Access

Traditionally, MySQL always uses Nested Loop Join to join two or more tables. What this means is that, select rows from first table participating in the joins are read, and then for each of these rows an index lookup is performed on the second table. This means many point queries, say for example if table1 yields 1000 rows then 1000 index lookups are performed on table2. Of course this could mean a lot of random lookups in case the dataset does not fit into memory or if the caches are not warm, this also does not take into account locality of access, as you could be reading the same page multiple times. Consider if you have a small buffer pool and a very large number of rows from table2 that do not fit into the buffer pool, then worst case you could be reading the same page multiple times into the buffer pool.
So considering this drawback, Batched Key Access (BKA) optimization was introduced. When BKA is being used then, after the selected rows are read from table1, the values of indexed columns that will be used to perform a lookup on table2 are batched together and sent to the storage engine. The storage engine then uses the MRR interface (which I explained in my previous post) to lookup the rows in table2. So this means that we have traded many point index lookups to one or more index range lookups. This means MySQL can employ many other optimizations like for example if columns other then the secondary key columns are required to be fetched, then MRR interface can instead access the PK by arranging the secondary key values by PK column and then performing the lookup, and then there are other possibilities like InnoDB doing read_ahead by noticing the sequential access pattern.
BKA is available in both MySQL 5.6 and MariaDB 5.5. You can read more about BKA in MySQL 5.6 here and BKA in MariaDB 5.5 here.

However, MariaDB 5.5 has one additional optimization that is used in conjunction with BKA and that is MRR Key-ordered Scan. Let’s see what this optimization actually is.

Key-ordered Scan for BKA

With this optimization the idea of MRR is further extended to improve join performance. As I told you above, when table t1 would be joined to table t2, then selected rows from t1 would be read and then for all rows, index lookup would be performed on t2. Here is where Key-ordered scan comes in. With BKA turned on, the index lookups to t2 are batched together and sent to the MRR interface. Now when the MRR interface is going to perform the lookup on t2 by a secondary key on the common join column, let’s suppose col1, then with Key-ordered scan, the secondary key on col1 would be sorted by col1 i.e. in index order and then the lookup will be performed. So key-ordered scan is basically an extension of MRR to secondary keys. You can read more about Key-ordered Scan for BKA here.

Now there is one additional join optimization in MariaDB that I would like to quickly explain here, and that is Hash Joins.

Hash Join

As I have told before MySQL has only supported one join algorithm and that is Nested Loop Join. MariaDB has introduced a new join algorithm Hash Join. This join algorithm only works with equi-joins. Now let me briefly explain how hash join algorithm works. Suppose you have two tables t1 and t2, that you wish to join together in an equi-join, then the way hash join algorithm will operate is that first a hash table will be created based on the columns of table t1 that are two be used to join rows with table t2. This step is called the build step. After the hash table has been created, rows from table t2 are read and hash function is applied to the columns participating in the join condition and then a hash table lookup is performed, on the hash table we created earlier. This step is known as probe step. This join algorithm works best for highly selective queries, and obviously when the first operand used in the build step is such that the hash table fits in memory. You can read more about the hash join algorithm here.

Now let’s move on to the benchmarks, to see the difference in numbers.

Benchmark results

For the purpose of this benchmark, I have used TPC-H Query #3 and ran it on TPC-H dataset (InnoDB tables) with a Scale Factor of 2 (InnoDB dataset size ~5G). Note that query cache is disabled during these benchmark runs and that the disks are 4 5.4K disks in Software RAID5.

Also note that the following changes were made on MySQL 5.6 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='batched_key_access=on'
join_buffer_size=6M
read_rnd_buffer_size=6M

Also note that the following changes were made on MariaDB 5.5 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=8
join_buffer_size=6M
mrr_buffer_size=6M

Note that I have turned off ICP because I would like to see the affect of BKA on the query times. But I also had to turn on MRR because BKA uses the MRR interface. Note also the additional optimizer switches and variables in case of MariaDB 5.5. You can read more about these variables here.

The query used is:

select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
from
        customer,
        orders,
        lineitem FORCE INDEX (i_l_orderkey)
where
        c_mktsegment = 'AUTOMOBILE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < '1995-03-09'
        and l_shipdate > '1995-03-09'
group by
        l_orderkey,
        o_orderdate,
        o_shippriority
order by
        revenue desc,
        o_orderdate
LIMIT 10;

In-memory workload

Now let’s see how effective are the join optimizations when the workload fits entirely in memory. For the purpose of benchmarking in-memory workload, the InnoDB buffer pool size is set to 6G and the buffer pool was warmed up, so that the relevant pages were already loaded in the buffer pool. Note that as mentioned at the start of the benchmark results section, the InnoDB dataset size is ~5G. Now let’s take a look at the graph:

In the graph above the labels mean as follows:
MySQL 5.6 (2) is for MySQL 5.6 w/ buffer size 6M
MariaDB 5.5 (2) is for MariaDB 5.5 w/ buffer size 6M
MariaDB 5.5 (3) is for MariaDB 5.5 w/ buffer size 6M (Hash Join and Key-ordered Scan disabled)

You can see that the lowest query time is for MySQL 5.6 which takes 0.16s less as compared to MySQL 5.5 While with join_buffer_size set to 6M and read_rnd_buffer_size set to 6M, the query time for MySQL 5.6 becomes approximately equal to that of MySQL 5.5. MariaDB 5.5 is quite slow as compared to both MySQL 5.5 and MySQL 5.6. For MariaDB 5.5 the query time decreases when the join_buffer_size is set to 6M and mrr_buffer_size is set to 6M. While the lowest query time for MariaDB 5.5 is when both the Hash Join and the Key-ordered Scan are disabled.

IO bound workload

Now let’s see how effective are the join optimizations when the workload is IO bound. For the purpose of benchmarking IO bound workload, the InnoDB buffer pool size is set to 1G and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already. Now let’s take a look at the graph:

The labels in the graph above are same as in the graph for in-memory workload.

BKA makes a huge difference when the workload is IO bound and the query time drops from 2534.41s down to under a minute. MySQL 5.6 has the smallest query time when the join_buffer_size is set to 6M and the read_rnd_buffer_size is set to 6M. MariaDB 5.5 is close, yet slower by ~10s. Another thing that is important to know is that, just increasing the join_buffer_size is not going to provide the best possible query performance. As I mentioned at the start that BKA uses the MRR interface so you should also adjust/increase the size of the buffer that MRR uses appropriately. In my tests I noticed that with just the join_buffer_size increased to 6M, the query time was ~300s while when both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size were set to 6M, the query time dropped to ~40s. So the maximum possible benefit is when size of both the buffers is increased appropriately. Also in the chart above the bottom two bars correspond to MariaDB with Hash Join and Key-ordered Scan enabled, and MariaDB with Hash Join and Key-ordered Scan disabled, and the only difference in query time is 48.78s vs 48.91s, so I don’t see Hash Join and Key-ordered Scan making much of a difference here.

Now let’s take a look at the status counters.

Counter Name MySQL 5.5 MySQL 5.6 MySQL 5.6 w/ join_buffer_size=6M & read_rnd_buffer_size=6M MariaDB 5.5 MariaDB 5.5 w/ join_buffer_size=6M & mrr_buffer_size=6M MariaDB 5.5 Hash Join Disabled w/ join_buffer_size=4M & mrr_buffer_size=4M
Created_tmp_disk_tables 0 0 0 0 0 0
Created_tmp_tables 1 1 1 1 1 1
Handler_mrr_init N/A 112 3 1133 1 1
Handler_mrr_key_refills N/A N/A N/A 0 0 0
Handler_mrr_rowid_refills N/A N/A N/A 22 0 0
Handler_read_key 1829910 4440511 4372096 1801917 1790641 1805995
Handler_read_next 2651500 2628103 2586036 2628103 2592816 2615612
Handler_read_rnd_next 23078 22780 22757 22867 23049 23221
Innodb_buffer_pool_read_ahead 1152 23231 130919 23228 130731 131497
Innodb_buffer_pool_read_requests 12947073 10228611 7816154 10251697 9319281 9393396
Innodb_buffer_pool_reads 327963 332092 12889 335080 14067 13384
Innodb_data_read 5G 5.4G 2.2G 5.4G 2.2G 2.2G
Innodb_data_reads 329115 355323 143808 335526 16164 15506
Innodb_pages_read 329115 355323 143808 358308 144798 144881
Innodb_rows_read 4127318 6718351 6611675 6707882 5479445 5527245
Select_scan 1 1 1 1 1 1
Sort_scan 1 1 1 1 1 1

The first obvious improvement is shown by the high numbers of Innodb_buffer_pool_read_ahead when the buffers are sized appropriately, which shows that the IO access pattern has been changed to become sequential. The two other most important numbers are values for Innodb_buffer_pool_reads and Innodb_data_read. We can see that with appropriately sized buffers less no. of Innodb_buffer_pool_reads are done, and less amount of data is read from disk, in fact half the amount of data is read from disk 2.2G vs 5G. However, there is one number in MariaDB 5.5 that is quite large as compared to MySQL 5.6 and that is ‘Handler_mrr_init’. Is it because of this that the query on MariaDB 5.5 is slow as compared to MySQL 5.6. Next interesting thing are the last two columns of the table above and the values for ‘Handler_read_key’, ‘Handler_read_next’ and ‘Handler_read_rnd_next’. MariaDB 5.5 with Hash Joins enabled is doing less Handler reads as compared to when the Hash Joins are disabled, but that did not make much of a difference in the query times.

Conclusion

BKA improves the query time by a huge margin for IO bound workload but does not make much of a difference to in-memory workload. Also BKA relies on both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size and both of these buffers should be increased appropriately for the best possible performance gain. This is not entirely visible in the manual either for MariaDB or MySQL, but you need to appropriately increase read_rnd_buffer_size/mrr_buffer_size because these have an impact on MRR performance, and BKA uses the MRR interface, so these buffers indirectly impact BKA performance. I did not find much of a performance improvement from using Hash Join in MariaDB 5.5 or from Key-ordered Scan for TPCH query #3, in fact disabling both of these provided the best result for MariaDB 5.5 for in-memory workload.

I intend to run tests to see what specific types of queries would benefit from Hash Join as compared to Nested Loop Join, but for now it looks like Nested Loop Join is a much better general purpose join algorithm.


PlanetMySQL Voting: Vote UP / Vote DOWN

Multi Range Read (MRR) in MySQL 5.6 and MariaDB 5.5

Март 21st, 2012

This is the second blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is aimed at the optimizer enhancement Multi Range Read (MRR). Its available in both MySQL 5.6 and MariaDB 5.5

Now let’s take a look at what this optimization actually is and what benefits it brings.

Multi Range Read

With traditional secondary index lookups, if the columns that are being fetched do not belong to the secondary index definition (and hence covering index optimization is not used), then primary key lookups have to be performed for each secondary key entry fetched. This means that secondary key lookups for column values that do not belong to the secondary index definition can result in a lot of Random I/O. The purpose of MRR is to reduce this Random I/O and make it more sequential, by having a buffer in between where secondary key tuples are buffered and then sorted by the primary key values, and then instead of point primary key lookups, a range lookup is performed on the primary key by using the sorted primary key values.

Let me give you a simple example. Suppose you have the following query executed on the InnoDB table:

SELECT non_key_column FROM tbl WHERE key_column=x

This query will roughly be evaluated in following steps, without MRR:

  1. SELECT key_column, pk_column FROM tbl WHERE key_column=x ORDER BY key_column
    (Note that secondary keys in InnoDB contain primary key columns)
  2. For each pk_column value in step 1 do:
    SELECT non_key_column FROM tbl WHERE pk_column=val

As you can see that the values returned from Step 1 are sorted by the secondary key column ‘key_column’, and then for each value of ‘pk_column’ which is a part of the secondary key tuple, a point primary key lookup is made against base table, the number of these point primary key lookups will be depend on the number of rows that match the condition ‘key_column=x’. You can see that there are a lot of random primary key lookups made.

With MRR, then steps above are changed to the following:

  1. SELECT key_column, pk_column FROM tbl WHERE key_column=x ORDER BY key_column
    (Note that secondary keys in InnoDB contain primary key columns)
  2. Buffer each pk_column value fetched from step 1, and when the buffer is full sort them by pk_column, and do a range primary key lookup as follows:
    SELECT non_key_column from tbl WHERE pk_column IN (…)

As you can see by utilizing the buffer for sorting the secondary key tuples by pk_column, we have converted a lot of point primary key lookups to one or more range primary key lookup. Thereby, converting Random access to one or more sequential access. There is one another interesting thing that has come up here, and that is the importance of the size of the buffer used for sorting the secondary key tuples. If the buffer size is large enough only a single range lookup will be needed, however if the buffer size is small as compared to the combined size of the secondary key tuples fetched, then the number of range lookups will be:

CEIL(S/N)
where,
S is the combined size of the secondary key tuples fetched, and
N is the buffer size.

In MySQL 5.6 the buffer size used by MRR can be controlled by the variable read_rnd_buffer_size, while MariaDB introduces a different variable to control the MRR buffer size mrr_buffer_size. Both buffer sizes default to 256K in MySQL 5.6 and MariaDB 5.5 respectively, which might be low depending on your scenario.

You can read more about the MRR optimization available in MySQL 5.6 here:
http://dev.mysql.com/doc/refman/5.6/en/mrr-optimization.html
and as available in MariaDB 5.5 here:
http://kb.askmonty.org/en/multi-range-read-optimization

Now let’s move on to the benchmarks, to see the difference in numbers.

Benchmark results

For the purpose of this benchmark, I have used TPC-H Query #10 and ran it on TPC-H dataset (InnoDB tables) with a Scale Factor of 2 (InnoDB dataset size ~5G). I did not use Scale Factor of 40 (InnoDB dataset size ~95G), because the query was taking far too long to execute, ~11 hours in case of MySQL 5.5 and ~5 hours in case of MySQL 5.6 and MariaDB 5.5. Note that query cache is disabled during these benchmark runs and that the disks are 4 5.4K disks in Software RAID5.

Also note that the following changes were made in the MySQL config:
optimizer_switch=’index_condition_pushdown=off’
optimizer_switch=’mrr=on’
optimizer_switch=’mrr_sort_keys=on’ (only on MariaDB 5.5)
optimizer_switch=’mrr_cost_based=off’
read_rnd_buffer_size=4M (only on MySQL 5.6)
mrr_buffer_size=4M (only on MariaDB 5.5)

We have turned off ICP optimization for the purpose of this particular benchmark, because we want to see the individual affect of an optimization (where possible). Also note that we have turned off mrr_cost_based, this is because the cost based algorithm used to calculate the cost of MRR when the optimizer is choosing the query execution plan, is not sufficiently tuned and it is recommended to turn this off.

The query used is:

select
        c_custkey,
        c_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        c_acctbal,
        n_name,
        c_address,
        c_phone,
        c_comment
from
        customer,
        orders,
        lineitem,
        nation
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate >= '1993-08-01'
        and o_orderdate < date_add( '1993-08-01' ,interval '3' month)
        and l_returnflag = 'R'
        and c_nationkey = n_nationkey
group by
        c_custkey,
        c_name,
        c_acctbal,
        c_phone,
        n_name,
        c_address,
        c_comment
order by
        revenue desc
LIMIT 20;

In-memory workload

Now let's see how effective is MRR when the workload fits entirely in memory. For the purpose of benchmarking in-memory workload, the InnoDB buffer pool size is set to 6G and the buffer pool was warmed up, so that the relevant pages were already loaded in the buffer pool. Note that as mentioned at the start of the benchmark results section, the InnoDB dataset size is ~5G. Ok so now let's take a look at the graph:

MRR doesn't really make any positive difference to the query times for MySQL 5.6, when the workload fits entirely in memory, because there is no extra cost for memory access at random locations versus memory access at sequential locations. In fact there is extra cost added by the buffering step introduced by MRR, and hence, there is a slight increase in query time for MySQL 5.6, increase of 0.02s. But the query times for MariaDB 5.5 are greater than both MySQL 5.5 and MySQL 5.6

IO bound workload

Now let's see how effective is MRR when the workload is IO bound. For the purpose of benchmarking IO bound workload, the InnoDB buffer pool size is set to 1G and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already:

MRR does make a lot of difference when the workload is IO bound, the query time is decreased from ~11min to under a minute. The query time is reduced further when the buffer size is set to 4M. Note also that query time for MariaDB is still a little higher by a couple of seconds, when compared to MySQL 5.6.

Now let's take a look at the status counters.

MySQL Status Counters

These status counters were captured when performing the benchmark on IO bound workload, mentioned above.

Counter Name MySQL 5.5 MySQL 5.6 MySQL 5.6 w/ read_rnd_bufer_size=4M MariaDB 5.5 MariaDB 5.5 w/ mrr_buffer_size=4M
Created_tmp_disk_tables 1 1 1 1 1
Created_tmp_tables 1 1 1 1 1
Handler_mrr_init N/A 0 0 1 1
Handler_mrr_rowid_refills N/A N/A N/A 1 0
Handler_read_key 508833 623738 622184 508913 507516
Handler_read_next 574320 574320 572889 574320 572889
Handler_read_rnd_next 136077 136094 136366 136163 136435
Innodb_buffer_pool_read_ahead 0 20920 23669 20920 23734
Innodb_buffer_pool_read_requests 1361851 1264739 1235472 1263290 1235781
Innodb_buffer_pool_reads 120548 102948 76882 102672 76832
Innodb_data_read 1.84G 1.89G 1.53G 1.89G 1.53G
Innodb_data_reads 120552 123872 100551 103011 77213
Innodb_pages_read 120548 123868 100551 123592 100566
Innodb_rows_read 799239 914146 912318 914146 912318
Select_scan 1 1 1 1 1
Sort_scan 1 1 1 1 1
  • As you can see from the status counters above that both MySQL 5.6 and MariaDB 5.5 are reporting high numbers for Innodb_buffer_pool_read_ahead which shows that the access pattern was sequential and hence InnoDB decided to do read_ahead, while in MySQL 5.5 no read_ahead was done because the access pattern was not sequential. Another thing to note is that more read_ahead is done when the buffer size used by MRR, is set to 4M, which obviously means that the more index tuples that can fit in the buffer the more sequential the access pattern will be.
  • There is one MRR related variable introduced in MySQL 5.6 and MariaDB 5.5 Handler_mrr_init and another additional one introduced in MariaDB 5.5 Handler_mrr_rowid_refills. Handler_mrr_init is incremented when a MRR range scan is performed, but we can see its only incremented in MariaDB 5.5 and not in MySQL 5.6, is that because of a bug in MySQL 5.6 code? As MRR was used in both MySQL 5.6 and MariaDB 5.5. Handler_mrr_rowid_refills counts how many times the buffer used by MRR had to be reinitialized, because the buffer was small and not all index tuples could fit in the buffer. If this is > 0 then it means Handler_mrr_rowid_refills + 1 MRR range scans had to be performed. As in the table above you can with default buffer size of 256K, MariaDB 5.5 shows that Handler_mrr_rowid_refills = 1, which means the buffer is small and there were 2 MRR range scans needed. But with a buffer size of 4M, we can see that Handler_mrr_rowid_refills = 0, which means that the buffer was big enough and only 1 MRR range scan was needed, which is as efficient as it can be. This is also evident in the query times, which is lower by a couple of seconds when buffer size of 4M is used.
  • Another interesting thing to note is that MySQL 5.6 and MariaDB 5.5 are both reading more rows than MySQL 5.5, as can be seen by the numbers reported for the status counter Innodb_rows_read. While MySQL 5.6 is also reporting increased numbers for the counter Handler_read_key. This is because of how status counter values are incremented when index lookup is performed. As I explained at the start of the post that traditional index lookup (for non-index-only columns) involves, reading an index record, and then using the PK column value in the index record to make a lookup in the PK. Both these lookups are performed in a single call to the storage engine and the counters Handler_read_key and Innodb_rows_read are incremented by ONE. However, when MRR is used then there are two separate calls made to the storage engine to perform the index record read and then to perform the MRR range scan on the PK. This causes the counters Handler_read_key and Innodb_rows_read to be incremented by TWO. It does not actually mean that queries with MRR are performing badly. The interesting thing is that though both MariaDB and MySQL 5.6 are reporting high numbers for Innodb_rows_read, which is completely in line with how the counters behave with MRR, but the value for counter Handler_read_key is more or less the same for MariaDB 5.5 when compared to MySQL 5.5, and this does not make sense to me. Probably its due to a bug in how counter is calculated inside MariaDB?

Other Observations

Sometimes both for MariaDB 5.5 and MySQL 5.6, the optimizer chooses the wrong query execution plan. Let's take a look at what are the good and bad query execution plans.

a. Bad Plan

id      select_type     table   type    possible_keys   key     key_len ref     rows    filtered        Extra
1       SIMPLE  nation  ALL     PRIMARY NULL    NULL    NULL    25      100.00  Using temporary; Using filesort
1       SIMPLE  customer        ref     PRIMARY,i_c_nationkey   i_c_nationkey   5       dbt3.nation.n_nationkey 2123    100.00
1       SIMPLE  orders  ref     PRIMARY,i_o_orderdate,i_o_custkey       i_o_custkey     5       dbt3.customer.c_custkey 7       100.00  Using where
1       SIMPLE  lineitem        ref     PRIMARY,i_l_orderkey,i_l_orderkey_quantity      PRIMARY 4       dbt3.orders.o_orderkey  1       100.00  Using where

b. Good Plan

id      select_type     table   type    possible_keys   key     key_len ref     rows    filtered        Extra
1       SIMPLE  orders  range   PRIMARY,i_o_orderdate,i_o_custkey       i_o_orderdate   4       NULL    232722  100.00  Using where; Rowid-ordered scan; Using temporary; Using filesort
1       SIMPLE  customer        eq_ref  PRIMARY,i_c_nationkey   PRIMARY 4       dbt3.orders.o_custkey   1       100.00  Using where
1       SIMPLE  nation  eq_ref  PRIMARY PRIMARY 4       dbt3.customer.c_nationkey       1       100.00
1       SIMPLE  lineitem        ref     PRIMARY,i_l_orderkey,i_l_orderkey_quantity      PRIMARY 4       dbt3.orders.o_orderkey  2       100.00  Using where

So during cold query runs the optimizer would switch to using plan 'a', which does not involve MRR, and the query time for MySQL 5.6 and MariaDB 5.5 jumps to ~11min (this is the query time for MySQL 5.5) While when it sticks to plan 'b' for MySQL 5.6 and MariaDB 5.5, then query times remain under a minute. So when the correct query execution plan is not used, there is no difference in query times between MySQL 5.5 and MySQL 5.6/MariaDB 5.5 This is another area of improvement in the optimizer, as it is clearly a part of the optimizer's job to select the best query execution plan. I had noted a similar thing when benchmarking ICP, the optimizer made a wrong choice. It looks like that there is still improvement and changes needed in the optimizer's cost estimation algorithm.

MariaDB 5.5 expands the concept of MRR to improve the performance of secondary key lookups as well. But this works only with joins and specifically with Block Access Join Algorithms. So I am not going to cover it here, but will cover it in my next post which will be on Block Access Join Algorithms.

Conclusion

There is a huge speedup when the workload is IO bound, the query time goes down from ~11min to under a minute. The query time is reduced further when buffer size is set large enough so that the index tuples fit in the buffer. But there is no performance improvement when the workload is in-memory, in fact MRR adds extra sorting overhead which means that the queries are just a bit slower as compared to MySQL 5.5 MRR clearly changes the access pattern to sequential, and hence InnoDB is able to do many read_aheads. Another thing to take away is that MariaDB is just a bit slower as compared to MySQL 5.6, may be something for the MariaDB guys to look at.


PlanetMySQL Voting: Vote UP / Vote DOWN

ext4 vs xfs on SSD

Март 15th, 2012

As ext4 is a standard de facto filesystem for many modern Linux system, I am getting a lot of question if this is good for SSD, or something else (i.e. xfs) should be used.
Traditionally our recommendation is xfs, and it comes to known problem in ext3, where IO gets serialized per i_node in O_DIRECT mode (check for example Domas’s post)

However from the results of my recent benchmarks I felt that this should be revisited.
While I am still running experiments, I would like to share earlier results what I have.

I use STEC SSD drive 200GB SLC SATA (my thanks to STEC for providing drives).

What I see, that ext4 still has problem with O_DIRECT. There are results for “single file” with O_DIRECT case (sysbench fileio 16 KiB blocksize random write workload):

  • ext4 1 thread: 87 MiB/sec
  • ext4 4 threads: 74 MiB/sec
  • xfs 4 threads: 97 MiB/sec

Dropping performance in case with 4 threads for ext4 is a signal that there still are contention issues.

I was pointed that ext4 has an option dioread_nolock, which supposedly fixes that, but that option is not available on my CentOS 6.2, so I could not test it.

At this point we may decide that xfs is still preferable, but there is one more point to consider.

Starting the MySQL 5.1 + InnoDB-plugin and later MySQL 5.5 (or equally Percona Server 5.1 and 5.5), InnoDB uses “asynchronous” IO in Linux.

Let’s test “async” mode in sysbench, and now we can get:

  • ext4 4 threads: 120 MiB/sec
  • xfs 4 threads: 97 MiB/sec

It corresponds to results I see running MySQL benchmarks (to be published later) on ext4 vs xfs.

Actually amount of threads does not affect the result significantly. This is to another question I was asked, namely: “If MySQL 5.5 uses async IO, is innodb_write_io_threads still important?”, and it seems it is not. In my tests it does not affect the final result. I would still use value 2 or 4, to avoid scheduling overhead from single thread, but it does not seem critical.

In conclusion ext4 looks like an good option, providing 20% better throughput. I am still going to run more benchmark to get better picture.

The script for tests:

for size in 100
do

cd /mnt/stec
sysbench --test=fileio --file-num=1 --file-total-size=${size}G prepare
sync
echo 3 > /proc/sys/vm/drop_caches

for numthreads in 4
do
sysbench --test=fileio --file-total-size=${size}G --file-test-mode=rndwr --max-time=3600 --max-requests=0 --num-threads=$numthreads --rand-init=on --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 --file-io-mode=sync --file-block-size=16384 --report-interval=10 run | tee -a run$size.thr$numthreads.txt
done
done

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Index Condition Pushdown in MySQL 5.6 and MariaDB 5.5 and its performance impact

Март 12th, 2012

I have been working with Peter in preparation for the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. We are taking a look at and benchmarking optimizer enhancements one by one. So in the same way this blog post is aimed at a new optimizer enhancement Index Condition Pushdown (ICP). Its available in both MySQL 5.6 and MariaDB 5.5

Now let’s take a look briefly at what this enhancement actually is, and what is it aimed at.

Index Condition Pushdown

Traditional B-Tree index lookups have some limitations in cases such as range scans, where index parts after the part on which range condition is applied cannot be used for filtering records. For example, suppose you have a key defined as:

KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`)

and the WHERE condition defined as:

l_partkey = x
and l_quantity >= 1 and l_quantity <= 1+10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'

Then MySQL will use the key as if its only defined as including columns l_partkey and l_quantity and will not filter using the columns l_shipmode and l_shipinstruct. And so all rows matching condition l_partkey = x and and l_quantity >= 1 and l_quantity <= 1+10 will be fetched from the Primary Key and returned to MySQL server which will then in turn apply the remaining parts of the WHERE clause l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON'. So clearly if you have thousands of rows that match l_partkey and l_quantity but only a few hundred that match all the condition, then obviously you would be reading a lot of unnecessary data.

This is where ICP comes into play. With ICP, the server pushes down all conditions of the WHERE clause that match the key definition to the storage engine and then filtering is done in two steps:

  • Filter by the prefix of the index using traditional B-Tree index search
  • Filter by applying the where condition on the index entries fetched

You can read more about index condition pushdown as available in MySQL 5.6 here:
http://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html
and as available in MariaDB 5.5 here:
http://kb.askmonty.org/en/index-condition-pushdown

Now let's take a look at the benchmarks to see how much difference does this really make.

Benchmark results

For the purpose of this benchmark I used TPC-H Query #19 and ran it on TPC-H dataset (InnoDB tables) with scale factors of 2 (InnoDB dataset size ~5G) and 40 (InnoDB dataset size ~95G), so that I could see the speedup in case of in-memory and IO bound workloads. For the purpose of these benchmarks query cache was disabled and the buffer pool size was set to 6G.

The query is:

select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem force index(i_l_partkey),
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#45'
                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                and l_quantity >= 1 and l_quantity <= 1+10
                and p_size between 1 and 5
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#24'
                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                and l_quantity >= 14 and l_quantity <= 14+10
                and p_size between 1 and 10
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#53'
                and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
                and l_quantity >= 28 and l_quantity <= 28+10
                and p_size between 1 and 15
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )

There are two changes that I made to the query and the TPC-H tables' structure.
- Added a new index: KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`)
- Added an index hint in the query: lineitem force index(i_l_partkey)

The size of the buffer pool used for the benchmarks is 6G and the disks are 4 5.4K disks in Software RAID5.

In-memory workload

Now let's first take a look at how effective is ICP when the workload is in memory. For the purpose of benchmarking in-memory workload, I used a Scale Factor of 2 (dataset size ~5G), and the buffer pool was warmed up so that the relevant pages were already loaded in the buffer pool:

IO bound workload

And what about IO bound workload, when the dataset does not fit into memory. For the purpose of benchmarking IO bound workload, I used a Scale Factor of 40 (dataset size ~95G) and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already:

Now let's take a look at the status counters to see how much does this optimization make a difference.

MySQL Status Counters

These status counters are taken when performing the benchmark on IO bound workload, mentioned above.

Counter Name MySQL 5.5 MySQL 5.6 MariaDB 5.5
Created_tmp_disk_tables 0 0 0
Created_tmp_tables 0 0 0
Handler_icp_attempts N/A N/A 571312
Handler_icp_match N/A N/A 4541
Handler_read_key 19310 19192 18989
Handler_read_next 579426 4514 4541
Handler_read_rnd_next 8000332 8000349 8000416
Innodb_buffer_pool_read_ahead 81132 81067 81069
Innodb_buffer_pool_read_requests 4693757 1293628 1292675
Innodb_buffer_pool_reads 540805 28468 28205
Innodb_data_read 9.49G 1.67G 1.67G
Innodb_data_reads 621939 109537 29796
Innodb_pages_read 621937 109535 109274
Innodb_rows_read 8579426 8004514 8004541
Select_scan 1 1 1
  • We can see that values for Handler_read_key are more or less the same, because there is no change in the index lookup method, the index range scan is still done the same way as in MySQL 5.5, all the index pages that match the range condition on the l_quantity will still be fetched. Its the optimization afterwards that counts.
  • As we can see there is a huge reduction in the value of Handler_read_next, as with ICP the remaining parts of the WHERE clause are checked directly on the index pages fetched without having to fetch the entire rows from the PK, which means 128x less subsequent reads from the PK.
  • There are two status counters available in MariaDB 5.3/5.5 that are not available in MySQL, Handler_icp_attempts and Handler_icp_match, which show how many times the remaining WHERE condition was checked on the fetched index pages and how many times the index entries matched the condition. The value of Handler_icp_match directly co-relates to that of Handler_read_next. The smaller the ratio of Handler_icp_attempts to Handler_icp_match the better the filtering.
  • Another important thing is that there is 18x less IO page reads in the case of MySQL 5.6 and MariaDB 5.5 and 8x less amount of data read (1.67G compared to 9.49G). This just does not mean reduction in IO but means more free pages are available in the buffer pool to cater to other queries. For example, in the case of MySQL 5.5 (with a buffer pool of 6G), the query will be IO bound even with a warmed up buffer pool, because there is just too much data read. While in the case of MySQL 5.6 and MariaDB 5.5, the queries will have all the pages in-memory with warm caches and still 72% of the buffer pool will be empty to handle other queries.

Other Observations

There are two indexes defined on the table with similar prefixes:

  • KEY `i_l_suppkey_partkey` (`l_partkey`, `l_suppkey`)
  • KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`)

Obviously, the key i_l_partkey is much more restrictive. Although it cannot filter more rows by using traditional B-Tree index lookup used by MySQL, when compared to the key i_l_suppkey_partkey. But after the B-Tree lookup step, the second key can filter a lot more data using ICP on the remaining conditions after l_quantity. Yet MySQL 5.6 and MariaDB 5.5 optimizer does not consider this logic when selecting the index. In all the runs of the benchmark, the optimizer chose the key i_l_suppkey_partkey because the optimizer estimates that both indexes will mean same no. of rows and i_l_suppkey_partkey has a smaller key size. To get around this limitation in the optimizer I had to use index hint (FORCE INDEX). Clearly, selecting the correct index is an important part of optimizer's job, and there is room for improvement there.

Another thing which I feel is that there is still further optimization possible, like moving ICP directly to the index lookup function so that the limitation in the optimizer which prevents other parts of the key after the first range condition are removed.

Conclusion

There is a huge speed up 78 minutes down to 2 minutes in case of completely IO bound workload, and upto 2x speedup in case when the workload is in-memory. There is not much difference here in terms of the performance gains on MariaDB 5.5 vs MySQL 5.6 and both are on-par. This is great for queries that suffer from the weakness of the current MySQL optimizer when it comes to doing multi-column range scans, and the ICP optimization will benefit a lot of your queries like the one I showed in my example. Not only that, because there will be a cut down in the number of data pages read into the buffer pool, it means better buffer pool utilization.

On a last note, I will be posting the benchmark scripts, table definitions, the configuration files for MySQL 5.5/5.6 and MariaDB 5.5 and the description of the hardware used.


PlanetMySQL Voting: Vote UP / Vote DOWN

Introducing new type of benchmark

Февраль 25th, 2012

Traditionally the most benchmarks are focusing on throughput. We all get used to that, and in fact in our benchmarks, sysbench and tpcc-mysql, the final result is also represents the throughput (transactions per second in sysbench; NewOrder transactions Per Minute in tpcc-mysql). However, like Mark Callaghan mentioned in comments, response time is way more important metric to compare.

I want to pretend that we pioneered (not invented, but started to use widely) a benchmark methodology when we measure not the final throughput, but rather periodic probes (i.e. every 10 sec).
It allows us to draw “stability” graphs, like this one

where we can see not only a final result, but how the system behaves in dynamic.

What’s wrong with existing benchmarks?

Well, all benchmarks are lie, and focusing on throughput does not get any closer to reality.

Benchmarks, like sysbench or tpcc-mysql, start N threads and try to push the database as much as possible, bombarding the system with queries with no pause.

That rarely happens in real life. There are no systems that are pushed to 100% load all time.

So, how we can model it? There are different theories, and the one which describes user’s behavior, is Queueing theory. In short we can assume that users send requests with some arrival rate (which can be different in the different part of day/week/month/year though). And what is important for an end user is response time, that is how long the user has to wait on results. E.g. when you go to your Facebook profile or Wikipedia page, you expect to get response within second or two.

How we should change the benchmark to base on this model ?
There are my working ideas:

  • Benchmark starts N working threads, but they all are idle until asked to handle a request
  • Benchmarks generates transactions with a given rate, i.e. M transactions per second and puts into a queue. The interval between arrivals is not uniform, but rather distributed by Exponential distribution, with λ = M. That how it goes if to believe to the Poisson process.
    For example, if our target is arrival rate 2 queries per second, then exponential distribution will give us following intervals (in sec) between events: 0.61, 0.43, 1.55, 0.18, 0.01, 0.76, 0.09, 1.26, …

    Or if we represent graphically (point means even arrival):

    As you see interval is far from being strict 0.5 sec, but 0.5 is the mean of this random generation function. On the graph you see 20 events arrived within 9 seconds.

  • Transactions from the queue are handled by one of free threads, or are waiting in the queue until one of threads are ready. The time waiting in the queue is added to a total response time.
  • As a result we measure 95% or 99% response times.

What does it give to us? It allows to see:

  • What is the response time we may expect having a given arrival rate
  • What is the optimal number of working threads (the one that provides best response times)

When it is useful?

At this moment I am looking to answer on questions like:
– When we add additional node to a cluster (e.g. XtraDB Cluster), how does it affect the response time ?
– When we put a load to two nodes instead of three nodes, will it help to improve the response time ?
– Do we need to increase number of working threads when we add nodes ?

Beside cluster testing, it will also help to see an affect of having a side on the server. For example, the famous problem with DROP TABLE performance. Does DROP TABLE, running in separate session, affect a response time of queries that handle user load ? The same for mysqldump, how does it affect short user queries ?

In fact I have a prototype based on sysbench. It is there lp:~vadim-tk/sysbench/inj-rate/. It works like a regular sysbench, but you need to specify the additional parameter tx-rate, which defines an expected arrival rate (in transactions per second).

There are some results from my early experiments. Assume we have an usual sysbench setup, and we target an arrival rate as 3000 transactions per second (regular sysbench transactions). We vary working threads from 1 to 128.

There are results for 16-128 threads (the result is 99% response time, taken every 10 sec. the less is better)

We can see that 16 threads give best 99% response time (15.74ms final), 32 threads: 16.75 ms, 64 threads: 25.14ms.
And with 128 threads we have pretty terrible unstable response times, with 1579.91ms final.
That means that 16-32 threads is probably best number of concurrently working threads (for this kind of workload and this arrival rate).

Ok, but what happens if we have not enough working threads? You can see it from following graph (1-8 threads):

The queue piles up, waiting time grows, and the final response time grows linearly up to ~30 sec, where benchmark stops, because the queue is full.

I am looking for your comments, do you find it useful?

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN