Archive for the ‘mysql performance’ Category

Interesting behavior of a MySQL benchmark on EC2

Май 3rd, 2012

I had to benchmark an EC2 instance to see whether a database could be safely moved to it. It is a good practice, which helps avoiding surprises when an instance or its storage are allocated in a noisy neighborhood, where the neighbors use so much resources that it affects the performance of our MySQL database. It is understandable that one can never get very reliable results on EC2, this is a shared environment after all, and that some fluctuations should be expected, however it is still good to know the numbers. I started my benchmarks and everything seemed fine at first, but then sometimes statistics I was getting started looking quite odd.

I was running the benchmarks on a High-CPU Extra Large Instance and couldn’t see any reliability in the results at all. I mean, in one moment I was getting poor throughput and horrible response times only to see it improve a lot a few minutes later. I ruled out a possibility that it could be some kind of problems with the storage performance causing this, so I started suspecting MySQL itself – perhaps some sort of contention. But no system statistics, nor any information from MySQL wanted to confirm that theory.

Finally, I went for oprofile and started it during one of the benchmarks and what I saw surprised me:


When oprofile was running MySQL performance nearly doubled. It was also reflected in the CPU utilization which increased accordingly and also got a lot more stable. After a while, I stopped oprofile and the performance dropped again.

At this point I am not sure how to explain this, however this was repeatable. Anyone?


PlanetMySQL Voting: Vote UP / Vote DOWN

Join Optimizations in MySQL 5.6 and MariaDB 5.5

Апрель 5th, 2012

This is the third blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is targeted at the join related optimizations introduced in the optimizer. These optimizations are available in both MySQL 5.6 and MariaDB 5.5, and MariaDB 5.5 has introduced some additional optimizations which we will also look at, in this post.

Now let me briefly explain these optimizations.

Batched Key Access

Traditionally, MySQL always uses Nested Loop Join to join two or more tables. What this means is that, select rows from first table participating in the joins are read, and then for each of these rows an index lookup is performed on the second table. This means many point queries, say for example if table1 yields 1000 rows then 1000 index lookups are performed on table2. Of course this could mean a lot of random lookups in case the dataset does not fit into memory or if the caches are not warm, this also does not take into account locality of access, as you could be reading the same page multiple times. Consider if you have a small buffer pool and a very large number of rows from table2 that do not fit into the buffer pool, then worst case you could be reading the same page multiple times into the buffer pool.
So considering this drawback, Batched Key Access (BKA) optimization was introduced. When BKA is being used then, after the selected rows are read from table1, the values of indexed columns that will be used to perform a lookup on table2 are batched together and sent to the storage engine. The storage engine then uses the MRR interface (which I explained in my previous post) to lookup the rows in table2. So this means that we have traded many point index lookups to one or more index range lookups. This means MySQL can employ many other optimizations like for example if columns other then the secondary key columns are required to be fetched, then MRR interface can instead access the PK by arranging the secondary key values by PK column and then performing the lookup, and then there are other possibilities like InnoDB doing read_ahead by noticing the sequential access pattern.
BKA is available in both MySQL 5.6 and MariaDB 5.5. You can read more about BKA in MySQL 5.6 here and BKA in MariaDB 5.5 here.

However, MariaDB 5.5 has one additional optimization that is used in conjunction with BKA and that is MRR Key-ordered Scan. Let’s see what this optimization actually is.

Key-ordered Scan for BKA

With this optimization the idea of MRR is further extended to improve join performance. As I told you above, when table t1 would be joined to table t2, then selected rows from t1 would be read and then for all rows, index lookup would be performed on t2. Here is where Key-ordered scan comes in. With BKA turned on, the index lookups to t2 are batched together and sent to the MRR interface. Now when the MRR interface is going to perform the lookup on t2 by a secondary key on the common join column, let’s suppose col1, then with Key-ordered scan, the secondary key on col1 would be sorted by col1 i.e. in index order and then the lookup will be performed. So key-ordered scan is basically an extension of MRR to secondary keys. You can read more about Key-ordered Scan for BKA here.

Now there is one additional join optimization in MariaDB that I would like to quickly explain here, and that is Hash Joins.

Hash Join

As I have told before MySQL has only supported one join algorithm and that is Nested Loop Join. MariaDB has introduced a new join algorithm Hash Join. This join algorithm only works with equi-joins. Now let me briefly explain how hash join algorithm works. Suppose you have two tables t1 and t2, that you wish to join together in an equi-join, then the way hash join algorithm will operate is that first a hash table will be created based on the columns of table t1 that are two be used to join rows with table t2. This step is called the build step. After the hash table has been created, rows from table t2 are read and hash function is applied to the columns participating in the join condition and then a hash table lookup is performed, on the hash table we created earlier. This step is known as probe step. This join algorithm works best for highly selective queries, and obviously when the first operand used in the build step is such that the hash table fits in memory. You can read more about the hash join algorithm here.

Now let’s move on to the benchmarks, to see the difference in numbers.

Benchmark results

For the purpose of this benchmark, I have used TPC-H Query #3 and ran it on TPC-H dataset (InnoDB tables) with a Scale Factor of 2 (InnoDB dataset size ~5G). Note that query cache is disabled during these benchmark runs and that the disks are 4 5.4K disks in Software RAID5.

Also note that the following changes were made on MySQL 5.6 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='batched_key_access=on'
join_buffer_size=6M
read_rnd_buffer_size=6M

Also note that the following changes were made on MariaDB 5.5 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=8
join_buffer_size=6M
mrr_buffer_size=6M

Note that I have turned off ICP because I would like to see the affect of BKA on the query times. But I also had to turn on MRR because BKA uses the MRR interface. Note also the additional optimizer switches and variables in case of MariaDB 5.5. You can read more about these variables here.

The query used is:

select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
from
        customer,
        orders,
        lineitem FORCE INDEX (i_l_orderkey)
where
        c_mktsegment = 'AUTOMOBILE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < '1995-03-09'
        and l_shipdate > '1995-03-09'
group by
        l_orderkey,
        o_orderdate,
        o_shippriority
order by
        revenue desc,
        o_orderdate
LIMIT 10;

In-memory workload

Now let’s see how effective are the join optimizations when the workload fits entirely in memory. For the purpose of benchmarking in-memory workload, the InnoDB buffer pool size is set to 6G and the buffer pool was warmed up, so that the relevant pages were already loaded in the buffer pool. Note that as mentioned at the start of the benchmark results section, the InnoDB dataset size is ~5G. Now let’s take a look at the graph:

In the graph above the labels mean as follows:
MySQL 5.6 (2) is for MySQL 5.6 w/ buffer size 6M
MariaDB 5.5 (2) is for MariaDB 5.5 w/ buffer size 6M
MariaDB 5.5 (3) is for MariaDB 5.5 w/ buffer size 6M (Hash Join and Key-ordered Scan disabled)

You can see that the lowest query time is for MySQL 5.6 which takes 0.16s less as compared to MySQL 5.5 While with join_buffer_size set to 6M and read_rnd_buffer_size set to 6M, the query time for MySQL 5.6 becomes approximately equal to that of MySQL 5.5. MariaDB 5.5 is quite slow as compared to both MySQL 5.5 and MySQL 5.6. For MariaDB 5.5 the query time decreases when the join_buffer_size is set to 6M and mrr_buffer_size is set to 6M. While the lowest query time for MariaDB 5.5 is when both the Hash Join and the Key-ordered Scan are disabled.

IO bound workload

Now let’s see how effective are the join optimizations when the workload is IO bound. For the purpose of benchmarking IO bound workload, the InnoDB buffer pool size is set to 1G and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already. Now let’s take a look at the graph:

The labels in the graph above are same as in the graph for in-memory workload.

BKA makes a huge difference when the workload is IO bound and the query time drops from 2534.41s down to under a minute. MySQL 5.6 has the smallest query time when the join_buffer_size is set to 6M and the read_rnd_buffer_size is set to 6M. MariaDB 5.5 is close, yet slower by ~10s. Another thing that is important to know is that, just increasing the join_buffer_size is not going to provide the best possible query performance. As I mentioned at the start that BKA uses the MRR interface so you should also adjust/increase the size of the buffer that MRR uses appropriately. In my tests I noticed that with just the join_buffer_size increased to 6M, the query time was ~300s while when both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size were set to 6M, the query time dropped to ~40s. So the maximum possible benefit is when size of both the buffers is increased appropriately. Also in the chart above the bottom two bars correspond to MariaDB with Hash Join and Key-ordered Scan enabled, and MariaDB with Hash Join and Key-ordered Scan disabled, and the only difference in query time is 48.78s vs 48.91s, so I don’t see Hash Join and Key-ordered Scan making much of a difference here.

Now let’s take a look at the status counters.

Counter Name MySQL 5.5 MySQL 5.6 MySQL 5.6 w/ join_buffer_size=6M & read_rnd_buffer_size=6M MariaDB 5.5 MariaDB 5.5 w/ join_buffer_size=6M & mrr_buffer_size=6M MariaDB 5.5 Hash Join Disabled w/ join_buffer_size=4M & mrr_buffer_size=4M
Created_tmp_disk_tables 0 0 0 0 0 0
Created_tmp_tables 1 1 1 1 1 1
Handler_mrr_init N/A 112 3 1133 1 1
Handler_mrr_key_refills N/A N/A N/A 0 0 0
Handler_mrr_rowid_refills N/A N/A N/A 22 0 0
Handler_read_key 1829910 4440511 4372096 1801917 1790641 1805995
Handler_read_next 2651500 2628103 2586036 2628103 2592816 2615612
Handler_read_rnd_next 23078 22780 22757 22867 23049 23221
Innodb_buffer_pool_read_ahead 1152 23231 130919 23228 130731 131497
Innodb_buffer_pool_read_requests 12947073 10228611 7816154 10251697 9319281 9393396
Innodb_buffer_pool_reads 327963 332092 12889 335080 14067 13384
Innodb_data_read 5G 5.4G 2.2G 5.4G 2.2G 2.2G
Innodb_data_reads 329115 355323 143808 335526 16164 15506
Innodb_pages_read 329115 355323 143808 358308 144798 144881
Innodb_rows_read 4127318 6718351 6611675 6707882 5479445 5527245
Select_scan 1 1 1 1 1 1
Sort_scan 1 1 1 1 1 1

The first obvious improvement is shown by the high numbers of Innodb_buffer_pool_read_ahead when the buffers are sized appropriately, which shows that the IO access pattern has been changed to become sequential. The two other most important numbers are values for Innodb_buffer_pool_reads and Innodb_data_read. We can see that with appropriately sized buffers less no. of Innodb_buffer_pool_reads are done, and less amount of data is read from disk, in fact half the amount of data is read from disk 2.2G vs 5G. However, there is one number in MariaDB 5.5 that is quite large as compared to MySQL 5.6 and that is ‘Handler_mrr_init’. Is it because of this that the query on MariaDB 5.5 is slow as compared to MySQL 5.6. Next interesting thing are the last two columns of the table above and the values for ‘Handler_read_key’, ‘Handler_read_next’ and ‘Handler_read_rnd_next’. MariaDB 5.5 with Hash Joins enabled is doing less Handler reads as compared to when the Hash Joins are disabled, but that did not make much of a difference in the query times.

Conclusion

BKA improves the query time by a huge margin for IO bound workload but does not make much of a difference to in-memory workload. Also BKA relies on both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size and both of these buffers should be increased appropriately for the best possible performance gain. This is not entirely visible in the manual either for MariaDB or MySQL, but you need to appropriately increase read_rnd_buffer_size/mrr_buffer_size because these have an impact on MRR performance, and BKA uses the MRR interface, so these buffers indirectly impact BKA performance. I did not find much of a performance improvement from using Hash Join in MariaDB 5.5 or from Key-ordered Scan for TPCH query #3, in fact disabling both of these provided the best result for MariaDB 5.5 for in-memory workload.

I intend to run tests to see what specific types of queries would benefit from Hash Join as compared to Nested Loop Join, but for now it looks like Nested Loop Join is a much better general purpose join algorithm.


PlanetMySQL Voting: Vote UP / Vote DOWN

Join Optimizations in MySQL 5.6 and MariaDB 5.5

Апрель 5th, 2012

This is the third blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is targeted at the join related optimizations introduced in the optimizer. These optimizations are available in both MySQL 5.6 and MariaDB 5.5, and MariaDB 5.5 has introduced some additional optimizations which we will also look at, in this post.

Now let me briefly explain these optimizations.

Batched Key Access

Traditionally, MySQL always uses Nested Loop Join to join two or more tables. What this means is that, select rows from first table participating in the joins are read, and then for each of these rows an index lookup is performed on the second table. This means many point queries, say for example if table1 yields 1000 rows then 1000 index lookups are performed on table2. Of course this could mean a lot of random lookups in case the dataset does not fit into memory or if the caches are not warm, this also does not take into account locality of access, as you could be reading the same page multiple times. Consider if you have a small buffer pool and a very large number of rows from table2 that do not fit into the buffer pool, then worst case you could be reading the same page multiple times into the buffer pool.
So considering this drawback, Batched Key Access (BKA) optimization was introduced. When BKA is being used then, after the selected rows are read from table1, the values of indexed columns that will be used to perform a lookup on table2 are batched together and sent to the storage engine. The storage engine then uses the MRR interface (which I explained in my previous post) to lookup the rows in table2. So this means that we have traded many point index lookups to one or more index range lookups. This means MySQL can employ many other optimizations like for example if columns other then the secondary key columns are required to be fetched, then MRR interface can instead access the PK by arranging the secondary key values by PK column and then performing the lookup, and then there are other possibilities like InnoDB doing read_ahead by noticing the sequential access pattern.
BKA is available in both MySQL 5.6 and MariaDB 5.5. You can read more about BKA in MySQL 5.6 here and BKA in MariaDB 5.5 here.

However, MariaDB 5.5 has one additional optimization that is used in conjunction with BKA and that is MRR Key-ordered Scan. Let’s see what this optimization actually is.

Key-ordered Scan for BKA

With this optimization the idea of MRR is further extended to improve join performance. As I told you above, when table t1 would be joined to table t2, then selected rows from t1 would be read and then for all rows, index lookup would be performed on t2. Here is where Key-ordered scan comes in. With BKA turned on, the index lookups to t2 are batched together and sent to the MRR interface. Now when the MRR interface is going to perform the lookup on t2 by a secondary key on the common join column, let’s suppose col1, then with Key-ordered scan, the secondary key on col1 would be sorted by col1 i.e. in index order and then the lookup will be performed. So key-ordered scan is basically an extension of MRR to secondary keys. You can read more about Key-ordered Scan for BKA here.

Now there is one additional join optimization in MariaDB that I would like to quickly explain here, and that is Hash Joins.

Hash Join

As I have told before MySQL has only supported one join algorithm and that is Nested Loop Join. MariaDB has introduced a new join algorithm Hash Join. This join algorithm only works with equi-joins. Now let me briefly explain how hash join algorithm works. Suppose you have two tables t1 and t2, that you wish to join together in an equi-join, then the way hash join algorithm will operate is that first a hash table will be created based on the columns of table t1 that are two be used to join rows with table t2. This step is called the build step. After the hash table has been created, rows from table t2 are read and hash function is applied to the columns participating in the join condition and then a hash table lookup is performed, on the hash table we created earlier. This step is known as probe step. This join algorithm works best for highly selective queries, and obviously when the first operand used in the build step is such that the hash table fits in memory. You can read more about the hash join algorithm here.

Now let’s move on to the benchmarks, to see the difference in numbers.

Benchmark results

For the purpose of this benchmark, I have used TPC-H Query #3 and ran it on TPC-H dataset (InnoDB tables) with a Scale Factor of 2 (InnoDB dataset size ~5G). Note that query cache is disabled during these benchmark runs and that the disks are 4 5.4K disks in Software RAID5.

Also note that the following changes were made on MySQL 5.6 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='batched_key_access=on'
join_buffer_size=6M
read_rnd_buffer_size=6M

Also note that the following changes were made on MariaDB 5.5 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=8
join_buffer_size=6M
mrr_buffer_size=6M

Note that I have turned off ICP because I would like to see the affect of BKA on the query times. But I also had to turn on MRR because BKA uses the MRR interface. Note also the additional optimizer switches and variables in case of MariaDB 5.5. You can read more about these variables here.

The query used is:

select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
from
        customer,
        orders,
        lineitem FORCE INDEX (i_l_orderkey)
where
        c_mktsegment = 'AUTOMOBILE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < '1995-03-09'
        and l_shipdate > '1995-03-09'
group by
        l_orderkey,
        o_orderdate,
        o_shippriority
order by
        revenue desc,
        o_orderdate
LIMIT 10;

In-memory workload

Now let’s see how effective are the join optimizations when the workload fits entirely in memory. For the purpose of benchmarking in-memory workload, the InnoDB buffer pool size is set to 6G and the buffer pool was warmed up, so that the relevant pages were already loaded in the buffer pool. Note that as mentioned at the start of the benchmark results section, the InnoDB dataset size is ~5G. Now let’s take a look at the graph:

In the graph above the labels mean as follows:
MySQL 5.6 (2) is for MySQL 5.6 w/ buffer size 6M
MariaDB 5.5 (2) is for MariaDB 5.5 w/ buffer size 6M
MariaDB 5.5 (3) is for MariaDB 5.5 w/ buffer size 6M (Hash Join and Key-ordered Scan disabled)

You can see that the lowest query time is for MySQL 5.6 which takes 0.16s less as compared to MySQL 5.5 While with join_buffer_size set to 6M and read_rnd_buffer_size set to 6M, the query time for MySQL 5.6 becomes approximately equal to that of MySQL 5.5. MariaDB 5.5 is quite slow as compared to both MySQL 5.5 and MySQL 5.6. For MariaDB 5.5 the query time decreases when the join_buffer_size is set to 6M and mrr_buffer_size is set to 6M. While the lowest query time for MariaDB 5.5 is when both the Hash Join and the Key-ordered Scan are disabled.

IO bound workload

Now let’s see how effective are the join optimizations when the workload is IO bound. For the purpose of benchmarking IO bound workload, the InnoDB buffer pool size is set to 1G and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already. Now let’s take a look at the graph:

The labels in the graph above are same as in the graph for in-memory workload.

BKA makes a huge difference when the workload is IO bound and the query time drops from 2534.41s down to under a minute. MySQL 5.6 has the smallest query time when the join_buffer_size is set to 6M and the read_rnd_buffer_size is set to 6M. MariaDB 5.5 is close, yet slower by ~10s. Another thing that is important to know is that, just increasing the join_buffer_size is not going to provide the best possible query performance. As I mentioned at the start that BKA uses the MRR interface so you should also adjust/increase the size of the buffer that MRR uses appropriately. In my tests I noticed that with just the join_buffer_size increased to 6M, the query time was ~300s while when both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size were set to 6M, the query time dropped to ~40s. So the maximum possible benefit is when size of both the buffers is increased appropriately. Also in the chart above the bottom two bars correspond to MariaDB with Hash Join and Key-ordered Scan enabled, and MariaDB with Hash Join and Key-ordered Scan disabled, and the only difference in query time is 48.78s vs 48.91s, so I don’t see Hash Join and Key-ordered Scan making much of a difference here.

Now let’s take a look at the status counters.

Counter Name MySQL 5.5 MySQL 5.6 MySQL 5.6 w/ join_buffer_size=6M & read_rnd_buffer_size=6M MariaDB 5.5 MariaDB 5.5 w/ join_buffer_size=6M & mrr_buffer_size=6M MariaDB 5.5 Hash Join Disabled w/ join_buffer_size=4M & mrr_buffer_size=4M
Created_tmp_disk_tables 0 0 0 0 0 0
Created_tmp_tables 1 1 1 1 1 1
Handler_mrr_init N/A 112 3 1133 1 1
Handler_mrr_key_refills N/A N/A N/A 0 0 0
Handler_mrr_rowid_refills N/A N/A N/A 22 0 0
Handler_read_key 1829910 4440511 4372096 1801917 1790641 1805995
Handler_read_next 2651500 2628103 2586036 2628103 2592816 2615612
Handler_read_rnd_next 23078 22780 22757 22867 23049 23221
Innodb_buffer_pool_read_ahead 1152 23231 130919 23228 130731 131497
Innodb_buffer_pool_read_requests 12947073 10228611 7816154 10251697 9319281 9393396
Innodb_buffer_pool_reads 327963 332092 12889 335080 14067 13384
Innodb_data_read 5G 5.4G 2.2G 5.4G 2.2G 2.2G
Innodb_data_reads 329115 355323 143808 335526 16164 15506
Innodb_pages_read 329115 355323 143808 358308 144798 144881
Innodb_rows_read 4127318 6718351 6611675 6707882 5479445 5527245
Select_scan 1 1 1 1 1 1
Sort_scan 1 1 1 1 1 1

The first obvious improvement is shown by the high numbers of Innodb_buffer_pool_read_ahead when the buffers are sized appropriately, which shows that the IO access pattern has been changed to become sequential. The two other most important numbers are values for Innodb_buffer_pool_reads and Innodb_data_read. We can see that with appropriately sized buffers less no. of Innodb_buffer_pool_reads are done, and less amount of data is read from disk, in fact half the amount of data is read from disk 2.2G vs 5G. However, there is one number in MariaDB 5.5 that is quite large as compared to MySQL 5.6 and that is ‘Handler_mrr_init’. Is it because of this that the query on MariaDB 5.5 is slow as compared to MySQL 5.6. Next interesting thing are the last two columns of the table above and the values for ‘Handler_read_key’, ‘Handler_read_next’ and ‘Handler_read_rnd_next’. MariaDB 5.5 with Hash Joins enabled is doing less Handler reads as compared to when the Hash Joins are disabled, but that did not make much of a difference in the query times.

Conclusion

BKA improves the query time by a huge margin for IO bound workload but does not make much of a difference to in-memory workload. Also BKA relies on both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size and both of these buffers should be increased appropriately for the best possible performance gain. This is not entirely visible in the manual either for MariaDB or MySQL, but you need to appropriately increase read_rnd_buffer_size/mrr_buffer_size because these have an impact on MRR performance, and BKA uses the MRR interface, so these buffers indirectly impact BKA performance. I did not find much of a performance improvement from using Hash Join in MariaDB 5.5 or from Key-ordered Scan for TPCH query #3, in fact disabling both of these provided the best result for MariaDB 5.5 for in-memory workload.

I intend to run tests to see what specific types of queries would benefit from Hash Join as compared to Nested Loop Join, but for now it looks like Nested Loop Join is a much better general purpose join algorithm.


PlanetMySQL Voting: Vote UP / Vote DOWN

Index Condition Pushdown in MySQL 5.6 and MariaDB 5.5 and its performance impact

Март 12th, 2012

I have been working with Peter in preparation for the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. We are taking a look at and benchmarking optimizer enhancements one by one. So in the same way this blog post is aimed at a new optimizer enhancement Index Condition Pushdown (ICP). Its available in both MySQL 5.6 and MariaDB 5.5

Now let’s take a look briefly at what this enhancement actually is, and what is it aimed at.

Index Condition Pushdown

Traditional B-Tree index lookups have some limitations in cases such as range scans, where index parts after the part on which range condition is applied cannot be used for filtering records. For example, suppose you have a key defined as:

KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`)

and the WHERE condition defined as:

l_partkey = x
and l_quantity >= 1 and l_quantity <= 1+10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'

Then MySQL will use the key as if its only defined as including columns l_partkey and l_quantity and will not filter using the columns l_shipmode and l_shipinstruct. And so all rows matching condition l_partkey = x and and l_quantity >= 1 and l_quantity <= 1+10 will be fetched from the Primary Key and returned to MySQL server which will then in turn apply the remaining parts of the WHERE clause l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON'. So clearly if you have thousands of rows that match l_partkey and l_quantity but only a few hundred that match all the condition, then obviously you would be reading a lot of unnecessary data.

This is where ICP comes into play. With ICP, the server pushes down all conditions of the WHERE clause that match the key definition to the storage engine and then filtering is done in two steps:

  • Filter by the prefix of the index using traditional B-Tree index search
  • Filter by applying the where condition on the index entries fetched

You can read more about index condition pushdown as available in MySQL 5.6 here:
http://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html
and as available in MariaDB 5.5 here:
http://kb.askmonty.org/en/index-condition-pushdown

Now let's take a look at the benchmarks to see how much difference does this really make.

Benchmark results

For the purpose of this benchmark I used TPC-H Query #19 and ran it on TPC-H dataset (InnoDB tables) with scale factors of 2 (InnoDB dataset size ~5G) and 40 (InnoDB dataset size ~95G), so that I could see the speedup in case of in-memory and IO bound workloads. For the purpose of these benchmarks query cache was disabled and the buffer pool size was set to 6G.

The query is:

select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem force index(i_l_partkey),
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#45'
                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                and l_quantity >= 1 and l_quantity <= 1+10
                and p_size between 1 and 5
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#24'
                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                and l_quantity >= 14 and l_quantity <= 14+10
                and p_size between 1 and 10
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#53'
                and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
                and l_quantity >= 28 and l_quantity <= 28+10
                and p_size between 1 and 15
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )

There are two changes that I made to the query and the TPC-H tables' structure.
- Added a new index: KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`)
- Added an index hint in the query: lineitem force index(i_l_partkey)

The size of the buffer pool used for the benchmarks is 6G and the disks are 4 5.4K disks in Software RAID5.

In-memory workload

Now let's first take a look at how effective is ICP when the workload is in memory. For the purpose of benchmarking in-memory workload, I used a Scale Factor of 2 (dataset size ~5G), and the buffer pool was warmed up so that the relevant pages were already loaded in the buffer pool:

IO bound workload

And what about IO bound workload, when the dataset does not fit into memory. For the purpose of benchmarking IO bound workload, I used a Scale Factor of 40 (dataset size ~95G) and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already:

Now let's take a look at the status counters to see how much does this optimization make a difference.

MySQL Status Counters

These status counters are taken when performing the benchmark on IO bound workload, mentioned above.

Counter Name MySQL 5.5 MySQL 5.6 MariaDB 5.5
Created_tmp_disk_tables 0 0 0
Created_tmp_tables 0 0 0
Handler_icp_attempts N/A N/A 571312
Handler_icp_match N/A N/A 4541
Handler_read_key 19310 19192 18989
Handler_read_next 579426 4514 4541
Handler_read_rnd_next 8000332 8000349 8000416
Innodb_buffer_pool_read_ahead 81132 81067 81069
Innodb_buffer_pool_read_requests 4693757 1293628 1292675
Innodb_buffer_pool_reads 540805 28468 28205
Innodb_data_read 9.49G 1.67G 1.67G
Innodb_data_reads 621939 109537 29796
Innodb_pages_read 621937 109535 109274
Innodb_rows_read 8579426 8004514 8004541
Select_scan 1 1 1
  • We can see that values for Handler_read_key are more or less the same, because there is no change in the index lookup method, the index range scan is still done the same way as in MySQL 5.5, all the index pages that match the range condition on the l_quantity will still be fetched. Its the optimization afterwards that counts.
  • As we can see there is a huge reduction in the value of Handler_read_next, as with ICP the remaining parts of the WHERE clause are checked directly on the index pages fetched without having to fetch the entire rows from the PK, which means 128x less subsequent reads from the PK.
  • There are two status counters available in MariaDB 5.3/5.5 that are not available in MySQL, Handler_icp_attempts and Handler_icp_match, which show how many times the remaining WHERE condition was checked on the fetched index pages and how many times the index entries matched the condition. The value of Handler_icp_match directly co-relates to that of Handler_read_next. The smaller the ratio of Handler_icp_attempts to Handler_icp_match the better the filtering.
  • Another important thing is that there is 18x less IO page reads in the case of MySQL 5.6 and MariaDB 5.5 and 8x less amount of data read (1.67G compared to 9.49G). This just does not mean reduction in IO but means more free pages are available in the buffer pool to cater to other queries. For example, in the case of MySQL 5.5 (with a buffer pool of 6G), the query will be IO bound even with a warmed up buffer pool, because there is just too much data read. While in the case of MySQL 5.6 and MariaDB 5.5, the queries will have all the pages in-memory with warm caches and still 72% of the buffer pool will be empty to handle other queries.

Other Observations

There are two indexes defined on the table with similar prefixes:

  • KEY `i_l_suppkey_partkey` (`l_partkey`, `l_suppkey`)
  • KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`)

Obviously, the key i_l_partkey is much more restrictive. Although it cannot filter more rows by using traditional B-Tree index lookup used by MySQL, when compared to the key i_l_suppkey_partkey. But after the B-Tree lookup step, the second key can filter a lot more data using ICP on the remaining conditions after l_quantity. Yet MySQL 5.6 and MariaDB 5.5 optimizer does not consider this logic when selecting the index. In all the runs of the benchmark, the optimizer chose the key i_l_suppkey_partkey because the optimizer estimates that both indexes will mean same no. of rows and i_l_suppkey_partkey has a smaller key size. To get around this limitation in the optimizer I had to use index hint (FORCE INDEX). Clearly, selecting the correct index is an important part of optimizer's job, and there is room for improvement there.

Another thing which I feel is that there is still further optimization possible, like moving ICP directly to the index lookup function so that the limitation in the optimizer which prevents other parts of the key after the first range condition are removed.

Conclusion

There is a huge speed up 78 minutes down to 2 minutes in case of completely IO bound workload, and upto 2x speedup in case when the workload is in-memory. There is not much difference here in terms of the performance gains on MariaDB 5.5 vs MySQL 5.6 and both are on-par. This is great for queries that suffer from the weakness of the current MySQL optimizer when it comes to doing multi-column range scans, and the ICP optimization will benefit a lot of your queries like the one I showed in my example. Not only that, because there will be a cut down in the number of data pages read into the buffer pool, it means better buffer pool utilization.

On a last note, I will be posting the benchmark scripts, table definitions, the configuration files for MySQL 5.5/5.6 and MariaDB 5.5 and the description of the hardware used.


PlanetMySQL Voting: Vote UP / Vote DOWN

When does InnoDB compress and decompress pages?

Июль 22nd, 2011

There are two sections for rows in the page format for InnoDB compressed tables. The compressed section has one or more rows and must be decompressed to access individual rows. The modification log has uncompressed rows and rows can be accessed without decompressing. The modification log is used to avoid decompressing and then possibly recompressing the compressed section on every row change. The buffer pool also has separate uncompressed copies of some pages so that every row read does not require a page decompression.

I want to understand when a page must be decompressed or recompressed. This is definitely an incomplete list.
  • A page is decompressed when a row is read and the uncompressed version of the page is not in the buffer pool.
  • I think a row can be deleted from the compressed section without decompressing it in many cases as I think marking it deleted uses fields not in the compressed section. 
  • Inserts are done to the modification log assuming it has room. When it is full the modification log and data from the compressed section are merged and the result is recompressed. When the result is too large to fit in a compressed page then the page is split and both post-split pages are recompressed.
  • I don't understand the code for UPDATE statements and need to read more source code. The docs state that updates can be done to the modification log but I don't know what that implies.
A compression failure occurs when a page is recompressed and the result is too big. Innodb fixes this by splitting the page. This only works for index-organized tables and InnoDB is index-organized. You can monitor the rate of compression failures using the information schema table INNODB_CMP. This reports the global rate of compression failures. When you have a server with many tables you need to know which tables have the high failure rate and that information is only available in a yet-to-be-published Facebook patch.

But even the changes in the Facebook patch are not sufficient. In some cases it is important to understand which indexes in a table cause the compression failures. The alternative is to guess. All indexes on a table do not compress equally well yet the same compression factor for a table is used for all indexes on it via the KEY_BLOCK_SIZE option to CREATE TABLE.

------------------------------------------------------------------------------------------------------------ 
By Mark Callaghan



PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Cluster Architecture

Июль 19th, 2011
Introduction:

MySQL Cluster is a write-scalable, real-time, ACID-compliant transactional database, combining 99.999% availability with the low TCO of open source. Designed around a distributed, multi-master architecture with no single point of failure, MySQL Cluster scales horizontally on commodity hardware to serve read and write intensive workloads, accessed via SQL and NoSQL interfaces.

MySQL Cluster's real-time design delivers predictable, millisecond response times with the ability to service millions of operations per second. Support for in-memory and disk-based data, automatic data partitioning (sharding) with load balancing and the ability to add nodes to a running cluster with zero downtime allows linear database scalability to handle the most unpredictable web-based workloads.





 MySQL Cluster comprises three types of node which collectively provide service to the application:
   
  • Data nodes manage the storage and access to data.  Tables are automatically sharded across the data nodes which also transparently handle load balancing, replication, failover and self-healing.

  • Application nodes provide connectivity from the application logic to the data nodes. Multiple APIs are presented to the application.  MySQL provides a standard SQL interface, including connectivity to all of the leading web development languages and frameworks. There are also a whole range of NoSQL inerfaces including memcached, REST/JSON, C++ (NDB-API), Java, JPA and LDAP.

  • Management nodes are used to configure the cluster and provide arbitration in the event of a network partition.

 Source: http://www.mysql.com/products/cluster/architecture.html



PlanetMySQL Voting: Vote UP / Vote DOWN

Connecting to HandlerSocket with localhost vs. 127.0.0.1

Февраль 5th, 2011
Using Net::HandlerSocket, here are some fun numbers for a single connection (open & close). When connecting to “localhost”, here’s the strace: open("/etc/hosts", O_RDONLY) = 3 fcntl(3, F_GETFD) = 0 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=187, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b909a1a3000 read(3, "# Do not remove the following [...]
PlanetMySQL Voting: Vote UP / Vote DOWN

European Lunchtime Webinar: MySQL — Top 5 Performance Sins

Ноябрь 19th, 2010

Next Thursday Oracle University will be hosting a lunchtime webinar, between 13.00 and 14.00 CET, to help you better understand the common MySQL mistakes impacting performance that you should avoid.

 

Oracle's senior MySQL instructor and consultant Kai Voigt will review the MySQL "no no's" and advise on what simple solutions should be used instead. Kai will also provide an introduction to indexing, rewriting of queries, monitoring the use of temporary tables, and other things you should know about.

 

You can register here for this free webinar taking place on Thursday November 25th at 13.00 CET.


PlanetMySQL Voting: Vote UP / Vote DOWN

Loose index scan vs. covered indexes in MySQL

Ноябрь 18th, 2010

Loose index scan in MySQL can really help optimizing “group by” queries in some cases (for example, if you have only min() and/or max() as your aggregate functions). For example, if you have this query (to find maximum delay for all US flights with departure on Sundays in 2010):

select max(DepDelayMinutes), 	carrier, dayofweek
from ontime_2010
where dayofweek = 7
group by Carrier,  dayofweek

the usual case will be adding a covered index on (dayofweek, Carrier, DepDelayMinutes). And MySQL will use this index fine (using index mean it will use the covered index):

mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010
where dayofweek =7 group by Carrier, dayofweek\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: ref
possible_keys: covered
          key: covered
      key_len: 2
          ref: const
         rows: 905138
        Extra: Using where; Using index
1 row in set (0.00 sec)

However, as the dayofweek part has low number of unique values, mysql will have to scan a lots of index entries (estimated rows: 905138).

MySQL can use loose index scan. Unfortunately, a o lots of limitations apply:

  • The query is over a single table.
  • The GROUP BY names only columns that form a leftmost prefix of the index and no other columns.
  • The only aggregate functions used in the select list (if any) are MIN() and MAX(), same column
  • etc… (see docs for details)

As our example query is suitable for loose index scan, we can create another index:

mysql> alter table ontime_2010 add key lis1(Carrier, dayofweek, DepDelayMinutes);

mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010
where dayofweek =7 group by Carrier, dayofweek \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: range
possible_keys: DayOfWeek,covered
          key: lis1
      key_len: 5
          ref: NULL
         rows: 19
        Extra: Using where; Using index for group-by
1 row in set (0.01 sec)

Here, Using index for group-by, means that MySQL uses loose index scan.
Also, and it is really great, it works with range on dayofweek too:

mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010
where dayofweek > 3 group by Carrier, dayofweek \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: range
possible_keys: DayOfWeek,covered
          key: lis1
      key_len: 5
          ref: NULL
         rows: 19
        Extra: Using where; Using index for group-by
1 row in set (0.00 sec)

And original covered index does not work with ranges in where clause:

mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 use index (covered)
where dayofweek > 3 group by Carrier, dayofweek \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: range
possible_keys: covered
          key: covered
      key_len: 2
          ref: NULL
         rows: 2416543
        Extra: Using where; Using index; Using temporary; Using filesort

In the above example, MySQL uses index but still have to create temporary table and filesort.

Now speed comparison:
I’m using “ontime” flight performance statistics data from transtats.bts.gov
The table only consist of data for 2010.
Table size: 5M rows, ~2G in size. Table structure:

CREATE TABLE `ontime_2010` (
  `YearD` int(11) DEFAULT NULL,
  `MonthD` tinyint(4) DEFAULT NULL,
  `DayofMonth` tinyint(4) DEFAULT NULL,
  `DayOfWeek` tinyint(4) DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `Origin` char(5) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
... more fields here ...
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Results (cached index and data):
“where dayofweek = 7″ (ref)

  • Loose index scan: 0 sec
  • Covered index: 0.6 sec

“where dayofweek > 3″ (range)

  • Loose index scan: 0 sec
  • index (+ filesort): 5.53 sec

I will also present the findings from this article (among other things) during the upcoming webinar, Getting the Best MySQL Performance in Your Products: Part 3, Query Tuning, which will take place on Tuesday, Nov 23 (webinar is free, webinar registration)


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Query Engine Scalability Issues

Май 11th, 2010
Lately in the MySQL community, we only hear about scalability or performance improvements of storage engines, but nothing about query engine itself. For example, one classic example being InnoDB; if we look back all the scalability issues that community reported a year back or even few months back; most part of those issues has been [...]
PlanetMySQL Voting: Vote UP / Vote DOWN