Archive for the ‘optimizer’ Category

Faster subqueries with materialization

Апрель 12th, 2012
In a previous post, I analyzed how a query of the famous DBT3 benchmark was
optimized by MySQL. It was this query, named "Q16" in the DBT3 jargon:

select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#23'
and p_type not like 'LARGE PLATED%'
and p_size in (43, 1, 25, 5, 35, 12, 42, 40)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)

group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;

Here is a brief recap of conclusions I had drawn:
  • for this query, MySQL tranforms the IN condition to EXISTS and then
    evaluates it with the "unique_subquery" technique, which does an index
    lookup into the subquery's table.
  • IN is evaluated 120,000 times (once per combined row of the outer tables).
  • The total execution time of query Q16 is 0.65 seconds.
If you look at the original subquery, before IN becomes EXISTS, you will see that it's not correlated, which means that it does not mention columns of tables of the top query's FROM clause ('partsupp' and 'part'). Thus, its resultset is constant throughout execution of the entire top query; here it is:


mysql> select
-> s_suppkey
-> from
-> supplier
-> where
-> s_comment like '%Customer%Complaints%';
+-----------+
| s_suppkey |
+-----------+
| 358 |
| 2820 |
| 3804 |
| 9504 |
+-----------+
4 rows in set (0.00 sec)

The transformation to EXISTS, because it injects equalities like
  `partsupp`.`ps_suppkey` = supplier`.`s_suppkey`
into the subquery's WHERE clause, makes the subquery correlated: it thus has to be executed 120,000 times, so we do 120,000 times this:
  • an index lookup in 'supplier' (which has 10,000 rows)
  • a test of the found row(s) against the LIKE condition.

Intuitively, determining the 4-row resultset once for all, and injecting it into the top query should yield better performance - it is fast to evaluate
  ps_suppkey not in (358, 2820, 3804, 9504) .
Starting from the just released MySQL 5.6.5, this transformation is automatically done by the Optimizer, and is called subquery materialization. The subquery's resultset is determined once for all and stored into an in-memory temporary table. If the temporary table has only 4 rows as in our example, searching for a match in it can be done with a scan; but if it had more rows, a hash index would help greatly. So a hash index is always created on the temporary table's columns. Last, this index is unique: there is no point in storing duplicates, and they would make the table bigger. After this one-time setup has been completed, each evaluation of IN simply does a hash index lookup into the 4-row temporary table.

My former colleague Timour Katchaounov started developing this feature years ago, when working for MySQL/Sun. In the last months, after a round of intensive QA, we have fixed some last bugs in it, in preparation for releasing in 5.6.5. But the feature still had one limitation: it was applicable only if IN was placed at certain positions in the query. For example, it couldn't be used with NOT IN. And query Q16 has a NOT IN! so the Optimizer could not apply subquery materialization to it, and was thus stuck with using EXISTS. Sad!

Why it could not work with NOT IN, is not very easy to explain. It has to do with NULL values, because they sometimes prevent using the hash index. To give an idea, look at this:
  (NULL, 1) NOT IN (SELECT ...)
Per the SQL standard, if the subquery's resultset contains at least one row of the form (x,1) where x is any number (or NULL), then the IN condition is neither TRUE, nor FALSE, it is UNKNOWN. So is the NOT IN condition, because it is the negation of IN, and NOT(UNKNOWN) is UNKNOWN.
Here are example of such rows: (NULL,1), (421,1), (236,1), (5329,1), ad infinitam.
We can see that those rows will not be found by a lookup in the hash index: this index is defined on the two columns, it has a usual "prefix-only" behaviour, which means that it cannot be used to search for "any value in first column, then 1 in second column". As long as the sentence starts with "any value in first column" a table scan is necessary; we should read each row of the temporary table and compare its second column with 1 until we find a matching row. And that:
  • will drag subquery materialization's performance down
  • will drag subquery materialization's code complexity up.
And I have even not covered all problems here: there can be more than two columns, there can be more than one NULL in the left argument of IN, there can also be NULLs inside the subquery's resultset.

In some lucky cases, the scan can be avoided, for example:
  SELECT * FROM table1 WHERE (a,b) IN (SELECT ...)
If (a,b) is (NULL,1), the IN will be UNKNOWN or FALSE. It will be UNKNOWN if the subquery's resultset contains one (x,1) as seen above; otherwise it will be FALSE. No matter what, it will not be TRUE, and this is all that WHERE wants to know - (a,b) can thus be rejected without doing a scan.
Now, for
  SELECT * FROM table1 WHERE (a,b) NOT IN (SELECT ...)
i.e.
  SELECT * FROM table1 WHERE NOT ((a,b) IN (SELECT ...))
things are different: if (a,b) is (NULL,1), the IN will be UNKNOWN or FALSE, as we said. So NOT IN will be UNKNOWN or TRUE. "Hum, can you be more specific?? I need to know if it's TRUE", will the WHERE evaluation code ask. Then we have to do the scan...

So now you understand why subquery materialization was restricted to certain placement of IN.

What I have done recently is to lift this restriction in two simple, however common, cases:
  1. If all outer and inner expressions are not nullable, then no NULL can get in the way, so there is no problem.
  2. If there is only one outer expression (and thus there is only one inner expression), figuring out the correct TRUE/FALSE/UNKNOWN answer is immediate. Understanding why... is left as an exercise to the reader :-)
Those two cases are independent: as long as one is satisfied, subquery materialization can apply to IN, no matter where it is placed (NOT IN, etc).

It turned out to be very easy to code this: I had a working prototype in an afternoon.

Q16 happens to meet the criteria of both cases: columns 'ps_suppkey' and 's_suppkey' are declared NOT NULL (first case), and the subquery has only one outer and one inner expression (second case).

So nowadays MySQL can, and does, use subquery materialization for query Q16; thanks to it, the execution time is down from 0.65 seconds to 0.47 seconds, which is a 25% improvement!

The new technique is visible in EXPLAIN. I want to first show how EXPLAIN was with the EXISTS transformation, so I temporarily disable subquery materialization, then run EXPLAIN, enable  subquery materialization, run EXPLAIN:


mysql> set optimizer_switch='materialization=off';
mysql> explain ...

+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| 1 | PRIMARY | part | ALL | PRIMARY | NULL | NULL | NULL | 199742 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey | i_ps_partkey | 4 | dbt3.part.p_partkey | 2 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | supplier | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+

mysql> set optimizer_switch='materialization=default'; # 'on' would work too
mysql> explain ...

+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| 1 | PRIMARY | part | ALL | PRIMARY | NULL | NULL | NULL | 199742 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey | i_ps_partkey | 4 | dbt3.part.p_partkey | 2 | Using where; Using index |
| 2 | SUBQUERY | supplier | ALL | NULL | NULL | NULL | NULL | 10113 | Using where |
+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
If you compare, the big difference is that the third line says SUBQUERY and not DEPENDENT SUBQUERY anymore. DEPENDENT SUBQUERY means that it has be executed once per row of the top query. SUBQUERY means that it is executed only once.
EXPLAIN FORMAT=JSON, another new feature in MySQL 5.6.5, shows more details of materialization:

{
"query_block": {
"select_id": 1,
"ordering_operation": {
"using_temporary_table": true,
"using_filesort": true,
"grouping_operation": {
"using_filesort": true,
"nested_loop": [
{
"table": {
"table_name": "part",
"access_type": "ALL",
"possible_keys": [
"PRIMARY"
],
"rows": 199742,
"filtered": 100,
"attached_condition": "((`dbt3`.`part`.`p_brand` <> 'Brand#23') and (not((`dbt3`.`part`.`p_type` like 'LARGE PLATED%'))) and (`dbt3`.`part`.`p_size` in (43,1,25,5,35,12,42,40)))"
}
},
{
"table": {
"table_name": "partsupp",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"i_ps_partkey"
],
"key": "i_ps_partkey",
"key_length": "4",
"ref": [
"dbt3.part.p_partkey"
],
"rows": 2,
"filtered": 100,
"using_index": true,
"attached_condition": "(not(< in_optimizer >(`dbt3`.`partsupp`.`ps_suppkey`,`dbt3`.`partsupp`.`ps_suppkey` in ( < materialize > (select `dbt3`.`supplier`.`s_suppkey` from `dbt3`.`supplier` where (`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') ), < primary_index_lookup >(`dbt3`.`partsupp`.`ps_suppkey` in < temporary table > on distinct_key where ((`dbt3`.`partsupp`.`ps_suppkey` = `materialized subselect`.`s_suppkey`)))))))",
"attached_subqueries": [
{
"using_temporary_table": true,
"dependent": false,
"cacheable": true,
"table": {
"access_type": "eq_ref",
"key": "< auto_key >",
"rows": 1
},
"query_block": {
"select_id": 2,
"table": {
"table_name": "supplier",
"access_type": "ALL",
"rows": 10113,
"filtered": 100,
"attached_condition": "(`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%')"
}
}
}
]
}
}
]
}
}
}
}
Don't forget to scroll the box above to the right, because lines are long. This shows that:
  1. For the top query, we read a row from 'part', then one row from 'partsupp', then execute the subquery.
  2. the very first execution of the subquery materializes (<materialize>) select `dbt3`.`supplier`.`s_suppkey` from `dbt3`.`supplier` where (`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') into a temporary table
  3. Each subquery execution does a lookup on the primary key of this temporary table (<primary_index_lookup> ... in <temporary table>)
  4. Going further down, we see how the temporary table will be filled: it will be the resultset of a table scan ("access_type": "ALL") of 'supplier' with a filtering LIKE condition.
More details on the feature's usage can be found in the manual.

This is the end of this post. I hope that it puts in good light the work we have put into 5.6. There are many other Optimizer features in this version, like EXPLAIN FORMAT=JSON and others; they are described in my colleagues' blog posts.

PlanetMySQL Voting: Vote UP / Vote DOWN

Faster subqueries with materialization

Апрель 12th, 2012
In a previous post, I analyzed how a query of the famous DBT3 benchmark was
optimized by MySQL. It was this query, named "Q16" in the DBT3 jargon:

select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#23'
and p_type not like 'LARGE PLATED%'
and p_size in (43, 1, 25, 5, 35, 12, 42, 40)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)

group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;

Here is a brief recap of conclusions I had drawn:
  • for this query, MySQL tranforms the IN condition to EXISTS and then
    evaluates it with the "unique_subquery" technique, which does an index
    lookup into the subquery's table.
  • IN is evaluated 120,000 times (once per combined row of the outer tables).
  • The total execution time of query Q16 is 0.65 seconds.
If you look at the original subquery, before IN becomes EXISTS, you will see that it's not correlated, which means that it does not mention columns of tables of the top query's FROM clause ('partsupp' and 'part'). Thus, its resultset is constant throughout execution of the entire top query; here it is:


mysql> select
-> s_suppkey
-> from
-> supplier
-> where
-> s_comment like '%Customer%Complaints%';
+-----------+
| s_suppkey |
+-----------+
| 358 |
| 2820 |
| 3804 |
| 9504 |
+-----------+
4 rows in set (0.00 sec)

The transformation to EXISTS, because it injects equalities like
  `partsupp`.`ps_suppkey` = supplier`.`s_suppkey`
into the subquery's WHERE clause, makes the subquery correlated: it thus has to be executed 120,000 times, so we do 120,000 times this:
  • an index lookup in 'supplier' (which has 10,000 rows)
  • a test of the found row(s) against the LIKE condition.

Intuitively, determining the 4-row resultset once for all, and injecting it into the top query should yield better performance - it is fast to evaluate
  ps_suppkey not in (358, 2820, 3804, 9504) .
Starting from the just released MySQL 5.6.5, this transformation is automatically done by the Optimizer, and is called subquery materialization. The subquery's resultset is determined once for all and stored into an in-memory temporary table. If the temporary table has only 4 rows as in our example, searching for a match in it can be done with a scan; but if it had more rows, a hash index would help greatly. So a hash index is always created on the temporary table's columns. Last, this index is unique: there is no point in storing duplicates, and they would make the table bigger. After this one-time setup has been completed, each evaluation of IN simply does a hash index lookup into the 4-row temporary table.

My former colleague Timour Katchaounov started developing this feature years ago, when working for MySQL/Sun. In the last months, after a round of intensive QA, we have fixed some last bugs in it, in preparation for releasing in 5.6.5. But the feature still had one limitation: it was applicable only if IN was placed at certain positions in the query. For example, it couldn't be used with NOT IN. And query Q16 has a NOT IN! so the Optimizer could not apply subquery materialization to it, and was thus stuck with using EXISTS. Sad!

Why it could not work with NOT IN, is not very easy to explain. It has to do with NULL values, because they sometimes prevent using the hash index. To give an idea, look at this:
  (NULL, 1) NOT IN (SELECT ...)
Per the SQL standard, if the subquery's resultset contains at least one row of the form (x,1) where x is any number (or NULL), then the IN condition is neither TRUE, nor FALSE, it is UNKNOWN. So is the NOT IN condition, because it is the negation of IN, and NOT(UNKNOWN) is UNKNOWN.
Here are example of such rows: (NULL,1), (421,1), (236,1), (5329,1), ad infinitam.
We can see that those rows will not be found by a lookup in the hash index: this index is defined on the two columns, it has a usual "prefix-only" behaviour, which means that it cannot be used to search for "any value in first column, then 1 in second column". As long as the sentence starts with "any value in first column" a table scan is necessary; we should read each row of the temporary table and compare its second column with 1 until we find a matching row. And that:
  • will drag subquery materialization's performance down
  • will drag subquery materialization's code complexity up.
And I have even not covered all problems here: there can be more than two columns, there can be more than one NULL in the left argument of IN, there can also be NULLs inside the subquery's resultset.

In some lucky cases, the scan can be avoided, for example:
  SELECT * FROM table1 WHERE (a,b) IN (SELECT ...)
If (a,b) is (NULL,1), the IN will be UNKNOWN or FALSE. It will be UNKNOWN if the subquery's resultset contains one (x,1) as seen above; otherwise it will be FALSE. No matter what, it will not be TRUE, and this is all that WHERE wants to know - (a,b) can thus be rejected without doing a scan.
Now, for
  SELECT * FROM table1 WHERE (a,b) NOT IN (SELECT ...)
i.e.
  SELECT * FROM table1 WHERE NOT ((a,b) IN (SELECT ...))
things are different: if (a,b) is (NULL,1), the IN will be UNKNOWN or FALSE, as we said. So NOT IN will be UNKNOWN or TRUE. "Hum, can you be more specific?? I need to know if it's TRUE", will the WHERE evaluation code ask. Then we have to do the scan...

So now you understand why subquery materialization was restricted to certain placement of IN.

What I have done recently is to lift this restriction in two simple, however common, cases:
  1. If all outer and inner expressions are not nullable, then no NULL can get in the way, so there is no problem.
  2. If there is only one outer expression (and thus there is only one inner expression), figuring out the correct TRUE/FALSE/UNKNOWN answer is immediate. Understanding why... is left as an exercise to the reader :-)
Those two cases are independent: as long as one is satisfied, subquery materialization can apply to IN, no matter where it is placed (NOT IN, etc).

It turned out to be very easy to code this: I had a working prototype in an afternoon.

Q16 happens to meet the criteria of both cases: columns 'ps_suppkey' and 's_suppkey' are declared NOT NULL (first case), and the subquery has only one outer and one inner expression (second case).

So nowadays MySQL can, and does, use subquery materialization for query Q16; thanks to it, the execution time is down from 0.65 seconds to 0.47 seconds, which is a 25% improvement!

The new technique is visible in EXPLAIN. I want to first show how EXPLAIN was with the EXISTS transformation, so I temporarily disable subquery materialization, then run EXPLAIN, enable  subquery materialization, run EXPLAIN:


mysql> set optimizer_switch='materialization=off';
mysql> explain ...

+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| 1 | PRIMARY | part | ALL | PRIMARY | NULL | NULL | NULL | 199742 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey | i_ps_partkey | 4 | dbt3.part.p_partkey | 2 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | supplier | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+

mysql> set optimizer_switch='materialization=default'; # 'on' would work too
mysql> explain ...

+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| 1 | PRIMARY | part | ALL | PRIMARY | NULL | NULL | NULL | 199742 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey | i_ps_partkey | 4 | dbt3.part.p_partkey | 2 | Using where; Using index |
| 2 | SUBQUERY | supplier | ALL | NULL | NULL | NULL | NULL | 10113 | Using where |
+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
If you compare, the big difference is that the third line says SUBQUERY and not DEPENDENT SUBQUERY anymore. DEPENDENT SUBQUERY means that it has be executed once per row of the top query. SUBQUERY means that it is executed only once.
EXPLAIN FORMAT=JSON, another new feature in MySQL 5.6.5, shows more details of materialization:

{
"query_block": {
"select_id": 1,
"ordering_operation": {
"using_temporary_table": true,
"using_filesort": true,
"grouping_operation": {
"using_filesort": true,
"nested_loop": [
{
"table": {
"table_name": "part",
"access_type": "ALL",
"possible_keys": [
"PRIMARY"
],
"rows": 199742,
"filtered": 100,
"attached_condition": "((`dbt3`.`part`.`p_brand` <> 'Brand#23') and (not((`dbt3`.`part`.`p_type` like 'LARGE PLATED%'))) and (`dbt3`.`part`.`p_size` in (43,1,25,5,35,12,42,40)))"
}
},
{
"table": {
"table_name": "partsupp",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"i_ps_partkey"
],
"key": "i_ps_partkey",
"key_length": "4",
"ref": [
"dbt3.part.p_partkey"
],
"rows": 2,
"filtered": 100,
"using_index": true,
"attached_condition": "(not(< in_optimizer >(`dbt3`.`partsupp`.`ps_suppkey`,`dbt3`.`partsupp`.`ps_suppkey` in ( < materialize > (select `dbt3`.`supplier`.`s_suppkey` from `dbt3`.`supplier` where (`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') ), < primary_index_lookup >(`dbt3`.`partsupp`.`ps_suppkey` in < temporary table > on distinct_key where ((`dbt3`.`partsupp`.`ps_suppkey` = `materialized subselect`.`s_suppkey`)))))))",
"attached_subqueries": [
{
"using_temporary_table": true,
"dependent": false,
"cacheable": true,
"table": {
"access_type": "eq_ref",
"key": "< auto_key >",
"rows": 1
},
"query_block": {
"select_id": 2,
"table": {
"table_name": "supplier",
"access_type": "ALL",
"rows": 10113,
"filtered": 100,
"attached_condition": "(`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%')"
}
}
}
]
}
}
]
}
}
}
}
Don't forget to scroll the box above to the right, because lines are long. This shows that:
  1. For the top query, we read a row from 'part', then one row from 'partsupp', then execute the subquery.
  2. the very first execution of the subquery materializes (<materialize>) select `dbt3`.`supplier`.`s_suppkey` from `dbt3`.`supplier` where (`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') into a temporary table
  3. Each subquery execution does a lookup on the primary key of this temporary table (<primary_index_lookup> ... in <temporary table>)
  4. Going further down, we see how the temporary table will be filled: it will be the resultset of a table scan ("access_type": "ALL") of 'supplier' with a filtering LIKE condition.
More details on the feature's usage can be found in the manual.

This is the end of this post. I hope that it puts in good light the work we have put into 5.6. There are many other Optimizer features in this version, like EXPLAIN FORMAT=JSON and others; they are described in my colleagues' blog posts.

PlanetMySQL Voting: Vote UP / Vote DOWN

Join Optimizations in MySQL 5.6 and MariaDB 5.5

Апрель 5th, 2012

This is the third blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is targeted at the join related optimizations introduced in the optimizer. These optimizations are available in both MySQL 5.6 and MariaDB 5.5, and MariaDB 5.5 has introduced some additional optimizations which we will also look at, in this post.

Now let me briefly explain these optimizations.

Batched Key Access

Traditionally, MySQL always uses Nested Loop Join to join two or more tables. What this means is that, select rows from first table participating in the joins are read, and then for each of these rows an index lookup is performed on the second table. This means many point queries, say for example if table1 yields 1000 rows then 1000 index lookups are performed on table2. Of course this could mean a lot of random lookups in case the dataset does not fit into memory or if the caches are not warm, this also does not take into account locality of access, as you could be reading the same page multiple times. Consider if you have a small buffer pool and a very large number of rows from table2 that do not fit into the buffer pool, then worst case you could be reading the same page multiple times into the buffer pool.
So considering this drawback, Batched Key Access (BKA) optimization was introduced. When BKA is being used then, after the selected rows are read from table1, the values of indexed columns that will be used to perform a lookup on table2 are batched together and sent to the storage engine. The storage engine then uses the MRR interface (which I explained in my previous post) to lookup the rows in table2. So this means that we have traded many point index lookups to one or more index range lookups. This means MySQL can employ many other optimizations like for example if columns other then the secondary key columns are required to be fetched, then MRR interface can instead access the PK by arranging the secondary key values by PK column and then performing the lookup, and then there are other possibilities like InnoDB doing read_ahead by noticing the sequential access pattern.
BKA is available in both MySQL 5.6 and MariaDB 5.5. You can read more about BKA in MySQL 5.6 here and BKA in MariaDB 5.5 here.

However, MariaDB 5.5 has one additional optimization that is used in conjunction with BKA and that is MRR Key-ordered Scan. Let’s see what this optimization actually is.

Key-ordered Scan for BKA

With this optimization the idea of MRR is further extended to improve join performance. As I told you above, when table t1 would be joined to table t2, then selected rows from t1 would be read and then for all rows, index lookup would be performed on t2. Here is where Key-ordered scan comes in. With BKA turned on, the index lookups to t2 are batched together and sent to the MRR interface. Now when the MRR interface is going to perform the lookup on t2 by a secondary key on the common join column, let’s suppose col1, then with Key-ordered scan, the secondary key on col1 would be sorted by col1 i.e. in index order and then the lookup will be performed. So key-ordered scan is basically an extension of MRR to secondary keys. You can read more about Key-ordered Scan for BKA here.

Now there is one additional join optimization in MariaDB that I would like to quickly explain here, and that is Hash Joins.

Hash Join

As I have told before MySQL has only supported one join algorithm and that is Nested Loop Join. MariaDB has introduced a new join algorithm Hash Join. This join algorithm only works with equi-joins. Now let me briefly explain how hash join algorithm works. Suppose you have two tables t1 and t2, that you wish to join together in an equi-join, then the way hash join algorithm will operate is that first a hash table will be created based on the columns of table t1 that are two be used to join rows with table t2. This step is called the build step. After the hash table has been created, rows from table t2 are read and hash function is applied to the columns participating in the join condition and then a hash table lookup is performed, on the hash table we created earlier. This step is known as probe step. This join algorithm works best for highly selective queries, and obviously when the first operand used in the build step is such that the hash table fits in memory. You can read more about the hash join algorithm here.

Now let’s move on to the benchmarks, to see the difference in numbers.

Benchmark results

For the purpose of this benchmark, I have used TPC-H Query #3 and ran it on TPC-H dataset (InnoDB tables) with a Scale Factor of 2 (InnoDB dataset size ~5G). Note that query cache is disabled during these benchmark runs and that the disks are 4 5.4K disks in Software RAID5.

Also note that the following changes were made on MySQL 5.6 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='batched_key_access=on'
join_buffer_size=6M
read_rnd_buffer_size=6M

Also note that the following changes were made on MariaDB 5.5 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=8
join_buffer_size=6M
mrr_buffer_size=6M

Note that I have turned off ICP because I would like to see the affect of BKA on the query times. But I also had to turn on MRR because BKA uses the MRR interface. Note also the additional optimizer switches and variables in case of MariaDB 5.5. You can read more about these variables here.

The query used is:

select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
from
        customer,
        orders,
        lineitem FORCE INDEX (i_l_orderkey)
where
        c_mktsegment = 'AUTOMOBILE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < '1995-03-09'
        and l_shipdate > '1995-03-09'
group by
        l_orderkey,
        o_orderdate,
        o_shippriority
order by
        revenue desc,
        o_orderdate
LIMIT 10;

In-memory workload

Now let’s see how effective are the join optimizations when the workload fits entirely in memory. For the purpose of benchmarking in-memory workload, the InnoDB buffer pool size is set to 6G and the buffer pool was warmed up, so that the relevant pages were already loaded in the buffer pool. Note that as mentioned at the start of the benchmark results section, the InnoDB dataset size is ~5G. Now let’s take a look at the graph:

In the graph above the labels mean as follows:
MySQL 5.6 (2) is for MySQL 5.6 w/ buffer size 6M
MariaDB 5.5 (2) is for MariaDB 5.5 w/ buffer size 6M
MariaDB 5.5 (3) is for MariaDB 5.5 w/ buffer size 6M (Hash Join and Key-ordered Scan disabled)

You can see that the lowest query time is for MySQL 5.6 which takes 0.16s less as compared to MySQL 5.5 While with join_buffer_size set to 6M and read_rnd_buffer_size set to 6M, the query time for MySQL 5.6 becomes approximately equal to that of MySQL 5.5. MariaDB 5.5 is quite slow as compared to both MySQL 5.5 and MySQL 5.6. For MariaDB 5.5 the query time decreases when the join_buffer_size is set to 6M and mrr_buffer_size is set to 6M. While the lowest query time for MariaDB 5.5 is when both the Hash Join and the Key-ordered Scan are disabled.

IO bound workload

Now let’s see how effective are the join optimizations when the workload is IO bound. For the purpose of benchmarking IO bound workload, the InnoDB buffer pool size is set to 1G and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already. Now let’s take a look at the graph:

The labels in the graph above are same as in the graph for in-memory workload.

BKA makes a huge difference when the workload is IO bound and the query time drops from 2534.41s down to under a minute. MySQL 5.6 has the smallest query time when the join_buffer_size is set to 6M and the read_rnd_buffer_size is set to 6M. MariaDB 5.5 is close, yet slower by ~10s. Another thing that is important to know is that, just increasing the join_buffer_size is not going to provide the best possible query performance. As I mentioned at the start that BKA uses the MRR interface so you should also adjust/increase the size of the buffer that MRR uses appropriately. In my tests I noticed that with just the join_buffer_size increased to 6M, the query time was ~300s while when both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size were set to 6M, the query time dropped to ~40s. So the maximum possible benefit is when size of both the buffers is increased appropriately. Also in the chart above the bottom two bars correspond to MariaDB with Hash Join and Key-ordered Scan enabled, and MariaDB with Hash Join and Key-ordered Scan disabled, and the only difference in query time is 48.78s vs 48.91s, so I don’t see Hash Join and Key-ordered Scan making much of a difference here.

Now let’s take a look at the status counters.

Counter Name MySQL 5.5 MySQL 5.6 MySQL 5.6 w/ join_buffer_size=6M & read_rnd_buffer_size=6M MariaDB 5.5 MariaDB 5.5 w/ join_buffer_size=6M & mrr_buffer_size=6M MariaDB 5.5 Hash Join Disabled w/ join_buffer_size=4M & mrr_buffer_size=4M
Created_tmp_disk_tables 0 0 0 0 0 0
Created_tmp_tables 1 1 1 1 1 1
Handler_mrr_init N/A 112 3 1133 1 1
Handler_mrr_key_refills N/A N/A N/A 0 0 0
Handler_mrr_rowid_refills N/A N/A N/A 22 0 0
Handler_read_key 1829910 4440511 4372096 1801917 1790641 1805995
Handler_read_next 2651500 2628103 2586036 2628103 2592816 2615612
Handler_read_rnd_next 23078 22780 22757 22867 23049 23221
Innodb_buffer_pool_read_ahead 1152 23231 130919 23228 130731 131497
Innodb_buffer_pool_read_requests 12947073 10228611 7816154 10251697 9319281 9393396
Innodb_buffer_pool_reads 327963 332092 12889 335080 14067 13384
Innodb_data_read 5G 5.4G 2.2G 5.4G 2.2G 2.2G
Innodb_data_reads 329115 355323 143808 335526 16164 15506
Innodb_pages_read 329115 355323 143808 358308 144798 144881
Innodb_rows_read 4127318 6718351 6611675 6707882 5479445 5527245
Select_scan 1 1 1 1 1 1
Sort_scan 1 1 1 1 1 1

The first obvious improvement is shown by the high numbers of Innodb_buffer_pool_read_ahead when the buffers are sized appropriately, which shows that the IO access pattern has been changed to become sequential. The two other most important numbers are values for Innodb_buffer_pool_reads and Innodb_data_read. We can see that with appropriately sized buffers less no. of Innodb_buffer_pool_reads are done, and less amount of data is read from disk, in fact half the amount of data is read from disk 2.2G vs 5G. However, there is one number in MariaDB 5.5 that is quite large as compared to MySQL 5.6 and that is ‘Handler_mrr_init’. Is it because of this that the query on MariaDB 5.5 is slow as compared to MySQL 5.6. Next interesting thing are the last two columns of the table above and the values for ‘Handler_read_key’, ‘Handler_read_next’ and ‘Handler_read_rnd_next’. MariaDB 5.5 with Hash Joins enabled is doing less Handler reads as compared to when the Hash Joins are disabled, but that did not make much of a difference in the query times.

Conclusion

BKA improves the query time by a huge margin for IO bound workload but does not make much of a difference to in-memory workload. Also BKA relies on both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size and both of these buffers should be increased appropriately for the best possible performance gain. This is not entirely visible in the manual either for MariaDB or MySQL, but you need to appropriately increase read_rnd_buffer_size/mrr_buffer_size because these have an impact on MRR performance, and BKA uses the MRR interface, so these buffers indirectly impact BKA performance. I did not find much of a performance improvement from using Hash Join in MariaDB 5.5 or from Key-ordered Scan for TPCH query #3, in fact disabling both of these provided the best result for MariaDB 5.5 for in-memory workload.

I intend to run tests to see what specific types of queries would benefit from Hash Join as compared to Nested Loop Join, but for now it looks like Nested Loop Join is a much better general purpose join algorithm.


PlanetMySQL Voting: Vote UP / Vote DOWN

Join Optimizations in MySQL 5.6 and MariaDB 5.5

Апрель 5th, 2012

This is the third blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is targeted at the join related optimizations introduced in the optimizer. These optimizations are available in both MySQL 5.6 and MariaDB 5.5, and MariaDB 5.5 has introduced some additional optimizations which we will also look at, in this post.

Now let me briefly explain these optimizations.

Batched Key Access

Traditionally, MySQL always uses Nested Loop Join to join two or more tables. What this means is that, select rows from first table participating in the joins are read, and then for each of these rows an index lookup is performed on the second table. This means many point queries, say for example if table1 yields 1000 rows then 1000 index lookups are performed on table2. Of course this could mean a lot of random lookups in case the dataset does not fit into memory or if the caches are not warm, this also does not take into account locality of access, as you could be reading the same page multiple times. Consider if you have a small buffer pool and a very large number of rows from table2 that do not fit into the buffer pool, then worst case you could be reading the same page multiple times into the buffer pool.
So considering this drawback, Batched Key Access (BKA) optimization was introduced. When BKA is being used then, after the selected rows are read from table1, the values of indexed columns that will be used to perform a lookup on table2 are batched together and sent to the storage engine. The storage engine then uses the MRR interface (which I explained in my previous post) to lookup the rows in table2. So this means that we have traded many point index lookups to one or more index range lookups. This means MySQL can employ many other optimizations like for example if columns other then the secondary key columns are required to be fetched, then MRR interface can instead access the PK by arranging the secondary key values by PK column and then performing the lookup, and then there are other possibilities like InnoDB doing read_ahead by noticing the sequential access pattern.
BKA is available in both MySQL 5.6 and MariaDB 5.5. You can read more about BKA in MySQL 5.6 here and BKA in MariaDB 5.5 here.

However, MariaDB 5.5 has one additional optimization that is used in conjunction with BKA and that is MRR Key-ordered Scan. Let’s see what this optimization actually is.

Key-ordered Scan for BKA

With this optimization the idea of MRR is further extended to improve join performance. As I told you above, when table t1 would be joined to table t2, then selected rows from t1 would be read and then for all rows, index lookup would be performed on t2. Here is where Key-ordered scan comes in. With BKA turned on, the index lookups to t2 are batched together and sent to the MRR interface. Now when the MRR interface is going to perform the lookup on t2 by a secondary key on the common join column, let’s suppose col1, then with Key-ordered scan, the secondary key on col1 would be sorted by col1 i.e. in index order and then the lookup will be performed. So key-ordered scan is basically an extension of MRR to secondary keys. You can read more about Key-ordered Scan for BKA here.

Now there is one additional join optimization in MariaDB that I would like to quickly explain here, and that is Hash Joins.

Hash Join

As I have told before MySQL has only supported one join algorithm and that is Nested Loop Join. MariaDB has introduced a new join algorithm Hash Join. This join algorithm only works with equi-joins. Now let me briefly explain how hash join algorithm works. Suppose you have two tables t1 and t2, that you wish to join together in an equi-join, then the way hash join algorithm will operate is that first a hash table will be created based on the columns of table t1 that are two be used to join rows with table t2. This step is called the build step. After the hash table has been created, rows from table t2 are read and hash function is applied to the columns participating in the join condition and then a hash table lookup is performed, on the hash table we created earlier. This step is known as probe step. This join algorithm works best for highly selective queries, and obviously when the first operand used in the build step is such that the hash table fits in memory. You can read more about the hash join algorithm here.

Now let’s move on to the benchmarks, to see the difference in numbers.

Benchmark results

For the purpose of this benchmark, I have used TPC-H Query #3 and ran it on TPC-H dataset (InnoDB tables) with a Scale Factor of 2 (InnoDB dataset size ~5G). Note that query cache is disabled during these benchmark runs and that the disks are 4 5.4K disks in Software RAID5.

Also note that the following changes were made on MySQL 5.6 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='batched_key_access=on'
join_buffer_size=6M
read_rnd_buffer_size=6M

Also note that the following changes were made on MariaDB 5.5 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=8
join_buffer_size=6M
mrr_buffer_size=6M

Note that I have turned off ICP because I would like to see the affect of BKA on the query times. But I also had to turn on MRR because BKA uses the MRR interface. Note also the additional optimizer switches and variables in case of MariaDB 5.5. You can read more about these variables here.

The query used is:

select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
from
        customer,
        orders,
        lineitem FORCE INDEX (i_l_orderkey)
where
        c_mktsegment = 'AUTOMOBILE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < '1995-03-09'
        and l_shipdate > '1995-03-09'
group by
        l_orderkey,
        o_orderdate,
        o_shippriority
order by
        revenue desc,
        o_orderdate
LIMIT 10;

In-memory workload

Now let’s see how effective are the join optimizations when the workload fits entirely in memory. For the purpose of benchmarking in-memory workload, the InnoDB buffer pool size is set to 6G and the buffer pool was warmed up, so that the relevant pages were already loaded in the buffer pool. Note that as mentioned at the start of the benchmark results section, the InnoDB dataset size is ~5G. Now let’s take a look at the graph:

In the graph above the labels mean as follows:
MySQL 5.6 (2) is for MySQL 5.6 w/ buffer size 6M
MariaDB 5.5 (2) is for MariaDB 5.5 w/ buffer size 6M
MariaDB 5.5 (3) is for MariaDB 5.5 w/ buffer size 6M (Hash Join and Key-ordered Scan disabled)

You can see that the lowest query time is for MySQL 5.6 which takes 0.16s less as compared to MySQL 5.5 While with join_buffer_size set to 6M and read_rnd_buffer_size set to 6M, the query time for MySQL 5.6 becomes approximately equal to that of MySQL 5.5. MariaDB 5.5 is quite slow as compared to both MySQL 5.5 and MySQL 5.6. For MariaDB 5.5 the query time decreases when the join_buffer_size is set to 6M and mrr_buffer_size is set to 6M. While the lowest query time for MariaDB 5.5 is when both the Hash Join and the Key-ordered Scan are disabled.

IO bound workload

Now let’s see how effective are the join optimizations when the workload is IO bound. For the purpose of benchmarking IO bound workload, the InnoDB buffer pool size is set to 1G and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already. Now let’s take a look at the graph:

The labels in the graph above are same as in the graph for in-memory workload.

BKA makes a huge difference when the workload is IO bound and the query time drops from 2534.41s down to under a minute. MySQL 5.6 has the smallest query time when the join_buffer_size is set to 6M and the read_rnd_buffer_size is set to 6M. MariaDB 5.5 is close, yet slower by ~10s. Another thing that is important to know is that, just increasing the join_buffer_size is not going to provide the best possible query performance. As I mentioned at the start that BKA uses the MRR interface so you should also adjust/increase the size of the buffer that MRR uses appropriately. In my tests I noticed that with just the join_buffer_size increased to 6M, the query time was ~300s while when both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size were set to 6M, the query time dropped to ~40s. So the maximum possible benefit is when size of both the buffers is increased appropriately. Also in the chart above the bottom two bars correspond to MariaDB with Hash Join and Key-ordered Scan enabled, and MariaDB with Hash Join and Key-ordered Scan disabled, and the only difference in query time is 48.78s vs 48.91s, so I don’t see Hash Join and Key-ordered Scan making much of a difference here.

Now let’s take a look at the status counters.

Counter Name MySQL 5.5 MySQL 5.6 MySQL 5.6 w/ join_buffer_size=6M & read_rnd_buffer_size=6M MariaDB 5.5 MariaDB 5.5 w/ join_buffer_size=6M & mrr_buffer_size=6M MariaDB 5.5 Hash Join Disabled w/ join_buffer_size=4M & mrr_buffer_size=4M
Created_tmp_disk_tables 0 0 0 0 0 0
Created_tmp_tables 1 1 1 1 1 1
Handler_mrr_init N/A 112 3 1133 1 1
Handler_mrr_key_refills N/A N/A N/A 0 0 0
Handler_mrr_rowid_refills N/A N/A N/A 22 0 0
Handler_read_key 1829910 4440511 4372096 1801917 1790641 1805995
Handler_read_next 2651500 2628103 2586036 2628103 2592816 2615612
Handler_read_rnd_next 23078 22780 22757 22867 23049 23221
Innodb_buffer_pool_read_ahead 1152 23231 130919 23228 130731 131497
Innodb_buffer_pool_read_requests 12947073 10228611 7816154 10251697 9319281 9393396
Innodb_buffer_pool_reads 327963 332092 12889 335080 14067 13384
Innodb_data_read 5G 5.4G 2.2G 5.4G 2.2G 2.2G
Innodb_data_reads 329115 355323 143808 335526 16164 15506
Innodb_pages_read 329115 355323 143808 358308 144798 144881
Innodb_rows_read 4127318 6718351 6611675 6707882 5479445 5527245
Select_scan 1 1 1 1 1 1
Sort_scan 1 1 1 1 1 1

The first obvious improvement is shown by the high numbers of Innodb_buffer_pool_read_ahead when the buffers are sized appropriately, which shows that the IO access pattern has been changed to become sequential. The two other most important numbers are values for Innodb_buffer_pool_reads and Innodb_data_read. We can see that with appropriately sized buffers less no. of Innodb_buffer_pool_reads are done, and less amount of data is read from disk, in fact half the amount of data is read from disk 2.2G vs 5G. However, there is one number in MariaDB 5.5 that is quite large as compared to MySQL 5.6 and that is ‘Handler_mrr_init’. Is it because of this that the query on MariaDB 5.5 is slow as compared to MySQL 5.6. Next interesting thing are the last two columns of the table above and the values for ‘Handler_read_key’, ‘Handler_read_next’ and ‘Handler_read_rnd_next’. MariaDB 5.5 with Hash Joins enabled is doing less Handler reads as compared to when the Hash Joins are disabled, but that did not make much of a difference in the query times.

Conclusion

BKA improves the query time by a huge margin for IO bound workload but does not make much of a difference to in-memory workload. Also BKA relies on both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size and both of these buffers should be increased appropriately for the best possible performance gain. This is not entirely visible in the manual either for MariaDB or MySQL, but you need to appropriately increase read_rnd_buffer_size/mrr_buffer_size because these have an impact on MRR performance, and BKA uses the MRR interface, so these buffers indirectly impact BKA performance. I did not find much of a performance improvement from using Hash Join in MariaDB 5.5 or from Key-ordered Scan for TPCH query #3, in fact disabling both of these provided the best result for MariaDB 5.5 for in-memory workload.

I intend to run tests to see what specific types of queries would benefit from Hash Join as compared to Nested Loop Join, but for now it looks like Nested Loop Join is a much better general purpose join algorithm.


PlanetMySQL Voting: Vote UP / Vote DOWN

Multi Range Read (MRR) in MySQL 5.6 and MariaDB 5.5

Март 21st, 2012
I have written a second blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is aimed at the optimizer enhancement Multi Range Read (MRR). Its available in both MySQL 5.6 and MariaDB 5.5
PlanetMySQL Voting: Vote UP / Vote DOWN

Multi Range Read (MRR) in MySQL 5.6 and MariaDB 5.5

Март 21st, 2012

This is the second blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is aimed at the optimizer enhancement Multi Range Read (MRR). Its available in both MySQL 5.6 and MariaDB 5.5

Now let’s take a look at what this optimization actually is and what benefits it brings.

Multi Range Read

With traditional secondary index lookups, if the columns that are being fetched do not belong to the secondary index definition (and hence covering index optimization is not used), then primary key lookups have to be performed for each secondary key entry fetched. This means that secondary key lookups for column values that do not belong to the secondary index definition can result in a lot of Random I/O. The purpose of MRR is to reduce this Random I/O and make it more sequential, by having a buffer in between where secondary key tuples are buffered and then sorted by the primary key values, and then instead of point primary key lookups, a range lookup is performed on the primary key by using the sorted primary key values.

Let me give you a simple example. Suppose you have the following query executed on the InnoDB table:

SELECT non_key_column FROM tbl WHERE key_column=x

This query will roughly be evaluated in following steps, without MRR:

  1. SELECT key_column, pk_column FROM tbl WHERE key_column=x ORDER BY key_column
    (Note that secondary keys in InnoDB contain primary key columns)
  2. For each pk_column value in step 1 do:
    SELECT non_key_column FROM tbl WHERE pk_column=val

As you can see that the values returned from Step 1 are sorted by the secondary key column ‘key_column’, and then for each value of ‘pk_column’ which is a part of the secondary key tuple, a point primary key lookup is made against base table, the number of these point primary key lookups will be depend on the number of rows that match the condition ‘key_column=x’. You can see that there are a lot of random primary key lookups made.

With MRR, then steps above are changed to the following:

  1. SELECT key_column, pk_column FROM tbl WHERE key_column=x ORDER BY key_column
    (Note that secondary keys in InnoDB contain primary key columns)
  2. Buffer each pk_column value fetched from step 1, and when the buffer is full sort them by pk_column, and do a range primary key lookup as follows:
    SELECT non_key_column from tbl WHERE pk_column IN (…)

As you can see by utilizing the buffer for sorting the secondary key tuples by pk_column, we have converted a lot of point primary key lookups to one or more range primary key lookup. Thereby, converting Random access to one or more sequential access. There is one another interesting thing that has come up here, and that is the importance of the size of the buffer used for sorting the secondary key tuples. If the buffer size is large enough only a single range lookup will be needed, however if the buffer size is small as compared to the combined size of the secondary key tuples fetched, then the number of range lookups will be:

CEIL(S/N)
where,
S is the combined size of the secondary key tuples fetched, and
N is the buffer size.

In MySQL 5.6 the buffer size used by MRR can be controlled by the variable read_rnd_buffer_size, while MariaDB introduces a different variable to control the MRR buffer size mrr_buffer_size. Both buffer sizes default to 256K in MySQL 5.6 and MariaDB 5.5 respectively, which might be low depending on your scenario.

You can read more about the MRR optimization available in MySQL 5.6 here:
http://dev.mysql.com/doc/refman/5.6/en/mrr-optimization.html
and as available in MariaDB 5.5 here:
http://kb.askmonty.org/en/multi-range-read-optimization

Now let’s move on to the benchmarks, to see the difference in numbers.

Benchmark results

For the purpose of this benchmark, I have used TPC-H Query #10 and ran it on TPC-H dataset (InnoDB tables) with a Scale Factor of 2 (InnoDB dataset size ~5G). I did not use Scale Factor of 40 (InnoDB dataset size ~95G), because the query was taking far too long to execute, ~11 hours in case of MySQL 5.5 and ~5 hours in case of MySQL 5.6 and MariaDB 5.5. Note that query cache is disabled during these benchmark runs and that the disks are 4 5.4K disks in Software RAID5.

Also note that the following changes were made in the MySQL config:
optimizer_switch=’index_condition_pushdown=off’
optimizer_switch=’mrr=on’
optimizer_switch=’mrr_sort_keys=on’ (only on MariaDB 5.5)
optimizer_switch=’mrr_cost_based=off’
read_rnd_buffer_size=4M (only on MySQL 5.6)
mrr_buffer_size=4M (only on MariaDB 5.5)

We have turned off ICP optimization for the purpose of this particular benchmark, because we want to see the individual affect of an optimization (where possible). Also note that we have turned off mrr_cost_based, this is because the cost based algorithm used to calculate the cost of MRR when the optimizer is choosing the query execution plan, is not sufficiently tuned and it is recommended to turn this off.

The query used is:

select
        c_custkey,
        c_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        c_acctbal,
        n_name,
        c_address,
        c_phone,
        c_comment
from
        customer,
        orders,
        lineitem,
        nation
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate >= '1993-08-01'
        and o_orderdate < date_add( '1993-08-01' ,interval '3' month)
        and l_returnflag = 'R'
        and c_nationkey = n_nationkey
group by
        c_custkey,
        c_name,
        c_acctbal,
        c_phone,
        n_name,
        c_address,
        c_comment
order by
        revenue desc
LIMIT 20;

In-memory workload

Now let's see how effective is MRR when the workload fits entirely in memory. For the purpose of benchmarking in-memory workload, the InnoDB buffer pool size is set to 6G and the buffer pool was warmed up, so that the relevant pages were already loaded in the buffer pool. Note that as mentioned at the start of the benchmark results section, the InnoDB dataset size is ~5G. Ok so now let's take a look at the graph:

MRR doesn't really make any positive difference to the query times for MySQL 5.6, when the workload fits entirely in memory, because there is no extra cost for memory access at random locations versus memory access at sequential locations. In fact there is extra cost added by the buffering step introduced by MRR, and hence, there is a slight increase in query time for MySQL 5.6, increase of 0.02s. But the query times for MariaDB 5.5 are greater than both MySQL 5.5 and MySQL 5.6

IO bound workload

Now let's see how effective is MRR when the workload is IO bound. For the purpose of benchmarking IO bound workload, the InnoDB buffer pool size is set to 1G and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already:

MRR does make a lot of difference when the workload is IO bound, the query time is decreased from ~11min to under a minute. The query time is reduced further when the buffer size is set to 4M. Note also that query time for MariaDB is still a little higher by a couple of seconds, when compared to MySQL 5.6.

Now let's take a look at the status counters.

MySQL Status Counters

These status counters were captured when performing the benchmark on IO bound workload, mentioned above.

Counter Name MySQL 5.5 MySQL 5.6 MySQL 5.6 w/ read_rnd_bufer_size=4M MariaDB 5.5 MariaDB 5.5 w/ mrr_buffer_size=4M
Created_tmp_disk_tables 1 1 1 1 1
Created_tmp_tables 1 1 1 1 1
Handler_mrr_init N/A 0 0 1 1
Handler_mrr_rowid_refills N/A N/A N/A 1 0
Handler_read_key 508833 623738 622184 508913 507516
Handler_read_next 574320 574320 572889 574320 572889
Handler_read_rnd_next 136077 136094 136366 136163 136435
Innodb_buffer_pool_read_ahead 0 20920 23669 20920 23734
Innodb_buffer_pool_read_requests 1361851 1264739 1235472 1263290 1235781
Innodb_buffer_pool_reads 120548 102948 76882 102672 76832
Innodb_data_read 1.84G 1.89G 1.53G 1.89G 1.53G
Innodb_data_reads 120552 123872 100551 103011 77213
Innodb_pages_read 120548 123868 100551 123592 100566
Innodb_rows_read 799239 914146 912318 914146 912318
Select_scan 1 1 1 1 1
Sort_scan 1 1 1 1 1
  • As you can see from the status counters above that both MySQL 5.6 and MariaDB 5.5 are reporting high numbers for Innodb_buffer_pool_read_ahead which shows that the access pattern was sequential and hence InnoDB decided to do read_ahead, while in MySQL 5.5 no read_ahead was done because the access pattern was not sequential. Another thing to note is that more read_ahead is done when the buffer size used by MRR, is set to 4M, which obviously means that the more index tuples that can fit in the buffer the more sequential the access pattern will be.
  • There is one MRR related variable introduced in MySQL 5.6 and MariaDB 5.5 Handler_mrr_init and another additional one introduced in MariaDB 5.5 Handler_mrr_rowid_refills. Handler_mrr_init is incremented when a MRR range scan is performed, but we can see its only incremented in MariaDB 5.5 and not in MySQL 5.6, is that because of a bug in MySQL 5.6 code? As MRR was used in both MySQL 5.6 and MariaDB 5.5. Handler_mrr_rowid_refills counts how many times the buffer used by MRR had to be reinitialized, because the buffer was small and not all index tuples could fit in the buffer. If this is > 0 then it means Handler_mrr_rowid_refills + 1 MRR range scans had to be performed. As in the table above you can with default buffer size of 256K, MariaDB 5.5 shows that Handler_mrr_rowid_refills = 1, which means the buffer is small and there were 2 MRR range scans needed. But with a buffer size of 4M, we can see that Handler_mrr_rowid_refills = 0, which means that the buffer was big enough and only 1 MRR range scan was needed, which is as efficient as it can be. This is also evident in the query times, which is lower by a couple of seconds when buffer size of 4M is used.
  • Another interesting thing to note is that MySQL 5.6 and MariaDB 5.5 are both reading more rows than MySQL 5.5, as can be seen by the numbers reported for the status counter Innodb_rows_read. While MySQL 5.6 is also reporting increased numbers for the counter Handler_read_key. This is because of how status counter values are incremented when index lookup is performed. As I explained at the start of the post that traditional index lookup (for non-index-only columns) involves, reading an index record, and then using the PK column value in the index record to make a lookup in the PK. Both these lookups are performed in a single call to the storage engine and the counters Handler_read_key and Innodb_rows_read are incremented by ONE. However, when MRR is used then there are two separate calls made to the storage engine to perform the index record read and then to perform the MRR range scan on the PK. This causes the counters Handler_read_key and Innodb_rows_read to be incremented by TWO. It does not actually mean that queries with MRR are performing badly. The interesting thing is that though both MariaDB and MySQL 5.6 are reporting high numbers for Innodb_rows_read, which is completely in line with how the counters behave with MRR, but the value for counter Handler_read_key is more or less the same for MariaDB 5.5 when compared to MySQL 5.5, and this does not make sense to me. Probably its due to a bug in how counter is calculated inside MariaDB?

Other Observations

Sometimes both for MariaDB 5.5 and MySQL 5.6, the optimizer chooses the wrong query execution plan. Let's take a look at what are the good and bad query execution plans.

a. Bad Plan

id      select_type     table   type    possible_keys   key     key_len ref     rows    filtered        Extra
1       SIMPLE  nation  ALL     PRIMARY NULL    NULL    NULL    25      100.00  Using temporary; Using filesort
1       SIMPLE  customer        ref     PRIMARY,i_c_nationkey   i_c_nationkey   5       dbt3.nation.n_nationkey 2123    100.00
1       SIMPLE  orders  ref     PRIMARY,i_o_orderdate,i_o_custkey       i_o_custkey     5       dbt3.customer.c_custkey 7       100.00  Using where
1       SIMPLE  lineitem        ref     PRIMARY,i_l_orderkey,i_l_orderkey_quantity      PRIMARY 4       dbt3.orders.o_orderkey  1       100.00  Using where

b. Good Plan

id      select_type     table   type    possible_keys   key     key_len ref     rows    filtered        Extra
1       SIMPLE  orders  range   PRIMARY,i_o_orderdate,i_o_custkey       i_o_orderdate   4       NULL    232722  100.00  Using where; Rowid-ordered scan; Using temporary; Using filesort
1       SIMPLE  customer        eq_ref  PRIMARY,i_c_nationkey   PRIMARY 4       dbt3.orders.o_custkey   1       100.00  Using where
1       SIMPLE  nation  eq_ref  PRIMARY PRIMARY 4       dbt3.customer.c_nationkey       1       100.00
1       SIMPLE  lineitem        ref     PRIMARY,i_l_orderkey,i_l_orderkey_quantity      PRIMARY 4       dbt3.orders.o_orderkey  2       100.00  Using where

So during cold query runs the optimizer would switch to using plan 'a', which does not involve MRR, and the query time for MySQL 5.6 and MariaDB 5.5 jumps to ~11min (this is the query time for MySQL 5.5) While when it sticks to plan 'b' for MySQL 5.6 and MariaDB 5.5, then query times remain under a minute. So when the correct query execution plan is not used, there is no difference in query times between MySQL 5.5 and MySQL 5.6/MariaDB 5.5 This is another area of improvement in the optimizer, as it is clearly a part of the optimizer's job to select the best query execution plan. I had noted a similar thing when benchmarking ICP, the optimizer made a wrong choice. It looks like that there is still improvement and changes needed in the optimizer's cost estimation algorithm.

MariaDB 5.5 expands the concept of MRR to improve the performance of secondary key lookups as well. But this works only with joins and specifically with Block Access Join Algorithms. So I am not going to cover it here, but will cover it in my next post which will be on Block Access Join Algorithms.

Conclusion

There is a huge speedup when the workload is IO bound, the query time goes down from ~11min to under a minute. The query time is reduced further when buffer size is set large enough so that the index tuples fit in the buffer. But there is no performance improvement when the workload is in-memory, in fact MRR adds extra sorting overhead which means that the queries are just a bit slower as compared to MySQL 5.5 MRR clearly changes the access pattern to sequential, and hence InnoDB is able to do many read_aheads. Another thing to take away is that MariaDB is just a bit slower as compared to MySQL 5.6, may be something for the MariaDB guys to look at.


PlanetMySQL Voting: Vote UP / Vote DOWN

Index Condition Pushdown in MySQL 5.6 and MariaDB 5.5 and its performance impact

Март 13th, 2012
I have been working with Peter in preparation for the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. We are taking a look at and benchmarking optimizer enhancements one by one. So in the same way this blog post is aimed at a new optimizer enhancement Index Condition Pushdown (ICP). Its available in both MySQL 5.6 and MariaDB 5.5
PlanetMySQL Voting: Vote UP / Vote DOWN

Index Condition Pushdown in MySQL 5.6 and MariaDB 5.5 and its performance impact

Март 13th, 2012
I have been working with Peter in preparation for the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. We are taking a look at and benchmarking optimizer enhancements one by one. So in the same way this blog post is aimed at a new optimizer enhancement Index Condition Pushdown (ICP). Its available in both MySQL 5.6 and MariaDB 5.5
PlanetMySQL Voting: Vote UP / Vote DOWN

Index Condition Pushdown in MySQL 5.6 and MariaDB 5.5 and its performance impact

Март 12th, 2012

I have been working with Peter in preparation for the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. We are taking a look at and benchmarking optimizer enhancements one by one. So in the same way this blog post is aimed at a new optimizer enhancement Index Condition Pushdown (ICP). Its available in both MySQL 5.6 and MariaDB 5.5

Now let’s take a look briefly at what this enhancement actually is, and what is it aimed at.

Index Condition Pushdown

Traditional B-Tree index lookups have some limitations in cases such as range scans, where index parts after the part on which range condition is applied cannot be used for filtering records. For example, suppose you have a key defined as:

KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`)

and the WHERE condition defined as:

l_partkey = x
and l_quantity >= 1 and l_quantity <= 1+10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'

Then MySQL will use the key as if its only defined as including columns l_partkey and l_quantity and will not filter using the columns l_shipmode and l_shipinstruct. And so all rows matching condition l_partkey = x and and l_quantity >= 1 and l_quantity <= 1+10 will be fetched from the Primary Key and returned to MySQL server which will then in turn apply the remaining parts of the WHERE clause l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON'. So clearly if you have thousands of rows that match l_partkey and l_quantity but only a few hundred that match all the condition, then obviously you would be reading a lot of unnecessary data.

This is where ICP comes into play. With ICP, the server pushes down all conditions of the WHERE clause that match the key definition to the storage engine and then filtering is done in two steps:

  • Filter by the prefix of the index using traditional B-Tree index search
  • Filter by applying the where condition on the index entries fetched

You can read more about index condition pushdown as available in MySQL 5.6 here:
http://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html
and as available in MariaDB 5.5 here:
http://kb.askmonty.org/en/index-condition-pushdown

Now let's take a look at the benchmarks to see how much difference does this really make.

Benchmark results

For the purpose of this benchmark I used TPC-H Query #19 and ran it on TPC-H dataset (InnoDB tables) with scale factors of 2 (InnoDB dataset size ~5G) and 40 (InnoDB dataset size ~95G), so that I could see the speedup in case of in-memory and IO bound workloads. For the purpose of these benchmarks query cache was disabled and the buffer pool size was set to 6G.

The query is:

select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem force index(i_l_partkey),
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#45'
                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                and l_quantity >= 1 and l_quantity <= 1+10
                and p_size between 1 and 5
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#24'
                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                and l_quantity >= 14 and l_quantity <= 14+10
                and p_size between 1 and 10
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#53'
                and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
                and l_quantity >= 28 and l_quantity <= 28+10
                and p_size between 1 and 15
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )

There are two changes that I made to the query and the TPC-H tables' structure.
- Added a new index: KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`)
- Added an index hint in the query: lineitem force index(i_l_partkey)

The size of the buffer pool used for the benchmarks is 6G and the disks are 4 5.4K disks in Software RAID5.

In-memory workload

Now let's first take a look at how effective is ICP when the workload is in memory. For the purpose of benchmarking in-memory workload, I used a Scale Factor of 2 (dataset size ~5G), and the buffer pool was warmed up so that the relevant pages were already loaded in the buffer pool:

IO bound workload

And what about IO bound workload, when the dataset does not fit into memory. For the purpose of benchmarking IO bound workload, I used a Scale Factor of 40 (dataset size ~95G) and the buffer pool was not warmed up, so that it does not have the relevant pages loaded up already:

Now let's take a look at the status counters to see how much does this optimization make a difference.

MySQL Status Counters

These status counters are taken when performing the benchmark on IO bound workload, mentioned above.

Counter Name MySQL 5.5 MySQL 5.6 MariaDB 5.5
Created_tmp_disk_tables 0 0 0
Created_tmp_tables 0 0 0
Handler_icp_attempts N/A N/A 571312
Handler_icp_match N/A N/A 4541
Handler_read_key 19310 19192 18989
Handler_read_next 579426 4514 4541
Handler_read_rnd_next 8000332 8000349 8000416
Innodb_buffer_pool_read_ahead 81132 81067 81069
Innodb_buffer_pool_read_requests 4693757 1293628 1292675
Innodb_buffer_pool_reads 540805 28468 28205
Innodb_data_read 9.49G 1.67G 1.67G
Innodb_data_reads 621939 109537 29796
Innodb_pages_read 621937 109535 109274
Innodb_rows_read 8579426 8004514 8004541
Select_scan 1 1 1
  • We can see that values for Handler_read_key are more or less the same, because there is no change in the index lookup method, the index range scan is still done the same way as in MySQL 5.5, all the index pages that match the range condition on the l_quantity will still be fetched. Its the optimization afterwards that counts.
  • As we can see there is a huge reduction in the value of Handler_read_next, as with ICP the remaining parts of the WHERE clause are checked directly on the index pages fetched without having to fetch the entire rows from the PK, which means 128x less subsequent reads from the PK.
  • There are two status counters available in MariaDB 5.3/5.5 that are not available in MySQL, Handler_icp_attempts and Handler_icp_match, which show how many times the remaining WHERE condition was checked on the fetched index pages and how many times the index entries matched the condition. The value of Handler_icp_match directly co-relates to that of Handler_read_next. The smaller the ratio of Handler_icp_attempts to Handler_icp_match the better the filtering.
  • Another important thing is that there is 18x less IO page reads in the case of MySQL 5.6 and MariaDB 5.5 and 8x less amount of data read (1.67G compared to 9.49G). This just does not mean reduction in IO but means more free pages are available in the buffer pool to cater to other queries. For example, in the case of MySQL 5.5 (with a buffer pool of 6G), the query will be IO bound even with a warmed up buffer pool, because there is just too much data read. While in the case of MySQL 5.6 and MariaDB 5.5, the queries will have all the pages in-memory with warm caches and still 72% of the buffer pool will be empty to handle other queries.

Other Observations

There are two indexes defined on the table with similar prefixes:

  • KEY `i_l_suppkey_partkey` (`l_partkey`, `l_suppkey`)
  • KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`)

Obviously, the key i_l_partkey is much more restrictive. Although it cannot filter more rows by using traditional B-Tree index lookup used by MySQL, when compared to the key i_l_suppkey_partkey. But after the B-Tree lookup step, the second key can filter a lot more data using ICP on the remaining conditions after l_quantity. Yet MySQL 5.6 and MariaDB 5.5 optimizer does not consider this logic when selecting the index. In all the runs of the benchmark, the optimizer chose the key i_l_suppkey_partkey because the optimizer estimates that both indexes will mean same no. of rows and i_l_suppkey_partkey has a smaller key size. To get around this limitation in the optimizer I had to use index hint (FORCE INDEX). Clearly, selecting the correct index is an important part of optimizer's job, and there is room for improvement there.

Another thing which I feel is that there is still further optimization possible, like moving ICP directly to the index lookup function so that the limitation in the optimizer which prevents other parts of the key after the first range condition are removed.

Conclusion

There is a huge speed up 78 minutes down to 2 minutes in case of completely IO bound workload, and upto 2x speedup in case when the workload is in-memory. There is not much difference here in terms of the performance gains on MariaDB 5.5 vs MySQL 5.6 and both are on-par. This is great for queries that suffer from the weakness of the current MySQL optimizer when it comes to doing multi-column range scans, and the ICP optimization will benefit a lot of your queries like the one I showed in my example. Not only that, because there will be a cut down in the number of data pages read into the buffer pool, it means better buffer pool utilization.

On a last note, I will be posting the benchmark scripts, table definitions, the configuration files for MySQL 5.5/5.6 and MariaDB 5.5 and the description of the hardware used.


PlanetMySQL Voting: Vote UP / Vote DOWN

MariaDB 5.3 query optimizer by Sergey Petrunia

Февраль 5th, 2012

Sergey Petrunia of the MariaDB project.

What exactly is not working in MySQL? MySQL is poor at decision support/analytics. With large datasets you need special disk access strategies. Complex queries like insufficient subquery support and big joins are common int he MySQL world.

DBT-3 is used, scale=30, with a 75GB database and run a query “average price of item between a range of dates”. Query time took some 45 minutes to execute. Why? Run iostat -x to see what is going on. See that the CPU is mostly idle, so its an IO-bound load. Next you run SHOW ENGINE INNODB STATUS and you’ll see how many reads per second is happening. Possible solution is to get more RAM or get an SSD (good to speedup OLTP workloads, but analytics over data is probably not viable since SSDs are small and not cheap).

The MySQL/MariaDB solution to the above problem is improved disk access strategies: multi-range read (MRR) and batched key access (BKA). In MariaDB, MRR/BKA need to be enabled (they are not turned on by default). The query time only took 3 minutes 48 seconds, which is some 11.8x faster than the previous 45 minutes.If you look at EXPLAIN output, its almost as same as before, expect the Extra filed. iostat -x will now show some CPU load, svctm down as well (so its not random disk seeks anymore — some 8ms seek time on a regular 7,200rpm disk), SHOW ENGINE INNODB STATUS will show some 10,000 reads per second rather than the previous 200.

If you are on Fedora, check out the Systemtap feature to look at I/O patterns. stap deviceseeks.stp -c “sleep 60″.

Subqueries handling in MariaDB 5.3: check out the Subquery Optimizations Map. Only about 10% of the audience use optimizer hints in MySQL.

Related posts:

  1. MariaDB 5.1.42 released!
  2. MariaDB 5.1.44 released
  3. Where is MariaDB today?


PlanetMySQL Voting: Vote UP / Vote DOWN