Archive for the ‘optimizer’ Category

MariaDB 5.3 query optimizer by Sergey Petrunia

Февраль 5th, 2012

Sergey Petrunia of the MariaDB project.

What exactly is not working in MySQL? MySQL is poor at decision support/analytics. With large datasets you need special disk access strategies. Complex queries like insufficient subquery support and big joins are common int he MySQL world.

DBT-3 is used, scale=30, with a 75GB database and run a query “average price of item between a range of dates”. Query time took some 45 minutes to execute. Why? Run iostat -x to see what is going on. See that the CPU is mostly idle, so its an IO-bound load. Next you run SHOW ENGINE INNODB STATUS and you’ll see how many reads per second is happening. Possible solution is to get more RAM or get an SSD (good to speedup OLTP workloads, but analytics over data is probably not viable since SSDs are small and not cheap).

The MySQL/MariaDB solution to the above problem is improved disk access strategies: multi-range read (MRR) and batched key access (BKA). In MariaDB, MRR/BKA need to be enabled (they are not turned on by default). The query time only took 3 minutes 48 seconds, which is some 11.8x faster than the previous 45 minutes.If you look at EXPLAIN output, its almost as same as before, expect the Extra filed. iostat -x will now show some CPU load, svctm down as well (so its not random disk seeks anymore — some 8ms seek time on a regular 7,200rpm disk), SHOW ENGINE INNODB STATUS will show some 10,000 reads per second rather than the previous 200.

If you are on Fedora, check out the Systemtap feature to look at I/O patterns. stap deviceseeks.stp -c “sleep 60″.

Subqueries handling in MariaDB 5.3: check out the Subquery Optimizations Map. Only about 10% of the audience use optimizer hints in MySQL.

Related posts:

  1. MariaDB 5.1.42 released!
  2. MariaDB 5.1.44 released
  3. Where is MariaDB today?


PlanetMySQL Voting: Vote UP / Vote DOWN

MariaDB 5.3 query optimizer by Sergey Petrunia

Февраль 5th, 2012

Sergey Petrunia of the MariaDB project.

What exactly is not working in MySQL? MySQL is poor at decision support/analytics. With large datasets you need special disk access strategies. Complex queries like insufficient subquery support and big joins are common int he MySQL world.

DBT-3 is used, scale=30, with a 75GB database and run a query “average price of item between a range of dates”. Query time took some 45 minutes to execute. Why? Run iostat -x to see what is going on. See that the CPU is mostly idle, so its an IO-bound load. Next you run SHOW ENGINE INNODB STATUS and you’ll see how many reads per second is happening. Possible solution is to get more RAM or get an SSD (good to speedup OLTP workloads, but analytics over data is probably not viable since SSDs are small and not cheap).

The MySQL/MariaDB solution to the above problem is improved disk access strategies: multi-range read (MRR) and batched key access (BKA). In MariaDB, MRR/BKA need to be enabled (they are not turned on by default). The query time only took 3 minutes 48 seconds, which is some 11.8x faster than the previous 45 minutes.If you look at EXPLAIN output, its almost as same as before, expect the Extra filed. iostat -x will now show some CPU load, svctm down as well (so its not random disk seeks anymore — some 8ms seek time on a regular 7,200rpm disk), SHOW ENGINE INNODB STATUS will show some 10,000 reads per second rather than the previous 200.

If you are on Fedora, check out the Systemtap feature to look at I/O patterns. stap deviceseeks.stp -c “sleep 60″.

Subqueries handling in MariaDB 5.3: check out the Subquery Optimizations Map. Only about 10% of the audience use optimizer hints in MySQL.

Related posts:

  1. MariaDB 5.1.42 released!
  2. MariaDB 5.1.44 released
  3. Where is MariaDB today?


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 5.6.4 Development Milestone Now Available!

Декабрь 20th, 2011

I am pleased to announce that the MySQL Database 5.6.4 development milestone release ("DMR") is now available for download (select the Development Release tab). MySQL 5.6.4 includes all 5.5 production-ready features and provides an aggreation of all of the new features that have been released in earlier 5.6 DMRs.  5.6.4 adds many bug fixes and more new "early and often" enhancements that are development and system QA complete and ready for Community evaluation and feedback.  You can get the complete rundown of all the new 5.6.4 specific features here.

For those following the progression of the 5.6 DMRs as the trains leave the station, you should bookmark these MySQL Engineering development team specific blogs:

You can also track the thought and innovation leaders on the MySQL Optimizer and the new Optimizer specific improvements in 5.6.4 by following the MySQL Optimizer Team member blogs:

And of course you can follow others on the Optimizer team and all of MySQL Engineering teams by bookmarking/subscribing to PlanetMySQL.

We look forward to your feedback on MySQL 5.6.4, so please download your copy now and help us make a better MySQL. 

As always, a sincere thanks for your continued support of MySQL!   



PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 5.6.4 Development Milestone Now Available!

Декабрь 20th, 2011

I am pleased to announce that the MySQL Database 5.6.4 development milestone release ("DMR") is now available for download (select the Development Release tab). MySQL 5.6.4 includes all 5.5 production-ready features and provides an aggreation of all of the new features that have been released in earlier 5.6 DMRs.  5.6.4 adds many bug fixes and more new "early and often" enhancements that are development and system QA complete and ready for Community evaluation and feedback.  You can get the complete rundown of all the new 5.6.4 specific features here.

For those following the progression of the 5.6 DMRs as the trains leave the station, you should bookmark these MySQL Engineering development team specific blogs:

You can also track the thought and innovation leaders on the MySQL Optimizer and the new Optimizer specific improvements in 5.6.4 by following the MySQL Optimizer Team member blogs:

And of course you can follow others on the Optimizer team and all of MySQL Engineering teams by bookmarking/subscribing to PlanetMySQL.

We look forward to your feedback on MySQL 5.6.4, so please download your copy now and help us make a better MySQL. 

As always, a sincere thanks for your continued support of MySQL!   



PlanetMySQL Voting: Vote UP / Vote DOWN

More Early Access Features in the MySQL 5.6.3 Development Milestone!

Октябрь 3rd, 2011

For those with an interest in MySQL, this week at Oracle OpenWorld has gotten off to a great start.  Demonstrating how Oracle drives MySQL innovation Tomas' "State of the Dolphin" keynote on Monday gave a great overview of the new MySQL products that have recently been delivered:

Of these announcements I am most excited about the new early access features ready for community evaluation in the MySQL 5.6.3 DMR.  Since 5.5 was released last December we have gotten community and customer feedback that it is the best, highest quality release to date.  5.6 builds on 5.5 by improving:

The MySQL Optimizer for better query performance, scale, diagnostics
Oracle owns the MySQL optimizer and is making a huge investment in engineering and re-factoring to ensure the optimizer evolves to meet the most demanding and complex query requirements.  To this end, the 5.6.3 optimizer now includes:

File sort optimizations for queries with SELECT *, ORDER by non-indexed column, with small LIMIT values 

Currently such queries that ORDER by a non-indexed column perform a full table scan, create sort operations for entire table, potentially create temp tables, toss out the unneeded rows and merge the result set.  With the new default behavior the whole table is scanned, but only the needed rows are sorted and returned in the result set.  Early internal tests show 3x improvement in execution times when using the default sort buffer (of course your mileage may vary.)

Index Condition Pushdown ("ICP")

By default, the optimizer now pushes indexed columns in WHERE clause down to the storage engine for evaluation.  For example, for this table and related query on secondary key of postalcode+age:

CREATE TABLE person (
      personid INTEGER PRIMARY KEY,
      firstname CHAR(20),
      lastname CHAR(20),
      postalcode INTEGER,
      age INTEGER,
      address CHAR(50),
      KEY k1 (postalcode,age)
   ) ENGINE=InnoDB;

SELECT lastname, firstname FROM person
   WHERE postalcode BETWEEN 5000 AND 5500 AND age BETWEEN 21 AND 25; 

In this case the optimizer will use a range scan on postalcode from 5000-5500 (on index) and push the evaluation of age to InnoDB which will evaluate the value of age "BETWEEN 21 and 25".  For each match, InnoDB will return a row back to the server, which will continue requesting/receiving rows until all rows have been evaluated and returned.  Early non-scientific testing shows:

With 5 million rows in table, random values, running on commodity, desktop-grade hardware
RESULTS:
ICP disabled: Disk bound (default buffer pool 128 Mb) the query will take 15 seconds
ICP disabled: All data in memory (buffer pool 1.5 Gb) the query will take 1.4 seconds
ICP enabled: execution time reduced to 90 ms for both

Batched Key Access ("BKA") and Multi-Range Read ("MRR")

Multi-Range Read was introduced in the 5.6.2 DMR. MRR improves the performance of queries that use secondary indexes, by scanning one or more index ranges used in a query, sorting the associated disk blocks for the row data, then reading those disk blocks using larger sequential I/O requests. The speedup benefits operations such as
range index scans and equi-joins on indexed columns.

BKA builds on MRR by improving on the performance of disk-bound join queries.  With BKA the server fills up the join buffer with "join keys".  The server then sends batches of join keys to the storage engine layer (InnoDB or MyISAM, and soon NDB) for processing.  The storage engine then uses MRR to sort indexes according to the data sequence so the data pages can be accessed in sequence for efficiency.

Early testing of DBT3 Q3: Customer Distribution Query on InnoDB, buffer pool = 50 MB, 4GB InnoDB data file, and disk/memory ratio (~2% in memory) has shown execution time drops from 2000 sec to 10 sec with a sufficiently large join buffer.

EXPLAIN for INSERT/UPDATE/DELETE

EXPLAIN for SELECT has been available since MySQL 5.0.  This new feature is a long standing request from our community and customer base.  Best to get straight to an example of how it is used:

CREATE TABLE t1(c1 INT, c2 INT, KEY(c2), KEY(c2, c1));

EXPLAIN UPDATE t1 SET c1 = 10 WHERE c2 = 1;

mysql> EXPLAIN UPDATE t1 SET c1 = 10 WHERE c2 = 1;

+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows |  | Extra    |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | range | c2,c2_2       | c2   | 5       | NULL |    1 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

In this example the optimizer will use a type=range (scan) on key=c2 to identify and UPDATE the affected row(s).

Optimizer Traces
This feature allows DBAs, developers to trace the decision-making steps of the optimizer.  The TRACE output shows how the EXPLAIN plan was generated and the decision points resolved along the way.  An example would be:

SET SESSION.OPTIMIZER_TRACE=enabled=on";

SELECT (SELECT 1 FROM t6 WHERE d = c) AS RESULT FROM t5 ;

select * from information_schema.OPTIMIZER_TRACE;

With output snippet:
...

"records_estimation": [
  {
    "database": "test",
    "table": "t6",
    "range_analysis": {
      "table_scan": {
         "records": 2,
         "cost": 4.5034
      },
      "potential_range_indices": [
        {
          "index": "d",
          "usable": true,
          "key_parts": [
            "d"
          ]
        }
      ],
      "best_covering_index_scan": {
        "index": "d",
        "cost": 1.4233,
        "chosen": true
       },

...

Optimizer tracing works for SELECT, EXPLAIN SELECT, INSERT or REPLACE ( with VALUES or SELECT), UPDATE/DELETE, SET, DO, DECLARE/CASE/IF/RETURN.

Other Optimizer related improvements in 5.6.3 include:

Late materialization of views/subqueries in FROM clause
This improvement enables fast EXPLAINs for views/subqueries and defers/avoids materialization when possible, enabling for quicker bail out if query is abandoned or canceled. Internal tests show a 240x improvement in execution time (drops from 8 min to 2 sec)

Persistent Optimizer Statistics for InnoDB
This allows for more stable, accurate InnoDB statistics and minimizes the overhead of recalculating statistics on start up or crash recovery.  Statistics can be manually updated when needed.

The 5.6.3 DMR also includes new features and improvements to the following:

Performance Schema now includes instrumentation for:

  • Statements/execution stages - What are my most resource intensive queries?
  • Table and Index I/O - Which app tables/indexes cause the most load?
  • Table Locks - Which tables cause the most contention?
  • Users/Hosts/Accounts level resource consumption - Who are the most resource intensive users, hosts, accounts?
  • Network I/O - Is it the network or my app?  How long do sessions idle?
  • Aggregated summaries by thread, user, host, account, object

InnoDB

  • New INFORMATION_SCHEMA tables (metrics, system and buffer pool information) for better diagnostics
  • Dump and restore buffer pool for quicker startup and recovery
  • Reduce contention during file extension for better concurrency

Replication

  • Crash-safe slaves and binlog, replication checksums for better data integrity
  • Multi-threaded slaves for better performance
  • time-delayed replication for better testing, data integrity
  • Informational log events for easier troubleshooting

And more...

You can learn more about the 5.6.3 DMR and other early access features by visiting the MySQL Developers Zone.  Here you will find technical articles, the MySQL Database 5.6.3 and MySQL Cluster 7.2 DMR downloads, supporting documentation and all related PlanetMySQL blogs.

If you are attending Oracle Open World this week please plan to attend as many of the 47 MySQL sessions as you can, including my sessions on the "MySQL Roadmap" (Tuesday at 5:00 pm) and "Using MySQL with Other Oracle Products" (Wednesday at 11:45 am PT).  Both are in the Marriott Marquis (just up the block from the Moscone Center), Golden Gate C2.  Also, be sure to join us at the MySQL Community Reception .  There will be many familiar faces and friends there, and it is open to all, even if you are not attending OOW.

As always, thanks for your continued support of MySQL!

 






PlanetMySQL Voting: Vote UP / Vote DOWN

Setting optimizer search depth in MySQL

Август 11th, 2011
A customer of ours had an interesting problem regarding a query that was taking too long, around 55s. Looking at the query with the query profiler we found that it was spending most of its time in the "statistics" phase. Now the query was pretty complex, it contained nearly 20 tables with INNER JOINs, LEFT JOINs and even some subqueries. However the tables were small and fetching all the data shouldn't have taken the 55 seconds the query was taking. The problem was that the optimiser was spending too much time evaluating and finding the optimal execution plan.

There are two options in MySQL with which you can control the optimiser's behaviour a bit. The first one is optimizer_prune_level. The pruner discards non-optimal execution plans early without evaluating them fully. It is turned on by default and is not recommended to turn off unless there's a really good reason. For testing purposes we turned the pruner off for this query, but after evaluating the query for over half an hour we gave up on the test and decided to call it infinity. This is understandable, for a 20 table join there are 20! (~2.4 x 10^18) possible execution paths, so evaluating all of them would take forever (now this query contained LEFT JOINs and subqueries so that number is not exactly correct, but still used here to spread fear).

The second interesting option is the optimizer_search_depth. This defines how deep into the execution path the optimizer should look before deciding which plan to use. If this is set to a value higher than the number of tables in the query it means that all possible execution plans (except pruned ones) will be examined. If it's set lower than the number of tables in the query, the optimiser will not evaluate each path to its full extent but choose the first level based on all plans evaluated to the specified depth and then do a new evaluation for the next level and so on until the full execution plan has been chosen. Trying a few different values we got the following results:

SET SESSION optimizer_search_depth = 1;
-> statistics 0.000591

SET SESSION optimizer_search_depth = 5;
-> statistics 0.002321

SET SESSION optimizer_search_depth = 10;
-> statistics 0.365812

SET SESSION optimizer_search_depth = 15;
-> statistics 5.054150

SET SESSION optimizer_search_depth = 0;
-> statistics 0.026904


All of the above are much better than the 58.497217s we got with the default search depth of 62. Note that the value 0 (zero) is a special case where the optimiser chooses and sets the optimal search depth for each query, thus adding a bit of overhead to the optimisation process.

So why not just tune down the search depth to 1 for all queries? The problem here is that you can then miss optimal execution plans as the optimizer won't go deep enough down the execution plan before choosing which plan to use at each level. You might get an execution plan that is horribly wrong. So the best way to go if you have many queries with more than 15 tables is to choose an intermediate value of 7-8 or so, which will produce the optimal execution plan in 99% of all cases, but won't waste too much time finding it.

The best solution is of course to optimize all the 15+ table queries yourself but that's another story!

PlanetMySQL Voting: Vote UP / Vote DOWN

More on OR-conditions considered bad… And an apology..

Июль 20th, 2011
In my recent post on OR-conditions I made a mistake, and I appologize for that. I made the statement that MySQL will only use 1 index per statement, whatever you do.

This is no longer true, as a matter of fact, and that has been the case since MySQL 5.0 and I should have checked. MySQL is actually able to use index_merge. An explanation why I didn't look for thi more carefully, yes an explanation, not an excuse, is that the optimizer doesn't seem to want to use this very often. Which is too bad.

So, with this in mind, and using the same table as in the previous post, let's look at index_merge in action. Or possibly, not so much in action. Let's recap what the table looks like:
CREATE TABLE `product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`brand_id` int(11) NOT NULL,
`quantity` int(11) NOT NULL,
`weight` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ix_brand` (`brand_id`),
KEY `ix_weight_brand` (`weight`,`brand_id`),
KEY `ix_quantity_brand` (`quantity`,`brand_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2321268 DEFAULT CHARSET=utf8
OK, fair enough, one single table with a bunch of indexes. Looking at the index_merge documentation, we can see that if we have an OR condition with both sides appropriately indexed, then this algoritm would execute each path and then do a sort-merge of the result. Let's try with a simple example, using a similar query to the one used last time, except that we are to ignore the brand_id column this time:
EXPLAIN SELECT id FROM product WHERE weight = 41 OR quantity = 78;
and we get this:
+----+-------------+---------+-------------+-----------------------------------+-----------------------------------+---------+------+-------+------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------------+-----------------------------------+-----------------------------------+---------+------+-------+------------------------------------------------------------------+
| 1 | SIMPLE | product | index_merge | ix_weight_brand,ix_quantity_brand | ix_weight_brand,ix_quantity_brand | 4,4 | NULL | 25729 | Using sort_union(ix_weight_brand,ix_quantity_brand); Using where |
+----+-------------+---------+-------------+-----------------------------------+-----------------------------------+---------+------+-------+------------------------------------------------------------------+
That is cool! We are seeing index_merge in action here! Coolness! So, knowing that, let's see if we can get index_merge to work for us in the case which we looked at last time, where we also had a brand_id column in the query. There are indexes on brand_id combined with both quantity and weight, the exact same two indexes used above actually, so adding brand_id should produce the same nice execution plan, but of course reduce the number of rows returned. Lets try it
EXPLAIN SELECT sql_no_cache id FROM product WHERE (brand_id = 6 AND weight = 41) OR (brand_id = 6 AND quantity = 78)
And we get this:
+----+-------------+---------+------+--------------------------------------------+----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+--------------------------------------------+----------+---------+-------+------+-------------+
| 1 | SIMPLE | product | ref | ix_brand,ix_weight_brand,ix_quantity_brand | ix_brand | 4 | const | 4291 | Using where |
+----+-------------+---------+------+--------------------------------------------+----------+---------+-------+------+-------------+
No luck there. For some reason, the optimizer seems to dislike the index_merge access method, except in the most obvious of cases. But hey, we don't give up that easily, do we, we can use a force index, right? Like this:
EXPLAIN SELECT sql_no_cache id FROM product FORCE INDEX (ix_weight_brand,ix_quantity_brand) WHERE (brand_id = 6 AND weight = 41) OR (brand_id = 6 AND quantity = 78);
And the result is this:
+----+-------------+---------+-------------+-----------------------------------+-----------------------------------+---------+------+------+-------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------------+-----------------------------------+-----------------------------------+---------+------+------+-------------------------------------------------------------+
| 1 | SIMPLE | product | index_merge | ix_weight_brand,ix_quantity_brand | ix_weight_brand,ix_quantity_brand | 8,8 | NULL | 31 | Using union(ix_weight_brand,ix_quantity_brand); Using where |
+----+-------------+---------+-------------+-----------------------------------+-----------------------------------+---------+------+------+-------------------------------------------------------------+
What is annoying here is that this query, using FORCE INDEX actually hits only 31 rows according to the statistics, whereas the one not using FORCE INDEX potentially hits 4291. Why the optimizer determines the latter to be faster I do not know, but it doesn't seem right to me.

In the example given here, I was using a brand_id of 6. That particular brand_id has less entries than the other ones, so lets giva a shot using brand_id 4, which takes a bit longer. The SELECT using a UNION then looks like this:
SELECT sql_no_cache id FROM product WHERE brand_id = 4 AND weight = 41 UNION SELECT id FROM product WHERE brand_id = 4 AND quantity = 78;
and the one using FORCE INDEX looks like this:
SELECT sql_no_cache id FROM product FORCE INDEX (ix_weight_brand,ix_quantity_brand) WHERE (brand_id = 4 AND weight = 41) OR (brand_id = 4 AND quantity = 78);
Both of these use the same access path: A merge sort using the indexes ix_weight_brand and ix_quantity_brand. Which one do I prefer then? My personal opinion (but it is just that: An opinion that is personal) is to use the UNION, based on four facts:
  • When running these two statements, side by side on the same data and using SQL_NO_CACHE (i.e. not using the query cache), the UNION is consistently faster. Not that the index_merge is much slower or anything, in particular not compared to when using the ix_brand index that is preferred by the optimizer, unless I tell it not to, but the UNION is still faster.
  • Sometimes I could see the optimizer still not doing it's job correctly, even with the FORCE INDEX in place. In some cases only one of the indexes I forced would be used. Don't ask me why, and I cannot reproduce it now, so maybe it was my eyeglassed having fun with me.
  • The UNION construct means that I can use ANSI SQL, the FORCE INDEX not so. This is important to me, as I want to keep my options open when it comes to databases. Which doesn't mean I always use ANSI SQL, but if I have the choice between ANSI and non-ANSI SQL for two statements that are otherwise similar, I choose ANSI SQL.
  • I have a feeling that in my case, the UNION will be more flexible. If more indexes and conditions are added, the FORCE INDEX part will be difficult to maintain, whereas in the UNION this will be easier. Which doesn't mean that I particularily enjoy using a specific SQL declarative construct to optimze performance, I would much rather want the optimizer to deal with this for me. But it doesn't.
In conclusion, yes, I was wrong, I admit it, MySQL sure can use two indexes and do an index merge. But I was right in the sense that this seems to happen rarely, and that the optimizer isn't really doing it's job properly here anyway. But I am glad there is some openings for fixing this, as an access methods exists and the optimizer knows about it.

/Karlsson
Who was wrong! I admit it!

PlanetMySQL Voting: Vote UP / Vote DOWN

What does Handler_read_rnd mean?

Июнь 16th, 2010

MySQL’s SHOW STATUS command has two counters that are often confusing and result in “what does that mean?” questions:

  1. Handler_read_rnd
  2. Handler_read_rnd_next

As I understand it, there is some historical context to the choice of names here, hearkening back to before I was involved with MySQL, way back when it was a wrapper around ISAM tables — or even earlier, I don’t know. (Unireg, anyone?) In any case, these two variables deserve a little explanation.

Both counters indicate the number of times the corresponding storage engine API function has been called. In olden times, the storage engine API was called the handler API, which is why the variables begin with Handler_.

Handler_read_rnd counts the number of times the handler::rnd_pos() method is called. This method fetches a row from a table based on a “fixed position,” i.e. a random-read. What this actually means varies between storage engines. For MyISAM, position really means a byte offset from the beginning of the file. For InnoDB, it means to read a row based on a primary key value.

Handler_read_rnd_next is incremented when handler::rnd_next() is called. This is basically a cursor operation: read the “next” row in the table. The operation advances the cursor position so the next time it’s called, you get the next row.

The naming convention probably doesn’t make as much sense today as it did historically. Monty told me that “rnd” meant “random,” but now I can’t find a reference to that conversation. In any case, one of the operations is basically a random read (if you think of tuples as having a natural order, which historically they did), and the other is just a read-next operation; there’s nothing random about it, really. I would welcome any corrections from those who know more about this than I do.

Why would either of these be called? Usually Handler_read_rnd is called when a sort operation gathers a list of tuples and their “position” values, sorts the tuples by some criterion, and then traverses the sorted list, using the position to fetch each one. This is quite likely to result in retrieving rows from random points in the table, although that might not actually result in random IO if the data is all in memory. Handler_read_rnd_next is usually called for each row of a full or partial table scan.

PS: I’m being a bit vague about the sort-and-fetch algorithm; there are really two algorithms, and they are smarter than I’m giving them credit for. But that’s a topic for another blog post.


Entry posted by Baron Schwartz | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

EXPLAIN EXTENDED can tell you all kinds of interesting things

Июнь 16th, 2010

While many people are familiar with the MySQL EXPLAIN command, fewer people are familiar with "extended explain" which was added in MySQL 4.1

EXPLAIN EXTENDED can show you what the MySQL optimizer does to your query. You might not know this, but MySQL can dramatically change your query before it actually executes it. This process is called query rewriting, and it is a part of any good SQL optimizer. EXPLAIN EXTENDED adds a warning message to the EXPLAIN output which displays additional information, including the rewritten query.

To take a look at EXPLAIN EXTENDED, I'll start with three empty tables. It is important to note that the tables are empty because the MySQL optimizer treats empty tables (and incidentally, tables with only one row) differently than tables which contain more than one row. More about that in a bit.

SQL:
  1. mysql> CREATE TABLE j1 (c1 int);
  2. Query OK, 0 rows affected (0.16 sec)
  3.  
  4. CREATE TABLE j2 (c1 int);
  5. Query OK, 0 rows affected (0.11 sec)
  6.  
  7. mysql> CREATE TABLE j3 (c1 int);
  8. Query OK, 0 rows affected (0.10 sec)

SQL:
  1. mysql> EXPLAIN extended SELECT j1.c1 FROM j1, j2, j3 WHERE j1.c1 = j2.c1 AND j3.c1 = j1.c1;
  2. +----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
  3. | id | select_type | TABLE | type | possible_keys | KEY  | key_len | ref  | rows | Extra                                               |
  4. +----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
  5. 1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const TABLES |
  6. +----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
  7. 1 row IN SET, 1 warning (0.04 sec)
  8.  
  9. mysql> SHOW warnings;
  10. +-------+------+-------------------------------------------------------------------------------+
  11. | Level | Code | Message                                                                       |
  12. +-------+------+-------------------------------------------------------------------------------+
  13. | Note  | 1003 | SELECT '0' AS `c1` FROM `test`.`j1` JOIN `test`.`j2` JOIN `test`.`j3` WHERE 0 |
  14. +-------+------+-------------------------------------------------------------------------------+
  15. 1 row IN SET (0.00 sec)

You might notice a few odd things about this EXPLAIN. First, there are no tables listed. Taking a look at the Extra column we see that MySQL mentions 'const' tables. A 'const' table is a table that contains 0 or 1 rows, or a table on which all parts of a primary key or unique key lookup are satisfied in the where clause. If a 'const' table contains no rows, and it is not used in an OUTER JOIN, then MySQL can immediately return an empty set because it infers that there is no way that rows could be returned. MySQL does this by adding the WHERE clause in the query with 'where 0'.

Let's now look at what happens after the value (1) is inserted into each of the tables. Each table contains only a single row, and the value in each table is 1.

SQL:
  1. mysql> INSERT INTO j1 VALUES (1); INSERT INTO j2 SELECT * FROM j1; INSERT INTO j3 SELECT * FROM j2;
  2. Query OK, 1 row affected (0.00 sec)
  3.  
  4. mysql> EXPLAIN extended SELECT j1.c1 FROM j1, j2, j3 WHERE j1.c1 = j2.c1 AND j3.c1 = j1.c1;
  5. +----+-------------+-------+--------+---------------+------+---------+------+------+-------+
  6. | id | select_type | TABLE | type   | possible_keys | KEY  | key_len | ref  | rows | Extra |
  7. +----+-------------+-------+--------+---------------+------+---------+------+------+-------+
  8. 1 | SIMPLE      | j1    | system | NULL          | NULL | NULL    | NULL |    1 |       |
  9. 1 | SIMPLE      | j2    | system | NULL          | NULL | NULL    | NULL |    1 |       |
  10. 1 | SIMPLE      | j3    | system | NULL          | NULL | NULL    | NULL |    1 |       |
  11. +----+-------------+-------+--------+---------------+------+---------+------+------+-------+
  12. 3 rows IN SET, 1 warning (0.00 sec)
  13.  
  14. mysql> SHOW warnings;
  15. +-------+------+-------------------------------------------------------------------------------+
  16. | Level | Code | Message                                                                       |
  17. +-------+------+-------------------------------------------------------------------------------+
  18. | Note  | 1003 | SELECT '1' AS `c1` FROM `test`.`j1` JOIN `test`.`j2` JOIN `test`.`j3` WHERE 1 |
  19. +-------+------+-------------------------------------------------------------------------------+
  20. 1 row IN SET (0.00 sec)

You should now notice that the tables are actually listed in the EXPLAIN output, but also notice that the type field is set to 'system'. A 'system' table is a special case of 'const' table which is used when a table contains only one row. The contents of these tables are read before the query begins executing. Because of this, MySQL can compare the constant values before completely formulating the plan. You will notice the MySQL replaces the WHERE clause with 'where 1' because it knows that all the const tables contain equal values. If they did not, the above plan with the 'where 0' would be generated.

Finally, lets insert a few more rows and test the plan:

SQL:
  1. mysql> INSERT INTO j1 VALUES (1); INSERT INTO j2 SELECT * FROM j1; INSERT INTO j3 SELECT * FROM j2;
  2.  
  3. mysql> EXPLAIN extended SELECT j1.c1 FROM j1, j2, j3 WHERE j1.c1 = j2.c1 AND j3.c1 = j1.c1;
  4. +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
  5. | id | select_type | TABLE | type | possible_keys | KEY  | key_len | ref  | rows | Extra       |
  6. +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
  7. 1 | SIMPLE      | j1    | ALL  | NULL          | NULL | NULL    | NULL |    2 |             |
  8. 1 | SIMPLE      | j2    | ALL  | NULL          | NULL | NULL    | NULL |    3 | USING WHERE |
  9. 1 | SIMPLE      | j3    | ALL  | NULL          | NULL | NULL    | NULL |    4 | USING WHERE |
  10. +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
  11. 3 rows IN SET, 1 warning (0.00 sec)
  12.  
  13. mysql> SHOW warnings;
  14. +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  15. | Level | Code | Message                                                                                                                                                                    |
  16. +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  17. | Note  | 1003 | SELECT `test`.`j1`.`c1` AS `c1` FROM `test`.`j1` JOIN `test`.`j2` JOIN `test`.`j3` WHERE ((`test`.`j2`.`c1` = `test`.`j1`.`c1`) AND (`test`.`j3`.`c1` = `test`.`j1`.`c1`)) |
  18. +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  19. 1 row IN SET (0.00 sec)

The type field has now changed to ALL, which means that the entire table will be read. This is because these tables contain no indexes.

There is another interesting thing, and I probably should have mentioned it before. You will notice that the query that I explained used the comma syntax, but when MySQL rewrote the query it switched it to use the JOIN keyword. This should put to rest any debate as to any perceived performance difference between comma join and ANSI JOIN syntax. They are intrinsically the same.

Last, EXPLAIN EXTENDED can show you information about the query rewrites that MySQL makes when accessing views which use the MERGE algorithm.

For example:

SQL:
  1. mysql> CREATE VIEW v1 AS SELECT * FROM j1;
  2. Query OK, 0 rows affected (0.10 sec)
  3.  
  4. mysql> EXPLAIN extended SELECT * FROM v1 WHERE c1=1;
  5. +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
  6. | id | select_type | TABLE | type | possible_keys | KEY  | key_len | ref  | rows | Extra       |
  7. +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
  8. 1 | SIMPLE      | j1    | ALL  | NULL          | NULL | NULL    | NULL |    2 | USING WHERE |
  9. +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
  10. 1 row IN SET, 1 warning (0.00 sec)
  11.  
  12. mysql> SHOW warnings;
  13. +-------+------+-------------------------------------------------------------------------------+
  14. | Level | Code | Message                                                                       |
  15. +-------+------+-------------------------------------------------------------------------------+
  16. | Note  | 1003 | SELECT `test`.`j1`.`c1` AS `c1` FROM `test`.`j1` WHERE (`test`.`j1`.`c1` = 1) |
  17. +-------+------+-------------------------------------------------------------------------------+
  18. 1 row IN SET (0.00 sec)

The most important thing to notice is the WHERE clause. You will see that the SELECT statement used in the view has been modified to include the WHERE clause that I used when accessing the view.


Entry posted by Justin Swanhart | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Extending Index for Innodb tables can hurt performance in a surprising way

Май 23rd, 2010

One schema optimization we often do is extending index when there are queries which can use more key part. Typically this is safe operation, unless index length increases dramatically queries which can use index can also use prefix of the new index are they ? It turns there are special cases when this is not the case.

SQL:
  1. CREATE TABLE `idxitest` (
  2.   `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   `a` int(11) NOT NULL,
  4.   `b` int(11) NOT NULL,
  5.   PRIMARY KEY (`id`),
  6.   KEY `a` (`a`)
  7. ) ENGINE=InnoDB AUTO_INCREMENT=6029313 DEFAULT CHARSET=latin1
  8.  
  9. mysql> SELECT count(*) FROM idxitest WHERE a=5 AND b=5;
  10. +----------+
  11. | count(*) |
  12. +----------+
  13. |    60434 |
  14. +----------+
  15. 1 row IN SET (0.69 sec)
  16.  
  17. mysql> EXPLAIN SELECT count(*) FROM idxitest WHERE a=5 AND b=5;
  18. +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
  19. | id | select_type | TABLE    | type | possible_keys | KEY  | key_len | ref   | rows   | Extra       |
  20. +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
  21. 1 | SIMPLE      | idxitest | ref  | a             | a    | 4       | const | 707820 | USING WHERE |
  22. +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
  23. 1 row IN SET (0.00 sec)

The obvious optimization is to extend index from column (a) to column (a,b) right which will make it faster and should not hurt any other queries a lot, right ?

SQL:
  1. mysql> ALTER TABLE idxitest DROP KEY a,ADD KEY(a,b);
  2. Query OK, 0 rows affected (24.84 sec)
  3. Records: 0  Duplicates: 0  Warnings: 0
  4.  
  5. mysql> SELECT count(*) FROM idxitest WHERE a=5 AND b=5;
  6. +----------+
  7. | count(*) |
  8. +----------+
  9. |    60434 |
  10. +----------+
  11. 1 row IN SET (0.02 sec)
  12.  
  13. mysql> EXPLAIN SELECT count(*) FROM idxitest WHERE a=5 AND b=5;
  14. +----+-------------+----------+------+---------------+------+---------+-------------+--------+-------------+
  15. | id | select_type | TABLE    | type | possible_keys | KEY  | key_len | ref         | rows   | Extra       |
  16. +----+-------------+----------+------+---------------+------+---------+-------------+--------+-------------+
  17. 1 | SIMPLE      | idxitest | ref  | a             | a    | 8       | const,const | 120640 | USING INDEX |
  18. +----+-------------+----------+------+---------------+------+---------+-------------+--------+-------------+
  19. 1 row IN SET (0.00 sec)

Wow. The query runs 30 times faster - not only because it has to scan less rows but also because it is index covering query now - it does not need to access the data.

It turns out it is too early to celebrate. Application also had another query which previously ran so fast it hardly could be noticed. It however became a lot slower:

SQL:
  1. # Old Schema
  2.  
  3. mysql> SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  4. +---------+-----+---+
  5. | id      | a   | b |
  6. +---------+-----+---+
  7. | 3000000 | 100 | 7 |
  8. +---------+-----+---+
  9. 1 row IN SET (0.00 sec)
  10.  
  11. mysql> EXPLAIN SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  12. +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
  13. | id | select_type | TABLE    | type | possible_keys | KEY  | key_len | ref   | rows   | Extra       |
  14. +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
  15. 1 | SIMPLE      | idxitest | ref  | a             | a    | 4       | const | 126074 | USING WHERE |
  16. +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+
  17. 1 row IN SET (0.00 sec)
  18.  
  19.  
  20.  
  21. # new Schema
  22.  
  23. mysql> SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  24. +---------+-----+---+
  25. | id      | a   | b |
  26. +---------+-----+---+
  27. | 3000000 | 100 | 7 |
  28. +---------+-----+---+
  29. 1 row IN SET (1.01 sec)
  30.  
  31. mysql> EXPLAIN SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  32. +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  33. | id | select_type | TABLE    | type  | possible_keys | KEY     | key_len | ref  | rows | Extra       |
  34. +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  35. 1 | SIMPLE      | idxitest | INDEX | a             | PRIMARY | 4       | NULL |   36 | USING WHERE |
  36. +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  37. 1 row IN SET (0.00 sec)
  38.  
  39. # The plan also can look something like this:
  40.  
  41. mysql> EXPLAIN SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  42. +----+-------------+----------+------+---------------+------+---------+-------+------+------------------------------------------+
  43. | id | select_type | TABLE    | type | possible_keys | KEY  | key_len | ref   | rows | Extra                                    |
  44. +----+-------------+----------+------+---------------+------+---------+-------+------+------------------------------------------+
  45. 1 | SIMPLE      | idxitest | ref  | a             | a    | 4       | const |    1 | USING WHERE; USING INDEX; USING filesort |
  46. +----+-------------+----------+------+---------------+------+---------+-------+------+------------------------------------------+
  47. 1 row IN SET (0.01 sec)

So why did this query become so much slower ? The reason is its plan benefits from Innodb specific feature - index entries being sorted by primary key for each complete key value. So when you have index (a) and id is a primary key the real index is (a,id) when we extend index to (a,b) it really becomes (a,b,id). So if there is a query which used both a and id key part from original index it will quite likely be unable to use new index to full extent.

What is solution ? You can have "redundant" indexes on (a) and (a,b) at the same time. This is something what suppose to work but it well often does not:

SQL:
  1. CREATE TABLE `idxitest` (
  2.   `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   `a` int(11) NOT NULL,
  4.   `b` int(11) NOT NULL,
  5.   PRIMARY KEY (`id`),
  6.   KEY `a` (`a`),
  7.   KEY `a_2` (`a`,`b`)
  8. ) ENGINE=InnoDB AUTO_INCREMENT=6029313 DEFAULT CHARSET=latin1
  9.  
  10. mysql> SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  11. +---------+-----+---+
  12. | id      | a   | b |
  13. +---------+-----+---+
  14. | 3000000 | 100 | 7 |
  15. +---------+-----+---+
  16. 1 row IN SET (1.03 sec)
  17.  
  18. mysql> EXPLAIN SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  19. +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  20. | id | select_type | TABLE    | type  | possible_keys | KEY     | key_len | ref  | rows | Extra       |
  21. +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  22. 1 | SIMPLE      | idxitest | INDEX | a,a_2         | PRIMARY | 4       | NULL | 2247 | USING WHERE |
  23. +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  24. 1 row IN SET (0.00 sec)

MySQL Optimizer considers using both (a) and (a,b) indexes and in the end decides to use neither rather doing full index scan until it finds a=100. This looks like an optimizer glitch in this case because it estimates it will scan 2247 rows in the selected plan, while using (a) index you can get result scanning only 1 row guaranteed.

To really get things going you will need to use FORCE INDEX(a) to force MySQL optimizer using right plan.

These results mean you should be very careful applying index changes from mk-duplicate-key-checker key checker when it comes to redundant indexes. If you have query plans depending on Innodb ordering of data by primary key inside indexes they can become significantly affected.

Optimizer behavior may be different in different MySQL versions. These tests were done with 5.1.45 though I've seen same behavior with MySQL 5.0 too.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN