Archive for the ‘query-cache’ Category

Stripping Comments so Query Cache Works in MariaDB and XtraDB

Февраль 4th, 2012

I recently noticed both MariaDB and XtraDB (not MySQL yet) have a (newer) variable query_cache_strip_comments.

This variable is great for those who want to append comments to various queries, but still want the query cache to be able to serve such queries. Unfortunately, with MySQL, this is not currently possible.

In the past, I wrote a post on using MySQL Proxy which described a technique of monitoring queries through the proxy by appending IP addresses to the queries so one could track where they originated from. However, one pitfall to that was the MySQL query cache *does not* ignore the comment and treats them all as different queries (see the user comments for further discussion). (I did subsequently enhance that functionality implementing the SHOW PROXY PROCESSLIST command (often used in the Proxy Admin module), in large part because of this limitation.)

To enable it (in MariaDB 5.3+ and XtraDB), just add query_cache_strip_comments under the [mysqld] section in your my.cnf file and restart mysqld.

Alternatively, you can also set it dynamically:

mysql> set @@global.query_cache_strip_comments=1;
Query OK, 0 rows affected (0.04 sec)

mysql> show global variables like 'query_cache_strip_comments';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| query_cache_strip_comments | ON    |
+----------------------------+-------+

mysql> select version();
+---------------+
| version()     |
+---------------+
| 5.3.3-MariaDB |
+---------------+

Fwiw, from examining the source code, both implementations seemed to differ, but the end result is the same, and it’s a welcome addition, if you ask me. :)


PlanetMySQL Voting: Vote UP / Vote DOWN

Regularly flushing the MySQL Query Cache without cron

Июнь 19th, 2011
This is a reply on Regularly flushing the MySQL Query Cache.

The original acticle is about regulary flushing the MySQL Query Cache as it will fragment over time.

There are some drawbacks for the cron method for flushing the query cache:
  • It will only work on UNIX like platforms as MS Windows uses the task scheduler to schedule tasks.
  • It needs credentials to login to the database.
  • It's not included in your database backup
There is another method, which is native to MySQL: the event scheduler.

Step 1: Enable the event scheduler:
mysql> SET GLOBAL event_scheduler=ON;
Query OK, 0 rows affected (0.00 sec)

And don't forget to set/change this in your my.cnf or my.ini

Step 2: Create the event:
mysql> CREATE EVENT flush_query_cache ON SCHEDULE EVERY 1 HOUR DO FLUSH QUERY CACHE;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW EVENTS\G
*************************** 1. row ***************************
Db: test
Name: flush_query_cache
Definer: msandbox@localhost
Time zone: SYSTEM
Type: RECURRING
Execute at: NULL
Interval value: 1
Interval field: HOUR
Starts: 2011-06-19 12:57:46
Ends: NULL
Status: ENABLED
Originator: 0
character_set_client: utf8
collation_connection: utf8_general_ci
Database Collation: latin1_swedish_ci
1 row in set (0.00 sec)

Please keep in mind that the query cache doesn't always give you a performance benefit due to mutex contention. See also the query cache tuner from Domas Mituzas.

You schould create a stored procedure for multi statement and/or complex statements and call the procedure from your event instead of putting it directly in your event.

PlanetMySQL Voting: Vote UP / Vote DOWN

On generating unique IDs using LAST_INSERT_ID() and other tools

Февраль 2nd, 2011

There’s a trick for using LAST_INSERT_ID() to generate sequences in MySQL. Quoting from the Manual:

  1. Create a table to hold the sequence counter and initialize it:
    mysql> CREATE TABLE sequence (id INT NOT NULL);
    mysql> INSERT INTO sequence VALUES (0);
    
  2. Use the table to generate sequence numbers like this:
    mysql> UPDATE sequence SET id=LAST_INSERT_ID(id+1);
    mysql> SELECT LAST_INSERT_ID();
    

This trick calls for trouble.

Contention

A customer was using this trick to generate unique session IDs for his JBoss sessions. These IDs would eventually be written back to the database in the form of log events. Business go well, and one day the customer adds three new JBoss servers (doubling the amount of webapps). All of a sudden, nothing works quite as it used to. All kinds of queries take long seconds to complete; load average becomes very high.

A short investigation reveals that a very slight load is enough to make for an accumulation of sequence-UPDATE queries. Dozens of them are active at any given time, waiting for long seconds.

InnoDB or MyISAM both make for poor response times. No wonder! Everyone’s contending for one lock.

Not just one

Other queries seem to hang as well. Why?

It is easy to forget or let go unnoticed that there are quite a few global locks involved with each query. If query cache is activated, then any query must pass through that cache, holding the query cache mutex. There’s a global mutex on MyISAM’s key cache. There’s one on InnoDB’s buffer pool (see multiple buffer pools in InnoDB 5.5), albeit less of an overhead. And there’s the table cache.

When table cache is enabled, any completed query attempts to return file handles to the cache. Any new query attempts to retrieve handles from the cache. While writing to the cache (extracting, adding), the cache is locked. When everyone’s busy doing the sequence-UPDATE, table cache lock is being abused. Other queries are unable to find the time to squire the lock and get on with their business.

What can be done?

One could try and increase the table_open_cache value. That may help to some extent, and for limited time. But the more requests are made, the quicker the problem surfaces again. When, in fact, reducing the table_open_cache to zero (well, minimum value is 1) can make for a great impact. If there’s nothing to fight for, everyone just get by on their own.

I know the following is not a scientific explanation, but it hits me as a good comparison: when my daughter brings a friend over, and there’s a couple of toys, both are happy. A third friend makes for a fight: “I saw it first! She took it from me! I was holding it!”. Any parent knows the ultimate solution to this kind of fight: take away the toys, and have them find something else to enjoy doing. OK, sorry for this unscientific display, I had to share my daily stress.

When no table cache is available, a query will go on opening the table by itself, and will not attempt to return the file handle back to the cache. The file handle will simply be destroyed. Now, usually this is not desired. Caching is good. But in our customer’s case, the cost of not using a table cache was minified by the cost of having everyone fight for the sequence table. Reducing the table cache made for an immediate relaxation of the database, with observable poorer responsiveness on peak times, however way better than with large table cache.

Other tools?

I don’t consider the above to be a good solution. It’s just a temporary hack.

I actually don’t like the LAST_INSERT_ID() trick. Moreover, I don’t see that it’s the database’s job to provide with unique IDs. Let it do relational stuff. If generating IDs is too intensive, let someone else do it.

NoSQL solutions provide such a service. Memcached, redis, MongoDB (and probably more) all provide with increment functions. Check them out.

Application level solutions

I actually use an application level solution to generate unique IDs. I mean, there’s always GUID(), but it’s result is just too long. Take a look at the following Java code:

public class Utils {
  private static long lastUniqueNumber = 0;

  public static synchronized long uniqueNumber() {
    long unique = System.currentTimeMillis();
    if (unique <= lastUniqueNumber)
      unique = lastUniqueNumber + 1;
    lastUniqueNumber = unique;
    return unique;
  }
}

Within a Java application this above method returns with unique IDs, up to 1000 per second on average (and it can perform way more than 1000 times per second).

On consequential executions of applications on the same machine one would still expect unique values due to the time-related nature of values. However, computer time changes. It’s possible that System.currentTimeMillis() would return a value already used in the past.

And, what about two processes running on the same machine at the same time? Or on different machine?

Which is why I use the following combination to generate my unique IDs:

  • Server ID (much like MySQL’s server_id parameter). this could be the last byte in the server’s IP address, or just 4 or 5 bits if not too many players are expected.
  • Process ID (plain old pid) which I pass to the Java runtime in the form of system properties. Any two processes running on the same machine are assured to have different IDs. To consequently spawned processes will have different IDs. The time it would take to cycle the process IDs is way more than would make for a “time glitch” problem as described above
  • Current time in milliseconds.

If you have to have everything withing 64 bit (BIGINT) then you’ll have to do bit manipulation, and drop some of the MSB on the milliseconds so as to overwrite with server & process IDs.

If you are willing to have your IDs unique in the bounds of a given time frame (so, for example, a month from now you wouldn’t mind reusing old IDs), then the problem is significantly easier. You may just use “day of month” and “millis since day start” and save those precious bits.

Still other?

Please share your solutions below!


PlanetMySQL Voting: Vote UP / Vote DOWN

Making query cache contention more obvious

Сентябрь 16th, 2010

The newest release of Percona Server includes a trivial change that I think will be extremely valuable. This is the addition of a new thread state, “Waiting on query cache mutex.” Fixing the query cache to make it scalable is hard. Fixing the server to report when the query cache is a bottleneck is not hard. It has historically been very difficult for users to diagnose what’s wrong with their server when the query cache is locking it intermittently. Now it will be trivial: they will look at SHOW PROCESSLIST and the evidence will be unmistakable.

Related posts:

  1. Making Maatkit more Open Source one step at a time
  2. A growing trend: InnoDB mutex contention
  3. Making changes to many tables at once
  4. How MySQL really executes a query
  5. How to profile a query in MySQL


PlanetMySQL Voting: Vote UP / Vote DOWN

mk-query-digest, query comments and the query cache

Июль 6th, 2010

I very much like the fact that MySQL allows you to embed comments into SQL statements. These comments are extremely convenient, because they are written into MySQL log files as part of the query. This includes the general log, the binary log and the slow query log. Maatkit includes tools which interact with these logs, including mk-query-digest. This tool, in particular, has a very nice option called --embedded-attributes which can process data embedded in query comments.

The support for embedded attributes makes some cool tricks possible. Peter and I co-presented a talk at this past MySQL Conference and Expo. In this talk I presented my Instrumentation-for-PHP class as a demonstration instrumentation application for the LAMP stack. Instrumentation-for-php collects information about the PHP process such as wall time, cpu time, mysql query times, etc, and automatically places this information into the Apache environment. It also includes support for "augmenting" SQL queries with additional information such as the PHP source line and user session information by embedding this information in a specially formatted SQL comment. This information can then be cross-referenced between MySQL logs and Apache logs.

SQL:
  1. SELECT 1
  2. -- File: myfile.php Line: 25 Function: myFunc SessionID: a1b2c3d4e5f6a7b8c9d0aef UserID: swany78

Notice that the query ends with a comment, and that this comment includes important information, such as the PHP session ID, and the application user name. This allows mk-query digest to discover these attributes and use them for aggregation. You could also filter on the embedded information to display only queries from a particular file, particular session or particular user for example.

The --embedded-attributes option takes two parts. The first part is a Perl regular expression which identifies which comment will be treated as the one with attributes. The second part is another expression used to split the comment into attribute key/value pairs.

For example:

--embedded-attributes '^-- [^\n]+','(\w+): ([^ ]+)'

The first part: '^-- [^\n]+' matches everything starting with -- and ending with newline.
The second part includes two expressions, each enclosed by parenthesis so that they will be captured. The first matching expression:(\w+): matches one or more words followed by colon and a space. The second match expression:([^ ]+)captures everything up to the first space.

A problem with this approach is that MySQL treats identical queries with different comments as different queries. This essentially renders the query cache inoperative when this approach is taken. A cache that can't return any results is worse than not having a cache at all. The cache won't return any result if each query has some sort of unique identifier in it. Evicting queries will become very frequent when every query is unique and the query cache will use additional resources for no effective benefit.

Luckily the problem is mostly* fixed in Percona Server 11.

I wrote a simple test script to demonstrate the difference. The test loops twice over a set of query patterns. Each pattern is executed 1000 times in each iteration of the loop. The queries differ only by comments. In the best case scenario where comments are stripped we would expect 1000 misses and then all hits from this test. You can enable query cache stripping in Percona Server at runtime:

SQL:
  1. mysql> SET global query_cache_strip_comments='ON';
  2. Query OK, 0 rows affected (0.00 sec)

Query Hits Misses Hits Misses
  Stripping No Stripping
no comment 1000 1000 1000 1000
random at start 2000 0 0 2000
random at middle 2000 0 0 2000
static at start 2000 0 1000 1000

As you can see, the first loop of the queries (that is, those with no comment at all) get 1000 misses. Everything beyond that is a hit when stripping is enabled. Contrast this with the results where stripping is not enabled. There are many more misses due to the random values in the query comments. Also notice the query with static comments. There are 1000 hits and 1000 misses because MySQL treats identical queries with identical comments as the same. When comments are stripped every one of these queries is answered by the cache.

If you are benefiting from the query cache, but would like better instrumentation in your queries, consider using Percona Server and turning on the --strip-query-cache-comments feature.

* Percona Server 11.1 still has a problem with # and -- comments at the start of a query(before SELECT), and in some cases, with varying white-space. These issues will be rectified in the next release. I've updated instrumentation-for-php to place the comment at the end of the query to avoid these problems.


Entry posted by Justin Swanhart | One comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Is the query cache useful?

Сентябрь 29th, 2009

Mark Callaghan posted a good test of the MySQL query cache in different versions. His tests clearly show that in 5.0.44 and 5.0.84 and 5.1.38, there is more query throughput when the query cache is disabled.

However, the tests are skewed — not on purpose, I am sure, and Mark admits he has not used the query cache before — but they are skewed all the same. Mark’s error was that he assumed he could just turn on the query cache and see if it works. Most features of MySQL do not work that way — you have to understand the strengths and weaknesses of the feature in order to use it properly.

Mark’s benchmark definitely reinforces that turning on the query cache without any knowledge of your system is a bad idea, and I agree with him on that. But it does not in any way mean that the query cache is always a bad idea. In fact, the MySQL manual page at http://dev.mysql.com/doc/refman/5.1/en/query-cache.html has some caveats about when the query cache is useful and when it is not useful.

It is important to know how the MySQL query cache works, so I will first explain that, and then explain why Mark’s test is not a very good broad generalization. MySQL’s query cache is not useful all the time, but it can be quite useful! If you are already familiar with the query cache, you can skip ahead to why Mark’s benchmark is skewed.


How to use the MySQL query cache
Firstly, there are a few ways to use the MySQL query cache. You can disable it, which means no queries are cached; you can set it to cache all queries, and use a special keyword to not cache queries (replace SELECT with SELECT SQL_NO_CACHE). query_cache_type=1 or ON will set the query cache in this way.

You can also set it to cache no queries except those that have a special keyword — you guessed it, replace SELECT with SELECT SQL_CACHE and a query will attempt to be cached. To set the query cache in this way, set query_cache_type=2 or DEMAND.

What queries are cached
Regardless of which method you choose, not all possible queries are cached. Only deterministic SELECT statements are cached — a statement such as SELECT count(*) from tbl is deterministic, while SELECT NOW() is non-deterministic for the purpose of the query cache.* Queries that use user-defined functions, temporary tables, LOCK IN SHARE MODE, FOR UPDATE, INFORMATION_SCHEMA or mysql system tables are not cached. Most SELECT subqueries are not cached. SELECT queries that are part of stored functions, triggers, views and events are not cached. However, in MySQL 5.1, SELECT queries that contain a view can be cached. The MySQL manual page at http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html has more detailed information on what is allowed and what is not allowed.

In the query cache, a hash of the query is stored along with the result. There is a variable, query_cache_limit, which sets an upper limit on the size of the result set that can be cached. So even if a query can be cached, it may not be cached if the result set is larger than query_cache_limit.

How the query cache is used
The lifecycle of a non-prepared SELECT statement when the query cache is on (query_cache_type > 0) is:

1) hash the SELECT query
2) check the hash against the query cache
3) if hash matches, returned the result set from the query cache. If not, the query is sent to the parser and executed in the usual manner, and results are stored in the query cache if the query is cachable.

This means that having the query cache on has some overhead, even if no queries are ever cached.

A query (and its result) is removed from the query cache when it is invalidated or pruned. A query is invalidated when any table it refers to has been changed, even if the changes do not affect the actual result set in the query cache. A query is pruned when a new query needs to be inserted into the query cache and there is no room; the query cache prunes queries using a least recently used (LRU) algorithm.

In other words, there will be a lot of unnecessary overhead with the query cache if a table is often written to, because queries using that table will be constantly removed from the query cache. This does not mean that if you have some tables that are frequently written to, you should turn the query cache off — you can use the query_cache_type of ON with the SQL_NO_CACHE keyword to cache all except some queries, or you can use the query_cache_type of DEMAND and the SQL_CACHE keyword in SELECT queries to cache no queries except those you specify.

Why Mark’s Benchmark is skewed

It is important to note that InnoDB was set up with:
innodb_buffer_pool_size=2000M

and the sysbench program was run with:

../sysbench --test=oltp --oltp-table-size=2000000 --max-time=180 --max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes --oltp-read-only --num-threads=$nt run

I will explain what the important sysbench variables are in context.

1) The benchmark used sysbench in OLTP mode against an InnoDB table. The InnoDB buffer pool is hot after the table is created and populated, while the query cache is empty and cold.

InnoDB has its own cache — the InnoDB buffer pool. sysbench is run with –oltp-read-only, meaning there are no INSERT, DELETE or UPDATE statements. This is good; it means that the results are not skewed by query cache invalidations, though there may be pruning.

The first thing sysbench does is preparation — it creates the table and populates it with data (2,000,000 rows as per oltp-table-size). After this is done, the InnoDB buffer pool (2000M in size) has gone from empty to containing data. I do not know how large the row size of the sysbench table is; however, I can do some calculations:

2000 Mb * 1024 = 2,048,000 Kb / 2,000,000 rows = 1.024 Kb per row

This means that in order for some data to *not* be current in the InnoDB buffer pool, the row size had to be approximately larger than 1 Kb per row.

The manual at http://sysbench.sourceforge.net/docs/#database_mode shows the schema of the table that is used.

CREATE TABLE `sbtest` ( `id` int(10) unsigned NOT NULL auto_increment, `k` int(10) unsigned NOT NULL default ‘0′, `c` char(120) NOT NULL default ”, `pad` char(60) NOT NULL default ”, PRIMARY KEY (`id`), KEY `k` (`k`)

(note that you at least need to add a final parenthesis “)” for the query to work, and should also specify ENGINE=INNODB if you want to test on your system).

So, that’s two 4-byte integers, and two character fields…for the sake of argument, let’s say you are using UTF-8 in MySQL 5.1, where utf8 is 3 bytes per character, that’s:

4 + 4 + 120*3 + 60*3 = 548 bytes

Let’s say you are using a version of MySQL that uses 4 bytes per character for utf8 (I know 6.0 was using that, not sure about 5.4):

4 + 4 + 120*4 + 60*4 = 728 bytes

So either way, all of the data in the table is already loaded in the InnoDB buffer pool when the test begins. The InnoDB buffer pool is already hot, whereas the query cache is empty.

On systems where a query can be cached and used instead of executing and having to look on disk for data, using the MySQL query cache will give you *much* better performance.

2) The query cache statistics are not given. It’s very possible that the only thing that was measured was how much overhead the query cache provides if it is never used.

The queries that are run are the default (oltp-test-mode is not set, and it defaults to “complex”) which can be seen under the “Advanced transactional” heading in the sysbench manual at http://sysbench.sourceforge.net/docs/#database_mode. Only read queries are done, so the queries are:

SELECT c FROM sbtest WHERE id=N
SELECT c FROM sbtest WHERE id BETWEEN N AND M
SELECT SUM(K) FROM sbtest WHERE id BETWEEN N and M
SELECT c FROM sbtest WHERE id between N and M ORDER BY c
SELECT DISTINCT c FROM sbtest WHERE id BETWEEN N and M ORDER BY c

The values for the variables in the query are chosen at random, though they are sensible values. The testing time is very short, only 3 minutes of running time (sysbench is called with max-time=180). It is quite possible that all the SELECT queries were put into the query cache and there was never a query cache hit. Of course there will be a lot of overhead if the query cache is never used! Particularly if it has to prune for more memory.

I would like to see Qcache_hits, Qcache_inserts, Com_select (number of SELECT statements that did not use the query cache) and Qcache_lowmem_prunes. That is what I look at for any system that has the query cache running, so I can determine the effectiveness of the query cache.

There is more in-depth information on how to find the usefulness of the query cache, query cache fragmentation, and a few ways to cache more queries in the MySQL Administrator’s Bible on pages 427-434 — part of the “Caching with MySQL” chapter. I used the book to help me write this blog post, so I do think it is fair to cite my sources….even though it happens to be a source I created (and that’s why I consider it a shameless plug, I have no shame citing my sources)!

—————-

So that all being said, the query cache is extremely useful when it is used for repeated queries. The benchmark Mark performed most likely did not actually use this. I have seen production environments fall apart when someone turned on on the query cache for all queries; Mark’s benchmark proves that it is not a good idea to randomly turn on the query cache. You have to know what you’re doing.

* for the purpose of replication, it is deterministic, but that’s another topic.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Query Cache path

Сентябрь 29th, 2009

Just how effective is the Query Cache on performance? If you are not comfortable reading the MySQL code you can cheat very easily with the SHOW PROFILE command. This demonstration will show you at a high level the relative impact without and with the Query Cache. First let us confirm the Query Cache is not used.

mysql> SHOW GLOBAL VARIABLES LIKE 'query_cache%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 0       |
| query_cache_type             | ON      |
| query_cache_wlock_invalidate | OFF     |
+------------------------------+---------+
5 rows in set (0.00 sec)

We now enable profiling.

mysql> SET PROFILING=1;

We run our sample query.

mysql> select name,id,competitions from olympic_games where host_city='Sydney'\G

We can then enable the query cache and re-run the query.

mysql> SET GLOBAL query_cache_size=1024*1024*16;
mysql> select name,id,competitions from olympic_games where host_city='Sydney'\G
mysql> select name,id,competitions from olympic_games where host_city='Sydney'\G

NOTE: We run the query twice after enabling the cache, the first time, the query is cached, the second time it is retrieved from the cache. Now let us look at the profiling information.

mysql> SHOW PROFILES;
+----------+------------+-------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                   |
+----------+------------+-------------------------------------------------------------------------+
|        1 | 0.00096100 | select name,id,competitions from olympic_games where host_city='Sydney' |
|        2 | 0.00029700 | SET GLOBAL query_cache_size=1024*1024*16                                |
|        3 | 0.00837900 | select name,id,competitions from olympic_games where host_city='Sydney' |
|        4 | 0.00009500 | select name,id,competitions from olympic_games where host_city='Sydney' |
+----------+------------+-------------------------------------------------------------------------+

mysql> SHOW PROFILE SOURCE FOR QUERY 3;
+--------------------------------+----------+---------------------------+---------------+-------------+
| Status                         | Duration | Source_function           | Source_file   | Source_line |
+--------------------------------+----------+---------------------------+---------------+-------------+
| starting                       | 0.000033 | NULL                      | NULL          |        NULL |
| checking query cache for query | 0.000088 | send_result_to_client     | sql_cache.cc  |        1262 |
| Opening tables                 | 0.000025 | open_tables               | sql_base.cc   |        4482 |
| System lock                    | 0.000006 | mysql_lock_tables         | lock.cc       |         258 |
| Table lock                     | 0.000081 | mysql_lock_tables         | lock.cc       |         269 |
| init                           | 0.000037 | mysql_select              | sql_select.cc |        2350 |
| optimizing                     | 0.000016 | optimize                  | sql_select.cc |         772 |
| statistics                     | 0.000021 | optimize                  | sql_select.cc |         954 |
| preparing                      | 0.000021 | optimize                  | sql_select.cc |         964 |
| executing                      | 0.000005 | exec                      | sql_select.cc |        1648 |
| Sending data                   | 0.000500 | exec                      | sql_select.cc |        2190 |
| end                            | 0.000008 | mysql_select              | sql_select.cc |        2395 |
| query end                      | 0.000005 | mysql_execute_command     | sql_parse.cc  |        4821 |
| freeing items                  | 0.007489 | mysql_parse               | sql_parse.cc  |        5827 |
| storing result in query cache  | 0.000028 | query_cache_end_of_result | sql_cache.cc  |         813 |
| logging slow query             | 0.000007 | log_slow_statement        | sql_parse.cc  |        1628 |
| cleaning up                    | 0.000009 | dispatch_command          | sql_parse.cc  |        1595 |
+--------------------------------+----------+---------------------------+---------------+-------------+
17 rows in set (0.00 sec)

mysql> SHOW PROFILE SOURCE FOR QUERY 4;
+--------------------------------+----------+-----------------------+--------------+-------------+
| Status                         | Duration | Source_function       | Source_file  | Source_line |
+--------------------------------+----------+-----------------------+--------------+-------------+
| starting                       | 0.000035 | NULL                  | NULL         |        NULL |
| checking query cache for query | 0.000014 | send_result_to_client | sql_cache.cc |        1262 |
| checking privileges on cached  | 0.000010 | send_result_to_client | sql_cache.cc |        1346 |
| sending cached result to clien | 0.000026 | send_result_to_client | sql_cache.cc |        1441 |
| logging slow query             | 0.000005 | log_slow_statement    | sql_parse.cc |        1628 |
| cleaning up                    | 0.000005 | dispatch_command      | sql_parse.cc |        1595 |
+--------------------------------+----------+-----------------------+--------------+-------------+
6 rows in set (0.00 sec)

It does not take a rocket scientist to determine that 6 steps within the MySQL kernel is better then 17, regardless of what those steps are, and how different in timing they may be.

I’m not wanting to represent how much saving you may have here, there are many factors such as a realistic example, a loaded warmed up environment etc. You should try this in your own environment with your own queries.

This information was to provide an introduction into looking a little deeper at the Query Cache path within MySQL.


PlanetMySQL Voting: Vote UP / Vote DOWN

Using the Query Cache effectively

Сентябрь 29th, 2009

Maximize your strengths, minimize your weaknesses.

You can apply this approach to many things in life, I apply it to describing and using MySQL the product, and it’s components. The Query Cache like many features in MySQL, and indeed features in many different RDBMS products (don’t get me started on Oracle *features*) have relative benefits. In one context it can be seen as ineffective, or even detrimental to your performance, however it’s course grain nature makes it both trivial to disable dynamically (SET GLOBAL query_cache_size=0;), and also easy to get basic statistics on current performance (SHOW GLOBAL STATUS LIKE ‘QCache%’;) to determine effectiveness and action appropriately.

The Query Cache is course grained, that is it is rather simple/dumb in nature. When you understand the path of execution of a query within the MySQL kernel you learn a few key things.

  • When enabled, by default the Query Cache will cache all SELECT statements within certain defined system parameter conditions. There are of course exceptions such as non-deterministic functions, prepared statements in earlier versions etc.
  • Any DML/DDL statement for a table that has a query cached, flushes all query cache results that pertain to this table.
  • You can use SQL_CACHE and SQL_NO_CACHE as hints however you can’t configure on a table by table, or query basis.
  • The query cache works on an exact match of the query (including spaces and case) and other settings such the client character set, and protocol version. If a match is found, data is returned in preformed network packets.<.li>

The Query Cache was not good when set to large values (e.g. > 128M) due to in-efficient cache invalidation. I’m not certain of the original source of this condition however Bug #21074, fixed in 5.0.50 and 5.1.21 is likely the reason.

My advice is to disable the Query Cache by default, especially for testing. As a final stress test you can enable to determine if there is a benefit.

I wish MySQL would spend time in improving key features, for example the Query Cache lacks sufficient instrumentation like what queries are in the cache, what tables are in the cache, and also lack all the sufficient system parameters exposed to fine tune. I believe there is a patch to show the queries for example, but I was unable to find via a google search.

It is a powerful and easy technology if you use it well. It involves architecting your solution appropriately, and knowing when the Query Cache is ineffective.

I have a number of circumstances where the query cache is extremely effective, or could be with simple modifications. A recommendation to a recent client with a 1+TB database was to split historical and current data into two different instances. The data was already in separated tables, the application already performed dual queries, so the change was a simple as a new connection pool. The benefits were huge, not only would the backup process be more efficient, some 500GB of data now only had to be backed up once (as is was 100% static), the scaling and recovery process improved, but the second MySQL instance could enable the query cache and the application would get a huge performance improvement with ZERO code changes for caching. That’s a quick and easy win.

On a side note, I wanted to title this “The MySQL Query Cache is not useless”. When I was a MySQL employee I got reprimanded (twice) for blogging anything about MySQL that wasn’t positive. This blog post is in direct response to Konstantin, a Sun/MySQL employee who actually works on the actually MySQL server code who wrote Query cache = useless?. In my view it is not useless.


PlanetMySQL Voting: Vote UP / Vote DOWN

querystat — DTrace script to monitor your queries, query cache and server thread pre-emption

Сентябрь 18th, 2009

I was recently helping some colleagues check what was happening with their MySQL queries, and wrote a DTrace script to do it. Time to share that script.

First of all, a look at some output from the script:

mashie[bash]# ./querystat.d -p `pgrep mysqld`
Tracing started at 2009 Sep 17 16:28:35
2009 Sep 17 16:28:38   throughput 3 queries/sec
2009 Sep 17 16:28:41   throughput 4 queries/sec
2009 Sep 17 16:28:44   throughput 528 queries/sec
2009 Sep 17 16:28:47   throughput 1603 queries/sec
2009 Sep 17 16:28:50   throughput 1676 queries/sec
^C
Tracing ended   at 2009 Sep 17 16:28:51
Average latency, all queries: 107 us
Latency distribution, all queries (us): 
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@                                       170      
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        3728     
             128 |@@@@@                                    533      
             256 |                                         26       
             512 |                                         18       
            1024 |                                         2        
            2048 |                                         1        
            4096 |                                         0        
            8192 |                                         1        
           16384 |                                         1        
           32768 |                                         0        
Query cache statistics:
    count             hit: 6
    count            miss: 4474
    avg latency      miss: 107 (us)
    avg latency       hit: 407 (us)
Latency distribution, for query cache hit (us): 
           value  ------------- Distribution ------------- count    
              64 |                                         0        
             128 |@@@@@@@@@@@@@                            2        
             256 |@@@@@@@                                  1        
             512 |@@@@@@@@@@@@@@@@@@@@                     3        
            1024 |                                         0        
Latency distribution, for query cache miss (us): 
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@                                       170      
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        3728     
             128 |@@@@@                                    531      
             256 |                                         25       
             512 |                                         15       
            1024 |                                         2        
            2048 |                                         1        
            4096 |                                         0        
            8192 |                                         1        
           16384 |                                         1        
           32768 |                                         0        
Average latency when query WAS NOT pre-empted: 73 us
Average latency when query     WAS pre-empted: 127 us
Pre-emptors:
[...]
   mysql                                     6
   Xorg                                     18
   sched                                    25
   firefox-bin                              44
   sysbench                               3095

You can see that while the script is running (prior to pressing <Ctrl>-C), we get a throughput count every 3 seconds.

Then we get some totals, some averages, and even some distribution histograms, covering all queries, then with breakdowns on whether we used the query cache, and whether the thread executing the query was pre-empted.

This may be useful for determining things like:

  • Do I have some queries in my workload that consume a lot more CPU than others?
  • Is the query cache helping or hurting?
  • Are my database server threads being pre-empted (kicked off the CPU) by (an)other process(es)?

Things have become easier since I first tried this, and had to use the PID provider to trace functions in the database server.

If you want to try my DTrace script, get it from here. NOTE: You will need a version of MySQL with DTrace probes for it to work.


PlanetMySQL Voting: Vote UP / Vote DOWN

querystat — DTrace script to monitor your queries, query cache and server thread pre-emption

Сентябрь 18th, 2009

I was recently helping some colleagues check what was happening with their MySQL queries, and wrote a DTrace script to do it. Time to share that script.

First of all, a look at some output from the script:

mashie[bash]# ./querystat.d -p `pgrep mysqld`
Tracing started at 2009 Sep 17 16:28:35
2009 Sep 17 16:28:38   throughput 3 queries/sec
2009 Sep 17 16:28:41   throughput 4 queries/sec
2009 Sep 17 16:28:44   throughput 528 queries/sec
2009 Sep 17 16:28:47   throughput 1603 queries/sec
2009 Sep 17 16:28:50   throughput 1676 queries/sec
^C
Tracing ended   at 2009 Sep 17 16:28:51
Average latency, all queries: 107 us
Latency distribution, all queries (us): 
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@                                       170      
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        3728     
             128 |@@@@@                                    533      
             256 |                                         26       
             512 |                                         18       
            1024 |                                         2        
            2048 |                                         1        
            4096 |                                         0        
            8192 |                                         1        
           16384 |                                         1        
           32768 |                                         0        
Query cache statistics:
    count             hit: 6
    count            miss: 4474
    avg latency      miss: 107 (us)
    avg latency       hit: 407 (us)
Latency distribution, for query cache hit (us): 
           value  ------------- Distribution ------------- count    
              64 |                                         0        
             128 |@@@@@@@@@@@@@                            2        
             256 |@@@@@@@                                  1        
             512 |@@@@@@@@@@@@@@@@@@@@                     3        
            1024 |                                         0        
Latency distribution, for query cache miss (us): 
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@                                       170      
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        3728     
             128 |@@@@@                                    531      
             256 |                                         25       
             512 |                                         15       
            1024 |                                         2        
            2048 |                                         1        
            4096 |                                         0        
            8192 |                                         1        
           16384 |                                         1        
           32768 |                                         0        
Average latency when query WAS NOT pre-empted: 73 us
Average latency when query     WAS pre-empted: 127 us
Pre-emptors:
[...]
   mysql                                     6
   Xorg                                     18
   sched                                    25
   firefox-bin                              44
   sysbench                               3095

You can see that while the script is running (prior to pressing <Ctrl>-C), we get a throughput count every 3 seconds.

Then we get some totals, some averages, and even some distribution histograms, covering all queries, then with breakdowns on whether we used the query cache, and whether the thread executing the query was pre-empted.

This may be useful for determining things like:

  • Do I have some queries in my workload that consume a lot more CPU than others?
  • Is the query cache helping or hurting?
  • Are my database server threads being pre-empted (kicked off the CPU) by (an)other process(es)?

Things have become easier since I first tried this, and had to use the PID provider to trace functions in the database server.

If you want to try my DTrace script, get it from here. NOTE: You will need a version of MySQL with DTrace probes for it to work.


PlanetMySQL Voting: Vote UP / Vote DOWN