Archive for the ‘mk-query-digest’ Category

Profiling your slow queries using pt-query-digest and some love from Percona Server

Декабрь 28th, 2011
This guide will get you up and running with how to identify the bottleneck queries using the excellent tool pt-query-digest. You will learn how to use and analyze the output returned by pt-query-digest. You will also learn some differences between slow query logging in various MySQL versions. Later on in the post I will also show you how to make use of the extra diagnostic data available with Percona Server.
PlanetMySQL Voting: Vote UP / Vote DOWN

Book Review – Effective MySQL

Ноябрь 4th, 2011

Effective MySQL: Optimizing SQL Statements

by Ronald Bradford

No Nonsense, Readable, Practical, and Compact

Effective MySQLI like that this book is small; 150 pages means you can carry it easily.  It's also very no nonsense.  It does not dig too deeply into theory unless it directly relates to your day-to-day needs.  And those needs probably cluster heavily around optimizing SQL queries, as those pesky developers are always breaking things ;)

Jokes aside, this new book out on Oracle Press is a very readable volume. Bradford has drawn directly from real-world experience to give you the right bite size morsels you need in your day-to-day MySQL activities.

Highlights

Chapter one, The Five Minute DBA gives you the basic methodology if you don't already know it.  Enable the slow query log, analyze it, and use the explain facility.  Then index as appropriate, or eliminate queries if you can.

Chapter two digs a little deeper into the basics, introducing explain extended, table statistics and storage engines.  You'll also learn how to use show session & global status, as well as session & global variables.  You'll also have your first look at MySQL's data dictionary - INFORMATION_SCHEMA.

Chapter three is where it starts to get meaty.  You probably know that MySQL has b-tree indexes, but did you know that it has b+tree indexes, or hash indexes?

Chapter four digs into indexes further with single & multi-column indexes using them for sorting and joining.  You'll also find out about covering indexes which are multi-column matching the where clause, but also including columns needed in the SELECT predicate.  Do you have duplicate or unused indexes?  You'll learn why they matter to performance and how to eliminate them with tools like mk-duplicate-key-checker.

Chapter five continues along the same lines, with more coverage of indexes.  Learn to identify when you are using a covering index, fulfilling the entire query by only accessing the index.  You'll also learn about partial indexes, how they can reduce the size of index storage and retrieval while still getting your data for you.

Chapter six covers configuring the server itself, hitting on the system variables such as the innodb buffer pool (innodb_buffer_pool_size) and key buffer (key_buffer_size) as well as the query cache.  You'll also learn how to set the four main session memory settings - sort buffer (sort_buffer_size) and join buffer ( join_buffer_size) as well as the lesser known read buffers (read_buffer_size and read_rnd_buffer_size).

Chapter seven is all about the process of tuning and optimizing MySQL.  Rolling all the previous sections into marching orders, and prescriptive advice, he takes you through step by step how to apply the principles.  You'll get an introduction to mk-query-digest (though strangely without attribution to Baron Schwartz), the great maatkit tool for query analysis and aggregation, as well as the microsecond precision patch, which allows your mysql shell client to display more exact timing data.  For the patch he links back to an article on his own site which seems to be not found.  The author of the high precision mysql timer patch is Stewart Smith.

I personally got the most out of Chapter eight, full of self-described hidden performance tips.  From identifying unused or duplicate indexes, to replacing inefficient data types with better ones, why it's important to use NOT NULL where possible or how to store IP addresses efficiently, this chapter has a lot of goodies.  For those still struggling with SQL statement tuning, there are a few patterns that are described, offering advice on how to rewrite a subquery as an inner join,

What you might not know

  • MySQL includes Oracle's index organized tables by a different name
  • Too many indexes can dramatically impact INSERT & UPDATE performance
  • Many DDL operations can be done online - see oak-online-alter-table (Shlomi Noach)
  • Datatypes matter - use enum, int unsigned, timestamp & not null where possible
  • Covering indexes are your friend, duplicate & unused indexes are not!
  • A replication slave can have different storage engines or indexes from the master. These can support different uses - such as data warehousing or non-transactional requirements.
  • While a_string LIKE '%end of my sentence.' won't use an index, you can index reverse_string, then use reverse_string LIKE REVERSE '%end of my sentence.' and MySQL will use this index.  You've simulated an advanced Oracle feature, reverse key indexes!

A few small gripes

If I were to add a few complaints it would be to say that some of the examples were rather simplistic.  In many cases tuning SQL is not as simple as just adding the right index.  For instance there was no good discussion of the dreaded "using temporary, using filesort" that we see a lot in MySQL explains when sorting has to be done, but will not fit in memory.  Or what about tmpdir=/dev/shm, how will that improve things?  What about UNION versus UNION ALL where appropriate.  Why does DISTINCT do a sort?

The book was also missing a discussion of triggers, stored procedures, when or if the query cache can cause problems and so forth.  Also the article link mentioned about chapter seven isn't the only missing link.  I followed links to optimizing sql  statements and it seems to go to a generic holding page.  Also the main link effectivemysql.com/book leads to an outline of an as yet unreleased title on Backup and Recovery.

All in all, well worth your money

However, other than these few gripes the book overall is a very welcome addition to the small family of MySQL books.  Get a copy quick before they're all gone!

 

 


PlanetMySQL Voting: Vote UP / Vote DOWN

Book Review – Effective MySQL

Ноябрь 4th, 2011

Effective MySQL: Optimizing SQL Statements

by Ronald Bradford

No Nonsense, Readable, Practical, and Compact

Effective MySQLI like that this book is small; 150 pages means you can carry it easily.  It's also very no nonsense.  It does not dig too deeply into theory unless it directly relates to your day-to-day needs.  And those needs probably cluster heavily around optimizing SQL queries, as those pesky developers are always breaking things ;)

Jokes aside, this new book out on Oracle Press is a very readable volume. Bradford has drawn directly from real-world experience to give you the right bite size morsels you need in your day-to-day MySQL activities.

Highlights

Chapter one, The Five Minute DBA gives you the basic methodology if you don't already know it.  Enable the slow query log, analyze it, and use the explain facility.  Then index as appropriate, or eliminate queries if you can.

Chapter two digs a little deeper into the basics, introducing explain extended, table statistics and storage engines.  You'll also learn how to use show session & global status, as well as session & global variables.  You'll also have your first look at MySQL's data dictionary - INFORMATION_SCHEMA.

Chapter three is where it starts to get meaty.  You probably know that MySQL has b-tree indexes, but did you know that it has b+tree indexes, or hash indexes?

Chapter four digs into indexes further with single & multi-column indexes using them for sorting and joining.  You'll also find out about covering indexes which are multi-column matching the where clause, but also including columns needed in the SELECT predicate.  Do you have duplicate or unused indexes?  You'll learn why they matter to performance and how to eliminate them with tools like mk-duplicate-key-checker.

Chapter five continues along the same lines, with more coverage of indexes.  Learn to identify when you are using a covering index, fulfilling the entire query by only accessing the index.  You'll also learn about partial indexes, how they can reduce the size of index storage and retrieval while still getting your data for you.

Chapter six covers configuring the server itself, hitting on the system variables such as the innodb buffer pool (innodb_buffer_pool_size) and key buffer (key_buffer_size) as well as the query cache.  You'll also learn how to set the four main session memory settings - sort buffer (sort_buffer_size) and join buffer ( join_buffer_size) as well as the lesser known read buffers (read_buffer_size and read_rnd_buffer_size).

Chapter seven is all about the process of tuning and optimizing MySQL.  Rolling all the previous sections into marching orders, and prescriptive advice, he takes you through step by step how to apply the principles.  You'll get an introduction to mk-query-digest (though strangely without attribution to Baron Schwartz), the great maatkit tool for query analysis and aggregation, as well as the microsecond precision patch, which allows your mysql shell client to display more exact timing data.  For the patch he links back to an article on his own site which seems to be not found.  The author of the high precision mysql timer patch is Stewart Smith.

I personally got the most out of Chapter eight, full of self-described hidden performance tips.  From identifying unused or duplicate indexes, to replacing inefficient data types with better ones, why it's important to use NOT NULL where possible or how to store IP addresses efficiently, this chapter has a lot of goodies.  For those still struggling with SQL statement tuning, there are a few patterns that are described, offering advice on how to rewrite a subquery as an inner join,

What you might not know

  • MySQL includes Oracle's index organized tables by a different name
  • Too many indexes can dramatically impact INSERT & UPDATE performance
  • Many DDL operations can be done online - see oak-online-alter-table (Shlomi Noach)
  • Datatypes matter - use enum, int unsigned, timestamp & not null where possible
  • Covering indexes are your friend, duplicate & unused indexes are not!
  • A replication slave can have different storage engines or indexes from the master. These can support different uses - such as data warehousing or non-transactional requirements.
  • While a_string LIKE '%end of my sentence.' won't use an index, you can index reverse_string, then use reverse_string LIKE REVERSE '%end of my sentence.' and MySQL will use this index.  You've simulated an advanced Oracle feature, reverse key indexes!

A few small gripes

If I were to add a few complaints it would be to say that some of the examples were rather simplistic.  In many cases tuning SQL is not as simple as just adding the right index.  For instance there was no good discussion of the dreaded "using temporary, using filesort" that we see a lot in MySQL explains when sorting has to be done, but will not fit in memory.  Or what about tmpdir=/dev/shm, how will that improve things?  What about UNION versus UNION ALL where appropriate.  Why does DISTINCT do a sort?

The book was also missing a discussion of triggers, stored procedures, when or if the query cache can cause problems and so forth.  Also the article link mentioned about chapter seven isn't the only missing link.  I followed links to optimizing sql  statements and it seems to go to a generic holding page.  Also the main link effectivemysql.com/book leads to an outline of an as yet unreleased title on Backup and Recovery.

All in all, well worth your money

However, other than these few gripes the book overall is a very welcome addition to the small family of MySQL books.  Get a copy quick before they're all gone!

 

 


PlanetMySQL Voting: Vote UP / Vote DOWN

New algorithm for calculating 95 percentile

Август 30th, 2011

The 95 percentile for query response times is and old concept; Peter and Roland blogged about it in 2008. Since then, MySQL tools have calculated the 95 percentile by collecting all values, either exactly or approximately, and returning all_values[int(number_of_values * 0.95)] (that’s an extreme simplification). But recently I asked myself*: must we save all values? The answer is no. I created a new algorithm** for calculating the 95 percentile that is faster, more accurate, and saves only 100 values.***

Firstly, my basis of comparison is the 95 percentile algo used by mk-query-digest. That algo is fast, memory-stable, and very proven in the real world. It works well for any number of values, even hundreds of thousands of values. It saves all values by using base 1.05 buckets and counting the number of values that fall within the range of each bucket. The results are not exact, but the differences are negligible because a 10ms and 13ms response time are indiscernible to a human. Any algo that hopes to handle very large numbers of values must approximate because not even C can store and sort hundreds of thousands of floats (times N many attributes times N many query classes) quickly enough.

So when I finished the new algo, I compared it to the mk-query-digest algo and obtained the following results:

FILE                         REAL_95     OLD_95     NEW_95  OLD_DIFF NEW_DIFF  OLD_TIME NEW_TIME   FILE_SIZE  OLD_RSS  NEW_RSS
nums/500k-1-or-2               1.751      1.697      1.784    -0.054   +0.033     12.12     9.37     4500000    3.88M    2.63M
nums/100k-1-or-2               1.749      1.697      1.794    -0.052   +0.045      2.42     1.88      900000    3.88M    2.63M
nums/50k-trend-1-to-9          6.931      6.652      6.995    -0.279   +0.064      1.24     0.90      450000    3.88M    2.63M
nums/25k-trend-1-to-5          3.888      3.704      3.988    -0.184   +0.100      0.64     0.47      225000    3.88M    2.63M
nums/21k-1-spike5-1            0.997      0.992      2.002    -0.005   +1.005      0.55     0.42      189000    3.88M    2.63M
nums/10k-rand-0-to-20         19.048     18.532     19.054    -0.516   +0.006      0.29     0.21       95079    3.86M    2.62M
nums/10k-rand-0-to-10          9.511      9.360      9.525    -0.151   +0.014      0.29     0.21       90000    3.86M    2.62M
nums/4k-trend-1-to-7           5.594      5.473      6.213    -0.121   +0.619      0.14     0.09       36000    3.86M    2.63M
nums/1k-sub-sec                0.941      0.900      0.951    -0.041   +0.010      0.07     0.04        9000    3.80M    2.62M
nums/400-half-10              10.271      9.828     10.273    -0.443   +0.002      0.05     0.03        3800    3.79M    2.62M
nums/400-high-low             10.446     10.319     10.446    -0.127        0      0.05     0.03        3800    3.79M    2.62M
nums/400-low-high             10.445     10.319     10.475    -0.126   +0.030      0.05     0.03        3800    3.79M    2.63M
nums/400-quarter-10           10.254      9.828     10.254    -0.426        0      0.06     0.03        3700    3.79M    2.62M
nums/153-bias-50              88.523     88.305     88.523    -0.218        0      0.05     0.03        1500    3.79M    2.62M
nums/100-rand-0-to-100        90.491     88.305     90.491    -2.186        0      0.05     0.03         991    3.79M    2.62M
nums/105-ats                  42.000     42.000     42.000         0        0      0.05     0.03         315    3.75M    2.61M
nums/20                       19.000     18.532     19.000    -0.468        0      0.04     0.03          51    3.79M    2.62M
nums/1                        42.000     42.000     42.000         0        0      0.04     0.03           3    3.75M    2.61M

 
I generated random microsecond values in various files. The first number of the filename indicates the number of values. So the first file has 500k values. The remaining part of the filename hints at the distribution of the values. For example, “50k-trend-1-to-9″ mean 50k values that increase from about 1 second to 9 seconds. Number and distribution of values affects 95 percentile algorithms, so I wanted to simulate several possible combinations.

“REAL_95″ is the real, exact 95 percentile; this is the control by which the “old” (i.e. the mk-query-digest) and new algos are compared. The diffs are comparisons to this control.

Each algo was timed and its memory (rss) measured, too. The time and memory comparisons are a little bias because the mk-query-digest module that implements its 95 percentile algo does more than my test script for the new algo.

The results show that the new algo is about 20% faster in all cases and more accurate in all but one case (“21k-1-spike5-1″). Also, the new algo uses less memory, but again this is a little bias; the important point is that it doesn’t use more memory to get its speed or accuracy increase.

The gains of the new algo are small in these comparisons, but I suspect they’ll be much larger given that the algo is used at least twice for each query. So saving 1 second in the algo can save minutes in data processing when there’s tens of thousands of queries.

Instead of explaining the algorithm exhaustively, I have upload all my code and data so you can reproduce the results on your machine: new-95-percentile-algo.tar.gz. You’ll need to checkout Maatkit, tweak the “require” lines in the Perl files, and tweak the Bash script (cmp-algos.sh), but otherwise I think the experiment should be straight forward. The new algo is in new-algo.pl. (new-algo.py is for another blog post.)

My ulterior motive for this blog post is to get feedback. Is the algorithm sane? Is there a critical flaw that I overlooked? Do you have a real-world example that doesn’t work well? If you’re intrepid or just curious and actually study the algo and have questions, feel free to contact me.

* By “recently asked myself” I mean that some time ago Baron and I wondered if it was possible to calculate 95 percentile without saving all values. At that time, I didn’t think it was feasible, but lately I thought and coded more about the problem.

** By “a new algorithm” I doubt that this has never been attempted or coded before, but I can’t find any examples of a similar algorithm.

*** By “saves only 100 values” I mean ultimately. At certain times, 150 values may be saved, but eventually the extra 50 should be integrated back into the base 100 values.


PlanetMySQL Voting: Vote UP / Vote DOWN

mk-query-digest Tips – Showing all hosts & users

Июнь 1st, 2010

The Maatkit tools provide a suite of additional MySQL commands. There is one command I use constantly and that is mk-query-digest.

Unfortunately the documentation does leave a lot to be desired for usability. While throughout, it is a man page and not a user guide. Several of us have discussed writing better documentation however it’s always a matter of time. I have however learned a number of tips and I’d like to share them in smaller digests.

The first is showing additional display. Maatkit works on truncating per line output to a reasonable length of 73 characters?

One of those lines is the list of hosts that connected to MySQL for a query, for example.

# Hosts                  4 192.168.40... (2), 192.168.40... (2)... 2 more
# Hosts                  3 99.99.245.14 (12), 999.106.206.167 (6)... 1 more

The problem is I want to know what that 1 more is so I can gather a complete list of IP addresses that connect to this server. You do that with the –show-all=host argument.

Without

$ cat external.tcpdump | ./mk-query-digest --type tcpdump | grep Hosts | uniq -c
#
      1 # Hosts                  3 99.99.245.14 (12), 999.106.206.167 (6)... 1 more
      1 # Hosts                  1 99.99.139.140

With

$ cat external.tcpdump | ./mk-query-digest --type tcpdump --show-all=host | grep Hosts | uniq -c
      1 # Hosts                  3 99.99.245.14 (12), 999.106.206.167 (6), 99.99.139.140 (2)
      1 # Hosts                  1 99.99.139.140

You can apply the same principle to the Users as well with –show-all=user

$ cat external.tcpdump | ./mk-query-digest --type tcpdump  --show-all=user | grep Users | uniq -c
      1 # Users                  2 xxx (13), phpmysqlmo... (5)
     49 # Users                  1  xxx

The problem is a still gett a truncation of the name ‘phpmysqlmo…’ That’s the one thing I’m trying to uncover, because that IP and usernme are not valid permissions for this system.


PlanetMySQL Voting: Vote UP / Vote DOWN

tcpdump errors on FreeBSD for mk-query-digest

Май 28th, 2010

While I use this tcpdump command for MySQL query analysis with mk-query-digest, I found recently that it didn’t work on FreeBSD

$ tcpdump -i bge0 port 3306 -s 65535 -x -n -q -tttt -c 5
tcpdump: syntax error

It left me perplexed and reading the man page seemed to indicate my options were valid. I tried a few variances just to be sure without success.

$ tcpdump -i bge0 -c 5 port 3306 -x
tcpdump: syntax error
$ tcpdump -i bge0 -c 5 port 3306 -q
tcpdump: syntax error
$ tcpdump -i bge0 -c 5 port 3306 -tttt
tcpdump: syntax error

The solution was actually quite simple in the end, it had nothing to do with the commands, it had everything to do with the order of them. Placing port as the last option solved the problem.

$ tcpdump -i bge0 -s 65535 -x -n -q -tttt -c 5  port 3306
$ uname -a
FreeBSD db4.example.com 6.3-RELEASE-p3 FreeBSD 6.3-RELEASE-p3 #0: Wed Jul 16 05:13:50 EDT 200

PlanetMySQL Voting: Vote UP / Vote DOWN

Videos of Pythian Sessions from the 2010 O’Reilly MySQL Conference and Expo

Апрель 21st, 2010

Here’s a sneak peek at a video matrix — this is all the videos that include Pythian Group employees at the MySQL conference. I hope to have all the rest of the videos processed and uploaded within 24 hours, with a matrix similar to the one below (but of course with many more sessions).

TitlePresenterSlidesVideo link
(hr:min:sec)
Details (Conf. site link)
Main Stage
Keynote: Under New Management: Next Steps for the CommunitySheeri K. Cabral (Pythian)N/A18:16
session 14808
Ignite talk: MySQLtuner 2.0Sheeri K. Cabral (Pythian)PDF5:31N/A
Interview
Thoughts on Drizzle and MySQLSheeri K. Cabral (Pythian)N/A9:22N/A
Tutorials
MySQL Configuration Options and Files: Basic MySQL Variables (Part 1)Sheeri K. Cabral (Pythian)
PDF
1:25:04, pre-break

1:35:47, post-break
session 12408
MySQL Configuration Options and Files: Intermediate MySQL Variables (Part 2)Sheeri K. Cabral (Pythian)
PDF
1:25:04, pre-break

1:24:28, post-break
session 12435
Sessions
Better Database Debugging for Shorter DowntimesRob Hamel (Pythian)PDF33:13
session 13021
Find Query Problems Proactively With Query ReviewsSheeri K. Cabral (Pythian)PDF45:59session 13267
Time Zones and MySQLSheeri K. Cabral (Pythian)PDF45:54
session 12412
Security Around MySQLDanil Zburivsky (The Pythian Group)ODP37:27session 13458
Continual Replication SyncDanil Zburivsky (The Pythian Group)ODP45:57session 13428


PlanetMySQL Voting: Vote UP / Vote DOWN

Filtering and analyzing queries by year, month, hour and day with mk-query-digest

Октябрь 23rd, 2009

I originally posted this on the Maatkit discussion list:

A little while ago a user asked in http://groups.google.com/group/maatkit-discuss/browse_thread/thread/256b6c780bdb066d if it was possible to use mk-query-digest to analyze queries per hour. I responded with a skeleton script for use with –filter, but I didn’t actually test this. Today, I filled out the script and tested it and found that it works. The script is available from trunk at:

http://maatkit.googlecode.com/svn/trunk/mk-query-digest/t/samples/filter-add-ymdh-attribs.txt

The test file I’m using is available at:

http://maatkit.googlecode.com/svn/trunk/common/t/samples/binlog005.txt

The filter code does two things: it adds attributes called year, month, day and hour to each event, and it uses environment variables called YEAR, MONTH, DAY and HOUR to filter those newly added attributes. I’ll show how this works later.

The filter works best with binary logs because binlogs reliably timestamp events. If an event does not have a timestamp (as is often the case in a slowlog), then it gets values 0, 0, 0, 24 for year, month, day and hour respectively. Since 0 is a valid hour, 24 is used to indicate that the event had no hour.

The basic usage is to group queries by hour. Let’s say you want to see query stats for each hour. The command line is:

mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour

Notice “–group-by hour”. And the result is (truncated for brevity):

# ########################################################################
# Report grouped by hour
# ########################################################################

# Item 1: 1.50 QPS, 31.01kx concurrency, ID 0x0DB5E4B97FC2AF39 at byte 450
#              pct   total     min     max     avg     95%  stddev  median
# Count         30       3
# Exec time     30  62029s  20661s  20704s  20676s  19861s       0  19861s
# Time range 2007-12-07 13:02:08 to 2007-12-07 13:02:10
# bytes         23      81      27      27      27      27       0      27
# day           25      21       7       7       7       7       0       7
# error cod      0       0       0       0       0       0       0       0
# month         27      36      12      12      12      12       0      12
# year          27      21       7       7       7       7       0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
13
...
# Item 2: 0.00 QPS, 0.71x concurrency, ID 0xAA27A0C99BFF6710 at byte 301 _
#              pct   total     min     max     avg     95%  stddev  median
# Count         30       3
# Exec time     29  62000s  20661s  20675s  20667s  19861s       0  19861s
# Time range 2007-12-07 12:02:50 to 2007-12-08 12:12:12
# bytes         46     163      22      87   40.75   84.10   25.86   26.08
# day           37      30       7       8    7.50    7.70    0.36    7.70
# error cod      0       0       0       0       0       0       0       0
# month         36      48      12      12      12      12       0      12
# year          36      28       7       7       7       7       0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
12
...
# Rank Query ID           Response time    Calls   R/Call     Item
# ==== ================== ================ ======= ========== ====
#    1 0x                 62029.0000 30.0%       3 20676.3333 13
#    2 0x                 62000.0000 30.0%       3 20666.6667 12
#    3 0x                 20661.0000 10.0%       1 20661.0000 23
#    4 0x                 20661.0000 10.0%       1 20661.0000 10
#    5 0x                 20661.0000 10.0%       1 20661.0000 08
#    6 0x                 20661.0000 10.0%       1 20661.0000 18

 
Each item corresponds to the queries for that hour. Shown above are hours 13 (1pm) and 12 (noon). Then the profile gives you summarized information about each hour. From this fake binlog we see that 30% of queries occurred in the noon hour. (binlog005.txt is highly contrived; the values are just for demonstration.)

Unless your logs are rotated daily, chances are there will be noon-hour queries for multiple days. If you want to see per-hour stats for one specific day, the filter can do this, too, by using environment variables. Filter scripts were not originally meant to accept user input, and having to modify values in the actual code isn’t flexible, so the solution is to use environment variables. Here’s how:

DAY=7 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour

The leading “DAY=7″ temporarily sets the environment variable DAY only during the execution of mk-query-digest. This way you don’t pollute your normal environment variables. The result is now (truncated again):

# Item 2: 0.00 QPS, 12.24x concurrency, ID 0xAA27A0C99BFF6710 at byte 301
# This item is included in the report because it matches --limit.
#              pct   total     min     max     avg     95%  stddev  median
# Count         28       2
# Exec time     28  41339s  20664s  20675s  20670s  20675s      8s  20670s
# Time range 2007-12-07 12:02:50 to 2007-12-07 12:59:07
# bytes         28      54      27      27      27      27       0      27
# day           28      14       7       7       7       7       0       7
# error cod      0       0       0       0       0       0       0       0
# month         28      24      12      12      12      12       0      12
# year          28      14       7       7       7       7       0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
12

 
Notice that there are now only 2 queries in the noon hour and that the time range is only in 2007-12-07. Previously, there was a noon-hour query in 2007-12-08. Thus we know that the DAY filter worked.

In this fashion, you can group and filter your log as you please. You can combine multiple filters like:

DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour

That will group and analyze only queries from the noon hour of the 7th (December 7, 2007 in this log). mk-query-digest is so flexible you can even do this:

DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour --no-report --print

That will suppress the query analysis and report and simply print all the queries from the noon hour of the 7th in pseudo-slowlog format.

There are, of course, other ways to do this kind of per-time-unit query aggregation, analysis and reporting (e.g. –since and –until), but if all you have are pre-existing logs and mk-query-digest, then –filter can be used to accomplish this task, too.


PlanetMySQL Voting: Vote UP / Vote DOWN

Take a look at mk-query-digest

Октябрь 8th, 2009

Q: What SQL is running on your MySQL database server now?
A: The bane of pain for MySQL DBA’s when there is no official MySQL instrumentation that is dynamic and fine grained sufficiently to solve this problem at the SQL interface.

While hybrid solutions exist, the lack of dynamic and real-time are the issues. There is however great work being done by Baron and others on Maatkit mk-query-digest and packet sniffing the MySQL TCP packets.

$ sudo tcpdump -i eth0 port 3306 -s 65535  -x -n -q -tttt | ./mk-query-digest --type tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
# Caught SIGINT.
5444 packets captured
8254 packets received by filter
2809 packets dropped by kernel
# 2.1s user time, 40ms system time, 22.23M rss, 57.60M vsz
# Overall: 1.58k total, 38 unique, 262.67 QPS, 2.34x concurrency _________
#                    total     min     max     avg     95%  stddev  median
# Exec time            14s    41us      2s     9ms    23ms    72ms   236us
# Time range        2009-10-05 08:17:44.377926 to 2009-10-05 08:17:50.052460
# bytes            271.31k       8   8.79k  176.28  621.67  412.51   44.60
# Rows affe            248       0       3    0.16    0.99    0.38       0
# Warning c              0       0       0       0       0       0       0
#   3% (58)   No_index_used

# Query 1: 118.67 QPS, 1.23x concurrency, ID 0x16219655761820A2 at byte 2167682
#              pct   total     min     max     avg     95%  stddev  median
# Count         45     712
# Exec time     52      7s    41us      1s    10ms    23ms    80ms   138us
# Hosts                 11 10.251.199... (132), 10.251.103... (129)... 9 more
# Time range 2009-10-05 08:17:44.377926 to 2009-10-05 08:17:50.051177
# bytes          2   6.43k       8      17    9.25   16.81    3.15    7.70
# Errors                 1    none
# Rows affe      1       4       0       1    0.01       0    0.07       0
# Warning c      0       0       0       0       0       0       0       0
#   0% (1)    No_index_used
# Query_time distribution
#   1us
#  10us  ################################################
# 100us  ################################################################
#   1ms  #####
#  10ms  #############
# 100ms  #
#    1s  #
#  10s+
# EXPLAIN
select 1\G

....

# Rank Query ID           Response time    Calls   R/Call     Item
# ==== ================== ================ ======= ========== ====
#    1 0x16219655761820A2     7.3861 54.9%     712   0.010374 SELECT
#    2 0x930DE584EC815E11     1.6664 12.4%      35   0.047611 SELECT X
#    3 0x68B1E4E47977667A     1.4265 10.6%      71   0.020092 SELECT Y Z
...

In this real-time example, the SELECT 1 a Connector/J keep alive in version 3.1.4 using iBATIS is the major SQL statement used. (yes MM I know about /* ping*/, have suggested to client). I was however with additional sample times able to identify a new query and confirm a full table scan by lack of good index. Monitoring had highlighted an increase in SQL statements and table scans, but you need tools such as this to identify the problem SQL in a well tuned system.

There is a lot of information to digest with this output, to confirm and determine the relative benefit of each number, the histogram etc, but the identification of SQL in real-time and the good work of overall summaries and comments for EXPLAIN and SHOW CREATE TABLE for example shows this tool has been designed by MySQL DBA’s for MySQL DBA’s.

Sheeri just wrote about Dynamic General and Slow Query Log Before MySQL 5.1 which apart from the File I/O overhead is an idea I’d not considered before. What may be a good idea, is to pass this information into a named pipe and then let another process do whatever. Drizzle solves this problem with query logging information being able to be shipped off to Gearman.


PlanetMySQL Voting: Vote UP / Vote DOWN