Archive for the ‘Maatkit’ Category

A clarification on mk-slave-prefetch

Август 21st, 2011

It seems to be a popular misconception that mk-slave-prefetch is designed to keep a MySQL replica server “warm” and ready to serve production traffic in case the master is demoted or fails. This is not what mk-slave-prefetch does. It’s related, and easy to confuse, but its purpose is different.

The mk-slave-prefetch tool is designed to try to execute a read-only approximation of the write workload that the replica is about to have to perform. It is meant to do this just a little bit before the replication thread (which can only be true if replication is lagging), so that when the replica replays writes to execute replication, it doesn’t have to wait for disk I/O.

Keeping caches warmed up for production traffic requires that the read workload, which does not flow through relay logs, is executed on the server. If you point mk-slave-prefetch at a server, you’re just double-executing the write workload in a read-only fashion.

Perhaps an example will clarify. Suppose that a master database has a bunch of writes to WritableTable, and lots and lots of reads to ReadableTable. On the replica, the only queries in the relay log will be on WritableTable. And what will mk-slave-prefetch be doing? Executing SELECT queries against WritableTable. No traffic on ReadableTable at all. If you switch to use the replica as the primary server, ReadableTable’s data won’t be in the caches.

To actually warm up the replica, you need to replay the queries against the ReadableTable. You can only find those by looking at the master. You can read its query logs, or watch its processlist, or capture TCP traffic, or any other method of capturing read traffic. There is a feature built into mk-query-digest to help you capture and replay these against the replica: –execute.

A parting note: mk-slave-prefetch is an extremely niche tool that generally doesn’t help replicas keep up with their masters. Only in very special cases is the hardware, data, and workload suitable for what it does.

Related posts:

  1. Pop quiz: how can one slave break another slave
  2. Why MySQL says the server is not configured as a slave
  3. How to know if a MySQL slave is identical to its master
  4. How to measure MySQL slave lag accurately
  5. MySQL disaster recovery by promoting a slave


PlanetMySQL Voting: Vote UP / Vote DOWN

Making mk-table-checksum less intrusive

Август 8th, 2011

About a month ago I needed to compare tens of thousands of tables in hundreds of databases between a few different servers. The obvious choice was, mk-table-checksum! The only problem was, that the tool needs to know the minimum and maximum value of the column by which each table is to be subdivided into chunks and checksummed. This select min(col), max(col) from table locks all write operations on the table and on a big table it meant downtime.

Looking at the source it was clear we could make mk-table-checksum run the select min(col), max(col) from table on the read-only slave and use the values to checksum the master.

It was subtle code changes in function:
get_range_statistics adding

my $cxn_string_dc = “DBI:mysql:;host=slavehost;port=3306;mysql_read_default_group=client”;
my $user = ‘user’;
my $pass = ‘password’;
my $dbh_slave = DBI->connect($cxn_string_dc, $user, $pass);

and changing $dbh connection to $dbh_slave in the rest of the function where the min,max values were being calculated.

Good luck and enjoy a much less intrusive checksumming on your masters.

Because sharing is caring


PlanetMySQL Voting: Vote UP / Vote DOWN

Planned change in Maatkit & Aspersa development

Июль 7th, 2011

I’ve just sent an email to the Maatkit discussion list to announce a planned change to how Maatkit (and Aspersa) are developed. In short, Percona plans to create a Percona Toolkit of MySQL-related utilities, as a fork of Maatkit and Aspersa. I’m very happy about this change, and I welcome your responses to that thread on the discussion list.

Related posts:

  1. Aspersa, a new opensource toolkit
  2. Four companies to sponsor Maatkit development
  3. How Maatkit benefits from test-driven development
  4. Aspersa gets a user manual
  5. Using Aspersa to capture diagnostic data


PlanetMySQL Voting: Vote UP / Vote DOWN

Maatkit’s mk-query-digest learns about Apdex

Ноябрь 9th, 2010

This month’s Maatkit release, which just hit the download servers, contains a new feature for mk-query-digest: Apdex scores. The Apdex performance index is a sensible metric of how satisfactory the response time to a request is. Of all the scientific metrics I’ve seen, it’s the most practical; and of all the practical ones I’ve seen, it’s the most scientific.

To use this feature, you need to know what the Apdex threshold means, and you need to know how to configure this in mk-query-digest.

And then you can ask and answer practical questions such as “which classes of queries against this MySQL server are not satisfying users?” Or, because mk-query-digest supports the HTTP protocol, you can pose the same question about your web application overall.

Related posts:

  1. Maatkit learns how to map-reduce
  2. mk-query-digest now supports Postgres logs
  3. Learn about mk-query-digest at PgEast 2010
  4. mk-query-digest uses less memory
  5. Slides from my session on mk-query-digest at PgEast 2010


PlanetMySQL Voting: Vote UP / Vote DOWN

Aspersa gets a user manual

Сентябрь 23rd, 2010

It doesn’t exist until it is nicely documented. Now Aspersa has documentation. Writing these tools has taught me how powerful and flexible Bash can be. Solving MySQL problems is a lot easier with good tools!

Related posts:

  1. Aspersa, a new opensource toolkit
  2. Using Aspersa to capture diagnostic data
  3. MySQL manual gets improved searching
  4. I’m a Postgres user, as it turns out
  5. Advanced MySQL user variable techniques


PlanetMySQL Voting: Vote UP / Vote DOWN

Exploring mk-table-checksum

Сентябрь 9th, 2010

I recently started exploring the wonders of mk-table-checksum after spending too long dismissing the magic-like maatkit toolkit. What follows is not an exhaustive treatment of mk-table-checksum, but just an overview and some things I had to learn by trying the tool out.

 

The basic principle is based on the CHECKSUM TABLE table maintenance statement supported in the MySQL Server. The mk-table-checksum runs a CHECKSUM TABLE command on all (or some) tables. The results of that are stored in another table on the server.

 

read more


PlanetMySQL Voting: Vote UP / Vote DOWN

Two subtle bugs in OUTER JOIN queries

Август 3rd, 2010

OUTER JOIN queries in SQL are susceptible to two very subtle bugs that I’ve observed a number of times in the real world. Daniel and I have been hammering out ways to automatically detect queries that suffer from these bugs, in a relatively new Maatkit tool called mk-query-advisor. It’s part of our series of advisor tools for MySQL. I wrote a blog post about it a while ago. Automated analysis of bad query patterns is a good thing to write tools to do, because catching buggy queries is hard work if you do it manually.

Let’s dive right in and analyze these subtle bugs. Warning: if you don’t understand how SQL handles NULL, you’re not going to understand the following. Many people have a hard time with NULL, which is why these bugs are so hard to understand and avoid. This is one reason why SQL is a hard language to use properly.

Bug 1: a column could be NULL for two reasons, and you can’t distinguish them

If the outer table in your query contains NULL-able columns, and you place a WHERE clause to filter out all but those rows, you’re going to get bugs because a non-matching row in the outer table will be all-NULL. Here’s an example. Let’s start with a plain outer join query:

select * from L left join R on l_id = r_id;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

Here we see that one row in the outer table is missing, and one row (the middle row) has a NULL r_other column. Now, let’s add a WHERE clause:

select * from L left join R on l_id = r_id where r_other is null;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

This query is buggy, because the two rows are returned for completely different reasons, and you can’t be sure which is which. IS NULL clauses can safely be placed on the columns used in the JOIN clause, but not on other columns in the outer table that might be NULL.

Bug 2: an OUTER JOIN is converted to INNER

If you place a non-null-safe comparison operator on any column in the outer table that isn’t part of the JOIN clause, you implicitly disable the outer-ness of the query and convert it to an INNER JOIN. Here’s an example:

select * from L left join R on l_id = r_id where r_other > 1;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
+------+------+---------+

The left-outer-ness of the above query is what causes the third row to be output in the first query I showed you above. The greater-than operator in this example automatically makes the left-ness impossible, because anytime there’s a row in the inner table that has no match in the outer table, it’ll be filled in with NULLs, and those NULLs will be eliminated by the operator. So the effect is that only matching rows will ever be output.

If you want to ponder variations and subtleties of the above, you can read more discussion on the issue report where we’re hammering out the details of automatically detecting and warning about these sneaky errors.

Related posts:

  1. How to simulate FULL OUTER JOIN in MySQL
  2. How to write a SQL exclusion join
  3. How to write SQL JOIN clauses more compactly
  4. The dangerous subtleties of LEFT JOIN and COUNT() in SQL
  5. How to write INSERT IF NOT EXISTS queries in standard SQL


PlanetMySQL Voting: Vote UP / Vote DOWN

Online Verification That Master and Slaves are in Sync

Июль 27th, 2010

In October 2008, Baron posted How to Check MySQL Replication Integrity Continually. Here at Pythian we have developed a method based on that post, and added “verifying that masters and slaves are in sync” to our standard battery of tests.

We call it “Continual replication sync checking”. This article will explain how it works, how to test and make the procedure non-blocking, benchmarks from the real world, issues we encountered along the way, and finally Pythian’s procedure of setting up continual replication sync in a new environment.

At the 2010 MySQL User Conference & Expo, my co-worker Danil Zburivsky did a presentation about this, and you can get the slides in ODP (Open Office) format or watch the 46-minute video on YouTube.

How it works

On the master, mk-table-checksum is run. In order to make the checksum operation online, we use the modulo and offset features of mk-table-checksum to checksum only part of the data at a time. The checksum is run (from cron) on the master and replicates to the slave. The results are captured in a result table, and a separate process checks the result table and notifies us of any discrepancies.

Testing resource usage for non-blocking replication sync checking

Each environment has a different data size, different resources, and a different level of what “intrusive” means. We start with a baseline of the size of the database, which we get from:

SELECT SUM(INDEX_LENGTH+DATA_LENGTH)/1024/1024/1024 as sizeGb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE='BASE TABLE';

I included INDEX_LENGTH because the queries that generate the checksum information may or may not depend on indexes. Note that for InnoDB tables, the metadata is approximate, so this will not necessarily be the exact size of the database, but it will be in the right range. In general, running this query in the INFORMATION_SCHEMA did not consume too many resources, though in many cases the query took a few minutes to complete. On databases with several thousand tables, this query can take hours to complete.

Of course there is the caveat that it may take a VERY long time to run this in your environment, and some folks have reported crashing (this is VERY rare, I personally have not seen it even with the high demand client we have had, but others have reported it). So if you prefer to be on the safe side, you could look at the size of your ibdata files (whether you are innodb_file_per_table or not) plus the sizes of all the tables on disk (MyISAM, CSV, etc). That will give you a sense of how big the data is, although the size of the files on disk also includes any fragmentation. In practice, I have seen fragmentation reduce the size of a large database by 40%, so beware that neither of these methods is perfect.

Once we get the data size in gigabytes, we can come up with a modulo to test. In general, we have found that mk-table-checksum is quick enough that it is non-blocking when we use a modulo value that corresponds to approximately 1 Gb of data, and a chunk-size of 100. We actually started out using a checksum of 1000, but locking 1000 rows to test a checksum proved to be longer than was acceptable for clients with lots of transactions — the problem was not the locking on the master, the problem was that it created a slave lag on the slaves. We tried using the --sleep option to mk-table-checksum, which worked, but changing the chunksize to be smaller caused less slave lag than adding a sleep parameter.

Another issue when using chunk-size is that it requires a numeric (but non-decimal) index. We ran across clients using VARCHAR(32) as a PRIMARY key (yes, on InnoDB, and yes, performance was an issue there), or tables that did not have any indexes at all (such as a logging table). If mk-table-checksum does not find a suitable index it will just do a checksum on the whole table.

To find tables that would be problematic, here is an INFORMATION_SCHEMA query that can be run (again, the usual caveats about INFORMATION_SCHEMA apply):

SELECT CONCAT("SHOW CREATE TABLE ", table_schema,".",table_name,"\\G")
FROM TABLES
LEFT JOIN (
     SELECT distinct table_schema,table_name
     FROM STATISTICS INNER JOIN COLUMNS USING
       (table_schema,table_name,column_name)
     WHERE (data_type like '%int' or data_type like 'date%'
     or data_type like 'time%' or data_type='float') and seq_in_index=1
     ) AS good_for_checksum
USING (table_schema,table_name)
WHERE good_for_checksum.table_name IS NULL
and table_schema not in ('information_schema');

Frequency of checksum runs


The frequency that we run the checksum is also very flexible, so we take the size and translate that into a modulo that is “even” in a time-based way. For example, on a server that reported 113 Gb in size from the INFORMATION_SCHEMA query above, we set the modulo to 120. The checksum took 10 minutes and 6 seconds from the time the master started to the time the slave finished. There was no excessive slave lag caused and other resource checks showed that this test was acceptable, including application response time for non-administrative queries.

Given a modulo of 120 that takes about 10 minutes to run and the environment, we decided to run the checksum 9 times per day (hourly for 9 hours during off-peak time). This resulted in the entire data set being checked during a period of just under 2 weeks (120 parts / 9 times per day = 13.333 days).

This means that if there is a data discrepancy, it is discovered within 2 weeks in this particular environment. Though that is not ideal, it is much better than not discovering data discrepancies at all, which is how replication currently works.

Benchmarks From the Real World

The first row in the table below is the example we just went through. The subsequent rows of the table are results from some of the other production environments we run the checksum in. As you can see, we try to keep the data checked at one time to about 1 Gb.



Total Data SizeModuloTest timeChecksum RateFrequencyPeriod
113 Gb12010 min 6 seconds1.59 Mb / sec9x / day2 weeks
9 Gb2151 seconds8.4 Mb / sec3x / day1 week
29 Gb219 min 16 seconds2.6 Mb / sec3x / day1 week
70 Gb2128 seconds2650 Mb / sec
(data was
freshly defragmented!)
3x / day1 week
5.1 Gb214 min 22 sec0.958 Mb / sec3x / day1 week
314.5 Gb33636 min 3 seconds0.44 Mb / sec16x / day3 weeks

In all of these environments, slave lag was 10 seconds or less at any given point on the slaves.

Issues Encountered


Some of the issues we encountered have workarounds, so I wanted to discuss and explain them here before giving our procedure, which contains the workarounds.

  1. Bug 304 – mk-table-checksum deletes all prior checksum results in the result table on every run. There is no fix for this yet, but if you are using mk-table-checksum only for the procedure described in this article (and in particular are not using the --resume-replication option), you can comment out the following code from mk-table-checksum:

    # Clean out the replication table entry for this table.
    if ( (my $replicate_table = $final_o->get('replicate'))
    && !$final_o->get('explain') ) {
    use_repl_db(%args); # USE the proper replicate db
    my $del_sql = "DELETE FROM $replicate_table WHERE db=? AND tbl=?";
    MKDEBUG && _d($dbh, $del_sql, $db, $table->{table});
    $dbh->do($del_sql, {}, $db, $table->{table});
    }

    It is in different places in different versions, but last I checked, searching for "DELETE FROM" in mk-table-checksum only matched three lines of code, and it was pretty clear (due to the inline comment) which block of code to delete. The block shown above is from lines 5154 – 5161 in mk-table-checksum changeset 6647.

  2. Running the checksum may cause “statement not safe for replication” errors, especially in 5.1.
  3. This is usually OK to ignore, because mk-table-checksum works specifically because you can run the same command on the master and slave and get different results. In MySQL 5.1, CSV tables for the general and slow logs exist by default, even if they are not being written to, and “Statement is not safe to log in statement format” errors show up.

    You will need to redirect stderr and expect to see those statements in the MySQL error log. Note that mk-table-checksum works regardless of whether you are using statement-based, row-based or mixed replication.

  4. mk-table-checksum is not perfect, and sometimes shows false positives and false negatives.
  5. This is a hard to deal with problem, and we encourage making bug reports when they are found. However, I will note that if mk-table-checksum finds even one undetected data integrity issue, then it is useful, because right now there is no other way of detecting issues in an automated fashion. As more people use mk-table-checksum and can help the developers figure out how to fix the false positives/false negatives, I am sure it will be even better.

Pythian’s procedure to set up continual replication sync

  1. Check to make sure all tables have appropriate indexes, as above
  2. . If they do not, the nibbling algorithm can be used, though as a caveat I have not tested nibbling with mk-table-checksum.

  3. Figure out the modulo value based on data size, as above

  4. Decide what database in which to put the tables that mk-table-checksum uses. We either use our monitoring database or a database called “maatkit”. Note that it is important to use a database that actually gets replicated!
  5. Get mk-table-checksum and comment out the lines that always delete from the replicated table, as above
  6. Create and populate the table mk-table-checksum will need for the modulo value:
    CREATE TABLE IF NOT EXISTS `checksum_modulo` (
    `modulo_offset` smallint(5) unsigned NOT NULL default '0' primary key
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

    INSERT IGNORE INTO checksum_modulo (modulo_offset) VALUES (0);

  7. Do a test run of mk-table-checksum:

  8. perl mk-table-checksum -u avail --ask-pass --algorithm BIT_XOR \
    --replicate maatkit.checksum --create-replicate-table \
    --modulo 120 --chunk-size 100 \
    --offset 'modulo_offset FROM maatkit.checksum_modulo' localhost

    And update the modulo table:
    update maatkit.checksum_modulo set modulo_offset = ((modulo_offset+1)%21);

    And check the results on the slave:
    SELECT * FROM maatkit.checksum
    WHERE this_crc!=master_crc AND db not in ('maatkit','mysql');

  9. If the test run came out OK, set up a script to run via cron such as:

  10. # run checksum table 9 times per day, during off-peak times:
    30 0,1,2,3,4,5,21,22,23 * * * /home/pythian/bin/checksumtbls.sh >> /home/pythian/logs/checksum.log 2>&1

    And the checksumtbls.sh script looks like:

    #!/bin/sh

    DB=maatkit
    USER=maatkit_user
    PASS=password
    REPLTBL="$DB.checksum"
    MODULO=120
    CHUNKSIZE=100
    OFFSETTBL="checksum_modulo"
    ALGORITHM=BIT_XOR
    LOG=/home/pythian/logs/checksum.log

    /usr/bin/perl /home/pythian/bin/mk-table-checksum -u $USER -p $PASS \
    --modulo $MODULO \ --algorithm $ALGORITHM --chunk-size $CHUNKSIZE \
    --offset "modulo_offset FROM $DB.$OFFSETTBL" \
    --replicate $REPLTBL --create-replicate-table localhost >> $LOG

    /usr/bin/mysql -u $USER -p$PASS -e "update $DB.$OFFSETTBL set modulo_offset=((modulo_offset+1)%$MODULO)" >> $LOG

    And of course, do not forget to periodically check on the slave to see where the issues are:

    SELECT * FROM maatkit.checksum
    WHERE this_crc!=master_crc AND db not in ('maatkit','mysql');

    I hope this helps; It is extremely important to make sure


    PlanetMySQL Voting: Vote UP / Vote DOWN

Aspersa’s mysql-summary tool

Июль 10th, 2010

For those of you who miss what Maatkit’s mk-audit tool (now retired) gave you, there’s a pair of tools in Aspersa that more than replaces it. I wrote previously about the summary tool. I don’t think I have mentioned the mysql-summary tool. It has been under development for a while, and at this point it has quite a lot of functionality. You can see a sample of the output on its wiki page.

Related posts:

  1. Apsersa’s summary tool supports Adaptec and MegaRAID controllers
  2. Aspersa, a new opensource toolkit
  3. Using Aspersa to capture diagnostic data
  4. MySQL Toolkit’s Show Grants tool 0.9.1 released
  5. Introducing MySQL Toolkit’s Show Grants tool


PlanetMySQL Voting: Vote UP / Vote DOWN

Is Maatkit notable enough for Wikipedia now?

Июль 2nd, 2010

The Maatkit article on Wikipedia was removed some time ago, after being deemed not notable. I believe this is no longer the case. It’s hard to find a credible book published on MySQL in the last few years that doesn’t mention Maatkit, there’s quite a bit of blogging about it from MySQL experts and prominent community members, and the toolkit is certainly in wide use — it’s important enough that notable companies are supporting its development. It’s available through every major Unix-like operating system’s package repository. On Debian, it’s actually part of the mysql-client package, so if you install MySQL, you automatically get Maatkit too. I believe it’s probably the second most important set of MySQL command-line tools; the most important, of course, is the set of client applications that is included with MySQL itself.

But my opinion on this topic is beside the point. I’m the creator, and I’m biased. The Maatkit Wikipedia article should be created by independent people, not the project’s founder. If you think that Maatkit belongs on Wikipedia, I encourage you to help write that article.

Related posts:

  1. Where do you use Maatkit in real life?
  2. Get Maatkit fast from the command line
  3. Maatkit in RHEL and CentOS
  4. New Maatkit release policy
  5. Maatkit version 1417 released


PlanetMySQL Voting: Vote UP / Vote DOWN