Archive for the ‘partitioning’ Category

Overlooked MySQL 5.6 New Features – WL#5217

Май 24th, 2012

There’s a lot of great new features in MySQL 5.6 DMRs – almost too many to keep track of.  And while a lot of (justified) attention is given to the headline-grabbing features of 5.6 (memcached APIs! global transaction ids! improved PERFORMANCE_SCHEMA!), I’m often curious about the new features that don’t make as big a splash.  I thought I would look at one such new feature – WorkLog #5217.  I’m not telling you what this WorkLog is yet; test your knowledge of 5.6 features by seeing if you can figure it out from the following scenario.  Imagine the following table data:

mysql> SELECT * FROM p;
+------+------+
| a    | b    |
+------+------+
|    1 |    1 |
|   11 |    1 |
|   21 |    1 |
+------+------+
3 rows in set (0.00 sec)

OK, try to write an UPDATE statement that increments b for the row where a = 21.  Easy, right?

UPDATE p SET b = b+1 WHERE a = 21;

OK, now do it without a WHERE clause.  Maybe that made you think for a minute, but perhaps you came up with something like this:

UPDATE p SET b = b+1 ORDER BY a = 11 DESC LIMIT 1;

Nice!  Now do it without ORDER BY or LIMIT.  Can you do it?

There’s a hint in the table structure:

mysql> SHOW CREATE TABLE p\G
*************************** 1. row ***************************
Table: p
Create Table: CREATE TABLE `p` (
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (a)
(PARTITION p0 VALUES LESS THAN (10) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (20) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (30) ENGINE = InnoDB) */
1 row in set (0.00 sec)

You’ve probably figured out that 5.6 adds the ability to explicitly restrict statements to specific partitions, even if you haven’t found or reviewed the excellent documentation in the manual.  In 5.6, the following works:

mysql> UPDATE p PARTITION(p1) SET b = b+1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM p;
+------+------+
| a    | b    |
+------+------+
|    1 |    1 |
|   11 |    2 |
|   21 |    1 |
+------+------+
3 rows in set (0.00 sec)

The documentation on this is very good, and provides some useful examples of how this might prove useful, including showing how this works not only with UPDATE statements, but INSERT, SELECT, DELETE, REPLACE and LOAD DATA/XML statements.  There’s also the WorkLog and associated bugs.mysql.com feature requests.  The documentation says the following about UPDATE statements:

UPDATE statements using explicit partition selection behave in the same way; only rows in the partitions referenced by the PARTITION option are considered when determining the rows to be updated, as can be seen by executing the following statements.

It then provides an example that is similar to the solution for the problem posed above.  I still had one question, though – what would happen if I did an UPDATE on a partition key column that moves a row from one partition to another?  I get an error:

mysql> UPDATE p PARTITION(p1) SET a = 22 WHERE a = 11;
ERROR 1745 (HY000): Found a row not matching the given partition set

But if I include both the source partition and the target partition, all’s well:

mysql> UPDATE p PARTITION(p1,p2) SET a = 22 WHERE a = 11;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

How do I know the data was actually moved from p1 to p2, and that mysqld isn’t just pulling my leg?  Easy:

mysql> SELECT * FROM p PARTITION(p2);
+------+------+
| a    | b    |
+------+------+
|   21 |    1 |
|   22 |    1 |
+------+------+

Awesome.  You might want to download the 5.6.5 Development Milestone Release yourself and try it out (you’ll need to click on the “Development Releases” tab to access it, something which took me longer to figure out than I care to admit).


PlanetMySQL Voting: Vote UP / Vote DOWN

Benchmarking single-row insert performance on Amazon EC2

Май 16th, 2012

I have been working for a customer benchmarking insert performance on Amazon EC2, and I have some interesting results that I wanted to share. I used a nice and effective tool iiBench which has been developed by Tokutek. Though the “1 billion row insert challenge” for which this tool was originally built is long over, but still the tool serves well for benchmark purposes.

OK, let’s start off with the configuration details.

Configuration

First of all let me describe the EC2 instance type that I used.

EC2 Configuration

I chose m2.4xlarge instance as that’s the instance type with highest memory available, and memory is what really really matters.

High-Memory Quadruple Extra Large Instance
68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

As for the IO configuration I chose 8 x 200G EBS volumes in software RAID 10.

Now let’s come to the MySQL configuration.

MySQL Configuration

I used Percona Server 5.5.22-55 for the tests. Following is the configuration that I used:

## InnoDB options
innodb_buffer_pool_size         = 55G
innodb_log_file_size            = 1G
innodb_log_files_in_group       = 4
innodb_buffer_pool_instances    = 4
innodb_adaptive_flushing        = 1
innodb_adaptive_flushing_method = estimate
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_max_dirty_pages_pct      = 50
innodb_io_capacity              = 800
innodb_read_io_threads          = 8
innodb_write_io_threads         = 4
innodb_file_per_table           = 1

## Disabling query cache
query_cache_size                = 0
query_cache_type                = 0

You can see that the buffer pool is sized at 55G and I am using 4 buffer pool instances to reduce the contention caused by buffer pool mutexes. Another important configuration that I am using is that I am using “estimate” flushing method available only on Percona Server. The “estimate” method reduces the impact of traditional InnoDB log flushing, which can cause downward spikes in performance. Other then that, I have also disabled query cache to avoid contention caused by query cache on write heavy workload.

OK, so that was all about the configuration of the EC2 instance and MySQL.

Now as far as the benchmark itself is concerned, I made no code changes to iiBench, and used the version available here. But I changed the table to use range partitioning. I defined a partitioning scheme such that every partition would hold 100 million rows.

Table Structure

The table structure of the table with no secondary indexes is as follows:

CREATE TABLE `purchases_noindex` (
  `transactionid` int(11) NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  PRIMARY KEY (`transactionid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (transactionid)
(PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

While the structure of the table with secondary indexes is as follows:

CREATE TABLE `purchases_index` (
  `transactionid` int(11) NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  PRIMARY KEY (`transactionid`),
  KEY `marketsegment` (`price`,`customerid`),
  KEY `registersegment` (`cashregisterid`,`price`,`customerid`),
  KEY `pdc` (`price`,`dateandtime`,`customerid`)
) ENGINE=InnoDB AUTO_INCREMENT=11073789 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (transactionid)
(PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

Also, I ran 5 instances of iiBench simultaneously to simulate 5 concurrent connections writing to the table, with each instance of iiBench writing 200 million single row inserts, for a total of 1 billion rows. I ran the test both with the table purchases_noindex which has no secondary index and only a primary index, and against the table purchases_index which has 3 secondary indexes. Another thing I would like to share is that, the size of the table without secondary indexes is 56G while the size of the table with secondary indexes is 181G.

Now let’s come down to the interesting part.

Results

With the table purchases_noindex, that has no secondary indexes, I was able to achieve an avg. insert rate of ~25k INSERTs Per Second, while with the table purchases_index, the avg. insert rate reduced to ~9k INSERTs Per Second. Let’s take a look at the graphs have a better view of the whole picture.

Note, in the above graph, we have “millions of rows” on the x-axis and “INSERTs Per Second” on the y-axis.
The reason why I have chosen to show “millions of rows” on the x-axis so that we can see the impact of growth in data-set on the insert rate.

We can see that adding the secondary indexes to the table has decreased the insert rate by 3x, and its not even consistent. While with the table having no secondary indexes, you can see that the insert rate is pretty much constant remaining between ~25k to ~26k INSERTs Per Second. But on the other hand, with the table having secondary indexes, we can see that there are regular spikes in the insert rate, and the variation in the rate can be classified as large, because it varies between ~6.5k to ~12.5k INSERTs per second, with noticeable spikes after every 100 million rows inserted.

I noticed that the insert rate drop was mainly caused by IO pressure caused by increase in flushing and checkpointing activity. This caused spikes in write activity to the point that the insert rate was decreased.

Conclusion

As we all now there are pros and cons to using secondary indexes. While secondary indexes cause read performance to improve, but they have an impact on the write performance. Well most of the apps rely on read performance and hence having secondary indexes is an obvious choice. But for those applications that are write mostly or that rely a lot on write performance, reducing the no. of secondary indexes or even going away with secondary indexes causes a write throughput increase of 2x to 3x. In this particular case, since I was mostly concerned with write performance, so I went ahead to choose a table structure with no secondary indexes. Other important things to consider when you are concerned with write performance is using partitioning to reduce the size of the B+tree, having multiple buffer pool instances to reduce contention problems caused by buffer pool mutexes, using “estimate” checkpoint method to reduce chances of log flush storms and disabling the query cache.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Partitioning and its Confusing Syntax

Июль 18th, 2011

While looking at partitioning I recently made a mistake which I guess can happen to others. Often this is due to not fully reading the documentation or scanning it too quickly and misunderstanding what’s being said.

So this post is to complain about the MySQL partitioning syntax and to warn others who may easily make the same mistake without realising.

First we probably need to ask why we are partitioning a table in the first place. The main reasons for this are I think:

  • to improve query performance
  • to reduce individual .ibd file sizes for large tables (if using innodb_file_per_table)

In my case I wanted to do both. I had a several tables which store a large number of rows (batches of data) based on an incremental batch number. One of these tables was around 40 GB and had about 500,000,000 rows in it.  When processing data in this table often all the data from a particular batch run would be aggregated with the resultant data being stored elsewhere.

So I knew I wanted to partition and performance of these aggregations would be improved as the time to table scan a batch would be reduced to the time to scan the partition rather than the whole table.

The primary key of these tables was already in the form ( batch_id, other_key ) so I really wanted to just partition by the batch_id key, using in my case 64 partitions. batch_id is defined as int unsigned NOT NULL.

This is where I made the mistake. There are various ways to partition tables in MySQL and they are named: RANGE, LIST, HASH, KEY.  Given the batch_id was a gradually increasing value and I didn’t want to modify the partitioning once it was created RANGE and LIST seemed inappropriate. Of HASH and KEY, I incorrectly assumed that HASH would do some complex hash function of my integer batch_id, and since batch_id was part of the key that KEY would be the right way to partition.

So I incorrectly defined the table like this:

ALTER TABLE xxxxx PARTITION BY KEY ( batch_id ) PARTITIONS 64;

When you read the documentation you see that for HASH-type partitioning you provide a functional value which must be numeric and it actually determines the partition to use by doing MOD( your_functional_value, number_of_partitions ).

KEY-type partitioning works diferently and only allows you to provide column names and then it uses its own internal hashing function to generate the partition id.

So using PARTITION BY KEY ( numeric_column_name ) seems to correct but is likely to be more expensive to calculate. For large tables this is likely to cause additional performance issues.  It looks like I’m going to have to rebuild the tables I’ve just partitioned and that’ll be another weekend of work.

A suggestion to those at Oracle is that:

  • PARTITION BY KEY ( numeric_column_name ) should be modified to behave like PARTITION BY HASH ( numeric_column_name ). However, as this is likely to cause on disk incompatibilities during a version change if it were implemented, I’m guessing it just won’t happen, unless there’s some easy way to distinguish the current behaviour and my proposed new behaviour.
  • The documentation is made a little clearer and mentions this obvious case. What I see with a lot of the MySQL documentation is that it documents technically how things are done rather than documenting the problem and then how to implement the solution. Since it’s likely that many people are going to partition on one of the columns especially if a multi-column primary key is used this use case should be made clearer.

If I had time I’d look at how partitioning is implemented in Oracle database, Sybase, Postgres or DB2 and see whether it’s just MySQL which has chosen these unfortunate keywords for defining their partitioning methods.

However, I’m curious: am I the only one to fall in this trap?


PlanetMySQL Voting: Vote UP / Vote DOWN

Data Warehousing Best Practices: Comparing Oracle to MySQL, part 2 (partitioning)

Июль 30th, 2010

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

See part 1 for the introduction and talking about power and hardware. This part will go over the 2nd “P”, partitioning. Learning about Oracle’s partitioning has gotten me more interested in how MySQL’s partitioning works, and I do hope that MySQL partitioning will develop to the level that Oracle partitioning does, because Oracle’s partitioning looks very nice (then again, that’s why it costs so much I guess).

Partition – Larger tables or fact tables can benefit from partitioning because it makes data load easier and can increase join performance and use data elimination. Parallel execution can be done with partitioning due to partition pruning. The degree of parallelism should be a power of 2, because of hash-based algorithm in hash partitioning. To translate this to the MySQL world, if you are using LINEAR HASH partitioning, then you should use a degree of parallelism that is a power of 2 (I checked, and indeed. Otherwise, use a degree of parallelism that makes sense given the number of partitions you have.

One important note that during Pythian’s testing of MySQL partitioning, we found that all partitions were locked when an INSERT occurs, for the duration of the INSERT. Bulk-loading with MySQL partitioning is not as fast as it would be if MySQL allowed partition pruning for INSERTs.

So, what should be partitioned? For the first level of partitioning, the goal is to enable partitioning pruning and simplify data management. The most typical partitioning is range or interval partitioning on a date column. Interval partitioning is you say what the partition is (date, month) and partition is automatically created. MySQL does not have interval partitioning, and I have seen typical first-level partitioning be range or list based on a date or timestamp column. Note that if you use a timestamp field, the partitioning expression is optimized if you use TO_DAYS(timestamp_field) or YEAR(timestamp_field). In my experience, using anything else (such as DATE(timestamp_field)) actually makes partitioning slower than not using partitioning at all. Note that this is based on tests I did a few months ago, and your mileage may vary.

So — how do you decide partitioning strategy? Ask yourself:

  • What range of data do the queries touch – a quarter, a year?
  • What is the data loading frequency?
  • Is an incremental load required?
  • How much data is involved, a day, a week, a month?

The answers to the above questions will tell you about how big your interval needs to be. The best scenario is that all answers are the same, “we load every day, and people query by day.” If the answers are different weight access a higher priority than loading, because most people care more about query performance than performance of ETL.

This is true even if your intervals have different sizes — ie sales per day are much bigger in Dec but that’s OK. However, Maria recommends that the subpartition be as evenly divided as possible.

Easier to look at more partitions than to look at a partition that’s too big. But you don’t want too many partitions, max Oracle allows partitions is 1 million partitions, prior to 11g it was 64,000. “Stick closer to 64,000 than 1 million”. MySQL’s limitation is 1024 per table.

For the second level of partitioning, also called subpartitioning, the goal is to allow for multi-level pruning and improve join performance. In Oracle, the most typical subpartition is hash or list – in MySQL, you can only subpartition by hash or key.

How do you decide subpartitioning strategy?

  • Select the dimension queried most frequently on the fact table OR
  • Pick the common join column

For example, if you want to look at sales per day, per store, you would choose “per day” as the partition and “per store” as the subpartition.

If you do not have a good partition on logical elements (like grouping), then you can subpartition using hash partitioning on common joins — perhaps surrogate keys, or using join key of the largest table involved in the join.

For example, if the sales table is partitioned and another big table is product, you can hash subpartition product_id.

Because there’s overhead in partitions (loading metadata, reading metadata), make sure size of partitions and subpartitions is >20 Mb. So better to have a 30 Mb subpartition than a 15 Mb subpartition. [I have no idea if this is true in MySQL or not -- I think the general concept is true, because there is some overhead, but I have no idea about the 20 Mb figure and why that's true for Oracle, nor do I know what is true in MySQL.]

One easy calculation is double the # of CPUs, round up to nearest power of 2. If you’re executing in parallel, Oracle will use 2x CPUs. (all this advice, by the way, follows 80/20 rule, this is probably good for about 80% of the environments out there). Of course, MySQL does not do parallel execution very well, so this probably does not apply.

Oracle knows it can get partition elimination while it does a join.

If 2 tables have the same degree of parallelism (same # of buckets) and are partitioned in the same way on the join column (say, customer_id in a subpartition of sales and a partition of customer), Oracle will match the partitions when joining:

sales table joined with customer table can change into 4 small joins:
sales sub part 1 joins with customer part 1
sales sub part 2 joins with customer part 2
sales sub part 3 joins with customer part 3
sales sub part 4 joins with customer part 4

And with parallelism, the total time is now reduced to the time it takes to do one of those smaller joins.

This is also why you want to have a power of 2 for buckets – because cores/processors come in powers of 2. Partition-wise joins like this can also be done with range or list, assuming both tables in the join have the same buckets.

I have no idea if MySQL partitioning works this way, but it’s certainly a functionality that makes sense to me.


PlanetMySQL Voting: Vote UP / Vote DOWN

Using MySQL Partitioning Instead of MERGE Tables

Июль 24th, 2010

One common question I get is how to use partitioning instead of MERGE tables. The process I use involves using stored procedures to create and drop partitions. This article will go over the stored procedures I use; special thanks to Roland Bouman for taking a look and giving great feedback to optimize this process.

First, a simple table, not partitioned (yet):

use test;
DROP TABLE IF EXISTS my_part;
CREATE TABLE IF NOT EXISTS my_part (
  id int NOT NULL,
  creationDate datetime NOT NULL,
  PRIMARY KEY (id,creationDate)
) ENGINE=InnoDB;

In real, life there is more to the table than just id and creationDate. The most important part is that the partitioned field(s) need to be part of the primary key.

Now, add the partition definition. This can be done in the CREATE statement, but I have found that it is easier for me to think about what fields and indexes I want first, and then worry about partitioning, so I naturally gravitate towards defining the fields and indexes first and then altering the table to add partitioning. This also helps me think about how to modify an existing table to add partitioning.

ALTER TABLE my_part PARTITION BY RANGE (TO_DAYS(creationDate)) (
partition 2010_07_01 values less than (to_days('2010-07-02')),
partition 2010_07_02 values less than (to_days('2010-07-03')),
partition 2010_07_03 values less than (to_days('2010-07-04')),
partition 2010_07_04 values less than (to_days('2010-07-05'))
);

This makes it pretty clear what is happening — the idea is to give the partition names actual dates that they hold, so that it is easy to see what partitions need to be added and deleted.

Deleting partitions

I find that making stored procedures makes things easy….so I will define a procedure called partition_drop to drop partitions. The partition_drop stored procedure takes in a table name and schema name to drop partitions from, and a date to delete up through and including that date. Here’s the procedure:

DELIMITER ||
DROP PROCEDURE IF EXISTS partition_drop ||

CREATE PROCEDURE partition_drop (IN through_date date, IN tbl varchar(64), IN db varchar(64))
BEGIN
DECLARE delete_me varchar(64);
DECLARE notfound BOOL DEFAULT FALSE;
DECLARE pname CURSOR FOR SELECT PARTITION_NAME
FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME=tbl AND TABLE_SCHEMA=db
AND DATE(PARTITION_NAME)!= 0
AND DATE(PARTITION_NAME) IS NOT NULL
AND DATE(PARTITION_NAME)<=through_date;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET notfound:=TRUE;
OPEN pname;

cursor_loop: LOOP
FETCH pname INTO delete_me;
IF notfound THEN LEAVE cursor_loop; END IF;
SET @alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl," DROP PARTITION ",delete_me);

# sanity check commented out for production use
# SELECT @alter_stmt;
PREPARE stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter;

END LOOP;
CLOSE pname;
END ||
DELIMITER ;

Go ahead and run CALL partition_drop('2010-07-02','my_part','test'); to verify that SHOW CREATE TABLE my_part; shows that the desired partitions have been dropped.

Adding partitions

Adding partitions is similar to deleting partitions — using a stored procedure that takes in the date to add partitions up to that date. It will not try to add so many partitions that the table will have more than 1024, and it won’t add any partitions that already exist.

DELIMITER ||
DROP PROCEDURE IF EXISTS partition_add ||

CREATE PROCEDURE partition_add (IN through_date date, IN tbl varchar(64), IN db varchar(64))
BEGIN
DECLARE add_me char(10);
DECLARE max_new_parts,add_cnt smallint unsigned default 0;
SELECT 1024-COUNT(*) AS max_new_parts,
SUM(CASE WHEN
 DATE(PARTITION_NAME)>=through_date then 1 else 0
 END)
INTO max_new_parts, add_cnt
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE  TABLE_SCHEMA = db
 AND TABLE_NAME = tbl;

IF add_cnt=0 THEN
BEGIN
SELECT MAX(DATE(PARTITION_NAME)) INTO add_me
FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME=tbl and TABLE_SCHEMA=db
AND DATE(PARTITION_NAME)<through_date;

# to do: declare handler for exceptions here
IF DATEDIFF(through_date,add_me)+1 < max_new_parts THEN
BEGIN
WHILE add_me<through_date do BEGIN
SET add_me:=DATE_FORMAT(add_me + INTERVAL 1 DAY,"%Y_%m_%d");
SET @alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl," ADD PARTITION (PARTITION ",add_me," VALUES LESS THAN (TO_DAYS('",add_me+INTERVAL 1 DAY, "')))" );

PREPARE stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter;

END;
END WHILE;
END;
END IF;
END;
END IF;
END ||
DELIMITER ;

Here’s how to call that stored procedure:
CALL partition_add ('2010_07_10', 'my_part','test');

Caveat: This stored procedure will only add in partitions from the maximum partition name/date until the date you add it in, it will not fill in any gaps. However, it is possible to add in another input parameter to be the “start date” and change the stored procedure to add the partition if it does not exist, from the start date through the end date.

Note: It was pointed out by Roland Bouman that it would be better to change both the add and drop stored procedures to do one ALTER TABLE instead of sequential ones. He wrote:

It’s probably better to generate a single statement to drop / add all partitions.

So the general pattern would be:

- generate and concatenate lines for each partition
- one single sequence of prepare, execute, deallocate to execute one DDL statement.

For the drop partition procedure, this approach would allow you to do away with the cursor. You can simply use GROUP_CONCAT in a single SELECT…INTO statement to generate the entire DDL statement.

Note: Roland also mentioned that these stored procedures could be generalized to use any interval, for example if each table partition held 7 days. The only limitation is that to get the optimal partition performance with dates, use either TO_DAYS(date_field) or YEAR(date_field) as the partitionining function.

Putting it all together

The procedure: daily I would run the following in MySQL, to keep only the previous “x” days:

CALL partition_drop('CURRENT_DATE()-INTERVAL x DAY', 'tbl_name','schema_name');

To add new partitions, I would do:
CALL partition_add('CURRENT_DATE()-INTERVAL 32 DAY', 'tbl_name','schema_name');

It is run daily just in case; the side benefit is that it will catch any gaps if you do not modify the stored procedure to include a start date for when to add. Adding a partition does not take a long time, because there’s no data to reorganize.

I would make a daily MySQL event, and then everything is stored nicely in the database, and backed up with the database. Others may choose to run the stored procedures in a cron script or scheduled task (Windows), but that requires a password to be stored somewhere — either in the script, somewhere the script can read, or in a config file such as .my.cnf.


PlanetMySQL Voting: Vote UP / Vote DOWN

Welcome googleCL

Июнь 19th, 2010
I am writing this blog post with Vim, my favorite editor, instead of using the online editor offered by blogger. And I am uploading this post to my Blogger account using Google CL a tool that lets you use Google services from the command line.
I am a command line geek, and as soon as I saw the announcement, I installed it in my laptop. The mere fact that you are reading this blog post shows that it works.

GoogleCL is an apparently simple application. If you install it on Mac using macports you realize how many dependencies it has and how much complexity it gives under the hood.
Using an easy to understand syntax, it allows you to access your blog, pictures, calendar, contacts, videos, and online documents at your fingertips.
For example, let's query my blog for partitioning:

$ google blogger --blog="The Data Charmer" --title=partitioning list "title,url"

Hmm. No results. The manual doesn't help much, but something happened during this query. The first thing ist that I was asked to authorize the script to access my blog, and that was done by activating a key that I got in the command line. So far, so good. The second thing was a message informing me that a default configuration file was created in my home directory. Looking at that file, I saw an option saying "regex = True". Aha! So the title supports regular expressions. Let's try:

$ google blogger --blog="The Data Charmer" --title=".*partitioning" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1 Improving ARCHIVE performance with partitioning

OK. This gives me everything with the word "partitioning" in the title. But I know that some titles are missing. Comparing with the results that I get online, I see that the titles where "partitioning" is capitalized are not reported. So the search is case sensitive. What I need to do is to tell the regular expression that I want a case insensitive search. Fortunately, I know how to speak regular expressions. Let's try again.

$ google blogger --blog="The Data Charmer" --title="(?i).*partitioning.*" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
Partitioning with non integer values using triggers
Tutorial on Partitioning at the MySQL Users Conference 2009
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1 Improving ARCHIVE performance with partitioning

Now I feel confident enough to do some changes to my online contents.
To create this blog post, I used some of googlecl capabilities. After I created an image, I uploaded it to my Picasa album using this command:

$google picasa post -n "Blogger Pictures" -t googlecl ~/Desktop/google_cl.png

Then I asked Picasa to give me the URL of the image:

$ google picasa list -n "Blogger Pictures" --query googlecl title,url_direct
google_cl.png,http://lh6.ggpht.com/_gVfZHGgf5LA/TBzjaKiJJvI/AAAAAAAAA74/dthDDhybsmc/google_cl.jpg

And then I inserted that URL in this blog post. Finally, I uploaded the blog post with this command:

google blogger --blog="The Data Charmer" --draft --title "Welcome googleCL" --tags="google,mysql,partitioning,command line,blogging" post ~/blog/welcome_googlecl.html


(Now writing online) And after I checked that the post was looking as I wanted it, I hit the "PUBLICH POST" button.
Welcome, GoogleCL!

PlanetMySQL Voting: Vote UP / Vote DOWN

I wrote a new tool that runs aggregation queries over MySQL sharded databases using Gearman.

Май 23rd, 2010
I created a new tool this week:
http://code.google.com/p/shard-query

As the name Shard-Query suggests, the goal of the tool is to run a query over multiple shards, and to return the combined results together as a unified query. It uses Gearman to ask each server for a set of rows and then runs the query over the combined set. This isn't a new idea, however, Shard-Query is different than other Gearman examples I've seen, because it supports aggregation.

It does this by doing some basic query rewriting based on the input query.

Take this query for example:
select c2, 
       sum(s0.c1), 
       max(c1) 
 from t1 as s0 
 join t1 using (c1,c2) 
 where c2 = 98818 
 group by c2;


The tool will split this up into two queries.

This first query will be sent to each shard. Notice that any aggregations, grouping or sorting has been stripped away. Any joins or where clauses are still present:
select   c2 as `c2` ,   
         (s0.c1) as `(s0.c1)` ,   
         (c1) as `(c1)`   
  from   t1 s0   
  join   t1   
 using   (c1,c2)   
 where   c2   =   98818;


Shard-Query supports two different methods of coalescing the results from each shard. The first is called 'fetch' which as it sounds, fetches the results through an array, and inserts them into a temporary table. This is less desirable than the default method, which is called store. When the store method is used, each worker uses bulk insert to insert rows directly into a destination table. The downside is that the table has to be a physical table, since it must be inserted into from multiple connections. The store method is also better than fetch because the rows are inserted into the table in parallel from each shard. The insertion is serialized with the fetch method, since the results are handled by a single threaded PHP script.

Either a temporary table, or a concrete table will be created to hold the rows from all the shards depending on the fetch method.
CREATE TEMPORARY TABLE -- not a temporary table when the method is store
         IF NOT EXISTS `aggregation_tmp_#14861474` 
(        `c2` BINARY(255), 
         `(s0.c1)` BINARY(255), 
         `(c1)` BINARY(255)
) ENGINE=INNODB;


Finally, after all the rows are inserted into that table, the final rewritten query is executed. Notice that the SELECT clause has been rewritten to accommodate the column names of the temporary table. Any FROM clause has been removed and replaced with a scan of the single table. Also notice the WHERE clause is not present, because the rows were filtered at each shard:
select (`c2`), 
       sum(`(s0.c1)`), 
       max(`(c1)`)  
  from `aggregation_tmp_#14861474`                 
 group by `c2`;


Finally, the result is output and then the table is dropped.
c2      sum(`(s0.c1)`)  max(`(c1)`)
-       -               -
98818   16              8


I hope you find it useful. Right now it works with MySQL using the plain old mysql_ class of functions, but I can abstract the database access out if somebody really wants me to. This could potentially be used for just about any database which supports SQL.
PlanetMySQL Voting: Vote UP / Vote DOWN

Two quick performance tips with MySQL 5.1 partitions

Май 6th, 2010
partitionsWhile I was researching for my partitions tutorial, I came across two hidden problems, which may happen often, but are somehow difficult to detect and even more difficult to fix, unless you know what's going on, and why. I presented both cases during my tutorial, but there were no pictures to convey the mechanics of the problem. Here is the full story.

TO_DAYS() prunes two partitions instead of one


If you are partitioning by date, chances are that you are using TO_DAYS(). And depending on how you have partitioned your table, your queries are as fast as you expect them to be. However, there are cases where your query takes twice as long as it should, and of course this will not make you happy.

For example, in a table partitioned by month, when your query searches for values within one specific month, EXPLAIN PARTITIONS tells you that the search involves two partitions (see figure above). This means that, instead of searching through 1,000,000 rows in one partitions, the partitions engine is searching through 2,000,000 rows in two partitions.
But why? The reasoning, as reported from the developers, is that
This is not a bug, since TO_DAYS() returns NULL for invalid dates, it needs to scan the first partition as well (since that holds all NULL values) for ranges.

Bug#49754: Partitioning by RANGE with TO_DAYS always includes first partition when pruning
This makes sense, from a developer's standpoint. From a user's experience, though, it's a bug.
Anyway, it doesn't do us any good to rant about it. Our query is still twice as slow as we want it. We need to take action. The workaround is to create an empty partition in first position. If we are creating a new table, it's simple. Just say
PARTITION p000 VALUES LESS THAN (0)
and all will be well. The partition pruning mechanism will still find two partitions, but since the first one is empty, it won't impact the performance.
If you have an existing table already partitioned, like in our example, then you need to perform a different operation

Now we have a different first partition, with no records. When we issue the same query, the partition pruning will look at partition p0, but it will skip it because there are no records.

Inserting single rows in partitions is slow


Also this bug is sometimes difficult to detect. If you want to test partitions in MySQL 5.1, probably you will take an existing table and convert it to a partitioned one, or you create a new table and load the contents from a dump. Either way, you are unlikely to insert millions of records with single INSERT statements. These single inserts are slower than bulk inserts in the first place, but with partitions there is an additional penalty. Whenever you insert a record, the partitioning engine locks the entire table. When you insert thousands of records, the partitioning engine will lock all partitions before the insert, and unlock them after the insert. If you have 500 partitions, that's 500 locks and 500 unlocks for every statement. Ouch!
It's a design problem, and it is not likely to be fixed without turning around the whole architecture of partitions. Also in this case, there is a bug report, Partitioning performance drops drastically with hundreds of partitions, although nobody says that this is a feature.
What can you do, then? You have several choices:
  • You can use a bulk insert. Instead of single statements, use INSERT with multiple records, or LOAD DATA INFILE.
  • Explicitly LOCK the table before inserting and UNLOCK it after you finish with all the inserts. This will avoid the overhead, although it won't make your table concurrently accessible until you finish.
  • If you use partitioning only to facilitate heavy queries, consider using a non-partitioned table on the master, and partitioned ARCHIVE tables on the slaves (see figure below).


As I have said many times in my presentations, always benchmark before using partitions in production. Whether you think that it will boost your performance or that it will slow things down, don't trust your instincts, and test. You may be up for a surprise.

PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL: Partition-wise backups with mysqldump

Март 30th, 2010
To whom it may concern,

in response to a query from André Simões (also known as ITXpander), I slapped together a MySQL script that outputs mysqldump commands for backing up individual partitions of the tables in the current schema. The script is maintained as a snippet at MySQL Forge.

How it works

The script works by querying the information_schema.PARTITIONS system view to generate an appropriate expression for mysqldump's --where option. The generated command also redirects the output to a file with this name pattern:
<schema>.<table>.<partition-name>.sql
For example, for this table (taken from the MySQL reference manual):
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
the script generates the following commands:
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) < 1960" test members > test.members.p0.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1960 and YEAR(joined) < 1970" test members > test.members.p1.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1970 and YEAR(joined) < 1980" test members > test.members.p2.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1980 and YEAR(joined) < 1990" test members > test.members.p3.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1990 and YEAR(joined) < 18446744073709551615" test members > test.members.p4.sql
Tip: in order to obtain directly executable output from the mysql command line tool, run the script with the --skip-column-names (or -N) option.

Features

Currently, the script supports the following partitioning methods:

Limitations

The LINEAR HASH method is currently not supported, but I may implement that in the future.

Currently I do not have plans to implement the KEY and LINEAR KEY partitioning methods, but I may reconsider if and when I have more information about the storage-engine specific partitioning methods used by these methods.

Finally, I should point out that querying the information_schema.PARTITIONS table is dog-slow. This may not be too big of an issue, however it is pretty annoying. If anybody has some tips to increase performance, please let me know.

Acknowledgements

Thanks to André for posing the problem. I had a fun hour of procrastination to implement this, and it made me read part of the MySQL reference manual on partitioning.

I also would like to thank Giuseppe Maxia (the Datacharmer) for providing valuable feedback. If you're interested in either partitioning or the mysql command line, you should visit his tutorials at the MySQL conference, april 12-15, 2010.

PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL user group meeting in Dubai, January 13th

Январь 11th, 2010

Dubai MySQL meetup

On January 13th I will be in Dubai, UAE, on my way to Wellington, New Zealand.
I will be at the MySQL User Group, hosted at Sun Microsystems offices at 6pm.
I will talk about boosting performance with MySQL 5.1 partitions, covering the recent 5.5. additions.
The meeting is open to all. If you want to attend, please register at the meetup page.

PlanetMySQL Voting: Vote UP / Vote DOWN