Archive for the ‘hash’ Category

MySQL Partitioning and its Confusing Syntax

Июль 18th, 2011

While looking at partitioning I recently made a mistake which I guess can happen to others. Often this is due to not fully reading the documentation or scanning it too quickly and misunderstanding what’s being said.

So this post is to complain about the MySQL partitioning syntax and to warn others who may easily make the same mistake without realising.

First we probably need to ask why we are partitioning a table in the first place. The main reasons for this are I think:

  • to improve query performance
  • to reduce individual .ibd file sizes for large tables (if using innodb_file_per_table)

In my case I wanted to do both. I had a several tables which store a large number of rows (batches of data) based on an incremental batch number. One of these tables was around 40 GB and had about 500,000,000 rows in it.  When processing data in this table often all the data from a particular batch run would be aggregated with the resultant data being stored elsewhere.

So I knew I wanted to partition and performance of these aggregations would be improved as the time to table scan a batch would be reduced to the time to scan the partition rather than the whole table.

The primary key of these tables was already in the form ( batch_id, other_key ) so I really wanted to just partition by the batch_id key, using in my case 64 partitions. batch_id is defined as int unsigned NOT NULL.

This is where I made the mistake. There are various ways to partition tables in MySQL and they are named: RANGE, LIST, HASH, KEY.  Given the batch_id was a gradually increasing value and I didn’t want to modify the partitioning once it was created RANGE and LIST seemed inappropriate. Of HASH and KEY, I incorrectly assumed that HASH would do some complex hash function of my integer batch_id, and since batch_id was part of the key that KEY would be the right way to partition.

So I incorrectly defined the table like this:

ALTER TABLE xxxxx PARTITION BY KEY ( batch_id ) PARTITIONS 64;

When you read the documentation you see that for HASH-type partitioning you provide a functional value which must be numeric and it actually determines the partition to use by doing MOD( your_functional_value, number_of_partitions ).

KEY-type partitioning works diferently and only allows you to provide column names and then it uses its own internal hashing function to generate the partition id.

So using PARTITION BY KEY ( numeric_column_name ) seems to correct but is likely to be more expensive to calculate. For large tables this is likely to cause additional performance issues.  It looks like I’m going to have to rebuild the tables I’ve just partitioned and that’ll be another weekend of work.

A suggestion to those at Oracle is that:

  • PARTITION BY KEY ( numeric_column_name ) should be modified to behave like PARTITION BY HASH ( numeric_column_name ). However, as this is likely to cause on disk incompatibilities during a version change if it were implemented, I’m guessing it just won’t happen, unless there’s some easy way to distinguish the current behaviour and my proposed new behaviour.
  • The documentation is made a little clearer and mentions this obvious case. What I see with a lot of the MySQL documentation is that it documents technically how things are done rather than documenting the problem and then how to implement the solution. Since it’s likely that many people are going to partition on one of the columns especially if a multi-column primary key is used this use case should be made clearer.

If I had time I’d look at how partitioning is implemented in Oracle database, Sybase, Postgres or DB2 and see whether it’s just MySQL which has chosen these unfortunate keywords for defining their partitioning methods.

However, I’m curious: am I the only one to fall in this trap?


PlanetMySQL Voting: Vote UP / Vote DOWN

Data Warehousing Best Practices: Comparing Oracle to MySQL, part 2 (partitioning)

Июль 30th, 2010

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

See part 1 for the introduction and talking about power and hardware. This part will go over the 2nd “P”, partitioning. Learning about Oracle’s partitioning has gotten me more interested in how MySQL’s partitioning works, and I do hope that MySQL partitioning will develop to the level that Oracle partitioning does, because Oracle’s partitioning looks very nice (then again, that’s why it costs so much I guess).

Partition – Larger tables or fact tables can benefit from partitioning because it makes data load easier and can increase join performance and use data elimination. Parallel execution can be done with partitioning due to partition pruning. The degree of parallelism should be a power of 2, because of hash-based algorithm in hash partitioning. To translate this to the MySQL world, if you are using LINEAR HASH partitioning, then you should use a degree of parallelism that is a power of 2 (I checked, and indeed. Otherwise, use a degree of parallelism that makes sense given the number of partitions you have.

One important note that during Pythian’s testing of MySQL partitioning, we found that all partitions were locked when an INSERT occurs, for the duration of the INSERT. Bulk-loading with MySQL partitioning is not as fast as it would be if MySQL allowed partition pruning for INSERTs.

So, what should be partitioned? For the first level of partitioning, the goal is to enable partitioning pruning and simplify data management. The most typical partitioning is range or interval partitioning on a date column. Interval partitioning is you say what the partition is (date, month) and partition is automatically created. MySQL does not have interval partitioning, and I have seen typical first-level partitioning be range or list based on a date or timestamp column. Note that if you use a timestamp field, the partitioning expression is optimized if you use TO_DAYS(timestamp_field) or YEAR(timestamp_field). In my experience, using anything else (such as DATE(timestamp_field)) actually makes partitioning slower than not using partitioning at all. Note that this is based on tests I did a few months ago, and your mileage may vary.

So — how do you decide partitioning strategy? Ask yourself:

  • What range of data do the queries touch – a quarter, a year?
  • What is the data loading frequency?
  • Is an incremental load required?
  • How much data is involved, a day, a week, a month?

The answers to the above questions will tell you about how big your interval needs to be. The best scenario is that all answers are the same, “we load every day, and people query by day.” If the answers are different weight access a higher priority than loading, because most people care more about query performance than performance of ETL.

This is true even if your intervals have different sizes — ie sales per day are much bigger in Dec but that’s OK. However, Maria recommends that the subpartition be as evenly divided as possible.

Easier to look at more partitions than to look at a partition that’s too big. But you don’t want too many partitions, max Oracle allows partitions is 1 million partitions, prior to 11g it was 64,000. “Stick closer to 64,000 than 1 million”. MySQL’s limitation is 1024 per table.

For the second level of partitioning, also called subpartitioning, the goal is to allow for multi-level pruning and improve join performance. In Oracle, the most typical subpartition is hash or list – in MySQL, you can only subpartition by hash or key.

How do you decide subpartitioning strategy?

  • Select the dimension queried most frequently on the fact table OR
  • Pick the common join column

For example, if you want to look at sales per day, per store, you would choose “per day” as the partition and “per store” as the subpartition.

If you do not have a good partition on logical elements (like grouping), then you can subpartition using hash partitioning on common joins — perhaps surrogate keys, or using join key of the largest table involved in the join.

For example, if the sales table is partitioned and another big table is product, you can hash subpartition product_id.

Because there’s overhead in partitions (loading metadata, reading metadata), make sure size of partitions and subpartitions is >20 Mb. So better to have a 30 Mb subpartition than a 15 Mb subpartition. [I have no idea if this is true in MySQL or not -- I think the general concept is true, because there is some overhead, but I have no idea about the 20 Mb figure and why that's true for Oracle, nor do I know what is true in MySQL.]

One easy calculation is double the # of CPUs, round up to nearest power of 2. If you’re executing in parallel, Oracle will use 2x CPUs. (all this advice, by the way, follows 80/20 rule, this is probably good for about 80% of the environments out there). Of course, MySQL does not do parallel execution very well, so this probably does not apply.

Oracle knows it can get partition elimination while it does a join.

If 2 tables have the same degree of parallelism (same # of buckets) and are partitioned in the same way on the join column (say, customer_id in a subpartition of sales and a partition of customer), Oracle will match the partitions when joining:

sales table joined with customer table can change into 4 small joins:
sales sub part 1 joins with customer part 1
sales sub part 2 joins with customer part 2
sales sub part 3 joins with customer part 3
sales sub part 4 joins with customer part 4

And with parallelism, the total time is now reduced to the time it takes to do one of those smaller joins.

This is also why you want to have a power of 2 for buckets – because cores/processors come in powers of 2. Partition-wise joins like this can also be done with range or list, assuming both tables in the join have the same buckets.

I have no idea if MySQL partitioning works this way, but it’s certainly a functionality that makes sense to me.


PlanetMySQL Voting: Vote UP / Vote DOWN