Archive for the ‘partitions’ Category

Blowing up in memory

Сентябрь 25th, 2011

MySQL isn’t too concerned about table handler memory usage – it will allocate row size buffer thrice per each table invocation. There’s a few year old bug discussing UNION memory usage – for each mention in an union one can allocate nearly 200k of unaccounted memory – so a megabyte sized query can consume 7GB of RAM already.

Partitioning though adds even more pain here – it will allocate those three buffers per each partition, so opening a table with 1000 partitions looks like this on memory profile:

Click to enlarge, and you will see 191MB sent to execute a simple single-row fetching query from a table (I filed a bug on this).

There’re multiple real life situations when this is painful (e.g. any kind of server stall may lead to multiple concurrent threads reading from same table, consuming additional gigabytes or tens of gigabytes of memory). It gets even more painful when combined with UNION bug – a megabyte query on an empty table can now consume 7TB of memory and I doubt anyone has that much on their MySQL servers :-)

P.S. Also, check out how much memory can be wasted for malloc overhead, once discussed here.
P.P.S. And here you can see why innodb_max_dirty_pages_pct=0 doesn’t do what you’d expect.



PlanetMySQL Voting: Vote UP / Vote DOWN

Exchanging partitions with tables

Апрель 29th, 2010
MySQL PartitionsWhile I was presenting my partitioning tutorial at the latest MySQL Conference, I announced a new feature that was, as far as I knew, still in the planning stage. Mattias Jonsson, one of the partitions developers, was in attendance, and corrected me, explaining that the feature was actually available in a prototype.

So, we can have a look at this improvement, which I am sure will make DBAs quite happy. The new feature is an instantaneous exchange between a partition and a table with the same structure. Using this feature, you can transfer the contents of one partition to one table, and vice versa. Since the transition is done only in the attribution of the data, there is no copy involved. The data stays where it is at the moment. What is in the table ends up in the partition and what's in the partition ends up in the table. Let's see an example.

With the data in figure 1, where we have a partitioned table t1 and an empty table t2 with the same structure, we can issue the following statement:
ALTER TABLE t1
EXCHANGE PARTITION p2
WITH TABLE t2


After the exchange, partition p2 is empty, and table t2 contains 4 records.
If we repeat the command, the contents will be swapped again, leaving table t2 empty and partition p2 with its original contents.

If you want to test on your own, you can get the code from Launchpad. Once you get the code, you can use cmake to build the server.

$ cmake-gui .
# add the options you need. For example, enable innodb
# or else you will need to load it as a plugin.
$ make && ./scripts/make_binary_distribution

You can then use this script to test the new functionality. You may want to change Innodb with MyISAM to test it thoroughly. At the moment, it doesn't work with the archive engine (yet).

# ############################
# test_exchange_partitions.sql
# ############################
use test;
set default_storage_engine=innodb;
drop procedure if exists compare_tables;
delimiter //
create procedure compare_tables (wanted int)
reads sql data
begin
set @part_table := (select count(*) from t1);
set @non_part_table := (select count(*) from t2);
select @part_table, @non_part_table,
if(@non_part_table = wanted, "OK", "error") as expected;
end //
delimiter ;

drop table if exists t1, t2;
create table t1 (i int) # not null primary key)
partition by range (i)
(
partition p01 values less than (100001),
partition p02 values less than (200001),
partition p03 values less than (300001),
partition p04 values less than (400001),
partition p05 values less than (500001),
partition p06 values less than (600001),
partition p07 values less than (700001),
partition p08 values less than (800001),
partition p09 values less than (900001),
partition p10 values less than (1000001),
partition p11 values less than (maxvalue));

create table t2 (i int ) ; # not null primary key);

select table_name, engine
from information_schema.tables
where table_schema='test' and table_type='base table';


select 'generating 1 million records. ...' as info;
# generates 1 million records
# see this article for details
# http://datacharmer.blogspot.com/2007/12/data-from-nothing-solution-to-pop-quiz.html
create or replace view v3 as select null union all select null union all select null;
create or replace view v10 as select null from v3 a, v3 b union all select null;
create or replace view v1000 as select null from v10 a, v10 b, v10 c;
set @n = 0;
insert into t1 select @n:=@n+1 from v1000 a,v1000 b;

select partition_name, table_rows from information_schema . partitions where table_name='t1' and table_schema='test';

call compare_tables(0);

alter table t1 exchange partition p04 with table t2;
call compare_tables(100000);

select partition_name, table_rows from information_schema . partitions where table_name='t1' and table_schema='test';

alter table t1 exchange partition p04 with table t2;
call compare_tables(0);

alter table t1 exchange partition p04 with table t2;
call compare_tables(100000);

Here is a test run:

$ ~/sandboxes/msb_5_6_99/use -t test -vvv < test_exch_part.sql
--------------
set default_storage_engine=innodb
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
drop procedure if exists compare_tables
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create procedure compare_tables (wanted int)
reads sql data
begin
set @part_table := (select count(*) from t1);
set @non_part_table := (select count(*) from t2);
select @part_table, @non_part_table,
if(@non_part_table = wanted, "OK", "error") as expected;
end
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
drop table if exists t1, t2
--------------

Query OK, 0 rows affected (0.07 sec)

--------------
create table t1 (i int)
partition by range (i)
(
partition p01 values less than (100001),
partition p02 values less than (200001),
partition p03 values less than (300001),
partition p04 values less than (400001),
partition p05 values less than (500001),
partition p06 values less than (600001),
partition p07 values less than (700001),
partition p08 values less than (800001),
partition p09 values less than (900001),
partition p10 values less than (1000001),
partition p11 values less than (maxvalue))
--------------

Query OK, 0 rows affected (0.08 sec)

--------------
create table t2 (i int )
--------------

Query OK, 0 rows affected (0.14 sec)

--------------
select table_name, engine
from information_schema.tables
where table_schema='test' and table_type='base table'
--------------

+------------+--------+
| table_name | engine |
+------------+--------+
| t1 | InnoDB |
| t2 | InnoDB |
+------------+--------+
2 rows in set (0.01 sec)

--------------
select 'generating 1 million records. ...' as info
--------------

+-----------------------------------+
| info |
+-----------------------------------+
| generating 1 million records. ... |
+-----------------------------------+
1 row in set (0.00 sec)

--------------
create or replace view v3 as select null union all select null union all select null
--------------

Query OK, 0 rows affected (0.12 sec)

--------------
create or replace view v10 as select null from v3 a, v3 b union all select null
--------------

Query OK, 0 rows affected (0.14 sec)

--------------
create or replace view v1000 as select null from v10 a, v10 b, v10 c
--------------

Query OK, 0 rows affected (0.09 sec)

--------------
set @n = 0
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t1 select @n:=@n+1 from v1000 a,v1000 b
--------------

Query OK, 1000000 rows affected (10.01 sec)
Records: 1000000 Duplicates: 0 Warnings: 0

--------------
select partition_name, table_rows from information_schema . partitions where table_name='t1' and table_schema='test'
--------------

+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p01 | 100623 |
| p02 | 100623 |
| p03 | 100623 |
| p04 | 100623 |
| p05 | 100623 |
| p06 | 100623 |
| p07 | 100623 |
| p08 | 100623 |
| p09 | 100623 |
| p10 | 100623 |
| p11 | 0 |
+----------------+------------+
11 rows in set (0.01 sec)

--------------
call compare_tables(0)
--------------

+-------------+-----------------+----------+
| @part_table | @non_part_table | expected |
+-------------+-----------------+----------+
| 1000000 | 0 | OK |
+-------------+-----------------+----------+
1 row in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)

--------------
alter table t1 exchange partition p04 with table t2
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
call compare_tables(100000)
--------------

+-------------+-----------------+----------+
| @part_table | @non_part_table | expected |
+-------------+-----------------+----------+
| 900000 | 100000 | OK |
+-------------+-----------------+----------+
1 row in set (0.54 sec)

Query OK, 0 rows affected (0.54 sec)

--------------
select partition_name, table_rows from information_schema . partitions where table_name='t1' and table_schema='test'
--------------

+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p01 | 100623 |
| p02 | 100623 |
| p03 | 100623 |
| p04 | 0 |
| p05 | 100623 |
| p06 | 100623 |
| p07 | 100623 |
| p08 | 100623 |
| p09 | 100623 |
| p10 | 91799 |
| p11 | 0 |
+----------------+------------+
11 rows in set (0.01 sec)

--------------
alter table t1 exchange partition p04 with table t2
--------------

Query OK, 0 rows affected (0.05 sec)

--------------
call compare_tables(0)
--------------

+-------------+-----------------+----------+
| @part_table | @non_part_table | expected |
+-------------+-----------------+----------+
| 1000000 | 0 | OK |
+-------------+-----------------+----------+
1 row in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)

--------------
alter table t1 exchange partition p04 with table t2
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
call compare_tables(100000)
--------------

+-------------+-----------------+----------+
| @part_table | @non_part_table | expected |
+-------------+-----------------+----------+
| 900000 | 100000 | OK |
+-------------+-----------------+----------+
1 row in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)

Bye

Notice that the value for "table_rows" is only approximate with InnoDB, while it is reliable for MyISAM. Anyway, when it says that a partition has 0 records, it's reliable for any engine. Here you see that, after the exchange, partition p04 is empty.
The exchange is repeated twice, to make sure that it works both ways.
Notice also that, if the table contains data that doesn't fit with the partition, the server throws an error, and the exchange does not happen.

mysql > insert into t2 values (2000000);
Query OK, 1 row affected (0.00 sec)

mysql > alter table t1 exchange partition p04 with table t2;
ERROR 1697 (HY000): Found row that does not match the partition

If you remove the offending row from the table, the exchange works as expected.

PlanetMySQL Voting: Vote UP / Vote DOWN

Holiday gift — A deep look at MySQL 5.5 partitioning enhancements

Декабрь 24th, 2009

A deep look at MySQL 5.5 partitioning enhancements

Half a day into my vacation, I managed to finish an article on a topic that has been intriguing me for a while.
Since several colleagues were baffled by the semantics of the new enhancements of MySQL 5.5 partitions, after talking at length with the creator and the author of the manual pages, I produced this article: A deep look at MySQL 5.5 partitioning enhancements.
Happy holidays!

PlanetMySQL Voting: Vote UP / Vote DOWN

Spider and vertical partition engines with new goodies

Октябрь 15th, 2009

sharding for the masses

The Spider storage engine should be already known to the community. Its version 2.5 has recently been released, with new features, the most important of which is that you can execute remote SQL statements in the backend servers. The method is quite simple. Together with Spider, you also get an UDF that executes SQL code in a remote server. You send a query with parameters saying how to connect to the server, and check the result (1 for success, 0 for failure). If the SQL involves a SELECT, the result can be sent to a temporary table. Simple and effective.

In addition to the Spider engine, Kentoku SHIBA has also created the vertical partitioning engine. Instead of splitting tables by record, you split them by columns. You can define a table with column A and column B, with primary key K, and another table with column C and column D, with primary key K. The vertical partition engine allows you to define a table with columns K, A, B, C, D, which looks to the user like a regular column. The backend tables can be of any engine.
There is a MySQL University session about the Spider and VP engines today at 15:00 CEST. Free attendance!

PlanetMySQL Voting: Vote UP / Vote DOWN