Archive for the ‘barracuda’ Category

InnoDB 5.6.4 supports databases with 4k and 8k page sizes

Декабрь 20th, 2011

In the 5.6.4 release it is now possible to create an InnoDB database with 4k or 8k page sizes in addition to the original 16k page size. Previously, it could be done by recompiling the engine with a different value for UNIV_PAGE_SIZE_SHIFT and UNIV_PAGE_SIZE. With this release, you can set –innodb-page-size=n when starting mysqld, or put innodb_page_size=n in the configuration file in the [mysqld] section where n can be 4k, 8k, 16k, or 4096, 8192, 16384.

The support of smaller page sizes may be useful for certain storage media such as SSDs. Performance results can vary depending on your data schema, record size, and read/write ratio. But this provides you more options to optimize your performance.

When this new setting is used, the page size is set for all tablespaces used by that InnoDB instance. You can query the current value with;

SHOW VARIABLES LIKE ‘innodb_page_size’;
or
SELECT variable_value FROM information_schema.global_status  WHERE LOWER(variable_name) = ‘innodb_page_size’;

It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace. That happens when InnoDB does not find ibdata1 in the data directory. If you start mysqld with a page size other than the standard 16k, the error log will contain something like this;

111214 15:55:05 InnoDB: innodb-page-size has been changed from the default value 16384 to 4096.

If your system tablespace already exists using one page size and innodb-page-size is something else, the engine will not start. There will be a message logged like this;

111214 16:06:51 InnoDB: Error: data file .\ibdata1 uses page size 4096,
111214 16:06:51 InnoDB: but the start-up parameter is innodb-page-size=16384

InnoDB knows the page size used in an existing tablespace created by version 5.6.4 because it stamps that page size in the header page. But this is not readable to older engines, of course. If an older engine opens a database with a page size other than 16k, it will report a corrupted page and quit.

All features in InnoDB work the same with smaller page sizes. But some limits are affected. The maximum record size is proportionately less with smaller pages. Record length limits are calculated within InnoDB based on a variety of factors including row type, column type and length, number of columns, secondary indexes, index prefix lengths, and of course, the page size. The main record is stored in a clustered index and a minimum of two records must fit into each page. So the maximum record length for 8k pages is about half that of 16k pages and the maximum record size with 4k pages is about half that of 8k pages.

In addition to record lengths, the maximum key lengths are proportionately smaller. For 16k pages, MySQL prevents key definitions from containing over 3072 bytes of data. This means that the primary key in the clustered index cannot use more than 3072 bytes of data. Secondary indexes can only contain up to 3072 bytes of data in the key also. But within a secondary index, InnoDB must store the primary key as the record pointer.  So if you have a 3072 byte primary key and a 3072 byte secondary key, each entry in the secondary index contains 6144 bytes of data plus internal record overhead. The amount of internal overhead is dependent upon the column types, but this gets the secondary record close to half the page size which is its natural limit. So this limit of 3072 bytes per key is less than but close to what InnoDB would have to impose on a table by table basis.  Based on that, InnoDB now reports to MySQL that any key defined for 8k page sizes must be less than 1536 bytes.  Likewise, the maximum key length when InnoDB uses 4k page sizes is 768 bytes.

If you have a database schema with any large records or keys defined, you may not be able to use smaller page sizes. Even if your records do barely fit in the clustered index page, it may not be advisable to use these smaller pages because the btree will be a lot deeper.  For example, if only 2 records fit on the page and there are 1,000,000 records, leaf pages are 20 levels deep, meaning InnoDB will need to read 20 pages to find the leaf page.  If that were on 4k pages, then using the same table on 16k pages would give 8 records per page and the leaf pages would only be 7 levels down.

There is a trick to reducing the size of records on the clustered index page;
1) Use Dynamic or Compressed row format.
2) Convert VARCHAR fields to TEXT fields.  (VARBINARY can be converted to BLOB)

There are 4 ROW FORMATS in InnoDB. The first two, Redundant and Compact, which are considered the Antelope file version, store at least 768 bytes of each field in the clustered record.  The Barracuda file version consists of Compact and Dynamic ROW FORMATS. These have the ability to put all of a VARCHAR, VARBINARY, TEXT or BLOB field onto a separate BLOB page for storage.  So one good way to prepare a table structure to use smaller page sizes is to use Dynamic or Compressed row format.

The decision of how big a record will be in the clustered record is made during the INSERT or UPDATE.  Each record is evaluated for its own length.  If the record is too long to fit in half the page, InnoDB will look for the longest actual field in that record and put as much of that field off-page as possible based on the row type.  Then if it is still to long, it will shorten the longest field remaining.  Since this is done when the record is added to the page, different records may have different columns stored off-page when there are multiple long fields in the record.

VARCHAR/VARBINARY fields are treated like TEXT/BLOB fields if they are over 255 bytes long.  If you are using Compressed or Dynamic row format and your record is too long because you have too many VARCHAR fields 255 bytes or less, you can reduce the record length by converting them to TEXT fields. Likewise, VARBINARY fields 255 bytes or less can be converted to BLOB fields to take advantage of this ability to store the whole TEXT or BLOB field on a separate page.

A file extent in InnoDB is 1 Mb independent of the page size. So an extent will hold 64 16k pages, 128 8k pages and 256 4k pages.  This means that the read ahead mechanisms will read more pages with smaller page sizes since they read a whole extent at a time.  The doublewrite buffer, which is based on the size of an extent, will also contain more pages.

If you want to use smaller page sizes with existing data, export the data first with a logical export utility such as mysqldump. Then create the new mysql instance with innodb-page-size=4k or 8k and import the data. Do not use a physical export method such as alter table … discard tablespace.

Summary:

This feature makes it easier to try smaller page sizes in an InnoDB database. And with the 5.6.4 release, those smaller page sizes are fully supported by MySQL. Just export your data, move or delete the system database (ibdata1) and the log files (ib_logfile0 & ib_logfile1), set innodb-page-size to either 4k or 8k, and restart MySQL. A new InnoDB instance will be created with the smaller page size. Then you can import your data and run your tests, all without recompiling InnoDB.


PlanetMySQL Voting: Vote UP / Vote DOWN

InnoDB 5.6.4 supports databases with 4k and 8k page sizes

Декабрь 20th, 2011

In the 5.6.4 release it is now possible to create an InnoDB database with 4k or 8k page sizes in addition to the original 16k page size. Previously, it could be done by recompiling the engine with a different value for UNIV_PAGE_SIZE_SHIFT and UNIV_PAGE_SIZE. With this release, you can set –innodb-page-size=n when starting mysqld, or put innodb_page_size=n in the configuration file in the [mysqld] section where n can be 4k, 8k, 16k, or 4096, 8192, 16384.

The support of smaller page sizes may be useful for certain storage media such as SSDs. Performance results can vary depending on your data schema, record size, and read/write ratio. But this provides you more options to optimize your performance.

When this new setting is used, the page size is set for all tablespaces used by that InnoDB instance. You can query the current value with;

SHOW VARIABLES LIKE ‘innodb_page_size’;
or
SELECT variable_value FROM information_schema.global_status  WHERE LOWER(variable_name) = ‘innodb_page_size’;

It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace. That happens when InnoDB does not find ibdata1 in the data directory. If you start mysqld with a page size other than the standard 16k, the error log will contain something like this;

111214 15:55:05 InnoDB: innodb-page-size has been changed from the default value 16384 to 4096.

If your system tablespace already exists using one page size and innodb-page-size is something else, the engine will not start. There will be a message logged like this;

111214 16:06:51 InnoDB: Error: data file .\ibdata1 uses page size 4096,
111214 16:06:51 InnoDB: but the start-up parameter is innodb-page-size=16384

InnoDB knows the page size used in an existing tablespace created by version 5.6.4 because it stamps that page size in the header page. But this is not readable to older engines, of course. If an older engine opens a database with a page size other than 16k, it will report a corrupted page and quit.

All features in InnoDB work the same with smaller page sizes. But some limits are affected. The maximum record size is proportionately less with smaller pages. Record length limits are calculated within InnoDB based on a variety of factors including row type, column type and length, number of columns, secondary indexes, index prefix lengths, and of course, the page size. The main record is stored in a clustered index and a minimum of two records must fit into each page. So the maximum record length for 8k pages is about half that of 16k pages and the maximum record size with 4k pages is about half that of 8k pages.

In addition to record lengths, the maximum key lengths are proportionately smaller. For 16k pages, MySQL prevents key definitions from containing over 3072 bytes of data. This means that the primary key in the clustered index cannot use more than 3072 bytes of data. Secondary indexes can only contain up to 3072 bytes of data in the key also. But within a secondary index, InnoDB must store the primary key as the record pointer.  So if you have a 3072 byte primary key and a 3072 byte secondary key, each entry in the secondary index contains 6144 bytes of data plus internal record overhead. The amount of internal overhead is dependent upon the column types, but this gets the secondary record close to half the page size which is its natural limit. So this limit of 3072 bytes per key is less than but close to what InnoDB would have to impose on a table by table basis.  Based on that, InnoDB now reports to MySQL that any key defined for 8k page sizes must be less than 1536 bytes.  Likewise, the maximum key length when InnoDB uses 4k page sizes is 768 bytes.

If you have a database schema with any large records or keys defined, you may not be able to use smaller page sizes. Even if your records do barely fit in the clustered index page, it may not be advisable to use these smaller pages because the btree will be a lot deeper.  For example, if only 2 records fit on the page and there are 1,000,000 records, leaf pages are 20 levels deep, meaning InnoDB will need to read 20 pages to find the leaf page.  If that were on 4k pages, then using the same table on 16k pages would give 8 records per page and the leaf pages would only be 7 levels down.

There is a trick to reducing the size of records on the clustered index page;
1) Use Dynamic or Compressed row format.
2) Convert VARCHAR fields to TEXT fields.  (VARBINARY can be converted to BLOB)

There are 4 ROW FORMATS in InnoDB. The first two, Redundant and Compact, which are considered the Antelope file version, store at least 768 bytes of each field in the clustered record.  The Barracuda file version consists of Compact and Dynamic ROW FORMATS. These have the ability to put all of a VARCHAR, VARBINARY, TEXT or BLOB field onto a separate BLOB page for storage.  So one good way to prepare a table structure to use smaller page sizes is to use Dynamic or Compressed row format.

The decision of how big a record will be in the clustered record is made during the INSERT or UPDATE.  Each record is evaluated for its own length.  If the record is too long to fit in half the page, InnoDB will look for the longest actual field in that record and put as much of that field off-page as possible based on the row type.  Then if it is still to long, it will shorten the longest field remaining.  Since this is done when the record is added to the page, different records may have different columns stored off-page when there are multiple long fields in the record.

VARCHAR/VARBINARY fields are treated like TEXT/BLOB fields if they are over 255 bytes long.  If you are using Compressed or Dynamic row format and your record is too long because you have too many VARCHAR fields 255 bytes or less, you can reduce the record length by converting them to TEXT fields. Likewise, VARBINARY fields 255 bytes or less can be converted to BLOB fields to take advantage of this ability to store the whole TEXT or BLOB field on a separate page.

A file extent in InnoDB is 1 Mb independent of the page size. So an extent will hold 64 16k pages, 128 8k pages and 256 4k pages.  This means that the read ahead mechanisms will read more pages with smaller page sizes since they read a whole extent at a time.  The doublewrite buffer, which is based on the size of an extent, will also contain more pages.

If you want to use smaller page sizes with existing data, export the data first with a logical export utility such as mysqldump. Then create the new mysql instance with innodb-page-size=4k or 8k and import the data. Do not use a physical export method such as alter table … discard tablespace.

Summary:

This feature makes it easier to try smaller page sizes in an InnoDB database. And with the 5.6.4 release, those smaller page sizes are fully supported by MySQL. Just export your data, move or delete the system database (ibdata1) and the log files (ib_logfile0 & ib_logfile1), set innodb-page-size to either 4k or 8k, and restart MySQL. A new InnoDB instance will be created with the smaller page size. Then you can import your data and run your tests, all without recompiling InnoDB.


PlanetMySQL Voting: Vote UP / Vote DOWN

Initial thoughts on space compression using the innodb_plugin

Май 23rd, 2010

While setting up MySQL Enterprise Monitor 2.2 (Merlin) on a system which had been running version 2.1 I thought I’d try and see what difference the change from using normal innodb tables to using the compressed table format available in the innodb plugin.

I’ve been using a separate db backend for merlin because for me it’s easier to manage and also the database backend has been put on a dedicated server. I’ve also been trying the innodb_plugin on another busier server as I had performance problems with the normal 5.1.42 built-in innodb engine which the plugin managed to solve.

So given that I was using a separate db server I upgraded it to 5.1.47, configured the server to use the plugin (1.0.8) rather than to use the built-in innodb engine and then decided to alter the data tables (dc_p_long, dc_p_string and dc_p_double) to use the new innodb compressed table format. These tables are designed for storing a large number of rows of a specific type but there was no harm in trying.

Here are the results by doing the following:

SET GLOBAL innodb_file_format = "Barracuda";
ALTER TABLE dc_p_xxxx ROW_FORMAT=compressed;

Using the older Antelope storage format:

dc_p_string 178 MB
dc_p_double 514 MB
dc_p_long 15.3 GB

Using Barracuda COMPRESSED:
dc_p_string 35 MB
dc_p_double 223 MB
dc_p_long 6.8 GB

The compressed format is using the standard block size. I need to do further tests to see how much difference using the smaller 1 kb, 2 kb or 4 kb blocks will make.

So from the point of view of Merlin only it does seem to make sense to use this format, assuming performance is not significantly affected. After all it means I can store twice the amount of data on disk, and one of the problems I have had in the past is that I could only keep a week’s worth of data because of storage limitations. Note: the 2.1 to 2.2 update will save a lot of space as the string table will drop in size significantly. However gaining an extra 50% by using innodb compression seems worth doing if it comes for free.

That said I have been told that there are still a few issues with this new table format so for anyone looking to use it in production it may be best to wait for 5.1.48 which should remove a few of these edge cases. If you only want to see how much difference the storage usage is then 5.1.47 should be ok. YMMV.

In the meantime I’m going to leave this server running for a while. Merlin does hammer the db quite heavily, so I’ll be able to see if it survives in a few days.

I also have a few servers which currently use MyISAM tables because of the smaller disk footprint compared to innodb. These servers do suffer from some of MyISAM’s weaknesses such as concurrent reading and writing on the same table is not possible, so now it looks like we might have a good reason to try this compressed table format out and with that gain a lot using innodb. Recovery after a crash has always been a problem on this type of server and innodb recovery should be both quicker and less intrusive.

I’m looking forward to further experimentation but so far this new compressed format does look promising. So thanks to the innodb folk who have made this possible.


PlanetMySQL Voting: Vote UP / Vote DOWN