Archive for the ‘Feature’ Category

InnoDB 5.6.4 supports databases with 4k and 8k page sizes

Декабрь 20th, 2011

In the 5.6.4 release it is now possible to create an InnoDB database with 4k or 8k page sizes in addition to the original 16k page size. Previously, it could be done by recompiling the engine with a different value for UNIV_PAGE_SIZE_SHIFT and UNIV_PAGE_SIZE. With this release, you can set –innodb-page-size=n when starting mysqld, or put innodb_page_size=n in the configuration file in the [mysqld] section where n can be 4k, 8k, 16k, or 4096, 8192, 16384.

The support of smaller page sizes may be useful for certain storage media such as SSDs. Performance results can vary depending on your data schema, record size, and read/write ratio. But this provides you more options to optimize your performance.

When this new setting is used, the page size is set for all tablespaces used by that InnoDB instance. You can query the current value with;

SHOW VARIABLES LIKE ‘innodb_page_size’;
or
SELECT variable_value FROM information_schema.global_status  WHERE LOWER(variable_name) = ‘innodb_page_size’;

It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace. That happens when InnoDB does not find ibdata1 in the data directory. If you start mysqld with a page size other than the standard 16k, the error log will contain something like this;

111214 15:55:05 InnoDB: innodb-page-size has been changed from the default value 16384 to 4096.

If your system tablespace already exists using one page size and innodb-page-size is something else, the engine will not start. There will be a message logged like this;

111214 16:06:51 InnoDB: Error: data file .\ibdata1 uses page size 4096,
111214 16:06:51 InnoDB: but the start-up parameter is innodb-page-size=16384

InnoDB knows the page size used in an existing tablespace created by version 5.6.4 because it stamps that page size in the header page. But this is not readable to older engines, of course. If an older engine opens a database with a page size other than 16k, it will report a corrupted page and quit.

All features in InnoDB work the same with smaller page sizes. But some limits are affected. The maximum record size is proportionately less with smaller pages. Record length limits are calculated within InnoDB based on a variety of factors including row type, column type and length, number of columns, secondary indexes, index prefix lengths, and of course, the page size. The main record is stored in a clustered index and a minimum of two records must fit into each page. So the maximum record length for 8k pages is about half that of 16k pages and the maximum record size with 4k pages is about half that of 8k pages.

In addition to record lengths, the maximum key lengths are proportionately smaller. For 16k pages, MySQL prevents key definitions from containing over 3072 bytes of data. This means that the primary key in the clustered index cannot use more than 3072 bytes of data. Secondary indexes can only contain up to 3072 bytes of data in the key also. But within a secondary index, InnoDB must store the primary key as the record pointer.  So if you have a 3072 byte primary key and a 3072 byte secondary key, each entry in the secondary index contains 6144 bytes of data plus internal record overhead. The amount of internal overhead is dependent upon the column types, but this gets the secondary record close to half the page size which is its natural limit. So this limit of 3072 bytes per key is less than but close to what InnoDB would have to impose on a table by table basis.  Based on that, InnoDB now reports to MySQL that any key defined for 8k page sizes must be less than 1536 bytes.  Likewise, the maximum key length when InnoDB uses 4k page sizes is 768 bytes.

If you have a database schema with any large records or keys defined, you may not be able to use smaller page sizes. Even if your records do barely fit in the clustered index page, it may not be advisable to use these smaller pages because the btree will be a lot deeper.  For example, if only 2 records fit on the page and there are 1,000,000 records, leaf pages are 20 levels deep, meaning InnoDB will need to read 20 pages to find the leaf page.  If that were on 4k pages, then using the same table on 16k pages would give 8 records per page and the leaf pages would only be 7 levels down.

There is a trick to reducing the size of records on the clustered index page;
1) Use Dynamic or Compressed row format.
2) Convert VARCHAR fields to TEXT fields.  (VARBINARY can be converted to BLOB)

There are 4 ROW FORMATS in InnoDB. The first two, Redundant and Compact, which are considered the Antelope file version, store at least 768 bytes of each field in the clustered record.  The Barracuda file version consists of Compact and Dynamic ROW FORMATS. These have the ability to put all of a VARCHAR, VARBINARY, TEXT or BLOB field onto a separate BLOB page for storage.  So one good way to prepare a table structure to use smaller page sizes is to use Dynamic or Compressed row format.

The decision of how big a record will be in the clustered record is made during the INSERT or UPDATE.  Each record is evaluated for its own length.  If the record is too long to fit in half the page, InnoDB will look for the longest actual field in that record and put as much of that field off-page as possible based on the row type.  Then if it is still to long, it will shorten the longest field remaining.  Since this is done when the record is added to the page, different records may have different columns stored off-page when there are multiple long fields in the record.

VARCHAR/VARBINARY fields are treated like TEXT/BLOB fields if they are over 255 bytes long.  If you are using Compressed or Dynamic row format and your record is too long because you have too many VARCHAR fields 255 bytes or less, you can reduce the record length by converting them to TEXT fields. Likewise, VARBINARY fields 255 bytes or less can be converted to BLOB fields to take advantage of this ability to store the whole TEXT or BLOB field on a separate page.

A file extent in InnoDB is 1 Mb independent of the page size. So an extent will hold 64 16k pages, 128 8k pages and 256 4k pages.  This means that the read ahead mechanisms will read more pages with smaller page sizes since they read a whole extent at a time.  The doublewrite buffer, which is based on the size of an extent, will also contain more pages.

If you want to use smaller page sizes with existing data, export the data first with a logical export utility such as mysqldump. Then create the new mysql instance with innodb-page-size=4k or 8k and import the data. Do not use a physical export method such as alter table … discard tablespace.

Summary:

This feature makes it easier to try smaller page sizes in an InnoDB database. And with the 5.6.4 release, those smaller page sizes are fully supported by MySQL. Just export your data, move or delete the system database (ibdata1) and the log files (ib_logfile0 & ib_logfile1), set innodb-page-size to either 4k or 8k, and restart MySQL. A new InnoDB instance will be created with the smaller page size. Then you can import your data and run your tests, all without recompiling InnoDB.


PlanetMySQL Voting: Vote UP / Vote DOWN

InnoDB 5.6.4 supports databases with 4k and 8k page sizes

Декабрь 20th, 2011

In the 5.6.4 release it is now possible to create an InnoDB database with 4k or 8k page sizes in addition to the original 16k page size. Previously, it could be done by recompiling the engine with a different value for UNIV_PAGE_SIZE_SHIFT and UNIV_PAGE_SIZE. With this release, you can set –innodb-page-size=n when starting mysqld, or put innodb_page_size=n in the configuration file in the [mysqld] section where n can be 4k, 8k, 16k, or 4096, 8192, 16384.

The support of smaller page sizes may be useful for certain storage media such as SSDs. Performance results can vary depending on your data schema, record size, and read/write ratio. But this provides you more options to optimize your performance.

When this new setting is used, the page size is set for all tablespaces used by that InnoDB instance. You can query the current value with;

SHOW VARIABLES LIKE ‘innodb_page_size’;
or
SELECT variable_value FROM information_schema.global_status  WHERE LOWER(variable_name) = ‘innodb_page_size’;

It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace. That happens when InnoDB does not find ibdata1 in the data directory. If you start mysqld with a page size other than the standard 16k, the error log will contain something like this;

111214 15:55:05 InnoDB: innodb-page-size has been changed from the default value 16384 to 4096.

If your system tablespace already exists using one page size and innodb-page-size is something else, the engine will not start. There will be a message logged like this;

111214 16:06:51 InnoDB: Error: data file .\ibdata1 uses page size 4096,
111214 16:06:51 InnoDB: but the start-up parameter is innodb-page-size=16384

InnoDB knows the page size used in an existing tablespace created by version 5.6.4 because it stamps that page size in the header page. But this is not readable to older engines, of course. If an older engine opens a database with a page size other than 16k, it will report a corrupted page and quit.

All features in InnoDB work the same with smaller page sizes. But some limits are affected. The maximum record size is proportionately less with smaller pages. Record length limits are calculated within InnoDB based on a variety of factors including row type, column type and length, number of columns, secondary indexes, index prefix lengths, and of course, the page size. The main record is stored in a clustered index and a minimum of two records must fit into each page. So the maximum record length for 8k pages is about half that of 16k pages and the maximum record size with 4k pages is about half that of 8k pages.

In addition to record lengths, the maximum key lengths are proportionately smaller. For 16k pages, MySQL prevents key definitions from containing over 3072 bytes of data. This means that the primary key in the clustered index cannot use more than 3072 bytes of data. Secondary indexes can only contain up to 3072 bytes of data in the key also. But within a secondary index, InnoDB must store the primary key as the record pointer.  So if you have a 3072 byte primary key and a 3072 byte secondary key, each entry in the secondary index contains 6144 bytes of data plus internal record overhead. The amount of internal overhead is dependent upon the column types, but this gets the secondary record close to half the page size which is its natural limit. So this limit of 3072 bytes per key is less than but close to what InnoDB would have to impose on a table by table basis.  Based on that, InnoDB now reports to MySQL that any key defined for 8k page sizes must be less than 1536 bytes.  Likewise, the maximum key length when InnoDB uses 4k page sizes is 768 bytes.

If you have a database schema with any large records or keys defined, you may not be able to use smaller page sizes. Even if your records do barely fit in the clustered index page, it may not be advisable to use these smaller pages because the btree will be a lot deeper.  For example, if only 2 records fit on the page and there are 1,000,000 records, leaf pages are 20 levels deep, meaning InnoDB will need to read 20 pages to find the leaf page.  If that were on 4k pages, then using the same table on 16k pages would give 8 records per page and the leaf pages would only be 7 levels down.

There is a trick to reducing the size of records on the clustered index page;
1) Use Dynamic or Compressed row format.
2) Convert VARCHAR fields to TEXT fields.  (VARBINARY can be converted to BLOB)

There are 4 ROW FORMATS in InnoDB. The first two, Redundant and Compact, which are considered the Antelope file version, store at least 768 bytes of each field in the clustered record.  The Barracuda file version consists of Compact and Dynamic ROW FORMATS. These have the ability to put all of a VARCHAR, VARBINARY, TEXT or BLOB field onto a separate BLOB page for storage.  So one good way to prepare a table structure to use smaller page sizes is to use Dynamic or Compressed row format.

The decision of how big a record will be in the clustered record is made during the INSERT or UPDATE.  Each record is evaluated for its own length.  If the record is too long to fit in half the page, InnoDB will look for the longest actual field in that record and put as much of that field off-page as possible based on the row type.  Then if it is still to long, it will shorten the longest field remaining.  Since this is done when the record is added to the page, different records may have different columns stored off-page when there are multiple long fields in the record.

VARCHAR/VARBINARY fields are treated like TEXT/BLOB fields if they are over 255 bytes long.  If you are using Compressed or Dynamic row format and your record is too long because you have too many VARCHAR fields 255 bytes or less, you can reduce the record length by converting them to TEXT fields. Likewise, VARBINARY fields 255 bytes or less can be converted to BLOB fields to take advantage of this ability to store the whole TEXT or BLOB field on a separate page.

A file extent in InnoDB is 1 Mb independent of the page size. So an extent will hold 64 16k pages, 128 8k pages and 256 4k pages.  This means that the read ahead mechanisms will read more pages with smaller page sizes since they read a whole extent at a time.  The doublewrite buffer, which is based on the size of an extent, will also contain more pages.

If you want to use smaller page sizes with existing data, export the data first with a logical export utility such as mysqldump. Then create the new mysql instance with innodb-page-size=4k or 8k and import the data. Do not use a physical export method such as alter table … discard tablespace.

Summary:

This feature makes it easier to try smaller page sizes in an InnoDB database. And with the 5.6.4 release, those smaller page sizes are fully supported by MySQL. Just export your data, move or delete the system database (ibdata1) and the log files (ib_logfile0 & ib_logfile1), set innodb-page-size to either 4k or 8k, and restart MySQL. A new InnoDB instance will be created with the smaller page size. Then you can import your data and run your tests, all without recompiling InnoDB.


PlanetMySQL Voting: Vote UP / Vote DOWN

InnoDB 2011 Summer Labs Releases

Июль 27th, 2011

In April of 2011, InnoDB team published the early access of NoSQL to InnoDB with memcached, plus several new features as part of MySQL 5.6.2 milestone release. This week, we announced additional early access to new InnoDB features for the community to test, and provide feedback.

There are two release packages from InnoDB team on MySQL Labs: InnoDB full-text search, and InnoDB new features.

InnoDB Full-Text Search

MySQL 5.5 makes InnoDB the default storage engine, so everyone can benefit from ACID-compliant transactions, referential integrity, crash recovery.  However, some users need InnoDB to have built-in full-text search, similar to MyISAM’s full-text search.

InnoDB full-text search provides users with the ability to build full text indices and search for specific text-based content stored in InnoDB tables.  This new functionality supports fast and accurate search on document content using natural language, boolean, wildcard, and proximity search.

The design and implementation of InnoDB full-text search can trace back to 2005, when Osku Salerma detailed the design in his master thesis “Design of a Full Text Search index for a database management system“. Later, Sunny Bains and Jimmy Yang from the InnoDB team took over the development and made major contributions to this important feature.

Jimmy gave an overview of InnoDB full-text search, and the main differences in design between InnoDB full-text search and MyISAM full-text search. John provided a set of examples in the tutorial. What about the performance of InnoDB full-text search, you can find out in Vinay and Jimmy’s article.

Please download mysql-5.6-labs-innodb-fts from MySQL Labs and give a try.

InnoDB New Features

The package mysql-5.6-labs-innodb-features on MySQL Labs consists of a set of InnoDB new features since MySQL 5.6.2 milestone release, except InnoDB full-text search. Some of the new features are already in MySQL server main development tree, and the rest of them are intended to move into the main development tree toward future development milestone releases and GA releases.

The new InnoDB features included in this package are:

  • Increase the max size of redo log files from 4GB to 2TB
  • Reduce contention during file extension
  • Make deadlock detection non-recursive
  • Improve thread scheduling
  • Change rw-lock to mutex for trx_sys_t
  • Option to preload InnoDB buffer pool
  • Allow UNDO logs to reside in their own tablespace
  • Reintroduce random readahead
  • Support smaller page sizes (4K & 8K)
  • Increase the max length of prefix index from 767 bytes to 3072 bytes

In additional to continue improvements of InnoDB performance and scalability, we are also focusing on optimizing InnoDB for flash drives. InnoDB with flash drives could benefit from new features such as larger REDO log files, separate UNDO logs, smaller page sizes, and preloaded buffer pool.

Group commit with binlog is released separately by MySQL replication team, also on MySQL Labs.

Want to learn the details of InnoDB new features? Download mysql-5.6-labs-innodb-features from MySQL Labs, play with it, and read the blogs from InnoDB engineers:

BTW, do not forget to send us feedback. Thanks for being interested in InnoDB!


PlanetMySQL Voting: Vote UP / Vote DOWN

InnoDB Full-Text Search Tutorial

Июль 27th, 2011

The InnoDB full-text search capability is an exciting feature. The full-text search itself is generally useful to have in an RDBMS. If an application is using all InnoDB tables except for one that is used for full-text searches, now that last table can be switched to InnoDB. If putting the full-text data in a MyISAM table led to scalability problems, duplication, or a less-than-ideal schema design, now those issues can be addressed.

In this post, I’ll take you through some of the basics of setting up and querying an InnoDB FULLTEXT search index. I’ll leave the scalability and performance aspects to Jimmy’s and Vinay’s blog posts, and just use some toy-sized data for demonstration purposes.

Creating a Table with a Full-Text Search Index

The key component of this feature is an index of type FULLTEXT, applied to one or more columns of an InnoDB table.

In Jimmy’s post, he mentions some scalability considerations where you might create the table (including a special FTS_DOC_ID column), load the data, then create the FULLTEXT index afterward. For simplicity (and since the data volume is so small), I’ll create the table with the index in place, then load the data afterward.

use test;
-- We will do some commits and rollbacks to demonstrate transactional features.
-- So turn off the default setting that commits immediately after each statement.
set autocommit=0;

drop table if exists quotes;
-- In 5.5 and above, by default this table is an InnoDB table.
-- The full-text search feature lets us define the FULLTEXT index.

create table quotes
  (    id int unsigned auto_increment primary key
    , author varchar(64)    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
  );

-- Get some words and phrases to search for into the table.
insert into quotes (author, quote, source) values
  ('Abraham Lincoln', 'Fourscore and seven years ago...',
  'Gettysburg Address')
, ('George Harrison', 'All those years ago...',
  'Live In Japan')
, ('Arthur C. Clarke', 'Then 10 years ago the monolith was discovered.',
  '2010: The Year We Make Contact')
, ('Benjamin Franklin',
  'Early to bed and early to rise, makes a man healthy, wealthy, and wise.',
  'Poor Richard''s Almanack')
, ('James Thurber',
  'Early to rise and early to bed makes a male healthy and wealthy and dead.',
  'The New Yorker')
, ('K', '1500 hundred years ago, everybody knew that the Earth was the center of the universe.',
  'Men in Black')
;

-- Since this is an InnoDB table, we are mindful of transactions.
commit;

Word and Phrase Search – Natural Language Mode

Once the data is loaded and committed, you can run queries using the MATCH(columns) AGAINST (search expression) operator to do the actual searches. You can combine this operator with all the usual WHERE and similar clauses in the SELECT statement.

The simplest kind of search is to find a single word, or a phrase with all words in exact order. For this type of search, use the IN NATURAL LANGUAGE clause inside the AGAINST() call. This technique typically involves a user-entered string that you pass verbatim to the query (of course, after escaping any quotation marks or other special characters to prevent SQL injection attacks).

-- Search for a single word.
select author as "Monolith" from quotes
  where match(quote) against ('monolith' in natural language mode);
+------------------+
| Monolith         |
+------------------+
| Arthur C. Clarke |
+------------------+
1 row in set (0.01 sec)

select author as "Ago" from quotes
  where match(quote) against ('ago' in natural language mode);
+------------------+
| Ago              |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

-- Search for a phrase.
select author as "Years Ago" from quotes
  where match(quote) against ('years ago' in natural language mode);
+------------------+
| Years Ago        |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

AND / OR / NOT Operators – Boolean Mode

For more complicated searches, you can have multiple words and phrases and search for different combinations of optional and required terms, not necessarily in the same order. This technique typically involves several data values that you query from elsewhere, or splitting apart a user-entered string and applying your own rules to the words and phrases inside.

-- Search for a combination of words, not in the same order as the original.
select author as "Ago and Years" from quotes
  where match(quote) against ('+ago +years' in boolean mode);
+------------------+
| Ago and Years    |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

-- Search for other Boolean combinations of words.
select author as "Fourscore or Monolith" from quotes
  where match(quote) against ('fourscore monolith' in boolean mode);
+-----------------------+
| Fourscore or Monolith |
+-----------------------+
| Abraham Lincoln       |
| Arthur C. Clarke      |
+-----------------------+
2 rows in set (0.00 sec)

select author as "Years and not Monolith" from quotes
  where match(quote) against ('+years -monolith' in boolean mode);
+------------------------+
| Years and not Monolith |
+------------------------+
| Abraham Lincoln        |
| George Harrison        |
| K                      |
+------------------------+
3 rows in set (0.00 sec)

Proximity Search

Proximity search is a special case of Boolean search using the @ operator within the AGAINST() string. You supply 2 or more words, double-quoted, within the single-quoted AGAINST() string, followed by @distance to specify how far apart these words can be. The distance represents the maximum number of bytes between the starting points of all these words.

-- The starting points for these words are too far apart
-- (not within 20 bytes), so no results.
select quote as "Too Far Apart" from quotes
  where match(quote) against ('"early wise" @20' in boolean mode);
Empty set (0.00 sec)

-- But the starting points of all words are within 100 bytes,
-- so this query does give results.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @100' in boolean mode);
+-------------------------------------------------------------------------+
| Early...Wise                                                            |
+-------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

-- In this case, the smallest distance that produces results is 49.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @49' in boolean mode);
+-------------------------------------------------------------------------+
| Early...Wise                                                            |
+-------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

-- Here is an example showing 2 results, with the words close to each other.
select quote as "Early...Bed" from quotes
  where match(quote) against ('"early bed" @20' in boolean mode);
+---------------------------------------------------------------------------+
| Early...Bed                                                               |
+---------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise.   |
| Early to rise and early to bed makes a male healthy and wealthy and dead. |
+---------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Relevance Ranking

The relevance ranking is fairly basic, derived from word frequencies within each document and the search data overall. Typically, you would only ORDER BY this value for very simplistic searches of small documents; for any important search you would layer your own ranking logic on top, perhaps with the MySQL relevance value as one factor in the overall rank.

-- Get the relevance of a single word.
select substr(quote,1,20) as "And",
  match(quote) against ('and' in natural language mode) as "Relevance"
  from quotes order by "Relevance" desc;
+----------------------+--------------------+
| And                  | Relevance          |
+----------------------+--------------------+
| Fourscore and seven  | 0.0906190574169159 |
| All those years ago. |                  0 |
| Then 10 years ago th |                  0 |
| Early to bed and ear | 0.1812381148338318 |
| Early to rise and ea | 0.2718571722507477 |
| 1500 hundred years a |                  0 |
+----------------------+--------------------+
6 rows in set (0.00 sec)

Transactions

The key idea behind bringing full-text search to InnoDB tables is to make this feature compatible with transactions, so that you can include full-text columns alongside other columns in tables in ways that make sense in terms of schema design, and multiple sessions can update the full-text column data (and/or other columns in the table) simultaneously. The full-text data doesn’t have to be treated as read-only or read-mostly.

As mentioned in Jimmy’s blog post, the table structures that manipulate the full-text data behind the scenes are only updated at COMMIT time. So make sure to insert or update full-text data in one transaction, commit, and then run any full-text queries in a subsequent transaction. (Actually, in the examples below, it looks like the data is taken out of the full-text results as soon as a DELETE is issued, then comes back if the deletion is rolled back. I think that is explained in Jimmy’s blog post by the discussion about the delete-marking optimization to avoid huge updates to the full-text index for deleted data.)

drop table if exists quotes_uncommitted;

create table quotes_uncommitted
  (
      author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
    , primary key (author, quote(128))
  );

-- We insert but don't immediately commit.
insert into quotes_uncommitted select author, quote, source from quotes;
-- Within the same transaction, a full-text search does not see the uncommitted data.
select count(author), author as "Uncommitted Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------+
| count(author) | Uncommitted Results |
+---------------+---------------------+
|             0 | NULL                |
+---------------+---------------------+
1 row in set (0.00 sec)

-- If the newly inserted rows are rolled back...
rollback;
-- ...then the full-text search still doesn't see them.
select count(author), author as "Rolled-Back Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------+
| count(author) | Rolled-Back Results |
+---------------+---------------------+
|             0 | NULL                |
+---------------+---------------------+
1 row in set (0.00 sec)

-- OK, let's start with some committed data in the table, then empty the table,
-- and then try some FTS queries
-- both before and after the commit.
insert into quotes_uncommitted select author, quote, source from quotes;
commit;
delete from quotes_uncommitted;
select count(author), author as "Deleted but still not committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------------------+
| count(author) | Deleted but still not committed |
+---------------+---------------------------------+
|             0 | NULL                            |
+---------------+---------------------------------+
1 row in set (0.00 sec)

rollback;
select count(author), author as "Deleted and rolled back" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-------------------------+
| count(author) | Deleted and rolled back |
+---------------+-------------------------+
|             4 | Abraham Lincoln         |
+---------------+-------------------------+
1 row in set (0.00 sec)

delete from quotes_uncommitted;
commit;
select count(author), author as "Deleted and committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-----------------------+
| count(author) | Deleted and committed |
+---------------+-----------------------+
|             0 | NULL                  |
+---------------+-----------------------+
1 row in set (0.00 sec)

insert into quotes_uncommitted select author, quote, source from quotes;
commit;
truncate table quotes_uncommitted;
select count(author), author as "Truncated" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-----------+
| count(author) | Truncated |
+---------------+-----------+
|             0 | NULL      |
+---------------+-----------+
1 row in set (0.00 sec)

Multi-Column Searches

Although you can only have one FULLTEXT index in an InnoDB table, that index can apply to multiple columns, allowing you to search when you aren’t sure which column contains the term. With a multi-column index, we can MATCH() against all the columns to find words that appear in any of those columns. Always reference all the same columns in the MATCH() clause as in the FULLTEXT index definition, because the information about which column the words appear in is not included in the full-text search data.

drop table if exists quotes_multi_col;

create table quotes_multi_col
  (
    id int unsigned auto_increment primary key
    , author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(author, quote, source)
  );

insert into quotes_multi_col select * from quotes;
commit;

select author as "Poor 1 (NL)", substr(quote,1,15) as "Poor 2 (NL)", source as "Poor 3 (NL)" from
  quotes_multi_col where match(author, quote, source)
  against ('poor' in natural language mode);
+-------------------+-----------------+-------------------------+
| Poor 1 (NL)       | Poor 2 (NL)     | Poor 3 (NL)             |
+-------------------+-----------------+-------------------------+
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
+-------------------+-----------------+-------------------------+
1 row in set (0.00 sec)

select author as "Poor 1 (BOOL)", substr(quote,1,15) as "Poor 2 (BOOL)", source as "Poor 3 (BOOL)"
  from quotes_multi_col where match(author, quote, source)
  against ('poor' in boolean mode);
+-------------------+-----------------+-------------------------+
| Poor 1 (BOOL)     | Poor 2 (BOOL)   | Poor 3 (BOOL)           |
+-------------------+-----------------+-------------------------+
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
+-------------------+-----------------+-------------------------+
1 row in set (0.00 sec)

select author as "Clarke 1 (NL)", substr(quote,1,15) as "Clarke 2 (NL)", source as "Clarke 3 (NL)"
  from quotes_multi_col where match(author, quote, source)
  against ('clarke' in natural language mode);
+------------------+-----------------+--------------------------------+
| Clarke 1 (NL)    | Clarke 2 (NL)   | Clarke 3 (NL)                  |
+------------------+-----------------+--------------------------------+
| Arthur C. Clarke | Then 10 years a | 2010: The Year We Make Contact |
+------------------+-----------------+--------------------------------+
1 row in set (0.00 sec)

Interaction with Other Indexes

Remember that the design of your primary key index and secondary indexes is a big factor in query performance for InnoDB tables.

  • You can include parts (prefixes) of the full-text column(s) within the primary key.
  • However, that might not be a good idea if (a) the associated columns will ever be updated — which causes an expensive reorganization within the InnoDB table, or (b) if the table will have any other secondary indexes — the primary key values for a row are duplicated in the entry for that row in every secondary index, making index operations require more I/O and memory.
  • As mentioned in Jimmy’s blog post, adding the FULLTEXT index to the table is going to create a new column and associated index in the original table, so you could set up the column and index ahead of time, to avoid table reorganization later.
  • You can use the unique constraint of the primary key or a UNIQUE index to prevent duplicate values or combinations of values from being entered.
  • You can use the not-null constraint of the primary key to prevent blank values or combinations of values from being entered.
  • For the Labs release, the InnoDB FULLTEXT processing isn’t integrated with the MySQL optimizer and its estimates for which index is best to use, so don’t draw conclusions about performance characteristics from this early preview.

Stopwords

Stopwords are typically short, commonly used words that you designate as not significant for a search. They are left out of the FULLTEXT index and ignored when entered in FULLTEXT queries. For example, a search for ‘the’ is unsuccessful because it’s in the default stopword list. For your own customized search, you might create a bigger list (say, with common words from several languages) or a smaller one (for example, a music or movie site where words such as “The” in names and titles are significant). The details about customizing the stopword list are in Jimmy’s blog post.

select count(*), author as "Stopword 1", quote as "Stopword 2", source as "Stopword 3"
  from quotes_multi_col
  where match(author, quote, source) against ('the' in natural language mode);
+----------+------------+------------+------------+
| count(*) | Stopword 1 | Stopword 2 | Stopword 3 |
+----------+------------+------------+------------+
|        0 | NULL       | NULL       | NULL       |
+----------+------------+------------+------------+

PlanetMySQL Voting: Vote UP / Vote DOWN

InnoDB Full-Text Search Tutorial

Июль 27th, 2011

The InnoDB full-text search capability is an exciting feature. The full-text search itself is generally useful to have in an RDBMS. If an application is using all InnoDB tables except for one that is used for full-text searches, now that last table can be switched to InnoDB. If putting the full-text data in a MyISAM table led to scalability problems, duplication, or a less-than-ideal schema design, now those issues can be addressed.

In this post, I’ll take you through some of the basics of setting up and querying an InnoDB FULLTEXT search index. I’ll leave the scalability and performance aspects to Jimmy’s and Vinay’s blog posts, and just use some toy-sized data for demonstration purposes.

Creating a Table with a Full-Text Search Index

The key component of this feature is an index of type FULLTEXT, applied to one or more columns of an InnoDB table.

In Jimmy’s post, he mentions some scalability considerations where you might create the table (including a special FTS_DOC_ID column), load the data, then create the FULLTEXT index afterward. For simplicity (and since the data volume is so small), I’ll create the table with the index in place, then load the data afterward.

use test;
-- We will do some commits and rollbacks to demonstrate transactional features.
-- So turn off the default setting that commits immediately after each statement.
set autocommit=0;

drop table if exists quotes;
-- In 5.5 and above, by default this table is an InnoDB table.
-- The full-text search feature lets us define the FULLTEXT index.

create table quotes
  (    id int unsigned auto_increment primary key
    , author varchar(64)    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
  );

-- Get some words and phrases to search for into the table.
insert into quotes (author, quote, source) values
  ('Abraham Lincoln', 'Fourscore and seven years ago...',
  'Gettysburg Address')
, ('George Harrison', 'All those years ago...',
  'Live In Japan')
, ('Arthur C. Clarke', 'Then 10 years ago the monolith was discovered.',
  '2010: The Year We Make Contact')
, ('Benjamin Franklin',
  'Early to bed and early to rise, makes a man healthy, wealthy, and wise.',
  'Poor Richard''s Almanack')
, ('James Thurber',
  'Early to rise and early to bed makes a male healthy and wealthy and dead.',
  'The New Yorker')
, ('K', '1500 hundred years ago, everybody knew that the Earth was the center of the universe.',
  'Men in Black')
;

-- Since this is an InnoDB table, we are mindful of transactions.
commit;

Word and Phrase Search – Natural Language Mode

Once the data is loaded and committed, you can run queries using the MATCH(columns) AGAINST (search expression) operator to do the actual searches. You can combine this operator with all the usual WHERE and similar clauses in the SELECT statement.

The simplest kind of search is to find a single word, or a phrase with all words in exact order. For this type of search, use the IN NATURAL LANGUAGE clause inside the AGAINST() call. This technique typically involves a user-entered string that you pass verbatim to the query (of course, after escaping any quotation marks or other special characters to prevent SQL injection attacks).

-- Search for a single word.
select author as "Monolith" from quotes
  where match(quote) against ('monolith' in natural language mode);
+------------------+
| Monolith         |
+------------------+
| Arthur C. Clarke |
+------------------+
1 row in set (0.01 sec)

select author as "Ago" from quotes
  where match(quote) against ('ago' in natural language mode);
+------------------+
| Ago              |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

-- Search for a phrase.
select author as "Years Ago" from quotes
  where match(quote) against ('years ago' in natural language mode);
+------------------+
| Years Ago        |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

AND / OR / NOT Operators – Boolean Mode

For more complicated searches, you can have multiple words and phrases and search for different combinations of optional and required terms, not necessarily in the same order. This technique typically involves several data values that you query from elsewhere, or splitting apart a user-entered string and applying your own rules to the words and phrases inside.

-- Search for a combination of words, not in the same order as the original.
select author as "Ago and Years" from quotes
  where match(quote) against ('+ago +years' in boolean mode);
+------------------+
| Ago and Years    |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

-- Search for other Boolean combinations of words.
select author as "Fourscore or Monolith" from quotes
  where match(quote) against ('fourscore monolith' in boolean mode);
+-----------------------+
| Fourscore or Monolith |
+-----------------------+
| Abraham Lincoln       |
| Arthur C. Clarke      |
+-----------------------+
2 rows in set (0.00 sec)

select author as "Years and not Monolith" from quotes
  where match(quote) against ('+years -monolith' in boolean mode);
+------------------------+
| Years and not Monolith |
+------------------------+
| Abraham Lincoln        |
| George Harrison        |
| K                      |
+------------------------+
3 rows in set (0.00 sec)

Proximity Search

Proximity search is a special case of Boolean search using the @ operator within the AGAINST() string. You supply 2 or more words, double-quoted, within the single-quoted AGAINST() string, followed by @distance to specify how far apart these words can be. The distance represents the maximum number of bytes between the starting points of all these words.

-- The starting points for these words are too far apart
-- (not within 20 bytes), so no results.
select quote as "Too Far Apart" from quotes
  where match(quote) against ('"early wise" @20' in boolean mode);
Empty set (0.00 sec)

-- But the starting points of all words are within 100 bytes,
-- so this query does give results.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @100' in boolean mode);
+-------------------------------------------------------------------------+
| Early...Wise                                                            |
+-------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

-- In this case, the smallest distance that produces results is 49.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @49' in boolean mode);
+-------------------------------------------------------------------------+
| Early...Wise                                                            |
+-------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

-- Here is an example showing 2 results, with the words close to each other.
select quote as "Early...Bed" from quotes
  where match(quote) against ('"early bed" @20' in boolean mode);
+---------------------------------------------------------------------------+
| Early...Bed                                                               |
+---------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise.   |
| Early to rise and early to bed makes a male healthy and wealthy and dead. |
+---------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Relevance Ranking

The relevance ranking is fairly basic, derived from word frequencies within each document and the search data overall. Typically, you would only ORDER BY this value for very simplistic searches of small documents; for any important search you would layer your own ranking logic on top, perhaps with the MySQL relevance value as one factor in the overall rank.

-- Get the relevance of a single word.
select substr(quote,1,20) as "And",
  match(quote) against ('and' in natural language mode) as "Relevance"
  from quotes order by "Relevance" desc;
+----------------------+--------------------+
| And                  | Relevance          |
+----------------------+--------------------+
| Fourscore and seven  | 0.0906190574169159 |
| All those years ago. |                  0 |
| Then 10 years ago th |                  0 |
| Early to bed and ear | 0.1812381148338318 |
| Early to rise and ea | 0.2718571722507477 |
| 1500 hundred years a |                  0 |
+----------------------+--------------------+
6 rows in set (0.00 sec)

Transactions

The key idea behind bringing full-text search to InnoDB tables is to make this feature compatible with transactions, so that you can include full-text columns alongside other columns in tables in ways that make sense in terms of schema design, and multiple sessions can update the full-text column data (and/or other columns in the table) simultaneously. The full-text data doesn’t have to be treated as read-only or read-mostly.

As mentioned in Jimmy’s blog post, the table structures that manipulate the full-text data behind the scenes are only updated at COMMIT time. So make sure to insert or update full-text data in one transaction, commit, and then run any full-text queries in a subsequent transaction. (Actually, in the examples below, it looks like the data is taken out of the full-text results as soon as a DELETE is issued, then comes back if the deletion is rolled back. I think that is explained in Jimmy’s blog post by the discussion about the delete-marking optimization to avoid huge updates to the full-text index for deleted data.)

drop table if exists quotes_uncommitted;

create table quotes_uncommitted
  (
      author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
    , primary key (author, quote(128))
  );

-- We insert but don't immediately commit.
insert into quotes_uncommitted select author, quote, source from quotes;
-- Within the same transaction, a full-text search does not see the uncommitted data.
select count(author), author as "Uncommitted Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------+
| count(author) | Uncommitted Results |
+---------------+---------------------+
|             0 | NULL                |
+---------------+---------------------+
1 row in set (0.00 sec)

-- If the newly inserted rows are rolled back...
rollback;
-- ...then the full-text search still doesn't see them.
select count(author), author as "Rolled-Back Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------+
| count(author) | Rolled-Back Results |
+---------------+---------------------+
|             0 | NULL                |
+---------------+---------------------+
1 row in set (0.00 sec)

-- OK, let's start with some committed data in the table, then empty the table,
-- and then try some FTS queries
-- both before and after the commit.
insert into quotes_uncommitted select author, quote, source from quotes;
commit;
delete from quotes_uncommitted;
select count(author), author as "Deleted but still not committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------------------+
| count(author) | Deleted but still not committed |
+---------------+---------------------------------+
|             0 | NULL                            |
+---------------+---------------------------------+
1 row in set (0.00 sec)

rollback;
select count(author), author as "Deleted and rolled back" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-------------------------+
| count(author) | Deleted and rolled back |
+---------------+-------------------------+
|             4 | Abraham Lincoln         |
+---------------+-------------------------+
1 row in set (0.00 sec)

delete from quotes_uncommitted;
commit;
select count(author), author as "Deleted and committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-----------------------+
| count(author) | Deleted and committed |
+---------------+-----------------------+
|             0 | NULL                  |
+---------------+-----------------------+
1 row in set (0.00 sec)

insert into quotes_uncommitted select author, quote, source from quotes;
commit;
truncate table quotes_uncommitted;
select count(author), author as "Truncated" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-----------+
| count(author) | Truncated |
+---------------+-----------+
|             0 | NULL      |
+---------------+-----------+
1 row in set (0.00 sec)

Multi-Column Searches

Although you can only have one FULLTEXT index in an InnoDB table, that index can apply to multiple columns, allowing you to search when you aren’t sure which column contains the term. With a multi-column index, we can MATCH() against all the columns to find words that appear in any of those columns. Always reference all the same columns in the MATCH() clause as in the FULLTEXT index definition, because the information about which column the words appear in is not included in the full-text search data.

drop table if exists quotes_multi_col;

create table quotes_multi_col
  (
    id int unsigned auto_increment primary key
    , author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(author, quote, source)
  );

insert into quotes_multi_col select * from quotes;
commit;

select author as "Poor 1 (NL)", substr(quote,1,15) as "Poor 2 (NL)", source as "Poor 3 (NL)" from
  quotes_multi_col where match(author, quote, source)
  against ('poor' in natural language mode);
+-------------------+-----------------+-------------------------+
| Poor 1 (NL)       | Poor 2 (NL)     | Poor 3 (NL)             |
+-------------------+-----------------+-------------------------+
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
+-------------------+-----------------+-------------------------+
1 row in set (0.00 sec)

select author as "Poor 1 (BOOL)", substr(quote,1,15) as "Poor 2 (BOOL)", source as "Poor 3 (BOOL)"
  from quotes_multi_col where match(author, quote, source)
  against ('poor' in boolean mode);
+-------------------+-----------------+-------------------------+
| Poor 1 (BOOL)     | Poor 2 (BOOL)   | Poor 3 (BOOL)           |
+-------------------+-----------------+-------------------------+
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
+-------------------+-----------------+-------------------------+
1 row in set (0.00 sec)

select author as "Clarke 1 (NL)", substr(quote,1,15) as "Clarke 2 (NL)", source as "Clarke 3 (NL)"
  from quotes_multi_col where match(author, quote, source)
  against ('clarke' in natural language mode);
+------------------+-----------------+--------------------------------+
| Clarke 1 (NL)    | Clarke 2 (NL)   | Clarke 3 (NL)                  |
+------------------+-----------------+--------------------------------+
| Arthur C. Clarke | Then 10 years a | 2010: The Year We Make Contact |
+------------------+-----------------+--------------------------------+
1 row in set (0.00 sec)

Interaction with Other Indexes

Remember that the design of your primary key index and secondary indexes is a big factor in query performance for InnoDB tables.

  • You can include parts (prefixes) of the full-text column(s) within the primary key.
  • However, that might not be a good idea if (a) the associated columns will ever be updated — which causes an expensive reorganization within the InnoDB table, or (b) if the table will have any other secondary indexes — the primary key values for a row are duplicated in the entry for that row in every secondary index, making index operations require more I/O and memory.
  • As mentioned in Jimmy’s blog post, adding the FULLTEXT index to the table is going to create a new column and associated index in the original table, so you could set up the column and index ahead of time, to avoid table reorganization later.
  • You can use the unique constraint of the primary key or a UNIQUE index to prevent duplicate values or combinations of values from being entered.
  • You can use the not-null constraint of the primary key to prevent blank values or combinations of values from being entered.
  • For the Labs release, the InnoDB FULLTEXT processing isn’t integrated with the MySQL optimizer and its estimates for which index is best to use, so don’t draw conclusions about performance characteristics from this early preview.

Stopwords

Stopwords are typically short, commonly used words that you designate as not significant for a search. They are left out of the FULLTEXT index and ignored when entered in FULLTEXT queries. For example, a search for ‘the’ is unsuccessful because it’s in the default stopword list. For your own customized search, you might create a bigger list (say, with common words from several languages) or a smaller one (for example, a music or movie site where words such as “The” in names and titles are significant). The details about customizing the stopword list are in Jimmy’s blog post.

select count(*), author as "Stopword 1", quote as "Stopword 2", source as "Stopword 3"
  from quotes_multi_col
  where match(author, quote, source) against ('the' in natural language mode);
+----------+------------+------------+------------+
| count(*) | Stopword 1 | Stopword 2 | Stopword 3 |
+----------+------------+------------+------------+
|        0 | NULL       | NULL       | NULL       |
+----------+------------+------------+------------+

PlanetMySQL Voting: Vote UP / Vote DOWN

Improve InnoDB thread scheduling

Июль 26th, 2011

Allow UNDO logs to reside in their own tablespace

Июль 26th, 2011

Introduction

InnoDB stores UNDO log resides in a special system table called the UNDO log. This log is made up of several segments. These segments are called rollback segments. A segment in InnoDB is similar to what a file would be in a file system,e.g., user tables and indexes are also stored as separate segments within the same tablespace,  only their format is different. In that sense there is nothing special about InnoDB UNDO logs. This feature allows storing of the UNDO log across several tablespaces.

Purpose

UNDO logs  contain the before image of modified records. There are two types of UNDO records, one for insert and another for updates. The insert UNDO records can be discarded on transaction rollback. The update records are used for rollback, MVCC and by purge. It is because of purge that we can’t just remove the UNDO log records  once the UNDO logs are not referenced by any running transaction’s snapshot view (for MVCC). When a transaction is started it is assigned a rollback segment in a round robin scheme. Multiple transactions can be assigned the same rollback segment to write their changes. Up to a maximum of 1024 transactions can use a the same rollback segment. If you have more rollback segments then there is less contention around the rollback segment mutex.

Purge

The purge thread(s) run in the background and they read the UNDO log records from oldest to the latest, up to but not including the oldest active snapshot view in the system. It parses the UNDO log records and for entries that represent delete operations it uses the stored index key to search for the records in both the secondary and primary index and purges the entries, modifying the index tree structure if required. Normal DML operations simply delete mark the records but they don’t physically purge them, unless it is an insert that is being rolled back. This is to avoid expensive tree modifying operations in DML code. Once purge is finished with the UNDO entries it then truncates the UNDO log up to where it has processed the entries. For MVCC user transactions we need to follow the DATA_ROLL_PTR pointer to the UNDO log to build a previous version of the row.

Why did we make this change

All this activity results in lots of RANDOM IO. In order to reduce the  of random IO this feature helps by allowing the user to place the UNDO logs anywhere in the file system. This flexibility can be used to place them on an SSD for example.

New configuration variables

  1. innodb_undo_dir – This should be set to the directory where all the configured UNDO tablespaces will be created. Can be an absolute path too. Default is “.”.
  2. innodb_undo_tablespaces – This should be set to the number of UNDO tabespaces that you want to created. Note: This must be set at the time of database creation. Once created this cannot be changed.
  3. innodb_undo_logs (renamed from innodb_rollback_segments) – This should be set to the minimum number of rollback segments that you want. More can be added later but they cannot be removed. However, setting it to a smaller value at runtime will ensure that only the number set will be used when assigning to transactions.

There are a few gotchas here that you should be aware off. The innodb_undo_tablespaces can be set only at the time of database creation and the value must be the same for the lifetime of the database. More UNDO tablespaces cannot be added and existing tablespaces cannot be dropped.  We may add better UNDO tablespace management in the future.

Case 1. 128 segments spread across 128 segments.

  • innodb_undo_logs = 128
  • innodb_undo_tablespaces = 127 (we always create one log in tablespace 0, this is to avoid breaking the system tablespace layout)

Case 2. Start with one rollback segment but may want to create more later and spread them across say 8 UNDO tablespaces

  • innodb_undo_tablespaces = 4
  • innodb_undo_logs = 1

This will create the UNDO tablespaces but they will be empty, the single rollback segment that is created in tablespace 0 will work. Later if you want to add more tablespaces. You will need to shutdown the server and set the innodb_undo_logs variable to the number that you want. Lets say we add 2 more spaces. We would restart the server with the following parameters:

  • innodb_undo_tablespaces = 4
  • innodb_undo_logs = 3  - This will create two new rollback segments in the UNDO tablespace 1 and UNDO tablespace 2. Also this will disable the use of the rollback segment in tablespace 0.

If there were some UNDO records in rollback segment 1 they will be processed as normal. Only change is that no new ones will be written to rollback segment 1. It will be skipped when assigning rollback segments to new transactions.

Future work

Mapping of tablespace id to tablespace name is something that we would like to add so that users can change the path of individual tablespaces instead of all being lumped together in innodb_undo_dir. Managing these extra tablespaces is also something that we think is important especially, the adding and dropping of UNDO tablespaces including better management of the UNDO segments.

Conclusion

As we have seen  writing to the UNDO logs and accessing them can result in a lot of random IO. One way to reduce that overhead is to place the UNDO tablespaces on SSD. This feature allows the placement of the UNDO logs anywhere on the file system in a directory that is controlled by the configuration variable innodb_undo_dir. Note: this change is not backward compatible, databases created with this feature cannot be opened by older versions of InnoDB.  The main problems are that the old code doesn’t open these new system tablespaces before applying REDO and secondly there are some checks that assume space id can only be 0. The file format is not the problem as such.


PlanetMySQL Voting: Vote UP / Vote DOWN

Shortened warm-up times with a preloaded InnoDB buffer pool

Июль 26th, 2011

Are you running an InnoDB installation with a many-gigabytes buffer pool(s)? Does it take too long before it goes back to speed after a restart? If yes, then the following will be interesting to you.

In the latest MySQL 5.6 Labs release we have implemented an InnoDB buffer pool(s) dump and load to solve this problem.

The contents of the InnoDB buffer pool(s) can be saved on disk before MySQL is shutdown and then read up after a restart so that the warm up time is drastically shortened – the buffer pool(s) go to the state they were before the server restart! The time needed for that is roughly the time needed to read data from disk that is about the size of the buffer pool(s).

Lets dive straight into the commands to perform various dump/load operations:

The buffer pool(s) dump can be done at any time when MySQL is running by doing:

  mysql> SET innodb_buffer_pool_dump_now=ON;

This operation is very quick and creates a file named ib_buffer_pool in the InnoDB data directory, by analogy with the default InnoDB data file ibdata1. The same operation can also be done automatically during MySQL shutdown:

  mysql> SET innodb_buffer_pool_dump_at_shutdown=ON;

By analogy the buffer pool(s) can be loaded manually by:

  mysql> SET innodb_buffer_pool_load_now=ON;

and automatically during startup from the file ib_buffer_pool by setting

  innodb_buffer_pool_load_at_startup=ON

in my.cnf or my.ini.

The filename where the buffer pool(s) are dumped and later loaded from can be specified by changing the value of the global configuration variable innodb_buffer_pool_filename which defaults to ib_buffer_pool.

Additionally the progress of the buffer pool(s) dump (although very quick) and load (not so quick) can be monitored by:

  mysql> SHOW STATUS LIKE 'innodb_buffer_pool_dump_status';
  mysql> SHOW STATUS LIKE 'innodb_buffer_pool_load_status';

And finally the load operation can be interrupted by:

  mysql> SET innodb_buffer_pool_load_abort=ON;

Enjoy!


PlanetMySQL Voting: Vote UP / Vote DOWN

Create InnoDB databases with 4k and 8k page sizes without recompiling

Июль 25th, 2011

One of the features found in the summer 2011 labs release is the ability to select the InnoDB page size without recompiling.  Smaller page sizes may be useful for certain storage media such as SSDs where there is no need to minimize seek time between reads.

A new global setting called innodb-page-size can be set to 4k, 8k or 16k before creating a new MySQL instance. This sets the page size for all tablespaces used by that InnoDB instance.   This can be done in my.cnf or on the mysqld command line.  It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace.  That happens when InnoDB does not find ibdata1 in the data directory.  If your system tablespace already exists using one page size and innodb-page-size is something else, the engine will not start.

A few bugs were found and fixed related to smaller page sizes in InnoDB tablespaces. Those bug fixes can also be found in the 2011 summer labs release.

The mysql-test-run can be run like this to test a smaller page size;

perl mysql-test-run.pl –mysqld=–innodb-page-size=4k –suite=innodb –force

Other features of InnoDB work the same with smaller page sizes but some limits are affected.  While each of the 4 row formats; Redundant, Compact, Compressed and Dynamic are supported, the maximum record size is less.  Record length limits are calculated within InnoDB based on a bunch of factors including row type, column type and length, number of columns, secondary indexes, index prefix lengths, and of course, the page size.  Since the main record is stored in a clustered index, a minimum of two records must fit in each page.  So the maximum record length for 8k pages is about half that of 16k pages and the max record size with 4k pages is half that of 8k pages.

If you have a database schema with any large records, you may not be able to use smaller page sizes.  Even if your records do barely fit in the clustered index page, it may not be advisable to use these smaller pages because the btree will be a lot deeper.  For example, if only 2 records fit on the page and there are  1,000,000 records, leaf pages are 20 levels down meaning InnoDB will need to read and search 20 pages to find the leaf page.  If that were on 4k pages, then using the same table on 16k pages would give 8 records per page and the leaf pages would only be 7 levels down.

There is a trick to reducing the size of records on the clustered index page.  VARCHAR fields can be converted to TEXT fields. (And VARBINARY can be converted to BLOB.)  InnoDB tries to put as much of VARCHAR fields in the clustered record as it can.  Normally, it will put as much as 767 bytes which is the index prefix limit on long fields.  But this labs release also contains a feature to increase that limit to 3072 bytes.  So if you have VARCHAR fields that use hundreds or thousands of bytes, they can make the clustered record very long.  TEXT columns, like BLOB columns, are stored in BLOB pages.  Only 20 bytes is needed in the record to link to that data.  Long VARCHAR fields are split at 767 bytes (or 3072 bytes if innodb_large_prefix=ON) and that 20 bytes is added to the clustered record to link to the rest of those fields.  If your VARCHAR(5000) field rarely ever uses more than 767 bytes, it will rarely be split up.  Still, you might get better performance by reducing the depth of the btree if you define long columns as TEXT instead of VARCHAR.

A file extent in InnoDB is 1 Mb independent of the page size.  So an extent will hold 64  16k pages, 128 8k pages and 256 4k pages.  This means that the read ahead mechanisms will read more pages with smaller page sizes since they read a whole extent at a time.  The doublewrite buffer, which is based on the size of an extent, will also contain more pages.

If you want to use smaller page sizes with existing data, export the data first with a logical export utility such as mysqldump.  Then create the new mysql instance with innodb-page-size=4k or 8k and import the data.  Do not use a physical export method such as alter table … discard tablespace.

Summary:

This feature makes it easier to try smaller page sizes in an InnoDB database. Just export your data, move or delete the system database (ibdata1) and the log files (ib_logfile0 & ib_logfile1), set innodb-page-size to either 4k or 8k, and restart MySQL.  A new InnoDB instance will be created with the smaller page size.  Then you can import your data and run your tests, all without recompiling InnoDB.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 5.5: InnoDB Change Buffering

Сентябрь 19th, 2010

To speed up bulk loading of data, InnoDB implements an insert buffer, a special index in the InnoDB system tablespace that buffers modifications to secondary indexes when the leaf pages are not in the buffer pool. Batched merges from the insert buffer to the index pages result in less random access patterns than when updating the pages directly. This speeds up the operation on hard disks.

In MySQL 5.5, the insert buffer has been extended to a change buffer, which covers all modifications of secondary index leaf pages. This will improve the performance of bulk deletes and updates, transaction rollback and the purging of deleted records (reducing the “purge lag”).

To assess the benefits of the extended buffering, you may want to run benchmarks with the settings innodb_change_buffering=all, innodb_change_buffering=inserts, and innodb_change_buffering=none. Users of solid-state storage, where random reads are about as fast as sequential reads, might benefit from disabling the buffering altogether.

Read on to learn how the change buffering works.

Operations on Secondary Indexes

InnoDB can perform three kinds of modifications on secondary index records. If the affected index page is not in the buffer pool, the modifications can be buffered in the change buffer. When an index lookup or scan needs a page that is not in the buffer pool, the page will be read from the tablespace and any buffered changes will be merged to it.

The following operations can modify secondary index pages:

Insert
Inserting a record; supported in all versions of InnoDB
Delete-mark
Marking a record for deletion
Purge
Removing a deleted record that is no longer accessible by active transactions

Before MySQL 5.5, UPDATE, DELETE and purge operations were performed directly on the index pages, resulting in random-access I/O. In MySQL 5.5, all of these operations can be buffered.

Implications of InnoDB Multiversioning

In InnoDB, there are two types of indexes: the clustered index B-tree, where the records are stored in the PRIMARY KEY order, and secondary index B-trees, which identify rows by primary key. InnoDB multiversion concurrency control (MVCC) treats these indexes differently.

Records in the clustered index can be updated in place, and their hidden system columns DB_TRX_ID, DB_ROLL_PTR point to undo log entries from which earlier versions can be reconstructed. InnoDB secondary index records do not contain any system columns, and their data is never updated in place. An UPDATE of an indexed column requires the operations Delete-mark(old), Insert(new) and eventually Purge(old) in the secondary index. An UPDATE of a PRIMARY KEY results in Delete-mark, Insert and eventually Purge in all indexes.

When a secondary index record has been marked for deletion or when the page has been updated by a newer transaction, InnoDB will look up the clustered index record. In the clustered index, it suffices to check the DB_TRX_ID and only retrieve the correct version from the undo log when the record was modified after the reading transaction started.

To Buffer or not to Buffer

When a page is in the buffer pool, it will always be updated directly. When a page is loaded to the buffer pool, any buffered changes will be merged to it, so that users never see unmerged changes.

Because change buffering works on individual leaf pages, we cannot buffer changes that would result into page splits or merges, but must perform such changes on the B-tree pages directly.

The insert buffer bitmap keeps track on the available space on pages and prevents overflows when buffering inserts. Delete-marking records can always be buffered, because the flag will be updated in place. Purging a delete-marked record could result in an empty page, something that we do not allow. We determine the non-emptiness of a page from previously buffered operations on the same page. If there are no previously buffered operations, the purge will have to load the index page to the buffer pool.

InnoDB refuses to buffer an operation when the on-disk change buffer tree would grow bigger than ⅓ of the in-memory buffer pool (innodb_buffer_pool_size). This might be a good rule-of-thumb, but some setups could benefit from the ability of setting the change buffer size independently of the buffer pool size.

Conclusion

The InnoDB change buffer is a persistent data structure and a complex mechanism that comes into play when the workload does not fit in the buffer pool. Because it trades random I/O with a larger amount of sequential I/O, it speeds up operation on hard disks, where random access is much slower than sequential access.

On solid-state storage, there is not much difference between sequential and random access times. Change buffering may still be useful if writes to solid-state storage are expensive, either in terms of speed or the consumption of limited program/erase cycles. Change buffering could reduce the write load on user tablespaces and cause more writes to the system tablespace (which contains the insert buffer) and the redo log. These should be placed on a hard disk.


PlanetMySQL Voting: Vote UP / Vote DOWN