Archive for the ‘myisam’ Category

Getting rid of huge ibdata file, no dump required

Май 22nd, 2012

You have been told (guilty as charged), that the only way to get rid of the huge InnoDB tablespace file (commonly named ibdata1), when moving to innodb_file_per_table, is to do a logical dump of your data, completely erase everything, then import the dump.

To quickly reiterate, you can only delete the ibdata1 file when no InnoDB tables exist. Delete this file with an existing InnoDB table, even a table in its own tablespace, and nothing ever works anymore.

The problem with the dump-based solution

The impact of doing a logical dump is often overwhelming. Well, the dump may be tolerable, but the restore is much longer. The real pain is that you can't do this one table at a time: you have to destroy everything before dropping the ibdata1 file; you then have to import everything.

Perhaps the most common scenario is that we do the changes on a slave, so as not to completely shut down our database. This is nice; no one is aware of the shutdown process. However, Huston, we have a problem: we need to make sure we can keep up the binary logs on the master for the duration of the entire process.

A semi-solution for binary logs

You may get by by keeping the SQL_IO_THREAD running on the slave while dump is taken (SQL thread is better turned off). If you're careful, you could do the same after restarting the database: you should still be able to acquire relay logs. With row based replication becoming more common, the problem of binary logs disk space returns: the logs (rather, log entries) are just so much larger!

Either way, the process can takes long days, at the end of which your slave is up, but lags for long days behind.

Wishful thought: do it one table at a time

If we could do it one table at a time, and assuming our dataset is fairly split among several tables (i.e. not all of our 500GB of data is in one huge table), life would be easier: we could work on a single table, resume replication, let the slave catch up, then do the same for the next table.

How? Didn't we just say one can only drop the ibdata1 file when no InnoDB tables exist?

Solution: do it one table at a time

I'm going to illustrate what seems like a longer procedure. I will later show why it is not, in fact, longer.

The idea is to first convert all your tables to MyISAM (Yay! A use for MyISAM!). That is, convert your tables one table at a time, using normal ALTER TABLE t ENGINE=MyISAM.

Please let go of the foreign keys issue right now. I will address it later, there's a lot to be addressed.

So, on a slave:

  1. STOP SLAVE
  2. One ALTER TABLE ... ENGINE=MyISAM
  3. START SLAVE again
  4. Wait for slave catch up
  5. GOTO 1

What do we end up with? A MyISAM only database. What do we do with it? Why, convert it back to InnoDB, of course!

But, before that, we:

  1. Shut MySQL down
  2. Delete ibdata1 file, ib_logfile[01] (i.e. delete all InnoDB files)
  3. Start MySQL

A new ibdata1 file, and new transaction log files will be created. Note: the new ibdata1 file is small. Mission almost accomplished.

We then:

  1. STOP SLAVE
  2. Do one ALTER TABLE ... ENGINE=InnoDB [ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8 ...]
  3. START SLAVE again
  4. Wait for slave catch up
  5. GOTO 1

What do we end up with? An InnoDB only database, with true file per table, and a small ibdata1 file. Space recovered!

The advantage of this method

The thing is, we resume replication after each table alteration. This means breaking the lag period into many smaller periods. While the total runtime does not reduce, we do reduce the maximum lag time. And this makes for easier recovery: no need to store multitudes of binary logs!

So what about the foreign keys?

Phew. Continued next post.


PlanetMySQL Voting: Vote UP / Vote DOWN

Performance improvements for big INFORMATION_SCHEMA tables

Май 4th, 2012
A short while after I fixed the legacy bug that prevented temporary MyISAM tables from using the dynamic record format, I got an email from Davi Arnaut @ Twitter. It turned out that Twitter needed to fix the very same problem, but for the case when INFORMATION_SCHEMA temporary tables use MyISAM.

In short, INFORMATION_SCHEMA tables provide access to database metadata. Despite their name, they are more like views than tables: when you query them, relevant data is gathered from the dictionary and other server internals, not from tables. The gathered data is stored in a temporary table (memory or MyISAM depending on size) and then returned to the user.

The reason Davi emailed me was to let me know that he had further improved the fix for temporary MyISAM tables to also enable the use of dynamic record format for INFORMATION_SCHEMA tables. I usually don't have huge databases on my development box so the problem of querying metadata had gone unnoticed. But it turns out that Davi and his colleagues at Twitter do deal with massive amounts of data :-)

This was one of the queries that caused problems:

SELECT pool_id FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE limit 1;

Read more »

PlanetMySQL Voting: Vote UP / Vote DOWN

Dedicated table for counters

Апрель 23rd, 2012

There are a few ways to implement counters. Even though it’s not a complex feature, often I see people having problems around it. This post describes how bad implementation can impact both application and MySQL performance and how to improve it.

A customer asked me for help with performance problem they were facing. I logged into their database and found many client connections waiting for table locks. Almost all threads were stuck on one, small table called hits. What was the reason?

The problem was related to the way they developed a very simple system for counting page views they later used in some reporting. The table structure was:

mysql> SHOW CREATE TABLE hits\G
*************************** 1. row ***************************
Table: hits
Create Table: CREATE TABLE `hits` (
`cnt` int(11) NOT NULL
) ENGINE=MyISAM

mysql> SELECT * FROM hits;
+---------+
| cnt     |
+---------+
| 3823273 |
+---------+

The application was updating cnt column on every page load, or sometimes even multiple times per page load (although that was actually a bug), with such obvious query:
UPDATE hits SET cnt = cnt + 1;

It may not scale with traffic. MyISAM storage engine uses table level locking, which means that every UPDATE sets an exclusive lock on the table and forces any other threads accessing the same table concurrently to wait for the lock to be released. When traffic peaks beyond certain threshold, the cascade of locks may simply prevent the application from working at all. This is what happened to the customer during a traffic spike coming from some marketing campaign they launched.

Switching to InnoDB may seem like a natural choice, but does it really help here? Not so much. At the very least, it may not be enough.

InnoDB implements row-level locking mechanism which is better as locks are set on smaller chunks of data – individual records rather than entire tables. It means that concurrent queries updating different rows in the same table would not block each other, unlike in MyISAM. However, the table contains only one row, so that particular advantage does really exist there.

When row-level locking might become a bottleneck, why not just use more rows that could be updated instead of just one?

CREATE TABLE `hits` (
`id` tinyint(4) NOT NULL,
`cnt` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB

This enables multiple counters, each of which can be incremented independently of others, and what’s more important also concurrently.

mysql> SELECT * FROM hits;
+------+------+
| id   | cnt  |
+------+------+
|    1 |   10 |
|    2 |    8 |
|    3 |    9 |
|    4 |    8 |
+------+------+

How to use the updated structure?

The technique is pretty much the same as used before, but with a small modification in the application code – it has to randomly choose a row that gets updated though the value of id column.
UPDATE hits SET cnt = cnt + 1 WHERE id = 3;

Of course, with such modification the way data is read has to change as well.

mysql> SELECT SUM(cnt) FROM hits;
+----------+
| SUM(cnt) |
+----------+
|       35 |
+----------+

Even though MySQL has to perform more work, the solution can scale better than the original design.

Often using one table per “counter” will not be so good idea, but of course there’s no problem extending this further. For example:

CREATE TABLE `hits` (
`slug` VARCHAR(32) NOT NULL,
`slug_id` tinyint(4) NOT NULL,
`cnt` int(11) NOT NULL,
PRIMARY KEY (`slug`, `slug_id`)
) ENGINE=InnoDB

Some counters could have only one row, while those updated more frequently could have multiple.


PlanetMySQL Voting: Vote UP / Vote DOWN

Copying unused bytes is bad (duh!)

Апрель 18th, 2012
Last summer my colleague Marko Mäkelä committed this seemingly innocent performance fix for InnoDB in MySQL 5.6:

3581 Marko Makela    2011-08-10
Bug#12835650 VARCHAR maximum length performance impact

row_sel_field_store_in_mysql_format(): Do not pad the unused part of
the buffer reserved for a True VARCHAR column (introduced in 5.0.3).
Add Valgrind instrumentation ensuring that the unused part will be
flagged uninitialized.

Before this, buffers which were used to send VARCHARs from InnoDB to the MySQL server were padded with 0s if the string was shorter than specified by the column. If, e.g., the string "foo" was stored in a VARCHAR(8), InnoDB used to write "3foo00000" to the buffer (the first character - 3 - determines the actual length of the string). However, even though these trailing bytes are not used anywhere, writing 0s to the buffer certainly has a cost. Hence the fix which stops InnoDB from writing them.

Oh my were we up for a ride! All of a sudden Valgrind started barking like this (and similar) for a number of tests:
Thread 19:
Syscall param pwrite64(buf) points to uninitialised byte(s)
at 0x381C60EEE3: ??? (in /lib64/libpthread-2.12.so)
by 0x84B703: my_pwrite (my_pread.c:162) by 0x86FE6F: mi_nommap_pwrite (mysql_file.h:1203)
   by 0x88621B: _mi_write_static_record (mi_statrec.c:65)
   by 0x889592: mi_write (mi_write.c:142)
   by 0x532085: handler::ha_write_row(unsigned char*) (handler.cc:6043)
   by 0x69B7F7: end_write(JOIN*, st_join_table*, bool) (sql_select.cc:20237)
   by 0x69A43C: evaluate_join_record(JOIN*, st_join_table*, int)
(sql_select.cc:19187)
   by 0x69A8E0: sub_select(JOIN*, st_join_table*, bool) (sql_select.cc:19294)
The modifications we did to the server to remedy the problems can be divided into two categories as described below. In the end, the result was significantly improved performance.

Read more »

PlanetMySQL Voting: Vote UP / Vote DOWN

Auto caching tables

Март 6th, 2012

Is there a way to create a caching table, some sort of a materialized view, such that upon selecting from that table, its data is validated/invalidated?

Hint: yes.

But to elaborate the point: say I have some table data_table. Can I rewrite all my queries which access data_table to read from some autocache_data_table, but have nothing changed in the query itself? No caveats, no additional WHEREs, and still have that autocache_data_table provide with the correct data, dynamically updated by some rule of our choice?

And: no crontab, no event scheduler, and no funny triggers on data_table? In such way that invalidation/revalidation occurs upon SELECT?

Well, yes.

This post is long, but I suggest you read it through to understand the mechanism, it will be worthwhile.

Background

The following derives from my long research on how to provide better, faster and safer access to INFORMATION_SCHEMA tables. It is however not limited to this exact scenario, and in this post I provide with a simple, general purpose example. I'll have more to share about INFORMATION_SCHEMA specific solutions shortly.

I was looking for a server side solution which would not require query changes, apart from directing the query to other tables. Solution has to be supported by all standard MySQL installs; so: no plugins, no special rebuilds.

Sample data

I'll explain by walking through the solution. Let's begin with some sample table:

CREATE TABLE sample_data (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  dt DATETIME,
  msg VARCHAR(128) CHARSET ascii
);

INSERT INTO sample_data VALUES (1, NOW(), 'sample txt');
INSERT INTO sample_data VALUES (2, NOW(), 'sample txt');
INSERT INTO sample_data VALUES (3, NOW(), 'sample txt');

SELECT * FROM sample_data;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

In this simplistic example, I wish to create a construct which looks exactly like sample_data, but which caches data according to some heuristic. It will, in fact, cache the entire content of sample_data.

That much is not a problem: just create another table to cache the data:

CREATE TABLE cache_sample_data LIKE sample_data;

The big question is: how do you make the table invalidate itself while SELECTing from it?

Here's the deal. I'll ask for your patience while I draw the outline, and start with failed solutions. By the end, everything will work.

Failed attempt: purge rows from the table even while reading it

My idea is to create a stored function which purges the cache_sample_data table, then fills in with fresh data, according to some heuristic. Something like this:

DELIMITER $$

CREATE FUNCTION `revalidate_cache_sample_data`() RETURNS tinyint unsigned
    MODIFIES SQL DATA
    DETERMINISTIC
    SQL SECURITY INVOKER
BEGIN
  if(rand() > 0.1) then
    return 0; -- simplistic heuristic
  end if;

  DELETE FROM cache_sample_data;
  INSERT INTO cache_sample_data SELECT * FROM sample_data;
  RETURN 0;
END $$

DELIMITER ;

So the function uses some heuristic. It's a funny RAND() in our case; you will want to check up on time stamps, or some flags, what have you. But this is not the important part here, and I want to keep the focus on the main logic.

Upon deciding the table needs refreshing, the function purges all rows, then copies everything from sample_data. Sounds fair enough?

Let's try and invoke it. Just write some query by hand:

mysql> SELECT revalidate_cache_sample_data();
+--------------------------------+
| revalidate_cache_sample_data() |
+--------------------------------+
|                              0 |
+--------------------------------+

mysql> SELECT revalidate_cache_sample_data();
+--------------------------------+
| revalidate_cache_sample_data() |
+--------------------------------+
|                              0 |
+--------------------------------+

mysql> SELECT revalidate_cache_sample_data();
+--------------------------------+
| revalidate_cache_sample_data() |
+--------------------------------+
|                              1 |
+--------------------------------+

First two invocations - nothing. The third one indicated a revalidation of cache data. Let's verify:

mysql> SELECT * FROM cache_sample_data;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

OK, seems like the function works.

We now gather some courage, and try combining calling to this function even while SELECTing from the cache table, like this:

SELECT
  cache_sample_data.*
FROM
  cache_sample_data,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

To explain what happens in the above query, consider its programmatic nature: we create a derived table, populated by the function's result. That means the function is invoked in order to generate the derived table. The derived table itself must be materialized before the query begins execution, and so it is that we first invoke the function, then make the SELECT.

Don't open the champaign yet. While the above paragraph is correct, we are deceived: in this last invocation, the function did not attempt a revalidation. The RAND() function just didn't provide with the right value.

Let's try again:

SELECT
  cache_sample_data.*
FROM
  cache_sample_data,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;
ERROR 1442 (HY000): Can't update table 'cache_sample_data' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.

Aha! Bad news. The MySQL manual says on Restrictions on Stored Programs:

A stored function or trigger cannot modify a table that is already being used (for reading or writing) by the statement that invoked the function or trigger.

Anyone to the rescue?

I was quite upset. Can we not make this work? At sorrow times like these, one reflects back on words of wiser people. What would Roland Bouman say on this?

Oh, yes; he would say: "we can use a FEDERATED table which connect onto itself, thus bypass the above restriction".

Unfortunately, FEDERATED is by default disabled nowadays; I cannot rely on its existence. Besides, to use FEDERATED one has to fill in passwords and stuff. Definitely not an out-of-the-box solution in this case.

Few more days gone by. Decided the problem cannot be solved. And then it hit me.

MyISAM to the rescue

MyISAM? Really?

Yes, and not only MyISAM, but also its cousin: it's long abandoned cousin, forgotten once views and partitions came into MySQL. MERGE.

MERGE reflects the data contained within MyISAM tables. Perhaps the most common use for MERGE is to work out partitioned-like table of records, with MyISAM table-per month, and an overlooking MERGE table dynamically adding and removing tables from its view.

But I intend for MERGE a different use: just be an identical reflection of cache_sample_data.

So we must work out the following:

ALTER TABLE cache_sample_data ENGINE=MyISAM;
CREATE TABLE cache_sample_data_wrapper LIKE cache_sample_data;
ALTER TABLE cache_sample_data_wrapper ENGINE=MERGE UNION=(cache_sample_data);

I just want to verify the new table is setup correctly:

mysql> SELECT * FROM cache_sample_data_wrapper;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

Seems fine.

So the next step is what makes the difference: the two tables are not the same. One relies on the other, but they are distinct. Our function DELETEs from and INSERTs to cached_sample_data, but it does not affect, nor lock, cache_sample_data_wrapper.

We now rewrite our query to read:

SELECT
  cache_sample_data_wrapper.*
FROM
  cache_sample_data_wrapper,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;

This query is perfectly valid. It works. To illustrate, I do:

-- Try this a few times till RAND() is lucky:

TRUNCATE cache_sample_data;

SELECT
  cache_sample_data_wrapper.*
FROM
  cache_sample_data_wrapper,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

Whoa! Where did all this data come from? Didn't we just TRUNCATE the table?

The query worked. The function re-populated cache_sample_data.

The final touch

Isn't the above query just beautiful? I suppose not many will share my opinion. What happened to my declaration that "the original query need not be changed, apart from querying a different table"?

Yes, indeed. It's now time for the final touch. There's nothing amazing in this step, but we all know the way it is packaged is what makes the sale. We will now use views. We use two of them since a view must not contain a subquery in the FROM clause. Here goes:

CREATE OR REPLACE VIEW revalidate_cache_sample_data_view AS
  SELECT revalidate_cache_sample_data()
;

CREATE OR REPLACE VIEW autocache_sample_data AS
  SELECT
    cache_sample_data_wrapper.*
  FROM
    cache_sample_data_wrapper,
    revalidate_cache_sample_data_view
;

And finally, we can make a very simple query like this:

SELECT * FROM autocache_sample_data;
--
-- Magic in work now!
--
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

Much as we would query the original sample_data table.

Summary

So what have we got? A stored routine, a MyISAM table, a MERGE table and two views. Quite a lot of constructs just to cache a table! But a beautiful cache access: plain old SQL queries. The flow looks like this:

Our cache table is a MyISAM table. It can get corrupted, which is bad. But not completely bad: it's nothing more than a cache; we can throw away its entire data, and revalidate. We can actually ask the function to revalidate (say, pass a parameter).


PlanetMySQL Voting: Vote UP / Vote DOWN

Auto caching tables

Март 6th, 2012

Is there a way to create a caching table, some sort of a materialized view, such that upon selecting from that table, its data is validated/invalidated?

Hint: yes.

But to elaborate the point: say I have some table data_table. Can I rewrite all my queries which access data_table to read from some autocache_data_table, but have nothing changed in the query itself? No caveats, no additional WHEREs, and still have that autocache_data_table provide with the correct data, dynamically updated by some rule of our choice?

And: no crontab, no event scheduler, and no funny triggers on data_table? In such way that invalidation/revalidation occurs upon SELECT?

Well, yes.

This post is long, but I suggest you read it through to understand the mechanism, it will be worthwhile.

Background

The following derives from my long research on how to provide better, faster and safer access to INFORMATION_SCHEMA tables. It is however not limited to this exact scenario, and in this post I provide with a simple, general purpose example. I'll have more to share about INFORMATION_SCHEMA specific solutions shortly.

I was looking for a server side solution which would not require query changes, apart from directing the query to other tables. Solution has to be supported by all standard MySQL installs; so: no plugins, no special rebuilds.

Sample data

I'll explain by walking through the solution. Let's begin with some sample table:

CREATE TABLE sample_data (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  dt DATETIME,
  msg VARCHAR(128) CHARSET ascii
);

INSERT INTO sample_data VALUES (1, NOW(), 'sample txt');
INSERT INTO sample_data VALUES (2, NOW(), 'sample txt');
INSERT INTO sample_data VALUES (3, NOW(), 'sample txt');

SELECT * FROM sample_data;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

In this simplistic example, I wish to create a construct which looks exactly like sample_data, but which caches data according to some heuristic. It will, in fact, cache the entire content of sample_data.

That much is not a problem: just create another table to cache the data:

CREATE TABLE cache_sample_data LIKE sample_data;

The big question is: how do you make the table invalidate itself while SELECTing from it?

Here's the deal. I'll ask for your patience while I draw the outline, and start with failed solutions. By the end, everything will work.

Failed attempt: purge rows from the table even while reading it

My idea is to create a stored function which purges the cache_sample_data table, then fills in with fresh data, according to some heuristic. Something like this:

DELIMITER $$

CREATE FUNCTION `revalidate_cache_sample_data`() RETURNS tinyint unsigned
    MODIFIES SQL DATA
    DETERMINISTIC
    SQL SECURITY INVOKER
BEGIN
  if(rand() > 0.1) then
    return 0; -- simplistic heuristic
  end if;

  DELETE FROM cache_sample_data;
  INSERT INTO cache_sample_data SELECT * FROM sample_data;
  RETURN 0;
END $$

DELIMITER ;

So the function uses some heuristic. It's a funny RAND() in our case; you will want to check up on time stamps, or some flags, what have you. But this is not the important part here, and I want to keep the focus on the main logic.

Upon deciding the table needs refreshing, the function purges all rows, then copies everything from sample_data. Sounds fair enough?

Let's try and invoke it. Just write some query by hand:

mysql> SELECT revalidate_cache_sample_data();
+--------------------------------+
| revalidate_cache_sample_data() |
+--------------------------------+
|                              0 |
+--------------------------------+

mysql> SELECT revalidate_cache_sample_data();
+--------------------------------+
| revalidate_cache_sample_data() |
+--------------------------------+
|                              0 |
+--------------------------------+

mysql> SELECT revalidate_cache_sample_data();
+--------------------------------+
| revalidate_cache_sample_data() |
+--------------------------------+
|                              1 |
+--------------------------------+

First two invocations - nothing. The third one indicated a revalidation of cache data. Let's verify:

mysql> SELECT * FROM cache_sample_data;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

OK, seems like the function works.

We now gather some courage, and try combining calling to this function even while SELECTing from the cache table, like this:

SELECT
  cache_sample_data.*
FROM
  cache_sample_data,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

To explain what happens in the above query, consider its programmatic nature: we create a derived table, populated by the function's result. That means the function is invoked in order to generate the derived table. The derived table itself must be materialized before the query begins execution, and so it is that we first invoke the function, then make the SELECT.

Don't open the champaign yet. While the above paragraph is correct, we are deceived: in this last invocation, the function did not attempt a revalidation. The RAND() function just didn't provide with the right value.

Let's try again:

SELECT
  cache_sample_data.*
FROM
  cache_sample_data,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;
ERROR 1442 (HY000): Can't update table 'cache_sample_data' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.

Aha! Bad news. The MySQL manual says on Restrictions on Stored Programs:

A stored function or trigger cannot modify a table that is already being used (for reading or writing) by the statement that invoked the function or trigger.

Anyone to the rescue?

I was quite upset. Can we not make this work? At sorrow times like these, one reflects back on words of wiser people. What would Roland Bouman say on this?

Oh, yes; he would say: "we can use a FEDERATED table which connect onto itself, thus bypass the above restriction".

Unfortunately, FEDERATED is by default disabled nowadays; I cannot rely on its existence. Besides, to use FEDERATED one has to fill in passwords and stuff. Definitely not an out-of-the-box solution in this case.

Few more days gone by. Decided the problem cannot be solved. And then it hit me.

MyISAM to the rescue

MyISAM? Really?

Yes, and not only MyISAM, but also its cousin: it's long abandoned cousin, forgotten once views and partitions came into MySQL. MERGE.

MERGE reflects the data contained within MyISAM tables. Perhaps the most common use for MERGE is to work out partitioned-like table of records, with MyISAM table-per month, and an overlooking MERGE table dynamically adding and removing tables from its view.

But I intend for MERGE a different use: just be an identical reflection of cache_sample_data.

So we must work out the following:

ALTER TABLE cache_sample_data ENGINE=MyISAM;
CREATE TABLE cache_sample_data_wrapper LIKE cache_sample_data;
ALTER TABLE cache_sample_data_wrapper ENGINE=MERGE UNION=(cache_sample_data);

I just want to verify the new table is setup correctly:

mysql> SELECT * FROM cache_sample_data_wrapper;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

Seems fine.

So the next step is what makes the difference: the two tables are not the same. One relies on the other, but they are distinct. Our function DELETEs from and INSERTs to cached_sample_data, but it does not affect, nor lock, cache_sample_data_wrapper.

We now rewrite our query to read:

SELECT
  cache_sample_data_wrapper.*
FROM
  cache_sample_data_wrapper,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;

This query is perfectly valid. It works. To illustrate, I do:

-- Try this a few times till RAND() is lucky:

TRUNCATE cache_sample_data;

SELECT
  cache_sample_data_wrapper.*
FROM
  cache_sample_data_wrapper,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

Whoa! Where did all this data come from? Didn't we just TRUNCATE the table?

The query worked. The function re-populated cache_sample_data.

The final touch

Isn't the above query just beautiful? I suppose not many will share my opinion. What happened to my declaration that "the original query need not be changed, apart from querying a different table"?

Yes, indeed. It's now time for the final touch. There's nothing amazing in this step, but we all know the way it is packaged is what makes the sale. We will now use views. We use two of them since a view must not contain a subquery in the FROM clause. Here goes:

CREATE OR REPLACE VIEW revalidate_cache_sample_data_view AS
  SELECT revalidate_cache_sample_data()
;

CREATE OR REPLACE VIEW autocache_sample_data AS
  SELECT
    cache_sample_data_wrapper.*
  FROM
    cache_sample_data_wrapper,
    revalidate_cache_sample_data_view
;

And finally, we can make a very simple query like this:

SELECT * FROM autocache_sample_data;
--
-- Magic in work now!
--
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+

Much as we would query the original sample_data table.

Summary

So what have we got? A stored routine, a MyISAM table, a MERGE table and two views. Quite a lot of constructs just to cache a table! But a beautiful cache access: plain old SQL queries. The flow looks like this:

Our cache table is a MyISAM table. It can get corrupted, which is bad. But not completely bad: it's nothing more than a cache; we can throw away its entire data, and revalidate. We can actually ask the function to revalidate (say, pass a parameter).


PlanetMySQL Voting: Vote UP / Vote DOWN

FictionPress Selects TokuDB for Consistent Performance and Fast Disaster Recovery

Январь 3rd, 2012

FictionPress

Issues addressed:

  • Support complex and efficient indexes at 100+ million rows.
  • Predicable and consistent performance regardless of data size growth.
  • Fast recovery.

Ensuring Predictable Performance at Scale

The Company:  FictionPress operates both FictionPress.com and FanFiction.net and is home to over 6 million works of fiction, with millions of writers/readers participating from around the world in over 30 languages

The Challenge: FictionPress offers a number of interactive features to its large user base. These include discussion forums, in-site messaging and user reviews. FictionPress made the decision to build its own discussion forums to meet its strict security and performance requirements. Xing Li, CTO of FictionPress, noted that the site “needs to host hundreds of thousands of forums. Existing forum software doesn’t do this while meeting our performance and security targets.”

To ensure the real-time responsiveness of the forums, FictionPress needs the ability to create and efficiently maintain complex indexes and be able to support millions of small rows. In addition, it needs the ability to index them with minimal impact to resource costs and performance. “The only way to make this all work and provide a good customer experience is to guarantee that we can deliver a flat predictable performance with our database back-end even as the number of rows crosses the 100 million mark,” according to Li.

FictionPress considered InnoDB, the default storage engine for MySQL, but it did not offer predictable performance at scale. Indexes became dramatically slower as the number of rows increased, causing a reduction of both read and write performance. InnoDB also did not offer the performance-enhancing feature of multiple clustering indexes.

The Solution:  FictionPress uses MariaDB and TokuDB to manage its discussion forums, reviews, and in-site messaging systems.

FictionPress installed TokuDB in a Linux environment with dedicated hardware. Each configuration has a single master with multiple read slaves. “TokuDB’s high write concurrency and support for multiple clustering indexes gave us the freedom to design and deploy better performing queries at scale,” according to Li. This was important to FictionPress as its environment is continually expanding.

The Benefits:

Predictable Performance: “While raw performance is important, the predictability of response time as one scales the system was our focal point” according to Li. “InnoDB can only have one clustering index, but TokuDB gives you basically an unlimited number. In addition, both MyISAM and InnoDB slow down with many indexes on databases of our size. MyISAM also causes replication lag due to concurrency. In the end, TokuDB gives us predictability, performance at scale, and more flexible indexing without the limitations found in other MySQL options.”

Cost: “To get additional performance, one can always throw hardware at the problem,” according to Li. “By utilizing TokuDB instead we improved scalability and at the same time saved on costs for additional server hardware that would have been required if TokuDB was not in the picture. In addition, we saw an 8x size reduction in disk space compared to MyISAM due to improved compression. The hardware cost saving made moving to TokuDB an easy decision.”

Crash Recovery: FictionPress had been using MyISAM initially. “We needed a replacement for MyISAM for small BLOB data,” according to Li. “In fact, we wanted to move away from MyISAM whenever possible to shorten its long crash recovery. InnoDB was an option but TokuDB offered better compression and a smaller storage footprint for both core data and index data for our own data sets.”

Hot Schema Changes: “For performance reasons we need a lot of indexes but also need to add and maintain these indexes quickly,” according to Li. “TokuDB is the only MySQL solution I found that offers Hot Schema changes such as Hot Indexing. Hot Schema changes are a powerful capability which we use to minimize downtime during system-wide upgrades and shorten our application/schema development cycle.”

 

 


PlanetMySQL Voting: Vote UP / Vote DOWN

Self throttling MySQL queries

Ноябрь 1st, 2011

Recap on the problem:

  • A query takes a long time to complete.
  • During this time it makes for a lot of I/O.
  • Query's I/O overloads the db, making for other queries run slow.

I introduce the notion of self-throttling queries: queries that go to sleep, by themselves, throughout the runtime. The sleep period means the query does not perform I/O at that time, which then means other queries can have their chance to execute.

I present two approaches:

  • The naive approach: for every 1,000 rows, the query sleep for 1 second
  • The factor approach: for every 1,000 rows, the query sleeps for the amount of time it took to iterate those 1,000 rows (effectively doubling the total runtime of the query).

Sample query

We use a simple, single-table scan. No aggregates (which complicate the solution considerably).

SELECT
  rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental
;

The naive solution

We need to know every 1,000 rows. So we need to count the rows. We do that by using a counter, as follows:

SELECT
  rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days,
  @row_counter := @row_counter + 1
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;

A thing that bothers me, is that I wasn't asking for an additional column. I would like the result set to remain as it were; same result structure. We also want to sleep for 1 second for each 1,000 rows. So we merge the two together along with one of the existing columns, like this:

SELECT
  rental_id +
    IF(
      (@row_counter := @row_counter + 1) % 1000 = 0,
      SLEEP(1), 0
    ) AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;

To remain faithful to my slides, I rewrite as follows, and this is the naive solution:

SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) % 1000 = 0 THEN SLEEP(1)
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;

The WHEN clause always returns 0, so it does not affect the value of rental_id.

The factor approach

In the factor approach we wish to keep record of query execution, every 1,000 rows. I introduce a nested WHEN statement which updates time records. I rely on SYSDATE() to return the true time, and on NOW() to return query execution start time.

SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) IS NULL THEN NULL
      WHEN @row_counter % 1000 = 0 THEN
        CASE
          WHEN (@time_now := SYSDATE()) IS NULL THEN NULL
          WHEN (@time_diff := (TIMESTAMPDIFF(SECOND, @chunk_start_time, @time_now))) IS NULL THEN NULL
          WHEN SLEEP(@time_diff) IS NULL THEN NULL
          WHEN (@chunk_start_time := SYSDATE()) IS NULL THEN NULL
          ELSE 0
        END
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter,
  (SELECT @chunk_start_time := NOW()) sel_chunk_start_time
;

Proof

How can we prove that the queries do indeed work?

We can see if the total runtime sums up to the number of sleep calls, in seconds; but how do we know that sleeps do occur at the correct times?

A solution I offer is to use a stored routines which logs to a MyISAM table (a non transactional table) the exact time (using SYSDATE()) and value per row. The following constructs are introduced:

CREATE TABLE test.proof(
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  dt DATETIME NOT NULL,
  msg VARCHAR(255)
) ENGINE=MyISAM;

DELIMITER $$
CREATE FUNCTION test.prove_it(message VARCHAR(255)) RETURNS TINYINT
DETERMINISTIC
MODIFIES SQL DATA
BEGIN
  INSERT INTO test.proof (dt, msg) VALUES (SYSDATE(), message); RETURN 0;
END $$
DELIMITER ;

The prove_it() function records the immediate time and a message into the MyISAM table, which immediately accepts the write, being non-transactional. It returns with 0, so we will now embed it within the query. Of course, the function itself incurs some overhead, but it will nevertheless convince you that SLEEP()s do occur at the right time!

SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) IS NULL THEN NULL
      WHEN @row_counter % 1000 = 0 THEN
        CASE
          WHEN (@time_now := SYSDATE()) IS NULL THEN NULL
          WHEN (@time_diff := (TIMESTAMPDIFF(SECOND, @chunk_start_time, @time_now))) IS NULL THEN NULL
          WHEN SLEEP(@time_diff) + test.prove_it(CONCAT('will sleep for ', @time_diff, ' seconds')) IS NULL THEN NULL
          WHEN (@chunk_start_time := SYSDATE()) IS NULL THEN NULL
          ELSE 0
        END
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter,
  (SELECT @chunk_start_time := NOW()) sel_chunk_start_time
;

mysql> SELECT * FROM test.proof;
+----+---------------------+--------------------------+
| id | dt                  | msg                      |
+----+---------------------+--------------------------+
|  1 | 2011-11-01 09:22:36 | will sleep for 1 seconds |
|  2 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  3 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  4 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  5 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  6 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  7 | 2011-11-01 09:22:38 | will sleep for 1 seconds |
|  8 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
|  9 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 10 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 11 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 12 | 2011-11-01 09:22:40 | will sleep for 1 seconds |
| 13 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
| 14 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
| 15 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
+----+---------------------+--------------------------+

The above query is actually very fast. Try adding BENCHMARK(1000,ENCODE('hello','goodbye')) to rental_id so as to make it slower, or just use it on a really large table, see what happens (this is what I actually used to make the query run for several seconds in the example above).

Observant reads will note that the "will sleep..." message actually gets written after the SLEEP() call. I leave this as it is.

Another very nice treat of the code is that you don't need sub-second resolution for it to work. If you look at the above, we don't actually go to sleep every 1,000 rows (1,000 is just too quick in the query -- perhaps I should have used 10,000 seconds). But we do make it once a second has elapsed. Which means it works correctly on average. Of course, the entire discussion is only of interest when a query executes for a substantial number of seconds, so this is just an anecdote.

And the winner is...

Wow, this contest was anything but popular. Marc Alff is the obvious winner: he is the only one to suggest a solution :)

But Marc uses a very nice trick: he reads the PERFORMANCE_SCHEMA. Now, I'm not sure how the PERFORMANCE_SCHEMA gets updated. I know that the INFORMATION_SCHEMA.GLOBAL_STATUS table does not get updated by a query until the query completes (so you cannot expect a change in innodb_rows_read throughout the execution of the query). I just didn't test it (homework, anyone?). If it does get updated, then we can throttle the query based on InnoDB page reads using a simple query. Otherwise, an access to /proc/diskstats is possible, assuming no apparmor or SELinux are blocking us.

Marc also uses a stored function, which is the clean way of doing it; however I distrust the overhead incurred by s stored routine and prefer my solution (which is, admittedly, not a pretty SQL sight!).

Happy throttling!


PlanetMySQL Voting: Vote UP / Vote DOWN

MyISAM Key Buffer Usage

Июль 12th, 2011
For MyISAM one of the most important variables is the Key Buffer.  The Key Buffer is sometimes called the Key Cache. It's used as a buffer for the indices of MyISAM tables. There is some overhead in the buffer depending on the configured key block size.

The official way to calculate the key buffer usage as documented in the MySQL Reference manual:
1 - ((Key_blocks_unused * key_cache_block_size) / key_buffer_size)
 This will return the factor, so you have to multiply it with 100 to get the percentage. The Key_blocks_unused is used instead of the more obvious Key_blocks_used. This is due to the fact that Key_blocks_used is the maximum number of key blocks ever used. It will not return to 0 after a FLUSH TABLES.

This calculation does not take the overhead in account. The key buffer efficiency can be calculated if the key buffer is empty or (has been) completely full.

If the the key buffer is full:
key_buffer_coefficient = key_cache_block_size/(key_buffer_size/Key_blocks_used)

If the the key buffer is empty:
key_buffer_coefficient = key_cache_block_size/(key_buffer_size/Key_blocks_unused)

The 'fixed' formula:
1 - ((Key_blocks_unused * key_cache_block_size) / (key_buffer_size * key_buffer_coefficient))

The result is shown below:

The overhead in the key buffer will be allocated on startup, so the larger overhead for a larger key buffer will result in more memory usage, even if MyISAM is only used for the internal mysql tables.

PlanetMySQL Voting: Vote UP / Vote DOWN

MyISAM Key Buffer Usage

Июль 12th, 2011
For MyISAM one of the most important variables is the Key Buffer.  The Key Buffer is sometimes called the Key Cache. It's used as a buffer for the indices of MyISAM tables. There is some overhead in the buffer depending on the configured key block size.

The official way to calculate the key buffer usage as documented in the MySQL Reference manual:
1 - ((Key_blocks_unused * key_cache_block_size) / key_buffer_size)
 This will return the factor, so you have to multiply it with 100 to get the percentage. The Key_blocks_unused is used instead of the more obvious Key_blocks_used. This is due to the fact that Key_blocks_used is the maximum number of key blocks ever used. It will not return to 0 after a FLUSH TABLES.

This calculation does not take the overhead in account. The key buffer efficiency can be calculated if the key buffer is empty or (has been) completely full.

If the the key buffer is full:
key_buffer_coefficient = key_cache_block_size/(key_buffer_size/Key_blocks_used)

If the the key buffer is empty:
key_buffer_coefficient = key_cache_block_size/(key_buffer_size/Key_blocks_unused)

The 'fixed' formula:
1 - ((Key_blocks_unused * key_cache_block_size) / (key_buffer_size * key_buffer_coefficient))

The result is shown below:

The overhead in the key buffer will be allocated on startup, so the larger overhead for a larger key buffer will result in more memory usage, even if MyISAM is only used for the internal mysql tables.

PlanetMySQL Voting: Vote UP / Vote DOWN