Archive for the ‘logs’ Category

What is the proper size of InnoDB logs?

Апрель 10th, 2012

In one of my previous posts, “How to resize InnoDB logs?”, I gave the advice on how to safely change the size of transaction logs. This time, I will explain why doing it may become necessary.

A brief introduction to InnoDB transaction logs

The transaction logs handle REDO logging, which means they keep the record of all recent modifications performed by queries in any InnoDB table. But they are a lot more than just an archive of transactions. The logs play important part in the process of handling writes. When a transaction commits, InnoDB synchronously makes a note of any changes into the log, while updating the actual table files happens asynchronously and may take place much later. Each log entry is assigned a Log Sequence Number – an incremental value that always uniquely identifies a change.

InnoDB writes changes to the log as a transaction commits, while updating data in the table file happens asynchronously and may take place much later

 

Such design serves database in two ways.

First, it optimizes MySQL performance. It enables InnoDB to use light sequential I/O to store the modifications on disk as transactions commit and delay expensive random I/O required for data and index updates for when it is more convenient. Buffered updates may be then rearranged or merged in order to further optimize disk access.

And second, after a crash InnoDB can use the logs contents to perform recovery. As updates to the data files are done asynchronously, in the event of an unclean shut down, any modifications that existed only in the buffer pool would be lost. The ability to replay changes from the log deals with the problem.

InnoDB logs are circular, which means that they create a loop. When database reaches the end of the last file (commonly there are two files), it begins writing into the first one again.

When database reaches the end of the last file (commonly there are two files), it begins writing into the first one again.

They are also fixed size, so in order to prevent them from filling, InnoDB implements a background mechanism called checkpointing, which manages the process of synchronizing modified pages to disk (step (3) in the first illustration). As it runs, it marks exactly one log record as checkpoint, which states that all modifications carrying a younger LSN value were already safely stored in data files. This means the log contents prior to that LSN is no longer needed for recovery or any other purpose. As the checkpoint progresses, continuously chasing the most recent LSN, it keeps freeing the log space behind it, which can then be re-used by future writes.

The challenges

What are the main challenges of choosing the right size for InnoDB logs?

Making them small enough to avoid unnecessarily long distance between checkpoint LSN and the most recent LSN to avoid needlessly long recovery times.

With newer MySQL versions this should not be a significant factor anymore as the recovery process has been greatly optimized in MySQL 5.1.46 and in 5.5.

In older versions, however, the process used to be slow and would often needed hours to replay transactions from even medium-sized logs such as 512MB or 1GB, so it is a risk that needs to be evaluated.

Making them large enough to fit writes over sufficiently long period of time, so that InnoDB has some room to maneuver in deciding when to flush some of the buffer pool contents.

When the transaction logs are set too small for given workload, MySQL performance may suffer. During busier periods incoming writes may start pushing LSN faster than checkpoint can progress and the log space will start filling up. After a threshold is exceeded, checkpointing becomes very intense as InnoDB spots the upcoming problem that it may be short of free log space soon. When this doesn’t help, the engine may need to start blocking queries as it is flushing the buffer pool contents and advancing the checkpoint.

When the transaction logs are set too small for given workload, MySQL performance may suffer.

What size works?

When installing a new database, the transaction logs often need to be configured without too much knowledge about future workload. One can simply try choosing a reasonable size such as 64M for a pretty average database. There is no reason to go below this value. Of course, the more writes this new database is expected to take, the larger they may need to be. In any case it is extremely important to always change the default size, which is just two files of 5MB. Such configuration is not sufficient for any serious purpose, so do not ever never allow it even on a development server.

With database already running in testing or in production, the necessary size can be calculated based on the rate at which data is written into the transaction logs as this information is available from MySQL. A good rule says that the logs should be able to hold at least one hour worth of changes. In order to come up with a number, simply check how quickly Log Sequence Number progresses. Be sure not to check this during quiet periods as this has to be tuned for the peak usage.

Calculating the size

The information can be found in InnoDB status output, which you can obtain with SHOW ENGINE INNODB STATUS:

mysql> \P grep 'Log sequence number'
PAGER set to 'grep 'Log sequence number''
mysql> SHOW ENGINE INNODB STATUS\G
Log sequence number 21060750647056
1 row in set (0.06 sec)

Using the command line pager helped limiting the output to only the relevant information.

In reality, LSN value represents an offset from byte zero of the transaction log, so since the database has been initialized. Seeing how it changes essentially means seeing how much data was written into it over a period of time. So to figure out the amount of changes happening to a database:

  • check the most recent LSN using the method shown above
  • wait some time
  • check the LSN again
  • subtract the two values

Here is how to do it in practice:

mysql> \P grep 'Log sequence number'
PAGER set to 'grep 'Log sequence number''
mysql> SHOW ENGINE INNODB STATUS\G SELECT SLEEP(60); SHOW ENGINE INNODB STATUS\G
21057170602213
1 row in set (0.06 sec)

1 row in set (1 min 0.00 sec)

21057190468976
1 row in set (0.05 sec)

The two LSN values are 21057170602213 and 21057190468976. Let’s calculate the difference:

mysql> select ROUND((21057190468976 - 21057170602213)/ 1024 / 1024) as MB;
+------+
| MB   |
+------+
|   19 |
+------+
1 row in set (0.00 sec)

19 megabytes were written into the log file in one minute. Using this information, how large the logs should be to keep at least one hour worth of writes? 19MB * 60 minutes / 2 files = 570MB. The division by two comes from the fact that InnoDB uses two log files by default, while we need to set size for each individual file. Therefore we calculated that this database needs innodb_log_file_size set to at least 512MB.

In MySQL 5.1 or newer, a query against INFORMATION_SCHEMA.GLOBAL_STATUS can be used instead of looking at the InnoDB status output.

SELECT @a1 := variable_value AS a1
FROM information_schema.global_status
WHERE variable_name = 'innodb_os_log_written'
UNION ALL
SELECT Sleep(60)
UNION ALL
SELECT @a2 := variable_value AS a2
FROM information_schema.global_status
WHERE variable_name = 'innodb_os_log_written';

SELECT ROUND((@a2-@a1) * 60 / 1024 / 1024 / @@innodb_log_files_in_group) as MB;

PlanetMySQL Voting: Vote UP / Vote DOWN

Usability improvements in Tungsten Replicator 2.0.4

Август 11th, 2011
If you love a software product, you should try to improve it, and not be afraid of criticizing it. This principle has guided me with MySQL (where I have submitted many usability bugs, and discussed interface with developers for years), and it proves true for Tungsten Replicator as well. When I started working at Continuent, while I was impressed by the technology, I found the installation procedure and the product logs quite discouraging. I would almost say disturbing. Fortunately, my colleagues have agreed on my usability focus, and we can enjoy some tangible improvements. I have already mentioned the new installation procedure, which requires just one command to install a full master/slave cluster. I would like to show how you can use the new installer to deploy a multiple source replication topology like the following: The first step is to install one master in each node. I can run the commands from node #4, which is the one that will eventually receive the updates from the remote masters, and where I need to install the slave services:
TUNGSTEN_BASE=$HOME/newinst

SERVICES=(alpha bravo charlie delta)
REPLICATOR=$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/replicator

for N in 1 2 3 4
do
INDEX=$(($N-1))

./tools/tungsten-installer \
--master-slave \
--master-host=qa.r$N.continuent.com \
--datasource-user=tungsten \
--datasource-password=secret \
--service-name=${SERVICES[$INDEX]} \
--home-directory=$TUNGSTEN_BASE \
--cluster-hosts=qa.r$N.continuent.com \
--start-and-report
done
The above loop will install a master (remotely or locally) in the four servers. Then I need to create the slave services. To do it, I use the updated configure-service in the tools directory.
TUNGSTEN_TOOLS=$TUNGSTEN_BASE/tungsten/tools

COMMON_OPTIONS='-C -q
--local-service-name=delta
--role=slave
--service-type=remote
--allow-bidi-unsafe=true
--datasource=qa_r4_continuent_com'

$TUNGSTEN_TOOLS/configure-service $COMMON_OPTIONS --master-host=qa.r1.continuent.com alpha
$TUNGSTEN_TOOLS/configure-service $COMMON_OPTIONS --master-host=qa.r2.continuent.com bravo
$TUNGSTEN_TOOLS/configure-service $COMMON_OPTIONS --master-host=qa.r3.continuent.com charlie

$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/replicator restart
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl services
These commands create the slave services locally in Delta. After restarting the replicator, a simple test will be creating something different in each master, and check that the data has replicated to the single slave. The latest improvement in matter of usability is the simplification of the replicator logs. Until a few days ago, if you had an error in the replicator, you would get a long list of not exactly helpful stuff. For example, if I create a table in a slave, and then create the same table in the master, I will break replication. The extended log would produce something like this:
INFO   | jvm 1    | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,216 [tsandbox - q-to-dbms-0] ERROR pipeline.SingleThreadStageTask Event application failed: seqno=1 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master

INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,217 [tsandbox - Event dispatcher thread] ERROR management.OpenReplicatorManager Received error notification, shutting down services: Event application failed: seqno=1 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
INFO | jvm 1 | 2011/08/11 18:10:52 | com.continuent.tungsten.replicator.applier.ApplierException: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:183)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:1233)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.java:101)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(SingleThreadStageTask.java:498)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThreadStageTask.java:155)
INFO | jvm 1 | 2011/08/11 18:10:52 | at java.lang.Thread.run(Unknown Source)
INFO | jvm 1 | 2011/08/11 18:10:52 | Caused by: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:139)
INFO | jvm 1 | 2011/08/11 18:10:52 | ... 5 more
INFO | jvm 1 | 2011/08/11 18:10:52 | Caused by: java.sql.SQLSyntaxErrorException: Table 't1' already exists
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:78)
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.DrizzleStatement.executeBatch(DrizzleStatement.java:930)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:125)
INFO | jvm 1 | 2011/08/11 18:10:52 | ... 5 more
INFO | jvm 1 | 2011/08/11 18:10:52 | Caused by: org.drizzle.jdbc.internal.common.QueryException: Table 't1' already exists
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.internal.mysql.MySQLProtocol.executeQuery(MySQLProtocol.java:500)
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.internal.mysql.MySQLProtocol.executeBatch(MySQLProtocol.java:546)
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.DrizzleStatement.executeBatch(DrizzleStatement.java:917)
INFO | jvm 1 | 2011/08/11 18:10:52 | ... 6 more
INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,218 [tsandbox - Event dispatcher thread] WARN management.OpenReplicatorManager Performing emergency service shutdown
INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,219 [tsandbox - Event dispatcher thread] INFO pipeline.Pipeline Shutting down pipeline: slave
INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,219 [tsandbox - q-to-dbms-0] INFO pipeline.SingleThreadStageTask Terminating processing for stage task thread
INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,219 [tsandbox - q-to-dbms-0] INFO pipeline.SingleThreadStageTask Last successfully processed event prior to termination: seqno=0 eventid=mysql-bin.000002:0000000000000426;20
Did you see the reason for the error? No? Neither did I. I would need to open the THL, look for event #1, and determine what it was. Instead, the new user.log looks like this:
2011-08-11 18:10:52,216 ERROR Received error notification: Event application failed: seqno=1 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master

Caused by : java.sql.SQLException: Statement failed on slave but succeeded on master
Caused by : Statement failed on slave but succeeded on master
Caused by : Table 't1' already exists
Caused by : Table 't1' already exists
2011-08-11 18:10:54,721 INFO State changed ONLINE -> OFFLINE:ERROR
2011-08-11 18:10:54,721 WARN Received irrelevant event for current state: state=OFFLINE:ERROR event=OfflineNotification
That's much better. It is not perfect yet, but it will be soon. Right now, it tells me what is wrong without forcing me to go hunting for it amid hundreds of stack trace lines. Give it a try, using the latest replicator build.

PlanetMySQL Voting: Vote UP / Vote DOWN

EXPLAIN: missing db info

Май 11th, 2010

I’m further developing a general log hook, which can stream queries from the general log.

A particular direction I’m taking is to filter queries by their type of actions. For example, the tool (oak-hook-general-log) can be instructed to only stream out those queries which involve creation of a temporary table; or those which cause for a filesort, or full index scan, etc.

This is done by evaluating of query execution plans on the fly. I suspect the MySQL query analyzer roughly does the same (as a small part of what it does).

It’s almost nothing one cannot do with sed/awk. However, I bumped into a couple of problems:

  1. The general log (and the mysql.general_log table, in  particular) does not indicate the particular database one is using for the query. Since slow log does indicate this data, I filed a bug on this. I currently solve this by crossing connection id with the process list, where the current database is listed. It’s shaky, but mostly works.
  2. Just realized: there’s no DB info in the EXPLAIN output! It’s weird, since I’ve been EXPLAINing queries for years now. But I’ve always had the advantage of knowing the schema used: either because I was manually executing the query on a known schema, or mk-query-digest was kind enough to let me know.

For example, look at the following imaginary query, involving both the world and sakila databases:

mysql> use test;
Database changed
mysql> EXPLAIN SELECT * FROM world.Country JOIN sakila.city WHERE Country.Capital = city.city_id;
+----+-------------+---------+--------+---------------+---------+---------+-----------------------+------+-------------+
| id | select_type | table   | type   | possible_keys | key     | key_len | ref                   | rows | Extra       |
+----+-------------+---------+--------+---------------+---------+---------+-----------------------+------+-------------+
|  1 | SIMPLE      | Country | ALL    | NULL          | NULL    | NULL    | NULL                  |  239 |             |
|  1 | SIMPLE      | city    | eq_ref | PRIMARY       | PRIMARY | 2       | world.Country.Capital |    1 | Using where |
+----+-------------+---------+--------+---------------+---------+---------+-----------------------+------+-------------+
2 rows in set (0.00 sec)

The query is imaginary, since the tables don’t actually have anything in common. But look at the EXPLAIN result: can you tell where city came from? Country can somehow be parsed from the ref column, but no help on city.

Moreover, table aliases show on the EXPLAIN plan (which is good), but with no reference to the original table.

So, is it back to parsing of the SQL query? I’m lazy reluctant to do that. It’s error prone, and one needs to implement, or use, a good parser, which also accepts MySQL dialect. Haven’t looked into this yet.

I’m currently at a standstill with regard to automated query execution plan evaluation where database cannot be determined.


PlanetMySQL Voting: Vote UP / Vote DOWN

oak-hook-general-log: streaming general log

Март 21st, 2010

I’m seeking input on a new openark kit utility I’ve started to implement.

The tool, oak-hook-general-log, will hook up to a MySQL (>= 5.1) server, and stream the general log into standard output. It looks like this:

bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root
2010-03-21 10:18:42     root[root] @ localhost []       79      1       Query   SELECT COUNT(*) FROM City
2010-03-21 10:18:48     root[root] @ localhost []       79      1       Query   DELETE FROM City WHERE id=1000
2010-03-21 10:18:54     root[root] @ localhost []       79      1       Query   SHOW PROCESSLIST
2010-03-21 10:19:06     root[root] @ localhost []       79      1       Quit
2010-03-21 10:19:07     root[root] @ localhost []       93      1       Connect root@localhost on
2010-03-21 10:19:07     root[root] @ localhost []       93      1       Query   select @@version_comment limit 1
2010-03-21 10:22:33     root[root] @ localhost []       93      1       Query   SELECT City.Name, Country.Name FROM Country JOIN City ON Country.Capit
2010-03-21 10:22:58     root[root] @ localhost []       93      1       Quit

Since output is written to stdout, one can further:

bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root | grep Connect
bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root | grep webuser@webhost

What the tool does is to enable table logs, and periodically rotate the mysql.general_log table, read and dump its content.

The tool:

  • Stores and restores the original log state (general log enabled/disabled, log output).
  • Disables printing of its own queries to the general log.
  • Automatically times out (timeout configurable) so as not to enter a situation where the general log is forgotten to be turned on.
  • Can discard pre-existing data on the mysql.general_log table.
  • Will cleanup the mysql.slow_log table, if it wasn’t originally used (turning on table logs applies to both general log and slow log).

What would you have the tool do further? Should it provide filtering, or should we just use grep/sed/awk for that? Any internal aggregation of data?

I would love to hear your thoughts. Meanwhile, view or grab the python script file.


PlanetMySQL Voting: Vote UP / Vote DOWN