Archive for the ‘ACID’ Category

Benchmarking MySQL ACID performance with SysBench

Июнь 21st, 2010

A couple of question I get a lot from MySQL customers is “how will this hardware upgrade improve my transactions per second (TPS)” and “what level of TPS will MySQL perform on this hardware if I’m running ACID settings?” Running sysbench against MySQL with different values for per-thread and global memory buffer sizes, ACID settings, and other settings gives me concrete values to bring to the customer to show the impact that more RAM, faster CPUs, faster disks, or cnf changes have on the server. Here are some examples for a common question: “If I’m using full ACID settings vs non-ACID settings what performance am I going to get from this server?”

Let’s find out by running sysbench with the following settings (most are self explanatory – if not the man page can explain them):

  • sysbench –test=oltp –db-driver=mysql –oltp-table-size=1000000 –mysql-engine-trx=yes –oltp-test-mode=complex –oltp-read-only=off –oltp-dist-type=special –max-requests=0 –num-threads=8 –max-time=120 –init-rng=on run

MySQL Settings:

In the first test MySQL is set to the following ACID related settings. This will give us results for TPS performance without full ACID compliance – very common settings on a server that is handling blogs, ad serving, general business websites, and other roles where full ACID is not required and performance is valued over the benefits of full ACID. These are important settings when we look at the difference in performance when we change to full ACID in the second test.

  • innodb_flush_log_at_trx_commit = 0
  • sync_binlog=0
  • transaction-isolation=REPEATABLE-READ

System configuration and InnoDB buffer pool size:

  • XEON E5345 Series 2.33ghz 8-core, 16GB RAM, Local SATA 7.2K disks
  • innodb_buffer_pool_size = 10G

Full result set from sysbench:

Summary OLTP test statistics:

  • queries performed:
  • transactions:                        172426 (1436.83 per sec.)
  • read/write requests:                 3276664 (27304.51 per sec.)
  • other operations:                    344882 (2873.91 per sec.)

Take away results:

We can simplify the results by looking at the following TPS results for this non-ACID test:

  • transactions:                        172426 (1436.83 per sec.)

Let’s go ahead and run the test again with different ACID settings. This will give us the TPS results for full ACID compliance:

  • innodb_flush_log_at_trx_commit = 1
  • sync_binlog=1
  • transaction-isolation=REPEATABLE-READ

We get the following results for TPS:

  • transactions:                     3197   (26.58 per sec.)
  • read/write requests:                 60743  (505.04 per sec.)
  • other operations:                    6394   (53.16 per sec.)

Final Results:

So as you can see the difference between full ACID settings and not (on the same server with only those values on the cnf being changed) results in a huge difference in performance on this standard database server. We can now hand this data to the customer and they will know what impact the settings will have on their application’s performance and what to expect when running full ACID vs non-ACID.

More info on using sysbench here: http://sysbench.sourceforge.net


PlanetMySQL Voting: Vote UP / Vote DOWN

High Insertion Rates into a TokuDB Table with Durable Transactions

Январь 6th, 2010

We recently made transactions in TokuDB 3.0 durable. We write table changes into a log file so that in the event of a crash, the table changes up to the last checkpoint can be replayed. Durability requires the log file to be fsync’ed when a transaction is committed. Unfortunately, fsync’s are not free, and may cost 10’s of milliseconds of time. This may seriously affect the insertion rate into a TokuDB table. How can one achieve high insertion rates in TokuDB with durable transactions?

Decrease the fsync cost

The fsync of the TokuDB log file writes all of the dirty log file data that is cached in memory by the operating system to the underlying storage system. The fsync time can be modeled with a simple linear equation: fsync time = N/R + K, where N is the amount of dirty data that needs to by written to disk, R is the disk write rate, and K is a constant time defined by the storage system. We want to to minimize the fsync time. Note that this model is conceptual and has not been verified by experiment, but it is good enough to identify some opportunities for decreasing the fsync cost.

One can increase the bandwidth of the log device (R). Suppose that large rows are being inserted into the database. A very large amount of log file data may be cached in memory by the operating system before the fsync. A storage system, perhaps a striped RAID, with a high write bandwidth will be able to store this log data quickly.

One can decrease the amount of data (N) that must be written to the log. There are several techniques that may be used here by TokuDB, including logging table rather than dictionary changes, and compressing the log files. These techniques will be shipped in a future TokuDB release.

One can decrease the constant fsync cost (K). A battery backed up RAID may speed up fsync’s since it writes data to non-volatile memory that is faster than the disks in the RAID.

One can put the TokuDB logs on their own storage system by using the tokudb_log_dir MySQL system variable. This will increase the overall system write bandwidth, and perhaps eliminate contention between the TokuDB log and the TokuDB fractal tree.

Amortize the fsync cost with large transactions

When the fsync time is a significant fraction of the time to execute a transaction, the insertion rate can be increased by using larger transactions.

Does this actually work? We ran iibench with tokudb_commit_sync ON and 1000 rows per transaction and measured over 10K rows/second insertion rate at 250M rows on a Sun Fire X4150. For 10,000 rows per transaction, we measured 15K rows/sec insertion rate.

Relax durability

Several MySQL storage engines provide mechanisms that relax durability by decoupling the fsync from the transaction commit. TokuDB provides the tokudb_commit_sync MySQL session variable, which works as follows.

If tokudb_commit_sync=ON, then the TokuDB log file is fsync’ed when the transaction commits. When used in this way, all transactions are durable. This is the default setting.

If tokudb_commit_sync=OFF, then the TokuDB log file is not fsync’ed when the transaction commits. When used in this way, the TokuDB tables recover to a transactionally consistent state that may not include the transactions committed after the last TokuDB checkpoint. It all depends on when the TokuDB log was last fsync’ed.

The TokuDB log consists of a sequence of log files, each of which is about 100MiB in size. TokuDB will fsync a log file whenever it fills one up and needs to create the next one. The TokuDB log will also be fsync’ed by a commit of a transaction in another MySQL client connection that has its tokudb_commit_sync session variable set ON.

How does a TokuDB checkpoint work? TokuDB checkpoints open dictionaries every 60 seconds by taking a snapshot of the current state of the dictionaries, writing all of the dirty dictionary data to disk, fsyncing the dictionary files, and finally fsyncing the TokuDB log.

We ran the iibench with tokudb_commit_sync OFF and measured over 17K rows/second insertion rate at 250M rows on a Sun Fire X4150.

The tokudb_commit_sync session variable may be used to implement application defined durability. For example, the application can set the tokudb_commit_sync session variable ON once per second rather than for every transaction. The effect will be one second worth of transaction vulnerability.

Summary

Durable transactions in TokuDB increase the write bandwidth to the storage system. This may make the storage system a performance bottleneck. If this is the case, one may consider using a storage system for the TokuDB logs with higher write bandwidth. One may also entertain relaxing the durability requirements of the application and control the fsync’s by the application.


PlanetMySQL Voting: Vote UP / Vote DOWN