Archive for the ‘write performance’ Category

Benchmarking single-row insert performance on Amazon EC2

Май 16th, 2012

I have been working for a customer benchmarking insert performance on Amazon EC2, and I have some interesting results that I wanted to share. I used a nice and effective tool iiBench which has been developed by Tokutek. Though the “1 billion row insert challenge” for which this tool was originally built is long over, but still the tool serves well for benchmark purposes.

OK, let’s start off with the configuration details.

Configuration

First of all let me describe the EC2 instance type that I used.

EC2 Configuration

I chose m2.4xlarge instance as that’s the instance type with highest memory available, and memory is what really really matters.

High-Memory Quadruple Extra Large Instance
68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

As for the IO configuration I chose 8 x 200G EBS volumes in software RAID 10.

Now let’s come to the MySQL configuration.

MySQL Configuration

I used Percona Server 5.5.22-55 for the tests. Following is the configuration that I used:

## InnoDB options
innodb_buffer_pool_size         = 55G
innodb_log_file_size            = 1G
innodb_log_files_in_group       = 4
innodb_buffer_pool_instances    = 4
innodb_adaptive_flushing        = 1
innodb_adaptive_flushing_method = estimate
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_max_dirty_pages_pct      = 50
innodb_io_capacity              = 800
innodb_read_io_threads          = 8
innodb_write_io_threads         = 4
innodb_file_per_table           = 1

## Disabling query cache
query_cache_size                = 0
query_cache_type                = 0

You can see that the buffer pool is sized at 55G and I am using 4 buffer pool instances to reduce the contention caused by buffer pool mutexes. Another important configuration that I am using is that I am using “estimate” flushing method available only on Percona Server. The “estimate” method reduces the impact of traditional InnoDB log flushing, which can cause downward spikes in performance. Other then that, I have also disabled query cache to avoid contention caused by query cache on write heavy workload.

OK, so that was all about the configuration of the EC2 instance and MySQL.

Now as far as the benchmark itself is concerned, I made no code changes to iiBench, and used the version available here. But I changed the table to use range partitioning. I defined a partitioning scheme such that every partition would hold 100 million rows.

Table Structure

The table structure of the table with no secondary indexes is as follows:

CREATE TABLE `purchases_noindex` (
  `transactionid` int(11) NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  PRIMARY KEY (`transactionid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (transactionid)
(PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

While the structure of the table with secondary indexes is as follows:

CREATE TABLE `purchases_index` (
  `transactionid` int(11) NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  PRIMARY KEY (`transactionid`),
  KEY `marketsegment` (`price`,`customerid`),
  KEY `registersegment` (`cashregisterid`,`price`,`customerid`),
  KEY `pdc` (`price`,`dateandtime`,`customerid`)
) ENGINE=InnoDB AUTO_INCREMENT=11073789 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (transactionid)
(PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

Also, I ran 5 instances of iiBench simultaneously to simulate 5 concurrent connections writing to the table, with each instance of iiBench writing 200 million single row inserts, for a total of 1 billion rows. I ran the test both with the table purchases_noindex which has no secondary index and only a primary index, and against the table purchases_index which has 3 secondary indexes. Another thing I would like to share is that, the size of the table without secondary indexes is 56G while the size of the table with secondary indexes is 181G.

Now let’s come down to the interesting part.

Results

With the table purchases_noindex, that has no secondary indexes, I was able to achieve an avg. insert rate of ~25k INSERTs Per Second, while with the table purchases_index, the avg. insert rate reduced to ~9k INSERTs Per Second. Let’s take a look at the graphs have a better view of the whole picture.

Note, in the above graph, we have “millions of rows” on the x-axis and “INSERTs Per Second” on the y-axis.
The reason why I have chosen to show “millions of rows” on the x-axis so that we can see the impact of growth in data-set on the insert rate.

We can see that adding the secondary indexes to the table has decreased the insert rate by 3x, and its not even consistent. While with the table having no secondary indexes, you can see that the insert rate is pretty much constant remaining between ~25k to ~26k INSERTs Per Second. But on the other hand, with the table having secondary indexes, we can see that there are regular spikes in the insert rate, and the variation in the rate can be classified as large, because it varies between ~6.5k to ~12.5k INSERTs per second, with noticeable spikes after every 100 million rows inserted.

I noticed that the insert rate drop was mainly caused by IO pressure caused by increase in flushing and checkpointing activity. This caused spikes in write activity to the point that the insert rate was decreased.

Conclusion

As we all now there are pros and cons to using secondary indexes. While secondary indexes cause read performance to improve, but they have an impact on the write performance. Well most of the apps rely on read performance and hence having secondary indexes is an obvious choice. But for those applications that are write mostly or that rely a lot on write performance, reducing the no. of secondary indexes or even going away with secondary indexes causes a write throughput increase of 2x to 3x. In this particular case, since I was mostly concerned with write performance, so I went ahead to choose a table structure with no secondary indexes. Other important things to consider when you are concerned with write performance is using partitioning to reduce the size of the B+tree, having multiple buffer pool instances to reduce contention problems caused by buffer pool mutexes, using “estimate” checkpoint method to reduce chances of log flush storms and disabling the query cache.


PlanetMySQL Voting: Vote UP / Vote DOWN

Scaling writes in MySQL

Сентябрь 28th, 2009
We use MySQL on most of our projects. One of these projects has a an access pattern unlike any other I've worked on. Several million records a day need to be written to a table. These records are then read out once at the end of the day, summarised and then very rarely touched again. Each record is about 104 bytes long (thre's one VARCHAR column, everything else is fixed), and that's after squeezing out every byte possible. The average number of records that we write in a day is 40 million, but this could go up.

A little bit about the set up. We have fairly powerful boxes with large disks using RAID1/0 and 16GB RAM, however at the time they only had 4GB. For BCP, we have a multi-master set up in two colos with statement level replication. We used MySQL 5.1.

My initial tests with various parameters that affect writes showed that while MyISAM performed slightly better than InnoDB while the tables were small, it quickly deteriorated as the table size crossed a certain point. InnoDB performance deteriorated as well, but at a higher table size. The table size turned out to be related to the innodb_buffer_pool_size, and that in turn was capped by the amount of RAM we had on the system.

I decided to go with InnoDB since we also needed transactions for the summary tables and I preferred not to divide my RAM between two different engines. I stripped out all indexes, and retained only the primary key. Since InnoDB stores the table in the primary key, I decided that rather than use an auto_increment column, I'd cover several columns with the primary key to guarantee uniqueness. This had the added advantage that if the same record was inserted more than once, it would not result in duplicates. This small point was crucial for BCP, because it meant that we did not have to keep track of which records had already been inserted. If something crashed, we could just reinsert the last 30 minutes worth of data, possibly into the secondary master, and not have any duplicates at the end of it. I used INSERT IGNORE to get this done automatically.

Now to get back to the table size limit that we were facing. Initial tests showed that we could insert at most 2100 records per second until the table size got to a little over the innodb_buffer_pool_size and at that point it degraded fairly rapidly to around 150 records per second. This was unacceptable because records were coming in to the system at an average rate of 1000 per second. Since we only needed to read these records at the end of the day, it was safe to accumulate them into a text file and periodically insert them in bulk. I decided to insert 40,000 records at one time. The number I chose was arbitrary, but later tests that I ran on batches of 10K, 20K and 80K showed no difference in insert rates. With batch inserts, we managed to get an insert rate of 10,000 records per second, but this also degraded as soon as we hit the limit going down to 150 records per second.

System stats on the database box showed that the disk was almost idle for most of the run and then suddenly shot up to 90-100% activity once we hit this limit, so it was obvious that at this point, the DB was exchanging data between buffers and disk all the time.

At this point, someone suggested that we try partitioning, which was available in MySQL 5.1. My first instinct was to partition based on the primary key so that we could read data out easily. However, reads weren't really our problem since we had no restriction on how fast they needed to be (at least not as much as writes). Instead, I decided to partition my table based on the pattern of incoming data.

The first part was obvious, use a separate table for each day's data. On a table of this size, DROP TABLE is much faster than DELETE From <table> Where ..., and it also reclaims lost space. I should mention at this point that we used file_per_table as well to make sure that each table had its own file rather than use a single innodb file.

Secondly, each table was partitioned on time. 12 partitions per day, 2 hours of data per partition. The MySQL docs for Partitioning were quite useful in understanding what to do. The command ended up looking like this:
    CREATE TABLE (
...
) PARTITION BY RANGE( ( time DIV 3600 ) MOD 24 ) (
Partition p0 values less than (2),
Partition p0 values less than (4),
Partition p0 values less than (6),
Partition p0 values less than (8),
Partition p0 values less than (10),
Partition p0 values less than (12),
Partition p0 values less than (14),
Partition p0 values less than (16),
Partition p0 values less than (18),
Partition p0 values less than (20),
Partition p0 values less than (22),
Partition p0 values less than (24)
);
The time field is the timestamp of incoming records, and since time always moves forward (at least in my universe), this meant that I would never write to more than 2 partitions at any point in time. Now, a little back of the envelope calculations:
    44M x 102 bytes = approx 4.2GB
2x for InnoDB overhead = approx 8.4GB
+10% for partitioning overhead = 9.2GB
/12 partitions = approx 760MB per partition
This turned out to be more or less correct. In most cases total table size ranges between 8-10GB, sometimes it goes up to 13GB. Partition sizes range from less than 700MB to over 1GB depending on the time of day. With 4GB of RAM, we had an innodb_buffer_pool set at 2.7GB, which was good enough to store two partitions, but not good enough to work on any other tables or do anything else on the box. Boosting the RAM to 16GB meant that we could have a 12GB buffer pool, and leave 4GB for the system. This was enough for 2 partitions, even if the total number of records went up, and we could work on other tables as well.

After partitioning, tests showed that we could sustain an insert rate of 10K rows per second for some time. As the table size grew past 10 million records, the insert rate dropped to about 8500 rows per second, but it stayed at that rate for well over 44 million records. I tested inserts up to 350 million records and we were able to sustain an insert rate of around 8500 rows per second. Coincidentally, during Michael Jackson's memorial service, we actually did hit an incoming rate of a little over 8000 records per second for a few hours.

One more BotE calculation:
    8500 rows per second  x  86400 seconds per day = 734.4 Million records per day
Considering that before this system was redesigned it was handling about 7 Million records per day, I'd say that we did pretty well.

PlanetMySQL Voting: Vote UP / Vote DOWN