Archive for the ‘benchmarks’ Category

btrfs – probably not ready yet

Май 23rd, 2012

Every time I have a conversation on SSD, someone mentions btrfs filesystem. And usually it is colored as a solution that will solve all our problems, improve overall performance and SSD in particular, and it is a saviour. Of course it caught my curiosity and I decided to perform a benchmark similar to what I did on ext4 filesystem over Intel 520 SSD.
I was prepared for surprises, as even on formatting stage, mkfs.btrfs says that filesystem is EXPERIMENTAL. In case with filesystems I kind of agree with Stewart, so question #1, what you should ask deciding on what filesystem to use, is “Was this filesystem used in a production more than 5 years?”, so from this point, btrfs has a long way ahead.

How you can get btrfs? Actually it is quite easy if you are on CentOS/RedHat/Oracle Linux 6.2.
Oracle provides Unbreakable Enterprise Kernel, which includes btrfs, so you can get it with this kernel. And installation is quite easy and straightforward, just follow instructions.

So, to numbers. Workload and benchmark are exactly the same as in my previous benchmark, and I perform runs only for 10 and 20GB buffer pool, as it is enough to understand picture. The previous run was done on ext4, so if we repeat the same on btrfs, it will allow us to compare the results.

I format btrfs with default options, and mount it with -o ssd,nobarrier options.

Throughput results:

We can see that btrfs not only provides worse throughput (5x!), but it is also less stable.

Response time:

The same happens with response time. Actually 95% response time is about 10x worse with btrfs.

And response time, timeline:

We can see that btrfs is very far from providing a stable response time.

I guess the conclusion is obvious, and I think it is fine for a filesystem that is in the EXPERIMENTAL state.
Most likely it is some bug or misconfiguration that does not allow btrfs to show all its potential.
I just will consider all talks of btrfs characteristic as premature and will wait until it is more stable
before running any more experiments with it.

Benchmarks specification, hardware, scripts and raw results are available in the full report for Intel 520 SSD.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Intel 520 SSD in MySQL sysbench oltp benchmark

Май 22nd, 2012

In my raw IO benchmark of Intel 520 SSD we saw that the drive does not provide uniform throughput and response time, but it is interesting how does it affect workload if it comes from MySQL.
I prepared benchmarks results for Sysbench OLTP workload with MySQL running on Intel 520.
You can download it there.

There I want to publish graphs to compare Intel 520 vs regular RAID10.

Throughput:

Response time:

So despite big variation in raw IO, it seems it does not affect MySQL workload significantly, and single Intel 520 SSD gives much better throughput and response time comparing with traditional SAS RAID, and what is interesting it also much cheaper.
What’s bad with Intel 520 is that this card does not have capacitor to protect write cache, so if you worry about data protection in case of power outage it is better to disable write cache on this card and use write cache from RAID controller (i.e. LSI-9260).

Benchmarks specification, hardware, scripts and raw results are available in the full report.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Benchmarking single-row insert performance on Amazon EC2

Май 16th, 2012

I have been working for a customer benchmarking insert performance on Amazon EC2, and I have some interesting results that I wanted to share. I used a nice and effective tool iiBench which has been developed by Tokutek. Though the “1 billion row insert challenge” for which this tool was originally built is long over, but still the tool serves well for benchmark purposes.

OK, let’s start off with the configuration details.

Configuration

First of all let me describe the EC2 instance type that I used.

EC2 Configuration

I chose m2.4xlarge instance as that’s the instance type with highest memory available, and memory is what really really matters.

High-Memory Quadruple Extra Large Instance
68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

As for the IO configuration I chose 8 x 200G EBS volumes in software RAID 10.

Now let’s come to the MySQL configuration.

MySQL Configuration

I used Percona Server 5.5.22-55 for the tests. Following is the configuration that I used:

## InnoDB options
innodb_buffer_pool_size         = 55G
innodb_log_file_size            = 1G
innodb_log_files_in_group       = 4
innodb_buffer_pool_instances    = 4
innodb_adaptive_flushing        = 1
innodb_adaptive_flushing_method = estimate
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_max_dirty_pages_pct      = 50
innodb_io_capacity              = 800
innodb_read_io_threads          = 8
innodb_write_io_threads         = 4
innodb_file_per_table           = 1

## Disabling query cache
query_cache_size                = 0
query_cache_type                = 0

You can see that the buffer pool is sized at 55G and I am using 4 buffer pool instances to reduce the contention caused by buffer pool mutexes. Another important configuration that I am using is that I am using “estimate” flushing method available only on Percona Server. The “estimate” method reduces the impact of traditional InnoDB log flushing, which can cause downward spikes in performance. Other then that, I have also disabled query cache to avoid contention caused by query cache on write heavy workload.

OK, so that was all about the configuration of the EC2 instance and MySQL.

Now as far as the benchmark itself is concerned, I made no code changes to iiBench, and used the version available here. But I changed the table to use range partitioning. I defined a partitioning scheme such that every partition would hold 100 million rows.

Table Structure

The table structure of the table with no secondary indexes is as follows:

CREATE TABLE `purchases_noindex` (
  `transactionid` int(11) NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  PRIMARY KEY (`transactionid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (transactionid)
(PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

While the structure of the table with secondary indexes is as follows:

CREATE TABLE `purchases_index` (
  `transactionid` int(11) NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  PRIMARY KEY (`transactionid`),
  KEY `marketsegment` (`price`,`customerid`),
  KEY `registersegment` (`cashregisterid`,`price`,`customerid`),
  KEY `pdc` (`price`,`dateandtime`,`customerid`)
) ENGINE=InnoDB AUTO_INCREMENT=11073789 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (transactionid)
(PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

Also, I ran 5 instances of iiBench simultaneously to simulate 5 concurrent connections writing to the table, with each instance of iiBench writing 200 million single row inserts, for a total of 1 billion rows. I ran the test both with the table purchases_noindex which has no secondary index and only a primary index, and against the table purchases_index which has 3 secondary indexes. Another thing I would like to share is that, the size of the table without secondary indexes is 56G while the size of the table with secondary indexes is 181G.

Now let’s come down to the interesting part.

Results

With the table purchases_noindex, that has no secondary indexes, I was able to achieve an avg. insert rate of ~25k INSERTs Per Second, while with the table purchases_index, the avg. insert rate reduced to ~9k INSERTs Per Second. Let’s take a look at the graphs have a better view of the whole picture.

Note, in the above graph, we have “millions of rows” on the x-axis and “INSERTs Per Second” on the y-axis.
The reason why I have chosen to show “millions of rows” on the x-axis so that we can see the impact of growth in data-set on the insert rate.

We can see that adding the secondary indexes to the table has decreased the insert rate by 3x, and its not even consistent. While with the table having no secondary indexes, you can see that the insert rate is pretty much constant remaining between ~25k to ~26k INSERTs Per Second. But on the other hand, with the table having secondary indexes, we can see that there are regular spikes in the insert rate, and the variation in the rate can be classified as large, because it varies between ~6.5k to ~12.5k INSERTs per second, with noticeable spikes after every 100 million rows inserted.

I noticed that the insert rate drop was mainly caused by IO pressure caused by increase in flushing and checkpointing activity. This caused spikes in write activity to the point that the insert rate was decreased.

Conclusion

As we all now there are pros and cons to using secondary indexes. While secondary indexes cause read performance to improve, but they have an impact on the write performance. Well most of the apps rely on read performance and hence having secondary indexes is an obvious choice. But for those applications that are write mostly or that rely a lot on write performance, reducing the no. of secondary indexes or even going away with secondary indexes causes a write throughput increase of 2x to 3x. In this particular case, since I was mostly concerned with write performance, so I went ahead to choose a table structure with no secondary indexes. Other important things to consider when you are concerned with write performance is using partitioning to reduce the size of the B+tree, having multiple buffer pool instances to reduce contention problems caused by buffer pool mutexes, using “estimate” checkpoint method to reduce chances of log flush storms and disabling the query cache.


PlanetMySQL Voting: Vote UP / Vote DOWN

Testing Fusion-io ioDrive2 Duo

Май 11th, 2012

I was lucky enough to get my hands on new Fusion-io ioDrive2 Duo card. So I decided to run the same series of tests I did for other Flash devices. This is ioDrive2 Duo 2.4TB card and it is visible to OS as two devices (1.2TB each), which can be connected together via software RAID. So I tested in two modes: single drive, and software RAID-0 over two drives.

I should note that to run this card you need to have an external power, by the same reason I mentioned in the previous post: PCIe slot can provide only 25W power, which is not enough for ioDrive2 Duo to provide full performance. I mention this, as it may be challenge for some servers: some models may not have connector for external power, and for some you may need special “power kit”. So you need to make sure you have compatible hardware before getting Duo card. I personally ended up with setup like this: I use a separate power supply.

Fusion-io ioDrive2 Firmware v6.0.0, rev 107004 Public, Fusion-io driver version: 3.1.1.

Now to the results.
For this test I also use Cisco UCS C250 server, and on the graph I show the results for both single card and raid (Duo).

Random writes, async:

We see stable and predictable write performance, with throughput: 660 MiB/s for single, and 1300 MiB/s for Duo

Random reads:

Again both modes provides stable level of throughput. 1350 MiB/s for single and 2300 MiB/s for Duo.

Now with separation per thread for random read synchronous IO:

There is also excellent response time characteristics. 0.25ms and 0.19ms for 8 threads, single and Duo cases.

In general ioDrive2 seems to provide better and more stable performance results comparing to previous generation ioDrive.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Testing Fusion-io ioDrive2 Duo

Май 11th, 2012

I was lucky enough to get my hands on new Fusion-io ioDrive2 Duo card. So I decided to run the same series of tests I did for other Flash devices. This is ioDrive2 Duo 2.4TB card and it is visible to OS as two devices (1.2TB each), which can be connected together via software RAID. So I tested in two modes: single drive, and software RAID-0 over two drives.

I should note that to run this card you need to have an external power, by the same reason I mentioned in the previous post: PCIe slot can provide only 25W power, which is not enough for ioDrive2 Duo to provide full performance. I mention this, as it may be challenge for some servers: some models may not have connector for external power, and for some you may need special “power kit”. So you need to make sure you have compatible hardware before getting Duo card. I personally ended up with setup like this: I use a separate power supply.

Fusion-io ioDrive2 Firmware v6.0.0, rev 107004 Public, Fusion-io driver version: 3.1.1.

Now to the results.
For this test I also use Cisco UCS C250 server, and on the graph I show the results for both single card and raid (Duo).

Random writes, async:

We see stable and predictable write performance, with throughput: 660 MiB/s for single, and 1300 MiB/s for Duo

Random reads:

Again both modes provides stable level of throughput. 1350 MiB/s for single and 2300 MiB/s for Duo.

Now with separation per thread for random read synchronous IO:

There is also excellent response time characteristics. 0.25ms and 0.19ms for 8 threads, single and Duo cases.

In general ioDrive2 seems to provide better and more stable performance results comparing to previous generation ioDrive.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

New distribution of random generator for sysbench – Zipf

Май 10th, 2012

Sysbench has three distribution for random numbers: uniform, special and gaussian. I mostly use uniform and special, and I feel that both do not fully reflect my needs when I run benchmarks. Uniform is stupidly simple: for a table with 1 mln rows, each row gets equal amount of hits. This barely reflects real system, it also does not allow effectively test caching solution, each row can be equally put into cache or removed. That’s why there is special distribution, which is better, but to other extreme – it is skewed to very small percentage of rows, which makes this distribution good to test cache, but it is hard to emulate high IO load.

That’s why I was looking for alternatives, and Zipfian distribution seems decent one. This distribution has a parameter θ (theta), which defines how skewed the distribution is. A physical sense of this parameter, if to apply to database tables, is following: say row 1 accessed N, then row 2 is accessed 2^θ less times, row 3 is accessed 3^θ less, …, row X is accessed X^θ less times.
Say θ=1.1, then if row 1 accessed 1,000,000 times, then row 2 is : 1,000,000/(2^1.1)=466,516 times, row 3: 1,000,000/(2^1.1)=298,652 times, …, row id=10000 : 1,000,000/(10,000^1.1) = 39 times.

Obviously with θ=0 we are getting uniform distribution – each row is accessed equal times ( for row X: 1/(X^0) ).

There is a research that shows that user behavior can be described by this distribution: Zipf, Power-laws, and Pareto – a ranking tutorial

To see distribution on graphs, I took tables with 1mln rows and run row lookup 1 million times.

There are histograms on how many times each row selected for different θ: 0.5, 0.9, 1.1:

The curve is very skewed, so I zoomed graphs to show only 0-100k level:

I implemented Zipf for sysbench, right now it is in a separate tree https://code.launchpad.net/~vadim-tk/sysbench/zipf-distribution, you are welcome to try if it sounds interesting.

I am going to run couple incoming benchmarks with this distribution.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

New distribution of random generator for sysbench – Zipf

Май 10th, 2012

Sysbench has three distribution for random numbers: uniform, special and gaussian. I mostly use uniform and special, and I feel that both do not fully reflect my needs when I run benchmarks. Uniform is stupidly simple: for a table with 1 mln rows, each row gets equal amount of hits. This barely reflects real system, it also does not allow effectively test caching solution, each row can be equally put into cache or removed. That’s why there is special distribution, which is better, but to other extreme – it is skewed to very small percentage of rows, which makes this distribution good to test cache, but it is hard to emulate high IO load.

That’s why I was looking for alternatives, and Zipfian distribution seems decent one. This distribution has a parameter θ (theta), which defines how skewed the distribution is. A physical sense of this parameter, if to apply to database tables, is following: say row 1 accessed N, then row 2 is accessed 2^θ less times, row 3 is accessed 3^θ less, …, row X is accessed X^θ less times.
Say θ=1.1, then if row 1 accessed 1,000,000 times, then row 2 is : 1,000,000/(2^1.1)=466,516 times, row 3: 1,000,000/(2^1.1)=298,652 times, …, row id=10000 : 1,000,000/(10,000^1.1) = 39 times.

Obviously with θ=0 we are getting uniform distribution – each row is accessed equal times ( for row X: 1/(X^0) ).

There is a research that shows that user behavior can be described by this distribution: Zipf, Power-laws, and Pareto – a ranking tutorial

To see distribution on graphs, I took tables with 1mln rows and run row lookup 1 million times.

There are histograms on how many times each row selected for different θ: 0.5, 0.9, 1.1:

The curve is very skewed, so I zoomed graphs to show only 0-100k level:

I implemented Zipf for sysbench, right now it is in a separate tree https://code.launchpad.net/~vadim-tk/sysbench/zipf-distribution, you are welcome to try if it sounds interesting.

I am going to run couple incoming benchmarks with this distribution.

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Testing Fusion-io ioDrive – now with driver 3.1

Май 8th, 2012

In my previous post with results for Fusion-io ioDrive we saw some instability in results, I was pointed that it may be fixed in new drivers VSL 3.1.1. I am not sure if this driver is available for everyone – if you are interested, please contact your Fusion-io support representative. I installed new drivers and firmware, and in fact, the result improved.

Information about driver and firmware: Firmware v6.0.0, rev 107006. Fusion-io driver version: 3.1.1 build 172.

Actually an upgrade was not flawless, after a firmware upgrade I had to perform low-level formatting, which erase all data. So if you want to do the same – make sure you copy your data.

So there are results for driver 3.1 (with comparison to previous driver 2.3)

Random writes:

For random writes there is not much improvements, the throughput is about the same.

Random reads:

But there is a significant improvement for random reads. The results is stable on 640 MiB/sec level and it is higher than previously.

Sync random reads per threads, throughput:

Response time:

Again, there is improvement in throughput, in both in quality and absolute value. For response time – in some cases, there is 2x improvement.

So it seems for Fusion-io ioDrive it is worth to consider upgrade to 3.1 Driver (remember to copy your data before).

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Testing Fusion-io ioDrive – now with driver 3.1

Май 8th, 2012

In my previous post with results for Fusion-io ioDrive we saw some instability in results, I was pointed that it may be fixed in new drivers VSL 3.1.1. I am not sure if this driver is available for everyone – if you are interested, please contact your Fusion-io support representative. I installed new drivers and firmware, and in fact, the result improved.

Information about driver and firmware: Firmware v6.0.0, rev 107006. Fusion-io driver version: 3.1.1 build 172.

Actually an upgrade was not flawless, after a firmware upgrade I had to perform low-level formatting, which erase all data. So if you want to do the same – make sure you copy your data.

So there are results for driver 3.1 (with comparison to previous driver 2.3)

Random writes:

For random writes there is not much improvements, the throughput is about the same.

Random reads:

But there is a significant improvement for random reads. The results is stable on 640 MiB/sec level and it is higher than previously.

Sync random reads per threads, throughput:

Response time:

Again, there is improvement in throughput, in both in quality and absolute value. For response time – in some cases, there is 2x improvement.

So it seems for Fusion-io ioDrive it is worth to consider upgrade to 3.1 Driver (remember to copy your data before).

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN

Testing Virident FlashMAX 1400

Май 5th, 2012

I still continue to run benchmarks of different SSD cards. This time I show numbers for Virident FlashMAX 1400. This is a MLC PCIe SSD device. There are couple notes on these results.
First, this time I use a different server. For this benchmark it is Cisco UCS C250, while for previous results I used HP ProLiant DL380 G6.

Second note is, that I use a mode “turbo=1″ for Virident card. What does that mean? Apparently PCIe specification has a limitation on available power. If I am not mistaken it is 25W, however Virident to provide full write performance requires 28W. And while many servers can handle 28W on PCIe, this is a non-standard mode, and Virident by default uses 25W (turbo=0). To force full power, I load a driver with turbo=1. I also use “maxperformance” formatting for Virident, which gives less capacity (1.2TB visible for user), but reserves internally more space to provide better write performance.

So as usually I start with random writes, async.

Soon after initial period, the result stabilizes at 550 MiB/sec level.

Random read, async:

Random read throughput is very close to perfect line, and it is 1450 MiB/sec.
This is best read throughput I’ve seen so far in my benchmarks.

To see distribution of response time, the results for random read synchronous IO.

There we can see that 1450 MiB/sec is not quite achievable in sync mode, and only 64 threads are getting close.

Response time:

In the conclusion, from all tested cards, Virident FlashMAX shows the most stable results and the best absolute performance so far.

For reference, other results in series:

Follow @VadimTk


PlanetMySQL Voting: Vote UP / Vote DOWN