Archive for the ‘ec2’ Category

Benchmarking single-row insert performance on Amazon EC2

Май 16th, 2012

I have been working for a customer benchmarking insert performance on Amazon EC2, and I have some interesting results that I wanted to share. I used a nice and effective tool iiBench which has been developed by Tokutek. Though the “1 billion row insert challenge” for which this tool was originally built is long over, but still the tool serves well for benchmark purposes.

OK, let’s start off with the configuration details.

Configuration

First of all let me describe the EC2 instance type that I used.

EC2 Configuration

I chose m2.4xlarge instance as that’s the instance type with highest memory available, and memory is what really really matters.

High-Memory Quadruple Extra Large Instance
68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

As for the IO configuration I chose 8 x 200G EBS volumes in software RAID 10.

Now let’s come to the MySQL configuration.

MySQL Configuration

I used Percona Server 5.5.22-55 for the tests. Following is the configuration that I used:

## InnoDB options
innodb_buffer_pool_size         = 55G
innodb_log_file_size            = 1G
innodb_log_files_in_group       = 4
innodb_buffer_pool_instances    = 4
innodb_adaptive_flushing        = 1
innodb_adaptive_flushing_method = estimate
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_max_dirty_pages_pct      = 50
innodb_io_capacity              = 800
innodb_read_io_threads          = 8
innodb_write_io_threads         = 4
innodb_file_per_table           = 1

## Disabling query cache
query_cache_size                = 0
query_cache_type                = 0

You can see that the buffer pool is sized at 55G and I am using 4 buffer pool instances to reduce the contention caused by buffer pool mutexes. Another important configuration that I am using is that I am using “estimate” flushing method available only on Percona Server. The “estimate” method reduces the impact of traditional InnoDB log flushing, which can cause downward spikes in performance. Other then that, I have also disabled query cache to avoid contention caused by query cache on write heavy workload.

OK, so that was all about the configuration of the EC2 instance and MySQL.

Now as far as the benchmark itself is concerned, I made no code changes to iiBench, and used the version available here. But I changed the table to use range partitioning. I defined a partitioning scheme such that every partition would hold 100 million rows.

Table Structure

The table structure of the table with no secondary indexes is as follows:

CREATE TABLE `purchases_noindex` (
  `transactionid` int(11) NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  PRIMARY KEY (`transactionid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (transactionid)
(PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

While the structure of the table with secondary indexes is as follows:

CREATE TABLE `purchases_index` (
  `transactionid` int(11) NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  PRIMARY KEY (`transactionid`),
  KEY `marketsegment` (`price`,`customerid`),
  KEY `registersegment` (`cashregisterid`,`price`,`customerid`),
  KEY `pdc` (`price`,`dateandtime`,`customerid`)
) ENGINE=InnoDB AUTO_INCREMENT=11073789 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (transactionid)
(PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

Also, I ran 5 instances of iiBench simultaneously to simulate 5 concurrent connections writing to the table, with each instance of iiBench writing 200 million single row inserts, for a total of 1 billion rows. I ran the test both with the table purchases_noindex which has no secondary index and only a primary index, and against the table purchases_index which has 3 secondary indexes. Another thing I would like to share is that, the size of the table without secondary indexes is 56G while the size of the table with secondary indexes is 181G.

Now let’s come down to the interesting part.

Results

With the table purchases_noindex, that has no secondary indexes, I was able to achieve an avg. insert rate of ~25k INSERTs Per Second, while with the table purchases_index, the avg. insert rate reduced to ~9k INSERTs Per Second. Let’s take a look at the graphs have a better view of the whole picture.

Note, in the above graph, we have “millions of rows” on the x-axis and “INSERTs Per Second” on the y-axis.
The reason why I have chosen to show “millions of rows” on the x-axis so that we can see the impact of growth in data-set on the insert rate.

We can see that adding the secondary indexes to the table has decreased the insert rate by 3x, and its not even consistent. While with the table having no secondary indexes, you can see that the insert rate is pretty much constant remaining between ~25k to ~26k INSERTs Per Second. But on the other hand, with the table having secondary indexes, we can see that there are regular spikes in the insert rate, and the variation in the rate can be classified as large, because it varies between ~6.5k to ~12.5k INSERTs per second, with noticeable spikes after every 100 million rows inserted.

I noticed that the insert rate drop was mainly caused by IO pressure caused by increase in flushing and checkpointing activity. This caused spikes in write activity to the point that the insert rate was decreased.

Conclusion

As we all now there are pros and cons to using secondary indexes. While secondary indexes cause read performance to improve, but they have an impact on the write performance. Well most of the apps rely on read performance and hence having secondary indexes is an obvious choice. But for those applications that are write mostly or that rely a lot on write performance, reducing the no. of secondary indexes or even going away with secondary indexes causes a write throughput increase of 2x to 3x. In this particular case, since I was mostly concerned with write performance, so I went ahead to choose a table structure with no secondary indexes. Other important things to consider when you are concerned with write performance is using partitioning to reduce the size of the B+tree, having multiple buffer pool instances to reduce contention problems caused by buffer pool mutexes, using “estimate” checkpoint method to reduce chances of log flush storms and disabling the query cache.


PlanetMySQL Voting: Vote UP / Vote DOWN

Taming the EC2 API

Декабрь 14th, 2011

I've been spending some time lately familiarizing myself with EC2, setting up some MySQL servers & clusters here and there, and doing some really basic configuration testing. One situation you'll run into when interacting with EC2 is that it gets unwieldy to use the AWS Management Console web interface for interacting with your instances. There ends up being lots of scrolling, lots of staring, and lots of sighs. Since I'm using SSH to connect to and interact with my instances, I want a reasonable way to find information about them on the Unix command line.

Amazon has an official set of tools [http://aws.amazon.com/developertools/351] that give you this information , at least theoretically. It is some gigantic distribution of shell scripts and Java madness that, if you are very patient, will eventually give you some information about your instances, in a format that is very difficult to work with.

$ time ./bin/ec2-describe-instances i-83c5d4e0
Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
RESERVATION     r-7db4731c      801025846226
INSTANCE        i-83c5d4e0      ami-31814f58                    stopped skysql-ec2      0               m1.small        2011-12-09T20:41:39+0000        us-east-1c    aki-805ea7e9                    monitoring-disabled             10.0.0.164      vpc-cd4fafa5    subnet-c44fafac ebs                                  paravirtual      xen             sg-134b547f     default
BLOCKDEVICE     /dev/sda1       vol-19ec6174    2011-12-10T01:30:32.000Z
TAG     instance        i-83c5d4e0      Name    ndb32-02

real    0m7.693s
user    0m10.119s
sys     0m0.451s

OK, it takes me about 7.5 seconds to get data about an instance, and it's given to me in 4 lines. If I get information about all of my data, I have no idea how I would be able to successfully grep through that to interact with any of it programatically. I went looking for a different solution, preferably one that would be faster, more flexible, and easier to use.

I found a great script called, simply, aws, written by Timothy Kay [http://timkay.com/aws/].

$ du -hsc ec2-api-tools*
 14M    ec2-api-tools-1.5.0.1-2011.11.30
 11M    ec2-api-tools.zip
 26M    total

$ ls -sk aws 
 76 aws

Ahem. I'll take a 76K perl script over a 14M mess any day. Let's see how it performs.

$ time aws din i-83c5d4e0
+------------+--------------+----------------------+------------------------------------------+------------+--------------+--------------------------+---------------------------------------------+--------------+----------------+-----------------+--------------+------------------+-----------------+---------------------------------------------+-------------------------------------------------------------------------------------------------+--------------+----------------+----------------+------------------------------------------------------------------------------------------------------------------------------------+--------------------+--------+------+----------+
| instanceId |   imageId    |    instanceState     |                  reason                  |  keyName   | instanceType |        launchTime        |                  placement                  |   kernelId   |   monitoring   |    subnetId     |    vpcId     | privateIpAddress | sourceDestCheck |                  groupSet                   |                                           stateReason                                           | architecture | rootDeviceType | rootDeviceName |                                                         blockDeviceMapping                                                         | virtualizationType | tagSet | key  |  value   |
+------------+--------------+----------------------+------------------------------------------+------------+--------------+--------------------------+---------------------------------------------+--------------+----------------+-----------------+--------------+------------------+-----------------+---------------------------------------------+-------------------------------------------------------------------------------------------------+--------------+----------------+----------------+------------------------------------------------------------------------------------------------------------------------------------+--------------------+--------+------+----------+
| i-83c5d4e0 | ami-31814f58 | code=80 name=stopped | User initiated (2011-12-10 01:29:51 GMT) | skysql-ec2 | m1.small     | 2011-12-09T20:41:39.000Z | availabilityZone=us-east-1c tenancy=default | aki-805ea7e9 | state=disabled | subnet-c44fafac | vpc-cd4fafa5 | 10.0.0.164       | true            | item= groupId=sg-134b547f groupName=default | code=Client.UserInitiatedShutdown message=Client.UserInitiatedShutdown: User initiated shutdown | i386         | ebs            | /dev/sda1      | item= deviceName=/dev/sda1 ebs= volumeId=vol-19ec6174 status=attached attachTime=2011-12-10T01:30:32.000Z deleteOnTermination=true | paravirtual        |        |      |          |
|            |              |                      |                                          |            |              |                          |                                             |              |                |                 |              |                  |                 |                                             |                                                                                                 |              |                |                |                                                                                                                                    |                    |        | Name | ndb32-02 |
+------------+--------------+----------------------+------------------------------------------+------------+--------------+--------------------------+---------------------------------------------+--------------+----------------+-----------------+--------------+------------------+-----------------+---------------------------------------------+-------------------------------------------------------------------------------------------------+--------------+----------------+----------------+------------------------------------------------------------------------------------------------------------------------------------+--------------------+--------+------+----------+

real    0m1.546s
user    0m0.123s
sys    0m0.035s

Well, the output format isn't exactly any more appealing than what you get from the Amazon tool, but it sure gives it to you a lot faster! A little poking around showed me that the aws tool allows you to forego the pretty-printing and get the actual XML that the tool receives from the AWS API.

$ aws --xml din i-83c5d4e0
<?xml version="1.0" encoding="UTF-8"?>
<DescribeInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2011-11-01/">
    <requestId>4e1bf76d-ad02-439b-b255-108e09713251</requestId>
    <reservationSet>
        <item>
            <reservationId>r-7db4731c</reservationId>
            <ownerId>801025846226</ownerId>
            <groupSet/>
            <instancesSet>
                <item>
                    <instanceId>i-83c5d4e0</instanceId>
                    <imageId>ami-31814f58</imageId>
                    <instanceState>
                        <code>80</code>
                        <name>stopped</name>
                    </instanceState>
                    <privateDnsName/>
                    <dnsName/>
                    <reason>User initiated (2011-12-10 01:29:51 GMT)</reason>
                    <keyName>skysql-ec2</keyName>
                    <amiLaunchIndex>0</amiLaunchIndex>
                    <productCodes/>
                    <instanceType>m1.small</instanceType>
                    <launchTime>2011-12-09T20:41:39.000Z</launchTime>
                    <placement>
                        <availabilityZone>us-east-1c</availabilityZone>
                        <groupName/>
                        <tenancy>default</tenancy>
                    </placement>
                    <kernelId>aki-805ea7e9</kernelId>
                    <monitoring>
                        <state>disabled</state>
                    </monitoring>
                    <subnetId>subnet-c44fafac</subnetId>
                    <vpcId>vpc-cd4fafa5</vpcId>
                    <privateIpAddress>10.0.0.164</privateIpAddress>
                    <sourceDestCheck>true</sourceDestCheck>
                    <groupSet>
                        <item>
                            <groupId>sg-134b547f</groupId>
                            <groupName>default</groupName>
                        </item>
                    </groupSet>
                    <stateReason>
                        <code>Client.UserInitiatedShutdown</code>
                        <message>Client.UserInitiatedShutdown: User initiated shutdown</message>
                    </stateReason>
                    <architecture>i386</architecture>
                    <rootDeviceType>ebs</rootDeviceType>
                    <rootDeviceName>/dev/sda1</rootDeviceName>
                    <blockDeviceMapping>
                        <item>
                            <deviceName>/dev/sda1</deviceName>
                            <ebs>
                                <volumeId>vol-19ec6174</volumeId>
                                <status>attached</status>
                                <attachTime>2011-12-10T01:30:32.000Z</attachTime>
                                <deleteOnTermination>true</deleteOnTermination>
                            </ebs>
                        </item>
                    </blockDeviceMapping>
                    <virtualizationType>paravirtual</virtualizationType>
                    <clientToken/>
                    <tagSet>
                        <item>
                            <key>Name</key>
                            <value>ndb32-02</value>
                        </item>
                    </tagSet>
                    <hypervisor>xen</hypervisor>
                </item>
            </instancesSet>
            <requesterId>058890971305</requesterId>
        </item>
    </reservationSet>
</DescribeInstancesResponse>

Sweet, sweet data! Hold on, though, I can't use grep to get at that. I'm going to have to remember how to interact with XML documents; I decided I had better see if I could dig up any XPath knowledge.

The next question was what tool I wanted to use to execute some XPath expressions against. I was not very keen on having to write an entire perl or python script to read the XML, build it into some DOM, and then loop several times over crusty data structures to get the data I wanted. I wanted to be able to do some more generalized things that are very easily accomplished in XPath, such as getting a list of instances based on a prefix of their Name, get a list of "stopped" instances, get a list of instances with public IP addresses, et cetera.

I figured there must be some command-line tool that would let me execute arbitrary XPath against an XML file. After poking around a while, I found XMLStarlet [http://xmlstar.sourceforge.net/]. Installing this on my MacBook Pro using Homebrew [http://mxcl.github.com/homebrew/] and I was off to the races.

After grappling for a very annoying amount of time with XML namespaces, I eventually figure I'd just strip the thing out so that I didn't have to deal with it. (If you leave the namespace in, you have to give it an alias and then specify that before every tag in your XPath expressions. No, thanks.)

$ cat strip_xmlns 
sed 's/ xmlns="[^>]*"//'

The xmlstarlet/xmlstar/xml tool works by specifying a template that includes some expression to match and some expressions to generate output. The tool does a lot, so some of the options can appear to be a bit verbose at first. Here's a very basic use of the tool to get just a list of instance IDs:

$ aws --xml din | strip_xmlns | xml sel -T -t -m '//instancesSet/item' -v 'instanceId' -n
i-d1dbceb2
i-afdacfcc
i-cbc2d7a8
i-99bfaafa
i-1d40547e
i-f7c5d494
i-83c5d4e0
i-77c4d514
i-75c4d516
i-47feee24
i-707d9512

You can see the XSLT that the tool is applying internally by using the -C option:

$ aws --xml din | strip_xmlns | xml sel -C -t -m '//instancesSet/item' -v 'instanceId' -n
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="//instancesSet/item">
<xsl:value-of select="instanceId"/>
<xsl:value-of select="'&#10;'"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

OK, so, there's a tool that will let me execute some XPath and get back information about my instances, that's nice. Instead of trying to parse some formatted output, I should be able to select the XML elements I want for a particular task.

Say I want the instance IDs of all instances that are stopped:

  aws --xml din | strip_xmlns | xml sel -T -t -m '//instancesSet/item[instanceState/name="stopped"]' -v 'instanceId' -n

Or maybe I want all instances with Names that start with the string "ndb":

  aws --xml din | strip_xmlns | xml sel -T -t -m '//instancesSet/item[starts-with(tagSet/item[key="Name"]/value, "ndb32")]' -v 'instanceId' -n

Instead of having to write several loops in perl or python, I'm able to write a very straightforward expression that matches just the nodes I want. Instead of writing that XPath every time, of course, I'll put a few of the more popular ones into a script along with some flexibility to provide arbitrary filtering. (I call this WHERE in the script because that's the first thing my DBMS-addled brain came up with!)

#!/bin/bash
while getopts "p:s:w:" OPTION
do
    case $OPTION in
        p)
            WHERE="[starts-with(tagSet/item[key='Name']/value, '$OPTARG')]"
            ;;
        s)
            WHERE="[instanceState/name = '$OPTARG']"
            ;;
        w)
            WHERE="$OPTARG"
            ;;
    esac
done

MATCHEXPR="/DescribeInstancesResponse/reservationSet/item/instancesSet/item$WHERE"

aws --xml din | strip_xmlns | xml sel -T -t -m "$MATCHEXPR" -o "instanceId    " -v instanceId -n -o "instanceName    " -v tagSet/item[key=\"Name\"]/value -n -o "privateIp    " -v privateIpAddress -n -o "ipAddress    " -v ipAddress -n -o "instanceState    " -v instanceState/name -n -n

$ ec2-ls -p ndb
$ ec2-ls -s stopped
$ ec2-ls -w "[instanceType='m1.small']"

My script returns several items that may or may not be of interest to others. Further extension to the script could easily make the list of items returned a bit more useful. From that basically reasonable if limited script, I vastly overreached my bash skills and turned it into this monstrosity:

#!/bin/bash
OUTPUT="instanceId;instanceName:tagSet/item[key='Name']/value;privateIp:privateIpAddress;ipAddress;instanceState:instanceState/name"
DELIM=" " #there might be a <tab> in there!
declare -a XMLARGS

push()            # Push item on stack.
{
        if [ -z "$1" ]    # Nothing to push?
        then
          return
        fi
        XMLARGS[${#XMLARGS[*]}]="$1"
        return
}


while getopts "p:s:w:o:d:D" OPTION
do
        case $OPTION in
                p)
                        WHERE="[starts-with(tagSet/item[key='Name']/value, '$OPTARG')]"
                        ;;
                s)
                        WHERE="[instanceState/name = '$OPTARG']"
                        ;;
                w)
                        WHERE="$OPTARG"
                        ;;
                o)
                        OUTPUT="$OPTARG"
                        ;;
                d)
                        DELIM="$OPTARG"
                        ;;
                D)
                        DEBUG=1
                        ;;
                
        esac
done
shift $((OPTIND-1)) #something about argument processing, supposedly

for i in "sel" "-T" "-t" "-m"; do
        push "$i"
done

MATCHEXPR="/DescribeInstancesResponse/reservationSet/item/instancesSet/item$WHERE"
push "$MATCHEXPR";

OLDIFS=$IFS;
IFS=";"
for f in $OUTPUT; do
        FIELDNAME=$(echo $f | cut -d':' -f 1)
        FIELDEXPR=$(echo $f | cut -d':' -f 2)
        if [[ -z $FIELDEXPR ]]; then
                FIELDEXPR=$FIELDNAME
        fi
        push "-o";
        push "$FIELDNAME$DELIM";
        push "-v";
        push "$FIELDEXPR";
        push "-n";
done
push "-n";
IFS=$OLDIFS


if [[ $DEBUG -eq 1 ]]; then
        echo "$MATCHEXPR" >&2 
        echo "${XMLARGS[@]}" >&2
fi

aws --xml din | strip_xmlns | xml "${XMLARGS[@]}"

I'm sure there are plenty of problems with that script, but at least now I can finally get the information I want about my EC2 instances!

$ ec2-ls -p ndb32 -o "instanceId;privateIp:privateIpAddress"

PlanetMySQL Voting: Vote UP / Vote DOWN

Public Clouds: Trust but Verify

Октябрь 14th, 2011

Review of Thursday’s Cloud Events in Boston

Everyone is well aware by now of the EC2 outage that Amazon had back in April and it would have surprised no one if that high profile had put a damper on cloud adoption. But judging what we heard yesterday at Boston’s two cloud events (MassTLC’s Cloud Computing Summit and Vilna’s Moving Your Data to the Cloud Panel), cloud solutions can work just fine. For example, there was the customer story told by Douglas Kim, Managing Director, Global Head, PaaS & Cloud Computing at PegaSystems. Pegasystems is a Boston tech company that started offering cloud versions of its BPM services to conservative Fortune 500 customers in regulation-laden fields such as healthcare and finance. After migrating over a major healthcare customer to the cloud, Kim asked the COO how they internally overcame the concerns about complying with HIPAA requirements as they considered the cloud. The COO admitted they were actually already facing $120M in HIPAA violations in the past year – from using their in-house solution! In other words, before throwing too many stones at Amazon (or other cloud providers), ask if you can really do better.

MassTLC: The monkey is not in the cloud

So, with two events focused on the cloud yesterday, we know bloggers, analysts, VCs and press are hot on the topic – how about actual adoption? Bruce Guptill, Senior Vice President and Head of Research at Saugatuck Technology noted that cloud adoption levels were about one in three for new IT applications last year, but that we are heading to a one in two tipping point in 2014. He further noted that buyers are demanding cloud offerings, which in turn is driving 90% of ISV’s to some sort of cloud based presence.

Michael Skok, a General Partner at North Bridge Venture Partners, noted that the biggest drivers for cloud adoption are agility, scalability and cost, based on a study his firm completed in conjunction with several research houses. That will continue to evolve. In five years, Skok thinks the drivers will hinge on innovation, mobility, APIs and competitive pressures. In terms of inhibitors, while security is always a top concern, Bob Shinn, Founder and Senior Managing Partner, Cloud Silver Lining noted that “even the CIA is in the cloud.” Portability was also observed to be a key item to address. Chris Brookins, VP of Engineering, Acquia noted that by resolving the portability issue (i.e., vendor lock-in) they have been able to grow to 60k users.

Business models were also up for debate. Dan Pelton, CIO, Enterasys Networks claimed that companies selling cloud applications should really follow the Google model to make money (subscription price based on number of users) because it is scalable and easy to understand. Kim argued for going even further with enterprise verticals to find ways to connect pricing to actual business value. This includes, as an example from the insurance industry, charging pennies per claim, which can be directly tied to the bottom line.

Vilna: Does anyone have a prayer to beat Google?

Alright, so how do we spur adoption, especially in the wake of public outages? Shin noted that even though FUD still exists, the reality is that “most people’s data centers are not close to the security and quality of Google.” Still have doubts? Kim suggests jumping in with both feet, but not blindly. For how to do this, he pointed to Netflix. They avoided catastrophic failures with Amazon because they implemented across zones and because they were constantly testing for failure, with what they call their “Chaos Monkey.” This is a program that randomly terminates computer processes and services in Netflix’s IT infrastructure architecture to continuously test resiliency. In the end, when it comes to public clouds, perhaps it is best to “trust but verify.”


PlanetMySQL Voting: Vote UP / Vote DOWN

Shard-Query EC2 images available

Май 12th, 2011

Infobright and InnoDB AMI images are now available

There are now demonstration AMI images for Shard-Query. Each image comes pre-loaded with the data used in the previous Shard-Query blog post. The data in the each image is split into 20 “shards”. This blog post will refer to an EC2 instances as a node from here on out. Shard-Query is very flexible in it’s configuration, so you can use this sample database to spread processing over up to 20 nodes.

The Infobright Community Edition (ICE) images are available in 32 and 64 bit varieties. Due to memory requirements, the InnoDB versions are only available on 64 bit instances. MySQL will fail to start on a micro instance, simply decrease the values in the /etc/my.cnf file if you really want to try micro instances.

Where to find the images

Amazon ID Name Arch Notes
ami-20b74949 shard-query-infobright-demo-64bit x86_64 ICE 3.5.2pl1. Requires m1.large or larger
ami-8eb648e7 shard-query-innodb-demo-64bit x86_64 Percona Server 5.5.11 with XtraDB. Requires m1.large or larger.
ami-f65ea19f shard-query-infobright-demo i686 ICE 3.5.2pl1 32bit. Requires m1.small or greater.
snap-073b6e68 shard-query-demo-data-flatfiles 30GB ext3 EBS This is an ext3 volume which contains the flat files for the demos, if you want to reload on your favorite storage engine or database

About the cluster

For best performance, there should be an even data distribution in the system. To get an even distribution, the test data was hashed over the values in the date_id column. There will be another blog post about the usage and performance of the splitter. It is multi-threaded(actually multi-process) and is able to hash split up to 50GB/hour of input data on my i970 test machine. It is possible to distribute splitting and/or loading among multiple nodes as well. Note that in the demonstration each node will contain redundant, but non-accessed data for all configurations of more than one node. This would not be the case in normal circumstances. The extra data will not impact performance because it will never be accessed.

Since both InnoDB and ICE versions of the data are available it is important to examine the differences in size. This will give us some interesting information about how Shard-Query will perform on each database. To do the size comparison, I used the du utility:

InnoDB file size on disk: 42GB (with indexes)

# du -sh *
203M    ibdata1
128M    ib_logfile0
128M    ib_logfile1
988K    mysql
2.1G    ontime1
2.1G    ontime10
2.1G    ontime11
2.1G    ontime12
2.1G    ontime13
2.1G    ontime14
2.1G    ontime15
2.1G    ontime16
2.1G    ontime17
2.1G    ontime18
2.1G    ontime19
2.1G    ontime2
2.1G    ontime20
2.1G    ontime3
2.1G    ontime4
2.1G    ontime5
2.1G    ontime6
2.1G    ontime7
2.1G    ontime8
2.1G    ontime9
212K    performance_schema
0       test

ICE size on disk: 2.5GB

# du -sh *
8.0K    bh.err
11M     BH_RSI_Repository
4.0K    brighthouse.ini
4.0K    brighthouse.log
4.0K    brighthouse.seq
964K    mysql
123M    ontime1
124M    ontime10
123M    ontime11
123M    ontime12
123M    ontime13
123M    ontime14
123M    ontime15
123M    ontime16
123M    ontime17
123M    ontime18
124M    ontime19
124M    ontime2
124M    ontime20
124M    ontime3
123M    ontime4
122M    ontime5
122M    ontime6
122M    ontime7
123M    ontime8
125M    ontime9

The InnoDB data directory size is 42GB, which is twice the original size of the input data. The ICE schema was discussed in the comments of the last post. ICE does not have any indexes (not even primary keys).

Here is the complete InnoDB schema from one shard. The schema is duplicated 20 times (but not the ontime_fact data):

DROP TABLE IF EXISTS `dim_airport`;
CREATE TABLE `dim_airport` (
  `airport_id` int(11) NOT NULL DEFAULT '0',
  `airport_code` char(3) DEFAULT NULL,
  `CityName` varchar(100) DEFAULT NULL,
  `State` char(2) DEFAULT NULL,
  `StateFips` varchar(10) DEFAULT NULL,
  `StateName` varchar(50) NOT NULL,
  `Wac` int(11) DEFAULT NULL,
  PRIMARY KEY (`airport_id`),
  KEY `CityName` (`CityName`),
  KEY `State` (`State`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Data from BTS ontime flight data.  Data for Origin and Destination airport data.';

CREATE TABLE `dim_date` (
  `Year` year(4) DEFAULT NULL,
  `Quarter` tinyint(4) DEFAULT NULL,
  `Month` tinyint(4) DEFAULT NULL,
  `DayofMonth` tinyint(4) DEFAULT NULL,
  `DayOfWeek` tinyint(4) DEFAULT NULL,
  `FlightDate` date NOT NULL,
  `date_id` smallint(6) NOT NULL,
  PRIMARY KEY (`date_id`),
  KEY `FlightDate` (`FlightDate`),
  KEY `Year` (`Year`,`Quarter`,`Month`,`DayOfWeek`),
  KEY `Quarter` (`Quarter`,`Month`,`DayOfWeek`),
  KEY `Month` (`Month`,`DayOfWeek`),
  KEY `DayOfWeek` (`DayOfWeek`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains the date information from the BTS ontime flight data.  Note dates may not be in date_id order';
/*!40101 SET character_set_client = @saved_cs_client */;

CREATE TABLE `dim_flight` (
  `UniqueCarrier` char(7) DEFAULT NULL,
  `AirlineID` int(11) DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `FlightNum` varchar(10) DEFAULT NULL,
  `flight_id` int(11) NOT NULL DEFAULT '0',
  `AirlineName` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`flight_id`),
  KEY `UniqueCarrier` (`UniqueCarrier`,`AirlineID`,`Carrier`),
  KEY `AirlineID` (`AirlineID`,`Carrier`),
  KEY `Carrier` (`Carrier`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains information on flights, and what airline offered those flights and the flight number of the flight.  Some data hand updated.';

--
-- Table structure for table `ontime_fact`
--

CREATE TABLE `ontime_fact` (
  `date_id` int(11) NOT NULL DEFAULT '0',
  `origin_airport_id` int(11) NOT NULL DEFAULT '0',
  `dest_airport_id` int(11) NOT NULL DEFAULT '0',
  `flight_id` int(11) NOT NULL DEFAULT '0',
  `TailNum` varchar(50) DEFAULT NULL,
  `CRSDepTime` int(11) DEFAULT NULL,
  `DepTime` int(11) DEFAULT NULL,
  `DepDelay` int(11) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `DepDel15` int(11) DEFAULT NULL,
  `DepartureDelayGroups` int(11) DEFAULT NULL,
  `DepTimeBlk` varchar(20) DEFAULT NULL,
  `TaxiOut` int(11) DEFAULT NULL,
  `WheelsOff` int(11) DEFAULT NULL,
  `WheelsOn` int(11) DEFAULT NULL,
  `TaxiIn` int(11) DEFAULT NULL,
  `CRSArrTime` int(11) DEFAULT NULL,
  `ArrTime` int(11) DEFAULT NULL,
  `ArrDelay` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `ArrDel15` int(11) DEFAULT NULL,
  `ArrivalDelayGroups` int(11) DEFAULT NULL,
  `ArrTimeBlk` varchar(20) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  `CRSElapsedTime` int(11) DEFAULT NULL,
  `ActualElapsedTime` int(11) DEFAULT NULL,
  `AirTime` int(11) DEFAULT NULL,
  `Flights` int(11) DEFAULT NULL,
  `Distance` int(11) DEFAULT NULL,
  `DistanceGroup` tinyint(4) DEFAULT NULL,
  `CarrierDelay` int(11) DEFAULT NULL,
  `WeatherDelay` int(11) DEFAULT NULL,
  `NASDelay` int(11) DEFAULT NULL,
  `SecurityDelay` int(11) DEFAULT NULL,
  `LateAircraftDelay` int(11) DEFAULT NULL,
  `FirstDepTime` varchar(10) DEFAULT NULL,
  `TotalAddGTime` varchar(10) DEFAULT NULL,
  `LongestAddGTime` varchar(10) DEFAULT NULL,
  `DivAirportLandings` varchar(10) DEFAULT NULL,
  `DivReachedDest` varchar(10) DEFAULT NULL,
  `DivActualElapsedTime` varchar(10) DEFAULT NULL,
  `DivArrDelay` varchar(10) DEFAULT NULL,
  `DivDistance` varchar(10) DEFAULT NULL,
  `Div1Airport` varchar(10) DEFAULT NULL,
  `Div1WheelsOn` varchar(10) DEFAULT NULL,
  `Div1TotalGTime` varchar(10) DEFAULT NULL,
  `Div1LongestGTime` varchar(10) DEFAULT NULL,
  `Div1WheelsOff` varchar(10) DEFAULT NULL,
  `Div1TailNum` varchar(10) DEFAULT NULL,
  `Div2Airport` varchar(10) DEFAULT NULL,
  `Div2WheelsOn` varchar(10) DEFAULT NULL,
  `Div2TotalGTime` varchar(10) DEFAULT NULL,
  `Div2LongestGTime` varchar(10) DEFAULT NULL,
  `Div2WheelsOff` varchar(10) DEFAULT NULL,
  `Div2TailNum` varchar(10) DEFAULT NULL,
  `Div3Airport` varchar(10) DEFAULT NULL,
  `Div3WheelsOn` varchar(10) DEFAULT NULL,
  `Div3TotalGTime` varchar(10) DEFAULT NULL,
  `Div3LongestGTime` varchar(10) DEFAULT NULL,
  `Div3WheelsOff` varchar(10) DEFAULT NULL,
  `Div3TailNum` varchar(10) DEFAULT NULL,
  `Div4Airport` varchar(10) DEFAULT NULL,
  `Div4WheelsOn` varchar(10) DEFAULT NULL,
  `Div4TotalGTime` varchar(10) DEFAULT NULL,
  `Div4LongestGTime` varchar(10) DEFAULT NULL,
  `Div4WheelsOff` varchar(10) DEFAULT NULL,
  `Div4TailNum` varchar(10) DEFAULT NULL,
  `Div5Airport` varchar(10) DEFAULT NULL,
  `Div5WheelsOn` varchar(10) DEFAULT NULL,
  `Div5TotalGTime` varchar(10) DEFAULT NULL,
  `Div5LongestGTime` varchar(10) DEFAULT NULL,
  `Div5WheelsOff` varchar(10) DEFAULT NULL,
  `Div5TailNum` varchar(10) DEFAULT NULL,
  KEY `date_id` (`date_id`),
  KEY `flight_id` (`flight_id`),
  KEY `origin_airport_id` (`origin_airport_id`),
  KEY `dest_airport_id` (`dest_airport_id`),
  KEY `DepDelay` (`DepDelay`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains all avaialble data from 1988 to 2010';

mysql> use ontime1;
Database changed

mysql> show table status like 'ontime_fact'\G
*************************** 1. row ***************************
           Name: ontime_fact
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 6697533
 Avg_row_length: 241
    Data_length: 1616904192
Max_data_length: 0
   Index_length: 539279360
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2011-05-10 04:26:14
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: Contains all avaialble data from 1988 to 2010
1 row in set (0.00 sec)

With ICE, after compression there is only 2.5GB of data, so ICE gets over 16:1 compression ratio, which is quite nice. Each shard contains only 128MB of data!

Storage engine makes a big difference

In general, a column store performs about 8x-10x better than a row store for queries which access a significant amount of data. One big reason for this is the excellent compression that RLE techniques provide.
I have not loaded InnoDB compressed tables yet but since InnoDB compression is not RLE, I doubt it will have the same impact.

For large datasets effective compression results in the need for fewer nodes in order to keep data entirely in memory. This frees disk to use on-disk temporary storage for hash joins and other background operations. This will have a direct impact in our query response times and throughput.

Setting up a cluster using the AMI images

You can easily test Shard-Query for yourself. Spin up the desired number of EC2 instances using on of the the AMI images. You should spin a number of instances that evenly divides into 20 for best results. There is a helpful utility (included in the image) to help configure the cluster and it uses a copy of this text on this page. To use it, ensure:

  1. That only the instances that you want to use are shown in the EC2 console.
  2. That the “private ip” field is selected in the list of columns to show (click show/hide to change the columns)
  3. That the “public dns” field is selected

SSH to the public DNS entry of the node on the list of nodes. This node will become “shard1″.

Now, in the EC2 console hit CTRL-A to select all text on the page and then CTRL-C to copy it. Paste this into a text file on shard1 called “/tmp/servers.txt” and run the following commands:

$ cat servers.txt | grep "10\."| grep -v internal |tee hosts.internal
[host list omitted]

Now you need to set up the hosts file:

sudo su -
# cat hosts.internal | ~ec2-user/tools/mkhosts >> /etc/hosts

# ping shard20
PING shard20 (10.126.15.34) 56(84) bytes of data.
64 bytes from shard20 (10.126.15.34): icmp_seq=1 ttl=61 time=0.637 ms
...

Note: There is no need to put that hosts file on your other nodes unless you want to run workers on them.

Generate a cluster configuration

There is a script provided to generate the shards.ini file for testing an cluster of 1 to 20 nodes.

cd shard-query

#generate a config for 20 shards (adjust to your number of nodes)
php genconfig 20 > shards.ini

Running the test

For best performance, you should run the workers on one or two nodes. You should start two workers per core in the cluster.

First start gearmand:

gearmand -p 7000 -d

Then start the workers on node 1 (assuming a 20 node cluster):

cd shard-query
./start_workers 80

I normally start (2 * TOTAL_CLUSTER_CORES) workers. That is, if you have 20 machines, each with 2 cores, run 80 workers.

Test the system. You should see the following row count (the first number is wall time, the second exec time, the third parse time).

$ echo "select count(*) from ontime_fact;" | ./run_query

Array
(
    [count(*)] => 135125787
)
1 rows returned (0.084244966506958s, 0.078309059143066s, 0.0059359073638916s)

Execute the test:

As seen above, the run_query script will run one more more semicolon terminated SQL statements. The queries for the benchmark are in ~ec2-user/shard-query/queries.sql.

I have also provided a convenient script which will summarize the output from the ./run_query command, called pivot_results

cd shard-query/
$ ./run_query < queries.sql | tee raw |./pivot_results &
[1] 12359
$ tail -f ./raw
-- Q1
...

At the end, you will get a result output that is easy to graph in a spreadsheet:

$ cat raw | ./pivot_results
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8.0,Q8.1,Q8.2,Q8.3,Q8.4,Q9,Q10,Q11
34.354,60.978,114.175,27.138,45.751,14.905,14.732,34.946,126.599,250.222,529.287,581.295,11.042,63.366,14.573

InnoDB my.cnf

[client]
port=3306
socket=/tmp/mysql-inno.sock

[mysqld]
socket=/tmp/mysql-inno.sock
default-storage-engine=INNODB
innodb-buffer-pool-instances=2
innodb-buffer-pool-size=5600M
innodb-file-format=barracuda
innodb-file-per-table
innodb-flush-log-at-trx-commit=1
innodb-flush-method=O_DIRECT
innodb-ibuf-active-contract=1
innodb-import-table-from-xtrabackup=1
innodb-io-capacity=1000
innodb-log-buffer-size=32M
innodb-log-file-size=128M
innodb-open-files=1000
innodb_fast_checksum
innodb-purge-threads=1
innodb-read-ahead=linear
innodb-read-ahead-threshold=8
innodb-read-io-threads=16
innodb-recovery-stats
innodb-recovery-update-relay-log
innodb-replication-delay=#
innodb-rollback-on-timeout
innodb-rollback-segments=16
innodb-stats-auto-update=0
innodb-stats-on-metadata=0
innodb-stats-sample-pages=256
innodb-stats-update-need-lock=0
innodb-status-file
innodb-strict-mode
innodb-thread-concurrency=0
innodb-thread-concurrency-timer-based
innodb-thread-sleep-delay=0
innodb-use-sys-stats-table
innodb-write-io-threads=4
join-buffer-size=16M
key-buffer-size=64M
local-infile=on
lock-wait-timeout=300
log-error=/var/log/mysqld-innodb.log
max-allowed-packet=1M
net-buffer-length=16K
#we value throughput over response time, get a good plan
optimizer-prune-level=0
partition=ON
port=3306
read-buffer-size=512K
read-rnd-buffer-size=1M
skip-host-cache
skip-name-resolve
sort-buffer-size=512K
sql-mode=STRICT_TRANS_TABLES
symbolic-links
table-definition-cache=16384
table-open-cache=128
thread-cache-size=32
thread-stack=256K
tmp-table-size=64M
transaction-isolation=READ-COMMITTED
user=mysql
wait-timeout=86400

To be continued

You can now set up a cluster from 1 to 20 nodes for testing. This way you can verify the numbers in my next blog post. I will compare performance of various cluster sizes on both storage engines.


PlanetMySQL Voting: Vote UP / Vote DOWN

Amazon now accepts hard drives for EC2 data transfer

Июнь 13th, 2010

I guess they got tired of people sending angry emails about data transfer fees:

“Amazon provides an online calculator to help customers decide whether it makes financial sense to ship data via mail rather than uploading over the Internet. You plug in the number of terabytes, devices, average file size, return shipping information and other factors, and find out how much the data transfer would cost via mail compared to standard Internet uploads.

For example, transferring data from a single device containing 2TB would require 26 hours of data loading time and cost $144.74. Uploading the same amount of data over the Internet would cost $204.80. The calculator does not show how long the Internet transfer would take.”

http://www.networkworld.com/news/2010/061010-amazon-cloud-fedex.html


PlanetMySQL Voting: Vote UP / Vote DOWN

Piper Jaffray on the Cloud

Март 16th, 2010

Piper Jaffray has published a 300+ page study on the cloud computing industry based on a recent survey undertaken of 100 CIOs. Bottom line, cloud computing is expected to grow significantly over the next five years. 

    Survey respondents expect the mix of cloud computing to escalate strongly to 13.5% in five years. This equates to a five-year CAGR of 19.2%, or 23.9% when we also incorporate IDC’s forecast that total software budgets will grow 4.7% annually. In other words, software spending will grow gradually in the next five years, but the mix of spend allocated to cloud-based applications will likely surge rapidly. Another way to think about the data is that the Cloud Computing market is expected to grow five times as fast as the broader software market: 23.9% vs. 4.7%.

If anything, I think the prediction is conservative and the impact could be much larger in magnitude when mainstream adoption occurs.  But the risk is that adoption takes longer, just as it did for open source software.  And as the report indicates, open source is powering much of the cloud computing that's going on:

    The next-generation Cloud Computing data centers are NOT running Microsoft Windows; they are increasingly leveraging the compelling economics of open source components. For example, the data centers powering Amazon, Google, and salesforce.com all run on Linux and other open source technologies. In fact, Red Hat’s operating system and the MySQL database are key components to many of the leading-edge Clouds being developed today. 

Why is this occurring? Because open source leverages a global community development process which results in a product that evolves rapidly, provides transparency into the source code dynamics, and surpasses other products in terms of security and reliability – all at a lower total cost of ownership (TCO) than traditional offerings.


PlanetMySQL Voting: Vote UP / Vote DOWN

Dissection of EC2 / EBS volume

Август 8th, 2009

So during preparation of XtraDB template for EC2 I wanted to understand what IO characteristics we can expect from EBS volume ( I am speaking about single volume, not RAID as in my previous post). Yasufumi did some benchmarks and pointed me on interesting behavior, there seems several level of caching on EBS volume.

Let me show you. I did sysbench random read IO benchmark on files with size from 256M to 5GB with step 256M. And, as Morgan pointed me, I previously made first write, to avoid first-write penalty:

dd if=/dev/zero of=/dev/sdk bs=1M

for reference script is:

CODE:
  1. #!/bin/sh
  2. set -u
  3. set -x
  4. set -e
  5.  
  6. for size in `seq 256 256 5888`; do
  7.    for mode in rndrd; do
  8.       ./sysbench --test=fileio --file-num=1 --file-total-size=${size}M prepare
  9.       for threads in 1; do
  10.          echo PARAMS $size $mode $threads> sysbench-size-$size-mode-$mode-threads-$threads
  11.          ./sysbench --test=fileio --file-total-size=${size}M --file-test-mode=$mode\
  12.             --max-time=60 --max-requests=10000000 --num-threads=$threads --init-rng=on \
  13.             --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 run \
  14.            >> sysbench-size-$size-mode-$mode-threads-$threads 2>&1
  15.       done
  16.      ./sysbench --test=fileio --file-total-size=${size}M cleanup
  17.    done
  18. done

And raw results (for m.large instance, though for m.xlarge it was similar) are available on page
https://spreadsheets.google.com/ccc?key=0AjsVX7AnrCYwdFlBVW9KWVJGUGFqeVdpUHY0Y0VXYXc&hl=en, see Sheet "256_5GB filesize".

Results in graph are:

randrd_sizes
So can you see several levels of results
256M-1.25G , 1.5G - 2.25G, 2.5G + .

With 1.5G-2.25G we see performance comparable with RAID10 on 4 disks, and with
2.5G+ results are similar for single HDD performance.

So we may guess the schema of storage is
schema

So running InnoDB on database bigger 2.5G, you may expect performance as from single HDD, and you may consider some RAID setup, see my previous post
EC2/EBS single and RAID volumes IO benchmark


Entry posted by Vadim | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

EC2/EBS single and RAID volumes IO benchmark

Август 7th, 2009

During preparation of Percona-XtraDB template to run in RightScale environment, I noticed that IO performance on EBS volume in EC2 cloud is not quite perfect. So I have spent some time benchmarking volumes. Interesting part with EBS volumes is that you see it as device in your OS, so you can easily make software RAID from several volumes.

So I created 4 volumes ( I used m.large instance), and made:

RAID0 on 2 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 2 -l 0 /dev/sdj /dev/sdk

RAID0 on 4 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 4 -l 0 /dev/sdj /dev/sdk /dev/sdl /dev/sdm

RAID5 on 3 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 3 -l 0 /dev/sdj /dev/sdk /dev/sdl

RAID10 on 4 volumes in two steps:

mdadm -v --create /dev/md0 --chunk=256 --level=raid1 --raid-devices=2 /dev/sdj /dev/sdk
mdadm -v --create /dev/md1 --chunk=256 --level=raid1 --raid-devices=2 /dev/sdm /dev/sdl

and
mdadm -v --create /dev/md2 --chunk=256 --level=raid0 --raid-devices=2 /dev/md0 /dev/md1

And also in Linux you can create tricky RAID10,f2 (you can read what is this here http://www.mythtv.org/wiki/RAID)

mdadm -C /dev/md0 --chunk=256 -n 4 -l 10 -p f2 /dev/sdj /dev/sdk /dev/sdk /dev/sdm

and also I tested IO on single volume.

I used xfs filesystem mounted with noatime,nobarrier options

and for benchmark I used sysbench fileio modes on 16GB file with next script:

CODE:
  1. #!/bin/sh
  2. set -u
  3. set -x
  4. set -e
  5.  
  6. for size in 256M 16G; do
  7.    for mode in seqwr seqrd rndrd rndwr rndrw; do
  8.       ./sysbench --test=fileio --file-num=1 --file-total-size=$size prepare
  9.       for threads in 1 4 8 16; do
  10.          echo PARAMS $size $mode $threads> sysbench-size-$size-mode-$mode-threads-$threads
  11.          ./sysbench --test=fileio --file-total-size=$size --file-test-mode=$mode\
  12.             --max-time=60 --max-requests=10000000 --num-threads=$threads --init-rng=on \
  13.             --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 run \
  14.            >> sysbench-size-$size-mode-$mode-threads-$threads 2>&1
  15.       done
  16.      ./sysbench --test=fileio --file-total-size=$size cleanup
  17.    done
  18. done

So tested modes: seqrd (sequential read), seqwr (sequential write), rndrd (random read), rndwr (random write), rndrw (random read-write). And sysbench uses 16KB pagesize to emulate work of InnoDB with 16KB pagesize.

Raw results you may find in Google Docs https://spreadsheets.google.com/ccc?key=0AjsVX7AnrCYwdFlBVW9KWVJGUGFqeVdpUHY0Y0VXYXc&hl=en
, but let me show most interesting results from my point of view. On graphs I show requests / second (more is better) and response time in ms for 95% cases (less is better).
random read
random write
random read-write

What I see from the results is that if you are looking for IO performance in EC2/EBS environment it's definitely worth to consider some RAID setup.
RAID5 does not show benefits comparing with others, and RAID10,f2 is worse than RAID10.
But speaking RAID0 vs RAID10 it's your call. For sure in regular server I'd never suggest RAID0 for database, but speaking about EBS I am not sure what guarantee Amazon gives here. I'd expect under EBS volume there already exists redundant array, and it may not worth to add additional redundancy, but I am not sure in that.
For now I'd consider RAID10 on 4 - 10 volumes.
And of course to get benefit from multi-threading IO in MySQL you need to use XtraDB or MySQL 5.4 ®

However there may be small problem with backup over EBS. On single EBS volume you can just do snapshot, but on several volumes it may be tricky. But in this case you may consider LVM snapshots or XtraBackup


Entry posted by Vadim | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN