Archive for the ‘hardware’ Category

dbbenchmark.com – Site launched

August 28th, 2010

Welcome to DBbenchmarks.com, a publicly accessible database that tracks anonymously submitted data about MySQL server performance. You can use this site to research the performance of certain types of hardware when running MySQL. Our open-source benchmarking script is free to own and use; we only ask that you allow the script to connect to this database and submit the results. All results and data collected are anonymous and viewable on this site. We only track performance data from MySQL; you can see the list on the About page.

Check out the database of benchmarks here: [link]



Four short links: 1 July 2010

July 1st, 2010

  1. Conflict Minerals and Blood Tech (Joey Devilla) -- electronic components have a human and environmental cost. I remember Saul Griffith asking me, "do you want to kill gorillas or dolphins?" for one component. Now we can add child militias and horrific rape to the list. (via Simon Willison)
  2. Meteor -- an open source HTTP server that serves streaming data feeds (for apps that need Comet-style persistent connections). (via gianouts on Delicious)
  3. Hobby King RC Store -- online source for remote control goodness, as recommended by Dan Shapiro at Foo.
  4. RethinkDB -- MySQL storage engine optimised for SSD drives. See also TechCrunch article.



mount: /dev/sdb1 already mounted or /mysql busy

June 10th, 2010

We added a 500GB 7.2K SATA/300 Hitachi Deskstar E7K500 16MB disk to one of our dev servers, partitioned it using fdisk, and formatted the partition with ext3. When we tried mounting it, we got the following error:

[root@xyz user]# mount -t ext3  /dev/sdb1 /mysql
mount: /dev/sdb1 already mounted or /mysql busy

lsof didn’t show any open files that might be linked to this problem, nor was any “famd” process running. Finally, the following steps to remove the logical devices from the device-mapper driver fixed the problem.

[root@xyz user]# dmsetup ls
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab   (253, 0)
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab1  (253, 1)

[root@xyz user]# dmsetup remove ddf1_44656c6c202020201028001510281f033832b7a2f6678dab1
[root@xyz user]# dmsetup ls
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab   (253, 0)

[root@xyz user]# dmsetup remove ddf1_44656c6c202020201028001510281f033832b7a2f6678dab

[root@xyz user]# dmsetup ls
No devices found

Mounting using the command “mount -t ext3  /dev/sdb1 /mysql”  after the above steps worked fine.
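
The removal order matters in the session above: the partition mapping (the name ending in “1”) was removed before the mapping it sits on. A minimal Python sketch of that ordering follows. It only parses “dmsetup ls” text and builds the command strings; it does not run anything. The helper names are hypothetical, and treating “longer name means dependent mapping” is my assumption based on this one session, not a general device-mapper rule.

```python
import re

# Sample `dmsetup ls` output copied from the session above.
SAMPLE = """\
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab   (253, 0)
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab1  (253, 1)
"""

# Matches "<name>  (<major>, <minor>)"; also tolerates "major:minor".
LINE_RE = re.compile(r"^(\S+)\s+\((\d+)[,:]\s*(\d+)\)\s*$")


def removal_order(dmsetup_ls_output):
    """Return mapping names with the longest names first.

    Assumption: a mapping whose name extends another one (for example
    by a trailing partition number) depends on it, so it must go first.
    """
    names = []
    for line in dmsetup_ls_output.splitlines():
        match = LINE_RE.match(line)
        if match:  # blank lines and "No devices found" are skipped
            names.append(match.group(1))
    return sorted(names, key=len, reverse=True)


def removal_commands(dmsetup_ls_output):
    """Build the `dmsetup remove` command strings in a safe order."""
    return ["dmsetup remove " + name for name in removal_order(dmsetup_ls_output)]


if __name__ == "__main__":
    for command in removal_commands(SAMPLE):
        print(command)
```

Review the printed commands before running them by hand; removing a mapping that is really in use (for example a genuine fake-RAID volume) would be destructive.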


Tagged: disk, mount, troubleshooting

change accelerator cache ratio

June 7th, 2010

I was given the task of checking the array accelerator cache ratio and seeing whether it was set to an optimal level. Our ideal preference was a read/write ratio of 0/100.

The machine configuration is HP DL180 G5, 2 x Xeon L5420 2.50GHz, 15.7GB / 16GB 667MHz DDR2, 6 x 300GB-15K SAS. This machine was running MySQL 5.1.36 using the InnoDB plugin.

The command line utility to check the controller configuration is “hpacucli”. Navigating using hpacucli is very straightforward.

“ctrl all show config detail” gives you the entire controller configuration.

=> ctrl all show config detail

Smart Array P400 in Slot 5
Bus Interface: PCI
Slot: 5
Serial Number: P61630K9SW31NL
Cache Serial Number: PA82C0J9SW02H1
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Chassis Slot:
Hardware Revision: Rev D
Firmware Version: 4.12
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 secs
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Accelerator Ratio: 0% Read / 100% Write
Drive Write Cache: Disabled
Total Cache Size: 256 MB
Battery Pack Count: 1
Battery Status: OK
SATA NCQ Supported: True

Array: A
Interface Type: SAS
Unused Space: 0 MB
Status: OK

Logical Drive: 1
Size: 1.4 TB
Fault Tolerance: RAID 5
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Stripe Size: 64 KB
Status: OK
Array Accelerator: Enabled
Parity Initialization Status: Initialization Completed
Unique Identifier: 600508B100104B39535733314E4C0003
Disk Name: /dev/cciss/c0d0
Mount Points: / 3.9 GB, none 12.0 GB, /var 3.9 GB, /tmp 3.9 GB, /home 1.3 TB
Logical Drive Label: A0432BCEP61630K9SW31NLD55F

physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1BEP400009004UQCD
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1ZZRN000090035Q2Q
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP20VCQ00009004XE2V
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1ZZSB00009003MMWZ
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:5
Port: 1I
Box: 1
Bay: 5
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1L4T000009004UQCV
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:6
Port: 1I
Box: 1
Bay: 6
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP196KG00009004S0ZH
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown

In the above output our point of interest was “Accelerator Ratio: 0% Read / 100% Write”. In this case it was already set to the optimal value. Had it not been, it could be changed using the command “ctrl slot=5 modify cacheratio=0/100”.
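
If you have to check this on more than one box, the same comparison can be scripted. Below is a rough Python sketch that pulls the “Accelerator Ratio” line out of saved “ctrl all show config detail” output and, if the ratio is not the one we want, prints the modify command from above. The function names are made up for illustration, and the sketch only works on text you feed it; it does not call hpacucli itself.

```python
import re

# Pattern for the line as it appears in the controller output above.
RATIO_RE = re.compile(
    r"Accelerator Ratio:\s*(\d+)%\s*Read\s*/\s*(\d+)%\s*Write"
)

# Shortened excerpt of the output shown above.
SAMPLE = """\
Smart Array P400 in Slot 5
Cache Status: OK
Accelerator Ratio: 0% Read / 100% Write
Drive Write Cache: Disabled
"""


def accelerator_ratio(config_text):
    """Return (read_pct, write_pct), or None if the line is missing."""
    match = RATIO_RE.search(config_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def modify_command(slot, read_pct, write_pct):
    """Build the hpacucli command string used in this post."""
    return "ctrl slot={} modify cacheratio={}/{}".format(slot, read_pct, write_pct)


if __name__ == "__main__":
    wanted = (0, 100)  # our preferred read/write split
    current = accelerator_ratio(SAMPLE)
    if current == wanted:
        print("cache ratio already at", current)
    else:
        print("run:", modify_command(5, *wanted))
```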

When you are stuck on a particular command, you can just run “help <command name>” for more information.

=> help aa

The following documentation pertains to your search:

<target> modify [arrayaccelerator=enable|disable]
Enables or disables the array accelerator for a given logical drive. The
target can be any valid logical drive target on a controller that supports
array accelerator management.

<target> modify [cacheratio=#/#|?]
Sets the array accelerator cache ratio for the controller. The first # is
the read cache %. The second # is the write cache %. The target can be any
valid controller.

<target> create [type=ld]
[drives=[#:]#:#,[#:]#:#,[#:]#:#-[#:]#:#],…|all|allunassigned]
[raid=6|5|1+0|1|0|?]
[size=#|?]
[stripesize=8|16|32|64|128|256|default|?]
[sectors=32|63|?]
[arrayaccelerator=enable|disable|?]
[drivetype=sas|satalogical|sata|saslogical|parallelscsi|?]

[type=] The type parameter specifies the device type that is being created.
A logical drive is the only device type supported at this time.

[drives=] The drives parameter specifies the physical drives to be used for
creating a logical drive on a new or existing array. If the drives specified
are all unassigned drives, then a new array will be created with a new
logical drive on it. If the drives specified are all assigned to an existing
array, then a new logical drive will be created on that array. The symbol
#:# stands for port:id or box:bay, depending on the controller. Some
controllers may also support port:box:bay and use the #:#:# syntax. The all
and allunassigned keywords both target all physical drives that are not
currently assigned to an array.

[raid=] The raid parameter sets the raid level of the logical drive. If not
specified, the default raid is the highest level possible. The availability
of certain raid settings depends on the number of drives designated in the
“drives=” parameter. For example, RAID 1 will only be available if two
drives are selected while RAID 1+0 will be shown for a selection of 4 or
more drives.

[size=] The size parameter specifies the size of the logical drive, the
implied units are MB. If not specified, the default is the maximum possible
size.

[stripesize=] The stripesize parameter sets the logical drive’s stripesize.
The implied units of stripe size are KB.

[sectors=] The sectors parameter specifies the sectors per track of the
logical drive. If not specified, the default is 32.

[arrayaccelerator=] The arrayaccelerator parameter specifies the array
accelerator state for the logical drive. If not specified, the default is
enable.

[drivetype=] The drivetype parameter specifies the drive
interface type. If there are multiple drive types when selecting all
physical drives, the desired drive type needs to be specified. Mixed drives
are not allowed on the same array or logical drive. If all drive types in a
controller are the same this parameter is not needed. The target can be a
controller or an array in the system.

Examples:
controller slot=3 logicaldrive 2 modify arrayaccelerator=enable
controller slot=1 modify cacheratio=25/75
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=6
ctrl slot=1 create type=ld drives=1:1-1:6,1:9,1:10-1:12 raid=6
ctrl slot=1 create type=ld drives=all drivetype=parallelscsi
controller slot=5 array A create type=ld raid=5 size=1000
controller slot=1 array C create type=ld raid=1 stripesize=32
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=?
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 stripesize=?
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=1+0 stripesize=?
ctrl slot=1 create type=ld drives=1:1,1:2 raid=1 size=?
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=1+0 sectors=?


Tagged: accelerator, cache, controller, HP DL180, hpacucli, mysql

RAID Controllers Cache Management – Missing Features

April 16th, 2010

We all know how important hardware RAID controllers are to today’s data storage performance, especially when dealing with large data sets. Over the last couple of years they have evolved rapidly, gaining a lot of useful features, and their usage has grown as well: most new servers ship with one or two controllers built in (one for internal storage and another for an external storage array or for redundancy).

Few popular RAID controller vendors in the market: 

More or less every vendor supports all the common features; they differ in number of ports, protocol support (iSCSI, SATA, SAS, HBA/FB), transfer speed, RAID levels, total disks supported, cache size and cache management.

Controller Cache – Database Workloads

For database OLTP workloads (IO bound), the controller cache plays a crucial role in overall write or read throughput, depending on how the cache is used. Most RAID controllers are equipped with 128MB, 256MB or 512MB of cache, and newer controllers like the HP Smart Array P812 support 1GB.

Write-back mode improves write performance by an order of magnitude, because a write request is reported as completed as soon as the data is in the controller cache, without actually being written to disk. (That is why the controller needs a BBU, a battery backup unit, so that no data is lost on power failure.)

If you enable read-ahead on the controller (sometimes good for OLAP workloads or ETL data warehouses with heavy sequential access, especially adaptive read-ahead), the same cache is used to store the pre-fetched data, so later reads can be satisfied from the cache without hitting the disk. But if the database system does its own read-ahead (like InnoDB), it is better to turn off read-ahead on the controller to avoid page thrashing.

For some workloads, the controller cache can even hurt performance if the controller does not use the cache properly.

Missing Cache Management Tools

At present, none of the controllers offers any cache management tools or exposes how the cache is actually being used, which would let one adjust the cache to the workload for improved performance.

Some of the missing features:

  • A way to flush the data from cache to disk, so that systems can be taken offline for maintenance. Right now there is no easy way to do this; at best, some controllers indicate through an LED whether there is data in the cache
  • A way to set a cache threshold, in time or %, so that the controller starts flushing to disk once the threshold is reached. For example, if RRD graphs show big spikes every few minutes, one could adjust the threshold to distribute the load evenly
  • Cache usage statistics (size of written data, size of read-ahead data, etc.), so that the workload can be adjusted to yield much better results
  • Splitting the cache between reads and writes, either by size or by %, so that they do not overlap and cause performance issues. For example, one might prefer 20% for read-ahead data and 80% for writes. Only the HP Smart Array controller supports this feature at present.

The more control you get over the controller cache, the more you can tweak and adjust the workloads to get improved performance. Hopefully one day all vendors will expose more cache management options.



Data Store, Software and Hardware – What is best

April 12th, 2010

The other day we had a small discussion about data stores and hardware, and which one drives the other when it comes to data storage solutions. It is a hard discussion, as both are big topics on their own, and one can’t easily reach a conclusion because it depends on the use case. Practically speaking, though, data store limitations drive the need for more powerful hardware to meet demanding scalability needs.

We all know how important hardware is to today’s data scalability, especially when dealing with large data sets. Without hardware, it is hard to scale even a powerful data store, whether it is SQL (row or columnar), NoSQL (key/value or otherwise) or any other data storage solution, because these are limited by their data structures and implementation, and lately data store performance depends directly on the hardware.

At times, data store vendors claim that they have a scalable, high-performance architecture; that means the solution is built directly on top of hardware scalability and performance, taking advantage of today’s evolving hardware technology. Also, hardware has evolved far more aggressively in recent years than data store solutions, because of its market share: hardware is everywhere, not just in storage solutions.

In short, when a data store’s performance is directly proportional to hardware performance, it means the data store has overcome all of its software performance bottlenecks (algorithms, decision making, data structures, etc.). Overcoming software performance limits is not easy, as requirements change day by day, and it also depends on how data is actually:

  • stored
  • retrieved
  • processed and
  • maintained

If data is stored in and retrieved from memory or a non-persistent storage solution, one does not need to worry about the rest, or about performance, as it yields the best throughput. But memory or non-persistent storage can only be a solution for smaller data sets, not for large data sets that deal with terabytes of data.

Other than the newly evolving columnar data stores (I have yet to see one that has gained universal acceptance like Oracle/SQLServer/MySQL), NoSQL, or big data warehouse solutions (like Aster Data, Greenplum, etc.), none of the existing solutions really takes advantage of the latest hardware or even the latest data structures, as most data store kernels were written years ago. In today’s world, the only option for scalability is to depend on the hardware and to distribute the load across multiple systems (shared-nothing, shared-common or even the "cloud" way…).

I hope to one day see a solution that actually bridges the gap between data store, hardware and scalability, so that common use cases can depend on one single, universally adopted solution instead of requiring multiple technologies. Brian Aker expresses the same thought in a recent interview.



Dell MD1120 Storage Array Performance

March 29th, 2010

Here are some file IO performance numbers from the DELL MD1120 SAS storage array. Last year I ran the same test with an HP P800 storage array and the numbers were impressive. But with this high-end storage array there were a few surprises. Before getting into the actual details, let’s look at the test and configuration details.

System Configuration:

  1. DELL R710 with CentOS 5.4
  2. NOOP IO Scheduler
  3. MD1120 with 22 15K SAS disks
    • 20 disk RAID-10 (hardware)
    • 2 hot spares
    • Disk Cache disabled
  4. PERC 6/E RAID controller with BBU
    • Connected to DELL MD1120 using SAS
    • Write Back
    • Read Cache Disabled

Test Configuration:

  1. Sysbench fileio test with variable modes and threads
  2. 64 files with 50G total size
  3. All tests ran in un-buffered mode (O_DIRECT) as most of the workload is InnoDB based.
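
For anyone who wants to repeat a similar run, here is a rough Python sketch of how the test matrix could be generated. The post does not record the exact command lines, so treat everything here as an assumption: the flags follow sysbench 0.4-style fileio syntax, the list of modes and thread counts is a guess, and the helper names are hypothetical. Only the 64 files, 50G total size, O_DIRECT and 5 iterations come from the configuration above.

```python
# Assumed values: the post does not list the exact modes or thread counts.
MODES = ["seqwr", "seqrd", "rndwr", "rndrd", "rndrw"]
THREADS = [1, 2, 4, 8, 16]
ITERATIONS = 5  # from the post: every mode ran with 5 iterations


def sysbench_cmd(mode, threads, stage="run"):
    """Build one sysbench fileio command line (0.4-style flags)."""
    return [
        "sysbench",
        "--test=fileio",
        "--file-num=64",              # 64 files (from the post)
        "--file-total-size=50G",      # 50G total size (from the post)
        "--file-test-mode=" + mode,
        "--file-extra-flags=direct",  # O_DIRECT, i.e. un-buffered IO
        "--num-threads=" + str(threads),
        stage,                        # "prepare", "run" or "cleanup"
    ]


def test_matrix():
    """Yield (mode, threads, iteration, command) for every combination."""
    for mode in MODES:
        for threads in THREADS:
            for iteration in range(1, ITERATIONS + 1):
                yield mode, threads, iteration, sysbench_cmd(mode, threads)


def average(values):
    """Average requests/sec over the iterations of one mode/thread pair."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for mode, threads, iteration, command in test_matrix():
        print(mode, threads, iteration, " ".join(command))
```

The files have to be created once with the “prepare” stage before any “run”, and newer sysbench releases use a different option syntax, so check the help output of the version you have installed.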

Test Results:

Number of Threads vs Number of Requests/Sec. Every mode was run for 5 iterations and the average was taken.

Random IO:

[Figure: rndio (random IO, requests/sec vs number of threads)]

Sequential IO:

[Figure: seqio (sequential IO, requests/sec vs number of threads)]

Analysis:

  1. Overall the numbers are not bad when it comes to writes, but there are a few surprises when it comes to reads. Compared with HP’s P800 storage array, the numbers still dropped by 20%.
  2. Random IO:
    • Random write requests range from 3200 to 5000 per second, thanks to write-back mode (512M cache)
    • Writes scale linearly with the number of threads, a good sign that the controller is able to manage the cache efficiently
    • Random reads and writes (rndrw) also scale linearly with the thread load, which means the IO distribution and cache bursts to satisfy reads seem to be efficient, as data has to be flushed from the controller cache to disk before a read can be satisfied in O_DIRECT mode.
  3. Sequential IO:
    • Writes seem to scale well even in sequential mode, without much overhead
    • When it comes to reads, the big surprise is the drop from 5626 requests/sec to 615 when going from one thread to two, which is really odd. Worst case it should be ~2000-3000 requests/sec; I am not sure where the overhead is. I can’t believe it could be thread scheduling, as there are only 2 threads.
  4. During 100% IO, I noticed on-and-off IO serialization with higher queue waits, which indicates some degree of serialization overhead in the OS, but I was not able to track down which layer is triggering it. I tried cfq and deadline as well; still the same.
  5. The next attempt will be replacing the 3Gb/s SAS with a fiber channel HBA or 6Gb/s SAS (PERC H800) to see how it performs, along with a combination of HW and SW RAID instead of depending only on the controller.
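
To put the sequential read surprise in perspective, here is the arithmetic as a few lines of Python. The two request rates are the ones quoted in the analysis above; the function name is just for illustration.

```python
def relative_drop(before, after):
    """Percentage decrease when going from `before` to `after`."""
    return (before - after) / before * 100.0


one_thread = 5626   # sequential read requests/sec with 1 thread
two_threads = 615   # sequential read requests/sec with 2 threads

if __name__ == "__main__":
    print("drop: {:.1f}%".format(relative_drop(one_thread, two_threads)))
    print("ratio: {:.1f}x".format(one_thread / two_threads))
```

That works out to a drop of roughly 89%, or about a ninth of the single-thread rate, well below even the lower end of the ~2000-3000 requests/sec I would have expected as a worst case.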


MySQL and hardware information

October 26th, 2009

People often ask “what’s the best hardware to run a database on?” And the answer, of course, is “it depends”. With MySQL, though, you can get good performance out of almost any hardware.

If you need *great* performance, and you have active databases with a large data set, here are some statistics on real life databases — feel free to add your own.

We define “large data set” as over 100 GB, mostly because smaller data sets have an easier time with the available memory on a machine (even if it’s only 8 GB) and backups are less intrusive. InnoDB Hot Backup and Xtrabackup are not really “hot” backups; they are “warm” backups, because copying the data files puts load on the machine, and on large, active servers we have found that this load impacts query performance. As for how active a database is, we’ve found that equates to a peak production load of over 3,000 queries per second on a transactional database (that is, normal production load gets the server to over 3,000 queries per second at peak times) and a flat average of over 500 queries per second if there are definite quiet and peak times, or if the server is used for reporting or a combined reporting/transactional load. This flat average should be taken over a period of a week or more.
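
If you want to check your own servers against these thresholds, the definitions can be written down as a small Python sketch. The class and function names are hypothetical, and combining the peak and flat-average criteria with “or” is one reading of the paragraph above, not a formal rule.

```python
from dataclasses import dataclass

# Thresholds quoted in the paragraph above.
LARGE_DATA_GB = 100
PEAK_QPS = 3000
FLAT_AVG_QPS = 500


@dataclass
class Workload:
    data_size_gb: float
    peak_qps: float        # queries/sec at peak under normal production load
    weekly_avg_qps: float  # flat average taken over a week or more


def is_large(workload):
    """Data set is over 100 GB."""
    return workload.data_size_gb > LARGE_DATA_GB


def is_active(workload):
    """Assumed reading: either criterion is enough to count as active."""
    return workload.peak_qps > PEAK_QPS or workload.weekly_avg_qps > FLAT_AVG_QPS


if __name__ == "__main__":
    # Made-up numbers, purely to show the call.
    example = Workload(data_size_gb=370, peak_qps=3500, weekly_avg_qps=400)
    print("large:", is_large(example), "active:", is_active(example))
```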

We’re only showing the hardware here, not the configurations, for our three busiest/largest environments. All the configurations shown here have 2 machines, for an active primary and either an active secondary (for read-only queries) or a passive secondary (quiet until needed):

A music distribution company runs the following for primary production (Data size in the 360-380GB range):
2x Sun SunFire X4600 M2 Servers using 4xDual Core Opteron 8220 with 32GB
of RAM attached to a Hitachi DF600F SAN. The SunFire X4600 is scalable
to 8 Sockets (32 Core) and 512GB of RAM.

National post office for a G8 nation = 1.2T of data (and growing fast!).
The primary site has 2 machines connected to the same LUNs on a HA
setup. We have an ‘active’ and a ‘passive’ master configured to kick-in
if the other node fails (only one mounts the LUNs with the data). Both
these servers have 4 x Quad-core Intel Xeon processors and 16G of RAM each.

An online marketing firm has ~600GB of Data
2x Dell PowerEdge R710 with 36GB of RAM and two Intel Xeon L5520 CPUs (Quad Core) – the servers support up to 144GB of RAM and a max of 2 sockets
Storage: Combination of Local Storage (logs, etc) and a DELL PowerVault
MD 3000 Direct Attached Storage (shared).

What are your details?



Advanced Squid Caching in Scribd: Hardware + Software Used

August 4th, 2009

After the previous post in this caching-related series I received many questions about the hardware and software configuration of our servers, so in this post I’ll describe our servers’ configs and the motivation behind them.

Hardware Configuration

Since the Squid server in our setup uses a one-process model (with asynchronous request processing), there was no point in ordering multi-core CPUs for our boxes, and since we have lots of pages on the site and the cache is pretty huge, all the servers ended up being highly I/O bound. Considering these facts, we decided to use the following hardware specs for the servers:

CPU: One pretty cheap dual-core Intel Xeon 5148 (no need for multiple cores or really high frequencies – even these CPUs have ~1% avg load)
RAM: 8GB (basically to reduce I/O pressure by caching hot content in RAM)
Disks: 4 x small SAS 15k drives in JBOD mode (no RAID – we’ve tried all kinds of RAID configs and they did not help with the I/O performance)

So, once again: nothing is as important in a squid box as I/O throughput.

Here is a sample CPU load graph from one of the boxes:

[Figure: squid-cpu-graph (sample CPU load graph)]

Software Configuration

This could be a long story, but in a few words our experience with different squid versions was the following.

First, when I started working on this caching project, I just installed squid using Debian’s apt-get install squid command. As a result we got some ancient squid 2.6 release that for some reason (still unclear to me) was painfully slow in I/O operations, and it had a file descriptor leak, so after a few hours under production load the box would simply stop processing requests.

When the first approach failed, I decided to go to the squid web site, download the latest production release and install it from source (yes, we do it all the time when the OS vendor ships releases that are too old or buggy). Result – freaking fast and stable squid 3.0 which worked flawlessly for about 5 months.

A few months ago we found out about the stale-* extensions available in squid 2.7 and I started wondering if we should change our perfectly stable 3.0 setup to 2.7. Some time later I decided to use the Vary HTTP header in our caching architecture, and then I found out that vary-caching is correctly implemented only in 2.7: since 3.0 is a complete rewrite of the 2.X branch, vary-caching is not yet implemented there (or not in the way we’d want it to be implemented).

So, the final result: at this moment in time we’re using a custom-built Squid 2.7STABLE6 and are really happy with it. It is a stable, fast and feature-rich caching proxy server.

Caching Cluster Configuration

We obviously have more than one squid server at Scribd, and this makes it a bit harder to use those servers (compared to one box, where you’d send all requests to one IP:port pair). We tried round-robin balancing for the squid boxes + ICP-based neighbor checks, but it added more latency to our responses, so we decided to put an haproxy load balancer between nginx and the squid farm and set up URL-hash-based balancing to distribute requests evenly amongst the squid backends.

This scheme worked pretty nicely, but we had one serious problem with this setup: if one squid box went down, haproxy would quickly detect the problem and remove it from the pool… And here comes the problem: removing a server from the pool completely changes the hashing key space, and all cached requests become invalid. To solve this problem we’ve developed an nginx balancer module that performs consistent hashing of URLs, and we’re testing this module in production now. What is really good about this module is that it removes one hop from the chain of HTTP proxies between the site and the user.
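
To make the difference concrete, here is a small Python sketch comparing plain “hash modulo number of servers” balancing with a consistent hash ring. This is a generic illustration of the technique, not the code of our nginx module: the server names, the URL pattern, the use of MD5 and the 100 virtual points per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer using MD5 (chosen for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def modulo_pick(url, servers):
    """Plain URL-hash balancing: hash modulo the number of servers."""
    return servers[_hash(url) % len(servers)]


class HashRing:
    """Consistent hashing with several virtual points per server."""

    def __init__(self, servers, replicas=100):
        self._ring = sorted(
            (_hash("{}#{}".format(server, i)), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, url):
        """Return the server owning the first point at or after the URL hash."""
        index = bisect.bisect(self._points, _hash(url)) % len(self._ring)
        return self._ring[index][1]


def moved(urls, before, after):
    """Count URLs whose backend changes between two pick functions."""
    return sum(1 for url in urls if before(url) != after(url))


if __name__ == "__main__":
    servers = ["squid1", "squid2", "squid3", "squid4"]  # hypothetical names
    survivors = servers[:-1]                            # squid4 goes down
    urls = ["/doc/{}".format(n) for n in range(10000)]

    by_modulo = moved(
        urls,
        lambda u: modulo_pick(u, servers),
        lambda u: modulo_pick(u, survivors),
    )
    by_ring = moved(urls, HashRing(servers).pick, HashRing(survivors).pick)
    print("remapped with modulo hashing:", by_modulo)
    print("remapped with consistent hashing:", by_ring)
```

With modulo hashing, most URLs land on a different backend once a server leaves the pool, so most of the cache is effectively lost. With the ring, only the URLs that lived on the removed server move; everything cached on the surviving boxes stays where it was.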

So, this was a short description of what hardware we use for our caching cluster and why we use it. In the next posts of this series we’ll talk about cache control and object invalidation.