Archive for the ‘raid’ Category
Should RAID 5 be used in a MySQL server?
April 2nd, 2012
Setting up XFS on Hardware RAID — the simple edition
December 16th, 2011
There are about a gazillion FAQs and HOWTOs out there that talk about XFS configuration, RAID IO alignment, and mount point options. I wanted to put some of that information together in a condensed and simplified format that will work for the majority of use cases. This is not meant to cover every single tuning option, but rather to cover the important bases in a simple and easy-to-understand way.
Let’s say you have a server with a standard hardware RAID setup running conventional HDDs.
RAID setup
For the sake of simplicity, you create one single RAID logical volume that covers all your available drives. This is the easiest setup to configure and maintain, and it is the best choice for operability in the majority of normal configurations. Are there ways to squeeze more performance out of a server by dividing the logical volumes? Perhaps, but it requires a lot of fiddling and custom tuning to accomplish.
There are plenty of other posts out there that discuss RAID minutiae. Make sure you cover the following:
- RAID type (usually 5 or 1+0)
- RAID stripe size
- BBU enabled, with write-back cache only
- No read cache or read-ahead
- Drive write cache disabled
Partitioning
You want to run only MySQL on this box, and you want to ensure your MySQL datadir is separated from the OS in case you ever want to upgrade the OS, but otherwise keep it simple. My suggestion? Plan on allocating partitions roughly as follows, based on your available drive space and keeping in mind future growth.
- 8-16G for swap
- 10-20G for the OS (/)
- Possibly 10G+ for /tmp (note that you could also point MySQL’s tmpdir elsewhere)
- Everything else for MySQL (/mnt/data or similar); symlink /var/lib/mysql into it when you set up MySQL
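The symlink step from the list above might look like the following minimal sketch. It assumes mysqld is stopped, that it runs as root so file ownership survives the move, and that the data partition is already mounted; `relocate_datadir` is a hypothetical helper name, not part of any MySQL tooling.

```shell
#!/bin/sh
# Hypothetical helper: move an existing MySQL datadir onto the data
# partition and leave a symlink at the old path.
# Assumes mysqld is stopped; run as root so ownership is preserved.
set -eu

relocate_datadir() {
    src=$1   # current datadir, e.g. /var/lib/mysql
    dst=$2   # new location on the data partition, e.g. /mnt/data/mysql

    if [ -L "$src" ]; then
        echo "already a symlink: $src"
        return 0
    fi
    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing $dst" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dst")"   # make sure the parent directory exists
    mv "$src" "$dst"               # move the data onto the data partition
    ln -s "$dst" "$src"            # old path now points at the new location
    echo "moved $src -> $dst"
}
```

Usage: `relocate_datadir /var/lib/mysql /mnt/data/mysql`, then start mysqld again. Running it a second time is harmless because an existing symlink is detected and left alone.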
Are there alternatives? Yes. Can you have separate partitions for InnoDB log volumes, etc.? Sure. Is it worth doing much more than this most of the time? I’d argue not until you’re sure you are I/O bound and need to squeeze every last ounce of performance from the box. Fiddling with how to allocate drives and drive space from partition to partition is a lot of operational work, which should be spent only when needed.
Aligning the Partitions
# fdisk -ul
Disk /dev/sda: 438.5 GB, 438489317376 bytes
255 heads, 63 sectors/track, 53309 cylinders, total 856424448 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00051fe9

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048     7813119     3905536   82  Linux swap / Solaris
Partition 1 does not end on cylinder boundary.
/dev/sda2   *     7813120    27344895     9765888   83  Linux
/dev/sda3        27344896   856422399   414538752   83  Linux
- Start with your RAID stripe size. Let’s use 64k which is a common default. In this case 64K = 2^16 = 65536 bytes.
- Get your sector size from fdisk. In this case 512 bytes.
- Calculate how many sectors fit in a RAID stripe. 65536 / 512 = 128 sectors per stripe.
- Get the start sector of the MySQL partition from fdisk: 27344896.
- See if the MySQL partition starts on a stripe boundary by dividing its start sector by the sectors per stripe: 27344896 / 128 = 213632. This is a whole number, so we are good. If it had a remainder, the partition would not start on a RAID stripe boundary.
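The arithmetic in the steps above can be sketched as a small shell function. The `check_alignment` name is hypothetical; the example arguments are the 64k stripe, 512-byte sectors and start sector from the fdisk output.

```shell
#!/bin/sh
# Sketch of the partition alignment check described above.
set -eu

check_alignment() {
    start_sector=$1   # partition start, in sectors (from fdisk -ul)
    stripe_bytes=$2   # RAID stripe size in bytes, e.g. 65536 for 64k
    sector_bytes=$3   # logical sector size in bytes, e.g. 512

    sectors_per_stripe=$((stripe_bytes / sector_bytes))
    remainder=$((start_sector % sectors_per_stripe))
    if [ "$remainder" -eq 0 ]; then
        echo "aligned: $start_sector / $sectors_per_stripe = $((start_sector / sectors_per_stripe))"
    else
        echo "NOT aligned: remainder $remainder sectors"
        return 1
    fi
}

# Example values from the fdisk output: prints
# "aligned: 27344896 / 128 = 213632"
check_alignment 27344896 65536 512
```

On Linux the start sector of a partition can also be read from sysfs, for example `check_alignment "$(cat /sys/block/sda/sda3/start)" 65536 512`; that file reports the start in 512-byte units.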
Create the Filesystem
XFS requires a little massaging (or a lot). For a standard server, it’s fairly simple. We need to know two things:
- RAID stripe size
- Number of unique, data-bearing disks in the RAID. This works out the same way as the usable-capacity formula for each RAID level:
- RAID 1+0 is a set of mirrored drives, so the number here is num drives / 2.
- RAID 5 is striped drives plus one drive's worth of parity, so the number here is num drives - 1.
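The two rules above can be expressed as a small helper that derives the `sw` value for mkfs.xfs. This is a sketch: `xfs_stripe_width` is a hypothetical name, it only prints a command rather than running mkfs, and the 8-drive RAID 1+0 in the example is an assumption chosen so that the result matches the `sw=4` used in the command below.

```shell
#!/bin/sh
# Sketch: derive the mkfs.xfs stripe width (sw) from RAID level and
# total drive count. Prints the mkfs.xfs command; does not run it.
set -eu

xfs_stripe_width() {
    level=$1    # "10" for RAID 1+0, "5" for RAID 5
    drives=$2   # total number of drives in the array
    case $level in
        10) echo $((drives / 2)) ;;   # mirrored pairs: half the drives hold unique data
        5)  echo $((drives - 1)) ;;   # one drive's worth of capacity goes to parity
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}

su=64k                          # RAID stripe size
sw=$(xfs_stripe_width 10 8)     # assumption: 8 drives in RAID 1+0
echo "mkfs.xfs -d su=$su,sw=$sw /dev/sda3"
```

A 5-drive RAID 5 (`xfs_stripe_width 5 5`) also gives `sw=4`, so the same mkfs.xfs line would apply to it.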
For example, with a 64k stripe size and 4 data-bearing disks:
# mkfs.xfs -d su=64k,sw=4 /dev/sda3
meta-data=/dev/sda3 isize=256 agcount=4, agsize=25908656 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=103634624, imaxpct=25
= sunit=16 swidth=64 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=50608, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
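To sanity-check that the mkfs.xfs output matches the RAID geometry, note that it reports sunit and swidth in filesystem blocks (bsize=4096 here). A minimal sketch of that conversion, using the values from the command above:

```shell
#!/bin/sh
# Sketch: reproduce the sunit/swidth values that mkfs.xfs reports,
# which are expressed in filesystem blocks.
set -eu

su_bytes=65536   # su=64k
sw=4             # number of data-bearing disks
bsize=4096       # data block size from the mkfs.xfs output

sunit=$((su_bytes / bsize))   # one stripe unit in blocks
swidth=$((sunit * sw))        # full stripe width in blocks
echo "sunit=$sunit swidth=$swidth blks"
```

This prints `sunit=16 swidth=64 blks`, matching the data section above. On a mounted filesystem, `xfs_info /var/lib/mysql` shows the same geometry lines, so the check can be repeated later.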
The XFS FAQ is a good place to check out for more details.
Mount the filesystem
Again, there are many options to use here, but let’s use some simple ones:
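/dev/sda3 /var/lib/mysql xfs nobarrier,noatime,nodiratime 0 0
Note that nobarrier is only safe because the controller has a battery-backed write-back cache, as listed in the RAID setup above.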
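After mounting, it is worth confirming that the filesystem really came up as XFS with the intended options. The sketch below reads a file in /proc/mounts format; `check_mount_opts` is a hypothetical helper, and the mounts file is a parameter so it can be tested against a sample. Which options the kernel echoes back in /proc/mounts varies between kernel versions, so a "missing option" result is a prompt to look closer rather than proof of a misconfiguration.

```shell
#!/bin/sh
# Sketch: check that a mount point is XFS and carries the given options,
# by reading a file in /proc/mounts format.
set -eu

check_mount_opts() {
    mounts_file=$1
    target=$2
    shift 2
    # /proc/mounts fields: device mountpoint fstype options dump pass
    line=$(awk -v mp="$target" '$2 == mp { print $3, $4; exit }' "$mounts_file")
    if [ -z "$line" ]; then
        echo "$target is not mounted" >&2
        return 1
    fi
    fstype=${line%% *}   # first word: filesystem type
    opts=${line#* }      # rest: comma-separated mount options
    if [ "$fstype" != xfs ]; then
        echo "$target is $fstype, not xfs" >&2
        return 1
    fi
    for want in "$@"; do
        case ",$opts," in
            *",$want,"*) ;;   # option present
            *) echo "missing option on $target: $want" >&2; return 1 ;;
        esac
    done
    echo "ok: $target ($opts)"
}
```

Usage: `check_mount_opts /proc/mounts /var/lib/mysql noatime nobarrier`.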
Setting the IO scheduler
This is a commonly missed step in getting the IO setup right. The best choices here are between ‘deadline’ and ‘noop’. Deadline is an active scheduler, and noop simply means IO will be handled without rescheduling. Which is best is workload dependent, but in the simple case you would be well served by either. Two steps here:
echo noop > /sys/block/sda/queue/scheduler # update the scheduler in realtime
And to make it permanent, add ‘elevator=<your choice>’ in your grub.conf at the end of the kernel line:
kernel /boot/vmlinuz-2.6.18-53.el5 ro root=LABEL=/ noapic acpi=off rhgb quiet notsc elevator=noop
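To confirm which scheduler is actually in effect (for example after a reboot with the new kernel line), read the same sysfs file: the kernel lists the available schedulers and puts the active one in brackets, e.g. `noop [deadline] cfq`. A minimal sketch; `active_scheduler` is a hypothetical helper, and the device name sda is the assumption used throughout this post.

```shell
#!/bin/sh
# Sketch: report the active IO scheduler for a block device.
# The active scheduler is the one shown in [brackets].
set -eu

active_scheduler() {
    # $1: contents of /sys/block/<dev>/queue/scheduler
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

dev=sda   # assumption: the RAID volume appears as sda
f=/sys/block/$dev/queue/scheduler
if [ -r "$f" ]; then
    echo "$dev: $(active_scheduler "$(cat "$f")")"
fi
```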
This is a complicated topic, and I’ve tried to temper the complexity with what will provide the most benefit. What has made the most improvement for you that could be added without much complexity?
Green HDs and RAID Arrays
September 26th, 2011
Some so-called “Green” hard disks don’t like being in a RAID array. These are primarily SATA drives, and they gain their green credentials by being able to reduce their RPM when not in use, as well as other aggressive power management trickery. That’s all cool and in a way desirable – we want our hardware to use less power whenever possible! – but the time it takes some drives to “wake up” again is longer than a RAID setup is willing to tolerate.
First of all, you may wonder why I bother with SATA disks at all for RAID. I’ve written about this before, but they simply deliver plenty for much less money. Higher RPM doesn’t necessarily help you for a db-related (random access) workload, and for tasks like backups, speed may not be a primary concern. SATA disks have a shorter command queue than SAS, which means they might need to seek more; however, a smart RAID controller would already arrange its I/O in such a way as to optimise that.
The particular application where I tripped over Green disks was a backup array using software RAID10. Yep, a cheap setup – the objective is to have lots of diskspace with resilience, and access speed is not a requirement.
Not all Green HDs are the same. Western Digital ones allow their settings to be changed, although that does need a DOS tool (just a bit of a pest using a USB stick with FreeDOS and the WD tool, but it’s doable), whereas Seagate has decided to restrict their Green models such that they don’t accept any APM commands and can’t change their configuration.
I’ve now replaced Seagates with (non-Green) Hitachi drives, and I’m told that Samsung disks are also ok.
So this is something to keep in mind when looking at SATA RAID arrays. I also think it might be a topic that the Linux software RAID code could address – if it were “Green HD aware” it could a) make sure that they don’t go to a state that is unacceptable, and b) be tolerant with their response time – this could be configurable. Obviously, some applications of RAID have higher demands than others, not all are the same.
Aspersa’s summary tool supports Adaptec and MegaRAID controllers
May 16th, 2010
I spent a little time yesterday doing some things with the “summary” tool from Aspersa. I added support for summarizing status and configuration of Adaptec and LSI MegaRAID controllers. I also figured out how to write a test suite for Bash scripts, so most major parts of the tool are fully tested now. I learned a lot more sed and awk this weekend.
There is really only one way to get status of Adaptec controllers (/usr/StorMan/arcconf), but the LSI controllers can be queried through multiple tools. I added support for MegaCli64, as long as it’s located in the usual place at /opt/MegaRAID/MegaCli/MegaCli64. I am looking for feedback and/or help on supporting other methods of getting status from the LSI controllers, such as megarc and omreport. If you can contribute sample output from these tools, please attach them as a file to a new issue report on the project’s issue tracker. (Don’t paste them as text, please — formatting and whitespace will get mangled. Tabs and spaces need to be preserved.)
I am slowly gaining insight into how best to write a similar summary tool for MySQL servers. The goals of this tool are very specific — including things like diff’able output. I’m figuring out what went wrong with Maatkit’s mk-audit tool and how to go about it differently.
Using ext4 for MySQL
March 12th, 2010
This week with a client I saw ext4 used for the first time on a production MySQL system, which was running Ubuntu 9.10 (Karmic Koala). While installing 9.10 Server locally today, I observed that ext4 is the default option. The ext4 filesystem is described as offering better performance, reliability and features, and there is also information about improvements in journaling.
At OSCON 2009 I attended a presentation on Linux Filesystem Performance for Databases by Selena Deckelmann, which included ext4. While ext4 provided some improvements in sequential reading and writing, there were issues with random I/O, which is key for RDBMS products.
Are the RAID configuration (e.g. RAID 5, RAID 10), stripe size, buffer caches, LVM, etc. more important than upgrading from ext3 to ext4? I don’t have access to any test equipment to determine this myself, but I’d like to hear about any experiences from members of the MySQL community, and whether anybody has experienced general problems running ext4.
ext4 References
- Ext 4 How To on kernel.org
- Ext4 on kernelnewbies.org
- ext4 overview via wikipedia.org
- First benchmarks of the ext4 file system
Knowing your PERC 6/i BBU
February 6th, 2010
Storage Miniconf Deadline Extended!
September 30th, 2009
The linux.conf.au organisers have given all miniconfs an additional few weeks to spruik for more proposal submissions, huzzah!
So if you didn’t submit a proposal because you weren’t sure whether you’d be able to attend LCA2010, you now have until October 23 to convince your boss to send you and get your proposal in.
EC2/EBS single and RAID volumes IO benchmark
August 7th, 2009
While preparing the Percona-XtraDB template to run in the RightScale environment, I noticed that IO performance on an EBS volume in the EC2 cloud is not quite perfect, so I spent some time benchmarking volumes. The interesting part about EBS volumes is that they appear as devices in your OS, so you can easily build software RAID from several volumes.
So I created 4 volumes (I used an m1.large instance) and made:
RAID0 on 2 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 2 -l 0 /dev/sdj /dev/sdk
RAID0 on 4 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 4 -l 0 /dev/sdj /dev/sdk /dev/sdl /dev/sdm
RAID5 on 3 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 3 -l 5 /dev/sdj /dev/sdk /dev/sdl
RAID10 on 4 volumes in two steps:
mdadm -v --create /dev/md0 --chunk=256 --level=raid1 --raid-devices=2 /dev/sdj /dev/sdk
mdadm -v --create /dev/md1 --chunk=256 --level=raid1 --raid-devices=2 /dev/sdm /dev/sdl
and then striped the two mirrors together:
mdadm -v --create /dev/md2 --chunk=256 --level=raid0 --raid-devices=2 /dev/md0 /dev/md1
Linux md can also create the trickier RAID10,f2 ("far" layout; see http://www.mythtv.org/wiki/RAID for what this is):
mdadm -C /dev/md0 --chunk=256 -n 4 -l 10 -p f2 /dev/sdj /dev/sdk /dev/sdl /dev/sdm
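Because a repeated device name is easy to miss in a long mdadm line, a small wrapper that refuses duplicates can help when building these arrays. This is a hypothetical dry-run helper (`md_create_cmd`): it only prints the command, assumes the `--chunk=256` used above, and accepts `10,f2` as shorthand for RAID10 with the far-2 layout.

```shell
#!/bin/sh
# Sketch: build (but do not run) an mdadm create command, refusing a
# device list that names the same volume twice.
set -eu

md_create_cmd() {
    md=$1      # array device, e.g. /dev/md0
    level=$2   # 0, 5, 10, or 10,f2
    shift 2    # remaining arguments are member devices
    dups=$(printf '%s\n' "$@" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate device(s): $dups" >&2
        return 1
    fi
    layout=""
    case $level in
        10,f2) level=10; layout=" -p f2" ;;   # far-2 layout
    esac
    echo "mdadm -C $md --chunk=256 -n $# -l $level$layout $*"
}
```

For example, `md_create_cmd /dev/md0 5 /dev/sdj /dev/sdk /dev/sdl` prints the RAID5 command for three volumes; paste the output to run it once it looks right.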
I also tested IO on a single volume.
I used an XFS filesystem mounted with the noatime,nobarrier options,
and for the benchmark I used sysbench fileio modes on a single file (the script runs both 256MB and 16GB sizes) with the following script:
#!/bin/sh
set -u
set -x
set -e

for size in 256M 16G; do
  for mode in seqwr seqrd rndrd rndwr rndrw; do
    ./sysbench --test=fileio --file-num=1 --file-total-size=$size prepare
    for threads in 1 4 8 16; do
      echo PARAMS $size $mode $threads > sysbench-size-$size-mode-$mode-threads-$threads
      ./sysbench --test=fileio --file-total-size=$size --file-test-mode=$mode \
        --max-time=60 --max-requests=10000000 --num-threads=$threads --init-rng=on \
        --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 run \
        >> sysbench-size-$size-mode-$mode-threads-$threads 2>&1
    done
    ./sysbench --test=fileio --file-num=1 --file-total-size=$size cleanup
  done
done
The tested modes were: seqrd (sequential read), seqwr (sequential write), rndrd (random read), rndwr (random write) and rndrw (random read-write). sysbench uses a 16KB block size by default, which emulates InnoDB's 16KB page size.
The raw results are in Google Docs: https://spreadsheets.google.com/ccc?key=0AjsVX7AnrCYwdFlBVW9KWVJGUGFqeVdpUHY0Y0VXYXc&hl=en
but let me show the most interesting results from my point of view. The graphs show requests/second (more is better) and 95th-percentile response time in ms (less is better).
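To turn the per-run files written by the benchmark script into the two metrics plotted here, something like the following could work. It relies on the `PARAMS size mode threads` line the script writes first, and it assumes that the sysbench 0.4 fileio report contains a `Requests/sec executed` line and an `approx.  95 percentile:` line; if your sysbench version prints them differently, adjust the awk patterns. `summarize` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: collect sysbench result files into CSV
# (size,mode,threads,requests_per_sec,p95_ms).
# ASSUMPTION: report lines look like
#   "  1234.56 Requests/sec executed"
#   "  approx.  95 percentile:   3.21ms"
set -eu

summarize() {
    echo "size,mode,threads,requests_per_sec,p95_ms"
    for f in "$@"; do
        awk '
            $1 == "PARAMS"           { size = $2; mode = $3; threads = $4 }
            /Requests\/sec executed/ { rps = $1 }
            /95 percentile:/         { p95 = $NF; sub(/ms$/, "", p95) }
            END { print size "," mode "," threads "," rps "," p95 }
        ' "$f"
    done
}
```

Usage: `summarize sysbench-size-* > results.csv`, then load the CSV into a spreadsheet to plot it.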



What I see from the results is that if you are looking for IO performance in an EC2/EBS environment, it's definitely worth considering a RAID setup.
RAID5 does not show benefits compared with the others, and RAID10,f2 is worse than RAID10.
As for RAID0 vs RAID10, it's your call. On a regular server I would never suggest RAID0 for a database, but with EBS I am not sure what guarantees Amazon gives. I'd expect that there is already a redundant array underneath each EBS volume, so adding more redundancy may not be worth it, but I am not sure about that.
For now I'd consider RAID10 on 4 to 10 volumes.
And of course, to benefit from multi-threaded IO in MySQL you need to use XtraDB or MySQL 5.4.
However, there may be a small problem with backups on EBS. On a single EBS volume you can just take a snapshot, but across several volumes it may be tricky. In that case you may consider LVM snapshots or XtraBackup.
Entry posted by Vadim




