Archive for the ‘hardware’ Category

Got open source cloud storage? Red Hat buys Gluster

October 6th, 2011

Red Hat’s $136m acquisition of open source storage vendor Gluster marks Red Hat’s biggest buy since JBoss and starts the fourth quarter with a very interesting deal. The acquisition is definitely good for Red Hat, since it bolsters its Cloud Forms IaaS and OpenShift PaaS technology and strategy with storage, which is often the starting point for enterprise and service provider cloud computing deployments. The acquisition also gives Red Hat another weapon in its fight against VMware, Microsoft and others, including OpenStack, of which Gluster is a member (more on that further down). The deal is also good for Gluster, both for the sizeable price Red Hat is paying for the provider of open source, software-based, scale-out storage for unstructured data, and as validation of both open source and software in today’s IT and cloud computing storage.

This is exactly the kind of disruption we’ve been seeing and expecting as Linux vendors compete with new rivals in virtualization, cloud computing and different layers of the stack, including storage (VMware, Microsoft, OpenStack, Oracle, Amazon and others), as covered in our recent special report, The Changing Linux Landscape.

While the deal makes perfect sense for both Red Hat and for Gluster, it also has implications for the white hot open source cloud computing project OpenStack. There was no mention of OpenStack in Red Hat’s FAQ on the deal, but there was a reference to ongoing support for Gluster partners, of which there are many fellow OpenStack members. OpenStack was also highlighted among Gluster’s key open standards participation along with the Linux Foundation and Red Hat-led Open Virtualization Alliance oriented around KVM. Sources at both Gluster and Red Hat, which point to OpenStack support being bundled into Red Hat’s coming Fedora 16, also reiterated to me Red Hat is indeed planning to continue involvement with OpenStack around the Gluster technologies. I suspect Red Hat is looking to leverage Gluster more for its own purposes than for OpenStack’s, but I must also acknowledge Red Hat’s understanding of the value of openness, community and compatibility. Taking that idea a step further, Gluster may represent a way that Red Hat can integrate with and tap into the OpenStack community by blending it with its own community around Fedora, RHEL, JBoss, RHEV and Cloud Forms and OpenShift.

The deal also leads many to wonder what may be next for Red Hat in terms of acquisitions. We’ve long thought database and data management technologies were areas where we might see Red Hat building out. This was also the subject of renewed rumors recently, and we believe it might still be an attractive piece for Red Hat given the open source opportunities and targets around NoSQL technologies such as the Apache Hadoop distributed data management framework and the Cassandra distributed database management software. We’ve also believed systems management to be a potential place for Red Hat to further expand. Given its need to largely stay within open source, we would expect targets in this area to include GroundWork Open Source, which joins Linux and Windows systems in its monitoring and management, and Zenoss, which works with Cisco and Red Hat rival VMware in monitoring and managing systems with its open source software. Another potential target that would increase Red Hat’s depth in open source virtualization and cloud computing is Convirture, which might also be an avenue for Red Hat to reach out to midmarket and SMB customers and channel players. Red Hat was among the non-OpenStack members we listed as potential acquirers when considering the M&A possibilities (451 subscribers) out of OpenStack.

Given its recent quarterly earnings report and its topping of the $1 billion annual revenue mark, Red Hat seems again to be bucking the bad economy. We’ve written before, in 2008 and more recently, about how bad economic conditions can be good for open source software. Red Hat is atop the list of open source vendors that stand to suffer as traditional enterprise IT customers such as banks freeze spending or, worse, fail. However, the company’s deal for Gluster is yet another sign it is thriving and expanding despite economic difficulty and uncertainty.

You don’t have to just look at Red Hat’s earnings or take our word for it. On Jim Cramer’s ‘Mad Money’ this week, we heard Red Hat CEO Jim Whitehurst praised for Red Hat performance and traction where most companies and many economists are throwing the blame: financial services, government and Europe. Cramer credited Red Hat for a ’spectacular quarter’ and allowed Whitehurst to tout the benefits of the Gluster technology and acquisition, particularly Gluster’s software-based storage technology that matches cloud computing. It was quite a contrast to the news out of Oracle Open World, where hardware was a focal point.



CodeBits — An event of competitive innovation

August 15th, 2011

Codebits 2009 - Pedro and Rupert

It was my pleasure and privilege to attend Codebits in 2009. As Roland Bouman says, its talk choice method is based on public voting, and therefore everyone can contribute to the schedule.

But that is not the main reason for attending this extraordinary event. It is not just a conference. It's an innovation fest. For one and a half days, it's a conference, where the speakers are encouraged to bring to their audience the most innovative and inspiring talks. In the afternoon of the second day, the event becomes a competition, where the teams that have registered have 24 hours to bring a project to completion, and they have to start and finish within the allotted time.

The project can be anything, and I have seen quite a lot of exciting stuff rolling live in the huge pavilion: I could hardly ignore robotics, as these little mechanical smurfs were running all over the place and you had to be careful not to squash them when you walked. There were plenty of occasions for planning great projects, together with attempts at improving social relations, and mixing up with big brother. There were projects based on 3D printing, and less broad projects like all-seasons keyboards.

A very popular session, followed by practical workshops, was lock picking. I attended one of them, learned how to pick simple and less simple locks, and I brought home some lockpicking tools.

On a more technical level, I was there with Lenz Grimmer and Kai Seidler; we spoke about MySQL and other cool things, and we had lots of fun for three days.

Besides the teams hacking away at their projects, there were several teams showcasing technology that had been developed by winners of the previous years, such as 3D television and intelligent phone networks. In short, this was an inspiring event, which I can warmly recommend.


Aligning IO on a hard disk RAID – the Benchmarks

June 9th, 2011

In the first part of this article I showed how I align IO; now I want to share the results of the benchmark I have been running to see how much benefit we can get from proper IO alignment on a 4-disk RAID1+0 with a 64k stripe element. I haven’t run any benchmarks in a while, so be careful with my results and forgiving of my mistakes :)

The environment

Here is the summary of the system I have been running this on (for brevity I have removed some irrelevant information):

# Aspersa System Summary Report ##############################
    Platform | Linux
     Release | Ubuntu 10.04.2 LTS (lucid)
      Kernel | 2.6.32-31-server
Architecture | CPU = 64-bit, OS = 64-bit
# Processor ##################################################
  Processors | physical = 2, cores = 12, virtual = 24, hyperthreading = yes
      Speeds | 24x1600.000
      Models | 24xIntel(R) Xeon(R) CPU X5650 @ 2.67GHz
      Caches | 24x12288 KB
# Memory #####################################################
       Total | 23.59G
...
  Locator   Size     Speed             Form Factor   Type          Type Detail
  ========= ======== ================= ============= ============= ===========
  DIMM_A1   4096 MB  1333 MHz (0.8 ns) DIMM          {OUT OF SPEC} Other
...
# Disk Schedulers And Queue Size #############################
         sda | [deadline] 128
# RAID Controller ############################################
  Controller | LSI Logic MegaRAID SAS
       Model | MegaRAID SAS 8704EM2, PCIE interface, 8 ports
       Cache | 128MB Memory, BBU Present
         BBU | 100% Charged, Temperature 34C, isSOHGood=

  VirtualDev Size      RAID Level Disks SpnDpth Stripe Status  Cache
  ========== ========= ========== ===== ======= ====== ======= =========
  0(no name) 1.088 TB  1 (1-0-0)      2     2-2     64 Optimal WT, RA

  PhysiclDev Type State   Errors Vendor  Model        Size
  ========== ==== ======= ====== ======= ============ ===========
  Hard Disk  SAS  Online   0/0/0 SEAGATE ST3600057SS  558.911
  Hard Disk  SAS  Online   0/0/0 SEAGATE ST3600057SS  558.911
  Hard Disk  SAS  Online   0/0/0 SEAGATE ST3600057SS  558.911
  Hard Disk  SAS  Online   0/0/0 SEAGATE ST3600057SS  558.911

The report says the controller cache is set to write-through (WT), but in fact I ran every benchmark twice, with (a) write-through and (b) write-back, to see if the write-back cache would minimize the effects of misalignment.

The file system of choice was XFS. Barriers and the physical disk cache were disabled. The tool I used was sysbench 0.4.10, which came with this Ubuntu system. I ran every fileio benchmark and an IO-bound read-write oltp benchmark in autocommit mode.

File IO benchmark

For the FileIO benchmark, I used 64 files, 1GB, 4GB and 16GB total in size, with 1, 4 and 8 threads. The operations were done in 16kB units to mimic InnoDB pages. There were a couple of interesting surprises:

1. After I got (what I thought was) the best configuration, I added LVM on top of that and the performance improved another 20-40%. It took me a while to figure it out, but here’s what happened: for the XFS file system on a raw partition I was using the full partition size, which was slightly over 1TB. When I added LVM on top, however, I made the logical volume slightly below 1TB. Investigating this, I found that 32-bit xfs inodes (which are used by default) have to live in the first terabyte of the device, which seems to have affected the performance here (IMO because of where the first data extents were placed in this case). When I mounted the partition with the inode64 option, however, the effect disappeared and performance without LVM was slightly better than with LVM, as expected. I had to redo all of the benchmarks to get the numbers right.

2. I was running vmstat during one of the tests and a spike in OS buffers during the “prepare” phase of sysbench caught my eye. I found out that sysbench would not honor --file-extra-flags during the “prepare” phase, so instead of being created using direct IO, the files were buffered in the OS cache, and writes to the files were serialized until they were fully overwritten and thereby flushed from OS buffers. Buffers would be flushed within the first few seconds, so the effects of this were marginal. Alexey Kopytov fixed this in the sysbench trunk immediately, though I didn’t want to recompile sysbench on this system, so I used Domas’ uncache after prepare to make sure caches were clean.

OLTP benchmark

As the goal was to compare performance with different IO alignment, not different MySQL configurations, I didn’t try out different MySQL versions or settings. Moreover, I was running these benchmarks for a customer, so I just used the settings that they would have used anyway. The one thing I did change: I significantly reduced the InnoDB buffer pool to make sure the benchmark was IO bound.

That said, the benchmark was running on Percona Server 5.0.92-87 with the following my.cnf configuration:

[mysqld]
datadir=/data/mysql
socket=/var/run/mysqld/mysqld.sock
innodb_file_per_table = true
innodb_data_file_path = ibdata1:10M:autoextend
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 8M
innodb_buffer_pool_size = 128M
innodb_log_file_size = 64M
innodb_log_files_in_group = 2
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_io_capacity = 200
port = 3306
back_log = 50
max_connections = 2500
max_connect_errors = 10
table_cache = 2048
max_allowed_packet = 16M
binlog_cache_size = 16M
max_heap_table_size = 64M
thread_cache_size = 32
query_cache_size = 0
tmp_table_size = 64M
key_buffer_size = 8M
bulk_insert_buffer_size = 8M
myisam_sort_buffer_size = 8M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover
skip-grant-tables

The number of rows used was 20M, transactions were not used (autocommit), and the number of threads was 1, 4, 8, 16 and 32.

Benchmark scenarios

Here are the different settings that I ran the same benchmark on. As I mentioned earlier, each of them was run twice, first with the RAID controller cache set to write-through and then to write-back.

1. Baseline – misalignment on the partition table, no LVM and no alignment settings in the file system. This is what you would often get on RHEL5, Ubuntu 8.04 or similar “older” systems if you did nothing about IO alignment.

2. Misalignment on the partition table, but proper alignment options on the file system. This is what we get when the file system tries to balance writes but is not aware that it is not aligned to the beginning of the stripe element.

3. 1M alignment in the partition table but no options on the file system. You should get this on RHEL6, Ubuntu 10.04 and similar systems if you did nothing about IO alignment yourself. In this case the offset is correct, but the file system is unaware of how to align files properly.

4. Partition table and file system properly aligned; sunit/swidth set during mkfs. No LVM at this point.

5. Partition table aligned properly; sunit/swidth set during mounting but not during mkfs. This is your best option if you have proper alignment in the partition table but did not set alignment options in xfs when creating it and you don’t want to, or can’t, reformat the file system. One thing to note, however: files that were written before this was set may still be unaligned, though xfs defragmentation may be able to fix that (not verified).

6. Added LVM on top of aligned partition table, used proper file system alignment.
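
To make “aligned” a bit more concrete, here is a minimal Python sketch of the arithmetic behind these scenarios. The 64k stripe element and the two data-bearing stripes of a 4-disk RAID1+0 come from the system summary above; the 512-byte sector size, the example start sectors (63 for an old DOS-style partition table, 2048 for a 1M-aligned one) and the helper names are assumptions made for illustration, not values read from the benchmark machine.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def is_aligned(start_sector, stripe_element_kb=64):
    """True if a partition starting at start_sector begins on a stripe element boundary."""
    offset_bytes = start_sector * SECTOR_BYTES
    return offset_bytes % (stripe_element_kb * 1024) == 0


def xfs_stripe_units(stripe_element_kb=64, data_disks=2):
    """Return (sunit, swidth) in 512-byte units, as used by the XFS mount options."""
    sunit = stripe_element_kb * 1024 // 512
    swidth = sunit * data_disks
    return sunit, swidth


print(is_aligned(63))      # False: 63 * 512 bytes is not a multiple of 64k
print(is_aligned(2048))    # True: 1M is a multiple of 64k
print(xfs_stripe_units())  # (128, 256)
```

With these assumptions, scenarios 1 and 2 correspond to the first case (offset not on a stripe element boundary) and scenarios 3 to 6 to the second.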

Benchmark results

I had a hard time deciding how best to present the results so that they are not too dense and are actually interesting. I decided that instead of preparing charts for each benchmark, I’ll just describe a few less interesting numbers first, then show graphs for the more interesting results. Let me know if you think this was a bad idea :)

File IO benchmark results

Sequential read results are, as expected, the least interesting. Read-ahead kicked in immediately, giving ~9,600 iops (~150MB/s) at 1 thread, ~14,500 iops (~230MB/s) at 4 threads and ~16,300 iops (~250MB/s) at 8 threads. Neither IO alignment nor file size made any difference. Adding LVM here reduced single-thread performance by 5-10%.
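
As a quick sanity check on how the iops and MB/s figures relate, here is a small Python sketch. It assumes every operation moves exactly one 16kB unit (the request size used in this benchmark) and that MB means 1024 kB; the function name is just for illustration.

```python
def iops_to_mb_per_s(iops, request_kb=16):
    """Convert operations per second into MB/s for a fixed request size."""
    return iops * request_kb / 1024


for iops in (9600, 14500, 16300):
    print(iops, round(iops_to_mb_per_s(iops)))
# 9600 150
# 14500 227
# 16300 255
```

The results line up with the rounded throughput numbers quoted above.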

Sequential write results were a bit more interesting. With WT (write-through) cache enabled, performance was really poor, and there was virtually no difference between 1, 4 and 8 threads. Different file sizes made no difference either. Write-back cache gave an incredible performance boost: up to 33x in the single-threaded workload. File system IO alignment seems to have made a difference too, up to 15% with write-back cache enabled. Here’s 1GB seqwr with WT cache:

1GB seqwr WT cache

Here’s same test with WB cache:

1GB seqwr WB cache

And just to show you the difference between sequential writes with WT cache and WB cache:

1GB seqwr WT vs WB

Random read. This is probably the most interesting number for an OLTP workload, which is usually light on writes (especially if there’s a BBU-protected write-back cache) and heavy on random reads. Regardless of the file size, the difference between aligned and misaligned reads was the same and, of course, WT vs. WB cache showed no difference at all. Here are the results:

16GB rndrd

As you can see IO alignment makes a difference here and improves performance up to 15% in case of 8 threads running concurrently. Because the customer was running a database which was way bigger than 16G, I’ve repeated the random read (and write) benchmark with 8 threads and total size of 256G. While the number of operations per second was slightly lower, the difference was still 15% — 909 iops unaligned -vs- 1049 aligned.

Random write. This is an important metric for write intensive workloads where there’s a lot of data being modified, inserts are done to random positions (not consecutive PK causing page splits) etc. Benchmark results are fairly consistent regardless of file size, let’s look at them. First, results with WT cache:

16 rndwr WT cache

And here’s with WB cache:

16 rndwr WB cache

Proper IO alignment in this case gives up to 23% improvement when WB cache is used. With WT cache enabled, the single-thread improvement is marginal; WB cache, however, brings single-thread random write performance close to what 8 threads can do, and IO alignment gives an extra 23% on top of that.

As mentioned, I also did a single test on larger files (the same test I did for random reads), i.e. an 8-thread random write benchmark on files totaling 256GB. With WB cache enabled, I got 919 iops unaligned and 1127 iops aligned, i.e. the improvement is still 23%.
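
For anyone who wants to check the percentages, here is a short Python sketch using the 256GB numbers quoted in this article. The improvement is taken relative to the unaligned result; the function name is only illustrative.

```python
def improvement_pct(unaligned_iops, aligned_iops):
    """Relative gain of the aligned result over the unaligned one, in percent."""
    return (aligned_iops - unaligned_iops) / unaligned_iops * 100


print(round(improvement_pct(909, 1049), 1))  # 15.4 (random read, 256GB)
print(round(improvement_pct(919, 1127), 1))  # 22.6 (random write, 256GB, WB cache)
```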

OLTP benchmark results

From this benchmark, I only have two graphs to show you. First one is with RAID controller set to WT cache:

sysbench OLTP 20M rows, WT cache

The second is with WB cache:

sysbench OLTP 20M rows, WB cache

I couldn’t figure out what exactly happened with setting #3 when WB cache was disabled. What I do know, based on the IO stats I was gathering during the benchmarks, is that the cause was in fact a lower number of IO operations and higher response time, so it seems that in this case misaligned IO had some collateral effects in a mixed read/write environment. Note that the benchmarks were all scripted and the oltp benchmarks would automatically start after the file tests, so if there had been an error in the setting, it would have shown up across all other benchmarks for the same setting.

Summary

For the two workloads that are most relevant to databases, random reads and random writes, IO alignment on a 4-disk RAID10 with a standard 64k stripe element size makes a significant difference. When I launched the system that I was benchmarking, I could clearly see the difference in production, as I had another machine running alongside it with the same hardware but with misaligned IO. Here are diskstats from the two shards running side by side:

Aligned:
  #ts device    rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt    wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt busy in_prg
{540} dm-0     447.1    34.0     7.4     0%    2.4     5.4    23.4    49.6     0.6     0%    0.0     0.6  85%      0

Misaligned:
  #ts device    rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt    wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt busy in_prg
{925} dm-0     462.1    34.1     7.7     0%    3.8     8.2    12.1    87.0     0.5     0%    0.0     0.7  93%      0

While the number of operations from the OS perspective is very similar, under high concurrency the response time in the first (aligned) case is significantly better.
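
One way to read the two samples is through Little's law: the average number of requests in flight is roughly the request rate multiplied by the response time. The Python sketch below applies it to the read columns, assuming rd_rt is in milliseconds and rd_cnc is the average read concurrency; that reading of the column names is my assumption, not something taken from the tool's documentation.

```python
def avg_in_flight(ops_per_s, response_ms):
    """Little's law: average concurrency = arrival rate * response time."""
    return ops_per_s * response_ms / 1000.0


print(round(avg_in_flight(447.1, 5.4), 1))  # 2.4, aligned (rd_cnc reported as 2.4)
print(round(avg_in_flight(462.1, 8.2), 1))  # 3.8, misaligned (rd_cnc reported as 3.8)
```

Under that assumption, the misaligned shard needs noticeably more reads in flight to deliver about the same number of reads per second.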

It would be interesting however to run similar benchmarks on a larger RAID5 system where it should make even bigger difference on writes. Another interesting setting might be a [mirrored] RAID0 with many more stripes as not having proper file system alignment should have really interesting effects. Large stripe on the other hand should somewhat reduce the effects of misalignment, though it would definitely be interesting to run benchmarks and verify that. If you have some numbers to share, please leave a comment. Next, I plan to look at IO alignment on Flash cards to see what benefits we can get there from proper alignment.

You can find scripts and plain data here on our public wiki.



dbbenchmark.com – now supporting MySQL on OSX 10.6

August 29th, 2010

Just a quick note to let everyone know that our new benchmarking script now supports OSX 10.6 on Intel hardware. That means you can run one simple command and get all of the sequential and random INSERT and SELECT performance statistics for your database. As usual, the script is open source and released under the new BSD license. Give it a try by downloading now! See the download page for more details.



dbbenchmark.com – Benchmarking script now available

August 28th, 2010

You can download the first release of the benchmarking script here: http://code.google.com/p/dbbenchmark/

Please read the README file or consult the Support page before running the benchmarks.



dbbenchmark.com – Site launched

August 28th, 2010

Welcome to DBbenchmarks.com, a publicly accessible database that tracks anonymously submitted data about MySQL server performance. You can use this site to research the performance of certain types of hardware when running MySQL. Our open-source benchmarking script is free to own and use; we only ask that you allow the script to connect to this database and submit the results. All results and data collected are anonymous and viewable on this site. We only track performance data from MySQL – you can see the list on the About page.

Check out the database of benchmarks here: [link]



Four short links: 1 July 2010

July 1st, 2010

  1. Conflict Minerals and Blood Tech (Joey Devilla) -- electronic components have a human and environmental cost. I remember Saul Griffith asking me, "do you want to kill gorillas or dolphins?" for one component. Now we can add child militias and horrific rape to the list. (via Simon Willison)
  2. Meteor -- an open source HTTP server that serves streaming data feeds (for apps that need Comet-style persistent connections). (via gianouts on Delicious)
  3. Hobby King RC Store -- online source for remote control goodness, as recommended by Dan Shapiro at Foo.
  4. RethinkDB -- MySQL storage engine optimised for SSD drives. See also TechCrunch article.



mount: /dev/sdb1 already mounted or /mysql busy

June 10th, 2010

We added a 500GB 7.2K SATA/300 Hitachi Deskstar E7K500 16MB disk to one of our dev servers, partitioned it using fdisk and formatted the partition with ext3. When we tried mounting it, we got the following error:

[root@xyz user]# mount -t ext3  /dev/sdb1 /mysql
mount: /dev/sdb1 already mounted or /mysql busy

lsof didn’t show any open files that might be linked to this problem, nor was any “famd” process running. Finally, the following steps, which remove the logical devices from the device-mapper driver, fixed the problem.

[root@xyz user]# dmsetup ls
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab   (253, 0)
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab1  (253, 1)

[root@xyz user]# dmsetup remove ddf1_44656c6c202020201028001510281f033832b7a2f6678dab1
[root@xyz user]# dmsetup ls
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab   (253, 0)

[root@xyz user]# dmsetup remove ddf1_44656c6c202020201028001510281f033832b7a2f6678dab

[root@xyz user]# dmsetup ls
No devices found

Mounting using the command “mount -t ext3  /dev/sdb1 /mysql”  after the above steps worked fine.
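
If there are many stale mappings, the listing can be turned into the matching remove commands with a few lines of Python. This is only a sketch under stated assumptions: it parses text that has the same shape as the `dmsetup ls` output shown above, it prints commands instead of running them, and ordering by descending minor number is just a heuristic that mirrors the order used in this post (partition mapping first, then its parent). The function names are hypothetical.

```python
import re


def parse_dmsetup_ls(text):
    """Return (name, major, minor) tuples from text shaped like `dmsetup ls` output."""
    devices = []
    for line in text.splitlines():
        match = re.match(r"^(\S+)\s+\((\d+),\s*(\d+)\)\s*$", line)
        if match:  # lines such as "No devices found" are skipped
            devices.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return devices


def removal_order(devices):
    """Heuristic: highest minor number first, as was done by hand in this post."""
    return [name for name, _, _ in sorted(devices, key=lambda d: d[2], reverse=True)]


sample = """\
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab   (253, 0)
ddf1_44656c6c202020201028001510281f033832b7a2f6678dab1  (253, 1)
"""

for name in removal_order(parse_dmsetup_ls(sample)):
    print("dmsetup remove " + name)
```

Review the printed commands before running any of them; removing a mapping that is actually in use is not something this sketch guards against.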


Tagged: disk, mount, troubleshooting

change accelerator cache ratio

June 7th, 2010

I was given the task of checking the array accelerator cache ratio to see whether it was set to an optimal level. Our ideal preference was a read/write ratio of 0/100.

The machine configuration is HP DL180 G5, 2 x Xeon L5420 2.50GHz, 15.7GB / 16GB 667MHz DDR2, 6 x 300GB-15K SAS. This machine was running mysql 5.1.36 using the innodb plugin.

The command line utility to check the controller configuration is “hpacucli”. Navigating using hpacucli is very straightforward.

“ctrl all show config detail” Will give you the entire controller configuration.

=> ctrl all show config detail

Smart Array P400 in Slot 5
Bus Interface: PCI
Slot: 5
Serial Number: P61630K9SW31NL
Cache Serial Number: PA82C0J9SW02H1
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Chassis Slot:
Hardware Revision: Rev D
Firmware Version: 4.12
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 secs
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Accelerator Ratio: 0% Read / 100% Write
Drive Write Cache: Disabled
Total Cache Size: 256 MB
Battery Pack Count: 1
Battery Status: OK
SATA NCQ Supported: True

Array: A
Interface Type: SAS
Unused Space: 0 MB
Status: OK

Logical Drive: 1
Size: 1.4 TB
Fault Tolerance: RAID 5
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Stripe Size: 64 KB
Status: OK
Array Accelerator: Enabled
Parity Initialization Status: Initialization Completed
Unique Identifier: 600508B100104B39535733314E4C0003
Disk Name: /dev/cciss/c0d0
Mount Points: / 3.9 GB, none 12.0 GB, /var 3.9 GB, /tmp 3.9 GB, /home 1.3 TB
Logical Drive Label: A0432BCEP61630K9SW31NLD55F

physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1BEP400009004UQCD
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1ZZRN000090035Q2Q
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP20VCQ00009004XE2V
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1ZZSB00009003MMWZ
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:5
Port: 1I
Box: 1
Bay: 5
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1L4T000009004UQCV
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:6
Port: 1I
Box: 1
Bay: 6
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP196KG00009004S0ZH
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown

In the above output our point of interest was “Accelerator Ratio: 0% Read / 100% Write”. In this case it has been set to the optimal value. If it weren’t, it could be changed using the command “ctrl slot=5 modify cacheratio=0/100”.
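
When checking more than a handful of machines, it can be easier to pull the ratio out of the saved output with a script. Here is a minimal Python sketch that assumes the controller output has already been captured as text in the format shown above; it does not invoke hpacucli itself, and the function name is hypothetical.

```python
import re


def parse_accelerator_ratio(output):
    """Return (read_pct, write_pct) from captured controller output, or None if absent."""
    match = re.search(r"Accelerator Ratio:\s*(\d+)% Read / (\d+)% Write", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "Cache Status: OK\n"
    "Accelerator Ratio: 0% Read / 100% Write\n"
    "Drive Write Cache: Disabled\n"
)

ratio = parse_accelerator_ratio(sample)
print(ratio)              # (0, 100)
print(ratio == (0, 100))  # True: matches our preferred 0/100 setting
```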

When you are stuck for a particular command, you can just run “help <command name>” for more input.

=> help aa

The following documentation pertains to your search:

<target> modify [arrayaccelerator=enable|disable]
Enables or disables the array accelerator for a given logical drive. The
target can be any valid logical drive target on a controller that supports
array accelerator management.

<target> modify [cacheratio=#/#|?]
Sets the array accelerator cache ratio for the controller. The first # is
the read cache %. The second # is the write cache %. The target can be any
valid controller.

<target> create [type=ld]
[drives=[#:]#:#,[#:]#:#,[#:]#:#-[#:]#:#],…|all|allunassigned]
[raid=6|5|1+0|1|0|?]
[size=#|?]
[stripesize=8|16|32|64|128|256|default|?]
[sectors=32|63|?]
[arrayaccelerator=enable|disable|?]
[drivetype=sas|satalogical|sata|saslogical|parallelscsi|?]

[type=] The type parameter specifies the device type that is being created.
A logical drive is the only device type supported at this time.

[drives=] The drives parameter specifies the physical drives to be used for
creating a logical drive on a new or existing array. If the drives specified
are all unassigned drives, then a new array will be created with a new
logical drive on it. If the drives specified are all assigned to an existing
array, then a new logical drive will be created on that array. The symbol
#:# stands for port:id or box:bay, depending on the controller. Some
controllers may also support port:box:bay and use the #:#:# syntax. The all
and allunassigned keywords both target all physical drives that are not
currently assigned to an array.

[raid=] The raid parameter sets the raid level of the logical drive. If not
specified, the default raid is the highest level possible. The availability
of certain raid settings depends on the number of drives designated in the
“drives=” parameter. For example, RAID 1 will only be available if two
drives are selected while RAID 1+0 will be shown for a selection of 4 or
more drives.

[size=] The size parameter specifies the size of the logical drive, the
implied units are MB. If not specified, the default is the maximum possible
size.

[stripesize=] The stripesize parameter sets the logical drive’s stripesize.
The implied units of stripe size are KB.

[sectors=] The sectors parameter specifies the sectors per track of the
logical drive. If not specified, the default is 32.

[arrayaccelerator=] The arrayaccelerator parameter specifies the array
accelerator state for the logical drive. If not specified, the default is
enable.

[drivetype=] The drivetype parameter specifies the drive
interface type. If there are multiple drive types when selecting all
physical drives, the desired drive type needs to be specified. Mixed drives
are not allowed on the same array or logical drive. If all drive types in a
controller are the same this parameter is not needed. The target can be a
controller or an array in the system.

Examples:
controller slot=3 logicaldrive 2 modify arrayaccelerator=enable
controller slot=1 modify cacheratio=25/75
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=6
ctrl slot=1 create type=ld drives=1:1-1:6,1:9,1:10-1:12 raid=6
ctrl slot=1 create type=ld drives=all drivetype=parallelscsi
controller slot=5 array A create type=ld raid=5 size=1000
controller slot=1 array C create type=ld raid=1 stripesize=32
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=?
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 stripesize=?
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=1+0 stripesize=?
ctrl slot=1 create type=ld drives=1:1,1:2 raid=1 size=?
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=1+0 sectors=?


Tagged: accelerator, cache, controller, HP DL180, hpacucli, mysql

RAID Controllers Cache Management – Missing Features

April 16th, 2010

We all know how important hardware RAID controllers are to today’s data storage performance, especially when dealing with large data sets. Looking at the trend over the last couple of years, they have evolved rapidly, gaining a lot of useful features, and their usage has grown too, as most new servers come with one or two controllers built in by default (one for internal storage and another for an external storage array or for redundancy).

Few popular RAID controller vendors in the market: 

More or less all of them support the common features; they differ in the number of ports, protocol support (iSCSI, SATA, SAS, HBA/FB), transfer speed, RAID levels, total number of disks supported, cache size and cache management.

Controller Cache – Database Workloads

For database OLTP workloads (IO bound), the controller cache plays a crucial role in overall write or read throughput, depending on how the cache is used. Most RAID controllers are equipped with 128MB, 256MB or 512MB of cache, and newer controllers like the HP Smart Array P812 support 1GB.

Write-back mode improves write performance by an order of magnitude, as the write request is returned as completed as soon as the data is in the controller cache, without actually being written to disk (that’s why the controller needs a BBU, a Battery Backup Unit, so that there is no data loss on power failure).

If you enable read ahead on the controller (sometimes good for OLAP workloads or ETL data warehouses with heavy sequential access, especially adaptive read ahead), the same cache is used to store the pre-fetched data, so later reads can be satisfied from the cache without hitting the disk. But if the database system does its own read ahead (like InnoDB), then it is better to turn off read ahead on the controller to avoid page thrashing.

For some workloads, the controller cache can also cause negative performance if the cache is not properly utilized by the controller.

Missing Cache Management Tools

At present, none of the controllers supports any cache management tools or exposes how the cache is actually being used, so one cannot adjust the cache to the workload for improved performance.

Some of the missing features:

  • A way to flush the data from cache to disk, so that systems can be taken offline for maintenance. Right now there is no easy way to do this, other than that some controllers indicate through an LED whether data is in the cache or not.
  • A way to set a cache threshold in time or %, so that the controller starts flushing to disk once the threshold is met. For example, if you notice big spikes in RRD graphs every few minutes, you could adjust the threshold to distribute the load evenly.
  • Cache usage statistics (write data size, read ahead data size, etc.), so that the workload can be adjusted to yield much better results.
  • Splitting of the cache between reads and writes, either by size or by %, so that they do not overlap and cause performance issues. For example, one might prefer to set 20% for read ahead data and 80% for writes. Only HP Smart Array controllers support this feature at present.
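
To put numbers on the last item, here is a tiny Python sketch of what a percentage split means in megabytes. The 512MB cache size is one of the common sizes mentioned earlier and the 20/80 split is the example from the list; the function is purely illustrative and does not talk to any controller.

```python
def split_cache(total_mb, read_pct, write_pct):
    """Return (read_mb, write_mb) for a read/write percentage split of the cache."""
    if read_pct + write_pct != 100:
        raise ValueError("read and write percentages must add up to 100")
    return total_mb * read_pct / 100, total_mb * write_pct / 100


print(split_cache(512, 20, 80))  # (102.4, 409.6)
```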

The more control you get over the controller cache, the more you can tweak and adjust for the workload to get improved performance. Hopefully one day all vendors will expose more cache management options.

