Archive for the ‘tips’ Category

Setting up XFS on Hardware RAID — the simple edition

Декабрь 16th, 2011

There are about a gazillion FAQs and HOWTOs out there that talk about XFS configuration, RAID IO alignment, and mount point options.  I wanted to try to put some of that information together in a condensed and simplified format that will work for the majority of use cases.  This is not meant to cover every single tuning option, but rather to cover the important bases in a simple and easy to understand way.

Let’s say you have a server with standard hardware RAID setup running conventional HDDs.

RAID setup

For the sake of simplicity you create one single RAID logical volume that covers all your available drives.  This is the easiest setup to configure and maintain and is the best choice for operability in the majority of normal configurations.  Are there ways to squeeze more performance out of a server by dividing the logical volumes: perhaps, but it requires a lot of fiddling and custom tuning to accomplish.

There are plenty of other posts out there that discuss RAID minutia.  Make sure you cover the following:

  • RAID type (usually 5 or 1+0)
  • RAID stripe size
  • BBU enabled with Write-back cache only
  • No read cache or read-ahead
  • No drive write cache enabled

Partitioning

You want to run only MySQL on this box, and you want to ensure your MySQL datadir is separated from the OS in case you ever want to upgrade the OS, but otherwise keep it simple.  My suggestion?  Plan on allocating partitions roughly as follows, based on your available drive space and keeping in mind future growth.

  • 8-16G for Swap –
  • 10-20G for the OS (/)
  • Possibly 10G+ for /tmp  (note you could also point mysql’s tmpdir elsewhere)
  • Everything else for MySQL (/mnt/data or similar):  (sym-link /var/lib/mysql into here when you setup mysql)

Are there alternatives?  Yes.  Can you have separate partitions for Innodb log volumes, etc.?  Sure.  Is it work doing much more than this most of the time?  I’d argue not until you’re sure you are I/O bound and need to squeeze every last ounce of performance from the box.  Fiddling with how to allocate drives and drive space from partition to partition is a lot of operational work which should be spent only when needed.

Aligning the Partitions

Once you have the partitions, it could look something like this:
#fdisk -ul

Disk /dev/sda: 438.5 GB, 438489317376 bytes
255 heads, 63 sectors/track, 53309 cylinders, total 856424448 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00051fe9

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048     7813119     3905536   82  Linux swap / Solaris
Partition 1 does not end on cylinder boundary.
/dev/sda2   *     7813120    27344895     9765888   83  Linux
/dev/sda3        27344896   856422399   414538752   83  Linux
 Several months ago my colleague Aurimas posted two excellent blogs on both the theory of Aligning IO on hardware RAID and some good benchmarks to emphasize the point, go read those if you need the theory here.  Is it common on modern Linux systems for this to be off?  Maybe not, but here’s how you check.
  We want to use mysql on /dev/sda3, but how can we ensure that it is aligned with the RAID stripes?  It takes a small amount of math:
  • Start with your RAID stripe size.  Let’s use 64k which is a common default.  In this case 64K = 2^16 = 65536 bytes.
  • Get your sector size from fdisk.  In this case 512 bytes.
  • Calculate how many sectors fit in a RAID stripe.   65536 / 512 = 128 sectors per stripe.
  • Get start boundary of our mysql partition from fdisk: 27344896.
  • See if the Start boundary for our mysql partition falls on a stripe boundary by dividing the start sector of the partition by the sectors per stripe:  27344896 / 128 = 213632.  This is a whole number, so we are good.  If it had a remainder, then our partition would not start on a RAID stripe boundary.

Create the Filesystem

XFS requires a little massaging (or a lot).  For a standard server, it’s fairly simple.  We need to know two things:

  • RAID stripe size
  • Number of unique, utilized disks in the RAID.  This turns out to be the same as the size formulas I gave above:
    • RAID 1+0:  is a set of mirrored drives, so the number here is num drives / 2.
    • RAID 5: is striped drives plus one full drive of parity, so the number here is num drives – 1.
In our case, it is RAID 1+0 64k stripe with 8 drives.  Since those drives each have a mirror, there are really 4 sets of unique drives that are striped over the top.  Using these numbers, we set the ‘su’ and ‘sw’ options in mkfs.xfs with those two values respectively.
# mkfs.xfs -d su=64k,sw=4 /dev/sda3
meta-data=/dev/sda3              isize=256    agcount=4, agsize=25908656 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=103634624, imaxpct=25
         =                       sunit=16     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=50608, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

The XFS FAQ is a good place to check out for more details.

Mount the filesystem

Again, there are many options to use here, but let’s use some simple ones:

/var/lib/mysql           xfs     nobarrier,noatime,nodiratime

Setting the IO scheduler

This is a commonly missed step related to getting the IO setup properly.  The best choices here are between ‘deadline’ and ‘noop’.   Deadline is an active scheduler, and noop simply means IO will be handled without rescheduling.  Which is best is workload dependent, but in the simple case you would be well-served by either.  Two steps here:

echo noop > /sys/block/sda/queue/scheduler   # update the scheduler in realtime

And to make it permanent, add ‘elevator=<your choice>’ in your grub.conf at the end of the kernel line:

kernel /boot/vmlinuz-2.6.18-53.el5 ro root=LABEL=/ noapic acpi=off rhgb quiet notsc elevator=noop

 

This is a complicated topic, and I’ve tried to temper the complexity with what will provide the most benefit.  What has made most improvement for you that could be added without much complexity?


PlanetMySQL Voting: Vote UP / Vote DOWN

Never say "there is no way"

Декабрь 1st, 2011
Reading a recent MySQL book, I saw an example of SHOW CREATE TABLE that comes with backticks (`) around the table and column names, and a comment:
Unfortunately, there is no way to remove this from generated syntax with this command.
(Emphasis mine).
Here's how it goes:
mysql> show create table mytest\G
*************************** 1. row ***************************
Table: mytest
Create Table: CREATE TABLE `mytest` (
`id` int(11) NOT NULL,
`description` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Of course, there is a way!

mysql> pager tr -d '`'
PAGER set to 'tr -d '`''
mysql> show create table mytest\G
*************************** 1. row ***************************
Table: mytest
Create Table: CREATE TABLE mytest (
id int(11) NOT NULL,
description varchar(50) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Never say "there is no way!"

Instead of "tr -d '`'", you can use "sed -e 's/`//g'", which does the same thing.

If you are running the query at the command line, you may use the pipe directly:

$ mysql -e 'show create table test.mytest\G' | tr -d '`'
*************************** 1. row ***************************
Table: mytest
Create Table: CREATE TABLE mytest (
id int(11) NOT NULL,
description varchar(50) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

PlanetMySQL Voting: Vote UP / Vote DOWN

TIL: Lookout For DEFINER

Октябрь 13th, 2011

The Issue


I haven't blogged in a while an I have a long TODO list of things to publish: The repository for the SNMP Agent, video and slides of my OSCON talk and a quick overview of MHA master-master support. In the meantime, here's a little fact that I didn't know from MySQL CREATE VIEW documentation:

Although it is possible to create a view with a nonexistent DEFINER account, an error occurs when the view is referenced if the SQL SECURITY value is DEFINER but the definer account does not exist.
How can this be possible?

The Problem

For a number of reasons we don't have the same user accounts on the master than we have on the slaves (ie: developers shouldn't be querying the master). Our configuration files include the following line:
replicate-ignore-table=mysql.user
So if we create a user on the master, the user definition doesn't go through the replication chain.

So a VIEW can be created in the master, but unless we run all the proper GRANT statements on the slave as well, the VIEWs won't be effective on the slaves. Example from our slave (output formatted for clarity):

show create view view3\G
*************************** 1. row ***************************
View: view3
Create View: CREATE ALGORITHM=UNDEFINED
DEFINER=`app`@`192.168.0.1`
SQL SECURITY DEFINER VIEW `view3` AS select
[...]

show grants for `app`@`192.168.0.1`;
ERROR 1141 (42000): There is no such grant defined
for user 'app' on host '192.168.0.1'

The Solution

Once again, Maatkit's to the rescue with mk-show-grants on the master:
mk-show-grants | grep 192.168.0.1
-- Grants for 'app'@'192.168.0.1'
GRANT USAGE ON *.* TO 'app'@'192.168.0.1'
IDENTIFIED BY PASSWORD '*password_hash';
GRANT DELETE, EXECUTE, INDEX, INSERT, SELECT,
SHOW VIEW, UPDATE ON `pay`.* TO 'app'@'192.168.0.1';
A simple copy from the master and paste onto the slave fixed it.

Conclusion

Every now developers come to me with unusual questions. In this case it was: How come I can access only 2 out of 3 views?. In cases like these, it usually pays off to not overthink the issue and look into the details. A SHOW CREATE PROCEDURE on the 3 views quickly showed that one had a different host for the DEFINER. A quick read through the documentation and an easy test confirmed the mistake. That's why I have 3 mantras that I keep repeating to whomever wants to listen:
  • Keep it simple
  • Pay attention to details
  • RTFM (F is for fine)
It constantly keeps me from grabbing some shears and going into yak shaving mode.

PlanetMySQL Voting: Vote UP / Vote DOWN

What is this MySQL file used for?

Февраль 17th, 2011

MySQL keeps many different files, some contain real data, some contain meta data. Witch ones are important? Witch can your throw away?

This is my attempt to create a quick reference of all the files used by MySQL, whats in them, what can you do if they are missing, what can you do with them.

When I was working for Dell doing Linux support my first words to a customer where “DO YOU HAVE COMPLETE AND VERIFIED BACKUP?” Make one now before you think about doing anything I suggest here.

You should always try to manage your data through a MySQL client.  If things have gone very bad this may not be possible. MySQL may not start. If your file system get corrupt you may have missing files. Sometimes people create other files in the MySQL directory (BAD).  This should help you understand what is safe to remove.

Before you try to work with one of these files make sure you have the file permissions set correctly.

This may not be a complete list of files used my MySQL.  It most certainly doesn’t describe everything each table is used for. If you know of ways to replace a missing file or what happens to MySQL when a file is missing that I haven’t described here, please leave me a comment or email me.  I’ll update this document and give your a reference.

my.cnf

This file alters the default configuration settings. MySQL looks in the /etc directory for my.cnf. You should review this file to insure you are looking in the right place for all other MySQL files.  MySQL WILL run without it.  If you have trouble getting MySQL to start, read the error log then try moving are renaming this file.

mysql <directory>

On Linux servers the default location for MySQL files is /var/lib/mysql.  This directory is controlled by the “datadir” variable.

Do I need to say, deleting this directory deletes everything?

ibdata1

If your remove this file your InnoDB DATA IS GONE and MySQL will recreate an empty file.

If you are not using the innodb_file_per_table option (default), this file holds almost ALL of your data in InnoDB tables.  This file is all but useless without its corresponding  ‘.frm’ file for each table in the right database directory. If all you have is the .frm files you can recreate the structure of your tables. (See below.)

Idbdata1 can get really big. The default size is 10MB. MySQL will automatically extended it by the default size as needed. If MySQL crashed, some of your InnoDB data may be in your transaction logs (ib_logfile.*).

By design the InnoDB file does not shrink. The safest way to shrink this file is to take a complete backup, stop MySQL, remove the ib* files, start MySQL and restore all your data.  REALLY.

DatabaseName <directory in mysql>

Each MySQL database has a directory named after it.  Each directory holds the meta data for the database. Your MyISAM data is in this directory. InnoDB tables may be here if the innodb_file_per_table variable is used. By default InnoDB tables are in the ibdata1 file (see above).

If you delete the directory your data may be gone. MyISAM data WILL be lost.  InnoDB tables may still be in the ibdata1.  If so, you will need to recreate the meta data files to recover your data.

Creating a directory is almost equivalent to ‘create database’.  If a directory exists MySQL will show you have a database by that name. The create database command may also creates a db.opt file.

<TableName>.frm

This file is key to both InnoDB and MyISAM databases. It is the meta data to the location of your data. It contains the table column definitions.

If you remove this file MySQL will tell you your DATA doesn’t exist.  It does. Your data is still in the ibdata (.ibd) file or the ibdata1 file. You need to recreate the table to recreate this file. If you don’t know the exact structure of this table your out of luck.

Stop MySQL and move the .MYD and .MYI files to another directory.  (You might also make a backup copy.) Start MySQL and recreate this table. Stop MySQL and copy the .MYD and .MYI files back to the database directory and restart MySQL.

<TableName>.MYD

THIS IS YOUR MyISAM DATA. If this is all you have, and you know the data structure of the the table, all is not lost.  (See .frm Above.)  You may also need to recreate the .MYI index file.

<TableName>.MYI

This file contains the indexs for your table.  If it becomes corrupt or is deleted you can recreate it using the ‘REPAIR TABLE table_name USE_FRM;’ command.

<TableName>.ibd

THIS IS YOUR InnoDB DATA. If this is all you have, and you know the data structure of the the table, all is not lost.  (See .frm Above.) Unlink MyISAM tables the indexes are contained in this file with your data.

MySQL doesn’t create this file unless you are using the innodb_file_per_table option.

<TableName>.CSV

THIS IS YOUR CSV DATA.  This file contains comma separated text data. These file do not have indexes.

<TableName>.CSM

This file contains meta data for CSV and archive tables.  I have not found what is stored here. I do know it tells MySQL if you are logging to the general logs.

ib_logfile*

This file contains your un-committed transactions data. MySQL uses it to recover from a crash.

If you shut down InnoDB cleanly, you can remove them. MySQL will recreate them.

If you change the size of innodb_log_file_size, you will need to recreate these files by stopping MySQL cleanly and deleting them.

mysql-bin.*

This “Bin Log” files contain any change made to any database. Each transaction is assigned a MASTER_LOG_POS(ion).  These files are not created by default. They are used for replication and point-in-time recovery.

You can stop the server and remove these files IF you remove the mysql-bin.index file as well.  MySQL creates a new bin log file each time it starts or the logs are flushed.  Deleting these files will the server is running will break replication.

mysql-bin.index

This fail is used by MySQL to keep a Bin Log list.  It is a simple text file like;

./mysql-bin.000001

./mysql-bin.000002

./mysql-bin.000003

If you remove this file, MySQL will recreate it with only the newest bin log name. If you need to remove old bin logs use the command “purge binary logs [to mysql-bin.######] [before “yyyy-mm-dd”]”.

You can control the number of bin logs using the expire_logs_days variable.

mysqld.log

This is MySQL’s primary administration log. MySQL reports starts and stops as well as some warning and errors in this file.  If MySQL crashes, mysqld_safe will restart it. This log will report this.

You can delete this file if needed.

slow.log

The slow query log consists of all SQL statements that took more than long_query_time seconds to execute and (as of MySQL 5.1.21) required at least min_examined_row_limit rows to be examined.

You can delete this file if needed.

db.opt

Database characteristics, like the CHARACTER SET clause are stored in the db.opt file. You may have strange query results if this file is missing. MySQL will use it’s default. You can recreate this file my altering the table with the correct settings.

mysql.pid

The PID file hold the process ID number for the running server. MySQL creates this file and scripts that start and stop MySQL use it to control MySQL.

MySQL will remove this file when is stops. You should not delete it if MySQL is running. If mysql is NOT running and the file exists MySQL may have crashed and you should delete this file.

References:

FULL DISK
http://dev.mysql.com/doc/refman/5.1/en/full-disk.html

MySQL Database Backup .MYI and .MYD
http://www.aeonity.com/frost/mysql-database-backup-myi-myd

Recovering from Crashes
http://dev.mysql.com/tech-resources/articles/recovering-from-crashes.html

Tweet


PlanetMySQL Voting: Vote UP / Vote DOWN

Excluding databases from mysqldump

Декабрь 6th, 2010
A question that came up during the MySQL track at the UKOUG conference in Birmingham was "Can I exclude only a few databases from mysqldump? Let's say that I have 50 databases, and I want to dump all of them, except a few."
As many know, mysqldump has an option to ignore specific tables. SO if you have 1,000 tables in a databases, you can tell mysqldump to dump all the tables except a few ones.
There is no corresponding option to exclude one or more databases.
However, if you know your command line tools, the solution is easy:
First, we get the list of all databases:
mysql -B -N -e 'show databases'
information_schema
employees
five
four
mysql
one
performance_schema
six
test
three
two
-B forces batch mode (no dashes box around the data), while -N gets the result without the headers.
Now, let's say that we want to exclude databases four, five, and six. And since we want to avoid unpleasant side effects, also information_schema and performance_schema.
Thus, we pipe the previous data through a filter. I use Perl, but sed or grep could get the job done.
mysql -B -N -e 'show databases' | \
perl -ne 'print unless /four|five|six|_schema/'
employees
mysql
one
test
three
two
Now that we have the list of databases that we need, we can tell mysqldump to backup the databases from such list. All we need is converting the vertical list into a horizontal one using xargs

mysql -B -N -e 'show databases' | \
perl -ne 'print unless /four|five|six|_schema/' \
xargs echo mysqldump -B
mysqldump -B employees mysql one test three two
That's it. The last line is the resulting command. Once you are sure that it is what you want, remove the "echo" after xargs, and the command will be executed.

PlanetMySQL Voting: Vote UP / Vote DOWN

Lost innodb tables, xfs and binary grep

Ноябрь 9th, 2010

Before I start a story about the data recovery case I worked on yesterday, here’s a quick tip – having a database backup does not mean you can restore from it. Always verify your backup can be used to restore the database! If not automatically, do this manually, at least once a month. No, seriously – in most of the recovery cases I worked on, customers did have some sort of backup, but it just wasn’t working, complete and what not. Someone set it up and never bothered to check if it still works after a while.

Anyway, this post is not really about the backups but rather about few interesting things I learned during last recovery case.

First, some facts about the system and how data was lost:

  • MySQL had a dedicated partition on XFS file system
  • Server was running innodb_file_per_table
  • There was a production master and two slaves, all had same setting
  • Developer accidentally ran DROP DATABASE X on the wrong machine (production master)
  • All slaves followed and dropped their copy of the data
  • The important tables were all InnoDB
  • Having a backup, customer has first attempted to restore from backup on the production master

Luckily (or rather, unfortunately) backup only had table definitions but not the data so no data was written to file system. Mind however that restoring a backup could have been fatal if it would have written some junk data as that would have overwritten the deleted files. Now, here’s what I learned while working on this case:

Recovering from XFS is possible. Just a month ago we had a team meeting in Mallorca where we went through various data loss scenarios. One of them was deleted files on xfs – we all agreed on few things:

  • recovering files from xfs is hard, if at all possible
  • we had no recovery cases on xfs, most likely because:
  • whoever is using xfs, is smart enough to have backups set up properly

Now I’m not picking on the customer or anything – indeed they did have a backup set up, it’s just that some (most important) tables weren’t backed up. We did not try any of the file recovery tools for xfs – apparently they are all targeting specific file types and sure enough InnoDB is not one of the supported files. What we did is we simply ran page_parser on the (already) unmounted file system treating it as a raw device. I was surprised how amazingly simple and fast it was (did you know that latest version of page_parser identifies pages by infimum and supremum records?) – 10G partition was scanned in like 5 minutes and all 4G of innodb pages were successfully written to a separate partition. That’s the easy part though – you run page parser, wait and see what you get.

If InnoDB Data Dictionary was not overwritten by an attempt to restore from the backup, actually second part would’ve been quite easy too, but it was so I could no longer identify correct PK id for specific tables by just mapping data dictionary table records to index records. Instead I had to grep for specific character sequences against all pages. Note however that only works for text in uncompressed text columns (varchar, char, text) but what if tables don’t have any text columns at all? Then, you read further.

GNU grep won’t match binary strings. This isn’t new, I kind of knew grep couldn’t look for binary “junk”, but I really needed it to. Why? Well, here’s few of the scenarios we’ve gone through yesterday:

1. There was this rather big table with integer and enum columns only, where we knew a rather unique PK, well something like 837492636 so we needed a way to find pages that match it. InnoDB would internally store integers in 4-bytes rather than 10 bytes if it were stored as a sequence of characters, so “grep -r 837492636 /dir” would not have worked.

2. There was another table, a small one with 4 smallint columns where all we could match on was a sequence of numbers from a single record – customer knew that there was at least one row with the following sequence: 7, 3, 7, 8. Matching by any of the numbers would be insane as it would match all of the pages while matching on numbers as a sequence of characters would not work for many reasons.

This is where I found bgrep which was exactly the tool for the task. In the case number one, I have just converted number 837492636 to it’s binary representation 0×31EB1F9C and ran “bgrep 31EB1F9C /dir” – there were only like 10 other matches across the 4 gigabytes of pages, some of them probably from the secondary pages, but when you only have that many pages to check it’s really simple.

Second case seemed somewhat complicated, but it really wasn’t – all of the columns were fixed size – 2bytes each, so the thing we had to look for was this sequence: 0007000300070008. I was expecting a lot of mismatches but in fact I ended up with only one match pointing exactly to the right page and so the right index id.

The other thing I would note about bgrep – it was so much faster than matching text using grep, so if you happen to have a lot of data to scan and you have to choose between matching text and number, matching a number using bgrep may work much better.

We are considering shipping bgrep as part of percona recovery toolset, with some additional converters so we can match against various date/time columns as well.


Entry posted by Aurimas Mikalauskas | 2 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

An argument for not using mysqldump

Ноябрь 8th, 2010

I have a 5G mysqldump which takes 30 minutes to restore from backup.  That means that when the database reaches 50G, it should take 30×10=5 hours to restore.  Right?  Wrong.

Mysqldump recovery time is not linear.  Bigger tables, or tables with more indexes will always take more time to restore.

If I restore from a raw backup (LVM snapshot, xtrabackup, innodb hot backup), it is very easy to model how much longer recovery time will take:

Backup is 80G
Copy is at 70MB/s.
10G is already complete.
= ((80-10) * 1024)/70/60 = ~17 minutes

I can tell progress with mysqldump by monitoring the rate at which show global status like 'Handler_write'; increases and compare it to my knowledge of about how many rows are in each table.  But progress != a magic number like “17 minutes”.  Not unless I do a lot of complex modeling.

I am not saying a 5 hour recovery is good or bad.  What I am saying is knowing remaining time is very important during disaster recovery.  Being able to say “we’ll be back at 2PM” is much better than saying “we’ll be back between 1PM and 4PM.. maybe”.


Entry posted by Morgan Tocker | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Sharing an auto_increment value across multiple MySQL tables

Октябрь 5th, 2010

The title is SEO bait – you can’t do it. We’ve seen a few recurring patterns trying to achieve similar – and I thought I would share with you my favorite two:

Option #1: Use a table to insert into, and grab the insert_id:

CREATE TABLE option1 (id int not null primary key auto_increment) engine=innodb;

# each insert does one operations to get the value:
INSERT INTO option1 VALUES (NULL);
# $connection->insert_id();

Option #2: Use a table with one just row:

CREATE TABLE option2 (id int not null primary key) engine=innodb;
INSERT INTO option2 VALUES (1); # start from 1

# each insert does two operations to get the value:
UPDATE option2 SET id=@id:=id+1;
SELECT @id;

So which is better? I don’t think it’s that easy to tell at a first glance, since option 2 does look more elegant – but if the next value is fetched as part of a transaction – I can see a potential for many other transactions to back up waiting on a lock (more on that in a second).

To start with a naive test, I booted two EC2 small instances in the same availability zone. Ping times are ~0.5ms between nodes. –skip-name resolve is enabled on the server. There is some skew from the machine being virtualized. My simulation is:

  1. Create a new sequence value
  2. insert into another table

The testing options are:

  • Option 1 – auto_commit each statement
  • Option 2 – auto_commit each statement
  • Option 1 – nest each loop in a transaction
  • Option 2 – nest each loop in a transaction

The raw results are:

option1 usetransactions =19 seconds for x10000 iterations.
option1 ignoretransactions = 13 seconds for x10000 iterations.
option2 usetransactions = 27 seconds for x10000 iterations.
option2 ignoretransactions =22 seconds for x10000 iterations.

Alright – option1 seems quicker. The problem is that to be like most applications, we can’t really tell until a little concurrency is applied. Using only the “transactional” test in a few more concurrency options:


(results are transactions/second – more is better).

Option #1 starts to flatten out after 4 threads – and this is probably just because the machine I am running it against has 1 CPU. Option #2 stays flat the whole time.. and while we are running it, most of the threads are perpetually in a state of ‘Searching rows for update’ – which is what I suspect is better described as waiting on a lock.

Option #2 will likely scale better in auto_commit, since locks are held for the duration of a transaction, but this is not always possible to do if you have already started modifying data before you need an auto_increment number but you do not want to commit yet.

Full disclosure


Entry posted by Morgan Tocker | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Book review : SQL Antipatterns

Сентябрь 23rd, 2010

SQL Antipatterns, by Bill Karwin


I remember that when I finished reading The Lord Of The Rings, I felt a pang of disappointment. "What? Already finished? What am I going to read now? What can give me the same pleasure and sense of accomplishment that these wonderful pages have given me?"
That's how I felt when I came to the last page of SQL Antipatterns. And, no, Bill Karwin doesn't tell imaginary tales from a fictitious world. This book is full of very real and very practical advice, but all the material is presented with such grace and verve that I could not put it down until the very end. I read it cover to cover in just a few hours, and I savored every page.

What is this Antipatterns, anyway? The title may deceive a casual bookshop browser into believing that it's about some philosophical database theory. Digging further, you realize that it's a collection of best practice in database handling, but told from the side of the bad practice. Every chapter starts with the story of a disaster, followed by an explanation of why it happened, and how to recognize the symptoms. Along the way, you learn what people do to fix the problem when they don't understand where exactly the trouble comes from. And then comes the best practice section, where you are told of several ways of fixing the problem for good.
It's a pleasure from the start. When I read the introductory stories, I recognized mistakes that I have witnessed in my past experience and some that I made myself. It was a fascinating sequence of "look at this mess!", "Here is what you shouldn't do", "It happens when ...", and "this is what you should do".
The more I read, the more I wanted to read. And after a few chapters, a pattern appeared among the Antipatterns. The stories are not a casual collection of database mistakes. There is a thread. Most of the stories tell you what happens when you violate the relational theory principles, and the best practice solutions focus on how to apply relational principles in real life.
The relationship between the horror stories and the relational database theory becomes apparent when you get to the appendix. In other books, you can often skip the appendixes as a reminder of something that you know already. Not here. If you skip this one, you will miss a sort of Ellery Queen ending, where the detective puts together all the pieces of the mystery for the amazed audience.

I feel I would do a disservice to Bill if I revealed more about the book. I may have already spoiled some of the surprise by disclosing the inner structure of the book. But certainly I won't give away any of the juicy stories that make this book an invaluable tool for every database professional.
I can only say this. When I read a technical book, I usually find something wrong, or debatable, or some technical solution that I could improve upon. Not in this book. I completely agree with the theory and the practice that is suggested from start to finish. My QA ego suffers at the idea of not having found anything wrong, but this slight disappointment is more than compensated by the pleasure of reading such a beautiful book. Well done, Bill! I am already queuing to buy your next one!

PlanetMySQL Voting: Vote UP / Vote DOWN