Archive for the ‘snapshot’ Category

zfs FileSystem and MySQL

Август 17th, 2011



ZFS is a new kind of 128-bit file system that provides simple administration, transactional semantics, end-to-end data integrity, and immense scalability. ZFS is not an incremental improvement to existing technology; it is a fundamentally new approach to data management. ZFS was first introduced in Solaris in 2004 and it is a default filesystem in OpenSolaris, but Linux ports are underway, Apple is shipping it in OS X 10.5 Leopard with limited zfs capability ( Apple shutdown this project afterward due to some known reason), and it will be included in FreeBSD 7.

ZFS Features:
  • Pooled Storage Model
  • Always consistent on disk
  • Protection from data corruption
  • Live data scrubbing
  • Instantaneous snapshots and clones
  • Portable snapshot streams
  • Highly scalable
  • Built in compression
  • Simplified administration model

Pooled Storage Model: ZFS presents a pooled storage model that completely eliminates the concept of volumes and the associated problems of partitions, provisioning, wasted bandwidth and stranded storage. Thousands of file systems can draw from a common storage pool, each one consuming only as much space as it actually needs. The combined I/O bandwidth of all devices in the pool is available to all file systems at all times.





Always consistent on disk: All operations are copy-on-write transactions, so the on-disk state is always valid. Every block is checksummed to prevent silent data corruption, and the data is self-healing in replicated (mirrored or RAID) configurations. If one copy is damaged, ZFS detects it and uses another copy to repair it.

Protection from data corruption: ZFS introduces a new data replication model called RAID-Z. It is similar to RAID-5 but uses variable stripe width to eliminate the RAID-5 write hole (stripe corruption due to loss of power between data and parity updates). All RAID-Z writes are full-stripe writes. There's no read-modify-write tax, no write hole, and — the best part — no need for NVRAM in hardware. ZFS loves cheap disks.

Live data scrubbing: But cheap disks can fail, so ZFS provides disk scrubbing. Similar to ECC memory scrubbing, all data is read to detect latent errors while they're still correctable. A scrub traverses the entire storage pool to read every data block, validates it against its 256-bit checksum, and repairs it if necessary. All this happens while the storage pool is live and in use.
ZFS has a pipelined I/O engine, similar in concept to CPU pipelines. The pipeline operates on I/O dependency graphs and provides scoreboarding, priority, deadline scheduling, out-of-order issue and I/O aggregation. I/O loads that bring other file systems to their knees are handled with ease by the ZFS I/O pipeline.

Instantaneous snapshots and clones (Most important and useful for huge backups in seconds): ZFS provides 2 64 constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a file system, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients.

Portable snapshot streams (Important & useful feature): You snapshot a ZFS file system, but you can also create incremental snapshots. Incremental snapshots are so efficient that they can be used for remote replication, such as transmitting an incremental update every 10 seconds.

Highly scalable (Important  useful feature): There are no arbitrary limits in ZFS. You can have as many files as you want: full 64-bit file offsets, unlimited links, directory entries, and so on.

Built in compression: ZFS provides built-in compression. In addition to reducing space usage by 2-3x, compression also reduces the amount of I/O by 2-3x. For this reason, enabling compression actually makes some workloads go faster.
In addition to file systems, ZFS storage pools can provide volumes for applications that need raw-device semantics. ZFS volumes can be used as swap devices, for example. And if you enable compression on a swap volume, you now have compressed virtual memory.

Simplified administration model: ZFS administration is both simple and powerful. zpool and zfs are the only two command you need to know. Please see the zpool(1M) and zfs(1M) man pages for more information.
The storage pool is a key abstraction: a pool can consist of many physical devices, and can hold many filesystems. Whenever you add storage to the pool, it becomes available to any filesystem that may need it. To take a newly-attached disk and use the whole disk for ZFS storage, you would use the command.

# zpool create zpool1 c2t0d0

Here, zpool1 represents the name of a pool, and c2t0d0 is a disk device.

If you have a disk had already been formatted – say, with a UFS filesystem on one partition – you can create a storage pool from another free partition:
# zpool create zpool1 c2t0d0s2 

You can even use a plain file for storage:
# zpool create zpool1 ~/storage/myzfile

Once you have a storage pool, you can build filesystems on it:
# zfs create zpool1/data # zfs create zpool1/logs 

Later on, if you run out of space, just add another device to the pool, and the filesystem will grow.
# zpool add zp1 c3t0d0

ZFS and Tablespaces:



innodb_data_file_path = /dbzpool/data/ibdatafile:20G:autoextend

Here is the only innodb_data_file_path that any ZFS system might ever need. You can split this over as many drives as you want, and ZFS will balance the load intelligently. You can stripe it, mirror it, add space when you need room to grow, bring spare disks online, and take faulted disks offline, without ever restarting the database.




PlanetMySQL Voting: Vote UP / Vote DOWN

On LVM: How to setup Volume Groups and Logical Volumes.

Май 28th, 2011
LVM (Logical Volume Management) is a very important tool to have in the toolkit of a MySQL DBA. It allows you to create and extend logical volumes on the fly. This allows me to, say, add another disk and extend a partition effortlessly. The other very important feature is the ability to take snapshots, that you can then use for backups. All in all its a must have tool. Hence, this guide will allow you to understand various terminologies associated with LVM, together with setting up LVM volumes and in a later part will also show you how to extend...
PlanetMySQL Voting: Vote UP / Vote DOWN

Using LVM snapshot filesystems for development database instances

Август 22nd, 2010

The Problem

Developers often need to have a development database copy of the live production system you are using in able to allow them to test their code and to test new functionality and make schema changes to the database for this new functionality to work.

That’s normal and happens everywhere. A typical DBA task is to make a copy of the live system, sometimes to remove any confidential or sensitive information which perhaps the development database users should not be able to see, and then give them access to this development instance. The developers then “hack away”, changing their code and perhaps things in the database until they are ready to put these new changes into production when they then come along and discuss how to apply these changes into the live systems.

Once the development database has been created it soon becomes stale so often the developers want a new up to date copy to be made to “simplify” their testing.

This is all fine until the database size begins to grow and this process of dumping and copying the data for the developers takes hours rather than minutes and therefore can only be done on a daily or weekly basis.

I have recently been experimenting with the use of mounting the development database instance on an  LVM snapshot of the original filesystem where the live system is running.  This procedure does not have to be Linux specific but should work with any OS or storage which provides a facility to make a filesystem snapshot based on the contents of another filesystem.

So what does this mean in practice?

Previous Behaviour:

Situation: server1 has a live production copy of the database. We want to make a copy to server2. server2 is already prepared with a configuration which will work based on the copy of the production data.

Procedure: stop server1, copy the filesystem holding the database to server2, start server1. start server2. [the copy procedure takes hours.]

New Behaviour:

Situation: server1 has a live production copy of the database (probably via a slave), and space/memory for a second development instance to run concurrently on the same server.

Procedure: stop server1 (live instance), make a LVM snapshot of the live filesystem (the snapshot size can be much smaller than the live filesystem size), start server1 (live instance), start server1 (development instance). [the copy procedure takes just a few seconds.]

Since I do this with the live system being a slave, I tend to also include a routine to disable replication information on the snapshot filesystem by removing the appropriate files. It may also be necessary adjust the grants on the dev-instance so that it is appropriate for the new set of db users.

To all intents and purposes when you login to the development instance it looks like an up to date copy of live system. You can make as many changes as you like as long as the number of disk blocks on the snapshot which get changed don’t exceed the snapshot size. At this point the snapshot filesystem becomes invalid and mysqld is unable to access it. Mysqld gets a bit upset about this, but you just kill it and then go and rebuild the instance again if this happens – it only takes a few seconds.

This works pretty well and speeds things up for the developers. I can create a new development environment from the live system in seconds rather than hours. The disk storage requirements also tend to drop significantly. It also helps the devs. If you do a daily refresh of this development instance then it allows the developers to test any schema changes which will be needed to be applied to the live system much more easily as “going back to the current live state” is so easy.

So if you haven’t done something like this it might be worth giving it a go.

This is an example of the output from a script I’m currently using:


[root@myhost ~]# clone_instance -s40G -d live-instance dev-instance
Aug 22 10:57:00 myhost clone_instance[16405] Cloning live-instance to dev-instance with a snapshot volume of size 40G
Aug 22 10:57:00 myhost clone_instance[16405] /mysql/live-instance is mounted as expected
Aug 22 10:57:00 myhost clone_instance[16405] Device /dev/volgroup1/live-instance is mounted on /mysql/live-instance, having volume group: volgroup1, logical volume: live-instance
Aug 22 10:57:00 myhost clone_instance[16405] live-instance is defined in /etc/my.cnf [mysqld1]
Aug 22 10:57:00 myhost clone_instance[16405] dev-instance is defined in /etc/my.cnf [mysqld2]
Aug 22 10:57:00 myhost clone_instance[16405] Found defaults file /root/.my-live.cnf needed to shutdown live-instance
Aug 22 10:57:00 myhost clone_instance[16405] Going to viciously kill any processes using files under mount point: /mysql/dev-instance
Aug 22 10:57:02 myhost clone_instance[16405] Unmounting /mysql/dev-instance
Aug 22 10:57:02 myhost clone_instance[16405] Removing existing SNAPSHOT LV /dev/volgroup1/dev-instance
Logical volume "dev-instance" successfully removed
Aug 22 10:57:03 myhost clone_instance[16405] SNAPSHOT LV /dev/volgroup1/dev-instance removed
Aug 22 10:57:03 myhost clone_instance[16405] Shutting down live-instance [mysqld1] using mysqldmin and defaults file /root/.my-live.cnf (as 'mysqld_multi stop 1' does not seem work properly)
Aug 22 11:02:34 myhost clone_instance[16405] Creating new snapshot LV dev-instance (40G) based on /dev/volgroup1/live-instance
Logical volume "dev-instance" created
Aug 22 11:02:35 myhost clone_instance[16405] Restarting live-instance [mysqld1] using mysqld_multi start 1
Aug 22 11:02:35 myhost clone_instance[16405] Mounting SNAPSHOT LV /dev/volgroup1/dev-instance on /mysql/dev-instance
Aug 22 11:02:35 myhost clone_instance[16405] Cleaning up log files on SNAPSHOT LV
Aug 22 11:02:35 myhost clone_instance[16405] Removing replication information from SNAPSHOT LV
Aug 22 11:02:35 myhost clone_instance[16405] Found defaults file /root/.my-dev.cnf needed to access dev-instance
Aug 22 11:02:35 myhost clone_instance[16405] Starting SNAPSHOT DB instance dev-instance [mysqld2] using: 'mysqld_multi start 2'
Aug 22 11:02:35 myhost clone_instance[16405] Clone procedure complete.
[root@myhost ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
...
/dev/mapper/volgroup1-live--instance
160G  135G   26G  84% /mysql/live-instance
/dev/mapper/volgroup1-dev--instance
160G  134G   27G  84% /mysql/dev-instance
[root@myhost ~]# lvs
LV             VG        Attr   LSize   Origin         Snap%  Move Log Copy%  Convert
live-instance  volgroup1 owi-ao 160.00G
dev-instance   volgroup1 swi-ao  40.00G live-instance    2.06
[root@myhost ~]#

The longest part of the procedure is shutting down the active live slave. The rest of the time is insignificant.
The “dev experience” when using this dev-instance is just as before. The instance works, can be modified and behaves just as you would expect.


PlanetMySQL Voting: Vote UP / Vote DOWN

Backup MySQL in a Second with ZFS

Сентябрь 29th, 2009

MySQL backup soon becomes an important matter when the database is used in production. The pain-point comes from the fact that while backuping the database is not available to respond to client requests anymore. With mysqldump - the standard tool for performing MySQL backups - and a large database the operation can go over many tenth of minutes if not hours. If I am running my business on line this is simply not acceptable.

The classical approach to workaround this problem is to take advantage of MySQL replication. I set up a master/slave configuration where the slave acts as copy of the master. Then, when needed, I run mysqldump on the slave without any service interruption on the master.

But ZFS snapshosts bring a new straightforward approach that avoids the pain and the complexity of a master/slave replication.

Snapshots are a key feature of ZFS that allows me to save a copy of any ZFS file system in less than a second. Yesterday, Sept 27th I first created a myDB ZFS filesystem for my database, and since I decided to backup my database everyday I also created a first snapshot. I am using InnoDB, so I made sure that both the data and the logs are located in the myDB filesystem. Today, Sept 28th, I created a second snapshot. Here are the commands:

Yesterday, September 27th


# zfs create pool/myDB
# zfs snapshot pool/myDB@Sept27.2009

Today, September 28th


# zfs snapshot pool/myDB@Sept28.2009

Note that as long as the dataset does not change, the snapshot does not use any disk space, so since there was no change to the database when I created the first snapshot zfs list reporst me a null USED space :


# zfs list -t snapshot
NAME                                     USED AVAIL REFER MOUNTPOINT
pool/myDB@Sept27.2009   0         -          667M   

The commands are simple, but what about the performance? If I run the commands just as is, the filesystem that I save takes 667M on disk. On my labtop I have much more data in my home directory. Let's see how long it takes me to snapshot this data.


# zfs list rpool/export/home/tmanfe
NAME                                      USED  AVAIL  REFER  MOUNTPOINT
rpool/export/home/tmanfe   47.8G   22.5G     28.1G     /export/home/tmanfe

# time  zfs snapshot rpool/export/home/tmanfe@28.2009
real    0m0.370s
user   0m0.004s
sys     0m0.009s


I am able to snapshot 47.9GBytes of data in 0.37seconds. Good enough. As long as I can stay bellow a second, this will certainly go unnoticed by the users. By the way, should my dataset increase, the snapshot time will stay the same.

Now, what do I need to do with MySQL?

For my data to be properly and consistently saved, I need to make sure that it gets written to disk. To do this, MySQL provides me with the command FLUSH and its TABLES WITH READ LOCK option. About this option the documentation says: "Closes all open tables and locks all tables for all databases with a read lock until you explicitly release the lock by executing UNLOCK TABLES. This is very convenient way to get backups if you have a file system such as Veritas that can take snapshots in time". Note that the documentation should be modified to add ZFS as another filesystem that supports snapshots.

Since I am using InnoDB I also need to turn autocommit off before running the FLUSH command otherwise every MySQL statement - such as FLUSH TABLES WITH READ LOCK - is wrapped into a transaction and at the end of each transaction all the table locks are released. In other words, my attempt to lock tables would be self-canceling, would have no effect. No such thing is required with MyISAM.

Then, I create the ZFS snapshot and I release the tables with the command UNLOCK TABLES. I am done with MySQL!

The snapshot I created is located on the same disks and server than my original data. To put a real backup system in place I need to transfer the snapshot to another backup server or storage system. Once again ZFS provides me with all I need. The zfs send command enables me to send a snapshot somewhere else. The only prerequisite is that ZFS must be available on my  backup server in order to take advantage of the snapshot that it receives.

Here is the exact syntax:


# zfs send -Ri pool/myDB@Sept27.2009 pool/myDB@Sept28.2009 | ssh backupHost “zfs receive -Fd pool”


With the -i option, only the difference between my two snapshots is sent to a system named backupHost. This option enables me to do incremental backups. Ahead of incremental backups I need to send a first complete snapshot to the backup host. I did it yesterday, when creating my very first  snapshot:


# zfs send -R pool/myDB@Sept27.2009 | ssh backupHost “zfs receive -Fd pool”


Note that the stream generated by zfs send replicates the filesystem, not the snapshot.

The stream is then funneled into an ssh command that connects to the backup host. Once connected, the zfs receive command is executed that takes the content of the stream - i.e. the filesytem - and copies it into a zfs pool that is local to the backup host.

I am done.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Labs : server snapshots available for download

Август 18th, 2009

Users familiar with the MySQL development process will remember that our developers use a tool called pushbuild, which builds the server code with the latest changes, using several operating systems, and runs the test suite.

MySQL Labs

This tool produces one binary package for each platform where the test runs, and every day there are a few dozen of such packages, waiting to be deleted and replaced by the ones created with the next build.

For long time, several people suggested publishing these binaries for the community. Each time, there was some minor or major impediment, such as getting together different teams and requesting resources from a third one. But the community team persisted and kept banging at the door until, at last, we can announce that the binaries are available for download at labs.mysql.com.

These binaries are temporary. You will find online for a few days, and they will eventually be replaced by newer ones. The exact dates of updates are not known. We build every day, but not all the builds are successful. Builds that fail to compile don't produce any binary, and builds that fail the test suite are not exported to the snapshot pages either.

But the binaries will be available fairly often, allowing the community to do early testing on bug fixes or new features.

A word of caution. These binaries have passed a test suite, but they have not been tested extensively like the monthly releases. Therefore, these binaries are for testing only, not for production. To make things clear, the package names include the word "snapshot" and the build date. And the files are not in the same page as the official releases.

Currently (2009-08-18) there are two branches: the latest 5.1 builds and the 5.1 GIS. For each branch, we plan to keep online at least the latest two builds before purging the oldest ones, but we may increase this number if we see that storage is not a problem.

Further releases will be introduced next month.

Notice that, unlike the normal releases, only a few platforms are included, and there are no dedicated packages (.dmg, .pkg, .deb, .rpm) but only tarballs (tar.gz, or zip for Windows). These packages are supposed to be used for testing only. In every Unix system, you can install them using MySQL Sandbox or, if it suits you, a manual installation.

I would like to thank the web and build teams who have worked together to provide this feature, and especially Daniel Fischer and Markus Popp, who delivered the ultimate goods.

Your feedback is valuable. Please let us know what you did with the snapshots.


PlanetMySQL Voting: Vote UP / Vote DOWN

Testing the InnoDB plugin with MySQL snapshots

Август 18th, 2009

MySQL plugins

The cat is out of the bag.
MySQL 5.1 will include the InnoDB plugin, and thanks to
labs.mysql.com
you can try the new version right away.
Here is a step-by-step guide to testing the InnoDB plugin with MySQL snapshot 5.1.39 and MySQL Sandbox.

1. Install MySQL::Sandbox
This is a straightforward part. Please refer to the manual for the details.

2. get the binaries
Check the list of available binaries and download the one that matches your architecture and operating system.

3. Install the sandbox
Since we want to use the InnoDB plugin, we need to start the Sandbox with the builtin innodb engine disabled.
make_sandbox \
/path/to/mysql-5.1.39-snapshot20090812-osx10.5-i386.tar.gz \
-c ignore-builtin-innodb
The option passed with "-c" will be written to the options file.
Make sure that the sandbox is installed and the server starts. If it doesn't, check the error log at $HOME/sandboxes/msb_5_1_39/data/msandbox.err and try to figure out what happened.

4. Check the available engines
~/sandboxes/msb_5_1_39/use
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.39-snapshot20090812 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

select engine, support from information_schema.engines;
+------------+---------+
| engine | support |
+------------+---------+
| MyISAM | DEFAULT |
| MRG_MYISAM | YES |
| BLACKHOLE | YES |
| CSV | YES |
| MEMORY | YES |
| FEDERATED | NO |
| ARCHIVE | YES |
+------------+---------+
As you can see, InnoDB is not in the list.

5. Install the innodb plugin
install plugin innodb soname 'ha_innodb_plugin.so';
Query OK, 0 rows affected (0.85 sec)

select @@innodb_version;
+------------------+
| @@innodb_version |
+------------------+
| 1.0.4 |
+------------------+

6. Install the additional INFORMATION SCHEMA tables
INSTALL PLUGIN INNODB_TRX SONAME 'ha_innodb_plugin.so';
Query OK, 0 rows affected (0.00 sec)

INSTALL PLUGIN INNODB_LOCKS SONAME 'ha_innodb_plugin.so';
Query OK, 0 rows affected (0.00 sec)

INSTALL PLUGIN INNODB_LOCK_WAITS SONAME 'ha_innodb_plugin.so';
Query OK, 0 rows affected (0.00 sec)

INSTALL PLUGIN INNODB_CMP SONAME 'ha_innodb_plugin.so';
Query OK, 0 rows affected (0.00 sec)

INSTALL PLUGIN INNODB_CMP_RESET SONAME 'ha_innodb_plugin.so';
Query OK, 0 rows affected (0.00 sec)

INSTALL PLUGIN INNODB_CMPMEM SONAME 'ha_innodb_plugin.so';
Query OK, 0 rows affected (0.00 sec)

INSTALL PLUGIN INNODB_CMPMEM_RESET SONAME 'ha_innodb_plugin.so';
Query OK, 0 rows affected (0.00 sec)

6. Finally, check the results
select plugin_name, plugin_type, plugin_status from information_schema.plugins;
+---------------------+--------------------+---------------+
| plugin_name | plugin_type | plugin_status |
+---------------------+--------------------+---------------+
| binlog | STORAGE ENGINE | ACTIVE |
| partition | STORAGE ENGINE | ACTIVE |
| ARCHIVE | STORAGE ENGINE | ACTIVE |
| BLACKHOLE | STORAGE ENGINE | ACTIVE |
| CSV | STORAGE ENGINE | ACTIVE |
| FEDERATED | STORAGE ENGINE | DISABLED |
| MEMORY | STORAGE ENGINE | ACTIVE |
| MyISAM | STORAGE ENGINE | ACTIVE |
| MRG_MYISAM | STORAGE ENGINE | ACTIVE |
| InnoDB | STORAGE ENGINE | ACTIVE |
| INNODB_TRX | INFORMATION SCHEMA | ACTIVE |
| INNODB_LOCKS | INFORMATION SCHEMA | ACTIVE |
| INNODB_LOCK_WAITS | INFORMATION SCHEMA | ACTIVE |
| INNODB_CMP | INFORMATION SCHEMA | ACTIVE |
| INNODB_CMP_RESET | INFORMATION SCHEMA | ACTIVE |
| INNODB_CMPMEM | INFORMATION SCHEMA | ACTIVE |
| INNODB_CMPMEM_RESET | INFORMATION SCHEMA | ACTIVE |
+---------------------+--------------------+---------------+
Now you can read the InnoDB plugin manual and have as much fun as you can.

PlanetMySQL Voting: Vote UP / Vote DOWN