Archive for the ‘haildb’ Category

Optimizing InnoDB for creating 30,000 tables (and nothing else)

Декабрь 23rd, 2011

Once upon a time, it would have been considered madness to even attempt to create 30,000 tables in InnoDB. That time is now a memory. We have customers with a lot more tables than a mere 30,000. There have historically been no tests for anything near this many tables in the MySQL test suite.

So, in fleshing out the test cases for this and innodb_dict_size_limit I was left with the not so awesome task of making the test case run in remotely reasonable time. The test case itself is pretty simple, a simple loop in the not at all exciting mysqltest language that will create 30,000 identical tables, insert a row into each of them and then drop them.

Establishing the ground rules: I do not care about durability. This is a test case, not a production system holding important data which means I can lie, cheat and steal to get performance.

The simplest way is to use libeatmydata. This is a small shared library designed to be LD_PRELOADed that disables just about every way an application can write data safely to disk. This is perfect for running a test suite for a database server as if your machine crashes halfway through a test run, you’ll just start the test run again – you are not dealing with any important data. Obviously, you should never, ever, ever use libeatmydata with anything you care about; it is called libeat-my-data for a reason.

Without libeatmydata and using the default options for the MySQL test suite, I noticed that I was only creating about 10-15 tables every second which means we’d take a very long time to create 30,000. After a bit of time, it sped up to about 20-25 per second. Of course, with the default timeout for a MySQL test (15 minutes), we quickly (well.. in 15 minutes) hit that and the test is failed for taking too long.

With libeatmydata the test takes about 77 seconds – a huge improvement.

So how could I possibly get this test to run (even with –big-test option to mysql-test-run) in reasonable time? Well… I can set some InnoDB options! I’m going to try the obvious first: innodb-flush-method, sync-frm and innodb-flush-log-at-trx-commit. There is an undocumented option for innodb-flush-method called “nosync” that is meant to not flush data to disk. Since you could hear how much syncing to disk was going on during my test run, not syncing to disk all the time would get closer to the libeatmydata performance. I also want to disable syncing of the FRM file to disk and set log flushing to happen as infrequently as possible. With these options I started to get somewhere between 25-90 CREATE TABLE per second. This gets the test execution time down to 12 minutes, so that just escapes the timeout.

I then added the options of innodb-adaptive-checkpoint=0 and flush-neighbor-pages=0 in the hope of avoiding a bunch of background flushing (which called fsync). It didn’t help.

I noticed that there was an fsync() call when extending the data file, so I tried setting a higher innodb-autoextend-increment and a larger initial size. This also did not help.

So how fast is InnoDB under all of this? Am I hitting a fundamental limitation in InnoDB?

Well…. I went and wrote a program using HailDB – which is InnoDB as a shared library that you can call using an easy to use C API.

Writing a simple test program that creates 30,000 tables in a similar InnoDB configuration as default MySQL is pretty easy (easier than writing the mysqltest language that’s for sure). After a “I’m glad this isn’t a SSD” killer amount of fsync() activity, it took a total of 14.5 minutes. Not too bad. This is less than my initial test with MySQL, probably due to not writing and syncing FRM files. If I run the same program with libeatmydata, it only takes 15-20 seconds. Clearly it’s syncing things to disk that takes all the time.

If we make the HailDB program set flush_method to nosync and flush_log_at_trx_commit=2, the execution time is only 1 minute. This is much closer to the libeatmydata time than MySQL ever got.

With HailDB you can do more than one data dictionary operation in a single transaction. So if instead of setting flush_method and flush_log_at_trx_commit I instead group the CREATE TABLE into transactions of creating 100 tables at a time, I get an execution time of 3 minutes. This is a big difference to the original 14.5 minutes.

What’s the practical applications of all of this? Not much (unless you’re writing complex test cases) but it is clear that loading large amounts of DDL could be a lot faster than it is currently (although loading the data into tables is still going to take a while too).


PlanetMySQL Voting: Vote UP / Vote DOWN

HailDB being built by default in Drizzle

Октябрь 21st, 2010

It just it trunk – if you have HailDB installed when you build Drizzle, you will now get the HailDB plugin built. You can even run Drizzle with it (remove innobase plugin, load HailDB plugin). Previously, we had problems building both due to symbol conflicts between innobase and HailDB. We’ve fixed this thanks to the linker.

So, enjoy HailDB… welll, test it and report bugs that I can fix :)


PlanetMySQL Voting: Vote UP / Vote DOWN

New APIs in HailDB

Октябрь 18th, 2010

In the current HailDB we have a couple of new API calls that you may like:

  • ib_status_get_all()
    Is very similar to ib_cfg_get_all(). This allows the library to add new status variables without applications having to know about them – because we return a list of what there are. For Drizzle, this means that the DATA_DICTIONARY.HAILDB_STATUS table will automatically have any new status variables we add to HailDB without a single extra line of code having to be written.
  • ib_set_panic_handler()
    Having a shared library call exit() is generally considered impolite. Previously, if HailDB hit corruption (or some other nasty conditions), it could call exit() and you’d never get a chance to display a sensible error message to your user (especially bad in a GUI app where the printed to console error message would be unseen). This call allows an application to specify a callback in the case of HailDB entering such a condition. We’ll still be unable to continue (and we strongly advise that you do in fact exit the process in your callback) but you’re at least now able to (for example) pop up a dialog box saying sorry.
  • ib_trx_set_client_data()
    This call lets you associate a void* with a transaction. HailDB keeps this pointer in its transaction data structure and in some callbacks (e.g. ib_set_trx_is_interrupted_handler(), see below) will pass this pointer back to you for you to use to help make a decision. In InnoDB in MySQL, this is the THD. In Drizzle, it’s the Session.
  • ib_set_trx_is_interrupted_handler()
    In various wait conditions (e.g. waiting for a row lock), HailDB will call the callback you set with this function with the client data (set with ib_trx_set_client_data()) to work out if the transaction has been cancelled. This enables an application to implement something like the MySQL/Drizzle KILL command to cancel a transaction in another thread.
  • ib_get_duplicate_key()
    If you just got a duplicate key error, this function will tell you what key it was. This allows you to implement a nicer error message.
  • ib_get_table_statistics()
    This function gives you access to some basic table statistics that HailDB maintains. This includes an approximate row count, clustered index size, total of secondary indexes as well as a “modified counter” which can give you a rough idea about how out of date these statistics are.

All of these are new to HailDB (and weren’t available in embedded_innodb), many in the new 2.3 development release. You can see usage examples both in the HailDB test suite and (for most of them) in the Drizzle HailDB Storage Engine.


PlanetMySQL Voting: Vote UP / Vote DOWN

Second Drizzle Beta (and InnoDB update)

Октябрь 14th, 2010

We just released the latest Drizzle tarball (2010-10-11 milestone). There are a whole bunch of bug fixes, but there are two things that are interesting from a storage engine point of view:

  • The Innobase plugin is now based on innodb_plugin 1.0.6
  • The embedded_innodb engine is now named HailDB and requires HailDB, it can no longer be built with embedded_innodb.

Those of you following Drizzle fairly closely have probably noticed that we’ve lagged behind in InnoDB versions. I’m actively working on fixing that – both for the innobase plugin and for the HailDB library.

If building the HailDB plugin (which is planned to replace the innobase plugin), you’ll need the latest HailDB release (which as of writing is 2.3.1). We’re making good additions to the HailDB API to enable the storage engine to have the same features as the Innobase plugin.


PlanetMySQL Voting: Vote UP / Vote DOWN

Warnings are now actual problems

Сентябрь 23rd, 2010

Yesterday, I reached a happy milestone in HailDB development. All compiler warnings left in the api/ directory (the public interface to the database engine) are now either probable/possible bugs (that we need to look at closely) or are warnings due to unfinished code (that we should finish).

There’s still a bunch of compiler warnings that we’ve inherited (HailDB compiles with lots of warnings enabled) that we have to get through, but a lot will wait until after we update the core to be based on InnoDB 1.1.


PlanetMySQL Voting: Vote UP / Vote DOWN

HailDB 2.0.0 released!

Август 3rd, 2010

(Reposted from the HailDB Blog. See also the announcement on the Drizzle Blog.)
We’ve made our first HailDB release! We’ve decided to make this a very conservative release. Fixing some minor bugs, getting a lot of compiler warnings fixed and start to make the name change in the source from Embedded InnoDB to HailDB.

Migrating your software to use HailDB is really simple. In fact, for this release, it shouldn’t take more than 5 minutes.

Highlights of this release:

  • A lot of compiler warnings have been fixed.
  • The build system is now pandora-build.
  • some small bugs have been fixed
  • Header file is now haildb.h instead of innodb.h
  • We display “HailDB” instead of “Embedded InnoDB”
  • Library name is libhaildb instead of libinnodb
  • It is probably binary compatible with the last Embedded InnoDB release, but we don’t have explicit tests for that, so YMMV.

Check out the Launchpad page on 2.0.0 and you can download the tarball either from there or right here:

  • haildb-2.0.0.tar.gz
    MD5:  183b81bfe2303aed435cdc8babf11d2b
    SHA1:  065e6a2f2cb2949efd7b8f3ed664bc1ac655cd75

PlanetMySQL Voting: Vote UP / Vote DOWN

HailDB, Hudson, compiler warnings and cppcheck

Май 5th, 2010

I’ve integrated HailDB into our Hudson setup (haildb-trunk on Hudson). I’ve also made sure that Hudson is tracking the compiler warnings. We’ve enabled more compiler warnings than InnoDB has traditionally been compiled with – this means we’ve started off with over 4,300 compiler warnings! Most of those are not going to be anything remotely harmful – however, we often find that it’s 1 in 1000 that is a real bug. I’ve managed to get it down to about 1,700 at the moment (removing a lot of harmless ones).

I’ve also enabled a cppcheck run on it. Cppcheck is a static analysis tool for C/C++. We’ve also enabled it for Drizzle (see drizzle-build-cppcheck on Hudson). When we enabled it for Drizzle, we immediately found three real bugs! There is also a coding style checker which we’ve also enabled on both projects. So far, cppcheck has not found any real bugs in HailDB, just some style warnings.

So, I encourage you to try cppcheck if you’re writing C/C++.


PlanetMySQL Voting: Vote UP / Vote DOWN

Announcing HailDB

Апрель 16th, 2010

I just announced our continuation of the Embedded InnoDB project under the name of HailDB. Check out the announcement over at http://www.haildb.com/.

HailDB is a relational database that is embeddable within applications. You embed HailDB by linking to a shared library and calling a clean and simple API. HailDB is a continuation of the Embedded InnoDB project. It is not itself a database server, but is a library implementing the storage layer. With the addition of the HailDB plugin to Drizzle you get a full SQL interface.

Read more at http://www.haildb.com


PlanetMySQL Voting: Vote UP / Vote DOWN