Archive for the ‘freesoftware’ Category

Debugging memory leaks in plugins with Valgrind

April 23rd, 2010

I had an interesting IRC discussion the other day with Monty Taylor about what turned out to be a limitation in Valgrind with respect to debugging memory leaks in dynamically loaded plugins.

Monty Taylor's original problem was with Drizzle, but as it turns out, it is common to all of the MySQL-derived code bases. When there is a memory leak from an allocation in a dynamically loaded plugin, Valgrind will detect the leak, but the part of the stack trace that is within the plugin shows up as an unhelpful three question marks "???":

  ==1287== 400 bytes in 4 blocks are definitely lost in loss record 5 of 8
  ==1287==    at 0x4C22FAB: malloc (vg_replace_malloc.c:207)
  ==1287==    by 0x126A2186: ???
  ==1287==    by 0x7C8E01: ha_initialize_handlerton(st_plugin_int*) (handler.cc:429)
  ==1287==    by 0x88ADD6: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1033)

This tells you little more than that there is a leak in one of your plugins.

After trying a couple of things, we found that this is a known limitation in Valgrind in relation to code that is loaded with dlopen() and later unloaded with dlclose():

http://bugs.kde.org/show_bug.cgi?id=79362

The basic problem is that Valgrind records the location of the malloc() call as just a memory address. And when the memory leak check is performed after the end of program execution, the plugin has been unloaded with dlclose(), and the recorded memory address is therefore no longer valid.
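To make the mechanism concrete, here is a toy Python model (not how Valgrind is actually implemented) in which stack frames are stored as raw addresses and only turned into names when the report is printed. The module name `ha_example.so`, its load address, and the symbol `example_init` are made up for illustration; only the frame address is borrowed from the trace above.

```python
# Toy model: frames are recorded as raw addresses at allocation time and
# symbolized later, against whatever modules are mapped at that moment.
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    base: int
    size: int
    symbols: dict  # offset within the module -> function name


@dataclass
class AddressSpace:
    loaded: list = field(default_factory=list)

    def dlopen(self, module: Module) -> None:
        self.loaded.append(module)

    def dlclose(self, module: Module) -> None:
        self.loaded.remove(module)

    def symbolize(self, addr: int) -> str:
        # Look the address up in the modules mapped *right now*.
        for m in self.loaded:
            if m.base <= addr < m.base + m.size:
                return f"{m.symbols.get(addr - m.base, '???')} ({m.name})"
        return "???"


space = AddressSpace()
plugin = Module("ha_example.so", base=0x126A0000, size=0x10000,
                symbols={0x2186: "example_init"})  # hypothetical plugin
space.dlopen(plugin)

recorded_frame = 0x126A2186  # only the raw address is kept at malloc() time

print(space.symbolize(recorded_frame))  # prints: example_init (ha_example.so)
space.dlclose(plugin)                   # server shutdown unloads the plugin
print(space.symbolize(recorded_frame))  # leak check at exit prints: ???
```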

The problem is specific to memory leak checks, which are done only after the code has been unloaded. Other checks (like use of uninitialised values and use-after-free) work fine with full information in the stack traces, as such checks are done while the plugin code is still loaded into memory. But the memory leak checks are arguably among the most useful checks Valgrind does, as Valgrind is often the only way to find and fix critical memory leaks efficiently.

Fortunately, once the issue was understood, we had an easy work-around: disable the dlclose() call in the server plugin code, and the leak is then detected with full information in the stack trace. Unfortunately this introduces a leak of its own, since now the memory allocated in dlopen() is never freed, so we get another spurious Valgrind memory leak warning.

Another possible way to get the same effect is to pass the RTLD_NODELETE flag to dlopen(), though I have not tried this yet.
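A sketch of the RTLD_NODELETE idea, expressed with Python's ctypes (whose `mode` argument is handed straight to dlopen()) rather than with the server's own C loader; in C the equivalent call would be `dlopen(path, RTLD_NOW | RTLD_NODELETE)`. As noted above, this is untested against the server, and `plugin_dlopen_mode` and `load_plugin` are hypothetical helper names. RTLD_NODELETE is an extension available in glibc and several other Unix systems, so check your platform.

```python
# Sketch: pick dlopen() flags so that a Valgrind/debug run keeps plugin code
# mapped even after dlclose(), letting leak reports still be symbolized.
import ctypes
import os


def plugin_dlopen_mode(keep_loaded: bool) -> int:
    """Return the dlopen() mode bitmask for loading a plugin."""
    mode = os.RTLD_NOW | os.RTLD_LOCAL
    if keep_loaded:
        mode |= os.RTLD_NODELETE  # dlclose() will not unmap the code
    return mode


def load_plugin(path: str, keep_loaded: bool = False) -> ctypes.CDLL:
    # ctypes.CDLL passes `mode` through to dlopen() unchanged.
    return ctypes.CDLL(path, mode=plugin_dlopen_mode(keep_loaded))
```

Enabling `keep_loaded` only in debug or Valgrind runs keeps normal unloading behaviour intact for production use.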

A possibly better work-around (which I also have not tried yet) is one suggested in the above-referenced Valgrind feature request. By adding the offending plugin(s) to LD_PRELOAD when starting the server, the plugin code will not actually be unloaded in dlclose(), so stack traces should be available without any spurious leak warnings from Valgrind. However, according to the suggestion in the feature request, this will not work well if some of the dynamic plugins need a particular load order. I also need to check whether this actually works for plugins (like storage engines) that have link dependencies on symbols in the main program. But it might be a good option if it can be made to work.
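A minimal sketch of the LD_PRELOAD work-around, assuming the server is started from a small Python harness. The helper names and the paths in the usage comment are hypothetical, and the caveats above about load order and plugins that reference symbols in the main program still apply.

```python
# Sketch: build the environment and command line for running the server
# under Valgrind with the plugin .so files preloaded.
import os


def preload_env(plugins, base_env=None):
    """Return a copy of base_env with `plugins` prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    entries = list(plugins)
    existing = env.get("LD_PRELOAD")
    if existing:
        entries.append(existing)  # keep anything already being preloaded
    env["LD_PRELOAD"] = ":".join(entries)
    return env


def valgrind_argv(server, server_args=()):
    # --leak-check=full makes Memcheck print a stack trace for each leak.
    return ["valgrind", "--leak-check=full", server, *server_args]


# Usage (hypothetical paths):
#   subprocess.run(valgrind_argv("./sql/mysqld"),
#                  env=preload_env(["./storage/example/ha_example.so"]))
```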

(At first I was surprised to learn that this was a problem in MySQL and MariaDB, as I never saw it before. But I suppose the reason is that we so far have built most plugins as built-in, rather than as dynamically loaded .so files. The problem is likely to occur more frequently as we are moving to do more and more with plugins in MariaDB, so it is nice to know a work-around. Thanks, Monty!)



MariaDB talk at the OpenSourceDays 2010 conference

March 29th, 2010

Earlier this month, I was at the OpenSourceDays 2010 conference, giving a talk on MariaDB (the slides from the talk are available).

The talk went quite well, I think (though I probably talked way too fast, as I usually do; at least that means that I finished on time with plenty of room for questions).

There was quite a bit of interest after the talk from many of the people who heard it. It was even reported on by the Danish IT media version2.dk (article in Danish).

Especially interesting to me was to discuss with three people from Danish site komogvind.dk, who told me fascinating details about their work keeping a busy site running; one of them even went right home to benchmark against MariaDB. Thanks to you, and to everyone else for your interest and time!

This time in the talk, I tried to also focus on the community and development aspects of MariaDB (in addition to the mandatory feature list and benchmark graphs, of course). To me, the most important thing about MariaDB is that we now have the infrastructure and community for people outside of MySQL to do full-scale development at the same level as inside MySQL. This was missing before. It is a much less concrete thing than features and benchmarks, so I found it much harder to present in a good way, without it turning into nothing but buzzwords. But from the feedback I got afterwards, it seems I succeeded pretty well with this part also, which I am especially happy about!

The talk was recorded on video by the organisers. The latest I heard was that the video footage is still being edited though (I was kind of waiting, hoping to be able to include a link to the video in this post). But if they do manage to finish the editing and make the videos available later, I will post an update.

A big thanks to the organisers of the OpenSourceDays 2010 conference! I had a great time, and hope to be back again next spring for OpenSourceDays 2011.



Conference time!

February 15th, 2010

It is conference time for me. I just came home from FOSDEM 2010, where we had a booth and I gave a talk. At the end of the month there will be a company meeting in Iceland for Monty Program, followed by Open Source Days 2010, where I will also be speaking. And then in April there is the MySQL User Conference. With two additional talks given at local user groups at the end of last year, I think I have about filled my quota for now, so I feel quite fortunate that it turned out that I will not also be presenting at the UC! (I do not have a natural talent for speaking, and tend to need to spend quite a lot of time on preparation.)

MariaDB/PBXT booth at FOSDEM
Having a booth at FOSDEM turned out really well, I think, as I got to talk to a lot of different people who passed by the booth. I also had a very nice dinner with the PostgreSQL people, where I learned a lot about the internals of that database, as well as dinners with people from the MySQL world, also with lots of interesting discussions.

Thanks to all of the people that I met at FOSDEM. It was fun and inspiring to meet you, looking forward to the next time!

One thing strikes me as I am piecing together the mandatory "MariaDB feature list" for my next talk. It seems people tend to focus a lot on the extra features MariaDB has over MySQL. But there is another aspect that I think is just as important: MariaDB is creating an open framework where community developers can work together on new development. This is something that has been missing in the past.

It is easy to focus on a concrete list of features, whereas the idea of an abstract framework is much harder to present as more than buzzword talk. But I will try to get it into my next talk, as I think ultimately both are of equal importance.



Why I work on Free Software

January 20th, 2010

I happened upon this old LinuxJournal article about how the University of Zululand in South Africa used MySQL and other Free Software to make do with a 128 kbit (and later 768 kbit) internet connection for their staff and students.

This made me remember the trip I made to another African country, Burkina Faso, 15 years ago:

With the huge amount of work and numerous difficult obstacles facing my work on the MariaDB project, it can be hard to keep up motivation at times. It helps to remember why I am doing this.

It must be about the time when some of these kids should go to University or start up new projects. Maybe some of them will work with software. I want them to be able to invest in local skills and infrastructure that they need, rather than in software licenses funding nice houses and boats in other countries:

(House and yacht pictures courtesy of Wikipedia under the Creative Commons Attribution ShareAlike 3.0)



RunVM, a tool for automated scripting inside virtual machines

January 16th, 2010

In the autumn, I wrote about some experiments I did using KVM and virtual machines to build and test MariaDB binary packages on a number of different platforms. Since then I have added some polish and refinements, and the system has now been running well for some time. We build and test packages for Debian (4 and 5), Ubuntu (8.04 to 10.04), CentOS 5, and generic Linux, on the amd64 and i386 architectures.

To better control the startup and shutdown of the virtual machines, I created a small wrapper script around KVM called runvm. This wrapper encapsulates the steps needed to boot up a virtual machine, run a series of commands inside it, and shut it down gracefully afterwards. Some special care is taken in the script to ensure that the virtual machine is always shut down after use (gracefully if possible), even in case of various failures or the loss of the parent process or controlling TTY. And if a conflicting virtual machine somehow manages to escape shutdown, runvm automatically attempts to terminate it before starting a new one. This extra robustness is important for fully automated testing as in our Buildbot setup, to ensure that the system can run unattended for longer periods of time.
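The control flow described above can be sketched roughly as follows. This is not the runvm source; `boot`, `run_one`, `shutdown` and `kill` are hypothetical callables standing in for the real KVM and ssh handling. Note that `finally` only covers normal returns and exceptions: surviving a closed TTY or a killed parent additionally needs signal handling (for example for SIGHUP), which is omitted here.

```python
# Sketch of a runvm-style session: boot, run commands one at a time
# (stopping at the first failure), and always shut the guest down.
from typing import Callable, Sequence


def run_session(commands: Sequence[str],
                boot: Callable[[], None],
                run_one: Callable[[str], int],
                shutdown: Callable[[], bool],
                kill: Callable[[], None]) -> int:
    """Return the exit status of the first failing command, or 0."""
    boot()
    try:
        for cmd in commands:
            status = run_one(cmd)
            if status != 0:
                return status  # abort on the first error
        return 0
    finally:
        # Runs on success, on failure, and on exceptions such as Ctrl-C.
        if not shutdown():  # graceful shutdown within the timeout?
            kill()          # otherwise terminate the VM forcibly
```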

So for example, here is how to run a build inside a virtual machine using runvm:

  runvm --port=2222 ubuntu-hardy-i386.qcow2 \
    "= scp -P 2222 mariadb-5.1.41-rc.tar.gz localhost:" \
    "tar zxf mariadb-5.1.41-rc.tar.gz" \
    "cd mariadb-5.1.41-rc && ./configure" \
    "cd mariadb-5.1.41-rc && make"

Here, ubuntu-hardy-i386.qcow2 is a KVM image already installed with compilers and set up for password-less ssh access (using public key authentication). Port 2222 on the host side is forwarded to the ssh service (port 22) on the guest side (so by specifying different --port options it is easy to run multiple runvm invocations in parallel; in our Buildbot setup we run 3 builds in parallel this way).
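As an illustration of the port forwarding, here is one way to build such a KVM command line with QEMU's user-mode networking (`hostfwd`). The defaults are taken from the runvm --help output further down, but the exact flags runvm itself passes to kvm are an assumption; on newer systems the binary may be `qemu-system-x86_64 -enable-kvm` rather than `kvm`.

```python
# Sketch: KVM argv forwarding host port `port` to the guest's ssh port 22.
def kvm_argv(image: str, port: int = 2222, memory_mb: int = 2047,
             smp: int = 2, cpu: str = "qemu32,-nx",
             nic_model: str = "virtio-net-pci") -> list:
    return [
        "kvm",
        "-m", str(memory_mb),
        "-smp", str(smp),
        "-cpu", cpu,
        "-drive", f"file={image},format=qcow2",
        # User-mode networking: no bridges or IP setup needed in the guest.
        "-netdev", f"user,id=net0,hostfwd=tcp:127.0.0.1:{port}-:22",
        "-device", f"{nic_model},netdev=net0",
        "-display", "none",
    ]
```

Running three builds in parallel then just means calling this with three different `port` values.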

Note the use of the scp command, prefixed with an equals sign "=". Commands prefixed in this way are run on the host side rather than the guest side; this is a convenient way to copy data in or results out of the virtual machine while the runvm session is running.
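A sketch of how a command list using the '=' convention could be dispatched; `command_argv` is a hypothetical helper, not part of runvm. `BatchMode=yes` makes ssh fail instead of prompting for a password, which suits the password-less key setup described above.

```python
# Sketch: turn one runvm-style command into an argv list.
from typing import Optional


def command_argv(cmd: str, port: int = 2222,
                 user: Optional[str] = None) -> list:
    """'=...' runs on the host via /bin/sh; anything else runs in the
    guest over ssh to the forwarded port on localhost."""
    if cmd.startswith("="):
        return ["/bin/sh", "-c", cmd[1:].strip()]
    target = f"{user}@localhost" if user else "localhost"
    return ["ssh", "-p", str(port), "-o", "BatchMode=yes", target, cmd]
```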

Using runvm in this way we are able to easily and flexibly manage a large number of virtual machines for automated builds with very little overhead and complexity. In fact we have around 70 distinct virtual machines! The only resource they take is a little disk space (around 37 GByte). And the virtual machine images are also simple to set up, requiring only a minimal install; there is no need to set up networking bridges or IP addresses, or to install a Buildbot client. All the complex logic runs on the host system, which only needs to be installed once.

By keeping the virtual images simple, we also achieve that builds and tests run in a minimal environment, which is useful to detect any missing dependencies or other problems that do not show themselves on normal developer machines with a full desktop install (we even do install testing on a separate virtual machine from the one used to build, with compilers etc. not installed on the one used to test installation).

A further refinement of this is to create a new temporary virtual machine image before each step as a copy of a reference image, run the build, and throw away the temporary image after the build. This avoids any possibility of a previous build influencing a following build in any way (and thus also simplifies the build setup, as we can install stuff freely without any need to do cleanup). It also avoids having to fix a broken image, like needing to manually run fsck after a crash or similar. We use this technique for most of our binary package builds in Buildbot.

To use this copy-and-discard technique with runvm, the --base-image option is useful:

  runvm --port=2222 --base-image=ubuntu-hardy-i386.qcow2 tmp.qcow2 \
    "= scp -P 2222 mariadb-5.1.41-rc.tar.gz localhost:" \
    "tar zxf mariadb-5.1.41-rc.tar.gz" \
    "cd mariadb-5.1.41-rc && ./configure" \
    "cd mariadb-5.1.41-rc && make"

This will run the build in a temporary copy tmp.qcow2 of the reference image ubuntu-hardy-i386.qcow2, without modifying the reference image in any way. This uses the copy-on-write feature of the qcow2 image format (see qemu-img(1)), so it takes very little time (a fraction of a second) and minimal space (only changed blocks are written to the new image).
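The copy-on-write step corresponds to a qemu-img invocation along these lines (a sketch; the helper names are hypothetical). The backing path is made absolute because qemu-img resolves a relative backing file relative to the new image's directory, and `-F` states the backing format explicitly, which recent qemu-img versions expect.

```python
# Sketch: create a throw-away qcow2 overlay on top of a reference image.
import os
import subprocess


def overlay_argv(base_image: str, new_image: str) -> list:
    # -b names the backing (reference) image; writes go only to new_image.
    return ["qemu-img", "create", "-f", "qcow2",
            "-b", os.path.abspath(base_image), "-F", "qcow2", new_image]


def make_overlay(base_image: str, new_image: str) -> None:
    # Like runvm --base-image, this replaces any existing new_image.
    if os.path.exists(new_image):
        os.remove(new_image)
    subprocess.run(overlay_argv(base_image, new_image), check=True)
```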

This is basically how the package testing in our Buildbot setup is done. There are some further details of course, like more options for the build commands and extra care to get logfiles out to debug problems; the full details are available in our Buildbot configuration file. But the basic principle is just a number of runvm commands like the example above.
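For illustration, a hypothetical helper (not taken from the actual Buildbot configuration) that produces one such runvm invocation per platform, giving each parallel slot its own port and its own temporary image so that concurrent builds cannot collide.

```python
# Sketch: one runvm build command per platform; slot N uses port base+N.
def runvm_build_argv(platform: str, slot: int, tarball: str,
                     base_port: int = 2222) -> list:
    port = base_port + slot
    srcdir = tarball[:-len(".tar.gz")] if tarball.endswith(".tar.gz") else tarball
    return [
        "runvm", f"--port={port}",
        f"--base-image={platform}.qcow2", f"tmp-{platform}.qcow2",
        f"= scp -P {port} {tarball} localhost:",  # '=' runs on the host
        f"tar zxf {tarball}",
        f"cd {srcdir} && ./configure",
        f"cd {srcdir} && make",
    ]
```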

The runvm tool is available under the GPL on Launchpad in the project Tools for MariaDB. In the bzr repository it is found as buildbot/runvm. If someone finds it useful or has suggestions for improvements, please drop us a line on the maria-developers@lists.launchpad.net mailing list.

Here is the output of runvm --help:

Usage: /home/knielsen/devel/maria/my/mariadb-tools/buildbot/runvm <options> image.qcow2 [command ...]

Boot the given KVM virtual machine image and wait for it to come up.
Run the list of commands one at a time, aborting on receiving an error.
When all commands are run (or one of them failed), shutdown the virtual
machine and exit.

Commands are by default run inside the virtual machine using ssh(1). By
prefixing a command with an equals sign '=', it will instead be run on the
host system (for example to copy files into or out of the virtual machine
using scp(1)).

Some care is taken to ensure that the virtual machine is shutdown
gracefully and not left running even in case the controlling tty is
closed or the parent process killed. If a previous virtual machine is
already running on a conflicting port, an attempt is made to shut it
down first. For this purpose, a PID file is created in $HOME/.runvm/

Available options:

  -p, --port=N        Forward this port on the host side to the ssh port (port
                      22) on the guest side. Must be different for each runvm
                      instance running in parallel to avoid conflicts. The
                      default is 2222.
                      To copy files in/out of the guest use a command prefixed
                      with '=' calling scp(1) with the -P option using the port
                      specified here, like this:
                          runvm img.qcow2 "=scp -P 2222 file.txt localhost:"
  -u, --user=USER     Name of the account to ssh into in the guest. Defaults to
                      the name of the user invoking runvm on the host.
  -m, --memory=N      Amount of memory (in megabytes) to allocate to the guest.
                      Defaults to 2047.
  --smp=N             Number of CPU cores to allocate to the guest.
                      Defaults to 2.
  -c, --cpu=NAME      Type of CPU to emulate for KVM, see qemu(1) for details.
                      For example:
                          --cpu=qemu64      For 64-bit amd64 emulation
                          --cpu=qemu32      For 32-bit x86 emulation
                          --cpu=qemu32,-nx  32-bit and disable "no-execute"
                      The default is qemu32,-nx
  --netdev=NAME       Network device to emulate. The 'virtio' device has good
                      performance but may not have driver support in all
                      operating systems, if so another can be specified.
                      The default is virtio.
  --kvm=OPT           Pass additional option OPT to kvm. Specify multiple times
                      to pass more than one option. For example
                          runvm --kvm=-cdrom --kvm=mycd.iso img.qcow2 ...
  --initial-sleep=SECS
                      Wait this many seconds before starting to poll the guest
                      ssh port for it to be up. Default 15.
  --startup-timeout=SECS
                      Wait at most this many seconds for the guest OS to respond
                      to ssh. If this time is exceeded assume it has failed to
                      boot correctly. Default 300.
  --shutdown-timeout=SECS
                      Wait at most this many seconds for the guest OS to
                      shutdown gracefully after sending a shutdown command. If
                      this time is exceeded, assume the guest has failed to
                      shutdown gracefully and kill it forcibly. Default 120.
  --kvm-retries=N     If the guest fails to come up, retry the boot this many
                      times before giving up. This helps if the virtual machine
                      sometimes crashes during boot. Default 3.
  -l, --logfile=FILE  File to redirect the output from kvm into. This includes
                      any (error) messages from kvm, and also includes anything
                      the guest writes to the kvm emulated serial port (it can
                      be useful to set the guest to send boot loader and kernel
                      messages to the serial console and log them with this
                      option). Default is to not log this output anywhere.
  -b, --base-image=IMG
                      Instead of booting an existing image, create a new
                      copy-on-write image based on IMG. This uses the -b option
                      of qemu-img(1). IMG is not modified in any way. This way,
                      the booted image can be discarded after use, so that each
                      use of IMG is using the same reference image with no risk
                      of "polution" between different invocations.
                      Note that this DELETES any existing image of the same
                      name as the one specified on the command line to boot! It
                      will be replaced with the image created as a copy of IMG,
                      with any modifications done during the runvm session.
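The --initial-sleep, --startup-timeout and --kvm-retries options suggest a startup loop like the one sketched below; this is an assumption about the approach, not runvm's code. It waits for the SSH banner rather than just a successful connect, since with QEMU user-mode forwarding the host port typically accepts connections before the guest's sshd is running. `start_vm` is a hypothetical callable returning an object with a `kill()` method, such as a `subprocess.Popen`.

```python
# Sketch: wait for the guest's sshd, retrying the boot a few times.
import socket
import time


def wait_for_ssh(port: int, initial_sleep: float = 15,
                 startup_timeout: float = 300, poll: float = 1.0,
                 host: str = "127.0.0.1") -> bool:
    """Return True once an SSH banner is read from host:port."""
    time.sleep(initial_sleep)
    deadline = time.monotonic() + startup_timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=poll) as s:
                if s.recv(4) == b"SSH-":
                    return True
        except OSError:  # refused, reset or timed out: not up yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def boot_with_retries(start_vm, port: int, retries: int = 3, **wait_kw) -> bool:
    # If the guest does not come up in time, kill it and boot again.
    for _ in range(retries):
        vm = start_vm()
        if wait_for_ssh(port, **wait_kw):
            return True
        vm.kill()
    return False
```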


Oracle speculations

December 31st, 2009

The Planet MySQL has been abuzz with opinions for or against the acquisition of Sun (and in particular MySQL) by Oracle, but I do not have a strong opinion to chime in with in support of either group. The reason is that I do not know anything about antitrust law, which is the legal basis for the EC blocking or not blocking the deal; and I also do not know what the alternative to Oracle buying the MySQL part of Sun would be.

However, that does not mean that I cannot join in the speculation about Oracle's reasons for wanting MySQL in the first place ;-)

I think it is basically a matter of obtaining control over MySQL.

The horror scenario for Oracle is that MySQL (or Postgres or another Free Software program) does to the proprietary databases what Linux has done to the proprietary Unixes, which is essentially to kill them, slowly but surely. This is not an immediate threat to Oracle, but it is a real long-term threat, given how similar the technical challenges of developing a kernel/OS and a database/RDBMS are. And should it happen, the impact on the license revenues from the Oracle RDBMS would be devastating.

Compared to that horror scenario, the potential loss of some fraction of Oracle license sales due to the continued development of MySQL is of lesser consequence. And this loss will happen in any case, if not to MySQL then to Postgres, to a MySQL fork, or to another free database.

So from this reasoning, it makes the most sense for Oracle to continue development of MySQL more or less unchanged from what it was in MySQL AB and later at Sun: at a sufficiently high level (in terms of bug fixes, features, etc.) that most of the community interest will remain on MySQL and not turn to forks or other Free databases that are outside of Oracle's control, while keeping the very tight control over the development that MySQL AB and Sun also had, with the community having basically no influence over what goes into the code. Then, should it ever become necessary, Oracle has the control it needs to prevent, or at least manage, the above-mentioned horror scenario.

In fact, this is exactly what Oracle has done with InnoDB since buying them four years ago. The development has continued more or less as before, as far as I can tell at a similar pace and with the same team. And while there is a fork in XtraDB, the Oracle-controlled InnoDB is good enough that by far the majority of the community is using it and not XtraDB.

So basically, after buying InnoDB Oracle has done essentially nothing with it, one way or the other. Notably, Oracle has not done any of the following or similar bold and visionary steps with InnoDB:

  • Transferred into InnoDB some of the technology from the Oracle RDBMS product that InnoDB currently lacks. Like multiple tablespaces which can be assigned independently to tables (even though MySQL already has the syntax support for this). Or the use of multiple buffer pools to control page eviction. Or online backup based on the transaction log (innobackup is not part of the GPL InnoDB version, and there is no integration with the new MySQL backup interface). Etc. etc.
  • Opened up the development process like other Open Source projects, with public revision control repository (as far as I know they have not even switched to using bzr like the rest of the MySQL world), public mailing lists for reviews and discussions, public bug tracker, etc.
  • Taken steps to integrate the development process better with MySQL AB and later Sun. As far as I know, changes to the InnoDB included in the MySQL source tree still happen by manually sending patches from Oracle to MySQL, which are then manually committed to the MySQL tree! Even worse, much of the development of new features has taken place on a separate product, the InnoDB plugin, which is not enabled by default (it was not even in the MySQL source tree until this summer), and is used by only a minority of the community.
  • Given up any kind of control of the development to the community (e.g. as far as I know Oracle has taken no steps to work together with XtraDB).

Nothing I have seen in the statements or discussions so far seems to suggest that Oracle would treat an acquisition of MySQL differently.

Will this be good or bad for MySQL? Without an alternative scenario to compare with, I do not know. But it certainly is not sufficient! MySQL development has been stalling for several years, and we need to invigorate it to make MySQL meet the new challenges facing existing applications, and to improve MySQL for use in applications where it is currently weak.

We need improved management of huge databases: tablespace management; buffer pool control; backup infrastructure; etc. We need replication improvements: binlog storage inside default storage engine for improved transaction handling; interleaved logging of transactions; multi-threaded application of row-based events in MVCC engines; robust automatic handling of fail-over scenarios, etc. We need refactoring of the core server to enable future development: new parser; separate abstract syntax tree; move to modern multi-threading architecture with lock-free operations and RCU; etc. We need scalability improvements to multi-core computers. We need versioned metadata for better support of on-line DDL. We need server-side cache of already executed statements with access to per-statement statistics and execution plans. We need merge and hash joins. We need better integration of the many new storage engines being developed: inclusion by default in source and binaries to make them easy to try and use; extensions to the storage engine API to better interface to the new engines and fully exploit each of their unique features. And lots more.

Will Oracle take the lead on some of these, and give up sufficient control for the community to take the lead on the rest? Well, we do not know. It does seem hard to find a motivation for Oracle to drive the development of MySQL into new areas that will necessarily cannibalise their huge license revenues. On the other hand, if they do, they will be most welcome, and we look forward to hopefully working together with them. In any case, we must not blindly rely on this to happen, not from Oracle or any other company which may end up with the MySQL assets!

I sincerely hope that whatever happens to MySQL the company, a sufficient part of the community will remember that we need not just a MySQL (under whatever name) that is "good enough" today, we also need a MySQL for tomorrow. And for this the community needs to support those that step up to lead future MySQL development, whoever it will be.

So will it be MariaDB leading? We still have a way to go before we have proven ourselves worthy of saying this. But what I can say is that we are trying!



MariaDB Buildbot configuration file published

December 18th, 2009

I have now published the Buildbot configuration file that we use for our continuous integration tests in our Buildbot setup. Every push into main and development branches of MariaDB is built and tested on a range of platforms to catch and fix any problems early (and we also test MySQL releases before merging to easily see whether any new problems already existed in MySQL or were introduced by something specific to MariaDB).

The configuration is included in the Tools for MariaDB Launchpad project.

Now, the Buildbot configuration file is not something that most MariaDB users will need or want to care about, of course. But I think it is still very important to have it publicly available, not sitting on some private server of the company Monty Program AB.

The reason is that the whole idea with MariaDB is to make a community branch of MySQL, developed by the community and for the community. We want MariaDB the project to be bigger than Monty Program AB the company. And since the Buildbot testing is so central to the whole MariaDB development process, the Buildbot setup also needs to be available to the community. Want to improve the setup, just see what it is doing, or even set up your own master to show you can do a better job (and yes, our Windows setup currently really sucks)? Just go ahead! Wondering how the Buildbot setup could be continued if Monty Program AB disappears or turns fascist? Now there is an answer.

Hopefully the configuration can also be useful as an example for people doing fancy things with Buildbot. There is some cool stuff in there, like creating a source tarball on a Linux host and uploading it to be built on a Windows host (this is how releases are done, so it is important to check that no files are missing from the source tarball). Another cool thing is the builders that boot up KVM virtual machines on demand to build and test binary packages (.deb, .rpm, and .tar.gz) on all of the 18 Linux platforms we currently release for.

BTW, you do not get the miscellaneous passwords in the published configuration file, sorry! :-)

[The license for the configuration file (which is in fact a sizable Python script, as this is the way Buildbot is configured) is GPL.]

