Archive for the ‘storage engine’ Category

PBXT early impressions in production use

Май 27th, 2010

With Paul McCullagh’s PBXT storage engine getting integrated into MariaDB 5.1, it’s never been easier to it out. So we have, on a slave off one of our own production systems which gets lots of inserts from our Zabbix monitoring system.

That’s possibly an ideal usage profile, since PBXT is a log based engine (simplistically stated, it indexes its transaction logs, rather than rewriting data from log into index and indexing that) so it should require less disk I/O than say InnoDB. And that means it should be particularly suited to for instance logging, which have lots of inserts on a sustained basis. Note that for short insert burst you may not see a difference with InnoDB because of caching, but sustain it and then you can notice.

Because PBXT has such different/distinct architecture there’s a lot of learning involved. Together with Paul and help from Roland Bouman we also created a stored procedure that can calculate the optimal average row size for PBXT, and even ALTER TABLE statements you can paste to convert tables. The AVG_ROW_LENGTH option is quite critical with PBXT, if set too big (or if you let PBXT guess and it gets it wrong) it’ll eat heaps more diskspace as well as being much slower, and if too small it’ll be slower also; this, it needs to be in the right ballpark. For existing datasets it can be calculated, so that’s what we’ve worked on. The procs will be published shortly, and Paul will also put them in with the rest of the PBXT files.

Another important aspect for PBXT is having sufficient cache memory allocated, otherwise operations can take much much longer. While the exact “cause” is different, one would notice similar performance aspects when using InnoDB on larger datasets and buffers that are too small for the purpose.

So, while using or converting some tables to PBXT takes a bit of consideration, effort and learning, it appears to be dealing with the real world very well so far – and that’s a testament to Paul’s experience. Paul is also very responsive to questions. As we gain more experience, it is our intent to try PBXT for some of our clients that have operational needs that might be a particularly good fit for PBXT.

I should also mention that it is possible to have a consistent transaction between PBXT, InnoDB and the binary log, because of the 2-phase commit (XA) infrastructure. This means that you should even be able to do a mysqldump with –single-transaction if you have both PBXT and InnoDB tables, and acquire a consistent snapshot!

More experiences and details to come.


PlanetMySQL Voting: Vote UP / Vote DOWN

OpenSQL (2009 Portland) talk on an Open Storage Engine API

Май 11th, 2010

I just spotted the youtube video of my OpenSQL Camp (Portland 2009) talk on An Open Storage Engine API. I talked about some of technical issues for implementing storage engines across many SQL front ends, not just MySQL.

You can find this talk and other mostly technical material at http://tokutek.com/technology/.


PlanetMySQL Voting: Vote UP / Vote DOWN

The Drizzle (and MySQL) Key tuple format

Апрель 1st, 2010

Here’s something that’s not really documented anywhere (unless you count ha_innodb.cc as a source of server documentation). You may have some idea about the MySQL/Drizzle row buffer format. This is passed around the storage engine interface: in for write_row and update_row and out for the various scan and index read methods.

If you want to see the docs for it that exist in the code, check out store_key_val_for_row in ha_innodb.cc.

However, there is another format that is passed to your engine (and that your engine is expected to understand) and for lack of a better name, I’m going to call it the key tuple format. The first place you’ll probably see this is when implementing the index_read function for a Cursor (or handler in MySQL speak).

You get two things: a pointer to the buffer and the length of the buffer. Since a key can be made up of multiple parts, some of which can be NULL and some of which can be of variable length, this buffer is not (usually) a simple value. If you are starting out in your engine development, you can use this buffer blindly as a single value for non-nullable indexes with only 1 column.

The basic format is this:

  • The buffer is in-order of the index. First column in the index is first in the buffer, second second etc.
  • The buffer must be zero-filled. The server kernel will use memcmp to compare two key values.
  • If the column is NULLable, then the first byte is set to 1 if the column is null. Else, 0 means not-null.
  • From ha_innodb.cc (for BLOBs, which I haven’t put in embedded_innodb yet): If the column is of a BLOB type (it must be a column prefix field in this case), then we put the length of the data in the field to the next 2 bytes, in the little-endian format. If the field is SQL NULL, then these 2 bytes are set to 0. Note that the length of data in the field is <= column prefix length.
  • For fixed length fields (such as int), the next max field length bytes are for that field.
  • For VARCHAR, there is always a 2 byte (in little endian) length. This is different to the row format, which may have 1 or 2 bytes. In the key tuple format it is ALWAYS two bytes.

I’ll discuss the use of this for rnd_pos() and position() in a later post…

This blog post (but not the whole blog) is published under the Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).


PlanetMySQL Voting: Vote UP / Vote DOWN

Abusing MySQL (& thoughts on NoSQL)

Март 7th, 2010

The NoSQL/relational database debate has been going on for quite some time. MariaDB, like MySQL is relational. And if you read these series of blog posts, you’ll realise that if you use MySQL correctly, you can achieve quite a lot.

  1. It all starts with Kellan Elliott-McCrea with his introductory post on Using, Abusing and Scaling MySQL at Flickr. Follow the entire series.
  2. He starts of the series with Ticket Servers: Distributed Unique Primary Keys on the Cheap. Flickr scales using shards, and ticket servers give unique integers to serve as PKs.
  3. Richard Crowley talks about OpenDNS MySQL abuses. Nothing too out of the ordinary, but it shows MySQL getting the job done.
  4. Mikhail Panchenko talks about using The Federated engine for his series.

If you’re using the Federated engine, know that MySQL disables FEDERATED by default. In MariaDB 5.1.42, you get FederatedX, which is a maintained fork of FEDERATED, by the author himself! Bugs are fixed, and this is a supported engine, so if you’re using the FEDERATED engine, it might be wise to try out FederatedX.

I’d also like to bring to attention, an interesting essay by Dennis Forbes: Getting Real about NoSQL and the SQL-Isn’t-Scalable Lie. Monty says: “NoSQL is for very smart people who need a very sharp knife. People who are not capable of mastering SQL should not even attempt to try out NoSQL.”

Related posts:

  1. MySQL Ecosystem – complementary talks at the conference?
  2. MariaDB 5.1.42 released!
  3. MySQL 5.1.26-rc released, and developer resources thoughts



PlanetMySQL Voting: Vote UP / Vote DOWN

Video: The ScaleDB shared-disk clustering Storage Engine for MySQL

Сентябрь 23rd, 2009

Mike Hogan, CEO of ScaleDB spoke at the Boston MySQL User Group in September 2009:

ScaleDB is a storage engine for MySQL that delivers shared-disk clustering. It has been described as the Oracle RAC of MySQL. Using ScaleDB, you can scale your cluster by simply adding nodes, without partitioning your data. Each node has full read/write capability, eliminating the need for slaves, while delivering cluster-level load balancing. ScaleDB is looking for additional beta testers, there is a sign up at http://www.scaledb.com.

Slides are online (and downloadable) at http://www.slideshare.net/Sheeri/scale-db-preso-for-boston-my-sql-meetup-92009

Watch the video online at http://www.youtube.com/watch?v=emu2WfNx4KA or directly embedded here:


PlanetMySQL Voting: Vote UP / Vote DOWN

The mysterious Storage Engine Independent Test Suite

Сентябрь 18th, 2009
Recently Mark observed that we now all need a storage engine independent test suite, Sun included! Well, as far as I know, there is such a thing at Sun, sort of. Apparently it has been used to test PBXT and other engines, but I've heard it is not in good enough shape to be released.

But my question is, why not release it anyway? We could turn it into an engine community project. I believe there are enough engine developers out there to get this moving forward.

The secret is to start small, and just get a few tests to run with all engines. Then additional tests can be added step by step. Engines need a way to specify that they want to skip a test entirely (e.g. transactional tests), and it should be easy to customize results for various engines.

An example of a simple and elegant solution can be found in Drizzle. As Monty Taylor mentioned in a comment to Marks blog: "We have some patches to test-run in Drizzle to allow running the whole test suite with a specified storage engine".

I think it is been long enough. This could be a good opportunity to start a Sun/Community project, something like Drizzle. In other words, get something out there, even if it is incomplete, and let the community also take a large part of the responsibility.

PlanetMySQL Voting: Vote UP / Vote DOWN

How do I find the storage engine of a MySQL table

Сентябрь 18th, 2009

This seems quite a trivial question, but developers don’t often know what a MySQL storage engine is and how to determine what storage engine is used for a table.

The first choice is to describe the table with the DESC[RIBE] command. Side Note: people often don’t realize that DESC is a short acceptable version here.

mysql> desc stats;
+---------+---------------------+------+-----+-------------------+----------------+
| Field   | Type                | Null | Key | Default           | Extra          |
+---------+---------------------+------+-----+-------------------+----------------+
| stat_id | int(10) unsigned    | NO   | PRI | NULL              | auto_increment |
| created | timestamp           | NO   |     | CURRENT_TIMESTAMP |                |
| version | tinyint(3) unsigned | NO   |     | NULL              |                |
| referer | varchar(500)        | NO   |     | NULL              |                |
| q       | varchar(100)        | YES  |     | NULL              |                |
| uri     | varchar(100)        | YES  |     | NULL              |                |
| server  | text                | NO   |     | NULL              |                |
| headers | text                | YES  |     | NULL              |                |
+---------+---------------------+------+-----+-------------------+----------------+

Alias, this command does not provide the details of the storage engine.
You need to use the SHOW CREATE TABLE as a means to get a more detailed description including the storage engine.

mysql> SHOW CREATE TABLE stats\G
*************************** 1. row ***************************
       Table: stats
Create Table: CREATE TABLE `stats` (
  `stat_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `version` tinyint(3) unsigned NOT NULL,
  `referer` varchar(500) NOT NULL,
  `q` varchar(100) DEFAULT NULL,
  `uri` varchar(100) DEFAULT NULL,
  `server` text NOT NULL,
  `headers` text,
  PRIMARY KEY (`stat_id`)
) ENGINE=MyISAM AUTO_INCREMENT=21964 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

As another side note tip, try the \G instead of a semicolon (;) which provides a better vertical output.

Another option is to query the INFORMATION_SCHEMA.TABLES meta data.

mysql> SELECT table_name,engine FROM INFORMATION_SCHEMA.TABLES WHERE table_schema=DATABASE();
+--------------+--------+
| table_name   | engine |
+--------------+--------+
| stats        | MyISAM |
+--------------+--------+
1 row in set (0.00 sec)

PlanetMySQL Voting: Vote UP / Vote DOWN

How do I find the storage engine of a MySQL table

Сентябрь 18th, 2009

This seems quite a trivial question, but developers don’t often know what a MySQL storage engine is and how to determine what storage engine is used for a table.

The first choice is to describe the table with the DESC[RIBE] command. Side Note: people often don’t realize that DESC is a short acceptable version here.

mysql> desc stats;
+---------+---------------------+------+-----+-------------------+----------------+
| Field   | Type                | Null | Key | Default           | Extra          |
+---------+---------------------+------+-----+-------------------+----------------+
| stat_id | int(10) unsigned    | NO   | PRI | NULL              | auto_increment |
| created | timestamp           | NO   |     | CURRENT_TIMESTAMP |                |
| version | tinyint(3) unsigned | NO   |     | NULL              |                |
| referer | varchar(500)        | NO   |     | NULL              |                |
| q       | varchar(100)        | YES  |     | NULL              |                |
| uri     | varchar(100)        | YES  |     | NULL              |                |
| server  | text                | NO   |     | NULL              |                |
| headers | text                | YES  |     | NULL              |                |
+---------+---------------------+------+-----+-------------------+----------------+

Alias, this command does not provide the details of the storage engine.
You need to use the SHOW CREATE TABLE as a means to get a more detailed description including the storage engine.

mysql> SHOW CREATE TABLE stats\G
*************************** 1. row ***************************
       Table: stats
Create Table: CREATE TABLE `stats` (
  `stat_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `version` tinyint(3) unsigned NOT NULL,
  `referer` varchar(500) NOT NULL,
  `q` varchar(100) DEFAULT NULL,
  `uri` varchar(100) DEFAULT NULL,
  `server` text NOT NULL,
  `headers` text,
  PRIMARY KEY (`stat_id`)
) ENGINE=MyISAM AUTO_INCREMENT=21964 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

As another side note tip, try the \G instead of a semicolon (;) which provides a better vertical output.

Another option is to query the INFORMATION_SCHEMA.TABLES meta data.

mysql> SELECT table_name,engine FROM INFORMATION_SCHEMA.TABLES WHERE table_schema=DATABASE();
+--------------+--------+
| table_name   | engine |
+--------------+--------+
| stats        | MyISAM |
+--------------+--------+
1 row in set (0.00 sec)

PlanetMySQL Voting: Vote UP / Vote DOWN

Sharding for the masses: Introducing the SPIDER storage engine (OpenSQLCamp @ FrOSCon)

Август 28th, 2009

This is the Sharding for the masses: Introducing the SPIDER storage engine by Giuseppe Maxia, given at OpenSQLCamp, at FrOSCon, in August 2009. These are somewhat live notes, and the slides are available too.

Why sharding? Scaling, of course. The MySQL way to solve this, is replication (even Yahoo! and Google use this).

When the master doesn’t have enough resources to cope with what you do (i.e. large data sets), replication chokes.

You can use proxies for sharding. There exists MySQL Proxy (can be programmed using a scripting language – Lua), HSCALE (built on top of MySQL Proxy), SpockProxy (a fork of MySQL Proxy, without LUA scripting, specialised for sharding), in the market these days. This however, is the single point of failure – everything has to pass through one proxy.

Enter SPIDER – a MySQL storage engine, built on top of the partitions engine. It associates a partition with a remote server, and is transparent to the user. Its developed by Kentoku Shiba.

Installation: Get 5.1.37 sources, then get the source code for Spider 1.0, and then get the patch for condition pushdown.

Why the condition pushdown patch? Remote server works less, by receiving the condition. The SPIDER engine without the condition pushdown patch is still fast, but it can be more than 10x faster with condition pushdowns.

http://dev.mysql.com/doc/refman/5.1/en/condition-pushdown-optimization.html (works with NDBCLUSTER), http://dev.mysql.com/doc/refman/5.4/en/condition-pushdown-optimization.html (works with MyISAM). The patch by Kentoku, will add cond_push and cond_pop, to ha_partition – so now, every storage engine that uses table partitioning can get condition pushdown through ha_partition.

You need to setup the engine first: http://datacharmer.org/downloads/spider_setup.sql (the SQL is also available in the DOCS).

spider_remote_employees.sql – use this in conjunction with http://launchpad.net/test-db/ – a good example of how to use the SPIDER storage engine.



PlanetMySQL Voting: Vote UP / Vote DOWN

RethinkDB all the rage today

Июль 29th, 2009

RethinkDB is all the rage today, as its a Y Combinator funded startup, which also launched a developer pre-alpha today. So what is RethinkDB you ask? Yet-another-MySQL-storage-engine, that’s what. But this time, its tuned for solid-state drives (SSDs), which also happen to be all the rage these days.

Anyway, check them out more, and the materials currently tell me that they’re using append-only algorithms, which allow for live schema changes and hot backups, with instantaneous recovery from power failure. Those are just some of the exciting bits.

What didn’t excite me so much was the fact that you were only getting 32-bit or 64-bit Linux binaries, built against MySQL 5.1.31 and you’ll just install it via the INSTALL PLUGIN option. But they are trying to get some semblance of a community growing, with their getting involved page, filled with some papers, as well as a support mailing list (I see Mark Callaghan is already busy asking them questions). And of course you can follow them on their blog, or on Twitter. All this without source ;-)

One of the developers also confirmed that they’re adding “features required by Wordpress so we could eat our own dogfood”. They haven’t started profiling (much yet?), and they’ve probably got ways to go on performance. Seems like “getting it working for WordPress”, is slowly becoming a good testing ground – Jeff Waugh did so for WordPress and Drizzle, too.

Anyway, it seems like its time to get some SSDs, as we start seeing things like this pop up. RethinkDB will also face another problem for mass adoption – how many hosting providers are using SSDs? Probably not many (if at all).

Have you tried RethinkDB? Your thoughts?