Archive for the ‘partitioning’ Category

Calpont opens up: InfiniDB Open Source Analytical Database (based on MySQL)

Октябрь 27th, 2009
Open source business intelligence and data warehousing are on the rise!

If you kept up with the MySQL Performance Blog, you might have noticed a number of posts comparing the open source analytical databases Infobright, LucidDB, and MonetDB. LucidDB got some more news last week when Nick Goodman announced that the Dynamo Business Intelligence Corporation will be offering services around LucidDB, branding it as DynamoDB.

Now, to top if off, Calpont has just released InfiniDB, a GPLv2 open source version of its analytical database offering, which is based on the MySQL server.

So, let's take a quick look at InfiniDB. I haven't yet played around with it, but the features sure look interesting:

  • Column-oriented architecture (like all other analytical database products mentioned)

  • Transparent compression

  • Vertical and horizontal partitioning: on top of being column-oriented, data is also partitioned, potentially allowing for less IO to access data.

  • MVCC and support for high concurrency. It would be interesting to see how much benefit this gives when loading data, because this is usually one of the bottle necks for column-oriented databases

  • Support for ACID/Transactions

  • High performance bulkloader

  • No specialized hardware - InfiniDB is a pure software solution that can run on commidity hardware

  • MySQL compatible


The website sums up a few more features and benefits, but I think this covers the most important ones.

Calpont also offers a closed source enterprise edition, which differs from the open source by offering support for multi-node scale-out support. By that, they do not mean regular MySQL replication scale-out. Instead, the enterprise edition features a true distributed database architecture which allows you to divide incoming requests across a layer of so-called "user modules" (MySQL front ends) and "performance modules" (the actual workhorses that partition, retrieve and cache data). In this scenario, the user modules break the queries they recieve from client applications into pieces, and send them to one or more performance modules in a parallel fashion. The performance modules then retrieve the actual data from either their cache, or from the disk, and sends those back to the user modules which re-assemble the partial and intermediate results to the final resultset which is sent back to the client. (see picture)
shared-disk-arch-simple
Given the MySQL compatibility and otherwise similar features, I think it is fair to compare the open source InfiniDB offering to the Infobright community edition. Interesting differences are that InfiniDB supports all usual DML statements (INSERT, DELETE, UPDATE), and that InfiniDB offers the same bulkloader in both the community edition as well as the enterprise edition: Infobright community edition does not support DML, and offers a bulk loader that is less performant than the one included in its enterprise edition. I have not heard of an InfoBright multi-node option, so when comparing the enterprise edition featuresets, that seems like an advantage too in Calpont's offering.

Please understand that I am not endorsing one of these products over the other: I'm just doing a checkbox feature list comparison here. What it mostly boils down to, is that users that need an affordable analytical database now have even more choice than before. In addition, it adds a bit more competition for the vendors, and I expect them all to improve as a result of that. These are interesting times for the BI and data warehousing market :)

PlanetMySQL Voting: Vote UP / Vote DOWN

Spider and vertical partition engines with new goodies

Октябрь 15th, 2009

sharding for the masses

The Spider storage engine should be already known to the community. Its version 2.5 has recently been released, with new features, the most important of which is that you can execute remote SQL statements in the backend servers. The method is quite simple. Together with Spider, you also get an UDF that executes SQL code in a remote server. You send a query with parameters saying how to connect to the server, and check the result (1 for success, 0 for failure). If the SQL involves a SELECT, the result can be sent to a temporary table. Simple and effective.

In addition to the Spider engine, Kentoku SHIBA has also created the vertical partitioning engine. Instead of splitting tables by record, you split them by columns. You can define a table with column A and column B, with primary key K, and another table with column C and column D, with primary key K. The vertical partition engine allows you to define a table with columns K, A, B, C, D, which looks to the user like a regular column. The backend tables can be of any engine.
There is a MySQL University session about the Spider and VP engines today at 15:00 CEST. Free attendance!

PlanetMySQL Voting: Vote UP / Vote DOWN

Partitioning with non integer values using triggers

Сентябрь 15th, 2009
Looking at Bug#47310, which is a feature request that I hear frequently when I talk about partitions, I wrote a comment, suggesting triggers to work around the limitation.
The reason for the limitation is that allowing arbitrary functions for partitioning was too complex and it was provoking crashes and other unpleasant side effects (see the discussion under bug#18198).
But if you use a trigger, the resulting column is a plain integer, and many of the side effects disappear. The drawback is that you need to add a column to your table, and you need to use that column when searching for data. With that in mind, you can implement the workaround quite easily.

USE test;
DROP TABLE IF EXISTS users;

CREATE TABLE users (
user_id int(10) NOT NULL,
username varchar(25) DEFAULT NULL,
dummy INT not null,
PRIMARY KEY (user_id, dummy),
UNIQUE KEY username(username,dummy)
) ;

CREATE TRIGGER users_bi
BEFORE INSERT ON users
FOR EACH ROW
SET NEW.dummy = ASCII(LOWER(LEFT(NEW.username,1)));

ALTER TABLE users PARTITION BY RANGE (dummy) (
PARTITION p0 VALUES LESS THAN (96), #being f
PARTITION p1 VALUES LESS THAN (109), #being m
PARTITION p2 VALUES LESS THAN (115), #being s
PARTITION p3 VALUES LESS THAN (122) #being z
);

INSERT INTO users (user_id, username)
VALUES (1,'Joe'), (2,'Sam'),(3,'Abe'),(4,'Rich');

EXPLAIN PARTITIONS SELECT * FROM users
where username = 'Abe';
# This simple query doesn't use partition pruning.
# This is to be expected.

EXPLAIN PARTITIONS SELECT * FROM users
where dummy = ASCII('a') and username = 'Abe';
# Here, the partition pruning kicks in, at the price of an extra
# condition in the query.

PlanetMySQL Voting: Vote UP / Vote DOWN

Enterprise Software as a Service (SaaS) and Partitioning

Сентябрь 2nd, 2009

 

Partitioning, new with MySQL 5.1, has complicated interactions with queries and indexes.  If one isn’t careful it is easy to degrade performance.   For example, select queries that go with that grain (queries where partition elimination occurs) can be much quicker, but select queries that go against that grain can be much slower.   Queries that go against the grain must query each partition, so for a table with 12 partitions,  one query against that  table can result in 12 queries, one against each of the partitions.  An example of this would be a query against a month partitioned table that is looking to see how much activity a product had in the past 12 months. 

The ideal partitioning scheme would be a system where all queries only needs to access data from one partition.  This describes enterprise software deployed as a service where multiple enterprise tenants all exist within one database .  As one enterprise tenant (think of a company like a bank, manufacturing firm, or retailer, not a consumer using facebook or twitter)  only queries their own data, the enterprise tenantId provides an ideal grain on which divide up the data.  This means each table that has tenant specific data must have a tenantId.  A Sale table for a multi-tenant database would look like this:

 

CREATE TABLE  Sale (
   tenantId int NOT NULL,
   orderId int NOT NULL,
   customerId int NOT NULL,
   productId int NOT NULL,
   unit int NOT NULL,
   purchaseAmount decimal(16,2) NOT NULL,
   purchaseCost decimal(16,2) NOT NULL,
   purchaseDate datetime NOT NULL,
   PRIMARY KEY (tenantId, orderId),
   KEY idx_sale_product (productId),
   KEY idx_sale_customer (customerId),
   KEY idx_sale_purchaseDate (purchaseDate)
)  PARTITION BY LIST(tenantId) (
   PARTITION t1 VALUES IN (1) ENGINE=InnoDB,
   PARTITION t2 VALUES IN (2) ENGINE=InnoDB,
   PARTITION t3 VALUES IN (3) ENGINE=InnoDB,
   PARTITION t4 VALUES IN (4) ENGINE=InnoDB,
   PARTITION t5 VALUES IN (5) ENGINE=InnoDB)

 

If you are using InnoDB, an alternative to partitioning by tenant is to create clustered indexes by tenantId.  Before MySQL had partitioning, this was a good way to implement a multi-tenant database.  If you are curious about this type of solution you can find more here:

http://dbscience.blogspot.com/2008_07_01_archive.html

Both partitioning by tenant and using InnoDB clustered indexes as in the above article are roughly going to perform the same for large data volumes.  

The advantage that partitioning  provides is on administrative tasks like server splits.  When there are too many tenants on one server and a split needs to occur the stressed out database data can be replicated to another server.  After the replication there will be two servers, each with roughly half of the tenants inactive.  Instead of slow and hardware consuming mass delete of now inactive tenants on a server the inactive partitions can be dropped in a second.  While the server split is still painful, this makes the reallocation of tenants across servers easier and the system is fully available far earlier. 

There there are the other administrate benefits with partitioning, such as dropping the data for an inactive tenant quickly.  

A downside is that you have to keep the partitioning list or range current as new tenants are added.  You will probably want to pre-allocate tenant partitions to avoid having to add partitions at the last moment. 

However, be aware of the partitioning limitations, such as only 1024 partitions per table.  This means only 1024 tenants per database, so if you store more than 1024 tenants in one database you will want to combine multiple tenants into one partition. 

If you expect to overwhelm a single database server, and if you are developing enterprise software as a service that is very possible as even simple enterprise applications seem to generate terabytes of data these days, you should strongly considering partitioning tables by the tenant.  


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Sandbox and Spider at FrOSCon and OpenSQLCamp

Август 21st, 2009

MySQL Spider

FrOSCon and the OpenSQLCamp are about to start.
I am packing for Sankt Augustin, where I will attend the fourth edition of FrOSCon and the second OpenSQLCamp. I will have two sessions, Sharding for the masses, about the Spider storage engine and MySQL Sandbox 3, about one of my favorite tools.

The program is very rich. There will be several tracks in the main event and in the associated conferences. If you have any involvement or simply some curiority in open source matters, You will find something interesting at FrOSCon.

PlanetMySQL Voting: Vote UP / Vote DOWN

Why you don’t want to shard.

Август 6th, 2009

Note: This blog post is part 1 of 4 on building our training workshop.

The Percona training workshop will not cover sharding. If you follow our blog, you’ll notice we don’t talk much about the subject; in some cases it makes sense, but in many we’ve seen that it causes architectures to be prematurely complicated.

So let me state it: You don’t want to shard.

Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine. The reason you need to shard basically comes down to one of these two reasons:

  1. Very large working set – The amount of memory you require to keep your frequently accessed data loaded exceeds what you can (economically) fit in a commodity machine. 5 years ago this was 4GB, today it is 128GB or even 256GB.  Defining “working set” is always an interesting concept here, since with good schema and indexing it normally doesn’t need to be the same size as your entire database.
  2. Too many writes – Either the IO system, or a slave can’t keep up with the amount of writes being sent to the server.  While the IO system can be improved with a RAID 10 controller w/battery backed write cache, the slave delay problem is actually very hard to solve. Maatkit has a partial-solution (via Paul Tuckfield), but it doesn’t work for all workloads.

(Yes, I am simplifying some of the scalability issues with MySQL on big machines, but I have faith that Yasufumi is making this better).

What types of Sharding are there?

Despite my cautions, if you have established that you need to shard there are quite a few options available to you:

  1. Sharding Partitioning by Application Function – This is the usually the best way to fix any of the problems mentioned above. What you do is pick a few very busy tables, and move them onto their own MySQL server.  Partition-by-function keeps the architecture still simple, and should work for most cases unless you have a single table which by itself can’t fit into the above constraints.
  2. Sharding by hash or key – This method works by picking a column on a table and try and divide up your data based on it.  You can choose any column to hash on, you just need to make sure that it will equally distribute the data equally. In practice this method can be really hard to get working right, since even if each shard has the same amount of ‘customers’, demanding users tend to by far exceed average users and some servers are overloaded while others are not.

    (Tip: There are a few famous cases of both (a) bad hashing algorithms and (b) users becoming unequal all of the sudden;  You don’t want to shard based on the first character of a username – as there will be a lot more ‘M’ than ‘Z’.  For users becoming unequal all of the sudden, it’s always interesting to think of what scaling challenges Flickr would have had for the official Obama photographer in the lead up to the 08 election.)

  3. Sharding via a Lookup Service - This method works by having some sort of directory service which you query first to ask “what shard number will this users data exist on?”.  It’s a highly scalable architecture, and once you write scripts to be able to migrate users to/from shards you can tweak and rebalanced to make sure that all your hardware is utilized efficiently.  The only problem with this method is what I stated at the start: it’s complicated.

(Note: I’ve left out some of the more complicated sharding architectures.  For example; another solution is to have shards all store fragments of data, and to cross backup those fragments across shards.)

Why is it so complex?

The reason it’s complex comes down to two reasons:

  1. The application developer has to write more code to be able to handle sharding logic (this is actually lessened with projects such as HiveDB.)
  2. Operational issues become more difficult (backing up, adding indexes, changing schema).

I think that a lot of people remember (1), but (2) can be a real pain point.  It can take a lot of work to build an application that works correctly when you are rolling through an upgrade where the schema will not be the same on all nodes.  A lot of these tasks remain only semi-automated, so from an operations perspective there can often be a lot more work to be done.

This concludes Part 1 – I hope I’ve justified why we are not covering sharding.  In Part 2, I will write about something that is going to be in the course – “XtraDB: The top 10 enhancements”, and in Part 3 “XtraDB: The top 10 parameters”.


Entry posted by morgan | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Partition by column_list ready for alpha testers

Август 4th, 2009
I got time to spend on a really old worklog I completed coding
already october 2005. I blogged about it in July 2006 and
interestingly enough it's still the second most read blog
entry on my blog (probably related to search engines in some
way).

I have merged it with the azalea tree (this is an internal code
name for our development tree, name is likely to change). This
tree contains subquery optimisations, Batched join and some more
optimisations.

I have fixed a whole bunch of bugs that always shows up in early
code. The code quality is still alpha but at least you won't find
10 bugs per hour :)


Here
you can find the launch pad tree for this code.

There are two important additions made possible by this tree.
1) New function to_seconds that is recognized by range optimiser
to enable partition pruning when partitioning like:
partition by range (to_seconds(time))
2) New partitioning functionality that makes it possible to
perform partition pruning over multiple fields.

Most of the bugs I have fixed had to do with this partition pruning
of multiple fields. The routine to discover which partitions are
needed is called find_used_partitions (in sql/opt_range.cc) and this
function is called recursively over a key tree. A key tree can be
very complex and more or less have AND of key parts using next_key_part
pointer and OR condition using left and right pointers. These left and
right pointers can however show up a little here and there in the tree
so one has to be very careful about how variables are assigned, saved
and restored. I havent' worked so much with recursive functions so this
is an interesting adventure.

Here's my latest addition of a test case to give you an idea of how it
works and also what works right now.

create table t1 (a int, b char(10), c varchar(5), d int)
partition by range column_list(a,b,c)
subpartition by key (c,d)
subpartitions 3
( partition p0 values less than (column_list(1,'abc','abc')),
partition p1 values less than (column_list(2,'abc','abc')),
partition p2 values less than (column_list(3,'abc','abc')),
partition p3 values less than (column_list(4,'abc','abc')));

insert into t1 values (1,'a','b',1),(2,'a','b',2),(3,'a','b',3);
insert into t1 values (1,'b','c',1),(2,'b','c',2),(3,'b','c',3);
insert into t1 values (1,'c','d',1),(2,'c','d',2),(3,'c','d',3);
insert into t1 values (1,'d','e',1),(2,'d','e',2),(3,'d','e',3);
select * from t1 where (a = 1 AND b < 'd' AND (c = 'b' OR (c = 'c' AND d = 1)) OR
(a = 1 AND b >= 'a' AND (c = 'c' OR (c = 'd' AND d = 2))));

So in the above select statement we are performing partition
pruning over 3 fields and subpartition pruning over 2 fields
and there are 5 different ranges in the query.

So please go ahead and try this new tree out and see if it
works for you.