Archive for the ‘sharding’ Category

Words about PLUK2011

Октябрь 28th, 2011

Percona Live London 2011 is over and it was a wonderfull event. Thanks to Percona for this quality event in Europe.
And thanks to all speakers for their outstanding performances.

I just want to share with you some words about PLUK2011, these few words reflect my technical feelings after the event :
(This is the live transcript of my notes during the conference in random order)

  • MySQL is not dead !
  • Sharding is a good way for write scalability
  • It’s difficult to find a real HA solution for MySQL
  • MHA rocks !
  • IO are more than ever our enemies (and we must get into their privacy)
  • Where is Oracle ?!
  • Glad to see SkySQL there
  • We probably need new références for SSD
  • Keep in mind the operability of the proposed solutions
  • Handler socket is not ready for production
  • Cross schema queries are bad for Tungsten Replicator

I was really glad to meet (or see again) people and speak about MySQL and other stuff…
And very impressed to see so many people at the event.

See you next year !

PS : You can find some of the conferences slides on my public box.net in the front page


PlanetMySQL Voting: Vote UP / Vote DOWN

DbCharmer 1.7.0 Release: Rails 3.0 Support and Forced Slave Reads

Сентябрь 1st, 2011

This week, after 3 months in the works, we’ve finally released version 1.7.0 of DbCharmer ruby gem – Rails plugin that significantly extends ActiveRecord’s ability to work with multiple databases and/or database servers by adding features like multiple databases support, master/slave topologies support, sharding, etc.

New features in this release:

  • Rails 3.0 support. We’ve worked really hard to bring all the features we supported in Rails 2.X to the new version of Rails and now I’m proud that we’ve implemented them all and the implementation looks much cleaner and more universal (all kinds of relations in rails 3 work in exactly the same way and we do not need to implement connection switching for all kinds of weird corner-cases in ActiveRecord).
  • Forced Slave Reads functionality. Now we could have models with slaves that are not used by default, but could be turned on globally (per-controller, per-action or in a block). This is a new feature that brings our master/slave routing capabilities to a really new level – we could now use it for a really mission-critical models on demand and not be afraid of breaking major functionality of our applications by switching them to slave reads.
  • Lots of changes were made in the structure of our code and tests to make sure it would be much easier for new developers to understand DbCharmer internals and make changes in its code.

Along with the new release we’ve got a brand new web site. You can find much better, cleaner and, most importantly, correct documentation for the library on the web site. We’ll be adding more examples, will try to add more in-depth explanation of our core functions, etc.

If you have any questions about the release, feel free to ask them in our new mailing list: DbCharmer Users Group.

For more updates in our releases, you can follow @DbCharmer on Twitter.



PlanetMySQL Voting: Vote UP / Vote DOWN

Intra-query parallelism for MySQL queries without an appliance or closed source database

Май 25th, 2010
Over the weekend I spent a lot of time improving my new Shard-Query tool (code.google.com/p/shard-query) and the improvements can equate to big performance gains on partitioned data sets versus executing the query directly on MySQL.


I'll explain this graph below, but lower is better (response time) and Shard-Query is the red line.

MySQL understands that queries which access data in only certain partitions don't have to read the rest of the table. This partition elimination works well, but MySQL left a big optimization out of partitioning: getting data in parallel.

In fact, since partition elimination is the only major optimization provided by the partition options it isn't great for scaling access to large data sets when the entire data set must be accessed, but only when smaller parts of a the set are examined.

Since Shard-Query exploits parallelism with Gearman (http://www.gearman.org) I decided to extend the Shard-Query "optimizer" to support running queries with IN lists in parallel. This makes a query scale much further than it would if there was no parallelism at work.

Consider the table following partitioned fact table:
CREATE TABLE `fact` (
  `id` bigint(20) unsigned DEFAULT NULL,
  `a_id` bigint(20) unsigned DEFAULT NULL,
  `b_id` int(11) NOT NULL,
  `c_id` int(11) NOT NULL,
  `i1` tinyint(4) DEFAULT NULL,
  `qty` smallint(6) DEFAULT NULL,
  `score` decimal(10,10) DEFAULT NULL,
  `price` decimal(7,3) DEFAULT NULL,
  `i2` int(11) DEFAULT NULL,
  `i3` int(11) DEFAULT NULL,
  `wide_row` char(54) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (i1) PARTITIONS 100 */

The table is partitioned into 100 partitions, and there are 100 distinct values for i1. This means that all the values for a particular i1 are housed in a single partition.

Consider the following query:
select price*qty from fact where i1 in (1,2,3);


This query is semantically equivalent to:
select price*qty from fact where i1 = 1
UNION ALL
select price*qty from fact where i1 = 2
UNION ALL
select price*qty from fact where i1 = 3


Unfortunately, MySQL does not have any intra-query parallelism, so rewriting the query that way is not an effective scaling strategy. However, if you execute all three queries at the same time, and use a temporary table as the UNION ALL, you can actually get parallelism. This is what Shard-Query does. It can take each IN list item and assign it to a worker, and stuff the results back together at the end.

The second thing that can be done to improve performance at the partition level (or the shard level) is to push down aggregation of distributable aggregate functions to the worker. If you've read my blog before you might know that the distributable aggregate functions are SUM and COUNT.

Consider a query very much like the previous query:
select shard_col, sum(price * qty) from t1 where shard_col in (1,2,3) group by shard_col;


This query features aggregation with distributable functions. This query is semantically equivalent to the following:

This query is semantically equivalent to:
SELCT shard_col, SUM(`sum(price*qty)`) as `sum(price*qty)` 
from ( select shard_col, sum(price*qty) from t1 where shard_col = 1 group by shard_col
       UNION ALL
       select shard_col, sum(price*qty) from t1 where shard_col = 1 group by shard_col
       UNION ALL
       select shard_col, sum(price*qty) from t1 where shard_col = 1 group by shard_col
) GROUP BY shard_col;


Shard-Query can push the aggregation down to the shards and it sends each query which is part of the "UNION ALL" operation in parallel.

I ran a benchmark based on the above table with 40M rows in it. That is 40K rows per shard.
The benchmark queries follow the pattern:
select i1,sum(qty*price) from SAB_SF1.fact where i1 in(1) group by i1;
select i1,sum(qty*price) from SAB_SF1.fact where i1 in(1,2) group by i1;
select i1,sum(qty*price) from SAB_SF1.fact where i1 in(1,2,3) group by i1;
select i1,sum(qty*price) from SAB_SF1.fact where i1 in(1,2,3,4) group by i1;
...

All the way up to 100 values in the IN list. The table contains 400K rows for each c1 value, and all those rows are stored in a single partition.

Here is that graph again. For this test I used an EC2 "c1.large" instance. That is, 8 cores with 8 GB of CPU. I used a 4GB data set, Percona Server 10.2 and a 1GB buffer pool size. Each partition is approximately 32MB in size.

Since the machine has eight cores, I started eight Gearman workers. The results are, I think, impressive. In the graph, "partitions scanned" is the number of values in the IN list.



This performs very well due to the pushdown of the aggregation. If a non-distributable aggregate function were to be used, then performance would probably be worse than MySQL because all 40M rows would be accessed and copied into a temporary table.
PlanetMySQL Voting: Vote UP / Vote DOWN

I wrote a new tool that runs aggregation queries over MySQL sharded databases using Gearman.

Май 23rd, 2010
I created a new tool this week:
http://code.google.com/p/shard-query

As the name Shard-Query suggests, the goal of the tool is to run a query over multiple shards, and to return the combined results together as a unified query. It uses Gearman to ask each server for a set of rows and then runs the query over the combined set. This isn't a new idea, however, Shard-Query is different than other Gearman examples I've seen, because it supports aggregation.

It does this by doing some basic query rewriting based on the input query.

Take this query for example:
select c2, 
       sum(s0.c1), 
       max(c1) 
 from t1 as s0 
 join t1 using (c1,c2) 
 where c2 = 98818 
 group by c2;


The tool will split this up into two queries.

This first query will be sent to each shard. Notice that any aggregations, grouping or sorting has been stripped away. Any joins or where clauses are still present:
select   c2 as `c2` ,   
         (s0.c1) as `(s0.c1)` ,   
         (c1) as `(c1)`   
  from   t1 s0   
  join   t1   
 using   (c1,c2)   
 where   c2   =   98818;


Shard-Query supports two different methods of coalescing the results from each shard. The first is called 'fetch' which as it sounds, fetches the results through an array, and inserts them into a temporary table. This is less desirable than the default method, which is called store. When the store method is used, each worker uses bulk insert to insert rows directly into a destination table. The downside is that the table has to be a physical table, since it must be inserted into from multiple connections. The store method is also better than fetch because the rows are inserted into the table in parallel from each shard. The insertion is serialized with the fetch method, since the results are handled by a single threaded PHP script.

Either a temporary table, or a concrete table will be created to hold the rows from all the shards depending on the fetch method.
CREATE TEMPORARY TABLE -- not a temporary table when the method is store
         IF NOT EXISTS `aggregation_tmp_#14861474` 
(        `c2` BINARY(255), 
         `(s0.c1)` BINARY(255), 
         `(c1)` BINARY(255)
) ENGINE=INNODB;


Finally, after all the rows are inserted into that table, the final rewritten query is executed. Notice that the SELECT clause has been rewritten to accommodate the column names of the temporary table. Any FROM clause has been removed and replaced with a scan of the single table. Also notice the WHERE clause is not present, because the rows were filtered at each shard:
select (`c2`), 
       sum(`(s0.c1)`), 
       max(`(c1)`)  
  from `aggregation_tmp_#14861474`                 
 group by `c2`;


Finally, the result is output and then the table is dropped.
c2      sum(`(s0.c1)`)  max(`(c1)`)
-       -               -
98818   16              8


I hope you find it useful. Right now it works with MySQL using the plain old mysql_ class of functions, but I can abstract the database access out if somebody really wants me to. This could potentially be used for just about any database which supports SQL.
PlanetMySQL Voting: Vote UP / Vote DOWN

DbCharmer – Rails Can Scale!

Апрель 17th, 2010

Back in November 2009 I was working on a project to port Scribd.com code base to Rails 2.2 and noticed that some old plugins we were using in 2.1 were abandoned by their authors. Some of them were just removed from the code base, but one needed a replacement – that was an old plugin called acts_as_readonlyable that helped us to distribute our queries among a cluster of MySQL slaves. There were some alternatives but we didn’t like them for one or another reasons so we’ve decided to go with creating our own ActiveRecord plugin, that would help us scale our databases out. That’s the story behind the first release of DbCharmer.

Today, six months after the first release of the gem and we’ve moved it to gemcutter (which is now the official gems hosting) and we’re already at version 1.6.11. The gem was downloaded more than 2000 times. There are (at least) 10+ large users that rely on this gem to scale their products out. And (this is the most exciting) we’ve added tons of new features to the product.

Here are the main features added since the first release:

  • Much better multi-database migrations support including default migrations connection changing.
  • We’ve added ActiveRecord associations preload support that makes it possible to move eager loading queries to the same connection where your finder queries go to.
  • We’ve improved ActiveRecord’s query logging feature and now you can see what connections your queries executed on (and yes, all those improvements are colorized :-) ).
  • We’ve added an ability to temporary remap any ActiveRecord connections to any other connections for a block of code (really useful when you need to make sure all your queries would go to some non-default slave and you do not want to mess with all your models).
  • The most interesting change: we’ve implemented some basic sharding functionality in ActiveRecord which currently is being used in production in our application.

As you can see now DbCharmer helps you to do three major scalability tasks in your Rails projects:

  1. Master-Slave clusters to scale out your Rails models reads.
  2. Vertical sharding by moving some of your models to a separate (maybe even dedicated) servers and still keep using AR associations
  3. Horizontal sharding by slicing your models data to pieces and placing those pieces into different databases and/or servers.

So, If you didn’t check DbCharmer out yet and you’re working on some large rails project that is (or going to be) facing scalability problems, go read the docs, download/install the gem and prove them that Rails CAN scale!



PlanetMySQL Voting: Vote UP / Vote DOWN

Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine

Март 30th, 2010
The history tells that a single RDBMS node cannot handle tons of traffics on web system which come from all over the world, no matter how the database is tuned. MySQL has implemented a master/slave style replication built-in for long time, and it has enabled web applications to handle traffics using a scale-out strategy. Having many slaves has been suitable for web sites where most of traffics are reads. Thus, MySQL's master/slave replication has been used on many web sites, and is being used still.

However, when a site grow large, amount of traffic may exceed the replication's capacity. In such a case, people may use memcached. It's an in-memory, very fast and well-known KVS, key value store, and its read throughput is far better than MySQL. It's been used as a cache for web applications to store 'hot' data with MySQL as a back-end storage, as it can reduce read requests to MySQL dramatically.

While 1:N replication can scale read workload and memcached can reduce read requests, it cannot ease write load well. So, write traffic gets higher and higher when a web site becomes huge. On such web sites, a technique called "Sharding" has been used; it's a technique that the application choose an appropriate MySQL server from several servers.

In that way, MySQL+memcached has been a de-fact standard data store on huge web sites for long time.

Since web applications are getting larger still, especially on social media sites, write load is getting higher and higher as people communicate in real-time. In such area, yet another technique is required to handle the write load. Then, some people have chosen NoSQL solutions instead of MySQL+memcached. NoSQL is a kind of buzz word, IMHO, which represents non-relational databases which doesn't require SQL access. Despite lack of SQL access, some NoSQL softwares are suitable for huge scale web applications, like Cassandra. Although people cannot JOIN records on NoSQL system, it is not possible on RDBMS over the shards as well. So, MySQL isn't used as a RDBMS, is used as a data store without joins in other words, on such a web application in the first place.

For further information of this kind of thoughts, I recommend you to read Mark Calleghan's post: http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html
and this post: http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king

Technically, it is possible to handle huge amount of traffics using MySQL, but a running cost gets expensive, Twitter says. As these techniques are separate ones, so those people have to spent their time to learn all of three who implement the application over them and manage them. On the other hand, Cassandra can handle more traffics as a single database management system, so people only have to learn it instead of three. Sounds great? But, is it a really good choice?

No! They're not aware of yet another solution, say SPIDER storage engine!

SPIDER for MySQL
http://spiderformysql.com/

SPIDER is a storage engine developed by a Japanese MySQL hacker, Mr. Kentoku Shiba, it makes use of MySQL's partitioning functionality and store partitioned data onto remote servers. I may say it's a Sharding storage engine. While flexibility of MySQL's storage engine API enables such an engine, but I value Kentoku's design a lot.

The following picture depicts how SPIDER storage engine works. (This is a snippet from the site above.)
In this entry, I do not explain how to use SPIDER storage engine, but I tell you how great its ability is. If you want to try it out, please refer to Giuseppe Maxia's post.

Please look at the following graph, which represents an INSERT performance comparing a single MySQL server (InnoDB), 2 SPIDER node + 2 backend MySQL server and 4 SPIDER node + 4 backend MySQL Server. You can see how good it scales.
The next graph is a SELECT performance. Read scales pretty good as well.
Red circles indicate where working set sizes exceed memory sizes. While performance drops when a working set size exceeds the available memory size, SPIDER is able to expand the memory so that a working set fits in it. SPIDER can make use of memory on all remote servers, as if there is a huge buffer pool in total.

For more information about SPIDER's performance test, please refer to Kentoku's slide. It's surprising.

The most significant problem for twitter is to scale out read/write load with less running cost. Unfortunately, they had chosen NoSQL solution due to the fact that "MySQL replication + memcached + sharding" cannot handle write intensive workload well. However, such a problem can be resolved using SPIDER storage engine with MySQL!

Generally, KVS cannot solve certain problems like below:
  • JOIN
  • Sort (ORDER BY)
  • Aggregation (GROUP BY)
When using KVS, these problems can be handled using MapReduce, however, we can process the same task using a very simple SQL in general. Thus, SQL allows us to develop a complex logic very efficiently. When I ask Kentoku permission to write an article about his storage engine, he told me his philosophy like below:
I think that the most significant benefit to use RDB is its usefulness and flexibility. It is a very important characteristic for developers in order to keep the application competitive, especially for those developers who have to add new features/functionalities day by day, like web services. I develop SPIDER storage engine in order to provide developers such useful and flexible RDB's characteristics, even on the environment where the traffic and data is huge thus Sharding is required.
I 100% agree with his opinion. If you are facing the problem caused by high traffic and huge data just like twitter, please consider to use SPIDER storage engine before migrating to NoSQL solutions.

PlanetMySQL Voting: Vote UP / Vote DOWN

Spider and vertical partition engines with new goodies

Октябрь 15th, 2009

sharding for the masses

The Spider storage engine should be already known to the community. Its version 2.5 has recently been released, with new features, the most important of which is that you can execute remote SQL statements in the backend servers. The method is quite simple. Together with Spider, you also get an UDF that executes SQL code in a remote server. You send a query with parameters saying how to connect to the server, and check the result (1 for success, 0 for failure). If the SQL involves a SELECT, the result can be sent to a temporary table. Simple and effective.

In addition to the Spider engine, Kentoku SHIBA has also created the vertical partitioning engine. Instead of splitting tables by record, you split them by columns. You can define a table with column A and column B, with primary key K, and another table with column C and column D, with primary key K. The vertical partition engine allows you to define a table with columns K, A, B, C, D, which looks to the user like a regular column. The backend tables can be of any engine.
There is a MySQL University session about the Spider and VP engines today at 15:00 CEST. Free attendance!

PlanetMySQL Voting: Vote UP / Vote DOWN

Video: The ScaleDB shared-disk clustering Storage Engine for MySQL

Сентябрь 23rd, 2009

Mike Hogan, CEO of ScaleDB spoke at the Boston MySQL User Group in September 2009:

ScaleDB is a storage engine for MySQL that delivers shared-disk clustering. It has been described as the Oracle RAC of MySQL. Using ScaleDB, you can scale your cluster by simply adding nodes, without partitioning your data. Each node has full read/write capability, eliminating the need for slaves, while delivering cluster-level load balancing. ScaleDB is looking for additional beta testers, there is a sign up at http://www.scaledb.com.

Slides are online (and downloadable) at http://www.slideshare.net/Sheeri/scale-db-preso-for-boston-my-sql-meetup-92009

Watch the video online at http://www.youtube.com/watch?v=emu2WfNx4KA or directly embedded here:


PlanetMySQL Voting: Vote UP / Vote DOWN

How to generate per-database traffic statistics using mk-query-digest

Сентябрь 17th, 2009

We often encounter customers who have partitioned their applications among a number of databases within the same instance of MySQL (think application service providers who have a separate database per customer organization … or wordpress-mu type of apps). For example, take the following single MySQL instance with multiple (identical) databases:

SHOW DATABASES;
+----------+
| Database |
+----------+
| db1      |
| db2      |
| db3      |
| db4      |
| mysql    |
+----------+

Separating the data in this manner is a great setup for being able to scale by simply migrating a subset of the databases to a different physical host when the existing host begins to get overloaded. But MySQL doesn’t allow us to examine statistics on a per-database basis.

Enter Maatkit.

There is an often-ignored gem in Maatkit’s mk-query-digest, and that is the –group-by argument. This can be used to aggregate information by tables, hosts, users, or databases (full documentation is available via perldoc).

%> perl mk-query-digest --limit 100% --group-by db slow.log
...
# Rank Query ID Response time Calls R/Call Item
# ==== ======== ============= ===== ====== ====
#    1 0x       6000 60.0%    6000  0.5124 db3
#    2 0x       2000 20.0%    2000  0.0112 db1
#    3 0x       1500 15.0%    1500  0.1665 db2
#    4 0x        500  5.0%     500  0.0022 db4

So here, we can see that the majority (60%, to be exact) of execution time is spent in db3. If the server is reaching it’s capacity and the next most useful performance optimization is to migrate a database to a different server, you know exactly which database to move (db3) and how much room that will give you on the original host (60% growth) and on the new host (40% growth), which may have a direct bearing on your hardware selection.

Let Baron know how awesome you think this is by getting him a gift from his Amazon Wish List!


Entry posted by Ryan Lowe | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Sharding for the masses: Introducing the SPIDER storage engine (OpenSQLCamp @ FrOSCon)

Август 28th, 2009

This is the Sharding for the masses: Introducing the SPIDER storage engine by Giuseppe Maxia, given at OpenSQLCamp, at FrOSCon, in August 2009. These are somewhat live notes, and the slides are available too.

Why sharding? Scaling, of course. The MySQL way to solve this, is replication (even Yahoo! and Google use this).

When the master doesn’t have enough resources to cope with what you do (i.e. large data sets), replication chokes.

You can use proxies for sharding. There exists MySQL Proxy (can be programmed using a scripting language – Lua), HSCALE (built on top of MySQL Proxy), SpockProxy (a fork of MySQL Proxy, without LUA scripting, specialised for sharding), in the market these days. This however, is the single point of failure – everything has to pass through one proxy.

Enter SPIDER – a MySQL storage engine, built on top of the partitions engine. It associates a partition with a remote server, and is transparent to the user. Its developed by Kentoku Shiba.

Installation: Get 5.1.37 sources, then get the source code for Spider 1.0, and then get the patch for condition pushdown.

Why the condition pushdown patch? Remote server works less, by receiving the condition. The SPIDER engine without the condition pushdown patch is still fast, but it can be more than 10x faster with condition pushdowns.

http://dev.mysql.com/doc/refman/5.1/en/condition-pushdown-optimization.html (works with NDBCLUSTER), http://dev.mysql.com/doc/refman/5.4/en/condition-pushdown-optimization.html (works with MyISAM). The patch by Kentoku, will add cond_push and cond_pop, to ha_partition – so now, every storage engine that uses table partitioning can get condition pushdown through ha_partition.

You need to setup the engine first: http://datacharmer.org/downloads/spider_setup.sql (the SQL is also available in the DOCS).

spider_remote_employees.sql – use this in conjunction with http://launchpad.net/test-db/ – a good example of how to use the SPIDER storage engine.



PlanetMySQL Voting: Vote UP / Vote DOWN