Archive for the ‘MySQL Cluster’ Category

Who will win a MySQL Cluster 7.2 shirt today??

Март 12th, 2012

Someone will win a limited edition and very rare MySQL Cluster 7.2 shirt today

Someone will win this shirt at the North Texas MySQL User Group meeting on Monday March 12th

at the North Texas MySQL Users Group Meeting. Many other prizes including thumb drives, buttons, and lanyards plus we will have pizza. Please RSVP to make sure we get enough food!

The presentation will be on MySQL 5.6 + 20 MySQL Tips in 20 Minutes!

Date: Monday, March 12, 2012 Remind Me
Time: 4:45 PM – 7:00 PM
Oracle Corporation
6031 Connection Drive
Room 900
Irving, TX



PlanetMySQL Voting: Vote UP / Vote DOWN

Getting the best performance out of MySQL Cluster – White Paper

Март 7th, 2012

There are a number of resources available to help get you up and running with MySQL Cluster so that you can start experimenting – including the quick start guides from the download page but what happens when you get further down the line and need to get the best performance from your Cluster? One useful source is the Guide to Optimizing Performance of the MySQL Cluster Database which covers how best to configure MySQL Cluster as well as what you should be doing with your application and schema to get the best performance.

Any feedback on this white paper would be greatly appreciated – whether it be a recommendation that you found gave good results, something that didn’t work or a killer tip that we missed out.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Scaling breakfast seminar – London, April 25th

Февраль 28th, 2012

I’ll be presenting on/demoing MySQL Cluster 7.2 at this free breakfast seminar in Oracle’s London office on 25th April – starting with coffee at 9:00 and ending with lunch at 13:00 (quite a generous take on “breakfast”!). Space is limited and so if you would like to attend then register early here.

As well as MySQL Cluster there will be sessions on optimising MySQL Server for performance and scaling and Oracle’s roadmap for cloud deployment.

Full agenda:

09:00 Registration and Welcome Coffee
09:30 Introduction
Simon Deighton, MySQL Sales Manager
09:45 MySQL Database: Performance & Scalability Optimizations
Tony Holmes, Principal PreSales Consultant
10:45 Coffee/Tea Break
11:00 Performance & Scalability with MySQL Cluster 7.2
Mat Keep, Senior Product Marketing Manager & Andrew Morgan, Senior Product Manager
12:00 The MySQL Roadmap: Discover What’s Next For On-Premise & Cloud-Based Deployments
Tony Holmes, Principal PreSales Consultant
12:45 Q&A
13:00 Light lunch buffet and end of seminar

 


PlanetMySQL Voting: Vote UP / Vote DOWN

Connector/J extension points – Load Balancing Strategies

Февраль 25th, 2012

A fourth and final Connector/J extension point I covered in my JavaOne and Silicon Valley Code Camp presentations is load-balancing strategies.  This exists in order to allow you to define behavior for balancing load across multiple back-end MySQL server instances.  MySQL Connector/J’s load-balancing implementation is a simple internal connection pool.  What appears to your application as a single Connection object can actually have multiple physical connections to MySQL servers underneath (one per configured host/port pair).  At specific points, Connector/J will re-balance and choose another host to interface with.  This extension point allows you to define how Connector/J determines which host it should pick next.

Unlike the previous extension points, my demo code does not contain examples of this.  In this case, though, there are some standard implementations provided with Connector/J that we can look at, instead.  The two implementations shipped with Connector/J today implement a “best response time” strategy and a “random” strategy.  The default behavior when using load-balanced deployments is “random”, and the Connector/J configuration properties documentation describes their use cases.  Below are the actual classes which define the behavior:

  • com.mysql.jdbc.RandomBalanceStrategy
  • com.mysql.jdbc.BestResponseTimeBalanceStrategy

These classes – and any user-implemented load-balancing strategy – implements the com.mysql.jdbc.BalanceStrategy interface.  As the purpose of this extension point is to define how load is balanced across MySQL instances, there’s really only one key method you need to focus on:  pickConnection().  The purpose of this method is to return a Connection (more specifically, a com.mysql.jdbc.ConnectionImpl object).  Looking at the RandomLoadBalacneStrategy code, you will see the logic that’s involved.

The first parameter to pickConnection() is a LoadBalancingConnectionProxy object.  This is the object that does much of the load-balancing work.  It also contains a few callback methods you will want to consider:

  1. getGlobalBlacklist() – this method returns a Map<String, Long> of hosts that have been identified as unavailable.  The String key is the host/port, while the Long is the time that the blacklist entry should expire.  Inside the proxy, this global blacklist is defined as a static Map, meaning that Host X will be found in the blacklist by one Connection if another Connection object put it there after experiencing problems.  Access to the static variable is synchronized, and the Map returned from this method is a local copy.
  2. shouldExceptionTriggerFailover() – this method takes a SQLException and determines whether such an Exception should trigger a failover.  This, too, is user-configurable, although the defaults are usually sufficient for most deployments.  A previous post contains detailed information on how to customize this behavior.
  3. addToGlobalBlacklist() – this is the method you want to call if you want to add a host to the global blacklist.
  4. createConnectionForHost() – this is a utility method that handles creation of a new ConnectionImpl object based on the host/port String, so that you don’t have to wire up ConnectionImpl objects directly.  If you look at the method implementation, you will see the work that goes into setting up a properly-configured ConnectionImpl.

The remaining parameters are, in order:

  • List<String> – a list of configured hosts involved in load-balancing
  • Map<String, ConnectionImpl> – a Map of “live” connections already established, accessed through the host/port key.  Thinking of this as a connection pool, these are the cached connections which can be reused if the host/port pair is chosen, instead of doing the additional work of setting up a new physical connection.
  • long[] – an array of response times in the same order as the List<String> of configured hosts.  This is used in BestResponseTimeBalanceStrategy.
  • int – number of retries that should be attempted before giving up on finding a new connection.

So, what can you do with this?  People frequently ask for a true round-robin load-balancer.  Our experience has been that RandomBalanceStrategy is far better, but if you really need a true round-robin load-balancing algorithm, you could implement it here.

 


PlanetMySQL Voting: Vote UP / Vote DOWN

Deploy MySQL Cluster 7.2 GA in 288 seconds

Февраль 23rd, 2012
It seems that our friends at Oracle have been pretty busy with the GA release of MySQL Cluster 7.2.

This is not just old wine in new bottles.

While it may be a dot release, it does appear to be a more significant step forward than a dot release would imply.

First off, we are very excited to announce that the Severalnines Cluster Configurator now includes support for 7.2 GA.

As the title of this blog suggests, it is possible, as we have experienced, to deploy 7.2 GA in 288 seconds, i.e. just a few minutes. This was done on m1.large instances on Amazon. We have published a quick how-to deployment guide for Amazon here: http://support.severalnines.com/entries/20916823-installing-on-amazon-ec2.

This being said, let’s have a look at some of the new features in 7.2 and why they matter.

MySQL Cluster has always suffered from poor JOIN performance, mainly because the nested loop algorithm did not work very well in a parallel environment. That meant the join was executed entirely in the MySQL Server. This resulted in a lot of network round-trips between the MySQL Server and the data nodes.

Now with 7.2, this has changed. With Adaptive Query Localization, a JOIN query is now shipped down to the data nodes and executed in parallel on local copies of data.


Results look very promising, with up to 70x factor improvement on some queries as described in this excellent post.

By getting better index statistics, the SQL optimizer can also build a better execution plan for each query. This in turn reduces the need for manual query tuning. Prior to 7.2, a lot of index hinting was needed. However, 70x speed improvement does not mean that a single query on MySQL Cluster will perform as fast as on InnoDB, since there will always be the network latency between SQL and Data nodes. But it does mean that many queries that were running very slowly in 7.1 might now run fast enough.

The Memcached API is a new addition to an already extensive list of API connectors, aimed at existing Memcached users and the NoSQL community. For a product trying to be attractive to the NoSQL community, perhaps the addition of distributed parallel JOIN algorithms may be a bit confusing. NoSQL programmers typically avoid JOIN operations, for reasons including scalability and schema flexibility.

Is someone trying to have their cake and eat it too? :-)

Among the other features described here, the separation of Data Nodes across data centers (and specially Figure 3) is misleading. With a 2-phase commit protocol to guarantee synchronous replication within a Node Group, NDB would have to defy the laws of physics to support the below diagram. Unless the data centers are physically close to each other with a low-latency redundant link in-between them, splitting of data nodes across data centers is not a good idea.

It is also worth noting that 7.2 GA is based on MySQL 5.5, whereas 7.1 was based on MySQL 5.1 – probably more relevant for users with mixed NDB and InnoDB tables.

So: is MySQL Cluster 7.2 for you?

If yes, then try it out using the Severalnines Cluster Configurator: http://www.severalnines.com/cluster-configurator/).

It will take just a few minutes to generate a deployment package, which automates the installation of the cluster. Or it might take 288 seconds ... let us know ;-)

PlanetMySQL Voting: Vote UP / Vote DOWN

1 Billion queries per minute and much more – free webinar on MySQL Cluster 7.2 GA

Февраль 15th, 2012
1 Billion queries per minute with MySQL Cluster

1 Billion queries per minute with MySQL Cluster

Oracle announced the General Availability of MySQL Cluster 7.2 today. Join this live webinar to learn about what’s new in the production-ready, GA release of MySQL Cluster 7.2, enabling the latest generation of web and telecoms applications to take advantage of high write scalability, SQL and NoSQL interfaces and 99.999% availability, including:

  • Performance enhancements delivering 1 billion queries per minute, using just 8 data nodes
  • 70x higher JOIN performance with Adaptive Query Localization, enabling real-time analytics across live data sets
  • New NoSQL API via Memcached, creating a persistent, key-value datastore for schema and schemaless data
  • Auto-sharding across data centers with synchronous replication for scaling of highly available, global services
  • Simplified ease-of-use with new options for on-premise and cloud deployments
  • Integration with the latest MySQL 5.5 GA release

The webinar takes place on Thursday 23rd February at 09:00 PST, 17:00 GMT, 18:00 CET. Mat Keep and I will be presenting.

As always, the webinar is free but you’ll need to register here in advance - even if you can’t make the live event, this will make sure that you get emailed a link to the recording.


PlanetMySQL Voting: Vote UP / Vote DOWN

1 Billion Queries Per Minute – MySQL Cluster 7.2 is GA!

Февраль 15th, 2012

1 Billion queries per minute with MySQL Cluster

1 Billion queries per minute with MySQL Cluster

Oracle have just announced that MySQL Cluster 7.2 is now GA and available for production deployments.

Amongst the highlights for the release are:

  • Performance enhancements delivering 1 billion queries per minute, using just 8 data nodes
  • 70x higher JOIN performance with Adaptive Query Localization, enabling real-time analytics across live data sets
  • New NoSQL API via Memcached, creating a persistent, key-value datastore for schema and schemaless data
  • Auto-sharding across data centers with synchronous replication for scaling of highly available, global services
  • Simplified ease-of-use with new options for on-premise and cloud deployments
  • Integration with the latest MySQL 5.5 GA release

You can find more of the details on this release together with links to lots of resources from this MySQL Dev-Zone article – “MySQL Cluster 7.2 GA Released, Delivers 1 BILLION Queries per Minute”


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Cluster Manager 1.1.4 Released – includes support for MySQL Cluster 7.2

Февраль 15th, 2012

MySQL Cluster Manager 1.1. is now available to download and try from Oracle E-Delivery (select “MySQL Database” as the product pack).

There’s lots of good stuff gone in under the covers as part of this release, with some of the highlights being:

  • Support for MySQL Cluster 7.2
  • Configuration of MySQL Server parameters
  • Verbose option added to commands for extra info on what’s going on
  • Faster Cluster rolling restarts – data nodes from different node groups will be restarted in parallel (still avoids an outage but cuts the end-to-end restart time)
  • Robustness enhancements to the configurator – especially important when managing large Clusters
  • Bug fixes (well we always need to include that one)

More details on the changes can be found in the MySQL Cluster Manager documentation.

Please give it a try and let me know what you think.


PlanetMySQL Voting: Vote UP / Vote DOWN

70x Faster Joins with AQL now GA with MySQL Cluster 7.2

Февраль 15th, 2012

70x faster joins with AQL

The new GA MySQL Cluster 7.2 Release (7.2.4) just announced by Oracle includes 2 new features which when combined can improve the performance of joins by a factor of 70x (or even higher). The first enhancement is that MySQL Cluster now provides the MySQL Server with better information on the available indexes which allows the MySQL optimizer to automatically produce better query execution plans. Previously it was up to the user to manually provide hints to the optimizer. The second new feature is Adaptive Query Localization which allows the work of the join to be distributed across the data nodes (local to the data it’s working with) rather than up in the MySQL Server; this allows more computing power to be applied to calculating the join as well as dramatically reducing the number of messages being passed around the system. The combined result is that your joins can now run MUCH faster and this post describes a test that results in a 70x speed-up for a real-world query.

The Query

11-Way Join used in Test

The join used in this test is based on a real-world example used for an on-line store/Content Management System. The original query identified all of the media in the system which was appropriate to a particular device and for which a user is entitled to access. As this query is part of a customer’s application I’ve replaced all of the table and column names.

The join runs across 11 tables (which contain 33.5K rows in total) and produces a result set of 2,060 rows, each with 19 columns. The figure to the right illustrates the join and the full join is included below.

SELECT
        tab1.uniquekey,
        tab8.name,
        tab8.tab8id,
        tab11.name,
        tab11.tab11id,
        tab11.value,
        tab10.tab10id,
        tab10.name,
        tab2.name,
        tab2.tab2id,
        tab4.value + tab5.value + tab6.value,
        tab3.colx,
        tab3.tab3id,
        tab4.tab4id,
        tab4.name,
        tab5.tab5id,
        tab5.name,
        tab6.tab6id,
        tab6.name
FROM
        tab1,tab2,tab3,tab4,tab5,tab6,tab7,tab8,tab9,tab10,tab11
WHERE
        tab7.tab2id = tab2.tab2id	AND
        tab7.tab8id = tab8.tab8id	AND
        tab9.tab2id = tab2.tab2id	AND
	tab9.tab10id = tab10.tab10id	AND
	tab10.tab11id = tab11.tab11id	AND
        tab3.tab2id = tab2.tab2id	AND
	tab3.tab4id = tab4.tab4id	AND
	tab4.tab5id = tab5.tab5id	AND
	tab4.colz =  'Y'		AND
	tab5.tab6id = tab6.tab6id	AND
	tab6.tab6id IN (6)		AND
	(tab3.tab4id IN (66, 77, 88))	AND
	tab1.tab2id = tab2.tab2id	AND
	tab1.colx = 6;

Enabling AQL

First of all, make sure that you’re using the GA version of MySQL Cluster (7.2.4 or later); the Open Source version is available from http://dev.mysql.com/downloads/cluster/#downloads

and the commercial version from the Oracle Software Delivery Cloud. You can double check that AQL is enabled:

mysql> show variables like 'ndb_join_pushdown';

| ndb_join_pushdown                   | ON |

Running the Query & Results

Test configuration

To get the full benefit from AQL, you should run “ANALYZE TABLE;” once for each of the tables (no need to repeat for every query and it only needs running on one MySQL Server in the Cluster). This is very important and you should start doing this as a matter of course when you create or modify a table.

For this test, 3 machines were used:

  1. Intel Core 2 Quad Core @2.83 GHz; 8 Gbytes RAM; single, multi-threaded data node (ndbmtd)
  2. Intel Core 2 Quad Core @2.83 GHz; 8 Gbytes RAM; single, multi-threaded data node (ndbmtd)
  3. 4 Core Fedora VM running on VirtualBox on Windows 7, single MySQL Server

The query was then run and compared to MySQL CLuster 7.1.15a:

MySQL Cluster 7.1.15a 1 minute 27.23 secs
MySQL Cluster 7.2.1 (without having run ANALYZE TABLE) 1 minute 5.3 secs 1.33x Cluster 7.1
MySQL Cluster 7.2.1 (having run ANALYZE TABLE) 1.26 secs 69.23x Cluster 7.1

How it Works

Classic Nested-Loop-Join

Traditionally, joins have been implemented in the MySQL Server where the query was executed. This is implemented as a nested-loop join; for every row from the first part of the join, a request has to be sent to the data nodes in order to fetch the data for the next level of the join and for every row in that level…. This method can result in a lot of network messages which slows down the query (as well as wasting resources). When turned on, Adaptive Query Localization results in the hard work being pushed down to the data nodes where the data is locally accessible. As a bonus, the work is divided amongst the pool of data nodes and so you get parallel execution.

NDB API

I’ll leave the real deep and dirty details to others but cover the basic concepts here. All API nodes access the data nodes using the native C++ NDB API, the MySQL Server is one example of an API node (the new Memcached Cluster API is another). This API has been expanded to allow parameterised or linked queries where the input from one query is dependent on the previous one. To borrow an example from an excellent post by Frazer Clement on the topic, the classic way to implement a join would be…

SQL > select t1.b, t2.c from t1,t2 where t1.pk=22 and t1.b=t2.pk;
  ndbapi > read column b from t1 where pk = 22;
              [round trip]
           (b = 15)
  ndbapi > read column c from t2 where pk = 15;
              [round trip]
           (c = 30)
           [ return b = 15, c = 30 ]

Using the new functionality this can be performed with a single network round trip where the second read operation is dependent on the results of the first…

  ndbapi > read column @b:=b from t1 where pk = 22;
           read column c from t2 where pk=@b;
              [round trip]
           (b = 15, c = 30)
           [ return b = 15, c = 30 ]

You can check whether your query is fitting these rules using EXPLAIN, for example:

mysql> set ndb_join_pushdown=on;
mysql> EXPLAIN SELECT COUNT(*) FROM residents,postcodes WHERE residents.postcode=postcodes.postcode AND postcodes.town="MAIDENHEAD";
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+--------------------------------------------------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                          | rows   | Extra                                                                    |
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+--------------------------------------------------------------------------+
| 1  | SIMPLE      | residents | ALL    | NULL          | NULL    | NULL    | NULL                         | 100000 | Parent of 2 pushed join@1                                                |
| 1  | SIMPLE      | postcodes | eq_ref | PRIMARY       | PRIMARY | 22      | clusterdb.residents.postcode | 1      | Child of 'residents' in pushed join@1; Using where with pushed condition |
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+--------------------------------------------------------------------------+
mysql> EXPLAIN EXTENDED SELECT COUNT(*) FROM residents,postcodes,towns
  WHERE residents.postcode=postcodes.postcode AND
  postcodes.town=towns.town AND towns.county="Berkshire";
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+----------+------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                          | rows   | filtered | Extra                                                                                                                  |
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+----------+------------------------------------------------------------------------------------------------------------------------+
| 1  | SIMPLE      | residents | ALL    | NULL          | NULL    | NULL    | NULL                         | 100000 | 100.00   | Parent of 3 pushed join@1                                                                                              |
| 1  | SIMPLE      | postcodes | eq_ref | PRIMARY       | PRIMARY | 22      | clusterdb.residents.postcode | 1      | 100.00   | Child of 'residents' in pushed join@1                                                                                  |
| 1  | SIMPLE      | towns     | eq_ref | PRIMARY       | PRIMARY | 22      | clusterdb.postcodes.town     | 1      | 100.00   | Child of 'postcodes' in pushed join@1; Using where with pushed condition: (`clusterdb`.`towns`.`county` = 'Berkshire') |
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+----------+------------------------------------------------------------------------------------------------------------------------+

Note that if you want to check for more details why your join isn’t currently being pushed down to the data node then you can use “EXPLAIN EXTENDED” and then “SHOW WARNINGS” to get more hints. Hopefully that will allow you to tweak your queries to get the best improvements.

PLEASE let us know your experiences and give us examples of queries that worked well and (just as importantly) those that didn’t so that we can improve the feature – just leave a comment on this Blog with your table schemas, your query and your before/after timings.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Cluster by Ralf Gebhardt

Февраль 5th, 2012

Ralf Gebhardt of SkySQL.

Cluster: shared nothing architecture (no single point of failure), synchronous replication between nodes, ACID transactions, row level locking. In-memory storage (some data can be stored on disk, but indexes must be in-memory). Checkpointing to disk for durability. It supports two types of indexes – ordered T-trees, unique hash indexes. Online operations like adding node groups, software upgrades, table alterations. Quick standard architecture diagram displayed about MySQL Cluster.

Network partitioning protocol is designed to avoid a split brain scenario. Is there at least one node from each node group? If not then this part cannot continue – graceful shutdown. Are all nodes present from any node group? If so, then this is the only viable cluster – continue to operate. Ask the arbitrator – the arbitrator which parts will continue if no arbitrator is available the cluster shuts down.

Durability – in order for a node to recover fast some data is stored locally. The REDO log is synchronized by global checkpoints (GCP). The DataMemory is synchronized by local checkpoints (LCP).

I agree with Ralf — almost impossible to talk about NDB in 25 minutes. Its very deep, you’d need at least three hours to grasp it well.

Related posts:

  1. Gong-A-Thong at LugRadio Live USA
  2. Morning sessions at MySQL MiniConf
  3. MySQL at Google


PlanetMySQL Voting: Vote UP / Vote DOWN