Archive for the ‘clustering’ Category

Synchronously Replicating Databases Across Data Centers – Are you Insane?

Октябрь 4th, 2011
 

Well actually….no. The second Development Milestone Release of MySQL Cluster 7.2 introduces support for what we call “Multi-Site Clustering”. In this post, I’ll provide an overview of this new capability, and considerations you need to make when considering it as a deployment option to scale geographically dispersed database services.

You can read more about MySQL Cluster 7.2.1 in the article posted on the MySQL Developer Zone

MySQL Cluster has long offered Geographic Replication, distributing clusters to remote data centers to reduce the affects of geographic latency by pushing data closer to the user, as well as providing a capability for disaster recovery.

Multi-Site Clustering provides a new option for cross data center scalability. For the first time splitting data nodes across data centers is a supported deployment option. With this deployment model, users can synchronously replicate updates between data centers without needing to modify their application or schema for conflict handling, and automatically failover between those sites in the event of a node failure.

MySQL Cluster offers high availability by maintaining a configurable number of data replicas.  All replicas are synchronously maintained by a built-in 2 phase commit protocol.  Data node and communication failures are detected and handled automatically.  On recovery, data nodes automatically rejoin the cluster, synchronize with running nodes, and resume service.

All replicas of a given row are stored in a set of data nodes known as a nodegroup.  To provide service, a cluster must have at least one data node from each nodegroup available at all times.  When the cluster detects that the last node in a nodegroup has failed, the remaining cluster nodes will be gracefully shutdown, to ensure the consistency of the stored databases on recovery.

Improvements to the heartbeating mechanism used by MySQL Cluster enables greater resilience to temporary latency spikes on a WAN, thereby maintaining operation of the cluster. A new “ConnectivityCheck” mechanism is introduced, which must be explicitly configured. This extra mechanism adds messaging overheads and failure handling latency, and so is not switched on by default.

When configuring Multi-Site clustering, the following factors must be considered:

Bandwidth
Low bandwidth between data nodes can slow data node recovery.  In normal operation, the available bandwidth can limit the maximum system throughput.  If link saturation causes latency on individual links to increase, then node failures, and potentially cluster failure could occur.

Latency and performance
Synchronously committing transactions over a wide area increases the latency of operation execution and commit, therefore individual operations are slowed. To maintain the same overall throughput, higher client concurrency is required.  With the same client concurrency level, throughput will decrease relative to a lower latency configuration.

Latency and stability
Synchronous operation implies that clients wait to hear of the success or failure of each operation before continuing. Loss of communication to a node, and high latency communication to a node are indistinguishable in some cases.  To ensure availability, the Cluster monitors inter-node communication.  If a node experiences high communication latency, then it may be killed by another node, to prevent its high latency causing service loss.

Where inter-node latencies fluctuate, and are in the same range as the node-latency-monitoring trigger levels, node failures can result.  Node failures are expensive to recover from, and endanger Cluster availability. 

To avoid node failures, either the latency should be reduced, or the trigger levels should be raised.  Raising trigger levels can result in a longer time-to-detection of communication problems.

WAN latencies
Latency on an IP WAN may be a function of physical distance, routing hops, protocol layering, link failover times and rerouting times. The maximum expected latency on a link should be characterized as input to the cluster configuration.

Survivability of node failures
MySQL Cluster uses a fail fast mechanism to minimize time-to-recovery. Nodes that are suspected of being unreachable or dead are quickly excluded from the Cluster.  This mechanism is simple and fast, but sometimes takes steps that result in unnecessary cluster failure.  For this reason, latency trigger levels should be configured a safe margin
above the maximum latency variation on inter-data node links.

Users can configure various MySQL Cluster parameters including heartbeats, Connectivity_Check, GCP timeouts and transaction deadlock timeouts. You can read more about these parameters in the documentation

Recommendations for Multi-Site Clustering
- Ensure minimal, stable latency;
- Provision the network with sufficient bandwidth for the expected peak load - test with node recovery and system recovery;
- Configure the heartbeat period to ensure a safe margin above latency fluctuations;

- Configure the ConnectivtyCheckPeriod to avoid unnecessary node failures;

- Configure other timeouts accordingly including the GCP timeout, transaction deadlock timeout, and transaction inactivity timeout.

Example
The following is a recommendation of latency and bandwidth requirements for applications with high throughput and fast failure detection requirements:
- latency between remote data nodes must not exceed 20 milliseconds;
- bandwidth of the network link must be more than 1 Gigabit per Second.

For applications that do not require this type of stringent operating environment, latency and bandwidth can be relaxed, subject to the testing recommended above.

As the recommendations demonstrate, there are a number of factors that need to be considered before deploying multi-site clustering. For geo-redundancy, Oracle recommends Geographic Replication, but multi-site clustering does present an alternative deployment, subject to the considerations and constraints discussed above.

You can learn more about scaling web databases with MySQL Cluster from our new Guide.  We look forward to hearing your experiences with the new MySQL Cluster 7.2.1 DMR!


PlanetMySQL Voting: Vote UP / Vote DOWN

Simpler and Safer Clustering: MySQL Cluster Manager Update

Июль 18th, 2011

Clustered computing brings with it many benefits: high performance, high availability, scalable infrastructure, etc. But it also brings with it more complexity.

Why?

Well, by its very nature, there are more “moving parts” to monitor and manage (from physical, virtual and logical hosts) to clustering software to redundant networking components – the list goes on. And a cluster that isn’t effectively provisioned and managed will cause more downtime than the standalone systems it is designed to improve upon.

When it comes to the database industry, analysts already estimate that 50% of a typical database’s Total Cost of Ownership is attributable to staffing and downtime costs. These costs will only increase if a database cluster is not effectively monitored and managed.

Monitoring and management has been a major focus in the development of the MySQL Cluster database, and as part of this focus, the latest release of MySQL Cluster Manager (MCM) hit General Availability last week. You can read all about it in Andrew Morgan's blog.

MySQL Cluster Manager 1.1.1 makes it much simpler to get up and running, to manage the cluster and to allow multiple clusters to be managed from a single process.

MySQL Cluster Manager is part of the commercial Carrier-Grade Edition but anyone is free to download and use MySQL Cluster Manager without obligation for 30 days. This is a great way for those new to MySQL Cluster to rapidly configure and provision their first cluster.

All you need do is:

1. Go to Oracle eDelivery

2. Enter some basic details and click through the agreement

3. Select “MySQL Product Pack”, then your platform, then Go

Not only does MCM make the management of MySQL Cluster simpler, it also makes it safer. One of the largest causes of downtime is administrator error, and here MySQL Cluster Manager can significantly reduce risk.

Consider the task of upgrading rom one release of MySQL Cluster to another. This can be performed as an on-line operation, using rolling restarts to apply upgrades while still serving read and write requests. Its just one of the many operations users can perform on line (ie adding data nodes, upgrading schema, backups, etc) all of which enable MySQL Cluster to achieve 99.999% uptime.

Using a manual upgrade method on a cluster configured with 4 x data nodes, 2 x MySQL Server application nodes and 2 x management nodes, the administrator would be typing 46 x manual commands in an operation that would take around 2 ½ hours to complete. The steps are shown below:

1 x preliminary check of cluster state

8 x ssh commands per server

8 x per-process stop commands

4 x scp of configuration files (2 x mgmd & 2 x mysqld)

8 x per-process start commands

8 x checks for started and re-joined processes

8 x process completion verifications

1 x verify completion of the whole cluster.

Excludes manual editing of each configuration file.

Now compare this to using MySQL Cluster Manager:

upgrade cluster --package=7.1 mycluster;

Just 1 command and walk away and leave it.

Note – both of the processes above exclude the preparation steps of copying the new software package to each host and defining where it's located. The total operation times are based on a DBA restarting 4 x MySQL Cluster Data Nodes, each with 6GB of data, and performing 10,000 operations per second.

You can learn more about MySQL Cluster Manager from our new whitepaper and on-line demo.

We also have an on-demand webinar which covers MySQL Cluster Manager as well as other complimentary methods to managing a MySQL Cluster environment:

* NDBINFO: released with MySQL Cluster 7.1, NDBINFO presents real-time status and usage statistics, providing developers and DBAs with a simple means of pro-actively monitoring and optimizing database performance and availability.

* MySQL Cluster Advisors & Graphs: part of the MySQL Enterprise Monitor and available in the commercial MySQL Cluster Carrier Grade Edition, the Enterprise Advisor includes automated best practice rules that alert on key performance and availability metrics from MySQL Cluster data nodes.

While managing clusters will never be easy, it keeps getting a whole lot simpler !


PlanetMySQL Voting: Vote UP / Vote DOWN

A few notes on InnoDB PRIMARY KEY

Апрель 24th, 2011
InnoDB uses an index-organized data storage technique, wherein the primary key acts as the clustered index and this clustered index holds the data. Its for this reason that understanding the basics of InnoDB primary key is very important, and hence the need for these notes.
PlanetMySQL Voting: Vote UP / Vote DOWN

Video: The ScaleDB shared-disk clustering Storage Engine for MySQL

Сентябрь 23rd, 2009

Mike Hogan, CEO of ScaleDB spoke at the Boston MySQL User Group in September 2009:

ScaleDB is a storage engine for MySQL that delivers shared-disk clustering. It has been described as the Oracle RAC of MySQL. Using ScaleDB, you can scale your cluster by simply adding nodes, without partitioning your data. Each node has full read/write capability, eliminating the need for slaves, while delivering cluster-level load balancing. ScaleDB is looking for additional beta testers, there is a sign up at http://www.scaledb.com.

Slides are online (and downloadable) at http://www.slideshare.net/Sheeri/scale-db-preso-for-boston-my-sql-meetup-92009

Watch the video online at http://www.youtube.com/watch?v=emu2WfNx4KA or directly embedded here:


PlanetMySQL Voting: Vote UP / Vote DOWN

Failure scenarios and solutions in master-master replication

Август 31st, 2009

I’ve been thinking recently about the failure scenarios of MySQL replication clusters, such as master-master pairs or master-master-with-slaves. There are a few tools that are designed to help manage failover and load balancing in such clusters, by moving virtual IP addresses around. The ones I’m familiar with don’t always do the right thing when an irregularity is detected. I’ve been debating what the best way to do replication clustering with automatic failover really is.

I’d like to hear your thoughts on the following question: what types of scenarios require what kind of response from such a tool?

I can think of a number of failures. Let me give just a few simple examples in a master-master pair:

Problem: Query overload on the writable master makes mysqld unresponsive
Do nothing. Moving the queries to another server will cause cascading failures.
Problem: The writable master is completely unreachable
Fence the writable master and promote the standby master.
Problem: The writable master is reachable but unresponsive due to overload-induced swapping
Do nothing. Moving the load to another server will cause cascading failures.

I don’t want to bias the jury, so I’ll stop there and ask you to contribute your failure scenarios and what you think the correct action should be.

Related posts:

  1. MySQL replication breaks single-threaded limitation? It’s
  2. How to check MySQL replication integrity continually I have rec
  3. Pop quiz: how can one slave break another slave Suppose yo

Related posts brought to you by Yet Another Related Posts Plugin.


PlanetMySQL Voting: Vote UP / Vote DOWN