Archive for the ‘failover’ Category

New MySQL 5.6 Replication Utilities – mysqlfailover and mysqlrpladmin

Апрель 11th, 2012

With all of the new news coming out right now, it can be easy to miss or overlook some of the new features.

While there’s been a lot of talk about MySQL 5.6 Replication, I specifically wanted to mention the new ‘mysqlfailover’ and ‘mysqlrpladmin’ utilities.

These are two new MySQL replication utilities (results of the new Global Transaction Identifiers (GTIDs) in MySQL 5.6).

Let me quote the MySQL 5.6 Replication article for both of these utilities:

mysqlfailover

“Provides continuous monitoring of the replication topology, enabling failover to a slave in the event of an outage on the master.

The default behavior is to promote the most up-to-date slave, based on the sets of GTIDs that have been applied, but the promotion policies are fully configurable. Therefore, a user can nominate a specific candidate slave to become the new master (i.e. because it is configured in a certain way or has better performing hardware), and the utility will automatically poll the other slaves to retrieve any events so far not replicated to the candidate slave’s relay log.

This ensures no replicated transactions are lost, even if the candidate is not the most current slave when failover is initiated.”

mysqlrpladmin

“If a user needs to take a master offline for scheduled maintenance, mysqlrpladmin can perform a switchover to a slave, defaulting to the most recent slave, or to a nominated candidate slave. It can also perform a manual failover in the event a master server has gone offline.

The utility can also be used to start and stop slaves, as well as provide slave discovery and basic monitoring, including

  • Status of the slave and its I/O and SQL threads;
  • Status of replication processing through the topology, including any lag or errors;
  • Configuration of slave promotion policies.

With either of these utilities, the user may register scripts to be called in the event of a slave promotion – for example to redirect an application to use the new master for writes and for reads that must always be consistent.”

The new MySQL Utilities are only included in MySQL Workbench, which you can download from:

http://www.mysql.com/downloads/workbench/

Again, I should note this functionality is included in 5.6, which is not GA, so I certainly don’t recommend this for production usage.

I’ve not had a chance to test these out yet myself, but Dr. Chuck Bell has posted some usage/results here, which I think you’ll find useful.

Hope this helps.

 


PlanetMySQL Voting: Vote UP / Vote DOWN

New MySQL 5.6 Replication Utilities – mysqlfailover and mysqlrpladmin

Апрель 11th, 2012

With all of the new news coming out right now, it can be easy to miss or overlook some of the new features.

While there’s been a lot of talk about MySQL 5.6 Replication, I specifically wanted to mention the new ‘mysqlfailover’ and ‘mysqlrpladmin’ utilities.

These are two new MySQL replication utilities (results of the new Global Transaction Identifiers (GTIDs) in MySQL 5.6).

Let me quote the MySQL 5.6 Replication article for both of these utilities:

mysqlfailover

“Provides continuous monitoring of the replication topology, enabling failover to a slave in the event of an outage on the master.

The default behavior is to promote the most up-to-date slave, based on the sets of GTIDs that have been applied, but the promotion policies are fully configurable. Therefore, a user can nominate a specific candidate slave to become the new master (i.e. because it is configured in a certain way or has better performing hardware), and the utility will automatically poll the other slaves to retrieve any events so far not replicated to the candidate slave’s relay log.

This ensures no replicated transactions are lost, even if the candidate is not the most current slave when failover is initiated.”

mysqlrpladmin

“If a user needs to take a master offline for scheduled maintenance, mysqlrpladmin can perform a switchover to a slave, defaulting to the most recent slave, or to a nominated candidate slave. It can also perform a manual failover in the event a master server has gone offline.

The utility can also be used to start and stop slaves, as well as provide slave discovery and basic monitoring, including

  • Status of the slave and its I/O and SQL threads;
  • Status of replication processing through the topology, including any lag or errors;
  • Configuration of slave promotion policies.

With either of these utilities, the user may register scripts to be called in the event of a slave promotion – for example to redirect an application to use the new master for writes and for reads that must always be consistent.”

The new MySQL Utilities are only included in MySQL Workbench, which you can download from:

http://www.mysql.com/downloads/workbench/

Again, I should note this functionality is included in 5.6, which is not GA, so I certainly don’t recommend this for production usage.

I’ve not had a chance to test these out yet myself, but Dr. Chuck Bell has posted some usage/results here, which I think you’ll find useful.

Hope this helps.

 


PlanetMySQL Voting: Vote UP / Vote DOWN

Multi-Site, Multi-Master MySQL Databases Made Easy with Tungsten — Webinar 9/22

Сентябрь 16th, 2011
Cross-site databases are the next challenge facing today's MySQL-based businesses. Continuent Tungsten enables multi-master with an innovative new architecture called System of Record that avoids data conflicts, ensures sites are ready for quick failover, and uses hardware resources efficiently.Join our webinar on Thursday, 9/22, to learn how Tungsten Enterprise enables System of Record with
PlanetMySQL Voting: Vote UP / Vote DOWN

Howto setup MySQL on a DRBD volume

Декабрь 8th, 2010
One more DRBD tutorial, this time I will describe howto setup MySQL with DRBD (Distributed Replicated Block Device). Purpose This document describes how to to setup a failover system with MySQL and DRBD (Distributed Replicated Block Device). Introduction In this tutorial we will setup two Debian Linux nodes with a DRBD volume. MySQL will be [...]
PlanetMySQL Voting: Vote UP / Vote DOWN

Cache pre-loading on mysqld startup

Декабрь 1st, 2010

The following quirky dynamic SQL will scan each index of each table so that they’re loaded into the key_buffer (MyISAM) or innodb_buffer_pool (InnoDB). If you also use the PBXT engine which does have a row cache but no clustered primary key, you could also incorporate some full table scans.

To make mysqld execute this on startup, create /var/lib/mysql/initfile.sql and make it be owned by mysql:mysql

SET SESSION group_concat_max_len=100*1024*1024;
SELECT GROUP_CONCAT(CONCAT('SELECT COUNT(',column_name,') FROM ',table_schema,'.',table_name,' FORCE INDEX (',index_name,')') SEPARATOR ' UNION ALL ') INTO @sql FROM information_schema.statistics WHERE table_schema NOT IN ('information_schema','mysql') AND seq_in_index = 1;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET SESSION group_concat_max_len=@@group_concat_max_len;

and in my.cnf add a line in the [mysqld] block

init-file = /var/lib/mysql/initfile.sql

That’s all. mysql reads that file on startup and executes each line. Since we can do the whole select in a single (admittedly quirky) query and then use dynamic SQL to execute the result, we don’t need to create a stored procedure.

Of course this kind of simplistic “get everything” only really makes sense if the entire dataset+indexes fit in memory, otherwise you’ll want to be more selective. Still, you could use the above as a basis, perhaps using another table to provide a list of tables/indexes to be excluded – or if the schema is really stable, simply have a list of tables/indexes to be included instead of dynamically using information_schema.

Practical (albeit niche) application:

In a system with multiple slaves, adding in a new slave makes it start with cold caches, but since with loadbalancing it will pick up only some of the load it often works out ok. However, some environments have dual masters but the application is not able to do read/write splits to utilise slaves. In that case all the reads also go to the active master. Consequentially, the passive master will have relatively cold caches (only rows/indexes that have been updated will be in memory) so in case of a failover the amount of disk reads for the many concurrent SELECT queries will go through the roof – temporarily slowing the effective performance to a dismal crawl: each query takes longer with the required additional disk access so depending on the setup the server may even run out of connections which in turn upsets the application servers. It’d sort itself out but a) it looks very bad on the frontend and b) it may take a number of minutes.

The above construct prevents that scenario, and as mentioned it can be used as a basis to deal with other situations. Not many people know about the init-file option, so this is a nice example.

If you want to know how the SQL works, read on. The original line is very long so I’ll reprint it below with some reformatting:

SELECT GROUP_CONCAT(CONCAT(
  'SELECT COUNT(',column_name,')
          FROM ',table_schema,'.',table_name,
          ' FORCE INDEX (',index_name,')'
       ) SEPARATOR ' UNION ALL ')
  INTO @sql
  FROM information_schema.statistics
  WHERE table_schema NOT IN ('information_schema','mysql')
  AND seq_in_index = 1;

The outer query grabs each regular db/table/index/firstcol name that exists in the server, writing out a SELECT query that counts all not-NULL values of the indexed column (so it must scan the index), forcing that specific index. We then abuse the versatile and flexible GROUP_CONCAT() function to glue all those SELECTs together, with “UNION ALL” inbetween. The result is a single very long string, so we need to tweak the maximum allowed group_concat output beforehand to prevent truncation.


PlanetMySQL Voting: Vote UP / Vote DOWN

Quest for Resilience: Multi-DC Masters

Май 14th, 2010

This is a Request for Input. Dual MySQL masters with MMM in a single datacentre are in common use, and other setups like DRBD and of course VM/SAN based failover solutions are conceptually straightforward also. Thus, achieving various forms of resilience within a single data-centre is doable and not costly.

Doing the same across multiple (let’s for simplicity sake limit it to two) datacentres is another matter. MySQL replication works well across longer links, and it can use MySQL’s in-built SSL or tools like stunnel. Of course it needs to be kept an eye on, as usual, but since it’s asynchronous the latency between the datacentres is not a big issue (apart from the fact that the second server gets up-to-date a little bit later).

But as those who have tried will know, having a client (application server) connection to a MySQL instance in a remote data-centre is a whole other matter, latency becomes a big issue and is generally very noticeable on the front-end. One solution for that is to have application servers only connect to their “local” MySQL server.

So the question to you is, do you now have (or have you had in the past) a setup with MySQL masters in different datacentres, what did that setup look like (which additional tools and infra did you use for it), and what were your experiences (good and bad, solutions to issues, etc). I’m trying to gather additional expertise that might already be about, which can help us all. Please add your input! thanks


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL University session Oct 22: Dual Master Setups With MMM

Октябрь 22nd, 2009

This Thursday (October 22nd, 13:00 UTC), Walter Heck (of Open Query) will present Dual Master Setups With MMM. MMM (Multi-Master Replication Manager for MySQL) is a set of flexible scripts to perform monitoring/failover and management of MySQL master-master replication configurations (with only one node writable at any time). Session slides (PDF).

The toolset also has the ability to read balance standard master/slave configurations with any number of slaves, so you can use it to move virtual IP addresses around a group of servers depending on whether they are behind in replication. For more
information, see mysql-mmm.org.

For MySQL University sessions you point your browser here. You need a browser with a working Flash plugin. You may register for a Dimdim account, but you don’t have to.


PlanetMySQL Voting: Vote UP / Vote DOWN

Dogfood: making our systems more resilient

Сентябрь 18th, 2009

This is a “dogfood” type story (see below for explanation of the term)… Open Query has ideas on resilient architecture which it teaches (training) and recommends (consulting, support) to clients and the general public (blog, conferences, user group talks). Like many other businesses, when we first started we set up our infrastructure quickly and on the cheap, and it’s grown since. That’s how things grow naturally, and is as always a trade-off between keeping your business running and developing while also improving infrastructure (business processes and technical).

Quite a few months ago we also started investing (mostly time) in the technical infrastructure, and slowly moving the various systems across to new servers and splitting things up along the way. Around the same time, the main webserver frequently became unresponsive. I’ll spare you the details, we know what the problem was and it was predictable, but since it wasn’t our system there was only so much we could do. However, systems get dependencies over time and thus it was actually quite complicated to move. In fact, apart from our mail, the public website was the last thing we moved, and that was through necessity not desire.

Of course it’s best for a company when their public website works, it’s quite likely you have noticed some glitches in ours over time. Now running on the new infra, I happened to take a quick peek at our Google Analytics data, and noticed an increase in average traffic numbers of about 40%. Great big auch.

And I’m telling this, because I think it’s educational and the world is generally not served by companies keeping problems and mishaps secret. Nasties grow organically and without malicious intent, improvements are a step-wise process, all that… but in the end, the net results of improvements can be more amazing than just general peace of mind! And of course it’s very important to not just see things happen, but to actively work on those incremental improvements, ongoing.

Our new infra has dual master MySQL servers (no surprise there ;-) but based in separate data centres so that makes the setup a bit more complicated (MMM doesn’t deal with that setup). Other “new” components we use are lighttpd, haproxy, and Zimbra (new in the sense that our old external infra used different tech). Most systems (not all, yet) are redundant/expendable and run on a mix of Linode instances and our own machines. Doing these things for your own infra is particularly educational, it provides extra perspective. The result is, I believe, pretty decent. Failures generally won’t cause major disruption any more, if at all. Of course, it’s still work in progress.

Running costs of this “farm”? I’ll tell later, as I think it’s a good topic for a poll and I’m curious: how much do you spend on server infrastructure per month?

Background for non-Anglophones: “eating your own dogfood” refers to a company doing themselves what they’re recommending to their clients and in general. Also known as “leading by example”, but I think it’s also about trust and credibility. On the other hand, there’s the “dentist’s tooth-ache” which refers to the fact that doctors are their own worst patients ;-)


PlanetMySQL Voting: Vote UP / Vote DOWN

Failure scenarios and solutions in master-master replication

Август 31st, 2009

I’ve been thinking recently about the failure scenarios of MySQL replication clusters, such as master-master pairs or master-master-with-slaves. There are a few tools that are designed to help manage failover and load balancing in such clusters, by moving virtual IP addresses around. The ones I’m familiar with don’t always do the right thing when an irregularity is detected. I’ve been debating what the best way to do replication clustering with automatic failover really is.

I’d like to hear your thoughts on the following question: what types of scenarios require what kind of response from such a tool?

I can think of a number of failures. Let me give just a few simple examples in a master-master pair:

Problem: Query overload on the writable master makes mysqld unresponsive
Do nothing. Moving the queries to another server will cause cascading failures.
Problem: The writable master is completely unreachable
Fence the writable master and promote the standby master.
Problem: The writable master is reachable but unresponsive due to overload-induced swapping
Do nothing. Moving the load to another server will cause cascading failures.

I don’t want to bias the jury, so I’ll stop there and ask you to contribute your failure scenarios and what you think the correct action should be.

Related posts:

  1. MySQL replication breaks single-threaded limitation? It’s
  2. How to check MySQL replication integrity continually I have rec
  3. Pop quiz: how can one slave break another slave Suppose yo

Related posts brought to you by Yet Another Related Posts Plugin.


PlanetMySQL Voting: Vote UP / Vote DOWN