Archive for the ‘Replication’ Category

MySQL 5.6 Replication: FAQ

Май 23rd, 2012

On Wednesday May 16th, we ran a webinar to provide an overview of all of the new replication features and enhancements that are previewed in the MySQL 5.6 Development Release – including Global Transaction IDs, auto-failover and self-healing, multi-threaded, crash-safe slaves and more.

Collectively, these new capabilities enable MySQL users to scale for next generation web and cloud applications.

Attendees posted a number of great questions to the MySQL developers, serving to provide additional insights into how these new features are implemented. So I thought it would be useful to post those below, for the benefit of those unable to attend the live webinar (note, you can listen to the On-Demand replay which is available now).

Before getting to the Q&A, there are a couple of other resources that maybe useful to those wanting to learn more about Replication in MySQL 5.6

On-Demand webinar

Slides used during the webinar

For more detail on any of the features discussed below, be sure to check out the Developer Zone article: Replication developments in MySQL 5.6

So here is the Q&A from the event 

Multi-Threaded Slaves

The results from recent benchmarking of the Multi-Threaded Slave enhancement were discussed, prompting the following questions

Q. Going from 0 - 10 threads, did you notice any increase in IOwait on CPU?

A. In this case, yes. The actual amount depends on a number of factors: your hardware, query/transaction distribution across databases, server general and InnoDB specific configuration parameters.

Q. Will the multiple threads work on different transactions on the same database, or each thread works on a separate database?

A. Each worker thread works on separate databases (schemas).

Q. If I set the slave-parallel-workers to less than the number of databases, can I configure which databases use which worker threads?

A. There is no such configuration option to assign a certain Worker thread to a database. Configuring slave-parallel-workers to less than the number of databases is a good setup. A Worker can handle multiple databases.

Q. If I create 4 threads and I have 100 databases, can I configure which databases use which threads?

A. There won't be 1 worker to a mapping of exactly 25 databases, but it will be very close to that type of distribution.

Q. Thank You. I ask as we have about 8 databases that have a lot of transactions and about 30 that are used less frequently. It would be nice to have the ability to create 9 workers, 1 for each if the heavy databases and 1 for all the others

A. The current design enables relatively equal distribution of transactions across your databases, but it could be that 9 workers will fit anyway.

Q. Does multi-thread slave still work if there is foreign key across database?

A. MTS preserves slave consistency *within* a database, but not necessarily *between* databases. With MTS enabled and replication ongoing, updates to one database can be executed before updates to another causing them to be temporarily out of sync with each other.

Q. How is auto-increment managed with the multi-thread slave?

A. MTS makes sure Worker threads do not execute concurrently on the same table. Auto-increment is guaranteed to be handled in the same way as the single threaded "standard" slave handles auto-increment

Q. Can you use semi-synchronous replication on one of the slaves when using MTS?

A. Semi-sync is friendly to MTS. MTS is about parallel execution - so they cooperate well.

Optimized Row Based Replication

Q. If you only store the PK column in the Before Image for updates, does this mean you don't care about the slave's data potentially being out-of-sync? Will we be able to control how much data is stored in the binary logs?

A. The rule is that we ship a *PK equivalent* so that the slave is always able to find a row. This means:
  1. if master table has PK, then we ship the PK only
  2. if master table does not have PK, then we ship the entire row

Global Transaction IDs and HA Utilities

Q. Would the failover utility need to sit on a 3rd party host to allow arbitration?

A. The utility would typically run on a client, not on the hosts it is monitoring

Q. Can you explain the upgrade process to move to MySQL 5.6? I am assuming that the slave(s) are upgraded first and that replication would be backwards compatible. And after the slave(s) are upgraded, the master would be next. But then how do you turn on the GTID?

A. Right: the slave(s) are upgraded first. After upgrading Slaves they would be started with --gtid-mode=on (technically a couple of other options are needed as well). And then the same process would be followed for the upgraded Master.

Q. For failover functionality, if I had a setup like this: Server1 replicates to Server2, Server2 replicates to Server3, Server4, and Server5. If Server2 were to fail, can I have it configured so that Server1 can become the new master for Server3/4/5?

A. Yes - you can either configure the failover to the most recent slave based on GTID, or list specific candidates. What will happen is that Slave 1 will temporarily become a slave of 3, 4 and 5 to ensure it replicates any more recent transactions those slaves may have, and then it will become the master

Replication Event Checksums

Q. What is the overhead of replication checksums?

A. It is minimal - we plan to publish benchmarks over the summer to better characterize any overhead

Q. Are you exposing the checksum to the plugin api so we can add our own checksum types?

A. We prepared some of interfaces, i.e. checksum methods are identified by a specific code byte (1-127) values. But it's not really a plugin at this point.

Q. Do checksums verify both replicated data and the data on slave?

A. Yes - checksums are implemented across the entire path - so you can check for issues in replication itself, or in the hardware or network

Q. I think, it is better to turn on checksums at the relaylog to avoid overhead on the master, but if we do that and checksum fails (i.e. not matching the master's data) then what happens – will the slave throw an error

A. I agree, it's better to relax the Master, which verifies the checksum only optionally when the Dump Thread reads from the binlog prior to the replication event being sent out to the Slave. The slave mandatorily checks the checksummed events when they are sent across the network, and optionally when they are read from Relay-log. In either case, an error is thrown.

Q. Are checksums optional? In some cases we don't care for huge data loads

A. Yes, checksums are optional.

Time Delayed Replication

Q. Is Time delayed replication applied at the database level only and not for the entire slave?

A. Applied for the slave as execution is global.

Informational Log Events

Q. When we configure information log event, does it show meaningful query log for any binlog format? (Row based especially)

A. When using row based replication, you get the original query in human readable format... obviously you don’t want to see all the rows modified in a table of size, which can be huge

Q. Will the binlog include user id?

A. User id is replicated in some cases for Query-log-event - namely user id of the invoker when a stored routine is called.

Remote Binlog Backup

Q. What level of access does the remote binlog backup need?

A. Replication slave

Summary

As you can see, our Engineering team was kept busy with questions over the course of the webinar. Be sure to check out the MySQL 5.6 Replication webinar replay and if you have further questions, please don’t hesitate to use the comments below!


PlanetMySQL Voting: Vote UP / Vote DOWN

Webinar 5/17: MySQL High Availability Realized

Май 10th, 2012
High availability is about more than just making sure that applications can get to your data, even if there is a failure: How about when you are upgrading your database schema What if you need to add memory to a database server or reconfigure/restart MySQL If your apps want to read data from a MySQL slave, how can you be sure they are not reading stale data without re-coding your apps What
PlanetMySQL Voting: Vote UP / Vote DOWN

If You *Must* Deploy Multi-Master Replication, Read This First

Апрель 30th, 2012
An increasing number of organizations run applications that depend on MySQL multi-master replication between remote sites.   I have worked on several such implementations recently.  This article summarizes the lessons from those experiences that seem most useful when deploying multi-master on existing as well as new applications.

Let's start by defining terms.  Multi-master replication means that applications update the same tables on different masters, and the changes replicate automatically between those masters.  Remote sites mean that the masters are separated by a wide area network (WAN), which implies high average network latency of 100ms or more.  WAN network latency is also characterized by a long tail, ranging from seconds due to congestion to hours or even days if a ship runs over the wrong undersea cable.

With the definitions in mind we can proceed to the lessons.  The list is not exhaustive but includes a few insights that may not be obvious if you are new to multi-master topologies.  Also, I have omitted issues like monitoring replication, using InnoDB to make slaves crash-safe, or provisioning new nodes.  If you use master/slave replication, you are likely familiar with these topics already.

1. Use the Right Replication Technology and Configure It Properly

The best overall tool for MySQL multi-master replication between sites is Tungsten.  The main reason for this assertion is that Tungsten uses a flexible, asynchronous, point-to-point, master/slave replication model that handles a wide variety of topologies such as star replication or all-to-all.  Even so, you have to configure Tungsten properly.  The following topology is currently my favorite:
  • All-to-all topology.  Each master replicates directly to every other master.  This handles prolonged network outages or replication failures well, because one or more masters can drop out without breaking  replication between the remaining masters or requiring reconfiguration.  When the broken master(s) return, replication just resumes on all sides.  All-to-all does not work well if you have a large number of masters.  
  • Updates are not logged on slaves.  This keeps master binlogs simple, which is helpful for debugging, and eliminates the possibility of loops.  It also requires some extra configuration if the masters have their own slaves, as would be the case in a Tungsten Enterprise cluster
There are many ways to set up multi-master replication replication, and the right choice varies according to the number of masters, whether you have local clustering, or other considerations.  Giuseppe Maxia has described many topologies, for example here, and the Tungsten Cookbook has even more details.

One approach you should approach with special caution is MySQL circular replication.  In topologies of three or more nodes, circular replication results in broken systems if one of the masters fails.  Also, you should be wary of any kind of synchronous multi-master replication across sites that are separated by more than 50 kilometers (i.e. 1-2ms latency).  Synchronous replication makes a siren-like promise of consistency but the price you pay is slow performance under normal conditions and broken replication when WAN links go down.

2. Use Row-Based Replication to Avoid Data Drift

Replication depends on deterministic updates--a transaction that changes 10 rows on the original master should change exactly the same rows when it executes against a replica.  Unfortunately many SQL statements that are deterministic in master/slave replication are non-deterministic in multi-master topologies.  Consider the following example, which gives a 10% raise to employees in department #35.

   UPDATE emp SET salary = salary * 1.1 WHERE dep_id = 35;

If all masters add employees, then the number of employees who actually get the raise will vary depending on whether such additions have replicated to all masters.  Your servers will very likely become inconsistent with statement replication.  The fix is to enable row-based replication using binlog-format=row in my.cnf.  Row replication transfers the exact row updates from each master to the others and eliminates ambiguity.

3. Prevent Key Collisions on INSERTs

For applications that use auto-increment keys, MySQL offers a useful trick to ensure that such keys do not  collide between masters using the auto-increment-increment and auto-increment-offset parameters in my.cnf.  The following example ensures that auto-increment keys start at 1 and increment by 4 to give values like 1, 5, 9, etc. on this server.

server-id=1
auto-increment-offset = 1
auto-increment-increment = 4

This works so long as your applications use auto-increment keys faithfully.  However, any table that either does not have a primary key or where the key is not an auto-increment field is suspect.  You need to hunt them down and ensure the application generates a proper key that does not collide across masters, for example using UUIDs or by putting the server ID into the key.   Here is a query on the MySQL information schema to help locate tables that do not have an auto-increment primary key. 

SELECT t.table_schema, t.table_name 
  FROM information_schema.tables t 
    WHERE NOT EXISTS 
      (SELECT * FROM information_schema.columns c
       WHERE t.table_schema = c.table_schema  
         AND t.table_name = c.table_name
         AND c.column_key = 'PRI'
         AND c.extra = 'auto_increment')

4. Beware of Semantic Conflicts in Applications

Neither Tungsten nor MySQL native replication can resolve conflicts, though we are starting to design this capability for Tungsten.  You need to avoid them in your applications.  Here are a few tips as you go about this.

First, avoid obvious conflicts.  These include inserting data with the same keys on different masters (described above), updating rows in two places at once, or deleting rows that are updated elsewhere.  Any of these can cause errors that will break replication or cause your masters to become out of sync.  The good news is that many of these problems are not hard to detect and eliminate using properly formatted transactions.  The bad news is that these are the easy conflicts.  There are others that are much harder to address.  

For example, accounting systems need to generate unbroken sequences of numbers for invoices.  A common approach is to use a table that holds the next invoice number and increment it in the same transaction that creates a new invoice.  Another accounting example is reports that need to read the value of accounts consistently, for example at monthly close.  Neither example works off-the-shelf in a multi-master system with asynchronous replication, as they both require some form of synchronization to ensure global consistency across masters.  These and other such cases may force substantial application changes.  Some applications simply do not work with multi-master topologies for this reason. 

5. Remove Triggers or Make Them Harmless

Triggers are a bane of replication.  They conflict with row replication if they run by accident on the slave.  They can also create strange conflicts due to weird behavior/bugs (like this) or other problems like needing definer accounts present.  MySQL native replication turns triggers off on slaves when using row replication, which is a very nice feature that prevents a lot of problems.  

Tungsten on the other hand cannot suppress slave-side triggers.  You must instead alter each trigger to add an IF statement that prevents the trigger from running on the slave.  The technique is described in the Tungsten Cookbook.  It is actually quite flexible and has some advantages for cleaning up data because you can also suppress trigger execution on the master.  

You should regard all triggers with suspicion when moving to multi-master.  If you cannot eliminate triggers, at least find them, look at them carefully to ensure they do not generate conflicts, and test them very thoroughly before deployment.  Here's a query to help you hunt them down: 

SELECT trigger_schema, trigger_name 
  FROM information_schema.triggers;

6. Have a Plan for Sorting Out Mixed Up Data

Master/slave replication has its discontents, but at least sorting out messed up replicas is simple: re-provision from another slave or the master.  No so with multi-master topologies--you can easily get into a situation where all masters have transactions you need to preserve and the only way to sort things out is to track down differences and update masters directly.   Here are some thoughts on how to do this.
  1. Ensure you have tools to detect inconsistencies.  Tungsten has built-in consistency checking with the 'trepctl check' command.  You can also use the Percona Toolkit pt-table-checksum to find differences.  Be forewarned that neither of these works especially well on large tables and may give false results if more than one master is active when you run them.  
  2. Consider relaxing foreign key constraints.  I love foreign keys because they keep data in sync.  However, they can also create problems for fixing messed up data, because the constraints may break replication or make it difficult to go table-by-table when synchronizing across masters.  There is an argument for being a little more relaxed in multi-master settings. 
  3. Switch masters off if possible.  Fixing problems is a lot easier if you can quiesce applications on all but one master.  
  4. Know how to fix data.  Being handy with SQL is very helpful for fixing up problems.  I find SELECT INTO OUTFILE and LOAD DATA INFILE quite handy for moving changes between masters.  Don't forget SET SESSION LOG_FILE_BIN=0 to keep changes from being logged and breaking replication elsewhere.  There are also various synchronization tools like pt-table-sync, but I do not know enough about them to make recommendations.  
At this point it's probably worth mentioning commercial support.  Unless you are a replication guru, it is very comforting to have somebody to call when you are dealing with messed up masters.  Even better, expert advice early on can help you avoid problems in the first place.

(Disclaimer:  My company sells support for Tungsten so I'm not unbiased.  That said, commercial outfits really earn their keep on problems like this.)

7. Test Everything

Cutting corners on testing for multi-master can really hurt.  This article has described a lot of things to look for, so put together a test plan and check for them.  Here are a few tips on procedure:
  1. Set up a realistic pre-prod test with production data snapshots.  
  2. Have a way to reset your test environment quickly from a single master, so you can get back to a consistent state to restart testing. 
  3. Run tests on all masters, not just one.  You never know if things are properly configured everywhere until you try. 
  4. Check data consistency after tests.  Quiesce your applications and run a consistency check to compare tables across masters. 
It is tempting to take shortcuts or slack off, so you'll need to find ways to improve your motivation.  If it helps, picture yourself explaining to the people you work for why your DBMS servers have conflicting data with broken replication, and the problem is getting worse because you cannot take applications offline to fix things.  It is a lot easier to ask for more time to test.  An even better approach is to hire great QA people and give them time to do the job right.

Summary

Before moving to a multi-master replication topology you should ask yourself whether the trouble is justified.  You can get many of the benefits of multi-master with system-of-record architectures with a lot less heartburn.  That said, an increasing number of applications do require full multi-master across multiple sites.  If you operate one of them, I hope this article is helpful in getting you deployed or improving what you already have.

Tungsten does a pretty good job of multi-master replication already, but I am optimistic we can make it much better.  There is a wealth of obvious features around conflict resolution, data repair, and up-front detection of problems that will make life better for Tungsten users and reduce our support load.  Plus I believe we can make it easier for developers to write applications that run on multi-master DBMS topologies.  You will see more about how we do this in future articles on this blog.

PlanetMySQL Voting: Vote UP / Vote DOWN

If You *Must* Deploy Multi-Master Replication, Read This First

Апрель 30th, 2012
An increasing number of organizations run applications that depend on MySQL multi-master replication between remote sites.   I have worked on several such implementations recently.  This article summarizes the lessons from those experiences that seem most useful when deploying multi-master on existing as well as new applications.

Let's start by defining terms.  Multi-master replication means that applications update the same tables on different masters, and the changes replicate automatically between those masters.  Remote sites mean that the masters are separated by a wide area network (WAN), which implies high average network latency of 100ms or more.  WAN network latency is also characterized by a long tail, ranging from seconds due to congestion to hours or even days if a ship runs over the wrong undersea cable.

With the definitions in mind we can proceed to the lessons.  The list is not exhaustive but includes a few insights that may not be obvious if you are new to multi-master topologies.  Also, I have omitted issues like monitoring replication, using InnoDB to make slaves crash-safe, or provisioning new nodes.  If you use master/slave replication, you are likely familiar with these topics already.

1. Use the Right Replication Technology and Configure It Properly

The best overall tool for MySQL multi-master replication between sites is Tungsten.  The main reason for this assertion is that Tungsten uses a flexible, asynchronous, point-to-point, master/slave replication model that handles a wide variety of topologies such as star replication or all-to-all.  Even so, you have to configure Tungsten properly.  The following topology is currently my favorite:
  • All-to-all topology.  Each master replicates directly to every other master.  This handles prolonged network outages or replication failures well, because one or more masters can drop out without breaking  replication between the remaining masters or requiring reconfiguration.  When the broken master(s) return, replication just resumes on all sides.  All-to-all does not work well if you have a large number of masters.  
  • Updates are not logged on slaves.  This keeps master binlogs simple, which is helpful for debugging, and eliminates the possibility of loops.  It also requires some extra configuration if the masters have their own slaves, as would be the case in a Tungsten Enterprise cluster
There are many ways to set up multi-master replication replication, and the right choice varies according to the number of masters, whether you have local clustering, or other considerations.  Giuseppe Maxia has described many topologies, for example here, and the Tungsten Cookbook has even more details.

One approach you should approach with special caution is MySQL circular replication.  In topologies of three or more nodes, circular replication results in broken systems if one of the masters fails.  Also, you should be wary of any kind of synchronous multi-master replication across sites that are separated by more than 50 kilometers (i.e. 1-2ms latency).  Synchronous replication makes a siren-like promise of consistency but the price you pay is slow performance under normal conditions and broken replication when WAN links go down.

2. Use Row-Based Replication to Avoid Data Drift

Replication depends on deterministic updates--a transaction that changes 10 rows on the original master should change exactly the same rows when it executes against a replica.  Unfortunately many SQL statements that are deterministic in master/slave replication are non-deterministic in multi-master topologies.  Consider the following example, which gives a 10% raise to employees in department #35.

   UPDATE emp SET salary = salary * 1.1 WHERE dep_id = 35;

If all masters add employees, then the number of employees who actually get the raise will vary depending on whether such additions have replicated to all masters.  Your servers will very likely become inconsistent with statement replication.  The fix is to enable row-based replication using binlog-format=row in my.cnf.  Row replication transfers the exact row updates from each master to the others and eliminates ambiguity.

3. Prevent Key Collisions on INSERTs

For applications that use auto-increment keys, MySQL offers a useful trick to ensure that such keys do not  collide between masters using the auto-increment-increment and auto-increment-offset parameters in my.cnf.  The following example ensures that auto-increment keys start at 1 and increment by 4 to give values like 1, 5, 9, etc. on this server.

server-id=1
auto-increment-offset = 1
auto-increment-increment = 4

This works so long as your applications use auto-increment keys faithfully.  However, any table that either does not have a primary key or where the key is not an auto-increment field is suspect.  You need to hunt them down and ensure the application generates a proper key that does not collide across masters, for example using UUIDs or by putting the server ID into the key.   Here is a query on the MySQL information schema to help locate tables that do not have an auto-increment primary key. 

SELECT t.table_schema, t.table_name 
  FROM information_schema.tables t 
    WHERE NOT EXISTS 
      (SELECT * FROM information_schema.columns c
       WHERE t.table_schema = c.table_schema  
         AND t.table_name = c.table_name
         AND c.column_key = 'PRI'
         AND c.extra = 'auto_increment')

4. Beware of Semantic Conflicts in Applications

Neither Tungsten nor MySQL native replication can resolve conflicts, though we are starting to design this capability for Tungsten.  You need to avoid them in your applications.  Here are a few tips as you go about this.

First, avoid obvious conflicts.  These include inserting data with the same keys on different masters (described above), updating rows in two places at once, or deleting rows that are updated elsewhere.  Any of these can cause errors that will break replication or cause your masters to become out of sync.  The good news is that many of these problems are not hard to detect and eliminate using properly formatted transactions.  The bad news is that these are the easy conflicts.  There are others that are much harder to address.  

For example, accounting systems need to generate unbroken sequences of numbers for invoices.  A common approach is to use a table that holds the next invoice number and increment it in the same transaction that creates a new invoice.  Another accounting example is reports that need to read the value of accounts consistently, for example at monthly close.  Neither example works off-the-shelf in a multi-master system with asynchronous replication, as they both require some form of synchronization to ensure global consistency across masters.  These and other such cases may force substantial application changes.  Some applications simply do not work with multi-master topologies for this reason. 

5. Remove Triggers or Make Them Harmless

Triggers are a bane of replication.  They conflict with row replication if they run by accident on the slave.  They can also create strange conflicts due to weird behavior/bugs (like this) or other problems like needing definer accounts present.  MySQL native replication turns triggers off on slaves when using row replication, which is a very nice feature that prevents a lot of problems.  

Tungsten on the other hand cannot suppress slave-side triggers.  You must instead alter each trigger to add an IF statement that prevents the trigger from running on the slave.  The technique is described in the Tungsten Cookbook.  It is actually quite flexible and has some advantages for cleaning up data because you can also suppress trigger execution on the master.  

You should regard all triggers with suspicion when moving to multi-master.  If you cannot eliminate triggers, at least find them, look at them carefully to ensure they do not generate conflicts, and test them very thoroughly before deployment.  Here's a query to help you hunt them down: 

SELECT trigger_schema, trigger_name 
  FROM information_schema.triggers;

6. Have a Plan for Sorting Out Mixed Up Data

Master/slave replication has its discontents, but at least sorting out messed up replicas is simple: re-provision from another slave or the master.  No so with multi-master topologies--you can easily get into a situation where all masters have transactions you need to preserve and the only way to sort things out is to track down differences and update masters directly.   Here are some thoughts on how to do this.
  1. Ensure you have tools to detect inconsistencies.  Tungsten has built-in consistency checking with the 'trepctl check' command.  You can also use the Percona Toolkit pt-table-checksum to find differences.  Be forewarned that neither of these works especially well on large tables and may give false results if more than one master is active when you run them.  
  2. Consider relaxing foreign key constraints.  I love foreign keys because they keep data in sync.  However, they can also create problems for fixing messed up data, because the constraints may break replication or make it difficult to go table-by-table when synchronizing across masters.  There is an argument for being a little more relaxed in multi-master settings. 
  3. Switch masters off if possible.  Fixing problems is a lot easier if you can quiesce applications on all but one master.  
  4. Know how to fix data.  Being handy with SQL is very helpful for fixing up problems.  I find SELECT INTO OUTFILE and LOAD DATA INFILE quite handy for moving changes between masters.  Don't forget SET SESSION LOG_FILE_BIN=0 to keep changes from being logged and breaking replication elsewhere.  There are also various synchronization tools like pt-table-sync, but I do not know enough about them to make recommendations.  
At this point it's probably worth mentioning commercial support.  Unless you are a replication guru, it is very comforting to have somebody to call when you are dealing with messed up masters.  Even better, expert advice early on can help you avoid problems in the first place.

(Disclaimer:  My company sells support for Tungsten so I'm not unbiased.  That said, commercial outfits really earn their keep on problems like this.)

7. Test Everything

Cutting corners on testing for multi-master can really hurt.  This article has described a lot of things to look for, so put together a test plan and check for them.  Here are a few tips on procedure:
  1. Set up a realistic pre-prod test with production data snapshots.  
  2. Have a way to reset your test environment quickly from a single master, so you can get back to a consistent state to restart testing. 
  3. Run tests on all masters, not just one.  You never know if things are properly configured everywhere until you try. 
  4. Check data consistency after tests.  Quiesce your applications and run a consistency check to compare tables across masters. 
It is tempting to take shortcuts or slack off, so you'll need to find ways to improve your motivation.  If it helps, picture yourself explaining to the people you work for why your DBMS servers have conflicting data with broken replication, and the problem is getting worse because you cannot take applications offline to fix things.  It a lot easier to ask for more time to test.  An even better approach is to hire great QA people and give them time to do the job right.

Summary

Before moving to a multi-master replication topology you should ask yourself whether the trouble is justified.  You can get many of the benefits of multi-master with system-of-record architectures with a lot less heartburn.  That said, an increasing number of applications do require full multi-master across multiple sites.  If you operate one of them, I hope this article is helpful in getting you deployed or improving what you already have.

Tungsten does a pretty good job of multi-master replication already, but I am optimistic we can make it much better.  There is a wealth of obvious features around conflict resolution, data repair, and up-front detection of problems that will make life better for Tungsten users and reduce our support load.  Plus I believe we can make it easier for developers to write applications that run on multi-master DBMS topologies.  You will see more about how we do this in future articles on this blog.

PlanetMySQL Voting: Vote UP / Vote DOWN

Initial Reactions to MySQL 5.6

Апрель 28th, 2012

New versions of MySQL are always interesting to try out. Often they have features which I may have asked for myself so it’s satisfying to see them eventually appear on a system I use. Often other new features make life easier for the DBA. Finally we hope overall performance will improve and managing the server(s) will be come easier.

So I had a system which needs to make heavy writes, and performance was a problem, even when writing to SSDs. Checkpointing seemed to be the big issue and the ib_logfile size in MySQL 5.5 is limited to 4 GB. That seems a lot, but once MySQL starts to fill these files (and this happens at ~70% of the total I believe),  checkpointing kicks in heavily, and slows things down.  So the main reason for trying out MySQL 5.6 was to see how things performed with larger ib_logfiles. (Yes, MariaDB 5.5 can do this too.)

Things improved a lot for my specific workload which was great news, but one thing which I noticed was that initial install / setup time of a system with these large files increased a huge amount. (I was using 2 x 16000M files.)  That was noticeable even on a box with SSDs. On a box with HDs that would be horrible.

mysql_install_db now uses innodb to setup some system things and restarts the mysqld binary a couple of times. When you also have a large innodb_buffer_pool (I was trying on a box with 192GB of RAM) the startup of mysqld each time took quite a while.

So initial thoughts are that the bootstrapping probably does not need to use the normal innodb_buffer_pool_size setting (but should perhaps overwrite it with the minimal size needed).

Managing the size of the ib_logfiles is also something that requires restarting mysqld. Given the sort of experimenting that I was doing it would be nice to be able to dynamically configure this.  I can see many issues with changing the sizes of the existing files, but what seems easier to implement would be to be able to increase or decrease the number of files used (removing the older, higher number files once they are “used”/checkpointed), and that would provide a lot more freedom to the DBA.

I clone servers quite a bit and the extra 32 GB of these redo log files is a pain to clone/copy, and time consuming, so being able to reduce the number of files, and checkpoint prior to shutting the server down for cloning, but then re-enabling the normal value afterwards would save a lot of time cloning. So would be a handy feature.

I also tried playing with the new GTID features.  They look great. Except for one thing.  This expects master and slaves to have binlogs enabled, so that the GTID values are kept locally, and on startup.  I manage several boxes where for performance reasons the slaves do not have binlogs enabled.  If it dies the slave gets recloned.  Adding back the requirement for binlogs ( and log_slave_updates ) on a slave to enable this feature seems wrong. I can see the need that the different GTID values are kept somewhere, but don’t see the need to keep all binlog information, at least on a slave. Given the ratio of slaves to masters can be quite high that can be an issue.  If necessary write this information somewhere else, why not in an InnoDB table, so that on sever recovery you have consistent information with the rest of the database, something that might not happen with the binlogs… ?

Talking of binlogs one new feature I’m using, which I think is good is: binlog_row_image = minimal. This reduces the size of the binlogs and thus I/O.  I’m missing the next step which would be to compress those binlog events, and reduce binlog size further. That may not be an issue on many systems but several servers I manage write over 100 GB of binlogs a day. Reducing this further by being able to compress the binlogs would be nice, and having better control of expire_logs_days too (as the daily granularity can be too large in many cases for me) would help.

NOTE: In MySQL 5.6.5 with gtid_mode = ON , mysql_install_db crashes! Bug reported to Oracle, and I guess will get fixed soon.

I have seen a few other crashes in 5.6.4 and 5.6.5, I can’t provide links unfortunately as they’re not public. They are still not resolved.  I’d like to make the details of these crashes public as others may come across them, or have come across them before me, but these issues are not visible in http://bugs.mysql.com. Would be nice if they were but requires me duplicating information which is a pain.  Oracle, please make this easier. It is good for us all.

So with the little testing I’ve done so far MySQL 5.6 looks good. I need to spend more time with it and investigate the new features, many of which will make life easier.

When it finally goes GA I will have to spend a lot more time seeing how it behaves but I believe there are still a few issues which still need resolving prior to that happening.


PlanetMySQL Voting: Vote UP / Vote DOWN

Replication Is Bad for MySQL Temp Tables

Апрель 23rd, 2012
Experienced MySQL DBAs know that temp tables cause major problems for MySQL replication.  It turns out the converse is also true:  replication can cause major problems for temporary tables.

In a recent customer engagement we enabled Tungsten Replicator with a MySQL application that originally ran on a server that did not use replication.  QA promptly discovered reports that previously ran in 10 seconds were now running in as many minutes.  It turned out that the reports used temp tables to assemble data, and these were being written into the master binlog.  This created bloated binlogs and extremely slow reports.  We fixed the problem by enabling row replication (i.e., binlog-format=row in my.cnf).

A common DBA response to temp table problems is to try to eliminate them completely, as suggested in the excellent High Performance MySQL, 3rd Edition. (See p. 502.)  Elimination is a good philosophy when applications use temp tables to generate updates.  However, it does not work for reporting.  Temp tables allow you to stage data for complex reports across a series of transactions, then pull the final results into a report writer like JasperReports.  This modular approach is easy to implement and maintain afterwards.  Eliminating temp tables in such cases can create an unmaintainable mess.

The real solution with report temp tables is to keep them out of the master binlog.  Here is a list of common ways to do so.  Let me know if you know others.

* Turn off binlog updates.  Issue 'SET SESSION SQL_LOG_BIN=0' when generating reports.  The downside is that it requires SUPER privilege to set.  Also, if you make a code mistake and update normal tables with this setting, your changes will not be replicated.

* Use a non-replicated database.  Configure the master my.cnf with binlog-ignore-db as follows to ignore any update (including on temp tables) that is issued when database 'scratch' is the default database:

binlog_ignore_db = scratch

This approach does not require special privileges.  However coding errors or connection pool misconfigurations are obvious liabilities.  Your application must either connect to the scratch database or issue an explicit use command. Otherwise, temp table operations will be logged, as in the following example:

use not_scratch;
create temporary table scratch.report1_temp(name varchar(256), entry_time date, exit_time date);

* Use a slave with the binlog disabled.  Remove the log-bin option from my.cnf.  This works well if you have extra reporting slaves that are caught up.  However, it may not work if the reports must be fully up-to-date or you need the ability to promote the slave quickly to a master, in which case the binlog must be enabled.  

* Use row replication.  You can set row replication at the session level using 'SET SESSION binlog_format=row', which requires SUPER privilege, or overall by setting binlog-format in my.cnf.  In this case CREATE TEMPORARY TABLE and updates on temp tables do not appear in the binlog at all.  The downside of enabling row replication fully is that it can lead to bloated logs and blocked servers if you have very large transactions.  SQL operations like DELETE that affect multiple rows are stored far more compactly in statement replication.  Also, reloading mysqldump files can be very slow in row replication compared to statement replication, which can handle block inserts generated by the --extended-insert option.

The proper solution to keep replication from hurting your use of temp tables will vary depending on your application as well as the way you run your site.  For my money, though, this is a good example of where row replication really helps and deserves a closer look.  

MySQL could use some feature improvements in the area of temp tables and replication.  I find it surprising that mixed mode replication does not fully suppress temp table binlog updates.  Only row replication does so.   Second, it would be great to have a CREATE TABLE option to suppress logging particular tables to the binlog.  This would allow applications to make the logging decision at schema design time.  Finally, global options to suppress binlogging of specific table types, such as temp tables and MEMORY tables would be useful.  Perhaps we will see some of these in future MySQL releases.  


PlanetMySQL Voting: Vote UP / Vote DOWN

Percona MySQL Conference and Expo Week in Review

Апрель 18th, 2012

Thanks to all of those who came by our booth and to see Leif’s presentation on Read Optimization, and to my Lightning Talk on OLTP and OLAP at the Percona MySQL Conference and Expo. It was an incredible week and a great place to launch TokuDB v6.0 from! A big thanks to Percona for a great event, to Pythian for a fantastic dinner, and to SkySQL for a worthwhile follow on. We are also very grateful to Network World for giving us a product of the week award, and to Bloor Research for an insightful review of TokuDB v6.0.

Mr. Bill Gets Hammered by Big Data

For those who missed it, here is a copy of Leif’s presentation with a good photo from Percona. Thanks to Sheeri for her tweet as well. In addition, here is a copy of my Lightning Talk (in case you were too distracted by Mr.Bill). There were some great photos taken by Mark Lehmann (including the one shown above, and those in the “Scanner Wars“) as well as Percona. Thanks to Erin,  SheeriAmrith and Ernie for their tweets too!

I considered a detailed conference review, but others have already captured the event so well that there was little to add. In case you missed it, there are great write-ups by O’Reilly, Percona, Shlomi, and several others.

Thanks again to those who came by!

 


PlanetMySQL Voting: Vote UP / Vote DOWN

Why a statement can be unsafe when it uses LIMIT clause?

Апрель 17th, 2012

MySQL 5.1 or newer can sometimes start throwing a strange message into an error log. The message states that a query was unsafe for binary logging along with some additional information. What does it mean? Is it a problem?

From time to time you might spot MySQL error log filling with the following warning:

“[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. The statement is unsafe because it uses a LIMIT clause. This is unsafe because the set of rows included cannot be predicted. Statement: DELETE FROM score WHERE user_id = 12345 AND created = ’2012-04-15′ LIMIT 1″

If binary logging is enabled and the log format is set to STATEMENT, MySQL generates such message when it considers that a query is ambiguous and could behave differently each time it executes against the same data set. Such situation could happen, for example, on a replication slave, or on a restored backup where binary logs are used for point-in-time recovery.

Why? The reason lies in the relational database theory, which defines table as a set of rows, while a set does not assume any particular order. What if there were a few rows matching user_id = 12345 and created = '2012-04-15'? Well, in theory, such DELETE statement could read and remove a different row each time, because if all share the same values in user_id and created, how to decide which one comes first or last? Of course, a database implementation, such as MySQL, typically relies on some implicit order even if the theory behind it does not. But then again, there may be a different problem. Even if MySQL uses some implicit sort order for storing rows, does it guarantee the order is always the same? No.

On master and slave rows can be stored in different order

Two rows appear in different order on Master and Slave. The two databases are still identical from the SQL point of view, although the way they organized data is different. Now, if our statement started deleting rows from the top, it would remove two different rows from each instance and the two databases would drift out of sync. This is what MySQL is warning you about by saying This is unsafe because the set of rows included cannot be predicted. The problem does not exist when there is no LIMIT clause, because then MySQL simply removes all rows matching the criteria and it does not matter to the final result in which order it does that.

So is seeing such warning message a problem? Definitely not, if binary logs aren’t used for anything – not replication, not recovery. Otherwise the answer would have to be yes, although it does not imply there is anything wrong with the data already. And although chances are that nothing bad will happen even when MySQL keeps complaining, you should never rely on luck – fix the problem.

The real option to address this is moving to either MIXED or ROW format for binary logging, which is defined in binlog_format MySQL option. And what is cool, it can also be set at runtime with SET (for current session only) or SET GLOBAL (globally), so no restart is even necessary. However don’t rush to change it just yet. Be sure to read and understand what the differences between all these formats are first as making this change might break something that your applications rely on. But in the end, although it seems like a nasty workaround, it’s even possible to use ROW format for just one specific query by using the following sequence:

SET SESSION binlog_format = ROW;
DELETE ...;
SET SESSION binlog_format = STATEMENT;

An alternative approach to alleviate the potential problem could increasing the chances the same row gets chosen, however it can only be done under the assumption that all instances contain identical data.

To make the example a little bit more complete, here is the query again along with the underlying data structure:

DELETE FROM score WHERE user_id = 12345 AND created = '2012-04-15' LIMIT 1

CREATE TABLE `score` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`created` date NOT NULL,
`score` int(11) NOT NULL,
[..]
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`, `created`)
)

If data on master and slave is identical, to prevent potential problems, one could use ORDER BY on a unique column, such as id for example. That could help ensuring that such query would always pick the youngest or the oldest row out of the matching ones. There should be no confusion anymore as the order would be always the same regardless of the circumstances. Again, this may fail to work if some instances involved are not identical to others (e.g. master keeps only recent data, while some slave holds the full archive).

The modified query would look like this:

DELETE FROM score WHERE user_id = 12345 AND created = '2012-04-15' ORDER BY id ASC LIMIT 1

The warning message discussed in this articles isn’t necessarily a nasty problem that needs urgent attention, however it might be a sign of problems ahead. It is definitely worth an investigation to verify whether it may or may not be causing any damage to data. A fix is relatively trivial, although it should be done carefully and with analysis of the possible consequences of implementing the changes.


PlanetMySQL Voting: Vote UP / Vote DOWN

Why a statement can be unsafe when it uses LIMIT clause?

Апрель 17th, 2012

MySQL 5.1 or newer can sometimes start throwing a strange message into an error log. The message states that a query was unsafe for binary logging along with some additional information. What does it mean? Is it a problem?

From time to time you might spot MySQL error log filling with the following warning:

“[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. The statement is unsafe because it uses a LIMIT clause. This is unsafe because the set of rows included cannot be predicted. Statement: DELETE FROM score WHERE user_id = 12345 AND created = ’2012-04-15′ LIMIT 1″

If binary logging is enabled and the log format is set to STATEMENT, MySQL generates such message when it considers that a query is ambiguous and could behave differently each time it executes against the same data set. Such situation could happen, for example, on a replication slave, or on a restored backup where binary logs are used for point-in-time recovery.

Why? The reason lies in the relational database theory, which defines table as a set of rows, while a set does not assume any particular order. What if there were a few rows matching user_id = 12345 and created = '2012-04-15'? Well, in theory, such DELETE statement could read and remove a different row each time, because if all share the same values in user_id and created, how to decide which one comes first or last? Of course, a database implementation, such as MySQL, typically relies on some implicit order even if the theory behind it does not. But then again, there may be a different problem. Even if MySQL uses some implicit sort order for storing rows, does it guarantee the order is always the same? No.

On master and slave rows can be stored in different order

Two rows appear in different order on Master and Slave. The two databases are still identical from the SQL point of view, although the way they organized data is different. Now, if our statement started deleting rows from the top, it would remove two different rows from each instance and the two databases would drift out of sync. This is what MySQL is warning you about by saying This is unsafe because the set of rows included cannot be predicted. The problem does not exist when there is no LIMIT clause, because then MySQL simply removes all rows matching the criteria and it does not matter to the final result in which order it does that.

So is seeing such warning message a problem? Definitely not, if binary logs aren’t used for anything – not replication, not recovery. Otherwise the answer would have to be yes, although it does not imply there is anything wrong with the data already. And although chances are that nothing bad will happen even when MySQL keeps complaining, you should never rely on luck – fix the problem.

The real option to address this is moving to either MIXED or ROW format for binary logging, which is defined in binlog_format MySQL option. And what is cool, it can also be set at runtime with SET (for current session only) or SET GLOBAL (globally), so no restart is even necessary. However don’t rush to change it just yet. Be sure to read and understand what the differences between all these formats are first as making this change might break something that your applications rely on. But in the end, although it seems like a nasty workaround, it’s even possible to use ROW format for just one specific query by using the following sequence:

SET SESSION binlog_format = ROW;
DELETE ...;
SET SESSION binlog_format = STATEMENT;

An alternative approach to alleviate the potential problem could increasing the chances the same row gets chosen, however it can only be done under the assumption that all instances contain identical data.

To make the example a little bit more complete, here is the query again along with the underlying data structure:

DELETE FROM score WHERE user_id = 12345 AND created = '2012-04-15' LIMIT 1

CREATE TABLE `score` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`created` date NOT NULL,
`score` int(11) NOT NULL,
[..]
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`, `created`)
)

If data on master and slave is identical, to prevent potential problems, one could use ORDER BY on a unique column, such as id for example. That could help ensuring that such query would always pick the youngest or the oldest row out of the matching ones. There should be no confusion anymore as the order would be always the same regardless of the circumstances. Again, this may fail to work if some instances involved are not identical to others (e.g. master keeps only recent data, while some slave holds the full archive).

The modified query would look like this:

DELETE FROM score WHERE user_id = 12345 AND created = '2012-04-15' ORDER BY id ASC LIMIT 1

The warning message discussed in this articles isn’t necessarily a nasty problem that needs urgent attention, however it might be a sign of problems ahead. It is definitely worth an investigation to verify whether it may or may not be causing any damage to data. A fix is relatively trivial, although it should be done carefully and with analysis of the possible consequences of implementing the changes.


PlanetMySQL Voting: Vote UP / Vote DOWN

Mysql HA solutions

Апрель 12th, 2012
Lets see what HA solutions can be designed in mysql and where are they suited. 1. Single master - single slave. M(RW)|S(R) A simple master slave solution can be used for a small site - where all the inserts go into the master and some (non-critical) requests are served from the slave. In case if the master crashes, the slave can be simply promoted as the master - once it has replicated the "
PlanetMySQL Voting: Vote UP / Vote DOWN