Archive for the ‘transactions’ Category

Consistent transactions between storage engines

Февраль 5th, 2012

You may not realize it, but in MariaDB 5.2 and earlier and in MySQL up to version 5.5, START TRANSACTION WITH CONSISTENT SNAPSHOT does not give any guarantees of consistency between different storage engines.

For example, suppose you have two transactions which run in parallel:

Transaction T1:

BEGIN;
    SET @t = NOW();
    UPDATE xtradb_table SET a= @t WHERE id = 5;
    UPDATE pbxt_table SET b= @t WHERE id = 5;
    COMMIT;

Transaction T2:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    START TRANSACTION WITH CONSISTENT SNAPSHOT;
    SELECT t1.a, t2.b
      FROM xtradb_table t1 INNER JOIN pbxt_table t2 ON t1.id=t2.id
    WHERE t1.id = 5;

In the above case, it is possible, even with a "consistent" snapshot, to see the changes in a transaction only in InnoDB/XtraDB tables, and not in PBXT tables.

Naturally, it would be much better if the changes were visible no matter the storage engine used by the table. Thankfully this is possible, thanks to the introduction in MariaDB 5.3 of group commit and an enhanced storage engine API for COMMIT which allows storage engines to coordinate commit ordering and visibility with each other and with the binary log.

With these improvements, the START TRANSACTION WITH CONSISTENT SNAPSHOT statement in MariaDB 5.3 has been enhanced to ensure consistency in-between storage engines which support the new API. Currently the storage engines which support this are XtraDB (MariaDB's enhanced version of InnoDB) and PBXT. In addition, the binary log, while not a storage engine as such, also supports the new API and can provide a binlog position consistent with storage engine transaction snapshots.

This means that with transaction isolation level at least REPEATABLE READ, the START TRANSACTION WITH CONSISTENT SNAPSHOT statement can be used to ensure that queries will see a transaction-consistent view of the database between storage engines.

No longer is it possible for a query to see the changes from some transaction T in XtraDB tables without also seeing the changes T makes to PBXT tables.

More information on this can be found in the AskMonty Knowledgebase .


PlanetMySQL Voting: Vote UP / Vote DOWN

Better scaling of read-only workloads

Декабрь 20th, 2011

The problem and its cause

There have been several complaints over the years about InnoDB’s inability to scale beyond 256 connections. One of the main issues behind this scalability bottleneck was the read view creation that is required for MVCC (Multi Version Concurrency Control) to work. When the user starts a transaction this is what InnoDB does under the hood:

  • Create or reuse a transaction instance – usually it is reused, the transactions are reused from a pool (trx_sys_t::mysql_trx_list).
  • Initialize the transaction start time and assign a rollback segment
  • Append the transaction to an active  transaction list ordered on trx_t::id in descending order

The append to  the trx_sys_t::trx_list and corresponding remove during commit is covered by trx_sys_t::mutex. After the transaction is “started” and if the transaction has an isolation greater than or equal to REPEATABLE-READ then before the first record/row is accessed by the transaction, InnoDB creates a view (snapshot) of the running system state. It does this by examining the transactions that are active at the time of the MVCC snapshot, so that their changes can be excluded from the creating transaction’s read view. This read view creation is also covered by the trx_sys_t::mutex. As the number of active transactions in the system increases this read view creation takes longer and longer. This increases the wait times on the trx_sys_t::mutex (during transaction start and commit) and once threads are forced to wait on a condition variable (in contrast to simply spinning while waiting for the mutex) the system throughput drops dramatically.

The solution

While investigating this problem there were two observations that I made:

  • Read only transactions should not be considered in the MVCC snapshot
  • Auto commit non-locking read-only selects should not be in the trx_sys_t::trx_list at all

For the first to work we need to tag the transactions as READ ONLY when the transaction is started e.g.,

  •       START TRANSACTION READ ONLY;

I will not be discussing this functionality in this blog because the syntax for this doesn’t exist in MySQL (yet). However, the functionality exists in InnoDB to handle this case and is  in the 5.6.4 release. Once the above syntax exists, InnoDB can take advantage of the new syntax trivially. What I want to talk about is the second case. This special case can be detected by InnoDB using existing state information and handled transparently without any syntax change in user applications and is fully functional in the 5.6.4 release.

InnoDB transaction life cycle redesign

Split the trx_sys_t::trx_list (the active transactions list) into two, trx_sys_t::ro_trx_list and trx_sys_t::rw_trx_list. Only transactions that are in the trx_sys_t::rw_trx_list are taken into consideration when creating the MVCC snapshot. For a read-only heavy work load the benefits are obvious,  the smaller size of the RW active transaction list makes the read view creation for MVCC (and purge) very fast. For auto-commit read-only non-locking selects the additional benefit is that we don’t need to acquire the trx_sys_t::mutex at all because we don’t put them on the active list. This removes the bottleneck around the trx_sys_t::mutex and improves concurrency and scalability.

Auto-commit read-only non-locking transactions go from state NOT STARTED -> ACTIVE -> NOT STARTED, in contrast to locking read-only (and read-write) transactions which go from state NOT STARTED -> ACTIVE -> COMMIT TO MEMORY -> NOT STARTED. The additional advantage in skipping the COMMIT TO MEMORY state is that we know that they cannot acquire any locks and therefore it is pointless to acquire the lock_sys_t::mutex and attempt lock release. Also, during COMMIT because they are not on any active transaction list we don’t need to acquire the trx_sys_t::mutex to remove them from the list, improving concurrency and performance further.

Changes to transaction state visibility

Currently (5.6.4) doesn’t display the auto-commit read-only non-locking selects in “SHOW ENGINE INNODB STATUS“, however they are visible in the INFORMATION_SCHEMA.innodb_trx table. The innodb_trx table has two new fields that can be queried to determine whether  a transaction is tagged as a read-only transaction and additionally whether it qualifies for the special handling of auto-commit read-only non-locking select. The new fields are:

  • trx_read_only – 0 or 1 (INT)
  • trx_autocommit_non_locking – 0 or 1 (INT)

Some additional minor tweaks, read-only transactions aren’t assigned a rollback segment and if they are flagged as autocommit non-locking selects then we only set the start time once every 32 times, his reduces the overhead of a system call.

Test results

These tests were run by Alexey Stroganov (a.k.a Ranger) using Sysbench (with –oltp-skip-trx=on), wl6046 refers to the internal worklog number of this performance fix. Note how InnoDB scales easily up to 4K threads on the 24 host and 1.5K threads on the 8 core host, there is very little (negligible) drop in the TPS as the number of threads is cranked up. In particular note how the peak TPS on the 24 core host is nearly double :-)

Test POINT_SELECT
  • QUERY - SELECT c FROM sbtest WHERE id=N


Test SIMPLE_RANGES

  • QUERY – SELECT c FROM sbtest WHERE id BETWEEN N AND M

The same result for SIMPLE RANGES, at lower thread levels 5.6 has less TPS than 5.5 on the 8 core host however that is due to another known and unrelated issue that is currently being worked on actively.

Conclusion

This is a significant step forward in InnoDB read-only performance and will allow InnoDB to scale up to a very high number of concurrent queries and take advantage of greater number of processors with the improved parallelism. Note: mixed workloads with a read-only heavy component will also benefit from this improvement. Currently, only if the read-only transactions are auto-commit non-locking selects. The better news is that with this split of the active transaction list into two, we can now optimize deadlock detection, lock release and transaction scheduling around lock grants and waits, stay tuned!


PlanetMySQL Voting: Vote UP / Vote DOWN

SQL Locking and Transactions – OSDC 2011 video

Ноябрь 24th, 2011

This recent session at OSDC 2011 Canberra is based on part of an Open Query training day, and (due to time constraints) without much of the usual interactivity, exercises and further MySQL specific detail. People liked it anyway, which is nice! The info as presented is not MySQL specific, it provides general insight in how databases implement concurrency and what trade-offs they make.

See http://2011.osdc.com.au/sqll for the talk abstract.


PlanetMySQL Voting: Vote UP / Vote DOWN

Dispelling some unintentional MySQL FUD

Ноябрь 28th, 2010
Sakila dispelling FUD There are three types of FUD: the first and more genuine is (#1) the intentional spreading of falsehood, mostly to gain some marketing advantage over a competing product. While I despise this practice, I understand it.
Then there is (#2) FUD spread by ignorance, when the originators are so blindly enraged by their hatred for a product that they don't care about getting the facts straight.
And finally, there is a third kind, not less dangerous, which is (#3) the spreading of FUD with good intentions, when the authors believe that they have the facts straight and they want to help.
I have recently come across two examples of unintentional FUD. For different reasons, my comments to these public cases did not get through, and then I have to say something about that here in my blog.

MySQL is not ACID complaint

This surprising piece of news came in the blog of a company that calls itself the remote DBA experts.
The claim is this: if I insert a record in a table and then issue a ROLLBACK command, the record is not rolled back.
Anyone who has a minimal knowledge of MySQL knows about InnoDB tables (luckily for the poster, InnoDB is default in MySQL 5.5.6, which he was testing) and autocommit.
Reading through the example, one sees that the poster did not know about this piece of information. In MySQL, autocommit is ON by default. So if you want to rollback a record, you need to deactivate it. This is not optimal, and it can be debated, but if you read the docs, you don't claim something that is simply the result of your lack of knowledge. MySQL has shortcomings, but being unable to rollback a record is not one of them. Hence, this is FUD type #2.
Why I am writing all this here and not as a comment in that blog? Because I did post a comment, on November 23rd, but as of today, it has not been approved yet. The same is true for comments posted by other more knowledgeable people.

MySQL licenses. When it's free and when you need to pay for one.

This article is well intentioned. MySQL Licenses: The Do's and Don'ts of Open Source, or What's All the Fuss About? is a well thought piece, with practical examples, to help users decide what to do with MySQL licensing, i.e. when they need to pay and when they don't. Unfortunately, the article contains some unintentional confusion, and therefore leaves the readers with more wrong ideas than they had before.
I left a long comment on that blog, but for some unfathomable reason it was reduced to a tiny piece, and thus the need for explaining the matter here again.
The poster says this:
I make commercial software, which needs to have MySQL installed. My customers can use my commercial software, for which they do need to buy a license, in combination with the MySQL database engine, for which they don't need to pay. Because the MySQL engine is not embedded in my commercial software and I don't redistribute MySQL together with my software, I don't need a commercial license for MySQL and neither do my customers.
I am afraid that this wishful information is not correct. The GPL FAQ states it clearly:
If a library is released under the GPL (not the LGPL), does that mean that any program which uses it has to be under the GPL or a GPL-compatible license?
Yes, because the program as it is actually run includes the library.

Another quote:
However... as long as I have no desire to sell the embedded MySQL source code commercially, I can let the GPL license apply.
Also this is not true. The GPL does not regulate commercial transactions. It only deals with distribution of software. If I want to distribute a public domain but GPL-incompatible software linked to a GPL application or library, I am violating the GPL, even if I don't charge anything.

Another source of disinformation is "If you decide to pay for a MySQL license, you don't actually pay for the software."
This is also incorrect. Oracle sells two kind of things with MySQL. One thing is a subscription to services (MySQL Enterprise). If you buy this, you are not getting a license (unless you ask for it explicitly) but an agreement about services for a given periods.
The other thing that Oracle sells is licenses. They can do it because they own the source code, and they can decide to release it either as GPL (which is what you download from the MySQL site) or with a commercial license. If you ask for a license, you will most definitely get one. You can also get a license together with a subscription, if you are so inclined, but that doesn't mean that you aren't buying a license.
The important thing to understand to put the matter in perspective, is that the above information about licensing was still true before 2008, when MySQL was owned by MySQL AB, and it is still true today. Oracle, despite all the preemptive accusations of being ill intentioned, has not changed the rules of the game.

PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Idiosyncrasies That Bite

Июнь 28th, 2010

The following are my slides that I presented at ODTUG Kaleidoscope 2010. This presentation talks about the MySQL defaults including a non-transactional state, silent data truncations, date management, and transaction isolation options. These are all critical for data integrity and consistency. I also cover in-depth topics including SQL_MODE, character sets and collations.


PlanetMySQL Voting: Vote UP / Vote DOWN

Benchmarking MySQL ACID performance with SysBench

Июнь 21st, 2010

A couple of question I get a lot from MySQL customers is “how will this hardware upgrade improve my transactions per second (TPS)” and “what level of TPS will MySQL perform on this hardware if I’m running ACID settings?” Running sysbench against MySQL with different values for per-thread and global memory buffer sizes, ACID settings, and other settings gives me concrete values to bring to the customer to show the impact that more RAM, faster CPUs, faster disks, or cnf changes have on the server. Here are some examples for a common question: “If I’m using full ACID settings vs non-ACID settings what performance am I going to get from this server?”

Let’s find out by running sysbench with the following settings (most are self explanatory – if not the man page can explain them):

  • sysbench –test=oltp –db-driver=mysql –oltp-table-size=1000000 –mysql-engine-trx=yes –oltp-test-mode=complex –oltp-read-only=off –oltp-dist-type=special –max-requests=0 –num-threads=8 –max-time=120 –init-rng=on run

MySQL Settings:

In the first test MySQL is set to the following ACID related settings. This will give us results for TPS performance without full ACID compliance – very common settings on a server that is handling blogs, ad serving, general business websites, and other roles where full ACID is not required and performance is valued over the benefits of full ACID. These are important settings when we look at the difference in performance when we change to full ACID in the second test.

  • innodb_flush_log_at_trx_commit = 0
  • sync_binlog=0
  • transaction-isolation=REPEATABLE-READ

System configuration and InnoDB buffer pool size:

  • XEON E5345 Series 2.33ghz 8-core, 16GB RAM, Local SATA 7.2K disks
  • innodb_buffer_pool_size = 10G

Full result set from sysbench:

Summary OLTP test statistics:

  • queries performed:
  • transactions:                        172426 (1436.83 per sec.)
  • read/write requests:                 3276664 (27304.51 per sec.)
  • other operations:                    344882 (2873.91 per sec.)

Take away results:

We can simplify the results by looking at the following TPS results for this non-ACID test:

  • transactions:                        172426 (1436.83 per sec.)

Let’s go ahead and run the test again with different ACID settings. This will give us the TPS results for full ACID compliance:

  • innodb_flush_log_at_trx_commit = 1
  • sync_binlog=1
  • transaction-isolation=REPEATABLE-READ

We get the following results for TPS:

  • transactions:                     3197   (26.58 per sec.)
  • read/write requests:                 60743  (505.04 per sec.)
  • other operations:                    6394   (53.16 per sec.)

Final Results:

So as you can see the difference between full ACID settings and not (on the same server with only those values on the cnf being changed) results in a huge difference in performance on this standard database server. We can now hand this data to the customer and they will know what impact the settings will have on their application’s performance and what to expect when running full ACID vs non-ACID.

More info on using sysbench here: http://sysbench.sourceforge.net


PlanetMySQL Voting: Vote UP / Vote DOWN

When commit appears to fail

Октябрь 24th, 2009
So you're using explicit transactions. Everything appears to work (every query gives the expected result) until you get to COMMIT.

Then you get an exception thrown from COMMIT. What happened?

Usually this would be because the server has been shut down, or you've lost the connection.

The problem is, that you can't assume that the commit failed, but you also can't assume it succeeded.

A robust application must make NO ASSUMPTION about whether a failed commit did, indeed, commit the transaction or not. It can safely assume that either all or none of it was committed, but can't easily tell which.

So the only way to really know is to have your application somehow remember that the transaction MIGHT have failed, and check later.

Possible solutions:
  • Ignore it and deal with any inconsistencies manually, or decide that you don't care :)
  • Write your entire transaction such that if it is repeated having been committed previously, it is a no-op or has no harmful side effects (e.g. change INSERT to INSERT IGNORE to avoid unique index violations). This is rather difficult for complex transactions, but works for some.
  • Check, in your code, even if the commit apparently succeeded that it really did. If you discover that it didn't, then retry or report failure to the caller / user.
  • Perform another transaction to "undo" or "cancel" whatever the original transaction did if commit failed (Problem 1: this may itself fail to commit; problem 2: for a short time, an inconsistency exists)
None of these is ideal, and I'd like to think that this never happens. But if you do enough transactions, it's going to happen sooner or later (networks and servers do fail, unfortunately)

In testing, we've found that this happens so infrequently that even deliberate malice (kill -9 of mysql while loads of transactions are processing etc) fails to reproduce it reliably. In the end, a set of iptables rules which blocks the response from a commit was constructed.

PlanetMySQL Voting: Vote UP / Vote DOWN