Archive for the ‘InnoDB’ Category

1 Billion Insertions – The Wait is Over!

Январь 26th, 2012

iiBench measures the rate at which a database can insert new rows while maintaining several secondary indexes. We ran this for 1 billion rows with TokuDB and InnoDB starting last week, right after we launched TokuDB v5.2. While TokuDB completed it in 15 hours, InnoDB took 7 days.

The results are shown below. At the end of the test, TokuDB’s insertion rate remained at 17,028 inserts/second whereas InnoDB had dropped to 1,050 inserts/second. That is a difference of over 16x. Our complete set of benchmarks for TokuDB v5.2 can be found here.

Benchmark Details: Ubuntu 10.10; 2x Xeon X5460; 16GB RAM; 8x 146GB 10k SAS in RAID10. Each data point is the average insertion rate for the last 2 million rows. 

We developed the iiBench benchmark to measure performance for a use case that occurs commonly in production applications, such as online advertising, social media, and network management.

iiBench simulates a pattern of usage for always-on applications that:

  • Require fast query performance and hence require indexes
  • Have high data insert rates
  • Cannot wait for offline batch processing and hence require the indexes be maintained as data comes in

Note that iiBench was created as an open-source benchmark, which allows others to freely use it, extend it, and contribute their changes back. We originally unveiled the benchmark in the context of a challenge issued at the 2008 OpenSQL camp. Since then, iiBench has been downloaded and used many times, and ported by the community (in this case, Mark Callaghan) to a Python Script.

Please let us know any feedback you have on iiBench. For additional information on…

  • iibench overview click here
  • TokuDB version 5.2 Overview click here
  • TokuDB version 5.2 Performance, including iibench, SysBench, Compression, and TPCC-like, click here

PlanetMySQL Voting: Vote UP / Vote DOWN

Building MariaDB 5.1 on Windows

Январь 23rd, 2012

Recently, I found myself needing MariaDB 5.1.60 for Windows for some testing purposes. Therefore, I needed to build it from source. I ended up using what I’d call a “blend” of the commands listed in this “how-to” and the readme file INSTALL-WIN-SOURCE, so I thought I’d post those steps.

  1. Download 5.1.60 MariaDB source from here.
  2. cd C:\mariadb-5.1
  3. win\configure.js
  4. cmake .
  5. VS: File -> Open -> Solution -> MySql.sln
  6. VS: Build -> Build Solution
  7. VS: Right-click “PACKAGE” -> Build (in “Solution Explorer” View)

That’s it.

Let’s fire it up:

MariaDB> select version();
+----------------------+
| version()            |
+----------------------+
| 5.1.60-MariaDB-debug |
+----------------------+

MariaDB> show global variables like 'innodb_version';
+----------------+-------------+
| Variable_name  | Value       |
+----------------+-------------+
| innodb_version | 1.0.17-13.0 |
+----------------+-------------+

MariaDB> show engines;
+------------+---------+---------------------------...
| Engine     | Support | Comment		   ...
+------------+---------+---------------------------...
| CSV        | YES     | CSV storage engine	   ...
| InnoDB     | DEFAULT | Percona-XtraDB, Supports t...
| PBXT       | YES     | High performance, multi-ve...
| MARIA      | YES     | Crash-safe tables with MyI...
| MyISAM     | YES     | Default engine as of MySQL...
| FEDERATED  | YES     | FederatedX pluggable stora...
| MRG_MYISAM | YES     | Collection of identical My...
| MEMORY     | YES     | Hash based, stored in memo...
+------------+---------+---------------------------...

..

For reference, here are my full outputs:

C:\Users\Chris>cd ..\..\mariadb-5.1

C:\mariadb-5.1>win\configure.js

C:\mariadb-5.1>cmake .
-- Check for working C compiler: cl
-- Check for working C compiler: cl -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: cl
-- Check for working CXX compiler: cl -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
build CSV as static library (libcsv.lib)
build FEDERATEDX as static library (libfederatedx.lib)
build HEAP as static library (libheap_s.lib)
build MARIA as static library (libmaria_s.lib)
build MYISAM as static library (libmyisam_s.lib)
build MYISAMMRG as static library (libmyisammrg_s.lib)
build PBXT as static library (libpbxt_s.lib)
build XTRADB as static library (libxtradb.lib)
build ARCHIVE as DLL (ha_archive.dll)
build BLACKHOLE as DLL (ha_blackhole.dll)
build EXAMPLE as DLL (ha_example.dll)
build FEDERATED as DLL (ha_federated.dll)
build INNODB_PLUGIN as DLL (ha_innodb_plugin.dll)
-- Configuring done
-- Generating done
-- Build files have been written to: C:/mariadb-5.1

Open Visual Studio -> File -> Open -> Project/Solution -> Select C:\mariadb-5.1.60\MySql.sln

Build Solution Output:

========== Build: 79 succeeded, 0 failed, 2 up-to-date, 2 skipped ==========

Package Build Output:

1>------ Build started: Project: PACKAGE, Configuration: Debug Win32 ------
1>
1>Performing Post-Build Event...
1>CPack: Create package using NSIS
1>CPack: Install projects
1>CPack: - Install project: MySql
1>CPack: -   Install component: Unspecified
1>CPack: -   Install component: headers
1>CPack: -   Install component: mysqltest
1>CPack: -   Install component: runtime
1>CPack: -   Install component: scripts
1>CPack: -   Install component: sqlbench
1>CPack: Compress package
1>CPack: Finalize package
1>CPack: Package C:/mariadb-5.1/MariaDB-5.1.60-win32.exe generated.
1>Build log was saved at "file://c:\mariadb-5.1\PACKAGE.dir\Debug\BuildLog.htm"
1>PACKAGE - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 81 up-to-date, 0 skipped ==========

 
 


PlanetMySQL Voting: Vote UP / Vote DOWN

Fractal Tree Indexes and Mead – MySQL Meetup

Январь 11th, 2012

 
Thanks again to Sheeri Cabral  for having me at the Boston MySQL Meetup on Monday for the talk on “Fractal Tree® Indexes – Theoretical Overview and Customer Use Cases.” The crowd was very interactive, and I appreciated that over 50 people signed up for the event and left some very positive comments and reviews.

In addition, the conversation spilled over late into the night as we made our way over to nearby Mead Hall afterwards for a few drinks, some food, and to continue the discussion.

The presentation is available here.

As a brief overview – most databases employ B-trees to achieve a good tradeoff between the ability to update data quickly and to search it quickly. It turns out that B-trees are far from the optimum in this tradeoff space. This led to the development at MIT, Rutgers and Stony Brook of Fractal Tree indexes. Fractal Tree indexes improve MySQL® scalability and query performance by allowing greater insertion rates, supporting rich indexing and offering efficient compression. They can also eliminate operational headaches such as dump/reloads, inflexible schemas and partitions.

The presentation provides an overview on how Fractal Tree indexes work, and then gets into some specific product features, benchmarks, and customer use cases that show where people have deployed Fractal Tree indexes via the TokuDB® storage engine.
 


PlanetMySQL Voting: Vote UP / Vote DOWN

Tracking Server Variables, Documentation, Manuals, Changelogs for MySQL, InnoDB, MariaDB, and XtraDB

Январь 9th, 2012

I find myself constantly looking up server variables (and manuals and changelogs) for MySQL, MariaDB, and XtraDB, which versions they are in, and so forth. So I finally created a couple pages which contain the links to all of these various bits of information across the various flavors of MySQL.

I’ve been using them every day, so I thought some others might want to bookmark these as well.

I’ve created the following:

o Changelogs
o Documentation
o Server Variables
o InnoDB Plugin Versions

The Changelogs page contains links for MySQL 3.23 up through 5.6, MariaDB 5.1 – 5.3, XtraDB 5.1 – 5.5, Xtrabackup 1.3 – 1.6, Connector/ODBC, Connector/.NET, Connector/J, Connector/C++, Connector/MXJ, and MySQL Proxy.

The Documentation page contains links for the MySQL manuals 3.23-5.6, InnoDB Plugin 1.0 – 1.1, MariaDB 5.1 – 5.3, XtraDB 5.1 – 5.3, and Xtrabackup 1.6.

The Server Variables page contains the links for the MySQL Server variables for all versions, the server variable cross-reference chart, all InnoDB startup variables, and new MariaDB options.

And lastly, the InnoDB Plugin Versions (which I did mention a couple weeks ago) contains all InnoDB Plugin version information, such as which InnoDB plugin is included with which MySQL release, as well as links to changelogs, and other relevant information.

If you find yourself wishing there were some other links added, just post me a comment and I’ll try to get it added asap.

 
 


PlanetMySQL Voting: Vote UP / Vote DOWN

Tracking Server Variables, Documentation, Manuals, Changelogs for MySQL, InnoDB, MariaDB, and XtraDB

Январь 9th, 2012

I find myself constantly looking up server variables (and manuals and changelogs) for MySQL, MariaDB, and XtraDB, which versions they are in, and so forth. So I finally created a couple pages which contain the links to all of these various bits of information across the various flavors of MySQL.

I’ve been using them every day, so I thought some others might want to bookmark these as well.

I’ve created the following:

o Changelogs
o Documentation
o Server Variables
o InnoDB Plugin Versions

The Changelogs page contains links for MySQL 3.23 up through 5.6, MariaDB 5.1 – 5.3, XtraDB 5.1 – 5.5, Xtrabackup 1.3 – 1.6, Connector/ODBC, Connector/.NET, Connector/J, Connector/C++, Connector/MXJ, and MySQL Proxy.

The Documentation page contains links for the MySQL manuals 3.23-5.6, InnoDB Plugin 1.0 – 1.1, MariaDB 5.1 – 5.3, XtraDB 5.1 – 5.3, and Xtrabackup 1.6.

The Server Variables page contains the links for the MySQL Server variables for all versions, the server variable cross-reference chart, all InnoDB startup variables, and new MariaDB options.

And lastly, the InnoDB Plugin Versions (which I did mention a couple weeks ago) contains all InnoDB Plugin version information, such as which InnoDB plugin is included with which MySQL release, as well as links to changelogs, and other relevant information.

If you find yourself wishing there were some other links added, just post me a comment and I’ll try to get it added asap.

 
 


PlanetMySQL Voting: Vote UP / Vote DOWN

FictionPress Selects TokuDB for Consistent Performance and Fast Disaster Recovery

Январь 3rd, 2012

FictionPress

Issues addressed:

  • Support complex and efficient indexes at 100+ million rows.
  • Predicable and consistent performance regardless of data size growth.
  • Fast recovery.

Ensuring Predictable Performance at Scale

The Company:  FictionPress operates both FictionPress.com and FanFiction.net and is home to over 6 million works of fiction, with millions of writers/readers participating from around the world in over 30 languages

The Challenge: FictionPress offers a number of interactive features to its large user base. These include discussion forums, in-site messaging and user reviews. FictionPress made the decision to build its own discussion forums to meet its strict security and performance requirements. Xing Li, CTO of FictionPress, noted that the site “needs to host hundreds of thousands of forums. Existing forum software doesn’t do this while meeting our performance and security targets.”

To ensure the real-time responsiveness of the forums, FictionPress needs the ability to create and efficiently maintain complex indexes and be able to support millions of small rows. In addition, it needs the ability to index them with minimal impact to resource costs and performance. “The only way to make this all work and provide a good customer experience is to guarantee that we can deliver a flat predictable performance with our database back-end even as the number of rows crosses the 100 million mark,” according to Li.

FictionPress considered InnoDB, the default storage engine for MySQL, but it did not offer predictable performance at scale. Indexes became dramatically slower as the number of rows increased, causing a reduction of both read and write performance. InnoDB also did not offer the performance-enhancing feature of multiple clustering indexes.

The Solution:  FictionPress uses MariaDB and TokuDB to manage its discussion forums, reviews, and in-site messaging systems.

FictionPress installed TokuDB in a Linux environment with dedicated hardware. Each configuration has a single master with multiple read slaves. “TokuDB’s high write concurrency and support for multiple clustering indexes gave us the freedom to design and deploy better performing queries at scale,” according to Li. This was important to FictionPress as its environment is continually expanding.

The Benefits:

Predictable Performance: “While raw performance is important, the predictability of response time as one scales the system was our focal point” according to Li. “InnoDB can only have one clustering index, but TokuDB gives you basically an unlimited number. In addition, both MyISAM and InnoDB slow down with many indexes on databases of our size. MyISAM also causes replication lag due to concurrency. In the end, TokuDB gives us predictability, performance at scale, and more flexible indexing without the limitations found in other MySQL options.”

Cost: “To get additional performance, one can always throw hardware at the problem,” according to Li. “By utilizing TokuDB instead we improved scalability and at the same time saved on costs for additional server hardware that would have been required if TokuDB was not in the picture. In addition, we saw an 8x size reduction in disk space compared to MyISAM due to improved compression. The hardware cost saving made moving to TokuDB an easy decision.”

Crash Recovery: FictionPress had been using MyISAM initially. “We needed a replacement for MyISAM for small BLOB data,” according to Li. “In fact, we wanted to move away from MyISAM whenever possible to shorten its long crash recovery. InnoDB was an option but TokuDB offered better compression and a smaller storage footprint for both core data and index data for our own data sets.”

Hot Schema Changes: “For performance reasons we need a lot of indexes but also need to add and maintain these indexes quickly,” according to Li. “TokuDB is the only MySQL solution I found that offers Hot Schema changes such as Hot Indexing. Hot Schema changes are a powerful capability which we use to minimize downtime during system-wide upgrades and shorten our application/schema development cycle.”

 

 


PlanetMySQL Voting: Vote UP / Vote DOWN

Profiling your slow queries using pt-query-digest and some love from Percona Server

Декабрь 28th, 2011
This guide will get you up and running with how to identify the bottleneck queries using the excellent tool pt-query-digest. You will learn how to use and analyze the output returned by pt-query-digest. You will also learn some differences between slow query logging in various MySQL versions. Later on in the post I will also show you how to make use of the extra diagnostic data available with Percona Server.
PlanetMySQL Voting: Vote UP / Vote DOWN

Learning to love the InnoDB Lock Monitor

Декабрь 23rd, 2011

A customer opened a support issue to ask about some help determining why they were seeing a lot of Lock Wait Timeouts. I asked them to enable the InnoDB Lock Monitor so that I could get a look at what was going on in their transactions and whether there might be some locks held longer than necessary.

The customer sent in a 184MB MySQL error log with 4773836 lines. I started looking through it, but I could tell I was going to need a better way to get a better overview of the file than what I'd be able to piece together trying to poke through it and look for individual lines. I started piping the file through a variety of UNIX tools to narrow down what I was seeing.

I ended up with this mess:

 < mysqld.err grep ACTIVE | cut -d' ' -f 2,4 | sort -rn -k 2 | perl -F, -ane 'print "$F[0] $F[1]" if not $v{$F[0]}; $v{$F[0]} = $F[1];' | head

It's hideous, but it's pretty helpful. Here's the output:

1EC080F1  24284
1EC84325  18196
1ECAC476  13430
1ED57B19  4880
1ECFF656  4198
1ED54BF5  1229
1ED59197  193
1ED58A84  164
1ED58A80  164
1ED50FDA  134

It's just a list of the longest-running transactions, showing the transaction ID and the time it's been active.

I decided to re-write it in perl, since I was already using perl in my pipeline and wanted the opportunity to get some other output in a slightly more reasonable way. Here's the perl version:

#!/usr/bin/perl -n
if ( /ACTIVE (\d+) sec/ ) { 
    @r = split(' '); 
    $t{substr($r[1],0,8)} = $r[3] if not defined $t{$r[1]} and $t{$r[1]}<$r[3]; 

}  

END { 
    map { 
        print "$_ $t{$_}\n" 
    } (
        sort { 
            $t{$b} <=> $t{$a} 
        } keys %t
    )[0..9]  
}
$ perl toptrans.pl < mysqld.err
1EC080F1 24284
1EC84325 18196
1ECAC476 13430
1ED57B19 4880
1ECFF656 4198
1ED54BF5 1229
1ED59197 193
1ED58A80 164
1ED58A84 164
1ED50FDA 134

That gives exactly the same output as the first version, but I figure it might be easier to change the criteria I'm looking for or the output contents/format.

But really parsing through InnoDB status or monitor information is not very much fun. I got lucky that all the information I was looking for was on a single line. If I want to get anything more, though, such as the number of locks being held, any information about the locks, the MySQL thread ID, et cetera, it'd be really nasty. I thought that there must be something better out there, and a bit of Google quickly reminded me that Baron Schwartz [http://www.xaprb.com] wrote a really excellent InnoDB status parser for use in innotop [http://code.google.com/p/innotop/].

The innotop script includes the InnoDBParser module inside of the file. You can just do a require to get access to the goods, then instantiate a new InnoDBParser. I ended up using Data::Dumper to make sense of the structure of the data:

#!/usr/bin/perl
use Data::Dumper;
require "innotop";
my $innodb_parser = new InnoDBParser;


my $contents;
{
   local $INPUT_RECORD_SEPARATOR = undef;
   $contents = ;
}

my $innodb_status = $innodb_parser->parse_status_text(
   $contents,
   0,
   # Omit the following parameter to get all sections.
#   {  tx => 1 },
);

$Data::Dumper::Indent = 2;
print Dumper $innodb_status;
$ START=191; END=319; tail -n +$START < mysqld.err | head -n $(( $END - $START + 1 )) | perl txinfo.pl

$VAR1 = {
          got_all => 1,
          last_secs => '20',
          sections => {
                        bp => {
                                add_pool_alloc => '0',
                                awe_mem_alloc => 0,
                                buf_free => '7822',
                                buf_pool_hit_rate => undef,
                                buf_pool_hits => undef,
                                buf_pool_reads => undef,
                                buf_pool_size => '8192',
                                complete => 1,
                                dict_mem_alloc => '41875',
                                page_creates_sec => '0.00',
                                page_reads_sec => '0.00',
                                page_writes_sec => '0.00',
                                pages_created => '1',
                                pages_modified => '0',
                                pages_read => '368',
                                pages_total => '369',
                                pages_written => '32',
                                reads_pending => '0',
                                total_mem_alloc => '137363456',
                                writes_pending => '0',
                                writes_pending_flush_list => 0,
                                writes_pending_lru => 0,
                                writes_pending_single_page => 0
                              },
                        bt => {
                                complete => 1,
                                fulltext => 'srv_master_thread loops: 4 1_second, 4 sleeps, 0 10_second, 6 background, 6 flush
srv_master_thread log flush and writes: 4'
                              },
                        ib => {
                                bufs_in_node_heap => undef,
                                complete => 1,
                                free_list_len => undef,
                                hash_searches_s => '0.00',
                                hash_table_size => undef,
                                inserts => undef,
                                merged_recs => undef,
                                merges => undef,
                                non_hash_searches_s => '0.00',
                                seg_size => undef,
                                size => undef,
                                used_cells => undef
                              },
                        io => {
                                avg_bytes_s => '0',
                                complete => 1,
                                flush_type => 'fsync',
                                fsyncs_s => '0.00',
                                os_file_reads => '379',
                                os_file_writes => '30',
                                os_fsyncs => '11',
                                pending_aio_writes => undef,
                                pending_buffer_pool_flushes => '0',
                                pending_ibuf_aio_reads => '0',
                                pending_log_flushes => '0',
                                pending_log_ios => '0',
                                pending_normal_aio_reads => undef,
                                pending_preads => 0,
                                pending_pwrites => 0,
                                pending_sync_ios => '0',
                                reads_s => '0.00',
                                threads => {
                                             '0' => {
                                                      event_set => 0,
                                                      purpose => 'insert buffer thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '0'
                                                    },
                                             '1' => {
                                                      event_set => 0,
                                                      purpose => 'log thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '1'
                                                    },
                                             '2' => {
                                                      event_set => 0,
                                                      purpose => 'read thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '2'
                                                    },
                                             '3' => {
                                                      event_set => 0,
                                                      purpose => 'read thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '3'
                                                    },
                                             '4' => {
                                                      event_set => 0,
                                                      purpose => 'read thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '4'
                                                    },
                                             '5' => {
                                                      event_set => 0,
                                                      purpose => 'read thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '5'
                                                    },
                                             '6' => {
                                                      event_set => 0,
                                                      purpose => 'write thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '6'
                                                    },
                                             '7' => {
                                                      event_set => 0,
                                                      purpose => 'write thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '7'
                                                    },
                                             '8' => {
                                                      event_set => 0,
                                                      purpose => 'write thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '8'
                                                    },
                                             '9' => {
                                                      event_set => 0,
                                                      purpose => 'write thread',
                                                      state => 'waiting for i/o request',
                                                      thread => '9'
                                                    }
                                           },
                                writes_s => '0.00'
                              },
                        lg => {
                                complete => 1,
                                last_chkp => '139205502',
                                log_flushed_to => '139205502',
                                log_ios_done => '12',
                                log_ios_s => '0.00',
                                log_seq_no => '139205502',
                                pending_chkp_writes => '0',
                                pending_log_writes => '0'
                              },
                        ro => {
                                complete => 0,
                                del_sec => '0.00',
                                ins_sec => '0.00',
                                main_thread_id => '4496171008',
                                main_thread_proc_no => 0,
                                main_thread_state => 'waiting for server activity',
                                n_reserved_extents => 0,
                                num_rows_del => '0',
                                num_rows_ins => '0',
                                num_rows_read => '12',
                                num_rows_upd => '0',
                                queries_in_queue => '0',
                                queries_inside => '0',
                                read_sec => '0.30',
                                read_views_open => '1',
                                upd_sec => '0.00'
                              },
                        sm => {
                                complete => 1,
                                mutex_os_waits => '0',
                                mutex_spin_rounds => '0',
                                mutex_spin_waits => '0',
                                reservation_count => '4',
                                rw_excl_os_waits => undef,
                                rw_excl_spins => undef,
                                rw_shared_os_waits => undef,
                                rw_shared_spins => undef,
                                signal_count => '4',
                                wait_array_size => 0,
                                waits => []
                              },
                        tx => {
                                complete => 1,
                                history_list_len => '105',
                                is_truncated => 0,
                                num_lock_structs => undef,
                                purge_done_for => '163A',
                                purge_undo_for => '0',
                                transactions => [
                                                  {
                                                    active_secs => 0,
                                                    has_read_view => 0,
                                                    heap_size => 0,
                                                    hostname => 'localhost',
                                                    ip => '',
                                                    lock_structs => 0,
                                                    lock_wait_status => '',
                                                    lock_wait_time => 0,
                                                    mysql_thread_id => '22',
                                                    os_thread_id => '4529143808',
                                                    proc_no => 0,
                                                    query_id => '339',
                                                    query_status => '',
                                                    query_text => '',
                                                    row_locks => 0,
                                                    tables_in_use => 0,
                                                    tables_locked => 0,
                                                    thread_decl_inside => 0,
                                                    thread_status => '',
                                                    txn_doesnt_see_ge => '',
                                                    txn_id => '0',
                                                    txn_sees_lt => '',
                                                    txn_status => 'not started',
                                                    undo_log_entries => 0,
                                                    user => 'root'
                                                  },
                                                  {
                                                    active_secs => '13',
                                                    has_read_view => 0,
                                                    heap_size => '376',
                                                    hostname => 'localhost',
                                                    ip => '',
                                                    lock_structs => '2',
                                                    lock_wait_status => '',
                                                    lock_wait_time => 0,
                                                    locks => [
                                                               {
                                                                 db => 'test',
                                                                 index => '',
                                                                 insert_intention => 0,
                                                                 lock_mode => 'IX',
                                                                 lock_type => 'TABLE',
                                                                 n_bits => 0,
                                                                 page_no => 0,
                                                                 space_id => 0,
                                                                 special => '',
                                                                 table => 't1',
                                                                 txn_id => '1802',
                                                                 waiting => 0
                                                               },
                                                               {
                                                                 db => 'test',
                                                                 index => 'PRIMARY',
                                                                 insert_intention => 0,
                                                                 lock_mode => 'X',
                                                                 lock_type => 'RECORD',
                                                                 n_bits => '80',
                                                                 page_no => '386',
                                                                 space_id => 0,
                                                                 special => '',
                                                                 table => 't1',
                                                                 txn_id => '1802',
                                                                 waiting => 0
                                                               }
                                                             ],
                                                    mysql_thread_id => '21',
                                                    os_thread_id => '4528869376',
                                                    proc_no => 0,
                                                    query_id => '333',
                                                    query_status => '',
                                                    query_text => '',
                                                    row_locks => '7',
                                                    tables_in_use => 0,
                                                    tables_locked => 0,
                                                    thread_decl_inside => 0,
                                                    thread_status => '',
                                                    txn_doesnt_see_ge => '',
                                                    txn_id => '1802',
                                                    txn_sees_lt => '',
                                                    txn_status => 'ACTIVE',
                                                    undo_log_entries => 0,
                                                    user => 'root'
                                                  }
                                                ],
                                trx_id_counter => '1803'
                              }
                      },
          timestring => '2011-12-22 18:48:26',
          ts => [
                  2011,
                  12,
                  22,
                  18,
                  48,
                  26
                ]
        };

The output of that is 278 lines just for a single transaction holding a couple row locks. After digging around the output for a while, I came up with this script that provides a pretty flexible way to print out various "fields" related to transactions that satisfy particular criteria (note that here I'm only getting the output of the "tx" section, which would give 115 lines of output in the Data::Dumper example above):

#!/usr/bin/perl
use Data::Dumper;
require "innotop";
$Data::Dumper::Indent = 2;
my $innodb_parser = new InnoDBParser;

my @fields = (
        "txn_id",
        "active_secs",
        "lock_structs",
        "row_locks",
        "hostname"
);

my $contents;
{  
   local $INPUT_RECORD_SEPARATOR = undef;
   $contents = ;
}

my $innodb_status = $innodb_parser->parse_status_text(
   $contents,
   0,
   # Omit the following parameter to get all sections.
   {  tx => 1 },
);

#print Dumper $innodb_status; exit;

for my $tx (@{$innodb_status->{sections}->{tx}->{transactions}}) {
        if (
                $tx->{active_secs} > 0
        ) {
                print join( "\t",
                        map { $tx->{$_} } @fields
                ), "\n";
        }
}

I ran that against another status snippet that includes a couple different transactions each with some locks. Here's the output (along with what might be a helpful way to extract just a single status output from among a file filled with them):

$ grep -hn 'INNODB MONITOR' mysqld.err | tail -n 2
1703104:111222 19:16:12 INNODB MONITOR OUTPUT
1741522:END OF INNODB MONITOR OUTPUT

$ START=1703104; END=1741522; tail -n +$START < mysqld.err | head -n $(( $END - $START + 1 )) | perl txinfo.pl
1808    380     5       2003    localhost
1806    864     11      6206    localhost

Hey, that's pretty neat, we've actually got some nice info, somewhat programmatically and reliably retrieved from the InnoDB Lock Monitor. If you want to get additional fields from the parsed data structure, you can just add the names of the fields to the @fields array at the top of the program. I'll probably get carried away and add those as command-line options to this thing at some point. I'm looking forward to being able to use this the next time I've got almost 5 million lines of InnoDB status output to make sense of!


PlanetMySQL Voting: Vote UP / Vote DOWN

Optimizing InnoDB for creating 30,000 tables (and nothing else)

Декабрь 23rd, 2011

Once upon a time, it would have been considered madness to even attempt to create 30,000 tables in InnoDB. That time is now a memory. We have customers with a lot more tables than a mere 30,000. There have historically been no tests for anything near this many tables in the MySQL test suite.

So, in fleshing out the test cases for this and innodb_dict_size_limit I was left with the not so awesome task of making the test case run in remotely reasonable time. The test case itself is pretty simple, a simple loop in the not at all exciting mysqltest language that will create 30,000 identical tables, insert a row into each of them and then drop them.

Establishing the ground rules: I do not care about durability. This is a test case, not a production system holding important data which means I can lie, cheat and steal to get performance.

The simplest way is to use libeatmydata. This is a small shared library designed to be LD_PRELOADed that disables just about every way an application can write data safely to disk. This is perfect for running a test suite for a database server as if your machine crashes halfway through a test run, you’ll just start the test run again – you are not dealing with any important data. Obviously, you should never, ever, ever use libeatmydata with anything you care about; it is called libeat-my-data for a reason.

Without libeatmydata and using the default options for the MySQL test suite, I noticed that I was only creating about 10-15 tables every second which means we’d take a very long time to create 30,000. After a bit of time, it sped up to about 20-25 per second. Of course, with the default timeout for a MySQL test (15 minutes), we quickly (well.. in 15 minutes) hit that and the test is failed for taking too long.

With libeatmydata the test takes about 77 seconds – a huge improvement.

So how could I possibly get this test to run (even with –big-test option to mysql-test-run) in reasonable time? Well… I can set some InnoDB options! I’m going to try the obvious first: innodb-flush-method, sync-frm and innodb-flush-log-at-trx-commit. There is an undocumented option for innodb-flush-method called “nosync” that is meant to not flush data to disk. Since you could hear how much syncing to disk was going on during my test run, not syncing to disk all the time would get closer to the libeatmydata performance. I also want to disable syncing of the FRM file to disk and set log flushing to happen as infrequently as possible. With these options I started to get somewhere between 25-90 CREATE TABLE per second. This gets the test execution time down to 12 minutes, so that just escapes the timeout.

I then added the options of innodb-adaptive-checkpoint=0 and flush-neighbor-pages=0 in the hope of avoiding a bunch of background flushing (which called fsync). It didn’t help.

I noticed that there was an fsync() call when extending the data file, so I tried setting a higher innodb-autoextend-increment and a larger initial size. This also did not help.

So how fast is InnoDB under all of this? Am I hitting a fundamental limitation in InnoDB?

Well…. I went and wrote a program using HailDB – which is InnoDB as a shared library that you can call using an easy to use C API.

Writing a simple test program that creates 30,000 tables in a similar InnoDB configuration as default MySQL is pretty easy (easier than writing the mysqltest language that’s for sure). After a “I’m glad this isn’t a SSD” killer amount of fsync() activity, it took a total of 14.5 minutes. Not too bad. This is less than my initial test with MySQL, probably due to not writing and syncing FRM files. If I run the same program with libeatmydata, it only takes 15-20 seconds. Clearly it’s syncing things to disk that takes all the time.

If we make the HailDB program set flush_method to nosync and flush_log_at_trx_commit=2, the execution time is only 1 minute. This is much closer to the libeatmydata time than MySQL ever got.

With HailDB you can do more than one data dictionary operation in a single transaction. So if instead of setting flush_method and flush_log_at_trx_commit I instead group the CREATE TABLE into transactions of creating 100 tables at a time, I get an execution time of 3 minutes. This is a big difference to the original 14.5 minutes.

What’s the practical applications of all of this? Not much (unless you’re writing complex test cases) but it is clear that loading large amounts of DDL could be a lot faster than it is currently (although loading the data into tables is still going to take a while too).


PlanetMySQL Voting: Vote UP / Vote DOWN

Improving InnoDB memory usage

Декабрь 20th, 2011

Last month we did a few improvements in InnoDB memory usage. We solved a challenging issue about how InnoDB uses memory in certain places of the code.

The symptom of the issue was that under a certain workloads the memory used by InnoDB kept growing infinitely, until OOM killer kicked in. It looked like a memory leak, but Valgrind wasn’t reporting any leaks and the issue was not reproducible on FreeBSD – it only happened on Linux (see Bug#57480). Especially the latest fact lead us to think that there is something in the InnoDB memory usage pattern that reveals a nasty side of the otherwise good-natured Linux’s memory manager.

It turned out to be an interesting memory fragmentation caused by a storm of malloc/free calls of various sizes. We had to track and analyze each call to malloc during the workload, including the code path that lead to it. We collected a huge set of analysis data – some code paths were executed many 10’000s of times! A hurricane of allocations and deallocations! We looked at the hottest ones hoping that some of them are not necessary, can be eliminated, avoided, minimized or stuck together. Luckily there were plenty of them!

After an extensive testing we did a numerous improvements, allocating the smallest chunks of the memory from the stack instead of from the heap, grouping allocations together where possible, removing unnecessary allocations altogether, estimating exactly how much memory will be consumed by a given operation and allocating it in advance and others and others and others.

This not only fixed Bug#57480 but improved InnoDB memory usage in general.

Note: the fix is not in the 5.6.4 release.


PlanetMySQL Voting: Vote UP / Vote DOWN