Archive for the ‘crash’ Category

Nasty Regression Bug Seems Fixed in 5.5.18

Декабрь 1st, 2011

For those who saw my previous post about the crashing (regression) bug with SELECT COUNT(DISTINCT) on InnoDB with Primary Key (PK), you’ll be interested to know my test case does not crash in 5.5.18 (which was just released).

I’ve only tested my test case thus far, but it seems fine.

Unfortunately, the fix is not mentioned in the 5.5.18 changelogs though.

And there is no mention (yet, anyway) of a fix in the bug report I filed (though it was designated a ‘duplicate’, so it wouldn’t necessarily be updated).

I’m trying to get confirmation from the MySQL Dev Team on this (via the bug report), and will update this post if/when I hear anything.

I’ll also perform some of the other tests on my end to see if they all pass as well.

All in all, at least the initial results look very promising! :)


PlanetMySQL Voting: Vote UP / Vote DOWN

Nasty Regression Bug: SELECT COUNT(DISTINCT) crashes InnoDB when WHERE operand is in Primary Key or Unique Index

Октябрь 17th, 2011

In 5.5, a crashing, regression bug exists if you use SELECT COUNT(DISTINCT) *and* one of the WHERE operands is in the Primary Key (or just a unique index).

This simple crash (if only one row is in the table) will crash mysqld.

Of course I’ve filed a bug report, but that has been nearly 3 months and no updates yet.

Here is the bug I filed (which you won’t be able to view):

http://bugs.mysql.com/bug.php?id=61842

Really, the only thing that happened to my bug report was that it was designated a duplicate of another bug (which we also cannot view):

http://bugs.mysql.com/bug.php?id=61101

Based on the id, and the submitted dates of bugs 61100 and 61102, this initial bug (61101) was filed on May 9, 2011. So, in fact, this bug has been present for over 5 months, and not one breath of an update to the public!

Therefore, I felt it necessary to warn others about this bug, (or possibly you’ll run across this if you’re searching on SELECT COUNT(DISTINCT) in the future).

All I can say is please watch out for it!

It is extremely easy to reproduce:

CREATE TABLE t (a int(1), b int(1), PRIMARY KEY (a,b)) ENGINE=InnoDB;
INSERT INTO t VALUES (1, 1);
SELECT COUNT(DISTINCT a) FROM t WHERE b = 1;

–> crash <--

For those interested, this was filed against 5.5.14. However, with each new release, I've continued testing, and this bug is present in 5.5.15, 5.5.16, and thus far in 5.5.17 (built from the latest bzr tree).

Hopefully we don't go too many more months before this is finally fixed.

And for reference (and those searching on the stack trace / error log messages), here is my full error log snippet from 5.5.16:

111017 10:54:47 [Note] C:\Program Files\MySQL\MySQL Server 5.5\bin\mysqld: ready for connections.
Version: ’5.5.16′  socket: ”  port: 3308  MySQL Community Server (GPL)
 len 128; hex f8aec9037d803805f017fc03189ddc030000000…
111017 10:55:12  InnoDB: Assertion failure in thread 5000 in file btr0pcur.c line 236
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
111017 10:55:12 – mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=26214400
read_buffer_size=65536
max_used_connections=1
max_threads=100
thread_count=1
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 58325 K
bytes of memory
Hope that’s ok; if not, decrease some variables in the equation.

Thread pointer: 0x3c98428
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
00CE92EC    mysqld.exe!btr_pcur_restore_position_func()[btr0pcur.c:236]
00CA62FB    mysqld.exe!sel_restore_position_for_mysql()[row0sel.c:3081]
00CA6CEA    mysqld.exe!row_search_for_mysql()[row0sel.c:3820]
00C5FE20    mysqld.exe!ha_innobase::general_fetch()[ha_innodb.cc:5918]
00C5FEDD    mysqld.exe!ha_innobase::index_next()[ha_innodb.cc:5956]
00C20DDA    mysqld.exe!index_next_different()[opt_range.cc:11038]
00C249BC    mysqld.exe!QUICK_GROUP_MIN_MAX_SELECT::next_prefix()[opt_range.cc:11099]
00C26BE7    mysqld.exe!QUICK_GROUP_MIN_MAX_SELECT::get_next()[opt_range.cc:10824]
00B68D01    mysqld.exe!rr_quick()[records.cc:344]
00BC1B9A    mysqld.exe!sub_select()[sql_select.cc:11723]
00BD10A7    mysqld.exe!do_select()[sql_select.cc:11483]
00BD37BD    mysqld.exe!JOIN::exec()[sql_select.cc:2370]
00BD3A29    mysqld.exe!mysql_select()[sql_select.cc:2581]
00BD3D4B    mysqld.exe!handle_select()[sql_select.cc:297]
00ACD76E    mysqld.exe!execute_sqlcom_select()[sql_parse.cc:4511]
00ACF816    mysqld.exe!mysql_execute_command()[sql_parse.cc:2118]
00AD2D1F    mysqld.exe!mysql_parse()[sql_parse.cc:5548]
00AD3848    mysqld.exe!dispatch_command()[sql_parse.cc:1037]
00AD43BB    mysqld.exe!do_command()[sql_parse.cc:771]
00AF2DB6    mysqld.exe!do_handle_one_connection()[sql_connect.cc:789]
00AF2F44    mysqld.exe!handle_one_connection()[sql_connect.cc:708]
00C33DE4    mysqld.exe!pthread_start()[my_winthread.c:61]
00D9C6F3    mysqld.exe!_callthreadstartex()[threadex.c:348]
00D9C79B    mysqld.exe!_threadstartex()[threadex.c:326]
765F3823    kernel32.dll!BaseThreadInitThunk()
77CAA9BD    ntdll.dll!LdrInitializeThunk()

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (03DC0F10): SELECT COUNT(DISTINCT a) FROM t WHERE b = 1
Connection ID (thread ID): 1
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
InnoDB: Thread 5980 stopped in file os0sync.c line 781
InnoDB: Thread 6820 stopped in file os0sync.c line 474
InnoDB: Thread 7532 stopped in file os0sync.c line 474

PlanetMySQL Voting: Vote UP / Vote DOWN

Unexpected mysqld crashing in 5.5

Ноябрь 22nd, 2010

An update of MySQL from 5.0 to 5.5 on CentOS 5.5 64bit has not resulted in a good experience. The mysqld process would then crash every few minutes with the following message.

101120 8:29:27 InnoDB: Operating system error number 22 in a file operation.
InnoDB: Error number 22 means ‘Invalid argument’.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html
InnoDB: File name /tmpfs/#sql6cf3_5c_0.ibd
InnoDB: File operation call: ‘aio write’.
InnoDB: Cannot continue operation.

The work around was to change the tmpdir=/tmpfs (which was a 16G tmpfs filesystem) to a physical disk.

The referenced URL didn’t provide any more information of help. Unlike Bug #26662 O_DIRECT is not specified as the flush method.


PlanetMySQL Voting: Vote UP / Vote DOWN

A Quick Review of Stack Traces

Сентябрь 29th, 2010
I'll try to pass on some basic knowledge about those confusing stack traces we sometimes see in the mysql error logs.  What can you tell from them, what are they useful for, and how to validate them?

Debugging Crashes

We tried to improve postmortem debugging of crashes + stack traces in the error log:
o) old versions of mysqld only printed numerical numbers instead of function names (if you're lucky!)
o) some platforms/architectures printed no stack trace what-so-ever!
o) faulty implementations of the crash error reporting and running query.

So, the result is that on modern supported platforms and recent versions of mysqld, you should get a useful stack trace.  A simple example from 5.5.5 on Windows:

mysqld.exe!mysql_admin_table()[sql_table.cc:5150]
mysqld.exe!mysql_optimize_table()[sql_table.cc:5226]
mysqld.exe!mysql_execute_command()[sql_parse.cc:3107]
mysqld.exe!mysql_parse()[sql_parse.cc:5911]
mysqld.exe!dispatch_command()[sql_parse.cc:1138]
mysqld.exe!do_command()[sql_parse.cc:807]
mysqld.exe!do_handle_one_connection()[sql_connect.cc:1196]
mysqld.exe!handle_one_connection()[sql_connect.cc:1136]
mysqld.exe!pthread_start()[my_winthread.c:62]
mysqld.exe!_callthreadstartex()[threadex.c:348]
mysqld.exe!_threadstartex()[threadex.c:331]
kernel32.dll!BaseThreadStart()

It should be obvious that OPTIMIZE TABLE crashed here, since the function name is clear.
The reason I always use windows stack traces in bug reports is because they are readable.  Most GDB and linux error log stack traces are not readable by the human brain at a glance, and therefore not memorable.  One reason is excessive wrapping, another reason is offsets and arguments to the functions are irrelevant and not useful for search engine indexes or the average Joe trying to find a bug report matching a stack trace.

In the old days, a crash only this printed (only 32-bit), and you had to resolve it yourself:

0x81a0705
0x8407eef
0x8408a8f
0x8408b75
0x8408dd2
0x8409af7
0x83e7fc9
0x83eefc3
0x826014c
0x826035c
0x825ec43
0x82106fd
0x81b86da
0x81bd78b
0x81bdd16
0x81bfae2

Which, on a side note is not always possible to do properly with optimized binaries so you get a partially bogus looking stack and you might be tempted to wrongly suspect faulty hardware, foul play, or bad binaries:

0x81a0705 handle_segfault + 805
0x8407eef my_write + 671
0x8408a8f init_key_cache + 1279
0x8408b75 init_key_cache + 1509
0x8408dd2 init_key_cache + 2114

0x8409af7 flush_key_blocks + 55
0x83e7fc9 flush_blocks + 41
0x83eefc3 mi_repair_by_sort + 451
0x826014c ha_myisam::repair(THD*, st_mi_check_param&, bool) + 1900
0x826035c ha_myisam::enable_indexes(unsigned int) + 364
0x825ec43 ha_myisam::end_bulk_insert() + 99
0x82106fd mysql_insert(THD*, TABLE_LIST*, List&, List >&, List&, List&, enum_duplicates, bool) + 5037
0x81b86da mysql_execute_command(THD*) + 9610
0x81bd78b mysql_parse(THD*, char const*, unsigned int, char const**) + 379
0x81bdd16 dispatch_command(enum_server_command, THD*, char*, unsigned int) + 1238
0x81bfae2 handle_one_connection + 2578

Using a technique that involves disassembling the mysqld binary into ASM and piecing together C/C++ source code/comments, it's quite possible to find those inlined functions, expanded macros, or functions that have no name in the symbols file.  A nice topic for another posting.  So, after manually inspecting the binary + numeric offsets, I could get a proper stack trace:

0x81a0705 handle_segfault
0x8407eef unlink_block
0x8408a8f free_block
0x8408b75 flush_cached_blocks
0x8408dd2 flush_key_blocks_int

0x8409af7 flush_key_blocks
0x83e7fc9 flush_blocks
0x83eefc3 mi_repair_by_sort
0x826014c ha_myisam::repair
0x826035c ha_myisam::enable_indexes
0x825ec43 ha_myisam::end_bulk_insert
0x82106fd mysql_insert
0x81b86da mysql_execute_command
0x81bd78b mysql_parse
0x81bdd16 dispatch_command
0x81bfae2 handle_one_connection

This is simply a bulk insert performing a 'repair by sort'.  It crashed in the keycache when flushing blocks, perhaps due to a memory corruption or overrun of something.  I remember fulltext indexes or large table having this problem..

Some Identifying Elements of a Stack Trace
If your 5.1. or 5.5. server ever crashes, please keep the stack trace as it can help identify exactly what the problem is, and you can search google for clues.  
  • prepared statements
Easily distinguishable by looking for the functions similar to this:
.....
mysqld-debug.exe!Prepared_statement::execute()[sql_prepare.cc:3050]
mysqld-debug.exe!mysql_sql_stmt_execute()[sql_prepare.cc:2393]
mysqld-debug.exe!mysql_execute_command()[sql_parse.cc:2935]
.....
  • stored routines and their call depth
Seeing sp_* is a sign of some stored routine activity  You can even see how many SP calls there are nested, whether they called triggers.  Takes some intuition to follow.

.......
06 mysqld_debug!sp_instr_stmt::exec_core
07 mysqld_debug!sp_lex_keeper::reset_lex_and_exec_core
08 mysqld_debug!sp_instr_stmt::execute
09 mysqld_debug!sp_head::execute
0a mysqld_debug!sp_head::execute_procedure
0b mysqld_debug!mysql_execute_command
.......
  • storage engine code (archive, innodb, myisam, merge)
InnoDB mostly asserts, and this is clearly identified in the error log before a stack trace.
"070223 21:47:40  InnoDB: Assertion failure in thread 1655241648 in file row0mysql.c line 3228"

Archive crashes can be easily seen by ha_archive functions:
.....
mysqld.exe!ha_archive::free_share()[ha_archive.cc:411]
mysqld.exe!ha_archive::open()[ha_archive.cc:498]
mysqld.exe!handler::ha_open()[handler.cc:2059]
.....
  • query cache
Any thing involving the Query_Cache class functions:
.....
#5 0x000000000065f390 in Query_cache::insert_table ()
#6 0x000000000065f63a in Query_cache::register_tables_from_list ()
#7 0x000000000065f6a5 in Query_cache::register_all_tables ()
#8 0x000000000065fc0a in Query_cache::store_query ()
#9 0x000000000058cd38 in mysql_execute_command ()
.....
  • mysql functions (string, math, datetime, comparative)
Look out for specific Item_func* methods...
.....
mysqld.exe!Arg_comparator::compare_binary_string()[item_cmpfunc.cc:1158]
mysqld.exe!Item_func_eq::val_int()[item_cmpfunc.cc:1692]
mysqld.exe!Item::val_bool()[item.cc:184]
mysqld.exe!Item_cond_and::val_int()[item_cmpfunc.cc:4222]
.....
  • first calling function in the application
You can nearly always expect a valid stack trace to have these functions at the bottom:
.....
mysqld-debug.exe!mysql_execute_command()[sql_parse.cc:2256]
mysqld-debug.exe!mysql_parse()[sql_parse.cc:5974]
mysqld-debug.exe!dispatch_command()[sql_parse.cc:1233]
mysqld-debug.exe!do_command()[sql_parse.cc:872]
mysqld-debug.exe!handle_one_connection()[sql_connect.cc:1127]
  • last calling functions before the crash
Crashing is usually handled by mysqld's segfault handler.  It depends on the OS and environment.  Most of the time you'll have:

mysqld-debug.exe!my_sigabrt_handler()[mysqld.cc:2048]
mysqld-debug.exe!raise()[winsig.c:597]
mysqld-debug.exe!abort()[abort.c:78]
......

or

0   mysqld   0x00579d3e my_print_stacktrace + 44
1 mysqld 0x00100f78 handle_segfault + 836
......


Debugging Hangs

When mysqld hangs or flatlines the CPU and logging in or killing queries doesn't help, you'd better either create a corefile,  break into the process with a debugger, or just use the PMP.

You'll probably need stack trace of all the threads to determine what is going on:

  1. A single thread is looping endlessly in some loop
  2. Multiple threads are hitting a hot mutex, or totally deadlocked, waiting for each other.


If the deadlock is in innodb you often get useful innodb outputs in the error log
for each waiting thread. But it can be extremely helpful to get full stack traces too.

--Thread 3003468656 has waited at fsp/fsp0fsp.c line 2204 for 556.00 seconds the
semaphore:
X-lock on RW-latch at 0xb759ceb0 created in file fil/fil0fil.c line 1061
That's all for now.!

PlanetMySQL Voting: Vote UP / Vote DOWN

How to crash mysqld intentionally

Март 5th, 2010

While some may think I’m daft, I have a legitimate reason for wanting to crash mysqld. However first we need to find a way to crash it.

Great thanks to Alan K, Mark L, Harrison and Hartmut on #mysql-dev for several suggestions and a config option I was unaware of. My investigation even lead to a documentation bug logged as #51739.

My first thought was to find a known bug and if necessary install the correct version to test that. A good one was suggested, Bug #48508 which fails on several versions that I will use to demonstrate with, however the simplest way is to issue kill -11

By default, no core file will be produced which is what I’m seeking but with the right options this is possible. First, the user running mysqld probably has a core file limit size of 0.

$ ulimit -c
0

You can fix this with with ulimit or you can specify this in the [mysqld_safe] section with core-file-size=unlimited

$ ulimit -c unlimited
$ ulimit -c
unlimited

The option I was not aware of is you also have to also specify core-file in your my.cnf

[mysqld]
core-file

I also for my CentOS 5.4 installation ran the following kernel commands, but this may be unnecessary.

sudo /sbin/sysctl -w kernel.core_pattern="core"
sudo /sbin/sysctl -w fs.suid_dumpable= 1

It is now easy to produce a core file.

$ bin/mysqld_safe &
$ killall -11 mysqld
$ bin/mysqld_safe: line 137:  2656 Segmentation fault      (core dumped) ...
100304 16:46:43 mysqld_safe Number of processes running now: 0
100304 16:46:43 mysqld_safe mysqld restarted
$ find . -name "core*"
./data/core.99999

NOTE: Do no run killall on a multi-instance server. I use this syntax here only for simplicity in presentation. It is best to run ps and kill the appropriate pid.

On a side note, I also tried to produce a core on Mac OS X without success. I’d still like to document that way, so if anybody can assist please ping me.


PlanetMySQL Voting: Vote UP / Vote DOWN

How to tell when using INFORMATION_SCHEMA might crash your database

Февраль 17th, 2010

There are those that are very adamant about letting people know that using INFORMATION_SCHEMA can crash your database. For example, in making changes to many tables at once Baron writes:

“querying the INFORMATION_SCHEMA database on MySQL can completely lock a busy server for a long time. It can even crash it. It is very dangerous.”

Though Baron is telling the truth here, he left out one extremely important piece of information: you can actually figure out how dangerous your INFORMATION_SCHEMA query will be, ahead of time, using EXPLAIN.


In MySQL 5.1.21 and higher, not only were optimizations made to the INFORMATION_SCHEMA, but new values were added so that EXPLAIN had better visibility into what MySQL is actually doing. As per http://dev.mysql.com/doc/refman/5.1/en/information-schema-optimization.html there are 6 new “Extra” values for EXPLAIN that are used only for INFORMATION_SCHEMA queries.

The first 2 “Extra” values for EXPLAIN are mostly self-explanatory:
Scanned 1 database – Only one database directory needs to be scanned.
Scanned all databases – All database directories are scanned. This is more dangerous than only scanning one database.

Note that there is no middle ground — there is no optimization to only scan 2 databases; either all database directories are scanned, or only one is. If your query spans more than one database, then all database directories are scanned. Note that this

SHOW statements are less dangerous than using INFORMATION_SCHEMA because they only use one database at a time. If you have an INFORMATION_SCHEMA query that produces an “Extra” value of “Scanned 1 database”, it is just as safe as a SHOW statement.

The optimizations went even further, though. From the most “dangerous” — ie, resource intensive — to the least, here are the other 4 “Extra” values introduced in MySQL 5.1.21 (which, for the record, came out in August 2007, so it is a feature that has been around for 2.5 years at this point):

Open_full_table
Open_trigger_only
Open_frm_only
Skip_open_table

A bit more explanation, and some examples:

Open_full_table – Needs to open all the metadata, including the tables format file (.frm) and data/index files such as .MYD and .MYI. The previously linked to manual page about the optimization includes which information will show each “Extra” type — for example, the AUTO_INCREMENT and DATA_LENGTH fields of the TABLES table require opening all the metadata.

mysql> EXPLAIN SELECT TABLE_SCHEMA,TABLE_NAME,AUTO_INCREMENT FROM TABLES\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Open_full_table; Scanned all databases
1 row in set (0.00 sec)

Let’s see an example that only scans 1 database:

mysql> EXPLAIN TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA='test'\G
ERROR 1109 (42S02): Unknown table 'TABLE_NAME' in information_schema
mysql> EXPLAIN SELECT TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA='test'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_full_table; Scanned 1 database
1 row in set (0.00 sec)

Note that “Scanned all databases” will apply if there is any way there could be more than one database. For example, on my test server, only the ‘test’ and ’sakila’ databases exist (other than ‘mysql’ and ‘INFORMATION_SCHEMA’ of course) and yet when I do

EXPLAIN SELECT TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA LIKE 'test%'\G

I still get “Scanned all databases”. So be careful.

One of the basic pieces of advice I see to optimize queries can be applied to queries on the INFORMATION_SCHEMA — Do not use SELECT * unless you actually want to get every single piece of information. In the case of INFORMATION_SCHEMA, optimizing your queries can mean the difference between the server crashing and the server staying up.

Open_trigger_only – Only the .TRG file needs to be opened. Interestingly enough, this does not seem to have an example that applies. The manual page says that the TRIGGERS table uses Open_full_table for all fields. When I tested it, though, I did not get anything in the “Extra” field at all — not “Open_trigger_only” and not even “Open_full_table”:

mysql> select @@version;
+---------------------+
| @@version           |
+---------------------+
| 5.1.37-1ubuntu5-log |
+---------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM TRIGGERS\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TRIGGERS
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:
1 row in set (0.00 sec)

Open_frm_only – Only the format file (.frm) of the table needs to be open. Again, check the manual page for the fields that can use this optimization — fields such as CREATE_OPTIONS and ENGINE in the TABLES table do, for example.

Skip_open_table – This is the last new “Extra” feature, and it is the best. This optimization type means that no files need to be opened. The database directories are scanned and information can be obtained — mostly the table name, so when querying only the TABLE_NAME and TABLE_SCHEMA fields from the TABLES table, your query is safe.

So instead of putting your head in the sand and never using the great tool that is the INFORMATION_SCHEMA, first EXPLAIN your query to see if it will work or not.

(Note, if you are still on MySQL 5.0, what are you waiting for? The upgrade to MySQL 5.1 is relatively painless, and Pythian has a comprehensive checklist for how to upgrade while keeping your sanity).


PlanetMySQL Voting: Vote UP / Vote DOWN

thread_stack_size in my.cnf

Октябрь 30th, 2009

Many configs have thread_stack_size configured explicitly, but that can cause rather bad trouble:

  • if the stack inside a thread it’s too small, you can get segfault crashes (stack overflow, essentially). Particularly on 64-bit.
  • if the stack is too large, your system cannot handle as many connections since it all eats RAM.

Let mysqld sort it out, on startup it does a calculation based on the CPU architecture, and that’s actually the most sensible. So for almost all setups, remove any thread_stack_size=… line you might have in my.cnf.


PlanetMySQL Voting: Vote UP / Vote DOWN

Calculating your database size

Сентябрь 25th, 2009

I generally use the following MySQL INFORMATION_SCHEMA (I_S) query to Calculate Your MySQL Database Size. This query and most others that access the MySQL INFORMATION_SCHEMA can be very slow to execute because they are not real tables and are not governed by physical data, memory buffers and indexes for example but rather internal MySQL data structures.

Mark Leith indicates in his post on innodb_stats_on_metadata that Innodb performs 8 random(ish) dives in to the index, when anybody accesses any of SHOW TABLE STATUS, SHOW INDEX, INFORMATION_SCHEMA.TABLES,INFORMATION_SCHEMA.STATISTICS for InnoDB tables. This can have an effect on performance, especially with a large number of Innodb tables, and a poor ratio of innodb_buffer_pool_size to disk data+index footprint.

What is even more incredible is when the result of this apparently harmless query causes the mysqld process to actual crash with a core dump due to these random index dives. The following core dump analysis highlights my query as the cause of the problem. This has happened now at least twice in for recent core crashes on a production environment.

(gdb) bt
#0 0x000000327280b6b2 in pthread_kill () from ./lib64/libpthread.so.0
#1 0x000000000055b136 in handle_segfault ()
#2 
#3 0x00000000007e1c21 in rec_get_offsets_func ()
#4 0x0000000000766007 in btr_estimate_number_of_different_key_vals ()
#5 0x000000000070d4c2 in dict_update_statistics_low ()
#6 0x000000000061fa84 in ha_innobase::info ()
#7 0x0000000000636972 in fill_schema_charsets ()
#8 0x0000000000639a66 in get_all_tables ()
#9 0x0000000000634633 in get_schema_tables_result ()
#10 0x00000000005bde37 in JOIN::exec ()
#11 0x00000000005bf7a7 in mysql_select ()
#12 0x00000000005c0127 in handle_select ()
#13 0x000000000056fcf0 in mysql_execute_command ()
#14 0x0000000000574c83 in mysql_parse ()
#15 0x00000000005751a0 in dispatch_command ()
#16 0x0000000000576483 in do_command ()
#17 0x0000000000577002 in handle_one_connection ()
#18 0x0000003272806367 in start_thread () from ./lib64/libpthread.so.0
#19 0x0000003271cd30ad in clone () from ./lib64/libc.so.6
Cannot access memory at address 0x3271cd3040 

This is an information_schema query that caused innodb to open a table.
This is totally normal. On first open, innodb tables get automatically  analyzed.
This analyze process crashed in innodb. 

This exact query *provoked* a crash: 

(gdb) x/1s 0x00002aaabc961dd0
0x2aaabc961dd0: "SELECT table_schema,table_name,engine,row_format,
table_rows, avg_row_length,
(data_length+index_length)/1024/1024 as total_mb,
(data_length)/1024/1024 as data_mb,
(index_length)/1024/1024 as index_mb,
CURDATE() AS today
FROM information_schema.tables
WHERE table_schema=@schema
ORDER BY 7 DESC"

The issue however is which table is the problem? How widespread is the corruption. Would an ALTER TABLE ENGINE=Innodb rebuild the table and eliminate the problem. Would an ANALYZE on an Innodb table identify the problem? (I doubt this second point). The problem however is even more significant due to the actual system. The largest single table of this 1TB database is 500GB. The impact of performing the ALTER, the time to undertake this blocking operation, the increase in the Innodb data file that can’t be reclaimed are just two factors that the inexperienced may fall victim of.

A saying I use is “Disaster is inevitable”. In this situation the disaster appears to not be significant but the ramifications due to the lack of appropriate and expert architectural design considerations to correct the problem are.

Is your environment capable of supporting this maintenance requirement? If not, then is the decision maker in your organization worried enough to seek the expert advice to address pro actively or will it be too late.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Workbench on Snow Leopard

Август 31st, 2009

As all you Mac users (and probably many non-Macies) know Apple released Snow Leopard (Mac OS X 10.6) recently, even though this release was announced for September previously. Since a large part of the MySQL Workbench user base works on OS X it was clear that many of you will test Workbench on the new OS. However, so far we haven’t had the opportunity to do the same (too busy fixing bugs on released OSes) and hence we did not know about incompatibility problems there.

In the meantime several users did tests and reported us a crash on startup of the application, which means you cannot use MySQL Workbench on Snow Leopard for the time being. We are currently preparing build, test and developer machines with it and will hand out a fixed WB release as soon as possible. So, please stay patient. It’s only a matter of days.


PlanetMySQL Voting: Vote UP / Vote DOWN

Will your production MySQL server survive a restart?

Август 30th, 2009
Do you know if your production MySQL servers will come back up when restarted? A recent support episode illustrates a number of best practices. The task looked trivial: Update a production MySQL server (replication master) with a configuration tuned and tested on a development server. Clean shutdown, change configuration, restart. Unfortunately, the MySQL daemon did not just ‘come back’, leaving 2 sites offline. Thus begins an illuminating debugging story.
First place to look is the daemon error log, which revealed that the server was segfaulting, seemingly at the end of or just after InnoDB recovery. Reverting to the previous configuration did not help, nor did changing the InnoDB recovery mode. Working with the client, we performed a failover to a replication slave, while I got a second opinion from a fellow engineer to work out what had gone wrong on the server.
Since debug symbols weren’t shown in the stack trace, we needed to generate a symbol file (binary was unstripped) to use with the resolve_stack_dump utility. The procedure for obtaining this is detailed in the MySQL manual. With a good stack trace in hand, we were able (with assistance from an old friend, thanks Dean!) to narrow the crash down to bug 38856 (also see 37027). A little further investigation showed that the right conditions did exist to trigger this bug:
  • expire_logs_days = 14 # had been set in the my.cnf
  • the binlog.index file did not match the actual state of log files (i.e. some had been manually deleted, or deleted by a script)
So with this knowledge, it was possible to bring the MySQL server back up. It turned out that the expire_logs_days had perhaps been added to the configuration but not tested at the time (the server had not been restarted for 3 months). This had placed the system in a state, unbeknownst to the administrators, where it would not come back up after a restart. It was an interesting (if a tad stressful) incident as it shows the reasons for many best practices – which most of us know and follow – but worth re-capping here.
  • even seemingly trivial maintenance can potentially trigger downtime
  • plan any production maintenance in the quiet zone, and be sure to allow enough time to deal with the unforeseen
  • don’t assume your live server will ‘just restart’
  • put my.cnf under revision control (check out “etckeeper”, a standard Ubuntu package; it can keep track of everything in /etc using bzr, svn or git)
  • do not make un-tested changes to config, test immediately, preferably on dev or staging system
  • be ready to failover (test regularly like a fire drill); this is another reason why master-master setups are more convenient than mere master-slave
  • replication alone is NOT a backup
  • don’t remove binlogs or otherwise touch anything in data dir behind mysql’s back
  • have only 1 admin per server so you don’t step on each other’s toes (but share credentials with 2IC for emergencies only)
  • use a trusted origin for your binary packages, just building and passing the basis test-suite is not always sufficient
  • know how to get a good stack trace with symbols, to help find bug reports
  • be familiar with bugs.mysql.com, but it still helps to ask others as they might have just seen something similar and can help you quickly find what you’re looking for!
  • and last but very important: it really pays to find the root cause to a problem (and prevention requires it!), so a “postmortum” on a dead server is very important… if we had just wiped that server, the problem might have reoccurred with another server later.

PlanetMySQL Voting: Vote UP / Vote DOWN