Archive for the ‘usability’ Category

Testing new builds with MySQL-Sandbox 3.0.24

Декабрь 19th, 2011
MySQL::Sandbox 3.0.24 was released yesterday, with many new features.

More than vanilla MySQL

If you have missed my previous announcement, here's the gist of it. MySQL Sandbox can now deal with tarballs from either Percona Server or MariaDB. The main difference after this change is that you can now create a directory called <PREFIX>5.5.16 and make_sandbox will recognize it as well as the plain 5.5.16.
$ make_sandbox --export_binaries --add_prefix=ps \
Percona-Server-5.5.11-rel20.2-114.Darwin.i386.tar.gz \
-- --sandbox_directory=msb_ps5_5_11

unpacking Percona-Server-5.5.11-rel20.2-114.Darwin.i386.tar.gz
[…]
installing with the following parameters:
upper_directory = /Users/gmax/sandboxes
sandbox_directory = msb_ps5_5_11
[…]
basedir = $HOME/opt/mysql/ps5.5.11
tmpdir =
[…]
Your sandbox server was installed in $HOME/sandboxes/msb_ps5_5_11
After the binary export, subsequent installations will be easier:
$ make_sandbox ps5.5.11
The same commands can be used for MariaDB. At the moment, make_sandbox does not recognize other packages, but adding them should not be a big deal, provided that such packages look like MySQL. It wouldn't work with Drizzle, because it lacks the main ingredients for MySQL installation.

High Performance sandboxes

While testing parallel replication and prefetch slaves with Tungsten Replicator, I realized that I was doing too much manual fiddling with my scripts. Since I need more performant servers, I added the basic items that I need to modify to enable a faster server. Now, using the '--high_performance' option with make_sandbox, you get a server that is much better than out-of-the-box MySQL. To avoid problems with too much RAM, I am using a default of 512 MB for InnoDB, which is not enough for really demanding tests, but at least it is a good placeholder in the sandbox configuration file, should you need to modify it.
$ make_sandbox 5.1.60 -- --high_performance
[…]
innodb-flush-method=O_DIRECT ; \
innodb-log-file-size=50M ; \
innodb_buffer_pool_size=512M ; \
max_allowed_packet=48M ; \
max-connections=350 ; \
innodb-additional-mem-pool-size=50M ; \
innodb-log-buffer-size=50M ; sync_binlog=0 ; \
innodb-thread-concurrency=0 ; log-error=msandbox.err
[…]

Standalone masters and slaves

MySQL Sandbox has had the ability of creating replicated systems for years. Yet, sometimes you need a stand-alone master server that you want to use for some odd experiment. Similarly, you may want to create a slave of a specific master without having a full replication system. One case where you would like this ability is when you want to try replicating between servers of different versions.
$ make_sandbox 5.1.57 -- --master
[…]
my_clause = server-id=5157 ; log-bin=mysql-bin ; log-error=msandbox.err
[…]
Your sandbox server was installed in $HOME/sandboxes/msb_5_1_57

$ make_sandbox 5.5.10 -- --slaveof='master_port=5157'
[…]
my_clause = server-id=5510 ; log-bin=mysql-bin ; log-error=msandbox.err
[…]
Your sandbox server was installed in $HOME/sandboxes/msb_5_5_10


$ ~/sandboxes/msb_5_1_57/use -e 'show master status'
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 | 106 | | |
+------------------+----------+--------------+------------------+

$ ~/sandboxes/msb_5_5_10/use -e 'show slave status\G'
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 5157
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 106
Relay_Log_File: mysql_sandbox5510-relay-bin.000002
Relay_Log_Pos: 252
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 106
Relay_Log_Space: 420
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 5157
You can download MySQL::Sandbox from either launchpad or CPAN.

PlanetMySQL Voting: Vote UP / Vote DOWN

Upgrading Tungsten Replicator: as easy as …

Сентябрь 23rd, 2011
When I talked about the usability improvements of Tungsten Replicator, I did not mention the procedure for upgrading.I was reminded about it by a question in the TR mailing list, and since the question was very relevant, I updated the Tungsten Cookbook with some quick upgrading instructions.A quick upgrading procedure is as important as the installer. Since we release software quite often, either because we have scheduled features to release or because of bug fixes, users want to apply a new release to an existing installation without much fuss. You can do the upgrade with a very quick and painless procedure.Let's suppose that you have installed one Tungsten Replicator cluster using this command:

#
# using tungsten-replicator 2.0.4
#
TUNGSTEN_HOME=/home/tungsten/installs/master_slave
./tools/tungsten-installer \
--master-slave \
--master-host=r1 \
--datasource-user=tungsten \
--datasource-password=secret \
--service-name=dragon \
--home-directory=$TUNGSTEN_HOME \
--cluster-hosts=r1,r2,r3,r4 \
--start-and-report
If you want to upgrade to the very latest Tungsten Replicator 2.0.5, build 321, this is what you need to do.
  • Get the latest tarball, and expand it;
  • Stop the replicator;
  • Run the update command (this will also restart the replicator)
  • Check that the replicator is running again.
The actual upgrade command is in bold in the following script.

#
# using tungsten-replicator 2.0.5-321 (get it from bit.ly/tr20_builds)
#
TUNGSTEN_HOME=/home/tungsten/installs/master_slave
HOSTS=(r1 r2 r3 r4)
for HOST in ${HOSTS[*]}
do
ssh $HOST $TUNGSTEN_HOME/tungsten/tungsten-replicator/bin/replicator stop
./tools/update --host=$HOST --user=tungsten --release-directory=$TUNGSTEN_HOME -q
$$TUNGSTEN_HOME/tungsten/tungsten-replicator/bin/trepctl -host $HOST services
done
One benefit of this procedure, in addition to being brief and effective, is that the previous binaries are preserved.Before the upgrade, you will see:

$ ls -lh ~/installs/master_slave/ ~/installs/master_slave/releases
/home/tungsten/installs/master_slave/:
total 32K
drwxrwxr-x 3 tungsten tungsten 4.0K Sep 22 22:03 backups
drwxrwxr-x 2 tungsten tungsten 4.0K Sep 22 22:03 configs
drwxrwxr-x 3 tungsten tungsten 4.0K Sep 22 22:03 relay
drwxrwxr-x 4 tungsten tungsten 4.0K Sep 22 22:06 releases
drwxrwxr-x 2 tungsten tungsten 4.0K Sep 22 22:03 service-logs
drwxrwxr-x 2 tungsten tungsten 4.0K Sep 22 22:03 share
drwxrwxr-x 3 tungsten tungsten 4.0K Sep 22 22:03 thl
lrwxrwxrwx 1 tungsten tungsten 75 Sep 22 22:06 tungsten -> /home/tungsten/installs/master_slave/releases/tungsten-replicator-2.0.4
/home/tungsten/installs/master_slave/releases:
total 8.0K
drwxr-xr-x 6 tungsten tungsten 4.0K Sep 22 22:03 tungsten-replicator-2.0.4
The 'tungsten' directory is a symlink to the actual binaries inside the 'releases' directory.After the upgrade, the same directory looks like this:

ls -lh ~/installs/master_slave/ ~/installs/master_slave/releases
/home/tungsten/installs/master_slave/:
total 32K
drwxrwxr-x 3 tungsten tungsten 4.0K Sep 22 22:03 backups
drwxrwxr-x 2 tungsten tungsten 4.0K Sep 22 22:03 configs
drwxrwxr-x 3 tungsten tungsten 4.0K Sep 22 22:03 relay
drwxrwxr-x 4 tungsten tungsten 4.0K Sep 22 22:06 releases
drwxrwxr-x 2 tungsten tungsten 4.0K Sep 22 22:03 service-logs
drwxrwxr-x 2 tungsten tungsten 4.0K Sep 22 22:03 share
drwxrwxr-x 3 tungsten tungsten 4.0K Sep 22 22:03 thl
lrwxrwxrwx 1 tungsten tungsten 75 Sep 22 22:06 tungsten -> /home/tungsten/installs/master_slave/releases/tungsten-replicator-2.0.5-321

/home/tungsten/installs/master_slave/releases:
total 8.0K
drwxr-xr-x 6 tungsten tungsten 4.0K Sep 22 22:03 tungsten-replicator-2.0.4
drwxr-xr-x 6 tungsten tungsten 4.0K Sep 22 22:06 tungsten-replicator-2.0.5-321
If you did some manual change to the files in 2.0.4, you will be able to retrieve them. Upgrading from earlier versions of Tungsten Replicator is not as smooth. Since we changed the installation format, it has become incompatible from previous versions. Clusters running TR 2.0.3 need to be reinstalled manually. The next upgrade, though, will be much faster!

PlanetMySQL Voting: Vote UP / Vote DOWN

Usability improvements in Tungsten Replicator 2.0.4

Август 11th, 2011
If you love a software product, you should try to improve it, and not be afraid of criticizing it. This principle has guided me with MySQL (where I have submitted many usability bugs, and discussed interface with developers for years), and it proves true for Tungsten Replicator as well. When I started working at Continuent, while I was impressed by the technology, I found the installation procedure and the product logs quite discouraging. I would almost say disturbing. Fortunately, my colleagues have agreed on my usability focus, and we can enjoy some tangible improvements. I have already mentioned the new installation procedure, which requires just one command to install a full master/slave cluster. I would like to show how you can use the new installer to deploy a multiple source replication topology like the following: The first step is to install one master in each node. I can run the commands from node #4, which is the one that will eventually receive the updates from the remote masters, and where I need to install the slave services:
TUNGSTEN_BASE=$HOME/newinst

SERVICES=(alpha bravo charlie delta)
REPLICATOR=$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/replicator

for N in 1 2 3 4
do
INDEX=$(($N-1))

./tools/tungsten-installer \
--master-slave \
--master-host=qa.r$N.continuent.com \
--datasource-user=tungsten \
--datasource-password=secret \
--service-name=${SERVICES[$INDEX]} \
--home-directory=$TUNGSTEN_BASE \
--cluster-hosts=qa.r$N.continuent.com \
--start-and-report
done
The above loop will install a master (remotely or locally) in the four servers. Then I need to create the slave services. To do it, I use the updated configure-service in the tools directory.
TUNGSTEN_TOOLS=$TUNGSTEN_BASE/tungsten/tools

COMMON_OPTIONS='-C -q
--local-service-name=delta
--role=slave
--service-type=remote
--allow-bidi-unsafe=true
--datasource=qa_r4_continuent_com'

$TUNGSTEN_TOOLS/configure-service $COMMON_OPTIONS --master-host=qa.r1.continuent.com alpha
$TUNGSTEN_TOOLS/configure-service $COMMON_OPTIONS --master-host=qa.r2.continuent.com bravo
$TUNGSTEN_TOOLS/configure-service $COMMON_OPTIONS --master-host=qa.r3.continuent.com charlie

$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/replicator restart
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl services
These commands create the slave services locally in Delta. After restarting the replicator, a simple test will be creating something different in each master, and check that the data has replicated to the single slave. The latest improvement in matter of usability is the simplification of the replicator logs. Until a few days ago, if you had an error in the replicator, you would get a long list of not exactly helpful stuff. For example, if I create a table in a slave, and then create the same table in the master, I will break replication. The extended log would produce something like this:
INFO   | jvm 1    | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,216 [tsandbox - q-to-dbms-0] ERROR pipeline.SingleThreadStageTask Event application failed: seqno=1 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master

INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,217 [tsandbox - Event dispatcher thread] ERROR management.OpenReplicatorManager Received error notification, shutting down services: Event application failed: seqno=1 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
INFO | jvm 1 | 2011/08/11 18:10:52 | com.continuent.tungsten.replicator.applier.ApplierException: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:183)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:1233)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.java:101)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(SingleThreadStageTask.java:498)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThreadStageTask.java:155)
INFO | jvm 1 | 2011/08/11 18:10:52 | at java.lang.Thread.run(Unknown Source)
INFO | jvm 1 | 2011/08/11 18:10:52 | Caused by: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:139)
INFO | jvm 1 | 2011/08/11 18:10:52 | ... 5 more
INFO | jvm 1 | 2011/08/11 18:10:52 | Caused by: java.sql.SQLSyntaxErrorException: Table 't1' already exists
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:78)
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.DrizzleStatement.executeBatch(DrizzleStatement.java:930)
INFO | jvm 1 | 2011/08/11 18:10:52 | at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:125)
INFO | jvm 1 | 2011/08/11 18:10:52 | ... 5 more
INFO | jvm 1 | 2011/08/11 18:10:52 | Caused by: org.drizzle.jdbc.internal.common.QueryException: Table 't1' already exists
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.internal.mysql.MySQLProtocol.executeQuery(MySQLProtocol.java:500)
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.internal.mysql.MySQLProtocol.executeBatch(MySQLProtocol.java:546)
INFO | jvm 1 | 2011/08/11 18:10:52 | at org.drizzle.jdbc.DrizzleStatement.executeBatch(DrizzleStatement.java:917)
INFO | jvm 1 | 2011/08/11 18:10:52 | ... 6 more
INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,218 [tsandbox - Event dispatcher thread] WARN management.OpenReplicatorManager Performing emergency service shutdown
INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,219 [tsandbox - Event dispatcher thread] INFO pipeline.Pipeline Shutting down pipeline: slave
INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,219 [tsandbox - q-to-dbms-0] INFO pipeline.SingleThreadStageTask Terminating processing for stage task thread
INFO | jvm 1 | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,219 [tsandbox - q-to-dbms-0] INFO pipeline.SingleThreadStageTask Last successfully processed event prior to termination: seqno=0 eventid=mysql-bin.000002:0000000000000426;20
Did you see the reason for the error? No? Neither did I. I would need to open the THL, look for event #1, and determine what it was. Instead, the new user.log looks like this:
2011-08-11 18:10:52,216 ERROR Received error notification: Event application failed: seqno=1 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master

Caused by : java.sql.SQLException: Statement failed on slave but succeeded on master
Caused by : Statement failed on slave but succeeded on master
Caused by : Table 't1' already exists
Caused by : Table 't1' already exists
2011-08-11 18:10:54,721 INFO State changed ONLINE -> OFFLINE:ERROR
2011-08-11 18:10:54,721 WARN Received irrelevant event for current state: state=OFFLINE:ERROR event=OfflineNotification
That's much better. It is not perfect yet, but it will be soon. Right now, it tells me what is wrong without forcing me to go hunting for it amid hundreds of stack trace lines. Give it a try, using the latest replicator build.

PlanetMySQL Voting: Vote UP / Vote DOWN

Exchanging partitions with tables

Апрель 29th, 2010
MySQL PartitionsWhile I was presenting my partitioning tutorial at the latest MySQL Conference, I announced a new feature that was, as far as I knew, still in the planning stage. Mattias Jonsson, one of the partitions developers, was in attendance, and corrected me, explaining that the feature was actually available in a prototype.

So, we can have a look at this improvement, which I am sure will make DBAs quite happy. The new feature is an instantaneous exchange between a partition and a table with the same structure. Using this feature, you can transfer the contents of one partition to one table, and vice versa. Since the transition is done only in the attribution of the data, there is no copy involved. The data stays where it is at the moment. What is in the table ends up in the partition and what's in the partition ends up in the table. Let's see an example.

With the data in figure 1, where we have a partitioned table t1 and an empty table t2 with the same structure, we can issue the following statement:
ALTER TABLE t1
EXCHANGE PARTITION p2
WITH TABLE t2


After the exchange, partition p2 is empty, and table t2 contains 4 records.
If we repeat the command, the contents will be swapped again, leaving table t2 empty and partition p2 with its original contents.

If you want to test on your own, you can get the code from Launchpad. Once you get the code, you can use cmake to build the server.

$ cmake-gui .
# add the options you need. For example, enable innodb
# or else you will need to load it as a plugin.
$ make && ./scripts/make_binary_distribution

You can then use this script to test the new functionality. You may want to change Innodb with MyISAM to test it thoroughly. At the moment, it doesn't work with the archive engine (yet).

# ############################
# test_exchange_partitions.sql
# ############################
use test;
set default_storage_engine=innodb;
drop procedure if exists compare_tables;
delimiter //
create procedure compare_tables (wanted int)
reads sql data
begin
set @part_table := (select count(*) from t1);
set @non_part_table := (select count(*) from t2);
select @part_table, @non_part_table,
if(@non_part_table = wanted, "OK", "error") as expected;
end //
delimiter ;

drop table if exists t1, t2;
create table t1 (i int) # not null primary key)
partition by range (i)
(
partition p01 values less than (100001),
partition p02 values less than (200001),
partition p03 values less than (300001),
partition p04 values less than (400001),
partition p05 values less than (500001),
partition p06 values less than (600001),
partition p07 values less than (700001),
partition p08 values less than (800001),
partition p09 values less than (900001),
partition p10 values less than (1000001),
partition p11 values less than (maxvalue));

create table t2 (i int ) ; # not null primary key);

select table_name, engine
from information_schema.tables
where table_schema='test' and table_type='base table';


select 'generating 1 million records. ...' as info;
# generates 1 million records
# see this article for details
# http://datacharmer.blogspot.com/2007/12/data-from-nothing-solution-to-pop-quiz.html
create or replace view v3 as select null union all select null union all select null;
create or replace view v10 as select null from v3 a, v3 b union all select null;
create or replace view v1000 as select null from v10 a, v10 b, v10 c;
set @n = 0;
insert into t1 select @n:=@n+1 from v1000 a,v1000 b;

select partition_name, table_rows from information_schema . partitions where table_name='t1' and table_schema='test';

call compare_tables(0);

alter table t1 exchange partition p04 with table t2;
call compare_tables(100000);

select partition_name, table_rows from information_schema . partitions where table_name='t1' and table_schema='test';

alter table t1 exchange partition p04 with table t2;
call compare_tables(0);

alter table t1 exchange partition p04 with table t2;
call compare_tables(100000);

Here is a test run:

$ ~/sandboxes/msb_5_6_99/use -t test -vvv < test_exch_part.sql
--------------
set default_storage_engine=innodb
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
drop procedure if exists compare_tables
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create procedure compare_tables (wanted int)
reads sql data
begin
set @part_table := (select count(*) from t1);
set @non_part_table := (select count(*) from t2);
select @part_table, @non_part_table,
if(@non_part_table = wanted, "OK", "error") as expected;
end
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
drop table if exists t1, t2
--------------

Query OK, 0 rows affected (0.07 sec)

--------------
create table t1 (i int)
partition by range (i)
(
partition p01 values less than (100001),
partition p02 values less than (200001),
partition p03 values less than (300001),
partition p04 values less than (400001),
partition p05 values less than (500001),
partition p06 values less than (600001),
partition p07 values less than (700001),
partition p08 values less than (800001),
partition p09 values less than (900001),
partition p10 values less than (1000001),
partition p11 values less than (maxvalue))
--------------

Query OK, 0 rows affected (0.08 sec)

--------------
create table t2 (i int )
--------------

Query OK, 0 rows affected (0.14 sec)

--------------
select table_name, engine
from information_schema.tables
where table_schema='test' and table_type='base table'
--------------

+------------+--------+
| table_name | engine |
+------------+--------+
| t1 | InnoDB |
| t2 | InnoDB |
+------------+--------+
2 rows in set (0.01 sec)

--------------
select 'generating 1 million records. ...' as info
--------------

+-----------------------------------+
| info |
+-----------------------------------+
| generating 1 million records. ... |
+-----------------------------------+
1 row in set (0.00 sec)

--------------
create or replace view v3 as select null union all select null union all select null
--------------

Query OK, 0 rows affected (0.12 sec)

--------------
create or replace view v10 as select null from v3 a, v3 b union all select null
--------------

Query OK, 0 rows affected (0.14 sec)

--------------
create or replace view v1000 as select null from v10 a, v10 b, v10 c
--------------

Query OK, 0 rows affected (0.09 sec)

--------------
set @n = 0
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t1 select @n:=@n+1 from v1000 a,v1000 b
--------------

Query OK, 1000000 rows affected (10.01 sec)
Records: 1000000 Duplicates: 0 Warnings: 0

--------------
select partition_name, table_rows from information_schema . partitions where table_name='t1' and table_schema='test'
--------------

+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p01 | 100623 |
| p02 | 100623 |
| p03 | 100623 |
| p04 | 100623 |
| p05 | 100623 |
| p06 | 100623 |
| p07 | 100623 |
| p08 | 100623 |
| p09 | 100623 |
| p10 | 100623 |
| p11 | 0 |
+----------------+------------+
11 rows in set (0.01 sec)

--------------
call compare_tables(0)
--------------

+-------------+-----------------+----------+
| @part_table | @non_part_table | expected |
+-------------+-----------------+----------+
| 1000000 | 0 | OK |
+-------------+-----------------+----------+
1 row in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)

--------------
alter table t1 exchange partition p04 with table t2
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
call compare_tables(100000)
--------------

+-------------+-----------------+----------+
| @part_table | @non_part_table | expected |
+-------------+-----------------+----------+
| 900000 | 100000 | OK |
+-------------+-----------------+----------+
1 row in set (0.54 sec)

Query OK, 0 rows affected (0.54 sec)

--------------
select partition_name, table_rows from information_schema . partitions where table_name='t1' and table_schema='test'
--------------

+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p01 | 100623 |
| p02 | 100623 |
| p03 | 100623 |
| p04 | 0 |
| p05 | 100623 |
| p06 | 100623 |
| p07 | 100623 |
| p08 | 100623 |
| p09 | 100623 |
| p10 | 91799 |
| p11 | 0 |
+----------------+------------+
11 rows in set (0.01 sec)

--------------
alter table t1 exchange partition p04 with table t2
--------------

Query OK, 0 rows affected (0.05 sec)

--------------
call compare_tables(0)
--------------

+-------------+-----------------+----------+
| @part_table | @non_part_table | expected |
+-------------+-----------------+----------+
| 1000000 | 0 | OK |
+-------------+-----------------+----------+
1 row in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)

--------------
alter table t1 exchange partition p04 with table t2
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
call compare_tables(100000)
--------------

+-------------+-----------------+----------+
| @part_table | @non_part_table | expected |
+-------------+-----------------+----------+
| 900000 | 100000 | OK |
+-------------+-----------------+----------+
1 row in set (0.56 sec)

Query OK, 0 rows affected (0.56 sec)

Bye

Notice that the value for "table_rows" is only approximate with InnoDB, while it is reliable for MyISAM. Anyway, when it says that a partition has 0 records, it's reliable for any engine. Here you see that, after the exchange, partition p04 is empty.
The exchange is repeated twice, to make sure that it works both ways.
Notice also that, if the table contains data that doesn't fit with the partition, the server throws an error, and the exchange does not happen.

mysql > insert into t2 values (2000000);
Query OK, 1 row affected (0.00 sec)

mysql > alter table t1 exchange partition p04 with table t2;
ERROR 1697 (HY000): Found row that does not match the partition

If you remove the offending row from the table, the exchange works as expected.

PlanetMySQL Voting: Vote UP / Vote DOWN

Holiday gift — A deep look at MySQL 5.5 partitioning enhancements

Декабрь 24th, 2009

A deep look at MySQL 5.5 partitioning enhancements

Half a day into my vacation, I managed to finish an article on a topic that has been intriguing me for a while.
Since several colleagues were baffled by the semantics of the new enhancements of MySQL 5.5 partitions, after talking at length with the creator and the author of the manual pages, I produced this article: A deep look at MySQL 5.5 partitioning enhancements.
Happy holidays!

PlanetMySQL Voting: Vote UP / Vote DOWN

Forums are crap. Can we get some help?

Октябрь 12th, 2009
Amy Hoy has written a blog post about why forums are crap. And she is right. Forum software does not always do a good job of helping people communicate. I have worked with Amy. She did a great analysis of dealnews.com that led to our new design. So, she is not to be ignored.

However, as a software developer (Phorum), I see a lot of problems and no answers.  And it is not all on the software.  Web site owners use forums to solve problems that they really, really suck at.  Ideally, every web site would be very unique for their audience.  They would use a custom solution that fits a number of patterns that best solves their problem.  However, most web site owners don't want to take the time to do such things.  They want a one stop, drop in solution. See the monolith that is vBulletin, scary.

And what if a forum is the best solution? Well, software developers, in general, are not good designers. They don't think like normal people. And they don't see their applications as a whole, but as pieces that do jobs. The forum software market has been run by software developers for over 10 years. Most of them all are still copies of what UBB was 13 years ago. And software (like Phorum) that has tried to be different is shunned by the online communities of the world because they don't work/look/feel like every other forum software on the planet.

So, as software developers, what are we to do? We want to make great software. We want to help our users help their users. But, what we have been doing for 10+ years has only been adequate. As the leader of an open source forum software project, I am open to any and all ideas.
PlanetMySQL Voting: Vote UP / Vote DOWN

OPTIMIZE TABLE on Cluster (revisited)

Август 4th, 2009
Jonas just wrote a patch to this bug on OPTIMIZE TABLE, and the issue that was also discussed in this blog post. Jonas also fixed this bug when he was at it.

Before, OPTIMIZE TABLE hardly freed up any pages and to defragment you had to do a rolling restart of the data nodes.

Now, there is only a 2% discrepancy between OPTIMIZE TABLE and doing a rolling restart. This is great stuff.
This will fix will make it into 6.3.26 and 7.0.7.

See below for details:

Creating two tables, t5 and t6:

CREATE TABLE `t5` (
`id` varchar(32) CHARACTER SET utf8 NOT NULL,
`name` varchar(32) NOT NULL DEFAULT '',
`unit` int(11) DEFAULT NULL,
PRIMARY KEY (`id`,`name`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1

And then the following table:

'CREATE TABLE `t6` ( `id` varchar(32) CHARACTER SET utf8 NOT NULL, `name` varchar(32) NOT NULL DEFAULT '', c0 int unsigned not null, c1 int unsigned not null, data2 varchar(255), `unit` int(11) DEFAULT NULL, PRIMARY KEY (`id`,`name`) ) ENGINE=ndbcluster DEFAULT CHARSET=latin1;


Then we use hugoLoad (a test program that isn't built by default in the src distribution of cluster) to load it with 1M records.

./hugoLoad -r 1000000 -d test t5
./hugoLoad -r 1000000 -d test t6


Memory usage after population:

Node 2: Data usage is 45%(14890 32K pages of total 32768)
Node 2: Index usage is 22%(3722 8K pages of total 16416)
Node 3: Data usage is 45%(14890 32K pages of total 32768)
Node 3: Index usage is 22%(3722 8K pages of total 16416)


Delete 500K records from t5.


mysql> set ndb_use_transactions=0;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (2.73 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (3.13 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (3.11 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (3.13 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (2.97 sec)


Memory usage after deletion:


Node 2: Data usage is 45%(14890 32K pages of total 32768)
Node 2: Index usage is 19%(3120 8K pages of total 16416)
ndb_mgm> Node 3: Data usage is 45%(14890 32K pages of total 32768)
Node 3: Index usage is 19%(3121 8K pages of total 16416)


Run OPTIMIZE TABLE:


mysql> optimize table t5;
+---------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+----------+----------+----------+
| test.t5 | optimize | status | OK |
+---------+----------+----------+----------+
1 row in set (1 min 41.94 sec)



CPU UTIL during OPTIMIZE TABLE:
User:23.8% Sys: 4.9% ndbd

Memory usage after OPTIMIZE TABLE:


ndb_mgm> Node 2: Data usage is 39%(13090 32K pages of total 32768)
Node 2: Index usage is 19%(3120 8K pages of total 16416)
Node 3: Data usage is 39%(13090 32K pages of total 32768)
Node 3: Index usage is 19%(3121 8K pages of total 16416)


Delete 500K (50%) of the records from t6:

mysql> set ndb_use_transactions=0;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (2.77 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.05 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.12 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.19 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.24 sec)


Memory usage after deletion:


Node 2: Data usage is 39%(13090 32K pages of total 32768)
Node 2: Index usage is 15%(2521 8K pages of total 16416)
Node 3: Data usage is 39%(13090 32K pages of total 32768)
Node 3: Index usage is 15%(2522 8K pages of total 16416)


Run OPTIMIZE TABLE:


mysql> optimize table t6;
+---------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+----------+----------+----------+
| test.t6 | optimize | status | OK |
+---------+----------+----------+----------+
1 row in set (1 min 45.83 sec)


Memory usage after OPTIMIZE TABLE:


Node 2: Data usage is 28%(9249 32K pages of total 32768)
Node 2: Index usage is 15%(2521 8K pages of total 16416)
Node 3: Data usage is 28%(9249 32K pages of total 32768)
Node 3: Index usage is 15%(2522 8K pages of total 16416)


After restart of node 3:


Node 2: Data usage is 28%(9249 32K pages of total 32768)
Node 2: Index usage is 15%(2521 8K pages of total 16416)
Node 3: Data usage is 26%(8568 32K pages of total 32768)
Node 3: Index usage is 13%(2236 8K pages of total 16416)


Only two percent diff between rolling restart and OPTIMIZE TABLE. Excellent. Thank you Jonas for fixing this.