Archive for the ‘ndb’ Category

Surprises in store with ndb_restore

Февраль 16th, 2012

While doing some routine fiddling regarding some topic I've now forgotten, I discovered that ndb_restore was doing something quite surprising. It's been common wisdom for some time that one can use ndb_restore -m to restore metadata into a new cluster and automatically have your data re-partitioned across the data nodes in the destination cluster. In fact, this was the recommended procedure for adding nodes to a cluster before online add node came along. Since MySQL Cluster 7.0, though, ndb_restore hasn't behaved that way, though that change in behavior doesn't seem to be documented and most don't know that the change ever took place.

I'll go through some of the methods you can use to find information about the partitioning strategy for an NDB table, talk a bit about why ndb_restore stopped working the way most everyone expected (and still expect) it to, and discuss some possible alternatives and workarounds.

Let's start out with an example of how ndb_restore worked in the pre-7.0 days. I'm going to create a 2-node cluster, create a table, put some rows in it, look at the partitioning strategy for that table, then take a backup and shut down my cluster.

[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ cat ~/cluster_2.ini 
[ndb_mgmd]
Hostname=127.0.0.1
Datadir=/home/ndb/cluster-data
NodeId=1

[ndbd default]
#MaxNoOfExecutionThreads=4
Datadir=/home/ndb/cluster-data
NoOfReplicas=2
Hostname=127.0.0.1

[ndbd]
NodeId=3
[ndbd]
NodeId=4

[mysqld]
NodeId=11

[mysqld]
NodeId=12

[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_mgmd -f ~/cluster_2.ini  
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndbd --initial;./bin/ndbd --initial;
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_mgm -e 'all status'
Connected to Management Server at: localhost:1186
Node 3: started (mysql-5.1.56 ndb-6.3.45)
Node 4: started (mysql-5.1.56 ndb-6.3.45)

[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/mysqld_safe &
[1] 2489
120215 20:10:49 mysqld_safe Logging to '/home/ndb/mysql/mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23/data/ip-10-0-0-59.err'.
120215 20:10:49 mysqld_safe Starting mysqld daemon with databases from /home/ndb/mysql/mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23/data

[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.56-ndb-6.3.45-cluster-gpl MySQL Cluster Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.1.56-ndb-6.3.45-cluster-gpl (root) [test]> create table c1 (id int) engine=ndb;
Query OK, 0 rows affected (0.12 sec)

mysql 5.1.56-ndb-6.3.45-cluster-gpl (root) [test]> INSERT INTO c1 (id) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),
(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),
(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),
(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),
(91),(92),(93),(94),(95),(96),(97),(98),(99),(100);
Query OK, 100 rows affected (0.00 sec)
Records: 100  Duplicates: 0  Warnings: 0

mysql 5.1.56-ndb-6.3.45-cluster-gpl (root) [test]> Bye
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_desc -d test c1 -pn
-- c1 --
Version: 1
Fragment type: 5
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: no
Number of attributes: 2
Number of primary keys: 1
Length of frm data: 206
Row Checksum: 1
Row GCI: 1
SingleUserMode: 0
ForceVarPart: 1
TableStatus: Retrieved
-- Attributes -- 
id Int NULL AT=FIXED ST=MEMORY
$PK Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY AUTO_INCR

-- Indexes -- 
PRIMARY KEY($PK) - UniqueHashIndex

-- Per partition info -- 
Partition       Row count       Commit count    Frag fixed memory       Frag varsized memory    Extent_space    Free extent_space       Nodes   
0               56              56              32768                   0                       0               0                       3,4
1               44              44              32768                   0                       0               0                       4,3


NDBT_ProgramExit: 0 - OK

[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_mgm -e 'start backup'
Connected to Management Server at: localhost:1186
Waiting for completed, this may take several minutes
Node 3: Backup 1 started from node 1
Node 3: Backup 1 started from node 1 completed
 StartGCP: 88 StopGCP: 91
 #Records: 2156 #LogRecords: 0
 Data: 53208 bytes Log: 0 bytes
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ 
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/mysqladmin shutdown
120215 20:13:45 mysqld_safe mysqld from pid file /home/ndb/mysql/mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23/data/ip-10-0-0-59.pid ended
[1]+  Done                    ./bin/mysqld_safe
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_mgm -e shutdown
Connected to Management Server at: localhost:1186
2 NDB Cluster node(s) have shutdown.
Disconnecting to allow management server to shutdown.

So, there we've created a 2-node cluster, created a table and put a few rows in it, created an NDB native backup, and then shut the cluster down. Now, we'll create a 4-node cluster, restore the backup, and see what our table looks like.

[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ rm ./data/test/*
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ cat ~/cluster_4.ini 
[ndb_mgmd]
Hostname=127.0.0.1
Datadir=/home/ndb/cluster-data
NodeId=1

[ndbd default]
#MaxNoOfExecutionThreads=4
Datadir=/home/ndb/cluster-data
NoOfReplicas=2
Hostname=127.0.0.1

[ndbd]
NodeId=3
[ndbd]
NodeId=4

[ndbd]
NodeId=5
[ndbd]
NodeId=6

[mysqld]
NodeId=11

[mysqld]
NodeId=12
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_mgmd -f ~/cluster_4.ini  
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndbd --initial;./bin/ndbd --initial;./bin/ndbd --initial;./bin/ndbd --initial;
[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_mgm -e 'all status'
Connected to Management Server at: localhost:1186
Node 3: started (mysql-5.1.56 ndb-6.3.45)
Node 4: started (mysql-5.1.56 ndb-6.3.45)
Node 5: started (mysql-5.1.56 ndb-6.3.45)
Node 6: started (mysql-5.1.56 ndb-6.3.45)

[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_restore -b 1 -r -n 3 -m ~/cluster-data/BACKUP/BACKUP-1/
Backup Id = 1
Nodeid = 3
backup path = /home/ndb/cluster-data/BACKUP/BACKUP-1/
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1.3.ctl'
Backup version in files: ndb-6.3.11 ndb version: mysql-5.1.56 ndb-6.3.45
Connected to ndb!!
Successfully restored table `mysql/def/ndb_apply_status`
Successfully restored table event REPL$mysql/ndb_apply_status
Successfully restored table `test/def/c1`
Successfully restored table event REPL$test/c1
Successfully restored table `mysql/def/ndb_schema`
Successfully restored table event REPL$mysql/ndb_schema
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1-0.3.Data'
_____________________________________________________
Processing data in table: sys/def/NDB$EVENTS_0(1) fragment 0
_____________________________________________________
Processing data in table: mysql/def/ndb_apply_status(4) fragment 0
_____________________________________________________
Processing data in table: test/def/c1(5) fragment 0
_____________________________________________________
Processing data in table: mysql/def/NDB$BLOB_2_3(3) fragment 0
_____________________________________________________
Processing data in table: sys/def/SYSTAB_0(0) fragment 0
_____________________________________________________
Processing data in table: mysql/def/ndb_schema(2) fragment 0
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1.3.log'
Restored 56 tuples and 0 log entries

NDBT_ProgramExit: 0 - OK

[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_restore -b 1 -r -n 4 ~/cluster-data/BACKUP/BACKUP-1/
Backup Id = 1
Nodeid = 4
backup path = /home/ndb/cluster-data/BACKUP/BACKUP-1/
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1.4.ctl'
Backup version in files: ndb-6.3.11 ndb version: mysql-5.1.56 ndb-6.3.45
Connected to ndb!!
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1-0.4.Data'
_____________________________________________________
Processing data in table: sys/def/NDB$EVENTS_0(1) fragment 1
_____________________________________________________
Processing data in table: mysql/def/ndb_apply_status(4) fragment 1
_____________________________________________________
Processing data in table: test/def/c1(5) fragment 1
_____________________________________________________
Processing data in table: mysql/def/NDB$BLOB_2_3(3) fragment 1
_____________________________________________________
Processing data in table: sys/def/SYSTAB_0(0) fragment 1
_____________________________________________________
Processing data in table: mysql/def/ndb_schema(2) fragment 1
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1.4.log'
Restored 44 tuples and 0 log entries

NDBT_ProgramExit: 0 - OK

[ndb@ip-10-0-0-59 mysql-cluster-gpl-6.3.45-linux-x86_64-glibc23]$ ./bin/ndb_desc -d test c1 -pn
-- c1 --
Version: 1
Fragment type: 5
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: no
Number of attributes: 2
Number of primary keys: 1
Length of frm data: 206
Row Checksum: 1
Row GCI: 1
SingleUserMode: 0
ForceVarPart: 1
TableStatus: Retrieved
-- Attributes -- 
id Int NULL AT=FIXED ST=MEMORY
$PK Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY AUTO_INCR

-- Indexes -- 
PRIMARY KEY($PK) - UniqueHashIndex

-- Per partition info -- 
Partition       Row count       Commit count    Frag fixed memory       Frag varsized memory    Extent_space    Free extent_space       Nodes   
0               26              26              32768                   0                       0               0                       3,4
1               24              24              32768                   0                       0               0                       5,6
3               20              20              32768                   0                       0               0                       6,5
2               30              30              32768                   0                       0               0                       4,3


NDBT_ProgramExit: 0 - OK

Alright! We created a new cluster with 4 data nodes, restored the backup into the cluster, and confirmed with ndb_desc that the data was automatically re-partitioned to give the table a number of partitions equal to the number of data nodes in the cluster. Why is that important? This way, each data node can be primary for one partition.

You can see in the Nodes column on the very right-hand side of the Per partition info section which nodes hold each partition. The left-most node listed in that column for a given partition is the primary for that partition; any other nodes listed hold secondary replicas for that partition.

When the cluster is handling a request, data is only retrieved from the primary replica. If we had 4 data nodes but only 2 partitions, that would mean that half of our nodes were not primary for any partition, which means that they would never be responsible for sending any data to API/MySQL nodes. Clearly, that is not the best solution in terms of spreading load across the data nodes.

Unfortunately, that is exactly the behavior you get with this same operation starting with MySQL Cluster 7.0.

Here's a demo identical to the one above, but using MySQL Cluster 7.2.4:

[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_mgmd -f ~/cluster_2.ini --config-dir=/home/ndb/cluster-config/ --initial
MySQL Cluster Management Server mysql-5.5.19 ndb-7.2.4
[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndbd --initial;./bin/ndbd --initial
2012-02-15 20:29:17 [ndbd] INFO     -- Angel connected to 'localhost:1186'
2012-02-15 20:29:17 [ndbd] INFO     -- Angel allocated nodeid: 3
2012-02-15 20:29:17 [ndbd] INFO     -- Angel connected to 'localhost:1186'
2012-02-15 20:29:17 [ndbd] INFO     -- Angel allocated nodeid: 4
[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_mgm -e 'all status'
Connected to Management Server at: localhost:1186
Node 3: started (mysql-5.5.19 ndb-7.2.4)
Node 4: started (mysql-5.5.19 ndb-7.2.4)

[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/mysqld_safe &
[1] 3079
120215 20:29:35 mysqld_safe Logging to '/home/ndb/mysql/mysql-cluster-gpl-7.2.4-linux2.6-x86_64/data/ip-10-0-0-59.err'.
120215 20:29:35 mysqld_safe Starting mysqld daemon with databases from /home/ndb/mysql/mysql-cluster-gpl-7.2.4-linux2.6-x86_64/data

[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.5.19-ndb-7.2.4-gpl MySQL Cluster Community Server (GPL)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.5.19-ndb-7.2.4-gpl (root) [test]> create table c1 (id int) engine=ndb;
Query OK, 0 rows affected (0.17 sec)

mysql 5.5.19-ndb-7.2.4-gpl (root) [test]> INSERT INTO c1 (id) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),
(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),
(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),
(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),
(91),(92),(93),(94),(95),(96),(97),(98),(99),(100);
Query OK, 100 rows affected (0.00 sec)
Records: 100  Duplicates: 0  Warnings: 0

mysql 5.5.19-ndb-7.2.4-gpl (root) [test]> Bye
[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_desc -d test c1 -pn
-- c1 --
Version: 1
Fragment type: 9
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: no
Number of attributes: 2
Number of primary keys: 1
Length of frm data: 204
Row Checksum: 1
Row GCI: 1
SingleUserMode: 0
ForceVarPart: 1
FragmentCount: 2
ExtraRowGciBits: 0
ExtraRowAuthorBits: 0
TableStatus: Retrieved
-- Attributes -- 
id Int NULL AT=FIXED ST=MEMORY
$PK Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY AUTO_INCR

-- Indexes -- 
PRIMARY KEY($PK) - UniqueHashIndex

-- Per partition info -- 
Partition       Row count       Commit count    Frag fixed memory       Frag varsized memory    Extent_space    Free extent_space       Nodes   
0               56              56              32768                   0                       0               0                       3,4
1               44              44              32768                   0                       0               0                       4,3


NDBT_ProgramExit: 0 - OK

[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_mgm -e 'start backup'
Connected to Management Server at: localhost:1186
Waiting for completed, this may take several minutes
Node 3: Backup 1 started from node 1
Node 3: Backup 1 started from node 1 completed
 StartGCP: 25 StopGCP: 28
 #Records: 2157 #LogRecords: 0
 Data: 53592 bytes Log: 0 bytes
[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/mysqladmin shutdown
120215 20:30:15 mysqld_safe mysqld from pid file /home/ndb/mysql/mysql-cluster-gpl-7.2.4-linux2.6-x86_64/data/ip-10-0-0-59.pid ended
[1]+  Done                    ./bin/mysqld_safe
[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_mgm -e shutdown
Connected to Management Server at: localhost:1186
3 NDB Cluster node(s) have shutdown.
Disconnecting to allow management server to shutdown.

OK, everything there looks about the same as before. We created the same table, inserted the same rows, and we have the same number of partitions that we did after the first half of the exercise on MySQL Cluster 6.3.45. Now, let's try the restore.


[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ rm ./data/test/*
[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_mgmd -f ~/cluster_4.ini --config-dir=/home/ndb/cluster-config/ --initial
MySQL Cluster Management Server mysql-5.5.19 ndb-7.2.4
[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndbd --initial;./bin/ndbd --initial;./bin/ndbd --initial;./bin/ndbd --initial;
2012-02-15 20:32:43 [ndbd] INFO     -- Angel connected to 'localhost:1186'
2012-02-15 20:32:43 [ndbd] INFO     -- Angel allocated nodeid: 3
2012-02-15 20:32:43 [ndbd] INFO     -- Angel connected to 'localhost:1186'
2012-02-15 20:32:43 [ndbd] INFO     -- Angel allocated nodeid: 4
2012-02-15 20:32:44 [ndbd] INFO     -- Angel connected to 'localhost:1186'
2012-02-15 20:32:44 [ndbd] INFO     -- Angel allocated nodeid: 5
2012-02-15 20:32:44 [ndbd] INFO     -- Angel connected to 'localhost:1186'
2012-02-15 20:32:44 [ndbd] INFO     -- Angel allocated nodeid: 6
[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_mgm -e 'all status'
Connected to Management Server at: localhost:1186
Node 3: started (mysql-5.5.19 ndb-7.2.4)
Node 4: started (mysql-5.5.19 ndb-7.2.4)
Node 5: started (mysql-5.5.19 ndb-7.2.4)
Node 6: started (mysql-5.5.19 ndb-7.2.4)

[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_restore -b 1 -r -n 3 -m ~/cluster-data/BACKUP/BACKUP-1/
Backup Id = 1
Nodeid = 3
backup path = /home/ndb/cluster-data/BACKUP/BACKUP-1/
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1.3.ctl'
File size 14088 bytes
Backup version in files: ndb-6.3.11 ndb version: mysql-5.5.19 ndb-7.2.4
Stop GCP of Backup: 27
Connected to ndb!!
Created hashmap: DEFAULT-HASHMAP-240-2
Successfully restored table `mysql/def/ndb_apply_status`
Successfully restored table event REPL$mysql/ndb_apply_status
Successfully restored table `test/def/c1`
Successfully restored table event REPL$test/c1
Successfully restored table `mysql/def/ndb_schema`
Successfully restored table event REPL$mysql/ndb_schema
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1-0.3.Data'
File size 27448 bytes
_____________________________________________________
Processing data in table: mysql/def/NDB$BLOB_7_3(8) fragment 0
_____________________________________________________
Processing data in table: mysql/def/ndb_index_stat_sample(5) fragment 0
_____________________________________________________
Processing data in table: sys/def/NDB$EVENTS_0(3) fragment 0
_____________________________________________________
Processing data in table: mysql/def/ndb_apply_status(9) fragment 0
_____________________________________________________
Processing data in table: mysql/def/ndb_index_stat_head(4) fragment 0
_____________________________________________________
Processing data in table: test/def/c1(10) fragment 0
_____________________________________________________
Processing data in table: sys/def/SYSTAB_0(2) fragment 0
_____________________________________________________
Processing data in table: mysql/def/ndb_schema(7) fragment 0
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1.3.log'
File size 52 bytes
Restored 56 tuples and 0 log entries

NDBT_ProgramExit: 0 - OK

[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_restore -b 1 -r -n 4 ~/cluster-data/BACKUP/BACKUP-1/
Backup Id = 1
Nodeid = 4
backup path = /home/ndb/cluster-data/BACKUP/BACKUP-1/
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1.4.ctl'
File size 14088 bytes
Backup version in files: ndb-6.3.11 ndb version: mysql-5.5.19 ndb-7.2.4
Stop GCP of Backup: 27
Connected to ndb!!
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1-0.4.Data'
File size 26688 bytes
_____________________________________________________
Processing data in table: mysql/def/NDB$BLOB_7_3(8) fragment 1
_____________________________________________________
Processing data in table: mysql/def/ndb_index_stat_sample(5) fragment 1
_____________________________________________________
Processing data in table: sys/def/NDB$EVENTS_0(3) fragment 1
_____________________________________________________
Processing data in table: mysql/def/ndb_apply_status(9) fragment 1
_____________________________________________________
Processing data in table: mysql/def/ndb_index_stat_head(4) fragment 1
_____________________________________________________
Processing data in table: test/def/c1(10) fragment 1
_____________________________________________________
Processing data in table: sys/def/SYSTAB_0(2) fragment 1
_____________________________________________________
Processing data in table: mysql/def/ndb_schema(7) fragment 1
Opening file '/home/ndb/cluster-data/BACKUP/BACKUP-1/BACKUP-1.4.log'
File size 52 bytes
Restored 44 tuples and 0 log entries

NDBT_ProgramExit: 0 - OK

[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_desc -d test c1 -pn
-- c1 --
Version: 1
Fragment type: 9
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: no
Number of attributes: 2
Number of primary keys: 1
Length of frm data: 204
Row Checksum: 1
Row GCI: 1
SingleUserMode: 0
ForceVarPart: 1
FragmentCount: 2
ExtraRowGciBits: 0
ExtraRowAuthorBits: 0
TableStatus: Retrieved
-- Attributes -- 
id Int NULL AT=FIXED ST=MEMORY
$PK Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY AUTO_INCR

-- Indexes -- 
PRIMARY KEY($PK) - UniqueHashIndex

-- Per partition info -- 
Partition       Row count       Commit count    Frag fixed memory       Frag varsized memory    Extent_space    Free extent_space       Nodes   
0               56              56              32768                   0                       0               0                       3,4
1               44              44              32768                   0                       0               0                       5,6


NDBT_ProgramExit: 0 - OK

Uh oh, this didn't turn out quite the same as the example from MySQL Cluster 6.3.45. There are still only 2 partitions after the restore, even though there are 4 data nodes. Take a look at the Nodes column on the right of "Per partition info" and you can see, in fact, that the 2 partitions are actually on separate node groups. That's sort of interesting. It means that writes are still going to be scaled across all node groups, which is great, but it means that reads will not be scaled. All reads will have to come from nodes 3 and 5, because those nodes are the primaries for their respective partitions.

So, why did this change happen? It's not something that anyone decided to do consciously, I think; instead, I think it's the side effect of the implementation of the new HashMap partitioning algorithm that was introduced and made default in MySQL Cluster 7.0. Frazer Clement provides an exceptional discussion of the HashMap algorithm at http://messagepassing.blogspot.com/2011/03/mysql-cluster-online-scaling.....

It appears that the HashMap is stored as part of the schema data for the table; when the table metadata is restored with ndb_restore -m, the same HashMap is used. MySQL Cluster distributes the partitions across all the node groups in the destination cluster, but it does not change the number of partitions. (As a result, if you had a 6-node cluster, one node group would not hold any partitions for this table; that would mean 3 node groups, but there are only 2 partitions.)

Now we see how ndb_restore works starting in MySQL Cluster 7.0 and we can see that the results are not very desirable. What, then, can be done to get your table distributed across all nodes and node groups so that each data node in the cluster is primary for one partition? There are a couple options.

Part of the reason HashMap was put into place was to make it easier to redistribute data in the cluster in order to support online add node functionality. When using online add node, you execute an ALTER TABLE ... REORGANIZE PARTITION statement after creating the new node group(s) and starting the new data nodes. We can do the same, here, to reorganize the partitions of our table across all nodes in the cluster:

mysql 5.5.19-ndb-7.2.4-gpl (root) [test]> select partition_name, table_rows from information_schema.partitions where table_schema='test' and table_name='c1';
+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p0             |         56 |
| p1             |         44 |
+----------------+------------+
2 rows in set (0.00 sec)

mysql 5.5.19-ndb-7.2.4-gpl (root) [test]> alter table c1 reorganize partition;
Query OK, 0 rows affected (7.46 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql 5.5.19-ndb-7.2.4-gpl (root) [test]> select partition_name, table_rows from information_schema.partitions where table_schema='test' and table_name='c1';
+----------------+------------+
| partition_name | table_rows |
+----------------+------------+
| p0             |         26 |
| p1             |         24 |
| p2             |         30 |
| p3             |         20 |
+----------------+------------+
4 rows in set (0.02 sec)

mysql 5.5.19-ndb-7.2.4-gpl (root) [test]> Bye
[ndb@ip-10-0-0-59 mysql-cluster-gpl-7.2.4-linux2.6-x86_64]$ ./bin/ndb_desc -d test c1 -pn
-- c1 --
Version: 16777217
Fragment type: 9
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: no
Number of attributes: 2
Number of primary keys: 1
Length of frm data: 204
Row Checksum: 1
Row GCI: 1
SingleUserMode: 0
ForceVarPart: 1
FragmentCount: 4
ExtraRowGciBits: 0
ExtraRowAuthorBits: 0
TableStatus: Retrieved
-- Attributes -- 
id Int NULL AT=FIXED ST=MEMORY
$PK Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY AUTO_INCR

-- Indexes -- 
PRIMARY KEY($PK) - UniqueHashIndex

-- Per partition info -- 
Partition       Row count       Commit count    Frag fixed memory       Frag varsized memory    Extent_space    Free extent_space       Nodes   
0               26              116             32768                   0                       0               0                       3,4
2               30              30              32768                   0                       0               0                       4,3
1               24              84              32768                   0                       0               0                       5,6
3               20              20              32768                   0                       0               0                       6,5


NDBT_ProgramExit: 0 - OK

That's a pretty easy way to re-partition a table across your data nodes. However, keep in mind that you'd need to do this for every table in the cluster. It's fairly easy to do that programatically by checking the number of partitions for a given table in information_schema.partitions and executing ALTER TABLE ... REORGANIZE PARTITON for any of them that have fewer partitions than the number of rows in ndbinfo.nodes. Still, though, I don't find that to be terribly appealing. There are also a couple big caveats for ALTER TABLE ... REORGANIZE PARTITION – it doesn't re-partition UNIQUE indexes or BLOBs. The first of those may not be such a big problem, because UNIQUE indexes (implemented in MySQL Cluster as a separate, hidden table) are not likely to be large in size to the point that scaling reads or spreading the data across additional node groups would be so important. BLOBs, on the other hand, (also implemented in MySQL Cluster as a separate, hidden table) can take up a lot of space, so having them relegated to only some nodes in the cluster might mean that those nodes would use considerably more DataMemory than other nodes.

Another solution, if ALTER TABLE ... REORGANIZE PARTITION doesn't strike your fancy, is to use mysqldump --no-data to backup and restore your schema instead of relying on ndb_restore -m. You'd still use ndb_restore to restore data, but you'd get the schema from mysqldump. When you execute the CREATE TABLE statements output by mysqldump, MySQL Cluster sees them as brand new tables and thus partitions them across all data nodes in the Cluster, as as would be the case for any new table created on the cluster.

Using mysqldump has the advantage of backing up triggers and stored routines, which you won't get if you use ndb_restore -m. If you are using those features, this is very important, of course; if you're not using them, there isn't a lot of practical value gained by using mysqldump. In fact, it means that you add an extra step for backup, and you add an extra step for restore. On top of that, you get no guarantee of consistency. Some DDL could be executed between the time that you run mysqldump and the time you start your NDB native backup. That means that there is no guarantee that the table structure in one part of your backup matches the structure of the data in the other part. That's a little bit scary, and it can only be worked around safely by essentially taking the cluster offline (single user mode) when executing a backup.

My hope is that the original (and still widely expected) behavior of ndb_restore will be ... restored. I've opened bug #64302 to track the issue. Let me know your thoughts here, and let the MySQL Cluster developers know your thoughts on the bug report.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Cluster by Ralf Gebhardt

Февраль 5th, 2012

Ralf Gebhardt of SkySQL.

Cluster: shared nothing architecture (no single point of failure), synchronous replication between nodes, ACID transactions, row level locking. In-memory storage (some data can be stored on disk, but indexes must be in-memory). Checkpointing to disk for durability. It supports two types of indexes – ordered T-trees, unique hash indexes. Online operations like adding node groups, software upgrades, table alterations. Quick standard architecture diagram displayed about MySQL Cluster.

Network partitioning protocol is designed to avoid a split brain scenario. Is there at least one node from each node group? If not then this part cannot continue – graceful shutdown. Are all nodes present from any node group? If so, then this is the only viable cluster – continue to operate. Ask the arbitrator – the arbitrator which parts will continue if no arbitrator is available the cluster shuts down.

Durability – in order for a node to recover fast some data is stored locally. The REDO log is synchronized by global checkpoints (GCP). The DataMemory is synchronized by local checkpoints (LCP).

I agree with Ralf — almost impossible to talk about NDB in 25 minutes. Its very deep, you’d need at least three hours to grasp it well.

Related posts:

  1. Gong-A-Thong at LugRadio Live USA
  2. Morning sessions at MySQL MiniConf
  3. MySQL at Google


PlanetMySQL Voting: Vote UP / Vote DOWN

Review of High Availability MySQL Cookbook by Packt Publishing

Июнь 7th, 2010

A few months ago, I reviewed MySQL Admin Cookbook. Today I am reviewing High Availability MySQL Cookbook from Packt Publishing by Alex Davies. Overall, I found the book to contain some good hidden Gems.

The book is a mixture of MySQL Cluster (NDB), Replication schemes, some performance tuning, some minor kernel tweaking, and some more exotic approaches to common High Availability problems. Overall, I found this book very informative and a good read.
Now the specifics, the book starts out on NDB and stays focused on this fact for about 60% of the book. The next 20% is on mySQL replication then about 10% of the book is on tweaking kernel, mysql, network settings to get the most out of the system. The last 10% is a mixture of uses of exotic systems such as GFS, Conga, ISCSI and how to use these shared storage techs with mySQL.

High Availability MySQL Cookbook, is a good read, and jogged my memory on NDB (since I do not use it on a day-to-day basis). If you are looking for a good reference on how to get an HA system up then this is a good book for you. If you are looking for why to use NDB over mySQL multi-master replication, this is not a book for you. The “why-to-use-this-over-that” is not the scope of the book. The drawbacks and concerns of what technology to use are not a focus of this book, and should not be. It is assumed that the reader knows which direction to go into for the most part or gives the reader enough information to set up and environment to see which way to go.

In conclusion, I like this book. I think it is rather concise, and right to the point, which either gets you started into building HA systems or gives you a good reference for an existing HA environment.

PlanetMySQL Voting: Vote UP / Vote DOWN

AlsoSQL

Апрель 8th, 2010

So there’s a bit of a swelling around the idea of NoSQL. That is, databases that don’t have an SQL interface in front of them – with the promise of better performance. With a well designed backend, this is no doubt the case.

A flexible query language is rather useful though. I think we’ll see the rise of AlsoSQL. That is systems that present a fast and simple protocol along with a SQL interface.

This hybrid system has seen use for many years. MySQL Cluster is one such example. SQL through MySQL Server, NoSQL through NDB API.

With Drizzle, I feel we’ll be in a pretty good position to offer non-sql based protocols and access methods to existing storage engines.


PlanetMySQL Voting: Vote UP / Vote DOWN

Continuing the journey

Март 11th, 2010

A couple of months ago (December 1st for those playing along at home) it marked five years to the day that I started at MySQL AB (now Sun, now Oracle). A good part of me is really surprised it was for that long and other parts surprised it wasn’t longer. Through MySQL and Sun, I met some pretty amazing people, worked with some really smart ones and formed really solid and awesome friendships. Of course, not everything was perfect (sometimes not even close), but we did have some fun.

Up until November 2008 (that’s 3 years and 11 months for those playing at home) I worked on MySQL Cluster. Still love the product and love how much better we’re making Drizzle so it’ll be the best SQL interface to NDB :)

The ideas behind Drizzle had been talked about for a while… and with my experience with internals of the MySQL server, I thought that some change and dramatic improvement was sorely needed.

Then, in 2008, Brian created a tree. I was soon sending in patches at nights, we announced to the whole world at OSCON and it captured a lot of attention.

Since November 2008 I’ve been working on Drizzle full time. It was absolutely awesome that I had the opportunity to spend all my days hacking on Drizzle – both directly with fantastic people and for fantastic people.

But… the Sun set… which was exciting and sad at the same time.

Never to fear! There were plenty of places wanting Drizzle hackers (and MySQL hackers). For me, it came down to this: “real artists ship”. While there were other places where I would no doubt be happy and work on something really cool, the only way I could end up working out where I should really be was: what is the best way to have Drizzle make a stable release that we’d see be suitable for deployment? So, Where Am I Now?

Rackspace.

Where I’ll again be spending all my time hacking Drizzle.


PlanetMySQL Voting: Vote UP / Vote DOWN

NDB$INFO with SQL hits beta

Февраль 3rd, 2010

Bernhard blogged over at http://ocklin.blogspot.com/2010/02/mysql-cluster-711-is-there.html that MySQL Cluster 7.1.1 Beta has been released. The big feature (from my point of view) is the SQL interface on top of NDB$INFO. This means there is now full infrastructure from the NDB data nodes right out to SQL in the MySQL Server for adding monitoring to any bit of the internals of the data nodes.


PlanetMySQL Voting: Vote UP / Vote DOWN

NDB$INFO with SQL hits beta

Февраль 3rd, 2010

Bernhard blogged over at http://ocklin.blogspot.com/2010/02/mysql-cluster-711-is-there.html that MySQL Cluster 7.1.1 Beta has been released. The big feature (from my point of view) is the SQL interface on top of NDB$INFO. This means there is now full infrastructure from the NDB data nodes right out to SQL in the MySQL Server for adding monitoring to any bit of the internals of the data nodes.


PlanetMySQL Voting: Vote UP / Vote DOWN

New Open Query training days in Australia

Октябрь 2nd, 2009

The favourite Open Query course modules as well as reworked and brand new ones, with November/December 2009 dates for Brisbane, Sydney, Canberra and Melbourne listed below. You can register for days/modules individually, to suit your time, budget and current needs. Your trainers are Sean, Ray and Arjen (see OQ people).

For the Canberra and Melbourne days which are DBA/HA, registrations for all of the modules in a series before 15 October will receive a copy of the “High Performance MySQL” book (normal bookstore price is AUD 105).

Canberra

Sydney

Brisbane

  • Thu 19 Nov: MySQL Query Performance Optimisation and Tuning
  • Fri 20 Nov: MySQL Server Performance Optimisation and Tuning

Melbourne


PlanetMySQL Voting: Vote UP / Vote DOWN