I was trolling the MySQL forums web site and noticed a post regarding someone who was trying to load a 50-gigabyte MySQL dump file. The author of the post had stated that the loading of the file had been running for days, and was wondering if there was a better way to import the file. They did not have access to anything else (i.e. – original database) but this file.
I have had to restore several databases in the past from a single large MySQL dump file – which led me to start backing up each database individually. These databases are for my own personal use and are not updated that often, so I don’t need to have point-in-time recovery – and so a MySQL dump works just fine. If I had a production system, I would invest in the MySQL Enterprise Backup and the MySQL Enterprise Monitor applications.
But, I have had problems when I tried to restore a relatively small MySQL instance of 21 databases with a total file size of less than 900 megabytes. So, I wrote a Perl script to simply split the large MySQL dump file into smaller files, which I could then load one at a time to isolate the problem. I am sure that there are many other ways of troubleshooting this person’s problem, but I thought I would dig out this script and post it here. This script is also handy for splitting any text file that you have into smaller files. All you would have to do is to change the keywords that causes a new file to be created.
The script opens the MySQL dump file, and starts outputting each line to a new file, looking for the keywords “CREATE DATABASE”. When the script sees those keywords, it closes the last file and opens a new one, and then prints that line with the keywords out to the new file until the keyword appears again. The script names each new file with a sequential number – as in database_01.sql, database_02.sql, etc. The first file that this script creates (named database_00.sql) contains the header information from the MySQL dump file, so you may ignore this file.
I tested this script on an 850-megabyte MySQL dump file, and it worked perfectly (and executed in less than one minute). To make sure that your new files contain the same amount of data as the original file, you can use the Unix command “wc” to show you the size of each file, and then you can compare the sum of these sizes to the size of your MySQL dump file. I first ran the script, and it produced the following 22 files:
sh-3.2# /usr/bin/perl split_file.pl
2011-09-13 21:24:23 - Creating file database_00.sql
2011-09-13 21:24:23 - Creating file database_01.sql
2011-09-13 21:24:23 - Creating file database_02.sql
2011-09-13 21:24:23 - Creating file database_03.sql
2011-09-13 21:24:23 - Creating file database_04.sql
2011-09-13 21:24:23 - Creating file database_05.sql
2011-09-13 21:24:23 - Creating file database_06.sql
2011-09-13 21:24:23 - Creating file database_07.sql
2011-09-13 21:24:23 - Creating file database_08.sql
2011-09-13 21:24:23 - Creating file database_09.sql
2011-09-13 21:24:23 - Creating file database_10.sql
2011-09-13 21:24:23 - Creating file database_11.sql
2011-09-13 21:24:23 - Creating file database_12.sql
2011-09-13 21:24:23 - Creating file database_13.sql
2011-09-13 21:24:23 - Creating file database_14.sql
2011-09-13 21:24:24 - Creating file database_15.sql
2011-09-13 21:24:24 - Creating file database_16.sql
2011-09-13 21:24:24 - Creating file database_17.sql
2011-09-13 21:24:24 - Creating file database_18.sql
2011-09-13 21:24:24 - Creating file database_19.sql
2011-09-13 21:24:24 - Creating file database_20.sql
2011-09-13 21:25:14 - Creating file database_21.sql
Here are the sizes of the individual database*.sql files, as well as the original source file named all_data.sql:
sh-3.2# wc database*sql
21 71 844 database_00.sql
33 109 868 database_01.sql
395 18313 1134377 database_02.sql
578 101269 5280603 database_03.sql
553 14095 366305 database_04.sql
553 10050 263748 database_05.sql
8 21 139 database_06.sql
70 2783 37976 database_07.sql
547 11055 410751 database_08.sql
320 10888 426911 database_09.sql
74 337 6727 database_10.sql
282 12624 325737 database_11.sql
255 1863 42550 database_12.sql
44 152 1337 database_13.sql
57 82921 13048515 database_14.sql
737 73955 908495 database_15.sql
44 258 2972 database_16.sql
191 535786 8777550 database_17.sql
129 2325 36632 database_18.sql
8 21 139 database_19.sql
1225 28094331 826795680 database_20.sql
312 134027 5401957 database_21.sql
6436 29107254 863270813 total
sh-3.2# ls -l all_data.sql
-rw-rw-rw- 1 tonydarnell admin 863270813 Sep 13 20:23 all_data.sql
As you can see, the total size of the database*.sql files is 863270813, and the size of our original MySQL dump file (all_data.sql) is the same size – 863270813.
After splitting the main MySQL dump file, you can now restore/import each file individually, and if you have any problems, they will be easier to troubleshoot.
Here is the script:
#!/usr/bin/perl -w
# Need to explicitly load the functions in POSIX
use POSIX 'strftime';
# file name of the file you want to split
$filename = "all_data.sql";
# Note that if you pass no argument to localtime, it assumes the current time
$time = strftime '%Y-%m-%d %H:%M:%S', localtime;
$count = 0;
$filecount = 0;
# syntax for new file names
# if you have more than 100 files, you could modify this statement to place a leading
# zero if the file is greater than 10 but less than 100
if ($filecount < 10) {
$new_filename = "database_0" . $filecount . ".sql";
}
else
{
$new_filename = "database_" . $filecount . ".sql";
}
print "$time - Creating file $new_filename\n";
open(OUTPUT, ">$new_filename") || die "Can't redirect stdout";
open(LINE ,"$filename") || die ("\nERROR - could not open file: $filename\n");
while (<LINE>)
{
if ($_ =~ "^CREATE DATABASE" && $count > 1)
{
# close the old file
close(OUTPUT);
$filecount++;
# syntax for new file names
if ($filecount < 10) {
$new_filename = "database_0" . $filecount . ".sql";
}
else
{
$new_filename = "database_" . $filecount . ".sql";
}
$time = strftime '%Y-%m-%d %H:%M:%S', localtime;
print "$time - Creating file $new_filename\n";
open(OUTPUT, ">$new_filename") || die "Can't redirect stdout";
print OUTPUT $_;
}
else
{
print OUTPUT $_;
}
$count++;
# end while
}
close (LINE);
close (OUTPUT);
—————————————–
|
Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn. |
PlanetMySQL Voting: Vote UP / Vote DOWN

