Archive for the ‘tools’ Category

Chain Copying to Multiple hosts

Май 18th, 2012

This week I was given the task of repopulating our entire primary database cluster.  This was due to an alter that had to be performed on our largest table.  It was easiest to run it on one host and populate the dataset from that host everywhere.

I recalled a while back reading a blog post from Tumblr about how to chain a copy to multiple hosts using a combination of nc, tar, and pigz.  I used this, with a few other things to greatly speed up our repopulation process.  As I was repopulating production servers, I did a combination of raw data copy and xtrabackup streams across our servers, depending on the position in our replication setup.

For a normal straight copy, here’s what I did:

On the last host, configure netcat to listen and then pipe the output through pigz and tar to uncompress and untar.  This needs to be run in the destination directory:

nc -l 1337 | pigz -d | tar -xvf -

On any hosts in the middle of the chain, you do the same thing with one extra step.  Using a fifo to redirect the stream to the next host:

mkfifo copy_redirect
nc next_host_in_chain 1337 <copy_redirect &
nc -l 1337 | tee copy_redirect | pigz -d | tar -xvf -

And on the source host you actually make the stream.  This is where I differed the most from what Tumblr had written.  I added a progress bar using pv.

tar -c /data/mysql/ | pv --size $( du -sh /data/mysql/ | cut -f1 ) | pigz | nc first_host_in_chain 1337

To do this with an xtrabackup stream, the commands are similar.  On each host, tar needs to add the “i” flag (to become “tar -xvfi -”).  The progress bar here became slightly less accurate, but was still a good rough estimate of the progress.  On the source host, the command became:

innobackupex --stream=tar /tmp/ --slave-info  | pv --size $( du -sh /data/mysql/ | cut -f1 ) | pigz | nc first_host_in_chain 1337

I found that using this method, for a raw copy, I was able to achieve between 300 and 350 MB/sec copying large tables.  Smaller tables averaged slower speeds.  I didn’t do enough testing here to see where the bottleneck was.  I can say that it was not network, cpu, or io.  Our servers involved have 10 GBit network and FusionIO drives.  Increasing the compression level may have helped add some throughput here as well.  Copying a 1.4 TB Dataset to 3 destination servers took under 2 hours.

This is definitely a tool that I will be adding to my arsenal to use on a regular basis.



PlanetMySQL Voting: Vote UP / Vote DOWN

Taxonomy of database tools

Май 12th, 2012

Taxonomy of Database Tools

In the MySQL ecosphere there is an ecosystem of tools.  Like real-world ecosystems, the “creatures” in the MySQL tools ecosystem can be classified and organized by a taxonomy.  There are already multiple taxonomies of software bugs (e.g. A Taxonomy of Bugs), but as far as I know this is the first Taxonomy of Database Tools.  A taxonomy of database tools serves useful purposes, as discussed in the previously linked page.  For me, the most useful purpose is the high-level ecosystem view which I use to compare MySQL tools to Drizzle tools.  In so doing, one sees clearly how the MySQL tools ecosystem is thriving whereas the Drizzle tools ecosystem is just budding, so to speak.  For other people, I imagine two overarching interests in a taxonomy of database tools.

First, by laying out the ecosphere in a simple, organized, and comprehensible fashion, a taxonomy of database tools can permit a user (DBA, sysadmin, etc.) to see how well they are “tooled”.  For example, when I gave a presentation on pt-table-checksum at PLMCE 2012, I was surprised to learn how many people never used a tool to verify replication data integrity.  I did not bother to ask why, but I suspect it is because they were not aware that such tools existed.  By looking at this taxonomy of database tools, some users might discover a new type of tool of which there are already many examples.

Second, a taxonomy of database tools is interesting for developers because it reveals where a database server has missing capabilities that users compensate for with tools.  Point in case: pt-table-checksum is used to verifying replication data integrity because until MySQL 5.6 this capability did not exist in the database server.  It is debatable whether all types of tools could be implemented natively in a database server; in theory, they probably could.  This debate becomes a practical concern for modularly-designed database servers like Drizzle because in my humble opinion it is far easier to write plugins and thus tools-as-plugins for Drizzle than for MySQL.

This Taxonomy of Database Tools is still a work in progress.  A lot of the descriptions need to be expanded, traits refined, and more examples added.  If you do not agree with its organization, you can suggest a change, or develop your own taxonomy.  In any case, I will continue to refine this Taxonomy of Database Tools to see where it leads and what it reveals.


PlanetMySQL Voting: Vote UP / Vote DOWN

Making MySQL comfortable for Oracle DBAs

Март 7th, 2012

I’m at Hotsos Symposium this week, and it suddenly occurred to me that a lot of Oracle DBAs who are beginning to manage MySQL servers might have some things to share with others in a similar role shift:

  • Familiar, comfortable tools and techniques, or capabilities of the Oracle Database, that you miss in MySQL
  • Equivalents or replacements for the aforementioned

Do you have anything to share with your fellow DBAs going through ODT (Oracle Delerium Tremens)? Do you have any wishes that you haven’t satisfied yet? Post in the comments and let’s see if we can create a sort of forum for sharing and/or a wishlist in case someone gets an urge to fill in a missing piece!

Further Reading:


PlanetMySQL Voting: Vote UP / Vote DOWN

Building libMemcached RPMs

Декабрь 2nd, 2011

A client running CentOS 5.4 Amazon EC2 instances needed the latest libMemcached version installed. With the inclusion of the "make rpm" target, libMemcached makes it easy to build the libMemcached RPMs by doing the following:

Spin up a new CentOS Amazon EC2 instance,

As root on the new instance:

yum install @development-tools
yum install fedora-packager
/usr/sbin/useradd makerpm

Now change to the makerpm user and build the RPMs:

su - makerpm
rpmdev-setuptree
tree
wget http://launchpad.net/libmemcached/1.0/1.0.2/+download/libmemcached-1.0.2.tar.gz
tar -zxf libmemcached-1.0.2.tar.gz
./configure && make rpm
find . -name '*rpm*'

References:
http://libmemcached.org
http://fedoraproject.org/wiki/How_to_create_an_RPM_package
https://launchpad.net/libmemcached/+download


PlanetMySQL Voting: Vote UP / Vote DOWN

Monitoring your monitoring tools (MONyog inside) !

Декабрь 2nd, 2011

Regardless of the monitoring tool you use to monitor your databases, it can be better to monitor this tool.
No, it’s not a joke ! Do you think you can have a benefit with a monitoring tool not connected to your servers ? ( without being alerted )

I choose to talk about MONyog here but this can apply to all existing monitoring tools.
I just want to share the message, the tool does not matter, so, do it !

So, let me explain how to control if you have fresh data with MONyog.
With MONyog it’s easy because it’s an agentless monitoring tool.

There are two ways to check that :

Per server general info :


 

 

 

 

 

 

For each server, you can find the last data updated date in the Monitors/Advisors tab.
The MySQL Availability in the dashboard is also usefull.

But it’s a per server information, so, that can be boring if you have a lot of databases servers.

General log file :

 

The default location for this file is MONyog_PATH/MONyog.log on your MONyog server.
You can find usefull informations there about MySQL, SFTP or SSH connexion failures for all your databases servers, in one place.

This log can also be used to detect client side errors if you don’t have access to the application servers.

Hope that can help, tell us what is the best method to do that with others monitoring tools like MySQL Enterprise Monitor.

Have a nice week-end.


PlanetMySQL Voting: Vote UP / Vote DOWN

Using HAProxy for MySQL failovers

Декабрь 2nd, 2011

There are a number of solutions available for MySQL architecture that provide automatic failover, but the vast majority involve potentially significant and complex changes to existing configurations. Fortunately, HAProxy can be leveraged for this purpose with a minimum of impact to the current system.  Alex Williams posted this clever solution a couple years ago in his blog.  Here we take a closer look at the details of implementing it into an already existing system.

HAProxy is a freeware load balancer, proxy, and high availability application for the TCP and HTTP protocols.  Since it was built mostly to handle web traffic, it has robust rule writing using HTTP components to check on the health of systems.  The key to Alex's solution was creating xinetd daemons on the database servers that send out HTTP messages.  The HTTP check to determine database statuses works like this:

  1. The HAProxy server sends HTTP requests to the database servers at configured intervals to specified ports
  2. The /etc/services file on the database servers maps those ports to services in their /etc/xinetd.d directory
  3. The services can call any specified script, so we build scripts that connect to the databases and checks for whatever conditions we choose.

The services then return an OK or a Service Unavailable response per the conditions.  Code for these scripts is in included in the Alex's article.

Our database configuration for this implementation is two pairs of Master-Slave databases in an Active-Passive relationship with Master-Master replication between the sets.  Siloing the passive Master-Slave provides a hot spare as well as continuous up-time during deployments provided we have a means of swapping the active/passive roles of each pair.  To accomplish the latter, we built two HAProxy configuration files, haproxy_pair1.cfg and haproxy_pair2.cfg, the only difference between the two being which Master-Slave pair is active.  Having the two files indicate which pair is active also allows immediate visibility of the current configuration.

Just as in Alex's sample configuration, our web application uses two DNS entries, appmaster and appslave to call writes and reads respectively.  The IPs for these addresses are then attached to our HAProxy server, allowing HAProxy to bind to them when it starts and then route them to the appropriate database server.

On our HAProxy server we then customized the /etc/init.d/haproxy script to handle an additional parameter of "flipactive" which provides us the capability to swap the database pairs:

flipactive() {
        # first detect which cfg file haproxy is using
        ps_out=`ps -ef |grep haproxy|grep cfg`
        # if pair2 then use pair1
        if [[ "$ps_out" =~ pair2 ]]  then
             # the -sf does a friendly reread of the config file
             /usr/local/sbin/haproxy -f /etc/haproxy/haproxy_pair1.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
        # if pair1 then use pair2
        else
             /usr/local/sbin/haproxy -f /etc/haproxy/haproxy_pair2.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
        fi
}

The rest of the configuration details are covered pretty well in Alex's article as well as in the HAProxy documentation.


PlanetMySQL Voting: Vote UP / Vote DOWN

Rails and Database Session Variables

Ноябрь 22nd, 2011

Ruby's ActiveRecord provides a very robust means of manipulating database table structures especially in the context of automated deployments such as Capistrano.

When there is a master-slave database configuration, it may not always be prudent to perform DDL statements on the master and let those propagate through to the slave, especially in high-volume sites where Rails deployments may involve multiple migrations since replication lag may occur and present some significant problems.  In addition, when no downtime is specified, a DDL rollout may break the current application code, so a targeted deployment may be more prudent.

In MySQL, the solution would be to set the session variable SQL_LOG_BIN = 0 which causes subsequent statements not to get written to the binary log and therefore won't get replicated to slaves.  Unfortunately, the given connection options of ActiveRecord do not accommodate the setting of database options.  However, one way to accomplish setting this variable would be to have developers explicitly set it in the migration files:

class AddAccounts < ActiveRecord::Migration

   def self.up

     execute("set SESSION sql_log_bin=0")

     create_table :api_accounts do |t|

       t.column "user_id", :int

       t.column "name", :text

       t.column "account",:text

       t.column "created_at", :datetime

       t.column "updated_at", :datetime

     end

   end

   def self.down

    execute("set SESSION sql_log_bin=0")

    drop_table :api_accounts

  end

end

But since this would be a hassle for developers and is easily overlooked, we can leverage Capistrano's architecture to monkey patch the ActiveRecord::Migration class so that this variable is set whenever the migrations are invoked.  So we constructed a file, config/initializers/active_record.rb:

#opens the ActiveRecord::Migration class
#use alias_method to add functionality to the ActiveRecord::Migration.migrate method
class ActiveRecord::Migration
  class << self
     alias_method :migrate_conn, :migrate
     def migrate(direction)
        ActiveRecord::Base.connection_pool.with_connection do |conn|
        @connection = conn
        connection.execute("SET SQL_LOG_BIN=0")
        @connection = nil
        end
        migrate_conn(direction)
     end
  end
end
Note that the database user that invokes the migrations needs to have the SUPER privilege granted to it in order to be able to set this variable.

We successfully used this technique to build rolling DDL scripting to multiple databases using Capistrano.  It allowed us to have explicit control over which database was being updated, thereby giving us the means necessary to update one database while others served the current application code.  


PlanetMySQL Voting: Vote UP / Vote DOWN

Cool Percona Tools

Ноябрь 11th, 2011

Part of my job at Percona is to perform customer audits.  I’ve had fun the last week getting up to speed on some of the great tools in the Percona Toolkit.  I’m sure people have talked about these before, but to be honest, I always found it hard to take the time to learn new tools, and at my previous job there were often a lot of tools in-house that made it easier to avoid learning something new.  The great thing about the Percona tools is that they aren’t environment dependent (as much as is possible), they typically only require Perl to run.

So, with that being said, I wanted to just go through a few of the tools I learned about this last week that I found useful.

pt-collect:

The collect script collects information.  A lot of information.  I ran it for 10 minutes on a customer’s server and got ~30MB of log data to go through.  Most useful were iostat samples (for 10 mins straight), diskstats samples (more on that later), vmstat samples.  It also (if you give it the right options) can collect mysql data right along side the system stuff.  

pt-diskstats:

This is basically a better iostat.  I like the response time numbers separately for reads and writes, and it generally seems more accurate than iostat.

pt-summary:

Collects a whole bunch of system data.  Don’t forget to run this as root!  CPUs, check.  RAM, check.  Disk drives, check.  RAID controller, check (if you have the RAID cli installed, it at least works with megacli).  IO scheduler, check.  This pretty much covers everything you would normally want to collect for system information

pt-mysql-summary:

Like pt-summary, stuff you probably want to know about MySQL.  Interesting system/status variables, the works.

pt-mext:

Takes a command that outputs a bunch of values in a specific format (most useful with ‘mysqladmin ext’) and samples it over time, outputting the changes (makes the most sense with counter status variables).  This tool is similar to myq_gadgets, except that it displays from left to right over time, while myq-gadgets displays top to bottom.  This tool is also more generic, anything that spews stats (in the right format) can be used, while myq_gadgets has custom reports written around groups of MySQL status variables.

pt-online-schema-change:

I haven’t used this yet, but it looks really cool.  Basically you feed it the table you want to alter and how you want to alter it.  It then (in roughly this order):

  1. Clones the table schema from t1 into a new table t2
  2. Does the alter on t2
  3. Sets up triggers on t1 so any new data going into t1 gets added into t2
  4. Starts a slow crawl through t1 and copies data to t2
  5. Once t2 == t1, does a RENAME TABLE to switch the two
  6. Delete the old t1

pt-query-digest:

This is probably the best tool of them all.  It can take general query log, slow query log, or even tcpdump data of packets going to and from the mysql server and generate a query digest about all queries, sorted by those that take the most time.  This is extremely useful to have complete look at every query hitting a mysql server.

There’s a bunch more tools, I know.  These are the ones that I either used or read up on in the last week.  Looking forward to learning some more.


PlanetMySQL Voting: Vote UP / Vote DOWN

How to recreate an InnoDB table after the tablespace has been removed

Ноябрь 2nd, 2011

Does your error log ever get flooded with errors like this one?

 

[ERROR] MySQL is trying to open a table handle but the .ibd file for
table my_schema/my_logging_table doesnot exist.
Have you deleted the .ibd file from thedatabase directory under
the MySQL datadir, or have you used DISCARD TABLESPACE?
See http://dev.mysql.com/doc/refman/5.0/en/innodb-troubleshooting.html how you can resolve the problem.

 

No? That is great!

 

We had a case where, in order to quickly solve a disk space issue, a SysAdmin decided to remove the biggest file in the filesystem, and of course this was an InnoDB table used for logging.

That is, he ran:

shell> rm /var/lib/mysql/my_schema/my_logging_table.ibd

He could have run TRUNCATE TABLE, but that's another story.

 

The results were not ideal:

  1. The table did not exist anymore.

  2. Errors in the application while trying to write to the table.

  3. MySQL flooding the error log.

 

The solution for this problem is to:

  • run DISCARD TABLESPACE ( InnoDB will remove insert buffer entries for that tablespace);

  • run DROP TABLE ( InnoDB will complaint that the .ibd file doesn't exist, but it will remove it from the internal data dictionary );

  • recover the CREATE TABLE statement from the latest backup ( you have backups, right? );

  • issue the CREATE TABLE statement to recreate the table.

 

Example:

mysql> ALTER TABLE my_logging_table DISCARD TABLESPACE;

Query OK, 0 rows affected (0.05 sec)

 

In the error log you will see something like:

InnoDB: Error: cannot delete tablespace 251
InnoDB: because it is not found in the tablespace memory cache.
InnoDB: Warning: cannot delete tablespace 251 in DISCARD TABLESPACE.
InnoDB: But let us remove the insert buffer entries for this tablespace.

 

mysql> DROP TABLE my_logging_table;

Query OK, 0 rows affected (0.16 sec)

 

In the error log you will see something like:

InnoDB: Error: table 'my_schema/my_logging_table'
InnoDB: in InnoDB data dictionary has tablespace id 251,
InnoDB: but tablespace with that id or name does not exist. Have
InnoDB: you deleted or moved .ibd files?
InnoDB: This may also be a table created with CREATE TEMPORARY TABLE
InnoDB: whose .ibd and .frm files MySQL automatically removed, but the
InnoDB: table still exists in the InnoDB internal data dictionary.
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/innodb-troubleshooting.html
InnoDB: for how to resolve the issue.
InnoDB: We removed now the InnoDB internal data dictionary entry
InnoDB: of table `my_schema/my_logging_table`.

 

And finally:

mysql> CREATE TABLE `my_logging_table` (
(... omitted ...)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Query OK, 0 rows affected (0.05 sec)

 

Of course, the final step - is a stern talking to with the SysAdmin.


PlanetMySQL Voting: Vote UP / Vote DOWN

Getting a List of Users From the MySQL General Log

Октябрь 30th, 2011

From time to time, organizations want to know if there are any users that are not used.  For clients using MySQL 5.1 and up, that can handle a 1% overhead for general logging, we will enable the general log for a period of time and analyze the users that connect.

Note: we have some extremely busy clients, and we very rarely have a problem turning the general log on, other than making sure we're rotating and compressing logs so we do not run out of disk space.

Once we have the logs, I run this little perl tool I made -- I call it genlog_users.pl:

#!/usr/bin/perl

 

my $infile=$ARGV[0];

my %seen=();

my @uniq=();

 

open (INPUT, "<$infile");

while (<INPUT>) {

  my $line=$_;

  if ($line=~/Connect/) {

    if ($line=~/(\S*@\S*)/) { push(@uniq, $1) unless $seen{$1}++; }

 } # end if line matches Connect

}

close INPUT;

open (OUTPUT, ">>..users.txt");

$,="\n";

print OUTPUT (keys %seen);

print OUTPUT ("\n");

close OUTPUT;

 

----------

I hope it is useful for whoever stumbles on this; I know it has been useful for me in the past -- it's just doing some string matching, and I bet if I used Python it would be done in half the lines, but it's already fewer than 20 lines, so it's pretty small to begin with.


PlanetMySQL Voting: Vote UP / Vote DOWN