Archive for the ‘scripts’ Category

MySQL foreach()

Декабрь 2nd, 2011

A new routine is now available in common_schema, which makes for an easier execution syntax for some operations:

foreach(collection_to_iterate_over, queries_to_execute_per_iteration_step);

To illustrate what it can do, consider:

call foreach('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPACT');

call $('schema like shard_%', 'CREATE TABLE ${schema}.messages (id INT)');

call $('2000:2009', 'INSERT IGNORE INTO report (report_year) VALUES (${1})');

$() stands as a synonym to foreach(). I suspect it should look familiar to web programmers.

The idea for foreach() was introduced by Giuseppe Maxia during a correspondence. At first I was skeptic: this isn't jQuery; this is SQL. Why would I want to use foreach()?

Then Giuseppe provided some use cases, and as I started thinking about it, I found more and more cases where such a tool might considerably reduce scripting overhead and avoid requiring SQL-fu skills. In fact, I have been using it myself for the past few weeks

I provide examples which I hope will convince the reader as for the simplicity of using such syntax. Showing off the types of input foreach() accepts (query, table search, schema search, set of constants, single or double numbers range), and the types of queries it can execute (single, multiple, using placeholders).

I stress that this is not a replacement for common queries (i.e. does not come to replace your common SELECT, UPDATE, DELETE), but more for working out administrative tasks. Nevertheless, the last example in this post does provide with an interesting insight on possible "normal" use.

Use case: using values from query

Let's kill all queries running for over 20 seconds:

call foreach('SELECT id FROM INFORMATION_SCHEMA.PROCESSLIST WHERE time > 20', 'KILL QUERY ${1}');

The thing I like most about foreach() is that it's self explanatory. Nevertheless, I note:

  • The KILL command is executed for each process running for more than 20 seconds (I did round up corners, since I didn't check for sleeping processes, for simplicity).
  • I also use the ${1} placeholder: much like in awk, this will get the first column in the result set. In our case, it is the single column, id.
  • I chose to invoke a single query/command per iteration step.

Compare the above with another solution to the same problem, using eval():

call eval('SELECT CONCAT(\'KILL QUERY \',id) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE TIME > 20');

They both get the same thing done. But foreach() is just a bit more friendly to write (and read).

Let's move to a more complicated example.

Use case: using multiple values from a query, invoking multiple commands

Let's kill some queries, as above, but also write down a log entry so that we know what happened:

call foreach(
  'SELECT id, user FROM INFORMATION_SCHEMA.PROCESSLIST WHERE time > 20',
  'KILL QUERY ${1}; INSERT INTO my_log VALUES (\'have just killed query ${1}, executed by ${2}\');')
;

In the above, for each long running process, we:

  • Kill the process' query. id being the first column, is referenced by ${1}.
  • INSERT to my_log that said process has been killed. We note both id and user using placeholders ${1} and ${2}, respectively.

It's possible to invoke as many queries/commands per iteration step. It is possible to use placeholders ${1} through ${9}, as well as ${NR}, which works as in awk: it is a row-counter, 1-based.

This example can still be written with eval(), but in much uglier form. I can't just first KILL the processes, then log about them, since by the time I want to log, the queries will not be running; the commands must be coupled. This is naturally done with foreach().

Use case: iterating constant values, invoking DDL

The commands invoked by foreach() can take the form of DML (INSERT/UPDATE/...), DDL (CREATE/ALTER/...) or other (KILL/SET/...). The placeholders can be used anywhere within the text.

Take an installation where different schemata have the same exact table structure. We want to refactor a table on all schemata:

call $('{USA, UK, Japan, NZ}', 'ALTER TABLE db_region_${1}.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8');

The above translates to the following commands:

ALTER TABLE db_region_USA.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_UK.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_Japan.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_NZ.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;

In the above, we:

  • Provide a list of constant values. These can be strings, numbers, whatever. They are space delimited.
  • Use the ${1} place holder. We can also use ${NR}.

Use case: loop through number sequence

We wish to populate a table with values:

call foreach('1970:2038', 'INSERT INTO test.test_dates (dt) VALUES (DATE(\'${1}-01-01\'))');

The above results with:

mysql> SELECT dt FROM test_dates;
+------------+
| dt         |
+------------+
| 1970-01-01 |
| 1971-01-01 |
| 1972-01-01 |
| 1973-01-01 |
| 1974-01-01 |
...
| 2036-01-01 |
| 2037-01-01 |
| 2038-01-01 |
+------------+

With numbers range:

  • Integers are assumed
  • Range is indicated by low and high values, both inclusive
  • Negatives allowed (e.g. '-5:5', resulting with 11 steps)
  • Placeholders ${1} and ${NR} are allowed.

Use case: iterating through two dimensional numbers range:

We use 3 template tables; we create 15 schemata; in each we create 3 tables based on the template tables:

call foreach( '1:15,1:3',
  'CREATE DATABASE IF NOT EXISTS db_test_${1}; CREATE TABLE db_test_${1}.tbl_${2} LIKE db_template.my_table_${2};'
);

Notes:

  • Each of the number ranges has the same restrictions and properties as listed above (integers, inclusive, ascending)
  • We can now use ${1} and ${2} placeholders, noting the first and second numbers range, respectively.
  • We may also use ${NR}, which, in this case, will run 1 through 45 (15 times 3).
  • We use multiple queries per iteration step.

Use case: overcoming MySQL limitations

MySQL does not support ORDER BY & LIMIT in multi-table UPDATE and DELETE statements (as noted last year). So we cannot:

DELETE FROM t1 USING t1 JOIN t2 ON (...) JOIN t3 ON (..) WHERE x = 7 ORDER BY ts LIMIT 100;

However, we can:

call foreach(
  'SELECT t1.id FROM t1 JOIN t2 ON (...) JOIN t3 ON (..) WHERE x = 7 ORDER BY ts LIMIT 100',
  'DELETE FROM t1 WHERE id = ${1}'
);

Of course, it will do a lot of single row DELETEs. There are further MySQL limitations which complicate things if I want to overcome this. Perhaps at a later blog post.

Acknowledgements

I hit a weird bug which prevented me from releasing this earlier on. Actually it's a duplicate of this bug, which makes it 6 years old. Hurray.

To the rescue came Roland Bouman, who suggested an idea so crazy even I was skeptic: to parse and modify the original query so as to rename column names according to my scheme. And of course he made it happen, along with some additional very useful stuff. It's really a super-ultra-meta-meta-sql-fu magic he does there.

So, thanks, Roland, for joining the ride, and thanks, Giuseppe, for testing and helping out to shape this functionality. It's great fun working with other people on open-source -- a new experience for me.

Continued

In this post I've covered the general-purpose iterations. There are also more specific types of iterations with foreach(). Continued next.


PlanetMySQL Voting: Vote UP / Vote DOWN

Test-driven SQL development

Октябрь 20th, 2011

I'm having a lot of fun writing common_schema, an SQL project which includes views, tables and stored routines.

As the project grows (and it's taking some interesting directions, in my opinion) more dependencies are being introduced, and a change to one routine or view may affect many others. This is why I've turned the development on common_schema to be test driven.

Now, just how do you test drive an SQL project?

Well, much like the way you test any other project in your favorite programming language. If its functions you're testing, that's all too familiar: functions get some input and provide some output. Hmmm, they might be changing SQL data during that time. With procedures it's slightly more complex, since they do not directly return output but result sets.

Here's the testing scheme I use:

  • Tests are divided to families. For example, there is a family of tests for the eval() function.
  • Each test in a family is responsible for checking the simplest, most "atomic" issue. This means many small tests.
  • Each test can have a "pre-test" step, which prepares the ground (for example, create a table and populate it)
  • Likewise, a test can have a "post-test" step, which is typically just cleanup code (since the test is already complete by the time the post step is invoked).
  • Each test is an SQL file: a set of commands to be executed.
  • A test may have an "expected output" file.
  • If no explicit expected exists, the test is expected to return "1" (just as the most basic JUnit test assumes an "assert true")
  • A test family may also have pre- and post- steps.
  • Any failure in any step fails the entire process. Failures may include:
    • Failure to prepare the grounds for a test or family of tests
    • Failure in executing the test
    • Mismatch between test's output and expected result.
    • Failure in executing the post- step (may indicate yet invalid test result not intercepted by the test)

An example

The following image presents a single test family: the eval family, testing the eval() routine.

Test driven SQL development - sample

  • In this family, there are two tests.
  • In both tests, we have a pre-test step, and a test.
  • We have no post-test here.
  • Nor do we have an expected-output sample, which means the tests expect to return with "1".

Implementation

But how are tests conducted? Via mysql, of course. While tests are plain SQL text file, they are being executed against a running MySQL server using the mysql client. It is given the test files as input, and its output is directed to file as well.

This makes it very easy to code the test using a simple shell script. It takes a small measure of file iteration, process invocation, exit code check, and diff execution.

For example, to test eval()'s 01 test, we first execute mysql with 01/pre.sql as input. Assuming success, we execute mysql again, this time with 01/test.sql. We capture the output of this execution, and compare it with expected-output, or with "1" when no expected-output specified.

Tests pass, or no code!

Some 12 years ago, I worked with a less-known version system called aegis. The thing I remember most from aegis was that it had a good tests infrastructure. Long before "test-driven development" was coined, or was even commonly practiced, aegis supported tests right into your version control. "Right into", in the sense that you could not merge your code back to the baseline if it didn't pass all of the tests.

I work with SVN for common_schema, and I do not know of such an option in SVN. But I also use ant to build this project, and the dependency is clear there: ant dist, my target which creates the distribution files, is dependent on ant test, the target which works out the tests.

That is, you cannot generate the distribution files when tests fail.

Further notes

Since I'm now retroactively patching tests for already existing functionality, calling it test-driven development is an overstatement; nevertheless new tests are already proving invaluable when I keep changing and improving existing code. Suddenly dependent functionality no longer works as expected. What fun!

The code for the testing suite is actually much shorter than this blog post.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL HA Agent Mini HowTo

Август 18th, 2011

Why This Post


While testing Yoshinori Matsunobo's MHA agent I found that although the wiki has a very complete documentation, it was missing a some details. This article intends to close that gap and bring up some issues to keep in mind when you do your own installation. At the end of the article I added a Conclusions section, if you're not interested in the implementation details, but to read my take on the project, feel free to jump straight to the end from here.

My Test Case


Most of our production environments can be simplified to match the MHA's agent most simple use case: 1 master w/ 2 or more slaves and at least one more slave in an additional tier:

Master A --> Slave B

+-> Slave C --> Slave D

As noted in the documentation, in this case the MHA agent will be monitoring A, B & C only. I found that unless you have a dedicated manager node, a slave on the 3rd tier (Slave D above) is suitable for this role. All 4 servers were setup as VMs for my evaluation / tests. It makes it easier to simulate hard failure scenarios in a controlled environment. Once this is in place the fun begins.

1st Step: User Accounts


In all the examples in the documentation it uses root to login into MySQL and the OS. I prefer to create specific users for each application, so I created a specific MySQL user for the MHA agent and used the linux' mysql user (UID/GID = 27/27 in RedHat / CentOS).

MySQL Credentials

Reviewing the code, I was able to determine that the agent requires to run some privileged commands like: SET GLOBAL variable, CHANGE MASTER TO ..., FLUSH LOGS ..., SHOW SLAVE STATUS, etc. and creates internal working tables to be used during the master fail over. The easiest way to set it up was using:
GRANT ALL PRIVILEGES ON *.* TO mha_user@'ip address'  

IDENTIFIED BY password;
This should be repeated on all 4 servers using the IP addresses for all the potential manager nodes. Yes, it would be possible to use wildcards, but I consider restricting access from specific nodes a safer practice.

The MySQL replication user needs to be set up to connect from any other server in the cluster, since any of the slaves in the group could be promoted to be master, and have the rest of them connecting to it.

Linux User

As I mentioned before I use the default RedHat / CentOS definition for the mysql user. Keep in mind that if you installed from the official Oracle packages (ie: RPMs), they may not follow this criteria and could result in mismatching UID/GIDs between servers. The UIDs/GIDs for the mysql user and group have to be identical on all 4 servers. If this is not the case, you may use the following bash sequence/script as root to correct the situation:

#!/bin/bash 

# stop mysql
/etc/init.d/mysql stop

# Change ownership for all files / directories
find / -user mysql -exec chown -v 27 {} \;
find / -group mysql -exec chgrp -v 27 {} \;

# remove old user / group and rename the new ones
# might complain about not being able to delete group.
groupdel mysql
userdel mysql

# Add the new user / group
groupadd -g 27 mysql
useradd -c "MySQL User" -g 27 -u 27 -r -d /var/lib/mysql mysql

# restart MySQL
/etc/init.d/mysql start

Once the mysql user is properly setup, you'll have to create password-less shared keys and authorize them on all the servers. The easiest way to do it is to create it in one of them, copy the public key to the authorized_keys file under the /var/lib/mysql/.ssh directory and then copy the whole directory to the other servers.

I use the mysql user to run the scripts since for most distributions it can't be used to login directly and there is no need to worry about file permissions, which makes it a safe and convenient user.

2nd Step: Follow The Documentation to Install and Configure


Once all the users have been properly setup, this step is straight forward. Check the Installation and Configuration sections of the wiki for more details.

For the placement of the configuration files I deviated a little bit from documentation, but not much:

  1. Used a defaults file: /etc/masterha_default with access only for user mysql since it includes the MHA agent password:
    -rw------- 1 mysql mysql 145 Aug 11 16:36 masterha_default.cnf
  2. The application settings were placed under /etc/masterha.d/ this way they're easy to locate and won't clutter the /etc directory.
For simplicity, I didn't include any of the optional scripts and checks (ie: secondary check) in the configurate. You may want to check the documentation and source code of these scripts. Some of them are not even code complete (ie: master_ip_failover). Unless you are implementing some of the more complicated use cases, you won't even need them. If you do, you'll need to write your own following the examples provided with the source code.

Once you have everything in place, run the following checks as the mysql user (ie: sudo su - mysql):
  1. masterha_check_ssh: Using my configuration files the command line looks like:
    masterha_check_ssh --conf=/etc/masterha_default.cnf --conf=/etc/masterha.d/test.cnf
  2. masterha_check_repl: This test will determine whether the agent can identify all the servers in the group and the replication topology. The command line parameters are identical to the previous step.

Both should show and OK status at the end. All utilities have verbose output, so if something goes wrong it's easy to identify the issue and correct it.

3rd Step: Run the Manager Script


If everything is OK, on the MHA node (Server D in my tests) run the following command as user mysql (ie: sudo su - mysql):

masterha_manager --conf=/etc/masterha_default.cnf --conf=/etc/masterha.d/test.cnf

You have to keep in mind that should the master fail, the agent will fail over to one of the slaves and stop running. This way it'll avoid split brain situations. You will either have to build the intelligence in the application to connect to the right master when failing or use a virtual IP. In both cases you'll might need to use customized IP failover scripts. The documentation provides more details.

Read the section about running the script in the background to choose the method that best fits your practice.

You will have to configure the notification script to get notified of the master failure. The failed server will have to be removed from the configuration file before re-launching the manager script, otherwise it will fail to start.

You can restart the failed server and set it up as a slave connected to the new master and reincorporate it to the replication group using masterha_conf_host.

Conclusion


This tool solves a very specific (and painful) problem which is: make sure all the slaves are in sync, promote one of them and change the configuration of all remaining slaves to replicate off the new master and it does it fairly quickly. The tool is simple and reliable and requires very little overhead. It's easy to see it is production ready.

The log files are pretty verbose, which makes it really easy to follow in great detail all the actions the agent took when failing over to a slave.

I recommend to any potential users to start with a simple configuration and add the additional elements gradually until it fits your infrastructure needs.

Although the documentation is complete and detailed, it takes some time to navigate and to put all the pieces of the puzzle together.

I would like the agent to support master-master configurations. This way it would minimize the work to re-incorporate the failed server into the pool. Yoshinori, if you're reading this, know that I'll volunteer to test master-master if you decide to implement it.

PlanetMySQL Voting: Vote UP / Vote DOWN

Useful sed / awk liners for MySQL

Июль 6th, 2011

Listing some useful sed / awk liners to use with MySQL. I use these on occasion.

sed, awk & grep have many overlapping features. Some simple tasks can be performed by either. For example, stripping empty lines can be performed by either:

grep '.'
awk '/./'
sed '/./!d'
grep -v '^$'
awk '!/^$/'
sed '/^$/d'

It’s a matter of taste & convention which tool and variation to use. So for any script I suggest, there may be many variations, possibly cleaner, shorter; feel free to comment.

mysqldump

The output of mysqldump is in particular useful when one wishes to make transformation on data or metadata.

  • Convert MyISAM tables to InnoDB:
mysqldump | sed -e 's/^) ENGINE=MyISAM/) ENGINE=InnoDB/'

I’ve had several occasion when people said this type of conversion assumes no ‘ENGINE=MyISAM’ snippet exists within row data. This is not so. The ‘^) ENGINE=MyISAM/’ pattern strictly requires that this text is outside row data. No row data begins with a ‘)’. This is a safe conversion.

  • Convert InnoDB to InnoDB plugin, compressed tables:
mysqldump | sed -e 's/^) ENGINE=InnoDB/) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8/'
  • Slice out a specific database (assumes existence of the USE statement):
sed -n "/^USE \`employees\`/,/^USE \`/p"
  • Slice out a specific table:
sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p"
  • Combine the above two statements to slice a specific table from a specific database:
sed -n "/^USE \`employees\`/,/^USE \`/p" | sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p"

See also On restoring a single table from mysqldump.

my.cnf

Some my.cnf files are just a mess to read. Here’s some normalizing scripts:

  • Strip a my.cnf file from comments, remove blank lines, normalize spaces:
cat my.sandbox.cnf | sed '/^#/d' | sed '/^$/d' | sed -e 's/[ \t]\+//g'
  • Same, but only present [mysqld] section parameters:
cat my.sandbox.cnf | sed -n '/^\[mysqld\]/,/^\[/p' | sed '/^\[/d' | sed '/^#/d' | sed '/^$/d' | sed -e 's/[ \t]\+//g'
  • Only present [mysqld] section parameters, tab delimited (this is useful in exporting and comparing instance parameters):
cat my.sandbox.cnf | sed -n '/^\[mysqld\]/,/^\[/p' | sed '/^\[/d' | sed '/^#/d' | sed '/^$/d' | sed -e 's/[ \t]\+//g' | sed -e 's/=/\t/'
  • Multi-word parameters in my.cnf can be written with either hyphens or underscores. innodb_file_per_table is the same as innodb-file-per-table, as well as innodb_file-per_table. The following normalizes the parameter names to using underscores only, keeping from changing values (e.g. ‘mysql-bin’ parameter value should not change). It isn’t pretty!
cat my.sandbox.cnf | awk -F "=" 'NF < 2 {print} sub("=", "=~placeholder~=") {print}' | awk -F "=~placeholder~=" 'NF < 2 {gsub("-", "_", $0); print} NF==2 {gsub("-", "_", $1); print $1 "=" $2}'
grep “.”
awk ‘/./’
sed ‘/./!d’
grep -v ‘^$’
awk ‘!/^$/’
sed ‘/^$/d’

PlanetMySQL Voting: Vote UP / Vote DOWN

Fun with Bash: aliases make your live easier… share your favorites

Февраль 10th, 2011

I’ve always been a big fan of having a customized .bashrc file. The one I distribute to all of my servers has aliases for quick commands to save me time on the command line, functions that get work done when aliases are too simplistic, reporting for the server for each cli login, and of course a formatted and colored prompt (for terms that support colors). I also change certain aspects and commands based on the operating system since I’m not always on a redhat box or linux at all. Here’s my bashrc file – maybe you have some fun additions that you’d like to share. What saves you time on the command line?


PlanetMySQL Voting: Vote UP / Vote DOWN

Simple Python: a job queue with threading

Январь 21st, 2011

Every so often you need to use a queue to manage operations in an application. Python makes this very simple. Python also, as I’ve written about before, makes threading very easy to work with. So in this quick program I’ll describe via comments, how to make a simple queue where each job is processed by a thread. Integrating this code to read jobs from a mysql database would be trivial as well; simply replace the “jobs = [..." code with a database call to a row select query.

#!/usr/bin/env python
## DATE: 2011-01-20
## FILE: queue.py
## AUTHOR: Matt Reid
## WEBSITE: http://themattreid.com
from Queue import *
from threading import Thread, Lock

'''this function will process the items in the queue, in serial'''
def processor():
    if queue.empty() == True:
        print "the Queue is empty!"
        sys.exit(1)
    try:
        job = queue.get()
        print "I'm operating on job item: %s"%(job)
        queue.task_done()
    except:
        print "Failed to operate on job"

'''set variables'''
queue = Queue()
threads = 4

'''a list of job items. you would want this to be more advanced,
like reading from a file or database'''
jobs = [ "job1", "job2", "job3" ]

”’iterate over jobs and put each into the queue in sequence”’
for job in jobs:
     print “inserting job into the queue: %s”%(job)
     queue.put(job)

”’start some threads, each one will process one job from the queue”’
for i in range(threads):
     th = Thread(target=processor)
     th.setDaemon(True)
     th.start()

”’wait until all jobs are processed before quitting”’
queue.join()

PlanetMySQL Voting: Vote UP / Vote DOWN

Simple Python: a job queue with threading

Январь 21st, 2011

Every so often you need to use a queue to manage operations in an application. Python makes this very simple. Python also, as I’ve written about before, makes threading very easy to work with. So in this quick program I’ll describe via comments, how to make a simple queue where each job is processed by a thread. Integrating this code to read jobs from a mysql database would be trivial as well; simply replace the “jobs = [..." code with a database call to a row select query.

#!/usr/bin/env python
## DATE: 2011-01-20
## FILE: queue.py
## AUTHOR: Matt Reid
## WEBSITE: http://themattreid.com
from Queue import *
from threading import Thread, Lock

'''this function will process the items in the queue, in serial'''
def processor():
    if queue.empty() == True:
        print "the Queue is empty!"
        sys.exit(1)
    try:
        job = queue.get()
        print "I'm operating on job item: %s"%(job)
        queue.task_done()
    except:
        print "Failed to operate on job"

'''set variables'''
queue = Queue()
threads = 4

'''a list of job items. you would want this to be more advanced,
like reading from a file or database'''
jobs = [ "job1", "job2", "job3" ]

”’iterate over jobs and put each into the queue in sequence”’
for job in jobs:
     print “inserting job into the queue: %s”%(job)
     queue.put(job)

”’start some threads, each one will process one job from the queue”’
for i in range(threads):
     th = Thread(target=processor)
     th.setDaemon(True)
     th.start()

”’wait until all jobs are processed before quitting”’
queue.join()

PlanetMySQL Voting: Vote UP / Vote DOWN

openark-kit (rev. 170): new tools, new functionality

Декабрь 15th, 2010

I’m please to announce a new release of the openark kit. There’s a lot of new functionality inside; following is a brief overview.

The openark kit is a set of utilities for MySQL. They solve everyday maintenance tasks, which may be complicated or time consuming to work by hand.

It’s been a while since the last announced release. Most of my attention was on mycheckpoint, building new features, writing documentation etc. However my own use of openark kit has only increased in the past few months, and there’s new useful solutions to common problems that have been developed.

I’ve used and improved many tools over this time, but doing the final cut, along with proper documentation, took some time. Anyway, here are the highlights:

New tool: oak-hook-general-log

oak-hook-general-log hooks up a MySQL server and dumps the general log based on filtering rules, applying to query role or execution plan. It is possible to only dump connect/disconnect entries, queries which make a full table scan, or use temporary tables, or scan more than X number of rows, or…

I’ll write more on this tool shortly.

New tool: oak-prepare-shutdown

This tool makes for an orderly and faster shutdown by safely stopping replication, and flushing InnoDB pages to disk prior to shutting down (keeping server available for connections even while attempting to flush dirty pages to disk). A typical use case would be:

oak-prepare-shutdown --user=root --ask-pass --socket=/tmp/mysql.sock && /etc/init.d/mysql stop

New tool: oak-repeat query

oak-repeat-query repeats executing a given query until some condition holds. The condition can be:

  • Number of given iterations has been reached
  • Given time has elapsed
  • No rows have been affected by query

The tool comes in handy for cleanup jobs, warming up caches, etc.

New tool: oak-get-slave-lag

This simple tool just returns the number of seconds a slave is behind master. But it also returns with an appropriate exit code, based on a given threshold: 0 when lag is good, 1 (error exit code) when lag is too great or slave fails to replicate.

This tool has been used by 3rd party applications, such as a load balancer, to determine whether a slave should be accessed.

Updated tool: oak-chunk-update

This extremely useful utility breaks down very long queries into smaller chunks. These could be queries which should affect a huge amount of rows, or queries which cannot utilize an index.

Updates to the tool include limiting the range of rows the tool scans, by specifying start and stop position (either by providing constant values or by SELECT query). Also added is auto-termination when no rows are found to be affected. Last, it is possible to override INFORMATION_SCHEMA lookup by explicitly specifying chunking key.

This tool works great for your daily/weekly/monthly batch jobs; in creating DWH tables; populating new columns; purging old entries; clearing data based on non-indexed values; generating summary tables; and more.

Frozen tool: oak-apply-ri

I haven’t been using this tool for a while. The main work down by this tool can be done with oak-chunk-update. There are some additional safety checks oak-apply-ri provides; I’m thinking over if they justify the tool’s existence.

Frozen tool: oak-online-alter-table

With the appearance of Facebook’s Online Schema Change (OSC) tool, which derives from oak-online-alter-table, I’m not sure I will continue developing the tool. I intend to wait for general feedback on OSC before making a decision.

Documentation

Documentation is now part of openark kit‘s SVN repository.

Download

The openark kit project is currently hosted by Google Code. Downloads are available at the Google Code openark kit project page.

Downloads are available in the following packaging formats:

  • .deb package, to be installed on debian, ubuntu and otherwise debian based distributions.
  • .rpm package, architecture free (noarch), for RPM supporting Linux distributions such as RedHat, Fedora, CentOS etc.
  • .tar.gz using python’s distutils installer.
  • source, directly retrieved from SVN or from above python package.
  • Some distribution specific RPM packages, courtesy Lenz Grimmer.

Feedback

Your feedback is welcome! I may not always respond promptly; and I confess that some bugs were left open for more than I would have liked them to. I hope to make for good quality of code, and bug reporting is one major factor you can control.


PlanetMySQL Voting: Vote UP / Vote DOWN

mycheckpoint (rev 208): aggregation tables, enhanced charting, RPM distribution

Ноябрь 8th, 2010

Revision 208 of mycheckpoint, a MySQL monitoring solution, has been released. New and updated in this revision:

  • Aggregation tables: aggregated data makes for fast reports on previously slow queries.
  • Enhanced charting: interactive charts now present time stamps dynamically (see demo); “Zoom in” charts are available (see demo) on mycheckpoint‘s HTTP server.
  • RPM distribution: a “noarch” RPM mycheckpoint build is now available.
  • Initial work on formalizing test environment

mycheckpoint celebrates one year of existence!

Aggregation tables

I really wanted to avoid using these: everything was so more beautiful with one single dataset and dozens of supporting views (OK, the views themselves are hardly “beautiful”).

However it was impossible (for my level of expertise) to optimize query performance what with all those views on per-hour and per-day aggregation. The GROUP BYs and the JOINs did not make it possible for condition pushdown (i.e. using MERGE algorithm) where desired.

As result, mycheckpoint now manages aggregation tables: per-hour and per-day. The impact on sample taking is neglect able (making for two additional fast queries), but the impact on reading aggregated data is overwhelming. Generating a HTML full report could take a few minutes to complete. It now returns in no time. This makes charting more attractive, and allows for enhanced charting, such as zooming in on charts, as described following.

Aggregation tables will automatically be created and retroactively populated upon using revision 208. There’s nothing special to do; be advised that for one single execution of mycheckpoint, many INSERT queries are going to be executed. Shouldn’t take more than a couple minutes on commodity hardware and a few months of history.

It is possible to disable aggregation tables, or make for a complete rebuild of tables; by default, though, aggregation is ON.

Enhanced charting

Two enhancements here:

  1. The interactive line charts already know how to update legend data as mouse hovers over them. Now they also present accurate date & time. This provides with fully informative charts.
  2. As with other monitoring tools, it is possible to “zoom in” on a chart: zooming in will present any chart in “last 24 hours”, “last 10 days” and “complete history” views, magnified on screen. See demo here.

RPM distribution

No excuse for this being so late, I know. But RPM distribution is now available. Yeepee!

This is a noarch distribution, courtesy of Python’s distutils; you should be able to install the package on any RPM supporting platform. I have only tested in on CentOS; feedback is welcome.

Future plans

Work is going on. These are the non-scheduled future tasks I see:

  • Monitoring InnoDB Plugin & XtraDB status.
  • A proper man page.
  • Anything else that interests me & the users.

Try it out

Try out mycheckpoint. It’s a different kind of monitoring solution. Simple monitoring (charting) is immediate. For more interesting results you will need basic SQL skills, and in return you’ll get a lot of power under your hands.

mycheckpoint is released under the New BSD License.

Umm, I’ll repeat this last one: mycheckpoint is released under the New BSD License. Still, and will continue to be. Thanks for the good advice by Lenz, Domas and others.


PlanetMySQL Voting: Vote UP / Vote DOWN

A simple load test script in Python

Ноябрь 3rd, 2010

Lately I’ve had to do some environment load testing so I wrote this quick script. It can be modified as needed but the basic idea is that it spawns $x threads (–threads) and then sends two connections (or however many you want with –per-connection=) per thread to the URL (–url=). You can have it wait a configurable time between connections as well (–wait=).

The url is appended with a 32 character randomized string so that any database/caching on the backend of the site isn’t serving data from a warm cache. You can hunt down the string length for 32 and change it to whatever you want. Feel free to change and use as needed, just keep my info at top.

#!/usr/bin/python
################################################################################
## DATE: 2010-10-26
## AUTHOR: Matt Reid
## MAIL: mreid@kontrollsoft.com
## SITE: http://kontrollsoft.com
## LICENSE: BSD http://www.opensource.org/licenses/bsd-license.php
################################################################################

from __future__ import division
import threading
import sys
import urllib2
import select
import random
import string
import getopt
import time

class threader(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        global url
        global per
        global u
        for i in range(per):
            if wait > 0:
                time.sleep(wait)
            str = randstr(32)
            # IMPORTANT: this is where we append the search string to the main URL
            # you might need to change this for your site.
            url = "%s/search/%s"%(u,str)
            print "polling url: %s"%(url)
            urllib2.urlopen(url)

def randstr(length):
    global url
    twoletters = [c+d for c in string.letters for d in string.letters]
    r = random.random
    n = len(twoletters)
    l2 = length//2
    lst = [None] * l2
    for i in xrange(l2):
        lst[i] = twoletters[int(r() * n)]
        if length & 1:
            lst.append(random.choice(string.letters))

    return "".join(lst)

def init_thread():
    backgrounds = []
    for thread in range(threads):
        print "Spawning thread: %s"%(thread)
        background = threader()
        background.start()
        backgrounds.append(background)
    for background in backgrounds:
        background.join()

def print_help():
    print '''loader.py - URL load test script
==================================================
Date: 2010-08-26
Website: http://themattreid.com
Author: Matt Reid
Email: themattreid@gmail.com
License: new BSD license
==================================================
Use the following flags to change default behavior

   Option                 Description
   --url=                 URL to test
   --per-connection=      Number of sequential reqests per connection (default 2)
   --threads=             Number of threads for url connections (default 50)
   --wait=                Time to wait in-between requests
   --help                 Print this message

   -u                     Same as --url
   -p                     Same as --per-connection
   -t                     Same as --threads
   -w                     Same as --wait
   -h                     Same as --help
   '''

def main():
    init_thread()
    sys.exit(0)

if __name__ == "__main__":
    global threads #num threads/connections to open
    global u #url to hit
    global per #per connection url hits
    try:
        options, remainder = getopt.getopt(
            sys.argv[1:], 'ptuw', ['per-connection=',
                                   'threads=',
                                   'url=',
                                   'wait=',
                                   'help'])
    except getopt.GetoptError, err:
        print str(err)
        sys.exit(2)

    for opt, arg in options:
        if opt in ('--per-connection'):
            per = int(arg)
        elif opt in ('--threads'):
            threads = int(arg)
        elif opt in ('--url'):
            u = arg
        elif opt in ('--wait'):
            wait = int(arg)
        elif opt in ('--help'):
            print_help()
            sys.exit(2)

    try:
        threads
    except NameError:
        print "No thread quantity specified."
        print_help()
        sys.exit(2)
    try:
        per
    except NameError:
        per = 2
    try:
        u
    except NameError:
        print "No URL Specified"
        print_help()
        sys.exit(2)
    try:
        wait
    except NameError:
        wait=0

    main()

PlanetMySQL Voting: Vote UP / Vote DOWN