Archive for the ‘scripts’ Category

MySQL/QueryScript use case: DELETE all but top N records per group

Февраль 9th, 2012

Some administrative tasks can be simplified by using common_schema/QueryScript. I'm collecting a bunch of these for documentation. Here's one for example:

The DBA/developer has the task of retaining only top 3 most populated countries per continent. That is, she has to DELETE 4th, 5th, 6th, ... most populated counties in each continent.

Is it possible to work out with a single query? Yes. But the query is not pretty. In fact, it is quite complicated, and either involves unintuitive subqueries, or unintuitive hacks. A normal DBA would not want to write, neither maintain this kind of query, unless top-notch-geek, which is fine.

Since this is a one time job, we just need to get it done. And common_schema/QueryScript provide with the intuitive solution: if we read our demand aloud, we realize we want to delete 4th, 5th, 6th, ... populated countries for each continent.

I present a solution made available by QueryScript, and discuss the ways in which the code overcomes limitations, or simplifies complexity:

var $num_countries_to_delete;
foreach($continent, $num_countries: SELECT continent, COUNT(*) FROM world.Country GROUP BY continent)
{
  if ($num_countries > 3)
  {
    set $num_countries_to_delete := $num_countries - 3;
    DELETE FROM world.Country WHERE Continent = $continent ORDER BY Population ASC LIMIT :$num_countries_to_delete;
  }
}

Discussion

The first thing that should be apparent from the above is that this is a programmatic solution. Queries are declarative, which is why complex ones sometimes look incomprehensible. The above is more straightforward.

The next thing to realize, which is a disclosure issue of some sorts, is that the above code is fine for a one time, or maintenance execution; but you wouldn't want to be normally issuing this type of code against your database 10,000 times a second.

Now let's break down the code to fragments:

Discussion: variables

The $num_countries_to_delete is a script variable. It is local. It is reset to NULL upon declaration and destroyed when its visibility ends. But the real power comes later, when it is expanded. This is discussed last.

Discussion: iteration

How would you iterate the continents using a stored routine? I personally think the syntax for server side cursors is overwhelmingly verbose. Declare a cursor, declare a continue handler, declare variables to grab values, open the cursor, start a loop, iteratively fetch the cursor (assign row values onto variables), oh, check up on the continue handler (programmatically exit the loop if it fails), close the cursor.

The foreach() loop statement was developed to simplify all the above. Hey: just name your query, and the list of variables which should be assigned to, and do your thing in the following statement.

Discussion: conditional branching

The standard SQL CASE statement, and the additional IF() statement are fine, and I use them a lot. But they are fine for SELECT queries, and only allow you to get data. At best, you may invoke a function based on some condition, which can actually modify data.

With QueryScript it's as with your normal programming language: you can DELETE if some condition holds true, INSERT or SELECT or ALTER or whatever if false.

In the above code there isn't too much news. The same can be done with stored routines. However the if statement can also accept a query as a condition. One can ask: if (DELETE FROM ... WHERE...). The condition holds true only is the operation was successful (rows actually DELETEd, or INSERTed, or UPDATEed). This makes for a very tight integration between script and SQL.

Discussion: variables and variable expansion

Script variables behave just as normal MySQL user defined variables (in fact, current internal implementation of script variables is by user defined variables). Which means the set statement works for them just as normal.

And here is where things become not-normal:

Say we want to delete all but the 3 most populated countries in Europe. Wouldn't we like to issue a DELETE FROM Country WHERE Continent = 'Europe' ORDER BY Population DESC LIMIT 3, 999999999? (The 9999999999 to resemble "infinite", in poor man's solution)

But MySQL's DELETE does not accept both limit & offset in the LIMIT clause. Just the limit part. Which is why we're working the other way round: we find out the number of records we wish to purge and delete bottom up. But wait, here's another problem:

In MySQL, the LIMIT clause must accept a constant. You can just DELETE FROM .... LIMIT @x. This makes for a syntax error. Bummer!

If we don't know ahead the number of records we wish to purge, how can we work both dynamically and correctly?

Enter variable expansion. In the statement:

DELETE FROM world.Country WHERE Continent = $continent ORDER BY Population ASC LIMIT :$num_countries_to_delete;

The $num_countries_to_delete variable is expanded, via ":". The :$num_countries_to_delete token is replaced in-place with the value contained by $num_countries_to_delete. MySQL never gets a variable in the LIMIT clause: by the time the query reaches MySQL, theres a constant in place, and none is the wiser. But as far as we're concerned, we get a dynamic way of producing values to the LIMIT clause.

LIMIT is not the only clause which expects constants. How about KILL? How about DDLs, such as CREATE TABLE? With variable expansion you can dynamically inject values onto such clauses, statements and commands, and get your self a dynamic script.

Conclusion

This small code sample exposes much of QueryScript's power. Throughout the months of development, I happened to use QueryScript code over and over on production, to realize how it can sometimes simplify very complex tasks into a mere 2-liner code. A code that any of my fellow programmers can understand, as well, without having to be SQL experts. And such which is executed within the server; no need for external languages, connectors, dependencies, packages etc.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL/QueryScript use case: DELETE all but top N records per group

Февраль 9th, 2012

Some administrative tasks can be simplified by using common_schema/QueryScript. I'm collecting a bunch of these for documentation. Here's one for example:

The DBA/developer has the task of retaining only top 3 most populated countries per continent. That is, she has to DELETE 4th, 5th, 6th, ... most populated counties in each continent.

Is it possible to work out with a single query? Yes. But the query is not pretty. In fact, it is quite complicated, and either involves unintuitive subqueries, or unintuitive hacks. A normal DBA would not want to write, neither maintain this kind of query, unless top-notch-geek, which is fine.

Since this is a one time job, we just need to get it done. And common_schema/QueryScript provide with the intuitive solution: if we read our demand aloud, we realize we want to delete 4th, 5th, 6th, ... populated countries for each continent.

I present a solution made available by QueryScript, and discuss the ways in which the code overcomes limitations, or simplifies complexity:

var $num_countries_to_delete;
foreach($continent, $num_countries: SELECT continent, COUNT(*) FROM world.Country GROUP BY continent)
{
  if ($num_countries > 3)
  {
    set $num_countries_to_delete := $num_countries - 3;
    DELETE FROM world.Country WHERE Continent = $continent ORDER BY Population ASC LIMIT :$num_countries_to_delete;
  }
}

Discussion

The first thing that should be apparent from the above is that this is a programmatic solution. Queries are declarative, which is why complex ones sometimes look incomprehensible. The above is more straightforward.

The next thing to realize, which is a disclosure issue of some sorts, is that the above code is fine for a one time, or maintenance execution; but you wouldn't want to be normally issuing this type of code against your database 10,000 times a second.

Now let's break down the code to fragments:

Discussion: variables

The $num_countries_to_delete is a script variable. It is local. It is reset to NULL upon declaration and destroyed when its visibility ends. But the real power comes later, when it is expanded. This is discussed last.

Discussion: iteration

How would you iterate the continents using a stored routine? I personally think the syntax for server side cursors is overwhelmingly verbose. Declare a cursor, declare a continue handler, declare variables to grab values, open the cursor, start a loop, iteratively fetch the cursor (assign row values onto variables), oh, check up on the continue handler (programmatically exit the loop if it fails), close the cursor.

The foreach() loop statement was developed to simplify all the above. Hey: just name your query, and the list of variables which should be assigned to, and do your thing in the following statement.

Discussion: conditional branching

The standard SQL CASE statement, and the additional IF() statement are fine, and I use them a lot. But they are fine for SELECT queries, and only allow you to get data. At best, you may invoke a function based on some condition, which can actually modify data.

With QueryScript it's as with your normal programming language: you can DELETE if some condition holds true, INSERT or SELECT or ALTER or whatever if false.

In the above code there isn't too much news. The same can be done with stored routines. However the if statement can also accept a query as a condition. One can ask: if (DELETE FROM ... WHERE...). The condition holds true only is the operation was successful (rows actually DELETEd, or INSERTed, or UPDATEed). This makes for a very tight integration between script and SQL.

Discussion: variables and variable expansion

Script variables behave just as normal MySQL user defined variables (in fact, current internal implementation of script variables is by user defined variables). Which means the set statement works for them just as normal.

And here is where things become not-normal:

Say we want to delete all but the 3 most populated countries in Europe. Wouldn't we like to issue a DELETE FROM Country WHERE Continent = 'Europe' ORDER BY Population DESC LIMIT 3, 999999999? (The 9999999999 to resemble "infinite", in poor man's solution)

But MySQL's DELETE does not accept both limit & offset in the LIMIT clause. Just the limit part. Which is why we're working the other way round: we find out the number of records we wish to purge and delete bottom up. But wait, here's another problem:

In MySQL, the LIMIT clause must accept a constant. You can just DELETE FROM .... LIMIT @x. This makes for a syntax error. Bummer!

If we don't know ahead the number of records we wish to purge, how can we work both dynamically and correctly?

Enter variable expansion. In the statement:

DELETE FROM world.Country WHERE Continent = $continent ORDER BY Population ASC LIMIT :$num_countries_to_delete;

The $num_countries_to_delete variable is expanded, via ":". The :$num_countries_to_delete token is replaced in-place with the value contained by $num_countries_to_delete. MySQL never gets a variable in the LIMIT clause: by the time the query reaches MySQL, theres a constant in place, and none is the wiser. But as far as we're concerned, we get a dynamic way of producing values to the LIMIT clause.

LIMIT is not the only clause which expects constants. How about KILL? How about DDLs, such as CREATE TABLE? With variable expansion you can dynamically inject values onto such clauses, statements and commands, and get your self a dynamic script.

Conclusion

This small code sample exposes much of QueryScript's power. Throughout the months of development, I happened to use QueryScript code over and over on production, to realize how it can sometimes simplify very complex tasks into a mere 2-liner code. A code that any of my fellow programmers can understand, as well, without having to be SQL experts. And such which is executed within the server; no need for external languages, connectors, dependencies, packages etc.


PlanetMySQL Voting: Vote UP / Vote DOWN

common_schema rev. 218: QueryScript, throttling, processes, documentation

Февраль 8th, 2012

common_schema, revision 218 is released, with major new features, top one being server side scripting. Here are the highlights:

  • QueryScript: server side scripting is now supported by common_schema, which acts as an interpreter for QueryScript code.
  • Throttling for queries is now made available via the throttle() function.
  • Enhancements to processlist-related views, including the new slave_hosts view.
  • Inline documentation/help is available via the help() routine.
  • more...

QueryScript

common_schema makes for a QueryScript implementation for MySQL. You can run server side scripts, interpreted by common_schema, which allow for easy syntax and greater power than was otherwise previously available on the MySQL server. For example:

foreach($table, $schema, $engine: table like '%')
  if ($engine = 'ndbcluster')
    ALTER ONLINE TABLE :$schema.:$table REORGANIZE PARTITION;

QueryScript includes flow control, conditional branching, variables & variable expansion, script throttling and more.

Read more on common_schema's QueryScript implementation.

Query throttling

Throttling for MySQL queries was suggested by means of elaborate query manipulation. It is now reduced into a single throttle function: one can now just invoke throttle(3) on one's query, so as to make the query execute for a longer time, while taking short sleep breaks during operation, easing up the query's demand for resources.

Read more on query throttling.

Process views

The processlist_grantees view provides with more details on the running processes. slave_hosts is a new view, listing hostnames of connected slaves.

Read more on process views.

help()

The common_schema documentation is now composed of well over 100 pages, including synopsis, detailed internals discussion, notes and examples. I can't exaggerate in saying that the documentation took the vast majority of time for this code to release.

The documentation is now made available inline, from within you mysql client, via the help() routine. Want to know more about redundant (duplicate) keys and how to find them? Just type:

call help('redundant');

and see what comes out!

The entire documentation, which is available online as well as a downloadable bundle, is embedded into common_schema itself. It's rather cool.

Tests

common_schema is tested. The number of tests in common_schema is rapidly growing, and new tests are introduced for new features as well as for older ones. There is not yet full coverage for all views, but I'm working hard at it. common_schema is a robust piece of code!

Get it!

Download common_schema on the common_schema project page.

Read the documentation online, or download it as well (or call for help())

common_schema is released under the BSD license.


PlanetMySQL Voting: Vote UP / Vote DOWN

QueryScript: SQL scripting language

Февраль 8th, 2012

Introducing QueryScript: a programming language aimed for SQL scripting, seamlessly combining scripting power such as flow control & variables with standard SQL statements or RDBMS-specific commands.

QueryScript is available fro MySQL via common_schema, which adds MySQL-specific usage.

What does QueryScript look like? Here are a few code samples:

Turn a bulk DELETE operation into smaller tasks. Throttle in between.

while (DELETE FROM archive.events WHERE ts < CURDATE() LIMIT 1000)
{
  throttle 2;
}

Convert all InnoDB tables in the 'sakila' database to compressed format:

foreach ($table, $schema, $engine: table in sakila)
{
  if ($engine = 'InnoDB')
    ALTER TABLE :$schema.:$table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
}

Shard your data across multiple schemata:

foreach($shard: {USA, GBR, JAP, FRA})
{
  CREATE DATABASE db_:$shard;
  CREATE TABLE db_:$shard.city LIKE world.City;
  INSERT INTO db_:$shard.city SELECT * FROM world.City WHERE CountryCode = $shard;
}

This tight integration between script and SQL, with the power of iteration, conditional statements, variables, variable expansion, throttling etc., makes QueryScript a power tool, with capabilities superseding those of stored routines, and allowing for simplified, dynamic code.

QueryScript code is interpreted. It's just a text, so it can be read from a @user_defined_variable, a table column, text file, what have you. For example:

mysql> set @script := "while (TIME(SYSDATE()) < '17:00:00') SELECT * FROM world.City WHERE id = 1 + FLOOR((RAND()*4079));";
mysql> call run(@script);

For more details, consult the QueryScript site.

If you're a MySQL user/DBA, better read the common_schema QueryScript documentation, to better understand the specific common_schema implementation and enhanced features.

common_schema, including the QueryScript interpreter, can be downloaded from the common_schema project page.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL command line vs. visual editors

Январь 30th, 2012

Students in my training classes usually prefer to use some kind of visual editor for MySQL. Typically this would be the software they're using at work. Sometimes they just bring over their laptops with the software installed. Or they would use MySQL Workbench, which is what I usually have pre-installed on their desktops.

I see MySQL Workbench, SQLyog, Toad for MySQL, or several more.

I always humbly suggest they close down their software and open up a command line.

It isn't fancy. It may not even be convenient (especially on Windows, in my opinion). And repeating your last command with a minor modification requires a lot of key stroking. Or you would copy+paste from some text editor. Most students will give it a shot, then go back to their favorite editor.

Well, again and again I reach the same conclusion:

Visual editors are not as trustworthy as the command line.

Time and again students show me something on their editor. Behavior seems strange to me. Opening up a console shows a completely different picture.

Things like:

  • The visual editor would open a new connection for every new query (oh, so the @user_defined_variable I've just assigned turns NULL, or the TEMPORARY TABLE disappears).
  • The visual editor will only show 1,000 results, via LIMIT 0,1000. "But the same query runs so much faster on my machine!". Well, sure, a filesort of 1,000,000 rows that can satisfy the first 1,000 will quit early!
  • The visual editor shows table definition graphically. "I didn't realize the index did(n't) cover this and that columns. I didn't realize it only covered first n characters of my VARCHAR.". That's because you can't beat SHOW CREATE TABLE, the definite table structure description.
  • The visual editor allows for export/import/copy/transfer of tables and rows with just one click! "Why is it so complicated in the command line to purge 1,000,000 rows from a table?". Ummm, did you realize the visual editor would typically use a naive approach of doing everything in one huge transaction?
  • The visual editor is smart. But sometimes you don't want smart. You just assume simple. I personally take great precaution with smart solutions. Luckily, with scripts you have so much greater control (i.e. command line options, "dry-run" mode, etc.) that I have greater confidence in them.

I do like it when a visual editor plays it both smart and safe, in such way that before doing its smart work it actually presents you with the query it's going to issue. Which is why I always considered MySQL Query Browser (now replaced by Workbench) to be the visual editor of choice in my classes.

But, at the end of the day, I strongly believe: if you don't know how to do it with command line, you can't really know how it's done.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL command line vs. visual editors

Январь 30th, 2012

Students in my training classes usually prefer to use some kind of visual editor for MySQL. Typically this would be the software they're using at work. Sometimes they just bring over their laptops with the software installed. Or they would use MySQL Workbench, which is what I usually have pre-installed on their desktops.

I see MySQL Workbench, SQLyog, Toad for MySQL, or several more.

I always humbly suggest they close down their software and open up a command line.

It isn't fancy. It may not even be convenient (especially on Windows, in my opinion). And repeating your last command with a minor modification requires a lot of key stroking. Or you would copy+paste from some text editor. Most students will give it a shot, then go back to their favorite editor.

Well, again and again I reach the same conclusion:

Visual editors are not as trustworthy as the command line.

Time and again students show me something on their editor. Behavior seems strange to me. Opening up a console shows a completely different picture.

Things like:

  • The visual editor would open a new connection for every new query (oh, so the @user_defined_variable I've just assigned turns NULL, or the TEMPORARY TABLE disappears).
  • The visual editor will only show 1,000 results, via LIMIT 0,1000. "But the same query runs so much faster on my machine!". Well, sure, a filesort of 1,000,000 rows that can satisfy the first 1,000 will quit early!
  • The visual editor shows table definition graphically. "I didn't realize the index did(n't) cover this and that columns. I didn't realize it only covered first n characters of my VARCHAR.". That's because you can't beat SHOW CREATE TABLE, the definite table structure description.
  • The visual editor allows for export/import/copy/transfer of tables and rows with just one click! "Why is it so complicated in the command line to purge 1,000,000 rows from a table?". Ummm, did you realize the visual editor would typically use a naive approach of doing everything in one huge transaction?
  • The visual editor is smart. But sometimes you don't want smart. You just assume simple. I personally take great precaution with smart solutions. Luckily, with scripts you have so much greater control (i.e. command line options, "dry-run" mode, etc.) that I have greater confidence in them.

I do like it when a visual editor plays it both smart and safe, in such way that before doing its smart work it actually presents you with the query it's going to issue. Which is why I always considered MySQL Query Browser (now replaced by Workbench) to be the visual editor of choice in my classes.

But, at the end of the day, I strongly believe: if you don't know how to do it with command line, you can't really know how it's done.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL command line vs. visual editors

Январь 30th, 2012

Students in my training classes usually prefer to use some kind of visual editor for MySQL. Typically this would be the software they're using at work. Sometimes they just bring over their laptops with the software installed. Or they would use MySQL Workbench, which is what I usually have pre-installed on their desktops.

I see MySQL Workbench, SQLyog, Toad for MySQL, or several more.

I always humbly suggest they close down their software and open up a command line.

It isn't fancy. It may not even be convenient (especially on Windows, in my opinion). And repeating your last command with a minor modification requires a lot of key stroking. Or you would copy+paste from some text editor. Most students will give it a shot, then go back to their favorite editor.

Well, again and again I reach the same conclusion:

Visual editors are not as trustworthy as the command line.

Time and again students show me something on their editor. Behavior seems strange to me. Opening up a console shows a completely different picture.

Things like:

  • The visual editor would open a new connection for every new query (oh, so the @user_defined_variable I've just assigned turns NULL, or the TEMPORARY TABLE disappears).
  • The visual editor will only show 1,000 results, via LIMIT 0,1000. "But the same query runs so much faster on my machine!". Well, sure, a filesort of 1,000,000 rows that can satisfy the first 1,000 will quit early!
  • The visual editor shows table definition graphically. "I didn't realize the index did(n't) cover this and that columns. I didn't realize it only covered first n characters of my VARCHAR.". That's because you can't beat SHOW CREATE TABLE, the definite table structure description.
  • The visual editor allows for export/import/copy/transfer of tables and rows with just one click! "Why is it so complicated in the command line to purge 1,000,000 rows from a table?". Ummm, did you realize the visual editor would typically use a naive approach of doing everything in one huge transaction?
  • The visual editor is smart. But sometimes you don't want smart. You just assume simple. I personally take great precaution with smart solutions. Luckily, with scripts you have so much greater control (i.e. command line options, "dry-run" mode, etc.) that I have greater confidence in them.

I do like it when a visual editor plays it both smart and safe, in such way that before doing its smart work it actually presents you with the query it's going to issue. Which is why I always considered MySQL Query Browser (now replaced by Workbench) to be the visual editor of choice in my classes.

But, at the end of the day, I strongly believe: if you don't know how to do it with command line, you can't really know how it's done.


PlanetMySQL Voting: Vote UP / Vote DOWN

More MySQL foreach()

Декабрь 2nd, 2011

In my previous post I've shown several generic use cases for foreach(), a new scripting functionality introduced in common_schema.

In this part I present DBA's handy syntax for schema and table operations and maintenance.

Confession: while I love INFORMATION_SCHEMA's power, I just hate writing queries against it. It's just so much typing! Just getting the list of tables in a schema makes for this heavy duty query:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='sakila' AND TABLE_TYPE='BASE TABLE';

When a join is involved this really becomes a nightmare. I think it's cumbersome, and as result, many do not remember the names and meaning of columns, making for "oh, I need to read the manual all over again just to get that query right". Anyway, that's my opinion.

A SHOW TABLES statement is easier to type, but cannot be integrated into a SELECT query (though we have a partial solution for that, too), and besides, when filtering out the views, the SHOW statement becomes almost as cumbersome as the one on INFORMATION_SCHEMA.

Which is why foreach() offers handy shortcuts to common iterations on schemata and tables, as follows:

Use case: iterate all databases

call foreach('schema', 'CREATE TABLE ${schema}.event(event_id INT, msg VARCHAR(128))');

In the above we execute a query on each database. Hmmm, maybe not such a good idea to perform this operation on all databases? Let's filter them:

Use case: iterate databases by name match

call foreach('schema like wordpress_%', 'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL');

The above will only iterate my WordPress databases (I have several of these), performing an ALTER on wp_posts for each of those databases.

I don't have to quote the like expression, but I can, if I wish to.

I can also use a regular expression match:

call foreach('schema ~ /^wordpress_[0-9]+$/', 'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL');

Use case: iterate tables in a specific schema

Time to upgrade our sakila tables to InnoDB's compressed format. We use $(), a synonym for foreach().

call $('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPRESSED');

The above will iterate on tables in sakila. I say tables, since it will avoid iterating views (there is still no specific syntax for views iteration). This is done on purpose, as my experience shows there is very little in common between tables and views when it comes to maintenance and operations.

Use case: iterate tables by name match

Here's a interesting scenario: you wish to work on all tables matching some name. The naive approach would be to:

SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'wp_posts' AND TABLE_TYPE = 'BASE TABLE'

Wait! Are you aware this may bring your server down? This query will open all databases at once, opening all .frm files (though thankfully not data files, since we only check for name and type).

Here's a better approach:

call foreach('table like wp_posts', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB');

(There's now FULLTEXT to InnoDB, so the above can make sense in the near future!)

The good part is that foreach() will look for matching tables one database at a time. It will iterate the list of database, then look for matching tables per database, thereby optimizing the query on INFORMATION_SCHEMA.

Here, too, I can use regular expressions:

call $('table ~ /^wp_.*$/', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB');

Conclusion

This is work in the making, but, as someone who maintains a few productions servers, I've already put it to work.

I'm hoping the syntax is easy to comprehend. I know that since I developed it it must be far more intuitive to myself than to others. I've tried to keep close on common syntax and concepts from various programming languages.

I would like to get as much feedback as possible. I have further ideas and thoughts on the direction common_schema is taking, but wish take it in small steps. Your feedback is appreciated!


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL foreach()

Декабрь 2nd, 2011

A new routine is now available in common_schema, which makes for an easier execution syntax for some operations:

foreach(collection_to_iterate_over, queries_to_execute_per_iteration_step);

To illustrate what it can do, consider:

call foreach('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPACT');

call $('schema like shard_%', 'CREATE TABLE ${schema}.messages (id INT)');

call $('2000:2009', 'INSERT IGNORE INTO report (report_year) VALUES (${1})');

$() stands as a synonym to foreach(). I suspect it should look familiar to web programmers.

The idea for foreach() was introduced by Giuseppe Maxia during a correspondence. At first I was skeptic: this isn't jQuery; this is SQL. Why would I want to use foreach()?

Then Giuseppe provided some use cases, and as I started thinking about it, I found more and more cases where such a tool might considerably reduce scripting overhead and avoid requiring SQL-fu skills. In fact, I have been using it myself for the past few weeks

I provide examples which I hope will convince the reader as for the simplicity of using such syntax. Showing off the types of input foreach() accepts (query, table search, schema search, set of constants, single or double numbers range), and the types of queries it can execute (single, multiple, using placeholders).

I stress that this is not a replacement for common queries (i.e. does not come to replace your common SELECT, UPDATE, DELETE), but more for working out administrative tasks. Nevertheless, the last example in this post does provide with an interesting insight on possible "normal" use.

Use case: using values from query

Let's kill all queries running for over 20 seconds:

call foreach('SELECT id FROM INFORMATION_SCHEMA.PROCESSLIST WHERE time > 20', 'KILL QUERY ${1}');

The thing I like most about foreach() is that it's self explanatory. Nevertheless, I note:

  • The KILL command is executed for each process running for more than 20 seconds (I did round up corners, since I didn't check for sleeping processes, for simplicity).
  • I also use the ${1} placeholder: much like in awk, this will get the first column in the result set. In our case, it is the single column, id.
  • I chose to invoke a single query/command per iteration step.

Compare the above with another solution to the same problem, using eval():

call eval('SELECT CONCAT(\'KILL QUERY \',id) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE TIME > 20');

They both get the same thing done. But foreach() is just a bit more friendly to write (and read).

Let's move to a more complicated example.

Use case: using multiple values from a query, invoking multiple commands

Let's kill some queries, as above, but also write down a log entry so that we know what happened:

call foreach(
  'SELECT id, user FROM INFORMATION_SCHEMA.PROCESSLIST WHERE time > 20',
  'KILL QUERY ${1}; INSERT INTO my_log VALUES (\'have just killed query ${1}, executed by ${2}\');')
;

In the above, for each long running process, we:

  • Kill the process' query. id being the first column, is referenced by ${1}.
  • INSERT to my_log that said process has been killed. We note both id and user using placeholders ${1} and ${2}, respectively.

It's possible to invoke as many queries/commands per iteration step. It is possible to use placeholders ${1} through ${9}, as well as ${NR}, which works as in awk: it is a row-counter, 1-based.

This example can still be written with eval(), but in much uglier form. I can't just first KILL the processes, then log about them, since by the time I want to log, the queries will not be running; the commands must be coupled. This is naturally done with foreach().

Use case: iterating constant values, invoking DDL

The commands invoked by foreach() can take the form of DML (INSERT/UPDATE/...), DDL (CREATE/ALTER/...) or other (KILL/SET/...). The placeholders can be used anywhere within the text.

Take an installation where different schemata have the same exact table structure. We want to refactor a table on all schemata:

call $('{USA, UK, Japan, NZ}', 'ALTER TABLE db_region_${1}.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8');

The above translates to the following commands:

ALTER TABLE db_region_USA.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_UK.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_Japan.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_NZ.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;

In the above, we:

  • Provide a list of constant values. These can be strings, numbers, whatever. They are space delimited.
  • Use the ${1} place holder. We can also use ${NR}.

Use case: loop through number sequence

We wish to populate a table with values:

call foreach('1970:2038', 'INSERT INTO test.test_dates (dt) VALUES (DATE(\'${1}-01-01\'))');

The above results with:

mysql> SELECT dt FROM test_dates;
+------------+
| dt         |
+------------+
| 1970-01-01 |
| 1971-01-01 |
| 1972-01-01 |
| 1973-01-01 |
| 1974-01-01 |
...
| 2036-01-01 |
| 2037-01-01 |
| 2038-01-01 |
+------------+

With numbers range:

  • Integers are assumed
  • Range is indicated by low and high values, both inclusive
  • Negatives allowed (e.g. '-5:5', resulting with 11 steps)
  • Placeholders ${1} and ${NR} are allowed.

Use case: iterating through two dimensional numbers range:

We use 3 template tables; we create 15 schemata; in each we create 3 tables based on the template tables:

call foreach( '1:15,1:3',
  'CREATE DATABASE IF NOT EXISTS db_test_${1}; CREATE TABLE db_test_${1}.tbl_${2} LIKE db_template.my_table_${2};'
);

Notes:

  • Each of the number ranges has the same restrictions and properties as listed above (integers, inclusive, ascending)
  • We can now use ${1} and ${2} placeholders, noting the first and second numbers range, respectively.
  • We may also use ${NR}, which, in this case, will run 1 through 45 (15 times 3).
  • We use multiple queries per iteration step.

Use case: overcoming MySQL limitations

MySQL does not support ORDER BY & LIMIT in multi-table UPDATE and DELETE statements (as noted last year). So we cannot:

DELETE FROM t1 USING t1 JOIN t2 ON (...) JOIN t3 ON (..) WHERE x = 7 ORDER BY ts LIMIT 100;

However, we can:

call foreach(
  'SELECT t1.id FROM t1 JOIN t2 ON (...) JOIN t3 ON (..) WHERE x = 7 ORDER BY ts LIMIT 100',
  'DELETE FROM t1 WHERE id = ${1}'
);

Of course, it will do a lot of single row DELETEs. There are further MySQL limitations which complicate things if I want to overcome this. Perhaps at a later blog post.

Acknowledgements

I hit a weird bug which prevented me from releasing this earlier on. Actually it's a duplicate of this bug, which makes it 6 years old. Hurray.

To the rescue came Roland Bouman, who suggested an idea so crazy even I was skeptic: to parse and modify the original query so as to rename column names according to my scheme. And of course he made it happen, along with some additional very useful stuff. It's really a super-ultra-meta-meta-sql-fu magic he does there.

So, thanks, Roland, for joining the ride, and thanks, Giuseppe, for testing and helping out to shape this functionality. It's great fun working with other people on open-source -- a new experience for me.

Continued

In this post I've covered the general-purpose iterations. There are also more specific types of iterations with foreach(). Continued next.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL foreach()

Декабрь 2nd, 2011

A new routine is now available in common_schema, which makes for an easier execution syntax for some operations:

foreach(collection_to_iterate_over, queries_to_execute_per_iteration_step);

To illustrate what it can do, consider:

call foreach('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPACT');

call $('schema like shard_%', 'CREATE TABLE ${schema}.messages (id INT)');

call $('2000:2009', 'INSERT IGNORE INTO report (report_year) VALUES (${1})');

$() stands as a synonym to foreach(). I suspect it should look familiar to web programmers.

The idea for foreach() was introduced by Giuseppe Maxia during a correspondence. At first I was skeptic: this isn't jQuery; this is SQL. Why would I want to use foreach()?

Then Giuseppe provided some use cases, and as I started thinking about it, I found more and more cases where such a tool might considerably reduce scripting overhead and avoid requiring SQL-fu skills. In fact, I have been using it myself for the past few weeks

I provide examples which I hope will convince the reader as for the simplicity of using such syntax. Showing off the types of input foreach() accepts (query, table search, schema search, set of constants, single or double numbers range), and the types of queries it can execute (single, multiple, using placeholders).

I stress that this is not a replacement for common queries (i.e. does not come to replace your common SELECT, UPDATE, DELETE), but more for working out administrative tasks. Nevertheless, the last example in this post does provide with an interesting insight on possible "normal" use.

Use case: using values from query

Let's kill all queries running for over 20 seconds:

call foreach('SELECT id FROM INFORMATION_SCHEMA.PROCESSLIST WHERE time > 20', 'KILL QUERY ${1}');

The thing I like most about foreach() is that it's self explanatory. Nevertheless, I note:

  • The KILL command is executed for each process running for more than 20 seconds (I did round up corners, since I didn't check for sleeping processes, for simplicity).
  • I also use the ${1} placeholder: much like in awk, this will get the first column in the result set. In our case, it is the single column, id.
  • I chose to invoke a single query/command per iteration step.

Compare the above with another solution to the same problem, using eval():

call eval('SELECT CONCAT(\'KILL QUERY \',id) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE TIME > 20');

They both get the same thing done. But foreach() is just a bit more friendly to write (and read).

Let's move to a more complicated example.

Use case: using multiple values from a query, invoking multiple commands

Let's kill some queries, as above, but also write down a log entry so that we know what happened:

call foreach(
  'SELECT id, user FROM INFORMATION_SCHEMA.PROCESSLIST WHERE time > 20',
  'KILL QUERY ${1}; INSERT INTO my_log VALUES (\'have just killed query ${1}, executed by ${2}\');')
;

In the above, for each long running process, we:

  • Kill the process' query. id being the first column, is referenced by ${1}.
  • INSERT to my_log that said process has been killed. We note both id and user using placeholders ${1} and ${2}, respectively.

It's possible to invoke as many queries/commands per iteration step. It is possible to use placeholders ${1} through ${9}, as well as ${NR}, which works as in awk: it is a row-counter, 1-based.

This example can still be written with eval(), but in much uglier form. I can't just first KILL the processes, then log about them, since by the time I want to log, the queries will not be running; the commands must be coupled. This is naturally done with foreach().

Use case: iterating constant values, invoking DDL

The commands invoked by foreach() can take the form of DML (INSERT/UPDATE/...), DDL (CREATE/ALTER/...) or other (KILL/SET/...). The placeholders can be used anywhere within the text.

Take an installation where different schemata have the same exact table structure. We want to refactor a table on all schemata:

call $('{USA, UK, Japan, NZ}', 'ALTER TABLE db_region_${1}.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8');

The above translates to the following commands:

ALTER TABLE db_region_USA.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_UK.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_Japan.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_NZ.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;

In the above, we:

  • Provide a list of constant values. These can be strings, numbers, whatever. They are space delimited.
  • Use the ${1} place holder. We can also use ${NR}.

Use case: loop through number sequence

We wish to populate a table with values:

call foreach('1970:2038', 'INSERT INTO test.test_dates (dt) VALUES (DATE(\'${1}-01-01\'))');

The above results with:

mysql> SELECT dt FROM test_dates;
+------------+
| dt         |
+------------+
| 1970-01-01 |
| 1971-01-01 |
| 1972-01-01 |
| 1973-01-01 |
| 1974-01-01 |
...
| 2036-01-01 |
| 2037-01-01 |
| 2038-01-01 |
+------------+

With numbers range:

  • Integers are assumed
  • Range is indicated by low and high values, both inclusive
  • Negatives allowed (e.g. '-5:5', resulting with 11 steps)
  • Placeholders ${1} and ${NR} are allowed.

Use case: iterating through two dimensional numbers range:

We use 3 template tables; we create 15 schemata; in each we create 3 tables based on the template tables:

call foreach( '1:15,1:3',
  'CREATE DATABASE IF NOT EXISTS db_test_${1}; CREATE TABLE db_test_${1}.tbl_${2} LIKE db_template.my_table_${2};'
);

Notes:

  • Each of the number ranges has the same restrictions and properties as listed above (integers, inclusive, ascending)
  • We can now use ${1} and ${2} placeholders, noting the first and second numbers range, respectively.
  • We may also use ${NR}, which, in this case, will run 1 through 45 (15 times 3).
  • We use multiple queries per iteration step.

Use case: overcoming MySQL limitations

MySQL does not support ORDER BY & LIMIT in multi-table UPDATE and DELETE statements (as noted last year). So we cannot:

DELETE FROM t1 USING t1 JOIN t2 ON (...) JOIN t3 ON (..) WHERE x = 7 ORDER BY ts LIMIT 100;

However, we can:

call foreach(
  'SELECT t1.id FROM t1 JOIN t2 ON (...) JOIN t3 ON (..) WHERE x = 7 ORDER BY ts LIMIT 100',
  'DELETE FROM t1 WHERE id = ${1}'
);

Of course, it will do a lot of single row DELETEs. There are further MySQL limitations which complicate things if I want to overcome this. Perhaps at a later blog post.

Acknowledgements

I hit a weird bug which prevented me from releasing this earlier on. Actually it's a duplicate of this bug, which makes it 6 years old. Hurray.

To the rescue came Roland Bouman, who suggested an idea so crazy even I was skeptic: to parse and modify the original query so as to rename column names according to my scheme. And of course he made it happen, along with some additional very useful stuff. It's really a super-ultra-meta-meta-sql-fu magic he does there.

So, thanks, Roland, for joining the ride, and thanks, Giuseppe, for testing and helping out to shape this functionality. It's great fun working with other people on open-source -- a new experience for me.

Continued

In this post I've covered the general-purpose iterations. There are also more specific types of iterations with foreach(). Continued next.


PlanetMySQL Voting: Vote UP / Vote DOWN