Archive for the ‘Python’ Category

Liveblogging at Confoo: [not just] PHP Performance by Rasmus Lerdorf

Март 11th, 2010

Most of this stuff is not PHP specific, and Python or Ruby or Java or .NET developers can use the tools in this talk.

The session on joind.in, with user comments/feedback, is at http://joind.in/talk/view/1320.

Slides are at http://talks.php.net/show/confoo10

“My name is Rasmus, I’ve been around for a long time. I’ve been doing this web stuff since 1992/1993.”

“Generally performance is not a PHP problem.” Webservers not config’d, no expire headers on images, no favicon.

Tools: Firefox/Firebug extension called YSlow (developed by yahoo) gives you a grade on your site.

Google has developed the Firefox/Firebug pagespeed tool.

Today Rasmus will pick on wordpress. He checks out the code, then uses Siege to do a baseline benchmark — see the slide for the results.

Before you do anything else install an opcode cache like APC. Wordpress really likes this type of caching, see this slide for the results. Set the timezone, to make sure conversions aren’t being done all the time.

Make sure you are cpu-bound, NOT I/O bound. Otherwise, speed up the I/O.

Then strace your webserver processs. There are common config issues that you can spot in your strace code. grep for ENOENT which shows you “No such file or directory” errors.

AllowOverride None to turn off .htaccess for every directory, just read settings once from your config file….(unless you’re an ISP).

Make sure DirectoryIndex is set appropriately, watch your include_path. All this low-hanging fruit has examples on the common config issues slide.

Install pecl/inclued and generate a graph – here is the graph image (I have linked it because you really want to zoom in to the graph…)

In strace output check the open() calls. Conditional includes, function calls that include files, etc. need runtime context before knowing what to open. In the example, every request checks to see if we have the config file, once we have config’d we can get rid of that stuff. Get rid of all the conditionals and hard-code “include wp-config.php”. Examples are on the slide.

His tips to change:
Conditional config include in wp-load.php (as just mentioned)
Conditional did-header check in wp-blog-header.php
Don’t call require_wp_db() from wp-settings.php
Remove conditional require logic from wp_start_object_cache

Then check strace again, now all Rasmus sees is theming and translations, which he decided to keep, because that’s the good benefit of Wordpress – Performance is all about costs vs. flexibility. You don’t want to get rid of all of your flexibility, but you want to be fast.

Set error_reporting(-1) in wp-settings.php to catch all warnings — warnings slow you down, so get rid of all errors. PHP error handling is very slow, so getting rid of errors will make you faster.

The slide of warnings that wordpress throws.

Look at all C-level calls made, using callgrind, which sits under valgrind, a CPU emulator used for debugging. See the image of what callgrind shows.

Now dive into the PHP executor, by installing XDebug.

Check xhprofFacebook open sourced this about a year ago, it’s a PECL extension. The output is pretty cool, try it on your own site, Rasmus does show you how to use it. It shows you functions sorted by the most expensive to the least expensive.

For example, use $_SERVER[REQUEST_TIME] instead of time(). Use pconnect() if MySQL can handle the amount of webserver connections that will be persistent, etc.

After you have changed a lot of the stuff above, benchmark again with siege to see how much faster you are. In this case there is not much gained so far.

So keep going….the blogroll is very slow — Rasmus gets rid of it by commenting out in the sidebar.php file. I’d like to see something to make it “semi-dynamic” — that is, make it a static file that can be re-generated, since you might want the blogroll but links are not changed every second…..

At this point we’re out of low-hanging fruit.

HipHop is a PHP to C++ converter & compiler, including a threaded, event-driven server that replaces apache. Rasmus’ slide says “Wordpress is well-suited for HipHop because it doesn’t have a lot of dynamic runtime code. This is using the standard Wordpress-svn checkout with a few tweaks.”

Then, of course, benchmark again.

The first time you compile Wordpress with HipHop, you give it a list of files to add to the binary, it will complain about php code that generate file names, so you do have to fix that kind of stuff. There’s a huge mess of errors the first time you run it (”pages and pages”), and Rasmus had to patch HipHop (and Wordpress) but the changes in HipHop have been put back into HipHop, so you should be good for the most part.

Check out the errors, lots of them show logical errors like $foo.”bar” instead of $foo.=”bar” and $foo=”bar” instead of $foo==”bar” in an if statement. Which of course is nice for your own code, to find those logical errors.

(Wordpress takes in a $user_ID argument and immediately initializes a global $user_ID variable, which overwrites the argument passed in, so you can change the name of the argument passed in….)

You can also get rid of some code, things that check for existence of the same thing more than once. So it will take a bit of tweaking, but it’s worth it.

There are limitations to HipHop, for example:

  • It doesn’t support any of the new PHP 5.3 language features
  • Private properties don’t really exist under HipHop. They are treated as if they are protected instead.
  • You can’t unset variables. unset will clear the variable, but it will still be in the symbol table.
  • eval and create_function are limited
  • Variable variables $$var are not supported
  • Dynamic defines won’t work: define($name,$value)
  • get_loaded_extensions(), get_extension_funcs(), phpinfo(), debug_backtrace() don’t work
  • Conditional and dynamically created include filenames don’t work as you might expect
  • Default unix-domain socket filename isn’t set for MySQL so connecting to localhost doesn’t work

and HipHop does not support all extensions — see the list Rasmus has of extensions HipHop supports.

Then Rasmus showed an example using Twit (which he wrote) including the benchmarks. He shows that you can see what’s going on, like 5 MySQL calls on the home page and what happens when you don’t have a favicon.ico (in yellow).

In summary, “performance is all about architecture”, “know your costs”.

Be careful, because some tools (like valgrind and xdebug) you don’t want to put it on production systems, you could capture production traffic and replay it on a dev/testing box, but “you just have to minimize the differences and do your best”.


PlanetMySQL Voting: Vote UP / Vote DOWN

Liveblogging at Confoo: [not just] PHP Performance by Rasmus Lerdorf

Март 11th, 2010

Most of this stuff is not PHP specific, and Python or Ruby or Java or .NET developers can use the tools in this talk.

The session on joind.in, with user comments/feedback, is at http://joind.in/talk/view/1320.

Slides are at http://talks.php.net/show/confoo10

“My name is Rasmus, I’ve been around for a long time. I’ve been doing this web stuff since 1992/1993.”

“Generally performance is not a PHP problem.” Webservers not config’d, no expire headers on images, no favicon.

Tools: Firefox/Firebug extension called YSlow (developed by yahoo) gives you a grade on your site.

Google has developed the Firefox/Firebug pagespeed tool.

Today Rasmus will pick on wordpress. He checks out the code, then uses Siege to do a baseline benchmark — see the slide for the results.

Before you do anything else install an opcode cache like APC. Wordpress really likes this type of caching, see this slide for the results. Set the timezone, to make sure conversions aren’t being done all the time.

Make sure you are cpu-bound, NOT I/O bound. Otherwise, speed up the I/O.

Then strace your webserver processs. There are common config issues that you can spot in your strace code. grep for ENOENT which shows you “No such file or directory” errors.

AllowOverride None to turn off .htaccess for every directory, just read settings once from your config file….(unless you’re an ISP).

Make sure DirectoryIndex is set appropriately, watch your include_path. All this low-hanging fruit has examples on the common config issues slide.

Install pecl/inclued and generate a graph – here is the graph image (I have linked it because you really want to zoom in to the graph…)

In strace output check the open() calls. Conditional includes, function calls that include files, etc. need runtime context before knowing what to open. In the example, every request checks to see if we have the config file, once we have config’d we can get rid of that stuff. Get rid of all the conditionals and hard-code “include wp-config.php”. Examples are on the slide.

His tips to change:
Conditional config include in wp-load.php (as just mentioned)
Conditional did-header check in wp-blog-header.php
Don’t call require_wp_db() from wp-settings.php
Remove conditional require logic from wp_start_object_cache

Then check strace again, now all Rasmus sees is theming and translations, which he decided to keep, because that’s the good benefit of Wordpress – Performance is all about costs vs. flexibility. You don’t want to get rid of all of your flexibility, but you want to be fast.

Set error_reporting(-1) in wp-settings.php to catch all warnings — warnings slow you down, so get rid of all errors. PHP error handling is very slow, so getting rid of errors will make you faster.

The slide of warnings that wordpress throws.

Look at all C-level calls made, using callgrind, which sits under valgrind, a CPU emulator used for debugging. See the image of what callgrind shows.

Now dive into the PHP executor, by installing XDebug.

Check xhprofFacebook open sourced this about a year ago, it’s a PECL extension. The output is pretty cool, try it on your own site, Rasmus does show you how to use it. It shows you functions sorted by the most expensive to the least expensive.

For example, use $_SERVER[REQUEST_TIME] instead of time(). Use pconnect() if MySQL can handle the amount of webserver connections that will be persistent, etc.

After you have changed a lot of the stuff above, benchmark again with siege to see how much faster you are. In this case there is not much gained so far.

So keep going….the blogroll is very slow — Rasmus gets rid of it by commenting out in the sidebar.php file. I’d like to see something to make it “semi-dynamic” — that is, make it a static file that can be re-generated, since you might want the blogroll but links are not changed every second…..

At this point we’re out of low-hanging fruit.

HipHop is a PHP to C++ converter & compiler, including a threaded, event-driven server that replaces apache. Rasmus’ slide says “Wordpress is well-suited for HipHop because it doesn’t have a lot of dynamic runtime code. This is using the standard Wordpress-svn checkout with a few tweaks.”

Then, of course, benchmark again.

The first time you compile Wordpress with HipHop, you give it a list of files to add to the binary, it will complain about php code that generate file names, so you do have to fix that kind of stuff. There’s a huge mess of errors the first time you run it (”pages and pages”), and Rasmus had to patch HipHop (and Wordpress) but the changes in HipHop have been put back into HipHop, so you should be good for the most part.

Check out the errors, lots of them show logical errors like $foo.”bar” instead of $foo.=”bar” and $foo=”bar” instead of $foo==”bar” in an if statement. Which of course is nice for your own code, to find those logical errors.

(Wordpress takes in a $user_ID argument and immediately initializes a global $user_ID variable, which overwrites the argument passed in, so you can change the name of the argument passed in….)

You can also get rid of some code, things that check for existence of the same thing more than once. So it will take a bit of tweaking, but it’s worth it.

There are limitations to HipHop, for example:

  • It doesn’t support any of the new PHP 5.3 language features
  • Private properties don’t really exist under HipHop. They are treated as if they are protected instead.
  • You can’t unset variables. unset will clear the variable, but it will still be in the symbol table.
  • eval and create_function are limited
  • Variable variables $$var are not supported
  • Dynamic defines won’t work: define($name,$value)
  • get_loaded_extensions(), get_extension_funcs(), phpinfo(), debug_backtrace() don’t work
  • Conditional and dynamically created include filenames don’t work as you might expect
  • Default unix-domain socket filename isn’t set for MySQL so connecting to localhost doesn’t work

and HipHop does not support all extensions — see the list Rasmus has of extensions HipHop supports.

Then Rasmus showed an example using Twit (which he wrote) including the benchmarks. He shows that you can see what’s going on, like 5 MySQL calls on the home page and what happens when you don’t have a favicon.ico (in yellow).

In summary, “performance is all about architecture”, “know your costs”.

Be careful, because some tools (like valgrind and xdebug) you don’t want to put it on production systems, you could capture production traffic and replay it on a dev/testing box, but “you just have to minimize the differences and do your best”.


PlanetMySQL Voting: Vote UP / Vote DOWN

Proper SQL table alias use conventions

Март 11th, 2010

After seeing quite some SQL statements over the years, something is bugging me: there is no consistent convention as for how to write an SQL query.

I’m going to leave formatting, upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I find to be problematic table alias use.

Using the sakila database, take a look at the following queries:

Query #1

SELECT
 R.rental_date, C.customer_id, C.first_name, C.last_name
FROM
 rental R
 JOIN customer C USING (customer_id)
WHERE
 R.rental_date >= DATE('2005-10-01')
 AND C.store_id=1;

The above looks for film rentals done in a specific store (store #1), as of Oct. 1st, 2005.

Query #2

SELECT
 F.title, C.name
FROM
 film AS F
 JOIN film_category AS S ON (F.film_id = S.film_id)
 JOIN category AS C ON (S.category_id = C.category_id)
WHERE F.length > 180;

The above lists the title and category for all films longer than three hours.

Query #3

SELECT c.customer_id, c.last_name
FROM
  customer c
  INNER JOIN address a ON (c.address_id = a.address_id)
  INNER JOIN (
    SELECT
      c.city_id
    FROM
      city AS c
      JOIN country s ON (c.country_id = s.country_id)
    WHERE
      s.country LIKE 'F%'
  ) s1 USING (city_id)
WHERE
  create_date > DATE('2005-10-01');

The above lists customers created after Oct. 1st, 2005, and who live in countries starting with an ‘F’. The query could be solved without a subquery, but there’s a good reason why I made it so.

The problems

I used very different conventions on any one of the queries, and sometimes within each query. And it’s common that I see the same on a customer’s site, what with having many programmers do the SQL coding. Again, I will only discuss the table aliases conventions. I’ll leaver the rest to the reader.

Here’s where I see problems:

  • Query #1: In itself, it looks fine. Rental turns to R, Customer turns to C. I will comment on this slightly later on when I provide my full opinion.
  • Query #2: So film turns to F, category turns to C. What should film_category turn into? Out of letters? Let’s just go for S, shall we? But S has nothing do with film_category. Yet it’s so commonly seen.
  • Query #2: We’re using the AS keyword now. We didn’t use it before.
  • Queries #1, #2: Hold on. Wasn’t C taken for customer in Query #1? Now, in Query #2 it stands for category? I’m beginning to get confused.
  • Query #3: Now aliases are lower case; I was just getting used to them being upper case.
  • Query #3: But, hey, c is back to customer!
  • Query #3: Or, is it? Take a look at the subquery. Theres another c in there! This time it’s city! And it’s perfectly valid syntax. We actually have two identical aliases in the same query.
  • Query #3: If I could, I would name country with c as well. But I can’t. So why not throw in s again?
  • Query #3: and now I don’t even bother using the alias when accessing the create_date. Well, there’s no such column in any of the other tables!

Proper conventions

What I find so disturbing is that whenever I read a complex query, I need to go back and forth, back and forth between table aliases (found everywhere in the query) and their declaration point. Such irregularities make the queries difficult to read.

Any of the above issues could be justified. But I wish to make some suggestions:

  • Decide whether you’re going for upper or lower case.
  • Do not use the same alias twice in your query, even if it’s valid.
  • Aliases do not have to be single character. film_category may just as well be FC.
  • Do not alias something that is hard to interpret. s does not stand for country.
  • Think ahead: use same aliases throughout all your queries, as far as you can. If uniqueness is a problem, make for longer aliases. Use cust instead of c.

The above should make for more organized and readable SQL code. Remember: what one programmer finds as a very intuitive alias, is unintuitive to another!

My own convention

Simple: I only use aliases when using self joins. I am aware that queries are much longer what with long table names. I go farther than that: I prefer fully qualifying questionable columns throughout the query. Yes, it makes the query even longer.

I know this does not appeal to many. But there’s no confusion. And it’s easily searchable. And it’s consistent. And if properly formatted, as in the above queries, is well readable.

Now please join me in asking Oracle if they can add multi-line Strings for java, as there are for python.


PlanetMySQL Voting: Vote UP / Vote DOWN

Kontrollkit – new version is available for download!

Февраль 27th, 2010
Just a quick notice to let everyone know that there is a new version of Kontrollkit available. There are two new scripts included as well as some good updates to the my.cnf files. You can download the new version here: http://kontrollsoft.com/software-downloads kt-mysql-systemcheck – generates a report for point-in-time system status that is useful for troubleshooting MySQL [...]
PlanetMySQL Voting: Vote UP / Vote DOWN

Is emacs not coloring your Python comments?

Февраль 25th, 2010

This is a simple matter with a simple solution that might help someone save time and confusion. Emacs wasn’t coloring my comments correctly so I went ahead and had it change them to red-italic. If you are having similar issues you can drop the following into your home directory’s .emacs file. Enjoy. Keep in mind that if you are using emacs in a terminal session as opposed to the X-server gui then you will not see the italics.


(global-font-lock-mode 1)
(custom-set-variables
'(gud-gdb-command-name "gdb --annotate=1")
'(large-file-warning-threshold nil))
(custom-set-faces
'(font-lock-comment-face ((((class color) (background light)) (:foreground "red" :slant italic)))))


PlanetMySQL Voting: Vote UP / Vote DOWN

Is emacs not coloring your Python comments?

Февраль 25th, 2010

This is a simple matter with a simple solution that might help someone save time and confusion. Emacs wasn’t coloring my comments correctly so I went ahead and had it change them to red-italic. If you are having similar issues you can drop the following into your home directory’s .emacs file. Enjoy. Keep in mind that if you are using emacs in a terminal session as opposed to the X-server gui then you will not see the italics.


(global-font-lock-mode 1)
(custom-set-variables
'(gud-gdb-command-name "gdb --annotate=1")
'(large-file-warning-threshold nil))
(custom-set-faces
'(font-lock-comment-face ((((class color) (background light)) (:foreground "red" :slant italic)))))


PlanetMySQL Voting: Vote UP / Vote DOWN

A simple webpage test script in Python

Февраль 25th, 2010

Looking around on Google for a webpage test script returns a lot of results. Some of them are useful, some are not. In particular, for Python, the scripts on the first page of results are minimal and lacking a useful copy and paste / ready to go script that will answer the question “is my webpage available?”. So I decided to write a quick one that will give you the return code and email you as an alert if the page does not return with a 200 code (successful). You can find the script here.

If you are familiar with Python scripting, this script could easily be modified to post to a form so that you can test a MySQL transaction (or other transactional DB) to ensure your stack is running as needed.


#!/usr/bin/python
###########################################################
## Matt Reid / themattreid at gmail d o t com
## Site: http://themattreid.com
## Date: 2010-02-24
## Purpose: Checks URL for valid page and emails if failed.
###########################################################
###########################################################
## EDIT THE FOLLOWING AS NEEDED
email = "email@gmail.com" ##Email address to send alert to
host = "google.com" ##Hostname base URL without http://
port = "80" ##Port - 80 for http, 443 for https
url = "/index.html" ##Page to look for on host
SENDMAIL = "/usr/sbin/sendmail" ##Binary location for sendmail
## END OF EDITABLE OPTIONS
###########################################################
import sys
import os
from httplib import HTTP

def get(host,port,url):
concaturl = host+url
print "Checking Host:",concaturl,"\n"
h = HTTP(host, port)
h.putrequest('GET', url)
h.putheader('Host', host)
h.putheader('User-agent', 'python-httplib')
h.endheaders()

(returncode, returnmsg, headers) = h.getreply()
if returncode != 200:
print returncode,returnmsg
return returncode

else:
f = h.getfile()
# return f.read() #returns content of page for further parsing
print returncode, returnmsg
return returncode

if __name__ == '__main__':
concaturl = "http://"+host+url
state = get(host,port,url)

if(state != 200):
msg = "Link (%s) is broken or unavailable. Please reference return code: %s" % (concaturl,state)
to = "To: "+email+"\n"
print msg,"\n"
p = os.popen("%s -t" % SENDMAIL, "w")
p.write(to)
p.write("Subject: Demo Server Status\n")
p.write("\n")
p.write(msg)
sts = p.close()
if sts != 0:
print "Sendmail exit status", sts
else:
print "Link is good.\n"


PlanetMySQL Voting: Vote UP / Vote DOWN

A simple webpage test script in Python

Февраль 25th, 2010

Looking around on Google for a webpage test script returns a lot of results. Some of them are useful, some are not. In particular, for Python, the scripts on the first page of results are minimal and lacking a useful copy and paste / ready to go script that will answer the question “is my webpage available?”. So I decided to write a quick one that will give you the return code and email you as an alert if the page does not return with a 200 code (successful). You can find the script here.

If you are familiar with Python scripting, this script could easily be modified to post to a form so that you can test a MySQL transaction (or other transactional DB) to ensure your stack is running as needed.


#!/usr/bin/python
###########################################################
## Matt Reid / themattreid at gmail d o t com
## Site: http://themattreid.com
## Date: 2010-02-24
## Purpose: Checks URL for valid page and emails if failed.
###########################################################
###########################################################
## EDIT THE FOLLOWING AS NEEDED
email = "email@gmail.com" ##Email address to send alert to
host = "google.com" ##Hostname base URL without http://
port = "80" ##Port - 80 for http, 443 for https
url = "/index.html" ##Page to look for on host
SENDMAIL = "/usr/sbin/sendmail" ##Binary location for sendmail
## END OF EDITABLE OPTIONS
###########################################################
import sys
import os
from httplib import HTTP

def get(host,port,url):
concaturl = host+url
print "Checking Host:",concaturl,"\n"
h = HTTP(host, port)
h.putrequest('GET', url)
h.putheader('Host', host)
h.putheader('User-agent', 'python-httplib')
h.endheaders()

(returncode, returnmsg, headers) = h.getreply()
if returncode != 200:
print returncode,returnmsg
return returncode

else:
f = h.getfile()
# return f.read() #returns content of page for further parsing
print returncode, returnmsg
return returncode

if __name__ == '__main__':
concaturl = "http://"+host+url
state = get(host,port,url)

if(state != 200):
msg = "Link (%s) is broken or unavailable. Please reference return code: %s" % (concaturl,state)
to = "To: "+email+"\n"
print msg,"\n"
p = os.popen("%s -t" % SENDMAIL, "w")
p.write(to)
p.write("Subject: Demo Server Status\n")
p.write("\n")
p.write(msg)
sts = p.close()
if sts != 0:
print "Sendmail exit status", sts
else:
print "Link is good.\n"


PlanetMySQL Voting: Vote UP / Vote DOWN

Status report No.2 on SQLAlchemy and MySQL Connector/Python

Февраль 19th, 2010

Few days ago, the folks at SQLAlchemy pushed some proposed modification to the MySQL Connector/Python dialect. Before this patch, previous report yielded 72 errors and 11 failures. Now we got down to 9 errors, but the failures are still lingering. Is this an improvement? Yes and no, failures should go down, but there are some SQLAlchemy tests I just can't figure out, yet.. clues are welcome!

Here are some detailed results which also included MySQLdb and oursql. I used SQLAlchemy revision 6788 (i.e. from svn trunk) which has now 2143 unittests:

  • mysql.connector rev216: SKIP=1, errors=9, failures=11 (355.707s)
  • MySQLdb 1.2.3c1: SKIP=2, errors=8, failures=1 (315.884s)
  • oursql 0.9.1: SKIP=1, errors=8, failures=2 (322.318s)

Software used: MacOSX v10.6.2, MySQL v5.1.42 and Python v2.6.1.

Using SQLAlchemy might not be the best way to messure how mature MySQL Connector/Python is, but it sure helps lots.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL – the best stored routine is the one you don’t write

Февраль 18th, 2010
At Fosdem 2010, already two weeks ago, I had the pleasure of hearing Geert van der Kelen explain the work he has been doing on connecting MySQL and Python. I don't know anything about Python, but anybody that has the courage, perseverance and coding skills to create an implementation of the the MySQL wire protocol from scratch is a class-A programmer in my book. So, I encourage everyone that needs MySQL connectivity for Python programs to check out Geert's brainchild, MySQL Connector/Python.

In relation to MySQL Connector/Python, I just read a post from Geert about how he uses the MySQL information_schema to generate some Python code. In this particular case, he needs the data from the COLLATIONS table to maintain a data structure that describes all collations supported by MySQL.

For some reasons that I cannot fathom, Geert needed to generate a structure for each possible collation, not just the ones for which the COLLATIONS table contains a row. To do this, he wrote a stored procedure that uses a cursor to loop through the COLLATIONS table. In the loop, he detects it whenever there's a gap in the sequence of values from the ID column, and then starts a new loop to "fill the gaps". For each iteration of the outer cursor loop, a piece of text is emitted that conforms to the syntax of a Python tuple describing the collation, and each iteration of the inner loop generates the text None, a Python built-in constant.

The final result of the procedure is a snippet of Python code shown below (abbreviated):

..
("cp1251","cp1251_bulgarian_ci"), # 14
("latin1","latin1_danish_ci"), # 15
("hebrew","hebrew_general_ci"), # 16
None,
("tis620","tis620_thai_ci"), # 18
("euckr","euckr_korean_ci"), # 19
..


In the final code, these lines are themselves used to form yet another tuple:

desc = (
None,
("big5","big5_chinese_ci"), # 1
("latin2","latin2_czech_cs"), # 2
("dec8","dec8_swedish_ci"), # 3
("cp850","cp850_general_ci"), # 4
..


This is excellent use of the information schema! However, I am not too thrilled about using a stored routine for this. Enter my fosdem talk about refactoring stored routines.

In this case, performance is not really an issue, so I won't play that card. But many people that do need well-performing stored procedures might start out like Geert and write a cursor loop, and perhaps do some looping inside that loop. One of the big take-aways in my presentation is to become aware of the ways that you can avoid a stored procedure. Geerts procedure is an excellent candidate to illustrate the point. As a bonus, I'm adding the code that is necessary to generate the entire snippet, not just the collection of tuples inside the outer pair of parenthesis.

So, here goes:

set group_concat_max_len := @@max_allowed_packet;

select concat('desc = (',
group_concat('\n '
, if( collations.id is null, 'None',
concat('(', '"', character_set_name, '"',
',', '"', collation_name, '"', ')')
)
, if(ids.id=255, '', ','), ' #', ids.id
order by ids.id
separator ''
), '\n)'
)
from (select (t0.id<<0) + (t1.id<<1) + (t2.id<<2)
+ (t3.id<<3) + (t4.id<<4) + (t5.id<<5)
+ (t6.id<<6) + (t7.id<<7) id
from (select 0 id union all select 1) t0
, (select 0 id union all select 1) t1
, (select 0 id union all select 1) t2
, (select 0 id union all select 1) t3
, (select 0 id union all select 1) t4
, (select 0 id union all select 1) t5
, (select 0 id union all select 1) t6
, (select 0 id union all select 1) t7) ids
left join information_schema.collations on ids.id = collations.id;

This query works first by generating 256 rows having id's ranging from 0 to 255. (I think I recall Alexander Barkov mentioning that this is currently the maximum number of collations that MySQL supports - perhaps I am wronge there). This is done by cross-joining a simple derived table that generates two rows:

(select 0 id union all select 1)

So, one row that yields 0, and one that yields 1. By cross-joining 8 of these derived tables, we get 2 to the 8th power rows, which equals 256. In the SELECT-list, I use the left bitshift operator << to shift the original 0 and 1 0, 1, 2 and so on up to 7 positions. By then adding those values together, we fill up exactly one byte, and gain all possible values from 0 to 256:

(select (t0.id<<0) + (t1.id<<1) + (t2.id<<2)
+ (t3.id<<3) + (t4.id<<4) + (t5.id<<5)
+ (t6.id<<6) + (t7.id<<7) id
from (select 0 id union all select 1) t0
, ... t1
, ...
, (select 0 id union all select 1) t7) ids

Once we have this, the rest is straightforward - all we have to do now is use a LEFT JOIN to find any collations from the information_schema.COLLATIONS table in case the value of its ID column matches the value we computed with the bit-shifting jiggery-pokery. For the matching rows, we use CONCAT to generate a Python tuple describing the collation, and for the non-matching rows, we generate None:

if( collations.id is null, 'None',
concat('(', '"', character_set_name, '"',
',', '"', collation_name, '"', ')')
)

The final touch is a GROUP_CONCAT that we use to bunch these up into a comma separated list that is used as entries for the outer tuple. As always, you should set the value of the group_concat_max_len server variable to a sufficiently high value to hold the contents of the generated string, and if you want to be on the safe side and not run the risk of getting a truncated result, you should use max_allowed_packet.

I have the honour of speaking at the MySQL user conference, april 12-15 later this year. There, I will be doing a related talk called Optimizing MySQL Stored Routines. In this talk, I will explain how stored routines impact performance, and provide some tips on how you can avoid them, but also on how to improve your stored procedure code in case you really do need them.

PlanetMySQL Voting: Vote UP / Vote DOWN