Archive for the ‘tips’ Category

Lost innodb tables, xfs and binary grep

Ноябрь 9th, 2010

Before I start a story about the data recovery case I worked on yesterday, here’s a quick tip – having a database backup does not mean you can restore from it. Always verify your backup can be used to restore the database! If not automatically, do this manually, at least once a month. No, seriously – in most of the recovery cases I worked on, customers did have some sort of backup, but it just wasn’t working, complete and what not. Someone set it up and never bothered to check if it still works after a while.

Anyway, this post is not really about the backups but rather about few interesting things I learned during last recovery case.

First, some facts about the system and how data was lost:

  • MySQL had a dedicated partition on XFS file system
  • Server was running innodb_file_per_table
  • There was a production master and two slaves, all had same setting
  • Developer accidentally ran DROP DATABASE X on the wrong machine (production master)
  • All slaves followed and dropped their copy of the data
  • The important tables were all InnoDB
  • Having a backup, customer has first attempted to restore from backup on the production master

Luckily (or rather, unfortunately) backup only had table definitions but not the data so no data was written to file system. Mind however that restoring a backup could have been fatal if it would have written some junk data as that would have overwritten the deleted files. Now, here’s what I learned while working on this case:

Recovering from XFS is possible. Just a month ago we had a team meeting in Mallorca where we went through various data loss scenarios. One of them was deleted files on xfs – we all agreed on few things:

  • recovering files from xfs is hard, if at all possible
  • we had no recovery cases on xfs, most likely because:
  • whoever is using xfs, is smart enough to have backups set up properly

Now I’m not picking on the customer or anything – indeed they did have a backup set up, it’s just that some (most important) tables weren’t backed up. We did not try any of the file recovery tools for xfs – apparently they are all targeting specific file types and sure enough InnoDB is not one of the supported files. What we did is we simply ran page_parser on the (already) unmounted file system treating it as a raw device. I was surprised how amazingly simple and fast it was (did you know that latest version of page_parser identifies pages by infimum and supremum records?) – 10G partition was scanned in like 5 minutes and all 4G of innodb pages were successfully written to a separate partition. That’s the easy part though – you run page parser, wait and see what you get.

If InnoDB Data Dictionary was not overwritten by an attempt to restore from the backup, actually second part would’ve been quite easy too, but it was so I could no longer identify correct PK id for specific tables by just mapping data dictionary table records to index records. Instead I had to grep for specific character sequences against all pages. Note however that only works for text in uncompressed text columns (varchar, char, text) but what if tables don’t have any text columns at all? Then, you read further.

GNU grep won’t match binary strings. This isn’t new, I kind of knew grep couldn’t look for binary “junk”, but I really needed it to. Why? Well, here’s few of the scenarios we’ve gone through yesterday:

1. There was this rather big table with integer and enum columns only, where we knew a rather unique PK, well something like 837492636 so we needed a way to find pages that match it. InnoDB would internally store integers in 4-bytes rather than 10 bytes if it were stored as a sequence of characters, so “grep -r 837492636 /dir” would not have worked.

2. There was another table, a small one with 4 smallint columns where all we could match on was a sequence of numbers from a single record – customer knew that there was at least one row with the following sequence: 7, 3, 7, 8. Matching by any of the numbers would be insane as it would match all of the pages while matching on numbers as a sequence of characters would not work for many reasons.

This is where I found bgrep which was exactly the tool for the task. In the case number one, I have just converted number 837492636 to it’s binary representation 0×31EB1F9C and ran “bgrep 31EB1F9C /dir” – there were only like 10 other matches across the 4 gigabytes of pages, some of them probably from the secondary pages, but when you only have that many pages to check it’s really simple.

Second case seemed somewhat complicated, but it really wasn’t – all of the columns were fixed size – 2bytes each, so the thing we had to look for was this sequence: 0007000300070008. I was expecting a lot of mismatches but in fact I ended up with only one match pointing exactly to the right page and so the right index id.

The other thing I would note about bgrep – it was so much faster than matching text using grep, so if you happen to have a lot of data to scan and you have to choose between matching text and number, matching a number using bgrep may work much better.

We are considering shipping bgrep as part of percona recovery toolset, with some additional converters so we can match against various date/time columns as well.


Entry posted by Aurimas Mikalauskas | 2 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

An argument for not using mysqldump

Ноябрь 8th, 2010

I have a 5G mysqldump which takes 30 minutes to restore from backup.  That means that when the database reaches 50G, it should take 30×10=5 hours to restore.  Right?  Wrong.

Mysqldump recovery time is not linear.  Bigger tables, or tables with more indexes will always take more time to restore.

If I restore from a raw backup (LVM snapshot, xtrabackup, innodb hot backup), it is very easy to model how much longer recovery time will take:

Backup is 80G
Copy is at 70MB/s.
10G is already complete.
= ((80-10) * 1024)/70/60 = ~17 minutes

I can tell progress with mysqldump by monitoring the rate at which show global status like 'Handler_write'; increases and compare it to my knowledge of about how many rows are in each table.  But progress != a magic number like “17 minutes”.  Not unless I do a lot of complex modeling.

I am not saying a 5 hour recovery is good or bad.  What I am saying is knowing remaining time is very important during disaster recovery.  Being able to say “we’ll be back at 2PM” is much better than saying “we’ll be back between 1PM and 4PM.. maybe”.


Entry posted by Morgan Tocker | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Sharing an auto_increment value across multiple MySQL tables

Октябрь 5th, 2010

The title is SEO bait – you can’t do it. We’ve seen a few recurring patterns trying to achieve similar – and I thought I would share with you my favorite two:

Option #1: Use a table to insert into, and grab the insert_id:

CREATE TABLE option1 (id int not null primary key auto_increment) engine=innodb;

# each insert does one operations to get the value:
INSERT INTO option1 VALUES (NULL);
# $connection->insert_id();

Option #2: Use a table with one just row:

CREATE TABLE option2 (id int not null primary key) engine=innodb;
INSERT INTO option2 VALUES (1); # start from 1

# each insert does two operations to get the value:
UPDATE option2 SET id=@id:=id+1;
SELECT @id;

So which is better? I don’t think it’s that easy to tell at a first glance, since option 2 does look more elegant – but if the next value is fetched as part of a transaction – I can see a potential for many other transactions to back up waiting on a lock (more on that in a second).

To start with a naive test, I booted two EC2 small instances in the same availability zone. Ping times are ~0.5ms between nodes. –skip-name resolve is enabled on the server. There is some skew from the machine being virtualized. My simulation is:

  1. Create a new sequence value
  2. insert into another table

The testing options are:

  • Option 1 – auto_commit each statement
  • Option 2 – auto_commit each statement
  • Option 1 – nest each loop in a transaction
  • Option 2 – nest each loop in a transaction

The raw results are:

option1 usetransactions =19 seconds for x10000 iterations.
option1 ignoretransactions = 13 seconds for x10000 iterations.
option2 usetransactions = 27 seconds for x10000 iterations.
option2 ignoretransactions =22 seconds for x10000 iterations.

Alright – option1 seems quicker. The problem is that to be like most applications, we can’t really tell until a little concurrency is applied. Using only the “transactional” test in a few more concurrency options:


(results are transactions/second – more is better).

Option #1 starts to flatten out after 4 threads – and this is probably just because the machine I am running it against has 1 CPU. Option #2 stays flat the whole time.. and while we are running it, most of the threads are perpetually in a state of ‘Searching rows for update’ – which is what I suspect is better described as waiting on a lock.

Option #2 will likely scale better in auto_commit, since locks are held for the duration of a transaction, but this is not always possible to do if you have already started modifying data before you need an auto_increment number but you do not want to commit yet.

Full disclosure


Entry posted by Morgan Tocker | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Book review : SQL Antipatterns

Сентябрь 23rd, 2010

SQL Antipatterns, by Bill Karwin


I remember that when I finished reading The Lord Of The Rings, I felt a pang of disappointment. "What? Already finished? What am I going to read now? What can give me the same pleasure and sense of accomplishment that these wonderful pages have given me?"
That's how I felt when I came to the last page of SQL Antipatterns. And, no, Bill Karwin doesn't tell imaginary tales from a fictitious world. This book is full of very real and very practical advice, but all the material is presented with such grace and verve that I could not put it down until the very end. I read it cover to cover in just a few hours, and I savored every page.

What is this Antipatterns, anyway? The title may deceive a casual bookshop browser into believing that it's about some philosophical database theory. Digging further, you realize that it's a collection of best practice in database handling, but told from the side of the bad practice. Every chapter starts with the story of a disaster, followed by an explanation of why it happened, and how to recognize the symptoms. Along the way, you learn what people do to fix the problem when they don't understand where exactly the trouble comes from. And then comes the best practice section, where you are told of several ways of fixing the problem for good.
It's a pleasure from the start. When I read the introductory stories, I recognized mistakes that I have witnessed in my past experience and some that I made myself. It was a fascinating sequence of "look at this mess!", "Here is what you shouldn't do", "It happens when ...", and "this is what you should do".
The more I read, the more I wanted to read. And after a few chapters, a pattern appeared among the Antipatterns. The stories are not a casual collection of database mistakes. There is a thread. Most of the stories tell you what happens when you violate the relational theory principles, and the best practice solutions focus on how to apply relational principles in real life.
The relationship between the horror stories and the relational database theory becomes apparent when you get to the appendix. In other books, you can often skip the appendixes as a reminder of something that you know already. Not here. If you skip this one, you will miss a sort of Ellery Queen ending, where the detective puts together all the pieces of the mystery for the amazed audience.

I feel I would do a disservice to Bill if I revealed more about the book. I may have already spoiled some of the surprise by disclosing the inner structure of the book. But certainly I won't give away any of the juicy stories that make this book an invaluable tool for every database professional.
I can only say this. When I read a technical book, I usually find something wrong, or debatable, or some technical solution that I could improve upon. Not in this book. I completely agree with the theory and the practice that is suggested from start to finish. My QA ego suffers at the idea of not having found anything wrong, but this slight disappointment is more than compensated by the pleasure of reading such a beautiful book. Well done, Bill! I am already queuing to buy your next one!

PlanetMySQL Voting: Vote UP / Vote DOWN

Instrumentation and the cost of Foreign Keys

Сентябрь 21st, 2010

I occasionally get in to light arguments healthy discussions with students about whether or not to use Foreign Key constraints on InnoDB tables.  My standard response has always been: “it depends on how much of a tradeoff you are willing to make for performance. In some situations the cost can be considerable”.

.. that’s when they expect me to “come up with some real proof” to show them. I do not disagree with their logic or proof being on their list-of-demands.  I support the use of data to make decisions.

The problem is that MySQL has (traditionally) been lacking the instrumentation required to make these decisions easy.  This is getting better  – here is an example we recently added to our InnoDB course:

 CREATE TABLE parent (
 id INT NOT NULL AUTO_INCREMENT,
 bogus_column char(32),
 PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE child (
 id INT NOT NULL AUTO_INCREMENT,
 parent_id INT NOT NULL,
 bogus_column char(32),
 PRIMARY KEY (id),
 KEY (parent_id),
 CONSTRAINT child_ibfk_1 FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;

INSERT INTO parent (bogus_column)
VALUES ('aaa'), ('bbb'), ('ccc'), ('ddd'), ('eee');

INSERT INTO child (parent_id,bogus_column) VALUES
(1, 'aaa'), (2,'bbb'), (3, 'ccc'),
(4, 'ddd'), (5, 'eee');

START TRANSACTION; # session1
START TRANSACTION; # session2

# session1
UPDATE child SET parent_id = 5
 WHERE parent_id = 4;

#session2
UPDATE parent SET bogus_column = 'new!' WHERE id = 4;

#session2
UPDATE parent SET bogus_column = 'new!' WHERE id = 5;

In the last statement, session2 will block waiting on a lock. Want to know where that lock is? Check information_schema.innodb_locks:

mysql> SELECT * FROM information_schema.innodb_locks\G
*************************** 1. row ***************************
    lock_id: 87035:1300:3:6
lock_trx_id: 87035
  lock_mode: X
  lock_type: RECORD
 lock_table: `test`.`parent`
 lock_index: `PRIMARY`
 lock_space: 1300
  lock_page: 3
   lock_rec: 6
  lock_data: 5
*************************** 2. row ***************************
    lock_id: 87034:1300:3:6
lock_trx_id: 87034
  lock_mode: S
  lock_type: RECORD
 lock_table: `test`.`parent`
 lock_index: `PRIMARY`
 lock_space: 1300
  lock_page: 3
   lock_rec: 6
  lock_data: 5
2 rows in set (0.00 sec)

The same example without the foreign key constraints does not block on the last statement.  We also see the expected output change to:

mysql> SELECT * FROM information_schema.innodb_locks;
Empty set (0.00 sec)

This information_schema table is new to InnoDB plugin.  In earlier releases of MySQL you may be able to get the data, but it is not in such an easily digestible form.  Instrumentation is the most under talked about feature in all new releases of MySQL and Percona Server.  See BUG #53336 for more examples of pure awesomeness.


Entry posted by Morgan Tocker | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

UDF -vs- MySQL Stored Function

Сентябрь 7th, 2010

Few days ago I was working on a case where we needed to modify a lot of data before pushing it to sphinx – MySQL did not have a function to do the thing so I thought I’ll write MySQL Stored Function and we’ll be good to go. It worked! But not so well really – building the index, which was taking 10 minutes, was now taking 16 minutes. Then we added another MySQL function for different set of attributes and indexing speed went from 16 minutes to 26 minutes. I knew using UDF would be faster, but I had no idea how much. Have you ever wondered?

So what were the modifications we needed? It was couple very simple things – (1) two varchar columns needed leading nonalpha characters trimmed, so “123 ^&* and some text” would become “and some text”, and (2) same two varchar columns needed some double characters changed to single one so “Picasso” becomes “Picaso”, “Wesselmann” becomes “Weselman” and so on. Why we needed that is a whole another store which this blog post is not about. Note however that only very small portion of data really needed to be modified.

Here are the two MySQL functions I wrote to do the job – ltrim_junk_mysql() and remove_dups_mysql(). Although processing single row seemed to be instantaneous, we needed to process much more than that – and that wasn’t as fast. For example, here’s how long it took to process 100k rows:

mysql> select ltrim_junk_mysql(author), ltrim_junk_mysql(title) from paintings limit 100000;
100000 rows in set (2.97 sec)

mysql> select remove_dups_mysql(author), remove_dups_mysql(title) from paintings limit 100000;
100000 rows in set (2.04 sec)

If you looked carefully at the second function though, you may have noticed I did not necessarily have to write a function, I could have written it as an SQL statement:

mysql> select
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( LOWER(author), 'aa', 'a'), 'bb', 'b'), 'cc', 'c'),
'dd', 'd'), 'ff', 'f'), 'gg', 'g'), 'll', 'l'), 'mm', 'm'), 'nn', 'n'),
'oo', 'o'), 'pp', 'p'), 'rr', 'r'), 'ss', 's'), 'tt', 't'), 'vv', 'v'),
'zz', 'z'),
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( LOWER(title), 'aa', 'a'), 'bb', 'b'), 'cc', 'c'),
'dd', 'd'), 'ff', 'f'), 'gg', 'g'), 'll', 'l'), 'mm', 'm'), 'nn', 'n'),
'oo', 'o'), 'pp', 'p'), 'rr', 'r'), 'ss', 's'), 'tt', 't'), 'vv', 'v'),
'zz', 'z') FROM paintings LIMIT 100000;
100000 rows in set (0.33 sec)

Doesn’t look nice, but it already executes more than 6 times faster which is interesting as it shows how much overhead you have by using mysql stored routines interface. So anyway, I asked my colleague Sasha to help me out by rewriting these as UDF functions. Here’s ltrim_junk() function and remove_dups(). Well, guess what:

mysql> select ltrim_junk(author), ltrim_junk(title) from paintings limit 100000;
100000 rows in set (0.13 sec)

mysql> select remove_dups(author), remove_dups(title) from paintings limit 100000;
100000 rows in set (0.17 sec)

So for ltrim_junk() function we got almost 23x improvement and for remove_dups – 12 times if comparing to stored function or 2 times comparing to just using available functions. With that speed I could even scan the whole table of 7 million records:

mysql> select count(*) from paintings where title != ltrim_junk(title);
+----------+
| count(*) |
+----------+
|   101533 |
+----------+
1 row in set (6.82 sec)

mysql> select count(*) from paintings where author != ltrim_junk(author);
+----------+
| count(*) |
+----------+
|    28335 |
+----------+
1 row in set (6.63 sec)

mysql> select count(*) from paintings where author != remove_dups(author) OR title != remove_dups(title);
+----------+
| count(*) |
+----------+
|  2720414 |
+----------+
1 row in set (11.19 sec)

Whereas using stored function used to take minutes!

I don’t mean to say stored functions are bad and you show now rewrite all your functions as UDFs – if you need to process just a few records for a request and you are not burning racks of CPUs to constantly do the job, the speed difference is really negligent. However in case like this one where we have to process many records constantly and every second counts, UDF can really save your day. If you need one and don’t feel confident writing C, you know who to call!


Entry posted by Aurimas Mikalauskas | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL GIS – Part 1

Август 24th, 2010

Tweet

In my business (weather) we use lots map based (Geo) information.  Almost every table has latitude and longitude. Working with this kind of data can be exciting and frustrating.  This should give you a quick start into GIS with MySQL.

“A geographic information system (GIS), or geographical information system, is any system that captures, stores, analyzes, manages, and presents data that are linked to location. In the simplest terms, GIS is the merging of cartography, statistical analysis, and database technology. GIS systems are used in cartography, remote sensing, land surveying, utility management, natural resource management, photogrammetry, geography, urban planning, emergency management, navigation, and localized search engines.”Wikipedia

GIS / Mapping Systems work with both text data and graphical data.  Applications and utilities often blur the lines between the two types and make understanding difficult.  Map servers blend raster images, with point or polygon data, and bitmap images to make complete images to display in the user’s client application.  For this post I will concentrate on the text type “data”.  The type we can index in a MySQL database.

THE SEARCH

After months of reading, [1]  I’m writing this post to describes what I have learned about how to get started using GEO coding data as quickly as possible.  I found very little piratical information on GIS and MySQL.  The MySQL manual covers the functions but doesn’t supply much practical information on GEO.  Anders Karlsson wrote a nice and short story about GIS that give me a good start.

The best information has be written by Florin Duroiu in his post titled “Political boundaries overlay in Google maps”. A good part of my post is based on his work.

STEP BY STEP

Below are the detailed needed to to produce a MySQL database with the Points of Interest (POI).  This is based on CentOS 5.5 with MySQL 5.1.

yum install gdal
mkdir geo
mkdir data
mkdir data/Oklahoma
cd geo/data/Oklahoma
wget http://downloads.cloudmade.com/north_america/united_states/oklahoma/oklahoma.shapefiles.zip
unzip oklahoma.shapefiles.zip
mysql -e 'create database geo'
ogr2ogr -f "MySQL" MySQL:"geo,user=root,host=localhost,password=" -nln oklahoma_poi -lco engine=MYISAM oklahoma_poi.shp
mysql geo -e 'desc oklahoma_poi'

+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| OGR_FID  | int(11)      | NO   | PRI | NULL    | auto_increment |
| SHAPE    | geometry     | NO   | MUL | NULL    |                |
| category | varchar(30)  | YES  |     | NULL    |                |
| name     | varchar(113) | YES  |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+

mysql geo -e "select name, category, Y(SHAPE) as lat, X(SHAPE) as lng  from oklahoma_poi where NAME like 'School:Putnam%'"

+-------------------------------------+--------------------------------+------------+-------------+
| name                                | category                       | lat        | lng         |
+-------------------------------------+--------------------------------+------------+-------------+
| School:Putnam City West High School | Government and Public Services |  35.492557 | -97.6605975 |
| School:Putnam City North School     | Government and Public Services | 35.5892209 | -97.6372648 |
| School:Putnam City School           | Government and Public Services | 35.5122794 | -97.6142079 |
| School:Putnam High School           | Government and Public Services | 35.5214459 | -97.6086523 |
| School:Putnam Heights Academy       | Government and Public Services | 35.5081143 | -97.5397619 |
+-------------------------------------+--------------------------------+------------+-------------+

In a coming set of post I’ll go over:

  • The “Data” types your will find and how to convert between them.
  • What data is available and where can you find it?
  • More examples on what you can do with GIS data.
  • Viewing our GIS data.
  • How to collect your own GIS data.
  • Good and bad examples of searching GIS data.
  • Optimizing MySQL GIS.  Is it really worth using?

[1] Books: GIS for DummiesAuthor: Michael N. DeMers – John Wiley & Sons (2009) – ISBN: 0470236825
Open Source GIS: A GRASS GIS Approach. Third Edition.
Author: Markus Neteler and Helena Mitasova – ISBN: 978-0-38735767-6
Web Mapping Illustrated: Using Open Source GIS Toolkits
Author: Tyler Mitchell – ISBN: 9780596008659




PlanetMySQL Voting: Vote UP / Vote DOWN

Why message queues and offline processing are so important

Август 11th, 2010

If you read Percona’s whitepaper on Goal-Driven Performance Optimization, you will notice that we define performance using the combination of three separate terms. You really want to read the paper, but let me summarize it here:

  1. Response Time – This is the time required to complete a desired task.
  2. Throughput – Throughput is measured in tasks completed per unit of time.
  3. Capacity – The system’s capacity is the point where load cannot be increased without degrading response time below acceptable levels.

Setting and meeting your response time goal should always be your primary focus, but the closer throughput is to capacity the worse response time can be.  It’s a trade-off! Cary Millsap reminds us to think of this just like how traffic slows down with more cars on a highway:


Photo Credit: photoAtlas

Which brings me to my point.

You can actually choose to optimize a system in two different ways – for response, or for throughput. When you optimize for throughput you are relaxing (not eliminating) your response time objectives in order to have more tasks completed per unit of time.

It is much easier to relax response time objectives if the task is not user facing, which is why I often see applications and suggest that they convert a task that happens in the foreground to instead by sent to a message queue, or Gearman.  Or in plain English: The same MySQL servers can achieve  much more work, if you allow the potential for each individual task to take a little bit longer.


Entry posted by Morgan Tocker | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Does Size or Type Matter?

Июль 27th, 2010

MySQL seems to be happy to convert types for you. Developers are rushed to complete their project and if the function works they just move on. But what is the costs of mixing your types? Does it matter if your are running across a million rows or more? Lets find out.

Here is what the programmers see.

mysql> select 1+1;
+-----+
| 1+1 |
+-----+
|   2 |
+-----+
1 row in set (0.00 sec)

mysql> select "1"+"1";
+---------+
| "1"+"1" |
+---------+
|       2 |
+---------+
1 row in set (0.00 sec)

Benchmark

What if we do a thousand simple loops?  How long does the looping itself take?

The BENCHMARK() function executes the expression expr repeatedly count times. It may be used to time how quickly MySQL processes the expression. The result value is always 0.

mysql> select benchmark(1000000000, 1);
+--------------------------+
| benchmark(1000000000, 1) |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (5.42 sec)

mysql> select benchmark(1000000000, "1" );
+-----------------------------+
| benchmark(1000000000, "1" ) |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (5.40 sec)

So maybe type doesn’t matter? About five seconds just to loop but the type didn’t change it.   What if we add 1+”1″?

mysql> select benchmark(1000000000, 1+1);
+----------------------------+
| benchmark(1000000000, 1+1) |
+----------------------------+
|                          0 |
+----------------------------+
1 row in set (12.65 sec)

mysql> select benchmark(1000000000, 1+"1");
+------------------------------+
| benchmark(1000000000, 1+"1") |
+------------------------------+
|                            0 |
+------------------------------+
1 row in set (35.58 sec)
mysql> select benchmark(1000000000, "1"+"1");
+--------------------------------+
| benchmark(1000000000, "1"+"1") |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (51.59 sec)

It looks like type does matter.  But does it always matter?

mysql> select benchmark(1000000000, sum(1+1));
+---------------------------------+
| benchmark(1000000000, sum(1+1)) |
+---------------------------------+
|                               0 |
+---------------------------------+
1 row in set (9.69 sec)

mysql> select benchmark(1000000000, sum("1"+"1"));
+-------------------------------------+
| benchmark(1000000000, sum("1"+"1")) |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
1 row in set (9.94 sec)

mysql> select benchmark(1000000000, sum("1.23456789"+"1.23456789"));
+-------------------------------------------------------+
| benchmark(1000000000, sum("1.23456789"+"1.23456789")) |
+-------------------------------------------------------+
|                                                     0 |
+-------------------------------------------------------+
1 row in set (10.32 sec)

So, not all functions are the same.  But it looks like size might matter!

mysql> select benchmark(1000000000, 1.1+1.1);
+--------------------------------+
| benchmark(1000000000, 1.1+1.1) |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (34.90 sec)

mysql> select benchmark(1000000000, "1.1"+"1.1");
+------------------------------------+
| benchmark(1000000000, "1.1"+"1.1") |
+------------------------------------+
|                                  0 |
+------------------------------------+
1 row in set (1 min 15.32 sec)

mysql> select  benchmark(1000000000, "1.123456789"+"1.123456789");
+----------------------------------------------------+
| benchmark(1000000000, "1.123456789"+"1.123456789") |
+----------------------------------------------------+
|                                                  0 |
+----------------------------------------------------+
1 row in set (1 min 53.32 sec)

Sorry.  Looks like size does matter.
This doesn't seem logical.

mysql> select benchmark(1000000000, 1=1);
+----------------------------+
| benchmark(1000000000, 1=1) |
+----------------------------+
|                          0 |
+----------------------------+
1 row in set (12.75 sec)

mysql> select benchmark(1000000000, 1="1");
+------------------------------+
| benchmark(1000000000, 1="1") |
+------------------------------+
|                            0 |
+------------------------------+
1 row in set (40.78 sec)
mysql> select benchmark(1000000000, 1=true);
+-------------------------------+
| benchmark(1000000000, 1=true) |
+-------------------------------+
|                             0 |
+-------------------------------+
1 row in set (12.73 sec)

mysql> select benchmark(1000000000, 1="true");
+---------------------------------+
| benchmark(1000000000, 1="true") |
+---------------------------------+
|                               0 |
+---------------------------------+
1 row in set, 65535 warnings (3 min 5.72 sec)
mysql> select benchmark(1000000000, "true"="true");
+--------------------------------------+
| benchmark(1000000000, "true"="true") |
+--------------------------------------+
|                                    0 |
+--------------------------------------+
1 row in set (57.25 sec)

Maybe we should CAST our work?

mysql> select benchmark(1000000000, cast("1" as unsigned));
+----------------------------------------------+
| benchmark(1000000000, cast("1" as unsigned)) |
+----------------------------------------------+
|                                            0 |
+----------------------------------------------+
1 row in set (32.27 sec)

mysql> select benchmark(1000000000, cast("1" as unsigned) + cast("1" as unsigned));
+----------------------------------------------------------------------+
| benchmark(1000000000, cast("1" as unsigned) + cast("1" as unsigned)) |
+----------------------------------------------------------------------+
|                                                                    0 |
+----------------------------------------------------------------------+
1 row in set (1 min 7.24 sec)

Maybe not!
Conclusion:  Be careful with your data types.  If you are taking user input, do the type conversion ONCE in your program.  Don’t let MySQL do the type conversions for you.
query = “SELECT * FROM table where $INPUT = 1″;   could be doing your wrong.

References:

http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_benchmark

http://dev.mysql.com/doc/refman/5.0/en/numeric-type-overview.html

http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html

PlanetMySQL Voting: Vote UP / Vote DOWN