Archive for the ‘SQL’ Category

Have you tested pt-online-schema-change?

Май 22nd, 2012

I’ve been seeing a lot of interest in pt-online-schema-change (nonblocking MySQL schema changes), with a lively discussion on the mailing list (which I think I’m not keeping up with…) and a couple of bug reports filed. I’m really interested whether people have tested it rigorously to ensure that it maintains your data integrity. I have done so, and there is a set of tests for it in the codebase, but nothing replaces real-world testing. If you find any problems or have questions, please address them to the percona-discussion Google Group.

Further Reading:


PlanetMySQL Voting: Vote UP / Vote DOWN

Manage hierarchical data with MySQL stored procedures

Май 21st, 2012
Below you will find all code to create the table and the stored procedures to manage hierarchical trees in MySQL. All code is documented and can be downloaded in a zip file.
PlanetMySQL Voting: Vote UP / Vote DOWN

Manage hierarchical data with MySQL stored procedures

Май 21st, 2012
Below you will find all code to create the table and the stored procedures to manage hierarchical trees in MySQL. All code is documented and can be downloaded in a zip file.
PlanetMySQL Voting: Vote UP / Vote DOWN

Joins: inner, outer, left, right

Май 17th, 2012

In (My)SQL, join is a means for combining records from two tables into a single set which can be either returned as is or used in another join. In order to perform the operation a join has to define the relationship between records in either table, as well as the way it will evaluate the relationship. The relationship itself is created through a set of conditions that are part of the join and usually are put inside ON clause. The rest is determined through a join type, which can either be an inner join or an outer join.

The SQL clauses that set the respective join type in a query are [INNER] JOIN and {LEFT | RIGHT} [OUTER] JOIN. As you can see the actual keywords INNER and OUTER are optional and can be omitted, however outer joins require specifying the direction – either left or right.

Examples of queries:
SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.name = 1;
SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 1;

Inner join, outer join

The primary difference between the two basic types (each has several subtypes) is in making the decision whether joining of two rows was successful or not, which essentially determines whether the combined row can be returned or not.

Inner joins require that a row from the first table has a match in the second table based on the join conditions. In means that the first query from the example above will only return any rows if files table contains at least one record where owner_id is 1 (has to be equal to users.id by the join conditions and users.id is filtered in WHERE to accept only that one value). Otherwise it will return no rows at all, even if users contains a valid user record. Assuming there are two users, but only one has any files:

mysql> SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.id = 1;
+----+--------------+---------+----+----------+------------------+
| id | name         | enabled | id | owner_id | filename         |
+----+--------------+---------+----+----------+------------------+
|  1 | Albin Kolano |       1 |  1 |        1 | audit report.doc |
+----+--------------+---------+----+----------+------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.id = 2;
Empty set (0.01 sec)

Outer joins, on the other hand, consider a join successful even if no records from the second table meet the join conditions (i.e. whether there are any matches or not). In such case outer join sets all values in the missing columns to NULL. The second query from the example will return rows whenever there are matches in users and regardless of the contents of files table.

mysql> SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 1;
+----+--------------+---------+------+----------+------------------+
| id | name         | enabled | id   | owner_id | filename         |
+----+--------------+---------+------+----------+------------------+
|  1 | Albin Kolano |       1 |    1 |        1 | audit report.doc |
+----+--------------+---------+------+----------+------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 2;
+----+-------------------+---------+------+----------+----------+
| id | name              | enabled | id   | owner_id | filename |
+----+-------------------+---------+------+----------+----------+
|  2 | Nadzieja Surowiec |       1 | NULL |     NULL | NULL     |
+----+-------------------+---------+------+----------+----------+
1 row in set (0.00 sec)
Left join, right join

Unlike inner joins, outer joins require that the join direction is specified. Inner join is a symmetrical and bi-directional relationship, which means A JOIN B produces the same result as B JOIN A. That is not true for outer joins, because they accept when for a record in A there is no matching record in B and in such case the reverse operation is impossible (it would have to start with the non-existing record in B). This is the reason why setting the direction is necessary. A LEFT JOIN B finds matches for rows from table A in table B, while A RIGHT JOIN B finds matches for records from B in A.

In practice there is very little or even no real purpose for using RIGHT JOIN and in majority of cases everyone just sticks to using LEFT JOIN if they ever need outer join.

When does the join type matter?

Choosing the appropriate type depends on the logic you are trying to implement.

You have to use inner join when mandatory pieces of information are located in both tables and partial information is considered incomplete or even useless. The case of this could be listing user's files based on the earlier example:

mysql> SELECT *
    -> FROM   users
    ->        JOIN files
    ->        ON     files.owner_id = users.id
    -> WHERE  users.name = 'Nadzieja Surowiec'
    ->        AND users.enabled = 1;
Empty set (0.00 sec)

The query finds the user's record by in users table and verifies that they are allowed to use the service through the value of users.enabled column and then searches for their files in files table. If there are no matches in either table, the query does not return any result, which is the correct behavior. If outer join was used in this case, a useless partial result could be returned or even incorrect result:

mysql> SELECT *
    -> FROM   users
    ->        LEFT JOIN files
    ->        ON     files.owner_id = users.id
    -> WHERE  users.name = 'Nadzieja Surowiec'
    ->        AND users.enabled = 1;
+----+-------------------+---------+------+----------+----------+
| id | name              | enabled | id   | owner_id | filename |
+----+-------------------+---------+------+----------+----------+
|  2 | Nadzieja Surowiec |       1 | NULL |     NULL | NULL     |
+----+-------------------+---------+------+----------+----------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(1)
    -> FROM   users
    ->        LEFT JOIN files
    ->        ON     files.owner_id = users.id
    -> WHERE  users.name = 'Nadzieja Surowiec'
    ->        AND users.enabled = 1;
+----------+
| COUNT(1) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

The application relying on such queries not only would not be able to make anything out of such file information where file data is all set to NULL values, but also it would have to include additional and in fact redundant logic to filter out such results. The row count in this case is correct, as the query returned a single row, but it does not represent how many files the user has, so it is not a valid information that the application could use.

Outer join must be used to perform a join with a table, which holds information that is only optional for the result.

In our example we are working with a query that lists user's files and we already established that the join between users and files has to be inner join. But let's give our users the opportunity to choose a custom icon for any file if they want to. The information could be kept in a separate table called file_icon.

Now, for each listed file we also want to see if user has set a custom icon for that file and return the icon name if they have set it. The icon information is entirely optional, so we want the query to return rows regardless of whether there is an entry for the given file in file_icon or not. Therefore we have to use outer join for this particular task.

mysql> SELECT *
    -> FROM   users
    ->        JOIN files
    ->        ON     files.owner_id = users.id
    ->        LEFT JOIN file_icon
    ->        ON     file_icon.file_id = files.id
    -> WHERE  users.name = 'Albin Kolano'
    ->        AND users.enabled = 1;
+----+--------------+---------+----+----------+------------------+---------+------------------+
| id | name         | enabled | id | owner_id | filename         | file_id | icon_image       |
+----+--------------+---------+----+----------+------------------+---------+------------------+
|  1 | Albin Kolano |       1 |  1 |        1 | audit report.doc |       1 | MS-Word-Icon.png |
|  1 | Albin Kolano |       1 |  2 |        1 | stats-201104.xls |    NULL | NULL             |
+----+--------------+---------+----+----------+------------------+---------+------------------+

The outer join allowed us to grab the complete list of user's files and along with their icons if any were set. If we used inner join instead, the result would be missing the record of stats-201104.xls file.


PlanetMySQL Voting: Vote UP / Vote DOWN

How Percona Toolkit divides tables into chunks

Май 6th, 2012

The tools we’ve redesigned in Percona Toolkit recently have moved away from a legacy technique for operating on small numbers of rows at a time, towards a more reliable and predictable method. We call the old version “chunking” and the new version “nibbling.” Many other MySQL tools I’ve seen either operate on entire tables, or use the “chunking” technique and are exposed to the problems it creates. I’ll compare the two briefly to explain the differences.

Chunking attempts to divide a table into ranges of rows of a desired size, such as 1000 rows. It does this by examining the minimum and maximum value of the primary key (or other suitable index), estimating the number of rows in the table, and dividing one by the other to create a list of boundary values. Suppose that the minimum value is 1 and the maximum is 1000000, and there are an estimated 100000 rows in the table. The chunk boundaries will fall on intervals of 10000. We can operate on 1-10000, 10001-20000, and so on.

This has a number of problems that might not be obvious at first. It practically requires a numeric, single-column index[1]. It can (and does, in practice) create oversized chunks when values are sparse in some places and packed tightly in others. It leads to a lot of code complexity and bugs when the table’s data changes (especially if new rows are inserted) as the tool works. It has edge cases related to special values such as 0, NULL, the date ’0000-00-00′, the beginning of the table, and the end of the table. In short, through years of experience we found that it simply doesn’t work well enough. It works in 95% of cases, but not all that well, and in a small fraction of cases it causes serious problems, such as a massively oversized chunk that interferes with other processes on the server. If you’d like more details on these types of problems, there is a lot of information in old bug reports.

The new “nibbling” technique we are using is actually not new at all. It is something I learned before I was even involved in MySQL very much. The context is in some old blog posts I wrote about archiving and purging. The idea is to fetch a row and use that as the lower bound of the first chunk (or “nibble” — we use the terms pretty interchangeably) of rows. Then use a SELECT with a LIMIT to find the upper boundary of the nibble, as well as the lower bound of the next nibble. After processing the nibble, repeat the steps with the lower bound of the next nibble. The disadvantage of this process is that there is quite a bit of code complexity when you get multi-column indexes, with NULL-able columns adding a whole new twist to the game. However, this is long since solved, and we have a reliable and well-tested library of routines for dealing with this. In return we get predictable behavior on practically any table with an index, even if it isn’t a unique index. The only remaining edge case is when a non-unique index contains a range of identical values that is larger than the desired chunk size, but that is easily detected and handled in application-specific ways.

The result of the work we’ve been doing recently, replacing “chunking” with “nibbling” in pt-table-checksum and pt-online-schema-change, is reliable and safe nibbling behavior. This has the further benefit of allowing us to do much more sophisticated techniques, such as dynamically varying the size of each chunk of rows in response to changing conditions such as server load or varying row size and complexity. Our recent tools are designed to target a predictable query time for each nibble, rather than a specific number of rows. I am happy to report that, in extensive real-world usage, they are able to stay extremely close to the target time even as conditions vary dramatically.

[1] Although in theory you can operate on the first column in a multi-column index, real-world experience shows that the first column in such indexes tends to have low cardinality, thus creating enormous chunks. And although it is possible to treat date, timestamp, and some other types as numeric, in practice it is very difficult. What number corresponds to 0000-00-00? Our attempts to create algorithms that would work on non-numeric types such as character-based columns were tremendously difficult and the results were discouraging; again, in the real world it doesn’t work well.

Further Reading:


PlanetMySQL Voting: Vote UP / Vote DOWN

Free webinar: non-blocking MySQL schema changes

Май 2nd, 2012

Please join me tomorrow (Wednesday) for a webinar about the new version of pt-online-schema-change, Percona Toolkit’s tool for performing nonblocking ALTER TABLE operations. Registration is free. I will explain how the tool works and cover some typical use scenarios. This might be helpful if you’re trying to decide whether the tool is right for you. I am also planning to leave plenty of time for questions!

Further Reading:


PlanetMySQL Voting: Vote UP / Vote DOWN

Free webinar: non-blocking MySQL schema changes

Май 2nd, 2012

Please join me tomorrow (Wednesday) for a webinar about the new version of pt-online-schema-change, Percona Toolkit’s tool for performing nonblocking ALTER TABLE operations. Registration is free. I will explain how the tool works and cover some typical use scenarios. This might be helpful if you’re trying to decide whether the tool is right for you. I am also planning to leave plenty of time for questions!

Further Reading:


PlanetMySQL Voting: Vote UP / Vote DOWN

Books vs. e-Books for DBA’s

Апрель 27th, 2012
As most people still do I learned to read using books. WhooHoo!

Books are nice. Besides reading them they are also a nice decoration on your shelf. There is a brilliant TED talk by Chip Kidd on this subject.

But sometimes books have drawbacks. This is where I have to start the comparison with vinyl records (Yes, you're still reading a database oriented blog). Vinyl records look nice and are still being sold and yes I also still use them. The drawback is that car dealers start to look puzzeled if you ask them if your new multimedia system in your car is able to play your old Led Zeppelin records. The market for portable record players is small, and that's for a good reason.

The problem with books about databases is that they get old very soon. The MySQL 5.1 Cluster Certification Study Guide was printed by lulu.com which made it possible to quickly update the material. This made sure that the material wasn't outdated when you bought it.

I like to use books as reference material, but I tend to use Google more often and the books stay on the bookshelf and are getting old and dusty. One of the reasons for this is that taking books with me just for reference is not an option judging by the weight of it.

At Percona Live UK I got a voucher from O'reilly to get a free e-Book. So I chose 'SQL and Relational Theory'. I started to read it on my laptop with FBreader and on my iPhone using Stanza. Both my phone and laptop are not really made for reading. So I bought an Sony Reader, which is made for reading.

Reading 'SQL and Relational Theory' on the Sony Reader is nice. The only annoyance is that the examples are like this:
SELECT COUNT(*) | SELECT COUNT(col1)
FROM tbl1       | FROM tbl1
And with line wrapping it looks like this:
SELECT COUNT(*) | 
SELECT COUNT(col1)
FROM tbl1       | 
FROM tbl1
Which is not very readable.
The book is very theoretical as you might expect, but nonetheless it's a very good read.

The Sony Reader is not very suitable for reading whitepapers in PDF format  as most whitepapers are in A4 or Letter format which is too big for the device. Of course software like Calibre can covert some of those.
(Oracle, Percona, others… please also publish your whitepapers in a format more suitable for an eReader)

The device itself is very nice. The battery time and e-Ink display are good (especially if you compare them with an tablet).

Unfortunately it doesn't increase my reading speed and it doesn't give me more time to read.

I'm looking forward to read some other database books in e-Book format.  I think the next one on my list is High Performance MySQL.

I planned to publish this post when I finished reading SQL and Relation Theory, but I thought now might be a better time as O'Reilly has a discount for that book and other books by C.J. Date.

The Sony Reader runs a modified Android (yes it's possible to root it to play angry birds on it). It also has a webbrowser, but itsn't well suited for reading Planet MySQL or Planet MariaDB. Using the webbrowser to download the MP3 for the OurSQL Podcast and then playing it does work flawlessly. I tried to download the EPUB file for the MySQL Reference Manual, but that failed so I used USB for that.

PlanetMySQL Voting: Vote UP / Vote DOWN

Books vs. e-Books for DBA’s

Апрель 27th, 2012
As most people still do I learned to read using books. WhooHoo!

Books are nice. Besides reading them they are also a nice decoration on your shelf. There is a brilliant TED talk by Chip Kidd on this subject.

But sometimes books have drawbacks. This is where I have to start the comparison with vinyl records (Yes, you're still reading a database oriented blog). Vinyl records look nice and are still being sold and yes I also still use them. The drawback is that car dealers start to look puzzeled if you ask them if your new multimedia system in your car is able to play your old Led Zeppelin records. The market for portable record players is small, and that's for a good reason.

The problem with books about databases is that they get old very soon. The MySQL 5.1 Cluster Certification Study Guide was printed by lulu.com which made it possible to quickly update the material. This made sure that the material wasn't outdated when you bought it.

I like to use books as reference material, but I tend to use Google more often and the books stay on the bookshelf and are getting old and dusty. One of the reasons for this is that taking books with me just for reference is not an option judging by the weight of it.

At Percona Live UK I got a voucher from O'reilly to get a free e-Book. So I chose 'SQL and Relational Theory'. I started to read it on my laptop with FBreader and on my iPhone using Stanza. Both my phone and laptop are not really made for reading. So I bought an Sony Reader, which is made for reading.

Reading 'SQL and Relational Theory' on the Sony Reader is nice. The only annoyance is that the examples are like this:
SELECT COUNT(*) | SELECT COUNT(col1)
FROM tbl1       | FROM tbl1
And with line wrapping it looks like this:
SELECT COUNT(*) | 
SELECT COUNT(col1)
FROM tbl1       | 
FROM tbl1
Which is not very readable.
The book is very theoretical as you might expect, but nonetheless it's a very good read.

The Sony Reader is not very suitable for reading whitepapers in PDF format  as most whitepapers are in A4 or Letter format which is too big for the device. Of course software like Calibre can covert some of those.
(Oracle, Percona, others… please also publish your whitepapers in a format more suitable for an eReader)

The device itself is very nice. The battery time and e-Ink display are good (especially if you compare them with an tablet).

Unfortunately it doesn't increase my reading speed and it doesn't give me more time to read.

I'm looking forward to read some other database books in e-Book format.  I think the next one on my list is High Performance MySQL.

I planned to publish this post when I finished reading SQL and Relation Theory, but I thought now might be a better time as O'Reilly has discount for that book and other books by C.J. Date.

The Sony Reader runs a modified Android (yes it's possible to root it to play angry birds on it). It also has a webbrowser, but itsn't well suited for reading Planet MySQL or Planet MariaDB. Using the webbrowser to download the MP3 for the OurSQL Podcast and they playing it does work flawlessly. I tried to download the EPUB file for the MySQL Reference Manual, but that failed so I used USB for that.


  



PlanetMySQL Voting: Vote UP / Vote DOWN

Interest building in Percona XtraDB Cluster

Апрель 26th, 2012

In the last few weeks I’ve been caught off guard by the number of people who’ve told me they have been evaluating Percona XtraDB Cluster (link), and even more surprised at the projects they’re considering it for. Yesterday alone I spoke to several people who have been evaluating it for large, mission-critical enterprise deployments. Some new, some to replace existing systems that use standard MySQL replication. What was interesting is that some people said they’ve been putting it through its paces for months — before we even released it as GA.

Another person said he was evaluating it and tried a bunch of things like killing nodes, and it “just worked.” He sounded like he’d been suspicious: had it REALLY worked? But then, on further investigation, he was able to confirm that yes, it had just worked. The node went away; the cluster as a whole was healthy and happy.

It’s funny how you can get a feeling about the momentum on a product or idea or event. It probably builds upon the little things, like tone of voice or expressions on peoples’ faces. In any case I have this gut feeling about Percona XtraDB Cluster: it’s about to happen.

Further Reading:


PlanetMySQL Voting: Vote UP / Vote DOWN