Archive for the ‘null’ Category

Two subtle bugs in OUTER JOIN queries

Август 3rd, 2010

OUTER JOIN queries in SQL are susceptible to two very subtle bugs that I’ve observed a number of times in the real world. Daniel and I have been hammering out ways to automatically detect queries that suffer from these bugs, in a relatively new Maatkit tool called mk-query-advisor. It’s part of our series of advisor tools for MySQL. I wrote a blog post about it a while ago. Automated analysis of bad query patterns is a good thing to write tools to do, because catching buggy queries is hard work if you do it manually.

Let’s dive right in and analyze these subtle bugs. Warning: if you don’t understand how SQL handles NULL, you’re not going to understand the following. Many people have a hard time with NULL, which is why these bugs are so hard to understand and avoid. This is one reason why SQL is a hard language to use properly.

Bug 1: a column could be NULL for two reasons, and you can’t distinguish them

If the outer table in your query contains NULL-able columns, and you place a WHERE clause to filter out all but those rows, you’re going to get bugs because a non-matching row in the outer table will be all-NULL. Here’s an example. Let’s start with a plain outer join query:

select * from L left join R on l_id = r_id;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

Here we see that one row in the outer table is missing, and one row (the middle row) has a NULL r_other column. Now, let’s add a WHERE clause:

select * from L left join R on l_id = r_id where r_other is null;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

This query is buggy, because the two rows are returned for completely different reasons, and you can’t be sure which is which. IS NULL clauses can safely be placed on the columns used in the JOIN clause, but not on other columns in the outer table that might be NULL.

Bug 2: an OUTER JOIN is converted to INNER

If you place a non-null-safe comparison operator on any column in the outer table that isn’t part of the JOIN clause, you implicitly disable the outer-ness of the query and convert it to an INNER JOIN. Here’s an example:

select * from L left join R on l_id = r_id where r_other > 1;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
+------+------+---------+

The left-outer-ness of the above query is what causes the third row to be output in the first query I showed you above. The greater-than operator in this example automatically makes the left-ness impossible, because anytime there’s a row in the inner table that has no match in the outer table, it’ll be filled in with NULLs, and those NULLs will be eliminated by the operator. So the effect is that only matching rows will ever be output.

If you want to ponder variations and subtleties of the above, you can read more discussion on the issue report where we’re hammering out the details of automatically detecting and warning about these sneaky errors.

Related posts:

  1. How to simulate FULL OUTER JOIN in MySQL
  2. How to write a SQL exclusion join
  3. How to write SQL JOIN clauses more compactly
  4. The dangerous subtleties of LEFT JOIN and COUNT() in SQL
  5. How to write INSERT IF NOT EXISTS queries in standard SQL


PlanetMySQL Voting: Vote UP / Vote DOWN

The difference between a unique index and primary key in MySQL

Сентябрь 13th, 2009

There’s a really important difference between a unique index (MySQL’s answer to a “unique constraint”) and a primary key in MySQL. Please take a look at this:

CREATE TABLE `t` (
  `a` int,
  `b` int,
  `c` int,
  UNIQUE KEY `a` (`a`,`b`)
)

The combination of columns a, b should uniquely identify any tuple in the table, right?

select * from t;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    2 |    3 | 
| NULL | NULL |    1 | 
| NULL | NULL |    1 | 
| NULL | NULL |    1 | 
+------+------+------+

Wrong. Our arch-enemy NULL messes things up again:

A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. This constraint does not apply to NULL values except for the BDB storage engine. For other engines, a UNIQUE index allows multiple NULL values for columns that can contain NULL

MySQL doesn’t let you define a primary key over nullable columns, for this reason. This is as of version 4.0, I believe — I recall that in version 3.23 there was nothing special about a primary key; it was just a unique non-nullable index named PRIMARY.

Related posts:

  1. 5 ways to make hexadecimal identifiers perform better on MySQL One of the
  2. Extended covering indexes As you can
  3. 50 things to know before migrating Oracle to MySQL A while ba

Related posts brought to you by Yet Another Related Posts Plugin.


PlanetMySQL Voting: Vote UP / Vote DOWN