Archive for the ‘Syntax’ Category

Syntax of the day: IS TRUE and IS FALSE

Январь 26th, 2012

What makes for a true statement?

We usually test statements using a WHERE clause:

SELECT * FROM world.City WHERE Population > 1000000

The "Population > 1000000" statement makes for a boolean expression. Using WHERE is just one way of evaluating it. One can also test with IF():

SET @val := 7;
SELECT IF(@val > 2, 'Yes', 'No')

TRUE and FALSE

The two are keywords. They also map for the numerals 1 and 0, as follows:

mysql> SELECT TRUE, FALSE;
+------+-------+
| TRUE | FALSE |
+------+-------+
|    1 |     0 |
+------+-------+

Like in the C programming language, a nonzero value evaluates to a true value. A zero evaluates to false. A NULL evaluates to... well, NULL. But aside from 3-valued logic, what's important in our case is that it is not true.

However, simple value comparison is incorrect:

mysql> SELECT @val, @val > 3, @val > 3 = TRUE as result;
+------+----------+--------+
| @val | @val > 3 | result |
+------+----------+--------+
|    7 |        1 |      1 |
+------+----------+--------+

mysql> SELECT @val, @val = TRUE as result;
+------+--------+
| @val | result |
+------+--------+
|    7 |      0 |
+------+--------+

To test for the truth value of an expression, the correct syntax is by using IS TRUE:

SELECT @val, @val IS TRUE as result;
+------+--------+
| @val | result |
+------+--------+
|    7 |      1 |
+------+--------+

Likewise, one may use IS FALSE to test for falsehood. However, if you wish to note NULL as a false value this does not work:

SELECT @empty, @empty IS TRUE, @empty IS FALSE;
+--------+----------------+-----------------+
| @empty | @empty IS TRUE | @empty IS FALSE |
+--------+----------------+-----------------+
| NULL   |              0 |               0 |
+--------+----------------+-----------------+

If you're unsure why, you should read more on three-valued logic in SQL. To solve the above, simply use IS NOT TRUE:

SELECT @empty, @empty IS NOT TRUE;
+--------+--------------------+
| @empty | @empty IS NOT TRUE |
+--------+--------------------+
| NULL   |                  1 |
+--------+--------------------+

In summary, use IS TRUE and IS NOT TRUE so as to normalize truth values into a 0, 1 value range, C style, including handling of NULLs.


PlanetMySQL Voting: Vote UP / Vote DOWN

Quoting text JavaScript/Python style

Ноябрь 15th, 2011

Unless your MySQL is configured to use ANSI_QUOTES in sql_mode, you are able to quote your text in one of two forms: using single quotes or double quotes:

UPDATE world.Country SET HeadOfState = 'Willy Wonka' WHERE Code='USA'
UPDATE world.Country SET HeadOfState = "Willy Wonka" WHERE Code="USA"

This makes for JavaScript- or Python-style quoting: you quote by your needs. Say you have a text which includes single quotes:

It is what you read when you don't have to that determines what you will be when you can't help it. - Oscar Wilde

You wish to insert this text to some tables. You could go through the trouble of escaping it:

INSERT INTO quotes (quote, author) VALUES (
  'It is what you read when you don\'t have to that determines what you will be when you can\'t help it.', 'Oscar Wilde');

or you could just wrap it in double quotes:

INSERT INTO quotes (quote, author) VALUES (
  "It is what you read when you don't have to that determines what you will be when you can't help it.", 'Oscar Wilde');

I find this useful when using SQL to generate queries. Take, for example, eval() for MySQL: the statement:

CALL eval('select concat(\'KILL \',id) from information_schema.processlist where user=\'webuser\'');

is just so more easily written this way:

CALL eval("select concat('KILL ',id) from information_schema.processlist where user='webuser'");

I don't suggest one should use this method throughout her application code. Application code works great with auto-escaping string literals. But for the handy DBA or developer, who needs to work some quick queries by hand, this makes for an easier syntax to use.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL terminology: processes, threads & connections

Ноябрь 3rd, 2010

There’s some confusion in the MySQL terminology for processes, threads & connections, which I will try to resolve. I can’t explain the full nature of what processes and threads are; please see Wikipedia [1] [2] for that. But here’s some basics with regard to MySQL:

  • MySQL server is a single process application.
  • It is multithreaded.
  • It (usually) acts as a TCP/IP server, accepting connections.
  • Each connection gets a dedicated thread.
  • These threads are sometimes named processes, and sometimes they’re referred to as connections.

The last part is where confusion arises, so let me discuss again the use of threads and connections in MySQL.

MySQL truly is a single process server. It is multi threaded, in that there are many obvious and less obvious threads comprising the server. Such threads are the InnoDB I/O threads, the DELAYED INSERT thread, etc. Oh, and of course: the connection threads. More on this in a short while.

On older Linux versions or on glibc-static versions, one may view MySQL as a multi-process server. This is not so: it is merely because threads are mapped to OS processes. For the sake of this discussion this is irrelevant. mysqld is a single process.

So, every new connection gets its own thread. Assuming no thread pool is in use, every new connection makes for the creation of a new thread, and a disconnect causes for that thread’s destruction. Hence, there is a 1-1 mapping between connections and active threads. But then, there is a thread pool, which means there can be threads which are not associated with any connection. So, the number of threads is greater than or equal to the number of connections.

Here’s where terminology gets confusing. When you want to see what’s executing on the server, you issue SHOW PROCESSLIST:

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 4
   User: root
   Host: localhost
     db: mycheckpoint
Command: Query
   Time: 0
  State: NULL
   Info: SHOW PROCESSLIST
1 row in set (0.02 sec)

Perhaps this should have been called SHOW THREADLIST; the acting queries are not really processes.

OK, so there’s process #4 which is executing a query. What’s my process id? Turns out I don’t have a process id. I do get to have a CONNECTION_ID():

mysql> SELECT CONNECTION_ID();
+-----------------+
| CONNECTION_ID() |
+-----------------+
|               4 |
+-----------------+

So how many processes or connections are now actually doing anything? We now must check for ‘Threads_running’.

mysql> SHOW GLOBAL STATUS LIKE 'Threads_running';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Threads_running | 1     |
+-----------------+-------+

And so we have ‘Threads_cached’, ‘Threads_connected’ & ‘Max_used_connections’.

Confusing?

Most of the time one can simply think of processes, threads and connections as 1-1-1 mapped, and not bother with it.


PlanetMySQL Voting: Vote UP / Vote DOWN

SQL: good comments conventions

Июль 1st, 2010

I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code.

I list four major places where SQL comments are helpful. I’ll use the sakila database. It is originally scarcely commented; I’ll present it now enhanced with comments, to illustrate.

Table definitions

The CREATE TABLE statement allows for a comment, intended to describe the nature of the table:

CREATE TABLE `film_text` (
 `film_id` smallint(6) NOT NULL,
 `title` varchar(255) NOT NULL,
 `description` text,
 PRIMARY KEY (`film_id`),
 FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Reflection of `film`, used for FULLTEXT search.'

It’s too bad the comment’s max length is 60 characters, though. However, it’s a very powerful field.

Column definitions

One may comment particular columns:

CREATE TABLE `film` (
 `film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
 `title` varchar(255) NOT NULL,
 `description` text,
 `release_year` year(4) DEFAULT NULL,
 `language_id` tinyint(3) unsigned NOT NULL COMMENT 'Soundtrack spoken language',
 `original_language_id` tinyint(3) unsigned DEFAULT NULL COMMENT 'Filmed spoken language',
 `rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
 `rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
 `length` smallint(5) unsigned DEFAULT NULL,
 `replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
  ...
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8

Stored routines definitions

Here’s an original sakila procedure, untouched. It is already commented:

CREATE DEFINER=`root`@`localhost` PROCEDURE `rewards_report`(
 IN min_monthly_purchases TINYINT UNSIGNED
 , IN min_dollar_amount_purchased DECIMAL(10,2) UNSIGNED
 , OUT count_rewardees INT
)
 READS SQL DATA
 COMMENT 'Provides a customizable report on best customers'
BEGIN

 DECLARE last_month_start DATE;
 DECLARE last_month_end DATE;
 ...

SQL queries

Last but not least, while not part of the schema, SQL queries define the use of the schema. That is, the schema exists for the sole reason of being able to query it.

Where did that query come from? Which piece of code issued it? Why? What’s its purpose?

Looking at the PROCESSLIST, the slow log, etc., it is easier when the queries are commented:

SELECT
 /* List film details along with participating actors */
 /* Issued by analytics module */
 film.*,
 COUNT(*) AS count_actors,
 GROUP_CONCAT(CONCAT(actor.first_name, ' ', actor.last_name))
FROM
 film
 JOIN film_actor USING(film_id)
 JOIN actor USING(actor_id)
GROUP BY film.film_id;

Conclusion

Source code commenting is an important practice, and usually watched out for. SQL & table definitions commenting are often scarce or non-existent. I urge DBAs to adopt a comments coding convention for SQL, and apply it whenever they can.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Idiosyncrasies That Bite

Июнь 28th, 2010

The following are my slides that I presented at ODTUG Kaleidoscope 2010. This presentation talks about the MySQL defaults including a non-transactional state, silent data truncations, date management, and transaction isolation options. These are all critical for data integrity and consistency. I also cover in-depth topics including SQL_MODE, character sets and collations.


PlanetMySQL Voting: Vote UP / Vote DOWN

Views: better performance with condition pushdown

Май 20th, 2010

Justin’s A workaround for the performance problems of TEMPTABLE views post on mysqlperformanceblog.com reminded me of a solution I once saw on a customer’s site.

The customer was using nested views structure, up to depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted with a LOT of subqueries, temporary tables, and vast amounts of data, even if only to return with a couple of rows.

While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there’s a hack around this.

Let’s use the world database to illustrate. Look at the following view definition:

CREATE
  ALGORITHM=TEMPTABLE
VIEW country_languages AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON (Country.CODE = CountryLanguage.CountryCode)
  GROUP BY
    Country.CODE;

The view presents with a list of spoken languages per country. The execution plan for querying this view looks like this:

mysql> EXPLAIN SELECT * FROM country_languages;
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | <derived2>      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 |                                              |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+

And, even if we only want to filter out a single country, we still get the same plan:

mysql> EXPLAIN SELECT * FROM country_languages WHERE Code='USA';
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | <derived2>      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 | Using where                                  |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+

So, we need to scan the entire country_language and country tables in order to return results for just one row.

A non-working solution

The solution offered by the developer was this:

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_non_working AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = @country_code
  GROUP BY Country.CODE;

And follow by:

mysql> SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM country_languages_2;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+

So, pushdown a WHERE condition into the view’s definition. The session variable @country_code is used to filter rows. In the above simplified code the value is assumed to be set; tweak it as you see fit (using IFNULL, for example, or OR statements) to allow for full scan in case the variable is undefined.

This doesn’t work. It used to work a couple years back; but today you cannot create a view which uses session variables or parameters. It is a restriction imposed by views.

A workaround

Justin showed a workaround using an additional table. There is another workaround which does not involve tables, but rather stored routines. Now, this is a patch, and an ugly one. It may not work in future versions of MySQL for all I know. But, here it goes:

DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_session_country`() RETURNS CHAR(3)
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @country_code;
END $$
DELIMITER ;

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_2 AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = get_session_country()
  GROUP BY Country.CODE;

And now:

mysql> SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM country_languages_2;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM country_languages_2;
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY     | <derived2>      | system | NULL          | NULL    | NULL    | NULL |    1 |                          |
|  2 | DERIVED     | Country         | const  | PRIMARY       | PRIMARY | 3       |      |    1 |                          |
|  2 | DERIVED     | CountryLanguage | ref    | PRIMARY       | PRIMARY | 3       |      |    8 | Using where; Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+

Since views are allowed to call stored routines (Justing used this to call upon CONNECTION_ID()), and since stored routines can use session variables, we can take advantage and force the view into filtering out irrelevant rows before these accumulate to temporary tables and big joins.

Back in the customer’s office, we witnessed, what with their real data and multiple views, a reduction of query times from ~30 minutes to a few seconds.

Another kind of use

Eventually we worked to make better view definitions and query splitting, resulting in clearer code and fast queries, but this solution plays nicely into another kind of problem:

Can we force different customers to see different parts of a given table? e.g., only those rows that relate to the customers?

There can be many solutions: different tables; multiple views (one per customer), stored procedures, what have you. The above provides a solution, and I’ve seen it in use.


PlanetMySQL Voting: Vote UP / Vote DOWN

Discovery of the day: GROUP BY … DESC

Май 4th, 2010

I happened on a query where, by mistake, an

SELECT ... ORDER BY x DESC LIMIT 1

was written as

SELECT ... GROUP BY x DESC LIMIT 1

And it took me by surprise to realize GROUP BY x DESC is a valid statement. I looked it up: yep! It’s documented.

In MySQL, GROUP BY results are sorted according to the group statement. You can override this by adding ORDER BY NULL (see past post). I wasn’t aware you can actually control the sort order.


PlanetMySQL Voting: Vote UP / Vote DOWN

But I DO want MySQL to say “ERROR”!

Март 12th, 2010

MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data.

  • Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though).
  • Calling CREATE TEMPORARY TABLE? You get silent commit.
  • Issuing a ROLLBACK on non-transactional involved engines? Have a warning; no error.
  • Using LOCK IN SHARE MODE on non transactional tables? Not a problem. Nothing reported.
  • Adding a FOREIGN KEY on a MyISAM table? Good for you; no action actually taken.
  • Inserting 300 to a TINYINT column in a relaxed sql_mode? Give me 255, I’ll silently drop the remaining 45. I owe you.

Warnings and errors

It would be nice to:

  • Have an auto_propagate_warning_to_error server variable (global/session/both) which, well, does what it says.
  • Have an i_am_really_not_a_dummy server variable which implies stricter checks for all the above and prevents you from doing with anything that may be problematic (or rolls back your transactions on your invalid actions).

Connectors may be nice enough to propagate warnings to errors – that’s good. But not enough: since data is already committed in MySQL.

If I understand correctly, and maybe it’s just a myth, it all relates to the times where MySQL had interest in a widespread adoption across the internet, in such way that it does not interfere too much with the users (hence leading to the common myth that “MySQL just works out of the box and does not require me to configure or understand anything”).

MySQL is a database system, and is now widespread, and is used by serious companies and products. It is time to stop play nice to everyone and provide with strict integrity — or, be nice to everyone, just allow me to specify what “nice” means for me.


PlanetMySQL Voting: Vote UP / Vote DOWN

Proper SQL table alias use conventions

Март 11th, 2010

After seeing quite some SQL statements over the years, something is bugging me: there is no consistent convention as for how to write an SQL query.

I’m going to leave formatting, upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I find to be problematic table alias use.

Using the sakila database, take a look at the following queries:

Query #1

SELECT
 R.rental_date, C.customer_id, C.first_name, C.last_name
FROM
 rental R
 JOIN customer C USING (customer_id)
WHERE
 R.rental_date >= DATE('2005-10-01')
 AND C.store_id=1;

The above looks for film rentals done in a specific store (store #1), as of Oct. 1st, 2005.

Query #2

SELECT
 F.title, C.name
FROM
 film AS F
 JOIN film_category AS S ON (F.film_id = S.film_id)
 JOIN category AS C ON (S.category_id = C.category_id)
WHERE F.length > 180;

The above lists the title and category for all films longer than three hours.

Query #3

SELECT c.customer_id, c.last_name
FROM
  customer c
  INNER JOIN address a ON (c.address_id = a.address_id)
  INNER JOIN (
    SELECT
      c.city_id
    FROM
      city AS c
      JOIN country s ON (c.country_id = s.country_id)
    WHERE
      s.country LIKE 'F%'
  ) s1 USING (city_id)
WHERE
  create_date > DATE('2005-10-01');

The above lists customers created after Oct. 1st, 2005, and who live in countries starting with an ‘F’. The query could be solved without a subquery, but there’s a good reason why I made it so.

The problems

I used very different conventions on any one of the queries, and sometimes within each query. And it’s common that I see the same on a customer’s site, what with having many programmers do the SQL coding. Again, I will only discuss the table aliases conventions. I’ll leaver the rest to the reader.

Here’s where I see problems:

  • Query #1: In itself, it looks fine. Rental turns to R, Customer turns to C. I will comment on this slightly later on when I provide my full opinion.
  • Query #2: So film turns to F, category turns to C. What should film_category turn into? Out of letters? Let’s just go for S, shall we? But S has nothing do with film_category. Yet it’s so commonly seen.
  • Query #2: We’re using the AS keyword now. We didn’t use it before.
  • Queries #1, #2: Hold on. Wasn’t C taken for customer in Query #1? Now, in Query #2 it stands for category? I’m beginning to get confused.
  • Query #3: Now aliases are lower case; I was just getting used to them being upper case.
  • Query #3: But, hey, c is back to customer!
  • Query #3: Or, is it? Take a look at the subquery. Theres another c in there! This time it’s city! And it’s perfectly valid syntax. We actually have two identical aliases in the same query.
  • Query #3: If I could, I would name country with c as well. But I can’t. So why not throw in s again?
  • Query #3: and now I don’t even bother using the alias when accessing the create_date. Well, there’s no such column in any of the other tables!

Proper conventions

What I find so disturbing is that whenever I read a complex query, I need to go back and forth, back and forth between table aliases (found everywhere in the query) and their declaration point. Such irregularities make the queries difficult to read.

Any of the above issues could be justified. But I wish to make some suggestions:

  • Decide whether you’re going for upper or lower case.
  • Do not use the same alias twice in your query, even if it’s valid.
  • Aliases do not have to be single character. film_category may just as well be FC.
  • Do not alias something that is hard to interpret. s does not stand for country.
  • Think ahead: use same aliases throughout all your queries, as far as you can. If uniqueness is a problem, make for longer aliases. Use cust instead of c.

The above should make for more organized and readable SQL code. Remember: what one programmer finds as a very intuitive alias, is unintuitive to another!

My own convention

Simple: I only use aliases when using self joins. I am aware that queries are much longer what with long table names. I go farther than that: I prefer fully qualifying questionable columns throughout the query. Yes, it makes the query even longer.

I know this does not appeal to many. But there’s no confusion. And it’s easily searchable. And it’s consistent. And if properly formatted, as in the above queries, is well readable.

Now please join me in asking Oracle if they can add multi-line Strings for java, as there are for python.


PlanetMySQL Voting: Vote UP / Vote DOWN