Archive for the ‘data-types’ Category

23:59:59 is not the end of the day. No, really!

Июль 27th, 2011

How would you check whether some TIMESTAMP column falls within a given day, say July 26th, 2011?

This is a question I tend to ask students, and usually I get the same range of answers. Some are wrong; some cannot utilize an index, some are correct, and some are temporarily correct. I wish to take a closer look at the last. Such an answer looks like this:

SELECT * FROM some_table WHERE timstamp_column BETWEEN '2001-07-26 00:00:00' AND '2001-07-26 23:59:59'

Yikes! I get my allergies when I see this one.

Technically this seems correct. And it seems to work so far for people. There are two things that disturb me:

  1. ’23:59:59′ refers to the beginning of the last second of the day. This means a full second is missing in concept. No one would write such a code on a C, Java or PHP application: that would be a newbie’s mistake. But people feel at ease doing it with SQL
  2. Of course, this works when timestamps are in a one second resolution. Which leads me to:

  3. One second resolution is a temporary issue. Already MariaDB 5.3 presents with microsecond support for NOW() and TIMESTAMP, TIME and DATETIME columns. Sometime in the not-so-far future the standard MySQL distribution will have that, too.

Soon people will start losing data, reporting wrong numbers, perform incorrect aggregations. My advise:

grep -iR "23:59:59" /path/to/application/code/*

And get rid of it. A good alternative would be:

SELECT * FROM some_table WHERE timstamp_column >= DATE('2001-07-26') AND timestamp_column < (DATE('2001-07-26') + INTERVAL 1 DAY)

(Why use all this INTERVAL stuff? Think 28/29/30/31 days in month and get the answer)

And do not use LIKE ’2011-07-26 %’. Read this if you’re not sure why.


PlanetMySQL Voting: Vote UP / Vote DOWN

Announcing common_schema: common views & routines for MySQL

Июль 13th, 2011

Today I have released common_schema, a utility schema for MySQL which includes many views and functions, and is aimed to be installed on any MySQL server.

What does it do?

There are views answering for all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes & transactions, security, internals… There are basic functions answering for common needs.

Some of the views/routines simply formalize those queries we tend to write over and over again. Others take the place of external tools, answering complex questions via SQL and metadata. Still others help out with SQL generation.

Here are a few highlights:

There’s more. Take a look at the common_schema documentation for full listing. And it’s evolving: I’ve got quite a few ideas already for future components.

Some of these views rely on heavyweight INFORMATION_SCHEMA tables. You should be aware of the impact and risks.

What do I need to install?

There’s no script or executable file. It’s just a schema. The distribution in an SQL file which generates common_schema. Much like a dump file.

What are the system requirements?

It’s just between you and your MySQL. There are currently three distribution files, dedicated for different versions of MySQL (and allowing for increased functionality):

  • common_schema_mysql_51: fits all MySQL >= 5.1 distributions
  • common_schema_innodb_plugin: fits MySQL >= 5.1, with InnoDB plugin + INFORMATION_SCHEMA tables enabled
  • common_schema_percona_server: fits Percona Server >= 5.1

Refer to the documentation for more details.

What are the terms of use?

common_schema is released under the BSD license.

Where can I download it?

On the common_schema project page. Enjoy it!


PlanetMySQL Voting: Vote UP / Vote DOWN

Announcing common_schema: common views & routines for MySQL

Июль 13th, 2011

Today I have released common_schema, a utility schema for MySQL which includes many views and functions, and is aimed to be installed on any MySQL server.

What does it do?

There are views answering for all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes & transactions, security, internals… There are basic functions answering for common needs.

Some of the views/routines simply formalize those queries we tend to write over and over again. Others take the place of external tools, answering complex questions via SQL and metadata. Still others help out with SQL generation.

Here are a few highlights:

There’s more. Take a look at the common_schema documentation for full listing. And it’s evolving: I’ve got quite a few ideas already for future components.

Some of these views rely on heavyweight INFORMATION_SCHEMA tables. You should be aware of the impact and risks.

What do I need to install?

There’s no script or executable file. It’s just a schema. The distribution in an SQL file which generates common_schema. Much like a dump file.

What are the system requirements?

It’s just between you and your MySQL. There are currently three distribution files, dedicated for different versions of MySQL (and allowing for increased functionality):

  • common_schema_mysql_51: fits all MySQL >= 5.1 distributions
  • common_schema_innodb_plugin: fits MySQL >= 5.1, with InnoDB plugin + INFORMATION_SCHEMA tables enabled
  • common_schema_percona_server: fits Percona Server >= 5.1

Refer to the documentation for more details.

What are the terms of use?

common_schema is released under the BSD license.

Where can I download it?

On the common_schema project page. Enjoy it!


PlanetMySQL Voting: Vote UP / Vote DOWN

TIMESTAMP vs. DATETIME, which should I be using?

Май 22nd, 2011

They both look the same, have roughly the same use. Nevertheless, they differ in many respects. I’ll note the difference, and note a few pitfalls and peculiarities.

Range

TIMESTAMP starts with the epoch, ’1970-01-01 00:00:01′ UTC and ends with ’2038-01-19 03:14:07′ UTC. This is all very nice today, and may actually hold up till our retirement, but the younger readers may yet have to deal with the Bug2K+38™, certain to arrive.

DATETIME starts with ’1000-01-01 00:00:00′ and lasts up to ’9999-12-31 23:59:59′. More on that later.

In respect of range, your current event logs may well use a TIMESTAMP value, while your grandfather’s and granddaughter’s birth dates may require DATETIME.

In general I would suggest that anything that relates to now, can be a TIMESTAMP. A new entry is added? It is added now. It can be represented by a TIMESTAMP. Anything has an expiry time of a few minutes, perhaps a few days or a month? You’ll be safe using it up till late 2037. Anything else had better use a DATETIME. In particular, dates relating to issues such as birth, insurance, the market etc. fall into this category.

History, however, does not even hold up with DATETIME. Rome fell long before MIN(DATETIME). You will have to manage your own. Not even YEAR will help you out.

Storage

TIMESTAMP makes for 4 bytes. DATETIME makes for 8 bytes. Now that we have this behind us, let’s see why.

Internal representation

A TIMESTAMP is merely the number of elapsed seconds since the epoch. It is a number. Not only is this a number, it is an incrementing number, and without gaps. Ever ascending. It actually equals the UNIX_TIMESTAMP() function.

A DATETIME is more of an oddity. From the manual:

  • DATETIME: Eight bytes:
    • A four-byte integer packed as YYYY×10000 + MM×100 + DD
    • A four-byte integer packed as HH×10000 + MM×100 + SS

Huh?

There’s nothing sequential about a DATETIME. The value which follows 20110307095159 is 20110307095200. It’s like the string representation without any delimiters. See the following (ignore the fraction, it’s not really there):

root@mysql-5.1.51> SELECT NOW()+0;
+-----------------------+
| NOW()+0               |
+-----------------------+
| 20110307095238.000000 |
+-----------------------+

The implication of this internal representation is that conversions must be made back and forth. When you want to add 5 seconds to a DATETIME, MySQL cannot simply add 5 to some number, but must make for a more complicated calculation.

Peculiarities

Now here’s a question: why does DATETIME start with ’1000-01-01 00:00:00′? Can you provide with a convincing argument? The only argument I can find is hardly convincing.

Did you ever need to type in a password with your phone? Say, for your Credit Card company, or your Bank? Did you get recorded instructions saying “You must type six digits. You must not repeat the same digits three successive times“? That’s somewhat nice, and probably a good advice. Did you ever get told “The first digit must not be 0“? That’s just stupid. This means we’re giving up on 100,000 valid passwords, thereby reducing our search space by 10%!

I suspect in both cases the reasoning is the same: if you start with a 0, and we represent it as a number, it won’t make up for the same number of digits we intended it to.

In the case of your Credit Card company, there is no excuse. What’s the problem with padding with zeros till you make those six digits?

In the case of MySQL’s DATETIME, I can see a weak reasoning. Read this page on the docs to find out that a DATETIME can be represented in many forms, and can be automatically deduced from a string in many formats. Both strings ’110307095100′ and ’20110307095100′ make for the same DATETIME value. It is based on the number of characters in your text, that the parsing decides how to act.

The fact the first year is 1000 makes the number of digits predictable.

My thoughts?

I don’t see why I should care about automatically converting texts of different formats to a DATETIME. As a programmer, I’m perfectly content with strict typing. I don’t mind passing around only texts of the form ‘YYYY-MM-DD HH:MM:SS’ (or, better yet, passing date objects and letting my connector do the translation). Moreover, I prefer it that way! It makes me feel safer, that I haven’t passed a wrong text by mistake, to be silently accepted.

For this reason I don’t like the idea of losing the ability to use DATETIME on 1,000 lost years. As far as I can tell, this is a MySQL specific issue; there is no ANSI SQL for DATETIME.


PlanetMySQL Voting: Vote UP / Vote DOWN

Implicit casting you don’t want to see around

Июль 7th, 2010

In Beware of implicit casting, I have outlined the dangers of implicit casting. Here’s a few more real-world examples I have tackled:

Number-String comparisons

Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus:

mysql> SELECT 3 = '3.0';
+-----------+
| 3 = '3.0' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT '3' = '3.0';
+-------------+
| '3' = '3.0' |
+-------------+
|           0 |
+-------------+

The second query consists of pure strings comparison. I t has no way to determine that number comparison should be made.

Direct DATE arithmetics

The first query seems to work, but is completely incorrect. The second explains why. The third is a total mess.

mysql> SELECT DATE('2010-01-01')+3;
+----------------------+
| DATE('2010-01-01')+3 |
+----------------------+
|             20100104 |
+----------------------+
1 row in set (0.00 sec)

mysql> SELECT DATE('2010-01-01')-3;
+----------------------+
| DATE('2010-01-01')-3 |
+----------------------+
|             20100098 |
+----------------------+
1 row in set (0.00 sec)

mysql> SELECT '2010-01-01' - 3;
+------------------+
| '2010-01-01' - 3 |
+------------------+
|             2007 |
+------------------+
1 row in set, 1 warning (0.00 sec)

Number-String comparisons, big integers

Look at the following crazy comparisons:

mysql> SELECT 1234 = '1234';
+---------------+
| 1234 = '1234' |
+---------------+
|             1 |
+---------------+

mysql> SELECT 123456789012345678 = '123456789012345678';
+-------------------------------------------+
| 123456789012345678 = '123456789012345678' |
+-------------------------------------------+
|                                         0 |
+-------------------------------------------+

mysql> SELECT 123456789012345678 = '123456789012345677';
+-------------------------------------------+
| 123456789012345678 = '123456789012345677' |
+-------------------------------------------+
|                                         1 |
+-------------------------------------------+

The amazing result of the last two comparisons may strike as odd. Actually, it may strike as a bug, and indeed when a customer approached me with this behavior I was at loss for words. But this is documented. The manual describes the cases for casting, then states: “… In all other cases, the arguments are compared as floating-point (real) numbers. …”

Lessons learned:

  • Be careful when comparing strings with floating point values. Matching depends on how both are represented.
  • Avoid converting temporal types to strings when doing date manipulation.
  • Avoid direct math on temporal types.
  • Avoid casting BIGINTs represented by strings. Casting will turn out to use FLOATs and may be incorrect.

Last but not least:

  • Use the proper data types for your data’s representation. When dealing with numbers, use numbers. When dealing with temporal values, use temporal types.

PlanetMySQL Voting: Vote UP / Vote DOWN

Choosing MySQL boolean data types

Июнь 3rd, 2010

How do you implement True/False columns?

There are many ways to do it, each with its own pros and cons.

ENUM

Create you column as ENUM(‘F’, ‘T’), or ENUM(‘N’,'Y’) or ENUM(‘0′, ‘1′).

This is the method used in the mysql tables (e.g. mysql.user privileges table). It’s very simple and intuitive. It truly restricts the values to just two options, which serves well. It’s compact (just one byte).

A couple disadvantages to this method:

  1. Enums are represented by numerical values (which is good) and start with 1 instead of 0. This means ‘F’ is 1, and ‘T’ is 2, and they both translate to True when directly used in a booleanic expression (e.g. IF(val, ‘True’, ‘False’) always yields ‘True’)
  2. There’s no real convention. Is it ‘Y’/'N’? ‘T’/'F’? ‘P’/'N’? ‘1′/’0′?

CHAR(1)

Simple again. Proposed values are, as before, ‘F’, ‘T’ etc. This time there’s no way to limit the range of values. You cannot (in MySQL, unless using triggers) prevent an ‘X’.

Watch out for the charset! If it’s utf8 you pay with 3 bytes instead of just 1. And, again, ‘T’, ‘F’, ‘Y’, ‘N’ values all evaluate as True. It is possible to use the zero-valued character, but it defeats the purpose of using CHAR.

CHAR(0)

Many are unaware that it’s even valid to make this definition. What does it mean? Take a look at the following table:

CREATE TABLE `t1` (
 `bval` char(0) DEFAULT NULL
);
mysql> INSERT INTO t1 VALUES ('');
mysql> INSERT INTO t1 VALUES ('');
mysql> INSERT INTO t1 VALUES (NULL);

mysql> SELECT * FROM t1;
+------+
| bval |
+------+
|      |
|      |
| NULL |
+------+

NULLable columns cause for an additional storage per row. There’s one bit per NULLable column which notes down whether the column’s value is NULL or not. If you only have one NULLable column, you must pay for this bit with 1 byte. If you have two NULLable columns, you still only pay with 1 byte.

Furthermore:

mysql> SELECT bval IS NOT NULL FROM t1;
+------------------+
| bval IS NOT NULL |
+------------------+
|                1 |
|                1 |
|                0 |
+------------------+

So this plays somewhat nicely into booleanic expressions.

However, this method is unintuitive and confusing. I personally don’t use it.

TINYINT

With integer values, we can get down to 0 and 1. With TINYINT, we only pay with 1 byte of storage. As with CHAR(1), we cannot prevent anyone from INSERTing other values. But that doesn’t really matter, if we’re willing to accept that 0 evaluates as False, and all other values as True. In this case, boolean expressions work very well with your column values.

BOOL/BOOLEAN

These are just synonyms to TINYINT. I like to define my boolean values as such. Alas, when issuing a SHOW CREATE TABLE the definition is just a normal TINYINT. Still, it is clearer to look at if you’re storing your table schema under your version control.


PlanetMySQL Voting: Vote UP / Vote DOWN

The actual range and storage size of an INT

Март 29th, 2010

What’s the difference between INT(2) and INT(20) ? Not a lot. It’s about output formatting, which you’ll never encounter when talking with the server through an API (like you do from most app languages).

The confusion stems from the fact that with CHAR(n) and VARCHAR(n), the (n) signifies the length or maximum length of that field. But for INT, the range and storage size is specified using different data types: TINYINT, SMALLINT, MEDIUMINT, INT (aka INTEGER), BIGINT.

At Open Query we tend to pick on things like INT(2) when reviewing a client’s schema, because chances are that the developers/DBAs are working under a mistaken assumption and this could cause trouble somewhere – even if not in the exact spot where we pick on it. So it’s a case of pattern recognition.

A very practical example of this comes from a client I worked with last week. I first spotted some harmless ones, we talked about it, and then we hit the jackpot: INT(22) or something, which in fact was storing a unix timestamp converted to int by the application, for the purpose of, wait for this, user’s birth date. There’s a number of things wrong with this, and the result is something that doesn’t work properly.

Currently, the unix epoc/timestamp when stored in binary is a 32 bit unsigned integer, with a range from 1970-01-01 to somewhere in 2037. Note the unsigned qualifier, otherwise it already wraps around 2004.

  • if using signed, you’d currently only find out with users younger than 7 or so. You may be “lucky” to not have any, but kids are tech savvy so websites and systems in general may well have entries with kids younger than that.
  • using a timestamp for date-of-birth tells me that the developers are young ;-) well that’s relative, but in this: younger than 40. I was born in 1969, so I am very aware that it’s impossible to represent my birthdate in a unix timestamp! What dates do you test with? Your own, and people around you. ‘nuf said.
  • finally, INT(22) is still an INT, which for MySQL means 32 bits (4 bytes) and it happened to be signed also.

So, all in all, this wasn’t going to work. Exactly what would fail where would be highly app code (and date) dependent, but you can tell it needs a quick redesign anyway.

I actually suggested checking the requirements whether having just a year would suffice for the intended use (can be stored in a YEAR(4) field), this reduces the amount of personal data stored and thus removes privacy concerns. Otherwise, a DATE field which can optionally be allowed to not have a day-of-month (i.e. only ask for year/month) as that again can be sufficient for the intended purpose.


PlanetMySQL Voting: Vote UP / Vote DOWN

But I DO want MySQL to say “ERROR”!

Март 12th, 2010

MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data.

  • Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though).
  • Calling CREATE TEMPORARY TABLE? You get silent commit.
  • Issuing a ROLLBACK on non-transactional involved engines? Have a warning; no error.
  • Using LOCK IN SHARE MODE on non transactional tables? Not a problem. Nothing reported.
  • Adding a FOREIGN KEY on a MyISAM table? Good for you; no action actually taken.
  • Inserting 300 to a TINYINT column in a relaxed sql_mode? Give me 255, I’ll silently drop the remaining 45. I owe you.

Warnings and errors

It would be nice to:

  • Have an auto_propagate_warning_to_error server variable (global/session/both) which, well, does what it says.
  • Have an i_am_really_not_a_dummy server variable which implies stricter checks for all the above and prevents you from doing with anything that may be problematic (or rolls back your transactions on your invalid actions).

Connectors may be nice enough to propagate warnings to errors – that’s good. But not enough: since data is already committed in MySQL.

If I understand correctly, and maybe it’s just a myth, it all relates to the times where MySQL had interest in a widespread adoption across the internet, in such way that it does not interfere too much with the users (hence leading to the common myth that “MySQL just works out of the box and does not require me to configure or understand anything”).

MySQL is a database system, and is now widespread, and is used by serious companies and products. It is time to stop play nice to everyone and provide with strict integrity — or, be nice to everyone, just allow me to specify what “nice” means for me.


PlanetMySQL Voting: Vote UP / Vote DOWN

What data types does your innovative storage engine NOT support?

Сентябрь 29th, 2009

I’ve been investigating a few different storage engines for MySQL lately, and something I’ve noticed is that they all list what they support, but they generally don’t say what they don’t support. For example, Infobright’s documentation shows a list of every data type supported. What’s missing? Hmm, I don’t see BLOB, BIT, ENUM, SET… it’s kind of hard to tell, isn’t it? I don’t have an encyclopedic list of all the MySQL data types in my head. The same thing is true of the list of functions that are optimized inside Infobright’s own code instead of at the server layer. I can see what’s optimized, but I can’t see whether FUNC_WHATEVER() is optimized without scanning the page — and there’s no list of un-optimized functions.

I don’t mean to pick on Infobright. I’ve recently looked at another third-party storage engine and they did exactly the same thing. It’s just that the docs I saw weren’t public as far as I know, so I can’t mention them by name.

For a product like this, I think the most helpful thing would be a page explaining two things: 1) the enhancements or extra functionality over the standard MySQL server, and 2) the unavailable or degraded functionality. It would also be good to have both high-level and detailed versions of this.

Related posts:

  1. The Ma.gnolia data might not be permanently lost I keep rea
  2. 50 things to know before migrating Oracle to MySQL A while ba
  3. PostgreSQL adds windowing functions and common table expressions As Hubert

Related posts brought to you by Yet Another Related Posts Plugin.


PlanetMySQL Voting: Vote UP / Vote DOWN