Archive for the ‘MySQL 101’ Category

MySQL 101 — Replication

Ноябрь 1st, 2011

So far we've looked at many aspects of MySQL, not in any great depth, but hopefully with enough information to get you started and whet your appetite for more.  Now we start to look into areas that aren't in the basic tutorials.

Replication is the technology that allows data to be stored on multiple servers. Typically this is used in "scale out" applications.  "Scale out" is used in contrast to "Scale up" where to scale a solution you buy a bigger box to run it on, where "scale out" means you buy more boxes.  Each has its benefits and drawbacks, with the usual benefit of scale out being that you get more bang for your buck.

The way replication works in MySQL is pretty simple.  One server is identified as the master, and writes every transaction to a file, the binary log.  Other servers (and there may be many) act as slaves and request information from the master. The slave keeps track of where it got up to and asks the master for the next transaction in the file.  In general the master doesn't know or care where the slave is up to, it just sends out the requested transaction to whoever has the right credentials to ask for it.

On the slave there are two threads running, one that requests events from the master binlog (binary log) and writes them to the relay log (which in reality is just another binlog), and the second thread that reads the relay log and executes the queries. In order to avoid non-deterministic outcomes the SQL thread is just that, a single thread.  Long running events can result in the replication lagging well behind the state of the master.

This is the traditional asynchronous replication.  Newer versions of MySQL now support semi-synchronous replication as well.  In this mode the master will not return control back to the querying process until at least one slave reports that it has received the transaction events and written them to its own relay log.  Note that this is written to the relay log but not necessarily written to the database.

You may wonder under what circumstance replication would be of use.  If your application is mainly read-intensive (as many web applications are) then replication gives you the ability to spread reads across multiple slaves to reduce contention.  If your application is mainly write intensive, then replication to multiple slaves will not, of itself, give you any relief.  Indeed, because the replication is single-threaded and all writes on the master must be written to the slaves, you can in fact hurt performance with a replicated setup.

What about more than one master?  This is possible but requires careful thought and planning.  MySQL replication keeps track of which server initiated the transaction, so "circular replication" where a slave acts also as a master should not cause problems, however there are some serious issues to consider.  Take for instance a customer table with the customer ID being auto incremented.  What happens if there are writes to both masters in a master<->master replication setup?  Both will grab the next ID in sequence, then the other will pull that transaction and try and will end up with a duplicate index error, causing replication to fail.  You can get around this by setting the initial auto_increment value on each server and setting the auto_increment_increment value to the number of masters in use.  Remebering however that both masters now have to write all traffic that goes to the pair, there is not a great deal of benefit in such an arrangement.  Other technologies, such as sharding, turn out to offer better scale out opportunities for multi-master deployments.

There are, however, some tools to make multi-master work more easily.  MMM (Multi-Master Replication Manager for MySQL) is one.  MHA (Master High Availability for MySQL) is somewhat different, allowing easy management of master failover.

That is probably enough theory, lets look at how to build a replication system.

For the master, the only requirement is that it writes its transactions to a binary log, and that it has a server ID set.  To do this we need to add the following to the my.cnf file:

server-id = 1
log_bin = /var/log/mysql/mysql-bin.log

The server-id is an integer that must be unique for all servers in the same replication set.  This is because of the tracking of transactions to avoid circular replication errors.  The log_bin is the path to the binary log file to use.  Each file will be appended with a sequence number to allow the server to rotate log files when they grow too large, or a flush request is received.

One other value that is useful is to set an expiry on the binlogs so that they don't take over all of your disk.  You need to ensure that you have enough binlog capacity for normal replication lag and for handling backup/recovery, but beyond that you can get MySQL to automatically delete the older binlogs.

expire_log_days = 14

Now the master will at least be writing transactions to the binary logs.  Now we have to set up to allow slaves to replicate.  First step is to create a user that the slaves will connect with.  This user must have REPLICATION SLAVE permissions to be able to read the binlog.  Note that this user is created on the master. As a privileged user, in the MySQL client:

CREATE USER 'replicator'@'192.168.%' IDENTIFIED BY 'mypass';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'192.168.%';

The first command creates our user, the second grants the required permissions.  Note that I've used a wildcarded network address, you can also use a hostname wildcard.

For the slave we need a few things, first we need to set a server-id that is not the same as the master (or any other slave we are creating). Then we need a copy of the data on the master at the point at which the slave is to be started, and we need to then tell the slave about the master.

We can create the server-id in the same manner as we did for the master - but we don't need to specify a binlog (unless we are acting as a relay master).

server-id = 2

To get our first copy of the master data, the best idea is to use mysqldump.  Before we do, we need to stop the master from writing to the binary log and record its current position.  From the mysql command line:

FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000092 | 106 | | |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

We need to keep this session open so that the lock remains in place, and from another terminal run a mysqldump.

mysqldump --all-databases > all_db.sql

Once the backup is completed we can either drop the lock session or issue:

UNLOCK TABLES;

Great, we now have a backup and we have the position in the binary log at the point at which the backup occurred.  These two together provide us with all we need to create our slave.

First step is to load the above SQL file into the slave.   On the slave you can use the mysql command line client with the SQL file copied from the master.

mysql < all_db.sql

On the slave server we now need to issue a CHANGE MASTER command to tell it where the master is, and where to start replicating from.  This information will be written to the master.info file that is read at startup and updated by the replication process so that the slave will be able to resync with the master on restart.

CHANGE MASTER TO MASTER_HOST = 'master'
MASTER_USER = 'replicator'
MASTER_PASSWORD = 'mypass'
MASTER_LOG_FILE = 'mysql-bin.000092'
MASTER_LOG_POS = 106;

The MASTER_HOST, MASTER_USER and MASTER_PASSWORD must match the master and the user created above with replication slave permission.  The MASTER_LOG_FILE and MASTER_LOG_POS come from the SHOW MASTER STATUS used when we did the backup of the master.

Now all that is left to do is to start the slave processes:

START SLAVE;

We can use the same backup and the same processes above to create as many slaves as we need.

If you have to set up a complex replication scenario that is not covered by the above, you may find the SkySQL Reference Architecture provisioning system useful.  This service creates master/slave high availability configurations and packages software and configurations together so that you can easily install a system from scratch that meets all your replication needs.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 101 — More Transactions

Октябрь 17th, 2011

In our last episode we looked at transactions and how to create them.  In this episode I'll look at some of the implications of transactions, especially in a web application.

Transactions and Replication

We will discuss replication in depth later in the series, however it is sufficient for the moment to say that replication allows you to copy data in near real-time between MySQL servers and keep them synchronised.  What gets transferred are the changes that are made to your tables and data. So what about rolled-back (aborted) transactions?  Since the state after the rollback is essentially the same as the state before the transaction started, there seems little point in replicating those statements.  Indeed they are not replicated. Only completed transactions are.

I briefly mentioned that there is an autocommit setting that means that without using a transaction block, every query becomes its own transaction.  In the light of replication this makes a lot of sense.

Transaction Duration

Transactions may need to lock resources so that concurrent changes don't clobber each other, while allowing other queries access to the data that may be subject to a transaction, in a consistent manner. How this is done depends on the transaction isolation level, but in all levels you can be sure that inserting into a primary key will lock that index value out from other transactions doing the same thing.  In general this means that the second transaction will block until the first transaction is either committed or rolled back.  For web, having a locked resource hanging around for any length of time is a real killer.  This can be avoided by keeping transactions as short as possible, and to be careful of using range-based operations.

Let us say that we have two people trying to buy the same book.  As you saw from the previous episode the transaction we've defined does a number of operations, all of which need to complete before the book can be bought.  They are:

  • Create a purchase order record
  • Create purchase order item records for the book(s) purchased
  • Update the summary field on the purchase order record
  • Decrement the stock count for the book(s) purchased

If you look at those steps, you could see how a web application designer might get themselves into trouble.  If you created a purchase order and allow multiple books to be bought, then you may be tempted to keep the transaction open while the user browses and gets all their books purchased.  This would be a complete disaster.  It would mean there was an open-ended time the transaction was held, with locks on things like the book record (for decrementing the count) and therefore the entire system would almost certainly become deadlocked at some point.  This is where you need to be very aware of what the logic of the application is and how to bound the transactions.

In the above case we could handle this in a number of ways.  The easiest would be to create the purchase order record as a separate transaction, then for each book purchased, do the remaining three steps in another transaction.  Another way would be to separate the actual purchase order creation from the selection process.  You'll have seen this in process in a number of sites - where you have a Shopping Cart that you add books to and it isn't until checkout that it all ends up as a single purchase order.

No matter which way you do it - remember that you should never create a transaction that is unbounded in time. And on the web that means never creating a transaction that needs user interaction within the transaction.

Short Circuited Transactions

There are some operations that, if placed within a transaction, will perform an implicit commit on the statements that have preceeded it, effectively short-circuiting the transaction and creating a multiple transaction set.  Things like DDL (Data Definition Language) statements that modify the table structure or add indexes will be run in their own transaction, causing an implicit commit of any statements preceeding them.  DDL statements, in general, cannot be rolled back as they run in their own transaction.

More information on the caveats with transactions can be found in the online manual reference.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 101 — Transactions

Сентябрь 26th, 2011

We've now come a long way since our first steps at creating our online bookshop database. Now we need to start to think about how to sell the books and store details about the sales.  This is the time we need to start understanding database transactions.

Database transactions are very similar to real world transactions. They define a set of steps required to happen together in order for a transaction to be complete.  A real-world example might be that you buy a trinket from a store.  You find the trinket, then take it to the counter, find out the price, hand over the cash and receive your trinket.  That is a completed transaction.  Should you not have the available cash, the transaction would not be able to be completed and you would need to return the item, effectively rolling back that transaction.  It is not a great deal different in the database, except that the transaction can be controlled to a greater degree.

Before going too far, we need to build a few more tables for our database to enable us to sell our books.  To make life easier I've created these tables in this downloadable file.  You can use your favourite MySQL client to create the database for the purposes of following this episode.  The tables I've created are customer, purchase_order and order_item.  We'll see how they work in handling a typical transaction - we can use SHOW CREATE TABLE to see how they are defined:

SHOW CREATE TABLE `customer`\G
*************************** 1. row ***************************
Table: customer
Create Table: CREATE TABLE `customer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) CHARACTER SET latin1 NOT NULL,
`address1` varchar(80) CHARACTER SET latin1 NOT NULL,
`address2` varchar(80) CHARACTER SET latin1 DEFAULT NULL,
`postcode` varchar(8) CHARACTER SET latin1 DEFAULT NULL,
`city` varchar(80) CHARACTER SET latin1 DEFAULT NULL,
`country` varchar(80) CHARACTER SET latin1 DEFAULT NULL,
`state` varchar(20) CHARACTER SET latin1 DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

SHOW CREATE TABLE `purchase_order`\G
*************************** 1. row ***************************
Table: purchase_order
Create Table: CREATE TABLE `purchase_order` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`customer` int(11) NOT NULL DEFAULT '0',
`order_date` datetime NOT NULL,
`ship_date` datetime DEFAULT NULL,
`order_total` decimal(9,2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_order_customer` (`customer`),
CONSTRAINT `fk_order_customer` FOREIGN KEY (`customer`) REFERENCES `customer` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

SHOW CREATE TABLE `order_item`\G
*************************** 1. row ***************************
Table: order_item
Create Table: CREATE TABLE `order_item` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purchase_order` int(11) NOT NULL,
`book` int(11) NOT NULL,
`quantity` int(11) NOT NULL DEFAULT '0',
`price` decimal(5,2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_item_order` (`purchase_order`),
KEY `fk_item_book` (`book`),
CONSTRAINT `fk_item_order` FOREIGN KEY (`purchase_order`) REFERENCES `purchase_order` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk_item_book` FOREIGN KEY (`book`) REFERENCES `book` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

You'll notice a few things that may not immediately make sense.  In most cases you don't want duplicate data in different tables.  This is why we break out things like publisher from book, for instance.  However in purchase_order we have an order_total value that could be derived from the order_item records, and similarly we have a price field in order_item that is the same as the price field in book.  There is a lot of sense in doing this however.  Because we are dealing with financial transactions we must ensure they can stand alone.  So we need to ensure that the price of the book is not whatever the current price is, but the price it was at the  time it was sold.  This is why we have the price field in two places as they are not exactly the same.  The order_total field is a summary data field.  It means that we don't necessarily have to have a complicated multi-table join if we want to find how much a customer has bought.  Summary data fields are often used to speed up common queries by simplifying them.

One other field I've added is the stock_on_hand value in the book table.  This will allow us to set a count of books available for sale and decrement them when we sell something.

If we are to sell a book to a customer, we can create the customer record at any time because it won't matter if they don't buy straight away, so customer is not tied directly to a sale, so we can keep it out of our transaction.  When they purchase a book, however, we need to do a number of steps that have to be kept together.   If we don't we could end up with inconsistent data.

Lets go with an example.  I'll create a new customer and set a stock_on_hand value for Peter Temple's The Broken Shore:

INSERT INTO `customer` SET `name` = 'Adam Donnison',
 `address1` = '41 Bendigo Street',
 `city` = 'Collingwood',
 `state` = 'VIC', `postcode` = '3066', `country` = 'Australia';
Query OK, 1 row affected (0.00 sec)

UPDATE `book` SET `stock_on_hand` = 5 WHERE `id` = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

It is important to realise at this point that every single INSERT, UPDATE or other data manipulation statement that we have performed so far has been a transaction in its own right.  This implicit transaction is as a result of the default value of the MySQL server variable 'autocommit', which is normally turned on.  If this is turned off we need to explictly make all queries part of a transaction.  More confusingly if you issue a statement that changes data, a transaction will be automatically started, but you need to commit the transaction for it to be permanently recorded in the database.  But let us look at the anatomy of a simple transaction.

In this example we need to ensure the following happen at effectively the same time:

  • Decrement the stock on hand value of the book sold by the quantity sold
  • Create a purchase order entry
  • Create an entry for each book in the order_item table
  • Ensure the purchase order total is the total of the items in the order
INSERT INTO `purchase_order` SET `customer` = 1, `order_date` = NOW();
Query OK, 1 row affected (0.00 sec)

SET @po_number = LAST_INSERT_ID();
Query OK, 0 rows affected (0.00 sec)

INSERT INTO `order_item` SET `purchase_order` = @po_number,
 `book` = 1,
 `price` = (SELECT `price` FROM `book` WHERE `id` = 1),
 `quantity` = 1;
Query OK, 1 row affected (0.00 sec)

UPDATE `purchase_order` SET `order_total` =
 ( SELECT SUM(`price` * `quantity`) FROM `order_item` WHERE `purchase_order` = @po_number )
 WHERE `id` = @po_number;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1

UPDATE `book` SET `stock_on_hand` = `stock_on_hand` - 1 WHERE `id` = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

COMMIT;
Query OK, 0 rows affected (0.00 sec)

There are a few constructs here that are potentially confusing, so we need to go through each one and make sure everything is clear.

First, the SET @po_number = LAST_INSERT_ID() uses a MySQL function to find the ID of the record that we just inserted (the id field for the purchase_order record) and assigns it to a variable, @po_number, that we can use in subsequent queries. The reason we need to do this is to ensure we get an order_item record correctly associated with our newly created purchase order.

We've also used sub-selects. These are stand-alone select queries that are used to provide values for other queries.  In the first case it is a simple select to get the price field from the book table and insert it as the selling price for the order_item.  The second one uses an aggregation function, SUM(`price` * `quantity`) to find all of our order_item records (in this case, one) and determine what the purchase order total is.  You are more likely to be doing this from an application so you probably wouldn't do a sum up.  Alternatively you could use a separate variable that you updated as you added order_item records.  Regardless, what we have done is a set of queries that together make up a transaction.

If at any point in the transaction we decided that we had an error and we couldn't continue (perhaps we found the customer record was incorrect, or the application received an error) we can use ROLLBACK to abort the transaction and wind the database back to the point it was before we issued the START TRANSACTION.  By default if we were to drop the connection to the database (by closing the connection or a terminal failure in our application) or if the database server failed, any pending transaction, that is one that was STARTed but never COMMITted, would be rolled back.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 101 — Referential Integrity

Сентябрь 19th, 2011

In our last episode we learned how to modify data and table definitions.? This will come in handy as we look at building in referential integrity constraints into our database.? To begin we will need the database definition resulting from last episode's changes. You can download it here ?

A word on Storage Engines

Before we can begin we need to understand a little about MySQL Storage Engines.? MySQL actually does only part of the job of parsing SQL, creating query plans, executing them and returning data sets.? Where the data is stored and retrieved there is a Storage Engine at work.? The original storage engine was MyISAM, based on the industry stalwart of ISAM (Indexed Sequential Access Method).? The idea behind MyISAM was (and is) that it is a low overhead, low complexity storage engine.? It gained rapid early adoption because of its simplicity and ease of use.? Since then other storage engines have been developed, with InnoDB being the most commonly used in anything other than simple web apps.? There are others, for instance the CSV engine for interfacing with CSV files, the Blackhole engine for interfacing with nothing and the Federated engine for interfacing with just about anything.

A database may contain tables that use a number of different storage engines, although each table can only have a single storage engine managing its data.

The big difference between MyISAM and InnoDB is that InnoDB is a fully transactional storage engine, and supports more of the ANSI SQL standard related to referential integrity and transactions. Prior to MySQL 5.5 MyISAM was the default storage engine but MySQL 5.5 and later now have InnoDB as the default.? No matter what the default, you can always change it for a particular table. While we will deal with transactions later in the series we need to ensure our tables are using InnoDB.

Using the SHOW CREATE TABLE syntax we can determine what engine our tables are using

SHOW CREATE TABLE `book` \G
*************************** 1. row ***************************
Table: book
Create Table: CREATE TABLE `book` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(200) NOT NULL,
`author` int(11) NOT NULL DEFAULT '0',
`publisher` int(11) NOT NULL DEFAULT '0',
`format` int(11) NOT NULL DEFAULT '0',
`published_date` date DEFAULT NULL,
`isbn` varchar(13) DEFAULT NULL,
`price` decimal(5,2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `title` (`title`),
KEY `author` (`author`),
KEY `publisher` (`publisher`),
KEY `format` (`format`)
) ENGINE=MyISAM AUTO_INCREMENT=9 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Here we used the \G command to the mysql command line client so that it returns the data in a row format rather than the columnar format used by default.? Since we have only one field, this makes the output cleaner.

Most of what is presented will be familiar, what follows the closing bracket is a set of attributes that have been picked up by default (as we didn't specify these when we created the tables).? The first of these, and most important for our current purposes, is the ENGINE attribute.? In the output we can see that it is using MyISAM.? This is not good for data integrity, as MyISAM doesn't support any of the referential integrity constraints, nor does it support ACID transactions.? We need to change this.

ALTER TABLE `book` ENGINE=InnoDB;
Query OK, 8 rows affected (0.34 sec)
Records: 8 Duplicates: 0 Warnings: 0

When changing engines, be aware that it will need to rebuild the data.? For large tables this can quite some time.? We need to do this for all of our tables, so we need to use the ALTER TABLE for author, format and publisher.

 ALTER TABLE `author` ENGINE=InnoDB;
ALTER TABLE `format` ENGINE=InnoDB;
ALTER TABLE `publisher` ENGINE=InnoDB;

I've omitted the output from MySQL to be more concise.? Now we can look at referential integrity.

TIP: In the MySQL command line client on most systems you can use the up-arrow to retrieve the last comand run, and edit it in place, saving having to retype large amounts of text if the new command is similar to the last.

Referential Integrity

We now have data in several tables and that data is interdependent. For instance, if we were to delete an author, then we would have records in the book table that would be orphaned.? We would therefore need logic in our application to be able to ensure the corresponding book records were deleted, or indeed to prevent the deletion of the author record in the first place.? Rather than having to write and test all of this code, we could ask the database to look after this for us.? This ensures our data will always be consistent and follow clearly defined rules that allows our application logic to worry about other things.

To enable these features we need to identify a parent and child relationship between tables, and in the child table we need to build the constraints to control how changes to the parent are reflected in the child.? To do this we need to identify an indexed field in the child that references a unique field in the parent.? We have this situation already in that we have a primary key on each of our tables that satisfies the unique key requirement, and we have indexes on those fields in the book table that reference the primary keys of its parents (the author, publisher and format).? So we need to add a constraint, called a FOREIGN KEY constraint on the book table:

ALTER TABLE `book`
?ADD CONSTRAINT `fk_book_author` FOREIGN KEY (`author`)
?REFERENCES `author` (`id`)
?ON DELETE RESTRICT
?ON UPDATE CASCADE,
?ADD CONSTRAINT `fk_book_publisher` FOREIGN KEY (`publisher`)
?REFERENCES `publisher` (`id`)
?ON DELETE RESTRICT
?ON UPDATE CASCADE,
?ADD CONSTRAINT `fk_book_format` FOREIGN KEY (`format`)
?REFERENCES `format` (`id`)
?ON DELETE RESTRICT
?ON UPDATE CASCADE;
Query OK, 8 rows affected (0.37 sec)
Records: 8 Duplicates: 0 Warnings: 0

Rather than do each alter separately we've done them all in the one ALTER TABLE command.? You will notice that the constraint has a name, fk_book_author, fk_book_publisher and fk_book_format.? The name is not mandatory, the system will generate one for us, but this way ensures we have a name that means something to us and can be used in other ALTER TABLE operations.

The FOREIGN KEY defines a relationship between this table (the child) and the parent table. The bracketed list is the list of fields that form the child end of the relationship.? In our case we only need a single field.? The REFERENCES clause then indicates the table and matching field list for the parent table.? Following this are two clauses, the ON DELETE, and ON UPDATE clauses.?? These are what determine the action taken on the parent and child tables on the corresponding action.

I've used the two most common constraints, CASCADE and RESTRICT.? For CASCADE the update or delete does exactly that, cascades down the foreign key chain.? So in the case of author, for instance, if we changed the id of an author, all records in book that reference that author record will have their author field updated with the new value.? For DELETE we've used RESTRICT, what this does is denies a DELETE on any parent record that has children referencing it.? So we would manually need to delete all books to delete an author, for instance.? We could have chosen CASCADE for delete as well, and the book records would be deleted upon delete of an author.? But we'd better try it out and see.

 DELETE FROM `author` WHERE `last_name` = 'Corris';
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
?(`bookshop`.`book`, CONSTRAINT `fk_book_author` FOREIGN KEY (`author`)
?REFERENCES `author` (`id`) ON UPDATE CASCADE)

Great, we can't delete authors that have books, but what about one that we know doesn't have a book?

DELETE FROM `author` WHERE `first_name` = 'Merrilee' AND `last_name` = 'Moss';
Query OK, 1 row affected (0.00 sec)

Yep, so we know that the RESTRICT works as expected.? Let's take a look at the UPDATE.

 SELECT * FROM `author` WHERE `id` = 1;
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 1 | Peter | Temple |
+----+------------+-----------+
1 row in set (0.00 sec)

SELECT `title` FROM `book` WHERE `author` = 1;
+------------------+
| title |
+------------------+
| The Broken Shore |
| Shooting Star |
+------------------+
2 rows in set (0.00 sec)

UPDATE `author` SET `id` = 15 WHERE `id` = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

SELECT * FROM `author` WHERE `id` = 15;
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 15 | Peter | Temple |
+----+------------+-----------+
1 row in set (0.00 sec)

SELECT `title` FROM `book` WHERE `author` = 1;
Empty set (0.00 sec)

SELECT `title` FROM `book` WHERE `author` = 15;
+------------------+
| title |
+------------------+
| The Broken Shore |
| Shooting Star |
+------------------+
2 rows in set (0.00 sec)

We can see when updating the author ID, which could potentially break the link between author and book, the corresponding book records are also updated, maintaining the link and ensuring that the integrity of the references are maintained.

In our next episode we'll start to look at extending our tables to support sales, and the use of transactions.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,? MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 101 — Changing data and schema, UPDATE, ALTER

Сентябрь 15th, 2011

In our last episode we covered sorting, searching and grouping. We found out that using the COUNT(*) can be problematic when we have unexpected NULL data. Now we look at how to resolve data issues by updating the data, and perhaps even the table schema. We'll use the same database we did for the last episode. You can download it here ».

Updating Data

Let's recap.  If we pull the list of books, and authors, we find that "The Broken Shore" from Peter Temple has no price.  Not that it has a zero price, but it has a NULL value.

mysql> SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title`, `book`.`price`
 FROM `author` INNER JOIN `book` ON `author`.`id` = `book`.`author`
 ORDER BY `book`.`price`
 LIMIT 1;
+--------------+------------------+-------+
| author | title | price |
+--------------+------------------+-------+
| Peter Temple | The Broken Shore | NULL |
+--------------+------------------+-------+
1 row in set (0.00 sec)

We only have one result here.  This is because I made sure the order was in ascending order of price, and used the LIMIT clause to reduce the number of rows being returned.  This does exactly what its name suggests.  LIMIT 1 returns the first row from the ordered result set.  LIMIT also supports an offset (which is zero by default) followed by the number of rows required.  So LIMIT 1 is equivalent to LIMIT 0, 1.   If we wanted the second and third record from the result set we could use LIMIT 1, 2.

OK, we really need to figure out the record that is the problem if we need to replace or fix it.  Lets see if we can find the record using a simple query.

SELECT * FROM `book` WHERE `price` < 1;
Empty set (0.02 sec)

Huh?  But NULL is less than 1 isn't it?  Not exactly.  No matter what value you use in a comparison, a NULL value will always fail.  This is because NULL is the absence of a value, not a value itself.  So we can't find it by checking against a valid value, what about checking against NULL itself?

SELECT * FROM `book` WHERE `price` = NULL;
Empty set (0.00 sec)

Well, this doesn't seem right.  But then, read again the description above.  Any comparison with any value will fail if the value is NULL.  You cannot compare a value against a non-value.  So how do we actually find the row with a NULL in it?  There is a construct that allows that, the IS NULL and its opposite, IS NOT NULL.

SELECT * FROM `book` WHERE `price` IS NULL;
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| id | title | author | publisher | format | published_date | isbn | price |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| 1 | The Broken Shore | 1 | 3 | 1 | 2010-09-27 | 9781921656774 | NULL |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
1 row in set (0.00 sec)

Ah, there we go.  OK, we have the record id, which we know to uniquely identify this record, so we can do one of two things.  Either we can delete the record and replace it,  or we can update the record in situ.  Let's have a look at both options.

DELETE FROM `book` WHERE `id` = 1;
Query OK, 1 row affected (0.05 sec)

INSERT INTO `book` VALUES (1, 'The Broken Shore', 1, 3, 1, '2010-09-27',
 '9781921656774', '21.50');
Query OK, 1 row affected (0.00 sec)

SELECT * FROM `book` WHERE `id` = 1;
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| id | title | author | publisher | format | published_date | isbn | price |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| 1 | The Broken Shore | 1 | 3 | 1 | 2010-09-27 | 9781921656774 | 21.50 |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
1 row in set (0.00 sec)

In this example we delete using a WHERE clause to identify the record.  If we don't provide a WHERE clause, DELETE FROM will delete all data in the table.

This time we are using an INSERT INTO with the full list of fields in the record, replacing everything.  This seems a little over the top, and could result in errors creeping in because of changing so many fields at once.  Fortunately SQL allows us to just update a single field if we need to.

UPDATE `book` SET `price` = '22.50' WHERE `id` = 1;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0

SELECT * FROM `book` WHERE `id` = 1;
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| id | title | author | publisher | format | published_date | isbn | price |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| 1 | The Broken Shore | 1 | 3 | 1 | 2010-09-27 | 9781921656774 | 22.50 |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
1 row in set (0.00 sec)

Here we see that we can use UPDATE to just update the field (or fields) that we need to.  This makes it far easier to maintain consistency in our data as we are only changing the data we need to and not touching fields we don't need to.

We can update multiple rows by adding each field = value pair, comma separated, after the SET keyword.  As an example:

UPDATE `book` SET `price` = '22.50', `format` = 1 WHERE `id` = 1;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1 Changed: 0 Warnings: 0

Note how in this case because we haven't changed anything we see the message from MySQL for the number of rows changed is now 0, while it was 1 in our last example.  Each case shows Rows matched: 1, which indicates that there was one row that matched the WHERE clause.

Altering table definitions.

So we can now update data effectively, but we also had a problem with the isbn field.  Initially we believed that ISBN is a 13 digit number, but when we loaded data we found that there are two types of ISBN.  The newer version is in fact a UPC (Universal Product Code) and matches the 13 character format, but the older version was a 10 digit version, with leading zeroes.  We probably need to do something about that.  We can change the field format to a string format, which allows us to keep the leading zeroes, but we still have those that were truncated when we loaded them.  We need a two pronged approach.  Firstly lets convert the field format

ALTER TABLE `book` CHANGE `isbn` `isbn` VARCHAR(13);
Query OK, 8 rows affected (0.13 sec)
Records: 8 Duplicates: 0 Warnings: 0

SELECT `isbn` FROM `book`;
+---------------+
| isbn |
+---------------+
| 9781921656774 |
| 9781741752236 |
| 732268133 |
| 9781741750966 |
| 207172137 |
| 1863590285 |
| 732276748 |
| 1863252509 |
+---------------+
8 rows in set (0.00 sec)

Note the use of ALTER TABLE.  This takes a table name and then we supply a field name, and its changed definition.  Since the field name is part of the definition, and we aren't changing the name, we need to supply it again.  We have now changed the type to VARCHAR(13) which is a variable length character field that can hold a maximum of 13 characters.  For values that are shorter than this, less space is used than for the fixed length CHAR(13) equivalent. In this case it is debatable which version is best, as all books will have an ISBN, so we don't really save much space using VARCHAR vs CHAR.

ALTER TABLE can pretty much change anything about a table definition, from field definitions right up to storage engine specifications.  It is a very powerful tool in your SQL toolkit.  Well worth looking up the MySQL Reference Manual page.

We've changed the format, now we need to fix those short ones.  We need two string functions available to us with MySQL to accomplish this, one to work out which are the short ones, and the other to fix the problem.

UPDATE `book` SET `isbn` = LPAD(`isbn`, 10, '0')
 WHERE CHAR_LENGTH(`isbn`) < 10;
Query OK, 3 rows affected (0.00 sec)
Rows matched: 3 Changed: 3 Warnings: 0

SELECT `isbn` FROM `book`;
+---------------+
| isbn |
+---------------+
| 9781921656774 |
| 9781741752236 |
| 0732268133 |
| 9781741750966 |
| 0207172137 |
| 1863590285 |
| 0732276748 |
| 1863252509 |
+---------------+
8 rows in set (0.00 sec)

The WHERE clause used the CHAR_LENGTH function, which returns the length of the character string, in characters.  Note that if using a multi-byte character set this will be different to the storage space required.  The LENGTH function in these cases would return a number higher than the CHAR_LENGTH function.  We will cover character sets later in the series.

By using WHERE CHAR_LENGTH(`isbn`) < 10 we find those records that have less than 10 digits in them.  We then need to pad those out to 10 digits with leading zeroes.  This is what LPAD does.  It is short for 'left pad', or padding the start of a string with the required number of digits.  It takes three parameters, the field (or string) to be modified, the legnth to pad out to, and the string to use for padding.

Why couldn't we just LPAD the entire isbn field?  LPAD not only pads, but also truncates to the length provided.  So we would lose information on those values longer than 10 digits.

You can find more information about string (and other data type) functions in the MySQL Reference Manual section 'Fucntions and Operators'.

In our next episode we will look at using the power of the database to control data integrity.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 101 — Sorting and Searching: ORDER BY, WHERE, GROUP BY

Сентябрь 12th, 2011

In our last episode we were able to select some information from our bookshop database, this time we look at putting this into some semblence of order.  To fully investigate this topic we need a few more entries in our database, so rather than detail them here, I've put together this SQL file you can download and build your database to follow along.

To install the database, unpack the SQL file from its ZIP archive, and use the SOURCE command to pull the data into your database:

SOURCE mysql101_bookshop_20110912.sql;

You can also pass the file to the mysql command line interpreter from the shell:

mysql -uroot -p bookshop < mysql101_bookshop_20110912.sql
Password:

If you need a reminder of how to connect to your MySQL server, check out the second episode in this series.

Ordering data

We have already been able to use SELECT and JOIN to pull together information from our tables to get a composite display.  It would be nice to be able to order the data to make it easier to search.  Lets give that a try, ordering authors and their book titles.  In this example, we are using the CONCAT_WS function outlined in our previous episode to display the author name, but using the separate fields to sort.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title` FROM `author`
 INNER JOIN `book` ON `author`.`id` = `book`.`author`
 ORDER BY `author`.`last_name`, `author`.`first_name`;
+--------------+--------------------+
| author | title |
+--------------+--------------------+
| Peter Corris | Aftershock |
| Peter Corris | The Azanian Action |
| Peter Corris | Appeal Denied |
| Peter Corris | The Big Score |
| Tara Moss | Hit |
| Tara Moss | Split |
| Peter Temple | Shooting Star |
| Peter Temple | The Broken Shore |
+--------------+--------------------+
8 rows in set (0.00 sec)

We now have a list sorted by author name, sorting first by last_name, then by first_name.  The ORDER BY clause takes the name of a field or comma separated list of fields to sort.  By default the sort order is ascending in the collation of the fields in question (we'll get onto collation and character sets in a later episode).  You can make this explicit by using the ASC keyword.  Alternatively you can use DESC to reverse the sort order.  We can now restructure the list to give the publication date in most recent order first.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title`,
 YEAR(`book`.`published_date`) AS `published`
 FROM `author` INNER JOIN `book` ON `author`.`id` = `book`.`author`
 ORDER BY `book`.`published_date` DESC;
+--------------+--------------------+-----------+
| author | title | published |
+--------------+--------------------+-----------+
| Peter Temple | The Broken Shore | 2010 |
| Peter Corris | Appeal Denied | 2007 |
| Peter Corris | The Big Score | 2007 |
| Tara Moss | Hit | 2006 |
| Tara Moss | Split | 2003 |
| Peter Temple | Shooting Star | 1999 |
| Peter Corris | Aftershock | 1992 |
| Peter Corris | The Azanian Action | 1991 |
+--------------+--------------------+-----------+
8 rows in set (0.00 sec)

Using the DESC keyword has allowed us to display the data in the reverse order of date.  I've also used a date function YEAR() to just pick the year portion of the date for display.  There are other date functions (and indeed functions for most data types) to provide formatting or selection criteria for your data.  As this series progresses we will introduce many of these.

Aggregation and Grouping

What if we want to present a count of the books of a particular author?  For this we need a function to count the records, which is conveniently called COUNT(), and we need to be able to group the counts, and GROUP BY provides this functionality.  But lets take a look at why we need grouping.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 COUNT(*) as `number_of_books`
 FROM `author` LEFT JOIN `book` ON `author`.`id` = `book`.`author`;
+--------------+-----------------+
| author | number_of_books |
+--------------+-----------------+
| Peter Temple | 9 |
+--------------+-----------------+
1 row in set (0.00 sec)

That isn't right.  Peter Temple doesn't have 9 books in our database,  and there are no books for Tara Moss or Peter Corris.  This is why we need grouping.  What has happened here is that the COUNT(*) function counts all the rows that meet the requirements.  In our case we have just told it to count all records, we haven't told it when to stop counting or how to associate the count with data in the record.  This is where GROUP BY helps.  It allows us to use functions that result in a summary of information, and correctly correlate it with its associated data.  A picture is worth a thousand words:

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 COUNT(*) as `number_of_books`
 FROM `author` LEFT JOIN `book` ON `author`.`id` = `book`.`author`
 GROUP BY `author`.`last_name`, `author`.`first_name`;
+---------------+-----------------+
| author | number_of_books |
+---------------+-----------------+
| Peter Corris | 4 |
| Merrilee Moss | 1 |
| Tara Moss | 2 |
| Peter Temple | 2 |
+---------------+-----------------+
4 rows in set (0.00 sec)

But wait a minute, this shows Merrilee Moss as having 1 book, but we know we don't have any books for her.  How is this possible?  It turns out that it is a result of two effects.  Firstly we used LEFT JOIN, which we know from our last episode that this will put out a row with all of the book table fields set to NULL.  But there is a record, and therefore COUNT(*) correctly returns that fact.  However, the record represents the absence of a book, not its presence, so we can use another effect to get the correct count.  If we supply a field name to COUNT instead of *, and that field is NULL, COUNT won't count it.  You can verify this using the price field on the book table, which we know has a null value for Peter Temple's "The Broken Shore".

SELECT COUNT(*) FROM `book`;
+----------+
| COUNT(*) |
+----------+
| 8 |
+----------+
1 row in set (0.00 sec)

SELECT COUNT(`price`) FROM `book`;
+----------------+
| COUNT(`price`) |
+----------------+
| 7 |
+----------------+
1 row in set (0.00 sec

This allows us to use a field that should exist in all books, but will be NULL when there is no matching entry in a LEFT JOIN.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 COUNT(`book`.`id`) as `number_of_books`
 FROM `author` LEFT JOIN `book` ON `author`.`id` = `book`.`author`
 GROUP BY `author`.`last_name`, `author`.`first_name`;
+---------------+-----------------+
| author | number_of_books |
+---------------+-----------------+
| Peter Corris | 4 |
| Merrilee Moss | 0 |
| Tara Moss | 2 |
| Peter Temple | 2 |
+---------------+-----------------+
4 rows in set (0.00 sec)

Much better.  All that we have done is replace the * with `book`.`id` in the COUNT function.  The id field is unique to each book, and is never NULL.  When we do a LEFT JOIN and there is no matching record on the right table, however, we get the effect of having a record generated with all fields set to NULL, so we get the NULL id field which is exactly what we need in this instance.

A short word on GROUP BY.  The SQL standard requires that all fields in the SELECT statement, other than the grouping function, should appear in the GROUP BY statement.  Many other databases enforce this, although MySQL is a little more flexible in this regard.  Regardless it makes sense to stick to the standard if you want to write even remotely portable SQL code.

There are other summary or grouping functions, like AVG, MAX and MIN.  We will see more on these when we get to reporting.

Searching and constraining results

We now have the ability to sort, and group our data.  Now if we want to search, we need to be able to select just those records that match our search term.  We've already seen this in summary in our last episode, but we can explore it a little more.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title` FROM `author`
 LEFT JOIN `book` ON `author`.`id` = `book`.`author`
 LEFT JOIN `publisher` ON `publisher`.`id` = `book`.`publisher`
 WHERE `publisher`.`name` = 'Bantam';
+--------------+---------------+
| author | title |
+--------------+---------------+
| Peter Corris | Aftershock |
| Peter Temple | Shooting Star |
+--------------+---------------+
2 rows in set (0.00 sec)

Note that to search on a publisher name, we need to make sure it is JOINed in the query, otherwise there is no way we can associate the name with the books from that publisher.   What about an alphabetical listing of titles?  Can we do that?  Sure, we could, for instance find all those books that start with the letter A in their title.

SELECT `book`.`title` FROM `book`
 WHERE `book`.`title` LIKE 'A%'
 ORDER BY `book`.`title`;
+---------------+
| title |
+---------------+
| Aftershock |
| Appeal Denied |
+---------------+
2 rows in set (0.00 sec)

Here we are using the LIKE keyword to find those that have an A as the first character.  The percent sign '%' is used as a wildcard to indicate all matching fields.  When using the % wildcard you need to be aware that if the percent is used at the start, MySQL will not be able to use an index, and will instead need to match against every row (a full table scan).  With a large data set this can be very slow, and should be avoided.

Unless the field you are searching is using a binary collation both LIKE and = perform case insensitive searches.  I.e. both "LIKE 'a%'" and "LIKE 'A%'" will produce identical results.

With GROUP BY we can also use the HAVING keyword to provide constraints.

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 COUNT(`book`.`id`) as `number_of_books`
 FROM `author` LEFT JOIN `book` ON `author`.`id` = `book`.`author`
 GROUP BY `author`.`last_name`, `author`.`first_name`
 HAVING `number_of_books` > 2;
+--------------+-----------------+
| author | number_of_books |
+--------------+-----------------+
| Peter Corris | 4 |
+--------------+-----------------+
1 row in set (0.00 sec)

Note that we could not have put the constraint on the number of books in a where clause, as it would be evaluated before the group by, and hence before we had a count.  Also note that we have used the field alias in the HAVING clause.

That's probably enough for now.  In our next episode we'll look at updating data and changing table definitions.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 101 — Retrieving data: SELECT and JOIN

Сентябрь 9th, 2011

In our last episode we started building up our online bookshop database, with tables for publishers, authors, formats and books.  At the moment we only have one book in there, so before we go too far, lets add a few more:

INSERT INTO `book` VALUES
( NULL, 'The Big Score', 2, 4, 1, '2007-01-01', 9781741752236, 29.95 ),
( NULL, 'Split', 3, 2, 1, '2003-01-01', 0732268133, 29.95 );

So what is this NULL thing, and why have I used it?  If you remember we set the first field to an auto_increment id.  Because we don't want to supply a value for this, but let the database create the next value, we need to give a value that indicates we want this to happen.  For this instance, NULL is the value to use.  We must supply a value because we didn't restrict our insert by supplying a field list, so we need to supply values for all fields in the table.

Let's have a look at the data in the book table now:

mysql> SELECT * FROM `book`;
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| id | title | author | publisher | format | published_date | isbn | price |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+
| 1 | The Broken Shore | 1 | 3 | 1 | 2010-09-27 | 9781921656774 | NULL |
| 2 | The Big Score | 2 | 4 | 1 | 2007-01-01 | 9781741752236 | 29.95 |
| 3 | Split | 3 | 2 | 1 | 2003-01-01 | 732268133 | 29.95 |
+----+------------------+--------+-----------+--------+----------------+---------------+-------+

 

Note how the auto_increment id field has been updated to create a sequential number for each new entry.

Hmm, we seem to have a few issues here.  In our last episode we created the Peter Temple book by explicitly naming the fields, and one field we left out was price.  Before we can sell that we will need to update the value, however for the moment it shows that leaving fields out will cause the database to supply a default value for us.  In the case of the price field it was not marked as 'NOT NULL', so NULL values are allowed, and we didn't give a DEFAULT so the default is NULL.  We'll look at updating data next week.

One other thing to note is that our ISBN field was defined as a numeric field - DECIMAL(13), and this has the effect of losing leading zeroes, as for a number, the initial zero does not change the value.  In a future episode we'll look at ways around this.  For now it simply serves to show that to get the database definition correct you need to analyse the data requirements in order to get your table definitions right.

Picking the fields we want

So far we have just been using SELECT * to pull all the available fields out of a table.  Now lets look at getting just the fields we want.  Title, Author and Price would be useful for a price list.

SELECT `title`, `author`, `price` FROM `book`;
+------------------+--------+-------+
| title | author | price |
+------------------+--------+-------+
| The Broken Shore | 1 | NULL |
| The Big Score | 2 | 29.95 |
| Split | 3 | 29.95 |
+------------------+--------+-------+
3 rows in set (0.00 sec)

Note that we can provide the fields we want, in the order we want them.  But 'author' isn't the name of the author, it is the id of the author field.  We can use an alias to make this explicit:

SELECT `title`, `author` AS `author_id`, `price` FROM `book`;
+------------------+-----------+-------+
| title | author_id | price |
+------------------+-----------+-------+
| The Broken Shore | 1 | NULL |
| The Big Score | 2 | 29.95 |
| Split | 3 | 29.95 |
+------------------+-----------+-------+
3 rows in set (0.00 sec)

Same data, but note the column heading.  We've effectively renamed the column for the purposes of our query. We haven't changed anything in the database, but this allows us to provide convenient names for use in clarifying data or simplifying queries.  We'll see more on this later.  But we can't provide a number to people instead of the author's name, that wouldn't look very good or make a lot of sense.

Getting data from multiple tables - JOIN

Fortunately there is a solution.  The JOIN clause.  There are a few of them, and each has a different purpose.  The simplest is the comma operator, also known as an implicit join:

SELECT `book`.`title`, `author`.`last_name` FROM `book`, `author`;
+------------------+-----------+
| title | last_name |
+------------------+-----------+
| Split | Temple |
| The Big Score | Temple |
| The Broken Shore | Temple |
| Split | Corris |
| The Big Score | Corris |
| The Broken Shore | Corris |
| Split | Moss |
| The Big Score | Moss |
| The Broken Shore | Moss |
| Split | Moss |
| The Big Score | Moss |
| The Broken Shore | Moss |
+------------------+-----------+
12 rows in set (0.00 sec)

Wait a minute! That isn't right.  We only have 3 books in the database.  What is going on?

This is one of the problems with an implicit join.  Unless you constrain the join it will be what is called a cartesian product.  That is for every row in each table, every row in every other table listed in the join will be matched, so in our case we have four authors and three books, so we get (4 * 3) = 12 results.  Clearly this isn't what we want, and is one of the reasons the comma operator is frowned upon when developing complex queries - it is far too easy to get it wrong.

Instead lets use the JOIN syntax and provide a join condition.

SELECT `book`.`title`, `author`.`last_name` FROM `book`
 JOIN `author` ON `book`.`author` = `author`.`id`;
+------------------+-----------+
| title | last_name |
+------------------+-----------+
| The Broken Shore | Temple |
| The Big Score | Corris |
| Split | Moss |
+------------------+-----------+
3 rows in set (0.00 sec)

Much better! Now we have each book and the correct author.  Oh, by the way, JOIN is a synonym for INNER JOIN, and the result shows records where records from both tables on the left and right of the JOIN match the join condition. There are other possiblities which we'll look at shortly.

We now have a way of getting the data we want out of the database.  What about where we only want books from one publisher?  This is where the WHERE clause comes in.

SELECT `book`.`title`, `author`.`last_name` FROM `book`
JOIN `author` ON `book`.`author` = `author`.`id` WHERE `publisher` = 2;
+-------+-----------+
| title | last_name |
+-------+-----------+
| Split | Moss |
+-------+-----------+
1 row in set (0.00 sec)

Lets now pull a full list of our books, including publisher and format.  This involves a number of joins, but each one builds on the last, so it isn't really a stretch.

SELECT `title`,
 CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `publisher`.`name` AS `publisher`,
 `format`.`name` as `format`,
 `price`, `isbn` FROM `book`
 JOIN `author` ON `book`.`author` = `author`.`id`
 JOIN `publisher` ON `book`.`publisher` = `publisher`.`id`
 JOIN `format` ON `book`.`format` = `format`.`id`;
+------------------+--------------+-----------------+-----------+-------+---------------+
| title | author | publisher | format | price | isbn |
+------------------+--------------+-----------------+-----------+-------+---------------+
| The Broken Shore | Peter Temple | Text | Paperback | NULL | 9781921656774 |
| The Big Score | Peter Corris | Allen and Unwin | Paperback | 29.95 | 9781741752236 |
| Split | Tara Moss | Harper Collins | Paperback | 29.95 | 732268133 |
+------------------+--------------+-----------------+-----------+-------+---------------+
3 rows in set (0.11 sec)

I've taken the liberty of adding a few more features to this query.  So lets go through it.   In all our multi-table queries so far we've used the full `table`.`field` syntax.  This is the preferred syntax and is in fact mandatory if there are similarly named fields in the tables in the query.  For example, we have a `name` field in both the pubisher and format tables, so simply specifying `name` would make it impossible for MySQL to determine which table you were talking about. In this query we've used just `title`, `price` and `isbn` without specifying which table.  We can do this because these fields only exist in one table, so there is no ambiguity.

Rather than pull out the author name as two separate fields, I've used the CONCAT_WS function to create a single output field from the two name fields in the author table.  CONCAT_WS is short for concatenate with separator, where concatenate means to place together.  The first argument is the string to use to place between the output fields, in our case a space, and the following fields are the data fields to concatenate.

Note that we don't have any entry for Merrilee Moss, because we don't have a book entry for her.  This is the way an INNER JOIN works.  What does this mean for getting a list of authors and their books?  Let's take a look at two ways of doing this:

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title`
 FROM `author`
 INNER JOIN `book` ON `author`.`id` = `book`.`author`;
+--------------+------------------+
| author | title |
+--------------+------------------+
| Peter Temple | The Broken Shore |
| Peter Corris | The Big Score |
| Tara Moss | Split |
+--------------+------------------+
3 rows in set (0.00 sec)

SELECT CONCAT_WS(' ',`author`.`first_name`,`author`.`last_name`) AS `author`,
 `book`.`title`
 FROM `author`
 LEFT JOIN `book` ON `author`.`id` = `book`.`author`;
+---------------+------------------+
| author | title |
+---------------+------------------+
| Peter Temple | The Broken Shore |
| Peter Corris | The Big Score |
| Tara Moss | Split |
| Merrilee Moss | NULL |
+---------------+------------------+
4 rows in set (0.00 sec)

Note that in the first version, with INNER JOIN (and remember this is a synonym for JOIN) we are seeing only those authors who have books.   The second version, using LEFT JOIN (which itself is a synonym for LEFT OUTER JOIN) we see all our authors, and those with books have their books listed.  Merrilee Moss is listed with NULL in the book title field, indicating that she has no books listed yet.  LEFT JOIN will return all records from the table to the left of the LEFT JOIN and matching records from the table to the right, but if there are no matching records a record will be created with all fields set to NULL.  There is an analagous RIGHT JOIN operator for compatibility with other systems, however in most cases you can use LEFT JOIN with the tables reversed to achieve the same result.

In our next episode we'll look at searching using WHERE, and ordering and grouping our data.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 101 — Creating your first database

Сентябрь 5th, 2011

In our last episode we found out how to connect to a MySQL server.  This time we learn how to lay out a database and start creating it. For this, and following episodes, we will be looking at creating a database to support an online bookshop.

Creating the database

Using the mysql command line client, you can connect to the server and then create the database.  We need a name for the database, and in this case we'll call it 'bookshop'.   We'll also create a user who is specifically allowed to add and update the database, but not alter its structure:

mysql> CREATE DATABASE `bookshop`;
Query OK, 1 row affected (0.01 sec)

mysql> GRANT INSERT, SELECT, UPDATE, DELETE ON `bookshop`.* to 'bookuser'@'localhost' identified by 'bookpass';
Query OK, 0 rows affected (0.00 sec)

The first command created the database, but it is only accessible by users that have global database privileges. The second command gives the user 'bookuser' access when connecting locally to the four basic data management functions, often referred to as CRUD (Create, Read, Update, Delete).  These correlate to the SQL statements INSERT, SELECT, UPDATE and DELETE. For more information on the GRANT syntax, review the last episode.

Note that we have used two different types of quotes around different parts of the query.  When referring to data structures like the database, tables, columns and indexes, you should always surround them with the backtick (`) (sometimes called an opening single quote).  String data, like the username, the host and the password, should always be surrounded by the single quote character (').  This helps the query parser to identify the parts of the query correctly and result in less likliehood of a query syntax error if you inadvertently use a reserved word.

Tables, Columns and Indexes

Now we have the database we need to work out what data we want to store in it. Databases store related data in tables, structured as columns.  To be able to efficiently find things, indexes are also required.  Lets look at each element in turn.

Tables

A table allows you to store related data together. Data is stored in rows, and each row is constructed of columns. Because each row must have the same columns as every other row, the data must be structured and must be related for the table to make sense.  In our case we may have a table that defines all the books we want to sell.  It would make no sense to also try and store all of the sales in that same table.

Tables can have a storage engine associated with them.  This defines how the data will actually be stored on disk, or in memory.  MySQL supports several storage engines, with the most popular being InnoDB.  This is fast taking over from the original storage engine, MyISAM.  Other storage engines include ARCHIVE, CSV, BLACKHOLE and FEDERATED. If your database server is set up correctly, you should not need to worry about this for most uses, and we will cover some special use cases of storage engines later in this series.

Columns

A column is a collection of like information that makes up a single attribute of the data stored in a table. Columns are defined by a data storage type, which can be INTEGER, CHAR, DECIMAL, FLOAT, TEXT, BLOB and a number of variations on these.  For instance, VARCHAR is a variable length CHAR type and is often used for string data as it is more space efficient than the equivalent CHAR.  Columns are also defined by a length.  Defining too great a length results in wasted space and lowered performance, and too small means you may get truncation and losing information.

As an example, the price of a book is going to be DECIMAL as it will always contain decimal numbers, and since we are selling books that people should be able to afford, we probably don't need to be able to store a number greater than 999.99.  In MySQL we would therefore define the price column as DECIMAL(5,2).  The 5 refers to the total number of digits expressed (the precision), and the 2 is the number following the decimal point (the scale).

Indexes

One of the great benefits of using a database is the ability to find items by various attributes.  For instance a book will probably be searched for by title, but might also be searched for by author or even publisher. Without an index the database would need to match every record in the database to the search criteria you've supplied to find the right book. With a large number of books this could take a long time.  To solve this you could create an index on the title, or the author, or the publisher, or all three.  Indexes are structured in such a way that you don't have to search all of the records in a database to find a match, and the index tends to be a lot smaller than the data it is indexing, so even if it has to scan an entire index, this will be less work than scanning all of the data in the table.

A word of warning though.  Every time you create, update or delete a record in a table, every index associated with that record will also need to be updated.  So you may want to create indexes on every field just in case, but that would not be a good idea as it would make updates extremely slow.  It is far better to analyse what indexes are required than use a shotgun approach.

What is a book?

You know what a book is, right?  Lots of bits of paper bound together with words in it.  This is fine for humans, but if you wanted to describe a book in such a way that a computer could find information about it for you you would need to determine what the unique attributes of the book are.

So, for our books, seeing as we are trying to sell them, we might have the following attributes:

  • Title
  • Author
  • Publisher
  • Date publised
  • ISBN
  • Price
  • Format (paperback/hardback/trade)

Looks like we have the basis for our first table.  A possible table would be:

CREATE TABLE `book` (
`title` VARCHAR(200),
`author` VARCHAR(100),
`publisher` VARCHAR(50),
`published_date` DATE,
`isbn` DECIMAL(13),
`price` DECIMAL(5,2),
`format` VARCHAR(20)
);

Before hitting the Return key, we probably want to look a little closer at what we are storing here.

Looking at these attributes you may notice that some will be shared by more than one book.  For instance, an author may have written multiple books.  A publisher, unless the book is self-published, is likely to have published many books.  This gives us the opportunity to save a lot of storage space, and to improve our database design.  After all we are using a Relational Database system, so lets create some relationships.

What about Author?  Really an author has at a minimum a name.  They are likely to have a first name and a last name, and perhaps a middle initial.  For our purposes though, lets keep it simple with just a first and last name.  Why? If you are looking for a John Grisham novel, you would probably search for Grisham, and not John as that would match way too many authors not related to the John you are after.  So lets first create our author table:

CREATE TABLE `author` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`first_name` VARCHAR(50),
`last_name` VARCHAR(50) NOT NULL,
PRIMARY KEY (`id`),
KEY (`last_name`)
);

Whoa!  There is a lot more there than just a first and last name.  What's going on?  Taking it one step at a time:

`id` INTEGER NOT NULL AUTO_INCREMENT

This is one of the more important thing you need to understand about relationships.  Picking the right field to relate.  When we finally create the book table we need to be able to identify which author we are talking about.  So we need a common field, one that exists in both tables.  With just first name and last name, we are going to have problems.  For instance both Merrilee Moss and Tara Moss write books.  Both Peter Temple and Peter Corris write books.  So neither first name nor last name are unique.  This is where id comes in.  It is an automatically generated number (the AUTO_INCREMENT sees to that), that is independent of the other fields.  It can uniquely identify an author record. To ensure it must exist we also mark it as NOT NULL.

PRIMARY KEY (`id`)

This is the other requirement of a relation field.  It needs to be indexed, and it needs to be a unique index.  KEY is the keyword in SQL that identifies that this is an index, and the name of the field (or fields, we can have more than one) that forms the index is in brackets.  PRIMARY KEYs are special.  Not only are they unique, but as the name implies there can only be one per table.  Some storage engines make special arrangements for primary keys to improve the speed at which data is retrieved.  For instance, the InnoDB engine stores the primary key along with the full row of data for a record.  It doesn't create a separate index.  This means that primary key lookups are very fast as there is no need to go looking for the data associated with the key.

KEY (`last_name`)

This one creates another index on the last_name field.  Because we have left off the UNIQUE keyword, this will allow duplicates on the last name field.   Since the most common lookup of an author is going to be by last name, this index makes a lot of sense.  Note that the field last_name also has a NOT NULL declaration.  This ensures we can't have an empty last name, which wouldn't make a lot of sense.

Lets create a few more tables that we are likely to need:

CREATE TABLE `publisher` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) NOT NULL,
`notes` TEXT,
PRIMARY KEY (`id`),
KEY (`name`)
);

CREATE TABLE `format` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`name` VARCHAR(20) NOT NULL,
`notes` TEXT,
PRIMARY KEY (`id`),
KEY (`name`)
);

Note how similar these two tables are.  Both have a name we can search on, both have a TEXT field that allows us to store (wait for it ...) text associated with the entry, and both have an id field.

This should be enough to start with.  We can refine our design later, but for now lets create the book table, using the relationships we've now defined with the above tables.

CREATE TABLE `book` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`title` VARCHAR(200) NOT NULL,
`author` INTEGER NOT NULL DEFAULT 0,
`publisher` INTEGER NOT NULL DEFAULT 0,
`format` INTEGER NOT NULL DEFAULT 0,
`published_date` DATE,
`isbn` DECIMAL(13),
`price` DECIMAL(5,2),
PRIMARY KEY(`id`),
KEY (`title`),
KEY (`author`),
KEY (`publisher`),
KEY (`format`)
);

Now we have a book table that uses integer values that match the id fields in the related tables.  In this example we've used the table name of the related table as the field name of the relation.  We could have used something like `author_id` to make it clear we are talking about the id field in the author table.  It doesn't really matter, provided you are consistent.  Just on consistency I've also made all the table names singular. Note that we have an id field in our book table as well.  This will be useful later.

The format of the fields that link tables should have the same format as the field they link to.  In all cases id is defined as an INTEGER field, so the fields in the book table that link to the id fields in the author, publisher and format tables must also be INTEGER.  You'll also notice that we have marked them as NOT NULL, which means we must have a value in these fields.  If we don't supply a value, the database will use the DEFAULT value.  In this case I've made this the number 0, but since this cannot be a valid id value, this is really a catch-all.  It might have been better creating 'UNKNOWN' entries in the author, publisher and format tables and using the id number of those records as the default.  Even better would be not to supply a default as this would force there to be a value that should match a valid record in the parent tables.

Checking out our work

Time to see what the database structure looks like.  We have a few options here.  The easiest is the "SHOW TABLES" query:

mysql> SHOW TABLES;
+--------------------+
| Tables_in_bookshop |
+--------------------+
| author |
| book |
| format |
| publisher |
+--------------------+
4 rows in set (0.00 sec)

Looks good.  We can use the 'DESCRIBE' query to determine how each table is set up:

mysql> DESCRIBE `author`;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| first_name | varchar(50) | YES | | NULL | |
| last_name | varchar(50) | NO | MUL | NULL | |
+------------+-------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

Here we can see that our id field is the PRIMARY key (PRI in the Key field), and that it is using AUTO_INCREMENT. This flag means that each new record that doesn't explicitly set the id field will have it set to the next sequential number greater than the highest used number in that field.

Adding data

Time to add some data.  First of all the reference tables.  These are the author, publisher and format that are referenced from the book table.

INSERT INTO `author` (`first_name`, `last_name`)
VALUES
('Peter', 'Temple'),
('Peter', 'Corris'),
('Tara', 'Moss'),
('Merrilee', 'Moss');

In just one query we've added four authors.  INSERT INTO expects the name of a table, and an optional field list.  If you don't supply the field list it will expect  the corresponding VALUES statements to include every field, in the order they appear in the DESCRIBE output.   Each bracketted VALUES entry provides the data matching the field names supplied (or implied) by the field list.

A few quick ones to get us rolling, note I'm not filling in the notes field yet, we'll deal with that in a later episode.

INSERT INTO `publisher` (`name`)  VALUES ('Random House'), ('Harper Collins'), ('Text'), ('Allen and Unwin'); 
INSERT INTO `format` SET `name` = 'Paperback';
INSERT INTO `format` SET `name` = 'Hardback';

Notice in the case of the format table I've used an alternate syntax, it does the same thing, but is often easier to understand than the VALUES syntax.

We have the basic table information here, so now lets put in a book or two, we'll use the SET syntax to make it a little easier to follow.  Before we can start though, we need to find out the ids that were generated when we added the above data, so we need a quick query. We'll use the SELECT statement to request information on what is held in the database.  This is the simplest form of the SELECT statement, where * means all fields.  Otherwise we would need to use a comma separated list of fields we wanted to see.

SELECT * FROM `author`;
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 1 | Peter | Temple |
| 2 | Peter | Corris |
| 3 | Tara | Moss |
| 4 | Merrilee | Moss |
+----+------------+-----------+
4 rows in set (0.00 sec)

SELECT * FROM `publisher`;
+----+-----------------+-------+
| id | name | notes |
+----+-----------------+-------+
| 1 | Random House | NULL |
| 2 | Harper Collins | NULL |
| 3 | Text | NULL |
| 4 | Allen and Unwin | NULL |
+----+-----------------+-------+
4 rows in set (0.00 sec)

SELECT * FROM `format`;
+----+-----------+-------+
| id | name | notes |
+----+-----------+-------+
| 1 | Paperback | NULL |
| 2 | Hardback | NULL |
+----+-----------+-------+
2 rows in set (0.00 sec)

Now we have everything we need to create some books.   Peter Temple wrote "The Broken Shore", which was published by Text in paperback under the ISBN 9781921656774 in September 2010.  Now we need to structure that so the database can store it:

INSERT INTO `book`
SET `title` = 'The Broken Shore',
`author` = 1,
`publisher` = 3,
`format` = 1,
`isbn` = 9781921656774,
`published_date` = '2010-09-27';

You'll be starting to see a pattern developing here. While I haven't stated it previously, all SQL statements must be correctly terminated.  In MySQL as with many other SQL derivatives, the semicolon ';' acts as a terminator.  Think of it like the full stop at the end of a sentence.  Also you'll note that numeric data doesn't need quotes around it, but dates do.

The date format used above is the default and is an ISO (International Standards Organisation) standard.  Often called International or European format, it is in year-month-day format.

In our next episode we'll look at queries to pull information out of the database in a meaningful manner and to search based on our relationships.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL and 'MySQL Enterprise'  are registered trademarks of Oracle Corporation


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 101 — Connecting to a MySQL server

Август 30th, 2011

In our last episode we looked at getting MySQL, today we will be looking at how you connect to a MySQL server and what that implies.

Connection basics

Before a client can connect to a MySQL server it needs a path by which that connection can be made. One method that is almost universal is the 'socket'.  As its name implies it is a way of plugging two (or more) applications together.  Sockets can either be end points for a network connection (for instance a TCP socket) or can use the same system-level functions but use a local connection.  This is sometimes called a UNIX socket, and relies on there being a special file that the two applications can use to initiate a connection.  MySQL can use both.

For a network connection you need a number of pieces of information.  As a connection has two endpoints you need information for both ends, as a connection is uniquely defined by its endpoints.  An endpoint is defined by its address, the port, and the protocol.  As we will be talking about TCP connections the protocol is always going to be TCP. The address and port can be thought of as the connections on a tanker filling a number of tanks, the pipes between them need to conect the right pump with the right tank.  The address is the tanker, the port is the individual outlet. Generally you don't need to worry about the address and port at the end from which you are making connections, they are normally handled automatically. To make a connection to a remote MySQL server you really only need to supply the address (or hostname) of the remote server, the port will default to 3306 unless specified or unless the configuration file dictates another port.

With a UNIX socket you need the name of a special file.  There is no port associated with it and it always resides on the same server as the applications that use it.   So you don't have to know the name of the file, MySQL uses the special host name 'localhost' to indicate that the socket method should be used, and gets the socket name from its configuration file.

Localhost vs 127.0.0.1

With network connections, there are two special addresses that always refer to the machine you are currently using. This is either called 'localhost' or has the IP address of 127.0.0.1.  In general using these is equivalent and interchangeable, and uses a special path through the networking subsystem called the local loopback interface. However MySQL has reserved 'localhost' for the special purpose of indicating the use of a UNIX socket and not use networking.  This can cause some confusion to new users, but if you understand what is going on it is pretty clear.

'localhost' in MySQL uses the UNIX socket and ignores any port you may supply.

127.0.0.1 in MySQL uses networking and requires a port, using the default if you don't supply it.

Why it matters - Authentication

When you make a connection to a MySQL server you need to supply more than just the address.  You also need to supply a user and a password, and will at some point need to identify the database you want to use.  How these are authenticated depends on the path that you use to connect.

When you supply a user and password for a database the MySQL server checks what connection information you supplied and checks the mysql.user table first.  If you look at that table you will find that it includes the fields User, Host and Password, along with a number of permission fields.  Both the User and Host fields can include the wildcard character '%' to indicate all possible values.

The Host field indicates the hostname or IP address from which you are connecting. So you can have a user with the same name but connecting in different ways having completely different permissions, and even different passwords! What is more important is that a user on the same host can end up being treated differently if they use the socket connection (by specifying localhost as the host) or using a network connection (by specifying the IP address 127.0.0.1).

To allow a user access to a particular database, the most common method is to use the GRANT syntax.  Here are a few examples:

GRANT ALL ON mydb.* TO 'myuser'@'%' IDENTIFIED BY 'reallylongpassword';
GRANT ALL ON mydb.* TO 'myuser'@'localhost' IDENTIFIED BY 'anotherlongpassword';

The syntax is pretty simple, the word (or words) after GRANT indicate the permissions being granted, after ON is given the table specification, in database.table format.  In our case we are granting all permissions on all tables in the mydb database.  After TO is a user specification indicating the user followed by @ followed by the hostname.  Finally a password is supplied in the IDENTIFIED BY clause.  More information can be found in the MySQL Reference Manual.

Now look at those two statements.  The first grants access to, and creates, the user 'myuser', regardless of which host he is connecting from (the % being a wildcard).  The second then grants access to 'myuser' when coming from 'localhost'.  Surely the second is not going to be required?  Wrong.  Remember that 'localhost' is special and you need to ensure you have access granted for users connecting locally.  This is by far the most common mistake made by new MySQL admins.

Putting it all together

We now want to connect to our MySQL server.  We need four vital pieces of information, the user name, their password, the host and finally the database name.  We may also need a port or a socket file, but generally they will be implied.

If you have set up MySQL on your own host, generally you will be using 'localhost' as the host name.  The database and user will be dictated by the administrator and if that isn't you, you will need to ask.  If you start the mysql command line utility without supplying any connection information it will assume your operating system level username, 'localhost', and no password.

If you are using a hosted site, then you will need to check with the hosting provider.  Quite often the host will be 'locahost', but just as often it won't. You need to check.

How you connect depends on what your application is.  You can always check the connection using the mysql command line interface (if you have access to it).  It is by far the best way of checking for errors and debugging connection issues.

Using the mysql command line interface, you can supply the user, the host, the password and the database for the connection:

mysql -u username -ppassword -h host databasename

Note that there is no space between the -p and the password, if you supply a space the password will be read as being the database name instead and will prompt you for the password.  You can leave the password out, and just supply the -p for mysql to prompt for the password.  This is the preferred method as it is more secure than passing it on the command line.

If you are setting up an application then read the application installation guide and it will explain how to set up the hostname, username, password and database.  In general there will be a configuration file where you add these details.

But I can't connect! What's wrong?

If you've been following thus far you should have a good idea where to start looking, however here is a simple checklist to determine where the problem lies.

Do you have the right host?

Check that there is in fact a MySQL server on the host you are trying to connect to and remember the difference between 'localhost' and 127.0.0.1. If the information was supplied by a third party check that you have the correct details.

Is the host contactable?

Try using basic network tools like 'ping' to check if the host is responding on the network.  If not then you need to check if you have a network connection and check with the host owner to see that it is reachable.

Is MySQL running?

You can use 'telnet' to connect to the host and port and you should get some information back.  If you cannot connect then chances are the MySQL server is not running, or not listening on that port.  An example session would look something like this:

$ telnet localhost 3306
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
=
5.1.54-1ubuntu4??X32D>G.?"XxqPz6<i}X"Connection closed by foreign host.

Does the user exist?

If you get errors like:

ERROR 1045 (28000): Access denied for user 'fred'@'localhost' (using password: YES)

This could be that the user hasn't been granted access, that the user doesn't exist or that the password is wrong. Remember that a user is associated with a host, so check that the user on that host has access to the required database.

Does the database exist?  Do you have access?

Try connecting without supplying a database name.  As long as you have a user/host pairing available you should be able to connect to a MySQL server.  Then once connected try using the database.  You can use 'USE dbname' in the MySQL command line to check.

mysql> USE mydb;

 

In our next issue we will look at basic database management, how to create and use a database and table.

If you find this series of use and are looking for more detail, there may be a MySQL training course located near you.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL and 'MySQL Enterprise'  are registered trademarks of Oracle Corporation


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 101 — Getting MySQL

Август 29th, 2011

This is the first in a series of posts on MySQL® for those new to the database, or those migrating from another DBMS.

So you've made the decision to try MySQL.  Now you just have to get it installed. Luckily for most purposes MySQL is quite often already available.  If you have a Linux installation then chances are that both the server and client are installed.  If you are planning on using MySQL for your website, chances are the hosting provider gives you several MySQL databases for your use.

Before diving in too deep though, let's get some background and terminology out of the way.

What is MySQL?

This depends.  MySQL was a company, is a trademark, is an ecosystem and is the name of arguably the most popular relational database management system (RDBMS) on the planet.  Originally developed by MySQL AB, MySQL (both the database and the trademark) are now owned by Oracle Corporation. But the story doesn't stop there.  Because the MySQL code is released under the GNU Public License (GPL) v2.0 many other companies and individuals have contributed to the code, either releasing plugins, patches, tools or even code forks (new versions of the code that share a common inheritence, but are not controlled by the original company).

Why do I need it?

Virtually all programs are a combination of code and data.  The data part can be quite simple, or quite complex. For simple applications you can easily manage the data in your code, but for more complex applications the code required just to manage the data can become a large percentage of your entire codebase.  This is where a DBMS (Database Management System) comes into play.  It is both a storage and retrieval system and code hooks to allow application programmers to worry about their application and not about their data.  MySQL, like many DBMS uses SQL (Structured Query Language) to provide the interface between the application and its data. But more on SQL in a later post.

Most websites also have data requirements. This blog site for instance uses code that allows me to edit my posts and categorise them, and share them with the world.  All of the words you see are stored in a MySQL database.  MySQL is FTW! (For the Web, as well as For the Win).

Client? Server? What are these?

MySQL generally refers to the server - the piece of software that actually stores the data on disk and retrieves it for you. It can get confusing though as the CLI (Command Line Interface) utility is also called 'mysql'. Where it matters I'll refer to mysql (note the lower case) when referring to the CLI, and mysqld (note the d) when referring to the server process. The mysql CLI is not the only client though.  Your application can also be a client, and talks to the server using an API (Application Programming Interface).  For instance you may see reference to libmysql or libmysqlclient, which are the C programming libraries that implement the MySQL API.  Languages like PHP, Java, .Net, PERL and Python will provide methods to talk to the MySQL server, usually in a module called a Connector.  On the mysql.com website you will see references to Connector/Net, Connector/J, etc.

The MySQL server process, mysqld, can run on the same computer as client programs or it can be remote. Should your server be remote you may only need to install the client libraries or connectors for your application or language.

Community vs Enterprise vs GA

One of the stumbling blocks you will find when looking on the web for MySQL is the use of the terms GA, Community and Enterprise.  GA is an indication of the build quality of the code. It is an abbreviation of 'Generally Available'.  Many other products use the terms 'Production Ready', 'stable' or some other indication of the code's readiness for production code. You don't want to run a pre-release version which is marked as 'alpha', 'beta' or 'rc' (release candidate).  GA is the "stamp of approval" that gives you that confidence.

The 'Community' vs 'Enterprise' is a slightly different proposition. 'Enterprise' is in fact a product that includes both the software and support and update services for MySQL.  'Community' refers to just the software.  You can download 'Community' versions freely, and up until recently there was no difference between the code downloaded as 'Community' and that provided with 'Enterprise' - the only difference was that you got services with 'Enterprise' that you didn't get with 'Community', which is why there is a price for 'Enterprise' and not for 'Community'.

Getting MySQL

OK, back to business.  You've got your website or your application all mapped out and now you need to get MySQL installed.  Where do you start?  As mentioned it may be simpler than you think.

Linux

On most Linux distributions MySQL will be available via the package management system in use.  There are two main package types in general use: RPM and DEB.  RPM was originally developed by RedHat and is used by many distributions such as RedHat, Fedora, SuSE and CentOS. DEB is used by Debian-based distributions such as Debian and Ubuntu.  Generally the packages are named similarly, 'mysql' for the client and 'mysql-server' for the server. There may be other packages such as 'mysql-common' for the common libraries.  Fortunately you don't need to worry about that much as the package managers deal with dependencies for you.

So, on a Debian based system like  Ubuntu system you would use something like:

sudo apt-get install mysql mysql-server

This installs both the client (CLI) and server components, and any dependencies required for them to work.

On an RPM-based system like Fedora you would use:

sudo yum install mysql mysql-server

Simple!  You now have the server and a command line utility installed.

Official Packages

You may be wondering why there are downloads on various sites, including mysql.com of packages for Linux distributions if you can just install it from the packages supplied by your distribution.  The main reason is that the official packages tend to be more recent and have bug fixes that may not yet be available in your distribution release. There are problems with some of these though.  For instance it has been a nagging issue in the RedHat releases that the official packages have different dependencies to the RedHat supplied packages.  This means that you cannot cleanly upgrade an already installed RedHat system with the official packages. This may be resolved by the time of writing, however it has been an issue for a number of years.  So if you need the latest version, make sure you don't have the distribution release versions installed first, or use the .tar.gz download instead.

Windows

Windows is fast becoming a major platform for MySQL.  In fact the vast majority of downloads of MySQL are for Windows. However Windows doesn't have a packaging system like Linux distributions that collect together packages from a variety of vendors using a single tool.  Instead you need to find and install a package from a trusted source.  For Windows there are two types of install, one is the MSI (Windows Installer) format, and the other is often referred to as the 'No Install' package.  This is a ZIP file containing the components but without a Windows Installer.  You can run MySQL directly with this package, but it takes a bit of understanding.  For most users, download the .msi (Windows Installer) package and execute it.  It will take you through the installation steps required.

One thing you must remember is that when you go through the Configuration Wizard you will be asked for a password. Make sure you remember this as this is required for administrative access to the database, and to the command line utility.  If there is enough interest I'll do a step-by-step through the Installer and Configuration Wizard in a future post.

Web Hosting

The number of web hosting packages out there is mind blowing, and it is impossible to cover all possible variations. In general if you have a control panel for your hosting package there will be a way to set up a database and database user and there will be instructions on how to configure your application to talk to it.  If in doubt contact your web hosting company for support.  If they don't provide MySQL services then you may need to change hosting companies.

Where do you get it?

You can get MySQL and its derivatives from a number of places:

If you want a pre-configured installation for a variety of environments, you might like to try out the SkySQL Reference Architecture

 

The next issue will be talking about how connections are made to MySQL and configuring connections and logins.

Disclaimer

I am employed by SkySQL Ab, which provides support and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL and 'MySQL Enterprise'  are registered trademarks of Oracle Corporation, MariaDB is a trademark of Monty Program Ab.


PlanetMySQL Voting: Vote UP / Vote DOWN