Archive for the ‘Stored Procedures’ Category

MySQL Striped Views

Май 24th, 2012

A question came up today about how to stripe a MySQL view, and this post shows you how. Along with the question, there was a complaint about why you can’t use session variables in a view definition. It’s important to note two things: there’s a workaround and there’s an outstanding request to add lift the feature limitation in Bug 18433.

A striped view lets authorized users see only part of a table, and is how Oracle Database 11g sets up Virtual Private Databases. Oracle provides both schema (or database) level access and fine-grained control access. Fine grained control involves setting a special session variable during a user’s login. This is typically done by checking the rights in an Access Control List (ACL) and using an Oracle built-in package.

You can do more or less the same thing in MySQL by using stored functions. One function would set the session variable and the other would fetch the value for comparison in a view.

Most developers who try this initially meet failure because they try to embed the session variable inside the view, like this trivial example with Hobbits (can’t resist the example with the first installment from Peter Jackson out later this year):

1
2
CREATE VIEW hobbit_v AS
SELECT * FROM hobbit WHERE hobbit_name = @sv_login_name;

The syntax is disallowed, as explained in the MySQL Reference 13.1.20 CREATE VIEW Syntax documentation. The attempt raises the following error message:

ERROR 1351 (HY000): VIEW's SELECT contains a variable or parameter

The fix is quite simple, you write a function that sets the ACL value for the session and another that queries the ACL session value. For the example, I’ve written the SET_LOGIN_NAME and a GET_LOGIN_NAME functions. (If you’re new to stored programs, you can find a 58 page chapter on writing them in my Oracle Database 11g & MySQL 5.6 Developer Handbook (Oracle Press) or you can use Guy Harrison’s MySQL Stored Procedure Programming.)

You would call the SET_LOGIN_NAME when you connect to the MySQL database as the first thing to implement this type of architecture. You would define the function like the following. (Please note that the example includes all setup statements from the command line and should enable you cutting and pasting it. ;-) ):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Change the delimiter to something other than a semicolon.
DELIMITER $$
 
-- Conditionally drop the function.
DROP FUNCTION IF EXISTS set_login_name$$
 
-- Create the function.
CREATE FUNCTION set_login_name(pv_login_name VARCHAR(20)) RETURNS INT UNSIGNED
BEGIN
 
  /* Declare a local variable to verify completion of the task. */
  DECLARE  lv_success_flag  INT UNSIGNED  DEFAULT FALSE;
 
  /* Check whether the input value is something other than a null value. */
  IF pv_login_name IS NOT NULL THEN
 
    /* Set the session variable and enable the success flag. */
    SET @sv_login_name := pv_login_name;
    SET lv_success_flag := TRUE;
 
  END IF;
 
  /* Return the success flag. */
  RETURN lv_success_flag;
END;
$$
 
-- Change the delimiter back to a semicolon.
DELIMITER ;

You can use a query to set and confirm action like this:

SELECT IF(set_login_name('Frodo')=1,'Login Name Set','Login Name Not Set') AS "Login Name Status";

A more practical example in an API would be this, which returns zero when unset and one when set:

SELECT set_login_name('Frodo') AS "Login Name Status";

The getter function for this example, simply reads the current value of the MySQL session variable. Like the prior example, it’s ready to run too.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Change the delimiter to something other than a semicolon.
DELIMITER $$
 
-- Conditionally drop the function.
DROP FUNCTION IF EXISTS get_login_name$$
 
-- Create the function.
CREATE FUNCTION get_login_name() RETURNS VARCHAR(20)
BEGIN
  /* Return the success flag. */
  RETURN @sv_login_name;
END;
$$
 
-- Change the delimiter back to a semicolon.
DELIMITER ;

Before you test it, lets create a HOBBIT table, seed it with data, and create a HOBBIT_V view. They’re bundled together in the following microscript:

-- Conditionally drop the table.
DROP TABLE IF EXISTS hobbit;
 
-- Create the table.
CREATE TABLE hobbit
( hobbit_id    INT UNSIGNED
, hobbit_name  VARCHAR(20));
 
-- Seed two rows.
INSERT INTO hobbit VALUES ( 1,'Bilbo'),( 1,'Frodo');
 
-- Conditionally drop the view.
DROP VIEW IF EXISTS hobbit_v;
 
-- Create the function-enabled view.
CREATE VIEW hobbit_v AS
SELECT * FROM hobbit WHERE hobbit_name = get_login_name();

A query to the table after setting the session variable will only return one row, the row with Frodo in the HOBBIT_NAME column. It also guarantees an unfiltered UPDATE statement against the view only updates the single row returned, like this:

UPDATE hobbit_v SET hobbit_id = 2;

In a real solution, there are more steps. For example, you’d want your tables in one database, views in another, and functions and procedures in a library database. However, I hope this helps seed some ideas for those interested in creating fine-grained virtual private databases in MySQL with user-authenticated application controls.


PlanetMySQL Voting: Vote UP / Vote DOWN

Value or Reference?

Май 10th, 2012

In class today, we reviewed pass-by-value (IN-only mode) parameters and pass-by-reference (INOUT and OUT mode) parameters for stored procedures. The analogy that finally seemed to hit home for the students was linking the modes to the story of Alice in Wonderland.

Here’s the analogy and below is the code to support it:

“A pass-by-reference parameter in a procedure is like sending an immutable copy of Alice into the rabbit hole, which means she can’t shrink, grow, or learn throughout the story; whereas, a pass-by-reference parameter in a procedure is like sending Alice into the rabbit hole where she can shrink, grow, fight the Jabberwocky, and learn things that make her life better when she exits the rabbit hole – consistent with the storyline of Alice’s revisit to Wonderland.”

The example code creates a stored procedure that accepts two parameters – one pass-by-value and one pass-by-reference. Inside the procedure there’s a local variable and a reassignment of value to the pass-by-reference parameter. It’s in this wonderland procedure (by the way don’t forget to manage the DELIMITER value when you test it):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
CREATE PROCEDURE wonderland
( IN     pv_value_param  VARCHAR(20)
, INOUT  pv_ref_param    VARCHAR(20))
BEGIN
 
  /* Declare a variable. */
  DECLARE lv_value_param  VARCHAR(20);
 
  /* Query the local variable and reference parameter before changing values. */ 
  SELECT   'On Entry' AS "Where"
  ,        pv_value_param AS "Value Parameter"
  ,        pv_ref_param AS "Reference Parameter"
  ,        IFNULL(lv_value_param,'     ') AS "Local Variable";
 
  /* Assign a lowercase value parameter to a local variable. */
  SET lv_value_param := LOWER(pv_value_param);  
 
  /* Assign a uppercase reference parameter value to the reference parameter. */
  SET pv_ref_param := UPPER(pv_ref_param);
 
  /* Query the local variable and reference parameter after changing values. */ 
  SELECT   'On Exit ' AS "Where"
  ,        pv_value_param AS "Value Parameter"
  ,        pv_ref_param AS "Reference Parameter"
  ,        IFNULL(lv_value_param,'     ') AS "Local Variable";
 
END;
$$

A tester procedure than tests how the pass-by-value and pass-by-reference modes of operation differ. It’s here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
CREATE PROCEDURE tester
( IN  pv_value_param  VARCHAR(20)
, IN  pv_ref_param    VARCHAR(20))
BEGIN
 
  /* Query the local and reference parameters. */ 
  SELECT   'Before  ' AS "Where"
  ,        pv_value_param AS "Value Parameter"
  ,        pv_ref_param AS "Reference Parameter";
 
  /* Call the wonderland procedure that changes the pass-by-reference parameter. */
  CALL wonderland(pv_value_param, pv_ref_param);
 
  /* Query the local and reference parameters. */ 
  SELECT   'After   ' AS "Where"
  ,        pv_value_param AS "Value Parameter"
  ,        pv_ref_param AS "Reference Parameter";
 
END;
$$

You call the tester program with this syntax:

CALL tester('Alice','Alice');

The test case returns the following values:

+----------+-----------------+---------------------+
| Where    | Value Parameter | Reference Parameter |
+----------+-----------------+---------------------+
| Before   | Alice           | Alice               |
+----------+-----------------+---------------------+
1 row in set (0.00 sec)
 
+----------+-----------------+---------------------+----------------+
| Where    | Value Parameter | Reference Parameter | Local Variable |
+----------+-----------------+---------------------+----------------+
| On Entry | Alice           | Alice               |                |
+----------+-----------------+---------------------+----------------+
1 row in set (0.02 sec)
 
+----------+-----------------+---------------------+----------------+
| Where    | Value Parameter | Reference Parameter | Local Variable |
+----------+-----------------+---------------------+----------------+
| On Exit  | Alice           | ALICE               | alice          |
+----------+-----------------+---------------------+----------------+
1 row in set (0.02 sec)
 
+----------+-----------------+---------------------+
| Where    | Value Parameter | Reference Parameter |
+----------+-----------------+---------------------+
| After    | Alice           | ALICE               |
+----------+-----------------+---------------------+
1 row in set (0.03 sec)

Basically, Alice inside the pv_ref_param parameter grows to uppercase during the trip through the wonderland procedure, while Alice inside the pv_value_param remains unchanged. If it didn’t help you learn a principle, maybe it gave you a laugh on how to view the travels of IN-only and INOUT parameters. ;-)


PlanetMySQL Voting: Vote UP / Vote DOWN

Value or Reference?

Май 10th, 2012

In class today, we reviewed pass-by-value (IN-only mode) parameters and pass-by-reference (INOUT and OUT mode) parameters for stored procedures. The analogy that finally seemed to hit home for the students was linking the modes to the story of Alice in Wonderland.

Here’s the analogy and below is the code to support it:

“A pass-by-reference parameter in a procedure is like sending an immutable copy of Alice into the rabbit hole, which means she can’t shrink, grow, or learn throughout the story; whereas, a pass-by-reference parameter in a procedure is like sending Alice into the rabbit hole where she can shrink, grow, fight the Jabberwocky, and learn things that make her life better when she exits the rabbit hole – consistent with the storyline of Alice’s revisit to Wonderland.”

The example code creates a stored procedure that accepts two parameters – one pass-by-value and one pass-by-reference. Inside the procedure there’s a local variable and a reassignment of value to the pass-by-reference parameter. It’s in this wonderland procedure (by the way don’t forget to manage the DELIMITER value when you test it):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
CREATE PROCEDURE wonderland
( IN     pv_value_param  VARCHAR(20)
, INOUT  pv_ref_param    VARCHAR(20))
BEGIN
 
  /* Declare a variable. */
  DECLARE lv_value_param  VARCHAR(20);
 
  /* Query the local variable and reference parameter before changing values. */ 
  SELECT   'On Entry' AS "Where"
  ,        pv_value_param AS "Value Parameter"
  ,        pv_ref_param AS "Reference Parameter"
  ,        IFNULL(lv_value_param,'     ') AS "Local Variable";
 
  /* Assign a lowercase value parameter to a local variable. */
  SET lv_value_param := LOWER(pv_value_param);  
 
  /* Assign a uppercase reference parameter value to the reference parameter. */
  SET pv_ref_param := UPPER(pv_ref_param);
 
  /* Query the local variable and reference parameter after changing values. */ 
  SELECT   'On Exit ' AS "Where"
  ,        pv_value_param AS "Value Parameter"
  ,        pv_ref_param AS "Reference Parameter"
  ,        IFNULL(lv_value_param,'     ') AS "Local Variable";
 
END;
$$

A tester procedure than tests how the pass-by-value and pass-by-reference modes of operation differ. It’s here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
CREATE PROCEDURE tester
( IN  pv_value_param  VARCHAR(20)
, IN  pv_ref_param    VARCHAR(20))
BEGIN
 
  /* Query the local and reference parameters. */ 
  SELECT   'Before  ' AS "Where"
  ,        pv_value_param AS "Value Parameter"
  ,        pv_ref_param AS "Reference Parameter";
 
  /* Call the wonderland procedure that changes the pass-by-reference parameter. */
  CALL wonderland(pv_value_param, pv_ref_param);
 
  /* Query the local and reference parameters. */ 
  SELECT   'After   ' AS "Where"
  ,        pv_value_param AS "Value Parameter"
  ,        pv_ref_param AS "Reference Parameter";
 
END;
$$

You call the tester program with this syntax:

CALL tester('Alice','Alice');

The test case returns the following values:

+----------+-----------------+---------------------+
| Where    | Value Parameter | Reference Parameter |
+----------+-----------------+---------------------+
| Before   | Alice           | Alice               |
+----------+-----------------+---------------------+
1 row in set (0.00 sec)
 
+----------+-----------------+---------------------+----------------+
| Where    | Value Parameter | Reference Parameter | Local Variable |
+----------+-----------------+---------------------+----------------+
| On Entry | Alice           | Alice               |                |
+----------+-----------------+---------------------+----------------+
1 row in set (0.02 sec)
 
+----------+-----------------+---------------------+----------------+
| Where    | Value Parameter | Reference Parameter | Local Variable |
+----------+-----------------+---------------------+----------------+
| On Exit  | Alice           | ALICE               | alice          |
+----------+-----------------+---------------------+----------------+
1 row in set (0.02 sec)
 
+----------+-----------------+---------------------+
| Where    | Value Parameter | Reference Parameter |
+----------+-----------------+---------------------+
| After    | Alice           | ALICE               |
+----------+-----------------+---------------------+
1 row in set (0.03 sec)

Basically, Alice inside the pv_ref_param parameter grows to uppercase during the trip through the wonderland procedure, while Alice inside the pv_value_param remains unchanged. If it didn’t help you learn a principle, maybe it gave you a laugh on how to view the travels of IN-only and INOUT parameters. ;-)


PlanetMySQL Voting: Vote UP / Vote DOWN

Summary Tables with MySQL

Февраль 15th, 2012
I was recently talking with a few people and the concept of summary tables came up as solutions for them. The issue is, they never thought of it. Hopefully this post will help them as well as others.

Summary Tables are not a new concept. They are exactly what they sound like, basically summary of existing data. Aggregated tables, materialized views and summary tables are very dominate in data warehouses. You can also take advantage of these concepts as part of your reporting systems as well.

So summary tables are real tables. So you can take advantage of indexes and etc with them. In the examples I am using, I consider them more of a summary table than aggregated tables . Depending on your application and needs it could grow into more of an aggregated tables and or materialized views situation.

How you separate your data and tables is dependent on your reporting and application needs.

The following is a high level example of how you can use summary tables.



Let us assume that you have a report that is populated with different website traffic aspects.
This could be a report that includes impressions, hits and leads gathered via your site from some other sources. Other in the company, of course, want to spend money where it best makes money. So they need updated information reliably and do not want to wait for it.

Currently you have table for your reporting tools and you calculate ROI per report request via SQL.

For example:
CREATE TABLE `some_report` (
`traffic_source_id` int(5) unsigned NOT NULL DEFAULT '0',
`ad_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliate_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliates_key` varchar(25) DEFAULT '',
`date_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`impressions` int(5) unsigned DEFAULT '0',
`hits` int(5) unsigned DEFAULT '0',
`leads` int(5) unsigned DEFAULT '0',
UNIQUE KEY `traffic` (`traffic_source_id`,`affiliate_id`,`ad_id`,`date_time`,`affiliates_key`),
KEY `date_time` (`date_time`)
) ENGINE=InnoDB ;

This table is populated hourly, so an insert could be similar to the query below. Your system executes queries like this thousands+ of time per day because it is based on different traffic sources as well as affiliate and their affiliate keys. So this table is going to grow dynamically, hours per day * traffic_source_id * affiliate_id * ad_id * affiliates_key.

For example:
INSERT INTO some_report VALUES (100,4343,9839,'SomeID',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),89832,44916,22458);
INSERT INTO some_report VALUES (100,4343,9839,'SomeID2',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),93332,34716,23438);
INSERT INTO some_report VALUES (100,2343,9839,'SomeID',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),543232,44316,458);
INSERT INTO some_report VALUES (100,2343,9839,'SomeID2',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),89832,24916,2458);
INSERT INTO some_report VALUES (100,5343,9839,'SomeID',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),1239832,2344916,2538);

Just for reference lets assume :
100 = Google.com
4343 = An ad words ad for MySQL
9839 = Some Marketing Company
SomeID = affiliates_key of some marketing company

You have reviewed your explains per query and your SQL query is the fastest you can get it. It just takes time to calculate all the ROIs when reports are spanning a lot of different entities. The web pages are taking to long and people are getting upset.

So you decide to change your tables and create a process to roll up your data across different summary tables using an ETL or Events or cron job based scripts. So your system populates the RETURN_VALUED per ad_id & affiliate_id. You do not know what the affiliate is paying out to other affiliates_keys but you are aware of what you spent and what the lead returns to you.

First start with our lowest common denominator. Here you have data per hour.

CREATE TABLE `some_report_hour` (
`traffic_source_id` int(5) unsigned NOT NULL DEFAULT '0',
`ad_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliate_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliates_key` varchar(25) DEFAULT '',
`date_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`impressions` int(5) unsigned DEFAULT '0',
`hits` int(5) unsigned DEFAULT '0',
`leads` int(5) unsigned DEFAULT '0',
`cost` decimal(9,2) DEFAULT '00.00',
`roi` decimal(7,2) DEFAULT '00.00',
UNIQUE KEY `traffic` (`traffic_source_id`,`affiliate_id`,`ad_id`,`date_time`,`affiliates_key`),
KEY `date_time` (`date_time`)
) ENGINE=InnoDB ;


Then every hour it populates more data into the new table via your ETL / Event / Stored Procedure / Cron job Script. It depends on how you want to handle and decide your values for

For example:
DELIMITER //
CREATE PROCEDURE some_report_hour_procedure(IN affiliate_id int(6) , RETURN_VALUED int(6) , COST decimal(5,3) )
BEGIN
INSERT INTO some_report_hour
SELECT traffic_source_id, ad_id, affiliate_id, affiliates_key, date_time, impressions, hits, leads, COST, ROUND((SUM(leads * RETURN_VALUED ) / COST),2)
FROM some_report r
WHERE r.date_time = DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00')
GROUP BY traffic_source_id, ad_id, affiliate_id, affiliates_key;
END //
DELIMITER ;

CREATE EVENT some_report_hour_event
ON SCHEDULE
EVERY 1 HOUR
COMMENT ' just an example. Depends on you how you want to populate these values. This is for just 1 affiliate_id '
DO
CALL some_report_hour_procedure(9839,5,100000); # (affiliate_id , RETURN_VALUED , COST)

So the data is much like the following:

+-------------------+-------+--------------+----------------+---------------------+-------------+---------+-------+-----------+---------+
| traffic_source_id | ad_id | affiliate_id | affiliates_key | date_time | impressions | hits | leads | cost | roi |
+-------------------+-------+--------------+----------------+---------------------+-------------+---------+-------+-----------+---------+
| 100 | 2343 | 9839 | SomeID | 2012-02-14 16:00:00 | 543232 | 44316 | 458 | 100000.00 | 22.90 |
| 100 | 2343 | 9839 | SomeID2 | 2012-02-14 16:00:00 | 89832 | 24916 | 2458 | 100000.00 | 122.90 |
| 100 | 4343 | 9839 | SomeID | 2012-02-14 16:00:00 | 89832 | 44916 | 22458 | 100000.00 | 1122.91 |
| 100 | 4343 | 9839 | SomeID2 | 2012-02-14 16:00:00 | 93332 | 34716 | 23438 | 100000.00 | 1171.91 |
| 100 | 5343 | 9839 | SomeID | 2012-02-14 16:00:00 | 1239832 | 2344916 | 2538 | 100000.00 | 126.90 |
+-------------------+-------+--------------+----------------+---------------------+-------------+---------+-------+-----------+---------+


Nothing is dynamically different but you have removed the ROI from your query. It will be a simple select now. But now you can use your ETL/Event/Cron Job Script to keep the day table updated as well.

CREATE TABLE `some_report_day` (
`traffic_source_id` int(5) unsigned NOT NULL DEFAULT '0',
`ad_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliate_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliates_key` varchar(25) DEFAULT '',
`date_time` date NOT NULL DEFAULT '0000-00-00',
`impressions` int(5) unsigned DEFAULT '0',
`hits` int(5) unsigned DEFAULT '0',
`leads` int(5) unsigned DEFAULT '0',
`cost` decimal(9,2) DEFAULT '00.00',
`roi` decimal(7,2) DEFAULT '00.00',
UNIQUE KEY `traffic` (`traffic_source_id`,`affiliate_id`,`ad_id`,`date_time`,`affiliates_key`),
KEY `date_time` (`date_time`)
) ENGINE=InnoDB ;

Notice I left the field date_time labeled as date_time. This is assuming you might have your report presented via column names and you just adjust the FROM table on selects based on what type of report needed. This is of course dependent on your application.

Notice that your inserts are going to be “group by” to match your key.
For example:

CREATE EVENT some_report_day_event
ON SCHEDULE
EVERY 1 HOUR
COMMENT 'updates some_report_day hourly'
DO
REPLACE INTO some_report_day SELECT traffic_source_id, ad_id, affiliate_id, affiliates_key , DATE_FORMAT(NOW() ,'%Y-%m-%d') as date_time, SUM(impressions), SUM(hits), SUM(leads), SUM(cost), ROUND( SUM(leads) * 5 / SUM(cost) ,2) as ROI
FROM some_report_hour r
WHERE r.date_time BETWEEN DATE_FORMAT(NOW() ,'%Y-%m-%d 00:00:00') AND DATE_FORMAT(NOW() ,'%Y-%m-%d 23:59:59')
GROUP BY traffic_source_id, ad_id, affiliate_id, affiliates_key;

Now the reports that gather data per day and not per hour are going to be faster as they are going to be simple selects as well.

So the data is much like the following:

+-------------------+-------+--------------+----------------+------------+-------------+---------+-------+-----------+------+
| traffic_source_id | ad_id | affiliate_id | affiliates_key | date_time | impressions | hits | leads | cost | roi |
+-------------------+-------+--------------+----------------+------------+-------------+---------+-------+-----------+------+
| 100 | 2343 | 9839 | SomeID | 2012-02-14 | 543232 | 44316 | 458 | 100000.00 | 0.02 |
| 100 | 2343 | 9839 | SomeID2 | 2012-02-14 | 89832 | 24916 | 2458 | 100000.00 | 0.12 |
| 100 | 4343 | 9839 | SomeID | 2012-02-14 | 89832 | 44916 | 22458 | 100000.00 | 1.12 |
| 100 | 4343 | 9839 | SomeID2 | 2012-02-14 | 93332 | 34716 | 23438 | 100000.00 | 1.17 |
| 100 | 5343 | 9839 | SomeID | 2012-02-14 | 1239832 | 2344916 | 2538 | 100000.00 | 0.13 |
+-------------------+-------+--------------+----------------+------------+-------------+---------+-------+-----------+------+

You can continue this concept for monthly and yearly data.
You could also build summary tables broken down by affiliate_id and day for example.
It really depends on what your after. Just remember data is never deleted but just copied into summary table for ease of use and speed.

This is a very simple example but hopefully it gets you thinking and started on how you can summarize your data easily.


PlanetMySQL Voting: Vote UP / Vote DOWN

When Clever Goes Wrong & How Etsy Overcame – Arstechnica

Октябрь 5th, 2011

In 2007, Etsy made a big bet on homegrown middleware to help with the site’s scalability. A half-year after it was taken live, the company decided to abandon it. As a senior software engineer at Etsy put it, “if you’re doing something ‘clever,” you’re probably doing it wrong.”

Read the full article at Arstechnica.com

I want to focus on the important lessons from this article, about middleware and using stored procedures in this fashion for a public web application, creating unscalable design complexity (smart and “proper” according to the old enterprise design teachings…) – causing infrastructure, development and maintenance hassles.

In the process they did replace PostgreSQL with MySQL but that’s not the critical change that made all the difference. PostgreSQL is a fine database system also.


PlanetMySQL Voting: Vote UP / Vote DOWN

Curious case with MySQL replication

Сентябрь 1st, 2011

MySQL Replication is a powerful tool and it’s hard to find a production system not using it. On the other hand debugging replication issues can be very hard and time consuming. Especially if your replication setup is not straightforward and you are using filtering of some kind.

Recently we got an alert from our monitoring system that replication stopped on production slave with the following error:

Can't find record in 'some_table', Error_code: 1032;
handler error HA_ERR_KEY_NOT_FOUND;
the event's master log binlog.000011, end_log_pos 735766642

This means that a ROW-based replication event was going to be applied on slave, but could not find the row it was supposed to be applied to. This is something I like about ROW format — it allows you to catch such data synchronization issues right away. In this particular case MIXED format was used, but if this event was written in STATEMENT format, slave would just apply it with no error. Event was just an UPDATE and in STATEMENT format when update doesn’t affect any rows on slave it just proceeds to next event, etc.

This error was very concerning because this was a new production machine which was built out from latest hot backup several hours ago. Two ideas which come into mind in such situations are: something went wrong during switch over to new master server or something was left running on old production master (which now became slave) and is changing data there.

The first idea about initial replication misconfiguration was proven not valid when we checked the timestamp of the row which existed on master, but was missing on slave. The row was added several hours after switchover happened, so it was very unlikely that new slave was just pointed to wrong binary log position.

To check second idea about something still writing to slave we scanned slave binary logs in search of statements with server_id equal slave server id, but there was nothing there. There were also no connections except replication in slave’s SHOW PROCESSLIST. Further investigation has shown that there were many more rows missing on slave, with several different tables affected.

So when all the simple potential root causes are checked and rejected, it’s time to start digging into binary logs. Unfortunately, there not many tools available out there to help you with this. All you have is mysqlbinlog utility and Linux text processing tools like sedand grep. To start with something I took first row which caused issues on slave, I had it’s primary key value and created_at timestamp.

After some time of master’s binary logs debugging I could see the following picture: there were several UPDATE statements, but I couldn’t find original INSERT statement where the row was initially added. Check on several other missing rows gave similar results: there were some UPDATEs in master’s binary log, but no initial INSERT. Another interesting fact was that all the missing rows were created on master around the same time in 5-10 minutes interval.

At this point it became clear that some rows were added on master, but were never written to binary log. How could this happen? I had another set of possibilities: someone was running a manual session on master with SQL_LOG_BIN=0, we were hitting some exotic MySQL bug (new master was built with MySQL 5.5) or something else happened and I just had not enough facts available.

I tried to find correlation between missing rows, but no luck and I knew customer had pretty strict access policy to this servers, so most of the people didn’t have write access to the database. Checking the MySQL bug database also didn’t reveal anything relevant and I would expect a bug which prevents some events from being written to binary log to be found very soon. And then I remembered something, something that was implemented on this system several month ago.

There are some stored procedures that are being executed on daily basis. These stored procedures create several temporary tables, fill them with data, massage the data in some way and then return result set back to client. While there procedures do not modify production data in any way, all operations related to temporary table creation are written to the binary log and thus replicated to slaves, creating absolutely unnecessary additional load on slaves.
To avoid this a decision was made to disable binary logging inside the stored procedure — it was safe as no tables other than temporary were updated. Binary logging was disabled at very beginning of the procedure and was enabled when procedure ended:

CREATE DEFINER = 'user'@'localhost' PROCEDURE proc1()
BEGIN
  SET SESSION SQL_LOG_BIN = 0;
  .............
  .............
  .............
  SET SESSION SQL_LOG_BIN = 1;
END;

This approach worked well, until something went wrong.

We did have logging enabled for stored procedures using our own framework to log every stored procedure step and overall execution status in a separate logging table. When I check this log table I found out that one of the stored procedures failed around the time we start seeing missing rows on slave. All pieces came together and it became clear what actually happened.

When stored procedure failed in the middle (due to space issue in MySQL temp directory) it’s connection was left in state where binary log was disabled but was never enabled back. Then application connection pooling came into play and this “altered” connection was reused. This means for some time one of the application threads was using a connection with disabled binary logging. I didn’t have any facts that would be pointing exactly to this, and I’m not sure if it is possible at all to obtain such evidence, but all other facts were pointing that this was the root cause.

Conclusion. I would like to say do not use custom replication filtering and this solves potential issues, but first of all there are many cases where replication filters are very useful and as with any more or less complex incident there were several factors that in particular combination triggered the issue. MySQL temp directory was placed on a smaller volume on the new master and this fact was overlooked. When decision was made to disable binary logging for stored procedures not all test cases were considered, like what happens if procedure fails. Finally, I would expect connection pool to be able to save connection session variables and restore them when connection is recycled.

This incident had serious impact on the customer’s production system, but it was also a good lesson on how even very unlikely scenarios can happen and how looking at the problem from different angles can help to reveal the real root cause.


PlanetMySQL Voting: Vote UP / Vote DOWN

Stored procedures and Triggers

Июль 6th, 2011

Having multiple ways to achieve a task is something we all enjoy as developers and DBAs. We find, develop and learn new ways to do things better and faster all the time.

At the risk of starting a debate, I am curious on others opinions or practices when it comes to Stored Procedures and Triggers. To use them or not versus code based functions ? Best case use versus worst case use? There is no real wrong answer here as it depends on your development application. Certainly some lean one way over another and there are more than enough valid reasons on both sides of the debate.

Here are couple of my thoughts on the topic....

I come from the dot.com bubble era , and from that I rarely use stored procedures or triggers. Back then PHP was still new, Perl dominated websites with the cgi-bin and MySQL did not have stored procedures or triggers. Thank goodness things have changed. Developing in those days, forced developers to develop in certain ways. Hopefully, we all used functions and classes so we could reuse code as much as possible. Yet we all survived without stored procedures or triggers just fine by having the logic in our code. But did that make it better?
During this time I had to interact with a SQL server and the DBA that ran it. My LAMP stack used MySQL except for the one call to the finance department's SQL server for a yes/no value. ( I actually used FreeTDS for this connection and it worked like a champ, my side of things anyway. No comment on the speed or other issues of the SQL server needed here. ) I did not need or care about the table structures and layout of the finance SQL system. I only needed a yes or no answer from it. It was the finance teams system and they protected it, rightfully so. So in that case it made perfect sense, for a stored procedure. I handled the code on my side , made a call into the SQL server got the answer and got right out.

Now days of course we have stored procedures in MySQL, do we use them? Is it just an old school development versus modern development in terms of who uses them? I have seen them used at times when it just made it harder to track down a problem. Tracing code across six different functions and cursing the entire way at the missing comments in the code only to find a CALL statement. So we end up going back to the DB to find the query and hopefully do not break the site as we test changes. At the same time, I think they are great if you do need to compartmentalize your logic away from others.

It really all comes down to your application. Where do you want your logic to live? The biggest thing we need to take into account when we decide this is, what do your benchmarks say about your query and do you value the results? Do you want to take advantage of not passing the data across a network and keeping it local to the database? Are you ok with the a separate cache per connection to a stored procedure versus a globally cached query? Depending on the query and how often it is used plays a big part in that decision.

A simple example of stored procedure using the “menagerie database”  is below:

DROP PROCEDURE IF EXISTS addnewpet;
DELIMITER //
CREATE PROCEDURE addnewpet(IN petname VARCHAR(20) , ownername VARCHAR(20), speciestype VARCHAR(20), gender CHAR(1) , bday date )
BEGIN
INSERT INTO pet (name,owner,species,sex,birth) VALUES (petname,ownername,speciestype,gender,bday);
INSERT INTO event VALUES (petname, date_format(NOW(),'%Y-%m-%d'), 'first visit', NULL) ;
END //
DELIMITER ;
CALL addnewpet('Clifford','Emily','dog','m',date_format(NOW(),'%Y-%m-%d') );

This simple call will add the dog name to both tables when used.

You will notice the one of the problems with stored procedures is that you might end up having to pass a lot of variables that might already be defined in your class.

Triggers are another example where history has had developers learn to query data after insert and/or have scripts check the status of data to then make changes as needed. Triggers avoid this for you and should be used whenever possible. Let MySQL handle the adjustments for you immediately and avoid having to write code just for a simple update. Granted as situations get more complex the need might arise for function based code as well. Make sure that all code is optimized before used in the trigger as trigger options are limited.

Replication concerns with triggers are addressed here:

A quick example can be shown using the “menagerie database

mysql> SELECT TRIGGER_NAME, EVENT_MANIPULATION, EVENT_OBJECT_TABLE, ACTION_STATEMENT
FROM INFORMATION_SCHEMA.TRIGGERS
WHERE TRIGGER_SCHEMA='menagerie';
No triggers so we will create one. In this example we can assume that people forget to update the death of pets in the pet table. They note it in events but that is it.
So a simple trigger will keep this updated for us :
delimiter ;;
CREATE TRIGGER ed_update AFTER INSERT ON event
for each row
begin
UPDATE pet p
INNER JOIN event e ON p.name = e.name
SET p.death = e.date
WHERE e.remark = 'death' AND p.death IS NULL;
end;;
delimiter ;

This trigger just updated the pets table for us quickly.


To boil down my thoughts on stored procedures and triggers.... When you are considering a stored procedure make sure to benchmark and test to confirm your ok with the cache results. A stored procedure is best when the global cache is not going to be more effective overall. If tests show that the results are faster handling large data sets with stored procedures versus code base changes then also a good option to use stored procedures. Testing all options is critical.

Triggers can be a great addition to clean up additional scripting work that can be done immediately upon data insert. Again test and confirm to be sure your aware of all the options of using them. Options with triggers are limited so do all the explains ahead of time.

More links here:




PlanetMySQL Voting: Vote UP / Vote DOWN

Getting a return code from a stored procedure

Август 26th, 2010

Sometimes we have some special need for a stored procedure to call another to do something. That is fine, but what if the second stored proc failed for some reason? Maybe you want to halt the first stored procedure (the caller) and not proceed with the work until the problem is verified and resolved. So How do you do it?

Simple, get a return code from the called procedure to determine if it worked or not!

Here’s a sample piece of code to explain better:


DROP PROCEDURE IF EXISTS `test`.`testing123`;
DELIMITER $$

CREATE
PROCEDURE `test`.`testing123`(OUT a INT)
BEGIN
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
SET a=2;
END;
SET a=0;

# toggle the below as comment or not to see the call at the bottom working
# if you uncomment select abc you'll see the error, otherwise all is cool

# select abc;

END$$

DELIMITER ;

DROP PROCEDURE IF EXISTS `test`.`testing456`;
DELIMITER $$

CREATE
PROCEDURE `test`.`testing456`()
BEGIN
SET @a=0;
CALL `test`.`testing123` (@a);

IF @a != 0 THEN
SELECT "There is a problem with proc `testing123`";
ELSE
SELECT "Everything is cool";
END IF;

END$$

DELIMITER ;

CALL `test`.`testing456` ();

testing123 is the second stored proc in this case, called from testing456. The trick is to have an exit handler which returns a status number to the first stored proc (testing456). The latter will hold an if condition to do `something` depending on the return code received by testing123.

If you have any other suggestions I’d appreciate your input.


PlanetMySQL Voting: Vote UP / Vote DOWN

mysqldump each object separately

Август 13th, 2010

As a continuation to a previous blog post last week and inspired by Kedar I have created a small script to export tables, stored procedures, functions and views into their respective file. It works for multiple databases where you can specify a list of databases too and although things like events, triggers and such are still missing they are easily added.

It is especially useful to dump stored procedures separately since it is a lacking functionality in mysqldump.

I placed the script in mysql forge for anybody to use, provide feedback and possibly enhancements to it.

Cheers,
Darren


PlanetMySQL Voting: Vote UP / Vote DOWN

mysqldump each object separately

Август 13th, 2010

As a continuation to a previous blog post last week and inspired by Kedar I have created a small script to export tables, stored procedures, functions and views into their respective file. It works for multiple databases where you can specify a list of databases too and although things like events, triggers and such are still missing they are easily added.

It is especially useful to dump stored procedures separately since it is a lacking functionality in mysqldump.

I placed the script in mysql forge for anybody to use, provide feedback and possibly enhancements to it.

Cheers,
Darren


PlanetMySQL Voting: Vote UP / Vote DOWN