Archive for the ‘Stored Procedures’ Category

When Clever Goes Wrong & How Etsy Overcame – Arstechnica

Октябрь 5th, 2011

In 2007, Etsy made a big bet on homegrown middleware to help with the site’s scalability. A half-year after it was taken live, the company decided to abandon it. As a senior software engineer at Etsy put it, “if you’re doing something ‘clever,” you’re probably doing it wrong.”

Read the full article at Arstechnica.com

I want to focus on the important lessons from this article, about middleware and using stored procedures in this fashion for a public web application, creating unscalable design complexity (smart and “proper” according to the old enterprise design teachings…) – causing infrastructure, development and maintenance hassles.

In the process they did replace PostgreSQL with MySQL but that’s not the critical change that made all the difference. PostgreSQL is a fine database system also.


PlanetMySQL Voting: Vote UP / Vote DOWN

Curious case with MySQL replication

Сентябрь 1st, 2011

MySQL Replication is a powerful tool and it’s hard to find a production system not using it. On the other hand debugging replication issues can be very hard and time consuming. Especially if your replication setup is not straightforward and you are using filtering of some kind.

Recently we got an alert from our monitoring system that replication stopped on production slave with the following error:

Can't find record in 'some_table', Error_code: 1032;
handler error HA_ERR_KEY_NOT_FOUND;
the event's master log binlog.000011, end_log_pos 735766642

This means that a ROW-based replication event was going to be applied on slave, but could not find the row it was supposed to be applied to. This is something I like about ROW format — it allows you to catch such data synchronization issues right away. In this particular case MIXED format was used, but if this event was written in STATEMENT format, slave would just apply it with no error. Event was just an UPDATE and in STATEMENT format when update doesn’t affect any rows on slave it just proceeds to next event, etc.

This error was very concerning because this was a new production machine which was built out from latest hot backup several hours ago. Two ideas which come into mind in such situations are: something went wrong during switch over to new master server or something was left running on old production master (which now became slave) and is changing data there.

The first idea about initial replication misconfiguration was proven not valid when we checked the timestamp of the row which existed on master, but was missing on slave. The row was added several hours after switchover happened, so it was very unlikely that new slave was just pointed to wrong binary log position.

To check second idea about something still writing to slave we scanned slave binary logs in search of statements with server_id equal slave server id, but there was nothing there. There were also no connections except replication in slave’s SHOW PROCESSLIST. Further investigation has shown that there were many more rows missing on slave, with several different tables affected.

So when all the simple potential root causes are checked and rejected, it’s time to start digging into binary logs. Unfortunately, there not many tools available out there to help you with this. All you have is mysqlbinlog utility and Linux text processing tools like sedand grep. To start with something I took first row which caused issues on slave, I had it’s primary key value and created_at timestamp.

After some time of master’s binary logs debugging I could see the following picture: there were several UPDATE statements, but I couldn’t find original INSERT statement where the row was initially added. Check on several other missing rows gave similar results: there were some UPDATEs in master’s binary log, but no initial INSERT. Another interesting fact was that all the missing rows were created on master around the same time in 5-10 minutes interval.

At this point it became clear that some rows were added on master, but were never written to binary log. How could this happen? I had another set of possibilities: someone was running a manual session on master with SQL_LOG_BIN=0, we were hitting some exotic MySQL bug (new master was built with MySQL 5.5) or something else happened and I just had not enough facts available.

I tried to find correlation between missing rows, but no luck and I knew customer had pretty strict access policy to this servers, so most of the people didn’t have write access to the database. Checking the MySQL bug database also didn’t reveal anything relevant and I would expect a bug which prevents some events from being written to binary log to be found very soon. And then I remembered something, something that was implemented on this system several month ago.

There are some stored procedures that are being executed on daily basis. These stored procedures create several temporary tables, fill them with data, massage the data in some way and then return result set back to client. While there procedures do not modify production data in any way, all operations related to temporary table creation are written to the binary log and thus replicated to slaves, creating absolutely unnecessary additional load on slaves.
To avoid this a decision was made to disable binary logging inside the stored procedure — it was safe as no tables other than temporary were updated. Binary logging was disabled at very beginning of the procedure and was enabled when procedure ended:

CREATE DEFINER = 'user'@'localhost' PROCEDURE proc1()
BEGIN
  SET SESSION SQL_LOG_BIN = 0;
  .............
  .............
  .............
  SET SESSION SQL_LOG_BIN = 1;
END;

This approach worked well, until something went wrong.

We did have logging enabled for stored procedures using our own framework to log every stored procedure step and overall execution status in a separate logging table. When I check this log table I found out that one of the stored procedures failed around the time we start seeing missing rows on slave. All pieces came together and it became clear what actually happened.

When stored procedure failed in the middle (due to space issue in MySQL temp directory) it’s connection was left in state where binary log was disabled but was never enabled back. Then application connection pooling came into play and this “altered” connection was reused. This means for some time one of the application threads was using a connection with disabled binary logging. I didn’t have any facts that would be pointing exactly to this, and I’m not sure if it is possible at all to obtain such evidence, but all other facts were pointing that this was the root cause.

Conclusion. I would like to say do not use custom replication filtering and this solves potential issues, but first of all there are many cases where replication filters are very useful and as with any more or less complex incident there were several factors that in particular combination triggered the issue. MySQL temp directory was placed on a smaller volume on the new master and this fact was overlooked. When decision was made to disable binary logging for stored procedures not all test cases were considered, like what happens if procedure fails. Finally, I would expect connection pool to be able to save connection session variables and restore them when connection is recycled.

This incident had serious impact on the customer’s production system, but it was also a good lesson on how even very unlikely scenarios can happen and how looking at the problem from different angles can help to reveal the real root cause.


PlanetMySQL Voting: Vote UP / Vote DOWN

Stored procedures and Triggers

Июль 6th, 2011

Having multiple ways to achieve a task is something we all enjoy as developers and DBAs. We find, develop and learn new ways to do things better and faster all the time.

At the risk of starting a debate, I am curious on others opinions or practices when it comes to Stored Procedures and Triggers. To use them or not versus code based functions ? Best case use versus worst case use? There is no real wrong answer here as it depends on your development application. Certainly some lean one way over another and there are more than enough valid reasons on both sides of the debate.

Here are couple of my thoughts on the topic....

I come from the dot.com bubble era , and from that I rarely use stored procedures or triggers. Back then PHP was still new, Perl dominated websites with the cgi-bin and MySQL did not have stored procedures or triggers. Thank goodness things have changed. Developing in those days, forced developers to develop in certain ways. Hopefully, we all used functions and classes so we could reuse code as much as possible. Yet we all survived without stored procedures or triggers just fine by having the logic in our code. But did that make it better?
During this time I had to interact with a SQL server and the DBA that ran it. My LAMP stack used MySQL except for the one call to the finance department's SQL server for a yes/no value. ( I actually used FreeTDS for this connection and it worked like a champ, my side of things anyway. No comment on the speed or other issues of the SQL server needed here. ) I did not need or care about the table structures and layout of the finance SQL system. I only needed a yes or no answer from it. It was the finance teams system and they protected it, rightfully so. So in that case it made perfect sense, for a stored procedure. I handled the code on my side , made a call into the SQL server got the answer and got right out.

Now days of course we have stored procedures in MySQL, do we use them? Is it just an old school development versus modern development in terms of who uses them? I have seen them used at times when it just made it harder to track down a problem. Tracing code across six different functions and cursing the entire way at the missing comments in the code only to find a CALL statement. So we end up going back to the DB to find the query and hopefully do not break the site as we test changes. At the same time, I think they are great if you do need to compartmentalize your logic away from others.

It really all comes down to your application. Where do you want your logic to live? The biggest thing we need to take into account when we decide this is, what do your benchmarks say about your query and do you value the results? Do you want to take advantage of not passing the data across a network and keeping it local to the database? Are you ok with the a separate cache per connection to a stored procedure versus a globally cached query? Depending on the query and how often it is used plays a big part in that decision.

A simple example of stored procedure using the “menagerie database”  is below:

DROP PROCEDURE IF EXISTS addnewpet;
DELIMITER //
CREATE PROCEDURE addnewpet(IN petname VARCHAR(20) , ownername VARCHAR(20), speciestype VARCHAR(20), gender CHAR(1) , bday date )
BEGIN
INSERT INTO pet (name,owner,species,sex,birth) VALUES (petname,ownername,speciestype,gender,bday);
INSERT INTO event VALUES (petname, date_format(NOW(),'%Y-%m-%d'), 'first visit', NULL) ;
END //
DELIMITER ;
CALL addnewpet('Clifford','Emily','dog','m',date_format(NOW(),'%Y-%m-%d') );

This simple call will add the dog name to both tables when used.

You will notice the one of the problems with stored procedures is that you might end up having to pass a lot of variables that might already be defined in your class.

Triggers are another example where history has had developers learn to query data after insert and/or have scripts check the status of data to then make changes as needed. Triggers avoid this for you and should be used whenever possible. Let MySQL handle the adjustments for you immediately and avoid having to write code just for a simple update. Granted as situations get more complex the need might arise for function based code as well. Make sure that all code is optimized before used in the trigger as trigger options are limited.

Replication concerns with triggers are addressed here:

A quick example can be shown using the “menagerie database

mysql> SELECT TRIGGER_NAME, EVENT_MANIPULATION, EVENT_OBJECT_TABLE, ACTION_STATEMENT
FROM INFORMATION_SCHEMA.TRIGGERS
WHERE TRIGGER_SCHEMA='menagerie';
No triggers so we will create one. In this example we can assume that people forget to update the death of pets in the pet table. They note it in events but that is it.
So a simple trigger will keep this updated for us :
delimiter ;;
CREATE TRIGGER ed_update AFTER INSERT ON event
for each row
begin
UPDATE pet p
INNER JOIN event e ON p.name = e.name
SET p.death = e.date
WHERE e.remark = 'death' AND p.death IS NULL;
end;;
delimiter ;

This trigger just updated the pets table for us quickly.


To boil down my thoughts on stored procedures and triggers.... When you are considering a stored procedure make sure to benchmark and test to confirm your ok with the cache results. A stored procedure is best when the global cache is not going to be more effective overall. If tests show that the results are faster handling large data sets with stored procedures versus code base changes then also a good option to use stored procedures. Testing all options is critical.

Triggers can be a great addition to clean up additional scripting work that can be done immediately upon data insert. Again test and confirm to be sure your aware of all the options of using them. Options with triggers are limited so do all the explains ahead of time.

More links here:




PlanetMySQL Voting: Vote UP / Vote DOWN

Getting a return code from a stored procedure

Август 26th, 2010

Sometimes we have some special need for a stored procedure to call another to do something. That is fine, but what if the second stored proc failed for some reason? Maybe you want to halt the first stored procedure (the caller) and not proceed with the work until the problem is verified and resolved. So How do you do it?

Simple, get a return code from the called procedure to determine if it worked or not!

Here’s a sample piece of code to explain better:


DROP PROCEDURE IF EXISTS `test`.`testing123`;
DELIMITER $$

CREATE
PROCEDURE `test`.`testing123`(OUT a INT)
BEGIN
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
SET a=2;
END;
SET a=0;

# toggle the below as comment or not to see the call at the bottom working
# if you uncomment select abc you'll see the error, otherwise all is cool

# select abc;

END$$

DELIMITER ;

DROP PROCEDURE IF EXISTS `test`.`testing456`;
DELIMITER $$

CREATE
PROCEDURE `test`.`testing456`()
BEGIN
SET @a=0;
CALL `test`.`testing123` (@a);

IF @a != 0 THEN
SELECT "There is a problem with proc `testing123`";
ELSE
SELECT "Everything is cool";
END IF;

END$$

DELIMITER ;

CALL `test`.`testing456` ();

testing123 is the second stored proc in this case, called from testing456. The trick is to have an exit handler which returns a status number to the first stored proc (testing456). The latter will hold an if condition to do `something` depending on the return code received by testing123.

If you have any other suggestions I’d appreciate your input.


PlanetMySQL Voting: Vote UP / Vote DOWN

mysqldump each object separately

Август 13th, 2010

As a continuation to a previous blog post last week and inspired by Kedar I have created a small script to export tables, stored procedures, functions and views into their respective file. It works for multiple databases where you can specify a list of databases too and although things like events, triggers and such are still missing they are easily added.

It is especially useful to dump stored procedures separately since it is a lacking functionality in mysqldump.

I placed the script in mysql forge for anybody to use, provide feedback and possibly enhancements to it.

Cheers,
Darren


PlanetMySQL Voting: Vote UP / Vote DOWN

mysqldump each object separately

Август 13th, 2010

As a continuation to a previous blog post last week and inspired by Kedar I have created a small script to export tables, stored procedures, functions and views into their respective file. It works for multiple databases where you can specify a list of databases too and although things like events, triggers and such are still missing they are easily added.

It is especially useful to dump stored procedures separately since it is a lacking functionality in mysqldump.

I placed the script in mysql forge for anybody to use, provide feedback and possibly enhancements to it.

Cheers,
Darren


PlanetMySQL Voting: Vote UP / Vote DOWN

Dumping DDL – mysqldump tables, stored procedures, events, triggers (separately)

Август 6th, 2010

If you like to keep your ddl backed up in some source management tool like svn or cvs and want to do it individually for stored procedures, events, triggers, tables and such rather than having a single file you can easily do so using the below. You could even include the –skip-dump-date or –skip-comments and use the below to compare ddl daily checking for alterations thus making sure you are aware of any ddl changes done on the database.

user=backup_user
password=`cat ~/.backup_password`
hostname=127.0.0.1
port=3306
dbname=test_db
path=/home/mysql/ddl
date=`date +%Y%m%d`

mysqldump -u$user -p$password -h$hostname -P$port --no-create-info --no-data --no-create-db --skip-opt $dbname > "$path"/"$dbname"_triggers_"$date".sql
mysqldump -u$user -p$password -h$hostname -P$port --routines --skip-triggers --no-create-info --no-data --no-create-db --skip-opt $dbname > "$path"/"$dbname"_routines_"$date".sql
mysqldump -u$user -p$password -h$hostname -P$port --events --skip-triggers --no-create-info --no-data --no-create-db --skip-opt $dbname > "$path"/"$dbname"_events_"$date".sql
mysqldump -u$user -p$password -h$hostname -P$port --skip-triggers --no-data --no-create-db --skip-opt $dbname > "$path"/"$dbname"_tables_"$date".sql

PlanetMySQL Voting: Vote UP / Vote DOWN

Next Week’s MySQL Sessions at ODTUG Kaleidoscope

Июнь 22nd, 2010

By now you know that there is a MySQL Track during next week’s ODTUG Kaleidoscope in Washington, DC. Ronald Bradford and I organized the schedule at the last minute (Ronald did a lot of the work!). It was difficult to fill a schedule with 19 sessions that are either 1 hour or 1.5 hours long, and to do it I ended up with three presentations.

At each presentation I will be giving away a copy of The MySQL Administrator’s Bible, so be sure to show up! All MySQL track sessions are in Maryland C, and all times are Eastern.

On Monday, June 28th from 4 pm – 5:30 pm I will be presenting “What do you mean, SQL Syntax Error?”, a presentation about how MySQL’s SQL syntax extends and deviates from the ANSI/ISO SQL:2003 standard. There is an 80-page PDF accompaniment that will be given out for free during this session.

On Tuesday, June 29th from 11 am to 12 noon I will be presenting Importing and Exporting Data with MySQL, about the many tools to load and bulk load data, and how to export data for regular and bulk loads. I will also be going over which storage engines are particularly well-suited for bulk loading, and the caveats to watch out for. This session is useful for those who know MySQL as well as those asking the question, “What’s the equivalent of Oracle’s SQL Loader for MySQL?”

On Wednesday, June 30th from 8:30 am to 9:30 am I will be presenting Navigating MySQL Stored Procedures & Functions, Views and Triggers, which covers all the ways stored procedures, stored functions, views and triggers can be used, including a highlight of Oracle differences.

I hope to see you there!


PlanetMySQL Voting: Vote UP / Vote DOWN

A micro-benchmark of stored routines in MySQL

Август 17th, 2009

Ever wondered how fast stored routines are in MySQL? I just ran a quick micro-benchmark to compare the speed of a stored function against a "roughly equivalent" subquery. The idea -- and there may be shortcomings that are poisoning the results here, your comments welcome -- is to see how fast the SQL procedure code is at doing basically the same thing the subquery code does natively (so to speak).

Before we go further, I want to make sure you know that the queries I'm writing here are deliberately mis-optimized to force a bad execution plan. You should never use IN() subqueries the way I do, at least not in MySQL 5.1 and earlier.

I loaded the World sample database and cooked up this query:

SQL:
  1. SELECT sql_no_cache sum(ci.Population) FROM City AS ci
  2.   WHERE CountryCode IN (
  3.     SELECT DISTINCT co.Code FROM Country AS co
  4.       INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code
  5.     WHERE lower(cl.LANGUAGE) = 'English');
  6. +--------------------+
  7. | sum(ci.Population) |
  8. +--------------------+
  9. |          237134840 |
  10. +--------------------+
  11. 1 row IN SET (0.23 sec)

This pretty consistently runs in just about 1/4th of a second. If you look at the abridged explain plan below, you'll see the query is doing a table scan against the first query, and then executing the subquery for each row:

SQL:
  1. mysql> EXPLAIN SELECT ....\G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: PRIMARY
  5.         TABLE: ci
  6.          type: ALL
  7. possible_keys: NULL
  8.           KEY: NULL
  9.       key_len: NULL
  10.           ref: NULL
  11.          rows: 4079
  12.         Extra: USING WHERE
  13. *************************** 2. row ***************************
  14.            id: 2
  15.   select_type: DEPENDENT SUBQUERY
  16. *************************** 3. row ***************************
  17.            id: 2
  18.   select_type: DEPENDENT SUBQUERY

Now I took the subquery and basically rewrote it as a stored function.

SQL:
  1. mysql> delimiter //
  2. mysql> CREATE FUNCTION speaks_english(c char(3)) returns integer deterministic
  3.     > begin
  4.     > declare res int;
  5.     > SELECT count(DISTINCT co.Code) INTO res FROM Country AS co INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code WHERE lower(cl.LANGUAGE) = 'English' AND co.Code = c;
  6.     > RETURN res;
  7.     > end//
  8. mysql> delimiter ;

Now the query can be rewritten as this:

SQL:
  1. mysql> SELECT sql_no_cache sum(ci.Population) FROM City AS ci WHERE speaks_english(CountryCode)> 0;
  2. +--------------------+
  3. | sum(ci.Population) |
  4. +--------------------+
  5. |          237134840 |
  6. +--------------------+
  7. 1 row IN SET (1.00 sec)

If we explain it, we get output similar to the first table shown above, but the further two rows are not shown. The query can't be optimized to use indexes, and the stored function is opaque to the optimizer. This is why I purposefully wrote the subquery badly in the first query! (If you think of a better way to compare apples and uhm, apples... please comment).

The poorly-optimized-subquery portion of the query essentially happens inside that function now.

And it's four times slower, consistently, and that's all I wanted to show here. Thanks for reading.


Entry posted by Baron Schwartz | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Creating a MySQL plugin to produce an integer timestamp

Август 12th, 2009

This article shows how to create a MySQL-plugin that can be used to create a function which can in turn be used in stored procedures. The function will produce an integer value representing the time (to the nearest usec).

I’m working on an article for conflict detection/resolution when using MySQL Cluster asynchronous replication which requires an integer column to store a timestamp for comparison purposes. In fact, it doesn’t actually need the timestamp to represent an absolute or even a relative point in time – all it cares about is that the if the function is called twice on 2 different hosts that the 2nd call will always result in a larger number than the 1st. Obviously, in a production environment the times on the 2 hosts would need to be kept in sync.

The c code (inttime.c)

#include <mysql.h>
#include <sys/time.h>

my_bool inttime_init(UDF_INIT *initid,UDF_ARGS *args, char *message) {
  return 0;
}

void inttime_deinit(UDF_INIT *initid) {};

unsigned long int inttime(UDF_INIT *initid, UDF_ARGS *args, char *result,
                          unsigned long  *length, char *is_null, char *error)
{
  struct timeval tv;
  gettimeofday(&tv,(void *)0);
  return ((double)tv.tv_usec)+tv.tv_sec*1000000;
}

Compiling and deploying the shared library

[billy@ws1 timestamp]$ gcc -I/usr/local/mysql/include/ -fPIC -shared -o inttime.so inttime.c
[billy@ws1 timestamp]$ cp inttime.so /usr/local/mysql/lib/plugin/

Creating the function

mysql> create function inttime RETURNS REAL SONAME 'inttime.so';

Test the function

mysql> select inttime();select inttime();
+------------------+
| inttime()        |
+------------------+
| 1250080524270706 |
+------------------+
1 row in set (0.00 sec)

+------------------+
| inttime()        |
+------------------+
| 1250080524270833 |
+------------------+
1 row in set (0.00 sec)

Note that the results are different and the second result is always larger than the first – function is fit for purpose :)

Using the function from a stored procedure

mysql> CREATE TRIGGER updateTAB1 BEFORE UPDATE ON TAB1 FOR EACH ROW SET NEW.ts = inttime();

Acknowledgements

I used the c code found at http://lists.mysql.com/internals/33702 as the starting point.


PlanetMySQL Voting: Vote UP / Vote DOWN