Archive for the ‘scalability’ Category

Best of Guide – Highlights of Our Popular Content

Май 24th, 2012

Read the original article at Best of Guide – Highlights of Our Popular Content

Top 5 Most Popular

We use a broad brush to highlight the biggest no-nos in web application scalability.

We dig into scalability, steering to the richest areas to focus on.

MySQL on Amazon EC2, the what, how and when.

We highlight some of the big differences between the two database engines. We’re also working on revamping this content, so stay tuned for more.

Interviewing a MySQL DBA – a guide for managers. Also helpful if you’re gearing up for an interview.

Hiring Guides

This two part guide for a hiring manager, focuses on the MySQL Database Operations role.

A long time favorite, a hiring guide for Oracle DBAs.

Devops is the latest craze in bringing the worlds of operations and development together.  We talk about how to identify and attract such a candidate.

Today the cloud is *almost* synonymous with Amazon Web Services and their Elastic Compute (EC2) cloud offering.  Want to hire the best, here’s the step-by-step guide.

Howtos

Caching caching everywhere.  Learn to do it right.

Hotbackups make building replicas a snap.  Avoid the downtime & speedup the process.

Get your backups right so you can get some sleep at night!

Learn how to automatically spinup new MySQL slaves in Amazon EC2.

Industry Commentary

A decade ago startups large and small were running on Oracle, but no longer.  As the shift intensifies, it becomes harder and harder to find the right talent.  Here’s why.

High availability ain’t what it used to be.  Here’s why nobody is really achieving so-called five nines.

We argue that technologists with broad experience are needed to achieve scalability for today’s high traffic high transaction websites.

Startup & Small Business Advice

You’ve heard all the hype.  Now for some medicine.

Our three part guide takes you through ten steps to building a successful consulting business.  This is as much a guide for freelancers or wanna be consultants, as it is for startups, and those wishing to hire good temporary resources.

Book Reviews

Learn about scalability from the guys at AKF.

Here’s a great how-to book, short and to the point.  Optimizing those queries!

Building a startup doesn’t have to mean big money. Stay efficient and build those margins.

Jeff Jarvis champions Google, but we flip the question around.

Hidden Gems

Cut through the hype.  Which types of applications really do lend themselves to deploying in the cloud?

Ask some tough questions before you deploy everything in the cloud.

Migrating your application from Oracle to MySQL, you may be in for a bumpy road.  Here are a few things to watch out for.

Soup to nuts guide to deploying applications on Amazon Web Services EC2.

For more articles like these go to iHeavy, Inc +1-212-533-6828


PlanetMySQL Voting: Vote UP / Vote DOWN

Best of Guide – Highlights of Our Popular Content

Май 24th, 2012

Read the original article at Best of Guide – Highlights of Our Popular Content

Top 5 Most Popular

We use a broad brush to highlight the biggest no-nos in web application scalability.

We dig into scalability, steering to the richest areas to focus on.

MySQL on Amazon EC2, the what, how and when.

We highlight some of the big differences between the two database engines. We’re also working on revamping this content, so stay tuned for more.

Interviewing a MySQL DBA – a guide for managers. Also helpful if you’re gearing up for an interview.

Hiring Guides

This two part guide for a hiring manager, focuses on the MySQL Database Operations role.

A long time favorite, a hiring guide for Oracle DBAs.

Devops is the latest craze in bringing the worlds of operations and development together.  We talk about how to identify and attract such a candidate.

Today the cloud is *almost* synonymous with Amazon Web Services and their Elastic Compute (EC2) cloud offering.  Want to hire the best, here’s the step-by-step guide.

Howtos

Caching caching everywhere.  Learn to do it right.

Hotbackups make building replicas a snap.  Avoid the downtime & speedup the process.

Get your backups right so you can get some sleep at night!

Learn how to automatically spinup new MySQL slaves in Amazon EC2.

Industry Commentary

A decade ago startups large and small were running on Oracle, but no longer.  As the shift intensifies, it becomes harder and harder to find the right talent.  Here’s why.

High availability ain’t what it used to be.  Here’s why nobody is really achieving so-called five nines.

We argue that technologists with broad experience are needed to achieve scalability for today’s high traffic high transaction websites.

Startup & Small Business Advice

You’ve heard all the hype.  Now for some medicine.

Our three part guide takes you through ten steps to building a successful consulting business.  This is as much a guide for freelancers or wanna be consultants, as it is for startups, and those wishing to hire good temporary resources.

Book Reviews

Learn about scalability from the guys at AKF.

Here’s a great how-to book, short and to the point.  Optimizing those queries!

Building a startup doesn’t have to mean big money. Stay efficient and build those margins.

Jeff Jarvis champions Google, but we flip the question around.

Hidden Gems

Cut through the hype.  Which types of applications really do lend themselves to deploying in the cloud?

Ask some tough questions before you deploy everything in the cloud.

Migrating your application from Oracle to MySQL, you may be in for a bumpy road.  Here are a few things to watch out for.

Soup to nuts guide to deploying applications on Amazon Web Services EC2.

For more articles like these go to iHeavy, Inc +1-212-533-6828


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 5.6 Replication: FAQ

Май 23rd, 2012

On Wednesday May 16th, we ran a webinar to provide an overview of all of the new replication features and enhancements that are previewed in the MySQL 5.6 Development Release – including Global Transaction IDs, auto-failover and self-healing, multi-threaded, crash-safe slaves and more.

Collectively, these new capabilities enable MySQL users to scale for next generation web and cloud applications.

Attendees posted a number of great questions to the MySQL developers, serving to provide additional insights into how these new features are implemented. So I thought it would be useful to post those below, for the benefit of those unable to attend the live webinar (note, you can listen to the On-Demand replay which is available now).

Before getting to the Q&A, there are a couple of other resources that maybe useful to those wanting to learn more about Replication in MySQL 5.6

On-Demand webinar

Slides used during the webinar

For more detail on any of the features discussed below, be sure to check out the Developer Zone article: Replication developments in MySQL 5.6

So here is the Q&A from the event 

Multi-Threaded Slaves

The results from recent benchmarking of the Multi-Threaded Slave enhancement were discussed, prompting the following questions

Q. Going from 0 - 10 threads, did you notice any increase in IOwait on CPU?

A. In this case, yes. The actual amount depends on a number of factors: your hardware, query/transaction distribution across databases, server general and InnoDB specific configuration parameters.

Q. Will the multiple threads work on different transactions on the same database, or each thread works on a separate database?

A. Each worker thread works on separate databases (schemas).

Q. If I set the slave-parallel-workers to less than the number of databases, can I configure which databases use which worker threads?

A. There is no such configuration option to assign a certain Worker thread to a database. Configuring slave-parallel-workers to less than the number of databases is a good setup. A Worker can handle multiple databases.

Q. If I create 4 threads and I have 100 databases, can I configure which databases use which threads?

A. There won't be 1 worker to a mapping of exactly 25 databases, but it will be very close to that type of distribution.

Q. Thank You. I ask as we have about 8 databases that have a lot of transactions and about 30 that are used less frequently. It would be nice to have the ability to create 9 workers, 1 for each if the heavy databases and 1 for all the others

A. The current design enables relatively equal distribution of transactions across your databases, but it could be that 9 workers will fit anyway.

Q. Does multi-thread slave still work if there is foreign key across database?

A. MTS preserves slave consistency *within* a database, but not necessarily *between* databases. With MTS enabled and replication ongoing, updates to one database can be executed before updates to another causing them to be temporarily out of sync with each other.

Q. How is auto-increment managed with the multi-thread slave?

A. MTS makes sure Worker threads do not execute concurrently on the same table. Auto-increment is guaranteed to be handled in the same way as the single threaded "standard" slave handles auto-increment

Q. Can you use semi-synchronous replication on one of the slaves when using MTS?

A. Semi-sync is friendly to MTS. MTS is about parallel execution - so they cooperate well.

Optimized Row Based Replication

Q. If you only store the PK column in the Before Image for updates, does this mean you don't care about the slave's data potentially being out-of-sync? Will we be able to control how much data is stored in the binary logs?

A. The rule is that we ship a *PK equivalent* so that the slave is always able to find a row. This means:
  1. if master table has PK, then we ship the PK only
  2. if master table does not have PK, then we ship the entire row

Global Transaction IDs and HA Utilities

Q. Would the failover utility need to sit on a 3rd party host to allow arbitration?

A. The utility would typically run on a client, not on the hosts it is monitoring

Q. Can you explain the upgrade process to move to MySQL 5.6? I am assuming that the slave(s) are upgraded first and that replication would be backwards compatible. And after the slave(s) are upgraded, the master would be next. But then how do you turn on the GTID?

A. Right: the slave(s) are upgraded first. After upgrading Slaves they would be started with --gtid-mode=on (technically a couple of other options are needed as well). And then the same process would be followed for the upgraded Master.

Q. For failover functionality, if I had a setup like this: Server1 replicates to Server2, Server2 replicates to Server3, Server4, and Server5. If Server2 were to fail, can I have it configured so that Server1 can become the new master for Server3/4/5?

A. Yes - you can either configure the failover to the most recent slave based on GTID, or list specific candidates. What will happen is that Slave 1 will temporarily become a slave of 3, 4 and 5 to ensure it replicates any more recent transactions those slaves may have, and then it will become the master

Replication Event Checksums

Q. What is the overhead of replication checksums?

A. It is minimal - we plan to publish benchmarks over the summer to better characterize any overhead

Q. Are you exposing the checksum to the plugin api so we can add our own checksum types?

A. We prepared some of interfaces, i.e. checksum methods are identified by a specific code byte (1-127) values. But it's not really a plugin at this point.

Q. Do checksums verify both replicated data and the data on slave?

A. Yes - checksums are implemented across the entire path - so you can check for issues in replication itself, or in the hardware or network

Q. I think, it is better to turn on checksums at the relaylog to avoid overhead on the master, but if we do that and checksum fails (i.e. not matching the master's data) then what happens – will the slave throw an error

A. I agree, it's better to relax the Master, which verifies the checksum only optionally when the Dump Thread reads from the binlog prior to the replication event being sent out to the Slave. The slave mandatorily checks the checksummed events when they are sent across the network, and optionally when they are read from Relay-log. In either case, an error is thrown.

Q. Are checksums optional? In some cases we don't care for huge data loads

A. Yes, checksums are optional.

Time Delayed Replication

Q. Is Time delayed replication applied at the database level only and not for the entire slave?

A. Applied for the slave as execution is global.

Informational Log Events

Q. When we configure information log event, does it show meaningful query log for any binlog format? (Row based especially)

A. When using row based replication, you get the original query in human readable format... obviously you don’t want to see all the rows modified in a table of size, which can be huge

Q. Will the binlog include user id?

A. User id is replicated in some cases for Query-log-event - namely user id of the invoker when a stored routine is called.

Remote Binlog Backup

Q. What level of access does the remote binlog backup need?

A. Replication slave

Summary

As you can see, our Engineering team was kept busy with questions over the course of the webinar. Be sure to check out the MySQL 5.6 Replication webinar replay and if you have further questions, please don’t hesitate to use the comments below!


PlanetMySQL Voting: Vote UP / Vote DOWN

MariaDB-5.5 Thread Pool Performance

Март 20th, 2012

MariaDB-5.5.21-beta is the first MariaDB release featuring the new thread pool. Oracle offers a commercial thread pool plugin for MySQL Enterprise, but now MariaDB brings a thread pool implementation to the community!

If you are not familiar with the term, please read the Knowledge Base article about it.

The main design goal of the thread pool is to increase the scalability of the MariaDB server with many concurrent connections. In order to test and demonstrate this, I have run the sysbench OLTP RO benchmark with up to 4096 threads to compare the new pool-of-threads and the traditional thread-per-connection scheduler:

OLTP(ro) MariaDB-5.5.21 pool-of-threads vs. thread-per-connection

Benchmark description:

  • sysbench multi table OLTP, readonly
  • 16 tables, totaling 40 mio rows (~10G of data)
  • 16G buffer pool – result is independent of disk performance
  • mysqld bound to 16 cpu cores, sysbench to the other 8

Read/write OLTP benchmark results will be published as soon as they are available.

Raw benchmark results and the scripts used can be downloaded here


PlanetMySQL Voting: Vote UP / Vote DOWN

Black-Box Performance Analysis with TCP Traffic

Февраль 23rd, 2012

This is a cross-post from the MySQL Performance Blog. I thought it would be interesting to users of PostgreSQL, Redis, Memcached, and $system-of-interest as well.

For about the past year I’ve been formulating a series of tools and practices that can provide deep insight into system performance simply by looking at TCP packet headers, and when they arrive and depart from a system. This works for MySQL as well as a lot of other types of systems, because it doesn’t require any of the contents of the packet. Thus, it works without knowledge of what the server and client are conversing about. Packet headers contain only information that’s usually regarded as non-sensitive (IP address, port, TCP flags, etc), so it’s also very easy to get access to this data even in highly secure environments.

I’ve finally written up a paper that shows some of my techniques for detecting problems in a system, which can be an easy way to answer questions such as “is there something we should look into more deeply?” without launching a full-blown analysis project first. It’s available from the white paper section of our website: MySQL Performance Analysis with Percona Toolkit and TCP/IP Network Traffic

Further Reading:


PlanetMySQL Voting: Vote UP / Vote DOWN

What Ops doesn’t tell you about your MySQL Database

Декабрь 16th, 2011

MySQL is a very scalable platform which has proven robust even in the most dense and complex data environments. MySQL's indispensable replication function is 'sold' as being fail-safe so you have little to sweat about as long as your backups are running regularly. But what the ops guys aren't telling you is MySQL performs replication with tiny margins of error that could cause big problems in times of disaster.

Replication database backup

Same-same but different?


The Scenario

Imagine the scene, you use replication to backup your data. Your secondary database is your peace of mind. It's the always-on clone of your crown jewels. You even perform backups off of it so you don't impact your live website. Your backups run without errors. Your slave database runs without errors.
Then the dreaded day comes when your primary database fails. You instruct your team to switchover your application to point to your live backup database. The site comes online again. But all is not right. You notice subtle differences and your team begins to question how deep the data divide could be.

The Problem
Although MySQL replication is fairly easy to setup, and even to keep running without error, you may have unseen problems. MySQL's core technology to replicate data between master and slave is primarily statement based. Various scenarios can cause what in other database platforms you might call database corruption, that is silent drifting of data from what tables and rows contain on the master. It is no fault of your own, or perhaps one might argue even of your operations team. It is a fundamental flaw in how MySQL performs replication.

The Solution
Fortunately there is a solution. Checksums, the wonderful computational tool for comparing things can be put to work nicely to compare database. The Percona Toolkit (formerly maatkit) includes just such a utility for use with MySQL. It can be used to check the integrity of your slave databases.
If you've never performed such a check, you should do so ASAP. If your database has been running for months at a stretch, chances are there could be differences lying undiscovered between the two systems.

Depending on the volume changing in your database, you can continue to use this tool periodically to confirm that all is consistent. If integrity checks fail, there is another tool in Maatkit to syncronize differences, and bring everything back to order.


PlanetMySQL Voting: Vote UP / Vote DOWN

High Performance PHP Session Storage on Scale

Ноябрь 17th, 2011

One of the great things about the HTTP protocol, besides status code 418, is that it's stateless. A web server therefore is not required to store any information on the user or allocate resources for a user after the individual request is done. By that a single web server can handle many many many different users easily, and well if it can't anymore one can add a new server, put a simple load balancer in front and scale out. Each of those web servers then handles its requests without the need for communication which leads to linear scaling (assuming network provides enough bandwidth etc.).

Now the Web isn't used for serving static documents only anymore but we have all these fancy web apps. And those applications often have the need for a state. The most trivial information they need is the current user. HTTP is a great protocol and provides a way to do authentication which works well with its stateless nature - unfortunately this authentication is implemented badly in current clients. Ugly popups, no logout button, ... I don't have to tell more I think. For having nicer login systems people want web forms. Now the stateless nature of HTTP is a problem: The user may login and then browse around. On later requests it should still be known who that user is - with a custom HTML form based login alone this is not possible. A solution might be cookies. At least one might think so for a second. But setting a cookie "this is an authorized user" alone doesn't make sense as it could easily be faked. Better is to simply store a random identifier in a cookie and then keep a state information on the server. Then all session data is protected and only the user who knows this random identifier is authenticated. If this identifier is wisely chosen and hard to guess this works quite well. Luckily this is a mostly PHP- and MySQL-focused blog and as PHP is a system for building web applications this functionality is part of the core language: The PHP session module.

The session module, which was introduced in PHP 4, partly based on work on the famous phplib library, is quite a fascinating piece of code. It is open and extendable in so many directions but still so simple to use that everybody uses it, often newcomers learn about it on their first day in PHP land. Of course you can not only store the information whether the user is logged in but cache some user-specific data or keep the state on some transactions by the user, like multi-page forms or such.

In its default configuration session state will be stored on the web server's file system. Each session's data in its own file in serialized form. If the filesystem does some caching or one uses a ramdisk or something this can be quite efficient. But as we suddenly have a state on the web server we can't scale as easily as before anymore: If we add a new server and then route a user with an existing session to the new server all the session data won't be there. That is bad. This is often solved by a configuration of the load balancer to route all requests from the same user to the same web server. In some cases this works quite ok, but it is often seen that this might cause problems. Let's assume you want to take a machine down for maintenance. All sessions there will die. Or imagine there's a bunch of users who do complex and expensive tasks - then one of your servers will have a hard time, giving these users bad response times which feels like bad service, even though your other systems are mostly idle.

A nice solution for this would be to store the sessions in a central repository which can be accessed from all web servers.


Continue reading "High Performance PHP Session Storage on Scale"
PlanetMySQL Voting: Vote UP / Vote DOWN

Replication stars

Ноябрь 4th, 2011
Working with replication, you come across many topologies, some of them sound and established, some of them less so, and some of them still in the realm of the hopeless wishes. I have been working with replication for almost 10 years now, and my wish list grew quite big during this time. In the last 12 months, though, while working at Continuent, some of the topologies that I wanted to work with have moved from the cloud of wishful thinking to the firm land of things that happen. My quest for star replication starts with the most common topology. One master, many slaves.
Replication 1 master slave

Fig 1. Master/Slave topology

Replication legend

Legend

It looks like a star, with the rays extending from the master to the slaves. This is the basis of most of the replication going on mostly everywhere nowadays, and it has few surprises. Setting aside the problems related to failing over and switching between nodes, which I will examine in another post, let's move to another star.
Replication 2 fan in slave

Fig 2. Fan-in slave, or multiple sources

The multiple source replication, also known as fan-in topology, has several masters that replicate to the same slave. For years, this has been forbidden territory for me. But Tungsten Replicator allows you to create multiple source topologies easily. This is kind of uni-directional, though. I am also interested in topologies where I have more than one master, and I can retrieve data from multiple points.
Replication 3 all to all three nodes

Fig 3. all-to-all three nodes

Replication 4 all to all four nodes

Fig 4. All-to-all four nodes

Tungsten Multi-Master Installation solves this problem. It allows me to create topologies where every node replicates to every other node. Looking at the three-node scheme, it appears a straightforward solution. When we add one node, though, we see that the amount of network traffic grows quite a lot. The double sided arrows mean that there is a replication service at each end of the line, and two open data channels. When we move from three nodes to four, we double the replication services and the channels needed to sustain the scheme. For several months, I was content with this. I thought: it is heavy, but it works, and it's way more than what you can do with native replication, especially if you consider that you can have a practical way of preventing conflicts using Shard Filters. But that was not enough. Something kept gnawing at me, and from time to time I experimented with Tungsten Replicator huge flexibility to create new topologies. But the star kept eluding me. Until … Until, guess what? a customer asked for it. The problem suddenly ceased to be a personal whim, and it became a business opportunity. Instead of looking at the issue in the idle way I often think about technology, I went at it with practical determination. What failed when I was experimenting in my free time was that either the pieces did not glue together the way I wanted, or I got an endless loop. Tungsten Replicator has a set of components that are conceptually simple. You deploy a pipeline between two points, open the tap, and data starts flowing in one direction. Even with multiple masters replication, the principle is the same. You deploy many pipes, and each one has one purpose only.
Replication 5 star topology 3 rays

Fig 5. All-masters star topology

In the star topology, however, you need to open more taps, but not too many, as you need to avoid the data looping around. The recipe, as it turned out, is to create a set of bi-directional replication systems, where you enable the central node slave services to get changes only from a specific master, and the slave services on the peripheral nodes to accept changes from any master. It was as simple as that. There are, of course, benefits and drawbacks with a star topology, compared to a all-replicate-to-all design. In the star topology, we create a single point of failure. If the central node fails, replication stops, and the central node needs to be replaced. Instead, the all-to-all design has no weaknesses. Its abundance of connections makes sure that, if a node fails, the system continues working without any intervention. There is no need for fail-over.
Replication 6 all to all extending png

Fig 6. extending an all-to-all topology

Replication 7 star extending

Fig 7. Extending a star topology

However, there is a huge benefit in the node management. If you need to add a new node, it costs two services and two connections, while the same operation in the all-to-all replication costs 8 services and 8 connections. With the implementation of this topology, a new challenge has arisen. While conflict prevention by sharding is still possible, this is not the kind of scenario where you want to apply it. We have another conflict prevention mechanism in mind, and this new topology is a good occasion make it happen. YMMV. I like the additional choice. There are cases where a all-replicate-to-all topology is still the best option, and there are cases where a star topology is more advisable.

PlanetMySQL Voting: Vote UP / Vote DOWN

Book Review – Effective MySQL

Ноябрь 4th, 2011

Effective MySQL: Optimizing SQL Statements

by Ronald Bradford

No Nonsense, Readable, Practical, and Compact

Effective MySQLI like that this book is small; 150 pages means you can carry it easily.  It's also very no nonsense.  It does not dig too deeply into theory unless it directly relates to your day-to-day needs.  And those needs probably cluster heavily around optimizing SQL queries, as those pesky developers are always breaking things ;)

Jokes aside, this new book out on Oracle Press is a very readable volume. Bradford has drawn directly from real-world experience to give you the right bite size morsels you need in your day-to-day MySQL activities.

Highlights

Chapter one, The Five Minute DBA gives you the basic methodology if you don't already know it.  Enable the slow query log, analyze it, and use the explain facility.  Then index as appropriate, or eliminate queries if you can.

Chapter two digs a little deeper into the basics, introducing explain extended, table statistics and storage engines.  You'll also learn how to use show session & global status, as well as session & global variables.  You'll also have your first look at MySQL's data dictionary - INFORMATION_SCHEMA.

Chapter three is where it starts to get meaty.  You probably know that MySQL has b-tree indexes, but did you know that it has b+tree indexes, or hash indexes?

Chapter four digs into indexes further with single & multi-column indexes using them for sorting and joining.  You'll also find out about covering indexes which are multi-column matching the where clause, but also including columns needed in the SELECT predicate.  Do you have duplicate or unused indexes?  You'll learn why they matter to performance and how to eliminate them with tools like mk-duplicate-key-checker.

Chapter five continues along the same lines, with more coverage of indexes.  Learn to identify when you are using a covering index, fulfilling the entire query by only accessing the index.  You'll also learn about partial indexes, how they can reduce the size of index storage and retrieval while still getting your data for you.

Chapter six covers configuring the server itself, hitting on the system variables such as the innodb buffer pool (innodb_buffer_pool_size) and key buffer (key_buffer_size) as well as the query cache.  You'll also learn how to set the four main session memory settings - sort buffer (sort_buffer_size) and join buffer ( join_buffer_size) as well as the lesser known read buffers (read_buffer_size and read_rnd_buffer_size).

Chapter seven is all about the process of tuning and optimizing MySQL.  Rolling all the previous sections into marching orders, and prescriptive advice, he takes you through step by step how to apply the principles.  You'll get an introduction to mk-query-digest (though strangely without attribution to Baron Schwartz), the great maatkit tool for query analysis and aggregation, as well as the microsecond precision patch, which allows your mysql shell client to display more exact timing data.  For the patch he links back to an article on his own site which seems to be not found.  The author of the high precision mysql timer patch is Stewart Smith.

I personally got the most out of Chapter eight, full of self-described hidden performance tips.  From identifying unused or duplicate indexes, to replacing inefficient data types with better ones, why it's important to use NOT NULL where possible or how to store IP addresses efficiently, this chapter has a lot of goodies.  For those still struggling with SQL statement tuning, there are a few patterns that are described, offering advice on how to rewrite a subquery as an inner join,

What you might not know

  • MySQL includes Oracle's index organized tables by a different name
  • Too many indexes can dramatically impact INSERT & UPDATE performance
  • Many DDL operations can be done online - see oak-online-alter-table (Shlomi Noach)
  • Datatypes matter - use enum, int unsigned, timestamp & not null where possible
  • Covering indexes are your friend, duplicate & unused indexes are not!
  • A replication slave can have different storage engines or indexes from the master. These can support different uses - such as data warehousing or non-transactional requirements.
  • While a_string LIKE '%end of my sentence.' won't use an index, you can index reverse_string, then use reverse_string LIKE REVERSE '%end of my sentence.' and MySQL will use this index.  You've simulated an advanced Oracle feature, reverse key indexes!

A few small gripes

If I were to add a few complaints it would be to say that some of the examples were rather simplistic.  In many cases tuning SQL is not as simple as just adding the right index.  For instance there was no good discussion of the dreaded "using temporary, using filesort" that we see a lot in MySQL explains when sorting has to be done, but will not fit in memory.  Or what about tmpdir=/dev/shm, how will that improve things?  What about UNION versus UNION ALL where appropriate.  Why does DISTINCT do a sort?

The book was also missing a discussion of triggers, stored procedures, when or if the query cache can cause problems and so forth.  Also the article link mentioned about chapter seven isn't the only missing link.  I followed links to optimizing sql  statements and it seems to go to a generic holding page.  Also the main link effectivemysql.com/book leads to an outline of an as yet unreleased title on Backup and Recovery.

All in all, well worth your money

However, other than these few gripes the book overall is a very welcome addition to the small family of MySQL books.  Get a copy quick before they're all gone!

 

 


PlanetMySQL Voting: Vote UP / Vote DOWN

Book Review – Effective MySQL

Ноябрь 4th, 2011

Effective MySQL: Optimizing SQL Statements

by Ronald Bradford

No Nonsense, Readable, Practical, and Compact

Effective MySQLI like that this book is small; 150 pages means you can carry it easily.  It's also very no nonsense.  It does not dig too deeply into theory unless it directly relates to your day-to-day needs.  And those needs probably cluster heavily around optimizing SQL queries, as those pesky developers are always breaking things ;)

Jokes aside, this new book out on Oracle Press is a very readable volume. Bradford has drawn directly from real-world experience to give you the right bite size morsels you need in your day-to-day MySQL activities.

Highlights

Chapter one, The Five Minute DBA gives you the basic methodology if you don't already know it.  Enable the slow query log, analyze it, and use the explain facility.  Then index as appropriate, or eliminate queries if you can.

Chapter two digs a little deeper into the basics, introducing explain extended, table statistics and storage engines.  You'll also learn how to use show session & global status, as well as session & global variables.  You'll also have your first look at MySQL's data dictionary - INFORMATION_SCHEMA.

Chapter three is where it starts to get meaty.  You probably know that MySQL has b-tree indexes, but did you know that it has b+tree indexes, or hash indexes?

Chapter four digs into indexes further with single & multi-column indexes using them for sorting and joining.  You'll also find out about covering indexes which are multi-column matching the where clause, but also including columns needed in the SELECT predicate.  Do you have duplicate or unused indexes?  You'll learn why they matter to performance and how to eliminate them with tools like mk-duplicate-key-checker.

Chapter five continues along the same lines, with more coverage of indexes.  Learn to identify when you are using a covering index, fulfilling the entire query by only accessing the index.  You'll also learn about partial indexes, how they can reduce the size of index storage and retrieval while still getting your data for you.

Chapter six covers configuring the server itself, hitting on the system variables such as the innodb buffer pool (innodb_buffer_pool_size) and key buffer (key_buffer_size) as well as the query cache.  You'll also learn how to set the four main session memory settings - sort buffer (sort_buffer_size) and join buffer ( join_buffer_size) as well as the lesser known read buffers (read_buffer_size and read_rnd_buffer_size).

Chapter seven is all about the process of tuning and optimizing MySQL.  Rolling all the previous sections into marching orders, and prescriptive advice, he takes you through step by step how to apply the principles.  You'll get an introduction to mk-query-digest (though strangely without attribution to Baron Schwartz), the great maatkit tool for query analysis and aggregation, as well as the microsecond precision patch, which allows your mysql shell client to display more exact timing data.  For the patch he links back to an article on his own site which seems to be not found.  The author of the high precision mysql timer patch is Stewart Smith.

I personally got the most out of Chapter eight, full of self-described hidden performance tips.  From identifying unused or duplicate indexes, to replacing inefficient data types with better ones, why it's important to use NOT NULL where possible or how to store IP addresses efficiently, this chapter has a lot of goodies.  For those still struggling with SQL statement tuning, there are a few patterns that are described, offering advice on how to rewrite a subquery as an inner join,

What you might not know

  • MySQL includes Oracle's index organized tables by a different name
  • Too many indexes can dramatically impact INSERT & UPDATE performance
  • Many DDL operations can be done online - see oak-online-alter-table (Shlomi Noach)
  • Datatypes matter - use enum, int unsigned, timestamp & not null where possible
  • Covering indexes are your friend, duplicate & unused indexes are not!
  • A replication slave can have different storage engines or indexes from the master. These can support different uses - such as data warehousing or non-transactional requirements.
  • While a_string LIKE '%end of my sentence.' won't use an index, you can index reverse_string, then use reverse_string LIKE REVERSE '%end of my sentence.' and MySQL will use this index.  You've simulated an advanced Oracle feature, reverse key indexes!

A few small gripes

If I were to add a few complaints it would be to say that some of the examples were rather simplistic.  In many cases tuning SQL is not as simple as just adding the right index.  For instance there was no good discussion of the dreaded "using temporary, using filesort" that we see a lot in MySQL explains when sorting has to be done, but will not fit in memory.  Or what about tmpdir=/dev/shm, how will that improve things?  What about UNION versus UNION ALL where appropriate.  Why does DISTINCT do a sort?

The book was also missing a discussion of triggers, stored procedures, when or if the query cache can cause problems and so forth.  Also the article link mentioned about chapter seven isn't the only missing link.  I followed links to optimizing sql  statements and it seems to go to a generic holding page.  Also the main link effectivemysql.com/book leads to an outline of an as yet unreleased title on Backup and Recovery.

All in all, well worth your money

However, other than these few gripes the book overall is a very welcome addition to the small family of MySQL books.  Get a copy quick before they're all gone!

 

 


PlanetMySQL Voting: Vote UP / Vote DOWN