Archive for the ‘memcached’ Category

Caching could be the last thing you want to do

Июль 24th, 2010

I recently had a run-in with very popular PHP ecommerce package which makes me want to voice a recurring mistake I see in how many web applications are architected.

What is that mistake?

The ecommerce package I was working with depended on caching.  Out of the box it couldn’t serve 10 pages/second unless I enabled some features which were designed to be “optional” (but clearly they weren’t).

I think with great tools like memcached it is easy to get carried away and use it as the mallet for every performance problem, but in many cases it should not be your first choice.  Here is why:

  • Caching might not work for all visitors - You look at a page, it loads fast.  But is this the same for every user?  Caching can sometimes be an optimization that makes the average user have a faster experience, but in reality you should be caring more that all users get a good experience (Peter explains why here, taking about six sigma).  In practice it can often be the same user that has all the cache misses, which can make this problem even worse.
  • Caching can reduce visibility – You look at the performance profile of what takes the most time for a page to load and start trying to apply optimization.  The problem is that the profile you are looking at may skew what you should really be optimizing.  The real need (thinking six sigma again) is to know what the miss path costs, but it is somewhat hidden.
  • Cache management is really hard – have you planned for cache stampeding, or many cache items being invalidated at the same time?

What alternative approach should be taken?

Caching should be seen more as a burden that many applications just can’t live without.  You don’t want that burden until you have exhausted all other easily reachable optimizations.

What other optimizations are possible?

Before implementing caching, here is a non-exhaustive checklist to run through:

  • Do you understand every execution plan of every query? If you don’t, set long_query_time=0 and use mk-query-digest to capture queries.  Run them through MySQL’s EXPLAIN command.
  • Do your queries SELECT *, only to use subset of columns?  Or do you extract many rows, only to use a subset? If so, you are extracting too much data, and (potentially) limiting further optimizations like covering indexes.
  • Do you have information about how many queries were required to generate each page? Or more specifically do you know that each one of those queries is required, and that none of those queries could potentially be eliminated or merged?

I believe this post can be summed up as “Optimization rarely decreases complexity. Avoid adding complexity by only optimizing what is necessary to meet your goals.”  – a quote from Justin’s slides on instrumentation-for-php.  In terms of future-proofing design, many applications are better off keeping it simple and refusing the temptation to try and solve some problems “like the big guys do”.


Entry posted by Morgan Tocker | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

As of late…

Июль 15th, 2010
What I'm up to lately (giving love to some projects):

* Fixing bugs in DBD::mysql, just released 4.015, 4.016, and next 4.017. I had a patch sent yesterday from a user/developer that I want to get out there
* Memcached::libmemcached - 0.4201 version - now using latest libmemcached 0.42. This is the only Perl client that supports binary protocol!

patg@patg-desktop:~/code_dev/perl-libmemcached$ PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/12-set-get-binary.t
t/12-set-get-binary....ok                                                   
All tests successful.
Files=1, Tests=5,  0 wallclock secs ( 0.04 cusr +  0.01 csys =  0.05 CPU)

Whoot!

* FederatedX (in Maria) - fixing MySQL bug 32426, https://bugs.launchpad.net/maria/+bug/571200 . This involves a little work as it is fixed in Federated (not FederatedX) and FederatedX has a whole new design using an IO class to abstract database driver details as well as numerous other changes. But it will happen.

* Delving into C++ Boost libraries. These look quite useful!
PlanetMySQL Voting: Vote UP / Vote DOWN

As of late…

Июль 15th, 2010
What I'm up to lately (giving love to some projects):

* Fixing bugs in DBD::mysql, just released 4.015, 4.016, and next 4.017. I had a patch sent yesterday from a user/developer that I want to get out there
* Memcached::libmemcached - 0.4201 version - now using latest libmemcached 0.42. This is the only Perl client that supports binary protocol!

patg@patg-desktop:~/code_dev/perl-libmemcached$ PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/12-set-get-binary.t
t/12-set-get-binary....ok                                                   
All tests successful.
Files=1, Tests=5,  0 wallclock secs ( 0.04 cusr +  0.01 csys =  0.05 CPU)

Whoot!

* FederatedX (in Maria) - fixing MySQL bug 32426, https://bugs.launchpad.net/maria/+bug/571200 . This involves a little work as it is fixed in Federated (not FederatedX) and FederatedX has a whole new design using an IO class to abstract database driver details as well as numerous other changes. But it will happen.

* Delving into C++ Boost libraries. These look quite useful!
PlanetMySQL Voting: Vote UP / Vote DOWN

Webinar today – Scaling Web Services with MySQL Cluster, Part 1: An Alternative to MySQL Server & memcached

Июнь 9th, 2010

MySQL and memcached has become, and will remain, the foundation for many dynamic web services with proven deployments in some of the largest and most prolific names on the web. There are classes of web services however that are update-intensive, demanding real-time responsiveness and continuous availability. In these cases, MySQL Cluster provides the familiarity and ease-of-use of the regular MySQL Server, while delivering significantly higher levels of write performance with less complexity, lower latency and 99.999% availability. This webinar will discuss the use-cases for both approaches, and provide an insight into how MySQL Cluster is enabling users to scale their update-intensive web services.

The webinar starts at 09:00 Pacific/17:00 UK/18:00 CET today (June 9th 2010).

Still time to register (for free) at http://www.mysql.com/news-and-events/web-seminars/display-545.html – even if you can’t attend, this way you’ll get sent a link to the charts and replay.


PlanetMySQL Voting: Vote UP / Vote DOWN

On Good Instrumentation

Июнь 1st, 2010

In so many cases troubleshooting applications I keep thinking how much more efficient things could be going if only there would be a good instrumentation available. Most of applications out there have very little code to help understand what is going on and if it is there it is frequently looking at some metrics which are not very helpful.

If you look at the system from bird eye view – system needs to process transactions and you want it to successfully complete large number of transactions it gets (this is what called availability) and we want it to serve them with certain response time, which is what is called performance. There could be many variables in environment which change – load, number of concurrent users, database, the way users use the system but in the nutshell all what you really care is having predictable response time within certain range. So if we care about response time – this is exactly what our instrumentation should measure

Response Time Summary We want to understand where exactly response time comes from. For example if we define transaction as the time it took to generate HTML page we want to understand how much time was spent waiting on the database, memcache, other external services, as well as how much CPU time it consumed.

Now what is important we need this information for individual transactions. It may be every transaction which is best and easily achievable for small-medium systems or at least for large enough sample. It is very important this information is available for individual transactions not the average. Average is useless because 100 transactions taking 1 sec and 99 transactions having 1ms and 1 taking 99.1 sec will have the same average while for sake of performance analyzes these are completely different. When you have transaction sample make sure it contains fair population of transaction – getting only transactions which are slow is not helpful as we might want to compare them to the fast transactions to understand why they are slow.

What kind of components do you need to have in response time summary – all components which are significant enough. If your instrumentation has 95% of response time unaccounted for it is useless. You also want blocks not to mix apples with oranges. For example “mysql and memcache” block would not be helpful. Even further I would prefer to split “mysql time” in the “connect time” and “query time” as there are situations when one but not other would be affected.

In is important for response time summary stored in the logs which are easy to query so you can analyze data in a lot of various ways. Sometimes you may find the response time is impacted by queries from certain user, in others it may be attributed to different application/web server.

The goal for Response Time Summary is to quickly point direction where problem happens. Whenever you have spike in response time or it is bad response time for certain kind of request you can quickly understand where does it come from ? Is it wasted CPU time Slow response from MySQL or Memcache.

I also like to see numbers of calls stored together with attributed response time. For example I’d like to see number of mysql calls in addition to MySQL response time. This helps to understand if it is the issue with number of queries or their performance. If I see 2 queries taking 30 seconds it is clearly slow queries. If it is 10.000 queries executed and total response time is 4 sec I know it is pretty much as good as it gets with standard Ethernet network and finding a ways to reduce number of queries is going to be more helpful.

The Glue Our applications involve multiple layers and typically higher layer can only report response time it took to call lower layer, but not the reason for that response time. For example we can report time it took to execute SQL query from PHP application side, but we can’t say why it has taken so long. Was it row level lock ? waiting on disk IO or was it simply question of burning a lot of CPU. On the other hand this information may be available in the instrumentation stats from that lower layer – for example in MySQL Query Log. What is important is however to be able to connect the data from these logs – glue them together. The easy way to do it is to provide an unique identifier to all requests and put it in the logs with request of the lower levels. With mySQL the simple way to do it is to put it in the comments for queries you execute.

Optional Tracing The information in lower layers logs is very helpful however it typically have two problems. First not every layer has good logs. For example if you’re running memcached you probably do not have the logs detailing all requests and their response time. Second – the lower layer may only know response time from its vantage point, which in many cases does not include network communication time which can be very important.

Tracing should be optional and normally applied to the small sample of requests, though it needs to be detailed. Typically you would include the calls to the lower level services together with timestamp, the response summary with timestamp again. The information about request has to be complete enough to identify target action and response completely. For example if I’m speaking about memcache I’d like to know which server:port request was issued to which key was requested, and on response I’d like to know if it was hit, miss or error.

The way I use it may be as follows. I see the increased response time for given kind of request. I see response time is coming from MySQL. I check the number of MySQL Queries and it is 5x when it usually is for this kind of request. Looking at memcache stats I can see high number of misses. Looking at some available traces shows the server memcache01 has very high miss ratio. Checking what is going on with memcache01 shows it just was restarted (and hence has almost empty cache). This is important example as it shows your increased response time from MySQL may not have anything to do with MySQL itself but you would not know unless you’re capturing the right data.

If you’re looking for nice example framework for instrumentation, check out instrumentation for PHP – It has everything mentioned by tracing which is trivial to add.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

On Good Instrumentation

Июнь 1st, 2010

In so many cases troubleshooting applications I keep thinking how much more efficient things could be going if only there would be a good instrumentation available. Most of applications out there have very little code to help understand what is going on and if it is there it is frequently looking at some metrics which are not very helpful.

If you look at the system from bird eye view – system needs to process transactions and you want it to successfully complete large number of transactions it gets (this is what called availability) and we want it to serve them with certain response time, which is what is called performance. There could be many variables in environment which change – load, number of concurrent users, database, the way users use the system but in the nutshell all what you really care is having predictable response time within certain range. So if we care about response time – this is exactly what our instrumentation should measure

Response Time Summary We want to understand where exactly response time comes from. For example if we define transaction as the time it took to generate HTML page we want to understand how much time was spent waiting on the database, memcache, other external services, as well as how much CPU time it consumed.

Now what is important we need this information for individual transactions. It may be every transaction which is best and easily achievable for small-medium systems or at least for large enough sample. It is very important this information is available for individual transactions not the average. Average is useless because 100 transactions taking 1 sec and 99 transactions having 1ms and 1 taking 99.1 sec will have the same average while for sake of performance analyzes these are completely different. When you have transaction sample make sure it contains fair population of transaction – getting only transactions which are slow is not helpful as we might want to compare them to the fast transactions to understand why they are slow.

What kind of components do you need to have in response time summary – all components which are significant enough. If your instrumentation has 95% of response time unaccounted for it is useless. You also want blocks not to mix apples with oranges. For example “mysql and memcache” block would not be helpful. Even further I would prefer to split “mysql time” in the “connect time” and “query time” as there are situations when one but not other would be affected.

In is important for response time summary stored in the logs which are easy to query so you can analyze data in a lot of various ways. Sometimes you may find the response time is impacted by queries from certain user, in others it may be attributed to different application/web server.

The goal for Response Time Summary is to quickly point direction where problem happens. Whenever you have spike in response time or it is bad response time for certain kind of request you can quickly understand where does it come from ? Is it wasted CPU time Slow response from MySQL or Memcache.

I also like to see numbers of calls stored together with attributed response time. For example I’d like to see number of mysql calls in addition to MySQL response time. This helps to understand if it is the issue with number of queries or their performance. If I see 2 queries taking 30 seconds it is clearly slow queries. If it is 10.000 queries executed and total response time is 4 sec I know it is pretty much as good as it gets with standard Ethernet network and finding a ways to reduce number of queries is going to be more helpful.

The Glue Our applications involve multiple layers and typically higher layer can only report response time it took to call lower layer, but not the reason for that response time. For example we can report time it took to execute SQL query from PHP application side, but we can’t say why it has taken so long. Was it row level lock ? waiting on disk IO or was it simply question of burning a lot of CPU. On the other hand this information may be available in the instrumentation stats from that lower layer – for example in MySQL Query Log. What is important is however to be able to connect the data from these logs – glue them together. The easy way to do it is to provide an unique identifier to all requests and put it in the logs with request of the lower levels. With mySQL the simple way to do it is to put it in the comments for queries you execute.

Optional Tracing The information in lower layers logs is very helpful however it typically have two problems. First not every layer has good logs. For example if you’re running memcached you probably do not have the logs detailing all requests and their response time. Second – the lower layer may only know response time from its vantage point, which in many cases does not include network communication time which can be very important.

Tracing should be optional and normally applied to the small sample of requests, though it needs to be detailed. Typically you would include the calls to the lower level services together with timestamp, the response summary with timestamp again. The information about request has to be complete enough to identify target action and response completely. For example if I’m speaking about memcache I’d like to know which server:port request was issued to which key was requested, and on response I’d like to know if it was hit, miss or error.

The way I use it may be as follows. I see the increased response time for given kind of request. I see response time is coming from MySQL. I check the number of MySQL Queries and it is 5x when it usually is for this kind of request. Looking at memcache stats I can see high number of misses. Looking at some available traces shows the server memcache01 has very high miss ratio. Checking what is going on with memcache01 shows it just was restarted (and hence has almost empty cache). This is important example as it shows your increased response time from MySQL may not have anything to do with MySQL itself but you would not know unless you’re capturing the right data.

If you’re looking for nice example framework for instrumentation, check out instrumentation for PHP – It has everything mentioned by tracing which is trivial to add.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Introduction to memcached

Май 28th, 2010

These are the slides to a talk I did earlier this week for students of the professional bachelor in ICT course at KaHo St. Lieven. I wanted to give a clear and simple introduction to the memcached service, as I think it’s an invaluable tool in today’s web development.


PlanetMySQL Voting: Vote UP / Vote DOWN

Scaling Web Services with MySQL Cluster: An Alternative Approach to MySQL & memcached

Май 24th, 2010

A new white paper is available from http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster_ScalingWebServices.php

MySQL and memcached has become, and will remain, the foundation for many dynamic web services with proven deployments in some of the largest and most prolific names on the web.

There are classes of web services however that are highly transactional and update-intensive, demanding real-time responsiveness and continuous availability. In these cases, MySQL Cluster provides the familiarity and ease-of-use of the regular MySQL Server, while delivering significantly higher levels of write performance with less complexity, lower latency and 99.999% availability.

This whitepaper will discuss the use-cases for both approaches, and provides an insight into how MySQL Cluster is enabling users to scale update-intensive web services.

Scaling Web Services with MySQL Cluster: An Alternative Approach to MySQL & memcached

PlanetMySQL Voting: Vote UP / Vote DOWN

Beyond great cache hit ratio

Май 20th, 2010

I worked with application recently which has great memcached hit ratio – over 99% but yet still has average page response time over 500ms. Reason ? There are hundreds memcached gets and even though they have some 0.4ms response time they add up to add hundreds of ms to the total response time.

The great memcached hit ratio is great however even more than that you should target eliminating requests all together. Hit rate is very bad indicator to begin with. Imagine you have application which gets 90 memcache gets (hits) to retrieve some data plus there are 10 more requests which resulted in misses and caused MySQL queries. The hit ratio is 90%. Imagine now you have found a way to store the data which required 90 requests as the single object. You have 1 request (hit) now and 10 misses which drops your hit rate down to less than 10% but performance will likely be a lot better.

The answer in many cases is to use larger objects for the cache.
If you look at a web page you typically will see some “blocks” which will be static and possibly highly cachable while others may be very dynamic or badly cachable. For example the area which displays user name and current time may not be cachable however widget displaying last 10 blog entries cachable at least for several minutes.

There are many development implications, component dependencies etc on how you can really implement caching but if you’re looking from birds eye view you want to have as fewer blocks in the page as possible (to reduce number of get requests and CPU efford required to build the page) at the same time you have to split the page in blocks which can allow good caching:

Make non cacheable blocks as small as possible If you have something like current time on the page which is not cachable do not mix it with other content which would have to be also re-created each time on the cache miss.

Maximize amount of uses of the cache item It is the most efficient if the item in the cache can be used by multiple pages and multiple users this will maximize cache efficiency while reduce amount of cache space required. The importance of this advice depends on the use case for your applications – some applications which have only couple of page views per user per day would really need to have items shared by multiple users to get best hit rate. Others which have hundreds of page views per user per session may be good enough with caches which are effectively per user.

Control invalidation Item in cache expire or get invalidated. It is best if such invalidation only kicks of the data in cache which needs to be invalidated but this may result in too small blocks and hence too many cache requests required. You may look for balance by looking to cache data together which invalidates in about same time.

Multi-Get Multi Get is truly nuclear option for caching with memcached both because it can be so efficient and because it can be so complicated to add it to existing code which was not design with it in mind. Multi-Get effectively allows you to manage items (expire,invalidate) separately while fetching them in the single round trip. You can have have hundreds of different items fetched from cache for the page without paying for network latency.

Does Multi Get removes the need to think about using larger items for cache ? I do not think so. You still have to pay extra cost of processing both on memcached server and application side. However the optimal cache topology would probably be different with multi get than without it.

The good theoretical example would be having complicated front page which would require a lot of cache requests and hundreds of milliseconds of CPU processing and which can be cached almost int its entirety with exception of user name in the corner. Storing complete page HTML in the cache and doing template replace for user name can be very efficient.

This may sound complicated and indeed there is a lot of art and science in designing optimal cache solution. The good thing is most web sites do not have to get it absolutely perfect to get good enough performance, it is most important to avoid biggest design mistakes.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Conference Review

Апрель 17th, 2010
I am back home from a good week at the 2010 O'Reilly MySQL Conference & Expo. I had a great time and got to see some old friends I had not seen in a while.

Oracle gave the opening keynote and it went pretty much like I thought it would. Oracle said they will keep MySQL alive. They talked about the new 5.5 release. It was pretty much the same keynote Sun gave last year. Time will tell what Oracle does with MySQL.

The expo hall was sparse. Really sparse. There were a fraction of the booths compared to the past. I don't know why the vendors did not come. Maybe because they don't want to compete with Oracle/Sun? In the past you would see HP or Intel have a booth at the conference. But, with Oracle/Sun owning MySQL, why even try. Or maybe they are not allowed? I don't know. It was just sad.

I did stop by the Maatkit booth and was embarrassed to tell Baron (its creator) I was not already using it. I had heard people talk about it in the past, but never stopped to see what it does. It would have only saved me hours and hours of work over the last few years. Needless to say it is now being installed on our servers. If you use MySQL, just go install Maatkit now and start using it. Don't be like me. Don't wait for years, writing the same code over and over to do simple maintenance tasks.

Gearman had a good deal of coverage at the conference. There were three talks and a BoF. All were well attended. Some people seemed to have an AHA! moment where they saw how Gearman could help their architecture. I also got to sit down with the PECL/gearman maintainers and discuss the recent bug I found that is keeping me from using it.

I spoke about Memcached as did others. Again, there was a BoF. It was well attended and people had good questions about it. There seemed to be some FUD going around that memcached is somehow inefficient or not keeping up with technology. However, I have yet to see numbers or anything that proves any of this. They are just wild claims by people that have something to sell. Everyone wants to be the caching company since there is no "Memcached, Inc.". There is no company in charge. That is a good thing, IMO.

That brings me to my favorite topic for the conference, Drizzle. I wrote about Drizzle here on this blog when it was first announced. At the time MySQL looked like it was moving forward at a good pace. So, I had said that it would only replace MySQL in one part of our stack. However, after what, in my opinion, has been a lack of real change in MySQL, I think I may have changed my mind. Brian Aker echoed this sentiment in his keynote address about Drizzle. He talked about how MySQL AB and later Sun had stopped focusing on the things that made MySQL popular and started trying to be a cheap version of Oracle. That is my interpretation of what he said, not his words.

Why is Drizzle different? Like Memcached and Gearman, there is no "Drizzle, Inc.". It is an Open Source project that is supported by the community. It is being supported by companies like Rackspace who hired five developers to work on it. The code is kept on Launchpad and is completely open. Anyone can create a branch and work on the code. If your patches are good, they will be merged into the main branch. But, you can keep your own branch going if you want to. Unlike the other forks, Drizzle has started over in both the code and the community. I personally see it as the only way forward. It is not ready today, but my money is on Drizzle five or ten years from now.
PlanetMySQL Voting: Vote UP / Vote DOWN