Archive for the ‘cache’ Category

Cache pre-loading on mysqld startup

Декабрь 1st, 2010

The following quirky dynamic SQL will scan each index of each table so that they’re loaded into the key_buffer (MyISAM) or innodb_buffer_pool (InnoDB). If you also use the PBXT engine which does have a row cache but no clustered primary key, you could also incorporate some full table scans.

To make mysqld execute this on startup, create /var/lib/mysql/initfile.sql and make it be owned by mysql:mysql

SET SESSION group_concat_max_len=100*1024*1024;
SELECT GROUP_CONCAT(CONCAT('SELECT COUNT(',column_name,') FROM ',table_schema,'.',table_name,' FORCE INDEX (',index_name,')') SEPARATOR ' UNION ALL ') INTO @sql FROM information_schema.statistics WHERE table_schema NOT IN ('information_schema','mysql') AND seq_in_index = 1;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET SESSION group_concat_max_len=@@group_concat_max_len;

and in my.cnf add a line in the [mysqld] block

init-file = /var/lib/mysql/initfile.sql

That’s all. mysql reads that file on startup and executes each line. Since we can do the whole select in a single (admittedly quirky) query and then use dynamic SQL to execute the result, we don’t need to create a stored procedure.

Of course this kind of simplistic “get everything” only really makes sense if the entire dataset+indexes fit in memory, otherwise you’ll want to be more selective. Still, you could use the above as a basis, perhaps using another table to provide a list of tables/indexes to be excluded – or if the schema is really stable, simply have a list of tables/indexes to be included instead of dynamically using information_schema.

Practical (albeit niche) application:

In a system with multiple slaves, adding in a new slave makes it start with cold caches, but since with loadbalancing it will pick up only some of the load it often works out ok. However, some environments have dual masters but the application is not able to do read/write splits to utilise slaves. In that case all the reads also go to the active master. Consequentially, the passive master will have relatively cold caches (only rows/indexes that have been updated will be in memory) so in case of a failover the amount of disk reads for the many concurrent SELECT queries will go through the roof – temporarily slowing the effective performance to a dismal crawl: each query takes longer with the required additional disk access so depending on the setup the server may even run out of connections which in turn upsets the application servers. It’d sort itself out but a) it looks very bad on the frontend and b) it may take a number of minutes.

The above construct prevents that scenario, and as mentioned it can be used as a basis to deal with other situations. Not many people know about the init-file option, so this is a nice example.

If you want to know how the SQL works, read on. The original line is very long so I’ll reprint it below with some reformatting:

SELECT GROUP_CONCAT(CONCAT(
  'SELECT COUNT(',column_name,')
          FROM ',table_schema,'.',table_name,
          ' FORCE INDEX (',index_name,')'
       ) SEPARATOR ' UNION ALL ')
  INTO @sql
  FROM information_schema.statistics
  WHERE table_schema NOT IN ('information_schema','mysql')
  AND seq_in_index = 1;

The outer query grabs each regular db/table/index/firstcol name that exists in the server, writing out a SELECT query that counts all not-NULL values of the indexed column (so it must scan the index), forcing that specific index. We then abuse the versatile and flexible GROUP_CONCAT() function to glue all those SELECTs together, with “UNION ALL” inbetween. The result is a single very long string, so we need to tweak the maximum allowed group_concat output beforehand to prevent truncation.


PlanetMySQL Voting: Vote UP / Vote DOWN

Caching could be the last thing you want to do

Июль 24th, 2010

I recently had a run-in with very popular PHP ecommerce package which makes me want to voice a recurring mistake I see in how many web applications are architected.

What is that mistake?

The ecommerce package I was working with depended on caching.  Out of the box it couldn’t serve 10 pages/second unless I enabled some features which were designed to be “optional” (but clearly they weren’t).

I think with great tools like memcached it is easy to get carried away and use it as the mallet for every performance problem, but in many cases it should not be your first choice.  Here is why:

  • Caching might not work for all visitors - You look at a page, it loads fast.  But is this the same for every user?  Caching can sometimes be an optimization that makes the average user have a faster experience, but in reality you should be caring more that all users get a good experience (Peter explains why here, taking about six sigma).  In practice it can often be the same user that has all the cache misses, which can make this problem even worse.
  • Caching can reduce visibility – You look at the performance profile of what takes the most time for a page to load and start trying to apply optimization.  The problem is that the profile you are looking at may skew what you should really be optimizing.  The real need (thinking six sigma again) is to know what the miss path costs, but it is somewhat hidden.
  • Cache management is really hard – have you planned for cache stampeding, or many cache items being invalidated at the same time?

What alternative approach should be taken?

Caching should be seen more as a burden that many applications just can’t live without.  You don’t want that burden until you have exhausted all other easily reachable optimizations.

What other optimizations are possible?

Before implementing caching, here is a non-exhaustive checklist to run through:

  • Do you understand every execution plan of every query? If you don’t, set long_query_time=0 and use mk-query-digest to capture queries.  Run them through MySQL’s EXPLAIN command.
  • Do your queries SELECT *, only to use subset of columns?  Or do you extract many rows, only to use a subset? If so, you are extracting too much data, and (potentially) limiting further optimizations like covering indexes.
  • Do you have information about how many queries were required to generate each page? Or more specifically do you know that each one of those queries is required, and that none of those queries could potentially be eliminated or merged?

I believe this post can be summed up as “Optimization rarely decreases complexity. Avoid adding complexity by only optimizing what is necessary to meet your goals.”  – a quote from Justin’s slides on instrumentation-for-php.  In terms of future-proofing design, many applications are better off keeping it simple and refusing the temptation to try and solve some problems “like the big guys do”.


Entry posted by Morgan Tocker | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

PBXT 1.5.02 Beta adds 2nd Level Cache

Июль 16th, 2010
As many probably already know, PBXT is the first MySQL Storage Engine to use a log-based architecture. Log-based means that data that would normally first be written to the transaction log, and then to the database tables, is just written to the log, and the log becomes part of the database.

This result is that data is only written once, and is always written sequentially. The advantage when writing is obvious, but there is a down side (as always). The data is written to the disk in write order, which is seldom the order in which the data is retrieved. So this results in a lot of random reads to the disk when accessing the data later.

Placing the data logs on a Solid State Drive would solve this problem, because SSDs have no seek time. But the problem with this solution is that SSDs are still way to expense to base all your storage needs on such hardware.

The solution: an SSD-based 2nd Level Cache.

Using an SSD-based 2nd Level Cache you can store the most commonly accessed parts of your database on SSD for a reasonable price. For example, if you have a Terabyte database, you can cache about 15% (160 GB) of it on SSD for around $400. This can significantly affect the performance of your system.

With this thought in mind, I have just released PBXT 1.5.02 Beta, which implements a 2nd level cache for the data logs. How this works is illustrated below.

Data written to the data log is also written to the, main memory based, Data Log Cache. Once the Data Log Cache is full, pages need to be freed up when new data arrives. Pages that are freed from the Data Log Cache are written to the 2nd Level Cache.

Now, when the Data Log records are read, PBXT will read the corresponding page from the Data Log Cache. If the page is not already in the cache, it will first check to see if the page is in the 2nd Level Cache, before reading from the Data Log itself.

PBXT 1.5 is available for download from primebase.org, or you can check out lp:pbxt/1.5 from Launchpad using bazaar. The documentation has also been updated for 1.5.

Using the 2nd level cache is easy. It is controlled by 3 system variables:
  • pbxt_dlog_lev2_cache_file - the name and path of the file in which the data is stored.
  • pbxt_dlog_lev2_cache_size - the size of the 2nd level cache.
  • pbxt_dlog_lev2_cache_enabled - set to 1 to enable the 2nd level cache.
It also makes sense to set a higher value for the Data Log Cache, using the pbxt_data_log_cache_size variable, which has a default value of 16MB.

Of course it will be interesting to do some benchmarks on this implementation. But that will have to wait until after my holiday! I will be away until late August, but if you decide to test the new version, be sure to let me know.

PlanetMySQL Voting: Vote UP / Vote DOWN

change accelerator cache ratio

Июнь 7th, 2010

I was given the task of checking the array accelerator cache ratio and see if it was set to optimal levels. Our ideal preference was a read/write ratio of 0/100.

The machine configuration is HP DL180 G5, 2 x Xeon L5420 2.50GHz, 15.7GB / 16GB 667MHz DDR2, 6 x 300GB-15K SAS.This machine was running mysql 5.1.36 using the innodb plugin.

The command line utility to check the controller configuration is “hpacucli”. Navigating using hpacucli is very straight forward.

“ctrl all show config detail” Will give you the entire controller configuration.

=> ctrl all show config detail

Smart Array P400 in Slot 5
Bus Interface: PCI
Slot: 5
Serial Number: P61630K9SW31NL
Cache Serial Number: PA82C0J9SW02H1
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Chassis Slot:
Hardware Revision: Rev D
Firmware Version: 4.12
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 secs
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Accelerator Ratio: 0% Read / 100% Write
Drive Write Cache: Disabled
Total Cache Size: 256 MB
Battery Pack Count: 1
Battery Status: OK
SATA NCQ Supported: True

Array: A
Interface Type: SAS
Unused Space: 0 MB
Status: OK

Logical Drive: 1
Size: 1.4 TB
Fault Tolerance: RAID 5
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Stripe Size: 64 KB
Status: OK
Array Accelerator: Enabled
Parity Initialization Status: Initialization Completed
Unique Identifier: 600508B100104B39535733314E4C0003
Disk Name: /dev/cciss/c0d0
Mount Points: / 3.9 GB, none 12.0 GB, /var 3.9 GB, /tmp 3.9 GB, /home 1.3 TB
Logical Drive Label: A0432BCEP61630K9SW31NLD55F

physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1BEP400009004UQCD
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1ZZRN000090035Q2Q
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP20VCQ00009004XE2V
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1ZZSB00009003MMWZ
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:5
Port: 1I
Box: 1
Bay: 5
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP1L4T000009004UQCV
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown
physicaldrive 1I:1:6
Port: 1I
Box: 1
Bay: 6
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 15000
Firmware Revision: 0005
Serial Number: 3QP196KG00009004S0ZH
Model: SEAGATE ST3300656SS
PHY Count: 2
PHY Transfer Rate: 3.0GBPS, Unknown

In the above output our point of interest was “Accelerator Ratio: 0% Read / 100% Write“. In this case it has been set to an optimal value. In case it wasn’t set to an optimal value it can be changed using the command “ctrl slot=5 modify cacheratio=0/100

When you are stuck for a particular command, you can just run “help <command name>” for more input.

=> help aa

The following documentation pertains to your search:

<target> modify [arrayaccelerator=enable|disable]
Enables or disables the array accelerator for a given logical drive. The
target can be any valid logical drive target on a controller that supports
array accelerator management.

<target> modify [cacheratio=#/#|?]
Sets the array accelerator cache ratio for the controller. The first # is
the read cache %. The second # is the write cache %. The target can be any
valid controller.

<target> create [type=ld]
[drives=[#:]#:#,[#:]#:#,[#:]#:#-[#:]#:#],…|all|allunassigned]
[raid=6|5|1+0|1|0|?]
[size=#|?]
[stripesize=8|16|32|64|128|256|default|?]
[sectors=32|63|?]
[arrayaccelerator=enable|disable|?]
[drivetype=sas|satalogical|sata|saslogical|parallelscsi|?]

[type=] The type parameter specifies the device type that is being created.
A logical drive is the only device type supported at this time.

[drives=] The drives parameter specifies the physical drives to be used for
creating a logical drive on a new or existing array. If the drives specified
are all unassigned drives, then a new array will be created with a new
logical drive on it. If the drives specified are all assigned to an existing
array, then a new logical drive will be created on that array. The symbol
#:# stands for port:id or box:bay, depending on the controller. Some
controllers may also support port:box:bay and use the #:#:# syntax. The all
and allunassigned keywords both target all physical drives that are not
currently assigned to an array.

[raid=] The raid parameter sets the raid level of the logical drive. If not
specified, the default raid is the highest level possible. The availability
of certain raid settings depends on the number of drives designated in the
“drives=” parameter. For example, RAID 1 will only be available if two
drives are selected while RAID 1+0 will be shown for a selection of 4 or
more drives.

[size=] The size parameter specifies the size of the logical drive, the
implied units are MB. If not specified, the default is the maximum possible
size.

[stripesize=] The stripesize parameter sets the logical drive’s stripesize.
The implied units of stripe size are KB.

[sectors=] The sectors parameter specifies the sectors per track of the
logical drive. If not specified, the default is 32.

[arrayaccelerator=] The arrayaccelerator parameter specifies the array
accelerator state for the logical drive. If not specified, the default is
enable.

[drivetype=] The drivetype parameter specifies the drive
interface type. If there are multiple drive types when selecting all
physical drives, the desired drive type needs to be specified. Mixed drives
are not allowed on the same array or logical drive. If all drive types in a
controller are the same this parameter is not needed. The target can be a
controller or an array in the system.

Examples:
controller slot=3 logicaldrive 2 modify arrayaccelerator=enable
controller slot=1 modify cacheratio=25/75
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=6
ctrl slot=1 create type=ld drives=1:1-1:6,1:9,1:10-1:12 raid=6
ctrl slot=1 create type=ld drives=all drivetype=parallelscsi
controller slot=5 array A create type=ld raid=5 size=1000
controller slot=1 array C create type=ld raid=1 stripesize=32
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=?
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 stripesize=?
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=1+0 stripesize=?
ctrl slot=1 create type=ld drives=1:1,1:2 raid=1 size=?
ctrl slot=1 create type=ld drives=1:1,1:2,1:3,1:5 raid=1+0 sectors=?


Tagged: accelerator, cache, controller, HP DL180, hpacucli, mysql
PlanetMySQL Voting: Vote UP / Vote DOWN

Is the query cache useful?

Сентябрь 29th, 2009

Mark Callaghan posted a good test of the MySQL query cache in different versions. His tests clearly show that in 5.0.44 and 5.0.84 and 5.1.38, there is more query throughput when the query cache is disabled.

However, the tests are skewed — not on purpose, I am sure, and Mark admits he has not used the query cache before — but they are skewed all the same. Mark’s error was that he assumed he could just turn on the query cache and see if it works. Most features of MySQL do not work that way — you have to understand the strengths and weaknesses of the feature in order to use it properly.

Mark’s benchmark definitely reinforces that turning on the query cache without any knowledge of your system is a bad idea, and I agree with him on that. But it does not in any way mean that the query cache is always a bad idea. In fact, the MySQL manual page at http://dev.mysql.com/doc/refman/5.1/en/query-cache.html has some caveats about when the query cache is useful and when it is not useful.

It is important to know how the MySQL query cache works, so I will first explain that, and then explain why Mark’s test is not a very good broad generalization. MySQL’s query cache is not useful all the time, but it can be quite useful! If you are already familiar with the query cache, you can skip ahead to why Mark’s benchmark is skewed.


How to use the MySQL query cache
Firstly, there are a few ways to use the MySQL query cache. You can disable it, which means no queries are cached; you can set it to cache all queries, and use a special keyword to not cache queries (replace SELECT with SELECT SQL_NO_CACHE). query_cache_type=1 or ON will set the query cache in this way.

You can also set it to cache no queries except those that have a special keyword — you guessed it, replace SELECT with SELECT SQL_CACHE and a query will attempt to be cached. To set the query cache in this way, set query_cache_type=2 or DEMAND.

What queries are cached
Regardless of which method you choose, not all possible queries are cached. Only deterministic SELECT statements are cached — a statement such as SELECT count(*) from tbl is deterministic, while SELECT NOW() is non-deterministic for the purpose of the query cache.* Queries that use user-defined functions, temporary tables, LOCK IN SHARE MODE, FOR UPDATE, INFORMATION_SCHEMA or mysql system tables are not cached. Most SELECT subqueries are not cached. SELECT queries that are part of stored functions, triggers, views and events are not cached. However, in MySQL 5.1, SELECT queries that contain a view can be cached. The MySQL manual page at http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html has more detailed information on what is allowed and what is not allowed.

In the query cache, a hash of the query is stored along with the result. There is a variable, query_cache_limit, which sets an upper limit on the size of the result set that can be cached. So even if a query can be cached, it may not be cached if the result set is larger than query_cache_limit.

How the query cache is used
The lifecycle of a non-prepared SELECT statement when the query cache is on (query_cache_type > 0) is:

1) hash the SELECT query
2) check the hash against the query cache
3) if hash matches, returned the result set from the query cache. If not, the query is sent to the parser and executed in the usual manner, and results are stored in the query cache if the query is cachable.

This means that having the query cache on has some overhead, even if no queries are ever cached.

A query (and its result) is removed from the query cache when it is invalidated or pruned. A query is invalidated when any table it refers to has been changed, even if the changes do not affect the actual result set in the query cache. A query is pruned when a new query needs to be inserted into the query cache and there is no room; the query cache prunes queries using a least recently used (LRU) algorithm.

In other words, there will be a lot of unnecessary overhead with the query cache if a table is often written to, because queries using that table will be constantly removed from the query cache. This does not mean that if you have some tables that are frequently written to, you should turn the query cache off — you can use the query_cache_type of ON with the SQL_NO_CACHE keyword to cache all except some queries, or you can use the query_cache_type of DEMAND and the SQL_CACHE keyword in SELECT queries to cache no queries except those you specify.

Why Mark’s Benchmark is skewed

It is important to note that InnoDB was set up with:
innodb_buffer_pool_size=2000M

and the sysbench program was run with:

../sysbench --test=oltp --oltp-table-size=2000000 --max-time=180 --max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes --oltp-read-only --num-threads=$nt run

I will explain what the important sysbench variables are in context.

1) The benchmark used sysbench in OLTP mode against an InnoDB table. The InnoDB buffer pool is hot after the table is created and populated, while the query cache is empty and cold.

InnoDB has its own cache — the InnoDB buffer pool. sysbench is run with –oltp-read-only, meaning there are no INSERT, DELETE or UPDATE statements. This is good; it means that the results are not skewed by query cache invalidations, though there may be pruning.

The first thing sysbench does is preparation — it creates the table and populates it with data (2,000,000 rows as per oltp-table-size). After this is done, the InnoDB buffer pool (2000M in size) has gone from empty to containing data. I do not know how large the row size of the sysbench table is; however, I can do some calculations:

2000 Mb * 1024 = 2,048,000 Kb / 2,000,000 rows = 1.024 Kb per row

This means that in order for some data to *not* be current in the InnoDB buffer pool, the row size had to be approximately larger than 1 Kb per row.

The manual at http://sysbench.sourceforge.net/docs/#database_mode shows the schema of the table that is used.

CREATE TABLE `sbtest` ( `id` int(10) unsigned NOT NULL auto_increment, `k` int(10) unsigned NOT NULL default ‘0′, `c` char(120) NOT NULL default ”, `pad` char(60) NOT NULL default ”, PRIMARY KEY (`id`), KEY `k` (`k`)

(note that you at least need to add a final parenthesis “)” for the query to work, and should also specify ENGINE=INNODB if you want to test on your system).

So, that’s two 4-byte integers, and two character fields…for the sake of argument, let’s say you are using UTF-8 in MySQL 5.1, where utf8 is 3 bytes per character, that’s:

4 + 4 + 120*3 + 60*3 = 548 bytes

Let’s say you are using a version of MySQL that uses 4 bytes per character for utf8 (I know 6.0 was using that, not sure about 5.4):

4 + 4 + 120*4 + 60*4 = 728 bytes

So either way, all of the data in the table is already loaded in the InnoDB buffer pool when the test begins. The InnoDB buffer pool is already hot, whereas the query cache is empty.

On systems where a query can be cached and used instead of executing and having to look on disk for data, using the MySQL query cache will give you *much* better performance.

2) The query cache statistics are not given. It’s very possible that the only thing that was measured was how much overhead the query cache provides if it is never used.

The queries that are run are the default (oltp-test-mode is not set, and it defaults to “complex”) which can be seen under the “Advanced transactional” heading in the sysbench manual at http://sysbench.sourceforge.net/docs/#database_mode. Only read queries are done, so the queries are:

SELECT c FROM sbtest WHERE id=N
SELECT c FROM sbtest WHERE id BETWEEN N AND M
SELECT SUM(K) FROM sbtest WHERE id BETWEEN N and M
SELECT c FROM sbtest WHERE id between N and M ORDER BY c
SELECT DISTINCT c FROM sbtest WHERE id BETWEEN N and M ORDER BY c

The values for the variables in the query are chosen at random, though they are sensible values. The testing time is very short, only 3 minutes of running time (sysbench is called with max-time=180). It is quite possible that all the SELECT queries were put into the query cache and there was never a query cache hit. Of course there will be a lot of overhead if the query cache is never used! Particularly if it has to prune for more memory.

I would like to see Qcache_hits, Qcache_inserts, Com_select (number of SELECT statements that did not use the query cache) and Qcache_lowmem_prunes. That is what I look at for any system that has the query cache running, so I can determine the effectiveness of the query cache.

There is more in-depth information on how to find the usefulness of the query cache, query cache fragmentation, and a few ways to cache more queries in the MySQL Administrator’s Bible on pages 427-434 — part of the “Caching with MySQL” chapter. I used the book to help me write this blog post, so I do think it is fair to cite my sources….even though it happens to be a source I created (and that’s why I consider it a shameless plug, I have no shame citing my sources)!

—————-

So that all being said, the query cache is extremely useful when it is used for repeated queries. The benchmark Mark performed most likely did not actually use this. I have seen production environments fall apart when someone turned on on the query cache for all queries; Mark’s benchmark proves that it is not a good idea to randomly turn on the query cache. You have to know what you’re doing.

* for the purpose of replication, it is deterministic, but that’s another topic.


PlanetMySQL Voting: Vote UP / Vote DOWN

Advanced Squid Caching in Scribd: Hardware + Software Used

Август 4th, 2009

After the previous post in this caching related series I’ve received many questions on hardware and software configuration of our servers so in this post I’ll describe our server’s configs and the motivation behind those configs.

Hardware Configuration

Since in our setup Squid server uses one-process model (with an asynchronous requests processing) there was no point in ordering multi-core CPUs for our boxes and since we have a lots of pages on the site and the cache is pretty huge all the servers ended up being highly I/O bound. Considering these facts we’ve decided to use the following hardware specs for the servers:

CPU: One pretty cheap dual-core Intel Xeon 5148 (no need in multiple cores or really high frequencies – even these CPUs have ~1% avg load)
RAM: 8Gb (basically to reduce I/O pressure by caching hot content in RAM)
Disks: 4 x small SAS 15k drives in JBOD mode (no RAIDS – we’ve tried all kinds of RAID configs and it did not help with the I/O performance)

So, once again: nothing is as important in a squid box as I/O throughput.

Here is a sample CPU load graph from one of the boxes:

squid-cpu-graph

Software Configuration

This could be a long story, but in a few words our experience with different squid versions was the following.

First, when I’ve started working on this caching project I’ve just installed squid using Debian’s apt-get install squid command. As the result we’ve got some ancient squid 2.6 release that for some reason (still unclear to me) was painfully slow in I/O operations and it had some leaking file descriptors problem so after a few hours under production load the box would simply stop processing requests.

When the first approach failed, I’ve decided to go to the squid web site, download the latest production release and install it from sources (yes, we do it all the time when OS vendor ships too old or buggy releases). Result – freaking fast and stable squid 3.0 which worked flawlessly for about 5 months.

Few months ago we’ve found out about the stale-* extensions available in squid 2.7 and I’ve started wondering if we should change our perfectly stable 3.0 setup to 2.7. And some time later I’ve decided to use Vary HTTP header in our caching architecture and then I found out that vary-caching correctly implemented only in 2.7 and since 3.0 is a complete rewrite of the 2.X branch, vary-caching is not yet implemented there (or not in a way we’d want it to be implemented).

So, the final result: at this moment in time we’re using custom-built Squid 2.7STABLE6 and really happy with it, it is stable, fast and feature-rich caching proxy server.

Caching Cluster Configuration

Apparently we have more than one squid server in scribd and this makes it a bit harder to use those servers (comparing to one box when you’d send all requests to one IP:port pair). We’ve tried to use round-robin balancing for the squid boxes + ICP-based neighbor checks but it was adding more latency to our responses and we’ve decided to put haproxy load balancer between nginx and squid farm and set up URL hash based balancing to distribute requests evenly amongst squid backends.

This scheme worked pretty nice, but we had one serious problem with this setup: if one squid box would go down, haproxy would quickly detect the problem and would remove it from the pool… And here comes the problem – removing a server from the pool completely changes hashing keys space and all cached requests become invalid. To solve this problem we’ve developed a nginx balancer module that performs consistent hashing of URLs and we’re testing this module now in production. What is really good about this module is that it removes one hop from the chain if http proxies between the site and a user.

So, this was a short description of what hardware we use for our caching cluster and why do we use it. In the next posts of this series we’ll talk about cache control and objects invalidation.