<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PlanetMysql.ru - information about the MySQL DBMS &#187; Admin-tips</title>
	<atom:link href="http://planetmysql.ru/category/admin-tips/feed/" rel="self" type="application/rss+xml" />
	<link>http://planetmysql.ru</link>
	<description>A blog about MySQL, the most popular DBMS</description>
	<lastBuildDate>Wed, 08 Feb 2012 21:24:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Momentum MTA Performance Tuning Tips</title>
		<link>http://feedproxy.google.com/~r/Homo-Adminus/~3/aMjvuL6CJTM/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=momentum-mta-performance-tuning-tips</link>
		<comments>http://feedproxy.google.com/~r/Homo-Adminus/~3/aMjvuL6CJTM/#comments</comments>
		<pubDate>Sat, 07 Jan 2012 18:40:25 +0000</pubDate>
		<dc:creator>Alexey Kovyrin</dc:creator>
				<category><![CDATA[Admin-tips]]></category>
		<category><![CDATA[ecelerity]]></category>
		<category><![CDATA[mail]]></category>
		<category><![CDATA[Momentum]]></category>
		<category><![CDATA[MTA]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[tuning]]></category>

		<guid isPermaLink="false">http://kovyrin.net/?p=568</guid>
		<description><![CDATA[About two months ago I joined the LivingSocial technical operations team, and one of my first tasks there was to find a way to make our MTAs perform better and deliver faster. We use a really great product called Momentum MTA (formerly Ecelerity). It is already really fast, but it is always good to squeeze out as much performance as possible, so I started looking for ways to make our system faster.
While working on it, I created a set of scripts that integrate Momentum with Graphite for all kinds of crazy stats graphing. Those scripts will be open-sourced soon. For now, I&#8217;d like to share a few performance-related changes that at least doubled our performance:


Use EXT2 Filesystem for the spool storage – After a lot of benchmarking we noticed that the amount of I/O we were doing was far too high compared to our throughput. Some investigation showed that the EXT3 filesystem on our spool partition had far too much metadata-update overhead, because the spool stores a lot of really small files. Switching to EXT2 gained us at least 50-75% additional performance. Enabling the noatime mount option on our spool gave us a further gain.
There are some sources that claim using XFS for spool directories is a better option, but we&#8217;ve decided to stick with EXT2 for now.
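The noatime gain only applies if the option is actually in effect on the spool. It is typically set in /etc/fstab, and it is easy to lose after a remount. Below is a small C sketch that asks statvfs() whether the filesystem holding a given directory is mounted with noatime. It is Linux-specific, because ST_NOATIME is a GNU/Linux extension. It is not part of Momentum, and the function name mounted_noatime is hypothetical.

```c
/* Sketch: is the filesystem holding `path` mounted with noatime?
 * Assumes Linux/glibc, where ST_NOATIME is exposed under _GNU_SOURCE. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/statvfs.h>

/* Returns 1 if mounted noatime, 0 if access times are being updated,
 * and -1 if statvfs() fails (e.g. the path does not exist) or the
 * platform does not report the noatime flag. */
int mounted_noatime(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
#ifdef ST_NOATIME
    return (sv.f_flag & ST_NOATIME) ? 1 : 0;
#else
    return -1;
#endif
}
```

For example, mounted_noatime("/var/spool/ecelerity") should return 1 after remounting with noatime. That path is only an assumed location; pass your real spool directory.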
Do not use the %h{X} macro in your custom logs – Custom logs are an awesome feature of Momentum, and we use them to log our bounces along with some information from mail headers. Unfortunately, the most straightforward approach (using the %h{X} macro) is not the best option for I/O-loaded servers. Every time Momentum logs a bounce, it has to swap the message body in from disk and parse it to get the header value.
To solve this issue, we created a Sieve+ policy script that extracts the headers we need during the initial spooling phase, while the message is still in memory, and puts those values into the message metadata. This way, when we need to log those values, we don&#8217;t have to swap the message body in from disk. Here is the Sieve script that extracts the header value:
require &#091; &#34;ec_header_get&#34;, &#34;vctx_mess_set&#34;, &#34;ec_log&#34; &#093;;

# Extract x-ls-send-id header to LsSendId context variable 
# (later used in deliver log)
&#040;$send_id&#041; = ec_header_get &#34;x-ls-send-id&#34;;
vctx_mess_set &#34;LsSendId&#34; $send_id;
After that, we can use it in a custom logger like this:
custom_logger &#34;custom_logger1&#34;
&#123;
&#160; delivery_logfile = &#34;cluster:///var/log/ecelerity/ls-delivery_log.cluster=&#62;master&#34;
&#160; delivery_format = &#34;%t@%BI@%i@%CI@D@%r@%R@%m@%M@%H@%p@%g@%b@%vctx_mess{LsSendId}&#34;
&#160; delivery_log_mode = 0664
&#125;
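Outside of Momentum, the idea generalizes to a simple pattern. Parse the header once while the message is in memory, cache the value in per-message metadata, and let the logger read only that cache. Here is a minimal C sketch of that pattern. It is not how Momentum implements vctx_mess_set: struct msg_meta, header_get(), on_spool() and format_delivery_log() are hypothetical names. Header names are compared case-insensitively, and folded (multi-line) header values are deliberately not handled.

```c
/* Sketch of "extract once at spool time, log from metadata".
 * All names here are hypothetical; this is not Momentum's API. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct msg_meta {
    char ls_send_id[64];   /* cached X-LS-Send-Id value */
};

/* 1 if `line` starts with header `name` (case-insensitive) followed by ':' */
static int header_name_matches(const char *line, const char *name)
{
    size_t i;

    for (i = 0; name[i] != '\0'; i++)
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return 0;
    return line[i] == ':';
}

/* Copies the value of header `name` into `out` (outlen must be > 0).
 * Scanning stops at the blank line ending the headers, so the body is
 * never examined. Returns 1 if found, 0 otherwise. */
int header_get(const char *headers, const char *name, char *out, size_t outlen)
{
    const char *line = headers;

    while (*line != '\0' && *line != '\r' && *line != '\n') {
        const char *eol = strchr(line, '\n');
        if (eol == NULL)
            eol = line + strlen(line);

        if (header_name_matches(line, name)) {
            const char *v = line + strlen(name) + 1;
            const char *end = eol;
            size_t len;

            while (v < end && (*v == ' ' || *v == '\t'))
                v++;                                  /* skip leading blanks */
            while (end > v && isspace((unsigned char)end[-1]))
                end--;                                /* drop trailing \r etc. */
            len = (size_t)(end - v);
            if (len >= outlen)
                len = outlen - 1;                     /* truncate to fit */
            memcpy(out, v, len);
            out[len] = '\0';
            return 1;
        }
        if (*eol == '\0')
            break;
        line = eol + 1;
    }
    return 0;
}

/* Runs once while the message is still in memory (spooling phase). */
void on_spool(struct msg_meta *meta, const char *headers)
{
    if (!header_get(headers, "x-ls-send-id", meta->ls_send_id, sizeof meta->ls_send_id))
        meta->ls_send_id[0] = '\0';
}

/* Runs at delivery time and reads only the cached metadata. */
int format_delivery_log(char *buf, size_t len, const char *rcpt, const struct msg_meta *meta)
{
    return snprintf(buf, len, "%s@%s", rcpt, meta->ls_send_id);
}
```

The design point is that format_delivery_log() never receives the message text at all. The logging step therefore cannot trigger a disk read, whatever the I/O load is.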

Give more RAM to Momentum – When Momentum receives a message, it writes it to disk (as required by the SMTP standard) and then tries to deliver the copy it holds in memory. If delivery succeeds, the on-disk copy is unlinked. The problem with a really heavy outbound traffic load is that Momentum needs to keep tons of messages in memory, but by default it can hold only 250 of them. At 250-500 messages a second, that is just too small.
To change this limit, we increased the Max_Resident_Active_Queue parameter to 1000000, after making sure, of course, that we had enough RAM to hold that many messages if needed. We also set Max_Resident_Messages to 0, which means unlimited. This lets Momentum keep as many messages resident as possible and reduces the load caused by swap-in operations for redelivery attempts and similar work.
Choose a proper size for your I/O-related thread pools – The default Momentum config sets the SwapIn and SwapOut thread pool sizes to 20. Under really high load, even on our 4xSAS15k RAID10, this tends to be too high. We switched those pools to 8 threads each, which reduced I/O contention and improved overall I/O throughput.
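The figures above make the problem easy to quantify. At 250-500 messages a second, a 250-message cap holds only 0.5-1 second of traffic. A cap of 1,000,000 covers roughly 2,000-4,000 seconds. The same cap also bounds memory. With a purely hypothetical average of 20 KB per message and no per-message overhead, 1,000,000 resident messages would need about 20 GB. Here is a tiny C sketch of this arithmetic. The function names are hypothetical, and the message size and overhead are assumptions to replace with values measured on your own traffic.

```c
/* Back-of-the-envelope sizing for a resident-message cap.
 * avg_msg_bytes and per_msg_overhead are assumptions, not Momentum figures. */
#include <stdint.h>

/* Seconds of inbound traffic that fit under a resident-message cap. */
double seconds_buffered(uint64_t max_resident, double msgs_per_sec)
{
    return (double)max_resident / msgs_per_sec;
}

/* Approximate RAM needed to keep `n` messages resident. */
uint64_t resident_ram_bytes(uint64_t n, uint64_t avg_msg_bytes, uint64_t per_msg_overhead)
{
    return n * (avg_msg_bytes + per_msg_overhead);
}
```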

In summary, as with any optimization, it really helps to set up as much monitoring for your MTA servers as possible before tuning: Cacti graphs, Graphite, Ganglia or something else &#8211; it does not matter. Just make sure you can see every aspect of your system&#8217;s performance and understand what is going on before changing any performance-related settings.]]></description>
			<content:encoded><![CDATA[<p>About two months ago I joined the <a href="http://www.livingsocial.com/">LivingSocial</a> technical operations team, and one of my first tasks there was to find a way to make our MTAs perform better and deliver faster. We use a really great product called <a href="http://www.messagesystems.com/products-momentum-outbound.php">Momentum MTA</a> (formerly Ecelerity). It is already really fast, but it is always good to squeeze out as much performance as possible, so I started looking for ways to make our system faster.</p>
<p>While working on it, I created a set of scripts that integrate Momentum with Graphite for all kinds of crazy stats graphing. Those scripts will be open-sourced soon. For now, I&#8217;d like to share a few performance-related changes that at least doubled our performance:</p>
<ol>
<li><strong>Use EXT2 Filesystem for the spool storage</strong> – After a lot of benchmarking we noticed that the amount of I/O we were doing was far too high compared to our throughput. Some investigation showed that the EXT3 filesystem on our spool partition had far too much metadata-update overhead, because the spool stores a lot of really small files. Switching to EXT2 gained us at least 50-75% additional performance. Enabling the <tt>noatime</tt> mount option on our spool gave us a further gain.
<p><a href="http://archives.neohapsis.com/archives/postfix/2006-01/1916.html">There</a> <a href="http://www.dovecot.org/list/dovecot/2011-August/060574.html">are</a> <a href="http://www.thesmbexchange.com/eng/qmail_fs_benchmark.html">some</a> <a href="http://www.htiweb.inf.br/benchmark/fsbench.htm">sources</a> that claim using XFS for spool directories is a better option, but we&#8217;ve decided to stick with EXT2 for now.</li>
<li><strong>Do not use the <tt>%h{X}</tt> macro in your custom logs</strong> – Custom logs are an awesome feature of Momentum, and we use them to log our bounces along with some information from mail headers. Unfortunately, the most straightforward approach (using the <tt>%h{X}</tt> macro) is not the best option for I/O-loaded servers. Every time Momentum logs a bounce, it has to swap the message body in from disk and parse it to get the header value.
<p>To solve this issue, we created a Sieve+ policy script that extracts the headers we need during the initial spooling phase, while the message is still in memory, and puts those values into the message metadata. This way, when we need to log those values, we don&#8217;t have to swap the message body in from disk. Here is the Sieve script that extracts the header value:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div><span>require</span> <span>&#91;</span> <span>&quot;ec_header_get&quot;</span><span>,</span> <span>&quot;vctx_mess_set&quot;</span><span>,</span> <span>&quot;ec_log&quot;</span> <span>&#93;</span><span>;</span><br />
<br />
<span># Extract x-ls-send-id header to LsSendId context variable </span><br />
<span># (later used in deliver log)</span><br />
<span>&#40;</span><span>$send_id</span><span>&#41;</span> <span>=</span> ec_header_get <span>&quot;x-ls-send-id&quot;</span><span>;</span><br />
vctx_mess_set <span>&quot;LsSendId&quot;</span> <span>$send_id</span><span>;</span></div></td></tr></tbody></table></div>
<p>After that, we can use it in a custom logger like this:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>custom_logger <span>&quot;custom_logger1&quot;</span><br />
<span>&#123;</span><br />
&nbsp; delivery_logfile = <span>&quot;cluster:///var/log/ecelerity/ls-delivery_log.cluster=&gt;master&quot;</span><br />
&nbsp; delivery_format = <span>&quot;%t@%BI@%i@%CI@D@%r@%R@%m@%M@%H@%p@%g@%b@%vctx_mess{LsSendId}&quot;</span><br />
&nbsp; delivery_log_mode = 0664<br />
<span>&#125;</span></div></td></tr></tbody></table></div>
</li>
<li><strong>Give more RAM to Momentum</strong> – When Momentum receives a message, it writes it to disk (as required by the SMTP standard) and then tries to deliver the copy it holds in memory. If delivery succeeds, the on-disk copy is unlinked. The problem with a really heavy outbound traffic load is that Momentum needs to keep tons of messages in memory, but by default it can hold only 250 of them. At 250-500 messages a second, that is just too small.
<p>To change this limit, we increased the <tt><a href="https://support.messagesystems.com/docs/web-ref3/conf.ref.max_resident_active_queue.php">Max_Resident_Active_Queue</a></tt> parameter to 1000000, after making sure, of course, that we had enough RAM to hold that many messages if needed. We also set <tt><a href="https://support.messagesystems.com/docs/web-ref3/conf.ref.max_resident_messages.php">Max_Resident_Messages</a></tt> to 0, which means unlimited. This lets Momentum keep as many messages resident as possible and reduces the load caused by swap-in operations for redelivery attempts and similar work.</li>
<li><strong>Choose a proper size for your I/O-related thread pools</strong> – The default Momentum config sets the SwapIn and SwapOut <a href="https://support.messagesystems.com/docs/web-ref3/conf.ref.threadpool.php">thread pool</a> sizes to 20. Under really high load, even on our 4xSAS15k RAID10, this tends to be too high. We switched those pools to 8 threads each, which reduced I/O contention and improved overall I/O throughput.</li>
</ol>
<p>In summary, as with any optimization, it really helps to set up as much monitoring for your MTA servers as possible before tuning: Cacti graphs, Graphite, Ganglia or something else &#8211; it does not matter. Just make sure you can see every aspect of your system&#8217;s performance and understand what is going on before changing any performance-related settings.</p>

]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2012/01/07/momentum-mta-performance-tuning-tips/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quick (and dirty) Patch for Ruby Enterprise Edition 2011.03 to Prevent Hash Collision Attacks</title>
		<link>http://feedproxy.google.com/~r/Homo-Adminus/~3/V534dICoisA/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=quick-and-dirty-patch-for-ruby-enterprise-edition-2011-03-to-prevent-hash-collision-attacks</link>
		<comments>http://feedproxy.google.com/~r/Homo-Adminus/~3/V534dICoisA/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 18:59:39 +0000</pubDate>
		<dc:creator>Alexey Kovyrin</dc:creator>
				<category><![CDATA[Admin-tips]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[DoS]]></category>
		<category><![CDATA[patch]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[rubyee]]></category>

		<guid isPermaLink="false">http://kovyrin.net/?p=550</guid>
		<description><![CDATA[Since there is no patch for Ruby Enterprise Edition 2011.03 to prevent the hash collision DoS attack, I&#8217;ve quickly ported the Ruby 1.8.7 patchlevel 357 patch. Here it is:
From e19bd3eaa8bd71cfc9e5bf436527f015b093f31e Mon Sep 17 00:00:00 2001
From: shyouhei &#60;shyouhei@b2dd03c8-39d4-4d8f-98ff-823fe69b080e&#62;
Date: Wed, 28 Dec 2011 12:47:15 +0000
Subject: [PATCH] -This line, and those below, will be ignored--

M &#160; &#160;ruby_1_8_7/inits.c
M &#160; &#160;ruby_1_8_7/string.c
M &#160; &#160;ruby_1_8_7/st.c
M &#160; &#160;ruby_1_8_7/test/ruby/test_string.rb
M &#160; &#160;ruby_1_8_7/random.c


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_8_7@34151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
---
&#160;ChangeLog &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;&#124; &#160; 26 ++++++++++++++++
&#160;inits.c &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;&#124; &#160; &#160;4 ++
&#160;random.c &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#124; &#160; 74 +++++++++++++++++++++++++++++++++++----------
&#160;st.c &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#124; &#160; 14 ++++++++-
&#160;string.c &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#124; &#160; &#160;7 ++++-
&#160;test/ruby/test_string.rb &#124; &#160; 13 ++++++++
&#160;version.h &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;&#124; &#160; &#160;8 ++--
&#160;7 files changed, 123 insertions(+), 23 deletions(-)

diff --git a/inits.c b/inits.c
index 947bbbe..a0e061f 100644
--- a/inits.c
+++ b/inits.c
@@ -38,6 +38,7 @@
&#160;void Init_sym _((void));
&#160;void Init_process _((void));
&#160;void Init_Random _((void));
+void Init_RandomSeed _((void));
&#160;void Init_Range _((void));
&#160;void Init_Regexp _((void));
&#160;void Init_signal _((void));
@@ -46,10 +47,13 @@
&#160;void Init_Time _((void));
&#160;void Init_var_tables _((void));
&#160;void Init_version _((void));
+void Init_st _((void));
&#160;
&#160;void
&#160;rb_call_inits()
&#160;{
+ &#160; &#160;Init_RandomSeed();
+ &#160; &#160;Init_st();
&#160; &#160; &#160;Init_sym();
&#160; &#160; &#160;Init_var_tables();
&#160; &#160; &#160;Init_Object();
diff --git a/random.c b/random.c
index c0560e3..24a0787 100644
--- a/random.c
+++ b/random.c
@@ -189,6 +189,7 @@
&#160;#include &#60;fcntl.h&#62;
&#160;#endif
&#160;
+static int seed_initialized = 0;
&#160;static VALUE saved_seed = INT2FIX(0);
&#160;
&#160;static VALUE
@@ -250,27 +251,22 @@
&#160; &#160; &#160;return old;
&#160;}
&#160;
-static VALUE
-random_seed()
+#define DEFAULT_SEED_LEN (4 * sizeof(long))
+
+static void
+fill_random_seed(ptr)
+ &#160; &#160;char *ptr;
&#160;{
&#160; &#160; &#160;static int n = 0;
+ &#160; &#160;unsigned long *seed;
&#160; &#160; &#160;struct timeval tv;
&#160; &#160; &#160;int fd;
&#160; &#160; &#160;struct stat statbuf;
+ &#160; &#160;char *buf = (char*)ptr;
&#160;
- &#160; &#160;int seed_len;
- &#160; &#160;BDIGIT *digits;
- &#160; &#160;unsigned long *seed;
- &#160; &#160;NEWOBJ(big, struct RBignum);
- &#160; &#160;OBJSETUP(big, rb_cBignum, T_BIGNUM);
-
- &#160; &#160;seed_len = 4 * sizeof(long);
- &#160; &#160;big-&#62;sign = 1;
- &#160; &#160;big-&#62;len = seed_len / SIZEOF_BDIGITS + 1;
- &#160; &#160;digits = big-&#62;digits = ALLOC_N(BDIGIT, big-&#62;len);
- &#160; &#160;seed = (unsigned long *)big-&#62;digits;
+ &#160; &#160;seed = (unsigned long *)buf;
&#160;
- &#160; &#160;memset(digits, 0, big-&#62;len * SIZEOF_BDIGITS);
+ &#160; &#160;memset(buf, 0, DEFAULT_SEED_LEN);
&#160;
&#160;#ifdef S_ISCHR
&#160; &#160; &#160;if ((fd = open(&#34;/dev/urandom&#34;, O_RDONLY
@@ -285,7 +281,7 @@
&#160;#endif
&#160; &#160; &#160; &#160; &#160; &#160; &#160;)) &#62;= 0) {
&#160; &#160; &#160; &#160; &#160;if (fstat(fd, &#38;statbuf) == 0 &#38;&#38; S_ISCHR(statbuf.st_mode)) {
- &#160; &#160; &#160; &#160; &#160; &#160;read(fd, seed, seed_len);
+ &#160; &#160; &#160; &#160; &#160; &#160;read(fd, seed, DEFAULT_SEED_LEN);
&#160; &#160; &#160; &#160; &#160;}
&#160; &#160; &#160; &#160; &#160;close(fd);
&#160; &#160; &#160;}
@@ -296,13 +292,37 @@
&#160; &#160; &#160;seed[1] ^= tv.tv_sec;
&#160; &#160; &#160;seed[2] ^= getpid() ^ (n++ &#60;&#60; 16);
&#160; &#160; &#160;seed[3] ^= (unsigned long)&#38;seed;
+}
+
+static VALUE
+make_seed_value(char *ptr)
+{
+ &#160; &#160;BDIGIT *digits;
+ &#160; &#160;NEWOBJ(big, struct RBignum);
+ &#160; &#160;OBJSETUP(big, rb_cBignum, T_BIGNUM);
+
+ &#160; &#160;RBIGNUM_SET_SIGN(big, 1);
+
+ &#160; &#160;digits = ALLOC_N(char, DEFAULT_SEED_LEN);
+ &#160; &#160;RBIGNUM(big)-&#62;digits = digits;
+ &#160; &#160;RBIGNUM(big)-&#62;len = DEFAULT_SEED_LEN / SIZEOF_BDIGITS;
+
+ &#160; &#160;MEMCPY(digits, ptr, char, DEFAULT_SEED_LEN);
&#160;
&#160; &#160; &#160;/* set leading-zero-guard if need. */
- &#160; &#160;digits[big-&#62;len-1] = digits[big-&#62;len-2] &#60;= 1 ? 1 : 0;
+ &#160; &#160;digits[RBIGNUM_LEN(big)-1] = digits[RBIGNUM_LEN(big)-2] &#60;= 1 ? 1 : 0;
&#160;
&#160; &#160; &#160;return rb_big_norm((VALUE)big);
&#160;}
&#160;
+static VALUE
+random_seed(void)
+{
+ &#160; &#160;char buf[DEFAULT_SEED_LEN];
+ &#160; &#160;fill_random_seed(buf);
+ &#160; &#160;return make_seed_value(buf);
+}
+
&#160;/*
&#160; * &#160;call-seq:
&#160; * &#160; &#160; srand(number=0) &#160; &#160;=&#62; old_seed
@@ -443,6 +463,9 @@
&#160; &#160; &#160;long val, max;
&#160;
&#160; &#160; &#160;rb_scan_args(argc, argv, &#34;01&#34;, &#38;vmax);
+ &#160; &#160;if (!seed_initialized) {
+ &#160; &#160; &#160; rand_init(random_seed());
+ &#160; &#160;}
&#160; &#160; &#160;switch (TYPE(vmax)) {
&#160; &#160; &#160; &#160;case T_FLOAT:
&#160; &#160; if (RFLOAT(vmax)-&#62;value &#60;= LONG_MAX &#38;&#38; RFLOAT(vmax)-&#62;value &#62;= LONG_MIN) {
@@ -490,6 +513,8 @@
&#160; &#160; &#160;return LONG2NUM(val);
&#160;}
&#160;
+static char initial_seed[DEFAULT_SEED_LEN];
+
&#160;void
&#160;rb_reset_random_seed()
&#160;{
@@ -497,9 +522,24 @@
&#160;}
&#160;
&#160;void
+Init_RandomSeed(void)
+{
+ &#160; &#160;fill_random_seed(initial_seed);
+ &#160; &#160;init_by_array((unsigned long*)initial_seed, DEFAULT_SEED_LEN/sizeof(unsigned long));
+ &#160; &#160;seed_initialized = 1;
+}
+
+static void
+Init_RandomSeed2(void)
+{
+ &#160; &#160;saved_seed = make_seed_value(initial_seed);
+ &#160; &#160;memset(initial_seed, 0, DEFAULT_SEED_LEN);
+}
+
+void
&#160;Init_Random()
&#160;{
- &#160; &#160;rand_init(random_seed());
+ &#160; &#160;Init_RandomSeed2();
&#160; &#160; &#160;rb_define_global_function(&#34;srand&#34;, rb_f_srand, -1);
&#160; &#160; &#160;rb_define_global_function(&#34;rand&#34;, rb_f_rand, -1);
&#160; &#160; &#160;rb_global_variable(&#38;saved_seed);
diff --git a/st.c b/st.c
index c16c310..21e157a 100644
--- a/st.c
+++ b/st.c
@@ -9,6 +9,7 @@
&#160;#include &#60;stdlib.h&#62;
&#160;#endif
&#160;#include &#60;string.h&#62;
+#include &#60;limits.h&#62;
&#160;#include &#34;st.h&#34;
&#160;
&#160;typedef struct st_table_entry st_table_entry;
@@ -521,6 +522,8 @@ struct st_table_entry {
&#160; &#160; &#160;return 0;
&#160;}
&#160;
+static unsigned long hash_seed = 0;
+
&#160;static int
&#160;strhash(string)
&#160; &#160; &#160;register const char *string;
@@ -550,10 +553,11 @@ struct st_table_entry {
&#160;
&#160; &#160; &#160;return val + (val &#60;&#60; 15);
&#160;#else
- &#160; &#160;register int val = 0;
+ &#160; &#160;register unsigned long val = hash_seed;
&#160;
&#160; &#160; &#160;while ((c = *string++) != '\0') {
&#160; &#160; val = val*997 + c;
+ &#160; val = (val &#60;&#60; 13) &#124; (val &#62;&#62; (sizeof(st_data_t) * CHAR_BIT - 13));
&#160; &#160; &#160;}
&#160;
&#160; &#160; &#160;return val + (val&#62;&#62;5);
@@ -573,3 +577,11 @@ struct st_table_entry {
&#160;{
&#160; &#160; &#160;return n;
&#160;}
+
+extern unsigned long rb_genrand_int32(void);
+
+void
+Init_st(void)
+{
+ &#160; &#160;hash_seed = rb_genrand_int32();
+}
diff --git a/string.c b/string.c
index c6b2301..94a0281 100644
--- a/string.c
+++ b/string.c
@@ -875,13 +875,15 @@
&#160; &#160; &#160;return str1;
&#160;}
&#160;
+static unsigned long hash_seed;
+
&#160;int
&#160;rb_str_hash(str)
&#160; &#160; &#160;VALUE str;
&#160;{
&#160; &#160; &#160;register long len = RSTRING(str)-&#62;len;
&#160; &#160; &#160;register char *p = RSTRING(str)-&#62;ptr;
- &#160; &#160;register int key = 0;
+ &#160; &#160;register unsigned long key = hash_seed;
&#160;
&#160;#if defined(HASH_ELFHASH)
&#160; &#160; &#160;register unsigned int g;
@@ -905,6 +907,7 @@
&#160; &#160; &#160;while (len--) {
&#160; &#160; key = key*65599 + *p;
&#160; &#160; p++;
+ &#160; key = (key &#60;&#60; 13) &#124; (key &#62;&#62; ((sizeof(unsigned long) * CHAR_BIT) - 13));
&#160; &#160; &#160;}
&#160; &#160; &#160;key = key + (key&#62;&#62;5);
&#160;#endif
@@ -5062,4 +5065,6 @@ struct tr {
&#160; &#160; &#160;rb_fs = Qnil;
&#160; &#160; &#160;rb_define_variable(&#34;$;&#34;, &#38;rb_fs);
&#160; &#160; &#160;rb_define_variable(&#34;$-F&#34;, &#38;rb_fs);
+
+ &#160; &#160;hash_seed = rb_genrand_int32();
&#160;}
diff --git a/test/ruby/test_string.rb b/test/ruby/test_string.rb
index 5f2c54f..4d97182 100644
--- a/test/ruby/test_string.rb
+++ b/test/ruby/test_string.rb
@@ -1,4 +1,5 @@
&#160;require 'test/unit'
+require File.expand_path('envutil', File.dirname(__FILE__))
&#160;
&#160;class TestString &#60; Test::Unit::TestCase
&#160; &#160;def check_sum(str, bits=16)
@@ -29,4 +30,16 @@ def test_inspect
&#160; &#160;ensure
&#160; &#160; &#160;$KCODE = original_kcode
&#160; &#160;end
+
+ &#160;def test_hash_random
+ &#160; &#160;str = 'abc'
+ &#160; &#160;a = [str.hash.to_s]
+ &#160; &#160;cmd = sprintf(&#34;%s -e 'print %s.hash'&#34;, EnvUtil.rubybin, str.dump)
+ &#160; &#160;3.times {
+ &#160; &#160; &#160;IO.popen(cmd, &#34;rb&#34;) {&#124;o&#124;
+ &#160; &#160; &#160; &#160;a &#60;&#60; o.read
+ &#160; &#160; &#160;}
+ &#160; &#160;}
+ &#160; &#160;assert_not_equal([str.hash.to_s], a.uniq)
+ &#160;end
&#160;end

--- a/version.c 2011-12-19 03:22:43.000000000 +0000
+++ b/version.c 2011-12-29 18:18:58.000000000 +0000
@@ -46,7 +46,7 @@
&#160; &#160; &#160;rb_define_global_const(&#34;RUBY_PATCHLEVEL&#34;, INT2FIX(RUBY_PATCHLEVEL));
&#160;
&#160; &#160; &#160;snprintf(description, sizeof(description),
- &#160; &#160; &#160; &#160; &#160; &#160; &#34;ruby %s (%s %s %d) [%s], MBARI 0x%x, Ruby Enterprise Edition %s&#34;,
+ &#160; &#160; &#160; &#160; &#160; &#160; &#34;ruby %s (%s %s %d) [%s], MBARI 0x%x, Ruby Enterprise Edition %s (with hash random)&#34;,
&#160; &#160; &#160; &#160; &#160; &#160; &#160; RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_RELEASE_STR,
&#160; &#160; &#160; &#160; &#160; &#160; &#160; RUBY_RELEASE_NUM, RUBY_PLATFORM,
&#160; &#160; &#160; &#160; &#160; &#160; &#160; STACK_WIPE_SITES, REE_VERSION);
-- 
1.7.5.4
You can view or download it from GitHub.
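The string.c change is easier to follow in isolation. Below, the patched rb_str_hash loop (the default multiplicative branch shown in the string.c hunk) is pulled out into a standalone C function. This is only an illustrative sketch, and seeded_str_hash() is a hypothetical name. It takes the seed as a parameter instead of reading the static hash_seed that the patch fills from rb_genrand_int32() at startup. It also returns the full unsigned long rather than truncating to int as rb_str_hash does. Because the starting value differs per process, the same string is expected to hash differently in different processes, which is what test_hash_random in the patch checks.

```c
#include <limits.h>
#include <stddef.h>

/* Rotate left by 13 bits, as the patch does after mixing in each byte. */
static unsigned long rotl13(unsigned long x)
{
    return (x << 13) | (x >> (sizeof(unsigned long) * CHAR_BIT - 13));
}

/* Standalone version of the patched loop: start from a per-process seed
 * instead of 0, multiply by 65599, add the byte, rotate, and finish with
 * key + (key >> 5). */
unsigned long seeded_str_hash(const char *p, size_t len, unsigned long seed)
{
    unsigned long key = seed;

    while (len--) {
        key = key * 65599 + *p;   /* *p is a plain char, as in string.c */
        p++;
        key = rotl13(key);
    }
    return key + (key >> 5);
}
```

With a fixed seed the result is deterministic, and changing the seed changes it. For example, "a" hashes to 819456 with seed 0 and to 554999808 with seed 1.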
Disclaimer: This is provided as is, no guarantees are provided, etc.]]></description>
			<content:encoded><![CDATA[<p>Since there is no patch for Ruby Enterprise Edition 2011.03 to prevent the <a href="http://www.youtube.com/watch?v=R2Cq3CLI6H8" rel="shadowbox[post-550];player=swf;width=640;height=385;">hash collision DoS attack</a>, I&#8217;ve quickly ported the <a href="http://www.ruby-forum.com/topic/3312298#1038441">Ruby 1.8.7 patchlevel 357</a> patch. Here it is:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>From e19bd3eaa8bd71cfc9e5bf436527f015b093f31e Mon Sep 17 00:00:00 2001<br />
From: shyouhei &lt;shyouhei@b2dd03c8-39d4-4d8f-98ff-823fe69b080e&gt;<br />
Date: Wed, 28 Dec 2011 12:47:15 +0000<br />
Subject: [PATCH] -This line, and those below, will be ignored--<br />
<br />
M &nbsp; &nbsp;ruby_1_8_7/inits.c<br />
M &nbsp; &nbsp;ruby_1_8_7/string.c<br />
M &nbsp; &nbsp;ruby_1_8_7/st.c<br />
M &nbsp; &nbsp;ruby_1_8_7/test/ruby/test_string.rb<br />
M &nbsp; &nbsp;ruby_1_8_7/random.c<br />
<br />
<br />
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_8_7@34151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e<br />
---<br />
&nbsp;ChangeLog &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;| &nbsp; 26 ++++++++++++++++<br />
&nbsp;inits.c &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;| &nbsp; &nbsp;4 ++<br />
&nbsp;random.c &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; 74 +++++++++++++++++++++++++++++++++++----------<br />
&nbsp;st.c &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; 14 ++++++++-<br />
&nbsp;string.c &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; &nbsp;7 ++++-<br />
&nbsp;test/ruby/test_string.rb | &nbsp; 13 ++++++++<br />
&nbsp;version.h &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;| &nbsp; &nbsp;8 ++--<br />
&nbsp;7 files changed, 123 insertions(+), 23 deletions(-)<br />
<br />
diff --git a/inits.c b/inits.c<br />
index 947bbbe..a0e061f 100644<br />
--- a/inits.c<br />
+++ b/inits.c<br />
@@ -38,6 +38,7 @@<br />
&nbsp;void Init_sym _((void));<br />
&nbsp;void Init_process _((void));<br />
&nbsp;void Init_Random _((void));<br />
+void Init_RandomSeed _((void));<br />
&nbsp;void Init_Range _((void));<br />
&nbsp;void Init_Regexp _((void));<br />
&nbsp;void Init_signal _((void));<br />
@@ -46,10 +47,13 @@<br />
&nbsp;void Init_Time _((void));<br />
&nbsp;void Init_var_tables _((void));<br />
&nbsp;void Init_version _((void));<br />
+void Init_st _((void));<br />
&nbsp;<br />
&nbsp;void<br />
&nbsp;rb_call_inits()<br />
&nbsp;{<br />
+ &nbsp; &nbsp;Init_RandomSeed();<br />
+ &nbsp; &nbsp;Init_st();<br />
&nbsp; &nbsp; &nbsp;Init_sym();<br />
&nbsp; &nbsp; &nbsp;Init_var_tables();<br />
&nbsp; &nbsp; &nbsp;Init_Object();<br />
diff --git a/random.c b/random.c<br />
index c0560e3..24a0787 100644<br />
--- a/random.c<br />
+++ b/random.c<br />
@@ -189,6 +189,7 @@<br />
&nbsp;#include &lt;fcntl.h&gt;<br />
&nbsp;#endif<br />
&nbsp;<br />
+static int seed_initialized = 0;<br />
&nbsp;static VALUE saved_seed = INT2FIX(0);<br />
&nbsp;<br />
&nbsp;static VALUE<br />
@@ -250,27 +251,22 @@<br />
&nbsp; &nbsp; &nbsp;return old;<br />
&nbsp;}<br />
&nbsp;<br />
-static VALUE<br />
-random_seed()<br />
+#define DEFAULT_SEED_LEN (4 * sizeof(long))<br />
+<br />
+static void<br />
+fill_random_seed(ptr)<br />
+ &nbsp; &nbsp;char *ptr;<br />
&nbsp;{<br />
&nbsp; &nbsp; &nbsp;static int n = 0;<br />
+ &nbsp; &nbsp;unsigned long *seed;<br />
&nbsp; &nbsp; &nbsp;struct timeval tv;<br />
&nbsp; &nbsp; &nbsp;int fd;<br />
&nbsp; &nbsp; &nbsp;struct stat statbuf;<br />
+ &nbsp; &nbsp;char *buf = (char*)ptr;<br />
&nbsp;<br />
- &nbsp; &nbsp;int seed_len;<br />
- &nbsp; &nbsp;BDIGIT *digits;<br />
- &nbsp; &nbsp;unsigned long *seed;<br />
- &nbsp; &nbsp;NEWOBJ(big, struct RBignum);<br />
- &nbsp; &nbsp;OBJSETUP(big, rb_cBignum, T_BIGNUM);<br />
-<br />
- &nbsp; &nbsp;seed_len = 4 * sizeof(long);<br />
- &nbsp; &nbsp;big-&gt;sign = 1;<br />
- &nbsp; &nbsp;big-&gt;len = seed_len / SIZEOF_BDIGITS + 1;<br />
- &nbsp; &nbsp;digits = big-&gt;digits = ALLOC_N(BDIGIT, big-&gt;len);<br />
- &nbsp; &nbsp;seed = (unsigned long *)big-&gt;digits;<br />
+ &nbsp; &nbsp;seed = (unsigned long *)buf;<br />
&nbsp;<br />
- &nbsp; &nbsp;memset(digits, 0, big-&gt;len * SIZEOF_BDIGITS);<br />
+ &nbsp; &nbsp;memset(buf, 0, DEFAULT_SEED_LEN);<br />
&nbsp;<br />
&nbsp;#ifdef S_ISCHR<br />
&nbsp; &nbsp; &nbsp;if ((fd = open(&quot;/dev/urandom&quot;, O_RDONLY<br />
@@ -285,7 +281,7 @@<br />
&nbsp;#endif<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;)) &gt;= 0) {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;if (fstat(fd, &amp;statbuf) == 0 &amp;&amp; S_ISCHR(statbuf.st_mode)) {<br />
- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;read(fd, seed, seed_len);<br />
+ &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;read(fd, seed, DEFAULT_SEED_LEN);<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;}<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;close(fd);<br />
&nbsp; &nbsp; &nbsp;}<br />
@@ -296,13 +292,37 @@<br />
&nbsp; &nbsp; &nbsp;seed[1] ^= tv.tv_sec;<br />
&nbsp; &nbsp; &nbsp;seed[2] ^= getpid() ^ (n++ &lt;&lt; 16);<br />
&nbsp; &nbsp; &nbsp;seed[3] ^= (unsigned long)&amp;seed;<br />
+}<br />
+<br />
+static VALUE<br />
+make_seed_value(char *ptr)<br />
+{<br />
+ &nbsp; &nbsp;BDIGIT *digits;<br />
+ &nbsp; &nbsp;NEWOBJ(big, struct RBignum);<br />
+ &nbsp; &nbsp;OBJSETUP(big, rb_cBignum, T_BIGNUM);<br />
+<br />
+ &nbsp; &nbsp;RBIGNUM_SET_SIGN(big, 1);<br />
+<br />
+ &nbsp; &nbsp;digits = ALLOC_N(char, DEFAULT_SEED_LEN);<br />
+ &nbsp; &nbsp;RBIGNUM(big)-&gt;digits = digits;<br />
+ &nbsp; &nbsp;RBIGNUM(big)-&gt;len = DEFAULT_SEED_LEN / SIZEOF_BDIGITS;<br />
+<br />
+ &nbsp; &nbsp;MEMCPY(digits, ptr, char, DEFAULT_SEED_LEN);<br />
&nbsp;<br />
&nbsp; &nbsp; &nbsp;/* set leading-zero-guard if need. */<br />
- &nbsp; &nbsp;digits[big-&gt;len-1] = digits[big-&gt;len-2] &lt;= 1 ? 1 : 0;<br />
+ &nbsp; &nbsp;digits[RBIGNUM_LEN(big)-1] = digits[RBIGNUM_LEN(big)-2] &lt;= 1 ? 1 : 0;<br />
&nbsp;<br />
&nbsp; &nbsp; &nbsp;return rb_big_norm((VALUE)big);<br />
&nbsp;}<br />
&nbsp;<br />
+static VALUE<br />
+random_seed(void)<br />
+{<br />
+ &nbsp; &nbsp;char buf[DEFAULT_SEED_LEN];<br />
+ &nbsp; &nbsp;fill_random_seed(buf);<br />
+ &nbsp; &nbsp;return make_seed_value(buf);<br />
+}<br />
+<br />
&nbsp;/*<br />
&nbsp; * &nbsp;call-seq:<br />
&nbsp; * &nbsp; &nbsp; srand(number=0) &nbsp; &nbsp;=&gt; old_seed<br />
@@ -443,6 +463,9 @@<br />
&nbsp; &nbsp; &nbsp;long val, max;<br />
&nbsp;<br />
&nbsp; &nbsp; &nbsp;rb_scan_args(argc, argv, &quot;01&quot;, &amp;vmax);<br />
+ &nbsp; &nbsp;if (!seed_initialized) {<br />
+ &nbsp; &nbsp; &nbsp; rand_init(random_seed());<br />
+ &nbsp; &nbsp;}<br />
&nbsp; &nbsp; &nbsp;switch (TYPE(vmax)) {<br />
&nbsp; &nbsp; &nbsp; &nbsp;case T_FLOAT:<br />
&nbsp; &nbsp; if (RFLOAT(vmax)-&gt;value &lt;= LONG_MAX &amp;&amp; RFLOAT(vmax)-&gt;value &gt;= LONG_MIN) {<br />
@@ -490,6 +513,8 @@<br />
&nbsp; &nbsp; &nbsp;return LONG2NUM(val);<br />
&nbsp;}<br />
&nbsp;<br />
+static char initial_seed[DEFAULT_SEED_LEN];<br />
+<br />
&nbsp;void<br />
&nbsp;rb_reset_random_seed()<br />
&nbsp;{<br />
@@ -497,9 +522,24 @@<br />
&nbsp;}<br />
&nbsp;<br />
&nbsp;void<br />
+Init_RandomSeed(void)<br />
+{<br />
+ &nbsp; &nbsp;fill_random_seed(initial_seed);<br />
+ &nbsp; &nbsp;init_by_array((unsigned long*)initial_seed, DEFAULT_SEED_LEN/sizeof(unsigned long));<br />
+ &nbsp; &nbsp;seed_initialized = 1;<br />
+}<br />
+<br />
+static void<br />
+Init_RandomSeed2(void)<br />
+{<br />
+ &nbsp; &nbsp;saved_seed = make_seed_value(initial_seed);<br />
+ &nbsp; &nbsp;memset(initial_seed, 0, DEFAULT_SEED_LEN);<br />
+}<br />
+<br />
+void<br />
&nbsp;Init_Random()<br />
&nbsp;{<br />
- &nbsp; &nbsp;rand_init(random_seed());<br />
+ &nbsp; &nbsp;Init_RandomSeed2();<br />
&nbsp; &nbsp; &nbsp;rb_define_global_function(&quot;srand&quot;, rb_f_srand, -1);<br />
&nbsp; &nbsp; &nbsp;rb_define_global_function(&quot;rand&quot;, rb_f_rand, -1);<br />
&nbsp; &nbsp; &nbsp;rb_global_variable(&amp;saved_seed);<br />
diff --git a/st.c b/st.c<br />
index c16c310..21e157a 100644<br />
--- a/st.c<br />
+++ b/st.c<br />
@@ -9,6 +9,7 @@<br />
&nbsp;#include &lt;stdlib.h&gt;<br />
&nbsp;#endif<br />
&nbsp;#include &lt;string.h&gt;<br />
+#include &lt;limits.h&gt;<br />
&nbsp;#include &quot;st.h&quot;<br />
&nbsp;<br />
&nbsp;typedef struct st_table_entry st_table_entry;<br />
@@ -521,6 +522,8 @@ struct st_table_entry {<br />
&nbsp; &nbsp; &nbsp;return 0;<br />
&nbsp;}<br />
&nbsp;<br />
+static unsigned long hash_seed = 0;<br />
+<br />
&nbsp;static int<br />
&nbsp;strhash(string)<br />
&nbsp; &nbsp; &nbsp;register const char *string;<br />
@@ -550,10 +553,11 @@ struct st_table_entry {<br />
&nbsp;<br />
&nbsp; &nbsp; &nbsp;return val + (val &lt;&lt; 15);<br />
&nbsp;#else<br />
- &nbsp; &nbsp;register int val = 0;<br />
+ &nbsp; &nbsp;register unsigned long val = hash_seed;<br />
&nbsp;<br />
&nbsp; &nbsp; &nbsp;while ((c = *string++) != '\0') {<br />
&nbsp; &nbsp; val = val*997 + c;<br />
+ &nbsp; val = (val &lt;&lt; 13) | (val &gt;&gt; (sizeof(st_data_t) * CHAR_BIT - 13));<br />
&nbsp; &nbsp; &nbsp;}<br />
&nbsp;<br />
&nbsp; &nbsp; &nbsp;return val + (val&gt;&gt;5);<br />
@@ -573,3 +577,11 @@ struct st_table_entry {<br />
&nbsp;{<br />
&nbsp; &nbsp; &nbsp;return n;<br />
&nbsp;}<br />
+<br />
+extern unsigned long rb_genrand_int32(void);<br />
+<br />
+void<br />
+Init_st(void)<br />
+{<br />
+ &nbsp; &nbsp;hash_seed = rb_genrand_int32();<br />
+}<br />
diff --git a/string.c b/string.c<br />
index c6b2301..94a0281 100644<br />
--- a/string.c<br />
+++ b/string.c<br />
@@ -875,13 +875,15 @@<br />
&nbsp; &nbsp; &nbsp;return str1;<br />
&nbsp;}<br />
&nbsp;<br />
+static unsigned long hash_seed;<br />
+<br />
&nbsp;int<br />
&nbsp;rb_str_hash(str)<br />
&nbsp; &nbsp; &nbsp;VALUE str;<br />
&nbsp;{<br />
&nbsp; &nbsp; &nbsp;register long len = RSTRING(str)-&gt;len;<br />
&nbsp; &nbsp; &nbsp;register char *p = RSTRING(str)-&gt;ptr;<br />
- &nbsp; &nbsp;register int key = 0;<br />
+ &nbsp; &nbsp;register unsigned long key = hash_seed;<br />
&nbsp;<br />
&nbsp;#if defined(HASH_ELFHASH)<br />
&nbsp; &nbsp; &nbsp;register unsigned int g;<br />
@@ -905,6 +907,7 @@<br />
&nbsp; &nbsp; &nbsp;while (len--) {<br />
&nbsp; &nbsp; key = key*65599 + *p;<br />
&nbsp; &nbsp; p++;<br />
+ &nbsp; key = (key &lt;&lt; 13) | (key &gt;&gt; ((sizeof(unsigned long) * CHAR_BIT) - 13));<br />
&nbsp; &nbsp; &nbsp;}<br />
&nbsp; &nbsp; &nbsp;key = key + (key&gt;&gt;5);<br />
&nbsp;#endif<br />
@@ -5062,4 +5065,6 @@ struct tr {<br />
&nbsp; &nbsp; &nbsp;rb_fs = Qnil;<br />
&nbsp; &nbsp; &nbsp;rb_define_variable(&quot;$;&quot;, &amp;rb_fs);<br />
&nbsp; &nbsp; &nbsp;rb_define_variable(&quot;$-F&quot;, &amp;rb_fs);<br />
+<br />
+ &nbsp; &nbsp;hash_seed = rb_genrand_int32();<br />
&nbsp;}<br />
diff --git a/test/ruby/test_string.rb b/test/ruby/test_string.rb<br />
index 5f2c54f..4d97182 100644<br />
--- a/test/ruby/test_string.rb<br />
+++ b/test/ruby/test_string.rb<br />
@@ -1,4 +1,5 @@<br />
&nbsp;require 'test/unit'<br />
+require File.expand_path('envutil', File.dirname(__FILE__))<br />
&nbsp;<br />
&nbsp;class TestString &lt; Test::Unit::TestCase<br />
&nbsp; &nbsp;def check_sum(str, bits=16)<br />
@@ -29,4 +30,16 @@ def test_inspect<br />
&nbsp; &nbsp;ensure<br />
&nbsp; &nbsp; &nbsp;$KCODE = original_kcode<br />
&nbsp; &nbsp;end<br />
+<br />
+ &nbsp;def test_hash_random<br />
+ &nbsp; &nbsp;str = 'abc'<br />
+ &nbsp; &nbsp;a = [str.hash.to_s]<br />
+ &nbsp; &nbsp;cmd = sprintf(&quot;%s -e 'print %s.hash'&quot;, EnvUtil.rubybin, str.dump)<br />
+ &nbsp; &nbsp;3.times {<br />
+ &nbsp; &nbsp; &nbsp;IO.popen(cmd, &quot;rb&quot;) {|o|<br />
+ &nbsp; &nbsp; &nbsp; &nbsp;a &lt;&lt; o.read<br />
+ &nbsp; &nbsp; &nbsp;}<br />
+ &nbsp; &nbsp;}<br />
+ &nbsp; &nbsp;assert_not_equal([str.hash.to_s], a.uniq)<br />
+ &nbsp;end<br />
&nbsp;end<br />
<br />
--- a/version.c 2011-12-19 03:22:43.000000000 +0000<br />
+++ b/version.c 2011-12-29 18:18:58.000000000 +0000<br />
@@ -46,7 +46,7 @@<br />
&nbsp; &nbsp; &nbsp;rb_define_global_const(&quot;RUBY_PATCHLEVEL&quot;, INT2FIX(RUBY_PATCHLEVEL));<br />
&nbsp;<br />
&nbsp; &nbsp; &nbsp;snprintf(description, sizeof(description),<br />
- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;ruby %s (%s %s %d) [%s], MBARI 0x%x, Ruby Enterprise Edition %s&quot;,<br />
+ &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;ruby %s (%s %s %d) [%s], MBARI 0x%x, Ruby Enterprise Edition %s (with hash random)&quot;,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_RELEASE_STR,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; RUBY_RELEASE_NUM, RUBY_PLATFORM,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; STACK_WIPE_SITES, REE_VERSION);<br />
-- <br />
1.7.5.4</div></td></tr></tbody></table></div>
<p>You can <a href="https://gist.github.com/1535569">view it</a> or <a href="https://raw.github.com/gist/1535569/b59b5e3e753288fa7a2f23d640a285775ab0879b/ruby-1.8.7-hash-randomize.patch">download it from GitHub</a>.</p>
<p><b>Disclaimer:</b> This is provided as is, no guarantees are provided, etc.</p>
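<p>To illustrate what the <tt>string.c</tt> hunk changes, here is a rough Ruby model of the patched default branch of <tt>rb_str_hash</tt>. The accumulator now starts from a per-process random <tt>hash_seed</tt> instead of 0, and it is rotated left by 13 bits after each byte. This is an illustrative sketch only, and it rests on three assumptions. It emulates a 64-bit <tt>unsigned long</tt>. It treats bytes as unsigned, although C's <tt>char</tt> may be signed. It ignores the final truncation to <tt>int</tt> in the C function. The method name <tt>seeded_str_hash</tt> is hypothetical.</p>

```ruby
require 'securerandom'

# Assumption: unsigned long is 64 bits wide (typical for LP64 platforms).
ULONG_BITS = 64
ULONG_MASK = (1 << ULONG_BITS) - 1

# Hypothetical model of the patched rb_str_hash default branch.
def seeded_str_hash(str, seed)
  key = seed & ULONG_MASK
  str.each_byte do |byte|
    key = (key * 65599 + byte) & ULONG_MASK
    # Rotate left by 13 bits, mirroring the line the patch adds to string.c.
    key = ((key << 13) | (key >> (ULONG_BITS - 13))) & ULONG_MASK
  end
  # Final mix, unchanged by the patch: key + (key >> 5).
  (key + (key >> 5)) & ULONG_MASK
end

# Each process picks its own seed (the patch uses rb_genrand_int32 for this).
seed = SecureRandom.random_number(2**32)
puts seeded_str_hash('abc', seed)
```

<p>Every interpreter picks a fresh seed at startup, so the same string should hash differently from one process to the next. As a result, a precomputed set of colliding keys no longer lands in the same bucket. This is what <tt>test_hash_random</tt> checks: it compares <tt>'abc'.hash</tt> across several freshly started rubies.</p>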

<p><a href="http://feedads.g.doubleclick.net/~a/DYL8ZeWiKqDFVgPaEJ1FI_fdOPE/0/da"><img src="http://feedads.g.doubleclick.net/~a/DYL8ZeWiKqDFVgPaEJ1FI_fdOPE/0/di" border="0" ismap="true"></img></a><br/>
<a href="http://feedads.g.doubleclick.net/~a/DYL8ZeWiKqDFVgPaEJ1FI_fdOPE/1/da"><img src="http://feedads.g.doubleclick.net/~a/DYL8ZeWiKqDFVgPaEJ1FI_fdOPE/1/di" border="0" ismap="true"></img></a></p><div>
<a href="http://feeds.feedburner.com/~ff/Homo-Adminus?a=V534dICoisA:fJDDu5P_f0A:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/Homo-Adminus?i=V534dICoisA:fJDDu5P_f0A:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/Homo-Adminus?a=V534dICoisA:fJDDu5P_f0A:7Q72WNTAKBA"><img src="http://feeds.feedburner.com/~ff/Homo-Adminus?d=7Q72WNTAKBA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/Homo-Adminus?a=V534dICoisA:fJDDu5P_f0A:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/Homo-Adminus?i=V534dICoisA:fJDDu5P_f0A:V_sGLiPBpWU" border="0"></img></a>
</div><br/>PlanetMySQL Voting:
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=31448&vote=1&apivote=1">Vote UP</a> /
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=31448&vote=-1&apivote=1">Vote DOWN</a>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/12/29/quick-and-dirty-patch-for-ruby-enterprise-edition-2011-03-to-prevent-hash-collision-attacks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nginx-Fu: X-Accel-Redirect From Remote Servers</title>
		<link>http://feedproxy.google.com/~r/Homo-Adminus/~3/cuG-f1auDlo/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nginx-fu-x-accel-redirect-from-remote-servers</link>
		<comments>http://feedproxy.google.com/~r/Homo-Adminus/~3/cuG-f1auDlo/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 03:45:52 +0000</pubDate>
		<dc:creator>Alexey Kovyrin</dc:creator>
				<category><![CDATA[Admin-tips]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[download]]></category>
		<category><![CDATA[networks]]></category>
		<category><![CDATA[Nginx]]></category>
		<category><![CDATA[nginx-fu]]></category>
		<category><![CDATA[redirect]]></category>
		<category><![CDATA[S3]]></category>
		<category><![CDATA[x-accel-redirect]]></category>

		<guid isPermaLink="false">http://kovyrin.net/?p=462</guid>
		<description><![CDATA[We use nginx and its features a lot at Scribd. Many times over the last year we needed an interesting but unsupported feature &#8211; we wanted nginx X-Accel-Redirect functionality to work with remote URLs. Out of the box, nginx supports this functionality for local URIs only. In this short post I want to explain how we made nginx serve remote content via X-Accel-Redirect.

First of all, here is why you may need this feature. Let&#8217;s imagine you have file storage on Amazon S3 where you store tons of content. You also have an application with download functionality that should be available only to logged-in/paying/premium users, and/or you want to keep track of the downloads your users perform on your site. If your content were on your web server, you could use the simple controlled downloads functionality built into nginx. The problem is that your content is remote.
Here is what we do to solve this problem.
First, we create a special location on our nginx server. This location will be used as a proxy for all our accelerated file downloads:
# Proxy download 
location ~* ^/internal_redirect/&#40;.*?&#41;/&#40;.*&#41; &#123;
&#160; &#160; # Do not allow people to mess with this location directly
&#160; &#160; # Only internal redirects are allowed
&#160; &#160; internal;

&#160; &#160; # Location-specific logging
&#160; &#160; access_log logs/internal_redirect.access.log main;
&#160; &#160; error_log logs/internal_redirect.error.log warn;

&#160; &#160; # Extract download url from the request
&#160; &#160; set $download_uri $2;
&#160; &#160; set $download_host $1;

&#160; &#160; # Compose download url
&#160; &#160; set $download_url http://$download_host/$download_uri;

&#160; &#160; # Set download request headers
&#160; &#160; proxy_set_header Host $download_host;
&#160; &#160; proxy_set_header Authorization '';

&#160; &#160; # The next two lines could be used if your storage 
&#160; &#160; # backend does not support Content-Disposition 
&#160; &#160; # headers used to specify the file name browsers use 
&#160; &#160; # when saving content to disk
&#160; &#160; proxy_hide_header Content-Disposition;
&#160; &#160; add_header Content-Disposition 'attachment; filename=&#34;$args&#34;';

&#160; &#160; # Do not touch local disks when proxying 
&#160; &#160; # content to clients
&#160; &#160; proxy_max_temp_file_size 0;

&#160; &#160; # Download the file and send it to the client (host names in a variable proxy_pass need a resolver directive)
&#160; &#160; proxy_pass $download_url;
&#125;
After adding this location to our nginx config, we can start sending responses with headers like the following:
# This header will ask nginx to download a file 
# from http://some.site.com/secret/url.ext and send it to user
X-Accel-Redirect: /internal_redirect/some.site.com/secret/url.ext

# This header will ask nginx to download a file 
# from http://blah.com/secret/url and send it to user as cool.pdf
X-Accel-Redirect: /internal_redirect/blah.com/secret/url?cool.pdf
Here is example code you could use in a Rails application to take advantage of our internal redirect location:
def x_accel_url&#40;url, file_name = nil&#41;
&#160; uri = &#34;/internal_redirect/#{url.gsub('http://', '')}&#34;
&#160; uri &#60;&#60; &#34;?#{file_name}&#34; if file_name
&#160; return uri
end

def download
&#160; headers&#91;'X-Accel-Redirect'&#93; = x_accel_url&#40;some_secret_url, pretty_name&#41;
&#160; render :nothing =&#62; true
end
As you can see, nginx is a really powerful tool, and when you turn your creativity on you can make it even more powerful. Stay tuned for more Nginx-Fu posts.



  
]]></description>
			<content:encoded><![CDATA[<p>We use <a href="http://nginx.org/">nginx</a> and its features a lot at <a href="http://www.scribd.com/">Scribd</a>. Many times over the last year we needed an interesting but unsupported feature &#8211; we wanted nginx <a href="http://wiki.nginx.org/NginxXSendfile"><tt>X-Accel-Redirect</tt></a> functionality to work with remote URLs. Out of the box, nginx supports this functionality for local URIs only. In this short post I want to explain how we made nginx serve remote content via <nobr><tt>X-Accel-Redirect</tt></nobr>.</p>
<p><span></span></p>
<p>First of all, here is why you may need this feature. Let&#8217;s imagine you have file storage on <a href="http://aws.amazon.com/s3/">Amazon S3</a> where you store tons of content. You also have an application with download functionality that should be available only to logged-in/paying/premium users, and/or you want to keep track of the downloads your users perform on your site. If your content were on your web server, you could use the simple <a href="http://kovyrin.net/2006/11/01/nginx-x-accel-redirect-php-rails/">controlled downloads</a> functionality built into nginx. The problem is that your content is remote.</p>
<p>Here is what we do to solve this problem.</p>
<p>First, we create a special location on our nginx server. This location will be used as a proxy for all our accelerated file downloads:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br />12<br />13<br />14<br />15<br />16<br />17<br />18<br />19<br />20<br />21<br />22<br />23<br />24<br />25<br />26<br />27<br />28<br />29<br />30<br />31<br />32<br />33<br />34<br />35<br /></div></td><td><div><span># Proxy download </span><br />
<span>location</span> ~* ^/internal_redirect/<span>&#40;</span>.*?<span>&#41;</span>/<span>&#40;</span>.*<span>&#41;</span> <span>&#123;</span><br />
&nbsp; &nbsp; <span># Do not allow people to mess with this location directly</span><br />
&nbsp; &nbsp; <span># Only internal redirects are allowed</span><br />
&nbsp; &nbsp; <span>internal</span>;<br />
<br />
&nbsp; &nbsp; <span># Location-specific logging</span><br />
&nbsp; &nbsp; <span>access_log</span> logs/internal_redirect.access.log main;<br />
&nbsp; &nbsp; <span>error_log</span> logs/internal_redirect.error.log warn;<br />
<br />
&nbsp; &nbsp; <span># Extract download url from the request</span><br />
&nbsp; &nbsp; set <span>$download_uri</span> <span>$2</span>;<br />
&nbsp; &nbsp; set <span>$download_host</span> <span>$1</span>;<br />
<br />
&nbsp; &nbsp; <span># Compose download url</span><br />
&nbsp; &nbsp; set <span>$download_url</span> <span>http</span>://<span>$download_host</span>/<span>$download_uri</span>;<br />
<br />
&nbsp; &nbsp; <span># Set download request headers</span><br />
&nbsp; &nbsp; <span>proxy_set_header</span> <span>Host</span> <span>$download_host</span>;<br />
&nbsp; &nbsp; <span>proxy_set_header</span> Authorization <span>''</span>;<br />
<br />
&nbsp; &nbsp; <span># The next two lines could be used if your storage </span><br />
&nbsp; &nbsp; <span># backend does not support Content-Disposition </span><br />
&nbsp; &nbsp; <span># headers used to specify the file name browsers use </span><br />
&nbsp; &nbsp; <span># when saving content to disk</span><br />
&nbsp; &nbsp; proxy_hide_header Content-Disposition;<br />
&nbsp; &nbsp; add_header Content-Disposition <span>'attachment; filename=&quot;$args&quot;'</span>;<br />
<br />
&nbsp; &nbsp; <span># Do not touch local disks when proxying </span><br />
&nbsp; &nbsp; <span># content to clients</span><br />
&nbsp; &nbsp; proxy_max_temp_file_size <span>0</span>;<br />
<br />
&nbsp; &nbsp; <span># Download the file and send it to the client (host names in a variable proxy_pass need a resolver directive)</span><br />
&nbsp; &nbsp; <span>proxy_pass</span> <span>$download_url</span>;<br />
<span>&#125;</span></div></td></tr></tbody></table></div>
<p>After adding this location to our nginx config, we can start sending responses with headers like the following:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div><span># This header will ask nginx to download a file </span><br />
<span># from http://some.site.com/secret/url.ext and send it to user</span><br />
X-Accel-Redirect: /internal_redirect/some.site.com/secret/url.ext<br />
<br />
<span># This header will ask nginx to download a file </span><br />
<span># from http://blah.com/secret/url and send it to user as cool.pdf</span><br />
X-Accel-Redirect: /internal_redirect/blah.com/secret/url?cool.pdf</div></td></tr></tbody></table></div>
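<p>The following small Ruby sketch, which is not part of the original setup, makes the regex captures concrete. It mimics how the <tt>/internal_redirect/</tt> location splits an <tt>X-Accel-Redirect</tt> value. The first capture becomes <tt>$download_host</tt> and the second becomes <tt>$download_uri</tt>. The query string ends up in <tt>$args</tt>, which the config writes into <tt>Content-Disposition</tt>. The sketch assumes that nginx matches the regex against the path without the query string, and it ignores nginx's URI normalization and decoding. <tt>parse_internal_redirect</tt> is a hypothetical name.</p>

```ruby
# Hypothetical helper mimicking the location regex from the nginx config above.
# The /i flag mirrors the case-insensitive ~* match.
INTERNAL_REDIRECT = %r{\A/internal_redirect/(.*?)/(.*)}i

def parse_internal_redirect(value)
  # nginx matches locations against the path; the query string goes to $args.
  path, args = value.split('?', 2)
  m = INTERNAL_REDIRECT.match(path)
  return nil unless m

  host, uri = m[1], m[2]
  {
    download_host: host,                  # $download_host ($1)
    download_url: "http://#{host}/#{uri}", # $download_url
    file_name: args # nil here when there is no query; nginx would use an empty $args
  }
end

p parse_internal_redirect('/internal_redirect/blah.com/secret/url?cool.pdf')
```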
<p>Here is example code you could use in a Rails application to take advantage of our internal redirect location:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div><span>def</span> x_accel_url<span>&#40;</span>url, file_name = <span>nil</span><span>&#41;</span><br />
&nbsp; uri = <span>&quot;/internal_redirect/#{url.gsub('http://', '')}&quot;</span><br />
&nbsp; uri <span>&lt;&lt;</span> <span>&quot;?#{file_name}&quot;</span> <span>if</span> file_name<br />
&nbsp; <span>return</span> uri<br />
<span>end</span><br />
<br />
<span>def</span> download<br />
&nbsp; headers<span>&#91;</span><span>'X-Accel-Redirect'</span><span>&#93;</span> = x_accel_url<span>&#40;</span>some_secret_url, pretty_name<span>&#41;</span><br />
&nbsp; render <span>:nothing</span> <span>=&gt;</span> <span>true</span><br />
<span>end</span></div></td></tr></tbody></table></div>
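<p>The helper above has one caveat. It strips only <tt>http://</tt>, but the location always rebuilds the URL as <tt>http://$download_host/...</tt>, so an <tt>https://</tt> URL would be mangled into a bogus host. In addition, <tt>gsub</tt> replaces the prefix anywhere in the string, not only at the start. A slightly stricter variant might look like the sketch below. The method name <tt>x_accel_redirect_path</tt> is our own assumption, as is the decision to raise on anything other than a plain-http URL; neither is part of the original code.</p>

```ruby
# Hypothetical stricter variant of x_accel_url from the post.
INTERNAL_REDIRECT_PREFIX = '/internal_redirect/'

def x_accel_redirect_path(url, file_name = nil)
  # The nginx location always proxies over plain HTTP, so reject other schemes.
  unless url.start_with?('http://')
    raise ArgumentError, "unsupported URL (expected http://): #{url}"
  end

  # Drop only the leading scheme, unlike gsub, which replaces every occurrence.
  path = INTERNAL_REDIRECT_PREFIX + url.delete_prefix('http://')
  # Anything after '?' becomes $args, i.e. the download file name.
  file_name ? "#{path}?#{file_name}" : path
end

puts x_accel_redirect_path('http://blah.com/secret/url', 'cool.pdf')
# prints /internal_redirect/blah.com/secret/url?cool.pdf
```

<p>In the controller, <tt>headers['X-Accel-Redirect'] = x_accel_redirect_path(some_secret_url, pretty_name)</tt> would replace the original call. This version still does not handle URLs that already carry their own query string. The location treats everything after <tt>?</tt> as the file name, so that query string would be lost.</p>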
<p>As you can see, nginx is a really powerful tool, and when you turn your creativity on you can make it even more powerful. Stay tuned for more <a href="http://kovyrin.net/tag/nginx-fu/">Nginx-Fu</a> posts.</p>

<p><a href="http://feedads.g.doubleclick.net/~a/s4KHA4LgENGAamK4PL09U7Ia3f8/0/da"><img src="http://feedads.g.doubleclick.net/~a/s4KHA4LgENGAamK4PL09U7Ia3f8/0/di" border="0" ismap="true"></img></a><br/>
<a href="http://feedads.g.doubleclick.net/~a/s4KHA4LgENGAamK4PL09U7Ia3f8/1/da"><img src="http://feedads.g.doubleclick.net/~a/s4KHA4LgENGAamK4PL09U7Ia3f8/1/di" border="0" ismap="true"></img></a></p><div>
<a href="http://feeds.feedburner.com/~ff/Homo-Adminus?a=cuG-f1auDlo:LUrpmTqfO4k:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/Homo-Adminus?i=cuG-f1auDlo:LUrpmTqfO4k:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/Homo-Adminus?a=cuG-f1auDlo:LUrpmTqfO4k:7Q72WNTAKBA"><img src="http://feeds.feedburner.com/~ff/Homo-Adminus?d=7Q72WNTAKBA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/Homo-Adminus?a=cuG-f1auDlo:LUrpmTqfO4k:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/Homo-Adminus?i=cuG-f1auDlo:LUrpmTqfO4k:V_sGLiPBpWU" border="0"></img></a>
</div><br/>PlanetMySQL Voting:
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=25095&vote=1&apivote=1">Vote UP</a> /
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=25095&vote=-1&apivote=1">Vote DOWN</a>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2010/06/24/nginx-fu-x-accel-redirect-from-remote-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Advanced Squid Caching in Scribd: Cache Invalidation Techniques</title>
		<link>http://feedproxy.google.com/~r/Homo-Adminus/~3/4ywVA01ppFY/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=advanced-squid-caching-in-scribd-cache-invalidation-techniques</link>
		<comments>http://feedproxy.google.com/~r/Homo-Adminus/~3/4ywVA01ppFY/#comments</comments>
		<pubDate>Sat, 29 May 2010 17:02:17 +0000</pubDate>
		<dc:creator>Alexey Kovyrin</dc:creator>
				<category><![CDATA[Admin-tips]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[HTCP]]></category>
		<category><![CDATA[invalidation]]></category>
		<category><![CDATA[My Projects]]></category>
		<category><![CDATA[networks]]></category>
		<category><![CDATA[Nginx]]></category>
		<category><![CDATA[plugin]]></category>
		<category><![CDATA[squid]]></category>

		<guid isPermaLink="false">http://kovyrin.net/?p=322</guid>
		<description><![CDATA[Having a reverse-proxy web cache as one of the major infrastructure elements brings many benefits for large web applications: it reduces the load on your application servers, reduces average response times on your site, etc. But there is one problem every developer runs into when working with such a cache &#8211; cached content invalidation.
It is a complex problem that usually consists of two smaller ones: invalidation of individual cache elements (you need to keep an eye on your data and invalidate cached pages when related data changes) and full cache purges (sometimes your site layout or page templates change and you need to purge all the cached pages to make sure users see the new layout). In this post I&#8217;d like to look at a few techniques we use at Scribd to solve cache invalidation problems.

So, the first problem &#8211; ongoing cache invalidation when content changes. This is actually a pretty simple task in squid: you just use the HTCP protocol and send CLR requests to your caching farm (we didn&#8217;t find any HTCP implementations, so we&#8217;ve implemented our own simple client that supports just one command).
Since we use haproxy to balance traffic in the cluster, it is hard to predict where we should send a purge request, so we fan those requests out to all cache servers.
To make sure cache purging won&#8217;t slow the site down, especially considering we need to do more than just a simple cache purge (submit documents to search indexes, etc.), we just spool a &#8220;document changed&#8221; request to a queue and then have a set of asynchronous processes that do all the work in the background.
Next, The Hard Problem &#8211; handling full cache purges without killing our backend servers with 5x-10x traffic (our normal hit ratio is ~90-95%).
We&#8217;ve spent a lot of time thinking about this problem, and the first idea we came up with was to have a loop process somewhere that would iterate over all the documents we have cached and purge them one by one&#8230; but that does not seem to be a practical solution when you have tens of millions of documents (and a few page versions per document), and obviously the solution would not scale with a constantly growing document corpus.
So we kept brainstorming and finally came up with an idea that works perfectly for us: what if we could take our traffic and define a function f(t) that returns the percentage of traffic that should be purged at any moment in time? So we did it &#8211; we&#8217;ve implemented an nginx module that versions our cache by assigning every cached page a revision (using a custom HTTP header plus Vary-based caching) and can slowly migrate the cache from one revision to another over a predefined period of time.
With that module we are able to do so-called &#8220;slow&#8221; cache purges that can take anywhere from a few minutes (which still helps to reduce the load spike generated by the hottest content) up to many hours (this is what we normally use) or days (we have never used this option, but it is definitely possible).
Here is an example 100% cache purge over an 8-hour interval:

[Daily and weekly hit ratio graphs]

As you can see, during those slow purges our cached pages are slowly updated without putting too much pressure on the backend. The cache hit ratio slowly degrades and then slowly returns to its normal level, but with our normal (6-8 hour) purges the hit ratio never gets lower than 65-70%. That saves us huge amounts of money, because we no longer need to keep 90% spare capacity just for cache purge load surges (we used to have lots of spare application cluster capacity before introducing this approach).



  
]]></description>
			<content:encoded><![CDATA[<p>Having a <a href="http://kovyrin.net/2008/10/25/advanced-squid-caching-for-rails-applications-preface/">reverse-proxy</a> web cache as one of the major infrastructure elements brings many benefits for large web applications: it reduces the load on your application servers, reduces average response times on your site, etc. But there is one problem every developer runs into when working with such a cache &#8211; <em>cached content invalidation</em>.</p>
<p>It is a complex problem that usually consists of two smaller ones: <em>invalidation of individual cache elements</em> (you need to keep an eye on your data and invalidate cached pages when related data changes) and <em>full cache purges</em> (sometimes your site layout or page templates change and you need to purge all the cached pages to make sure users see the new layout). In this post I&#8217;d like to look at a few techniques we use at <a href="http://www.scribd.com/">Scribd</a> to solve cache invalidation problems.</p>
<p><span></span></p>
<hr /><p>So, the <strong>first problem &#8211; ongoing cache invalidation when content changes</strong>. This is actually a pretty simple task in squid: you just use the <a href="http://www.htcp.org/">HTCP protocol</a> and send CLR requests to your caching farm (we didn&#8217;t find any HTCP implementations, so we&#8217;ve implemented <a href="http://github.com/kovyrin/htcp-ruby">our own simple client</a> that supports just one command).</p>
<p>Since we use <a href="http://haproxy.1wt.eu/">haproxy</a> to balance traffic in the cluster, it is hard to predict where we should send a purge request, so we fan those requests out to all cache servers.</p>
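<p>The rough Ruby sketch below shows what such a purge might look like on the wire. It builds an HTCP CLR request following our reading of RFC 2756 and fans it out over UDP to every cache server, as described above. It is not the API of the htcp-ruby client linked above. Several details are our own assumptions: the method names, the empty request-header block, and the idea that squid listens for HTCP on the standard UDP port 4827 (set with <tt>htcp_port</tt>, with <tt>htcp_clr_access</tt> allowing the sender). Check these details against the RFC and your squid configuration before relying on the sketch.</p>

```ruby
require 'socket'

HTCP_PORT   = 4827 # IANA-registered HTCP port
HTCP_OP_CLR = 4    # CLR opcode in RFC 2756

# COUNTSTR: 16-bit big-endian length followed by the raw bytes.
def countstr(str)
  [str.bytesize].pack('n') + str.b
end

# Build one HTCP/0.0 CLR request for url (sketch based on RFC 2756).
def htcp_clr_packet(url, trans_id)
  # SPECIFIER = METHOD, URI, VERSION, REQ-HDRS (left empty here).
  specifier = countstr('GET') + countstr(url) + countstr('HTTP/1.1') + countstr('')
  # CLR OP-DATA: 12 reserved bits plus a 4-bit REASON (0 = unspecified).
  op_data = [0].pack('n') + specifier
  # DATA: LENGTH(2) + OPCODE/RESPONSE(1) + RESERVED/F1/RR(1) + TRANS-ID(4) + OP-DATA.
  # F1 = 0 and RR = 0: a request that does not ask for a response.
  data = [8 + op_data.bytesize, HTCP_OP_CLR << 4, 0, trans_id].pack('nCCN') + op_data
  auth = [2].pack('n') # empty AUTH section: only its own LENGTH field
  header = [4 + data.bytesize + auth.bytesize, 0, 0].pack('nCC') # LENGTH, MAJOR, MINOR
  header + data + auth
end

# Send the same CLR to every cache server, since any of them may hold the page.
def purge_everywhere(url, cache_hosts)
  socket = UDPSocket.new
  cache_hosts.each_with_index do |host, i|
    socket.send(htcp_clr_packet(url, i + 1), 0, host, HTCP_PORT)
  end
ensure
  socket&.close
end
```

<p>For example, <tt>purge_everywhere('http://www.example.com/doc/1', %w[cache1 cache2])</tt> would send one CLR per server; the host names are placeholders. UDP gives no delivery guarantee.</p>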
<p>To make sure cache purging won&#8217;t slow the site down, especially considering we need to do more than just a simple cache purge (submit documents to search indexes, etc.), we just spool a &#8220;document changed&#8221; request to a queue and then have a set of <a href="http://github.com/kovyrin/loops">asynchronous processes</a> that do all the work in the background.</p>
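<p>The spooling idea itself is simple. The sketch below shows its shape, using Ruby's built-in <tt>Queue</tt> and a worker thread as stand-ins for the real message queue and the asynchronous loop processes. Everything here is hypothetical: the <tt>DocumentChangeSpool</tt> class, the method names, and the URL scheme. None of it is the API of the loops project linked above. The point is only that the web request enqueues a small &#8220;document changed&#8221; message and returns, while the purge and reindexing work happens later.</p>

```ruby
# Hypothetical sketch: the request path only enqueues; a worker does the slow work.
class DocumentChangeSpool
  def initialize(&handler)
    @queue = Queue.new
    @worker = Thread.new do
      # Queue#pop blocks until a message arrives; :stop ends the loop.
      while (doc_id = @queue.pop) != :stop
        handler.call(doc_id)
      end
    end
  end

  # Called from the web request: cheap and non-blocking.
  def document_changed(doc_id)
    @queue << doc_id
  end

  # Let the worker drain the queued messages, then stop it.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

purged = []
spool = DocumentChangeSpool.new do |doc_id|
  purged << "http://www.example.com/doc/#{doc_id}" # stand-in for the HTCP purges
  # ...submit the document to search indexes, etc.
end
spool.document_changed(42)
spool.shutdown
```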
<p>Next, <strong>The Hard Problem &#8211; handling full cache purges without killing our backend servers</strong> with 5x-10x traffic (our normal hit ratio is ~90-95%).</p>
<p>We&#8217;ve spent a lot of time thinking about this problem, and the first idea we came up with was to have a loop process somewhere that would iterate over all the documents we have cached and purge them one by one&#8230; but that does not seem to be a practical solution when you have tens of millions of documents (and a few page versions per document), and obviously the solution would not scale with a constantly growing document corpus.</p>
<p>So we kept brainstorming and finally came up with an idea that works perfectly for us: what if we could take our traffic and define a function <em>f(t)</em> that returns the percentage of traffic that should be purged at any moment in time? So we did it &#8211; we&#8217;ve implemented an nginx module that versions our cache by assigning every cached page a revision (<a href="http://kovyrin.net/2009/07/21/advanced-squid-caching-scribd-logged-in-users-complex-urls/">using a custom HTTP header plus Vary-based caching</a>) and can slowly migrate the cache from one revision to another over a predefined period of time.</p>
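<p>The post does not describe the module's internals. The following is therefore only one plausible way to implement such a gradual migration, sketched in Ruby rather than as an nginx module. In this sketch, f(t) grows linearly from 0 to 1 over the purge window, and each URL gets a stable pseudo-random bucket in [0, 1). A request is tagged with the new cache revision once f(t) passes its URL's bucket. The returned revision would then go into the custom header that squid varies on. The linear f(t), the CRC32-based bucketing, and all names are assumptions.</p>

```ruby
require 'zlib'

# Linear f(t): share of the cache that should already be on the new revision.
def purge_fraction(now, started_at, duration)
  return 0.0 if now <= started_at
  return 1.0 if now >= started_at + duration
  (now - started_at) / duration.to_f
end

# Stable bucket in [0, 1) per URL, so a migrated page stays migrated.
def url_bucket(url)
  Zlib.crc32(url) / 2.0**32
end

# Revision to put into the (hypothetical) header that squid varies on.
def cache_revision(url, now, started_at:, duration:, old_rev:, new_rev:)
  url_bucket(url) < purge_fraction(now, started_at, duration) ? new_rev : old_rev
end

eight_hours = 8 * 3600
rev = cache_revision('/doc/123', 1_000 + eight_hours / 2,
                     started_at: 1_000, duration: eight_hours,
                     old_rev: 'r41', new_rev: 'r42')
puts rev
```

<p>Hashing the URL, rather than rolling a die per request, keeps each page's decision stable. Once a URL crosses over to the new revision it stays there, so squid does not flip between two cached variants of the same page during the purge.</p>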
<p>With that module we are able to do so-called &#8220;slow&#8221; cache purges that can take anywhere from a few minutes (which still helps to reduce the load spike generated by the hottest content) up to many hours (this is what we normally use) or days (we have never used this option, but it is definitely possible).</p>
<p>Here is an example 100% cache purge over an 8-hour interval:</p>
<ol>
<li> Daily hit ratio graph:<br />
<a href="http://img.skitch.com/20100529-pkx64g6the9winqcnk6sigiyns.png" rel="shadowbox[post-322];player=img;"><img rel="shadowbox" src="http://img.skitch.com/20100529-pkx64g6the9winqcnk6sigiyns.preview.jpg" alt="day" /></a>
</li>
<li> Weekly hit ratio graph:<br />
<a href="http://img.skitch.com/20100529-nk2hyafgtbw1pc1nrkgbec8st3.png" rel="shadowbox[post-322];player=img;"><img rel="shadowbox" src="http://img.skitch.com/20100529-nk2hyafgtbw1pc1nrkgbec8st3.preview.jpg" alt="week" /></a>
</li>
</ol>
<p>As you can see, during these slow purges our cached pages are updated gradually without putting too much pressure on the backend. The cache hit ratio slowly degrades and then slowly returns to its normal level, and with our normal (6-8 hour) purges it never drops below 65-70%. This saves us huge amounts of money because we do not need to keep 90% spare capacity just for cache purge load surges (we used to have lots of spare application cluster capacity before introducing this approach).</p>

]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2010/05/29/advanced-squid-caching-in-scribd-cache-invalidation-techniques/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Installing Midnight Commander 4.7 on Mac OS X</title>
		<link>http://feedproxy.google.com/~r/Homo-Adminus/~3/n_CrAVJAsNM/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=installing-midnight-commander-4-7-on-mac-os-x</link>
		<comments>http://feedproxy.google.com/~r/Homo-Adminus/~3/n_CrAVJAsNM/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 22:34:00 +0000</pubDate>
		<dc:creator>Alexey Kovyrin</dc:creator>
				<category><![CDATA[Admin-tips]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[gnu]]></category>
		<category><![CDATA[Mac]]></category>
		<category><![CDATA[macos]]></category>
		<category><![CDATA[mc]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://kovyrin.net/?p=390</guid>
		<description><![CDATA[Another short post, mainly so I remember the procedure the next time I set up a new Mac. For those of my readers who do not know what Midnight Commander (aka mc) is, GNU Midnight Commander is a visual file manager heavily influenced by the Norton Commander file manager from the dark DOS ages :-) For more information, you can visit their web site. Now, on to the installation itself.
To install mc on a Mac OS X machine, you need MacPorts installed. The first thing to do is install some prerequisite libraries:
$ sudo port install libiconv slang2
Next, download the sources from their web site and unpack them. When the sources are ready, you can configure the build:
$ ./configure \
&#160; &#160; &#160; &#160; --prefix=/opt/mc \
&#160; &#160; &#160; &#160; --with-screen=slang \
&#160; &#160; &#160; &#160; --enable-extcharset \
&#160; &#160; &#160; &#160; --enable-charset \
&#160; &#160; &#160; &#160; --with-libiconv-prefix=/opt/local \
&#160; &#160; &#160; &#160; --with-slang-includes=/opt/local/include \
&#160; &#160; &#160; &#160; --with-slang-libs=/opt/local/lib
Then run the usual GNU-style build and install procedure:
$ make
........
$ sudo make install
Finally, add /opt/mc/bin to your PATH environment variable.



  
]]></description>
			<content:encoded><![CDATA[<p>Another short post, mainly so I remember the procedure the next time I set up a new Mac. For those of my readers who do not know what Midnight Commander (aka mc) is, <a href="http://www.midnight-commander.org">GNU Midnight Commander</a> is a visual file manager heavily influenced by the Norton Commander file manager from the dark DOS ages <img src="http://kovyrin.net/wp-includes/images/smilies/icon_smile.gif" alt=":-)" class="wp-smiley" />  For more information, you can visit <a href="http://www.midnight-commander.org">their web site</a>. Now, on to the installation itself.</p>
<p>To install mc on a Mac OS X machine, you need <a href="http://www.macports.org/">MacPorts</a> installed. The first thing to do is install some prerequisite libraries:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>$ sudo port install libiconv slang2</div></td></tr></tbody></table></div>
<p>Next, download the sources <a href="http://www.midnight-commander.org/downloads">from their web site</a> and unpack them. When the sources are ready, you can configure the build:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>$ ./configure \<br />
&nbsp; &nbsp; &nbsp; &nbsp; --prefix=/opt/mc \<br />
&nbsp; &nbsp; &nbsp; &nbsp; --with-screen=slang \<br />
&nbsp; &nbsp; &nbsp; &nbsp; --enable-extcharset \<br />
&nbsp; &nbsp; &nbsp; &nbsp; --enable-charset \<br />
&nbsp; &nbsp; &nbsp; &nbsp; --with-libiconv-prefix=/opt/local \<br />
&nbsp; &nbsp; &nbsp; &nbsp; --with-slang-includes=/opt/local/include \<br />
&nbsp; &nbsp; &nbsp; &nbsp; --with-slang-libs=/opt/local/lib</div></td></tr></tbody></table></div>
<p>Then run the usual GNU-style build and install procedure:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>$ make<br />
........<br />
$ sudo make install</div></td></tr></tbody></table></div>
<p>Finally, add <code><span>/opt/mc/bin</span></code> to your PATH environment variable.</p>
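<p>The usual way to do that is a line such as <code>export PATH=/opt/mc/bin:$PATH</code> in your shell profile. If you script the setup of new Macs, a small Python check like the sketch below can confirm that <code>mc</code> now resolves to the freshly built binary. The helper names are made up for illustration.</p>

```python
import os
import shutil

def prepend_path(path_value, directory):
    """Return a PATH string with `directory` first, dropping any duplicate entry."""
    parts = [p for p in path_value.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)

def find_mc(path_value):
    """Full path of the `mc` executable this PATH would pick, or None."""
    return shutil.which("mc", path=path_value)

if __name__ == "__main__":
    new_path = prepend_path(os.environ.get("PATH", ""), "/opt/mc/bin")
    print(new_path)
    print(find_mc(new_path))
```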

]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2010/02/03/installing-midnight-commander-4-7-on-mac-os-x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Enabling IPv6 Support in nginx</title>
		<link>http://feedproxy.google.com/~r/Homo-Adminus/~3/TpJuKctpPjk/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=enabling-ipv6-support-in-nginx</link>
		<comments>http://feedproxy.google.com/~r/Homo-Adminus/~3/TpJuKctpPjk/#comments</comments>
		<pubDate>Sat, 16 Jan 2010 09:39:44 +0000</pubDate>
		<dc:creator>Alexey Kovyrin</dc:creator>
				<category><![CDATA[admin]]></category>
		<category><![CDATA[Admin-tips]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[ipv6]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[networks]]></category>
		<category><![CDATA[Nginx]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://kovyrin.net/?p=362</guid>
		<description><![CDATA[This is going to be a really short post, but it could save someone an hour of their life.
So, you have nothing to do and have decided to play around with IPv6. Or maybe you happen to be the administrator of a web service that needs to support IPv6 connectivity, and you need to make your nginx server work nicely with this protocol.
The first thing you need to do is enable IPv6 in nginx by recompiling it with the --with-ipv6 configure option and reinstalling it. If you use a pre-built package, check whether your nginx already has this option enabled by running nginx -V.

The output should include the --with-ipv6 option in the configure arguments:
[root@node ~]# nginx -V
nginx version: nginx/0.7.64
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-46)
TLS SNI support disabled
configure arguments: --with-ipv6 ... --prefix=/opt/nginx
Once you have an nginx binary with IPv6 support, enable it by changing the listen directives in your configuration file.
If your server binds to all interfaces/IPs, you already have listen 80 or something similar in your file. Change those lines so that nginx binds to both IPv4 and IPv6 addresses. On Linux, whether a single [::] socket also accepts IPv4 depends on the net.ipv6.bindv6only sysctl. Also, nginx 1.3.4 and later turn ipv6only on by default, so on those versions you need separate listen 80; and listen [::]:80; lines.
listen [::]:80;
If you do not want to listen on IPv4 interfaces at all, use the ipv6only=on parameter:
listen [::]:443 default ipv6only=on;
To bind to a specific IPv6 address, use the same bracket notation:
listen [2607:f0d0:1004:2::2]:80;
After changing your configs and testing them, you need to restart (not reload) your nginx process. Then check your system&#8217;s port bindings to make sure everything works as expected:
[root@node ~]# netstat -nlp &#124; grep nginx
tcp &#160; 0 &#160; &#160;0 :::80 &#160; &#160; &#160; &#160;:::* &#160; &#160; &#160; &#160; LISTEN &#160; &#160;23817/nginx
tcp &#160; 0 &#160; &#160;0 :::443 &#160; &#160; &#160; :::* &#160; &#160; &#160; &#160; LISTEN &#160; &#160;23817/nginx
That&#8217;s it: now you can add AAAA records to your main domain name, or just create a dedicated ipv6.yourcompany.com sub-domain and show it to your friends :-)



  
]]></description>
			<content:encoded><![CDATA[<p>This is going to be a really short post, but it could save someone an hour of their life.</p>
<p>So, you have nothing to do and have decided to play around with <a href="http://en.wikipedia.org/wiki/IPv6">IPv6</a>. Or maybe you happen to be the administrator of a web service that needs to support IPv6 connectivity, and you need to make your <a href="http://nginx.org/">nginx</a> server work nicely with this protocol.</p>
<p>The first thing you need to do is enable IPv6 in nginx by recompiling it with the <code><span>--with-ipv6</span></code> configure option and reinstalling it. If you use a pre-built package, check whether your nginx already has this option enabled by running <code><span>nginx -V</span></code>.</p>
<p>The output should include the <code><span>--with-ipv6</span></code> option in the configure arguments:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>[root@node ~]# nginx -V<br />
nginx version: nginx/0.7.64<br />
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-46)<br />
TLS SNI support disabled<br />
configure arguments: --with-ipv6 ... --prefix=/opt/nginx</div></td></tr></tbody></table></div>
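<p>Once you have an nginx binary with IPv6 support, enable it by changing the <code><span>listen</span></code> directives in your configuration file.</p>
<p>If your server binds to all interfaces/IPs, you already have <code><span>listen 80</span></code> or something similar in your file. Change those lines so that nginx binds to both IPv4 and IPv6 addresses. On Linux, whether a single <code><span>[::]</span></code> socket also accepts IPv4 connections depends on the <code><span>net.ipv6.bindv6only</span></code> sysctl. Also, nginx 1.3.4 and later turn <code><span>ipv6only</span></code> on by default, so on those versions you need separate <code><span>listen 80;</span></code> and <code><span>listen [::]:80;</span></code> lines.</p>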
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>listen [::]:80;</div></td></tr></tbody></table></div>
<p>If you do not want to listen on IPv4 interfaces at all, use the <code><span>ipv6only=on</span></code> parameter:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>listen [::]:443 default ipv6only=on;</div></td></tr></tbody></table></div>
<p>To bind to a specific IPv6 address, use the same bracket notation:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>listen [2607:f0d0:1004:2::2]:80;</div></td></tr></tbody></table></div>
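<p>The only tricky part of this syntax is the square brackets around IPv6 literals. A small Python sketch that generates <code>listen</code> lines makes the rule explicit. The helper is hypothetical and uses the standard <code>ipaddress</code> module to validate addresses. It mirrors the <code>default</code> parameter used above; as far as we know, later nginx versions call it <code>default_server</code>.</p>

```python
import ipaddress

def listen_directive(port, address="::", ipv6only=False, default=False):
    """Build an nginx `listen` line; IPv6 literals must be written in [brackets]."""
    ip = ipaddress.ip_address(address)  # raises ValueError on malformed input
    if ip.version == 6:
        host = f"[{ip.compressed}]"
    elif ipv6only:
        raise ValueError("ipv6only only makes sense for IPv6 addresses")
    else:
        host = str(ip)
    parts = [f"{host}:{port}"]
    if default:
        parts.append("default")  # spelled default_server in later nginx versions
    if ipv6only:
        parts.append("ipv6only=on")
    return "listen " + " ".join(parts) + ";"
```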
<p>After changing your configs and testing them, you need to restart (not reload) your nginx process. Then check your system&#8217;s port bindings to make sure everything works as expected:</p>
<div><table cellspacing="0" cellpadding="0"><tbody><tr><td><div>[root@node ~]# netstat -nlp | grep nginx<br />
tcp &nbsp; 0 &nbsp; &nbsp;0 :::80 &nbsp; &nbsp; &nbsp; &nbsp;:::* &nbsp; &nbsp; &nbsp; &nbsp; LISTEN &nbsp; &nbsp;23817/nginx<br />
tcp &nbsp; 0 &nbsp; &nbsp;0 :::443 &nbsp; &nbsp; &nbsp; :::* &nbsp; &nbsp; &nbsp; &nbsp; LISTEN &nbsp; &nbsp;23817/nginx</div></td></tr></tbody></table></div>
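<p>When you check more than a couple of boxes, you can parse the netstat output instead of eyeballing it. This Python sketch assumes the Linux <code>netstat -nlp</code> column layout shown above: Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name. The function names are hypothetical.</p>

```python
def nginx_listeners(netstat_output, program="nginx"):
    """(proto, local_address) pairs for LISTEN sockets owned by `program`."""
    listeners = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        # Newer netstat may print "1234/nginx: master", hence the rstrip(":")
        if fields[6].split("/", 1)[-1].rstrip(":") == program:
            listeners.append((fields[0], fields[3]))
    return listeners

def ipv6_ports(listeners):
    """Ports bound on an IPv6 address (IPv6 hosts contain ':', IPv4 hosts do not)."""
    ports = set()
    for _, address in listeners:
        host, _, port = address.rpartition(":")
        if ":" in host:
            ports.add(int(port))
    return ports
```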
<p>That&#8217;s it: now you can add <a href="http://en.wikipedia.org/wiki/IPv6_Addresses#IPv6_addresses_in_the_Domain_Name_System">AAAA</a> records to your main domain name, or just create a dedicated <a href="http://ipv6.scribd.com">ipv6</a>.<a href="http://ipv6.google.com">yourcompany</a>.<a href="http://ipv6.netflix.com">com</a> sub-domain and show it to your friends <img src="http://kovyrin.net/wp-includes/images/smilies/icon_smile.gif" alt=":-)" class="wp-smiley" /> </p>

]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2010/01/16/enabling-ipv6-support-in-nginx/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Advanced Squid Caching in Scribd: Hardware + Software Used</title>
		<link>http://feedproxy.google.com/~r/Homo-Adminus/~3/s8GVmmmES8s/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=advanced-squid-caching-in-scribd-hardware-software-used</link>
		<comments>http://feedproxy.google.com/~r/Homo-Adminus/~3/s8GVmmmES8s/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 05:23:18 +0000</pubDate>
		<dc:creator>Alexey Kovyrin</dc:creator>
				<category><![CDATA[Admin-tips]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[haproxy]]></category>
		<category><![CDATA[hardware]]></category>
		<category><![CDATA[networks]]></category>
		<category><![CDATA[Nginx]]></category>
		<category><![CDATA[scribd]]></category>
		<category><![CDATA[squid]]></category>

		<guid isPermaLink="false">http://kovyrin.net/?p=298</guid>
		<description><![CDATA[After the previous post in this caching-related series I received many questions about the hardware and software configuration of our servers, so in this post I&#8217;ll describe our servers&#8217; configs and the motivation behind them.

Hardware Configuration
Since in our setup Squid uses a single-process model (with asynchronous request processing), there was no point in ordering multi-core CPUs for our boxes. And since we have lots of pages on the site and the cache is pretty huge, all the servers ended up being heavily I/O bound. Considering these facts, we decided to use the following hardware specs for the servers:
CPU: One pretty cheap dual-core Intel Xeon 5148 (no need for more cores or really high frequencies: even these CPUs average ~1% load)
RAM: 8 GB (mainly to reduce I/O pressure by caching hot content in RAM)
Disks: 4 x small 15k RPM SAS drives in JBOD mode (no RAID: we tried all kinds of RAID configs and none of them helped with I/O performance)
So, once again: nothing is as important in a squid box as I/O throughput. 
Here is a sample CPU load graph from one of the boxes:

Software Configuration
This could be a long story, but in short, our experience with different squid versions was the following.
First, when I started working on this caching project, I just installed squid using Debian&#8217;s apt-get install squid command. As a result we got an ancient squid 2.6 release. For reasons still unclear to me, it was painfully slow at I/O and leaked file descriptors, so after a few hours under production load the box would simply stop processing requests.
When that approach failed, I decided to go to the squid web site, download the latest production release and build it from source (yes, we do this all the time when an OS vendor ships releases that are too old or buggy). The result: a freaking fast and stable squid 3.0 that worked flawlessly for about 5 months.
A few months ago we found out about the stale-* extensions available in squid 2.7, and I started wondering whether we should switch our perfectly stable 3.0 setup to 2.7. Some time later I decided to use the Vary HTTP header in our caching architecture, and then I found out that vary-caching was correctly implemented only in 2.7. Since 3.0 is a complete rewrite of the 2.X branch, vary-caching is not yet implemented there (or not in the way we&#8217;d want it to be).
So, the final result: we&#8217;re currently using a custom-built Squid 2.7STABLE6 and are really happy with it. It is a stable, fast and feature-rich caching proxy server.
Caching Cluster Configuration
Obviously we have more than one squid server at Scribd, which makes using them a bit harder than with a single box, where you&#8217;d send all requests to one IP:port pair. We tried round-robin balancing across the squid boxes plus ICP-based neighbor checks, but that added latency to our responses. So we put haproxy between nginx and the squid farm as a load balancer and set up URL-hash-based balancing to distribute requests evenly among the squid backends.
This scheme worked pretty well, but it had one serious problem: if one squid box went down, haproxy would quickly detect it and remove it from the pool. Removing a server from the pool completely changes the hash key space, so most cached URLs suddenly map to a different squid box and effectively become cache misses. To solve this, we developed an nginx balancer module that performs consistent hashing of URLs, and we&#8217;re now testing it in production. What is really nice about this module is that it removes one hop from the chain of HTTP proxies between the site and a user.
So, this was a short description of the hardware we use for our caching cluster and why we use it. In the next posts of this series we&#8217;ll talk about cache control and object invalidation.



  
]]></description>
			<content:encoded><![CDATA[<p>After <a href="http://kovyrin.net/2009/07/21/advanced-squid-caching-scribd-logged-in-users-complex-urls/">the previous post in this caching-related series</a> I received many questions about the hardware and software configuration of our servers, so in this post I&#8217;ll describe our servers&#8217; configs and the motivation behind them.</p>
<h3>Hardware Configuration</h3>
<p>Since in our setup Squid uses a single-process model (with asynchronous request processing), there was no point in ordering multi-core CPUs for our boxes. And since we have lots of pages on the site and the cache is pretty huge, all the servers ended up being heavily I/O bound. Considering these facts, we decided to use the following hardware specs for the servers:</p>
<p><b>CPU:</b> One pretty cheap dual-core Intel Xeon 5148 (no need for more cores or really high frequencies: even these CPUs average ~1% load)<br />
<b>RAM:</b> 8 GB (mainly to reduce I/O pressure by caching hot content in RAM)<br />
<b>Disks: </b> 4 x small 15k RPM SAS drives in JBOD mode (no RAID: we tried all kinds of RAID configs and none of them helped with I/O performance)</p>
<p>So, once again: <i>nothing is as important in a squid box as I/O throughput</i>. </p>
<p>Here is a sample CPU load graph from one of the boxes:</p>
<p><a href="http://kovyrin.net/wp-content/uploads/2009/08/squid-cpu-graph.png"><img src="http://kovyrin.net/wp-content/uploads/2009/08/squid-cpu-graph-300x139.png" alt="squid-cpu-graph" title="squid-cpu-graph" width="300" height="139" class="aligncenter size-medium wp-image-305" /></a></p>
<h3>Software Configuration</h3>
<p>This could be a long story, but in short, our experience with different squid versions was the following.</p>
<p>First, when I started working on this caching project, I just installed squid using Debian&#8217;s <code><span>apt-get install squid</span></code> command. As a result we got an ancient squid 2.6 release. For reasons still unclear to me, it was painfully slow at I/O and leaked file descriptors, so after a few hours under production load the box would simply stop processing requests.</p>
<p>When that approach failed, I decided to go to the <a href="http://www.squid-cache.org/">squid web site</a>, download the latest production release and build it from source (yes, we do this all the time when an OS vendor ships releases that are too old or buggy). The result: a freaking fast and stable squid 3.0 that worked flawlessly for about 5 months.</p>
<p>A few months ago we found out about the <a href="http://www.mnot.net/blog/2007/12/12/stale">stale-* extensions</a> available in squid 2.7, and I started wondering whether we should switch our perfectly stable 3.0 setup to 2.7. Some time later I decided to use the Vary HTTP header in our caching architecture, and then I found out that vary-caching was correctly implemented only in 2.7. Since 3.0 is a complete rewrite of the 2.X branch, vary-caching is not yet implemented there (or not in the way we&#8217;d want it to be).</p>
<p>So, the final result: we&#8217;re currently using a custom-built Squid 2.7STABLE6 and are really happy with it. It is a stable, fast and feature-rich caching proxy server.</p>
<h3>Caching Cluster Configuration</h3>
<p>Obviously we have more than one squid server at Scribd, which makes using them a bit harder than with a single box, where you&#8217;d send all requests to one IP:port pair. We tried round-robin balancing across the squid boxes plus ICP-based neighbor checks, but that added latency to our responses. So we put haproxy between nginx and the squid farm as a load balancer and set up URL-hash-based balancing to distribute requests evenly among the squid backends.</p>
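<p>For readers unfamiliar with URL-hash balancing, here is a generic Python sketch of the idea with made-up backend names. It is a plain hash-modulo scheme rather than haproxy&#8217;s exact algorithm. Still, it shows the property we relied on (the same URL always goes to the same squid box), and it lets you measure how many URLs change boxes when the pool shrinks.</p>

```python
import hashlib

def url_hash(url):
    """Stable 64-bit hash of a URL (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.md5(url.encode("utf-8")).digest()[:8], "big")

def pick_backend(url, backends):
    """Plain URL-hash balancing: the same URL always lands on the same squid box."""
    return backends[url_hash(url) % len(backends)]

def moved_fraction(urls, before, after):
    """Share of URLs whose backend changes when the pool changes from `before` to `after`."""
    moved = sum(pick_backend(u, before) != pick_backend(u, after) for u in urls)
    return moved / len(urls)

if __name__ == "__main__":
    urls = [f"/doc/{i}" for i in range(10000)]
    print(moved_fraction(urls, ["sq1", "sq2", "sq3", "sq4"], ["sq1", "sq2", "sq3"]))
```

<p>With four boxes and one removed, a modulo scheme like this one moves roughly three quarters of all URLs to a different box.</p>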
<p>This scheme worked pretty well, but it had one serious problem: if one squid box went down, haproxy would quickly detect it and remove it from the pool&#8230; Removing a server from the pool completely changes the hash key space, so most cached URLs suddenly map to a different squid box and effectively become cache misses. To solve this, we developed an nginx balancer module that performs consistent hashing of URLs, and we&#8217;re now testing it in production. What is really nice about this module is that it removes one hop from the chain of HTTP proxies between the site and a user.</p>
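<p>Our nginx module&#8217;s code is not shown here, so this is only an illustration of the technique. It is a textbook consistent-hashing ring with virtual nodes in Python, with hypothetical backend names. Each box owns many points on a hash ring, and a URL goes to the first point after its own hash. As a result, removing a box only moves the URLs that box used to own.</p>

```python
import bisect
import hashlib

def _point(key):
    """Position of a key on the 64-bit hash ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

class HashRing:
    """Consistent hashing with virtual nodes: a generic sketch, not Scribd's nginx module."""

    def __init__(self, backends, vnodes=100):
        # Each backend gets `vnodes` points so load spreads evenly around the ring.
        self._ring = sorted((_point(f"{b}#{i}"), b) for b in backends for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def pick(self, url):
        # The first ring point after the URL's hash owns it; wrap around past the end.
        idx = bisect.bisect_right(self._points, _point(url)) % len(self._points)
        return self._ring[idx][1]

if __name__ == "__main__":
    urls = [f"/doc/{i}" for i in range(10000)]
    before = HashRing(["sq1", "sq2", "sq3", "sq4"])
    after = HashRing(["sq1", "sq2", "sq3"])
    # Only the URLs that sq4 used to own change boxes.
    print(sum(before.pick(u) != after.pick(u) for u in urls) / len(urls))
```

<p>Running it shows that only about a quarter of the URLs move when one of four boxes disappears, whereas plain hash-modulo balancing moves most of them.</p>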
<p>So, this was a short description of the hardware we use for our caching cluster and why we use it. In the next posts of this series we&#8217;ll talk about cache control and object invalidation.</p>

]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2009/08/04/advanced-squid-caching-in-scribd-hardware-software-used/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

