Archive for the ‘Nginx’ Category

451 CAOS Links. 2011.12.02

December 2nd, 2011

Talend delivers v5. Zentyal raises series A. The TCO of OSS. And more.

# Talend announced version 5 of its data integration suite, adding business process management capabilities via an OEM relationship with BonitaSoft. Yves De Montcheuil explained the name changes in version 5.

# Zentyal closed a series A venture capital funding of over $1m by Open Ocean Capital.

# The London School of Economics released a report on the total cost of ownership of open source software.

# Couchbase announced the availability of the Couchbase Hadoop Connector, developed in conjunction with Cloudera.

# Rackspace announced the private beta of Rackspace MySQL Cloud Database.

# The debate over the role of open source foundations in the Git era continued, including a follow-up by the instigator, Mikeal Rogers, a rallying cry for autonomy from Ceki Gülcü, and Simon Phipps warning about throwing the baby out with the bathwater.

# Marco Abis is stepping down as CEO of Sourcesense.

# NGINX usage has grown almost 300% over the last year, according to Netcraft figures discussed by Royal Pingdom.

# The Wireless Innovation Forum announced the formation of the Open Source Framework for Commercial Baseband Software project.



CAOS Theory Podcast 2011.10.28

October 28th, 2011

Topics for this podcast:

*Opscode Chef extends to Windows for more enterprise devops
*Black Duck continues growth, gains new funding
*Cloudant expands NoSQL database focus, customers
*New open source Web server and vendor Nginx arrives
*The downside of Microsoft’s Android dollars

iTunes or direct download (27:35, 4.7MB)



Installing Nginx With PHP5 (And PHP-FPM) And MySQL Support On Ubuntu 11.10

October 27th, 2011


Nginx (pronounced "engine x") is a free, open-source, high-performance HTTP server. Nginx is known for its stability, rich feature set, simple configuration, and low resource consumption. This tutorial shows how you can install Nginx on an Ubuntu 11.10 server with PHP5 support (through PHP-FPM) and MySQL support.



Running phpMyAdmin On Nginx (LEMP) On Debian Squeeze/Ubuntu 11.04

October 4th, 2011


The phpMyAdmin package from the Debian/Ubuntu repositories comes with configuration files for Apache and Lighttpd, but not for nginx. This tutorial shows how you can use the Debian Squeeze/Ubuntu 11.04 phpMyAdmin package in an nginx vhost. Nginx is an HTTP server that uses far fewer resources than Apache and delivers pages a lot faster, especially static files.



Installing Nginx With PHP5 (And PHP-FPM) And MySQL Support On Fedora 15

October 2nd, 2011


Nginx (pronounced "engine x") is a free, open-source, high-performance HTTP server. Nginx is known for its stability, rich feature set, simple configuration, and low resource consumption. This tutorial shows how you can install Nginx on a Fedora 15 server with PHP5 support (through PHP-FPM) and MySQL support.



Installing Nginx With PHP5 (And PHP-FPM) And MySQL Support On CentOS 6.0

August 14th, 2011


Nginx (pronounced "engine x") is a free, open-source, high-performance HTTP server. Nginx is known for its stability, rich feature set, simple configuration, and low resource consumption. This tutorial shows how you can install Nginx on a CentOS 6.0 server with PHP5 support (through PHP-FPM) and MySQL support.



Installing Nginx With PHP5 And MySQL Support On CentOS 5.6

July 27th, 2011


Nginx (pronounced "engine x") is a free, open-source, high-performance HTTP server. Nginx is known for its stability, rich feature set, simple configuration, and low resource consumption. This tutorial shows how you can install Nginx on a CentOS 5.6 server with PHP5 support (through FastCGI) and MySQL support.



CB1 Ubuntu 10.10 Linux Development Setup

October 17th, 2010

I use a MacBook Pro for my day-to-day operations here at CB1, INC. I’m a huge believer that a development environment should mimic the production environment, so I find myself running a couple of virtual machines in VMware Fusion.

The following guide is a reference for myself as well as a possibly helpful resource for setting up your own Linux development environment. Here’s a checklist of the tasks to perform and software to install:

Operating System

Start by installing Ubuntu 10.10 Desktop (or server). I’m not going to cover installing Ubuntu since there are already several other resources out there. Once Ubuntu is installed, open a Terminal:

user@ubuntu:~# sudo passwd root
[sudo] password for user: <type your password>
Enter new UNIX password: <type new root password>
Retype new UNIX password: <type new root password again>
passwd: password updated successfully

user@ubuntu:~# sudo apt-get update
user@ubuntu:~# sudo apt-get upgrade

user@ubuntu:~# mkdir ~/src

New File Permissions

user@ubuntu:~# sudo pico /etc/profile

Change the umask value from 022 to 002. This setting controls the default permissions when a new file or directory is created. This is mostly useful when managing files over Samba.
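To make the effect of that change concrete: a umask lists the permission bits that are cleared from newly created files and directories. The following Python sketch only illustrates the arithmetic and is not part of the setup itself. It assumes the usual requested defaults of 666 for files and 777 for directories, and `effective_mode` is a hypothetical helper name.

```python
def effective_mode(requested: int, umask: int) -> str:
    """Return the octal permission string left after the umask clears its bits."""
    return format(requested & ~umask, "03o")

# Typical requested modes: 0o666 for new files, 0o777 for new directories.
for mask in (0o022, 0o002):
    print(format(mask, "03o"),
          "file:", effective_mode(0o666, mask),
          "dir:", effective_mode(0o777, mask))
```

With 002 the group keeps write access on new files and directories, which is what makes shared editing over Samba more convenient.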

Network IP Addresses

Optionally, you may want to assign a static IP address. I set up one IP address for Apache and another for nginx.

user@ubuntu:~# sudo pico /etc/network/interfaces

The following is a reference for adding two static IPs. Change the IPs to meet your needs.

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
	address 192.168.1.200
	netmask 255.255.255.0
	gateway 192.168.1.1

auto eth0:1
iface eth0:1 inet static
	address 192.168.1.201
	netmask 255.255.255.0

user@ubuntu:~# sudo /etc/init.d/networking restart

Packages

Here’s a bunch of packages that will set up compilers, version control, Java, MySQL, Apache, PHP, Memcache, Gearman, Samba, and more.

user@ubuntu:~# sudo apt-get install build-essential autotools-dev autoconf \
 autoconf2.13 openssh-server ethtool traceroute openjdk-6-jdk \
 mysql-server-5.1 bzr subversion subversion-tools ntp ntpdate \
 libpcre3-dev libevent-dev automake bison libtool scons  g++ \
 ncurses-dev libreadline-dev libz-dev libssl-dev  libcurl4-openssl-dev \
 ruby rubygems libzip-ruby1.8 libzip-ruby1.9.1 python-dev ruby-dev \
 libdbus-glib-1-dev uuid-dev libpam0g libpam0g-dev gperf samba valgrind \
 libxml2-dev libfreetype6-dev curl libcurl4-openssl-dev \
 libjpeg62-dev libpng12-dev sqlite3 libsqlite3-dev git-core \
 postgresql postgis gearman libgearman-dev php5 \
 libapache2-mod-php5 php5-dev memcached php5-memcached \
 php5-curl php5-gd php5-mysql php5-pgsql php-apc \
 php5-xdebug php5-fpm libapache2-mod-fastcgi

MySQL

During the package install above, MySQL will prompt you for the root password.

After the packages are installed, we need to allow remote MySQL connections.

user@ubuntu:~# sudo pico /etc/mysql/my.cnf

Comment out the bind-address line.

# bind-address          = 127.0.0.1

SSH

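Next, you may optionally increase the connection keep-alive interval for remote SSH connections. Timeouts aren’t really an issue when SSH’ing into a local VM, but this really helps for remote installs. Note that the output redirection has to run with root privileges, so pipe through sudo tee rather than using sudo echo with >>.

user@ubuntu:~# echo "ClientAliveInterval 60" | sudo tee -a /etc/ssh/sshd_config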

Samba

Samba allows me to drag and drop files between my Mac and Linux VM. I personally do not enable/install Samba on production servers.

user@ubuntu:~# sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.orig
user@ubuntu:~# sudo pico /etc/samba/smb.conf

You can add a share such as the following:

[ubuntu]
        force user = <your username>
        writeable = yes
        create mode = 644
        path = /home/<your username>
        directory mode = 755
        force group = <your username>

Then create yourself a Samba user:

user@ubuntu:~# sudo smbpasswd -a <your username>

Apache 2

Apache is mostly configured out of the box, but I like to enable rewrite and SSL so I can test production features.

user@ubuntu:~# sudo a2enmod rewrite
user@ubuntu:~# sudo a2enmod ssl

Since I’m going to run Apache and nginx, I’m going to bind Apache to eth0.

user@ubuntu:~# sudo pico /etc/apache2/ports.conf

NameVirtualHost 192.168.1.200:80
Listen 192.168.1.200:80

<IfModule mod_ssl.c>
    Listen 192.168.1.200:443
</IfModule>

Now we need to add eth0’s IP to the default host:

user@ubuntu:~# sudo pico /etc/apache2/sites-enabled/000-default

<VirtualHost 192.168.1.200:80>
        ServerAdmin webmaster@localhost

        DocumentRoot /var/www
        <Directory />
                Options FollowSymLinks
                AllowOverride None
        </Directory>
        <Directory /var/www/>
                Options Indexes FollowSymLinks MultiViews
                AllowOverride None
                Order allow,deny
                allow from all
        </Directory>

        ErrorLog ${APACHE_LOG_DIR}/error.log
        LogLevel warn
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Restart Apache for the changes to take effect.

user@ubuntu:~# sudo apache2ctl restart

Gearman

By default, Gearman uses memory to store pending jobs in the queue, but I prefer to use MySQL for persistent storage. To do this, first create the queue database and table:

user@ubuntu:~# mysqladmin -uroot -p123123 create gearman
user@ubuntu:~# mysql -uroot -p123123 -e "CREATE TABLE gearman.gearman_queue (
  unique_key VARCHAR(64) NOT NULL,
  function_name VARCHAR(255) NULL,
  priority INT NULL,
  data LONGBLOB NULL,
  PRIMARY KEY (unique_key)
) ENGINE = InnoDB;"

Next update the init script to tell Gearman to use the database:

user@ubuntu:~# sudo mv /etc/default/gearman-job-server /etc/default/gearman-job-server.bak
user@ubuntu:~# echo "PARAMS=\"-q libdrizzle --libdrizzle-host=127.0.0.1" \
   "--libdrizzle-user=root --libdrizzle-password=123123 --libdrizzle-db=gearman" \
   "--libdrizzle-table=gearman_queue --libdrizzle-mysql\"" | sudo tee /etc/default/gearman-job-server
user@ubuntu:~# sudo /etc/init.d/gearman-job-server restart

Gearman PHP Extension

We need to download and install the Gearman PHP extension if we want to write PHP workers or post jobs to the queue.

user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://pecl.php.net/get/gearman-0.7.0.tgz
user@ubuntu:~/src# tar xzf gearman-0.7.0.tgz
user@ubuntu:~/src# rm gearman-0.7.0.tgz package.xml
user@ubuntu:~/src# cd gearman-0.7.0
user@ubuntu:~/src/gearman-0.7.0# phpize
user@ubuntu:~/src/gearman-0.7.0# ./configure
user@ubuntu:~/src/gearman-0.7.0# make
user@ubuntu:~/src/gearman-0.7.0# sudo make install

Next, add the config file to load the Gearman PHP extension:

user@ubuntu:~# echo "extension=gearman.so" | sudo tee -a /etc/php5/conf.d/gearman.ini

memcached PHP Extension

Since we have memcached and the memcached PHP extension installed, let’s use it for storing session data:

user@ubuntu:~/src# echo "session.save_handler = memcached
session.save_path = \"127.0.0.1:11211\"" | sudo tee -a /etc/php5/conf.d/memcached.ini

nginx

nginx is a web server that is really fast. I use nginx as my primary development web server unless I’m running a web app that only works with Apache. You can choose to install nginx from a package, but I like to live life on the bleeding edge, so I’ll be building nginx from source. To install nginx, we need to download the source, compile it, install it, and configure it.

user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://nginx.org/download/nginx-0.8.52.tar.gz
user@ubuntu:~/src# tar xzf nginx-0.8.52.tar.gz
user@ubuntu:~/src# rm nginx-0.8.52.tar.gz
user@ubuntu:~/src# cd nginx-0.8.52
user@ubuntu:~/src/nginx-0.8.52# sudo mkdir /var/lib/nginx
user@ubuntu:~/src/nginx-0.8.52# ./configure \
    --sbin-path=/usr/sbin \
    --conf-path=/etc/nginx/nginx.conf \
    --error-log-path=/var/log/nginx/error.log \
    --pid-path=/var/run/nginx.pid \
    --lock-path=/var/lock/nginx.lock \
    --http-log-path=/var/log/nginx/access.log \
    --http-client-body-temp-path=/var/lib/nginx/body \
    --http-proxy-temp-path=/var/lib/nginx/proxy \
    --http-fastcgi-temp-path=/var/lib/nginx/fastcgi \
    --http-uwsgi-temp-path=/var/lib/nginx/uwsgi \
    --http-scgi-temp-path=/var/lib/nginx/scgi \
    --with-http_stub_status_module
user@ubuntu:~/src/nginx-0.8.52# make
user@ubuntu:~/src/nginx-0.8.52# sudo make install

user@ubuntu:~# sudo pico /etc/init.d/nginx

Here’s the init script that will start nginx for us:

#! /bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/nginx
NAME=nginx
DESC=nginx
test -x $DAEMON || exit 0
case "$1" in
  start)
        echo -n "Starting $DESC: "
        start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
        echo "$NAME."
        ;;
  stop)
        echo -n "Stopping $DESC: "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        echo "$NAME."
        ;;
  restart|force-reload)
        echo -n "Restarting $DESC: "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        sleep 1
        start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
        echo "$NAME."
        ;;
  reload)
        echo -n "Reloading $DESC configuration: "
        start-stop-daemon --stop --signal HUP --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        echo "$NAME."
        ;;
  *)
        echo "Usage: /etc/init.d/$NAME {start|stop|restart|reload|force-reload}" >&2
        exit 1
        ;;
esac
exit 0

Now we need to make the init script executable and enable it:

user@ubuntu:~# sudo chmod +x /etc/init.d/nginx
user@ubuntu:~# sudo update-rc.d nginx defaults

user@ubuntu:~# sudo pico /etc/nginx/nginx.conf

Here’s a starter nginx.conf with some basic settings:

user  www-data www-data;
worker_processes  2;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile                on;
    tcp_nodelay             on;
    tcp_nopush              on;
    keepalive_timeout       65;
    server_name_in_redirect off;
    server_tokens           off;

    add_header Strict-Transport-Security max-age=1800;
    add_header X-Frame-Options deny;

    gzip            on;
    gzip_buffers    16 8k;
    gzip_comp_level 9;
    gzip_types      text/plain text/xml application/x-javascript text/css;

    include /etc/nginx/sites/*;
}

user@ubuntu:~# sudo mkdir /etc/nginx/sites
user@ubuntu:~# sudo pico /etc/nginx/sites/default

Now we need to set up a default host that supports PHP (via PHP-FPM, PHP’s FastCGI Process Manager) and we want the default host to use the eth0:1 IP address:

server {
    listen       192.168.1.201:80 default;
    server_name  _;
    root   /var/www;
    index  index.php;
    location / {
        if (!-e $request_filename) {
            rewrite ^/(.*)$ /index.php?q=$1 last;
            break;
        }
    }
    location ~ \.php$ {
        fastcgi_pass   127.0.0.1:9000;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME  /var/www$fastcgi_script_name;
        include        fastcgi_params;
    }
    location ~* (\.(htaccess|engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)|code-style\.pl|Entries.*|Repository|Root|Tag|Template)$ {
        deny all;
    }
}

After the config files are good to go, start nginx:

user@ubuntu:~# sudo /etc/init.d/nginx start

Service Names

I also like to add service names so I can see what ports are in use when I run netstat. I added drizzle and Cassandra for fun despite this post not including them.

user@ubuntu:~# sudo cp /etc/services /etc/services.bak
user@ubuntu:~# su
root@ubuntu:~# echo "drizzle     4427/tcp
drizzle     4427/udp
memcached   11211/tcp
memcached   11211/udp
gearmand    4730/tcp
gearmand    4730/udp
fastcgi     9000/tcp
cassandra   9160/tcp" >> /etc/services
root@ubuntu:~# exit

Android SDK

The Android SDK is unfortunately not available as a package, so you’ll need to download it from the Android Developer site: http://developer.android.com/sdk/index.html.

user@ubuntu:~# wget http://dl.google.com/android/android-sdk_r07-linux_x86.tgz
user@ubuntu:~# tar xzf android-sdk_r07-linux_x86.tgz
user@ubuntu:~# rm android-sdk_r07-linux_x86.tgz
user@ubuntu:~# sudo mv android-sdk-linux_x86 /usr/local
user@ubuntu:~# sudo find /usr/local/android-sdk-linux_x86 -type d -exec chmod 777 {} \;

You’ll need to add the Android SDK path near the top of your ~/.bash_profile or ~/.bashrc:

export PATH=${PATH}:/usr/local/android-sdk-linux_x86/tools

To manage your Android SDK packages and virtual devices, you’ll need to run the android app:

user@ubuntu:~# android

First go to Available Packages and download the version 1.6 and 2.2 Android SDK packages. You can also choose to download the documentation, samples, and Google APIs.

Downloading the package may take several minutes. You don’t have to create a virtual device right now if you are planning on installing Appcelerator’s Titanium platform. You can exit the Android app when you’re done.

Desktop Apps

If you’re running Ubuntu Desktop, there are a couple of handy apps I install. The first is Google Chrome, which can be downloaded directly from the Google Chrome download page.

I find KCachegrind and GHex to be useful:

user@ubuntu:~# sudo apt-get install kcachegrind ghex

Appcelerator Titanium

Titanium is an awesome platform for developing desktop applications for Linux, Mac OS X, and Windows as well as mobile apps for iPhone and Android. We use Titanium Developer to create Titanium projects. Begin by downloading the 64-bit version of Titanium:

user@ubuntu:~# wget -O titanium.tgz http://www.appcelerator.com/download-linux64

There’s also a 32-bit version available at http://www.appcelerator.com/download-linux32.

Next we unpack Titanium Developer and move it to a safe place:

user@ubuntu:~# tar xzf titanium.tgz
user@ubuntu:~# rm titanium.tgz

Next you need to run the installer by double-clicking the Titanium Developer executable. Run the executable and then click the Install button. You can try installing to /opt/titanium, but you might need root privileges.

Next, there are a few issues with outdated libraries, so we simply delete them:

user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgobject-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libglib-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgio-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgthread-2.0.*

Titanium Developer also complains if /bin/java doesn’t exist, so create a quick link:

user@ubuntu:~# sudo ln -s /usr/bin/java /bin/java

Relaunch Titanium Developer and enter your login credentials. If you don’t have a login, you can get a free account.

After signing in, you may notice there are some updates available in the upper right corner of the window. Click in the box and the updates will be downloaded and installed.

Optionally you can create a launcher icon for your GNOME panel. Don’t forget to escape spaces in the command with a backslash!

Finishing Touches

Lastly, I like to re-arrange my desktop to maximize my coding real estate.

Conclusion

That should get you up and running with a neato dev environment. If you need to run SSL, I wrote a post on Creating Self-Signed Certs on Apache 2.2 and Virtual Hosts and Wildcard SSL Certificates with Apache 2.2.

If you find any typos or have additions, please feel free to sound off in the comments!



Nginx-Fu: X-Accel-Redirect From Remote Servers

June 24th, 2010

We use nginx and its features a lot at Scribd. Many times in the last year we needed a pretty interesting but unsupported feature: we wanted nginx X-Accel-Redirect functionality to work with remote URLs. Out of the box, nginx supports this functionality for local URIs only. In this short post I want to explain how we made nginx serve remote content via X-Accel-Redirect.

First of all, here is why you may need this feature. Let’s imagine you have file storage on Amazon S3 where you store tons of content, and an application with content download functionality that you want to make available only to logged-in/paying/premium users and/or where you want to keep track of the downloads your users perform on your site. If your content were on your web server, you could use the simple controlled downloads functionality built into nginx out of the box. But the problem is that your content is remote.

Here is what we do to solve this problem.

First, we create a special location on our nginx server. This location will be used as a proxy for all our accelerated file downloads:

# Proxy download
location ~* ^/internal_redirect/(.*?)/(.*) {
    # Do not allow people to mess with this location directly
    # Only internal redirects are allowed
    internal;

    # Location-specific logging
    access_log logs/internal_redirect.access.log main;
    error_log logs/internal_redirect.error.log warn;

    # Extract download url from the request
    set $download_uri $2;
    set $download_host $1;

    # Compose download url
    set $download_url http://$download_host/$download_uri;

    # Set download request headers
    proxy_set_header Host $download_host;
    proxy_set_header Authorization '';

    # The next two lines could be used if your storage
    # backend does not support the Content-Disposition
    # header that specifies the file name browsers use
    # when saving content to disk
    proxy_hide_header Content-Disposition;
    add_header Content-Disposition 'attachment; filename="$args"';

    # Do not touch local disks when proxying
    # content to clients
    proxy_max_temp_file_size 0;

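    # Download the file and send it to the client.
    # Because proxy_pass uses a variable here, nginx resolves the
    # host name at request time, so a resolver directive must be
    # configured unless the host is an IP address or a defined upstream
    proxy_pass $download_url;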
}

After adding this location to our nginx config we could start sending responses with headers like the following:

# This header will ask nginx to download a file
# from http://some.site.com/secret/url.ext and send it to user
X-Accel-Redirect: /internal_redirect/some.site.com/secret/url.ext

# This header will ask nginx to download a file
# from http://blah.com/secret/url and send it to user as cool.pdf
X-Accel-Redirect: /internal_redirect/blah.com/secret/url?cool.pdf
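How the location regex splits such a path can be sketched outside nginx. The following Python snippet is only an illustration under stated assumptions: it mimics the `^/internal_redirect/(.*?)/(.*)` pattern with Python's `re` module (nginx uses PCRE and matches against the URI without the query string, which it exposes as `$args`). The function name `split_internal_redirect` is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Same pattern as the nginx location; ~* in nginx means case-insensitive.
LOCATION_RE = re.compile(r"^/internal_redirect/(.*?)/(.*)", re.IGNORECASE)

def split_internal_redirect(value: str):
    """Return (download_url, file_name) for an X-Accel-Redirect value."""
    parts = urlsplit(value)
    match = LOCATION_RE.match(parts.path)
    if match is None:
        raise ValueError("not an internal redirect path: %r" % value)
    host, uri = match.group(1), match.group(2)
    # parts.query plays the role of $args (empty string when absent)
    return "http://%s/%s" % (host, uri), parts.query

print(split_internal_redirect("/internal_redirect/blah.com/secret/url?cool.pdf"))
```

The non-greedy first group stops at the first slash, so everything up to that slash becomes the download host and the rest becomes the remote path.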

Here is some example code you could use in a Rails application to use our internal redirect location:

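# Build the internal nginx location for a remote URL.
# The optional file_name is passed as the query string and ends up
# in the Content-Disposition header set by the nginx location.
def x_accel_url(url, file_name = nil)
  uri = "/internal_redirect/#{url.sub(%r{\Ahttp://}, '')}"
  uri << "?#{file_name}" if file_name
  uri
end

def download
  # nginx intercepts this header and proxies the remote file to the client
  headers['X-Accel-Redirect'] = x_accel_url(some_secret_url, pretty_name)
  render :nothing => true
end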

As you can see, nginx is a really powerful tool, and when you turn your creativity on you can make it even more powerful. Stay tuned for more Nginx-Fu posts.




Advanced Squid Caching in Scribd: Cache Invalidation Techniques

May 29th, 2010

Having a reverse-proxy web cache as one of the major infrastructure elements brings many benefits for large web applications: it reduces the load on your application servers, reduces average response times on your site, etc. But there is one problem every developer runs into when working with such a cache: cached content invalidation.

It is a complex problem that usually consists of two smaller ones: invalidation of individual cache elements (you need to keep an eye on your data and invalidate cached pages when related data changes) and full cache purges (sometimes your site layout or page templates change and you need to purge all the cached pages to make sure users get the new visual elements or layout changes). In this post I’d like to look at a few techniques we use at Scribd to solve cache invalidation problems.


So, the first problem: ongoing cache invalidation when content changes. This is actually a pretty simple task in squid: you just use the HTCP protocol and send CLR requests to your caching farm (we didn’t find any HTCP implementations, so we implemented our own simple client that supports just one command).

Since we use haproxy to balance our traffic in the cluster, it is hard to predict where we should send a purge request. So we fan those out to all cache servers.

To make sure cache purging won’t slow the site down, especially considering we need to do more than just a simple cache purge (submit documents to search indexes, etc.), we just spool a “document changed” request to a queue and then have a set of asynchronous processes that do all the work in the background.
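The spool-and-fan-out pattern described above can be sketched roughly as follows. This is a minimal illustration, not Scribd's implementation: the host names are made up, `send_purge` is a hypothetical placeholder for the real purge call (for example an HTCP CLR request), and an in-process queue with a single worker thread stands in for whatever queueing system is actually used.

```python
import queue
import threading

CACHE_SERVERS = ["cache1.example", "cache2.example"]  # hypothetical hosts

def send_purge(server: str, url: str, log: list) -> None:
    """Placeholder for the real purge call; here it only records the request."""
    log.append((server, url))

def worker(jobs: queue.Queue, log: list) -> None:
    """Consume 'document changed' jobs and fan each one out to every cache."""
    while True:
        url = jobs.get()
        if url is None:              # sentinel: stop the worker
            break
        for server in CACHE_SERVERS:
            send_purge(server, url, log)

def purge_in_background(urls):
    """Enqueue the URLs, let the worker process them, return what was sent."""
    jobs, log = queue.Queue(), []
    thread = threading.Thread(target=worker, args=(jobs, log))
    thread.start()
    for url in urls:
        jobs.put(url)                # the request path only enqueues
    jobs.put(None)
    thread.join()
    return log

print(purge_in_background(["/doc/1", "/doc/2"]))
```

The point of the design is that the code handling a user request only pays for the enqueue, while the slower per-server work happens elsewhere.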

Next, The Hard Problem: handling full cache purges without killing our backend servers with 5x-10x traffic (our normal hit ratio is ~90-95%).

We’ve spent a lot of time thinking about this problem, and the first idea we came up with was to have a loop process somewhere that would iterate over all the documents we have cached and purge them one by one. But that does not seem to be a practical solution when you have tens of millions of documents (and a few page versions per document), and obviously the solution would not scale with a constantly growing document corpus.

So we kept brainstorming and finally got one idea that works just perfectly for us: what if we could take our traffic and define a function f(t) that returns the percentage of the traffic that should be purged at any moment in time? So we did it: we implemented an nginx module that versions our cache by assigning every cached page a revision (using custom HTTP headers plus Vary-based caching) and is able to slowly migrate the cache from one revision to another over a predefined period of time.
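One way to picture the f(t) idea is the Python sketch below. It is an illustration under assumptions, not the actual nginx module: the post does not say what shape f(t) has or how requests are assigned to a revision, so a linear ramp and a stable hash of the URL are assumed here, and all function names are hypothetical.

```python
import hashlib

def purge_fraction(now: float, start: float, duration: float) -> float:
    """f(t): fraction of traffic that should already use the new revision.

    Assumed to ramp linearly from 0 to 1 over the purge window.
    """
    if duration <= 0:
        return 1.0
    return min(1.0, max(0.0, (now - start) / duration))

def bucket(url: str) -> float:
    """Map a URL to a stable position in [0, 1) using a hash of the URL."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def revision_for(url, now, start, duration, old_rev, new_rev):
    """Pick the cache revision a request for this URL should carry."""
    migrated = bucket(url) < purge_fraction(now, start, duration)
    return new_rev if migrated else old_rev

start, duration = 0.0, 8 * 3600.0    # an 8 hour purge window
for hours in (0, 4, 8):
    print(hours, purge_fraction(hours * 3600.0, start, duration))
```

Because each URL keeps the same bucket for the whole window, a page that has moved to the new revision stays there, so the cache fills with new-revision pages gradually instead of all at once.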

With that module we are able to do so-called “slow” cache purges that can take anywhere from a few minutes (which still helps to reduce the load spike generated by the hottest content) up to many hours (this is what we normally use) or days (we have never used this option, but it is definitely possible).

Here is an example 100% cache purge over an 8 hour interval:

  [Figures: 1. daily hit ratio graph; 2. weekly hit ratio graph]

As you can see, during those slow purges our cached pages are slowly updated without putting too much pressure on the backend. The cache hit ratio slowly degrades and then slowly gets back to its normal level, but with our normal (6-8 hour) purges the hit ratio never gets lower than 65-70%. This saves us huge amounts of money, because we no longer need 90% spare capacity just for cache purge load surges (we used to have lots of spare application cluster capacity before introducing this approach).
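The capacity argument can be checked with rough arithmetic. The sketch below is an idealized model, not a measurement: it assumes constant request volume, takes 90% (the lower end of the quoted range) as the normal hit ratio, and treats backend traffic as proportional to the miss ratio. The function name is hypothetical.

```python
def backend_load_multiplier(normal_hit_ratio: float, current_hit_ratio: float) -> float:
    """Backend traffic relative to normal, assuming constant request volume."""
    return (1.0 - current_hit_ratio) / (1.0 - normal_hit_ratio)

# Instant full purge: the hit ratio drops to 0%.
print(round(backend_load_multiplier(0.90, 0.00), 1))
# Worst point of a slow purge: the hit ratio bottoms out around 65%.
print(round(backend_load_multiplier(0.90, 0.65), 1))
```

Under these assumptions an instant purge multiplies backend traffic by about 10, while a slow purge that bottoms out at 65% multiplies it by about 3.5.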


