Archive for the ‘Gearman’ Category

Talking about Gearman at Etsy Labs

Август 5th, 2011
I find myself flying to New York on Monday for some dealnews related business. Anytime I travel I try and find something fun to do at night. (Watching a movie by myself in Provo, Utah was kinda not that fun.) So, this week I asked on Twitter if anything was happening while I would be in town. Anything would do. A meetup of PHP/MySQL users or some design/css/js related stuff for example. Pretty much anything interesting. Well, later that day I received an IM from the brilliant John Allspaw, Senior VP of Technical Operations at Etsy. He wanted me to swing by the Etsy offices and say hi. Turns out it is only a block away from where I would be. Awesome! He also mentioned that he would like to have me come and speak at their offices some time. That would be neat too. I will have to plan better next time I am traveling up there.

Fast forward another day. I get an email from Kellan Elliott-McCrea, CTO of Etsy wanting to know if I would come to the Etsy offices and talk about Gearman. At first I thought "That is short notice, man. I don't know that I can pull that off." Then I remembered the last time I was asked to speak at an event on short notice based off a recommendation from John Allspaw.

It was in 2008 for some new conference called Velocity. That only turned out to be the best conference I have ever attended. I have been to Velocity every year since and this year took our whole team. In addition, I spoke again in 2009 at Velocity, wrote a chapter for John's book Web Operations that was released at Velocity in 2010 and was invited to take part in the Velocity Summit this year (2011) which helps kick off the planning for the actual conference. The moral of that story for me is: when John Allspaw wants you to take part in something, you do it.

In reality, it was not that tough a decision. Even without John's involvement, I love the chance to talk about geeky stuff. The Etsy and dealnews engineering teams are like two twins separated at birth. Every time we compare notes, we are doing the same stuff. For example, we have been trading Open Source code lately. They are using my GearmanManager and we just started using their statistics collection daemon, statsd. So, speaking to their people about what we do seem like a great opportunity to share and get input.

The event is open to the public. So, if you use Gearman, want to use Gearman, or just want to hear how we use Gearman at dealnews, come here me ramble on about how awesome it is Tuesday night in Dumbo at Etsy Labs. You can RSVP on the event page.

Best Practices for Gearman by Brian Moon
Etsy Labs
55 Washington St. Ste 712
NY 11222

Tuesday, August 09, 2011 from 7:00 PM - 10:00 PM (ET)
PlanetMySQL Voting: Vote UP / Vote DOWN

CB1 Ubuntu 10.10 Linux Development Setup

Октябрь 17th, 2010

I use a MacBook Pro for my day-to-day operations here at CB1, INC. I’m a huge believer that a development environment should mimic the production environment, so I find myself running a couple virtual machines in VMware Fusion.

The following guide is a reference for myself as well as possibly a helpful resource for setting up your own Linux development environment. Here’s an checklist of the tasks to perform and software to install:

Operating System

Start by installing Ubuntu 10.10 Desktop (or server). I’m not going to cover installing Ubuntu since there are already several other resources out there. Once Ubuntu is installed, open a Terminal:

user@ubuntu:~# sudo passwd root
[sudo] password for user: <type your password>
Enter new UNIX password: <type new root password>
Retype new UNIX password: <type new root password again>
passwd: password updated successfully

user@ubuntu:~# sudo apt-get update
user@ubuntu:~# sudo apt-get upgrade

user@ubuntu:~# mkdir ~/src

New File Permissions

user@ubuntu:~# sudo pico /etc/profile

Change 022 to 002. This setting controls the default permissions when a new file or directory is created. This is mostly useful when managing files over Samba.

Network IP Addresses

Optionally, you may want to assign a static IP address. I set up one IP address for Apache and another for nginx.

user@ubuntu:~# sudo pico /etc/network/interfaces

The following is a reference for adding two static IPs. Change the IPs to meet your needs.

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
	address 192.168.1.200
	netmask 255.255.255.0
	gateway 192.168.1.1

auto eth0:1
iface eth0:1 inet static
	address 192.168.1.201
	netmask 255.255.255.0
user@ubuntu:~# sudo /etc/init.d/networking restart

Packages

Here’s a bunch of packages that will set up compilers, version control, Java, MySQL, Apache, PHP, Memcache, Gearman, Samba, and more.

user@ubuntu:~# sudo apt-get install build-essential autotools-dev autoconf \
 autoconf2.13 openssh-server ethtool traceroute openjdk-6-jdk \
 mysql-server-5.1 bzr subversion subversion-tools ntp ntpdate \
 libpcre3-dev libevent-dev automake bison libtool scons  g++ \
 ncurses-dev libreadline-dev libz-dev libssl-dev  libcurl4-openssl-dev \
 ruby rubygems libzip-ruby1.8 libzip-ruby1.9.1 python-dev ruby-dev \
 libdbus-glib-1-dev uuid-dev libpam0g libpam0g-dev gperf samba valgrind \
 libxml2-dev libfreetype6-dev curl libcurl4-openssl-dev \
 libjpeg62-dev libpng12-dev sqlite3 libsqlite3-dev git-core \
 postgresql postgis gearman libgearman-dev php5 \
 libapache2-mod-php5 php5-dev memcached php5-memcached \
 php5-curl php5-gd php5-mysql php5-pgsql php-apc \
 php5-xdebug php5-fpm libapache2-mod-fastcgi

MySQL

During the package install above, MySQL will prompt you for the root password.

After the packages are installed, we need to allow remote MySQL connections.

user@ubuntu:~# sudo pico /etc/mysql/my.cnf

Comment out the bind-address line.

# bind-address          = 127.0.0.1

SSH

Next, you may optionally increase the connection keep alive interval for remote ssh connections. Timeouts aren’t really an issue for SSH’ing into a local VM, but really helps for remote installs.

user@ubuntu:~# sudo echo "ClientAliveInterval 60" >> /etc/ssh/sshd_config

Samba

Samba allows me to drag and drop files between my Mac and Linux VM. I personally do not enable/install Samba on production servers.

user@ubuntu:~# sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.orig
user@ubuntu:~# sudo pico /etc/samba/smb.conf

You can add a share such as the following:

[ubuntu]
        force user = <your username>
        writeable = yes
        create mode = 644
        path = /home/<your username>
        directory mode = 755
        force group = <your username>

Then create yourself a Samba user:

user@ubuntu:~# sudo smbpasswd -a <your username>

Apache 2

Apache is mostly configured out of the box, but I like to enable rewrite and SSL so I can test production features.

user@ubuntu:~# sudo a2enmod rewrite
user@ubuntu:~# sudo a2enmod ssl

Since I’m going to run Apache and nginx, I’m going bind Apache to eth0.

user@ubuntu:~# sudo pico /etc/apache2/ports.conf
NameVirtualHost 192.168.1.200:80
Listen 192.168.1.200:80

<IfModule mod_ssl.c>
    Listen 192.168.1.200:443
</IfModule>

Now we need to add eth0‘s IP to the default host:

user@ubuntu:~# sudo pico /etc/apache2/sites-enabled/000-default
<VirtualHost 192.168.1.200:80>
        ServerAdmin webmaster@localhost

        DocumentRoot /var/www
        <Directory />
                Options FollowSymLinks
                AllowOverride None
        </Directory>
        <Directory /var/www/>
                Options Indexes FollowSymLinks MultiViews
                AllowOverride None
                Order allow,deny
                allow from all
        </Directory>

        ErrorLog ${APACHE_LOG_DIR}/error.log
        LogLevel warn
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Restart Apache for the changes to take effect.

user@ubuntu:~# sudo apache2ctl restart

Gearman

By default, Gearman uses memory to store pending jobs in the queue, but I prefer to use MySQL for persistent storage. To do this, first create the queue database and table:

user@ubuntu:~# mysqladmin -uroot -p123123 create gearman
user@ubuntu:~# mysql -uroot -p123123 -e "CREATE TABLE gearman.gearman_queue (
  unique_key VARCHAR(64) NOT NULL,
  function_name VARCHAR(255) NULL,
  priority INT NULL,
  data LONGBLOB NULL,
  PRIMARY KEY (unique_key)
) ENGINE = InnoDB;"

Next update the init script to tell Gearman to use the database:

user@ubuntu:~# sudo mv /etc/default/gearman-job-server /etc/default/gearman-job-server.bak
user@ubuntu:~# sudo echo "PARAMS=\"-q libdrizzle --libdrizzle-host=127.0.0.1" \
   "--libdrizzle-user=root --libdrizzle-password=123123 --libdrizzle-db=gearman" \
   "--libdrizzle-table=gearman_queue --libdrizzle-mysql\"" > /etc/default/gearman-job-server
user@ubuntu:~# sudo /etc/init.d/gearman-job-server restart

Gearman PHP Extension

We need to download and install the Gearman PHP extension if we want to write PHP workers or post jobs to the queue.

user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://pecl.php.net/get/gearman-0.7.0.tgz
user@ubuntu:~/src# tar xzf gearman-0.7.0.tgz
user@ubuntu:~/src# rm gearman-0.7.0.tgz package.xml
user@ubuntu:~/src# cd gearman-0.7.0
user@ubuntu:~/src# phpize
user@ubuntu:~/src# ./configure
user@ubuntu:~/src# make
user@ubuntu:~/src# sudo make install

Next, add the config file to load the Gearman PHP extension:

user@ubuntu:~# sudo echo "extension=gearman.so" >> /etc/php5/conf.d/gearman.ini

memcached PHP Extension

Since we have memcached and the memcached PHP extension install, let’s use it for storing session data:

user@ubuntu:~/src# sudo echo "session.save_handler = memcached
session.save_path = \"127.0.0.1:11211\"" >> /etc/php5/conf.d/memcached.ini

nginx

nginx is web server that is really fast. I use nginx as my primary development web server unless I’m running a web app that only works with Apache. You can choose to install nginx from package, but I like to live life on the bleeding edge, so I’ll be building nginx from source. To install nginx, we need to download the source, compile it, install it, and configure it.

user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://nginx.org/download/nginx-0.8.52.tar.gz
user@ubuntu:~/src# tar xzf nginx-0.8.52.tar.gz
user@ubuntu:~/src# rm nginx-0.8.52.tar.gz
user@ubuntu:~/src# cd nginx-0.8.52
user@ubuntu:~/src# mkdir /var/lib/nginx
user@ubuntu:~/src# ./configure \
    --sbin-path=/usr/sbin \
    --conf-path=/etc/nginx/nginx.conf \
    --error-log-path=/var/log/nginx/error.log \
    --pid-path=/var/run/nginx.pid \
    --lock-path=/var/lock/nginx.lock \
    --http-log-path=/var/log/nginx/access.log \
    --http-client-body-temp-path=/var/lib/nginx/body \
    --http-proxy-temp-path=/var/lib/nginx/proxy \
    --http-fastcgi-temp-path=/var/lib/nginx/fastcgi \
    --http-uwsgi-temp-path=/var/lib/nginx/uwsgi \
    --http-scgi-temp-path=/var/lib/nginx/scgi \
    --with-http_stub_status_module
user@ubuntu:~/src# make
user@ubuntu:~/src# sudo make install

user@ubuntu:~# sudo pico /etc/init.d/nginx

Here’s the init script that will start nginx for us:

#! /bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/nginx
NAME=nginx
DESC=nginx
test -x $DAEMON || exit 0
case "$1" in
  start)
        echo -n "Starting $DESC: "
        start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
        echo "$NAME."
        ;;
  stop)
        echo -n "Stopping $DESC: "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        echo "$NAME."
        ;;
  restart|force-reload)
        echo -n "Restarting $DESC: "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        sleep 1
        start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
        echo "$NAME."
        ;;
  reload)
        echo -n "Reloading $DESC configuration: "
        start-stop-daemon --stop --signal HUP --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        echo "$NAME."
        ;;
  *)
        echo "Usage: /etc/init.d/$NAME {start|stop|restart|reload|force-reload}" >&2
        exit 1
        ;;
esac
exit 0

Now we need to make the init script executable and enable it:

user@ubuntu:~# sudo chmod +x /etc/init.d/nginx
user@ubuntu:~# sudo update-rc.d nginx defaults

user@ubuntu:~# sudo pico /etc/nginx/nginx.conf

Here’s a starter nginx.conf with some basic settings:

user  www-data www-data;
worker_processes  2;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile                on;
    tcp_nodelay             on;
    tcp_nopush              on;
    keepalive_timeout       65;
    server_name_in_redirect off;
    server_tokens           off;

    add_header Strict-Transport-Security max-age=1800;
    add_header X-Frame-Options deny;

    gzip            on;
    gzip_buffers    16 8k;
    gzip_comp_level 9;
    gzip_types      text/plain text/xml application/x-javascript text/css;

    include /etc/nginx/sites/*;
}
user@ubuntu:~# sudo mkdir /etc/nginx/sites
user@ubuntu:~# sudo pico /etc/nginx/sites/default

Now we need to set up a default host that supports PHP (via PHP-FPM, PHP’s FastCGI Process Manager) and we want the default host to use the eth0:1 IP address:

server {
    listen       192.168.1.201:80 default;
    server_name  _;
    root   /var/www;
    index  index.php;
    location / {
        if (!-e $request_filename) {
            rewrite ^/(.*)$ /index.php?q=$1 last;
            break;
        }
    }
    location ~ \.php$ {
        fastcgi_pass   127.0.0.1:9000;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME  /var/www$fastcgi_script_name;
        include        fastcgi_params;
    }
    location ~* (\.(htaccess|engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)|code-style\.pl|Entries.*|Repository|Root|Tag|Template)$ {
        deny all;
    }
}

After the config files are good to go, start nginx:

user@ubuntu:~# sudo /etc/init.d/nginx start

Service Names

I also like to add service names so I can see what ports are in use when I run netstat. I added drizzle and Cassandra for fun despite this post not including them.

user@ubuntu:~# sudo cp /etc/services /etc/services.bak
user@ubuntu:~# su
root@ubuntu:~# echo "drizzle     4427/tcp
drizzle     4427/udp
memcached   11211/tcp
memcached   11211/udp
gearmand    4730/tcp
gearmand    4730/udp
fastcgi     9000/tcp
cassandra   9160/tcp" >> /etc/services
root@ubuntu:~# exit

Android SDK

The Android SDK is unfortunately not in package, so you’ll need to download it from the Android Developer site: http://developer.android.com/sdk/index.html.

user@ubuntu:~# wget http://dl.google.com/android/android-sdk_r07-linux_x86.tgz
user@ubuntu:~# tar xzf android-sdk_r07-linux_x86.tgz
user@ubuntu:~# rm android-sdk_r07-linux_x86.tgz
user@ubuntu:~# sudo mv android-sdk-linux_x86 /usr/local
user@ubuntu:~# sudo find /usr/local/android-sdk-linux_x86 -type d -exec chmod 777 {} \;

You’ll need to add the Android SDK path near the top of your ~/.bash_profile or ~/.bashrc:

export PATH=${PATH}:/usr/local/android-sdk-linux_x86/tools

To manage your Android SDK packages and virtual devices, you’ll need to run the android app:

user@ubuntu:~# android

First go to Available Packages and download version 1.6 and 2.2 Android SDK packages. You can also choose to download the documentation, samples, and Google APIs.

Downloading the package may take several minutes. You don’t have to create a virtual device right now if you are planning on installing Appcelerator’s Titanium platform. You can exit the Android app when you’re done.

Desktop Apps

If you’re running Ubuntu Desktop, there are a couple handy apps I install. The first is Google Chrome and can be directly downloaded from the Google Chrome download page.

I find KCachegrind and GHex to be useful:

user@ubuntu:~# sudo apt-get install kcachegrind ghex

Appcelerator Titanium

Titanium is an awesome platform for developing desktop applications for Linux, Mac OS X, and Windows as well as mobile apps for iPhone and Android. We use Titanium Developer to create Titanium projects. Begin by downloading the 64-bit version of Titanium:

user@ubuntu:~# wget -O titanium.tgz http://www.appcelerator.com/download-linux64

There’s also a 32-bit version available at http://www.appcelerator.com/download-linux32.

Next we unpack Titanium Developer and move it to a safe place:

user@ubuntu:~# tar xzf titanium.tgz
user@ubuntu:~# rm titanium.tgz

Next you need to run the installer by double-clicking the Titanium Developer executable. Run the executable and then click the Install button. You can try installing to /opt/titanium, but you might need root privileges.

Next, there are a few issues with outdated libraries, so we simply delete them:

user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgobject-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libglib-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgio-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgthread-2.0.*

Titanium Developer also complains if /bin/java doesn’t exist, so create a quick link:

user@ubuntu:~# sudo ln -s /usr/bin/java /bin/java

Relaunch Titanium Developer and enter your login credentials. If you don’t have a login, you can get a free account.

After signing in, you may notice there are some updates available in the upper right corner of the window. Click in the box and the updates will be downloaded and installed.

Optionally you can create a launcher icon for your GNOME panel. Don’t forget to escape spaces in the command with a backslash!

Finishing Touches

Lastly, I like to re-arrange my desktop to maximize my coding real estate.

Conclusion

That should get you up and running with a neato dev environment. If you need to run SSL, I wrote a post on Creating Self-Signed Certs on Apache 2.2 and Virtual Hosts and Wildcard SSL Certificates with Apache 2.2.

If you find any typos or additions, please feel free to sound off in the comments!


PlanetMySQL Voting: Vote UP / Vote DOWN

OSCON and OpenStack

Июль 26th, 2010

The past two weeks have been both exciting and extremely busy, first traveling to Austin, TX for the first OpenStack Design Summit, and then back home to Portland, OR for The O’Reilly Open Source Conference (OSCON) and Community Leadership Summit. The events were great in different ways, and there was some overlap with OpenStack since we announced it on the first day of OSCON and created quite a bit of buzz around the conference. I want to comment on a few things that came up during these two weeks.

New Role

I’m now focusing on OpenStack related projects at Rackspace. I’m no longer working on Drizzle, but I will still be involved in the MySQL and database ecosystems through future projects and conferences (see you at OpenSQL Camp). I will also still be working on a couple of Gearman related projects in my spare time. At OSCON I gave two presentations on Gearman and Drizzle, you can find the slides here.

The Five Steps to Open

One question that came up a few times over the past couple weeks is what the term “Open” means when a business or organization decides to adopt the open source philosophy. It turns out this means many different things to folks, and when an organization decides to go open, they need to make a decision on how open they are willing to be. Here are the various layers we’ve seen over the years:

  • Open API – You’ve decided to take the first step to being open and released a well documented API to work with your web service or project. Everything behind the API is still a black-box though.
  • Open Core – Beyond the APIs, you’ve decided to release part of the code open source, but you still keep some of the bits proprietary in an attempt to keep a competitive advantage. This is a hot debate lately on whether it is a viable Open Source business model.
  • Open Source – You’ve decided keeping some code proprietary doesn’t help, and actually even hurts your project or adoption. You put all of the code out in the open for everyone to see. While everyone can see all of the source code, there still isn’t a whole lot of interaction going on.
  • Open Development – Putting the source code out wasn’t enough. You want to enable users and external developers to be able to file bugs, submit patches, and track the development process to see what to expect next. This usually involves running your project on a public project site such as github or Launchpad.
  • Open Decision Making – You’ve postponed the inevitable for long enough. Feature requests and bug reports are pouring in, and the community wants to have a say in what gets prioritized. Should we focus only on stability? Performance? New features? Porting to mobile platforms? Let the community decided the direction of the project.

There have been examples of success for organizations who have stopped at each of these steps. Given the proper environment, any can work. My preference is to work on projects that are fully open, where company and organizational boundaries do not exist between developers and users. I’m thrilled to say that we’ve gone all in with OpenStack. We’re hosted on Launchpad and have a governance structure that allows all parties within the community to have a say in the future of the project.

Preventing Vendor Lock-in

During the Cloud Summit at OSCON, there was a debate titled: “Are Open APIs Enough to Prevent Lock-in?”. Most folks came to the conclusion that the answer is “no,” and I agree. While I feel open APIs are necessary, they are by no means sufficient. Even if a project is open source and allows for open development, it probably will not prevent vendor lock-in. The key is to provide some incentive for vendors to adopt and invest resources within a project. Much like customers don’t want vendor lock-in when choosing a platform, vendors do not want project or feature lock-in when choosing the software to power their business. Each vendor who chooses to participate must have the ability to voice their opinion on the direction of APIs, features, and other project priorities. This is why it is critical that any open source project must take all the steps described above to give the project a chance of being adopted and becoming the de facto standard. There is of course no guarantee that adoption and prevention of vendor lock-in will happen, but I see them as necessary steps.

This is another area where OpenStack has done the correct thing. We are planning on having another developer summit in November, and then once every six months after that time. All design discussions and decision making will happen in public forums such as the mailing list and IRC. We want all participants in the community to have a chance to respond to topics being discussed, and we believe the more we have, the more successful the project will be. Having many voices allows the project to be more applicable to different environments. For example, Rackspace and NASA have different requirements for their compute architectures, but they also share many components as well. Through open participation we can ensure all needs are accounted for. Much like the LAMP stack has powered universities, governments, and competing business, we hope OpenStack can do the same.

Contributor License Agreement (CLA)

During the past couple of weeks a few folks asked what the CLA was all about. When the foundations of OpenStack were forming, the requirement of having a CLA came up from the legal side. Having been involved with open source projects that had very invasive CLAs, initially I had quite a bit of concern. The CLA is actually quite innocuous, and it does NOT require assignment or dual-ownership of copyright. You are the sole owner of code you contribute. For all intents and purposes it is a signed version of the Apache 2.0 license, the CLA just makes these terms more explicit. The CLA is handled through digital signatures, so no papers, pens, or faxing is required.

Get Involved!

Expect to see more posts on my blog related to OpenStack topics. If you would like to get involved, you can join the IRC channel (#openstack on irc.freenode.net), join the mailing list, or start contributing code! There are even jobs around OpenStack popping up already!


PlanetMySQL Voting: Vote UP / Vote DOWN

OpenSQLCamp Boston Pages are online

Июнь 23rd, 2010

OpenSQLCamp is less than 4 months away, and I have finally gotten around to updating the site. Special thanks go to Bradley Kuzsmaul and the folks at Tokutek for getting the ball rolling and making the reservation at MIT. Using MIT means that we will have *free* reliable wireless guest access and projects.

OpenSQL Camp is a free unconference for people interested in open source databases (MySQL, SQLite, Postgres, Drizzle), including non-relational databases, database alternatives like NoSQL stores, and database tools such as Gearman. We are not focusing on any one project, and hope to see representatives from a variety of open source database projects attend. As usual I am one of the main organizers of Open SQL Camp (in previous years, Baron Schwartz, Selena Deckelmann and Eric Day have been main organizers too; this year Bradley Kuzsmaul is the other main organizer). The target audience are users and developers, but others are encouraged to attend too. There will be both presentations and hackathons, with plenty of opportunities to learn, contribute, and collaborate!

I have updated the main Boston 2010 page at http://opensqlcamp.org/Events/Boston2010/ with travel and logistics information, including links to:

Register — it’s free and easy, and you can always change your mind later!

Maybe you have an idea for a session you would like to see, or a session you would like to give? If so, you can note it on the sessions page. This will give everyone a sense of what type of presentations will be there. I have started by putting 2 sessions I am willing to give and a third at the bottom for one I’d like to see, to give everyone an idea of both types of descriptions.

Probably the most important link right now is the way we keep OpenSQLCamp free for all attendees – sponsor or donate to the conference! Any donation amount is accepted, and all donations are tax-exempt to the fullest extent of the law. Businesses and organizations will be listed as sponsors if they make a donation of $250 or more, and individuals will be listed as sponsors if they make a donation of $100 or more. More information on sponsor benefits, including where to send a graphic to, at the link.

There is a preliminary schedule, up until the conference itself it will only show the agenda of the conference — how many rooms and what time the presentations are supposed to be. During and after the conference we will update this schedule page with the titles, presenters and links to any notes/videos/audio taken.

If you have any questions, please do not hesitate to ask on the mailing list or by posting a comment here.


PlanetMySQL Voting: Vote UP / Vote DOWN

Threads with Events

Апрель 20th, 2010

Last week I was surprised to see this paper bubble back up on Planet MySQL. It describes the pros and cons of thread and event based programming for high concurrency applications (like a web server), arguing that thread-based programming is superior if you use an appropriate lightweight threading implementation. I don’t entirely disagree with this, but the problem is such a library does not exist that is standard, portable, and useful for all types of applications. We have POSIX threads in the portable Linux/Unix/BSD world, so we need to work with this. Other experimental libraries based on lightweight threads or “fibers” are really interesting as they can maintain your stack without all the normal overhead, but it is hard to get the scheduling correct for all application types. I would even argue that thread and event based programming is actually not all that different, it’s just a matter of how state is maintained (stack vs state variables) and how scheduling is performed.

The comparisons done in that paper also put a C-based web server using a co-routine threading library against a Java based server that depends on the poll() system call. I’m sorry, but this is comparing apples to oranges. First, you’re in the Java VM with a number of runtime components (like garbage collection) which may be getting in the way. Also, the standard poll() system call is not an efficient event-handling mechanism, it’s much better to use epoll or some other Kernel-based handling mechanism.

One high-concurrency userland threading implementation I do like is in Erlang. Erlang processes are extremely lightweight and I’ve written apps that depend heavily on them. One interesting application I saw was caching objects where each object got it’s own Erlang process. This put a whole new spin on cache management, and it looked like it could actually scale reasonably well. The “problem” with Erlang, which may or may not be a problem depending on your requirements, is that it is still a bit of overhead running byte-code in a VM, as well as it being a functional language. I love functional programming, but I’ve found it still ties most developer’s heads in knots if they don’t have a reason to use it regularly. For open source projects trying to build a contributor community, it can act as one more hurdle.

So, what is the “best” paradigm?

Back in 2000 some colleagues and I wrote a hybrid thread-event library that would create one event-handler instance per thread, and connections would be spread across the pool of event-handling threads. I believe this gave the best of both worlds, and I saw high throughputs with fairly minimal overhead. I wrote a number of servers based on this architecture, including HTTP, IMAP, POP3, and DNS, and with each server type this model proved to be efficient and scalable. Ultimately the best architecture depends on your application. If you never intend to have many connections, and your applications has long-running computations, one-thread-per-connection would probably be best. If you need to handle large numbers of connections and have short, non-blocking request processing, event-based scales extremely well. You can of course create a hybrid of these two and have all connections managed by event threads and asynchronous queues to dedicated processing threads for heavy request handling (this is sort of what I did in the C Gearman Job Server).

There is no single correct answer, so take a look at your options before deciding how to approach your own applications. Don’t be afraid to create hybrids as well. Regardless of which paradigm you choose, concurrent programming can be hard, especially at the lower levels. There have been a number of higher level abstractions to help developers, from new libraries to new languages, but most of these come with a cost in performance or flexibility. When you need to squeeze every bit of performance out of your application, you will most likely end up in C or C++ dealing with these issues directly.

This is actually one of the problems I’m attempting to address with the Scale Stack Event modules. I’m trying to create a healthy level of abstraction on hybrid thread/event based applications so you don’t have any overhead or limitations while a lot of the common headaches are taken care of for you. If you have a need for such a system, get in touch, I’d be interested to talk. Since it is BSD licensed you can use it in any application, including commercial.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Conference Review

Апрель 17th, 2010
I am back home from a good week at the 2010 O'Reilly MySQL Conference & Expo. I had a great time and got to see some old friends I had not seen in a while.

Oracle gave the opening keynote and it went pretty much like I thought it would. Oracle said they will keep MySQL alive. They talked about the new 5.5 release. It was pretty much the same keynote Sun gave last year. Time will tell what Oracle does with MySQL.

The expo hall was sparse. Really sparse. There were a fraction of the booths compared to the past. I don't know why the vendors did not come. Maybe because they don't want to compete with Oracle/Sun? In the past you would see HP or Intel have a booth at the conference. But, with Oracle/Sun owning MySQL, why even try. Or maybe they are not allowed? I don't know. It was just sad.

I did stop by the Maatkit booth and was embarrassed to tell Baron (its creator) I was not already using it. I had heard people talk about it in the past, but never stopped to see what it does. It would have only saved me hours and hours of work over the last few years. Needless to say it is now being installed on our servers. If you use MySQL, just go install Maatkit now and start using it. Don't be like me. Don't wait for years, writing the same code over and over to do simple maintenance tasks.

Gearman had a good deal of coverage at the conference. There were three talks and a BoF. All were well attended. Some people seemed to have an AHA! moment where they saw how Gearman could help their architecture. I also got to sit down with the PECL/gearman maintainers and discuss the recent bug I found that is keeping me from using it.

I spoke about Memcached as did others. Again, there was a BoF. It was well attended and people had good questions about it. There seemed to be some FUD going around that memcached is somehow inefficient or not keeping up with technology. However, I have yet to see numbers or anything that proves any of this. They are just wild claims by people that have something to sell. Everyone wants to be the caching company since there is no "Memcached, Inc.". There is no company in charge. That is a good thing, IMO.

That brings me to my favorite topic for the conference, Drizzle. I wrote about Drizzle here on this blog when it was first announced. At the time MySQL looked like it was moving forward at a good pace. So, I had said that it would only replace MySQL in one part of our stack. However, after what, in my opinion, has been a lack of real change in MySQL, I think I may have changed my mind. Brian Aker echoed this sentiment in his keynote address about Drizzle. He talked about how MySQL AB and later Sun had stopped focusing on the things that made MySQL popular and started trying to be a cheap version of Oracle. That is my interpretation of what he said, not his words.

Why is Drizzle different? Like Memcached and Gearman, there is no "Drizzle, Inc.". It is an Open Source project that is supported by the community. It is being supported by companies like Rackspace who hired five developers to work on it. The code is kept on Launchpad and is completely open. Anyone can create a branch and work on the code. If your patches are good, they will be merged into the main branch. But, you can keep your own branch going if you want to. Unlike the other forks, Drizzle has started over in both the code and the community. I personally see it as the only way forward. It is not ready today, but my money is on Drizzle five or ten years from now.
PlanetMySQL Voting: Vote UP / Vote DOWN

Expert PHP and MySQL published!

Апрель 15th, 2010
Expert PHP and MySQLI'm very pleased to announce the publication of my book Expert PHP and MySQL, published by Wrox. This book was written by myself, Andrew Curioso and Ronald Bradford. The short of it is, upon finishing my previous book, Developing Web Applications with Apache, MySQL, memcached, and Perl, Wiley asked me if I knew of anyone who would like to write a MySQL/PHP book. I had worked with Andrew at Lycos and found him to be a brilliant PHP developer, having been the primary developer of Lycos's Webon product-- which has some excellent usage of PHP, Javascript and MySQL. When I friend of mine Bob Wilkins, who started MyVBO, was looking for a developer I suggested Andrew (he now works at MyVBO), and for this book I also suggested Andrew. Andrew had also written a short book for O'Reilly on AJAX, so Wiley was glad to have had him as a suggestion. They came back and asked if I would lend a hand to which despite being exhausted from writing my first book, decided to contribute and work on some chapters that covered material that thought would be a great benefit to the community-- namely Gearman, Sphinx, the Memcached Functions for MySQL as well as Memcached itself. Finally, we needed someone else with MySQL expertise to add to the mix to cover more advanced MySQL information to which Ronald Bradford, who I know, like, and respect as an expert in the field, signed on also.

For me, I have been more of a Perl guy, however, I like PHP just as well and even prefer how it's web deployment model is much easier (you don't need to modify your httpd config files) was interested in a challenge of writing a book on a different language. Also, some fruits of this being that my project from my Perl book, Narada, now has a PHP port.

I'm proud of this book. It was harder in some ways to write a book with other people than do it all myself, despite far fewer pages. However, the end product took advantage of all our strengths.

Topics covered in this book are (but not limited to!):

* PHP and MySQL techniques every programmer should know
* Advanced PHP concepts such as using iterators, making classes behave like functions, using true lambda functions and closures
* MySQL storage engines
* Using the information schema
* Improving performance through caching - using memcached to add a caching layer to your application
* Writing UDFs
* Writing PHP extensions
* Using the Memcached Functions for MySQL (UDFs)
* Full text search - installing, configuring, and using Sphinx in a PHP application
* Multi-tasking in PHP using Gearman. Narada is used as an example application demonstrating how to put many of the concepts of the book into an application
* Using Apache rewrite rules
* HTTP-based authentication

All-in-all, this is a great book that I hope will benefit PHP/MySQL or any other web developers who need a source of expert information!

I want to thank Andrew, Ronald, Wiley (Bob Elliot, Maureen Spears), Trond Norbye and Eric Day (who tech-edited the book!), and all others who I could write a book in itself who have helped (see my credits in the book!). Also, I want to thank the team at NorthScale, my employer, for being able to work with a team experts who I could refer to while working on this book.

You can buy this book from any major publisher, and from Wiley at http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470563125.html

PlanetMySQL Voting: Vote UP / Vote DOWN

Gearman Releases and Talks at the MySQL Conference

Апрель 6th, 2010

I spent some time this weekend fixing up the Gearman MySQL UDFs (user defined functions) and fixed a few bugs in the Gearman Server. You can find links to the new releases on the Gearman website. The UDFs now use Monty Taylor’s pandora-build autoconf files instead of the old fragile autoconf setup that relied on pkgconfig.

If you are attending the MySQL Conference & Expo next week and want to learn more about Gearman, be sure to check out one of the three sessions Giuseppe Maxia and I are giving:

Hope to see you there!


PlanetMySQL Voting: Vote UP / Vote DOWN

Gearman Releases and Talks at the MySQL Conference

Апрель 6th, 2010

I spent some time this weekend fixing up the Gearman MySQL UDFs (user defined functions) and fixed a few bugs in the Gearman Server. You can find links to the new releases on the Gearman website. The UDFs now use Monty Taylor’s pandora-build autoconf files instead of the old fragile autoconf setup that relied on pkgconfig.

If you are attending the MySQL Conference & Expo next week and want to learn more about Gearman, be sure to check out one of the three sessions Giuseppe Maxia and I are giving:

Hope to see you there!


PlanetMySQL Voting: Vote UP / Vote DOWN

Logging with MySQL

Март 24th, 2010
I was reading a post by Dathan Vance Pattishall titled "Cassandra is my NoSQL solution but..". In the post, Dathan explains that he uses Cassandra to store clicks because it can write a lot faster than MySQL. However, he runs into problems with the read speed when he needs to get a range of data back from Cassandra. This is the number one problem I have with NoSQL solutions.

SQL is really good at retrieving a set of data based on a key or range of keys. Whereas NoSQL products are really good at writing things and retrieving one item from storage. When looking at redoing our architecture a few years ago to be more scalable, I had to consider these two issues. For what it is worth, the NoSQL market was not nearly as mature as it is now. So, my choices were much more limited. In the end, we decided to stick with MySQL. It turns out that a primary or unique key lookup on a MySQL/InnoDB table is really fast. It is sort of like having a key/value storage system. And, I can still do range based queries against it.

But, back to Dathan's problem: clicks. We store clicks at dealnews. Lots of clicks. We also store views. We store more views than we do clicks. So, lots of views and lots of clicks. (Sorry for the vague numbers, company secrets and all. We are a top 1,000 Compete.com site during peak shopping season.) And we do it all in MySQL. And we do it all with one server. I should disclose we are deploying a second server, but it is more for high availability than processing power. Like Dathan, we only use about the last 24 hours of data at any given time. There are three keys for us doing logging like this in MySQL.

Use MyISAM

MyISAM supports concurrent inserts. Concurrent inserts means that inserts can add rows to the end of a table while selects are being performed on other parts of the data set. This is exactly the use case for our logging. There are caveats with range queries as pointed out by the MySQL Performance Blog.

Rotating tables

MySQL (and InnoDB in particular) really sucks at deleting rows. Like, really sucks. Deleting causes locks. Bleh. So, we never delete rows from our logging tables. Instead, nightly we rotate the tables. RENAME TABLE is an (near) atomic process in MySQL. So, we just create a new table.
create table clicks_new like clicks;
rename table clicks to clicks_2010032500001, clicks_new to clicks;

Tada! We now have an empty table for today's clicks. We now drop any table with a date stamp that is longer than x days old. Drops are fast, we like drops.

For querying these tables, we use UNION. It works really well. We just issue a SHOW TABLES LIKE 'clicks%' and union the query across all the tables. Works like a charm.

Gearman

So, I get a lot of flack at work for my outright lust for Gearman. It is my new duct tape. When you have a scalability problem, there is a good chance you can solve it with Gearman. So, how does this help with logging to MySQL? Well, sometimes, MySQL can become backed up with inserts. It happens to the best of us. So, instead of letting that pile up in our web requests, we let it pile up in Gearman. Instead of having our web scripts write to MySQL directly, we have them fire Gearman background jobs with the logging data in them. The Gearman workers can then write to the MySQL server when it is available. Under normal operating procedure, that is in near real time. But, if the MySQL server does get backed up, the jobs just queue up in Gearman and are processed when the MySQL server is available.

BONUS! Insert Delayed

This is our old trick before we used Gearman. MySQL (MyISAM) has a neat feature where you can have inserts delayed until the table is available. The query is sent to the MySQL server and it answers with success immediately to the client. This means your web script can continue on and not get blocked waiting for the insert. But, MySQL will only queue up so many before it starts erroring out. So, it is not as fool proof as a job processing system like Gearman.

Summary

To log with MySQL:
  • Use MyISAM with concurrent inserts
  • Rotate tables daily and use UNION to query
  • Use delayed inserts with MySQL or a job processing agent like Gearman
Happy logging!

PS: You may be asking, "Brian, what about Partitioned Tables?" I asked myself that before deploying this solution. More importantly, in IRC I asked Brian Aker about MySQL partitioned tables. I am paraphrasing, but he said that if I ever think I might alter that table, I would not trust it with the partitions in MySQL. So, that kind of turned me off of them.
PlanetMySQL Voting: Vote UP / Vote DOWN