Archive for the ‘Gearman’ Category
Talking about Gearman at Etsy Labs
August 5th, 2011
Fast forward another day. I get an email from Kellan Elliott-McCrea, CTO of Etsy, wanting to know if I would come to the Etsy offices and talk about Gearman. At first I thought "That is short notice, man. I don't know that I can pull that off." Then I remembered the last time I was asked to speak at an event on short notice based off a recommendation from John Allspaw.
It was in 2008 for some new conference called Velocity. That only turned out to be the best conference I have ever attended. I have been to Velocity every year since and this year took our whole team. In addition, I spoke again in 2009 at Velocity, wrote a chapter for John's book Web Operations that was released at Velocity in 2010 and was invited to take part in the Velocity Summit this year (2011) which helps kick off the planning for the actual conference. The moral of that story for me is: when John Allspaw wants you to take part in something, you do it.
In reality, it was not that tough a decision. Even without John's involvement, I love the chance to talk about geeky stuff. The Etsy and dealnews engineering teams are like two twins separated at birth. Every time we compare notes, we are doing the same stuff. For example, we have been trading Open Source code lately. They are using my GearmanManager and we just started using their statistics collection daemon, statsd. So, speaking to their people about what we do seemed like a great opportunity to share and get input.
The event is open to the public. So, if you use Gearman, want to use Gearman, or just want to hear how we use Gearman at dealnews, come hear me ramble on about how awesome it is Tuesday night in Dumbo at Etsy Labs. You can RSVP on the event page.
Best Practices for Gearman by Brian Moon
Etsy Labs
55 Washington St. Ste 712
NY 11222
Tuesday, August 09, 2011 from 7:00 PM - 10:00 PM (ET)
CB1 Ubuntu 10.10 Linux Development Setup
October 17th, 2010
I use a MacBook Pro for my day-to-day operations here at CB1, INC. I’m a huge believer that a development environment should mimic the production environment, so I find myself running a couple of virtual machines in VMware Fusion.
The following guide is a reference for myself as well as possibly a helpful resource for setting up your own Linux development environment. Here’s a checklist of the tasks to perform and software to install:
- Operating System
  - Ubuntu 10.10 64-bit: I use Ubuntu Desktop in dev and Ubuntu Server in production
  - Package updates and upgrades
  - Network configuration (at least 2 static IP addresses)
- Development Tools
  - C/C++ development environment
  - Autotools
  - Sun Java JDK
  - Valgrind
  - Version control: Subversion, Bazaar, git
  - Android SDK
- Servers
  - Samba (file sharing)
  - SSH (remote shell access)
  - Apache 2.2 (web server)
  - nginx 0.8 (web server)
  - PHP 5.3.3 (application server)
  - PHP-FPM (PHP’s FastCGI process manager)
  - MySQL 5.1 (database server)
  - PostgreSQL (database server)
  - memcached 1.4.5 (caching layer)
  - Gearman (job queue manager)
  - PHP Extensions
- Desktop Applications
  - Google Chrome
  - KCachegrind
  - Appcelerator Titanium
Operating System

Start by installing Ubuntu 10.10 Desktop (or server). I’m not going to cover installing Ubuntu since there are already several other resources out there. Once Ubuntu is installed, open a Terminal:
user@ubuntu:~# sudo passwd root
[sudo] password for user: <type your password>
Enter new UNIX password: <type new root password>
Retype new UNIX password: <type new root password again>
passwd: password updated successfully
user@ubuntu:~# sudo apt-get update
user@ubuntu:~# sudo apt-get upgrade
user@ubuntu:~# mkdir ~/src
New File Permissions
user@ubuntu:~# sudo pico /etc/profile
Change the umask value from 022 to 002. This setting controls the default permissions when a new file or directory is created: with 002, new files are also writable by the group. This is mostly useful when managing files over Samba.
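To see what this setting changes, here is a small sketch (not part of the original setup) that creates a file under each umask in a throwaway directory and prints the resulting mode. It assumes GNU stat, as shipped with Ubuntu; the helper name mode_with_umask is made up for illustration.

```shell
#!/bin/sh
# Compare the default mode of a new file under umask 022 and umask 002.
# Assumes GNU stat (stat -c); mode_with_umask is a hypothetical helper.
demo_dir=$(mktemp -d)

mode_with_umask() {
    # Subshell so the umask change does not affect the calling shell.
    (
        umask "$1"
        rm -f "$demo_dir/newfile"
        touch "$demo_dir/newfile"
        stat -c '%a' "$demo_dir/newfile"
    )
}

mode_022=$(mode_with_umask 022)   # owner rw, group r,  other r
mode_002=$(mode_with_umask 002)   # owner rw, group rw, other r
echo "umask 022 -> $mode_022"
echo "umask 002 -> $mode_002"

rm -rf "$demo_dir"
```

With 002 the group write bit survives, which is what lets files created over a shared Samba account stay editable by other members of the group.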
Network IP Addresses
Optionally, you may want to assign a static IP address. I set up one IP address for Apache and another for nginx.
user@ubuntu:~# sudo pico /etc/network/interfaces
The following is a reference for adding two static IPs. Change the IPs to meet your needs.
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 192.168.1.200
    netmask 255.255.255.0
    gateway 192.168.1.1

auto eth0:1
iface eth0:1 inet static
    address 192.168.1.201
    netmask 255.255.255.0
user@ubuntu:~# sudo /etc/init.d/networking restart
Packages
Here’s a bunch of packages that will set up compilers, version control, Java, MySQL, Apache, PHP, Memcache, Gearman, Samba, and more.
user@ubuntu:~# sudo apt-get install build-essential autotools-dev autoconf \
autoconf2.13 openssh-server ethtool traceroute openjdk-6-jdk \
mysql-server-5.1 bzr subversion subversion-tools ntp ntpdate \
libpcre3-dev libevent-dev automake bison libtool scons g++ \
ncurses-dev libreadline-dev libz-dev libssl-dev libcurl4-openssl-dev \
ruby rubygems libzip-ruby1.8 libzip-ruby1.9.1 python-dev ruby-dev \
libdbus-glib-1-dev uuid-dev libpam0g libpam0g-dev gperf samba valgrind \
libxml2-dev libfreetype6-dev curl libcurl4-openssl-dev \
libjpeg62-dev libpng12-dev sqlite3 libsqlite3-dev git-core \
postgresql postgis gearman libgearman-dev php5 \
libapache2-mod-php5 php5-dev memcached php5-memcached \
php5-curl php5-gd php5-mysql php5-pgsql php-apc \
php5-xdebug php5-fpm libapache2-mod-fastcgi
MySQL
During the package install above, MySQL will prompt you for the root password.
After the packages are installed, we need to allow remote MySQL connections.
user@ubuntu:~# sudo pico /etc/mysql/my.cnf
Comment out the bind-address line.
# bind-address = 127.0.0.1
SSH
Next, you may optionally increase the connection keep-alive interval for remote SSH connections. Timeouts aren’t really an issue when SSH’ing into a local VM, but this setting really helps for remote installs. The append goes through sudo tee because a plain shell redirection would run without root privileges.
user@ubuntu:~# echo "ClientAliveInterval 60" | sudo tee -a /etc/ssh/sshd_config
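Appending to a config file like this adds a duplicate line every time the step is re-run. As a rough sketch of one way around that, the helper below appends a line only if it is not already present. The name append_once is made up, and the demo works on a scratch file rather than the real /etc/ssh/sshd_config.

```shell
#!/bin/sh
# append_once is a hypothetical helper: append a line to a file only if
# that exact line is not already there, so re-running a setup step does
# not pile up duplicates.
append_once() {
    line=$1
    file=$2
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file, not the real sshd_config.
conf=$(mktemp)
printf 'Port 22\n' > "$conf"
append_once "ClientAliveInterval 60" "$conf"
append_once "ClientAliveInterval 60" "$conf"   # second call is a no-op
count=$(grep -c '^ClientAliveInterval 60$' "$conf")
echo "occurrences: $count"
rm -f "$conf"
```

For a root-owned file the write itself needs root privileges, for example by piping through sudo tee -a; a redirection written after sudo echo is still performed by the unprivileged shell.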
Samba
Samba allows me to drag and drop files between my Mac and Linux VM. I personally do not enable/install Samba on production servers.
user@ubuntu:~# sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.orig
user@ubuntu:~# sudo pico /etc/samba/smb.conf
You can add a share such as the following:
[ubuntu]
force user = <your username>
writeable = yes
create mode = 644
path = /home/<your username>
directory mode = 755
force group = <your username>
Then create yourself a Samba user:
user@ubuntu:~# sudo smbpasswd -a <your username>
Apache 2
Apache is mostly configured out of the box, but I like to enable rewrite and SSL so I can test production features.
user@ubuntu:~# sudo a2enmod rewrite
user@ubuntu:~# sudo a2enmod ssl
Since I’m going to run both Apache and nginx, I’m going to bind Apache to eth0.
user@ubuntu:~# sudo pico /etc/apache2/ports.conf
NameVirtualHost 192.168.1.200:80
Listen 192.168.1.200:80
<IfModule mod_ssl.c>
Listen 192.168.1.200:443
</IfModule>
Now we need to add eth0’s IP to the default host:
user@ubuntu:~# sudo pico /etc/apache2/sites-enabled/000-default
<VirtualHost 192.168.1.200:80>
ServerAdmin webmaster@localhost
DocumentRoot /var/www
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
<Directory /var/www/>
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
allow from all
</Directory>
ErrorLog ${APACHE_LOG_DIR}/error.log
LogLevel warn
CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
Restart Apache for the changes to take effect.
user@ubuntu:~# sudo apache2ctl restart
Gearman
By default, Gearman uses memory to store pending jobs in the queue, but I prefer to use MySQL for persistent storage. To do this, first create the queue database and table:
user@ubuntu:~# mysqladmin -uroot -p123123 create gearman
user@ubuntu:~# mysql -uroot -p123123 -e "CREATE TABLE gearman.gearman_queue (
  unique_key VARCHAR(64) NOT NULL,
  function_name VARCHAR(255) NULL,
  priority INT NULL,
  data LONGBLOB NULL,
  PRIMARY KEY (unique_key)
) ENGINE = InnoDB;"
Next, update the defaults file read by the init script to tell Gearman to use the database:
user@ubuntu:~# sudo mv /etc/default/gearman-job-server /etc/default/gearman-job-server.bak
user@ubuntu:~# echo "PARAMS=\"-q libdrizzle --libdrizzle-host=127.0.0.1" \
"--libdrizzle-user=root --libdrizzle-password=123123 --libdrizzle-db=gearman" \
"--libdrizzle-table=gearman_queue --libdrizzle-mysql\"" | sudo tee /etc/default/gearman-job-server
user@ubuntu:~# sudo /etc/init.d/gearman-job-server restart
Gearman PHP Extension
We need to download and install the Gearman PHP extension if we want to write PHP workers or post jobs to the queue.
user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://pecl.php.net/get/gearman-0.7.0.tgz
user@ubuntu:~/src# tar xzf gearman-0.7.0.tgz
user@ubuntu:~/src# rm gearman-0.7.0.tgz package.xml
user@ubuntu:~/src# cd gearman-0.7.0
user@ubuntu:~/src/gearman-0.7.0# phpize
user@ubuntu:~/src/gearman-0.7.0# ./configure
user@ubuntu:~/src/gearman-0.7.0# make
user@ubuntu:~/src/gearman-0.7.0# sudo make install
Next, add the config file to load the Gearman PHP extension:
user@ubuntu:~# echo "extension=gearman.so" | sudo tee -a /etc/php5/conf.d/gearman.ini
memcached PHP Extension
Since we have memcached and the memcached PHP extension installed, let’s use it for storing session data:
user@ubuntu:~/src# echo "session.save_handler = memcached
session.save_path = \"127.0.0.1:11211\"" | sudo tee -a /etc/php5/conf.d/memcached.ini
nginx
nginx is a web server that is really fast. I use nginx as my primary development web server unless I’m running a web app that only works with Apache. You can choose to install nginx from a package, but I like to live life on the bleeding edge, so I’ll be building nginx from source. To install nginx, we need to download the source, compile it, install it, and configure it.
user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://nginx.org/download/nginx-0.8.52.tar.gz
user@ubuntu:~/src# tar xzf nginx-0.8.52.tar.gz
user@ubuntu:~/src# rm nginx-0.8.52.tar.gz
user@ubuntu:~/src# cd nginx-0.8.52
user@ubuntu:~/src/nginx-0.8.52# sudo mkdir /var/lib/nginx
user@ubuntu:~/src/nginx-0.8.52# ./configure \
--sbin-path=/usr/sbin \
--conf-path=/etc/nginx/nginx.conf \
--error-log-path=/var/log/nginx/error.log \
--pid-path=/var/run/nginx.pid \
--lock-path=/var/lock/nginx.lock \
--http-log-path=/var/log/nginx/access.log \
--http-client-body-temp-path=/var/lib/nginx/body \
--http-proxy-temp-path=/var/lib/nginx/proxy \
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi \
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi \
--http-scgi-temp-path=/var/lib/nginx/scgi \
--with-http_stub_status_module
user@ubuntu:~/src/nginx-0.8.52# make
user@ubuntu:~/src/nginx-0.8.52# sudo make install
user@ubuntu:~/src/nginx-0.8.52# sudo pico /etc/init.d/nginx
Here’s the init script that will start nginx for us:
#! /bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/nginx
NAME=nginx
DESC=nginx

test -x $DAEMON || exit 0

case "$1" in
  start)
    echo -n "Starting $DESC: "
    start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
    echo "$NAME."
    ;;
  stop)
    echo -n "Stopping $DESC: "
    start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
    echo "$NAME."
    ;;
  restart|force-reload)
    echo -n "Restarting $DESC: "
    start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
    sleep 1
    start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
    echo "$NAME."
    ;;
  reload)
    echo -n "Reloading $DESC configuration: "
    start-stop-daemon --stop --signal HUP --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
    echo "$NAME."
    ;;
  *)
    echo "Usage: /etc/init.d/$NAME {start|stop|restart|reload|force-reload}" >&2
    exit 1
    ;;
esac

exit 0
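The init script above has no status action. As a rough sketch of how one could be built, the function below checks whether the PID recorded in a pid file belongs to a live process. The name pid_is_running is made up for illustration, and the demo uses the current shell's own PID in a temporary file instead of /var/run/nginx.pid.

```shell
#!/bin/sh
# pid_is_running is a hypothetical helper: succeed if the pid file exists
# and the PID stored in it belongs to a running process.
pid_is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}

# Demo with this shell's own PID instead of /var/run/nginx.pid.
pidfile=$(mktemp)
echo $$ > "$pidfile"
if pid_is_running "$pidfile"; then live=yes; else live=no; fi

: > "$pidfile"   # an empty pid file counts as "not running"
if pid_is_running "$pidfile"; then empty=yes; else empty=no; fi

rm -f "$pidfile"  # so does a missing pid file
if pid_is_running "$pidfile"; then missing=yes; else missing=no; fi

echo "live=$live empty=$empty missing=$missing"
```

Because the nginx master process runs as root, a check like this would need to run as root too (as init scripts normally do), otherwise kill -0 fails on permissions even though the process is alive.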
Now we need to make the init script executable and enable it:
user@ubuntu:~# sudo chmod +x /etc/init.d/nginx
user@ubuntu:~# sudo update-rc.d nginx defaults
user@ubuntu:~# sudo pico /etc/nginx/nginx.conf
Here’s a starter nginx.conf with some basic settings:
user www-data www-data;
worker_processes 2;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    tcp_nodelay on;
    tcp_nopush on;
    keepalive_timeout 65;
    server_name_in_redirect off;
    server_tokens off;
    add_header Strict-Transport-Security max-age=1800;
    add_header X-Frame-Options deny;
    gzip on;
    gzip_buffers 16 8k;
    gzip_comp_level 9;
    gzip_types text/plain text/xml application/x-javascript text/css;
    include /etc/nginx/sites/*;
}
user@ubuntu:~# sudo mkdir /etc/nginx/sites
user@ubuntu:~# sudo pico /etc/nginx/sites/default
Now we need to set up a default host that supports PHP (via PHP-FPM, PHP’s FastCGI Process Manager) and we want the default host to use the eth0:1 IP address:
server {
    listen 192.168.1.201:80 default;
    server_name _;
    root /var/www;
    index index.php;

    location / {
        if (!-e $request_filename) {
            rewrite ^/(.*)$ /index.php?q=$1 last;
            break;
        }
    }

    location ~ \.php$ {
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME /var/www$fastcgi_script_name;
        include fastcgi_params;
    }

    location ~* (\.(htaccess|engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)|code-style\.pl|Entries.*|Repository|Root|Tag|Template)$ {
        deny all;
    }
}
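To make the rewrite rule in the location / block concrete, here is a rough illustration that applies the same pattern and replacement to a couple of request paths using sed. It only mimics the substitution: nginx additionally checks that the requested file does not exist, and it uses PCRE rather than sed's extended regular expressions, though for a pattern this simple the two should agree. The helper name rewrite_path is made up.

```shell
#!/bin/sh
# rewrite_path is a hypothetical helper that mimics
#   rewrite ^/(.*)$ /index.php?q=$1 last;
# using sed. It ignores the "!-e $request_filename" check nginx performs.
rewrite_path() {
    printf '%s\n' "$1" | sed -E 's#^/(.*)$#/index.php?q=\1#'
}

for path in /about /blog/2010/10/setup; do
    printf '%-22s -> %s\n' "$path" "$(rewrite_path "$path")"
done
```

Everything after the leading slash ends up in the q parameter, which is the front-controller style of routing that many PHP applications expect.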
After the config files are good to go, start nginx:
user@ubuntu:~# sudo /etc/init.d/nginx start
Service Names
I also like to add service names so I can see what ports are in use when I run netstat. I added drizzle and Cassandra for fun despite this post not including them.
user@ubuntu:~# sudo cp /etc/services /etc/services.bak
user@ubuntu:~# su
root@ubuntu:~# echo "drizzle 4427/tcp
drizzle 4427/udp
memcached 11211/tcp
memcached 11211/udp
gearmand 4730/tcp
gearmand 4730/udp
fastcgi 9000/tcp
cassandra 9160/tcp" >> /etc/services
root@ubuntu:~# exit
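Tools like netstat typically turn port numbers into names by looking them up in /etc/services. The sketch below does a similar lookup with awk against a scratch file in the same format, to show what the extra entries buy you. The helper name service_name is made up, and the scratch file holds only a few of the entries from this post.

```shell
#!/bin/sh
# service_name is a hypothetical helper: print the service name registered
# for a port/protocol pair in an /etc/services-style file.
service_name() {
    file=$1
    port=$2
    proto=$3
    awk -v want="$port/$proto" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        $2 == want { print $1; exit }      # field 1 = name, field 2 = port/proto
    ' "$file"
}

# Scratch file in /etc/services format with entries from this post.
services=$(mktemp)
cat > "$services" <<'EOF'
# name      port/proto
memcached   11211/tcp
gearmand    4730/tcp
fastcgi     9000/tcp
EOF

gearman_port=$(service_name "$services" 4730 tcp)
unknown_port=$(service_name "$services" 4427 tcp)
echo "4730/tcp -> ${gearman_port:-unknown}"
echo "4427/tcp -> ${unknown_port:-unknown}"
rm -f "$services"
```

A port with no entry simply has no name to show, which is why unlisted services appear as bare numbers.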
Android SDK
The Android SDK is unfortunately not available as a package, so you’ll need to download it from the Android Developer site: http://developer.android.com/sdk/index.html.
user@ubuntu:~# wget http://dl.google.com/android/android-sdk_r07-linux_x86.tgz
user@ubuntu:~# tar xzf android-sdk_r07-linux_x86.tgz
user@ubuntu:~# rm android-sdk_r07-linux_x86.tgz
user@ubuntu:~# sudo mv android-sdk-linux_x86 /usr/local
user@ubuntu:~# sudo find /usr/local/android-sdk-linux_x86 -type d -exec chmod 777 {} \;
You’ll need to add the Android SDK path near the top of your ~/.bash_profile or ~/.bashrc:
export PATH=${PATH}:/usr/local/android-sdk-linux_x86/tools
To manage your Android SDK packages and virtual devices, you’ll need to run the android app:
user@ubuntu:~# android
First go to Available Packages and download version 1.6 and 2.2 Android SDK packages. You can also choose to download the documentation, samples, and Google APIs.


Downloading the package may take several minutes. You don’t have to create a virtual device right now if you are planning on installing Appcelerator’s Titanium platform. You can exit the Android app when you’re done.
Desktop Apps
If you’re running Ubuntu Desktop, there are a couple handy apps I install. The first is Google Chrome and can be directly downloaded from the Google Chrome download page.
I find KCachegrind and GHex to be useful:
user@ubuntu:~# sudo apt-get install kcachegrind ghex
Appcelerator Titanium
Titanium is an awesome platform for developing desktop applications for Linux, Mac OS X, and Windows as well as mobile apps for iPhone and Android. We use Titanium Developer to create Titanium projects. Begin by downloading the 64-bit version of Titanium:
user@ubuntu:~# wget -O titanium.tgz http://www.appcelerator.com/download-linux64
There’s also a 32-bit version available at http://www.appcelerator.com/download-linux32.
Next we unpack Titanium Developer and move it to a safe place:
user@ubuntu:~# tar xzf titanium.tgz
user@ubuntu:~# rm titanium.tgz
Next you need to run the installer by double-clicking the Titanium Developer executable. Run the executable and then click the Install button. You can try installing to /opt/titanium, but you might need root privileges.


Next, there are a few issues with outdated libraries, so we simply delete them:
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgobject-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libglib-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgio-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgthread-2.0.*
Titanium Developer also complains if /bin/java doesn’t exist, so create a quick link:
user@ubuntu:~# sudo ln -s /usr/bin/java /bin/java
Relaunch Titanium Developer and enter your login credentials. If you don’t have a login, you can get a free account.

After signing in, you may notice there are some updates available in the upper right corner of the window. Click in the box and the updates will be downloaded and installed.

Optionally you can create a launcher icon for your GNOME panel. Don’t forget to escape spaces in the command with a backslash!

Finishing Touches
Lastly, I like to re-arrange my desktop to maximize my coding real estate.

Conclusion
That should get you up and running with a neato dev environment. If you need to run SSL, I wrote a post on Creating Self-Signed Certs on Apache 2.2 and Virtual Hosts and Wildcard SSL Certificates with Apache 2.2.
If you find any typos or have additions to suggest, please feel free to sound off in the comments!
OSCON and OpenStack
July 26th, 2010
The past two weeks have been both exciting and extremely busy, first traveling to Austin, TX for the first OpenStack Design Summit, and then back home to Portland, OR for The O’Reilly Open Source Conference (OSCON) and Community Leadership Summit. The events were great in different ways, and there was some overlap with OpenStack since we announced it on the first day of OSCON and created quite a bit of buzz around the conference. I want to comment on a few things that came up during these two weeks.
New Role
I’m now focusing on OpenStack related projects at Rackspace. I’m no longer working on Drizzle, but I will still be involved in the MySQL and database ecosystems through future projects and conferences (see you at OpenSQL Camp). I will also still be working on a couple of Gearman related projects in my spare time. At OSCON I gave two presentations, on Gearman and Drizzle; you can find the slides here.
The Five Steps to Open
One question that came up a few times over the past couple weeks is what the term “Open” means when a business or organization decides to adopt the open source philosophy. It turns out this means many different things to folks, and when an organization decides to go open, they need to make a decision on how open they are willing to be. Here are the various layers we’ve seen over the years:
- Open API – You’ve decided to take the first step to being open and released a well documented API to work with your web service or project. Everything behind the API is still a black-box though.
- Open Core – Beyond the APIs, you’ve decided to release part of the code open source, but you still keep some of the bits proprietary in an attempt to keep a competitive advantage. This is a hot debate lately on whether it is a viable Open Source business model.
- Open Source – You’ve decided keeping some code proprietary doesn’t help, and actually even hurts your project or adoption. You put all of the code out in the open for everyone to see. While everyone can see all of the source code, there still isn’t a whole lot of interaction going on.
- Open Development – Putting the source code out wasn’t enough. You want to enable users and external developers to be able to file bugs, submit patches, and track the development process to see what to expect next. This usually involves running your project on a public project site such as github or Launchpad.
- Open Decision Making – You’ve postponed the inevitable for long enough. Feature requests and bug reports are pouring in, and the community wants to have a say in what gets prioritized. Should we focus only on stability? Performance? New features? Porting to mobile platforms? Let the community decide the direction of the project.
There have been examples of success for organizations who have stopped at each of these steps. Given the proper environment, any can work. My preference is to work on projects that are fully open, where company and organizational boundaries do not exist between developers and users. I’m thrilled to say that we’ve gone all in with OpenStack. We’re hosted on Launchpad and have a governance structure that allows all parties within the community to have a say in the future of the project.
Preventing Vendor Lock-in
During the Cloud Summit at OSCON, there was a debate titled: “Are Open APIs Enough to Prevent Lock-in?”. Most folks came to the conclusion that the answer is “no,” and I agree. While I feel open APIs are necessary, they are by no means sufficient. Even if a project is open source and allows for open development, it probably will not prevent vendor lock-in. The key is to provide some incentive for vendors to adopt and invest resources within a project. Much like customers don’t want vendor lock-in when choosing a platform, vendors do not want project or feature lock-in when choosing the software to power their business. Each vendor who chooses to participate must have the ability to voice their opinion on the direction of APIs, features, and other project priorities. This is why it is critical that any open source project must take all the steps described above to give the project a chance of being adopted and becoming the de facto standard. There is of course no guarantee that adoption and prevention of vendor lock-in will happen, but I see them as necessary steps.
This is another area where OpenStack has done the correct thing. We are planning on having another developer summit in November, and then once every six months after that. All design discussions and decision making will happen in public forums such as the mailing list and IRC. We want all participants in the community to have a chance to respond to topics being discussed, and we believe the more we have, the more successful the project will be. Having many voices allows the project to be more applicable to different environments. For example, Rackspace and NASA have different requirements for their compute architectures, but they also share many components. Through open participation we can ensure all needs are accounted for. Much like the LAMP stack has powered universities, governments, and competing businesses, we hope OpenStack can do the same.
Contributor License Agreement (CLA)
During the past couple of weeks a few folks asked what the CLA was all about. When the foundations of OpenStack were forming, the requirement of having a CLA came up from the legal side. Having been involved with open source projects that had very invasive CLAs, initially I had quite a bit of concern. The CLA is actually quite innocuous, and it does NOT require assignment or dual-ownership of copyright. You are the sole owner of code you contribute. For all intents and purposes it is a signed version of the Apache 2.0 license; the CLA just makes these terms more explicit. The CLA is handled through digital signatures, so no papers, pens, or faxing are required.
Get Involved!
Expect to see more posts on my blog related to OpenStack topics. If you would like to get involved, you can join the IRC channel (#openstack on irc.freenode.net), join the mailing list, or start contributing code! There are even jobs around OpenStack popping up already!
OpenSQLCamp Boston Pages are online
June 23rd, 2010
OpenSQLCamp is less than 4 months away, and I have finally gotten around to updating the site. Special thanks go to Bradley Kuzsmaul and the folks at Tokutek for getting the ball rolling and making the reservation at MIT. Using MIT means that we will have *free* reliable wireless guest access and projects.
OpenSQL Camp is a free unconference for people interested in open source databases (MySQL, SQLite, Postgres, Drizzle), including non-relational databases, database alternatives like NoSQL stores, and database tools such as Gearman. We are not focusing on any one project, and hope to see representatives from a variety of open source database projects attend. As usual I am one of the main organizers of Open SQL Camp (in previous years, Baron Schwartz, Selena Deckelmann and Eric Day have been main organizers too; this year Bradley Kuzsmaul is the other main organizer). The target audience is users and developers, but others are encouraged to attend too. There will be both presentations and hackathons, with plenty of opportunities to learn, contribute, and collaborate!
I have updated the main Boston 2010 page at http://opensqlcamp.org/Events/Boston2010/ with travel and logistics information, including links to:
Register — it’s free and easy, and you can always change your mind later!
Maybe you have an idea for a session you would like to see, or a session you would like to give? If so, you can note it on the sessions page. This will give everyone a sense of what type of presentations will be there. I have started by putting 2 sessions I am willing to give and a third at the bottom for one I’d like to see, to give everyone an idea of both types of descriptions.
Probably the most important link right now is the way we keep OpenSQLCamp free for all attendees – sponsor or donate to the conference! Any donation amount is accepted, and all donations are tax-exempt to the fullest extent of the law. Businesses and organizations will be listed as sponsors if they make a donation of $250 or more, and individuals will be listed as sponsors if they make a donation of $100 or more. More information on sponsor benefits, including where to send a graphic, is available at the link.
There is a preliminary schedule. Until the conference itself, it will only show the agenda: how many rooms there are and what times the presentations are scheduled for. During and after the conference we will update this schedule page with the titles, presenters and links to any notes/videos/audio taken.
If you have any questions, please do not hesitate to ask on the mailing list or by posting a comment here.
Threads with Events
April 20th, 2010
Last week I was surprised to see this paper bubble back up on Planet MySQL. It describes the pros and cons of thread and event based programming for high concurrency applications (like a web server), arguing that thread-based programming is superior if you use an appropriate lightweight threading implementation. I don’t entirely disagree with this, but the problem is that no such library exists that is standard, portable, and useful for all types of applications. We have POSIX threads in the portable Linux/Unix/BSD world, so we need to work with this. Other experimental libraries based on lightweight threads or “fibers” are really interesting as they can maintain your stack without all the normal overhead, but it is hard to get the scheduling correct for all application types. I would even argue that thread and event based programming are actually not all that different; it’s just a matter of how state is maintained (stack vs state variables) and how scheduling is performed.
The comparisons done in that paper also put a C-based web server using a co-routine threading library against a Java based server that depends on the poll() system call. I’m sorry, but this is comparing apples to oranges. First, you’re in the Java VM with a number of runtime components (like garbage collection) which may be getting in the way. Also, the standard poll() system call is not an efficient event-handling mechanism; it’s much better to use epoll or some other kernel-based handling mechanism.
One high-concurrency userland threading implementation I do like is in Erlang. Erlang processes are extremely lightweight and I’ve written apps that depend heavily on them. One interesting application I saw was caching objects where each object got its own Erlang process. This put a whole new spin on cache management, and it looked like it could actually scale reasonably well. The “problem” with Erlang, which may or may not be a problem depending on your requirements, is that there is still a bit of overhead running byte-code in a VM, and that it is a functional language. I love functional programming, but I’ve found it still ties most developers’ heads in knots if they don’t have a reason to use it regularly. For open source projects trying to build a contributor community, it can act as one more hurdle.
So, what is the “best” paradigm?
Back in 2000 some colleagues and I wrote a hybrid thread-event library that would create one event-handler instance per thread, and connections would be spread across the pool of event-handling threads. I believe this gave the best of both worlds, and I saw high throughputs with fairly minimal overhead. I wrote a number of servers based on this architecture, including HTTP, IMAP, POP3, and DNS, and with each server type this model proved to be efficient and scalable. Ultimately the best architecture depends on your application. If you never intend to have many connections, and your application has long-running computations, one-thread-per-connection would probably be best. If you need to handle large numbers of connections and have short, non-blocking request processing, event-based scales extremely well. You can of course create a hybrid of these two and have all connections managed by event threads and asynchronous queues to dedicated processing threads for heavy request handling (this is sort of what I did in the C Gearman Job Server).
There is no single correct answer, so take a look at your options before deciding how to approach your own applications. Don’t be afraid to create hybrids as well. Regardless of which paradigm you choose, concurrent programming can be hard, especially at the lower levels. There have been a number of higher level abstractions to help developers, from new libraries to new languages, but most of these come with a cost in performance or flexibility. When you need to squeeze every bit of performance out of your application, you will most likely end up in C or C++ dealing with these issues directly.
This is actually one of the problems I’m attempting to address with the Scale Stack Event modules. I’m trying to create a healthy level of abstraction on hybrid thread/event based applications so you don’t have any overhead or limitations while a lot of the common headaches are taken care of for you. If you have a need for such a system, get in touch, I’d be interested to talk. Since it is BSD licensed you can use it in any application, including commercial.
MySQL Conference Review
April 17th, 2010
Oracle gave the opening keynote and it went pretty much like I thought it would. Oracle said they will keep MySQL alive. They talked about the new 5.5 release. It was pretty much the same keynote Sun gave last year. Time will tell what Oracle does with MySQL.
The expo hall was sparse. Really sparse. There was a fraction of the booths compared to the past. I don't know why the vendors did not come. Maybe because they don't want to compete with Oracle/Sun? In the past you would see HP or Intel have a booth at the conference. But, with Oracle/Sun owning MySQL, why even try? Or maybe they are not allowed? I don't know. It was just sad.
I did stop by the Maatkit booth and was embarrassed to tell Baron (its creator) I was not already using it. I had heard people talk about it in the past, but never stopped to see what it does. It would have only saved me hours and hours of work over the last few years. Needless to say it is now being installed on our servers. If you use MySQL, just go install Maatkit now and start using it. Don't be like me. Don't wait for years, writing the same code over and over to do simple maintenance tasks.
Gearman had a good deal of coverage at the conference. There were three talks and a BoF. All were well attended. Some people seemed to have an AHA! moment where they saw how Gearman could help their architecture. I also got to sit down with the PECL/gearman maintainers and discuss the recent bug I found that is keeping me from using it.
I spoke about Memcached as did others. Again, there was a BoF. It was well attended and people had good questions about it. There seemed to be some FUD going around that memcached is somehow inefficient or not keeping up with technology. However, I have yet to see numbers or anything that proves any of this. They are just wild claims by people that have something to sell. Everyone wants to be the caching company since there is no "Memcached, Inc.". There is no company in charge. That is a good thing, IMO.
That brings me to my favorite topic for the conference, Drizzle. I wrote about Drizzle here on this blog when it was first announced. At the time MySQL looked like it was moving forward at a good pace. So, I had said that it would only replace MySQL in one part of our stack. However, after what, in my opinion, has been a lack of real change in MySQL, I think I may have changed my mind. Brian Aker echoed this sentiment in his keynote address about Drizzle. He talked about how MySQL AB and later Sun had stopped focusing on the things that made MySQL popular and started trying to be a cheap version of Oracle. That is my interpretation of what he said, not his words.
Why is Drizzle different? Like Memcached and Gearman, there is no "Drizzle, Inc.". It is an Open Source project that is supported by the community. It is being supported by companies like Rackspace who hired five developers to work on it. The code is kept on Launchpad and is completely open. Anyone can create a branch and work on the code. If your patches are good, they will be merged into the main branch. But, you can keep your own branch going if you want to. Unlike the other forks, Drizzle has started over in both the code and the community. I personally see it as the only way forward. It is not ready today, but my money is on Drizzle five or ten years from now.
Expert PHP and MySQL published!
April 15th, 2010
I have been more of a Perl guy. However, I like PHP just as well, and I even prefer its web deployment model, which is much easier (you don't need to modify your httpd config files), so I was interested in the challenge of writing a book on a different language. One of the fruits of this is that Narada, the project from my Perl book, now has a PHP port.
I'm proud of this book. It was harder in some ways to write a book with other people than do it all myself, despite far fewer pages. However, the end product took advantage of all our strengths.
Topics covered in this book are (but not limited to!):
* PHP and MySQL techniques every programmer should know
* Advanced PHP concepts such as using iterators, making classes behave like functions, using true lambda functions and closures
* MySQL storage engines
* Using the information schema
* Improving performance through caching - using memcached to add a caching layer to your application
* Writing UDFs
* Writing PHP extensions
* Using the Memcached Functions for MySQL (UDFs)
* Full text search - installing, configuring, and using Sphinx in a PHP application
* Multi-tasking in PHP using Gearman. Narada is used as an example application demonstrating how to put many of the concepts of the book into an application
* Using Apache rewrite rules
* HTTP-based authentication
All-in-all, this is a great book that I hope will benefit PHP/MySQL or any other web developers who need a source of expert information!
I want to thank Andrew, Ronald, Wiley (Bob Elliot, Maureen Spears), Trond Norbye and Eric Day (who tech-edited the book!), and all the others who helped, about whom I could write a book in itself (see my credits in the book!). Also, I want to thank the team at NorthScale, my employer, for the chance to work with a team of experts I could refer to while working on this book.
You can buy this book from any major bookseller, and from Wiley at http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470563125.html
Gearman Releases and Talks at the MySQL Conference
April 6th, 2010
I spent some time this weekend fixing up the Gearman MySQL UDFs (user defined functions) and fixing a few bugs in the Gearman Server. You can find links to the new releases on the Gearman website. The UDFs now use Monty Taylor’s pandora-build autoconf files instead of the old fragile autoconf setup that relied on pkgconfig.
If you are attending the MySQL Conference & Expo next week and want to learn more about Gearman, be sure to check out one of the three sessions Giuseppe Maxia and I are giving:
- Getting started with Gearman for MySQL
- Boosting Database Performance with Gearman
- Gearman MySQL hacks, or Everything you wanted to do with a database server and you never dared to hope
Hope to see you there!
Logging with MySQL
March 24th, 2010
SQL is really good at retrieving a set of data based on a key or range of keys, whereas NoSQL products are really good at writing things and retrieving one item from storage. When looking at redoing our architecture a few years ago to be more scalable, I had to consider these two issues. For what it is worth, the NoSQL market was not nearly as mature then as it is now, so my choices were much more limited. In the end, we decided to stick with MySQL. It turns out that a primary or unique key lookup on a MySQL/InnoDB table is really fast. It is sort of like having a key/value storage system. And I can still do range-based queries against it.
But, back to Dathan's problem: clicks. We store clicks at dealnews. Lots of clicks. We also store views. We store more views than we do clicks. So, lots of views and lots of clicks. (Sorry for the vague numbers, company secrets and all. We are a top 1,000 Compete.com site during peak shopping season.) And we do it all in MySQL. And we do it all with one server. I should disclose we are deploying a second server, but it is more for high availability than processing power. Like Dathan, we only use about the last 24 hours of data at any given time. There are three keys for us doing logging like this in MySQL.
Use MyISAM
MyISAM supports concurrent inserts. Concurrent inserts means that inserts can add rows to the end of a table while selects are being performed on other parts of the data set. This is exactly the use case for our logging. There are caveats with range queries as pointed out by the MySQL Performance Blog.
Rotating tables
MySQL (and InnoDB in particular) really sucks at deleting rows. Like, really sucks. Deleting causes locks. Bleh. So, we never delete rows from our logging tables. Instead, we rotate the tables nightly. RENAME TABLE is a (near) atomic operation in MySQL. So, we just create a new table and swap it in:
create table clicks_new like clicks;
rename table clicks to clicks_2010032500001, clicks_new to clicks;
Tada! We now have an empty table for today's clicks. We then drop any table with a date stamp more than x days old. Drops are fast; we like drops.
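The nightly rotation can be sketched as a small helper that builds the statements and picks the expired tables. This is only an illustration, not the script we run: the function names are made up, and it assumes each rotated table carries a plain YYYYMMDD suffix, which is simpler than the stamp shown above.

```python
from datetime import date, timedelta


def rotation_statements(table, today):
    """Return the two SQL statements that swap in an empty table."""
    stamp = today.strftime("%Y%m%d")  # assumed suffix format: YYYYMMDD
    return [
        f"CREATE TABLE {table}_new LIKE {table}",
        f"RENAME TABLE {table} TO {table}_{stamp}, {table}_new TO {table}",
    ]


def tables_to_drop(table, existing, today, keep_days):
    """Pick rotated tables whose date stamp is more than keep_days old."""
    cutoff = (today - timedelta(days=keep_days)).strftime("%Y%m%d")
    prefix = table + "_"
    expired = []
    for name in existing:
        suffix = name[len(prefix):]
        # Only names like clicks_YYYYMMDD qualify; the live table and
        # clicks_new are skipped.
        if name.startswith(prefix) and len(suffix) == 8 and suffix.isdigit():
            if suffix < cutoff:  # YYYYMMDD strings sort in date order
                expired.append(name)
    return expired
```

Each name returned by tables_to_drop would then go into a DROP TABLE statement. The list of existing tables is expected to come from the database itself.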
For querying these tables, we use UNION. It works really well. We just issue a SHOW TABLES LIKE 'clicks%' and union the query across all the tables. Works like a charm.
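As a rough sketch of that query building: take the table names that SHOW TABLES returned and repeat the same SELECT against each one. The function name and the column names in the usage note are hypothetical. Note that the sketch joins with UNION ALL rather than plain UNION, which is my choice for the example: plain UNION removes duplicate rows, and two identical log rows are usually two real events.

```python
def union_query(tables, columns, where):
    """Build one UNION ALL query that runs the same SELECT on every table."""
    if not tables:
        raise ValueError("need at least one table")
    # Table names are expected to come from SHOW TABLES, not from user input.
    selects = [f"SELECT {columns} FROM {t} WHERE {where}" for t in tables]
    return "\nUNION ALL\n".join(selects)
```

Calling union_query(["clicks", "clicks_20100324"], "deal_id", "deal_id = 42") gives two SELECT statements joined by a single UNION ALL. In real code the WHERE values should be bound as query parameters instead of being pasted into the string.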
Gearman
So, I get a lot of flak at work for my outright lust for Gearman. It is my new duct tape. When you have a scalability problem, there is a good chance you can solve it with Gearman. So, how does this help with logging to MySQL? Well, sometimes, MySQL can become backed up with inserts. It happens to the best of us. So, instead of letting that pile up in our web requests, we let it pile up in Gearman. Instead of having our web scripts write to MySQL directly, we have them fire Gearman background jobs with the logging data in them. The Gearman workers can then write to the MySQL server when it is available. Under normal operating conditions, that is in near real time. But, if the MySQL server does get backed up, the jobs just queue up in Gearman and are processed when the MySQL server is available.
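The pattern itself is easy to show without Gearman. The sketch below uses an in-memory deque as a stand-in for the Gearman job queue, so it illustrates the idea only and uses no Gearman API. The class name, the row format, and the use of ConnectionError to signal a backed-up database are all assumptions made for the example.

```python
from collections import deque


class LogQueue:
    """In-memory stand-in for a Gearman background job queue."""

    def __init__(self):
        self.jobs = deque()

    def submit(self, row):
        # Web request side: enqueue and return; the database is never touched.
        self.jobs.append(row)

    def drain(self, write_row):
        """Worker side: write queued rows until empty or until a write fails."""
        written = 0
        while self.jobs:
            try:
                write_row(self.jobs[0])
            except ConnectionError:
                break  # database is backed up; the job stays queued for later
            self.jobs.popleft()
            written += 1
        return written
```

A job is removed from the queue only after its write succeeds, so a slow or unavailable database delays the logging but does not lose it, and the web request never waits on the insert.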
BONUS! Insert Delayed
This is our old trick from before we used Gearman. MySQL (MyISAM) has a neat feature where you can have inserts delayed until the table is available. The query is sent to the MySQL server, which answers the client with success immediately. This means your web script can continue on and not get blocked waiting for the insert. But MySQL will only queue up so many before it starts erroring out. So, it is not as foolproof as a job processing system like Gearman.
Summary
To log with MySQL:
- Use MyISAM with concurrent inserts
- Rotate tables daily and use UNION to query
- Use delayed inserts with MySQL or a job processing agent like Gearman
PS: You may be asking, "Brian, what about Partitioned Tables?" I asked myself that before deploying this solution. More importantly, in IRC I asked Brian Aker about MySQL partitioned tables. I am paraphrasing, but he said that if I ever think I might alter that table, I would not trust it with the partitions in MySQL. So, that kind of turned me off of them.
