Archive for the ‘Gearman’ Category

Drizzling from the Rackspace Cloud

March 8th, 2010

Since I left Sun back in January, folks have been asking what was next. I’m happy to say that I’m going to continue hacking on open source projects like Drizzle and Gearman, but now at the Rackspace Cloud. Not only will I be there, but I get to continue working closely with a few of the amazing Drizzle hackers who have also joined, including Monty Taylor, Jay Pipes, Stewart Smith, and Lee Bieber.

Why Rackspace Cloud? Late last year I was considering what I wanted to do next with the Oracle acquisition looming near, and this was one of the options that presented itself. Rackspace had been a supporter of Drizzle from early on by offering virtual machines to develop and test on, and when talking to some folks more closely, something really hit home. Rackspace provides first-class service and “fanatical” support – they are not a software company. One might ask why an open source software developer would be interested in a company that doesn’t create software or vice-versa, and the answer is that Rackspace wants to find ways to offer the best possible service now and into the future. What better way than to help develop the next generation of service software and get a jump start into integrating this into their architecture? Both the open source community and Rackspace win.

Another thing I learned while talking with Rackspace is that one of their core principles is transparency. This applies to both customers and employees, and anyone within an open source community can appreciate this. The more I learned about the company and the folks within it, the more impressed I was at the lack of internal barriers or “need-to-know” information. One of Drizzle’s core goals is also transparency, from discussing design decisions on public mailing lists and IRC, to having the entire project management infrastructure hosted out in the open at Launchpad.

What does this mean for the Drizzle project? It means continued support for a number of core developers, more infrastructure for development, and most importantly in my eyes, more context. One of the Drizzle tag-lines is “A Lightweight SQL Database for Cloud and Web,” so what better place to develop a database designed for the cloud than on one of the fastest growing cloud platforms. We’ll get a detailed look at the demands, get feedback from cloud customers, and have the perfect test bed for offering new services. We’ll also be able to work closely with a top-notch group of DBAs, developers, and sysadmins in one of the most demanding service architectures out there. This invaluable context will help the Drizzle developers make more informed decisions moving forward, which also means better software for the community.

Personally, this also means getting back to my hosting roots. Before Sun, I worked at Concentric for almost 10 years in a clustered hosting environment. I’m very familiar with many of the multi-tenant scalability concerns Rackspace has, and I’m excited to be working in this type of environment again. We’ve already been working closely with the MySQL DBAs at Rackspace to learn what the biggest pain points are for a multi-tenant architecture, and we’ll be taking steps to address these as it will help anyone wanting to run Drizzle in a cloud-like environment. Drizzle’s modular architecture has already proved useful, as some of these concerns are easily answered with “oh, we have a plugin point for that.”

I’m excited, this is going to be a fun ride.



MySQL Conf & Drizzle Dev Day

February 18th, 2010

I’m glad to announce that we’ll be having a Drizzle developer day again this year on the Friday after the MySQL Conference! Be sure to sign up and add any topic ideas you may have so we know what folks are interested in. Space is limited!

While at the MySQL Conference, I’ll be speaking with Monty Taylor on “Using Drizzle.” This will take a non-developer approach to the project, so everyday DBAs and web developers should find this interesting. I’ll also be teaming up with Giuseppe Maxia to talk about Gearman in three sessions. These include:

We’re also going to have a combo Drizzle/Gearman booth in the expo hall, so be sure to stop by and chat. See you there!



Gearman meets MySQL Cluster (NDBAPI)

January 20th, 2010
After a discussion with my colleague Stephane Varoqui we decided to see how Gearman and the NDBAPI could be used together. The result of the POC was a Gearman worker and a couple of clients (clients and workers use Google Protocol Buffers as the protocol). The worker can:
  • set/get/delete records on a single table in MySQL Cluster using the primary key
  • set/get/delete "any" type. Types cannot be added dynamically; new types are added at compile time.
  • support the following SQL data types: (UNSIGNED) INTEGER, (UNSIGNED) BIGINT, CHAR, VARCHAR/VARBINARY
  • support the following Google Protocol Buffer scalars: int32, uint32, int64, uint64, string, bytes.
  • not handle many errors for the time being
and a client that can
  • create a message and send it to the Gearman Job Server
  • be written in C++, Java, or Python (subject to which languages Google Protocol Buffers supports)
  • receive (deserialize) the data.
So basically this is a new, albeit simple, connector to MySQL Cluster! Hopefully someone will find it useful.

The code can be downloaded here and some short instructions are here, and if you guys out there think this is usable, then it might make it to Launchpad. Let me know!

Here is some information about what has been done and how to use it.

First you have to create the relational tables (engine=ndb). I will use a 'Person' type for the rest of the examples and persist it to the database. I have created the following relational table:
CREATE TABLE `Person` (
`id` int(11) NOT NULL,
`name` varchar(128) DEFAULT NULL,
`message` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
The relational tables need to be translated to Google Protocol Buffers:
> cat proto/Person.proto
message Person {
required int32 id = 1; //PRIMARY KEY attributes are 'required'
optional string name = 2;
optional string message = 3;
}
All columns in the relational table must exist in the protocol buffer definition.

There is a union proto buffer called NdbMessage.proto that contains all the proto buffers that can be sent between the client and worker:
> cat proto/NdbMessage.proto
// import the relevant protos that can be in
// the NdbMessage
import "Person.proto";

message NdbMessage {
enum Type { Person=1;}
// Identifies which field is filled in.
required Type type = 1;
// list of possible protos comes here:
optional Person person = 2;
}

The proto files then need to be run through the protocol buffer compiler:
> /usr/local/bin/protoc --proto_path=proto --cpp_out=`pwd` proto/Person.proto proto/NdbMessage.proto
# this generates .h and .cc files for Proto buffer files.
There are three clients for this, one for each operation (set,get,delete).

In the ndbapi_set_client.cpp (the clients are based on the reverse_client.cpp in gearman) we invoke the 'ndbapi_set' function, that will be executed by the worker:

#include "NdbMessage.pb.h"
#include "Person.pb.h"

.
/**instantiate a NdbMessage object and associate a Person to it*/
NdbMessage m;
m.set_type(NdbMessage_Type_Person);
Person * p=m.mutable_person();
/* I must set all fields for now */
p->set_id(1);
p->set_name("Johan Andersson");
p->set_message("hello world, my first insert");

string s;
m.SerializeToString(&s);

...

const char * data = s.data();
result= (char*)gearman_client_do(&client,
"ndbapi_set",
NULL,
(void *)data,
s.size(), /* length of the serialized NdbMessage in s (p is a pointer to the Person only) */
&result_size,
&ret);

The worker (ndbapi_worker.cpp) registers three functions:
ret=gearman_worker_add_function(&worker, "ndbapi_get", 0, ndbapi_get,NULL);
ret=gearman_worker_add_function(&worker, "ndbapi_set", 0, ndbapi_set,NULL);
ret=gearman_worker_add_function(&worker, "ndbapi_delete", 0, ndbapi_delete,NULL);

And when the worker receives the function call to 'ndbapi_set' it has to deserialize the received message into a NdbMessage:

static void *ndbapi_set(gearman_job_st *job,
void *context,
size_t *result_size,
gearman_return_t *ret_ptr)
{
/** receive message and convert into c++ string
* construct the wanted object (dataObject) from parsing c++ string.
*/
const void * rawmessage;
rawmessage= gearman_job_workload(job);
string s((char*)rawmessage, gearman_job_workload_size(job));

NdbMessage dataObject;
if(! dataObject.ParseFromString(s))
{
*ret_ptr= GEARMAN_WORK_FAIL;
return NULL;
}
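
A short aside on why the worker builds its string from a pointer plus an explicit size: serialized Protocol Buffers data is binary and can contain zero bytes, so treating it as a C string would silently truncate it. The sketch below illustrates this with plain std::string and no Gearman or protobuf calls. The payload bytes and the helper names make_payload(), length_by_strlen(), and rebuild() are made up for the illustration.

```cpp
#include <cstring>
#include <string>

// Stand-in for a serialized message: seven bytes of binary data with a
// zero byte in second position (hypothetical bytes, not real protobuf output).
inline std::string make_payload()
{
  return std::string("\x08\x00\x12\x03" "abc", 7);
}

// What a receiver would see if it trusted strlen() instead of an explicit size.
inline std::size_t length_by_strlen(const std::string &payload)
{
  return std::strlen(payload.c_str());
}

// Rebuild the payload from a raw pointer plus size, the same way the worker
// above builds its string from the job workload and the workload size.
inline std::string rebuild(const void *raw, std::size_t size)
{
  return std::string(static_cast<const char *>(raw), size);
}
```

With this payload, strlen() stops at the embedded zero byte and reports a length of 1, while rebuild() with the real size returns all seven bytes. The same reasoning applies on the client side: the length passed to the job server should be the length of the serialized string.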

The worker then looks at the type of the message and gets the underlying object (in this case Person):

google::protobuf::Message * message;
switch(dataObject.type())
{
case NdbMessage_Type_Person:
{
message= (google::protobuf::Message*)&dataObject.person();
reflection = (google::protobuf::Reflection *)message->GetReflection();
descriptor = (google::protobuf::Descriptor*)message->GetDescriptor();
}
break;
/*
case NdbMessage_Type_MyType:
{
// the myType() .. is the name of the field in MyType.proto:
// MyType myType = ;
message= (google::protobuf::Message*)&dataObject.myType();
reflection = (google::protobuf::Reflection *)message->GetReflection();
descriptor = (google::protobuf::Descriptor*)message->GetDescriptor();
}
break;
*/
default:
cout << "unknown type: " << ... ;
*ret_ptr= GEARMAN_WORK_FAIL;
...
/* ... the insert was successful */
*result_size=0;
*ret_ptr= GEARMAN_SUCCESS;
return NULL;
}

In order to add a new type, you need to add a new 'case' to handle the type and how to get that object from the NdbMessage object (dataObject).
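
As a rough illustration of that step, here is a self-contained sketch of the same tag-and-switch pattern without protobuf. Everything in it is a stand-in: Message, Person, Car, the NdbMessage struct, and select_payload() are hypothetical and only mimic the shape of the generated classes. Car plays the role of a newly added type.

```cpp
#include <string>

// Hypothetical stand-ins for the generated protobuf classes.
struct Message
{
  virtual ~Message() {}
  virtual std::string table() const = 0;
};

struct Person : Message
{
  std::string table() const override { return "Person"; }
};

// The "new" type being added in this sketch.
struct Car : Message
{
  std::string table() const override { return "Car"; }
};

// Union-style wrapper: a type tag plus one member per possible payload.
struct NdbMessage
{
  enum Type : int { PersonType = 1, CarType = 2 };
  Type type;
  Person person;
  Car car;
};

// Returns the payload selected by the type tag, or nullptr for an unknown tag.
const Message *select_payload(const NdbMessage &m)
{
  switch (m.type)
  {
  case NdbMessage::PersonType:
    return &m.person;
  case NdbMessage::CarType:  // the one extra 'case' a new type needs
    return &m.car;
  default:
    return nullptr;
  }
}
```

Once the switch hands back a generic Message pointer, the code after it does not need to know which concrete type it is working with.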

The worker loops over all fields in the received Proto Message, creates a transaction in the NDBAPI, and executes it. Thus this part is agnostic to the type you give it, as long as the following is true:
  • The relational table only uses (UNSIGNED) INTEGER, (UNSIGNED) BIGINT, CHAR, VARCHAR/VARBINARY data types.
  • The .proto definition contains all columns in the relational table
  • The .proto file marks the PRIMARY KEY of the relational table as 'required'
  • For 'ndbapi_set' you need to set all columns in the table (I will fix that as soon as possible)
The data is then persisted in the table (currently the worker expects all tables to be stored in the 'test' database):
mysql> select * from Person;
+----+-----------------+------------------------------+
| id | name            | message                      |
+----+-----------------+------------------------------+
|  1 | Johan Andersson | hello world, my first insert |
+----+-----------------+------------------------------+
1 row in set (0.00 sec)


The worker can also handle 'get' requests, here using the get_client:
> ./get_client  1
name: Johan Andersson
message: hello world, my first insert
And there is also a client that does deletes (delete_client):
> ./delete_client  1
Delete successful: id=1
Summary/Conclusions
  • What are the performance implications of using Proto Buffer's reflection mechanism?
  • Proto Buffer only works for C++, Java, and Python - currently no support for PHP.
  • Is it better to use something else than Proto Buffers for this?
  • Gearman was super-easy to install so thanks for that!
  • Google Protocol Buffers was super-easy to install so thanks for that!
  • The worker also needs to be extended to support range searches and to make use of the batching interface so that it is possible to persist either many types or many instances of a type in a batch.
  • NO-SQL -- YES-NDB !


Multi dimensional cubes in MySQL through Gearman

January 20th, 2010

MySQL cubes with Gearman

I gave two presentations about Gearman at Linux.conf.au. As part of the preparation for these talks, I created several sample applications. One of them, about remote replication administration, I will cover in a separate post. The most amazing one, which I cover here, is a quick and painless solution for multiple level crosstabs in MySQL.

Some background is needed. Crosstabs (also called data cubes or pivot tables) have been one of my favorite hacks for a long time. In 2001 I wrote an article about a simple way of doing single level crosstabs. A few years later, I developed a Perl module that generates multiple levels of data cubes in almost any database system. Since then, I have received countless requests to convert this module to PHP, Python, Java, and I have always declined, for lack of time or abilities.
In the following years, I tackled the same problem using MySQL Proxy and some SQL hacks. Neither attempt was completely satisfactory. The options offered by the Perl module are simply too hard to replicate in any other system.
When I started using Gearman, I realized that I could use the original Perl module through a Gearman worker, without converting to any other language. The idea is to write a simple worker that accepts some parameters and runs the Perl module to return a crosstab query to the client. The query being the most complicated thing to generate, the architecture could look like the image below.

To take the idea one step further, I used the Gearman UDF for MySQL, which makes the crosstab function available at the SQL level, thus being transparent no matter which programming language the client uses, and without need of using the Gearman API.

In this scenario, all you need to do is query the worker (through the UDF) with a simple string of parameters.

mysql> set @q = (select gman_do('crosstab',
'from=all_personnel;op=sum salary;rows=country;cols=gender'));

mysql> prepare q from @q; execute q;
+---------+-------+-------+-------+
| country | m     | f     | total |
+---------+-------+-------+-------+
| Germany | 16000 | 11000 | 27000 |
| Italy   |  6000 |  6000 | 12000 |
| UK      | 10500 |  NULL | 10500 |
| zzzz    | 32500 | 17000 | 49500 |
+---------+-------+-------+-------+

To make this work, what's missing is the worker. You can try the sample crosstab worker from MySQL Forge.
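
The sample worker itself is written in Perl and is not reproduced here. Purely to illustrate the shape of the parameter string used above, the following C++ sketch splits a "key=value;key=value" string into a map. The parse_params() helper is hypothetical, and the real worker may parse its input differently.

```cpp
#include <map>
#include <sstream>
#include <string>

// Split "key=value;key=value" into a map of parameters.
// Hypothetical helper for illustration only.
std::map<std::string, std::string> parse_params(const std::string &spec)
{
  std::map<std::string, std::string> params;
  std::istringstream in(spec);
  std::string pair;

  // Each ';'-separated fragment is expected to look like "key=value".
  while (std::getline(in, pair, ';'))
  {
    std::string::size_type eq = pair.find('=');
    if (eq == std::string::npos)
      continue;  // ignore fragments without '='
    params[pair.substr(0, eq)] = pair.substr(eq + 1);
  }
  return params;
}
```

For the string used in the query above, this yields four entries: from, op, rows, and cols, with op holding the two-word value "sum salary".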



Using ini files for PHP application settings

January 20th, 2010
At dealnews we have three tiers of servers. First are our development servers, then staging and finally production. The complexity of the environment increases at each level. On a development server, everything runs on the localhost: mysql, memcached, etc. At the staging level, there is a dedicated MySQL server. In production, it gets quite wild with redundant services and two data centers.

One of the challenges of this is where and how to store the connection information for all these services. We have done several things in the past. The most common thing is to store this information in a PHP file. It may be per server or there could be one big file like:

<?php

if(DEV){
    $server = "localhost";
} else {
    $server = "10.1.1.25";
}

?>


This gets messy quickly. Option two is to deploy a single file that has the settings in a PHP array. And that is a good option. But, we have taken that one step further using some PHP ini trickeration. We use ini files that are loaded at PHP's startup and therefore the information is kept in PHP's memory at all times.

When compiling PHP, you can specify the --with-config-file-scan-dir to tell PHP to look in that directory for additional ini files. Any it finds will be parsed when PHP starts up. Some distros (Gentoo I know) use this for enabling/disabling PHP extensions via configuration. For our uses we put our custom configuration files in this directory. FWIW, you could just put the above settings into php.ini, but that is quite messy, IMO.

To get to this information, you can't use ini_get() as you might think.  No, you have to use get_cfg_var() instead. get_cfg_var() returns the value a setting had in php.ini or any other .ini file when PHP was started. ini_get() will only return values that are registered by an extension or the PHP core. Likewise, you can't use ini_set() on these variables. Also, get_cfg_var() will always reflect the initial value from the ini file and not anything changed with ini_set().

So, let's look at an example.

; db.ini
[myconfig]
myconfig.db.mydb.db     = mydb
myconfig.db.mydb.user   = user
myconfig.db.mydb.pass   = pass
myconfig.db.mydb.server = host


This is our ini file. The group in the brackets is just for looks. It has no impact on our usage. Because this is parsed along with the rest of our php.ini, it needs a unique namespace within the ini scope. That is what myconfig is for. We could have used a DSN style here, but it would have required more parsing in our PHP code.

<?php

/**
 * Creates a MySQLi instance using the settings from ini files
 *
 * @author     Brian Moon 
 * @copyright  1997-Present dealnews.com, Inc.
 *
 */

class MyDB {

    /**
     * Namespace for my settings in the ini file
     */
    const INI_NAMESPACE = "myconfig";

    /**
     * Creates a MySQLi instance using the settings from ini files
     *
     * @param   string  $group  The group of settings to load.
     * @return  object
     *
     */
    public static function init($group) {

        static $dbs = array();

        if(!is_string($group)) {
            throw new Exception("Invalid group requested");
        }

        if(empty($dbs[$group])){

            $prefix = MyDB::INI_NAMESPACE.".db.$group";

            $db   = get_cfg_var("$prefix.db");
            $host = get_cfg_var("$prefix.server");
            $user = get_cfg_var("$prefix.user");
            $pass = get_cfg_var("$prefix.pass");

            $port = get_cfg_var("$prefix.port");
            if(empty($port)){
                $port = null;
            }

            $sock = get_cfg_var("$prefix.socket");
            if(empty($sock)){
                $sock = null;
            }

            $dbs[$group] = new MySQLi($host, $user, $pass, $db, $port, $sock);

            if(!$dbs[$group] || $dbs[$group]->connect_errno){
                throw new Exception("Invalid MySQL parameters for $group");
            }
        }

        return $dbs[$group];

    }

}

?>


We can now call MyDB::init("mydb") and get a mysqli object that is connected to the database we want. No file IO was needed to load these settings except when the PHP process started initially.  They are truly constant and will not change while this process is running.

Once this was working, we created separate ini files for our different datacenters. That is now simply configuration information just like routing or networking configuration. No more worrying in code about where we are.

We extended this to all our services like memcached, gearman or whatever. We keep all our configuration in one file rather than having lots of them. It just makes administration easier. For us it is not an issue as each location has a unique setting, but every server in that location will have the same configuration.

Here is a more realistic example of how we set up our files.

[myconfig.db]
myconfig.db.db1.db         = db1
myconfig.db.db1.server     = db1hostname
myconfig.db.db1.user       = db1username
myconfig.db.db1.pass       = db1password

myconfig.db.db2.db         = db2
myconfig.db.db2.server     = db2hostname
myconfig.db.db2.user       = db2username
myconfig.db.db2.pass       = db2password

[myconfig.memcache]
myconfig.memcache.app.servers    = 10.1.20.1,10.1.20.2,10.1.20.3
myconfig.memcache.proxy.servers  = 10.1.20.4,10.1.20.5,10.1.20.6

[myconfig.gearman]
myconfig.gearman.workload1.servers = 10.1.20.20
myconfig.gearman.workload2.servers = 10.1.20.21



Moving On

January 11th, 2010

Friday was my last day at Sun Microsystems, and today is the first day at my new job (location coming soon). I’ve had a great time at Sun, and thank them for all the opportunities given to me there. I’ll be doing mostly the same work at the new gig, working on projects like Drizzle, but with a slightly different focus. For the most part my day-to-day won’t change much.

Right now I’m focusing on libdrizzle again and am implementing the prepared statement API, cleaning up the MySQL protocol support a little, and also implementing the new Drizzle client/server protocol. I’ll continue to work on Gearman as well, especially where it is relevant to Drizzle. I also need to start blogging again with specific topics in the projects I’m working on; I’ve been fairly quiet lately.

I’ll be in New Zealand next week at Linux Conf AU (yes, it’s not in AU this year). I have a talk on Gearman, and it looks like I’ll also be helping out with the Drizzle talk. It will be really nice to escape the Portland, OR winter for a bit. :)



Speaking, speaking, speaking: Dubai-Sydney-Wellington

January 5th, 2010


(*)

From January 12th to 27th I will be traveling to the Southern Hemisphere and speaking at two user groups and two conferences.
The schedule (see below) is almost scary. I will be talking about Partitioning (Dubai and Wellington), MySQL Sandbox (Sydney and Wellington), Gearman (Wellington), and some general topics now and then.

The complete schedule and location follows:

(*) I know. The world map is upside down. That is how you would see it if people in the Southern Hemisphere had started drawing maps before the ones in the North.


Non-blocking State Machines

October 8th, 2009

If you’ve ever done any non-blocking programming (usually for socket I/O), you’ve probably had to come up with a non-trivial state machine to handle all the places where everything can pause. Say you’re reading an application level packet from a socket, and half way through the read() system call it screams EAGAIN. You need to stop, save any state, and exit out of whatever chain of functions got you there so the calling application can regain control. I’m going to explain a few techniques I’ve come up with over the years, each with their strengths and weaknesses, and I hope this will spur some conversation about what other folks have done. I’m fairly happy with how I handle these state machines now, but I’m always looking for a more succinct way of handling things. Please share your thoughts!

Switch Statements

The obvious way to handle non-blocking I/O is with one or more switch statements. Say we need to check the status of something by sending a request over a TCP connection, possibly connecting to the remote host first, and then reading the response. Here is a bit of pseudo-code that demonstrates how this could work (ignoring some error cases, efficient buffer handling, and non-blocking connect cases):

int check_status(struct connection *con)
{
  int ret;

  switch (con->state)
  {
  case CONNECTION_STATE_NONE:
    getaddrinfo(...);
    con->fd = socket(...);
    /* Fall through to next state. */

  case CONNECTION_STATE_CONNECT:
    ret = connect(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_CONNECT;
      return WAIT_FOR_WRITE;
    }
    /* Fall through to next state. */

  case CONNECTION_STATE_REQUEST:
    ret = write(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_REQUEST;
      return WAIT_FOR_WRITE;
    }
    /* Fall through to next state. */

  case CONNECTION_STATE_RESPONSE_HEADER:
    ret = read(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_RESPONSE_HEADER;
      return WAIT_FOR_READ;
    }
    /* Save header. */
    /* Fall through to next state. */

  case CONNECTION_STATE_RESPONSE:
    ret = read(con->fd, ...);
    if (ret == -1 && errno == EAGAIN)
    {
      con->state = CONNECTION_STATE_RESPONSE;
      return WAIT_FOR_READ;
    }
    /* Save response. */

    /* Set this here so we skip the connect state next time around. */
    con->state = CONNECTION_STATE_REQUEST;
    break;
  }

  return 0;
}

The first thing you may cringe at is the fall-through cases in the switch statement. The alternative is to set a new state at the end of each case, break, and then re-evaluate the switch with that new state (wrapping the above switch in a while loop). I skipped that version since it adds a few extra operations that are not necessary. The above machine may be a bit clunky, but it works for simple cases. But what about when you have more complex states that have loops, non-sequential state execution, or nested switch statements? The above has the potential to grow into an unwieldy mess of code. For example, if we need to read multiple responses back in the last state above, this could be expanded to:

int check_status(struct connection *con)
{
  int ret;

  switch (con->state)
  {
...
    /* Fall through to next state. */

  case CONNECTION_STATE_RESPONSE:
    while (1)
    {
      if (con->need_header)
      {
        ret = read(con->fd, ...);
        if (ret == -1 && errno == EAGAIN)
        {
          con->state = CONNECTION_STATE_RESPONSE;
          return WAIT_FOR_READ;
        }
        /* Save header. */
        con->need_header = false;
      }

      ret = read(con->fd, ...);
      if (ret == -1 && errno == EAGAIN)
      {
        con->state = CONNECTION_STATE_RESPONSE;
        return WAIT_FOR_READ;
      }
      /* Save response. */
      if (last_response)
        break;
      con->need_header = true;
    }

    /* Set this here so we skip the connect state next time around. */
    con->state = CONNECTION_STATE_REQUEST;
    break;
  }

  return 0;
}

As you can see, another state variable has been added as a boolean (con->need_header). What if responses were not made up of a simple header and body? What if there are more nested levels? We can add more switch statements and start breaking this up into nested functions to make it more readable, but the complexity is still there. For non-trivial non-blocking state machines, this approach does not scale.
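For comparison, the loop-and-break alternative to fall-through mentioned earlier can be sketched as follows. This is only an illustration under stated assumptions: the real system calls are replaced by a made-up fake_io() helper that "would block" on the first attempt of every operation and succeeds on the retry, and the state names and struct fields are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum { DONE = 0, WAIT_FOR_READ = 1, WAIT_FOR_WRITE = 2 };

enum state
{
  STATE_CONNECT,
  STATE_REQUEST,
  STATE_RESPONSE_HEADER,
  STATE_RESPONSE,
  STATE_DONE
};

struct connection
{
  enum state state;
  bool retrying; /* simulation only: true after a "would block" */
  int steps;     /* number of operations that completed */
};

/* Simulated non-blocking call: fails once, then succeeds on the retry. */
static bool fake_io(struct connection *con)
{
  if (!con->retrying)
  {
    con->retrying = true;
    return false;
  }
  con->retrying = false;
  con->steps++;
  return true;
}

int check_status(struct connection *con)
{
  /* Every case sets the next state and breaks; the loop re-runs the switch. */
  while (con->state != STATE_DONE)
  {
    switch (con->state)
    {
    case STATE_CONNECT:
      if (!fake_io(con))
        return WAIT_FOR_WRITE;
      con->state = STATE_REQUEST;
      break;

    case STATE_REQUEST:
      if (!fake_io(con))
        return WAIT_FOR_WRITE;
      con->state = STATE_RESPONSE_HEADER;
      break;

    case STATE_RESPONSE_HEADER:
      if (!fake_io(con))
        return WAIT_FOR_READ;
      con->state = STATE_RESPONSE;
      break;

    case STATE_RESPONSE:
      if (!fake_io(con))
        return WAIT_FOR_READ;
      con->state = STATE_DONE;
      break;

    case STATE_DONE:
      break;
    }
  }

  /* Skip the connect state next time around. */
  con->state = STATE_REQUEST;
  return DONE;
}
```

Since con->state always names the case that is about to run, nothing needs to be saved before returning on a would-block. The price is one extra assignment and one extra trip through the switch per state, which is the overhead the fall-through version avoids.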

Nested switch/while Statements

Early on in my C years I stumbled upon Duff’s Device. At first I was confused: is that even valid C? Oh, it compiles! Then I was offended. Eventually it clicked and I appreciated the cleverness of the code: interleaving while/for/if with switch statements. I went off to rewrite my non-blocking state machines with this new trick:

int check_status(struct connection *con)
{
  int ret;

  switch (con->state)
  {
...
    /* Fall through to next state. */

    while (1)
    {
  case CONNECTION_STATE_RESPONSE_HEADER:
      ret = read(con->fd, ...);
      if (ret == -1 && errno == EAGAIN)
      {
        con->state = CONNECTION_STATE_RESPONSE_HEADER;
        return WAIT_FOR_READ;
      }
      /* Save header. */
      /* Fall through to next state. */

  case CONNECTION_STATE_RESPONSE:
      ret = read(con->fd, ...);
      if (ret == -1 && errno == EAGAIN)
      {
        con->state = CONNECTION_STATE_RESPONSE;
        return WAIT_FOR_READ;
      }
      /* Save response. */
      if (last_response)
        break;
    }

    /* Set this here so we skip the connect state next time around. */
    con->state = CONNECTION_STATE_REQUEST;
    break;
  }

  return 0;
}

Yup, that’s correct. Shove that while loop right in there. Think of it this way: write your state machine as you would if it were blocking, nesting as deep as you need with for/if/while statements. Next, put a switch around the entire thing, and toss a case statement in wherever something could hit a non-blocking condition (regardless of scope or nesting level). Some folks have commented this feels a lot like using gotos, but I disagree. With switch, you have structure, and compiler warnings for when things are missing (like a case). Sure, it may not be the most elegant solution, but you avoid the nested switch statements and multiple state variables. I still use this today for some things (like inside of the Gearman C server and library), but only when they are fairly simple state machines.
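If you want to convince yourself that jumping into the middle of a loop like this really works, here is a small self-contained sketch. It assumes a simulated read (the made-up fake_read() helper "would block" on the first attempt of each read and succeeds on the retry), and the state names and counters are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum { DONE = 0, WAIT_FOR_READ = 1 };
enum state { STATE_START, STATE_HEADER, STATE_BODY };

struct connection
{
  enum state state;
  bool retrying;      /* simulation only: true after a "would block" */
  int responses_left; /* responses still expected (at least one) */
  int headers;        /* headers read so far */
  int bodies;         /* bodies read so far */
};

/* Simulated non-blocking read: fails once, then succeeds on the retry. */
static bool fake_read(struct connection *con)
{
  if (!con->retrying)
  {
    con->retrying = true;
    return false;
  }
  con->retrying = false;
  return true;
}

int read_responses(struct connection *con)
{
  switch (con->state)
  {
  case STATE_START:
    while (1)
    {
  case STATE_HEADER: /* resume point inside the loop */
      if (!fake_read(con))
      {
        con->state = STATE_HEADER;
        return WAIT_FOR_READ;
      }
      con->headers++;
      /* Fall through to next state. */

  case STATE_BODY: /* resume point inside the loop */
      if (!fake_read(con))
      {
        con->state = STATE_BODY;
        return WAIT_FOR_READ;
      }
      con->bodies++;
      if (--con->responses_left == 0)
        break; /* leaves the while loop, not the switch */
    }

    con->state = STATE_START;
    break;
  }

  return DONE;
}
```

With responses_left set to 2, the function returns WAIT_FOR_READ four times (once per simulated read) and then DONE, resuming each time at the case label it saved. One detail worth a comment in real code: the break after the last response applies to the innermost enclosing construct, which here is the while loop.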

Function Pointer Stack

Last year I started writing a non-blocking C library for MySQL. When I heard about Drizzle, I decided to focus my effort there (while keeping the MySQL compatibility), and renamed it to libdrizzle. Today it supports the Drizzle protocol and the most common parts of the MySQL protocol. The protocols for these projects are a bit more involved, so when I began writing the library, I went through a few iterations of state machines and didn’t find anything I was happy with. After some brainstorming I came up with an alternative design that I usually refer to as a “function pointer stack” or “callback stack”. Please let me know if you have seen something like this and point me to the proper name. :)

This works by creating a traditional stack (LIFO structure) of function pointers. When a state needs to be executed, push it on; when a state is complete, it can pop itself off. It’s similar to a program execution stack, but maintained in user space, and state is kept so you know where things left off. Still not getting it? Let’s look at the code. First, one quick note about function pointer typedefs:

typedef int (state_fn)(struct connection *con);

These are not required of course, but they make things a bit more legible. This is saying ‘state_fn’ is now a type that names a function with the given signature, so a ‘state_fn *’ is a pointer to such a function. It’s a lot easier than having to write the function signature out every time you have a variable of this type.
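As a quick illustration of what the typedef buys you, the sketch below declares the same pointer three ways. The struct contents, example_state(), run_all(), and the state_fn_ptr spelling are made up for this example; only the state_fn typedef comes from the text above.

```c
#include <assert.h>

struct connection
{
  int calls; /* counts how often a state function ran */
};

/* A function type, as used in this post. */
typedef int (state_fn)(struct connection *con);

/* A common alternative: typedef the pointer type directly. */
typedef int (*state_fn_ptr)(struct connection *con);

static int example_state(struct connection *con)
{
  con->calls++;
  return 0;
}

/* All three variables have the same type: pointer to such a function. */
int run_all(struct connection *con)
{
  state_fn *a = example_state;                   /* function typedef plus '*' */
  state_fn_ptr b = example_state;                /* pointer typedef */
  int (*c)(struct connection *) = example_state; /* no typedef at all */

  return a(con) + b(con) + c(con);
}
```

Which spelling to prefer is a matter of taste. Keeping the ‘*’ visible at the point of use, as the function typedef does, makes it obvious that the variable is a pointer.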

Now, the code:

/* Forward declaration so the typedef can refer to the struct. */
struct connection;
typedef int (state_fn)(struct connection *con);

struct connection
{
  ...
  state_fn *state_stack[STACK_SIZE];
  int state_current;
};

/* These functions operate on the function pointer stack. */
static inline bool state_none(struct connection *con)
{
  return con->state_current == 0;
}

static inline void state_push(struct connection *con, state_fn *function)
{
  assert(con->state_current < STACK_SIZE);
  con->state_stack[con->state_current]= function;
  con->state_current++;
}

static inline void state_pop(struct connection *con)
{
  con->state_current--;
}

int state_run(struct connection *con)
{
  int ret;

  while (!state_none(con))
  {
    ret= con->state_stack[con->state_current - 1](con);
    if (ret)
      return ret;
  }

  return 0;
}

/* Forward declarations so the states can reference each other. */
int start_state(struct connection *con);
int connect_state(struct connection *con);
int request_state(struct connection *con);
int response_header_state(struct connection *con);
int response_state(struct connection *con);

/* These are the states that can be pushed onto the stack. */
int start_state(struct connection *con)
{
  getaddrinfo(...);
  con->fd = socket(...);
  state_pop(con);
  state_push(con, connect_state);
  return 0;
}

int connect_state(struct connection *con)
{
  int ret;

  ret = connect(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_WRITE;
  state_pop(con);
  return 0;
}

int request_state(struct connection *con)
{
  int ret;

  if (not connected)
  {
    /* Connect first; this state runs again once the connect states pop. */
    state_push(con, start_state);
    return 0;
  }

  ret = write(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_WRITE;
  state_pop(con);
  state_push(con, response_header_state);
  return 0;
}

int response_header_state(struct connection *con)
{
  int ret;

  ret = read(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_READ;
  /* Save header. */
  state_pop(con);
  state_push(con, response_state);
  return 0;
}

int response_state(struct connection *con)
{
  int ret;

  ret = read(con->fd, ...);
  if (ret == -1 && errno == EAGAIN)
    return WAIT_FOR_READ;
  /* Save response. */
  state_pop(con);
  if (have_more_responses)
    state_push(con, response_header_state);
  return 0;
}

/* Here is a function you would make public in the API to start the state machine. */
int check_status(struct connection *con)
{
  /* If we are coming back into this after a blocking event,
     make sure we don't push a new state on again. */
  if (state_none(con))
    state_push(con, request_state);

  return state_run(con);
}

As you can see, we still start in the check_status function, but push a state if there is no state and then go into our run loop. You can follow along the various functions (sort of like a choose your own adventure book) but eventually you should end up with an empty stack. When this happens, the state_run() function returns 0 and the call is complete.
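Since the listing above elides the actual I/O, here is a compact, self-contained sketch of the same flow that you can compile and step through. It is a simulation under stated assumptions, not the libdrizzle code: the system calls are replaced by a made-up fake_io() helper that "would block" on the first attempt of each operation and succeeds on the retry, start_state is folded into connect_state, and the connected, retrying, responses_left, and responses_read fields are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_SIZE 8
enum { DONE = 0, WAIT_FOR_READ = 1, WAIT_FOR_WRITE = 2 };

struct connection;
typedef int (state_fn)(struct connection *con);

struct connection
{
  bool connected;
  bool retrying;      /* simulation only: true after a "would block" */
  int responses_left; /* responses still expected */
  int responses_read; /* responses fully read */
  state_fn *state_stack[STACK_SIZE];
  int state_current;
};

/* Simulated non-blocking call: fails once, then succeeds on the retry. */
static bool fake_io(struct connection *con)
{
  if (!con->retrying)
  {
    con->retrying = true;
    return false;
  }
  con->retrying = false;
  return true;
}

static void state_push(struct connection *con, state_fn *function)
{
  assert(con->state_current < STACK_SIZE);
  con->state_stack[con->state_current++] = function;
}

static void state_pop(struct connection *con)
{
  assert(con->state_current > 0);
  con->state_current--;
}

static int response_header_state(struct connection *con);

static int connect_state(struct connection *con)
{
  if (!fake_io(con))
    return WAIT_FOR_WRITE;
  con->connected = true;
  state_pop(con);
  return DONE;
}

static int response_state(struct connection *con)
{
  if (!fake_io(con))
    return WAIT_FOR_READ;
  con->responses_read++;
  state_pop(con);
  if (--con->responses_left > 0)
    state_push(con, response_header_state);
  return DONE;
}

static int response_header_state(struct connection *con)
{
  if (!fake_io(con))
    return WAIT_FOR_READ;
  state_pop(con);
  state_push(con, response_state);
  return DONE;
}

static int request_state(struct connection *con)
{
  if (!con->connected)
  {
    /* Connect first; we run again once connect_state pops itself. */
    state_push(con, connect_state);
    return DONE;
  }
  if (!fake_io(con))
    return WAIT_FOR_WRITE;
  state_pop(con);
  state_push(con, response_header_state);
  return DONE;
}

int check_status(struct connection *con)
{
  /* Only push the initial state when we are not resuming. */
  if (con->state_current == 0)
    state_push(con, request_state);

  while (con->state_current > 0)
  {
    int ret = con->state_stack[con->state_current - 1](con);
    if (ret != DONE)
      return ret;
  }

  return DONE;
}
```

Starting from a zeroed struct with responses_left set to 2, the caller sees WAIT_FOR_WRITE twice (connect, then request), WAIT_FOR_READ four times (two headers and two bodies), and finally DONE with an empty stack. Each re-entry lands directly in the function that returned the wait code.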

This may be a bit overkill for such a simple state machine, but as your state execution flow becomes non-sequential (random jumps, recursion, …) the power and flexibility of this design become apparent. And what? No switch statements? As far as performance is concerned, you may have more function calls, but you are eliminating jumps (those nested if/switches). For example, if your state is five levels deep and you need to keep pausing and returning to that point, you hit all those switch statements every time. With the above approach? You jump directly into the function you left off in. I’m not sure which one is faster in general (it really depends on the application), but the cost of switches vs function calls will be insignificant compared to what normal applications are actually doing (like system calls for I/O).

I have working C and C++ examples of what a complete state machine looks like. There are also some micro-benchmarking numbers in there comparing C vs C++ (you take a hit in C++ due to inheritance, but that cost is fairly insignificant).

Thoughts?
Another choice I didn’t bother to mention is to have one thread per connection and let it block, but that doesn’t scale. What methods have you used to solve this? Do the nested switch/whiles offend you? Are the function pointer stacks elegant or spaghetti code?

