Archive for the ‘Drizzle’ Category

HOWTO screw up launching a free software project

Июль 27th, 2010

Josh Berkus gave a great talk at linux.conf.au 2010 (the CFP for linux.conf.au 2011 is open until August 7th) entitled “How to destroy your community” (lwn coverage). It was a simple, patented, 10 step program, finely homed over time to have maximum effect. Each step is simple and we can all name a dozen companies that have done at least three of them.

Simon Phipps this past week at OSCON talked about Open Source Continuity in practice – specifically mentioning some open source software projects that were at Sun but have since been abandoned by Oracle and different strategies you can put in place to ensure your software survives, and check lists for software you use to see if it will survive.

So what can you do to not destroy your community, but ensure you never get one to begin with?

Similar to destroying your community, you can just make it hard: “#1 is to make the project depend as much as possible on difficult tools.

#1 A Contributor License Agreement and Copyright Assignment.

If you happen to be in the unfortunate situation of being employed, this means you get to talk to lawyers. While your employer may well have an excellent Open Source Contribution Policy that lets you hack on GPL software on nights and weekends without a problem – if you’re handing over all the rights to another company – there gets to be lawyer time.

Your 1hr of contribution has now just ballooned. You’re going to use up resources of your employer (hey, lawyers are not cheap), it’s going to suck up your work time talking to them, and if you can get away from this in under several hours over a few weeks, you’re doing amazingly well – especially if you work for a large company.

If you are the kind of person with strong moral convictions, this is a non-starter. It is completely valid to not want to waste your employers’ time and money for a weekend project.

People scratching their own itch, however small is how free software gets to be so awesome.

I think we got this almost right with OpenStack. If you compare the agreement to the Apache License, there’s so much common wording it ends up pretty much saying that you agree you are able to submit things to the project under the Apache license.  This (of course) makes the entire thing pretty redundant as if people are going to be dishonest about submitting things under the Apache licnese there’s no reason they’re not going to be dishonest and sign this too.

You could also never make it about people – just make it about your company.

#2 Make it all about the company, and never about the project

People are not going to show up, do free work for you to make your company big, huge and yourself rich.

People are self serving. They see software they want only a few patches away, they see software that serves their company only a few patches away. They see software that is an excellent starting point for something totally different.

I’m not sure why this is down at number three… it’s possibly the biggest one for danger signs that you’re going to destroy something that doesn’t even yet exist…

#3 Open Core

This pretty much automatically means that you’re not going to accept certain patches for reasons of increasing your own company’s short term profit. i.e. software is no longer judged on technical merits, but rather political ones.

There is enough politics in free software as it is, creating more is not a feature.

So when people ask me about how I think the OpenStack launch went, I really want people to know how amazing it can be to just not fuck it up to begin with. Initial damage is very, very hard to ever undo. The number of Open Source software projects originally coming out of a company that are long running, have a wide variety of contributors and survive the original company are much smaller than you think.

PostgreSQL has survived many companies coming and going around it, and is stronger than ever. MySQL only has a developer community around it almost in spite of the companies that have shepherded the project. With Drizzle I think we’ve been doing okay – I think we need to work on some things, but they’re more generic to teams of people working on software in general rather than anything to do with a company.


PlanetMySQL Voting: Vote UP / Vote DOWN

OSCON and OpenStack

Июль 26th, 2010

The past two weeks have been both exciting and extremely busy, first traveling to Austin, TX for the first OpenStack Design Summit, and then back home to Portland, OR for The O’Reilly Open Source Conference (OSCON) and Community Leadership Summit. The events were great in different ways, and there was some overlap with OpenStack since we announced it on the first day of OSCON and created quite a bit of buzz around the conference. I want to comment on a few things that came up during these two weeks.

New Role

I’m now focusing on OpenStack related projects at Rackspace. I’m no longer working on Drizzle, but I will still be involved in the MySQL and database ecosystems through future projects and conferences (see you at OpenSQL Camp). I will also still be working on a couple of Gearman related projects in my spare time. At OSCON I gave two presentations on Gearman and Drizzle, you can find the slides here.

The Five Steps to Open

One question that came up a few times over the past couple weeks is what the term “Open” means when a business or organization decides to adopt the open source philosophy. It turns out this means many different things to folks, and when an organization decides to go open, they need to make a decision on how open they are willing to be. Here are the various layers we’ve seen over the years:

  • Open API – You’ve decided to take the first step to being open and released a well documented API to work with your web service or project. Everything behind the API is still a black-box though.
  • Open Core – Beyond the APIs, you’ve decided to release part of the code open source, but you still keep some of the bits proprietary in an attempt to keep a competitive advantage. This is a hot debate lately on whether it is a viable Open Source business model.
  • Open Source – You’ve decided keeping some code proprietary doesn’t help, and actually even hurts your project or adoption. You put all of the code out in the open for everyone to see. While everyone can see all of the source code, there still isn’t a whole lot of interaction going on.
  • Open Development – Putting the source code out wasn’t enough. You want to enable users and external developers to be able to file bugs, submit patches, and track the development process to see what to expect next. This usually involves running your project on a public project site such as github or Launchpad.
  • Open Decision Making – You’ve postponed the inevitable for long enough. Feature requests and bug reports are pouring in, and the community wants to have a say in what gets prioritized. Should we focus only on stability? Performance? New features? Porting to mobile platforms? Let the community decided the direction of the project.

There have been examples of success for organizations who have stopped at each of these steps. Given the proper environment, any can work. My preference is to work on projects that are fully open, where company and organizational boundaries do not exist between developers and users. I’m thrilled to say that we’ve gone all in with OpenStack. We’re hosted on Launchpad and have a governance structure that allows all parties within the community to have a say in the future of the project.

Preventing Vendor Lock-in

During the Cloud Summit at OSCON, there was a debate titled: “Are Open APIs Enough to Prevent Lock-in?”. Most folks came to the conclusion that the answer is “no,” and I agree. While I feel open APIs are necessary, they are by no means sufficient. Even if a project is open source and allows for open development, it probably will not prevent vendor lock-in. The key is to provide some incentive for vendors to adopt and invest resources within a project. Much like customers don’t want vendor lock-in when choosing a platform, vendors do not want project or feature lock-in when choosing the software to power their business. Each vendor who chooses to participate must have the ability to voice their opinion on the direction of APIs, features, and other project priorities. This is why it is critical that any open source project must take all the steps described above to give the project a chance of being adopted and becoming the de facto standard. There is of course no guarantee that adoption and prevention of vendor lock-in will happen, but I see them as necessary steps.

This is another area where OpenStack has done the correct thing. We are planning on having another developer summit in November, and then once every six months after that time. All design discussions and decision making will happen in public forums such as the mailing list and IRC. We want all participants in the community to have a chance to respond to topics being discussed, and we believe the more we have, the more successful the project will be. Having many voices allows the project to be more applicable to different environments. For example, Rackspace and NASA have different requirements for their compute architectures, but they also share many components as well. Through open participation we can ensure all needs are accounted for. Much like the LAMP stack has powered universities, governments, and competing business, we hope OpenStack can do the same.

Contributor License Agreement (CLA)

During the past couple of weeks a few folks asked what the CLA was all about. When the foundations of OpenStack were forming, the requirement of having a CLA came up from the legal side. Having been involved with open source projects that had very invasive CLAs, initially I had quite a bit of concern. The CLA is actually quite innocuous, and it does NOT require assignment or dual-ownership of copyright. You are the sole owner of code you contribute. For all intents and purposes it is a signed version of the Apache 2.0 license, the CLA just makes these terms more explicit. The CLA is handled through digital signatures, so no papers, pens, or faxing is required.

Get Involved!

Expect to see more posts on my blog related to OpenStack topics. If you would like to get involved, you can join the IRC channel (#openstack on irc.freenode.net), join the mailing list, or start contributing code! There are even jobs around OpenStack popping up already!


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Server Protocol Bug

Июль 25th, 2010

A few months ago I wrote a tool that verified MySQL and Drizzle protocol compatibility, along with testing for all sorts of edge cases. In analyzing protocol command interactions in mysqld, I found that the MySQL server will happily read an infinite amount of data if you exceed the maximum packet size while using a special sequence of protocol packets. The reasoning behind this behavior is so that the server can be polite and flush your data before sending a “max packet exceeded” error message, but perhaps there should be a limit to one’s politeness. What’s more interesting is that you can do this during the client handshake packet without authorization, so anyone could do this to any open MySQL server. The appropriate thing to do here would be to set some maximum limit of data to read and force a connection close when it is reached, otherwise your bandwidth and CPU could be consumed (essentially a DoS attack).

This portion of code was ripped out entirely in Drizzle, so there are no risks there. I submitted this as a bug to MySQL and MariaDB back in February and they both have patches available to fix this as well. You can find the bug here and a patch here. If you have publicly accessible MySQL or MariaDB servers, you probably want to upgrade binaries or patch this.


PlanetMySQL Voting: Vote UP / Vote DOWN

PBMS version 0.5.015 beta has been released.

Июль 23rd, 2010
A new release of the PrimeBase Media Streaming daemon is now available for download at
http://www.blobstreaming.org .

This release doesn't contain any major new features just some bug fixes and a lot of house keeping changes.

If you look at the download section on http://www.blobstreaming.org you will see that there are now more packages that can be downloaded. I have separated out different client side components from the PBMS project and created separate launchPad projects for each one. You can see them listed in the "Related Links" side panel to the right of this post.

  • The "PBMS Client Library" facilitates communication with the PBMS daemon. This library is independent of the PBMS daemon's host server and can be used to communicate with a PBMS daemon hosted by the MySQL or Drizzle database servers.
  • The "PBMS PHP extension" is a PHP module that enables PHP to connect directly to the PBMS engine and stream BLOB data in and out of a MySQL or Drizzle database.
  • The "Streaming enabled JDBC Driver" is a streaming enable version of the standard MySQL Connector/J, JDBC Driver.
One minor new feature that was added is that the PBMS HTTP server how understands the HTTP "range" header that can be used to request a section of BLOB data. The Client Library and PHP extension have both been updated with pbms_get_data_range() functions.


PlanetMySQL Voting: Vote UP / Vote DOWN

BlitzDB Crash Safety and Auto Recovery

Июль 22nd, 2010

Crash Safety is a big deal in the database league. Lack of durability can lead to all sorts of terrible things upon a catastrophic event. Many projects, especially in the so called NoSQL world compromises crash safety in return for higher QPS. The argument there is that the availability of the overall system should be accomplished by replication since a database server can’t be rescued if the physical disk breaks. I happen to agree with this philosophy but I am also aware that this isn’t a correct answer for everyone. So, what will I do with BlitzDB?

Several relational database hackers have pointed out that BlitzDB isn’t any safer than MyISAM since it doesn’t guarantee crash safety. This is currently true but I plan on making BlitzDB much safer than MyISAM by providing following features.

  1. Auto Recovery Routine (startup option)
  2. Tokyo Cabinet’s Transaction API (table-specific option)

The second feature above would actually guarantee BlitzDB to be crash safe (especially combined with auto recovery) but I won’t get into depth in this post since this topic deserves a blog post of it’s own. Let me just state that this feature will be provided in a form like this:

CREATE TABLE t1 (
  a int PRIMARY KEY,
  b varchar(256)
) ENGINE = BLITZDB, CRASH_SAFE;

From here on, I’ll cover how I plan on hacking auto recovery in BlitzDB.

Auto Recovery Challenges

As I blogged a while back, recovering Tokyo Cabinet is relatively simple. However, this is not a sufficient solution in BlitzDB since the data file (hash database that actually holds the rows) and the index file(s) are independent from each other. That is, the likelihood of the data file and the index file(s) to be inconsistent is very high after a crash. So, how can we hack on this? Pretty simple.

Indexes aren’t Important at Recovery Phase

Because BlitzDB logically separates the data file and it’s indexes, index files aren’t that important. If a server crash had occurred, BlitzDB could delete the index file(s) and recompute them from the data file. Needless to say, this process would involve a lot of random access and computation but it would not dominate the time space of the system since it’s a one-time cost. This approach however has one flaw in it such that the index files can’t be recomputed if the data file is broken or is unrecoverable.

Therefore to guarantee crash safety, BlitzDB must ensure that the data file is unbreakable. This is precisely where Tokyo Cabinet’s Transaction API comes in. I’m planning on using it to protect the data file from breaking. If the data file is protected, the table can be rescued. Simple!

So, that’s what I have in mind for making BlitzDB a safer engine. Unfortunately I can’t start hacking on it immediately since I have several bugs to fix first. Nevertheless I’m looking forward to start hacking on it. This challenge should be quite fun to tackle.


PlanetMySQL Voting: Vote UP / Vote DOWN

A tale of a bug…

Июль 22nd, 2010

So I sometimes get asked if we funnel back bug reports or patches back to MySQL from Drizzle. Also, MariaDB adds some interest here as they are a lot closer (and indeed compatible with) to MySQL. With Drizzle, we have deviated really quite heavily from the MySQL codebase. There are still some common areas, but they’re getting rarer (especially to just directly apply a patch).

Back in June 2009, while working on Drizzle at Sun, I found a bug that I knew would affect both. The patch would even directly apply (well… close, but I made one anyway).

So the typical process of me filing a MySQL bug these days is:

  • Stewart files bug
  • In the next window of Sveta being awake, it’s verified.

This happened within a really short time.

Unfortunately, what happens next isn’t nearly as awesome.

Namely, nothing. For a year.

So a year later, I filed it in launchpad for MariaDB.

So, MariaDB is gearing up for a release, it’s a relatively low priority bug (but it does have a working, correct and obvious patch), within 2 months, Monty applied it and improved the error checking around it.

So MariaDB bug 588599 is Fix Committed (June 2nd 2010 – July 20th 2010), MySQL Bug 45377 is still Verified (July 20th 2009 – ….).

(and yes, this tends to be a general pattern I find)

But Mark says he gets things through… so yay for him.2


PlanetMySQL Voting: Vote UP / Vote DOWN

At OSCON

Июль 20th, 2010

I’m at OSCON this week. Come say hi and talk Drizzle, Rackspace, cloud, photography, vegan food or brewing.


PlanetMySQL Voting: Vote UP / Vote DOWN

linux.conf.au 2011 CFP Open!

Июль 15th, 2010

Head on over to http://lca2011.linux.org.au/ and check it out!

You’ve got until August 7th to put in a paper, miniconf, poster or tutorial.

Things I’d like to see come from my kinda world:

  • topics on running large numbers of machines
  • latest in large scale web infrastructure
  • latest going on in the IO space: (SSD, filesystems, SSD as L2 cache)
  • Applications of above technologies and what it means for application performance
  • Scalable and massive tcp daemons (i.e. Eric should come talk on scalestack)
  • exploration of pain points in current technologies and discussion on ways to fix them (from people really in the know)
  • A Hydra tutorial: starting with stock Ubuntu lucid, and exiting the tutorial with some analysis running on my project.
  • Something that completely takes me off guard and is awesome.

I’d love to see people from the MySQL, Drizzle and Rackspace worlds have a decent presence. For those who’ve never heard of/been to an LCA before: we reject at least another whole conference worth of papers. It’s the conference on the calendar that everything else moves around.


PlanetMySQL Voting: Vote UP / Vote DOWN

PBMS is in the Drizzle tree!

Июль 8th, 2010
If you haven't already heard PBMS is now part of the Drizzle tree.

Getting it there was a fair bit of work but not as much as I had thought it would be. The process of getting it to work with Drizzle and running it thorough Hudson has improved the code a lot. It is amazing what some compilers will catch that others will let by. I am now a firm believer in treating all compiler warnings as errors.

I am just in the process of updating the PBMS plugin so that it will build and install the PBMS client library (libpbmscl.so) as well as the plugin. The PBMS client library is a standalone library that can be used to access the PBMS daemon weather it is running as part of MySQL or Drizzle. So a PBMS client library built with Drizzle can be used to access a PBMS daemon running as part of MySQL and vice-versa.

There is also PHP extension for PBMS that is basically just a wrapper for the library. Currently this is part of the PBMS project on launchpad but I am working on getting it into pecl. The PHP extension has a set of test cases with it which is what I use to test PBMS with.

If anybody is interested in taking on the task of creating a python module for PBMS I would be happy to provide what ever help you may need. I think it would just be a wrapper around the PBMS client library almost identical to the PHP extension. I would recoment just taking the PHP extension and converting it.

Now that I have PBMS in Drizzle I am planning on getting replication working with PBMS. I have decided that the best way to do this is to do a bit of work rearranging how PBMS uses the BLOB URLs to reference the BLOB data. The URLs already contain a server, database, and table id so the plan is to change things so that PBMS can handle BLOB URLS from other servers being inserted or referenced. Once this is working then I will automatically have replication and 95% of what is needed to support clustered servers.

I will go into detail on this in a later posting which will include pretty pictures.

Barry

PlanetMySQL Voting: Vote UP / Vote DOWN

PBMS in Drizzle

Июль 8th, 2010

Some of you may have noticed that blob streaming has been merged into the main Drizzle tree recently. There are a few hooks inside the Drizzle kernel that PBMS uses, and everything else is just in the plug in.

For those not familiar with PBMS it does two things: provide a place (not in the table) for BLOBs to be stored (locally on disk or even out to S3) and provide a HTTP interface to get and store BLOBs.

This means you can do really neat things such as have your BLOBs replicated, consistent and all those nice databasey things as well as easily access them in a scalable way (everybody knows how to cache HTTP).

This is a great addition to the AlsoSQL arsenal of Drizzle. I’m looking forward to it advancing and being adopted (now much easier that it’s in the main repository)


PlanetMySQL Voting: Vote UP / Vote DOWN