Archive for the ‘BLOBS’ Category

New PBMS version

March 24th, 2011
A new version of PBMS for drizzle has been pushed up to launchpad:

drizzle_pbmsV2

I have rewritten PBMS and changed the way that BLOBs are referenced in order to make PBMS more flexible and to fix some of its limitations. I have also removed some of the more confusing parts of the code and reorganized it in an attempt to make it easier for people to find their way around it.

So apart from some cosmetic changes, what is different?

Maybe the best answer would be to say what hasn't changed: the user and engine API and the way in which the actual data is stored on the disk remain pretty much unchanged, but everything else has changed.

The best place to start is with the BLOB URL, the old URL looked like this:
"~*1261157929~5-128-6147b252-0-0-37"
the new URL looks like this:
"pbmsAdaVAQCAAAAAAAAAANaVAQCAAAAAAAAAAG30qzsGAAAAAAAAAAEAAACAAAAAAAAAAAEAAAAAAAAA"
which is obviously a lot more intuitive.  :)

OK, maybe it is bigger and uglier, but it contains a lot more information. It is actually a base64 URL encoding of a data structure containing information about the BLOB that makes it universally locatable across different PBMS daemons running locally or remotely.
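
To make the idea concrete, here is a minimal Python sketch of packing a small locator structure and encoding it as URL-safe base64. The field layout (server id, database id and repository index as little-endian integers) and the helper names are assumptions made purely for illustration; the real PBMS URL carries more fields and its actual layout is not described in this post.

```python
import base64
import struct

# Hypothetical layout, NOT the real PBMS structure: server id (u32),
# database id (u32), repository index (u64), all little-endian.
LOCATOR = struct.Struct("<IIQ")
PREFIX = "pbms"  # the new-style URL shown above starts with this tag


def encode_locator(server_id, db_id, repo_index):
    """Pack the three values and return a 'pbms' + base64url string."""
    raw = LOCATOR.pack(server_id, db_id, repo_index)
    return PREFIX + base64.urlsafe_b64encode(raw).decode("ascii")


def decode_locator(url):
    """Reverse encode_locator(); raises ValueError on a foreign URL."""
    if not url.startswith(PREFIX):
        raise ValueError("not a PBMS-style URL")
    raw = base64.urlsafe_b64decode(url[len(PREFIX):])
    return LOCATOR.unpack(raw)
```

Because the identifying values travel inside the URL itself, any daemon that can decode the string knows where the BLOB originally came from without consulting a central catalogue.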

How is this done?

When a BLOB is uploaded to a PBMS daemon the URL generated for it contains, among other things, the PBMS daemon's server id as well as the database ID and the BLOB's repository index value. These 3 values remain with the BLOB for its life regardless of what server or database it may eventually end up in. This allows you to insert a BLOB URL from one database into another database, possibly on a different server, and the PBMS engine will be able to use the URL to look up the BLOB. If it cannot find it in the database's repository then PBMS will automatically fetch the BLOB from the source server or database.

When fetching the BLOB, PBMS uses the current server id, database id and index, which are also stored in the URL.

This means that the following will work:

insert into foo.blob_table1 select * from bar.ablob_table;
insert into foo.blob_table2 select * from bar.ablob_table;

The first insert will copy the BLOBs from the BLOB repository for database 'bar' into the repository for database 'foo'.

The second insert will recognize that the BLOBs already exist in foo's BLOB repository and just add references to them.

The same would hold true if database 'foo' was on a different server on the other side of the world.
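
The copy-then-reference behaviour of the two inserts can be modelled with a toy repository. This is only a sketch of the idea, not PBMS code: the class, the key tuple (origin server id, origin database id, repository index) and the fetch callback are all hypothetical names chosen for the example.

```python
class BlobRepository:
    """Toy model of one database's BLOB repository (illustrative only)."""

    def __init__(self):
        self.blobs = {}     # origin key -> BLOB bytes
        self.refcount = {}  # origin key -> number of references held

    def insert_reference(self, origin_key, fetch_from_source):
        """Reference a BLOB by origin key, copying it in on first use.

        Returns "copied" if the data had to be fetched from the source,
        or "referenced" if it was already present locally.
        """
        if origin_key in self.blobs:
            self.refcount[origin_key] += 1
            return "referenced"
        self.blobs[origin_key] = fetch_from_source(origin_key)
        self.refcount[origin_key] = 1
        return "copied"


# Database 'bar' holds the original BLOB under its origin key.
bar = {(38358, 1, 7): b"image bytes"}
foo = BlobRepository()

first = foo.insert_reference((38358, 1, 7), bar.__getitem__)
second = foo.insert_reference((38358, 1, 7), bar.__getitem__)
```

Here `first` is "copied" and `second` is "referenced", mirroring the two insert statements. The fetch callback stands in for whatever transport is needed, local or remote.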

BLOBs and replication:

A practical use for this is replication: the replication process replicates the BLOB URLs to the slave server and PBMS pulls the BLOBs across automatically. You can try this out using drizzle replication and my drizzle_pbmsV2 branch.

The only thing you need to do is tell the slave server how to map the PBMS server ID to the actual server. You do that by inserting the information into the 'pbms_server' table in the slave machine's pbms database. The master server's PBMS server ID can be found by doing the following select on the master server:
select * from pbms.pbms_server;
Resulting in something like this:
+-------+-----------+------+--------------+
| Id    | Address   | Port | Description  |
+-------+-----------+------+--------------+
| 38358 | localhost | 8080 | This server. |
+-------+-----------+------+--------------+

Then on the slave server do the following insert:
 insert into pbms.pbms_server values(38358, "master_host", 8080, "The master replication server");
where "master_host" will be the same IP address as you have for "master-host" in the drizzle slave config file.

Note: The PBMS server ID is not the same as the drizzle server id.
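
Conceptually, the slave uses that table as a lookup from a PBMS server ID to a network address. A rough Python sketch of the lookup follows; the function name and the in-memory list standing in for the table are assumptions for illustration, not how PBMS implements it.

```python
# Rows as they might appear in the slave's pbms.pbms_server table:
# (Id, Address, Port, Description)
PBMS_SERVERS = [
    (38358, "master_host", 8080, "The master replication server"),
]


def resolve_server(server_id, rows=PBMS_SERVERS):
    """Return (address, port) for a PBMS server ID taken from a BLOB URL."""
    for row_id, address, port, _description in rows:
        if row_id == server_id:
            return address, port
    raise LookupError(f"no pbms_server entry for server id {server_id}")
```

Without a matching row the slave has no way to know where the BLOB lives, which is why the insert above is the one required setup step.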

What else:

With the new design of the PBMS daemon it would not be very difficult to create a standalone BLOB repository server that could be used as a backup for BLOBs or as a central repository for a cluster of servers.

The next step though is to update the PBMS documentation and build a version for MySQL.


PBMS version 0.5.015 beta has been released.

July 23rd, 2010
A new release of the PrimeBase Media Streaming daemon is now available for download at
http://www.blobstreaming.org .

This release doesn't contain any major new features, just some bug fixes and a lot of housekeeping changes.

If you look at the download section on http://www.blobstreaming.org you will see that there are now more packages that can be downloaded. I have separated out different client side components from the PBMS project and created separate Launchpad projects for each one. You can see them listed in the "Related Links" side panel to the right of this post.

  • The "PBMS Client Library" facilitates communication with the PBMS daemon. This library is independent of the PBMS daemon's host server and can be used to communicate with a PBMS daemon hosted by the MySQL or Drizzle database servers.
  • The "PBMS PHP extension" is a PHP module that enables PHP to connect directly to the PBMS engine and stream BLOB data in and out of a MySQL or Drizzle database.
  • The "Streaming enabled JDBC Driver" is a streaming-enabled version of the standard MySQL Connector/J JDBC driver.
One minor new feature that was added is that the PBMS HTTP server now understands the HTTP "Range" header, which can be used to request a section of BLOB data. The Client Library and PHP extension have both been updated with pbms_get_data_range() functions.
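
For a client talking HTTP directly, a ranged request might be built as in the following Python sketch. It uses only the standard library; the daemon address and BLOB path are placeholders, and the helper name is made up for the example (it is not the client library's pbms_get_data_range()).

```python
import urllib.request


def build_range_request(blob_url, offset, size):
    """Build an HTTP GET for `size` bytes of a BLOB starting at `offset`."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    last_byte = offset + size - 1  # HTTP byte ranges are inclusive
    request = urllib.request.Request(blob_url)
    request.add_header("Range", f"bytes={offset}-{last_byte}")
    return request


# Hypothetical daemon address and BLOB path, for illustration only.
req = build_range_request("http://localhost:8080/example-blob-url", 100, 50)
# Sending it would be: urllib.request.urlopen(req).read()
```

Note the inclusive end position: asking for 50 bytes at offset 100 means bytes 100 through 149.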



PBMS is in the Drizzle tree!

July 8th, 2010
If you haven't already heard, PBMS is now part of the Drizzle tree.

Getting it there was a fair bit of work but not as much as I had thought it would be. The process of getting it to work with Drizzle and running it through Hudson has improved the code a lot. It is amazing what some compilers will catch that others will let slip by. I am now a firm believer in treating all compiler warnings as errors.

I am just in the process of updating the PBMS plugin so that it will build and install the PBMS client library (libpbmscl.so) as well as the plugin. The PBMS client library is a standalone library that can be used to access the PBMS daemon whether it is running as part of MySQL or Drizzle. So a PBMS client library built with Drizzle can be used to access a PBMS daemon running as part of MySQL and vice-versa.

There is also a PHP extension for PBMS that is basically just a wrapper for the library. Currently this is part of the PBMS project on launchpad but I am working on getting it into PECL. The PHP extension comes with a set of test cases, which is what I use to test PBMS.

If anybody is interested in taking on the task of creating a Python module for PBMS I would be happy to provide whatever help you may need. I think it would just be a wrapper around the PBMS client library, almost identical to the PHP extension. I would recommend just taking the PHP extension and converting it.

Now that I have PBMS in Drizzle I am planning on getting replication working with PBMS. I have decided that the best way to do this is to do a bit of work rearranging how PBMS uses the BLOB URLs to reference the BLOB data. The URLs already contain a server, database, and table id so the plan is to change things so that PBMS can handle BLOB URLs from other servers being inserted or referenced. Once this is working I will automatically have replication and 95% of what is needed to support clustered servers.

I will go into detail on this in a later posting which will include pretty pictures.

Barry


BLOBs are not just blobs

April 27th, 2010
Recently when talking to someone about PBMS it occurred to me that I had been thinking about BLOBs in the traditional database sense in that they were atomic blocks of data the content of which the server knew nothing about. But with PBMS that need not be the case.

The simplest enhancement would be to allow the client to send a BLOB request to the PBMS daemon with an offset and size to just return a chunk of the BLOB. Depending on the application and the BLOB contents this may make perfectly good sense: why force the client to retrieve the entire BLOB if it only wants part of it?

A much more interesting idea would be to enable the user to provide custom server side functions that they could run against the BLOB.

So how would this work?

The PBMS daemon would provide its own "BLOB functions" plugin API. The API would be quite simple: the plugin would register the function names it supports. When the PBMS daemon receives a BLOB request specifying a BLOB function name, it calls the BLOB function, passing it a hook to the BLOB data, and then returns to the client whatever the function returns.

The first use of this that I can imagine would be to provide a function that would return the thumbnail from a JPEG image rather than the entire image. Other functions may just return the JPEG metadata.
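
As a rough illustration of the register-and-dispatch idea, here is a small Python sketch. Nothing in it is an existing PBMS API: the registry, the decorator and the "head" function (a trivial stand-in for something like thumbnail extraction) are all hypothetical.

```python
BLOB_FUNCTIONS = {}


def register_blob_function(name):
    """Decorator a plugin would use to register a function under a name."""
    def decorator(func):
        BLOB_FUNCTIONS[name] = func
        return func
    return decorator


@register_blob_function("head")
def first_bytes(blob):
    # Stand-in for a real extractor: return only a small part of the BLOB.
    return blob[:4]


def handle_blob_request(blob, function_name=None):
    """Return the whole BLOB, or the result of the named BLOB function."""
    if function_name is None:
        return blob
    try:
        func = BLOB_FUNCTIONS[function_name]
    except KeyError:
        raise LookupError(f"unknown BLOB function: {function_name}") from None
    return func(blob)
```

A request without a function name behaves exactly as before, so existing clients would be unaffected by the addition of such a plugin layer.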

The idea is that BLOBs are not just blobs but highly structured documents. Given knowledge of the document structure, it is possible to return just the portions of the BLOB that are of interest to particular applications.


Blobs in MySQL Cluster

January 29th, 2010

If there is one thing that confuses people about tables in MySQL Cluster (including me at times) it is BLOB/TEXT columns.  When NDB was originally created it was not designed to handle BLOB data, so the handling of BLOB data was difficult to implement and is sometimes not exactly what users expect.

How MySQL Cluster BLOBs work

When you create a table in MySQL Cluster which has a BLOB column the first 256 bytes of the BLOB are stored in the main table (and in memory when using disk data tables); subsequent data is then stored in a hidden table (typically split into 2KB rows). This means there is an extra table for every BLOB or TEXT column in your main table (and extra resource usage).
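
To get a feel for the overhead, the number of hidden-table rows for a given BLOB can be estimated with a little arithmetic. The Python sketch below takes the 256-byte inline portion from the description above and assumes a part size of 2048 bytes for "2KB"; the exact part size depends on the column type and version, so treat it as a parameter rather than a fact.

```python
import math

INLINE_BYTES = 256  # stored in the main table, per the description above
PART_BYTES = 2048   # assumed size of one hidden-table row ("2KB")


def hidden_rows_needed(blob_size, part_bytes=PART_BYTES):
    """Estimate how many hidden-table rows a BLOB of blob_size bytes uses."""
    overflow = max(0, blob_size - INLINE_BYTES)
    return math.ceil(overflow / part_bytes)
```

Under these assumptions a 1 MiB BLOB spills into 512 extra rows, each of which has to be read and locked along with the main row.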

BLOB locking in MySQL Cluster

These extra tables can cause some problems, firstly with performance (retrieving BLOB data is not very fast) and more importantly with locking.  MySQL Cluster works in READ-COMMITTED transaction isolation level, but this makes it difficult to keep the tables in sync and consistent when handling BLOB data.  When selecting a row which has a BLOB, MySQL Cluster needs to gain a shared lock on that row, and when updating that row an exclusive lock is needed.

This can be a problem if, for example, you update a row in one transaction and select it in another at the same time.  The select will wait for the update to complete because it cannot obtain a shared lock until the exclusive lock is cleared.  This then can lead to the temporary error 'Time-out in NDB, probably caused by deadlock'.

Finally there are certain settings that may need to be increased to handle the large amount of data in a big BLOB, most notably SendBufferMemory and in the case of ndbmtd LongMessageBuffer.

The moral of this story?

1. If you can use VARCHAR/VARBINARY instead, this will avoid these problems
2. Be very careful about writing your applications with BLOBs. If they are large it may well be better to keep them stored separately on a SAN and have your application retrieve them
3. Keep your transactions short so locking time is kept to a minimum



PBMS will be at the OpenSQL camp in Portland Nov. 14-15

September 30th, 2009
I am planning on presenting a session on PBMS at the OpenSQL camp. I am hoping to have a chance to discuss PBMS with people and find out how they are planning on using it and what features they would like to see in it.

But even if you are not interested in PBMS you should still come if for no other reason than the free pizza!

I am proud to say that PrimeBase Technologies is one of the organizations whose sponsorship money is helping to provide the free pizza.

I will see you all there,

Barry
