Archive for the ‘api’ Category

Python Interface to MySQL

Сентябрь 26th, 2011
There has been a lot of discussions lately about various non-SQL languages that provide access to databases without having to resort to using SQL. I wondered how difficult it would be to implement such an interface, so as an experiment, I implemented a simple interface in Python that similar to the document-oriented interfaces available elsewhere. The interface generate SQL queries to query the database, but does not require any knowlegdge of SQL to use. The syntax is inspired by JQuery, but since JQuery works with documents, the semantics is slightly different.

A simple example would look like this:

from native_db import *
server = Server(host='127.0.0.1')
server.test.t1.insert({'more': 3, 'magic': 'just a test', 'count': 0})
server.test.t1.insert({'more': 3, 'magic': 'just another test', 'count': 0})
server.test.t1.insert({'more': 4, 'magic': 'quadrant', 'count': 0})
server.test.t1.insert({'more': 5, 'magic': 'even more magic', 'count': 0})
for row in server.test.t1.find({'more': 3}):
  print "The magic is:", row['magic']
server.test.t1.update({'more': 3}, {'count': 'count+1'})
for row in server.test.t1.find({'more': 3}, ['magic', 'count']):
  print "The magic is:", row['magic'], "and the count is", row['count']
server.test.t1.delete({'more': 5})
The first line define a server to communicate with, which is simply done by creating a Server object with the necessary parameters. The constructor accepts the normal parameters for Connector/Python (which is what I'm using internally), but the user defaults to whatever getpass.getuser() returns, and the host default to 127.0.0.1, even though I've provided it here.

After that, the necessary methods are overridden so that server.database.table will refer to the table with name table in database with name database on the given server. One possibility would be to just skip the database and go directly on the table (using some default database name), but since this is just an experiment, I did this instead. After that, there are various methods defined to support searching, inserting, deleting, and updating.

Since this is intended to be a simple interface, autocommit is on. Each of the functions generate a single SQL statement, so they will be executed atomically if you're using InnoDB.

table.insert(row)
This function will insert the contents of the dictionary into the table. using the keys of the dictionary as column names. If the table does not exist, it will be created with a "best effort" guess of what types to use for the columns.
table.delete(condition)
This function will remove all rows in the table that matches the supplied dictionary. Currently, only equality mapping is supported, but see below for how it could be extended.
table.find(condition, fields="*")
This will search the table and return an iterable to the rows that match condition. If fields is supplied (as a list of field names), only those fields are returned.
table.update(condition, update)
This will search for rows matching condition and update each matching row according to the update dictionary. The values of the dictionary is used on the right side of the assignments of the UPDATE statement, so expressions can be given here as strings.

That's all folks!

The code is available at http://mats.kindahl.net/python/native_db.py if you're interested in trying it out. The code is very basic, and there's potential for a lot of extensions. If there's interest, I could probably create a repository somewhere.

Note that this is not a replacement for an ORM library. The intention is not to allow storing arbitrary objects in the database: the intention is to be able to query the database using a Python interface without resorting to using SQL.

I'm just playing around and testing some things out, and I'm not really sure if there is any interest in anything like this, so what do you think? Personally, I have no problems with using SQL, but since I'm working with MySQL on a daily basis, I'm strongly biased on the subject. For simple jobs, this is probably easier to work with than a "real" SQL interface, but it cannot handle as complex queries as SQL can (at least not without extensions).

There is a number of open issues for the implementation (this is just a small list of obvious ones):

Only equality searching supported
Searching can only be done with equality matches, but it is trivial to extend to support more complex comparisons. To allow more complex conditions, the condition supplied to find, delete, and update can actually be a string, in which case it is used "raw".

Conditions could be extended to support something like {'more': '>3'}, or a more object-oriented approach would be to support something similar to {'more': operator.gt(3)}.

No support for indexes
There's no support for indexes yet, but that can easily be added. The complication is what kind of indexes should be generated.

For example, right now rows are identified by their content, but if we want unique rows to be handled as a set? Imagine the following (not supported) query where we insert :

server.test.t1.insert(content with some more=3).find({'more': eq(3)})
In this case, we have to fetch the row identifiers for the inserted rows to be able to manipulate exactly those rows and none other. Not sure how to do this right now, but auto-inventing a row-identifier would mean that tables lacking it cannot be handled naturally.

Creating and dropping tables
The support for creation of tables is to create tables automatically if they do not exist. A simple heuristic is used to figure out the table definition, but this has obvious flaws if later inserts have more fields than the first one.

To support extending the table, one would have to generate an ALTER TABLE statement to "fix" the table.

There is no support for dropping tables... or databases.


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL 5.6 Replication – New Early Access Features

Август 1st, 2011

At OSCON 2011 last week, Oracle delivered more early access (labs) features for MySQL 5.6 replication. These features are focused on better integration, performance and data integrity:

- The Binlog API: empowering the community to seamlessly integrate MySQL with other applications and data stores;

- Binlog Group Commit and Enhanced Multi-Threaded Slaves: continuing to deliver major improvements to replication performance;

- Durable Slave Reads: further enhancing data integrity.

These new features build on the significant replication enhancements announced as part of the MySQL 5.6.2 Development Milestone Release back in April.

We are always listening to our customers and community. And, based on their needs and input, the MySQL engineering team continues to take replication to the next level.

This new functionality is available for evaluation now and can be downloaded today from http://labs.mysql.com/

Highlights for each of the new capabilities are discussed below with links to blogs written by the MySQL engineers, describing implementation and how to get started in evaluating them.

We value your feedback and therefore encourage you to share your input and experiences through the comments sections of this and the blogs referenced below.

Binlog API

The Binlog API empowers the MySQL community to seamlessly integrate MySQL with both new and legacy applications and data stores.

Data volumes around the world now growing at the rate of 40% per annum, driving users to implement more integrated data management technologies to capture, integrate and analyze the trillions of bytes coming daily from web applications, social networking, mobile broadband networks, embedded sensors, etc.

The Binlog API enables developers to reduce the complexity of integration by standardizing their SQL data management operations on MySQL, while replicating data to other applications within their data management infrastructure.

The Binlog API exposes a programmatic C++ interface to the binary log, implemented as a standalone library. 
Using the API, developers can read and parse binary log events both from existing binlog files as well as from running servers and replicate the 
changes into other data stores.

The Binlog API can be evaluated with the MySQL 5.6.2 Development Milestone Release as well as the current GA 5.5 release.

Learn more >> 

You can download the code as part of MySQL Server Snapshot: mysql-5.6-labs-binary-log-api from http://labs.mysql.com

To demonstrate the possibilities of the Binlog API, an example application for replicating changes from MySQL Server to Apache Solr 
 (a full text search server) has been developed. The example is available with the source download on Launchpad 

Enhanced Multi-Threaded Slaves

Multi-threaded slaves were previewed as part of a MySQL 5.6 Labs build in April 2011, delivering significant performance enhancements by allowing updates to be applied in parallel across databases, rather than sequentially.

On-going testing and feedback from the community has enabled Oracle to release an enhanced implementation of multi-threaded slaves including:

- code refactoring;

- bug fixes;

- crash-recovery processes;

- new slave management options - now, when a user issues a STOP SLAVE command, the operation waits until all threads across each database have applied their updates before stopping, ensuring slave consistency. Slaves can still be stopped immediately by killing the SQL thread on the slave;

- improved support for statement-based replication by enhancing the handling of temporary tables;

- simplified configuration through the renaming, reworking and reduction in parameters, without affecting overall tuning ability.

The enhancements delivered as part of this labs release enables the community to extend evaluation and prototyping of new applications using a more robust and feature-complete implementation of multi-threaded slaves.

Learn more >>

You can download the code as part of MySQL Server Snapshot: mysql-5.6-labs-multi-threaded-slave from http://labs.mysql.com

Binlog Group Commit

Designed to improve the performance of MySQL replication, Group Commit applies updates to the binary log in parallel and then commits them as a group to the binlog on disk.

Users have complete control over the frequency of commits to disk – providing two options:

1. Configure the number of transactions that should be grouped together before commit

2. Define the time interval, with millisecond granularity, before the binary log is persisted.

Checksums are used to ensure all events have been written to the binary log.

The current Binlog Group Commit is a snapshot of a work-in-progress.  We have not benchmarked the implementation at this time, and we expect results from that exercise to influence final implementation decisions.  This is a great opportunity for the community to evaluate the implementation and feedback to the MySQL development team.

Learn more >>

You can download the code as part of MySQL Server Snapshot: mysql-5.6-labs-binlog-group-commit from http://labs.mysql.com

Durable Slave Reads

Users now have the option to control when a slave reads the master’s binary log. There are two options:

- Read the binlog as soon as updates are applied to it (with the risk of the data being lost in the event of a master crash).

- Read the binlog only once the updates have been committed to disk, making them read-durable and therefore not risking lost transactions, but meaning the slave will be behind the master.

This new flexibility means users can configure for performance or read-durability, depending on the requirements of their application.

Summary

So in summary, the enhancements delivered by these latest early access features deliver new capabilities and flexibility that benefits almost all users of MySQL replication.

You can evaluate them today by downloading the snapshot from http://labs.mysql.com/

Let us know what you think of these enhancements directly in comments for each blog. We look forward to working with the community to perfect these new features.


PlanetMySQL Voting: Vote UP / Vote DOWN

A more complete look at Storage Engine API

Ноябрь 29th, 2010

Okay… So I’ve blogged many times before about the Storage Engine API in Drizzle. This API is somewhat inherited from MySQL. We have very much attempted to make it a much cleaner interface. Our goals in making changes include: make it much easier to write and maintain a storage engine, make the upper layer code obviously correct and clear in what it’s doing and being able to more easily introduce optimisations.

I’ve recently added a Storage Engine that is only used in testing: storage_engine_api_tester. I’ve blogged on it producing call graphs (really state transition graphs) before both for Storage Engine and Cursor.

I’ve been expanding the test. My test engine is now a wrapper around a real engine instead of just a fake one. This lets us run real queries (and test cases) while testing what’s going on. At some point in the near future I plan to make it so that it will be able to log what calls go on to the engine and produce a graph just of those.

I added a lot more to the Storage Engine part of the wrapper. Below is what you can see is the current graph:

I’ve coded what I consider to be bugs as red and what I consider suspect as blue.

Also for the Cursor (colours mean the same):

As you can see, there’s currently some wacky possibilities. I’m investigating exactly what’s going on here – If I’m somehow missing some calls that I should be wrapping (I don’t think so) or if we are really doing some dumb-ass things in the upper layer.

Also, please do not be under any impression that any of this means that we’re going to have a stable API. We’re not. To stabilise on this would just be insane – way too much of it still makes not much sense.


PlanetMySQL Voting: Vote UP / Vote DOWN

New APIs in HailDB

Октябрь 18th, 2010

In the current HailDB we have a couple of new API calls that you may like:

  • ib_status_get_all()
    Is very similar to ib_cfg_get_all(). This allows the library to add new status variables without applications having to know about them – because we return a list of what there are. For Drizzle, this means that the DATA_DICTIONARY.HAILDB_STATUS table will automatically have any new status variables we add to HailDB without a single extra line of code having to be written.
  • ib_set_panic_handler()
    Having a shared library call exit() is generally considered impolite. Previously, if HailDB hit corruption (or some other nasty conditions), it could call exit() and you’d never get a chance to display a sensible error message to your user (especially bad in a GUI app where the printed to console error message would be unseen). This call allows an application to specify a callback in the case of HailDB entering such a condition. We’ll still be unable to continue (and we strongly advise that you do in fact exit the process in your callback) but you’re at least now able to (for example) pop up a dialog box saying sorry.
  • ib_trx_set_client_data()
    This call lets you associate a void* with a transaction. HailDB keeps this pointer in its transaction data structure and in some callbacks (e.g. ib_set_trx_is_interrupted_handler(), see below) will pass this pointer back to you for you to use to help make a decision. In InnoDB in MySQL, this is the THD. In Drizzle, it’s the Session.
  • ib_set_trx_is_interrupted_handler()
    In various wait conditions (e.g. waiting for a row lock), HailDB will call the callback you set with this function with the client data (set with ib_trx_set_client_data()) to work out if the transaction has been cancelled. This enables an application to implement something like the MySQL/Drizzle KILL command to cancel a transaction in another thread.
  • ib_get_duplicate_key()
    If you just got a duplicate key error, this function will tell you what key it was. This allows you to implement a nicer error message.
  • ib_get_table_statistics()
    This function gives you access to some basic table statistics that HailDB maintains. This includes an approximate row count, clustered index size, total of secondary indexes as well as a “modified counter” which can give you a rough idea about how out of date these statistics are.

All of these are new to HailDB (and weren’t available in embedded_innodb), many in the new 2.3 development release. You can see usage examples both in the HailDB test suite and (for most of them) in the Drizzle HailDB Storage Engine.


PlanetMySQL Voting: Vote UP / Vote DOWN

on TableIdentifier (and the death of path as a parameter to StorageEngines)

Март 23rd, 2010

As anybody who has ever implemented a Storage Engine for MySQL will know, a bunch of the DDL calls got passed a parameter named “path”. This was a filesystem path. Depending on what platform you were running, it may contain / or \ (and no, it’s not consistent on each platform). Add to that the difference if you were creating temporary tables (table name of #sql_somethingsomething) and the difference if you were one of the two (built in) engines that were able to be used for creating internal temporary tables (temp tables that are created during query execution that do not belong in a schema). Well… you had a bit of a mess.

My earlier attempts involved splitting everything up into two strings: schema name and table name. This ended badly. The final architecture we decided on was to have an object passed around that would deal with various transformations (from what the user entered to what we can store on file systems, or to what temporary table maps to what unique name). This is TableIdentifier.

Brian has been introducing it around the code for a while now, and we just got it to now most of the places where table names are passed to Storage Engines. This means that if you’re writing a Storage Engine that doesn’t just blindly store things in files, you can sensibly use the getSchemaName() and getTableName() methods to call your API.


PlanetMySQL Voting: Vote UP / Vote DOWN