Archive for the ‘NoSQL’ Category

Guide to MySQL & NoSQL, Webinar Q&A

Март 30th, 2012

Yesterday we ran a webinar discussing the demands of next generation web services and how blending the best of relational and NoSQL technologies enables developers and architects to deliver the agility, performance and availability needed to be successful.

Attendees posted a number of great questions to the MySQL developers, serving to provide additional insights into areas like auto-sharding and cross-shard JOINs, replication, performance, client libraries, etc. So I thought it would be useful to post those below, for the benefit of those unable to attend the webinar.

Before getting to the Q&A, there are a couple of other resources that maybe useful to those looking at NoSQL capabilities within MySQL:

- On-Demand webinar (coming soon!)

- Slides used during the webinar

- Guide to MySQL and NoSQL whitepaper 

- MySQL Cluster demo, including NoSQL interfaces, auto-sharing, high availability, etc. 

So here is the Q&A from the event 

Q. Where does MySQL Cluster fit in to the CAP theorem?

A. MySQL Cluster is flexible. A single Cluster will prefer consistency over availability in the presence of network partitions. A pair of Clusters can be configured to prefer availability over consistency. A full explanation can be found on the MySQL Cluster & CAP Theorem blog post. 

Q. Can you configure the number of replicas? (the slide used a replication factor of 1)

Yes. A cluster is configured by an .ini file. The option NoOfReplicas sets the number of originals and replicas: 1 = no data redundancy, 2 = one copy etc. Usually there's no benefit in setting it >2.

Q. Interestingly most (if not all) of the NoSQL databases recommend having 3 copies of data (the replication factor).   

Yes, with configurable quorum based Reads and writes. MySQL Cluster does not need a quorum of replicas online to provide service. Systems that require a quorum need > 2 replicas to be able to tolerate a single failure. Additionally, many NoSQL systems take liberal inspiration from the original GFS paper which described a 3 replica configuration. MySQL Cluster avoids the need for a quorum by using a lightweight arbitrator. You can configure more than 2 replicas, but this is a tradeoff between incrementally improved availability, and linearly increased cost.

Q. Can you have cross node group JOINS? Wouldn't that run into the risk of flooding the network?

MySQL Cluster 7.2 supports cross nodegroup joins. A full cross-join can require a large amount of data transfer, which may bottleneck on network bandwidth. However, for more selective joins, typically seen with OLTP and light analytic applications, cross node-group joins give a great performance boost and network bandwidth saving over having the MySQL Server perform the join.

Q. Are the details of the benchmark available anywhere? According to my calculations it results in approx. 350k ops/sec per processor which is the largest number I've seen lately

The details are linked from Mikael Ronstrom's blog

The benchmark uses a benchmarking tool we call flexAsynch which runs parallel asynchronous transactions. It involved 100 byte reads, of 25 columns each. Regarding the per-processor ops/s, MySQL Cluster is particularly efficient in terms of throughput/node. It uses lock-free minimal copy message passing internally, and maximizes ID cache reuse. Note also that these are in-memory tables, there is no need to read anything from disk.

Q. Is access control (like table) planned to be supported for NoSQL access mode?

Currently we have not seen much need for full SQL-like access control (which has always been overkill for web apps and telco apps). So we have no plans, though especially with memcached it is certainly possible to turn-on connection-level access control. But specifically table level controls are not planned.

Q. How is the performance of memcached APi with MySQL against memcached+MySQL or any other Object Cache like Ecache with MySQL DB?

With the memcache API we generally see a memcached response in less than 1 ms. and a small cluster with one memcached server can handle tens of thousands of operations per second.

Q. Can .NET can access MemcachedAPI?

Yes, just use a .Net memcache client such as the enyim or BeIT memcache libraries.

Q. Is the row level locking applicable when you update a column through memcached API?

An update that comes through memcached uses a row lock and then releases it immediately. Memcached operations like "INCREMENT" are actually pushed down to the data nodes. In most cases the locks are not even held long enough for a network round trip.

Q. Has anyone published an example using something like PHP? I am assuming that you just use the PHP memcached extension to hook into the memcached API. Is that correct?

Not that I'm aware of but absolutely you can use it with php or any of the other drivers

Q. For beginner we need more examples.

Take a look here for a fully worked example

Q. Can I access MySQL using Cobol (Open Cobol) or C and if so where can I find the coding libraries etc?

A. There is a cobol implementation that works well with MySQL, but I do not think it is Open Cobol. Also there is a MySQL C client library that is a standard part of every mysql distribution

Q. Is there a place to go to find help when testing and/implementing the NoSQL access?

If using Cluster then you can use the cluster@lists.mysql.com alias or post on the MySQL Cluster forum

Q. Are there any white papers on this? 

Yes - there is more detail in the MySQL Guide to NoSQL whitepaper

If you have further questions, please don’t hesitate to use the comments below!


PlanetMySQL Voting: Vote UP / Vote DOWN

Simple GUI to edit JSON records in Drizzle

Март 29th, 2012

So yesterday I introduced the newly committed HTTP JSON key-value interface in Drizzle. The next step of course is to create some simple application that would use this to store data, this serves both as an example use case as well as for myself to get the feeling for whether this makes sense as a programming paradigm.

Personally, I have been a fan of the schemaless key-value approach ever since I graduated university and started doing projects with dozens of tables and hundreds of columns in total. Especially in small projects I always found the array structures in languages like PHP and Perl and Python to be very flexible to develop with. As I was developing and realized I need a new variable or new data field somewhere, it was straightforward to just toss a new key-value into the array and continue with writing code. No need to go back and edit some class definition. If I ever needed to find out what is available in some struct, I could always do dump_var($obj) to find out. Even large projects like Drupal get along with this model very well.

read more


PlanetMySQL Voting: Vote UP / Vote DOWN

The CAP theorem and MySQL Cluster

Март 7th, 2012
tldr; A single MySQL Cluster prioritises Consistency in Network partition events. Asynchronously replicating MySQL Clusters prioritise Availability in Network partition events.


I was recently asked about the relationship between MySQL Cluster and the CAP theorem. The CAP theorem is often described as a pick two out of three problem, such as choosing from good, cheap, fast. You can have any two, but you can't have all three. For CAP the three qualities are 'Consistency', 'Availability' and 'Partition tolerance'. CAP states that in a system with data replicated over a network only two of these three qualities can be maintained at once, so which two does MySQL Cluster provide?

Standard 'my interpretation of CAP' section

Everyone who discusses CAP like to rehash it, and I'm no exception. Daniel Abadi has the best CAP write-up that I've read so far, which reframes CAP as a decision about whether to ultimately prioritise availability or data consistency in the event of a network partition. This is how I think of CAP. He also discusses related system behaviour in normal operation which I'll return to later.

While this reframing clarifies CAP, the terms network partition, availability and consistency also need some definition.

Network replicated database

CAP is only really relevant in the context of a network replicated database (or filesystem or state machine). A network replicated database stores copies of data in multiple different systems (database nodes), connected by a network. Data can be read and updated. Updates are propagated to all nodes with replicas via the network. Database clients connect to database nodes via the network to read data and make updates. Replication may occur to improve availability, to improve request latency, or to improve read bandwidth.

Availability

The network replicated database exists to provide services such as Read and Write on the data it stores. Its availability can be measured as the ability of any client to perform any service on any data item.

This Service Availability can be compromised by :
  • Failure of client nodes
  • Network failures between clients and database nodes
  • Network failures between database nodes
  • Failure of database nodes
Client node and networking failures cannot really be considered a property within the control of a database system, so I consider their effects out of the scope of CAP. However, where clients connect to a database node, and that database node is isolated from other database nodes, whether or not those clients are given service is within the scope of CAP.

Service Availability is not binary, it can partially degrade, perhaps by affecting :
  • A subset of all clients
  • A subset of all stored data
  • A subset of request types

The shades of grey within the definition of availability are responsible for most of the arguments around CAP. If we take a strict view - either all services available on all data for all clients, or nothing, then availability is fragile and hard to maintain. If we take a more flexible approach then some service availabilty can be preserved even with a completely decimated network. In the loosest definition, if any client receives any service on any data, then the system is still available. Rather than choose one position, I regard availability as a range from 100% down to 0% for a full outage. Anything in the middle is reduced availability, but it does not mean that the system is not serving its purpose adequately.

Consistency

For consistency to be satisfied, the multiple replicas of data in a network replicated database should behave as though there were only one copy of the data. Simultaneous reads of the same data item from clients connected to different database nodes must always return the same result. Where two or more updates to the same data item are submitted simulteneously, they must be serialised, or one must be rejected, or they must be merged so that a single value results. This one-copy model makes it simple for database clients to use the network replicated database as if it were a single database system with one atomically read/written copy of their data.

If one copy consistency is relaxed, then different database nodes may observably have different values for the same data item simultaneously. Over time the data copies may be aligned, but clients accessing the data must beware that reads may not return the results of the most recently accepted writes. This behaviour may be described as eventual consistency. Providing eventual consistency allows a network replicated database to maximise availability, but pushes the problem of dealing with transient inconsistencies up the stack to user applications. Furthermore there are varying qualities of eventual consistency, with varying guarantees and levels of application support available.

Network Partitions

Network partitions isolate subsets of the nodes of a network replicated database. The interesting property of a network partition is that each node subset cannot tell whether the other node subset(s) are :
  1. dead
  2. alive but isolated from clients
  3. alive and reachable by clients but isolated from us
Not knowing the state of the other subset(s) is what forces a system to decide between maximising service availability and maximising consistency. The interesting case is 3) where some database nodes (potentially containing all or some of the data) are alive elsewhere and have clients connected to them. If those clients are allowed to make writes on data copies stored on those database nodes, then we must lose one copy consistency as we cannot supply those new values in response to a read of our local copy. If those clients are not allowed to make writes then we have degraded service availability for them. Which is it to be? This is the unavoidable choice at the centre of the CAP theorem. Stated this way it seems less of a theorem and more of a fact.

Back to MySQL Cluster - which does it provide?

A single MySQL Cluster prioritises data consistency over availability when network partitions occur.

A pair of asynchronously replicating MySQL Clusters prioritise service availability over data consistency when network partitions occur.

So you can have it both ways with MySQL Cluster - Great!

Single MySQL Cluster - CP

Within a single MySQL Cluster, data is synchronously replicated between database nodes using two-phase commit. Nodes are monitored using heartbeats, and failed or silent nodes are promptly isolated by live and responsive nodes. Where a network partition occurs, live nodes in each partition regroup and decide what to do next :
  • If there are not enough live nodes to serve all of the data stored - shutdown
    Serving a subset of user data (and risking data consistency) is not an option
  • If there are not enough failed or unreachable nodes to serve all of the data stored - continue and provide service
    No other subset of nodes can be isolated from us and serving clients
  • If there are enough failed or unreachable nodes to serve all of the data stored - arbitrate.
    There could be another subset of nodes regrouped into a viable cluster out there.

Arbitration occurs to avoid the split brain scenario where a cluster could theoretically split in two (or more), with each half (or third, or quarter) accepting writes and diverging from the others. In other words, arbitration occurs to preserve consistency.

Arbitration involves :
  • Database nodes agree on an arbitrator in advance
  • During node or network failure handling, no data writes are committed.
  • When arbitration is required due to node failures or network issues, viable node subsets (potential clusters) request permission from the previously agreed arbitrator to provide service.
  • Each request to the arbitrator will result in either : Yes, No or timeout
  • Anything other than Yes results in node shutdown.
  • The arbitrator only says Yes once per election round (First come first served). Therefore the arbitrator only says yes to one potential cluster in a partitioned network.

Note that arbitration is not the same as achieving a quorum. A cluster with three replicas and an arbitrator node can survive the loss of two data nodes as long as the arbitrator remains reachable to the last survivor. The arbitrator role is lightweight as it is not involved in normal traffic. I am surprised that the lightweight arbitrator pattern is not more common.

How does a single MySQL Cluster degrade service availability as a result of network partitions?

Where some subset of data nodes are isolated and shut-down :
  • Those nodes are 100% out of service, until they restart and can rejoin the cluster
    They will attempt to do so automatically
  • Any clients connected only to those nodes are out of service
    By default clients attempt to connect to all data nodes, so partial connectivity issues needn't degrade client availability.
  • The remaining live nodes are 100% in-service
  • Clients connected to the remaining live nodes are 100% in service
Where no subset of data nodes is live
  • All clients experience 100% service loss, until the data nodes restart and can rejoin the cluster
    They will attempt to do so automatically.

A single MySQL Cluster does not degrade to partial data access, or read only modes as a result of network partitions. It does not sacrifice consistency.

How can MySQL Cluster be described as highly available if it sacrifices availability for consistency in the event of a network partition?

Availability is not binary - many types of network partition can erode availability, for some clients, but do not extinguish it. Some set of clients continue to receive 100% service. Only double failures in the network can cause a network partition resulting in full service loss.
Furthermore, network partitions are not the only risks to availability, software errors, power failures, upgrades, overloads are other potential sources of downtime which Cluster is designed to overcome.

Asynchronously replicating clusters - AP


Where two Clusters are asynchronously replicating via normal MySQL Replication, in a circular configuration, reads and writes can be performed locally at both clusters. Data consistency within each cluster is guaranteed as normal, but data consistency across the two clusters is not. On the other hand, availability is not compromised by network partitioning of the two clusters. Each cluster can continue to accept read and write requests to all of the data from any connected client.

Eventual consistency between the clusters is possible when using conflict resolution functions such as NDB$EPOCH_TRANS, NDB$EPOCH, NDB$MAX etc.

How does consistency degrade between replicating MySQL Clusters during a network partition?

This depends on the conflict resolution function chosen, and how detected conflicts are handled. Some details of consistency guarantees provided by NDB$EPOCH et al are described here.

What about normal operation?

Abadi's post introduced his PACELC acronym, standing for something like :

 if (network Partition)
{
trade-off Availability vs Consistency;
}
else
{
trade-off Latency vs Consistency;
}


My first comment has to be that it's bad form to put the common case in an else branch!
However, it is certainly true that the properties during normal operation are usually more important than what happens during a network partition. The ELC section is stating that while all database nodes are present, a network replicated database can choose between minimising request Latency, or maintaining Consistency. In theory this normal operation latency-vs-consistency tradeoff could be completely independent to the Network Partitioning availability-vs-consistency tradeoff, e.g. you could have any of :
  1. PA EL (Partition - Availability, Else - Latency minimisation)
  2. PA EC (Partition - Availability, Else - Consistency)
  3. PC EL (Partition - Consistency, Else - Latency minimisation)
  4. PC EC (Partition - Consistency, Else - Consistency)

The common cases are 1 + 4, where we choose either consistency at all times, or Maximum Availability and Minimum Latency. Case 2 is a system which aims for consistency, but when a network partition occurs, aims for Availability. Case 3 is a system which aims for minimal request Latency, and when a partition occurs aims for consistency.

Examples of systems of each type :
  1. Any eventually consistent system, especially with local-database-node updates + reads
  2. Best-effort consistent systems that degrade in failure modes (e.g. MySQL semi-synchronous replication)
  3. ???
  4. Always consistent systems (e.g. single database instance, single MySQL Cluster)

I am not aware of systems meeting case 3 where normally they minimise latency over consistency, but start choosing consistency after a network partition. Maybe this category should be called 'repentant systems'?

The problem for systems in Cases 1 or 2 - anywhere where Latency minimisation or Availability is chosen over consistency - is the need for user applications to deal with potential inconsistencies. It is not enough to say that things will 'eventually' be consistent. It's important to describe how inconsistent they can be, whether the temporary inconsistencies are values which were once valid, how those values relate to other, connected values etc.

There are certainly applications which can operate correctly with practical eventually consistent databases, but it's not well known how to design applications and schemas to cope with the transient states of an eventually consistent database. The first ORM framework to opaquely support an underlying eventually consistent database may actually be worth the effort to use! A reasonable approach is to design schemas with associated read/modification 'protocols' as if they were abstract data types (ADTs). These ADTs can then have strengths and weaknesses, properties and limitations which make sense in some parts of an application schema where the need to support eventual consistency overcomes the inherent effort and limitations.

Stonebraker and others have commented on network partitions being a minor concern for a well designed datacentre-local network, where redundancy can be reliably implemented. Also the latency cost of maintaining consistency is lower as physical distances are smaller and hop counts are lower. This results in 'CP' systems being attractive at the data centre scale as the need to sacrifice availability due to network partition is rarely dominant, and the latency implications during normal operation are bearable. Perhaps this highlights the need in these theoretical discussions to illustrate theoretically problematic latencies and availabilities with real numbers.

At a wider network scale, latencies are naturally higher, implying that bandwidth is lower. The probability of network partitions of some sort may also increase, due to the larger number of components (and organisations) involved. The factors combine to make 'AP' systems more palatable. The everyday latency cost of consistency is higher, and losing availability due to potentially more frequent network partitions may not be acceptable. Again, real numbers are required to illuminate whether the achievable latencies and probable availability impacts are serious enough to warrant changing applications to deal with eventually consistent data. For a particular application there may or may not be a point at which an AP system would meet its requirements better.

Consistent systems can be scaled across many nodes and high latency links, but the observed operation latency, and the necessary impacts to availability implied by link failure set a natural ceiling on the desirable scale of a consistent system. Paraphrasing John Mashey, "Bandwidth improves, latency is forever". Applications that find the latency and availability constraints of a single consistent system unacceptable, must subdivide their datasets into smaller independent consistency zones and manage potential consistency shear between them.

Finally (another excessively long post), I think the technical and actual merits of widely distributed 'CP' systems are not well known as they have not been commonly available. Many different database systems support some form of asynchronous replication, but few offer synchronous replication, fewer still offer to support it over wide areas with higher latency and fluctuating links. As this changes, the true potential and weaknesses of these technologies, backed by real numbers, will start to appear.

Edit 7/3/12 : Fix bad link

PlanetMySQL Voting: Vote UP / Vote DOWN

Presentation: Databases and the Cloud (and why it is more difficult for databases)

Март 6th, 2012

A week ago I again had the pleasure to give a guest lecture at Tampere University of Technology. I've visited them the first time when I worked as MySQL pre-sales in Sun.

To be trendy, I of course had to talk about the cloud. It turns out every section has the subtitle "...and why it is more difficult for databases". I also rightfully claim to have invented the NoSQL key-value development model in 2005.

read more


PlanetMySQL Voting: Vote UP / Vote DOWN

Deploy MySQL Cluster 7.2 GA in 288 seconds

Февраль 23rd, 2012
It seems that our friends at Oracle have been pretty busy with the GA release of MySQL Cluster 7.2.

This is not just old wine in new bottles.

While it may be a dot release, it does appear to be a more significant step forward than a dot release would imply.

First off, we are very excited to announce that the Severalnines Cluster Configurator now includes support for 7.2 GA.

As the title of this blog suggests, it is possible, as we have experienced, to deploy 7.2 GA in 288 seconds, i.e. just a few minutes. This was done on m1.large instances on Amazon. We have published a quick how-to deployment guide for Amazon here: http://support.severalnines.com/entries/20916823-installing-on-amazon-ec2.

This being said, let’s have a look at some of the new features in 7.2 and why they matter.

MySQL Cluster has always suffered from poor JOIN performance, mainly because the nested loop algorithm did not work very well in a parallel environment. That meant the join was executed entirely in the MySQL Server. This resulted in a lot of network round-trips between the MySQL Server and the data nodes.

Now with 7.2, this has changed. With Adaptive Query Localization, a JOIN query is now shipped down to the data nodes and executed in parallel on local copies of data.


Results look very promising, with up to 70x factor improvement on some queries as described in this excellent post.

By getting better index statistics, the SQL optimizer can also build a better execution plan for each query. This in turn reduces the need for manual query tuning. Prior to 7.2, a lot of index hinting was needed. However, 70x speed improvement does not mean that a single query on MySQL Cluster will perform as fast as on InnoDB, since there will always be the network latency between SQL and Data nodes. But it does mean that many queries that were running very slowly in 7.1 might now run fast enough.

The Memcached API is a new addition to an already extensive list of API connectors, aimed at existing Memcached users and the NoSQL community. For a product trying to be attractive to the NoSQL community, perhaps the addition of distributed parallel JOIN algorithms may be a bit confusing. NoSQL programmers typically avoid JOIN operations, for reasons including scalability and schema flexibility.

Is someone trying to have their cake and eat it too? :-)

Among the other features described here, the separation of Data Nodes across data centers (and specially Figure 3) is misleading. With a 2-phase commit protocol to guarantee synchronous replication within a Node Group, NDB would have to defy the laws of physics to support the below diagram. Unless the data centers are physically close to each other with a low-latency redundant link in-between them, splitting of data nodes across data centers is not a good idea.

It is also worth noting that 7.2 GA is based on MySQL 5.5, whereas 7.1 was based on MySQL 5.1 – probably more relevant for users with mixed NDB and InnoDB tables.

So: is MySQL Cluster 7.2 for you?

If yes, then try it out using the Severalnines Cluster Configurator: http://www.severalnines.com/cluster-configurator/).

It will take just a few minutes to generate a deployment package, which automates the installation of the cluster. Or it might take 288 seconds ... let us know ;-)

PlanetMySQL Voting: Vote UP / Vote DOWN

One billion

Февраль 21st, 2012
As always, I am a little late, but I want to jump on the bandwagon and mention the recent MySQL Cluster milestone of passing 1 billion queries per minute. Apart from echoing the arbitrarily large ransom demand of Dr Evil, what does this mean?

Obviously 1 billion is only of interest to us humans as we generally happen to have 10 fingers, and seem to name multiples in steps of 10^3 for some reason. Each processor involved in this benchmark is clocked at several billion cycles per second, so a single billion is not so vast or fast.

Measuring over a minute also feels unnatural for a computer performance benchmark - we are used to lots of things happening every second! A minute is a long time in silicon.

What's more, these reads are served from tables stored entirely in memory - and everyone knows that main memory is infinitely fast and scalable and always getting cheaper, right?

If we convert to seconds we are left with only 17 million reads per second! Hardly worth getting out of bed for?

On the contrary, I think that achieving 17 million independent random reads per second, each read returning 100 bytes across a network, from a database that also supports arbitrary SQL, row locking, transactions, high availability and all sorts of other stuff, is pretty cool. I doubt that (m)any other similar databases can match this raw performance, though I look forward to being proved wrong.

(Also, don't forget to meet + beat 1.9 million random updates/s, synchronously replicated)

Raw performance is good, but not everyone just needs horsepower. The parallel, independent work on improving join performance (also known as SPJ/AQL) and query optimisation helps more applications harness this power, by improving the efficiency of joins.

I wrote a post about SPJ/AQL at the start of last year, when it was still in the early stages. Since then much has improved, to the extent that the performance improvement factors have become embarrassingly high on real user queries. A further post on the technical details of SPJ/AQL is long overdue... Perhaps the most interesting details are on the integration between the parallel, streaming linked operations and the essentially serialised MySQL Nested Loops join executor. A linked scan and lookup operation can be considered to be a form of parallel hash join, which the normal MySQL NLJ executor can invoke as part of executing a query. Who says Nested Loop joins can't scale?

PlanetMySQL Voting: Vote UP / Vote DOWN

A super-set of MySQL for Big Data. Interview with John Busch, Schooner.

Февраль 20th, 2012
“Legacy MySQL does not scale well on a single node, which forces granular sharding and explicit application code changes to make them sharding-aware and results in low utilization of severs”– Dr. John Busch, Schooner Information Technology A super-set of MySQL suitable for Big Data? On this subject, I have interviewed Dr. John Busch, Founder, Chairman, [...]
PlanetMySQL Voting: Vote UP / Vote DOWN

NoSQL performance numbers — MySQL and Redis

Февраль 18th, 2012

Links to performance numbers posted wrt various NoSQL solutions:

A top 20 global website announced they have migrated from MySQL to Redis. There will be a keynote and everything. It doesn't say how big the Redis Cluster is, but they serve 100M pages / day, and clock 300k Redis queries / second.
https://groups.google.com/forum/?fromgroups#!topic/redis-db/d4QcWV0p-YM

Btw, they mention that MySQL remains as the master data store from which the Redis indexes are generated.
(The reason I don't mention the name of this Redis user is simply I feat my mom is sometimes reading my blog...)

read more


PlanetMySQL Voting: Vote UP / Vote DOWN

More on database consistency

Февраль 16th, 2012
I've written a few times about database consistency before, mainly in conjunction with NoSQL and the concept of Eventual consistency. Now, I'm about to do an update on the subject, as I have come to realize a few things.

From an oldtimer like myself, having been an SQL guy for 25 years, I remember Punk-rock and even The Beatles and I having hair growing out of my ears, what can be contributed? Well, let me beging with stating what I mean when I say Database consistency. What I mean is Consistency as the C in ACID (no, we aren't talking drugs here, we are talking databases). Let's see what the online authorative reference work on just about anything on this planet, from the size of J-Lo's feet to the number of Atoms in the universe (those two numbers are quite far apart by the way), Wikipedia: "The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof." In other words, consistency means that the databas is always in a consistent state, the different data items in it (rows, if you wish) are "in sync" with eachother. I think most of you agree with this notion.

Now, when it comes to NoSQL databases, like MongoDB, this terminology is different. These guys introduced Eventual consistency, which means that the database will eventually reach a consistent state with regards to a specific transaction that changes that "state" of the database. But there are multiple transactions at the same time, and they aren't necessarily, in an Eventual consistentcy model, consistent with eachother as they aren't on the same node. But the theory goes that some time, eventually, they will. If the system never stops, and transactions keep coming, then eventual consistens is determined to happen within 100 ms or less from the point in time when pigs fly. But if you stop all state changing transactions, then the state of the database will reach consistency. Eventually.

Now in NoSQL circles there is a thing called a Consistent read. If my database was consistent, then any read is consistent, right? And in the case of use SQL RDBMS folks, consistency is about the state of the database when I write to it? Well, if you have an eventual consistency model, where you have data distributed all over the place, things are different. To begin with, the basic thing that you have to make sure, and the NoSQL databases do this, is to ensure that the writes to the databases are all in order (we know this from MySQL also, and it is part of the issue with the MySQL slaves, and the NoSQL guys aren't fixing this particular bottleneck). And here we mean they are in order in each and every node. Across nodes, we don't care, which is where I get my abilility to scale out writes from!

A consistent read is a read where the data I am reading is in a consistent state, or sometime that my data is the most recent data. These two aren't always the same, but the second (reading most recent data) typically implies the former, although I assume this is not always the case. This is VERY different from the meaning of Database Consistency as we RDBMS folks look at it. All the same, the concept sure is useful, and as the NoSQL distributed systems doesn't need to keep the data consistent on a global level, a lot of shortcuts can be taken. But having Read Consistency has litte to do with Database Consistency. Your NoSQL fans will complain here and try to tell you that these achieve the same thing, but they don't. Achieving global Database Consistency costs an arm and a leg or two in performance, but the database is ALWAYS consistent.

So two different things, both with advantages and disadvantages, but they are STILL different! And the NoSQL folks will confuse things by allowing you not to have even Read Consistency, somewhat implying that turning it on means you get Database Consistency and that the Read Consistency model (which is very very simple by the way) means you get the effect of Database Consistency using Eventual Consistency. Nope. You don't. Which doesn't make it bad, but IT IS NOT THE SAME THING!

/Karlsson

PlanetMySQL Voting: Vote UP / Vote DOWN

More on database consistency

Февраль 16th, 2012
I've written a few times about database consistency before, mainly in conjunction with NoSQL and the concept of Eventual consistency. Now, I'm about to do an update on the subject, as I have come to realize a few things.

From an oldtimer like myself, having been an SQL guy for 25 years, I remember Punk-rock and even The Beatles and I having hair growing out of my ears, what can be contributed? Well, let me beging with stating what I mean when I say Database consistency. What I mean is Consistency as the C in ACID (no, we aren't talking drugs here, we are talking databases). Let's see what the online authorative reference work on just about anything on this planet, from the size of J-Lo's feet to the number of Atoms in the universe (those two numbers are quite far apart by the way), Wikipedia: "The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof." In other words, consistency means that the databas is always in a consistent state, the different data items in it (rows, if you wish) are "in sync" with eachother. I think most of you agree with this notion.

Now, when it comes to NoSQL databases, like MongoDB, this terminology is different. These guys introduced Eventual consistency, which means that the database will eventually reach a consistent state with regards to a specific transaction that changes that "state" of the database. But there are multiple transactions at the same time, and they aren't necessarily, in an Eventual consistentcy model, consistent with eachother as they aren't on the same node. But the theory goes that some time, eventually, they will. If the system never stops, and transactions keep coming, then eventual consistens is determined to happen within 100 ms or less from the point in time when pigs fly. But if you stop all state changing transactions, then the state of the database will reach consistency. Eventually.

Now in NoSQL circles there is a thing called a Consistent read. If my database was consistent, then any read is consistent, right? And in the case of use SQL RDBMS folks, consistency is about the state of the database when I write to it? Well, if you have an eventual consistency model, where you have data distributed all over the place, things are different. To begin with, the basic thing that you have to make sure, and the NoSQL databases do this, is to ensure that the writes to the databases are all in order (we know this from MySQL also, and it is part of the issue with the MySQL slaves, and the NoSQL guys aren't fixing this particular bottleneck). And here we mean they are in order in each and every node. Across nodes, we don't care, which is where I get my abilility to scale out writes from!

A consistent read is a read where the data I am reading is in a consistent state, or sometime that my data is the most recent data. These two aren't always the same, but the second (reading most recent data) typically implies the former, although I assume this is not always the case. This is VERY different from the meaning of Database Consistency as we RDBMS folks look at it. All the same, the concept sure is useful, and as the NoSQL distributed systems doesn't need to keep the data consistent on a global level, a lot of shortcuts can be taken. But having Read Consistency has litte to do with Database Consistency. Your NoSQL fans will complain here and try to tell you that these achieve the same thing, but they don't. Achieving global Database Consistency costs an arm and a leg or two in performance, but the database is ALWAYS consistent.

So two different things, both with advantages and disadvantages, but they are STILL different! And the NoSQL folks will confuse things by allowing you not to have even Read Consistency, somewhat implying that turning it on means you get Database Consistency and that the Read Consistency model (which is very very simple by the way) means you get the effect of Database Consistency using Eventual Consistency. Nope. You don't. Which doesn't make it bad, but IT IS NOT THE SAME THING!

/Karlsson

PlanetMySQL Voting: Vote UP / Vote DOWN