Archive for the ‘shared-nothing’ Category

ScaleDB: Shared-Disk / Shared-Nothing Hybrid

Июль 25th, 2011
The primary database architectures—shared-disk and shared-nothing—each have their advantages. Shared-disk has functional advantages such as high-availability, elasticity, ease of set-up and maintenance, eliminates partitioning/sharding, eliminates master-slave, etc. The shared-nothing advantages are better performance and lower costs. What if you could offer a database that is a hybrid of the two; one that offers the advantages of both. This sounds too good to be true, but it is fact what ScaleDB has done.

The underlying architecture is shared-disk, but in many situations it can operate like shared-nothing. You see the problems with shared-disk arise from the messaging necessary to (a) ship data among nodes and storage; and (b) synchronize the nodes in the cluster. The trick is to move the messaging outside of the transaction so it doesn’t impact performance. The way to achieve that is to exploit locality. Let me explain.

When using a shared-disk database, if your application or load balancer just randomly sprays the database requests to any node in the cluster, all of the nodes end up sharing all of the data. This involves a lot of data shipping between nodes and messaging to keep track of which node has what data and what they have done to it. This is at the core of the challenge for companies like ours to build shared-disk databases…it ain’t easy. There are many things you can do to optimize performance in such a scenario like local caching, shared cache (we use CAS, Oracle uses CacheFusion), etc. However, the bottom line is that even with these optimizations, random distribution of database requests results in suboptimal database performance for some scenarios.

Once you have solved the worst case scenario of random database requests, you can start optimizing for the intelligent routing of database requests. By this I mean that either the application or the load balancer sends specific database requests to specific nodes in the cluster. Intelligent database request routing results in something we in the shared-database world call locality. The database nodes are able to operate on local data while only updating the rest of the cluster asynchronously. In this scenario, the database nodes, which are still using a shared-disk architecture, operate much more independently, like shared-nothing. As a result, data shipping and messaging are almost completely eliminated, resulting in performance comparable to shared-nothing, while still maintaining the advantages of shared-disk.

The trick is for the database to recognize on-the-fly when the separate nodes can and cannot operate in this independent fashion. This is complicated by the fact that the database must recognize and adapt to locality which can evolve as database usage changes, nodes are added or removed, etc. This is one aspect of the secret sauce that is built into ScaleDB.

Note: Now that we’ve built a shared-disk database that can recognize locality and respond by acting (and performing) like a shared-nothing database, how do we achieve locality? There are many ways to achieve locality. It can be built into the application, or you can rely on a SQL-aware routing/caching solution like those available from Netscaler and Scalarc that handle this for you.


PlanetMySQL Voting: Vote UP / Vote DOWN

Database Architectures & Performance

Июль 20th, 2010
For decades the debate between shared-disk and shared-nothing databases has raged. The shared-disk camp points to the laundry list of functional benefits such as improved data consistency, high-availability, scalability and elimination of partitioning/replication/promotion. The shared-nothing camp shoots back with superior performance and reduced costs. Both sides have a point.

First, let’s look at the performance issue. RAM (average access time of 200 nanoseconds) is considerably faster than disk (average access time of 12,000,000 nanoseconds). Let me put this 200:12,000,000 ratio into perspective. A task that takes a single minute in RAM would take 41 days in disk. So why do I bring this up?

Shared-Nothing: Since the shared-nothing database has sole ownership of its data—it doesn’t share the data with other nodes—it can operate in the machine’s local RAM, only writing infrequently to disk (flushing the data to disk). This makes shared-nothing databases very fast.

Shared-Disk: Cannot rely on the machine’s local RAM, because every write by one node must be instantly available to the other nodes, to ensure that they don’t use stale data and corrupt the database. So instead of relying on local RAM, all write transactions must be written to disk. This is where the 1 minute to 41 days ratio above comes into play and kills performance of shared-disk databases.

Let’s look at some of the ways databases can utilize RAM instead of disk to improve performance:

Read Cache: Databases typically use the RAM as a fast read cache. Upon reading data from the disk, this data is stored in the read cache so that subsequent use of that data is satisfied from RAM instead of the disk. For example, upon reading a person’s name from disk, that name is stored in the cache for fast access. The database wouldn’t need to read that name from disk again until that person’s name is changed (rare), or that RAM space is reused for a piece of data that is used more frequently. Read cache can significantly improve database performance.

BOTH shared-disk and shared-nothing databases can exploit read cache. The shared-disk database just needs a system to either invalidate or update the data in read cache when one of the nodes has made a change. This is pretty standard in shared-disk databases.

Background Writing: Writing data to the disk is by far the most time consuming process in a write transaction. During the transaction, that portion of the data is locked, meaning it is unavailable for other functions. So, if you can move the writing of the data outside of the transaction—write the data in the background—you get faster transactions, which means less locking contention, which means faster throughput.

SHARED-NOTHING can exploit this performance enhancement, since each server owns the data in its RAM. However, shared-disk databases cannot do this because they need to share that updated data with the other database nodes in the cluster. Since the local node’s cache is not shared, in a shared-disk database, the only option is to use the shared disk to share that data across the nodes.

Transactional Cache: The next step in utilizing RAM instead of disk is to use it in a transactional manner. This means that the database can make multiple changes to data in RAM prior to writing the final results to disk. For example, if you have 100 widgets, you can store that inventory count in RAM, and then decrement it with each sale. If you sell 23 widgets, then instead of writing each transaction to disk, you update it in RAM. When you flush this data to disk, it results in a single disk write, writing the inventory number 77, instead of writing each of the 23 transactions individually to disk.

SHARED-NOTHING can perform transactions on data while it is in RAM. Once again, shared-disk databases cannot do this because you might have multiple nodes updating the inventory. Since they cannot look into each others local RAM, they must once again write each transaction to disk.

As you can see, shared-nothing databases have an inherent performance advantage. The next blog post will address how modern shared-disk databases address these performance challenges.

PlanetMySQL Voting: Vote UP / Vote DOWN