Archive for the ‘scaledb’ Category

Cloud DaaS Managed Service Fuels NewSQL Market

September 21st, 2011
As public clouds are commoditized, the public cloud vendors are increasingly moving to higher margin and stickier managed services. In the early days of the public cloud, renting compute and storage was unique, exciting, sticky and profitable. It has quickly become a commodity. In order to provide differentiation, maintain margins and create barriers to customer exit, against increasing competition, the cloud is moving toward a collection of managed services.

Public clouds are growing beyond simple compute instances to platform as a service (PaaS). PaaS in turn comprises various modules, including database as a service (DaaS). In the early days you rented a number of compute instances, loaded your database software and you were the DBA managing all aspects of that database. Increasingly, public clouds are moving toward a DaaS model, where the cloud customer writes to a simple database API and the cloud provider is the DBA.

If the database resides in a single server and does not require high-availability, providing that as a managed service is no problem. Of course, if this is the use case, then it is no problem for the customer to manage their own database. In other words, there is little value to a managed service.

The real value-add for the customer, and hence the real price premium, is derived by offering things like auto-scaling across multiple servers, hot backup, high-availability, etc. If the public cloud provider can offer a SQL-based DaaS, where the customer writes to a simple API and everything else is handled for them, that is a tremendous value and customers will pay a premium for it.

While this sounds simple, public cloud companies soon learn that the Devil is in the details. Managing someone else’s database, without insight into their business processes, performance demands, scaling demands, evolving application requirements, and more, is extremely challenging and demands a new class of DBMS. These demands have created a market need that is now being filled by companies using the moniker “NewSQL”.

In short, when it comes to DaaS, public cloud vendors want the following:
• Simple “write to our API and we’ll handle the messy stuff like scaling, HA, etc.”
• Premium value that translates to a higher profit margin business
• Barriers to customer exit

Future posts will delve into the operational demands of DaaS, and how these demands are driving NewSQL DBMS architectures and features.


PlanetMySQL Voting: Vote UP / Vote DOWN

ScaleDB: Shared-Disk / Shared-Nothing Hybrid

July 25th, 2011
The primary database architectures, shared-disk and shared-nothing, each have their advantages. Shared-disk has functional advantages such as high availability, elasticity, ease of set-up and maintenance, and the elimination of partitioning/sharding and master-slave configurations. The shared-nothing advantages are better performance and lower costs. What if you could offer a database that is a hybrid of the two, one that offers the advantages of both? This sounds too good to be true, but it is in fact what ScaleDB has done.

The underlying architecture is shared-disk, but in many situations it can operate like shared-nothing. You see, the problems with shared-disk arise from the messaging necessary to (a) ship data among nodes and storage; and (b) synchronize the nodes in the cluster. The trick is to move the messaging outside of the transaction so it doesn’t impact performance. The way to achieve that is to exploit locality. Let me explain.

When using a shared-disk database, if your application or load balancer just randomly sprays the database requests to any node in the cluster, all of the nodes end up sharing all of the data. This involves a lot of data shipping between nodes and messaging to keep track of which node has what data and what they have done to it. This is at the core of the challenge for companies like ours to build shared-disk databases…it ain’t easy. There are many things you can do to optimize performance in such a scenario, like local caching, shared cache (we use CAS, Oracle uses Cache Fusion), etc. However, the bottom line is that even with these optimizations, random distribution of database requests results in suboptimal database performance for some scenarios.

Once you have solved the worst case scenario of random database requests, you can start optimizing for the intelligent routing of database requests. By this I mean that either the application or the load balancer sends specific database requests to specific nodes in the cluster. Intelligent database request routing results in something we in the shared-database world call locality. The database nodes are able to operate on local data while only updating the rest of the cluster asynchronously. In this scenario, the database nodes, which are still using a shared-disk architecture, operate much more independently, like shared-nothing. As a result, data shipping and messaging are almost completely eliminated, resulting in performance comparable to shared-nothing, while still maintaining the advantages of shared-disk.
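As a rough illustration of what routing specific requests to specific nodes can look like on the application side, here is a minimal Python sketch. It is not ScaleDB's mechanism; the function name `route_request`, the node names and the choice of a CRC32 hash are all assumptions made for the example.

```python
import zlib


def route_request(key: str, nodes: list[str]) -> str:
    """Pick a node for `key` so the same key always lands on the same node.

    Hypothetical helper: a real router would also handle node failures
    and changes in cluster membership.
    """
    if not nodes:
        raise ValueError("cluster has no nodes")
    # crc32 is stable across processes, unlike Python's built-in hash().
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


# Hypothetical cluster and request keys, for illustration only.
nodes = ["node-a", "node-b", "node-c"]
for customer in ["customer:17", "customer:42", "customer:17"]:
    print(customer, "->", route_request(customer, nodes))
```

Both requests for `customer:17` go to the same node, so that node can keep working on the data it already holds. Note that a fixed modulo mapping like this one reshuffles most keys whenever a node is added or removed, which is why static locality and elasticity pull against each other.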

The trick is for the database to recognize on-the-fly when the separate nodes can and cannot operate in this independent fashion. This is complicated by the fact that the database must recognize and adapt to locality which can evolve as database usage changes, nodes are added or removed, etc. This is one aspect of the secret sauce that is built into ScaleDB.

Note: Now that we’ve built a shared-disk database that can recognize locality and respond by acting (and performing) like a shared-nothing database, how do we achieve locality? There are many ways to achieve locality. It can be built into the application, or you can rely on a SQL-aware routing/caching solution like those available from NetScaler and ScalArc that handle this for you.



Database Architectures & Performance II

August 3rd, 2010
As described in the prior post, the shared-disk performance dilemma is simple:

1. If each node stores/processes data in memory, versus disk, it is much faster.
2. Each node must expose the most recent data to the other nodes, so those other nodes are not using old data.

In other words, #1 above says flush data to disk VERY INFREQUENTLY for better performance, while #2 says flush everything to disk IMMEDIATELY for data consistency.

Oracle recognized this dilemma when they built Oracle Parallel Server (OPS), the precursor to Oracle Real Application Clusters (RAC). In order to address the problem, Oracle developed Cache Fusion.

Cache Fusion is a peer-based shared cache. Each node works with a certain set of data in its local cache, until another node needs that data. When one node needs data from another node, it requests it directly from that node’s cache, bypassing the disk completely. In order to minimize this data swapping between the local caches, RAC applications are optimized for data locality. Data locality means routing certain data requests to certain nodes, thereby enjoying a higher cache hit ratio and reducing data swapping between caches. Static data locality, built into the application, severely complicates the process of adding nodes to or removing nodes from the cluster.

ScaleDB encountered the same conflict between performance and consistency (or RAM vs. disk). However, ScaleDB’s shared cache was designed with the cloud in mind. The cloud imposes certain additional design criteria:

1. The number of nodes will increase/decrease in an elastic fashion.
2. A large percentage of MySQL users will require low-cost PC-based storage.

Clearly, some type of shared-cache is imperative. Memcached demonstrates the efficiency of utilizing a separate cache tier above the database, so why not do something very similar beneath the database (between the database nodes and the storage)? The local cache on each database node is the most efficient use of cache, since it avoids a network hop. The shared cache tier, in ScaleDB’s case, the Cache Accelerator Server (CAS), then serves as a fast cache for data swapping.

The Cluster Manager coordinates the interactions between nodes. This includes both database locking and data swapping via the CAS. Nodes maintain the data in their local cache until that data is required by another node. The Cluster Manager then coordinates that data swapping via the CAS. This approach is more dynamic, because it doesn’t rely on a priori knowledge about the location of that data. This enables ScaleDB to support dynamic elasticity of database nodes, which is critical for cloud computing.
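The read path through such a tiered cache can be sketched in a few lines of Python. This is only a toy model of the idea described above (local cache first, then a shared cache tier, then storage); the class name `TieredReader` and the use of plain dictionaries are assumptions for illustration, and the sketch deliberately leaves out the locking and invalidation that a cluster manager would have to coordinate.

```python
class TieredReader:
    """Toy model of one database node reading through two cache tiers."""

    def __init__(self, shared_cache: dict, disk: dict):
        self.local = {}             # per-node cache, no network hop
        self.shared = shared_cache  # stands in for the shared cache tier
        self.disk = disk            # stands in for the storage tier

    def read(self, block_id):
        """Return (value, tier) where tier names where the block was found."""
        if block_id in self.local:
            return self.local[block_id], "local"
        if block_id in self.shared:
            value, source = self.shared[block_id], "shared"
        else:
            value, source = self.disk[block_id], "disk"
            self.shared[block_id] = value  # make it visible to other nodes
        self.local[block_id] = value
        return value, source


# Two hypothetical nodes sharing one cache tier and one storage tier.
shared, disk = {}, {"b1": "row data"}
node1, node2 = TieredReader(shared, disk), TieredReader(shared, disk)
print(node1.read("b1"))  # ('row data', 'disk')
print(node1.read("b1"))  # ('row data', 'local')
print(node2.read("b1"))  # ('row data', 'shared')
```

The second node never touches the disk for a block the first node has already fetched, and neither node needs to know in advance where the block lives.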

The following diagram describes how ScaleDB’s Cache Accelerator Server (CAS) is implemented.

While the diagram above shows a variety of physical servers, these can be virtual servers. An entire cluster, including the lock manager, database nodes and CAS could be implemented on just two physical servers.

Both ScaleDB and Oracle rely upon a shared cache to improve performance, while maintaining data consistency. Both approaches have their relative pros and cons. ScaleDB’s tier-based approach to shared cache is optimized for cloud environments, where dynamic elasticity is important. ScaleDB’s approach also enables some very interesting advantages in the storage tier, which will be enumerated in subsequent posts.


ScaleDB Introduces Clustered Database Based Upon Water Vapor

April 2nd, 2010
ScaleDB is proud to announce the introduction of a database that takes data storage to a new level, and a new altitude. ScaleDB’s patent pending “molecular-flipping technology” enables low energy molecular flipping that changes selected water molecules from H2O to HOH, representing positive and negative states that mimic the storage mechanism used on hard drive disks.

“Because we act at the molecular level, we achieve massive storage density with minimal energy consumption, which is critical in today’s data centers, where energy consumption is the primary cost,” said Mike Hogan, ScaleDB CEO. “A single thimble of water vapor provides the same storage capacity as a high-end SAN.”

The technology does have one small challenge: persistence. Clouds are not known for their persistence. ScaleDB relies on the Cumulus formation, since it is far beefier than some of those wimpy cirrus clouds. However, when deployed in the data center, the dry heat can be particularly damaging to cloud maintenance. One of the company’s patents centers around using heavy water, which resists evaporation and is therefore far more persistent than its lighter brethren. The company has already received approval from the IAEA to commercialize this technique.

This new technology considerably improves ScaleDB’s “green cred”. By greatly reducing energy consumption in data centers, it cuts their carbon footprint, leaving little more than a toeprint. Once the cloud storage (which has a 3-year half-life) is worn out, you can release it into the atmosphere. There it mingles with natural clouds, making them denser and more reflective. Leading IPCC climate scientists have modeled the effects of this mingling and the scientific consensus is that it will reduce global temperatures by 5-6 degrees centigrade within 20 years (+/- 10 degrees centigrade). The company is in negotiations with Al Gore to promote this new technology, but they cannot comment on these negotiations because the mere fact that such negotiations are in fact happening is covered by a strict NDA and the even more legally binding pinky promise.

ScaleDB set out to become THE cloud database company and today’s announcement really takes that to a whole new level. The tentative name for this new database is VaporWare.


Comparing Cloud Databases: SimpleDB, RDS and ScaleDB

October 30th, 2009

Amazon’s SimpleDB isn’t a relational database, but it does provide elastic scalability and high-availability. Amazon’s recently announced Relational Database Service (RDS) is a relational database, but it doesn’t provide elastic scalability or high-availability. If you are deploying enterprise applications on the cloud (including Amazon Web Services), you might want to look at ScaleDB because it is a relational database and it does provide elastic scalability and high-availability.

Amazon describes SimpleDB by comparing it to a clustered database:

"A traditional, clustered relational database requires a sizable upfront capital outlay, is complex to design, and often requires extensive and repetitive database administration. Amazon SimpleDB is dramatically simpler, requiring no schema, automatically indexing your data and providing a simple API for storage and access. This approach eliminates the administrative burden of data modeling, index maintenance, and performance tuning. Developers gain access to this functionality within Amazon’s proven computing environment, are able to scale instantly, and pay only for what they use."

In other words, if there were a clustered database that was cost-efficient, simple, low-maintenance, and provided dynamic elasticity, that would be ideal. That is exactly what ScaleDB provides. Granted, it isn’t as simple to use as SimpleDB (just look at the names: one is simple, the other is scale), but it does eliminate data partitioning and slaves/replication, which together account for the bulk of the pain in clustering. ScaleDB also runs MySQL applications without modification.

Amazon, in a nod to SQL developers and MySQL applications, released Relational Database Service (RDS) this week. This too comes up short of Amazon’s ideal of a dynamically scalable and highly available MySQL database. Again, that is exactly what ScaleDB provides.

Comparing SimpleDB, RDS and ScaleDB

Function                                              | SimpleDB      | RDS | ScaleDB
------------------------------------------------------+---------------+-----+--------
Transactions                                          | No            | Yes | Yes
Joins                                                 | No            | Yes | Yes (1)
Data Consistency                                      | No (Eventual) | Yes | Yes (2)
SQL Support                                           | No            | Yes | Yes
ACID Compliant                                        | No            | Yes | Yes
Exploits EBS                                          | No            | Yes | Yes
Supports MySQL applications without modification      | No            | Yes | Yes
Dynamic Elasticity (w/o interrupting the application) | Yes           | No  | Yes
High-Availability                                     | Yes           | No  | Yes
Eliminates Partitioning                               | Yes           | No  | Yes
Eliminates possible 5-minute data loss upon failure   | Yes           | No  | Yes
Cluster-level load balancing                          | Yes           | No  | Yes

(1) The ScaleDB index delivers multi-table joins with the performance of a single table lookup using a technology that rivals materialized views but without the data synchronization headache.

(2) ScaleDB’s shared-disk architecture ensures data consistency across all nodes in the cluster.

ScaleDB is a storage engine that plugs into MySQL. It turns MySQL into a shared-disk DBMS, like Oracle RAC. ScaleDB, running on AWS, provides elastic scalability, adding/removing nodes according to the number of database connections, all without interrupting any running applications. Also, because ScaleDB doesn’t rely on data partitioning (as you would with shared-nothing databases), the set-up and tuning are very simple.

SimpleDB and RDS are very good and they have their roles. However, I believe that ScaleDB is really the high-end solution, without the high-end price, that enterprise users of the cloud are looking for.



Video: The ScaleDB shared-disk clustering Storage Engine for MySQL

September 23rd, 2009

Mike Hogan, CEO of ScaleDB, spoke at the Boston MySQL User Group in September 2009:

ScaleDB is a storage engine for MySQL that delivers shared-disk clustering. It has been described as the Oracle RAC of MySQL. Using ScaleDB, you can scale your cluster by simply adding nodes, without partitioning your data. Each node has full read/write capability, eliminating the need for slaves, while delivering cluster-level load balancing. ScaleDB is looking for additional beta testers; there is a sign-up at http://www.scaledb.com.

Slides are online (and downloadable) at http://www.slideshare.net/Sheeri/scale-db-preso-for-boston-my-sql-meetup-92009

Watch the video online at http://www.youtube.com/watch?v=emu2WfNx4KA



EU Should Protect MySQL-based Special Purpose Database Vendors

September 12th, 2009
In my recent post on the EU antitrust regulators' probe into the Oracle / Sun merger I did not mention an important class of stakeholders: the MySQL-based special purpose database startups. By these I mean:
• Kickfire
• Infobright
• Calpont
• ScaleDB

I think it's safe to say the first three are comparable in the sense that they are all analytical databases: they are designed for data warehousing and business intelligence applications. ScaleDB might be a good fit for those applications, but I think its architecture is sufficiently different from the first three to not call it an analytical database.

For Kickfire and Infobright, the selling point is that they are offering a relatively cheap solution to build large data warehouses and responsive business intelligence applications. (I can't really find enough information on Calpont pricing, although they do mention low total cost of ownership.) An extra selling point is that they are MySQL compatible, which may make some difference for some customers. But that compatibility is in my opinion not as important as the availability of a serious data warehousing solution at a really sharp price.

Now, in my previous post, I mentioned that the MySQL and Oracle RDBMS products are very different, and I do not perceive them as competing. Instead of trying to kill the plain MySQL database server product, Oracle should take advantage of a huge opportunity to help shape the web by being a good steward, leading ongoing MySQL development, and in addition, enable their current Oracle Enterprise customers to build cheap LAMP-based websites (with the possibility of adding value by offering Oracle to MySQL data integration).

For these analytical database solutions, things may be different though.

I think these MySQL-based analytical databases really are competitive with Oracle's Exadata analytical appliance. Oracle could form a serious threat to these MySQL-based analytical database vendors. After the merger, Oracle would certainly be in a position to hamper these vendors by restricting the non-GPL licensed usage of MySQL.
In a recent ad, Oracle vowed to increase investments in developing Sun's hardware and operating system technology. And this would eventually put them in an even better position to create appliances like Exadata, allowing them to ditch an external hardware partner like HP (which is their Exadata hardware partner).

So, all in all, in my opinion the EU should definitely take a serious look at the dynamics of the analytical database market and decide how much impact the Oracle / Sun merger could have on this particular class of MySQL OEM customers. The rise of these relatively cheap MySQL-based analytical databases is a very interesting development for the business intelligence and data warehousing space in general, and means a big win for customers that need affordable data warehousing / business intelligence. It would be a shame if it were curtailed by Oracle. After the merger, Oracle sure would have the means and the motive, so if someone needs protection, I think it would be these MySQL-based vendors of analytical databases.

As always, these are just my musings and opinions - speculation is free. Feel free to correct me, add applause or point out my ignorance :)


Cloud Computing Ideal for Shared-Disk Databases

September 2nd, 2009

Cloud computing is disrupting many aspects of computing. One need only witness the manner in which online applications like Google Docs and Salesforce.com are disrupting entrenched competitors. Soon, cloud computing will significantly disrupt the database market, for the reasons explained below.

One of the most powerful arguments in technology is the price/performance ratio. Significant declines in price or significant increases in performance can result in disruption. When you get both price declines and performance increases, you get significant disruption. This is exactly what is coming to the database market.

The Past
Moore’s Law enabled the CPU to process data faster than the hard disk drive could get the data to the CPU. Because getting data to the CPU was the bottleneck, the database that solved that bottleneck would have a performance advantage.

The shared-disk database had two glaring deficiencies: (1) it required expensive shared storage; (2) running over Fast Ethernet, the most data it could get to the CPU was 12.5 MB/second.

Compare this to the shared-nothing database, which: (1) runs on inexpensive commodity servers; (2) delivers up to 64 MB/second of data to the CPU from the local hard disk. Shared-nothing had the advantages of performance and price! It is no surprise that it grew to dominate the database market.

The Present
A lot has changed. SANs, NAS and clustered storage, leveraging fast interconnects like Fibre Channel, InfiniBand, Dolphin Express and others, deliver up to 400 MB/second of data to the CPU, roughly 6 times the performance of the local disk. But these high-end storage devices are still too expensive for most users. That is changing now.
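The throughput figures quoted in this post can be checked with a little arithmetic. The short Python sketch below assumes the nominal Fast Ethernet line rate of 100 megabits per second and takes the 64 MB/second and 400 MB/second figures directly from the text; the variable names are made up for the example.

```python
# Nominal Fast Ethernet line rate in megabits/second (assumed, not from the post).
FAST_ETHERNET_MBIT = 100

fast_ethernet_mb_s = FAST_ETHERNET_MBIT / 8  # 8 bits per byte
local_disk_mb_s = 64       # local hard disk figure quoted in the post
shared_storage_mb_s = 400  # SAN/NAS figure quoted in the post

print(fast_ethernet_mb_s)                     # 12.5
print(local_disk_mb_s / fast_ethernet_mb_s)   # 5.12
print(shared_storage_mb_s / local_disk_mb_s)  # 6.25
```

So in the past the local disk had roughly a 5x advantage over shared storage reached via Fast Ethernet, and with fast interconnects the advantage flips to roughly 6x in favor of shared storage.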

Cloud computing is based on virtualization and the sharing of IT infrastructure across large numbers of users in a multi-tenant model. By amortizing both the capital expenses and the operating expenses across its users, cloud companies are rapidly commoditizing high-end computing resources. The net result is that users can run their software on a high-end managed cloud with a super fast SAN or NAS for less than running their application on their own commodity servers. As a result, shared-nothing databases have lost both of their advantages: performance and price.

The Future

Using a shared-nothing database introduces significant complexity for enterprise systems. As you scale to multiple servers, you must partition your data, set up slaves and replication, deal with slave promotion, address the issues of data inconsistency, data shipping and more. The shared-disk architecture eliminates all of these headaches.

Until recently, these headaches were simply the price you paid in exchange for shared-nothing’s price/performance advantage. Now shared-nothing’s price/performance advantage is gone. Provisioned cloud storage is the new price/performance leader. Now that you are using shared storage there is no longer a rationale to accept all of the deficiencies of shared-nothing. If you are using shared storage, a shared-disk database is simply more compatible.

Shared-disk databases are inherently easier to manage and more cost-effective for cloud companies. Virtualization is a powerful driver of the underlying economics of cloud computing. It allows the cloud company to efficiently host multiple companies by tapping into a pool of compute resources, instead of dedicating specific compute resources to specific companies. For example, a single server may address the computing demands of 10 or more different companies throughout the day. This is far more efficient than dedicating that machine to a single company. Unfortunately, the shared-nothing database is tightly linked with a physical computer where the data is stored. The shared-disk database, on the other hand, separates the computing from the data, so it can be easily virtualized and distributed over a multi-tenant computing infrastructure. The net result for cloud companies is that because shared-disk databases support virtualization, they reduce the cloud company’s costs and increase profits.

For the reasons described above, the cloud will usher in the age of the shared-disk database.

