Archive for the ‘scaledb’ Category
Cloud DaaS Managed Service Fuels NewSQL Market
September 21st, 2011
ScaleDB: Shared-Disk / Shared-Nothing Hybrid
July 25th, 2011
Database Architectures & Performance II
August 3rd, 2010
1. If each node stores and processes data in memory rather than on disk, it is much faster.
2. Each node must expose the most recent data to the other nodes, so those other nodes are not using old data.
In other words, #1 above says flush data to disk VERY INFREQUENTLY for better performance, while #2 says flush everything to disk IMMEDIATELY for data consistency.
Oracle recognized this dilemma when they built Oracle Parallel Server (OPS), the precursor to Oracle Real Application Cluster (RAC). In order to address the problem, Oracle developed Cache Fusion.
Cache Fusion is a peer-based shared cache. Each node works with a certain set of data in its local cache until another node needs that data. When one node needs data held by another node, it requests it directly from that node's cache, bypassing the disk completely. To minimize this data swapping between the local caches, RAC applications are optimized for data locality. Data locality means routing certain data requests to certain nodes, which raises the cache hit ratio and reduces data swapping between caches. Static data locality, built into the application, severely complicates the process of adding nodes to or removing nodes from the cluster.
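To make the peer-to-peer idea concrete, here is a minimal Python sketch of a node that looks in its own cache, then in its peers' caches, and only then on shared disk. It is a toy model, not Oracle's implementation: the `Node` class, its methods, and the single-owner rule (a block lives in at most one cache at a time) are all assumptions made for illustration.

```python
class Node:
    """One database node with a local cache (hypothetical toy model)."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk        # shared storage, modeled as a plain dict
        self.cache = {}         # local cache: block id -> value
        self.peers = []         # the other nodes in the cluster
        self.disk_reads = 0     # counts how often this node touched the disk

    def read(self, block_id):
        # 1. Local cache hit: the cheapest path, no network hop.
        if block_id in self.cache:
            return self.cache[block_id]
        # 2. Ask the peers: the newest copy may exist only in a peer's cache.
        for peer in self.peers:
            if block_id in peer.cache:
                # Assumption: the block moves to the requester (single owner).
                value = peer.cache.pop(block_id)
                self.cache[block_id] = value
                return value
        # 3. Nobody has it cached: fall back to the shared disk.
        self.disk_reads += 1
        value = self.disk[block_id]
        self.cache[block_id] = value
        return value

    def write(self, block_id, value):
        # Drop any peer copy first so only one cached copy exists.
        for peer in self.peers:
            peer.cache.pop(block_id, None)
        # The write stays in memory; it is deliberately not flushed to disk.
        self.cache[block_id] = value


disk = {"b1": "old"}
a, b = Node("a", disk), Node("b", disk)
a.peers, b.peers = [b], [a]

a.write("b1", "new")        # newest value lives only in a's cache
result = b.read("b1")       # b gets it from a's cache, not from disk
```

In this run `b` sees the value written by `a` even though the disk still holds the stale copy, which is the consistency-without-flushing behavior the paragraph above describes.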
ScaleDB encountered the same conflict between performance and consistency (or RAM vs. disk). However, ScaleDB’s shared cache was designed with the cloud in mind. The cloud imposes certain additional design criteria:
1. The number of nodes will increase/decrease in an elastic fashion.
2. A large percentage of MySQL users will require low-cost PC-based storage.
Clearly, some type of shared cache is imperative. Memcached demonstrates the efficiency of utilizing a separate cache tier above the database, so why not do something very similar beneath the database (between the database nodes and the storage)? The local cache on each database node is the most efficient use of cache, since it avoids a network hop. The shared cache tier, which in ScaleDB’s case is the Cache Accelerator Server (CAS), then serves as a fast cache for data swapping.
The Cluster Manager coordinates the interactions between nodes. This includes both database locking and data swapping via the CAS. Nodes maintain the data in their local cache until that data is required by another node. The Cluster Manager then coordinates that data swapping via the CAS. This approach is more dynamic, because it doesn’t rely on a priori knowledge about the location of that data. This enables ScaleDB to support dynamic elasticity of database nodes, which is critical for cloud computing.
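The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ScaleDB's actual design or API: the class names, the `acquire` method, and the simplification that every read takes exclusive ownership of a block are assumptions.

```python
class CacheAcceleratorServer:
    """Shared cache tier between the database nodes and storage (toy model)."""

    def __init__(self):
        self.blocks = {}    # block id -> most recently handed-over value


class ClusterManager:
    """Tracks which node holds the newest copy of each block (hypothetical)."""

    def __init__(self, cas):
        self.cas = cas
        self.owner = {}     # block id -> node holding the newest copy
        self.nodes = set()

    def join(self, node):
        self.nodes.add(node)

    def leave(self, node):
        # A departing node hands every block it owns to the CAS.
        for block_id, value in node.local.items():
            if self.owner.get(block_id) is node:
                self.cas.blocks[block_id] = value
                del self.owner[block_id]
        node.local.clear()
        self.nodes.discard(node)

    def acquire(self, node, block_id):
        holder = self.owner.get(block_id)
        if holder is not None and holder is not node:
            # The current holder swaps the block out through the CAS.
            self.cas.blocks[block_id] = holder.local.pop(block_id)
        self.owner[block_id] = node
        if block_id not in node.local and block_id in self.cas.blocks:
            node.local[block_id] = self.cas.blocks[block_id]


class DbNode:
    def __init__(self, name, manager):
        self.name = name
        self.local = {}     # local cache, kept until another node needs it
        self.manager = manager
        manager.join(self)

    def write(self, block_id, value):
        self.manager.acquire(self, block_id)
        self.local[block_id] = value

    def read(self, block_id):
        # Simplification: reads also take ownership of the block.
        self.manager.acquire(self, block_id)
        return self.local.get(block_id)


cas = CacheAcceleratorServer()
manager = ClusterManager(cas)
n1 = DbNode("n1", manager)
n1.write("row42", "v1")

n2 = DbNode("n2", manager)      # node added while the cluster is running
seen = n2.read("row42")

manager.leave(n2)               # node removed; its data survives in the CAS
n3 = DbNode("n3", manager)
after_leave = n3.read("row42")
```

No node needs to know in advance where `row42` lives: the manager's ownership map is the only routing state, which is why nodes can join and leave freely in this sketch.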
The following diagram describes how ScaleDB’s Cache Accelerator Server (CAS) is implemented.
While the diagram above shows a variety of physical servers, these can be virtual servers. An entire cluster, including the lock manager, database nodes and CAS, could be implemented on just two physical servers.
Both ScaleDB and Oracle rely upon a shared cache to improve performance, while maintaining data consistency. Both approaches have their relative pros and cons. ScaleDB’s tier-based approach to shared cache is optimized for cloud environments, where dynamic elasticity is important. ScaleDB’s approach also enables some very interesting advantages in the storage tier, which will be enumerated in subsequent posts.
ScaleDB Introduces Clustered Database Based Upon Water Vapor
April 2nd, 2010
“Because we act at the molecular level, we achieve massive storage density with minimal energy consumption, which is critical in today’s data centers, where energy consumption is the primary cost,” said Mike Hogan, ScaleDB CEO. “A single thimble of water vapor provides the same storage capacity as a high-end SAN.”
The technology does have one small challenge: persistence. Clouds are not known for their persistence. ScaleDB relies on the Cumulus formation, since it is far beefier than some of those wimpy cirrus clouds. However, when deployed in the data center, the dry heat can be particularly damaging to cloud maintenance. One of the company’s patents centers around using heavy water, which resists evaporation and is therefore far more persistent than its lighter brethren. The company has already received approval from the IAEA to commercialize this technique.
This new technology considerably improves ScaleDB’s “green cred”. By greatly reducing energy consumption in data centers, it cuts their carbon footprint, leaving little more than a toeprint. Once the cloud storage (which has a 3-year half-life) is worn out, you can release it into the atmosphere. There it mingles with natural clouds, making them denser and more reflective. Leading IPCC climate scientists have modeled the effects of this mingling and the scientific consensus is that it will reduce global temperatures by 5-6 degrees centigrade within 20 years (+/- 10 degrees centigrade). The company is in negotiations with Al Gore to promote this new technology, but they cannot comment on these negotiations because the mere fact that such negotiations are in fact happening is covered by a strict NDA and the even more legally binding pinky promise.
ScaleDB set out to become THE cloud database company and today’s announcement really takes that to a whole new level. The tentative name for this new database is VaporWare.
Comparing Cloud Databases: SimpleDB, RDS and ScaleDB
October 30th, 2009
Amazon’s SimpleDB isn’t a relational database, but it does provide elastic scalability and high-availability. Amazon’s recently announced Relational Database Service (RDS) is a relational database, but it doesn’t provide elastic scalability or high-availability. If you are deploying enterprise applications on the cloud (including Amazon Web Services), you might want to look at ScaleDB because it is a relational database and it does provide elastic scalability and high-availability.
Amazon describes SimpleDB by comparing it to a clustered database:
"A traditional, clustered relational database requires a sizable upfront capital outlay, is complex to design, and often requires extensive and repetitive database administration. Amazon SimpleDB is dramatically simpler, requiring no schema, automatically indexing your data and providing a simple API for storage and access. This approach eliminates the administrative burden of data modeling, index maintenance, and performance tuning. Developers gain access to this functionality within Amazon’s proven computing environment, are able to scale instantly, and pay only for what they use."
In other words, if there were a clustered database that was cost-efficient, simple, low-maintenance, and provided dynamic elasticity, that would be ideal. That is exactly what ScaleDB provides. Granted, it isn’t as simple to use as SimpleDB (just look at the name, one is simple, the other is scale) but it does eliminate data partitioning and slaves/replication, which together account for the bulk of the pain in clustering. ScaleDB also runs MySQL applications without modification.
Amazon, in a nod to SQL developers and MySQL applications, released Relational Database Service (RDS) this week. This too comes up short of Amazon’s ideal of a dynamically scalable and highly available MySQL database. Again, that is exactly what ScaleDB provides.
Comparing SimpleDB, RDS and ScaleDB
| Function | SimpleDB | RDS | ScaleDB |
|---|---|---|---|
| Transactions | No | Yes | Yes |
| Joins | No | Yes | Yes [1] |
| Data Consistency | No (Eventual) | Yes | Yes [2] |
| SQL Support | No | Yes | Yes |
| ACID Compliant | No | Yes | Yes |
| Exploits EBS | No | Yes | Yes |
| Supports MySQL applications without modification | No | Yes | Yes |
| Dynamic Elasticity (w/o interrupting the application) | Yes | No | Yes |
| High-Availability | Yes | No | Yes |
| Eliminates Partitioning | Yes | No | Yes |
| Eliminates possible 5-minute data loss upon failure | Yes | No | Yes |
| Cluster-level load balancing | Yes | No | Yes |
[1] The ScaleDB index delivers multi-table joins with the performance of a single table lookup using a technology that rivals materialized views but without the data synchronization headache.
[2] ScaleDB’s shared-disk architecture ensures data consistency across all nodes in the cluster.
ScaleDB is a storage engine that plugs into MySQL. It turns MySQL into a shared-disk DBMS, like Oracle RAC. ScaleDB, running on AWS, provides elastic scalability, adding or removing nodes according to the number of database connections, all without interrupting any running applications. Also, because ScaleDB doesn’t rely on data partitioning, as you would with shared-nothing databases, the set-up and tuning are very simple.
SimpleDB and RDS are very good and they have their roles. However, I believe that ScaleDB is really the high-end solution, without the high-end price, that enterprise users of the cloud are looking for.
Video: The ScaleDB shared-disk clustering Storage Engine for MySQL
September 23rd, 2009
Mike Hogan, CEO of ScaleDB, spoke at the Boston MySQL User Group in September 2009:
ScaleDB is a storage engine for MySQL that delivers shared-disk clustering. It has been described as the Oracle RAC of MySQL. Using ScaleDB, you can scale your cluster by simply adding nodes, without partitioning your data. Each node has full read/write capability, eliminating the need for slaves, while delivering cluster-level load balancing. ScaleDB is looking for additional beta testers; you can sign up at http://www.scaledb.com.
Slides are online (and downloadable) at http://www.slideshare.net/Sheeri/scale-db-preso-for-boston-my-sql-meetup-92009
Watch the video online at http://www.youtube.com/watch?v=emu2WfNx4KA
EU Should Protect MySQL-based Special Purpose Database Vendors
September 12th, 2009
I think it's safe to say the first three are comparable in the sense that they are all analytical databases: they are designed for data warehousing and business intelligence applications. ScaleDB might be a good fit for those applications, but I think its architecture is sufficiently different from the first three to not call it an analytical database.
For Kickfire and Infobright, the selling point is that they are offering a relatively cheap solution to build large data warehouses and responsive business intelligence applications. (I can't really find enough information on Calpoint pricing, although they do mention low total cost of ownership.) An extra selling point is that they are MySQL compatible, which may make some difference for some customers. But that compatibility is in my opinion not as important as the availability of a serious data warehousing solution at a really sharp price.
Now, in my previous post, I mentioned that the MySQL and Oracle RDBMS products are very different, and I do not perceive them as competing. Instead of trying to kill the plain MySQL database server product, Oracle should take advantage of a huge opportunity to help shape the web by being a good steward, leading ongoing MySQL development, and, in addition, enabling their current Oracle Enterprise customers to build cheap LAMP-based websites (with the possibility of adding value by offering Oracle to MySQL data integration).
For these analytical database solutions, things may be different though.
I think these MySQL-based analytical databases really are competitive with Oracle's Exadata analytical appliance. Oracle could pose a serious threat to these MySQL-based analytical database vendors. After the merger, Oracle would certainly be in a position to hamper these vendors by restricting the non-GPL licensed usage of MySQL.
In a recent ad, Oracle vowed to increase investments in developing Sun's hardware and operating system technology. And this would eventually put them in an even better position to create appliances like Exadata, allowing them to ditch an external hardware partner like HP (which is their Exadata hardware partner).
So, all in all, in my opinion the EU should definitely take a serious look at the dynamics of the analytical database market and decide how much impact the Oracle / Sun merger could have on this particular class of MySQL OEM customers. The rise of these relatively cheap MySQL-based analytical databases is a very interesting development for the business intelligence and data warehousing space in general, and means a big win for customers that need affordable data warehousing / business intelligence. It would be a shame if it were curtailed by Oracle. After the merger, Oracle sure would have the means and the motive, so if someone needs protection, I think it would be these MySQL-based vendors of analytical databases.
As always, these are just my musings and opinions - speculation is free. Feel free to correct me, add applause or point out my ignorance :)
Cloud Computing Ideal for Shared-Disk Databases
September 2nd, 2009
Cloud computing is disrupting many aspects of computing. One need only witness the manner in which online applications like Google Docs and Salesforce.com are disrupting entrenched competitors. Soon, cloud computing will significantly disrupt the database market, for the reasons explained below.
One of the most powerful arguments in technology is the price/performance ratio. Significant declines in price or significant increases in performance can result in disruption. When you get both price declines and performance increases, you get significant disruption. This is exactly what is coming to the database market.
The Past
Moore’s Law enabled the CPU to process data faster than the hard disk drive could get the data to the CPU. Because getting data to the CPU was the bottleneck, the database that solved that bottleneck would have a performance advantage.
The shared-disk database had two glaring deficiencies: (1) it required expensive shared storage; (2) running over Fast Ethernet, the most data it could get to the CPU was 12.5 MB/second.
Compare this to the shared-nothing database, which: (1) runs on inexpensive commodity servers; (2) delivers up to 64 MB/second of data to the CPU from the local hard disk. Shared-nothing had the advantages of performance and price! It is no surprise that it grew to dominate the database market.
The Present
A lot has changed. SANs, NAS and clustered storage, leveraging fast interconnects like Fibre Channel, InfiniBand, Dolphin Express and others, deliver up to 400 MB/second of data to the CPU, roughly 6 times the performance of the local disk. But these high-end storage devices are still too expensive for most users. That is changing now.
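The throughput figures quoted so far can be checked with a few lines of Python. The only number not taken from the post is the nominal 100 Mbit/second line rate of Fast Ethernet; the variable names are illustrative, and the 64 and 400 MB/second values are the post's own "up to" figures rather than measurements.

```python
# Nominal Fast Ethernet line rate in megabits per second (standard figure).
FAST_ETHERNET_MBIT_PER_S = 100

# Convert bits to bytes: 8 bits per byte.
fast_ethernet_mb_per_s = FAST_ETHERNET_MBIT_PER_S / 8

# Figures quoted in the post (upper bounds, in MB/second).
local_disk_mb_per_s = 64
shared_storage_mb_per_s = 400

# "The Past": how far local disk outran shared storage over Fast Ethernet.
past_advantage = local_disk_mb_per_s / fast_ethernet_mb_per_s

# "The Present": how far modern shared storage outruns local disk.
present_advantage = shared_storage_mb_per_s / local_disk_mb_per_s

print(f"Fast Ethernet: {fast_ethernet_mb_per_s} MB/s")
print(f"Local disk vs. Fast Ethernet: {past_advantage:.2f}x")
print(f"Shared storage vs. local disk: {present_advantage:.2f}x")
```

The last ratio works out to 6.25, which is where the "6 times" claim comes from; the earlier ratio of about 5x in the other direction shows how completely the balance has flipped.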
Cloud computing is based on virtualization and on sharing the IT infrastructure across large numbers of users in a multi-tenant model. By amortizing both the capital expenses and the operating expenses across their users, cloud companies are rapidly commoditizing high-end computing resources. The net result is that users can run their software on a high-end managed cloud with a super fast SAN or NAS for less than running their application on their own commodity servers. As a result, shared-nothing databases have lost both of their advantages: performance and price.
The Future
Using a shared-nothing database introduces significant complexity for enterprise systems. As you scale to multiple servers, you must partition your data, set-up slaves and replication, deal with slave promotion, address the issues of data inconsistency, data shipping and more. The shared-disk architecture eliminates all of these headaches.
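One way to see the partitioning burden is to count how many rows have to move when a shared-nothing cluster grows. The Python sketch below assumes a deliberately naive scheme, `key % node_count`, chosen for simplicity; real shared-nothing systems often use smarter placement (consistent hashing, for example) that moves less data, so treat the number as an upper-end illustration rather than a benchmark. All names here are hypothetical.

```python
def partition_owner(key, node_count):
    """Naive shared-nothing placement: each key lives on exactly one node."""
    return key % node_count


def rows_moved(keys, old_nodes, new_nodes):
    """Count keys whose owning node changes when the cluster is resized."""
    return sum(
        1 for key in keys
        if partition_owner(key, old_nodes) != partition_owner(key, new_nodes)
    )


keys = range(12_000)

# Shared-nothing: growing from 3 to 4 nodes forces data to be shipped.
moved = rows_moved(keys, old_nodes=3, new_nodes=4)
moved_fraction = moved / len(keys)

# Shared-disk: every node can reach every row on shared storage,
# so adding a node relocates nothing.
shared_disk_moved = 0

print(f"shared-nothing rows moved: {moved} ({moved_fraction:.0%})")
print(f"shared-disk rows moved: {shared_disk_moved}")
```

Under this naive scheme three quarters of the rows change owner when one node is added, which is the kind of data shipping the paragraph above refers to.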
Until recently, these headaches were simply the price you paid in exchange for shared-nothing’s price/performance advantage. Now shared-nothing’s price/performance advantage is gone. Provisioned cloud storage is the new price/performance leader. Now that you are using shared storage there is no longer a rationale to accept all of the deficiencies of shared-nothing. If you are using shared storage, a shared-disk database is simply more compatible.
Shared-disk databases are inherently easier to manage and more cost-effective for cloud companies. Virtualization is a powerful driver of the underlying economics of cloud computing. It allows the cloud company to efficiently host multiple companies by tapping into a pool of compute resources, instead of dedicating specific compute resources to specific companies. For example, a single server may address the computing demands of 10 or more different companies throughout the day. This is far more efficient than dedicating that machine to a single company. Unfortunately, the shared-nothing database is tightly linked with the physical computer where the data is stored. The shared-disk database, on the other hand, separates the computing from the data, so it can be easily virtualized and distributed over a multi-tenant computing infrastructure. The net result for cloud companies is that because shared-disk databases support virtualization, they reduce the cloud company’s costs and increase profits.
For the reasons outlined above, the cloud will usher in the age of the shared-disk database.