Archive for the ‘architecture’ Category
MySQL HA solutions
April 12th, 2012
PlanetMySQL Voting: Vote UP / Vote DOWN
Working with ScaleBase and NOSQL
December 15th, 2011
There is a huge amount of buzz around NOSQL, and we at ScaleBase are happy to see companies making the move to NOSQL. Despite what some people might think, we consider it a blessed change. It is time for applications to stop having a single data store – namely a relational database (probably Oracle) – and start using the best tool for the job.
In the last couple of years, since NOSQL technologies broke into our world, a lot of experience has been gathered on how to use them. Mainly, we see NoSQL technologies used for one of the following scenarios:
- Queries that require a very short response time
- Storing data without a well-defined schema, or storing data with a frequently modified schema
Now, I’m not in any way saying that NOSQL solutions are not used for other scenarios as well; I’m only saying that from our experience here at ScaleBase, these are the most common scenarios.
Other needs, like data backup, complex join queries, and consistent data storage, are all still being delivered by relational databases.
So the implementation is along the lines of a hybrid model – NOSQL for some tasks, MySQL (or other database, but MySQL is by far the most popular) for others.
ScaleBase is determined to assist in the relational database part of the problem, letting it scale and perform – just as the NOSQL side can scale and perform by itself (and frankly it can scale and perform very well, as this was the original requirement for most NOSQL solutions).
As NOSQL solutions grow in popularity and use, I expect we’ll see “design patterns” pop up – when to use relational databases and when to use NOSQL solutions (and of course – which one). For now, if you’re architecting your new web application/SaaS solution or social game – try to learn from the architectures of existing sites. You can find some at http://highscalability.com, and others at http://nosql.mypopescu.com/.
ScaleBase achieves 180K NO-TPM TPC-C results on Amazon RDS
December 12th, 2011
ScaleBase Releases Database TPC-C Performance Results
Technology achieves unprecedented transaction speed for a MySQL database at a low cost
Boston, Mass., December 12, 2011 – ScaleBase, Inc. today announced the results of its MySQL database benchmark, based on the industry-standard TPC-C test. ScaleBase has achieved an unmatched 180,000 Transactions per Minute – the highest result for a MySQL database – while running on an Amazon RDS environment. Cost per Transaction was reported to be 50 cents, which demonstrates the cost-effectiveness of the ScaleBase solution on the Amazon EC2 cloud. Full details of the benchmark can be found at http://www.scalebase.com/resources/performance/.
TPC, the Transaction Processing Performance Council, defines transaction processing and database benchmarks and delivers reliable, independent results to the industry. The TPC-C benchmark is a popular yardstick for comparing Online Transaction Processing (OLTP) performance on various hardware and software configurations.
The ScaleBase Database Load Balancer is a packaged solution for transparently scaling MySQL databases. ScaleBase utilizes two techniques for scaling: read-write splitting and transparent sharding (a technique for massively scaling-out relational databases). The software enables MySQL to scale transparently, without forcing developers to change a single line of code or perform a long data migration process. The technology is ideally suited for any application in which scalability, performance and speed are critical, including: gaming, e-commerce, SaaS, machine-generated data, Web 2.0 and more.
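The release does not say how ScaleBase implements read-write splitting internally. As a rough illustration of the general idea only, the hypothetical Java router below sends writes (and any read inside a transaction) to the primary and spreads plain SELECTs across replicas round-robin. The class name, the host strings and the naive first-keyword check are all assumptions, not ScaleBase's API.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical read-write splitting router; the class name and host strings are
// illustrative and are not ScaleBase's API.
final class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = List.copyOf(replicas);
    }

    // Returns the host that should execute the statement.
    String route(String sql, boolean inTransaction) {
        String s = sql.trim().toUpperCase(Locale.ROOT);
        // Naive classification: only plain SELECTs count as reads.
        boolean isRead = s.startsWith("SELECT") && !s.contains("FOR UPDATE");
        // Writes, locking reads, and reads inside a transaction stay on the primary,
        // so a transaction always sees its own writes.
        if (!isRead || inTransaction || replicas.isEmpty()) {
            return primary;
        }
        // Round-robin over the replicas.
        return replicas.get(Math.floorMod(next.getAndIncrement(), replicas.size()));
    }
}
```

A production proxy would parse SQL properly rather than inspect the first keyword, and would also have to account for replication lag before routing a read to a replica.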
“Some people feel that by using MySQL they stand the chance of limiting their performance options; however, these TPC-C results prove that this simply is no longer the case,” said Rob Levine, ScaleBase’s VP of Sales. “Without writing specialized code you can still get top performance – perhaps optimal performance – at an affordable rate, accounting for the requisite hardware and infrastructure resources. Especially in today’s economy, getting such great performance and optimizing every dollar spent can save companies substantial amounts of money.”
ScaleBase’s Database Load Balancer solution has been successfully used by numerous customers since its official release in August 2011.
About ScaleBase
ScaleBase has developed an innovative database load balancing technology that enables MySQL users to achieve scalability and high availability, without changing a single line of application code. ScaleBase utilizes two techniques for scaling: read-write splitting and transparent sharding, which is a method for massively scaling-out relational databases. The ScaleBase technology is ideally suited for any application in which scalability, performance and speed are critical, including: gaming, e-commerce, SaaS, machine-generated data and more. The company is privately-held and headquartered near Boston, Mass. Follow @SCLBase on Twitter.
Media Contact
Candice Perodeau
508-475-0025 x112
Making the case for Database Sharding using a Proxy
December 6th, 2011
There are several ways to implement sharding in your application. The first, and by far the most popular, is to implement it inside your application. It can be implemented as part of your own Data Access Layer, database driver, or an ORM extension. However, there are many limitations with such an implementation, which drove us, at ScaleBase, to look for an alternative architecture.

As the above diagram shows, ScaleBase is implemented as a standalone proxy. There are several benefits to using such an architecture.
First and foremost, since the sharding logic is not embedded inside the application, third party applications can be used, be it MySQL Workbench, MySQL command line interface or any other third party product. This translates to a huge saving in the day-to-day costs of both developers and system administrators.
Backups can be executed via the proxy, which allows users to consistently back up a sharded environment – not an easy task when sharding is developed internally.
Since the application server machines are usually highly utilized (as they should be, to optimize costs), running additional code on them will just slow them down. Running the sharding code on external proxies allows a more efficient division of tasks between the servers, and keeps ordinary requests unaffected by data-crunching requests (for instance, cross-shard queries).
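To make that division of labor concrete, here is a hypothetical scatter-gather helper of the kind a proxy might run for a cross-shard query. Each shard is modeled as a plain Java function from SQL text to rows rather than a real MySQL connection, and the class and method names are illustrative assumptions. The fan-out and the merge run on the proxy's own thread pool, so that work does not consume CPU on the application servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical proxy-side scatter-gather. Each shard is modeled as a function from SQL text
// to rows; a real proxy would hold MySQL connections to the shards instead.
final class ScatterGather {
    private final ExecutorService pool;

    ScatterGather(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Sends the same query to every shard in parallel and concatenates the partial results
    // in shard order. ORDER BY, LIMIT or aggregates would need an extra merge step.
    <R> List<R> query(List<Function<String, List<R>>> shards, String sql) {
        List<Future<List<R>>> futures = new ArrayList<>();
        for (Function<String, List<R>> shard : shards) {
            futures.add(pool.submit(() -> shard.apply(sql)));
        }
        List<R> merged = new ArrayList<>();
        try {
            for (Future<List<R>> f : futures) {
                merged.addAll(f.get()); // blocks until that shard has answered
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard query failed", e.getCause());
        }
        return merged;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```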
So all in all there are many reasons to run sharding code outside the scope of the application and application server. If you’re interested – we’d love to chat.
How do you know when to shard your database?
November 17th, 2011
We at ScaleBase talk about sharding so much, it’s difficult for us to see why someone wouldn’t want to shard. But just because we’re so enthusiastic about our transparent sharding mechanism, it doesn’t mean we can’t understand the very basic question, “When do I shard?”
Well, it’s not the most difficult question to answer. I’ll keep it short: if your database exceeds the memory you have on a single machine, you should shard. Once you become I/O-bound, your performance suffers, and sharding will help.
Why? That’s easy to explain.
Databases in general (and MySQL is no exception) try to cache data. Because accessing memory is so much faster than accessing disk (even with SSDs), database providers have developed rather sophisticated caching algorithms. For instance, running a query caches the query and its results. Indexes are stored in memory so that, when running a query, the database doesn’t have to hit the disk twice.
But if the database is big, it won’t fit into memory. Sometimes even the indexes won’t fit into memory. This is when you start seeing database performance degradation. So the best time to start sharding is when you can’t add more memory to your database server. This can come sooner rather than later. As we all know, data is booming, and if you’re running in the cloud there is only so much memory your cloud provider will give you. With sharding, each machine holds only its own portion of the data, which fits in RAM. And if you need more, just add an additional shard.
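The memory rule of thumb above can be turned into a rough shard count: divide the working set (data plus indexes) by the memory each server can devote to caching, and round up. This is a back-of-the-envelope sketch; the `cacheFraction` parameter is an assumption standing in for whatever share of RAM the MySQL cache (for InnoDB, the buffer pool) actually gets on your servers.

```java
// Hypothetical shard-count estimate: enough shards that each shard's share of the
// working set (data + indexes) fits in the memory one server can use for caching.
final class ShardSizing {
    static int shardsNeeded(long dataBytes, long indexBytes, long ramPerServerBytes, double cacheFraction) {
        if (ramPerServerBytes <= 0 || cacheFraction <= 0 || cacheFraction > 1) {
            throw new IllegalArgumentException("invalid memory parameters");
        }
        long workingSet = dataBytes + indexBytes;
        long usable = (long) (ramPerServerBytes * cacheFraction); // cache bytes per server
        // Ceiling division: a partial shard still needs its own server.
        return (int) Math.max(1, (workingSet + usable - 1) / usable);
    }
}
```

For example (illustrative numbers only), 200 GB of data plus 40 GB of indexes on servers with 68 GB of RAM and `cacheFraction = 0.75` (51 GB of cache each) gives 5 shards.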
The other parameter is the number of concurrent connections. If you reach the limit of connections your machine can handle, it’s time to shard your database. Each shard gets fewer hits per second and requires fewer connections, so it can work faster.
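The two signals in this post can be written down as a simple check. This is a hypothetical sketch: in MySQL the cache size would typically come from `innodb_buffer_pool_size` and the connection limit from `max_connections`, while the 80% "approaching the limit" threshold is an illustrative assumption, not a figure from the post.

```java
// Hypothetical "time to shard?" check encoding the two signals from this post.
final class ShardCheck {
    static boolean shouldShard(long dataBytes, long indexBytes, long cacheBytes,
                               int peakConnections, int maxConnections) {
        // Signal 1: the working set (data + indexes) no longer fits in the cache.
        boolean workingSetExceedsMemory = dataBytes + indexBytes > cacheBytes;
        // Signal 2: peak concurrent connections approach the server's limit (80% is an assumption).
        boolean nearConnectionLimit = peakConnections >= 0.8 * maxConnections;
        return workingSetExceedsMemory || nearConnectionLimit;
    }
}
```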
So, if your database does not fit in memory, or if you have too many concurrent users hitting your database – try out ScaleBase, for our transparent sharding solution.
Oracle’s NoSQL
October 7th, 2011
Oracle's turn-about announcement of a NoSQL product wasn't really surprising. When Oracle spends time and effort putting down a technology, you can bet that it's secretly impressed, and trying to re-implement it in its back room. So Oracle's paper "Debunking the NoSQL Hype" should really have been read as a backhanded product announcement. (By the way, don't click that link; the paper appears to have been taken down. Surprise.)
I have to agree with DataStax and other developers in the NoSQL movement: Oracle's announcement is a validation, more than anything else. It's certainly a validation of NoSQL, and it's worth thinking about exactly what that means. It's long been clear that NoSQL isn't about any particular architecture. When databases as fundamentally different as MongoDB, Cassandra, and Neo4J can all be legitimately characterized as "NoSQL," it's clear that NoSQL isn't a "thing." We've become accustomed to talking about the NoSQL "movement," but what does that mean?
As Justin Sheehy, CTO of Basho Technologies, said, the NoSQL movement isn't about any particular architecture, but about architectural choice. For as long as I can remember, application developers have debated software architecture choices with gusto. There were many choices for the front end; many choices for middleware; and careers rose and fell based on those choices. Somewhere along the way, "Software Architect" even became a job title. But for the backend, for the past 20 years there has really been only one choice: a relational database that looks a lot like Oracle (or MySQL, if you'd prefer). And choosing between Oracle, MySQL, PostgreSQL, or some other relational database just isn't that big a choice.
Did we really believe that one size fits all for database problems? If we ever did, the last three years have made it clear that the model was broken. I've got nothing against SQL (well, actually, I do, but that's purely personal), and I'm willing to admit that relational databases solve many, maybe even most, of the database problems out there. But just as it's clear that the universe is a more complicated place than physicists thought it was in 1990, it's also clear that there are data problems that don't fit 20-year-old models. NoSQL doesn't use any particular model for storing data; it represents the ability to think about and choose your data architecture. It's important to see Oracle recognize this. The company's announcement isn't just a validation of key-value stores, but of the entire discussion of database architecture.
Of course, there's more to the announcement than NoSQL. Oracle is selling a big data appliance: an integrated package including Hadoop and R. The software is available standalone, though Oracle clearly hopes that the package will be running on its Exadata Database hardware (or equivalent), which is an impressive monster of a database machine (though I agree with Mike Driscoll, that machines like these are on the wrong side of history). There are other bits and pieces to solve ETL and other integration problems. And it's fair to say that Oracle's announcement validates more than just NoSQL; it validates the "startup stack" or "data stack" that we've seen in many of the most exciting new businesses that we watch. Hadoop plus a non-relational database (often MongoDB, HBase, or Cassandra), with R as an analytics platform, is a powerful combination. If nothing else, Oracle has given more conservative (and well-funded) enterprises permission to make the architectural decisions that the startups have been making all along, and to work with data that goes beyond what traditional data warehouses and BI technologies allow. That's a good move, and it grows the pie for everyone.
I don't think many young companies will be tempted to invest millions in Oracle products. Some larger enterprises should, and will, question whether investing in Oracle products is wise when there are much less expensive solutions. And I am sure that Oracle will take its share of the well-funded enterprise business. It's a win all around.
Web 2.0 Summit, being held October 17-19 in San Francisco, will examine "The Data Frame" — focusing on the impact of data in today's networked economy.
When Clever Goes Wrong & How Etsy Overcame – Arstechnica
October 5th, 2011
In 2007, Etsy made a big bet on homegrown middleware to help with the site’s scalability. Half a year after it went live, the company decided to abandon it. As a senior software engineer at Etsy put it, “if you’re doing something ‘clever,’ you’re probably doing it wrong.”
Read the full article at Arstechnica.com
I want to focus on the important lessons from this article: using middleware and stored procedures in this fashion for a public web application creates unscalable design complexity (smart and “proper” according to the old enterprise design teachings…), which in turn causes infrastructure, development and maintenance hassles.
In the process they did replace PostgreSQL with MySQL, but that’s not the critical change that made all the difference. PostgreSQL is also a fine database system.
How to Implement MySQL Sharding – Part 3
October 4th, 2011
In the previous post of this series (which can be found here) I discussed how to migrate your data once you have decided how to shard your schema.
Once your data is sharded, it’s time to modify your application code. I will not dive into the many open source platforms that provide partial sharding support (Hibernate Shards, Gizzard, and the like), and will take Java (sorry, old habits are hard to overcome) as an example – however, the same holds true for any programming language.
Without Using ORM
If you wrote your code without an Object/Relational Mapping tool, kudos to you. Sharding will be easier, as you control the SQL statements.
Upgrading Connection Pool
Your first task is to write a connection pool that is “sharding” aware. The class should look something like this:
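import java.sql.Connection;

public class ShardingAwareDatasource {
    // Returns a connection to the shard that owns the given sharding key.
    public static Connection getConnection(Object shardingKey) {…}
    // Returns a connection to any shard (sufficient for global tables).
    public static Connection getAnyConnection() {…}
}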
Queries that run on specific shards need to use the getConnection method, which must contain the logic that returns the correct connection based on the sharding key. Queries that run on global tables (as described in the previous posts) can use getAnyConnection, which returns a connection to any of the shards.
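Note that these methods must be session-aware to ensure transaction isolation: within a session, repeated calls for the same shard must return the same connection, so that all statements of a transaction run on it.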
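The outline above deliberately leaves the method bodies out. One possible way to fill it in, offered as a sketch rather than the author's code: keep one pooled `javax.sql.DataSource` per shard, hash the sharding key to pick a shard, and cache connections per thread so a session keeps reusing the same connection for a given shard. Unlike the outline, this version uses instance methods so the shard list can be passed in; the hash scheme, the per-thread session model and `endSession()` are all assumptions.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import javax.sql.DataSource;

// Hypothetical sharding-aware datasource: one DataSource (pool) per shard, hash routing on
// the sharding key, and a per-thread map so a session reuses its connection to each shard.
final class ShardingAwareDatasource {
    private final List<DataSource> shards;
    private final ThreadLocal<Map<Integer, Connection>> session =
            ThreadLocal.withInitial(HashMap::new);

    ShardingAwareDatasource(List<DataSource> shards) {
        this.shards = List.copyOf(shards);
    }

    // Maps a sharding key to a shard index in [0, shardCount).
    static int shardIndexFor(Object shardingKey, int shardCount) {
        return Math.floorMod(shardingKey.hashCode(), shardCount);
    }

    // For queries on sharded tables that carry the sharding key.
    Connection getConnection(Object shardingKey) throws SQLException {
        return connectionFor(shardIndexFor(shardingKey, shards.size()));
    }

    // For queries on global tables, which are present on every shard.
    Connection getAnyConnection() throws SQLException {
        Map<Integer, Connection> open = session.get();
        if (!open.isEmpty()) {
            return open.values().iterator().next(); // reuse a connection this session already holds
        }
        return connectionFor(ThreadLocalRandom.current().nextInt(shards.size()));
    }

    private Connection connectionFor(int index) throws SQLException {
        Map<Integer, Connection> open = session.get();
        Connection c = open.get(index);
        if (c == null || c.isClosed()) {
            c = shards.get(index).getConnection();
            open.put(index, c);
        }
        return c;
    }

    // Returns this thread's connections to their pools when the session (e.g. request) ends.
    void endSession() throws SQLException {
        for (Connection c : session.get().values()) {
            c.close();
        }
        session.remove();
    }
}
```

Keep in mind that `hashCode()`-modulo routing sends most keys to a different shard whenever the shard count changes, so adding shards later means moving data.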
Changing Queries
- Go over all of your queries.
- For each query:
  - Identify whether it runs on a global or shard table.
    i. If it runs on a global table, make sure the connection used is from getAnyConnection.
  - If it is based on a shard table, check if it contains the shard key.
    i. If so, then use that shard key in the getConnection method.
      - If the query uses other tables, break it down into multiple queries.
    ii. If not, then split the query into multiple queries, so that each contains a shard key.
    iii. Make sure your code merges data that is gathered across multiple queries.
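Step iii is where most of the hand-written code ends up. As one hypothetical example, suppose a query such as `SELECT ... ORDER BY created DESC LIMIT n` is sent to every shard, and each shard returns its own top n rows already sorted. The sketch below does a k-way merge with a `PriorityQueue` to keep the global top n; the class name and the assumption that per-shard results arrive sorted are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical merge of per-shard results that are each already sorted by the same order.
final class ShardResultMerger {
    static <T> List<T> mergeSorted(List<List<T>> perShard, Comparator<? super T> order, int limit) {
        // Heap entries are {shard index, position within that shard's list}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> order.compare(perShard.get(a[0]).get(a[1]), perShard.get(b[0]).get(b[1])));
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) {
                heap.add(new int[] {s, 0}); // start with each shard's first row
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            int[] top = heap.poll(); // best remaining row across all shards
            List<T> rows = perShard.get(top[0]);
            out.add(rows.get(top[1]));
            if (top[1] + 1 < rows.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within that shard
            }
        }
        return out;
    }
}
```

Other query shapes need their own merge rules: per-shard `COUNT(*)` values must be summed, for instance, and `AVG` has to be recomputed from per-shard sums and counts rather than averaged.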
Usually, you’ll see that any query that is not trivial will have to be changed (a trivial query contains only one table and, if that table is sharded, has the shard key in its WHERE clause, and so on). It’s a lot of work, but it pays off in performance.
When Using ORM
Since most ORMs don’t support sharding, you’re out of luck. Most likely you’ll have to rewrite your ORM code, use SQL directly, and handle the object mapping yourself. Not an easy task.
Summary
Implementing your own sharding is not impossible. It’s been done before, and in this series I tried to focus on what tasks are needed when you implement your own sharding.
Of course, if you’re serious about sharding your database, I strongly urge you to give ScaleBase a try. We’ll make sure all this heavy lifting is done for you, in a transparent way – no code changes, no schema configuration, everything is automatic.
I’ll be happy to chat on this page or through our contact us page. Also, you can get more information on how to write your own sharding code in our upcoming webinar. It will be held on November 2nd, 11AM PST. You can register here.
And if you’d like to find out what’s the best way to shard your schema, try our 100% free Analyzer. You can download it here.
How to Implement MySQL Sharding – Part 2
September 26th, 2011
In the previous post of this series (which can be found here) I discussed how to identify tables that can serve as good candidates for sharding.
Once you have decided which tables should be sharded (all the rest should be global tables), the choice of sharding keys is rather straightforward, as most will use the table primary key as the shard key. Of course, if multiple tables are sharded, and there is a foreign key relationship between these tables, then the foreign key will serve as the shard key for some tables.
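A small hypothetical illustration of that foreign-key rule: a parent table is sharded on its primary key, and a child table is sharded on the foreign key pointing to the parent rather than on its own id, so a parent row and all of its children land on the same shard and can be joined locally. The `orders`/`order_items` tables and the hash-modulo routing are made-up assumptions for the example.

```java
// Hypothetical colocated routing: parent sharded by its primary key, child by the
// foreign key to that parent, so related rows always share a shard.
final class ColocatedRouting {
    record Order(long orderId, String status) {}
    record OrderItem(long itemId, long orderId, String sku) {}

    private final int shardCount;

    ColocatedRouting(int shardCount) {
        this.shardCount = shardCount;
    }

    private int shardOf(long key) {
        return Math.floorMod(Long.hashCode(key), shardCount);
    }

    // orders: shard key = primary key order_id.
    int shardFor(Order o) {
        return shardOf(o.orderId());
    }

    // order_items: shard key = foreign key order_id, NOT the item's own item_id.
    int shardFor(OrderItem i) {
        return shardOf(i.orderId());
    }
}
```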
Many people attempt to shard based on customer_id or a resource id, but in my experience this usually fails in production environments. It is very hard to know in advance which customers belong together in the same database, and since customers can suddenly increase their traffic, this might create an unbalanced situation in which some shards are very busy while others are relaxed (see the details of last year’s FourSquare outage for some possible results of unbalanced sharding).
As with database partitioning, there are multiple algorithms available for sharding: hash, list, or range. Usually you’ll use list and range for multi-tenancy – saving customer information across different databases and maybe even different data centers. I’ll touch on that subject in a future post. But hash will probably give you the best results when it comes to sharding, as it statistically spreads data evenly across all shards.
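The three algorithms can be sketched as small shard-selection functions. These are hypothetical illustrations, not any product's implementation. Note that `byHash` uses `Long.hashCode`, which for keys below 2^31 is the key itself, so it behaves like plain modulo: fine for sequential ids, but a skewed pattern (say, keys that are all multiples of the shard count) would pile onto one shard, and a stronger mixing hash would be safer.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketches of hash, range, and list shard selection.
final class ShardAlgorithms {
    // Hash: spreads keys across shards regardless of their business meaning.
    static int byHash(long key, int shardCount) {
        return Math.floorMod(Long.hashCode(key), shardCount);
    }

    // Range: each shard owns keys from its lower bound (inclusive) up to the next bound.
    static int byRange(long key, TreeMap<Long, Integer> lowerBoundToShard) {
        Map.Entry<Long, Integer> e = lowerBoundToShard.floorEntry(key);
        if (e == null) {
            throw new IllegalArgumentException("no range covers key " + key);
        }
        return e.getValue();
    }

    // List: an explicit value-to-shard table, e.g. region or tenant to data center.
    static int byList(String value, Map<String, Integer> valueToShard) {
        Integer shard = valueToShard.get(value);
        if (shard == null) {
            throw new IllegalArgumentException("unmapped value " + value);
        }
        return shard;
    }
}
```

With sequential ids 0 to 999 and 4 shards, `byHash` puts exactly 250 keys on each shard.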
So after sharding configuration, what’s next?
If you have a new application you can skip the next section and just wait for the next post. But if you have an existing database, you’re stuck with huge amounts of data that you need to split.
We at ScaleBase ran a lot of tests and found that the following is the best mechanism for the initial data migration (BTW, if you use ScaleBase, we handle and also optimize the data migration for you):
- Have the database cloned in all shards. It can be done by cloning a VM, or copying the physical files, or using mysqldump to export once and import to all shards.
- For each shard (on shard tables only):
  - Drop all indexes.
  - Delete the irrelevant data from the shard (this should be done by an automatic script, of course).
    Note: This action creates a lot of fragmentation. You might consider creating a temporary table, inserting only the relevant rows into it, dropping the original table, and renaming the temporary one to the real name.
  - Create all indexes.
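The "keep only this shard's rows" step is what the automatic script would generate. Below is a hypothetical Java generator for the temporary-table variant from the note. It assumes a non-negative integer shard key, that the application routes with the same `MOD(key, N)` rule, and that no foreign keys reference the table being dropped; the table and column names are placeholders, so review the SQL before running it. Because indexes were dropped first, `CREATE TABLE ... LIKE` copies an index-free definition, and the indexes are recreated afterwards.

```java
import java.util.List;

// Hypothetical SQL generator for pruning one shard down to its own rows
// (temporary-table variant: copy the keepers, drop the original, rename).
final class ShardPruneSql {
    static List<String> statements(String table, String keyColumn, int shardIndex, int shardCount) {
        String tmp = table + "_shard_tmp";
        return List.of(
                "CREATE TABLE " + tmp + " LIKE " + table,
                "INSERT INTO " + tmp + " SELECT * FROM " + table
                        + " WHERE MOD(" + keyColumn + ", " + shardCount + ") = " + shardIndex,
                "DROP TABLE " + table,
                "RENAME TABLE " + tmp + " TO " + table);
    }
}
```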
In the next post we’ll talk about the programming language modifications that sharding requires.
Backing Up MySQL With ScaleBase
September 12th, 2011
Backing up data is critical for production databases – and there are a lot of well-known solutions for backing up databases.
When the database is sharded, backing up data becomes problematic. If the backup is not synchronized across all shards, data inconsistency might occur. In this blog post I’ll try to detail the possible backup scenarios for sharded databases when using ScaleBase.
Backup Types
Let’s start by understanding the different backup types that are out there. You can read all about it here.
A physical backup involves copying all database files to a different location. Copying can take several hours for a decent-sized database if it’s done to a disk or a tape. It might take only seconds if the database files reside on SAN/NAS storage hardware that supports snapshot technology.
A logical backup is a copy of the logical database structure and its content. It backs up meaningful data rather than the physical backup’s bits and bytes. The logical backup consists of all CREATE TABLE statements plus INSERT statements for the content.
Physical backup methods are faster than logical because they involve only file copying without conversion.
A full restore is also faster from a physical backup. However, with a physical backup you can’t restore only one table, or selected specific data. If this is what you need, you’ll have to use logical backups.
Physical Backup
A physical backup can be cold, warm or hot.
| Backup Type | Single Database | Sharded Database with ScaleBase |
| Cold | | |
| Warm | | |
| Hot | Needs tools like “MySQL Enterprise Backup” or “Percona XtraBackup”. | Needs tools like “MySQL Enterprise Backup” or “Percona XtraBackup” on all database servers. |
Logical Backup
| Single Database | Sharded Database with ScaleBase |
| The most common command for a logical backup is: mysqldump --single-transaction --all-databases | Run the same command through ScaleBase. |
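To show how little changes on the backup side, here is a hypothetical Java wrapper that builds the same `mysqldump` command and points it at the ScaleBase endpoint instead of at each shard. The host, port, user and output path are placeholders; the flags are standard `mysqldump` options, and the password is expected to come from a MySQL option file rather than the command line. Whether the proxy yields a consistent snapshot across shards is as described in this post, not something the wrapper itself ensures.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper: run the logical backup through the proxy endpoint.
final class ProxyBackup {
    static List<String> command(String proxyHost, int proxyPort, String user, String outFile) {
        return List.of("mysqldump",
                "--host=" + proxyHost,   // the ScaleBase address instead of a single shard
                "--port=" + proxyPort,
                "--user=" + user,
                "--single-transaction",
                "--all-databases",
                "--result-file=" + outFile);
    }

    // Runs mysqldump and returns its exit code (0 means success).
    static int run(String proxyHost, int proxyPort, String user, String outFile)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(proxyHost, proxyPort, user, outFile))
                .inheritIO()
                .start();
        return p.waitFor();
    }
}
```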
Benefits of Backing Up with ScaleBase
The added value of using ScaleBase when backing up data is:
- Vs. single database:
- Backup takes only a fraction of the time. Since each database is smaller, copying the data is faster.
- Vs. home-grown sharded environment:
- Instead of rewriting backup scripts, just point them at ScaleBase’s IP address. Everything will continue working exactly as before.