Archive for the ‘fractal tree indexes’ Category

Write Optimization: Myths, Comparison, Clarifications, Part 2

October 4th, 2011

In my last post, we talked about the read/write tradeoff of indexing data structures, and some ways that people augment B-trees in order to get better write performance. We also talked about the significant drawbacks of each method, and I promised to show some more fundamental approaches.

We had two “workload-based” techniques: inserting in sequential order, and using fewer indexes, and two “data structure-based” techniques: a write buffer, and OLAP. Remember, the most common thing people do when faced with an insertion bottleneck is to use fewer indexes, and this kills query performance. So keep in mind that all our work on write-optimization is really work for read-optimization, in that write-optimized indexes are cheap enough that you can keep all the ones you need to get good read performance.

Today, I’ll draw some parallels with the write buffer and OLAP. Recall that the write buffer gets you a small insertion speedup but doesn’t really hurt query time, and OLAP gets you a big insertion speedup but doesn’t let you query your data for a long time. We’ll also use the fact that sequential insertions are orders of magnitude faster than random insertions.

With that in mind, let’s move on to the new generation of write optimization.

Two Great Tastes: Log-Structured Merge trees (LSMs)

We couldn’t manage to get both insertion and query performance out of either a write buffer or OLAP. But they’re similar techniques, just at two extremes.

LSMs are two great tastes that taste great together. To start, make the buffer big, but make it a B-tree, so you can query data as it arrives. In fact, suppose you have log(N) B-trees, B1, B2, …, B_log(N), each twice the size of the one before. If B-trees are slow, using more of them sounds crazy, but I promise we’re getting somewhere.

An LSM Tree with 4 levels

Each B-tree has twice the capacity of the one before it, so Bk can hold 2^k elements. When a new row arrives, put it in B-tree B1. When B1 reaches capacity (which in this case is 2 rows), dump those rows into B2. B2’s capacity is 4 rows. When B2 overflows, you dump the items into B3, which has a capacity of 8 rows, and so on. The trick here is that each time we dump things down to the next B-tree, they’re already sorted, so we get the insertion boost out of doing sequential insertions.
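The cascade described above can be sketched in a few lines of Python. This is a toy model under loud assumptions: plain sorted lists stand in for the B-trees, everything lives in memory, and the names `ToyLSM`, `insert`, `contains` and `rows_written` are made up for illustration. It only shows the mechanics: a full level is emptied and its already-sorted rows are merged into the next level.

```python
from bisect import bisect_left
from heapq import merge


class ToyLSM:
    """Toy LSM: level k (counting from 1) is dumped once it reaches 2**k rows.

    Assumption: sorted Python lists stand in for the B-trees of the post.
    """

    def __init__(self):
        self.levels = []       # levels[0] plays the role of B1, levels[1] of B2, ...
        self.rows_written = 0  # rows copied into a level where they come to rest

    def insert(self, key):
        carry = [key]          # a sorted run travelling down the levels
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            # Both inputs are sorted, so this is a sequential merge.
            merged = list(merge(self.levels[k], carry))
            if len(merged) < 2 ** (k + 1):
                self.levels[k] = merged
                self.rows_written += len(merged)
                return
            # The level reached capacity: empty it and push the run one level down.
            self.levels[k] = []
            carry = merged
            k += 1

    def contains(self, key):
        # A lookup has to probe every level, one binary search per level.
        for level in self.levels:
            i = bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False
```

Inserting keys in any order leaves every level sorted, and `rows_written` stays below the number of rows times the number of levels, which is the logarithmic write cost the post refers to. The `contains` method already hints at the downside: one search per level.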

The first log(M) B-trees are in memory (where M is the size of memory). A simple optimization is to just have one B-tree for all these levels, because in-memory B-trees are fast. Once you start flowing out of memory, you are always merging one tree with another that has at most twice as many rows. This way, the smaller B-tree can be treated like the large, OLAP-style buffer, and you get a similarly large speedup. In fact, this merge happens at disk bandwidth speeds.

Not so fast, you say: You don’t get to use all the bandwidth, because each row gets moved from B-tree to B-tree, and it uses up bandwidth each time. This is true, but it turns out that you’re operating at a 1/log(N/M) fraction of bandwidth, which is a lot better than a B-tree, by orders of magnitude.

Alas, the queries are not so great. Even though we made the buffer into B-trees, which are good for queries, you now need to do a query in each one. There are log(N/M) of them on disk, so this ends up being slower than a B-tree by a log(N/M) factor. There’s that pesky tradeoff, which is much better than the B-tree tradeoff, but still not the mathematically optimal tradeoff.

One last point: if instead of growing the B-trees by a factor of 2, you grow them by a larger factor, you slow down your insertions but speed up your queries. Once again, the tradeoff emerges.
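To see that tradeoff in numbers, here is a rough cost model, not a benchmark. It assumes a leveled scheme in which every merge rewrites the whole receiving level, tracks only level sizes, and ignores the in-memory/on-disk distinction. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(num_rows, growth):
    """Return (row copies per insert, number of levels) for a toy leveled LSM.

    Assumptions: level k (counting from 0) is dumped when it reaches
    growth**(k + 1) rows, and merging into a level rewrites that whole level.
    """
    sizes = []   # sizes[k] = rows currently resting in level k
    copies = 0   # total rows written by all merges
    for _ in range(num_rows):
        carry = 1
        k = 0
        while True:
            if k == len(sizes):
                sizes.append(0)
            total = sizes[k] + carry
            if total < growth ** (k + 1):
                sizes[k] = total
                copies += total   # the merged level is written out in full
                break
            sizes[k] = 0          # level is full: push everything further down
            carry = total
            k += 1
    return copies / num_rows, len(sizes)


if __name__ == "__main__":
    for growth in (2, 4, 16):
        per_insert, levels = simulate(4096, growth)
        print(f"growth={growth:2d}  copies/insert={per_insert:.2f}  levels={levels}")
```

In this model a larger growth factor leaves fewer levels to search, but each row is rewritten more often on its way down, which matches the direction of the tradeoff described above.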

Have Your Cake and Eat It Too: COLAs

A COLA (that’s Cache-Oblivious Lookahead Array) is a lot like an LSM, with the queries done in a better way. To begin with, you use a technique called fractional cascading to maintain some information about the relationship from one level to the next. This information can take many forms, but what’s important is that you don’t restart your query at each level and end up doing a full B-tree query log(N) times. Instead, you get to do a small local search. If you do things just right, you can match the query time of a single B-tree. This is true even if you are doubling your structures at each level, so in addition, COLAs are as fast at insertions as LSMs.

Let me repeat that: they match B-trees for queries while simultaneously matching LSMs for insertions. It’s nice to note that COLAs are on the mathematically optimal write/read tradeoff curve, and they’re a proof, by example, that B-trees are not optimal.
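For readers who have not met fractional cascading, the following is a minimal sketch of the textbook version on static sorted arrays. It is not the COLA's actual layout or Tokutek's implementation; the helper names `build` and `contains` are invented here. Each level is augmented with every second element of the augmented level below it, so that after one full binary search at the top, each further level costs only a pointer hop and at most one step back.

```python
from bisect import bisect_left


def build(levels):
    """Precompute augmented arrays and pointers for a list of sorted lists."""
    k = len(levels)
    aug = [None] * k
    aug[-1] = list(levels[-1])
    for i in range(k - 2, -1, -1):
        # Sample every second element of the augmented level below.
        aug[i] = sorted(levels[i] + aug[i + 1][1::2])
    # own[i][p]: index in levels[i] of the first element >= aug[i][p].
    own = [[bisect_left(levels[i], x) for x in aug[i]] + [len(levels[i])]
           for i in range(k)]
    # down[i][p]: index in aug[i + 1] of the first element >= aug[i][p].
    down = [[bisect_left(aug[i + 1], x) for x in aug[i]] + [len(aug[i + 1])]
            for i in range(k - 1)]
    return aug, own, down


def contains(levels, aug, own, down, key):
    """Membership test across all levels with a single full binary search."""
    pos = bisect_left(aug[0], key)  # the only full binary search
    for i in range(len(levels)):
        idx = own[i][pos]
        if idx < len(levels[i]) and levels[i][idx] == key:
            return True
        if i + 1 < len(levels):
            pos = down[i][pos]
            # Because every second element was sampled upward, the true
            # position is either this one or the one just before it.
            if pos > 0 and aug[i + 1][pos - 1] >= key:
                pos -= 1
    return False
```

The pointers here are built with plain binary searches for brevity; the point of the sketch is the query path, where the per-level work is constant instead of a fresh search.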

COLAs are on the optimal read/write tradeoff curve

This flavor of data structure, which combines the insertion speed of the size-doubling hierarchy of sorted structures (the LSM) with the query speed boost of fractional cascading, goes by many names and can be found dressed up in a bunch of surprising ways, but the underlying math, as well as the performance, is exactly the same.

For bonus points, if you read my colleagues’ paper on COLAs, you’ll see that they are described as being log(B) slower than B-trees on queries. This log(B) is easily recouped in practice—giving you the same query speed as B-trees—if you give up so-called cache obliviousness (a property which is nice mathematically, but not as nice as having faster queries).

Write Optimization is the Best Read Optimization

I’ve been focusing on write optimization, and Fractal Trees do go a couple of orders of magnitude faster than B-trees for indexing that pesky non-sequential data. What that means for the user is typically read optimization: you start adding all the indexes you needed all along, since indexes are so wonderfully cheap to update. My motto is: write optimization is the best read optimization!

You can get COLA-style read-optimal, write-optimized goodness here at Tokutek, where it is marketed as Fractal Trees and available in TokuDB for MySQL and MariaDB.



Are You Forcing MySQL to Do Twice as Many JOINs as Necessary?

September 29th, 2011
Baron Schwartz
This guest post is from our friends at Percona. They’re hosting Percona Live London from October 24-25, 2011. Percona Live is a two day summit with 100% technical sessions led by some of the most established speakers in the MySQL field.

In the London area and interested in attending? We are giving away two free passes in the next few days. Watch our @tokutek twitter feed for a chance to win.

Did you know that the following query actually performs a JOIN? You can’t see it, but it’s there:

SELECT the_day, COUNT(*), SUM(clicks), SUM(cost)
FROM ad_clicks_by_day
WHERE the_day >= '2005-07-01' AND the_day < '2005-07-07'
GROUP BY the_day;

Let me explain.

Suppose you define the table as follows:

CREATE TABLE ad_clicks_by_day (
customer INT  NOT NULL,
the_day  DATE NOT NULL,
clicks   INT  NOT NULL DEFAULT 0,
cost     INT  NOT NULL DEFAULT 0,
PRIMARY KEY(customer, the_day),
INDEX(the_day)
) ENGINE=InnoDB;

What happens when MySQL executes this query? Here’s the EXPLAIN:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ad_clicks_by_day
         type: range
possible_keys: the_day
          key: the_day
      key_len: 3
          ref: NULL
         rows: 368
        Extra: Using where

That looks fine, doesn’t it? It’s using an index range scan to find approximately 368 matching rows, and adding up the clicks and cost for them. Can it be done any more efficiently than this?

In fact, it turns out that it’s possible to execute this query much more efficiently. What happens internally when the server executes this query is that it begins reading from the ‘the_day’ index, and for each row it finds, it performs a lookup in the primary key (which, in InnoDB, stores the whole table) to find the other columns mentioned in the query. That is, it’s joining the index to the primary key! This is not just an ivory-tower abstraction. In real-life workloads, the random I/O caused by such index-to-table joins slows queries dramatically.

The “join” here isn’t quite the same as a true table-to-table SQL JOIN, and it has a little less overhead at the server level, but in terms of the way an index-to-row lookup works, the two are quite similar.

If you’re skilled at logical and physical database design, you already noticed that my example can be improved by indexing differently: just put the primary key on (the_day, customer) instead of the other way around. Then the query will use the primary key instead of the secondary key, and the whole row is available to the query without doing a join to another index. This is called a covering index, which means that the index covers the whole query–there is no need for any data outside the index. You’ll see “Using index” in the Extra column from EXPLAIN when an index covers a query.

But then what if we want to group the results by customer? No problem, we can just add another index. To achieve the same results for ad-hoc querying, we’ll need to index every column:

ALTER TABLE ad_clicks_by_day ADD INDEX(the_day, customer, clicks, cost);

That works, but isn’t that going to be kind of expensive? We’re essentially duplicating the whole table. And what if we have some other columns we want to target for grouping, or filtering, or sorting, or any other operation that can be optimized with an index? Uh-oh, that’s going to be a lot of indexes.
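The difference between the single-column index and the four-column index can be observed in a small experiment. The sketch below uses SQLite from Python's standard library as a stand-in, which is an assumption worth stating loudly: SQLite is not MySQL or InnoDB, its tables are organized differently, and its `EXPLAIN QUERY PLAN` wording differs from MySQL's `EXPLAIN`. The helper `query_plan` and the index name `by_day` are made up for illustration.

```python
import sqlite3

QUERY = """
SELECT the_day, COUNT(*), SUM(clicks), SUM(cost)
FROM ad_clicks_by_day
WHERE the_day >= '2005-07-01' AND the_day < '2005-07-07'
GROUP BY the_day
"""


def query_plan(index_columns):
    """Create the table with one secondary index and return SQLite's plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE ad_clicks_by_day (
            customer INT  NOT NULL,
            the_day  DATE NOT NULL,
            clicks   INT  NOT NULL DEFAULT 0,
            cost     INT  NOT NULL DEFAULT 0,
            PRIMARY KEY (customer, the_day)
        )""")
    conn.execute(f"CREATE INDEX by_day ON ad_clicks_by_day ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print("index on the_day only:  ", query_plan("the_day"))
    print("index on all four cols: ", query_plan("the_day, customer, clicks, cost"))
```

With the four-column index, SQLite reports a covering index for this query, its counterpart to MySQL's “Using index”. With the index on the_day alone, no covering index is reported, because clicks and cost still have to be fetched from the table.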

The overhead of maintaining indexes is often mentioned, but rarely measured. It is sometimes exaggerated as a result. (For further reading on this point, see the book Relational Database Index Design and the Optimizers by Tapio Lahdenmaki.) However, if you did measure the overhead, you would notice two important things: increased disk I/O for modifications to the table, and increased disk space consumption. Even if it’s sometimes blown out of proportion, it’s real.

That’s why I like TokuDB’s indexing technology and modifications to the storage engine API. Indexes are much cheaper in TokuDB than they are in InnoDB, for several reasons:

  • TokuDB compresses its data quite well, which reduces the footprint both in space consumption and in I/O operations.
  • TokuDB lets you define multiple clustering indexes–indexes that sort and compare only on the key columns, but which include all columns in the table, transparently. That avoids invisible JOINs. I want this feature in InnoDB.
  • TokuDB uses a different data structure for indexes, which is significantly more efficient. Instead of the classic B-Tree structure, TokuDB uses Fractal Trees, which don’t slow down when they get bigger than memory.

As a result of these three properties of TokuDB, you can maintain a lot of indexes on a lot of data relatively cheaply. And that means that you can index your data to match your query patterns, and get much faster performance by removing a lot of random I/O operations caused by finding data and then looking up the rest of the row.

Disclosure: Tokutek paid Percona to evaluate the technology, but it is our practice to offer only our honest opinion in blog posts such as this one.


About the Author

Baron is Percona’s Chief Performance Architect. He’s the lead author of High Performance MySQL, and creator of Maatkit. He consults with Percona’s customers, and develops tools and practices for Percona’s team.

 



Write Optimization: Myths, Comparison, Clarifications

September 22nd, 2011

Some indexing structures are write optimized in that they are better than B-trees at ingesting data. Other indexing structures are read optimized in that they are better than B-trees at query time. Even within B-trees, there is a tradeoff between write performance and read performance. For example, non-clustering B-trees (such as MyISAM) are typically faster at indexing than clustering B-trees (such as InnoDB), but are then slower at queries.

This post is the first of two about how to understand write optimization, what it means for overall performance, and what the difference is between different write-optimized indexing schemes. We’ll be talking about how to deal with workloads that don’t fit in memory—in particular, if we had our data in B-trees, only the internal nodes (perhaps not even all of them) would fit in memory.

As I’ve already said, there is a tradeoff between write and read speed in B-trees. There is also a mathematically optimal tradeoff that’s been proven between data ingestion and query speed, no matter what data structure you have.

But these are not the same tradeoff! B-trees are not even remotely optimal in terms of read/write tradeoff. I’d say this is the most common confusion I run into on this topic. The way it comes up is in the assumption people seem to make that if you are faster than a B-tree at indexing, you must pay a read penalty. This is simply not true, and I intend to convince you, but today I’ll just describe how I think people arrive at this confusion.

Extreme solutions

What’s the best we can do if we only worry about writes? Or reads? By considering these extreme cases, we can get a feel for the space of all possible solutions.

What’s the most write-optimized structure? Simply log all the data. You ingest data at the bandwidth of the storage system, and you can’t do better than that. The problem is that each query now requires all the data to be examined, and that’s a pretty lousy way to get query performance (unless, of course, you intend to look at it all anyway, like MapReduce does).

At the other extreme, you get maximum read performance by re-sorting the data and laying it out optimally for queries (for each index!) every time new data is inserted. The layout is technical, but can be done, and query performance doesn’t get any better than this. But then insertion speeds are disastrously slow.

B-trees

For most real workloads, we can’t sacrifice one aspect of performance to benefit another; we need something that works well for both.

A B-tree is one compromise between reads and writes. B-trees are easy to understand and they’ve been popular for decades, but as the data structure grows, their performance falls to pieces. Heavy use over the years has led to all kinds of refinements, and there are plenty of write optimizations for B-trees we could discuss at length. Here’s a little observation that will help us out: write optimization in B-trees involves maximizing the useful work we accomplish each time we touch a leaf.[1]

Let’s examine some write optimizations and see how this observation applies. We’ll also see some drawbacks of each technique.

  • Insert in sequential order: B-trees are a couple of orders of magnitude faster at inserting data in sequential order compared to random order. This is because once we insert a row into a leaf, that leaf is in memory, and since the next run of rows are destined for the same leaf, they can all be inserted cheaply. According to the disk, it’s almost like you’re just logging the data.

    This property of B-trees leads many people to bend over backwards trying to keep their insertions sequential. In many applications, this is not a natural or easy thing to do, and causes all sorts of problems elsewhere in the system.

  • Use a write buffer: We store up a bunch of insertions in memory. When we have too many, we pick a leaf and write all the rows we’ve stored that are going to that leaf.

    This optimization works when we get to pick a bunch of stored rows that are going to the same leaf. When this happens, we see a speedup: you get to accomplish a bunch of work just for touching a leaf once.

    You can expect a win of a factor of 2 or so for this optimization, which is nice, but leaves a factor of more than 100 on the table. The query cost doesn’t take much of a hit in this case. Even though you do have to query the write buffer, it’s in memory so it’s way faster than querying the tree on disk.

  • OLAP: OLAP is a bunch of things, but with respect to insertions, the idea is to save up a big batch of rows, pre-sort them, and then insert them into an existing B-tree in sorted order. You can think of it like an on-disk version of the insertion buffer. The big win happens when the amount you batch up is of size comparable to the B-tree you already have, and this gets you the insertion boost that a write buffer can’t achieve.

    The downside is that, unlike the in-memory write buffer, the rows you are buffering don’t get indexed—they just get logged, and we already said how bad query performance is for a log. In practice, OLAP users don’t even look at their data until it gets indexed.

    So by batching more, you get better insertion speed, but wait longer before your data is available for queries. In this case, you get a write/freshness tradeoff, rather than a write/read tradeoff, but the fundamental reason is the same.

  • Use fewer indexes: In general, each row inserted must go to the leaf of every index. This technique is just: maintain fewer indexes, so we have less to do. We wanted to maximize the work accomplished per leaf we touch, but now we’re minimizing the number of leaves we need to touch per amount of work (row insertion), but we get the same effect.

    This is less like a technique to be employed, and more like an artifact of bad B-tree performance, but it’s probably the most common “optimization.” Its downside makes it the clearest case for true write optimization: if you can’t afford to keep the right set of indexes, you kill your query performance, and this is without a doubt a read/write tradeoff.
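The “work per leaf touched” observation behind these techniques can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: leaves are modeled as fixed key ranges, memory holds a tiny LRU cache of leaves, a “load” stands for one leaf I/O, and the names and sizes are arbitrary. The resulting numbers illustrate the mechanism only and say nothing about real speedups.

```python
import random
from collections import Counter, OrderedDict

LEAF_ROWS = 100      # keys per leaf (arbitrary toy value)
CACHED_LEAVES = 4    # leaves that fit in memory (arbitrary toy value)


def leaf_loads(keys):
    """Leaf I/Os when each row is inserted directly, with a small LRU cache."""
    cache = OrderedDict()
    loads = 0
    for key in keys:
        leaf = key // LEAF_ROWS
        if leaf in cache:
            cache.move_to_end(leaf)        # leaf already in memory: cheap
        else:
            loads += 1                     # leaf must be fetched
            cache[leaf] = True
            if len(cache) > CACHED_LEAVES:
                cache.popitem(last=False)  # evict the least recently used leaf
    return loads


def leaf_loads_with_buffer(keys, buffer_rows):
    """Leaf I/Os when rows wait in a write buffer and are flushed per leaf."""
    pending = Counter()   # leaf -> rows waiting for that leaf
    buffered = 0
    loads = 0
    for key in keys:
        pending[key // LEAF_ROWS] += 1
        buffered += 1
        if buffered >= buffer_rows:
            # Pick the leaf with the most waiting rows and write them all at once.
            leaf, count = pending.most_common(1)[0]
            del pending[leaf]
            buffered -= count
            loads += 1
    return loads + len(pending)            # final drain: one load per leftover leaf


if __name__ == "__main__":
    keys = list(range(10_000))
    shuffled = keys[:]
    random.Random(42).shuffle(shuffled)
    print("sequential inserts:", leaf_loads(keys))
    print("random inserts:    ", leaf_loads(shuffled))
    print("random + buffer:   ", leaf_loads_with_buffer(shuffled, 500))
```

Sequential insertion touches each leaf once, random insertion pays for a leaf on almost every row, and the buffer lands in between because each leaf load is shared by several rows.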

These are pretty common tricks for speeding up B-tree insertions. You may have even tried some of them yourself. Each one has a downside though, and often they’re not obvious in production. Maybe if you didn’t need to insert sequentially for speed, you could simplify your application in a useful way. Or if you thought you could afford more indexes, you’d spend the time to think up cool new ways to analyze your data.

I think the reason people are convinced that B-trees are on the optimal tradeoff curve is pretty simple. Since all of the popular write optimizations are modifications to a B-tree, most people end up stuck on the B-tree tradeoff curve. But contrary to conventional wisdom, it is possible to do a lot better. Next post, I’ll explain why.


1: That’s because we can amortize the cost of the I/O—which is needed by a B-tree for most leaves, when the database is bigger than memory—against the work we are able to complete.


Online Advertiser Intent Media Selects TokuDB over InnoDB and NoSQL for Big Data Ad-Hoc Analysis

September 8th, 2011

Intent Media

Issue addressed: Ad hoc analytics on clickstream data arriving too fast for InnoDB or NoSQL to handle.

TokuDB powers an online advertising application

The Company: Headquartered in New York, Intent Media is a fast-growing online advertising startup. The company helps some of the largest online retailers monetize their traffic more efficiently at scale by showing highly relevant and targeted advertising to the 97+% of e-commerce visitors who do not transact.

The Challenge: The Intent Media platform processes hundreds of millions of events a day generated by media placements across leading e-commerce sites — a textbook “Big Data” challenge. Intent Media’s data is used to optimize media placements, drive segmentation models, and create analytics reports supporting publishers, advertisers, and internal business processes. Intent Media hosts its systems on Amazon EC2, and its TokuDB database contains tables approaching a billion records.

Intent Media turned to TokuDB to support ad hoc analysis for two core reasons: performance at scale with massive data volumes; and analyst familiarity with the SQL toolset.

A number of options they had considered were insufficient. These included:

InnoDB – Familiar toolset, but not fast enough. “InnoDB performance breaks down quickly when tables get very large, but our testing demonstrated that TokuDB could provide a familiar SQL environment to analysts that continues to perform superbly as data sizes grow,” according to CTO Josh Hartmann.

Pig/Hive, backed by MapReduce – Not fast enough for ad-hoc reporting, with limited support. “Analysts need a responsive toolset that can bring back answers in seconds or minutes – not hours,” Hartmann said. “While Pig and Hive on top of MapReduce can handle very large datasets, it comes at a big cost. Both tools are much less responsive in the hands of analysts, and in the case of Pig requires retooling the team to learn a new language.”

Other NoSQL solutions – Promising performance in limited situations, but with big functional limitations. “We looked at a variety of NoSQL engines, but the ability for our data analyst team to stick with what they know was key for us,” Hartmann said. “Our analysts can write more complex queries with joins without having to fall back to implementing logic in software.”

The Solution: Intent Media imports its data into TokuDB.

Intent Media’s original installation of TokuDB in 2010 was completed in a matter of hours. Since then, they have upgraded to TokuDB v5.0 to take advantage of its rich feature set, including Hot Column Addition and Deletion (HCAD).

As a growing business with an evolving data model, HCAD was a big win for Intent Media. Now Intent Media has the flexibility to quickly and painlessly modify their schema on the fly, without taking the database offline.

“Column additions in the past simply were not practical, taking days to complete,” Hartmann said. “They now take a matter of seconds, and can be accomplished in a non-disruptive fashion. This has dramatically improved our ability to adapt to the changing needs of our business, without fear that a schema change would lock up a table for a week or more, blocking other time-sensitive analyses.”

The Benefits

Performance: When evaluating TokuDB, Intent Media looked at several metrics. “Insert performance was important to us, but even more critical was how fast queries run after the fact,” Hartmann said. This behavior helped drive the decision to TokuDB.

Scalability: “Tokutek has been with us from the beginning, starting with a few million rows at the start, to scaling with us now for a database with tables approaching billions of rows. With TokuDB, we’ve been able to keep up with this growth with consistently fast performance,” Hartmann said. “Managing terabytes of data now is as easy as managing 50 gigabytes was at the beginning.”

Flexibility: “This dramatic reduction in time it takes to add a column will allow us to continually and dynamically test and adopt algorithms on a daily basis,” Hartmann said. “It gives our business the agility that our competitors lack and allows us to maximize performance for our customers.”

SQL Interface: By not having to switch to a NoSQL solution or a MapReduce toolset such as Pig, Intent Media was able to leverage capabilities such as rich indexing and a powerful high level language like SQL.



Ask What Your Database Can Do for Your Country

August 31st, 2011
Adding Machine

How many in your household again?

One of President John Kennedy’s most memorable phrases is “ask not what your country can do for you – ask what you can do for your country”. I got to thinking about this over lunch with a colleague in the big data space. After comparing named customers for a while, we realized we had forgotten one of the biggest “big data” customers whom we both have in common – the government.

Whether you believe in small or big government, one thing is for certain – it has some very big data on its hands. Some of this is freely available, such as the census data that was recently released. They have so much “big data” in fact, that the US government even went so far as to set up an entire subcommittee on the topic.

Tokutek has been privileged to be in the middle of many of these conversations. This includes discussions at the state level, where Tokutek was the sole database company recently invited to an international business discussion at the Massachusetts statehouse. Several of our recent talks and product enhancements have also squarely hit upon federal government needs, including:

  • Keeping the Earth Green
  • Ensuring Defense
    • Big Data presents a wealth of information; however, harnessing it can be a struggle. One example is machine-generated data: sources such as sensor networks can provide a fire hose of useful data, but can also be burdensome to work with. Earlier this year, our CTO spoke at the Morrelly Homeland Security Center about how to handle data ingestion rates, a critical area for defense systems.

The US government certainly has some sizable challenges these days, with large deficits and rancorous politics. Some of these issues almost seem intractable. But one area where technology has a clear way to help the government is with its big data problems. We’ve been grateful to be able to work on some of the most interesting challenges and learn quite a bit from the community.



It Actually is Easy Being Green

August 11th, 2011

(Fractal) Tree Frog

Fractal Tree™ indexes are green. They have the potential to be greener still. Here’s why:

Remarkably, data centers consume 1-3 percent of all US electricity. A majority of this power is used to drive servers and storage systems. Significant energy savings remain on the table.

Here’s why Fractal Tree indexing enables more energy-efficient storage: Data centers typically use many small-capacity disks rather than a few large-capacity disks. Why? One reason is to harness more spindles to obtain more I/Os per second. In some high-performance applications, users go so far as to employ techniques such as “short stroking” to get more performance (and less storage) out of drives. But Fractal Tree indexes are so I/O-efficient that they don’t need as many I/Os.

Consider the power consumption of disks. An enterprise 80 to 160 GB disk runs at something like 4W (idle power), while an enterprise 1-2 TB disk runs at something like 8W (idle power). If you replace many small-capacity disks by a small number of large-capacity disks, you can maintain the same capacity, but reduce your storage power consumption per GB by close to an order of magnitude. So Fractal Tree indexes enable energy-efficient hardware when the metric is Watts per GB.
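The Watts-per-GB arithmetic can be checked with the figures quoted above. The sketch assumes decimal units (1 TB = 1000 GB) and uses only idle power, as the paragraph does; the function name `watts_per_gb` is made up for illustration.

```python
def watts_per_gb(idle_watts, capacity_gb):
    """Idle power divided by capacity, using the post's rough figures."""
    return idle_watts / capacity_gb


# Small enterprise disks: roughly 4 W idle, 80 to 160 GB.
small_best = watts_per_gb(4, 160)
small_worst = watts_per_gb(4, 80)

# Large enterprise disks: roughly 8 W idle, 1 to 2 TB (assuming 1 TB = 1000 GB).
large_best = watts_per_gb(8, 2000)
large_worst = watts_per_gb(8, 1000)

# How many times more power per GB the small disks draw, across the quoted ranges.
like_for_like = small_best / large_best
narrowest_gap = small_best / large_worst
widest_gap = small_worst / large_best

if __name__ == "__main__":
    print(f"160 GB vs 2 TB: {like_for_like:.2f}x")
    print(f"range: {narrowest_gap:.2f}x to {widest_gap:.2f}x")
```

Depending on which ends of the quoted ranges are compared, the small disks draw roughly 3 to 12 times more idle power per GB, which is where “close to an order of magnitude” comes from.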

For databases, however, joules per DB operation may be a better metric. Fractal Tree indexes are so I/O-efficient that they are terrific when measured in joules per operation.

What about power consumed by servers? A lot of our customers see an increase in server activity due to the increase in throughput. Fractal Tree indexes are so I/O-efficient that they drive CPUs harder, consequently using more power. But, assuming that a user is trying to keep the same overall target number of inserts/deletes, Fractal Tree indexes are still more efficient in terms of joules per database insert/delete.

Given how important these topics are, Bradley and I recently attended the National Science Foundation workshop “Energy-Efficient Data Management” in Arlington, VA. This was a two-day planning meeting, where researchers from industry and academia convened to discuss open problems in energy-efficient data management. We discussed how to devise and deploy new data-management methods and new data-intensive applications that are more energy efficient.

I spoke about how better data structures have the potential to deliver energy savings. For details, see the slides themselves: “How Fast Indexing Makes Databases Greener.”

The main purpose of the talk was to discuss open areas for research. Here are three open problems I covered in my talk. For more details see the slides.

  • Area 1: Develop a massively multithreaded Fractal Tree variant that could run on future-generation machines consisting of thousands of very slow cores.
  • Area 2: Develop an Energy-Efficient SSD/Rotational Disk Hybrid.
  • Area 3: The proof is in the pudding.

Thanks again to the NSF for supporting Tokutek through SBIR grants on topics like these.



Cage Match: OldSQL, NoSQL and NewSQL

July 29th, 2011

 

When I interviewed at Tokutek, I met a team of distinguished academics and engineers who could calmly and thoughtfully wax eloquent about the finer points of B-tree and Fractal Tree™ indexing, drive I/Os, and database engines. Soon after, I discovered that several of my colleagues have a second passion: they practice Mixed Martial Arts (MMA). As Wikipedia explains, MMA showcases the “fighters of different disciplines, including boxing, Brazilian Jiu-Jitsu, wrestling, Muay Thai, karate and others.” I’ve since learned about many different fighting styles.

This was useful to understand when an MMA-style fight broke out in the MySQL world earlier this month between the different variants or “styles” of SQL: OldSQL, NewSQL and NoSQL. If you haven’t heard about this (hard to believe), it all started with a GigaOM article. There have been many, many posts since, my favorites perhaps being the ones about the mud wrestling priest and Lady Gaga.

As a NewSQL vendor, we keep getting asked for our view on this hullabaloo.  First of all, it’s hard to knock MySQL — it is ubiquitous, it is easy to use and get started with, it leverages well-known SQL interfaces, it offers index flexibility, it has good community support, and delivers a high degree of adaptability, just to name a few points. It is perfect for many applications and hence so tremendously popular.

MySQL does, of course, have its limitations, particularly in the face of larger databases, where performance can dip and maintenance becomes challenging. That leads customers to extend and tune its performance, employ a lot of workarounds, and/or look to explore a next step or alternative. Sometimes, rightly so, that next step is NoSQL. An example use case is when one has to touch a large amount of data, and where the occasional queries are completely ad-hoc and likely not to be repeated (e.g., massive amounts of research data). Of course, there are inevitable trade-offs with NoSQL: lack of ACID, lack of training and familiar SQL interfaces, and lack of indexes, just to name a few. In this particular Stonebraker-Facebook disagreement, Facebook questioned how realistic it would be to give up MySQL and go all-RAM when disks are so much cheaper. I doubt Facebook raised over $1B in cash just so they could go all in-memory.

On the other hand, as Curt Monash noted, when one needs to extend the performance of relational databases, there are many alternatives for scaling or extending MySQL (Tokutek being one of them). TokuDB brings a number of capabilities — high insertion rates, ability to improve queries with rich indexing, on the fly schema modifications, efficient use of disk I/O and more — to the table (pun intended). As opposed to giving up SQL or ACID functionality (or paying for it at the application level) to get additional horsepower, an innovative architecture such as Fractal Tree™ indexes can be employed to significantly improve performance. Having the flexibility of MySQL plus the power of a new storage engine means that one has a single versatile solution. To borrow a phrase from NimbusDB, “in reality, one size fits most” needs, especially when the general solution gets such a performance boost. That’s good news for customers who simply want a more potent general purpose database without having to adopt a whole new “style.”  Tokutek gives you that with a MySQL storage engine that leaves all your application logic and MySQL code completely unchanged. Tokutek is, in fact, less like a new “style” and more like a whole next belt level or set of powerful new “moves” for an existing and proven style.

So what MySQL variant or “style” is right for your organization? In some cases a radical approach may be warranted. Your data center is your ring (or cage) – you know the challenges better than anyone. But we have found that in many cases, a few new “moves” may do the trick and save you the time, money, and the challenges of new training. And with a few new moves you may find your database problems are suddenly a lot easier to take down.

 



This Weekend in Japan

July 25th, 2011

We were happy to see a lot of folks from Japan on Twitter this weekend having a discussion about MySQL and Tokutek. While we always endeavor to explain ourselves as simply as possible, hearing what users and peers have to say and ask in their native language is very helpful. Here is a sampling of several of the 30+ tweets and re-tweets (translations courtesy of a colleague I know from frequent past visits to Tokyo and Yokohama):


First, @frsyuki provided a general overview:

“TokuDB” 新種のMySQLストレージエンジン。INSERTが20〜80倍ほど速い、パーティションなしで数TBのデータを突っ込める、MVCCサポートなど。Fractal Treeというアルゴリズムを実装しているらしい。http://www.tokutek.com/

(Translated: TokuDB is a new type of MySQL storage engine. Its main features are a) INSERTs that are 20-80 times faster, b) loading several TB of data without partitions, and c) MVCC support. It apparently implements an algorithm called Fractal Tree. http://www.tokutek.com/)


Next, @mtanda remarked on TokuDB v5.0’s addition of Hot Schema Changes:

TokuDBの説明にHot Schema Changesとある。これもけっこう便利そう。 http://ow.ly/5LtVr

(Translated:  I found Hot Schema Changes in the explanation of TokuDB.  It sounds very convenient. http://ow.ly/5LtVr)


From @_eiko_, we heard how one of our founders gives hope to theorists:

@frsyuki TokuDB開発者のKuszmaulさんは、以前はAkamaiのネットワークのアーキテクト、古くはThinking Machines CM-5というスパコンのネットワーク開発者。理論やさんの活躍の場って多彩。

(Translated: TokuDB developer “Kuszmaul” used to work as a network architect at Akamai and, before that, as a network developer for the Thinking Machines CM-5 supercomputer. There are many opportunities for a “theorist”.)


All this chatter got the attention of some new folks such as @KrdLab:

Fractal Tree というものがあるんですね.知らんかった. http://bit.ly/pbUS1Q

(Translated: I did not know that something like a “Fractal Tree” existed. http://bit.ly/pbUS1Q )


Finally, @repeatedly could not have summed it up better:

これからの時代はB-TreeではなくてFractal-Treeなのか? > http://en.wikipedia.org/wiki/TokuDB#Fractal_Tree_Indexes

(Translated: Time flies so fast… it appears the trend has now gone from B-Trees to Fractal-Trees? http://en.wikipedia.org/wiki/TokuDB#Fractal_Tree_Indexes)


So, for those reading from Japan, いらっしゃいませ (welcome).

To address some of the questions and comments raised above:

In general, even if we are limited today to responding only in English, we are always happy to try to explain the benefits of Fractal Tree™ indexing, work with you to help select optimal indexes, and troubleshoot when it seems like TokuDB isn’t performing as well as expected. Please drop us a line anytime, either via e-mail or Twitter.

Thanks again to Sadayuki for kicking off this Twitter stream. Next time you are in Boston, or I am in Japan, the first Suntory is on me.



Dude, Where’s my Fractal Tree?

July 18th, 2011

Unless you are Ashton Kutcher (@aplusk), or one of his Hollywood buddies, you don’t need to read any further. Allow me to explain…

Over the weekend, we launched our new website. This type of announcement used to be interesting in the high-tech world. I heard Kara Swisher of the WSJ’s All Things D speak at a MassTLC event in May. She admitted that back in the 1990s, when the web was just getting into high gear, a new website from an interesting company might actually get some coverage. Not anymore.

I’ve also been told at all the SEO classes I’ve taken that as much as marketing folks sweat over every detail, link, font and color, none of it matters. As far as Google is concerned, your site could be set entirely in 5-point grey Courier on a black background; as long as you have the right keywords and lots of links to you, you’ll be ranked well and the right people can find you.

So, who can I share my excitement with over our new site? It occurred to me this weekend – Ashton Kutcher and his Hollywood pals!

With Ashton’s recent investment in MemSQL maybe Hollywood is finally getting hip to databases! If that’s the case, then Ashton, do we have a site for you! The all new Tokutek.com brings you:

We hope you’ll swing by and check us out!



Announcing TokuDB v4.1.1

August 20th, 2010

Tokutek is pleased to announce immediate availability of TokuDB for MySQL, version 4.1.1. It is ideally suited for delivering fast response times for complex / high-volume Web applications that must simultaneously store and query large volumes of rapidly arriving data:

  • Social Networking
  • Real-time clickstream analysis
  • Logfile Analysis
  • eCommerce Personalization
  • High-speed Webcrawling

TokuDB v4.1.1 replaces TokuDB v4.1.0 and is recommended for all users. (We found a bug in v4.1.0 and have withdrawn it from our website.) The new version has all of v4.1.0’s new features, including support for SAVEPOINT and an even better Fast Loader. As always, this release uses our high-performance Fractal Tree™ indexing to provide a unique combination of capabilities:

  • 10x-50x faster indexing for faster querying
  • Full support for ACID transactions
  • Short recovery time (seconds or minutes, not hours or days)
  • Immunity to database aging to eliminate performance degradation and maintenance headaches
  • 5x-15x data compression for reduced disk use and lower storage costs
