Archive for the ‘announcement’ Category

SwRI Chooses TokuDB to Tackle Machine Data for an 800M+ Record Database

May 16th, 2012

Tackling machine data on the ground to ensure successful operations for NASA in space

Issues addressed:

  • Scaling MySQL to multi-terabytes
  • Insertion rates as InnoDB hit a performance wall
  • Schema flexibility to handle an evolving data model

The Company:  Southwest Research Institute (SwRI) is an independent, nonprofit applied research and development organization. The staff of more than 3,000 specializes in the creation and transfer of technology in engineering and the physical sciences. Currently, SwRI is part of an international team working on the NASA Magnetospheric Multiscale (MMS) mission. MMS is a Solar Terrestrial Probes mission comprising four identically instrumented spacecraft that will use Earth’s magnetosphere as a laboratory to study the microphysics of three fundamental plasma processes: magnetic reconnection, energetic particle acceleration, and turbulence.

The Challenge:  SwRI is responsible for archiving an enormous quantity of data generated by the Hot Plasma Composition Analyzer (HPCA). The device is used to count hydrogen, helium, and oxygen ions in space at different energy levels. These instruments require extensive calibration data, and each one is a customized, high-precision device that is built, tested, and integrated by hand. SwRI must capture and store all the test and calibration data during the 2-3 week bursts of activity that are required for each of the 4 devices.

“During each of these calibration runs, there are several data sources flowing into the server, each one leading to an index in the database,” said Greg Dunn, a Senior Research Engineer at SwRI. “Each packet that arrives gets a timestamp, message type, file name and location associated with it. A second process goes through that data and parses it out – information such as voltage, temperature, pressure, current, ion energy, particle counts, and instrument health must be inserted into the database for every record. This can load the database with up to 400 or 500 inserts per second.”

“Being able to monitor the performance of the instrument and judge the success of the tests and calibrations in near real time is critical to the project,” noted Dunn. “There are limited windows to do testing cycles and make adjustments for any issues that arise. Any significant slip in the testing could cost tens of thousands of dollars and jeopardize the timing of the satellite launch.”

“We started seeing red flags with InnoDB early in the ramp-up phase of the project, as our initial data set hit 400GB,” said Dunn. “Size was the first issue. Each test run was generating around 94 million inserts or around 90GB of data, quickly exceeding the capacity allocated for the program. In addition, as our database grew to 800M records, we saw InnoDB insertion performance drop off to a trickle. Even with modest data streams at 100 records per second, InnoDB was topping out at 45 insertions per second. Being able to monitor these crucial calibration activities in a timely fashion and in a cost effective manner was at risk.”
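To make the gap between those two rates concrete, here is a minimal sketch of the arithmetic. It uses only the quoted figures (100 records per second arriving, 45 insertions per second sustained); the function names and the one-hour window are illustrative assumptions, not anything SwRI measured.

```python
# Illustrative arithmetic only: the two rates are the ones quoted above,
# everything else (names, the one-hour window) is a hypothetical example.
ARRIVAL_RATE = 100  # records per second arriving from the instruments
INSERT_RATE = 45    # insertions per second InnoDB sustained at 800M records


def backlog_after(seconds, arrival=ARRIVAL_RATE, service=INSERT_RATE):
    """Records still waiting to be inserted after `seconds` of steady load."""
    return max(0, (arrival - service) * seconds)


def drain_seconds(seconds, arrival=ARRIVAL_RATE, service=INSERT_RATE):
    """Time needed to clear that backlog at the insert rate if arrivals stopped."""
    return backlog_after(seconds, arrival, service) / service


one_hour = 3600
print(backlog_after(one_hour))                  # 198000 records queued
print(round(drain_seconds(one_hour) / 60, 1))   # 73.3 minutes to catch up
```

Under these assumptions, every hour of data collection leaves the database more than an hour behind, which is why near-real-time monitoring was at risk.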

To keep up with the workload and data set, SwRI considered several options, but they failed to meet program performance and price goals. These included:

Partitioning / Separate Databases – “We considered partitioning, but this can be a challenge to set up and it introduces additional complexity,” said Dunn. “We also looked at putting each calibration into its own database, but that would have made it much more difficult to correlate across different databases.”

Additional RAM – “Increasing the available RAM from 12 GB up to 100 GB was not enough by itself,” claimed Dunn. “We briefly considered keeping everything in RAM, but that was not a realistic or efficient way to address a data set size that was promising to grow to several terabytes by the end of the program.”

The Solution:  Once TokuDB was installed, SwRI’s big data management headache quickly subsided. “The impact to our required storage was dramatic,” noted Dunn. “We benefited from over 9x compression. In our comparison benchmarks, we went from 452GB with InnoDB to 49GB with TokuDB.”
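The compression figure follows directly from the two benchmark sizes quoted above. The short sketch below reproduces it and then, purely as an assumption, applies the same ratio to the roughly 90GB generated per test run; real data may compress differently from the benchmark set.

```python
def compression_ratio(uncompressed_gb, compressed_gb):
    """Ratio of the original size to the compressed size."""
    return uncompressed_gb / compressed_gb


# Sizes quoted in the benchmark comparison above.
ratio = compression_ratio(452, 49)
print(round(ratio, 1))  # 9.2

# Assumption for illustration: a 90GB test run compresses at the same ratio.
per_run_gb = 90 / ratio
print(round(per_run_gb, 1))  # 9.8
```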

There was also a dramatic improvement in performance. “Suddenly, we no longer had to struggle to keep up with hundreds of insertions per second,” stated Dunn. “Our research staff could immediately see whether or not the experiment was running correctly and whether the test chamber was being used effectively. We didn’t have to worry that insufficient data analysis horsepower might lead to downstream schedule delays.”

The Benefits: 

Cost Savings: “The hardware savings were impressive,” noted Dunn. “With InnoDB, going to larger servers, adding 100s of GBs of additional RAM along with many additional drives would have easily cost $20,000 or more, and still would not have addressed all our needs. TokuDB was by far both a cheaper and simpler solution.”

Hot Column Addition: “As we continue to build out the system and retool the experiments, flexibility in schema remains important,” stated Dunn. “TokuDB’s capability to quickly add columns of data is a good match for our environment, where our facility is still evolving and sometimes has new sensors or monitors installed that need to be added to existing large tables.”

Fast Loader: “The open source toolset that Tokutek designed to parallelize the loading of the database was very helpful,” said Dunn.  “We were able to bring down the load of the database from MySQL dump backup from 30 hours to 7 hours.”



Tokutek Welcomes Gerry Narvaja!

May 14th, 2012

We are excited to have Gerry Narvaja start today at Tokutek! Gerry has spent more than 25 years in the software industry, most of them working with databases for different kinds of applications, from embedded to large-scale web products. Gerry worked first at MySQL, and then Sun Microsystems supporting the Sales teams. In 2008 he transitioned into being a Senior MySQL DBA. Gerry graduated as an Electronic Engineer from I.T.B.A (Instituto Tecnológico de Buenos Aires) and has an M.B.A. from Universidad del Salvador in collaboration with S.U.N.Y.A (State University of NY at Albany).

Gerry enjoys helping users to solve complex database production issues. For almost a year he has been co-hosting the popular MySQL Community podcast, OurSQL, which was given the MySQL Community Contributor of the Year 2012 award at the recent Percona MySQL Users Conference. Gerry and Martín Farach-Colton, our CTO, will also be speaking next month at the first ever Latin American MySQL / MariaDB Conference in Argentina.

Please feel free to drop Gerry a line at gerry@tokutek.com with your toughest MySQL and MariaDB issues!



Tokutek and PalominoDB Partner to Bring Scale, Performance to Database Deployments

May 2nd, 2012

MySQL storage engine provider joins forces with leading database consultants to deliver support for growing number of MySQL and MariaDB customers

Lexington, MA – (May 2, 2012) – Tokutek, the leader in high-performance and agile database storage engines, today announced a strategic partnership with PalominoDB, a premier database operations and engineering consultancy, to provide database services and support to joint customers. Tokutek’s storage engine will be complemented with PalominoDB’s operational excellence, 24×7 on-call support and access to the company’s skilled team of professional database administrators (DBAs).

“TokuDB has immeasurably improved our ability to react to changing business requirements in a large data environment. The ability to change schemas and indexes on the fly and no need to repair fragmented indexes has led to a simplification of our environment and reduced maintenance windows,” said Adrian Roston, CTO, Frequency. “With PalominoDB’s knowledge and expertise, we were rapidly able to leverage TokuDB’s advantages and substantially improve our system’s throughput.”

TokuDB is a highly scalable, zero-maintenance downtime MySQL Storage Engine that delivers indexing-based query acceleration, improved replication performance, unparalleled compression, and hot schema modifications. Under the agreement, PalominoDB will provide end-to-end solutions and support for MySQL and MariaDB systems that run on the TokuDB storage engine.

“Tokutek’s ability to improve database performance brings an entirely new value proposition to MySQL,” said Laine Campbell, Owner and CEO at PalominoDB.

“In partnering with Tokutek, PalominoDB is making a firm commitment to expanding MySQL’s viability as an enterprise-class database capable of supporting complex queries with high data rates on terabyte-scale databases.”

“PalominoDB brings unrivaled domain expertise and a range of service offerings to the MySQL and MariaDB market,” said John Partridge, President and CEO of Tokutek. “Tokutek’s partnership with PalominoDB will help TokuDB deployments go smoothly and provide access to extended support and design capabilities for customers needing those services.”

 

About PalominoDB

For startups and established companies of all sizes, PalominoDB provides ongoing operational support and professional expertise in database architecture, performance and scale. With a focus on open-source and other best-in-class software components, and extensive experience in all major and emerging database technologies, PalominoDB engages with customers to develop custom, cost-effective projects and long-term support contracts in areas from system design to automation to business intelligence and more. PalominoDB is renowned for an emphasis on transparency, communication and responsiveness, as well as providing operational excellence for leading companies including Zappos, Chegg, Technorati, Slideshare and Zendesk. For more information, please visit www.palominodb.com

About Tokutek Inc.
Tokutek, Inc. is the leader in high-performance and agile database storage engines. TokuDB is a highly scalable, zero-maintenance downtime MySQL Storage Engine that delivers indexing-based query acceleration, improved replication performance, unparalleled compression, and hot schema modifications. TokuDB is a “drop-in” storage engine requiring no changes to MySQL applications or code and is fully ACID and MVCC compliant. The company is headquartered in Lexington, MA and has offices in New York, NY. For more information, visit tokutek.com.




TokuDB v6.0: Download Available

April 30th, 2012

TokuDB v6.0 is full of great improvements, like getting rid of slave lag, better compression, improved checkpointing, and support for XA.

I’m happy to announce that TokuDB v6.0 is now generally available and can be downloaded here.

Sysbench Performance

I wanted to take this time to talk about one more under-the-hood goody we’ve added to v6.0. In particular, we’ve been working on our locking schemes and have made some nice improvements in multi-threaded performance. In TokuDB v5.2, we outperformed InnoDB on sysbench by about 20% out to 64 threads. The following shows the performance of TokuDB v6.0 vs InnoDB on the same test:

InnoDB now has better multi-threading as well, so with standard compression on, we are now neck-and-neck with InnoDB out to 64 client threads, and then pull ahead out to 1024 client threads. With high compression, we top out at 72% faster than InnoDB!

We hope you enjoy this and all the other TokuDB v6.0 improvements.

To learn more about TokuDB:

  • Read the press release here.
  • Hear me talk about TokuDB v6.0 on the MySQL Database Community Podcast in Episode 86.
  • Read the Bloor Research Report on TokuDB v6.0 here.


My Talk on Tuesday at IOUG COLLABORATE 12

April 20th, 2012

 

 

Challenges of Big Databases with MySQL

Many database management tasks become difficult as you move from millions of rows and gigabytes of data to billions of rows and terabytes of data. Such tasks include ingesting data while maintaining indexes; changing schemas without downtime; and supporting connections, replication, and backup. For some scaling problems (connections and replication), MySQL is better than most of the competition. For others, such as indexing, schema changes, and backup, MySQL has typically been harder to use. Fortunately, the tasks MySQL does well are in its core, whereas the tasks that are more difficult can be solved with storage engine plug-ins.

This presentation discusses how MySQL’s storage engines have recently made dramatic progress in large database manageability. I’ll be speaking Tuesday (4/24) 8:00 am in Lagoon D. Details can be found here. A complete list of MySQL talks can be found here.



Percona MySQL Conference and Expo Week in Review

April 18th, 2012

Thanks to everyone who came by our booth, attended Leif’s presentation on Read Optimization, and came to my Lightning Talk on OLTP and OLAP at the Percona MySQL Conference and Expo. It was an incredible week and a great place to launch TokuDB v6.0! A big thanks to Percona for a great event, to Pythian for a fantastic dinner, and to SkySQL for a worthwhile follow-on. We are also very grateful to Network World for giving us a product of the week award, and to Bloor Research for an insightful review of TokuDB v6.0.

Mr. Bill Gets Hammered by Big Data

For those who missed it, here is a copy of Leif’s presentation with a good photo from Percona. Thanks to Sheeri for her tweet as well. In addition, here is a copy of my Lightning Talk (in case you were too distracted by Mr. Bill). There were some great photos taken by Mark Lehmann (including the one shown above, and those in the “Scanner Wars”) as well as by Percona. Thanks to Erin, Sheeri, Amrith and Ernie for their tweets too!

I considered a detailed conference review, but others have already captured the event so well that there was little to add. In case you missed it, there are great write-ups by O’Reilly, Percona, Shlomi, and several others.

Thanks again to those who came by!

 



This Monday: Silicon Valley NewSQL Meetup

April 13th, 2012

This week, I was at Percona Live, and it was a lot of fun! I even got to give a talk on write optimization techniques (not just ours) that I’m told will be online soon.

But if you missed that, or even if you didn’t, I’m still in the valley until Monday night. I’ll be speaking very briefly and fielding questions this Monday, April 16th, at 6 PM, at the Silicon Valley NewSQL group’s meetup in Sunnyvale. It’s shaping up to be a great crowd: Amazon, Microsoft, Clustrix, VoltDB, Drizzle, and many others will be there. So if you’re at all curious, come on by. I’d love to talk to you. Hope to see you there!




TokuDB v6.0: Frequent Checkpoints with No Performance Hit

April 13th, 2012

Checkpointing, which involves periodically writing out dirty pages from memory, is central to the design of crash recovery for both TokuDB and InnoDB. A key issue in designing a checkpointing system is how often to checkpoint, and TokuDB takes a very different approach from InnoDB. InnoDB’s checkpoint frequency depends on the amount of fuzzy checkpointing, the log-file size, and how often the memory fills with dirty pages, but the upshot is that it runs a checkpoint infrequently. TokuDB runs a checkpoint one minute after the last one ended.

Frequent checkpoints make for fast recovery. Once MySQL crashes, the storage engine needs to replay the log to get back to a correct state. The length of the log is a function of the time since the last checkpoint. And replaying the log is single threaded. So TokuDB recovers in minutes, and usually much faster. If InnoDB crashes late in its checkpoint cycle it can take hours or more to recover. Indeed, there is considerable lore around making InnoDB recover faster.
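The relationship described above can be sketched as a toy model: the log to replay grows with the time since the last checkpoint, and single-threaded replay proceeds at some fixed rate. The write rate, replay rate, and the two-hour interval below are hypothetical numbers chosen only to show the shape of the trade-off; they are not measurements of TokuDB or InnoDB.

```python
def log_to_replay_mb(write_rate_mb_s, seconds_since_checkpoint):
    """Log volume accumulated since the last checkpoint (toy linear model)."""
    return write_rate_mb_s * seconds_since_checkpoint


def recovery_seconds(write_rate_mb_s, seconds_since_checkpoint, replay_rate_mb_s):
    """Time to replay that log on a single thread at a fixed replay rate."""
    log_mb = log_to_replay_mb(write_rate_mb_s, seconds_since_checkpoint)
    return log_mb / replay_rate_mb_s


# Hypothetical rates, for illustration only.
WRITE_RATE = 5.0   # MB of log written per second under load
REPLAY_RATE = 2.0  # MB of log replayed per second during recovery

# Crash 60 seconds after the last checkpoint.
print(recovery_seconds(WRITE_RATE, 60, REPLAY_RATE))           # 150.0 seconds

# Crash two hours after the last checkpoint.
print(recovery_seconds(WRITE_RATE, 7200, REPLAY_RATE) / 3600)  # 5.0 hours
```

In this model, recovery time scales linearly with the checkpoint interval, which is the reason a short interval keeps recovery in the range of minutes.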

So what’s the downside to frequent checkpoints? Up until now, the answer was simple: when you are in a checkpoint, your performance drops. This was famously illustrated for InnoDB when Vadim Tkachenko at Percona Consulting showed that MySQL could become completely unresponsive for minutes at a time during an InnoDB checkpoint. We see a similar outcome here:

In this case we see a stall in which the throughput drops to around 25%, and the stall lasts for minutes. I want to stress that fuzzy checkpointing was designed to help avoid catastrophic checkpoints, and sometimes it works, but the tpcc benchmark shows that it doesn’t always work.

In previous versions of TokuDB, we also had a dip in performance associated with checkpoints, but frequent checkpoints are also smaller checkpoints, so our performance would drop to around 80% of peak for a couple of seconds. A drop to 80% is better than a drop to 25%, but we knew we could do better.

And we did. As of TokuDB v6.0, we’ve eliminated the performance variability from checkpointing. We’re still checkpointing just as frequently, so you still get fast recovery. How? It was a combination of reducing the amount of work a checkpoint needs to do and fixing the locking interaction between checkpoints and other operations. Below is a sysbench benchmark. This is a case where InnoDB checkpoint behavior is as good as it gets, and I wanted to compare us with InnoDB’s best case, not its worst case.

Sysbench performance with different compressors

This graph shows that TokuDB v6.0 has no checkpoint variability. It turns out that TokuDB v6.0 with standard compression has about the same average TPS as TokuDB v5.2, but with no checkpointing artifacts. Finally, if you have the CPU budget for it, turning on aggressive compression gives a big boost in transactions per second, still with no checkpointing variability.

So in a nutshell, I feel like we’ve taken care of the checkpointing issue. As of TokuDB v6.0, we have the upside of frequent checkpoints — small logs and fast recovery — without the downside of variability. The engineering team at Tokutek is pretty proud of these results.

To learn more about TokuDB:

  • Download a free trial of TokuDB.
  • Read the press release here.
  • Hear me talk about TokuDB v6.0 on the MySQL Database Community Podcast in Episode 86.
  • Come to our booth #410 at Percona Live.
