<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PlanetMysql.ru - information about the MySQL DBMS &#187; Blogging</title>
	<atom:link href="http://planetmysql.ru/category/blogging/feed/" rel="self" type="application/rss+xml" />
	<link>http://planetmysql.ru</link>
	<description>A blog about the most popular DBMS, MySQL</description>
	<lastBuildDate>Thu, 24 May 2012 14:20:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Welcome googleCL</title>
		<link>http://datacharmer.blogspot.com/2010/06/welcome-googlecl.html?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=welcome-googlecl</link>
		<comments>http://datacharmer.blogspot.com/2010/06/welcome-googlecl.html#comments</comments>
		<pubDate>Sat, 19 Jun 2010 16:10:00 +0000</pubDate>
		<dc:creator>Giuseppe Maxia</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[command line]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[partitioning]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[I am writing this blog post with Vim, my favorite editor, instead of using the online editor offered by Blogger. And I am uploading this post to my Blogger account using Google CL, a tool that lets you use Google services from the command line.
I am a command line geek, and as soon as I saw the announcement, I installed it on my laptop. The mere fact that you are reading this blog post shows that it works.
GoogleCL is an apparently simple application. If you install it on a Mac using MacPorts, you realize how many dependencies it has and how much complexity it hides under the hood.
Using an easy to understand syntax, it puts your blog, pictures, calendar, contacts, videos, and online documents at your fingertips.
For example, let's query my blog for partitioning:
$ google blogger --blog="The Data Charmer" --title=partitioning list "title,url"
Hmm. No results. The manual doesn't help much, but something happened during this query. The first thing is that I was asked to authorize the script to access my blog, and that was done by activating a key that I got in the command line. So far, so good. The second thing was a message informing me that a default configuration file was created in my home directory. Looking at that file, I saw an option saying "regex = True". Aha! So the title supports regular expressions. Let's try:
$ google blogger --blog="The Data Charmer" --title=".*partitioning" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1  Improving ARCHIVE performance with partitioning
OK. This gives me everything with the word "partitioning" in the title. But I know that some titles are missing. Comparing with the results that I get online, I see that the titles where "partitioning" is capitalized are not reported. So the search is case sensitive.
What I need to do is to tell the regular expression that I want a case insensitive search. Fortunately, I know how to speak regular expressions. Let's try again.
$ google blogger --blog="The Data Charmer" --title="(?i).*partitioning.*" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
Partitioning with non integer values using triggers
Tutorial on Partitioning at the MySQL Users Conference 2009
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1  Improving ARCHIVE performance with partitioning
Now I feel confident enough to make some changes to my online content.
To create this blog post, I used some of googlecl's capabilities. After I created an image, I uploaded it to my Picasa album using this command:
$ google picasa post -n "Blogger Pictures" -t googlecl ~/Desktop/google_cl.png
Then I asked Picasa to give me the URL of the image:
$ google picasa list -n "Blogger Pictures" --query googlecl title,url_direct
google_cl.png,http://lh6.ggpht.com/_gVfZHGgf5LA/TBzjaKiJJvI/AAAAAAAAA74/dthDDhybsmc/google_cl.jpg
And then I inserted that URL in this blog post. Finally, I uploaded the blog post with this command:
$ google blogger --blog="The Data Charmer" --draft --title "Welcome googleCL" --tags="google,mysql,partitioning,command line,blogging" post ~/blog/welcome_googlecl.html
(Now writing online) And after I checked that the post was looking as I wanted it, I hit the "PUBLISH POST" button.
Welcome, GoogleCL!]]></description>
			<content:encoded><![CDATA[<table border="0"><tr><td><a href="http://code.google.com/p/googlecl"><img src="http://lh6.ggpht.com/_gVfZHGgf5LA/TBzjaKiJJvI/AAAAAAAAA74/dthDDhybsmc/google_cl.jpg" width="200" /></a> </td><td>I am writing this blog post with Vim, my favorite editor, instead of using the online editor offered by Blogger. And I am uploading this post to my Blogger account using <a href="http://code.google.com/p/googlecl">Google CL</a>, a tool that lets you use Google services from the command line.<br />I am a command line geek, and as soon as I saw the <a href="http://google-opensource.blogspot.com/2010/06/introducing-google-command-line-tool.html">announcement</a>, I installed it on my laptop. The mere fact that you are reading this blog post shows that it works.</td></tr></table><br />GoogleCL is an apparently simple application. If you install it on a Mac using MacPorts, you realize how many dependencies it has and how much complexity it hides under the hood.<br />Using an <a href="http://code.google.com/p/googlecl/wiki/ExampleScripts">easy to understand syntax</a>, it puts your blog, pictures, calendar, contacts, videos, and online documents at your fingertips. <br />For example, let's query my blog for partitioning:<br /><pre><code><br />$ google blogger --blog="The Data Charmer" --title=partitioning list "title,url"<br /></code></pre><br />Hmm. No results. The manual doesn't help much, but something happened during this query. The first thing is that I was asked to authorize the script to access my blog, and that was done by activating a key that I got in the command line. So far, so good. The second thing was a message informing me that a default configuration file was created in my home directory. Looking at that file, I saw an option saying "regex = True". Aha! So the title supports regular expressions. 
Let's try:<br /><pre><code><br />$ google blogger --blog="The Data Charmer" --title=".*partitioning" list "title"<br />Holiday gift - A deep look at MySQL 5.5 partitioning enhancements<br />The partition helper - Improving usability with MySQL 5.1 partitioning<br />A quick usability hack with partitioning<br />MySQL 5.1  Improving ARCHIVE performance with partitioning<br /></code></pre><br />OK. This gives me everything with the word "partitioning" in the title. But I know that some titles are missing. Comparing with the results that I get online, I see that the titles where "partitioning" is capitalized are not reported. So the search is case sensitive. What I need to do is to tell the regular expression that I want a case insensitive search. Fortunately, I know how to speak <a href="http://xkcd.com/208/">regular expressions</a>. Let's try again.<br /><pre><code><br />$ google blogger --blog="The Data Charmer" --title="(?i).*partitioning.*" list "title"<br />Holiday gift - A deep look at MySQL 5.5 partitioning enhancements<br />Partitioning with non integer values using triggers<br />Tutorial on Partitioning at the MySQL Users Conference 2009<br />The partition helper - Improving usability with MySQL 5.1 partitioning<br />A quick usability hack with partitioning<br />MySQL 5.1  Improving ARCHIVE performance with partitioning<br /></code></pre><br />Now I feel confident enough to make some changes to my online content.<br />To create this blog post, I used some of <i>googlecl</i>'s capabilities. 
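<br />The three title searches above behave as if the pattern were anchored at the start of the title and case sensitive by default, which is how Python's <code>re.match</code> works. That googlecl matches titles this way is an assumption inferred from the output, not something confirmed in this post. A minimal sketch, using four of the titles listed above as sample data:<br /><pre><code>
import re

# Four of the post titles listed above (sample data, not the full blog).
titles = [
    "Holiday gift - A deep look at MySQL 5.5 partitioning enhancements",
    "Partitioning with non integer values using triggers",
    "Tutorial on Partitioning at the MySQL Users Conference 2009",
    "A quick usability hack with partitioning",
]

def matching_titles(pattern):
    # Assumption: the pattern is anchored at the start of the title,
    # the way re.match does it. Inferred from the output, not confirmed.
    return [t for t in titles if re.match(pattern, t)]

# Bare word: no title starts with lowercase "partitioning".
print(len(matching_titles("partitioning")))
# Leading .* lets the word appear anywhere, but only in lowercase.
print(len(matching_titles(".*partitioning")))
# The inline (?i) flag makes the whole pattern case insensitive.
print(len(matching_titles("(?i).*partitioning.*")))
</code></pre>With this sample the three counts are 0, 2 and 4: the bare word matches nothing, <code>.*partitioning</code> finds only the lowercase occurrences, and <code>(?i)</code> picks up the capitalized ones as well.<br />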
After I created an image, I uploaded it to my Picasa album using this command:<br /><pre><code><br />$ google picasa post -n "Blogger Pictures" -t googlecl ~/Desktop/google_cl.png <br /></code></pre><br />Then I asked Picasa to give me the URL of the image:<br /><pre><code><br />$ google picasa list -n "Blogger Pictures" --query googlecl title,url_direct <br />google_cl.png,http://lh6.ggpht.com/_gVfZHGgf5LA/TBzjaKiJJvI/AAAAAAAAA74/dthDDhybsmc/google_cl.jpg<br /></code></pre><br />And then I inserted that URL in this blog post. Finally, I uploaded the blog post with this command:<br /><pre><code><br />$ google blogger --blog="The Data Charmer" --draft --title "Welcome googleCL" --tags="google,mysql,partitioning,command line,blogging" post ~/blog/welcome_googlecl.html<br /></code></pre><br /><br /><i>(Now writing online)</i> And after I checked that the post was looking as I wanted it, I hit the "PUBLISH POST" button.<br />Welcome, GoogleCL!]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2010/06/19/welcome-googlecl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Gearman for Nightly Build and Test</title>
		<link>http://blogs.tokutek.com/tokuview/using_gearman_for_nightly_build_and_test/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=using-gearman-for-nightly-build-and-test</link>
		<comments>http://blogs.tokutek.com/tokuview/using_gearman_for_nightly_build_and_test/#comments</comments>
		<pubDate>Sat, 17 Oct 2009 00:18:00 +0000</pubDate>
		<dc:creator>Tokuview Blog</dc:creator>
				<category><![CDATA[Blogging]]></category>

		<guid isPermaLink="false">http://blogs.tokutek.com/tokuview/using_gearman_for_nightly_build_and_test/#When:17:18:00Z</guid>
		<description><![CDATA[At Tokutek, Rich Prohaska used Gearman to automate our nightly build and test process for TokuDB for MySQL.&#160; Rich is busy working on TokuDB, so I&#8217;m writing up an overview of the build and test architecture on his behalf.

Build and Test Process

Rich created a script, nightly.bash, that gets kicked off every night as a cron job.&#160; The script creates a separate Gearman job for each build target. We have a separate build target (unique binary) for each combination of operating system (e.g. Linux, Windows, etc.) and hardware architecture (e.g. i686, x86_64) supported by TokuDB.&#160; As we support more operating systems over time, the number of build targets grows quickly, so we needed a build and test architecture that scales, and Gearman makes it easy.

Gearman then automatically distributes the build jobs to a set of systems set up as &#8220;Build Workers&#8221;, with each available worker running the build for its assigned target in parallel.&#160; For each build that completes successfully, the resulting binary is stored in an Amazon S3 bucket and a regression test job is submitted to the Gearman job scheduler.&#160; Storing binaries in S3 costs less than checking them in to our hosted svn repository.

Test jobs are then distributed by Gearman to a set of &#8220;Test Workers&#8221; where the appropriate binary is read from Amazon S3 and regression tests are run. We currently run mysql tests, sql bench tests, and TokuDB-specific tests. New tests can easily be added by modifying scripts. Jobs submitted to Gearman specify a &#8220;function,&#8221; and workers specify supported functions when registering with Gearman.&#160; Functions are used to distribute build and test jobs to servers with the correct operating system and hardware architecture.

Looking Ahead

Using Gearman, we built a highly flexible build and test infrastructure that easily scales to support more build targets and additional tests.&#160; Build and test workers can be added seamlessly to increase capacity and reduce the elapsed time required to finish the build and test cycle.&#160; Currently, everything runs on physical servers inside Tokutek, but we designed the architecture to run in the cloud, where virtual machines can be created on demand to support a large matrix of build targets and to complete large regression cycles quickly.&#160; All we need now is SSL support in Gearman to deploy on Amazon EC2&#8230;
]]></description>
			<content:encoded><![CDATA[<p>At <a href="http://tokutek.com">Tokutek</a>, Rich Prohaska used
<br />
<a href="http://gearman.org/">Gearman</a> to automate our nightly build and
<br />
test process for TokuDB for MySQL.&nbsp; Rich is busy working on TokuDB, so I&#8217;m
<br />
writing up an overview of the build and test architecture on his behalf.
<br />
</p>
<br />
<img src="http://www.tokutek.com/images/uploads/gearman-nightly.png" style="border: 0;" alt="Nightly Build and Test Architecture" width="525" height="301" />
<br />
<h3>Build and Test Process</h3>
<p>
Rich created a script, nightly.bash, that gets kicked off every night as a cron
<br />
job.&nbsp; Nightly.bash creates a separate Gearman job for each build target.
<br />
We have a separate build target (unique binary) for each combination of
<br />
operating system (e.g. Linux, Windows, etc.) and HW architecture (e.g.
<br />
i686, x86_64) supported by TokuDB.&nbsp; As we support more operating
<br />
systems over time, the number of build targets grows quickly so we needed
<br />
a build and test architecture that scales, and Gearman makes it easy.
<br />
</p>
<p>
    Gearman then automatically distributes the build jobs to a set of systems
<br />
    set up as &#8220;Build Workers&#8221;, with each available worker running the build
<br />
    for its assigned target in parallel.&nbsp; For each build that completes
<br />
    successfully, the resulting
<br />
    binary is stored in an Amazon S3 bucket and a regression test job is
<br />
    submitted to the Gearman job scheduler.&nbsp; Storing binaries in S3 costs less
<br />
    than checking them in to our hosted svn repository.
<br />
</p>
<p>
    Test jobs are then distributed by Gearman to a set of &#8220;Test Workers&#8221; where
<br />
    the appropriate binary is read from Amazon S3 and regression tests are run.
<br />
    We currently run mysql tests, sql bench tests, and TokuDB-specific
<br />
    tests. New tests can easily be added by modifying scripts.
<br />
    Jobs submitted to Gearman specify a &#8220;function,&#8221; and workers specify 
<br />
    supported functions when registering with Gearman.&nbsp; Functions are used
<br />
    to distribute build and test jobs to servers with the correct
<br />
    operating system and HW architecture.
<br />
</p>
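<p>
    The function-based routing described above can be sketched in a few lines of Python. This is a toy model and not the Gearman API: the <code>JobServer</code> class, the worker names, and function names such as <code>build_linux_x86_64</code> are all hypothetical, chosen only to illustrate how a job reaches a worker with the right operating system and architecture.
</p>
<pre>
from collections import defaultdict, deque


class JobServer:
    """Toy stand-in for a job server (hypothetical, not the real Gearman API)."""

    def __init__(self):
        self.queues = defaultdict(deque)   # function name to pending job payloads
        self.workers = defaultdict(list)   # function name to registered worker names

    def register(self, worker, functions):
        # A worker announces which functions it is able to run.
        for name in functions:
            self.workers[name].append(worker)

    def submit(self, function, payload):
        # A job names the function it needs and waits in that function's queue.
        self.queues[function].append(payload)

    def dispatch(self):
        # Hand each queued job to a worker registered for its function,
        # rotating through the eligible workers in turn.
        assignments = []
        for function, queue in self.queues.items():
            candidates = self.workers.get(function, [])
            turn = 0
            while queue and candidates:
                worker = candidates[turn % len(candidates)]
                assignments.append((worker, function, queue.popleft()))
                turn += 1
        return assignments


server = JobServer()
server.register("linux-box-1", ["build_linux_x86_64", "test_linux_x86_64"])
server.register("windows-box-1", ["build_windows_i686"])
server.submit("build_linux_x86_64", "nightly")
server.submit("build_windows_i686", "nightly")
for worker, function, payload in server.dispatch():
    print(worker, function, payload)
</pre>
<p>
    In this model a job whose function has no registered worker simply stays queued until a suitable worker registers.
</p>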
<br />
<h3>Looking Ahead</h3>
<p>
    Using Gearman, we built a highly flexible build and test
<br />
    infrastructure that easily scales to support more build targets and
<br />
    additional tests.&nbsp; Build and test workers can be added seamlessly to increase
<br />
    capacity and reduce the elapsed time required to finish the build and
<br />
    test cycle.&nbsp; Currently, everything runs on physical servers inside Tokutek
<br />
    but we designed the architecture to run in the cloud where virtual machines
<br />
    can be created on demand to support a large matrix of build targets and to
<br />
    complete large regression cycles quickly.&nbsp; All we need now is SSL support
<br />
    in Gearman to deploy on Amazon EC2&#8230;
<br />

</p>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2009/10/17/using-gearman-for-nightly-build-and-test/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&quot;Idle&quot;</title>
		<link>http://www.krisbuytaert.be/blog/idle?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=idle</link>
		<comments>http://www.krisbuytaert.be/blog/idle#comments</comments>
		<pubDate>Wed, 23 Sep 2009 21:25:49 +0000</pubDate>
		<dc:creator>Kris Buytaert</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[drupal]]></category>
		<category><![CDATA[european comission]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[reading]]></category>
		<category><![CDATA[reviewing]]></category>
		<category><![CDATA[snorkle]]></category>
		<category><![CDATA[sun]]></category>
		<category><![CDATA[t-dose]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[For those who wonder why I have been blogging so little these days (apart from today): I'm actually writing more lines of code than blog entries these last couple of weeks :)
And when I'm not writing code, I'm reading :) Either proofreading an upcoming book on Zabbix or reading some of the other books Packt sent me.
Next to that, I'm busy preparing my T-Dose presentation.
Oh, and did I mention a 40-something-question questionnaire about some merger?]]></description>
			<content:encoded><![CDATA[<p>For those who wonder why I have been blogging so little these days (apart from today): I'm actually writing more lines of code than blog entries these last couple of weeks :)</p>
<p>And when I'm not writing code, I'm reading :) Either proofreading an upcoming book on <a href="http://www.packtpub.com/zabbix-1-6-network-monitoring/book">Zabbix</a> or reading <a href="http://www.packtpub.com/build-social-networking-website-with-drupal-6/book">some</a> of the <a href="http://www.packtpub.com/building-enterprise-ready-telephony-systems-with-sipxecs-4-0/book">other</a> books Packt sent me.</p>
<p>Next to that, I'm busy preparing <a href="http://www.t-dose.org/2009/talk/virtsec">my T-Dose</a> presentation.</p>
<p>Oh, and did I mention a 40-something-question questionnaire about some merger?</p>
]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2009/09/24/idle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Attempting to Quantify Fragmentation Effects</title>
		<link>http://blogs.tokutek.com/tokuview/attempting_to_quantify_fragmentation_effects/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=attempting-to-quantify-fragmentation-effects</link>
		<comments>http://blogs.tokutek.com/tokuview/attempting_to_quantify_fragmentation_effects/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 04:41:00 +0000</pubDate>
		<dc:creator>Tokuview Blog</dc:creator>
				<category><![CDATA[Blogging]]></category>

		<guid isPermaLink="false">http://blogs.tokutek.com/tokuview/attempting_to_quantify_fragmentation_effects/#When:21:41:00Z</guid>
		<description><![CDATA[We often hear from customers and MySQL experts that fragmentation causes problems
such as wasting disk space, increasing backup times, and degrading performance.
Typical remedies include periodic "optimize table" or dump and re-load (for example,
see Project Golden Gate).
Unfortunately, these techniques impact database availability and/or require additional administrative
cost and complexity.
Tokutek's
Fractal Tree algorithms do not cause fragmentation, and we're looking
for ways to measure the effects of fragmentation to quantify TokuDB's benefits.


I ran some tests using the iiBench
benchmark as an experiment to try and quantify the impact of fragmentation, and observed
some interesting results.

Initial Load &#8211; 50M Rows

I created an iiBench table with 50M rows, and recorded how long it took to
complete along with the amount of disk space used by the data and indexes;
log files are *not* included in the reported disk use.


$ iibench.py --setup --insert_only --engine=[innodb &#124; tokudb] \
   --max_rows=50000000 --table_name=t1


Operation | Time (TokuDB) | Time (InnoDB) | Disk Use (TokuDB) | Disk Use (InnoDB)
Insert 50M Rows | 3,349s | 28,821s | 3.3GiB | 11GiB
select count(*) | 56s | 643s | 3.4GiB | 11GiB
select count(*) | 19.0s | 14.4s | 3.5GiB | 11GiB
optimize table | 153s | 28,456s | 4.5GiB | 20GiB


select count(*) was run twice to see how fast it ran with the cache
warmed up (the query cache was off).

When inserting into TokuDB, some data may remain in internal data structures, so optimize table was run to
push all of the data to disk for an accurate measure of disk use. Initially, I thought running optimize table
would not be necessary for InnoDB, but I ran it to ensure a consistent procedure on both engines.
Surprisingly, InnoDB used much more disk space after running optimize table.
According to this post,
InnoDB's insert buffer is stored in the tablespace, so data in the insert buffer should have been included in the measured
disk use before optimizing the table.  I'm not sure what caused such a large increase in disk use.

Deleting 10M Rows

    mysql&#62; delete from t1 where transactionid ]]></description>
			<content:encoded><![CDATA[<p>We often hear from customers and MySQL experts that fragmentation causes problems
such as wasting disk space, increasing backup times, and degrading performance.
Typical remedies include periodic "optimize table" or dump and re-load (for example,
see <a href="http://everythingmysql.ning.com/profiles/blogs/project-golden-gate">Project Golden Gate</a>).
Unfortunately, these techniques impact database availability and/or require additional administrative
cost and complexity.
<a href="http://tokutek.com">Tokutek's</a>
Fractal Tree algorithms do not cause fragmentation, and we're looking
for ways to measure the effects of fragmentation to quantify TokuDB's benefits.
</p>
<p>
I ran some tests using the <a href="http://code.google.com/p/google-mysql-tools/source/browse/trunk/ibench.py">iiBench</a>
benchmark as an experiment to try and quantify the impact of fragmentation, and observed
some interesting results.
</p>
<h3>Initial Load &ndash; 50M Rows</h3>
<p>
I created an iiBench table with 50M rows, and recorded how long it took to
complete along with the amount of disk space used by the data and indexes;
log files are *not* included in the reported disk use.
</p>
<pre>
$ iibench.py --setup --insert_only --engine=[innodb | tokudb] \
   --max_rows=50000000 --table_name=t1
</pre>
<table cellhalign="right" cellpadding="5px" border="1">
<tr><td><strong>Operation</strong></td><td><strong>Time<br>TokuDB</strong></td><td><strong>Time<br>InnoDB</strong></td><td><strong>Disk Use<br>TokuDB</strong></td><td><strong>Disk Use<br>InnoDB</strong>
</td></tr><tr align="right"><td> Insert 50M Rows</td><td>3,349s</td><td>28,821s </td><td>3.3GiB</td><td>11GiB 
</td></tr><tr align="right"><td> select count(*) </td><td>56s</td><td>643s</td><td>3.4GiB</td><td>11GiB 
</td></tr><tr align="right"><td> select count(*) </td><td>19.0s</td><td>14.4s</td><td>3.5GiB</td><td>11GiB
</td></tr><tr align="right"><td> optimize table </td><td>153s</td><td>28,456s</td><td>4.5GiB</td><td>20GiB
</td></tr></table>
<p>
select count(*) was run twice to see how fast it ran with the cache
warmed up (the query cache was off).
</p>
<p>
When inserting into TokuDB, some data may remain in internal data structures, so optimize table was run to
push all of the data to disk for an accurate measure of disk use. Initially, I thought running optimize table
would not be necessary for InnoDB, but I ran it to ensure a consistent procedure on both engines.
Surprisingly, InnoDB used much more disk space after running optimize table.
According to <a href="http://www.mysqlperformanceblog.com/2009/01/13/some-little-known-facts-about-innodb-insert-buffer/">this post</a>,
InnoDB's insert buffer is stored in the tablespace, so data in the insert buffer should have been included in the measured
disk use before optimizing the table.  I'm not sure what caused such a large increase in disk use.
</p>
<h3>Deleting 10M Rows</h3>
<pre>
    mysql> delete from t1 where transactionid <= 10000000;
</pre>
<table cellpadding="5px" border="1">
<tr><td><strong>Operation</strong></td><td><strong>Time<br>TokuDB</strong></td><td><strong>Time<br>InnoDB</strong></td><td><strong>Disk Use<br>TokuDB</strong></td><td><strong>Disk Use<br>InnoDB</strong>
</td></tr><tr align="right"><td> Delete 10M rows </td><td>1,050s</td><td>169,238s</td><td>4.7GiB</td><td>20GiB
</td></tr><tr align="right"><td> select count(*) </td><td>42s</td><td>772s</td><td>4.7GiB</td><td>20GiB 
</td></tr><tr align="right"><td> select count(*) </td><td>15.6s</td><td>809s</td><td>4.7GiB</td><td>20GiB 
</td></tr><tr align="right"><td> select count(*) </td><td>14.7s</td><td>802s</td><td>4.7GiB</td><td>20GiB 
</td></tr><tr align="right"><td> optimize table </td><td>129s</td><td>28,539s</td><td>4.6GiB</td><td>20GiB
</td></tr><tr align="right"><td> select count(*) </td><td>16.2s</td><td>372s</td><td>4.6GiB</td><td> 20GiB 
</td></tr><tr align="right"><td> select count(*) </td><td>13.8s</td><td>11.6s</td><td>4.6GiB</td><td>20GiB 
</td></tr></table>
<p>
    Deleting 10M rows took about 17 minutes on TokuDB, and over 47 hours on InnoDB.
    After deleting the rows, select count(*) on InnoDB ran slowly, even after
    attempting to warm up the cache.  I ran it three times, and it was consistently slow.  I ran
    optimize table, and select count(*)
    ran much faster, suggesting that the problem may have been caused by fragmentation.
    TokuDB does not fragment, so select count(*) ran fast on TokuDB after deleting rows,
    with no need to optimize.
</p>
<h3>Summary</h3>
<p>
    Deleting 10M rows from a 50M row table caused the time to run select count(*)
    on InnoDB to increase by a factor of about 55.  Running optimize table
    solved the problem, but it took almost 8 hours to complete.
    <a href="http://everythingmysql.ning.com/profiles/blogs/whats-faster-than-alter">Dumping and reloading</a>
    is likely to be faster than optimize or alter, but it still takes time and effort.
</p>
<p>
    After deleting 10M rows on TokuDB, the time to run select count(*) decreased
    from about 19s to about 15s (proportional to the decrease from 50M to 40M rows in the table), without a need
    for optimizing or dumping and reloading.
</p>
<h3>Going Further</h3>
<p>
    <a href="http://www.facebook.com/note.php?note_id=137682990932">Posts from Mark Callaghan</a>
    and
    <a href="http://blogs.tokutek.com/tokuview/cache_miss_rate_as_a_function_of_cache_size/">Bradley C. Kuszmaul</a>
    show that iiBench's linear distribution of data does not provide a good model of some real world data sets, and
    a <a href="http://en.wikipedia.org/wiki/Zipf's_law">Zipfian distribution</a> is probably a better model.
    It would be interesting to re-run the experiment with an updated version of iiBench using a Zipfian distribution.
    Running similar delete experiments on large
    <a href="http://ronaldbradford.com/blog/seeking-public-data-for-benchmarks-2009-08-28/">real world data sets</a>
    would be interesting as well.
</p>
<h3>Additional Details</h3>
<p>
    I ran the tests on a machine with a modest amount of memory and slow cores by today's standards.
    <ul>
        <li>CentOS 5.1</li>
        <li>Dell PowerEdge 2950</li>
        <li>2 Socket, Quad Core Intel Xeon 1.6GHz</li>
        <li>4GB Main Memory (2GB InnoDB Buffer Pool, 2GB TokuDB Cache)</li>
        <li> 5 disk SW RAID5 1TB SATA</li>
        <li> ext3 Filesystem</li>
    </ul>
</p>
<p>
    I used default parameters for TokuDB and the InnoDB parameters are shown in the my.cnf file below.
    By default, TokuDB uses 1/2 of physical memory (2GB) for its cache size, so the InnoDB buffer pool
    was set to 2GB for a fair comparison.  It may be possible to achieve better InnoDB results
    through tuning, but the goal of this exercise was to search for ways to quantify the impacts of
    fragmentation.
</p>
<pre>
[mysqld]

innodb_log_buffer_size=4M
innodb_thread_concurrency=8
innodb_log_files_in_group=3
innodb_log_file_size=1300M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_buffer_pool_size=2G
innodb_max_dirty_pages_pct=90
</pre>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2009/09/19/attempting-to-quantify-fragmentation-effects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cache Miss Rate as a function of Cache Size</title>
		<link>http://blogs.tokutek.com/tokuview/cache_miss_rate_as_a_function_of_cache_size/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=cache-miss-rate-as-a-function-of-cache-size</link>
		<comments>http://blogs.tokutek.com/tokuview/cache_miss_rate_as_a_function_of_cache_size/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 04:07:00 +0000</pubDate>
		<dc:creator>Tokuview Blog</dc:creator>
				<category><![CDATA[Blogging]]></category>

		<guid isPermaLink="false">http://blogs.tokutek.com/tokuview/cache_miss_rate_as_a_function_of_cache_size/#When:21:07:00Z</guid>
		<description><![CDATA[I saw Mark Callaghan&#8217;s post, and his graph showing miss rate as a function of cache size for InnoDB running MySQL.&#160; He plots miss rate against cache size and compares it to two simple models:

  A linear model where the miss rate is (1-C/D)/50, and

  An inverse-proportional model where the miss rate is D/(1000C).
He seemed happy (and maybe surprised) that the linear model is a bad match and that the inverse-proportional model is a good match.&#160; The linear model is the one that would make sense if every page were equally likely to have a hit.

I&#8217;ll argue here that it&#8217;s not so surprising.&#160; Suppose that block accesses follow a heavy-tailed distribution, such as Zipf&#8217;s law. An example of a Zipf&#8217;s-law distribution would be if the most frequently accessed cache block accounts for X of the accesses, the second most frequent accounts for X/2 of the accesses, the third most frequent accounts for X/3 of the accesses, and so forth, with the ith most frequently accessed block accounting for X/i of the accesses.
What miss rate should we expect?&#160; Essentially in this distribution, if you look at the number of misses in the first N blocks, it&#8217;s half the number of misses found in the next N blocks.&#160; Thus, we would expect the miss rate to be proportional to 1/C, where C is the cache size.&#160; That matches Mark&#8217;s experiment.
This simple heavy-tailed distribution shows up all over the place, and is often a good model for this kind of system.&#160; For example, I would expect that if one were to collect the top page returned for every Google query, the frequency of page hits would follow this distribution. A more frequently cited example is the frequency distribution of words in a natural language. Such a distribution probably controls the query cache too.
One shortcoming of iiBench is that iiBench assumes that all inserted index values are equally likely, leading to a linear miss-rate model.&#160; Since TokuDB&#8217;s advantage on insertions is related to the cache miss rate, reducing the miss rate will tend to make InnoDB look better, reducing TokuDB&#8217;s advantage.&#160; Thus InnoDB&#8217;s miss rate probably isn&#8217;t as bad on real data as iiBench suggests.&#160; That probably explains why iiBench shows a 200x advantage for TokuDB, but when talking to customers the advantage is often more like 10x or 20x.&#160; It seems clear to me that iiBench would serve us better if it had the option of generating data according to Zipf&#8217;s law.&#160; (Since I designed iiBench, I have no qualms about criticizing it.)
Here&#8217;s a little game that helps show why heavy-tailed distributions are fun to think about.&#160; This game comes from the St. Petersburg Paradox.&#160; The game is a lottery in which I&#8217;m going to pay you some money, and the amount of money is a function of a random variable. The payoff schedule is that I&#8217;ll pay you one dollar half the time, I&#8217;ll pay you two dollars a quarter of the time, I&#8217;ll pay you $4 with probability 1/8, and in general I&#8217;ll pay 2^i dollars with probability 2^(-i-1).&#160; How much would you pay to buy one of these lottery tickets?&#160; (For an analysis, see Wikipedia).
]]></description>
			<content:encoded><![CDATA[<p>I saw <a href="http://www.facebook.com/note.php?note_id=137682990932">Mark Callaghan&#8217;s post</a>, and his graph showing miss rate as a function of cache size for InnoDB running MySQL.&nbsp; He plots miss rate against cache size and compares it to two simple models:<ul>
  <li>A linear model where the miss rate is (1-<i>C</i>/<i>D</i>)/50, and</li>
  <li>An inverse-proportional model where the miss rate is <i>D</i>/(1000<i>C</i>).</li>
</ul>He seemed happy (and maybe surprised) that the linear model is a bad match and that the inverse-proportional model is a good match.&nbsp; The linear model is the one that would make sense if every page were equally likely to have a hit.</p>
<p>
I&#8217;ll argue here that it&#8217;s not so surprising.&nbsp; Suppose that block accesses follow a heavy-tailed distribution, such as <a href="http://en.wikipedia.org/wiki/Zipf's_law">Zipf&#8217;s law</a>. An example of a Zipf&#8217;s-law distribution would be if the most frequently accessed cache block accounts for <i>X</i> of the accesses, the second most frequent accounts for <i>X</i>/2 of the accesses, the third most frequent accounts for <i>X</i>/3 of the accesses, and so forth, with the <i>i</i>th most frequently accessed block accounting for <i>X</i>/<i>i</i> of the accesses.</p>
<p>What miss rate should we expect?&nbsp; Essentially in this distribution, if you look at the number of misses in the first <i>N</i> blocks, it&#8217;s half the number of misses found in the next <i>N</i> blocks.&nbsp; Thus, we would expect the miss rate to be proportional to 1/<i>C</i>, where <i>C</i> is the cache size.&nbsp; That matches Mark&#8217;s experiment.</p>
<p>This simple heavy-tailed distribution shows up all over the place, and is often a good model for this kind of system.&nbsp; For example, I would expect that if one were to collect the top page returned for every Google query, the frequency of page hits would follow this distribution. A more frequently cited example is the frequency distribution of words in a natural language. Such a distribution probably controls the query cache too.</p>
<p>One shortcoming of <a href="http://code.google.com/p/google-mysql-tools/source/browse/trunk/ibench.py">iiBench</a> is that iiBench assumes that all inserted index values are equally likely, leading to a linear miss-rate model.&nbsp; Since TokuDB&#8217;s advantage on insertions is related to the cache miss rate, reducing the miss rate will tend to make InnoDB look better, reducing TokuDB&#8217;s advantage.&nbsp; Thus InnoDB&#8217;s miss rate probably isn&#8217;t as bad on real data as iiBench suggests.&nbsp; That probably explains why iiBench shows a 200x advantage for TokuDB, but when talking to customers the advantage is often more like 10x or 20x.&nbsp; It seems clear to me that iiBench would serve us better if it had the option of generating data according to Zipf&#8217;s law.&nbsp; (Since I designed iiBench, I have no qualms about criticizing it.)</p>
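<p>As a rough illustration of the option suggested above, here is a minimal Python sketch of a key generator that draws index values according to Zipf&#8217;s law with exponent 1. The function name <tt>make_zipf_sampler</tt> and its parameters are hypothetical; nothing like this is claimed to exist in iiBench itself.</p>

```python
import bisect
import itertools
import random


def make_zipf_sampler(num_keys, rng):
    """Return a function drawing keys 1..num_keys with P(i) proportional to 1/i.

    Hypothetical helper for illustration only; it is not part of iiBench.
    """
    # Unnormalized cumulative weights: 1, 1 + 1/2, 1 + 1/2 + 1/3, ...
    cumulative = list(itertools.accumulate(1.0 / i for i in range(1, num_keys + 1)))
    total = cumulative[-1]

    def sample():
        # Invert the cumulative distribution with a binary search.
        u = rng.random() * total
        index = bisect.bisect_right(cumulative, u)
        # Guard against floating-point rounding at the very top of the range.
        return min(index, num_keys - 1) + 1

    return sample


# Draw a few keys; small key values should dominate the output.
demo_sample = make_zipf_sampler(1000, random.Random(42))
print([demo_sample() for _ in range(10)])
```

<p>A benchmark could call such a sampler in place of a uniform random key to get a workload in which a few keys are hot and most are rarely touched.</p>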
<p>Here&#8217;s a little game that helps show why heavy-tailed distributions are fun to think about.&nbsp; This game comes from the <a href="http://en.wikipedia.org/wiki/St._Petersburg_paradox">St. Petersburg Paradox</a>.&nbsp; The game is a lottery in which I&#8217;m going to pay you some money, and the amount of money is a function of a random variable. The payoff schedule is that I&#8217;ll pay you one dollar half the time, I&#8217;ll pay you two dollars a quarter of the time, I&#8217;ll pay you $4 with probability 1/8, and in general I&#8217;ll pay 2<sup><i>i</i></sup> dollars with probability 2<sup>-<i>i</i>-1</sup>.&nbsp; How much would you pay to buy one of these lottery tickets?&nbsp; (For an analysis, see Wikipedia).</p>
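<p>For readers who want to experiment with the lottery, the following Python sketch simulates it. Each term of the expected value is 2<sup><i>i</i></sup> times 2<sup>-<i>i</i>-1</sup>, which is 1/2, so the sum over the first <i>n</i> outcomes is <i>n</i>/2 and grows without bound. The function names are illustrative.</p>

```python
import random


def play_once(rng):
    """Pay 2**i dollars with probability 2**(-i-1): keep doubling on tails."""
    payoff = 1
    while rng.random() >= 0.5:  # tails: double the payoff and flip again
        payoff *= 2
    return payoff


def truncated_expected_value(num_outcomes):
    """Expected payoff counting only the first num_outcomes payoff levels."""
    return sum((2 ** i) * (2.0 ** (-i - 1)) for i in range(num_outcomes))


def average_payoff(num_tickets, rng):
    """Empirical mean payoff over num_tickets independent tickets."""
    return sum(play_once(rng) for _ in range(num_tickets)) / num_tickets


# The truncated expectation grows by 1/2 for every extra payoff level.
print(truncated_expected_value(10), truncated_expected_value(20))
# The empirical average is dominated by rare, very large payoffs.
print(average_payoff(100000, random.Random(1)))
```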
]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2009/09/13/cache-miss-rate-as-a-function-of-cache-size/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sponsoring OpenSQL Camp 2009</title>
		<link>http://blogs.tokutek.com/tokuview/sponsoring_opensql_camp_2009/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=sponsoring-opensql-camp-2009</link>
		<comments>http://blogs.tokutek.com/tokuview/sponsoring_opensql_camp_2009/#comments</comments>
		<pubDate>Fri, 11 Sep 2009 07:01:00 +0000</pubDate>
		<dc:creator>Tokuview Blog</dc:creator>
				<category><![CDATA[Blogging]]></category>

		<guid isPermaLink="false">http://blogs.tokutek.com/tokuview/sponsoring_opensql_camp_2009/#When:00:01:00Z</guid>
		<description><![CDATA[We&#8217;re supporting the OpenSQL Camp, which will be held in Portland on November 14.&#160; 


One of my objectives for the camp is to make progress on a universal storage engine API, to make it possible to use the same storage engines in MySQL, PostgreSQL, Ingres, or any other database.&#160; I&#8217;m also looking forward to hearing other people&#8217;s great ideas.


After OpenSQL Camp, I&#8217;ll be attending Supercomputing&#8217;09.&#160; Supercomputing and database hardware technology seem to be converging.&#160; Many of the fastest databases today look like a supercomputer with disks attached.&#160;  Will there be other kinds of convergence?&#160; For example, what kind of convergence will we see between multicore computing and cluster computing?&#160; Today we program multicore machines very differently from clusters.&#160; I think in the future that difference will vanish.
]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re supporting the <a href="http://opensqcampl.org" title="OpenSQL Camp">OpenSQL Camp</a>, which will be held in Portland on November 14.&nbsp; 
</p>
<p>
One of my objectives for the camp is to make progress on a universal storage engine API, to make it possible to use the same storage engines in MySQL, PostgreSQL, Ingres, or any other database.&nbsp; I&#8217;m also looking forward to hearing other people&#8217;s great ideas.
</p>
<p>
After OpenSQL Camp, I&#8217;ll be attending Supercomputing&#8217;09.&nbsp; Supercomputing and database hardware technology seem to be converging.&nbsp; Many of the fastest databases today look like a supercomputer with disks attached.&nbsp;  Will there be other kinds of convergence?&nbsp; For example, what kind of convergence will we see between multicore computing and cluster computing?&nbsp; Today we program multicore machines very differently from clusters.&nbsp; I think in the future that difference will vanish.
</p>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2009/09/11/sponsoring-opensql-camp-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sorting a Terabyte in 197 seconds</title>
		<link>http://blogs.tokutek.com/tokuview/sorting_a_terabyte_in_197_seconds/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=sorting-a-terabyte-in-197-seconds</link>
		<comments>http://blogs.tokutek.com/tokuview/sorting_a_terabyte_in_197_seconds/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 01:10:01 +0000</pubDate>
		<dc:creator>Tokuview Blog</dc:creator>
				<category><![CDATA[Blogging]]></category>

		<guid isPermaLink="false">http://blogs.tokutek.com/tokuview/sorting_a_terabyte_in_197_seconds/#When:20:10:01Z</guid>
		<description><![CDATA[Sorting a Terabyte in 197 seconds
I just returned from The 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), held in Calgary, where I gave a talk about my entry to the sorting contest.&#160; I sorted 1TB in 197s on a 400-node machine at MIT Lincoln Laboratory, a record which still stands today.&#160; (And it will likely remain standing, since terabyte sorting is now deprecated because it&#8217;s too fast.&#160; Now the challenge is to sort 100TB.)
For many years Jim Gray ran a sorting contest to see how fast anyone could sort a terabyte worth of 100-byte records, how much data could be sorted in one minute, and how much data could be sorted for a penny.&#160; After Jim&#8217;s disappearance at sea in January 2007, a committee formed to continue the contest.
I entered in 2007, and this week I finally got around to talking about it at a conference.&#160; My sorting algorithm is a variant of the SampleSort Algorithm by Blelloch, Plaxton, Leiserson et al.&#160; To learn more about TokuSampleSort, take a look at the long version of the paper or the slides from the SPAA talk.
TokuSampleSort is not directly related to Tokutek&#8217;s TokuDB storage engine for MySQL.&#160; One is a sort, and the other maintains indexes.&#160;  The difference between sorting and indexing is that an index is a dynamically sorted collection of data.&#160; Sorting, however, operates on a static set of data.&#160; Sorting and indexing both require attention to how to achieve high performance from processors and disks.&#160; It can be a little bit surprising that sorting is easier than indexing.&#160; Sorting is easier because all the data is available at the beginning of the calculation.&#160; In contrast, when indexing, data may arrive a little bit at a time, and at any time a user might query a range of the data.&#160; So indexing requires maintaining data in sorted order as the data arrives (and thus produces many sorted data sets, if you count each intermediate state), whereas a sort produces a single sorted answer.
Many storage engines (including, for example, MyISAM) use a sorting algorithm when creating an index (e.g., with CREATE INDEX or ALTER TABLE), and then use B-trees to maintain the index after it&#8217;s been created.&#160; It&#8217;s difficult to maintain a B-tree index if records arrive at high speed, however.&#160; TokuDB uses Fractal Tree indexes to maintain indexes orders of magnitude faster than B-trees.
How good would a fractal tree index be for sorting?&#160; TokuSampleSort sorts at a rate of about 127,000 records per processor per second.&#160; TokuDB can index random data at about 30,000 records per processor per second.&#160; Fractal Tree indexes are not as fast as a sort, but they are surprisingly close.
]]></description>
			<content:encoded><![CDATA[<h2>Sorting a Terabyte in 197 seconds</h2>
<p>I just returned from <a href="http://www.cs.jhu.edu/~spaa/2009/">The 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)</a>, held in Calgary, where I gave a talk about my entry to the <a href="http://sortbenchmark.org/">sorting contest</a>.&nbsp; I sorted 1TB in 197s on a 400-node machine at MIT Lincoln Laboratory, a record which still stands today.&nbsp; (And it will likely remain standing, since terabyte sorting is now deprecated because it&#8217;s too fast.&nbsp; Now the challenge is to sort 100TB.)</p>
<p>For many years <a href="http://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)">Jim Gray</a> ran a sorting contest to see how fast anyone could sort a terabyte worth of 100-byte records, how much data could be sorted in one minute, and how much data could be sorted for a penny.&nbsp; After Jim&#8217;s disappearance at sea in January 2007, a committee formed to continue the <a href="http://sortbenchmark.org/">contest</a>.</p>
<p>I entered in 2007, and this week I finally got around to talking about it at a conference.&nbsp; My sorting algorithm is a variant of the <a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/project/scandal/public/papers/cm-sort-SPAA91.html">SampleSort Algorithm</a> by Blelloch, Plaxton, Leiserson <i>et al.</i>&nbsp; To learn more about TokuSampleSort, take a look at the <a href="http://sortbenchmark.org/tokutera.pdf">long version of the paper</a> or the <a href="http://bradley.csail.mit.edu/~bradley/talks/spaa09-terabyte.pdf">slides from the SPAA talk</a>.</p>
<p>TokuSampleSort is not directly related to Tokutek&#8217;s TokuDB storage engine for MySQL.&nbsp; One is a sort, and the other maintains indexes.&nbsp;  The difference between sorting and indexing is that an index is a dynamically sorted collection of data.&nbsp; Sorting, however, operates on a static set of data.&nbsp; Sorting and indexing both require attention to how to achieve high performance from processors and disks.&nbsp; It can be a little bit surprising that sorting is easier than indexing.&nbsp; Sorting is easier because all the data is available at the beginning of the calculation.&nbsp; In contrast, when indexing, data may arrive a little bit at a time, and at any time a user might query a range of the data.&nbsp; So indexing requires maintaining data in sorted order as the data arrives (and thus produces many sorted data sets, if you count each intermediate state), whereas a sort produces a single sorted answer.</p>
<p>Many storage engines (including, for example, MyISAM) use a sorting algorithm when creating an index (e.g., with <tt>CREATE INDEX</tt> or <tt>ALTER TABLE</tt>), and then use B-trees to maintain the index after it&#8217;s been created.&nbsp; It&#8217;s difficult to maintain a B-tree index if records arrive at high speed, however.&nbsp; TokuDB uses Fractal Tree indexes to maintain indexes orders of magnitude faster than B-trees.</p>
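<p>To give a feel for the general sample-sort idea, here is a single-process Python sketch: sample the input, pick splitters from the sorted sample, partition the records into buckets, and sort each bucket. This is only the textbook outline under simplifying assumptions (in-memory data, no parallelism or disk I/O); it is not the TokuSampleSort implementation, and the names and the <tt>oversample</tt> parameter are illustrative.</p>

```python
import bisect
import random


def sample_sort(records, num_buckets, oversample=8, rng=None):
    """Sort a list by sampling splitters, bucketing, and sorting each bucket."""
    rng = rng or random.Random()
    if num_buckets == 1 or not records:
        return sorted(records)

    # 1. Draw a random sample and sort it.
    sample_size = min(len(records), num_buckets * oversample)
    sample = sorted(rng.sample(records, sample_size))

    # 2. Pick num_buckets - 1 evenly spaced splitters from the sorted sample.
    step = len(sample) / num_buckets
    splitters = [sample[int(step * i)] for i in range(1, num_buckets)]

    # 3. Route every record to the bucket delimited by the splitters.
    buckets = [[] for _ in range(num_buckets)]
    for record in records:
        buckets[bisect.bisect_right(splitters, record)].append(record)

    # 4. Sort each bucket independently (on a cluster, one bucket per node)
    #    and concatenate the results in bucket order.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result


data = [random.Random(7).randrange(1000) for _ in range(20)]
print(sample_sort(data, num_buckets=4, rng=random.Random(0)))
```

<p>In a parallel setting, step 4 is where the work is spread out: each bucket can be sorted on a different node because every record in one bucket precedes every record in the next.</p>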
<p>How good would a fractal tree index be for sorting?&nbsp; TokuSampleSort sorts at a rate of about 127,000 records per processor per second.&nbsp; TokuDB can index random data at about 30,000 records per processor per second.&nbsp; Fractal Tree indexes are not as fast as a sort, but they are surprisingly close.</p>
]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2009/08/18/sorting-a-terabyte-in-197-seconds/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Announcing TokuDB 2.1.0</title>
		<link>http://blogs.tokutek.com/tokuview/announcing_tokudb_210/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=announcing-tokudb-2-1-0</link>
		<comments>http://blogs.tokutek.com/tokuview/announcing_tokudb_210/#comments</comments>
		<pubDate>Thu, 06 Aug 2009 22:23:00 +0000</pubDate>
		<dc:creator>Tokuview Blog</dc:creator>
				<category><![CDATA[Blogging]]></category>

		<guid isPermaLink="false">http://blogs.tokutek.com/tokuview/announcing_tokudb_210/#When:17:23:00Z</guid>
		<description><![CDATA[Tokutek&#174; announces the release of the TokuDB storage engine for MySQL&#174;, version 2.1.0.&#160; This release offers the following improvements over our previous release:


 Faster indexing of sequential keys.
 Faster bulk loads on tables with auto-increment fields.
 Faster range queries in some circumstances.
 Added support for InnoDB.
 Upgraded from MySQL 5.1.30 to 5.1.36.
 Fixed all known bugs.

About TokuDB
TokuDB for MySQL is a storage engine built with Tokutek&#8217;s Fractal Tree&#8482; technology. TokuDB provides near seamless compatibility for MySQL applications. Tables can be individually defined to use TokuDB, MyISAM, InnoDB&#174; or other MySQL-compliant storage engines. Data is loaded, inserted, and queried using standard MySQL commands, with no restrictions or special requirements. Fractal Trees can index data up to 50 times faster than traditional database technologies, enabling near real time analysis on large volumes of rapidly arriving data.



Notice: Tokutek and Fractal Tree are trademarks or registered trademarks of Tokutek, Inc.&#160; MySQL is a registered trademark of Sun Microsystems, Inc.&#160; InnoDB is a registered trademark of Oracle Corporation.


]]></description>
			<content:encoded><![CDATA[<p>Tokutek&#0174; announces the release of the <a href="http://www.tokutek.com/early_release.php">TokuDB storage engine for MySQL&#0174;, version 2.1.0</a>.&nbsp; This release offers the following improvements over our previous release:
</p>
<ul>
<li>Faster indexing of sequential keys.</li>
<li>Faster bulk loads on tables with auto-increment fields.</li>
<li>Faster range queries in some circumstances.</li>
<li>Added support for InnoDB.</li>
<li>Upgraded from MySQL 5.1.30 to 5.1.36.</li>
<li>Fixed all known bugs.</li>
</ul>
<h3>About TokuDB</h3>
<p>TokuDB for MySQL is a storage engine built with Tokutek&#8217;s Fractal Tree&#0153; technology. TokuDB provides near seamless compatibility for MySQL applications. Tables can be individually defined to use TokuDB, MyISAM, InnoDB&#0174; or other MySQL-compliant storage engines. Data is loaded, inserted, and queried using standard MySQL commands, with no restrictions or special requirements. Fractal Trees can index data up to 50 times faster than traditional database technologies, enabling near real time analysis on large volumes of rapidly arriving data.
<br />
</p>
<p>
Notice: Tokutek and Fractal Tree are trademarks or registered trademarks of Tokutek, Inc.&nbsp; MySQL is a registered trademark of Sun Microsystems, Inc.&nbsp; InnoDB is a registered trademark of Oracle Corporation.
<br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2009/08/07/announcing-tokudb-2-1-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Autoincrement Semantics</title>
		<link>http://blogs.tokutek.com/tokuview/autoincrement_semantics/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=autoincrement-semantics</link>
		<comments>http://blogs.tokutek.com/tokuview/autoincrement_semantics/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 21:28:00 +0000</pubDate>
		<dc:creator>Tokuview Blog</dc:creator>
				<category><![CDATA[Blogging]]></category>

		<guid isPermaLink="false">http://blogs.tokutek.com/tokuview/autoincrement_semantics/#When:16:28:00Z</guid>
		<description><![CDATA[In this post I&#8217;m going to talk about how TokuDB&#8217;s implementation of auto increment works, and contrast it to the behavior of MyISAM and InnoDB.&#160; We feel that the TokuDB behavior is easier to understand, more standard-compliant and offers higher performance (especially when implemented with Fractal Tree indexes).
In TokuDB, each table can have an auto-increment column.&#160; That column can be used as any part of a key, but it doesn&#8217;t have to be part of any key.&#160; The value produced by auto incrementing is always greater than the previous maximum value for that column. There are some cases where auto-incremented values are skipped, such as when a transaction aborts, which &#8220;uses up&#8221; auto-incremented values.
This behavior is close to that required for SQL:2003 (see SQL:2003 at Wikipedia), which specifies that each table provides one unnamed sequence which behaves essentially in the way we implemented auto increment.&#160; The SQL standard permits but doesn&#8217;t require auto-incremented values to be used up when a transaction aborts.
The semantics provided by TokuDB differ from those of MyISAM and InnoDB.&#160; In MyISAM, the value you get in an auto-increment field depends on the exact type and order of the secondary indexes (which is described in the MySQL manual).&#160; Those relatively complex semantics seem to be viewed as a feature by MyISAM users.&#160; Also, for MyISAM, there are no transactions, and so aborting a transaction doesn&#8217;t use up auto-increment values.
InnoDB provides yet another semantics: An auto-increment field must be the first field in some key.
For many users, the difference in semantics does not seem important.
Part of the reason we implemented a different behavior than the other storage engines is that the other engines&#8217; behaviors don&#8217;t play well with fractal tree indexing.&#160; If rows can be inserted into a table or index without fetching an existing row, then fractal trees are two orders of magnitude faster than B-trees. If we had to fetch an existing row (to find out what the previous maximum auto-increment field was), the performance advantage would be less in many cases.&#160; By using the moral equivalent of a SQL:2003 sequence, we can generate auto-increment values without fetching rows, and so indexes can be maintained at high data-arrival rates.
Thus TokuDB&#8217;s auto-increment fields have several advantages over those of MyISAM and InnoDB.&#160; TokuDB&#8217;s are simpler to use, they act more like SQL:2003 standard sequences, and (perhaps most importantly to us) they admit higher performance.
]]></description>
			<content:encoded><![CDATA[<p>In this post I&#8217;m going to talk about how TokuDB&#8217;s implementation of auto increment works, and contrast it to the behavior of MyISAM and InnoDB.&nbsp; We feel that the TokuDB behavior is easier to understand, more standard-compliant and offers higher performance (especially when implemented with Fractal Tree indexes).</p>
<p>In TokuDB, each table can have an auto-increment column.&nbsp; That column can be used as any part of a key, but it doesn&#8217;t have to be part of any key.&nbsp; The value produced by auto incrementing is always greater than the previous maximum value for that column. There are some cases where auto-incremented values are skipped, such as when a transaction aborts, which &#8220;uses up&#8221; auto-incremented values.</p>
<p>This behavior is close to that required for SQL:2003 (see <a href="http://en.wikipedia.org/wiki/SQL:2003">SQL:2003 at Wikipedia</a>), which specifies that each table provides one unnamed sequence which behaves essentially in the way we implemented auto increment.&nbsp; The SQL standard permits but doesn&#8217;t require auto-incremented values to be used up when a transaction aborts.</p>
<p>The semantics provided by TokuDB differ from those of MyISAM and InnoDB.&nbsp; In MyISAM, the value you get in an auto-increment field depends on the exact type and order of the secondary indexes (which is described in the MySQL manual).&nbsp; Those relatively complex semantics seem to be viewed as a feature by MyISAM users.&nbsp; Also, for MyISAM, there are no transactions, and so aborting a transaction doesn&#8217;t use up auto-increment values.</p>
<p>InnoDB provides yet another semantics: An auto-increment field must be the first field in some key.</p>
<p>For many users, the difference in semantics does not seem important.</p>
<p>Part of the reason we implemented a different behavior than the other storage engines is that the other engines&#8217; behaviors don&#8217;t play well with fractal tree indexing.&nbsp; If rows can be inserted into a table or index without fetching an existing row, then fractal trees are two orders of magnitude faster than B-trees. If we had to fetch an existing row (to find out what the previous maximum auto-increment field was), the performance advantage would be less in many cases.&nbsp; By using the moral equivalent of a SQL:2003 sequence, we can generate auto-increment values without fetching rows, and so indexes can be maintained at high data-arrival rates.</p>
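<p>The sequence-style behavior described above can be modeled in a few lines of Python. This is a toy model with hypothetical class names (<tt>Sequence</tt>, <tt>Table</tt>), not TokuDB code: the next value comes from a counter rather than from reading the current maximum row, and an aborted insert still uses up its value.</p>

```python
class Sequence:
    """A counter that hands out strictly increasing values."""

    def __init__(self, start=1):
        self._next = start

    def next_value(self):
        value = self._next
        self._next += 1
        return value


class Table:
    """Toy table whose auto-increment column is driven by a Sequence."""

    def __init__(self):
        self.rows = {}
        self.sequence = Sequence()

    def insert(self, payload, commit=True):
        # The new id comes from the sequence; no existing row is fetched.
        row_id = self.sequence.next_value()
        if commit:
            self.rows[row_id] = payload
        # If the insert is aborted, row_id is not returned to the sequence.
        return row_id


table = Table()
table.insert("first")
table.insert("aborted", commit=False)
table.insert("third")
print(sorted(table.rows))
```

<p>The gap left by the aborted insert is the &#8220;used up&#8221; value mentioned earlier; every committed id is still greater than all ids handed out before it.</p>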
<p>Thus TokuDB&#8217;s auto-increment fields have several advantages over those of MyISAM and InnoDB.&nbsp; TokuDB&#8217;s are simpler to use, they act more like SQL:2003 standard sequences, and (perhaps most importantly to us) they admit higher performance.</p>
<br />]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2009/07/30/autoincrement-semantics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

