Archive for the ‘data warehousing’ Category

YAPCEU 2010 – Day Two…

Август 6th, 2010

After enjoying the excellent hospitality of our host here in Pisa (6 courses) we were ready for our second day at YAPCEU 2010 here in sunny Pisa.

Larry’s new catch phrase “My Language is a four letter word” was the ‘Buzz word’ for today. We settled down to some very interesting talks, the highlight for me being Tim Bunce’s talk on using Devel::NYTProf to Optimize your code. Tim first gave us a quick and dirty overview of optimization which covered the basics of where to start and what to look for he followed up with real examples of Optimizer output and than wrapped up with a few before and after results on an optimization effort.

The rest of the day was dedicated in my opinion, to the future of DBs in with Nelson Ferraz giving an excellent presentation of his concepts for using Perl as to glue for a Data Warehouse application. Next on my agenda, Martin Berends reports on the present state of Perl 6 and interfaces database. There is progress here as we now have some access to the DBI for MySQL and some others. Also great news is the fact there is a good deal of development work going on.

Martin was quickly followed by Tim Bunce again who presented his proposal for the new Perl 6’s DBDI. Seems
we are going to use the JDBC specification with a little tweaking as our road map for the future. Tim also
showed of some Perl 6 black majik from Jonathan Worthington and he was able to how us DBI with DBD::Pg
running on Perl 6.

Allison Randal finished off today’s formal presentations with her “Migration Strategies” presentation. She
gave us some good insight into migration in that we, as developers, cannot force migration on the community. She also provided two examples of migrations: Apache, which took seven years but has almost full buy-in by users and Python, which was quick and dirty but has not received the same buy-in from the community.

Finally we all enjoyed the lightning talks as our wrap up. A number of neat quick ones such as ‘I speak Perl with a ‘c’ accent’, was a highlight for me.

Seems the videos and most of the slides (including my presentation on XS) may be online tomorrow. I will be sure to post them as soon as I see them.


PlanetMySQL Voting: Vote UP / Vote DOWN

Data Warehousing Best Practices: Comparing Oracle to MySQL, part 2 (partitioning)

Июль 30th, 2010

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

See part 1 for the introduction and talking about power and hardware. This part will go over the 2nd “P”, partitioning. Learning about Oracle’s partitioning has gotten me more interested in how MySQL’s partitioning works, and I do hope that MySQL partitioning will develop to the level that Oracle partitioning does, because Oracle’s partitioning looks very nice (then again, that’s why it costs so much I guess).

Partition – Larger tables or fact tables can benefit from partitioning because it makes data load easier and can increase join performance and use data elimination. Parallel execution can be done with partitioning due to partition pruning. The degree of parallelism should be a power of 2, because of hash-based algorithm in hash partitioning. To translate this to the MySQL world, if you are using LINEAR HASH partitioning, then you should use a degree of parallelism that is a power of 2 (I checked, and indeed. Otherwise, use a degree of parallelism that makes sense given the number of partitions you have.

One important note that during Pythian’s testing of MySQL partitioning, we found that all partitions were locked when an INSERT occurs, for the duration of the INSERT. Bulk-loading with MySQL partitioning is not as fast as it would be if MySQL allowed partition pruning for INSERTs.

So, what should be partitioned? For the first level of partitioning, the goal is to enable partitioning pruning and simplify data management. The most typical partitioning is range or interval partitioning on a date column. Interval partitioning is you say what the partition is (date, month) and partition is automatically created. MySQL does not have interval partitioning, and I have seen typical first-level partitioning be range or list based on a date or timestamp column. Note that if you use a timestamp field, the partitioning expression is optimized if you use TO_DAYS(timestamp_field) or YEAR(timestamp_field). In my experience, using anything else (such as DATE(timestamp_field)) actually makes partitioning slower than not using partitioning at all. Note that this is based on tests I did a few months ago, and your mileage may vary.

So — how do you decide partitioning strategy? Ask yourself:

  • What range of data do the queries touch – a quarter, a year?
  • What is the data loading frequency?
  • Is an incremental load required?
  • How much data is involved, a day, a week, a month?

The answers to the above questions will tell you about how big your interval needs to be. The best scenario is that all answers are the same, “we load every day, and people query by day.” If the answers are different weight access a higher priority than loading, because most people care more about query performance than performance of ETL.

This is true even if your intervals have different sizes — ie sales per day are much bigger in Dec but that’s OK. However, Maria recommends that the subpartition be as evenly divided as possible.

Easier to look at more partitions than to look at a partition that’s too big. But you don’t want too many partitions, max Oracle allows partitions is 1 million partitions, prior to 11g it was 64,000. “Stick closer to 64,000 than 1 million”. MySQL’s limitation is 1024 per table.

For the second level of partitioning, also called subpartitioning, the goal is to allow for multi-level pruning and improve join performance. In Oracle, the most typical subpartition is hash or list – in MySQL, you can only subpartition by hash or key.

How do you decide subpartitioning strategy?

  • Select the dimension queried most frequently on the fact table OR
  • Pick the common join column

For example, if you want to look at sales per day, per store, you would choose “per day” as the partition and “per store” as the subpartition.

If you do not have a good partition on logical elements (like grouping), then you can subpartition using hash partitioning on common joins — perhaps surrogate keys, or using join key of the largest table involved in the join.

For example, if the sales table is partitioned and another big table is product, you can hash subpartition product_id.

Because there’s overhead in partitions (loading metadata, reading metadata), make sure size of partitions and subpartitions is >20 Mb. So better to have a 30 Mb subpartition than a 15 Mb subpartition. [I have no idea if this is true in MySQL or not -- I think the general concept is true, because there is some overhead, but I have no idea about the 20 Mb figure and why that's true for Oracle, nor do I know what is true in MySQL.]

One easy calculation is double the # of CPUs, round up to nearest power of 2. If you’re executing in parallel, Oracle will use 2x CPUs. (all this advice, by the way, follows 80/20 rule, this is probably good for about 80% of the environments out there). Of course, MySQL does not do parallel execution very well, so this probably does not apply.

Oracle knows it can get partition elimination while it does a join.

If 2 tables have the same degree of parallelism (same # of buckets) and are partitioned in the same way on the join column (say, customer_id in a subpartition of sales and a partition of customer), Oracle will match the partitions when joining:

sales table joined with customer table can change into 4 small joins:
sales sub part 1 joins with customer part 1
sales sub part 2 joins with customer part 2
sales sub part 3 joins with customer part 3
sales sub part 4 joins with customer part 4

And with parallelism, the total time is now reduced to the time it takes to do one of those smaller joins.

This is also why you want to have a power of 2 for buckets – because cores/processors come in powers of 2. Partition-wise joins like this can also be done with range or list, assuming both tables in the join have the same buckets.

I have no idea if MySQL partitioning works this way, but it’s certainly a functionality that makes sense to me.


PlanetMySQL Voting: Vote UP / Vote DOWN

Data Warehousing Best Practices: Comparing Oracle to MySQL, part 1 (introduction and power)

Июль 30th, 2010

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than hard-and-fast advice. In some places, I still have questions, and I am happy to have folks comment and contribute what they know.

One interesting point brought up:
Maria quoted someone (she said the name but I did not grab it) from Forrester saying, “3NF is typically a selfless model used by Enterprise data warehouse, which is used by the whole company. A star schema is a selfish model, used by a department, because it’s already got aggregation in it.”

I thought that was an interesting way of pointing that out — most people do not understand why 3NF is not good enough for data warehousing, and I have had a hard time explaining why a star or snowflake schema should be used. Another schema-related topic I had a hard time putting into words before this workshop was the difference between a star and a snowflake schema: compared to a star schema, in a snowflake schema, you have more than one fact table and maybe some dimensions that are not used often.

From Maria and the slides:
“Oracle says model what will suit your business best. Don’t get lost in academia. Most schemas are not 100% according to the theoretical
models. Some examples: 3NF schema with denormalized attributes to avoid costly joins, Star schema with multiple hierarchies in same fact table.”

Data warehousing has a 3-step approach —

1) data sources -> staging layer (temp loading layer)
2) staging layer (temp loading layer)-> foundation (logical, data store) layer
3) foundation (logical, data store) layer -> access and performance layer

The foundation layer is usually 3NF the access layer is usually a star or snowflake schema. As for the data sources, they can be varied, you would hope that they are in 3NF (and if they are you can skip the first 2 steps) but they are not always that way.

The 3 P’s of best practice for data warehousing (on Oracle) are power, partitioning, parallelism. The goal of the data warehousing environment is to minimize the amount of data accessed and use the most efficient joins – so it is not so index focused. This may be based on Oracle’s way of doing joins, I am not so sure if it applies to MySQL as well.

Power The weakest link in the chain (the 3 steps above) will define the throughput, so make sure your hardware configuration is balanced. Maria mentioned that as DBAs, “most of the time we don’t have control over this, but we’re still bound to the SLAs.”

This includes hardware that immediately comes to mind such as # of CPUs/cores, speed of CPU, amount of RAM, speed of disk as well as what we may not think of immediately: speed of network switches, speed of disk controllers, number and speed of host BUS adapters. Notes on host BUS adapters (HBAs): Know the # of HBA ports you have. 4 Gb HBA does 400 Mb/sec. 2 Gb HBA does 200 Mb/sec. Make sure there’s enough HBA capacity to sustain the CPU throughput (ie, make sure HBA isn’t the bottleneck). Also the speed at which it all talks. If you have a 4 Gb machine but a 2 Gb switch, you end up having 2 Gb throughput. Upgrade the network at the same time you upgrade machines.

Because we are talking about data warehousing, it is often not possible to eliminate disk I/O, so the goal is to have the fastest I/O throughput possible. Data warehouses need to be sized on I/O throughput not number of I/O’s.

I made a post earlier about how to determine I/O throughput for a system, which used information from this session. Justin Swanhart already pointed out that this is based on the fact that Oracle can do hash joins and MySQL can only do nested loop joins. I wonder, though, if there is indeed no case when using MySQL for which I/O throughput is a more useful metric than iops.

Disk arrays that are expensive are usually sized for iops, not throughput, and because they’re expensive the disk array is shared throughout the company. A DBA needs to ask ‘how many connections into the storage array do I have? How many disk controllers do I have? Where are my physical disks, and which controllers are they hanging off of?’

Typical 15k rpm disk can do about 25-35 Mb/sec (per disk) random i/o’s. Disk manufacturers will throw out numbers like 200-300 Mb/sec but that’s sequential I/O and leading edge of the drive. Make sure all your LUNs are not coming off the same set of disks, so that you’re not conflicting on disk seeks.

Continue to part 2, partitioning.


PlanetMySQL Voting: Vote UP / Vote DOWN

CAOS Theory Podcast 2010.02.05

Февраль 5th, 2010

Topics for this podcast:

*Matt Asay moves from Alfresco to Canonical
*GPL fade fuels heated discussion
*Apple’s iPad and its enterprise and open source impact
*Open source in data warehousing and storage
*Our perspective on Oracle’s plans for Sun open source

iTunes or direct download (32:50, 9.2 MB)


PlanetMySQL Voting: Vote UP / Vote DOWN

Easter Eggs for MySQL and Kettle

Январь 27th, 2010
To whom it may concern,

A MySQL stored function to calculate easter day


I uploaded a MySQL forge snippet for the f_easter() function. You can use this function in MySQL statements to calculate easter sunday for any given year:

mysql> select f_easter(year(now()));
+-----------------------+
| f_easter(year(now())) |
+-----------------------+
| 2010-04-04 |
+-----------------------+
1 row in set (0.00 sec)

Anonymous Gregorian algorithm


To implement it, I simply transcribed the code of the "Anonymous Gregorian algorithm" from wikipedia's Computus article.

You might ask yourself: "how does it work?". Frankly, I don't know. Much like a tax form, I treat the calculation as a black box. But, it's wikipedia, so it must be right, right?

A Javascript snippet to calculate easter day


I also transcribed the algorithm to javascript, so I could use it in Kettle (a.k.a. Pentaho Data Integration). Of course, nothing should stop you from using it for another environment, such as a webpage.

I don't have a proper place to host that code, so I'm listing it here:

//Script to calculate Easter day
//according to the "Anonymous Gregorian algorithm"
function easterDay(year) {
var a = year % 19,
b = Math.floor(year / 100),
c = year % 100,
d = Math.floor(b / 4),
e = b % 4,
f = Math.floor((b + 8) / 25),
g = Math.floor((b - f + 1) / 3),
h = (19 * a + b - d - g + 15) % 30,
i = Math.floor(c / 4),
k = c % 4,
L = (32 + 2 * e + 2 * i - h - k) % 7,
m = Math.floor((a + 11 * h + 22 * L) / 451),
n = h + L - 7 * m + 114;
return new Date(year,
Math.floor(n / 31) - 1,
(n % 31) + 1);
}

easter = easterDay(year);

To use this in your kettle transformations, create a stream with an field of the Integer type called year. The year field should take on the value of some year. In the step, create one output field of the Date type to take on the value of the easter script variable. (For usage outside kettle, just use the easterDay() function as you see fit)

Nice, but so what?


The thought may have crossed your mind: "So what, who cares - why should I ever want to know when it's easter day?"

Apparently, if you think like that, you don't like eggs very much. That's ok - I don't blame you. But I happen to like eggs, and people in the egg business like people that like eggs like me so they can sell them more eggs. In fact, they like selling eggs so much, that it makes a big difference to them whether their business intelligence reports say: "On March 22, 2008, we sold 10 times more eggs than on February 22 and May 22 of the same year" as compared to "In 2008, on the day before Easter, we only sold half the amount of eggs as compared to the day before Easter in 2009".

In order to report these facts, special events and holidays like easter are stored in a date dimension. (I wrote about creating a localized date dimension, a date dimension that speaks your language some time ago)

So there you go: you could use these solutions in order to build a date dimension that understands easter. The nice thing about easter is that it can be used to derive a whole bunch of other Christian holidays, like good friday, ascension, and pentecost, and in many western countries, these will be special days with regard to the normal course of business. I leave all these as an exercise to the reader, but trust me - calculating easter is the key to a solving a lot of these problems.

PlanetMySQL Voting: Vote UP / Vote DOWN

Easter Eggs for MySQL and Kettle

Январь 27th, 2010
To whom it may concern,

A MySQL stored function to calculate easter day


I uploaded a MySQL forge snippet for the f_easter() function. You can use this function in MySQL statements to calculate easter sunday for any given year:

mysql> select f_easter(year(now()));
+-----------------------+
| f_easter(year(now())) |
+-----------------------+
| 2010-04-04 |
+-----------------------+
1 row in set (0.00 sec)

Anonymous Gregorian algorithm


To implement it, I simply transcribed the code of the "Anonymous Gregorian algorithm" from wikipedia's Computus article.

You might ask yourself: "how does it work?". Frankly, I don't know. Much like a tax form, I treat the calculation as a black box. But, it's wikipedia, so it must be right, right?

A Javascript snippet to calculate easter day


I also transcribed the algorithm to javascript, so I could use it in Kettle (a.k.a. Pentaho Data Integration). Of course, nothing should stop you from using it for another environment, such as a webpage.

I don't have a proper place to host that code, so I'm listing it here:

//Script to calculate Easter day
//according to the "Anonymous Gregorian algorithm"
function easterDay(year) {
var a = year % 19,
b = Math.floor(year / 100),
c = year % 100,
d = Math.floor(b / 4),
e = b % 4,
f = Math.floor((b + 8) / 25),
g = Math.floor((b - f + 1) / 3),
h = (19 * a + b - d - g + 15) % 30,
i = Math.floor(c / 4),
k = c % 4,
L = (32 + 2 * e + 2 * i - h - k) % 7,
m = Math.floor((a + 11 * h + 22 * L) / 451),
n = h + L - 7 * m + 114;
return new Date(year,
Math.floor(n / 31) - 1,
(n % 31) + 1);
}

easter = easterDay(year);

To use this in your kettle transformations, create a stream with an field of the Integer type called year. The year field should take on the value of some year. In the step, create one output field of the Date type to take on the value of the easter script variable. (For usage outside kettle, just use the easterDay() function as you see fit)

Nice, but so what?


The thought may have crossed your mind: "So what, who cares - why should I ever want to know when it's easter day?"

Apparently, if you think like that, you don't like eggs very much. That's ok - I don't blame you. But I happen to like eggs, and people in the egg business like people that like eggs like me so they can sell them more eggs. In fact, they like selling eggs so much, that it makes a big difference to them whether their business intelligence reports say: "On March 22, 2008, we sold 10 times more eggs than on February 22 and May 22 of the same year" as compared to "In 2008, on the day before Easter, we only sold half the amount of eggs as compared to the day before Easter in 2009".

In order to report these facts, special events and holidays like easter are stored in a date dimension. (I wrote about creating a localized date dimension, a date dimension that speaks your language some time ago)

So there you go: you could use these solutions in order to build a date dimension that understands easter. The nice thing about easter is that it can be used to derive a whole bunch of other Christian holidays, like good friday, ascension, and pentecost, and in many western countries, these will be special days with regard to the normal course of business. I leave all these as an exercise to the reader, but trust me - calculating easter is the key to a solving a lot of these problems.

PlanetMySQL Voting: Vote UP / Vote DOWN

Calpont opens up: InfiniDB Open Source Analytical Database (based on MySQL)

Октябрь 27th, 2009
Open source business intelligence and data warehousing are on the rise!

If you kept up with the MySQL Performance Blog, you might have noticed a number of posts comparing the open source analytical databases Infobright, LucidDB, and MonetDB. LucidDB got some more news last week when Nick Goodman announced that the Dynamo Business Intelligence Corporation will be offering services around LucidDB, branding it as DynamoDB.

Now, to top if off, Calpont has just released InfiniDB, a GPLv2 open source version of its analytical database offering, which is based on the MySQL server.

So, let's take a quick look at InfiniDB. I haven't yet played around with it, but the features sure look interesting:

  • Column-oriented architecture (like all other analytical database products mentioned)

  • Transparent compression

  • Vertical and horizontal partitioning: on top of being column-oriented, data is also partitioned, potentially allowing for less IO to access data.

  • MVCC and support for high concurrency. It would be interesting to see how much benefit this gives when loading data, because this is usually one of the bottle necks for column-oriented databases

  • Support for ACID/Transactions

  • High performance bulkloader

  • No specialized hardware - InfiniDB is a pure software solution that can run on commidity hardware

  • MySQL compatible


The website sums up a few more features and benefits, but I think this covers the most important ones.

Calpont also offers a closed source enterprise edition, which differs from the open source by offering support for multi-node scale-out support. By that, they do not mean regular MySQL replication scale-out. Instead, the enterprise edition features a true distributed database architecture which allows you to divide incoming requests across a layer of so-called "user modules" (MySQL front ends) and "performance modules" (the actual workhorses that partition, retrieve and cache data). In this scenario, the user modules break the queries they recieve from client applications into pieces, and send them to one or more performance modules in a parallel fashion. The performance modules then retrieve the actual data from either their cache, or from the disk, and sends those back to the user modules which re-assemble the partial and intermediate results to the final resultset which is sent back to the client. (see picture)
shared-disk-arch-simple
Given the MySQL compatibility and otherwise similar features, I think it is fair to compare the open source InfiniDB offering to the Infobright community edition. Interesting differences are that InfiniDB supports all usual DML statements (INSERT, DELETE, UPDATE), and that InfiniDB offers the same bulkloader in both the community edition as well as the enterprise edition: Infobright community edition does not support DML, and offers a bulk loader that is less performant than the one included in its enterprise edition. I have not heard of an InfoBright multi-node option, so when comparing the enterprise edition featuresets, that seems like an advantage too in Calpont's offering.

Please understand that I am not endorsing one of these products over the other: I'm just doing a checkbox feature list comparison here. What it mostly boils down to, is that users that need an affordable analytical database now have even more choice than before. In addition, it adds a bit more competition for the vendors, and I expect them all to improve as a result of that. These are interesting times for the BI and data warehousing market :)

PlanetMySQL Voting: Vote UP / Vote DOWN

Some scaling observations on Infobright

Октябрь 3rd, 2009

A couple of days ago, Baron Schwartz posted some simple load and select benchmarking of MyISAM, Infobright and MonetDB, which Vadim Tkachenko followed up with a more realistic dataset and interesting figures where MonetDB beat Infobright in most queries.

Used to the parallel IEE loader, I was surprised by the apparent slow loading speed of Baron's benchmark and decided to try and replicate it. I installed Infobright 3.2 on my laptop (see, this is very unscientific) and wrote a simple perl script to generate and load an arbitrarily large data set resembling Baron's description. I'm not going to post my exact numbers, because this installation is severely resource-constrained below Infobright's recommended smallest installation. However, you can reproduce the results yourself with the attached script, and I will note some observations.

First of all, this was run on a 1.8GHz Core 2 Duo 2GB RAM laptop running a 64-bit kernel and 64-bit ICE. I stopped most other programs for the duration of the test, but was still running Fedora 11's GNOME Desktop, and gave ICE only 400MB main heap, 200MB compressed heap and 300MB loader heap. What I found:

  • Loading speed is almost a linear function of table width. Every time I doubled the number of (random integer) columns in the loaded table, loading speed approximately halved. At 200 columns, I was seeing approximately the same speed as Baron.
  • On the other hand, loading speed is NOT affected by the number of rows loaded, from which one could assume that it won't be affected by the size of the pre-existing table either (though I did not test this).
  • SELECT speed is not affected by the number of columns in the table, unless those columns are being selected or used in the where constraint. The same select executed in the same speed regardless of whether the source table had 10, 100 or 500 columns which were not being inspected.
  • Loading order, column sorting and select constraints are strongly correlated. Limiting a query with a random-value column causes a query which reads 10%, 50% or 90% of the table to execute in approximately 1x, 2x and 3x time. On the other hand, replacing the random-value column in the where constraint with a sorted-value (in load order) column makes queries accessing the same 10%, 50% or 90% of the rows run in nearly constant time. This is the rough set "knowledge grid" in action, but only works if the aggregations done are sum(), min(), max() or other simple functions supported by the grid.
  • With large dataset (I stopped at 50x heap space), the constant scale factor in the previous query starts to deteriorate, as the entire knowledge grid no longer fits in heap, and inspecting 90% of it vs 10% will require I/O. At this point the scale is beginning to resemble a realistic production setting, as few people are able to host even 5% of their ADBMS working set in-memory.
  • So this is where things get really interesting, and performance characteristics shift around. Random-column constrained queries of 10%, 50% and 90% of rows now run at 1x, 1.1x and 1.3x time, while constant-column constraints execute at 1x, 1.3x and 3x their respective performance, but approximately 3000x faster than the random-constraints!
  • The last point shows why it really matters to Infobright workloads that the most frequent queries are taken into account when deciding the load order for the data set. Even when designing an incremental ETL process, it can pay off immensely to pre-sort the incremental data sets by the most likely constraint or group by columns to allow blocks of data (64k rows each) to be included or excluded for particular query plans.
  • Why is the random-column constrained queries executing in almost constant time regardless of the rows inspected? Because Infobright's columnar datapack engine accesses 64k rows at a time, it's nearly as expensive to access every tenth row as it is to access nine rows out of every ten, if the distribution of those rows is even across the data set. On the other hand, if the 10% of rows needed are clustered together, then the other 90% of the data set is skipped very early in the query optimization process. A traditional btree-indexed data set would still require a random sweep over most of the index, which would be much larger than Infobright's "knowledge grid" is.
That's it for now. If you decide to run your own test using the script, please post comments. It should run as-is on a machine with the unmodified ICE 3.2 installation and basic Perl packages available. The script takes two arguments: number of columns (each a random integer value), and number of rows, and generates the data set into a named pipe though which it's loaded into a local table created automatically.

PlanetMySQL Voting: Vote UP / Vote DOWN

A peek under the hood in Infobright 3.2 storage engine

Сентябрь 21st, 2009

I've been meaning to post some real-world data on the performance of the Infobright 3.2 release which happened a few weeks ago after an extended release candidate period. We're just preparing our upgrades now, so I don't have any performance notes over significant data sets or complicated queries to post quite yet.

To make up for that, I decided to address a particular annoyance of mine in the community edition, first because it hadn't been addressed in the 3.2 release (and really, I'm hoping doing this would include it into 3.2.1), and second, simply because the engine being open source means I can. I feel being OSS is one of Infobright's biggest strengths, in addition to being a pretty amazing piece of performance for such a simple, undemanding package in general, and not making use of that would be shame. Read on for details.

The annoyance? It's pretty difficult to tell, as a user, what the engine is doing while it's running queries. EXPLAIN isn't hooked up and falls back to the general MySQL code path (which, due to the storage engine not exporting any index information, simply thinks any query will be a full table scan). SHOW PROCESSLIST status data on every query simply says "init" for the entire duration of the queries, which could be minutes at a time. It does write quite a lot of detailed information into an optional debug log, but that's on the database server, inaccessible to the user and application, as well as being rather hard to read.

Fortunately, the existence of those debug statements meant it was very easy to find the places into which I could insert some status instrumentation for the process list. This is certainly not perfect - this doesn't help telling about execution paths before running a query, and the convention for process list status is far more terse than what the debug output of the engine could produce. I could have simply copied the same detail level into the process list, but that doesn't seem to be the norm in MySQL engines, and assuming that Infobright will later include the SHOW PROFILES feature, would not be helpful anyway.

The patch is below (or download it as raw text), and it applies on top of 3.2 src package downloadable at the Infobright.org site. Builds with 'make EDITION=community release' and works for me, but use this at your own risk. Please do post notes and comments, though, I'd be interested to hear about other users. I'm sure the patch could be much improved, too.

Now, what would be really interesting was if the debug log's information of the knowledge grid evaluation could be turned into EXPLAIN output, but that would require more understanding of MySQL internals than what I have...

This was the first time I looked at the source code for Infobright, and the second or third time I did so for MySQL in general. ICE is pretty impressive also in its techniques, not only being the only integrated columnar engine, but also having more join strategies than other engines I've used, and so forth. The code is tough to follow though, and the source package included a huge amount of unused stuff, like a copy of both the InnoDB and NDB storage engines, neither of which is built from the code base. I guess a bit of clean-up would make this somewhat more approachable..

diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/JoinerGeneral.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/JoinerGeneral.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/JoinerGeneral.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/JoinerGeneral.cpp    2009-09-18 12:13:23.001506795 +0300
@@ -26,6 +26,7 @@
         mind->Empty();
         return no_desc;            // all done
     }
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "joining");
     //if(desc[0].val1.vc->IsConst() && desc[0].val2.vc == NULL) {
     //    // Special case: if there is a chance for one-dimensional filtering, execute one condition only.
     //    no_desc = 1;
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/JoinerHash.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/JoinerHash.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/JoinerHash.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/JoinerHash.cpp    2009-09-18 15:31:15.263505407 +0300
@@ -55,6 +55,7 @@
     /////////////////// Prepare all descriptor information /////////////
     // TODO: prepare a common language for both joined columns, if not compatible
     bool first_found = true;
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "hash join");
     DimensionVector dims1(mind);        // Initial dimension descriptions
     DimensionVector dims2(mind);
     for(int i = 0; i < desc.size(); i++) {
@@ -203,6 +204,7 @@
 
     _int64 hash_row = 0;                // hash_row = 0, otherwise deadlock for null on the first position
     _int64 traversed_rows = 0;
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "hash join traverse");
     while(mit.IsValid()) {
         if(m_conn.killed())
             throw KilledRCException();
@@ -247,6 +249,7 @@
     int no_of_matching_rows;
     MIIterator mit(mind, matched_dims);
     MIDummyIterator combined_mit(mind);        // a combined iterator for checking non-hashed conditions, if any
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "hash join tuples");
     while(mit.IsValid()) {
         if(m_conn.killed())
             throw KilledRCException();
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/JoinerLoop.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/JoinerLoop.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/JoinerLoop.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/JoinerLoop.cpp    2009-09-18 12:12:46.056444175 +0300
@@ -34,6 +34,8 @@
     int cur_dim1, cur_dim2;
     int attr1, attr2;
 
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "loop join");
+
     ParseDescriptor( desc[0], cur_t1, cur_t2, cur_dim1, cur_dim2, attr1, attr2, loc_op );
 
     //////////////////////////////////////////////////////////////////////////////////
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/JoinerSort.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/JoinerSort.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/JoinerSort.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/JoinerSort.cpp    2009-09-18 15:32:01.921256944 +0300
@@ -29,6 +29,7 @@
     VirtualColumn *vc2 = desc[0].val1.vc;
     dim1 = vc1->GetDim();
     dim2 = vc2->GetDim();
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "sort join");
     // The only supported cases (for now):
     if(dim1 == -1 || dim2 == -1 ||            // one-dim only
         mind->GetFilter(dim1) == NULL ||
@@ -128,6 +129,8 @@
     s1.Lock();
     s2.Lock();
 
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "sort join apply");
+
     MINewContents new_mind(mind);
     new_mind.SetDimension(dim1);
     new_mind.SetDimension(dim2);
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/MIRoughSorter.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/MIRoughSorter.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/MIRoughSorter.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/MIRoughSorter.cpp    2009-09-16 23:17:07.487972507 +0300
@@ -77,6 +77,7 @@
 
     ///////////////////////// the main sorting loop through bigblocks /////////
     if(sorting_needed)    {
+        thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "sorting roughly");
         rccontrol.lock(mind->m_conn->GetThreadID()) << "Sorting roughly multiindex..." << unlock;
         _int64 start_tuple = 0;
         _int64 stop_tuple = 0;
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/MIUpdatingIterator.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/MIUpdatingIterator.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/MIUpdatingIterator.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/MIUpdatingIterator.cpp    2009-09-18 15:12:03.293257577 +0300
@@ -116,6 +116,7 @@
 {
     if(!changed)
         return;
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "commit");
     if(one_dim_filter) {
         one_dim_filter->Commit();        // working directly on multiindex filer (special case)
         mind->UpdateNoTuples();
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/MultiSorter.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/MultiSorter.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/MultiSorter.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/MultiSorter.cpp    2009-09-18 11:31:50.202579444 +0300
@@ -372,6 +372,7 @@
     }
     else
         rccontrol.lock(m_conn->GetThreadID()) << "Sorting " << no_obj << " rows..." << unlock;
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "sorting");
     if(max_rate < cur_rate)
         max_rate = cur_rate;
     int byte_ind = 4;            // no. of bytes to encode row index (4 or 8)
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/Query_exeq_low.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/Query_exeq_low.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/Query_exeq_low.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/Query_exeq_low.cpp    2009-09-18 11:59:48.065253205 +0300
@@ -553,6 +553,7 @@
     ////////////////////////////////////////////////////////////////////////
     if(desc.size() < 1)
         return;
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "preparing");
     DelayWhereConditions(desc);
     SyntacticalDescriptorListPreprocessing(desc, mind, table);
 
@@ -619,6 +620,8 @@
         return;
     }
 
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "executing");
+    
     ///////////////// Apply all one-dimensional filters (after where, i.e. without outer joins)
     for(uint i = 0; i < desc.size(); i++)
         if(!desc[i].done && desc[i].IsInner() && !desc[i].IsType_Join() && !desc[i].IsDelayed()) {
@@ -655,6 +658,7 @@
     }
 
     /////////////////////////////////////////////////////////////////////////////////////
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "joining");
     DescriptorJoinOrdering(desc, mind);
 
     ///// descriptor display for joins
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/Query_optimize_RS.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/Query_optimize_RS.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/Query_optimize_RS.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/Query_optimize_RS.cpp    2009-09-18 11:27:55.040256644 +0300
@@ -104,6 +104,7 @@
                                   MultiIndex &mind,
                                   vector<Descriptor> &desc)
 {
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "evaluating P2P");
     bool is_nonempty = true;
     // init by previous values of mind (if any nontrivial)
     for(int i = 0; i < mind.NoDimensions(); i++) {
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/RCEngine_results.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/RCEngine_results.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/RCEngine_results.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/RCEngine_results.cpp    2009-09-16 23:02:02.200221840 +0300
@@ -37,7 +37,7 @@
 void RCEngine::SendResults(na::DataSource* exectree, THD* thd, select_result *res, List<Item> &fields)
 {
     int error = 0;
-    thd->proc_info="Sending data";
+    thd_proc_info(thd,"Sending data");
     DBUG_PRINT("info", ("%s", thd->proc_info));
 
     res->send_fields(fields, Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF);
@@ -136,7 +136,7 @@
 void RCEngine::SendResults(JustATable& results, THD* thd, select_result *res, List<Item> &fields, ConnectionInfo *conn)
 {
     int error = 0;
-    thd->proc_info="Sending data";
+    thd_proc_info(thd,"Sending data");
     DBUG_PRINT("info", ("%s", thd->proc_info));
 
     res->send_fields(fields, Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF);
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/RoughJoinWatcher.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/RoughJoinWatcher.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/RoughJoinWatcher.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/RoughJoinWatcher.cpp    2009-09-16 23:18:25.821225774 +0300
@@ -184,6 +184,7 @@
     //        - after checking all the result:
     //            - if still potentially_excluded => set as non-intersecting and up to date
 
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "updating P2P");
     rccontrol.lock(mind.m_conn->GetThreadID()) << "Updating P2P..." << unlock;
     anything_to_update = false;
     _int64 pairs_already_updated = 0;
diff -ur infobright-3.2-x86_64src/src/storage/brighthouse/core/TempTable_aggregate.cpp infobright-3.2-x86_64src.new/src/storage/brighthouse/core/TempTable_aggregate.cpp
--- infobright-3.2-x86_64src/src/storage/brighthouse/core/TempTable_aggregate.cpp    2009-08-26 21:26:43.000000000 +0300
+++ infobright-3.2-x86_64src.new/src/storage/brighthouse/core/TempTable_aggregate.cpp    2009-09-18 11:30:24.149255211 +0300
@@ -65,6 +65,7 @@
     ::Filter tuple_left(mit.NoTuples());
     tuple_left.Set();
     gbw.SetDistinctTuples(tuple_left.NoObj());
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "aggregating");
     do {
         if(rccontrol.isOn())  {
             if(upper_approx_of_groups == 1)
@@ -222,6 +223,7 @@
 void TempTable::MultiDimensionalDistinctScan(GroupByWrapper& gbw, DimensionVector &dims)
 {
     MEASURE_FET("TempTable::MultiDimensionalDistinctScan(GroupByWrapper& gbw)");
+    thd_proc_info(&ConnectionInfoOnTLS.Get().Thd(), "Distinct scan");
     while(gbw.AnyOmittedByDistinct()) {    /////////// any distincts omitted? => another pass needed
         ///// Some displays
         _int64 max_size_for_display = 0;


PlanetMySQL Voting: Vote UP / Vote DOWN

EU Should Protect MySQL-based Special Purpose Database Vendors

Сентябрь 12th, 2009
In my recent post on the EU antitrust regulators' probe into the Oracle Sun merger I did not mention an important class of stakeholders: the MySQL-based special purpose database startups. By these I mean:

I think it's safe to say the first three are comparable in the sense that they are all analytical databases: they are designed for data warehousing and business intelligence applications. ScaleDB might be a good fit for those applications, but I think it's architecture is sufficiently different from the first three to not call it an analytical database.

For Kickfire and Infobright, the selling point is that they are offering a relatively cheap solution to build large data warehouses and responsive business intelligence applications. (I can't really find enough information on Calpoint pricing, although they do mention low total cost of ownership.) An extra selling point is that they are MySQL compatible, which may make some difference for some customers. But that compatibility is in my opinion not as important as the availability of a serious data warehousing solution at a really sharp price.

Now, in my previous post, I mentioned that the MySQL and Oracle RDBMS products are very different, and I do not perceive them as competing. Instead of trying to kill the plain MySQL database server product, Oracle should take advantage of a huge opportunity to help shape the web by being a good steward, leading ongoing MySQL development, and in addition, enable their current Oracle Enterprise customers to build cheap LAMP-based websites (with the possibility of adding value by offering Oracle to MySQL data integration).

For these analytical database solutions, things may be different though.

I think these MySQL based analytical databases really are competitive to Oracle's Exadata analytical appliance. Oracle could form a serious threat to these MySQL-based analytical database vendors. After the merger, Oracle would certainly be in a position to hamper these vendors by resticting the non-GPL licensed usage of MySQL.
In a recent ad, Oracle vouched to increase investments in developing Sun's hardware and operating system technology. And this would eventually put them in an even better position to create appliances like Exadata, allowing them to ditch an external hardware partner like HP (which is their Exadata hardware partner).

So, all in all, in my opinion the EU should definitely take a serious look at the dynamics of the analytical database market and decide how much impact the Oracle / Sun merger could have on this particular class of MySQL OEM customers. The rise of these relatvely cheap MySQL-based analytical databases is a very interesting development for the business intelligence and data warehousing space in general, and means a big win for customers that need affordable datawarhousing / business intelligence. It would be a shame if it would be curtailed by Oracle. After the merger, Oracle sure would have the means and the motive, so if someone needs protection, I think it would be these MySQL-based vendors of analytical databases.

As always, these are just my musing and opinions - speculation is free. Feel free to correct me, add applause or point out my ignorance :)

PlanetMySQL Voting: Vote UP / Vote DOWN