Archive for the ‘InnoDB’ Category

How long Innodb Shutdown may take

Сентябрь 3rd, 2010

How long it may take MySQL with Innodb tables to shut down ? It can be quite a while.
In default configuration innodb_fast_shutdown=ON the main job Innodb has to do to complete shutdown is flushing dirty buffers. The number of dirty buffers in the buffer pool varies depending on innodb_max_dirty_pages_pct as well as workload and innodb_log_buffer_size and can be anywhere from 10 to 90% in the real life workloads. Innodb_buffer_pool_pages_dirty status will show you the actual data. Now the flush speed also depends on number of factors. First it is your storage configuration – you may be looking at less than 200 writes/sec for single entry level hard drive to tens of thousands of writes/sec for high end SSD card. Flushing can be done using multiple threads (in XtraDB and Innodb Plugin at least) so it scales well with multiple hard drives. The second important variable is your workload, especially how dirty pages would line up on the hard drive. If there are a lot of sequential pages which are dirty Innodb will be able to use larger size IOs – up to 1MB flushing dirty pages which can be a lot faster than flushing data page by page.

So if we have system with single hard drive doing 200 IO/ssc, 48G buffer pool which is 90% dirty and completely random page writes we’ll look at 13500 seconds or about 5min per 1GB of Buffer pool size.
This is worse case scenario though it is quite common in practice to see shutdown time of about 1min per GB of buffer pool per hard drive.

Baron has written a nice post how to decrease innodb shutdown time which you may want to read on this topic.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Free webinar on MySQL performance this Thursday

Август 23rd, 2010

ODTUG invited me to give a webinar and I said yes, so this Thursday you’re invited to join me as I talk about MySQL performance. We’ve come a very long way towards a MySQL that can perform well on modern hardware, and there really isn’t broad recognition of this. A lot of the best work has gone into the InnoDB “plugin” storage engine, which was announced after my co-authors and I sent High Performance MySQL to the press. I will explain what you should be doing differently now than you did two years ago, and suggest a performance-in-a-nutshell configuration baseline for MySQL that’s quite different from what I’d have said in 2008. You can register for free through GoToWebinar. See you there.

Related posts:

  1. Get a free sample chapter of High Performance MySQL Second Edition
  2. MySQL: Free Software but not Open Source
  3. Speaking at NovaRUG on Thursday
  4. High Performance MySQL is going to press, again
  5. Coming soon: High Performance MySQL, Second Edition


PlanetMySQL Voting: Vote UP / Vote DOWN

Improving InnoDB Transaction Reporting

Август 18th, 2010

Everybody knows that parsing the output of SHOW ENGINE INNODB STATUS is hard, especially when you want to track the information historically, or want to aggregate any of the more dynamic sections such as the TRANSACTIONS one.

Within the InnoDB plugin the INFORMATION_SCHEMA.INNODB_TRX table was added, which allowed you to at least get some of the information on each transaction, but not the full breadth of of information that SHOW ENGINE INNODB STATUS provided.

“This is nice..” I thought “..but why not go the whole hog..?”.. And so I set about doing that, and opened up Bug#53336. In a very short time, I was in a review process with the InnoDB team, the patch was cleaned up, and (after a little hiccup) everything has been pushed for the next milestone.

Here’s the docs notes that I wrote for it today (with a little more annotation):

This patch adds the following columns:

mysql> DESC innodb_trx;
+----------------------------+---------------------+------+-----+---------------------+-------+
| FIELD                      | Type                | NULL | KEY | DEFAULT             | Extra |
+----------------------------+---------------------+------+-----+---------------------+-------+
...
| trx_operation_state        | varchar(64)         | YES  |     | NULL                |       |
| trx_tables_in_use          | bigint(21) UNSIGNED | NO   |     | 0                   |       |
| trx_tables_locked          | bigint(21) UNSIGNED | NO   |     | 0                   |       |
| trx_lock_structs           | bigint(21) UNSIGNED | NO   |     | 0                   |       |
| trx_lock_memory_bytes      | bigint(21) UNSIGNED | NO   |     | 0                   |       |
| trx_rows_locked            | bigint(21) UNSIGNED | NO   |     | 0                   |       |
| trx_rows_modified          | bigint(21) UNSIGNED | NO   |     | 0                   |       |
| trx_concurrency_tickets    | bigint(21) UNSIGNED | NO   |     | 0                   |       |
| trx_isolation_level        | varchar(16)         | NO   |     |                     |       |
| trx_unique_checks          | int(1)              | NO   |     | 0                   |       |
| trx_foreign_key_checks     | int(1)              | NO   |     | 0                   |       |
| trx_last_foreign_key_error | varchar(256)        | YES  |     | NULL                |       |
| trx_adaptive_hash_latched  | int(1)              | NO   |     | 0                   |       |
| trx_adaptive_hash_timeout  | bigint(21) UNSIGNED | NO   |     | 0                   |       |
+----------------------------+---------------------+------+-----+---------------------+-------+
22 rows IN SET (0.05 sec)

Most of these are self explanatory, a lot of them duplicate the information within SHOW ENGINE INNODB STATUS (so you now no longer need to parse the output of that to get complete transaction information).

Below are some of my notes. For comparison, here’s an example of a transaction from the SHOW ENGINE INNODB STATUS statement:

---TRANSACTION 517, ACTIVE 1 sec, OS thread id 2958520320 inserting
mysql TABLES IN USE 2, locked 2
189 LOCK struct(s), heap size 27968, 54389 row LOCK(s), undo log entries 2406
MySQL thread id 2, query id 36 localhost root Sending DATA
INSERT IGNORE INTO t1 (SELECT (i * rand())*100, 1 FROM t1)

trx_operation_state - corresponds to “inserting”, InnoDB’s internal transaction state
trx_tables_in_use - corresponds to “mysql tables in use 2″, the number of table locks requested by MySQL via external_lock()
trx_tables_locked - corresponds to “locked 2″, the number of actual table locks taken via external_lock()
trx_lock_structs - corresponds to “189 lock struct(s)”, the size of the lock struct list
trx_lock_memory_bytes - corresponds to “heap size 27968″, the number of bytes allocated to locks structs
trx_rows_locked - corresponds to “54389 row lock(s)”, an estimation of the number of rows locked (delete marked rows may make it imprecise)
trx_rows_modified - corresponds to “undo log entries 2406″, the number of rows modified in the transaction (inserted, updated, deleted)

Not seen in the transaction output above:

trx_concurrency_tickets - corresponds to “thread declared inside InnoDB 89″ for a transaction in SHOW ENGINE INNODB STATUS, the number of concurrency tickets remaining for the transaction when innodb_thread_concurrency != 0
trx_isolation_level - the transactions isolation level
trx_unique_checks - whether the transaction has “SET UNIQUE_CHECKS = 0″
trx_foreign_key_checks - whether the transaction has “SET FOREIGN_KEY_CHECKS = 0″
trx_last_foreign_key_error - if the last statement in the transaction resulted in an FK error, the error text is printed here
trx_adaptive_hash_latched - corresponds to “holds adaptive hash latch” being printed for a transaction in SHOW ENGINE INNODB STATUS
trx_adaptive_hash_timeout - when innodb_adapative_hash_index is enabled (default), statements that try to get the adapative hash latch spin 10000 (BTR_SEA_TIMEOUT) times, re-trying getting the adaptive hash latch, before giving up. lower numbers here for a lot of transactions may indicate contention on the adaptive hash latch

In summary - no more having to parse SHOW ENGINE INNODB STATUS output for transaction information, now you can just do it with a SQL (with all it’s aggregation goodness as well if you want to)!


PlanetMySQL Voting: Vote UP / Vote DOWN

HailDB 2.0.0 released!

Август 3rd, 2010

(Reposted from the HailDB Blog. See also the announcement on the Drizzle Blog.)
We’ve made our first HailDB release! We’ve decided to make this a very conservative release. Fixing some minor bugs, getting a lot of compiler warnings fixed and start to make the name change in the source from Embedded InnoDB to HailDB.

Migrating your software to use HailDB is really simple. In fact, for this release, it shouldn’t take more than 5 minutes.

Highlights of this release:

  • A lot of compiler warnings have been fixed.
  • The build system is now pandora-build.
  • some small bugs have been fixed
  • Header file is now haildb.h instead of innodb.h
  • We display “HailDB” instead of “Embedded InnoDB”
  • Library name is libhaildb instead of libinnodb
  • It is probably binary compatible with the last Embedded InnoDB release, but we don’t have explicit tests for that, so YMMV.

Check out the Launchpad page on 2.0.0 and you can download the tarball either from there or right here:

  • haildb-2.0.0.tar.gz
    MD5:  183b81bfe2303aed435cdc8babf11d2b
    SHA1:  065e6a2f2cb2949efd7b8f3ed664bc1ac655cd75

PlanetMySQL Voting: Vote UP / Vote DOWN

Why you can’t rely on a replica for disaster recovery

Август 1st, 2010

A couple of weeks ago one of my colleagues and I worked on a data corruption case that reminded me that sometimes people make unsafe assumptions without knowing it. This one involved SAN snapshotting that was unsafe.

In a nutshell, the client used SAN block-level replication to maintain a standby/failover MySQL system, and there was a failover that didn’t work; both the primary and fallback machine had identically corrupted data files. After running fsck on the replica, the InnoDB data files were entirely deleted.

When we arrived on the scene, there was a data directory with an 800+ GB data file, which we determined had been restored from a SAN snapshot. Accessing this file caused a number of errors, including warnings about accessing data outside of the partition boundaries. We were eventually able to coax the filesystem into truncating the data file back to a size that didn’t contain invalid pointers and could be read without errors on the filesystem level. From InnoDB’s point of view, though, it was still completely corrupted. The “InnoDB file” contained blocks of data that were obviously from other files, such as Python exception logs. The SAN snapshot was useless for practical purposes. (The client decided not to try to extract the data from the corrupted file, which we have specialized tools for doing. It’s an intensive process that costs a little money.)

The problem was that the filesystem was ext2, with no journaling and no consistency guarantees. A snapshot on the SAN is just the same as cutting the power to the machine — the block device is in an inconsistent state. A filesystem that can survive that has to ensure that it writes the data to the block device such that it can bring into a consistent state later. The techniques for doing this include things like ordered writes and meta-data journaling. But ext2 does not know how to do that. The data that’s seen by the SAN is some jumble of blocks that represents the most efficient way to transfer the changed blocks over the interconnect, without regard to logical consistency on the filesystem level.

Two things can help avoid such a disaster: 1) get qualified advice and 2) don’t trust the advice; backups and disaster recovery plans must be tested periodically.

This case illustrates an important point that I repeat often. The danger of using a replica as a backup is that data loss on the primary can affect the replica, too. This is true no matter what type of replication is being used. In this case it’s block-level SAN replication. DRBD would behave just the same way. At a higher level, MySQL replication has the same weakness. If you rely on a MySQL slave for a “backup,” you’ll be out of luck when someone accidentally runs DROP TABLE on your master. That statement will promptly replicate over and drop the table off your “backup.”

I still see people using a replica as a backup, and I know it’s just a matter of time before they lose data. In my experience, the types of errors that will propagate through replication are much more common than those that’ll be isolated to just one machine, such as hardware failures.


Entry posted by Baron Schwartz | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Speaking at OSCON 2010

Июль 21st, 2010

I am speaking in a few days at OSCON 2010 in Portland, Oregon. My talk, “MySQL Bottleneck Hunters – How Schooner Increased MySQL Performance by 8x” is about how Schooner optimized MySQL to run on modern hardware with flash memory. Come on by if you’re at OSCON!

Here’s the abstract:

MySQL Bottleneck Hunters – How Schooner Increased MySQL Performance by 8x — Thursday, July 22, 2010 at 11:30am

MySQL users have an insatiable need for speed, capacity, and availability, all at a reasonable cost. This session will provide technical overview of the approach that Schooner engineering took to optimize MySQL Enterprise and InnoDB with flash memory, multi-core processors, and DRAM to achieve an 8x improvement in performance relative to existing systems.

This session will provide a crash course in the performance tuning tools and techniques that Schooner used to radically improve MySQL performance on commodity hardware. It will also offer an overview of the results that were achieved, and compare these results to other commonly deployed MySQL server architectures, including stock MySQL on SSDs and Fusion-io, using the industry-standard DBT2 benchmark.


PlanetMySQL Voting: Vote UP / Vote DOWN

Analyzing the distribution of InnoDB log file writes

Июль 17th, 2010

I recently did a quick analysis of the distribution of writes to InnoDB’s log files. On a high-traffic commodity MySQL server running Percona XtraDB for a gaming workload (mostly inserts to the “moves” table), I used strace to gather statistics about how the log file writes are distributed in terms of write size. InnoDB writes to the log files in multiples of 512 bytes. Mark Callaghan explained this and some of its performance implications here. How many big writes does InnoDB do, and how many small writes?

First, I found out the file descriptor numbers of the log files:

# lsof -p $(pidof mysqld) | grep ib_log
mysqld  29772 mysql    8uW  REG                8,2   268435456   7143989 /var/lib/mysql/ib_logfile0
mysqld  29772 mysql    9uW  REG                8,2   268435456   7143993 /var/lib/mysql/ib_logfile1

The file descriptors are 8 and 9. We’ll need to capture writes to both of those; InnoDB round-robins through them. The following grabs the write sizes out of 100k calls to pwrite() and aggregates them:

# strace -f -p $(pidof mysqld) -e pwrite -s1 -xx 2>&1 \
   | grep 'pwrite([89],' |head -n 100000 \
   | awk '{writes[$5]++}END{for(w in writes){print w, " ", writes[w]}}'

I could have done it better with a little more shell scripting, but the output from this was enough for me to massage into a decent format with another step or two! Here is the final result:

bytes	count
512	44067
1024	30740
1536	15221
2048	7094
2560	1810
3072	570
3584	219
4096	112
4608	39
5120	23
5632	16
6144	15
6656	5
7168	3
7680	8
8192	2
8704	2
9216	1
9728	2
10240	1
10752	2
11264	1
11776	1
14848	1
15360	1
15872	2
16384	4
16896	4
17408	2
17920	2
18432	2
18944	8
19456	7
19968	4
20480	4
21504	1
22016	2
24064	1
40960	1

So, in sum, we see that about 3/4ths of InnoDB log file writes on this workload are 512 or 1024 bytes. (It might vary for other workloads.) Now, what does this actually mean? There are a lot of interesting and complex things to think about here and research further:

  • Most writes are less than 4k, but the operating system page size is 4k. If the page isn’t in the OS buffer cache, then the write will cause a read-around write — the page will have to be read, modified, and then written again, instead of just written. Vadim did some benchmarks earlier showing that the log files need to be in the OS cache for best performance (sorry, can’t find the link right now). This makes intuitive sense given that the writes are smaller than 4k.
  • This server has innodb_flush_log_at_trx_commit set to 2. This means each transaction does a log file write, the same as for a setting of 1. If it’s set to 0, the distribution may be quite different — the writes might accumulate and become much larger.
  • What happens if the log buffer is actually smaller than a transaction? That’s another topic for research. I don’t know off the top of my head.
  • Is the distribution meaningful for determining the necessary log buffer size? The largest write above is just under 40k, which might seem to indicate that only 64k or so of log buffer would be large enough — but is it really? We’d need to benchmark and see. Peter and I were just talking, trying to remember some complex behavior of how space in the buffer is reserved for writes. We could not quite recall what that is, but there is some subtlety. This needs more research. It’s possible that even if the largest write is quite small, a small log buffer size wouldn’t be good and could cause additional lock contention. (There is a single mutex around the log buffer.) We probably need to benchmark to learn more about this.

In the end, the distribution is a simple observation to make, but the InnoDB redo log system is intricate. It’s not something to guess about, but rather something to measure and study more deeply. Perhaps we can follow this up with some more benchmarks or observations under different InnoDB settings and different workloads. Or then again, maybe Yasufumi will read this post when he returns from vacation and already know all the answers by heart!


Entry posted by Baron Schwartz | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Intro to OLAP

Июль 12th, 2010

This is the first of a series of posts about business intelligence tools, particularly OLAP (or online analytical processing) tools using MySQL and other free open source software. OLAP tools are a part of the larger topic of business intelligence, a topic that has not had a lot of coverage on MPB. Because of this, I am going to start out talking about these topics in general, rather than getting right to gritty details of their performance.

I plan on covering the following topics:

  1. Introduction to OLAP and business intelligence. (this post)
  2. Identifying the differences between a data warehouse, and a data mart.
  3. Introduction to MDX queries and the kind of SQL which a ROLAP tool must generate to answer those queries.
  4. Performance challenges with larger databases, and some ways to help performance using aggregation.
  5. Using materialized views to automate that aggregation process.
  6. Comparing the performance of OLAP with and without aggregation over multiple MySQL storage engines at various data scales.

What is BI?
Chances are that you have heard the term business intelligence. Business intelligence (or BI) is a term which encompasses many different tools and methods for analyzing data, usually presenting it in a way that is easily consumed by upper management. This analysis is often used to determine how effectively the business has been at meeting certain performance goals, and to forecast how they will do in the future. To put it another way the tools are designed to provide insight about the business process, hence the name. Probably the most popular BI activity for web sites is click analysis.

As far as BI is concerned, this series of posts focuses on OLAP analysis and in a lesser sense, on data warehousing. Data warehouses often provide the information upon which OLAP analysis is performed, but more on this in post #2.

OLAP? What is that?
OLAP is an acronym which stands for online analytical processing. OLAP analysis, which is really just another name for multidimensional analysis, consists of displaying summary aggregations of the data broken down into different groups. A typical OLAP analysis might show “sale total, by year, by sales rep, by product category”. OLAP analysis is usually used for reporting on current data, looking at historical trends and trying to make predictions about future trends.

Multidimensional Analysis
Multidimensional analysis is a form of statistical analysis. In multidimensional analysis samples representing a particular measure are compared or broken down into different dimensions. For example, in a sales analysis, the “sale amount” is a measure. Measures are always aggregated values. That is, total sales might be expressed as SUM(sale_amt). This is because the SUM of the individual sales will be grouped along different dimensions, such as by year or by product. I’m getting a little ahead of myself. Before we talk about measures and dimensions, we should talk about the two ways in which this information can be stored.

There are two main ways to store multidimensional data for OLAP analysis
OLAP servers typically come in two basic flavors. Some servers have specialized data stores which store data in a form which is highly effective for multidimensional analysis. These servers are termed MOLAP and they tend to have exceptional performance due to their specialized data store. Almost all MOLAP solutions pre-compute many (or even all) of the possible answers to multi-dimensional queries. Palo is an example of an open source version of this technology. ESSbase is an example of closed source product. MOLAP servers often feature extensive compression of data which can improve performance. Loading data into a MOLAP server usually takes a very long time because many of the answers in the cube must be calculated. The extra time spent during the load is usually called “processing” time.

A relational OLAP (or ROLAP) server uses data stored in an RDBMS. These systems trade the performance of a multidimensional store for the convenience of an RDBMS. These servers almost always query over a database which is structured as a STAR or snowflake type schema. To go back to the sales analysis example above, in a STAR schema the facts about the sales would be stored in the fact table, and the list of customers and products would be stored in separate dimension tables. Some ROLAP servers support the aggregation of data into additional tables, and can use the tables automatically. These servers can approach the performance of MOLAP with the convenience of ROLAP, but there are still challenges with this approach. The biggest challenges are the amount of time that it takes to keep the tables updated and in the complexity of the many scripts or jobs which might be necessary to keep the tables in sync. Part five of my series will introduce materialized views which attempt to address these challenges in a manageable way.

What makes a ROLAP so great?
An OLAP server usually returns information to the user as a ‘pivot table‘ or ‘pivot report’. While you could create such a report in a spreadsheet, the ROLAP tool is designed to deal with millions or even billions of rows of data, much more than a spreadsheet can usually handle. MOLAP servers usually require that all, or almost all of the data must fit it memory. Another difference is the ease by which this analysis is constructed. You don’t necessarily have to write queries or drag and drop a report together in order to analyze multidimensional data using an OLAP tool.

Data before pivoting:
Example image from Wikimedia commons showing detail data for sales

Data summarized in pivot form:
Wikimedia commons image showing data summarized in pivot format

ROLAP tools use star schema
As I said before, a sale amount would be considered a measure, and it would usually be aggregated with SUM. The other information about the sale, such as the product, when it was sold and to whom it was sold would be defined in dimension tables. The fact table contains columns which are joined to the dimension tables, such as product_id and customer_id. These are often defined as foreign keys from the fact table to the dimension tables.

A note about degenerate dimensions:
Any values in the fact table that don’t join to dimensions are either considered degenerate dimensions or measures. In the example below the status of the order is a degenerate dimension. A degenerate dimension is stored as an ENUM in many cases. In the example below that there is no actual dimension table which includes the two different order statuses. Such a dimension would add an extra join, which is expensive. Any yes/no field and/or fields with a very low cardinality (such as gender or order status) will probably be stored in the fact table instead of in a dedicated dimension. In the “pivot data” example above, all the dimensions are degenerate: gender, region, style, date.

Star schema with degenerate dimension

Example star schema about sales.

Often a dimension will include redundant information to make reporting easier, a process called “denormalization”. Hierarchical information may be stored in a single dimension. For example, a dimension for products may include both the category AND a sub-category. A time dimension includes year, month and quarter. You can create multiple different hierarchies from a single dimension. This allows ‘drill down’ into the dimension. By default the data would be summarized by year, but you can drill down to quarter or month level aggregation.
Sample date hierarchy, showing quarter, month, year and day hierarchies.

The screenshots here in the jPivot (an OLAP cube browser) documentation can give you a better idea about the display of data. The examples break down sales by product, by category, and by region.

The information is presented in such a fashion that it can be “drilled into” and “filtered on” to provide an easy to use interface to the underlying data. Graphical display of the data as pie, line or bar charts is possible.

Focusing on ROLAP.
This is the MySQL performance blog, and as such an in depth discussion of MOLAP technology is not particularly warranted here. Our discussion will focus on Mondrian. Mondrian is an open source ROLAP server featuring an in-memory OLAP cache. Mondrian is part of the Pentaho open source business intelligence suite. Mondrian is also used by other projects such as Wabit and Jaspersoft. If you are using open source BI then you are probably already using Mondrian. Closed source ROLAP servers include Microstrategy, Microsoft Analysis Services and Oracle BI.

Mondrian speaks MDX, olap4j and XML for analysis. This means that there is a very high chance that your existing BI tools (if you have them) will work with it. MDX is a query language that looks similar to SQL but is actually very different. Olap4j is an OLAP interface for java applications. XML for analysis (XMLA) is an industry standard analytical interface originally created by Microsoft, SAS and Hyperion.

Whats next?
Next we’ll talk about the difference between data marts and data warehouses. The former are usually used for OLAP analysis, but they can be fundamentally related to a warehouse.


Entry posted by Justin Swanhart | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

Intro to OLAP

Июль 12th, 2010

This is the first of a series of posts about business intelligence tools, particularly OLAP (or online analytical processing) tools using MySQL and other free open source software. OLAP tools are a part of the larger topic of business intelligence, a topic that has not had a lot of coverage on MPB. Because of this, I am going to start out talking about these topics in general, rather than getting right to gritty details of their performance.

I plan on covering the following topics:

  1. Introduction to OLAP and business intelligence. (this post)
  2. Identifying the differences between a data warehouse, and a data mart.
  3. Introduction to MDX queries and the kind of SQL which a ROLAP tool must generate to answer those queries.
  4. Performance challenges with larger databases, and some ways to help performance using aggregation.
  5. Using materialized views to automate that aggregation process.
  6. Comparing the performance of OLAP with and without aggregation over multiple MySQL storage engines at various data scales.

What is BI?
Chances are that you have heard the term business intelligence. Business intelligence (or BI) is a term which encompasses many different tools and methods for analyzing data, usually presenting it in a way that is easily consumed by upper management. This analysis is often used to determine how effectively the business has been at meeting certain performance goals, and to forecast how they will do in the future. To put it another way the tools are designed to provide insight about the business process, hence the name. Probably the most popular BI activity for web sites is click analysis.

As far as BI is concerned, this series of posts focuses on OLAP analysis and in a lesser sense, on data warehousing. Data warehouses often provide the information upon which OLAP analysis is performed, but more on this in post #2.

OLAP? What is that?
OLAP is an acronym which stands for online analytical processing. OLAP analysis, which is really just another name for multidimensional analysis, consists of displaying summary aggregations of the data broken down into different groups. A typical OLAP analysis might show “sale total, by year, by sales rep, by product category”. OLAP analysis is usually used for reporting on current data, looking at historical trends and trying to make predictions about future trends.

Multidimensional Analysis
Multidimensional analysis is a form of statistical analysis. In multidimensional analysis samples representing a particular measure are compared or broken down into different dimensions. For example, in a sales analysis, the “sale amount” is a measure. Measures are always aggregated values. That is, total sales might be expressed as SUM(sale_amt). This is because the SUM of the individual sales will be grouped along different dimensions, such as by year or by product. I’m getting a little ahead of myself. Before we talk about measures and dimensions, we should talk about the two ways in which this information can be stored.

There are two main ways to store multidimensional data for OLAP analysis
OLAP servers typically come in two basic flavors. Some servers have specialized data stores which store data in a form which is highly effective for multidimensional analysis. These servers are termed MOLAP and they tend to have exceptional performance due to their specialized data store. Almost all MOLAP solutions pre-compute many (or even all) of the possible answers to multi-dimensional queries. Palo is an example of an open source version of this technology. ESSbase is an example of closed source product. MOLAP servers often feature extensive compression of data which can improve performance. Loading data into a MOLAP server usually takes a very long time because many of the answers in the cube must be calculated. The extra time spent during the load is usually called “processing” time.

A relational OLAP (or ROLAP) server uses data stored in an RDBMS. These systems trade the performance of a multidimensional store for the convenience of an RDBMS. These servers almost always query over a database which is structured as a STAR or snowflake type schema. To go back to the sales analysis example above, in a STAR schema the facts about the sales would be stored in the fact table, and the list of customers and products would be stored in separate dimension tables. Some ROLAP servers support the aggregation of data into additional tables, and can use the tables automatically. These servers can approach the performance of MOLAP with the convenience of ROLAP, but there are still challenges with this approach. The biggest challenges are the amount of time that it takes to keep the tables updated and in the complexity of the many scripts or jobs which might be necessary to keep the tables in sync. Part five of my series will introduce materialized views which attempt to address these challenges in a manageable way.

What makes a ROLAP so great?
An OLAP server usually returns information to the user as a ‘pivot table‘ or ‘pivot report’. While you could create such a report in a spreadsheet, the ROLAP tool is designed to deal with millions or even billions of rows of data, much more than a spreadsheet can usually handle. MOLAP servers usually require that all, or almost all of the data must fit it memory. Another difference is the ease by which this analysis is constructed. You don’t necessarily have to write queries or drag and drop a report together in order to analyze multidimensional data using an OLAP tool.

Data before pivoting:
Example image from Wikimedia commons showing detail data for sales

Data summarized in pivot form:
Wikimedia commons image showing data summarized in pivot format

ROLAP tools use star schema
As I said before, a sale amount would be considered a measure, and it would usually be aggregated with SUM. The other information about the sale, such as the product, when it was sold and to whom it was sold would be defined in dimension tables. The fact table contains columns which are joined to the dimension tables, such as product_id and customer_id. These are often defined as foreign keys from the fact table to the dimension tables.

A note about degenerate dimensions:
Any values in the fact table that don’t join to dimensions are either considered degenerate dimensions or measures. In the example below the status of the order is a degenerate dimension. A degenerate dimension is stored as an ENUM in many cases. In the example below that there is no actual dimension table which includes the two different order statuses. Such a dimension would add an extra join, which is expensive. Any yes/no field and/or fields with a very low cardinality (such as gender or order status) will probably be stored in the fact table instead of in a dedicated dimension. In the “pivot data” example above, all the dimensions are degenerate: gender, region, style, date.

Star schema with degenerate dimension

Example star schema about sales.

Often a dimension will include redundant information to make reporting easier, a process called “denormalization”. Hierarchical information may be stored in a single dimension. For example, a dimension for products may include both the category AND a sub-category. A time dimension includes year, month and quarter. You can create multiple different hierarchies from a single dimension. This allows ‘drill down’ into the dimension. By default the data would be summarized by year, but you can drill down to quarter or month level aggregation.
Sample date hierarchy, showing quarter, month, year and day hierarchies.

The screenshots here in the jPivot (an OLAP cube browser) documentation can give you a better idea about the display of data. The examples break down sales by product, by category, and by region.

The information is presented in such a fashion that it can be “drilled into” and “filtered on” to provide an easy to use interface to the underlying data. Graphical display of the data as pie, line or bar charts is possible.

Focusing on ROLAP.
This is the MySQL performance blog, and as such an in depth discussion of MOLAP technology is not particularly warranted here. Our discussion will focus on Mondrian. Mondrian is an open source ROLAP server featuring an in-memory OLAP cache. Mondrian is part of the Pentaho open source business intelligence suite. Mondrian is also used by other projects such as Wabit and Jaspersoft. If you are using open source BI then you are probably already using Mondrian. Closed source ROLAP servers include Microstrategy, Microsoft Analysis Services and Oracle BI.

Mondrian speaks MDX, olap4j and XML for analysis. This means that there is a very high chance that your existing BI tools (if you have them) will work with it. MDX is a query language that looks similar to SQL but is actually very different. Olap4j is an OLAP interface for java applications. XML for analysis (XMLA) is an industry standard analytical interface originally created by Microsoft, SAS and Hyperion.

Whats next?
Next we’ll talk about the difference between data marts and data warehouses. The former are usually used for OLAP analysis, but they can be fundamentally related to a warehouse.


Entry posted by Justin Swanhart | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

The new hotness in open-core: InnoDB

Июль 2nd, 2010

There’s lots of buzz lately about the so-called “open-core” business model of Marten Mickos’s new employer. But this is nothing new. Depending on how you define it, InnoDB is “open-core,” and has been for a long time. The InnoDB Hot Backup (ibbackup) tool was always closed-source. Did anyone ever cry foul and claim that this made InnoDB itself not open-source, or accuse Innobase / Oracle of masquerading as open-source? I don’t recall that happening, although sometimes people got suspicious about the interplay between the backup tool and the storage engine. Generally, though, the people I know who use InnoDB Hot Backup have no gripes about paying for it.

What is the difference between open-source with closed-source accessories, and crippleware? I think it depends on how people define the core functionality of software. Some might say that backup is core functionality for a database; and others would point to mysqldump and say that InnoDB isn’t crippleware as long as there is some alternative.

I think InnoDB is an interesting case that illustrates what can happen when commercial and GPL play together. Part of that story is the appearance of XtraBackup, an open-source competitor to InnoDB Hot Backup. Everyone’s subject to the rules of the game, unless they restrict the “core,” which would make it non-open-source to begin with.

Related posts:

  1. Does MySQL really have an open-source business model?
  2. MySQL: Free Software but not Open Source
  3. What does an open source sales model look like?
  4. Xtrabackup is for InnoDB tables too, not just XtraDB
  5. Making Maatkit more Open Source one step at a time


PlanetMySQL Voting: Vote UP / Vote DOWN