on primary keys

July 30th, 2010

5.1.46 has this change:

Performance: While looking for the shortest index for a covering index scan, the optimizer did not consider the full row length for a clustered primary key, as in InnoDB. Secondary covering indexes will now be preferred, making full table scans less likely.

In other words, if you have a covering index on * (which is quite common on m:n mapping tables), the optimizer will now use it rather than the PK. As I have spent my time getting indexing right, with PKs based on the primary access pattern and SKs on the secondary access pattern, I hereby do not welcome a change that suddenly reverses the behavior in a late GA version.

Not good when mysqldump queries end up taking 6 days instead of the previous half an hour; not good at all.



North Texans — vote for your new meeting location

July 30th, 2010
The North Texas MySQL Users Group is now a special interest group within the Dallas Oracle Users Group. As such, we can meet in Oracle's offices in Plano or Irving. In the past there has been demand for meetings in the northern part of the Metroplex and demand for meetings nearer the DFW Airport. So we can meet in either office or alternate between the two. Please state your preference by voting on the North Texas MySQL Users Group website.


Data Warehousing Best Practices: Comparing Oracle to MySQL, part 2 (partitioning)

July 30th, 2010

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

See part 1 for the introduction and the discussion of power and hardware. This part goes over the 2nd “P”, partitioning. Learning about Oracle’s partitioning has gotten me more interested in how MySQL’s partitioning works, and I do hope that MySQL partitioning will develop to the level of Oracle’s, because Oracle’s partitioning looks very nice (then again, that’s why it costs so much, I guess).

Partition: Larger tables or fact tables can benefit from partitioning because it makes data loads easier, can improve join performance, and enables data elimination. Parallel execution can be done with partitioning thanks to partition pruning. The degree of parallelism should be a power of 2, because of the hash-based algorithm in hash partitioning. To translate this to the MySQL world, if you are using LINEAR HASH partitioning, then you should use a degree of parallelism that is a power of 2 (I checked, and indeed LINEAR HASH uses a powers-of-two algorithm). Otherwise, use a degree of parallelism that makes sense given the number of partitions you have.

One important note: during Pythian’s testing of MySQL partitioning, we found that all partitions are locked when an INSERT occurs, for the duration of the INSERT. Bulk-loading with MySQL partitioning is not as fast as it would be if MySQL allowed partition pruning for INSERTs.
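To see why a power of 2 matters for LINEAR HASH, here is a minimal Python sketch of the bucket-selection algorithm as I understand it from the MySQL manual. The function names and the test values (integers 0 to 1199) are my own, chosen for illustration; this is not MySQL's actual code.

```python
from collections import Counter


def linear_hash_partition(value, num_partitions):
    """Pick a partition the way MySQL's LINEAR HASH is documented to (sketch)."""
    # v = smallest power of two that is >= num_partitions
    v = 1
    while v < num_partitions:
        v *= 2
    n = value & (v - 1)
    # Fold values that land past the last partition back into range
    while n >= num_partitions:
        v //= 2
        n &= v - 1
    return n


def distribution(num_partitions, values=range(1200)):
    """Count how many of the sample values land in each partition."""
    counts = Counter(linear_hash_partition(x, num_partitions) for x in values)
    return [counts[p] for p in range(num_partitions)]


if __name__ == "__main__":
    print(distribution(4))  # [300, 300, 300, 300]
    print(distribution(6))  # [150, 150, 300, 300, 150, 150]
```

With 4 partitions the rows spread evenly; with 6, two partitions end up with twice as many rows as the others, which is the imbalance the power-of-2 advice avoids.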

So, what should be partitioned? For the first level of partitioning, the goal is to enable partition pruning and simplify data management. The most typical partitioning is range or interval partitioning on a date column. With interval partitioning, you specify the interval (a day, a month) and new partitions are created automatically. MySQL does not have interval partitioning, and the typical first-level partitioning I have seen is range or list based on a date or timestamp column. Note that if you use a timestamp field, the partitioning expression is optimized if you use TO_DAYS(timestamp_field) or YEAR(timestamp_field). In my experience, using anything else (such as DATE(timestamp_field)) actually makes partitioning slower than not using partitioning at all. Note that this is based on tests I did a few months ago, and your mileage may vary.
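As a rough illustration of what partition pruning buys you, here is a small Python sketch that works out which monthly range partitions a date-range query would have to touch. The partition names and boundaries are made up for the example; the upper bounds are exclusive, in the spirit of VALUES LESS THAN.

```python
from datetime import date

# Hypothetical monthly range partitions: (name, exclusive upper bound)
PARTITIONS = [
    ("p2010_05", date(2010, 6, 1)),
    ("p2010_06", date(2010, 7, 1)),
    ("p2010_07", date(2010, 8, 1)),
    ("pmax", date.max),  # catch-all, standing in for MAXVALUE
]


def partitions_for_range(start, end):
    """Return the partitions that can hold rows with start <= d <= end."""
    touched = []
    lower = date.min  # inclusive lower bound of the current partition
    for name, upper in PARTITIONS:
        # The partition holds dates d with lower <= d < upper
        if start < upper and end >= lower:
            touched.append(name)
        lower = upper
    return touched


if __name__ == "__main__":
    # A query for one week in June only needs to read one partition
    print(partitions_for_range(date(2010, 6, 15), date(2010, 6, 20)))
```

A query that spans the end of June and the start of July would touch two partitions, and everything else is skipped.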

So — how do you decide partitioning strategy? Ask yourself:

  • What range of data do the queries touch – a quarter, a year?
  • What is the data loading frequency?
  • Is an incremental load required?
  • How much data is involved, a day, a week, a month?

The answers to the above questions will tell you how big your interval needs to be. The best scenario is that all answers are the same: “we load every day, and people query by day.” If the answers differ, give access a higher priority than loading, because most people care more about query performance than about ETL performance.

This is true even if your intervals have different sizes (i.e., sales per day are much bigger in December, but that’s OK). However, Maria recommends that the subpartitions be as evenly divided as possible.

It is easier to look at more partitions than to look at a partition that’s too big. But you don’t want too many partitions: the maximum Oracle allows is 1 million partitions; prior to 11g it was 64,000. “Stick closer to 64,000 than 1 million”. MySQL’s limit is 1024 per table.

For the second level of partitioning, also called subpartitioning, the goal is to allow for multi-level pruning and improve join performance. In Oracle, the most typical subpartition is hash or list – in MySQL, you can only subpartition by hash or key.

How do you decide subpartitioning strategy?

  • Select the dimension queried most frequently on the fact table OR
  • Pick the common join column

For example, if you want to look at sales per day, per store, you would choose “per day” as the partition and “per store” as the subpartition.

If you do not have a good partition on logical elements (like grouping), then you can subpartition using hash partitioning on common joins: perhaps surrogate keys, or the join key of the largest table involved in the join.

For example, if the sales table is partitioned and another big table is product, you can hash subpartition product_id.

Because there’s overhead in partitions (loading metadata, reading metadata), make sure the size of partitions and subpartitions is >20 MB. So it is better to have a 30 MB subpartition than a 15 MB subpartition. [I have no idea if this is true in MySQL or not -- I think the general concept is true, because there is some overhead, but I have no idea about the 20 MB figure and why that's true for Oracle, nor do I know what is true in MySQL.]

One easy calculation is to double the # of CPUs and round up to the nearest power of 2. If you’re executing in parallel, Oracle will use 2x CPUs. (All this advice, by the way, follows the 80/20 rule; it is probably good for about 80% of the environments out there.) Of course, MySQL does not do parallel execution very well, so this probably does not apply.
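The "double the CPUs, round up to a power of 2" rule of thumb is easy to write down. Here is a tiny Python sketch; the function name is mine, and it just encodes the arithmetic from the session notes.

```python
def suggested_buckets(cpu_count):
    """Double the CPU count, then round up to the nearest power of 2."""
    target = 2 * cpu_count
    power = 1
    while power < target:
        power *= 2
    return power


if __name__ == "__main__":
    for cpus in (4, 6, 12):
        print(cpus, "CPUs ->", suggested_buckets(cpus), "buckets")
```

So 4 CPUs gives 8 buckets, 6 CPUs gives 16 (12 rounded up), and 12 CPUs gives 32 (24 rounded up).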

Oracle knows it can get partition elimination while it does a join.

If 2 tables have the same degree of parallelism (same # of buckets) and are partitioned in the same way on the join column (say, customer_id in a subpartition of sales and a partition of customer), Oracle will match the partitions when joining:

sales table joined with customer table can change into 4 small joins:
sales sub part 1 joins with customer part 1
sales sub part 2 joins with customer part 2
sales sub part 3 joins with customer part 3
sales sub part 4 joins with customer part 4

And with parallelism, the total time is now reduced to the time it takes to do one of those smaller joins.
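To make the idea concrete, here is a minimal Python sketch of a partition-wise join. It assumes both tables are hashed into the same 4 buckets on customer_id; the bucket function, row layout and names are all made up for illustration, and this is not how Oracle implements it internally.

```python
NUM_BUCKETS = 4  # both tables must use the same number of buckets


def bucket_of(customer_id):
    """Toy hash function: the same function must be used for both tables."""
    return customer_id % NUM_BUCKETS


def partition_rows(rows, key):
    """Split rows into NUM_BUCKETS lists based on the join column."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[bucket_of(row[key])].append(row)
    return buckets


def partition_wise_join(sales, customers):
    """Join sales to customers one matching bucket pair at a time."""
    sales_parts = partition_rows(sales, "customer_id")
    customer_parts = partition_rows(customers, "customer_id")
    joined = []
    # Each iteration is an independent small join; with parallel
    # execution these could run at the same time.
    for b in range(NUM_BUCKETS):
        names = {c["customer_id"]: c["name"] for c in customer_parts[b]}
        for s in sales_parts[b]:
            if s["customer_id"] in names:
                joined.append((s["sale_id"], s["customer_id"], names[s["customer_id"]]))
    return joined
```

Because matching customer_id values always land in the same bucket number on both sides, bucket 1 of sales never needs to look at buckets 2 to 4 of customer.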

This is also why you want to have a power of 2 for buckets – because cores/processors come in powers of 2. Partition-wise joins like this can also be done with range or list, assuming both tables in the join have the same buckets.

I have no idea if MySQL partitioning works this way, but it’s certainly a functionality that makes sense to me.



Data Warehousing Best Practices: Comparing Oracle to MySQL, part 1 (introduction and power)

July 30th, 2010

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than hard-and-fast advice. In some places, I still have questions, and I am happy to have folks comment and contribute what they know.

One interesting point brought up:
Maria quoted someone (she said the name but I did not grab it) from Forrester saying, “3NF is typically a selfless model used by Enterprise data warehouse, which is used by the whole company. A star schema is a selfish model, used by a department, because it’s already got aggregation in it.”

I thought that was an interesting way of pointing that out — most people do not understand why 3NF is not good enough for data warehousing, and I have had a hard time explaining why a star or snowflake schema should be used. Another schema-related topic I had a hard time putting into words before this workshop was the difference between a star and a snowflake schema: compared to a star schema, in a snowflake schema, you have more than one fact table and maybe some dimensions that are not used often.

From Maria and the slides:
“Oracle says model what will suit your business best. Don’t get lost in academia. Most schemas are not 100% according to the theoretical models. Some examples: 3NF schema with denormalized attributes to avoid costly joins, Star schema with multiple hierarchies in same fact table.”

Data warehousing has a 3-step approach —

1) data sources -> staging layer (temp loading layer)
2) staging layer (temp loading layer)-> foundation (logical, data store) layer
3) foundation (logical, data store) layer -> access and performance layer

The foundation layer is usually 3NF; the access layer is usually a star or snowflake schema. As for the data sources, they can be varied. You would hope that they are in 3NF (and if they are, you can skip the first 2 steps), but they are not always that way.

The 3 P’s of best practice for data warehousing (on Oracle) are power, partitioning, parallelism. The goal of the data warehousing environment is to minimize the amount of data accessed and use the most efficient joins – so it is not so index focused. This may be based on Oracle’s way of doing joins, I am not so sure if it applies to MySQL as well.

Power: The weakest link in the chain (the 3 steps above) will define the throughput, so make sure your hardware configuration is balanced. Maria mentioned that as DBAs, “most of the time we don’t have control over this, but we’re still bound to the SLAs.”

This includes hardware that immediately comes to mind, such as # of CPUs/cores, speed of CPU, amount of RAM and speed of disk, as well as what we may not think of immediately: speed of network switches, speed of disk controllers, and number and speed of host bus adapters. Notes on host bus adapters (HBAs): Know the # of HBA ports you have. A 4 Gb HBA does about 400 MB/sec; a 2 Gb HBA does about 200 MB/sec. Make sure there’s enough HBA capacity to sustain the CPU throughput (i.e., make sure the HBA isn’t the bottleneck). Also consider the speed at which it all talks: if you have a 4 Gb machine but a 2 Gb switch, you end up having 2 Gb throughput. Upgrade the network at the same time you upgrade machines.

Because we are talking about data warehousing, it is often not possible to eliminate disk I/O, so the goal is to have the fastest I/O throughput possible. Data warehouses need to be sized on I/O throughput, not number of I/O’s.
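The "weakest link" point can be sketched in a few lines of Python: the throughput of the whole chain is the minimum of its parts. The HBA and switch figures below are the rough numbers from the session; the disk array figure and the component names are hypothetical.

```python
def weakest_link(components):
    """Return (name, MB/sec) of the slowest component in the I/O chain."""
    name = min(components, key=components.get)
    return name, components[name]


if __name__ == "__main__":
    chain_mb_per_sec = {
        "hba_4gb": 400,     # 4 Gb HBA, roughly 400 MB/sec
        "switch_2gb": 200,  # 2 Gb switch, roughly 200 MB/sec
        "disk_array": 600,  # hypothetical figure for illustration
    }
    print(weakest_link(chain_mb_per_sec))  # ('switch_2gb', 200)
```

Here the 2 Gb switch caps everything at about 200 MB/sec, no matter how fast the HBA or the disks are.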

I made a post earlier about how to determine I/O throughput for a system, which used information from this session. Justin Swanhart already pointed out that this is based on the fact that Oracle can do hash joins and MySQL can only do nested loop joins. I wonder, though, if there is indeed no case when using MySQL for which I/O throughput is a more useful metric than iops.

Disk arrays that are expensive are usually sized for iops, not throughput, and because they’re expensive the disk array is shared throughout the company. A DBA needs to ask ‘how many connections into the storage array do I have? How many disk controllers do I have? Where are my physical disks, and which controllers are they hanging off of?’

A typical 15k rpm disk can do about 25-35 MB/sec (per disk) of random I/O. Disk manufacturers will throw out numbers like 200-300 MB/sec, but that’s sequential I/O on the leading edge of the drive. Make sure all your LUNs are not coming off the same set of disks, so that you’re not conflicting on disk seeks.
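Using the per-disk figure above, a back-of-the-envelope sizing looks like this in Python. The function name and the 400 MB/sec target (one 4 Gb HBA's worth) are my own choices for illustration, and real arrays will have RAID and controller overhead that this ignores.

```python
import math


def disks_needed(target_mb_per_sec, per_disk_mb_per_sec=25.0):
    """Minimum number of disks to sustain the target random-I/O throughput."""
    return math.ceil(target_mb_per_sec / per_disk_mb_per_sec)


if __name__ == "__main__":
    # Pessimistic (25 MB/sec) and optimistic (35 MB/sec) per-disk figures
    print(disks_needed(400, 25.0))  # 16
    print(disks_needed(400, 35.0))  # 12
```

In other words, keeping a single 4 Gb HBA busy with random I/O takes somewhere between 12 and 16 such disks.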

Continue to part 2, partitioning.



Databases, The End User Experience

July 29th, 2010
Does it matter if the end user knows what the database is?

Recently I got a wonderful view of a database from the end user perspective.

While I was traveling I found a restaurant, and I decided to let friends who live locally know where I was. Part way through my food I got a message from a local friend that said "Don't eat there, their food always makes people sick!"

"Always" is a word that I would think would be a little too strong when applied to a restaurant, right?

Nope, the next day I got to feel the full truth of the word.

A couple of days later I was telling some friends about this and a local asked me "Where was this, I want to avoid them." I wasn't asked this question once; I was asked it a dozen times.

I don't know where the place is. Why is that? Because the system I was using lost the entire day's worth of my data. I don't know how often they lose data, but from asking a few other folks it appears that it happens more often than not.

It came up in casual conversation the other day that the site had moved off Postgres to another system recently. Which suddenly made everything make sense, because the particular solution they moved to is not very durable.

We talk about databases being "transactional" or not. We talk about them being "durable". What matters in the end, to me as an end user, is that when I put my data in a system, I want a confirmation that the system stored it. I don't want to retype my data, and I don't want to collect it again. If I was the operator for the site? I certainly wouldn't want to be losing my users' data.

In the MySQL world? MyISAM is the most abandoned storage engine in the stack. People will pick it initially because it is fast, but the first time they discover data corruption or have to deal with multiple hours of recovery time they quickly move away from it.

As an operator I wouldn't want to have to explain to my users or my boss that we had to wait 12 hours until the database recovered itself (or that it had corrupted itself). "Transactional" systems know how to handle recovery. People will wave their arms about and talk about disk controllers, disk failures, etc... That is hand waving. A properly configured system is redundant, and sure, it can be hit by lightning, but the real issue is most likely going to be that a plug gets pulled or a program crashes.

I look at, and even work with, some of the "no-sql" solutions. Some of them I recommend, and others I don't. I look at scale out needs, usage patterns, and a wide variety of other details.

As an end user, though?

I would like to know that my data was stored, and that I will reliably be able to retrieve it when I want. I don't like outages. Of the services online that I pay for or that I have integrated into my life? I can't imagine wanting to deal with a system which was unreliable. A free service which does not work most of the time, is not free. It will consume my time whenever it is not available.

There is an end user experience for the database; site operators ought to remember this.

Determining I/O throughput for a system

July 29th, 2010

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. In that session, there was a section on how to determine I/O throughput for a system, because in data warehousing I/O per second (iops) is less important than I/O throughput (how much actual data goes through, not just how many reads/writes).

The section contained an Oracle-specific in-database tool, and a standalone tool that can be used on many operating systems, regardless of whether or not a database exists:

If Oracle is installed, run DBMS_RESOURCE_MANAGER.CALIBRATE_IO:

SET SERVEROUTPUT ON
DECLARE
  lat  INTEGER;
  iops INTEGER;
  mbps INTEGER;
BEGIN
  -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO(<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(2, 10, iops, mbps, lat);
  DBMS_OUTPUT.PUT_LINE('max_mbps = ' || mbps);
END;
/

For us MySQL folks, or even the Drizzle or NoSQL folks, Oracle offers a free standalone tool called Orion. The example given in the slides was:

./orion -run advanced -testname mytest -num_small 0 -size_large 1024 -type rand -simulate concat -write 0 -duration 60 -matrix column

-num_small is 0 because you don’t usually do small transactions in a dw.
-type rand for random I/O’s because data warehouse queries usually don’t do sequential reads
-write 0 – no writes, because you do not write often to the dw, that is what the ETL is for.
-duration is in seconds
-matrix column shows you how much you can sustain

I would be interested to see how other folks measure I/O throughput, and maybe even do a side-by-side comparison of different tools. Orion is available for:

Linux (x86, x86-64, Itanium, Power)
Solaris (SPARC64)
AIX (PPC64)
zLinux
HPUX (PA RISC, Itanium)
Windows

I am working on a larger write-up of the session itself, which had many concise descriptions of data warehousing issues, but I thought that this merited its own post.



Shinguz’s Blog (en): MySQL Cluster Local Checkpoint (LCP) and Global Checkpoint (GCP)

July 29th, 2010

MySQL Cluster is mainly an in-memory database. Nevertheless, it requires a good I/O system for writing various kinds of information to disk.

MySQL Cluster writes the following information to disk:

  • Global Checkpoints (GCP), which are the transactions.
  • Local Checkpoints (LCP), which are a dirty image of the data.
  • Backup.

In the following schema you can see what is related to each other:

Please find here the meaning of each parameter:



Fast Company is searching for 2010’s Most Influential Person Online.

July 29th, 2010
I admit this is not closely related to my day-to-day activities, but I find the experiment interesting.

MySQL is the most popular open source database, particularly on the Web. Both Twitter and Facebook are avid users of MySQL. MySQL has influenced plenty of projects, has contributed to the creation of a multiplicity of social networks and is behind many intriguing entrepreneurial ideas. The question is: how influential are you?

Fast Company started a project to measure how much we influence online communities. Here is the description from their website.
We started with a simple question: Who are the most influential people online right now?
That's what The Influence Project is designed to answer. By participating, you will have your picture appear in the November issue of Fast Company magazine as part of an amazing photo spread. The more influence you demonstrate, the bigger your picture will be.
You may discover that you're more influential than you think.
Influence is not only about having the most friends or followers. Real influence is about being able to affect the behavior of those you interact with, to get others in your social network to act on a suggestion or recommendation. When you post a link or recommend a site, how many people actually bother to check it out? And what's the likelihood of those people then forwarding it on? How far does your influence spread?
This is the type of influence we're looking for. We want to find the most influential person online. Who knows? It might even be you.

Participate in the project by clicking here.




Will Oracle kill MySQL?

July 29th, 2010

I get asked this question often. It was mentioned again recently in a NYTECH executive breakfast with RedHat CIO Lee Congdon.

The short answer is No.

There is clear evidence that in the short to medium term Oracle will continue to promote and enhance MySQL. Some of these indicators include:

It is clear from these sources that Oracle intends to incorporate MySQL into Oracle Backup and Security Vault products. Both a practical and necessary step. There is also a clear mention of focusing on the Microsoft platform, a clear indicator that SQL Server is in their sights without actually saying it.

What is unknown is exactly how and when features will be implemented. Also important is how much these may cost the end user. Oracle is in the business of selling, now an entire H/W and S/W stack. They also have a complicated pricing model of different components with product offerings. I assume this will continue. There are already two indications: InnoDBbackup included for Enterprise Backup (from the April keynote) and the 5.1 enterprise split. (Note: while this split may have existed prior to Oracle, it is now more obvious.)

MySQL can never be seen as drawing away from any Oracle sales of the core entry level database product. It is likely Oracle will provide a SQL Syntax compatibility layer for MySQL within 2 years, however it will I’m sure be a commercial add-on. Likewise, I would suspect a PL/SQL lite layer within 5 years, but again at a significant cost to offset the potential loss of sales in the low end of the server market. There continues to be active development in the MySQL Enterprise Monitor, MySQL Workbench and MySQL Connectors which is all excellent news for users.

Moving forward, how long will this ancillary development of free tools continue? What will happen to the commercial storage engine, OEM and licensing model after the 5 year commitment? How will the MySQL ecosystem survive? There is active development in the Percona, MariaDB and Drizzle forks; however, unless all players that want to provide a closely MySQL-compatible solution work together, progress will continue to be disappointingly disjointed. The 2011 conference season will also see a clear line, with competing MySQL conferences scheduled at the same time in April: the O’Reilly MySQL conference in Santa Clara, California and the Oracle supported(*) Collaborate 2011 in Orlando, Florida.

I have a number of predictions on what Oracle ME MySQL may look like in 5 years; however, this is a topic for a personal discussion.



Speaking at Surge Scalability 2010 – Baltimore, MD

July 29th, 2010

I will be joining a great list of quality speakers including John Allspaw, Theo Schlossnagle, Rasmus Lerdorf and Tom Cook at Surge 2010 in Baltimore, Maryland on Thursday, September 30 and Friday, October 1, 2010.

My presentation on “The most common MySQL scalability mistakes, and how to avoid them.” will include discussing various experiences observed in the field as a MySQL Consultant and MySQL Performance Tuning expert.

Abstract:

The most common mistakes are easy to avoid; however, many startups continue to fall prey, with the impact including large re-design costs, delays in new feature releases, lower staff productivity and less than ideal ROI. All growing and successful sites need to achieve higher Availability, seamless Scalability and proven Resilience. Know the right MySQL environments to provide a suitable architecture and application design to support these essential needs.

Overview:

Some details of the presentation would include:

  • The different types of accessible data  (e.g. R/W, R, none)
  • What limits MySQL availability (e.g. software upgrades, blocking statements, locking etc.)
  • The three components of scalability – Read Scalability/Write Scalability/Caching
  • Design practices for increasing scalability and not physical resources
  • Disaster is inevitable. Having a tested and functional failover strategy
  • When other products are better (e.g. Static files, Session management via Key/Value store)
  • What a lack of accurate monitoring causes
  • What a lack of breakability testing causes
  • What does “No Downtime” mean to your organization.
  • Implementing a successful “failed whale” approach with preemptive analysis
  • Identifying when MySQL is not your bottleneck
