Archive for the ‘optimization’ Category

Resolve many-to-many relations a bit different with MySQL

Сентябрь 2nd, 2010

In database modeling, a m:n relationship is usually resolved by an additional table. But what if this relation is used only for archiving and the number of links in the resulting table is not too high? In that context, I got the idea to store all referring ID's as CSV string directly into a TEXT column of one of the referring tables. I came to this idea, because otherwise I would have to build complicated foreign keys and this way I also save one additional table. Certainly, this only makes sense if the data is not frequently accessed as foreign key. Nevertheless, I would like to tackle the problem, even if the implementation is very MySQL-oriented.

Read the rest »


PlanetMySQL Voting: Vote UP / Vote DOWN

Improving MySQL Productivity – From Design to Implementation

Июль 2nd, 2010

My closing presentation at the dedicated MySQL track at ODTUG Kaleidoscope 2010 discussed various techniques and best practices for improving the ROI of developer resources using MySQL. Included in the sections on Design, Security, Development, Testing, Implementation, Instrumentation and Support were also a number of horror stories of not what to do, combined with practical examples of improving productivity.


PlanetMySQL Voting: Vote UP / Vote DOWN

sort_buffer_size and Knowing Why

Май 10th, 2010

In How to tune MySQL’s sort_buffer_size, Baron gives a condescending viewpoint on how to tune the sort_buffer_size variable in MySQL. In a much-nicer-nutshell, his advice is “do not change sort_buffer_size from the default.”

Baron did not explain the logic behind his reasoning, he handwaves that “people utterly ruin their server performance and stability with it,” but does not explain how changing the sort_buffer_size kills performance and stability. Regardless of how respected and knowledgeable the source, NEVER take any advice that tells you what to do or how to do it without understanding WHY.

This article will explain the “why” of Baron’s point, and it will also talk more about understanding why, an integral part against the “Battle against any guess.” Baron’s recommendation to leave sort_buffer_size as the default is just as bad as all the advice given to change the sort_buffer_size, because all that advice (including Baron’s) does not explain the underlying causes.

First, I explain the sort_buffer_size issue. The sort buffer size, as the name implies, is a memory buffer used when ordering is needed (usually for GROUP BY and ORDER BY clauses, when the index used for the filter/join does not follow the GROUP/ORDER BY order). Increasing the sort_buffer_size means allowing more memory to be used for the sorting process.

Increasing the sort_buffer_size usually improves performance because more memory is used in sorting. It can be detrimental to performance because the full size of the sort buffer is allocated for each thread that needs to do a sort, even if that sort does not need a very large sort buffer.

A better optimization would be to change the schema and/or queries so that all that sorting is not necessary. Increasing the sort_buffer_size gives you a false sense of security that your server is performing better. Your server is performing the same tasks, only faster — the best optimization is to make the tasks smaller or eliminate some tasks. If you can have queries without so much sorting, that’s a much better optimization than changing sort_buffer_size.

That being said, increasing the sort_buffer_size is a perfectly acceptable stop-gap solution that can be implemented RIGHT NOW (it’s a dynamic variable), while you examine your queries by doing a query review with a tool such as mk-query-digest. This is indeed what Pythian does — and, by the way, not only do we recommend that course of action, but we explain it to you and help you find and optimize the queries in question.

That all assumes that having lots of sorts that require lots of memory is a bad thing. It may be that you have tuned your queries and schema such that you have eliminated as many sorts as you can, but some may remain. An intensive data mining server is a good example of a situation in which permanently increasing the sort_buffer_size may be the right solution.

Now that we have the specifics of this situation out of the way, let’s look at the Battle Against Any Guess. This is a movement against guessing games. Understanding what you are doing is essential; in the case of sort_buffer_size, you can believe that you know what you are doing by increasing sort_buffer_size. However, the real solution to the problem lies in changing the queries, not changing the memory patterns.

There is a 6-page description of the “Battle against any guess” in the Northern California Oracle User Group’s May Journal, starting on page 13. The examples are specific to Oracle, but the points made are sound even if you do not know Oracle well. For example:

Blindly implementing best practices is nothing different from guesswork; we are applying some past-proven solutions without measuring how they stand against our requirements, and without testing whether they bring us any closer to the targets we have. Industry has become so obsessed with best practices that we commonly see projects in which reviewing an environment for compliance with best practices is the ultimate goal.

One good reason you need to know *why* is also mentioned in the article: The second danger of best practices is that they easily become myths. The technology keeps improving and issues addressed by certain best practices might not be relevant anymore in the next software version.

So, even from respected folks like Baron or myself, do not take advice on face value. Ask why, understand why, and then think if there is another level. It is not always easy; often you think you understand but really you miss that other level – such as with sort_buffer_size.


PlanetMySQL Voting: Vote UP / Vote DOWN

Videos of Pythian Sessions from the 2010 O’Reilly MySQL Conference and Expo

Апрель 21st, 2010

Here’s a sneak peek at a video matrix — this is all the videos that include Pythian Group employees at the MySQL conference. I hope to have all the rest of the videos processed and uploaded within 24 hours, with a matrix similar to the one below (but of course with many more sessions).

TitlePresenterSlidesVideo link
(hr:min:sec)
Details (Conf. site link)
Main Stage
Keynote: Under New Management: Next Steps for the CommunitySheeri K. Cabral (Pythian)N/A18:16
session 14808
Ignite talk: MySQLtuner 2.0Sheeri K. Cabral (Pythian)PDF5:31N/A
Interview
Thoughts on Drizzle and MySQLSheeri K. Cabral (Pythian)N/A9:22N/A
Tutorials
MySQL Configuration Options and Files: Basic MySQL Variables (Part 1)Sheeri K. Cabral (Pythian)
PDF
1:25:04, pre-break

1:35:47, post-break
session 12408
MySQL Configuration Options and Files: Intermediate MySQL Variables (Part 2)Sheeri K. Cabral (Pythian)
PDF
1:25:04, pre-break

1:24:28, post-break
session 12435
Sessions
Better Database Debugging for Shorter DowntimesRob Hamel (Pythian)PDF33:13
session 13021
Find Query Problems Proactively With Query ReviewsSheeri K. Cabral (Pythian)PDF45:59session 13267
Time Zones and MySQLSheeri K. Cabral (Pythian)PDF45:54
session 12412
Security Around MySQLDanil Zburivsky (The Pythian Group)ODP37:27session 13458
Continual Replication SyncDanil Zburivsky (The Pythian Group)ODP45:57session 13428


PlanetMySQL Voting: Vote UP / Vote DOWN

When the subselect runs faster

Март 18th, 2010

A few weeks ago, we had a query optimization request from one of our customer.

The query was very simple like:

CODE:
  1. SELECT * FROM `table` WHERE (col1='A'||col1='B') ORDER BY id DESC LIMIT 20 OFFSET 0

This column in the table is looks like this:

CODE:
  1. `col1` enum('A','B','C','CD','DE','F','G','HI') default NULL

The table used to have rows  more then 500k and of course, there is an index on the col1 column but the cardinality of that index is around 100 what is very low (what explains ENUM type :) ).

CODE:
  1. +----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+
  2. | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                       |
  3. +----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+
  4. 1 | SIMPLE      | table  | range | col1         | col1 | 2       | NULL | 606920 | Using where; Using filesort |
  5. +----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+

This query was running for more then 5 minutes!

When you want to run this query mysql first will try to find each row where col1 is A or B. Then its going to order by the ID in a temp table and then throw away all result and return the first 20 match.

So to make this query faster without touching the architecture of the system, I had to re-think the goals.
The customer mentioned and I also verified, without the ORDER BY its runs fast. So basically, find each row where col1 is A|B is fast.
As ID column is your primary key, executing queries against it will be fast also.
But the combination of this two query in one is slow (because of the reasons I mentioned above).

So I tried this:

CODE:
  1. select * from table where id in (SELECT id FROM `table` WHERE (col1='A'||col1='B')) ORDER BY id DESC LIMIT 20 OFFSET 0;

In this case we are about to fool the mysql. The subquery is dependent, meaning that it'll be run once for every row found in the outer query. Normally this is very bad for performance. But in this special case, we have a small LIMIT so it won't be executed very many times.

CODE:
  1. +----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+
  2. | id | select_type        | table | type            | possible_keys | key     | key_len | ref  | rows   | Extra       |
  3. +----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+
  4. 1 | PRIMARY            | table  | index           | NULL          | PRIMARY | 4       | NULL | 765105 | Using where |
  5. 2 | DEPENDENT SUBQUERY | table  | unique_subquery | PRIMARY,col1  | PRIMARY | 4       | func |      1 | Using where |
  6. +----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+

As you can see, the explain says this query will run probably very slow, but lets see if its true.

CODE:
  1. (20 rows in set (0.01 sec))

As you can see, this a special and unique case when the subquery runs much faster using less resources. I have to mention that this case is more like publication of an interesting case like recommending you the way you can optimize queries . In any other case when the subquery is more sophisticated, this won't work. But queries like the one on the top could be easily rewritten like this when you have a poorly selective column to query against.

You always must consider that MySQL is amazingly bad with subqueries, it nearly always consider then as dependent even if they are not. If the outer query returns 1000 rows, the subquery will be executed 1000 times. So in this case, this is much faster and costs a way less to execute because of the limit in the main query. But always be very careful with similar queries.


Entry posted by Istvan Podor | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN

How to tell when using INFORMATION_SCHEMA might crash your database

Февраль 17th, 2010

There are those that are very adamant about letting people know that using INFORMATION_SCHEMA can crash your database. For example, in making changes to many tables at once Baron writes:

“querying the INFORMATION_SCHEMA database on MySQL can completely lock a busy server for a long time. It can even crash it. It is very dangerous.”

Though Baron is telling the truth here, he left out one extremely important piece of information: you can actually figure out how dangerous your INFORMATION_SCHEMA query will be, ahead of time, using EXPLAIN.


In MySQL 5.1.21 and higher, not only were optimizations made to the INFORMATION_SCHEMA, but new values were added so that EXPLAIN had better visibility into what MySQL is actually doing. As per http://dev.mysql.com/doc/refman/5.1/en/information-schema-optimization.html there are 6 new “Extra” values for EXPLAIN that are used only for INFORMATION_SCHEMA queries.

The first 2 “Extra” values for EXPLAIN are mostly self-explanatory:
Scanned 1 database – Only one database directory needs to be scanned.
Scanned all databases – All database directories are scanned. This is more dangerous than only scanning one database.

Note that there is no middle ground — there is no optimization to only scan 2 databases; either all database directories are scanned, or only one is. If your query spans more than one database, then all database directories are scanned. Note that this

SHOW statements are less dangerous than using INFORMATION_SCHEMA because they only use one database at a time. If you have an INFORMATION_SCHEMA query that produces an “Extra” value of “Scanned 1 database”, it is just as safe as a SHOW statement.

The optimizations went even further, though. From the most “dangerous” — ie, resource intensive — to the least, here are the other 4 “Extra” values introduced in MySQL 5.1.21 (which, for the record, came out in August 2007, so it is a feature that has been around for 2.5 years at this point):

Open_full_table
Open_trigger_only
Open_frm_only
Skip_open_table

A bit more explanation, and some examples:

Open_full_table – Needs to open all the metadata, including the tables format file (.frm) and data/index files such as .MYD and .MYI. The previously linked to manual page about the optimization includes which information will show each “Extra” type — for example, the AUTO_INCREMENT and DATA_LENGTH fields of the TABLES table require opening all the metadata.

mysql> EXPLAIN SELECT TABLE_SCHEMA,TABLE_NAME,AUTO_INCREMENT FROM TABLES\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Open_full_table; Scanned all databases
1 row in set (0.00 sec)

Let’s see an example that only scans 1 database:

mysql> EXPLAIN TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA='test'\G
ERROR 1109 (42S02): Unknown table 'TABLE_NAME' in information_schema
mysql> EXPLAIN SELECT TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA='test'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_full_table; Scanned 1 database
1 row in set (0.00 sec)

Note that “Scanned all databases” will apply if there is any way there could be more than one database. For example, on my test server, only the ‘test’ and ’sakila’ databases exist (other than ‘mysql’ and ‘INFORMATION_SCHEMA’ of course) and yet when I do

EXPLAIN SELECT TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA LIKE 'test%'\G

I still get “Scanned all databases”. So be careful.

One of the basic pieces of advice I see to optimize queries can be applied to queries on the INFORMATION_SCHEMA — Do not use SELECT * unless you actually want to get every single piece of information. In the case of INFORMATION_SCHEMA, optimizing your queries can mean the difference between the server crashing and the server staying up.

Open_trigger_only – Only the .TRG file needs to be opened. Interestingly enough, this does not seem to have an example that applies. The manual page says that the TRIGGERS table uses Open_full_table for all fields. When I tested it, though, I did not get anything in the “Extra” field at all — not “Open_trigger_only” and not even “Open_full_table”:

mysql> select @@version;
+---------------------+
| @@version           |
+---------------------+
| 5.1.37-1ubuntu5-log |
+---------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM TRIGGERS\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TRIGGERS
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:
1 row in set (0.00 sec)

Open_frm_only – Only the format file (.frm) of the table needs to be open. Again, check the manual page for the fields that can use this optimization — fields such as CREATE_OPTIONS and ENGINE in the TABLES table do, for example.

Skip_open_table – This is the last new “Extra” feature, and it is the best. This optimization type means that no files need to be opened. The database directories are scanned and information can be obtained — mostly the table name, so when querying only the TABLE_NAME and TABLE_SCHEMA fields from the TABLES table, your query is safe.

So instead of putting your head in the sand and never using the great tool that is the INFORMATION_SCHEMA, first EXPLAIN your query to see if it will work or not.

(Note, if you are still on MySQL 5.0, what are you waiting for? The upgrade to MySQL 5.1 is relatively painless, and Pythian has a comprehensive checklist for how to upgrade while keeping your sanity).


PlanetMySQL Voting: Vote UP / Vote DOWN

Indexing text columns in MySQL

Август 13th, 2009

This time, I’m talking about indexes for string typed columns. In particular, I’ll show a procedure I find useful while looking for good index length values for these columns.


I’ll use a sample table called people.


Here’s what it looks like:


mysql> desc people;
+————+——————+——+—–+———+—————-+
| Field      | Type             | Null | Key | Default | Extra          |
+————+——————+——+—–+———+—————-+
| id         | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| title      | varchar(250)     | NO   |     | NULL    |                |
| city       | varchar(250)     | NO   |     | NULL    |                |
| occupation | varchar(250)     | NO   |     | NULL    |                |
+————+——————+——+—–+———+—————-+
4 rows in set (0.00 sec)

mysql> select count(*) from people;
+———-+
| count(*) |
+———-+
|   150000 |
+———-+
1 row in set (0.00 sec)

mysql> 


We’ll start by using procedure analyse to get some useful information
about our data. Unless you know some fields are good candidates for
use with the ENUM datatype, invoke procedure analyse with arguments
(0,0) in order to prevent mysql from suggesting huge ENUMs for string
columns.


mysql> select * from people procedure analyse(0,0)\G
*************************** 1. row ***************************
             Field_name: test.people.id
              Min_value: 1
              Max_value: 150000
             Min_length: 1
             Max_length: 6
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 75000.5000
                    Std: 87258.1632
      Optimal_fieldtype: MEDIUMINT(6) UNSIGNED NOT NULL
*************************** 2. row ***************************
             Field_name: test.people.title
              Min_value: aback exclaims stopgap’s chapel’s tanked claps snowshoe cigarette correlates extras laster cluc
              Max_value: Zulus colossally dictate cleft’s enchanter del
             Min_length: 40
             Max_length: 150
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 95.0869
                    Std: NULL
      Optimal_fieldtype: TINYTEXT NOT NULL
*************************** 3. row ***************************
             Field_name: test.people.city
              Min_value: aback ascertaining unw
              Max_value: Zulus imprisonments veiner a
             Min_length: 5
             Max_length: 30
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 17.4861
                    Std: NULL
      Optimal_fieldtype: TINYTEXT NOT NULL
*************************** 4. row ***************************
             Field_name: test.people.occupation
              Min_value:
              Max_value:
             Min_length: 0
             Max_length: 0
       Empties_or_zeros: 150000
                  Nulls: 0
Avg_value_or_avg_length: 0.0000
                    Std: NULL
      Optimal_fieldtype: CHAR(0) NOT NULL
4 rows in set (0.19 sec)

mysql> 


The id column is numeric, and is actually already indexed. We can’t
see this right here (though I showed an example of how you can use
the output of procedure analyse and extend it to suit your needs, in
which I did include an ‘Indexed’ column in the output), but we can
see this, and gan more information from the table, with the following
statement:


mysql> show index from people;
+——–+————+———-+————–+————-+———–+————-+———-+——–+——+————+———+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+——–+————+———-+————–+————-+———–+————-+———-+——–+——+————+———+
| people |          0 | PRIMARY  |            1 | id          | A         |      150000 |     NULL | NULL   |      | BTREE      |         |
+——–+————+———-+————–+————-+———–+————-+———-+——–+——+————+———+
1 row in set (0.00 sec)

mysql>


In order to create a good index for title, we can use the following query. I used 95 as a starting point, since it’s reported by procedure analyse as the average data length of this column.

mysql> select count(distinct(substr(title,1,95))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,95))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                           100.0000 |
+--------------------------------------------------------------------+
1 row in set (1.42 sec)

mysql> 


As you can see, with 95 chars, we can get an index that covers 100% of the rows (i.e., get distinct values for all of them). Still, it’s a big number. Using this query, we
can begin to play a little bit with the index size, until we get to a good compromise between enough distinct values and an index that’s small enough to be processed fast and
maybe even loaded into memory.

mysql> select count(distinct(substr(title,1,20))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,20))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            99.9507 |
+--------------------------------------------------------------------+
1 row in set (1.15 sec)

mysql>


As it turns out, my data set doesn’t require too many characters in order to be differentiated. Actually, I’m kind of cheating here, for a table this size, since I used a data generator to populate this tables, and it
generated a lot of random text. Real world data would probably require a larger prefix in order to get such good differentiation. Anyway, let’s push it a little bit more.

mysql> select count(distinct(substr(title,1,15))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,15))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            97.0787 |
+--------------------------------------------------------------------+
1 row in set (1.18 sec)

mysql> select count(distinct(substr(title,1,14))) / count(distinct(title)) * 100 from people;
+--------------------------------------------------------------------+
| count(distinct(substr(title,1,14))) / count(distinct(title)) * 100 |
+--------------------------------------------------------------------+
|                                                            94.4247 |
+--------------------------------------------------------------------+
1 row in set (1.13 sec)


Here’s the turning point for me. A jumb between 94% and 97% of index coverage in just 1 character. So I’m sticking with 15. Let’s test this with some queries.

mysql> select title from people limit 40;
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| title                                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| puffs war's bruises buckles attainably Warnock's discoverer degeneration plots admirably assimilates germane burlesquely ri                        |
| arbitrariness MacDraw's carbonates suckers budget chronicler cur drabs untested Aryans imperial                                                    |
| commender dozes distills blackbird's mend meta                                                                                                     |
| gallons haying occupation's sculpt fittingness scores onwards recessed masculineness denominator's regulated boyfriend's                           |
| authored metaphor derivatively matchmakers ratification railing advantageousness flossing twin's barbarously infinite retreat alloying tenting t   |
| Africans determinateness enquired quivers replaces nowhere applicability negative alarms lacquerer shivered arachnid ulcer sil                     |
| filter offerings unboundedness clearness enthusiast commandants blunted betide rusticated blacks helmet's squabbles tasked Beethoven contro        |
| thirties oftener tunnel anguish attainable formulat                                                                                                |
| grotesquely fallacious inessential fain sanctioned too amplifi                                                                                     |
| consort rapes deeply marker patterns compacted plumbe                                                                                              |
| nasally combings searcher's pathname's bolts retrospective aroused squintingly boyish singers recompiles Austral                                   |
| purpled draggingly nobody's luckier spinning goddess oscilloscopes aimer                                                                           |
| aphasia reconverts shams entangle placer metaphysical visited turret nai                                                                           |
| violation bituminous unweighed darkness cackles consonant foully fisted loci relishes burn m                                                       |
| unsuffixed overdose humbles corpses fashions slashingly quietude delighte                                                                          |
| sheller hypocrisy falser productions shied cube breed childishness requested pads redoub                                                           |
| broils aorta refund sinker cankering reawakens portrayed resolving bard's stand ejects inhabitant's tittering genders proposition                  |
| cyclone's glorification unrestricted delicately inhibitive waterway wardrobes excommunicated laugher                                               |
| poppies heroine's gunner swollen reticle vertebrate's shrank unreliabilities infractions pretentious angstroms relations highness feasibil         |
| ampoule clustering intermediaries honer ree                                                                                                        |
| creature's transferals tidal unsigned stitching ought coerces visa girdling porn janitors parer song's croaked ta                                  |
| hammer amalgamating stunting feasibility hopefulness oilier spraying frets pinks                                                                   |
| comelier tomorrow's cowboy chalked lewdness cordial supering rut's neurally blindingly mute drowsiest gives in                                     |
| slides aqueduct glazers abolition dangers sultry raid prominence hedges walks toppled defenders autocrat                                           |
| theoretic thumps scum's photos bootlegged enveloper sallying populations disruptions inaugurate conclu                                             |
| annotated bibliographies lichen user's bluebird's subproofs unendurably recollection's crumple                                                     |
| sergeant outlets pinion reducer wiling impinge apes insaneness dose automatics lighthouse's cursory sleepily web's interruptions superin           |
| tautened skylarks toad's seminar's archangel's sarcasm shipwrecks indeed incliner tying waterf                                                     |
| chaotic censuses intimacy custodian's extendedly womb's safeguarding desire                                                                        |
| abusiveness skippered inspirer enunciation taper memory's clearly guardianship inputed m                                                           |
| firing anaphora subsegments turbulence affectedness refractory unsprayed chapter's volumes undramati                                               |
| chef reception's glens budged budge arson assistance disagreeableness fodder garnering boated skater heroine's pamphle                             |
| prophetic spilling asper petter's constable's classic ices teethes mails office's sordidness cylindered chaffing bivouac skeptics shuttering quash |
| hoppers iciest sharer dietitian dictionaries frac                                                                                                  |
| racketeer Ellen amounts origin's abstractions render vanish pantries retrieve Maxtor unprojected antithes                                          |
| hinter arrangers dialogue imputing droppers shelver boyish demonstrator braving submitting operated carbonate protruding creasing prospecto        |
| electrify garment linked discernible transceiver's ungrounded telegrapher uncoated                                                                 |
| badly unaffected vex taming affiliation strings detracts grandpas girded cafeteria approving ideology froze underlinings assass                    |
| MHz articulate draws transpires rubbling swarthier reeler bagged snug assisted consonant's settl                                                   |
| taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                           |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
40 rows in set (0.00 sec)

mysql>


That should give you an idea of what type of needles we’re looking for :)

mysql> select * from people where title like 'taxi%';
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
| id     | title                                                                                                                                               | city                           | occupation |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
|     40 | taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                            | rotation tri                   |            |
|   5736 | taxi's allegorically accounting manipulatory cautiousness computational promoter wool reproo                                                        | islan                          |            |
|   6967 | taxis sprawls unblushing rude put absorbs reproducibilities crumblier kid                                                                           | DeMorgan overhe                |            |
|   8388 | taxi catsup ornament transformer widener syndicates dismount pop t                                                                                  | exhibition's manages hedgehog' |            |
|  10418 | taxingly eligibility whichever meditation corrosion unluckiness intoxicat                                                                           | gagged politeness looser       |            |
|  13091 | taxi endowment watchfulness battalions stay trickle tangle blowfish maid's transmissions questionnaire vomit saner strokers constituent crab's      | populations window's zoo armie |            |
|  16723 | taxi storing couldest bouts allegoric cluttered steeples fives hitchhike thrashes retirement de                                                     | envisaged maintain             |            |
|  22168 | taxicab voicer controllers removing cellular houses router nourishing edict shrines strikes testicle's destine whale russeted certi                 | masturbates pu                 |            |
|  24244 | taxing sleeve consultant's nonprogrammable twine delayer ingot respecter subex                                                                      | prematurely significant        |            |
|  35595 | taxicabs aback spinal checkers germs overdraft's coon critter's patrician fled coalition massaging paced condemning impen                           | oppre                          |            |
|  37105 | taxis articulatory indulgence bystanders skin burgess starlight calendaring aunt's bilging benightedness smallest softened xiv immerser fresher unn | crudely papally r              |            |
|  40871 | taxicabs muzzling precocious resentment fellers pitiers beasts marines baselines diagrammatically clowning connecters stampedin                     | influencer                     |            |
|  44298 | taxi rages unintelligibleness anastomosis orthogonally incompatibilities keypads hoarse province stamping perceived sh                              | unforgiving quiet              |            |
|  46395 | taxi plunders novelty's downstairs newborn symbiotic climax highlights lounger keypads only schools possibilities                                   | flowing forgeries slende       |            |
|  66078 | taxicab mercilessly excesses ships merchandising patch strobe                                                                                       | armfuls firmament hum coop     |            |
|  71095 | taxing dispense regrettably resuspended kilobits downwardly domestically laps rainiest recapitulates despiser trophies chums a                      | enumerate indoctrinat          |            |
|  77668 | taxicabs approachable disqualifying charcoaled script's o                                                                                           | kited publish disburse anarch  |            |
|  84162 | taxicab's captivity dean eyeball uninspiring pawn's complication outcast's stared sneak s                                                           | impracticable dungeon crop     |            |
|  87930 | taxi swiftly repacks unsupported slice mornings squares gland solar brainier harrying wag                                                           | cowslip halter plastics        |            |
|  91282 | taxiing undetected cast commands clasping germina                                                                                                   | waxes her                      |            |
| 104029 | taxied roofs besetting leadership electrocuted input metaphor bubbler vowing sponges assess                                                         | worthing understated bark      |            |
| 105818 | taxis libretti defensively shoes antagonistically heavier endeared accidental gauging intercourse revolte                                           | runne                          |            |
| 106163 | taxis bonfire's bench stereo preventer boringness blot's quieter acronyms transplant gained implores ba                                             | sighting leased sp             |            |
| 122471 | taxicab's they've berries invader touching bumblingly courtier's boosting undisguised destroy amanuensis bangles digestiveness poppy's hulls        | purity professional unski      |            |
| 123931 | taxicab's headgear Popek ratifying tenured Pascal's subduedly quitting earned planter forgave implicated noo                                        | bibliographies fraill          |            |
| 127383 | taxi choir parameter's busted inspiration's fixated blinking complicator outwit plotters gobbles burningly leafed corruptively                      | radioed size telegr            |            |
| 134211 | taxi's reconstructible indirect agglutination awaken eked unoccupied pillager subcomputation interviewing treader commending i                      | muddiness broom's              |            |
| 145784 | taxicabs sanctuary armful battening terrifying impactors guns exchequer reigns laughter desolater s                                                 | buttonhole's isomorphism       |            |
| 146371 | taxied carnivals giver misconceptions countenancer introduced anchovy exile pipelines weaned unabridged Britishly abyss's extenuating moodiness th  | penetrator upsho               |            |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
29 rows in set (0.11 sec)

mysql> explain select * from people where title like 'taxi%';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | people | ALL  | NULL          | NULL | NULL    | NULL | 150000 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+

mysql> select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+---------------------------------------------------+--------------------+------------+
| id | title                                             | city               | occupation |
+----+---------------------------------------------------+--------------------+------------+
| 34 | hoppers iciest sharer dietitian dictionaries frac | coroneted revolve  |            |
+----+---------------------------------------------------+--------------------+------------+
1 row in set (0.12 sec)

mysql> explain select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | people | ALL  | NULL          | NULL | NULL    | NULL | 150000 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)

Ok, now let’s create the index. I’m loading it into a cache here, which is not necessary (even less given my size of 150000 tuples), but it helps. In order to do this, all the indexes in your table must have the same block size.

mysql> create index idx_people_title on people(title(15));
Query OK, 150000 rows affected (1.60 sec)
Records: 150000  Duplicates: 0  Warnings: 0

mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql> load index into cache people;
+-------------+--------------+----------+----------+
| Table       | Op           | Msg_type | Msg_text |
+-------------+--------------+----------+----------+
| test.people | preload_keys | status   | OK       |
+-------------+--------------+----------+----------+
1 row in set (0.00 sec)

Let’s re test the queries:


mysql> select * from people where title like 'taxi%';
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
| id     | title                                                                                                                                               | city                           | occupation |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
|   8388 | taxi catsup ornament transformer widener syndicates dismount pop t                                                                                  | exhibition's manages hedgehog' |            |
| 127383 | taxi choir parameter's busted inspiration's fixated blinking complicator outwit plotters gobbles burningly leafed corruptively                      | radioed size telegr            |            |
|  13091 | taxi endowment watchfulness battalions stay trickle tangle blowfish maid's transmissions questionnaire vomit saner strokers constituent crab's      | populations window's zoo armie |            |
|  46395 | taxi plunders novelty's downstairs newborn symbiotic climax highlights lounger keypads only schools possibilities                                   | flowing forgeries slende       |            |
|  44298 | taxi rages unintelligibleness anastomosis orthogonally incompatibilities keypads hoarse province stamping perceived sh                              | unforgiving quiet              |            |
|  16723 | taxi storing couldest bouts allegoric cluttered steeples fives hitchhike thrashes retirement de                                                     | envisaged maintain             |            |
|  87930 | taxi swiftly repacks unsupported slice mornings squares gland solar brainier harrying wag                                                           | cowslip halter plastics        |            |
|   5736 | taxi's allegorically accounting manipulatory cautiousness computational promoter wool reproo                                                        | islan                          |            |
| 134211 | taxi's reconstructible indirect agglutination awaken eked unoccupied pillager subcomputation interviewing treader commending i                      | muddiness broom's              |            |
|     40 | taxi's wearying espies Anglican's intangibly fluent jugs liveth pride ex                                                                            | rotation tri                   |            |
|  66078 | taxicab mercilessly excesses ships merchandising patch strobe                                                                                       | armfuls firmament hum coop     |            |
|  22168 | taxicab voicer controllers removing cellular houses router nourishing edict shrines strikes testicle's destine whale russeted certi                 | masturbates pu                 |            |
|  84162 | taxicab's captivity dean eyeball uninspiring pawn's complication outcast's stared sneak s                                                           | impracticable dungeon crop     |            |
| 123931 | taxicab's headgear Popek ratifying tenured Pascal's subduedly quitting earned planter forgave implicated noo                                        | bibliographies fraill          |            |
| 122471 | taxicab's they've berries invader touching bumblingly courtier's boosting undisguised destroy amanuensis bangles digestiveness poppy's hulls        | purity professional unski      |            |
|  35595 | taxicabs aback spinal checkers germs overdraft's coon critter's patrician fled coalition massaging paced condemning impen                           | oppre                          |            |
|  77668 | taxicabs approachable disqualifying charcoaled script's o                                                                                           | kited publish disburse anarch  |            |
|  40871 | taxicabs muzzling precocious resentment fellers pitiers beasts marines baselines diagrammatically clowning connecters stampedin                     | influencer                     |            |
| 145784 | taxicabs sanctuary armful battening terrifying impactors guns exchequer reigns laughter desolater s                                                 | buttonhole's isomorphism       |            |
| 146371 | taxied carnivals giver misconceptions countenancer introduced anchovy exile pipelines weaned unabridged Britishly abyss's extenuating moodiness th  | penetrator upsho               |            |
| 104029 | taxied roofs besetting leadership electrocuted input metaphor bubbler vowing sponges assess                                                         | worthing understated bark      |            |
|  91282 | taxiing undetected cast commands clasping germina                                                                                                   | waxes her                      |            |
|  71095 | taxing dispense regrettably resuspended kilobits downwardly domestically laps rainiest recapitulates despiser trophies chums a                      | enumerate indoctrinat          |            |
|  24244 | taxing sleeve consultant's nonprogrammable twine delayer ingot respecter subex                                                                      | prematurely significant        |            |
|  10418 | taxingly eligibility whichever meditation corrosion unluckiness intoxicat                                                                           | gagged politeness looser       |            |
|  37105 | taxis articulatory indulgence bystanders skin burgess starlight calendaring aunt's bilging benightedness smallest softened xiv immerser fresher unn | crudely papally r              |            |
| 106163 | taxis bonfire's bench stereo preventer boringness blot's quieter acronyms transplant gained implores ba                                             | sighting leased sp             |            |
| 105818 | taxis libretti defensively shoes antagonistically heavier endeared accidental gauging intercourse revolte                                           | runne                          |            |
|   6967 | taxis sprawls unblushing rude put absorbs reproducibilities crumblier kid                                                                           | DeMorgan overhe                |            |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+------------+
29 rows in set (0.00 sec)

mysql> explain select * from people where title like 'taxi%';
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys    | key              | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
|  1 | SIMPLE      | people | range | idx_people_title | idx_people_title | 17      | NULL |   56 | Using where |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
1 row in set (0.00 sec)

Notice how we only go through 56 rows now.


mysql> select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+---------------------------------------------------+--------------------+------------+
| id | title                                             | city               | occupation |
+----+---------------------------------------------------+--------------------+------------+
| 34 | hoppers iciest sharer dietitian dictionaries frac | coroneted revolve  |            |
+----+---------------------------------------------------+--------------------+------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'hoppers iciest sharer dietitian dictionaries frac';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.01 sec)

Just 1 row.
Ok, let’s test the quality of the index to find unique rows.


mysql> explain select * from people where title = 'arbitrariness MacDraw\'s carbonates suckers budget chronicler cur drabs untested Aryans imperial';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'dunes delightfulness manurers jousts axer aristocrat\'s driver greediness bloke pays preconditions enclosure consideration plaster';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    1 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from people where title = 'satire\'s most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally\'s amica';
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys    | key              | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | people | ref  | idx_people_title | idx_people_title | 17      | const |    2 | Using where |
+----+-------------+--------+------+------------------+------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> select *,length(title) from people where title = 'satire\'s most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally\'s amica';
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
| id  | title                                                                                                                                        | city             | occupation | length(title) |
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
| 344 | satire's most quacked campaigning wrists disengaging insignia woodlander knuckles despaired portending incredulous predication Sally's amica | buffaloes refill |            |           140 |
+-----+----------------------------------------------------------------------------------------------------------------------------------------------+------------------+------------+---------------+
1 row in set (0.00 sec)

Granted, 150.000 rows isn’t much, but still, with an average row data length of 94, I had to find a 140 character title in order to go through 2 rows before the right one was found. That’s reasonable, considering
I estimated a 97% index coverage.

In conclusion, wihle my dataset size certainly isn’t large enough to do many interesting things, it should prove the point that a good index size will go great lengths into helping you improve the performance of your MySQL based
system. The query I presented here can be useful to look for a decent index size in terms of unique rows coverage.

Related posts:

  1. Using MySQL Proxy to benchmark query performance By transparently sitting between client and server on each request,...
  2. Making use of procedure analyse() SELECT Field0[,Field1,Field2,...] FROM TABLE PROCEDURE ANALYSE() is a nice tool...
  3. New release of MySQL Proxy GPL MySQL Proxy has a new release, just three days ago,...

Related posts brought to you by Yet Another Related Posts Plugin.


PlanetMySQL Voting: Vote UP / Vote DOWN