Archive for the ‘Search’ Category

Setting optimizer search depth in MySQL

Август 11th, 2011
A customer of ours had an interesting problem regarding a query that was taking too long, around 55s. Looking at the query with the query profiler we found that it was spending most of its time in the "statistics" phase. Now the query was pretty complex, it contained nearly 20 tables with INNER JOINs, LEFT JOINs and even some subqueries. However the tables were small and fetching all the data shouldn't have taken the 55 seconds the query was taking. The problem was that the optimiser was spending too much time evaluating and finding the optimal execution plan.

There are two options in MySQL with which you can control the optimiser's behaviour a bit. The first one is optimizer_prune_level. The pruner discards non-optimal execution plans early without evaluating them fully. It is turned on by default and is not recommended to turn off unless there's a really good reason. For testing purposes we turned the pruner off for this query, but after evaluating the query for over half an hour we gave up on the test and decided to call it infinity. This is understandable, for a 20 table join there are 20! (~2.4 x 10^18) possible execution paths, so evaluating all of them would take forever (now this query contained LEFT JOINs and subqueries so that number is not exactly correct, but still used here to spread fear).

The second interesting option is the optimizer_search_depth. This defines how deep into the execution path the optimizer should look before deciding which plan to use. If this is set to a value higher than the number of tables in the query it means that all possible execution plans (except pruned ones) will be examined. If it's set lower than the number of tables in the query, the optimiser will not evaluate each path to its full extent but choose the first level based on all plans evaluated to the specified depth and then do a new evaluation for the next level and so on until the full execution plan has been chosen. Trying a few different values we got the following results:

SET SESSION optimizer_search_depth = 1;
-> statistics 0.000591

SET SESSION optimizer_search_depth = 5;
-> statistics 0.002321

SET SESSION optimizer_search_depth = 10;
-> statistics 0.365812

SET SESSION optimizer_search_depth = 15;
-> statistics 5.054150

SET SESSION optimizer_search_depth = 0;
-> statistics 0.026904


All of the above are much better than the 58.497217s we got with the default search depth of 62. Note that the value 0 (zero) is a special case where the optimiser chooses and sets the optimal search depth for each query, thus adding a bit of overhead to the optimisation process.

So why not just tune down the search depth to 1 for all queries? The problem here is that you can then miss optimal execution plans as the optimizer won't go deep enough down the execution plan before choosing which plan to use at each level. You might get an execution plan that is horribly wrong. So the best way to go if you have many queries with more than 15 tables is to choose an intermediate value of 7-8 or so, which will produce the optimal execution plan in 99% of all cases, but won't waste too much time finding it.

The best solution is of course to optimize all the 15+ table queries yourself but that's another story!

PlanetMySQL Voting: Vote UP / Vote DOWN

InnoDB Full-Text Search Tutorial

Июль 27th, 2011

The InnoDB full-text search capability is an exciting feature. The full-text search itself is generally useful to have in an RDBMS. If an application is using all InnoDB tables except for one that is used for full-text searches, now that last table can be switched to InnoDB. If putting the full-text data in a MyISAM table led to scalability problems, duplication, or a less-than-ideal schema design, now those issues can be addressed.

In this post, I’ll take you through some of the basics of setting up and querying an InnoDB FULLTEXT search index. I’ll leave the scalability and performance aspects to Jimmy’s and Vinay’s blog posts, and just use some toy-sized data for demonstration purposes.

Creating a Table with a Full-Text Search Index

The key component of this feature is an index of type FULLTEXT, applied to one or more columns of an InnoDB table.

In Jimmy’s post, he mentions some scalability considerations where you might create the table (including a special FTS_DOC_ID column), load the data, then create the FULLTEXT index afterward. For simplicity (and since the data volume is so small), I’ll create the table with the index in place, then load the data afterward.

use test;
-- We will do some commits and rollbacks to demonstrate transactional features.
-- So turn off the default setting that commits immediately after each statement.
set autocommit=0;

drop table if exists quotes;
-- In 5.5 and above, by default this table is an InnoDB table.
-- The full-text search feature lets us define the FULLTEXT index.

create table quotes
  (    id int unsigned auto_increment primary key
    , author varchar(64)    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
  );

-- Get some words and phrases to search for into the table.
insert into quotes (author, quote, source) values
  ('Abraham Lincoln', 'Fourscore and seven years ago...',
  'Gettysburg Address')
, ('George Harrison', 'All those years ago...',
  'Live In Japan')
, ('Arthur C. Clarke', 'Then 10 years ago the monolith was discovered.',
  '2010: The Year We Make Contact')
, ('Benjamin Franklin',
  'Early to bed and early to rise, makes a man healthy, wealthy, and wise.',
  'Poor Richard''s Almanack')
, ('James Thurber',
  'Early to rise and early to bed makes a male healthy and wealthy and dead.',
  'The New Yorker')
, ('K', '1500 hundred years ago, everybody knew that the Earth was the center of the universe.',
  'Men in Black')
;

-- Since this is an InnoDB table, we are mindful of transactions.
commit;

Word and Phrase Search – Natural Language Mode

Once the data is loaded and committed, you can run queries using the MATCH(columns) AGAINST (search expression) operator to do the actual searches. You can combine this operator with all the usual WHERE and similar clauses in the SELECT statement.

The simplest kind of search is to find a single word, or a phrase with all words in exact order. For this type of search, use the IN NATURAL LANGUAGE clause inside the AGAINST() call. This technique typically involves a user-entered string that you pass verbatim to the query (of course, after escaping any quotation marks or other special characters to prevent SQL injection attacks).

-- Search for a single word.
select author as "Monolith" from quotes
  where match(quote) against ('monolith' in natural language mode);
+------------------+
| Monolith         |
+------------------+
| Arthur C. Clarke |
+------------------+
1 row in set (0.01 sec)

select author as "Ago" from quotes
  where match(quote) against ('ago' in natural language mode);
+------------------+
| Ago              |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

-- Search for a phrase.
select author as "Years Ago" from quotes
  where match(quote) against ('years ago' in natural language mode);
+------------------+
| Years Ago        |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

AND / OR / NOT Operators – Boolean Mode

For more complicated searches, you can have multiple words and phrases and search for different combinations of optional and required terms, not necessarily in the same order. This technique typically involves several data values that you query from elsewhere, or splitting apart a user-entered string and applying your own rules to the words and phrases inside.

-- Search for a combination of words, not in the same order as the original.
select author as "Ago and Years" from quotes
  where match(quote) against ('+ago +years' in boolean mode);
+------------------+
| Ago and Years    |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

-- Search for other Boolean combinations of words.
select author as "Fourscore or Monolith" from quotes
  where match(quote) against ('fourscore monolith' in boolean mode);
+-----------------------+
| Fourscore or Monolith |
+-----------------------+
| Abraham Lincoln       |
| Arthur C. Clarke      |
+-----------------------+
2 rows in set (0.00 sec)

select author as "Years and not Monolith" from quotes
  where match(quote) against ('+years -monolith' in boolean mode);
+------------------------+
| Years and not Monolith |
+------------------------+
| Abraham Lincoln        |
| George Harrison        |
| K                      |
+------------------------+
3 rows in set (0.00 sec)

Proximity Search

Proximity search is a special case of Boolean search using the @ operator within the AGAINST() string. You supply 2 or more words, double-quoted, within the single-quoted AGAINST() string, followed by @distance to specify how far apart these words can be. The distance represents the maximum number of bytes between the starting points of all these words.

-- The starting points for these words are too far apart
-- (not within 20 bytes), so no results.
select quote as "Too Far Apart" from quotes
  where match(quote) against ('"early wise" @20' in boolean mode);
Empty set (0.00 sec)

-- But the starting points of all words are within 100 bytes,
-- so this query does give results.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @100' in boolean mode);
+-------------------------------------------------------------------------+
| Early...Wise                                                            |
+-------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

-- In this case, the smallest distance that produces results is 49.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @49' in boolean mode);
+-------------------------------------------------------------------------+
| Early...Wise                                                            |
+-------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

-- Here is an example showing 2 results, with the words close to each other.
select quote as "Early...Bed" from quotes
  where match(quote) against ('"early bed" @20' in boolean mode);
+---------------------------------------------------------------------------+
| Early...Bed                                                               |
+---------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise.   |
| Early to rise and early to bed makes a male healthy and wealthy and dead. |
+---------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Relevance Ranking

The relevance ranking is fairly basic, derived from word frequencies within each document and the search data overall. Typically, you would only ORDER BY this value for very simplistic searches of small documents; for any important search you would layer your own ranking logic on top, perhaps with the MySQL relevance value as one factor in the overall rank.

-- Get the relevance of a single word.
select substr(quote,1,20) as "And",
  match(quote) against ('and' in natural language mode) as "Relevance"
  from quotes order by "Relevance" desc;
+----------------------+--------------------+
| And                  | Relevance          |
+----------------------+--------------------+
| Fourscore and seven  | 0.0906190574169159 |
| All those years ago. |                  0 |
| Then 10 years ago th |                  0 |
| Early to bed and ear | 0.1812381148338318 |
| Early to rise and ea | 0.2718571722507477 |
| 1500 hundred years a |                  0 |
+----------------------+--------------------+
6 rows in set (0.00 sec)

Transactions

The key idea behind bringing full-text search to InnoDB tables is to make this feature compatible with transactions, so that you can include full-text columns alongside other columns in tables in ways that make sense in terms of schema design, and multiple sessions can update the full-text column data (and/or other columns in the table) simultaneously. The full-text data doesn’t have to be treated as read-only or read-mostly.

As mentioned in Jimmy’s blog post, the table structures that manipulate the full-text data behind the scenes are only updated at COMMIT time. So make sure to insert or update full-text data in one transaction, commit, and then run any full-text queries in a subsequent transaction. (Actually, in the examples below, it looks like the data is taken out of the full-text results as soon as a DELETE is issued, then comes back if the deletion is rolled back. I think that is explained in Jimmy’s blog post by the discussion about the delete-marking optimization to avoid huge updates to the full-text index for deleted data.)

drop table if exists quotes_uncommitted;

create table quotes_uncommitted
  (
      author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
    , primary key (author, quote(128))
  );

-- We insert but don't immediately commit.
insert into quotes_uncommitted select author, quote, source from quotes;
-- Within the same transaction, a full-text search does not see the uncommitted data.
select count(author), author as "Uncommitted Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------+
| count(author) | Uncommitted Results |
+---------------+---------------------+
|             0 | NULL                |
+---------------+---------------------+
1 row in set (0.00 sec)

-- If the newly inserted rows are rolled back...
rollback;
-- ...then the full-text search still doesn't see them.
select count(author), author as "Rolled-Back Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------+
| count(author) | Rolled-Back Results |
+---------------+---------------------+
|             0 | NULL                |
+---------------+---------------------+
1 row in set (0.00 sec)

-- OK, let's start with some committed data in the table, then empty the table,
-- and then try some FTS queries
-- both before and after the commit.
insert into quotes_uncommitted select author, quote, source from quotes;
commit;
delete from quotes_uncommitted;
select count(author), author as "Deleted but still not committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------------------+
| count(author) | Deleted but still not committed |
+---------------+---------------------------------+
|             0 | NULL                            |
+---------------+---------------------------------+
1 row in set (0.00 sec)

rollback;
select count(author), author as "Deleted and rolled back" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-------------------------+
| count(author) | Deleted and rolled back |
+---------------+-------------------------+
|             4 | Abraham Lincoln         |
+---------------+-------------------------+
1 row in set (0.00 sec)

delete from quotes_uncommitted;
commit;
select count(author), author as "Deleted and committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-----------------------+
| count(author) | Deleted and committed |
+---------------+-----------------------+
|             0 | NULL                  |
+---------------+-----------------------+
1 row in set (0.00 sec)

insert into quotes_uncommitted select author, quote, source from quotes;
commit;
truncate table quotes_uncommitted;
select count(author), author as "Truncated" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-----------+
| count(author) | Truncated |
+---------------+-----------+
|             0 | NULL      |
+---------------+-----------+
1 row in set (0.00 sec)

Multi-Column Searches

Although you can only have one FULLTEXT index in an InnoDB table, that index can apply to multiple columns, allowing you to search when you aren’t sure which column contains the term. With a multi-column index, we can MATCH() against all the columns to find words that appear in any of those columns. Always reference all the same columns in the MATCH() clause as in the FULLTEXT index definition, because the information about which column the words appear in is not included in the full-text search data.

drop table if exists quotes_multi_col;

create table quotes_multi_col
  (
    id int unsigned auto_increment primary key
    , author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(author, quote, source)
  );

insert into quotes_multi_col select * from quotes;
commit;

select author as "Poor 1 (NL)", substr(quote,1,15) as "Poor 2 (NL)", source as "Poor 3 (NL)" from
  quotes_multi_col where match(author, quote, source)
  against ('poor' in natural language mode);
+-------------------+-----------------+-------------------------+
| Poor 1 (NL)       | Poor 2 (NL)     | Poor 3 (NL)             |
+-------------------+-----------------+-------------------------+
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
+-------------------+-----------------+-------------------------+
1 row in set (0.00 sec)

select author as "Poor 1 (BOOL)", substr(quote,1,15) as "Poor 2 (BOOL)", source as "Poor 3 (BOOL)"
  from quotes_multi_col where match(author, quote, source)
  against ('poor' in boolean mode);
+-------------------+-----------------+-------------------------+
| Poor 1 (BOOL)     | Poor 2 (BOOL)   | Poor 3 (BOOL)           |
+-------------------+-----------------+-------------------------+
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
+-------------------+-----------------+-------------------------+
1 row in set (0.00 sec)

select author as "Clarke 1 (NL)", substr(quote,1,15) as "Clarke 2 (NL)", source as "Clarke 3 (NL)"
  from quotes_multi_col where match(author, quote, source)
  against ('clarke' in natural language mode);
+------------------+-----------------+--------------------------------+
| Clarke 1 (NL)    | Clarke 2 (NL)   | Clarke 3 (NL)                  |
+------------------+-----------------+--------------------------------+
| Arthur C. Clarke | Then 10 years a | 2010: The Year We Make Contact |
+------------------+-----------------+--------------------------------+
1 row in set (0.00 sec)

Interaction with Other Indexes

Remember that the design of your primary key index and secondary indexes is a big factor in query performance for InnoDB tables.

  • You can include parts (prefixes) of the full-text column(s) within the primary key.
  • However, that might not be a good idea if (a) the associated columns will ever be updated — which causes an expensive reorganization within the InnoDB table, or (b) if the table will have any other secondary indexes — the primary key values for a row are duplicated in the entry for that row in every secondary index, making index operations require more I/O and memory.
  • As mentioned in Jimmy’s blog post, adding the FULLTEXT index to the table is going to create a new column and associated index in the original table, so you could set up the column and index ahead of time, to avoid table reorganization later.
  • You can use the unique constraint of the primary key or a UNIQUE index to prevent duplicate values or combinations of values from being entered.
  • You can use the not-null constraint of the primary key to prevent blank values or combinations of values from being entered.
  • For the Labs release, the InnoDB FULLTEXT processing isn’t integrated with the MySQL optimizer and its estimates for which index is best to use, so don’t draw conclusions about performance characteristics from this early preview.

Stopwords

Stopwords are typically short, commonly used words that you designate as not significant for a search. They are left out of the FULLTEXT index and ignored when entered in FULLTEXT queries. For example, a search for ‘the’ is unsuccessful because it’s in the default stopword list. For your own customized search, you might create a bigger list (say, with common words from several languages) or a smaller one (for example, a music or movie site where words such as “The” in names and titles are significant). The details about customizing the stopword list are in Jimmy’s blog post.

select count(*), author as "Stopword 1", quote as "Stopword 2", source as "Stopword 3"
  from quotes_multi_col
  where match(author, quote, source) against ('the' in natural language mode);
+----------+------------+------------+------------+
| count(*) | Stopword 1 | Stopword 2 | Stopword 3 |
+----------+------------+------------+------------+
|        0 | NULL       | NULL       | NULL       |
+----------+------------+------------+------------+

PlanetMySQL Voting: Vote UP / Vote DOWN

InnoDB Full-Text Search Tutorial

Июль 27th, 2011

The InnoDB full-text search capability is an exciting feature. The full-text search itself is generally useful to have in an RDBMS. If an application is using all InnoDB tables except for one that is used for full-text searches, now that last table can be switched to InnoDB. If putting the full-text data in a MyISAM table led to scalability problems, duplication, or a less-than-ideal schema design, now those issues can be addressed.

In this post, I’ll take you through some of the basics of setting up and querying an InnoDB FULLTEXT search index. I’ll leave the scalability and performance aspects to Jimmy’s and Vinay’s blog posts, and just use some toy-sized data for demonstration purposes.

Creating a Table with a Full-Text Search Index

The key component of this feature is an index of type FULLTEXT, applied to one or more columns of an InnoDB table.

In Jimmy’s post, he mentions some scalability considerations where you might create the table (including a special FTS_DOC_ID column), load the data, then create the FULLTEXT index afterward. For simplicity (and since the data volume is so small), I’ll create the table with the index in place, then load the data afterward.

use test;
-- We will do some commits and rollbacks to demonstrate transactional features.
-- So turn off the default setting that commits immediately after each statement.
set autocommit=0;

drop table if exists quotes;
-- In 5.5 and above, by default this table is an InnoDB table.
-- The full-text search feature lets us define the FULLTEXT index.

create table quotes
  (    id int unsigned auto_increment primary key
    , author varchar(64)    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
  );

-- Get some words and phrases to search for into the table.
insert into quotes (author, quote, source) values
  ('Abraham Lincoln', 'Fourscore and seven years ago...',
  'Gettysburg Address')
, ('George Harrison', 'All those years ago...',
  'Live In Japan')
, ('Arthur C. Clarke', 'Then 10 years ago the monolith was discovered.',
  '2010: The Year We Make Contact')
, ('Benjamin Franklin',
  'Early to bed and early to rise, makes a man healthy, wealthy, and wise.',
  'Poor Richard''s Almanack')
, ('James Thurber',
  'Early to rise and early to bed makes a male healthy and wealthy and dead.',
  'The New Yorker')
, ('K', '1500 hundred years ago, everybody knew that the Earth was the center of the universe.',
  'Men in Black')
;

-- Since this is an InnoDB table, we are mindful of transactions.
commit;

Word and Phrase Search – Natural Language Mode

Once the data is loaded and committed, you can run queries using the MATCH(columns) AGAINST (search expression) operator to do the actual searches. You can combine this operator with all the usual WHERE and similar clauses in the SELECT statement.

The simplest kind of search is to find a single word, or a phrase with all words in exact order. For this type of search, use the IN NATURAL LANGUAGE clause inside the AGAINST() call. This technique typically involves a user-entered string that you pass verbatim to the query (of course, after escaping any quotation marks or other special characters to prevent SQL injection attacks).

-- Search for a single word.
select author as "Monolith" from quotes
  where match(quote) against ('monolith' in natural language mode);
+------------------+
| Monolith         |
+------------------+
| Arthur C. Clarke |
+------------------+
1 row in set (0.01 sec)

select author as "Ago" from quotes
  where match(quote) against ('ago' in natural language mode);
+------------------+
| Ago              |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

-- Search for a phrase.
select author as "Years Ago" from quotes
  where match(quote) against ('years ago' in natural language mode);
+------------------+
| Years Ago        |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

AND / OR / NOT Operators – Boolean Mode

For more complicated searches, you can have multiple words and phrases and search for different combinations of optional and required terms, not necessarily in the same order. This technique typically involves several data values that you query from elsewhere, or splitting apart a user-entered string and applying your own rules to the words and phrases inside.

-- Search for a combination of words, not in the same order as the original.
select author as "Ago and Years" from quotes
  where match(quote) against ('+ago +years' in boolean mode);
+------------------+
| Ago and Years    |
+------------------+
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
+------------------+
4 rows in set (0.00 sec)

-- Search for other Boolean combinations of words.
select author as "Fourscore or Monolith" from quotes
  where match(quote) against ('fourscore monolith' in boolean mode);
+-----------------------+
| Fourscore or Monolith |
+-----------------------+
| Abraham Lincoln       |
| Arthur C. Clarke      |
+-----------------------+
2 rows in set (0.00 sec)

select author as "Years and not Monolith" from quotes
  where match(quote) against ('+years -monolith' in boolean mode);
+------------------------+
| Years and not Monolith |
+------------------------+
| Abraham Lincoln        |
| George Harrison        |
| K                      |
+------------------------+
3 rows in set (0.00 sec)

Proximity Search

Proximity search is a special case of Boolean search using the @ operator within the AGAINST() string. You supply 2 or more words, double-quoted, within the single-quoted AGAINST() string, followed by @distance to specify how far apart these words can be. The distance represents the maximum number of bytes between the starting points of all these words.

-- The starting points for these words are too far apart
-- (not within 20 bytes), so no results.
select quote as "Too Far Apart" from quotes
  where match(quote) against ('"early wise" @20' in boolean mode);
Empty set (0.00 sec)

-- But the starting points of all words are within 100 bytes,
-- so this query does give results.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @100' in boolean mode);
+-------------------------------------------------------------------------+
| Early...Wise                                                            |
+-------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

-- In this case, the smallest distance that produces results is 49.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @49' in boolean mode);
+-------------------------------------------------------------------------+
| Early...Wise                                                            |
+-------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

-- Here is an example showing 2 results, with the words close to each other.
select quote as "Early...Bed" from quotes
  where match(quote) against ('"early bed" @20' in boolean mode);
+---------------------------------------------------------------------------+
| Early...Bed                                                               |
+---------------------------------------------------------------------------+
| Early to bed and early to rise, makes a man healthy, wealthy, and wise.   |
| Early to rise and early to bed makes a male healthy and wealthy and dead. |
+---------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Relevance Ranking

The relevance ranking is fairly basic, derived from word frequencies within each document and the search data overall. Typically, you would only ORDER BY this value for very simplistic searches of small documents; for any important search you would layer your own ranking logic on top, perhaps with the MySQL relevance value as one factor in the overall rank.

-- Get the relevance of a single word.
select substr(quote,1,20) as "And",
  match(quote) against ('and' in natural language mode) as "Relevance"
  from quotes order by "Relevance" desc;
+----------------------+--------------------+
| And                  | Relevance          |
+----------------------+--------------------+
| Fourscore and seven  | 0.0906190574169159 |
| All those years ago. |                  0 |
| Then 10 years ago th |                  0 |
| Early to bed and ear | 0.1812381148338318 |
| Early to rise and ea | 0.2718571722507477 |
| 1500 hundred years a |                  0 |
+----------------------+--------------------+
6 rows in set (0.00 sec)

Transactions

The key idea behind bringing full-text search to InnoDB tables is to make this feature compatible with transactions, so that you can include full-text columns alongside other columns in tables in ways that make sense in terms of schema design, and multiple sessions can update the full-text column data (and/or other columns in the table) simultaneously. The full-text data doesn’t have to be treated as read-only or read-mostly.

As mentioned in Jimmy’s blog post, the table structures that manipulate the full-text data behind the scenes are only updated at COMMIT time. So make sure to insert or update full-text data in one transaction, commit, and then run any full-text queries in a subsequent transaction. (Actually, in the examples below, it looks like the data is taken out of the full-text results as soon as a DELETE is issued, then comes back if the deletion is rolled back. I think that is explained in Jimmy’s blog post by the discussion about the delete-marking optimization to avoid huge updates to the full-text index for deleted data.)

drop table if exists quotes_uncommitted;

create table quotes_uncommitted
  (
      author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
    , primary key (author, quote(128))
  );

-- We insert but don't immediately commit.
insert into quotes_uncommitted select author, quote, source from quotes;
-- Within the same transaction, a full-text search does not see the uncommitted data.
select count(author), author as "Uncommitted Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------+
| count(author) | Uncommitted Results |
+---------------+---------------------+
|             0 | NULL                |
+---------------+---------------------+
1 row in set (0.00 sec)

-- If the newly inserted rows are rolled back...
rollback;
-- ...then the full-text search still doesn't see them.
select count(author), author as "Rolled-Back Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------+
| count(author) | Rolled-Back Results |
+---------------+---------------------+
|             0 | NULL                |
+---------------+---------------------+
1 row in set (0.00 sec)

-- OK, let's start with some committed data in the table, then empty the table,
-- and then try some FTS queries
-- both before and after the commit.
insert into quotes_uncommitted select author, quote, source from quotes;
commit;
delete from quotes_uncommitted;
select count(author), author as "Deleted but still not committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+---------------------------------+
| count(author) | Deleted but still not committed |
+---------------+---------------------------------+
|             0 | NULL                            |
+---------------+---------------------------------+
1 row in set (0.00 sec)

rollback;
select count(author), author as "Deleted and rolled back" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-------------------------+
| count(author) | Deleted and rolled back |
+---------------+-------------------------+
|             4 | Abraham Lincoln         |
+---------------+-------------------------+
1 row in set (0.00 sec)

delete from quotes_uncommitted;
commit;
select count(author), author as "Deleted and committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-----------------------+
| count(author) | Deleted and committed |
+---------------+-----------------------+
|             0 | NULL                  |
+---------------+-----------------------+
1 row in set (0.00 sec)

insert into quotes_uncommitted select author, quote, source from quotes;
commit;
truncate table quotes_uncommitted;
select count(author), author as "Truncated" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
+---------------+-----------+
| count(author) | Truncated |
+---------------+-----------+
|             0 | NULL      |
+---------------+-----------+
1 row in set (0.00 sec)

Multi-Column Searches

Although you can only have one FULLTEXT index in an InnoDB table, that index can apply to multiple columns, allowing you to search when you aren’t sure which column contains the term. With a multi-column index, we can MATCH() against all the columns to find words that appear in any of those columns. Always reference all the same columns in the MATCH() clause as in the FULLTEXT index definition, because the information about which column the words appear in is not included in the full-text search data.

drop table if exists quotes_multi_col;

create table quotes_multi_col
  (
    id int unsigned auto_increment primary key
    , author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(author, quote, source)
  );

insert into quotes_multi_col select * from quotes;
commit;

select author as "Poor 1 (NL)", substr(quote,1,15) as "Poor 2 (NL)", source as "Poor 3 (NL)" from
  quotes_multi_col where match(author, quote, source)
  against ('poor' in natural language mode);
+-------------------+-----------------+-------------------------+
| Poor 1 (NL)       | Poor 2 (NL)     | Poor 3 (NL)             |
+-------------------+-----------------+-------------------------+
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
+-------------------+-----------------+-------------------------+
1 row in set (0.00 sec)

select author as "Poor 1 (BOOL)", substr(quote,1,15) as "Poor 2 (BOOL)", source as "Poor 3 (BOOL)"
  from quotes_multi_col where match(author, quote, source)
  against ('poor' in boolean mode);
+-------------------+-----------------+-------------------------+
| Poor 1 (BOOL)     | Poor 2 (BOOL)   | Poor 3 (BOOL)           |
+-------------------+-----------------+-------------------------+
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
+-------------------+-----------------+-------------------------+
1 row in set (0.00 sec)

select author as "Clarke 1 (NL)", substr(quote,1,15) as "Clarke 2 (NL)", source as "Clarke 3 (NL)"
  from quotes_multi_col where match(author, quote, source)
  against ('clarke' in natural language mode);
+------------------+-----------------+--------------------------------+
| Clarke 1 (NL)    | Clarke 2 (NL)   | Clarke 3 (NL)                  |
+------------------+-----------------+--------------------------------+
| Arthur C. Clarke | Then 10 years a | 2010: The Year We Make Contact |
+------------------+-----------------+--------------------------------+
1 row in set (0.00 sec)

Interaction with Other Indexes

Remember that the design of your primary key index and secondary indexes is a big factor in query performance for InnoDB tables.

  • You can include parts (prefixes) of the full-text column(s) within the primary key.
  • However, that might not be a good idea if (a) the associated columns will ever be updated — which causes an expensive reorganization within the InnoDB table, or (b) if the table will have any other secondary indexes — the primary key values for a row are duplicated in the entry for that row in every secondary index, making index operations require more I/O and memory.
  • As mentioned in Jimmy’s blog post, adding the FULLTEXT index to the table is going to create a new column and associated index in the original table, so you could set up the column and index ahead of time, to avoid table reorganization later.
  • You can use the unique constraint of the primary key or a UNIQUE index to prevent duplicate values or combinations of values from being entered.
  • You can use the not-null constraint of the primary key to prevent blank values or combinations of values from being entered.
  • For the Labs release, the InnoDB FULLTEXT processing isn’t integrated with the MySQL optimizer and its estimates for which index is best to use, so don’t draw conclusions about performance characteristics from this early preview.

Stopwords

Stopwords are typically short, commonly used words that you designate as not significant for a search. They are left out of the FULLTEXT index and ignored when entered in FULLTEXT queries. For example, a search for ‘the’ is unsuccessful because it’s in the default stopword list. For your own customized search, you might create a bigger list (say, with common words from several languages) or a smaller one (for example, a music or movie site where words such as “The” in names and titles are significant). The details about customizing the stopword list are in Jimmy’s blog post.

select count(*), author as "Stopword 1", quote as "Stopword 2", source as "Stopword 3"
  from quotes_multi_col
  where match(author, quote, source) against ('the' in natural language mode);
+----------+------------+------------+------------+
| count(*) | Stopword 1 | Stopword 2 | Stopword 3 |
+----------+------------+------------+------------+
|        0 | NULL       | NULL       | NULL       |
+----------+------------+------------+------------+

PlanetMySQL Voting: Vote UP / Vote DOWN

TaskFreak! v0.6.2 – Alter Search Plugin

Июль 15th, 2010

Background Knowledge


The Search Plugin for TaskFreak! created by DaDaemon and xdu v0.0.1 (March 26, 2007) was designed to create a simple, quick search capability of the tasks title and description. As well it only searched through he current task view (tasks visible at the time) and tasks that are not completed. For some this was not what was desired and would rather have the Search Plugin search through all tasks weather completed or not and as well search through the comments of tasks along with the title and description. I’ll show you how this is done using Searcher, bchristie and davidlmansfield instructions posted on the TaskFreak! Forums.

Solution – Add the Ability to Search All Tasks


Edit the “index.php” located in the root of TaskFreak! as follows. We will be working in the “Load Tasks” section to just above the “Task Order” section. This solution was posted by Searcher at Re: Quick ‘n’ Dirty Search Plugin.

Code Before
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
$arrFilters = array();
 
// --- filter: project ---
if ($pProject) {
    // load tasks for specific project
	$sqlFilter = 'ii.projectId = \''.$pProject.'\'';
    $pLink=Tzn::concatUrl($pLink,'sProject='.$pProject);
} else if (!$objUser->checkLevel(6)) {
    // user can only access his own projects
    if ($objUserProjectList->rMore()) {
        $arrProject = array();
        while($objTmp = $objUserProjectList->rNext()) {
            $arrProject[] = $objTmp->id;
        }
        if ($objUser->checkLevel(13)) {
        	$arrProject[] = '0';
        }
        $sqlFilter = 'ii.projectId IN ('.implode(',',$arrProject).')';
 
        unset($arrProject);
        $objUserProjectList->rReset();
    }
}
 
$objItemList->addWhere($sqlFilter);
 
// --- filter: user ---
$pUser = intval($_REQUEST['sUser']);
if (!isset($_REQUEST['sUser']) && @constant('FRK_DEFAULT_VIEW_OWN_TASKS')) {
	// default view is own tasks only
	$pUser = $objUser->id;
}
 
if($_REQUEST['sUser'] == 'myprojects' && $_SESSION['selUser']){
   $objItemList->addWhere('ii.authorId='.$_SESSION['selUser']);
}
elseif ($pUser) {
	$objItemList->addWhere('ii.memberId = \''.$pUser.'\'');
    $_SESSION['selUser'] = $pUser;
    $pDefaultUserId = $pUser;
} else {
    unset($_SESSION['selUser']);
    session_unregister('selUser');
    $pDefaultUserId = $objUser->id;
    // by default, show own tasks
    /*
    if (!$objUser->checkLevel(1)) { // admin can see all
	    $objItemList->addWhere('(ii.memberId='.$objUser->id
    		.' OR ii.authorId='.$objUser->id.')');
    }
    */
}
 
$pLink=Tzn::concatUrl($pLink,'sUser='.$pUser);
 
// --- private tasks --------------------------------------------------------
 
$arrFilter[] = 'showPrivate=0';
 
if ($objUser->checkLevel(12)) {
	// show internal tasks
	$arrFilter[] = 'showPrivate=1';
}
 
$arrFilter[] = '(showPrivate=2 AND (ii.memberId='.$objUser->id
	.' OR ii.authorId='.$objUser->id.'))';
 
$objItemList->addWhere('('.implode(' OR ',$arrFilter).')');
 
// --- filter: context ---
 
if ($_REQUEST['sContext']) {
	$pContext = Tzn::getHttp($_REQUEST['sContext'],true);
	$objItemList->addWhere('context = \''.$pContext.'\'');
    $pLink=Tzn::concatUrl($pLink,'sContext='.$pContext);
}
 
$sqlFilter = '';
$pShow = ($_REQUEST['show'])?$_REQUEST['show']:'today';
$pLink=Tzn::concatUrl($pLink,'show='.$pShow);
 
$pKeepNoDead = intval(@constant('FRK_NO_DEADLINE_KEEP') -1) * 86400;
 
// --- Filter per date -----------------------------------------------------
 
switch ($pShow) {
	case 'all':
		break;
	case 'future':
		// show coming tasks and late tasks (undone only)
		$sqlFilter = '((deadlineDate >= \''
			.strftime(TZN_DATE_SQL,PRJ_DTE_NOW).'\' AND statusKey < '
			.FRK_STATUS_LEVELS.')'.' OR statusKey < '.FRK_STATUS_LEVELS.')';
        // show uncompleted tasks with no deadline
		$sqlFilter .= ' OR (deadlineDate = \'9999-00-00\' AND statusKey < '
			.FRK_STATUS_LEVELS.')';
		break;
	case 'past':
		// show past tasks and already done
		$sqlFilter = '(deadlineDate < \''
			.strftime(TZN_DATE_SQL,PRJ_DTE_NOW).'\' OR statusKey = '
			.FRK_STATUS_LEVELS.')';
		break;
	case 'today':
		// show all future tasks (done + undone) and late tasks
		$pKeepNoDead = intval(@constant('FRK_NO_DEADLINE_KEEP') -1) * 86400;
		$sqlFilter = '(statusKey = '.FRK_STATUS_LEVELS.' AND statusDate > \''
			.gmdate('Y-m-d 00:00:00',time()-$pKeepNoDead).'\') ';
 
		// hide far future tasks ?
		$tmpFilter = '';
		if (@constant('FRK_DEFAULT_FAR_FUTURE_HIDE')) {
			$tmp = intval(FRK_DEFAULT_FAR_FUTURE_HIDE) * 86400;
			$tmpFilter .= 'deadlineDate < \''
				.gmdate('Y-m-d 00:00:00',time()+$tmp).'\'';
		}
 
		// show tasks with no deadline ?
		if (@constant('FRK_NO_DEADLINE_TOO')) {
			// yes
			if ($tmpFilter) {
				$sqlFilter .= 'OR ('.$tmpFilter
					.' OR deadlineDate = \'9999-00-00\')'
					. ' AND statusKey < '.FRK_STATUS_LEVELS;
			} else {
				$sqlFilter .= ' OR statusKey < '.FRK_STATUS_LEVELS;
			}
		} else {
			// don't show uncompleted non planned tasks
			if ($tmpFilter) {
				$sqlFilter .= ' OR ('.$tmpFilter.' AND statusKey < '
    	        	.FRK_STATUS_LEVELS.')';
			} else {
	            $sqlFilter .= ' OR (deadlineDate <> \'9999-00-00\' AND statusKey < '
    	        	.FRK_STATUS_LEVELS.')';
			}
		}
 
        if (@constant('FRK_DEFAULT_CURRENT_TASKS')) {
            $objItemList->setPagination(FRK_DEFAULT_CURRENT_TASKS);
        }
		break;
	default:
		break;
}
 
// -TODO- Add filter current project only (no completed, no cancelled)
 
// echo '<p>&</p><p>-</p><p>-</p>'.$sqlFilter;
 
if ($sqlFilter) {
	$objItemList->addDateFilter($sqlFilter);
}
 
// search filter
	$pSearch = ($_REQUEST['search']);
	if ($pSearch) {
		$sqlFilter = '(ii.title LIKE \'%'.$pSearch.'%\' OR ii.description LIKE \'%'.$pSearch.'%\')';
		$objItemList->addWhere($sqlFilter);
	}
Code After
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
$pSearch = ($_REQUEST['search']);
if ($pSearch) {
	$sqlFilter = '(ii.title LIKE \'%'.$pSearch.'%\' OR ii.description LIKE \'%'.$pSearch.'%\')';
	$objItemList->addWhere($sqlFilter);
}
else
{
	$arrFilters = array();
 
	// --- filter: project ---
	if ($pProject) {
	    // load tasks for specific project
		$sqlFilter = 'ii.projectId = \''.$pProject.'\'';
	    $pLink=Tzn::concatUrl($pLink,'sProject='.$pProject);
	} else if (!$objUser->checkLevel(6)) {
	    // user can only access his own projects
	    if ($objUserProjectList->rMore()) {
	        $arrProject = array();
	        while($objTmp = $objUserProjectList->rNext()) {
	            $arrProject[] = $objTmp->id;
	        }
	        if ($objUser->checkLevel(13)) {
	        	$arrProject[] = '0';
	        }
	        $sqlFilter = 'ii.projectId IN ('.implode(',',$arrProject).')';
 
	        unset($arrProject);
	        $objUserProjectList->rReset();
	    }
	}
 
	$objItemList->addWhere($sqlFilter);
 
	// --- filter: user ---
	$pUser = intval($_REQUEST['sUser']);
	if (!isset($_REQUEST['sUser']) && @constant('FRK_DEFAULT_VIEW_OWN_TASKS')) {
		// default view is own tasks only
		$pUser = $objUser->id;
	}
 
	if($_REQUEST['sUser'] == 'myprojects' && $_SESSION['selUser']){
	   $objItemList->addWhere('ii.authorId='.$_SESSION['selUser']);
	}
	elseif ($pUser) {
		$objItemList->addWhere('ii.memberId = \''.$pUser.'\'');
	    $_SESSION['selUser'] = $pUser;
	    $pDefaultUserId = $pUser;
	} else {
	    unset($_SESSION['selUser']);
	    session_unregister('selUser');
	    $pDefaultUserId = $objUser->id;
	    // by default, show own tasks
	    /*
	    if (!$objUser->checkLevel(1)) { // admin can see all
		    $objItemList->addWhere('(ii.memberId='.$objUser->id
	    		.' OR ii.authorId='.$objUser->id.')');
	    }
	    */
	}
 
	$pLink=Tzn::concatUrl($pLink,'sUser='.$pUser);
 
	// --- private tasks --------------------------------------------------------
 
	$arrFilter[] = 'showPrivate=0';
 
	if ($objUser->checkLevel(12)) {
		// show internal tasks
		$arrFilter[] = 'showPrivate=1';
	}
 
	$arrFilter[] = '(showPrivate=2 AND (ii.memberId='.$objUser->id
		.' OR ii.authorId='.$objUser->id.'))';
 
	$objItemList->addWhere('('.implode(' OR ',$arrFilter).')');
 
	// --- filter: context ---
 
	if ($_REQUEST['sContext']) {
		$pContext = Tzn::getHttp($_REQUEST['sContext'],true);
		$objItemList->addWhere('context = \''.$pContext.'\'');
	    $pLink=Tzn::concatUrl($pLink,'sContext='.$pContext);
	}
 
	$sqlFilter = '';
	$pShow = ($_REQUEST['show'])?$_REQUEST['show']:'today';
	$pLink=Tzn::concatUrl($pLink,'show='.$pShow);
 
	$pKeepNoDead = intval(@constant('FRK_NO_DEADLINE_KEEP') -1) * 86400;
 
	// --- Filter per date -----------------------------------------------------
 
	switch ($pShow) {
		case 'all':
			break;
		case 'future':
			// show coming tasks and late tasks (undone only)
			$sqlFilter = '((deadlineDate >= \''
				.strftime(TZN_DATE_SQL,PRJ_DTE_NOW).'\' AND statusKey < '
				.FRK_STATUS_LEVELS.')'.' OR statusKey < '.FRK_STATUS_LEVELS.')';
	        // show uncompleted tasks with no deadline
			$sqlFilter .= ' OR (deadlineDate = \'9999-00-00\' AND statusKey < '
				.FRK_STATUS_LEVELS.')';
			break;
		case 'past':
			// show past tasks and already done
			$sqlFilter = '(deadlineDate < \''
				.strftime(TZN_DATE_SQL,PRJ_DTE_NOW).'\' OR statusKey = '
				.FRK_STATUS_LEVELS.')';
			break;
		case 'today':
			// show all future tasks (done + undone) and late tasks
			$pKeepNoDead = intval(@constant('FRK_NO_DEADLINE_KEEP') -1) * 86400;
			$sqlFilter = '(statusKey = '.FRK_STATUS_LEVELS.' AND statusDate > \''
				.gmdate('Y-m-d 00:00:00',time()-$pKeepNoDead).'\') ';
 
			// hide far future tasks ?
			$tmpFilter = '';
			if (@constant('FRK_DEFAULT_FAR_FUTURE_HIDE')) {
				$tmp = intval(FRK_DEFAULT_FAR_FUTURE_HIDE) * 86400;
				$tmpFilter .= 'deadlineDate < \''
					.gmdate('Y-m-d 00:00:00',time()+$tmp).'\'';
			}
 
			// show tasks with no deadline ?
			if (@constant('FRK_NO_DEADLINE_TOO')) {
				// yes
				if ($tmpFilter) {
					$sqlFilter .= 'OR ('.$tmpFilter
						.' OR deadlineDate = \'9999-00-00\')'
						. ' AND statusKey < '.FRK_STATUS_LEVELS;
				} else {
					$sqlFilter .= ' OR statusKey < '.FRK_STATUS_LEVELS;
				}
			} else {
				// don't show uncompleted non planned tasks
				if ($tmpFilter) {
					$sqlFilter .= ' OR ('.$tmpFilter.' AND statusKey < '
	    	        	.FRK_STATUS_LEVELS.')';
				} else {
		            $sqlFilter .= ' OR (deadlineDate <> \'9999-00-00\' AND statusKey < '
	    	        	.FRK_STATUS_LEVELS.')';
				}
			}
 
	        if (@constant('FRK_DEFAULT_CURRENT_TASKS')) {
	            $objItemList->setPagination(FRK_DEFAULT_CURRENT_TASKS);
	        }
			break;
		default:
			break;
	}
 
	// -TODO- Add filter current project only (no completed, no cancelled)
 
	// echo '<p>&</p><p>-</p><p>-</p>'.$sqlFilter;
 
	if ($sqlFilter) {
		$objItemList->addDateFilter($sqlFilter);
	}
}

Solution – Add Ability to Also Search within Task Comments


Edit the “index.php” located in the root of TaskFreak! as follows. We will be working in just below the heading section of the “Load Tasks”. This solution was posted by davidlmansfield at [PATCH] [Search Plugin] include tasks matching in comment fields.

Code Before
39
40
41
42
43
44
// search filter
$pSearch = ($_REQUEST['search']);
if ($pSearch) {
	$sqlFilter = '(ii.title LIKE \'%'.$pSearch.'%\' OR ii.description LIKE \'%'.$pSearch.'%\')';
	$objItemList->addWhere($sqlFilter);
}
Code After
39
40
41
42
43
44
// search filter
$pSearch = ($_REQUEST['search']);
if ($pSearch) {
	$sqlFilter = '(ii.title LIKE \'%'.$pSearch.'%\' OR ii.description LIKE \'%'.$pSearch.'%\' OR ii.itemId in (select itemId from '.$objItemList->gTable('itemComment').' where body LIKE \'%'.$pSearch.'%\'))';
	$objItemList->addWhere($sqlFilter);
}

Solution – Stop SQL Injections


Edit the “index.php” located in the root of TaskFreak! as follows. We will be working in just below the heading section of the “Load Tasks”. This solution was posted by bchristie at Re: Quick ‘n’ Dirty Search Plugin.

Code Before
39
40
// search filter
$pSearch = ($_REQUEST['search']);
Code After
39
40
// search filter
$pSearch = isset($_REQUEST['search'])?mysql_real_escape_string($_REQUEST['search']):false;

PlanetMySQL Voting: Vote UP / Vote DOWN

Comparison Between Solr And Sphinx Search Servers (Solr Vs Sphinx – Fight!)

Сентябрь 3rd, 2009

In the past few weeks I've been implementing advanced search at Plaxo, working quite closely with Solr enterprise search server. Today, I saw this relatively detailed comparison between Solr and its main competitor Sphinx (full credit goes to StackOverflow user mausch who had been using Solr for the past 2 years). For those still confused, Solr and Sphinx are similar to MySQL FULLTEXT search, or for those even more confused, think Google (yeah, this is a bit of a stretch, I know).

Similarities

  • Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
  • Both have a long list of high-traffic sites using them (Solr, Sphinx)
  • Both offer commercial support. (Solr, Sphinx)
  • Both offer client API bindings for several platforms/languages (Sphinx, Solr)
  • Both can be distributed to increase speed and capacity (Sphinx, Solr)

Here are some differences

Related questions

Conclusion

In my experience, Solr is very-very fast on the query side. It is also very powerful. The indexing side is very CPU and memory intensive and is an unfortunate side effect of having such a feature-rich, fast application. Nevertheless, I highly recommend Solr.

For disclaimer purposes, I have not had much experience with Sphinx and, again, all credit for this comparison goes to mausch.

 
Similar Posts:Share/Bookmark
PlanetMySQL Voting: Vote UP / Vote DOWN

Handy MySQL documentation indexes

Август 21st, 2009

I just discovered today in the MySQL 5.1 Reference Manual a handy set of additional indexes in the System Navigation section.

  • Index
  • Standard Index
  • C Function Index
  • Command Index
  • Function Index
  • INFORMATION_SCHEMA Index
  • Transaction Isolation Level Index
  • JOIN Types Index
  • Operator Index
  • Option Index
  • Privileges Index
  • SQL Modes Index
  • Status Variable Index
  • Statement/Syntax Index
  • System Variable Index

Perhaps they have been around for some time and I’ve not noticed, but there are much better then searching when you know the content type as per the index list on what you are searching for.


PlanetMySQL Voting: Vote UP / Vote DOWN

Four short links: 7 August 2009

Август 7th, 2009

  1. Defragging the Stimulus -- each [recovery] site has its own silo of data, and no site is complete. What we need is a unified point of access to all sources of information: firsthand reports from Recovery.gov and state portals, commentary from StimulusWatch and MetaCarta, and more. Suggests that Recovery.gov should be the hub for this presently-decentralised pile of recovery data.
  2. Memetracker -- site accompanying the research written up by the New York Times as Researchers at Cornell, using powerful computers and clever algorithms, studied the news cycle by looking for repeated phrases and tracking their appearances on 1.6 million mainstream media sites and blogs [...] For the most part, the traditional news outlets lead and the blogs follow, typically by 2.5 hours [...] a relative handful of blog sites are the quickest to pick up on things that later gain wide attention on the Web. Confirming that blogs and traditional media have a symbiotic relationship, not a parasitic one. (via Stats article in NY Times)
  3. Feds at DefCon Alarmed After RFIDs Scanned (Wired) -- RFID badges make for convenient security, and for convenient attack. Black hats can read your security cards from 2 or 3 feet away, and few in government are aware of the attack vector. To help prevent surreptitious readers from siphoning RFID data, a company named DIFRWear was doing brisk business at DefCon selling leather Faraday-shielded wallets and passport holders lined with material that prevents readers from sniffing RFID chips in proximity cards.
  4. A Comparison of Open Source Search Engines and Indexing Twitter -- Detailed write-up of the open source search options and how they stack up on a pile of Tweets. While researching for the Software section, I was quite surprised by the number of open source vertical search solutions I found: Lucene (Nutch, Solr, Hounder), Sphinx, zettair, Terrier, Galago, Minnion, MG4J, Wumpus, RDBMS (mysql, sqlite), Indri, Xapian, grep … And I was even more surprised by the lack of comparisons between these solutions. Many of these platforms advertise their performance benchmarks, but they are in isolation, use different data sets, and seem to be more focused on speed as opposed to say relevance. (via joshua on Delicious)


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Librarian: Capturing Community Insights

Июль 28th, 2009

In the MySQL Community team, our charter is to serve the MySQL community — new and old MySQL users alike. One of the ways we do this is by facilitating information exchange between community members, where the new can learn from the old.

And there’s been lots of that information exchange going on, such as over mailing lists (in the early days the dominant vehicle), forums, and Planet MySQL.

One problem with this information exchange has been its ephemeral nature. The same questions pop up for many new users, and should they for some reason not be amongst the issues solved in the MySQL documentation, chances are you’ll have to know quite precisely what you’re looking for when coming up with your Google search phrases. Of course, browsing Planet MySQL is a great way of keeping up to date about how others are using MySQL — but what happened on the Planet six months ago, or a year ago? With currently 16940 (sixteen thousand ninehundred and forty) entries, you won’t by chance pop into the most interesting ones.

Enter the MySQL Librarian. The MySQL Librarian knows what the library contains. Ask the MySQL Librarian for what’s popular, and you’ll be answered.

How does the librarian know? Well, the librarian asks the library visitors (that’s people like you!) for what’s interesting, and whether it’s good.

For more on MySQL Librarian,

or

Currently, there are 217 entries in it. It contains what our user community think it is the-best-of-the-best from what the community has produced, and it’s tagged and voted for.

Thanks to Dups for coding this, to Giuseppe for leading the efforts and publishing it, and to everyone in the community who has already entered items into the library (at the same time not forgetting those who wrote the items in the first place).

Hence: Go take a look at the MySQL Librarian! Add new items to it (presentations, pictures, videos, articles, and books)! Tag items for easy searching! Vote, so others know what you liked!