Archive for the ‘sphinx’ Category

How to offload MySQL server with Sphinx by Vladimir Fedorkov

Февраль 5th, 2012

Vladimir Fedorkov of Sphinx.

Presentation started out with a very nice presentation of candies to all the audience members.

What is Sphinx? Another (C++) daemon on your boxes. Can be queried via API (PHP, Python, etc.) or MySQL-compatible protocol and SQL queries (SphinxQL). Some query examples are in the slides, here’s one about SphinxSE in the KB.

MyISAM FTS is good but becomes slow with half a million documents. InnoDB has FTS now but he’s not tried it (and neither has anyone in the audience to see it compare with MyISAM FTS).

Geographical distance is the distance measuring the surface of the earth (two pairs of float values – latitude, longitude). In Sphinx, there is support for GEODIST(Lat,Long,Lat2,Long2) in Sphinx.

Segments are good for price ranges on a site, date ranges, etc. Use INTERVAL(field, x0, x1, …, xN).

Keep huge text collections out of the database. sql_field = path_to_file_text. Tell Sphinx to index text not from MySQL but out in the filesystem. Keep the metadata inside the database but keep the actual data outside of the database. max_file_field_buffer needs to be set properly.

You can do proximity search with Sphinx — find the words “hello world” within a ten word block, for example.

Resources: the documentation, a book by O’Reilly: Introduction to Search with Sphinx: From installation to relevance tuning (sold out at the FOSDEM O’Reilly booth!), and their community page including wiki, forum, etc.

Related posts:

  1. MySQL Full Text Search by Alex Rubin
  2. Sphinx user stories by Stéphane Varoqui
  3. Morning sessions at MySQL MiniConf


PlanetMySQL Voting: Vote UP / Vote DOWN

How to offload MySQL server with Sphinx by Vladimir Fedorkov

Февраль 5th, 2012

Vladimir Fedorkov of Sphinx.

Presentation started out with a very nice presentation of candies to all the audience members.

What is Sphinx? Another (C++) daemon on your boxes. Can be queried via API (PHP, Python, etc.) or MySQL-compatible protocol and SQL queries (SphinxQL). Some query examples are in the slides, here’s one about SphinxSE in the KB.

MyISAM FTS is good but becomes slow with half a million documents. InnoDB has FTS now but he’s not tried it (and neither has anyone in the audience to see it compare with MyISAM FTS).

Geographical distance is the distance measuring the surface of the earth (two pairs of float values – latitude, longitude). In Sphinx, there is support for GEODIST(Lat,Long,Lat2,Long2) in Sphinx.

Segments are good for price ranges on a site, date ranges, etc. Use INTERVAL(field, x0, x1, …, xN).

Keep huge text collections out of the database. sql_field = path_to_file_text. Tell Sphinx to index text not from MySQL but out in the filesystem. Keep the metadata inside the database but keep the actual data outside of the database. max_file_field_buffer needs to be set properly.

You can do proximity search with Sphinx — find the words “hello world” within a ten word block, for example.

Resources: the documentation, a book by O’Reilly: Introduction to Search with Sphinx: From installation to relevance tuning (sold out at the FOSDEM O’Reilly booth!), and their community page including wiki, forum, etc.

Related posts:

  1. MySQL Full Text Search by Alex Rubin
  2. Sphinx user stories by Stéphane Varoqui
  3. Morning sessions at MySQL MiniConf


PlanetMySQL Voting: Vote UP / Vote DOWN

Sphinx user stories by Stéphane Varoqui

Февраль 5th, 2012

Stephane Varoqui, Field Services SkySQL, Vlad Fedorkov, Director of PS, Sphinx Inc, Christophe Gesche, LAMP Expert, Delcampe, Herve Seignole, Web Architect, Groupe Pierre & Vacances Center Parcs – this is a big talk!

Pros: Filtering takes place on attributes in separate tables. Rely on the optimizer choice. HASH JOIN can help (MariaDB 5.3). Table elimination can help (MariaDB 5.2). ICP Index Condition Pushdown can help (MariaDB 5.3/MySQL 5.6). Max 80M documents at Pixmania, all queries come in less than 1s using 128GB of RAM (MariaDB 5.2). At PAP.fr, there is 16GB RAM with MariaDB 5.2.

Cons: CPU intensive (replication with many slaves). Need covering indexes to cover various !filter !order. Join & sorting cost on lazy filtering.

The more indexes you have in the system, the more you need to increase the main memory of the server. Keep the Btree’s in memory.

What about denormalized schemas? Not really CPU intensive, just IO. Can go to disk, full partition scan with filtering taking place on record order using covering index. Can shard but not that easy. Use the spider storage engine or shard-query. Can use memory engine for isolation. There are cons like duplicate data, duplicate indexes, missing material views, merge index cost, impact on write performance, and can consume a lot of memory with many indexes.

MySQL can push hardware, so read less/do less/read serialized/map reduce to get better latency. Chose data type wisely, replace string with numeric, vertical & horizontal sharing, snowflake compression (combination of attributes, build a table of the combination and replace it with an ID). If you are lazy, just use Sphinx!

Sphinx is just another daemon that can serve queries. Its easy to setup, easy to scale, storage engine makes it accessible to current MySQL users, API in core MariaDB (SphinxSE), SphinxQL, SphinxSE is transparent to the application layer of MySQL protocol.

Demo done using the Employees DB.

Pierre & Vacances – Centerparcs. Free text search, they use MariaDB using Levenshtein UDF implementation. Went live 01/2011. First implementation of Sphinx (12 indexes). Its grown, they use PHP API. The new goal is to never send an empty result. 1 index per website/market, with a total of 15 million docs. Index built on standalone server. Using internal gearman job schedule to generate index before cache generated. Current monitoring is via Nagios & perl, but the next step is to use Monyog & MariaDB INFORMATION_SCHEMA plugin.

Delcampe is an auction website with 45M ‘active’ items. Its dedicated for collectors. 3 string fields, and 15 attributes. 40-120K new items daily. Started with mysql fulltext in 2007, moved to Sphinx in 2008. There was a need to have more filters. Now they have 5 sphinx servers + 1 MySQL server. HAproxy to load balance.

Related posts:

  1. MySQL Full Text Search by Alex Rubin
  2. Abusing MySQL (& thoughts on NoSQL)
  3. OpenSUSE users have a choice of database now!


PlanetMySQL Voting: Vote UP / Vote DOWN

Using Jenkins to parse sphinx warnings

Октябрь 5th, 2011

At Percona, we’re now using sphinx for our documentation. We’re also using Jenkins for our  continuous integration. We have compiler warnings from GCC being parsed by Jenkins using the built in filters, but there isn’t one for the sphinx warnings.

Luckily, in the configuration page for Jenkins, the Warnings plugin allows you to specify your own filters. I’ve added the following filter to process warnings from sphinx:

For those who want to copy and paste:

Regex: ^(.*):(\d+): \((.*)\) (.*)

Mapping Script

import hudson.plugins.warnings.parser.Warning
String fileName = matcher.group(1)
String lineNumber = matcher.group(2)
String category = matcher.group(3)
String message = matcher.group(4)

return new Warning(fileName, Integer.parseInt(lineNumber), "sphinx", category, message);

Example log message: /home/stewart/percona-server/docs-5.1/doc/source/release-notes/Percona-Server-1.0.2-3.rst:67: (WARNING/2) Inline literal start-string without end-string.

Then I can select this filter from the job that builds (and publishes) our documentation and it shows up like any other compiler warnings. Neat!

TODO: get the intersphinx warnings also in there

TODO: fix the linkcheck target in Sphinx so that it’s easily parseable and can also be integrated.


PlanetMySQL Voting: Vote UP / Vote DOWN

A review of Introduction to Search with Sphinx by Andrew Aksyonoff

Август 29th, 2011

Introduction to Search with Sphinx

Introduction to Search with Sphinx

Introduction to Search with Sphinx by Andrew Aksyonoff, O’Reilly Media 2011. About 146 pages. (Here’s a link to the publisher’s page.)

This is an engaging short introduction to Sphinx. At 146 pages, you shouldn’t expect it to go into every detail, and it doesn’t. There are major topics that it omits entirely or mentions only tangentially, such as distributed searching across a cluster of machines, real-time updating of indexes and attributes, and so on. But although the book doesn’t boil the ocean, it does a great job at covering an introductory subset of Sphinx. It’s just the book you need to get rid of MyISAM full-text indexes and return MySQL to proper performance again.

This book is about how to get started with Sphinx: getting it up and running, basic configuration, creating and maintaining indexes, and querying with the SphinxAPI and SphinxQL languages. It covers relevance and ranking, so you can understand how Sphinx is different from most relevance engines and why it produces better results.

If you haven’t explored Sphinx, you should start with this book. If you know Sphinx, it might be redundant to you.

Either way, it’s a fun book to read. Andrew is a great writer with a charming sense of geeky humor, and he makes a lot of jokes in his examples. These don’t interfere with the learning, but they do make it a lot less dull. (It isn’t forced or overdone, either, unlike a lot of books that have to make constant Beatles references at every opportunity.)

Related posts:

  1. Two examples of Sphinx search at work
  2. Did you know Sphinx can act like a MySQL server?
  3. Sphinx 0.9.8 is released!
  4. Who needs better email search for Thunderbird?
  5. A review of MySQL 5.1 Plugin Development by Golubchik and Hutchings


PlanetMySQL Voting: Vote UP / Vote DOWN

SphinxSE with MySQL

Ноябрь 1st, 2010
Check out this SlideShare Presentation:

PlanetMySQL Voting: Vote UP / Vote DOWN

Sphinx new

Ноябрь 1st, 2010
Check out this SlideShare Presentation:

PlanetMySQL Voting: Vote UP / Vote DOWN

Sphinx & MySQL: facts and misconceptions

Сентябрь 2nd, 2010

Sphinx search is a full text search engine, commonly used with MySQL.

There are some misconceptions about Sphinx and its usage. Following is a list of some of Sphinx’ properties, hoping to answer some common questions.

  • Sphinx is not part of MySQL/Oracle.
  • It is a standalone server; an external application to MySQL.
  • Actually, it is not MySQL specific. It can work with other RDBMS: PostgreSQL, MS SQL Server.
  • And, although described as “free open-source SQL full-text search engine”, it is not SQL-specific: Sphinx can read documents from XML.
  • It is often described as “full text search for InnoDB”. This description is misleading. Sphinx indexes text; be it from any storage engine or external source. It solves, in a way, the issue of “FULLTEXT is only supported by MyISAM”. Essentially, it provided full-text indexing for InnoDB tables, but in a very different way than the way MyISAM’s FULLTEXT index works.

Sphinx works by reading documents, usually from databases. Considering the case of MySQL, Sphinx issues a SQL query which retrieves relevant data (mostly the text you want to index, but other properties allowed).

  • Being an external module, it does not update its indexes on the fly. So if 10 new rows are INSERTed, it has no knowledge of this. It must be called externally to re-read the data (or just read the new data), and re-index.
    • This is perhaps the greatest difference, functionality-wise, between Sphinx and MyISAM’s FULLTEXT. The latter is always updated, for every row INSERTed, DELETEd or UPDATEd. The latter also suffers by this property, as this makes for serious overhead with large volumes.
    • There’s more than one way to make that less of an issue. I’ll write some more in future posts.
  • Sphinx does not keep the text to itself; just the index. Sphinx cannot be asked “Give me the blog post content for those posts containing ‘open source’”.
    • Sphinx will only tell you the ID (i.e. Primary Key) for the row that matches your search.
    • It is up to you to then get the content from the table.
    • With SphinxSE (Sphinx Storage Engine for MySQL) this becomes easier, all-in-one query.
  • It can keep other numeric data. Such data can be used to filter results.
  • It provides with GROUP BY-like, as well as ORDER BY-like mechanism.
  • It allows for ordering results by relevance.
  • It allows for exact match search, boolean search, and more.
  • It has an API & implementation for popular programming languages: PHP, Python, Perl, Ruby, Java.

The above describes Sphinx as a general fulltext search engine for databases. It does, however, have special treatment for MySQL:

  • First and foremost, it knows how to query MySQL for data (duh!)
  • If you don’t mind compiling from source, you can rebuild MySQL with SphinxSE: a storage engine implementation. This storage engine does not actually hold any data, but rather provides an SQL-like interface to the search daemon.
    • Thus, you can query for search results using SELECT statements, JOINing to document tables, retrieving results, all in one step.
    • If you do mind compiling MySQL, be aware that MariaDB comes with SphinxSE built in in newer versions.
  • It implements the MySQL protocol. You can connect to the sphinx server using a MySQL client, and actually issue SQL statements to retrieve data. Not all SQL is supported. The valid subset is called SphinxQL.
  • Snippets (excerpts) are supported via MySQL UDF.

PlanetMySQL Voting: Vote UP / Vote DOWN

Expert PHP and MySQL published!

Апрель 15th, 2010
Expert PHP and MySQLI'm very pleased to announce the publication of my book Expert PHP and MySQL, published by Wrox. This book was written by myself, Andrew Curioso and Ronald Bradford. The short of it is, upon finishing my previous book, Developing Web Applications with Apache, MySQL, memcached, and Perl, Wiley asked me if I knew of anyone who would like to write a MySQL/PHP book. I had worked with Andrew at Lycos and found him to be a brilliant PHP developer, having been the primary developer of Lycos's Webon product-- which has some excellent usage of PHP, Javascript and MySQL. When I friend of mine Bob Wilkins, who started MyVBO, was looking for a developer I suggested Andrew (he now works at MyVBO), and for this book I also suggested Andrew. Andrew had also written a short book for O'Reilly on AJAX, so Wiley was glad to have had him as a suggestion. They came back and asked if I would lend a hand to which despite being exhausted from writing my first book, decided to contribute and work on some chapters that covered material that thought would be a great benefit to the community-- namely Gearman, Sphinx, the Memcached Functions for MySQL as well as Memcached itself. Finally, we needed someone else with MySQL expertise to add to the mix to cover more advanced MySQL information to which Ronald Bradford, who I know, like, and respect as an expert in the field, signed on also.

For me, I have been more of a Perl guy, however, I like PHP just as well and even prefer how it's web deployment model is much easier (you don't need to modify your httpd config files) was interested in a challenge of writing a book on a different language. Also, some fruits of this being that my project from my Perl book, Narada, now has a PHP port.

I'm proud of this book. It was harder in some ways to write a book with other people than do it all myself, despite far fewer pages. However, the end product took advantage of all our strengths.

Topics covered in this book are (but not limited to!):

* PHP and MySQL techniques every programmer should know
* Advanced PHP concepts such as using iterators, making classes behave like functions, using true lambda functions and closures
* MySQL storage engines
* Using the information schema
* Improving performance through caching - using memcached to add a caching layer to your application
* Writing UDFs
* Writing PHP extensions
* Using the Memcached Functions for MySQL (UDFs)
* Full text search - installing, configuring, and using Sphinx in a PHP application
* Multi-tasking in PHP using Gearman. Narada is used as an example application demonstrating how to put many of the concepts of the book into an application
* Using Apache rewrite rules
* HTTP-based authentication

All-in-all, this is a great book that I hope will benefit PHP/MySQL or any other web developers who need a source of expert information!

I want to thank Andrew, Ronald, Wiley (Bob Elliot, Maureen Spears), Trond Norbye and Eric Day (who tech-edited the book!), and all others who I could write a book in itself who have helped (see my credits in the book!). Also, I want to thank the team at NorthScale, my employer, for being able to work with a team experts who I could refer to while working on this book.

You can buy this book from any major publisher, and from Wiley at http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470563125.html

PlanetMySQL Voting: Vote UP / Vote DOWN

Slides from my Sphinx talk at RIT++ 2010

Апрель 14th, 2010

While the majority of Percona gang travelled to California for the MySQL event of the year, I headed in the opposite direction to Moscow for RIT++ 2010 conference where I presented a talk on Sphinx. You can get the PDF file here – Improving MySQL-based applications performance with Sphinx.

 

 

I have been invited to talk at Open Source Data Center Conference in Nürnberg, Germany in June this year, so I hope I can meet some of you there.


Entry posted by Maciej Dobrzanski | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


PlanetMySQL Voting: Vote UP / Vote DOWN