Archive for the ‘fulltext’ Category

State of MySQL Search

Январь 14th, 2010
Recently I took part in the first meeting of the MySQL Search Team. You can read more about the team here.

My task was to represent the customer interests regarding fulltext search. My report had a few main points which I made:

  • MySQL Fulltext Search (FTS) has several big issues with it compared to other solutions. I have listed these in the order of importance that I see. Numbers 1 and 2 are very close in importance however, so I can see those going either way.

    1. MyISAM only: Limits the usefulness in many very busy environments due to table level locking and crash safety

    2. Performance: The design of FTS makes performance hurt as the index size exceeds RAM. Combining multiple second queries with MyISAM table level locking is a recipe for failure.

    3. Features: There are a lot of features lacking in FTS that exist in other products. Stemming, synonyms, dictionary usage, CJK support, etc... are all potential avenues of investigation.

  • Third-party solutions are commonly used for searching data in MySQL, but does require extra work to deploy or implement. Tools like Lucene, Sphinx, etc... are very good at what they do, and are normally faster and more fully featured compared to MySQL FTS. The big issue is that normally there is some effort involved in implementing these, so the ease of use in implementing and maintaining is lower than MySQL FTS.



So the forward looking question is how to combine the best attributes of these two solutions to get one easy to deploy and use robust solution. Personally, I can see a few possible routes forward:
  • Improve MySQL FTS! There is a lot of research and known best practices that exist for search. This would leverage those in order to make the MySQL solution more robust and featureful. From the current state of FTS, it would be a pretty decent amount of work, but could have good returns.
  • Improve the glue code between the third parties and MySQL. This could take a variety of forms. One idea I had was in the the form of index plugins where as rows are inserted/updated/deleted, the external datastores can be automatically updated as well. This is very similar to triggers, but is a bit different since there can be more integration and easier deployment.
It's possible the best solution may be a mix of the two options to allow for more flexibility or even something completely new that I haven't thought of!

Regardless, I am happy that people at MySQL are finally taking a look at the poor state of the MySQL fulltext search environment and trying to figure out new good solutions for this space.

PlanetMySQL Voting: Vote UP / Vote DOWN

Comparison Between Solr And Sphinx Search Servers (Solr Vs Sphinx – Fight!)

Сентябрь 3rd, 2009

In the past few weeks I've been implementing advanced search at Plaxo, working quite closely with Solr enterprise search server. Today, I saw this relatively detailed comparison between Solr and its main competitor Sphinx (full credit goes to StackOverflow user mausch who had been using Solr for the past 2 years). For those still confused, Solr and Sphinx are similar to MySQL FULLTEXT search, or for those even more confused, think Google (yeah, this is a bit of a stretch, I know).

Similarities

  • Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
  • Both have a long list of high-traffic sites using them (Solr, Sphinx)
  • Both offer commercial support. (Solr, Sphinx)
  • Both offer client API bindings for several platforms/languages (Sphinx, Solr)
  • Both can be distributed to increase speed and capacity (Sphinx, Solr)

Here are some differences

Related questions

Conclusion

In my experience, Solr is very-very fast on the query side. It is also very powerful. The indexing side is very CPU and memory intensive and is an unfortunate side effect of having such a feature-rich, fast application. Nevertheless, I highly recommend Solr.

For disclaimer purposes, I have not had much experience with Sphinx and, again, all credit for this comparison goes to mausch.

 
Similar Posts:Share/Bookmark
PlanetMySQL Voting: Vote UP / Vote DOWN