Preface
I can't quite believe it, but just 10 years ago there was no Google.
Other web search engines were around back then, such as AltaVista, HotBot, Inktomi, and AllTheWeb, among others. So the stunningly swift ascendance of Google can settle in my mind, given some effort. But what's even more unbelievable is that just 20 years ago there were no web search engines at all. That's only logical, because there was barely any Web! But it's still hardly believable today.
The world is rapidly changing. The volume of information available and the connection bandwidth that gives us access to that information grow substantially every year, making all the kinds (and volumes!) of data increasingly accessible. A 1-million-row database of geographical locations, which was mind-blowing 20 years ago, is now something a fourth-grader can quickly fetch off the Internet and play with on his netbook. But the processing rate at which human beings can consume information does not change much (and said fourth-grader would still likely have to read complex location names one syllable at a time). This inevitably transforms searching from something that only eggheads would ever care about to something that every single one of us has to deal with on a daily basis.
Where does this leave the application developers for whom this book is written? Searching changes from a high-end, optional feature to an essential functionality that absolutely has to be provided to end users. People trained by Google no longer expect a 50-component form with check boxes, radio buttons, drop-down lists, roll-outs, and every other bell and whistle that clutters an application GUI to the point where it resembles a Boeing 797 pilot deck. They now expect a simple, clean text search box.
But this simplicity is an illusion. A whole lot is happening under the hood of that text search box. There are a lot of different usage scenarios, too: web searching, vertical searching such as product search, local email searching, image searching, and other search types. And while a search system such as Sphinx relieves you from the implementation details of complex, low-level, full-text index and query processing, you will still need to handle certain high-level tasks.
How exactly will the documents be split into keywords? How will the queries that might need additional syntax (such as cats AND dogs) work? How do you implement matching that is more advanced than just exact keyword matching? How do you rank the results so that the text that is most likely to interest the reader will pop up near the top of a 200-result list, and how do you apply your business requirements to that ranking? How do you maintain the search system instance? Show nicely formatted snippets to the user? Set up a cluster when your database grows past the point where it can be handled on a single machine? Identify and fix bottlenecks if queries start working slowly? These are only a few of all the questions that come up during development, which only you and your team can answer because the choices are specific to your particular application.
This book covers most of the basic Sphinx usage questions that arise in practice. I am not aiming to talk about all the tricky bits and visit all the dark corners; because Sphinx is currently evolving so rapidly that even the online documentation lags behind the software, I don't think comprehensiveness is even possible. What I do aim to create is a practical field manual that teaches you how to use Sphinx from a basic to an advanced level.
Audience
I assume that readers have a basic familiarity with tools for system administrators and programmers, including the command line and simple SQL. Programming examples are in PHP, because of its popularity for website development.
Organization of This Book
This book consists of six chapters, organized as follows:
Chapter 1 lays out the types of search and the concepts you need to understand regarding the particular ways Sphinx conducts searches.
Chapter 2 tells you how to install and configure Sphinx, and run a few basic tests.
Chapter 3 shows you how to set up Sphinx indexing for either an SQL database or XML data, and includes some special topics such as handling different character sets.
Chapter 4 describes the syntax of search text, which can be exposed to the end user or generated from an application, and the effects of various search options.
Chapter 5 offers strategies, such as multi-index searching, for dealing with large data sets (which means nearly any real-life data set).
Chapter 6 gives you some guidelines for the crucial goal of presenting the best results to the user first.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, filenames, Unix utilities, and command-line options
Constant width
Indicates variables and other code elements, the contents of files, and the output from commands
Constant width bold
Shows commands or other text that should be typed literally by the user (such as the contents of full-text queries)
Constant width italic
Shows text that should be replaced with user-supplied values
Note
This icon signifies a tip, suggestion, or general note.
Using Code Examples
This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: Introduction to Search with Sphinx, by Andrew Aksyonoff. Copyright 2011 Andrew Aksyonoff, 978-0-596-80955-3.
If you feel your use of code examples falls outside fair use or the permission given here, feel free to contact us at .
We'd Like to Hear from You
Every example in this book has been tested on various platforms, but occasionally you may encounter problems. The information in this book has also been verified at each step of the production process. However, mistakes and oversights can occur and we will gratefully receive details of any you find, as well as any suggestions you would like to make for future editions. You can contact the authors and editors at:
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international or local)