Elasticsearch: The Definitive Guide
by Clinton Gormley and Zachary Tong
Copyright 2015 Elasticsearch. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Editors: Mike Loukides and Brian Anderson
- Production Editor: Shiny Kalapurakkel
- Proofreader: Sharon Wilkey
- Indexer: Ellen Troutman-Zaig
- Interior Designer: David Futato
- Cover Designer: Ellie Volkhausen
- Illustrator: Rebecca Demarest
- January 2015: First Edition
Revision History for the First Edition
- 2015-01-16: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781449358549 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Elasticsearch: The Definitive Guide, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.
While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-449-35854-9
[LSI]
Foreword
One of the most nerve-wracking periods when releasing the first version of an open source project occurs when the IRC channel is created. You are all alone, eagerly hoping and wishing for the first user to come along. I still vividly remember those days.
One of the first users that jumped on IRC was Clint, and how excited was I. Well, for a brief period, until I found out that Clint was actually a Perl user, no less, working on a website that dealt with obituaries. I remember asking myself why we couldn't get someone from a more hyped community, like Ruby or Python (at the time), and a slightly nicer use case.
How wrong I was. Clint ended up being instrumental to the success of Elasticsearch. He was the first user to roll out Elasticsearch into production (version 0.4 no less!), and the interaction with Clint was pivotal during the early days in shaping Elasticsearch into what it is today. Clint has a unique insight into what is simple, and he is very rarely wrong, which has a huge impact on various usability aspects of Elasticsearch, from management, to API design, to day-to-day usability features. It was a no-brainer for us to reach out to Clint and ask if he would join our company immediately after we formed it.
Another one of the first things we did when we formed the company was offer public training. It's hard to express how nervous we were about whether or not people would even sign up for it.
We were wrong.
The trainings were and still are a rave success with waiting lists in all major cities. One of the people who caught our eye was a young fellow, Zach, who came to one of our trainings. We knew about Zach from his blog posts about using Elasticsearch (and secretly envied his ability to explain complex concepts in a very simple manner) and from a PHP client he wrote for the software. What we found out was that Zach had actually paid to attend the Elasticsearch training out of his own pocket! You can't really ask for more than that, and we reached out to Zach and asked if he would join our company as well.
Both Clint and Zach are pivotal to the success of Elasticsearch. They are wonderful communicators who can explain Elasticsearch from its high-level simplicity, to its (and Apache Lucene's) low-level internal complexities. It's a unique skill that we dearly cherish here at Elasticsearch. Clint is also responsible for the Elasticsearch Perl client, while Zach is responsible for the PHP one - both wonderful pieces of code.
And last, both play an instrumental role in most of what happens daily with the Elasticsearch project itself. One of the main reasons why Elasticsearch is so popular is its ability to communicate empathy to its users, and Clint and Zach are both part of the group that makes this a reality.
Preface
The world is swimming in data. For years we have been simply overwhelmed by the quantity of data flowing through and produced by our systems. Existing technology has focused on how to store and structure warehouses full of data. That's all well and good, until you actually need to make decisions in real time informed by that data.
Elasticsearch is a distributed, scalable, real-time search and analytics engine. It enables you to search, analyze, and explore your data, often in ways that you did not anticipate at the start of a project. It exists because raw data sitting on a hard drive is just not useful.
Whether you need full-text search, real-time analytics of structured data, or a combination of the two, this book introduces you to the fundamental concepts required to start working with Elasticsearch at a basic level. With these foundations laid, it will move on to more-advanced search techniques, which you will need to shape the search experience to fit your requirements.
Elasticsearch is not just about full-text search. We explain structured search, analytics, the complexities of dealing with human language, geolocation, and relationships. We will also discuss how best to model your data to take advantage of the horizontal scalability of Elasticsearch, and how to configure and monitor your cluster when moving to production.
Who Should Read This Book
This book is for anybody who wants to put their data to work. It doesn't matter whether you are starting a new project and have the flexibility to design the system from the ground up, or whether you need to give new life to a legacy system. Elasticsearch will help you to solve existing problems and open the way to new features that you haven't yet considered.
This book is suitable for novices and experienced users alike. We expect you to have some programming background and, although not required, it would help to have used SQL and a relational database. We explain concepts from first principles, helping novices to gain a sure footing in the complex world of search.
The reader with a search background will also benefit from this book. Elasticsearch is a new technology that has some familiar concepts. The more experienced user will gain an understanding of how those concepts have been implemented and how they interact in the context of Elasticsearch. Even the early chapters contain nuggets of information that will be useful to the more advanced user.
Finally, maybe you are in DevOps. While the other departments are stuffing data into Elasticsearch as fast as they can, you're the one charged with stopping their servers from bursting into flames. Elasticsearch scales effortlessly, as long as your users play within the rules. You need to know how to set up a stable cluster before going into production, and then be able to recognize the warning signs at three in the morning in order to prevent catastrophe. The earlier chapters may be of less interest to you, but the last part of the book is essential reading: all you need to know to avoid meltdown.