Allen B. Downey
Think Data Structures
by Allen B. Downey
Copyright 2017 Allen Downey. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Editors: Nan Barber and Brian Foster
- Production Editor: Kristen Brown
- Copyeditor: Charles Roumeliotis
- Proofreader: Amanda Kersey
- Indexer: Allen B. Downey
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Rebecca Demarest
Revision History for the First Edition
- 2017-07-07: First Release
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Think Data Structures, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Think Data Structures is available under the Creative Commons Attribution-NonCommercial 3.0 Unported License. The author maintains an online version at http://greenteapress.com/wp/think-data-structures/.
978-1-491-97239-7
[LSI]
The Philosophy Behind the Book
Data structures and algorithms are among the most important inventions of the last 50 years, and they are fundamental tools software engineers need to know. But in my opinion, most of the books on these topics are too theoretical, too big, and too bottom up:
Too theoretical
Mathematical analysis of algorithms is based on simplifying assumptions that limit its usefulness in practice. Many presentations of this topic gloss over the simplifications and focus on the math. In this book I present the most practical subset of this material and omit or de-emphasize the rest.
Too big
Most books on these topics are at least 500 pages, and some are more than 1,000. By focusing on the topics I think are most useful for software engineers, I kept this book under 150 pages.
Too bottom up
Many data structures books focus on how data structures work (the implementations), with less about how to use them (the interfaces). In this book, I go top down, starting with the interfaces. Readers learn to use the structures in the Java Collections Framework before getting into the details of how they work.
Finally, some books present this material out of context and without motivation: it's just one damn data structure after another! I try to liven it up by organizing the topics around an application (web search) that uses data structures extensively and is an interesting and important topic in its own right.
This application motivates some topics that are not usually covered in an introductory data structures class, including persistent data structures with Redis.
I have made difficult decisions about what to leave out, but I have made some compromises. I include a few topics that most readers will never use, but that they might be expected to know, possibly in a technical interview. For these topics, I present both the conventional wisdom and my reasons to be skeptical.
This book also presents basic aspects of software engineering practice, including version control and unit testing. Most chapters include an exercise that allows readers to apply what they have learned. Each exercise provides automated tests that check the solution. And for most exercises, I present my solution at the beginning of the next chapter.
Prerequisites
This book is intended for college students in computer science and related fields, as well as professional software engineers, people training in software engineering, and people preparing for technical interviews.
Before you start this book, you should know Java pretty well; in particular, you should know how to define a new class that extends an existing class or implements an interface. If your Java is rusty, here are two books you might start with:
- Downey and Mayfield, Think Java (O'Reilly Media, 2016), which is intended for people who have never programmed before
- Sierra and Bates, Head First Java (O'Reilly Media, 2005), which is appropriate for people who already know another programming language
If you are not familiar with interfaces in Java, you might want to work through the tutorial called "What Is an Interface?" at http://thinkdast.com/interface.
One vocabulary note: the word interface can be confusing. In the context of an application programming interface (API), it refers to a set of classes and methods that provide certain capabilities.
In the context of Java, it also refers to a language feature, similar to a class, that specifies a set of methods. To help avoid confusion, I'll use interface in the normal typeface for the general idea of an interface, and interface in the code typeface for the Java language feature.
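As a quick self-check on the Java sense of the word, here is a minimal sketch of an interface, a class that implements it, and a class that extends an existing class. The names Shape, Square, and LabeledSquare are hypothetical, chosen only for illustration; they do not come from the book's repository.

```java
// Hypothetical names, for illustration only.
public class InterfaceDemo {

    // An interface specifies a set of methods without implementing them.
    interface Shape {
        double area();
    }

    // A class that implements the interface must provide every method it specifies.
    static class Square implements Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        @Override
        public double area() {
            return side * side;
        }
    }

    // A class that extends an existing class inherits its fields and methods.
    static class LabeledSquare extends Square {
        private final String label;

        LabeledSquare(String label, double side) {
            super(side);
            this.label = label;
        }

        @Override
        public String toString() {
            return label + " with area " + area();
        }
    }

    public static void main(String[] args) {
        // The variable has the interface type; the object is a LabeledSquare.
        Shape shape = new LabeledSquare("tile", 2.0);
        System.out.println(shape.area());
        System.out.println(shape);
    }
}
```

If every part of this example looks familiar, you probably know enough about classes and interfaces to follow the rest of the book.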
You should also be familiar with type parameters and generic types. For example, you should know how to create an object with a type parameter, like ArrayList<Integer>. If not, you can read about type parameters at http://thinkdast.com/types.
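For reference, here is a short sketch of what working with type parameters looks like: it creates a standard ArrayList that holds Integer values, and it defines a small generic class of its own. The class Pair and the method sum are hypothetical names introduced for this example.

```java
import java.util.ArrayList;

public class TypeParameterDemo {

    // A hypothetical generic class: T is a type parameter that the caller fills in.
    static class Pair<T> {
        private final T first;
        private final T second;

        Pair(T first, T second) {
            this.first = first;
            this.second = second;
        }

        T getFirst() {
            return first;
        }

        T getSecond() {
            return second;
        }
    }

    // The parameter type says this list can only contain Integer objects.
    static int sum(ArrayList<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // The diamond operator <> lets the compiler infer the type argument.
        ArrayList<Integer> numbers = new ArrayList<>();
        numbers.add(1);
        numbers.add(2);
        numbers.add(3);
        System.out.println(sum(numbers));

        Pair<String> pair = new Pair<>("left", "right");
        System.out.println(pair.getFirst() + ", " + pair.getSecond());
    }
}
```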
You should be familiar with the Java Collections Framework (JCF), which you can read about at http://thinkdast.com/collections. In particular, you should know about the List interface and the classes ArrayList and LinkedList.
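As a reminder of how these pieces fit together, the following sketch writes one method against the List interface and then calls it with both an ArrayList and a LinkedList. The method name fill and the sample strings are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListDemo {

    // Written against the List interface, so it accepts any implementation.
    static List<String> fill(List<String> list) {
        list.add("alpha");
        list.add("beta");
        list.add(0, "first");  // insert at the beginning
        return list;
    }

    public static void main(String[] args) {
        List<String> arrayList = fill(new ArrayList<>());
        List<String> linkedList = fill(new LinkedList<>());

        System.out.println(arrayList);
        System.out.println(linkedList);

        // Two lists are equal if they hold equal elements in the same order,
        // regardless of which class implements them.
        System.out.println(arrayList.equals(linkedList));
    }
}
```

Both calls produce the same sequence of elements; what differs between the two classes is how they store the elements and how long each operation takes.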
Ideally you should be familiar with Apache Ant, which is an automated build tool for Java. You can read more about Ant at http://thinkdast.com/anttut.
And you should be familiar with JUnit, which is a unit testing framework for Java. You can read more about it at http://thinkdast.com/junit.
Working with the Code
The code for this book is in a Git repository at http://thinkdast.com/repo.
Git is a version control system that allows you to keep track of the files that make up a project. A collection of files under Git's control is called a repository.
GitHub is a hosting service that provides storage for Git repositories and a convenient web interface. It provides several ways to work with the code:
- You can create a copy of the repository on GitHub by pressing the Fork button. If you don't already have a GitHub account, you'll need to create one. After forking, you'll have your own repository on GitHub that you can use to keep track of code you write. Then you can clone the repository, which downloads a copy of the files to your computer.
- Alternatively, you could clone the repository without forking. If you choose this option, you don't need a GitHub account, but you won't be able to save your changes on GitHub.