High Performance Python
by Micha Gorelick and Ian Ozsvald
Copyright 2020 Micha Gorelick and Ian Ozsvald. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Acquisitions Editor: Tyler Ortman
Development Editor: Sarah Grey
Production Editor: Christopher Faucher
Copyeditor: Arthur Johnson
Proofreader: Sharon Wilkey
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
- September 2014: First Edition
- May 2020: Second Edition
Revision History for the Second Edition
- 2020-04-30: First release
See http://oreilly.com/catalog/errata.csp?isbn=9781492055020 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. High Performance Python, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
High Performance Python is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 International License.
978-1-492-05502-0
[LSI]
Foreword
When you think about high performance computing, you might imagine giant clusters of machines modeling complex weather phenomena or trying to understand signals in data collected about far-off stars. It's easy to assume that only people building specialized systems should worry about the performance characteristics of their code. By picking up this book, you've taken a step toward learning the theory and practices you'll need to write highly performant code. Every programmer can benefit from understanding how to build performant systems.
There is an obvious set of applications that are just on the edge of possible, and you won't be able to approach them without writing optimally performant code. If that's your practice, you're in the right place. But there is a much broader set of applications that can benefit from performant code.
We often think that new technical capabilities are what drives innovation, but I'm equally fond of capabilities that increase the accessibility of technology by orders of magnitude. When something becomes ten times cheaper in time or compute costs, suddenly the set of applications you can address is wider than you imagined.
The first time this principle manifested in my own work was over a decade ago, when I was working at a social media company, and we ran an analysis over multiple terabytes of data to determine whether people clicked on more photos of cats or dogs on social media.
It was dogs, of course. Cats just have better branding.
This was an outstandingly frivolous use of compute time and infrastructure at the time! Gaining the ability to apply techniques that had previously been restricted to sufficiently high-value applications, such as fraud detection, to a seemingly trivial question opened up a new world of now-accessible possibilities. We were able to take what we learned from these experiments and build a whole new set of products in search and content discovery.
For an example that you might encounter today, consider a machine-learning system that recognizes unexpected animals or people in security video footage. A sufficiently performant system could allow you to embed that model into the camera itself, improving privacy or, even if running in the cloud, using significantly less compute and power, benefiting the environment and reducing your operating costs. This can free up resources for you to look at adjacent problems, potentially building a more valuable system.
We all desire to create systems that are effective, easy to understand, and performant. Unfortunately, it often feels like we have to pick two (or one) out of the three! High Performance Python is a handbook for people who want to make things that are capable of all three.
This book stands apart from other texts on the subject in three ways. First, it's written for us: humans who write code. You'll find all of the context you need to understand why you might make certain choices. Second, Gorelick and Ozsvald do a wonderful job of curating and explaining the necessary theory to support that context. Finally, in this updated edition, you'll learn the specific quirks of the most useful libraries for implementing these approaches today.
This is one of a rare class of programming books that will change the way you think about the practice of programming. I've given this book to many people who could benefit from the additional tools it provides. The ideas that you'll explore in its pages will make you a better programmer, no matter what language or environment you choose to work in.
Enjoy the adventure.
Hilary Mason,
Data Scientist in Residence at Accel
Preface
Python is easy to learn. You're probably here because now that your code runs correctly, you need it to run faster. You like the fact that your code is easy to modify and you can iterate with ideas quickly. The trade-off between "easy to develop" and "runs as quickly as I need" is a well-understood and often-bemoaned phenomenon. There are solutions.
Some people have serial processes that have to run faster. Others have problems that could take advantage of multicore architectures, clusters, or graphics processing units. Some need scalable systems that can process more or less as expediency and funds allow, without losing reliability. Others will realize that their coding techniques, often borrowed from other languages, perhaps aren't as natural as examples they see from others.
In this book we will cover all of these topics, giving practical guidance for understanding bottlenecks and producing faster and more scalable solutions. We also include some war stories from those who went ahead of you, who took the knocks so you don't have to.
Python is well suited for rapid development, production deployments, and scalable systems. The ecosystem is full of people who are working to make it scale on your behalf, leaving you more time to focus on the more challenging tasks around you.
Who This Book Is For
You've used Python for long enough to have an idea about why certain things are slow and to have seen technologies like Cython, numpy, and PyPy being discussed as possible solutions. You might also have programmed with other languages and so know that there's more than one way to solve a performance problem.