PYTHON TOOLS FOR SCIENTISTS
An Introduction to Using Anaconda, JupyterLab, and Python's Scientific Libraries
by Lee Vaughan
San Francisco
PYTHON TOOLS FOR SCIENTISTS. Copyright 2023 by Lee Vaughan.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.
First printing
26 25 24 23 22 1 2 3 4 5
ISBN-13: 978-1-7185-0266-6 (print)
ISBN-13: 978-1-7185-0267-3 (ebook)
Publisher: William Pollock
Managing Editor: Jill Franklin
Production Manager: Sabrina Plomitallo-González
Production Editor: Katrina Horlbeck Olsen
Developmental Editor: Frances Saux
Cover Illustrator: Gina Redman
Interior Design: Octopod Studios
Technical Reviewer: John Mayhew
Production Services: Octal Publishing, LLC
For information on distribution, bulk sales, corporate sales, or translations, please contact No Starch Press, Inc. directly at or:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900
www.nostarch.com
Library of Congress Control Number: 2022942882
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an "As Is" basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
This book is dedicated to the worldwide army of open source software developers. I am immensely grateful for your hard work and the immeasurable good it produces.
About the Author
Lee Vaughan is a programmer, educator, and author of Impractical Python Projects (No Starch Press, 2019) and Real-World Python (No Starch Press, 2021). As an executive-level scientist at ExxonMobil, he constructed and reviewed computer models, developed and tested software, and trained geoscientists and engineers. His books are dedicated to helping self-learners develop and hone their Python skills and have fun doing it!
About the Technical Reviewer
John Mayhew is a geoscientist with an extensive background in mathematics, data analysis, and scientific computing. He is a co-founder of the nonprofit organization Land of Jershon and currently serves on its board of directors and as the CEO. He has also established a charitable giving consultantship, East Gate Advocates, designed to connect donors with nonprofit projects.
CONTENTS IN DETAIL
1 INSTALLING AND LAUNCHING ANACONDA
2 KEEPING ORGANIZED WITH CONDA ENVIRONMENTS
3 SIMPLE SCRIPTING IN THE JUPYTER QT CONSOLE
4 SERIOUS SCRIPTING WITH SPYDER
5 JUPYTER NOTEBOOK: AN INTERACTIVE JOURNAL FOR COMPUTATIONAL RESEARCH
6 JUPYTERLAB: YOUR CENTER FOR SCIENCE
7 INTEGERS, FLOATS, AND STRINGS
8 VARIABLES
9 THE CONTAINER DATA TYPES
10 FLOW CONTROL
11 FUNCTIONS AND MODULES
12 FILES AND FOLDERS
13 OBJECT-ORIENTED PROGRAMMING
14 DOCUMENTING YOUR WORK
15 THE SCIENTIFIC LIBRARIES
16 THE INFOVIS, SCIVIS, AND DASHBOARDING LIBRARIES
17 THE GEOVIS LIBRARIES
18 NUMPY: NUMERICAL PYTHON
19 DEMYSTIFYING MATPLOTLIB
20 PANDAS, SEABORN, AND SCIKIT-LEARN
21 MANAGING DATES AND TIMES WITH PYTHON AND PANDAS
APPENDIX: ANSWERS TO THE TEST YOUR KNOWLEDGE CHALLENGES
ACKNOWLEDGMENTS
Thanks to Bill Pollock, founder and president of No Starch Press, for letting me write yet another book. Thanks also to Frances Saux for sticking with me through two whole books and providing the best editing money can buy. To Gina Redman, Jill Franklin, and Octopod Studios for another spectacular cover illustration. To Sarah De Vos for marketing assistance, Katrina Horlbeck Olsen for production editing, and the rest of the staff at No Starch Press who work tirelessly to produce the finest in geek entertainment.
Special thanks to Anaconda Inc. co-founder and CEO, Peter Wang, for his vision to empower the whole world with data literacy. To James Bednar, director of custom services at Anaconda, for his invaluable time, guidance, and advice with respect to the data visualization chapters. Thanks also to Anaconda data scientist Albert DeFusco for technical assistance in setting up the Anaconda distribution and for useful discussions around project management best practices and understanding the relationships among products like Anaconda Cloud and Nucleus.
Thanks to John Mayhew for his thorough technical review and helpful suggestions. Two heads really are better than one! Thanks also to Mike Driscoll, content writer at Real Python, for advice on the Jupyter Notebook and JupyterLab chapters.
Finally, extra special thanks to ExxonMobil geological modeler Andy Maas for his frank and frustrated discussions on Python's large selection of plotting and coding tools. Although others shared his concerns, he directly inspired this book. Hopefully, I've added some clarity to these issues.
INTRODUCTION
This book is for scientists and budding scientists who want to use the Python programming language in their work. It teaches the basics of Python and shows the easiest and most popular way to gain access to Python's universe of scientific libraries, the preferred method for documenting work, and how to keep various projects separate and secure.
As a mature, open source, and easy-to-learn language, Python has an enormous user base and a welcoming community eager to help you develop your skills. This user base has contributed to a rich set of tools and supporting libraries (collections of precompiled routines) for scientific endeavors such as data science, machine learning, language processing, robotics, computer vision, and more. As a result, Python has become one of the most important scientific computing languages in academia and industry.
Popularity, however, comes with a price. The Python ecosystem is growing into an impenetrable jungle. In fact, this book sprang from conversations with scientific colleagues in the corporate world. New to Python, they were frustrated, stressed, and suffering from