Introduction
Conceived in the late 1980s as a teaching and scripting language, Python has since become an essential tool for many programmers, engineers, researchers, and data scientists across academia and industry. As an astronomer focused on building and promoting the free open tools for data-intensive science, I've found Python to be a near-perfect fit for the types of problems I face day to day, whether it's extracting meaning from large astronomical datasets, scraping and munging data sources from the Web, or automating day-to-day research tasks.
The appeal of Python is in its simplicity and beauty, as well as the convenience of the large ecosystem of domain-specific tools that have been built on top of it. For example, most of the Python code in scientific computing and data science is built around a group of mature and useful packages:
NumPy provides efficient storage and computation for multidimensional data arrays.
SciPy contains a wide array of numerical tools such as numerical integration and interpolation.
Pandas provides a DataFrame object along with a powerful set of methods to manipulate, filter, group, and transform data.
Matplotlib provides a useful interface for creation of publication-quality plots and figures.
Scikit-Learn provides a uniform toolkit for applying common machine learning algorithms to data.
IPython/Jupyter provides an enhanced terminal and an interactive notebook environment that is useful for exploratory analysis, as well as creation of interactive, executable documents. For example, the manuscript for this report was composed entirely in Jupyter notebooks.
No less important are the numerous other tools and packages that accompany these: if there is a scientific or data analysis task you want to perform, chances are someone has written a package that will do it for you.
Tapping into the power of this data science ecosystem, however, first requires familiarity with the Python language itself. I often encounter students and colleagues who have (sometimes extensive) backgrounds in computing in some language (MATLAB, IDL, R, Java, C++, etc.) and are looking for a brief but comprehensive tour of the Python language that respects their level of knowledge rather than starting from ground zero. This report seeks to fill that niche.
As such, this report in no way aims to be a comprehensive introduction to programming, or a full introduction to the Python language itself; if that is what you are looking for, you might check out one of the recommended references. Instead, this report will provide a whirlwind tour of some of Python's essential syntax and semantics, built-in data types and structures, function definitions, control flow statements, and other aspects of the language. My aim is that readers will walk away with a solid foundation from which to explore the data science stack just outlined.
Using Code Examples
Supplemental material (code examples, IPython notebooks, etc.) is available for download at https://github.com/jakevdp/WhirlwindTourOfPython/.
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "A Whirlwind Tour of Python by Jake VanderPlas (O'Reilly). Copyright 2016 O'Reilly Media, Inc., 978-1-491-96465-1."
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us.
Installation and Practical Considerations
Installing Python and the suite of libraries that enable scientific computing is straightforward whether you use Windows, Linux, or Mac OS X. This section will outline some of the considerations when setting up your computer.
Python 2 versus Python 3
This report uses the syntax of Python 3, which contains language enhancements that are not compatible with the 2.x series of Python. Though Python 3.0 was first released in 2008, adoption has been relatively slow, particularly in the scientific and web development communities. This is primarily because it took some time for many of the essential packages and toolkits to be made compatible with the new language internals. Since early 2014, however, stable releases of the most important tools in the data science ecosystem have been fully compatible with both Python 2 and 3, and so this report will use the newer Python 3 syntax. Even though that is the case, the vast majority of code snippets in this report will also work without modification in Python 2: in cases where a Py2-incompatible syntax is used, I will make every effort to note it explicitly.
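As a rough illustration of what Py2-incompatible syntax can look like, the following minimal sketch shows two widely known differences between the language versions: print as a function and the behavior of integer division. These two examples are an illustrative choice and are not named in the text above, and the helper name describe_division is hypothetical.

```python
# Python 3 syntax, showing two well-known differences from Python 2.

def describe_division(a, b):
    """Return (true quotient, floor quotient) for two numbers.

    Hypothetical helper, used only for illustration.
    """
    # In Python 3, `/` always performs true division, even on integers.
    # In Python 2, `3 / 2` on integers would floor to 1 instead.
    true_quotient = a / b
    # `//` performs floor division in both Python 2 and Python 3.
    floor_quotient = a // b
    return true_quotient, floor_quotient


# In Python 3, print is a function, so keyword arguments such as `sep` work.
# In Python 2 (without `from __future__ import print_function`),
# the next line is a syntax error.
print("true", "floor", sep=" / ")
print(describe_division(3, 2))
```

Code that uses `//` wherever floor division is intended behaves the same under both versions, which is one reason many snippets run unmodified in Python 2.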
Installation with conda
Though there are various ways to install Python, the one I would suggest, particularly if you wish to eventually use the data science tools mentioned earlier, is via the cross-platform Anaconda distribution. There are two flavors of the Anaconda distribution: