1. Introduction to Computing with Python
This book is about using Python for numerical computing. Python is a high-level, general-purpose interpreted programming language that is widely used in scientific computing and engineering. As a general-purpose language, Python was not specifically designed for numerical computing, but many of its characteristics make it well suited for this task. First and foremost, Python is well known for its clean and easy-to-read code syntax. Good code readability improves maintainability, which in general results in fewer bugs and better applications overall, and it also encourages rapid code development. This readability and expressiveness are essential in exploratory and interactive computing, which requires fast turnaround for testing various ideas and models.
In computational problem solving, it is of course important to consider the performance of algorithms and their implementations. It is natural to strive for efficient high-performance code, and optimal performance is indeed crucial in many computational situations. In such cases it may be necessary to use a low-level programming language, such as C or Fortran, to obtain the best performance out of the hardware that runs the code. However, optimal runtime performance is not always the most suitable objective. It is also important to consider the development time required to implement a solution to a problem in a given programming language or environment. While the best possible runtime performance can be achieved in a low-level programming language, working in a high-level language such as Python usually reduces the development time, and often results in more flexible and extensible code.
These conflicting objectives present a trade-off between high performance with long development time, on the one hand, and lower performance with shorter development time, on the other. See Figure 1-1 for a schematic visualization of this concept. When choosing a computational environment for solving a particular problem, it is important to consider this trade-off and to decide whether man-hours spent on development or CPU-hours spent on running the computations are more valuable. It is worth noting that CPU-hours are already cheap and are getting cheaper still, whereas man-hours are expensive. In particular, your own time is of course a very valuable resource. This makes a strong case for minimizing development time rather than the runtime of a computation, by using a high-level programming language and environment such as Python and its scientific computing libraries.
Figure 1-1.
Trade-off between low- and high-level programming languages. While a low-level language typically gives the best performance when a significant amount of development time is invested in the implementation of a solution, the development time required to obtain a first runnable code that solves the problem is typically shorter in a high-level language such as Python.
A solution that partially avoids the trade-off between high- and low-level languages is to use a multi-language model, where a high-level language is used to interface with libraries and software packages written in low-level languages. In a high-level scientific computing environment, this type of interoperability with software packages written in low-level languages (for example Fortran, C, or C++) is an important requirement. Python excels at this type of integration, and as a result it has become a popular glue language, used as an interface for setting up and controlling computations that rely on code written in low-level programming languages for the time-consuming number crunching. This is an important reason why Python is so widely used for numerical computing. The multi-language model enables rapid code development in a high-level language, while retaining most of the performance of low-level languages.
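As a minimal sketch of the multi-language model, the following uses the standard-library `ctypes` module to call the compiled C function `cos` from the system's C math library (libm) directly from Python. It assumes a POSIX system such as Linux or macOS; library lookup works differently on Windows. Scientific packages generally rely on dedicated wrapping tools rather than hand-written `ctypes` declarations, and `ctypes` is used here only because it ships with Python.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library (libm). find_library may return None on some
# platforms; in that case fall back to the symbols already loaded into the
# running Python process, which on typical POSIX systems include libm.
libm_name = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_name) if libm_name else ctypes.CDLL(None)

# Declare the C signature so ctypes converts arguments correctly:
#     double cos(double x);
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

x = 0.5
print("C libm cos:  ", libm.cos(x))   # evaluated by compiled C code
print("Python math: ", math.cos(x))   # Python's built-in math module
```

Python only sets up the call and handles the result, while the numerical work happens inside the compiled library. This is the same division of labor that the scientific Python libraries use on a much larger scale.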
As a consequence of the multi-language model, scientific and technical computing with Python involves much more than just the Python language itself. In fact, the Python language is only one piece of an entire ecosystem of software and solutions that provide a complete environment for scientific and technical computing. This ecosystem includes development tools and interactive programming environments, such as Spyder and IPython, which are designed particularly with scientific computing in mind. It also includes a vast collection of Python packages for scientific computing. This ecosystem of scientifically oriented libraries ranges from generic core libraries such as NumPy, SciPy, and Matplotlib to more specific libraries for particular problem domains. Another crucial layer in the scientific Python stack exists below the various Python modules. Many scientific Python libraries interface, in one way or another, with low-level high-performance scientific software packages, for example optimized LAPACK and BLAS libraries for linear algebra. See Figure 1-2 for an overview of the various layers of the software stack for computing with Python.
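To make this layering concrete, here is a small sketch that solves a 2×2 linear system with NumPy. The matrix and right-hand side are arbitrary illustrative values. According to the NumPy documentation, `numpy.linalg.solve` computes its result using the LAPACK routine `_gesv`, so the user writes a single Python call while the factorization runs in compiled LAPACK code. Whether the matrix-vector product is also dispatched to an optimized BLAS depends on how NumPy was built, which is why the comment below says "typically".

```python
import numpy as np

# A small linear system A x = b (values chosen for illustration only).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# np.linalg.solve hands the actual factorization and solve to LAPACK.
x = np.linalg.solve(A, b)

# Check the result with a matrix-vector product; for float64 arrays
# NumPy typically dispatches this kind of operation to a BLAS routine.
residual = A @ x - b

print("x =", x)
print("max |A x - b| =", np.abs(residual).max())
```

To see which BLAS and LAPACK implementations a particular NumPy installation is linked against, call `np.show_config()`.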
Figure 1-2.
An overview of the components and layers in the scientific computing environment for Python, from a user's perspective, from top to bottom. Users typically only interact with the top three layers, but the bottom layer constitutes a very important part of the software stack. An example of specific software components from each layer in the stack is shown in the right part of the figure.
Tip
The SciPy organization and its website http://www.scipy.org provide a centralized resource for information about the core packages in the scientific Python ecosystem, as well as lists of additional specialized packages, documentation, and tutorials. As such, it is an indispensable asset when working on scientific and technical computing in Python. Another great resource is the Numeric and Scientific page on the official Python Wiki: http://wiki.python.org/moin/NumericAndScientific .
Apart from the technical reasons why Python provides a good environment for computational work, it is also significant that Python and its scientific computing libraries are free and open source. This eliminates artificial constraints on when and how applications developed with the environment can be deployed and distributed by its users. Equally significant, it makes it possible for a dedicated user to obtain complete insight into how the language and the domain-specific packages are implemented and what methods are used. For academic work, where transparency and reproducibility are hallmarks, this is increasingly recognized as an important requirement for software used in research. For commercial use, it provides freedom in how the environment is used and integrated in products and how such solutions are distributed to customers. All users benefit from not having to pay license fees, which may otherwise inhibit deployments on large computing environments, such as clusters and cloud computing platforms.
The social component of the scientific computing ecosystem for Python is another important aspect of its success. Vibrant user communities have emerged around the core packages and many of the domain-specific projects. Project-specific mailing lists, Stack Overflow tags, and issue trackers (for example, on GitHub, http://www.github.com ) are typically very active and provide forums for discussing problems and obtaining help, as well as a way of getting involved in the development of these tools. The Python computing community also organizes yearly conferences and meet-ups at many venues around the world, such as the SciPy ( http://conference.scipy.org ) and PyData ( http://pydata.org ) conference series.