Parallel Programming
Concepts and Practice
First edition
Bertil Schmidt
Institut für Informatik, Staudingerweg 9, 55128, Mainz, Germany
Jorge González-Domínguez
Computer Architecture Group, University of A Coruña, Edificio Área científica (Office 3.08), Campus de Elviña, 15071, A Coruña, Spain
Christian Hundt
Institut für Informatik, Staudingerweg 9, 55128, Mainz, Germany
Moritz Schlarb
Data Center, Johannes Gutenberg-University Mainz, Germany
Anselm-Franz-von-Bentzel-Weg 12, 55128, Mainz, Germany
Table of Contents
List of tables
- Tables in Chapter 1
- Tables in Chapter 2
- Tables in Chapter 5
- Tables in Chapter 7
- Tables in Chapter 9
List of figures
- Figures in Chapter 1
- Figures in Chapter 2
- Figures in Chapter 3
- Figures in Chapter 4
- Figures in Chapter 5
- Figures in Chapter 6
- Figures in Chapter 7
- Figures in Chapter 8
- Figures in Chapter 9
- Figures in Chapter 10
Copyright
Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
Copyright © 2018 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-849890-3
For information on all Morgan Kaufmann publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Katey Birtcher
Acquisition Editor: Steve Merken
Developmental Editor: Nate McFadden
Production Project Manager: Sreejith Viswanathan
Designer: Christian J. Bilbow
Typeset by VTeX
Preface
Parallelism abounds. Nowadays, any modern CPU contains at least two cores, and some CPUs feature more than 50 processing units. An even higher degree of parallelism is available on larger systems containing multiple CPUs, such as server nodes, clusters, and supercomputers. Thus, the ability to program these types of systems efficiently and effectively is an essential skill for scientists, engineers, and programmers. This book addresses that need with a comprehensive introduction to parallel programming. It teaches practical parallel programming for shared memory and distributed memory architectures based on the C++11 threading API, Open Multiprocessing (OpenMP), Compute Unified Device Architecture (CUDA), Message Passing Interface (MPI), and Unified Parallel C++ (UPC++), as well as the necessary theoretical background. We have included a large number of programming examples based on the recent C++11 and C++14 dialects of the C++ programming language.
This book targets participants of Parallel Programming or High Performance Computing courses, which are taught at most universities at the senior undergraduate or graduate level in computer science or computer engineering. Moreover, it is suitable for undergraduates in other disciplines with a computer science minor and for professionals from related fields such as research scientists, data analysts, or R&D engineers. Prerequisites for understanding the contents of our book include some experience with writing sequential code in C/C++ and basic mathematical knowledge.
In keeping with the historic symbiosis of High Performance Computing and natural science, we introduce parallel concepts based on real-life applications ranging from basic linear algebra routines, machine learning algorithms, and physical simulations to traditional algorithms from computer science. Writing correct yet efficient code is a key skill for every programmer. Hence, we focus on the actual implementation and performance evaluation of algorithms. Nevertheless, the theoretical properties of algorithms are discussed in depth, too. Each chapter features a collection of additional programming exercises that can be solved within a web framework distributed with this book. The System for Automated Code Evaluation (SAUCE) provides a web-based testing environment for submitting solutions and evaluating them in a classroom setting. The only prerequisite is an HTML5-compatible web browser, which makes it possible to embed interactive programming exercises in lectures. SAUCE is distributed as a Docker image and can be downloaded at
https://parallelprogrammingbook.org
This website serves as a hub for related content such as installation instructions, a list of errata, and supplementary material (e.g., lecture slides and, for instructors, solutions to selected exercises).
If you are a student or professional who aims to learn a certain programming technique, we advise you to start with the first three chapters on the fundamentals of parallel programming, theoretical models, and hardware architectures. Subsequently, you can dive into any of the introductory chapters on C++11 Multithreading, OpenMP, CUDA, or MPI, which are mostly self-contained. The chapters on Advanced C++11 Multithreading, Advanced CUDA, and UPC++ build upon the techniques of their respective preceding chapters and thus should not be read in isolation.
If you are a lecturer, we propose a curriculum of 14 lectures that mainly covers applications from the introductory chapters. You could start with a lecture discussing the fundamentals from the first chapter, including parallel summation using a hypercube and its analysis, the definition of basic measures such as speedup, parallelization efficiency, and cost, and a discussion of ranking metrics. The second lecture could cover an introduction to PRAM, network topologies, and weak and strong scaling. You can spend more time on PRAM if you aim to discuss CUDA in more detail later, or emphasize hardware architectures if you focus on CPUs. Two to three lectures each could be spent on teaching the basics of the C++11 threading API, CUDA, and MPI. OpenMP can be covered within one to two lectures. The remaining lectures can be used to discuss either the content of the advanced chapters on multithreading and CUDA or the PGAS-based UPC++ language.