Parallel Programming with Microsoft Visual C++: Design Patterns for Decomposition and Coordination on Multicore Architectures
Colin Campbell
Ade Miller
Copyright 2011
This document is provided as-is. Information and views expressed in this document, including URL and other Internet website references, may change without notice. You bear the risk of using it. Unless otherwise noted, the companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted in examples herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
Microsoft, MSDN, Visual Basic, Visual C++, Visual C#, Visual Studio, Windows, Windows Live, Windows Server, and Windows Vista are trademarks of the Microsoft group of companies.
All other trademarks are property of their respective owners.
Microsoft Press
Foreword
At its inception some 40 or so years ago, parallel computing was the province of experts who applied it to exotic fields, such as high-energy physics, and to engineering applications, such as computational fluid dynamics. We've come a long way since those early days.
This change is being driven by hardware trends. The days of perpetually increasing processor clock speeds are now at an end. Instead, the increased chip densities that Moore's Law predicts are being used to create multicore processors, or single chips with multiple processor cores. Quad-core processors are now common, and this trend will continue, with tens of cores available in the not-too-distant future.
In the last five years, Microsoft has taken advantage of this technological shift to create a variety of parallel implementations. These include the Microsoft Windows High Performance Computing (HPC) Server technology for message-passing interface (MPI) programs; Dryad, which offers a Map-Reduce style of parallel data processing; the Windows Azure technology platform, which can supply compute cores on demand; the Parallel Patterns Library (PPL) and Asynchronous Agents Library for native code; and the parallel extensions of the Microsoft .NET Framework 4.
Multicore computation affects the whole spectrum of applications, from complex scientific and design problems to consumer applications and new human/computer interfaces. We used to joke that parallel computing is the future, and always will be, but the pessimists have been proven wrong. Parallel computing has at last moved from being a niche technology to being center stage for both application developers and the IT industry.
But, there is a catch. To obtain any speed-up of an application, programmers now have to divide the computational work to make efficient use of the power of multicore processors, a skill that still belongs to experts. Parallel programming presents a massive challenge for the majority of developers, many of whom are encountering it for the first time. There is an urgent need to educate them in practical ways so that they can incorporate parallelism into their applications.
Two possible approaches are popular with some of my computer science colleagues: either design a new parallel programming language, or develop a heroic parallelizing compiler. While both are certainly interesting academically, neither has had much success in popularizing and simplifying the task of parallel programming for non-experts. In contrast, a more pragmatic approach is to provide programmers with a library that hides much of parallel programming's complexity and teach programmers how to use it.
To that end, the Microsoft Visual C++ Parallel Patterns Library and Asynchronous Agents Library present a higher-level programming model than earlier APIs. Programmers can, for example, think in terms of tasks rather than threads, and avoid the complexities of thread management. Parallel Programming with Microsoft Visual C++ teaches programmers how to use these libraries by putting them in the context of design patterns. As a result, developers can quickly learn to write parallel programs and gain immediate performance benefits.
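To make the task-based model concrete, consider this minimal sketch (an illustration for this foreword, not an excerpt from the book's samples). It runs two independent pieces of work as tasks with PPL's task_group; the Concurrency Runtime, not the programmer, decides how they map onto threads and cores:

#include <ppl.h>
#include <iostream>

int main()
{
    // Two independent pieces of work expressed as tasks rather than
    // as explicitly managed threads.
    Concurrency::task_group tasks;
    int a = 0, b = 0;
    tasks.run([&a] { a = 6 * 7; });    // first task
    tasks.run([&b] { b = 10 * 10; });  // second task
    tasks.wait();                      // wait for both tasks to finish
    std::cout << a + b << std::endl;   // prints 142
    return 0;
}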
I believe that this book, with its emphasis on parallel design patterns and an up-to-date programming model, represents an important first step in moving parallel programming into the mainstream.
Tony Hey
Corporate Vice President, Microsoft Research
Foreword
This timely book comes as we navigate a major turning point in our industry: parallel hardware + mobile devices = the pocket supercomputer as the mainstream platform for the next 20 years.
Parallel applications are increasingly needed to exploit all kinds of target hardware. As I write this, getting full computational performance out of most machines (nearly all desktops and laptops, most game consoles, and the newest smartphones) already means harnessing local parallel hardware, mainly in the form of multicore CPU processing; this is the commoditization of the supercomputer. Increasingly in the coming years, getting that full performance will also mean using gradually ever-more-heterogeneous processing, from local flavors of general-purpose computation on graphics processing units (GPGPU) to harnessing often-on remote parallel computing power in the form of elastic compute clouds; this is the generalization of the heterogeneous cluster in all its NUMA glory, with instantiations ranging from on-die to on-machine to on-cloud, with early examples of each kind already available in the wild.
Starting now and for the foreseeable future, for compute-bound applications, fast will be synonymous not just with parallel, but with scalably parallel. Only scalably parallel applications that can be shipped with lots of latent concurrency beyond what can be exploited in this year's mainstream machines will be able to enjoy the new Free Lunch of getting substantially faster when today's binaries can be installed and blossom on tomorrow's hardware that will have more parallelism.
Visual C++ 2010 with its Parallel Patterns Library (PPL), described in this book, helps enable applications to take the first steps down this new path as it continues to unfold. During the design of PPL, many people did a lot of heavy lifting. For my part, I was glad to be able to contribute the heavy emphasis on lambda functions as the key central language extension that enabled the rest of PPL to be built as Standard Template Library (STL)-like algorithms implemented as a normal library. We could instead have built a half-dozen new kinds of special-purpose parallel loops into the language itself (and almost did), but that would have been terribly invasive and non-general. Adding a single general-purpose language feature like lambdas that can be used everywhere, including with PPL but not limited to only that, is vastly superior to baking special cases into the language.
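A minimal sketch of that design point (illustrative only, though parallel_for_each is a real PPL algorithm in <ppl.h>): the parallel algorithm has the same shape as its sequential STL counterpart, and the lambda is the only moving part:

#include <ppl.h>
#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> v(1000, 1);

    // A sequential STL algorithm driven by a lambda...
    std::for_each(v.begin(), v.end(), [](int& n) { n *= 2; });

    // ...and its PPL counterpart: same shape, same lambda, delivered
    // as an ordinary library algorithm rather than a new loop
    // construct baked into the language.
    Concurrency::parallel_for_each(v.begin(), v.end(), [](int& n) { n *= 2; });

    return 0;
}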
The good news is that, in large parts of the world, we have as an industry already achieved pervasive computing: the vision of putting a computer on every desk, in every living room, and in everyone's pocket. But now we are in the process of delivering pervasive and even elastic supercomputing: putting a supercomputer on every desk, in every living room, and in everyone's pocket, with both local and non-local resources. In 1984, when I was just finishing high school, the world's fastest computer was a Cray X-MP with four processors, 128 MB of RAM, and peak performance of 942 MFLOPS, or, put another way, a fraction of the parallelism, memory, and computational power of a 2005-vintage Xbox, never mind modern phones and Kinect. We've come a long way, and the pace of change is not only still strong, but still accelerating.