Over the past five years there has been a revolution in computing, brought about by a company that has for successive years been one of the premier manufacturers of gaming hardware: NVIDIA. With the introduction of the CUDA (Compute Unified Device Architecture) programming language, these hugely powerful graphics coprocessors could, for the first time, be used by everyday C programmers to offload computationally expensive work. From the embedded device industry, to home users, to supercomputers, everything has changed as a result.
One of the major changes in the computer software industry has been the move from serial programming to parallel programming, and here CUDA has produced great advances. The graphics processing unit (GPU) is, by its very nature, designed for high-speed graphics, which are inherently parallel. CUDA takes a simple model of data parallelism and incorporates it into a programming model that does not need graphics primitives.
In fact, CUDA, unlike its predecessors, does not require any understanding or knowledge of graphics or graphics primitives. You do not have to be a games programmer either. The CUDA language makes the GPU look just like another programmable device.
Throughout this book I assume readers have no prior knowledge of CUDA or of parallel programming, only an existing knowledge of the C/C++ programming language. As we progress and you become more competent with CUDA, we'll cover more advanced topics, taking you from a parallel-unaware programmer to one who can exploit the full potential of CUDA.
For programmers already familiar with parallel programming concepts and CUDA, we'll be discussing in detail the architecture of the GPUs and how to get the most from each, including the latest Fermi and Kepler hardware. Literally anyone who can program in C or C++ can program with CUDA in a few hours, given a little training. By the end of this book you should be able to go from a novice CUDA programmer achieving a speedup of several times to one achieving a speedup of 10 times or more.
The book is very much aimed at learning CUDA, but with a focus on performance once correctness has been achieved. Your skill in, and understanding of, writing high-performance code, especially for GPUs, will benefit hugely from this text.
This book is a practical guide to using CUDA in real applications, by real practitioners. At the same time, however, we cover the necessary theory and background so everyone, no matter what their background, can follow along and learn how to program in CUDA, making this book ideal for both professionals and those studying GPUs or parallel programming.
The book is set out as follows:
A Short History of Supercomputing. This chapter is a broad introduction to the evolution of streaming processors, covering some key developments that brought us to GPU processing today.
CUDA Hardware Overview. This chapter provides a fairly detailed explanation of the hardware and architecture found around and within CUDA devices. To achieve the best performance from CUDA programming, a reasonable understanding of the hardware both within and outside the device is required.
Setting Up CUDA. Installation and setup of the CUDA SDK under Windows, Mac, and the Linux variants. We also look at the main debugging environments available for CUDA.
Grids, Blocks, and Threads. A detailed explanation of the CUDA threading model, including some examples of how the choices here impact performance.
Memory Handling with CUDA. Understanding the different memory types and how they are used within CUDA is the single largest factor influencing performance. Here we give a detailed explanation, with examples, of how the various memory types work and the pitfalls of getting it wrong.
Using CUDA in Practice. A detailed examination of how central processing units (CPUs) and GPUs can best cooperate on a number of problems, and of the issues involved in CPU/GPU programming.
Multi-CPU and Multi-GPU Solutions. We look at how to program and use multiple GPUs within an application.
Optimizing Your Application. A detailed breakdown of the main areas that limit performance in CUDA. We look at the tools and techniques that are available for analysis of CUDA code.
Libraries and SDK. A look at some of the CUDA SDK samples and the libraries supplied with CUDA, and how you can use these within your applications.
Designing GPU-Based Systems. This chapter takes a look at some of the issues involved in building your own GPU server or cluster.
Common Problems, Causes, and Solutions. A look at the types of mistakes most programmers make when developing applications in CUDA and how these can be detected and avoided.