Front Matter
CUDA Application Design and Development
CUDA Application Design and Development
Rob Farber
AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO
Morgan Kaufmann is an imprint of Elsevier
Copyright
Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: Danielle S. Miller
Designer: Dennis Schaeffer
Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
2011 NVIDIA Corporation and Rob Farber. Published by Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
NoticesKnowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Application submitted.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 978-0-12-388426-8
For information on all MK publications visit our website at www.mkp.com
Typeset by: diacriTech, Chennai, India
Printed in the United States of America
11 12 13 14 15 10 9 8 7 6 5 4 3 2 1
Dedication
This book is dedicated to my wife Margy and son Ryan, who could not help but be deeply involved as I wrote it. In particular to my son Ryan, who is proof that I am the older model thank you for the time I had to spend away from your childhood.
To my many friends who reviewed this book and especially those who caught errors, I cannot thank you enough for your time and help. In particular, I'd like to thank everyone at ICHEC (the Irish Center for High-End Computing) who adopted me as I finished the book's birthing process and completed this manuscript. Finally, thank you to my colleagues and friends at NIVDIA, who made the whole CUDA revolution possible.
Foreword
Jeffrey S. Vetter
Distinguished Research Staff Member, Oak Ridge National Laboratory; Professor, Georgia Institute of Technology.
GPUs have recently burst onto the scientific computing scene as an innovative technology that has demonstrated substantial performance and energy efficiency improvements for the numerous scientific applications. These initial applications were often pioneered by early adopters, who went to great effort to make use of GPUs. More recently, the critical question facing this technology is whether it can become pervasive across the multiple, diverse algorithms in scientific computing, and useful to a broad range of users, not only the early adopters. A key barrier to this wider adoption is software development: writing and optimizing massively parallel CUDA code, using new performance and correctness tools, leveraging libraries, and understanding the GPU architecture.
Part of this challenge will be solved by experts sharing their knowledge and methodology with other users through books, tutorials, and collaboration. CUDA Application Design and Development is one such book. In this book, the author provides clear, detailed explanations of implementing important algorithms, such as algorithms in quantum chemistry, machine learning, and computer vision methods, on GPUs. Not only does the book describe the methodologies that underpin GPU programming, but it describes how to recast algorithms to maximize the benefit of GPU architectures. In addition, the book provides many case studies, which are used to explain and reinforce important GPU concepts like CUDA threads, the GPU memory hierarchy, and scalability across multiple GPUs including an MPI example demonstrated near-linear scaling to 500 GPUs.
Lastly, no programming language stands alone. Arguably, for any language to be successful, it must be surrounded by an ecosystem of powerful compilers, performance and correctness tools, and optimized libraries. These pragmatic aspects of software development are often the most important factor to developing applications quickly. CUDA Application Design and Development does not disappoint in this area, as it devotes multiple chapters to describing how to use CUDA compilers, debuggers, performance profilers, libraries, and interoperability with other languages.
I have enjoyed learning from this book, and I am certain you will also.
20 September 2011
Preface
Timing is so very important in technology, as well as in our academic and professional careers. We are an extraordinarily lucky generation of programmers who have the initial opportunity to capitalize on inexpensive, generally available, massively parallel computing hardware. The impact of GPGPU (General-Purpose Graphics Processing Units) technology spans all aspects of computation, from the smallest cell phones to the largest supercomputers in the world. They are changing the commercial application landscape, scientific computing, cloud computing, computer visualization, games, and robotics and are even redefining how computer programming is taught. Teraflop (trillion floating-point operations per second) computing is now within the economic reach of most people around the world. Teenagers, students, parents, teachers, professionals, small research organizations, and large corporations can easily afford GPGPU hardware and the software development kits (SDKs) are free. NVIDIA estimates that more than 300 million of their programmable GPGPU devices have already been sold.