Dr. Brian Tuomanen - Hands-On GPU Programming with Python and CUDA: Explore high-performance parallel computing with CUDA


  • Book: Hands-On GPU Programming with Python and CUDA: Explore high-performance parallel computing with CUDA
  • Author: Dr. Brian Tuomanen
  • Publisher: Packt Publishing
  • Genre: Computer
  • Year: 2018

Summary and description


Hands-On GPU Programming with Python and CUDA hits the ground running: you'll start by learning how to apply Amdahl's Law, use a code profiler to identify bottlenecks in your Python code, and set up an appropriate GPU programming environment. You'll then see how to query the GPU's features and copy arrays of data to and from the GPU's own memory.

As you make your way through the book, you'll launch code directly onto the GPU and write full-blown GPU kernels and device functions in CUDA C. You'll get to grips with profiling GPU code effectively, and fully test and debug your code using the Nsight IDE. Next, you'll explore some of the more well-known NVIDIA libraries, such as cuFFT and cuBLAS.

With a solid background in place, you will then apply your new-found knowledge to develop your very own GPU-based deep neural network from scratch. You'll then explore advanced topics, such as warp shuffling, dynamic parallelism, and PTX assembly. In the final chapter, you'll see some topics and applications related to GPU programming that you may wish to pursue, including AI, graphics, and blockchain. By the end of this book, you will be able to apply GPU programming to problems related to data science and high-performance computing.

What you will learn:

• Launch GPU code directly from Python
• Write effective and efficient GPU kernels and device functions
• Use libraries such as cuFFT, cuBLAS, and cuSolver
• Debug and profile your code with Nsight and Visual Profiler
• Apply GPU programming to data science problems
• Build a GPU-based deep neural network from scratch
• Explore advanced GPU hardware features, such as warp shuffling

Who this book is for:

Hands-On GPU Programming with Python and CUDA is for developers and data scientists who want to learn the basics of effective GPU programming to improve performance using Python code. You should have an understanding of first-year college or university-level engineering mathematics and physics, and have some experience with Python as well as with a C-based programming language such as C, C++, Go, or Java.


The basics

While you now know many of the intricacies of low-level GPU programming, you won't be able to apply this knowledge to machine learning immediately. If you don't have basic skills in this field, such as how to perform a basic statistical analysis of a dataset, you really should stop and familiarize yourself with them. Stanford Professor Andrew Ng, the founder of Google Brain, provides many materials that are available for free on the web and on YouTube. Professor Ng's work is generally considered to be the gold standard of educational material on machine learning.

Professor Ng provides a free introductory machine learning class on the web here: http://www.ml-class.org.
Chapter 5, Streams, Events, Contexts, and Concurrency
  1. The performance improves for both; as we increase the number of threads, the GPU reaches peak utilization in both cases, reducing the gains made through using streams.
  2. Yes, you can launch an arbitrary number of kernels asynchronously and then synchronize them all with cudaDeviceSynchronize.
  3. Open up your text editor and try it!
  4. A high standard deviation would mean that the GPU is being used unevenly, overwhelming the GPU at some points and under-utilizing it at others. A low standard deviation would mean that all launched operations are running generally smoothly.
  5. i. The host can generally handle far fewer concurrent threads than a GPU. ii. Each thread requires its own CUDA context. The GPU can become overwhelmed with excessive contexts, since each has its own memory space and has to handle its own loaded executable code.
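As a minimal sketch of the second answer (not from the book, and using a hypothetical placeholder kernel), the following PyCUDA snippet launches kernels asynchronously on several streams and then synchronizes the whole device; in PyCUDA, a context-level synchronize plays the role of cudaDeviceSynchronize:

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda import gpuarray
from pycuda.compiler import SourceModule

# A trivial placeholder kernel that doubles each element.
double_ker = SourceModule('''
__global__ void double_ker(float *x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    x[i] *= 2.0f;
}
''').get_function('double_ker')

streams = [drv.Stream() for _ in range(4)]
arrays = [gpuarray.to_gpu_async(np.random.randn(32).astype(np.float32), stream=s)
          for s in streams]

# Launch one kernel per stream; these calls return immediately.
for a, s in zip(arrays, streams):
    double_ker(a, block=(32, 1, 1), grid=(1, 1, 1), stream=s)

# Block until every kernel launched on every stream has finished.
pycuda.autoinit.context.synchronize()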
DirectX 12

DirectX 12 is the latest iteration of Microsoft's well-known and well-supported graphics API. While it is proprietary to Windows PCs and Microsoft Xbox game consoles, these systems obviously have a wide install base of hundreds of millions of users. Furthermore, a variety of GPUs besides NVIDIA cards are supported on Windows PCs, and the Visual Studio IDE provides great ease of use. DirectX 12 actually supports low-level GPGPU programming-type concepts and can utilize multiple GPUs.

Microsoft's DirectX 12 Programming Guide is available here: https://docs.microsoft.com/en-us/windows/desktop/direct3d12/directx-12-programming-guide.
Questions
  1. Suppose that you use nvcc to compile a single .cu file containing both host and kernel code into an EXE file, and also into a PTX file. Which file will contain the host functions, and which file will contain the GPU code?
  2. Why do we have to destroy a context if we are using the CUDA Driver API?
  3. At the beginning of this chapter when we first saw how to use Ctypes, notice that we had to typecast the floating point value 3.14 to a Ctypes c_double object in a call to printf before it would work. Yet we can see many working cases of not typecasting to Ctypes in this chapter. Why do you think printf is an exception here?
  4. Suppose you want to add functionality to our Python CUDA Driver interface module to support CUDA streams. How would you represent a single stream object in Ctypes?
  5. Why do we use extern "C" for functions in mandelbrot.cu?
  6. Look at mandelbrot_driver.py again. Why do we not use the cuCtxSynchronize function after GPU memory allocations and host/GPU memory transfers, and only after the single kernel invocation?
Chapter 2, Setting Up Your GPU Programming Environment
  1. No, CUDA only supports NVIDIA GPUs, not Intel HD or AMD Radeon.
  2. This book only uses Python 2.7 examples.
  3. Device Manager
  4. lspci
  5. free
  6. .run
Technical requirements

A Linux or Windows 10 PC with a modern NVIDIA GPU (2016 onward) is required for this chapter, with all of the necessary GPU drivers and the CUDA Toolkit (9.0 onward) installed. A suitable Python 2.7 installation (such as Anaconda Python 2.7) with the PyCUDA module is also required.

This chapter's code is also available on GitHub at https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA.

For more information about the prerequisites for this chapter, check out the preface of this book. For the software and hardware requirements, check out the README file in https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA.
Implementation of the softmax layer

We will now look at how we can implement a softmax layer. As we have already discussed, a sigmoid layer is used for assigning labels to a class; that is, if you want to infer multiple nonexclusive characteristics from an input, you should use a sigmoid layer. A softmax layer is used when you only want to assign a single class to a sample by inference; this is done by computing a probability for each possible class (with the probabilities over all classes, of course, summing to 100%). We can then select the class with the highest probability to give the final classification.

Now, let's see exactly what the softmax layer does: given a collection of N real numbers (c_0, ..., c_{N-1}), we first compute the sum of the exponential of each number, and then divide the exponential of each number by this sum to yield the softmax:

softmax(c)_j = \frac{e^{c_j}}{\sum_{k=0}^{N-1} e^{c_k}}
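As a quick sanity check (a sketch, not from the book), the same computation in plain NumPy looks like this; we can use it later to verify the GPU version:

import numpy as np

def softmax_cpu(c):
    # Exponentiate, then normalize by the sum so the outputs total 1.
    e = np.exp(np.array(c, dtype=np.float32))
    return e / e.sum()

print(softmax_cpu([1.0, 2.0, 3.0]))  # approximately [0.090, 0.245, 0.665]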

Let's start with our implementation. We will start by writing two very short CUDA kernels: one that takes the exponential of each input, and another that normalizes each sample's values by their sum over all of the points:

# NumPy and PyCUDA imports used by the listings below
import numpy as np
import pycuda.autoinit
from pycuda import gpuarray
from pycuda.compiler import SourceModule

SoftmaxExpCode='''
// Exponentiate each of the `num` values of every sample in the batch;
// one thread handles one of the `num` positions across the whole batch.
__global__ void softmax_exp(int num, float *x, float *y, int batch_size)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < num)
    {
        for (int k = 0; k < batch_size; k++)
        {
            y[num*k + i] = expf(x[num*k + i]);
        }
    }
}
'''
exp_mod = SourceModule(SoftmaxExpCode)
exp_ker = exp_mod.get_function('softmax_exp')
SoftmaxMeanCode='''
// Normalize each sample's exponentiated values by their sum, so that
// each row of `y` sums to 1; one thread handles one sample of the batch.
__global__ void softmax_mean(int num, float *x, float *y, int batch_size)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < batch_size)
    {
        float temp = 0.0f;

        for (int k = 0; k < num; k++)
            temp += x[i*num + k];

        for (int k = 0; k < num; k++)
            y[i*num + k] = x[i*num + k] / temp;
    }

    return;
}'''
mean_mod = SourceModule(SoftmaxMeanCode)
mean_ker = mean_mod.get_function('softmax_mean')

Now, let's write a Python wrapper class, like we did previously. First, we will start with the constructor, and we will indicate the number of both inputs and outputs with num. We can also specify a default stream, if we wish:

class SoftmaxLayer:
    def __init__(self, num=None, stream=None):
        # number of inputs/outputs, and an optional default CUDA stream
        self.num = np.int32(num)
        self.stream = stream

Now, let's write the eval_ function in a way that is similar to that of the dense layer:

    def eval_(self, x, y=None, batch_size=None, stream=None):
        if stream is None:
            stream = self.stream

        # If the input isn't already on the GPU, copy it over asynchronously.
        if type(x) != pycuda.gpuarray.GPUArray:
            temp = np.array(x, dtype=np.float32)
            x = gpuarray.to_gpu_async(temp, stream=stream)

        # Infer the batch size from the input's shape if it isn't given.
        if batch_size is None:
            if len(x.shape) == 2:
                batch_size = np.int32(x.shape[0])
            else:
                batch_size = np.int32(1)
        else:
            batch_size = np.int32(batch_size)

        # Allocate the output array if the caller didn't supply one.
        if y is None:
            if batch_size == 1:
                y = gpuarray.empty((self.num,), dtype=np.float32)
            else:
                y = gpuarray.empty((batch_size, self.num), dtype=np.float32)

        # Dividing by 32.0 keeps the grid-size computation correct under
        # Python 2.7's integer division.
        exp_ker(self.num, x, y, batch_size, block=(32,1,1),
                grid=(int(np.ceil(self.num / 32.0)), 1, 1), stream=stream)
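The eval_ listing continues beyond this excerpt. As a standalone sketch (not the book's own code), the two kernels can be chained by hand to compute a full softmax on a small random batch and verify that each output row sums to 1:

batch_size, num = np.int32(2), np.int32(8)
x_gpu = gpuarray.to_gpu(np.random.randn(2, 8).astype(np.float32))
e_gpu = gpuarray.empty_like(x_gpu)    # exponentials
out_gpu = gpuarray.empty_like(x_gpu)  # normalized softmax output

exp_ker(num, x_gpu, e_gpu, batch_size, block=(32,1,1),
        grid=(int(np.ceil(num / 32.0)), 1, 1))
mean_ker(num, e_gpu, out_gpu, batch_size, block=(32,1,1),
         grid=(int(np.ceil(batch_size / 32.0)), 1, 1))

print(out_gpu.get().sum(axis=1))  # each entry should be close to 1.0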