Optimizing HPC Applications with Intel Cluster Tools: Hunting Petaflops
Alexander Supalov, Andrey Semin, Michael Klemm, and Christopher Dahnken. Berkeley, CA: Apress, 2014.


Optimizing HPC Applications with Intel Cluster Tools takes the reader on a tour of the fast-growing area of high performance computing and the optimization of hybrid programs. These programs typically combine distributed memory and shared memory programming models and use the Message Passing Interface (MPI) and OpenMP for multi-threading to achieve the ultimate goal of high performance at low power consumption on enterprise-class workstations and compute clusters. The book focuses on optimization for clusters consisting of the Intel Xeon processor, but the optimization methodologies also apply to the Intel Xeon Phi coprocessor and heterogeneous clusters mixing both architectures. Besides the tutorial and reference content, the authors address and refute many myths and misconceptions surrounding the topic. The text is augmented and enriched by descriptions of real-life situations.


© Alexander Supalov 2014
Alexander Supalov, Andrey Semin, Michael Klemm, and Christopher Dahnken, Optimizing HPC Applications with Intel Cluster Tools, DOI 10.1007/978-1-4302-6497-2_1
1. No Time to Read This Book?
Alexander Supalov(1), Andrey Semin(1), Michael Klemm(1), and Christopher Dahnken(1)
(1) Tuntenhausen, Germany
We know what it feels like to be under pressure. Try out a few quick and proven optimization stunts described below. They may provide a good enough performance gain right away.
There are several parameters that can be adjusted with relative ease. Here are the steps we follow when hard pressed:
  • Use Intel MPI Library
  • Got more time? Tune Intel MPI:
    • Collect built-in statistics data
    • Tune Intel MPI process placement and pinning
    • Tune OpenMP thread pinning
  • Got still more time? Tune Intel Composer XE:
    • Analyze optimization and vectorization reports
    • Use interprocedural optimization
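The pinning-related steps above come down to a handful of environment variables. A minimal sketch follows; the variable names are real Intel MPI and Intel OpenMP runtime controls, but the values shown are only one common starting point, not a tuned recommendation:

```shell
# Hedged sketch of process placement and thread pinning controls.
export I_MPI_PIN_DOMAIN=omp     # one pinning domain per MPI rank, sized by OMP_NUM_THREADS
export OMP_NUM_THREADS=8        # OpenMP threads per MPI rank
export KMP_AFFINITY=compact     # pack OpenMP threads onto adjacent logical cores
# mpirun -np 16 -ppn 2 ./xhpl  # then launch as usual
```

The right values depend on your node topology; measure before settling on a combination.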
Using Intel MPI Library
The Intel MPI Library delivers good out-of-the-box performance for bandwidth-bound applications. If your application belongs to this popular class, you should feel the difference immediately when switching over.
If your application has been built against an Intel MPI-compatible MPI implementation, such as MPICH and its derivatives, there is no need to recompile. You can switch to the Intel MPI 5.0 libraries at run time via dynamic linking:
$ source /opt/intel/impi_latest/bin64/mpivars.sh
$ mpirun -np 16 -ppn 2 xhpl
If you use another MPI and have access to the application source code, you can rebuild your application using Intel MPI compiler scripts:
  • Use mpicc (for C), mpicxx (for C++), and mpifc/mpif77/mpif90 (for Fortran) if you target GNU compilers.
  • Use mpiicc, mpiicpc, and mpiifort if you target Intel Composer XE.
Using Intel Composer XE
The invocation of the Intel Composer XE is largely compatible with the widely used GNU Compiler Collection (GCC). This includes both the most commonly used command line options and the language support for C/C++ and Fortran. For many applications you can simply replace gcc with icc, g++ with icpc, and gfortran with ifort. However, be aware that although the binary code generated by Intel C/C++ Composer XE is compatible with the GCC-built executable code, the binary code generated by the Intel Fortran Composer is not.
For example:
$ source /opt/intel/composerxe/bin/compilervars.sh intel64
$ icc -O3 -xHost -qopenmp -c example.c
Revisit the compiler flags you used before the switch; you may have to remove some of them. Make sure that Intel Composer XE is invoked with the flags that give the best performance for your application (see Table 1-1).
Table 1-1. Selected Intel Composer XE Optimization Flags

GCC                  ICC             Effect
-O0                  -O0             Disable (almost all) optimization. Not something you want to use for performance!
-O1                  -O1             Optimize for speed (no code size increase for ICC)
-O2                  -O2             Optimize for speed and enable vectorization
-O3                  -O3             Turn on high-level optimizations
-flto                -ipo            Enable interprocedural optimization
-ftree-vectorize     -vec            Enable auto-vectorization (auto-enabled with -O2 and -O3)
-fprofile-generate   -prof-gen       Generate runtime profile for optimization
-fprofile-use        -prof-use       Use runtime profile for optimization
                     -parallel       Enable auto-parallelization
-fopenmp             -qopenmp        Enable OpenMP
-g                   -g              Emit debugging symbols
                     -qopt-report    Generate the optimization report
                     -vec-report     Generate the vectorization report
                     -ansi-alias     Enable ANSI aliasing rules for C/C++
-msse4.1             -xSSE4.1        Generate code for Intel processors with SSE 4.1 instructions
-mavx                -xAVX           Generate code for Intel processors with AVX instructions
-mavx2               -xCORE-AVX2     Generate code for Intel processors with AVX2 instructions
-march=native        -xHost          Generate code for the current machine used for compilation
For most applications, the default optimization level of -O2 will suffice: compilation is fast and the generated code performs reasonably well. If you feel adventurous, try -O3. It is more aggressive, but it also increases the compilation time.
Tuning Intel MPI Library
If you have more time, you can try to tune Intel MPI parameters without changing the application source code.
Gather Built-in Statistics
Intel MPI comes with a built-in statistics-gathering mechanism. It creates a negligible runtime overhead and reports key performance metrics (for example, MPI to computation ratio, message sizes, counts, and collective operations used) in the popular IPM format.
To switch the IPM statistics gathering mode on and do the measurements, enter the following commands:
$ export I_MPI_STATS=ipm
$ mpirun -np 16 xhpl
By default, this will generate a file called stats.ipm. Listing 1-1 shows an example of the MPI statistics gathered for the well-known High Performance Linpack (HPL) benchmark. (We will return to this benchmark throughout this book, by the way.)
Listing 1-1. MPI Statistics for the HPL Benchmark with the Most Interesting Fields Highlighted
Intel(R) MPI Library Version 5.0
Summary MPI Statistics
Stats format: region
Stats scope : full
############################################################################
#
# command : /home/book/hpl/./xhpl_hybrid_intel64_dynamic (completed)
# host : esg066/x86_64_Linux mpi_tasks : 16 on 8 nodes
# start : 02/14/14/12:43:33 wallclock : 2502.401419 sec
# stop : 02/14/14/13:25:16 %comm : 8.43
# gbytes : 0.00000e+00 total gflop/sec : NA
#
############################################################################
# region : * [ntasks] = 16
#
# [total] min max
# entries 16 1 1 1
# wallclock 40034.7 2502.17 2502.13 2502.4
# user 446800 27925 27768.4 28192.7
# system 1971.27 123.205 102.103 145.241
# mpi 3375.05 210.941 132.327 282.462
# %comm 8.43032 5.28855 11.2888
# gflop/sec NA NA NA NA
# gbytes 0 0 0 0
#
#
# [time] [calls]
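The %comm figure in Listing 1-1 can be cross-checked from the [total] column itself: it is simply the aggregate MPI time divided by the aggregate wallclock time, in percent. A one-line check using the numbers from the listing:

```shell
# Cross-check of Listing 1-1: 100 * mpi / wallclock from the [total] column.
awk 'BEGIN { printf "%.2f\n", 100 * 3375.05 / 40034.7 }'   # prints 8.43
```

This matches the reported %comm of 8.43, confirming how the statistic is derived.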