Optimizing HPC Applications with Intel Cluster Tools: Hunting Petaflops
Alexander Supalov, Andrey Semin, Michael Klemm, and Christopher Dahnken. Berkeley, CA: Apress, 2014.


Optimizing HPC Applications with Intel Cluster Tools takes the reader on a tour of the fast-growing area of high performance computing and the optimization of hybrid programs. These programs typically combine distributed memory and shared memory programming models and use the Message Passing Interface (MPI) and OpenMP for multi-threading to achieve the ultimate goal of high performance at low power consumption on enterprise-class workstations and compute clusters. The book focuses on optimization for clusters consisting of the Intel Xeon processor, but the optimization methodologies also apply to the Intel Xeon Phi coprocessor and heterogeneous clusters mixing both architectures. Besides the tutorial and reference content, the authors address and refute many myths and misconceptions surrounding the topic. The text is augmented and enriched by descriptions of real-life situations.


© Alexander Supalov 2014
Alexander Supalov, Andrey Semin, Michael Klemm, and Christopher Dahnken, Optimizing HPC Applications with Intel Cluster Tools, DOI 10.1007/978-1-4302-6497-2_1
1. No Time to Read This Book?
Alexander Supalov(1), Andrey Semin(1), Michael Klemm(1), and Christopher Dahnken(1)
(1) Tuntenhausen, Germany
We know what it feels like to be under pressure. Try out a few quick and proven optimization stunts described below. They may provide a good enough performance gain right away.
There are several parameters that can be adjusted with relative ease. Here are the steps we follow when hard pressed:
  • Use Intel MPI Library
  • Got more time? Tune Intel MPI:
    • Collect built-in statistics data
    • Tune Intel MPI process placement and pinning
    • Tune OpenMP thread pinning
  • Got still more time? Tune Intel Composer XE:
    • Analyze optimization and vectorization reports
    • Use interprocedural optimization
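The pinning-related steps above come down to a handful of environment variables. A minimal sketch follows; the variable names are real Intel MPI and Intel OpenMP runtime controls, but the values shown are only one common starting point, not a tuned recommendation:

```shell
# Hedged sketch of process placement and thread pinning controls.
export I_MPI_PIN_DOMAIN=omp     # one pinning domain per MPI rank, sized by OMP_NUM_THREADS
export OMP_NUM_THREADS=8        # OpenMP threads per MPI rank
export KMP_AFFINITY=compact     # pack OpenMP threads onto adjacent logical cores
# mpirun -np 16 -ppn 2 ./xhpl  # then launch as usual
```

The right values depend on your node topology; measure before settling on a combination.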
Using Intel MPI Library
The Intel MPI Library delivers good out-of-the-box performance for bandwidth-bound applications. If your application belongs to this popular class, you should feel the difference immediately when switching over.
If your application has been built against an Intel MPI-compatible MPI implementation, such as MPICH and its derivatives, there is no need to recompile. You can switch to the Intel MPI 5.0 libraries at run time via dynamic linking:
$ source /opt/intel/impi_latest/bin64/mpivars.sh
$ mpirun -np 16 -ppn 2 xhpl
If you use another MPI and have access to the application source code, you can rebuild your application using Intel MPI compiler scripts:
  • Use mpicc (for C), mpicxx (for C++), and mpifc/mpif77/mpif90 (for Fortran) if you target GNU compilers.
  • Use mpiicc, mpiicpc, and mpiifort if you target Intel Composer XE.
Using Intel Composer XE
The invocation of the Intel Composer XE is largely compatible with the widely used GNU Compiler Collection (GCC). This includes both the most commonly used command line options and the language support for C/C++ and Fortran. For many applications you can simply replace gcc with icc, g++ with icpc, and gfortran with ifort. However, be aware that although the binary code generated by Intel C/C++ Composer XE is compatible with the GCC-built executable code, the binary code generated by the Intel Fortran Composer is not.
For example:
$ source /opt/intel/composerxe/bin/compilervars.sh intel64
$ icc -O3 -xHost -qopenmp -c example.c
Revisit the compiler flags you used before the switch; you may have to remove some of them. Make sure that Intel Composer XE is invoked with the flags that give the best performance for your application (see Table 1-1).
Table 1-1. Selected Intel Composer XE Optimization Flags

GCC                  ICC             Effect
-O0                  -O0             Disable (almost all) optimization. Not something you want to use for performance!
-O1                  -O1             Optimize for speed (no code size increase for ICC)
-O2                  -O2             Optimize for speed and enable vectorization
-O3                  -O3             Turn on high-level optimizations
-flto                -ipo            Enable interprocedural optimization
-ftree-vectorize     -vec            Enable auto-vectorization (auto-enabled with -O2 and -O3)
-fprofile-generate   -prof-gen       Generate runtime profile for optimization
-fprofile-use        -prof-use       Use runtime profile for optimization
                     -parallel       Enable auto-parallelization
-fopenmp             -qopenmp        Enable OpenMP
-g                   -g              Emit debugging symbols
                     -qopt-report    Generate the optimization report
                     -vec-report     Generate the vectorization report
                     -ansi-alias     Enable ANSI aliasing rules for C/C++
-msse4.1             -xSSE4.1        Generate code for Intel processors with SSE 4.1 instructions
-mavx                -xAVX           Generate code for Intel processors with AVX instructions
-mavx2               -xCORE-AVX2     Generate code for Intel processors with AVX2 instructions
-march=native        -xHost          Generate code for the current machine used for compilation
For most applications, the default optimization level of -O2 will suffice: compilation is fast and the generated code performs reasonably well. If you feel adventurous, try -O3. It is more aggressive, but it also increases the compilation time.
Tuning Intel MPI Library
If you have more time, you can try to tune Intel MPI parameters without changing the application source code.
Gather Built-in Statistics
Intel MPI comes with a built-in statistics-gathering mechanism. It creates a negligible runtime overhead and reports key performance metrics (for example, MPI to computation ratio, message sizes, counts, and collective operations used) in the popular IPM format.
To switch the IPM statistics gathering mode on and do the measurements, enter the following commands:
$ export I_MPI_STATS=ipm
$ mpirun -np 16 xhpl
By default, this will generate a file called stats.ipm. Listing 1-1 shows an example of the MPI statistics gathered for the well-known High Performance Linpack (HPL) benchmark. (We will return to this benchmark throughout this book, by the way.)
Listing 1-1. MPI Statistics for the HPL Benchmark with the Most Interesting Fields Highlighted
Intel(R) MPI Library Version 5.0
Summary MPI Statistics
Stats format: region
Stats scope : full
############################################################################
#
# command : /home/book/hpl/./xhpl_hybrid_intel64_dynamic (completed)
# host : esg066/x86_64_Linux mpi_tasks : 16 on 8 nodes
# start : 02/14/14/12:43:33 wallclock : 2502.401419 sec
# stop : 02/14/14/13:25:16 %comm : 8.43
# gbytes : 0.00000e+00 total gflop/sec : NA
#
############################################################################
# region : * [ntasks] = 16
#
# [total] min max
# entries 16 1 1 1
# wallclock 40034.7 2502.17 2502.13 2502.4
# user 446800 27925 27768.4 28192.7
# system 1971.27 123.205 102.103 145.241
# mpi 3375.05 210.941 132.327 282.462
# %comm 8.43032 5.28855 11.2888
# gflop/sec NA NA NA NA
# gbytes 0 0 0 0
#
#
# [time] [calls]
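The %comm figure in Listing 1-1 can be cross-checked from the [total] column itself: it is simply the aggregate MPI time divided by the aggregate wallclock time, in percent. A one-line check using the numbers from the listing:

```shell
# Cross-check of Listing 1-1: 100 * mpi / wallclock from the [total] column.
awk 'BEGIN { printf "%.2f\n", 100 * 3375.05 / 40034.7 }'   # prints 8.43
```

This matches the reported %comm of 8.43, confirming how the statistic is derived.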