You will:

  • learn how to use MAP to profile a parallel MPI code
  • learn how to interpret the profiling data

MAP profiler

On NeSI systems the Arm MAP profiler is provided as part of the forge module (along with the parallel debugger DDT).

MAP is a commercial product that can profile parallel, multi-threaded and single-threaded C, C++, Fortran and Python codes. It can be used without modifying the code.

MAP can be used to identify hotspots and load-balance problems in parallel codes. Unlike cProfile, described elsewhere in this material, MAP can instrument Python, C, C++ and Fortran codes. Unlike many other profilers, it requires no recompilation, and it supports OpenMP threads as well as MPI communication. It comes with a graphical user interface that makes it easy to drill down into particular code sections or to focus on specific time intervals during the run.

For more details see the Arm MAP documentation.

Code example

We’ll use the scatter.py code in the mpi directory of the solutions branch. Start by running:

git fetch --all
git checkout solutions
cd mpi

Using MAP to profile an executable

To use MAP, load the forge module in your batch script and prepend map --profile to the srun command, for example

ml forge
map --profile srun python scatter.py

in the Slurm script “scatter_map.sl”.
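For reference, a complete batch script might look like the following sketch. Only the forge module load and the map --profile prefix come from the text above; the job name and resource values are placeholders to adjust for your own run:

```shell
#!/bin/bash -e
#SBATCH --job-name=scatter_map   # placeholder: any name
#SBATCH --ntasks=4               # placeholder: number of MPI tasks
#SBATCH --time=00:10:00          # placeholder: wall-time limit

ml forge                          # provides the map profiler
map --profile srun python scatter.py
```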

Upon execution, a file with the suffix .map is generated. The results can be viewed, for instance, with

map python3_scatter_py_4p_1n_1t_2019-05-22_00-58.map

(the .map file name varies between runs). See the section Interpreting the profiling data below for how to interpret the results.

Note: for an MPI program, the map --profile command must precede srun. For serial or OpenMP programs, map and its options should come after srun.
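In other words (the program names here are placeholders):

```shell
# MPI program: map wraps the whole srun launch
map --profile srun ./my_mpi_program

# serial or OpenMP program: map goes after srun,
# wrapping only the program itself
srun map --profile ./my_openmp_program
```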

Interpreting the profiling data

Upon execution, a file with the suffix .map is generated. The results can be viewed with the map command, for instance

map python3_scatter_py_8p_1n_2019-01-14_00-31.map

(the .map file name varies between runs).

The profile window is divided into three main sections.


At the top, various metrics can be selected from the “Metrics” menu. In the middle, a source-code navigator links each line of the source to its profiling data. Most useful is the profiling table at the bottom, which lists the most time-consuming parts of the program together with their function names, source code and line numbers.

The Metrics part can be changed to:

  • Activity timeline
  • CPU instructions
  • CPU Time
  • IO
  • Memory
  • MPI

For example, “CPU instructions” shows the mix of instruction sets used over the program’s run time.



Exercise

  • profile the scatter.py code under the mpi directory:
    • use 16 MPI tasks
    • increase the problem size to -nx 256 -ny 256 -nc 256 to make the test run longer
  • how much time is spent in computeScatteredWave for the above test case?
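A possible way to set up the batch script for this exercise, assuming the option names given above (everything else is a placeholder to adapt to your own script):

```shell
#SBATCH --ntasks=16              # 16 MPI tasks, as requested above

ml forge
map --profile srun python scatter.py -nx 256 -ny 256 -nc 256
```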