- Why multiprocessing
- Learn the basics
- Running the scatter code using multiple threads
Learn how to call a function in parallel using shared memory multiprocessing.
We’ll use the code in directory
multiproc. Start by
Multiprocessing is suitable when:
- your computational resources have many CPU cores. On Mahuika, you can access up to 36 cores (72 hyperthreads) within a single node.
- you have a large number of tasks that need to be executed in any order
- decreases your execution time by leveraging multiple CPU cores
- suitable when the workload of each process varies
- uses more resources
- processes must typically run on the same node
Learn the basics
As an example, we’ll assume that you have to apply a very expensive function
f to a number of input values:
import time def f(x): # expensive function time.sleep(10) return x # call the function sequentially for input values 0, 1, 2, 3 and 4 input_values = [x for x in range(5)] res = [f(x) for x in input_values]
In its original form, function
f is called sequentially for each value of
x. The modified version using 3 processes is:
import multiprocessing import time def f(x): # expensive function time.sleep(10) return x # create a "pool" of 3 processes to do the calculations pool = multiprocessing.Pool(processes=3) # the function is called in parallel, using the number of processes # we set when creating the Pool input_values = [x for x in range(5)] res = pool.map(f, input_values)
How it works: each input value of
input_values is put in a queue and handed over to a worker. Here, there are 3 workers who accomplish the task in parallel. When a worker has finished a task, a new task is assigned until the queue is empty. At which point all the elements of array
res have been filled.
The serial version takes about 50 seconds - there are 5 tasks each taking 10 seconds. The multiprocessing version takes about 20 seconds as some processes need to complete two tasks and others only one. Naturally, it would be more efficient to match the number of tasks to the number of processes. In many cases, however, the number of tasks exceeds the number of processes available on the system, meaning that some processes will need to work on several tasks.
Running the scatter code using multiple threads
We’ve created a version of
scatter.py which reads the environment variable
OMP_NUM_THREADS to set the number of processes. We’ve also modified our
scatter.sl script to set the number of processes using
--cpus-per-task=4. To prevent two processes to be placed on each core, we use
To run interactively using 4 processes, type
export OMP_NUM_THREADS=4 python scatter.py
This version of scatter.py has been slightly adapted so that, with a few code changes, you can turn the serial version into one where function
computeFieldcan execute in parallel. Function
computeFieldtakes input argument
k, the flat index into the 2D field arrays representing the incident and scatted fields (i.e.
k = j*nx1 + i), and returns the incident and scattered field values for each grid node. The incident and scattered field values
# change the following for parallel execution res = [computeField(k) for k in range(ny1 * nx1)]
can be computed in any order.
- adapt the code in scatter.py to call
computeFieldin parallel (the number of processes is
- what is the speedup (timing of