Project 4: OpenMP and Hybrid Parallelism
In this project, you will gain experience with the basics of shared-memory parallelism using OpenMP. You will also combine distributed- and shared-memory approaches by implementing hybrid parallelism with MPI and OpenMP. In addition to the course textbook, you may wish to refer to the excellent OpenMP resource pages at https://computing.llnl.gov/tutorials/openMP/.
Part 1: OpenMP Matrix-Matrix Multiplication
Consider the simple matrix-matrix multiplication,

```
for i = 1, N
  for j = 1, N
    for k = 1, N
      C[i,j] += A[i,k] * B[k,j]
```
What strategies could you use to add parallelism using OpenMP threading to this kernel? Is each of the three loops threadable?
Now, let's implement some OpenMP loop parallelism.
- Modify your MMM code from Project 1 to implement OpenMP threading by adding appropriate compiler directives to the outer loop of the MMM kernel (see the sketch after this list). When compiling the OpenMP version of your code, be sure to include the appropriate compiler flag (`-fopenmp` for GCC).
- Compare the time-to-solution of your MMM code for 1 thread (e.g., `export OMP_NUM_THREADS=1`) to the non-OpenMP version (i.e., compiled without the `-fopenmp` flag). Any matrix size `N` will do here. Does it perform as you expect? If not, consider the OpenMP directives you are using.
- Perform a thread-to-thread speedup study of your MMM code either on your laptop or on HPCC. Compute the total time to solution for a few thread counts (in powers of 2): 1, 2, 4, ..., `T`, where `T` is the maximum number of threads available on the machine you are using. Do this for matrix sizes of `N` = 20, 100, 1000.
- Plot the times-to-solution for the MMM for each value of `N` separately as functions of the thread count `T`. Compare the scaling of the MMM for different matrix dimensions.
- Verify that, for the same input matrices, the solution does not depend on the number of threads.
Part 2: Adding OpenMP threading to a simple MPI application
Take a look at the Hello World applications that we have used in past assignments that include basic MPI functionality. Modify one of these applications to include OpenMP.
- Wrap the print statements in an `omp parallel` region.
- Make sure to modify the `MPI_Init` call accordingly to allow for threads! What level of thread support do you need? (See the sketch after this list.)
- Compile the code including the appropriate flag for OpenMP support. For a GCC-based MPI installation, this would be, e.g., `mpic++ -fopenmp hello.cpp`.
- Run the code using 2 MPI ranks and 4 OpenMP threads per rank. To do this, prior to executing the run command, set the number-of-threads environment variable with `export OMP_NUM_THREADS=4`. Then you can simply execute the application with the `mpiexec` command: `mpiexec -n 2 ./a.out`.
- Explain the output.
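For reference, one possible shape for the hybrid hello world is sketched below. It requests `MPI_THREAD_FUNNELED` on the assumption that no MPI calls are made inside the parallel region; you should decide, and be able to justify, the level your own code needs:

```cpp
#include <cstdio>
#include <mpi.h>
#include <omp.h>

int main(int argc, char** argv) {
    // Request threaded MPI support and check the level actually provided.
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each OpenMP thread on each rank prints its own greeting.
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d on rank %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads(), rank, size);
    }

    MPI_Finalize();
    return 0;
}
```

Run it with the commands above; the number and interleaving of the printed lines are worth thinking about when you explain the output.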
Part 3: Hybrid Parallel Matrix Multiplication
Now, let’s combine OpenMP and MPI functionality into a hybrid parallel version of the MMM.
- Add MPI to your OpenMP MMM code by distributing the rows of one of the input matrices across MPI ranks. Have each MPI rank perform its portion of the MMM using OpenMP threading. Think very carefully about the structure of the main MMM loops! Once done, gather the resulting matrix on rank 0 and output the result (a minimal sketch of this decomposition follows this list). Verify that, for the same input matrices, the result depends on neither the number of MPI ranks nor the number of OpenMP threads per rank.
- On HPCC, carry out a performance study in which you vary the number of MPI ranks, the number of OpenMP threads per rank, and the matrix size. Make plots showing the times to solution for the various cases. Explain your results.
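The following is a minimal sketch of the row-wise decomposition, not a complete solution: it assumes `N` is divisible by the number of ranks and uses all-ones inputs purely so the result is easy to check. Your code will need to handle general `N`, real inputs, and timing.

```cpp
#include <cstdio>
#include <mpi.h>
#include <vector>

// Hybrid MMM sketch: rows of A are distributed across ranks, B is
// replicated on every rank, and each rank threads its local block
// of rows with OpenMP before rank 0 gathers the full result.
int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 8;                 // toy size; assumes N % size == 0
    const int local_rows = N / size;

    // Rank 0 owns the full matrices; all-ones inputs make the correct
    // product a matrix whose every entry equals N.
    std::vector<double> A, B(N * N, 1.0), C;
    if (rank == 0) { A.assign(N * N, 1.0); C.assign(N * N, 0.0); }

    // Distribute row blocks of A and replicate B.
    std::vector<double> A_local(local_rows * N), C_local(local_rows * N, 0.0);
    MPI_Scatter(A.data(), local_rows * N, MPI_DOUBLE,
                A_local.data(), local_rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B.data(), N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    // Thread the local row block; threads write disjoint rows of C_local.
    #pragma omp parallel for
    for (int i = 0; i < local_rows; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                C_local[i * N + j] += A_local[i * N + k] * B[k * N + j];

    // Reassemble the full product on rank 0.
    MPI_Gather(C_local.data(), local_rows * N, MPI_DOUBLE,
               C.data(), local_rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("C[0,0] = %g (expect %d)\n", C[0], N);

    MPI_Finalize();
    return 0;
}
```

For the performance study, the rank count comes from `mpiexec -n` and the thread count from `OMP_NUM_THREADS`, so the two can be varied independently.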
What to turn in
To your git project repo, commit your final working code for the above exercises along with a concise write-up that includes all plots and detailed responses to the questions posed about your results.