Project 3: MPI Ping-Pong and Ring Shift
Background
The ping-pong problem is a benchmark often used to evaluate the performance of message-passing interfaces (MPI) in parallel computing. In this problem, two processes exchange messages back and forth a specified number of times, with each process alternately sending and receiving a message. In the ping-pong, process *i* sends a message of size *m* to process *j*, then receives a message of size *m* back from *j*. The values of *i*, *j*, and *m* to use are given below.
The “ring shift” problem is similar to ping-pong. In the MPI ring shift, a group of processes is arranged in a ring, with each process holding a unique subset of a larger array of data. The goal is to shift the data elements by a specified number of positions around the ring, wrapping around the ends of the ring as necessary.
Part 1: Blocking Ping-Pong
Your task is to implement the ping-pong problem using MPI in C or C++ and analyze the behavior and performance of your code. Specifically, you should:
- Implement the ping-pong problem using MPI in C or C++. Use blocking MPI_Send() and MPI_Recv() calls. You should define the number of iterations and the size of the message to be exchanged. (A minimal sketch appears after this list.)
- Measure the time taken to complete the ping-pong exchange for different message sizes. Use the MPI_Wtime() function to obtain the time before and after the exchange and calculate the elapsed time. Vary the message size from 2 bytes to 4 kilobytes in powers of 2 (i.e., 2 bytes, 4 bytes, 8 bytes, …, 2048 bytes, 4096 bytes). For each message size, perform 100 iterations of the ping-pong to build up statistical significance.
- Record the total amount of data sent and received during the ping-pong exchange for each configuration.
- Repeat steps 2 and 3, but ensure that the two communicating processes reside on different physical hardware nodes on HPCC.
- Plot the average communication time of a single exchange (send and receive) as a function of message size for the two cases. Using this plot, estimate the latency and bandwidth for each case. Are they different? Explain your results.
- Analyze and discuss your results. Explain the behavior of the resulting curves.
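A minimal sketch of the blocking version is below. It assumes ranks 0 and 1 are the communicating pair and uses MPI_CHAR buffers; the rank assignment (the *i* and *j* above) and the output format are placeholders to adapt to the actual project parameters.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 100;                       /* iterations per message size */
    for (int msg_size = 2; msg_size <= 4096; msg_size *= 2) {
        char *buf = malloc(msg_size);
        MPI_Barrier(MPI_COMM_WORLD);             /* line both ranks up before timing */
        double t0 = MPI_Wtime();
        for (int it = 0; it < iters; ++it) {
            if (rank == 0) {                     /* "ping": send, then await reply */
                MPI_Send(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {              /* "pong": receive, then reply */
                MPI_Recv(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double avg = (MPI_Wtime() - t0) / iters; /* average time per exchange */
        if (rank == 0)
            printf("%d bytes: %g s per exchange\n", msg_size, avg);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}
```

For the latency/bandwidth estimate, a common model charges each message a cost of L + m/B (latency plus size over bandwidth), so one full exchange costs roughly 2(L + m/B); a linear fit of exchange time against message size m then yields L from the intercept and B from the slope.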
Part 2: Non-blocking Ping-Pong
Repeat Part 1 using non-blocking MPI communication, i.e., using MPI_Isend() and MPI_Irecv(). You will need to include explicit process synchronization using, e.g., MPI_Wait() calls. Compare the results to the blocking case.
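One possible shape for the non-blocking inner loop is sketched below, assuming the same setup as the Part 1 sketch. The names sbuf, rbuf, msg_size, and peer are assumptions; note that the two buffers must be distinct, since a buffer may not be reused while its request is still pending.

```c
/* Non-blocking exchange: post the receive first, then the send.
 * sbuf/rbuf are distinct, pre-allocated buffers of msg_size bytes;
 * peer is the other rank (1 - rank when only ranks 0 and 1 talk). */
MPI_Request sreq, rreq;
MPI_Irecv(rbuf, msg_size, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &rreq);
MPI_Isend(sbuf, msg_size, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &sreq);
MPI_Wait(&rreq, MPI_STATUS_IGNORE);   /* explicit synchronization: block */
MPI_Wait(&sreq, MPI_STATUS_IGNORE);   /* until both transfers complete   */
```

Since neither call blocks, both ranks can run this same code; whether you keep the strict ping-then-pong ordering of Part 1 or let the two messages overlap is a design choice worth discussing in your write-up.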
Part 3: MPI Ring Shift
- Implement the MPI ring shift in C or C++ for an arbitrary number of processes in the ring and an arbitrary message size (i.e., number of elements per process). In your implementation, use MPI_Sendrecv() instead of separate MPI_Send() and MPI_Recv() calls. (A minimal sketch appears after this list.)
- As in Parts 1 and 2, vary the message size from 2 bytes to 4 KB, in powers of 2. Also vary the number of processes used from 2 to *N*, in powers of 2, where *N* is sufficiently large that rank 0 and rank *N*-1 are guaranteed to reside on separate nodes (*N* will depend on which cluster you are using on HPCC).
- Compute the bandwidth and latency, as above. Plot the bandwidth as a function of message size. Include separate lines for each number of processes used.
- Analyze and discuss your results. Explain the behavior of the resulting curves.
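A minimal sketch of a one-position shift with MPI_Sendrecv() is below; the element count, MPI_CHAR datatype, and buffer handling are placeholder assumptions. A shift by k positions can then be built by repeating this step k times or by addressing rank (rank + k) % nprocs directly.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int count = 1024;                          /* elements per process (vary this) */
    char *sendbuf = malloc(count);
    char *recvbuf = malloc(count);

    int dest = (rank + 1) % nprocs;            /* neighbor one step "right" */
    int src  = (rank - 1 + nprocs) % nprocs;   /* wraps around the ring */

    /* The combined call pairs each send with a receive internally,
     * avoiding the deadlock that naive blocking MPI_Send()/MPI_Recv()
     * pairs can produce when every rank tries to send first. */
    MPI_Sendrecv(sendbuf, count, MPI_CHAR, dest, 0,
                 recvbuf, count, MPI_CHAR, src,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

Timing and the message-size/process-count sweeps would wrap this step exactly as in Part 1.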
Part 4: Non-blocking MPI Ring Shift
Repeat Part 3 but using non-blocking communication via MPI_Isend() and MPI_Irecv(). Compare the results to the blocking case.
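A sketch of the same shift step with non-blocking calls, reusing the (assumed) names src, dest, sendbuf, recvbuf, and count from the Part 3 sketch:

```c
/* Non-blocking ring-shift step; MPI_Waitall() supplies the
 * synchronization that MPI_Sendrecv() provided implicitly. */
MPI_Request req[2];
MPI_Irecv(recvbuf, count, MPI_CHAR, src,  0, MPI_COMM_WORLD, &req[0]);
MPI_Isend(sendbuf, count, MPI_CHAR, dest, 0, MPI_COMM_WORLD, &req[1]);
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
```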
What to turn in
To your git project repo, commit your final working code for the above exercises and a concise write-up that includes all plots and detailed responses to the questions posed about your results.