Homework 11, 11/19 in-class

OpenMP SIMD Directives

Start from your matrix-matrix multiplication code from previous assignments. Compile the code with flags that produce optimization reports. For the GNU compilers, these flags are -fopt-info and -fopt-info-all, the latter being far more verbose. Compile your code at varying levels of optimization (-O0, -O2, -O3) and study how the reported optimizations change from one level to the next. Is the innermost loop of the matrix multiply vectorized automatically by the compiler? Another handy GNU flag is -fopt-info-missed, which tells you which optimizations the compiler did not perform and why. To check specifically for loop vectorization, try -fopt-info-vec and -fopt-info-vec-missed; the older -ftree-vectorizer-verbose=2 flag is deprecated in recent GCC releases.
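As a point of reference for the investigation above, here is a minimal sketch of a baseline kernel. The function name matmul_ijk, the file name matmul.c, and the flat row-major layout are assumptions made for illustration; your own code from earlier assignments may look different.

```c
#include <stddef.h>

/* Baseline C = C + A*B for square n x n matrices stored row-major in flat
 * arrays. Name and layout are assumptions for this sketch.
 *
 * Example GCC invocations for the optimization reports
 * (matmul.c is a hypothetical file name):
 *   gcc -O0 -fopt-info-all        -c matmul.c
 *   gcc -O2 -fopt-info            -c matmul.c
 *   gcc -O3 -fopt-info-vec-missed -c matmul.c
 */
void matmul_ijk(size_t n, const double *restrict A,
                const double *restrict B, double *restrict C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = C[i * n + j];
            /* Innermost loop: a sum reduction over k, with a stride-n
             * access into B. This is the loop to look for in the report. */
            for (size_t k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```

Note that the inner loop here is a floating-point reduction, so a compiler that preserves strict floating-point ordering may report that it could not vectorize it; the missed-optimization report is the place to confirm this for your build.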
Now add OpenMP SIMD directives to the inner loop and repeat the investigation from above. Be sure to include the -fopenmp compiler flag when using the GNU compilers. Think carefully about how to use the SIMD directives correctly. Are any clauses needed? Do you need to rewrite the loop somehow to ensure correct vectorization?
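The two questions above (clauses and loop rewriting) can be illustrated with a hedged sketch of two possible variants. Neither is presented as the expected solution; the function names and the flat row-major layout are assumptions. The first keeps the i-j-k order and marks the accumulator with a reduction clause, while the second reorders the loops so the inner loop has independent, unit-stride iterations.

```c
#include <stddef.h>

/* Variant 1 (assumed name): i-j-k order. The inner loop accumulates into a
 * scalar, so the reduction clause tells the compiler that sum may be
 * combined across SIMD lanes. */
void matmul_simd_reduction(size_t n, const double *A,
                           const double *B, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            #pragma omp simd reduction(+:sum)
            for (size_t k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] += sum;
        }
    }
}

/* Variant 2 (assumed name): i-k-j order. Each inner iteration updates a
 * different element of C, and both B and C are read with unit stride,
 * so no reduction clause is needed. restrict asserts that the arrays
 * do not overlap. */
void matmul_simd_ikj(size_t n, const double *restrict A,
                     const double *restrict B, double *restrict C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            #pragma omp simd
            for (size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
}
```

With a SIMD reduction the partial sums are combined in a different order than in the scalar loop, so results on general floating-point data may differ from the serial version in the last few bits.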
Now try to break vectorization. Place the body of the innermost loop (essentially just the C += A*B part) in a separate function so that the compiler chooses not to vectorize the innermost loop. Review the optimization reports to verify that the compiler is not vectorizing that loop, and check that the performance of the code is indeed reduced.
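One caveat with this step: an optimizing compiler will often inline a small function defined in the same source file, in which case the loop may still vectorize. The sketch below assumes a GCC/Clang-style noinline attribute to keep the call opaque; moving the function to a separate source file is another option. The names madd_noinline and matmul_call are hypothetical.

```c
#include <stddef.h>

/* The loop body moved into its own function. The noinline attribute is a
 * GCC/Clang extension, used here (as an assumption) to stop the compiler
 * from inlining the call and vectorizing the loop anyway. */
__attribute__((noinline))
static void madd_noinline(double *c, double a, double b)
{
    *c += a * b;
}

/* i-k-j matrix multiply whose innermost loop now contains an opaque call. */
void matmul_call(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            for (size_t j = 0; j < n; ++j) {
                madd_noinline(&C[i * n + j], A[i * n + k], B[k * n + j]);
            }
        }
    }
}
```

The results are unchanged; only the generated code differs, which is what the optimization report and your timings should show.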
Use the omp declare simd directive to make this function vectorizable. Again, use the optimization reports to check this, and verify that you see an increase in performance.
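A minimal sketch of how the declare simd directive might be applied is shown below. The function names are hypothetical, and the uniform and notinbranch clauses are one reasonable choice for this particular loop order, not the only one.

```c
#include <stddef.h>

/* Ask the compiler to also generate SIMD versions of this function.
 * uniform(a): a has the same value in every SIMD lane of a call.
 * notinbranch: the function is never called from inside a conditional
 * within the SIMD loop. If the definition lives in another source file,
 * the same pragma must also precede the declaration seen by the caller. */
#pragma omp declare simd uniform(a) notinbranch
double madd_simd(double c, double a, double b)
{
    return c + a * b;
}

/* i-k-j matrix multiply; the simd directive on the inner loop lets the
 * compiler call a vector version of madd_simd. restrict asserts that
 * the arrays do not overlap. */
void matmul_declare_simd(size_t n, const double *restrict A,
                         const double *restrict B, double *restrict C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            #pragma omp simd
            for (size_t j = 0; j < n; ++j) {
                C[i * n + j] = madd_simd(C[i * n + j], a, B[k * n + j]);
            }
        }
    }
}
```

Both pragmas are ignored unless OpenMP support is enabled at compile time (for example with -fopenmp on GCC), so the code still produces correct results without it, just without the requested vectorization.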

NOTE: GCC may have trouble with vectorizing your loops! Try using the Intel compilers on HPCC and compare your output to GCC.

What to turn in

To your git repo, in the hw11 directory, commit your working code for the above exercise. Your code is due on 12/1.

CMSE 822: Parallel Computing

Michigan State University - Computational Mathematics, Science, and Engineering 822: Parallel Computing
