The current loop runs j from 0 to 31 and contains an if statement that
performs one assignment for j < 16 and a different assignment for j >= 16.
Handling the j < 16 and the j >= 16 cases in the same iteration eliminates
the if (j < 16) test and halves the number of loop iterations. The
combined loop is then unrolled to a depth of 2.
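
The transformation can be sketched roughly as follows. This is only an
illustration under assumptions: the function names, the src/lo/hi arrays and
the mask and shift operations are hypothetical stand-ins, not the actual
code touched by this commit. Only the loop shape (32 iterations with an
if (j < 16) test, rewritten as 8 branch-free iterations) follows the
description above.

```cpp
#include <cstdint>

// Hypothetical "before" shape: 32 iterations with a branch on j.
void split_branchy(const std::uint8_t* src, std::uint8_t* lo, std::uint8_t* hi) {
    for (int j = 0; j < 32; j++) {
        if (j < 16) {
            lo[j] = src[j] & 15;        // assignment for j < 16 (stand-in)
        } else {
            hi[j - 16] = src[j] >> 4;   // assignment for j >= 16 (stand-in)
        }
    }
}

// Hypothetical "after" shape: each iteration does the j < 16 work and the
// matching j >= 16 work side by side, then the body is unrolled by 2.
// 32 branchy iterations become 8 iterations with no if on j.
void split_unrolled(const std::uint8_t* src, std::uint8_t* lo, std::uint8_t* hi) {
    for (int j = 0; j < 16; j += 2) {
        lo[j] = src[j] & 15;            // was iteration j
        hi[j] = src[j + 16] >> 4;       // was iteration j + 16
        lo[j + 1] = src[j + 1] & 15;    // was iteration j + 1
        hi[j + 1] = src[j + 17] >> 4;   // was iteration j + 17
    }
}
```

In this sketch the four statements in the unrolled body are independent of
one another, which is the property that lets the processor issue them
without waiting on a branch or on an earlier result.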
This change results in approximately a 55% reduction in the execution time
for the bench_ivf_fastscan.py workload on Power 10 when compiled with
CMAKE_INSTALL_CONFIG_NAME=Release.
Removing the if (j < 16) statement and unrolling the loop removes
branch stall cycles and register dependencies that delay instruction issue.
As a result, the unrolled code is able to issue instructions earlier,
reducing the total number of cycles required to execute the function.