
Commit b2e91f6

Carl Love authored and facebook-github-bot committed
Unroll loop in lookup_2_lanes (#3364)
Summary: The current loop runs j from 0 to 31 and contains an if statement that performs one assignment for j < 16 and a different assignment for j >= 16. Unrolling the loop so that the j < 16 and j >= 16 iterations execute in parallel eliminates the if (j < 16) test and halves the number of loop iterations. The j < 16 and j >= 16 work is then unrolled to a depth of 2.

This change reduces the execution time of the bench_ivf_fastscan.py workload by approximately 55% on Power 10 when compiled with CMAKE_INSTALL_CONFIG_NAME=Release. Removing the if (j < 16) statement and unrolling the loop eliminate branch stall cycles and the register dependencies that hold up instruction issue. As a result, the unrolled code is able to issue instructions earlier, reducing the total number of cycles required to execute the function.

Pull Request resolved: #3364
Reviewed By: kuarora
Differential Revision: D56455690
Pulled By: mdouze
fbshipit-source-id: 490a17a40d9d4439b1a8ea22e991e706d68fb2fa
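The loop transformation described in the summary can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the code from this commit: it assumes `lookup_2_lanes` has pshufb-style semantics (an index byte with bit 7 set yields 0; otherwise its low 4 bits select a byte from the same 16-byte half of the table as the output byte), and the `Bytes32` alias and the `lookup_2_lanes_branchy`, `lookup_one` and `lookup_2_lanes_unrolled` names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 32-byte SIMD register made of two 16-byte lanes.
// Illustrative model only; this is not faiss's simd32uint8 type.
using Bytes32 = std::array<uint8_t, 32>;

// Original shape: one loop over all 32 output bytes, with a lane test
// (j < 16) on every iteration. Assumed pshufb-style semantics: if bit 7 of
// idx[j] is set the output is 0, otherwise the low 4 bits select a byte from
// the same 16-byte lane of `table`.
Bytes32 lookup_2_lanes_branchy(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 32; j++) {
        if (idx[j] & 0x80) {
            c[j] = 0;
        } else {
            uint8_t i = idx[j] & 15;
            if (j < 16) {
                c[j] = table[i];
            } else {
                c[j] = table[16 + i];
            }
        }
    }
    return c;
}

// One output byte; `base` is 0 for the low lane and 16 for the high lane.
// The bit-7 test remains; only the j < 16 test is gone.
inline uint8_t lookup_one(const Bytes32& table, uint8_t k, int base) {
    return (k & 0x80) ? 0 : table[base + (k & 15)];
}

// Unrolled shape: iterations j and j + 16 are done together, so the lane is
// known statically and the trip count halves to 16; a further unroll by 2
// also handles j + 1 and j + 17, leaving 8 iterations of 4 independent stores.
Bytes32 lookup_2_lanes_unrolled(const Bytes32& table, const Bytes32& idx) {
    Bytes32 c{};
    for (int j = 0; j < 16; j += 2) {
        c[j] = lookup_one(table, idx[j], 0);
        c[j + 1] = lookup_one(table, idx[j + 1], 0);
        c[j + 16] = lookup_one(table, idx[j + 16], 16);
        c[j + 17] = lookup_one(table, idx[j + 17], 16);
    }
    return c;
}
```

Because each output byte depends only on idx[j] and the lane it belongs to, the rewrite changes the iteration structure but not the results; comparing both functions on a table and an index vector that includes bit-7-set entries is a quick equivalence check. The four stores per iteration have no dependencies on each other, which is the property the summary credits for earlier instruction issue.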
1 parent 5893ab7 commit b2e91f6


2 files changed: +1088 -0 lines


faiss/utils/simdlib.h (+4)

@@ -27,6 +27,10 @@
 
 #include <faiss/utils/simdlib_neon.h>
 
+#elif defined(__PPC64__)
+
+#include <faiss/utils/simdlib_ppc64.h>
+
 #else
 
 // emulated = all operations are implemented as scalars
