Commit c44011f

alexanderguzhva authored and abhinavdangeti committed
AVX512 for PQFastScan (facebookresearch#3276)
Summary: AVX-512 implementation of PQFastScan for QBS. In local benchmarks on a 4th-gen Xeon, QPS is up to 10% higher, mostly in the single-query case. As far as I remember, production workloads showed larger improvements.

* Baseline `benchs/bench_ivf_fastscan_single_query.py` (sift1M): https://gist.github.com/alexanderguzhva/c9cde2cb5e9c7675f429623e6faa9fbf
* Candidate `benchs/bench_ivf_fastscan_single_query.py` (sift1M): https://gist.github.com/alexanderguzhva/4e8530073a108f73771d38e55bc45b17
* Baseline `benchs/bench_ivf_fastscan.py` (sift1M): https://gist.github.com/alexanderguzhva/9eb03ed60354d7e76cfa25e676f983ac
* Candidate `benchs/bench_ivf_fastscan.py` (sift1M): https://gist.github.com/alexanderguzhva/3cbfeba1364dd445a2bb52455966979e

mdouze, should I modify `pq4_fast_scan_search_1.cpp` as well? It is somewhat cumbersome to dig through the various possible sub-implementations.

Pull Request resolved: facebookresearch#3276
Reviewed By: junjieqi
Differential Revision: D54943632
Pulled By: mdouze
fbshipit-source-id: 3d70066e9779039559b1734c2be99bf439058246
1 parent 0d4aedf

5 files changed (+784, -2 lines)

faiss/impl/LookupTableScaler.h

+34
@@ -38,6 +38,23 @@ struct DummyScaler {
         return simd16uint16(0);
     }
 
+#ifdef __AVX512F__
+    inline simd64uint8 lookup(const simd64uint8&, const simd64uint8&) const {
+        FAISS_THROW_MSG("DummyScaler::lookup should not be called.");
+        return simd64uint8(0);
+    }
+
+    inline simd32uint16 scale_lo(const simd64uint8&) const {
+        FAISS_THROW_MSG("DummyScaler::scale_lo should not be called.");
+        return simd32uint16(0);
+    }
+
+    inline simd32uint16 scale_hi(const simd64uint8&) const {
+        FAISS_THROW_MSG("DummyScaler::scale_hi should not be called.");
+        return simd32uint16(0);
+    }
+#endif
+
     template <class dist_t>
     inline dist_t scale_one(const dist_t&) const {
         FAISS_THROW_MSG("DummyScaler::scale_one should not be called.");
@@ -67,6 +84,23 @@ struct NormTableScaler {
         return (simd16uint16(res) >> 8) * scale_simd;
     }
 
+#ifdef __AVX512F__
+    inline simd64uint8 lookup(const simd64uint8& lut, const simd64uint8& c)
+            const {
+        return lut.lookup_4_lanes(c);
+    }
+
+    inline simd32uint16 scale_lo(const simd64uint8& res) const {
+        auto scale_simd_wide = simd32uint16(scale_simd, scale_simd);
+        return simd32uint16(res) * scale_simd_wide;
+    }
+
+    inline simd32uint16 scale_hi(const simd64uint8& res) const {
+        auto scale_simd_wide = simd32uint16(scale_simd, scale_simd);
+        return (simd32uint16(res) >> 8) * scale_simd_wide;
+    }
+#endif
+
     // for non-SIMD implem 2, 3, 4
     template <class dist_t>
     inline dist_t scale_one(const dist_t& x) const {