
Commit fa1f39e

mdouze authored and facebook-github-bot committed
Fix HNSW stats (facebookresearch#3309)
Summary: Pull Request resolved: facebookresearch#3309

Make sure that the HNSW search stats work; remove stats for deprecated functionality. Remove the code of the link and code paper, which is not supported anymore.

Reviewed By: kuarora, junjieqi
Differential Revision: D55247802
fbshipit-source-id: 03f176be092bff6b2db359cc956905d8646ea702
1 parent b77061f commit fa1f39e

File tree

9 files changed: +27 -947 lines changed


benchs/link_and_code/README.md (+2 -135)
@@ -21,138 +21,5 @@ graph to improve the reconstruction. It is described in
arXiv [here](https://arxiv.org/abs/1804.09996)

Code structure
--------------

The test runs with 3 files:

- `bench_link_and_code.py`: driver script

- `datasets.py`: code to load the datasets. The example code runs on the
  deep1b and bigann datasets. See the [toplevel README](../README.md)
  on how to download them. They should be put in a directory; edit
  `datasets.py` to set the path.

- `neighbor_codec.py`: this is where the representation is trained.

The code runs on top of Faiss. The HNSW index can be extended with a
`ReconstructFromNeighbors` C++ object that refines the distances. The
training is implemented in Python.

Update 2023-12-28: the current Faiss has dropped support for reconstruction
with this method.
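The refinement idea can be sketched in plain NumPy. This is an illustrative toy under assumed names (`coarse_quantize` stands in for PQ decoding), not the removed `ReconstructFromNeighbors` implementation: a vector is re-approximated as a least-squares combination (beta) of its own code and its neighbors' codes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

def coarse_quantize(x, step=0.5):
    # stand-in for PQ decoding: heavy scalar quantization
    return np.round(x / step) * step

x = rng.standard_normal(d)                         # database vector
neighbors = x + 0.2 * rng.standard_normal((5, d))  # its graph neighbors

# rows: the vector's own code plus its neighbors' codes
B = np.vstack([coarse_quantize(x)] + [coarse_quantize(n) for n in neighbors])

# fit the combination weights beta by least squares (training phase;
# the real method quantizes beta with a small codebook)
beta, *_ = np.linalg.lstsq(B.T, x, rcond=None)
x_refined = beta @ B

plain_err = np.linalg.norm(x - coarse_quantize(x))
refined_err = np.linalg.norm(x - x_refined)
assert refined_err <= plain_err + 1e-9
```

Since the least-squares fit can always fall back to beta = (1, 0, ..., 0), the refined error is never worse than the error of the vector's own code.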
Reproducing Table 2 in the paper
--------------------------------

The results of table 2 (accuracy on deep100M) in the paper can be
obtained with:

```bash
python bench_link_and_code.py \
    --db deep100M \
    --M0 6 \
    --indexkey OPQ36_144,HNSW32_PQ36 \
    --indexfile $bdir/deep100M_PQ36_L6.index \
    --beta_nsq 4 \
    --beta_centroids $bdir/deep100M_PQ36_L6_nsq4.npy \
    --neigh_recons_codes $bdir/deep100M_PQ36_L6_nsq4_codes.npy \
    --k_reorder 0,5 --efSearch 1,1024
```

Set `bdir` to a scratch directory.
Explanation of the flags:

- `--db deep1M`: dataset to process

- `--M0 6`: number of links on the base level (L6)

- `--indexkey OPQ36_144,HNSW32_PQ36`: Faiss index key to construct the
  HNSW structure. It means that vectors are transformed by OPQ and
  encoded with PQ 36x8 (with an intermediate size of 144D). The HNSW
  level>0 nodes have 32 links (these are "cheap" to store
  because there are fewer nodes in the upper levels).

- `--indexfile $bdir/deep1M_PQ36_M6.index`: name of the index file
  (without information for the L&C extension)

- `--beta_nsq 4`: number of bytes to allocate for the codes (M in the
  paper)

- `--beta_centroids $bdir/deep1M_PQ36_M6_nsq4.npy`: filename to store
  the trained beta centroids

- `--neigh_recons_codes $bdir/deep1M_PQ36_M6_nsq4_codes.npy`: filename
  for the encoded weights (beta) of the combination

- `--k_reorder 0,5`: number of results to reorder. 0 = baseline
  without reordering, 5 = value used throughout the paper

- `--efSearch 1,1024`: number of nodes to visit (T in the paper)
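To make the PQ 36x8 part of the index key concrete, here is a toy product quantizer in NumPy. It uses random codebooks instead of the k-means-trained ones Faiss would produce, so it is an illustration, not the library's implementation: a 144-D vector is split into 36 sub-vectors of 4 dimensions, and each sub-vector is encoded as one byte.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, ksub = 144, 36, 256      # 36 sub-quantizers, 8 bits (256 centroids) each
dsub = d // M                  # 4 dimensions per sub-vector

# toy codebooks; Faiss trains these with k-means per subspace
codebooks = rng.standard_normal((M, ksub, dsub))

def pq_encode(x):
    # nearest centroid in each of the 36 subspaces -> 36 bytes per vector
    subs = x.reshape(M, dsub)
    return np.array([
        np.argmin(((codebooks[m] - subs[m]) ** 2).sum(axis=1))
        for m in range(M)
    ], dtype=np.uint8)

def pq_decode(codes):
    # concatenate the selected centroids back into a 144-D approximation
    return np.concatenate([codebooks[m][codes[m]] for m in range(M)])

x = rng.standard_normal(d)
codes = pq_encode(x)
assert codes.nbytes == 36      # the whole vector is stored in 36 bytes
x_hat = pq_decode(codes)
assert x_hat.shape == (144,)
```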
The script will proceed with the following steps:

0. load dataset (and possibly compute the ground-truth if the
   ground-truth file is not provided)

1. train the OPQ encoder

2. build the index and store it

3. compute the residuals and train the beta vocabulary to do the reconstruction

4. encode the vertices

5. search and evaluate the search results.

With option `--exhaustive` the results of the exhaustive column can be
obtained.

The run above should output:

```bash
...
setting k_reorder=5
...
efSearch=1024 0.3132 ms per query, R@1: 0.4283 R@10: 0.6337 R@100: 0.6520 ndis 40941919 nreorder 50000
```

which matches the paper's table 2.

Note that in multi-threaded mode, the building of the HNSW structure
is not deterministic. Therefore, the results across runs may not be exactly
the same.
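The reordering counted by `nreorder` can be sketched generically. This is a hedged illustration with assumed names, not the removed Faiss code: the `k_reorder` best candidates by coarse (PQ) distance are re-ranked using refined distances, while the tail is left untouched.

```python
import numpy as np

def rerank(query, candidates, reconstruct, k_reorder):
    """Re-rank the k_reorder best candidates using refined distances.

    candidates: candidate ids sorted by coarse (PQ) distance
    reconstruct: callable id -> refined reconstruction of that vector
    """
    if k_reorder == 0:
        return candidates                      # baseline: no reordering
    head = candidates[:k_reorder]
    refined = [np.linalg.norm(query - reconstruct(i)) for i in head]
    order = np.argsort(refined)
    return np.concatenate([head[order], candidates[k_reorder:]])

# toy usage: id i maps to vector [i, 0]; the query sits near id 3
db = {i: np.array([float(i), 0.0]) for i in range(10)}
q = np.array([3.2, 0.0])
cands = np.array([5, 3, 4, 9])     # coarse order, slightly wrong
out = rerank(q, cands, lambda i: db[i], k_reorder=3)
assert list(out[:3]) == [3, 4, 5]  # refined distances fix the head
```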
Reproducing Figure 5 in the paper
---------------------------------

Figure 5 just evaluates the combination of HNSW and PQ. For example,
the operating point L6&OPQ40 can be obtained with

```bash
python bench_link_and_code.py \
    --db deep1M \
    --M0 6 \
    --indexkey OPQ40_160,HNSW32_PQ40 \
    --indexfile $bdir/deep1M_PQ40_M6.index \
    --beta_nsq 1 --beta_k 1 \
    --beta_centroids $bdir/deep1M_PQ40_M6_nsq0.npy \
    --neigh_recons_codes $bdir/deep1M_PQ36_M6_nsq0_codes.npy \
    --k_reorder 0 --efSearch 16,64,256,1024
```

The arguments are similar to the previous table. Note that nsq = 0 is
simulated by setting beta_nsq = 1 and beta_k = 1 (i.e. a code with a single
reproduction value).
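The beta_k = 1 trick can be illustrated with a toy quantizer (an assumption-laden sketch, not the benchmark code): a codebook with a single entry reproduces every input by the same value, so the code carries no information, which is equivalent to nsq = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(1000)

# beta_k = 1: the codebook has one entry; k-means degenerates to the mean
codebook = np.array([values.mean()])

codes = np.zeros(len(values), dtype=np.uint8)  # every code is 0
decoded = codebook[codes]

# all reproductions are identical, so storing the codes conveys nothing
assert np.all(decoded == decoded[0])
```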
The output should look like:

```bash
setting k_reorder=0
efSearch=16 0.0147 ms per query, R@1: 0.3409 R@10: 0.4388 R@100: 0.4394 ndis 2629735 nreorder 0
efSearch=64 0.0122 ms per query, R@1: 0.4836 R@10: 0.6490 R@100: 0.6509 ndis 4623221 nreorder 0
efSearch=256 0.0344 ms per query, R@1: 0.5730 R@10: 0.7915 R@100: 0.7951 ndis 11090176 nreorder 0
efSearch=1024 0.2656 ms per query, R@1: 0.6212 R@10: 0.8722 R@100: 0.8765 ndis 33501951 nreorder 0
```

The results with k_reorder=5 are not reported in the paper; they
represent the performance of a "free coding" version of the algorithm.
The necessary code for this paper was removed from Faiss in version 1.8.0.
For a functioning version, use Faiss 1.7.4.
