@@ -21,138 +21,5 @@ graph to improve the reconstruction. It is described in
arXiv [here](https://arxiv.org/abs/1804.09996)
- Code structure
- --------------
-
- The test runs with 3 files:
-
- - `bench_link_and_code.py`: driver script
-
- - `datasets.py`: code to load the datasets. The example code runs on the
-   deep1b and bigann datasets. See the [toplevel README](../README.md)
-   on how to download them. They should be put in a directory; edit
-   `datasets.py` to set the path.
-
- - `neighbor_codec.py`: this is where the representation is trained.
-
- The code runs on top of Faiss. The HNSW index can be extended with a
- `ReconstructFromNeighbors` C++ object that refines the distances. The
- training is implemented in Python.
-
- Update (2023-12-28): current Faiss has dropped support for reconstruction
- with this method.
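The refinement idea behind `ReconstructFromNeighbors` can be sketched in a few lines of plain Python. This is only an illustration of the paper's weighted-combination step, not the Faiss API; the function name and toy values below are assumptions:

```python
# Sketch of the link-and-code refinement idea: a vertex's compressed
# reconstruction is improved by a learned weighted combination of its
# own quantized reconstruction and those of its graph neighbors.
def refine(own, neighbors, betas):
    """betas[0] weighs the vertex's own reconstruction,
    betas[1:] weigh its graph neighbors (one weight each)."""
    assert len(betas) == 1 + len(neighbors)
    out = [betas[0] * x for x in own]
    for beta, nb in zip(betas[1:], neighbors):
        for d, x in enumerate(nb):
            out[d] += beta * x
    return out

# Toy example: 2-D reconstructions, two neighbors.
own = [1.0, 0.0]
neighbors = [[0.0, 1.0], [1.0, 1.0]]
print(refine(own, neighbors, [0.5, 0.25, 0.25]))  # [0.75, 0.5]
```

In the actual method the beta vectors are themselves quantized (the `--beta_nsq` / `--beta_centroids` flags below), which is what `neighbor_codec.py` trains.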
-
- Reproducing Table 2 in the paper
- --------------------------------
-
- The results of table 2 (accuracy on deep100M) in the paper can be
- obtained with:
-
- ```bash
- python bench_link_and_code.py \
-     --db deep100M \
-     --M0 6 \
-     --indexkey OPQ36_144,HNSW32_PQ36 \
-     --indexfile $bdir/deep100M_PQ36_L6.index \
-     --beta_nsq 4 \
-     --beta_centroids $bdir/deep100M_PQ36_L6_nsq4.npy \
-     --neigh_recons_codes $bdir/deep100M_PQ36_L6_nsq4_codes.npy \
-     --k_reorder 0,5 --efSearch 1,1024
- ```
-
- Set `bdir` to a scratch directory.
-
- Explanation of the flags:
-
- - `--db deep100M`: dataset to process
-
- - `--M0 6`: number of links on the base level (L6)
-
- - `--indexkey OPQ36_144,HNSW32_PQ36`: Faiss index key used to construct the
-   HNSW structure. It means that vectors are transformed by OPQ and
-   encoded with PQ 36x8 (with an intermediate size of 144D). The HNSW
-   level>0 nodes have 32 links (these are "cheap" to store because there
-   are fewer nodes in the upper levels).
-
- - `--indexfile $bdir/deep100M_PQ36_L6.index`: name of the index file
-   (without information for the L&C extension)
-
- - `--beta_nsq 4`: number of bytes to allocate for the codes (M in the
-   paper)
-
- - `--beta_centroids $bdir/deep100M_PQ36_L6_nsq4.npy`: filename to store
-   the trained beta centroids
-
- - `--neigh_recons_codes $bdir/deep100M_PQ36_L6_nsq4_codes.npy`: filename
-   for the encoded weights (beta) of the combination
-
- - `--k_reorder 0,5`: number of results to reorder. 0 = baseline
-   without reordering, 5 = value used throughout the paper
-
- - `--efSearch 1,1024`: number of nodes to visit (T in the paper)
-
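For intuition about `--k_reorder`, here is a minimal self-contained sketch of the reordering step (illustrative only; the `reorder` function and the toy distances are assumptions, not the benchmark's code). The search returns candidates ranked by compressed-domain distance, and the top `k_reorder` of them are re-ranked with the more accurate refined distances:

```python
# Minimal sketch of result reordering: re-rank the top k_reorder hits
# with a more accurate (refined) distance, leave the tail untouched.
def reorder(candidates, refined_dist, k_reorder):
    """candidates: list of (id, coarse_distance), sorted by distance.
    refined_dist: callable mapping an id to its refined distance."""
    if k_reorder == 0:  # baseline: keep the coarse ranking
        return list(candidates)
    head = sorted(((i, refined_dist(i)) for i, _ in candidates[:k_reorder]),
                  key=lambda pair: pair[1])
    return head + candidates[k_reorder:]

coarse = [(3, 0.9), (7, 1.0), (1, 1.1)]   # ids ranked by coarse distance
refined = {3: 1.05, 7: 0.80, 1: 1.20}     # refined distances per id
print(reorder(coarse, refined.get, 2))  # [(7, 0.8), (3, 1.05), (1, 1.1)]
```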
- The script will proceed with the following steps:
-
- 0. load dataset (and possibly compute the ground-truth if the
-    ground-truth file is not provided)
-
- 1. train the OPQ encoder
-
- 2. build the index and store it
-
- 3. compute the residuals and train the beta vocabulary to do the reconstruction
-
- 4. encode the vertices
-
- 5. search and evaluate the search results.
-
- With option `--exhaustive` the results of the exhaustive column can be
- obtained.
-
- The run above should output:
-
- ```bash
- ...
- setting k_reorder=5
- ...
- efSearch=1024 0.3132 ms per query, R@1: 0.4283 R@10: 0.6337 R@100: 0.6520 ndis 40941919 nreorder 50000
- ```
-
- which matches the paper's Table 2.
-
- Note that in multi-threaded mode, the building of the HNSW structure
- is not deterministic. Therefore, the results may differ slightly across runs.
-
- Reproducing Figure 5 in the paper
- ---------------------------------
-
- Figure 5 just evaluates the combination of HNSW and PQ. For example,
- the operating point L6&OPQ40 can be obtained with:
-
- ```bash
- python bench_link_and_code.py \
-     --db deep1M \
-     --M0 6 \
-     --indexkey OPQ40_160,HNSW32_PQ40 \
-     --indexfile $bdir/deep1M_PQ40_M6.index \
-     --beta_nsq 1 --beta_k 1 \
-     --beta_centroids $bdir/deep1M_PQ40_M6_nsq0.npy \
-     --neigh_recons_codes $bdir/deep1M_PQ40_M6_nsq0_codes.npy \
-     --k_reorder 0 --efSearch 16,64,256,1024
- ```
-
- The arguments are similar to the previous table. Note that nsq = 0 is
- simulated by setting beta_nsq = 1 and beta_k = 1 (i.e. a code with a single
- reproduction value).
-
- The output should look like:
-
- ```bash
- setting k_reorder=0
- efSearch=16 0.0147 ms per query, R@1: 0.3409 R@10: 0.4388 R@100: 0.4394 ndis 2629735 nreorder 0
- efSearch=64 0.0122 ms per query, R@1: 0.4836 R@10: 0.6490 R@100: 0.6509 ndis 4623221 nreorder 0
- efSearch=256 0.0344 ms per query, R@1: 0.5730 R@10: 0.7915 R@100: 0.7951 ndis 11090176 nreorder 0
- efSearch=1024 0.2656 ms per query, R@1: 0.6212 R@10: 0.8722 R@100: 0.8765 ndis 33501951 nreorder 0
- ```
-
- The results with k_reorder=5 are not reported in the paper; they
- represent the performance of a "free coding" version of the algorithm.
+ The code necessary for this paper was removed from Faiss in version 1.8.0.
+ For a functioning version, use Faiss 1.7.4.
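A version guard can make this dependency explicit in scripts that still rely on the feature. A minimal sketch using only the standard library; the `faiss-cpu`/`faiss-gpu` distribution names are assumptions, and conda installs may register Faiss under a different name:

```python
# Check whether an installed Faiss predates 1.8.0 (the release that
# removed ReconstructFromNeighbors support). Returns None when Faiss
# cannot be found under the assumed distribution names.
from importlib.metadata import PackageNotFoundError, version

def faiss_supports_lc(dist_names=("faiss-cpu", "faiss-gpu", "faiss")):
    for name in dist_names:
        try:
            major, minor, *_ = (int(p) for p in version(name).split("."))
        except (PackageNotFoundError, ValueError):
            continue  # not installed, or a non-numeric version string
        return (major, minor) < (1, 8)  # 1.7.4 -> True, 1.8.0 -> False
    return None

print(faiss_supports_lc())
```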