There are two broad categories of ANN index:
Graph-based indexes tend to be simpler to implement and faster, but more importantly they can be constructed and updated incrementally. This makes them a much better fit for a general-purpose index than partitioning approaches that only work on static datasets that are completely specified up front. That is why all the major commercial vector indexes use graph approaches.
JVector is a graph index that merges the DiskANN and HNSW family trees.
JVector borrows the hierarchical structure from HNSW, and uses Vamana (the algorithm behind DiskANN) within each layer.
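The layer-descent idea can be sketched in a few lines. The following is an illustrative, self-contained toy (class, method, and data names are all invented, not JVector's API): greedy search walks each upper layer to a local minimum, then drops down a layer and repeats until it reaches the bottom.

```java
import java.util.*;

// Toy HNSW-style layer descent: greedily move to the closest neighbor at
// each layer, then drop down. Illustrative only; not JVector's API.
public class LayerDescent {
    // layers.get(L) maps node -> neighbors at layer L; layer 0 is the bottom.
    static int greedySearch(List<Map<Integer, int[]>> layers, double[][] vectors,
                            double[] query, int entryPoint) {
        int current = entryPoint;
        for (int level = layers.size() - 1; level >= 0; level--) {
            boolean improved = true;
            while (improved) {
                improved = false;
                for (int nbr : layers.get(level).getOrDefault(current, new int[0])) {
                    if (dist(vectors[nbr], query) < dist(vectors[current], query)) {
                        current = nbr;
                        improved = true;
                    }
                }
            }
        }
        return current;
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    static int demo() {
        double[][] vectors = { {0, 0}, {1, 0}, {2, 0}, {3, 0} };
        // Layer 1 links 0 <-> 2 (a long-range shortcut); layer 0 is the chain 0-1-2-3.
        Map<Integer, int[]> layer1 = Map.of(0, new int[]{2}, 2, new int[]{0});
        Map<Integer, int[]> layer0 = Map.of(0, new int[]{1}, 1, new int[]{0, 2},
                                            2, new int[]{1, 3}, 3, new int[]{2});
        return greedySearch(List.of(layer0, layer1), vectors, new double[]{2.9, 0}, 0);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // → 3, the node nearest to (2.9, 0)
    }
}
```

The upper-layer shortcut lets the search skip directly from node 0 to node 2 before fine-grained navigation at the bottom layer.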
## JVector Architecture
JVector is a graph-based index that builds on the HNSW and DiskANN designs with composable extensions.
JVector implements a multi-layer graph with nonblocking concurrency control, allowing construction to scale linearly with the number of cores:
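One common way to get nonblocking concurrency for graph construction is copy-on-write neighbor arrays swapped in with compare-and-set; the sketch below is a hypothetical stand-in for that idea, not JVector's actual implementation:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.IntStream;

// Sketch of nonblocking adjacency updates: each node's neighbor list is an
// immutable array swapped in via compareAndSet, so concurrent inserters
// retry instead of blocking. Illustrative only; not JVector's implementation.
public class NonblockingAdjacency {
    final AtomicReference<int[]>[] neighbors;

    @SuppressWarnings("unchecked")
    NonblockingAdjacency(int nodeCount) {
        neighbors = new AtomicReference[nodeCount];
        for (int i = 0; i < nodeCount; i++) neighbors[i] = new AtomicReference<>(new int[0]);
    }

    // Lock-free: retry the copy-on-write update until our CAS wins.
    void addEdge(int from, int to) {
        while (true) {
            int[] old = neighbors[from].get();
            int[] updated = Arrays.copyOf(old, old.length + 1);
            updated[old.length] = to;
            if (neighbors[from].compareAndSet(old, updated)) return;
        }
    }

    static int demo() {
        NonblockingAdjacency g = new NonblockingAdjacency(1);
        // Many threads add edges to node 0 concurrently; no update is lost.
        IntStream.range(0, 1000).parallel().forEach(i -> g.addEdge(0, i));
        return g.neighbors[0].get().length;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // → 1000
    }
}
```

Because losers of a CAS race simply retry against the new array, throughput scales with cores instead of serializing on a lock.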
The upper layers of the hierarchy are represented by an in-memory adjacency list per node. This allows for quick navigation without any disk I/O.
The bottom layer of the graph is represented by an on-disk adjacency list per node. JVector uses additional data stored inline to support two-pass searches, with the first pass powered by lossily compressed representations of the vectors kept in memory, and the second by a more accurate representation read from disk. The first pass can be performed with:
* Product quantization (PQ), optionally with [anisotropic weighting](https://arxiv.org/abs/1908.10396)
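As a toy illustration of the two-pass idea (not JVector's PQ code; a simple scalar quantizer stands in for PQ codebooks, and all names are invented): pass 1 ranks every candidate by a cheap lossy score held in memory, and pass 2 reranks only the top few with exact distances, standing in for the on-disk reads.

```java
import java.util.*;
import java.util.stream.IntStream;

// Two-pass search sketch: lossy scoring first, exact reranking second.
public class TwoPassSearch {
    static double exactDist(double[] a, double[] q) {
        double s = 0;
        for (int i = 0; i < q.length; i++) s += (a[i] - q[i]) * (a[i] - q[i]);
        return s;
    }

    // Lossy score: each coordinate rounded to one decimal, standing in for
    // the compact compressed representation kept in memory.
    static double approxDist(double[] a, double[] q) {
        double s = 0;
        for (int i = 0; i < q.length; i++) {
            double ai = Math.round(a[i] * 10) / 10.0; // quantized coordinate
            s += (ai - q[i]) * (ai - q[i]);
        }
        return s;
    }

    static int search(double[][] vectors, double[] query, int rerankK) {
        // Pass 1: order all candidates by the cheap lossy score.
        Integer[] ids = IntStream.range(0, vectors.length).boxed().toArray(Integer[]::new);
        Arrays.sort(ids, Comparator.comparingDouble(i -> approxDist(vectors[i], query)));
        // Pass 2: rerank only the top rerankK candidates exactly.
        int best = ids[0];
        for (int k = 0; k < Math.min(rerankK, ids.length); k++)
            if (exactDist(vectors[ids[k]], query) < exactDist(vectors[best], query))
                best = ids[k];
        return best;
    }

    public static void main(String[] args) {
        double[][] vectors = { {0.0, 0.0}, {0.52, 0.0}, {0.49, 0.0} };
        // Vectors 1 and 2 tie under the lossy score; exact reranking breaks the tie.
        System.out.println(search(vectors, new double[]{0.5, 0.0}, 2)); // → 2
    }
}
```

The example shows why the second pass matters: the quantizer cannot distinguish 0.49 from 0.52, but the exact rerank can.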
```java
                1.2f, // allow degree overflow during construction by this factor
                1.2f, // relax neighbor diversity requirement by this factor (alpha)
                true)) // use a hierarchical index
{
    // build the index (in memory)
    OnHeapGraphIndex index = builder.build(ravv);
```
Commentary:
* For the overflow Builder parameter, the sweet spot is about 1.2 for in-memory construction and 1.5 for on-disk. (The more overflow is allowed, the fewer recomputations of best edges are required, but the more neighbors will be consulted in every search.)
* The alpha parameter controls the tradeoff between edge distance and diversity; usually 1.2 is sufficient for high-dimensional vectors; 2.0 is recommended for 2D or 3D datasets. See [the DiskANN paper](https://suhasjs.github.io/files/diskann_neurips19.pdf) for more details.
* The Bits parameter to GraphSearcher is intended for controlling your resultset based on external predicates and won’t be used in this tutorial.
* Setting the addHierarchy parameter to true builds a multi-layer index. This approach has proven more robust in highly challenging scenarios.
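The alpha-controlled diversity rule described above can be sketched as follows. This is an illustrative toy of DiskANN-style pruning (names invented, not JVector's code): candidates are scanned in order of distance from the node p, and a candidate c is dropped when some already-kept neighbor n satisfies alpha * dist(n, c) <= dist(p, c), i.e. n already "covers" c's direction. A larger alpha prunes less, keeping a denser graph.

```java
import java.util.*;

// Sketch of alpha-based diversity pruning. Illustrative only.
public class RobustPrune {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    static List<Integer> prune(double[][] v, int p, List<Integer> candidates,
                               double alpha, int maxDegree) {
        // Consider candidates closest-first.
        candidates.sort(Comparator.comparingDouble(c -> dist(v[p], v[c])));
        List<Integer> kept = new ArrayList<>();
        for (int c : candidates) {
            boolean covered = false;
            for (int n : kept)
                if (alpha * dist(v[n], v[c]) <= dist(v[p], v[c])) { covered = true; break; }
            if (!covered) kept.add(c);
            if (kept.size() == maxDegree) break;
        }
        return kept;
    }

    public static void main(String[] args) {
        // Nodes 1 and 2 lie in the same direction from node 0; node 3 does not.
        double[][] v = { {0, 0}, {1, 0}, {2, 0}, {0, 2} };
        // Node 2 is pruned (covered by node 1); node 3 survives as a diverse edge.
        System.out.println(prune(v, 0, new ArrayList<>(List.of(1, 2, 3)), 1.2, 4));
    }
}
```

With 2D data like this, a higher alpha (the 2.0 recommended above) would retain node 2 as well, trading diversity for shorter edges.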
#### Step 2: more control over GraphSearcher
* Embedding models produce output from a consistent distribution of vectors. This means that you can save and re-use ProductQuantization codebooks, even for a different set of vectors, as long as you had a sufficiently large training set to build it the first time around. ProductQuantization.MAX_PQ_TRAINING_SET_SIZE (128,000 vectors) has proven to be sufficiently large.
* JDK ThreadLocal objects cannot be referenced except from the thread that created them. This is a difficult design into which to fit caching of Closeable objects like GraphSearcher. JVector provides the ExplicitThreadLocal class to solve this.
* Fused ADC is only compatible with Product Quantization, not Binary Quantization. This is no great loss since [very few models generate embeddings that are best suited for BQ](https://thenewstack.io/why-vector-size-matters/). That said, BQ continues to be supported with non-Fused indexes.
* JVector heavily utilizes the Panama Vector API (SIMD) for ANN indexing and search. We have seen cases where memory bandwidth is saturated during indexing and product quantization, which can cause the process to slow down. To avoid this, the batch methods for index and PQ builds use a [PhysicalCoreExecutor](https://javadoc.io/doc/io.github.jbellis/jvector/latest/io/github/jbellis/jvector/util/PhysicalCoreExecutor.html) to limit parallelism to the physical core count. The default value is 1/2 the processor count seen by Java. This may not be correct in all setups (e.g. no hyperthreading or hybrid architectures), so if you wish to override the default, use the `-Djvector.physical_core_count` property, or pass in your own ForkJoinPool instance.
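The ExplicitThreadLocal point above can be illustrated with a simplified stand-in (this is not JVector's actual class; the name PerThreadPool and its shape are invented): per-thread values live in a map the owner can iterate, so Closeable resources cached per thread can be released from any thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of the ExplicitThreadLocal idea: thread-keyed values that the
// owner can enumerate and close. Simplified stand-in, not JVector's class.
public class PerThreadPool<T> implements AutoCloseable {
    private final ConcurrentHashMap<Long, T> values = new ConcurrentHashMap<>();
    private final Supplier<T> initial;

    public PerThreadPool(Supplier<T> initial) { this.initial = initial; }

    // Each thread lazily gets (and then reuses) its own instance.
    public T get() {
        return values.computeIfAbsent(Thread.currentThread().getId(), id -> initial.get());
    }

    @Override
    public void close() {
        // Unlike JDK ThreadLocal, every thread's value is reachable here,
        // so Closeable resources can be released from any thread.
        for (T v : values.values())
            if (v instanceof AutoCloseable c) {
                try { c.close(); } catch (Exception ignored) { }
            }
        values.clear();
    }

    static String demo() {
        try (PerThreadPool<StringBuilder> pool = new PerThreadPool<>(StringBuilder::new)) {
            pool.get().append("reused per thread");
            return pool.get().toString(); // same instance on the same thread
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // → reused per thread
    }
}
```

A real cached searcher would follow the same pattern, with the pooled value being a GraphSearcher rather than a StringBuilder.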