floresta-chain: new optimized chainstore #251

Open · wants to merge 1 commit into master from new-chainstore
Conversation

Davidson-Souza (Collaborator)

The current chainstore is based on kv, but it has a few problems:

  • When we flush, we get a huge heap spike
  • We incur a 2-3x storage overhead on headers
  • Retrieving headers during IBD gets slow if we flush early

This commit introduces a bare-bones, ad hoc store that consists of two parts:

  • An open-addressing, file-backed and memory-mapped hash map that keeps the relation block_hash -> block_height
  • A flat file that contains the serialized block headers, in ascending order

To recover a header given the block height, we simply use pointer arithmetic inside the flat file. If we need to go from a block hash, we use the map first, then find the header inside the flat file. This has the advantage of not needing explicit flushes (the OS flushes the pages at fixed intervals), flushes are async (the OS does them), we get caching for free (mmap-ed pages stay in memory if we need them), and our cache can react to system constraints, because the kernel always knows how much memory we still have.
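
A minimal sketch of the height lookup (assuming memmap2 and a fixed 80-byte record; the helper name is hypothetical, not the PR's actual code):

    use memmap2::Mmap;

    // Assumed fixed size of each serialized header record in the flat file
    const HEADER_SIZE: usize = 80;

    // Read the raw bytes of the header at `height` straight out of the
    // mmap-ed flat file: the record lives at offset height * HEADER_SIZE,
    // so no index lookup is needed at all.
    fn header_bytes(headers_file: &Mmap, height: usize) -> Option<&[u8]> {
        let start = height * HEADER_SIZE;
        headers_file.get(start..start + HEADER_SIZE)
    }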

@JoseSK999 (Contributor)

My understanding is that kv buckets also don't need explicit flushes. The data saved in a bucket is flushed to disk by the OS periodically, and it's kept in the inner sled pagecache (so we should have fast access), which is configured with a capacity of 100MB in KvChainStore::new.
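
Roughly, that setup looks like the following sketch (the open_store helper and path are hypothetical, and the builder-style cache_capacity setter mirroring sled's option is an assumption):

    use kv::{Config, Store};

    // Sketch of the setup described above, not floresta's exact code:
    // sled keeps recently-used pages in an in-memory pagecache of the
    // given size and flushes dirty pages to disk on its own schedule.
    fn open_store(datadir: &str) -> Result<Store, kv::Error> {
        let config = Config::new(format!("{datadir}/chaindata")).cache_capacity(100_000_000);
        Store::new(config)
    }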

@Davidson-Souza (Collaborator, Author)

That was my understanding too, but for some reason, even before using the cache (second part of #169), it didn't really flush on its own, and if we had an unclean shutdown we would lose our progress (or a good part of it). It would also become increasingly CPU- and IO-heavy as we made progress; I suspect it's due to the block locator getting bigger as our chain grows.

But the biggest problem for me, and the one I couldn't find an alternative for, was the heap spike. It would always crash on my phone, before or after #169. With this PR, it runs fine!

@JoseSK999 (Contributor) commented Oct 8, 2024

It would also become increasingly more CPU and IO heavy as we made progress, I suspect it's due to the block locator getting bigger as our chain grows.

Isn't that the expected behavior of a node in IBD, as it moves from old empty blocks to more recent ones?

Also was the OS not flushing on its own on desktop or on mobile?

@Davidson-Souza (Collaborator, Author)

Isn't that the expected behavior of a node in IBD, as it moves from old empty blocks to more recent ones?

Not this much, at least not for headers. They are small and have a constant size (a serialized header is only 80 bytes).

Also was the OS not flushing on its own on desktop or on mobile?

Both. At least on my setup.

@Davidson-Souza force-pushed the new-chainstore branch 7 times, most recently from 4f43d68 to 9961e8c on January 2, 2025
@Davidson-Souza changed the title from [WIP] floresta-chain: new optimized chainstore to floresta-chain: new optimized chainstore on Jan 3, 2025
@Davidson-Souza marked this pull request as ready for review on January 3, 2025
@jaoleal (Contributor) left a comment

Nice changes, here's my superficial review... I still haven't finished.

You're right, this needs a lot of testing and review.
It looks like a nice job!

The current chainstore is based on `kv`, but it has a few problems:
  - When we flush, we get a huge heap spike
  - We incur a 2-3x storage overhead on headers
  - Retrieving headers during IBD gets slow if we flush early

This commit introduces a bare-bones, ad hoc store that consists of
three parts:
  - An open-addressing, file-backed and memory-mapped hash map to keep
    the relation block_hash -> block_height
  - A flat file that contains the serialized block headers, in
    ascending order
  - An LRU cache to avoid going through the map every time

To recover a header given the block height, we simply use pointer
arithmetic inside the flat file. If we need to go from a block hash,
we use the map first, then find the header inside the flat file. This
has the advantage of not needing explicit flushes (the OS flushes the
pages at fixed intervals), flushes are async (the OS does them), we
get caching for free (mmap-ed pages stay in memory if we need them)
and our cache can react to system constraints, because the kernel
always knows how much memory we still have.
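
A minimal sketch of that LRU layer, using the lru crate (the key/value types and the capacity are assumptions, not the PR's actual code):

    use lru::LruCache;
    use std::num::NonZeroUsize;

    type BlockHash = [u8; 32];

    fn main() {
        // Map recently-seen block hashes straight to their height; a hit
        // here skips probing the mmap-ed hash map entirely
        let mut cache: LruCache<BlockHash, u32> =
            LruCache::new(NonZeroUsize::new(10_000).unwrap());
        cache.put([0u8; 32], 0);
        assert_eq!(cache.get(&[0u8; 32]), Some(&0));
    }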
@JoseSK999 (Contributor) left a comment

Some initial review, looks good so far! I still have to check flat_chain_store.rs

@@ -45,6 +47,7 @@ hex = "0.4.3"
default = []
bitcoinconsensus = ["bitcoin/bitcoinconsensus", "dep:bitcoinconsensus"]
metrics = ["dep:metrics"]
experimental-db = ["memmap2", "lru"]

Here I would use the dep: prefix to avoid the two implicit features
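
With the prefix, the feature line above would become:

    experimental-db = ["dep:memmap2", "dep:lru"]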

///
/// This is safe because we only access the chainstore through the inner lock, and we don't
/// expose the chainstore to the outside world. We could use a lock for the chainstore, but
/// that would be overkill and would make a big performance hit.

Nit: Maybe saying "Using a separate lock solely for the chainstore would add unnecessary overhead, as we already rely on a single lock to protect all of ChainState’s data." would be clearer.

@JoseSK999 (Contributor) left a comment

Mainly nits, but I think I have found an issue with the update_block_index method impl, as explained below.

@@ -0,0 +1,1075 @@
//! A fast database for the chainstore
//!
//! In it's infancy, floresta-chain used `kv` as it's database, since `kv` is a small and efficient

Nit: s/it's/its (... infancy, ... database)

//! indexed and retrieved, all at once (~800k for mainnet at the time of writing). If we simply
//! keep everything in memory, and then make one big batch, most embedded databases will see a big
//! spike in heap usage. This would be OK for regular desktops, but floresta aims to run in small,
//! lowe-power devices too, so we can't just assume we have two gigs of RAM to spare. We could make

s/lowe-power/lower-power

//!
//! This chainstore was designed to reduce the over-usage of both. We do rely on any extra RAM as
//! kernel buffers, but we also do a decent level of I/O. We get a better performance by using
//! a ad-hock storage that exploits the fact that the data we keep is canonical and monotonically

s/a ad-hock/an ad-hock

//! kernel buffers, but we also do a decent level of I/O. We get a better performance by using
//! a ad-hock storage that exploits the fact that the data we keep is canonical and monotonically
//! increasing, so we keep all headers in a simple flat file, one after the other. So pos(h) = h *
//! size_of(DiskBlockHeader), with a overhead factor of 1. We also need a way to map block hashes

s/a overhead/an overhead

use crate::DiskBlockHeader;

/// The magic number we use to make sure we're reading the right file
const FLAT_CHAINSTORE_MAGIC: u32 = 0x6c_73_74_63; // "flst"

"flst" is from "Flat Store" right? The current bytes actually decode to "lstc" ("flst" would be 66 6C 73 74)

pub enum Entry {
    /// This bucket is empty
    ///
    /// Is this is a search, this means the entry isn't in the map, and this is where it would be

s/Is this is/If this is

Comment on lines +378 to +406
unsafe fn hash_map_find_pos(
    &self,
    block_hash: BlockHash,
    get_block_header_by_height: impl Fn(
        IndexEntry,
    )
        -> Result<DiskBlockHeaderAndHash, FlatChainstoreError>,
) -> Result<Entry, FlatChainstoreError> {
    let mut hash = Self::index_hash_fn(block_hash) as usize;
    loop {
        let entry = self
            .index_map
            .as_ptr()
            .wrapping_add((hash & self.index_size) * size_of::<u32>())
            as *mut IndexEntry;

        let index = (*entry).index();
        let header = get_block_header_by_height(*entry)?;
        if header.hash == block_hash {
            return Ok(Entry::Occupied(entry));
        }

        if index == 0 {
            return Ok(Entry::Empty(entry));
        }

        hash += 1;
    }
}

A suggestion, using a base_ptr for readability:

unsafe fn hash_map_find_pos(
        &self,
        block_hash: BlockHash,
        get_block_header_by_height: impl Fn(
            IndexEntry,
        )
            -> Result<DiskBlockHeaderAndHash, FlatChainstoreError>,
    ) -> Result<Entry, FlatChainstoreError> {
        let mut hash = Self::index_hash_fn(block_hash) as usize;
        // Retrieve the base pointer to the start of the memory-mapped index
        let base_ptr = self.index_map.as_ptr();

        loop {
            // Apply a mask to ensure the value is within the valid range of buckets. Then multiply
            // by 4 to get the byte offset, since each bucket maps to 4 bytes (an u32 index/height)
            let byte_offset = (hash & self.index_size) * size_of::<u32>();
            // Obtain the bucket's address by adding the byte offset to the base pointer
            let entry_ptr = base_ptr.wrapping_add(byte_offset) as *mut IndexEntry;

            let index = (*entry_ptr).index();
            let header = get_block_header_by_height(*entry_ptr)?;
            if header.hash == block_hash {
                return Ok(Entry::Occupied(entry_ptr));
            }
            if index == 0 {
                return Ok(Entry::Empty(entry_ptr));
            }

            // If no match and bucket is occupied, continue probing the next bucket
            hash += 1;
        }
    }

}
}

/// The (short) hash function we use to compute where is the map a given block height should be

s/where is the map/where in the map
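
One plausible shape for such a function (an assumption for illustration, not the PR's actual index_hash_fn): since block hashes are already uniformly distributed, truncating to 32 bits is enough, and the caller masks the result to the table size.

    // Hypothetical short hash: take four bytes of the (already uniform)
    // block hash as the starting bucket; hash_map_find_pos masks it with
    // index_size before probing.
    fn index_hash_fn(block_hash: [u8; 32]) -> u32 {
        u32::from_le_bytes([block_hash[0], block_hash[1], block_hash[2], block_hash[3]])
    }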


let pos = self.hash_map_find_pos(hash, get_block_header_by_height)?;
match pos {
    Entry::Empty(pos) | Entry::Occupied(pos) => pos.write(index),

Rewriting the Entry::Occupied(pos) (i.e. a block hash that we already have indexed) would only be used when a fork becomes canonical and we need to update the MSB tags, right?

A comment about this would be nice.

Comment on lines +823 to +829
fn update_block_index(&self, height: u32, hash: BlockHash) -> Result<(), Self::Error> {
    let index = IndexEntry::new(height);
    unsafe {
        self.block_index
            .set_index_for_hash(hash, index, |height| self.get_block_header_by_index(height))
    }
}

In this update_block_index method impl we update the block index with a new mainchain block (tagging the block index as mainchain). But if this is done for reorging the chain, we would also need to update the tag for the reorged block indexes to fork. Otherwise we would end up with two mainchain-tagged block hashes with the same height.


A straightforward fix would be to create an update_fork_block_index, such that it is called recursively within mark_chain_as_inactive (similar to what happens with update_block_index within mark_chain_as_active), and conditionally compile such a method if experimental-db is set.
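
A sketch of what that could look like, mirroring the update_block_index impl quoted above (IndexEntry::new_fork and the exact tagging API are assumptions):

    #[cfg(feature = "experimental-db")]
    fn update_fork_block_index(&self, height: u32, hash: BlockHash) -> Result<(), Self::Error> {
        // Hypothetical: like IndexEntry::new, but with the MSB tag marking
        // the entry as a fork block, so the reorged-out hash stops claiming
        // the mainchain slot for this height
        let index = IndexEntry::new_fork(height);
        unsafe {
            self.block_index
                .set_index_for_hash(hash, index, |height| self.get_block_header_by_index(height))
        }
    }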
