Parallel bitswap retrieval #149
Conversation
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #149      +/-   ##
==========================================
+ Coverage   68.45%   69.64%   +1.18%
==========================================
  Files          60       62       +2
  Lines        4521     5040     +519
==========================================
+ Hits         3095     3510     +415
- Misses       1268     1337      +69
- Partials      158      193      +35
Force-pushed from 42fbe78 to d92b3bf
Marked as ready for review, but do not merge at this stage: this relies on a branch in go-ipld-prime that's not yet merged and is still rough. That's my next task.
Force-pushed from f50dd1b to 24b1cbd
A few comments, nothing major -- when the IPLD fix is done I'm mostly ready to merge this.
I agree, tracking all the various stores is a bit unwieldy. It might be nice to put a doc.go in pkg/storage with a package comment explaining what the various recorders are and why you would use one or the other.
pkg/retriever/bitswapretriever.go (outdated)
if tmpDir == "" { | ||
tmpDir = os.TempDir() | ||
} | ||
cacheStore := storage.NewDeferredCarStorage(tmpDir) |
Could the preload cache be fully in memory? Presumably we don't need that much preload data per active retrieval, and by keeping those working sets in memory we'd save additional disk latency.
Yes .. so this started off as in-memory (it's based on @hannahhoward's fully in-memory buffering preload store in the go-ipld-prime PR), but the concern is that there are real cases where a preloader will fill up memory, especially in the Saturn case where you're doing many parallel fetches. e.g. a plain unixfs shard with 256 links where you only care about 1 initially: you go down that one, the preloader works on the other 255, the next jump has 256 more, and so on down until you come back up and start chewing away at the next branch; that could compound quickly with many parallel fetches.
Future work for this general preloading approach, as I imagined it, would have both in-memory and filesystem-backed storage, where in-memory objects can overflow into the filesystem if you end up holding too much. The vast majority of cases won't need more than a couple of blocks from the preloader, so this would speed things up significantly.
But there's another issue that gets in the way - the traverser needs access to blocks it's already passed, for those DAG cases where you re-encounter blocks via different paths. They're less common with UnixFS but can happen, and we're not limited to UnixFS here either. So that means we need to hang onto the full suite of blocks for the life of a retrieval anyway. Using the approach I've gone with here we just put them all into the same basket on disk, using a CAR as a temporary store and not caring about the ordering, so the blocks are available to the preloader and the loader, including in the case of a re-load via the loader. But it's also true that this could be done in memory up to some limit! So I'd really like to work on a storage system that starts in memory but overflows to disk, which we can then use for all the cases we need it in Lassie - graphsync, bitswap, preloader. The user only ever gets a nicely ordered CARv1 and we get to mess around with different ways of juggling blocks internally.
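To make the "starts in memory but overflows to disk" idea concrete, here's a minimal sketch of what such a store could look like. It is not the implementation in this PR; the OverflowStore type, the byte budget, and the diskStore interface are all made up for illustration. The Put/Get/Has shapes loosely follow go-ipld-prime's string-keyed storage interfaces.

```go
// Hypothetical sketch: hold blocks in memory until a byte budget is exceeded,
// then spill everything to a disk-backed store (e.g. a temp CAR).
package overflowstore

import (
	"context"
	"sync"
)

// diskStore stands in for a filesystem-backed store; its concrete type is an
// assumption for this sketch.
type diskStore interface {
	Put(ctx context.Context, key string, data []byte) error
	Get(ctx context.Context, key string) ([]byte, error)
	Has(ctx context.Context, key string) (bool, error)
}

type OverflowStore struct {
	mu      sync.Mutex
	mem     map[string][]byte
	memSize int
	limit   int // byte budget before spilling to disk
	disk    diskStore
}

func NewOverflowStore(limit int, disk diskStore) *OverflowStore {
	return &OverflowStore{mem: make(map[string][]byte), limit: limit, disk: disk}
}

func (s *OverflowStore) Put(ctx context.Context, key string, data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.memSize+len(data) <= s.limit {
		cp := make([]byte, len(data))
		copy(cp, data)
		s.mem[key] = cp
		s.memSize += len(cp)
		return nil
	}
	// Over budget: spill previously buffered blocks, then write this one to disk.
	for k, v := range s.mem {
		if err := s.disk.Put(ctx, k, v); err != nil {
			return err
		}
		delete(s.mem, k)
	}
	s.memSize = 0
	return s.disk.Put(ctx, key, data)
}

func (s *OverflowStore) Get(ctx context.Context, key string) ([]byte, error) {
	s.mu.Lock()
	if data, ok := s.mem[key]; ok {
		s.mu.Unlock()
		return data, nil
	}
	s.mu.Unlock()
	return s.disk.Get(ctx, key)
}

func (s *OverflowStore) Has(ctx context.Context, key string) (bool, error) {
	s.mu.Lock()
	if _, ok := s.mem[key]; ok {
		s.mu.Unlock()
		return true, nil
	}
	s.mu.Unlock()
	return s.disk.Has(ctx, key)
}
```

A real version would need eviction/retention decisions (blocks must stay reachable for the life of the retrieval, per the comment above) and an ordered CARv1 writer on the output side; the sketch only shows the overflow mechanic.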
For the selector we're using here, do we actually need block data on re-encounter, or would it be sufficient to keep just the list of encountered CIDs so that we can properly answer 'Has' and remember which ones have already been interacted with?
For the ones we're using I think we probably could .. we do allow for arbitrary selectors in the retrieval Request that can be passed in when using Lassie as a library, and this is yet again one of those cases where it'd be nice to be able to differentiate between selectors that care about revisits and those that don't.
For the main cases we're focused on for now, for the purposes of Saturn (these simple path-based selectors), we could probably get away with just a SkipMe{} whenever the LinkSystem is asked for a block that it's already encountered. In which case, just maintaining a list of CIDs would work. @hannahhoward is there anything in the graphsync traverser that would complicate this kind of approach (if we were hypothetically able to restrict the types of selectors we're willing to work with)?
I think maybe not in scope for this batch of work, but we could explore options for the "have simple selector, won't store anything but a list of CIDs we've already spat out" case later.
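For reference, a rough sketch of that "just keep a list of CIDs we've already spat out" idea: wrap the LinkSystem's read opener so a link already handed out once signals traversal.SkipMe{} instead of being reloaded. This is an illustration of the comment above, not code from this PR, and whether the traverser honors SkipMe from the read opener in all cases is exactly the open question; it also only makes sense for selectors that genuinely don't need revisits.

```go
// Hypothetical sketch: remember which links have already been served and skip
// re-encounters rather than reloading block data.
package skipseen

import (
	"io"
	"sync"

	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/linking"
	"github.com/ipld/go-ipld-prime/traversal"
)

// WrapReadOpener delegates to next the first time each link is requested and
// returns traversal.SkipMe{} on subsequent requests for the same link.
func WrapReadOpener(next linking.BlockReadOpener) linking.BlockReadOpener {
	var mu sync.Mutex
	seen := make(map[string]struct{}) // keyed by the link's binary form

	return func(lctx linking.LinkContext, lnk datamodel.Link) (io.Reader, error) {
		key := lnk.Binary()
		mu.Lock()
		_, already := seen[key]
		if !already {
			seen[key] = struct{}{}
		}
		mu.Unlock()
		if already {
			// Signal the traverser to skip this subtree instead of reloading the block.
			return nil, traversal.SkipMe{}
		}
		return next(lctx, lnk)
	}
}
```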
OK so the names are good enough that I think I get what all of them do now.
So:
- DeferredCarWriter is basically the thing that allows you to defer writing the header or anything else until the first block write. OnPut is then used to do that dance in the IPFS handler to send all the HTTP headers when the first block is ready to go through. It deals with write-only CARv1.
- DeferredCarStorage is the temp read/write store, but again, we defer creating the file until the first block write.
- CachingTempStore is the big coordinator, combining a DeferredCarStorage and a DeferredCarWriter, and also exposing the two kinds of storage up to the Request logic -- the regular write store for the real output and the preload store.
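To illustrate the "defer the header until first block" dance described in the first bullet, here's a minimal sketch. The names echo the discussion (a deferred writer with an OnPut hook) but every signature here is an assumption made for illustration, not Lassie's actual API.

```go
// Hypothetical sketch of a writer that delays the CAR header (and lets an HTTP
// handler delay its response headers) until the first block arrives.
package deferredwriter

import (
	"io"
	"sync"
)

type DeferredWriter struct {
	mu          sync.Mutex
	w           io.Writer
	header      []byte // CARv1 header bytes, written lazily
	wroteHeader bool
	onPut       []func(first bool)
}

func NewDeferredWriter(w io.Writer, header []byte) *DeferredWriter {
	return &DeferredWriter{w: w, header: header}
}

// OnPut registers a callback invoked on every block write; the first call is
// where an HTTP handler would flush its response headers before any CAR bytes.
func (dw *DeferredWriter) OnPut(cb func(first bool)) {
	dw.onPut = append(dw.onPut, cb)
}

// Put writes the CAR header before the first block only, then the block bytes.
func (dw *DeferredWriter) Put(block []byte) error {
	dw.mu.Lock()
	defer dw.mu.Unlock()
	first := !dw.wroteHeader
	for _, cb := range dw.onPut {
		cb(first)
	}
	if first {
		if _, err := dw.w.Write(dw.header); err != nil {
			return err
		}
		dw.wroteHeader = true
	}
	_, err := dw.w.Write(block)
	return err
}
```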
Anyway, I think with this I'm ready to merge. Remaining comments are non-blocking. Merge whenever you want.
WIP, but it works at least. Depends on getting ipld/go-ipld-prime#452 right and landed and there's lots of additional work to do here to make it nicely integrated and configurable.
For now there are some debugging stats coming out, just to prove that it's working properly (better proof to come later). At the end of TestHttpFetch/bitswap_large_sharded_directory we get only one complete miss (the first block); past that, 87 requests were for in-progress block fetches and 178 were for blocks we'd already fetched.