Commit db4fbe6

Merge pull request #279 from more-storage-implementations-and-benchmarks
2 parents 54be030 + e99bcd1 commit db4fbe6

27 files changed: +2333 -34 lines

linking/functions.go

+6 lines

@@ -129,6 +129,9 @@ func (lsys *LinkSystem) LoadRaw(lnkCtx LinkContext, lnk datamodel.Link) ([]byte,
 	if err != nil {
 		return nil, err
 	}
+	if closer, ok := reader.(io.Closer); ok {
+		defer closer.Close()
+	}
 	var buf bytes.Buffer
 	if _, err := io.Copy(&buf, reader); err != nil {
 		return nil, err
@@ -174,6 +177,9 @@ func (lsys *LinkSystem) Fill(lnkCtx LinkContext, lnk datamodel.Link, na datamode
 	if err != nil {
 		return err
 	}
+	if closer, ok := reader.(io.Closer); ok {
+		defer closer.Close()
+	}
 	// TrustedStorage indicates the data coming out of this reader has already been hashed and verified earlier.
 	// As a result, we can skip rehashing it
 	if lsys.TrustedStorage {
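
For readers wondering why the new io.Closer checks are needed: BlockReadOpener only promises an io.Reader, but some openers hand back readers that hold resources. Below is a hedged sketch (not part of this change) of such an opener, backed by plain files; the directory layout and key handling are illustrative assumptions.

```go
// Hypothetical sketch: a BlockReadOpener whose reader is backed by a file.
// os.Open returns an *os.File, which is an io.ReadCloser, so callers such as
// LinkSystem.LoadRaw and LinkSystem.Fill (patched above) detect io.Closer and
// defer Close. The on-disk naming scheme here is an illustrative assumption.
package example

import (
	"io"
	"os"
	"path/filepath"

	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/linking"
)

func fileReadOpener(dir string) linking.BlockReadOpener {
	return func(lnkCtx linking.LinkContext, lnk datamodel.Link) (io.Reader, error) {
		// A real store would escape or hash the key; lnk.Binary() is raw bytes.
		return os.Open(filepath.Join(dir, filepath.Base(lnk.Binary())))
	}
}
```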

linking/types.go

+7 -4 lines

@@ -84,8 +84,11 @@ type (
 	// and hash in the Link parameter before returning.
 	// (This is something that the LinkSystem composition will handle if you're using it.)
 	//
-	// Some implementations of BlockWriteOpener and BlockReadOpener may be
-	// found in the storage package. Applications are also free to write their own.
+	// BlockReadOpener can also be created out of storage.ReadableStorage and attached to a LinkSystem
+	// via the LinkSystem.SetReadStorage method.
+	//
+	// Users of a BlockReadOpener function should also check the io.Reader
+	// for matching the io.Closer interface, and use the Close function as appropriate if present.
 	BlockReadOpener func(LinkContext, datamodel.Link) (io.Reader, error)

 	// BlockWriteOpener defines the shape of a function used to open a writer
@@ -118,8 +121,8 @@ type (
 	// and when the BlockWriteCommiter is called, it will flush the writes
 	// and then use a rename operation to place the tempfile in a permanent path based the Link.)
 	//
-	// Some implementations of BlockWriteOpener and BlockReadOpener may be
-	// found in the storage package. Applications are also free to write their own.
+	// BlockWriteOpener can also be created out of storage.WritableStorage and attached to a LinkSystem
+	// via the LinkSystem.SetWriteStorage method.
 	BlockWriteOpener func(LinkContext) (io.Writer, BlockWriteCommitter, error)

 	// BlockWriteCommitter defines the shape of a function which, together

storage/README_adapters.md

+125 lines (new file)

@@ -0,0 +1,125 @@
Storage Adapters
================

The go-ipld-prime storage APIs were introduced in the v0.14.x range of go-ipld-prime,
which happened in fall 2021.

There are many other pieces of code in the IPLD (and even more so, the IPFS) ecosystem
which predate this, and have interfaces that are very _similar_, but not quite exactly the same.

In order to keep using that code, we've built a series of adapters.

You can see these in packages beneath this one:

- `go-ipld-prime/storage/bsadapter` is an adapter to `github.com/ipfs/go-ipfs-blockstore`.
- `go-ipld-prime/storage/dsadapter` is an adapter to `github.com/ipfs/go-datastore`.
- `go-ipld-prime/storage/bsrvadapter` is an adapter to `github.com/ipfs/go-blockservice`.

Note that there are also other packages which implement the go-ipld-prime storage APIs,
but are not considered "adapters" -- these just implement the storage APIs directly:

- `go-ipld-prime/storage/memstore` is a simple in-memory storage system.
- `go-ipld-prime/storage/fsstore` is a simple filesystem-backed storage system
  (comparable to, and compatible with, [flatfs](https://pkg.go.dev/github.com/ipfs/go-ds-flatfs),
  if you're familiar with that -- but higher efficiency).

Finally, note that there are some shared benchmarks across all of this:

- check out `go-ipld-prime/storage/benchmarks`!


Why structured like this?
-------------------------

### Why is there adapter code at all?

The `go-ipld-prime/storage` interfaces are a newer generation.

A new generation of APIs was desirable because it unifies the old APIs,
and also because we were able to improve and update several things in the process.
(You can see some of the list of improvements in https://github.com/ipld/go-ipld-prime/pull/265,
where these APIs were first introduced.)
The new generation of APIs avoids several types present in the old APIs which forced otherwise-avoidable allocations.
(See the notes later in this document about "which adapter should I use" for more on that.)
Finally, the new generation of APIs is carefully designed to support minimal implementations,
by carefully avoiding use of non-standard-library types in key API definitions,
and by keeping most advanced features behind a standardized convention of feature detection.

Because the newer generation of APIs is not exactly the same as the multiple older APIs we're unifying and updating,
some amount of adapter code is necessary.

(Fortunately, it's not much! But it's not "none", either.)

### Why have this code in a shared place?

The glue code to connect `go-datastore` and the other older APIs
to the new `go-ipld-prime/storage` APIs is fairly minimal...
but there's also no reason for anyone to write it twice,
so we want to put it somewhere easy to share.

### Why do the adapters have their own go modules?

A separate module is used because it's important that go-ipld-prime can be used
without forming a dependency on `go-datastore` (or the other relevant modules, per adapter).

We want this so that there's a reasonable deprecation pathway -- it must be
possible to write new code that doesn't take on transitive dependencies to old code.

(As a bonus, looking at the module dependency graphs makes an interestingly
clear statement about why minimal APIs that don't force transitive dependencies are a good idea!)

### Why is this code all together in this repo?

We put these separate modules in the same git repo as `go-ipld-prime`... because we can.

Technically, neither the storage adapter modules nor the `go-ipld-prime` module depends on the other --
they just have interfaces that are aligned with each other -- so it's very easy to
hold them as separate go modules in the same repo, even though that can otherwise sometimes be tricky.

You may want to make a point of pulling updated versions of the storage adapters that you use
when pulling updates to go-ipld-prime, though.

### Could we put these adapters upstream into the other relevant repos?

Certainly!

We started with them here because it seemed developmentally lower-friction.
That may change; these APIs could move.
This code is just interface satisfaction, so even having multiple copies of it is utterly harmless.


Which of `dsadapter` vs `bsadapter` vs `bsrvadapter` should I use?
------------------------------------------------------------------

None of them, ideally.
A direct implementation of the storage APIs will almost certainly be able to perform better than any of these adapters.
(Check out the `fsstore` package, for example.)

Failing that: use the adapter matching whatever you've got on hand in your code.

There is no correct choice.

`dsadapter` suffers avoidable excessive allocs in processing its key type,
due to choices in the interior of `github.com/ipfs/go-datastore`.
It is also unable to support streaming operation, should you desire it.

`bsadapter` and `bsrvadapter` both also suffer overhead due to their key type,
because they require a transformation back from the plain binary strings used in the storage API to the concrete go-cid type,
which spends some avoidable CPU time (and also, at present, causes avoidable allocs because of some interesting absences in `go-cid`).
Additionally, they suffer avoidable allocs because they wrap the raw binary data in a "block" type,
which is an interface, and thus heap-escapes; the storage APIs need none of that, and just return the raw data.
They are also unable to support streaming operation, should you desire it.

It's best to choose the shortest path and use the adapter to whatever layer you need to get to --
for example, if you really want to use a `go-datastore` implementation,
*don't* use `bsadapter` and have it wrap a `go-blockstore` that wraps a `go-datastore` if you can help it:
instead, use `dsadapter` and wrap the `go-datastore` without any extra layers of indirection.
You should prefer this because most of the notes above about avoidable allocs are true when
the legacy interfaces are communicating with each other, as well...
so the less you use the internal layering of the legacy interfaces, the better off you'll be.

Using a direct implementation of the storage APIs will suffer none of these overheads,
and so will always be your best bet if possible.

If you have to use one of these adapters, hopefully the performance overheads fall within an acceptable margin.
If not: we'll be overjoyed to accept help porting things.
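
To make the "shortest path" advice above concrete, here is a hedged sketch of wrapping a go-datastore directly with dsadapter and attaching it to a LinkSystem. The dsadapter.Adapter type and its Wrapped field are assumptions about the adapter's surface; check the package documentation before relying on them.

```go
// Sketch of the "shortest path" advice: wrap a go-datastore directly with
// dsadapter (no blockstore layer in between) and attach it to a LinkSystem.
// Assumption: dsadapter exposes an Adapter struct with a Wrapped field.
package example

import (
	datastore "github.com/ipfs/go-datastore"
	dssync "github.com/ipfs/go-datastore/sync"
	cidlink "github.com/ipld/go-ipld-prime/linking/cid"
	"github.com/ipld/go-ipld-prime/storage/dsadapter"
)

func useDatastoreDirectly() {
	ds := dssync.MutexWrap(datastore.NewMapDatastore()) // stand-in for your real go-datastore
	store := &dsadapter.Adapter{Wrapped: ds}            // one adapter layer, nothing else

	lsys := cidlink.DefaultLinkSystem()
	lsys.SetReadStorage(store)
	lsys.SetWriteStorage(store)
	_ = lsys
}
```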

storage/api.go

+1 -2 lines

@@ -89,8 +89,7 @@ type WritableStorage interface {
 // --- streaming --->

 type StreamingReadableStorage interface {
-	// Note that the returned io.Reader may also be an io.ReadCloser -- check for this.
-	GetStream(ctx context.Context, key string) (io.Reader, error)
+	GetStream(ctx context.Context, key string) (io.ReadCloser, error)
 }

 // StreamingWritableStorage is a feature-detection interface that advertises support for streaming writes.
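
To show what the new signature looks like from an implementer's point of view, here is a hedged sketch of a filesystem-backed StreamingReadableStorage; the key-to-path mapping is an illustrative assumption, not how fsstore actually works.

```go
// Hedged sketch of the updated interface: a filesystem-backed
// StreamingReadableStorage. Returning *os.File satisfies io.ReadCloser
// directly, which is what the signature change above enables.
package example

import (
	"context"
	"io"
	"os"
	"path/filepath"
)

type fileStreamStore struct {
	dir string
}

// GetStream matches the post-change StreamingReadableStorage signature.
// The key-to-path mapping is illustrative only; real stores escape or shard keys.
func (s *fileStreamStore) GetStream(ctx context.Context, key string) (io.ReadCloser, error) {
	return os.Open(filepath.Join(s.dir, filepath.Base(key)))
}
```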

storage/benchmarks/README.md

+40 lines (new file)

@@ -0,0 +1,40 @@
benchmarks
==========

This is a small module that pulls in a bunch of storage implementations,
as well as legacy implementations via the adapter modules,
and benchmarks all of them on the same benchmarks.

There's no reason to import this code,
so the go.mod file uses relative paths shamelessly.
(You can create your own benchmarks using the code in `../tests`,
which contains most of the engine; this package is just tables of setup.)


What variations do the benchmarks exercise?
-------------------------------------------

- the various storage implementations!
- in some cases: variations of parameters to individual storage implementations. (TODO)
- puts and gets. (TODO: currently only puts.)
- various distributions of data size. (TODO)
- block mode vs streaming mode. (TODO)
- end-to-end use via linksystem with small cbor objects. (TODO)
  - (this measures a lot of things that aren't to do with the storage itself -- but is useful to contextualize things.)

Running the benchmarks on variations in hardware and filesystem may also be important!
Many of these storage systems use the disk in some way.


Why is the module structured like this?
---------------------------------------

Because many of the storage implementations are also their own modules,
and we don't want to have the go-ipld-prime module pull in a huge universe of transitive dependencies.

See similar discussion in `../README_adapters.md`.

It may be worth pulling this out into a new git repo in the future,
especially if we want to add more and more implementations to what we benchmark,
or develop additional tools for deploying the benchmark on varying hardware, etc.
For now, it incubates here.
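
For orientation, here is a hedged, self-contained sketch of the style of put benchmark this module runs, written directly against a storage implementation rather than the shared engine in `../tests`; the memstore details (zero-value usability, Put signature) are assumptions about that package.

```go
// Self-contained sketch of a put benchmark against the storage APIs.
// Assumptions: memstore.Store's zero value is usable, and Put has the
// storage.WritableStorage shape: Put(ctx, key string, content []byte) error.
package benchmarks_test

import (
	"context"
	"strconv"
	"testing"

	"github.com/ipld/go-ipld-prime/storage/memstore"
)

func BenchmarkMemstorePut1KiB(b *testing.B) {
	ctx := context.Background()
	store := &memstore.Store{}
	content := make([]byte, 1024) // one point in the "data size" variation above
	b.SetBytes(int64(len(content)))
	for i := 0; i < b.N; i++ {
		// Real benchmarks key blocks by hash; a counter keeps the sketch simple.
		if err := store.Put(ctx, strconv.Itoa(i), content); err != nil {
			b.Fatal(err)
		}
	}
}
```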

storage/benchmarks/go.mod

+13 lines (new file)

@@ -0,0 +1,13 @@
module github.com/ipld/go-ipld-prime/storage/benchmarks

go 1.16

replace github.com/ipld/go-ipld-prime => ../..

replace github.com/ipld/go-ipld-prime/storage/dsadapter => ../dsadapter

require (
	github.com/ipfs/go-ds-flatfs v0.4.5
	github.com/ipld/go-ipld-prime v0.12.3
	github.com/ipld/go-ipld-prime/storage/dsadapter v0.0.0-20211022093231-ebf675a9bd6d
)
