Support Storage and Retrieval of Large & Arbitrary IPLD DAGs in Filecoin #22

Stebalien · 2021-02-17T05:42:45Z

No description provided.

proposals/large-ipld-dags.md

aschmahmann · 2021-02-17T17:02:30Z

proposals/large-ipld-dags.md

+
+At the moment, any tool wishing to support storing IPFS files/directories larger than 32GiB will need to store these IPFS files/directories as "raw blocks", throwing away all the DAG structural information. This will make future retrieval deals for subsets of this data infeasible and will make IPFS interop extremely difficult.
+
+This is only one 🔥 because there are plenty of useful sub-32GiB datasets and non-IPFS datasets.


This is true, although there is an additional impact here: enabling people to store compositions of datasets.

If deals already exist on Filecoin for a dataset and someone wants to reference that dataset (or some part of it) within their own, the data has to be duplicated and stored in two separate deals. With this feature, as long as there is a way to discover mappings of CID -> miner with CID (currently out of band, but a required part of the retrieval market work), users don't need to store the same data twice (or worry about compositions exceeding 32GiB).
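A minimal sketch of the composition idea, assuming a hypothetical in-memory `CID -> miners` index (no such discovery API exists in the thread; the names `CidIndex` and `plan_composition` and the example CIDs and miner IDs are made up for illustration). It splits the links of a composed dataset into those already held by some miner and those that still need a new storage deal:

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical index: CID string -> set of miner IDs known to hold that CID.
# In practice this mapping would come from some discovery mechanism.
CidIndex = Dict[str, Set[str]]


def plan_composition(
    links: Iterable[str], index: CidIndex
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split a composed dataset's links into reusable and not-yet-stored CIDs."""
    reusable: Dict[str, List[str]] = {}
    to_store: List[str] = []
    for cid in links:
        miners = index.get(cid)
        if miners:
            # Already stored somewhere: reference it instead of duplicating it.
            reusable[cid] = sorted(miners)
        else:
            # Unknown to the index: this part needs its own storage deal.
            to_store.append(cid)
    return reusable, to_store


# Placeholder identifiers, not real CIDs or miner addresses.
index: CidIndex = {"bafy-dataset-a": {"f01000", "f02000"}}
reusable, to_store = plan_composition(["bafy-dataset-a", "bafy-my-notes"], index)
```

Only the CIDs in `to_store` would need to be paid for again; the rest are referenced by link.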

aschmahmann · 2021-02-17T17:07:28Z

proposals/large-ipld-dags.md

+_How might this project’s intent be realized in other ways (other than this project proposal)? What other potential solutions can address the same need?_
+
+1. Don't support datasets > 32GiB.
+2. Store large datasets as raw objects instead of IPFS files and accept the fact that these datasets


I guess a third option along these lines would be to let users send a parallel DAG structure that contains only links if they want their data to be queryable, accepting that our selector options will be limited and that dealing with this manifest may be a pain.
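A rough sketch of what such a links-only parallel structure could look like, assuming blocks are modelled as a payload plus a list of child CIDs. The `Block` type, the function names, and the placeholder CIDs are hypothetical and only illustrate the idea:

```python
from typing import Dict, List, NamedTuple


class Block(NamedTuple):
    data: bytes       # payload, which would be stored as raw bytes
    links: List[str]  # CIDs of child blocks


def links_only_manifest(blocks: Dict[str, Block]) -> Dict[str, List[str]]:
    """Keep only the link structure of each block and drop the payload."""
    return {cid: list(block.links) for cid, block in blocks.items()}


def reachable(manifest: Dict[str, List[str]], root: str) -> List[str]:
    """List the CIDs under `root` in depth-first order, using only the manifest."""
    seen: List[str] = []
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.append(cid)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(manifest.get(cid, [])))
    return seen


blocks = {
    "root": Block(b"", ["dir-a", "dir-b"]),
    "dir-a": Block(b"", ["file-1"]),
    "dir-b": Block(b"", []),
    "file-1": Block(b"payload", []),
}
manifest = links_only_manifest(blocks)
subset = reachable(manifest, "dir-a")
```

The manifest is enough to work out which blocks a sub-DAG contains, but anything that depends on the block contents themselves is out of reach, which is the limitation mentioned above.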

rvagg · 2021-02-18T02:21:13Z

Title could do with some work. "in Filecoin" would be helpful, but this is about partial dags too, so maybe it's "Support Storage and Retrieval of Large & Arbitrary IPLD DAGs in Filecoin".

But maybe this is up to three separate projects:

1. Support arbitrarily large DAGs in Filecoin
2. Support arbitrary and incomplete DAGs natively in Filecoin
3. Selector support for retrieval of partial DAGs

Stebalien · 2021-02-18T03:13:25Z

Yeah, this could be split into 3 mini projects. But the overarching goal is to be able to support large datasets, both for storage and retrieval. I'm not sure what breaking it into three parts would give us.

willscott · 2021-02-18T20:16:11Z

proposals/large-ipld-dags.md

+#### Dependencies/prerequisites
+<!--List any other projects that are dependencies/prerequisites for this project that is being pitched.-->
+
+None.


We should think about whether this can be deferred or done in parallel with having the lotus client / market work using ipld-prime.

Stebalien · 2021-03-03

Just using ipld-prime doesn't get us much. I need to be able to (a) make a deal over a selector and (b) retrieve a selector.

warpfork · 2021-03-03

#27 is probably a dependency.

momack2 · 2021-03-01T08:53:56Z

proposals/large-ipld-dags.md

+#### Counterpoints & pre-mortem
+_Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?_
+
+The primary risk is that there may be a lack of demand to store large IPFS-formatted datasets in Filecoin. That is, users storing large datasets (> 32GiB) may all be using custom formats and may not care about IPFS files/directories, partial retrieval, etc.


Is this the required path for partial retrievability, or is that a somewhat orthogonal (if related) problem?

Stebalien · 2021-03-03

I'm not sure how the comment relates to the paragraph, so I may be misinterpreting it.

Step 2 of the "plan of attack" is required for partial retrieval.

momack2 · 2021-04-01T06:29:03Z

@Kubuxu could you review this please?

rvagg · 2021-04-01T07:17:16Z

proposals/large-ipld-dags.md

+   data for both storage and retrieval. This is especially true when interacting with IPFS.
+2. This workaround requires storing an "overlay" DAG in Filecoin (paying for that storage).
+
+Second, it should be possible to retrieve subsets of DAGs. While the underlying protocols support


The protocols support this - I think this is referring to graphsync and the other IPLD pieces down to the data storage - but the CLI doesn't. What about the miner side of this? The wording of this suggests that it's just the client CLI that's blocked on this, is that true? Can an alternative retrieval client use the protocols today to retrieve an arbitrary sub-DAG from a miner or is there more to be done on that side too?

ribasushi · 2021-04-01

I tested selector-based retrievals way back in August (using a hardcoded selector in the client directly). They worked, in the context of everything else being flaky.

It's not a CLI issue; rather, we do not have a decent selector interchange format in general (a gob of CBOR is not something to use over API/CLI).

In other words:

If today I want to specify a CID, I usually get to do the funny { "/":"baf..." } thing.

If today I want to express a selector, I do... ❓
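To make the gap concrete, here is a hedged sketch that renders both a link and a selector as JSON. The link form is the `{"/": ...}` shape mentioned above. The selector form is an assumption: it takes the short key names from the IPLD selector schema as I understand them (`R` for recursive exploration, `l` for its limit, `:>` for the sequence, `a` for explore-all, `>` for next, `@` for the recursion edge) and simply prints them as JSON. Check these against the selector spec before relying on them; this is one candidate encoding, not an agreed interchange format. The helper names and the example CID are placeholders.

```python
import json
from typing import Optional


def cid_link(cid: str) -> dict:
    """The JSON link form: an object with a single "/" key."""
    return {"/": cid}


def explore_all_recursively(depth: Optional[int] = None) -> dict:
    """Selector that walks every link under a root, optionally depth-limited.

    The key names are assumed from the IPLD selector schema.
    """
    limit = {"none": {}} if depth is None else {"depth": depth}
    return {"R": {"l": limit, ":>": {"a": {">": {"@": {}}}}}}


def encode(node: dict) -> str:
    """Compact JSON encoding, keeping the key order as written."""
    return json.dumps(node, separators=(",", ":"))


link_json = encode(cid_link("bafy-example"))
selector_json = encode(explore_all_recursively())
limited_json = encode(explore_all_recursively(depth=2))
```

A string like `selector_json` could be passed over an API or CLI the same way a JSON link is, which is the property a binary CBOR encoding lacks.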

rvagg · 2021-04-01

Ahhhhh, back to the "selector syntax" problem. We should just solve that properly, eh? So close: ipld/specs#239

Support Large IPLD/IPFS DAGs

b6ca808

Stebalien requested review from aschmahmann and magik6k February 17, 2021 05:43

vasco-santos reviewed Feb 17, 2021

lidel reviewed Feb 17, 2021

alanshaw reviewed Feb 17, 2021

aschmahmann reviewed Feb 17, 2021

Apply suggestions from code review

d04fb9e

Co-authored-by: Vasco Santos <vasco.santos@ua.pt> Co-authored-by: Marcin Rataj <lidel@lidel.org>

aschmahmann reviewed Feb 17, 2021

finish sentence

ccc481a

Stebalien changed the title ~~Support Large IPLD/IPFS DAGs~~ Support Large IPLD/IPFS DAGs in Filecoin Feb 18, 2021

Stebalien changed the title ~~Support Large IPLD/IPFS DAGs in Filecoin~~ Support Storage and Retrieval of Large & Arbitrary IPLD DAGs in Filecoin Feb 18, 2021

willscott reviewed Feb 18, 2021

momack2 reviewed Mar 1, 2021

rvagg reviewed Apr 1, 2021

Stebalien commented Feb 17, 2021

aschmahmann Feb 17, 2021

aschmahmann Feb 17, 2021

rvagg commented Feb 18, 2021

Stebalien commented Feb 18, 2021

willscott Feb 18, 2021

Stebalien Mar 3, 2021

warpfork Mar 3, 2021

momack2 Mar 1, 2021

Stebalien Mar 3, 2021

momack2 commented Apr 1, 2021

rvagg Apr 1, 2021

ribasushi Apr 1, 2021

rvagg Apr 1, 2021

