[WIP] Proposal: DHT in JS for effective content/service discovery #30

106 changes: 106 additions & 0 deletions proposals/dht-js.md
@@ -0,0 +1,106 @@
# Effective content and service discovery in JS

Authors: @jacobheun

Initial PR: TBD

## Purpose & impact
#### Background & intent
_Describe the desired state of the world after this project? Why does that matter?_
<!--
Outline the status quo, including any relevant context on the problem you’re seeing that this project should solve. Wherever possible, include pains or problems that you’ve seen users experience to help motivate why solving this problem works towards top-line objectives.
-->

**Current State**
- JS projects rely on delegate and preload nodes to be able to interact with the live network

"JS projects" here is not just the browser; it's Node.js, Electron, React Native, etc., i.e. web, desktop and mobile.

- PL-hosted delegate/preload nodes are often overloaded, which negatively impacts the performance of projects

@gmasgras this sounds like something that is soluble from an infra perspective. Can you comment on our current preload node uptime/utilization?

The requirement to preload content also makes JS projects very bandwidth heavy - every piece of data I add to a JS node is transferred to a preload node, which is very expensive in terms of transfer.

Using DHT delegates suffers from similar problems, except it's sort of worse - we re-publish every hour so content survives the timed garbage collection, which means re-uploading every block, which scales poorly.

I'm not sure I understand why delegated routing should be slow/expensive or what preload nodes are necessary in non-browser use cases.

DHT publishing frequency with delegated routing should match what you'd expect with non-delegated routing, no?

Yes, the frequency should be the same, but as currently implemented with delegated routing, if a block has been garbage collected, you end up sending the data to the server each time for it to create the provider record - the whole DAG, not just the root block since the delegate has to be able to supply everything on behalf of your node.

> what preload nodes are necessary in non-browser use cases

They're not, or at least they shouldn't be - if the DHT implementation were complete and performant, which it isn't. This proposal is to finish up the work we did during the hack week and make it so. Then we could turn off preload/delegate for everything that isn't a browser node.

@aschmahmann aschmahmann Feb 18, 2021

With proper delegated routing, shouldn't I be making my own provider records that point at me instead of having ones that point at the preload nodes?

E.g. I ask the delegated router for the 20 closest peers then directly send them provider record puts.

> E.g. I ask the delegated router for the 20 closest peers then directly send them provider record puts.

Yes, but this is a limitation in the browser, as we will not be able to dial most of the network

Sure, but my comment was about non-browser nodes not needing to store their data on preload nodes.

Browser nodes have a very rough time talking to the rest of the network at the moment, so they need to do a lot more delegating of work. Does anything in this proposal really help the browser node situation?

The preload nodes also garbage collect the content periodically so over time the content becomes inaccessible

Sorry if I'm just totally missing the idea here, but why are preload nodes and garbage collection on them relevant in this discussion? Aren't they only really needed if a node is unreachable (e.g. in a browser)?

The point above is that the PL hosted infra is often overloaded which is a risk to project functionality.

I was making the additional point that even if the nodes were not overloaded, they garbage collect anything uploaded to them, so the IPFS magic of 'add a file to your node, cat it from another node' is time-limited if the only way that content makes it to another node is via a preload node.

I'm confused, this doc states:

> The focus of this effort is to build the DHT to function solely in Node.js. Running a DHT in browser is not currently viable.

Putting data in preload nodes is not required to get NodeJS to play well with the rest of the network.


#### Assumptions & hypotheses
_What must be true for this project to matter?_
<!--(bullet list)-->

TODO

#### User workflow example
_How would a developer or user use this new capability?_
<!--(short paragraph)-->

TODO

#### Impact
TODO

<!--
Explain why you have chosen this rating
What awesome potential impact/outcomes/results will we see if we nail this project?
-->

#### Leverage
TODO

<!-- Explain the opportunity or leverage point for our subsequent velocity/impact (e.g. by speeding up development, enabling more contributors, etc)
-->

#### Confidence
_How sure are we that this impact would be realized? Label from [this scale](https://medium.com/@nimay/inside-product-introduction-to-feature-priority-using-ice-impact-confidence-ease-and-gist-5180434e5b15)_.

<!--Explain why this rating-->


## Project definition
#### Brief plan of attack

<!--Briefly describe the milestones/steps/work needed for this project-->

The focus of this effort is to build the DHT to function solely in Node.js. Running a DHT in browser is not currently viable.

Here's a stupid question from someone with little exposure to js-ipfs or the JS ecosystem at large. What is the benefit of building out functionality in Node.js as opposed to relying on delegate go-ipfs nodes? Does the value come from having multiple fully functional implementations? Is there a positive impact on in-browser nodes from delegating DHT functionality to a Node.js IPFS node instead of a go-ipfs node?

Contributor Author

Not a stupid question. Both are viable options with pros and cons, and I think there is merit in both. I will expand more here when I have time to flesh this out, but I think there is significant work that needs to be done improving remote API access to Go nodes, and that should be a separate proposal. There is a lot to be gained there for new developers. The DHT in JS is relatively short term in comparison and immediately improves usability of the JS ecosystem of projects for existing developers (we currently have users leveraging JS and Go in hacky ways to get around some limitations in these systems; flexible IPLD in JS, performant DHT in Go).


- Week 0 - Complete the DHT specification (DHT protocol expertise)
- Create a test plan in Testground to evaluate query performance
- Implement DHT routing table construction and refresh
- Implement the improved query logic
- Implement Peer Eviction logic
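
The routing table and eviction items above can be pictured with a minimal sketch. It assumes a Kademlia-style table where peers are grouped by how many leading bits they share with the local ID, a bucket size of 20, and a deliberately simplified eviction rule (drop the least recently seen peer when a bucket overflows). `RoutingTable`, `commonPrefixLength` and `stale` are hypothetical names for illustration, not the js-libp2p API, and the eviction policy the finished spec settles on may differ.

```javascript
// Hypothetical sketch, not the js-libp2p API. IDs are equal-length
// Uint8Arrays (for example, hashes of peer IDs).

// Number of leading bits two IDs share; a longer shared prefix means a
// smaller XOR distance.
function commonPrefixLength (a, b) {
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ^ b[i]
    if (x !== 0) return i * 8 + Math.clz32(x) - 24
  }
  return a.length * 8
}

const sameId = (a, b) => a.length === b.length && a.every((v, i) => v === b[i])

class RoutingTable {
  constructor (localId, bucketSize = 20) { // 20 is an assumed default
    this.localId = localId
    this.bucketSize = bucketSize
    this.buckets = new Map() // shared prefix length -> [{ id, lastSeen }]
  }

  // Add a peer, or refresh it if it is already present.
  // Returns the evicted entry when the bucket overflows, otherwise null.
  add (id, now = Date.now()) {
    const cpl = commonPrefixLength(this.localId, id)
    if (cpl === this.localId.length * 8) return null // never store ourselves
    if (!this.buckets.has(cpl)) this.buckets.set(cpl, [])
    const bucket = this.buckets.get(cpl)

    const existing = bucket.find((p) => sameId(p.id, id))
    if (existing) {
      existing.lastSeen = now
      return null
    }
    bucket.push({ id, lastSeen: now })
    if (bucket.length <= this.bucketSize) return null

    // Simplified eviction (an assumption): drop the least recently seen peer.
    bucket.sort((p, q) => p.lastSeen - q.lastSeen)
    return bucket.shift()
  }

  // Peers not seen since `cutoff`, i.e. candidates for a refresh.
  stale (cutoff) {
    return [...this.buckets.values()].flat().filter((p) => p.lastSeen < cutoff)
  }
}
```

A refresh loop would periodically call `stale()` and re-query or ping those peers; that part is left out here because it depends on the transport.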


**Assumptions**
- Routing table diversity is not critical and can be implemented later
- We can start with "Client Mode"-only support to minimize the surface area of the solution
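
A client-mode node only issues queries and never answers them, so the piece it needs most is the iterative lookup. The following is a rough sketch of that loop under stated assumptions: `queryPeer(peer, key)` is a hypothetical async callback that returns the peers a remote node believes are closer to `key`, IDs are equal-length `Uint8Array`s, and `k = 20` and `alpha = 3` are conventional Kademlia values rather than anything this proposal fixes. It is not the improved query logic itself, only the basic shape it would build on.

```javascript
// Hypothetical sketch of an iterative closest-peers lookup, as a
// client-mode node would run it. Not the js-libp2p API.

// Order two equal-length IDs by XOR distance to `key` (nearest first).
function compareDistance (key, a, b) {
  for (let i = 0; i < key.length; i++) {
    const da = a[i] ^ key[i]
    const db = b[i] ^ key[i]
    if (da !== db) return da - db
  }
  return 0
}

const toHex = (id) => Array.from(id, (b) => b.toString(16).padStart(2, '0')).join('')

// `seeds` are peers we already know, e.g. from the routing table.
// `queryPeer` is an assumed callback: (peer, key) => Promise<peer[]>.
async function closestPeers (key, seeds, queryPeer, { k = 20, alpha = 3 } = {}) {
  const known = new Map(seeds.map((p) => [toHex(p.id), p]))
  const queried = new Set()

  while (true) {
    // The k closest peers we currently know of, nearest first.
    const best = [...known.values()]
      .sort((a, b) => compareDistance(key, a.id, b.id))
      .slice(0, k)

    // Ask up to `alpha` of them that we have not contacted yet.
    const batch = best.filter((p) => !queried.has(toHex(p.id))).slice(0, alpha)
    if (batch.length === 0) return best // every close peer has been asked

    const replies = await Promise.all(batch.map(async (p) => {
      queried.add(toHex(p.id))
      try {
        return await queryPeer(p, key)
      } catch (err) {
        known.delete(toHex(p.id)) // drop peers we could not reach
        return []
      }
    }))

    for (const peer of replies.flat()) {
      const hex = toHex(peer.id)
      if (!queried.has(hex) && !known.has(hex)) known.set(hex, peer)
    }
  }
}
```

Each round queries at least one new peer, so the loop ends once the `k` nearest known peers have all been contacted or dropped.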

#### What does done look like?

- js-ipfs in Node.js ships with the DHT enabled in client mode by default
- js DHT query times are comparable to Go over TCP/Noise connections
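
One way to make "comparable" checkable is to compare latency percentiles from both implementations. The helper below is only a sketch: it assumes query latencies in milliseconds have already been collected for each implementation (for example from the Testground plan), and the median/p95 choice and the `factor` of 1.5 are placeholders, since this proposal does not define a threshold. `percentile` and `isComparable` are hypothetical names.

```javascript
// Hypothetical helpers for comparing query latency samples (milliseconds).

// Nearest-rank percentile of a non-empty array of numbers, 0 < p <= 100.
function percentile (samples, p) {
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

// True when the candidate's median and p95 are both within `factor` of the
// baseline. `factor` is an arbitrary placeholder, not a value from the proposal.
function isComparable (candidate, baseline, factor = 1.5) {
  return [50, 95].every((p) => percentile(candidate, p) <= factor * percentile(baseline, p))
}
```

Here `candidate` would hold the js DHT samples and `baseline` the Go samples, both gathered over TCP/Noise connections under the same test plan.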

#### What does success look like?
_Success means impact. How will we know we did the right thing?_

<!--
Provide success criteria. These might include particular metrics, desired changes in the types of bug reports being filed, desired changes in qualitative user feedback (measured via surveys, etc), etc.
-->

#### Counterpoints & pre-mortem
_Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?_

#### Alternatives
_How might this project’s intent be realized in other ways (other than this project proposal)? What other potential solutions can address the same need?_

#### Dependencies/prerequisites
<!--List any other projects that are dependencies/prerequisites for this project that is being pitched.-->

#### Future opportunities
<!--What future projects/opportunities could this project enable?-->

Being able to effectively find content and services on the network unlocks core functionality of IPFS for the JS ecosystem that removes more of its reliance on a paired Go IPFS node. This will enable more JS engineers to build Full Stack solutions on top of the web3 stack.

## Required resources

#### Effort estimate

- Medium, 3-5 weeks

This should be reasonable to do within a 5-week period given that the DHT protocol is well known. A concerted effort should be made within the first week, owned by DHT protocol experts, to produce a specification for the JS engineers to work from.

#### Roles / skills needed

- DHT protocol expertise; ideally this would result in a finished specification for the existing state of the libp2p DHT.
- js-libp2p expertise; 2-3 JS engineers to implement the DHT spec
- Testground expertise for scale testing