[WIP] Proposal: DHT in JS for effective content/service discovery #30
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
# Effective content and service discovery in JS | ||
|
||
Authors: @jacobheun | ||
|
||
Initial PR: TBD | ||
|
||
## Purpose & impact | ||
#### Background & intent | ||
_Describe the desired state of the world after this project? Why does that matter?_ | ||
<!-- | ||
Outline the status quo, including any relevant context on the problem you’re seeing that this project should solve. Wherever possible, include pains or problems that you’ve seen users experience to help motivate why solving this problem works towards top-line objectives. | ||
--> | ||
|
||
**Current State** | ||
- JS projects rely on delegate and preload nodes to be able to interact with the live network | ||
- PL-hosted delegate/preload nodes are often overloaded, which negatively impacts the performance of projects | |
@gmasgras this sounds like something that is soluble from an infra perspective. Can you comment on our current preload node uptime/utilization?

The requirement to preload content also makes JS projects very bandwidth heavy - every piece of data I add to a JS node is transferred to a preload node, which is very expensive in terms of transfer.

Using DHT delegates suffers from similar problems, except it's sort of worse - we re-publish every hour so content survives the timed garbage collection, which means re-uploading every block, which scales poorly.

I'm not sure I understand why delegated routing should be slow/expensive or why preload nodes are necessary in non-browser use cases. DHT publishing frequency with delegated routing should match what you'd expect with non-delegated routing, no?

Yes, the frequency should be the same, but as currently implemented with delegated routing, if a block has been garbage collected, you end up sending the data to the server each time for it to create the provider record - the whole DAG, not just the root block, since the delegate has to be able to supply everything on behalf of your node.
They're not, at least, they shouldn't be - if the DHT implementation was complete and performant, which it isn't. This proposal is to finish up the work we did during the hack week and make it so. Then we could turn off preload/delegate for everything that isn't a browser node.

With proper delegated routing shouldn't I be making my own provider records that point at me instead of having ones that point at the preload nodes? E.g. I ask the delegated router for the 20 closest peers then directly send them provider record puts.

Yes, but this is a limitation in the browser, as we will not be able to dial most of the network.

Sure, but my comment was about non-browser nodes not needing to store their data on preload nodes. Browser nodes have a very rough time talking to the rest of the network at the moment, so they need to do a lot more delegating of work. Does anything in this proposal really help the browser node situation?

The preload nodes also garbage collect the content periodically, so over time the content becomes inaccessible.

Sorry if I'm just totally missing the idea here, but why are preload nodes and garbage collection on them relevant in this discussion? Aren't they only really needed if a node is unreachable (e.g. in a browser)?

The point above is that the PL-hosted infra is often overloaded, which is a risk to project functionality. I was making the additional point that even if the nodes were not overloaded, they garbage collect anything uploaded to them, so the IPFS magic of 'add a file to your node, cat it from another node' is time-limited if the only way that content makes it to another node is via a preload node.

I'm confused, this doc states.
Putting data in preload nodes is not required to get NodeJS to play well with the rest of the network. |
||
|
||
vasco-santos marked this conversation as resolved.
|
||
#### Assumptions & hypotheses | ||
_What must be true for this project to matter?_ | ||
<!--(bullet list)--> | ||
|
||
TODO | ||
vasco-santos marked this conversation as resolved.
|
||
|
||
#### User workflow example | ||
_How would a developer or user use this new capability?_ | ||
<!--(short paragraph)--> | ||
|
||
TODO | ||
|
||
#### Impact | ||
TODO | ||
|
||
<!-- | ||
Explain why you have chosen this rating | ||
What awesome potential impact/outcomes/results will we see if we nail this project? | ||
--> | ||
|
||
#### Leverage | ||
TODO | ||
|
||
<!-- Explain the opportunity or leverage point for our subsequent velocity/impact (e.g. by speeding up development, enabling more contributors, etc) | ||
--> | ||
|
||
#### Confidence | ||
_How sure are we that this impact would be realized? Label from [this scale](https://medium.com/@nimay/inside-product-introduction-to-feature-priority-using-ice-impact-confidence-ease-and-gist-5180434e5b15)_. | ||
|
||
<!--Explain why this rating--> | ||
|
||
|
||
## Project definition | ||
#### Brief plan of attack | ||
|
||
<!--Briefly describe the milestones/steps/work needed for this project--> | ||
|
||
The focus of this effort is to build the DHT to function solely in Node.js. Running a DHT in the browser is not currently viable. | |
Here's a stupid question from someone with little exposure to js ipfs or the js ecosystem at large. What is the benefit of building out functionality in node.js as opposed to relying on delegate go-ipfs nodes? Does the value come from having multiple fully functional implementations? Is there a positive impact on in-browser nodes from delegating DHT functionality to a node.js ipfs node instead of a go-ipfs node?

Not a stupid question. Both are viable options with pros/cons, and I think there is merit in both. I will expand more here when I have time to flesh this out, but I think there is significant work that needs to be done improving remote API access to Go nodes, and that should be a separate proposal. There is a lot to be gained there for new developers. The DHT in JS is relatively short term in comparison and immediately improves usability of the JS ecosystem of projects for existing developers (we currently have users leveraging JS and Go in hacky ways to get around some limitations in these systems: flexible IPLD in JS, performant DHT in Go). |
||
|
||
- Week 0 - Complete the DHT specification (DHT protocol expertise) | ||
- Create a test plan in Testground to evaluate query performance | ||
- Implement DHT routing table construction and refresh | ||
- Implement the improved query logic | ||
- Implement Peer Eviction logic | ||
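
To make the routing table and peer eviction items above more concrete, here is a minimal sketch of a Kademlia-style routing table in TypeScript. It is not the js-libp2p API and not the behaviour the (yet to be written) specification will necessarily require: `RoutingTable`, `bucketIndex`, `isAlive` and the bucket size `k = 20` are all hypothetical names and values chosen for illustration, and the eviction rule shown is the classic Kademlia one (only replace the least recently seen peer if it fails a liveness probe). The probe is synchronous here purely to keep the example short; a real one would be an async ping.

```typescript
// Hypothetical sketch; names and policy are assumptions, not the js-libp2p API.
type PeerId = Uint8Array; // assumed: fixed-length key, e.g. a hash of the peer's public key

// XOR distance between two equal-length keys (the Kademlia metric).
function xorDistance(a: PeerId, b: PeerId): Uint8Array {
  if (a.length !== b.length) throw new Error('key length mismatch');
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

// Compare two equal-length byte arrays as big-endian unsigned integers.
function compareBytes(x: Uint8Array, y: Uint8Array): number {
  for (let i = 0; i < x.length; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return 0;
}

// Bucket index = number of leading zero bits in the XOR distance
// (i.e. the length of the common prefix between the two keys).
function bucketIndex(self: PeerId, other: PeerId): number {
  const d = xorDistance(self, other);
  for (let i = 0; i < d.length; i++) {
    if (d[i] !== 0) return i * 8 + (Math.clz32(d[i]) - 24);
  }
  return d.length * 8; // identical keys
}

class RoutingTable {
  // Each bucket is ordered from least recently seen (head) to most recently seen (tail).
  private buckets = new Map<number, PeerId[]>();

  constructor(private self: PeerId, private k = 20) {}

  // Add or refresh a peer. Returns true if the peer is in the table afterwards.
  add(peer: PeerId, isAlive: (p: PeerId) => boolean): boolean {
    const idx = bucketIndex(this.self, peer);
    const bucket = this.buckets.get(idx) ?? [];
    this.buckets.set(idx, bucket);

    const pos = bucket.findIndex((p) => compareBytes(p, peer) === 0);
    if (pos >= 0) {
      bucket.push(...bucket.splice(pos, 1)); // known peer: move to the tail
      return true;
    }
    if (bucket.length < this.k) {
      bucket.push(peer);
      return true;
    }
    // Bucket is full: evict the oldest peer only if it fails the probe.
    if (!isAlive(bucket[0])) {
      bucket.shift();
      bucket.push(peer);
      return true;
    }
    bucket.push(bucket.shift()!); // oldest peer is alive: refresh it, drop the newcomer
    return false;
  }

  // The `count` known peers closest to `key` by XOR distance.
  closest(key: PeerId, count = this.k): PeerId[] {
    return [...this.buckets.values()]
      .flat()
      .sort((a, b) => compareBytes(xorDistance(a, key), xorDistance(b, key)))
      .slice(0, count);
  }
}
```

A query for a key would start from `closest(key)` and iteratively ask those peers for peers that are closer still; that loop, and routing table refresh, are left out of this sketch.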
|
||
|
||
**Assumptions** | ||
- Routing table diversity is not critical and can be implemented later | |
- We can start with "Client Mode"-only support to minimize the surface area of the solution | |
|
||
#### What does done look like? | ||
|
||
- js-ipfs in Node.js ships with the DHT enabled in client mode by default | ||
- js DHT query times are comparable to Go over TCP/Noise connections | ||
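
One possible way to make "comparable" testable, for example in the Testground plan, is to compare latency percentiles from both implementations. The sketch below is only an illustration under stated assumptions: the helper names `percentile` and `isComparable`, the choice of the 50th and 95th percentiles, and the `maxRatio` tolerance of 1.5 are all hypothetical and are not part of this proposal.

```typescript
// Hypothetical sketch: the percentiles and the 1.5x tolerance are assumptions.

// Nearest-rank percentile of a list of query durations (in milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// True if the JS median and p95 are both within `maxRatio` of the Go values.
function isComparable(jsMs: number[], goMs: number[], maxRatio = 1.5): boolean {
  return [50, 95].every(
    (p) => percentile(jsMs, p) <= maxRatio * percentile(goMs, p)
  );
}
```

The samples would need to come from the same test plan, network conditions and transport (TCP/Noise) for the comparison to mean anything.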
|
||
#### What does success look like? | ||
_Success means impact. How will we know we did the right thing?_ | ||
|
||
<!-- | ||
Provide success criteria. These might include particular metrics, desired changes in the types of bug reports being filed, desired changes in qualitative user feedback (measured via surveys, etc), etc. | ||
--> | ||
|
||
#### Counterpoints & pre-mortem | ||
_Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?_ | ||
|
||
#### Alternatives | ||
_How might this project’s intent be realized in other ways (other than this project proposal)? What other potential solutions can address the same need?_ | ||
|
||
#### Dependencies/prerequisites | ||
<!--List any other projects that are dependencies/prerequisites for this project that is being pitched.--> | ||
|
||
#### Future opportunities | ||
<!--What future projects/opportunities could this project enable?--> | ||
|
||
Being able to effectively find content and services on the network unlocks core functionality of IPFS for the JS ecosystem that removes more of its reliance on a paired Go IPFS node. This will enable more JS engineers to build Full Stack solutions on top of the web3 stack. | ||
|
||
## Required resources | ||
|
||
#### Effort estimate | ||
|
||
- Medium, 3-5 weeks | ||
|
||
This should be reasonable to do within a 5-week period given that the DHT protocol is well known. A concerted effort should be made within the first week, owned by DHT protocol experts, to produce a specification for the JS engineers to work from. | |
|
||
#### Roles / skills needed | ||
|
||
- DHT protocol expertise; ideally this would result in a finished specification for the existing state of the libp2p DHT. | |
- js-libp2p expertise; 2-3 JS Engineers to implement the DHT spec | |
- Testground expertise for scale testing |
"JS projects" here is not just the browser, it's node, electron, react native etc. E.g. web, desktop and mobile.