Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future Improvements for ConnectionManager #1189

Closed
6 of 8 tasks
danisharora099 opened this issue Feb 20, 2023 · 9 comments
Closed
6 of 8 tasks

Future Improvements for ConnectionManager #1189

danisharora099 opened this issue Feb 20, 2023 · 9 comments
Assignees
Labels
blocked This issue is blocked by some other work enhancement New feature or request

Comments

@danisharora099
Copy link
Collaborator

danisharora099 commented Feb 20, 2023

With #1135, the ConnectionManager module was introduced.

This issue intends to track future/iterative work around the same.

TODO:

@danisharora099 danisharora099 self-assigned this Feb 20, 2023
@fryorcraken fryorcraken added this to Waku Feb 20, 2023
@danisharora099 danisharora099 removed their assignment Feb 20, 2023
@danisharora099 danisharora099 added the enhancement New feature or request label Feb 20, 2023
@fryorcraken
Copy link
Collaborator

Note on peer:disconnect: could the keep alive manager help detect dropped connections thanks to the ipfs/ping protocol?

add regular logs for available/connected nodes

would be good to understand the needs. Talk to current platforms. Is the need mainly to debug/dog food peer exchange?

@danisharora099
Copy link
Collaborator Author

danisharora099 commented Mar 15, 2023

look into peer:disconnect issues

Steps taken through waku-org/examples.waku.org#219:

  • ran an nwaku node locally
  • connected browser to the nwaku node
  • setup the peer:disconnect event listener & ran recurring pings with that node

Findings:

  • Upon going offline from the devtools while the node is running, pings start to fail but the connection is not lost nor is the peer:disonnect event triggered.

    • The nwaku node still shows the peer as connected
    • Upon going online from browser again, connection is re-established and starts to ping as normal
    • Note: no peer-disconnect event is emitted during this process
    • TODO: to check how long does it take since the pings start to fail for the connection to get completely destroyed and peer:disconnect is emitted
      • update: left it on offline for 5+ mins without a disconnection and was able to successfully connect back on going online again
  • Upon closing the node when the connection is established, the peer:disconnect event is straightaway emitted

Imo, according to the above findings, it is safe to keep the code as it is -

  • using peer:disconnect to detect proper disconnections and only then stopping the (keep-alive) pings
  • using pings in place as it is (while not sure what purpose they serve right now, but can be extended into displaying online/offline status for a node if exposed/necessary)
    • also looking into if IPFS pings can replace the manual relay pings

cc @fryorcraken

@danisharora099
Copy link
Collaborator Author

danisharora099 commented Mar 15, 2023

confirm if the relay keep-alive pings are still needed

According to libp2p/js-libp2p#744:

👍 In the majority of cases a ping on that connection should suffice, but we'll need to test this on the different transports. This is also really important for remote listening (webrtc-star, relay, etc).

by this logic, it should be okay to only have the libp2p ping and remove the relay but as mentioned, it needs to be tested against different transports.

  • TODO: try to reproduce the connection timings out without the relay & ping protocols
    • once reproduced, add ping protocol to check if it resolves

@fryorcraken
Copy link
Collaborator

All good findings.

You can disable relay ping in the API.
I'd use eth-pm example, disable relay ping and check that relay still works 10 minutes in.
If it does, then removing relay ping seems reasonable.

I agree with using ping to detect disconnection. Maybe we could emit an event upon ping failure to start going in this direction. And also an event upon ping reconnection.

This can then be the basis to have automated behaviour such as checking filter subscriptions or attempting store query after reconnection

@danisharora099 danisharora099 moved this from To Do to In Progress in Waku Mar 21, 2023
@danisharora099
Copy link
Collaborator Author

danisharora099 commented Apr 3, 2023

You can disable relay ping in the API. I'd use eth-pm example, disable relay ping and check that relay still works 10 minutes in. If it does, then removing relay ping seems reasonable.

Steps taken with the above investigation:

  • Running an nwaku node locally with relay:true
  • Disabling relay pings in js-waku
  • Dialing eth-pmto the locally spun nwaku node

Findings: Disabling relay pings (along with keeping libp2pPing) the connection only stayed alive for ~10 minutes.
Screenshot 2023-04-03 at 2 39 03 PM
Screenshot 2023-04-03 at 2 38 48 PM

Conclusion: Relay pings should be kept as it is


As for:

I agree with using ping to detect disconnection. Maybe we could emit an event upon ping failure to start going in this direction. And also an event upon ping reconnection.

As mentioned here:

Imo, according to the above findings, it is safe to keep the code as it is -

  • using peer:disconnect to detect proper disconnections and only then stopping the (keep-alive) pings
  • using pings in place as it is (while not sure what purpose they serve right now, but can be extended into displaying online/offline status for a node if exposed/necessary)

I believe it's best to watch for peer:disconnect to detect permanent disconnections.
Failure of pings can be a good way to deduce if the connection is temporarily lost, with a potential of reconnection on the same connection.
However, when a peer:disconnect event is fired, it can be deduced that the base connection has been closed and in order to reconnect, a new connection would have to be opened.

@fryorcraken

@fryorcraken
Copy link
Collaborator

Thanks for the investigation. This is annoying because we basically spam the network with relay ping messages when they don't need to be relayed.

Next step I suggest: open an issue to investigate why we need relay pings and understand if there's any other way to do it.
Surely, if after 10min a connection or stream close, the software should try to re-open it.

after 10min, does message sending AND receiving start failing or does only receiving fails?

@danisharora099
Copy link
Collaborator Author

Thanks for the investigation. This is annoying because we basically spam the network with relay ping messages when they don't need to be relayed.

Next step I suggest: open an issue to investigate why we need relay pings and understand if there's any other way to do it. Surely, if after 10min a connection or stream close, the software should try to re-open it.

after 10min, does message sending AND receiving start failing or does only receiving fails?

possibly related with #1403

@danisharora099
Copy link
Collaborator Author

Regarding gauging peer disconnections:
it might be possible that there are improvements done with the 0.45 libp2p upgrade through the connection:close listener along with the peer:disconnect handler: https://github.com/libp2p/js-libp2p/blob/70fe699c424c22f81e39c47631fda73abc1cae9e/doc/migrations/v0.44-v0.45.md?plain=1#L143

@danisharora099
Copy link
Collaborator Author

danisharora099 commented Jun 30, 2023

Closing this issue in favour of separately tracked issues for two items

@github-project-automation github-project-automation bot moved this from In Progress to Done in Waku Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked This issue is blocked by some other work enhancement New feature or request
Projects
Archived in project
Development

No branches or pull requests

2 participants