bug: underlying websocket connection leak #1186
Comments
Posting my original investigation here:
Every time we accept on fd number 11, the returned fd gets stuck
The fd gets released correctly
Last sign of life we get for each stuck fd
Websockets stop responding after some time due to socket leak: waku-org/nwaku#1186 Signed-off-by: Jakub Sokołowski <jakub@status.im>
Added restarts every 6 hours with a random 3-hour stagger to mitigate this temporarily.
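For readers unfamiliar with staggered restarts, here is a minimal sketch of how such a schedule could be computed. It assumes the stagger is a uniformly random delay of up to 3 hours added on top of the 6-hour interval; the actual fleet configuration is not shown in this thread, and the names `BASE_INTERVAL_S`, `MAX_STAGGER_S` and `next_restart_delay` are hypothetical.

```python
import random

# Assumed values taken from the comment above: a restart every 6 hours,
# plus a random stagger of up to 3 hours so nodes do not restart together.
BASE_INTERVAL_S = 6 * 3600
MAX_STAGGER_S = 3 * 3600


def next_restart_delay(rng=random):
    """Return the number of seconds to wait before the next restart.

    `rng` is any object with a `uniform(a, b)` method, for example the
    `random` module itself or a seeded `random.Random` instance.
    """
    return BASE_INTERVAL_S + rng.uniform(0, MAX_STAGGER_S)


if __name__ == "__main__":
    delay = next_restart_delay()
    print(f"next restart in {delay / 3600:.2f} hours")
```

The point of the random component is that nodes in the same fleet are unlikely to restart at the same moment, so some of them should remain reachable while others restart.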
waku-org/nwaku#1186 Signed-off-by: Jakub Sokołowski <jakub@status.im>
With many of the nim-libp2p/websock team participating at Devcon, the fix is targeted for the beginning of December (and release ...
Okay, then if there is no fix in sight I will simply disable the Consul Websocket healthchecks, because I don't want to get pinged for a known issue that is not currently being researched: status-im/infra-role-nim-waku@b9868446
Status update: So again, somewhere in the stack this is not handled properly. Investigation will continue next week.
@Menduist: Would it be worth merging and releasing the PRs above so that we can improve the status quo?
@jakubgs what is the restart frequency at the moment for status.prod and status.test please? I tried to check the config but was not able to work it out.
@fryorcraken It's literally in the second comment on this issue, with commits: #1186 (comment)
Update:
Merged libp2p & websock PR, now waiting on status-im/nim-chronos#330 |
@Menduist are you aware of any progress in merging the underlying fixes?
No :/
Problem
An underlying bug is causing some incoming websocket connections to "leak", resulting in an increase in open file descriptors. It's not yet clear whether this is related to the nim-libp2p, nim-chronos or nim-websock libraries, so this tracking issue will be linked to the underlying issue and relevant repo once it's created.
Impact
Eventually a node exhibiting this leak will reach the maximum number of open file descriptors. This could lead to:
To reproduce
This has been observed on the status.prod and wakuv2.test fleets. The status.test fleet, which runs the same version of nwaku, has not shown the same leak. This is likely because the leak is triggered by very specific (possibly erratic) websocket behaviour exhibited by client(s) connected only to the fleets where the leak was detected.
Expected behavior
Connections should close properly and the corresponding file descriptor should be released in all cases.
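As a generic illustration of "released in all cases" (not the nwaku fix, and not Nim code), the sketch below shows the usual pattern of closing a connection in a `finally` block so the descriptor is released even when the handler raises. The names `handle_connection` and `handler` are hypothetical and exist only for this example.

```python
import socket


def handle_connection(conn, handler):
    """Run `handler(conn)` and always close `conn` afterwards.

    The `finally` clause runs on normal return and on any exception,
    so the underlying file descriptor is released on every code path.
    """
    try:
        handler(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    def failing_handler(conn):
        # Simulates a client that misbehaves mid-session.
        raise ConnectionError("peer went away")

    a, b = socket.socketpair()
    try:
        handle_connection(a, failing_handler)
    except ConnectionError:
        pass
    # A closed Python socket reports -1 as its file descriptor.
    print("fd after failure:", a.fileno())
    b.close()
```

A leak of the kind described in this issue would correspond to some path through the stack on which the equivalent of `conn.close()` is never reached.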
Screenshots/logs
7 days of open file descriptor count on status.prod
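A count like the one above could be sampled roughly as follows. This is a sketch under assumptions: it is Linux-only (it reads `/proc/<pid>/fd`), it is not the monitoring actually used on the fleets, and the helper names `count_open_fds` and `baseline_growth` are made up for this example. The idea behind `baseline_growth` is that a leak shows up as a rising floor: the minimum count in recent samples is higher than the minimum in early samples.

```python
import os


def count_open_fds(pid="self"):
    """Count open file descriptors of a process (Linux only).

    Each entry in /proc/<pid>/fd is one open descriptor. When reading
    our own process, the count may include the descriptor used to list
    the directory itself.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def baseline_growth(samples, window):
    """Return how much the floor of the fd count has risen.

    Compares the minimum of the first `window` samples with the minimum
    of the last `window` samples. A steadily positive value suggests
    descriptors are not being released.
    """
    if window <= 0 or len(samples) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    return min(samples[-window:]) - min(samples[:window])


if __name__ == "__main__":
    if os.path.isdir("/proc/self/fd"):
        print("open fds for this process:", count_open_fds())
    # Hypothetical samples, for illustration only.
    print("baseline growth:", baseline_growth([10, 12, 11, 15, 14, 18], 3))
```

Comparing minima rather than raw values keeps short bursts of legitimate connections from being mistaken for a leak.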
nwaku version/commit hash
latest master on 2022/09/21: commit 11832411
cc @Menduist