-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record empty responses when retrying a peer task #5509
Record empty responses when retrying a peer task #5509
Conversation
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
|
da95d50
to
063c129
Compare
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
063c129
to
a20c5ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice refactor. one q on a change from getMaxPeers to getPeerLowerBound
@@ -136,7 +125,7 @@ private void refreshPeers() { | |||
// If we are at max connections, then refresh peers disconnecting one of the failed peers, | |||
// or the least useful | |||
|
|||
if (peers.peerCount() >= peers.getMaxPeers()) { | |||
if (peers.peerCount() >= peers.getPeerLowerBound()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be upperBound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to force Besu to actively search for new peers, and my understanding is that this happens when the number of peers is below the lower bound
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a comment (or edit the existing comment) because the code is now doing something slightly different
@@ -90,24 +89,21 @@ protected CompletableFuture<List<BlockHeader>> executeTaskOnCurrentPeer(final Et | |||
referenceHash, | |||
peer, | |||
peerResult.getResult()); | |||
if (peerResult.getResult().isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this change the behavior of the code that could listen to the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this exception was intercepted by AbstractRetryingSwitchingPeerTask
as a retryable error, but since now the emptiness check is centralized in AbstractRetryingPeerTask
there is not more need for this custom handling here
if (!isEmptyResponse.test(peerResult)) { | ||
retryCount = 0; | ||
if (successfulResult(peerResult)) { | ||
result.complete(peerResult); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how it worked before without this success part. Is this something that is done automatically and that we don't really need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before each extending task had to set the final result by itself, using result.complete
, and another thing that was a bit confusing to me, is that the returned value of a task is not the final result until you set result.complete
and so I tried to centralize this part here and make the writing of extending tasks easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Waiting for the test results
SGTM. Do we demote the peer as soon as we get one empty response from it? If so, are there not cases where we receive an empty result but still want to retry with the same peer? |
|
||
@Override | ||
protected boolean emptyResult(final AccountRangeMessage.AccountRangeData data) { | ||
return data.accounts().isEmpty() && data.proofs().isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty response for SnapProtocol/v1 is not necessarily a reason to demote a peer. If we are requesting a range that is outside of the 128 block snap range, an empty response is in-protocol. We should add additional criteria to only demote peers that give empty range that is withing 128 blocks of head.
Otherwise we might end up with snap sync performance regression by dropping peers from which we ask for old ranges (while we have an old pivot block)
|
||
@Override | ||
protected boolean emptyResult(final StorageRangeMessage.SlotRangeData peerResult) { | ||
return peerResult.proofs().isEmpty() && peerResult.slots().isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, an empty response might be a signal that we are asking for old ranges.
Blocked by #6609 - without that we would disconnect peers, because we are trying peers that do not serve snap data. |
@Matilda-Clerke his could be superseded by #7602 too |
@fab-10 I haven't included any feedback regarding peer performance or quality in the refactor yet, so it should be possible to ensure we provide feedback for empty responses too. I'll take a good look at this PR to make sure I understand the intended behaviour. |
PR description
AbstractRetryingPeerTask
has a way to understand if the result of a try is empty, but it only uses this information to discriminate if the result is a partial one, instead is very useful to also use the emptiness information to demote the peer and eventually disconnect it in case it sends too many useless responses, as done in this PR.In the making of this PR, I discovered that there were opportunities to improve the code and simplify the writing of retrying tasks, so I refactored and documented the code so that any class extending
AbstractRetryingPeerTask
should not set the final task result by themself, but instead implements theemptyResult
andsuccessfulResult
to report the status of the request, so that the final setting of the task result is always a duty ofAbstractRetryingPeerTask
, removing the different approaches used before.relates to #5415 and #5271
Tests