Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record empty responses when retrying a peer task #5509

Conversation

fab-10
Copy link
Contributor

@fab-10 fab-10 commented May 29, 2023

PR description

AbstractRetryingPeerTask has a way to understand if the result of a try is empty, but it only uses this information to discriminate if the result is a partial one, instead is very useful to also use the emptiness information to demote the peer and eventually disconnect it in case it sends too many useless responses, as done in this PR.
In the making of this PR, I discovered that there were opportunities to improve the code and simplify the writing of retrying tasks, so I refactored and documented the code so that any class extending AbstractRetryingPeerTask should not set the final task result by themself, but instead implements the emptyResult and successfulResult to report the status of the request, so that the final setting of the task result is always a duty of AbstractRetryingPeerTask, removing the different approaches used before.

relates to #5415 and #5271

Tests

  • Checkpoint Sync
  • Snap Sync
  • Fast Sync
  • Full sync

Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
@github-actions
Copy link

github-actions bot commented May 29, 2023

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I have considered running ./gradlew acceptanceTestNonMainnet locally if my PR affects non-mainnet modules.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

@fab-10 fab-10 force-pushed the demote-peer-with-empty-responses-when-retrying branch from da95d50 to 063c129 Compare May 29, 2023 14:28
Signed-off-by: Fabio Di Fabio <fabio.difabio@consensys.net>
@fab-10 fab-10 force-pushed the demote-peer-with-empty-responses-when-retrying branch from 063c129 to a20c5ed Compare May 29, 2023 16:16
@fab-10 fab-10 marked this pull request as ready for review May 29, 2023 16:38
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice refactor. one q on a change from getMaxPeers to getPeerLowerBound

@@ -136,7 +125,7 @@ private void refreshPeers() {
// If we are at max connections, then refresh peers disconnecting one of the failed peers,
// or the least useful

if (peers.peerCount() >= peers.getMaxPeers()) {
if (peers.peerCount() >= peers.getPeerLowerBound()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be upperBound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to force Besu to actively search for new peers, and my understanding is that this happens when the number of peers is below the lower bound

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a comment (or edit the existing comment) because the code is now doing something slightly different

@@ -90,24 +89,21 @@ protected CompletableFuture<List<BlockHeader>> executeTaskOnCurrentPeer(final Et
referenceHash,
peer,
peerResult.getResult());
if (peerResult.getResult().isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this change the behavior of the code that could listen to the exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this exception was intercepted by AbstractRetryingSwitchingPeerTask as a retryable error, but since now the emptiness check is centralized in AbstractRetryingPeerTask there is not more need for this custom handling here

if (!isEmptyResponse.test(peerResult)) {
retryCount = 0;
if (successfulResult(peerResult)) {
result.complete(peerResult);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how it worked before without this success part. Is this something that is done automatically and that we don't really need?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before each extending task had to set the final result by itself, using result.complete, and another thing that was a bit confusing to me, is that the returned value of a task is not the final result until you set result.complete and so I tried to centralize this part here and make the writing of extending tasks easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea

Copy link
Contributor

@matkt matkt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Waiting for the test results

@siladu
Copy link
Contributor

siladu commented Jun 1, 2023

SGTM. Do we demote the peer as soon as we get one empty response from it? If so, are there not cases where we receive an empty result but still want to retry with the same peer?


@Override
protected boolean emptyResult(final AccountRangeMessage.AccountRangeData data) {
return data.accounts().isEmpty() && data.proofs().isEmpty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty response for SnapProtocol/v1 is not necessarily a reason to demote a peer. If we are requesting a range that is outside of the 128 block snap range, an empty response is in-protocol. We should add additional criteria to only demote peers that give empty range that is withing 128 blocks of head.

Otherwise we might end up with snap sync performance regression by dropping peers from which we ask for old ranges (while we have an old pivot block)


@Override
protected boolean emptyResult(final StorageRangeMessage.SlotRangeData peerResult) {
return peerResult.proofs().isEmpty() && peerResult.slots().isEmpty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, an empty response might be a signal that we are asking for old ranges.

@macfarla
Copy link
Contributor

Blocked by #6609 - without that we would disconnect peers, because we are trying peers that do not serve snap data.

@fab-10
Copy link
Contributor Author

fab-10 commented Sep 11, 2024

@Matilda-Clerke his could be superseded by #7602 too

@Matilda-Clerke
Copy link
Contributor

@fab-10 I haven't included any feedback regarding peer performance or quality in the refactor yet, so it should be possible to ensure we provide feedback for empty responses too. I'll take a good look at this PR to make sure I understand the intended behaviour.

@jframe jframe closed this Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants