Consensus failure cycling AcceptState -> ValidateState -> RoundChangeState -> AcceptState #248
Comments
Hi! I think resetting the messages for commit and prepare is okay, since future messages are still pending on the message queue and not in the state yet. I am not sure yet about resetting round changes though.
@ferranbt thanks. Yeah, we are still looking into how this can happen and trying to reproduce it. We are certain all the nodes were online and communicating at the time.
Hey @mrwillis, I just wanted to jump in with a quick question. Why do you mention that the max number of faulty nodes is 4 in an 8 node cluster?
@zivkovicmilos sorry, that's a total typo.
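For reference, the fault bound being questioned can be worked out directly. The sketch below assumes the standard Byzantine fault tolerance bound n >= 3f + 1 and a 2f + 1 quorum; the function names are hypothetical, and the exact threshold used by the SDK at this commit is not stated in this thread.

```go
package main

import "fmt"

// maxFaultyNodes returns the largest f such that n >= 3f + 1.
// This is the usual BFT bound, assumed here rather than taken from the SDK.
func maxFaultyNodes(n int) int {
	if n <= 0 {
		return 0
	}
	return (n - 1) / 3
}

// quorum returns 2f + 1, one common definition of the number of
// matching messages needed to make progress (an assumption).
func quorum(n int) int {
	return 2*maxFaultyNodes(n) + 1
}

func main() {
	n := 8
	fmt.Printf("validators=%d maxFaulty=%d quorum=%d\n", n, maxFaultyNodes(n), quorum(n))
}
```

Under these assumptions an 8 node cluster tolerates 2 faulty nodes and needs 5 matching messages, so 7 online nodes should be enough to reach quorum.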
Hey @mrwillis, thank you for your input! In this situation, the code in … I have addressed this issue in PR #263 by limiting this value to a maximum of 5 minutes. Issues #245 and #261 also seem to describe the same problem and should be solved by this change.
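The comment above does not preserve which value PR #263 caps. Assuming it is a round timeout that grows exponentially with the round number (an assumption, not confirmed in this thread), a capped version could look like the sketch below. The names `baseTimeout`, `maxTimeout`, and `roundTimeout` and the 10 second base are hypothetical; only the 5 minute cap comes from the comment.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseTimeout = 10 * time.Second // hypothetical base value
	maxTimeout  = 5 * time.Minute  // cap mentioned in the comment
)

// roundTimeout returns baseTimeout + 2^round seconds, capped at maxTimeout.
func roundTimeout(round uint64) time.Duration {
	// 2^9 s = 512 s already exceeds the cap, so return early.
	// This also avoids overflowing the shift for very large rounds.
	if round >= 9 {
		return maxTimeout
	}
	timeout := baseTimeout + time.Duration(uint64(1)<<round)*time.Second
	if timeout > maxTimeout {
		return maxTimeout
	}
	return timeout
}

func main() {
	for _, r := range []uint64{0, 4, 8, 9, 30} {
		fmt.Printf("round=%d timeout=%s\n", r, roundTimeout(r))
	}
}
```

Without a cap, a node that has cycled through many rounds would wait far longer than its peers before timing out again, which is one way nodes could stay out of step with each other.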
Consensus failure cycling AcceptState -> ValidateState -> RoundChangeState -> AcceptState
Description
Our testnet at the time was an 8 node cluster with 7 nodes online. We experienced a consensus failure: nodes could not agree on a shared state and hence could not produce blocks. Max faulty nodes is thus 4.
We're unsure why this happens. We're digging more into it, but one thing we can't understand is why all the round messages are reset every time the node enters the AcceptState: https://github.com/0xPolygon/polygon-sdk/blob/6c91309a4f633c5e359b7b76ae383aca43a51e5c/consensus/ibft/ibft.go#L529
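To make the concern concrete, here is a toy model of per-round message storage with a reset. It is only a sketch: the type `roundMessages` and its methods are hypothetical and do not reproduce the SDK's actual state struct.

```go
package main

import "fmt"

// roundMessages is a toy stand-in for a node's per-round message storage.
type roundMessages struct {
	// roundChange maps a round number to the set of senders seen for it.
	roundChange map[uint64]map[string]struct{}
}

func newRoundMessages() *roundMessages {
	return &roundMessages{roundChange: make(map[uint64]map[string]struct{})}
}

// addRoundChange records a round change message and returns how many
// distinct senders have been seen for that round so far.
func (m *roundMessages) addRoundChange(round uint64, sender string) int {
	senders, ok := m.roundChange[round]
	if !ok {
		senders = make(map[string]struct{})
		m.roundChange[round] = senders
	}
	senders[sender] = struct{}{}
	return len(senders)
}

// count returns the number of distinct senders recorded for a round.
func (m *roundMessages) count(round uint64) int {
	return len(m.roundChange[round])
}

// reset drops every stored message, as a full reset on state entry would.
func (m *roundMessages) reset() {
	m.roundChange = make(map[uint64]map[string]struct{})
}

func main() {
	msgs := newRoundMessages()
	msgs.addRoundChange(1, "validator-1")
	msgs.addRoundChange(1, "validator-2")
	msgs.addRoundChange(1, "validator-2") // duplicate sender is not counted twice
	fmt.Println("before reset:", msgs.count(1))
	msgs.reset()
	fmt.Println("after reset:", msgs.count(1))
}
```

In this model, any round change messages already counted before the reset have to be received again before a threshold can be met, which is the behaviour being asked about.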
Your environment
develop
Steps to reproduce
Expected behaviour
Consensus failure should not occur.
Actual behaviour
Consensus failure occurred and nodes were unable to write blocks.
Logs
I have only included the logs from Validators 1-4 here because 5, 6, and 7 were not as verbose but had identical behaviour. We added custom logging and will continue to do so. The "roundchange state received msg: roundLength=" log line counts the number of round change messages received at a particular round.
Validator 1:
Validator 2:
Validator 4:
Proposed solution
If you have an idea of how to fix this issue, please write it down here, so we can begin discussing it