For context: as part of our Aeron Cluster monitoring, we compare snapshots taken at the same log position. This lets us detect state divergence caused by bugs that introduce non-deterministic logic.

From time to time we have detected the consensus module producing different snapshots on different nodes. The difference is in the `nextSessionId` field. After some investigation, I found that `ConsensusModuleAgent#nextSessionId` is updated on the leader at the moment the "session open" message is appended to the log, whereas on the followers it is updated only when they reach that "session open" message.
Consider the following scenario:

1. A snapshot command is issued.
2. The leader adds the snapshot message to the log.
3. A new client connects.
4. The leader increments `nextSessionId` and adds the "session open" message to the log.
5. All nodes reach the snapshot message and take a snapshot. At this point the leader and the followers hold different `nextSessionId` values.
6. The followers reach the "session open" message in the log and increment `nextSessionId`. Now all nodes have the same `nextSessionId`.
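To make the timing concrete, here is a minimal, self-contained model of the scenario above. It is not Aeron code: the classes `Leader` and `Follower`, the `LogEntry` record and the method names are hypothetical and only mimic where each role increments `nextSessionId` (at append time on the leader, at apply time on the follower). It assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the race, NOT Aeron's implementation.
public class SessionIdDivergence {

    enum EntryType { SNAPSHOT, SESSION_OPEN }

    record LogEntry(EntryType type, long sessionId) {}

    // Leader: increments nextSessionId when it APPENDS the session-open entry.
    static final class Leader {
        long nextSessionId = 1;
        final List<LogEntry> log = new ArrayList<>();

        void appendSnapshot() {
            log.add(new LogEntry(EntryType.SNAPSHOT, -1));
        }

        void onNewClient() {
            long id = nextSessionId++; // incremented at append time
            log.add(new LogEntry(EntryType.SESSION_OPEN, id));
        }
    }

    // Follower: increments nextSessionId only when it REACHES the session-open entry.
    static final class Follower {
        long nextSessionId = 1;

        void apply(LogEntry entry) {
            if (entry.type() == EntryType.SESSION_OPEN) {
                nextSessionId = Math.max(nextSessionId, entry.sessionId() + 1);
            }
        }
    }

    /** Replays steps 1-6 and returns {leaderSnapshotValue, followerSnapshotValue}. */
    static long[] replayScenario() {
        Leader leader = new Leader();
        Follower follower = new Follower();

        leader.appendSnapshot(); // steps 1-2: snapshot entry appended
        leader.onNewClient();    // steps 3-4: session-open appended, leader increments

        long leaderSnapshot = -1;
        long followerSnapshot = -1;
        for (LogEntry entry : leader.log) {
            follower.apply(entry);
            if (entry.type() == EntryType.SNAPSHOT) {
                // step 5: both nodes snapshot at the same log position
                leaderSnapshot = leader.nextSessionId;
                followerSnapshot = follower.nextSessionId;
            }
        }
        // step 6: after the session-open entry, the follower has caught up
        if (follower.nextSessionId != leader.nextSessionId) {
            throw new IllegalStateException("still diverged after full replay");
        }
        return new long[] {leaderSnapshot, followerSnapshot};
    }

    public static void main(String[] args) {
        long[] values = replayScenario();
        System.out.println("leader snapshot nextSessionId   = " + values[0]);
        System.out.println("follower snapshot nextSessionId = " + values[1]);
    }
}
```

In this model the leader snapshots `nextSessionId = 2` while the follower snapshots `1` at the same log position, and both converge to `2` once the "session open" entry is applied.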
Is it expected that nodes in the cluster may produce different consensus module snapshots at the same log position? Or should the leader write the same `nextSessionId` value that a follower would?
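One way to picture the second option is a leader that remembers the `nextSessionId` it held when it appended the snapshot entry, since that is the value a follower holds on reaching the same log position. The sketch below is purely hypothetical (the `nextSessionIdAtSnapshotAppend` field and all class names are illustrative, not Aeron API) and assumes only one snapshot is in flight and that `nextSessionId` changes only via "session open" entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Aeron's implementation.
public class FollowerEquivalentSnapshot {

    enum EntryType { SNAPSHOT, SESSION_OPEN }

    record LogEntry(EntryType type, long sessionId) {}

    static final class Leader {
        long nextSessionId = 1;
        long nextSessionIdAtSnapshotAppend = -1; // hypothetical extra field
        final List<LogEntry> log = new ArrayList<>();

        void appendSnapshot() {
            // Capture the counter as of this log position.
            nextSessionIdAtSnapshotAppend = nextSessionId;
            log.add(new LogEntry(EntryType.SNAPSHOT, -1));
        }

        void onNewClient() {
            long id = nextSessionId++; // still incremented at append time
            log.add(new LogEntry(EntryType.SESSION_OPEN, id));
        }
    }

    // The value a follower holds when it reaches the snapshot entry.
    static long followerValueAtSnapshot(List<LogEntry> log) {
        long next = 1;
        for (LogEntry e : log) {
            if (e.type() == EntryType.SNAPSHOT) {
                return next;
            }
            next = e.sessionId() + 1; // follower applies session-open entries in order
        }
        throw new IllegalArgumentException("no snapshot entry in log");
    }

    public static void main(String[] args) {
        Leader leader = new Leader();
        leader.appendSnapshot();
        leader.onNewClient();
        System.out.println("leader (captured) = " + leader.nextSessionIdAtSnapshotAppend);
        System.out.println("follower          = " + followerValueAtSnapshot(leader.log));
    }
}
```

Under these assumptions both values agree, so snapshots taken at the same log position would be byte-for-byte comparable on this field.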