Don't commit entries from previous terms by counting replicas + noop entry upon leader election #336
Conversation
Codecov Report
@@ Coverage Diff @@
## master #336 +/- ##
==========================================
+ Coverage 84.12% 84.19% +0.07%
==========================================
Files 50 50
Lines 9416 9460 +44
Branches 2500 2515 +15
==========================================
+ Hits 7921 7965 +44
+ Misses 940 936 -4
- Partials 555 559 +4
Converted to draft to optimize the vote counting a bit first.
Added an optimization; I need to test it a bit more to feel comfortable releasing it for review. In short, raft will only try to commit log entry indices that are smaller than or equal to the largest `match_index` of another voter in the cluster (except in some corner cases). This fixes a case where the loop that probes for new commit indices would run needlessly, e.g. 1000 times when the leader's match index is 5000 and the next voter's match index is only 4000. In that case it only makes sense to start looking for a new commit index from 4000 and below.
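The bound described above can be sketched as follows. This is a minimal standalone illustration with a hypothetical name (`probe_start_index`), not the library's actual implementation: no entry beyond the highest `match_index` of another voter can possibly reach a majority, so the probe only needs to start there and work downwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t raft_index;

/* Hypothetical flattened view of a cluster: one match_index per voter,
 * where slot `self` is the leader's own (usually highest) index. */
static raft_index probe_start_index(const raft_index *match_index,
                                    size_t n_voters,
                                    size_t self)
{
    raft_index start = 0;
    size_t i;
    for (i = 0; i < n_voters; i++) {
        if (i == self) {
            continue; /* the leader's own index alone can't form a majority */
        }
        if (match_index[i] > start) {
            start = match_index[i];
        }
    }
    return start;
}
```

With the leader at 5000 and the next voter at 4000, the probe starts at 4000 instead of 5000, saving roughly a thousand loop iterations in the example above.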
Force-pushed from 5cc0bed to 4031e23
The overhaul of the …
src/replication.c
Outdated
if (index <= r->commit_index) {
    return;
/* In a single voter cluster where this node is part of the configuration,
 * is a voter all stored entries can be committed. */
I think `is a voter` can be dropped.
Thanks, will investigate if it can be dropped.
> Thanks, will investigate if it can be dropped.
Not sure it's clear: I mean that the words `is a voter` can be dropped from the comment (probably they are a leftover? The full comment doesn't quite sound grammatically correct to me).
Ah okay, yes the sentence indeed looks a bit strange.
tests also still pass when I remove `r->configuration.servers[server_index].role == RAFT_VOTER` from the check, I will investigate anyway to make sure my understanding is correct.
> tests also still pass when I remove `r->configuration.servers[server_index].role == RAFT_VOTER` from the check, I will investigate anyway to make sure my understanding is correct.
I was a bit suspicious about that condition too, but didn't mention it actually. I would not expect that cases where there is a single server and that server is not a voter are possible, because otherwise there would be no way to make progress and commit further entries. So that condition could be turned into an assertion. However, I'm just hand-waving, not sure.
The check actually says that there is 1 voter in the cluster, not that there is a single server. Checking for `r->configuration.servers[server_index].role == RAFT_VOTER` then means that this server was not demoted to standby or spare in the latest configuration and is busy committing the configuration change.
Ah yeah sorry, I misread the `configurationVoterCount` call, that's the number of voters indeed, not servers. Makes sense then.
Ok, I double-checked and I think the condition is fine. The test still succeeds after removing the `RAFT_VOTER` condition because the test is less strict then, and the `start_index` is chosen to be `last_stored` on the current leader, which will be the highest possible index that can be committed anyway.
Looks good to me at first sight. Just a few minor nits.
Force-pushed from 4031e23 to 315bafb
Force-pushed from 85c397c to e35e216
src/replication.c
Outdated
 * of matchIndex[i] >= N, and log[N].term == currentTerm:
 * set commitIndex = N (5.3, 5.4).
 */
start_index = replicationCalculateCommitIndex(r);
I just thought of a different approach here: sort the nodes' match indices from highest to lowest, then check for a majority at each of those indices only. Whatever the highest index we can commit is, it's going to be equal to the match index `i` of some node `n`, such that for `i` we have a majority but for `i + 1` we don't (because we lose `n`). That way the number of runs of the loop body below is bounded by the number of nodes, instead of the length of the log. What do you think?
Actually, it'll always be the median (rounded up) of that sorted array of match indices, right? So no looping at all.
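The median observation above can be sketched like this. A standalone toy (the name `majority_match_index` is invented, this is not raft's code): sort the match indices in descending order and take the element at 0-based position `n/2`, which is exactly the highest value that `n/2 + 1` voters (a majority) have reached. Note that per the Raft safety rules, this index may still only be *committed* if the entry's term is the current term.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t raft_index;

/* qsort comparator: descending order of raft_index values. */
static int cmp_desc(const void *a, const void *b)
{
    raft_index x = *(const raft_index *)a;
    raft_index y = *(const raft_index *)b;
    return x < y ? 1 : (x > y ? -1 : 0);
}

/* Highest index stored on a majority of voters, with no looping over
 * the log at all: the (n/2 + 1)-th highest match index. */
static raft_index majority_match_index(const raft_index *match_index,
                                       size_t n_voters)
{
    raft_index *sorted = malloc(n_voters * sizeof *sorted);
    raft_index result;
    assert(sorted != NULL);
    memcpy(sorted, match_index, n_voters * sizeof *sorted);
    qsort(sorted, n_voters, sizeof *sorted, cmp_desc);
    result = sorted[n_voters / 2];
    free(sorted);
    return result;
}
```

For match indices {5000, 4000, 3000} this yields 4000: two of the three voters (a majority) have stored entry 4000, and no higher index has majority support.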
Yeah, I think it's worth it to optimize this, let me think about it for a bit.
I actually think we can keep the current (what's on master) implementation of `replicationQuorum`, while adding the check to not commit entries by counting if the term is older than the current term, on the condition that we call it here

Line 754 in faadebb
replicationQuorum(r, r->last_stored);

with `last_index` instead of `r->last_stored`. `last_index` is equal to the `match_index` of the server from which we have just received an `AppendEntriesResult` RPC (due to

Line 722 in faadebb
if (!progressMaybeUpdate(r, i, last_index)) {

). This works because a new `commit_index` is always equal to some server's `match_index`, and all `match_index` updates trickle down to the `replicationQuorum` call. Unless this call

Line 746 in faadebb
rv = triggerActualPromotion(r);

fails, in which case we're `RAFT_UNAVAILABLE` anyway.
It's a much simpler solution than the one currently proposed in this PR (and it still passes the new tests). Let me know what you think.
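The simplified approach described above can be illustrated roughly like this. This is a standalone sketch with invented names (`quorum_try_commit`, `struct voter`), not the actual `replicationQuorum`: try to commit an index we just learned is some server's `match_index`, but refuse to do so by counting replicas when the entry is from an older term (Raft §5.4.2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t raft_index;
typedef uint64_t raft_term;

struct voter {
    raft_index match_index;
};

/* Returns 1 and advances *commit_index if `index` can be committed:
 * it must be past the current commit index, its entry must belong to
 * the current term, and a majority of voters must have stored it.
 * log_terms[i] holds the term of the entry at index i + 1. */
static int quorum_try_commit(const struct voter *voters, size_t n_voters,
                             raft_term current_term,
                             const raft_term *log_terms,
                             raft_index *commit_index,
                             raft_index index)
{
    size_t votes = 0;
    size_t i;
    if (index <= *commit_index) {
        return 0;
    }
    if (log_terms[index - 1] != current_term) {
        return 0; /* never commit old-term entries by counting replicas */
    }
    for (i = 0; i < n_voters; i++) {
        if (voters[i].match_index >= index) {
            votes++;
        }
    }
    if (votes > n_voters / 2) {
        *commit_index = index;
        return 1;
    }
    return 0;
}
```

Because every new commit index equals some server's `match_index`, calling this on each `match_index` update as it arrives never misses a committable index.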
That reasoning seems sound to me, and I like the simplification. I guess that here

Lines 520 to 521 in 0ddb329
/* Check if we can commit some new entries. */
replicationQuorum(r, r->last_stored);

we'd still use `r->last_stored`, right?
> That reasoning seems sound to me, and I like the simplification. I guess that here (Lines 520 to 521 in 0ddb329) we'd still use r->last_stored, right?

yes
The callback could contain logic that depends on the value of last_applied, which logically should have increased after applying the log entry, but before calling the callback.
Add test to ensure the commit_index is updated when a leader continually receives apply requests. The test fails without the fix in this commit. Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Adapt the tests to take into account the added barrier entries in the log.
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
Signed-off-by: Mathieu Borderé <mathieu.bordere@canonical.com>
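The `last_applied` ordering fixed by the commits above can be sketched as follows. A toy standalone illustration with invented names (`fire_applied`, `struct apply_req`), not raft's actual code: the state is advanced *before* the callback runs, so callback logic that reads `last_applied` sees the post-apply value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t raft_index;

struct apply_req;
typedef void (*apply_cb)(struct apply_req *req, raft_index last_applied);

struct apply_req {
    raft_index index; /* index of the entry this request applied */
    apply_cb cb;
};

struct server_state {
    raft_index last_applied;
};

/* Advance last_applied first, then invoke the callback, so any logic
 * inside the callback (e.g. a barrier scheduling another barrier)
 * observes the increased value. */
static void fire_applied(struct server_state *s, struct apply_req *req)
{
    s->last_applied = req->index; /* update first... */
    if (req->cb != NULL) {
        req->cb(req, s->last_applied); /* ...then notify */
    }
}

/* Test helper: records the last_applied value seen by a callback. */
static raft_index observed_last_applied;
static void record_cb(struct apply_req *req, raft_index last_applied)
{
    (void)req;
    observed_last_applied = last_applied;
}
```

With the old ordering (callback before the update), `record_cb` would observe a stale value, which is the dqlite barrier-within-a-barrier bug this PR fixes.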
Force-pushed from e35e216 to b4bdc8a
Implemented the simplification described here. Removed the test that checks that we don't commit entries from previous terms by counting votes, as it was really testing nothing anymore after this change (it was hacky anyway), and from the code it's clear we can't commit entries from an older term by counting votes.
It's surely simpler :)
An entry from a previous term will no longer be committed by counting votes (which violates the raft spec); instead, upon leader election a leader applies a noop barrier entry to its log. When committing the noop barrier entry, all previous entries in the log will be committed.
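The noop-on-election mechanism can be sketched as follows. A standalone toy with invented names (`append_election_barrier`, `struct leader`), not raft's implementation: the barrier carries the leader's own term, so it may safely be committed by counting replicas, and since the commit index covers the whole log prefix, committing it commits every earlier (old-term) entry with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t raft_index;
typedef uint64_t raft_term;

enum entry_type { ENTRY_COMMAND, ENTRY_BARRIER };

struct entry {
    raft_term term;
    enum entry_type type;
};

#define LOG_CAP 64

struct leader {
    raft_term current_term;
    struct entry log[LOG_CAP]; /* log[i] holds the entry at index i + 1 */
    size_t n_entries;
};

/* On winning an election, append a no-op barrier entry stamped with the
 * new leader's term and return its index. */
static raft_index append_election_barrier(struct leader *l)
{
    struct entry noop;
    noop.term = l->current_term;
    noop.type = ENTRY_BARRIER;
    assert(l->n_entries < LOG_CAP);
    l->log[l->n_entries++] = noop;
    return (raft_index)l->n_entries;
}
```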
The noop entry upon leader election is needed because raft, on startup, doesn't know if its latest configuration loaded from disk is committed or not; therefore it must set `configuration_uncommitted_index` (to be added in a following PR) and will reject any configuration changes requested by the client. To clear `configuration_uncommitted_index`, the entry at that index must be committed. It's also needed when a follower becomes leader while still having a non-0 `configuration_uncommitted_index`.

This also includes a fix to update `last_applied` before firing an applied request's callback. The callback could contain logic dependent on `last_applied`, which should have logically increased after applying the entry, but before firing the callback. This fixes an issue in dqlite where a barrier could fire another barrier in its callback.

Fixes #220.
Note: this will break the dqlite tests, but not dqlite's functionality. I've updated raft's version so dqlite's configure step can test against it.