Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-3.5] [Solution 1] Auto sync members in v3store if Islearner is the only field that differs between v2store and v3store #19586

Closed
wants to merge 1 commit into from

Conversation

ahrtr
Copy link
Member

@ahrtr ahrtr commented Mar 13, 2025

Provide a fix for the users who have already been affected by #19557

cc @fuweid @ivanvc @jmhbnz @serathius @siyuanfoundation

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ahrtr
Copy link
Member Author

ahrtr commented Mar 13, 2025

@ivanvc once this PRs get merged and a couple of days workflow/pipeline soak, then we are ready to release 3.5.20.

@fuweid the new Kubeadm style e2e test should pass once 3.5.20 is out.

@ahrtr ahrtr force-pushed the learner_sync_20250313 branch from 01f8a38 to 966938d Compare March 13, 2025 22:00
@ahrtr
Copy link
Member Author

ahrtr commented Mar 13, 2025

Hi folks, please take a look when you get time, thx

@@ -299,6 +299,9 @@ func StartEtcd(inCfg *Config) (e *Etcd, err error) {
zap.Strings("listen-metrics-urls", e.cfg.getMetricsURLs()),
)
serving = true

e.Server.SyncLearnerPromotionIfNeeded()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing cluster membership after the apply loop has been started seems like a bad idea. My expectation was that we shouldn't modify cluster state outside apply loop. Any reason to do that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expectation is good, but not feasible.

I already clarified it in the comment for the method. We must wait for etcd finishes replaying the WAL records.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must wait for etcd finishes replaying the WAL records.

Yes, we can also do that in apply loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can also do that in apply loop.

If we intentionally issue a member promotion request, then only one member should do it. Then we must wait for leader gets elected?

I really hate the leader's privilege mode; it causes some uncertainties, i.e. leader change etc. I agree that doing this outside of apply loop isn't ideal, but actually it's safe for this specific case. And we would never let such code/implementation go into main branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not proposing to do a member promotion request.

Just a local patching of v3store:

  • When bootstrapping read v2 and v3, if there is a unpromoted member in v3 that is promoted in v2, we can mark this member in v3 as promoted. This is correct because it member can only be promoted once and v2 has always older data than v3.
  • To handle Promote Member in WAL, depending on etcd version:
    • For v3.5, it should just work with fixes for PromoteMember. This should work as v2 store is the source of truth.
    • For v3.6, when bootstrapping we read WAL since v2 snapshot, if we find any MemberPromote, just mark it as promoted in v3. This needs to be done before we start raft as v3 store should be source of truth.

Proposed actions are based on assumption (would need to confirm as I might remember it wrongly), that in v3.6 the v3 store is source of truth of membership even when raft is just started and even during replaying of WAL entries between v2 snapshot and v3 consistent index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash only is calcuated based on key space.

pls review #19606 instead, let's not waste time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also data inconsistence has nothing to do with this PR (and #19606).

mix-version will lead to member ship data inconsistence, but it will converge once upgrading to 3.5.20+

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pinged your guys in slack, also clearly mentioned that preferred to #19606. It's really confused why you still review this PR?

If you insist on reviewing this PR, as least you should state why you prefer to this PR firstly, instead of wasting time to still review it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you had closed this PR, then that would not have result in the confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already clarified in above comment. What confusion do you still have?

again, let's focus on #19606

v2Members, _ := membersFromStore(c.lg, c.v2store)
v3Members, _ := membersFromBackend(c.lg, c.be)

for id, v2Member := range v2Members {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For v3.6 we care about v3membership information. If for some reason v2 and v3 has different members (this is what happen in previous attempt to make v3 authoritative), it's safer to check v3 members.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is for 3.5 instead of 3.6. In 3.5, v2store is the source of truth. Also we are syncing from v2store to v3store.

Also note that we need to check both v2 and v3store, the member ID must match. Not sure what's your concern here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we resolve this comment?

clonedV3Member := v3Member.Clone()
sort.Strings(clonedV3Member.PeerURLs)

if reflect.DeepEqual(clonedV2Member.RaftAttributes, clonedV3Member.RaftAttributes) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just compare IsLearner? We don't care about other information

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RaftAttribute includes both peerURLs and IsLearner, both of them should always match between v2 and v3store. I think it's more prudent to double check peerURL matches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we resolve this comment?

@ahrtr ahrtr force-pushed the learner_sync_20250313 branch 2 times, most recently from ecc4da2 to cc73f83 Compare March 14, 2025 15:25
…ers between v2store and v3store

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
@ahrtr ahrtr force-pushed the learner_sync_20250313 branch from cc73f83 to e7e5358 Compare March 14, 2025 15:26
@ahrtr
Copy link
Member Author

ahrtr commented Mar 14, 2025

e2e test updated, PTAL, thx

@ivanvc ivanvc mentioned this pull request Mar 14, 2025
1 task
@ahrtr ahrtr changed the title [release-3.5] Auto sync members in v3store if Islearner is the only field that differs between v2store and v3store [release-3.5] [Solution 1] Auto sync members in v3store if Islearner is the only field that differs between v2store and v3store Mar 14, 2025
@ahrtr ahrtr closed this Mar 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants