[release-3.5] Learner promotion isn't persisted to v3store (bbolt) #19557
Could we implement a fix without introducing a series of commands that every user who ever used learners would need to run? For example, during etcd v3.6 bootstrap, compare v2store and v3store, and if there are any unpromoted learners in v3 that are normal members in v2, then override them in v3store?
Thanks for the investigation @ahrtr
I wish we could follow this approach, as it wouldn't add any extra burden on users when upgrading to 3.6. But unfortunately, it won't work because v3store is the sole source of truth in 3.6; even the v2 snapshot is generated from v3store. So before upgrading from 3.5 to 3.6, users need to
So the proposed actions:
I understand how v3.6 is meant to treat v2 state; I'm saying that during bootstrap you still have some wiggle room. When we upgrade etcd v3.5 to v3.6, there will be v2store snapshots on disk generated by v3.5, which can be read and cross-validated with the v3 state.
What's the "wiggle room"? As mentioned above, v3store is usually more up-to-date than v2store, so theoretically, during the very first 3.6 bootstrap, the v2store might be out of date, and we might miss the only chance to sync from v2store to v3store.
Yes, but we will also replay all WAL entries from v2store. They will be skipped for v3store, but we can make an exception for learner promotion. What I'm proposing is hacky, but it should be possible. Please let me know if you think it's too risky to modify the apply loop in that way, but I want to make sure we considered alternatives.
It's too error prone, and it also greatly complicates release-3.6, which we will maintain for a long time.
In terms of the K8s release schedule, it might be safer to release 1.33 with 3.5.20 (which has the fix) instead of 3.6.0. It's also possible to backport 3.5.20 to older versions of K8s and add 3.6.0 in 1.33, but that seems more rushed and risky.
We only add the command in release-3.5.
As mentioned above, users don't necessarily have to execute the commands if they bump to 3.5.20+ anyway.
There is no plan to integrate etcd 3.6.0 with K8s 1.33; I think we are targeting 1.34.
Agreed; to bump the etcd version in K8s we would need to do it at the beginning of a cycle to get enough soak time for K8s to feel safe.
@fuweid @serathius @siyuanfoundation @ivanvc Please see the proposed action in #19557 (comment), and give feedback if you have any further comments, thanks.
I agree with @ahrtr, but for a different reason than the long maintenance time. The root cause of this issue is the inconsistency between v2store and v3store in 3.5, and that is what we really should fix. I would prefer not to complicate the 3.6 code to fix a 3.5 bug.
@fuweid @siyuanfoundation do you have time for a call today?
I agree with Siyuan's point. It should be treated as a 3.5 bug, not a 3.6 one. I also think the less the end user has to do, the better. So, upgrading to 3.5.20 before upgrading to 3.6.0 should be reasonable. If we follow that approach, does that mean we must ship 3.5.20 with K8s 1.33?
I'm concerned more with user experience than with where the bug came from. We should strive to minimize maintenance cost, but we should be careful when this impacts user experience. My proposal came from a concern about writing another 10-page set of upgrade instructions, scaring users away from upgrading and further sharding the ecosystem. In this day and age, I don't think instructions like https://etcd.io/docs/v3.3/upgrades/upgrade_3_5/ should have a place at all. Most users' expectation is that they can just replace the container tag. But we want to require users to run dedicated commands before an upgrade, or risk a complete, unrecoverable cluster shutdown when upgrading to v3.6. Requiring additional steps for an upgrade will just cause a bad user experience, eroding trust.
I think you might be lacking full context; the options are:
Option 1 - Implement resynchronization in v3.5.20. Proposed by @ahrtr, preferred by @siyuanfoundation, @ivanvc.
Downside:
Option 2 - Implement resynchronization in v3.6.0. Proposed by @serathius.
Downside:
Note: the resynchronization code is risky to implement; whether we implement it in v3.5.20 or v3.6 doesn't matter. Both will be non-trivial and require extensive testing.
Why require an upgrade through v3.5.20?
We should not require or assume that user upgrades go through each and every patch release. Many upgrade automations (including the one used in GKE) don't guarantee total control over the exact upgrade path.
Responses to comments
This is a generally valid point.
We are releasing 3.6.0 almost 4 years after 3.5.0. There are already some unavoidable items in the upgrade checklist. Long-term success depends on building a more collaborative and stronger team, and maintaining a regular release cycle (this can be discussed separately).
In my proposal, no workaround in 3.6 is needed.
No, it matters.
New comments
One concern that I have is that this bug was caught on the k/k PR by an upgrade test that runs on demand, manually. The etcd maintainers might have missed this error and merged the upgrade to 3.6.0. Later, the kubeadm periodic upgrade e2e would have failed, and the kubeadm maintainers would have contacted the etcd maintainers with a report. That is all good, but are there any plans to run local etcd upgrade e2e tests on the etcd project side? Perhaps the etcd-operator project could help with that?
@neolit123 we have upgrade & downgrade e2e test cases, but the problem is that we don't have a kubeadm-style upgrade case (adding & promoting learners, upgrading later). We will fill the gap; please see the last action item in the proposed actions in #19557 (comment).
If users are already on 3.5.20+, then all is good; otherwise, there are two options as mentioned in
Now you are proposing to limit this to only the first option. I am not strongly against this; the good side is that it simplifies the users' burden to understand this (it also saves our effort), the bad side is that it removes the flexibility. From the 3.6 perspective, we will have to add some validation on bootstrap, and check (1) whether or not there is any learner and (2) whether the version of any other member is <= 3.5.19, and raise an alarm if so. Is this what you proposed? @serathius Actually I tend not to do it. Reasons:
Please see my answer above.
Looking at bootstrap.go, it looks like the validation for max learners is pretty robust. It checks all paths and should prevent a cluster with multiple learners from starting. Still, I think we should improve the error message by providing context about this issue.
I'm not sure about this path; it seems like there are many edge cases (promotion before the last snapshot, promotion after the last snapshot, the restarted member was promoted, the restarted member was not promoted, the restarted member becomes the leader), yet you just assume it will work the same in all cases and that the user will be able to easily fix it by promotion.
That's why we need to fix it in 3.5 before upgrading to 3.6. The key question for now is whether we should forcibly prevent users from upgrading from 3.5.19 (or older versions) to 3.6. If yes, then we don't need the
Summary
release-3.5
release-3.6: Add validation on bootstrap, and raise an alarm if any other member is on 3.5.19 or an older version.
Raising an alarm has a specific meaning in etcd, and I would be even more worried about adding a new type of alarm and its consequences.
As @serathius mentioned, there are many edge cases to consider, so it is potentially more risky to rely on syncing from v2store in 3.6 in cases where v3store should be the ground truth.
Clarifying:
Kubernetes is not in code freeze until the 21st, per https://k8s.dev/release. However, the 21st is pretty soon.
Anyone, please feel free to add comments. I also added the link in the very first comment.
cc @dims @chaochn47 @jberkus @jmhbnz for visibility
cc @stackbaek |
caused by etcd-io#19557 Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
Update & summary
We were planning to release 3.5.20 and land it in K8s 1.33 before the code freeze, but unfortunately we didn't make it, although I have been working & pushing hard. We will have to land 3.5.20 in 1.33.1. cc @neolit123
We were also planning to release etcd 3.6.0 before KubeCon; we have to postpone it until after KubeCon, because:
release-3.5
release-3.6
We need to add the two e2e tests below:
I also updated the Google doc: https://docs.google.com/document/d/11wkyiK-bDYGOPtFfLGII7iPQxsXoYiq8f6L1fc30S4k/edit?tab=t.0#heading=h.58mxw7gv0jjj
Unfortunately, this isn't correct :( . 3.6 won't be able to automatically fix clusters that are already affected, because v3store is the source of truth in 3.6. cc @fuweid, who confirmed this. But adding validation in 3.6 also has flaws:
Options: we need to provide both options.
See summary: https://docs.google.com/document/d/11wkyiK-bDYGOPtFfLGII7iPQxsXoYiq8f6L1fc30S4k/edit?tab=t.0#heading=h.efgyhz20f9eb
What's the symptom
When upgrading etcd from v3.5.x to v3.6.0-rc.x, the upgrade may fail with "membership: too many learner members in cluster", because etcd allows at most 1 learner by default. Thanks @neolit123 for uploading the initial log.
Which versions are impacted
Note that only release-3.5 has this issue. The issue was introduced in v3.5.1 in #13348. All etcd patch versions in v3.5.1 - v3.5.19 are affected. If you ever added & promoted learner(s) in v3.5.1 - v3.5.19 and try to upgrade to v3.6.0-rc.x, then you will see this issue: "membership: too many learner members in cluster".
What's the root cause
When promoting a learner, the change is only persisted in v2store, not in v3store. The reason is simple: etcd returns errMemberAlreadyExist, because the member (learner) ID already exists. See the 3.5 code below,
etcd/server/etcdserver/api/membership/store.go
Lines 57 to 59 in 1810af3
So the membership data will be inconsistent between v2store and v3store in such a case.
Why we only see this issue when upgrading from 3.5 to 3.6?
In 3.5, the v2store is the source of truth for the membership data. In 3.6, v3store (bbolt) is the source of truth for the membership data. When upgrading from 3.5.x to 3.6, the source of truth changes.
How to reproduce this issue
Manual steps
Note: try the steps using a 3.5.x (x >= 1) binary,
Step 1: start an etcd instance
Step 2: add a learner in another terminal
Step 3: start the learner
Step 4: promote the learner
Step 5: stop both etcd instances
Step 6: check the bbolt db file directly
Use the etcd-dump-db tool to check the db file. You will see that the already promoted learner is still a learner.
Automatic step
Just execute upgrade_test.sh (of course, you need to download both the v3.5.19 and v3.6.0-rc.2 binaries beforehand); afterwards, check the log and you will see the error message "membership: too many learner members in cluster".
Proposed solution & actions
Proposal for release-3.5
main branch's implementation). Also add an e2e test to verify that the membership data is consistent between v2store and v3store. Probably we should add the verification in production code, but only enable it in tests.
etcd/server/etcdserver/api/membership/store.go
Lines 57 to 59 in 1810af3
Add etcdutl commands:
etcdutl check members --data-dir path-2-data-dir
etcdutl sync members --data-dir path-2-data-dir
Proposal for main & release-3.6.
Make the change on main first, then backport it to release-3.6.
Add an e2e test similar to upgrade_test.sh to cover an upgrade case similar to what kubeadm does. The rough steps:
We should definitely get the issue included in the upgrade(3.5->3.6) checklist. Users should check the membership data is consistent between v3store and v2store.
Probably we need to publish an official announcement
cc @fuweid @ivanvc @jmhbnz @serathius @siyuanfoundation @spzala @neolit123 @dims
Next step
Let's discuss this next Monday, and decide who works on what.
PRs