[release-3.5] Learner promotion isn't persisted to v3store (bbolt) #19557

Open
ahrtr opened this issue Mar 8, 2025 · 36 comments
Labels
priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release/v3.5 release/v3.6 stage/triaged type/bug

Comments

@ahrtr
Member

ahrtr commented Mar 8, 2025

See summary: https://docs.google.com/document/d/11wkyiK-bDYGOPtFfLGII7iPQxsXoYiq8f6L1fc30S4k/edit?tab=t.0#heading=h.efgyhz20f9eb

What's the symptom

When upgrading etcd from v3.5.x to v3.6.0-rc.x, the upgrade may fail with "membership: too many learner members in cluster", because etcd allows at most one learner by default.

Thanks @neolit123 for uploading the initial log.

Which versions are impacted

Note only release-3.5 has this issue. The issue was introduced in v3.5.1 in #13348. All etcd patch versions in v3.5.1 - v3.5.19 are affected.

If you ever added & promoted learner(s) in v3.5.1 - v3.5.19 and then try to upgrade from 3.5.1+ to v3.6.0-rc.x, you will see this issue:

  • If you only added & promoted one learner, you will see that the member becomes a learner again after upgrading to v3.6.0-rc.x;
  • If you added & promoted multiple learners (>=2), the upgrade will fail, because etcdserver will crash on bootstrap due to membership: too many learner members in cluster.
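To make the two outcomes above concrete, here is a minimal sketch of a learner-limit check. The `Member` type and `validateLearners` helper are hypothetical stand-ins, not etcd's actual code; only the error text is taken from this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// Member is a simplified, hypothetical stand-in for a membership record.
type Member struct {
	ID        uint64
	IsLearner bool
}

// errTooManyLearners reuses the error text quoted in this issue.
var errTooManyLearners = errors.New("membership: too many learner members in cluster")

// validateLearners fails when the store lists more learners than allowed.
func validateLearners(members []Member, maxLearners int) error {
	n := 0
	for _, m := range members {
		if m.IsLearner {
			n++
		}
	}
	if n > maxLearners {
		return errTooManyLearners
	}
	return nil
}

func main() {
	// One stale learner record: the check passes, but the member is a learner again.
	one := []Member{{ID: 1}, {ID: 2, IsLearner: true}}
	fmt.Println(validateLearners(one, 1))

	// Two stale learner records: the check fails, so bootstrap cannot proceed.
	two := []Member{{ID: 1}, {ID: 2, IsLearner: true}, {ID: 3, IsLearner: true}}
	fmt.Println(validateLearners(two, 1))
}
```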

What's the root cause

When promoting a learner, the change is only persisted in v2store, not in v3store. The reason is simple: the save path returns errMemberAlreadyExist, because the member (learner) ID already exists in v3store. See the 3.5 code below,

if unsafeMemberExists(tx, mkey) {
	return errMemberAlreadyExist
}

So the membership data will be inconsistent between v2store and v3store in such case.
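A toy model of the behavior described above, assuming an in-memory map in place of the bbolt members bucket. The `store` type, `saveIfAbsent`, `upsert`, and the error message text are illustrative assumptions; the real fix may look different.

```go
package main

import (
	"errors"
	"fmt"
)

type member struct {
	ID        uint64
	IsLearner bool
}

// The error message text here is an assumption made for this sketch.
var errMemberAlreadyExist = errors.New("member already exists")

// store is a toy in-memory stand-in for the members bucket.
type store map[uint64]member

// saveIfAbsent models the 3.5 behavior: an existing key is never overwritten,
// so a promotion (same ID, IsLearner=false) is silently dropped.
func (s store) saveIfAbsent(m member) error {
	if _, ok := s[m.ID]; ok {
		return errMemberAlreadyExist
	}
	s[m.ID] = m
	return nil
}

// upsert models one possible fix: always write the latest record.
func (s store) upsert(m member) { s[m.ID] = m }

func main() {
	buggy, fixed := store{}, store{}

	learner := member{ID: 2, IsLearner: true}
	_ = buggy.saveIfAbsent(learner)
	fixed.upsert(learner)

	promoted := member{ID: 2, IsLearner: false}
	err := buggy.saveIfAbsent(promoted)
	fixed.upsert(promoted)

	// The buggy store still reports a learner; the fixed one does not.
	fmt.Println(err, buggy[2].IsLearner, fixed[2].IsLearner)
}
```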

Why do we only see this issue when upgrading from 3.5 to 3.6?

In 3.5, the v2store is the source of truth for the membership data. In 3.6, v3store (bbolt) is the source of truth for the membership data. When upgrading from 3.5.x to 3.6, the source of truth changes.

How to reproduce this issue

Manual steps

Note: try these steps using a 3.5.x (x >= 1) binary.

Step 1: start an etcd instance

$ ./bin/etcd --name e1 --initial-advertise-peer-urls http://127.0.0.1:2380 --listen-peer-urls http://127.0.0.1:2380 --advertise-client-urls http://127.0.0.1:2379 --listen-client-urls http://127.0.0.1:2379 --initial-cluster "e1=http://127.0.0.1:2380" --initial-cluster-state new

Step 2: add a learner in another terminal

$ ./bin/etcdctl member add e2 --peer-urls=http://127.0.0.1:2382 --learner

Step 3: start the learner

$ ./bin/etcd --name e2 --initial-advertise-peer-urls http://127.0.0.1:2382  --listen-peer-urls http://127.0.0.1:2382 --advertise-client-urls http://127.0.0.1:2378 --listen-client-urls http://127.0.0.1:2378 --initial-cluster "e1=http://127.0.0.1:2380,e2=http://127.0.0.1:2382" --initial-cluster-state existing

Step 4: promote the learner

$ ./bin/etcdctl member promote 155a4a14c50481b8

Step 5: stop both etcd instances

Step 6: check the bbolt db file directly

Use the etcd-dump-db tool to check the db file. You will see that the already promoted learner is still a learner.

$ ./etcd-dump-db  iterate-bucket ../../e1.etcd/ members
key="e610623c040f129c", value="{\"id\":16577858238256452252,\"peerURLs\":[\"http://127.0.0.1:2382\"],\"isLearner\":true}"
key="b71f75320dc06a6c", value="{\"id\":13195394291058371180,\"peerURLs\":[\"http://127.0.0.1:2380\"],\"name\":\"e1\"}"
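The dump values above are JSON, so spotting stale learners can be scripted. This is a rough sketch: the `dumpedMember` struct only mirrors the fields visible in the output above, and `learnersIn` is a hypothetical helper, not part of etcd-dump-db.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dumpedMember mirrors only the JSON fields visible in the dump output.
type dumpedMember struct {
	ID        uint64   `json:"id"`
	PeerURLs  []string `json:"peerURLs"`
	Name      string   `json:"name,omitempty"`
	IsLearner bool     `json:"isLearner,omitempty"`
}

// learnersIn returns the hex IDs of every record whose isLearner flag is set.
func learnersIn(values []string) ([]string, error) {
	var ids []string
	for _, v := range values {
		var m dumpedMember
		if err := json.Unmarshal([]byte(v), &m); err != nil {
			return nil, err
		}
		if m.IsLearner {
			ids = append(ids, fmt.Sprintf("%x", m.ID))
		}
	}
	return ids, nil
}

func main() {
	// The two values from the dump above, with shell escaping removed.
	values := []string{
		`{"id":16577858238256452252,"peerURLs":["http://127.0.0.1:2382"],"isLearner":true}`,
		`{"id":13195394291058371180,"peerURLs":["http://127.0.0.1:2380"],"name":"e1"}`,
	}
	ids, err := learnersIn(values)
	fmt.Println(ids, err)
}
```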

Automatic step

Just execute upgrade_test.sh (you need to download both the v3.5.19 and v3.6.0-rc.2 binaries beforehand). Afterwards, check the log; you will see the error message "membership: too many learner members in cluster".

Proposed solution & actions

Proposal for release-3.5

  • Fix the bug as mentioned above (pasted again below) (refer to main branch's implementation). Also add an e2e test to verify the membership data is consistent between v2store and v3store. Probably we should add the verification in production code, but only enabled in test.

if unsafeMemberExists(tx, mkey) {
return errMemberAlreadyExist
}

  • Provide etcdutl commands to
    • check the membership data differences between v2store and v3store.
      • Something like etcdutl check members --data-dir path-2-data-dir
    • sync the membership data between v2store and v3store.
      • Something like etcdutl sync members --data-dir path-2-data-dir
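As a rough idea of what a check could compute, here is a sketch that diffs two membership views. Everything here (the `member` type, `diffMembers`, the in-memory maps) is hypothetical; the proposed etcdutl commands do not exist yet and may work differently.

```go
package main

import (
	"fmt"
	"sort"
)

// member is a simplified, hypothetical membership record.
type member struct {
	PeerURL   string
	IsLearner bool
}

// diffMembers returns the sorted IDs whose records differ between the two
// views, or that exist in only one of them.
func diffMembers(v2, v3 map[uint64]member) []uint64 {
	seen := map[uint64]bool{}
	var out []uint64
	for id, m2 := range v2 {
		seen[id] = true
		if m3, ok := v3[id]; !ok || m2 != m3 {
			out = append(out, id)
		}
	}
	for id := range v3 {
		if !seen[id] {
			out = append(out, id)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	v2 := map[uint64]member{
		1: {PeerURL: "http://127.0.0.1:2380"},
		2: {PeerURL: "http://127.0.0.1:2382"}, // promoted in v2store
	}
	v3 := map[uint64]member{
		1: {PeerURL: "http://127.0.0.1:2380"},
		2: {PeerURL: "http://127.0.0.1:2382", IsLearner: true}, // stale in v3store
	}
	fmt.Println(diffMembers(v2, v3))
}
```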

Proposal for main & release-3.6.

Make the change to main first, then backport it to release-3.6.

  • Add an e2e test similar to upgrade_test.sh to cover the upgrade case similar to what kubeadm does. The rough steps,

    • start a one member (3.5.x) cluster
    • add a learner (3.5.x), and promote it to a voting member later
    • add another learner (3.5.x), and promote it to a voting member later.
    • upgrade the member to 3.6 one by one.
  • We should definitely get the issue included in the upgrade(3.5->3.6) checklist. Users should check the membership data is consistent between v3store and v2store.

  • Probably we need to publish an official announcement

cc @fuweid @ivanvc @jmhbnz @serathius @siyuanfoundation @spzala @neolit123 @dims

Next step

Let's discuss this next Monday, including who works on what.

PRs

@ahrtr ahrtr added type/bug priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release/v3.6 release/v3.5 labels Mar 8, 2025
@ahrtr ahrtr added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 9, 2025
@serathius
Member

Could we implement a fix without introducing a series of commands that every user who has ever used learners would need to run?

For example during etcd v3.6 bootstrap compare v2store and v3store and if there are any unpromoted learners in v3 that are normal members in v2, then override them in v3store?

@neolit123

thanks for the investigation @ahrtr

@ahrtr ahrtr pinned this issue Mar 10, 2025
@ahrtr
Member Author

ahrtr commented Mar 10, 2025

For example during etcd v3.6 bootstrap compare v2store and v3store and if there are any unpromoted learners in v3 that are normal members in v2, then override them in v3store?

I wish we could follow this approach, as it wouldn’t add any extra burden on users when upgrading to 3.6. But unfortunately, it won’t work because v3store is the sole source of truth in 3.6—even the v2snapshot is generated from v3store.

So before upgrading 3.5 to 3.6, users need to

  • either bump to 3.5.20+ first, and afterwards upgrade to 3.6.0
    • users can also execute etcdutl check members first; if there is no inconsistency, they don't necessarily have to bump to 3.5.20+. Of course, they can bump directly to 3.5.20+ regardless.
  • or execute a command like etcdutl check members first, and execute etcdutl sync members if needed.

So the proposed actions:

  • Fix the issue (persist learner promotion into v3store/bbolt) in 3.5.20, and add a generic verification to ensure the membership data in v3store is always in sync with v2store. @ahrtr will fix it today.
  • Add auto sync in 3.5.20 as discussed above
    • **Note that v3store is usually more up-to-date than v2store, so we should be very cautious and sync v3store from v2store if and only if**
      • IsLearner is the only field that differs between v2store and v3store for each member,
      • and v2store.IsLearner == false && v3store.IsLearner == true (the promotion reached v2store but not v3store).
        cc @fuweid and @ahrtr
  • add command like etcdutl check members and etcdutl sync members in release-3.5 only. cc @fuweid
    • If the majority agrees that we require users to always bump to 3.5.20 first before upgrading to 3.6.0, then we don't necessarily have to add these commands. But it isn't user friendly.
    • We release 3.6.0 almost 4 years after 3.5.0, so I think it is acceptable to add one more item (run etcdutl check members, and run etcdutl sync members if needed) to the 3.5->3.6 upgrade checklist.
  • Add a kubeadm-style upgrade e2e test in main, then backport it to release-3.6. Is anyone working on this?
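The cautious auto-sync guard from the action list above could look roughly like this. It is only a sketch: the `member` fields and `shouldSync` are hypothetical, and it assumes the desynchronized case is the one from the root cause (v2store has the promotion, v3store still marks the member as a learner).

```go
package main

import "fmt"

// member is a simplified, hypothetical membership record.
type member struct {
	Name      string
	PeerURL   string
	IsLearner bool
}

// shouldSync reports whether copying the v2store record into v3store is
// considered safe under the two conditions listed above.
func shouldSync(v2, v3 member) bool {
	// Condition 1: IsLearner must be the only field that differs.
	a, b := v2, v3
	a.IsLearner, b.IsLearner = false, false
	if a != b {
		return false
	}
	// Condition 2 (assumption): v2store saw the promotion, v3store did not.
	return !v2.IsLearner && v3.IsLearner
}

func main() {
	v2 := member{Name: "e2", PeerURL: "http://127.0.0.1:2382"}
	v3 := member{Name: "e2", PeerURL: "http://127.0.0.1:2382", IsLearner: true}
	fmt.Println(shouldSync(v2, v3)) // stale learner flag in v3store

	// A differing peer URL suggests v3store is simply newer, so leave it alone.
	v3.PeerURL = "http://127.0.0.1:2384"
	fmt.Println(shouldSync(v2, v3))
}
```

Keeping the guard this narrow reflects the concern that v3store is usually the more up-to-date store, so any other difference is left untouched.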

@serathius
Member

But unfortunately, it won’t work because v3store is the sole source of truth in 3.6—even the v2snapshot is generated from v3store.

I understand how v3.6 is meant to treat v2 state, I'm saying that during bootstrap you still have some wiggle room. When we upgrade etcd v3.5 to v3.6, there will be v2store snapshots on disk generated by v3.5, that can be read and cross validated with v3 state.

@ahrtr
Member Author

ahrtr commented Mar 10, 2025

I understand how v3.6 is meant to treat v2 state, I'm saying that during bootstrap you still have some wiggle room.

What's the "wiggle room"? As mentioned above, v3store is usually more up-to-date than v2store, so theoretically during the very first 3.6 bootstrap, the v2store might be out of date, accordingly we might miss the only chance to sync from v2store to v3store.

@serathius
Member

serathius commented Mar 10, 2025

so theoretically during the very first 3.6 bootstrap, the v2store might be out of date, accordingly we might miss the only chance to sync from v2store to v3store.

Yes, but we also will replay all WAL entries from v2store. They will be skipped for v3store, but we can make an exception for learner promotion.

What I'm proposing is hacky, but should be possible. Please let me know if you think it's too risky to modify apply loop in that way, but I want to make sure we considered alternatives.

@ahrtr
Member Author

ahrtr commented Mar 10, 2025

Yes, but we also will replay all WAL entries from v2store. They will be skipped for v3store, but we can make an exception for learner promotion.

What I'm proposing is hacky, but should be possible. Please let me know if you think it's too risky to modify apply loop in that way, but I want to make sure we considered alternatives.

It's too error-prone, and it also greatly complicates release-3.6, which we will maintain a long long time.

@neolit123

in terms of k8s release schedule it might be safer to release 1.33 with 3.5.20 (that has the fix) instead of 3.6.0.
then 1.34 can start using 3.6.0.

it's also possible to backport 3.5.20 to older versions of k8s and add 3.6.0 in 1.33, but that seems more rushed and risky.

@serathius
Member

which we will maintain a long long time.

That's even more true for etcdutl commands. In my opinion we should not rely on manual processes if possible.

@ahrtr
Member Author

ahrtr commented Mar 10, 2025

That's even more true for etcdutl commands.

We only add the command in release-3.5.

In my opinion we should not rely on manual processes if possible.

As mentioned above, users don't have to necessarily execute the commands, if they bump to 3.5.20+ regardless.

@ahrtr
Member Author

ahrtr commented Mar 10, 2025

in terms of k8s release schedule it might be safer to release 1.33 with 3.5.20 (that has the fix) instead of 3.6.0.
then 1.34 can start using 3.6.0.

There is no plan to integrate etcd 3.6.0 with k8s 1.33, I think we are targeting 1.34.

@serathius
Member

There is no plan to integrate etcd 3.6.0 with k8s 1.33, I think we are targeting 1.34.

Agree, to bump etcd version in K8s we would need to do it at beginning of cycle to get enough soak for K8s to feel safe.

@ahrtr
Member Author

ahrtr commented Mar 10, 2025

@fuweid @serathius @siyuanfoundation @ivanvc Please see the proposed action in #19557 (comment), and feedback if you have any further comment, thx

@siyuanfoundation
Contributor

I agree with @ahrtr but for a different reason than the long maintenance time. The root cause of this issue is the inconsistency between v2store and v3store in 3.5, and that is what we really should fix. I would prefer not to complicate 3.6 code to fix a 3.5 bug.
With the offline tool, users with more knowledge of the whole membership history could have more control and make more informed decision about whether or not to override values in v3.

@ahrtr
Member Author

ahrtr commented Mar 10, 2025

@fuweid @siyuanfoundation do you have time to have a call today?

@ivanvc
Member

ivanvc commented Mar 11, 2025

I agree with Siyuan's point. It should be treated as a 3.5 bug, not 3.6.

I also think the less the end-user has to do, the better. So, upgrading to 3.5.20 before 3.6.0 should be reasonable.

If we follow that approach, does that mean we must ship 3.5.20 with K8s 1.33?

@serathius
Member

serathius commented Mar 11, 2025

I'm concerned more with user experience than where the bug came from. We should strive for minimizing maintenance cost, however we should be careful when this impacts user experience.

My proposal came from concern about writing another 10 page upgrade instructions scaring users from upgrading and further sharding the ecosystem. In the current day and age I don't think writing a instructions like https://etcd.io/docs/v3.3/upgrades/upgrade_3_5/ should have a place at all. Most users' expectation is that they can just replace the container tag. But we want to require users to run dedicated commands before the upgrade, or risk a complete, unrecoverable cluster shutdown when upgrading to v3.6. Requiring additional steps for an upgrade will just cause a bad user experience and erode trust.

I also think the less the end-user has to do, the better.

I think you might be lacking full context, the options are:

Option 1 - Implement resynchronization in v3.5.20

Proposed by @ahrtr, preferred by @siyuanfoundation, @ivanvc

  • Synchronize storages in v3.5.20.
  • Add new etcdutl members commands to handle cases where user doesn't upgrade to v3.5.20

Downside:

  • If the user doesn't upgrade to v3.5.20 and any existing cluster member has been promoted from learner, the user needs to run the following commands or risk an unrecoverable cluster shutdown:
    • etcdutl check members to validate if stores are desynchronized (aka, any existing member was promoted from learner)
    • etcdutl sync members to fix desynchronized storages

Option 2 - Implement resynchronization in v3.6.0

Proposed by @serathius

  • Synchronize storages in v3.6.0.
  • Can still implement resynchronization in v3.5.20, but it is not necessary.
  • User doesn't need to take any actions.

Downside:

  • Code to fix a v3.5 bug is present in v3.6. I don't agree with this argument: it might be a bug in v3.5, but by itself v3.5 will work. The issue only surfaces when upgrading to v3.6, thus I think it's reasonable to fix it in v3.6.

Note: the resynchronization code is risky to implement; whether we implement it in v3.5.20 or v3.6 doesn't matter. Both will be non-trivial and require extensive testing.

Why requiring upgrade through v3.5.20

We should not require or assume that user upgrades go through each and every patch release. Many upgrade automations (including the one used in GKE) don't give you a guarantee of total control over the exact upgrade path.

@ahrtr
Member Author

ahrtr commented Mar 11, 2025

Responses to comments

I'm concerned more with user experience

This is a generally valid point.

concern about writing another 10 page upgrade instructions scaring users from upgrading and further sharding the ecosystem. In the current day and age I don't think writing a instructions like https://etcd.io/docs/v3.3/upgrades/upgrade_3_5/

We release 3.6.0 almost 4 years after 3.5.0. There are already some unavoidable items in the upgrade checklist. Long-term success depends on building a more collaborative and stronger team, and maintaining a regular release cycle (can be discussed separately).

Option 1 - Implement resynchronization in v3.5.20

Proposed by @ahrtr, preferred by @siyuanfoundation, @ivanvc

  • Prepare workaround in v3.6 to handle desynchronized storages.

In my proposal, no workaround in 3.6 is needed.

whether we implement it in v3.5.20 or v3.6 doesn't matter.

No, it matters.

  • In 3.5, v2store is the source of truth for membership data, the auto-sync can be executed on every bootstrap.
  • But in 3.6, v3store is the source of truth, so it's only safe to execute it in the very first bootstrap after upgrading to 3.6. So it's fragile & really risky. Doing this in 3.6 is not even feasible. If you insist on doing it, please provide a PoC.

New comments

@neolit123

one concern that i have is that this bug was caught on the k/k pr by an upgrade test that runs on demand, manually.
kubernetes/kubernetes#130583 (comment)

the etcd maintainers might have missed this error and merged the upgrade to 3.6.0. later kubeadm periodic upgrade e2e would have failed and the kubeadm maintainers would have contacted the etcd maintainers with a report.

that is all good, but are there any plans to run local etcd upgrade e2e tests on the etcd project side? perhaps the etcd-operator project would help with that?

@ahrtr
Member Author

ahrtr commented Mar 11, 2025

but are there any plans to run local etcd upgrade e2e tests on the etcd project side?

@neolit123 we have upgrade & downgrade e2e test cases, but the problem is that we don't have a kubeadm-style upgrade case (adding & promoting learners, then upgrading later). We will fill the gap; please see the last action item in the proposed actions in #19557 (comment)

@ahrtr
Member Author

ahrtr commented Mar 11, 2025

If synchronizing members in v3.6 is too risky, I would argue that at least v3.6 should validate and prevent upgrade from 3.5.x (pre .20).

If users are already on 3.5.20+, then all is good; otherwise, #19557 (comment) provides two options to users,

  • they either upgrade to 3.5.20+ firstly, then to 3.6.0 later.
  • or execute etcdutl check members and followed by etcdutl sync members if needed.

Now you are proposing to limit this to only the first option. I am not strongly against this. The good side is that it reduces the users' burden of understanding this (it also saves our effort); the bad side is that it removes the flexibility.

From 3.6 perspective, we will have to add some validation on bootstrap, and check (1) whether or not there is any learner and (2) the version of any other members <= 3.5.19, and raise an alarm if any. Is this what you proposed? @serathius Actually I tend not to do it. Reasons:

  • If there are multiple learners (>=2), etcd will crash no matter we add such validation in 3.6;
  • If there is only one learner (actually should be a voting member), it will become a learner again after upgrading to 3.6. Users can promote it again manually to workaround it.
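For illustration, the version half of the validation discussed above might be sketched like this. The `version` type, `parse`, and `peersOlderThan3520` are hypothetical names for this sketch; how a 3.6 member would actually learn its peers' versions is left out.

```go
package main

import "fmt"

// version is a parsed major.minor.patch triple; pre-release suffixes are ignored.
type version struct{ major, minor, patch int }

// parse reads the leading "X.Y.Z" of a version string such as "3.6.0-rc.2".
func parse(s string) (version, error) {
	var v version
	if _, err := fmt.Sscanf(s, "%d.%d.%d", &v.major, &v.minor, &v.patch); err != nil {
		return version{}, fmt.Errorf("parse %q: %w", s, err)
	}
	return v, nil
}

// less reports whether v sorts before o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	if v.minor != o.minor {
		return v.minor < o.minor
	}
	return v.patch < o.patch
}

// peersOlderThan3520 returns the peer versions that predate 3.5.20.
func peersOlderThan3520(peerVersions []string) ([]string, error) {
	fixed := version{3, 5, 20}
	var old []string
	for _, s := range peerVersions {
		v, err := parse(s)
		if err != nil {
			return nil, err
		}
		if v.less(fixed) {
			old = append(old, s)
		}
	}
	return old, nil
}

func main() {
	// Rolling upgrade: one peer is still on 3.5.19, so it is reported.
	fmt.Println(peersOlderThan3520([]string{"3.5.19", "3.6.0-rc.2"}))
	// All peers already on 3.6.0: nothing is reported.
	fmt.Println(peersOlderThan3520([]string{"3.6.0", "3.6.0"}))
}
```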

if a 3.6.0 member is added and it checks if there are existing < 3.5.20 members i think it can print a sensible error message and exit 1. i don't know if such version checks are easy to do, though.

please see my answer above.

@serathius
Member

serathius commented Mar 11, 2025

If there are multiple learners (>=2), etcd will crash no matter we add such validation in 3.6;

Looking in bootstrap.go, looks like validation for max learners is pretty robust. It checks all paths and should prevent cluster with multiple learners to start. Still I think we should improve the error message by providing context about this issue.

If there is only one learner (actually should be a voting member), it will become a learner again after upgrading to 3.6. Users can promote it again manually to workaround it.

I'm not sure about this path, seems like there are many edge cases (Promotion before last snapshot, promotion after last snapshot, restarted was promoted, restarted member was not promoted, restarted member becomes leader), still you just assume will work the same in all cases and user will be able to easily fix it by promotion.

@ahrtr
Member Author

ahrtr commented Mar 11, 2025

I'm not sure about this path, seems like there are many edge cases (Promotion before last snapshot, promotion after last snapshot, restarted was promoted, restarted member was not promoted, restarted member becomes leader), still you just assume will work the same in all cases and user will be able to easily fix it by promotion.

That's why we need to fix it in 3.5 before upgrading to 3.6.

The key question for now is whether we should forcibly prevent users from upgrading from 3.5.19 (or older versions) to 3.6. If yes, then we don't need the etcdutl check/sync members commands, and we always require users to bump to 3.5.20+ first and to 3.6 later. I agree this is the most prudent way and should be able to prevent all possible issues.

@ahrtr
Member Author

ahrtr commented Mar 11, 2025

Summarisation.

release-3.5

release-3.6

Add validation on bootstrap, and raise an alarm if any other member is on 3.5.19 or older versions

@serathius
Member

Add validation on bootstrap, and raise an alarm if any other member is on 3.5.19 or older versions

Raising an alarm has a specific meaning in etcd, and I would be even more worried about adding a new type of alarm and its consequences.

@siyuanfoundation
Contributor

siyuanfoundation commented Mar 11, 2025

As @serathius mentioned, there are many edge cases to consider, so it is potentially more risky to rely on syncing from v2store in 3.6 in some cases when v3store should be the ground truth.
We don't necessarily have to raise any alarm, a better error msg about what to try next when the upgrade fails due to multiple learners and if any member version <= 3.5.19 would suffice.

@BenTheElder

BenTheElder commented Mar 11, 2025

Clarifying:

1.33 is in code freeze, and that means only bug fixes are merging.

Kubernetes is not in code freeze until the 21st per https://k8s.dev/release (rendered form of the previous link)
I checked and non-bug-fix PRs are still merging today.

However, the 21st is pretty soon.

@ahrtr
Member Author

ahrtr commented Mar 11, 2025

See summary https://docs.google.com/document/d/11wkyiK-bDYGOPtFfLGII7iPQxsXoYiq8f6L1fc30S4k/edit?tab=t.0#heading=h.efgyhz20f9eb

Anyone, please feel free to add comments. I also added the link to the very first comment.

@wenjiaswe
Contributor

wenjiaswe commented Mar 11, 2025

cc @dims @chaochn47 @jberkus @jmhbnz for visibility

@wenjiaswe
Contributor

cc @stackbaek

ahrtr added a commit to ahrtr/etcd that referenced this issue Mar 19, 2025
caused by etcd-io#19557

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
ahrtr added a commit to ahrtr/etcd that referenced this issue Mar 20, 2025
caused by etcd-io#19557

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
@ahrtr
Member Author

ahrtr commented Mar 20, 2025

Update & summary

We were planning to release 3.5.20 and land it to K8s 1.33 before the code freeze, but unfortunately, we didn't make it although I have been working & pushing hard. We have to land 3.5.20 to 1.33.1. cc @neolit123

We were planning to release etcd 3.6.0 before KubeCon, but we have to postpone it until after KubeCon, because,

  • We need to release etcd 3.5.20 first.
  • We still need to finish the e2e test cases

release-3.5

release-3.6

We need to add two e2e tests below,

  • Add kubeadm style e2e test in main branch, and backport to release-3.6. @fuweid is working on this.
  • Add an e2e test in release-3.6 only to verify that etcd is able to automatically fix clusters that have already been affected. @fuweid I believe this test should be able to reuse most of the code in the above e2e test, so are you able to take care of this case as well?

I also updated the google doc: https://docs.google.com/document/d/11wkyiK-bDYGOPtFfLGII7iPQxsXoYiq8f6L1fc30S4k/edit?tab=t.0#heading=h.58mxw7gv0jjj

@ahrtr
Member Author

ahrtr commented Mar 20, 2025

Unfortunately, this isn't correct :( . 3.6 won't be able to automatically fix clusters that are already affected, because v3store is the source of truth in 3.6. cc @fuweid who confirmed this.

But adding validation in 3.6 also has flaws,

  • [minor concern] What if some other members are temporarily unreachable? We need to keep retrying asynchronously until we get all other members' versions, but this shouldn't block the bootstrap process; otherwise it may cause a lack of quorum (a chicken-and-egg issue).
  • [major concern] If users upgrade all members from 3.5.19 (or older versions) to 3.6.0 at the same time (although it's unlikely, because usually it's rolling update), then the validation will pass because all members are already 3.6.0.

Options: we need to provide both options.

  • Still add validation. It has value although with flaws.
  • Update upgrade checklist/guide to remind users to double check the member's status after upgrade.


7 participants