Merge pull request kubernetes#207 from deads2k/etcd-upgrade
describe etcd 4.3 to 4.4 upgrade/downgrade
openshift-merge-robot authored Feb 19, 2020
2 parents 6d25129 + 91688f1 commit 2cb29e1
Showing 1 changed file with 31 additions and 1 deletion: enhancements/etcd/cluster-etcd-operator.md
@@ -490,7 +490,37 @@ items:

### Upgrade / Downgrade Strategy

Assume that the design above is updated to indicate that we're using a static pod operator, like the kube-apiserver operator.
It rolls out one node at a time, is not coordinated with the MCO in any way, and ignores PodDisruptionBudgets (PDBs).
It does prefer updating its own crashlooping or unready pods over bringing down working members (this is already implemented).
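
The rollout rule above can be sketched as a selection function. This is a minimal Python sketch, not the operator's code: the `Member` record, its fields, and `next_node_to_update` are hypothetical, and "one node at a time" is modeled by returning at most one node per call.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Member:
    # Hypothetical per-master status record; not a real operator type.
    node: str
    revision: int       # revision currently running on this node
    ready: bool
    crashlooping: bool


def next_node_to_update(members: Sequence[Member], target_revision: int) -> Optional[str]:
    """Pick at most one node to move to target_revision.

    Crashlooping or unready members are preferred, so a broken member is
    replaced before a working one is taken down. Nothing here consults
    the MCO or any PodDisruptionBudget.
    """
    stale = [m for m in members if m.revision != target_revision]
    if not stale:
        return None  # every node already runs the target revision
    # Unhealthy members get key False and sort first; sorted() is stable,
    # so members with equal health keep their input order.
    stale = sorted(stale, key=lambda m: m.ready and not m.crashlooping)
    return stale[0].node
```

A controller loop would call this again only after the chosen node reports the target revision, which is what keeps the rollout to one node at a time.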

#### Upgrade from 4.3 to 4.4

1. The 4.4-etcd-staticpod moves /etc/kubernetes/manifests/etcd-member.yaml to a backup location before starting etcd.
2. This causes the 4.3-machineconfigpool to become degraded, because the file that it tries to maintain is gone.
   1. We discovered that the MCO does not upgrade past a degraded condition.
   2. This appears to be a flaw in the MCO that prevents upgrading past bugs; in the short term, we will simply skip evaluating the etcd-member.yaml file.
3. If a master node restarts using a 4.3-machineconfigpool, the old /etc/kubernetes/manifests/etcd-member.yaml will come back.
   This is OK because the 4.4-etcd-staticpod will remove it again and try to claim the same port.
4. The 4.4-mco does not include an etcd-member.yaml file. When the 4.4-mco restarts master nodes, they come back up
   without /etc/kubernetes/manifests/etcd-member.yaml, so the 4.4-machineconfigpool becomes healthy again.
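
Steps 1 and 3 depend on the 4.4 static pod moving the old manifest aside every time it starts, so a file restored by a 4.3 MachineConfig is simply moved again. A minimal Python sketch of that idempotent step follows; the function name and the backup directory are hypothetical, and the real static pod may implement this differently (for example as a shell step before etcd starts).

```python
from pathlib import Path
from typing import Optional


def move_old_etcd_member_aside(manifest_dir: Path, backup_dir: Path) -> Optional[Path]:
    """Move a 4.3 etcd-member.yaml out of the static pod directory.

    manifest_dir is the kubelet's static pod directory on the master;
    backup_dir is a hypothetical location chosen for this sketch.
    Returns the backup path if a file was moved, otherwise None.
    """
    old = manifest_dir / "etcd-member.yaml"
    if not old.exists():
        # Nothing to do: the 4.4 layout is already in place.
        return None
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / old.name
    # Path.replace overwrites an existing backup, so a copy restored by a
    # 4.3 MachineConfig after a reboot is simply moved aside again.
    # It assumes both directories live on the same filesystem.
    old.replace(dest)
    return dest
```

Because the function is a no-op when the file is absent, it can run unconditionally on every start of the 4.4 pod.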

#### Downgrade from 4.4 to 4.3

The cluster can function without intervention, but to fully restore 4.3, manual intervention is required.

1. The 4.4-etcd-pod exists on every master. Recall that it moves /etc/kubernetes/manifests/etcd-member.yaml to a backup location before starting etcd.
2. The 4.4-etcd-pods are still maintained by the 4.4 etcd operator, because the CVO does not know how to remove resources.
3. Left alone, this results in a 4.3 cluster with a 4.4-style etcd and degraded machineconfigpools.
   The cluster can run in this state for a very long time.
4. To clean up, upgrade again. Alternatively, restore 4.3 manually:
5. Delete the openshift-etcd-operator namespace and wait for it to be removed.
6. **One master at a time**:
   1. Move the 4.4-etcd-pod manifest to a backup location.
   2. Restore etcd-member.yaml from its backup location.
   3. Wait for the etcd-member to rejoin the cluster.
   4. Move on to the next master.
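
Step 6 can be scripted per master roughly as follows. This is a Python sketch under stated assumptions: it runs on one master with access to its static pod directory, the 4.4 manifest file name (`etcd-pod.yaml`), both backup directories, and `restore_43_etcd_member` are hypothetical, and `member_rejoined` stands in for whatever etcd health check the administrator uses.

```python
import time
from pathlib import Path
from typing import Callable


def restore_43_etcd_member(manifest_dir: Path,
                           backup_44: Path,
                           backup_43: Path,
                           member_rejoined: Callable[[], bool],
                           new_manifest: str = "etcd-pod.yaml",  # hypothetical 4.4 file name
                           timeout_s: float = 600.0,
                           poll_s: float = 5.0) -> None:
    """Swap one master from the 4.4 etcd static pod back to the 4.3 etcd-member."""
    # 1. Move the 4.4 etcd pod manifest out of the static pod directory;
    #    the kubelet stops a static pod once its manifest is removed.
    backup_44.mkdir(parents=True, exist_ok=True)
    (manifest_dir / new_manifest).replace(backup_44 / new_manifest)
    # 2. Restore etcd-member.yaml from where the 4.4 pod moved it.
    (backup_43 / "etcd-member.yaml").replace(manifest_dir / "etcd-member.yaml")
    # 3. Poll until the member has rejoined; only then continue to the next master.
    deadline = time.monotonic() + timeout_s
    while not member_rejoined():
        if time.monotonic() > deadline:
            raise TimeoutError("etcd-member did not rejoin; do not continue to the next master")
        time.sleep(poll_s)
```

Raising on timeout, rather than returning, is meant to stop an operator from moving on to the next master while quorum may be at risk.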

### Version Skew Strategy
