diff --git a/keps/sig-cluster-lifecycle/kubeadm-config-ha-support.md b/keps/sig-cluster-lifecycle/kubeadm-config-ha-support.md
new file mode 100644
index 00000000000..ceb71e1af64
--- /dev/null
+++ b/keps/sig-cluster-lifecycle/kubeadm-config-ha-support.md
@@ -0,0 +1,147 @@
+---
+kep-number: TBD
+title: Augment Kubeadm Config to Enable Upgrades of HA Clusters
+authors:
+  - "@mattkelly"
+owning-sig: sig-cluster-lifecycle
+reviewers:
+  - TBD
+approvers:
+  - TBD
+editor:
+  - "@mattkelly"
+creation-date: 2018-03-08
+last-updated: 2018-03-09
+status: provisional
+see-also:
+  - [KEP kubeadm join --master workflow](https://github.com/kubernetes/community/pull/1707) (in progress)
+---
+
+# Augment Kubeadm Config to Enable Upgrades of HA Clusters
+
+## Table of Contents
+
+* [Augment Kubeadm Config to Enable Upgrades of HA Clusters](#augment-kubeadm-config-to-enable-upgrades-of-ha-clusters)
+  * [Table of Contents](#table-of-contents)
+  * [Summary](#summary)
+  * [Motivation](#motivation)
+    * [Goals](#goals)
+    * [Non-Goals](#non-goals)
+    * [Challenges and Open Questions](#challenges-and-open-questions)
+  * [Proposal](#proposal)
+    * [Implementation Details](#implementation-details)
+      * [Understanding the Situation Today](#understanding-the-situation-today)
+      * [Adding Additional Master-Specific ConfigMaps](#adding-additional-master-specific-configmaps)
+        * [Key Design Considerations and Benefits](#key-design-considerations-and-benefits)
+          * [Parallel Node Creation](#parallel-node-creation)
+          * [Guaranteed Consistent kubeadm-config](#guaranteed-consistent-kubeadm-config)
+    * [Risks and Mitigations](#risks-and-mitigations)
+  * [Implementation History](#implementation-history)
+  * [Drawbacks [optional]](#drawbacks-optional)
+  * [Alternatives [optional]](#alternatives-optional)
+
+## Summary
+
+Currently, `kubeadm upgrade` of a master in a multi-master cluster will fail without workarounds because of a lack of node-specific master configuration in the `kubeadm-config` ConfigMap that is created at `init` time and later referenced during an upgrade.
+In particular, node-specific information is required in order for kubeadm to identify which control plane static pods belong to the current node during an upgrade.
+In the non-HA case, having a single `nodeName` property in `kubeadm-config` that corresponds to the single master is sufficient because there is no ambiguity.
+As we move towards supporting HA natively in kubeadm, a new approach is required to uniquely identify master nodes.
+
+## Motivation
+
+Kubeadm is driving towards natively supporting highly available clusters.
+As part of HA support, a clean upgrade path is required.
+The purpose of this KEP is simply to introduce support for multiple masters in the kubeadm configuration that is stored in-cluster in order to enable that clean upgrade path.
+
+### Goals
+
+Enable `kubeadm upgrade` of highly available clusters by augmenting the existing persistent kubeadm configuration.
+
+### Non-Goals
+
+This proposal does not aim to solve the entire problem of upgrading HA clusters.
+This KEP specifically tackles the persistent configuration problem so that the information required at upgrade time is available.
+
+### Challenges and Open Questions
+
+The final implementation of this KEP will require deciding exactly what "master node-specific information" means.
+Currently, the `nodeName` of the master is the only entry that is unarguably node-specific.
+However, it may be possible that additional config entries could be split out into the node-specific area(s) of the config.
+This could result in asymmetric configuration across the masters, which may or may not be something that we wish to support.
+
+## Proposal
+
+### Implementation Details
+
+#### Understanding the Situation Today
+
+Currently, the `kubeadm-config` ConfigMap in the `kube-system` namespace serves as the single source of truth for how kubeadm has been used to create and modify a cluster.
+Because kubeadm is not a process that runs on the cluster (it is only run to perform operations, e.g. `init` and `upgrade`), this config is not modified during normal operation.
+In the non-HA case today, it is guaranteed to be an accurate representation of the kubeadm configuration.
+
+If kubeadm is used to create an HA cluster today, e.g. using the workarounds described in [kubeadm #546](https://github.com/kubernetes/kubeadm/issues/546) and/or @mbert's [document](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg), then the `kubeadm-config` ConfigMap will be an accurate representation except for any master node-specific information.
+As explained in [Challenges and Open Questions](#challenges-and-open-questions), such node-specific information is not yet well-defined but minimally consists of the master's `nodeName`.
+The `nodeName` in `kubeadm-config` will correspond to the last master that happened to write to the ConfigMap.
+In the case of parallel node creation, this may not be well-defined.
+When `kubeadm upgrade` is run on a master and this `nodeName` is fetched, it may be incorrect and the upgrade process will fail.
+
+#### Adding Additional Master-Specific ConfigMaps
+
+The proposed solution is to add additional kubeadm ConfigMaps that are specific to each master (one ConfigMap for each master).
+Each master-specific ConfigMap will be created as part of the to-be-implemented [`kubeadm join --master` process](https://github.com/kubernetes/community/pull/1707).
+Any master-specific information in the main `kubeadm-config` ConfigMap will be removed.
+
+The names of these new ConfigMaps will be `kubeadm-config-<machine_UID>` where `machine_UID` is an identifier that is guaranteed to be unique for each node in the cluster.
+There is a precedent for using such a `machine_UID`, and in fact kubeadm already has a [prerequisite](https://kubernetes.io/docs/setup/independent/install-kubeadm/#verify-the-mac-address-and-product_uuid-are-unique-for-every-node) that such machine identifiers be unique for every node.
+For the purpose of this KEP, let us assume that `machine_UID` is the full `product_uuid` of the machine that the master node is running on.
+
+Kubeadm operations such as upgrade that require master-specific information should now also grab the corresponding ConfigMap for their node.
+
+##### Key Design Considerations and Benefits
+
+There are a few key benefits to the approach of adding additional ConfigMaps over an approach which would augment the existing `kubeadm-config` with master-specific information:
+
+###### Parallel Node Creation
+
+Node creation in parallel is a valid use-case that works today.
+If we required that each master modify `kubeadm-config`, then there may be concern around concurrent modification of this ConfigMap.
+
+###### Guaranteed Consistent kubeadm-config
+
+This approach allows us to continue to guarantee that the main `kubeadm-config` is consistent with the actual cluster configuration.
+If we put master-specific information into `kubeadm-config` itself, then we would require either a yet-to-be-defined `kubeadm leave` workflow or active reconciliation of `kubeadm-config` in order to ensure accuracy.
+This may not be critical, but it is a consideration.
+
+With this proposal, if a node unexpectedly leaves a cluster, then at worst a stale ConfigMap will be left in the cluster.
+For the case where a node is explicitly deleted, we can leverage garbage collection to automatically delete the master-specific ConfigMap by listing the node as an `ownerReference` when the ConfigMap is created.
+
+### Risks and Mitigations
+
+There will be situations in which a kubeadm operation (e.g. upgrade) that requires the new master-specific ConfigMap is run and finds that the expected ConfigMap does not exist.
+For example, this will happen for users who are upgrading HA clusters that were created using the aforementioned workarounds required before kubeadm HA support is available.
+In this case, there are several options:
+
+1. Kubeadm can fall back to looking at the main `kubeadm-config`.
+   The user would have to manually modify `kubeadm-config` for each master to set the `nodeName` to the current master.
+   This is the [recommended workaround](https://github.com/kubernetes/kubeadm/issues/546#issuecomment-365063404) today.
+
+2. Kubeadm can provide an additional command similar to (or an extension of) the existing [`kubeadm config upload`](https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-config/) command.
+   This command would (minimally) create the `kubeadm-config-<machine_UID>` ConfigMap for the current master, which would allow a subsequent `kubeadm upgrade` to succeed.
+
+It seems reasonable to assume that any user who created an HA cluster using kubeadm before the `kubeadm join --master` workflow existed should be aware that workarounds will be required for upgrading (since they presumably applied many workarounds to create the cluster in the first place).
+In any case, useful documentation and error messages are critical to a good user experience.
+
+## Implementation History
+
+- [Issue #546: Workarounds for the time before kubeadm HA becomes available](https://github.com/kubernetes/kubeadm/issues/546)
+- [Adding HA to kubeadm-deployed clusters](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg)
+- [Issue #706: Make kubeadm upgrade HA ready](https://github.com/kubernetes/kubeadm/issues/706)
+
+## Drawbacks [optional]
+
+This KEP introduces additional ConfigMaps for kubeadm to use (one for each master).
+
+## Alternatives [optional]
+
+An alternative approach would be to augment the existing `kubeadm-config` ConfigMap with master-specific information.
+The advantages of the proposed approach over this alternative are detailed in the [Proposal](#proposal) section.