-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Matt Kelly
committed
Mar 19, 2018
1 parent
cee5f0b
commit 4a716f2
Showing
1 changed file
with
161 additions
and
0 deletions.
There are no files selected for viewing
161 changes: 161 additions & 0 deletions
161
keps/sig-cluster-lifecycle/kubeadm-config-ha-support.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
--- | ||
kep-number: TBD | ||
title: Augment Kubeadm Config to Enable Upgrades of HA Clusters | ||
authors: | ||
- "@mattkelly" | ||
owning-sig: sig-cluster-lifecycle | ||
reviewers: | ||
- "@timothysc" | ||
approvers: | ||
- TBD | ||
editor: | ||
- "@mattkelly" | ||
creation-date: 2018-03-08 | ||
last-updated: 2018-03-19 | ||
status: provisional | ||
see-also: | ||
- [KEP kubeadm join --master workflow](https://github.com/kubernetes/community/pull/1707) (in progress) | ||
- [Upgrading kubeadm HA clusters from 1.9.x to 1.9.y](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm-upgrade-ha) | ||
--- | ||
|
||
# Augment Kubeadm Config to Enable Upgrades of HA Clusters | ||
|
||
## Table of Contents | ||
|
||
* [Augment Kubeadm Config to Enable Upgrades of HA Clusters](#augment-kubeadm-config-to-enable-upgrades-of-ha-clusters) | ||
* [Table of Contents](#table-of-contents) | ||
* [Summary](#summary) | ||
* [Motivation](#motivation) | ||
* [Goals](#goals) | ||
* [Non-Goals](#non-goals) | ||
* [Challenges and Open Questions](#challenges-and-open-questions) | ||
* [Proposal](#proposal) | ||
* [Implementation Details](#implementation-details) | ||
* [Background](#background) | ||
* [Adding Additional Master-Specific ConfigMaps](#adding-additional-master-specific-configmaps) | ||
* [Key Design Considerations and Benefits](#key-design-considerations-and-benefits) | ||
* [Parallel Node Creation](#parallel-node-creation) | ||
* [Guaranteed Consistent kubeadm-config](#guaranteed-consistent-kubeadm-config) | ||
* [Risks and Mitigations](#risks-and-mitigations) | ||
* [Migrating Existing Clusters](#migrating-existing-clusters) | ||
* [More Complex User Experience for Overriding Configuration](#more-complex-user-experience-for-overriding-configuration) | ||
* [Implementation History](#implementation-history) | ||
* [Drawbacks](#drawbacks) | ||
* [Alternatives](#alternatives) | ||
|
||
## Summary | ||
|
||
One of the first steps of the upgrade process is to retrieve the `kubeadm-config` ConfigMap that is stored at `kubeadm init` time in order to get any relevant cluster configuration. | ||
The current `kubeadm-config` ConfigMap was designed to support upgrades of clusters with a single master. | ||
As we move towards supporting High Availability (HA) natively in kubeadm, the persistent configuration must be enhanced in order to store information unique to each master node. | ||
In particular, master node-specific information is required in order for kubeadm to identify which control plane static pods belong to the current node during an upgrade (the static pods are identifiable by the `nodeName` field embedded in the pod name). | ||
|
||
This KEP outlines a possible solution for adding and retrieving master node-specific information through the use of additional kubeadm ConfigMaps. | ||
|
||
## Motivation | ||
|
||
Kubeadm is driving towards natively supporting highly available clusters. | ||
As part of HA support, a clean upgrade path is required. | ||
The purpose of this KEP is to introduce support for multiple masters in the kubeadm configuration that is stored in-cluster in order to enable that clean upgrade path. | ||
|
||
### Goals | ||
|
||
Enable `kubeadm upgrade` of highly available clusters by augmenting the existing persistent kubeadm configuration. | ||
|
||
### Non-Goals | ||
|
||
This proposal does not aim to solve the entire problem of upgrading HA clusters. | ||
This KEP specifically tackles the persistent configuration problem so that the information required at upgrade time is available. | ||
|
||
### Challenges and Open Questions | ||
|
||
The final implementation of this KEP will require deciding exactly what "master node-specific information" means. | ||
Currently, the `nodeName` of the master is the only entry that is unarguably node-specific. | ||
However, it may be possible that additional config entries could be split out into the node-specific area(s) of the config. | ||
This could result in asymmetric configuration across the masters, which may or may not be something that we wish to support. | ||
|
||
## Proposal | ||
|
||
### Implementation Details | ||
|
||
#### Background | ||
|
||
Currently, the `kubeadm-config` ConfigMap in the `kube-system` namespace serves as the single source of truth for how kubeadm has been used to create and modify a cluster. | ||
Because kubeadm is not a process that runs on the cluster (it is only run to perform operations, e.g. `init` and `upgrade`), this config is not modified during normal operation. | ||
In the non-HA case today, it is guaranteed to be an accurate representation of the kubeadm configuration. | ||
|
||
If kubeadm is used to create an HA cluster today, e.g. using the workarounds described in [kubeadm #546](https://github.com/kubernetes/kubeadm/issues/546) and/or @mbert's [document](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg), then the `kubeadm-config` ConfigMap will be an accurate representation except for any master node-specific information. | ||
As explained in [Challenges and Open Questions](#challenges-and-open-questions), such node-specific information is not yet well-defined but minimally consists of the master's `nodeName`. | ||
The `nodeName` in `kubeadm-config` will correspond to the last master that happened to write to the ConfigMap. | ||
In the case of parallel node creation, this may not be well-defined. | ||
When `kubeadm upgrade` is run on a master and this `nodeName` is fetched, it may be incorrect and the upgrade process will fail. | ||
|
||
#### Adding Additional Master-Specific ConfigMaps | ||
|
||
The proposed solution is to add additional kubeadm ConfigMaps that are specific to each master (one ConfigMap for each master). | ||
Each master-specific ConfigMap will be created as part of the `kubeadm init` process for the initial master and as part of the to-be-implemented [`kubeadm join --master` process](https://github.com/kubernetes/community/pull/1707) for additional masters. | ||
Any master-specific information in the main `kubeadm-config` ConfigMap will be removed. | ||
Each master-specific ConfigMap can be automatically deleted via garbage collection (see [below](#guaranteed-consistent-kubeadm-config) for details). | ||
|
||
The names of these new ConfigMaps will be `kubeadm-config-<machine_UID>` where `machine_UID` is an identifier that is guaranteed to be unique for each node in the cluster. | ||
There is a precedent for using such a `machine_UID`, and in fact kubeadm already has a [prerequisite](https://kubernetes.io/docs/setup/independent/install-kubeadm/#verify-the-mac-address-and-product_uuid-are-unique-for-every-node) that such machine identifiers be unique for every node. | ||
For the purpose of this KEP, let us assume that `machine_UID` is the full `product_uuid` of the machine that the master node is running on. | ||
|
||
Kubeadm operations such as upgrade that require master-specific information should now also retrieve the corresponding ConfigMap for their node. | ||
This master-specific configuration will be explicitly provided to any functions that require it instead of e.g. merging everything into one configuration. | ||
|
||
##### Key Design Considerations and Benefits | ||
|
||
There are a few key benefits to the approach of adding additional ConfigMaps over an approach which would augment the existing `kubeadm-config` with master-specific information: | ||
|
||
###### Parallel Node Creation | ||
|
||
Node creation in parallel is a valid use-case that works today. | ||
By adding additional ConfigMaps instead of requiring each master to modify the existing `kubeadm-config`, we avoid the need to lock on that ConfigMap. | ||
|
||
###### Guaranteed Consistent kubeadm-config | ||
|
||
This approach allows us to continue to guarantee that the main `kubeadm-config` is consistent with the actual cluster configuration. | ||
If we put master-specific information into `kubeadm-config` itself, then we would require either a yet-to-be-defined `kubeadm leave` workflow or active reconciliation of `kubeadm-config` in order to ensure accurateness. | ||
This may not be critical, but it is a consideration. | ||
|
||
With this proposal, if a node unexpectedly leaves a cluster, then at worst a dangling ConfigMap will be left in the cluster. | ||
For the case where a node is explicitly deleted, we can leverage garbage collection to automatically delete the master-specific ConfigMap by listing the node as an `ownerReference` when the ConfigMap is created. | ||
|
||
### Risks and Mitigations | ||
|
||
#### Migrating Existing Clusters | ||
|
||
There will be situations in which a kubeadm operation (e.g. upgrade) that requires the new master-specific ConfigMap is run and finds that the expected ConfigMap does not exist. | ||
For example, this will happen for users who are upgrading HA clusters that were created using the aforementioned workarounds required before kubeadm HA support is available. | ||
We can automate the creation of missing node-specific ConfigMaps in the following manner when a `kubeadm upgrade` (or other operation requiring it) is performed: | ||
|
||
1. Determine which node kubeadm is currently running on by listing all master nodes in the cluster and looking at the existing `product_uuid` field | ||
2. Get the `nodeName` from this node's metadata | ||
3. Compare the current `nodeName` to the `nodeName` in `kubeadm-config` | ||
4. If they are equal, update `kubeadm-config` to remove node-specific information (to match the new `kubeadm-config` specification) | ||
5. Create the `kubeadm-config-<machine_UID>` ConfigMap for this node | ||
6. Continue the upgrade process | ||
|
||
#### More Complex User Experience for Overriding Configuration | ||
|
||
Currently, users may override configuration items by providing a configuration file when running kubeadm. | ||
The existence of additional, disjoint ConfigMaps may make the user experience more complex for overriding configuration. | ||
One possibility for mitigating this would be to keep the `kubeadm-config` specification the same as it is today instead of removing fields. | ||
This would allow a user to specify any node-specific information in the same configuration file instead of having to provide multiple files. | ||
Placing this information in the appropriate node-specific ConfigMap would be an implementation detail not requiring any impact to user experience. | ||
|
||
## Implementation History | ||
|
||
- [Issue #546: Workarounds for the time before kubeadm HA becomes available](https://github.com/kubernetes/kubeadm/issues/546) | ||
- [Adding HA to kubeadm-deployed clusters](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg) | ||
- [Issue #706: Make kubeadm upgrade HA ready](https://github.com/kubernetes/kubeadm/issues/706) | ||
|
||
## Drawbacks | ||
|
||
This KEP introduces additional ConfigMaps for kubeadm to use (one for each master). | ||
|
||
## Alternatives | ||
|
||
An alternative approach would be to augment the existing `kubeadm-config` ConfigMap with master-specific information. | ||
The advantages over this approach are detailed in the [Proposal](#proposal) section. |