KEP: kubeadm config HA support

Matt Kelly committed Mar 19, 2018 (commit 4a716f2, parent cee5f0b), adding `keps/sig-cluster-lifecycle/kubeadm-config-ha-support.md` (1 changed file, 161 additions, 0 deletions).

---
kep-number: TBD
title: Augment Kubeadm Config to Enable Upgrades of HA Clusters
authors:
- "@mattkelly"
owning-sig: sig-cluster-lifecycle
reviewers:
- "@timothysc"
approvers:
- TBD
editor:
- "@mattkelly"
creation-date: 2018-03-08
last-updated: 2018-03-19
status: provisional
see-also:
- [KEP kubeadm join --master workflow](https://github.com/kubernetes/community/pull/1707) (in progress)
- [Upgrading kubeadm HA clusters from 1.9.x to 1.9.y](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm-upgrade-ha)
---

# Augment Kubeadm Config to Enable Upgrades of HA Clusters

## Table of Contents

* [Augment Kubeadm Config to Enable Upgrades of HA Clusters](#augment-kubeadm-config-to-enable-upgrades-of-ha-clusters)
  * [Table of Contents](#table-of-contents)
  * [Summary](#summary)
  * [Motivation](#motivation)
    * [Goals](#goals)
    * [Non-Goals](#non-goals)
    * [Challenges and Open Questions](#challenges-and-open-questions)
  * [Proposal](#proposal)
    * [Implementation Details](#implementation-details)
      * [Background](#background)
      * [Adding Additional Master-Specific ConfigMaps](#adding-additional-master-specific-configmaps)
        * [Key Design Considerations and Benefits](#key-design-considerations-and-benefits)
          * [Parallel Node Creation](#parallel-node-creation)
          * [Guaranteed Consistent kubeadm-config](#guaranteed-consistent-kubeadm-config)
    * [Risks and Mitigations](#risks-and-mitigations)
      * [Migrating Existing Clusters](#migrating-existing-clusters)
      * [More Complex User Experience for Overriding Configuration](#more-complex-user-experience-for-overriding-configuration)
  * [Implementation History](#implementation-history)
  * [Drawbacks](#drawbacks)
  * [Alternatives](#alternatives)

## Summary

One of the first steps of the upgrade process is to retrieve the `kubeadm-config` ConfigMap, which is stored at `kubeadm init` time, in order to obtain the relevant cluster configuration.
The current `kubeadm-config` ConfigMap was designed to support upgrades of clusters with a single master.
As we move towards supporting High Availability (HA) natively in kubeadm, the persistent configuration must be enhanced in order to store information unique to each master node.
In particular, master node-specific information is required in order for kubeadm to identify which control plane static pods belong to the current node during an upgrade (the static pods are identifiable by the `nodeName` suffix in each pod name, e.g. `kube-apiserver-<nodeName>`).
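
To make the naming concrete, the sketch below derives the mirror pod names that kubeadm would look for on a given master. It is illustrative only. It assumes a local etcd member on each master and the kubelet convention of naming a static pod's mirror pod `<pod-name>-<nodeName>`. `static_pod_names` is a hypothetical helper, not a kubeadm function.

```python
from __future__ import annotations

# Control plane components whose static pods run on each master
# (assumption: every master also runs a local etcd member).
CONTROL_PLANE_COMPONENTS = (
    "etcd",
    "kube-apiserver",
    "kube-controller-manager",
    "kube-scheduler",
)


def static_pod_names(node_name: str) -> list[str]:
    """Return the mirror pod names expected for the control plane on node_name.

    Assumes the kubelet convention that a static pod's mirror pod is named
    "<pod-name>-<nodeName>".
    """
    if not node_name:
        raise ValueError("node_name must be non-empty")
    return [f"{component}-{node_name}" for component in CONTROL_PLANE_COMPONENTS]


print(static_pod_names("master-1"))
```

With `nodeName` set to `master-1`, the expected names are `etcd-master-1`, `kube-apiserver-master-1`, and so on. A wrong `nodeName` therefore points the upgrade at another master's pods.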

This KEP outlines a possible solution for adding and retrieving master node-specific information through the use of additional kubeadm ConfigMaps.

## Motivation

Kubeadm is driving towards natively supporting highly available clusters.
As part of HA support, a clean upgrade path is required.
The purpose of this KEP is to introduce support for multiple masters in the kubeadm configuration that is stored in-cluster in order to enable that clean upgrade path.

### Goals

Enable `kubeadm upgrade` of highly available clusters by augmenting the existing persistent kubeadm configuration.

### Non-Goals

This proposal does not aim to solve the entire problem of upgrading HA clusters.
This KEP specifically tackles the persistent configuration problem so that the information required at upgrade time is available.

### Challenges and Open Questions

The final implementation of this KEP will require deciding exactly what "master node-specific information" means.
Currently, the `nodeName` of the master is the only entry that is unarguably node-specific.
However, additional config entries may also be split out into the node-specific area(s) of the config.
This could result in asymmetric configuration across the masters, which may or may not be something that we wish to support.

## Proposal

### Implementation Details

#### Background

Currently, the `kubeadm-config` ConfigMap in the `kube-system` namespace serves as the single source of truth for how kubeadm has been used to create and modify a cluster.
Because kubeadm is not a process that runs on the cluster (it is only run to perform operations, e.g. `init` and `upgrade`), this config is not modified during normal operation.
In the non-HA case today, it is guaranteed to be an accurate representation of the kubeadm configuration.

If kubeadm is used to create an HA cluster today, e.g. using the workarounds described in [kubeadm #546](https://github.com/kubernetes/kubeadm/issues/546) and/or @mbert's [document](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg), then the `kubeadm-config` ConfigMap will be an accurate representation except for any master node-specific information.
As explained in [Challenges and Open Questions](#challenges-and-open-questions), such node-specific information is not yet well-defined but minimally consists of the master's `nodeName`.
The `nodeName` in `kubeadm-config` will correspond to the last master that happened to write to the ConfigMap.
In the case of parallel node creation, this may not be well-defined.
When `kubeadm upgrade` is run on a master and this `nodeName` is fetched, it may belong to a different master, in which case the upgrade process will fail.
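
The last-writer-wins behavior can be illustrated with a small in-memory simulation. This is a hypothetical sketch, not kubeadm code. A dictionary stands in for the shared `kubeadm-config` ConfigMap, and `init_or_join_master` and `upgrade_target` are invented names.

```python
# In-memory stand-in for the single shared kubeadm-config ConfigMap (hypothetical).
shared_kubeadm_config = {}


def init_or_join_master(node_name: str) -> None:
    # Every master writes its own nodeName into the same ConfigMap,
    # overwriting whatever the previous master wrote (last writer wins).
    shared_kubeadm_config["nodeName"] = node_name


def upgrade_target(current_node: str) -> str:
    # The upgrade reads nodeName to locate this node's static pods and
    # fails if the stored value belongs to another master.
    stored = shared_kubeadm_config["nodeName"]
    if stored != current_node:
        raise RuntimeError(
            f"kubeadm-config names {stored!r}, but the upgrade runs on {current_node!r}"
        )
    return stored


for name in ("master-1", "master-2", "master-3"):
    init_or_join_master(name)

try:
    upgrade_target("master-1")
except RuntimeError as err:
    print("upgrade fails:", err)
```

Only the master that happened to write last (`master-3` here) can be upgraded. With parallel creation, which master that is cannot be predicted.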

#### Adding Additional Master-Specific ConfigMaps

The proposed solution is to add additional kubeadm ConfigMaps that are specific to each master (one ConfigMap for each master).
Each master-specific ConfigMap will be created as part of the `kubeadm init` process for the initial master and as part of the to-be-implemented [`kubeadm join --master` process](https://github.com/kubernetes/community/pull/1707) for additional masters.
Any master-specific information in the main `kubeadm-config` ConfigMap will be removed.
Each master-specific ConfigMap can be automatically deleted via garbage collection (see [below](#guaranteed-consistent-kubeadm-config) for details).

The names of these new ConfigMaps will be `kubeadm-config-<machine_UID>` where `machine_UID` is an identifier that is guaranteed to be unique for each node in the cluster.
There is a precedent for using such a `machine_UID`, and in fact kubeadm already has a [prerequisite](https://kubernetes.io/docs/setup/independent/install-kubeadm/#verify-the-mac-address-and-product_uuid-are-unique-for-every-node) that such machine identifiers be unique for every node.
For the purpose of this KEP, let us assume that `machine_UID` is the full `product_uuid` of the machine that the master node is running on.
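
A sketch of how the per-master ConfigMap name could be derived follows. It rests on stated assumptions. On Linux, the `product_uuid` is readable from `/sys/class/dmi/id/product_uuid`, typically only by root. ConfigMap names must be valid DNS-1123 subdomains: lowercase alphanumerics, `-` and `.`, at most 253 characters. Because `product_uuid` is often reported in upper case, the sketch lowercases it first. That normalization step and the helper names are assumptions of this example, not part of the proposal.

```python
from __future__ import annotations

import re

# DNS-1123 subdomain rule that Kubernetes applies to ConfigMap names.
DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
# Linux location of the machine's product_uuid (usually root-readable only).
PRODUCT_UUID_PATH = "/sys/class/dmi/id/product_uuid"


def read_product_uuid(path: str = PRODUCT_UUID_PATH) -> str:
    """Read this machine's product_uuid (hypothetical helper)."""
    with open(path, encoding="ascii") as f:
        return f.read().strip()


def master_config_map_name(machine_uid: str) -> str:
    """Build the kubeadm-config-<machine_UID> name, lowercased to be a valid object name."""
    name = f"kubeadm-config-{machine_uid.strip().lower()}"
    if len(name) > 253 or not DNS1123_SUBDOMAIN.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid ConfigMap name")
    return name
```

On a master, `master_config_map_name(read_product_uuid())` would yield a name of the form `kubeadm-config-<lowercased product_uuid>`.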

Kubeadm operations such as upgrade that require master-specific information should now also retrieve the corresponding ConfigMap for their node.
This master-specific configuration will be explicitly provided to any functions that require it instead of e.g. merging everything into one configuration.
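
The difference between passing the master-specific configuration explicitly and merging it can be sketched with two separate types. The type, field and function names below are hypothetical. They only mirror the split this KEP describes: cluster-wide settings live in `kubeadm-config`, and `nodeName` lives in the per-master ConfigMap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Cluster-wide settings, as stored in the main kubeadm-config ConfigMap."""
    kubernetes_version: str


@dataclass(frozen=True)
class MasterConfig:
    """Master-specific settings, as stored in kubeadm-config-<machine_UID>."""
    node_name: str


def plan_control_plane_upgrade(
    cluster: ClusterConfig, master: MasterConfig, target_version: str
) -> list[str]:
    # Both configurations are explicit parameters, so a caller cannot pick up
    # another master's nodeName from a merged structure by accident.
    return [
        f"upgrade {component}-{master.node_name}: "
        f"{cluster.kubernetes_version} -> {target_version}"
        for component in ("kube-apiserver", "kube-controller-manager", "kube-scheduler")
    ]
```

A function that needs node-specific data states so in its signature. This is the main benefit over folding everything into one configuration object.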

##### Key Design Considerations and Benefits

There are a few key benefits to the approach of adding additional ConfigMaps over an approach which would augment the existing `kubeadm-config` with master-specific information:

###### Parallel Node Creation

Node creation in parallel is a valid use case that works today.
By adding additional ConfigMaps instead of requiring each master to modify the existing `kubeadm-config`, we avoid the need to lock on that ConfigMap.
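
A small simulation shows why per-master ConfigMaps avoid coordination. `FakeConfigMapStore` is a hypothetical in-memory stand-in for the API server. It models only create-if-absent semantics, where creating a name that already exists fails. Concurrent masters never conflict because each one creates a distinct name. Updating one shared object from several masters would instead need a lock or some form of compare-and-swap.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor


class AlreadyExists(Exception):
    """Raised when a ConfigMap with the same name already exists."""


class FakeConfigMapStore:
    """Hypothetical in-memory stand-in for ConfigMaps in kube-system."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # models the server-side atomic create
        self._data: dict[str, dict[str, str]] = {}

    def create(self, name: str, data: dict[str, str]) -> None:
        with self._lock:
            if name in self._data:
                raise AlreadyExists(name)
            self._data[name] = dict(data)

    def names(self) -> list[str]:
        with self._lock:
            return sorted(self._data)


def join_master(store: FakeConfigMapStore, machine_uid: str, node_name: str) -> None:
    # Each master creates only its own ConfigMap, so masters never write
    # the same object and need no client-side locking.
    store.create(f"kubeadm-config-{machine_uid}", {"nodeName": node_name})


store = FakeConfigMapStore()
masters = [("uid-a", "master-1"), ("uid-b", "master-2"), ("uid-c", "master-3")]
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(join_master, store, uid, name) for uid, name in masters]
    for future in futures:
        future.result()  # would re-raise AlreadyExists if two masters collided

print(store.names())
```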

###### Guaranteed Consistent kubeadm-config

This approach allows us to continue to guarantee that the main `kubeadm-config` is consistent with the actual cluster configuration.
If we put master-specific information into `kubeadm-config` itself, then we would require either a yet-to-be-defined `kubeadm leave` workflow or active reconciliation of `kubeadm-config` in order to keep it accurate.
This may not be critical, but it is a consideration.

With this proposal, if a node unexpectedly leaves a cluster, then at worst a dangling ConfigMap will be left in the cluster.
For the case where a node is explicitly deleted, we can leverage garbage collection to automatically delete the master-specific ConfigMap by listing the node as an `ownerReference` when the ConfigMap is created.
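
For example, the per-master ConfigMap could be created with the master's Node object as its owner. This sketch builds the manifest as a plain dictionary. The `ownerReferences` fields (`apiVersion`, `kind`, `name`, `uid`) are the standard Kubernetes ones. The `nodeName` data key and the helper name are assumptions of this example. The owner `uid` is the Node object's `metadata.uid`, not the machine's `product_uuid`. A namespaced object such as a ConfigMap in `kube-system` may be owned by a cluster-scoped object such as a Node.

```python
from __future__ import annotations

import json


def master_config_map(machine_uid: str, node_name: str, node_object_uid: str) -> dict:
    """Build a hypothetical per-master ConfigMap manifest owned by its Node."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"kubeadm-config-{machine_uid.lower()}",
            "namespace": "kube-system",
            # When the owning Node is deleted, the garbage collector
            # deletes this ConfigMap as well.
            "ownerReferences": [
                {
                    "apiVersion": "v1",
                    "kind": "Node",
                    "name": node_name,
                    "uid": node_object_uid,  # the Node's metadata.uid
                }
            ],
        },
        "data": {"nodeName": node_name},
    }


print(json.dumps(master_config_map("ABCD-1234", "master-1", "example-node-uid"), indent=2))
```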

### Risks and Mitigations

#### Migrating Existing Clusters

There will be situations in which a kubeadm operation (e.g. upgrade) that requires the new master-specific ConfigMap is run and finds that the expected ConfigMap does not exist.
For example, this will happen for users who are upgrading HA clusters that were created using the aforementioned workarounds required before kubeadm HA support is available.
We can automate the creation of missing node-specific ConfigMaps in the following manner when a `kubeadm upgrade` (or other operation requiring it) is performed:

1. Determine which node kubeadm is currently running on by listing all master nodes in the cluster and finding the one whose reported system UUID (`status.nodeInfo.systemUUID`) matches the local machine's `product_uuid`
2. Get the `nodeName` from this node's metadata
3. Compare the current `nodeName` to the `nodeName` in `kubeadm-config`
4. If they are equal, update `kubeadm-config` to remove node-specific information (to match the new `kubeadm-config` specification)
5. Create the `kubeadm-config-<machine_UID>` ConfigMap for this node
6. Continue the upgrade process
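
The migration steps above can be sketched against in-memory data. This minimal sketch makes three assumptions. ConfigMaps are plain dictionaries keyed by name. Each master node is represented by its `nodeName` and the UUID it reports. `nodeName` is the only node-specific key. `migrate_master_config` is a hypothetical helper, not an existing kubeadm function.

```python
from __future__ import annotations

# Assumption: nodeName is the only node-specific entry in kubeadm-config.
NODE_SPECIFIC_KEYS = ("nodeName",)


def migrate_master_config(
    config_maps: dict[str, dict[str, str]],
    master_nodes: dict[str, str],
    local_product_uuid: str,
) -> str:
    """Create the missing per-master ConfigMap for this machine; return its name.

    config_maps maps ConfigMap name -> data; master_nodes maps each master's
    nodeName -> the UUID that node reports. Both are in-memory stand-ins.
    """
    uid = local_product_uuid.strip().lower()

    # Steps 1-2: find the master node running on this machine and take its nodeName.
    matches = [name for name, node_uid in master_nodes.items() if node_uid.lower() == uid]
    if len(matches) != 1:
        raise RuntimeError(f"expected one master with UUID {uid!r}, found {matches}")
    node_name = matches[0]

    per_master_name = f"kubeadm-config-{uid}"
    if per_master_name in config_maps:
        return per_master_name  # already migrated (extra idempotency check)

    # Steps 3-4: strip node-specific entries only if they describe this node;
    # otherwise they belong to another master, which will migrate them itself.
    main_config = config_maps["kubeadm-config"]
    if main_config.get("nodeName") == node_name:
        for key in NODE_SPECIFIC_KEYS:
            main_config.pop(key, None)

    # Step 5: create this node's ConfigMap; step 6 (the upgrade) continues afterwards.
    config_maps[per_master_name] = {"nodeName": node_name}
    return per_master_name
```

Running this on each master in turn migrates the whole cluster. The shared `nodeName` is removed only when the master it actually describes performs the migration.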

#### More Complex User Experience for Overriding Configuration

Currently, users may override configuration items by providing a configuration file when running kubeadm.
The existence of additional, disjoint ConfigMaps may make the user experience more complex for overriding configuration.
One possibility for mitigating this would be to keep the `kubeadm-config` specification the same as it is today instead of removing fields.
This would allow a user to specify any node-specific information in the same configuration file instead of having to provide multiple files.
Placing this information in the appropriate node-specific ConfigMap would be an implementation detail not requiring any impact to user experience.
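
One way to preserve the single-file experience is to split the user-supplied configuration internally. The sketch below assumes the configuration has already been parsed into a flat dictionary. Pending the discussion in [Challenges and Open Questions](#challenges-and-open-questions), it treats only `nodeName` as node-specific. `split_user_config` is a hypothetical name.

```python
from __future__ import annotations

# Assumption: nodeName is the only node-specific entry (see Challenges and Open Questions).
NODE_SPECIFIC_KEYS = frozenset({"nodeName"})


def split_user_config(
    user_config: dict[str, object],
) -> tuple[dict[str, object], dict[str, object]]:
    """Split one user-supplied config into (cluster-wide, node-specific) parts."""
    cluster_wide = {k: v for k, v in user_config.items() if k not in NODE_SPECIFIC_KEYS}
    node_specific = {k: v for k, v in user_config.items() if k in NODE_SPECIFIC_KEYS}
    return cluster_wide, node_specific


cluster_wide, node_specific = split_user_config(
    {"kubernetesVersion": "v1.10.0", "nodeName": "master-1"}
)
# cluster_wide would be stored in kubeadm-config,
# node_specific in kubeadm-config-<machine_UID>.
```

The user still edits one file. Only the storage layout changes.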

## Implementation History

- [Issue #546: Workarounds for the time before kubeadm HA becomes available](https://github.com/kubernetes/kubeadm/issues/546)
- [Adding HA to kubeadm-deployed clusters](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg)
- [Issue #706: Make kubeadm upgrade HA ready](https://github.com/kubernetes/kubeadm/issues/706)

## Drawbacks

This KEP introduces additional ConfigMaps for kubeadm to use (one for each master).

## Alternatives

An alternative approach would be to augment the existing `kubeadm-config` ConfigMap with master-specific information.
The advantages of the proposed approach over this alternative are detailed in [Key Design Considerations and Benefits](#key-design-considerations-and-benefits).
