
KEP: kubeadm config HA support #1915

Closed
wants to merge 1 commit

Conversation

mattkelly

This KEP elaborates on the approach suggested by @fabriziopandini in kubernetes/kubeadm#706 (comment) to augment the existing kubeadm-config with additional master-specific ConfigMaps in order to enable HA upgrades in the future.

An alternative approach (elaborated on in the KEP) would be to augment the existing kubeadm-config directly without the use of additional ConfigMaps. The multiple ConfigMap approach makes more sense to me so that's what I decided to propose. If there is a lot of opposition to this approach, then I'm happy to draft a new KEP.

I will also work on prototyping this as we agreed. I just wanted to get something out there to get a more serious discussion moving.

cc @timothysc @luxas @fabriziopandini @stealthybox @mbert

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 9, 2018
@k8s-ci-robot k8s-ci-robot requested review from jbeda and luxas March 9, 2018 14:35
@mattkelly mattkelly force-pushed the kep-kubeadm-config branch from a6b0da0 to cf2338e Compare March 9, 2018 14:39
@k8s-github-robot k8s-github-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Mar 9, 2018
@mattkelly mattkelly force-pushed the kep-kubeadm-config branch from cf2338e to 06e8686 Compare March 9, 2018 15:22
@cblecker
Member

cblecker commented Mar 9, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 9, 2018
@timothysc timothysc requested review from timothysc, detiber and fabriziopandini and removed request for jbeda March 14, 2018 14:06
@timothysc
Member

/cc @jamiehannaford @detiber for review

@k8s-ci-robot
Contributor

@timothysc: GitHub didn't allow me to request PR reviews from the following users: for, review.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @jamiehannaford @detiber for review

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

timothysc left a comment

Generally I like the approach and +1 for a WIP to review and reference. The flow of the doc is a little weird, but that is minor imo.

/cc @fabriziopandini @detiber

- "@mattkelly"
owning-sig: sig-cluster-lifecycle
reviewers:
- TBD
Member

me

Member

  • me

last-updated: 2018-03-09
status: provisional
see-also:
- [KEP kubeadm join --master workflow](https://github.com/kubernetes/community/pull/1707) (in progress)
Member

link the HA upgrade doc, which should now be merged.

Author

Added


### Implementation Details

#### Understanding the Situation Today
Member

I'd just call this Background

Each master-specific ConfigMap will be created as part of the to-be-implemented [`kubeadm join --master` process](https://github.com/kubernetes/community/pull/1707).
Any master-specific information in the main `kubeadm-config` ConfigMap will be removed.

The names of these new ConfigMaps will be `kubeadm-config-<machine_UID>` where `machine_UID` is an identifier that is guaranteed to be unique for each node in the cluster.
Member

How and when are they removed?


The names of these new ConfigMaps will be `kubeadm-config-<machine_UID>` where `machine_UID` is an identifier that is guaranteed to be unique for each node in the cluster.
There is a precedent for using such a `machine_UID`, and in fact kubeadm already has a [prerequisite](https://kubernetes.io/docs/setup/independent/install-kubeadm/#verify-the-mac-address-and-product_uuid-are-unique-for-every-node) that such machine identifiers be unique for every node.
For the purpose of this KEP, let us assume that `machine_UID` is the full `product_uuid` of the machine that the master node is running on.
Member

Will the UID be consistent across cloud provider grand reboot and migration scenarios?


###### Parallel Node Creation

Node creation in parallel is a valid use-case that works today.
Member

I would say it eliminates the need for locking on a single configmap.

FWIW we have those primitives built into the system.

Author

Great point - fixed.

This may not be critical, but it is a consideration.

With this proposal, if a node unexpectedly leaves a cluster, then at worst a stale ConfigMap will be left in the cluster.
For the case where a node is explicitly deleted, we can leverage garbage collection to automatically delete the master-specific ConfigMap by listing the node as an `ownerReference` when the ConfigMap is created.
Member

Ahh this answers my previous question.

Author

I added a note above about deletion as well, referencing this section. I may further restructure this to make it flow better.
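To make the ownerReference idea above concrete, here is a minimal client-go sketch (not kubeadm's actual code) of how the proposed `kubeadm-config-<machine_UID>` ConfigMap could list its Node as an owner, so that deleting the Node object garbage-collects the ConfigMap; the data key name is hypothetical:

```go
package sketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createMasterConfigMap sketches the garbage-collection idea: the per-master
// ConfigMap lists its Node as an ownerReference, so deleting the Node object
// automatically deletes the ConfigMap.
func createMasterConfigMap(ctx context.Context, client kubernetes.Interface, node *corev1.Node, machineUID, nodeCfgYAML string) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("kubeadm-config-%s", machineUID),
			Namespace: metav1.NamespaceSystem, // "kube-system"
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "v1",
				Kind:       "Node",
				Name:       node.Name,
				UID:        node.UID,
			}},
		},
		Data: map[string]string{
			// Hypothetical key name; the real key would be defined by the implementation.
			"MasterNodeConfiguration": nodeCfgYAML,
		},
	}
	_, err := client.CoreV1().ConfigMaps(metav1.NamespaceSystem).Create(ctx, cm, metav1.CreateOptions{})
	return err
}
```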

For example, this will happen for users who are upgrading HA clusters that were created using the aforementioned workarounds required before kubeadm HA support is available.
In this case, there are several options:

1. Kubeadm can fall back to looking at the main `kubeadm-config`.
Member

Can't you just:

  1. scan for existing non-UID configs
  2. read one in and check whether the local name matches the name in the config; if so, delete the old one and drop in the new one.
  3. if not, create a new one.

Member

+1
We should ensure an automatic upgrade flow for single-master clusters.
For HA clusters created with the manual workarounds, IMO it is acceptable to document some manual steps as well.

Author

Yes, I believe we can automate this actually, which is great.

I was originally thinking it would be difficult / impossible due to not being able to decide whether the existing NodeName refers to the current node or not, given the lack of a UID in kubeadm-config. However, we can do this:

  1. Figure out which node we are by listing all master nodes in the cluster and looking at the existing product_uuid field
  2. Pull the NodeName from our node's metadata
  3. Do as @timothysc describes in his comment

Or maybe there's an even simpler way to determine the current node's NodeName that I'm overlooking?
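As an illustration of steps 1-2 above, a minimal client-go sketch that reads the local product_uuid and matches it against the SystemUUID each kubelet reports in node status; the sysfs path and matching logic are assumptions, not kubeadm's implementation:

```go
package sketch

import (
	"context"
	"fmt"
	"os"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// currentNodeName finds this machine's Node object by matching the local DMI
// product_uuid against the SystemUUID reported in each Node's status, then
// returns that Node's name (i.e. the existing NodeName).
func currentNodeName(ctx context.Context, client kubernetes.Interface) (string, error) {
	raw, err := os.ReadFile("/sys/class/dmi/id/product_uuid") // Linux-only; assumption
	if err != nil {
		return "", err
	}
	localUUID := strings.TrimSpace(string(raw))

	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return "", err
	}
	for _, n := range nodes.Items {
		if strings.EqualFold(n.Status.NodeInfo.SystemUUID, localUUID) {
			return n.Name, nil
		}
	}
	return "", fmt.Errorf("no Node reports SystemUUID %q", localUUID)
}
```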

The user would have to manually modify `kubeadm-config` for each master to set the `nodeName` to the current master.
This is the [recommended workaround](https://github.com/kubernetes/kubeadm/issues/546#issuecomment-365063404) today.

2. Kubeadm can provide an additional command similar to (or an extension of) the existing [`kubeadm config upload`](https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-config/) command.
Member

I think we can automate it.

@mattkelly mattkelly force-pushed the kep-kubeadm-config branch 2 times, most recently from 60fe0e8 to fdd6ed7 Compare March 14, 2018 17:18

## Summary

Currently, `kubeadm upgrade` of a master in a multi-master cluster will fail without workarounds because of a lack of node-specific master configuration in the `kubeadm-config` ConfigMap that is created at `init` time and later referenced during an upgrade.
Member

IMO it is a little bit unfair to say that the current implementation will fail (it hints at a wrong implementation).
What about rephrasing this a little and saying that kubeadm-config was originally designed to support upgrades in single-master clusters, and now, as we move towards supporting HA natively in kubeadm, this design should be improved?

Author

I agree that this could be misinterpreted - I will reword it along the lines of your suggestion.

#### Adding Additional Master-Specific ConfigMaps

The proposed solution is to add additional kubeadm ConfigMaps that are specific to each master (one ConfigMap for each master).
Each master-specific ConfigMap will be created as part of the to-be-implemented [`kubeadm join --master` process](https://github.com/kubernetes/community/pull/1707).
Member

I guess kubeadm init should also create a ConfigMap for the first master...

Author

Yup, I will clarify this. Explained in more detail in a comment below.

There is a precedent for using such a `machine_UID`, and in fact kubeadm already has a [prerequisite](https://kubernetes.io/docs/setup/independent/install-kubeadm/#verify-the-mac-address-and-product_uuid-are-unique-for-every-node) that such machine identifiers be unique for every node.
For the purpose of this KEP, let us assume that `machine_UID` is the full `product_uuid` of the machine that the master node is running on.

Kubeadm operations such as upgrade that require master-specific information should now also grab the corresponding ConfigMap for their node.
Member

Just for checking if I got this right:

  • kubeadm init will continue to use the current MasterConfiguration struct (with both cluster info and master-specific info)
  • when creating the kubeadm-config ConfigMap, the master-specific info will be stripped out and stored in a separate ConfigMap named using the machine UID
  • when joining a new master node, an additional master-specific ConfigMap must be created for the joining machine's UID
  • when upgrading, the master-specific ConfigMap for the current machine UID is read and merged together with the kubeadm-config ConfigMap into the current MasterConfiguration struct

Is that right?

Author

I was picturing this slightly differently - I think my wording is unclear and I will have to edit it to clarify:

  • The current MasterConfiguration struct type (which becomes the kubeadm-config ConfigMap) will be modified to remove master-specific entries (NodeName definitely, others TBD)
  • A new struct type MasterNodeConfiguration (don't love the name, still thinking about it in POC implementation) and corresponding kubeadm-config-<UID> will be defined to contain anything stripped from MasterConfiguration (minimally NodeName, others TBD)
  • kubeadm init will remain as-is today except it will additionally create the kubeadm-config-<UID> for the current node
  • kubeadm join --master (when it exists) will reference kubeadm-config but it will not modify it - it will only create the kubeadm-config-<UID> for the current node
  • kubeadm upgrade will reference both kubeadm-config and kubeadm-config-<UID> for this node, however no merging will be performed - it will simply look at both structs

Does this make sense? If so, I will update accordingly to clarify. The key points are that no explicit merging is performed and kubeadm-config and kubeadm-config-<UID> are disjoint.
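A rough sketch of what that split could look like as Go types; the field sets are illustrative only (NodeName is the only field called out as definitely node-specific so far), not the final API:

```go
package sketch

// MasterConfiguration keeps only cluster-wide settings and continues to back
// the shared kubeadm-config ConfigMap; NodeName (and any other per-master
// fields) would be removed from it.
type MasterConfiguration struct {
	KubernetesVersion string
	// ... other cluster-wide fields
}

// MasterNodeConfiguration (name still under discussion) holds the per-master
// settings stored in kubeadm-config-<machine_UID>.
type MasterNodeConfiguration struct {
	NodeName string
	// AdvertiseAddress string // possibly, per later discussion in this PR
}
```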

Member

What will be the expected workflow for init where a user would be overriding config that would end up in both the kubeadm-config and kubeadm-config- configmaps?

Member

@mattkelly

The key points are that no explicit merging is performed and kubeadm-config and kubeadm-config-<UID> are disjoint.
Does this make sense?

It makes sense, but I'm a little bit concerned about the impact on the user experience...
By designing two disjoint ConfigMaps, do you also expect changes in the kubeadm init user interface? e.g. from kubeadm init --config to kubeadm init --config --node-config (to pass two files instead of one)

I guess the prototype will answer those questions as well; in the meantime, if you agree, we could just take note of the possible alternative - one that provides a less clean representation of the HA cluster but preserves the current user interface (focused only on the node kubeadm is running on) while managing the complexity of the cluster behind the scenes.

Author

@fabriziopandini that's a good point, and I think also what @detiber was hinting at. I hadn't considered the user-facing impact of using disjoint ConfigMaps. I'll make a note in the meantime, and I agree that the POC will shed some light on how to best manage this.

Author

I added a note in the Risks and Mitigations section explaining this. Once I finally get the POC out, this can always be updated.

Member

fabriziopandini left a comment

@mattkelly overall it looks good; only some cosmetic changes/minor improvements + one question for checking if I got this right

## Summary

Currently, `kubeadm upgrade` of a master in a multi-master cluster will fail without workarounds because of a lack of node-specific master configuration in the `kubeadm-config` ConfigMap that is created at `init` time and later referenced during an upgrade.
In particular, node-specific information is required in order for kubeadm to identify which control plane static pods belong to the current node during an upgrade.
Contributor

this sentence is a bit unclear to me. Could we explain better how "node-specific information" relates to a static pod?

Author

Updated

In the non-HA case, having a single `nodeName` property in `kubeadm-config` that corresponds to the single master is sufficient because there is no ambiguity.
As we move towards supporting HA natively in kubeadm, a new approach is required to uniquely identify master nodes.

## Motivation
Contributor

do we need separate sections? they seem to talk about the same thing. maybe we could synthesize

Author

This is following the KEP template.


Kubeadm is driving towards natively supporting highly available clusters.
As part of HA support, a clean upgrade path is required.
The purpose of this KEP is simply to introduce support for multiple masters in the kubeadm configuration that is stored in-cluster in order to enable that clean upgrade path.
Contributor

s/simply//

Author

Updated

## Motivation

Kubeadm is driving towards natively supporting highly available clusters.
As part of HA support, a clean upgrade path is required.
Contributor

nit: do we need these on separate lines?

Author

I was questioning this myself - I ended up looking at the KEP template to check the style of the raw markdown. This is the style used there. They're not on separate lines in the generated file FWIW (need two newlines for that).

Because kubeadm is not a process that runs on the cluster (it is only run to perform operations, e.g. `init` and `upgrade`), this config is not modified during normal operation.
In the non-HA case today, it is guaranteed to be an accurate representation of the kubeadm configuration.

If kubeadm is used to create an HA cluster today, e.g. using the workarounds described in [kubeadm #546](https://github.com/kubernetes/kubeadm/issues/546) and/or @mbert's [document](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg), then the `kubeadm-config` ConfigMap will be an accurate representation except for any master node-specific information.
Contributor

shouldn't we refer to the actual HA doc too?

Author

I decided that it would be better to leave this as-is since it would be difficult to shoehorn in a reference to the HA doc here and there are plenty of links out to it from the workarounds links. It also didn't seem to fit in well with the other link sections (see-also which is ideally just to other KEPs and Implementation History).

Contributor

why is it difficult to add a link?

Author

I added a link to the official kubeadm HA doc in the see-also section.


If kubeadm is used to create an HA cluster today, e.g. using the workarounds described in [kubeadm #546](https://github.com/kubernetes/kubeadm/issues/546) and/or @mbert's [document](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg), then the `kubeadm-config` ConfigMap will be an accurate representation except for any master node-specific information.
As explained in [Challenges and Open Questions](#challenges-and-open-questions), such node-specific information is not yet well-defined but minimally consists of the master's `nodeName`.
The `nodeName` in `kubeadm-config` will correspond to the last master that happened to write to the ConfigMap.
Contributor

I kinda feel this should be described earlier in the problem definition

Author

After thinking about this, I don't really see a good way to move it up in the document. I'd like to keep the Summary free of details like this and I think that the preceding background is important to understand this detail (readers need to know that specific workarounds are needed for kubeadm HA to work today anyway).

Contributor

when reading the summary I didn't understand the problem domain. Giving details there would have facilitated better understanding IMO.

Author

I added line 54 which I hope will clarify the problem at a deeper level earlier on.


The proposed solution is to add additional kubeadm ConfigMaps that are specific to each master (one ConfigMap for each master).
Each master-specific ConfigMap will be created as part of the to-be-implemented [`kubeadm join --master` process](https://github.com/kubernetes/community/pull/1707).
Any master-specific information in the main `kubeadm-config` ConfigMap will be removed.
Contributor

can we be specific on what master-specific info?

Author

We're not yet sure exactly what fields are master-specific (see "Challenges and Open Questions" above). I could just add "(e.g. NodeName)" for now, as that's a definite.

Contributor

okay, thanks for clarifying

Any master-specific information in the main `kubeadm-config` ConfigMap will be removed.
Each master-specific ConfigMap can be automatically deleted via garbage collection (see [below](#guaranteed-consistent-kubeadm-config) for details).

The names of these new ConfigMaps will be `kubeadm-config-<machine_UID>` where `machine_UID` is an identifier that is guaranteed to be unique for each node in the cluster.
Contributor

why machine_UID and not nodeName? wouldn't the latter be more human-readable?

Author

The core problem is that we don't actually know the nodeName of the current node without retrieving it from some persistent config. It can be anything - it's not guaranteed to be the hostname, for example. That's why we need to fetch it based on a machine UID, which is what's proposed here :)

If we put master-specific information into `kubeadm-config` itself, then we would require either a yet-to-be-defined `kubeadm leave` workflow or active reconciliation of `kubeadm-config` in order to ensure accurateness.
This may not be critical, but it is a consideration.

With this proposal, if a node unexpectedly leaves a cluster, then at worst a stale ConfigMap will be left in the cluster.
Contributor

can we replace "stale" with "dangling"?

Author

Updated

@mattkelly mattkelly force-pushed the kep-kubeadm-config branch 3 times, most recently from b0dfd7a to 4a716f2 Compare March 19, 2018 15:48
@mattkelly
Author

All comments are now addressed. I'll try to get a POC out early this week.

One of the first steps of the upgrade process is to retrieve the `kubeadm-config` ConfigMap that is stored at `kubeadm init` time in order to get any relevant cluster configuration.
The current `kubeadm-config` ConfigMap was designed to support upgrades of clusters with a single master.
As we move towards supporting High Availability (HA) natively in kubeadm, the persistent configuration must be enhanced in order to store information unique to each master node.
In particular, master node-specific information is required in order for kubeadm to identify which control plane static pods belong to the current node during an upgrade (the static pods are identifiable by the `nodeName` field embedded in the pod name).
Contributor

Why is this specific to static pods?

Author

It may not be - I'll make it more clear that this is just an example of how the node-specific information is used.
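For context, the nodeName matters here because the mirror pods for static control plane pods are named `<component>-<nodeName>`, which is how kubeadm can tell its own control plane pods apart during an upgrade. A tiny illustrative sketch (the component list is assumed, not kubeadm's code):

```go
package sketch

import "fmt"

// controlPlanePodNames derives the expected mirror-pod names for this node's
// static control plane pods from the node's name.
func controlPlanePodNames(nodeName string) []string {
	components := []string{"kube-apiserver", "kube-controller-manager", "kube-scheduler"}
	names := make([]string, 0, len(components))
	for _, c := range components {
		names = append(names, fmt.Sprintf("%s-%s", c, nodeName))
	}
	return names
}
```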

In the non-HA case today, it is guaranteed to be an accurate representation of the kubeadm configuration.

If kubeadm is used to create an HA cluster today, e.g. using the workarounds described in [kubeadm #546](https://github.com/kubernetes/kubeadm/issues/546) and/or @mbert's [document](https://docs.google.com/document/d/1rEMFuHo3rBJfFapKBInjCqm2d7xGkXzh0FpFO0cRuqg), then the `kubeadm-config` ConfigMap will be an accurate representation except for any master node-specific information.
As explained in [Challenges and Open Questions](#challenges-and-open-questions), such node-specific information is not yet well-defined but minimally consists of the master's `nodeName`.
Contributor

this link doesn't go anywhere

Author

It navigates to the Challenges and Open Questions section properly for me. Maybe you were scrolled in such a way that it didn't seem like it went anywhere (the section is not far about this line)?


#### Adding Additional Master-Specific ConfigMaps

The proposed solution is to add additional kubeadm ConfigMaps that are specific to each master (one ConfigMap for each master).
Contributor

will the regular kubeadm join commands require information about these master nodes (for kubelet control)? How do they decide which master to connect to?

Member

I understand that kubeadm join will still be done through an L4/L7 LB that is fronting all of the masters.

@mattkelly mattkelly force-pushed the kep-kubeadm-config branch from 4a716f2 to 89f4748 Compare April 9, 2018 16:47
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: timothysc

Assign the PR to them by writing /assign @timothysc in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mattkelly
Author

mattkelly commented Apr 9, 2018

Made some quick updates:

  • Address all new comments
  • Update some stale links
  • Call out advertiseAddress as another master node-specific config entry as @fabriziopandini pointed out in a recent meeting

@mattkelly
Author

Adding some thoughts here regarding the actual implementation details for a proof of concept of what is proposed in this KEP. Hopefully this is useful as a potential jumping off point for discussion during one of the weekly meetings. Keep in mind that I'm still fairly new to the codebase so my opinions on complexity, etc may not be calibrated yet :).

After digging through the code and trying to introduce additional node-specific ConfigMaps (the new kubeadm-config-<product_uuid> CMs proposed here), I've now realized that this is potentially a pretty big change if we want to do it properly.

The main difficulty is that the MasterConfiguration struct (aka the "main" config) is used everywhere in kubeadm. By introducing another source for configuration, we will have to do one of the following:

  1. Define a new struct containing the "main" config and "node-specific" config and change the entire codebase to use this
    • Massive changeset
  2. Intelligently pass the node-specific config only where it is needed
    • Will probably get pretty complex / confusing, especially if we continue to transfer things from the main config to node-specific configs
  3. Merge the new node-specific config into the existing config before using it anywhere
    • Internal representation is inconsistent with external, feels pretty strange and too magical
    • May be hard to maintain depending on exactly what "merge" means

Some of the more obvious tradeoffs are listed. The complexity of all of them is compounded by the need to add (and/or modify) all of the defaulting + validation code, etc. This is then made a lot more difficult if we want to minimize negative impact on user experience, as discussed here.

For these reasons (purely from an implementation complexity + UX viewpoint), my current opinion is that we should revisit the idea of extending the existing kubeadm-config without introducing additional node-specific ConfigMaps. I believe the main considerations there would be (1) reconciling the ConfigMap on a best-effort basis to resolve inconsistencies (masters can join/leave/die, etc) and (2) managing concurrent creation attempts / access to the ConfigMap for parallel master bringup.
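As an illustration of consideration (2), optimistic concurrency on a single shared ConfigMap could be handled with client-go's conflict retry helper rather than explicit locking; the per-master data key below is purely hypothetical:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// updateSharedConfig shows how several masters joining in parallel could each
// add their own entry to a single shared kubeadm-config by retrying on
// resource-version conflicts instead of taking a lock.
func updateSharedConfig(ctx context.Context, client kubernetes.Interface, nodeName, nodeCfgYAML string) error {
	cms := client.CoreV1().ConfigMaps(metav1.NamespaceSystem)
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cm, err := cms.Get(ctx, "kubeadm-config", metav1.GetOptions{})
		if err != nil {
			return err
		}
		if cm.Data == nil {
			cm.Data = map[string]string{}
		}
		cm.Data["masters."+nodeName] = nodeCfgYAML // hypothetical per-master entry
		_, err = cms.Update(ctx, cm, metav1.UpdateOptions{})
		return err // a Conflict error here triggers a fresh Get and another attempt
	})
}
```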

cc @timothysc @detiber @fabriziopandini @liztio @stealthybox

@timothysc
Member

I'm going to put a hold on this until we KEP out well defined versioning of the config file.

/cc @liztio

@k8s-ci-robot
Contributor

@timothysc: GitHub didn't allow me to request PR reviews from the following users: liztio.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

I'm going to put a hold on this until we KEP out well defined versioning of the config file.

/cc @liztio

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@timothysc timothysc added area/kubeadm do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 12, 2018
* [Drawbacks](#drawbacks)
* [Alternatives](#alternatives)

## Summary
Contributor

it seems that your summary is a great motivation and vice versa but that's just how it read to me

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 29, 2018
@neolit123
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 29, 2018
@timothysc
Member

We're going to close this one as @fabriziopandini's PRs are currently up or in.

@timothysc timothysc closed this Aug 29, 2018
Labels
area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.