reflect pod-blocking issues into member status #562

Closed
joel-bluedata opened this issue Feb 17, 2022 · 4 comments

Comments

@joel-bluedata
Member

joel-bluedata commented Feb 17, 2022

There are various reasons why a K8s pod might not come up right away, e.g.:

  • bad image / inaccessible repo
  • can't schedule because of resources
  • can't schedule because of node taints
  • can't schedule because of affinity constraints
  • can't use PV in exclusive mode because it's already in use (see PR #561, "Support for mounting user created volumes")

...and probably others, but we're mainly concerned with the common cases. Currently, discovering these issues requires digging into pod status, which in turn requires some set of K8s API privileges and some knowledge of where to look. We don't want to mirror the whole pod status in our status -- that would be a lot of content and wouldn't really help with any of the know-where-to-look issues.

Instead, we need to surface some summary of common issues (like those above) into the per-member state.

Note that all the things I listed so far apply when the member is in "create pending" state and its lastKnownContainerState is "waiting" or "absent" -- this also sets the "membersWaiting" flag in the rollup status. So this work would be about drilling further into those situations.

I do also wonder if it would be useful to have a separate rollup flag for that "absent" state.

==========

Other than hard errors like the above, there are also fuzzier "taking too long" kinds of problems. We're already tackling one of those in issue #540 for PV initialization. There are a couple of other things we might be able to give insight on, like time spent waiting on image load or time spent running the config script. Not sure about that. Maybe that just boils down to "time spent in current member state" for transitional states?

==========

Along with issue #540 and issue #547, this work would improve the user experience for launching a kdcluster in the situations where anything is wrong.

@joel-bluedata
Member Author

joel-bluedata commented Feb 18, 2022

Something to ponder as an additional change: how proactive could we be about warning the kdcluster-submitter that the spec is likely not to be schedulable? This is a hard problem (and ultimately best-effort, since even if we give something the green light it could still fail), but we get an interesting number of requests for that.

Edit: this just seems like way too much to bite off, unless and until K8s introduces a scheduling dry-run capability. Without that capability we'd more or less need to replicate the K8s scheduling logic internally in KD, although we could make some simplifying assumptions. It would be good to research the OpenShift cluster capacity tool, which at first glance is attacking the same problem.

@joel-bluedata
Member Author

Let's look at the steps involved in, for example, advertising the "can't schedule" cases. For this exercise let's bundle them all together and just distinguish between them in the actual text of the error message, since I think that's what K8s does.

When this happens for a member, the per-member status currently will show:

```yaml
state: create pending
stateDetail:
  lastKnownContainerState: absent
```

So we can hook into the KD code that populates this status information to further investigate the pod and see if it is in the "not enough resources" case.

What should that investigation actually look like?

Well, if we look at the pod's status.conditions, it will have something like this:

```yaml
lastProbeTime: null
lastTransitionTime: "2022-03-09T00:27:11Z"
message: '0/4 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master:
  }, that the pod didn''t tolerate, 3 Insufficient cpu.'
reason: Unschedulable
status: "False"
type: PodScheduled
```

That message is interesting and seems to encapsulate what we want to expose to the user. As a strawman for how the per-member status will be modified, let's say that we want to add a "schedulingErrorMessage" field to that object, which will be populated with this message when we're in this situation. We may well want to modify this arrangement once we start looking at the other things we want to advertise, but let's use it as a working assumption.
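Under that working assumption, the per-member status for an unschedulable member would look something like this (a sketch only; the exact field placement is just the strawman above):

```yaml
state: create pending
stateDetail:
  lastKnownContainerState: absent
  schedulingErrorMessage: '0/4 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master:
    }, that the pod didn''t tolerate, 3 Insufficient cpu.'
```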

So OK, let's hop over to the KD code for a bit.

To add that field, we'll need to modify kubedirector.hpe.com_kubedirectorclusters_crd.yaml to add schedulingErrorMessage as another string field under stateDetail. Let's also say that we add a membersNotScheduled boolean field under memberStateRollup.
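As a rough sketch of those CRD edits (assuming the usual openAPIV3Schema property layout; the exact nesting in the real CRD file may differ):

```yaml
# sketch: under the per-member stateDetail properties
schedulingErrorMessage:
  type: string
# sketch: under the memberStateRollup properties
membersNotScheduled:
  type: boolean
```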

In kubedirectorcluster_types.go we'll need to correspondingly update the MemberStateDetail and StateRollup structures. For SchedulingErrorMessage in MemberStateDetail, let's make it a string pointer (rather than just a string) so it's clear when it is or isn't populated.
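A minimal sketch of those type changes, with existing fields elided and JSON tags assumed to match the CRD field names above:

```go
// Sketch only; surrounding fields and exact tags are assumptions.
type MemberStateDetail struct {
	// ...existing fields...
	LastKnownContainerState string `json:"lastKnownContainerState"`

	// SchedulingErrorMessage carries the scheduler's explanation when the
	// member's pod cannot be scheduled. A pointer distinguishes "not set"
	// from "empty string".
	SchedulingErrorMessage *string `json:"schedulingErrorMessage,omitempty"`
}

type StateRollup struct {
	// ...existing fields...
	MembersWaiting bool `json:"membersWaiting"`

	// MembersNotScheduled is true if any member's pod is currently
	// reported as unschedulable.
	MembersNotScheduled bool `json:"membersNotScheduled"`
}
```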

Now, where can we put the code to populate these? The place to do the pod investigation would be wherever that lastKnownContainerState is being set to "absent". In the KD code this value is actually a constant called containerMissing. That is set in the checkContainerStates function, so let's look at that function.

checkContainerStates actually starts with an assumption of lastKnownContainerState=containerMissing, then loads the Pod, then tries to determine if lastKnownContainerState can be set to something else. If it goes through that code and still ends up thinking that lastKnownContainerState=containerMissing, it can look into the status.conditions of the loaded Pod object. Probably by calling some subroutine, since this function is getting big.

That subroutine can look through the pod's status.conditions, looking for a condition where Type == corev1.PodScheduled. If it finds that condition AND sees that the Reason == corev1.PodReasonUnschedulable, it should set our new SchedulingErrorMessage field to whatever is in the condition's Message.

(Note that it's possible for an unschedulable pod to BECOME schedulable if things change, so we also need to think about clearing out that field when necessary. Probably we could just always clear it in checkContainerStates when we first start processing the per-member status.)
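A minimal sketch of what such a subroutine could look like (the function name, package, and the extra check of the condition's Status are my assumptions, not existing KD code):

```go
package kubedirectorcluster // hypothetical; wherever checkContainerStates lives

import (
	corev1 "k8s.io/api/core/v1"
)

// schedulingErrorFromPod returns the scheduler's message if the pod is
// currently marked unschedulable, or nil otherwise. The caller is assumed to
// have already cleared any previously recorded message for this member.
func schedulingErrorFromPod(pod *corev1.Pod) *string {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodScheduled &&
			cond.Status == corev1.ConditionFalse && // extra safety check beyond Reason
			cond.Reason == corev1.PodReasonUnschedulable {
			msg := cond.Message
			return &msg
		}
	}
	return nil
}
```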

Last question then would be where to set the rollup boolean of MembersNotScheduled. Rollups are set in the updateStateRollup function, and adding another one should be straightforward. If we want to be laser-targeted about it: in the memberCreatePending case, when LastKnownContainerState == containerMissing, we can additionally check whether SchedulingErrorMessage is non-nil and, if so, set MembersNotScheduled.
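As a hedged illustration of that check (reusing the assumed types from the sketch above, plus a minimal MemberStatus wrapper; the real updateStateRollup loop is organized differently and uses the memberCreatePending / containerMissing constants rather than string literals):

```go
// Assumed minimal shape of a per-member status entry, for this sketch only.
type MemberStatus struct {
	State       string
	StateDetail MemberStateDetail
}

// markNotScheduled sets MembersNotScheduled if any create-pending member with
// a missing container also has a scheduling error message recorded.
func markNotScheduled(members []MemberStatus, rollup *StateRollup) {
	for _, m := range members {
		if m.State == "create pending" && // memberCreatePending in real code
			m.StateDetail.LastKnownContainerState == "absent" && // containerMissing
			m.StateDetail.SchedulingErrorMessage != nil {
			rollup.MembersNotScheduled = true
			return
		}
	}
}
```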

@joel-bluedata
Member Author

We'll do the above-described "unschedulable" status advertisement as the first cut, to definitely be in the next minor KD release. Advertisement of bad image, inaccessible repo, and already-in-use-PV will be investigated next, and probably not handled for this next KD release.

The scheduling dry run is off the table for now and I'll update the relevant comment above accordingly.

@joel-bluedata
Member Author

A change has been added to partially address this. I'll split this issue so we can track the future work separately.
