Troubleshooting for job pod finalizers #43773

Merged
5 changes: 5 additions & 0 deletions content/en/docs/concepts/workloads/controllers/job.md
@@ -938,6 +938,11 @@ creates Pods with the finalizer `batch.kubernetes.io/job-tracking`. The
controller removes the finalizer only after the Pod has been accounted for in
the Job status, allowing the Pod to be removed by other controllers or users.
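To illustrate the ordering (count first, release second), here is a simplified, hypothetical model in Python. It is not the Job controller's code: `Pod`, `JobStatus` and `sync` are invented for this sketch, and the real controller records its progress in the Job's `.status` through the API server rather than in memory.

```python
from dataclasses import dataclass, field

TRACKING_FINALIZER = "batch.kubernetes.io/job-tracking"


@dataclass
class Pod:  # hypothetical, minimal stand-in for a Pod object
    name: str
    phase: str  # "Pending", "Running", "Succeeded" or "Failed"
    finalizers: list = field(default_factory=lambda: [TRACKING_FINALIZER])


@dataclass
class JobStatus:  # hypothetical, minimal stand-in for a Job's status
    succeeded: int = 0
    failed: int = 0
    counted: set = field(default_factory=set)  # Pods already reflected in the counters


def sync(status: JobStatus, pods: list) -> None:
    """One simplified reconcile pass over the Pods of a Job."""
    for pod in pods:
        if pod.phase not in ("Succeeded", "Failed"):
            continue  # unfinished Pods keep the finalizer
        # Step 1: account for the finished Pod in the Job status.
        if pod.name not in status.counted:
            if pod.phase == "Succeeded":
                status.succeeded += 1
            else:
                status.failed += 1
            status.counted.add(pod.name)
        # Step 2: only then remove the finalizer, so the Pod can be deleted.
        if TRACKING_FINALIZER in pod.finalizers:
            pod.finalizers.remove(TRACKING_FINALIZER)
```

The `counted` set makes a pass safe to repeat: if removing a finalizer fails, the next pass retries the removal without counting the Pod twice.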

{{< note >}}
See [My pod stays terminating](/docs/tasks/debug/debug-application/debug-pods/#my-pod-stays-terminating) if you
observe that Pods from a Job are stuck with the tracking finalizer.
{{< /note >}}

### Elastic Indexed Jobs

{{< feature-state for_k8s_version="v1.27" state="beta" >}}
28 changes: 28 additions & 0 deletions content/en/docs/tasks/debug/debug-application/debug-pods.md
@@ -69,6 +69,34 @@ There are three things to check:
* Try to manually pull the image to see if the image can be pulled. For example,
if you use Docker on your PC, run `docker pull <image>`.


#### My pod stays terminating

If a Pod is stuck in the `Terminating` state, it means that a deletion has been
issued for the Pod, but the control plane is unable to delete the Pod object.

This typically happens if the Pod has a [finalizer](/docs/concepts/overview/working-with-objects/finalizers/)
and there is an [admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/)
installed in the cluster that prevents the control plane from removing the
finalizer.
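A Pod is reported as `Terminating` once its `metadata.deletionTimestamp` is set, and the object cannot be removed while its `metadata.finalizers` list is non-empty. The hypothetical helper below is a minimal sketch that lists such Pods and the finalizers still holding them, assuming you feed it the JSON printed by `kubectl get pods --all-namespaces -o json`.

```python
import json


def stuck_pods(pod_list_json: str) -> dict:
    """Map "namespace/name" to the remaining finalizers of every Pod that has
    a pending deletion (metadata.deletionTimestamp is set) but still lists
    at least one finalizer."""
    stuck = {}
    for pod in json.loads(pod_list_json).get("items", []):
        meta = pod.get("metadata", {})
        if meta.get("deletionTimestamp") and meta.get("finalizers"):
            key = f'{meta.get("namespace", "")}/{meta.get("name", "")}'
            stuck[key] = list(meta["finalizers"])
    return stuck
```

For Pods created by a Job, the result typically includes `batch.kubernetes.io/job-tracking`. Pods that are terminating without any finalizer are skipped, since they are usually waiting on graceful shutdown rather than on a finalizer.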

To identify this scenario, check whether your cluster has any
ValidatingWebhookConfiguration or MutatingWebhookConfiguration objects that
intercept `UPDATE` operations on `pods` resources.
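As a rough sketch of that check, the hypothetical helper below scans the JSON printed by `kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json` and reports every webhook with a rule whose `operations`, `apiGroups` and `resources` cover `UPDATE` on core-group `pods` (including the `*` and `*/*` wildcards). It deliberately ignores `namespaceSelector`, `objectSelector` and `matchConditions`, so treat its output as a list of candidates to inspect.

```python
import json


def _covers(values, wanted):
    # A rule field matches if it names the value explicitly or uses "*".
    return wanted in values or "*" in values


def webhooks_intercepting_pod_updates(config_list_json: str) -> list:
    """Return (kind, configuration name, webhook name) for every webhook
    with a rule that intercepts UPDATE on core-group pods."""
    hits = []
    for config in json.loads(config_list_json).get("items", []):
        for hook in config.get("webhooks") or []:
            for rule in hook.get("rules") or []:
                if (_covers(rule.get("operations", []), "UPDATE")
                        and _covers(rule.get("apiGroups", []), "")  # "" is the core group
                        and any(r in ("pods", "*", "*/*") for r in rule.get("resources", []))):
                    hits.append((config.get("kind"), config["metadata"]["name"], hook.get("name")))
                    break  # one matching rule is enough for this webhook
    return hits
```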

If the webhook is provided by a third party:
- Make sure you are using the latest version.
- Disable the webhook for `UPDATE` operations.
- Report an issue with the corresponding provider.
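If you disable `UPDATE` interception yourself, one way is to edit the rules of the webhook configuration. The hypothetical helper below is a minimal sketch: it takes a ValidatingWebhookConfiguration or MutatingWebhookConfiguration as a Python dict (for example, loaded from `kubectl get mutatingwebhookconfiguration <name> -o json`) and returns a copy whose rules no longer include `UPDATE`. It drops `UPDATE` for every resource a rule lists, not only `pods`, and it assumes the provider's own tooling will not revert the change, so review the result before applying it, for example with `kubectl replace -f`.

```python
import copy

# The operations an admission rule can name; "*" stands for all of them.
ALL_OPERATIONS = ["CREATE", "UPDATE", "DELETE", "CONNECT"]


def without_update_operations(config: dict) -> dict:
    """Return a copy of a webhook configuration without UPDATE in its rules."""
    patched = copy.deepcopy(config)  # never modify the caller's object
    for hook in patched.get("webhooks") or []:
        rules = hook.get("rules")
        if not rules:
            continue
        kept = []
        for rule in rules:
            ops = rule.get("operations", [])
            if "*" in ops:
                ops = ALL_OPERATIONS  # expand so the other operations stay covered
            ops = [op for op in ops if op != "UPDATE"]
            if ops:  # a rule left with no operations is dropped entirely
                rule["operations"] = ops
                kept.append(rule)
        hook["rules"] = kept
    return patched
```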

If you are the author of the webhook:
- For a mutating webhook, make sure it never changes immutable fields on
`UPDATE` operations. For example, changes to containers are usually not allowed.
- For a validating webhook, make sure that your validation policies only apply
to new changes. In other words, allow an `UPDATE` of a Pod that already violates
a policy, as long as the update does not introduce a new violation. This lets
Pods that were created before the validating webhook was installed continue
running and still be cleaned up, for example by having their finalizers removed.
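The sketch below shows one way to apply a policy only to new changes. The policy itself ("no container image tagged `:latest`") and the `violations` and `review` names are hypothetical, and the HTTPS server that a real webhook needs is omitted. The request and response fields used (`request.uid`, `request.operation`, `request.object`, `request.oldObject`, `response.uid`, `response.allowed`, `response.status.message`) follow the `admission.k8s.io/v1` AdmissionReview format. On `UPDATE`, the handler subtracts the violations the stored Pod already had, so removing a finalizer from a non-compliant Pod is still admitted.

```python
def violations(pod) -> set:
    """Hypothetical policy: no container may use an image tagged ":latest".
    Returns the names of the offending containers."""
    spec = (pod or {}).get("spec", {})
    return {
        c["name"]
        for c in spec.get("containers", [])
        if c.get("image", "").endswith(":latest")
    }


def review(admission_review: dict) -> dict:
    """Build an admission.k8s.io/v1 AdmissionReview response for a Pod request."""
    request = admission_review["request"]
    new_violations = violations(request.get("object"))
    if request.get("operation") == "UPDATE":
        # Tolerate violations the stored Pod already had, so that unrelated
        # updates (such as removing a finalizer) are still admitted.
        new_violations -= violations(request.get("oldObject"))
    response = {"uid": request["uid"], "allowed": not new_violations}
    if new_violations:
        response["status"] = {
            "message": "containers use a :latest image: " + ", ".join(sorted(new_violations))
        }
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```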

#### My pod is crashing or otherwise unhealthy

Once your pod has been scheduled, the methods described in