Promote sysctls to Beta #8804
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,15 @@ | ||
--- | ||
title: Using Sysctls in a Kubernetes Cluster | ||
title: Using sysctls in a Kubernetes Cluster | ||
reviewers: | ||
- sttts | ||
content_template: templates/task | ||
--- | ||
|
||
{{% capture overview %}} | ||
{{< feature-state for_k8s_version="v1.11" state="beta" >}} | ||
|
||
This document describes how sysctls are used within a Kubernetes cluster. | ||
This document describes how to configure and use kernel parameters within a | ||
Kubernetes cluster using the sysctl interface. | ||
|
||
{{% /capture %}} | ||
|
||
|
@@ -74,7 +76,7 @@ application tuning. _Unsafe_ sysctls are enabled on a node-by-node basis with a | |
flag of the kubelet, e.g.: | ||
|
||
```shell | ||
$ kubelet --experimental-allowed-unsafe-sysctls \ | ||
$ kubelet --allowed-unsafe-sysctls \ | ||
'kernel.msg*,net.ipv4.route.min_pmtu' ... | ||
``` | ||
|
||
|
@@ -89,36 +91,48 @@ Only _namespaced_ sysctls can be enabled this way. | |
## Setting Sysctls for a Pod | ||
|
||
A number of sysctls are _namespaced_ in today's Linux kernels. This means that | ||
they can be set independently for each pod on a node. Being namespaced is a | ||
requirement for sysctls to be accessible in a pod context within Kubernetes. | ||
they can be set independently for each pod on a node. Only namespaced sysctls | ||
are accessible in the pod security context within Kubernetes. | ||
|
||
The following sysctls are known to be _namespaced_: | ||
The following sysctls are _namespaced_: | ||
|
||
- `kernel.shm*`, | ||
- `kernel.msg*`, | ||
- `kernel.sem`, | ||
- `fs.mqueue.*`, | ||
- `net.*`. | ||
|
||
Sysctls which are not namespaced are called _node-level_ and must be set | ||
manually by the cluster admin, either by means of the underlying Linux | ||
distribution of the nodes (e.g. via `/etc/sysctl.conf`) or using a DaemonSet | ||
with privileged containers. | ||
Sysctls with no namespace are called _node-level_ sysctls. If you need to set | ||
Review comment: why is this text changed? What is an "underlying node"?
Review comment: Even with the change above, it still makes sense to me. Though, if it is still generally unclear, I can undo the change. @sttts WDYT?
Review comment: I think I introduced this change to try to make it clear that you'd be messing with the underlying OS of the node, outside the scope of Kubernetes.
Review comment: My point is that a node is a fixed concept in kube. What should the underlying node be? Write "underlying operating system" or "underlying machine". Also, "operating system" is misleading. The operating system is visible all the time, to each container, via syscalls of the kernel. The point here is that the user-level tooling of the distribution has to be used, not the operating system in the sense of the kernel visible to containers.
Review comment: Thanks, that helps. Let me try to fix this better. |
||
them, you must manually configure them on each node's operating system, or by | ||
using a DaemonSet with privileged containers. | ||
Review comment: 👍 |
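The DaemonSet approach mentioned above could look roughly like the following sketch. It only writes a manifest file; nothing is applied to a cluster. The DaemonSet name, the `busybox` image, and the choice of `vm.max_map_count` with its value are all assumptions made for illustration and do not come from this PR.

```shell
#!/bin/sh
# Sketch: write an illustrative DaemonSet manifest to the current directory.
# The names, the image, and the sysctl/value below are assumptions.
cat > sysctl-daemonset.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-sysctl-example
spec:
  selector:
    matchLabels:
      app: node-sysctl-example
  template:
    metadata:
      labels:
        app: node-sysctl-example
    spec:
      containers:
      - name: set-sysctl
        image: busybox
        securityContext:
          # Privileged access is needed to write node-level sysctls.
          privileged: true
        command:
        - sh
        - -c
        - "sysctl -w vm.max_map_count=262144 && while true; do sleep 3600; done"
EOF

# To apply it to a cluster (not run here):
#   kubectl apply -f sysctl-daemonset.yaml
```

Because the container is privileged and the sysctl is not namespaced, the write affects the whole node, so a setup like this should be reviewed as carefully as a change made directly on the node.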
||
|
||
The sysctl feature is an alpha API. Therefore, sysctls are set using annotations | ||
on pods. They apply to all containers in the same pod. | ||
For namespaced sysctls, use the pod securityContext to configure sysctls. They | ||
Review comment: nit: to configure them. |
||
apply to all containers in the same pod. | ||
|
||
Here is an example, with different annotations for _safe_ and _unsafe_ sysctls: | ||
This example uses the pod securityContext to set a safe sysctl | ||
`kernel.shm_rmid_forced` and two unsafe sysctls `net.ipv4.route.min_pmtu` and | ||
`kernel.msgmax`. There is no distinction between _safe_ and _unsafe_ sysctls in | ||
the specification. | ||
|
||
{{< warning >}} | ||
Only modify sysctl parameters after you understand their effects, to avoid | ||
destabilizing your operating system. | ||
{{< /warning >}} | ||
|
||
```yaml | ||
apiVersion: v1 | ||
kind: Pod | ||
metadata: | ||
name: sysctl-example | ||
annotations: | ||
security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1 | ||
security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.route.min_pmtu=1000,kernel.msgmax=1 2 3 | ||
spec: | ||
securityContext: | ||
sysctls: | ||
- name: kernel.shm_rmid_forced | ||
value: "0" | ||
- name: net.ipv4.route.min_pmtu | ||
value: "552" | ||
- name: kernel.msgmax | ||
value: "65536" | ||
... | ||
``` | ||
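One way to check that a value from the pod spec took effect is to read the corresponding file under `/proc/sys` from inside a container of the pod. The helper below is a minimal sketch of the name-to-path mapping; `sysctl_path` is a hypothetical name, and the simple dot-to-slash translation assumes the sysctl name contains no component with an embedded dot (for example, some network interface names).

```shell
#!/bin/sh
# Hypothetical helper: map a sysctl name to its file under /proc/sys by
# replacing dots with slashes.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Inside a container of the pod, a value could then be read with, e.g.:
#   cat "$(sysctl_path kernel.shm_rmid_forced)"
sysctl_path net.ipv4.route.min_pmtu
```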
{{% /capture %}} | ||
|
@@ -143,27 +157,50 @@ is recommended to use | |
[taints on nodes](/docs/concepts/configuration/taint-and-toleration/) | ||
to schedule those pods onto the right nodes. | ||
|
||
## PodSecurityPolicy Annotations | ||
## PodSecurityPolicy | ||
|
||
To control which sysctls can be set in pods, specify the | ||
Review comment: nit: To further control |
||
`forbiddenSysctls` and/or `allowedUnsafeSysctls` fields in the PodSecurityPolicy. | ||
|
||
By default, all safe sysctls in the whitelist are allowed. | ||
Review comment: nit: the whitelist defines the safe sysctls. "in the whitelist" is redundant. |
||
|
||
Both `forbiddenSysctls` and `allowedUnsafeSysctls` are lists of plain sysctl names | ||
or sysctl patterns (which end with `*`). The string `*` matches all sysctls. | ||
|
||
The use of sysctl in pods can be controlled via annotation on the PodSecurityPolicy. | ||
The `forbiddenSysctls` field excludes specific sysctls, and can include a | ||
Review comment: nit: excludes vs. include is confusing. Better use a different word for "include". |
||
combination of safe and unsafe ones. To forbid setting any sysctls, use `*` on | ||
its own. | ||
|
||
Sysctl annotation represents a whitelist of allowed safe and unsafe sysctls | ||
in a pod spec. It's a comma-separated list of plain sysctl names or sysctl patterns | ||
(which end in `*`). The string `*` matches all sysctls. | ||
If you specify any unsafe sysctl in the `allowedUnsafeSysctls` field and it is | ||
not present in the `forbiddenSysctls` field, that sysctl can be used in Pods under | ||
this PodSecurityPolicy. In order to allow all unsafe sysctls in the PodSecurityPolicy | ||
to be set (except for those explicitly forbidden by `forbiddenSysctls`), | ||
use `*` on its own. | ||
Review comment: unfortunately this does not work: you cannot use |
||
|
||
Here is an example, it authorizes binding user creating pod with corresponding sysctls. | ||
Do not configure these two fields such that there is overlap, meaning that a | ||
given sysctl is both allowed and forbidden. | ||
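The matching and precedence rules described above can be sketched as a small shell script. This is an illustration of the documented behavior only, not the Kubernetes admission code: the function names are hypothetical, the safe set is assumed to contain just `kernel.shm_rmid_forced` (the one sysctl this page calls safe), and the real API may reject configurations that this sketch would accept, such as overlapping fields.

```shell
#!/bin/sh
# Sketch only: mirrors the rules described in the text.

# Assumed safe set for illustration; the real list is defined by Kubernetes.
SAFE_SYSCTLS="kernel.shm_rmid_forced"

# Returns 0 if sysctl name $2 matches pattern $1.
# A pattern ending in "*" is a prefix match; "*" alone matches everything.
sysctl_matches() {
  case "$1" in
    *\*)
      _prefix=${1%\*}
      case "$2" in
        "$_prefix"*) return 0 ;;
      esac
      return 1
      ;;
    *)
      [ "$1" = "$2" ]
      ;;
  esac
}

# Returns 0 if name $1 matches any pattern in the space-separated list $2.
matches_any() {
  _name=$1
  set -f              # keep "*" patterns from being expanded as file globs
  for _pattern in $2; do
    if sysctl_matches "$_pattern" "$_name"; then
      set +f
      return 0
    fi
  done
  set +f
  return 1
}

# policy_allows NAME "ALLOWED_UNSAFE_PATTERNS" "FORBIDDEN_PATTERNS"
policy_allows() {
  if matches_any "$1" "$3"; then
    return 1          # explicitly forbidden
  fi
  if matches_any "$1" "$SAFE_SYSCTLS"; then
    return 0          # safe sysctls are allowed by default
  fi
  matches_any "$1" "$2"
}

# Values taken from the PodSecurityPolicy example on this page.
ALLOWED="kernel.msg*"
FORBIDDEN="kernel.shm_rmid_forced"
for s in kernel.msgmax kernel.shm_rmid_forced net.ipv4.route.min_pmtu; do
  if policy_allows "$s" "$ALLOWED" "$FORBIDDEN"; then
    echo "$s: allowed"
  else
    echo "$s: denied"
  fi
done
```

With these inputs the sketch reports `kernel.msgmax` as allowed and the other two as denied: `kernel.shm_rmid_forced` is forbidden even though it is safe, and `net.ipv4.route.min_pmtu` is unsafe and not matched by any allowed pattern.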
|
||
{{< warning >}} | ||
**Warning**: If you whitelist unsafe sysctls via the `allowedUnsafeSysctls` field | ||
in a PodSecurityPolicy, any pod using such a sysctl will fail to start | ||
if the sysctl is not whitelisted via the `--allowed-unsafe-sysctls` kubelet | ||
flag as well on that node. | ||
Review comment: 👍 |
||
{{< /warning >}} | ||
|
||
This example allows unsafe sysctls prefixed with `kernel.msg` to be set and | ||
disallows setting of the `kernel.shm_rmid_forced` sysctl. | ||
|
||
```yaml | ||
apiVersion: policy/v1beta1 | ||
kind: PodSecurityPolicy | ||
metadata: | ||
name: sysctl-psp | ||
annotations: | ||
security.alpha.kubernetes.io/sysctls: 'net.ipv4.route.*,kernel.msg*' | ||
spec: | ||
allowedUnsafeSysctls: | ||
- kernel.msg* | ||
forbiddenSysctls: | ||
- kernel.shm_rmid_forced | ||
... | ||
``` | ||
|
||
{{% /capture %}} | ||
|
||
|
Review comment: This list might change from kernel to kernel. This was the reason for the "known to be".
Review comment: Thanks for the clarification. I misunderstood the intent.