Unable to run in eBPF least-privileged mode on COS #1299

Closed
dinvlad opened this issue Jul 2, 2020 · 7 comments
@dinvlad

dinvlad commented Jul 2, 2020

Describe the bug

When we attempt to run Falco in least-privileged mode on COS with eBPF enabled according to the official instructions (https://falco.org/docs/running/#docker-least-privileged), it fails.

How to reproduce it

  1. Load the driver:
docker pull falcosecurity/falco-driver-loader:latest
docker run --rm -i -t \
    --privileged \
    -v $HOME/.falco:/root/.falco \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    -v /etc:/host/etc:ro \
    -e FALCO_BPF_PROBE="" \
    falcosecurity/falco-driver-loader:latest

This step succeeds, with the build log as follows:

* Setting up /usr/src links from host
* Running falco-driver-loader with: driver=bpf, compile=yes, download=yes
* Mounting debugfs
* Found kernel config at /proc/config.gz
* COS detected (build 12871.148.0), using cos kernel headers
* Downloading https://storage.googleapis.com/cos-tools/12871.148.0/kernel-headers.tgz
* Extracting kernel sources
* Configuring kernel
* Trying to compile the eBPF probe (falco_cos_4.19.112+_1.o)
* Skipping download, eBPF probe is already present in /root/.falco/falco_cos_4.19.112+_1.o
* eBPF probe located in /root/.falco/falco_cos_4.19.112+_1.o
******************************************************************
** BPF doesn't have JIT enabled, performance might be degraded. **
** Please ensure to run on a kernel with CONFIG_BPF_JIT on.     **
******************************************************************
  2. Run Falco:
docker pull falcosecurity/falco-no-driver:latest
docker run --rm -i -t \
    --cap-add SYS_PTRACE --pid=host $(ls /dev/falco* | xargs -I {} echo --device {}) \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e FALCO_BPF_PROBE="" \
    falcosecurity/falco-no-driver:latest

This step fails:

ls: cannot access '/dev/falco*': No such file or directory
2020-07-02T20:12:51+0000: Falco initialized with configuration file /etc/falco/falco.yaml
2020-07-02T20:12:51+0000: Loading rules from file /etc/falco/falco_rules.yaml:
2020-07-02T20:12:51+0000: Loading rules from file /etc/falco/falco_rules.local.yaml:
2020-07-02T20:12:51+0000: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
2020-07-02T20:12:51+0000: Unable to load the driver. Exiting.
2020-07-02T20:12:51+0000: Runtime error: setrlimit failed. Exiting.

The error mentions setrlimit, but I'm not sure what that means. We're running on a stock COS stable image, without any other modifications on the host.

Expected behaviour

Falco is able to run.

Screenshots

Environment

  • Falco version:

From docker run --rm -it falcosecurity/falco-no-driver falco --version:

Falco version: 0.23.0
Driver version: 96bd9bc560f67742738eb7255aeb4d03046b8045
  • System info:

{"machine":"x86_64","nodename":"xxxxxx","release":"4.19.112+","sysname":"Linux","version":"#1 SMP Sat Jun 13 11:04:33 PDT 2020"}

  • Cloud provider or hardware configuration: Google Compute Engine, n1-standard-1 instance.
  • OS:
BUILD_ID=12871.148.0
NAME="Container-Optimized OS"
KERNEL_COMMIT_ID=1d5bc45f886bc0308010614cdcdf658f5fb44a25
GOOGLE_CRASH_ID=Lakitu
VERSION_ID=81
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
PRETTY_NAME="Container-Optimized OS from Google"
VERSION=81
GOOGLE_METRICS_PRODUCT_ID=26
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
ID=cos
  • Kernel:
Linux <machine-name> 4.19.112+ #1 SMP Sat Jun 13 11:04:33 PDT 2020 x86_64 Intel(R) Xeon(R) CPU @ 2.30GHz GenuineIntel GNU/Linux
  • Installation method:

Docker, as described in steps 1-2 above.

Additional context

The same works in eBPF privileged mode, however:

docker pull falcosecurity/falco:latest
docker run --rm -i -t \
    --privileged \
    -v /dev:/host/dev \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    -v /etc:/host/etc:ro \
    -e FALCO_BPF_PROBE="" \
    falcosecurity/falco:latest
* Setting up /usr/src links from host
* Running falco-driver-loader with: driver=bpf, compile=yes, download=yes
* Mounting debugfs
* Found kernel config at /proc/config.gz
* COS detected (build 12871.148.0), using cos kernel headers
* Downloading https://storage.googleapis.com/cos-tools/12871.148.0/kernel-headers.tgz
* Extracting kernel sources
* Configuring kernel
* Trying to compile the eBPF probe (falco_cos_4.19.112+_1.o)
* Skipping download, eBPF probe is already present in /root/.falco/falco_cos_4.19.112+_1.o
* eBPF probe located in /root/.falco/falco_cos_4.19.112+_1.o
******************************************************************
** BPF doesn't have JIT enabled, performance might be degraded. **
** Please ensure to run on a kernel with CONFIG_BPF_JIT on.     **
******************************************************************
* Success: eBPF probe symlinked to /root/.falco/falco-bpf.o
2020-07-02T20:37:14+0000: Falco initialized with configuration file /etc/falco/falco.yaml
2020-07-02T20:37:14+0000: Loading rules from file /etc/falco/falco_rules.yaml:
2020-07-02T20:37:14+0000: Loading rules from file /etc/falco/falco_rules.local.yaml:
2020-07-02T20:37:14+0000: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
2020-07-02T20:37:15+0000: Starting internal webserver, listening on port 8765

Also, doing this without eBPF (privileged or not) hits another bug (#1239)

@fntlnz
Contributor

fntlnz commented Jul 3, 2020

I see we are lacking some documentation on how to use the eBPF probe in the running section of the docs. Thanks for calling that out, @dinvlad.

I think the best thing we can do here is to use --privileged:

docker run  --privileged -v $HOME/.falco:/root/.falco --rm -i -t  --pid=host  -v /var/run/docker.sock:/var/run/docker.sock -e FALCO_BPF_PROBE="" falcosecurity/falco-no-driver:latest

Please note that:

  • I'm sharing -v $HOME/.falco:/root/.falco
  • I added --privileged
  • I removed the $(ls /dev/falco* | xargs -I {} echo --device {}) part. It is only needed for the kernel module; since you want to use the eBPF probe, it is not needed here
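
As a side note on the removed substitution: the `ls: cannot access '/dev/falco*'` line in the original report comes from `ls` running when no such device nodes exist. For setups that do use the kernel module, a more defensive variant could look like the sketch below. The function name `falco_device_args` is hypothetical and not part of Falco or its documentation.

```shell
# Hypothetical helper (not part of Falco): print one "--device <path>" line
# per Falco device node found in the given directory (default: /dev).
# When no node matches, it prints nothing and raises no error, which is the
# expected situation when only the eBPF probe is used.
falco_device_args() {
    local dev_dir="${1:-/dev}"
    local dev
    for dev in "$dev_dir"/falco*; do
        # An unmatched glob is left as a literal string, so skip
        # anything that does not actually exist.
        [ -e "$dev" ] || continue
        printf -- '--device %s\n' "$dev"
    done
}
```

It could then replace the original substitution, for example `docker run ... $(falco_device_args) ...`, which expands to nothing on a host without `/dev/falco*` nodes.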

Let me explain why:

Starting from the error you posted (Runtime error: setrlimit failed. Exiting.), the natural next step is to suspect that a capability is missing, and yes, one is: CAP_SYS_RESOURCE. So let's add it:

docker run  -v $HOME/.falco:/root/.falco --rm -i -t --cap-add SYS_RESOURCE --cap-add SYS_PTRACE --pid=host     -v /var/run/docker.sock:/var/run/docker.sock     -e FALCO_BPF_PROBE="" falcosecurity/falco-no-driver:latest

Output:

2020-07-03T09:16:15+0000: Falco initialized with configuration file /etc/falco/falco.yaml
2020-07-03T09:16:15+0000: Loading rules from file /etc/falco/falco_rules.yaml:
2020-07-03T09:16:15+0000: Loading rules from file /etc/falco/falco_rules.local.yaml:
2020-07-03T09:16:16+0000: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
2020-07-03T09:16:16+0000: Unable to load the driver. Exiting.
2020-07-03T09:16:16+0000: Runtime error: can't create map: Errno 1. Exiting.

Look! It got further, but Falco still can't start. This is because we have now reached the point (here) where we want to call bpf_map_create.
Unfortunately, here's the implementation of bpf_map_create:

static int bpf_map_create(enum bpf_map_type map_type,
			  int key_size, int value_size, int max_entries,
			  uint32_t map_flags)
{
	union bpf_attr attr;

	bzero(&attr, sizeof(attr));

	attr.map_type = map_type;
	attr.key_size = key_size;
	attr.value_size = value_size;
	attr.max_entries = max_entries;
	attr.map_flags = map_flags;

	return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}

This means that, after filling in the attributes, bpf_map_create calls sys_bpf, which simply issues a syscall with __NR_bpf (a.k.a. the bpf() syscall).

YAY! At this point we could ask: why don't we just add a capability that allows bpf operations? Something like CAP_BPF would be awesome. Unfortunately, that capability is not available in the kernel we are talking about (4.19.112+), since it's a kernel 5.8 feature. At the time of writing, kernel 5.8-rc3 includes it. Read more about this in this LWN article.

This means that, since we don't (yet) have a capability for this kind of operation, the only way to perform it is as root.

So, here is why we need --privileged here!
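
To see which of the capabilities mentioned above a container actually holds, one option is to decode the `CapEff` mask from `/proc/self/status`. The sketch below is only an illustration: the helper names `has_cap` and `current_cap_mask` are made up for this example, the bit numbers are the ones defined in `linux/capability.h`, and the sample mask `00000000a80425fb` is assumed to be what a container started with Docker's default capability set typically reports.

```shell
# Capability bit numbers as defined in linux/capability.h.
CAP_SYS_PTRACE=19
CAP_SYS_ADMIN=21
CAP_SYS_RESOURCE=24
CAP_PERFMON=38   # only exists on kernel 5.8+
CAP_BPF=39       # only exists on kernel 5.8+

# has_cap MASK BIT (hypothetical helper): succeed when bit BIT is set in
# the hexadecimal MASK, the format used by the CapEff line of
# /proc/<pid>/status.
has_cap() {
    local mask="$1" bit="$2"
    [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]
}

# current_cap_mask (hypothetical helper): print the effective capability
# mask of the calling process.
current_cap_mask() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}
```

For example, `has_cap 00000000a80425fb "$CAP_SYS_RESOURCE"` fails, which is consistent with the setrlimit error above, while `has_cap "$(current_cap_mask)" "$CAP_SYS_ADMIN"` run inside a container shows whether that container was granted CAP_SYS_ADMIN.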

Hope this helps!

I spent some time putting this together. If anyone is interested, there's a very good opportunity to become a contributor by sending a PR to our website's running.md containing the explanations here.

@leodido
Member

leodido commented Jul 7, 2020

So the summary is that you need a kernel with CAP_BPF (and CAP_PERFMON) to make it work in the least-privileged mode.
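
A rough way to check that kernel requirement from a script is to compare the release string against 5.8, the version in which CAP_BPF was introduced according to the discussion above. This is a minimal sketch: `kernel_has_cap_bpf` is a hypothetical helper name, and the version number is only a heuristic, since a vendor kernel could in principle differ from upstream.

```shell
# kernel_has_cap_bpf [RELEASE] (hypothetical helper): succeed when the
# kernel release (default: output of `uname -r`) is 5.8 or newer.
# Accepts strings such as "4.19.112+", "5.4.65+" or "5.8.0-rc3".
kernel_has_cap_bpf() {
    local release="${1:-$(uname -r)}"
    local major minor
    major="${release%%.*}"        # text before the first dot
    minor="${release#*.}"         # text after the first dot
    minor="${minor%%[!0-9]*}"     # keep only the leading digits
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 8 ]; }
}
```

With the kernels mentioned in this issue, `kernel_has_cap_bpf 4.19.112+` and `kernel_has_cap_bpf 5.4.65+` both fail, so `--privileged` would still be needed there.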

Thanks to @fntlnz for clarifying it and sending updates to the docs.

/close

@poiana
Contributor

poiana commented Jul 7, 2020

@leodido: Closing this issue.

In response to this:

So the summary is that you need a kernel with CAP_BPF to make it work in the least-privileged mode.

Thanks to @fntlnz for clarifying it and sending updates to the docs.

/close


@poiana poiana closed this as completed Jul 7, 2020
@dinvlad
Author

dinvlad commented Sep 21, 2020

@fntlnz I've finally tried the command

docker run  --privileged -v $HOME/.falco:/root/.falco --rm -i -t  --pid=host  -v /var/run/docker.sock:/var/run/docker.sock -e FALCO_BPF_PROBE="" falcosecurity/falco-no-driver:latest

however this is what I see on cos-beta (Linux 5.4.65+):

2020-09-21T19:17:30+0000: Falco version 0.25.0 (driver version ae104eb20ff0198a5dcb0c91cc36c86e7c3f25c7)
2020-09-21T19:17:30+0000: Falco initialized with configuration file /etc/falco/falco.yaml
2020-09-21T19:17:30+0000: Loading rules from file /etc/falco/falco_rules.yaml:
2020-09-21T19:17:30+0000: Loading rules from file /etc/falco/falco_rules.local.yaml:
2020-09-21T19:17:31+0000: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
2020-09-21T19:17:31+0000: Unable to load the driver.
2020-09-21T19:17:31+0000: Runtime error: can't open BPF probe '/root/.falco/falco-bpf.o': Errno 2. Exiting.
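
Errno 2 is ENOENT ("No such file or directory"), so the probe file was not present in the mounted directory. A small pre-flight check run on the host before starting Falco could catch this earlier. The sketch below assumes the probe is expected at `falco-bpf.o` inside the directory mounted as `/root/.falco` (the path shown in the error and in the loader's "symlinked to" message); `probe_present` is a hypothetical helper name.

```shell
# probe_present [DIR] (hypothetical helper): succeed when DIR/falco-bpf.o
# exists (default DIR: $HOME/.falco, the directory mounted into the
# container as /root/.falco in the commands above).
# A symlink is accepted even if its target does not resolve on the host,
# because the loader reports creating the link with a container-side path.
probe_present() {
    local probe_dir="${1:-$HOME/.falco}"
    [ -e "$probe_dir/falco-bpf.o" ] || [ -L "$probe_dir/falco-bpf.o" ]
}
```

For example, `probe_present || echo "run falco-driver-loader first"` prints the hint when the driver loader has not yet populated `$HOME/.falco`.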

@dinvlad
Author

dinvlad commented Sep 21, 2020

So it looks like, on kernels older than 5.8, we have to run this command to install the driver first:

docker run --rm -i -t \
    --privileged \
    -v $HOME/.falco:/root/.falco \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    -v /etc:/host/etc:ro \
    falcosecurity/falco-driver-loader:latest

However, there are 2 general problems with it:

  1. With any of cos-stable (4.19.112), cos-beta (5.4.49), or cos-dev (5.4.65), this command fails with:

    * Trying to download prebuilt module from https://dl.bintray.com/falcosecurity/driver/ae104eb20ff0198a5dcb0c91cc36c86e7c3f25c7/falco_cos_4.19.112%2B_1.ko
    curl: (22) The requested URL returned error: 404 Not Found

    (the kernel version in the URL changes to match each image)

  2. In the environment where we'd like to run this (Google Life Sciences API), it's really not possible to set any of the -v options.
    I can mount a disk volume, but that's about it: no /proc, /lib or other host directories are allowed by the API.

    Additionally, we can't run a container in --privileged mode, but we can set the enable_fuse: true option in the API, which

    "has the effect of causing the container to be executed with CAP_SYS_ADMIN and exposes /dev/fuse to the container, so use it only for containers you trust"
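
For problem 1, the URL that returns 404 can be reproduced for each image to check by hand which prebuilt modules exist. The sketch below only mirrors the URL pattern visible in the log above: the `cos` target name and the `_1` suffix are copied from that single log line rather than from any documented scheme, and `prebuilt_module_url` is a hypothetical helper name.

```shell
# prebuilt_module_url DRIVER_VERSION KERNEL_RELEASE (hypothetical helper):
# print the download URL following the pattern seen in the loader log.
prebuilt_module_url() {
    local driver_version="$1" kernel_release="$2"
    local encoded
    # Percent-encode "+", the only reserved character in the kernel
    # releases quoted in this issue.
    encoded=$(printf '%s' "$kernel_release" | sed 's/+/%2B/g')
    printf 'https://dl.bintray.com/falcosecurity/driver/%s/falco_cos_%s_1.ko\n' \
        "$driver_version" "$encoded"
}
```

Calling `prebuilt_module_url ae104eb20ff0198a5dcb0c91cc36c86e7c3f25c7 4.19.112+` prints the same URL as in the failing log, and `curl -sfI "$(prebuilt_module_url ...)"` would report whether a given combination is available.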

Do you think there are fixes or workarounds for these issues? I'd imagine (1) is just a matter of pre-building a kernel module for COS on your side, but is (2) a dead end?

Thanks a lot for any help!

@dinvlad
Author

dinvlad commented Sep 21, 2020

OK, I missed the earlier option from the original post (hard to remember all of these nuances..), but here's the combination that worked in a "bare-bones" COS (without Google Life Sciences API):

docker run --rm -it \
    --privileged \
    -v $HOME/.falco:/root/.falco \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    -v /etc:/host/etc:ro \
    -e FALCO_BPF_PROBE="" \
    falcosecurity/falco-driver-loader

docker run  --rm -it \
    --privileged \
    -v $HOME/.falco:/root/.falco \
    --pid=host  \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e FALCO_BPF_PROBE="" \
    falcosecurity/falco-no-driver

So now, coming back to question 2 above, do you think it's possible:

  1. to compile Falco kernel module on another machine, then just mount it from inside the 2nd command without running the 1st command (since we can't mount /proc etc.)?

  2. to replace the 2nd command here with something such that it effectively becomes

    docker run --rm -it \
      --cap-add SYS_ADMIN \
      -v /mnt/volume:/root/.falco \
      --pid=host \
      -v /dev/fuse:/dev/fuse \
      -e FALCO_BPF_PROBE="" \
      falcosecurity/falco-driver-loader
    

    I.e. is it possible to run without --privileged and -v /var/run/docker.sock:/var/run/docker.sock in this case?

@dinvlad
Author

dinvlad commented Sep 25, 2020

I think we may just have to wait for kernel 5.8+ on Container-Optimized OS, plus support in the Life Sciences API for passing capabilities and mounting the Docker socket. Could you confirm that this would then be the right command to use:

docker run --rm -it \
    --cap-add BPF \
    --cap-add SYS_PTRACE \
    --pid=host \
    -v /var/run/docker.sock:/var/run/docker.sock \
    falcosecurity/falco-no-driver

In particular, would SYS_PTRACE and -v /var/run/docker.sock:/var/run/docker.sock still be required here?
