link: wrap errors for Raw{At,De}tachProgram #320

cyphar · 2021-06-08T08:41:25Z

This allows callers to detect the underlying syscall error, which is
necessary for being able to implement safe fallbacks based on the error
during program loading and unloading. runc in particular needs this to
be able to implement BPF_F_REPLACE fallbacks correctly.

Ref: opencontainers/runc#3008
Signed-off-by: Aleksa Sarai cyphar@cyphar.com

AkihiroSuda

I was about to open the same PR but my draft have the following changes 😛

diff --git a/internal/elf.go b/internal/elf.go
index 54a4313..88326bb 100644
--- a/internal/elf.go
+++ b/internal/elf.go
@@ -24,7 +24,7 @@ func NewSafeELFFile(r io.ReaderAt) (safe *SafeELFFile, err error) {
                }
 
                safe = nil
-               err = fmt.Errorf("reading ELF file panicked: %s", r)
+               err = fmt.Errorf("reading ELF file panicked: %w", r)
        }()
 
        file, err := elf.NewFile(r)
@@ -44,7 +44,7 @@ func (se *SafeELFFile) Symbols() (syms []elf.Symbol, err error) {
                }
 
                syms = nil
-               err = fmt.Errorf("reading ELF symbols panicked: %s", r)
+               err = fmt.Errorf("reading ELF symbols panicked: %w", r)
        }()
 
        syms, err = se.File.Symbols()
@@ -60,7 +60,7 @@ func (se *SafeELFFile) DynamicSymbols() (syms []elf.Symbol, err error) {
                }
 
                syms = nil
-               err = fmt.Errorf("reading ELF dynamic symbols panicked: %s", r)
+               err = fmt.Errorf("reading ELF dynamic symbols panicked: %w", r)
        }()
 
        syms, err = se.File.DynamicSymbols()

AkihiroSuda

diff --git a/cmd/bpf2go/compile.go b/cmd/bpf2go/compile.go
index a7509e1..0f0b28f 100644
--- a/cmd/bpf2go/compile.go
+++ b/cmd/bpf2go/compile.go
@@ -91,7 +91,7 @@ func compile(args compileArgs) error {
        }
 
        if err := cmd.Start(); err != nil {
-               return fmt.Errorf("can't execute %s: %s", args.cc, err)
+               return fmt.Errorf("can't execute %s: %w", args.cc, err)
        }
 
        if depRd != nil {
@@ -104,7 +104,7 @@ func compile(args compileArgs) error {
        }
 
        if err := cmd.Wait(); err != nil {
-               return fmt.Errorf("%s: %s", args.cc, err)
+               return fmt.Errorf("%s: %w", args.cc, err)
        }
 
        return nil

cmd/bpf2go/main.go

cyphar · 2021-06-08T08:45:59Z

Ah, I did a search-and-replace rather than checking for the cases where the format string was more complicated. I'll enable the errorlint linter and will fix all the errors then push again.

internal/btf/btf.go

cyphar · 2021-06-08T08:47:46Z

Wait, @AkihiroSuda please stop sending reviews -- I'm fixing them all now. 😅

AkihiroSuda · 2021-06-08T08:48:36Z

@cyphar Sure (FYI: my draft is at AkihiroSuda@650e297)

cyphar · 2021-06-08T09:12:01Z

@AkihiroSuda This now has fixes to pass all errorlint failures, which included porting things to error.As and error.Is where appropriate.

lmb · 2021-06-08T09:18:13Z

Thanks for the PR, error wrapping is a bit muddled at the moment. I'm conflicted about this change: we're not going to make all wrapped errors part of our API contract. That is, if your code relies on unwrapping an error that is not explicitly documented by a function we may well break it in the future. Are you OK with that? @ti-mo do you have opinions on error wrapping vs. no error wrapping?

I also dug into the downstream issue you linked and I wonder whether you could use https://pkg.go.dev/github.com/cilium/ebpf/link#AttachCgroup instead? This should handle the cgroupv2 abstraction for you, including the bpf flags to use.

cyphar · 2021-06-08T11:49:16Z

@lmb

We can't use that API for two reasons:

It depends on giving a path to a cgroup, but we have various hardenings such that the dirfd we use is definitely safe to operate on. To be fair, we could use /proc/self/fd/$fd but I'd prefer to not have to go through /proc if there is a way to use the dirfd directly.
As far as I can tell, there doesn't appear to be a nice way for us to detect that the error was "BPF_F_REPLACE is unsupported". Yes there is a custom error for that case, but we can't check it programmatically other than getting the string of the error and matching that (which isn't guaranteed to never change).

The first issue isn't that bad, but the second one kinda makes it untenable (I would've used that API if it weren't for the second issue). The only reason why we need to detect unix.* errors from RawAttachProgram is because I copied (and slightly modified) the code for haveProgAttachReplace() so we can use it in runc. If there was a way to be told that haveProgAttachReplace() == false in the error of RawAttachProgram, that would solve our problem and not require any other changes.

That is, if your code relies on unwrapping an error that is not explicitly documented by a function we may well break it in the future.

The only thing we really care about is being able to get the underlying syscall error from RawAttachProgram and RawDetachProgram (and having that be something we can rely on would be nice). I did a tree-wide change because it seemed that some of the internal and external functions were wrapping errors, so the ones that weren't were either an oversight or something like that. I can reduce this PR to just wrap RawAttachProgram and RawDetachProgram if that would cause less issues for you.

lmb · 2021-06-08T13:33:56Z

I'd prefer to not have to go through /proc if there is a way to use the dirfd directly.
As far as I can tell, there doesn't appear to be a nice way for us to detect that the error was "BPF_F_REPLACE is unsupported".

Makes sense. What if we added an fd-only API? From your description it seems like something about BPF_F_REPLACE would still trip you up, but I'm missing context. How does your behaviour differ if F_REPLACE is not supported? Is the problem that the library forces BPF_F_REPLACE if F_ALLOW_MULTI is specified?

I can reduce this PR to just wrap RawAttachProgram and RawDetachProgram if that would cause less issues for you.

Yes please, it'll make it easier to review and get you unblocked.

cyphar · 2021-06-08T13:54:32Z

@lmb

How does your behaviour differ if F_REPLACE is not supported?

We have slightly strange fallback behaviour. If BPF_F_REPLACE is not supported, we attach the program then detach the old one (effectively an inatomic BPF_F_REPLACE). This is okay for us because we're the only people managing the cgroup. I don't expect this fallback behaviour to be in the ebpf library, but being able to detect that "BPF_F_REPLACE isn't supported" was the error would allow us to fallback cleanly.

Yes please, it'll make it easier to review and get you unblocked.

Sure, I'll do that tomorrow. Thanks.

florianl · 2021-06-08T15:44:28Z

To me this reported issue sounds like a good example for feature probing (similar to the ideas in issue #235 ). Like @lmb I would appreciate it if not all errors would be part of the API.

This allows callers to detect the underlying syscall error, which is necessary for being able to implement safe fallbacks based on the error during program loading and unloading. runc in particular needs this to be able to implement BPF_F_REPLACE fallbacks correctly. Ref: opencontainers/runc#3008 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

cyphar · 2021-06-09T01:45:08Z

@florianl It would be nice if we could just directly use the feature probing that this library already does internally (and if we could do that, this PR wouldn't be necessary), but I expect that design discussions on such a feature will probably take some time -- we need some kind of workaround for this fairly quickly so we can do a runc release.

But personally I think it's fairly reasonable to expect that the errors from a "raw program API" would allow you to get the underlying syscall error. I've updated this PR to only add wrapping to the RawAttachProgram and RawDetachProgram.

ti-mo · 2021-06-09T07:49:45Z

A first proposal for feature probing is being worked on at the moment, but don't expect full coverage for all program and map types right out of the gate, that will take some time. Of course, if there is a strong use case for particular program types like F_REPLACE, those should be worked on first if it means we can positively impact (and gain adoption in) a project like runc.

I'm not sure where to go with this PR, though. On the surface, propagating errors all the way down to the user sounds reasonable. Do those errors then suddenly become part of the package API? Well, yes. We're not exporting sentinels, and we don't know which error(s) the kernel will give us. We could only document this like "returns error x when y" in godoc, but that doesn't make for a particularly strong API, as the kernel tends to change from under our feet.

If we want to make that work long-term, adding tests for specific error returns would be a base requirement. @cyphar this includes your most recent changes (wrapping RawAttachProgram and RawDetachProgram only). If we allow users to become reliant on certain error returns, we need to make sure that keeps working in the future. I get the urgency, but even these one-liners break with the current way of doing things, and we'll have to maintain them as an exception going forward.

Would it be possible to just rely on 'there was an error' for now to kick off your fallback behaviour, or is it really important to know the cause? If you're willing to implement a workaround, you could string-match errors for now until a feature probe API is available. This prevents your particular pain from potentially knock-on effects to other users of the library.

lmb · 2021-06-09T08:03:06Z

Good points @ti-mo.

Do those errors then suddenly become part of the package API? Well, yes. We're not exporting sentinels, and we don't know which error(s) the kernel will give us.

Like I said, if it's not documented and tested it's not part of the API from my POV. @cyphar I merged the PR so that you are unblocked, but I'd suggest adding a test and documentation if you don't want this to regress.

AkihiroSuda suggested changes Jun 8, 2021

View reviewed changes

cyphar force-pushed the wrap-errors branch from 400c5b6 to 9a33754 Compare June 8, 2021 08:44

AkihiroSuda suggested changes Jun 8, 2021

View reviewed changes

AkihiroSuda reviewed Jun 8, 2021

View reviewed changes

cmd/bpf2go/main.go Outdated Show resolved Hide resolved

AkihiroSuda reviewed Jun 8, 2021

View reviewed changes

internal/btf/btf.go Outdated Show resolved Hide resolved

AkihiroSuda mentioned this pull request Jun 8, 2021

update cilium/ebpf to fix haveBpfProgReplace() check opencontainers/runc#3009

Merged

cyphar force-pushed the wrap-errors branch from 9a33754 to e6026da Compare June 8, 2021 09:10

cyphar changed the title *: switch to %w formatting for all errors *: switch to Go 1.13-safe error handling Jun 8, 2021

AkihiroSuda approved these changes Jun 8, 2021

View reviewed changes

cyphar force-pushed the wrap-errors branch from e6026da to 199a4de Compare June 9, 2021 01:42

cyphar changed the title *: switch to Go 1.13-safe error handling link: wrap errors for Raw{At,De}tachProgram Jun 9, 2021

lmb approved these changes Jun 9, 2021

View reviewed changes

lmb merged commit ed2c0a0 into cilium:master Jun 9, 2021

cyphar deleted the wrap-errors branch June 9, 2021 07:58

kolyshkin mentioned this pull request Jun 11, 2021

Tag a 0.6.1 release #323

Closed

cyphar mentioned this pull request Dec 6, 2024

"failed to call BPF_PROG_ATTACH (BPF_CGROUP_DEVICE, BPF_F_ALLOW_MULTI): can't attach program: invalid argument: unknown" (master, kernel 5.4, cgroup2) opencontainers/runc#3008

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

link: wrap errors for Raw{At,De}tachProgram #320

link: wrap errors for Raw{At,De}tachProgram #320

cyphar commented Jun 8, 2021 •

edited

Loading

AkihiroSuda left a comment

AkihiroSuda left a comment

cyphar commented Jun 8, 2021 •

edited

Loading

cyphar commented Jun 8, 2021

AkihiroSuda commented Jun 8, 2021

cyphar commented Jun 8, 2021

lmb commented Jun 8, 2021

cyphar commented Jun 8, 2021 •

edited

Loading

lmb commented Jun 8, 2021

cyphar commented Jun 8, 2021 •

edited

Loading

florianl commented Jun 8, 2021

cyphar commented Jun 9, 2021 •

edited

Loading

ti-mo commented Jun 9, 2021

lmb commented Jun 9, 2021

link: wrap errors for Raw{At,De}tachProgram #320

link: wrap errors for Raw{At,De}tachProgram #320

Conversation

cyphar commented Jun 8, 2021 • edited Loading

AkihiroSuda left a comment

Choose a reason for hiding this comment

AkihiroSuda left a comment

Choose a reason for hiding this comment

cyphar commented Jun 8, 2021 • edited Loading

cyphar commented Jun 8, 2021

AkihiroSuda commented Jun 8, 2021

cyphar commented Jun 8, 2021

lmb commented Jun 8, 2021

cyphar commented Jun 8, 2021 • edited Loading

lmb commented Jun 8, 2021

cyphar commented Jun 8, 2021 • edited Loading

florianl commented Jun 8, 2021

cyphar commented Jun 9, 2021 • edited Loading

ti-mo commented Jun 9, 2021

lmb commented Jun 9, 2021

cyphar commented Jun 8, 2021 •

edited

Loading

cyphar commented Jun 8, 2021 •

edited

Loading

cyphar commented Jun 8, 2021 •

edited

Loading

cyphar commented Jun 8, 2021 •

edited

Loading

cyphar commented Jun 9, 2021 •

edited

Loading