-
Notifications
You must be signed in to change notification settings - Fork 732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CO-RE support for kernel modules #1300
Conversation
It looks like some examples are failing because tracefs is not mounted. I could make |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I was worried that this would be a lot more complicated!
I still have my old questions re semantics and corner cases:
- As mentioned, do we have to deal with module reload / unload? It would invalidate the list of probeable functions for sure. It might change kmod BTF as well.
- Using the fn name to figure out the kmod seems a bit error prone.
- You propose that CO-RE for a probe attached to a kprobe in a kernel module will look at
set of vmlinux types
+set of containing kmod types
. What about a kprobe on a vmlinux function which needs access to kmod types? I'm thinking of data behindvoid *ctx
and similar. (There is also a related case of probe on kmod which needs access to other kmod types, but that is probably rare.)
For the implementation, I was thinking that instead of introducing mergedSpec
we'd operate on []*Spec
. ProgramOptions.KernelTypes
would become []*Spec
as well. CORERelocate
would iterate the []*Spec
and try to find a valid set of relocations for each target. The logic to find the best possible match would have to be adjusted accordingly:
Lines 276 to 294 in a8be855
if score > bestScore { | |
// We have a better target already, ignore this one. | |
continue | |
} | |
if score < bestScore { | |
// This is the best target yet, use it. | |
bestScore = score | |
bestFixups = fixups | |
continue | |
} | |
// Some other target has the same score as the current one. Make sure | |
// the fixups agree with each other. | |
for i, fixup := range bestFixups { | |
if !fixup.equal(fixups[i]) { | |
return nil, fmt.Errorf("%s: multiple types match: %w", fixup.kind, errAmbiguousRelocation) | |
} | |
} |
Since the type IDs of kernel modules conflict with each other, you are only ever going to be dealing with vmlinux plus one optional kmod BTF. libbpf makes this same assumption. |
With the changes you requested, you could accomplish this by setting |
I don't see a way to know if a module has been unloaded and then loaded with different contents, other than maybe comparing sizes from |
b02c8ed
to
d01b1d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, went over the PR again. Here is what I would propose, let me know what you think.
- Skip
ProgramSpec.KernelModule
. Instead,ProgramOptions.KernelTypes
becomes[]*btf.Spec
as discussed. IfKernelTypes
is nil, the library tries to find the appropriate BTF to use, via means of parsing kallsyms / available_filter_functions and doing a lookup of ProgramSpec.AttachTo. Most of the time this will be just vmlinux. Some of the time this will be vmlinux + kmod via LoadKernelModuleSpec. - kallsyms / available_filter_functions: which one is more appropriate to use? I'd also prefer to not export
KernelModule
for now, since it ties us into an API. Probably best to stick this intointernal/kallsyms
. Need to figure out whether this needs to be cached, and if yes how to invalidate the cache when the set of loaded modules changes. Something inproc
that lists loaded modules maybe? - Merge
FlushKernelModuleCache
intobtf.FlushKernelSpec()
. There should be a single function to clean up cached info, and in a way this is all related to btf stuff anyways. We could rename the function if need be. - CORERelocate will work against
[]*Spec
as discussed. I think it makes sense to reject target type ID lookups for kmods as long as there is no canonical way to refer to these IDs from the kernel side. This might need some finessing in CORERelocate, I can probably take a look at that once the first implementation is in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, went over the PR again. Here is what I would propose, let me know what you think.
- Skip
ProgramSpec.KernelModule
. Instead,ProgramOptions.KernelTypes
becomes[]*btf.Spec
as discussed. IfKernelTypes
is nil, the library tries to find the appropriate BTF to use, via means of parsing kallsyms / available_filter_functions and doing a lookup of ProgramSpec.AttachTo. Most of the time this will be just vmlinux. Some of the time this will be vmlinux + kmod via LoadKernelModuleSpec. - kallsyms / available_filter_functions: which one is more appropriate to use? I'd also prefer to not export
KernelModule
for now, since it ties us into an API. Probably best to stick this intointernal/kallsyms
. Need to figure out whether this needs to be cached, and if yes how to invalidate the cache when the set of loaded modules changes. Something inproc
that lists loaded modules maybe? - Merge
FlushKernelModuleCache
intobtf.FlushKernelSpec()
. There should be a single function to clean up cached info, and in a way this is all related to btf stuff anyways. We could rename the function if need be. - CORERelocate will work against
[]*Spec
as discussed. I think it makes sense to reject target type ID lookups for kmods as long as there is no canonical way to refer to these IDs from the kernel side. This might need some finessing in CORERelocate, I can probably take a look at that once the first implementation is in.
It was hard to find a conclusive comparison between the two based on the kernel source, but |
In the case of btfhub-supported kernels, a user will need a way to know if/which module a function lies within. That way we can lookup the appropriate BTF to pass into the Since the library would already have this functionality, why not export it? Otherwise every user of btfhub, kmods, and |
|
This moves the loading of kernel/kmod BTF up from |
I made some changes that are close to what you want, but had to fill in some gaps. I would like to see |
494679f
to
a20efa4
Compare
@lmb can you take another look? |
Ugh. Since we need different One idea would be
Another idea is a callback function |
59a51a1
to
5a8c94d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Thank you for this contribution!
We are interested by using it in Inspektor Gadget.
So far, I only have one comment but I will taker another look later to get a broader understanding.
Best regards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back to semantics: your current implementation means that a type defined in a kernel module can "shadow" a vmlinux type if it's better according to our scoring. I'm on the fence whether it'd be better to always prefer vmlinux if there is a matching target, no matter how good a kmod type is. Do you have an opinion on that?
btf/core.go
Outdated
for localType, group := range relosByType { | ||
localTypeName := localType.TypeName() | ||
if localTypeName == "" { | ||
return nil, fmt.Errorf("relocate unnamed or anonymous type %s: %w", localType, ErrNotSupported) | ||
} | ||
|
||
targets := target.namedTypes[newEssentialName(localTypeName)] | ||
fixups, err := coreCalculateFixups(group.relos, target, targets, bo) | ||
fixups, err := coreCalculateFixups(group.relos, &mergeTarget, mergeTarget.NamedTypesIterate(newEssentialName(localTypeName)), bo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging specs up front is going to be more complicated with lazy copy going in. How about the following pseudo go:
for localType, group := range relosByType {
var bestFixup
for _, target := range targets {
fixup, err := coreCalculateFixup(...)
if fixup > bestFixup {
bestFixup = fixup
// other checks
}
}
if bestFixup == nil {
// generate poison all
}
}
- Instead of merging the specs, we add a loop here which iterates over
targets
and tries to find a valid fixup. - We factor out the code to do scoring and call if from here in addition to coreCalculateFixups
Lines 276 to 294 in f95957d
if score > bestScore { // We have a better target already, ignore this one. continue } if score < bestScore { // This is the best target yet, use it. bestScore = score bestFixups = fixups continue } // Some other target has the same score as the current one. Make sure // the fixups agree with each other. for i, fixup := range bestFixups { if !fixup.equal(fixups[i]) { return nil, fmt.Errorf("%s: multiple types match: %w", fixup.kind, errAmbiguousRelocation) } } - We move generating the poison all fixup to after the loop over targets in this function:
Lines 298 to 309 in f95957d
// Nothing at all matched, probably because there are no suitable // targets at all. // // Poison everything except checksForExistence. bestFixups = make([]COREFixup, len(relos)) for i, relo := range relos { if relo.kind.checksForExistence() { bestFixups[i] = COREFixup{kind: relo.kind, local: 1, target: 0} } else { bestFixups[i] = COREFixup{kind: relo.kind, poison: true} } }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually merging them, just storing pointers to each Spec and iterating through them with wrapper functions. I updated to handle your changes in main. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, works for me. I'm not a big fan of mergedSpec
, it seems too special purpose for me. Can you move it to core.go
so that it's close to where it'll be used?
Ah, that is annoying :( Why do you want to avoid multiple NewCollection calls?
That could work. I'd probably use
This is because you don't want to load all kmod by default? What if you only parsed kmod for all loaded modules instead?
Hmm, I don't follow, sorry. We could still stick the map in ProgramOptions I think? Not sure how we would make use of it when loading
I'm a bit wary of callbacks. If we add this it begs the question whether we'd use this for vmlinux as well? Does it deprecate |
We have a bunch of eBPF programs in the same object file. That is currently a single collection. Anything that needs kernel modules would need to be extracted out and dealt with separately, probably as individual program loads. That makes shared map handling way more complicated. It isn't impossible, but is a smell about the API maybe being wrong. |
If you mean modules already loaded in the kernel, that can still be 100 or more. That is a lot of CPU and memory used for probably no reason. Otherwise, we have to loop through the kprobe functions, determine which modules are needed, load those BTF. The library when loading each kprobe would also need to do that lookup of function to kmod, to load the correct BTF from the map. |
We could put it in |
How would that happen? kmod BTF is generated using the vmlinux BTF so that it doesn't include anything already in vmlinux BTF. |
e914652
to
913f586
Compare
@lmb updated using a map for the module types. |
Sorry for the long radio silence. I've been busy hashing out cilium/cilium work.
I don't see the problem to be honest.
By using distinct types with the same name in multiple compilation units. Try grepping for Which ring_buffer do we prefer? The kmod one? Or the kernel? Which one of the kernel ones?
Does that mean we can't do better? ;P
We can't figure out which modules we need up front, but we could do something like the following:
These are the correct semantics to me (rather than just vmlinux + kmod) but I understand that the immediate need is simpler. I still want to keep the door open for this later on though.
That's interesting! Can you show me an example or tell me how to reproduce that?
Thanks for giving your view point and giving me confirmation that I'm not completely off xD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine, please make the changes to CORERelocate
as we discussed. If targets is nil it can just return an error, which is what it was doing before.
I'm a bit concerned about the dead lock potential between kernel and kmod loading, I'd love there to only be a single lock. Oh well.
Finally, the bit I'm still missing is how you're going to use this with external kmod BTF?
- You don't want to parse all loaded modules, since that might be in the 100s and could still be too slow.
- There is no feedback mechanism for you to figure out which modules the lib wants.
I see two solutions: replace the map with the callback you proposed or expose programKernelModule
on ProgramSpec
. The latter seems fine, and would also allow dropping the export of ebpf.KernelModule
. WDYT?
btf/core.go
Outdated
for localType, group := range relosByType { | ||
localTypeName := localType.TypeName() | ||
if localTypeName == "" { | ||
return nil, fmt.Errorf("relocate unnamed or anonymous type %s: %w", localType, ErrNotSupported) | ||
} | ||
|
||
targets := target.namedTypes[newEssentialName(localTypeName)] | ||
fixups, err := coreCalculateFixups(group.relos, target, targets, bo) | ||
fixups, err := coreCalculateFixups(group.relos, &mergeTarget, mergeTarget.NamedTypesIterate(newEssentialName(localTypeName)), bo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, works for me. I'm not a big fan of mergedSpec
, it seems too special purpose for me. Can you move it to core.go
so that it's close to where it'll be used?
You can see how I've implemented it at the moment: DataDog/ebpf-manager@8b77c9a |
This is not what it was doing. It tries to auto-load the kernel spec: Lines 173 to 179 in b3432fb
|
I meant this: Lines 169 to 170 in 4267cbc
I'm looking at merging this but realised that you merged main into your branch. Can you please undo that and rebase + squash on top of main? |
fd2e4dd
to
ce7b452
Compare
Done. Let me know if there are any other changes you'd like me to make. |
Signed-off-by: Bryce Kahle <bryce.kahle@datadoghq.com>
ce7b452
to
5fd9ae5
Compare
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
No changes to the source code. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
This allows dropping the fallback parameter and allows further refactoring. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Kernel and module BTF are currently kept in two separate global variables, with separate locking. This creates a deadlock hazard. Refactor the code to use a single lock, and get rid of a lot of indirection which is not necessary anymore because Spec.Copy is cheap. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
The function is already invoked via btf.FlushKernelSpec(). Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
5ad459e
to
215a22a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brycekahle I ended up making the changes myself.
- Don't export FlushKallsysm separately.
- Don't pass kmodName to CORERelocate.
- Remove the deadlock risk from having two locks protecting vmlinux and modules
I feel like these came up during the review before, but I'm not sure what happened.
Sorry it took so long to get this over the finish line!
Going to merge this tomorrow morning UK time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Thank you for the initial work and the polish!
I do not have a broad knowledge about it but I have one question with regard to locking and found a nit.
Best regards.
kernelBTF.Lock() | ||
defer kernelBTF.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure to understand the locking here.
Particularly, I would just hold the write lock while writing kernelBTF.kernel
, but I surely miss something here. Can you please shed some light?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the code is a bit hard to follow. Reason we do it this way is that dropping the read lock and acquiring the write lock are not atomic. Another goroutine might have preempted us and populated kernelBTF.kernel.
base, err := LoadKernelSpec() | ||
if err != nil { | ||
return nil, fmt.Errorf("load kernel spec: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This snippet can be moved right before you need it, i.e. line 86.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would cause a deadlock: we already hold kernelBTF.Lock at that point, so acquiring it in LoadKernelSpec will block indefinitely.
re #705
KernelModule() (string, error)
toProgramSpec
. This function attempts to determine the kernel module for kprobe/fentry programs, by parsing/proc/kallsyms
.KernelModuleTypes map[string]*btf.Spec
toProgramOptions
, so users can provide their own kmod BTF.The usage of a
*Spec
in parts of CO-RE had to be abstracted to avoid copying/merging of[]Type
slices.This does work (tested with
nf_conntrack
kmod and__nf_conntrack_hash_insert
kprobe), but I imagine you have thoughts about how to adapt this approach.