Use the verbose parameter to enable the tracer's debugging without recompiling the program #145

Merged on Oct 10, 2024 (6 commits)

tsint
Contributor

@tsint tsint commented Sep 2, 2024

Currently, to enable BPF debugging, you need to recompile the profiler, replace the binary on the server, and then swap the original back in after collecting the information you need. This is very inconvenient, so I propose embedding a debug-enabled build of the BPF program directly in the profiler. To enable BPF debugging, you then only need to restart the profiler with a different parameter.

As shown below, you can easily obtain the BPF debug log output.

            etcd-1921    [003] d.h2.  4529.247295: bpf_trace_printk: delta index 6, addrLow 0x3b00, unwindInfo 174
            etcd-1921    [003] d.h2.  4529.247296: bpf_trace_printk: unwind: fp+16
            etcd-1921    [003] d.h2.  4529.247296: bpf_trace_printk: unwind: cfa+-16
            etcd-1921    [003] d.h2.  4529.247297: bpf_trace_printk:  pc: 46b2c5 sp: c00006fff8 fp: 7ffe3eb68f90
            etcd-1921    [003] d.h2.  4529.247297: bpf_trace_printk: ==== Resolve next frame unwinder: frame 5 ====
            etcd-1921    [003] d.h2.  4529.247298: bpf_trace_printk: Text section id for PC 46b2c5 is 178b55ad52677a08 (unwinder 1)
            etcd-1921    [003] d.h2.  4529.247298: bpf_trace_printk: Text section bias is 0, and offset is 46b2c5
            etcd-1921    [003] d.h2.  4529.247298: bpf_trace_printk: ==== unwind_native 5 ====
            etcd-1921    [003] d.h2.  4529.247299: bpf_trace_printk: Pushing 178b55ad52677a08 46b2c5 to position 5 on stack
            etcd-1921    [003] d.h2.  4529.247299: bpf_trace_printk: Look up stack delta for 178b55ad52677a08:46b2c5
            etcd-1921    [003] d.h2.  4529.247300: bpf_trace_printk: Intervals should be from 7 to 219 (mapID 9)
            etcd-1921    [003] d.h2.  4529.247301: bpf_trace_printk: delta index 60, addrLow 0xb2c0, unwindInfo 32769
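
For anyone who wants to post-process this output, here is a minimal Go sketch that splits one such line into its fields. It is only an illustration and not part of this PR: the `traceLine` type, the `parseTraceLine` function, and the regular expression are hypothetical names and choices, and the pattern assumes the line layout shown above (task name without spaces, CPU in brackets, flags, timestamp, then the `bpf_trace_printk:` prefix).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// traceLine holds the fields of one bpf_trace_printk record
// (hypothetical type, for illustration only).
type traceLine struct {
	Task      string // task name and PID, e.g. "etcd-1921"
	CPU       string // CPU number, e.g. "003"
	Flags     string // irq/preempt flags, e.g. "d.h2."
	Timestamp string // timestamp as printed, e.g. "4529.247295"
	Message   string // text passed to bpf_trace_printk
}

// traceRe matches lines of the form
// <task>-<pid> [<cpu>] <flags> <timestamp>: bpf_trace_printk: <message>
// Assumption: the task name contains no whitespace.
var traceRe = regexp.MustCompile(
	`^\s*(\S+)\s+\[(\d+)\]\s+(\S+)\s+([\d.]+):\s+bpf_trace_printk:\s?(.*)$`)

// parseTraceLine returns the parsed fields, or false if the line does not
// look like a bpf_trace_printk record.
func parseTraceLine(line string) (traceLine, bool) {
	m := traceRe.FindStringSubmatch(line)
	if m == nil {
		return traceLine{}, false
	}
	return traceLine{
		Task:      m[1],
		CPU:       m[2],
		Flags:     m[3],
		Timestamp: m[4],
		Message:   strings.TrimSpace(m[5]),
	}, true
}

func main() {
	line := "            etcd-1921    [003] d.h2.  4529.247296: bpf_trace_printk: unwind: fp+16"
	if tl, ok := parseTraceLine(line); ok {
		fmt.Printf("%s cpu=%s t=%s msg=%q\n", tl.Task, tl.CPU, tl.Timestamp, tl.Message)
	}
}
```

Such lines are typically read from the kernel's `trace_pipe` file in tracefs; feeding them through a parser like this makes it easier to filter by task or group messages by unwound frame.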

@tsint tsint requested review from a team September 2, 2024 02:38
@tsint tsint force-pushed the pr_debugbpf branch 3 times, most recently from 16053f2 to 81eafe1 Compare September 2, 2024 03:13
@tsint tsint changed the title Add a debug-bpf parameter to the profiler to enable BPF debugging wit… Enable BPF debugging using the verbose parameter without recompiling the BPF program Sep 2, 2024
@athre0z
Member

athre0z commented Sep 2, 2024

Hmm, I had thought about this as well, but I'm not sure whether it's a good idea: the debug blobs are 1MB each, and we'll be updating them frequently. This will probably have rather significant impact on the repository size in the long run. If we want this, we'll probably have to stop committing tracer blobs into git, which in turn means that this repository can no longer be pulled in as a Go module (without an external build system).

@tsint
Contributor Author

tsint commented Sep 2, 2024

> Hmm, I had thought about this as well, but I'm not sure whether it's a good idea: the debug blobs are 1MB each, and we'll be updating them frequently. This will probably have rather significant impact on the repository size in the long run. If we want this, we'll probably have to stop committing tracer blobs into git, which in turn means that this repository can no longer be pulled in as a Go module (without an external build system).

What if we put the debug blobs in a separate module and do not commit them to git, to reduce the impact on the repository size, while keeping the original tracer blobs unchanged? Would this address your concerns?

@athre0z
Member

athre0z commented Sep 2, 2024

Hmm yeah, something along those lines could work. We don't have to go with a separate Go module, though, if that is what you are proposing. It'd probably be enough to go:embed them in a separate file with a _sometag.go suffix that is only compiled in when the person building the profiler explicitly requests it. Other than with the regular tracers, the responsibility to make sure the debug tracers are built somehow before go:embed kicks in would lie with the user. Maybe debugtracers could be a good name for such a tag.
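
A rough Go sketch of that idea follows. It is an assumption-laden illustration, not the code from this PR: the blob variables, the `selectTracer` function, the file name in the comment, and the fallback to the regular tracer when no debug blob is compiled in are all hypothetical. The tagged file is shown only as a comment so the example stays runnable as a single file.

```go
package main

import "fmt"

// In a real tree, a second file guarded by a build constraint would fill in
// debugTracerBlob, roughly like this (hypothetical file and blob names):
//
//	//go:build debugtracers
//
//	package main
//
//	import _ "embed"
//
//	//go:embed tracer.ebpf.debug
//	var embeddedDebugTracer []byte
//
//	func init() { debugTracerBlob = embeddedDebugTracer }
//
// Without "-tags debugtracers" that file is skipped, so debugTracerBlob
// stays nil and the debug blob never has to exist on disk.
var debugTracerBlob []byte

// releaseTracerBlob stands in for the regular tracer blob that is always
// embedded (placeholder bytes, not a real eBPF object).
var releaseTracerBlob = []byte("release")

// selectTracer returns the blob to load and whether it is the debug variant.
// Assumption: verbose mode falls back to the regular tracer when no debug
// blob was compiled in.
func selectTracer(verbose bool) (blob []byte, debug bool) {
	if verbose && len(debugTracerBlob) > 0 {
		return debugTracerBlob, true
	}
	return releaseTracerBlob, false
}

func main() {
	_, debug := selectTracer(true)
	fmt.Println("using debug tracers:", debug)
}
```

With this layout, a plain `go build` never needs the debug blob, while `go build -tags debugtracers` fails at the `go:embed` step unless the user has built the debug tracer first, which matches the division of responsibility described above.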

@tsint
Contributor Author

tsint commented Sep 3, 2024

> Hmm yeah, something along those lines could work. We don't have to go with a separate Go module, though, if that is what you are proposing. It'd probably be enough to go:embed them in a separate file with a _sometag.go suffix that is only compiled in when the person building the profiler explicitly requests it. Other than with the regular tracers, the responsibility to make sure the debug tracers are built somehow before go:embed kicks in would lie with the user. Maybe debugtracers could be a good name for such a tag.

Your suggestion is a great idea.

@tsint tsint force-pushed the pr_debugbpf branch 5 times, most recently from 2d37db8 to 84fddb9 Compare September 3, 2024 01:57
@tsint tsint changed the title Enable BPF debugging using the verbose parameter without recompiling the BPF program Use the verbose parameter to enable the tracer's debugging without recompiling the program Sep 3, 2024
@tsint tsint requested review from a team as code owners September 19, 2024 00:19
@tsint tsint force-pushed the pr_debugbpf branch 2 times, most recently from c9882cf to 03072ed Compare September 19, 2024 01:59
@tmm1

tmm1 commented Sep 29, 2024

Thanks for adding this. The new parameter can also be wired up to the Docker build:

diff --git a/Makefile b/Makefile
index e017b4a..5fb2c0f 100644
--- a/Makefile
+++ b/Makefile
@@ -115,7 +115,7 @@ docker-image:
 
 agent:
 	docker run -v "$$PWD":/agent -it --rm --user $(shell id -u):$(shell id -g) profiling-agent \
-	   "make TARGET_ARCH=$(TARGET_ARCH) VERSION=$(VERSION) REVISION=$(REVISION) BUILD_TIMESTAMP=$(BUILD_TIMESTAMP)"
+	   "make TARGET_ARCH=$(TARGET_ARCH) VERSION=$(VERSION) REVISION=$(REVISION) BUILD_TIMESTAMP=$(BUILD_TIMESTAMP) BUILD_TYPE=$(BUILD_TYPE)"
 
 legal:
 	@go install github.com/google/go-licenses@latest

@tsint
Contributor Author

tsint commented Oct 4, 2024

Hi!
This PR has been open for over a month.
Could someone please provide some feedback or a decision?

Best regards.

Member

@christos68k christos68k left a comment

Did a pass, will take another look

@tsint tsint force-pushed the pr_debugbpf branch 2 times, most recently from cc1cbf9 to 0413c89 Compare October 4, 2024 13:09
@tsint
Contributor Author

tsint commented Oct 4, 2024

> Did a pass, will take another look

Thanks for your suggestions.

@rockdaboot
Contributor

The eBPF programs with debug symbols should load (I did this in the past, but not recently).
However, on Debian with

Linux box 6.10.11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1 (2024-09-22) x86_64 GNU/Linux

the agent stops with the following error

$ sudo ./ebpf-profiler -collection-agent=127.0.0.1:11000 -disable-tls -v
DEBU[0000] Config:                                      
DEBU[0000] bpf-log-level: 0                             
DEBU[0000] clock-sync-interval: 3m0s                    
DEBU[0000] collection-agent: 127.0.0.1:11000            
DEBU[0000] copyright: false                             
DEBU[0000] disable-tls: true                            
DEBU[0000] map-scale-factor: 0                          
DEBU[0000] monitor-interval: 5s                         
DEBU[0000] no-kernel-version-check: false               
DEBU[0000] pprof:                                       
DEBU[0000] probabilistic-interval: 1m0s                 
DEBU[0000] probabilistic-threshold: 100                 
DEBU[0000] reporter-interval: 5s                        
DEBU[0000] samples-per-second: 20                       
DEBU[0000] send-error-frames: false                     
DEBU[0000] t: all                                       
DEBU[0000] tracers: all                                 
DEBU[0000] v: true                                      
DEBU[0000] verbose: true                                
DEBU[0000] version: false                               
INFO[0000] Starting OTEL profiling agent  (revision , build timestamp ) 
DEBU[0000] Determining tracers to include               
DEBU[0000] Tracer string: all                           
INFO[0000] Interpreter tracers: perl,php,python,hotspot,ruby,v8,dotnet 
DEBU[0000] Traffic to 127.0.0.1:11000 is routed from 127.0.0.1 
WARN[0000] Using debug eBPF tracers                     
DEBU[0000] Size of eBPF map exe_id_to_14_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_15_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_19_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_16_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_17_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_21_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_8_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_10_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_12_stack_deltas: 65536 
DEBU[0000] Size of eBPF map stack_delta_page_to_info: 65536 
DEBU[0000] Size of eBPF map pid_page_to_mapping_info: 1048576 
DEBU[0000] Size of eBPF map exe_id_to_9_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_18_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_13_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_20_stack_deltas: 65536 
DEBU[0000] Size of eBPF map exe_id_to_11_stack_deltas: 65536 
ERRO[0000] processed 8568 insns (limit 1000000) max_states_per_insn 1 total_states 468 peak_states 468 mark_read 141 
ERRO[0000] Failed to load eBPF tracer: failed to load eBPF code: failed to load eBPF programs: failed to load unwind_v8

Is there anything that needs to be tuned?
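
As a quick sanity check on the verifier summary line above, the following Go sketch extracts the processed-instruction count and the reported limit. It is illustrative only: `verifierStats`, `parseVerifierStats`, and `hitLimit` are hypothetical helpers, not profiler or cilium/ebpf APIs. For the line in this log (8568 processed against a limit of 1000000), the numbers suggest the instruction budget is probably not what caused the load failure.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// verifierStats holds two counters from the verifier's summary line
// (hypothetical type, for illustration only).
type verifierStats struct {
	Processed int // instructions the verifier processed
	Limit     int // instruction budget reported in the same line
}

// statsRe matches the leading part of the summary line, e.g.
// "processed 8568 insns (limit 1000000) ...".
var statsRe = regexp.MustCompile(`processed (\d+) insns \(limit (\d+)\)`)

// parseVerifierStats extracts the counters; ok is false if the line does
// not contain a summary in the expected form.
func parseVerifierStats(line string) (stats verifierStats, ok bool) {
	m := statsRe.FindStringSubmatch(line)
	if m == nil {
		return verifierStats{}, false
	}
	processed, err := strconv.Atoi(m[1])
	if err != nil {
		return verifierStats{}, false
	}
	limit, err := strconv.Atoi(m[2])
	if err != nil {
		return verifierStats{}, false
	}
	return verifierStats{Processed: processed, Limit: limit}, true
}

// hitLimit reports whether the verifier used up its instruction budget.
func (s verifierStats) hitLimit() bool { return s.Processed >= s.Limit }

func main() {
	line := "processed 8568 insns (limit 1000000) max_states_per_insn 1 total_states 468 peak_states 468 mark_read 141"
	if s, ok := parseVerifierStats(line); ok {
		fmt.Printf("processed=%d limit=%d hitLimit=%v\n", s.Processed, s.Limit, s.hitLimit())
	}
}
```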

@tsint
Contributor Author

tsint commented Oct 8, 2024

> on Debian with

I don't have this issue on my Mint 21.3 system. I'll try upgrading the kernel.

Linux mint213 6.5.0-14-generic #14~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 20 18:15:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

The output is as follows:

$ sudo ./opentelemetry-ebpf-profiler -collection-agent=127.0.0.1:11000 -disable-tls -v
DEBU[0000] Config:
DEBU[0000] bpf-log-level: 0
DEBU[0000] bpf-log-size: 65536
DEBU[0000] clock-sync-interval: 3m0s
DEBU[0000] collection-agent: 127.0.0.1:11000
DEBU[0000] copyright: false
DEBU[0000] disable-tls: true
DEBU[0000] map-scale-factor: 0
DEBU[0000] monitor-interval: 5s
DEBU[0000] no-kernel-version-check: false
DEBU[0000] pprof:
DEBU[0000] probabilistic-interval: 1m0s
DEBU[0000] probabilistic-threshold: 100
DEBU[0000] reporter-interval: 5s
DEBU[0000] samples-per-second: 20
DEBU[0000] send-error-frames: false
DEBU[0000] t: all
DEBU[0000] tracers: all
DEBU[0000] v: true
DEBU[0000] verbose: true
DEBU[0000] version: false
INFO[0000] Starting OTEL profiling agent  (revision pr_debugbpf-4a355b7d, build timestamp 1728387148)
DEBU[0000] Determining tracers to include
DEBU[0000] Tracer string: all
INFO[0000] Interpreter tracers: perl,php,python,hotspot,ruby,v8,dotnet
DEBU[0000] Traffic to 127.0.0.1:11000 is routed from 127.0.0.1
WARN[0000] Using debug eBPF tracers
DEBU[0000] Size of eBPF map exe_id_to_13_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_12_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_11_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_18_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_21_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_10_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_15_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_19_stack_deltas: 65536
DEBU[0000] Size of eBPF map pid_page_to_mapping_info: 1048576
DEBU[0000] Size of eBPF map exe_id_to_8_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_16_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_17_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_9_stack_deltas: 65536
DEBU[0000] Size of eBPF map exe_id_to_20_stack_deltas: 65536
DEBU[0000] Size of eBPF map stack_delta_page_to_info: 65536
DEBU[0000] Size of eBPF map exe_id_to_14_stack_deltas: 65536
DEBU[0000] PAC is not enabled on the system.
INFO[0000] Found offsets: task stack 0x20, pt_regs 0x3f58, tpbase 0x1528
INFO[0000] Supports generic eBPF map batch operations
INFO[0000] Supports LPM trie eBPF map batch operations
DEBU[0000] Found KERNEL TEXT at ffffffffa1600000-ffffffffa2a00000
INFO[0000] eBPF tracer loaded
DEBU[0000] = PID: 1
DEBU[0000] ProcessManager doesn't know about PID 1
DEBU[0000] Stored file ID mapping 0xf0995bc925e74b6 -> 0xf0995bc925e74b6df0977b89d09ebec
DEBU[0000] Stored file ID mapping 0xe3dec1708fe4df32 -> 0xe3dec1708fe4df3273ade8a4322ce4a6
DEBU[0000] Stored file ID mapping 0x65db3171cdbf0d86 -> 0x65db3171cdbf0d869bddf033582b15a1
DEBU[0000] Stored file ID mapping 0x8d8564c3fd4f13a6 -> 0x8d8564c3fd4f13a6479eff70e771f40a
DEBU[0000] Stored file ID mapping 0x579c00bfae51d8e8 -> 0x579c00bfae51d8e8afdd19e220cbf741
DEBU[0000] Stored file ID mapping 0x4777caf8980311a7 -> 0x4777caf8980311a7463d924254063534
DEBU[0000] Stored file ID mapping 0x57a86f9d0fddc0ea -> 0x57a86f9d0fddc0ea531829bc0f395c7f
DEBU[0000] Stored file ID mapping 0x37d89eca9f41687 -> 0x37d89eca9f41687f4d001cb8a720bac

@rockdaboot
Contributor

@tsint The issue is unrelated to this PR. It also happens on main, and it happened even before the recent upgrade to github.com/cilium/ebpf v0.16.0.

@rockdaboot
Contributor

@tsint Would you mind updating the Building section of README.md?

@tsint
Contributor Author

tsint commented Oct 9, 2024

> @tsint Would you mind updating the Building section of README.md?

No problem.

Contributor

@rockdaboot rockdaboot left a comment

LGTM

Member

@christos68k christos68k left a comment

Thanks! LGTM

@rockdaboot
Contributor

@tsint There is a minor merge conflict in README.md. Please resolve :)

@tsint
Contributor Author

tsint commented Oct 10, 2024

> @tsint There is a minor merge conflict in README.md. Please resolve :)

OK.

@christos68k christos68k merged commit 400bc73 into open-telemetry:main Oct 10, 2024
23 checks passed