
Getting SIGSEGV when host is restarted #8252

Closed
hilshalom opened this issue Dec 5, 2023 · 9 comments
Labels
Stale, status: waiting-for-triage, waiting-for-user (Waiting for more information, tests or requested changes)

Comments

@hilshalom

hilshalom commented Dec 5, 2023

Bug Report

Describe the bug
When host reboots are performed in a loop, fluent-bit crashes and produces a core dump. The log shows that a SIGSEGV was caught; the logged stack trace follows the reproduction steps.

To Reproduce
Perform host reboots in a loop.

Log output

[engine] caught signal (SIGSEGV)
0x55c348e6c2f8      in  ???() at ???:0
0x55c348b7a5d7      in  ???() at ???:0
0x55c348b772b5      in  ???() at ???:0
0x7f84cdf2151f      in  ???() at ???:0
0x55c348b771a0      in  ???() at ???:0
0x7f84cdf2151f      in  ???() at ???:0
0x7f84ce004fde      in  ???() at ???:0
0x55c348ecbceb      in  _mk_event_wait_2() at lib/monkey/mk_core/mk_event_epoll.c:439
0x55c348ecc145      in  mk_event_wait() at lib/monkey/mk_core/mk_event.c:194
0x55c348b88f34      in  ???() at ???:0
0x55c348bb2f71      in  ???() at ???:0
0x7f84cdf73b42      in  ???() at ???:0
0x7f84ce004bb3      in  ???() at ???:0
0xffffffffffffffff  in  ???() at ???:0
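The `[engine] caught signal (SIGSEGV)` line is printed by fluent-bit's own fatal-signal handler, and most frames appear as `???() at ???:0`, which typically means the symbol information needed to resolve them was not available. As a rough illustration of how such a handler works, here is a minimal sketch using glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`. It is not fluent-bit's actual implementation, and `install_crash_handler` and `signal_name` are hypothetical names.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Map a fatal signal to the name printed in the banner. */
const char *signal_name(int sig)
{
    switch (sig) {
    case SIGSEGV: return "SIGSEGV";
    case SIGBUS:  return "SIGBUS";
    case SIGFPE:  return "SIGFPE";
    case SIGABRT: return "SIGABRT";
    default:      return "UNKNOWN";
    }
}

/* write(2) is async-signal-safe; printf() and friends are not. */
static void write_str(const char *s)
{
    ssize_t ignored = write(STDERR_FILENO, s, strlen(s));
    (void) ignored;
}

static void crash_handler(int sig)
{
    void *frames[MAX_FRAMES];
    int n;

    write_str("[engine] caught signal (");
    write_str(signal_name(sig));
    write_str(")\n");

    /* Frames without symbol information print as module+offset
     * addresses, the same limitation behind "???" in the log. */
    n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    /* Restore the default action and re-raise. The signal stays blocked
     * until this handler returns, then the default action terminates
     * the process, so a core dump can still be produced. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Returns 0 on success, -1 if sigaction() fails. */
int install_crash_handler(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may load libgcc; do it here,
     * outside signal context. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling `install_crash_handler()` early in `main()` means a later SIGSEGV prints the banner and raw frames to stderr before the process dies. Building with `-g` (and `-rdynamic` for `backtrace_symbols_fd`) makes the frames much easier to resolve than the `???` entries above.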

Core dump

(gdb) bt
#0  0x00007f84cdf75a7c in __pthread_mutex_cond_lock_full (mutex=0x29) at ../nptl/pthread_mutex_lock.c:184
#1  0x0000000000000000 in ?? ()

(gdb) info threads
  Id   Target Id                       Frame
* 1    Thread 0x7f84cd6da640 (LWP 210) 0x00007f84cdf75a7c in __pthread_mutex_cond_lock_full (mutex=0x29) at ../nptl/pthread_mutex_lock.c:184
  2    Thread 0x7f84cceb1640 (LWP 211) 0x00007f84ce004fde in epoll_wait (epfd=104, events=0x7f84c8240850, maxevents=64, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  3    Thread 0x7f84c7fff640 (LWP 212) 0x00007f84ce004fde in epoll_wait (epfd=107, events=0x7f84c8242060, maxevents=64, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  4    Thread 0x7f84cdedca00 (LWP 41)  0x00007f84cdfc4868 in __GI___clock_nanosleep (clock_id=0, flags=0, req=0x7ffc913720c0, rem=0xffffffffffffffff)
    at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:65
  5    Thread 0x7f84cdedb640 (LWP 209) 0x00007f84cdff89bf in fallocate64 (fd=147, mode=0, offset=0, len=102400) at ../sysdeps/unix/sysv/linux/fallocate64.c:27

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f84cd6da640 (LWP 210))]
#0  0x00007f84cdf75a7c in __pthread_mutex_cond_lock_full (mutex=0x29) at ../nptl/pthread_mutex_lock.c:184
184     in ../nptl/pthread_mutex_lock.c

(gdb) bt
#0  0x00007f84cdf75a7c in __pthread_mutex_cond_lock_full (mutex=0x29) at ../nptl/pthread_mutex_lock.c:184
#1  0x0000000000000000 in ?? ()

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f84cceb1640 (LWP 211))]
#0  0x00007f84ce004fde in epoll_wait (epfd=104, events=0x7f84c8240850, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.

(gdb) bt
#0  0x00007f84ce004fde in epoll_wait (epfd=104, events=0x7f84c8240850, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055c3492222d0 in ?? ()
#2  0x00000068cdf844d3 in ?? ()
#3  0x00007f84c8240850 in ?? ()
#4  0xffffffff00000040 in ?? ()
#5  0x00007f84c0001050 in ?? ()
#6  0x000055c348ecbcec in _mk_event_wait_2 (loop=0x7f84c8240b60, timeout=-1) at /tmp/fluent-bit-2.0.8/lib/monkey/mk_core/mk_event_epoll.c:439
#7  0x000055c348ecc146 in mk_event_wait (loop=0x7f84c8240b60) at /tmp/fluent-bit-2.0.8/lib/monkey/mk_core/mk_event.c:194
#8  0x000055c348b9a681 in output_thread ()
#9  0x000055c348bb2f72 in step_callback ()
#10 0x00007f84cdf73b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:470
#11 0x00007f84ce004bb4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:52
#12 0x0000000000000000 in ?? ()

Your Environment

  • Version used: 2.0.8
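In the core dump, thread 1 faults inside the mutex-locking path (`__pthread_mutex_cond_lock_full`) with `mutex=0x29`, which is far too small to be a valid address for a `pthread_mutex_t`, while the output threads are still blocked in `epoll_wait`. This issue does not establish the root cause. One common way a crash like this happens while a process is being stopped (as a service typically is during a host reboot) is that a thread locks a mutex whose containing object has already been destroyed or freed. The sketch below shows the general safe teardown order: request stop, join every thread, then destroy and free. `struct shared_state` and `start_and_shutdown` are hypothetical and are not fluent-bit code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_WORKERS 16

/* Hypothetical shared state; not a fluent-bit structure. */
struct shared_state {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            stop;
    int             exited;
};

static void *worker_main(void *arg)
{
    struct shared_state *st = arg;

    pthread_mutex_lock(&st->lock);
    /* The loop guards against spurious wakeups. */
    while (!st->stop) {
        pthread_cond_wait(&st->cond, &st->lock);
    }
    st->exited++;
    pthread_mutex_unlock(&st->lock);
    return NULL;
}

/* Starts up to nworkers threads, shuts them down in a safe order and
 * returns how many exited cleanly, or -1 on bad input or allocation failure. */
int start_and_shutdown(int nworkers)
{
    pthread_t tids[MAX_WORKERS];
    struct shared_state *st;
    int started = 0;
    int exited;

    if (nworkers < 0 || nworkers > MAX_WORKERS) {
        return -1;
    }
    st = calloc(1, sizeof(*st));
    if (st == NULL) {
        return -1;
    }
    pthread_mutex_init(&st->lock, NULL);
    pthread_cond_init(&st->cond, NULL);

    while (started < nworkers &&
           pthread_create(&tids[started], NULL, worker_main, st) == 0) {
        started++;
    }

    /* 1. Request shutdown while holding the lock. */
    pthread_mutex_lock(&st->lock);
    st->stop = true;
    pthread_cond_broadcast(&st->cond);
    pthread_mutex_unlock(&st->lock);

    /* 2. Wait until no thread can touch st any more. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }

    /* 3. Only now is it safe to destroy the primitives and free st. */
    exited = st->exited;
    pthread_cond_destroy(&st->cond);
    pthread_mutex_destroy(&st->lock);
    free(st);
    return exited;
}
```

Swapping steps 2 and 3 would reintroduce the hazard: a worker still waking from `pthread_cond_wait` would then try to re-lock a mutex in freed memory.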
@patrick-stephens
Contributor

Can you reproduce it with a more recent version, ideally the latest release, 2.2.0?

patrick-stephens added the waiting-for-user label (Waiting for more information, tests or requested changes) Dec 5, 2023
@hilshalom
Author

Thank you for your reply. Our product uses fluent-bit 2.0.8, so that is the version relevant to us.

@patrick-stephens
Contributor

The 2.1 and 2.2 series are the currently maintained releases.
There are unlikely to be any further 2.0 releases: 2.0.14 is the latest release in that series, so a fix there would require a new 2.0.15 release. We do not backport fixes to old OSS versions or re-release them, but some enterprise support providers may; see the docs: https://fluentbit.io/enterprise/

I would first test with the latest release to see whether the issue is already resolved or still occurs.
If it still occurs, it can be investigated and potentially fixed in a future release.
If it does not occur, we know it has been resolved, and it may then be possible to identify the fix so you can backport it, either by contributing it or by applying it directly to a fork of 2.0.8.

The 2.0.14 release is worth a try as well: https://github.com/fluent/fluent-bit/releases/tag/v2.0.14
It likely has fewer changes relative to your version, so it may be easier to adopt, or to investigate if it resolves the issue.

@hilshalom
Author

Thank you for your input. We will try to reproduce the issue with the latest version and consider upgrading the version we use.

@hilshalom
Author

Hi @patrick-stephens,
I am trying to upgrade fluent-bit. From version 2.1.3 onward, linking the C executable fails when FLB_IN_SYSTEMD is off. Are there any specific requirements for disabling this flag?

Linking C executable ../bin/fluent-bit
/usr/bin/ld: ../library/libflb-plugin-in_node_exporter_metrics.a(ne_systemd.c.o): in function `get_system_property':
/tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:125: undefined reference to `sd_bus_get_property_string'
/usr/bin/ld: /tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:134: undefined reference to `sd_bus_get_property_trivial'
/usr/bin/ld: ../library/libflb-plugin-in_node_exporter_metrics.a(ne_systemd.c.o): in function `get_unit_property':
/tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:183: undefined reference to `sd_bus_get_property_string'
/usr/bin/ld: /tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:192: undefined reference to `sd_bus_get_property_trivial'
/usr/bin/ld: ../library/libflb-plugin-in_node_exporter_metrics.a(ne_systemd.c.o): in function `ne_systemd_update_unit_state':
/tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:231: undefined reference to `sd_bus_call_method'
/usr/bin/ld: /tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:244: undefined reference to `sd_bus_message_enter_container'
/usr/bin/ld: /tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:247: undefined reference to `sd_bus_message_unref'
/usr/bin/ld: /tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:261: undefined reference to `sd_bus_message_read'
/usr/bin/ld: /tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:276: undefined reference to `sd_bus_message_unref'
/usr/bin/ld: /tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:488: undefined reference to `sd_bus_message_exit_container'
/usr/bin/ld: /tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:490: undefined reference to `sd_bus_message_unref'
/usr/bin/ld: ../library/libflb-plugin-in_node_exporter_metrics.a(ne_systemd.c.o): in function `ne_systemd_init':
/tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:587: undefined reference to `sd_bus_open_system'
/usr/bin/ld: ../library/libflb-plugin-in_node_exporter_metrics.a(ne_systemd.c.o): in function `ne_systemd_exit':
/tmp/fluent-bit-2.2.0/plugins/in_node_exporter_metrics/ne_systemd_linux.c:791: undefined reference to `sd_bus_unref'
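All of the undefined references are `sd_bus_*` functions from libsystemd's sd-bus API, and every call site is in the node exporter plugin's systemd collector (`ne_systemd_linux.c`), not in the `in_systemd` plugin. That suggests turning off `FLB_IN_SYSTEMD` stops libsystemd from being linked while `in_node_exporter_metrics` still references it. A common way to make such code optional is to compile the sd-bus calls only when the build system has found libsystemd. The sketch below assumes a hypothetical `HAVE_SDBUS` macro and hypothetical function names; it is not how fluent-bit's build is actually wired.

```c
#include <stddef.h>
#include <stdio.h>

/* HAVE_SDBUS is a hypothetical macro that the build system would
 * define only after it has found libsystemd and added it to the link. */
#ifdef HAVE_SDBUS
#include <systemd/sd-bus.h>
#endif

/* Returns 0 and stores a bus handle on success, -1 if systemd
 * metrics are unavailable in this build or at runtime. */
int systemd_collector_init(void **bus_out)
{
#ifdef HAVE_SDBUS
    sd_bus *bus = NULL;
    int ret = sd_bus_open_system(&bus);

    if (ret < 0) {
        fprintf(stderr, "sd_bus_open_system failed: %d\n", ret);
        *bus_out = NULL;
        return -1;
    }
    *bus_out = bus;
    return 0;
#else
    /* No sd_bus_* symbol is referenced in this branch, so the final
     * link cannot fail with "undefined reference to sd_bus_...". */
    *bus_out = NULL;
    return -1;
#endif
}

void systemd_collector_exit(void *bus)
{
#ifdef HAVE_SDBUS
    if (bus != NULL) {
        sd_bus_unref(bus);
    }
#else
    (void) bus;
#endif
}
```

Compiled without `HAVE_SDBUS`, the object file references no `sd_bus_*` symbols, so the link succeeds and the collector simply reports itself as unavailable. With `HAVE_SDBUS` defined, the final link must also include libsystemd (for example `-lsystemd`).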

@patrick-stephens
Contributor

Not really; it looks like the node exporter plugin has an implicit dependency on systemd. @cosmo0920?
I would disable the node exporter plugin and see if that resolves it.

Also, @hilshalom, if you are compiling it yourself, please say so in any issue you open and include full details of how it is built, since the problem may be caused by a misconfiguration in the build process.

@hilshalom
Author

hilshalom commented Jan 2, 2024

[Three screenshots attached; not reproduced here.]

Contributor

github-actions bot commented Apr 7, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions bot added the Stale label Apr 7, 2024
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 13, 2024
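The bot message describes a simple rule: after 90 days without activity an issue is labeled stale, and it is closed 5 days later if nothing changes. The timeline here fits that rule: the last comment was on Jan 2, 2024, the Stale label was added on Apr 7 (96 days later, 2024 being a leap year), and the issue was closed on Apr 13 (6 days after that). A small sketch of the check, using Howard Hinnant's `days_from_civil` date algorithm; `should_mark_stale` and `should_close` are illustrative names, and this is not the bot's actual code.

```c
#include <stdbool.h>

/* Thresholds taken from the bot's message. */
#define STALE_AFTER_DAYS 90
#define CLOSE_AFTER_DAYS 5

/* Days since 1970-01-01 for a proleptic Gregorian date
 * (Howard Hinnant's days_from_civil algorithm). */
long days_from_civil(long y, unsigned m, unsigned d)
{
    y -= m <= 2;                                   /* years start in March */
    const long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = (unsigned) (y - era * 400);                    /* [0, 399] */
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; /* [0, 365] */
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;        /* [0, 146096] */
    return era * 146097 + (long) doe - 719468;
}

/* Is the issue due for the Stale label? */
bool should_mark_stale(long last_activity_day, long today)
{
    return today - last_activity_day >= STALE_AFTER_DAYS;
}

/* Has the stale issue waited long enough to be closed? */
bool should_close(long stale_marked_day, long today)
{
    return today - stale_marked_day >= CLOSE_AFTER_DAYS;
}
```

A bot that runs periodically only acts on its next run after a threshold is crossed, which is consistent with the label arriving a few days after the 90-day mark.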