Filesystem based chunk storage results in "chunk_io_locked" exception followed by fluentbit process termination #4598
Comments
The stack trace looks like it has your username in it so presumably you've compiled Fluent Bit from source to get it. What options did you use, on what OS and compiler, etc.? Any other patches? |
Hi Patrick,
Tried with the latest fluentbit v1.9 (active master branch). It was manually compiled by me.
Build machine: Amazon Linux 2 (a CentOS variant).
Related important settings used in our configuration pipeline:
At service level =>
At filter level => (Tail - total count of 14 filters in our pipeline)
Following are the flags + active plugins in the built binary:
Inputs Filters Outputs Internal Error Message:-
[2022/01/17 08:37:08] [error] [filter:lua:lua.85] failed to allocate outbuf
[2022/01/17 08:37:09] [engine] caught signal (SIGSEGV)
#0 0x79dda1 in cio_chunk_is_locked() at lib/chunkio/src/cio_chunk.c:375
#1 0x48aa2c in input_chunk_get() at src/flb_input_chunk.c:1124
#4 0x4ab4de in in_emitter_add_record() at plugins/in_emitter/emitter.c:117
#5 0x55d281 in process_record() at plugins/filter_rewrite_tag/rewrite_tag.c:352
#7 0x457097 in flb_filter_do() at src/flb_filter.c:124
#10 0x4aed2f in flb_tail_mult_process_first() at plugins/in_tail/tail_multiline.c:166
#12 0x4b1780 in process_content() at plugins/in_tail/tail_file.c:437
#14 0x4ac028 in in_tail_collect_event() at plugins/in_tail/tail.c:261
#15 0x4ad1e5 in tail_fs_event() at plugins/in_tail/tail_fs_inotify.c:268
#16 0x456b83 in flb_input_collector_fd() at src/flb_input.c:1102
#17 0x4694eb in flb_engine_handle_event() at src/flb_engine.c:413
#19 0x44b770 in flb_lib_worker() at src/flb_lib.c:627
#22 0xffffffffffffffff in ???() at ???:0 |
Would you mind showing your fluent-bit.conf here? |
I was able to reproduce the rewrite_tag chunk_io_locked segfault under moderate log-tailing load on a k8s 1.21.5 cluster (running fluent bit 1.18.15 as a daemonset). My data pipeline is primarily: input tail (all container logs) -> kubernetes filter -> rewrite_tag with re-emit. The original (non rewrite_tag) records are sent to logstash, while the records that are processed by rewrite_tag ultimately NOTE:
Stack trace and logs leading up to the error.
|
FWIW, disabling all of the |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the |
Any update @patrick-stephens/ @EKwongChum ? |
Not really: both versions listed in this PR are quite old now so I would encourage you to try the latest to see if the issue is resolved already. I don't believe there will be any further releases of 1.7 or 1.8. |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the |
This issue was closed because it has been stalled for 5 days with no activity. |
Version: 1.7.5
Environment: Ubuntu (containerized) (k8s)
We have a high load in production and multiple files + rewrite tags in our pipeline.
When new log files are found in the k8s cluster over a period of time, we restart fluentbit (SIGTERM followed by new process creation).
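For reference, the restart described above (SIGTERM followed by new process creation) could be ordered strictly, so that the new process is only created once the old one has fully exited. The following is a minimal sketch and not our actual tooling: `spawn`, `terminate_and_wait`, and `restart` are hypothetical helper names, and it assumes the supervisor is the parent of the fluentbit process, since `waitpid` only works on child processes. Whether an overlap between the old and new process plays any role in the crash is not established here.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a program, return its PID or -1 on error. */
static pid_t spawn(char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }
    return pid; /* child PID in the parent, -1 if fork failed */
}

/* Send SIGTERM to a child and block until it has been reaped.
 * Stores the raw wait status in *status. Returns 0 on success, -1 on error. */
static int terminate_and_wait(pid_t pid, int *status)
{
    assert(pid > 0);

    if (kill(pid, SIGTERM) != 0) {
        return -1;
    }
    while (waitpid(pid, status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return 0;
}

/* Restart: the new process is created only after the old one is gone. */
static pid_t restart(pid_t old_pid, char *const argv[])
{
    int status = 0;

    if (terminate_and_wait(old_pid, &status) != 0) {
        return -1;
    }
    return spawn(argv);
}
```

A caller would keep the PID returned by `spawn` and pass it to `restart` together with the command line, for example `{"fluent-bit", "-c", "fluent-bit.conf", NULL}` (the config path is a placeholder).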
I can see that fluentbit crashes sporadically.
The following GDB backtrace was observed in one of the crash dump files:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fe965090921 in __GI_abort () at abort.c:79
#2 0x0000000000436a72 in flb_signal_handler (signal=11) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/fluent-bit.c:514
#3 <signal handler called>
#4 0x000000000072e3ee in cio_chunk_is_locked (ch=0x36) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/lib/chunkio/src/cio_chunk.c:343
#5 0x0000000000478b7c in input_chunk_get (tag=0x7fe960446f30 "klog", tag_len=4, in=0x7fe9603f2a80, chunk_size=368, set_down=0x7fe96504df08) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input_chunk.c:630
#6 0x0000000000479121 in flb_input_chunk_append_raw (in=0x7fe9603f2a80, tag=0x7fe960446f30 "klog", tag_len=4, buf=0x7fe96001b9d0, buf_size=368) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input_chunk.c:865
#7 0x000000000048cf47 in in_emitter_add_record (tag=0x7fe960446680 "klog", tag_len=4, buf_data=0x7fe96624901f <error: Cannot access memory at address 0x7fe96624901f>, buf_size=368, in=0x7fe9603f2a80) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_emitter/emitter.c:117
#8 0x0000000000523635 in process_record (tag=0x7fe960452b90 "kubelet", tag_len=7, map=..., buf=0x7fe96624901f, buf_size=368, keep=0x7fe96504e160, ctx=0x7fe9603f2270) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/filter_rewrite_tag/rewrite_tag.c:324
#9 0x000000000052378b in cb_rewrite_tag_filter (data=0x7fe96624901f, bytes=368, tag=0x7fe960452b90 "kubelet", tag_len=7, out_buf=0x7fe96504e1f8, out_bytes=0x7fe96504e1e8, f_ins=0x1320a40, filter_context=0x7fe9603f2270, config=0x128e290)
at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/filter_rewrite_tag/rewrite_tag.c:375
#10 0x000000000044cc0c in flb_filter_do (ic=0x7fe96043ee70, data=0x7fe960018ce0, bytes=371, tag=0x7fe96043ef00 "kubelet", tag_len=7, config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_filter.c:118
#11 0x00000000004792ee in flb_input_chunk_append_raw (in=0x7fe960404f50, tag=0x7fe96043ef00 "kubelet", tag_len=7, buf=0x7fe960018ce0, buf_size=371) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input_chunk.c:911
#12 0x000000000048cf47 in in_emitter_add_record (tag=0x7fe9604520c0 "kubelet", tag_len=7, buf_data=0x7fe96045c5d0 "\222\327", buf_size=371, in=0x7fe960404f50) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_emitter/emitter.c:117
#13 0x0000000000523635 in process_record (tag=0x7fe96042ede0 "syslog", tag_len=6, map=..., buf=0x7fe96045c5d0, buf_size=371, keep=0x7fe96504e510, ctx=0x7fe960404740) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/filter_rewrite_tag/rewrite_tag.c:324
#14 0x000000000052378b in cb_rewrite_tag_filter (data=0x7fe96045c5d0, bytes=371, tag=0x7fe96042ede0 "syslog", tag_len=6, out_buf=0x7fe96504e5a8, out_bytes=0x7fe96504e598, f_ins=0x1322050, filter_context=0x7fe960404740, config=0x128e290)
at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/filter_rewrite_tag/rewrite_tag.c:375
#15 0x000000000044cc0c in flb_filter_do (ic=0x7fe9604519d0, data=0x7fe960013630, bytes=177, tag=0x7fe96029a320 "syslog", tag_len=6, config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_filter.c:118
#16 0x00000000004792ee in flb_input_chunk_append_raw (in=0x12c30e0, tag=0x7fe96029a320 "syslog", tag_len=6, buf=0x7fe960013630, buf_size=177) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input_chunk.c:911
#17 0x0000000000491487 in process_content (file=0x7fe9602a11b0, bytes=0x7fe96504e858) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_tail/tail_file.c:367
#18 0x000000000049316b in flb_tail_file_chunk (file=0x7fe9602a11b0) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_tail/tail_file.c:994
#19 0x000000000048d9ba in in_tail_collect_event (file=0x7fe9602a11b0, config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_tail/tail.c:261
#20 0x0000000000498277 in tail_fs_event (ins=0x12c30e0, config=0x128e290, in_context=0x7fe96029dec0) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_tail/tail_fs_inotify.c:268
#21 0x000000000044c6cd in flb_input_collector_fd (fd=215, config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input.c:1004
#22 0x000000000045c5d8 in flb_engine_handle_event (config=0x128e290, mask=1, fd=215) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_engine.c:363
#23 flb_engine_start (config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_engine.c:624
#24 0x00000000004422db in flb_lib_worker (data=0x128e260) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_lib.c:493
#25 0x00007fe965e0a6db in start_thread (arg=0x7fe96504f700) at pthread_create.c:463
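Read bottom-up, the backtrace shows `flb_input_chunk_append_raw` entered three times on the same thread (frames #16, #11, #6): the tail record tagged "syslog" is re-emitted as "kubelet" by one rewrite_tag filter, and that record is re-emitted as "klog" by another, with each re-emission going through `in_emitter_add_record` before the outer call has returned. The crash happens in the innermost append. The toy model below only illustrates that nesting; it is not Fluent Bit code, all names in it are hypothetical, and the two rewrite rules are inferred from the tags in the trace rather than taken from our configuration.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: none of these names are Fluent Bit APIs. */

#define MAX_TRACE 8

struct trace {
    const char *tags[MAX_TRACE]; /* tag seen by each append, in call order */
    int count;                   /* number of appends recorded */
    int depth;                   /* current nesting level */
    int max_depth;               /* deepest nesting level reached */
};

/* Hypothetical rewrite rules mirroring the tags in the backtrace. */
static const char *rewrite_tag(const char *tag)
{
    if (strcmp(tag, "syslog") == 0) {
        return "kubelet";
    }
    if (strcmp(tag, "kubelet") == 0) {
        return "klog";
    }
    return NULL; /* no rule matches: the record is not re-emitted */
}

/* Models one append: run the filters and, if a rule matches,
 * re-enter the append path for the re-tagged record. */
static void append_raw(struct trace *t, const char *tag)
{
    const char *new_tag;

    assert(t != NULL && tag != NULL);

    t->depth++;
    if (t->depth > t->max_depth) {
        t->max_depth = t->depth;
    }
    if (t->count < MAX_TRACE) {
        t->tags[t->count++] = tag;
    }

    new_tag = rewrite_tag(tag);
    if (new_tag != NULL) {
        /* the emitter appends the new record while we are still
         * inside the append for the original one */
        append_raw(t, new_tag);
    }
    t->depth--;
}
```

Calling `append_raw` with the tag "syslog" on a zero-initialized `struct trace` records the tags syslog, kubelet, klog at nesting depths 1 to 3, which matches the order of the three append frames in the backtrace.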
Note: We never ran into this scenario when we were not using filesystem-based storage.