
[pkg/stanza/fileconsumer] Fix long line parsing #32100

Merged
merged 2 commits on Apr 23, 2024

Conversation

OverOrion
Contributor

@OverOrion OverOrion commented Apr 2, 2024

Description:
Flush could send partial input before EOF was reached; this PR fixes that.

Link to tracking Issue: #31512, #32170

Testing: Added unit test TestFlushPeriodEOF

Documentation: Added a note to force_flush_period option
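For context, `force_flush_period` is configured on the filelog receiver. As a hedged illustration (the path and value here are examples, not recommendations), a collector config might look like:

```yaml
receivers:
  filelog:
    include: [ /var/log/app/*.log ]
    # How long to wait for new data before flushing a partial
    # (unterminated) line downstream. Set to 0 to disable flushing.
    force_flush_period: 500ms
```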

@OverOrion OverOrion requested a review from djaglowski as a code owner April 2, 2024 12:44
@OverOrion OverOrion requested a review from a team April 2, 2024 12:44
@OverOrion OverOrion force-pushed the stanza-fix-scanner branch 2 times, most recently from 9683fd8 to 954249f Compare April 2, 2024 13:11
Member

@djaglowski djaglowski left a comment


Thanks @OverOrion and @ChrsMark for finding this. Unfortunately, I do not believe this implementation works.

The underlying array may point to data that will be overwritten by a subsequent call to Scan.

This certainly seems to be a problem. To articulate the issue a bit more: we do not call Scan again until we emit the token. However, because the token is emitted as a slice that directly references the scanner's buffer, its contents may change later.

Possible solutions then would seem to be:

  1. Copy the token into a new slice to ensure the underlying contents will not change.
  2. Clearly advise emit funcs that they may need to do this, depending on their need for correctness vs performance. (And then handle the copy in our emit funcs)

I think it's fair of us to prioritize correctness but it will be interesting to see the performance impact associated with copying every token. To that point, we will certainly need unit tests for this as well as benchmark results to understand the tradeoff we are introducing.

Longer term, we may want to look at replacing the scanner altogether, as it's not necessarily the most performant solution. It does however provide a high degree of robustness that will be difficult to reproduce.

@OverOrion OverOrion marked this pull request as draft April 4, 2024 07:19
@OverOrion OverOrion force-pushed the stanza-fix-scanner branch 6 times, most recently from 224b323 to 36c8f1d Compare April 5, 2024 14:37
@OverOrion OverOrion requested a review from djaglowski April 5, 2024 14:38
@OverOrion OverOrion marked this pull request as ready for review April 5, 2024 14:39
Member

@djaglowski djaglowski left a comment


Sorry for the delay, finally starting to catch up.

@OverOrion
Contributor Author

Thanks @djaglowski @ChrsMark, will address this today!

Flush should only happen if the scanner reached EOF

Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>
@OverOrion OverOrion force-pushed the stanza-fix-scanner branch from 36c8f1d to c8e274b Compare April 22, 2024 06:55
Member

@djaglowski djaglowski left a comment


LGTM, just correcting the description a bit, since we update the flush timeout whenever new data is found, even if we do not emit a token.

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
@djaglowski djaglowski merged commit ba22e43 into open-telemetry:main Apr 23, 2024
156 checks passed
@github-actions github-actions bot added this to the next release milestone Apr 23, 2024
djaglowski added a commit that referenced this pull request Feb 3, 2025
…37596)

Fixes #35042 (and #32100 again)

The issue affected unterminated logs of particular lengths.
Specifically, longer than our internal `scanner.DefaultBufferSize`
(16kB) and shorter than `max_log_size`.

The failure mode was described in #32100 but was apparently only fixed
in some circumstances. I believe this is a more robust fix. I'll
articulate the exact failure mode again here:
1. During a poll cycle, `reader.ReadToEnd` is called. Within this, a
scanner is created which starts with a default buffer size. The buffer
is filled, but no terminator is found. Therefore the scanner resizes the
buffer to accommodate more data, hoping to find a terminator.
Eventually, the buffer is large enough to contain all content until EOF,
but still no terminator was found. At this time, the flush timer has not
expired, so `reader.ReadToEnd` returns without emitting anything.
2. During the _next_ poll cycle, `reader.ReadToEnd` creates a new
scanner, also with default buffer size. The first time it looks for a
terminator, it of course doesn't find one, but at this time the flush
timer has expired. Therefore, instead of resizing the buffer and
continuing to look for a terminator, it just emits what it has.

What should happen instead is the scanner continues to resize the buffer
to find as much of the unterminated token as possible before emitting
it. Therefore, this fix introduces a simple layer into the split func
stack which allows us to reason about unterminated tokens more
carefully. It captures the length of unterminated tokens and ensures
that when we recreate a scanner, we will start with a buffer size that
is appropriate to read the same content as last time, plus one
additional byte. The extra byte allows us to check if new content has
been added, in which case we will resume resizing. If no new content is
found, the flusher will emit the entire unterminated token as one.
chengchuanpeng pushed a commit to chengchuanpeng/opentelemetry-collector-contrib that referenced this pull request Feb 8, 2025