Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add determinism fuzzers and fix rare determinism bugs #2648

Merged
merged 4 commits into from
May 15, 2021

Conversation

terrelln
Copy link
Contributor

See each commit message for details.

  • Fix determinism bug in the optimal parser
  • Fix off-by-one error in repcode checks (not a determinism problem itself, but inefficient)
  • Fix up the dictionary invalidation logic to update lowLimit when it is too low for the beginning of the block
  • Adds determinism fuzzing to simple_round_trip and dictionary_round_trip

@terrelln
Copy link
Contributor Author

This is a non-trivial change. It isn't terribly complex, but it is not trivial. If we wanted to merge this into 1.5.0 we'd probably need to re-run the long tests and fuzzers.

I don't think it is terribly critical to merge, since these are longstanding bugs. The only one that might impact us in practice is the dictionary invalidation bug, which might be losing us a few percent of speed.

@terrelln terrelln force-pushed the determinism-fuzzer branch from 9d41f0e to 725c5e4 Compare May 14, 2021 00:01
terrelln added 4 commits May 13, 2021 17:05
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.

https://github.com/facebook/zstd/blob/40def70387f99b239f3f566ba399275a8fd44cde/lib/compress/zstd_opt.c#L476

This optimization is based off the length longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.

The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
The repcode checks disallowed repcodes that are equal to `windowLow`.
This is slightly inefficient, but isn't a problem on its own. Together
with the next commit, it cause non-determinism.
Call `ZSTD_enforceMaxDist()` before each block with the beginning of the
block. This ensures that `lowLimit` is updated to `dictLimit` whenever
the ext-dict is out of range, so we can use prefix mode for speed.

This can cause non-determinism because prefix mode and ext-dict mode
match finders can return different results. It can also hurt speed
because ext-dict match finders are slower.

The scenario is:
1. Compress large data with a dictionary.
2. The dictionary goes out of bounds, so we invalidate it.
3. However, we still have `lowLimit < dictLimit`, since it is
   never updated.
4. We will call the ext-dict match finder instead of the prefix one.
Compress the input twice in the `simple_round_trip` and
`dictionary_round_trip` fuzzers with exactly the same parameters, but
reusing the context. Then ensure that the compressed output is
identical.
@Cyan4973
Copy link
Contributor

A few percent of speed, for a rare corner case, doesn't feel critical.

@terrelln terrelln changed the title Add determinism fuzzers and fix bugs Add determinism fuzzers and fix minor determinism bugs May 14, 2021
@terrelln terrelln changed the title Add determinism fuzzers and fix minor determinism bugs Add determinism fuzzers and fix rare determinism bugs May 14, 2021
@terrelln terrelln merged commit accbf0a into facebook:dev May 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants