Reflection-based deterministic message hashing #30761
Conversation
Signed-off-by: Raven Black <ravenblack@dropbox.com>
hash_type(reflection->Get##get_type(message, field)); \
}

#define MAP_SORT_BY(get_type) \
Can you pass in the variable name rather than capturing the free variable, i.e. MAP_SORT_BY(map, get_type)?
Also the map variable at call-sites is really a vector. Call it that? I'm actually not sure we really need to macro-ize the call to std::sort. I think you could do this readably without macros by instead just making a template function that returns the comparator:
std::sort(map.begin(), map.end(), makeCompareFn<get_type>);
I think that will be more readable at call-sites and only slightly more verbose.
I don't think you can template e.g. makeCompareFn<int> down to reflection->GetInt(...), can you? Other than by explicitly writing a specialization for each of the possible values. If we want to go that way I might as well just manually fully expand the macro throughout.
(It is frustrating that the reflection API doesn't have a Get template with specializations that would make this easy!)
I've now done it with a middle-ground template which to me has the worst readability of all possible options. Do you prefer this over the full expansion?
I see how that got awkward fast :) Let me reflect on that (no pun intended).
Restructured it without that template; it's fewer lines of code.
Then separated the sorting out into its own helper class, then made the comparison function also its own helper class so there are no std::function or even lambdas involved any more. Then got rid of the original helper class because the comparer is enough to make it more suitable to just be a function again. :)
Signed-off-by: Raven Black <ravenblack@dropbox.com>
/retest
/retest
/coverage
Coverage for this Pull Request will be rendered here: https://storage.googleapis.com/envoy-pr/30761/coverage/index.html The coverage results are (re-)rendered each time the CI
Signed-off-by: Raven Black <ravenblack@dropbox.com>
The reduction is necessary because the new uncovered lines are unreachable enum paths that cannot be tested, and the existing covered function had its LoC reduced, which also contributed to a coverage percentage downswing.
Signed-off-by: Raven Black <ravenblack@dropbox.com>
CC @envoyproxy/coverage-shephards: FYI only for changes made to test/per_file_coverage.sh
@@ -15,7 +15,7 @@ declare -a KNOWN_LOW_COVERAGE=(
   "source/common/matcher:94.6"
   "source/common/network:94.4" # Flaky, `activateFileEvents`, `startSecureTransport` and `ioctl`, listener_socket do not always report LCOV
   "source/common/network/dns_resolver:91.4" # A few lines of MacOS code not tested in linux scripts. Tested in MacOS scripts
-  "source/common/protobuf:96.5"
+  "source/common/protobuf:96.3"
Can we find a way to keep the existing coverage threshold for this directory?
Not a relevant one. Literally every new line of code that is reachable is covered. The only feasible ways to re-increase the coverage percentage would be to add some nonsense padding lines in covered areas, or to add a test to something completely unrelated to this change.
OR, an alternative I considered: give the helper class a separate constructor that allows constructing it with invalid input fields, expose that constructor only for testing (though it would still be built for production), and use it to add test coverage of the unreachable lines by making them reachable. Pretty sure that's even worse than the other options.
Fair enough. Coverage LGTM
Ended up actually improving the coverage: the final version of deterministic_hash.cc now has 100% coverage (the switch with unreachable paths was excised thanks to jmarantz pointing out that we don't actually have to sort a list in order to hash it ignoring order), and the intermediate/current version of utility.cc still has the old lines covered for the lifespan of a runtime guard.
Signed-off-by: Raven Black <ravenblack@dropbox.com>
Updated to use
/retest
changelogs/current.yaml
The performance of the hash operation is improved by 2-10x depending on the structure of the message,
which is expected to reduce config update time or startup time by 10-25%. The new algorithm is also
used for http_cache_filter hashing, which will effectively cause a one-time cache flush on update
for users with a persistent cache. To revert this behavior set ``envoy.restart_features.use_fast_protobuf_hash`` to false.
How would you feel about making this default to the current behavior?
I would like, when we import this into our mono-repo, to have products run the new mode in staging for a few weeks first before we turn it on.
I would prefer not to. My understanding is that runtime guards for performance improvements are generally supposed to be emergency shutoffs, not optional activates - if you make it opt-in then it doesn't get tested.
Not forever. Just for some amount of time (a quarter?)
@alyssawilk may have more context on an appropriate amount of time a large change should bake before flipping it to the default. It will improve things; it will get tested and turned on.
I just don't want the eventual productionizing of this to catch us by surprise, and we wouldn't be able to disable it prior to importing it.
False by default it is.
Generally when we do false-by-default we assign an owner to hassle when we flip it to true. Josh, as you're looking at testing this, are you OK driving the default flip? Alternately, if this is no higher risk than most flags, should we leave it true by default upstream and import-false? I'm fine either way; I just want to make sure we have a plan on how to move forward.
I think I don't need to take that on -- I like this change a lot, but it's solving a problem our service is not currently suffering from, afaict. Even if we have like a delay till end of January to make sure we have a chance to flip the flag off, I'd be OK.
I think for the service my team runs, we are being particularly paranoid right now, and we'd just want a chance to have this flag imported into our monorepo first, and then turn it off, then it could be turned on in Envoy main. After N months we could flip it on ourselves; we just can't take on that risk right now.
Signed-off-by: Raven Black <ravenblack@dropbox.com>
LGTM but as this is pretty deep stuff I think I'd like to see Adi and Yan both review.
@@ -296,7 +296,7 @@ TEST_F(LookupRequestTest, PragmaNoFallback) {
 TEST(HttpCacheTest, StableHashKey) {
   Key key;
   key.set_host("example.com");
-  ASSERT_EQ(stableHashKey(key), 9582653837550152292u);
+  ASSERT_EQ(stableHashKey(key), 6153940628716543519u);
Why ASSERT_EQ rather than EXPECT_EQ? Usually I use ASSERT_EQ only if, when it fails, that might result in a crash later in the function, like
ASSERT_EQ(5, array.size());
EXPECT_EQ("foo", array[4]);
no big deal though
Oh I see Ryan is already the senior maintainer on it but I'd still like Adi to review first.
Overall LGTM, thanks!
Left a few minor comments.
Signed-off-by: Raven Black <ravenblack@dropbox.com>
Thanks for fixing this!
One small minor comment, otherwise LGTM
Signed-off-by: Raven Black <ravenblack@dropbox.com>
/retest
I think this just needs Senior Maintainer approval. Ryan?
Looks great!
value.set_index(2);
a2.mutable_any()->PackFrom(value);
EXPECT_NE(hash(a1), hash(a2));
}
Excellent tests!
/retest
Signed-off-by: Raven Black <ravenblack@dropbox.com>
/retest
Commit Message: Reflection-based deterministic message hashing
Additional Description: Comparing config used to be done with "deterministic" serialization and a hash of that, but it turned out deterministic serialization was not in fact deterministic enough (Any messages could contain reordered maps or even other fields). The recommendation in the protobuf docs is "if you want a deterministic order do it yourself". (Note that the linked comment also suggests that SetSerializationDeterministic should work for our purposes here, but protocolbuffers/protobuf#5731 is the relevant unaddressed bug about Any fields.) The prior method of achieving determinism, expanding Any via TextFormat serialization, is slow enough that it shows up on performance graphs as a significant cost (about 1/4 of our total startup time). This change is quite a lot of code, but should give us a deterministic hash with a much faster runtime than transforming to and then hashing a human-readable string.
Risk Level: Some; if it's broken it might cause config updates to not apply, or cache entries to collide.
Testing: Many unit tests, and a bit extra in the existing hash test. Benchmark results from bazel run -c opt test/common/protobuf:utility_speed_test:

Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a
Runtime Guard: envoy.restart_features.use_fast_protobuf_hash
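For completeness, one way to flip that guard is a static runtime layer in the bootstrap config. This is a sketch based on Envoy's documented layered-runtime mechanism; the layer name is arbitrary, and how the layer is delivered (static, disk, RTDS) depends on your deployment:

```yaml
layered_runtime:
  layers:
  - name: static_layer_0
    static_layer:
      envoy.restart_features.use_fast_protobuf_hash: false
```

Since this is a restart_features flag, it is read at startup; changing it requires restarting the process rather than a live runtime update.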