Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reflection-based deterministic message hashing #30761

Merged
merged 44 commits into from
Dec 13, 2023
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
9c55848
Reflection-based deterministic message hashing
ravenblackx Nov 7, 2023
e9ca067
Add comment
ravenblackx Nov 7, 2023
44f7ebe
Use GetStringReference
ravenblackx Nov 7, 2023
465d833
Spelling
ravenblackx Nov 7, 2023
0bfb6f1
Debug
ravenblackx Nov 7, 2023
ef99f1e
Empty message hash must be nonzero
ravenblackx Nov 9, 2023
11aba21
Make http_cache use deterministic hash
ravenblackx Nov 9, 2023
a5b4a89
Messages of different types with identical contents should hash diffe…
ravenblackx Nov 9, 2023
6c7d090
Template map sorting
ravenblackx Nov 10, 2023
29532f6
Fewer macros
ravenblackx Nov 10, 2023
aa9e5fd
Type-typo
ravenblackx Nov 10, 2023
9e38e75
Add some tests, fix a bug they found
ravenblackx Nov 10, 2023
edcf363
Last of the tests and another fix
ravenblackx Nov 10, 2023
9b67119
Untemplate
ravenblackx Nov 13, 2023
c172e2d
No lambdas version
ravenblackx Nov 13, 2023
c686b3a
Remove nested struct
ravenblackx Nov 13, 2023
5b4e623
Satisfy Windows inferior "no return value" checks
ravenblackx Nov 13, 2023
cd2d3c1
Panic > bug for invalid enum
ravenblackx Nov 14, 2023
38c7f6d
Reduce coverage threshold
ravenblackx Nov 14, 2023
b6e84b0
Remove one uncovered line :)
ravenblackx Nov 14, 2023
3320ed0
Coverage back up by 0.1% for the one line
ravenblackx Nov 16, 2023
5afe859
Remove most auto, Comparer->Comparator
ravenblackx Nov 17, 2023
da47756
const
ravenblackx Nov 17, 2023
e128ee4
Explanatory comment
ravenblackx Nov 17, 2023
974f4f0
Verbose comment about sorting map fields, plus an ASSERT
ravenblackx Nov 17, 2023
5b0b818
Moar templates
ravenblackx Nov 28, 2023
f5393b6
Runtime guard
ravenblackx Nov 29, 2023
8830f12
Remove unused WithAny test proto
ravenblackx Nov 29, 2023
f308278
Add comment about unknown Any types and MessageDifferencer
ravenblackx Nov 29, 2023
2104c6e
Add comment about type URLs
ravenblackx Nov 29, 2023
234cbc8
Return coverage to 96.5
ravenblackx Nov 29, 2023
7b737bc
Spelling
ravenblackx Nov 29, 2023
c829856
Add benchmark
ravenblackx Nov 30, 2023
ecdd501
More benchmarks
ravenblackx Nov 30, 2023
2757c86
Unorder map hashing without sorting
ravenblackx Dec 4, 2023
3baddeb
Add relnote
ravenblackx Dec 4, 2023
40cefb2
Merge branch 'main' into hash
ravenblackx Dec 5, 2023
fffa18c
Use xxHash64Value
ravenblackx Dec 5, 2023
4255b62
Runtime guard defaulting to false
ravenblackx Dec 8, 2023
23c6aa9
Adi's comments
ravenblackx Dec 11, 2023
7feee62
Add ASSERT for GetPrototype
ravenblackx Dec 11, 2023
f6a44ef
Add tests for Any and repeated message
ravenblackx Dec 11, 2023
4455b88
Merge branch 'main' into hash
ravenblackx Dec 11, 2023
fbed541
Merge branch 'main' into hash
ravenblackx Dec 12, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions changelogs/current.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,13 @@ minor_behavior_changes:
Added new configuration field :ref:`rate_limited_as_resource_exhausted
<envoy_v3_api_field_extensions.filters.http.local_ratelimit.v3.LocalRateLimit.rate_limited_as_resource_exhausted>`
to allow for setting if rate limit grpc response should be RESOURCE_EXHAUSTED instead of the default UNAVAILABLE.
- area: config parsing, http cache filter
change: |
replaces protobuf hashing by human-readable string with a dedicated deterministic hashing algorithm.
The performance of the hash operation is improved by 2-10x depending on the structure of the message,
which is expected to reduce config update time or startup time by 10-25%. The new algorithm is also
used for http_cache_filter hashing, which will effectively cause a one-time cache flush on update
for users with a persistent cache. To revert this behavior set ``envoy.restart_features.use_fast_protobuf_hash`` to false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would you feel about making this default to the current behavior.

I would like, when we import this into our mono-repo, to have products run the new mode in staging for a few weeks first before we turn it on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to. My understanding is that runtime guards for performance improvements are generally supposed to be emergency shutoffs, not optional activates - if you make it opt-in then it doesn't get tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not forever. Just for some amount of time (a quarter?)

@alyssawilk may have more context on an appropriate amount of time a large change should bake before flipping it to the default. It will improve things; it will get tested and turned on.

I just don't want the eventual productionizing of this to catch us by surprise, and we wouldn't be able to disable it prior to importing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False by default it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally when we do false by default we assign an owner to hassle when we flip true. Josh as you're looking at testing this are you OK driving the default flip? Alternately if this is no higher risk than most flags should we leave it as true by default upstream and import-false? I'm fine either way, just want to make sure we have a plan on how to move forward

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I don't need to take that on -- I like this change a lot, but it's solving a problem our service is not currently suffering from, afaict. Even if we have like a delay till end of January to make sure we have a chance to flip the flag off, I'd be OK.

I think for the service my team runs, we are being particularly paranoid right now, and we'd just want a chance to have this flag imported into our monorepo first, and then turn it off, then it could be turned on in Envoy main. After N months we could flip it on ourselves; we just can't take on that risk right now.

- area: filter state
change: |
Added config name of filter sending a local reply in filter state with key
Expand Down
12 changes: 12 additions & 0 deletions source/common/protobuf/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -199,13 +199,25 @@ envoy_cc_library(
]),
)

envoy_cc_library(
name = "deterministic_hash_lib",
srcs = ["deterministic_hash.cc"],
hdrs = ["deterministic_hash.h"],
deps = [
":protobuf",
"//source/common/common:assert_lib",
"//source/common/common:hash_lib",
],
)

envoy_cc_library(
name = "utility_lib",
srcs = ["utility.cc"],
external_deps = [
"protobuf",
],
deps = [
":deterministic_hash_lib",
":message_validator_lib",
":protobuf",
":utility_lib_header",
Expand Down
228 changes: 228 additions & 0 deletions source/common/protobuf/deterministic_hash.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,228 @@
#if defined(ENVOY_ENABLE_FULL_PROTOS)
#include "source/common/protobuf/deterministic_hash.h"

#include "source/common/common/assert.h"
#include "source/common/common/hash.h"

namespace Envoy {
namespace DeterministicProtoHash {
namespace {

// Get a scalar field from protobuf reflection field definition. The return
// type must be specified by the caller. Every implementation is a specialization
// because the reflection interface did separate named functions instead of a
// template.
template <typename T>
T reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field);

template <>
uint32_t reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetUInt32(message, &field);
}

template <>
int32_t reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetInt32(message, &field);
}

template <>
uint64_t reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetUInt64(message, &field);
}

template <>
int64_t reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetInt64(message, &field);
}

template <>
float reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetFloat(message, &field);
}

template <>
double reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetDouble(message, &field);
}

template <>
bool reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetBool(message, &field);
}

// Takes a field of scalar type, which may be a repeated field, and hashes
// it (or each of it).
template <typename T, std::enable_if_t<std::is_scalar_v<T>, bool> = true>
uint64_t hashScalarField(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field, uint64_t seed) {
if (field.is_repeated()) {
for (const T& scalar : reflection.GetRepeatedFieldRef<T>(message, &field)) {
seed = HashUtil::xxHash64Value(scalar, seed);
}
} else {
seed = HashUtil::xxHash64Value(reflectionGet<T>(reflection, message, field), seed);
}
return seed;
}

uint64_t reflectionHashMessage(const Protobuf::Message& message, uint64_t seed = 0);
uint64_t reflectionHashField(const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field, uint64_t seed);

// To make a map serialize deterministically we need to ignore the order of
// the map fields. To do that, we simply combine the hashes of each entry
// using an unordered operator (addition), and then apply that combined hash to
// the seed.
uint64_t reflectionHashMapField(const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field, uint64_t seed) {
const Protobuf::Reflection& reflection = *message.GetReflection();
ASSERT(field.is_map());
const auto& entries = reflection.GetRepeatedFieldRef<Protobuf::Message>(message, &field);
ASSERT(!entries.empty());
const Protobuf::Descriptor& map_descriptor = *entries.begin()->GetDescriptor();
const Protobuf::FieldDescriptor& key_field = *map_descriptor.map_key();
const Protobuf::FieldDescriptor& value_field = *map_descriptor.map_value();
uint64_t combined_hash = 0;
for (const Protobuf::Message& entry : entries) {
uint64_t entry_hash = reflectionHashField(entry, key_field, 0);
entry_hash = reflectionHashField(entry, value_field, entry_hash);
combined_hash += entry_hash;
}
return HashUtil::xxHash64Value(combined_hash, seed);
}

uint64_t reflectionHashField(const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field, uint64_t seed) {
using Protobuf::FieldDescriptor;
const Protobuf::Reflection& reflection = *message.GetReflection();
seed = HashUtil::xxHash64Value(field.number(), seed);
switch (field.cpp_type()) {
case FieldDescriptor::CPPTYPE_INT32:
seed = hashScalarField<int32_t>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_UINT32:
seed = hashScalarField<uint32_t>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_INT64:
seed = hashScalarField<int64_t>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_UINT64:
seed = hashScalarField<uint64_t>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_DOUBLE:
seed = hashScalarField<double>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_FLOAT:
seed = hashScalarField<float>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_BOOL:
seed = hashScalarField<bool>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_ENUM:
if (field.is_repeated()) {
int c = reflection.FieldSize(message, &field);
for (int i = 0; i < c; i++) {
seed = HashUtil::xxHash64Value(reflection.GetRepeatedEnumValue(message, &field, i), seed);
}
} else {
seed = HashUtil::xxHash64Value(reflection.GetEnumValue(message, &field), seed);
}
break;
case FieldDescriptor::CPPTYPE_STRING: {
if (field.is_repeated()) {
for (const std::string& str : reflection.GetRepeatedFieldRef<std::string>(message, &field)) {
seed = HashUtil::xxHash64(str, seed);
}
} else {
std::string scratch;
seed = HashUtil::xxHash64(reflection.GetStringReference(message, &field, &scratch), seed);
}
} break;
case FieldDescriptor::CPPTYPE_MESSAGE:
if (field.is_map()) {
seed = reflectionHashMapField(message, field, seed);
} else if (field.is_repeated()) {
for (const Protobuf::Message& submsg :
reflection.GetRepeatedFieldRef<Protobuf::Message>(message, &field)) {
seed = reflectionHashMessage(submsg, seed);
}
} else {
seed = reflectionHashMessage(reflection.GetMessage(message, &field), seed);
}
break;
}
return seed;
}

// Converts from type urls OR descriptor full names to descriptor full names.
// Type urls are as used in envoy yaml config, e.g.
// "type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig"
// becomes
// "envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig"
absl::string_view typeUrlToDescriptorFullName(absl::string_view url) {
const size_t pos = url.rfind('/');
if (pos != absl::string_view::npos) {
return url.substr(pos + 1);
}
return url;
}

std::unique_ptr<Protobuf::Message> unpackAnyForReflection(const ProtobufWkt::Any& any) {
const Protobuf::Descriptor* descriptor =
Protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName(
typeUrlToDescriptorFullName(any.type_url()));
// If the type name refers to an unknown type, we treat it the same as other
// unknown fields - not including its contents in the hash.
if (descriptor == nullptr) {
return nullptr;
}
const Protobuf::Message* prototype =
Protobuf::MessageFactory::generated_factory()->GetPrototype(descriptor);
std::unique_ptr<Protobuf::Message> msg(prototype->New());
any.UnpackTo(msg.get());
return msg;
}

// This is intentionally ignoring unknown fields.
uint64_t reflectionHashMessage(const Protobuf::Message& message, uint64_t seed) {
using Protobuf::FieldDescriptor;
std::string scratch;
const Protobuf::Reflection* reflection = message.GetReflection();
const Protobuf::Descriptor* descriptor = message.GetDescriptor();
seed = HashUtil::xxHash64(descriptor->full_name(), seed);
if (descriptor->well_known_type() == Protobuf::Descriptor::WELLKNOWNTYPE_ANY) {
const ProtobufWkt::Any* any = Protobuf::DynamicCastToGenerated<ProtobufWkt::Any>(&message);
std::unique_ptr<Protobuf::Message> submsg = unpackAnyForReflection(*any);
if (submsg == nullptr) {
// If we wanted to handle unknown types in Any, this is where we'd have to do it.
return seed;
}
return reflectionHashMessage(*submsg, seed);
}
std::vector<const FieldDescriptor*> fields;
reflection->ListFields(message, &fields);
// If we wanted to handle unknown fields, we'd need to also GetUnknownFields here.
for (const FieldDescriptor* field : fields) {
seed = reflectionHashField(message, *field, seed);
}
// Hash one extra character to signify end of message, so that
// msg{} field2=2
// hashes differently from
// msg{field2=2}
return HashUtil::xxHash64("\x17", seed);
}
} // namespace

uint64_t hash(const Protobuf::Message& message) { return reflectionHashMessage(message, 0); }

} // namespace DeterministicProtoHash
} // namespace Envoy
#endif
26 changes: 26 additions & 0 deletions source/common/protobuf/deterministic_hash.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#pragma once

#include "source/common/protobuf/protobuf.h"

#if defined(ENVOY_ENABLE_FULL_PROTOS)
namespace Envoy {
namespace DeterministicProtoHash {

// Note: this ignores unknown fields and unrecognized types in Any fields.
// An alternative approach might treat such fields as "raw data" and include
// them in the hash, which would risk breaking the deterministic behavior,
// versus this way risks ignoring significant data.
//
// Ignoring unknown fields was chosen as the implementation because the
// TextFormat-based hashing this replaces was explicitly ignoring unknown
// fields.
//
// If this is used as part of making a hash table, it may result in
// collisions if unknown fields are present and are not ignored by the
// corresponding comparator. A `MessageDifferencer` can be configured to
// ignore unknown fields, or not to.
uint64_t hash(const Protobuf::Message& message);

} // namespace DeterministicProtoHash
} // namespace Envoy
#endif
13 changes: 7 additions & 6 deletions source/common/protobuf/utility.cc
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
#include "source/common/common/assert.h"
#include "source/common/common/documentation_url.h"
#include "source/common/common/fmt.h"
#include "source/common/protobuf/deterministic_hash.h"
#include "source/common/protobuf/message_validator_impl.h"
#include "source/common/protobuf/protobuf.h"
#include "source/common/protobuf/visitor.h"
Expand Down Expand Up @@ -129,22 +130,22 @@ void ProtoExceptionUtil::throwProtoValidationException(const std::string& valida
}

size_t MessageUtil::hash(const Protobuf::Message& message) {
std::string text_format;

#if defined(ENVOY_ENABLE_FULL_PROTOS)
{
if (Runtime::runtimeFeatureEnabled("envoy.restart_features.use_fast_protobuf_hash")) {
return DeterministicProtoHash::hash(message);
} else {
std::string text_format;
Protobuf::TextFormat::Printer printer;
printer.SetExpandAny(true);
printer.SetUseFieldNumber(true);
printer.SetSingleLineMode(true);
printer.SetHideUnknownFields(true);
printer.PrintToString(message, &text_format);
return HashUtil::xxHash64(text_format);
}
#else
absl::StrAppend(&text_format, message.SerializeAsString());
return HashUtil::xxHash64(message.SerializeAsString());
#endif

return HashUtil::xxHash64(text_format);
}

#if !defined(ENVOY_ENABLE_FULL_PROTOS)
Expand Down
1 change: 1 addition & 0 deletions source/common/runtime/runtime_features.cc
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,7 @@ RUNTIME_GUARD(envoy_reloadable_features_validate_grpc_header_before_log_grpc_sta
RUNTIME_GUARD(envoy_reloadable_features_validate_upstream_headers);
RUNTIME_GUARD(envoy_restart_features_send_goaway_for_premature_rst_streams);
RUNTIME_GUARD(envoy_restart_features_udp_read_normalize_addresses);
RUNTIME_GUARD(envoy_restart_features_use_fast_protobuf_hash);

// Begin false flags. These should come with a TODO to flip true.
// Sentinel and test flag.
Expand Down
2 changes: 1 addition & 1 deletion source/extensions/filters/http/cache/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ envoy_cc_library(
"//source/common/common:assert_lib",
"//source/common/http:header_utility_lib",
"//source/common/http:headers_lib",
"//source/common/protobuf:utility_lib",
"//source/common/protobuf:deterministic_hash_lib",
"@envoy_api//envoy/extensions/filters/http/cache/v3:pkg_cc_proto",
],
)
Expand Down
10 changes: 8 additions & 2 deletions source/extensions/filters/http/cache/http_cache.cc
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
#include "source/common/http/header_utility.h"
#include "source/common/http/headers.h"
#include "source/common/http/utility.h"
#include "source/common/protobuf/utility.h"
#include "source/common/protobuf/deterministic_hash.h"
#include "source/extensions/filters/http/cache/cache_custom_headers.h"
#include "source/extensions/filters/http/cache/cache_headers_utils.h"

Expand Down Expand Up @@ -54,7 +54,13 @@ LookupRequest::LookupRequest(const Http::RequestHeaderMap& request_headers, Syst
// Unless this API is still alpha, calls to stableHashKey() must always return
// the same result, or a way must be provided to deal with a complete cache
// flush.
size_t stableHashKey(const Key& key) { return MessageUtil::hash(key); }
size_t stableHashKey(const Key& key) {
if (Runtime::runtimeFeatureEnabled("envoy.restart_features.use_fast_protobuf_hash")) {
return DeterministicProtoHash::hash(key);
} else {
return MessageUtil::hash(key);
}
}

void LookupRequest::initializeRequestCacheControl(const Http::RequestHeaderMap& request_headers) {
const absl::string_view cache_control =
Expand Down
2 changes: 0 additions & 2 deletions source/extensions/filters/http/cache/http_cache.h
Original file line number Diff line number Diff line change
Expand Up @@ -69,8 +69,6 @@ using LookupResultPtr = std::unique_ptr<LookupResult>;
//
// When providing a cached response, Caches must ensure that the keys (and not
// just their hashes) match.
//
// TODO(toddmgreer): Ensure that stability guarantees above are accurate.
size_t stableHashKey(const Key& key);

// LookupRequest holds everything about a request that's needed to look for a
Expand Down
Loading