-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dns: configuring a basic key value store to persist to disk #17745
Conversation
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
/wait on CI |
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this overall feature make sense as part of the DNS extensibility being worked on in #17479?
I.e. we add a caching DNS resolver extension (that maybe wraps another DNS resolver like c-ares or platform specific)?
I think this would potentially be a nice solution to think about it. FWIW we had discussed something similar today for wrapping DNS in order to get DNS specific analytic events. cc @snowp |
I'm not sure what the ask is for this PR. I don't want this tied to DNS but I think it'd be reasonable to make sure the DNS refactor allows for all the platform resolvers to use the cache. |
I'm suggesting the DNS resolver itself is a cache. |
i.e. that we build DNS caching inside the DNS resolver framework. I'm not sure why we want to reuse the existing data plane caching framework for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small questions to get started...
// | ||
// All keys and values are flushed to a single file as | ||
// [length]\n[key][length]\n[value] | ||
class FileBasedKeyValueStore : public KeyValueStoreBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit surprised to see that the interface here is Envoy::KeyValueStore but the concrete class is Envoy::Extensions::Cache::KeyValueCache. I had expected them to both live in the same namespace, roughly. But maybe this is a typical way to lay things out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, per offline conversation with harvey I moved all the things to a location he prefered, so now it's KeyValue and then KeyValueStore which is at least a substring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI there's an outstanding issue with both clang tidy and docs build, but API should be ready for review. PTAL.
// | ||
// All keys and values are flushed to a single file as | ||
// [length]\n[key][length]\n[value] | ||
class FileBasedKeyValueStore : public KeyValueStoreBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, per offline conversation with harvey I moved all the things to a location he prefered, so now it's KeyValue and then KeyValueStore which is at least a substring?
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API looks good modulo comments.
|
||
// [#not-implemented-hide:] | ||
// Configuration to flush the DNS cache to long term storage. | ||
key_value.v3.KeyValueStoreConfig persistent_cache_config = 13; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit on naming: given this is referencing an extension point, do we know that there will always be persistence guaranteed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I can't think of a use case where we'd add a key-value store and not persist it. We already have a DNS cache so the point of this is to persist. That said, just because I can't think of a use case it certainly doesn't hurt to future-proof :-)
|
||
// The interval at which the key value store should be flushed to long term | ||
// storage. | ||
google.protobuf.Duration flush_interval = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this gets to my earlier question, do we know that all implementations will flush? Or will flushing be a property of a specific key-value story, e.g. the FileBasedKeyValueStoreConfig
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto - current uses all involve persisting but can't hurt to move it to file based and just dup the single value for mobile based.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at docs pr - must have failed a string somewhere
|
||
// [#not-implemented-hide:] | ||
// Configuration to flush the DNS cache to long term storage. | ||
key_value.v3.KeyValueStoreConfig persistent_cache_config = 13; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I can't think of a use case where we'd add a key-value store and not persist it. We already have a DNS cache so the point of this is to persist. That said, just because I can't think of a use case it certainly doesn't hurt to future-proof :-)
|
||
// The interval at which the key value store should be flushed to long term | ||
// storage. | ||
google.protobuf.Duration flush_interval = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto - current uses all involve persisting but can't hurt to move it to file based and just dup the single value for mobile based.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@@ -0,0 +1,27 @@ | |||
syntax = "proto3"; | |||
|
|||
package envoy.extensions.common.key_value.file_based.v3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this should be in api/envoy/extensions/key_value/filed_based/v3/config.proto
and same package namespace.
The rationale is that the KeyValueConfig
is a common extension point declaration reused across protos. The FileBasedKkeyValueStoreConfig
is a key_value
extension, and all extension classes have a root like api/envoy/extensions/<extension class>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm api
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
argh sorry, force pushed format fixes. Bad @alyssawilk no biscuit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks pretty good, just a few nits and a question on the interaction between cache load and DNS resolution
virtual KeyValueStorePtr createStore(const Protobuf::Message& config, | ||
ProtobufMessage::ValidationVisitor& validation_visitor, | ||
Event::Dispatcher& dispatcher, | ||
Filesystem::Instance& file_system) PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just pass ServerFactoryContext
instead of specifically passing the file system? Might future proof against some other use cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we can't - the DNS store is created both from ServerFactoryContext and from the cluster's TransportSocketFactoryContextImpl which have wide overlap but no common base class.
I have that filed in away in my clean up list but I think resolving it is out of scope for this PR
// TypedFactory | ||
ProtobufTypes::MessagePtr createEmptyConfigProto() override { | ||
return ProtobufTypes::MessagePtr{ | ||
new envoy::extensions::cache::key_value_cache::v3::FileBasedKeyValueCacheConfig()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make_unique
// TODO(alyssawilk) change finishResolve to actually use the TTL rather than | ||
// putting 0 here, return the remaining TTL or indicate the result is stale. | ||
response.emplace_back(Network::DnsResponse(address, std::chrono::seconds(0) /* ttl */)); | ||
startCacheLoad(key, address->ip()->port()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not that familiar with this code, but it looks like this will kick off a DNS query? And then we immediately resolve it?
If so I wonder if this will cause unforeseen consequences, like having entries being resolved twice (once from cache and once from DNS) and confusing stats (e.g. stat increment for DNS resolution and cache load for same entry)
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it works today (see integration test) but I agree it's not great, hence the TODO :-)
What I want to have happen is be able to differentiate stale cached results from fresh. I do want to immediately kick off a resolve, but also have stale data in there, config for how long we wait for fresh data before we use stale data. I plan to do that in the follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, would a DNS resolution then overwrite this value once it arrives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, fresh resolve always overwrites older data (if it's different)
in this case it'd be the same so not trigger any changes.
MockKeyValueStoreFactory factory; | ||
EXPECT_CALL(factory, createEmptyConfigProto()).WillRepeatedly(Invoke([]() { | ||
return std::make_unique< | ||
envoy::extensions::key_value::file_based::v3::FileBasedKeyValueStoreConfig>(); | ||
})); | ||
MockKeyValueStore* store{}; | ||
EXPECT_CALL(factory, createStore(_, _, _, _)).WillOnce(Invoke([&store]() { | ||
auto ret = std::make_unique<NiceMock<MockKeyValueStore>>(); | ||
store = ret.get(); | ||
// Make sure there's an attempt to load from the key value store. | ||
EXPECT_CALL(*store, iterate); | ||
return ret; | ||
})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be implemented on the MockKeyValueStoreFactory itself? Having them in the test case here is a bit distracting from what the test is actually testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe? Honestly I regularly get bit by expectations in mock timer where I don't understand what's going on, so I'm hesitant to add complexity like that to the mock base classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure not a big deal, just what I usually see done with mocks. Agree that MockTimer is especially confusing :)
InSequence s; | ||
ASSERT(store != nullptr); | ||
|
||
// Make sure the store gets the first insert. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit hard to follow how the below code block verifies anything with the store, maybe make that clearer with comments? (I think it's L1067 but there is a lot of boiler plate here)
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@davinci26 @wrowe any ideas on the windows failure? I've commented out with TODO to unblock this but I'm wondering if you have ideas what went wrong. For example are there known gotchas with file paths on windows CI? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG! We definitely need to figure out the Windows thing, but I'm okay with doing that in a follow up
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk I am not sure, I will have to debug locally but it is super weird because the same test passes on windows with clang. |
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
coverage flake unrelated: from a combination of quic refactor and setec patches |
* main: config: fix dfp config validation (envoyproxy#17835) docs: updating where meetings are uploaded (envoyproxy#17832) h2: moving a comment (envoyproxy#17846) quiche: early fail listener config with both quic and connection_balencer (envoyproxy#17834) dns: configuring a basic key value store to persist to disk (envoyproxy#17745) quic: fix receiving STOP_SENDING (envoyproxy#17815) tooling: Add Github release manager (envoyproxy#17741) tooling: Use upstream pytest-patches (envoyproxy#17809) Remove `hidden_envoy_deprecated_use_http2` (envoyproxy#17805) kafka: produce request for mesh-filter (envoyproxy#17818) Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Clean up inspired by #17745 Risk Level: low (interface refactor) Testing: n/a Docs Changes: n/a Release Notes: n/a Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Risk Level: low (config guarded)
Testing: new unit, integration tests
Docs Changes: n/a (hidden)
Release Notes: n/a
Part of envoyproxy/envoy-mobile#1587