-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try out random DNS resolved endpoints in case of a failure. #3559
Conversation
If resolver returns multiple IP and if last endpoint is in the list and status is success then use last endpoint otherwise randomly select another endpoint from the list. FIXES 3494
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one micro-nit. 👍
src/test/app/DNS_test.cpp
Outdated
BEAST_DEFINE_TESTSUITE_MANUAL_PRIO(DNS, ripple_data, ripple, 20); | ||
|
||
} // namespace test | ||
} // namespace ripple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a trailing newline here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 However, I left a refactoring of the onResolve
function for consideration.
resolver_type::iterator end; | ||
std::vector< | ||
std::reference_wrapper<resolver_type::iterator::value_type const>> | ||
entries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why this is written this way. It's a straight forward way to choose an random endpoint that was not previously failing. However, this code creates a vector of entries to randomly choose from, even when we won't be choosing from the vector. This will allocate memory, which we try to avoid. Even when we will be making a random choice, I don't think we need to create a vector to do so. What do you think about this refactoring?
template <class Impl>
void
WorkBase<Impl>::onResolve(error_code const& ec, resolver_type::iterator it)
{
if (ec)
return fail(ec);
lastEndpoint_ = [&]() -> endpoint_type {
// Use last endpoint if it is successfully connected and is in the list,
// otherwise pick a random endpoint from the list. If there is only one
// endpoint and it is the last endpoint then use the last endpoint.
resolver_type::iterator end;
auto const foundIt = std::find(it, end, lastEndpoint_);
if (lastStatus_ && foundIt != end)
return lastEndpoint_;
switch (auto const n = std::distance(it, end); n)
{
case 0:
return lastEndpoint_;
case 1:
return *it;
default: {
// need to choose a random element from the collection that does
// not equal `lastEndpoint_`` if `lastEndpoint_` is part of the
// collection.
if (foundIt == end)
return *std::next(it, rand_int(n - 1));
// lastEndpoint_ is part of the collection
// Pick a random number from the n-1 valid choices, if we use
// this as an index, note the last element will never be chosen
// and the `lastEndpoint_` index may be chosen. So when the
// `lastEndpoint_` index is chosen, that is treated as if the
// last element was chosen.
auto randIndex = (n > 2) ? rand_int(n - 2) : 0;
auto const foundIndex = std::distance(it, foundIt);
if (randIndex == foundIndex)
randIndex = n - 1;
return *std::next(it, randIndex);
}
}
}();
socket_.async_connect(
lastEndpoint_,
strand_.wrap(std::bind(
&Impl::onConnect,
impl().shared_from_this(),
std::placeholders::_1)));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, when I saw the vector
being populated and usually not used I had concerns similar to those expressed by @seelabs. I'd prefer an approach which either does not use the vector
or only populates the vector
if it will actually use the contents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion. I updated with a slightly different implementation but it's pretty much the same idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I left a few comments, but I don't think any of them should block merging.
@@ -182,8 +195,30 @@ WorkBase<Impl>::onResolve(error_code const& ec, resolver_type::iterator it) | |||
if (ec) | |||
return fail(ec); | |||
|
|||
resolver_type::iterator end; | |||
std::vector< | |||
std::reference_wrapper<resolver_type::iterator::value_type const>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the Boost 1.71 documentation says that the ip::tcp::resolver::iterator
is deprecated.
https://www.boost.org/doc/libs/1_71_0/doc/html/boost_asio/reference/ip__tcp/resolver.html
And the usage, at least from here, looks really weird. A default constructed resolver::iterator
acts like an end()
iterator? At least that's how it's being used here. Are there any alternatives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
@@ -315,9 +324,9 @@ ValidatorSite::onTimer(std::size_t siteIdx, error_code const& ec) | |||
if (ec) | |||
{ | |||
// Restart the timer if any errors are encountered, unless the error | |||
// is from the wait operating being aborted due to a shutdown request. | |||
// is from the wait operation being aborted due to a shutdown request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing and fixing the comment.
src/test/app/DNS_test.cpp
Outdated
{ | ||
parse(); | ||
// First endpoint is random. Next three | ||
// hould resolve to the same endpoint. Run a few times |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hould -> should?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Fixed.
src/test/app/DNS_test.cpp
Outdated
for (int i = 0; i < 4; ++i) | ||
makeRequest(lastEndpoint_, false); | ||
// Should have more than one but some endpoints can repeat since | ||
// selected at random. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the test repeatedly and I feel like I have to agree with your conclusion. However I don't understand why. If the endpoint selection is really random it seems like occasionally all of the random numbers would be the same. With 4 possible endpoints being selected randomly 5 times I think you get a probability of 4:1024 (or 1:256) that you'd get the same value all five times.
Still, I've run the test over 4096 times and it has not failed. Why is that? You should put that explanation in the comments or else someone else will be just as puzzled as me when they maintain the test. If it's an artifact of the pseudo random number generator, then the test is fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments. The gist of this is that on failure we are not going to select the last end point. So it's not really random - we exclude the failed endpoint from the list.
resolver_type::iterator end; | ||
std::vector< | ||
std::reference_wrapper<resolver_type::iterator::value_type const>> | ||
entries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, when I saw the vector
being populated and usually not used I had concerns similar to those expressed by @seelabs. I'd prefer an approach which either does not use the vector
or only populates the vector
if it will actually use the contents.
Change async_resolve handler arg type to results_type. Change random endpoint selection algorithm - remove vector. Clarify expected test result outcome in the unit test.
// previous or next endpoint. | ||
auto r = rand_int(results.size() - 1); | ||
return *std::next( | ||
results.begin(), r != foundIndex ? r : (r > 0 ? --r : ++r)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a biased sample. The foundIndex
's alternate element has two chances of being selected, while every other element has only one chance of being selected. Consider make the r
not include the last element. If foundIndex
is selected, then select the last element instead. This way every element has the same probability of being selected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Nice job on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM!
If the resolver response includes a single IP, then we can just use it normally: attempt to connect to that IP and retrieve the list. Store the IP address and the success/failure status in the object and produce a log entry.
If the resolver response includes multiple IPs we need to use more advanced logic. Resolve the hostname again and:
If the previous attempt was successful and if the previously used IP is still valid use that, following the "single IP" logic above.
If this isn't the first attempt and the previous attempt failed, remove the failed IP from the list of resolved addresses if it still exists. Select an IP at random from the remaining choices and follow the "single IP" logic above.
FIXES #3494