http: use a request queue in the http2 conn pool #4917

Merged (28 commits), Nov 8, 2018

Conversation

@snowp (Contributor) commented Oct 30, 2018

Modifies the http2 connection pool to use a request queue similar to
how the http/1.1 connection pool does. This is done in order to defer
the connection pool callbacks until the connection has been established
instead of simply invoking the callback immediately.

Signed-off-by: Snow Pettersen <snowp@squareup.com>

Risk Level: High
Testing: UTs for new conn pool behavior
Docs Changes: n/a
Release Notes: n/a
Part of #4903
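
For readers skimming the thread, here is a minimal standalone sketch of the pattern this PR introduces (all names below are invented for illustration; the real change lives in ConnPoolImpl and uses PendingRequest, onPoolReady, and onUpstreamReady): requests that arrive before the connection is up are parked in a queue and only turned into codec streams once the connection signals ready.

#include <functional>
#include <list>
#include <utility>

// Toy model of the queueing behavior (illustrative only, not Envoy's API).
class ToyH2Pool {
public:
  using ReadyCb = std::function<void()>;

  // Request a new stream. Before the connection is established the callback is
  // deferred instead of being invoked immediately.
  void newStream(ReadyCb on_ready) {
    if (connected_) {
      on_ready(); // Analogous to onPoolReady(): create the codec stream now.
      return;
    }
    pending_.push_back(std::move(on_ready)); // Analogous to a PendingRequest.
  }

  // Invoked from the connection's Connected event; drains the queue.
  void onUpstreamReady() {
    connected_ = true;
    while (!pending_.empty()) {
      pending_.back()(); // Codec streams are only created at this point.
      pending_.pop_back();
    }
  }

  // A reset that arrives while a request is still queued simply cancels it, so
  // the upstream never sees a stream for it (relevant to the test changes below).
  void cancelPending() { pending_.clear(); }

private:
  bool connected_{false};
  std::list<ReadyCb> pending_;
};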

@snowp (author) commented Oct 30, 2018

NOTE: not done; test coverage is lacking and not all integration tests are passing, but I'm putting this up to get feedback on the high-level approach.

@@ -308,15 +308,15 @@ TEST_P(Http2UpstreamIntegrationTest, UpstreamConnectionCloseWithManyStreams) {
responses.push_back(std::move(encoder_decoder.second));
// Reset a few streams to test how reset and watermark interact.
if (i % 15 == 0) {
codec_client_->sendReset(*encoders[i]);
// codec_client_->sendReset(*encoders[i]);
snowp (author):

Resetting a stream causes waitForEndStream to hang; not sure why this is happening yet.

@@ -132,6 +146,21 @@ void ConnPoolImpl::onConnectionEvent(ActiveClient& client, Network::ConnectionEv
if (client.connect_timer_) {
host_->cluster().stats().upstream_cx_connect_fail_.inc();
host_->stats().cx_connect_fail_.inc();

// Raw connect failures should never happen under normal circumstances. If we have an upstream
snowp (author):

This bit is copied verbatim from the http1 conn pool. Should probably share that code somehow

Member:

Yes, optimally we would share code here. I would at least add a clear TODO on what could be shared.

@mattklein123 self-assigned this Oct 30, 2018
Snow Pettersen added 3 commits October 30, 2018 13:44
@mattklein123 (Member) left a comment

At a high level this makes sense to me. LMK when you are ready for a more formal review. Thanks for working on this!

host_->cluster().resourceManager(priority_).requests().inc();
callbacks.onPoolReady(primary_client_->client_->newStream(response_decoder),
primary_client_->real_host_description_);
// If the primary client is not connected yet, queue up the request.
Member:

We should use the pending request circuit breaker here. This is a nice cleanup that also brings parity between H2 and H1 IMO.
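
As a rough sketch of what gating the queue on a pending-request limit could look like (simplified, with invented names; Envoy's actual check goes through the cluster's resource manager and its overflow stats):

#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Before parking a request in the pending queue, consult a pending-request
// limit; on overflow, fail the request immediately instead of queueing it,
// mirroring what the HTTP/1.1 pool already does.
struct PendingRequestLimit {
  uint64_t max;
  uint64_t active{0};
  bool canCreate() const { return active < max; }
};

bool enqueueOrOverflow(std::deque<std::function<void()>>& pending_requests,
                       PendingRequestLimit& limit,
                       std::function<void()> on_ready,
                       std::function<void()> on_overflow) {
  if (!limit.canCreate()) {
    on_overflow(); // Analogous to onPoolFailure(Overflow) plus an overflow counter.
    return false;
  }
  ++limit.active;
  pending_requests.push_back(std::move(on_ready));
  return true;
}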

Member:

Ping on this. Please make sure this has test coverage also.

snowp (author):

Added, with test coverage

@@ -66,12 +66,29 @@ class ConnPoolImpl : Logger::Loggable<Logger::Id::pool>, public ConnectionPool::
Upstream::HostDescriptionConstSharedPtr real_host_description_;
uint64_t total_streams_{};
Event::TimerPtr connect_timer_;
Event::TimerPtr upstream_ready_timer_;
Member:

Refresh my memory of why this is needed? I remember this was added to HTTP/1 to fix some issue which I can't recall now. I think @lizan did this. Can you track down why this was done and add comments to both places?

@snowp (author), Oct 31, 2018:

I somewhat cargo-culted this from the http1 implementation, but it seems like it was done in order to avoid requests being assigned to a connection that's already been closed: #2715

I think this would apply to http/2 as well? I'll spend some time thinking about it

Member:

Yes that sounds right. I would just look at the convo in the related PR. If you could add some comments for future readers that would be great.

snowp (author):

After looking into the issue I don't think we need this for h/2, unless we're already at risk without this PR. The semantics of how we handle a connection being closed/reset aren't changing; we're just changing when streams get created on the codec. The issue also suggests this was due to an h/1.1-specific problem, though I'm not sure exactly what property of h/1.1 causes it. Maybe the different close semantics vs h/2?

I'll try without using a timer. Hopefully we'll be able to add test coverage that can add confidence around this.

Member:

I can look at this more tomorrow. For now I would go ahead and remove the timer and we can then think about whether it is necessary.

@lizan (Member), Oct 31, 2018:

No, you don't need this timer once the connection is established; the timer is not used in that case in the H1 conn pool either.

It is needed in the H1 conn pool to schedule another request onto an existing idle connection: that connection might already have been closed by the upstream, so we wait one event cycle. In H2, streams are multiplexed onto one connection, so the problem doesn't exist. You can call onUpstreamReady directly.
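
A toy contrast of the two behaviors described above (invented names; the one-event-cycle deferral in the H1 pool is what upstream_ready_timer_ achieves):

#include <deque>
#include <functional>
#include <utility>

// Minimal stand-in for a dispatcher: callbacks posted here run on the next
// loop iteration, after any already-queued socket events (e.g. a remote close).
struct ToyDispatcher {
  std::deque<std::function<void()>> next_cycle;
  void post(std::function<void()> cb) { next_cycle.push_back(std::move(cb)); }
  void runOneCycle() {
    std::deque<std::function<void()>> batch;
    batch.swap(next_cycle);
    for (auto& cb : batch) {
      cb();
    }
  }
};

// H1: attaching a queued request to an *existing idle* connection is deferred
// by one cycle so that an upstream close already sitting in the event queue is
// observed first.
void http1OnIdleConnAvailable(ToyDispatcher& dispatcher, std::function<void()> attach) {
  dispatcher.post(std::move(attach));
}

// H2: all streams share the single connection that just signaled Connected,
// so the pending queue can be drained inline.
void http2OnConnected(std::function<void()> drain_pending_queue) {
  drain_pending_queue();
}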

snowp (author):

Ah, thanks for confirming. There's no similar concern about the primary client's connection being reset immediately after newStream is called?

Member:

There's no similar concern about the primary client's connection being reset immediately after newStream is called?

Can you elaborate a bit more? Connection reset after newStream can happen but the timer cannot save that case either.

snowp (author):

I think I misunderstood slightly and thought that the timer was meant to help with connections being dropped mid-request. Your explanation makes sense to me now.

@snowp (author) commented Oct 31, 2018

Looking into my conn reset issue, the problem seems to be that with this change, when a stream is reset before the connection has been established, newStream will never be called on the fake connection. This means waitForNewStream cannot be used to get a handle on streams that get reset.

Tests like the grpc json transcoder seem to use this to get access to the request that will be sent on that stream, but the stream is reset in certain cases due to the filter's use of sendLocalReply. This is why the tests are now timing out waiting for a stream.

So the way I see it there are two options: 1) get waitForNewStream to work with the new connection pool setup, or 2) update tests to not rely on waitForNewStream for streams that will be reset.

I assume waitForNewStream can't be used for this purpose for http/1.1 either, due to its use of a request queue (or maybe I'm missing something obvious?), so maybe 2) is the way to go?

@mattklein123 thoughts?

Snow Pettersen added 2 commits October 31, 2018 18:02
@mattklein123 (Member):
@mattklein123 thoughts?

Sorry can you provide more detail on the flow that is failing/different? I'm not completely clear on what is happening now.

@snowp (author) commented Nov 1, 2018

The grpc-json transcoder integration test uses waitForNewStream to get the request sent through the filter: https://github.com/envoyproxy/envoy/blob/master/test/extensions/filters/http/grpc_json_transcoder/grpc_json_transcoder_integration_test.cc#L75

With the change in this PR, it seems like waitForNewStream won't return for requests that hit sendLocalReply or are otherwise reset before the upstream connection has been established. I'll push a change that removes the problematic expectations, and we can discuss options in review.

Snow Pettersen added 7 commits October 31, 2018 20:23
@lizan (Member) left a comment

The transcoder integration test LGTM, thanks for fixing that.

@snowp (author) commented Nov 1, 2018

This should be ready for review now (just some clang-tidy stuff to... tidy up)

I extracted the PendingRequest and queue management code from the h1 connection pool so that I could reuse it in the h2 one, which significantly reduces the amount of duplicated code compared to earlier revisions. Besides that, there are some minor tweaks to some integration tests to handle the new behavior of waitForNewStream.
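
Roughly, the refactor described here splits responsibilities like this (heavily simplified, with invented members; the shared code in the PR lives in conn_pool_base):

#include <functional>
#include <list>
#include <memory>
#include <utility>

// Shared base: owns the pending-request queue and its bookkeeping.
class ToyConnPoolBase {
public:
  struct PendingRequest {
    explicit PendingRequest(std::function<void()> on_ready) : on_ready_(std::move(on_ready)) {}
    std::function<void()> on_ready_;
  };

  void queueRequest(std::function<void()> on_ready) {
    pending_requests_.push_back(std::make_unique<PendingRequest>(std::move(on_ready)));
  }

protected:
  std::list<std::unique_ptr<PendingRequest>> pending_requests_;
};

// Each concrete pool decides when to drain: the H2 pool drains once its single
// connection is established, while the H1 pool attaches requests to idle
// connections one at a time.
class ToyHttp2Pool : public ToyConnPoolBase {
public:
  void onUpstreamReady() {
    while (!pending_requests_.empty()) {
      pending_requests_.back()->on_ready_();
      pending_requests_.pop_back();
    }
  }
};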

@snowp (author) commented Nov 1, 2018

Actually, looks like tests are flaky. Looking into it

Snow Pettersen added 2 commits November 1, 2018 13:59
@snowp (author) commented Nov 1, 2018

OK, tests are green (except the mac tests, which are taking forever). PTAL

@mattklein123 (Member) left a comment

Thanks, this is well done. @alyssawilk can you also PTAL, especially at the integration test changes? @snowp I would upgrade the regression risk of this change to high IMO. This is all pretty scary.

virtual void checkForDrained() PURE;

Upstream::HostConstSharedPtr host_;
Upstream::ResourcePriority priority_;
Member:

nit: const

@@ -35,6 +36,9 @@ class ConnPoolImpl : Logger::Loggable<Logger::Id::pool>, public ConnectionPool::
ConnectionPool::Cancellable* newStream(Http::StreamDecoder& response_decoder,
ConnectionPool::Callbacks& callbacks) override;

// Http::ConnPoolImplBase
Member:

nit: you can make this override private or protected.

void ConnPoolImpl::onUpstreamReady() {
// Establishes new codec streams for each pending request.
while (!pending_requests_.empty()) {
newClientStream(pending_requests_.back()->decoder_, pending_requests_.back()->callbacks_);
Member:

Do you have a test case that covers when there is a circuit breaker overflow for max requests once we start creating new streams here?

snowp (author):

Added

void expectClientCreate() {
// Creates a new test client, expecting a new connection to be created and associated
// with the new client.
void expectClientCreate(absl::optional<int> buffer_limits = {}) {
Member:

s/int/uint64_t or whatever the type is?

if (!grpc_request_messages.empty()) {
ASSERT_TRUE(fake_upstream_connection_->waitForNewStream(*dispatcher_, upstream_request_));
Member:

Can you explain the change here? Is this basically because in certain cases the old code was making an upstream connection and creating a stream, but then immediately resetting it or something?

snowp (author):

Yeah, from what I could gather this code was waiting for an upstream stream, but since the filter immediately calls sendLocalReply the upstream connection is reset before it is established, so the upstream never sees a stream.

Member:

But why did it see a stream before? I'm confused. @lizan?

snowp (author):

I think what happens is that:

  1. the transcoder filter responds with Continue in decodeHeaders
  2. the router creates an UpstreamRequest, opening a connection
  3. the transcoder filter calls sendLocalReply in decodeData, resetting the connection opened in 2) and canceling the pending stream before it's ever created.

Before this change, newStream was called on the codec inline in ConnPool::newStream, which initializes the stream by calling nghttp2_submit_request, which triggers nghttp2_session_callbacks_new -> ServerConnectionCallbacks.newStream. This all happens before the connection has been established, and is how FakeConnection picks up new streams.

After this PR, all of this stream setup happens after the upstream connection has been established. This means there is now a window between a connection being opened and it becoming established in which a connection reset will prevent a stream from being created. In this flow, a reset before the connection has been established calls cancel on the PendingRequest, so there's never a call to newStream on the http2 codec.
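
A self-contained toy (invented names) demonstrating the window described above: a reset while the request is still pending cancels it, so the upstream never sees a stream for it, which is why waitForNewStream hangs for those requests.

#include <cassert>
#include <list>

// Streams are only created on the codec after connect, so a pre-connect reset
// cancels the pending request instead of producing a stream.
struct ToyPool {
  bool connected{false};
  int streams_created{0};
  std::list<int> pending; // ids of requests waiting for the connection

  void newStream(int id) {
    if (connected) {
      ++streams_created; // post-connect: the upstream sees a stream immediately
    } else {
      pending.push_back(id); // pre-connect: parked as a pending request
    }
  }
  void reset(int id) { pending.remove(id); } // pre-connect reset == cancel
  void onConnected() {
    connected = true;
    streams_created += static_cast<int>(pending.size());
    pending.clear();
  }
};

int main() {
  ToyPool pool;
  pool.newStream(0);
  pool.reset(0); // reset before connect: the pending request is canceled
  pool.onConnected();
  assert(pool.streams_created == 0); // the fake upstream never sees stream 0

  pool.newStream(1); // after connect: a stream is created right away
  assert(pool.streams_created == 1);
  return 0;
}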

Member:

OK this makes sense, but I don't think the explanation is quite right.

This is all before the connection has been established, and is how FakeConnection picks up new streams.

FakeConnection is a real network connection, so I think this was always relying on a specific sequence? Meaning, we made the connection and immediately wrote data to the stream, so the connection would pick it up. Now I think the sequence is more explicit in that we reset locally before we ever actually get back to create the stream on the newly connected connection. Right? In any case, this change now makes sense to me assuming we agree.

snowp (author):

Yeah, that fits my understanding.

@@ -115,8 +115,6 @@ class GrpcJsonTranscoderIntegrationTest
upstream_request_->encodeTrailers(response_trailers);
}
EXPECT_TRUE(upstream_request_->complete());
} else {
ASSERT_TRUE(upstream_request_->waitForReset());
Member:

Same reason? No stream created w/ the new code? Trying to understand the flow.

snowp (author):

Since we don't call waitForNewStream, we don't have an upstream_request_ to call waitForReset on

@@ -306,17 +306,24 @@ TEST_P(Http2UpstreamIntegrationTest, UpstreamConnectionCloseWithManyStreams) {
{":authority", "host"}});
encoders.push_back(&encoder_decoder.first);
responses.push_back(std::move(encoder_decoder.second));

// Ensure we've established a connection before issuing the resets.
// TODO(snowp): Why can't we handle resets here?
Member:

Yeah I'm a little confused here. What was the sequence causing the hang here? I don't think we should need to wait for upstream before sending client resets? What was actually hanging?

snowp (author):

Without this change, the code was hanging in waitForNewStream, which we never hit since we reset the stream while it's still in the pending queue.

If we only call waitForNewStream for streams that won't get reset, the test ends up hanging in waitForEndStream about 50% of the time. The test sends between 1 and 51 requests, resetting every 15th, and every time it fails it's when a reset happens in the middle of the requests (so number of requests > 15), but it doesn't always fail in that case.

It seems like it's getting stuck waiting for the (15 + 1)th stream to end, so I was thinking it might be a race condition either in the conn pool or in how we account for stream/connection events in the test. It seems like somehow the reset on stream i=15 ends up affecting stream i=16.

I suspected that the pending request queue might be concurrently accessed, but guarding it with a mutex didn't seem to help.

I agree that this should work, but at this point I'm not even sure whether the connection pool is buggy or the tests just need to be reworked, so I'm open to any ideas :)

snowp (author):

Thinking about it some more, I think it might be due to a race between when we call reset and when the connection is established: any reset that comes after connect will be preceded by a new stream (because we've already called onPoolReady), while any reset before connect will not trigger a new stream (we'll cancel the pending request).

Member:

I'm not comfortable merging this change until we understand this. I can think about this more tomorrow morning. Do you have full trace logs for the hang? That would be useful to look at.

snowp (author):

Here's a gist with a good and bad test run:
failed run: https://gist.githubusercontent.com/snowp/ecb62fd9cf74dc885f68ddfe2aa0ac8e/raw/8e54586118f15cbcf9b0eb3eb8f1030447eda8e9/failure.out
successful run: https://gist.githubusercontent.com/snowp/ecb62fd9cf74dc885f68ddfe2aa0ac8e/raw/8e54586118f15cbcf9b0eb3eb8f1030447eda8e9/success.out

This is when running with the following modifications to the code in the PR: snowp@4b59099

Namely:

  • move waiting for the connection back to the original location
  • hard code the number of requests instead of picking a random number
  • skip waiting for requests that should be reset (without this we fail on the first waitForNewStream)

The test sends 17 requests and resets every 15th one, i.e. it will reset the 0th and 15th request.

This bit from the output logs makes me think my previous theory is right about this being a race between when the connection is established and when the request is reset:

~/Development/envoy(h2-queue):0 grep cancel failure
[2018-11-02 04:01:20.395][2075503][debug][router] [source/common/router/router.cc:938] [C1][S2898179549770959211] cancelling pool request
[2018-11-02 04:01:20.395][2075503][debug][pool] [source/common/http/conn_pool_base.cc:41] cancelling pending request
~/Development/envoy(h2-queue):0 grep cancel success
[2018-11-02 04:00:15.278][2069333][debug][router] [source/common/router/router.cc:938] [C1][S4194523254301487947] cancelling pool request
[2018-11-02 04:00:15.278][2069333][debug][pool] [source/common/http/conn_pool_base.cc:41] cancelling pending request
[2018-11-02 04:00:15.308][2069333][debug][router] [source/common/router/router.cc:938] [C1][S82244168407272366] cancelling pool request
[2018-11-02 04:00:15.308][2069333][debug][pool] [source/common/http/conn_pool_base.cc:41] cancelling pending request

Specifically, in the successful run both resets happened before the connection completed, as indicated by the log lines stating that the pool requests were canceled.

I suspect that in the failed run, one of the reset requests created a stream (because it wasn't canceled), which messed up the stream accounting on our end: we didn't expect the 15th request to generate a stream, so when it did we attributed it to the 16th request. The test then fails because it's waiting for endStream on what we think is the 16th stream, but that's actually the 15th request's stream, which has been reset and will never reach endStream.

If I'm right, then in order to have this test send resets before the connection has been established, we need a way to tell whether we reset the request before or after the connect, because depending on that we may or may not see a stream on the FakeConnection. One option would be to wait for N streams, compare that to the total number of requests R, and infer that the first (R - N) requests must have been reset before the connection was established.

Hopefully this makes sense. I'd definitely want someone else to look this over to make sure it seems like a reasonable explanation of what we're seeing.

Member:

OK thanks for the really detailed investigation here. I agree your theory makes sense.

@alyssawilk probably disagrees with me here, but I would be inclined to make this test more deterministic (remove the random number of requests). I think the randomness makes this more confusing than it probably needs to be.

Can we figure out a way to make this test a bit easier to understand and more deterministic? @alyssawilk can likely help with this on Monday.

Contributor:

I'll accept removing randomness here - I care more about it when it comes to payload sizes. Failing the nth request when n < [default max stream sizes] is less likely than triggering backup due to larger/smaller payloads.

snowp (author):

Updated the test to always send 20 requests, so we get a reset on the 1st and 15th request.

upstream_requests.emplace_back();
FakeStreamPtr stream;
Member:

This variable move isn't explicitly needed, right? Just trying to understand.

snowp (author):

No, it's not; it's just a remnant of me playing around. In fact, it's not even used, so I'll probably just remove it.

Snow Pettersen added 7 commits November 1, 2018 22:39
@alyssawilk (Contributor) left a comment

Happy to take a look - I'll do a pass later today after I get through my morning meeting block :-)

One high level question - do we currently have developer docs on the connection pool design? I'm pretty solid on the downstream data plane but less so with upstream and it strikes me as a thing worth having architectural docs on if we don't already. And not just because it'd make this review easier ;-)

@alyssawilk (Contributor) left a comment

And that day did not clear up the way I thought it was going to :-(
I've blocked off time Monday to do a proper review. Apologies for the delay.

ConnectionPool::Callbacks& callbacks);
~PendingRequest();

// Cancellable
Contributor:

Super nitty nit: while you're moving this, can you change it to ConnectionPool::Cancellable?

I took this as a really silly comment at first pass ;-)

@mattklein123 (Member):
One high level question - do we currently have developer docs on the connection pool design? I'm pretty solid on the downstream data plane but less so with upstream and it strikes me as a thing worth having architectural docs on if we don't already. And not just because it'd make this review easier ;-)

There are not. I will take an action item to write some up.

@mattklein123 (Member):
Marking as waiting.

/wait

@snowp (author) commented Nov 6, 2018

Ping on this review; I think all the current comments have been addressed.

@mattklein123 (Member):
@snowp will review today

@mattklein123 (Member) left a comment

Thanks, looks great. 2 small things.

/wait

dispatcher_.clearDeferredDeleteList();
}

// Verifies that reuqests are queued up in the conn pool and respect max request circuit breaking
Member:

typo reuqests

@@ -306,16 +305,24 @@ TEST_P(Http2UpstreamIntegrationTest, UpstreamConnectionCloseWithManyStreams) {
{":authority", "host"}});
encoders.push_back(&encoder_decoder.first);
responses.push_back(std::move(encoder_decoder.second));

// Ensure we've established a connection before issuing the resets. This is necessary
Member:

Couldn't this be lifted up above the for loop since this is HTTP/2? Basically, make an HTTP connection, wait for the upstream connection, and then do the for loop? That might be simpler to understand.

snowp (author):

I think that won't work because we're waiting for an upstream connection, which won't be established until we send at least one request through to the router. Moving the check up causes the test to time out

Member:

OK makes sense. Then can we do it just on the first iteration since we should just make a single connection and need to wait for it?

@mattklein123 merged commit a196489 into envoyproxy:master Nov 8, 2018
fredlas pushed a commit to fredlas/envoy that referenced this pull request Mar 5, 2019