Missed upstream disconnect leading to 503 UC #6190
@alyssawilk do you have any ideas about this? Thanks! |
Ok, so reading this, the first request was proxied successfully, the upstream sent a FIN, but Envoy reused the connection 4s later for a subsequent request? There's always a potential race where a FIN may not be received, but a 4s pause is not a race - that's clearly crummy. I thought that Envoy listens for data between H1 requests and should have detected that FIN, but maybe I'm misremembering. @lizan do you remember from your work on #2871? If not I'll dig around. |
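For readers less familiar with the mechanism being referred to: between requests, a pooled HTTP/1 connection can only notice the upstream's FIN if it is actually reading, and the signal is a zero-byte read (EOF). A minimal standalone sketch of that check, in plain POSIX terms rather than Envoy's event-driven code:

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

// Generic illustration (not Envoy code): decide whether an idle, pooled
// HTTP/1 upstream socket is still safe to reuse. A zero-byte read means the
// peer already sent a FIN, so the connection must be discarded; reusing it
// gets the next request answered with a RST, which surfaces as "503 UC".
bool idleConnectionReusable(int fd) {
  char byte;
  // MSG_PEEK leaves any buffered data in place; MSG_DONTWAIT avoids blocking.
  const ssize_t rc = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
  if (rc == 0) {
    return false; // EOF: the upstream half-closed while the connection was idle.
  }
  if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
    return true; // Nothing pending and no FIN yet: still reusable.
  }
  return false; // Unexpected data or a socket error: don't reuse either.
}
```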
I would like to clarify that we're talking minutes and not seconds. That being said I was able to reproduce the issue with trace logs. From C425 (good)
From C427 (bad)
It seems that libevent correctly received socket event: 3 (read ready and write ready) in the good case but only write ready in the bad one. I'm using Photon OS; maybe the issue has something to do with its TCP stack. |
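To help decode those trace lines: the "socket event" value is a bitmask. Assuming it mirrors Envoy's Event::FileReadyType (Read = 0x1, Write = 0x2, Closed = 0x4; worth confirming against file_event.h in the version in use), 3 means read-and-write ready, while a write-only wakeup would be 2, i.e. the FIN never surfaced as a read event. A small sketch of the decoding:

```cpp
#include <cstdint>
#include <cstdio>

// Assumed to mirror Envoy's Event::FileReadyType bitmask; check
// include/envoy/event/file_event.h in the running version for the real values.
enum FileReadyType : uint32_t { Read = 0x1, Write = 0x2, Closed = 0x4 };

void decodeSocketEvent(uint32_t events) {
  std::printf("socket event: %u =>%s%s%s\n", static_cast<unsigned>(events),
              (events & Read) ? " read-ready" : "",
              (events & Write) ? " write-ready" : "",
              (events & Closed) ? " closed" : "");
}

int main() {
  decodeSocketEvent(3); // the "good" connection: read + write ready
  decodeSocketEvent(2); // the "bad" connection: write ready only, no read event for the FIN
}
```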
Actually, now that I've looked closer at C425 (good) and C427 (bad), I noticed another difference. Attaching logs for both upstreams.
It seems that read was disabled, hence no FIN notification. |
I'm not familiar with the Envoy codebase, but here's an idea - why not call readDisable(false) unconditionally when "moving to ready"?
FYI the following patch seems to work fine.
|
Hey @mattklein123, based on #2715 I think we should detect that disconnect, and the supplied patch working makes me think the H1 stream is being returned to the conn pool disabled (probably due to flow control...). The hopefully quick question I have for you is whether onMessageComplete is the right place to unwind those - I'd lean towards unwinding where the connection is returned to the conn pool for reuse (where it's moved to the ready_clients_ list?) just in case there's lag between the message being complete and the prior stream detangling itself from data callbacks. Thoughts? |
The patch that I proposed works on a large-scale setup that used to reproduce the issue; see C5683.txt
Ideally I'd like to see "readDisable: enabled=false disable=false" before "moving to ready", but I was in a hurry to patch the environment and didn't have time to explore that option. The good news is that I no longer get "upstream connect error or disconnect/reset before headers" : ) |
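For context on that trace line, here is a toy with assumed semantics (the real ConnectionImpl keeps a nesting count and its log format varies by version): "enabled" is the read state before the call and "disable" is the requested action, so "enabled=false disable=false" right before "moving to ready" would show a read-disabled connection being re-enabled as it goes back into the pool:

```cpp
#include <cstdio>

// Toy model only; Envoy's real readDisable() nests and logs differently
// depending on the version.
struct ToyConnection {
  bool read_enabled = true;
  void readDisable(bool disable) {
    std::printf("readDisable: enabled=%s disable=%s\n",
                read_enabled ? "true" : "false", disable ? "true" : "false");
    read_enabled = !disable;
  }
};

int main() {
  ToyConnection c;
  c.readDisable(true);  // flow control pauses reads mid-response
  c.readDisable(false); // prints "enabled=false disable=false": the desired re-enable
}
```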
Now that we've gotten to the bottom of this issue, I think it's a bug and not a question. |
@ppadevski are you willing to propose a patch for this? Thanks! |
@ppadevski @alyssawilk how is readDisable(true) getting called in the first place? From my quick read of the code I think this must be due to flow control? I think the router is telling the upstream connection to stop reading, but then the upstream connection finishes, right? (Agree this is a bug.)
@alyssawilk I tend to agree with you here that we should be handling this in the calling code. We are already kind of doing this here https://github.com/envoyproxy/envoy/blob/master/source/common/http/http1/codec_impl.cc#L677, but I think that code now seems to be in the wrong place. I think we should replace that code with an ASSERT that reads are already enabled, and enable reads in the owning codec client code that wants to reuse the connection? |
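A self-contained toy of that proposal (the names are illustrative, not Envoy's actual classes): the pool unwinds any leftover read-disable when it parks a connection for reuse, and the codec merely asserts the invariant when a stream is created on a reused connection:

```cpp
#include <cassert>
#include <vector>

// Toy model of the proposal, not Envoy code.
struct Connection {
  int read_disable_count = 0; // readDisable() nests, so track a count
  void readDisable(bool disable) { read_disable_count += disable ? 1 : -1; }
  bool readEnabled() const { return read_disable_count == 0; }
};

struct Codec {
  Connection& conn;
  void newStream() {
    // Proposed invariant: the owner must have re-enabled reads already.
    assert(conn.readEnabled() && "reads must be enabled before reuse");
  }
};

struct ConnPool {
  std::vector<Connection*> ready_clients;
  void returnToPool(Connection& conn) {
    // Re-enable reads here, so a FIN arriving while the connection is idle
    // is observed and the dead connection is never handed out again.
    while (!conn.readEnabled()) {
      conn.readDisable(false);
    }
    ready_clients.push_back(&conn);
  }
};

int main() {
  Connection c;
  c.readDisable(true); // e.g. flow control paused reads mid-response
  ConnPool pool;
  pool.returnToPool(c); // the pool unwinds the read-disable
  Codec codec{c};
  codec.newStream();    // the invariant holds when the connection is reused
}
```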
Catching up here... Yeah, I agree that this is a bug. #2871 tried to cover the race in the case where the upstream sends a FIN right after a completed response. IIUC, per this comment we're not guaranteed to get notified of the FIN if read is disabled. Moving this part of the code to here sounds right to me. |
I will try to fix this before we ship 1.10.0 |
@mattklein123 Hello. Just want to double check, is the fix now in 1.10.0? Reading through the release notes does not indicate so: https://www.envoyproxy.io/docs/envoy/v1.10.0/intro/version_history thank you! |
Does anyone know some of the details surrounding what causes this issue to trigger? I have an Istio environment with 200+ services using identical foundational setups and this bug is only hitting a few of them -- one more than the others (which happens to serve slower responses if that's relevant; something like 30+ seconds). These are node services with a 5s keep-alive which I assume could cause it to hit a little more frequently... but again, this is only really hitting a few services out of the many. Interestingly, this also doesn't seem to get caught by 503 retries... using either |
Previously we were doing this when we create a new stream, but on a reused connection this can lead to us missing an upstream disconnection when the connection is placed back in the pool. Fixes #6190 Signed-off-by: Matt Klein <mklein@lyft.com>
I was struggling with half-open connections where my Node app would send a TCP FIN but Envoy would still proxy the next GET request to the server, which would then trigger the Node app to send back a TCP RST. This was not fixed for me in 1.10, but I just pulled the latest envoy-alpine-dev and the issues have gone away, hopefully because Envoy, after receiving the FIN/ACK from the Node.js app, is removing the connection from the connection pool. From other threads it seems like I need to make my Node app (Koa plus Node v12) send a "Connection: close" header so that it behaves better; I'll have to look into that more. If you're hitting this and can run on the latest dev code, you might want to give this a try. |
…#6578) Previously we were doing this when we create a new stream, but on a reused connection this can lead to us missing an upstream disconnection when the connection is placed back in the pool. Fixes envoyproxy#6190 Signed-off-by: Matt Klein <mklein@lyft.com>
Hi,
I've been chasing a certain issue that is very difficult to reproduce. I'm using Envoy v1.8.0 and have an unencrypted HTTP/1.1 local upstream, an HTTP/2 downstream, and use the HTTP connection manager. The cluster configuration is
and the route match rule is
{"match":{"path":"/websso"},"route":{"cluster":"/websso","timeout":"28800s","use_websocket":true,"idle_timeout":"28800s"}}
My issue is that from time to time I get these "503 UC"s, e.g.
The packet capture has this to say for request/response 1.
and the following for request/response 2.
It seems that Envoy didn't reply with a FIN and instead pooled the connection and tried to reuse it 4 minutes later, which ended badly.
The envoy log contains the following for that connection.
A good connection that happened around that time looks like
The upstream server has a 1-minute idle timeout, and it seems that for C714 and many other connections Envoy properly received and processed the disconnect, but not for C715.
Have you heard of the above issue before, and is there any remedy for it? Is it possible to retry once for that particular case only, and if so, how?