wasm: onLog is not always called for network streams on upstream failure #13806

kyessenov · 2020-10-28T22:24:58Z

Ref: istio/istio#24720
Reproduced with a flaky upstream that periodically becomes unavailable.
Counting calls to onNewConnection and onLog shows that onLog is not always called:
connection_count: 11030 log_count: 10662

/cc @PiotrSikora

kyessenov · 2020-10-28T22:34:19Z

Note: stream contexts are also leaking (they are not destroyed in the null VM after the stream is gone).

kyessenov · 2020-10-28T23:12:49Z

We chatted briefly and think we should simplify the ABI. Rather than trying to work out which side (upstream or downstream) closed the connection and then waiting for both, we should emit a close event as soon as either upstream or downstream closes.
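A minimal sketch of the proposed "close on either side" policy, written as a standalone Python model rather than against Envoy or any proxy-wasm SDK. `StreamContext`, `AnyCloseLifecycle` and `on_close` are hypothetical names, and incrementing `log_calls` only stands in for calling onLog. The point it illustrates is that the first close event from either side finalizes the context, and a later close from the other side is ignored, so onLog runs exactly once per connection.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamContext:
    """Hypothetical stand-in for a per-connection Wasm stream context."""
    log_calls: int = 0
    destroyed: bool = False


@dataclass
class AnyCloseLifecycle:
    """Finalizes the context on the first close event from either side.

    Assumption: a connection may report a downstream close, an upstream
    close, both, or only one of them; onLog must still run exactly once.
    """
    context: StreamContext = field(default_factory=StreamContext)
    events: List[str] = field(default_factory=list)

    def on_close(self, side: str) -> None:
        self.events.append(side)
        if self.context.destroyed:
            # The other side closing later is recorded but otherwise ignored.
            return
        self.context.log_calls += 1    # stands in for onLog
        self.context.destroyed = True  # stands in for context teardown


# Usage: the upstream close arrives after the downstream close.
conn = AnyCloseLifecycle()
conn.on_close("downstream")
conn.on_close("upstream")
```

After both calls, `conn.context.log_calls` is 1 and `conn.context.destroyed` is `True`. If only `on_close("downstream")` is ever called, for example because the upstream connection was never established, the context is still finalized.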

PiotrSikora · 2020-10-29T03:37:58Z

You can do this right now; nothing at the ABI level stops you from performing actions on downstream or upstream connection close.

It looks like the guard that waits for both connections to close was added in envoyproxy/envoy-wasm#453, though I don't see anything wrong there.

Note that the "upstream connection close" event implementation in Envoy is a bit hacky right now: it's triggered when the downstream connection sees doWrite(..., end_stream=true). That's most likely the source of the issue.

cc @gargnupur

gargnupur · 2020-10-29T18:14:18Z

At the time of envoyproxy/envoy-wasm#453 the lifecycles were not very clear, so the guard was added to make sure we don't close prematurely. But it looks like it's now OK to close once the downstream close event is reached.

PiotrSikora · 2020-10-29T18:40:22Z

From what @kyessenov mentioned yesterday, I was under the impression that downstream close events are getting lost, and if that's the case, then relying on them might not solve anything.

kyessenov · 2020-10-29T18:47:20Z

I think we might be getting lost in the terminology here. Upstream with respect to the downstream connection is the proxy, so an upstream close is really a downstream local close from the client's perspective. From reading other network filters, a connection is guaranteed to receive either a Local or a Remote close event (but not both). I suggest we follow this practice and initiate termination once any close event is received. I don't think we can reliably detect whether the connection to an upstream host is closed (there isn't always one, e.g. for a direct response, or it could be tunnelled).

PiotrSikora · 2020-11-02T10:34:45Z

The issue is that the upstream close event is not raised if the connection to upstream was never established (e.g. connection timeout, connection refused), but the context destruction is gated on that event, so it leaks in case of failures.
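The failure mode described above can be modeled with a toy simulation in Python, assuming a context whose teardown (and onLog) is gated on seeing both a downstream and an upstream close event. `BothClosedGate`, `simulate` and the `failed_every` parameter are hypothetical and are not Envoy or proxy-wasm APIs; the model does not attempt to reproduce the counts reported at the top of this issue, and it does not reproduce the change merged in #13836. It only shows that when some connections never reach the upstream host, no upstream close event arrives for them, so those contexts are never finalized and `log_count` falls below `connection_count`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BothClosedGate:
    """Hypothetical context whose teardown waits for *both* close events."""
    downstream_closed: bool = False
    upstream_closed: bool = False
    logged: bool = False

    def on_downstream_close(self) -> None:
        self.downstream_closed = True
        self._maybe_finalize()

    def on_upstream_close(self) -> None:
        self.upstream_closed = True
        self._maybe_finalize()

    def _maybe_finalize(self) -> None:
        # onLog and context destruction happen only once both sides have closed.
        if self.downstream_closed and self.upstream_closed and not self.logged:
            self.logged = True


def simulate(connections: int, failed_every: int) -> Tuple[int, int]:
    """Return (connection_count, log_count) for a simulated flaky upstream.

    Every `failed_every`-th connection (starting with the first) never reaches
    the upstream host, e.g. connect timeout or connection refused, so no
    upstream close event is raised for it.
    """
    if failed_every < 1:
        raise ValueError("failed_every must be >= 1")
    contexts: List[BothClosedGate] = []
    for i in range(connections):
        ctx = BothClosedGate()
        upstream_established = (i % failed_every) != 0
        if upstream_established:
            ctx.on_upstream_close()
        ctx.on_downstream_close()
        contexts.append(ctx)
    log_count = sum(ctx.logged for ctx in contexts)
    return connections, log_count
```

For example, `simulate(10, 5)` fails connections 0 and 5, so it returns `(10, 8)`: two contexts are never logged or destroyed. Finalizing on the first close event from either side, instead of gating on both, removes this dependency on an upstream event that may never be raised.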

PiotrSikora · 2020-11-15T01:36:47Z

See: #13939 and #13940.

github-actions · 2020-12-15T04:03:44Z

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

PiotrSikora · 2020-12-15T07:03:34Z

Fixed / worked around in #13836.

kyessenov added the bug and triage labels Oct 28, 2020

kyessenov changed the title from "wasm: onLog is not always called for networking stream on upstream failure" to "wasm: onLog is not always called for network streams on upstream failure" Oct 28, 2020

yanavlasov added the area/wasm label and removed the triage label Oct 29, 2020

kyessenov mentioned this issue in "wasm: fix network leak #13836" (merged) Oct 30, 2020

github-actions bot added the stale label Dec 15, 2020

PiotrSikora closed this as completed Dec 15, 2020
