Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: thrift filter decoding introduces high latency #13082

Closed
caitong93 opened this issue Sep 14, 2020 · 5 comments · Fixed by #13592
Closed

perf: thrift filter decoding introduces high latency #13082

caitong93 opened this issue Sep 14, 2020 · 5 comments · Fixed by #13592

Comments

@caitong93
Copy link

caitong93 commented Sep 14, 2020

Title: thrift filter introduces high latency

Description:

I observed high latency(thrift decoding cost 10ms+) when using thrift filter. qps under 1000, request/response message size < 1MB, connections < 100 , cpu usage < 0.01 core

envoy version: 1.12 ( cherry-picked the lastest thrift filter commits)
thrift protocol: framed + binary

Analysis shows most time is spent on thrift decoding. After dig into the code, I found thrift filter decode the whole struct of message to memory and encode it. I use thrift filter for loadbalancing, only metadata is useful. In this case, thrift filter waste a lot time decoding the struct part.
I wonder if we can add support to thrift filter to send the struct part as a passthrough.

Logs:

By replaying traffic from production, I observed 14ms latency for decoding a 930648 bytes message.

Following is debug log for a single request. For more detail, see trace log https://drive.google.com/file/d/1jkSPDFQmWBpa4BXbFJcdeNLzV9uiASbH/view?usp=sharing

[2020-09-14 21:00:54.437][17119][debug][conn_handler] [external/envoy/source/server/connection_handler_impl.cc:335] [C0] new connection
[2020-09-14 21:00:54.437][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:376] thrift: 65482 bytes available
[2020-09-14 21:00:54.437][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:397] thrift: framed(auto) transport started
[2020-09-14 21:00:54.437][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:431] thrift: protocol auto, state MessageBegin, 65478 bytes available
[2020-09-14 21:00:54.437][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/router/router_impl.cc:227] [C0][S5593458996354885164] cluster 'thrift_service' match for method 'PredictionService:Predict'
[2020-09-14 21:00:54.437][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/router/router_impl.cc:260] [C0][S5593458996354885164] router decoding request
[2020-09-14 21:00:54.437][17119][debug][pool] [external/envoy/source/common/tcp/conn_pool.cc:82] creating a new connection
[2020-09-14 21:00:54.437][17119][debug][pool] [external/envoy/source/common/tcp/conn_pool.cc:362] [C1] connecting
[2020-09-14 21:00:54.437][17119][debug][connection] [external/envoy/source/common/network/connection_impl.cc:706] [C1] connecting to 127.0.0.1:9090
[2020-09-14 21:00:54.437][17119][debug][connection] [external/envoy/source/common/network/connection_impl.cc:715] [C1] connection in progress
[2020-09-14 21:00:54.437][17119][debug][pool] [external/envoy/source/common/tcp/conn_pool.cc:108] queueing request due to no available connections
[2020-09-14 21:00:54.437][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:439] thrift: wait for continuation
[2020-09-14 21:00:54.437][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/conn_manager.cc:54] [C0] thrift filter stopped
[2020-09-14 21:00:54.437][17119][debug][connection] [external/envoy/source/common/network/connection_impl.cc:554] [C1] connected
[2020-09-14 21:00:54.438][17119][debug][pool] [external/envoy/source/common/tcp/conn_pool.cc:283] [C1] assigning connection
[2020-09-14 21:00:54.438][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/conn_manager.cc:127] [C0] thrift filter continued
[2020-09-14 21:00:54.438][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:376] thrift: 930611 bytes available
[2020-09-14 21:00:54.438][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:431] thrift: protocol binary(auto), state StructBegin, 930611 bytes available
[2020-09-14 21:00:54.452][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:455] thrift: framed(auto) transport ended
[2020-09-14 21:00:54.453][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:376] thrift: 12153 bytes available
[2020-09-14 21:00:54.453][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:397] thrift: framed transport started
[2020-09-14 21:00:54.453][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:431] thrift: protocol binary, state MessageBegin, 12149 bytes available
[2020-09-14 21:00:54.453][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/decoder.cc:455] thrift: framed transport ended
[2020-09-14 21:00:54.453][17119][debug][thrift] [external/envoy/source/extensions/filters/network/thrift_proxy/router/router_impl.cc:310] [C0][S5593458996354885164] response complete

Flamegraph

https://drive.google.com/file/d/1QNHe3I_fbXJtRAPY0alCNUiNO80Zfgf3/view?usp=sharing

Configuration

    - filters:
      - name: envoy.filters.network.thrift_proxy
        config:
        #typed_config:
        #  "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          stat_prefix: thrift_proxy
          transport: AUTO_TRANSPORT
          protocol: AUTO_PROTOCOL
          route_config:
            name: local_route
            routes:
            - match:
                method_name: ""
              route:
                cluster: thrift_service
          thrift_filters:
          - name: envoy.filters.thrift.router
@caitong93 caitong93 added bug triage Issue requires triage labels Sep 14, 2020
@mattklein123 mattklein123 added area/perf and removed triage Issue requires triage labels Sep 16, 2020
@mattklein123
Copy link
Member

cc @rgs1

@rgs1
Copy link
Member

rgs1 commented Sep 17, 2020

cc: @fishcakez

@caitong93
Copy link
Author

caitong93 commented Oct 15, 2020

I write a very draft pr to gain some thoughts on this. Can you take a look when you have time? Thanks !

@mattklein123

@rgs1
Copy link
Member

rgs1 commented Oct 15, 2020

@caitong93 nice! we can review & test it out on our end as well.

@caitong93
Copy link
Author

nice! we can review & test it out on our end as well.

Great ! I will make the pr ready for review soon.

htuch pushed a commit that referenced this issue Nov 30, 2020
…t message (#13592)

Fixes #13082

Support skip decoding data after metadata in the thrift message.

In payload_passthrough mode, there are some issues:

Envoy cannot detect some errors and exceptions ( e.g. a reply that contains exceptions ). It's possible to improve this by peeking beginning of the payload.

payload_passthrough controls both request and response path. It can be split into two options if we want more fine-grained control.

FilterStatus passthroughData(Buffer::Instance& data, uint64_t bytes_to_passthrough) will not prohibit custom filters to modify buffer. Now it is assumed custom filters won't do that, otherwise behavior is undefined.

Risk Level: Medium
Testing:

unit test:

config
decoder
router
conn_manager
integration:

add an parameter payload_passthrough
manual:

send requests and verify responses

Signed-off-by: Tong Cai <caitong93@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants