[security] Error handling policy and macro usage clarification #14190

asraa · 2020-11-25T19:39:43Z

Error Handling and Macro Usage Policy Updates and Clarifications

Description:
Envoy contains over 1000 ASSERTs in core code, and over 400 exist in router and HTTP components. In some cases, ASSERTs are not used as statements of true invariants, but used incorrectly to indicate possible errors on the data plane. Especially as Envoy moves towards a hardened posture against extensions, dependencies, and upstreams, macro usage and when proper error handling is needed should be clarified. There is some evidence that ASSERTs are not consistently used for true invariants, either because of wishful assumptions on incoming traffic or from mistaking the invariant (GHSA-gxvv-x4p2-rppp, #13756, #9996, #9509).

In addition to broadening and clarifying types of errors that should be handled, this proposal also suggests adding guidelines for using ENVOY_BUGs, which can be used in addition to error handling and as a way to detect violations in release mode. This could especially help debugging complex crashes, where a condition was violated at some earlier point in time.

A link the the full proposal is here.

Summary
A high level of the changes, with significant differences bolded.

Gracefully handle all errors that may be caused by

Untrusted data plane traffic (from downstreams, upstreams, or extensions like filters)
Raised by the Envoy process environment and are plausible to happen (reading bad data from a runtime file, system calls that require permissions, network errors).
Configuration related errors after startup (e.g. loading configuration at runtime, this is implicit from above)
Third party dependency return codes (e.g. use of a 3P parser, an external RPC that requires permissions). Some return codes may be handled simply by continuing (e.g. calling an RPC out of process can be handled by “continuing”).

Macros:

RELEASE_ASSERT: fatal check
ASSERT: compiled out of opt mode, fatal in debug mode
ENVOY_BUG: compiled in opt mode and log, fatal in debug mode
- ENVOY_BUG_ALPHA (effectively ENVOY_BUG): Used for alpha/rapidly changing protocols that want to have detectability on probable invariants.
ENVOY_BUG_INTERNAL (compiled by default to ASSERT, can be compiled to ENVOY_BUG): provable, bounded internal invariant. No error handling needed. If this is a weakly held belief that would result in misbehavior if triggered, use ENVOY_BUG and add error handling.

Some clarifications/additions in macro usage philosophy:

ENVOY_BUGs provide detectability and more confidence than an ASSERT. This is useful for non-trivial conditions, those with complex control flow, and in rapidly changing protocols. Testing should be added to ensures Envoy can continue to operate even if an ENVOY_BUG condition is hit. Otherwise, there is no point in having it compile in release mode and increment a stat on failure.
Do not ASSERT on conditions imposed by the external environment. Add error handling.
ASSERTs should be standalone and understandable to a reader. Add comments.
ASSERTs should be robust to future changes. For example, an ASSERT can document the internal state of a class or an internal post-condition that always holds due to a preceding call.

In addition, the main addition to the proposal is having clear guidelines. A high level bucketing can show potential usages of ASSERT/RELEASE_ASSERT, ENVOY_BUG, and error handling. There is always room for cross-over, dotted lines indicate places where that may happen.

Open Questions

As time goes on, ENVOY_BUGs should be monitored in production and triaged upstream if needed. Logging levels/severity levels for ENVOY_BUGs may be useful?
Can we provide tools to help users monitor ENVOY_BUG performance? Should incrementing a stat be configurable in case this is too expensive?

Proposed Actions

Reach consensus on some of the additional places errors should be handled
Update STYLE.md guidelines with examples
Add and document ENVOY_BUG_INTERNAL/ASSERT_INVARIANT for internal class invariants that is ASSERT by default but can be configured to ENVOY_BUG.
Hopefully ask reviewers and developers to move towards these standards over time

Relevant Links
#9087

htuch · 2020-11-25T21:17:56Z

@envoyproxy/maintainers for review

mattklein123 · 2020-11-25T21:50:27Z

Overall looks fine to me. I left one comment in the doc about ENVOY_BUG_INTERNAL vs. ASSERT and how we might avoid bike-shedding.

github-actions · 2020-12-26T00:12:34Z

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions · 2021-01-02T00:16:28Z

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

asraa · 2021-03-08T13:48:17Z

Fixed by #14575
#14800

asraa added area/security design proposal Needs design doc/proposal before implementation labels Nov 25, 2020

github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 26, 2020

github-actions bot closed this as completed Jan 2, 2021

asraa removed the stale stalebot believes this issue/PR has not been touched recently label Jan 5, 2021

asraa mentioned this issue Jan 6, 2021

[docs] Add guidance on ENVOY_BUG in STYLE.md #14575

Merged

mattklein123 reopened this Jan 7, 2021

mattklein123 added the no stalebot Disables stalebot from closing an issue label Jan 7, 2021

mattklein123 assigned asraa Mar 8, 2021

asraa closed this as completed Mar 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[security] Error handling policy and macro usage clarification #14190

[security] Error handling policy and macro usage clarification #14190

asraa commented Nov 25, 2020 •

edited

Loading

htuch commented Nov 25, 2020

mattklein123 commented Nov 25, 2020

github-actions bot commented Dec 26, 2020

github-actions bot commented Jan 2, 2021

asraa commented Mar 8, 2021

[security] Error handling policy and macro usage clarification #14190

[security] Error handling policy and macro usage clarification #14190

Comments

asraa commented Nov 25, 2020 • edited Loading

htuch commented Nov 25, 2020

mattklein123 commented Nov 25, 2020

github-actions bot commented Dec 26, 2020

github-actions bot commented Jan 2, 2021

asraa commented Mar 8, 2021

asraa commented Nov 25, 2020 •

edited

Loading