Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[security] Error handling policy and macro usage clarification #14190

Closed
4 tasks done
asraa opened this issue Nov 25, 2020 · 5 comments
Closed
4 tasks done

[security] Error handling policy and macro usage clarification #14190

asraa opened this issue Nov 25, 2020 · 5 comments
Assignees
Labels
area/security design proposal Needs design doc/proposal before implementation no stalebot Disables stalebot from closing an issue

Comments

@asraa
Copy link
Contributor

asraa commented Nov 25, 2020

Error Handling and Macro Usage Policy Updates and Clarifications

Description:
Envoy contains over 1000 ASSERTs in core code, and over 400 exist in router and HTTP components. In some cases, ASSERTs are not used as statements of true invariants, but used incorrectly to indicate possible errors on the data plane. Especially as Envoy moves towards a hardened posture against extensions, dependencies, and upstreams, macro usage and when proper error handling is needed should be clarified. There is some evidence that ASSERTs are not consistently used for true invariants, either because of wishful assumptions on incoming traffic or from mistaking the invariant (GHSA-gxvv-x4p2-rppp, #13756, #9996, #9509).

In addition to broadening and clarifying types of errors that should be handled, this proposal also suggests adding guidelines for using ENVOY_BUGs, which can be used in addition to error handling and as a way to detect violations in release mode. This could especially help debugging complex crashes, where a condition was violated at some earlier point in time.

A link the the full proposal is here.

Summary
A high level of the changes, with significant differences bolded.

Gracefully handle all errors that may be caused by

  • Untrusted data plane traffic (from downstreams, upstreams, or extensions like filters)
  • Raised by the Envoy process environment and are plausible to happen (reading bad data from a runtime file, system calls that require permissions, network errors).
  • Configuration related errors after startup (e.g. loading configuration at runtime, this is implicit from above)
  • Third party dependency return codes (e.g. use of a 3P parser, an external RPC that requires permissions). Some return codes may be handled simply by continuing (e.g. calling an RPC out of process can be handled by “continuing”).

Macros:

  • RELEASE_ASSERT: fatal check
  • ASSERT: compiled out of opt mode, fatal in debug mode
  • ENVOY_BUG: compiled in opt mode and log, fatal in debug mode
    - ENVOY_BUG_ALPHA (effectively ENVOY_BUG): Used for alpha/rapidly changing protocols that want to have detectability on probable invariants.
  • ENVOY_BUG_INTERNAL (compiled by default to ASSERT, can be compiled to ENVOY_BUG): provable, bounded internal invariant. No error handling needed. If this is a weakly held belief that would result in misbehavior if triggered, use ENVOY_BUG and add error handling.

Some clarifications/additions in macro usage philosophy:

  • ENVOY_BUGs provide detectability and more confidence than an ASSERT. This is useful for non-trivial conditions, those with complex control flow, and in rapidly changing protocols. Testing should be added to ensures Envoy can continue to operate even if an ENVOY_BUG condition is hit. Otherwise, there is no point in having it compile in release mode and increment a stat on failure.
  • Do not ASSERT on conditions imposed by the external environment. Add error handling.
  • ASSERTs should be standalone and understandable to a reader. Add comments.
  • ASSERTs should be robust to future changes. For example, an ASSERT can document the internal state of a class or an internal post-condition that always holds due to a preceding call.

In addition, the main addition to the proposal is having clear guidelines. A high level bucketing can show potential usages of ASSERT/RELEASE_ASSERT, ENVOY_BUG, and error handling. There is always room for cross-over, dotted lines indicate places where that may happen.

image

Open Questions

  • As time goes on, ENVOY_BUGs should be monitored in production and triaged upstream if needed. Logging levels/severity levels for ENVOY_BUGs may be useful?
  • Can we provide tools to help users monitor ENVOY_BUG performance? Should incrementing a stat be configurable in case this is too expensive?

Proposed Actions

  • Reach consensus on some of the additional places errors should be handled
  • Update STYLE.md guidelines with examples
  • Add and document ENVOY_BUG_INTERNAL/ASSERT_INVARIANT for internal class invariants that is ASSERT by default but can be configured to ENVOY_BUG.
  • Hopefully ask reviewers and developers to move towards these standards over time

Relevant Links
#9087

@asraa asraa added area/security design proposal Needs design doc/proposal before implementation labels Nov 25, 2020
@htuch
Copy link
Member

htuch commented Nov 25, 2020

@envoyproxy/maintainers for review

@mattklein123
Copy link
Member

Overall looks fine to me. I left one comment in the doc about ENVOY_BUG_INTERNAL vs. ASSERT and how we might avoid bike-shedding.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 26, 2020
@github-actions
Copy link

github-actions bot commented Jan 2, 2021

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as completed Jan 2, 2021
@asraa asraa removed the stale stalebot believes this issue/PR has not been touched recently label Jan 5, 2021
@mattklein123 mattklein123 reopened this Jan 7, 2021
@mattklein123 mattklein123 added the no stalebot Disables stalebot from closing an issue label Jan 7, 2021
@asraa
Copy link
Contributor Author

asraa commented Mar 8, 2021

Fixed by #14575
#14800

@asraa asraa closed this as completed Mar 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/security design proposal Needs design doc/proposal before implementation no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

No branches or pull requests

3 participants