docs(decisions): add architectural decision records structure #9310

Open
wants to merge 1 commit into
base: main
22 changes: 22 additions & 0 deletions docs/decisions/README.md
@@ -0,0 +1,22 @@
# Decision Log

We capture important decisions with [architectural decision records](https://adr.github.io/).

These records capture the context, trade-offs, and reasoning behind decisions made at our community and technical crossroads. Our goal is to preserve an understanding of how the project has grown, and to record enough insight to revisit previous decisions effectively.

To get started, create a new decision record from the template:

```sh
cp template.md NNNN-title-with-dashes.md
```
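
Choosing the next `NNNN` by hand is easy to get wrong once several records exist. The helper below is a minimal sketch of one way to derive the file name; `next_adr_name` is a hypothetical function, not part of this repository, and it assumes records follow the `NNNN-title-with-dashes.md` pattern shown above.

```sh
# next_adr_name DIR TITLE: print the next NNNN-title-with-dashes.md name.
# Hypothetical helper; assumes existing records are named NNNN-*.md.
next_adr_name() {
  dir="$1"
  title="$2"
  last=0
  for f in "$dir"/[0-9][0-9][0-9][0-9]-*.md; do
    [ -e "$f" ] || continue        # the glob matched nothing
    n="${f##*/}"                   # strip the directory part
    n="${n%%-*}"                   # keep only the NNNN prefix
    zeros="${n%%[!0]*}"            # leading zeros, if any
    n="${n#"$zeros"}"              # drop them so the number is decimal
    n="${n:-0}"
    if [ "$n" -gt "$last" ]; then
      last="$n"
    fi
  done
  # Lowercase the title and turn every run of other characters into a dash.
  slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' | sed 's/^-*//; s/-*$//')
  printf '%04d-%s.md\n' "$((last + 1))" "$slug"
}
```

It could then be combined with the copy step, for example `cp template.md "devops/$(next_adr_name devops 'Short title')"`.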

For more of the rationale behind this approach, see [Michael Nygard's article](http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions).

We've inherited the MADR [ADR template](https://adr.github.io/madr/), which is a bit more verbose than Nygard's original template. We may simplify it in the future.

## Evolving Decisions

Many decisions build on each other, a driver of iterative change and messiness
in software. By laying out the "story arc" of a particular system within the
application, we hope future maintainers will be able to identify how to rewind
decisions when refactoring the application becomes necessary.
51 changes: 51 additions & 0 deletions docs/decisions/devops/0001-docker-high-uid.md
@@ -0,0 +1,51 @@
---
status: accepted
date: 2025-02-28
story: Appropriate UID/GID values for container users
---

# Use High UID/GID Values for Container Users

## Context & Problem Statement

Docker containers share the host's user namespace by default. If container UIDs/GIDs overlap with privileged host accounts, this could lead to privilege escalation if a container escape vulnerability is exploited. Low UIDs (especially in the system user range of 100-999) are particularly risky as they often map to privileged system users on the host.

Our previous approach used UID/GID 101 with the `--system` flag for user creation, which falls within the system user range and could potentially overlap with critical system users on the host.

## Priorities & Constraints

* Enhance security by reducing the risk of container user namespace overlaps
* Avoid warnings during container build related to system user ranges
* Maintain compatibility with common Docker practices
* Prevent potential privilege escalation in case of container escape

## Considered Options

* Option 1: Keep using low UID/GID (101) with `--system` flag
* Option 2: Use unprivileged UID/GID (1000+) without `--system` flag
* Option 3: Use high UID/GID (10000+) without `--system` flag

## Decision Outcome

Chosen option: [Option 3: Use high UID/GID (10000+) without `--system` flag]

We decided to:

1. Change the default UID/GID from 101 to 10001
2. Remove the `--system` flag from user/group creation commands
3. Document the security rationale for these changes

This approach significantly reduces the risk of UID/GID collision with host system users while avoiding build-time warnings related to system user ranges. Using a very high UID/GID (10001) provides an additional security boundary in containers where user namespaces are shared with the host.
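
As an illustration of the collision concern, the sketch below checks a candidate UID against a passwd-format file. The function names and the `MIN_HIGH_UID` threshold are assumptions made for this example (the threshold mirrors the "10000+" wording of Option 3); they are not part of the Dockerfile or entrypoint.

```sh
# Assumed threshold, taken from the "10000+" wording of Option 3.
MIN_HIGH_UID=10000

# uid_in_use UID [PASSWD_FILE]: succeed if UID already appears in the file.
uid_in_use() {
  awk -F: -v uid="$1" '$3 == uid { found = 1 } END { exit !found }' \
    "${2:-/etc/passwd}"
}

# check_container_uid UID [PASSWD_FILE]: succeed only for a high, unused UID.
check_container_uid() {
  case "$1" in
    '' | *[!0-9]*)
      echo "not a number: $1" >&2
      return 2
      ;;
  esac
  if [ "$1" -lt "$MIN_HIGH_UID" ]; then
    echo "UID $1 is below $MIN_HIGH_UID" >&2
    return 1
  fi
  if uid_in_use "$1" "${2:-/etc/passwd}"; then
    echo "UID $1 already exists in ${2:-/etc/passwd}" >&2
    return 1
  fi
}
```

Running `check_container_uid 10001` on a host reports whether the chosen default would collide with an account already defined there; a value such as 101 is rejected by the range check alone.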

### Expected Consequences

* Improved security posture by reducing the risk of container escapes leading to privilege escalation
* Elimination of build-time warnings related to system user UID/GID ranges
* Consistency with industry best practices for container security
* No functional impact on container operation, as the internal user permissions remain the same

## More Information

* [NGINX Docker User ID Issue](https://github.com/nginxinc/docker-nginx/issues/490) - Demonstrates the risks of using UID 101, which overlaps with the `systemd-network` user on Debian systems
* [.NET Docker Issue on System Users](https://github.com/dotnet/dotnet-docker/issues/4624) - Details the problems with using the `--system` flag and the SYS_UID_MAX warnings
* [Docker Security Best Practices](https://docs.docker.com/develop/security-best-practices/) - General security recommendations for Docker containers
51 changes: 51 additions & 0 deletions docs/decisions/devops/0002-docker-use-gosu.md
@@ -0,0 +1,51 @@
---
status: accepted
date: 2025-02-28
story: Volumes permissions and privilege management in container entrypoint
---

# Use gosu for Privilege Dropping in Entrypoint

## Context & Problem Statement

Running containerized applications as the root user is a security risk. If an attacker compromises the application, they gain root access within the container, potentially facilitating a container escape. However, some operations during container startup, such as creating directories or modifying file permissions in locations not owned by the application user, require root privileges. We need a way to perform these initial setup tasks as root, but then switch to a non-privileged user *before* executing the main application (`zebrad`). Using `USER` in the Dockerfile is insufficient because it applies to the entire runtime, and we need to change permissions *after* volumes are mounted.

## Priorities & Constraints

* Minimize the security risk by running the main application (`zebrad`) as a non-privileged user.
* Allow initial setup tasks (file/directory creation, permission changes) that require root privileges.
* Maintain a clean and efficient entrypoint script.
* Avoid complex signal handling and TTY issues associated with `su` and `sudo`.
* Ensure 1:1 parity with Docker's `--user` flag behavior.

## Considered Options

* Option 1: Use `USER` directive in Dockerfile.
* Option 2: Use `su` within the entrypoint script.
* Option 3: Use `sudo` within the entrypoint script.
* Option 4: Use `gosu` within the entrypoint script.
* Option 5: Use `chroot --userspec`.
* Option 6: Use `setpriv`.

## Decision Outcome

Chosen option: [Option 4: Use `gosu` within the entrypoint script]

We chose to use `gosu` because it provides a simple and secure way to drop privileges from root to a non-privileged user *after* performing necessary setup tasks. `gosu` avoids the TTY and signal-handling complexities of `su` and `sudo`. It's designed specifically for this use case (dropping privileges in container entrypoints) and leverages the same underlying mechanisms as Docker itself for user/group handling, ensuring consistent behavior.
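
The pattern described above can be sketched as follows. This is not the project's actual entrypoint: `APP_USER`, `STATE_DIR`, `DRY_RUN`, and the `run` and `entrypoint` functions are hypothetical names used only for illustration, and the setup steps stand in for whatever root-only work the real script performs.

```sh
# Hypothetical defaults; the real user name and directories may differ.
APP_USER="${APP_USER:-zebra}"
STATE_DIR="${STATE_DIR:-/var/cache/zebrad}"

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# entrypoint UID CMD...: do root-only setup, then start CMD unprivileged.
entrypoint() {
  uid="$1"
  shift
  if [ "$uid" -eq 0 ]; then
    # Root-only setup, performed after volumes have been mounted.
    run mkdir -p "$STATE_DIR"
    run chown -R "$APP_USER:$APP_USER" "$STATE_DIR"
    # exec replaces the shell, so the application receives signals directly.
    run exec gosu "$APP_USER" "$@"
  else
    # Already unprivileged, for example when started with `docker run --user`.
    run exec "$@"
  fi
}
```

A real script would end with something like `entrypoint "$(id -u)" "$@"`. Setting `DRY_RUN=1` prints the commands instead of running them, which makes the two branches easy to inspect without root.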

### Expected Consequences

* Improved security by running `zebrad` as a non-privileged user.
* Simplified entrypoint script compared to using `su` or `sudo`.
* Avoidance of TTY and signal-handling issues.
* Consistent behavior with Docker's `--user` flag.
* No negative impact on functionality, as initial setup tasks can still be performed.

## More Information

* [gosu GitHub repository](https://github.com/tianon/gosu#why) - Explains the rationale behind `gosu` and its advantages over `su` and `sudo`.
* [gosu usage warning](https://github.com/tianon/gosu#warning) - Highlights the core use case (stepping down from root) and potential vulnerabilities in other scenarios.
* Alternatives considered:
  * `chroot --userspec`: While functional, it's less common and less directly suited to this specific task than `gosu`.
  * `setpriv`: A viable alternative, but `gosu` is already well-established in our workflow and offers the desired functionality with a smaller footprint than a full `util-linux` installation.
  * `su-exec`: Another minimal alternative, but it has known parser bugs that could lead to unexpected root execution.
115 changes: 115 additions & 0 deletions docs/decisions/devops/0003-filesystem-hierarchy.md
@@ -0,0 +1,115 @@
---
status: proposed
date: 2025-02-28
story: Standardize filesystem hierarchy for Zebra deployments
---

# Standardize Filesystem Hierarchy: FHS vs. XDG

## Context & Problem Statement

Zebra currently has inconsistencies in its filesystem layout, particularly regarding where configuration, data, cache files, and binaries are stored. We need a standardized approach compatible with:

1. Traditional Linux systems.
2. Containerized deployments (Docker).
3. Cloud environments with stricter filesystem restrictions (e.g., Google's Container-Optimized OS).

We previously considered using the Filesystem Hierarchy Standard (FHS) exclusively ([Issue #3432](https://github.com/ZcashFoundation/zebra/issues/3432)). However, recent changes introduced the XDG Base Directory Specification, which offers a user-centric approach. We need to decide whether to:

* Adhere to FHS.
* Adopt XDG Base Directory Specification.
* Use a hybrid approach, leveraging the strengths of both.

The choice impacts how we structure our Docker images, where configuration files are located, and how users interact with Zebra in different environments.

## Priorities & Constraints

* **Security:** Minimize the risk of privilege escalation by adhering to least-privilege principles.
* **Maintainability:** Ensure a clear and consistent filesystem layout that is easy to understand and maintain.
* **Compatibility:** Work seamlessly across various Linux distributions, Docker, and cloud environments (particularly those with restricted filesystems like Google's Container-Optimized OS).
* **User Experience:** Provide a predictable and user-friendly experience for locating configuration and data files.
* **Flexibility:** Allow users to override default locations via environment variables where appropriate.
* **Avoid Breaking Changes:** Minimize disruption to existing users and deployments, if possible.

## Considered Options

### Option 1: FHS

* Configuration: `/etc/zebrad/`
* Data: `/var/lib/zebrad/`
* Cache: `/var/cache/zebrad/`
* Logs: `/var/log/zebrad/`
* Binary: `/opt/zebra/bin/zebrad` or `/usr/local/bin/zebrad`

### Option 2: XDG Base Directory Specification

* Configuration: `$HOME/.config/zebrad/`
* Data: `$HOME/.local/share/zebrad/`
* Cache: `$HOME/.cache/zebrad/`
* State: `$HOME/.local/state/zebrad/`
* Binary: `$HOME/.local/bin/zebrad` or `/usr/local/bin/zebrad`
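
The `$HOME` paths listed here are the defaults the XDG specification prescribes when the corresponding `XDG_*_HOME` variable is unset or empty. A minimal sketch of that resolution, using a hypothetical `zebrad_xdg_dir` helper:

```sh
# zebrad_xdg_dir KIND: print the per-user directory for config, data,
# cache, or state. Hypothetical helper, not part of Zebra.
zebrad_xdg_dir() {
  case "$1" in
    config) base="${XDG_CONFIG_HOME:-$HOME/.config}" ;;
    data)   base="${XDG_DATA_HOME:-$HOME/.local/share}" ;;
    cache)  base="${XDG_CACHE_HOME:-$HOME/.cache}" ;;
    state)  base="${XDG_STATE_HOME:-$HOME/.local/state}" ;;
    *)
      echo "unknown kind: $1" >&2
      return 1
      ;;
  esac
  printf '%s/zebrad\n' "$base"
}
```

Setting, say, `XDG_CACHE_HOME=/mnt/cache` moves only the cache directory; the other three keep their `$HOME`-based defaults.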

### Option 3: Hybrid Approach (FHS for System-Wide, XDG for User-Specific)

* System-wide configuration: `/etc/zebrad/`
* User-specific configuration: `$XDG_CONFIG_HOME/zebrad/`
* System-wide data (read-only, shared): `/usr/share/zebrad/` (e.g., checkpoints)
* User-specific data: `$XDG_DATA_HOME/zebrad/`
* Cache: `$XDG_CACHE_HOME/zebrad/`
* State: `$XDG_STATE_HOME/zebrad/`
* Runtime: `$XDG_RUNTIME_DIR/zebrad/`
* Binary: `/opt/zebra/bin/zebrad` (system-wide) or `$HOME/.local/bin/zebrad` (user-specific)

## Pros and Cons of the Options

### FHS

* **Pros:**
  * Traditional and well-understood by system administrators.
  * Clear separation of configuration, data, cache, and binaries.
  * Suitable for packaged software installations.

* **Cons:**
  * Less user-friendly; requires root access to modify configuration.
  * Can conflict with stricter cloud environments restricting writes to `/etc` and `/var`.
  * Doesn't handle multi-user scenarios as gracefully as XDG.

### XDG Base Directory Specification

* **Pros:**
  * User-centric: configuration and data stored in user-writable locations.
  * Better suited for containerized and cloud environments.
  * Handles multi-user scenarios gracefully.
  * Clear separation of configuration, data, cache, and state.

* **Cons:**
  * Less traditional; might be unfamiliar to some system administrators.
  * Requires environment variables to be set correctly.
  * Binary placement less standardized.

### Hybrid Approach (FHS for System-Wide, XDG for User-Specific)

* **Pros:**
  * Combines strengths of FHS and XDG.
  * Allows system-wide defaults while prioritizing user-specific configurations.
  * Flexible and adaptable to different deployment scenarios.
  * Clear binary placement in `/opt`.

* **Cons:**
  * More complex than either FHS or XDG alone.
  * Requires careful consideration of precedence rules.
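
To make the precedence question concrete, here is one possible lookup order: the user-specific file first, then the system-wide default. This is only a sketch, since the decision is still pending. The `find_zebrad_config` function is hypothetical, and the `zebrad.toml` file name is an assumption that does not come from this record.

```sh
# find_zebrad_config [SYSTEM_DIR]: print the first config file that exists,
# preferring the user-specific location over the system-wide one.
# The file name zebrad.toml is an assumption made for this example.
find_zebrad_config() {
  system_dir="${1:-/etc/zebrad}"
  user_dir="${XDG_CONFIG_HOME:-$HOME/.config}/zebrad"
  for dir in "$user_dir" "$system_dir"; do
    if [ -f "$dir/zebrad.toml" ]; then
      printf '%s\n' "$dir/zebrad.toml"
      return 0
    fi
  done
  return 1
}
```

The function fails when neither file exists, which leaves the caller free to fall back to built-in defaults.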

## Decision Outcome

Pending

### Expected Consequences

Pending

## More Information

* [Filesystem Hierarchy Standard (FHS) v3.0](https://refspecs.linuxfoundation.org/FHS_3.0/fhs-3.0.html)
* [XDG Base Directory Specification](https://specifications.freedesktop.org/basedir-spec/latest/)
* [Zebra Issue #3432: Use the Filesystem Hierarchy Standard (FHS) for deployments and artifacts](https://github.com/ZcashFoundation/zebra/issues/3432)
* [Google Container-Optimized OS: Working with the File System](https://cloud.google.com/container-optimized-os/docs/concepts/disks-and-filesystem#working_with_the_file_system)
49 changes: 49 additions & 0 deletions docs/decisions/template.md
@@ -0,0 +1,49 @@
---
# status and date are the only required elements. Feel free to remove the rest.
status: {[proposed | rejected | accepted | deprecated | … | superseded by [ADR-NAME](adr-file-name.md)]}
date: {YYYY-MM-DD when the decision was last updated}
builds-on: {[Short Title](2021-05-15-short-title.md)}
story: {description or link to contextual issue}
---

# {short title of solved problem and solution}

## Context and Problem Statement

{2-3 sentences explaining the problem and the forces influencing the decision.}
<!-- The language in this section is value-neutral. It is simply describing facts. -->

## Priorities & Constraints <!-- optional -->

* {List of concerns or constraints}
* {Factors influencing the decision}

## Considered Options

* Option 1: Thing
* Option 2: Another

### Pros and Cons of the Options <!-- optional -->

#### Option 1: {Brief description}

* Good, because {reason}
* Bad, because {reason}

## Decision Outcome

Chosen option: [Option 1: Thing]

{Clearly state the chosen option and provide justification. Reference the "Pros and Cons of the Options" section above if applicable.}

### Expected Consequences <!-- optional -->

* List of outcomes resulting from this decision
<!-- Positive, negative, and/or neutral consequences, as long as they affect the team and project in the future. -->

## More Information <!-- optional -->

<!-- * Resources reviewed as part of making this decision -->
<!-- * Links to any supporting documents or resources -->
<!-- * Related PRs -->
<!-- * Related User Journeys -->