diff --git a/docs/decisions/README.md b/docs/decisions/README.md new file mode 100644 index 00000000000..8844bc7c156 --- /dev/null +++ b/docs/decisions/README.md @@ -0,0 +1,22 @@ +# Decision Log + +We capture important decisions with [architectural decision records](https://adr.github.io/). + +These records provide the context, trade-offs, and reasoning behind choices made at our community and technical crossroads. Our goal is to preserve an understanding of the project's growth, and capture enough insight to effectively revisit previous decisions. + +Get started by creating a new decision record from the template: + +```sh +cp template.md NNNN-title-with-dashes.md +``` + +For more rationale behind this approach, see [Michael Nygard's article](http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions). + +We've inherited the MADR [ADR template](https://adr.github.io/madr/), which is a bit more verbose than Nygard's original template. We may simplify it in the future. + +## Evolving Decisions + +Many decisions build on each other, a driver of iterative change and messiness +in software. By laying out the "story arc" of a particular system within the +application, we hope future maintainers will be able to identify how to rewind +decisions when refactoring the application becomes necessary. diff --git a/docs/decisions/devops/0001-docker-high-uid.md b/docs/decisions/devops/0001-docker-high-uid.md new file mode 100644 index 00000000000..687e1569019 --- /dev/null +++ b/docs/decisions/devops/0001-docker-high-uid.md @@ -0,0 +1,51 @@ +--- +status: accepted +date: 2025-02-28 +story: Appropriate UID/GID values for container users +--- + +# Use High UID/GID Values for Container Users + +## Context & Problem Statement + +Docker containers share the host's user namespace by default. If container UIDs/GIDs overlap with privileged host accounts, this could lead to privilege escalation if a container escape vulnerability is exploited. 
Low UIDs (especially in the system user range of 100-999) are particularly risky as they often map to privileged system users on the host. + +Our previous approach used UID/GID 101 with the `--system` flag for user creation, which falls within the system user range and could potentially overlap with critical system users on the host. + +## Priorities & Constraints + +* Enhance security by reducing the risk of container user namespace overlaps +* Avoid warnings during container build related to system user ranges +* Maintain compatibility with common Docker practices +* Prevent potential privilege escalation in case of container escape + +## Considered Options + +* Option 1: Keep using low UID/GID (101) with `--system` flag +* Option 2: Use unprivileged UID/GID (1000+) without `--system` flag +* Option 3: Use high UID/GID (10000+) without `--system` flag + +## Decision Outcome + +Chosen option: [Option 3: Use high UID/GID (10000+) without `--system` flag] + +We decided to: + +1. Change the default UID/GID from 101 to 10001 +2. Remove the `--system` flag from user/group creation commands +3. Document the security rationale for these changes + +This approach significantly reduces the risk of UID/GID collision with host system users while avoiding build-time warnings related to system user ranges. Using a very high UID/GID (10001) provides an additional security boundary in containers where user namespaces are shared with the host. 
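
The UID ranges behind this decision can be sketched in shell. This is an illustration under stated assumptions, not the project's actual Dockerfile: the `uid_class` helper and the `zebra` account name are hypothetical, and the 999 upper bound for system accounts is the usual Debian `SYS_UID_MAX` default rather than a value this record defines.

```sh
#!/bin/sh
# Classify a UID against the ranges discussed in this record.
# Assumptions: UIDs below 1000 are treated as system accounts (Debian's
# default SYS_UID_MAX is 999); 10000 is the "high UID" threshold from Option 3.
uid_class() {
    uid="$1"
    if [ "$uid" -eq 0 ]; then
        echo "root"
    elif [ "$uid" -lt 1000 ]; then
        echo "system"
    elif [ "$uid" -lt 10000 ]; then
        echo "regular"
    else
        echo "high"
    fi
}

uid_class 101     # previous default, prints "system"
uid_class 10001   # new default, prints "high"

# Creating the account without --system requires root (for example in a
# Dockerfile RUN step). "zebra" is a hypothetical account name:
#   groupadd --gid 10001 zebra
#   useradd --uid 10001 --gid 10001 --no-create-home --shell /usr/sbin/nologin zebra
```

Keeping the classification separate from account creation makes the threshold easy to check in CI without root access.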
+ +### Expected Consequences + +* Improved security posture by reducing the risk of container escapes leading to privilege escalation +* Elimination of build-time warnings related to system user UID/GID ranges +* Consistency with industry best practices for container security +* No functional impact on container operation, as the internal user permissions remain the same + +## More Information + +* [NGINX Docker User ID Issue](https://github.com/nginxinc/docker-nginx/issues/490) - Demonstrates the risks of using UID 101 which overlaps with `systemd-network` user on Debian systems +* [.NET Docker Issue on System Users](https://github.com/dotnet/dotnet-docker/issues/4624) - Details the problems with using `--system` flag and the SYS_UID_MAX warnings +* [Docker Security Best Practices](https://docs.docker.com/develop/security-best-practices/) - General security recommendations for Docker containers diff --git a/docs/decisions/devops/0002-docker-use-gosu.md b/docs/decisions/devops/0002-docker-use-gosu.md new file mode 100644 index 00000000000..0bdd2931f89 --- /dev/null +++ b/docs/decisions/devops/0002-docker-use-gosu.md @@ -0,0 +1,51 @@ +--- +status: accepted +date: 2025-02-28 +story: Volumes permissions and privilege management in container entrypoint +--- + +# Use gosu for Privilege Dropping in Entrypoint + +## Context & Problem Statement + +Running containerized applications as the root user is a security risk. If an attacker compromises the application, they gain root access within the container, potentially facilitating a container escape. However, some operations during container startup, such as creating directories or modifying file permissions in locations not owned by the application user, require root privileges. We need a way to perform these initial setup tasks as root, but then switch to a non-privileged user *before* executing the main application (`zebrad`). 
Using `USER` in the Dockerfile is insufficient because it applies to the entire runtime, and we need to change permissions *after* volumes are mounted. + +## Priorities & Constraints + +* Minimize the security risk by running the main application (`zebrad`) as a non-privileged user. +* Allow initial setup tasks (file/directory creation, permission changes) that require root privileges. +* Maintain a clean and efficient entrypoint script. +* Avoid complex signal handling and TTY issues associated with `su` and `sudo`. +* Ensure 1:1 parity with Docker's `--user` flag behavior. + +## Considered Options + +* Option 1: Use `USER` directive in Dockerfile. +* Option 2: Use `su` within the entrypoint script. +* Option 3: Use `sudo` within the entrypoint script. +* Option 4: Use `gosu` within the entrypoint script. +* Option 5: Use `chroot --userspec` +* Option 6: Use `setpriv` + +## Decision Outcome + +Chosen option: [Option 4: Use `gosu` within the entrypoint script] + +We chose to use `gosu` because it provides a simple and secure way to drop privileges from root to a non-privileged user *after* performing necessary setup tasks. `gosu` avoids the TTY and signal-handling complexities of `su` and `sudo`. It's designed specifically for this use case (dropping privileges in container entrypoints) and leverages the same underlying mechanisms as Docker itself for user/group handling, ensuring consistent behavior. + +### Expected Consequences + +* Improved security by running `zebrad` as a non-privileged user. +* Simplified entrypoint script compared to using `su` or `sudo`. +* Avoidance of TTY and signal-handling issues. +* Consistent behavior with Docker's `--user` flag. +* No negative impact on functionality, as initial setup tasks can still be performed. + +## More Information + +* [gosu GitHub repository](https://github.com/tianon/gosu#why) - Explains the rationale behind `gosu` and its advantages over `su` and `sudo`. 
+* [gosu usage warning](https://github.com/tianon/gosu#warning) - Highlights the core use case (stepping down from root) and potential vulnerabilities in other scenarios. +* Alternatives considered: + * `chroot --userspec`: While functional, it's less common and less directly suited to this specific task than `gosu`. + * `setpriv`: A viable alternative, but `gosu` is already well-established in our workflow and offers the desired functionality with a smaller footprint than a full `util-linux` installation. + * `su-exec`: Another minimal alternative, but it has known parser bugs that could lead to unexpected root execution. diff --git a/docs/decisions/devops/0003-filesystem-hierarchy.md b/docs/decisions/devops/0003-filesystem-hierarchy.md new file mode 100644 index 00000000000..13c626dec5e --- /dev/null +++ b/docs/decisions/devops/0003-filesystem-hierarchy.md @@ -0,0 +1,115 @@ +--- +status: proposed +date: 2025-02-28 +story: Standardize filesystem hierarchy for Zebra deployments +--- + +# Standardize Filesystem Hierarchy: FHS vs. XDG + +## Context & Problem Statement + +Zebra currently has inconsistencies in its filesystem layout, particularly regarding where configuration, data, cache files, and binaries are stored. We need a standardized approach compatible with: + +1. Traditional Linux systems. +2. Containerized deployments (Docker). +3. Cloud environments with stricter filesystem restrictions (e.g., Google's Container-Optimized OS). + +We previously considered using the Filesystem Hierarchy Standard (FHS) exclusively ([Issue #3432](https://github.com/ZcashFoundation/zebra/issues/3432)). However, recent changes introduced the XDG Base Directory Specification, which offers a user-centric approach. We need to decide whether to: + +* Adhere to FHS. +* Adopt XDG Base Directory Specification. +* Use a hybrid approach, leveraging the strengths of both. 
+ +The choice impacts how we structure our Docker images, where configuration files are located, and how users interact with Zebra in different environments. + +## Priorities & Constraints + +* **Security:** Minimize the risk of privilege escalation by adhering to least-privilege principles. +* **Maintainability:** Ensure a clear and consistent filesystem layout that is easy to understand and maintain. +* **Compatibility:** Work seamlessly across various Linux distributions, Docker, and cloud environments (particularly those with restricted filesystems like Google's Container-Optimized OS). +* **User Experience:** Provide a predictable and user-friendly experience for locating configuration and data files. +* **Flexibility:** Allow users to override default locations via environment variables where appropriate. +* **Avoid Breaking Changes:** Minimize disruption to existing users and deployments, if possible. + +## Considered Options + +### Option 1: FHS + +* Configuration: `/etc/zebrad/` +* Data: `/var/lib/zebrad/` +* Cache: `/var/cache/zebrad/` +* Logs: `/var/log/zebrad/` +* Binary: `/opt/zebra/bin/zebrad` or `/usr/local/bin/zebrad` + +### Option 2: XDG Base Directory Specification + +* Configuration: `$HOME/.config/zebrad/` +* Data: `$HOME/.local/share/zebrad/` +* Cache: `$HOME/.cache/zebrad/` +* State: `$HOME/.local/state/zebrad/` +* Binary: `$HOME/.local/bin/zebrad` or `/usr/local/bin/zebrad` + +### Option 3: Hybrid Approach (FHS for System-Wide, XDG for User-Specific) + +* System-wide configuration: `/etc/zebrad/` +* User-specific configuration: `$XDG_CONFIG_HOME/zebrad/` +* System-wide data (read-only, shared): `/usr/share/zebrad/` (e.g., checkpoints) +* User-specific data: `$XDG_DATA_HOME/zebrad/` +* Cache: `$XDG_CACHE_HOME/zebrad/` +* State: `$XDG_STATE_HOME/zebrad/` +* Runtime: `$XDG_RUNTIME_DIR/zebrad/` +* Binary: `/opt/zebra/bin/zebrad` (system-wide) or `$HOME/.local/bin/zebrad` (user-specific) + +## Pros and Cons of the Options + +### FHS + +* **Pros:** 
+ * Traditional and well-understood by system administrators. + * Clear separation of configuration, data, cache, and binaries. + * Suitable for packaged software installations. + +* **Cons:** + * Less user-friendly; requires root access to modify configuration. + * Can conflict with stricter cloud environments restricting writes to `/etc` and `/var`. + * Doesn't handle multi-user scenarios as gracefully as XDG. + +### XDG Base Directory Specification + +* **Pros:** + * User-centric: configuration and data stored in user-writable locations. + * Better suited for containerized and cloud environments. + * Handles multi-user scenarios gracefully. + * Clear separation of configuration, data, cache, and state. + +* **Cons:** + * Less traditional; might be unfamiliar to some system administrators. + * Requires environment variables to be set correctly. + * Binary placement less standardized. + +### Hybrid Approach (FHS for System-Wide, XDG for User-Specific) + +* **Pros:** + * Combines strengths of FHS and XDG. + * Allows system-wide defaults while prioritizing user-specific configurations. + * Flexible and adaptable to different deployment scenarios. + * Clear binary placement in `/opt`. + +* **Cons:** + * More complex than either FHS or XDG alone. + * Requires careful consideration of precedence rules. 
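
To make the precedence concern concrete, here is one possible lookup order for the hybrid approach, sketched in shell. It is an assumption for discussion, not a decided behavior: the `resolve_config` helper, the `zebrad.toml` file name, and the `ZEBRA_SYSTEM_CONFIG_DIR` override are hypothetical and exist only to keep the sketch testable.

```sh
#!/bin/sh
# Print the first configuration file found, preferring the user-specific
# XDG location over the system-wide FHS location.
# Hypothetical names: resolve_config, zebrad.toml, ZEBRA_SYSTEM_CONFIG_DIR.
resolve_config() {
    # Per the XDG spec, an unset or empty XDG_CONFIG_HOME means $HOME/.config.
    user_dir="${XDG_CONFIG_HOME:-$HOME/.config}/zebrad"
    system_dir="${ZEBRA_SYSTEM_CONFIG_DIR:-/etc/zebrad}"
    for dir in "$user_dir" "$system_dir"; do
        if [ -f "$dir/zebrad.toml" ]; then
            printf '%s\n' "$dir/zebrad.toml"
            return 0
        fi
    done
    # No configuration file in either location.
    return 1
}
```

A caller could run `config_path=$(resolve_config) || echo "no config found"`. Whether user-specific files should fully replace or merely extend system-wide defaults is exactly the kind of rule this option would need to settle.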
+ +## Decision Outcome + +Pending + +## Expected Consequences + +Pending + +## More Information + +* [Filesystem Hierarchy Standard (FHS) v3.0](https://refspecs.linuxfoundation.org/FHS_3.0/fhs-3.0.html) +* [XDG Base Directory Specification](https://specifications.freedesktop.org/basedir-spec/latest/) +* [Zebra Issue #3432: Use the Filesystem Hierarchy Standard (FHS) for deployments and artifacts](https://github.com/ZcashFoundation/zebra/issues/3432) +* [Google Container-Optimized OS: Working with the File System](https://cloud.google.com/container-optimized-os/docs/concepts/disks-and-filesystem#working_with_the_file_system) diff --git a/docs/decisions/template.md b/docs/decisions/template.md new file mode 100644 index 00000000000..8b1b61f2e09 --- /dev/null +++ b/docs/decisions/template.md @@ -0,0 +1,49 @@ +--- +# status and date are the only required elements. Feel free to remove the rest. +status: {[proposed | rejected | accepted | deprecated | … | superseded by [ADR-NAME](adr-file-name.md)]} +date: {YYYY-MM-DD when the decision was last updated} +builds-on: {[Short Title](2021-05-15-short-title.md)} +story: {description or link to contextual issue} +--- + +# {short title of solved problem and solution} + +## Context & Problem Statement + +{2-3 sentences explaining the problem and the forces influencing the decision.} + + +## Priorities & Constraints + +* {List of concerns or constraints} +* {Factors influencing the decision} + +## Considered Options + +* Option 1: Thing +* Option 2: Another + +### Pros and Cons of the Options + +#### Option 1: {Brief description} + +* Good, because {reason} +* Bad, because {reason} + +## Decision Outcome + +Chosen option: [Option 1: Thing] + +{Clearly state the chosen option and provide justification. Reference the "Pros and Cons of the Options" section above if applicable.} + +### Expected Consequences + +* List of outcomes resulting from this decision + + +## More Information + + + + +