revisit sops handling; handle binary case better #24

Merged (2 commits) on Dec 9, 2024

@cbarbian-sap cbarbian-sap commented Dec 8, 2024

Background of this PR

SOPS currently supports the following input and output types (both for encryption and decryption):

  • json
  • yaml
  • dotenv
  • binary

(where the input type is defaulted by file extension, and the output type is defaulted by the input type, if unspecified).
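The defaulting rule above can be sketched roughly as follows. This is a simplified illustration, not SOPS's own code: the helper name `defaultFormat` and the exact extension list are assumptions made for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// defaultFormat is a hypothetical helper that picks a format from the file
// extension and falls back to "binary" when the extension is not recognised.
// The extension mapping is an assumption for illustration only.
func defaultFormat(path string) string {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".json":
		return "json"
	case ".yaml", ".yml":
		return "yaml"
	case ".env":
		return "dotenv"
	default:
		return "binary"
	}
}

func main() {
	for _, p := range []string{"secrets.yaml", "config.json", "app.env", "key.bin", "secret"} {
		fmt.Printf("%s -> %s\n", p, defaultFormat(p))
	}
}
```

A file without a meaningful extension therefore ends up being treated as binary, which is the case discussed in the rest of this description.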

Encryption

For the first three input types, SOPS tries to preserve the structure of the original content and encrypts only its leaf nodes (configurable with various switches, such as --encrypted-regex ...). For the binary input type, the whole input is encrypted, and SOPS produces a data object that holds a single key, data, whose value is the encrypted input. That data object is then serialized to the specified output format (the default is binary, which in practice produces JSON). Note that, in general, there seems to be no difference between the json and binary output formats.

Decryption

As in the encryption case, SOPS derives the default input type (the type of the encrypted content) from the file extension; if that fails, it assumes the binary format, which is decoded the same way as json. The output type defaults to the input type if not specified. If decryption succeeds with the assumed input type, SOPS formats the output in the assumed output format. If that happens to be binary, SOPS assumes that the decrypted object has a key called data and returns only that key's content instead of the whole object.
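The difference in the output stage can be sketched like this. It is a minimal illustration under stated assumptions: `renderDecrypted` is a hypothetical name, the decrypted tree is modelled as a plain Go map, and only the json and binary output formats are covered.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// renderDecrypted is a hypothetical helper modelling the output stage:
// for "binary" it returns only the value of the "data" key, for anything
// else it serializes the whole decrypted object as JSON.
func renderDecrypted(decrypted map[string]any, outputFormat string) ([]byte, error) {
	if outputFormat == "binary" {
		s, ok := decrypted["data"].(string)
		if !ok {
			return nil, errors.New(`binary output requires a string-valued "data" key`)
		}
		return []byte(s), nil
	}
	return json.Marshal(decrypted)
}

func main() {
	decrypted := map[string]any{"data": "hello"}

	asBinary, _ := renderDecrypted(decrypted, "binary")
	asJSON, _ := renderDecrypted(decrypted, "json")

	fmt.Printf("binary: %s\n", asBinary) // only the value of the data key
	fmt.Printf("json:   %s\n", asJSON)   // the whole object, including the data key
}
```

The same decrypted object thus yields two different results depending solely on the assumed output format.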

Problem

This is where the trouble with binary input starts for us. The decryption implementation in this repository (which is borrowed from flux) does not rely on file extensions to determine the input type; instead, it uses auto-detection that checks for certain magic marker bytes. Because of how SOPS output looks, the current auto-detection code cannot distinguish encrypted content produced by a structured encryption (json, yaml, dotenv) from content produced by a binary encryption.

Assume the following extreme example: a binary original file was encrypted by SOPS, and is now stored in the following encrypted form (without meaningful file extension):

{
  "data": "ENC[...",
  "sops": {

  }
}

The current auto-detection then cannot decide whether this is binary-encrypted content or whether the original was a JSON document with a single string-valued key named data. Depending on which is assumed, decryption returns either the whole decrypted object (including the data key) or just the value of the data key.
Even if the encrypted object has other top-level keys (which indicates that the source was not binary encrypted), the current implementation does not detect that and might try to decrypt it as binary (which would fail). Conversely, it might mistake binary-encrypted content for json and return the whole object (without stripping the data key).
Worse, the current implementation is not even predictable, because its result depends on the iteration order of a Go map.
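To illustrate why a decision based on map iteration is unpredictable, here is a hypothetical sketch (it is not the code from this repository). Go deliberately does not define the iteration order of a map, so `firstNonSopsKey` may return a different key from one run to the next, whereas `nonSopsKeys` sorts the keys and is stable. Both function names are made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// firstNonSopsKey mimics an order-dependent decision: it returns whichever
// key other than "sops" the range loop happens to visit first.
func firstNonSopsKey(doc map[string]any) string {
	for k := range doc {
		if k != "sops" {
			return k
		}
	}
	return ""
}

// nonSopsKeys collects all keys except "sops" and sorts them, so the result
// does not depend on map iteration order.
func nonSopsKeys(doc map[string]any) []string {
	keys := make([]string, 0, len(doc))
	for k := range doc {
		if k != "sops" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	doc := map[string]any{
		"data":  "ENC[...]",
		"other": "ENC[...]",
		"sops":  map[string]any{},
	}
	fmt.Println(firstNonSopsKey(doc)) // "data" or "other"; may vary between runs
	fmt.Println(nonSopsKeys(doc))     // always [data other]
}
```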

Solution proposed by the PR

Since, to our understanding, it is actually impossible to decide whether a given SOPS encrypted object was encrypted as binary or not, we make the following assumption (which should be valid in all cases, or at least in all of ours):

Whenever a SOPS encrypted object has exactly one key besides the sops key, and that key is named data and is string-valued, we assume that the original was binary-encrypted. This means that, after decrypting, we return only the value of that data key.
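A minimal sketch of this rule for JSON-encoded input is shown below. It is an illustration of the assumption, not the code merged in this PR: the function name `looksBinaryEncrypted` is hypothetical, and treating a document without a sops key as "not binary" is an additional assumption of the sketch.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// looksBinaryEncrypted reports whether a SOPS-encrypted JSON document has
// exactly one key besides "sops", named "data", holding a JSON string.
func looksBinaryEncrypted(raw []byte) (bool, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(raw, &doc); err != nil {
		return false, err
	}
	// Assumption of this sketch: without a "sops" key it is not SOPS output.
	if _, ok := doc["sops"]; !ok {
		return false, nil
	}
	// Exactly two top-level keys: "sops" and one other.
	if len(doc) != 2 {
		return false, nil
	}
	value, ok := doc["data"]
	if !ok {
		return false, nil
	}
	// A JSON string always starts with a double quote; this also rejects null,
	// numbers, objects and arrays.
	value = bytes.TrimSpace(value)
	return len(value) > 0 && value[0] == '"', nil
}

func main() {
	inputs := []string{
		`{"data": "ENC[...]", "sops": {}}`,
		`{"data": "ENC[...]", "other": "ENC[...]", "sops": {}}`,
		`{"data": {"nested": "ENC[...]"}, "sops": {}}`,
	}
	for _, in := range inputs {
		isBinary, err := looksBinaryEncrypted([]byte(in))
		fmt.Println(isBinary, err)
	}
}
```

Only the first input matches the rule; the other two would be returned as whole objects after decryption. A JSON original that really consisted of a single string-valued data key would still be treated as binary, which is the trade-off accepted above.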

@cbarbian-sap cbarbian-sap merged commit 77f5586 into main Dec 9, 2024
7 checks passed
@cbarbian-sap cbarbian-sap deleted the revisit-sops branch December 9, 2024 15:28