revisit sops handling; handle binary case better #24
Background of this PR
SOPS currently supports the following input and output types (both for encryption and decryption):
json
yaml
dotenv
binary
(where the input type is defaulted by file extension, and the output type is defaulted by the input type, if unspecified).
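The defaulting rule above can be sketched in Go as follows. This is a minimal illustration, not SOPS's actual code: the function names `defaultInputType` and `defaultOutputType` and the exact extension table are assumptions made for this example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// defaultInputType guesses the input type from the file extension and falls
// back to "binary". The extension table is an assumption for this sketch.
func defaultInputType(path string) string {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".json":
		return "json"
	case ".yaml", ".yml":
		return "yaml"
	case ".env":
		return "dotenv"
	default:
		return "binary"
	}
}

// defaultOutputType returns the explicitly requested output type, or the
// input type if none was requested.
func defaultOutputType(inputType, requested string) string {
	if requested != "" {
		return requested
	}
	return inputType
}

func main() {
	for _, p := range []string{"secrets.yaml", "config.json", "app.env", "id_rsa"} {
		in := defaultInputType(p)
		fmt.Printf("%s: input=%s output=%s\n", p, in, defaultOutputType(in, ""))
	}
}
```

A file without a recognised extension, such as `id_rsa` here, ends up as `binary` for both input and output unless a type is given explicitly.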
Encryption
For the first three input types, SOPS tries to preserve the structure of the original content and encrypts only the leaf nodes of the content (configurable with various switches, such as `--encrypted-regex`, ...). For the `binary` input type, the whole input content is encrypted, and SOPS produces a data object that holds only one key, `data`, whose value is the encrypted input content. That data object is then serialized to the specified output format (the default is `binary`, which means that it produces JSON). Note that, in general, there seems to be no difference between the `json` and `binary` output formats.
Decryption
Similar to the encryption case, SOPS defaults the input type (the type of the encrypted content) by file extension; if that doesn't work, it assumes the `binary` format for input, which is decoding-wise the same as `json`. The output type is defaulted by the input type, if not specified. If decryption succeeds with the assumed input type, SOPS formats the output with the assumed output format. If that happens to be `binary`, SOPS assumes that the decrypted object has a key called `data`, and returns only that key's content instead of the whole object.
Problem
The `binary` input type is where the trouble starts for us. The decryption implementation in this repository (which is borrowed from flux) does not rely on file extensions to determine the input type; instead, it uses auto-detection, checking for the presence of certain magic marker bytes. Because of what SOPS output looks like, the current auto-detection code cannot distinguish between encrypted content that was produced through a structured (`json`, `yaml`, `dotenv`) encryption and content produced through a `binary` encryption.
Assume the following extreme example: a binary original file was encrypted by SOPS, and is now stored in the following encrypted form (without meaningful file extension):
The current auto-detection is then not able to decide whether that is `binary` encrypted content, or whether the original was a JSON document having just one string-valued key named `data`. Depending on which is assumed, the decryption returns either the whole decrypted object (including the `data` key) or just the value of the `data` key.
But even if there are different top-level keys in the encrypted object (which actually indicates that the source was not `binary` encrypted), the current implementation would not detect that, and might try to decrypt it as `binary` (which would fail). Or, in the other direction, it might mistakenly recognise `binary` encrypted content as `json`, and return the whole object (not stripping the `data` key).
Unfortunately, the current implementation is not even predictable, because it depends on the order of looping through a Go map.
Solution proposed by the PR
Since, to our understanding, it is actually impossible to decide whether a given SOPS encrypted object was encrypted as binary or not, we make the following assumption (which should be valid in all (at least our) cases):