Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add standard BNF for 256-bit primary key. #26

Merged
merged 17 commits into from
Jan 14, 2025

Conversation

michaelcfanning
Copy link
Member

@michaelcfanning michaelcfanning commented Jan 10, 2025

This change adds a spare markdown standard BNF specification for CASK keys.

The markdown additionally describes how to dereference/decompose an encoded form of a key or its bytewise (decoded) form to obtain constituent elements.

This change also adds a naive test helper that breaks down a 3-byte sequence into 4 easily obtained 6-bit values (rendered as bytes). It is not clear this code should be productized. This code is tested by a brute force unit tests that takes over 30 seconds on .NET framework to run (will be updated tomorrow to resolve this issue).

|-|-|-|-|-|
|key[0] - key[31]|0 - 256|0x0 - 0xFF|00000000b - 11111111b|256 bits of random data produced by a cryptographically secure RNG|
|key[32]|0d|0xFF|00000000b| A reserved byte to enforce 3-byte alignment.
|[optional 3-bye sequences]|0 - 256|0x0 - 0xFF|00000000b - 11111111b|Provider-defined data of arbitrary form.
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byte

0 - 256255 #Resolved

Comment on lines 34 to 36
<key-type> ::= <256-bit-key> |

<256-bit-key> ::= 'A' | <256-bit-hash> | <384-bit-hash>
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should <256-bit-hash> and <384-bit-hash> be alternatives in the <key-type> definition rather than in <256-bit-key>? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does hash feature in the definition of primary keys?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes, missed this, the hash stuff would only be relevant in an uber BNF. And maybe we should just have that? I think so. The only distinction between the key scenario and HMAC is that for the HMAC we have 12 bytes of c3id that immediately follow the randomized data. That's it. If we split the definitions, we'll have lots of duplication and have to make trivial fixes in two places, always a recipe for disaster.

|key[32]|0d|0xFF|00000000b| A reserved byte to enforce 3-byte alignment.
|[optional 3-bye sequences]|0 - 256|0x0 - 0xFF|00000000b - 11111111b|Provider-defined data of arbitrary form.
|key[keyLength - 6]|0|0xFF|00000000b - 11111100b| (KeyKind)key[keyLength - 6] >> 2 (leading 6 bits comprises kind enum + two bits of version number). Currently, [keyLength - 6] & 0xFC == 0 (as these bits are undefined in the version data)
|key[key.Length - 5]|0|0xFF|00000000b - 11110000b| (CaskVersion)([key.Length - 6] & 0xFC << 4) & key[key.Length - 5] >> 4) (leading 4 bits comprise final 4 bits of version + 4 bits of zero padding).
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numbers in different bases don't match.

key[key.Length - 6] & 0xFC << 4 2

Choose either keyLength or key.Length, and use it consistently. #Resolved

|[optional 3-bye sequences]|0 - 256|0x0 - 0xFF|00000000b - 11111111b|Provider-defined data of arbitrary form.
|key[keyLength - 6]|0|0xFF|00000000b - 11111100b| (KeyKind)key[keyLength - 6] >> 2 (leading 6 bits comprises kind enum + two bits of version number). Currently, [keyLength - 6] & 0xFC == 0 (as these bits are undefined in the version data)
|key[key.Length - 5]|0|0xFF|00000000b - 11110000b| (CaskVersion)([key.Length - 6] & 0xFC << 4) & key[key.Length - 5] >> 4) (leading 4 bits comprise final 4 bits of version + 4 bits of zero padding).
|key[keyLength - 4] - key[keyLength - 1]|0 - 256|0x0 - 0xFF|00000000b - 11111111b|CRC32(key[0] - key[key.Length - 5])
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

255 again.

Using - for subtraction and for ranges is confusing. Consider ellipsis for ranges. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and a good note re: ellipses. Thanks.


<payload-data> ::= <random-data> <reserved> [<optional-fields>] <cask-signature> <provider-id> <timestamp> <key-type> <version>

<random-data> ::= 42 * <base64url> <base64-two-zeros-suffix>; The total random data comprises 256 bits encoded as 42
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space before comment. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes good eye

<payload-data> ::= <random-data> <reserved> [<optional-fields>] <cask-signature> <provider-id> <timestamp> <key-type> <version>

<random-data> ::= 42 * <base64url> <base64-two-zeros-suffix>; The total random data comprises 256 bits encoded as 42
; characters x 6 bit bits of random data = 252 bits and
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: line up semicolons. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

<optional-fields> ::= { <optional-field> } ; Zero or more 4-character (24 bit) sequences of optional data.

<optional-field> ::= 4 * <base64url> ; Each optional field is 4 characters (24 bits). This keeps
; data cleanly aligned along 3-byte/4-encoded character boundaries
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: line up semicolons. #Resolved

Comment on lines 34 to 36
<key-type> ::= <256-bit-key> |

<256-bit-key> ::= 'A' | <256-bit-hash> | <384-bit-hash>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does hash feature in the definition of primary keys?

/// Specifies a CASK hashed signature that incorporates a Keyed Hash
/// Message Authenticode of a derivation input generated by SHA-256
/// (HMAC-SHA-256) along with other CASK data such as the generated
/// C3ID of the CASK key used to initialize the HMAC instance.
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid referring to HMAC 'instance'. I think this is an implementation detail of legacy .NET crypto API. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted this text in preference of your concise suggestion.

Bytes = bytes.ToArray();
}

public byte[] Bytes { get; }
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backing storage here could be int so that the data is inlined into the struct and this bit twiddling can be done without heap allocation. #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this had occurred to me. The int conversion is pesky since the byte-wise ordering (if the value is manipulated and needs to be reflected again as bytes) differs according to endianness. But with this as they are here, the data is immutable.

I took this initial approach as well because I want to see just how hard it is to dig around in the bytes if you are a key producer who is trying to leverage the readability of the encoded key (which is a primary value we're encouraging).


public byte[] Bytes { get; }

public string Encoded => Base64Url.EncodeToString(Bytes);
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid properties that allocate on every call. I suggest string Encode() method. The core API shouldn't use this and only allocate a single, complete key/hash string at the very end. If it's not a property, it will stick out more as an expense that can be avoided. #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk about this one some more, want to get your thoughts. For now, I just made us accept the cost of generating this on struct creation.

/// </summary>
OneZeroZero,

Reserved = 0xfe,
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reserved should usually be avoided in enums. Is this API meant to stay internal?

We might consider putting these masks in separate constants if the enums will be made public. #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, next PR>

|Byte Range|Decimal|Hex|Binary|Description|
|-|-|-|-|-|
|key[0] - key[31]|0 - 256|0x0 - 0xFF|00000000b - 11111111b|256 bits of random data produced by a cryptographically secure RNG|
|key[32]|0d|0xFF|00000000b| A reserved byte to enforce 3-byte alignment.
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had said we'd fill the padding byte with random, I think, on our last call. I am just curious, did you change your mind? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk, relates to a necessary conversation around how to specify optional bytes. So, I was thinking, perhaps we allow for a single API that specifies your optional arguments. This data would be expressed as bytes. If you passed a single byte, and we had a free byte at the end of your random data, we'd use it. If you passed two bytes, and there were two free bytes of padding at the end of the random component (as there are for 16 and 64 byte allocations), we'd stuff your data there. In cases where the length of your optional data did not permit stuffing it in padding, and in cases where the user simply requested at least 3 bytes, we'd allocate a full 3-byte optional chunk. Any padding that might exist at the end of randomized data region for these cases could be zeroed out or could contain additional entropy.

Just thinking aloud. This is all contrary to our current API which requires specifying the optional data a string, which has advantages (in that it emphasizes we are looking for the encoded interpretability of this data). Following that model, if users provided a single character, and we had a single padding byte, we'd use that space. If the user provided a 2-char value and we had at least two bytes padding, we'd take that slot. If a user provided 3-chars, we'd pad this data with a leading 'A' and use a 3-byte sequence. For 4-byte char, a standard 3-byte sequence.


<cask-signature> ::= 'JQQJ' ; Fixed signature identifying the CASK key

<provider-id> ::= 4 * <base64url> ; Provider identifier (24 bits)
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called provider providerSignature in the API, should we change that to providerId or this to provider-signature? #ByDesign

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think? one reminder as we declare a formal naming discussion, this data is likely to be decomposed into two smaller components, a reserved company id and a per-company provider id. MS would likely reserve MS and AZ and require all these identifiers to be prefixed with this company reserved component. this is why i switched to id mentally but honestly that leap was a very tenuous one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, id is better.

Comment on lines 58 to 61
|Byte Range|Decimal|Hex|Binary|Description|
|-|-|-|-|-|
|decodedKey[0] - key[31]|0..255|0x0..0xFF|00000000b..11111111b|256 bits of random data produced by a cryptographically secure RNG|
|decodedKey[32]|0|0xFF|00000000b| A reserved byte to enforce 3-byte alignment. This data can be repurposed by providers (particularly the final base64 encoded character).
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to have "this data can be repurposed" in the spec.

#Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's talk about this tomorrow. we have this spare data here which allows a provider to inject a sub-key kind or some other data, possible to encode 64 values. i have observed in other applications of this technique that it is very common that this is all that a provider needs of optional data. if we give them this free space, they don't need to commit to a full 3-byte optional sequence.

note that we can play the same trick with a 16 byte or 64-byte key, each of which will carry along 2 bytes of free padding. A 384 bit key falls on a clean 3-byte boundary and there can be no cuteness for this key length: you'd need to commit to a full 3 additional bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's talk. Writing down my thoughts in advance so I don't forget.

If something is unused now, I would like the spec to be maximally conservative and say only that it is reserved and must be zeroed out. If we have ideas on how me might use the area in the future, I'd track them elsewhere.

We can also explore and fully commit to this idea now. I think we would have to remove the alignment requirement on providerData arg and instead pad it as necessary. I don't think we can expose callers to choosing the storage for two reasons. 1. It would make the API more complex. 2. We need to know how to copy providerData from a 256-bit key into a 384-bit hash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so see my complex note elsewhere on how the API could function. The behavior is easily described but is it clean/simple/desirable? Dunno. For now, I am going to take your advice and required this padding to be zeroed out unless/until we decide otherwise. I think we certainly need more clarity on our handling of optional data in the API (currently required as an optional string, which I do tend to think is a good idea).

|encodedKey[43] | 'A'..'_' | 6 bits of provider reserved data. This character should be reserved for stable, readable information, e.g., a provider-specific key kind.
|encodedKey[^20..^16]|'JQQJ'| Fixed CASK signature.
|encodedKey[^16]|'A'..'Z'\|'a'..'z'| The first character of the provider signature which must be alphabetic (upper-case indicating a customer-managed as opposed to service managed secret).
|encodedKey[^15..^12]|('A'..'Z-\_)\|('a'..'z-\_)| | The remaining three encoded characters. Any alphabetic characters must be consistently upper- or lower-case (to distinguish customer- vs. service-managed secrets).
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quotation marks don't seem to pair up. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks. not sure how best to render this, using regex format which is non-obvious. lots of syntax conventions mixed in here between the C# range stuff, some BNF conventions and regex char ranges. all suggestions for making this more consistent are welcome.

|encodedKey[^15..^12]|('A'..'Z-\_)\|('a'..'z-\_)| | The remaining three encoded characters. Any alphabetic characters must be consistently upper- or lower-case (to distinguish customer- vs. service-managed secrets).
|encodedKey[^12]|'A'..'_'|Represents the year of key allocation, 'A' (2024) to '_' (2087)|
|encodedKey[^13]|'A'..'L'|Represents the month of key allocation, 'A' (January) to 'L' (December)|
|encodedKey[^14]|'A'..'Z'\|'a'..'e'|Represents the day of key allocation, 'A' (0) to 'f' (31)|
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'e' or 'f'? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nuts, incomplete bug fix from previous incorrectness, thank you.

Comment on lines 63 to 64
|decodedKey[^15..^12]| 37, 4, 9 |0x25, 0x04, 0x09| 00100101b, 00000100b, 00001001b | Decoded 'JQQJ' signature.
|decodedKey[^12..^9]|0..255|0x0..0xFF|00000000b..11111111b| Provider identifier, e.g. , '0x4c', '0x44', '0x93' (base64 encoded as 'TEST')
Copy link

@KalleOlaviNiemitalo KalleOlaviNiemitalo Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing that .. means a half-open range in decodedKey[^12..^9] but a closed range in 0..255. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I tend to agree, I switched to three dots for the ranges. I wonder if we should embed the actual ellipsis character.

I note that I'm not using the C# syntax for binary literals will do that in a follow-on change.

|decodedKey[^9..^]||||Time stamp data encoded in 4 six-bit blocks for YMDH.
|decodedKey[^6] >> 2|0, 7, or 8|0xFF|00000000b..11111100b| Leading 6 bits comprises kind enum followed by 2 bits of reserved padding. key[key.Length - 6] & 0xFC == 0.
|decodedKey[^5] >> 4]|0|0xFF|00000000b..11110000b| Leading 4 bits comprise 4 bits of reserved version information + 4 bits of zero padding (to preserve consistent rendering of the subsequence checksum data).
|decodedKey[..^4]|0 - 255|0x0..0xFF|00000000b..11111111b|CRC32(key[0] - key[key.Length - 5])
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|decodedKey[..^4]|0 - 255|0x0..0xFF|00000000b..11111111b|CRC32(key[0] - key[key.Length - 5])
|decodedKey[^4..]|0 - 255|0x0..0xFF|00000000b..11111111b|CRC32(key[..^4])
``` #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nuts, thanks.

|[optional 3-byte sequences]|0..255|0x0..0xFF|00000000b..11111111b|Provider-defined data of arbitrary interpretation.
|decodedKey[^15..^12]| 37, 4, 9 |0x25, 0x04, 0x09| 00100101b, 00000100b, 00001001b | Decoded 'JQQJ' signature.
|decodedKey[^12..^9]|0..255|0x0..0xFF|00000000b..11111111b| Provider identifier, e.g. , '0x4c', '0x44', '0x93' (base64 encoded as 'TEST')
|decodedKey[^9..^]||||Time stamp data encoded in 4 six-bit blocks for YMDH.
Copy link
Contributor

@nguerrera nguerrera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|decodedKey[^9..^]||||Time stamp data encoded in 4 six-bit blocks for YMDH.
|decodedKey[^9..^6]||||Time stamp data encoded in 4 six-bit blocks for YMDH.
``` #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops yes.

Copy link
Contributor

@nguerrera nguerrera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, any remaining issues will be handled in follow-up, so feel free to merge when green.

@michaelcfanning
Copy link
Member Author

At this point, I've done enough tinkering that we definitely should reset and scan the end result in detail. HMAC coming next. btw - does this stuff live in devdocs? or docs?


In reply to: 2550418565

|encodedKey[^12]|'A'...'_'|Represents the year of key allocation, 'A' (2024) to '_' (2087)|
|encodedKey[^13]|'A'...'L'|Represents the month of key allocation, 'A' (January) to 'L' (December)|
|encodedKey[^14]|'A'...'Z'\|'a'..'e'|Represents the day of key allocation, 'A' (0) to 'e' (31)|
|encodedKey[^15]|'A'...'X'|Represents the hour of key allocation, 'A' (hour 0 or midnight) to 'X' (hour 23). [TBD: This value could be used for half hour increments instead].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still missing things after timestamp. Also, the indices look off ^15..^12 is both part of the provider signature and the timestamp.

@KalleOlaviNiemitalo
Copy link

does this stuff live in devdocs? or docs?

I expect docs if it is a specification for the CASK formats so that they can be implemented without using your libraries; devdocs if it's only useful for maintaining your libraries.

@nguerrera
Copy link
Contributor

Yes, definitely docs for this change. devdocs for stuff about working in this repo, and TODOs, etc.

@michaelcfanning michaelcfanning merged commit 01ac458 into main Jan 14, 2025
8 checks passed
@michaelcfanning michaelcfanning deleted the primary-key-standard-bnf branch January 14, 2025 20:18
@michaelcfanning
Copy link
Member Author

Shouldn't the pseudocode for generating the keys/hmacs be in docs by this definition? this information relates directly to our standard, not anything specific to working in this repo.


In reply to: 2590905374

@nguerrera
Copy link
Contributor

Probably, yes. But it needs to be cleaned up. There are things out-of-date and I don't want any mentions of objects/instances that were specific to the first implementation and not part of any abstract description of the algorithms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants