List of remarks of current draft 04 #95
Comments
For item 2, do you have an application example (or two) that does that?
Well, Raymond Chen made a blog post on it: https://devblogs.microsoft.com/oldnewthing/20220928-00/. The editor HxD (download here: https://mh-nexus.de/en/hxd/) has a Data Inspector that supports GUIDs (and not UUIDs). Imagine having the following binary data:

```
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
```

Normally, when reading this data, we use a big-endian representation format, so the UUID will have the following representation:

```
00112233-4455-6677-8899-AABBCCDDEEFF
```

However, Microsoft has a GUID struct:

```c
struct GUID
{
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};
```

The 128-bit data is still read the same, but the representation format is different. A GUID will print the integer fields little-endian using the struct above, so the representation becomes:

```
{33221100-5544-7766-8899-AABBCCDDEEFF}
```

Maybe that is also why there is a difference between the names UUID and GUID: to distinguish the difference in encoding? I don't know, but it sounds reasonable to me. (I think I will do a big rewrite of the Wikipedia article soon.)
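A minimal sketch of my own (not part of the blog post or the thread) showing both renderings of the same 16 bytes; it assumes a little-endian host such as x86:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

struct GUID {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};

int main() {
    const uint8_t bytes[16] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
                               0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF};

    // UUID representation: read the bytes big-endian, left to right.
    printf("UUID: ");
    for (int i = 0; i < 16; ++i) {
        printf("%02X", bytes[i]);
        if (i == 3 || i == 5 || i == 7 || i == 9) printf("-");
    }
    printf("\n");

    // GUID representation: overlay the struct; on a little-endian CPU the
    // three integer fields come out byte-swapped.
    GUID g;
    memcpy(&g, bytes, sizeof g);
    printf("GUID: {%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}\n",
           g.Data1, g.Data2, g.Data3,
           g.Data4[0], g.Data4[1], g.Data4[2], g.Data4[3],
           g.Data4[4], g.Data4[5], g.Data4[6], g.Data4[7]);
    // On a little-endian machine this prints:
    //   UUID: 00112233-4455-6677-8899-AABBCCDDEEFF
    //   GUID: {33221100-5544-7766-8899-AABBCCDDEEFF}
    return 0;
}
```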
Thanks. I think I can add a bullet to the ones I added last update, stating that Microsoft's implementation of GUIDs leverages little-endian encoding, cite that MS post, but also generally discourage that practice while we are at it.
I think it isn't that simple to discourage a UUID format that is used by one of the biggest tech companies in the world. I will try to investigate the problem a little bit more, so we can describe the situation in the best way possible.
I did some testing on GUID in C++:

```cpp
#include <objbase.h>   // CLSIDFromString; link with ole32
#include <fstream>
#include <cstdio>
#include <cstring>

CLSID StringToGUID() {
    CLSID clsid;
    const wchar_t* clsid_str = L"{5C98B8E4-2B3D-12B1-ADE2-0000F86456B2}";
    CLSIDFromString(clsid_str, &clsid);
    return clsid;
}

int main() {
    CLSID cls = StringToGUID();
    char data[16];
    memcpy(&data, &cls, 16);  // raw in-memory bytes of the GUID

    std::ofstream myfile;
    myfile.open("example.dat");
    for (int i = 0; i < 16; ++i) {
        char c[3];  // two hex digits plus the terminating NUL
        sprintf(c, "%02X", data[i] & 0xFF);
        myfile << c;
    }
    myfile.close();
    return 0;
}
```

When viewing the content of `example.dat`, it reads `E4B8985C3D2BB112ADE20000F86456B2`. Comparing it with the input string, it is definitely little-endian. Note that only `Data1`, `Data2` and `Data3` are flipped; the `Data4` bytes keep their order.

Imagine having a UUID with the bytes `5C 98 B8 E4 2B 3D 12 B1 AD E2 00 00 F8 64 56 B2`; read big-endian, its representation is `5C98B8E4-2B3D-12B1-ADE2-0000F86456B2`. However, when going to GUID, the same sequence of bytes in memory will behave differently, because little-endian is used per field. Actually, if you look very specifically at the whole thing, it is not the representation format that behaves like it is little-endian, but the binary format: the GUID prints the same as the UUID, while its bytes in memory are swapped within each of the first three fields. Note that GUID uses a struct for chunking the 128 bits, where a UUID is just a plain big-endian byte sequence. I will also make a table explaining it in some way.
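To reproduce this without the Win32 API, here is a portable sketch of my own that mimics what `CLSIDFromString` plus `memcpy` do; the `hex` helper is mine, not part of any library:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

struct GUID {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};

// Read `n` hex digits from `s` and advance the pointer.
static uint64_t hex(const char*& s, int n) {
    char buf[17] = {0};
    memcpy(buf, s, n);
    s += n;
    return strtoull(buf, nullptr, 16);
}

int main() {
    const char* s = "5C98B8E4-2B3D-12B1-ADE2-0000F86456B2";
    GUID g;
    g.Data1 = static_cast<uint32_t>(hex(s, 8)); ++s;  // skip '-'
    g.Data2 = static_cast<uint16_t>(hex(s, 4)); ++s;
    g.Data3 = static_cast<uint16_t>(hex(s, 4)); ++s;
    for (int i = 0; i < 8; ++i) {
        if (*s == '-') ++s;
        g.Data4[i] = static_cast<uint8_t>(hex(s, 2));
    }

    // Dump the raw struct bytes, exactly like the memcpy in the test above.
    unsigned char raw[16];
    memcpy(raw, &g, 16);
    for (int i = 0; i < 16; ++i) printf("%02X", raw[i]);
    printf("\n");
    // On a little-endian machine this prints:
    //   E4B8985C3D2BB112ADE20000F86456B2
    return 0;
}
```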
So if you have a struct like this:

```c
struct GUID
{
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};
```

then on a big-endian machine you DON'T HAVE TO flip any bytes when encoding to and decoding from the representation format and the wire format; the struct's memory layout already matches the UUID byte order.
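As an illustration of that point (a sketch of my own, assuming the struct above): writing each field most-significant-byte first yields the UUID wire format on any CPU, and on a big-endian machine those writes coincide with the struct's in-memory bytes, so no flipping happens.

```cpp
#include <cstdint>
#include <cstdio>

struct GUID {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};

// Encode a GUID to the 16-byte big-endian (UUID) wire format,
// independent of host endianness.
void guid_to_wire(const GUID& g, uint8_t out[16]) {
    out[0] = uint8_t(g.Data1 >> 24); out[1] = uint8_t(g.Data1 >> 16);
    out[2] = uint8_t(g.Data1 >> 8);  out[3] = uint8_t(g.Data1);
    out[4] = uint8_t(g.Data2 >> 8);  out[5] = uint8_t(g.Data2);
    out[6] = uint8_t(g.Data3 >> 8);  out[7] = uint8_t(g.Data3);
    for (int i = 0; i < 8; ++i) out[8 + i] = g.Data4[i];
}

int main() {
    GUID g{0x00112233, 0x4455, 0x6677,
           {0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF}};
    uint8_t wire[16];
    guid_to_wire(g, wire);
    for (uint8_t b : wire) printf("%02X ", b);
    printf("\n");  // 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
    return 0;
}
```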
Let's take `5C98B8E4-2B3D-12B1-ADE2-0000F86456B2`.

**UUID**

Using https://www.uuidtools.com/decode, it tells us it is DCE variant and version 1. The UUID is saved in the same byte order as its representation:

```
5C 98 B8 E4 2B 3D 12 B1 AD E2 00 00 F8 64 56 B2
```

**GUID**

Using the GUID struct, note that we still have the same order as the representation format. The GUID is saved in little-endian, so every byte should be flipped per field as seen above:

```
Data1: 5C 98 B8 E4              ->  E4 B8 98 5C
Data2: 2B 3D                    ->  3D 2B
Data3: 12 B1                    ->  B1 12
Data4: AD E2 00 00 F8 64 56 B2  ->  (unchanged)
```

In the end, concatenate:

```
E4 B8 98 5C 3D 2B B1 12 AD E2 00 00 F8 64 56 B2
```

**Conclusion**

Maybe I have used too many words for it, but we are NOT talking about a new representation format, as I found out by now. We are talking about Microsoft having a different way of writing UUIDs to disk. It is not big-endian, like the normal UUID, but also not a fully 16-byte-flipped thing. It is something in between, because they use their own GUID struct for chunking. I propose to not add it as a new representation format; it simply isn't one, and I was wrong at the beginning of this issue about it. I propose we add a few short sentences about Microsoft using a slightly different way of writing the UUID to disk/memory, and further say it is out of scope of this specification to define how, because it actually has to do with Microsoft's variant, which is out of scope too. Something like this:
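To make the conclusion concrete, here is a small sketch of my own (the function name `flip_fields` is mine): converting between the UUID wire bytes and Microsoft's on-disk GUID bytes is just reversing the first three fields, and since reversing is its own inverse, the same function works in both directions.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Reverse the 4-, 2- and 2-byte fields in place; bytes 8..15 (Data4)
// stay as-is. Converts UUID wire order <-> GUID on-disk order.
void flip_fields(uint8_t b[16]) {
    std::reverse(b, b + 4);      // Data1
    std::reverse(b + 4, b + 6);  // Data2
    std::reverse(b + 6, b + 8);  // Data3
}

int main() {
    // UUID wire bytes for 5C98B8E4-2B3D-12B1-ADE2-0000F86456B2.
    uint8_t b[16] = {0x5C, 0x98, 0xB8, 0xE4, 0x2B, 0x3D, 0x12, 0xB1,
                     0xAD, 0xE2, 0x00, 0x00, 0xF8, 0x64, 0x56, 0xB2};
    flip_fields(b);  // now in GUID on-disk order
    for (uint8_t x : b) printf("%02X", x);
    printf("\n");  // E4B8985C3D2BB112ADE20000F86456B2
    return 0;
}
```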
Defining this different method will be done in the historical RFC that @kyzer-davis and I will try to make after the successor of RFC 4122 is published. |
Just reviewed. If I add the text, I will add an asterisk next to [Microsoft] in the first Section 4 bullet and then cite that text at the end of Section 4.
I think we should maybe only mention the fact that Microsoft's GUID has a slightly different binary output than the original UUID, but nothing more than that. The details will be in the historical RFC; the text I made previously (see the conclusion above) could be used for that.

What do you mean by adding an asterisk next to [Microsoft] in the first Section 4 bullet? (I assume you mean …)
Yeah, I was trying to tie it to MS so it flows nicely, but since we are talking about big-endian encoding in paragraph 2, it can work there. Let me whip up a PR real fast.
Sure. I already started to improve the Wikipedia article.
@ben221199, take a peek at 408c5a7. If that looks good, I will merge down and get this back to IETF under draft 05, closing this thread.
Seems good to me. I think that DCOM (Distributed Component Object Model) is better than COM (https://nl.wikipedia.org/wiki/Distributed_component_object_model), but I also see COM appearing in some docs. I see that Raymond uses …
As seen in #83, I made some remarks on draft 03. In this issue I will list things that are not fully handled by the current draft (04) and some additional things I have in mind:
- Imagine having the binary data `00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF`: the UUID will be written like `00112233-4455-6677-8899-AABBCCDDEEFF`, where the GUID will be written like `{33221100-5544-7766-8899-AABBCCDDEEFF}`. Except for the `{` and `}`, it is also visible that some parts are flipped. This is seen in HxD (a well-known hex editor), for example. I think it is worth noting that GUIDs have this effect.
- I think we still have to talk about the name of `Max UUID`. I understand the people that say "it is the maximum value". However, the makers of RFC 4122 went with "Nil UUID" when talking about the "minimum value". I think we should stay in the same jargon when naming things: either we rename "Nil UUID" to "Min UUID" or something, or we change "Max UUID" to "Omni UUID". I think we should go for the Nil/Omni pair rather than the Min/Max pair. For example, Min/Max doesn't make sense when looking at a UUID as a 128-bit SIGNED integer: with two's complement (https://en.wikipedia.org/wiki/Two%27s_complement), all bits set is not the maximum and no bits set is not the minimum (see the sketch after this list). I would rather go for the Nil/Omni pair:
  - Nil (https://en.wiktionary.org/wiki/nil#Latin) is an alternative form of "nihil" and means "nothing" in Latin. This fits, because of all the bits, NOTHING is set.
  - Omni (https://en.wiktionary.org/wiki/omni#Latin) is a declension of "omnis" and means "all" in Latin. This fits, because ALL bits are set.
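A tiny sketch of my own backing the two's-complement remark, scaled down from 128 to 8 bits: the all-bits-set pattern is -1 in a signed type, not the maximum.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    int8_t all_set = int8_t(0xFF);  // bit pattern 1111 1111
    printf("%d\n", all_set);        // prints -1, not the maximum
    printf("%d\n", INT8_MAX);       // the maximum is 127 = 0111 1111
    return 0;
}
```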