Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Endianness Enum and Extensions for BinaryReader an BinaryWriter #26904

Open
deinok opened this issue Jul 23, 2018 · 26 comments
Open

Proposal: Endianness Enum and Extensions for BinaryReader an BinaryWriter #26904

deinok opened this issue Jul 23, 2018 · 26 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Milestone

Comments

@deinok
Copy link

deinok commented Jul 23, 2018

Proposal

Add an enum that handles the endiannes

	/// <summary>
	/// Endianness refers to the sequential order in which bytes are arranged into larger numerical values when stored in memory or when transmitted over digital links.
	/// </summary>
	public enum Endianness {

		/// <summary>
		/// The least significant byte (LSB) value, is at the lowest address. 
		/// </summary>
		LittleEndian,

		/// <summary>
		/// The most significant byte (MSB) value, is at the lowest address.
		/// </summary>
		BigEndian

	}

Also add extensions to BinaryReader like:

public static ushort ReadUInt16(this BinaryReader binaryReader, Endianness endianness);
public static short ReadInt16(this BinaryReader binaryReader, Endianness endianness);
public static uint ReadUInt32(this BinaryReader binaryReader, Endianness endianness);
public static int ReadInt32(this BinaryReader binaryReader, Endianness endianness);
public static ulong ReadUInt64(this BinaryReader binaryReader, Endianness endianness);
public static long ReadInt64(this BinaryReader binaryReader, Endianness endianness);
public static float ReadFloat(this BinaryReader binaryReader, Endianness endianness);
public static double ReadDouble(this BinaryReader binaryReader, Endianness endianness);

And for BinaryWriter:

public static void Write(this BinaryWriter binaryWriter, ushort value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, short value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, uint value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, int value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, ulong value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, long value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, float value, Endianness endianness);
public static void Write(this BinaryWriter binaryWriter, double value, Endianness endianness);

Rationale

BinaryReader and BinaryWriter works really well for accessing Streams but they seems to lack the support for byte-order. And the usage of an enum for representing the Endianness seems more clear to me than using the BitConverter.IsLittleEndian bool.

@khellang
Copy link
Member

Sounds like it would overlap quite a bit with BinaryPrimitives?

@deinok
Copy link
Author

deinok commented Jul 23, 2018

@khellang Yes, I have implemented this by myself and i have done it with help of BinaryPrimitives.ReverseEndianness();
But as I feel it, seems more clear to use BinaryRW when accessing streams than the way it is done via BinaryPrimitives

@vcsjones
Copy link
Member

I've not worked with many data streams that need to read different endianness data in the same stream.

BinaryReader seems to always read data in little endian format. Some other considerations.

  1. What about "I want to use the endianness of whatever the architecture is"?

  2. Should this be a class-level thing instead of per-method? It might be tedious and error prone to keep having to pass around the right endianness in different methods.

  3. Does it make sense to implement an entirely different type? BigEndianStreamReader or something along those lines since the current one appears to always use little?

@deinok
Copy link
Author

deinok commented Jul 26, 2018

Hi @vcsjones I have never worked on something that needs to change the Endianness in the same stream, but it could be a possible (I cant imagine a case like this).

I'm not completely sure, but I think that BinaryReader takes by default the Endianness of the architecture by default (Most arch are little endian).

1.- As I said, I think its the default (What is actualy implemented). But I agree that would be great to have a static property in some class that represents the system information that handles the Endianess. Just like BitConverter.IsLittleEndian but returning the actual Endianness enum value.

2.- I would prefere the API I have described, but I don't think its a bad idea what you said. Something like set the default endianness of the BinaryRW on the constructor. I think that we can have the two things at the same time without cost.

3.- As I said, I think that it gets the default of the architecture, so your case can be handled in the example of 2, putting the default endiannes on the constructor or something like that.

@Halofreak1990
Copy link

Halofreak1990 commented Jan 25, 2019

I'm not completely sure, but I think that BinaryReader takes by default the Endianness of the architecture by default (Most arch are little endian).

As far as I'm aware, the .NET CLR itself is little-endian, and the BinaryReader/Writer implementations are built with this in mind, swapping bytes as necessary when running on a big-endian machine. This simplifies running code on the CLR, as endianness is completely taken out of the equation.

@deinok
Copy link
Author

deinok commented Jan 29, 2019

@Halofreak1990 If somebody from CoreCLR could provide us a bit more of help

@JeremyKuhne
Copy link
Member

I'm not opposed to having additional members, but I would want to see more feedback to justify the value here. It is relatively trivial to derive from BinaryReader and create an EndianReader and your code would be portable.

@deinok
Copy link
Author

deinok commented Feb 24, 2019

@JeremyKuhne For my experience, most of the times I use the BinaryReader and BinaryWriter I need to enforce an endianess. In fact, I have never used the BinaryReader and BinaryWriter without forcing a specific endiannes.

@Wraith2
Copy link
Contributor

Wraith2 commented Feb 24, 2019

Hi @vcsjones I have never worked on something that needs to change the Endianness in the same stream, but it could be a possible (I cant imagine a case like this).

Just a data point on this, parts of the TDS protocol used in SqlClient require specific and opposite endian-ness

@Ryder25
Copy link

Ryder25 commented May 26, 2019

For example, if you have a game data file that needs to be written using different endianness depending on the platform.

If reading/writing the file for Windows, then you may use LittleEndian, and instantiate the BinaryReader/Writer as so. Whereas for the PlayStation 3, the reader/writer would be instantiated with BigEndian.

@Halofreak1990
Copy link

@Ryder25 that is assuming you'd output your assets in a different endianness for each specific platform, which is something I'd personally never do. Why complicate development, if the framework automagically fixes things for you? .NET, as it currently works, allows one to use little-endian assets, and in code never care about that, unless you need to fiddle with specific bits. For example, a little endian long value in a file will produce the expected value in code, regardless of the endianness of the platform. I'd be inclined to say that, if you really need control of the endiannes on a case by case basis, you really need to re-evaluate what you're doing, because something's out of whack.

@Ryder25
Copy link

Ryder25 commented May 26, 2019

I'm not saying I would do it, but I've seen it done before with several games. I assume it is with performance reasons in mind. I can see a case where the game itself is written in C++, and there could be tooling done in C#. At that point, you can't only focus on what's happening in the .Net bcl.

@carlossanlop
Copy link
Member

Triage: There are enough scenarios that we see the value in doing this.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@deinok
Copy link
Author

deinok commented Feb 10, 2020

@carlossanlop What is required for making this API real?

@carlossanlop
Copy link
Member

@deinok in the main description with the API proposal, instead of etc..., can you please list all the extension APIs that you think should receive an endianness argument?

@deinok
Copy link
Author

deinok commented Feb 11, 2020

@carlossanlop Done

@jkotas
Copy link
Member

jkotas commented Feb 11, 2020

Passing an extra Endianness argument for each call looks pretty cumbersome.

I agree with @JeremyKuhne that it is trivial to derive from BinaryReader (or include these as extension methods in your project) if you need a reader like this. I do not think that this API belongs to the framework.

@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@JeremyKuhne JeremyKuhne removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@Ryder25
Copy link

Ryder25 commented Mar 30, 2020

There are multiple ways to do this. Here are some different ideas explained by usage examples:

  1. BinaryReader/Writer remains as the LittleEndian version, and BigEndianBinaryReader/Writer are added with BinaryReader/Writer as the base class.
public class BigEndianBinaryReader : BinaryReader { ... }
int Example(Stream in)
{
    // Assume first byte in stream denotes whether the data is little or big endian
    var endianness = in.ReadByte(); // 0 - means little, 1+ means big
    var reader = endianness == 0 ? new BinaryReader(in) : new BigEndianBinaryReader(in);
    using (reader)
    {
        return reader.ReadInt32();
    }
}
  1. BinaryReader/Writer get constructor(s) with a bool isLittleEndian. Then internally it will read and write correctly based on this setting.
int Example(Stream in)
{
    // Assume first byte in stream denotes whether the data is little or big endian
    var endianness = in.ReadByte(); // 0 - means little, 1+ means big
    bool isLittleEndian = endianness == 0;
    using (BinaryReader reader = new BinaryReader(in, isLittleEndian))
    {
        return reader.ReadInt32();
    }
}
  1. A mix of options one and two. Instead of adding isLittleEndian to the constructor of BinaryReader/Writer, another new class is created called EndianBinaryReader/Writer which takes the bool, and internally uses the little or big endian version of the binary readers depending on the setting.

These implementations should add little overhead in terms of creating work to maintain the API. This also prevents someone from having to create their own utility library for this kind of functionality, or from having to copy that code from project to project. Overall this will make it really convenient to read and write binary data for various architectures.

Of course, we'll also have to think about performance when it comes to this API which may change things. I just want to get the ball rolling with some ideas.

@carlossanlop carlossanlop modified the milestones: 5.0.0, Future Jun 23, 2020
@martingbrown
Copy link

I'd have thought that if we add a flag for this it would hit performance as an extra if condition would be needed for every read/write. Would it be better to simply provide a second set of methods for big endian with a BE suffix. Something like this?

public static ushort ReadUInt16(this BinaryReader binaryReader);
public static ushort ReadUInt16Be(this BinaryReader binaryReader);

@deinok
Copy link
Author

deinok commented Aug 5, 2020

Missed that fact. I supose that for big frames that if condition would cause a big overhead

@twest820
Copy link

twest820 commented Oct 28, 2020

A couple observations I would add as someone who's implemented big endian by derivation:

  • BinaryReader has ambiguous endianness, most obviously in that consumers of methods which read multiple bytes have no way of querying for endianness of the bytes they're retrieving. While other read types are little endian by default, this doesn't hold for characters or strings if Encoding.BigEndianUnicode or some other derived, big-endian encoding is passed to the constructor. I suspect these considerations underlie some of what @deinok was referring to about enforcing endianness.
  • Both BinaryReader and BinaryPrimitives provide chatty APIs. From a performance perspective, an efficient implementation is more likely to pass blocks of bytes through Ssse3.Shuffle(Vector128<byte>, Vector128<byte>) or similar and would therefore probably expose methods such as ReadInt32Array(int length). While this amortizes the cost of endianness if statements or vtable lookups it seems more significant BinaryReader and BinaryPrimitives lack such methods.

Given these considerations, my design endpoint here is BinaryReader is primarily helpful as a base for quick big endian implementations but, if one's writing new infrastructure, it's probably preferable to implement an entirely separate class which interacts with a Stream directly.

@pekspro
Copy link

pekspro commented Jul 20, 2021

In the UWP platform there is the DataReader and DataWriter classes that I do like. They have support for changing byte endianness and string encoding. They also have some quirks, like not supporting ordinary streams and using big endian by default. But in general, I think they are well designed. They have support for async operations too. Here some sample code:

    byte[] rawBytes;

    using (var stream = new Windows.Storage.Streams.InMemoryRandomAccessStream())
    {
        DataWriter dataWriter = new DataWriter(stream);
        dataWriter.ByteOrder = ByteOrder.LittleEndian; //Big endian is default :-(
        dataWriter.UnicodeEncoding = Windows.Storage.Streams.UnicodeEncoding.Utf16BE;
        dataWriter.WriteInt32(0x01020304);
        dataWriter.WriteUInt16(0x0506);

        string message = "Win 📱 ❤";
        // Write message length, and then the message.
        dataWriter.WriteInt32(message.Length);
        dataWriter.WriteString(message);

        // Store and flush the data to the stream.
        await dataWriter.StoreAsync();
        await dataWriter.FlushAsync();

        // Create a byte array of the data.
        rawBytes = new byte[stream.Size];
        stream.Seek(0);
        await stream.ReadAsync(rawBytes.AsBuffer(), (uint)stream.Size, Windows.Storage.Streams.InputStreamOptions.None);
    }

    using (var stream = new Windows.Storage.Streams.InMemoryRandomAccessStream())
    {
        await stream.WriteAsync(rawBytes.AsBuffer());
        stream.Seek(0);

        DataReader dataReader = new DataReader(stream);
        dataReader.ByteOrder = ByteOrder.LittleEndian;
        dataReader.UnicodeEncoding = UnicodeEncoding.Utf16BE;
        // Load the data. Data must be loaded before doing any Read-operation.
        await dataReader.LoadAsync((uint) rawBytes.Length);

        // Read value 0x01020304
        var i32 = dataReader.ReadInt32();
        // Read value 0x0506
        var u16 = dataReader.ReadUInt16();

        // Read message length
        var messageLength = dataReader.ReadInt32();
        // Read message
        var message = dataReader.ReadString((uint) messageLength);
    }

    using (var stream = new MemoryStream(rawBytes))
    {
        BinaryReader dataReader = new BinaryReader(stream, System.Text.UnicodeEncoding.BigEndianUnicode);

        // Read value 0x01020304
        var i32 = dataReader.ReadInt32();
        // Read value 0x0506
        var u16 = dataReader.ReadUInt16();

        var messageLength = dataReader.ReadInt32();
        var message = dataReader.ReadChars(messageLength);
    }

So, my proposal is to migrate these instead of making changes to BinaryReader. Documentation is available here: https://docs.microsoft.com/en-us/uwp/api/windows.storage.streams.datareader?view=winrt-20348

@GF-Huang
Copy link

This issue has been hanging around for 5 years...

@martingbrown
Copy link

@GF-Huang That's only because they shut the previous feature suggestion lists down and started new ones. It's been kicking around since .Net 1.0 was released. 21 years ago.

@GF-Huang
Copy link

21 years ago.

Amazing.. and unbelievable..

@crozone
Copy link

crozone commented Aug 16, 2024

I've implemented the Big Endian methods as extensions. Many network protocols are big endian on the wire, so being able to easily deserialize big endian binary payloads with BinaryReader comes in handy. The lack of big endian methods seems like a huge oversight, I'm not surprised it has been a known issue for 21 years.

These extensions are not as performant as an internal implementation, but enough features are exposed via the public api surface that it's possible to get close to a native implementation. In particular, there's an optimisation in BinaryReader when reading from MemoryStream to avoid buffer copies, however it only works if MemoryStream was constructed with publiclyVisible = true. It would be really nice if MemoryStream exposed MemoryStream.InternalReadSpan() as a public API, but for some reason it's an internal only (#106524).

public static class BinaryReaderBigEndianExtensions
{
    public static short ReadInt16BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadInt16BigEndian(
        binaryReader.BaseStream is MemoryStream ms && ms.TryReadSpanUnsafe(2, out ReadOnlySpan<byte> readBytes) ? readBytes : binaryReader.ReadSpan(stackalloc byte[2]));

    public static ushort ReadUInt16BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadUInt16BigEndian(
        binaryReader.BaseStream is MemoryStream ms && ms.TryReadSpanUnsafe(2, out ReadOnlySpan<byte> readBytes) ? readBytes : binaryReader.ReadSpan(stackalloc byte[2]));

    public static int ReadInt32BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadInt32BigEndian(
        binaryReader.BaseStream is MemoryStream ms && ms.TryReadSpanUnsafe(4, out ReadOnlySpan<byte> readBytes) ? readBytes : binaryReader.ReadSpan(stackalloc byte[4]));

    public static uint ReadUInt32BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadUInt32BigEndian(
        binaryReader.BaseStream is MemoryStream ms && ms.TryReadSpanUnsafe(4, out ReadOnlySpan<byte> readBytes) ? readBytes : binaryReader.ReadSpan(stackalloc byte[4]));

    public static long ReadInt64BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadInt64BigEndian(
        binaryReader.BaseStream is MemoryStream ms && ms.TryReadSpanUnsafe(8, out ReadOnlySpan<byte> readBytes) ? readBytes : binaryReader.ReadSpan(stackalloc byte[8]));

    public static ulong ReadUInt64BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadUInt64BigEndian(
        binaryReader.BaseStream is MemoryStream ms && ms.TryReadSpanUnsafe(8, out ReadOnlySpan<byte> readBytes) ? readBytes : binaryReader.ReadSpan(stackalloc byte[8]));

    public static Half ReadHalfBigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadHalfBigEndian(
        binaryReader.BaseStream is MemoryStream ms && ms.TryReadSpanUnsafe(2, out ReadOnlySpan<byte> readBytes) ? readBytes : binaryReader.ReadSpan(stackalloc byte[2]));

    public static float ReadSingleBigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadSingleBigEndian(
        binaryReader.BaseStream is MemoryStream ms && ms.TryReadSpanUnsafe(4, out ReadOnlySpan<byte> readBytes) ? readBytes : binaryReader.ReadSpan(stackalloc byte[4]));

    public static double ReadDoubleBigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadDoubleBigEndian(
        binaryReader.BaseStream is MemoryStream ms && ms.TryReadSpanUnsafe(8, out ReadOnlySpan<byte> readBytes) ? readBytes : binaryReader.ReadSpan(stackalloc byte[8]));

    private static ReadOnlySpan<byte> ReadSpan(this BinaryReader binaryReader, Span<byte> buffer)
    {
        binaryReader.BaseStream.ReadExactly(buffer);
        return buffer;
    }

    private static bool TryReadSpanUnsafe(this MemoryStream memoryStream, int numBytes, out ReadOnlySpan<byte> readBytes)
    {
        if (memoryStream.TryGetBuffer(out var msBuffer))
        {
            readBytes = msBuffer.AsSpan((int)memoryStream.Position, numBytes);
            memoryStream.Seek(numBytes, SeekOrigin.Current);
            return true;
        }
        else
        {
            readBytes = [];
            return false;
        }
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Projects
None yet
Development

No branches or pull requests