API Proposal: StringStream (a Stream that wraps a string) #46663

Open
daveaglick opened this issue Jan 7, 2021 · 8 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO

@daveaglick

Background and Motivation

I've discovered over the years that providing string-based content via a Stream is a common requirement. The conventional wisdom (as indicated by highly-voted Stack Overflow questions and answers, message board posts, etc.) appears to be to convert the string to a byte array using an Encoding and then wrap it with a MemoryStream. This likely works fine for small strings or other situations where performance and memory aren't of great concern, but is likely not an optimal approach for larger strings. Instead I'd propose that a StringStream class be provided in-the-box that correctly exposes a given string (or ReadOnlyMemory<char>) as a Stream while encoding chunks into a reusable buffer, which reduces the overall memory footprint and avoids allocating an entire byte[] to contain the fully-encoded representation at once.
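
For reference, the conventional approach described above can be sketched as follows. The class and method names are made up for illustration; only Encoding.GetBytes and MemoryStream are real APIs.

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative helper (hypothetical name): the usual "encode everything, then wrap" pattern.
public static class ConventionalStringToStream
{
    public static Stream ToStream(string text, Encoding encoding)
    {
        // One allocation that holds the fully encoded string at once.
        byte[] bytes = encoding.GetBytes(text);

        // Wrap the array in a read-only MemoryStream.
        return new MemoryStream(bytes, writable: false);
    }
}
```

The byte[] here grows with the size of the whole string, which is the cost the proposal is trying to avoid.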

Proposed API

As with most other Stream-derived classes, the API surface of StringStream would be essentially identical to that of Stream. It might contain the following constructors as well as additional members (other Stream members omitted for brevity):

public class StringStream : Stream
{
    public StringStream(string source);
    public StringStream(string source, Encoding encoding);
    public StringStream(string source, Encoding encoding, int bufferCharCount);
    public StringStream(in ReadOnlyMemory<char> source);
    public StringStream(in ReadOnlyMemory<char> source, Encoding encoding);
    public StringStream(in ReadOnlyMemory<char> source, Encoding encoding, int bufferCharCount);

    // Because the encoding of a string to a given encoding is not necessarily based only on character count,
    // the `StringStream` would have to be forward-only and non-seekable to avoid encoding the entire string up-front
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();

    // The underlying source string should be available
    public ReadOnlyMemory<char> Source { get; }

    // A reset method should be provided that sets the stream
    // back to its initial position and clears the buffers
    public virtual void Reset();

    // As with other non-seekable streams, calling Position or Seek() should throw, but if setting
    // position to 0 or seeking to an offset of 0 from SeekOrigin.Begin, Reset() could be called

    // Other Stream members
    // ...
}

To further help explain the concept, a sample (possibly naïve) implementation can be found at: https://gist.github.com/daveaglick/e49145d650ea3a4dbc3b6d0f8482fd37 (thanks to @benaadams for several ideas here).
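
As a rough illustration of the chunked approach (this is neither the gist's code nor a proposed implementation), a minimal forward-only stream could look like the sketch below. The type name ChunkedStringStream, the 1024-char chunk size, and the decision not to emit a preamble are all assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch: encodes the source one chunk at a time into a reusable buffer.
// It does not emit the encoding's preamble (BOM).
public sealed class ChunkedStringStream : Stream
{
    private const int ChunkChars = 1024;          // assumed chunk size, in chars

    private readonly ReadOnlyMemory<char> _source;
    private readonly Encoder _encoder;
    private readonly byte[] _bytes;               // reusable buffer of encoded bytes
    private int _charPos, _byteStart, _byteCount;
    private bool _flushed;                        // true once the last chunk was encoded

    public ChunkedStringStream(string source, Encoding encoding)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (encoding is null) throw new ArgumentNullException(nameof(encoding));
        _source = source.AsMemory();
        _encoder = encoding.GetEncoder();
        // Worst-case size for one chunk, so GetBytes below always has enough room.
        _bytes = new byte[encoding.GetMaxByteCount(ChunkChars)];
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer is null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || count < 0 || count > buffer.Length - offset)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (count == 0) return 0;

        // Encode further chunks until bytes are available or the source is exhausted.
        while (_byteCount == 0 && !_flushed)
        {
            int chars = Math.Min(ChunkChars, _source.Length - _charPos);
            bool last = _charPos + chars == _source.Length;
            _byteStart = 0;
            // The Encoder keeps state between calls, so a surrogate pair that
            // straddles two chunks is still encoded as one code point.
            _byteCount = _encoder.GetBytes(_source.Span.Slice(_charPos, chars), _bytes, last);
            _charPos += chars;
            _flushed = last;
        }

        int n = Math.Min(count, _byteCount);
        Buffer.BlockCopy(_bytes, _byteStart, buffer, offset, n);
        _byteStart += n;
        _byteCount -= n;
        return n;   // 0 signals end of stream
    }

    // Rewinds to the start of the source and discards buffered state.
    public void Reset()
    {
        _encoder.Reset();
        _charPos = _byteStart = _byteCount = 0;
        _flushed = false;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Memory use is bounded by the chunk buffer rather than by the length of the string, at the cost of Length and Seek being unavailable.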

Usage Examples

Since the StringStream is intended to derive from the Stream class and conform to its API, usage would be similar to any other Stream:

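string reallyLongString = "...";
// Declared as StringStream (not Stream) because Reset() is not a member of Stream
StringStream stringStream = new StringStream(reallyLongString);

// Do some stuff with the stream
byte[] buffer = new byte[256];
int bytesRead = stringStream.Read(buffer, 0, buffer.Length);
stringStream.Reset();
bytesRead = stringStream.Read(buffer, 0, buffer.Length);
// etc...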

Alternative Designs

As previously mentioned, similar functionality is often implemented by wrapping a fully-encoded representation of the string with a MemoryStream.

C++ has a stringstream class that appears to be more of an iterator over a stack of strings than an actual C#-like stream (and as far as I can tell it makes no attempt to handle character encoding).

Risks

Anything dealing with character encoding has the potential for edge cases and challenging logic, so care would need to be taken that all encodings are handled or at least well documented (such as preamble, fallback, etc.).

@daveaglick daveaglick added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jan 7, 2021
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jan 7, 2021
@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@jnm2
Contributor

jnm2 commented Jan 7, 2021

Ooh, I was really hoping for something close: a TextReaderStream that encodes as it reads and a TextWriterStream that decodes as it writes. The equivalent of StringStream would then be new TextReaderStream(new StringReader(someString)), but it would also be much more flexible because it would work with other kinds of TextReaders.

All my use cases are based on TextReaderStream or TextWriterStream. None are actually based on strings, so I wouldn't actually be able to use a StringStream.
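
To make the TextReaderStream idea concrete, here is a minimal sketch under some assumptions: the constructor shape, the explicit Encoding parameter, and the 1024-char buffer are invented for illustration, and no such type exists in the BCL.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch: a read-only stream that encodes chars pulled from any TextReader.
public sealed class TextReaderStream : Stream
{
    private readonly TextReader _reader;
    private readonly Encoder _encoder;
    private readonly char[] _chars = new char[1024];   // assumed buffer size
    private readonly byte[] _bytes;
    private int _byteStart, _byteCount;
    private bool _done;                                // true once the reader is exhausted

    public TextReaderStream(TextReader reader, Encoding encoding)
    {
        _reader = reader ?? throw new ArgumentNullException(nameof(reader));
        if (encoding is null) throw new ArgumentNullException(nameof(encoding));
        _encoder = encoding.GetEncoder();
        _bytes = new byte[encoding.GetMaxByteCount(_chars.Length)];
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer is null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || count < 0 || count > buffer.Length - offset)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (count == 0) return 0;

        while (_byteCount == 0 && !_done)
        {
            int read = _reader.Read(_chars, 0, _chars.Length);
            _done = read == 0;   // TextReader.Read returns 0 at end of input
            _byteStart = 0;
            // Flush the encoder only when the reader has no more characters.
            _byteCount = _encoder.GetBytes(_chars, 0, read, _bytes, 0, _done);
        }

        int n = Math.Min(count, _byteCount);
        Buffer.BlockCopy(_bytes, _byteStart, buffer, offset, n);
        _byteStart += n;
        _byteCount -= n;
        return n;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _reader.Dispose();
        base.Dispose(disposing);
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

With this shape, the string case becomes new TextReaderStream(new StringReader(someString), Encoding.UTF8), and any other TextReader can be substituted without changes.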

@KirillOsenkov
Member

Would this help avoid byte array allocations in places such as BinaryWriter.Write(string)?
https://source.dot.net/#System.Private.CoreLib/BinaryWriter.cs,166b0572d9c907b3

Or StreamWriter.Flush:
https://source.dot.net/#System.Private.CoreLib/StreamWriter.cs,f11dc172664bb49c

@GrabYourPitchforks
Member

Interesting concept. I could see this being useful for networking scenarios. Some API-level suggestions: (a) the bufferCharCount ctor parameter probably isn't needed since the underlying Encoding API already handles this transparently; and (b) we could consider accepting a ReadOnlySequence<char> as well. In the latter case, it means this API would probably be part of the System.Memory library.

Implementation-wise, it'd be nice to special-case UTF-8 since it's going to be the most common and can be optimized both in terms of performance and in killing unneeded state instance fields.
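
One possible reading of the UTF-8 special case, sketched under assumptions: System.Text.Unicode.Utf8.FromUtf16 transcodes without an Encoder instance, so the only state to carry between reads is the current char position. The helper name and its shape below are hypothetical.

```csharp
using System;
using System.Buffers;
using System.Text.Unicode;

// Hypothetical helper: stateless UTF-16 to UTF-8 transcoding, one destination buffer at a time.
public static class Utf8ChunkEncoder
{
    // Encodes as much of source (starting at charPos) as fits in destination,
    // advances charPos past the consumed chars, and returns the bytes written.
    public static int EncodeNext(ReadOnlySpan<char> source, ref int charPos, Span<byte> destination)
    {
        OperationStatus status = Utf8.FromUtf16(
            source.Slice(charPos), destination,
            out int charsRead, out int bytesWritten,
            replaceInvalidSequences: true, isFinalBlock: true);

        // A code point can need up to 4 bytes; with less room no progress is possible.
        if (status == OperationStatus.DestinationTooSmall && bytesWritten == 0)
            throw new ArgumentException("Destination is too small to hold the next encoded character.", nameof(destination));

        charPos += charsRead;
        return bytesWritten;
    }
}
```

A real stream would still need a small carry-over buffer for callers that pass fewer than 4 bytes, so this only illustrates where the per-instance encoder state could go away.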

@KirillOsenkov I don't think this API would be useful in BinaryWriter.Write(string). But there's definitely low-hanging fruit we can address there in parallel with this.

@Sergio0694
Contributor

Sergio0694 commented Jan 7, 2021

Nice proposal! 😄

Also thought I'd mention: while not exactly the same, since it e.g. doesn't support a custom encoding and isn't part of the BCL, this can already be achieved as a temporary alternative with APIs from the Microsoft.Toolkit.HighPerformance package:

using Stream stream = "Hello world".AsMemory().AsBytes().AsStream();

// Use the stream here to read directly from string data

EDIT: to clarify, the AsBytes() extension is included in the 7.0 release (I added it in #3520), which is scheduled to be released on the public NuGet feed soon. If you'd like to try it out, you can grab the preview package from the Toolkit's CI, see the wiki 🙂

EDIT 2: this has now been publicly released since March 2021 (.NET API docs link).

@GSPP

GSPP commented Oct 6, 2021

Maybe construction should happen through factory methods. That would allow returning a specialized class for UTF8 and other special cases.

I also think that the Encoding should be explicit in all constructors. The default encoding is not obvious, and encoding must be a conscious choice.

@jozkee
Member

jozkee commented Feb 28, 2023

> I also think that the Encoding should be explicit in all constructors. The default encoding is not obvious, and encoding must be a conscious choice.

There are plenty of types that currently don't force you to think about Encoding, e.g. StreamReader and ZipArchive, but to be fair, they are pretty old, and maybe that is why you suggested it.

@stephentoub @GrabYourPitchforks @bartonjs, should that be a policy for new APIs?

@stephentoub
Member

I agree with @GSPP that for StringStream specifically you should be required to specify the encoding. Unlike something like StreamReader, which by default tries to infer an encoding based on the BOM at the beginning of the file, for StringStream you're giving it something that has no BOM and you're asking it for bytes, and producing those bytes requires going through some Encoding. We could say that strings are UTF-16 and so the default is UTF-16, but I expect that's unlikely to be what most consumers actually want and would lead us to a pit of failure.
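
A small sketch of why the choice matters: the same string produces different byte sequences (and byte counts) depending on the encoding. The helper name is made up for the example; Encoding.Unicode is .NET's little-endian UTF-16 encoding.

```csharp
using System;
using System.Text;

// Illustrative helper (hypothetical name): compares the encoded size of a string.
public static class EncodingChoiceDemo
{
    public static (int Utf16Bytes, int Utf8Bytes) CountBytes(string text)
        => (Encoding.Unicode.GetByteCount(text), Encoding.UTF8.GetByteCount(text));
}
```

For "hello" this yields 10 bytes as UTF-16 and 5 bytes as UTF-8, so a consumer expecting UTF-8 would misread a stream that silently defaulted to UTF-16.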

10 participants