-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for writing to an existing buffer #26
Conversation
On the web we've tried to encourage people to use array buffer views when they want a (buffer, offset, length) tuple. This has mostly been successful, e.g. in the Encoding and Streams APIs. Some people argued that GPU APIs were so performance-critical that they required positional arguments to avoid the garbage (i.e. What is proposed here seems less good. I'd encourage you to align with TextEncoder's encodeInto as much as possible, or ReadableStreamBYOBReader's read() if you want to support non-Uint8Arrays. |
@domenic Do you mean having just It does make using it to decode long strings a little bit more awkward, mostly because you have to use let input = 'SGVsbG8gV29ybGQ=';
let buffer = new Uint8Array(4);
let inputOffset = 0;
let currentView = buffer;
let extra, written, read;
while (inputOffset < input.length) {
0, { extra, written, read } = Uint8Array.fromPartialBase64(input, {
extra,
into: buffer,
inputOffset,
});
inputOffset += read;
currentView = currentView.subarray(written);
console.log(written, currentView.length)
if (currentView.length === 0) {
// output is full; consume it
console.log([...buffer]);
currentView = buffer;
}
}
if (currentView.length < buffer.length) {
console.log([...buffer.subarray(0, buffer.length - currentView.length)]);
} |
I'd also be happy to get rid of |
I think @domenic's suggestion is the while (inputOffset < input.length) {
0, { extra, written, read } = Uint8Array.fromPartialBase64(input, {
extra,
- into: buffer,
- outputOffset,
+ into: buffer.subarray(outputOffset),
inputOffset,
});
inputOffset += read;
outputOffset += written;
if (outputOffset === buffer.length) {
// output is full; consume it
console.log([...buffer]);
outputOffset = 0;
}
} Which is not really better, IMHO. Allocating a throwaway buffer view just to pass it to a method that's going to destruct it back into the real |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can remove more
, because there's no guarantee that the stream ended in a consistent state. There could still unflushed bits left in extra
, and the sample code's just ignoring them.
Unflushed bits in |
I think I am suggesting the following: const { extra, written, read } = Uint8Array.fromPartialBase64Into(input.substring(inputOffset), new Uint8Array(buffer, outputOffset), { extra }); Although this would be even better if you had a proper class to hold your stateful const decoder = new Base64Decoder();
const { written, read } = decoder.decodeInto(input.substring(inputOffset), new Uint8Array(buffer, outputOffset)); Then it would be fully congruent with existing APIs. The intuition here is to use the data structures we have already in JS (substrings and Uint8Arrays) to represent spans (over strings and buffers, respectively) instead of packaging up everything into a single options object. Additionally I think it's a good idea if the |
The class-based version is very painful, so I'm not going to go with that unless there's absolutely no other way forward. Other changes I'm neutral-to-positive on, but we'll see what the rest of the committee thinks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unflushed bits in
extra
at the end of the stream are analagous (or identical) to extra non-zero bits in the final characters in a base64 string, and most implementations will silently ignore those - e.g.atob('YQ==') === atob('YW==')
.
Ahh, I thought padding always referred to the =
chars and didn't realize they apply to partial bits as well.
Although this would be even better if you had a proper class to hold your stateful
extra
bit:
I would also prefer a stateful class, but I've written encoder/decoders both ways and either is acceptable.
The intuition here is to use the data structures we have already in JS (substrings and Uint8Arrays) to represent spans (over strings and buffers, respectively) instead of packaging up everything into a single options object.
I think the beauty of the current approach is that you could do either and both would work. inputOffset
and outputOffset
both default to zero, so applying a view to your input
/buffer
and omitting the two offsets gets you exactly the API you want. And if you don't want to allocate throwaway objects, you can pass the offsets.
I think the concern over throwaway objects is misplaced; this kind of operation will generally not be performed in a hot loop, and even in hot loops we have not seen good evidence that such objects cause slowdowns due to modern generational GCs. |
In JSC/Safari's implementation, accessing the underlying ArrayBuffer of a TypedArray for the first time internally clones the data into a different heap and sets the internal type to be a That means .subarray is expensive for typed arrays created without initially passing an ArrayBuffer instance in the constructor (at least in JSC) You can see the implementation of |
Closing as not planned, since we're probably not doing streaming at all (#27) and just keeping the simple API. |
Revived in #33. V8 continues to be concerned with throwaway objects, and per Jarred's comment it sounds like it matters to JSC as well, so I've included an |
Fixes #21 (at least for base64).
This adds
into
,outputOffset
, andinputOffset
parameters, as well asread
andwritten
return values, to thefromPartialBase64
static method.It also tweaks the
extra
parameter so we can keep track of individual extra bits, not just characters; thanks @jridgewell for the suggestion. Doing so allows us to completely fill passed buffers, even when they have lengths which are not a multiple of 3, which is a useful property. In addition, this lets us get rid of themore
parameter for this method.This only solves the problem for base64-string->Uint8Array. The other direction doesn't need it, since we don't have StringBuilder or any equivalent. But we probably do want to solve the problem for hex->Uint8Array, which will mean adding a
fromPartialHex
static method.Also the names are bad but see #25 for that.
Here's the two complete examples from the playground in this PR:
Single input:
Chunked input: