-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
/blocks/{round} endpoint returning malformed logs (non-ASCII data only) #5807
Comments
Thanks for creating this issue and including the code to reproduce. ProblemThere are two reasons you're seeing invalid data, and I'll discuss them below. JSONWhen requesting the block as JSON, algod does not base64 encode log values. As a result the raw binary log string gets encoded as a JSON string, meaning some characters are not able to be represented and will be replaced by the unicode replacement character, � or MsgpackWhen requesting the block as msgpack, which is a binary format, there's no need to base64 the log values, so the encoded responses does not lose any data. However, in this encoding log strings are tagged with the msgpack "string" type instead of "binary." When you decode the msgpack values in your javascript client, the msgp library you use sees the "string" type and attempts to decode the log value into a string, which isn't possible for all characters and results in the same problematic representation with unicode replacement characters. ExampleFor context, here's an example of a binary string with some invalid unicode characters being converted to a UTF-8 string. new Uint8Array([61, 180, 118, 220, 39, 166, 43, 68, 219, 116, 105, 84, 121, 46, 122, 136, 233, 221, 15, 174, 247, 19, 50, 176, 184, 221, 66, 188, 171, 36, 135, 121])
What we can doOn the algod sideDue to backwards compatibility concerns and the fact that msgpack is the canonical encoding for Algorand blocks, unfortunately it would be extremely difficult to modify the msgpack encoding. We might be able to change the JSON response, since it's not supposed to be canonical, but I need to do more research and think about this more. On the client sideThe msgpack spec does indicate that decoding clients should provide a way to get the raw bytes for strings for exactly this reason, but I've yet to see a javascript msgpack client which offers this. If you can find a client that does this, either in javascript or another language you can use, that would make the msgpack response usable. (Go is the only language which I'm 100% certain allows this, as it does not require strings to be UTF-8, which is why this works for go-algorand.) |
Perhaps I'm misunderstanding unicode, but wouldn't this still result in the same amount of bytes, just incorrect bytes. The behavior observed here is that it's not just the wrong bytes but the wrong amount of bytes. |
The Unicode replacement character is 2 bytes. But the Unicode encoding has
variable length characters.
…On Fri, Oct 27, 2023, 8:25 AM Joe Polny ***@***.***> wrote:
meaning some characters are not able to be represented and will be
replaced by the unicode replacement character, � or \ufffd.
Perhaps I'm misunderstanding unicode, but wouldn't this still result in
the same amount of bytes, just incorrect bytes. The behavior observed here
is that it's not just the wrong bytes but the wrong amount of bytes.
—
Reply to this email directly, view it on GitHub
<#5807 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADL7T6RI4MJWFGQEDPMPDTYBOR4FAVCNFSM6AAAAAA6RQ2D46VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBSHAZDQNBYGY>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
I just updated the issue with a proposed solution after talking with @joe-p offline |
closed by #5865 |
Subject of the issue
The logs in
/blocks/{round}
endpoint return malformed data, but only when the log contains non-ascii data.Your environment
Local network in Algokit docker container. Version
3.19.0.stable [rel/stable] (commit #7037cb3b)
Steps to reproduce
blocks/{round}
endpoint and parse the logs directly from the blockExpected behaviour
All logged data is present and can be decoded client-side.
Actual behaviour
The data is malformed and often (but not always) missing bytes and/or contains the wrong bytes.
Notes
A repo reproducing this bug can be seen here: https://github.com/joe-p/block-log-debug/tree/main
So far I have only confirmed that this is a bug with logs, but I have a suspicion that it might affect other
[]string
fields in the block like app arguments. I will add some more testcases to the above repo to verify if that is the case.This only affects the
/blocks/
endpoint. There are many tools, such as the ATC and algokit, that rely on logs to parse non-ascii data (ie. ABI return types) from the transaction response and they all work just fine.This is currently blocking progress being made on an event parsing mechanism
Proposed Solution
The underlying causes of this issue in the
/v2/blocks/{round}
endpoint are deep rooted and difficult to change. Because of this, a better solution would be to create a new/v2/blocks/{round}/logs
endpoint that returns the logs for a given round (similar to the current/v2/blocks/{round}/txids
endpoint).The proposed schema for this new endpoint is:
It should be straightforward to create a response following this schema from a block.
The text was updated successfully, but these errors were encountered: