
Add UTF-16 output capabilities #291

Merged
merged 3 commits into from
Jun 11, 2020


jhannemann
Contributor

@jhannemann jhannemann commented Jun 9, 2020

Referring to issue #288:

When set to Unicode, the output conversion function can now handle
characters outside the Basic Multilingual Plane, using UTF-16.
The output conversion function detects whether a value passed in
is a high surrogate and, if so, saves it in a static variable.
If the next value is a valid low surrogate, the function
returns the corresponding Unicode character; otherwise it discards the
input.

The following code should output a smiley 🙂:

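Load H      / load the high surrogate
Output      / the high surrogate is saved; nothing is printed yet
Load L      / load the low surrogate
Output      / the pair is complete, so the character is printed
Halt
H, HEX D83D / high surrogate
L, HEX DE42 / low surrogate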
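
The arithmetic behind that example can be checked outside MARIE. The following is a minimal Python sketch of the standard UTF-16 surrogate-pair formula, not the code from this PR; the function name `combine_surrogates` is hypothetical.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    # High surrogates occupy 0xD800..0xDBFF, low surrogates 0xDC00..0xDFFF.
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The H and L values from the MARIE program above.
code_point = combine_surrogates(0xD83D, 0xDE42)
print(hex(code_point), chr(code_point))  # 0x1f642 and the smiley character
```

The two words `D83D` and `DE42` combine to U+1F642, which is the smiley the MARIE program is expected to print.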

When set to Unicode, the output conversion function can now handle
characters outside the Basic Multilingual Plane, using UTF-16.
The output conversion function detects whether a value passed in
is a high surrogate and, if so, saves it in a static variable.
If the next value is a valid low surrogate, the function
returns the corresponding Unicode character; otherwise it discards the
input.
MARIE uses 16-bit two's complement integers. When the output is set to
BIN, the original conversion routine assumes an unsigned integer,
resulting in zeros displayed for negative 16-bit two's complement
numbers.
This patch adds signed-integer conversion and grouping functions so that
negative numbers produce the correct bit pattern.
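
As an illustration of the BIN fix, here is a small Python sketch of one way to render a signed 16-bit value as its two's complement bit pattern. It is not the patch's code: the name `to_bin16` and the grouping into blocks of four bits are assumptions made for this example.

```python
def to_bin16(value: int, group: int = 4) -> str:
    """Format a signed integer as a 16-bit two's complement bit string."""
    # Masking with 0xFFFF maps negative values onto their two's complement
    # pattern, e.g. -1 becomes 0xFFFF.
    bits = format(value & 0xFFFF, "016b")
    # Split the 16 bits into groups for readability (group size is assumed).
    return " ".join(bits[i:i + group] for i in range(0, 16, group))


print(to_bin16(5))   # 0000 0000 0000 0101
print(to_bin16(-1))  # 1111 1111 1111 1111
print(to_bin16(-2))  # 1111 1111 1111 1110
```

Masking before formatting is what keeps negative inputs from collapsing to zeros, which is the symptom described in the commit message.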
Member

@auroranil auroranil left a comment

Nice work! I don't know much about character encoding, but I hope that my review will improve the way students understand how UTF-16 works.

Also you may want to relabel "UNICODE" to "UNICODE (UTF-16)" for output mode in src/templates/index.ejs.

One question: how would you enter high and low surrogate values in input? Would students have to type in their hexadecimal representation to store the value in memory?

@jhannemann
Contributor Author

I've now fully implemented UTF-16BE (big-endian) and extensively commented the code. I've also added an example that outputs "Hello World!" and some other Unicode characters. The example also shows how byte order marks (BOMs) are handled: they are ignored. The code should be much cleaner and more readable now.

@jhannemann jhannemann requested a review from auroranil June 10, 2020 16:56
This patch adds full support for Unicode in the UTF-16BE (big-endian)
encoding. It ignores byte order marks (BOMs) and incorrect surrogate
sequences.

It also adds a Unicode program to the examples list, which prints out
"Hello World" and a couple of emojis outside the Basic Multilingual
Plane and a copyright sign as an example of a character inside the BMP.
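
The behaviour described in this commit message can be sketched as a small stateful decoder. This is a minimal Python illustration, not the MARIE.js implementation. The class name `Utf16Output`, the method `feed`, and the exact policy for bad sequences (an unpaired high surrogate is dropped when the next word is not a low surrogate, and a lone low surrogate is dropped) are assumptions.

```python
class Utf16Output:
    """Decode a stream of 16-bit words as UTF-16BE, one word at a time."""

    BOM = 0xFEFF

    def __init__(self) -> None:
        # High surrogate waiting for its low partner, if any.
        self.pending_high = None

    def feed(self, word: int) -> str:
        """Return the text produced by this word (possibly an empty string)."""
        word &= 0xFFFF
        # Take the pending high surrogate; it is either used now or dropped.
        high, self.pending_high = self.pending_high, None

        if 0xDC00 <= word <= 0xDFFF:  # low surrogate
            if high is None:
                return ""  # lone low surrogate: ignored (assumed policy)
            return chr(0x10000 + ((high - 0xD800) << 10) + (word - 0xDC00))
        if 0xD800 <= word <= 0xDBFF:  # high surrogate: remember it
            self.pending_high = word
            return ""
        if word == self.BOM:  # byte order mark: ignored
            return ""
        return chr(word)  # ordinary BMP character


out = Utf16Output()
words = [0xFEFF, 0x48, 0x69, 0xD83D, 0xDE42, 0xA9]
print("".join(out.feed(w) for w in words))
```

Feeding a BOM, the letters "Hi", the surrogate pair `D83D DE42`, and `00A9` yields "Hi", the smiley, and the copyright sign, mirroring the mix of BMP and non-BMP characters in the example program.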
Member

@auroranil auroranil left a comment

Looks good to me. Good job making a Unicode example that demonstrates the new feature!

@auroranil auroranil merged commit 04ca137 into MARIE-js:master Jun 11, 2020
@jhannemann jhannemann deleted the utf-16-output branch June 11, 2020 03:26
@jhannemann jhannemann restored the utf-16-output branch June 11, 2020 03:35