Add UTF-16 output capabilities #291

jhannemann · 2020-06-09T20:33:21Z

Referring to issue #288:

When set to Unicode, the output conversion function can now handle
characters outside the Basic Multilingual Plane (BMP), using UTF-16.
The function detects whether a value passed in is a high surrogate
and, if so, saves it in a static variable.
If the next value is a valid low surrogate, the function returns
the corresponding Unicode character; otherwise it discards the
input.
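
For readers following along, the behaviour described above could look roughly like the sketch below. It is not the code from this PR: `makeUtf16Output` and its inner names are hypothetical, a closure variable stands in for the static variable, and it assumes that "discard the input" means dropping both the saved high surrogate and the value that failed to pair with it.

```javascript
// Hypothetical sketch, not the PR's implementation in src/js/utility.js.
function makeUtf16Output() {
  // High surrogate waiting for its partner (plays the role of the static variable).
  let pendingHigh = null;

  const isHigh = (w) => w >= 0xd800 && w <= 0xdbff;
  const isLow = (w) => w >= 0xdc00 && w <= 0xdfff;

  // Takes one 16-bit value and returns the text to print ("" if nothing yet).
  return function output(value) {
    const word = value & 0xffff; // keep only the low 16 bits
    if (pendingHigh !== null) {
      const high = pendingHigh;
      pendingHigh = null;
      // Assumption: an invalid pair discards both halves.
      return isLow(word) ? String.fromCharCode(high, word) : "";
    }
    if (isHigh(word)) {
      pendingHigh = word; // wait for the low surrogate
      return "";
    }
    if (isLow(word)) {
      return ""; // lone low surrogate: nothing to print
    }
    return String.fromCharCode(word); // ordinary BMP character
  };
}
```

Each call handles one value passed to Output, so a surrogate pair only produces text on the second call.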

The following code should output a smiley 🙂:

Load H
Output
Load L
Output
Halt
H, HEX D83D
L, HEX DE42
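
As a quick check of the two HEX values above, the standard UTF-16 formula combines them into code point U+1F642. The function name `combineSurrogates` is made up for this illustration.

```javascript
// Standard UTF-16 arithmetic; the function name is hypothetical.
function combineSurrogates(high, low) {
  // Each surrogate carries 10 bits of the code point, offset by 0x10000.
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const codePoint = combineSurrogates(0xd83d, 0xde42);
console.log(codePoint.toString(16).toUpperCase()); // 1F642
console.log(String.fromCodePoint(codePoint)); // the smiley from the example
```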

When set to Unicode, the output conversion function can now handle characters outside the Basic Multilingual Plane, using UTF-16. The output conversion function detects whether a value passed in is a high surrogate and saves it in a static variable. If the next value is a valid low surrogate, the function returns the corresponding Unicode character; otherwise it discards the input.

MARIE uses 16-bit two's complement integers. When the output is set to BIN, the original conversion routine assumes an unsigned integer, so negative 16-bit two's complement numbers are displayed as zeros. This patch adds signed-integer conversion and grouping functions so that negative numbers produce the correct bit pattern.
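
To illustrate the BIN fix, here is a minimal sketch of a signed-to-binary conversion with grouping. The name `toBinary16` and the default group size of 4 are assumptions for this example, not taken from the patch.

```javascript
// Hypothetical sketch; the patch's actual function names may differ.
function toBinary16(value, groupSize = 4) {
  // Masking with 0xffff maps a negative number to its two's complement bit pattern.
  const bits = (value & 0xffff).toString(2).padStart(16, "0");
  const groups = [];
  for (let i = 0; i < bits.length; i += groupSize) {
    groups.push(bits.slice(i, i + groupSize));
  }
  return groups.join(" ");
}

console.log(toBinary16(-1)); // 1111 1111 1111 1111
console.log(toBinary16(5)); // 0000 0000 0000 0101
```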

auroranil

Nice work! I lack knowledge when it comes to character encoding, but I hope that my review will improve the way students understand how UTF-16 works.

Also, you may want to relabel the "UNICODE" output mode to "UNICODE (UTF-16)" in src/templates/index.ejs.

One question: how would you enter high and low surrogate values in input? Would students have to type in their hexadecimal representation to store the value in memory?
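
On the arithmetic side of this question, the hexadecimal values for a pair of HEX directives can be derived from a code point with the standard UTF-16 split. This is only a sketch and says nothing about how the simulator's input box behaves; `toSurrogatePair` is a hypothetical name.

```javascript
// Standard UTF-16 split; the function name is hypothetical.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("code point is not outside the BMP");
  }
  const offset = codePoint - 0x10000; // 20 bits to distribute
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low].map((w) => w.toString(16).toUpperCase());
}

console.log(toSurrogatePair(0x1f642)); // [ 'D83D', 'DE42' ]
```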

jhannemann · 2020-06-10T16:55:47Z

I've now fully implemented UTF-16BE (big-endian) and extensively commented the code. I've also added an example that outputs "Hello World!" and some other Unicode characters. The example also shows how byte order marks (BOMs) are handled: they are ignored. The code should be much cleaner and more readable now.

This patch adds full support for Unicode in the UTF-16BE (big-endian) encoding. It ignores byte order marks and incorrect surrogate sequences. It also adds a Unicode program to the examples list, which prints "Hello World", a couple of emojis outside the Basic Multilingual Plane, and a copyright sign as an example of a character inside the BMP.
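
A rough sketch of the decoding rules described here, applied to a whole sequence of 16-bit words rather than one Output call at a time. It is an illustration, not the merged code: `decodeUtf16Words` is a hypothetical name, only 0xFEFF is treated as a byte order mark, and it assumes that an unpaired surrogate is dropped while the characters around it are kept.

```javascript
// Hypothetical sketch of the rules above, not the merged implementation.
function decodeUtf16Words(words) {
  let text = "";
  let pendingHigh = null; // high surrogate waiting for its low partner

  for (const raw of words) {
    const word = raw & 0xffff;
    if (word >= 0xd800 && word <= 0xdbff) {
      pendingHigh = word; // replaces any earlier unpaired high surrogate
      continue;
    }
    if (word >= 0xdc00 && word <= 0xdfff) {
      // A low surrogate is only valid directly after a high surrogate.
      if (pendingHigh !== null) {
        text += String.fromCharCode(pendingHigh, word);
      }
      pendingHigh = null;
      continue;
    }
    pendingHigh = null; // assumption: an unpaired high surrogate is dropped
    if (word !== 0xfeff) {
      text += String.fromCharCode(word); // BMP character; the BOM is skipped
    }
  }
  return text;
}

// BOM, "Hi", a surrogate pair, and a copyright sign from inside the BMP.
console.log(decodeUtf16Words([0xfeff, 0x48, 0x69, 0xd83d, 0xde42, 0xa9]));
```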

auroranil

Looks good to me. Good job on making a Unicode example that demonstrates the new feature!

jhannemann added 2 commits June 9, 2020 16:30

jhannemann requested review from auroranil and cyderize as code owners June 9, 2020 20:33

jhannemann mentioned this pull request Jun 9, 2020

Unicode Output (UTF-16) is not handled correctly #288

Closed

auroranil requested changes Jun 9, 2020

jhannemann requested a review from auroranil June 10, 2020 16:56

jhannemann force-pushed the utf-16-output branch from 976a91a to a0547f4 Compare June 10, 2020 17:02

auroranil approved these changes Jun 11, 2020

auroranil merged commit 04ca137 into MARIE-js:master Jun 11, 2020

jhannemann deleted the utf-16-output branch June 11, 2020 03:26

jhannemann restored the utf-16-output branch June 11, 2020 03:35

ericjiang97 mentioned this pull request Apr 15, 2021

Suggesting a quine as an example demo program #303

Closed
