Dealing with high-bit-set chars #7

Open
0cjs opened this issue Jul 25, 2019 · 2 comments


0cjs commented Jul 25, 2019

The disassembly project I'm working on right now involves a program that sets the high bit of a string's last character to mark the end of the string. This is currently a little awkward in the disassembly because the final byte comes out as something like FCB $C4, which isn't very readable.
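
For readers unfamiliar with the convention, a minimal C sketch of how such a string is delimited might look like the following. The function name `hibit_strlen` and its bounds-checking behaviour are assumptions made for illustration; nothing like it is claimed to exist in f9dasm.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from f9dasm: returns the length of a
 * string whose final character has bit 7 set, scanning at most max bytes.
 * Returns 0 if no terminating byte is found within max bytes. */
size_t hibit_strlen(const unsigned char *p, size_t max)
{
    size_t i;

    for (i = 0; i < max; i++)
        if (p[i] & 0x80)        /* high bit marks the final character */
            return i + 1;
    return 0;
}
```

The actual character is recovered by masking with $7F, so a final byte of $C4 stands for 'D'.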

To see if this could be improved, I tried out the following little patch. (It's not ready for production use: at the very least it's missing the RB_VARIANT version of the code. But I wanted to start a discussion before putting too much work into it.)

diff --git a/f9dasm.c b/f9dasm.c
index e876e9b..343c74c 100644
--- a/f9dasm.c
+++ b/f9dasm.c
@@ -1896,7 +1896,10 @@ if ((nDigits == 2) &&                   /* if 2-digit value                  */
     sprintf(s, "'%c", W);
 #endif
   else
-    sprintf(s, "$%02x", W);
+    if (isprint(W & 0x7f))
+      sprintf(s, "'%c'|$80", W & 0x7f);
+    else
+      sprintf(s, "$%02x", W);
   }
 else if (IS_BINARY(addr))               /* if a binary                       */
   {

This offers two different solutions to the problem. You can either still produce a separate FCB line, just with a more readable version of that char:

FCC     "MY STRIN"
FCB     'G'|$80

or you can annotate the whole range as char, which works very well for a list of short strings if you use lcomment annotations to split the lines at the appropriate spots:

FCB     'F,'O,'R'|$80            ;015F: 46 4F D2       'FO.'   FOR
FCB     'N,'E,'X,'T'|$80         ;0162: 4E 45 58 D4    'NEX.'  NEXT
FCB     'D,'A,'T,'A'|$80         ;0166: 44 41 54 C1    'DAT.'  DATA
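
The per-byte rule that produces the operands above can be sketched as a standalone C function. This is only an illustration: `format_char_byte` is a hypothetical name, and unlike the patch (which relies on the preceding branch in f9dasm.c having already handled plain printable characters) the sketch tests the high bit explicitly.

```c
#include <ctype.h>
#include <stdio.h>

/* Illustrative stand-in for the patched branch, not f9dasm code.
 * Writes the operand text for the byte w into s (at most n bytes). */
void format_char_byte(char *s, size_t n, unsigned w)
{
    w &= 0xff;
    if (w < 0x80 && isprint((int)w))
        snprintf(s, n, "'%c", (int)w);                /* plain character constant */
    else if (w >= 0x80 && isprint((int)(w & 0x7f)))
        snprintf(s, n, "'%c'|$80", (int)(w & 0x7f));  /* printable once bit 7 is cleared */
    else
        snprintf(s, n, "$%02x", w);                   /* everything else stays hex */
}
```

With this rule the bytes 46 4F D2 become 'F, 'O and 'R'|$80, while a byte such as $8D, which is not printable even with bit 7 cleared, is still shown in hex.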

This little hack has been working well for me, but there may be more that could be done. It would be nice to be able to have the disassembler emit FCC syntax for these:

FCC     "MY STRIN",'G'|$80,"ANOTHER STRIN",'G'|$80

but I don't know if that's actually valid for most assemblers out there. It also might or might not make sense to make this optional, say by adding something like a hichar annotation that marks which bytes are treated this way. Or maybe just an option to turn this on and off for the full disassembly, or for certain areas of it, would be enough.

But getting much more sophisticated than the patch above would also be a fair amount of work, probably more than I want to take on given the current state of the code, the lack of a testing framework, the concomitant difficulty of refactoring, etc.

What do you guys think?


buchty commented Jul 25, 2019 via email


0cjs commented Jul 25, 2019

...It would be nice to be able to have the disassembler emit ...FCC "MY STRIN",'G'|$80,"ANOTHER STRIN",'G'|$80...

Alfred Arnold's AS supports it (as just confirmed by him) -- and that's all I care about :) Rainer

Well, I'd be pleased to have it, but I don't think that there's any code to do even the basic version of this (e.g., FCC "HELLO",$0D,$0A,"WORLD",$0D,$0A,$00) right now, is there?
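
As a starting point for discussion, building such a mixed FCC operand list could be sketched roughly as below. Everything here is an assumption for illustration: `format_fcc_operands` is a hypothetical function, it is not based on existing f9dasm code, and the output syntax is only known from this thread to be accepted by AS. Quoting edge cases such as a high-bit apostrophe are not handled.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, not f9dasm code: builds an FCC operand list from raw
 * bytes. Printable bytes are grouped into "..." runs, a byte that is
 * printable once bit 7 is cleared becomes 'c'|$80, and anything else is
 * written as $XX. Returns 0 on success, -1 if out is too small. */
int format_fcc_operands(const unsigned char *p, size_t len, char *out, size_t n)
{
    size_t used = 0, i;
    int in_quotes = 0;
    char item[16];

    if (n == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        unsigned b = p[i];
        int plain = b < 0x80 && isprint((int)b) && b != '"';

        if (plain) {
            /* open a new quoted run if we are not already inside one */
            const char *open = in_quotes ? "" : (used ? ",\"" : "\"");
            snprintf(item, sizeof item, "%s%c", open, (int)b);
            in_quotes = 1;
        } else {
            /* close any open run, then emit the byte as its own operand */
            const char *sep = in_quotes ? "\"," : (used ? "," : "");
            if (b >= 0x80 && isprint((int)(b & 0x7f)))
                snprintf(item, sizeof item, "%s'%c'|$80", sep, (int)(b & 0x7f));
            else
                snprintf(item, sizeof item, "%s$%02X", sep, b);
            in_quotes = 0;
        }
        if (used + strlen(item) + 2 > n)   /* keep room for closing quote and NUL */
            return -1;
        strcpy(out + used, item);
        used += strlen(item);
    }
    if (in_quotes)
        strcpy(out + used, "\"");
    return 0;
}
```

A real implementation would also need to respect line-length limits and lcomment splits, which this sketch ignores.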
