Onramp Assembly is an assembly language used to produce object code for the Onramp virtual machine.
This document provides the specification of the assembly language. A description of the bootstrapping stages and their implementation is here.
Onramp assembly has two different kinds of instructions: primitive instructions and compound instructions.
Primitive instructions are the instructions supported by the Onramp virtual machine. They are translated directly to their corresponding Onramp bytecode.
Compound instructions are converted by the assembler into one or more primitive instructions during assembly. For example, the compound instruction ret
is converted to the primitive instruction ldw rip 0 rsp
which is output as the bytecode 78 8F 00 8C
.
You can think of compound instructions as macros, except they are not defined in the assembly source; they are defined in the assembler program itself. (Unlike modern assemblers, there is no way to define macros in Onramp assembly.)
The first stage Onramp assembler recognizes only the following primitive instructions:
- Arithmetic:
add
,sub
,mul
,div
- Logic:
and
,or
,shl
,shru
- Memory:
ldw
,stw
,ldb
,stb
- Control:
ims
,cmpu
,jz
,sys
Each of these instructions takes three bytes of arguments and maps directly to an instruction in Onramp bytecode. The behaviour and required arguments of each is documented below.
In the first stage assembler, arguments can be passed as register or syscall names, as characters (provided they are in range), or as raw bytes.
In the full assembler, arguments can additionally be passed as numbers (decimal or hexadecimal) and a number argument may represent multiple bytes (for example as the 16-bit argument to ims
or jz
.)
The Onramp assembler recognizes the following register names:
r0
,r1
,r2
,r3
r4
,r5
,r6
,r7
r8
,r9
,ra
,rb
rfp
,rsp
,rpp
,rip
Register names are replaced by their literal bytes (80
through 8F
.) These must only be used as the argument to an instruction where a register is accepted. The early stage assemblers may not verify that a register name makes sense in the context in which it is used.
Raw hex bytes can be output by prefixing each one with '
.
'<hex>
: A raw hex byte.
The '
must be followed by two hexadecimal characters with no whitespace in between. The hexadecimal byte is output as-is (in plain text) without the leading '
.
When outputting multiple bytes, each one must be prefixed with '
.
;
starts a comment. The rest of the line is ignored.
#
starts a debug info annotation that runs until the end of the line. The assembler may pass the annotation verbatim to the output, or it may ignore it (treating it as a comment.)
Valid debug info consists of #line
and #pragma
directives. See the Debug Info document for a description of the debug info format.
The Onramp assembly language uses the same syntax for labels and symbols as Onramp object code. Unlike most assemblers, the Onramp assembler does not actually resolve labels. They are passed directly through to the linker.
Here's a quick reference table:
Syntax | Meaning |
---|---|
^ |
invocation: absolute, 32 bits |
< |
invocation: absolute, 16 high bits |
> |
invocation: absolute, 16 low bits |
& |
invocation: relative, 16-bit signed words |
: |
definition: label |
= |
definition: global symbol |
@ |
definition: static symbol |
+ |
flag: zero symbol |
? |
flag: weak definition or invocation |
{ |
flag: constructor, optional priority 0-65535 |
} |
flag: destructor, optional priority 0-65535 |
For full usage details, see the Onramp object code spec.
String literals can be provided by enclosing text in double quotes ("
).
"Hello world!"
The bytes of the string are output directly by the assembler.
You can concatenate strings and raw bytes however you like. The above is equivalent to:
"Hello" '20 "world" '21
A string can contain any printable ASCII characters except for the double-quote ("
) and the backslash (\
). There are no escape sequences. To add other bytes to a string, concatenate it with the raw hex values as above.
No null-terminator is appended. If you want one you have to append it ('00
) yourself.
Typically you would place the string in its own symbol so it can be referenced by the rest of the code. For example, here's a typical C string with a trailing line feed and null-terminator:
=hello_world
"Hello world!" '0A '00
Single-character strings can be used as instruction arguments. For example, to place the character 'a' in register r0:
mov r0 "a"
Instructions consist of an opcode and a number of arguments. Arguments can be integers, registers, quoted bytes, or single-character strings.
There are no commas separating arguments and whitespace is not significant (except where required to separate arguments.)
For most instructions (but not all), the destination register comes first, followed by the source values. For a few instructions (the stores and jumps), the source value comes first. This is done for simplicity and for symmetry with loads.
Compound instructions can assemble to different numbers of primitive instructions depending on the arguments and on the stage of the assembler. You should not assume the number of primitive instructions into which a compound instruction will expand.
Arguments have the following types:
r
: A registerm
: A mix-type byte, either a register or an immediate value in the range [-112,127]b
: An immediate 8-bit bytei
: An immediate 32-bit integer or equivalent (e.g. an absolute linker invocation, a number, four literal bytes)c
: An absolute linker invocation (i.e.^
and a symbol name)j
: A relative linker invocation (i.e.&
and a jump target)
The "Pr." column indicates which instructions are primitive. Only primitive instructions can use the scratch registers (ra
and rb
) as they are clobbered by most compound instructions.
Arithmetic:
Pr. | Opcode | Arguments | Description |
---|---|---|---|
x | add |
<r:dest> <m:src1> <m:src2> |
Adds src1 and src2, unsigned overflow |
x | sub |
<r:dest> <m:src1> <m:src2> |
Subtracts src2 from src1, unsigned underflow |
x | mul |
<r:dest> <m:src1> <m:src2> |
Multiplication |
x | divu |
<r:dest> <m:src1> <m:src2> |
Divides src1 by src2 unsigned |
divs |
<r:dest> <m:src1> <m:src2> |
Divides src1 by src2 signed | |
modu |
<r:dest> <m:src1> <m:src2> |
Modulus of src1 divided by src2 unsigned | |
mods |
<r:dest> <m:src1> <m:src2> |
Modulus of src1 divided by src2 signed | |
zero |
<r:dest> |
Sets the register to zero | |
inc |
<r:reg> |
Increments the register, unsigned overflow | |
dec |
<r:reg> |
Decrements the register, unsigned underflow | |
sxs |
<r:dest> <m:src> |
Sign-extends a short value (copies bit 15 to upper 16 bits) | |
sxb |
<r:dest> <m:src> |
Sign-extends a byte value (copies bit 7 to upper 24 bits) | |
trs |
<r:dest> <m:src> |
Truncates the value to a short (zeroes upper 16 bits) | |
trb |
<r:dest> <m:src> |
Truncates the value to a byte (zeroes upper 24 bits) |
Logic:
Pr. | Opcode | Arguments | Description |
---|---|---|---|
x | and |
<r:dest> <m:src1> <m:src2> |
Bitwise and |
x | or |
<r:dest> <m:src1> <m:src2> |
Bitwise or |
xor |
<r:dest> <m:src1> <m:src2> |
Bitwise xor | |
not |
<r:dest> <m:src> |
Bitwise not (inverts all bits) | |
x | shl |
<r:dest> <m:src1> <m:src2> |
Bitwise shift left (low to high) |
x | shru |
<r:dest> <m:src1> <m:src2> |
Bitwise logical shift right (unsigned, high to low) |
shrs |
<r:dest> <m:src1> <m:src2> |
Bitwise arithmetic shift right (signed, high to low) | |
rol |
<r:dest> <m:src1> <m:src2> |
Bitwise rotate left (low to high) | |
ror |
<r:dest> <m:src1> <m:src2> |
Bitwise rotate right (high to low) | |
mov |
<r:dest> <m:src> |
Copies src to dest | |
bool |
<r:dest> <m:src> |
Sets dest to 1 if src is non-zero, 0 otherwise | |
isz |
<r:dest> <m:src> |
Sets dest to 0 if src is non-zero, 1 otherwise |
Memory:
Pr. | Opcode | Arguments | Description |
---|---|---|---|
x | ldw |
<r:dest> <m:base> <m:offset> |
Loads a 4-byte word from memory (aligned) |
lds |
<r:dest> <m:base> <m:offset> |
Loads a 2-byte short from memory (aligned), zeroes upper 16 bits | |
x | ldb |
<r:dest> <m:base> <m:offset> |
Loads a byte from memory, zeroes upper 24 bits |
x | stw |
<m:value> <m:base> <m:offset> |
Stores a 4-byte word in memory (aligned) |
sts |
<m:value> <m:base> <m:offset> |
Stores a 2-byte short in memory (aligned), ignores upper 16 bits | |
x | stb |
<m:value> <m:base> <m:offset> |
Stores a byte in memory, ignores upper 24 bits |
push |
<m:value> |
Pushes a value to the stack | |
pop |
<r:reg> |
Pops the top stack value into the register | |
popd |
none | Pops the top stack value and discards it |
Control:
Pr. | Opcode | Arguments | Description |
---|---|---|---|
x | ims |
<r:reg> <b:high> <b:low> |
Shifts register up 16 bits, then loads a 16-bit immediate |
imw |
<r:reg> <i:value> |
Loads a 32-bit immediate | |
x * | cmpu |
<r:dest> <m:src1> <m:src2> |
Compares src1 to src2 unsigned, placing -1, 0 or 1 in dest |
* | cmps |
<r:dest> <m:src1> <m:src2> |
Compares src1 to src2 signed, placing -1, 0 or 1 in dest |
* | ltu |
<r:dest> <m:src1> <m:src2> |
Places 1 in dest if src1 is less than src2 unsigned, 0 otherwise |
lts |
<r:dest> <m:src1> <m:src2> |
Places 1 in dest if src1 is less than src2 signed, 0 otherwise | |
x | jz |
<m:pred> <j:label> |
Jumps if the predicate is zero |
jnz |
<m:pred> <j:label> |
Jumps if the predicate is not zero | |
jmp |
<j:label> or <c:function> |
Jumps unconditionally | |
call |
<c:function> |
Calls a function (pushing the return address to the stack) | |
ret |
none | Returns from a function call | |
enter |
none | Creates a stack frame | |
leave |
none | Destroys the current stack frame | |
x | sys |
<b:number> '00 '00 |
Performs a system call |
* WARNING: sys
will be removed soon.