ArkScript bytecode specification
This page describes the ArkScript bytecode specification. It is useful if you are interested in implementing your own virtual machine, or just want to learn more.
ArkScript bytecode headers
Name | Size | Description |
---|---|---|
Magic number | 4 bytes | 6386283, numeric version of "ark\0" |
Compiler.Major | 2 bytes | Big endian layout |
Compiler.Minor | 2 bytes | Big endian layout |
Compiler.Patch | 2 bytes | Big endian layout |
Timestamp | 8 bytes | Build time (Unix format), Big endian layout |
SHA256 | 32 bytes | SHA256 of the tables and code segments for integrity check |
Symbols table | ||
Symbols.count | 2 bytes | Big endian layout |
Symbol.value | Variable | Null-terminated string |
Values table | ||
Values.count | 2 bytes | Big endian layout |
Value.type | 1 byte | 1 for number, 2 for string, 3 for function |
Number.value | Variable | Null-terminated string representation of the number |
String.value | Variable | Null-terminated string |
Function.value | 2 bytes | Big endian layout |
Code segments | ||
Instruction count | 2 bytes | Big endian layout, can be 0 |
Instruction | 4 bytes | Layout described below |
Instructions with a single immediate argument follow this layout: `iiiiiiii pppppppp dddddddd dddddddd`, with `i` for the instruction, `p` for padding (ignored), and `d` for the immediate argument.
Super Instructions, with two immediate arguments, follow this layout: `iiiiiiii ssssssss ssssxxxx xxxxxxxx`, with `s` for the secondary argument (on 12 bits) and `x` for the primary argument (on 12 bits as well).
Using this representation, computing the primary argument is as easy as `arg_16_bits & 0x0fff`, where `arg_16_bits` is the 16-bit immediate argument, decoded as for an instruction with a single argument.
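The header fields and the two instruction layouts above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the magic number is assumed to be stored as the four raw bytes `ark\0`, and instruction bytes are assumed to appear in the file in the order the layouts show. `Header`, `parse_header`, `decode_instruction` and `decode_super_instruction` are hypothetical names, not part of ArkScript. The symbols table, values table and code segments are deliberately not parsed here.

```python
import struct
from typing import NamedTuple

# Assumption: the magic number is stored as these four raw bytes.
# Note that 6386283 == 0x61726B, i.e. the bytes 'a', 'r', 'k' read as one number.
MAGIC = b"ark\0"


class Header(NamedTuple):
    version: tuple  # (major, minor, patch)
    timestamp: int  # build time, Unix format
    sha256: bytes   # 32-byte digest of the tables and code segments


def parse_header(data: bytes) -> Header:
    """Read the fixed-size fields that precede the symbols table."""
    if data[:4] != MAGIC:
        raise ValueError("missing ArkScript magic number")
    # ">" = big endian, H = 2 bytes unsigned, Q = 8 bytes unsigned, 32s = 32 raw bytes
    major, minor, patch, timestamp, digest = struct.unpack_from(">HHHQ32s", data, 4)
    return Header((major, minor, patch), timestamp, digest)


def decode_instruction(word: bytes) -> tuple:
    """Single-argument layout: iiiiiiii pppppppp dddddddd dddddddd."""
    opcode, _padding, arg = struct.unpack(">BBH", word)
    return opcode, arg


def decode_super_instruction(word: bytes) -> tuple:
    """Two-argument layout: iiiiiiii ssssssss ssssxxxx xxxxxxxx."""
    opcode = word[0]
    bits = int.from_bytes(word[1:4], "big")  # 24 bits: 12 secondary, then 12 primary
    primary = bits & 0x0FFF
    secondary = (bits >> 12) & 0x0FFF
    return opcode, primary, secondary
```

Decoding the same word with `decode_instruction` and masking the result with `0x0fff` yields the primary argument, which matches the shortcut described above.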
Note on builtins
Builtins are handled with `BUILTIN id`, with `id` being the id of the builtin function object. The ids of the builtins are listed below.
Name | ID |
---|---|
false | 0 |
true | 1 |
nil | 2 |
The other builtins are listed in Builtins.cpp.
The stack and the locals
The stack is used for passing temporary values around, for example the arguments of a function. The locals, on the other hand, are there to store long-term values: the variables.
They are stored in a LIFO stack and should be referenced by their identifier (their index in the symbols table, also used by instructions like `LOAD_SYMBOL`).
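The storage of variables in a LIFO stack, indexed by symbol id, can be illustrated with a small Python sketch. The `Scopes` class and its method names are hypothetical and only mirror the behaviour the instruction table gives for `STORE`, `SET_VAL`, `LOAD_SYMBOL`, `CREATE_SCOPE` and `POP_SCOPE`. The real virtual machine may organise its scopes differently.

```python
class Scopes:
    """LIFO stack of scopes; each scope maps a symbol id to a value."""

    def __init__(self):
        self.scopes = [{}]  # the last element is the current scope

    def create_scope(self):
        self.scopes.append({})

    def pop_scope(self):
        self.scopes.pop()

    def store(self, symbol_id, value):
        # STORE-like: always create the variable in the current scope
        self.scopes[-1][symbol_id] = value

    def set_val(self, symbol_id, value):
        # SET_VAL-like: update the nearest scope that already has the variable
        for scope in reversed(self.scopes):
            if symbol_id in scope:
                scope[symbol_id] = value
                return
        raise NameError(f"unbound symbol id {symbol_id}")

    def load(self, symbol_id):
        # LOAD_SYMBOL-like: search from the innermost scope outwards
        for scope in reversed(self.scopes):
            if symbol_id in scope:
                return scope[symbol_id]
        raise NameError(f"unbound symbol id {symbol_id}")


scopes = Scopes()
scopes.store(0, "outer")     # symbol id 0 in the global scope
scopes.create_scope()
scopes.store(1, "inner")     # symbol id 1 only exists in the new scope
scopes.set_val(0, "updated") # found in the enclosing scope
scopes.pop_scope()
print(scopes.load(0))        # updated
```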
Function calling convention
If we want to call a function `foo`, e.g. by writing `(foo 1 2 3)`, the arguments will be pushed in reverse order on the stack.
First, push 3, then 2, then 1.
In the end, our stack looks like this:
1 <-- Top of the stack
2
3
... <-- Bottom of the stack
Hence, we can retrieve the arguments in the correct order. However, this has the effect of inverting the order of evaluation of the arguments. If we pass expressions to our function, as in `(foo (+ 1 2) (* 3 4) (- 5 6))`, the expression `(- 5 6)` will be evaluated first, while `(+ 1 2)` will be evaluated last.
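The effect on evaluation order can be reproduced with a short Python sketch. `evaluate_call` is a hypothetical helper: each argument is modelled as a label plus a zero-argument function, so that the order in which the expressions are evaluated can be recorded.

```python
def evaluate_call(arguments):
    """arguments: list of (label, thunk) pairs, in source order."""
    stack, order = [], []
    # Push in reverse order: the last argument is evaluated first
    for label, thunk in reversed(arguments):
        order.append(label)
        stack.append(thunk())
    # The callee pops from the top: the first value popped is the first argument
    received = [stack.pop() for _ in arguments]
    return received, order


received, order = evaluate_call([
    ("(+ 1 2)", lambda: 1 + 2),
    ("(* 3 4)", lambda: 3 * 4),
    ("(- 5 6)", lambda: 5 - 6),
])
print(received)  # [3, 12, -1]
print(order)     # ['(- 5 6)', '(* 3 4)', '(+ 1 2)']
```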
Instructions
`TS` represents the element at the top of the stack, `TS1` represents the element below it, and so on.
Code | Argument(s) | Job |
---|---|---|
NOP (0x00) | | Does nothing, useful for padding |
LOAD_SYMBOL (0x01) | symbol id | Load a symbol from its ID onto the stack |
LOAD_CONST (0x02) | constant id | Load a constant from its ID onto the stack |
POP_JUMP_IF_TRUE (0x03) | absolute address to jump to | Jump to the provided address if the last value on the stack was equal to true. Remove the value from the stack no matter what it is |
STORE (0x04) | symbol id | Take the value on top of the stack and create a variable in the current scope, named following the given symbol id (cf symbols table) |
SET_VAL (0x05) | symbol id | Take the value on top of the stack and put it inside a variable named following the symbol id (cf symbols table), in the nearest scope. Raise an error if it couldn't find a scope where the variable exists |
POP_JUMP_IF_FALSE (0x06) | absolute address to jump to | Jump to the provided address if the last value on the stack was equal to false. Remove the value from the stack no matter what it is |
JUMP (0x07) | absolute address to jump to | Jump to the provided address |
RET (0x08) | | If in a code segment other than the main one, quit it, and push the value on top of the stack to the new stack; should as well delete the current environment. Otherwise, acts as a HALT |
HALT (0x09) | | Stop the Virtual Machine |
CALL (0x0a) | argument count | Call function from its symbol id located on top of the stack. Take the given number of arguments from the top of stack and give them to the function (the first argument taken from the stack will be the last one of the function). The stack of the function is now composed of its arguments, from the first to the last one |
CAPTURE (0x0b) | symbol id | Tell the Virtual Machine to capture the variable from the current environment. Main goal is to be able to handle closures, which need to save the environment in which they were created |
BUILTIN (0x0c) | builtin id | Push the corresponding builtin function object on the stack |
DEL (0x0d) | symbol id | Remove a variable/constant named following the given symbol id (cf symbols table) |
MAKE_CLOSURE (0x0e) | constant id | Push a Closure with the page address pointed by the constant, along with the saved scope created by CAPTURE instruction(s) |
GET_FIELD (0x0f) | symbol id | Read the field named following the given symbol id (cf symbols table) of a Closure stored in TS. Pop TS and push the value of field read on the stack |
PLUGIN (0x10) | constant id | Load a plugin dynamically, plugin name is stored as a string in the constants table |
LIST (0x11) | number of elements | Create a list from the N elements pushed on the stack. Follows the function calling convention |
APPEND (0x12) | number of elements | Append N elements to a list (TS). Elements are stored in TS(1)..TS(N). Follows the function calling convention |
CONCAT (0x13) | number of elements | Concatenate N lists to a list (TS). Lists to concat to TS are stored in TS(1)..TS(N). Follows the function calling convention |
APPEND_IN_PLACE (0x14) | number of elements | Append N elements to a reference to a list (TS), the list is being mutated in-place, no new object created. Elements are stored in TS(1)..TS(N). Follows the function calling convention |
CONCAT_IN_PLACE (0x15) | number of elements | Concatenate N lists to a reference to a list (TS), the list is being mutated in-place, no new object created. Lists to concat to TS are stored in TS(1)..TS(N). Follows the function calling convention |
POP_LIST (0x16) | | Remove an element from a list (TS), given an index (TS1). Push a new list without the removed element to the stack |
POP_LIST_IN_PLACE (0x17) | | Remove an element from a reference to a list (TS), given an index (TS1). The list is mutated in-place, no new object created |
SET_AT_INDEX (0x18) | | Modify a reference to a list or string (TS) by replacing the element at TS1 (must be a number) by the value in TS2. The object is mutated in-place, no new object created |
SET_AT_2_INDEX (0x19) | | Modify a reference to a list (TS) by replacing TS[TS2][TS1] by the value in TS3. TS[TS2] can be a string (if it is, TS3 must be a string). The object is mutated in-place, no new object created |
POP (0x1a) | | Remove the top of the stack |
DUP (0x1b) | | Duplicate the top of the stack |
CREATE_SCOPE (0x1c) | | Create a new local scope |
POP_SCOPE (0x1d) | | Destroy the last local scope |
ADD (0x1e) | | Push TS1 + TS |
SUB (0x1f) | | Push TS1 - TS |
MUL (0x20) | | Push TS1 * TS |
DIV (0x21) | | Push TS1 / TS |
GT (0x22) | | Push TS1 > TS |
LT (0x23) | | Push TS1 < TS |
LE (0x24) | | Push TS1 <= TS |
GE (0x25) | | Push TS1 >= TS |
NEQ (0x26) | | Push TS1 != TS |
EQ (0x27) | | Push TS1 == TS |
LEN (0x28) | | Push len(TS), TS must be a list |
EMPTY (0x29) | | Push empty?(TS), TS must be a list or string |
TAIL (0x2a) | | Push tail(TS), all the elements of TS except the first one. TS must be a list or string |
HEAD (0x2b) | | Push head(TS), the first element of TS or nil if empty. TS must be a list or string |
ISNIL (0x2c) | | Push true if TS is nil, false otherwise |
ASSERT (0x2d) | | Throw an exception if TS1 is false, and display TS (must be a string). Do not push anything on the stack |
TO_NUM (0x2e) | | Convert TS to number (must be a string) |
TO_STR (0x2f) | | Convert TS to string |
AT (0x30) | | Push the value at index TS (must be a number) in TS1, which must be a list or string |
AT_AT (0x31) | | Push the value at index TS (must be a number), inside the list or string at index TS1 (must be a number) in the list at TS2 |
MOD (0x32) | | Push TS1 % TS |
TYPE (0x33) | | Push the type of TS as a string |
HASFIELD (0x34) | | Check if TS1 is a closure field of TS. TS must be a Closure, TS1 a String |
NOT (0x35) | | Push !TS |
LOAD_CONST_LOAD_CONST (0x36) | constant id, constant id | Load two consts (primary then secondary) on the stack in one instruction |
LOAD_CONST_STORE (0x37) | constant id, symbol id | Load const primary into the symbol secondary (create a variable) |
LOAD_CONST_SET_VAL (0x38) | constant id, symbol id | Load const primary into the symbol secondary (search for the variable with the given symbol id) |
STORE_FROM (0x39) | symbol id, symbol id | Store the value of the symbol primary into a new variable secondary |
SET_VAL_FROM (0x3a) | symbol id, symbol id | Store the value of the symbol primary into an existing variable secondary |
INCREMENT (0x3b) | symbol id, count | Increment the variable primary by count and push its value on the stack |
DECREMENT (0x3c) | symbol id, count | Decrement the variable primary by count and push its value on the stack |
STORE_TAIL (0x3d) | symbol id, symbol id | Load the symbol primary, compute its tail, store it in a new variable secondary |
STORE_HEAD (0x3e) | symbol id, symbol id | Load the symbol primary, compute its head, store it in a new variable secondary |
SET_VAL_TAIL (0x3f) | symbol id, symbol id | Load the symbol primary, compute its tail, store it in an existing variable secondary |
SET_VAL_HEAD (0x40) | symbol id, symbol id | Load the symbol primary, compute its head, store it in an existing variable secondary |
CALL_BUILTIN (0x41) | builtin id, argument count | Call a builtin by its id in primary, with secondary arguments. Bypass the stack size check because we do not push IP/PP, since builtin calls do not alter the stack |
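To make the `TS` / `TS1` convention concrete, here is a minimal Python sketch of an interpreter loop for a handful of the instructions above. It is an illustration under several assumptions, not the ArkScript virtual machine: instructions are given as already decoded `(opcode, argument)` tuples, jump addresses are assumed to be indices into that list, variables live in a single flat dictionary rather than in scopes, and `run` is a hypothetical name.

```python
import operator

# Opcodes taken from the table above
LOAD_SYMBOL, LOAD_CONST, STORE = 0x01, 0x02, 0x04
POP_JUMP_IF_FALSE, JUMP, HALT = 0x06, 0x07, 0x09
ADD, SUB, GT = 0x1E, 0x1F, 0x22

BINARY = {ADD: operator.add, SUB: operator.sub, GT: operator.gt}


def run(code, constants):
    """Execute decoded (opcode, argument) pairs; return the stack and variables."""
    stack, variables, ip = [], {}, 0
    while ip < len(code):
        op, arg = code[ip]
        ip += 1
        if op == LOAD_CONST:
            stack.append(constants[arg])
        elif op == LOAD_SYMBOL:
            stack.append(variables[arg])
        elif op == STORE:
            variables[arg] = stack.pop()
        elif op == POP_JUMP_IF_FALSE:
            # The value is popped whether or not the jump is taken
            if stack.pop() is False:
                ip = arg
        elif op == JUMP:
            ip = arg
        elif op == HALT:
            break
        elif op in BINARY:
            ts = stack.pop()   # TS: top of the stack
            ts1 = stack.pop()  # TS1: the element below it
            stack.append(BINARY[op](ts1, ts))
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")
    return stack, variables


constants = [10, 3, 5]
code = [
    (LOAD_CONST, 0),          # 0: push 10
    (LOAD_CONST, 1),          # 1: push 3
    (SUB, 0),                 # 2: push TS1 - TS = 10 - 3
    (STORE, 0),               # 3: symbol 0 = 7
    (LOAD_SYMBOL, 0),         # 4: push 7
    (LOAD_CONST, 2),          # 5: push 5
    (GT, 0),                  # 6: push 7 > 5
    (POP_JUMP_IF_FALSE, 10),  # 7: not taken, the condition is true
    (LOAD_CONST, 0),          # 8: push 10
    (JUMP, 11),               # 9: skip the else branch
    (LOAD_CONST, 1),          # 10: push 3
    (HALT, 0),                # 11: stop
]
stack, variables = run(code, constants)
print(stack, variables)  # [10] {0: 7}
```

Popping `TS` before `TS1` is what makes `SUB` compute `10 - 3` rather than `3 - 10`, matching the "Push TS1 - TS" entry in the table.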