ArkScript bytecode specification
You will find the ArkScript bytecode specification on this page, if you are interested in implementing your own virtual machine or just want to learn more.
ArkScript bytecode headers
Name | Size | Description |
---|---|---|
Magic number | 4 bytes | 6386283, numeric version of "ark\0" |
Compiler.Major | 2 bytes | Big endian layout |
Compiler.Minor | 2 bytes | Big endian layout |
Compiler.Patch | 2 bytes | Big endian layout |
Timestamp | 8 bytes | Build time (Unix format), Big endian layout |
SHA256 | 32 bytes | SHA256 of the tables and code segments for integrity check |
Symbols table | ||
Symbols.count | 2 bytes | Big endian layout |
Symbol.value | Variable | Null-terminated string |
Values table | ||
Values.count | 2 bytes | Big endian layout |
Value.type | 1 byte | 1 for number, 2 for string, 3 for function |
Number.value | Variable | Null-terminated string representation of the number |
String.value | Variable | Null-terminated string |
Function.value | 2 bytes | Big endian layout |
Code segments | ||
Instruction count | 2 bytes | Big endian layout, can be 0 |
Instruction | 4 bytes | Instructions follow this layout: pppppppp iiiiiiii dddddddd dddddddd ; p for padding (always ignored), i for the instruction, d for the immediate argument |
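The layout in the table above can be read with a few `struct` calls. The following is a minimal sketch, not the reference implementation: the helper names (`parse_header`, `parse_symbols`, `decode_instruction`) are hypothetical, the magic number is assumed to be the four bytes `a`, `r`, `k`, `\0` as the description suggests, and strings are assumed to be UTF-8. It only handles the fields listed in the table.

```python
import struct
from typing import NamedTuple

# Fixed-size part of the header, big endian as stated in the table:
# 4-byte magic, three 2-byte version fields, 8-byte timestamp, 32-byte SHA256.
HEADER = struct.Struct(">4sHHHQ32s")


class Header(NamedTuple):
    major: int
    minor: int
    patch: int
    timestamp: int
    sha256: bytes


def parse_header(data):
    """Return (Header, offset of the first byte after the header)."""
    magic, major, minor, patch, timestamp, digest = HEADER.unpack_from(data, 0)
    # Assumption: the magic number is stored as the bytes "ark" plus a null byte.
    if magic != b"ark\0":
        raise ValueError("not an ArkScript bytecode file")
    return Header(major, minor, patch, timestamp, digest), HEADER.size


def read_u16(data, offset):
    """Read a 2-byte big-endian count and return (value, new offset)."""
    (value,) = struct.unpack_from(">H", data, offset)
    return value, offset + 2


def read_cstring(data, offset):
    """Read a null-terminated string and return (text, new offset)."""
    end = data.index(b"\0", offset)
    # Assumption: strings are UTF-8 encoded.
    return data[offset:end].decode("utf-8"), end + 1


def parse_symbols(data, offset):
    """Read Symbols.count, then that many null-terminated symbol names."""
    count, offset = read_u16(data, offset)
    symbols = []
    for _ in range(count):
        name, offset = read_cstring(data, offset)
        symbols.append(name)
    return symbols, offset


def decode_instruction(word):
    """Split a 4-byte instruction into (opcode, immediate); padding is ignored."""
    _padding, opcode, immediate = struct.unpack(">BBH", word)
    return opcode, immediate
```

For example, `decode_instruction(bytes([0x00, 0x02, 0x00, 0x05]))` returns `(0x02, 5)`, which the instruction table reads as `LOAD_CONST` with constant id 5.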
Note on builtins
Builtins are handled with BUILTIN id, with id being the id of the builtin function object. The ids of the builtins are listed below.
Name | ID |
---|---|
false | 0 |
true | 1 |
nil | 2 |
The other builtins are listed in Builtins.cpp.
The stack and the locals
The stack is used for passing temporary values around, for example the arguments of a function. On the other hand, the locals are there to store long-term values, the variables.
They are stored in a LIFO stack and should be referenced by their identifier (index in the symbols table, also used by instructions like LOAD_SYMBOL).
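One possible way to model the variable storage described above is a LIFO stack of scopes, each mapping a symbol id to a value. This is a hedged sketch only: the `Scopes` class and its method names are hypothetical, and it does not enforce the constant/variable distinction between LET and MUT.

```python
class Scopes:
    """LIFO stack of scopes; each scope maps a symbol id to a value."""

    def __init__(self):
        self.scopes = [{}]  # start with a single global scope

    def push_scope(self):
        self.scopes.append({})

    def pop_scope(self):
        self.scopes.pop()

    def declare(self, symbol_id, value):
        # Sketch of LET / MUT: always create the binding in the current scope.
        self.scopes[-1][symbol_id] = value

    def store(self, symbol_id, value):
        # Sketch of STORE: update the nearest scope that already has the symbol.
        for scope in reversed(self.scopes):
            if symbol_id in scope:
                scope[symbol_id] = value
                return
        raise NameError(f"unbound symbol id {symbol_id}")

    def load(self, symbol_id):
        # Sketch of LOAD_SYMBOL: search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if symbol_id in scope:
                return scope[symbol_id]
        raise NameError(f"unbound symbol id {symbol_id}")
```

Searching from the innermost scope outwards means a binding declared inside a function shadows an outer one with the same symbol id, and disappears again when the scope is popped.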
Instructions
TS represents the element at the top of the stack, TS1 represents the element below it, and so on.
Code | Argument(s) | Job |
---|---|---|
NOP (0x00) | | Does nothing |
LOAD_SYMBOL (0x01) | symbol id | Load a symbol from its id onto the stack |
LOAD_CONST (0x02) | constant id | Load a constant from its id onto the stack. Should check for a saved environment and push a Closure with the page address + environment instead of the constant |
POP_JUMP_IF_TRUE (0x03) | absolute address to jump to | Jump to the provided address if the last value on the stack was equal to true. Remove the value from the stack no matter what it is |
STORE (0x04) | symbol id | Take the value on top of the stack and put it inside a variable named following the symbol id (cf symbols table), in the nearest scope. Raise an error if it couldn't find a scope where the variable exists |
LET (0x05) | symbol id | Take the value on top of the stack and create a constant in the current scope, named following the given symbol id (cf symbols table) |
POP_JUMP_IF_FALSE (0x06) | absolute address to jump to | Jump to the provided address if the last value on the stack was equal to false. Remove the value from the stack no matter what it is |
JUMP (0x07) | absolute address to jump to (two bytes, big endian) | Jump to the provided address |
RET (0x08) | | If in a code segment other than the main one, quit it and push the value on top of the stack to the new stack; the current environment should also be deleted. Otherwise, act as a HALT |
HALT (0x09) | | Stop the Virtual Machine |
CALL (0x0a) | number of arguments when calling the function | Call function from its symbol id located on top of the stack. Take the given number of arguments from the top of stack and give them to the function (the first argument taken from the stack will be the last one of the function). The stack of the function is now composed of its arguments, from the first to the last one |
CAPTURE (0x0b) | symbol id | Used to tell the Virtual Machine to capture the variable from the current environment. Main goal is to be able to handle closures, which need to save the environment in which they were created |
BUILTIN (0x0c) | id of builtin | Push the builtin function object on the stack |
MUT (0x0d) | symbol id | Take the value on top of the stack and create a variable in the current scope, named following the given symbol id (cf symbols table) |
DEL (0x0e) | symbol id | Remove a variable/constant named following the given symbol id (cf symbols table) |
SAVE_ENV (0x0f) | | Save the current environment, useful for quoted code |
GET_FIELD (0x10) | symbol id | Used to read the field named following the given symbol id (cf symbols table) of a Closure stored in TS. Pop TS and push the value of field read on the stack |
PLUGIN (0x11) | constant id | Used to load a plugin dynamically, plugin name is stored as a string in the constants table |
LIST (0x12) | number of arguments | Create a list from the elements pushed on the stack in reverse order |
APPEND (0x13) | number of arguments | Append elements to a list in reverse order (first the last element, then the other, then the list itself) |
CONCAT (0x14) | number of arguments | Concatenate lists in reverse order |
APPEND_IN_PLACE (0x15) | number of arguments | Append elements to a reference to a list (TS) in reverse order (first the last element, then the other, then the list itself). Push nil on the stack |
CONCAT_IN_PLACE (0x16) | number of arguments | Concatenate lists in reverse order, to a reference to a list (TS). Push nil to the stack |
POP_LIST (0x17) | | Remove an element from a list (TS), given an index (TS1). Push the modified list to the stack |
POP_LIST_IN_PLACE (0x18) | | Remove an element from a reference to a list (TS), given an index (TS1). Push nil to the stack |
POP (0x19) | | Remove the top of the stack |
DUP (0x1a) | | Duplicate the top of the stack |
ADD (0x20) | | Push TS1 + TS |
SUB (0x21) | | Push TS1 - TS |
MUL (0x22) | | Push TS1 * TS |
DIV (0x23) | | Push TS1 / TS |
GT (0x24) | | Push TS1 > TS |
LT (0x25) | | Push TS1 < TS |
LE (0x26) | | Push TS1 <= TS |
GE (0x27) | | Push TS1 >= TS |
NEQ (0x28) | | Push TS1 != TS |
EQ (0x29) | | Push TS1 == TS |
LEN (0x2a) | | Push len(TS), TS must be a list |
EMPTY (0x2b) | | Push empty?(TS), TS must be a list |
TAIL (0x2c) | | Push tail(TS), all the elements of TS except the first one (TS must be a list) |
HEAD (0x2d) | | Push head(TS), the first element of TS or nil if empty (TS must be a list) |
ISNIL (0x2e) | | Push true if TS is nil, false otherwise |
ASSERT (0x2f) | | Throw an exception if TS1 is false, and display TS (must be a string). Otherwise, push nil |
TO_NUM (0x30) | | Convert TS to number (must be a string) |
TO_STR (0x31) | | Convert TS to string (must be a number) |
AT (0x32) | | Push the value at index TS (must be a number) in TS1 (must be a list) |
MOD (0x33) | | Push TS1 % TS |
TYPE (0x34) | | Push the type of TS as a string |
HASFIELD (0x35) | | Check if TS1 is a closure field of TS. TS must be a Closure and TS1 a String |
NOT (0x36) | | Push !TS |
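To illustrate how a handful of these instructions fit together, here is a small interpreter loop. It is a sketch under several assumptions, not the actual virtual machine: `encode` and `run` are hypothetical helpers, jump addresses are assumed to count instructions rather than bytes, binary operators are assumed to pop both operands before pushing the result, and only a few opcodes are handled (closures, scopes and builtins are left out).

```python
import operator
import struct

# Opcodes taken from the table above.
LOAD_CONST, POP_JUMP_IF_FALSE, JUMP, HALT = 0x02, 0x06, 0x07, 0x09
ADD, SUB, MUL, LT = 0x20, 0x21, 0x22, 0x25

# Each binary operator computes "TS1 <op> TS".
BINARY = {ADD: operator.add, SUB: operator.sub, MUL: operator.mul, LT: operator.lt}


def encode(opcode, immediate=0):
    """Pack one instruction: padding byte, opcode byte, 16-bit big-endian immediate."""
    return struct.pack(">BBH", 0, opcode, immediate)


def run(code, constants):
    """Execute a code segment and return the final stack."""
    stack = []
    ip = 0  # assumption: addresses are instruction indices, not byte offsets
    count = len(code) // 4
    while ip < count:
        _padding, opcode, arg = struct.unpack_from(">BBH", code, ip * 4)
        ip += 1
        if opcode == LOAD_CONST:
            stack.append(constants[arg])
        elif opcode == POP_JUMP_IF_FALSE:
            # The value is removed from the stack whether or not we jump.
            if stack.pop() is False:
                ip = arg
        elif opcode == JUMP:
            ip = arg
        elif opcode == HALT:
            break
        elif opcode in BINARY:
            ts = stack.pop()   # top of the stack
            ts1 = stack.pop()  # element below it
            stack.append(BINARY[opcode](ts1, ts))
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02x}")
    return stack


# (7 - 2) * 3, showing the TS1 / TS operand order of SUB.
program = b"".join([
    encode(LOAD_CONST, 0),  # push 7
    encode(LOAD_CONST, 1),  # push 2
    encode(SUB),            # push 7 - 2
    encode(LOAD_CONST, 2),  # push 3
    encode(MUL),            # push 5 * 3
    encode(HALT),
])
result = run(program, [7, 2, 3])  # result is [15]
```

Because the first value pushed ends up as TS1, `SUB` computes 7 - 2 here rather than 2 - 7, which matches the "Push TS1 - TS" description.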