ArkScript bytecode specification

This page contains the ArkScript bytecode specification. It is useful if you are interested in implementing your own virtual machine, or just want to learn more.

ArkScript bytecode headers

| Name | Size | Description |
| --- | --- | --- |
| Magic number | 4 bytes | 6386283, numeric version of "ark\0" |
| Compiler.Major | 2 bytes | Big endian layout |
| Compiler.Minor | 2 bytes | Big endian layout |
| Compiler.Patch | 2 bytes | Big endian layout |
| Timestamp | 8 bytes | Build time (Unix format), Big endian layout |
| SHA256 | 32 bytes | SHA256 of the tables and code segments for integrity check |
| Symbols table | | |
| Symbols.count | 2 bytes | Big endian layout |
| Symbol.value | Variable | Null-terminated string |
| Values table | | |
| Values.count | 2 bytes | Big endian layout |
| Value.type | 1 byte | 1 for number, 2 for string, 3 for function |
| Number.value | Variable | Null-terminated string representation of the number |
| String.value | Variable | Null-terminated string |
| Function.value | 2 bytes | Big endian layout |
| Code segments | | |
| Instruction count | 2 bytes | Big endian layout, can be 0 |
| Instruction | 4 bytes | |

Instructions with a single immediate argument follow this layout: iiiiiiii pppppppp dddddddd dddddddd.

p for padding (ignored), i for the instruction, d for the immediate argument.
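
As an illustration of this layout, the following Python sketch splits a 4-byte instruction into its opcode and its 16-bit immediate argument. The function name `decode_single` is hypothetical, and the sketch assumes the high byte of the argument comes first, as the layout above suggests.

```python
from typing import Tuple


def decode_single(word: bytes) -> Tuple[int, int]:
    """Split a 4-byte instruction into (opcode, 16-bit immediate argument)."""
    if len(word) != 4:
        raise ValueError("an instruction is exactly 4 bytes long")
    opcode = word[0]                     # iiiiiiii
    # word[1] is the padding byte (pppppppp) and is ignored
    argument = (word[2] << 8) | word[3]  # dddddddd dddddddd, high byte first (assumed)
    return opcode, argument
```

For example, `decode_single(bytes([0x02, 0x00, 0x01, 0x2C]))` returns `(2, 300)`, which reads as LOAD_CONST with the argument 300.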

Super Instructions, with two immediate arguments, follow this layout: iiiiiiii ssssssss ssssxxxx xxxxxxxx.

s for the secondary argument (on 12 bits), x for the primary argument (on 12 bits as well). Using this representation, computing the primary argument is as easy as arg_16_bits & 0x0fff, where arg_16_bits is the 16-bit immediate argument, decoded the same way as for instructions with a single argument.
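
The bit manipulation for Super Instructions can be sketched as follows in Python. The helper names `decode_super` and `encode_super` are hypothetical. The sketch assumes the four bytes are read in the order shown in the layout, most significant bits first.

```python
from typing import Tuple


def decode_super(word: bytes) -> Tuple[int, int, int]:
    """Split a 4-byte Super Instruction into (opcode, primary, secondary)."""
    if len(word) != 4:
        raise ValueError("an instruction is exactly 4 bytes long")
    opcode = word[0]
    arg_16_bits = (word[2] << 8) | word[3]       # same read as a single argument
    primary = arg_16_bits & 0x0FFF               # low 12 bits: xxxx xxxxxxxx
    secondary = (word[1] << 4) | (word[2] >> 4)  # ssssssss ssss
    return opcode, primary, secondary


def encode_super(opcode: int, primary: int, secondary: int) -> bytes:
    """Inverse of decode_super; both arguments must fit on 12 bits."""
    if not (0 <= primary <= 0x0FFF and 0 <= secondary <= 0x0FFF):
        raise ValueError("arguments must fit on 12 bits")
    return bytes([
        opcode & 0xFF,
        secondary >> 4,
        ((secondary & 0x0F) << 4) | (primary >> 8),
        primary & 0xFF,
    ])
```

Encoding and then decoding gives back the same values, which is a quick way to check that the two 12-bit fields do not overlap.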

Note on builtins

Builtins are handled with BUILTIN id, with id being the id of the builtin function object. The ids of the builtins are listed below.


| Name | ID |
| --- | --- |
| false | 0 |
| true | 1 |
| nil | 2 |

The other builtins are listed in Builtins.cpp.

The stack and the locals

The stack is used for passing temporary values around, for example the arguments of a function. The locals, on the other hand, are there to store long term values: the variables. They are stored in a LIFO stack and should be referenced by their identifier (the index in the symbols table, also used by instructions like LOAD_SYMBOL).
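
One possible way to model the variable storage is a LIFO stack of scopes, each mapping a symbol id to a value. The Python sketch below is an illustration only: the class `Locals` and its method names are hypothetical, and the lookup rules are inferred from the descriptions of STORE, SET_VAL, CREATE_SCOPE and POP_SCOPE in the instruction table, not taken from the actual virtual machine.

```python
from typing import Dict, List


class Locals:
    """A LIFO stack of scopes; each scope maps a symbol id to a value."""

    def __init__(self) -> None:
        self.scopes: List[Dict[int, object]] = [{}]  # start with one global scope

    def create_scope(self) -> None:
        self.scopes.append({})

    def pop_scope(self) -> None:
        self.scopes.pop()

    def store(self, symbol_id: int, value: object) -> None:
        """Create a variable in the current (innermost) scope."""
        self.scopes[-1][symbol_id] = value

    def set_val(self, symbol_id: int, value: object) -> None:
        """Update an existing variable, searching from the nearest scope outwards."""
        for scope in reversed(self.scopes):
            if symbol_id in scope:
                scope[symbol_id] = value
                return
        raise NameError(f"unbound symbol id {symbol_id}")

    def load(self, symbol_id: int) -> object:
        """Read a variable, searching from the nearest scope outwards."""
        for scope in reversed(self.scopes):
            if symbol_id in scope:
                return scope[symbol_id]
        raise NameError(f"unbound symbol id {symbol_id}")
```

Using integer symbol ids as keys mirrors the text above: variables are referenced by their index in the symbols table, not by name.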

Function calling convention

If we want to call a function foo, e.g. by writing (foo 1 2 3), the arguments will be pushed on the stack in reverse order.

First, push 3, then 2, then 1.

In the end, our stack looks like this:

1   <-- Top of the stack
2
3
... <-- Bottom of the stack

Hence, we can retrieve the arguments in the correct order. However, this inverts the order of evaluation of the arguments when we pass expressions to our function: in (foo (+ 1 2) (* 3 4) (- 5 6)), the expression (- 5 6) will be evaluated first, while (+ 1 2) will be evaluated last.
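
The calling convention can be simulated with a plain Python list used as the stack. In this sketch the helper names (`push_arguments`, `pop_arguments`, `traced`) are hypothetical, and each argument expression is modelled as a function that records when it is evaluated.

```python
from typing import Callable, List


def push_arguments(stack: List[object], args: List[Callable[[], object]]) -> None:
    """Evaluate the arguments from last to first, pushing each result."""
    for evaluate in reversed(args):
        stack.append(evaluate())


def pop_arguments(stack: List[object], count: int) -> List[object]:
    """Pop `count` values; the first value popped is the first argument."""
    return [stack.pop() for _ in range(count)]


evaluation_order: List[str] = []


def traced(label: str, value: object) -> Callable[[], object]:
    """Build an argument that records its label when it gets evaluated."""
    def evaluate() -> object:
        evaluation_order.append(label)
        return value
    return evaluate


# Simulates (foo (+ 1 2) (* 3 4) (- 5 6))
stack: List[object] = []
push_arguments(stack, [traced("(+ 1 2)", 3), traced("(* 3 4)", 12), traced("(- 5 6)", -1)])
received = pop_arguments(stack, 3)

print(evaluation_order)  # ['(- 5 6)', '(* 3 4)', '(+ 1 2)']
print(received)          # [3, 12, -1]
```

The callee receives the arguments in source order, while the evaluation order is reversed, as described above.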

Instructions

TS represents the element at the top of the stack, TS1 represents the element below it, and so on.

| Code | Argument(s) | Job |
| --- | --- | --- |
| NOP (0x00) | | Does nothing, useful for padding |
| LOAD_SYMBOL (0x01) | symbol id | Load a symbol from its ID onto the stack |
| LOAD_CONST (0x02) | constant id | Load a constant from its ID onto the stack |
| POP_JUMP_IF_TRUE (0x03) | absolute address to jump to | Jump to the provided address if the last value on the stack was equal to true. Remove the value from the stack no matter what it is |
| STORE (0x04) | symbol id | Take the value on top of the stack and create a variable in the current scope, named following the given symbol id (cf symbols table) |
| SET_VAL (0x05) | symbol id | Take the value on top of the stack and put it inside a variable named following the symbol id (cf symbols table), in the nearest scope. Raise an error if it couldn't find a scope where the variable exists |
| POP_JUMP_IF_FALSE (0x06) | absolute address to jump to | Jump to the provided address if the last value on the stack was equal to false. Remove the value from the stack no matter what it is |
| JUMP (0x07) | absolute address to jump to | Jump to the provided address |
| RET (0x08) | | If in a code segment other than the main one, quit it, and push the value on top of the stack to the new stack; should as well delete the current environment. Otherwise, acts as a HALT |
| HALT (0x09) | | Stop the Virtual Machine |
| CALL (0x0a) | argument count | Call function from its symbol id located on top of the stack. Take the given number of arguments from the top of stack and give them to the function (the first argument taken from the stack will be the last one of the function). The stack of the function is now composed of its arguments, from the first to the last one |
| CAPTURE (0x0b) | symbol id | Tell the Virtual Machine to capture the variable from the current environment. Main goal is to be able to handle closures, which need to save the environment in which they were created |
| BUILTIN (0x0c) | builtin id | Push the corresponding builtin function object on the stack |
| DEL (0x0d) | symbol id | Remove a variable/constant named following the given symbol id (cf symbols table) |
| MAKE_CLOSURE (0x0e) | constant id | Push a Closure with the page address pointed by the constant, along with the saved scope created by CAPTURE instruction(s) |
| GET_FIELD (0x0f) | symbol id | Read the field named following the given symbol id (cf symbols table) of a Closure stored in TS. Pop TS and push the value of field read on the stack |
| PLUGIN (0x10) | constant id | Load a plugin dynamically, plugin name is stored as a string in the constants table |
| LIST (0x11) | number of elements | Create a list from the N elements pushed on the stack. Follows the function calling convention |
| APPEND (0x12) | number of elements | Append N elements to a list (TS). Elements are stored in TS(1)..TS(N). Follows the function calling convention |
| CONCAT (0x13) | number of elements | Concatenate N lists to a list (TS). Lists to concat to TS are stored in TS(1)..TS(N). Follows the function calling convention |
| APPEND_IN_PLACE (0x14) | number of elements | Append N elements to a reference to a list (TS), the list is being mutated in-place, no new object created. Elements are stored in TS(1)..TS(N). Follows the function calling convention |
| CONCAT_IN_PLACE (0x15) | number of elements | Concatenate N lists to a reference to a list (TS), the list is being mutated in-place, no new object created. Lists to concat to TS are stored in TS(1)..TS(N). Follows the function calling convention |
| POP_LIST (0x16) | | Remove an element from a list (TS), given an index (TS1). Push a new list without the removed element to the stack |
| POP_LIST_IN_PLACE (0x17) | | Remove an element from a reference to a list (TS), given an index (TS1). The list is mutated in-place, no new object created |
| SET_AT_INDEX (0x18) | | Modify a reference to a list or string (TS) by replacing the element at TS1 (must be a number) by the value in TS2. The object is mutated in-place, no new object created |
| SET_AT_2_INDEX (0x19) | | Modify a reference to a list (TS) by replacing TS[TS2][TS1] by the value in TS3. TS[TS2] can be a string (if it is, TS3 must be a string). The object is mutated in-place, no new object created |
| POP (0x1a) | | Remove the top of the stack |
| DUP (0x1b) | | Duplicate the top of the stack |
| CREATE_SCOPE (0x1c) | | Create a new local scope |
| POP_SCOPE (0x1d) | | Destroy the last local scope |
| ADD (0x1e) | | Push TS1 + TS |
| SUB (0x1f) | | Push TS1 - TS |
| MUL (0x20) | | Push TS1 * TS |
| DIV (0x21) | | Push TS1 / TS |
| GT (0x22) | | Push TS1 > TS |
| LT (0x23) | | Push TS1 < TS |
| LE (0x24) | | Push TS1 <= TS |
| GE (0x25) | | Push TS1 >= TS |
| NEQ (0x26) | | Push TS1 != TS |
| EQ (0x27) | | Push TS1 == TS |
| LEN (0x28) | | Push len(TS), TS must be a list |
| EMPTY (0x29) | | Push empty?(TS), TS must be a list or string |
| TAIL (0x2a) | | Push tail(TS), all the elements of TS except the first one. TS must be a list or string |
| HEAD (0x2b) | | Push head(TS), the first element of TS or nil if empty. TS must be a list or string |
| ISNIL (0x2c) | | Push true if TS is nil, false otherwise |
| ASSERT (0x2d) | | Throw an exception if TS1 is false, and display TS (must be a string). Do not push anything on the stack |
| TO_NUM (0x2e) | | Convert TS to number (must be a string) |
| TO_STR (0x2f) | | Convert TS to string |
| AT (0x30) | | Push the value at index TS (must be a number) in TS1, which must be a list or string |
| AT_AT (0x31) | | Push the value at index TS (must be a number), inside the list or string at index TS1 (must be a number) in the list at TS2 |
| MOD (0x32) | | Push TS1 % TS |
| TYPE (0x33) | | Push the type of TS as a string |
| HASFIELD (0x34) | | Check if TS1 is a closure field of TS. TS must be a Closure, TS1 a String |
| NOT (0x35) | | Push !TS |
| LOAD_CONST_LOAD_CONST (0x36) | constant id, constant id | Load two consts (primary then secondary) on the stack in one instruction |
| LOAD_CONST_STORE (0x37) | constant id, symbol id | Load const primary into the symbol secondary (create a variable) |
| LOAD_CONST_SET_VAL (0x38) | constant id, symbol id | Load const primary into the symbol secondary (search for the variable with the given symbol id) |
| STORE_FROM (0x39) | symbol id, symbol id | Store the value of the symbol primary into a new variable secondary |
| SET_VAL_FROM (0x3a) | symbol id, symbol id | Store the value of the symbol primary into an existing variable secondary |
| INCREMENT (0x3b) | symbol id, count | Increment the variable primary by count and push its value on the stack |
| DECREMENT (0x3c) | symbol id, count | Decrement the variable primary by count and push its value on the stack |
| STORE_TAIL (0x3d) | symbol id, symbol id | Load the symbol primary, compute its tail, store it in a new variable secondary |
| STORE_HEAD (0x3e) | symbol id, symbol id | Load the symbol primary, compute its head, store it in a new variable secondary |
| SET_VAL_TAIL (0x3f) | symbol id, symbol id | Load the symbol primary, compute its tail, store it in an existing variable secondary |
| SET_VAL_HEAD (0x40) | symbol id, symbol id | Load the symbol primary, compute its head, store it in an existing variable secondary |
| CALL_BUILTIN (0x41) | builtin id, argument count | Call a builtin by its id in primary, with secondary arguments. Bypass the stack size check because we do not push IP/PP, since builtin calls do not alter the stack |
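
To show how a few of these instructions fit together, here is a very small Python interpreter sketch covering only LOAD_SYMBOL, LOAD_CONST, STORE, HALT, ADD, SUB and MUL. It is an illustration under several assumptions and not the actual virtual machine: instructions are given as already decoded (opcode, argument) pairs, there is a single flat scope, STORE pops the value it stores, and binary operators pop both operands before pushing the result. The function name `run` is hypothetical.

```python
import operator
from typing import Dict, List, Tuple

# Opcodes taken from the table above
LOAD_SYMBOL, LOAD_CONST, STORE, HALT = 0x01, 0x02, 0x04, 0x09
BINARY_OPS = {0x1E: operator.add, 0x1F: operator.sub, 0x20: operator.mul}


def run(code: List[Tuple[int, int]], constants: List[object]) -> List[object]:
    """Execute decoded (opcode, argument) pairs and return the final stack."""
    stack: List[object] = []
    variables: Dict[int, object] = {}  # single flat scope, keyed by symbol id
    for opcode, arg in code:
        if opcode == LOAD_CONST:
            stack.append(constants[arg])
        elif opcode == LOAD_SYMBOL:
            stack.append(variables[arg])
        elif opcode == STORE:
            variables[arg] = stack.pop()  # assumed to consume the value
        elif opcode in BINARY_OPS:
            ts = stack.pop()   # TS: element at the top of the stack
            ts1 = stack.pop()  # TS1: element just below it
            stack.append(BINARY_OPS[opcode](ts1, ts))
        elif opcode == HALT:
            break
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} not handled in this sketch")
    return stack


# Pushes 10, then 4, then computes TS1 - TS = 10 - 4
program = [(LOAD_CONST, 0), (LOAD_CONST, 1), (0x1F, 0), (HALT, 0)]
print(run(program, [10, 4]))  # [6]
```

The operand order matters for non-commutative operators: with 10 pushed before 4, SUB computes 10 - 4, not 4 - 10.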