not logged in | [Login]
The following document describes the Intermediate Representation (IR) used by the JIT-compiler of LuaJIT 2.0. The trace-compiler records bytecode instructions, following the control-flow, and emits the corresponding IR instructions on-the-fly.
The IR has the following characteristics:
The IR is in SSA (Static Single Assignment) form. Every instruction (node) represents a single definition of a value. Multiple instructions form a partially connected data-flow graph. Data-flow for loops is represented using PHI-instructions. Control-flow is always implicit.
The IR is linear, pointer-free and implicitly numbered: every instruction can be uniquely referenced (IRRef) by its position in a linear array. Specially crafted, biased IR references allow fast const vs. non-const decisions. No space is wasted on storing an explicit reference number, value number or similar.
The IR is in 2-operand-normalized form: every instruction has an opcode and a maximum of two operands. A few instructions may need more operands (e.g.
CALL*), which are composed using extension instructions (
The IR is typed: every instruction has an output data type. The modeled types correspond to the basic Lua data types plus low-level data types. Higher-level data types are indirectly modeled as-needed with guarded assertions.
The IR has segregated, per-opcode chaining: this allows fast searching for specific instructions in reverse order without a full traversal. This is used to speed up many optimizations, like CSE or alias analysis. Most searches stop after zero (no match), one or two dereferences in practice.
The IR is very compact: it needs only 64 bits per instruction and all instructions are adjacent to each other. This layout is very cache-efficient and very fast to index or traverse.
The IR is incrementally generated: the IR array is bi-directionally grown: constants grow downwards, all other instructions grow upwards. Most optimizations are perfomed on-the-fly and eliminated instructions are either simply not emitted, ignored during code generation or appropriately tagged. There's no general need to insert or delete instructions in the middle. This avoids the very cache-inefficient linked-sea-of-nodes data structure, presented in most compiler textbooks.
The IR is unified: it carries both high-level semantics and low-level details. Different stages of the compiler use different aspects of the IR, but share a common IR format. Eliminating the classic HIR, MIR, LIR separation (high-, medium-, low-level IR) greatly reduces complexity and compiler overhead. It avoids semantic information loss due to abstraction mismatches and allows cheap and effective high-level semantic disambiguation for memory references.
The IR uses auxiliary snapshots: a snapshot captures the IR references corresponding to modified slots and frames in the bytecode execution stack. Every snapshot saves a specific bytecode execution state, which can later be restored on trace exits. Snapshots are sparsely emitted and compressed. Snapshots provide the link between the IR and the bytecode domain (and transitively the source code domain, via the bytecode debug info).
COMPLETE REWRITE IN PROGRESS
src/lj_jit.h in the LuaJIT source code for the full details. The generated IR can be listed with
luajit -jdump (traced bytecode, IR and machine code) or
luajit -jdump=i (IR only).
$ ./luajit -jdump=bitmsr LuaJIT 2.0.0-beta10 -- Copyright (C) 2005-2012 Mike Pall. http://luajit.org/ JIT: ON CMOV SSE2 SSE3 AMD fold cse dce fwd dse narrow loop abc sink fuse > local x = 1.2 for i=1,1e3 do x = x * -3 end ---- TRACE 1 start stdin:1 0006 MULVN 0 0 1 ; -3 0007 FORL 1 => 0006 ---- TRACE 1 IR .... SNAP #0 [ ---- ] 0001 rbp int SLOAD #2 CI 0002 xmm7 > num SLOAD #1 T 0003 xmm7 + num MUL 0002 -3 0004 rbp + int ADD 0001 +1 .... SNAP #1 [ ---- 0003 ] 0005 > int LE 0004 +1000 .... SNAP #2 [ ---- 0003 0004 ---- ---- 0004 ] 0006 ------------ LOOP ------------ 0007 xmm7 + num MUL 0003 -3 0008 rbp + int ADD 0004 +1 .... SNAP #3 [ ---- 0007 ] 0009 > int LE 0008 +1000 0010 rbp int PHI 0004 0008 0011 xmm7 num PHI 0003 0007 ---- TRACE 1 mcode 81 394cffa3 mov dword [0x4183f4a0], 0x1 394cffae movsd xmm0, [0x4184f698] 394cffb7 cvtsd2si ebp, [rdx+0x8] 394cffbc cmp dword [rdx+0x4], 0xfffeffff 394cffc3 jnb 0x394c0010 ->0 394cffc9 movsd xmm7, [rdx] 394cffcd mulsd xmm7, xmm0 394cffd1 add ebp, +0x01 394cffd4 cmp ebp, 0x3e8 394cffda jg 0x394c0014 ->1 ->LOOP: 394cffe0 mulsd xmm7, xmm0 394cffe4 add ebp, +0x01 394cffe7 cmp ebp, 0x3e8 394cffed jle 0x394cffe0 ->LOOP 394cffef jmp 0x394c001c ->3 ---- TRACE 1 stop -> loop
The above prints the bytecode of the trace, the IR generated from that bytecode with snapshots, and the machine code generated from the IR.
The columns of the IR are as follows:
1st column: IR instruction number (implicit SSA ref) 2nd column: physical CPU register or physical CPU stack slot that value is written to when converted to machine code. '[%x+]' (rather than register name) indicates hexadecimal offset from stack pointer. (This column is only present if the 'r' flags is included in -jdump, which augments the IR with register/stack slots. It is not part of the IR itself.) 3rd column: Instruction flags: ">" (IRT_GUARD = 0x80 instruction flag) are locations of guards (leading to possible side exits from the trace). "+" (IRT_ISPHI = 0x40 instruction flag) indicates instruction is left or right PHI operand. (i.e referred to in some PHI instruction). 4th column: IR type (see IR Types below) 5th column: IR opcode (see opcode reference) 6th/7th column: IR operands (SSA refs or literals) '#' prefixes refer to slot numbers, used in SLOADS. #0 is the base frame (modified only in tail calls). #1 is the first slot in the first frame (register 0 in the bytecode) '[+-]' prefixes indicate positive or negative numeric literals. '[0x%d+]' and NULL are memory addresses. '"..."' are strings. '@' prefixes indicate slots (what is this?). Other possible values: "bias" (number 2^52+2^51 ?), "userdata:%p", "userdata:%p" (table)--when do these occur?.
See also SSA dump format comments: http://lua-users.org/lists/lua-l/2008-06/msg00225.html (older version).
formatk in dump.lua.
Each snaphot (SNAP) lists the modified stack slots and their values. The i-th value in the snapshot list represents the index of the IR that writes a value in slot number #i. '---' indicates that the slot is not written. Frames are separated by '|'. For further comments on snapshots, see http://lua-users.org/lists/lua-l/2009-11/msg00089.html.
Every instruction has an output data type, which is either one of the
basic Lua types or a low-level type. The order is carefully designed to
simplify the mapping from tagged value types and to optimize common
checks (e.g. 'any integer type'). See
|4||str||STR||Interned string object|
|5||p32||P32||32 bit pointer|
|7||pro||PROTO||Function prototype object|
|8||fun||FUNC||Function (closure) object|
|9||p64||P64||64 bit pointer|
|13||flt||FLOAT||32 bit FP number (float)|
|14||num||NUM||64 bit FP number (double)|
|15||i8||I8||8 bit signed integer (int8_t)|
|16||u8||U8||8 bit unsigned integer (uint8_t)|
|17||i16||I16||16 bit signed integer (int16_t)|
|18||u16||U16||16 bit unsigned integer (uint16_t)|
|19||int||INT||32 bit signed integer (int32_t)|
|20||u32||U32||32 bit unsigned integer (uint32_t)|
|21||i64||I64||64 bit signed integer (int64_t)|
|22||u64||U64||64 bit unsigned integer (uint64_t)|
|23||sfp||SOFTFP||Hi-word of split soft-fp operations|
Constant instructions are only present in the constant part of the IR (growing upwards to lower refs). IR constants are interned (de-duplicated) and can be compared for equality only by looking at their references.
Constant instructions never appear in dumps, since
-jdump always shows
the actual constant value inlined into the referencing instructions.
|KPRI||Primitive type: nil, false, true|
|KINT||#int||32 bit integer constant|
|KGC||#ptr||Garbage collected constant|
|KKPTR||#ptr||Pointer constant to constant data|
|KNULL||#ptr||Typed NULL constant|
|KNUM||#k64ptr||Double-precision floating-point constant|
|KINT64||#k64ptr||64 bit integer constant|
|KSLOT||kref||#slot||Hash slot for constant|
Every trace has three
KPRI instructions at fixed references for the
constants nil, false and true (REF_NIL, REF_FALSE, REF_TRUE).
The 32 bit integer or pointer values occupy the space for both the left and the right 16 bit operand. 64 bit values are interned in a global constant table and indirectly referenced by 32 bit pointers.
KPTR is a constant pointer (absolute address) to possibly non-constant
KKPTR points to definitely constant data. Only data known by
the VM to be constant qualifies, e.g. an interned Lua string. Content
tagged as 'const' by users (e.g.
const char *) doesn't qualify.
KSLOT is used as a key for
HREFK and holds the hash slot where the
key is to be found and a reference to the constant key itself.
Guarded assertions have a dual purpose:
|LT||left||right||left < right (signed)|
|GE||left||right||left ≥ right (signed)|
|LE||left||right||left ≤ right (signed)|
|GT||left||right||left > right (signed)|
|ULT||left||right||left < right (unsigned/unordered)|
|UGE||left||right||left ≥ right (unsigned/unordered)|
|ULE||left||right||left ≤ right (unsigned/unordered)|
|UGT||left||right||left > right (unsigned/unordered)|
|EQ||left||right||left = right|
|NE||left||right||left ≠ right|
|ABC||bound||index||Array Bounds Check: bound > index (unsigned)|
|RETF||proto||pc||Return to lower frame: check target PC, shift base|
U.. opcodes provide unsigned comparison semantics for integer
types and unordered comparison semantics for floating-point types. A
NaN operand causes a 'false' outcome for
EQ and ordered comparisons,
and a 'true' outcome for
NE and unordered comparisons.
ABC is treated just like
UGT in the backend. But it follows
different FOLD rules, which simplifies ABC elimination.
The prototype returned to by
RETF is below the call graph covered by
the trace up to this point. Thus
RETF needs to anchor the prototype to
prevent recycling the PC after garbage collection.
|BNOT||ref||Bitwise NOT of ref|
|BAND||left||right||Bitwise AND of left and right|
|BOR||left||right||Bitwise OR of left and right|
|BXOR||left||right||Bitwise XOR of left and right|
|BSHL||ref||shift||Bitwise left shift of ref|
|BSHR||ref||shift||Bitwise logical right shift of ref|
|BSAR||ref||shift||Bitwise arithmetic right shift of ref|
|BROL||ref||shift||Bitwise left rotate of ref|
|BROR||ref||shift||Bitwise right rotate of ref|
The shift count for shift and rotate instructions is interpreted modulo
the bit width of the shifted type, i.e. only the lowest N bits are
significant. Appropriate bit masking instructions (
BAND) are inserted
for backends where the underlying machine instructions don't perform the
masking themselves. Similarly, rotates are unified to one direction, in
case the architecture doesn't provide machine instructions for both.
|ADD||left||right||left + right|
|SUB||left||right||left - right|
|MUL||left||right||left * right|
|DIV||left||right||left / right|
|MOD||left||right||left % right|
|POW||left||right||left ^ right|
|FPMATH||ref||#fpm||fpmath(ref), see below|
|ADDOV||left||right||left + right, overflow-checked|
|SUBOV||left||right||left - right, overflow-checked|
|MULOV||left||right||left * right, overflow-checked|
All arithmetic ops operate within the domain of their types: integers, pointers or floating-point numbers. Not all ops are defined for all possible types. Both signed and unsigned integers wrap around on overflow.
Overflow-checking operations exit the trace upon signed integer arithmetic overflow.
MOD is decomposed into left-floor(left/right)right for floating-point
POW is either turned into
POW with an integer as the right
operand, or into sqrt(left) if right is 0.5, or into
exp2(log2(left)right) otherwise (the backend may later merge this into
a call to pow()).
The undefined cases for the integer variants of
return the integer value with only the highest bit set.
ABS for floating-point numbers reference a SIMD-aligned
constant in the right operand. Some backends implement these as a
bitwise XOR or AND of the number and the constant.
The right operand of
LDEXP is a floating-point number on x86 and x64
platforms and a 32 bit integer on all others.
All floating-point arithmetic operations obey the standard definitions
from IEEE 754 wrt. +-0, +-Inf, NaN and denormals.
no defined behavior for NaN operands.
FPMATH is used for unary floating-point arithmetic operations. The
right operand specifies the actual operation:
Memory references generate a pointer value to be used by the respective
loads or stores. To preserve higher-level semantics and to simplify
alias analysis they are not decomposed into lower-level operations (like
XLOAD references). Some of them can be (partially) fused into the
operands of load or store instructions by most backends.
|HREFK||hash||kslot||Constant hash reference|
|NEWREF||tab||key||Create new reference|
|UREFO||func||#uv||Open upvalue reference|
|UREFC||func||#uv||Closed upvalue reference|
|FREF||obj||#field||Object field reference|
HREFK reference an
FLOAD of the array part or hash part
of a Lua table with their left operand.
reference a Lua table, since they need to search or extend the table.
HREFK is specialized to the hash slot where the constant key is
expected -- see
KSLOT above for its right operand.
the key does not exist in the Lua table, yet.
The left operand of
UREFC reference the current function
(closure). The right operand holds an upvalue disambiguation hash in the
lowest 8 bits and the upvalue index in the higher bits.
For the possible values of the field ID in
FREF, see IRFLDEF in
Loads and stores operate on memory references and either load a value (result of the instruction) or store a value (the right operand). To preserve higher-level semantics and to simplify alias analysis they are not unified or decomposed into lower-level operations.
|FLOAD||obj||#field||Object field load|
|SLOAD||#slot||#flags||Stack slot load|
|VLOAD||aref||Vararg slot load|
|FSTORE||fref||val||Object field store|
SLOAD inline their memory references, all other loads and
all stores have a memory reference as their left operand. All loads
XLOAD work on tagged values and simultaneously
function as a guarded assertion that checks the loaded type.
FSTORE access specific fields inside objects, identified
by the field ID of their reference (e.g. the metatable field in table or
XLOAD works on lower-level types and the memory reference is either a
STRREF or decomposed into lower-level operations, a combination of
BSHL of pointers, offsets or indexes.
The slot number of
SLOAD is relative to the starting frame of a trace,
where #0 indicates the closure/frame slot and #1 the first variable slot
(corresponding to slot 0 of the bytecode). Note that
RETF shifts down
BASE and subsequent
SLOAD instructions refer to slots of the lower
Note there are no store operations for stack slots or vararg slots. All stores to stack slots are effectively sunk into exits or side traces. Snapshots efficiently manage the references that are to be stored. Vararg slots are read-only from the perspective of the called vararg function.
For the possible values of the field ID in
FLOAD and the flags in
XLOAD, see IRFLDEF, IRSLOAD_* and IRXLOAD_* in
|SNEW||data||length||Allocate interned string|
|XSNEW||data||length||Allocate interned string from cdata|
|TNEW||#asize||#hbits||Allocate Lua table with minimum array and hash sizes|
|TDUP||template||Allocate Lua table, copying a template table|
|CNEW||ctypeid||(size)||Allocate mutable cdata|
|CNEWI||ctypeid||val||Allocate immutable cdata|
SNEW is only used to reference data that's guaranteed to stay
constant, e.g. other strings. This allows elimination of the allocation
in case the string object isn't used (its data may still be used). This
assumption doesn't hold for data referenced by
ffi.string(), which has
The size of the allocated cdata object for
inferred from the ctypeid operand. For variable-length cdata, the size
is explicitly given by the size operand, otherwise the size operand is
CNEWI is only used for immutable scalar cdata types. It combines an
allocation with an initialization. The value given by the right operand
is implicitly stored. This shortens the IR and turns allocation sinking
into simple dead-code elimination (for immutable types only -- the
generic allocation sinking optimization is more involved).
|TBAR||tab||Table write barrier|
|OBAR||obj||val||Object write barrier|
|XBAR||XLOAD/XSTORE optimization barrier|
OBAR are write barriers needed for the incremental GC.
OBAR is currently only used for stores to upvalues. These barriers are
placed after the corresponding stores.
XBAR prevents optimizations of
XSTORE across the
barrier. Note that
CALLX* implicitly functions as such a barrier.
|CONV||src||#flags||Generic type conversion|
|TOBIT||num||bias||Convert double to integer with Lua BitOp semantics|
|TOSTR||number||Convert double or integer to string|
|STRTO||str||Convert string to double|
The bias operand of
TOBIT references the floating-point constant
2^52+2^51, which is added to the FP number. The lower 32 bits of the
result represent the integer component modulo 2^32 (within certain
limits, see the Lua BitOp semantics.
The lower 5 bits of the flags operand of
CONV specify the source type,
the next 5 bits specify the destination type (identical to the IR
instruction result type). Bit 10 specifies truncation behavior for FP to
integer conversions and bit 11 specifies sign-extension for widening
integer conversions. Bits 12 and 13 specify the strength of the check
for (guarded) conversions from double to integers: 1 means any FP number
is ok, 2 and 3 check the FP number for integerness. 2 enables special
rules for backpropagation of index conversions.
|CALLN||args||#ircall||Call internal function (normal)|
|CALLL||args||#ircall||Call internal function (load)|
|CALLS||args||#ircall||Call internal function (store)|
|CALLXS||args||func||Call external function (store/barrier)|
|CARG||args||arg||Call argument extension|
Calls to internal functions are divided into three categories:
lj_tabl_len()) the compiler must check for intervening stores.
For the possible values of the ircall operand, see
The FFI emits
CALLXS for calls to external functions. They are
generally considered to be stores and cannot be eliminated. They form an
implicit optimization barrier for
The args operand of call instructions references a left-leaning tree of
arguments using the
CARG extension instruction:
The func operand of
CALLXS either directly references the function
pointer or a
CARG holding the function pointer and the ctypeid for
non-standard calling conventions (e.g. vararg calls).
|BASE||#parent||#exit||BASE reference, link to parent side exit|
|PVAL||#pref||Parent value reference|
|GCSTEP||Explicit GC step|
|HIOP||left||right||Hold hi-word operands of split instructions|
|LOOP||Separator before loop-part of a trace|
|USE||ref||Explicit use of a reference|
|PHI||left||right||PHI node for loops|
|RENAME||ref||#snap||Renamed reference below snapshot|
NOP is used to patch previously emitted IR instructions, in case they
cannot eliminated or ignored on-the-fly.
BASE is a fixed instruction at REF_BASE, used to hold the BASE
pointer. It's implicitly referenced by e.g.
PVAL provides an alternate way to reference specific values of the
parent trace, which cannot be referenced with a parent
they are not stored in the stack at the snapshot).
GCSTEP provides an explicit GC step for certain cases where it needs
to be done after the initial snapshot.
HIOP must immediately follow a split instruction (split 64 bit op or
USE is needed to keep instructions for their side-effects that would
otherwise be eliminated: e.g. an
ADDOV that's used to check for
potential integer overflow of loop bounds.
PHI instructions are positioned at the end of a looping trace. The
left operand holds a reference to the initial value, the right operand
holds a reference to the value after each loop iteration.
RENAME is generated by the register allocator, when it renames a
register for a value (for efficiency or to preserve PHI registers). A
RENAME instruction holds the register used for the reference below the
given snapshot. Multiple instances of this may be present for the same
reference. The originally referenced instruction holds the register used
above of the highest such snapshot (if any).