The LuaJIT Wiki

not logged in | [Login]

This is a mailing list post by Mike Pall.

The following things are needed to get the best speed out for numerical computations with LuaJIT (in order of importance):

  • Reduce number of unbiased/unpredictable branches.

    • Heavily biased branches (>95% in one direction) are fine.
    • Prefer branch-free algorithms.
      • Use math.min() and math.max().
      • Use bit.*, e.g. for conditional index computations.
  • Use FFI data structures.

    • Use int32_t, avoid uint32_t data types.
    • Use double, avoid float data types.
    • Metamethods are fine, but don't overdose them.
  • Call C functions only via the FFI.

    • Avoid calling trivial functions, better rewrite them in Lua.
    • Avoid callbacks -- use pull-style APIs (read/write) and iterators instead.
  • Use plain 'for i=start,stop,step do ... end' loops.

    • Prefer plain array indexing, e.g. 'a[i+2]'.
    • Avoid pointer arithmetic.
  • Find the right balance for unrolling.

    • Avoid inner loops with low iteration count (< 10).
    • Only unroll loops if the loop body has not too many instructions.
    • Consider using templates instead of hand-unrolling (see GSL Shell).
    • You may have to experiment a bit.
  • Define and call only 'local' (!) functions within a module.

  • Cache often-used functions from other modules in upvalues.

    • E.g. local sin = math.sin ... local function foo() return 2*sin(x) end
    • Don't do this for FFI C functions, cache the namespace instead, e.g. local lib = ffi.load("lib").
  • Avoid inventing your own dispatch mechanisms.

    • Prefer to use built-in mechanisms, e.g. metamethods.
  • Do not try to second-guess the JIT compiler.

    • It's perfectly ok to write 'z = x[a+b] + y[a+b]'.
      • Do not try CSE by hand, e.g. 'local c = a+b'.
    • It's perfectly ok to write 'a[i][j] = a[i][j] * a[i][j+1]'.
      • Do not try to cache partial FFI struct/array references (e.g. a[i]) unless they are long-lived (e.g. in a big loop).
  • Be careful with aliasing, esp. when using multiple arrays.

    • LuaJIT uses strict type-based disambiguation, but there are limits to this due to C99 conformance.
    • E.g. in 'x[i] = a[i] + c[i]; y[i] = a[i] + d[i]' the load of a[i] needs to be done twice, because x could alias a. It does make sense to use 'do local t = a[i] ... end' here.
  • Reduce the number of live temporary variables.

    • Best to initialize on definition, e.g. 'local y = f(x)'
      • Yes, this means you should interleave this with other code.
      • Do not hoist variable definitions to the start of a function -- Lua is not JavaScript nor K&R C.
    • Use 'do local y = f(x) ... end' to bound variable lifetimes.
  • Do not intersperse expensive or uncompiled operations.

    • print() is not compiled, use io.write().
    • E.g. avoid assert(type(x) == "number", "x is a "..mytype(x)")
      • The problem is not the assert() or the condition (basically free). The problem is the string concatenation, which has to be executed every time, even if the assertion never fails!
    • Watch the output of -jv and -jdump.

You need to take all of these factors into account before deciding on a certain algorithm. An advanced algorithm, that's fast in theory, may be slower than a simpler algorithm, if the simpler algorithm has much fewer unbiased branches.