Sunday, June 14, 2015

Proving correctness of the GCC match.pd — Part 2

The previous post talked about the background and the GCC GIMPLE IR. This post continues with a look at the format of match.pd.

The match.pd contains transformation rules for the IR, written in a "lispy" domain-specific language. Each transformation rule is specified by the simplify expression that has two operands: a template expression that is matched to the IR, and a replacement expression that replaces the matching expression in the IR. For example
/* x & ~0 -> x */
  (bit_and @0 integer_all_onesp)
match expressions corresponding to "x&~0" and replaces them with "x".

A template operand of the form @<number> is called a "capture" — it captures the operand so you can refer to it in other parts of the simplify expression. Using the same capture multiple times in a template means that the operands need to be identical, e.g.
/* x & x -> x */
  (bit_and @0 @0)
match a bit_and with two identical operands.

The predicates, such as integer_all_onesp in the first example above, are the normal predicates used in the GCC middle-end. It is possible to capture operators and predicates too, so you may write
/* x | ~0 -> ~0 */
  (bit_ior @0 integer_all_onesp@1)
for the rule that changes "x|~0" to "~0".

Pattern matching for commutative operators may use the :c decoration to match the operands in any order. For example
/* x | ~0 -> ~0 */
  (bit_ior:c @0 integer_all_onesp@1)
will match both "x|~0" and "~0|x" (although it does not matter in this case, as the GCC IR always has the constant as its second argument for commutative operations).

There are cases where you cannot express the replacement using this syntax, so it is possible to generate the replacement expression using C++ code by placing it between curly braces
/* x ^ x -> 0 */
  (bit_xor @0 @0)
  { build_zero_cst (type); })
The type of the outermost matching expression (i.e. the type of the bit_xor) is available in the variable type.

It is often the case that additional conditions need to be fulfilled in order to permit the replacement. This can be handled with an if expression:
/* x / -1 -> -x */
  (exact_div @0 integer_minus_onep)
  (if (!TYPE_UNSIGNED (type))
   (negate @0)))
This replaces the original expression only if the type of the outermost expression is signed. The operand to if is a C-style expression, so it is possible to create complex conditions, such as
(if (!HONOR_SNANS (element_mode (type))
     && (!HONOR_SIGNED_ZEROS (element_mode (type))
         || !COMPLEX_FLOAT_TYPE_P (type)))
Many rules are identical for several operations, and you can handle this with a for expression
/* x / -1 -> -x */
(for div (trunc_div ceil_div floor_div round_div exact_div)
   (div @0 integer_minus_onep)
   (if (!TYPE_UNSIGNED (type))
    (negate @0))))
The if and for can be nested arbitrarily, and you can use them to construct complex rules where multiple simplifications are done within the same for or if expression.

Finally, there are some additional functionality, such as specifying parts of templates optional, but I'll wait with that until I have made some progress with the functionality described so far...

The next blog post will look at actually proving things.

Sunday, June 7, 2015

API for manipulating and optimizing SPIR-V binaries

I have cleaned up the API in the spirv-tools module I wrote about in a previous blog post. There is currently functionality for
  • Reading/writing SPIR-V binaries and my high level assembler
  • Iterating over instructions, functions, and basic blocks
  • Adding/removing/examining instructions, functions, and basic blocks
  • Some optimization passes (dead code elimination, and CFG simplification)
The idea behind the API is that it should correspond directly to the SPIR-V binary; the binary is conceptually represented as a list of SPIR-V instructions by the Module class, and each Instruction consist of the operation name, result ID, type ID, and operands. Iterating over the module returns instructions in the same order as in the binary. The API also has concepts of functions and basic blocks, and the Function and BasicBlock classes encapsulates sub-sequences of the binary's instructions.

Reading and examining binaries can be done with a minimum of code. For example, here is how to read a SPIR-V binary and count the number of load instructions:
#!/usr/bin/env python
import read_spirv

with open('frag.spv', 'r') as stream:
    module = read_spirv.read_module(stream)

nof_loads = 0
for inst in module.instructions():
    if inst.op_name == 'OpLoad':
        nof_loads += 1

print 'Number of load instructions: ' + str(nof_loads)

The main use case for the API is analyzing binaries, test generation, etc., but it is useful for implementing optimizations too. For example, a simple peephole optimization for transforming integer "-(-x)" to "x" can be written as
for inst in module.instructions():
    if inst.op_name == 'OpSNegate':
        op_inst = module.id_to_inst[inst.operands[0]]
        if op_inst.op_name == 'OpSNegate':
            src_inst = module.id_to_inst[op_inst.operands[0]]
For each instruction we check if it is an OpSNegate instruction, if so, we access the predecessor instruction. If that too is an OpSNegate, then we replaces all uses of the original instruction with the second instruction's predecessor. This leaves the OpSNegate dead, so you probably want to run the dead code elimination pass after this, which you do as

A slightly more involved example is optimizing integer "x+x" to "x<<1":
for inst in module.instructions():
    if (inst.op_name == 'OpIAdd' and
           inst.operands[0] == inst.operands[1]):
        const1 = module.get_constant(inst.type_id, 1)
        sll = ir.Instruction(module, 'OpShiftLeftLogical',
                             module.new_id(), inst.type_id,
                             [inst.operands[0], const1.result_id])
Here we need to replace the OpIAdd instruction with a newly created OpShiftLeftLogical. One complication is that SPIR-V specifies the type partially by the PrecisionLow, PrecisionMedium, and PrecisionHigh decorations, so we need to copy the decorations each time we create a new instruction as failing to do this may give a dramatic performance reduction for some architectures. I still think that SPIR-V should change how the type and precision modifiers are handled for graphical shaders...

The module.get_constant returns an OpConstant or OpConstantComposite instruction with the specified type and value. The value should in general have the same number of elements as the type for vector types, but it is allowed to pass a scalar value, which replicates the value over the vector width.

inst.replace_with(sll) replaces inst with sll in the basic block, updates all uses of inst to use sll, and destroys inst. It is safe to add/remove instructions while iterating. The only possible issue is that the iterator may not see instructions that are inserted in the current basic block, and you may see an instruction in its old place if it is moved within the current basic block. The predecessors/successors are however always correctly updated — it is only the iteration order that may be surprising.

Basic blocks are handled in a similar way to instructions. See the simplify_cfg pass for an example of how they are used in reality.

There are still some functionality missing (see the TODO file) that I'm planning to fix eventually. Please let me know if you need some specific functionality, and I'll bump its priority.