## Writing a code generator

Dr. Paul Bone
paul.bone.id.au

I these slides use some JavaScript and WebAssembly in addition to the reveal.js library. I tested this with Firefox 63, I did not test on other browsers or attempt to polyfill things (such as arrow functions) for older browsers. If it doesn't work try a more recent browser.

I have added some exposition slides, to help make reading the slides without the presentation clearer.

Press SPACEBAR to go to the next slide / reveal fragments.

Some images are fairly obviously not my own work. Diagrams, code and prose are Copyright (C) 2018 Paul Bone. You may view this information, but may not duplicate it, pass it off as your own work etc.

### Who am I?

• I work on the garbage collector of Firefox's JavaScript engine.
• I work on compilers for fun.
• Because compilers are fun.
• I started on compilers when working on my honours' project in 2007, I was lucky.

### Compilers

Compilers are made of a pipeline of operations, transforming different data structures.

• Text
• Abstract Syntax Tree
• Assembly / Bytecode

This talk is about code generation.

### Why code generation?

People assume it's hard.

Or worse, eldritch knowledge kept secret by a sacred order of complier authors.

1992 — Teen Talk Barbie
"Math class is tough"

1994 — Lisa vs Malibu Stacy
Malibu Stacy: "Thinking too much gives you wrinkles"
Lisa Lionheart: "Trust in yourself, and you can achieve anything"

### Exposition

The point I'm trying to make with these slides and what I was saying verbally; is that we shouldn't tell ourselves "X is hard" because if we do, then we'll never attempt it. In some cases, this can even lead to gross misunderstandings held about a topic.

### So you're saying it's easy?

Saying something is easy can also be a problem.

What I'm getting at is that it's not as hard as you may think. It's achievable.

I also like to say that it's fun.

### Any topic can be deep

These are all deep topics, but you can learn the basics in an afternoon.

• Music performance
• Pottery
• Quantum mechanics
• Barista

The basics are fairly basic, but the depths can get really deep.

### WebAssembly

• Cross-platform virtual-machine for the web
• Open format
• Compact binary format & human readable text format (s-exprs)
• Safe: memory-safe, sandboxed
• Under development/specification
• Available in the 4 most popular browsers now!
• webassembly.org

### WebAssembly is stack-based

$f=\frac{9c}{5}+32$
 Stack
``` push 32; push 35; // c push 9; mul; push 5; div; add; ```

### Datatypes

• `i32`, `i64`, `f32`, `f64` and function types
• More to come

2's complement arithmetic:

• Most integer operations are sign-agnostic
• division, less-than, greater-than etc have signed/unsigned versions
• Floats need their own operations

### Relational operators use `i32`

The relational operators (equal, not-equal, less-than etc) return `i32` regardless of their input type.

Conditional branch consumes an `i32`

```// x (64-bit) is on the stack i64.const 5 i64.lt_s // (i64 i64 - i32) if ... // (i64 -)```

### Control flow

Control flow is structured

$\mathrm{instr}=\begin{array}{l}\mathrm{atomic-instr}\\ \mathrm{BLOCK}\phantom{\rule{0.5em}{0ex}}{\mathrm{instr}}^{*}\phantom{\rule{0.5em}{0ex}}\mathrm{END}\\ \mathrm{IF}\phantom{\rule{0.5em}{0ex}}\mathrm{type}\phantom{\rule{0.5em}{0ex}}\mathrm{THEN}\phantom{\rule{0.5em}{0ex}}{\mathrm{instr}}^{*}\phantom{\rule{0.5em}{0ex}}\mathrm{ELSE}\phantom{\rule{0.5em}{0ex}}{\mathrm{instr}}^{*}\phantom{\rule{0.5em}{0ex}}\mathrm{END}\\ \mathrm{LOOP}\phantom{\rule{0.5em}{0ex}}\mathrm{type}\phantom{\rule{0.5em}{0ex}}{\mathrm{instr}}^{*}\phantom{\rule{0.5em}{0ex}}\mathrm{END}\end{array}$
• Block-like instructions all have labels
• Only branch to a label only from within its block

This is fairly unusual for an IR, it allows: `break` and `continue` but not arbitrary `goto`.

### Indirect calls

`call_indirect x`

where x is a function type.

1. Pops the `i32` from the top-of-stack, let it be a
2. Indexes the function table with a, let it be f.
3. Verify that f has type x.
4. Call f.

### What's missing

Probably coming in a future version:

• GC
• Returning multiple values
• Tail-calls
• DOM integration

Are being considered / might be considered:

• Unstructured control flow
• Stack shuffling (but there are locals)

I don't work WebAssembly, including it's specification.

### Exposition

The next slides show a Hypothetical language. This is designed to be as simple as possible in order to demonstrate code generation. This is not supposed to be a good code generator but a basic code generator.

### Hypothetical language

Abstract Syntax Tree

```data Func = Func { name :: String, args :: [String], body :: Expr }```
```data Expr = BOp Op Expr Expr | Let String Expr Expr | Case Expr [PatExpr] | Call String [Expr] | Var String | Lit32 Integer ```
``` data Op = Add | Subtract | ... | LessThan | LessThanOrEqual | ... | NotEqual | Equal data PatExpr = PatExpr Pattern Expr data Pattern = Number Integer | Wildcard ```

### Example program

$c= 35 f= 9c 5 +32$
``` Let "c" (Lit32 35) BOp Add (BOp Div (BOp Mul (Lit32 9) (Var "c")) (Lit32 5)) (Lit32 32) ```

### What it doesn't have

• Only one data type 32-bit signed integer.
• Higher order values / calls.
• Closures
• Any data structure, eg: cons cell

### Generating WebAssembly

Visit each node depth first, generate instructions.

``` i32.const 9 i32.get_local 0 i32.mul i32.const 5 i32.div i32.const 32 i32.add ```

How did we know where this variable was?

## Allocate locals

```data Symtab a = Symtab { symbols :: M.Map a Int, next_id :: Int } build_locals :: Ast.Expr -> State (S.Symtab String) Ast.Expr build_locals (Ast.Let var0 let0 in0) = do do_rename <- s_member var0 (var, in2) <- if do_rename then renaming else do s_new var0 return (var0, in0) let_ <- build_locals let0 in_ <- build_locals in2 return (Ast.Let var let_ in_)```
• I have no duplicate variables, this could have been more optimal.
• Add each variable to the symbol table.

### Get and set locals

#### Get

`Var x`:

1. Lookup the var in the symbol table.
2. Generate `get_local id`.
3. Which will read the local and push its value onto the stack.

#### Set

`Let x e1 e2`:

1. Generate the code for e1.
2. Lookup the var in the symbol table.
3. Generate `set_local id`.
4. Which will take the result of e1 from the stack and save it to the local
5. Generate the code for e2.

### Case

```... get_local 1 // Step 1. set_local 2 // Step 3. get_local 2 // Step 4.1 i32.const 1 // cont. i32.eq // cont. (if (result i32) (then (i32.const 1)) // 4.2. (else // 4.1. (get_local 0) // 4.2. (i32.const 1) i32.sub ...) )```

WebAssembly makes this very easy with its nested blocks.

`Case expr pats`

1. Generate the code for expr
2. Create a new variable.
3. Generate: `i32.set_local v`.
4. for each pattern: ```PatExpr pat expr```
1. Generate test code for pat.
2. Generate code for expr.
5. Generate: `unreachable`

Exposition: This slide kind-of has a bug. I may fix it.

### Demo!

Status:
Input:
Output:

Source:

```-- Sorry, the syntax doesn't actually support -- multi-line functions. public fib n = let test = n < 2 in case test of 1 -> 1 0 -> fib(n - 1) + fib(n - 2)```

### WebAssembly Text Format (WAT)

```(func (export "fib") (param i32) (result i32) (local i32 i32) (get_local 0) (i32.const 2) i32.lt_s (set_local 1) (get_local 1) (set_local 2) (get_local 2) (i32.const 1) i32.eq (if (result i32) (then (i32.const 1)) (else (get_local 2) (i32.const 0) i32.eq (if (result i32) (then (get_local 0) (i32.const 1) i32.sub (call 2) (get_local 0) (i32.const 2) i32.sub (call 2) i32.add) (else unreachable)))) return)```

### Thank you

This code generator:
github.com/PaulBone/ast2wasm
Me:
paul.bone.id.au / @Paul_Bone

Code generation is challenging / rewarding / fun.