1 files changed, 534 insertions, 1 deletions
diff --git a/Compiler in a Day.page b/Compiler in a Day.page
index 2a5c60b..a732d4d 100644
--- a/Compiler in a Day.page
+++ b/Compiler in a Day.page
@@ -21,4 +21,537 @@ They are, in the order they get run:
 
 TODO: pictures!
 
-Before we can start looking at these steps, however, we should look at the language we'll be compiling.
-\ No newline at end of file
+Before we can start looking at these steps, however, we should look at the language we'll be compiling.
+
+# The Ciad Language
+
+Before we start looking at the compiler, we should see and understand the language we’ll be compiling. This language is named Ciad (pronounced like “key-add”), which is an acronym for the name of this article.
+
+TODO: This is [here](https://plwiki.org/Compiler%20in%20a%20Day/01-ciadlang), but I really don't like it and want to rewrite it.
+
+Unlike many (most?) other compiler tutorials, we _won’t_ be starting with parsing. Instead, we’ll start with how we represent the program after it’s been parsed.
+
+# The Abstract Syntax Tree
+
+Most programming languages' syntax can be described with a tree structure.
+An expression like `m.a * m.d - m.b * m.c` could be represented as:
+
+```
+          ╭────────── - ──────────╮
+    ╭──── * ────╮           ╭──── * ────╮
+ ╭─ . ─╮     ╭─ . ─╮     ╭─ . ─╮     ╭─ . ─╮
+var  field  var  field  var  field  var  field
+ │     │     │     │     │     │     │     │
+"m"   "a"   "m"   "d"   "m"   "b"   "m"   "c"
+```
+
+Such a tree is called an *abstract syntax tree*, or AST.
+
+TODO: Move the CST material to the "what to do better" section?
+
+In our compiler, the parser directly produces the AST.
+In some compilers, there's an intermediate tree called the *concrete syntax tree*, or CST.
+
+The CST is a tree that exactly corresponds to the input program, with every token present as a child.
+
+```
+              ╭────────────── binop ──────────────╮
+     ╭───── binop ─────╮        │        ╭───── binop ─────╮
+ ╭─ dot ─╮    │    ╭─ dot ─╮    │    ╭─ dot ─╮    │    ╭─ dot ─╮
+var  │ field  │   var  │ field  │   var  │ field  │   var  │ field
+ │   │   │    │    │   │   │    │    │   │   │    │    │   │   │
+"m" "." "a"  "*"  "m" "." "d"  "-"  "m" "." "b"  "*"  "m" "." "c"
+```
+
+This tree is called the *concrete syntax tree*, or CST.
+It gets simplified to produce the AST.
+
+TODO: A better example, one where there's actual meaningful simplification?
+
+Typically, this is done so that type-checking and other error checks can be performed with as much information about the source code present as possible.
+
+In a simple compiler like ours, though, we don't lose much quality of error reporting checking for errors on the AST instead.
+
+Now, it's finally time to see some of the code in our compiler!
+
+To start with, we'll just define the structure of the tree as Ruby classes.
+
+Recall from [the syntactic grammar](./01-ciadlang.md#syntactic-grammar) that a program is a sequence of declarations.
+
+```ruby
+class Program
+  def initialize(decls)
+    @decls = decls
+  end
+end
+```
+
+Declarations either declare a function or a variable.
+Both have a name, but a function also has a list of arguments and a block of statements that comprise its body.
+
+```ruby
+class DeclFun
+  def initialize(name, args, body)
+    @name, @args, @body = name, args, body
+  end
+end
+
+class DeclVar
+  def initialize(name)
+    @name = name
+  end
+end
+```
+
+We'll also define a helper class for arguments.
+For now, this will just wrap the name of an argument, but later, it'll come in handy.
+
+```ruby
+class Arg
+  def initialize(name)
+    @name = name
+  end
+end
+```
+
+There are four kinds of statements: assignment statements, if statements, return statements, and while statements.
+These follow what's in the grammar.
+
+Remember, though, that expressions and variable declarations can also show up as statements!
+
+In a statically typed language, we would probably define `StmtExpr` and `StmtDeclVar` classes to handle them.
+However, we don't need to in Ruby, since it's dynamically typed!
+
+TODO: okay, I feel like I ought to justify this better...
+But the justification involves the visitors we're going to introduce later.
+
+```ruby
+class StmtAssign
+  def initialize(lhs, rhs)
+    @lhs, @rhs = lhs, rhs
+  end
+end
+
+class StmtIf
+  attr_reader :then_stmts
+  attr_reader :else_stmts
+  def initialize(cond, then_stmts, else_stmts)
+    @cond = cond
+    @then_stmts = then_stmts
+    @else_stmts = else_stmts
+  end
+end
+
+class StmtReturn
+  def initialize(expr)
+    @expr = expr
+  end
+end
+
+class StmtWhile
+  attr_reader :body_stmts
+  def initialize(cond, body_stmts)
+    @cond, @body_stmts = cond, body_stmts
+  end
+end
+```
+
+Finally, we have the classes for expressions.
+There are 10 of these, that are exactly like what we've seen so far.
+
+```ruby
+class ExprAppend # lhs ++ rhs
+  def initialize(lhs, rhs)
+    @lhs, @rhs = lhs, rhs
+  end
+end
+
+class ExprArrayIndex # array[index]
+  def initialize(array, index)
+    @array, @index = array, index
+  end
+end
+
+class ExprArrayLit # { exprs... }
+  def initialize(exprs)
+    @exprs = exprs
+  end
+end
+
+class ExprCall # func(args...)
+  def initialize(func, args)
+    @func, @args = func, args
+  end
+end
+
+class ExprEquals # lhs == rhs
+  def initialize(lhs, rhs)
+    @lhs, @rhs = lhs, rhs
+  end
+end
+
+class ExprIntLit
+  def initialize(value)
+    @value = value
+  end
+end
+
+class ExprMinus # lhs - rhs
+  def initialize(lhs, rhs)
+    @lhs, @rhs = lhs, rhs
+  end
+end
+
+class ExprPlus # lhs + rhs
+  def initialize(lhs, rhs)
+    @lhs, @rhs = lhs, rhs
+  end
+end
+
+class ExprStringLit
+  def initialize(value)
+    @value = value
+  end
+end
+
+class ExprVariable
+  def initialize(name)
+    @name = name
+  end
+end
+```
+
+Now that we have all these classes, we can build ASTs corresponding to programs.
+
+Let's look at a simple program and determine what its AST should look like.
+
+```ciad
+var x;
+
+fun main(args) {
+    x = length(args);
+    print({x, args});
+}
+```
+
+Because our AST is so simple, it corresponds pretty directly to the program.
+
+TODO: Is there more to say here?
+
+```ruby
+example = Program.new([
+  # var x;
+  DeclVar.new("x"),
+  # fun main(args) { ... }
+  DeclFun.new("main", [Arg.new("args")], [
+    # x = length(args);
+    StmtAssign.new(
+      ExprVariable.new("x"),
+      ExprCall.new("length", [ExprVariable.new("args")])),
+    # print({x, args});
+    ExprCall.new("print", [
+      ExprArrayLit.new([
+        ExprVariable.new("x"),
+        ExprVariable.new("args"),
+      ])
+    ]),
+  ]),
+])
+```
+
+Now that we have our AST, we'll need to process it.
+
+# Frame Layout
+
+When programs execute, their local variables' values are typically stored into a data structure called the *call stack*, or commonly just *the stack*.
+
+This is a stack data structure whose entries are called *stack frames*.
+These contain all the information that a function might need to save while calling another one.
+
+For example, consider the following program.
+
+```ciad
+fun f(x) {
+  var y;
+  var z;
+
+  y = x + 3;
+  z = g(x - 1);
+  return y - z;
+}
+
+fun g() {
+  if x == 0 {
+    return 42;
+  } else {
+    return f(x - 1);
+  }
+}
+
+fun main(args) {
+  return f(2);
+}
+```
+
+When `f` calls `g`, we need to store the value of `y` somewhere, so that it can be used again after `g` returns.
+We also need to store what function `f` should return to once it finishes, and we store that on the stack as well.
+
+In general, most languages structure their stacks as something like:
+
+```
+╭───────────────────────────────╮ <-- "bottom of the stack"
+│ address main should return to │          (highest address
+├───────────────────────────────┤           on most current
+│      saved frame pointer      │        CPU architectures)
+├───────────────────────────────┤
+│                               │
+│ saved local variables of main │
+│                               │
+┝━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
+│   address f should return to  │
+├───────────────────────────────┤
+│      saved frame pointer      │
+├───────────────────────────────┤
+│                               │
+│   saved local variables of f  │
+│                               │
+┝━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
+│   address g should return to  │
+├───────────────────────────────┤
+│      saved frame pointer      │             frame pointer
+├───────────────────────────────┤ <--------- when running g
+│                               │
+│      local variables of g     │             stack pointer
+│                               │       when running g, aka
+╰───────────────────────────────╯ <----- "top of the stack"
+```
+
+The stack then "grows down" as functions are called.
+When a function returns, it removes its frame from the stack, shrinking it.
+
+The stack is kept track of with the *stack pointer*, which points to the top of the stack, i.e. the address at the _end_ of the current stack frame.
+
+In some programs, there's also a *frame pointer*.
+This points to the address at the start of local variables of the current stack frame.
+
+With a smarter compiler than we're building in this series, the frame pointer isn't necessary.
+However, it can still be useful for a debugger and some kinds of runtime introspection.
+
+In our compiler, we'll need to have a pass that computes how much storage the local variables of each function will take up in the stack frame.
+
+Once we know this, we can compute the address of any local variable as an offset from the frame pointer.
+
+Alright, let's get started with the code!
+
+The first thing we'll need to do is to extend `Arg` and `DeclVar` with the ability to store a slot index.
+This the offset of the variable from the frame pointer.
+
+```ruby
+class Arg
+  # ...
+
+  attr_accessor :slot
+end
+
+class DeclVar
+  # ...
+
+  attr_accessor :slot
+end
+```
+
+Then, we can add the method to perform the pass to the `Program` class.
+
+Computing frame layouts doesn't require any sort of information that's shared between functions, so we can just tell each function declaration to compute it independently.
+
+```ruby
+class Program
+  # ...
+
+  def compute_frame_layouts
+    @decls.each do |decl|
+      case decl
+      in DeclFun
+        decl.compute_frame_layout
+      in DeclVar
+        # Ignore it; variables don't have any kind of frame
+        # to be laid out.
+      end
+    end
+  end
+end
+```
+
+Once we're processing the function, we start with storing a counter for the number of local variables we have in the declaration.
+
+We define a helper function (`assign_slot_to`) to be called on every `Arg` and `DeclVar` in the function as well.
+
+```ruby
+class DeclFun
+  # ...
+
+  def compute_frame_layout
+    @locals_count = 0
+    def assign_slot_to(var)
+      var.slot = @locals_count
+      @locals_count += 1
+    end
+
+    @args.each { |arg| assign_slot_to arg }
+
+    def traverse_stmt(stmt)
+      case stmt
+      in DeclVar
+        assign_slot_to stmt
+
+      in StmtIf
+        stmt.then_stmts.each { |stmt| traverse_stmt stmt }
+        stmt.else_stmts.each { |stmt| traverse_stmt stmt }
+      in StmtWhile
+        stmt.body_stmts.each { |stmt| traverse_stmt stmt }
+
+      else
+        # Do nothing; nothing else needs to get a slot nor
+        # contains anything that needs to get a slot.
+      end
+    end
+
+    @body.each { |stmt| traverse_stmt stmt }
+  end
+end
+```
+
+Once this has finished running, every `DeclStmt` will have an `@locals_count`, and every `Arg` and non-global `DeclVar` will have an `@slot`.
+
+Before we can start generating code, though, we should look at some details of exactly what code we'll be generating.
+
+# The ABI
+
+We're ready to start making the final decisions about the machine code we're going to produce.
+
+To reiterate, we're _not_ aiming to produce efficient machine code for now.
+We're just aiming to get the simplest thing that works.
+
+The important things to decide are:
+
+- How different registers are allowed to be used.
+- The way we pass arguments to and receive arguments from functions.
+- How the various data types in our language are represented.
+
+These questions (and more related ones we won't touch on) are called the *application binary interface*, or ABI.
+
+Usually, a platform has an ABI already defined and documented for the C programming language.
+This C ABI serves as the "lowest common denominator" for other languages' ABIs.
+
+Even though you can make your compiler follow whatever ABI you want, it simplifies calling into other languages (and letting them call your code) if it roughly follows the C ABI.
+
+Although we aren't providing features for users to interoperate with any other code they want, we still need to call into our runtime.
+Since our runtime will be written in C, we want to make this as painless as possible.
+
+We're producing code to run on Linux on x86_64 CPUs, so the most relevant C ABI document is the [AMD64 Architecture Processor Supplement of the System V ABI].
+
+We can copy its choices about how registers can be used, and how arguments get passed to and return values get returned from functions:
+
+| Registers                  | Saved by | Notes                 |
+| -------------------------- | -------- | --------------------- |
+| rdi, rsi, rdx, rcx, r8, r9 | Caller   | Function arguments    |
+| rax                        | Caller   | Function return value |
+| r10, r11                   | Caller   |                       |
+| rbp                        | Callee   | Frame pointer         |
+| rsp                        | Callee   | Stack pointer         |
+| rbx, r12, r13, r14, r15    | Callee   |                       |
+
+When a register is caller-saved, that means that the functions we generate are allowed to freely modify them.
+However, so is any function we call, so we can't rely on values in them staying unchanged.
+
+Conversely, a callee-saved register must be restored to its original value before we return, but we can rely on functions we call doing the same.
+
+Next, we need to figure out how we're going to represent values.
+We have three types of values: integers, strings, and arrays.
+
+Since we're implementing a dynamically typed language, any variable needs to be able to store values of any of these types.
+
+We also want values to fit in a single (64-bit) register, for simplicity and performance.
+
+Finally, we want values that contain memory addresses to be represented by those addresses, so that the garbage collector we're using can recognize them as pointers.
+
+The easiest thing to do is to simply represent all three as pointers to structures in the heap that have information about the type stored in them.
+
+For example, the array `{42, "foo"}` might be represented as the pointer `0x08001230`, pointing to data like:
+
+```
+╭────────────╮
+│ 0x08001230 │     HEAP
+╰────────────╯
+          │   │    ...     │
+          ╰─> ├────────────┤
+     type tag │      ARRAY │
+ data pointer │ 0x08001250 │───╮
+       length │          2 │   │
+     capacity │          4 │   │
+              ├────────────┤ <─╯
+              │ 0x08001270 │─────╮
+              │ 0x08001280 │───╮ │
+              │          0 │   │ │
+              │          0 │   │ │
+              ├────────────┤ <───╯
+     type tag │    INTEGER │   │
+        value │         42 │   │
+              ├────────────┤ <─╯
+     type tag │     STRING │
+ data pointer │ 0x080012a0 │───╮
+       length │          2 │   │
+     capacity │          8 │   │
+              ├────────────┤ <─╯
+              │ "foo", ... │
+              ├────────────┤
+              │    ...     │
+```
+
+We know that `0x08001230` points to an array, since when we dereference it, we get the `ARRAY` constant.
+
+It doesn't really matter what values the `INTEGER`, `ARRAY`, and `STRING` constants have.
+For simplicity, let's make them:
+
+| Constant  | Value |
+| --------- | ----- |
+| `INTEGER` |   0   |
+| `ARRAY`   |   1   |
+| `STRING`  |   2   |
+
+So, putting it all together, let's imagine the equivalent of this simple function in C:
+
+```
+fun f(arr, x) {
+    return arr[x] + x;
+}
+```
+
+It might look something like:
+
+```c
+enum type_tag { INTEGER, ARRAY, STRING };
+struct array  { struct value** data_ptr; size_t len, cap; };
+struct string {          char* data_ptr; size_t len, cap; };
+struct value {
+    enum type_tag tag;
+    union {
+        int64_t integer;
+        struct array array;
+        struct string string;
+    };
+};
+
+struct value* f(struct value* arr, struct value* x) {
+    assert(arr->tag == ARRAY);
+    assert(  x->tag == INTEGER);
+    assert(x->integer < arr->array.len);
+    struct value* arr_x = arr->array.data_ptr[x->integer];
+
+    assert(arr_x->tag == INTEGER);
+    assert(    x->tag == INTEGER);
+    return make_int(arr_x->integer + x->integer);
+}
+```
+
+[AMD64 Architecture Processor Supplement of the System V ABI]: https://gitlab.com/x86-psABIs/x86-64-ABI