+++
title = "The Abstract Syntax Tree — Compiler In A Day"
prev = "01-ciadlang.md"
next = "03-frame-layout.md"
+++

# The Abstract Syntax Tree

Most programming languages' syntax can be described with a tree structure.
An expression like `m.a * m.d - m.b * m.c` could be represented as:

```
          ╭────────── - ──────────╮
    ╭──── * ────╮           ╭──── * ────╮
 ╭─ . ─╮     ╭─ . ─╮     ╭─ . ─╮     ╭─ . ─╮
var  field  var  field  var  field  var  field
 │     │     │     │     │     │     │     │
"m"   "a"   "m"   "d"   "m"   "b"   "m"   "c"
```

Such a tree is called an *abstract syntax tree*, or AST.

TODO: Move the CST material to the "what to do better" section?

In our compiler, the parser directly produces the AST.
In some compilers, there's an intermediate tree called the *concrete syntax tree*, or CST.

The CST is a tree that exactly corresponds to the input program, with every token present as a child.

```
              ╭────────────── binop ──────────────╮
     ╭───── binop ─────╮        │        ╭───── binop ─────╮
 ╭─ dot ─╮    │    ╭─ dot ─╮    │    ╭─ dot ─╮    │    ╭─ dot ─╮
var  │ field  │   var  │ field  │   var  │ field  │   var  │ field
 │   │   │    │    │   │   │    │    │   │   │    │    │   │   │
"m" "." "a"  "*"  "m" "." "d"  "-"  "m" "." "b"  "*"  "m" "." "c"
```

This tree is called the *concrete syntax tree*, or CST.
It gets simplified to produce the AST.

TODO: A better example, one where there's actual meaningful simplification?

Typically, this is done so that type-checking and other error checks can be performed with as much information about the source code present as possible.

In a simple compiler like ours, though, we don't lose much quality of error reporting checking for errors on the AST instead.

Now, it's finally time to see some of the code in our compiler!

To start with, we'll just define the structure of the tree as Ruby classes.

Recall from [the syntactic grammar](./01-ciadlang.md#syntactic-grammar) that a program is a sequence of declarations.

```ruby
class Program
  def initialize(decls)
    @decls = decls
  end
end
```

Declarations either declare a function or a variable.
Both have a name, but a function also has a list of arguments and a block of statements that comprise its body.

```ruby
class DeclFun
  def initialize(name, args, body)
    @name, @args, @body = name, args, body
  end
end

class DeclVar
  def initialize(name)
    @name = name
  end
end
```

We'll also define a helper class for arguments.
For now, this will just wrap the name of an argument, but later, it'll come in handy.

```ruby
class Arg
  def initialize(name)
    @name = name
  end
end
```

There are four kinds of statements: assignment statements, if statements, return statements, and while statements.
These follow what's in the grammar.

Remember, though, that expressions and variable declarations can also show up as statements!

In a statically typed language, we would probably define `StmtExpr` and `StmtDeclVar` classes to handle them.
However, we don't need to in Ruby, since it's dynamically typed!

TODO: okay, I feel like I ought to justify this better...
But the justification involves the visitors we're going to introduce later.

```ruby
class StmtAssign
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class StmtIf
  attr_reader :then_stmts
  attr_reader :else_stmts
  def initialize(cond, then_stmts, else_stmts)
    @cond = cond
    @then_stmts = then_stmts
    @else_stmts = else_stmts
  end
end

class StmtReturn
  def initialize(expr)
    @expr = expr
  end
end

class StmtWhile
  attr_reader :body_stmts
  def initialize(cond, body_stmts)
    @cond, @body_stmts = cond, body_stmts
  end
end
```

Finally, we have the classes for expressions.
There are 10 of these, that are exactly like what we've seen so far.

```ruby
class ExprAppend # lhs ++ rhs
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class ExprArrayIndex # array[index]
  def initialize(array, index)
    @array, @index = array, index
  end
end

class ExprArrayLit # { exprs... }
  def initialize(exprs)
    @exprs = exprs
  end
end

class ExprCall # func(args...)
  def initialize(func, args)
    @func, @args = func, args
  end
end

class ExprEquals # lhs == rhs
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class ExprIntLit
  def initialize(value)
    @value = value
  end
end

class ExprMinus # lhs - rhs
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class ExprPlus # lhs + rhs
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class ExprStringLit
  def initialize(value)
    @value = value
  end
end

class ExprVariable
  def initialize(name)
    @name = name
  end
end
```

Now that we have all these classes, we can build ASTs corresponding to programs.

Let's look at a simple program and determine what its AST should look like.

```ciad
var x;

fun main(args) {
    x = length(args);
    print({x, args});
}
```

Because our AST is so simple, it corresponds pretty directly to the program.

TODO: Is there more to say here?

```ruby
example = Program.new([
  # var x;
  DeclVar.new("x"),
  # fun main(args) { ... }
  DeclFun.new("main", [Arg.new("args")], [
    # x = length(args);
    StmtAssign.new(
      ExprVariable.new("x"),
      ExprCall.new("length", [ExprVariable.new("args")])),
    # print({x, args});
    ExprCall.new("print", [
      ExprArrayLit.new([
        ExprVariable.new("x"),
        ExprVariable.new("args"),
      ])
    ]),
  ]),
])
```

Now that we have our AST, we'll need to process it.

In the next post, we'll start the first "real step" of compiling: laying out stack frames.