+++ title = "The Abstract Syntax Tree — Compiler In A Day" prev = "01-ciadlang.md" next = "03-frame-layout.md" +++ # The Abstract Syntax Tree Most programming languages' syntax can be described with a tree structure. An expression like `m.a * m.d - m.b * m.c` could be represented as: ``` ╭────────── - ──────────╮ ╭──── * ────╮ ╭──── * ────╮ ╭─ . ─╮ ╭─ . ─╮ ╭─ . ─╮ ╭─ . ─╮ var field var field var field var field │ │ │ │ │ │ │ │ "m" "a" "m" "d" "m" "b" "m" "c" ``` Such a tree is called an *abstract syntax tree*, or AST. TODO: Move the CST material to the "what to do better" section? In our compiler, the parser directly produces the AST. In some compilers, there's an intermediate tree called the *concrete syntax tree*, or CST. The CST is a tree that exactly corresponds to the input program, with every token present as a child. ``` ╭────────────── binop ──────────────╮ ╭───── binop ─────╮ │ ╭───── binop ─────╮ ╭─ dot ─╮ │ ╭─ dot ─╮ │ ╭─ dot ─╮ │ ╭─ dot ─╮ var │ field │ var │ field │ var │ field │ var │ field │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ "m" "." "a" "*" "m" "." "d" "-" "m" "." "b" "*" "m" "." "c" ``` This tree is called the *concrete syntax tree*, or CST. It gets simplified to produce the AST. TODO: A better example, one where there's actual meaningful simplification? Typically, this is done so that type-checking and other error checks can be performed with as much information about the source code present as possible. In a simple compiler like ours, though, we don't lose much quality of error reporting checking for errors on the AST instead. Now, it's finally time to see some of the code in our compiler! To start with, we'll just define the structure of the tree as Ruby classes. Recall from [the syntactic grammar](./01-ciadlang.md#syntactic-grammar) that a program is a sequence of declarations. ```ruby class Program def initialize(decls) @decls = decls end end ``` Declarations either declare a function or a variable. Both have a name, but a function also has a list of arguments and a block of statements that comprise its body. ```ruby class DeclFun def initialize(name, args, body) @name, @args, @body = name, args, body end end class DeclVar def initialize(name) @name = name end end ``` We'll also define a helper class for arguments. For now, this will just wrap the name of an argument, but later, it'll come in handy. ```ruby class Arg def initialize(name) @name = name end end ``` There are four kinds of statements: assignment statements, if statements, return statements, and while statements. These follow what's in the grammar. Remember, though, that expressions and variable declarations can also show up as statements! In a statically typed language, we would probably define `StmtExpr` and `StmtDeclVar` classes to handle them. However, we don't need to in Ruby, since it's dynamically typed! TODO: okay, I feel like I ought to justify this better... But the justification involves the visitors we're going to introduce later. ```ruby class StmtAssign def initialize(lhs, rhs) @lhs, @rhs = lhs, rhs end end class StmtIf attr_reader :then_stmts attr_reader :else_stmts def initialize(cond, then_stmts, else_stmts) @cond = cond @then_stmts = then_stmts @else_stmts = else_stmts end end class StmtReturn def initialize(expr) @expr = expr end end class StmtWhile attr_reader :body_stmts def initialize(cond, body_stmts) @cond, @body_stmts = cond, body_stmts end end ``` Finally, we have the classes for expressions. There are 10 of these, that are exactly like what we've seen so far. ```ruby class ExprAppend # lhs ++ rhs def initialize(lhs, rhs) @lhs, @rhs = lhs, rhs end end class ExprArrayIndex # array[index] def initialize(array, index) @array, @index = array, index end end class ExprArrayLit # { exprs... } def initialize(exprs) @exprs = exprs end end class ExprCall # func(args...) def initialize(func, args) @func, @args = func, args end end class ExprEquals # lhs == rhs def initialize(lhs, rhs) @lhs, @rhs = lhs, rhs end end class ExprIntLit def initialize(value) @value = value end end class ExprMinus # lhs - rhs def initialize(lhs, rhs) @lhs, @rhs = lhs, rhs end end class ExprPlus # lhs + rhs def initialize(lhs, rhs) @lhs, @rhs = lhs, rhs end end class ExprStringLit def initialize(value) @value = value end end class ExprVariable def initialize(name) @name = name end end ``` Now that we have all these classes, we can build ASTs corresponding to programs. Let's look at a simple program and determine what its AST should look like. ```ciad var x; fun main(args) { x = length(args); print({x, args}); } ``` Because our AST is so simple, it corresponds pretty directly to the program. TODO: Is there more to say here? ```ruby example = Program.new([ # var x; DeclVar.new("x"), # fun main(args) { ... } DeclFun.new("main", [Arg.new("args")], [ # x = length(args); StmtAssign.new( ExprVariable.new("x"), ExprCall.new("length", [ExprVariable.new("args")])), # print({x, args}); ExprCall.new("print", [ ExprArrayLit.new([ ExprVariable.new("x"), ExprVariable.new("args"), ]) ]), ]), ]) ``` Now that we have our AST, we'll need to process it. In the next post, we'll start the first "real step" of compiling: laying out stack frames.