summaryrefslogtreecommitdiff
path: root/Compiler in a Day/02-ast.page
diff options
context:
space:
mode:
authorNathan Ringo <nathan@remexre.com>2023-10-05 02:27:31 -0500
committerNathan Ringo <nathan@remexre.com>2023-10-05 02:27:31 -0500
commitc33dc81d26eb5b77faf0108502ec7588a13ebad6 (patch)
tree097f63838ed76849fdae525018f554fe4777a4b5 /Compiler in a Day/02-ast.page
parente49b67af7125cc7c5a5ec6f321057bee1036ba05 (diff)
Adds Compiler in a Day series. This is a direct import and will need cleanup.
Diffstat (limited to 'Compiler in a Day/02-ast.page')
-rw-r--r--Compiler in a Day/02-ast.page237
1 files changed, 237 insertions, 0 deletions
diff --git a/Compiler in a Day/02-ast.page b/Compiler in a Day/02-ast.page
new file mode 100644
index 0000000..a130533
--- /dev/null
+++ b/Compiler in a Day/02-ast.page
@@ -0,0 +1,237 @@
++++
+title = "The Abstract Syntax Tree — Compiler In A Day"
+prev = "01-ciadlang.md"
+next = "03-frame-layout.md"
++++
+
+# The Abstract Syntax Tree
+
+Most programming languages' syntax can be described with a tree structure.
+An expression like `m.a * m.d - m.b * m.c` could be represented as:
+
+```
+ ╭────────── - ──────────╮
+ ╭──── * ────╮ ╭──── * ────╮
+ ╭─ . ─╮ ╭─ . ─╮ ╭─ . ─╮ ╭─ . ─╮
+var field var field var field var field
+ │ │ │ │ │ │ │ │
+"m" "a" "m" "d" "m" "b" "m" "c"
+```
+
+Such a tree is called an *abstract syntax tree*, or AST.
+
+TODO: Move the CST material to the "what to do better" section?
+
+In our compiler, the parser directly produces the AST.
+In some compilers, there's an intermediate tree called the *concrete syntax tree*, or CST.
+
+The CST is a tree that exactly corresponds to the input program, with every token present as a child.
+
+```
+ ╭────────────── binop ──────────────╮
+ ╭───── binop ─────╮ │ ╭───── binop ─────╮
+ ╭─ dot ─╮ │ ╭─ dot ─╮ │ ╭─ dot ─╮ │ ╭─ dot ─╮
+var │ field │ var │ field │ var │ field │ var │ field
+ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
+"m" "." "a" "*" "m" "." "d" "-" "m" "." "b" "*" "m" "." "c"
+```
+
+This tree is called the *concrete syntax tree*, or CST.
+It gets simplified to produce the AST.
+
+TODO: A better example, one where there's actual meaningful simplification?
+
+Typically, this is done so that type-checking and other error checks can be performed with as much information about the source code present as possible.
+
+In a simple compiler like ours, though, we don't lose much quality of error reporting checking for errors on the AST instead.
+
+Now, it's finally time to see some of the code in our compiler!
+
+To start with, we'll just define the structure of the tree as Ruby classes.
+
+Recall from [the syntactic grammar](./01-ciadlang.md#syntactic-grammar) that a program is a sequence of declarations.
+
+```ruby
+class Program
+ def initialize(decls)
+ @decls = decls
+ end
+end
+```
+
+Declarations either declare a function or a variable.
+Both have a name, but a function also has a list of arguments and a block of statements that comprise its body.
+
+```ruby
+class DeclFun
+ def initialize(name, args, body)
+ @name, @args, @body = name, args, body
+ end
+end
+
+class DeclVar
+ def initialize(name)
+ @name = name
+ end
+end
+```
+
+We'll also define a helper class for arguments.
+For now, this will just wrap the name of an argument, but later, it'll come in handy.
+
+```ruby
+class Arg
+ def initialize(name)
+ @name = name
+ end
+end
+```
+
+There are four kinds of statements: assignment statements, if statements, return statements, and while statements.
+These follow what's in the grammar.
+
+Remember, though, that expressions and variable declarations can also show up as statements!
+
+In a statically typed language, we would probably define `StmtExpr` and `StmtDeclVar` classes to handle them.
+However, we don't need to in Ruby, since it's dynamically typed!
+
+TODO: okay, I feel like I ought to justify this better...
+But the justification involves the visitors we're going to introduce later.
+
+```ruby
+class StmtAssign
+ def initialize(lhs, rhs)
+ @lhs, @rhs = lhs, rhs
+ end
+end
+
+class StmtIf
+ attr_reader :then_stmts
+ attr_reader :else_stmts
+ def initialize(cond, then_stmts, else_stmts)
+ @cond = cond
+ @then_stmts = then_stmts
+ @else_stmts = else_stmts
+ end
+end
+
+class StmtReturn
+ def initialize(expr)
+ @expr = expr
+ end
+end
+
+class StmtWhile
+ attr_reader :body_stmts
+ def initialize(cond, body_stmts)
+ @cond, @body_stmts = cond, body_stmts
+ end
+end
+```
+
+Finally, we have the classes for expressions.
+There are 10 of these, that are exactly like what we've seen so far.
+
+```ruby
+class ExprAppend # lhs ++ rhs
+ def initialize(lhs, rhs)
+ @lhs, @rhs = lhs, rhs
+ end
+end
+
+class ExprArrayIndex # array[index]
+ def initialize(array, index)
+ @array, @index = array, index
+ end
+end
+
+class ExprArrayLit # { exprs... }
+ def initialize(exprs)
+ @exprs = exprs
+ end
+end
+
+class ExprCall # func(args...)
+ def initialize(func, args)
+ @func, @args = func, args
+ end
+end
+
+class ExprEquals # lhs == rhs
+ def initialize(lhs, rhs)
+ @lhs, @rhs = lhs, rhs
+ end
+end
+
+class ExprIntLit
+ def initialize(value)
+ @value = value
+ end
+end
+
+class ExprMinus # lhs - rhs
+ def initialize(lhs, rhs)
+ @lhs, @rhs = lhs, rhs
+ end
+end
+
+class ExprPlus # lhs + rhs
+ def initialize(lhs, rhs)
+ @lhs, @rhs = lhs, rhs
+ end
+end
+
+class ExprStringLit
+ def initialize(value)
+ @value = value
+ end
+end
+
+class ExprVariable
+ def initialize(name)
+ @name = name
+ end
+end
+```
+
+Now that we have all these classes, we can build ASTs corresponding to programs.
+
+Let's look at a simple program and determine what its AST should look like.
+
+```ciad
+var x;
+
+fun main(args) {
+ x = length(args);
+ print({x, args});
+}
+```
+
+Because our AST is so simple, it corresponds pretty directly to the program.
+
+TODO: Is there more to say here?
+
+```ruby
+example = Program.new([
+ # var x;
+ DeclVar.new("x"),
+ # fun main(args) { ... }
+ DeclFun.new("main", [Arg.new("args")], [
+ # x = length(args);
+ StmtAssign.new(
+ ExprVariable.new("x"),
+ ExprCall.new("length", [ExprVariable.new("args")])),
+ # print({x, args});
+ ExprCall.new("print", [
+ ExprArrayLit.new([
+ ExprVariable.new("x"),
+ ExprVariable.new("args"),
+ ])
+ ]),
+ ]),
+])
+```
+
+Now that we have our AST, we'll need to process it.
+
+In the next post, we'll start the first "real step" of compiling: laying out stack frames.