diff options
author | Nathan Ringo <nathan@remexre.com> | 2023-10-05 02:27:31 -0500 |
---|---|---|
committer | Nathan Ringo <nathan@remexre.com> | 2023-10-05 02:27:31 -0500 |
commit | c33dc81d26eb5b77faf0108502ec7588a13ebad6 (patch) | |
tree | 097f63838ed76849fdae525018f554fe4777a4b5 /Compiler in a Day/02-ast.page | |
parent | e49b67af7125cc7c5a5ec6f321057bee1036ba05 (diff) |
Adds Compiler in a Day series. This is a direct import and will need cleanup.
Diffstat (limited to 'Compiler in a Day/02-ast.page')
-rw-r--r-- | Compiler in a Day/02-ast.page | 237 |
1 files changed, 237 insertions, 0 deletions
diff --git a/Compiler in a Day/02-ast.page b/Compiler in a Day/02-ast.page new file mode 100644 index 0000000..a130533 --- /dev/null +++ b/Compiler in a Day/02-ast.page @@ -0,0 +1,237 @@ ++++ +title = "The Abstract Syntax Tree — Compiler In A Day" +prev = "01-ciadlang.md" +next = "03-frame-layout.md" ++++ + +# The Abstract Syntax Tree + +Most programming languages' syntax can be described with a tree structure. +An expression like `m.a * m.d - m.b * m.c` could be represented as: + +``` + ╭────────── - ──────────╮ + ╭──── * ────╮ ╭──── * ────╮ + ╭─ . ─╮ ╭─ . ─╮ ╭─ . ─╮ ╭─ . ─╮ +var field var field var field var field + │ │ │ │ │ │ │ │ +"m" "a" "m" "d" "m" "b" "m" "c" +``` + +Such a tree is called an *abstract syntax tree*, or AST. + +TODO: Move the CST material to the "what to do better" section? + +In our compiler, the parser directly produces the AST. +In some compilers, there's an intermediate tree called the *concrete syntax tree*, or CST. + +The CST is a tree that exactly corresponds to the input program, with every token present as a child. + +``` + ╭────────────── binop ──────────────╮ + ╭───── binop ─────╮ │ ╭───── binop ─────╮ + ╭─ dot ─╮ │ ╭─ dot ─╮ │ ╭─ dot ─╮ │ ╭─ dot ─╮ +var │ field │ var │ field │ var │ field │ var │ field + │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ +"m" "." "a" "*" "m" "." "d" "-" "m" "." "b" "*" "m" "." "c" +``` + +This tree is called the *concrete syntax tree*, or CST. +It gets simplified to produce the AST. + +TODO: A better example, one where there's actual meaningful simplification? + +Typically, this is done so that type-checking and other error checks can be performed with as much information about the source code present as possible. + +In a simple compiler like ours, though, we don't lose much quality of error reporting checking for errors on the AST instead. + +Now, it's finally time to see some of the code in our compiler! + +To start with, we'll just define the structure of the tree as Ruby classes. + +Recall from [the syntactic grammar](./01-ciadlang.md#syntactic-grammar) that a program is a sequence of declarations. + +```ruby +class Program + def initialize(decls) + @decls = decls + end +end +``` + +Declarations either declare a function or a variable. +Both have a name, but a function also has a list of arguments and a block of statements that comprise its body. + +```ruby +class DeclFun + def initialize(name, args, body) + @name, @args, @body = name, args, body + end +end + +class DeclVar + def initialize(name) + @name = name + end +end +``` + +We'll also define a helper class for arguments. +For now, this will just wrap the name of an argument, but later, it'll come in handy. + +```ruby +class Arg + def initialize(name) + @name = name + end +end +``` + +There are four kinds of statements: assignment statements, if statements, return statements, and while statements. +These follow what's in the grammar. + +Remember, though, that expressions and variable declarations can also show up as statements! + +In a statically typed language, we would probably define `StmtExpr` and `StmtDeclVar` classes to handle them. +However, we don't need to in Ruby, since it's dynamically typed! + +TODO: okay, I feel like I ought to justify this better... +But the justification involves the visitors we're going to introduce later. + +```ruby +class StmtAssign + def initialize(lhs, rhs) + @lhs, @rhs = lhs, rhs + end +end + +class StmtIf + attr_reader :then_stmts + attr_reader :else_stmts + def initialize(cond, then_stmts, else_stmts) + @cond = cond + @then_stmts = then_stmts + @else_stmts = else_stmts + end +end + +class StmtReturn + def initialize(expr) + @expr = expr + end +end + +class StmtWhile + attr_reader :body_stmts + def initialize(cond, body_stmts) + @cond, @body_stmts = cond, body_stmts + end +end +``` + +Finally, we have the classes for expressions. +There are 10 of these, that are exactly like what we've seen so far. + +```ruby +class ExprAppend # lhs ++ rhs + def initialize(lhs, rhs) + @lhs, @rhs = lhs, rhs + end +end + +class ExprArrayIndex # array[index] + def initialize(array, index) + @array, @index = array, index + end +end + +class ExprArrayLit # { exprs... } + def initialize(exprs) + @exprs = exprs + end +end + +class ExprCall # func(args...) + def initialize(func, args) + @func, @args = func, args + end +end + +class ExprEquals # lhs == rhs + def initialize(lhs, rhs) + @lhs, @rhs = lhs, rhs + end +end + +class ExprIntLit + def initialize(value) + @value = value + end +end + +class ExprMinus # lhs - rhs + def initialize(lhs, rhs) + @lhs, @rhs = lhs, rhs + end +end + +class ExprPlus # lhs + rhs + def initialize(lhs, rhs) + @lhs, @rhs = lhs, rhs + end +end + +class ExprStringLit + def initialize(value) + @value = value + end +end + +class ExprVariable + def initialize(name) + @name = name + end +end +``` + +Now that we have all these classes, we can build ASTs corresponding to programs. + +Let's look at a simple program and determine what its AST should look like. + +```ciad +var x; + +fun main(args) { + x = length(args); + print({x, args}); +} +``` + +Because our AST is so simple, it corresponds pretty directly to the program. + +TODO: Is there more to say here? + +```ruby +example = Program.new([ + # var x; + DeclVar.new("x"), + # fun main(args) { ... } + DeclFun.new("main", [Arg.new("args")], [ + # x = length(args); + StmtAssign.new( + ExprVariable.new("x"), + ExprCall.new("length", [ExprVariable.new("args")])), + # print({x, args}); + ExprCall.new("print", [ + ExprArrayLit.new([ + ExprVariable.new("x"), + ExprVariable.new("args"), + ]) + ]), + ]), +]) +``` + +Now that we have our AST, we'll need to process it. + +In the next post, we'll start the first "real step" of compiling: laying out stack frames. |