Compiler in a Day/02-ast.page


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237

+++
title = "The Abstract Syntax Tree — Compiler In A Day"
prev = "01-ciadlang.md"
next = "03-frame-layout.md"
+++

# The Abstract Syntax Tree

Most programming languages' syntax can be described with a tree structure.
An expression like `m.a * m.d - m.b * m.c` could be represented as:

```
          ╭────────── - ──────────╮
    ╭──── * ────╮           ╭──── * ────╮
 ╭─ . ─╮     ╭─ . ─╮     ╭─ . ─╮     ╭─ . ─╮
var  field  var  field  var  field  var  field
 │     │     │     │     │     │     │     │
"m"   "a"   "m"   "d"   "m"   "b"   "m"   "c"
```

Such a tree is called an *abstract syntax tree*, or AST.

TODO: Move the CST material to the "what to do better" section?

In our compiler, the parser directly produces the AST.
In some compilers, there's an intermediate tree called the *concrete syntax tree*, or CST.

The CST is a tree that exactly corresponds to the input program, with every token present as a child.

```
              ╭────────────── binop ──────────────╮
     ╭───── binop ─────╮        │        ╭───── binop ─────╮
 ╭─ dot ─╮    │    ╭─ dot ─╮    │    ╭─ dot ─╮    │    ╭─ dot ─╮
var  │ field  │   var  │ field  │   var  │ field  │   var  │ field
 │   │   │    │    │   │   │    │    │   │   │    │    │   │   │
"m" "." "a"  "*"  "m" "." "d"  "-"  "m" "." "b"  "*"  "m" "." "c"
```

This tree is called the *concrete syntax tree*, or CST.
It gets simplified to produce the AST.

TODO: A better example, one where there's actual meaningful simplification?

Typically, this is done so that type-checking and other error checks can be performed with as much information about the source code present as possible.

In a simple compiler like ours, though, we don't lose much quality of error reporting checking for errors on the AST instead.

Now, it's finally time to see some of the code in our compiler!

To start with, we'll just define the structure of the tree as Ruby classes.

Recall from [the syntactic grammar](./01-ciadlang.md#syntactic-grammar) that a program is a sequence of declarations.

```ruby
class Program
  def initialize(decls)
    @decls = decls
  end
end
```

Declarations either declare a function or a variable.
Both have a name, but a function also has a list of arguments and a block of statements that comprise its body.

```ruby
class DeclFun
  def initialize(name, args, body)
    @name, @args, @body = name, args, body
  end
end

class DeclVar
  def initialize(name)
    @name = name
  end
end
```

We'll also define a helper class for arguments.
For now, this will just wrap the name of an argument, but later, it'll come in handy.

```ruby
class Arg
  def initialize(name)
    @name = name
  end
end
```

There are four kinds of statements: assignment statements, if statements, return statements, and while statements.
These follow what's in the grammar.

Remember, though, that expressions and variable declarations can also show up as statements!

In a statically typed language, we would probably define `StmtExpr` and `StmtDeclVar` classes to handle them.
However, we don't need to in Ruby, since it's dynamically typed!

TODO: okay, I feel like I ought to justify this better...
But the justification involves the visitors we're going to introduce later.

```ruby
class StmtAssign
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class StmtIf
  attr_reader :then_stmts
  attr_reader :else_stmts
  def initialize(cond, then_stmts, else_stmts)
    @cond = cond
    @then_stmts = then_stmts
    @else_stmts = else_stmts
  end
end

class StmtReturn
  def initialize(expr)
    @expr = expr
  end
end

class StmtWhile
  attr_reader :body_stmts
  def initialize(cond, body_stmts)
    @cond, @body_stmts = cond, body_stmts
  end
end
```

Finally, we have the classes for expressions.
There are 10 of these, that are exactly like what we've seen so far.

```ruby
class ExprAppend # lhs ++ rhs
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class ExprArrayIndex # array[index]
  def initialize(array, index)
    @array, @index = array, index
  end
end

class ExprArrayLit # { exprs... }
  def initialize(exprs)
    @exprs = exprs
  end
end

class ExprCall # func(args...)
  def initialize(func, args)
    @func, @args = func, args
  end
end

class ExprEquals # lhs == rhs
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class ExprIntLit
  def initialize(value)
    @value = value
  end
end

class ExprMinus # lhs - rhs
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class ExprPlus # lhs + rhs
  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end
end

class ExprStringLit
  def initialize(value)
    @value = value
  end
end

class ExprVariable
  def initialize(name)
    @name = name
  end
end
```

Now that we have all these classes, we can build ASTs corresponding to programs.

Let's look at a simple program and determine what its AST should look like.

```ciad
var x;

fun main(args) {
    x = length(args);
    print({x, args});
}
```

Because our AST is so simple, it corresponds pretty directly to the program.

TODO: Is there more to say here?

```ruby
example = Program.new([
  # var x;
  DeclVar.new("x"),
  # fun main(args) { ... }
  DeclFun.new("main", [Arg.new("args")], [
    # x = length(args);
    StmtAssign.new(
      ExprVariable.new("x"),
      ExprCall.new("length", [ExprVariable.new("args")])),
    # print({x, args});
    ExprCall.new("print", [
      ExprArrayLit.new([
        ExprVariable.new("x"),
        ExprVariable.new("args"),
      ])
    ]),
  ]),
])
```

Now that we have our AST, we'll need to process it.

In the next post, we'll start the first "real step" of compiling: laying out stack frames.