1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
|
+++
title = "The Abstract Syntax Tree — Compiler In A Day"
prev = "01-ciadlang.md"
next = "03-frame-layout.md"
+++
# The Abstract Syntax Tree
Most programming languages' syntax can be described with a tree structure.
An expression like `m.a * m.d - m.b * m.c` could be represented as:
```
╭────────── - ──────────╮
╭──── * ────╮ ╭──── * ────╮
╭─ . ─╮ ╭─ . ─╮ ╭─ . ─╮ ╭─ . ─╮
var field var field var field var field
│ │ │ │ │ │ │ │
"m" "a" "m" "d" "m" "b" "m" "c"
```
Such a tree is called an *abstract syntax tree*, or AST.
TODO: Move the CST material to the "what to do better" section?
In our compiler, the parser directly produces the AST.
In some compilers, there's an intermediate tree called the *concrete syntax tree*, or CST.
The CST is a tree that exactly corresponds to the input program, with every token present as a child.
```
╭────────────── binop ──────────────╮
╭───── binop ─────╮ │ ╭───── binop ─────╮
╭─ dot ─╮ │ ╭─ dot ─╮ │ ╭─ dot ─╮ │ ╭─ dot ─╮
var │ field │ var │ field │ var │ field │ var │ field
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │
"m" "." "a" "*" "m" "." "d" "-" "m" "." "b" "*" "m" "." "c"
```
This tree is called the *concrete syntax tree*, or CST.
It gets simplified to produce the AST.
TODO: A better example, one where there's actual meaningful simplification?
Typically, this is done so that type-checking and other error checks can be performed with as much information about the source code present as possible.
In a simple compiler like ours, though, we don't lose much quality of error reporting checking for errors on the AST instead.
Now, it's finally time to see some of the code in our compiler!
To start with, we'll just define the structure of the tree as Ruby classes.
Recall from [the syntactic grammar](./01-ciadlang.md#syntactic-grammar) that a program is a sequence of declarations.
```ruby
class Program
def initialize(decls)
@decls = decls
end
end
```
Declarations either declare a function or a variable.
Both have a name, but a function also has a list of arguments and a block of statements that comprise its body.
```ruby
class DeclFun
def initialize(name, args, body)
@name, @args, @body = name, args, body
end
end
class DeclVar
def initialize(name)
@name = name
end
end
```
We'll also define a helper class for arguments.
For now, this will just wrap the name of an argument, but later, it'll come in handy.
```ruby
class Arg
def initialize(name)
@name = name
end
end
```
There are four kinds of statements: assignment statements, if statements, return statements, and while statements.
These follow what's in the grammar.
Remember, though, that expressions and variable declarations can also show up as statements!
In a statically typed language, we would probably define `StmtExpr` and `StmtDeclVar` classes to handle them.
However, we don't need to in Ruby, since it's dynamically typed!
TODO: okay, I feel like I ought to justify this better...
But the justification involves the visitors we're going to introduce later.
```ruby
class StmtAssign
def initialize(lhs, rhs)
@lhs, @rhs = lhs, rhs
end
end
class StmtIf
attr_reader :then_stmts
attr_reader :else_stmts
def initialize(cond, then_stmts, else_stmts)
@cond = cond
@then_stmts = then_stmts
@else_stmts = else_stmts
end
end
class StmtReturn
def initialize(expr)
@expr = expr
end
end
class StmtWhile
attr_reader :body_stmts
def initialize(cond, body_stmts)
@cond, @body_stmts = cond, body_stmts
end
end
```
Finally, we have the classes for expressions.
There are 10 of these, that are exactly like what we've seen so far.
```ruby
class ExprAppend # lhs ++ rhs
def initialize(lhs, rhs)
@lhs, @rhs = lhs, rhs
end
end
class ExprArrayIndex # array[index]
def initialize(array, index)
@array, @index = array, index
end
end
class ExprArrayLit # { exprs... }
def initialize(exprs)
@exprs = exprs
end
end
class ExprCall # func(args...)
def initialize(func, args)
@func, @args = func, args
end
end
class ExprEquals # lhs == rhs
def initialize(lhs, rhs)
@lhs, @rhs = lhs, rhs
end
end
class ExprIntLit
def initialize(value)
@value = value
end
end
class ExprMinus # lhs - rhs
def initialize(lhs, rhs)
@lhs, @rhs = lhs, rhs
end
end
class ExprPlus # lhs + rhs
def initialize(lhs, rhs)
@lhs, @rhs = lhs, rhs
end
end
class ExprStringLit
def initialize(value)
@value = value
end
end
class ExprVariable
def initialize(name)
@name = name
end
end
```
Now that we have all these classes, we can build ASTs corresponding to programs.
Let's look at a simple program and determine what its AST should look like.
```ciad
var x;
fun main(args) {
x = length(args);
print({x, args});
}
```
Because our AST is so simple, it corresponds pretty directly to the program.
TODO: Is there more to say here?
```ruby
example = Program.new([
# var x;
DeclVar.new("x"),
# fun main(args) { ... }
DeclFun.new("main", [Arg.new("args")], [
# x = length(args);
StmtAssign.new(
ExprVariable.new("x"),
ExprCall.new("length", [ExprVariable.new("args")])),
# print({x, args});
ExprCall.new("print", [
ExprArrayLit.new([
ExprVariable.new("x"),
ExprVariable.new("args"),
])
]),
]),
])
```
Now that we have our AST, we'll need to process it.
In the next post, we'll start the first "real step" of compiling: laying out stack frames.
|