summaryrefslogtreecommitdiff
path: root/Compiler in a Day.page
diff options
context:
space:
mode:
Diffstat (limited to 'Compiler in a Day.page')
-rw-r--r--Compiler in a Day.page68
1 files changed, 33 insertions, 35 deletions
diff --git a/Compiler in a Day.page b/Compiler in a Day.page
index a732d4d..be8c815 100644
--- a/Compiler in a Day.page
+++ b/Compiler in a Day.page
@@ -1,15 +1,18 @@
# Overview
-This is intended to be a walkthrough of a complete compiler for a simple language that can be read and understood in a single day.
+This is intended to be a walkthrough of a complete compiler for a simple language.
+Ideally, this walkthrough can be read and understood in a single day.
In order to achieve this, we're going to be cutting a lot of corners, mostly around code generation.
The assembly we'll be producing will run correctly, but it will be very inefficient.
-Our compiler will accept a file written in our programming language and output x86_64 assembly, which can be assembled and linked by [GNU Binutils](https://www.gnu.org/software/binutils/), intended to be run on Linux.
+Our compiler will accept a file written in our programming language and output x86_64 assembly.
+This assembly code can be assembled and linked by [GNU Binutils](https://www.gnu.org/software/binutils/) and run on Linux.
It should also run on the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/) or on FreeBSD with its [Linux ABI support](https://man.freebsd.org/cgi/man.cgi?query=linux&sektion=4&format=html).
-We'll also have a small runtime, written in C, and using [the Boehm-Demers-Weiser garbage collector](https://en.wikipedia.org/wiki/Boehm_garbage_collector).
-The source code we'll show for the compiler is in Ruby, but nothing Ruby-specific will be used.
-In fact, a previous version of this compiler was written in C11.
+The source code we'll show for the compiler is in Python, but nothing Python-specific will be used.
+In fact, previous versions of this compiler were written in C and in Ruby.
+
+We'll also have a small runtime written in C.
Our compiler will have four parts.
They are, in the order they get run:
@@ -51,7 +54,6 @@ TODO: Move the CST material to the "what to do better" section?
In our compiler, the parser directly produces the AST.
In some compilers, there's an intermediate tree called the *concrete syntax tree*, or CST.
-
The CST is a tree that exactly corresponds to the input program, with every token present as a child.
```
@@ -69,38 +71,34 @@ It gets simplified to produce the AST.
TODO: A better example, one where there's actual meaningful simplification?
Typically, this is done so that type-checking and other error checks can be performed with as much information about the source code present as possible.
-
-In a simple compiler like ours, though, we don't lose much quality of error reporting checking for errors on the AST instead.
+We're trying to do the simplest compiler possible, though, so we check for errors on the AST instead.
Now, it's finally time to see some of the code in our compiler!
+To start with, we'll just define the structure of the tree as Python classes.
-To start with, we'll just define the structure of the tree as Ruby classes.
+Recall from the grammar above that a program is a sequence of declarations.
-Recall from [the syntactic grammar](./01-ciadlang.md#syntactic-grammar) that a program is a sequence of declarations.
+TODO: linkify
-```ruby
-class Program
- def initialize(decls)
- @decls = decls
- end
-end
+```python
+@dataclass
+class Program:
+ decls: list[Decl]
```
Declarations either declare a function or a variable.
Both have a name, but a function also has a list of arguments and a block of statements that comprise its body.
-```ruby
-class DeclFun
- def initialize(name, args, body)
- @name, @args, @body = name, args, body
- end
-end
+```python
+class DeclFun:
+ name: str
+ args: list[str]
+ body: list[Stmt]
-class DeclVar
- def initialize(name)
- @name = name
- end
-end
+class DeclVar:
+ name: str
+
+Decl = DeclFun | DeclVar
```
We'll also define a helper class for arguments.
@@ -452,14 +450,14 @@ We're producing code to run on Linux on x86_64 CPUs, so the most relevant C ABI
We can copy its choices about how registers can be used, and how arguments get passed to and return values get returned from functions:
-| Registers | Saved by | Notes |
-| -------------------------- | -------- | --------------------- |
-| rdi, rsi, rdx, rcx, r8, r9 | Caller | Function arguments |
-| rax | Caller | Function return value |
-| r10, r11 | Caller | |
-| rbp | Callee | Frame pointer |
-| rsp | Callee | Stack pointer |
-| rbx, r12, r13, r14, r15 | Callee | |
+| Registers | Saved by | Notes |
+| -------------------------------------- | -------- | --------------------- |
+| `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` | Caller | Function arguments |
+| `rax` | Caller | Function return value |
+| `r10`, `r11` | Caller | |
+| `rbp` | Callee | Frame pointer |
+| `rsp` | Callee | Stack pointer |
+| `rbx`, `r12`, `r13`, `r14`, `r15` | Callee | |
When a register is caller-saved, that means that the functions we generate are allowed to freely modify them.
However, so is any function we call, so we can't rely on values in them staying unchanged.