diff options
-rw-r--r-- | Compiler in a Day.page | 68 |
1 files changed, 33 insertions, 35 deletions
diff --git a/Compiler in a Day.page b/Compiler in a Day.page index a732d4d..be8c815 100644 --- a/Compiler in a Day.page +++ b/Compiler in a Day.page @@ -1,15 +1,18 @@ # Overview -This is intended to be a walkthrough of a complete compiler for a simple language that can be read and understood in a single day. +This is intended to be a walkthrough of a complete compiler for a simple language. +Ideally, this walkthrough can be read and understood in a single day. In order to achieve this, we're going to be cutting a lot of corners, mostly around code generation. The assembly we'll be producing will run correctly, but it will be very inefficient. -Our compiler will accept a file written in our programming language and output x86_64 assembly, which can be assembled and linked by [GNU Binutils](https://www.gnu.org/software/binutils/), intended to be run on Linux. +Our compiler will accept a file written in our programming language and output x86_64 assembly. +This assembly code can be assembled and linked by [GNU Binutils](https://www.gnu.org/software/binutils/) and run on Linux. It should also run on the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/) or on FreeBSD with its [Linux ABI support](https://man.freebsd.org/cgi/man.cgi?query=linux&sektion=4&format=html). -We'll also have a small runtime, written in C, and using [the Boehm-Demers-Weiser garbage collector](https://en.wikipedia.org/wiki/Boehm_garbage_collector). -The source code we'll show for the compiler is in Ruby, but nothing Ruby-specific will be used. -In fact, a previous version of this compiler was written in C11. +The source code we'll show for the compiler is in Python, but nothing Python-specific will be used. +In fact, previous versions of this compiler were written in C and in Ruby. + +We'll also have a small runtime written in C. Our compiler will have four parts. They are, in the order they get run: @@ -51,7 +54,6 @@ TODO: Move the CST material to the "what to do better" section? In our compiler, the parser directly produces the AST. In some compilers, there's an intermediate tree called the *concrete syntax tree*, or CST. - The CST is a tree that exactly corresponds to the input program, with every token present as a child. ``` @@ -69,38 +71,34 @@ It gets simplified to produce the AST. TODO: A better example, one where there's actual meaningful simplification? Typically, this is done so that type-checking and other error checks can be performed with as much information about the source code present as possible. - -In a simple compiler like ours, though, we don't lose much quality of error reporting checking for errors on the AST instead. +We're trying to do the simplest compiler possible, though, so we check for errors on the AST instead. Now, it's finally time to see some of the code in our compiler! +To start with, we'll just define the structure of the tree as Python classes. -To start with, we'll just define the structure of the tree as Ruby classes. +Recall from the grammar above that a program is a sequence of declarations. -Recall from [the syntactic grammar](./01-ciadlang.md#syntactic-grammar) that a program is a sequence of declarations. +TODO: linkify -```ruby -class Program - def initialize(decls) - @decls = decls - end -end +```python +@dataclass +class Program: + decls: list[Decl] ``` Declarations either declare a function or a variable. Both have a name, but a function also has a list of arguments and a block of statements that comprise its body. -```ruby -class DeclFun - def initialize(name, args, body) - @name, @args, @body = name, args, body - end -end +```python +class DeclFun: + name: str + args: list[str] + body: list[Stmt] -class DeclVar - def initialize(name) - @name = name - end -end +class DeclVar: + name: str + +Decl = DeclFun | DeclVar ``` We'll also define a helper class for arguments. @@ -452,14 +450,14 @@ We're producing code to run on Linux on x86_64 CPUs, so the most relevant C ABI We can copy its choices about how registers can be used, and how arguments get passed to and return values get returned from functions: -| Registers | Saved by | Notes | -| -------------------------- | -------- | --------------------- | -| rdi, rsi, rdx, rcx, r8, r9 | Caller | Function arguments | -| rax | Caller | Function return value | -| r10, r11 | Caller | | -| rbp | Callee | Frame pointer | -| rsp | Callee | Stack pointer | -| rbx, r12, r13, r14, r15 | Callee | | +| Registers | Saved by | Notes | +| -------------------------------------- | -------- | --------------------- | +| `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` | Caller | Function arguments | +| `rax` | Caller | Function return value | +| `r10`, `r11` | Caller | | +| `rbp` | Callee | Frame pointer | +| `rsp` | Callee | Stack pointer | +| `rbx`, `r12`, `r13`, `r14`, `r15` | Callee | | When a register is caller-saved, that means that the functions we generate are allowed to freely modify them. However, so is any function we call, so we can't rely on values in them staying unchanged. |