# Overview

This is a walkthrough of a complete compiler for a simple language, written so that it can be read and understood in a single day.
To achieve this, we're going to cut a lot of corners, mostly around code generation.
The assembly we produce will run correctly, but it will be very inefficient.

Our compiler will accept a file written in our programming language and output x86_64 assembly for Linux, which can be assembled and linked with [GNU Binutils](https://www.gnu.org/software/binutils/).
The resulting programs should also run on the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/) or on FreeBSD with its [Linux ABI support](https://man.freebsd.org/cgi/man.cgi?query=linux&sektion=4&format=html).
We'll also have a small runtime, written in C, that uses [the Boehm-Demers-Weiser garbage collector](https://en.wikipedia.org/wiki/Boehm_garbage_collector).

The source code we'll show for the compiler is in Ruby, but it won't rely on anything Ruby-specific.
In fact, a previous version of this compiler was written in C11.

Our compiler will have four parts.
They are, in the order they get run:

- [Lexing](https://en.wikipedia.org/wiki/Lexical_analysis): the process of breaking up the strings of source code into lexical units known as "tokens." This simplifies parsing.
- [Parsing](https://en.wikipedia.org/wiki/Parsing): the process of building a tree representing the program from the tokens.
- Frame layout: the process of assigning slots in each function's [stack frame](https://en.wikipedia.org/wiki/Call_stack#Structure) to its local variables.
- Code generation: the process of generating actual assembly code from the program.
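
To make two of these stages concrete before we get to the real thing, here is a minimal sketch of lexing and frame layout for a toy input. Everything in it is an assumption made for illustration: the `Token` struct, the token kinds, the `lex` and `layout_frame` helpers, and the choice of 8-byte slots at negative offsets from the frame pointer are hypothetical, and they are not the definitions the compiler in this walkthrough uses.

```ruby
# Hypothetical toy example: names and token kinds are illustrative assumptions.
Token = Struct.new(:kind, :text)

# Lexing: break a source string into tokens, skipping whitespace.
def lex(src)
  tokens = []
  pos = 0
  while pos < src.length
    rest = src[pos..-1]
    case rest
    when /\A\s+/
      # Whitespace separates tokens but produces none.
    when /\A\d+/
      tokens << Token.new(:int, $&)
    when /\A[A-Za-z_]\w*/
      tokens << Token.new(:ident, $&)
    when /\A[-+*\/=()]/
      tokens << Token.new(:punct, $&)
    else
      raise "unexpected character #{rest[0].inspect} at offset #{pos}"
    end
    # $& holds the text matched by whichever `when` branch succeeded.
    pos += $&.length
  end
  tokens
end

# Frame layout: give each local variable its own 8-byte slot, expressed as
# an offset from the frame pointer (an assumed convention for this sketch).
def layout_frame(locals)
  locals.each_with_index.map { |name, i| [name, -8 * (i + 1)] }.to_h
end

p lex("x = 1 + 2").map(&:text)  # prints ["x", "=", "1", "+", "2"]
p layout_frame(["x", "y"])      # maps "x" to -8 and "y" to -16
```

The parser would sit between these two steps, turning the token list into a tree, and code generation would then use the offsets from the frame layout to read and write each variable.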

TODO: pictures!

Before we can start looking at these steps, however, we should look at the language we'll be compiling.