summaryrefslogtreecommitdiff
path: root/Compiler in a Day/04-abi.page
diff options
context:
space:
mode:
Diffstat (limited to 'Compiler in a Day/04-abi.page')
-rw-r--r--Compiler in a Day/04-abi.page136
1 files changed, 136 insertions, 0 deletions
diff --git a/Compiler in a Day/04-abi.page b/Compiler in a Day/04-abi.page
new file mode 100644
index 0000000..ec96b40
--- /dev/null
+++ b/Compiler in a Day/04-abi.page
@@ -0,0 +1,136 @@
++++
+title = "The ABI — Compiler In A Day"
+prev = "03-frame-layout.md"
+next = "05-codegen.md"
++++
+
+# The ABI
+
+We're ready to start making the final decisions about the machine code we're going to produce.
+
+To reiterate, we're _not_ aiming to produce efficient machine code for now.
+We're just aiming to get the simplest thing that works.
+
+The important things to decide are:
+
+- How different registers are allowed to be used.
+- The way we pass arguments to and receive arguments from functions.
+- How the various data types in our language are represented.
+
+These questions (and more related ones we won't touch on) are called the *application binary interface*, or ABI.
+
+Usually, a platform has an ABI already defined and documented for the C programming language.
+This C ABI serves as the "lowest common denominator" for other languages' ABIs.
+
+Even though you can make your compiler follow whatever ABI you want, it simplifies calling into other languages (and letting them call your code) if it roughly follows the C ABI.
+
+Although we aren't providing features for users to interoperate with any other code they want, we still need to call into our runtime.
+Since our runtime will be written in C, we want to make this as painless as possible.
+
+We're producing code to run on Linux on x86_64 CPUs, so the most relevant C ABI document is the [AMD64 Architecture Processor Supplement of the System V ABI].
+
+We can copy its choices about how registers can be used, and how arguments get passed to and return values get returned from functions:
+
+| Registers | Saved by | Notes |
+| -------------------------- | -------- | --------------------- |
+| rdi, rsi, rdx, rcx, r8, r9 | Caller | Function arguments |
+| rax | Caller | Function return value |
+| r10, r11 | Caller | |
+| rbp | Callee | Frame pointer |
+| rsp | Callee | Stack pointer |
+| rbx, r12, r13, r14, r15 | Callee | |
+
+When a register is caller-saved, that means that the functions we generate are allowed to freely modify them.
+However, so is any function we call, so we can't rely on values in them staying unchanged.
+
+Conversely, a callee-saved register must be restored to its original value before we return, but we can rely on functions we call doing the same.
+
+Next, we need to figure out how we're going to represent values.
+We have three types of values: integers, strings, and arrays.
+
+Since we're implementing a dynamically typed language, any variable needs to be able to store values of any of these types.
+
+We also want values to fit in a single (64-bit) register, for simplicity and performance.
+
+Finally, we want values that contain memory addresses to be represented by those addresses, so that the garbage collector we're using can recognize them as pointers.
+
+The easiest thing to do is to simply represent all three as pointers to structures in the heap that have information about the type stored in them.
+
+For example, the array `{42, "foo"}` might be represented as the pointer `0x08001230`, pointing to data like:
+
+```
+╭────────────╮
+│ 0x08001230 │ HEAP
+╰────────────╯
+ │ │ ... │
+ ╰─> ├────────────┤
+ type tag │ ARRAY │
+ data pointer │ 0x08001250 │───╮
+ length │ 2 │ │
+ capacity │ 4 │ │
+ ├────────────┤ <─╯
+ │ 0x08001270 │─────╮
+ │ 0x08001280 │───╮ │
+ │ 0 │ │ │
+ │ 0 │ │ │
+ ├────────────┤ <───╯
+ type tag │ INTEGER │ │
+ value │ 42 │ │
+ ├────────────┤ <─╯
+ type tag │ STRING │
+ data pointer │ 0x080012a0 │───╮
+ length │ 2 │ │
+ capacity │ 8 │ │
+ ├────────────┤ <─╯
+ │ "foo", ... │
+ ├────────────┤
+ │ ... │
+```
+
+We know that `0x08001230` points to an array, since when we dereference it, we get the `ARRAY` constant.
+
+It doesn't really matter what values the `INTEGER`, `ARRAY`, and `STRING` constants have.
+For simplicity, let's make them:
+
+| Constant | Value |
+| --------- | ----- |
+| `INTEGER` | 0 |
+| `ARRAY` | 1 |
+| `STRING` | 2 |
+
+So, putting it all together, let's imagine the equivalent of this simple function in C:
+
+```
+fun f(arr, x) {
+ return arr[x] + x;
+}
+```
+
+It might look something like:
+
+```c
+enum type_tag { INTEGER, ARRAY, STRING };
+struct array { struct value** data_ptr; size_t len, cap; };
+struct string { char* data_ptr; size_t len, cap; };
+struct value {
+ enum type_tag tag;
+ union {
+ int64_t integer;
+ struct array array;
+ struct string string;
+ };
+};
+
+struct value* f(struct value* arr, struct value* x) {
+ assert(arr->tag == ARRAY);
+ assert( x->tag == INTEGER);
+ assert(x->integer < arr->array.len);
+ struct value* arr_x = arr->array.data_ptr[x->integer];
+
+ assert(arr_x->tag == INTEGER);
+ assert( x->tag == INTEGER);
+ return make_int(arr_x->integer + x->integer);
+}
+```
+
+[AMD64 Architecture Processor Supplement of the System V ABI]: https://gitlab.com/x86-psABIs/x86-64-ABI