Writing A Compiler In Go
Thorsten Ball
Writing A Compiler In Go
Acknowledgments
I started writing this book one month after my daughter was born and finished shortly after her first birthday. Or, in other words: this book wouldnt exist without the help of my wife. While our baby grew into the wonderful girl she is now and rightfully demanded the attention she deserves, my wife always created time and room for me to write. I couldnt have written this book without her steady support and unwavering faith in me. Thank you!
Thanks to Christian for supporting me from the start again with an open ear and encouragement. Thanks to Ricardo for providing invaluable, in-depth feedback and expertise. Thanks to Yoji for his diligence and attention to detail. Thanks to all the other beta-readers for helping to make this book better!
Introduction
It might not be the most polite thing to do, but lets start with a lie: the prequel to this book, Writing An Interpreter In Go, was much more successful than I ever imagined it would be. Yes, thats a lie. Of course, I imagined its success. The name on the top of bestseller lists, me showered with praise and admiration, invited to fancy events, strangers walking up to me in the street, wanting to get their copy signed who wouldnt imagine that when writing a book about a programming language called Monkey? But, now, in all seriousness, the truth: I really didnt expect the book to be as successful as it was.
Sure, I had a feeling that some people might enjoy it. Mainly because its the book I myself wanted to read, but couldnt find. And on my fruitless search I saw other people looking for the exact same thing: a book about interpreters that is easy to understand, doesnt take shortcuts and puts runnable and tested code front and center. If I could write a book like that, I thought, there might just be a chance that others would enjoy it, too.
But enough about my imagination, heres what actually happened: readers really enjoyed what I wrote. They not only bought and read the book, but sent me emails to thank me for writing it. They wrote blog posts about how much they enjoyed it. They shared it on social networks and upvoted it. They played around with the code, tweaked it, extended it and shared it on GitHub. They even helped to fix errors in it. Imagine that! They sent me fixes for my mistakes, all the while saying sorry for finding them. Apparently, they couldnt imagine how thankful I was for every suggestion and correction.
Then, after reading one email in which a reader asked for more, something in me clicked. What lived in the back of my mind as an idea turned into an obligation: I have to write the second part. Note that I didnt just write a second part, but the second part. Thats because the first book was born out of a compromise.
When I set out to write Writing An Interpreter In Go the idea was not to follow it up with a sequel, but to only write a single book. That changed, though, when I realized that the final book would be too long. I never wanted to write something that scares people off with its size. And even if I did, completing the book would probably take so long that I would have most likely given up long before.
That led me to a compromise. Instead of writing about building a tree-walking interpreter and turning it into a virtual machine, I would only write about the tree-walking part. That turned into Writing An Interpreter In Go and what youre reading now is the sequel I have always wanted to write.
But what exactly does sequel mean here? By now you know that this book doesnt start with Decades after the events in the first book, in another galaxy, where the name Monkey has no meaning No, this book is meant to seamlessly connect to its predecessor. Its the same approach, the same programming language, the same tools and the codebase that we left at the end of the first book.
The idea is simple: we pick up where we left off and continue our work on Monkey. This is not only a successor to the previous book, but also a sequel to Monkey, the next step in its evolution. Before we can see what that looks like, though, we need look back, to refresh our memory of Monkey.
Evolving Monkey
The Past and Present
In Writing An Interpreter In Go we built an interpreter for the programming language Monkey. Monkey was invented with one purpose in mind: to be built from scratch in Writing An Interpreter In Go and by its readers. Its only official implementation is contained in Writing An Interpreter In Go, although many unofficial ones, built by readers in a variety of languages, are floating around the internet.
In case you forgot what Monkey looks like, here is a small snippet that tries to cram as much of Monkeys features into as few lines as possible:
let name = "Monkey" ; let age = ; let inspirations = [ "Scheme" , "Lisp" , "JavaScript" , "Clojure" ] ; let book = { "title" : "Writing A Compiler In Go" , "author" : "Thorsten Ball" , "prequel" : "Writing An Interpreter In Go" }; let printBookName = fn (book) { let title = book[ "title" ] ; let author = book[ "author" ] ; puts (author + " - " + title) ; }; printBookName (book) ; // => prints: "Thorsten Ball - Writing A Compiler In Go" let fibonacci = fn (x) { if (x == ) { } else { if (x == ) { return ; } else { fibonacci (x - ) + fibonacci (x - ) ; } } }; let map = fn (arr , f) { let iter = fn (arr , accumulated) { if ( len (arr) == ) { accumulated } else { iter ( rest (arr) , push (accumulated , f ( first (arr)))) ; } }; iter (arr , []) ; }; let numbers = [ , + , - , * , + , / ] ; map (numbers , fibonacci) ; // => returns: [1, 1, 2, 3, 5, 8]
Translated into a list of features, we can say that Monkey supports:
- integers
- booleans
- strings
- arrays
- hashes
- prefix-, infix- and index operators
- conditionals
- global and local bindings
- first-class functions
- return statements
- closures
Quite a list, huh? And we built all of these into our Monkey interpreter ourselves and most importantly! we built them from scratch, without the use of any third-party tools or libraries.
We started out by building the lexer that turns strings entered into the REPL into tokens. The lexer is defined in the lexer
package and the tokens it generates can be found in the token
package.
After that, we built the parser, a top-down recursive-descent parser (often called a Pratt parser) that turns the tokens into an abstract syntax tree, which is abbreviated to AST. The nodes of the AST are defined in the ast
package and the parser itself can be found in the parser
package.
After it went through the parser, a Monkey program is then represented in memory as a tree and the next step is to evaluate it. In order to do that we built an evaluator. Thats another name for a function called Eval
, defined in the evaluator
package. Eval
recursively walks down the AST and evaluates it, using the object system we defined in the object
package to produce values. It would, for example, turn an AST node representing 1 + 2
into an object.Integer{Value: 3}
. With that, the life cycle of Monkey code would be complete and the result printed to the REPL.
This chain of transformations from strings to tokens, from tokens to a tree and from a tree to object.Object
is visible from start to end in the main loop of the Monkey REPL we built:
// repl/repl.go package repl func Start(in io.Reader, out io.Writer) { scanner := bufio.NewScanner(in) env := object.NewEnvironment() for { fmt.Printf(PROMPT) scanned := scanner.Scan() if !scanned { return } line := scanner.Text() l := lexer.New(line) p := parser.New(l) program := p.ParseProgram() if len (p.Errors()) != { printParserErrors(out, p.Errors()) continue } evaluated := evaluator.Eval(program, env) if evaluated != nil { io.WriteString(out, evaluated.Inspect()) io.WriteString(out, " \n " ) } }}