Based on the original series “Let’s Build a Compiler!” by Jack Crenshaw.
Introduction
As we all know, a compiler parses and translates an input language (in our case numerical expressions and assignments so far) to code that is executed on a target CPU or virtual machine. An interpreter on the other hand, generates no output code. Instead, the arithmetic is computed immediately as the parsing is going on.
As Jack says, our compiler is what is called a “pure” compiler, i.e. every time a construct is recognised, the code is produced immediately. And this is the reason for the not so efficient code. At the other end of the spectrum, a “pure” interpreter will do no translation on the source code. And these are the two “extremes” of translation. I’m sure your gut feel is telling you that in the real world it might be beneficial for a compiler to user techniques used in interpreters and vice-versa.
We know for a fact, that the Microsoft Basic interpreter translated the source code to some intermediate code (tokenizing) to make it easier to parse it in real time. Also an Assembler, looks for constant expressions, calculates them and uses the result instead of producing object code for them.
Let’s think of the below expression:
x = x + 3 – 2 – ( 5 – 4 )
Run it as input to our compiler. It produces 18 lines of object code! But if you look at it better and do the maths, you will see that it translates to
x = x + 0
or
x = x
which effectively tells us to do nothing at all.
This is the concept of “lazy” translation, which means that you don’t produce code at every action. You only produce code when you have to, and in some cases like above, you don’t produce code at all. Obviously, to implement this concept every parsing routine would have to make decisions whether to produce code or not and this would complicate these routines significantly. So, clearly, this technique is not named “lazy” because it’s easier on the compiler programmer…
Now that we have set the scene, I will present the Kotlin version of Jack’s interpreter.
The Interpreter
Please open a new project in your IDE and copy the InputProgramScanner.kt
file as it evolved at the end of chapter 3 and the basic CompilerMain.kt
from chapter 1 (which I have renamed to InterpreterMain.kt
) to start with. Don’t forget to remove the call to getNextChar
from the main program, as this has now been moved to the Init
code of class InputProgramScanner
. Also in CompilerMain
please rename the function initCompiler
to initInterpreter
.
I’m using the InputProgramScanner from chapter 3 as it already supports white spaces and multi-character identifiers and this makes it a bit more user-friendly.
Given that we will be performing arithmetic, the first thing we need to do is to change the getNumber
function to return an integer by changing the declaration and by modifying the return line at the end:
/**
* get a number
* <number> ::= [ <digit> ] +
*/
fun getNumber(): Int {
var value: String = ""
if (!isNumeric(nextChar)) {
expected("Number")
} else {
while (isNumeric(nextChar)) {
value += nextChar.toString()
skipNextChar()
}
skipWhite()
}
return value.toInt()
}
And let’s have a simplified version of parseExpression
, which will also have to return an integer:
/** parse a numeric expression */
fun parseExpression(): Int {
return inp.getNumber()
}
To make this simple interpreter work, add this line in the main
function:
/** main function */ fun main(args: Array<String>) { initInterpreter(args) println(parseExpression()) }
Compile this and run with an input program consisting of one integer; it will recognise and print that integer.
Once you have this simple version working, let’s move on and extend it with addition and subtraction. Here’s the new parseExpression
:
/** parse a numeric expression */ fun parseExpression(): Int { var value: Int // provision for -Expression and +Expression if (inp.isAddop(inp.nextChar)) value = 0 else value = inp.getNumber() while (inp.isAddop(inp.nextChar)) { when (inp.nextChar) { addOp -> { inp.getNextChar() value += inp.getNumber() } subOp -> { inp.getNextChar() value -= inp.getNumber() } } } return value }
This is of course very similar to the parseExpression
from chapter 3, with one little difference. Given that we need both sides of the operation to do the the arithmetic, it’s simpler to just execute the addition or subtraction in-line in this function than define separate add
and subtract
functions as we did in the previous chapters. If we had chosen to keep them as separate functions, we would have to pass the current value of value
to them and they would have to return the result back to the caller, which would add unnecessary complexity. This also shows that the way the compiler has been written in the first 3 chapters is not suitable for lazy evaluation.
Compile it and make sure it works. It can process any expression of integers that includes any number of additions and/or subtractions.
I’m sure you will agree that it will be very easy to add support for multiplication and division. Just add the below parseTerm
function and replace every call to getNumber
in parseExpression
with a call to parseTerm
:
/** parse e term */
fun parseTerm(): Int {
var value: Int = inp.getNumber()
while (inp.isMulop(inp.nextChar)) {
when (inp.nextChar) {
mulOp -> {
inp.getNextChar()
value *= inp.getNumber()
}
divOp -> {
inp.getNextChar()
value /= inp.getNumber()
}
}
}
return value
}
As usual compile it and run it with any combination of integer additions, subtractions, multiplications and divisions that you can think of, including erroneous input.
And to finish this first part of the interpreter we can add a new function parseFactor
and replace all the calls to getNumber
in parseTerm
with calls to parseFactor
. This will enable us to process parenthesised expressions:
/** parse a factor */
fun parseFactor(): Int {
val value: Int
if (inp.isLeftParen(inp.nextChar)) {
inp.getNextChar()
value = parseExpression()
inp.match(rightParen)
}
else
value = inp.getNumber()
return value
}
Please compile it and try it out, make sure it handle any complex parenthesised expression and try to break it by giving it wrong input.
At this stage Jack talks about “a little philosophy” where he is giving us his thoughts about using the stack in expression parsing and how we are getting this for free here by using recursion in a high-level language like Kotlin (or Pascal which is what he used). I would invite you to read the original.
Using Variables
When we introduced variables in our compiler, it was a very simple matter as we just passed the variable name to the Assembler to deal with it. Here it’s slightly more complicated as we have to find a place to store the name of each variable and it’s value. Those of you who know Java, must have guessed that a good solution is a Map with the variable name as the key and the variable’s value as the data. So, here goes:
- Add the Map that will hold all our variables:
// the variables' space
var variable: MutableMap<String, Int> = mutableMapOf()
- Modify parseFactor to use the values stored in this map when a variable is referenced:
/** parse a factor */ fun parseFactor(): Int { var value: Int = 0 if (inp.isLeftParen(inp.nextChar)) { inp.getNextChar() value = parseExpression() inp.match(rightParen) } else if (inp.isAlpha(inp.nextChar)) { val varName = inp.getName() val temp: Int? = variable[varName] if (temp == null) abort("variable ${varName} not initialised") else value = temp } else value = inp.getNumber() return value }
- Add a parseAssignment function to set a variable to a value:
/** parse an assignment */ fun parseAssignment(): Int { val identifier = inp.getName() inp.match(equalsOp) variable[identifier] = parseExpression() // return value used for testing purposes only return variable[identifier] ?: 0 }
You can only partly verify parseFactor
as it will now fail in the first mention of any variable (as it’s unassigned). You can test parseAssignement
if you change the main function to print the return value of parseAssignemt
and run the interpreter with an assignment statement using only constant values.
And this brings us naturally to processing of newlines. This will enable us to feed a multi-line program to our interpreter and verify the result of each line. All we need to do is to add a function that will skip any newline character and change the main function to parse assignments in a loop, while skipping new lines (please remember also to skip any newlines in the initialisation code in InputProgramScanner.kt
to get rid of any empty lines in the beginning of your program):
/** skip newline */
fun skipNewline() {
while (isEndOfLine(nextChar))
getNextChar()
}
and
/** main function */ fun main(args: Array<String>) { initInterpreter(args) while (true) { println(parseAssignment()) inp.skipNewline() } }
As you can see, in this test version, main
is an endless loop, so how do we get out of it? In my code the loop terminates via the “end of input” test in getNextChar
. Jack in his code had used a program terminator character ('.'
) and broke the loop when he saw that char.
A note on Kotlin:
Those of you who know Java would know that you can update a map or retrieve a value from a map using:
map.put(key, value)
and
value = map.get(key
)
As you can see above we can now do in Kotlin
map[key] = value
and value = map[key]
…and those of you who happen to know Pascal, may have noticed the similarity with Jack’s Turbo Pascal code:
Table: Array['A'..'Z'] of integer;
...
Table['A'] := 25;
...
Value := Table['A'];
Also, Kotlin, unlike Java, does not normally allow null
values. One of Kotlin’s aims was to eliminate the NullPointerException
that has caused so many program crashes and so much pain to so many of us. If however a variable may have a null
value for a good reason (as in the return value from the map in case the variable name has not been saved before) then it has to be declared as above Int?
as opposed to Int
. You can read more about Null Safety in Kotlin here.
With these powerful features in mind, we can rewrite parseFactor
and make it a bit more compact using the Elvis Operator:
/** parse a factor */ fun parseFactor(): Int { val value: Int if (inp.isLeftParen(inp.nextChar)) { inp.getNextChar() value = parseExpression() inp.match(rightParen) } else if (inp.isAlpha(inp.nextChar)) { val varName = inp.getName() value = variable[varName] ?: abort("variable ${varName} not initialised") } else value = inp.getNumber() return value }
In order to use the abort
function at the right-hand side of the Elvis operator I have used the trick to declare it as Int
. The Kotlin compiler does not complain about the fact that abort
does not return any value, because the last statement in it is exitProcess
(so no one would care about whatever value it returned).
/** abort compilation */ fun abort(errMsg: String): Int { error(errMsg) exitProcess(1) }
Input and Output
To make our interpreter a bit more interactive, Jack concluded his original chapter 4 by adding input and output functions. The token '?'
is the “read” statement and the token '!'
is the “write” statement. Below are the respective functions:
/** input function */ fun input() { inp.getNextChar() val varName = inp.getName() println("enter value for [${varName}]") val s: String? = readLine() variable[varName] = s?.toInt() ?: 0 } /** output function */ fun output() { inp.getNextChar() val varName = inp.getName() println("value of [${varName}] = ${variable[varName] ?: abort("variable ${varName} not initialised")}") }
I could not resist the “temptation” to make them a tiny bit more complex than in Jack’s version, so I have added prompts to tell us which variable we are inputting the value for, or seeing the value of. Also, please note the safeguards against null values and the use of the Elvis operator (in case it has gone unnoticed, you can read more about Null Safety in Kotlin here).
And here’s the loop in the main function, which now becomes:
/** main function */ fun main(args: Array<String>) { initInterpreter(args) while (true) { when (inp.nextChar) { inToken -> input(); outToken -> output() else -> parseAssignment() } inp.skipNewline() } }
We also need to add to our global definitions:
// input / output
val inToken = '?'
val outToken = '!'
Try it out. Use for example the below as your input program:
? BaseLength
? BaseWidth
BaseArea = BaseLength * BaseWidth
! BaseArea
? Height
Volume = BaseArea * Height
! Volume
Wow! We now have a fully functional calculator that can do addition, subtraction, multiplication and division, and process a series of any kind of complex numerical expressions including parenthesis. It also has a practically infinite number of “memory” locations where we can store anything we want during our calculation. Feel free to add real numbers, power of, square root, logarithms, sine, cosine, etc… and you have a proper scientific calculator. Those of you inclined towards mobile apps, can take this code, add a proper graphical user interface and create your own calculator app for Android and put it up in the AppStore (not for profit of course). If you would like to do this, please ensure you contact me first, and also mention my name and of course Jack’s name in your credits.
Coming up next: Control Structures
Below is the full code for this version of our Interpreter:
InterpreterMain.kt
package you.set.this import kotlin.system.exitProcess /** * A Simple Interpreter * Based on the Let's Build a Compiler! series by Jack Crenshaw * Version 1.0 21.10.2021 */ // the input program scanner lateinit var inp: InputProgramScanner /** report an error */ fun error(errMsg: String) { System.err.println("Error: $errMsg") } /** abort compilation */ fun abort(errMsg: String): Int { error(errMsg) exitProcess(1) } /** print message and exit */ fun exit(msg: String) { System.err.println(msg) exitProcess(0) } /** report what was expected and abort */ fun expected(expMsg: String) = abort("Expected $expMsg \n found [${inp.nextChar}]") /** compiler initialisation */ fun initInterpreter(args: Array<String>) { if (args.isEmpty()) abort("no input file specified") // get a new input program scanner object - initialise from input file inp = InputProgramScanner(args[0]) } /** main function */ fun main(args: Array<String>) { initInterpreter(args) while (true) { when (inp.nextChar) { inToken -> input(); outToken -> output() else -> parseAssignment() } inp.skipNewline() } }
InputProgramScanner.kt
package you.set.this import java.io.File /** * The input program scanner class * Performs the lexical scanner functions * Processes the char-by-char input and returns the tokens from the input stream */ ///////////////////////////////////////////////////////// // global special characters definitions ///////////////////////////////////////////////////////// // numeric operators val addOp: Char = '+' val subOp: Char = '-' val mulOp: Char = '*' val divOp: Char = '/' // parentheses val leftParen: Char = '(' val rightParen: Char = ')' // equals val equalsOp = '=' // input / output val inToken = '?' val outToken = '!' ///////////////////////////////////////////////////////// /** this class implements the lexical scanner */ class InputProgramScanner(inputFile: String = "") { // the input program as string var inputProgram: String = "" var indx = 0 // the next character from input // this is our lookahead character var nextChar: Char = ' ' /** initialisation code */ init { try { val f = File(inputFile) // read the whole program into a string // add a dummy \n at the end // this way the lookahead will work properly when we reach the end of input inputProgram = f.readText() + "\n" // set the lookahead character to the first input char getNextChar() skipNewline() } catch (e: Exception) { abort("could not open file [$inputFile]") } } /** get the next character from input after skipping white spaces */ fun getNextChar() { skipNextChar() skipWhite() } /** set the lookahead character to the next char from input */ fun skipNextChar() { if (indx < inputProgram.length) nextChar = inputProgram[indx++] else exit("end of input") } /** skip white spaces */ fun skipWhite() { while (isWhite(nextChar)) skipNextChar() } /** skip newline */ fun skipNewline() { while (isEndOfLine(nextChar)) getNextChar() } /** match a specific input char */ fun match(x: Char) { // return a match when the next char matches x if (nextChar == x) getNextChar() else expected("'$x'") } /** * get an identifier * <identifier> ::= <alpha> [ <alphanumeric> | <_> ] * */ fun getName(): String { var token: String = "" if (!isAlpha(nextChar)) expected("Identifier") else { while (isAlphanumeric(nextChar)) { token += nextChar.uppercase() skipNextChar() } skipWhite() } return token } /** * get a number * <number> ::= [ <digit> ] + */ fun getNumber(): Int { var value: String = "" if (!isNumeric(nextChar)) { expected("Number") } else { while (isNumeric(nextChar)) { value += nextChar.toString() skipNextChar() } skipWhite() } return value.toInt() } /** check for an alpha char */ fun isAlpha(c: Char): Boolean = c.uppercaseChar() in 'A'..'Z' /** check for a numeric char */ fun isNumeric(c: Char): Boolean = c in '0'..'9' /** check for alphanumeric */ fun isAlphanumeric(c: Char): Boolean = isAlpha(c) || isNumeric(c) || c == '_' /** check for an "addop" (+,-) */ fun isAddop(c: Char): Boolean = c == addOp || c == subOp /** check for a "mulop" (*,/) */ fun isMulop(c: Char): Boolean = c == mulOp || c == divOp /** check for left parenthesis */ fun isLeftParen(c: Char): Boolean = c == leftParen /** check for a white space */ fun isWhite(c: Char): Boolean = c == ' ' || c == '\t' /** check for end of line */ fun isEndOfLine(c: Char): Boolean = c == '\n' || c == '\r' }
ProgramParser.kt
package you.set.this // the variables' space var variable: MutableMap<String, Int> = mutableMapOf() /** parse an assignment */ fun parseAssignment(): Int { val identifier = inp.getName() inp.match(equalsOp) variable[identifier] = parseExpression() // return value used for testing purposes only return variable[identifier] ?: 0 } /** parse a numeric expression */ fun parseExpression(): Int { var value: Int // provision for -Expression and +Expression if (inp.isAddop(inp.nextChar)) value = 0 else value = parseTerm() while (inp.isAddop(inp.nextChar)) { when (inp.nextChar) { addOp -> { inp.getNextChar() value += parseTerm() } subOp -> { inp.getNextChar() value -= parseTerm() } } } return value } /** parse e term */ fun parseTerm(): Int { var value: Int = parseFactor() while (inp.isMulop(inp.nextChar)) { when (inp.nextChar) { mulOp -> { inp.getNextChar() value *= parseFactor() } divOp -> { inp.getNextChar() value /= parseFactor() } } } return value } /** parse a factor */ fun parseFactor(): Int { val value: Int if (inp.isLeftParen(inp.nextChar)) { inp.getNextChar() value = parseExpression() inp.match(rightParen) } else if (inp.isAlpha(inp.nextChar)) { val varName = inp.getName() value = variable[varName] ?: abort("variable ${varName} not initialised") } else value = inp.getNumber() return value } /** input function */ fun input() { inp.getNextChar() val varName = inp.getName() println("enter value for [${varName}]") val s: String? = readLine() variable[varName] = s?.toInt() ?: 0 } /** output function */ fun output() { inp.getNextChar() val varName = inp.getName() println("value of [${varName}] = ${variable[varName] ?: abort("variable ${varName} not initialised")}") }