Based on the original series “Let’s Build a Compiler!” by Jack Crenshaw.
Introduction
In the previous chapter I introduced functions with parameters, which made TINSEL a much more mature programming language. In this chapter I will go one step further and introduce local variables.
Until now, the only variables TINSEL recognises are global variables, which must be declared immediately after the program <name> line using the var keyword. In this chapter I will introduce local variables, i.e. variables with a limited scope: the block of code in which they have been declared.
Local Variables Declaration and Scope
In TINSEL I will let a local variable be declared anywhere in a block of code. By block of code I mean any part of code surrounded by the block_start and block_end delimiters { }. This could be the body of a function, an if-block, a while-block, or any other block of code surrounded by the block delimiters.
Local variables will be declared in exactly the same way as global variables:
var i: int
var a: string(20)
Their scope will be the block they have been declared in, i.e. they will be recognised from the line where they are declared up to the closing block delimiter.
Unlike global variables, which are instantiated at compile time and live in the .data section of the assembly output, local variables will be instantiated at run time and will live on the stack.
Our compiler already has all the tools needed for this, so implementing local variables should be a relatively easy job.
Declaring Local Variables
Starting top-down, we want to be able to declare a local variable anywhere in a block. This means that we need to recognise and process the var keyword as part of the processing of the block. A good place to do this is parseStatement, which is the heart of the parseBlock function. Apart from the new blockName parameter, we only need to add one line:
fun parseStatement(breakLabel: String, continueLabel: String, blockName: String) {
    when (inp.lookahead().encToken) {
        Kwd.varDecl -> parseLocalVars(blockName)
        Kwd.startBlock -> parseBlock(breakLabel, continueLabel)
        Kwd.ifToken -> parseIf(breakLabel, continueLabel)
        ...
This will recognise a local var declaration anywhere in the block. Let’s see what it does:
fun parseLocalVars(blockName: String) {
    parseVarDecl(VarScope.local, blockName)
}
Very simple: it calls the existing parseVarDecl function (which until now processed only global vars) with two parameters: the flag that tells it to process a local variable, and the name of the block (more about this further down). The only change required in parseVarDecl is to pass these parameters on to parseOneVarDecl. Thanks to the default parameter values, a call without arguments still processes global vars as before:
fun parseVarDecl(scope: VarScope = VarScope.global, blockName: String = "") {
    while (inp.lookahead().encToken == Kwd.varDecl) {
        do {
            inp.match()
            parseOneVarDecl(scope, blockName)
        } while (inp.lookahead().encToken == Kwd.commaToken)
    }
}
and this is where we will start doing some real work:
fun parseOneVarDecl(scope: VarScope, blockName: String) {
    val varName = inp.match(Kwd.identifier).value
    inp.match(Kwd.colonToken)
    when (inp.lookahead().encToken) {
        Kwd.intType -> parseOneIntDecl(varName, scope)
        Kwd.stringType -> parseOneStringDecl(varName, scope)
        else -> inp.expected("variable type (int or string)")
    }
    if (scope == VarScope.local) {
        // add any local vars to the local vars map for this block
        val localVarsList: MutableList<String> = localVarsMap[blockName] ?: mutableListOf()
        localVarsList.add(varName)
        localVarsMap[blockName] = localVarsList
    }
}
First, the scope is passed on to parseOneIntDecl and parseOneStringDecl. In addition, the new section at the end of the function records all the local variables declared in each block in localVarsMap. The key to this map is the name of the block, which the compiler generates automatically at the beginning of each block:
fun parseBlock(breakLabel: String = "", continueLabel: String = "") {
    inp.match(Kwd.startBlock)
    mustRestoreSP = true
    // blockName is used as key to the local vars map for this block
    val blockName = "$BLOCK_NAME${blockId++}"
    while (inp.lookahead().type != TokType.endOfBlock && !inp.isEndOfProgram()) {
        parseStatement(breakLabel, continueLabel, blockName)
    }
    releaseLocalVars(blockName, mustRestoreSP)
    inp.match(Kwd.endBlock)
}
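The per-block bookkeeping in parseOneVarDecl and parseBlock can be modelled in a few lines. The sketch below is a hypothetical, self-contained stand-in. The class name LocalVarRegistry and the "blk" prefix are made up, and the real value of BLOCK_NAME is not shown in this chapter. It uses Kotlin's getOrPut, which does the same job as the `?: mutableListOf()` and put sequence in parseOneVarDecl.

```kotlin
// Hypothetical, self-contained model of the per-block local vars bookkeeping.
class LocalVarRegistry(private val prefix: String = "blk") {
    private var blockId = 0

    // block name -> names of the local vars declared in that block
    val localVarsMap = mutableMapOf<String, MutableList<String>>()

    // Unique key per block, in the spirit of "$BLOCK_NAME${blockId++}".
    fun newBlockName(): String = "$prefix${blockId++}"

    // getOrPut creates the list the first time a block declares a variable,
    // which is equivalent to the "?: mutableListOf()" + put sequence.
    fun recordLocalVar(blockName: String, varName: String) {
        localVarsMap.getOrPut(blockName) { mutableListOf() }.add(varName)
    }

    // At block end: hand back the names declared in the block and forget it.
    fun endBlock(blockName: String): List<String> =
        localVarsMap.remove(blockName) ?: emptyList()
}
```

Unlike releaseLocalVars, endBlock also drops the block's entry from the map. This is optional, because block names are never reused.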
Going back to parseOneIntDecl and parseOneStringDecl, they just pass the scope on to declareVar:
fun parseOneIntDecl(varName: String, scope: VarScope) {
    ...
    declareVar(varName, DataType.int, initValue, INT_SIZE, scope)
}
After checking that the name has not already been declared, declareVar simply checks the scope and calls one of two functions: declareGlobalVar (which is our old declareVar) or declareLocalVar, which is where the work to declare the local variables is done:
fun declareVar(name: String, type: DataType, initValue: String, length: Int, scope: VarScope) {
    // check for duplicate var declaration
    if (identifiersMap[name] != null)
        abort ("line ${inp.currentLineNumber}: identifier $name already declared")
    when (scope) {
        VarScope.global -> declareGlobalVar(name, type, initValue, length)
        VarScope.local -> declareLocalVar(name, type, initValue, length)
    }
}
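The duplicate check in declareVar has a consequence worth spelling out. A local stays in identifiersMap until its block ends. As a result, a local cannot shadow a global or a local of an enclosing block, but two sibling blocks can reuse the same name. The hypothetical SymbolTable below reproduces just that rule. It uses Kotlin's check, which throws IllegalStateException, in place of the compiler's abort.

```kotlin
enum class VarScope { global, local }

// Hypothetical mini symbol table mirroring the duplicate check in declareVar.
class SymbolTable {
    private val identifiers = mutableMapOf<String, VarScope>()

    fun declare(name: String, scope: VarScope) {
        // Same rule as declareVar: any name already in the map is rejected,
        // whether it is a global or a local of an enclosing block.
        check(identifiers[name] == null) { "identifier $name already declared" }
        identifiers[name] = scope
    }

    // What releaseLocalVars does to identifiersMap at the end of a block.
    fun release(names: List<String>) = names.forEach { identifiers.remove(it) }

    fun isVisible(name: String) = name in identifiers
}
```

For example, declaring i in an outer block and again in a nested block fails. Declaring it in a second block, after the first one has released it, succeeds.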
So far, apart from introducing the block name, the scope and the local vars map, we have made only minimal changes to the compiler code. Here is declareLocalVar itself:
fun declareLocalVar(name: String, type: DataType, initValue: String, length: Int) {
    val stackOffset: Int
    val lengthRoundedTo64bits = (length / 8 + 1) * 8
    when (type) {
        DataType.int -> {
            stackOffset = code.allocateStackVar(INT_SIZE)
            initLocalIntVar(stackOffset, initValue)
        }
        DataType.string -> {
            stackOffset = code.allocateStackVar(STRPTR_SIZE)
            initLocalStringVar(name, stackOffset, initValue, lengthRoundedTo64bits)
        }
        else -> return
    }
    identifiersMap[name] = IdentifierDecl(
        TokType.variable, type, initialised = true, size = lengthRoundedTo64bits,
        isStackVar = true, stackOffset = stackOffset
    )
}
As you can see above, declareLocalVar checks the type of the variable. It first calls the function that allocates space on the stack for an integer or a string pointer (both of which happen to be 8 bytes, by the way), and then initialises that stack variable. Finally, it adds the local variable to identifiersMap, so that all the details are at hand when this variable is referenced. Note that at the beginning of this function the length of the local variable is rounded up to the next multiple of 8 above it, so that stack space is always allocated in whole 8-byte units.
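The rounding expression is worth a closer look, because it is not a plain "round up". A length that is already a multiple of 8 gains a whole extra 8 bytes. In effect, the expression rounds length + 1 up to a multiple of 8, which always leaves room for at least one byte beyond the declared length. Presumably this byte is for a string terminator, given that the generated code copies strings with strcpy_, but that reading is an assumption. The sketch checks that the two forms agree. The function names are made up.

```kotlin
// The expression used in declareLocalVar.
fun roundTo64Bits(length: Int): Int = (length / 8 + 1) * 8

// The same value, written as "round length + 1 up to a multiple of 8".
fun roundUpWithSpareByte(length: Int): Int = (length + 1 + 7) / 8 * 8
```

So a string(20) gets a 24-byte data area, a string(16) also gets 24 bytes, and even a length of 0 gets 8. As the code is shown here, declareLocalVar passes the rounded length on to initLocalStringVar. The length there is therefore never 0 for a non-negative declared length, so the `length == 0` part of its initialisation check would only fire if it were given the unrounded length.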
As for initialisation: an int local variable may or may not be initialised in its declaration. A string local variable, though, must always be initialised, either by specifying the length of the string or by giving an initial value. The reason is the way TINSEL handles strings: each string pointer must point to a memory area where the contents of the string live. Here’s the code that takes care of initialising string local variables:
fun initLocalStringVar(name: String, stackOffset: Int, initValue: String, length: Int) {
    if (initValue.isEmpty() && length == 0)
        abort ("line ${inp.currentLineNumber}: local variable $name is not initialised")
    var constStringAddress = ""
    // check for the constant string init value
    stringConstants.forEach { (k, v) -> if (v == initValue) constStringAddress = k }
    if (constStringAddress == "") {   // if not found
        // save the string in the map of constant strings
        constStringAddress = STRING_CONST_PREFIX + (++stringCnstIndx).toString()
        stringConstants[constStringAddress] = initValue
    }
    val stringDataOffset = code.allocateStackVar(length)
    code.initStackVarString(stackOffset, stringDataOffset, constStringAddress)
}
This function first checks whether the local string var is initialised and aborts otherwise, exactly as for global string vars. It then looks up the initial value in the stringConstants map and adds it if it is not there yet. Finally, it allocates space on the stack for the contents of the string and calls the function that produces the code to initialise it at run time:
fun initStackVarString(stackOffset: Int, stringDataOffset: Int, constStrAddress: String) {
    outputCodeTabNl("lea\t$stringDataOffset(%rbp), %rax")
    outputCodeTab("movq\t%rax, $stackOffset(%rbp)\t\t")
    outputCommentNl("initialise local var string address")
    if (constStrAddress.isNotEmpty()) {
        outputCodeTabNl("lea\t$constStrAddress(%rip), %rsi")
        outputCodeTabNl("movq\t$stackOffset(%rbp), %rdi")
        outputCodeTab("call\tstrcpy_\t\t")
        outputCommentNl("initialise local var string")
    }
}
This function first sets the string pointer to the area on the stack where the contents of the string live, and then, if there is an initial value, copies it into that area.
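To see initLocalStringVar and initStackVarString working together, here is a hypothetical, self-contained model. Its assumptions are as follows:

- String pointers take 8 bytes.
- allocateStackVar simply hands out successively lower %rbp-relative offsets. The real one may differ, and any code it emits to move %rsp is omitted here.
- The label prefix str_ is made up.
- The output drops the tabs and comments.

The constant lookup uses firstOrNull, which stops at the first match. Each value is stored only once, so it returns the same label as the forEach scan in initLocalStringVar.

```kotlin
// Hypothetical model of initLocalStringVar + initStackVarString.
class LocalStringInit(private val labelPrefix: String = "str_") {
    val stringConstants = linkedMapOf<String, String>()   // label -> literal
    val asm = mutableListOf<String>()                      // emitted instructions
    private var constIndex = 0
    private var frameBytes = 0                             // bytes used below %rbp

    // Reserve `size` bytes and return the %rbp-relative offset of the slot.
    private fun allocateStackVar(size: Int): Int {
        frameBytes += size
        return -frameBytes
    }

    // Reuse the label of an identical literal, or register a new one.
    private fun labelFor(value: String): String =
        stringConstants.entries.firstOrNull { it.value == value }?.key
            ?: "$labelPrefix${++constIndex}".also { stringConstants[it] = value }

    // Declare a string local whose data area is `length` bytes (already rounded).
    fun declare(initValue: String, length: Int): Int {
        val pointerOffset = allocateStackVar(8)       // the string pointer
        val label = labelFor(initValue)
        val dataOffset = allocateStackVar(length)     // the string contents
        asm += "lea $dataOffset(%rbp), %rax"          // address of the data area...
        asm += "movq %rax, $pointerOffset(%rbp)"      // ...stored in the pointer
        asm += "lea $label(%rip), %rsi"               // source: the constant
        asm += "movq $pointerOffset(%rbp), %rdi"      // destination: the data area
        asm += "call strcpy_"
        return pointerOffset
    }
}
```

For a string(20), the data area is 24 bytes. The pointer therefore ends up at -8(%rbp) and the contents occupy -32(%rbp) to -9(%rbp). A second local initialised with the same literal reuses the label str_1. Keeping a reverse map from value to label alongside would turn the linear scan into a constant-time lookup.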
Cleaning up
This is the only bit left to be done at the end of the block:
fun parseBlock(breakLabel: String = "", continueLabel: String = "") {
    inp.match(Kwd.startBlock)
    mustRestoreSP = true
    // blockName is used as key to the local vars map for this block
    val blockName = "$BLOCK_NAME${blockId++}"
    while (inp.lookahead().type != TokType.endOfBlock && !inp.isEndOfProgram()) {
        parseStatement(breakLabel, continueLabel, blockName)
    }
    releaseLocalVars(blockName, mustRestoreSP)
    inp.match(Kwd.endBlock)
}

fun releaseLocalVars(blockName: String, restoreSP: Boolean) {
    var localVarSize = 0
    localVarsMap[blockName]?.forEach {
        localVarSize += when (identifiersMap[it]?.type) {
            DataType.int -> INT_SIZE
            DataType.string -> STRPTR_SIZE + identifiersMap[it]?.size!!
            else -> INT_SIZE
        }
        identifiersMap.remove(it)
    }
    if (localVarSize > 0 && restoreSP)
        code.releaseStackVar(localVarSize)
}
The function releaseLocalVars is called at the end of the block and does two things: (a) it calculates the stack space allocated by all the local vars in this block and releases it, and (b) it removes these local vars from identifiersMap so that they cannot be accessed outside the block.
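The size computed by releaseLocalVars has to match exactly what declareLocalVar and initLocalStringVar allocated. That means 8 bytes for an int, and 8 bytes for a string pointer plus its rounded data area. The sketch below reproduces that sum. LocalVar is a hypothetical stand-in for the identifiersMap entry. As in the original, the stored size of an int is ignored and INT_SIZE is used instead.

```kotlin
// Sizes as described in the text: ints and string pointers are both 8 bytes.
const val INT_SIZE = 8
const val STRPTR_SIZE = 8

enum class DataType { int, string }

// Hypothetical record of what declareLocalVar stores for each local.
data class LocalVar(val name: String, val type: DataType, val size: Int)

// Mirrors the sum computed in releaseLocalVars.
fun stackBytesToRelease(locals: List<LocalVar>): Int {
    var total = 0
    for (v in locals) {
        total += when (v.type) {
            DataType.int -> INT_SIZE                  // stored size not used
            DataType.string -> STRPTR_SIZE + v.size   // pointer + data area
        }
    }
    return total
}
```

For a block declaring `var i: int` and `var s: string(20)`, this gives 8 + (8 + 24) = 40 bytes.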
You may have noticed that releaseLocalVars has an additional parameter (apart from the block name), restoreSP, which tells it whether to release the stack space used by the local vars in this block. It receives the value of mustRestoreSP, which is set to true at the beginning of every block so that the stack space is released at the end of it. There is one exception: a return statement sets mustRestoreSP to false. The reason is that when we return from a function, the whole stack frame is restored to its previous state, so any space allocated to local vars is released automatically. There is no need to increase the stack pointer.
fun parseReturn() {
    inp.match()
    ...
    code.returnFromCall()
    mustRestoreSP = false
}
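A tiny register-level model shows why skipping the release on return is safe. The sketch assumes the following:

- Each function starts with `pushq %rbp; movq %rsp, %rbp`.
- code.returnFromCall emits the usual epilogue, `movq %rbp, %rsp; popq %rbp` (the effect of `leave`), before `ret`.

FrameSim is hypothetical. It tracks %rsp and %rbp as plain numbers, starting just after `call` has pushed the return address.

```kotlin
// Hypothetical model of one x86-64 call frame (only %rsp and %rbp are tracked).
class FrameSim(initialRsp: Long) {
    var rsp: Long = initialRsp
        private set
    private var rbp = 0L
    private var savedRbp = 0L   // stands in for the value pushed by pushq %rbp

    // Assumed prologue: pushq %rbp ; movq %rsp, %rbp
    fun prologue() {
        rsp -= 8
        savedRbp = rbp
        rbp = rsp
    }

    // A block allocating a local: subq $size, %rsp
    fun allocate(size: Int) {
        rsp -= size
    }

    // A block ending normally with restoreSP = true: addq $size, %rsp
    fun release(size: Int) {
        rsp += size
    }

    // Assumed epilogue: movq %rbp, %rsp ; popq %rbp
    fun epilogue() {
        rsp = rbp
        rbp = savedRbp
        rsp += 8
    }
}
```

Whether or not a block releases its 40 bytes before the epilogue, %rsp ends up where it started, because the epilogue reloads it from %rbp. The release still matters for blocks that end normally, for example a while body that runs many times within a single call.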
And that was it: TINSEL now supports local variables. With this, we can write a professional-looking recursive version of the n-factorial program. Strictly speaking, it does not require local variables, but it does require functions with parameters, which we covered in chapter xv. It works nicely until it hits integer overflow for input values above 20 (21! does not fit in a signed 64-bit integer). And of course, now that we have recursion, we need to be careful not to overflow the stack.
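The overflow point of an 8-byte signed int can be checked with the same recursive definition in Kotlin. Kotlin's Long is also a signed 64-bit integer, and its multiplication wraps around silently on overflow. This is a reference sketch, not the TINSEL program. The helper largestExactFactorialArg is made up for illustration.

```kotlin
// Recursive factorial in signed 64-bit arithmetic (wraps silently on overflow).
fun factorial(n: Long): Long = if (n <= 1L) 1L else n * factorial(n - 1)

// Largest n whose factorial still fits in a Long. The division test detects
// that (n + 1)! would exceed Long.MAX_VALUE before multiplying.
fun largestExactFactorialArg(): Int {
    var acc = 1L   // holds n!
    var n = 1
    while (true) {
        val next = n + 1
        if (acc > Long.MAX_VALUE / next) return n
        acc *= next
        n = next
    }
}
```

20! = 2432902008176640000 is the last exact result. 21! already wraps around to a negative number.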
And as always, you can find the code in my GitHub repository.
Coming up next – the next big step for TINSEL: TINSEL for Raspberry Pi