Based on the original series “Let’s Build a Compiler!” by Jack Crenshaw.
Introduction
Those of you who tried the “diamond” program from the last chapter (diamond.tnsl) must have noticed a certain weakness: when the print_line
function is called from inside the for
loop in the main program, it needs to know the value of i
so that it can print the correct pattern. And the only way we could do that in the previous chapter, was by assigning a global variable with the value of the for
loop control variable. This is not very neat, so let’s go and improve it by adding parameters to our functions.
The Ground Rules
In the x86-64 assembly standard the first 6 parameters to a function are passed via the registers %rdi, %rsi, %rdx, %rcx, %r8,
and %r9
. Any additional parameters or any parameters with size over 64 bits are passed via the stack. TINSEL will support up to 6 parameters (which are too many anyway) so only these 6 registers will be used for passing function parameters.
TINSEL supports int
and string
variables. In case of an int
parameter, the value of the variable will be passed and in case of a string
parameter the address of the string variable will be passed (i.e. int
parameters will be passed by value and string
parameters will be passed by reference).
As a side note, passing function parameters directly via the CPU registers may sound simpler than passing them via the stack (which is the traditional method), however, it introduces certain complexity, which we will address further down in this chapter.
Declaring Function with Parameters
This should be a relatively easy job. The function declaration will look like this:
<function declaration> ::= fun <identifier> : <type> ( [ <param declaration> ] * ) <block>
<param declaration> ::= <identrifier> : <type>
So, at first we need to add the code to parse any parameter declarations between the '('
and the ')'
. The code below does exactly this:
fun parseFunDecl() { while (inp.lookahead().encToken == Kwd.funDecl) { inp.match() val functionName = inp.match(Kwd.identifier).value ... inp.match(Kwd.leftParen) parseFunParams(functionName) // <---- inp.match(Kwd.rightParen) inp.match(Kwd.colonToken) ... inp.match() declareFun(functionName, funType) storeParamsToStack(functionName) // <---- parseFunctionBlock() } }
As you can see the first item that was added to our parseFunDecl
function is to parse the parameters:
fun parseFunParams(functionName: String) { var paramCount = 0 val paramTypesList = mutableListOf<FunctionParameter>() if (inp.lookahead().encToken == Kwd.identifier) { do { if (paramCount++ >= code.MAX_FUN_PARAMS) abort("line ${inp.currentLineNumber}: a function can have only up to ${code.MAX_FUN_PARAMS} parameters maximum") if (inp.lookahead().encToken == Kwd.commaToken) inp.match() paramTypesList.add(parseOneFunParam()) } while (inp.lookahead().encToken == Kwd.commaToken) } funParamsMap[functionName] = paramTypesList } fun parseOneFunParam(): FunctionParameter { val paramName = inp.match(Kwd.identifier).value if (identifiersMap[paramName] != null) abort("line ${inp.currentLineNumber}: parameter name $paramName has already been declared") inp.match(Kwd.colonToken) var paramType = DataType.none when (inp.lookahead().encToken) { Kwd.intType -> paramType = DataType.int Kwd.stringType -> paramType = DataType.string else -> inp.expected("variable type (int or string)") } inp.match() return FunctionParameter(paramName, paramType) }
parseFunParams
is a loop that processes a comma-separated parameter list. For each parameter it calls parseOneFunParam
.
parseOneFunParam
first checks whether the name has already been declared (which could be either as a global variable or as another parameter in this function) and if not, it processes the declaration by checking the type of the parameter. Finally it returns an object that contains the name and type of that parameter.
parseFunParams
builds a list of all the function parameters with their names and types and saves it in the funParamsMap
. This map will be used in two places: (a) in the beginning of the code for the function block and (b) when a function is called and the parameters must be assigned values.
Let’s look at (a) first.
You may have noticed above that there is one more line added to parseFunDecl
: it’s the call to storeParamsToStack
, which does exactly what it’s name says: stores the values of the parameters from the registers (up to six) passed by the caller to the stack. Why? Here’s the reason:
Based on the x86-64 architecture, the 6 registers used to pass the function parameters, are part of the so-called caller-save group of registers, which means that their value is not necessarily retained across function calls. Even though it would have been nice to keep the 6 parameters in these 6 registers and have them handy to use them through the function code, it’s very likely that we will have the scenario where another function is called that modifies e.g. %rdi
or %rsi
or %rcx
or %rdx
or %r8
or %r9
(the first 4 are very commonly used – they are also used in the TINSEL runtime library), and this means that the value of one (or more) parameter(s) would be corrupted.
The solution to the problem is to store the values in the stack as stack variables. We have used the stack to store a variable already in chapter 13 for the for-loop control variable. When the parameters are needed later in the function code, their value will be retrieved from the stack. This, by the way, is exactly how the GNU C compiler has dealt with this problem.
fun storeParamsToStack(functionName: String) { val paramsList = funParamsMap[functionName] ?: listOf() for (i in paramsList.indices) { val paramVarOffs = code.allocateStackVar(INT_SIZE) identifiersMap[paramsList[i].name] = IdentifierDecl( TokType.variable, paramsList[i].type, initialised = true, size = INT_SIZE, isStackVar = true, stackOffset = paramVarOffs, canAssign = false ) code.storeFunParamToStack(i, paramVarOffs) } }
This function foes the following for each parameter: allocates space in the stack, creates an entry in the identifiersMap
and finally saves the value of the parameter in the stack, which is done by the below assembly code:
val funInpParamsCpuRegisters = arrayOf("%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9")
fun storeFunParamToStack(paramIndx: Int, stackOffset: Int) {
outputCodeTabNl("movq\t${funInpParamsCpuRegisters[paramIndx]}, $stackOffset(%rbp)")
}
There is one more thing required to be done just before the end of the function. We need to remove the parameters from the identifiersMap
, otherwise a subsequent function declaration would not be able to use parameters with the same names. This is done at the end of parseFunctionBlock
:
fun parseFunctionBlock() { hasReturn = false parseBlock() if (!hasReturn) abort("line ${inp.currentLineNumber}: function $funName has no ${inp.decodeToken(Kwd.retToken)}") // clean up declarations of parameters so that the names can be reused in other functions funParamsMap[funName]?.forEach { identifiersMap.remove(it.name) } }
Please note, there is no need here to release the space that was allocated in the stack for the function parameters, because the code produced on the back of the return
instruction takes care of this:
private fun funEnd() {
restoreStackFrame()
outputCodeTab("popq\t%rbx\t\t")
outputCommentNl("restore \"callee\"-save registers")
}
private fun restoreStackFrame() {
outputCodeTab("movq\t%rbp, %rsp\t\t")
outputCommentNl("restore stack frame")
outputCodeTabNl("popq\t%rbp")
}
Calling a Function with Parameters
So, most of the work is already done. Let’s go now and make the necessary changes to pass parameters when we call a function.
The value of each parameter will be the result of a boolean expression, which, as you remember from the previous chapters (and also from the BNF definitions TINSEL_BNF.md), can be a boolean result (0 or 1), a result of a numberical expression or even a result of string expression.
As per the x86-64 standard we have to set the CPU registers mentioned above with the value of each parameter. Here we have the same problem mentioned in the previous section, which is that these registers are not presevred across function calls. This means that we may have a scenario where we set %rdi to the value of the first parameter, we set %rsi to the vallue of the second parameter, and then we call a function to set %rdx to the value of the third parameter, which function changes the values of %rdi and %rsi corrupting the first two parameters.
Again, I will follow the excample of the GNU C compiler and do this:
First, I will set the registers %rbx, %r12-%r15
and %rax
to the values of the parameters (that is, to the values of the parameters that have been declared – up to 6). The first 5 of these registers are guaranteed to maintain their value across function calls so the above issue goes away. %rax
obviously does not maintain its value across function calls, but will be the last one set, so we will not lose the value of the sixth parameter.
Second, once all the values of the parameters have been set into these temporary registers, I will move them to the formal x86 parameter registers: %rdi, %rsi, %rdx, %rcx, %r8,
and %r9
.
And here’s how this is done:
fun parseFunctionCall(): DataType { val funcName = inp.match(Kwd.identifier).value inp.match(Kwd.leftParen) parseAssignFunParams(funcName) setInpFunParams(funcName) inp.match(Kwd.rightParen) code.callFunction(funcName) restoreParamRegisters(funcName) return getType(funcName) }
As you can see there are three new functions called by parseFunctionCall
:
The first one, parseAssignFunParams
, first produces the necessary code that will return the value for each parameter and checks that the value will be of the same type as the parameter, and then sets the temporary registers to the value of the parameters as mentioned above. As you can see in setIntTempFunParam
, each egister is first saved in the stack – based on the x86-64 standard it is our responsibility to ensure that these registers will retain their values when we return / exit.
fun parseAssignFunParams(functionName: String) { val paramTypeList = funParamsMap[functionName] ?: listOf() for (i in paramTypeList.indices) { if (i > 0) inp.match(Kwd.commaToken) val paramExprType = parseBooleanExpression() if (paramExprType != paramTypeList[i].type) abort ("line ${inp.currentLineNumber}: parameter #${i+1} must be type ${paramTypeList[i].type}, found $paramExprType") when (paramExprType) { DataType.int -> code.setIntTempFunParam(i) // the same code is used both for int and for string parameters DataType.string -> code.setIntTempFunParam(i) // i.e. moves %rax to the appropriate register for this parameter else -> {} } } }
fun setIntTempFunParam(paramIndx: Int) {
if (funTempParamsCpuRegisters[paramIndx] == "%rax")
return
outputCodeTabNl("pushq\t${funTempParamsCpuRegisters[paramIndx]}")
outputCodeTabNl("movq\t%rax, ${funTempParamsCpuRegisters[paramIndx]}")
}
The second one, setInpFunParams
, simply transfers the value of each temporary parameter register to the corresponding formal parameter register as per x86-64 standard:
fun setInpFunParams(functionName: String) {
val paramTypeList = funParamsMap[functionName] ?: listOf()
for (i in paramTypeList.indices)
code.setFunParamReg(i)
}
// these registers hold the fun params at the time of the call val funInpParamsCpuRegisters = arrayOf("%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9") // during the assignment of the parameters, their values are saved temporarily here, // so that they are not corrupted by function calls executed during the assignment of the parameters val funTempParamsCpuRegisters = arrayOf("%rbx", "%r12", "%r13", "%r14", "%r15", "%rax") // 6 params maximum allowed val MAX_FUN_PARAMS = funInpParamsCpuRegisters.size fun setFunParamReg(paramIndx: Int) { outputCodeTabNl("movq\t${funTempParamsCpuRegisters[paramIndx]}, ${funInpParamsCpuRegisters[paramIndx]}") }
Finally, restoreParamRegisters
, restores the values of the temporary registers from the stack so that we are compliant with the x86-64 standard for register usage:
fun restoreParamRegisters(functionName: String) { val paramTypeList = funParamsMap[functionName] ?: listOf() for (i in paramTypeList.indices) code.restoreFunTempParamReg(paramTypeList.size - i - 1) }
fun restoreFunTempParamReg(paramIndx: Int) {
if (funTempParamsCpuRegisters[paramIndx] == "%rax")
return
outputCodeTabNl("popq\t${funTempParamsCpuRegisters[paramIndx]}")
}
Plase note %rax
is neither saved nor restored, as it is not required to maintain its value across function calls.
A Point to note about Function Return Values
As we are in the subject of calling a function, I’d like to point out the following scenario:
Suppose we have a TINSEL function (let’s call it test_alpha()
) that returns an integer value, which we want to treat as boolean (0 for false, non-zero for true). We would want to use it like this:
if (test_alpha()) { // do something }
Please note that this code simply works because of the way the expressions are calculated and the flags register is set. So, when at some point in function test_alpha
there is a return
statement
return <some_expression>
the result of some_expression
is calculated and is placed in %rax
. At the same time the Z-flag
is set either by the instructions executed as part of the expression calculation (addq
, subq
, etc) or by specifically executing
testq %rax, %rax
which sets the flags based on the contents of %rax
, when %rax
is set by a movq
instruction from the contents of a memory location.
This means that a function not only returns a value in register %rax
, but also returns the flags register, which is set based on the contents of %rax
. And this is what makes the above code work.
Now TINSEL looks a bit more like a grown up language (although there is still a long way to go) and you can modify the diamond.tnsl example to use a parameter for the print_line()
function instead of the global variable k
.
And as always, you can find the code in my GitHub repository.
Coming up next: Local Variables and Recursion