Chapter XIII: Fixing the For Loop

Based on the original series “Let’s Build a Compiler!” by Jack Crenshaw.

Introduction

Our newly introduced programming language, TINSEL, has now taken shape and we can write a decent program. We have improved our error checking and it gives us meaningful error messages pointing to the line where the error was. There is still one thing missing though – the FOR loop. If you look at parseFor, you will see that it still produces dummy code, so, let’s go and fix it in this chapter and produce our TINSEL V1.2.

Fixing the For Loop

Jack had pointed out in his tutorial that “the for loop is a bear to translate” and I would agree. There are a lot of moving parts: (a) the control variable – where do we keep it? in the stack? in a register? (b) the start value (c) the end value – do we recalculate it at the end of each iteration or only once in the beginning? (d) the step. So, let’s set the scene and define first what we want to do and then decide how we do it.

This will be my specification of the for loop:

    <for> ::= for ( <identifier> = <expression1> [ down ] to <expression2> [ step <expression3> ] ) <block>

It will work like this:

  1. Set the control variable to expression1
  2. Check if it has exceeded the limit (expression2) and if yes exit the loop
  3. Execute the block
  4. Update the control variable
  5. Loop back to point 2.

In addition I will support an optional down keyword, to count down as opposed to up. If it’s there, then the control variable will be decreased by 1 at the end of each iteration (while, if it isn’t there, it will obviously be increased by 1).

Finally there is an optional step keyword followed by an expression. If this is present, then the value of expression3 will be added to the control variable at the end of each iteration (as opposed to the control variable increased or decreased by 1).

It is important to remember that it will be the responsibility of the programmer to set the step expression (but also the initial and the to expressions) to the right values so that the control variable will eventually reach the limit and the loop will be correctly terminated. Please note, if down is used, then the result of the step expression must be a negative number.

Furthermore, in my implementation, expression2 and expression3 (if it exists) are evaluated only once when the loop is entered and not in every iteration.

The Control Variable

The control variable must be a new variable not previously defined in that function or main program. It exists only within the for block and cannot be assigned a value in an assignment statement – it is assigned a new value automatically at the end of each iteration. However, it’s value can be used on the right-hand side of an assignment within the for block.

The next question is “where do we store it?”. Jack had decided to use the global variables space in his implementation of the for loop. That would not work with TINSEL, e.g. in the scenario where we have a for loop in the main block that calls a function that has a for loop and uses the same name for the control variable. This is valid scenario and TINSEL should support it.

I would be tempted to use a CPU register given that the x86 architecture has a few spare ones (e.g. %r8 – %r15). This would be a really easy option, but apart from the above, it presents an additional problem: how do we handle nested for loops? We would have to use a new register in each level and also keep track of which level of for we are in and use the appropriate register when we want to access the control variable. This would introduce unnecessary complexity plus we would soon run out of registers… Hence, the answer is: the stack. It’s the most appropriate place, as the nature of the control variable is exactly the same as of a local variable.

Using the Stack for Storing a Variable

In the x86 architecture there are standard techniques for using the stack for temporary variable storage (e.g. for local variables or, as in our case, for the for loop control variable). The first step is to create a new stack frame in the beginning of each function. This means that the old stack frame must be restored at the of the function. I do the same in the beginning and at the end of the main block as well.

To create a new stack frame we just need to save the Base Pointer in the stack and then give it the value of the Stack Pointer (i.e. have the new stack frame start where the stack pointer is pointing after the last push):

    pushq %rbp
    movq %rsp, %rbp

Consequently, to restore the old stack frame we need to restore the Stack Pointer from the Base Pointer and then restore the Base Pointer from the stack:

    movq %rbp, %rsp
    popq %rbp

I have implemented the two functions newStackFrame and restoreStackFrame in the code module that produce the code for these two operations. They are called in the beginning and at the end (where we have the return statement) of each function block and of the main block.

With our new stack frame in place, we can allocate space in the stack for a temporary variable by reducing the stack pointer by the appropriate number of bytes. E.g. to allocate space for a 64-bit integer (i.e. 8 bytes) we just do:

    subq $8, %rsp

In addition we need to keep track of the offset of each variable we allocate from the base of the stack (don’t forget, rbp had the value of rsp in the beginning of the block). This means that when we define a new stack frame, we start with an offset of 0, and this is decreased by the appropriate amount of bytes every time space for a variable is allocated in the stack. E.g. the offset of the first 64-bit integer we allocated is -8, and we can access it like this:

    movq -8(%rbp), %rax

The current offset from the Base Pointer is managed within the code module.

To release the space that has just been allocated at the top of the stack, we just add the variable’s size in bytes to the stack pointer, e.g.:

    addq $8, %rsp

For these two operations I have implemented allocateStackVar(size: Int) and releaseStackVar(size: Int) in the code module.

With all that clarified, here’s what’s needed in our compiler:

In the IdentifierDecl map that holds all the variables and functions declarations I have added two more parameters:

var stackVar: Boolean = false

which is true when the variable is stored in the stack and false if it’s a global variable

var stackOffset: Int = 0

which holds the offset from the base pointer for a stack variable.

And to actually save or retrieve a stack variable, I have implemented the functions assignmentLocalVar(offset: Int) and setAccumulatorToLocalVar(offset: Int) respectively in the code module.

With these tools in our toolbox we can now go ahead and code the parseFor function

The New Shape of parseFor

In my first attempt, I started rewriting the for parser implementing all the various aspects (the control variable in the stack, the initial expression, the to expression, the optional down keyword and the optional step keyword) and the outcome was a function that was over 80 lines long… I had to break that down in smaller pieces so that I could read it and debug it. So, here’s the new parseFor function:

fun parseFor(): Boolean {
inp.match()
parseForLine()
presetCtrlVar()
val label1 = newLabel()
val label2 = newLabel()
postLabel(label1) // actual start of the loop
stepAndCheck() // increase (or decrease) ctrl var and check
code.branchIfFalse(label2) // if limit reached, exit
val foundReturn = parseBlock(label2) // the FOR block
code.branch(label1) // loop back to the beginning of the loop
postLabel(label2) // exit point of the loop
cleanUpStack()
return foundReturn
}

I will go through this at high level and will let you check all the details by looking at the code in the repository. Due to the complexity and the number of lines, I have put the for parser and the related functions in a separate file and actually a separate class: ProgParser1aControlFor.kt, class ForParser.

And now let’s go through the parts of parseFor:

First comes parseForLine, which parses the actual for line from the for keyword to the closing parenthesis.

It first gets the control variable name and checks if it’s been declared already. If not, it allocates space for it in the stack and adds it to the variables’ map marking it as stack variable and saving its offset from the base pointer. It then parses the “from” expression and produces the code to set the initial value of the control variable.

Next, it checks for a down keyword and sets the relevant flag if it finds it.

It then producers the code for the to expression and saves its value also in the stack, but without assigning a variable name to it; it’s not needed. However, it has to save the offset from rbp and for this a class variable is used.

And the last bit of this first part is to process the step expression (if there is any), set a flag if there is one, and produce the code to calculate the value of the expression and save it also in the stack. This is not saved as a variable in the variables’ map either, so a class variable is also used to save the offset from rbp.

Next we have presetCtrlVar, which pre-decrements (or pre-increments if we have down) the control variable to get it ready for the first “increment-and-check” (or decrement-and-check respectively).

Then we produce the two labels that every loop needs: label1, which marks the beginning of the loop and label2, which marks the first instruction after the end of the loop.

Here starts the loop: we first call stepAndCheck. This function first increments the control variable by 1 (or decrements it by 1 if we have down). Please note that if we have step, then the step value is always added to the control variable, as it will be positive if there is no down and negative if there is down.

It then invokes the appropriate comparison, >= or <= depending on the presence of down or not.

And this is where we know whether we have to stop the loop by executing branchIfFalse, same as in the while and repeat implementations.

Having established whether we can continue with the next iteration, we execute the actual for block (parseBlock) and then go back to the beginning of the loop to check all over again.

The final part in our for parser, after we are done with the loop, is to clean up the stack and the variables’ space: cleanUpStack. It first removes the entry for the control variable from the variables’ map so that it can be potentially reused in a new for block.

It then releases the stack space allocated for the step value (if it was there) plus the space for the to value and the control variable.

And that was it – this is our for parser done. This class is almost 150 lines long but at least one can read it and understand what each function does.

We also need to change the line in parseStatement where for is called:

    Kwd.forToken -> return ForParser().parseFor()   // in separate module due to increased complexity

Now you can try this nested loop, compile it and run it on your Linux machine and see the output:

/*--
 test nested for loops
*/
program TestFor

var n = 0

main {
    read n
    for (i=n down to 1 step 2) {
        print i
        for (j=1 to i) {
            print i*j
        }
        print 999
    }
}

endprogram

Continue

Now that we are in the topic of “for”, it’s a good opportunity to add continue. We have break already, implemented based on the method that Jack has shown us, so it will be quite easy to add the necessary processing for continue.

If you remember, every loop parser generates two labels: label1 and label2. Label1 marks the start of the iteration, while label2 marks the next instruction following the loop (the one that is executed next, after the loop is terminated).

And it’s label2 that is passed as a parameter to parseBlock, and then on to parseStatement and then on to parseIf, and on and on down the recursion tree, so that if and when a break statement is encountered, it will produce an unconditional jump instruction to that label in order to break the loop.

We will use exactly the same technique to implement continue. We will add a second parameter to parseBlock, parseStatement and parseIf, which will be the label to be used in parseContinue:

fun parseBlock(breakLabel: String = "", continueLabel: String = ""): Boolean

The “continue” label will be passed down the recursion tree alongside the “break” label and if and when a continue statement is found, it will use it:

   Kwd.continueToken -> { parseContinue(continueLabel); return false }

And the actual parseContinue function is effectively a copy of parseBreak, as it does exactly the same thing (generates an unconditional jump to the label it has received as parameter). In fact we could have used the very same function, but in the interest of clean code I will keep them separate:

fun parseContinue(label: String) {
    inp.match()
    if (label == "")
        abort("line ${inp.currentLineNumber}: no loop to continue")
    code.jump(label)
}

Don’t forget, we need to add the continueToken in our LanguageTokens file.

And that was it – with this minor change in our code, we can now support continue, and this is thanks to Jack’s method, which resulted in code that is so easy to expand.

In this version I have also introduced two small enhancements in the code module:

First, a constant CODE_ID that contains an identification of the flavour of assembly code produced. It is currently set to “x86-64 Assembly Code – AT&T format” and is printed as a comment at the top of the compiler output file. This is to help the reader identify easily what kind of code has been produced in that file. When a new code module is developed (e.g. to support ARM architecture) this string will be updated.

Second, I have added a line counter for the assembly code produced: outputLines. This is being incremented based on the number of new lines in the output string in outputCode and is printed on the console when the compiler finishes (together with the number of source lines).

And always, you can find the code in my GitHub repository.

Coming up next: Strings

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s