Chapter 5b: Control Structures – part 2

Based on the original series “Let’s Build a Compiler!” by Jack Crenshaw.

Introduction

This is the second part of the Control Structures chapter. Without further ado, we will go straight to the REPEAT statement.

REPEAT Statement

Having done WHILE, it should be very easy to process the REPEAT statement. The syntax of REPEAT in our language will be:

    REPEAT <block> UNTIL <condition>

The result of this statement will be again a loop like while, with the only difference that the condition is executed at the end of each iteration. In syntax-directed translation:

    REPEAT         { L = NewLabel
                     PostLabel(L) }
    <block>
    UNTIL
    <condition>    { BranchIfFalse(L) }

And the corresponding parse function can be written very easily. We will use 'r' for REPEAT and 'u' for UNTIL:

/**
 * parse repeat statement
 * REPEAT <block> UNTIL <condition>
 */
fun parseRepeat() {
    inp.getNextChar()
    val label = newLabel()
    postLabel(label)
    parseBlock()
    inp.match(untilToken)
    parseCondition()
    code.branchIfFalse(label)
}

We also need to define our ‘repeat’ and ‘until’ tokens:

val repeatToken = 'r'
val untilToken = 'u'

The function isEndBlock is now becoming a long list of comparisons which is a bit messy. Why not use a Kotlin Set object to define all the possible block terminators:

// the set of tokens that are block terminators
private val blockTerminators = setOf(endifToken, elseToken, endwhileToken, untilToken)

And with this, isEndBlock now looks much better:

/** check for end of block */
fun isEndBlock(c: Char) = blockTerminators.contains(c) || isEndProgram(c)

Finally, need to change parseBlock and tell it to recognise the repeat statement:

/**
 * parse a block
 * <block> ::= [ <statement> ] * end
 */
fun parseBlock() {
    while (!inp.isEndBlock(inp.nextChar)) {
        when (inp.nextChar) {
            ifToken -> parseIf()
            whileToken -> parseWhile()
            repeatToken -> parseRepeat()
            else -> parseOther()
        }
    }
}

Compile it, verify it and make sure it works.

BREAK Statement

This is a quite useful statement that can simplify a program a lot. The challenge with it is when you have nested loops, which of them do you break out of? The answer is of course the inner-most loop, so the parser for break statement needs to know the exit address of the loop it has been called from, in order to produce the correct assembly branch code.

In our implementation, each loop iteration is processed by parseBlock, which also processes the IF block, which is where the break statement is most likely to appear. Based on this train of thought, it sounds like we would need the following:

  • when there is an IF command in a loop (parseWhile or parseRepeat), the call to parseIf must pass the exit label for that loop
  • parseIf must also pass that label to both of its blocks (if and else) as this is where the BREAK would be (if any)
  • remember that parseIf will be called by parseBlock, so parseBlock must receive the exit label when it is called by parseWhile or parseRepeat in order to pass it down to parseIf

You can see the recursion here, where parseBlock calls parseWhile, which calls parseBlock, which calls parseIf, which calls parseBlock, and over again, and at some point, one of those parseBlock instances will call parseBreak. The stack in Kotlin (that we are getting for free) ensures that the value of the exit label parameter that is being passed down the chain is always the most recent.

This may sound a bit complicated but it’s a really clever yet simple solution from Jack, and here’s the code for it:

/** parse break statement */
fun parseBreak(label: String) {
    inp.getNextChar()
    if (label == "")
        abort("no loop to break of")
    else
        code.branch(label)
}

We need to define our break token:

val breakToken = 'b'

And of course tell parseBlock to deal with BREAK as well:

/**
 * parse a block
 * <block> ::= [ <statement> ] *
 */
fun parseBlock(breakLabel: String = "") {
    while (!inp.isEndBlock(inp.nextChar)) {
        when (inp.nextChar) {
            ifToken -> parseIf(breakLabel)
            whileToken -> parseWhile()
            repeatToken -> parseRepeat()
            breakToken -> parseBreak(breakLabel)
            else -> parseOther()
        }
    }
}

This is the modified code for ifParse that passes the break Label it receives from the caller, to the two blocks further down:

/**
 * parse if statement
 * IF <condition> <block> [ ELSE <block> ] ENDIF
 */
fun parseIf(breakLabel: String) {
    inp.getNextChar()
    parseCondition()
    val label1 = newLabel()
    code.branchIfFalse(label1)
    parseBlock(breakLabel)
    if (inp.nextChar == elseToken) {
        inp.getNextChar()
        val label2 = newLabel()
        code.branch(label2)
        postLabel(label1)
        parseBlock(breakLabel)
        postLabel(label2)
    }
    else
        postLabel(label1)
    inp.match(endifToken)
}

The version of parseWhile that passes the break Label is below:

/**
 * parse while statement
 * WHILE <condition> <block> ENDWHILE
 */
fun parseWhile() {
    inp.getNextChar()
    val label1 = newLabel()
    val label2 = newLabel()
    postLabel(label1)
    parseCondition()
    code.branchIfFalse(label2)
    parseBlock(label2)
    inp.match(endwhileToken)
    code.branch(label1)
    postLabel(label2)
}

And the new version of parseRepeat is:

/**
 * parse repeat statement
 * REPEAT <block> UNTIL <condition>
 */
fun parseRepeat() {
    inp.getNextChar()
    val label1 = newLabel()
    val label2 = newLabel()
    postLabel(label1)
    parseBlock(label2)
    inp.match(untilToken)
    parseCondition()
    code.branchIfFalse(label1)
    postLabel(label2)
}

As usual, compile it, verify it and try to break it (pun intended and copied from the original).

FOR Statement

I’ve left the FOR loop last for a reason. Jack in his original chapter 5, explained how complex it is to produce M68000 assembly code to process a FOR loop while having the value of the counter accessible within the loop. In fact the simplest he could do was 14 lines of assembly code just for the FOR loop, and that’s without counting the code to calculate the <from> and <to> expressions and without the code that is executed inside the loop. This is 2x or 3x the size of the while or repeat code and this is also reflected in the parser function.

I will take a different approach. I will analyse and process the FOR statement, but given that my goal is not to learn or to teach M68000 assembly, I will just produce dummy pseudo-code just to demonstrate the concept as Jack analysed it. When we get to convert our compiler to produce TI-59 code, I will make it produce real TI-59 code for the FOR loop.

So, the FOR in our language is:

    FOR <variable> = <expression1> TO <expression2> <block> ENDFOR

This is equivalent to our familiar WHILE loop:

<variable> = <expression1>
TEMP = <expression2>
WHILE <variable> <= TEMP
<block>
++<variable>
ENDWHILE

Please note, the assumption here is that the step is always +1 and the value of expression2 is calculated only once, before the loop starts.

I will translate this into code, with one difference. I will increase the value of variable just before the condition is evaluated, instead of increasing it at the end of the iteration. This means that I have to decrease it by 1 just after I set it to expression1 in the beginning, to get the first time right.

/** 
 * parse for statement 
 * dummy version using pseudo-code
 * focuses on parsing of the structure only - not on producing code
 */
fun parseFor() {
        inp.getNextChar()
        val counterName = inp.getName()
        inp.match(equalsOp)
        code.dummyInstr("Allocate space for $counterName and set value to:")
        parseOther() // this is the FROM expression
        code.dummyInstr("Decrease $counterName by 1")
        inp.match(toToken)
        code.dummyInstr("Allocate space for TO variable and set value to:")
        parseOther() // this is the TO expression
        val label1 = newLabel()
        val label2 = newLabel()
        // actual start of the loop
        postLabel(label1)
        // increase counter and check the condition
        code.dummyInstr("Increase $counterName by 1")
        code.dummyInstr("Is $counterName <= TO?")
        code.branchIfFalse(label2)
        // execute the body of the loop
        parseBlock(label2)
        inp.match(endforToken)
        // back to the beginning of the loop
        code.branch(label1)
        // exit point of the loop
        postLabel(label2)
        code.dummyInstr("Release space held for $counterName and TO")
}

I have also added a dummy instruction to our M68000Instructions.kt:

/** dummy instruction */
fun dummyInstr(cmd: String) = outputCodeNl(cmd)

We need to define the tokens for FOR, TO and ENDFOR in our global definitions and add ENDFOR to the list of block terminators:

val forToken = 'f'
val toToken = 't'
val endforToken = 'e'

// the set of tokens that are block terminators
private val blockTerminators = setOf(endifToken, elseToken, endwhileToken, untilToken, endforToken)

And of course we need to recognise the FOR token in parseBlock:

/**
 * parse a block
 * <block> ::= [ <statement> ] *
 */
fun parseBlock(breakLabel: String = "") {
    while (!inp.isEndBlock(inp.nextChar)) {
        when (inp.nextChar) {
            ifToken -> parseIf(breakLabel)
            whileToken -> parseWhile()
            repeatToken -> parseRepeat()
            forToken -> parseFor()
            breakToken -> parseBreak(breakLabel)
            else -> parseOther()
        }
    }
}

Compile it and try it with an input program like:

f counter = expr1 t expr2
    xloop_block
e
.

If you feel adventurous you can add a STEP token as well (but don’t forget that this logic assumes a positive step as the exit condition checks for Counter <= TO). Also, please remember, for the block inside the FOR loop use for now a letter that has not been used as a token before (e.g. the word block will be interpreted as break – but as discussed this will be dealt with in chapter 7 where we will have multi-character tokens).

At this point I would like to make one more little improvement: in the various parser functions I have used getNextChar to advance to the next token (in our case single-character token) where Jack was using match. He did that for consistency and used match everywhere, even when he did not have to actually match the token as it had already been matched immediately above. In these cases I have used getNextChar in order to avoid the unnecessary call to match that would cause confusion to the reader.

However I really appreciate the importance of consistent code and I’d really like to use match everywhere – but only when a match is needed. So how do I reconcile the two? I have changed slightly function match to (a) match the given token and advance to the next character or abort if not matched and (b) just advance to the next character if no token is passed as a parameter. This can be easily achieved in Kotlin by using the default value for the parameter to the function: when there is no parameter provided by the caller, the value of the parameter in the function will be the default value I have defined (null character(0)) and this can be used as our wildcard.

/**
 * match a specific input char
 * and advance to next character
 * also produce a match if called with no input or if input is null character
 */
fun match(x: Char = nullChar.toChar()) {
    // return a match when the next char matches x
    // or when x is null character
    if (nextChar == x || x == nullChar.toChar())
        getNextChar()
    else
        expected("'$x'")
}

With this new version of match, I can now use inp.match(endforToken) where I need to match a token and advance to the next one, or inp.match() where I just need to skip the token and advance to the next one (the match has already happened before this point). Please replace all the calls to getNextChar in the parser functions with match.

And that’s it for this chapter.

Coming up next: Boolean Expressions

Below is the full code for this version of our Compiler. As you can see I have gathered all the parser functions in a new file called ProgParser1ControlStruct.kt to separate them from the numerical expressions parser functions:

CompilerMain.kt

package you.set.this

import kotlin.system.exitProcess

/**
 * A Simple Compiler
 * Based on the Let's Build a Compiler! series by Jack Crenshaw
 * This version produces Assembly language code for the 68000 microprocessor
 * A future version will produce code for the TI-59 Calculator
 * It will produce output that uses the TI-59 mnemonics that can be keyed directly to the calculator
 * Version 1.0 01.10.2021
 */

// the input program scanner
lateinit var inp: InputProgramScanner
// the Motorola 68000 instruction set
val code = M68000Instructions()

/** report an error */
fun error(errMsg: String) {
    System.err.println("Error: $errMsg")
}

/** abort compilation */
fun abort(errMsg: String) {
    error(errMsg)
    exitProcess(1)
}

/** print message and exit */
fun exit(msg: String) {
    System.err.println(msg)
    exitProcess(0)
}

/** report what was expected and abort */
fun expected(expMsg: String) {
    // setup a meaningful message if we have reached end of input
    val foundStr: String = if (inp.nextChar==endOfInput) "end of input" else inp.nextChar.toString()
    abort("Expected $expMsg \n found [${foundStr}]")
}

/** compiler initialisation */
fun initCompiler(args: Array<String>) {
    if (args.isEmpty())
        abort("no input file specified")
    // get a new input program scanner object - initialise from input file
    inp = InputProgramScanner(args[0])
}

/** main function */
fun main(args: Array<String>) {
    initCompiler(args)
    parseProgram()
}

InputProgramScanner.kt

package you.set.this

import java.io.File

/**
 * The input program scanner class
 * Performs the lexical scanner functions
 * Processes the char-by-char input and returns the tokens from the input stream
 */

/////////////////////////////////////////////////////////
// global special characters definitions
/////////////////////////////////////////////////////////
// numeric operators
val addOp: Char = '+'
val subOp: Char = '-'
val mulOp: Char = '*'
val divOp: Char = '/'
// parentheses
val leftParen: Char = '('
val rightParen: Char = ')'
// equals
val equalsOp = '='
// tokens
val endProg = '.'
val ifToken = 'i'
val endifToken = 'e'
val elseToken = 'l'
val whileToken = 'w'
val endwhileToken = 'e'
val repeatToken = 'r'
val untilToken = 'u'
val forToken = 'f'
val toToken = 't'
val endforToken = 'e'
val breakToken = 'b'
// end of input mark
val nullChar = 0
val endOfInput = nullChar.toChar()
/////////////////////////////////////////////////////////

/** this class implements the lexical scanner */
class InputProgramScanner(inputFile: String = "") {

    // the input program as string
    var inputProgram: String = ""
    var indx = 0

    // the next character from input
    // this is our lookahead character
    var nextChar: Char = ' '

    // the set of tokens that are block terminators
    private val blockTerminators = setOf(endifToken, elseToken, endwhileToken, untilToken, endforToken)

    /** initialisation code class InputProgramScanner */
    init {
        try {
            val f = File(inputFile)
            // read the whole program into a string
            inputProgram = f.readText()
            // set the lookahead character to the first input char
            getNextChar()
        } catch (e: Exception) {
            abort("could not open file [$inputFile]")
        }
    }

    /** get the next character from input after skipping white spaces */
    fun getNextChar() {
        skipNextChar()
        skipWhite()
    }

    /** set the lookahead character to the next char from input */
    fun skipNextChar() {
        if (indx < inputProgram.length)
            nextChar = inputProgram[indx++]
        else
            nextChar = endOfInput
    }

    /** skip white spaces */
    fun skipWhite() {
        while (isWhite(nextChar))
            skipNextChar()
    }

    /**
     * match a specific input char
     * and advance to next character
     * also produce a match if called with no input or if input is null character
     */
    fun match(x: Char = nullChar.toChar()) {
        // return a match when the next char matches x
        // or when x is null character
        if (nextChar == x || x == nullChar.toChar())
            getNextChar()
        else
            expected("'$x'")
    }

    /**
     * get an identifier
     * <identifier> ::= <alpha> [ <alphanumeric> | <_> ] *
     */
    fun getName(): String {
        var token = ""
        if (!isAlpha(nextChar))
            expected("Identifier")
        else {
            while (isAlphanumeric(nextChar)) {
                token += nextChar.uppercase()
                skipNextChar()
            }
            skipWhite()
        }
        return token
    }

    /**
     * get a number
     * <number> ::= [ <digit> ] +
     */
    fun getNumber(): String {
        var value = ""
        if (!isNumeric(nextChar)) {
            expected("Number")
        } else {
            while (isNumeric(nextChar)) {
                value += nextChar.toString()
                skipNextChar()
            }
            skipWhite()
        }
        return value
    }

    /** check for an alpha char */
    fun isAlpha(c: Char): Boolean = c.uppercaseChar() in 'A'..'Z'

    /** check for a numeric char */
    fun isNumeric(c: Char): Boolean = c in '0'..'9'

    /** check for alphanumeric */
    fun isAlphanumeric(c: Char): Boolean = isAlpha(c) || isNumeric(c) || c == '_'

    /** check for an "addop" (+,-) */
    fun isAddop(c: Char): Boolean = c == addOp || c == subOp

    /** check for a "mulop" (*,/) */
    fun isMulop(c: Char): Boolean = c == mulOp || c == divOp

    /** check for left parenthesis */
    fun isLeftParen(c: Char): Boolean = c == leftParen

    /** check for end of line */
    fun isEndOfLine(c: Char): Boolean = c == '\n' || c == '\r'

    /** check for a white space */
    fun isWhite(c: Char): Boolean = c == ' ' || c == '\t' || isEndOfLine(c)

    /** check for end of block */
    fun isEndBlock(c: Char) = blockTerminators.contains(c) || isEndProgram(c)

    /** check for end of program */
    fun isEndProgram(c: Char): Boolean = c == endProg || c == endOfInput
}

ProgParser1ControlStruct.kt

package you.set.this

/**
 * Program parsing - module 1
 * Control Structures
 */

// global vars
var labelIndx: Int = 0

/** create a unique label*/
fun newLabel(): String = "L${labelIndx++}"

/** post a label to output */
fun postLabel(label: String) = code.outputLabel(label)

////////////////////////////////////////////////////////////

/**
 * parse a program
 * <program> ::= <block> end
 */
fun parseProgram() {
    parseBlock()
    inp.match(endProg)
    println("\nEND PROGRAM")
}

/**
 * parse a block
 * <block> ::= [ <statement> ] *
 */
fun parseBlock(breakLabel: String = "") {
    while (!inp.isEndBlock(inp.nextChar)) {
        when (inp.nextChar) {
            ifToken -> parseIf(breakLabel)
            whileToken -> parseWhile()
            repeatToken -> parseRepeat()
            forToken -> parseFor()
            breakToken -> parseBreak(breakLabel)
            else -> parseOther()
        }
    }
}

/**
 * parse if statement
 * IF <condition> <block> [ ELSE <block> ] ENDIF
 */
fun parseIf(breakLabel: String) {
    inp.match()
    parseCondition()
    val label1 = newLabel()
    code.branchIfFalse(label1)
    parseBlock(breakLabel)
    if (inp.nextChar == elseToken) {
        inp.match()
        val label2 = newLabel()
        code.branch(label2)
        postLabel(label1)
        parseBlock(breakLabel)
        postLabel(label2)
    }
    else
        postLabel(label1)
    inp.match(endifToken)
}

/**
 * parse while statement
 * WHILE <condition> <block> ENDWHILE
 */
fun parseWhile() {
    inp.match()
    val label1 = newLabel()
    val label2 = newLabel()
    postLabel(label1)
    parseCondition()
    code.branchIfFalse(label2)
    parseBlock(label2)
    inp.match(endwhileToken)
    code.branch(label1)
    postLabel(label2)
}

/**
 * parse repeat statement
 * REPEAT <block> UNTIL <condition>
 */
fun parseRepeat() {
    inp.match()
    val label1 = newLabel()
    val label2 = newLabel()
    postLabel(label1)
    parseBlock(label2)
    inp.match(untilToken)
    parseCondition()
    code.branchIfFalse(label1)
    postLabel(label2)
}

/**
 * parse for statement
 * dummy version using pseudo-code
 * focuses on parsing of the structure only - not on producing code
 */
fun parseFor() {
    inp.match()
    val counterName = inp.getName()
    inp.match(equalsOp)
    code.dummyInstr("Allocate space for $counterName and set value to:")
    parseOther() // this is the FROM expression
    code.dummyInstr("Decrease $counterName by 1")
    inp.match(toToken)
    code.dummyInstr("Allocate space for TO variable and set value to:")
    parseOther() // this is the TO expression
    val label1 = newLabel()
    val label2 = newLabel()
    // actual start of the loop
    postLabel(label1)
    // increase counter and check the condition
    code.dummyInstr("Increase $counterName by 1")
    code.dummyInstr("Is $counterName <= TO?")
    code.branchIfFalse(label2)
    // execute the body of the loop
    parseBlock(label2)
    inp.match(endforToken)
    // back to the beginning of the loop
    code.branch(label1)
    // exit point of the loop
    postLabel(label2)
    code.dummyInstr("Release space held for $counterName and TO")
}

/** parse break statement */
fun parseBreak(label: String) {
    inp.match()
    if (label == "")
        abort("no loop to break of")
    else
        code.branch(label)
}

/** dummy parse condition */
fun parseCondition() {
    println("\t${inp.getName()}")
}

/** dummy parse function for anything else than control statements */
fun parseOther() {
    println("\t${inp.getName()}")
}

M68000Instructions.kt

package you.set.this

class M68000Instructions {

    /** output code */
    fun outputCode(s: String) = print("\t$s")

    /** output code with newline */
    fun outputCodeNl(s: String) = outputCode("$s\n")

    /** output a label */
    fun outputLabel(s: String) = println("$s:")

    /** set accumulator to a value */
    fun setAccumulator(value: String) = outputCodeNl("MOVE #${value},D0")

    /** clear accumulator */
    fun clearAccumulator() = outputCodeNl("CLR D0")

    /** push accumulator to the stack */
    fun saveAccumulator() = outputCodeNl("MOVE D0,-(SP)")

    /** add top of stack to accumulator */
    fun addToAccumulator() = outputCodeNl("ADD (SP)+,D0")

    /** subtract top of stack from accumulator */
    fun subFromAccumulator() {
        outputCodeNl("SUB (SP)+,D0")
        // negate accumulator as the result is the wrong way round
        outputCodeNl("NEG D0")
    }

    /** multiply accumulator by top of stack */
    fun multiplyAccumulator() = outputCodeNl("MULS (SP)+,D0")

    /** divide accumulator by top of stack */
    fun divideAccumulator() {
        outputCodeNl("MOVE (SP)+,D1")
        outputCodeNl("DIVS D1,D0")
    }

    /** set accumulator to variable */
    fun setAccumulatorToVar(identifier: String) = outputCodeNl("MOVE ${identifier}(PC),D0")

    /** call a subroutine */
    fun callSubroutine(subroutine: String) = outputCodeNl("BSR ${subroutine}")

    /** assignment var to value */
    fun assignment(identifier: String) {
        outputCodeNl("LEA $identifier(PC),A0")
        outputCodeNl("MOVE D0,(A0)")
    }

    /** branch if false */
    fun branchIfFalse(label: String) = outputCodeNl("BEQ $label")

    /** branch */
    fun branch(label: String) = outputCodeNl("BRA $label")

    /** dummy instruction */
    fun dummyInstr(cmd: String) = outputCodeNl(cmd)
}

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s