Based on the original series “Let’s Build a Compiler!” by Jack Crenshaw.
Introduction
This is the second part of the Control Structures chapter. Without further ado, we will go straight to the REPEAT statement.
REPEAT Statement
Having done WHILE, it should be very easy to process the REPEAT statement. The syntax of REPEAT in our language will be:
REPEAT <block> UNTIL <condition>
The result of this statement will be again a loop like while, with the only difference that the condition is executed at the end of each iteration. In syntax-directed translation:
REPEAT { L = NewLabel PostLabel(L) } <block> UNTIL <condition> { BranchIfFalse(L) }
And the corresponding parse function can be written very easily. We will use 'r'
for REPEAT and 'u'
for UNTIL:
/** * parse repeat statement * REPEAT <block> UNTIL <condition> */ fun parseRepeat() { inp.getNextChar() val label = newLabel() postLabel(label) parseBlock() inp.match(untilToken) parseCondition() code.branchIfFalse(label) }
We also need to define our ‘repeat’ and ‘until’ tokens:
val repeatToken = 'r' val untilToken = 'u'
The function isEndBlock
is now becoming a long list of comparisons which is a bit messy. Why not use a Kotlin Set
object to define all the possible block terminators:
// the set of tokens that are block terminators private val blockTerminators = setOf(endifToken, elseToken, endwhileToken, untilToken)
And with this, isEndBlock
now looks much better:
/** check for end of block */
fun isEndBlock(c: Char) = blockTerminators.contains(c) || isEndProgram(c)
Finally, need to change parseBlock
and tell it to recognise the repeat statement:
/** * parse a block * <block> ::= [ <statement> ] * end */ fun parseBlock() { while (!inp.isEndBlock(inp.nextChar)) { when (inp.nextChar) { ifToken -> parseIf() whileToken -> parseWhile() repeatToken -> parseRepeat() else -> parseOther() } } }
Compile it, verify it and make sure it works.
BREAK Statement
This is a quite useful statement that can simplify a program a lot. The challenge with it is when you have nested loops, which of them do you break out of? The answer is of course the inner-most loop, so the parser for break statement needs to know the exit address of the loop it has been called from, in order to produce the correct assembly branch code.
In our implementation, each loop iteration is processed by parseBlock
, which also processes the IF block, which is where the break
statement is most likely to appear. Based on this train of thought, it sounds like we would need the following:
- when there is an IF command in a loop (
parseWhile
orparseRepeat
), the call toparseIf
must pass the exit label for that loop parseIf
must also pass that label to both of its blocks (if and else) as this is where the BREAK would be (if any)- remember that
parseIf
will be called byparseBlock
, soparseBlock
must receive the exit label when it is called byparseWhile
orparseRepeat
in order to pass it down toparseIf
You can see the recursion here, where parseBlock
calls parseWhile
, which calls parseBlock
, which calls parseIf
, which calls parseBlock
, and over again, and at some point, one of those parseBlock
instances will call parseBreak
. The stack in Kotlin (that we are getting for free) ensures that the value of the exit label parameter that is being passed down the chain is always the most recent.
This may sound a bit complicated but it’s a really clever yet simple solution from Jack, and here’s the code for it:
/** parse break statement */ fun parseBreak(label: String) { inp.getNextChar() if (label == "") abort("no loop to break of") else code.branch(label) }
We need to define our break token:
val breakToken = 'b'
And of course tell parseBlock to deal with BREAK as well:
/** * parse a block * <block> ::= [ <statement> ] * */ fun parseBlock(breakLabel: String = "") { while (!inp.isEndBlock(inp.nextChar)) { when (inp.nextChar) { ifToken -> parseIf(breakLabel) whileToken -> parseWhile() repeatToken -> parseRepeat() breakToken -> parseBreak(breakLabel) else -> parseOther() } } }
This is the modified code for ifParse
that passes the break Label it receives from the caller, to the two blocks further down:
/** * parse if statement * IF <condition> <block> [ ELSE <block> ] ENDIF */ fun parseIf(breakLabel: String) { inp.getNextChar() parseCondition() val label1 = newLabel() code.branchIfFalse(label1) parseBlock(breakLabel) if (inp.nextChar == elseToken) { inp.getNextChar() val label2 = newLabel() code.branch(label2) postLabel(label1) parseBlock(breakLabel) postLabel(label2) } else postLabel(label1) inp.match(endifToken) }
The version of parseWhile
that passes the break Label is below:
/** * parse while statement * WHILE <condition> <block> ENDWHILE */ fun parseWhile() { inp.getNextChar() val label1 = newLabel() val label2 = newLabel() postLabel(label1) parseCondition() code.branchIfFalse(label2) parseBlock(label2) inp.match(endwhileToken) code.branch(label1) postLabel(label2) }
And the new version of parseRepeat
is:
/** * parse repeat statement * REPEAT <block> UNTIL <condition> */ fun parseRepeat() { inp.getNextChar() val label1 = newLabel() val label2 = newLabel() postLabel(label1) parseBlock(label2) inp.match(untilToken) parseCondition() code.branchIfFalse(label1) postLabel(label2) }
As usual, compile it, verify it and try to break it (pun intended and copied from the original).
FOR Statement
I’ve left the FOR loop last for a reason. Jack in his original chapter 5, explained how complex it is to produce M68000 assembly code to process a FOR loop while having the value of the counter accessible within the loop. In fact the simplest he could do was 14 lines of assembly code just for the FOR loop, and that’s without counting the code to calculate the <from> and <to> expressions and without the code that is executed inside the loop. This is 2x or 3x the size of the while
or repeat
code and this is also reflected in the parser function.
I will take a different approach. I will analyse and process the FOR statement, but given that my goal is not to learn or to teach M68000 assembly, I will just produce dummy pseudo-code just to demonstrate the concept as Jack analysed it. When we get to convert our compiler to produce TI-59 code, I will make it produce real TI-59 code for the FOR loop.
So, the FOR in our language is:
FOR <variable> = <expression1> TO <expression2> <block> ENDFOR
This is equivalent to our familiar WHILE loop:
<variable> = <expression1> TEMP = <expression2> WHILE <variable> <= TEMP <block> ++<variable> ENDWHILE
Please note, the assumption here is that the step is always +1 and the value of expression2
is calculated only once, before the loop starts.
I will translate this into code, with one difference. I will increase the value of variable
just before the condition is evaluated, instead of increasing it at the end of the iteration. This means that I have to decrease it by 1 just after I set it to expression1
in the beginning, to get the first time right.
/** * parse for statement * dummy version using pseudo-code * focuses on parsing of the structure only - not on producing code */ fun parseFor() { inp.getNextChar() val counterName = inp.getName() inp.match(equalsOp) code.dummyInstr("Allocate space for $counterName and set value to:") parseOther() // this is the FROM expression code.dummyInstr("Decrease $counterName by 1") inp.match(toToken) code.dummyInstr("Allocate space for TO variable and set value to:") parseOther() // this is the TO expression val label1 = newLabel() val label2 = newLabel() // actual start of the loop postLabel(label1) // increase counter and check the condition code.dummyInstr("Increase $counterName by 1") code.dummyInstr("Is $counterName <= TO?") code.branchIfFalse(label2) // execute the body of the loop parseBlock(label2) inp.match(endforToken) // back to the beginning of the loop code.branch(label1) // exit point of the loop postLabel(label2) code.dummyInstr("Release space held for $counterName and TO") }
I have also added a dummy instruction to our M68000Instructions.kt
:
/** dummy instruction */
fun dummyInstr(cmd: String) = outputCodeNl(cmd)
We need to define the tokens for FOR, TO and ENDFOR in our global definitions and add ENDFOR to the list of block terminators:
val forToken = 'f' val toToken = 't' val endforToken = 'e' // the set of tokens that are block terminators private val blockTerminators = setOf(endifToken, elseToken, endwhileToken, untilToken, endforToken)
And of course we need to recognise the FOR token in parseBlock:
/** * parse a block * <block> ::= [ <statement> ] * */ fun parseBlock(breakLabel: String = "") { while (!inp.isEndBlock(inp.nextChar)) { when (inp.nextChar) { ifToken -> parseIf(breakLabel) whileToken -> parseWhile() repeatToken -> parseRepeat() forToken -> parseFor() breakToken -> parseBreak(breakLabel) else -> parseOther() } } }
Compile it and try it with an input program like:
f counter = expr1 t expr2 xloop_block e .
If you feel adventurous you can add a STEP token as well (but don’t forget that this logic assumes a positive step as the exit condition checks for Counter <= TO
). Also, please remember, for the block inside the FOR loop use for now a letter that has not been used as a token before (e.g. the word block
will be interpreted as break
– but as discussed this will be dealt with in chapter 7 where we will have multi-character tokens).
At this point I would like to make one more little improvement: in the various parser functions I have used getNextChar
to advance to the next token (in our case single-character token) where Jack was using match
. He did that for consistency and used match
everywhere, even when he did not have to actually match the token as it had already been matched immediately above. In these cases I have used getNextChar
in order to avoid the unnecessary call to match
that would cause confusion to the reader.
However I really appreciate the importance of consistent code and I’d really like to use match
everywhere – but only when a match is needed. So how do I reconcile the two? I have changed slightly function match
to (a) match the given token and advance to the next character or abort if not matched and (b) just advance to the next character if no token is passed as a parameter. This can be easily achieved in Kotlin by using the default value for the parameter to the function: when there is no parameter provided by the caller, the value of the parameter in the function will be the default value I have defined (null character(0)) and this can be used as our wildcard.
/** * match a specific input char * and advance to next character * also produce a match if called with no input or if input is null character */ fun match(x: Char = nullChar.toChar()) { // return a match when the next char matches x // or when x is null character if (nextChar == x || x == nullChar.toChar()) getNextChar() else expected("'$x'") }
With this new version of match, I can now use inp.match(endforToken)
where I need to match a token and advance to the next one, or inp.match()
where I just need to skip the token and advance to the next one (the match has already happened before this point). Please replace all the calls to getNextChar
in the parser functions with match
.
And that’s it for this chapter.
Coming up next: Boolean Expressions
Below is the full code for this version of our Compiler. As you can see I have gathered all the parser functions in a new file called ProgParser1ControlStruct.kt
to separate them from the numerical expressions parser functions:
CompilerMain.kt
package you.set.this import kotlin.system.exitProcess /** * A Simple Compiler * Based on the Let's Build a Compiler! series by Jack Crenshaw * This version produces Assembly language code for the 68000 microprocessor * A future version will produce code for the TI-59 Calculator * It will produce output that uses the TI-59 mnemonics that can be keyed directly to the calculator * Version 1.0 01.10.2021 */ // the input program scanner lateinit var inp: InputProgramScanner // the Motorola 68000 instruction set val code = M68000Instructions() /** report an error */ fun error(errMsg: String) { System.err.println("Error: $errMsg") } /** abort compilation */ fun abort(errMsg: String) { error(errMsg) exitProcess(1) } /** print message and exit */ fun exit(msg: String) { System.err.println(msg) exitProcess(0) } /** report what was expected and abort */ fun expected(expMsg: String) { // setup a meaningful message if we have reached end of input val foundStr: String = if (inp.nextChar==endOfInput) "end of input" else inp.nextChar.toString() abort("Expected $expMsg \n found [${foundStr}]") } /** compiler initialisation */ fun initCompiler(args: Array<String>) { if (args.isEmpty()) abort("no input file specified") // get a new input program scanner object - initialise from input file inp = InputProgramScanner(args[0]) } /** main function */ fun main(args: Array<String>) { initCompiler(args) parseProgram() }
InputProgramScanner.kt
package you.set.this import java.io.File /** * The input program scanner class * Performs the lexical scanner functions * Processes the char-by-char input and returns the tokens from the input stream */ ///////////////////////////////////////////////////////// // global special characters definitions ///////////////////////////////////////////////////////// // numeric operators val addOp: Char = '+' val subOp: Char = '-' val mulOp: Char = '*' val divOp: Char = '/' // parentheses val leftParen: Char = '(' val rightParen: Char = ')' // equals val equalsOp = '=' // tokens val endProg = '.' val ifToken = 'i' val endifToken = 'e' val elseToken = 'l' val whileToken = 'w' val endwhileToken = 'e' val repeatToken = 'r' val untilToken = 'u' val forToken = 'f' val toToken = 't' val endforToken = 'e' val breakToken = 'b' // end of input mark val nullChar = 0 val endOfInput = nullChar.toChar() ///////////////////////////////////////////////////////// /** this class implements the lexical scanner */ class InputProgramScanner(inputFile: String = "") { // the input program as string var inputProgram: String = "" var indx = 0 // the next character from input // this is our lookahead character var nextChar: Char = ' ' // the set of tokens that are block terminators private val blockTerminators = setOf(endifToken, elseToken, endwhileToken, untilToken, endforToken) /** initialisation code class InputProgramScanner */ init { try { val f = File(inputFile) // read the whole program into a string inputProgram = f.readText() // set the lookahead character to the first input char getNextChar() } catch (e: Exception) { abort("could not open file [$inputFile]") } } /** get the next character from input after skipping white spaces */ fun getNextChar() { skipNextChar() skipWhite() } /** set the lookahead character to the next char from input */ fun skipNextChar() { if (indx < inputProgram.length) nextChar = inputProgram[indx++] else nextChar = endOfInput } /** skip white spaces */ fun skipWhite() { while (isWhite(nextChar)) skipNextChar() } /** * match a specific input char * and advance to next character * also produce a match if called with no input or if input is null character */ fun match(x: Char = nullChar.toChar()) { // return a match when the next char matches x // or when x is null character if (nextChar == x || x == nullChar.toChar()) getNextChar() else expected("'$x'") } /** * get an identifier * <identifier> ::= <alpha> [ <alphanumeric> | <_> ] * */ fun getName(): String { var token = "" if (!isAlpha(nextChar)) expected("Identifier") else { while (isAlphanumeric(nextChar)) { token += nextChar.uppercase() skipNextChar() } skipWhite() } return token } /** * get a number * <number> ::= [ <digit> ] + */ fun getNumber(): String { var value = "" if (!isNumeric(nextChar)) { expected("Number") } else { while (isNumeric(nextChar)) { value += nextChar.toString() skipNextChar() } skipWhite() } return value } /** check for an alpha char */ fun isAlpha(c: Char): Boolean = c.uppercaseChar() in 'A'..'Z' /** check for a numeric char */ fun isNumeric(c: Char): Boolean = c in '0'..'9' /** check for alphanumeric */ fun isAlphanumeric(c: Char): Boolean = isAlpha(c) || isNumeric(c) || c == '_' /** check for an "addop" (+,-) */ fun isAddop(c: Char): Boolean = c == addOp || c == subOp /** check for a "mulop" (*,/) */ fun isMulop(c: Char): Boolean = c == mulOp || c == divOp /** check for left parenthesis */ fun isLeftParen(c: Char): Boolean = c == leftParen /** check for end of line */ fun isEndOfLine(c: Char): Boolean = c == '\n' || c == '\r' /** check for a white space */ fun isWhite(c: Char): Boolean = c == ' ' || c == '\t' || isEndOfLine(c) /** check for end of block */ fun isEndBlock(c: Char) = blockTerminators.contains(c) || isEndProgram(c) /** check for end of program */ fun isEndProgram(c: Char): Boolean = c == endProg || c == endOfInput }
ProgParser1ControlStruct.kt
package you.set.this /** * Program parsing - module 1 * Control Structures */ // global vars var labelIndx: Int = 0 /** create a unique label*/ fun newLabel(): String = "L${labelIndx++}" /** post a label to output */ fun postLabel(label: String) = code.outputLabel(label) //////////////////////////////////////////////////////////// /** * parse a program * <program> ::= <block> end */ fun parseProgram() { parseBlock() inp.match(endProg) println("\nEND PROGRAM") } /** * parse a block * <block> ::= [ <statement> ] * */ fun parseBlock(breakLabel: String = "") { while (!inp.isEndBlock(inp.nextChar)) { when (inp.nextChar) { ifToken -> parseIf(breakLabel) whileToken -> parseWhile() repeatToken -> parseRepeat() forToken -> parseFor() breakToken -> parseBreak(breakLabel) else -> parseOther() } } } /** * parse if statement * IF <condition> <block> [ ELSE <block> ] ENDIF */ fun parseIf(breakLabel: String) { inp.match() parseCondition() val label1 = newLabel() code.branchIfFalse(label1) parseBlock(breakLabel) if (inp.nextChar == elseToken) { inp.match() val label2 = newLabel() code.branch(label2) postLabel(label1) parseBlock(breakLabel) postLabel(label2) } else postLabel(label1) inp.match(endifToken) } /** * parse while statement * WHILE <condition> <block> ENDWHILE */ fun parseWhile() { inp.match() val label1 = newLabel() val label2 = newLabel() postLabel(label1) parseCondition() code.branchIfFalse(label2) parseBlock(label2) inp.match(endwhileToken) code.branch(label1) postLabel(label2) } /** * parse repeat statement * REPEAT <block> UNTIL <condition> */ fun parseRepeat() { inp.match() val label1 = newLabel() val label2 = newLabel() postLabel(label1) parseBlock(label2) inp.match(untilToken) parseCondition() code.branchIfFalse(label1) postLabel(label2) } /** * parse for statement * dummy version using pseudo-code * focuses on parsing of the structure only - not on producing code */ fun parseFor() { inp.match() val counterName = inp.getName() inp.match(equalsOp) code.dummyInstr("Allocate space for $counterName and set value to:") parseOther() // this is the FROM expression code.dummyInstr("Decrease $counterName by 1") inp.match(toToken) code.dummyInstr("Allocate space for TO variable and set value to:") parseOther() // this is the TO expression val label1 = newLabel() val label2 = newLabel() // actual start of the loop postLabel(label1) // increase counter and check the condition code.dummyInstr("Increase $counterName by 1") code.dummyInstr("Is $counterName <= TO?") code.branchIfFalse(label2) // execute the body of the loop parseBlock(label2) inp.match(endforToken) // back to the beginning of the loop code.branch(label1) // exit point of the loop postLabel(label2) code.dummyInstr("Release space held for $counterName and TO") } /** parse break statement */ fun parseBreak(label: String) { inp.match() if (label == "") abort("no loop to break of") else code.branch(label) } /** dummy parse condition */ fun parseCondition() { println("\t${inp.getName()}") } /** dummy parse function for anything else than control statements */ fun parseOther() { println("\t${inp.getName()}") }
M68000Instructions.kt
package you.set.this class M68000Instructions { /** output code */ fun outputCode(s: String) = print("\t$s") /** output code with newline */ fun outputCodeNl(s: String) = outputCode("$s\n") /** output a label */ fun outputLabel(s: String) = println("$s:") /** set accumulator to a value */ fun setAccumulator(value: String) = outputCodeNl("MOVE #${value},D0") /** clear accumulator */ fun clearAccumulator() = outputCodeNl("CLR D0") /** push accumulator to the stack */ fun saveAccumulator() = outputCodeNl("MOVE D0,-(SP)") /** add top of stack to accumulator */ fun addToAccumulator() = outputCodeNl("ADD (SP)+,D0") /** subtract top of stack from accumulator */ fun subFromAccumulator() { outputCodeNl("SUB (SP)+,D0") // negate accumulator as the result is the wrong way round outputCodeNl("NEG D0") } /** multiply accumulator by top of stack */ fun multiplyAccumulator() = outputCodeNl("MULS (SP)+,D0") /** divide accumulator by top of stack */ fun divideAccumulator() { outputCodeNl("MOVE (SP)+,D1") outputCodeNl("DIVS D1,D0") } /** set accumulator to variable */ fun setAccumulatorToVar(identifier: String) = outputCodeNl("MOVE ${identifier}(PC),D0") /** call a subroutine */ fun callSubroutine(subroutine: String) = outputCodeNl("BSR ${subroutine}") /** assignment var to value */ fun assignment(identifier: String) { outputCodeNl("LEA $identifier(PC),A0") outputCodeNl("MOVE D0,(A0)") } /** branch if false */ fun branchIfFalse(label: String) = outputCodeNl("BEQ $label") /** branch */ fun branch(label: String) = outputCodeNl("BRA $label") /** dummy instruction */ fun dummyInstr(cmd: String) = outputCodeNl(cmd) }