Skip to main content Link Search Menu Expand Document (external link)

Theme: auto

USCC Compiler

Introduction

USCC is the reference compiler for the University Simple C language, which is a subset of Standard C. Further information on the language itself is outlined in the USC Language Reference.

The target output language for USCC, by default, is the LLVM (version 15.x) bitcode IR.

Usage

By default, USCC will emit LLVM bitcode to a “.bc” file. So for example:

uscc quicksort.usc

Would output a file called quicksort.bc, which contains LLVM bitcode.

The generic syntax for USCC is:

uscc [OPTIONS] <input>

Note that the compiler only accepts a single input file.

And the following command-line options currently supported:

Option Description
-a
--print-ast
Output the AST to stdout once the syntax and semantic analysis is complete. By default, this will prevent the compiler from generating LLVM IR. However, if LLVM IR or generation is still desired, -b can also be specified.
-b
--bitcode
This flag is only used if you want to emit the LLVM bitcode in addition to some other output (such as the AST).
-h
--help
Prints usage instructions.
-l
--print-symbols
Output the symbol table to stdout, once the syntax and semantic analysis is complete. If used with -a, the symbols will output after the AST.
-o OUTFILE
--output OUTFILE
Specify the target output file. The default output file is the input with its last extension replaced with either “.bc” or “.s”. Note that if -b and -s are specified simultaneously, -o is ignored.
-p
--print-bc
This outputs a human-readable version of the bitcode to stdout.
-O Enables optimization passes. Can be combined with the other flags (most useful combined with -p).
-s Output assembly code for the host platform (currently only tested with x86_64 and arm64). This includes a custom graph coloring register allocation pass.

Errors

Syntax and semantic errors in source are reported by USCC in the following format:

filename:line:col: error: message
diagnostic message

USCC provides adequate diagnostics for error messages. When a parse or semantic error is detected, a caret is placed at the exact point of error.

For example, the following source file:

int main()
{
	{
		return ((1));
		{
			return "Hello world!";
			return (x);
			return;
		}
	}
}

Yields the following error messages:

test003.usc:6:11: error: Expected type int in return statement
			return "Hello world!";
			       ^
test003.usc:7:12: error: Use of undeclared identifier 'x'
			return (x);
			        ^
test003.usc:8:10: error: Invalid empty return in non-void function
			return;
			      ^
test003.usc:11:1: error: USC requires non-void functions to end with a return
}
^
4 Error(s)

Other errors such as a failure to specify an input file will yield messages with this format:

uscc: error: message

Code Overview

Here is an overview of which files correspond to the different parts of the compiler. Generally, the comments within the files are verbose, so the source files can be consulted for further information.

One thing to note is that the code is most definitely written in a C++11 dialect. So there are plenty uses of STL, shared pointers, auto, range-based for loops, and so on.

This section is broken down by subdirectory within the main uscc directory.

uscc

main.cpp – Main entry point and driver for the compiler as a whole.

ezOptionParser.hpp – External command line parsing library.

parse

Tokens.h/cpp/def – Defines all the valid tokens for the language, with some additional information used mostly to help with error messages.

usc.l – Input file for Flex. The rules in here are pretty straightforward.

Parse.h – Declares the Parser class that drives the scanner, the recursive descent parsing, and the semantic analysis. There are quite a few helper functions declared here, as well, including those that make it easy to match specific tokens (or sequences of tokens) and for generating error messages.

This file also declares all of the mutually-recursive functions used by the recursive descent parser. Generally, each function corresponds to a specific grammar rule, with some exceptions. The end result of constructing a Parser class is an AST intermediate representation.

Parse.cpp – Implements all of the Parser helper functions, as well as a few of the top-level recursive descent functions.

ParseExpr/ParseStmt.cpp – These files implement the vast majority of the recursive descent functions.

ParseExcept.h/cpp – Define and implement the exceptions that are utilized by the Parser class. Exceptions seemed like a great way to catch and recover from syntax errors during the recursive descent.

Types.h – Defines the type enum used internally by the Parser and the AST nodes.

ASTNodes.h – Defines all of the AST Nodes that are used by the AST. Since most of the code for the AST construction is fairly simple, a lot of the functions are implemented inline.

ASTNodes/ASTExpr/ASTStmt.cpp – Implements some functions for the nodes where they were complex enough to not warrant placing inline.

ASTPrint.cpp – Implements the printNode member function for every node in the AST, which is used by the -a command line option.

ASTEmit.cpp – This is where most of the code for the LLVM IR generation is contained. There is one function for every AST node, and it constructs all of the basic blocks and instructions within the hybrid CFG uses by LLVM.

Symbols.h/cpp – These files define and implement the Identifier, SymbolTable, and StringTable classes. In the .cpp, there is some code related to generating the LLVM IR. When generating the code for a function, all of the variables declared within the scope of the function (regardless of nesting) have their storage allocated via the emitIR function defined inside ScopeTable. There is similarly some code for generating global string constants in StringTable.

Emitter.h/cpp – These files facilitate the other main actions of USCC. It kicks off the LLVM IR construction from the root AST node. The functions for printing, writing to a file, and verifying the LLVM IR is also contained in here.

opt

SSABuilder.h/cpp – These files implement Static Single Assignment, and is implemented using the algorithm outlined in “Simple and Efficient Construction of SSA Form” (Braun et. al.)

Passes.h/cpp and ConstantOps/ConstantBranch/DeadBlocks/LICM.cpp – These define the optimization passes. At the moment, there is constant propagation, constant branch folding, dead block removal, and loop invariant code motion.

RegAlloc.cpp - Implements a graph coloring register allocator

tests

This directory contains the multitude of test cases. To execute the test suites, run:

python3 [testsuite]

The test suites are:

testParse.py – These test basic AST generation without semantic analysis. However, since there is no way to disable semantic analysis, the vast majority of these test cases will fail in the reference compiler. These test cases are more useful when the parser is actively being developed.

testSemant.py – These extensively test the semantic analysis. Many of these tests look at expected errors, though some also look at expected AST/symbol table generation from code that passes semantic analysis.

testEmit.py – These test the emitted LLVM bitcode by leveraging the lli tool. Each USC file ran by this test suite has its output compared against the expected.

testOpt.py – Like testEmit.py, except it runs with optimizations enabled.

testAsm.py - Like testEmit.py, but also assembles to the host assembly architecture, then generates a native executable, and verify the output of the native executable is correct.

Sample Output

Here is the output that is generated by USCC for quicksort.usc (the source for quicksort.usc is listed at the end of the USC Language Reference).

AST and Symbol table

Command Line:

uscc -a -l quicksort.usc

Output:

Program:
---Function: int partition
------ArgDecl: char[] array
------ArgDecl: int left
------ArgDecl: int right
------ArgDecl: int pivotIdx
------CompoundStmt:
---------Decl: char pivotVal
------------ArrayExpr: 
---------------ArraySub: array
------------------IdentExpr: pivotIdx
---------Decl: int storeIdx
------------IdentExpr: left
---------Decl: int i
------------IdentExpr: left
---------Decl: char temp
---------AssignStmt: temp
------------ArrayExpr: 
---------------ArraySub: array
------------------IdentExpr: pivotIdx
---------AssignArrayStmt:
------------ArraySub: array
---------------IdentExpr: pivotIdx
------------ArrayExpr: 
---------------ArraySub: array
------------------IdentExpr: right
---------AssignArrayStmt:
------------ArraySub: array
---------------IdentExpr: right
------------IdentExpr: temp
---------WhileStmt
------------BinaryCmp <:
---------------IdentExpr: i
---------------IdentExpr: right
------------CompoundStmt:
---------------IfStmt: 
------------------BinaryCmp <:
---------------------ToIntExpr: 
------------------------ArrayExpr: 
---------------------------ArraySub: array
------------------------------IdentExpr: i
---------------------ToIntExpr: 
------------------------IdentExpr: pivotVal
------------------CompoundStmt:
---------------------AssignStmt: temp
------------------------ArrayExpr: 
---------------------------ArraySub: array
------------------------------IdentExpr: i
---------------------AssignArrayStmt:
------------------------ArraySub: array
---------------------------IdentExpr: i
------------------------ArrayExpr: 
---------------------------ArraySub: array
------------------------------IdentExpr: storeIdx
---------------------AssignArrayStmt:
------------------------ArraySub: array
---------------------------IdentExpr: storeIdx
------------------------IdentExpr: temp
---------------------ExprStmt
------------------------IncExpr: storeIdx
---------------ExprStmt
------------------IncExpr: i
---------AssignStmt: temp
------------ArrayExpr: 
---------------ArraySub: array
------------------IdentExpr: storeIdx
---------AssignArrayStmt:
------------ArraySub: array
---------------IdentExpr: storeIdx
------------ArrayExpr: 
---------------ArraySub: array
------------------IdentExpr: right
---------AssignArrayStmt:
------------ArraySub: array
---------------IdentExpr: right
------------IdentExpr: temp
---------ReturnStmt:
------------IdentExpr: storeIdx
---Function: void quicksort
------ArgDecl: char[] array
------ArgDecl: int left
------ArgDecl: int right
------CompoundStmt:
---------Decl: int pivotIdx
---------IfStmt: 
------------BinaryCmp <:
---------------IdentExpr: left
---------------IdentExpr: right
------------CompoundStmt:
---------------AssignStmt: pivotIdx
------------------BinaryMath +:
---------------------IdentExpr: left
---------------------BinaryMath /:
------------------------BinaryMath -:
---------------------------IdentExpr: right
---------------------------IdentExpr: left
------------------------ConstantExpr: 2
---------------AssignStmt: pivotIdx
------------------FuncExpr: partition
---------------------IdentExpr: array
---------------------IdentExpr: left
---------------------IdentExpr: right
---------------------IdentExpr: pivotIdx
---------------ExprStmt
------------------FuncExpr: quicksort
---------------------IdentExpr: array
---------------------IdentExpr: left
---------------------BinaryMath -:
------------------------IdentExpr: pivotIdx
------------------------ConstantExpr: 1
---------------ExprStmt
------------------FuncExpr: quicksort
---------------------IdentExpr: array
---------------------BinaryMath +:
------------------------IdentExpr: pivotIdx
------------------------ConstantExpr: 1
---------------------IdentExpr: right
---------ReturnStmt: (empty)
---Function: int main
------CompoundStmt:
---------Decl: char[36] letters
------------StringExpr: thequickbrownfoxjumpsoverthelazydog
---------ExprStmt
------------FuncExpr: quicksort
---------------IdentExpr: letters
---------------ConstantExpr: 0
---------------ConstantExpr: 34
---------ExprStmt
------------FuncExpr: printf
---------------StringExpr: %s

---------------IdentExpr: letters
---------ReturnStmt:
------------ConstantExpr: 0
Symbols:
function main
function partition
function printf
function quicksort
---char[] array
---int i
---int left
---int pivotIdx
---char pivotVal
---int right
---int storeIdx
---char temp
---char[] array
---int left
---int pivotIdx
---int right
---char[] letters

LLVM Bitcode

Command Line:

uscc -p quicksort.usc

Output:

; ModuleID = 'main'
source_filename = "main"

@.str = private local_unnamed_addr constant [4 x i8] c"%s\0A\00"
@.str.1 = private local_unnamed_addr constant [36 x i8] c"thequickbrownfoxjumpsoverthelazydog\00"

declare i32 @printf(i8*, ...)

define i32 @partition(i8* %array, i32 %left, i32 %right, i32 %pivotIdx) {
entry:
  %0 = getelementptr inbounds i8, i8* %array, i32 %pivotIdx
  %1 = load i8, i8* %0, align 1
  %2 = getelementptr inbounds i8, i8* %array, i32 %pivotIdx
  %3 = load i8, i8* %2, align 1
  %4 = getelementptr inbounds i8, i8* %array, i32 %right
  %5 = load i8, i8* %4, align 1
  %6 = getelementptr inbounds i8, i8* %array, i32 %pivotIdx
  store i8 %5, i8* %6, align 1
  %7 = getelementptr inbounds i8, i8* %array, i32 %right
  store i8 %3, i8* %7, align 1
  br label %while.cond

while.cond:                                       ; preds = %if.end, %entry
  %storeIdx = phi i32 [ %storeIdx11, %if.end ], [ %left, %entry ]
  %i = phi i32 [ %inc7, %if.end ], [ %left, %entry ]
  %cmp = icmp slt i32 %i, %right
  %8 = zext i1 %cmp to i32
  %tobool = icmp ne i32 %8, 0
  br i1 %tobool, label %while.body, label %while.end

while.body:                                       ; preds = %while.cond
  %9 = getelementptr inbounds i8, i8* %array, i32 %i
  %10 = load i8, i8* %9, align 1
  %conv = sext i8 %10 to i32
  %conv3 = sext i8 %1 to i32
  %cmp4 = icmp slt i32 %conv, %conv3
  %11 = zext i1 %cmp4 to i32
  %tobool5 = icmp ne i32 %11, 0
  br i1 %tobool5, label %if.then, label %if.end

while.end:                                        ; preds = %while.cond
  %12 = getelementptr inbounds i8, i8* %array, i32 %storeIdx
  %13 = load i8, i8* %12, align 1
  %14 = getelementptr inbounds i8, i8* %array, i32 %right
  %15 = load i8, i8* %14, align 1
  %16 = getelementptr inbounds i8, i8* %array, i32 %storeIdx
  store i8 %15, i8* %16, align 1
  %17 = getelementptr inbounds i8, i8* %array, i32 %right
  store i8 %13, i8* %17, align 1
  ret i32 %storeIdx

if.then:                                          ; preds = %while.body
  %18 = getelementptr inbounds i8, i8* %array, i32 %i
  %19 = load i8, i8* %18, align 1
  %20 = getelementptr inbounds i8, i8* %array, i32 %storeIdx
  %21 = load i8, i8* %20, align 1
  %22 = getelementptr inbounds i8, i8* %array, i32 %i
  store i8 %21, i8* %22, align 1
  %23 = getelementptr inbounds i8, i8* %array, i32 %storeIdx
  store i8 %19, i8* %23, align 1
  %inc = add i32 %storeIdx, 1
  br label %if.end

if.end:                                           ; preds = %if.then, %while.body
  %storeIdx11 = phi i32 [ %inc, %if.then ], [ %storeIdx, %while.body ]
  %inc7 = add i32 %i, 1
  br label %while.cond
}

define void @quicksort(i8* %array, i32 %left, i32 %right) {
entry:
  %cmp = icmp slt i32 %left, %right
  %0 = zext i1 %cmp to i32
  %tobool = icmp ne i32 %0, 0
  br i1 %tobool, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  %sub = sub i32 %right, %left
  %div = sdiv i32 %sub, 2
  %add = add i32 %left, %div
  %1 = getelementptr inbounds i8, i8* %array, i32 0
  %call = call i32 @partition(i8* %1, i32 %left, i32 %right, i32 %add)
  %2 = getelementptr inbounds i8, i8* %array, i32 0
  %sub1 = sub i32 %call, 1
  call void @quicksort(i8* %2, i32 %left, i32 %sub1)
  %3 = getelementptr inbounds i8, i8* %array, i32 0
  %add2 = add i32 %call, 1
  call void @quicksort(i8* %3, i32 %add2, i32 %right)
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  ret void
}

define i32 @main() {
entry:
  %letters = alloca [36 x i8], align 8
  %0 = getelementptr inbounds [36 x i8], [36 x i8]* %letters, i32 0, i32 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* getelementptr inbounds ([36 x i8], [36 x i8]* @.str.1, i32 0, i32 0), i64 36, i1 true)
  call void @quicksort(i8* %0, i32 0, i32 34)
  %1 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i32 0, i32 0), i8* %0)
  ret i32 0
}

; Function Attrs: argmemonly nocallback nofree nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nocallback nofree nounwind willreturn }