Compiler Introduction

Compiler Definition

A compiler is a program that reads a program written in one programming language (the source language) and translates it into an equivalent program in another programming language (the target language).


Compiler

  • Input: source program
  • Source Code Analysis (front end)
  • Intermediate Code Generator 中间代码发生器
  • Synthesis (back end) Output: target program

Analysis Front end

Analysis comes in three phases:

  • Lexical analysis
  • Syntax analysis
  • Semantic analysis

Analysis

Lexical Analysis 词法分析

The program does lexical analysis is called lexer or scanner.

  • Reads the input program character by character as a stream.
  • Splits the stream into lexemes, means “lexical elements”.
  • Classifies each lexeme into a category of lexemes, called token.

We use regular expressions to do lexical analysis.

Example

num=1+23;
No. Lexeme Token
1 num identifier
2 = ass_opt
3 1 int_literal
4 + plus_opt
5 23 int_literal
6 ; semicolon

Tokens

Here is a list of common tokens:

  • identifier: names the programmer chooses, like variable names; 变量名
  • keywords: names already in the programming language, like if;关键字
  • separator: punctuation characters and paired-delimiters, like {};
  • operator: symbols operate on arguments and produce results, like +;
  • literal: numeric, logical, textual, reference literals, like true, 123;字面量
  • comment

Syntax Analysis 句法分析

The program which does syntax analysis is called a parser.

  • Uses a grammar to analyze the form of tokens.
  • And groups the tokens into a nested hierarchical structure.
    • The structure is called a parse tree.
  • The parse tree shows the structure of the program.
  • Internal vertices are called non-terminals. Leaves are called terminals.

Same as using English grammar to check if the nouns, verbs, adjectives, etc., in a sentence are in the correct order.

Sometimes the parse tree can be extremely large.

  • To simplify the structure, we can remove the internal vertices and let the operators be the parents.
  • This is called syntax tree.

Semantic Analysis语义分析

  1. It takes a parse tree as input.
  2. Checks the “meaning” of the program.
  3. Searching errors in the program.
    • For example,
    • undefined variables,
    • uninitialized variables,
    • multiple declaration,
    • types of variables for operations and functions, etc.

Semantic Analysis also does type conversions for many programming languages.

float x=3.5;
int y=x;

the compiler adds a type conversion on before signing the value to .

ERRORS

Intermediate Code Generator

Three-address Code

Synthesis (Back End)