date: 2024-11-20
title: Compiler Semantic Analysis
status: DONE
author:
- AllenYGY
tags:
- NOTE
- Compiler
- SemanticAnalysis
publish: true
Compiler Semantic Analysis
After syntax analysis, the next phase is semantic analysis and intermediate code generation.
The two steps are all based on syntax-directed translation, which is procedure to compute the attribute of each grammar symbol in a syntax-tree.
To compute the attributes, we use attribute computation functions, also called semantic rules.
Each grammar production rule has zero or multiple semantic rules to compute the attributes.
The attributes can record a lot of information that a compiler wants to know, for example the values of a math expression, the type of a variable, etc.
Two different ways to specify the semantic rules in syntax directed translation:
Syntax-directed definitions are more flexible and easier to understand.
In syntax-directed definitions, attributes are of two types:
合成属性 自底向上计算属性
继承属性 自顶向下,从左至右或从右至左计算属性
For a grammar production
语义规则的形式 :这里的
The attribute of
换句话说,在一棵语法树中,父节点的属性仅取决于其子节点的属性。
The symbol
这说明 综合属性是自下而上的计算过程,因为必须先计算子节点的属性,才能计算父节点的属性。
In other words, in a parse tree, the attribute
综合属性遵循 自下而上(bottom-up) 的计算模式,这与语法分析中语法树的构建过程密切相关。例如,在语法分析中,叶子节点的值(通常是终结符的属性)被逐步向上传递,最终计算根节点的属性。
On the other hand, an inherited attribute is computed by
The attribute
Different from synthesized attributes, which is a grammar symbol on the LHS.
In a parse tree,
The order to compute the attributes of all siblings is unclear.
The computation of synthesized attributes is in fact a recursion.
For every grammar symbol
If
Following the recursion,
If
Thus, synthesized attributes for tokens are called intrinsic.
Meaning that they are not computed by any semantic rule.
In both the synthesized and inherited cases, the attribute computation function
If all semantic rules in syntax-directed definitions are functions, then the syntax-directed definitions and the corresponding grammar productions together form an attribute grammar.
Which is a special case of syntax-directed definition.
语法制导定义(Syntax-Directed Definitions, SDD):
属性文法(Attribute Grammar, AG):
In semantic rules, if all the attributed are synthesized, then the syntax-directed definitions are called
换句话说,𝑆-attributed定义是指向语法的定义,其所有属性都可以在解析树中自下而上地计算。
Here is an example of
Note:
Think about a sequence of variable definitions like int x,y,z
.
The production rules can be easily defined as
And the compiler wants to check the type for each variable.
Synthesized attributes do not work in this example.
The attributes for the tokens derived by
Thus, inherited attributes are used here.
Note that
The semantic rule 4 does two things: pass the attribute
Thus, the above syntax-directed definitions are
Syntax trees are tractive in implementations because
they are much smaller than parse trees, and they only depends on the source program, but not the grammars.
Many compilers use syntax tree to represent the intermediate code instead of three-address code.
Given the following grammar and the corresponding
The attributes in the above cases are intrinsic.
We also use a synthesized
For
Remember synthesized attributes are computed in the bottom-up order.
We do not need an individual phase for semantic analysis.
The attributes are computed at the same time with parsing.
For
Remember that inherited attributes have to be computed top-down, left-right, or right-left.
Infeasible to be done at the same time as parsing.
Therefore, it needs another phase for
Sometimes, the computation can be very chaotic when
That’s why people usually try to avoid using inherited attributes.
But sometimes the inherited attributes are unavoidable.
Fortunately, there is a one case that the computation is tractable, when all attributes can be computed in a single depth-first traversal.
DFS(A){
Evaluate the inherited attributes of A
For each child $X_i$ of $A$ (from left to right) do {
DFS($X_i$)
}
Evaluate the synthesized attributes of A
}
The depth-first traversal can also be presented in a parse tree.
If all synthesized and inherited attributes of the syntax-directed definitions can be computed in a single depth-first traversal, the definitions are called
Note that all
Semantic Analysis is the last phase in the front-end.
This phase commonly checks
This course only focus on type checking using synthesized attributes.
Thus, this can be done by a simple bottom-up computation together with parsing.
To keep our life easier, we only stick with simple programming languages, like C.
We won’t discuss advanced features like classes or polymorphism in OOPs.
Type check these advanced features cannot be done by easily using attributes.
The goal of type checking is to compute a type for each expression and statement in a program.
And check that all these types are coherent.
A type is a syntactic description of a set of possible runtime values.
"Syntactic" means that a type is only a syntax, given by the source program.
At compile time, the type checker gives a meaning to the syntax, which is a set of values.
The values that the corresponding expression is allowed to have when the program is running.
And thus, types do not exist at runtime.
At runtime, the microprocessor only manipulates bits in the memory.
For examples, 110101011 is only a bit string. Microprocessors cannot know if string represents a float, an integer, a pointer, or other things.
Some programming languages, like Python or php, do not have types.
Compilers/Interpreters for these languages do not to type checking at all.
In these languages, each value has an extra tag at runtime.
Which describes what kind of value the program is dealing with.
There are two kinds of types:
A type expression is the internal representation of a type inside the compiler.
Compilers use type expressions to represent the types of statements in the source program.
The type checker computes a type expression for each expression in the source program.
Basic type expressions
type_error
void
Let’s look at our first typing rules.
Given the following grammar for a small programming language.
A program in this language is a sequence of variable definitions followed by a single expression.
The type for a variable can be a basic type, like char or int.
It can also be a constructed type, like pointer.