Joke Collection Website - Bulletin headlines - All Noun Explanations of Compilation Principle

All Noun Explanations of Compilation Principle

Don't be so lazy in books! .

The compilation process is divided into six stages: lexical analysis, grammatical analysis, semantic analysis, intermediate code generation, code optimization and target code generation.

Interpreter: Convert the source program of one language into the equivalent program of another language-the target language, and then execute the target program. Interpreter is to accept a sentence input by a high-level language, interpret it and control the computer to execute it, and immediately get the execution result of this sentence, and then accept the next sentence.

Compiler: refers to a program in which a source program written in a high-level language can be converted into an object program (machine language program or assembly language program) in a logically equivalent low-level language form.

Explain the fundamental difference between a program and a compiler: whether to generate object code or not.

Ambiguity of sentences (ambiguity here refers to grammatical structure. ): If a sentence with grammar G [s] can find two different leftmost derivatives (or rightmost derivatives), or if there are two different grammar trees, the sentence is said to be ambiguous.

Grammatical ambiguity: A grammar is ambiguous if it contains ambiguous sentences, otherwise it is ambiguous.

The meaning of LL( 1): (LL( 1) is ambiguous in grammar; LL( 1) syntax does not contain left recursion)

65438th+0th L: Scanning the 2nd L of the input string from left to right will generate the leftmost derivative.

1: Look at the 1 input symbol on the right to decide which production formula to choose.

Equivalent transformation from some non-LL( 1) grammars to LL( 1) grammars: 1. Extract common factor 2. Eliminate left recursion.

Attribute of grammatical symbols: the meaning of words, that is, some information related to grammatical symbols, such as type, value, storage address, etc.

An attribute grammar is a triple A =(G, v, f).

G: context-free grammar.

V: Attributes are limited. Each attribute is associated with a terminator or a non-terminator of the grammar. Attributes, like variables, can be calculated and passed.

F: A finite set of assertions or predicates about attributes (the calculation rules of a set of attributes). Assertions or semantic rules are associated with a product and only refer to attributes associated with terminators or non-terminators at the left or right end of the product.

Comprehensive attribute: if the attribute value of a single non-terminal A in the left part of production is determined by the attribute value of a non-terminal A in the right part, the attribute of A is called comprehensive genus.

Inherited attribute: if the attribute value of the produced right symbol B is determined by the attribute value of the left non-terminal or other symbols on the right, then the attribute of B is an inherited attribute.

(1) Non-terminators can have both comprehensive attributes and inheritance attributes, but the grammar opening symbol has no inheritance attribute.

(2) The finalizer has only comprehensive attributes, but no inherited attributes. They are provided by the vocabulary program.

In calculation: comprehensive attributes are passed up the attribute syntax tree; Inherited attributes are passed down the attribute syntax tree.

Grammar-guided translation refers to the description of the semantic rules of the products used in the process of grammatical analysis.

Grammar-guided translation: analyze the word symbol string grammatically, build a syntax analysis tree, then build an attribute dependency graph as needed, traverse the syntax tree and calculate according to semantic rules at each node of the syntax tree.

Intermediate code (intermediate language)

1 is a representation of the complexity between source language and machine language.

2. Usually, a fast compiler directly generates the object code.

3. In order to make the compiler structure logically simpler and clearer, intermediate code is often used, so that some machine-related implementation details can be carefully handled in the code generation stage and optimized at the intermediate code level, making code optimization easier to realize.

What is an intermediate code? The internal representation of the source program is independent of the structure of the target machine and easy to generate code mechanically.

Why should it be converted into intermediate code? (1) The logical structure is clear; It is beneficial to implement the same language on different target machines.

(2) It is convenient for transplantation, modification and machine-independent optimization.

Several forms of intermediate code: inverse Polish notation, ternary and tree representation, and quaternary.

General form of symbol table: symbol table consists of two items, namely name column and information column.

The information column contains many sub-columns and flag bits, which are used to record the corresponding names and various attributes. The name column is also called the main column, and the contents of the main column are called keywords.

Function of symbol table: (1) Collect symbol attributes; (2) Check the legitimacy of context semantics; (3) Check the consistency and legality of the identifier attribute in the context; And (3) as the basis of address allocation in the generation stage of the object code.

Main attributes and functions of symbols:

1. Symbol name 2. Symbol type (integer, real number, string, etc.). ) 3. Symbol storage category (public * * *, private).

4. Scope and visibility of symbols (global and local) 5. Storage and allocation information of symbol variables (static storage area and dynamic storage area)

Storage allocation scheme strategy: static storage allocation; Dynamic storage allocation: stack and heap.

static storage allocation

1 basic strategy

When compiling, all data spaces of the target program are arranged, and the unit address of each data item can be determined.

2. Applicable distribution object: the target code segment of the subroutine; Global data target (global variable)

3. Requirements for static storage allocation: Recursive calls are not allowed, excluding variable arrays.

FORTRAN program is a segment structure, recursion is not allowed, and the size and nature of data names are fixed. This is a typical static allocation.

dynamic memory allocation

1. If a programming language allows recursive procedures, variable arrays or allows users to freely apply for and release space, then dynamic storage management technology is needed.

2. Two dynamic storage allocation methods: stack and heap.

Stack dynamic storage allocation

Allocation strategy: design the data space of the whole program as a stack.

In a language program with recursive structure, every time a procedure is called, the data space it needs is allocated at the top of the stack, and this space will be released when the procedure ends.

The data space required for this process includes two parts.

Some of them are data objects existing in this activity, such as local variables, parameter units, temporary variables, etc.

The other part is the recorded information (connection data) used to manage the process activities.

activation record

The information required for a process execution is managed by a continuous storage area, which is called activity record.

compose

1, temporary work unit; 2. Local variables; 3. Machine status information; 4. Access chain;

5. Control chain; 6. Real ginseng; 7. Return address

What is code optimization?

The so-called optimization is equivalent transformation of the code, so that the result of the transformed code is the same as that before the transformation, but the running speed is faster or the storage space is reduced.

Optimization principle: equivalence principle: after optimization, the result of program operation should not be changed.

Effective principle: Make the optimized target code run shorter and occupy less storage space.

Principle of cost performance: achieve better optimization effect at the lowest possible cost.

Common optimization techniques

(1) Delete redundant operation (delete male * * * subexpression) (2) Code extrapolation+delete inductive variable +(3) Strength weakening; (4) Changing the loop control condition (5) Merging the known quantity with the copy propagation (6) Deleting the useless assignment.

Basic block definition

A sequence of statements with only one entrance and one exit in a program is called the basic block of the program.

Give me a score.