Phases of translation
The C source file is processed by the compiler as if the following phases take place, in this exact order. An actual implementation may combine these actions or process them differently as long as the behavior is the same.
Phase 1
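The basic source character set includes the digits from '0' to '9', the letters from 'a' to 'z' and from 'A' to 'Z', and the following punctuation characters:
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '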
Phase 2
Phase 3
Preprocessing tokens include header names such as <stdio.h> or "myfile.h", preprocessing numbers, which also cover some invalid tokens such as 1..E+3.foo or 0JBK, and operators and punctuators such as +, <<=, <%, or ##.
If the input has been parsed into preprocessing tokens up to a given character, the next preprocessing token is generally taken to be the longest sequence of characters that could constitute a preprocessing token, even if that would cause subsequent analysis to fail. This is commonly known as maximal munch.
    int foo = 1;
    int bar = 0xE+foo;      // error: invalid preprocessing number 0xE+foo
    int baz = 0xE + foo;    // OK
    int quux = bar+++++baz; // error: bar++ ++ +baz, not bar++ + ++baz.
The sole exception to the maximal munch rule is:
- Header name preprocessing tokens are only formed within a #include directive and in implementation-defined locations within a #pragma directive.
    #define MACRO_1 1
    #define MACRO_2 2
    #define MACRO_3 3
    #define MACRO_EXPR (MACRO_1 <MACRO_2> MACRO_3) // OK: <MACRO_2> is not a header-name
Phase 4
Phase 5
Note: the conversion performed at this stage can be controlled by command line options in some implementations: gcc and clang use -finput-charset to specify the encoding of the source character set, and -fexec-charset and -fwide-exec-charset to specify the encodings of the execution character set in the string literals and character constants that don't have an encoding prefix (since C11).
Phase 6
Adjacent string literals are concatenated.
Phase 7
Compilation takes place: the tokens are syntactically and semantically analyzed and translated as a translation unit.
Phase 8
Linking takes place: translation units and library components needed to satisfy external references are collected into a program image, which contains the information needed for execution in its execution environment (the OS).
References
- C11 standard (ISO/IEC 9899:2011):
  - 5.1.1.2 Translation phases (p: 10-11)
  - 5.2.1 Character sets (p: 22-24)
  - 6.4 Lexical elements (p: 57-75)
- C99 standard (ISO/IEC 9899:1999):
  - 5.1.1.2 Translation phases (p: 9-10)
  - 5.2.1 Character sets (p: 17-19)
  - 6.4 Lexical elements (p: 49-66)
- C89/C90 standard (ISO/IEC 9899:1990):
  - 2.1.1.2 Translation phases
  - 2.2.1 Character sets
  - 3.1 Lexical elements
© cppreference.com
Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.
http://en.cppreference.com/w/c/language/translation_phases