Compilation Model
Creating an executable from a single, small source file is conceptually straightforward:
1
source file -> file_to_binary -> executable
However, when a project consists of multiple source files, determining the correct order of compilation and linking becomes essential. Recompiling the entire codebase after every change is inefficient because compilation is a time-consuming process; the compiler must thoroughly analyze and translate the code into machine code. By dividing the code into smaller compilation units and then linking them, each unit being compiled into an object file that are very similar to the final executable. we can recompile only the parts of the code that have changed:
1
2
3
source file1 -> compilation --objective_file1--> linking ->
executable
source file2 -> compilation --objective_file2--> linking ->
This modular approach not only speeds up the build process but also simplifies maintenance. C++ has evolved several key concepts based on this model, including:
- Declaration & definition
- Header files and source files
- Translation Units
- One Definition Rule (ODR)
Declaration & Definition
Assume we have a variable x in one source file. We want to share it with other source files. Of course we cannot define it in all these source files. So what we can do is in file A, we define x: int x = 1;. In other source files, we declare it to be defined elsewhere: extern int x;. Then, this variable could be found during linking. The same goes with function definitions: the prototypes are declared in headerfiles, and they are defined in a single source file (or header file)
Header files and source files
Header files are primarily used to declare interfaces rather than to include all of the actual definitions. They let you share declarations—such as function prototypes, type definitions, constants, and macros—across multiple source files.
There are exceptions, though. Inline functions, templates, or classes with inline member functions often have their definitions in header files because they need to be visible at the point of use.
Translation Units
A translation unit is one single source file + directly linked / indirectly lined header files - preprocessed directives (Macros, etc?). A compiler actually processes a translation unit to begin with.
One Definition Rule
If we had the same variable defined in two different places, we are faced with these issues:
- Ambiguity: Which definition should the program use?
- Memory Allocation: Multiple definitions could result in separate memory allocations for what should be a single entity, leading to inconsistent behavior.
Exceptions - Translation Unit Level Sharing
- Inline functions - functions that defined with the keyword
inline.- It suggests to the compiler that it may replace a function call with the function’s body, potentially reducing function call overhead.
- However, it’s important to note that the compiler is not obligated to inline the function - it just serves as a hint.
- If you have an inline function defined in a header file, it is allowed (and expected) to appear in multiple translation units (source files) as long as all these definitions are identical. Otherwise, you’ll encounter undefined behavior or linkage errors.
- Template: Requirement: the complete definitions must be available when the template is instantiated.
- You can technically have the same template definition in multiple source files. But it’s more convenient to define them in header files.
- Class:
- Class declaration must be in a header file. Class member definitions could be in a source file, or in a header file if they are declared as
inline
- Class declaration must be in a header file. Class member definitions could be in a source file, or in a header file if they are declared as
Include Guard
Include Guards
When multiple files include the same header file, the compiler should include that file only once per translation unit, regardless of how many times it’s included. To ensure this, you can use include guards:
1
2
3
4
#ifndef MY_HEADER_FILE
#define MY_HEADER_FILE
#endif
This prevents duplicate definitions during compilation. However, if two different header files happen to use the same guard name (e.g., MY_HEADER_FILE), it can cause one of them to be ignored—leading to hard-to-trace bugs.
A modern and simpler alternative is:
1
#pragma once
This is a compiler directive (a type of #pragma). It’s not part of the C++ standard, but it’s widely supported
Complete Compilation Model
There are 4 stages of the full compilation model:
- Preprocessing: expands preprocessor directives such as
#include,#define,#ifdef, etc.- Output: a “pure” C/C++ file with all headers and macros expanded.
g++ -E file.cpp -o file.i
- Compilation: converts the preprocessed code into assembly code.
- Performs syntax checks and translates to low-level instructions.
- Output: assembly code
g++ -S file.i -o file.s
- Assembly: converts the assembly code to machine code (binary), but not yet linked.
- Output is an object file (.o or .obj).
g++ -c file.cpp -o file.oor from assembly:as file.s -o file.o
- Linking: combines one or more object files (.o) and libraries into a final executable.
- Resolves external symbols (e.g., functions or variables defined in other files).
g++ file.o -o my_program
Compiler Optimization
Given a code sample:
1
2
3
4
5
6
7
// Type your code here, or load an example.
int main(){
int res=0;
for(int i=0; i < 100; ++i){
res += i;
}
}
With no optimization (-O0), the compiler generates straightforward instructions:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 0
...
The loop is preserved as written.
With aggressive optimization (-O3), the compiler recognizes that the result of the loop isn’t used and eliminates it entirely:
xor eax, eax
ret
All code has been optimized out! ⚠️ As a result, an important point here is that in a debugger, if our executable is set to debug mode, some debug points will be missed. Use -O0 -g during debugging for best results.
Compilers
gccandg++are both part of theGNU Compiler Collection(GCC).gcccan be used to compile C++ code, but you need to manually link the C++ standard library1
gcc main.cpp -lstdc++ -o app
g++us built on top ofgcc, and defines__cplusplusmacros
ELF (Executable and Linkable Format):
ELF is the standard binary file format on UNIX-like systems. It’s used for:
- object file
.o - Executable binaries
- Shared libraries:
.so
An ELF consists of:
- Header: magic number, architecture, entry point
- Program Headers: describe segments for runtime loader
- Section Headers: sections like .text, .data, .rodata, symbol tables
- Symbol Tables: for linking and debugging
- String Tables: names of symbols and sections
E.g., readelf -h # show ELF header