C++ - [Compilation 1] Compilation Model - Rico贾若童的博客

Compilation Model

Creating an executable from a single, small source file is conceptually straightforward:

source file -> file_to_binary -> executable

However, when a project consists of multiple source files, determining the correct order of compilation and linking becomes essential. Recompiling the entire codebase after every change is inefficient because compilation is a time-consuming process; the compiler must thoroughly analyze and translate the code into machine code. By dividing the code into smaller compilation units and then linking them, each unit being compiled into an object file that are very similar to the final executable. we can recompile only the parts of the code that have changed:

source file1 -> compilation --objective_file1--> linking -> 
                                                    executable
source file2 -> compilation --objective_file2--> linking -> 

This modular approach not only speeds up the build process but also simplifies maintenance. C++ has evolved several key concepts based on this model, including:

Declaration & definition
Header files and source files
Translation Units
One Definition Rule (ODR)

Declaration & Definition

Assume we have a variable x in one source file. We want to share it with other source files. Of course we cannot define it in all these source files. So what we can do is in file A, we define x: int x = 1;. In other source files, we declare it to be defined elsewhere: extern int x;. Then, this variable could be found during linking. The same goes with function definitions: the prototypes are declared in headerfiles, and they are defined in a single source file (or header file)

Header files and source files

Header files are primarily used to declare interfaces rather than to include all of the actual definitions. They let you share declarations—such as function prototypes, type definitions, constants, and macros—across multiple source files.

There are exceptions, though. Inline functions, templates, or classes with inline member functions often have their definitions in header files because they need to be visible at the point of use.

Translation Units

A translation unit is one single source file + directly linked / indirectly lined header files - preprocessed directives (Macros, etc?). A compiler actually processes a translation unit to begin with.

One Definition Rule

If we had the same variable defined in two different places, we are faced with these issues:

Ambiguity: Which definition should the program use?
Memory Allocation: Multiple definitions could result in separate memory allocations for what should be a single entity, leading to inconsistent behavior.

Inline functions - functions that defined with the keyword inline.
- It suggests to the compiler that it may replace a function call with the function’s body, potentially reducing function call overhead.
- However, it’s important to note that the compiler is not obligated to inline the function - it just serves as a hint.
- If you have an inline function defined in a header file, it is allowed (and expected) to appear in multiple translation units (source files) as long as all these definitions are identical. Otherwise, you’ll encounter undefined behavior or linkage errors.
Template: Requirement: the complete definitions must be available when the template is instantiated.
- You can technically have the same template definition in multiple source files. But it’s more convenient to define them in header files.
Class:
- Class declaration must be in a header file. Class member definitions could be in a source file, or in a header file if they are declared as inline

Include Guard

Include Guards

When multiple files include the same header file, the compiler should include that file only once per translation unit, regardless of how many times it’s included. To ensure this, you can use include guards:

#ifndef MY_HEADER_FILE
#define MY_HEADER_FILE

#endif

This prevents duplicate definitions during compilation. However, if two different header files happen to use the same guard name (e.g., MY_HEADER_FILE), it can cause one of them to be ignored—leading to hard-to-trace bugs.

A modern and simpler alternative is:

#pragma once

This is a compiler directive (a type of #pragma). It’s not part of the C++ standard, but it’s widely supported

Complete Compilation Model

There are 4 stages of the full compilation model:

Preprocessing: expands preprocessor directives such as #include, #define, #ifdef, etc.
- Output: a “pure” C/C++ file with all headers and macros expanded.
- g++ -E file.cpp -o file.i
Compilation: converts the preprocessed code into assembly code.
- Performs syntax checks and translates to low-level instructions.
- Output: assembly code
- g++ -S file.i -o file.s
Assembly: converts the assembly code to machine code (binary), but not yet linked.
- Output is an object file (.o or .obj).
- g++ -c file.cpp -o file.o or from assembly: as file.s -o file.o
Linking: combines one or more object files (.o) and libraries into a final executable.
- Resolves external symbols (e.g., functions or variables defined in other files).
- g++ file.o -o my_program

Compiler Optimization

Given a code sample:

// Type your code here, or load an example.
int main(){
    int res=0;
    for(int i=0; i < 100; ++i){
        res += i;
    }
}

With no optimization (-O0), the compiler generates straightforward instructions:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 0
...

The loop is preserved as written.

With aggressive optimization (-O3), the compiler recognizes that the result of the loop isn’t used and eliminates it entirely:

xor     eax, eax
ret

All code has been optimized out! ⚠️ As a result, an important point here is that in a debugger, if our executable is set to debug mode, some debug points will be missed. Use -O0 -g during debugging for best results.

Compilers

gcc and g++ are both part of the GNU Compiler Collection (GCC).
- gcc can be used to compile C++ code, but you need to manually link the C++ standard library
```
1
  gcc main.cpp -lstdc++ -o app
```
- g++ us built on top of gcc, and defines __cplusplus macros

ELF (Executable and Linkable Format):

ELF is the standard binary file format on UNIX-like systems. It’s used for:

object file .o
Executable binaries
Shared libraries: .so

An ELF consists of:

Header: magic number, architecture, entry point
Program Headers: describe segments for runtime loader
Section Headers: sections like .text, .data, .rodata, symbol tables
Symbol Tables: for linking and debugging
String Tables: names of symbols and sections

E.g., readelf -h # show ELF header

C++ - [Compilation 1] Compilation Model

Header Files, Translation Unit, One Definition Rule (ODR), Compiler Optimization, Compiler, ELF

Compilation Model

Declaration & Definition

Header files and source files

Translation Units

One Definition Rule

Include Guard

Complete Compilation Model

Compiler Optimization

Compilers

ELF (Executable and Linkable Format):

CATALOG

FEATURED TAGS

FRIENDS

Compilation Model

Declaration & Definition

Header files and source files

Translation Units

One Definition Rule

Exceptions - Translation Unit Level Sharing

Include Guard

Complete Compilation Model

Compiler Optimization

Compilers

ELF (Executable and Linkable Format):

CATALOG

FEATURED TAGS

FRIENDS