Introduction
The C++ memory model was formally introduced in C++11 mainly for multithreading. Before C++11:
- Threading was platform-dependent (e.g., POSIX threads on Unix systems).
- The behavior of shared memory and data races was not defined by the language; it was left up to the OS, hardware, and compiler.
- 🫠️ “Pthreads lib assume no data race.” Pthreads are a thin wrapper over OS-level primitives and don’t prevent data races. It’s the programmer’s job to synchronize correctly.
C++11 added:
- A standard memory model.
- Stricter, portable semantics than Pthreads. This helps correctness, but depending on the implementation and the memory orders chosen, it may add some overhead compared to bare-metal threading APIs such as Pthreads.
- ❗️ “Memory ordering: Thou shalt not modify the behavior of a single-threaded program”. The compiler is free to reorder instructions for optimization as long as the observable behavior of a single-threaded program does not change; this is called the as-if rule. In multithreaded programs, however, such reorderings can become visible to other threads and break code that shares data without proper synchronization, which is why C++11 introduced a well-defined memory model.
- std::thread, std::mutex, and std::atomic types.
This brought C++ more in line with Java/C# in terms of native threading support.
Example: Memory Reordering by the Compiler
C++ compilers can reorder instructions as part of optimization — as long as the observable behavior in a single-threaded program is preserved (per the as-if rule). This can lead to subtle issues in multi-threaded programs.
```cpp
int A, B;

void foo() {
    A = B + 1;
    B = 0;
}
```

Without optimization (redo this at home...):

```
$ gcc -S -masm=intel foo.c
$ cat foo.s
...
mov     eax, DWORD PTR _B
add     eax, 1
mov     DWORD PTR _A, eax
mov     DWORD PTR _B, 0
...
```

With optimization:

```
$ gcc -O2 -S -masm=intel foo.c
$ cat foo.s
...
mov     eax, DWORD PTR B
mov     DWORD PTR B, 0
add     eax, 1
mov     DWORD PTR A, eax
...
```
Why this matters: in the optimized output, the store B = 0 is emitted before the store to A. A single-threaded program cannot tell the difference, but another thread might observe B == 0 while A has not yet been updated. If that thread relies on B to signal that A is ready, the reordering violates the intended synchronization logic. (With plain int variables, such concurrent access is already a data race.)
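How to prevent: use std::atomic with an appropriate memory order (e.g., std::memory_order_seq_cst, or a release store paired with an acquire load) to prevent unwanted reordering.
Atomic operations with release, acquire, or seq_cst ordering constrain both compiler and CPU reordering around them; relaxed atomic operations guarantee only atomicity, not ordering.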
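C++ Memory Model vs Lower-Level Register Reads:
Reading a low-level device register often has side effects. The read may be read-to-clear (e.g., reading an IRQ status register acknowledges the pending bits) or read-to-toggle. If two cores read such a register at the same time, an event can be acknowledged twice or seen by only one of them.
Some MMIO (memory-mapped I/O) examples, from dangerous to safe: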
| Scenario (MMIO) | Safe? | Why / Remedy |
|---|---|---|
| Two threads poll a PCIe status register | ❌ | Read‑clear → lost/dupe events. Use a single polling thread or a spin‑lock. |
| CPU reads a continuously updating 64‑bit timer | ⚠️ | Possible tear. Follow the ‘latch‑high‑then‑low’ sequence in the datasheet. |
| Two cores read a ROM device‑ID register | ✅ | No side‑effects. Still mark as volatile / use ioread32(). |
The above is well outside the C++ memory model. The C++ memory model explicitly excludes “actions performed by or on behalf of the hardware”. Correctness is platform specific.
By the C++ memory model:
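- Concurrent reads of the same object are thread-safe as long as no other thread modifies it at the same time.
- Reads and writes of a std::atomic<T>, including 8-byte types such as std::atomic<std::uint64_t>, are atomic. Ordinary objects get no such guarantee: unsynchronized concurrent access to them where at least one access is a write is a data race (undefined behavior).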
Rule of thumb: Ordinary RAM objects obey the C++ memory model; MMIO obeys the hardware datasheet + architecture I/O ordering rules.
std::memory_order_relaxed
In this example, the progress callback is only used to count how many synchronized lidar/IMU batches have been processed:
```cpp
#include <atomic>
#include <functional>
#include <iostream>
#include <thread>

std::atomic<int> synchronized_batches{0};
std::function<void()> progress_cb;

void worker_loop() {
    // Pretend we processed 5 synchronized lidar/IMU batches.
    for (int i = 0; i < 5; ++i) {
        // Finished processing one batch.
        if (progress_cb) {
            progress_cb();
        }
    }
}

int main() {
    progress_cb = [&]() {
        synchronized_batches.fetch_add(1, std::memory_order_relaxed);
    };

    std::thread worker(worker_loop); // starts immediately
    worker.join();

    std::cout << "Processed batches: "
              << synchronized_batches.load(std::memory_order_relaxed)
              << "\n";
}
```
The callback runs on the frontend’s worker thread after each synchronized measurement is processed. Since another thread may read synchronized_batches while the worker is running, the counter must be atomic. However, we do not use this counter to synchronize access to any other data. We only care that each increment is race-free (no increment is lost) and that the final count is correct once the worker has finished. That is exactly what std::memory_order_relaxed provides: atomicity without extra ordering constraints.