Threading
Motivation: Why C++ Concurrency Matters
Standardized concurrency arrived with C++11, replacing the ad-hoc C and compiler-specific APIs (such as POSIX threads or the Win32 thread API) that C++98/03 code had to rely on, and adding a formal memory model.
Key library components:
std::thread // spawn & manage threads
std::unique_lock // RAII-based mutex ownership
std::atomic // atomic operations (often lock-free) with memory-order flags
- These components were modeled largely on Boost.Thread; Boost later aligned its API with the ISO standard library.
- RAII underpins safe resource management across threads.
- A well-defined memory model plus low-level atomic primitives enables predictable synchronization.
The difference between synchrony, single-threaded asynchrony, and multithreading
Model | Core Idea | Typical Unit | Everyday Analogy |
---|---|---|---|
Synchronous | Caller blocks until work completes | – | Set timer → cook egg → wait → set timer → boil water |
Asynchronous (single thread) | Start work, then do something else before awaiting result | Task | Set timer → cook egg → while waiting start timer → boil water → await both |
Multithreaded | Multiple workers execute in parallel, optionally using async tasks | Worker / Thread | Cook A sets timer → cooks egg while doing other chores; Cook B boils water concurrently |
Context Switching
Even a single-core CPU achieves quasi multithreading via task (context) switching—rapidly swapping thread state on one core.
Multi-core systems still rely on context switches to juggle more runnable threads than available cores (e.g., OS background services, browsers, editors).
Process vs Multithreading
Aspect | Processes | Threads |
---|---|---|
Isolation / Safety | Separate address spaces → strong OS protection → harder for one process to corrupt another. | Share the same address space → easier data sharing but greater risk of accidental corruption. |
Communication | Inter-process communication (IPC) is explicit and often slow (pipes, sockets, shared memory). | Direct access to shared data; no built-in protection, so mutexes/atomics required. |
Overhead | Higher cost to create, schedule and manage; context switch is heavier. | Cheaper to spawn and switch; but each thread still consumes stack (default reservations are commonly 1 to 8 MB, depending on the platform) and other kernel resources. |
Scalability | Can distribute across machines (e.g., Erlang “actor” model). | Limited to one machine; too many threads exhaust RAM and scheduler efficiency—thread pools mitigate this. |
When to choose | Favor when memory safety/isolation outweighs IPC cost or when you need multi-machine distribution. | Favor when fast, low-latency sharing is critical and the complexity of synchronization is acceptable. |
Rule of thumb: Don’t default to multiprocess or multithread—measure which model yields net benefit for your workload.
Invariant
An invariant is a statement that must always be true for a data structure (before & after each operation).
In the context of multithreading, one example is inserting a node into a lock-free doubly linked list. The insertion relies on the following steps and invariants:
- Read the `prev` and `next` pointers, which must still satisfy `prev->next == next`, and point `newNode->next` at `next` so the node is fully linked before it becomes visible.
- Attempt `CAS(prev->next, next, newNode)`.
  - If another thread changed `prev->next`, the invariant you relied on is gone → retry.
  - On success, ensure `next->prev` eventually points back to `newNode`.
Race Condition
Definition – The final value (or overall outcome) could be different depending on the relative timing / ordering of two or more threads that access the same data.
Race conditions are often not reproducible under a debugger or after adding log statements, because breakpoints and logging change the timing the race depends on. Bugs like this are therefore often called Heisenbugs.
Mitigation checklist
- Share immutable data where possible.
- Keep mutex-protected data private; expose safe accessors instead of raw addresses.
- Use `std::atomic` or higher-level concurrent containers when the cost of a lock is too high.
Common techniques for avoiding race conditions:
Technique | How it works | Strengths | Weaknesses |
---|---|---|---|
Mutex / lock | One thread at a time enters a critical section. (`std::mutex`, `std::unique_lock`) | Simple, general; easy to reason about invariants. | Blocking & potential contention; risk of deadlock. |
Lock-free algorithms | Combine atomic read-modify-write ops (CAS, fetch-add) in retry loops; threads never block. | Scales under contention; avoids kernel scheduling latency. | Harder to design & verify; starvation possible. |
Transactional Memory (TM) | Group reads/writes into a transaction. If another core touches the same cache lines, the transaction aborts and restarts. (The C++ Transactional Memory TS, or hardware TM on some CPUs) | Declarative “all-or-nothing” style with no explicit locks; good for coarse updates. | Limited hardware/compiler support; throughput drops under heavy conflicts due to repeated aborts. |
Mutex
Here is an example of multithreading with a detached thread. A detached thread runs independently of its `std::thread` handle, but it is still killed when the process exits, which happens when `main` returns. Alternatively, one could join the thread instead.
#include <chrono>
#include <functional>  // std::ref
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

void worker(std::vector<int>& nums, std::mutex& mtx)  // runs on its own thread
{
    std::cout << "[worker] started\n";
    for (int n : nums) {
        std::lock_guard<std::mutex> lg(mtx);  // held for one loop iteration
        std::cout << "  • " << n << "² = " << n * n << '\n';
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    std::cout << "[worker] done\n";
}

int main()
{
    std::vector<int> data{1, 2, 3, 4, 5};
    std::mutex mtx;
    // ───────────────────────────────────────────────
    // 1. Thread constructor spawns the OS thread.
    // 2. The new thread may begin executing immediately.
    // ───────────────────────────────────────────────
    // std::ref passes data and mtx by reference. Without it the call would not
    // compile: std::thread hands its stored copies to worker as rvalues, which
    // cannot bind to non-const references (and std::mutex is not copyable).
    // Both objects must outlive the worker.
    std::thread t(worker, std::ref(data), std::ref(mtx));
    t.detach();   // fire-and-forget -- we can't join later
    // t.join();  // Alternative to detach(): block until the worker finishes.
    std::cout << "[main] continues\n";
    // Give the worker time to finish so we can watch the output.
    // In real code you'd use join(), condition variables, futures, etc.
    std::this_thread::sleep_for(std::chrono::seconds(2));
}
Some subtleties about `std::thread` include:
- When the `std::thread` constructor is called, it decay-copies every argument into internal storage so the new thread has its own copy.
  - “Decay-copy” means: strip references and top-level cv-qualifiers, convert array and function types to pointers, then copy or move the resulting value.
- The stored copies are passed to the thread function as rvalues, so they cannot bind to a non-const reference parameter. This is why a raw reference wouldn’t work, and `std::ref` is used.