Posted by Rico's Nerd Cluster on June 28, 2025

Threading

Motivation: Why C++ Concurrency Matters

Standardized concurrency arrived with C++11. It replaced the ad hoc, platform- and compiler-specific APIs (for example POSIX threads or the Win32 API) that C++98/03 code had to rely on, and it introduced a formal memory model.

Key library components:

std::thread   // spawn & manage threads  
std::unique_lock  // RAII-based mutex ownership  
std::atomic   // lock-free atomics, fences, memory-order flags
  • The threading components were modeled largely on the Boost.Thread library; Boost later aligned its API with the ISO standard.
  • RAII underpins safe resource management across threads.
  • A well-defined memory model plus low-level atomic primitives enables predictable synchronization.
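
As a rough illustration of how the three components combine, here is a minimal sketch. The names `Tally` and `run_workers` are hypothetical and exist only for this example: each worker thread bumps a `std::atomic` counter and then appends one line to a log guarded by a `std::unique_lock`.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch.
struct Tally {
    int total;
    std::vector<std::string> log;
};

// Each of `num_threads` workers increments a shared atomic counter
// `per_thread` times, then records one line in a mutex-protected log.
Tally run_workers(int num_threads, int per_thread) {
    std::atomic<int> counter{0};   // shared counter, no lock needed
    std::mutex log_mutex;          // guards `log`
    std::vector<std::string> log;
    std::vector<std::thread> workers;

    for (int id = 0; id < num_threads; ++id) {
        workers.emplace_back([&, id] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
            // RAII: the lock is released when `lock` goes out of scope.
            std::unique_lock<std::mutex> lock(log_mutex);
            log.push_back("worker " + std::to_string(id) + " done");
        });
    }
    for (auto& t : workers) t.join();  // wait for every worker
    return {counter.load(), log};
}
```

Calling `run_workers(4, 1000)` should yield a total of 4000 and four log lines, in whatever order the workers happened to finish.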

The difference between synchrony, single-threaded asynchrony, and multithreading

| Model | Core Idea | Typical Unit | Everyday Analogy |
|---|---|---|---|
| Synchronous | Caller blocks until work completes | | Set timer → cook egg → wait → set timer → boil water |
| Asynchronous (single thread) | Start work, then do something else before awaiting result | Task | Set timer → cook egg → while waiting start timer → boil water → await both |
| Multithreaded | Multiple workers execute in parallel, optionally using async tasks | Worker / Thread | Cook A sets timer → cooks egg while doing other chores; Cook B boils water concurrently |
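
To make the contrast concrete, here is a small sketch of the kitchen analogy. The function names are made up for illustration. Note one assumption: `std::async` with `std::launch::async` runs each task on another thread, so this shows the "start work, await later" shape using threads; true single-threaded asynchrony in C++ would need an event loop or coroutines, which is not shown here.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical chores: each sleeps briefly to simulate waiting on a timer.
int cook_egg() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return 1;
}

int boil_water() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return 2;
}

// Synchronous: the caller blocks on each chore in turn.
int breakfast_sync() {
    int egg = cook_egg();
    int water = boil_water();
    return egg + water;
}

// Task-based: start both chores first, then await both results.
int breakfast_async() {
    std::future<int> egg = std::async(std::launch::async, cook_egg);
    std::future<int> water = std::async(std::launch::async, boil_water);
    return egg.get() + water.get();  // get() blocks until each task is done
}
```

Both functions return the same value; the task-based version simply overlaps the two waits instead of paying for them one after the other.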

Context Switching

Even a single-core CPU achieves quasi multithreading via task (context) switching—rapidly swapping thread state on one core.

Multi-core systems still rely on context switches to juggle more runnable threads than available cores (e.g., OS background services, browsers, editors).
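
A quick way to observe this is to launch more threads than the machine has cores and let the scheduler juggle them. The sketch below is only illustrative; `run_oversubscribed` is a hypothetical helper, and the busy loop stands in for real work.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Launches `factor` threads per core and returns how many ran to completion.
unsigned run_oversubscribed(unsigned factor) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;  // the call may return 0 if the count is unknown
    const unsigned total = cores * factor;

    std::atomic<unsigned> finished{0};
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < total; ++i) {
        threads.emplace_back([&finished] {
            volatile unsigned long spin = 0;  // volatile keeps the loop from being optimized away
            for (int k = 0; k < 100000; ++k) spin = spin + 1;
            finished.fetch_add(1);
        });
    }
    for (auto& t : threads) t.join();
    return finished.load();
}
```

Every thread finishes even though they cannot all run at once: the OS time-slices them across the available cores.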

Process vs Multithreading

| Aspect | Processes | Threads |
|---|---|---|
| Isolation / Safety | Separate address spaces → strong OS protection → harder for one process to corrupt another. | Share the same address space → easier data sharing but greater risk of accidental corruption. |
| Communication | Inter-process communication (IPC) is explicit and often slow (pipes, sockets, shared memory). | Direct access to shared data; no built-in protection, so mutexes/atomics required. |
| Overhead | Higher cost to create, schedule and manage; context switch is heavier. | Cheaper to spawn and switch, but each thread still consumes stack (often ~1 MB) and other kernel resources. |
| Scalability | Can distribute across machines (e.g., Erlang “actor” model). | Limited to one machine; too many threads exhaust RAM and degrade scheduler efficiency. Thread pools mitigate this. |
| When to choose | Favor when memory safety/isolation outweighs IPC cost or when you need multi-machine distribution. | Favor when fast, low-latency sharing is critical and the complexity of synchronization is acceptable. |

Rule of thumb: Don’t default to multiprocess or multithread—measure which model yields net benefit for your workload.

Invariant

An invariant is a statement that must always be true for a data structure (before & after each operation).

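In multithreaded code, an operation must preserve its invariants even when other threads interleave with it. For example, inserting newNode between prev and next in a lock-free doubly linked list relies on the invariant prev->next == next:

  • Read prev and next, which must still satisfy prev->next == next.

  • Set newNode->prev = prev and newNode->next = next before publishing the node, so no other thread ever sees it half-initialized.

  • Attempt CAS(prev->next, next, newNode).

    • If another thread changed prev->next, the invariant you relied on is gone → retry.
    • On success, ensure next->prev eventually points back to newNode.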

Race Condition

Definition – The final value (or overall outcome) could be different depending on the relative timing / ordering of two or more threads that access the same data.

Race conditions are often not reproducible under a debugger or after adding log statements, because breakpoints and extra logging change the timing that the race depends on. Bugs that vanish when you try to observe them like this are called Heisenbugs.
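
A lost update is the classic case. In the sketch below (function names are hypothetical), `racy_count` splits the increment into a separate load and store, so two threads can read the same value and one increment disappears. Each individual access is atomic, so the behavior is well defined but timing-dependent. `safe_count` uses a single read-modify-write instead.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Racy: load and store are each atomic, but the increment as a whole is not.
int racy_count(int num_threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                int seen = counter.load();
                counter.store(seen + 1);  // may overwrite another thread's update
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter.load();
}

// Safe: fetch_add performs the whole increment as one atomic operation.
int safe_count(int num_threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) counter.fetch_add(1);
        });
    }
    for (auto& th : threads) th.join();
    return counter.load();
}
```

`safe_count` always returns `num_threads * per_thread`. `racy_count` returns at most that value, and the exact result can vary from run to run.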

Mitigation checklist

  • Share immutable data where possible.
  • Keep mutex-protected data private; expose safe accessors instead of raw addresses.
  • Use std::atomic or higher-level concurrent containers when the cost of a lock is too high.

To overcome race conditions, the main techniques are:

| Technique | How it works | Strengths | Weaknesses |
|---|---|---|---|
| Mutex / lock | One thread at a time enters a critical section. (std::mutex, std::unique_lock) | Simple, general; easy to reason about invariants. | Blocking & potential contention; risk of deadlock. |
| Lock-free algorithms | Combine atomic read-modify-write ops (CAS, fetch-add) in retry loops; threads never block. | Scales under contention; avoids kernel scheduling latency. | Harder to design & verify; starvation possible. |
| Transactional Memory (TM) | Group reads/writes into a transaction. If another core touches the same cache lines, the transaction aborts and restarts. (The C++ Transactional Memory TS, or hardware TM such as Intel TSX on some CPUs) | Declarative “all-or-nothing” style with no explicit locks; good for coarse updates. | Limited hardware/compiler support; throughput drops under heavy conflicts due to repeated aborts. |

Mutex

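Here is an example of multithreading with a detached thread. A detached thread runs independently of its std::thread handle and can no longer be joined; when main returns, the process exits and any detached thread that is still running is ended abruptly. Alternatively, you can call join() to wait for the thread to finish.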

#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
#include <mutex>
#include <functional>   // std::ref

void worker(std::vector<int>& nums, std::mutex& mtx)        // runs on its own thread
{
    std::cout << "[worker] started\n";

    for (int n : nums) {
        std::lock_guard<std::mutex> lg(mtx);
        std::cout << "  • " << n << "² = " << n * n << '\n';
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(200));

    std::cout << "[worker] done\n";
}

int main()
{
    std::vector<int> data{1, 2, 3, 4, 5};
    std::mutex mtx;

    // ───────────────────────────────────────────────
    // 1. Thread constructor spawns the OS thread.
    // 2. The new thread begins executing *immediately*.
    // ───────────────────────────────────────────────
    // std::thread copies its arguments by default. std::ref passes data and mtx
    // by reference instead, so both must outlive the worker.
    std::thread t(worker, std::ref(data), std::ref(mtx));

    t.detach();                   // fire-and-forget -- we can’t join later
    // t.join();                  // Alternative to detach(): wait here for the worker to finish.

    std::cout << "[main] continues\n";

    // Give the worker time to finish so we can watch the output.
    // In real code you’d use join(), condition variables, futures, etc.
    std::this_thread::sleep_for(std::chrono::seconds(2));
}
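
For comparison, here is a sketch of a join-based variant that needs no sleep. `square_all` and `squares_joined` are hypothetical names; the worker collects the squares instead of printing them so the result can be checked.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical variant of worker(): stores each square in `out`.
void square_all(const std::vector<int>& nums, std::vector<int>& out,
                std::mutex& mtx) {
    for (int n : nums) {
        std::lock_guard<std::mutex> lg(mtx);
        out.push_back(n * n);
    }
}

std::vector<int> squares_joined(const std::vector<int>& nums) {
    std::vector<int> out;
    std::mutex mtx;
    std::thread t(square_all, std::cref(nums), std::ref(out), std::ref(mtx));
    t.join();  // blocks until the worker returns; the locals stay alive meanwhile
    return out;
}
```

Because `join()` returns only after the worker has finished, the references to `out` and `mtx` can never dangle, which is the risk the detached version hides behind its two-second sleep.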

Some subtleties about std::thread include:

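  • The std::thread constructor decay-copies the callable and every argument into storage owned by the new thread, so the thread works on its own copies.
    • “Decay-copy” means: strip references and top-level cv-qualifiers, convert arrays and functions to pointers, then copy or move the resulting value.
    • The copies are passed to the callable as rvalues, so a parameter of type T& cannot bind to them and the code fails to compile. Wrapping the argument in std::ref passes a std::reference_wrapper, which converts back to T&.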
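
The difference is easy to see in a small sketch (all function names here are made up for illustration):

```cpp
#include <functional>
#include <thread>

void add_one(int& x) { ++x; }          // modifies the caller's variable
void add_one_to_copy(int x) { ++x; }   // modifies only its own parameter

int via_ref(int start) {
    int value = start;
    std::thread t(add_one, std::ref(value));  // reference_wrapper -> int&
    t.join();
    return value;  // start + 1
}

int via_copy(int start) {
    int value = start;
    std::thread t(add_one_to_copy, value);    // the thread gets its own copy
    t.join();
    return value;  // unchanged
}
```

Writing `std::thread t(add_one, value);` without `std::ref` does not compile, because the decay-copied int is handed to `add_one` as an rvalue.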