C++

C++ - [Container 1] Associative Container

std::set, std::unordered_set, std::map, std::unordered_map, std::multiset, std::unordered_multiset, std::unordered_multimap

Posted by Rico's Nerd Cluster on January 28, 2023

Introduction

Associative containers in C++ are containers that allows fast retrieval of elements based on keys, rather than positions.

  • Construction: std::set, and std::map also have sorting using comparison. std::unordered_set, and unordered_map uses a hash function. (O(log n))
  • Look up: std::set, and std::map uses a red-black tree. std::unordered_set, and unordered_map uses a hash table (O(1))
  • Duplicates: No duplicates are allowed in associative containers except for std::multiset, std::unordered_multiset, and std::unordered_multimap

In this article, we will examine the below behaviours:

  • Insert, assign, emplace
  • Hashing
  • Remove
  • Element Traversal and access

Finally, we will see how to define hash functions for these associative container datatypes.

std::unordered_map

  • Insert Or Assign (C++17)
1
2
3
4
5
6
7
8
9
10
#include <iostream>
#include <unordered_map>
int main() {
    std::unordered_map<int, std::string> my_map;

    my_map.insert_or_assign(1, "apple");   // Insert key 1
    my_map.insert_or_assign(2, "banana");  // Insert key 2
    my_map.insert_or_assign(1, "avocado"); // Assign new value to key 1
    std::cout << "After: " << my_map[1] << "\n";   // Outputs: avocado
}

std::unordered_set

Under the hood, unordered set is a hash table, it can only store objects with a hash function and equality operator. Their values will be stored as keys. unordered_map is a hash-table plus a look up.

  • Insert: In set and unordered_set, if an element already exists, insert does not do anything:
1
2
3
4
5
6
7
8
9
#include <iostream>
#include <set>
int main() {
    std::set<int> my_set;

    auto result1 = my_set.insert(42);  // First insert
    auto result2 = my_set.insert(42);  // Second insert
    std::cout << "Set size: " << my_set.size() << "\n";               // 1
}

std::set

    • Insert: In set and unordered_set, if an element already exists, insert does not do anything. See std::unordered_set for code snippet.

std::map

  • Erase: map, unordered_map, set, unordered_set have:
    • container.erase(value) to erase all occurences of value
    • container.erase(it) to erase the single occurence of it
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// 
std::map<int, std::string> m = {{1, "one"}, {2, "two"}};
m.erase(1);  // Removes key , o(log n)
std::unordered_map<int, std::string> umap = {{1, "one"}, {2, "two"}};
umap.erase(1);  // Removes key , o(1)
std::set<int> s = {1, 2, 3};
s.erase(2);  // Removes element with value 2, o(log n)
std::unordered_set<int> uset = {1, 2, 3};
uset.erase(2);  // Removes element with value 2 o(1)

// Alternative 2: universal erase with iterator:

auto it = my_map.find(key);
if (it != my_map.end()) {
    my_map.erase(it);  // faster than key lookup + erase
}

std::unordered_map

Emplace

  • unordered_map::emplace() -> pair [iterator, bool] vs try_emplace() -> pair [iterator, bool]:
    • If a key already exists, neither will replace the associated value.
      • insert_or_assign() would.
    • However, try_emplace() is always preferred over emplace, because:
      • try_emplace() Only constructs the value if the key doesn’t exist. (C++17)
      • emplace(key, value) Always constructs the key and value (even if the key already exists — it won’t be inserted though).
    • Another subtle difference is:
      • piecewise_construct is to construct either value of a pair using a tuple of args.
      • try_emplace does not take in args for key (so key must be fully constructed before the call)
        • but it can take in args of the value, so value is constructed in-place, but key will have to be move-constructed / copy constructed in the container.
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      
        // emplace supports piecewise construct
        itr_lookup_.emplace(
            std::piecewise_construct,
            std::forward_as_tuple(cache_.begin()->first),
            std::forward_as_tuple(cache_.begin()));
              
        // try emplace only supports construct + move?
        itr_lookup_.try_emplace(
            list_it->first,   // key
            list_it           // mapped_type must be constructible from this iterator
        );
      
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
#include <iostream>
#include <unordered_map>
#include <string>

int main()
{
    std::unordered_map<std::string, std::string> my_map;
    std::string expensive = "expensive";
    
    // emplace will always create the pair, even if key exists
    auto it = my_map.emplace("dog", expensive);  
    
    // try_emplace avoids creating the value if "dog" is already in map
    my_map.try_emplace("dog", "dog2");  // Only constructs the value if the key doesn’t exist.
    // always construct the key-value pair, but if the object exists, the pair won't be inserted
    my_map.emplace("dog", "expensivo");  
    std::cout<<my_map["dog"]<<std::endl;    // see expensive
    std::cout<<(it.first->second)<<std::endl;    // see expensive
    return 0;
}

Hashing

By default, keys of types std::pair<int, int> are not hashable. One needs to define a callable for that.

  • The first alternative is defining a functor
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#include <iostream>
#include <unordered_map>
#include <utility> 

struct MyHash{
    size_t operator()(const std::pair<size_t, size_t>& p) const {
        return std::hash<size_t>{}(p.first) ^ (std::hash<size_t>{}(p.second) << 1);
    }
};

int main() {
    std::unordered_map<std::pair<size_t, size_t>, std::string, MyHash> my_map;

    my_map[{1, 2}] = "one-two";
    my_map[{3, 4}] = "three-four";

    std::cout << "Value at (1, 2): " << my_map[{1, 2}] << "\n";
    std::cout << "Value at (3, 4): " << my_map[{3, 4}] << "\n";

    return 0;
}
  • Another alternative is defining a global hash function. It’s a bit intrusive
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#include <unordered_map>
#include <utility>
#include <iostream>
namespace std{
    template<>
    struct hash <std::pair<size_t, size_t>>{
        size_t operator()(const std::pair<size_t, size_t>& p) const {
            return std::hash<size_t>{}(p.first) ^ (std::hash<size_t>{}(p.second) << 1);
        }
    };
}

int main() {
    std::unordered_map<std::pair<size_t, size_t>, std::string> my_map;
    my_map[{7, 8}] = "seven-eight";
    std::cout << my_map[{7, 8}] << std::endl;
    my_map[{1, 2}] = "one-two";
    my_map[{3, 4}] = "three-four";

    std::cout << "Value at (1, 2): " << my_map[{1, 2}] << "\n";
    std::cout << "Value at (3, 4): " << my_map[{3, 4}] << "\n";

    return 0;
}

PMR: Polymorphic Memory Resources (C++17)

  • Custom Allocators: Instead of having containers decide how memory is allocated, PMR allows you to supply a custom memory resource (allocator) that the container uses.
  • Reusable Strategies: You can create memory resources that implement different allocation strategies, such as pooling, monotonic allocation, or synchronized (thread-safe) allocation, and then reuse them across multiple containers.
    • Memory Efficiency: PMR can help you pre-allocate a large chunk of memory (from a buffer) and then use it for many small allocations, which is useful in real-time or embedded systems.
    • Multiple PMR-based containers can share the same memory resource.
  • memory allocation strategies by using user-supplied memory resources.

Thread-Safety of Associative Containers

Can I do std::for_each(parallel, ), grab each deque, do stuff, then put in grid_? Will there be memory issues?

Parallel access to map, unordered_map is not inherantly thread safe.

1
2
3
4
5
6
7
8
9
std::mutex mtx
std::for_each(
    std::execution::par,
    my_map.begin(), my_map.end(), [&](auto& e){
        ...
        std::lock_guard<std::mutex> lg(mtx);
        my_map[entry] = ... // potential memory re-allocation, etc.
    }
);

Read-only access is fine

If there is no memory re-allocation, AND one element is only accessed by a single thread, lockless is fine. These operators could trigger memory re-allocation:

  • reserve()
  • insert()
  • operator[]

  • Alternatives:
    • TBB’s (Thread-Building-Blocks) concurrent_unordered_map. TODO: test it