Warning, /sdk/heaptrack/3rdparty/robin-map/README.md is written in an unsupported language. File is not indexed.
0001 [![CI](https://github.com/Tessil/robin-map/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/Tessil/robin-map/actions/workflows/ci.yml) 0002 0003 ## A C++ implementation of a fast hash map and hash set using robin hood hashing 0004 0005 The robin-map library is a C++ implementation of a fast hash map and hash set using open-addressing and linear robin hood hashing with backward shift deletion to resolve collisions. 0006 0007 Four classes are provided: `tsl::robin_map`, `tsl::robin_set`, `tsl::robin_pg_map` and `tsl::robin_pg_set`. The first two are faster and use a power of two growth policy, the last two use a prime growth policy instead and are able to cope better with a poor hash function. Use the prime version if there is a chance of repeating patterns in the lower bits of your hash (e.g. you are storing pointers with an identity hash function). See [GrowthPolicy](#growth-policy) for details. 0008 0009 A **benchmark** of `tsl::robin_map` against other hash maps may be found [here](https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html). This page also gives some advices on which hash table structure you should try for your use case (useful if you are a bit lost with the multiple hash tables implementations in the `tsl` namespace). 0010 0011 ### Key features 0012 0013 - Header-only library, just add the [include](include/) directory to your include path and you are ready to go. If you use CMake, you can also use the `tsl::robin_map` exported target from the [CMakeLists.txt](CMakeLists.txt). 0014 - Fast hash table, check the [benchmark](https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html) for some numbers. 0015 - Support for move-only and non-default constructible key/value. 0016 - Support for heterogeneous lookups allowing the usage of `find` with a type different than `Key` (e.g. if you have a map that uses `std::unique_ptr<foo>` as key, you can use a `foo*` or a `std::uintptr_t` as key parameter to `find` without constructing a `std::unique_ptr<foo>`, see [example](#heterogeneous-lookups)). 0017 - No need to reserve any sentinel value from the keys. 0018 - Possibility to store the hash value alongside the stored key-value for faster rehash and lookup if the hash or the key equal functions are expensive to compute. Note that hash may be stored even if not asked explicitly when the library can detect that it will have no impact on the size of the structure in memory due to alignment. See the [StoreHash](https://tessil.github.io/robin-map/classtsl_1_1robin__map.html#details) template parameter for details. 0019 - If the hash is known before a lookup, it is possible to pass it as parameter to speed-up the lookup (see `precalculated_hash` parameter in [API](https://tessil.github.io/robin-map/classtsl_1_1robin__map.html#a35021b11aabb61820236692a54b3a0f8)). 0020 - Support for efficient serialization and deserialization (see [example](#serialization) and the `serialize/deserialize` methods in the [API](https://tessil.github.io/robin-map/classtsl_1_1robin__map.html) for details). 0021 - The library can be used with exceptions disabled (through `-fno-exceptions` option on Clang and GCC, without an `/EH` option on MSVC or simply by defining `TSL_NO_EXCEPTIONS`). `std::terminate` is used in replacement of the `throw` instruction when exceptions are disabled. 0022 - API closely similar to `std::unordered_map` and `std::unordered_set`. 0023 0024 ### Differences compared to `std::unordered_map` 0025 0026 `tsl::robin_map` tries to have an interface similar to `std::unordered_map`, but some differences exist. 0027 - The **strong exception guarantee only holds** if the following statement is true `std::is_nothrow_swappable<value_type>::value && std::is_nothrow_move_constructible<value_type>::value` (where `value_type` is `Key` for `tsl::robin_set` and `std::pair<Key, T>` for `tsl::robin_map`). Otherwise if an exception is thrown during the swap or the move, the structure may end up in a undefined state. Note that per the standard, a `value_type` with a noexcept copy constructor and no move constructor also satisfies this condition and will thus guarantee the strong exception guarantee for the structure (see [API](https://tessil.github.io/robin-map/classtsl_1_1robin__map.html#details) for details). 0028 - The type `Key`, and also `T` in case of map, must be swappable. They must also be copy and/or move constructible. 0029 - Iterator invalidation doesn't behave in the same way, any operation modifying the hash table invalidate them (see [API](https://tessil.github.io/robin-map/classtsl_1_1robin__map.html#details) for details). 0030 - References and pointers to keys or values in the map are invalidated in the same way as iterators to these keys-values. 0031 - For iterators of `tsl::robin_map`, `operator*()` and `operator->()` return a reference and a pointer to `const std::pair<Key, T>` instead of `std::pair<const Key, T>` making the value `T` not modifiable. To modify the value you have to call the `value()` method of the iterator to get a mutable reference. Example: 0032 ```c++ 0033 tsl::robin_map<int, int> map = {{1, 1}, {2, 1}, {3, 1}}; 0034 for(auto it = map.begin(); it != map.end(); ++it) { 0035 //it->second = 2; // Illegal 0036 it.value() = 2; // Ok 0037 } 0038 ``` 0039 - No support for some buckets related methods (like `bucket_size`, `bucket`, ...). 0040 0041 These differences also apply between `std::unordered_set` and `tsl::robin_set`. 0042 0043 Thread-safety guarantees are the same as `std::unordered_map/set` (i.e. possible to have multiple readers with no writer). 0044 0045 ### Growth policy 0046 0047 The library supports multiple growth policies through the `GrowthPolicy` template parameter. Three policies are provided by the library but you can easily implement your own if needed. 0048 0049 * **[tsl::rh::power_of_two_growth_policy.](https://tessil.github.io/robin-map/classtsl_1_1rh_1_1power__of__two__growth__policy.html)** Default policy used by `tsl::robin_map/set`. This policy keeps the size of the bucket array of the hash table to a power of two. This constraint allows the policy to avoid the usage of the slow modulo operation to map a hash to a bucket, instead of <code>hash % 2<sup>n</sup></code>, it uses <code>hash & (2<sup>n</sup> - 1)</code> (see [fast modulo](https://en.wikipedia.org/wiki/Modulo_operation#Performance_issues)). Fast but this may cause a lot of collisions with a poor hash function as the modulo with a power of two only masks the most significant bits in the end. 0050 * **[tsl::rh::prime_growth_policy.](https://tessil.github.io/robin-map/classtsl_1_1rh_1_1prime__growth__policy.html)** Default policy used by `tsl::robin_pg_map/set`. The policy keeps the size of the bucket array of the hash table to a prime number. When mapping a hash to a bucket, using a prime number as modulo will result in a better distribution of the hash across the buckets even with a poor hash function. To allow the compiler to optimize the modulo operation, the policy use a lookup table with constant primes modulos (see [API](https://tessil.github.io/robin-map/classtsl_1_1rh_1_1prime__growth__policy.html#details) for details). Slower than `tsl::rh::power_of_two_growth_policy` but more secure. 0051 * **[tsl::rh::mod_growth_policy.](https://tessil.github.io/robin-map/classtsl_1_1rh_1_1mod__growth__policy.html)** The policy grows the map by a customizable growth factor passed in parameter. It then just use the modulo operator to map a hash to a bucket. Slower but more flexible. 0052 0053 0054 To implement your own policy, you have to implement the following interface. 0055 0056 ```c++ 0057 struct custom_policy { 0058 // Called on hash table construction and rehash, min_bucket_count_in_out is the minimum buckets 0059 // that the hash table needs. The policy can change it to a higher number of buckets if needed 0060 // and the hash table will use this value as bucket count. If 0 bucket is asked, then the value 0061 // must stay at 0. 0062 explicit custom_policy(std::size_t& min_bucket_count_in_out); 0063 0064 // Return the bucket [0, bucket_count()) to which the hash belongs. 0065 // If bucket_count() is 0, it must always return 0. 0066 std::size_t bucket_for_hash(std::size_t hash) const noexcept; 0067 0068 // Return the number of buckets that should be used on next growth 0069 std::size_t next_bucket_count() const; 0070 0071 // Maximum number of buckets supported by the policy 0072 std::size_t max_bucket_count() const; 0073 0074 // Reset the growth policy as if the policy was created with a bucket count of 0. 0075 // After a clear, the policy must always return 0 when bucket_for_hash() is called. 0076 void clear() noexcept; 0077 } 0078 ``` 0079 0080 ### Installation 0081 0082 To use robin-map, just add the [include](include/) directory to your include path. It is a **header-only** library. 0083 0084 If you use CMake, you can also use the `tsl::robin_map` exported target from the [CMakeLists.txt](CMakeLists.txt) with `target_link_libraries`. 0085 ```cmake 0086 # Example where the robin-map project is stored in a third-party directory 0087 add_subdirectory(third-party/robin-map) 0088 target_link_libraries(your_target PRIVATE tsl::robin_map) 0089 ``` 0090 0091 If the project has been installed through `make install`, you can also use `find_package(tsl-robin-map REQUIRED)` instead of `add_subdirectory`. 0092 0093 The library is available in [vcpkg](https://github.com/Microsoft/vcpkg/tree/master/ports/robin-map) and [conan](https://conan.io/center/tsl-robin-map). It's also present in [Debian](https://packages.debian.org/buster/robin-map-dev), [Ubuntu](https://packages.ubuntu.com/disco/robin-map-dev) and [Fedora](https://apps.fedoraproject.org/packages/robin-map-devel) package repositories. 0094 0095 The code should work with any C++11 standard-compliant compiler and has been tested with GCC 4.8.4, Clang 3.5.0 and Visual Studio 2015. 0096 0097 To run the tests you will need the Boost Test library and CMake. 0098 0099 ```bash 0100 git clone https://github.com/Tessil/robin-map.git 0101 cd robin-map/tests 0102 mkdir build 0103 cd build 0104 cmake .. 0105 cmake --build . 0106 ./tsl_robin_map_tests 0107 ``` 0108 0109 ### Usage 0110 0111 The API can be found [here](https://tessil.github.io/robin-map/). 0112 0113 All methods are not documented yet, but they replicate the behavior of the ones in `std::unordered_map` and `std::unordered_set`, except if specified otherwise. 0114 0115 0116 ### Example 0117 0118 ```c++ 0119 #include <cstdint> 0120 #include <iostream> 0121 #include <string> 0122 #include <tsl/robin_map.h> 0123 #include <tsl/robin_set.h> 0124 0125 int main() { 0126 tsl::robin_map<std::string, int> map = {{"a", 1}, {"b", 2}}; 0127 map["c"] = 3; 0128 map["d"] = 4; 0129 0130 map.insert({"e", 5}); 0131 map.erase("b"); 0132 0133 for(auto it = map.begin(); it != map.end(); ++it) { 0134 //it->second += 2; // Not valid. 0135 it.value() += 2; 0136 } 0137 0138 // {d, 6} {a, 3} {e, 7} {c, 5} 0139 for(const auto& key_value : map) { 0140 std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl; 0141 } 0142 0143 0144 if(map.find("a") != map.end()) { 0145 std::cout << "Found \"a\"." << std::endl; 0146 } 0147 0148 const std::size_t precalculated_hash = std::hash<std::string>()("a"); 0149 // If we already know the hash beforehand, we can pass it in parameter to speed-up lookups. 0150 if(map.find("a", precalculated_hash) != map.end()) { 0151 std::cout << "Found \"a\" with hash " << precalculated_hash << "." << std::endl; 0152 } 0153 0154 0155 /* 0156 * Calculating the hash and comparing two std::string may be slow. 0157 * We can store the hash of each std::string in the hash map to make 0158 * the inserts and lookups faster by setting StoreHash to true. 0159 */ 0160 tsl::robin_map<std::string, int, std::hash<std::string>, 0161 std::equal_to<std::string>, 0162 std::allocator<std::pair<std::string, int>>, 0163 true> map2; 0164 0165 map2["a"] = 1; 0166 map2["b"] = 2; 0167 0168 // {a, 1} {b, 2} 0169 for(const auto& key_value : map2) { 0170 std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl; 0171 } 0172 0173 0174 0175 0176 tsl::robin_set<int> set; 0177 set.insert({1, 9, 0}); 0178 set.insert({2, -1, 9}); 0179 0180 // {0} {1} {2} {9} {-1} 0181 for(const auto& key : set) { 0182 std::cout << "{" << key << "}" << std::endl; 0183 } 0184 } 0185 ``` 0186 0187 #### Heterogeneous lookups 0188 0189 Heterogeneous overloads allow the usage of other types than `Key` for lookup and erase operations as long as the used types are hashable and comparable to `Key`. 0190 0191 To activate the heterogeneous overloads in `tsl::robin_map/set`, the qualified-id `KeyEqual::is_transparent` must be valid. It works the same way as for [`std::map::find`](http://en.cppreference.com/w/cpp/container/map/find). You can either use [`std::equal_to<>`](http://en.cppreference.com/w/cpp/utility/functional/equal_to_void) or define your own function object. 0192 0193 Both `KeyEqual` and `Hash` will need to be able to deal with the different types. 0194 0195 ```c++ 0196 #include <functional> 0197 #include <iostream> 0198 #include <string> 0199 #include <tsl/robin_map.h> 0200 0201 0202 struct employee { 0203 employee(int id, std::string name) : m_id(id), m_name(std::move(name)) { 0204 } 0205 0206 // Either we include the comparators in the class and we use `std::equal_to<>`... 0207 friend bool operator==(const employee& empl, int empl_id) { 0208 return empl.m_id == empl_id; 0209 } 0210 0211 friend bool operator==(int empl_id, const employee& empl) { 0212 return empl_id == empl.m_id; 0213 } 0214 0215 friend bool operator==(const employee& empl1, const employee& empl2) { 0216 return empl1.m_id == empl2.m_id; 0217 } 0218 0219 0220 int m_id; 0221 std::string m_name; 0222 }; 0223 0224 // ... or we implement a separate class to compare employees. 0225 struct equal_employee { 0226 using is_transparent = void; 0227 0228 bool operator()(const employee& empl, int empl_id) const { 0229 return empl.m_id == empl_id; 0230 } 0231 0232 bool operator()(int empl_id, const employee& empl) const { 0233 return empl_id == empl.m_id; 0234 } 0235 0236 bool operator()(const employee& empl1, const employee& empl2) const { 0237 return empl1.m_id == empl2.m_id; 0238 } 0239 }; 0240 0241 struct hash_employee { 0242 std::size_t operator()(const employee& empl) const { 0243 return std::hash<int>()(empl.m_id); 0244 } 0245 0246 std::size_t operator()(int id) const { 0247 return std::hash<int>()(id); 0248 } 0249 }; 0250 0251 0252 int main() { 0253 // Use std::equal_to<> which will automatically deduce and forward the parameters 0254 tsl::robin_map<employee, int, hash_employee, std::equal_to<>> map; 0255 map.insert({employee(1, "John Doe"), 2001}); 0256 map.insert({employee(2, "Jane Doe"), 2002}); 0257 map.insert({employee(3, "John Smith"), 2003}); 0258 0259 // John Smith 2003 0260 auto it = map.find(3); 0261 if(it != map.end()) { 0262 std::cout << it->first.m_name << " " << it->second << std::endl; 0263 } 0264 0265 map.erase(1); 0266 0267 0268 0269 // Use a custom KeyEqual which has an is_transparent member type 0270 tsl::robin_map<employee, int, hash_employee, equal_employee> map2; 0271 map2.insert({employee(4, "Johnny Doe"), 2004}); 0272 0273 // 2004 0274 std::cout << map2.at(4) << std::endl; 0275 } 0276 ``` 0277 0278 0279 #### Serialization 0280 0281 The library provides an efficient way to serialize and deserialize a map or a set so that it can be saved to a file or send through the network. 0282 To do so, it requires the user to provide a function object for both serialization and deserialization. 0283 0284 ```c++ 0285 struct serializer { 0286 // Must support the following types for U: std::int16_t, std::uint32_t, 0287 // std::uint64_t, float and std::pair<Key, T> if a map is used or Key for 0288 // a set. 0289 template<typename U> 0290 void operator()(const U& value); 0291 }; 0292 ``` 0293 0294 ```c++ 0295 struct deserializer { 0296 // Must support the following types for U: std::int16_t, std::uint32_t, 0297 // std::uint64_t, float and std::pair<Key, T> if a map is used or Key for 0298 // a set. 0299 template<typename U> 0300 U operator()(); 0301 }; 0302 ``` 0303 0304 Note that the implementation leaves binary compatibility (endianness, float binary representation, size of int, ...) of the types it serializes/deserializes in the hands of the provided function objects if compatibility is required. 0305 0306 More details regarding the `serialize` and `deserialize` methods can be found in the [API](https://tessil.github.io/robin-map/classtsl_1_1robin__map.html). 0307 0308 ```c++ 0309 #include <cassert> 0310 #include <cstdint> 0311 #include <fstream> 0312 #include <type_traits> 0313 #include <tsl/robin_map.h> 0314 0315 0316 class serializer { 0317 public: 0318 serializer(const char* file_name) { 0319 m_ostream.exceptions(m_ostream.badbit | m_ostream.failbit); 0320 m_ostream.open(file_name, std::ios::binary); 0321 } 0322 0323 template<class T, 0324 typename std::enable_if<std::is_arithmetic<T>::value>::type* = nullptr> 0325 void operator()(const T& value) { 0326 m_ostream.write(reinterpret_cast<const char*>(&value), sizeof(T)); 0327 } 0328 0329 void operator()(const std::pair<std::int64_t, std::int64_t>& value) { 0330 (*this)(value.first); 0331 (*this)(value.second); 0332 } 0333 0334 private: 0335 std::ofstream m_ostream; 0336 }; 0337 0338 class deserializer { 0339 public: 0340 deserializer(const char* file_name) { 0341 m_istream.exceptions(m_istream.badbit | m_istream.failbit | m_istream.eofbit); 0342 m_istream.open(file_name, std::ios::binary); 0343 } 0344 0345 template<class T> 0346 T operator()() { 0347 T value; 0348 deserialize(value); 0349 0350 return value; 0351 } 0352 0353 private: 0354 template<class T, 0355 typename std::enable_if<std::is_arithmetic<T>::value>::type* = nullptr> 0356 void deserialize(T& value) { 0357 m_istream.read(reinterpret_cast<char*>(&value), sizeof(T)); 0358 } 0359 0360 void deserialize(std::pair<std::int64_t, std::int64_t>& value) { 0361 deserialize(value.first); 0362 deserialize(value.second); 0363 } 0364 0365 private: 0366 std::ifstream m_istream; 0367 }; 0368 0369 0370 int main() { 0371 const tsl::robin_map<std::int64_t, std::int64_t> map = {{1, -1}, {2, -2}, {3, -3}, {4, -4}}; 0372 0373 0374 const char* file_name = "robin_map.data"; 0375 { 0376 serializer serial(file_name); 0377 map.serialize(serial); 0378 } 0379 0380 { 0381 deserializer dserial(file_name); 0382 auto map_deserialized = tsl::robin_map<std::int64_t, std::int64_t>::deserialize(dserial); 0383 0384 assert(map == map_deserialized); 0385 } 0386 0387 { 0388 deserializer dserial(file_name); 0389 0390 /** 0391 * If the serialized and deserialized map are hash compatibles (see conditions in API), 0392 * setting the argument to true speed-up the deserialization process as we don't have 0393 * to recalculate the hash of each key. We also know how much space each bucket needs. 0394 */ 0395 const bool hash_compatible = true; 0396 auto map_deserialized = 0397 tsl::robin_map<std::int64_t, std::int64_t>::deserialize(dserial, hash_compatible); 0398 0399 assert(map == map_deserialized); 0400 } 0401 } 0402 ``` 0403 0404 ##### Serialization with Boost Serialization and compression with zlib 0405 0406 It is possible to use a serialization library to avoid the boilerplate. 0407 0408 The following example uses Boost Serialization with the Boost zlib compression stream to reduce the size of the resulting serialized file. The example requires C++20 due to the usage of the template parameter list syntax in lambdas, but it can be adapted to less recent versions. 0409 0410 ```c++ 0411 #include <boost/archive/binary_iarchive.hpp> 0412 #include <boost/archive/binary_oarchive.hpp> 0413 #include <boost/iostreams/filter/zlib.hpp> 0414 #include <boost/iostreams/filtering_stream.hpp> 0415 #include <boost/serialization/split_free.hpp> 0416 #include <boost/serialization/utility.hpp> 0417 #include <cassert> 0418 #include <cstdint> 0419 #include <fstream> 0420 #include <tsl/robin_map.h> 0421 0422 0423 namespace boost { namespace serialization { 0424 template<class Archive, class Key, class T> 0425 void serialize(Archive & ar, tsl::robin_map<Key, T>& map, const unsigned int version) { 0426 split_free(ar, map, version); 0427 } 0428 0429 template<class Archive, class Key, class T> 0430 void save(Archive & ar, const tsl::robin_map<Key, T>& map, const unsigned int /*version*/) { 0431 auto serializer = [&ar](const auto& v) { ar & v; }; 0432 map.serialize(serializer); 0433 } 0434 0435 template<class Archive, class Key, class T> 0436 void load(Archive & ar, tsl::robin_map<Key, T>& map, const unsigned int /*version*/) { 0437 auto deserializer = [&ar]<typename U>() { U u; ar & u; return u; }; 0438 map = tsl::robin_map<Key, T>::deserialize(deserializer); 0439 } 0440 }} 0441 0442 0443 int main() { 0444 tsl::robin_map<std::int64_t, std::int64_t> map = {{1, -1}, {2, -2}, {3, -3}, {4, -4}}; 0445 0446 0447 const char* file_name = "robin_map.data"; 0448 { 0449 std::ofstream ofs; 0450 ofs.exceptions(ofs.badbit | ofs.failbit); 0451 ofs.open(file_name, std::ios::binary); 0452 0453 boost::iostreams::filtering_ostream fo; 0454 fo.push(boost::iostreams::zlib_compressor()); 0455 fo.push(ofs); 0456 0457 boost::archive::binary_oarchive oa(fo); 0458 0459 oa << map; 0460 } 0461 0462 { 0463 std::ifstream ifs; 0464 ifs.exceptions(ifs.badbit | ifs.failbit | ifs.eofbit); 0465 ifs.open(file_name, std::ios::binary); 0466 0467 boost::iostreams::filtering_istream fi; 0468 fi.push(boost::iostreams::zlib_decompressor()); 0469 fi.push(ifs); 0470 0471 boost::archive::binary_iarchive ia(fi); 0472 0473 tsl::robin_map<std::int64_t, std::int64_t> map_deserialized; 0474 ia >> map_deserialized; 0475 0476 assert(map == map_deserialized); 0477 } 0478 } 0479 ``` 0480 0481 ### License 0482 0483 The code is licensed under the MIT license, see the [LICENSE file](LICENSE) for details.