c++ - unordered_map, why some buckets have more than 1 element even hashed values are all different? -


i playing default hash function std::unordered_map , checking if there collisions happening. here below code:

void check(std::map<long, bool> & mp, const string & s, const long v){     if(mp.count(v) == 0){         mp[v] = true;     }else{         cout<<"collision "<<s<<endl;     } } int main(){     std::unordered_map<std::string, long> um;     std::map<long, bool> map;     auto hasher = um.hash_function();      std::string str = "";  long count = 0;      (int = 0; < 26; ++i)     {         str = char(65+i);         um[str];         auto value = hasher(str);         check(map, str, value);          (int j = 0; j < 26; ++j)         {             str = char(65+i);             str += char(65+j);             um[str];             auto value = hasher(str);             check(map, str, value);              (int k = 0; k < 26; ++k)             {                 str = char(65+i);                 str += char(65+j);                 str += char(65+k);                 um[str];                 auto value = hasher(str);                 check(map, str, value);                  (int l = 0; l < 26; ++l)                 {                     str = char(65+i);                     str += char(65+j);                     str += char(65+k);                     str += char(65+l);                     um[str];                     auto value = hasher(str);                     check(map, str, value);                      (int m = 0; m < 26; ++m)                     {                         str = char(65+i);                         str += char(65+j);                         str += char(65+k);                         str += char(65+l);                         str += char(65+m);                         um[str];                         auto value = hasher(str);                         check(map, str, value);                     }                 }             }         }     }         for(int = 0; < um.bucket_count(); ++i){         if(um.bucket_size(i) >= 2)cout<<"um_collison happened @ "<<i<<"  "<<um.bucket_size(i)<<endl;     }      return 0; } 

the problem have found that, there no output collision for, there plenty of outputs um_collison happened @ 17961055 3, using g++ 4.9.2, why strings hashed different values buckets still have more 1 element?

the value returned hasher function size_t. hasher function assumed give @ least weakly pseudorandom (i.e. "pairwise independent" or similar) output when given random input. however, output of hasher function not used directly bin number of given input. require std::unordered_map has 2^64 bins on 64 bit machine... way memory.

instead, based on current number of elements, there number of bins, b. map take output of hasher, modulo b, usually, map items bins. items different hashes can go same bin. once hash table gets tightly packed, depending on heuristics, b increased , items rehashed reduce number of collisions. is, recompute hashes , take them modulo b' instead larger b'. how works implementation detail.


Comments

Popular posts from this blog

php - Admin SDK -- get information about the group -

dns - How To Use Custom Nameserver On Free Cloudflare? -

Python Error - TypeError: input expected at most 1 arguments, got 3 -