C++ multithreading with openMP: poor performance despite localized variables (false sharing?) -


i have run weird openmp problem.

the task take vector of strings , split each element contained k-mers (all contained substrings of length k). should parallelize trivially along elements of vector, k-merification procedure happens independently each element. want store results in map/set stl data structure (std::map<long long, std::map<std::string, std::set<unsigned int> > > local_forreturn) , , allocate thread-local variable that.

the achieved parallelization behaviour, however, surprisingly bad - top on linux shows cpu usage of ~ 200%, despite running 40 threads on 40 core machine. (and have tested #omp critical section not bottleneck).

my hunch might related false sharing, actual data contained in localized map/set stl classes end on heap. however, have neither idea of how test intuition, or how reduce false sharing stl constructs (if problem). appreciate ideas!

complete code:

#include <string> #include <assert.h> #include <set> #include <map> #include <vector> #include <omp.h> #include <iostream>  int threads = 40; int k = 31;  std::string generaterandomsequence(int length); char randomnucleotide(); std::vector<std::string> partitionstringintokmers(std::string str, int k);  int main(int argc, char *argv[]) {     // generate test data     std::vector<std::string> requiredseq;     for(unsigned int = 0; < 10000; i++)     {         std::string seq = generaterandomsequence(20000);         requiredseq.push_back(seq);     }      // variable contain final result     std::map<long long, std::map<std::string, std::map<unsigned int, int> > > forreturn;      omp_set_num_threads(threads);      std::cerr << "data generated, start parallel processing\n" << std::flush;      // split workload (ie requiredseq) according number of threads     long long max_i = requiredseq.size() - 1;     long long chunk_size = max_i / threads;     #pragma omp parallel     {         assert(omp_get_num_threads() == threads);         long long thisthread = omp_get_thread_num();         long long firstpair = thisthread * chunk_size;         long long lastpair = (thisthread+1) * chunk_size - 1;         if((thisthread == (threads-1)) && (lastpair < max_i))         {             lastpair = max_i;         }          std::map<long long, std::map<std::string, std::map<unsigned int, int> > > local_forreturn;          for(long long seqi = firstpair; seqi <= lastpair; seqi++)         {             const std::string& seq_sequence = requiredseq.at(seqi);              const std::vector<std::string> kmersinsegment = partitionstringintokmers(seq_sequence, k);             for(unsigned int kmeri = 0; kmeri < kmersinsegment.size(); kmeri++)             {                 const std::string& kmerseq = kmersinsegment.at(kmeri);                 local_forreturn[seqi][kmerseq][kmeri]++;             }            }          #pragma omp critical         {             forreturn.insert(local_forreturn.begin(), local_forreturn.end());         }     }      return 0;    }  std::string generaterandomsequence(int length) {     std::string forreturn;     forreturn.resize(length);     for(int = 0; < length; i++)     {         forreturn.at(i) = randomnucleotide();     }     return forreturn; }  char randomnucleotide() {     char nucleotides[4] = {'a', 'c', 'g', 't'};     int n = rand() % 4;     assert((n >= 0) && (n <= 3));     return nucleotides[n]; }   std::vector<std::string> partitionstringintokmers(std::string str, int k) {     std::vector<std::string> forreturn;     if((int)str.length() >= k)     {         forreturn.resize((str.length() - k)+1);          for(int = 0; <= (int)(str.length() - k); i++)         {             std::string kmer = str.substr(i, k);             assert((int)kmer.length() == k);             forreturn.at(i) = kmer;         }     }     return forreturn; } 


Comments

Popular posts from this blog

c# - Send Image in Json : 400 Bad request -

jquery - Fancybox - apply a function to several elements -

An easy way to program an Android keyboard layout app -