C# - Dumping a ConcurrentBag accessed by multiple threads to a file is not fast enough


I have built code that runs string comparisons between a large number of strings in parallel, to make it go faster.

I've used a ConcurrentBag so that the threads (tasks) can write to a thread-safe collection, while another task dumps the collection to a file.

The issue I have is that the ConcurrentBag&lt;string&gt; log is filled faster than it can be written to the file, so the program consumes more and more RAM continuously until it runs out of memory.

My question is: what can I do? Improve the writing of the log? Pause the tasks until the ConcurrentBag is dumped, then resume them? What is the fastest option?

Here is my code:

    CsvWriter csv = new CsvWriter(@"C:\test.csv");

    List<Bailleur> bailleurs = DataLoader.LoadBailleurs();
    ConcurrentBag<string> log = new ConcurrentBag<string>();
    int i = 0;

    var taskWriteToLog = new Task(() =>
    {
        // Consume the items in the bag
        string item;
        while (true)  // (!log.IsEmpty)
        {
            if (!log.IsEmpty)
            {
                if (log.TryTake(out item))
                {
                    csv.WriteLine(item);
                }
                else
                    Console.WriteLine("Concurrent bag busy");
            }
            else
            {
                System.Threading.Thread.Sleep(1000);
            }
        }
    });

    taskWriteToLog.Start();

    Parallel.ForEach(bailleurs, s1 =>
    {
        foreach (Bailleur s2 in bailleurs)
        {
            var lcs2 = LongestCommonSubsequenceExtensions.LongestCommonSubsequence(s1.Name, s2.Name);
            string line = string.Format("\"LCS\",\"{0}\",\"{1}\",\"{2}\"", s1.Name, s2.Name, lcs2.Item2);
            log.Add(line);
            // Console.WriteLine(line);

            var dic = DiceCoefficientExtensions.DiceCoefficient(s1.Name, s2.Name);
            line = string.Format("\"DICE\",\"{0}\",\"{1}\",\"{2}\"", s1.Name, s2.Name, dic);
            log.Add(line);
            // Console.WriteLine(line);
        }
        i++;
        Console.WriteLine(i);
    });

    public class CsvWriter
    {
        public string FilePath { get; set; }
        private FileStream _fs { get; set; }
        private StreamWriter _sw { get; set; }

        public CsvWriter(string filePath)
        {
            FilePath = filePath;
            _fs = new FileStream(filePath, FileMode.Create, FileAccess.Write);
            _sw = new StreamWriter(_fs);
        }

        public void WriteLine(string line)
        {
            _sw.WriteLine(line);
        }
    }
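A side note on the writer itself: as posted, CsvWriter never flushes or disposes its StreamWriter, so buffered lines can be lost when the process ends. A minimal sketch of a disposable variant (the IDisposable wiring here is my addition, not part of the original code):

    using System;
    using System.IO;

    public class CsvWriter : IDisposable
    {
        public string FilePath { get; private set; }
        private readonly StreamWriter _sw;

        public CsvWriter(string filePath)
        {
            FilePath = filePath;
            // StreamWriter takes ownership of the FileStream and closes it on Dispose.
            _sw = new StreamWriter(new FileStream(filePath, FileMode.Create, FileAccess.Write));
        }

        public void WriteLine(string line)
        {
            _sw.WriteLine(line);
        }

        public void Dispose()
        {
            // Flushes any buffered lines and closes the file.
            _sw.Dispose();
        }
    }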

Don't use the ConcurrentBag directly; use a BlockingCollection with a ConcurrentBag as its backing store (by default it uses a ConcurrentQueue).

One of the constructor overloads lets you set an upper limit on the size of the collection; if the bag gets full, it blocks the inserting thread until there is room to insert.

It also gives you GetConsumingEnumerable(), which makes it easy to take items out of the bag. Use it in a foreach loop and it will keep feeding the consumer data until CompleteAdding is called; after that, it runs until the bag is empty and then exits like any other IEnumerable that has completed. If the bag "goes dry" before CompleteAdding is called, it blocks the thread and automatically resumes when more data is put in the bag.

    void ProcessLog()
    {
        CsvWriter csv = new CsvWriter(@"C:\test.csv");

        List<Bailleur> bailleurs = DataLoader.LoadBailleurs();

        const int MAX_BAG_SIZE = 500;
        BlockingCollection<string> log = new BlockingCollection<string>(new ConcurrentBag<string>(), MAX_BAG_SIZE);

        int i = 0;

        var taskWriteToLog = new Task(() =>
        {
            // Consume the items in the bag; no need for sleeps or polling.
            // When items are available it runs; when the bag is empty and
            // CompleteAdding has not been called, it blocks.
            foreach (string item in log.GetConsumingEnumerable())
            {
                csv.WriteLine(item);
            }
        });

        taskWriteToLog.Start();

        Parallel.ForEach(bailleurs, s1 =>
        {
            // Snip... you can switch to the BlockingCollection without changes to this section of code.
        });

        log.CompleteAdding(); // Lets anyone using GetConsumingEnumerable know that no new items are coming, so the foreach loop can exit when the bag becomes empty.
    }
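To make the snipped producer section concrete, here is a hedged sketch of how it could look, continuing the ProcessLog method above (Bailleur, DataLoader and the LCS/Dice extension methods are the question's own types; the disposable CsvWriter is the variant sketched earlier, and the shutdown ordering is my suggestion rather than part of the original answer):

    Parallel.ForEach(bailleurs, s1 =>
    {
        foreach (Bailleur s2 in bailleurs)
        {
            var lcs2 = LongestCommonSubsequenceExtensions.LongestCommonSubsequence(s1.Name, s2.Name);
            // Add blocks when the bounded collection already holds MAX_BAG_SIZE
            // items; this is what throttles the producers and caps memory use.
            log.Add(string.Format("\"LCS\",\"{0}\",\"{1}\",\"{2}\"", s1.Name, s2.Name, lcs2.Item2));

            var dic = DiceCoefficientExtensions.DiceCoefficient(s1.Name, s2.Name);
            log.Add(string.Format("\"DICE\",\"{0}\",\"{1}\",\"{2}\"", s1.Name, s2.Name, dic));
        }
    });

    log.CompleteAdding();

    // Wait for the consumer task to drain the collection before closing the
    // writer, otherwise the tail of the log can be lost.
    taskWriteToLog.Wait();
    csv.Dispose();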
