C# - Conflicting results when timing how long a regex takes to find a match
What I wanted: I had to see, and prove, whether a simple regex runs slower than a compiled one or not. I decided to generate 100000 random strings, make 2 regexes (1 simple and 1 compiled), and check for a match in the set of 100000 strings. Also, to have many outputs to draw conclusions from, I took the 100000 strings, ran the code, and logged the timings; then I took the first 80000 strings of the same set and logged the output, then the first 50000 strings of the same set and logged the output, and so on...
The code:
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Text.RegularExpressions;

    var chars = "abcdefghijklmnopqrstuvwxyz";
    var random = new Random();
    int[] reps = new int[] { 1, 10, 100, 1000, 2000, 5000, 8000, 10000, 20000, 50000, 80000, 100000 };
    List<string> strings = new List<string>();

    // generating random strings
    for (int i = 0; i < reps[reps.Length - 1]; i++)
        strings.Add(new string(
            Enumerable.Repeat(chars, 8)
                .Select(s => s[random.Next(s.Length)])
                .ToArray()));

    string regexStr = "[aeiou]{2,3}(qwerty|asdfgh|zxcvbn){}";
    Regex regexSimple = new Regex(regexStr);
    Regex regexCompiled = new Regex(regexStr, RegexOptions.Compiled);

    using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"c:\users\harshittiwari\desktop\assignment1\test2.txt"))
    {
        file.WriteLine("numberofstrings,ticksforsimpleregex,ticksforcomplexregex");
        List<long> simple = new List<long>();
        List<long> compiled = new List<long>();

        // time the simple (interpreted) regex, largest subset first
        for (int j = reps.Length - 1; j >= 0; j--)
        {
            Stopwatch time1 = Stopwatch.StartNew();
            for (int k = 0; k < reps[j]; k++)
                regexSimple.Matches(strings[k]);
            time1.Stop();
            simple.Add(time1.ElapsedTicks);
        }

        // time the compiled regex over the same subsets
        for (int j = reps.Length - 1; j >= 0; j--)
        {
            Stopwatch time1 = Stopwatch.StartNew();
            for (int k = 0; k < reps[j]; k++)
                regexCompiled.Matches(strings[k]);
            time1.Stop();
            compiled.Add(time1.ElapsedTicks);
        }

        // write one line per subset size
        for (int j = reps.Length - 1, k = 0; j >= 0; j--, k++)
            file.WriteLine(reps[j] + "," + simple[k] + "," + compiled[k]);
    }

The problem: the output is weird and hard to explain. The output is:
    numberofstrings,ticksforsimpleregex,ticksforcomplexregex
    100000,300368,217506
    80000,240373,201553
    50000,178212,98878
    20000,13362,202933
    10000,6417,6377
    8000,5868,7408
    5000,3737,3142
    2000,160473,1921
    1000,1351,1883
    100,84,141
    10,23,21
    1,17,17

Note that ticksforsimpleregex for numberofstrings=2000 is greater than for numberofstrings=5000. Every time I run the program it's more or less the same. Is this because of a caching problem or due to compiler optimization? Also, what should I do to make the output consistent? By consistent I mean that ticksforsimpleregex should be in decreasing order (basic logic: as the number of strings decreases, the time taken should decrease).
What I originally wanted was to read a smaller number of strings first and then move on to larger numbers of strings, like:
    1,..,..
    10,..,..
    100,..,..
    1000,..,..
    ...
    80000,..,..
    100000,..,..

However, I realized there could be caching issues and decided to go with the order that is there now.
Edit 1: I read at http://allben.net/post/2009/08/06/performance-compiled-vs-interpreted-regular-expressions that the interpreted (simple) regex should have taken less time, but in our case that is not the result. Why so?
When using the IsMatch() method instead of Matches(), the code does not create a large number of objects that need to be collected (and that get collected at the points where you see the performance hits). I seem to be getting quite consistent results that way.
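A minimal sketch of what the IsMatch() variant might look like. The names RegexTiming, MakeStrings and TimeIsMatch are hypothetical, and the warm-up call, the fixed random seed and the explicit GC.Collect() before each measurement are assumptions added here to reduce noise; they are not part of the original code or of the measurements reported above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RegexTiming
{
    // Builds `count` random lowercase strings of `length` characters.
    // A fixed seed (an assumption) makes the input repeatable between runs.
    public static List<string> MakeStrings(int count, int length, int seed)
    {
        const string chars = "abcdefghijklmnopqrstuvwxyz";
        var random = new Random(seed);
        var result = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            var buffer = new char[length];
            for (int c = 0; c < length; c++)
                buffer[c] = chars[random.Next(chars.Length)];
            result.Add(new string(buffer));
        }
        return result;
    }

    // Times regex.IsMatch over the first `count` strings and returns the
    // elapsed Stopwatch ticks; `hits` reports how many strings matched.
    public static long TimeIsMatch(Regex regex, List<string> strings, int count, out int hits)
    {
        if (count > strings.Count)
            throw new ArgumentOutOfRangeException(nameof(count));

        // Warm-up call so one-time setup cost is not counted (assumption).
        if (strings.Count > 0)
            regex.IsMatch(strings[0]);

        // Collect leftover garbage before timing so an earlier run's
        // allocations are less likely to trigger a GC mid-measurement.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        hits = 0;
        var watch = Stopwatch.StartNew();
        for (int k = 0; k < count; k++)
        {
            // IsMatch returns a bool and does not build a MatchCollection.
            if (regex.IsMatch(strings[k]))
                hits++;
        }
        watch.Stop();
        return watch.ElapsedTicks;
    }
}
```

To compare the two regexes, one could call TimeIsMatch once with a Regex built without options and once with one built using RegexOptions.Compiled, for each subset size in reps, and write the returned ticks to the file as before. Counting the hits also gives a cheap check that both regexes agree on the same input.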