Python - multiprocessing.Pool context and load balancing


I've encountered some unexpected behaviour of the Python multiprocessing Pool class.

Here are my questions:
1) When does Pool create the context that is later used for serialization?
The example below runs fine as long as the Pool object is created after the Container definition. If you swap the Pool initializations, a serialization error occurs. In my production code I would like to initialize the Pool well before defining the container class. Is it possible to refresh the Pool "context", or to achieve this in another way?
2) Does Pool have its own load-balancing mechanism, and if so, how does it work?
If I run a similar example on my i7 machine with a pool of 8 processes, I get the following results:
- For a light evaluation function, Pool favours using just 1 process for the computation. It creates the 8 processes as requested, but most of the time only 1 is used (I printed the PID inside the function, and I can also see it in htop).
- For a heavy evaluation function, the behaviour is as expected: all 8 processes are used equally.

3) When using Pool I always see 4 more processes than I requested (i.e. for Pool(processes=2) I see 6 new processes). What is their role?

I use Linux with Python 2.7.2.

    from multiprocessing import Pool
    from datetime import datetime

    POWER = 10

    def eval_power(container):
        for power in xrange(2, POWER):
            container.val **= power
        return container

    #processes = Pool(processes=2)  # creating the pool here, before Container, gives the serialization error

    class Container(object):
        def __init__(self, value):
            self.val = value

    processes = Pool(processes=2)  # created after Container: works

    if __name__ == "__main__":
        cont = [Container(foo) for foo in xrange(20)]
        then = datetime.now()
        processes.map(eval_power, cont)
        now = datetime.now()
        print "Eval time:", now - then
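The ordering constraint from question 1 follows from how the workers are started on Linux: they are forked when the Pool is constructed, so they only know about the classes and functions that already existed in the parent at that moment. Below is a minimal sketch of an arrangement that avoids the problem, assuming Python 3 and an explicitly requested "fork" start method to mirror the Linux/2.7 setup: everything the workers need is defined first, and the Pool is created afterwards inside a function. The helper `run()` and the reduced `POWER` value are illustrative choices, not part of the original code.

```python
import multiprocessing
from datetime import datetime

POWER = 4  # smaller than the question's 10 so the sketch finishes quickly


class Container(object):
    def __init__(self, value):
        self.val = value


def eval_power(container):
    for power in range(2, POWER):
        container.val **= power
    return container


def run(n_items=20, n_procs=2):
    # Hypothetical helper: the pool is created only after Container and
    # eval_power exist, so the forked workers inherit both definitions and
    # can unpickle the tasks they receive.
    ctx = multiprocessing.get_context("fork")  # assumption: Linux-style fork
    with ctx.Pool(processes=n_procs) as pool:
        start = datetime.now()
        results = pool.map(eval_power, [Container(v) for v in range(n_items)])
        print("Eval time:", datetime.now() - start)
    return [c.val for c in results]


if __name__ == "__main__":
    print(run())
```

With `POWER = 4` each value is raised to the 2nd and then the 3rd power, so `run()` returns `v ** 6` for every input.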


EDIT - for Bakuriu:
1) I'm afraid that's the case.
2) I don't understand what the Linux scheduler has to do with Python assigning computations to processes. My situation can be illustrated by the example below:

    from multiprocessing import Pool
    from os import getpid
    from collections import Counter


    def light_func(ind):
        # almost no work: just report which process ran the task
        return getpid()


    def heavy_func(ind):
        # busy loop so each task takes a noticeable amount of time
        for foo in xrange(1000000):
            ind += foo
        return getpid()


    if __name__ == "__main__":
        list_ = range(100)
        pool = Pool(4)
        l_func = pool.map(light_func, list_)
        h_func = pool.map(heavy_func, list_)

        print "light func:", Counter(l_func)
        print "heavy func:", Counter(h_func)


On my i5 machine (4 threads) I get the following results:

    light func: Counter({2967: 100})
    heavy func: Counter({2969: 28, 2967: 28, 2968: 23, 2970: 21})

It seems the situation is as I've described it. I still don't understand why Python does it this way. My guess is that it tries to minimise communication expenses, but the mechanism it uses for load balancing is still unknown to me. The documentation isn't very helpful either; the multiprocessing module is poorly documented.
3) If I run the above code I get 4 more processes, as described before. Here is a screenshot from htop: http://i.stack.imgur.com/pldmm.png

  1. The Pool object creates the subprocesses during the call to __init__, hence you must define Container before it. By the way, I wouldn't include all the code in a single file: use a module to implement Container and the other utilities, and write a small file that launches the main program.

  2. The Pool does exactly what is described in the documentation. In particular, it has no control over the scheduling of the processes, hence what you see is what Linux's scheduler thinks is right. Small computations take so little time that the scheduler doesn't bother parallelizing them (this probably gives better performance due to core affinity, etc.).

  3. Could you show an example and what you see in the task manager? I think they may be the processes that handle the queue inside the pool, but I'm not sure. On my machine I can see only the main process plus the 2 subprocesses.
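One way to check point 3, sketched under the assumption of CPython on Linux: count the threads of the parent process before and after creating a Pool. In CPython the Pool starts a few helper threads in the parent to manage its worker, task and result queues, and htop lists threads as separate rows by default (the H key toggles them), so these threads may account for the extra entries. The function `helper_threads_started` is a hypothetical name used only for this check.

```python
import multiprocessing
import threading


def helper_threads_started(n_procs=2):
    """Return how many extra threads the parent gains when a Pool is created,
    together with the names of all threads alive at that point."""
    before = threading.active_count()
    ctx = multiprocessing.get_context("fork")  # assumption: Linux, as in the question
    pool = ctx.Pool(processes=n_procs)
    try:
        after = threading.active_count()
        # Thread names depend on the Python version; recent CPython versions
        # include the handler function name in them.
        names = [t.name for t in threading.enumerate()]
    finally:
        pool.close()
        pool.join()
    return after - before, names


if __name__ == "__main__":
    extra, names = helper_threads_started()
    print("extra threads in parent:", extra)
    print(names)
```

If the extra count plus the main process matches the surplus rows in htop, the "additional processes" are threads of the parent rather than extra workers.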


Update on point 2:

The Pool object puts the tasks into a queue, and the child processes get their arguments from this queue. If a process takes almost no time to execute an item, the Linux scheduler lets it run for more time (hence consuming more items from the queue). If the execution takes more time, the scheduler switches processes, and the other child processes get to execute as well.
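The queue-and-workers model described above can be sketched with plain `multiprocessing.Queue` objects. This is an illustration of the idea only, not Pool's actual implementation (Pool also batches tasks into chunks and manages its own internal queues); it assumes Python 3 on Linux with the fork start method, and the names `worker` and `run_queue_demo` are hypothetical.

```python
import multiprocessing
import os
from collections import Counter


def worker(task_queue, result_queue):
    # Each worker pulls whatever item is next; nothing assigns items to a
    # particular worker, so one fast worker can drain most of the queue.
    for item in iter(task_queue.get, None):  # None is the stop sentinel
        result_queue.put((os.getpid(), item * item))


def run_queue_demo(n_items=100, n_workers=4):
    ctx = multiprocessing.get_context("fork")  # assumption: Linux fork
    tasks, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for i in range(n_items):
        tasks.put(i)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker
    # Collect every result before joining, so no worker blocks on a full pipe.
    collected = [results.get() for _ in range(n_items)]
    for p in procs:
        p.join()
    pids = Counter(pid for pid, _ in collected)
    values = sorted(v for _, v in collected)
    return pids, values


if __name__ == "__main__":
    pids, values = run_queue_demo()
    print("items per worker:", pids)
```

The per-PID counts printed here vary between runs, which is the point: the distribution is decided by which process happens to be free when an item is available.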

In your case a single process consumes all the items because each computation takes so little time that it has finished all the items before the other child processes are ready.

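As I said, the Pool doesn't do any balancing of the work among the subprocesses. It's just a queue and a bunch of workers: the pool puts items in the queue, and the processes take the items and compute the results. AFAIK the only thing you can control is how many tasks are put into a single item of the queue (see the documentation for the chunksize argument), but there is no guarantee about which process will grab which task. Everything else is left to the OS.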
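The one knob mentioned above is the `chunksize` argument of `Pool.map`, which sets how many arguments go into each queue item. A minimal sketch, assuming Python 3 on Linux (fork start method), for comparing the PID distribution of a light function with the default chunksize and with `chunksize=1`. Smaller chunks give idle workers more chances to pick up work, but which process grabs which item is still not guaranteed, and the counts will differ from run to run. `count_pids` is a hypothetical helper.

```python
import multiprocessing
import os
from collections import Counter


def light_func(ind):
    # Almost no work: just report which process handled the argument.
    return os.getpid()


def count_pids(chunksize=None, n_items=100, n_procs=4):
    ctx = multiprocessing.get_context("fork")  # assumption: Linux fork
    with ctx.Pool(n_procs) as pool:
        # chunksize=None lets Pool.map choose its own batch size;
        # chunksize=1 puts every argument into a separate queue item.
        pids = pool.map(light_func, range(n_items), chunksize=chunksize)
    return Counter(pids)


if __name__ == "__main__":
    print("default chunksize:", count_pids())
    print("chunksize=1:      ", count_pids(chunksize=1))
```

Per-item queueing adds some communication overhead, so `chunksize=1` is a way to observe the distribution rather than a general recommendation for light tasks.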

On my machine the results are less extreme: with the light computation, 2 processes get about twice as many calls as the other 2, while with the heavy one all processes handle more or less the same number of items. On different OSes and/or hardware you will probably obtain different results.

