Dynamically reordering jobs in a multiprocessing pool in Python


I'm writing a Python script (for Cygwin and Linux environments) to run regression testing on a program that is run from the command line using subprocess.Popen(). Basically, I have a set of jobs, a subset of which need to be run depending on the needs of the developer (on the order of 10 to 1000). Each job can take anywhere from a few seconds to 20 minutes to complete.

I have the jobs running across multiple processors, but I'm trying to eke out some time savings by intelligently ordering the jobs (based on past performance) to run the longer jobs first. The complication is that some jobs (steady-state calculations) need to be run before others (the transients based on initial conditions determined by the steady state).

My current method of handling this is to run the parent job and all of its child jobs recursively on the same process, but some jobs have multiple, long-running children. Once the parent job is complete, I'd like to add its children back to the pool to be farmed out to other processes, but they would need to be added to the head of the queue. I'm not sure I can do this with multiprocessing.Pool. I looked for examples using Manager, but those all seem to be based on networking and don't seem particularly applicable. Any help in the form of code, or links to a good tutorial on multiprocessing (I've googled...), would be appreciated. Here's a skeleton of the code I've got so far, commented to point out where I want the child jobs to be spawned off on other processors.

import multiprocessing
import subprocess

class Job(object):
  def __init__(self, popenargs, runtime, children):
    self.popenargs = popenargs  # list fed to Popen
    self.runtime = runtime      # approximate runtime of the job
    self.children = children    # jobs that require this job to run first

def runjob(job):
  subprocess.Popen(job.popenargs).wait()
  ####################################################
  # I want to remove this, and instead kick these
  # child jobs out to the pool
  for j in job.children:
    runjob(j)
  ####################################################

def main(jobs):
  # The jobs argument contains the jobs that are ready to run,
  # i.e. the parent-less jobs
  jobs.sort(key=lambda job: job.runtime, reverse=True)
  multiprocessing.Pool(4).map(runjob, jobs)

First, let me second Armin Rigo's comment: there's no reason to use multiple processes here instead of multiple threads. In your controlling process you're spending most of your time waiting on subprocesses to finish; you don't have CPU-intensive work to parallelize.

Using threads will also make it easier to solve your main problem. Right now you're storing jobs in attributes of other jobs, an implicit dependency graph. You need a separate data structure that orders the jobs in terms of scheduling. Also, each tree of jobs is currently tied to one worker process. You want to decouple your workers from the data structure you use to hold the jobs. Then the workers each draw jobs from the same queue of tasks; after a worker finishes its job, it enqueues the job's children, which can then be handled by any available worker.

Since you want the child jobs inserted at the front of the line when their parent is finished, a stack-like container would seem to fit your needs; the queue module provides a thread-safe LifoQueue class that you can use.
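A quick sanity check of the LIFO behavior (a minimal sketch; the strings are just placeholder items):

from queue import LifoQueue

q = LifoQueue()
for name in ('first', 'second', 'third'):
  q.put(name)

print(q.get())  # 'third' -- the most recently added item comes out first
print(q.get())  # 'second'

With that in place, here's the reworked code: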

import threading
import subprocess
from queue import LifoQueue

class Job(object):
  def __init__(self, popenargs, runtime, children):
    self.popenargs = popenargs
    self.runtime = runtime
    self.children = children

def run_jobs(queue):
  while True:
    job = queue.get()
    subprocess.Popen(job.popenargs).wait()
    # Now that the parent is done, its children are ready:
    # push them onto the stack so they're picked up next.
    for child in job.children:
      queue.put(child)
    queue.task_done()

# The 'jobs' parameter contains the jobs that have no parent.
def main(jobs):
  job_queue = LifoQueue()
  num_workers = 4
  # Sort ascending: the longest job ends up on top of the stack,
  # so it's popped first.
  jobs.sort(key=lambda job: job.runtime)
  for job in jobs:
    job_queue.put(job)
  for i in range(num_workers):
    t = threading.Thread(target=run_jobs, args=(job_queue,))
    t.daemon = True
    t.start()
  job_queue.join()

A couple of notes: (1) We can't know when all the work is done by monitoring the worker threads, since they don't keep track of the work to be done. That's the queue's job. So the main thread monitors the queue object to know when all the work is complete (job_queue.join()). We can thus mark the worker threads as daemon threads, so the process will exit whenever the main thread does, without waiting on the workers. We thereby avoid the need for communication between the main thread and the worker threads in order to tell the latter when to break out of their loops and stop.

(2) We know all the work is done when all tasks that have been enqueued have been marked as done (specifically, when task_done() has been called a number of times equal to the number of items that have been enqueued). It wouldn't be reliable to use the queue's being empty as the condition that all work is done; the queue might be momentarily and misleadingly empty between popping a job off it and enqueuing that job's children.
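For completeness, here's how you might drive the code above (a minimal sketch; the sleep commands and runtimes are just stand-ins for your real steady-state and transient jobs):

if __name__ == '__main__':
  # A transient job that can only run after its steady-state parent.
  transient = Job(['sleep', '2'], runtime=2, children=[])
  steady_state = Job(['sleep', '5'], runtime=5, children=[transient])
  independent = Job(['sleep', '1'], runtime=1, children=[])

  # Only parent-less jobs go in; children are enqueued by the workers
  # as their parents finish.
  main([steady_state, independent])

main() returns once every job, children included, has been marked done via task_done().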

