Dynamically reordering jobs in a multiprocessing pool in Python
I'm writing a Python script (for Cygwin and Linux environments) to run regression testing on a program that is run from the command line using subprocess.Popen(). Basically, I have a set of jobs, a subset of which need to be run depending on the needs of the developer (on the order of 10 to 1000). Each job can take anywhere from a few seconds to 20 minutes to complete.
I have the jobs running successfully across multiple processors, but I'm trying to eke out some time savings by intelligently ordering the jobs (based on past performance) to run the longer jobs first. The complication is that some jobs (steady-state calculations) need to be run before others (the transients based on the initial conditions determined by the steady state).
My current method of handling this is to run the parent job and all of its child jobs recursively on the same process, but some jobs have multiple, long-running children. Once the parent job is complete, I'd like to add its children to the pool to be farmed out to the other processes, but they would need to be added to the head of the queue. I'm not sure I can do this with multiprocessing.Pool. I looked at examples with Manager, but those seem to be based on networking and not particularly applicable. Any help in the form of code or links to a good tutorial on multiprocessing (I've googled...) would be appreciated. Here's a skeleton of the code I've got so far, commented to point out where I want the child jobs to be spawned off on other processors.
```python
import multiprocessing
import subprocess

class Job(object):
    def __init__(self, popenargs, runtime, children):
        self.popenargs = popenargs  # list fed to Popen
        self.runtime = runtime      # approximate runtime of the job
        self.children = children    # jobs that require this job to run first

def runjob(job):
    subprocess.Popen(job.popenargs).wait()
    ####################################################
    # I want to remove this, and instead kick these out
    # to the pool:
    for j in job.children:
        runjob(j)
    ####################################################

def main(jobs):
    # The jobs argument contains only the jobs that are ready to run,
    # i.e. the parent-less jobs.
    jobs.sort(key=lambda job: job.runtime, reverse=True)
    multiprocessing.Pool(4).map(runjob, jobs)
```
First, let me second Armin Rigo's comment: there's no reason to use multiple processes here instead of multiple threads. In the controlling process you're spending most of your time waiting on subprocesses to finish; you don't have CPU-intensive work to parallelize.
Using threads will also make it easier to solve your main problem. Right now you're storing jobs in attributes of other jobs, an implicit dependency graph. You need a separate data structure that orders the jobs in terms of scheduling. Also, each tree of jobs is currently tied to one worker process. You want to decouple your workers from the data structure you use to hold the jobs. Then the workers each draw jobs from the same queue of tasks; after a worker finishes its job, it enqueues the job's children, which can then be handled by any available worker.
Since you want the child jobs to be inserted at the front of the line when their parent is finished, a stack-like container would seem to fit your needs. The queue module provides a thread-safe LifoQueue class that you can use.
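To illustrate the LIFO ordering, here is a minimal sketch (the job names are hypothetical): items come back out of a LifoQueue in the reverse of the order they were put in, so whatever was pushed most recently, such as the children a finishing parent just enqueued, is popped first:

```python
from queue import LifoQueue

q = LifoQueue()
for name in ["short_job", "medium_job", "long_job"]:
    q.put(name)  # the longest job is pushed last...

first = q.get()  # ...so it is popped first
print(first)     # -> long_job
```

This is also why the code below sorts jobs in ascending order of runtime before enqueuing them: the longest job ends up on top of the stack.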
```python
import threading
import subprocess
from queue import LifoQueue

class Job(object):
    def __init__(self, popenargs, runtime, children):
        self.popenargs = popenargs
        self.runtime = runtime
        self.children = children

def run_jobs(queue):
    while True:
        job = queue.get()
        subprocess.Popen(job.popenargs).wait()
        for child in job.children:
            queue.put(child)
        queue.task_done()

# The 'jobs' parameter contains only the jobs that have no parent.
def main(jobs):
    job_queue = LifoQueue()
    num_workers = 4
    jobs.sort(key=lambda job: job.runtime)
    for job in jobs:
        job_queue.put(job)
    for i in range(num_workers):
        t = threading.Thread(target=run_jobs, args=(job_queue,))
        t.daemon = True
        t.start()
    job_queue.join()
```
A couple of notes: (1) The main thread can't know when all the work is done by monitoring the worker threads, since the workers don't keep track of the work remaining. That's the queue's job. So the main thread monitors the queue object to know when all the work is complete (job_queue.join()). You can therefore mark the worker threads as daemon threads, so that the process will exit whenever the main thread does, without waiting on the workers. You thereby avoid the need for communication between the main thread and the worker threads in order to tell the latter when to break out of their loops and stop.
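A minimal sketch of the daemon-thread point (the worker body here is illustrative, not the answer's run_jobs): a daemon thread can sit in an infinite loop, and the interpreter still exits as soon as the main thread finishes, so no shutdown signal is needed:

```python
import threading
import time

def worker():
    # an endless loop, like the run_jobs loop above
    while True:
        time.sleep(0.1)

t = threading.Thread(target=worker)
t.daemon = True  # daemon: the interpreter won't wait for this thread
t.start()
# When the main thread reaches the end of the script, the process
# exits even though worker() never returns.
print(t.daemon)
```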
(2) We know all the work is done when every task that has been enqueued has been marked as done (specifically, when task_done() has been called a number of times equal to the number of items that have been enqueued). It wouldn't be reliable to use the queue's being empty as the condition that all work is done; the queue might be momentarily and misleadingly empty between a worker popping a job and enqueuing that job's children.
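As a sketch of that pitfall (the unfinished_tasks attribute is a CPython Queue internal, read here only for illustration): right after a worker pops the last queued job, the queue is empty even though that job's children are still on the way:

```python
from queue import LifoQueue

q = LifoQueue()
q.put("parent")

job = q.get()     # a worker pops the only queued job
print(q.empty())  # -> True: the queue looks drained...

q.put("child")    # ...but the worker then enqueues a child
q.task_done()     # and only now marks the parent as done

# join() would still block here: the child has been put() but not yet
# marked done, so the count of unfinished tasks is nonzero.
print(q.unfinished_tasks)  # -> 1
```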