Parallel computing in R (Windows): changing code from foreach %do% to foreach %dopar%


I have written code to run several time-series rolling regressions on multiple securities. Since there are more than 10,000 securities, each with more than 200 rolling windows, the runtime of the sequential set-up (using foreach %do%) is 30 min.

I would like to use foreach %dopar% parallel computing instead, with the "doParallel" backend. Simply changing %do% to %dopar% in the code doesn't do the trick. I am new to parallel computing methods, so I hope you can help.

Here is my foreach %do% code:

    library(zoo)       # merge() and rollapply() for zoo objects
    library(foreach)

    sec = ncol(ret.zoo)                # number of securities
    num.factors = 2
    rows = nrow(ret.zoo) - 60 + 1      # number of 60-period windows per security
    beta.temp = matrix(nc = num.factors + 1, nr = sec*rows)
    gvkey.vec = matrix(nc = 1, nr = sec*rows)

    d = 1                              # first row of the current security's block
    foreach(i=1:sec) %do% {
        df = merge(ret.zoo[,i], data)
        names(df) <- c("return", names(data))
        gvkey = substr(colnames(ret.zoo)[i],2,9)

        # Regression for one window; needs at least 30 non-missing returns
        reg = function(z) {
            z.df = as.data.frame(z)
            ret = z.df[,which(names(z.df) == "return")]
            ret.no.na = ret[!is.na(ret)]
            if(length(ret.no.na) >= 30) {
                coef(lm(return ~ val + size, data = as.data.frame(z), na.action = na.omit))
            } else {
                as.numeric(rep(NA, num.factors + 1))   ## "+1" for the intercept value
            }
        }

        beta = rollapply(df, width = 60, FUN = reg, by.column = FALSE, align = "right")
        # Side effects: write this security's block into the pre-allocated matrices
        beta.temp[d:(d+rows-1),] = beta
        gvkey.vec[d:(d+rows-1),] = gvkey
        d = d+rows
    }
    beta.df = data.frame(secid = gvkey.vec, date = rep(index(beta), sec), beta.temp)
    colnames(beta.df) <- c("gvkey", "date", "intercept", "val", "size")
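For reference, here is a minimal base-R sketch of what one iteration of the loop computes for a single security: a 60-period, right-aligned window slides over the sample, and return ~ val + size is fitted only when the window holds at least 30 non-missing returns; otherwise the row is NA. It swaps a hand-written window loop in for `zoo::rollapply` so it runs with base R alone; the helper name `roll_reg` and the simulated data are hypothetical.

```r
# Hypothetical helper (not from the question): rolling OLS of y on val + size
# over right-aligned windows, mirroring reg(): fit only when a window holds
# at least min.obs non-missing returns, otherwise return NAs.
roll_reg <- function(y, X, width = 60, min.obs = 30) {
  ends <- width:length(y)  # last row of each window (align = "right")
  out <- t(vapply(ends, function(e) {
    rows <- (e - width + 1):e
    w <- data.frame(return = y[rows], val = X$val[rows], size = X$size[rows])
    if (sum(!is.na(w$return)) >= min.obs) {
      coef(lm(return ~ val + size, data = w, na.action = na.omit))
    } else {
      rep(NA_real_, 3)  # intercept + 2 factor loadings
    }
  }, numeric(3)))
  colnames(out) <- c("intercept", "val", "size")
  out
}

# Simulated inputs (hypothetical): 100 periods, two factors, early returns missing
set.seed(42)
X <- data.frame(val = rnorm(100), size = rnorm(100))
y <- 0.5 + 1.2 * X$val - 0.8 * X$size + rnorm(100, sd = 0.1)
y[1:45] <- NA

betas <- roll_reg(y, X)
dim(betas)  # 41 3
```

The first 15 windows (ending at periods 60 to 74) contain fewer than 30 observed returns and therefore come back as NA rows, exactly like the else branch of reg().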

To enable parallel computing with %dopar%, I have loaded and registered the "doParallel" backend.
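One likely reason the plain %do% to %dopar% swap does not work: the loop body communicates through side effects (writing into `beta.temp` and `gvkey.vec` and incrementing `d`). With doParallel on Windows the workers are separate R processes that each receive their own copies of the variables they use, so those writes land in the workers' copies and never reach the master session; only the value the loop body returns comes back. The sketch below shows the effect with base R's `parallel` package (whose PSOCK clusters doParallel relies on under Windows) rather than with foreach itself, so it needs no extra packages; `res` is a hypothetical stand-in for `beta.temp`.

```r
library(parallel)

# Hypothetical stand-in for beta.temp: a results matrix in the master session
res <- matrix(NA_real_, nrow = 4, ncol = 1)

cl <- makeCluster(2)      # PSOCK cluster: separate R processes, works on Windows
clusterExport(cl, "res")  # each worker gets its own copy of res

# Side effect: each worker writes into *its own* copy of res
invisible(parLapply(cl, 1:4, function(i) {
  res[i, 1] <<- i
  NULL
}))

# Return value: this is what actually travels back to the master
out <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)

all(is.na(res))  # TRUE: the master's res was never touched
unlist(out)      # 1 4 9 16
```

The practical consequence is that a %dopar% body should return each security's betas and let the results be combined afterwards, rather than filling a pre-allocated matrix.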

Thank you very much!

Update

Here is my first try:

    library(doParallel)   ## parallel backend for the foreach function
    registerDoParallel()

    sec = ncol(ret.zoo)
    num.factors = 2
    rows = nrow(ret.zoo) - 60 + 1

    result <- foreach(i=1:sec) %dopar% {
        # Each worker is a fresh R session, so packages are loaded inside the body
        library(zoo)
        library(stats)

        df = merge(ret.zoo[,i], data)
        names(df) <- c("return", names(data))
        gvkey = substr(colnames(ret.zoo)[i],2,9)

        reg = function(z) {
            z.df = as.data.frame(z)
            ret = z.df[,which(names(z.df) == "return")]
            ret.no.na = ret[!is.na(ret)]
            if(length(ret.no.na) >= 30) {
                coef(lm(return ~ val + size, data = as.data.frame(z), na.action = na.omit))
            } else {
                as.numeric(rep(NA, num.factors + 1))   ## "+1" for the intercept value
            }
        }

        # The value of the last expression is what foreach collects in result
        rollapply(df, width = 60, FUN = reg, by.column = FALSE, align = "right")
    }
    beta.df = do.call('combine', result)

This works until the end of the loop. However, beta.df = do.call('combine', result) gives the following error: Error in do.call("combine", result) : could not find function "combine".

How can I combine the output in result? It is a list rather than a data frame.
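`combine` is not a base R function, which is why `do.call('combine', result)` fails. Since `result` is a list with one element per security, `do.call(rbind, ...)` stacks the elements, and converting each element to a data frame first leaves room to attach the security identifier that the %dopar% version no longer stores. A minimal sketch, assuming each element is a coefficient matrix with one row per window; the simulated `result` and the `gvkeys` values are hypothetical.

```r
# Hypothetical stand-in for `result`: a list with one coefficient matrix
# per security and one row per rolling window (3 windows, 2 securities)
coef.names <- c("intercept", "val", "size")
result <- list(
  matrix(1:9,   nrow = 3, dimnames = list(NULL, coef.names)),
  matrix(10:18, nrow = 3, dimnames = list(NULL, coef.names))
)
gvkeys <- c("001004", "001045")  # hypothetical identifiers, one per element

# Turn each element into a data frame tagged with its security id ...
blocks <- Map(function(m, id) data.frame(gvkey = id, m), result, gvkeys)
# ... then stack them; rbind (not 'combine') is the base R function for this
beta.df <- do.call(rbind, blocks)
nrow(beta.df)  # 6
```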

Thanks,

Here is my way of combining the results from the different clusters into a data frame (very efficient from a runtime standpoint):

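    library(data.table)   # for rbindlist()

    lstdata <- Map(as.data.frame, result)   # one data frame per security
    dfdata <- rbindlist(lstdata)            # stack them into a single table
    beta.df = as.data.frame(dfdata)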
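For comparison, here is a sketch of the whole computation in which each parallel task returns its own data frame, including the gvkey and the window end date (which are not carried along by the combining code above), and the pieces are stacked once at the end. It uses base R's `parallel::parLapply` in place of foreach/doParallel and a hand-written window loop in place of `zoo::rollapply`, so it runs without extra packages; the simulated `ret` and `data` objects and the helper `one.security` are hypothetical stand-ins for `ret.zoo`, `data`, and the loop body. With foreach, the same idea would be a body that returns a data frame, combined via `foreach(i = 1:sec, .combine = rbind, .packages = "zoo") %dopar% { ... }`.

```r
library(parallel)

# Hypothetical simulated inputs standing in for ret.zoo and data:
# 100 monthly periods, 4 securities, 2 factors
set.seed(1)
n.obs <- 100
data <- data.frame(date = seq(as.Date("2000-01-01"), by = "month", length.out = n.obs),
                   val = rnorm(n.obs), size = rnorm(n.obs))
ret <- matrix(rnorm(n.obs * 4), ncol = 4,
              dimnames = list(NULL, paste0("X", c("001004", "001045", "001050", "001075"))))

# One task per security: it *returns* a data frame of betas (with ids)
# instead of writing into a shared matrix
one.security <- function(i, width = 60, min.obs = 30) {
  y <- ret[, i]
  ends <- width:length(y)  # last row of each right-aligned window
  betas <- t(vapply(ends, function(e) {
    rows <- (e - width + 1):e
    w <- data.frame(return = y[rows], val = data$val[rows], size = data$size[rows])
    if (sum(!is.na(w$return)) >= min.obs) {
      coef(lm(return ~ val + size, data = w, na.action = na.omit))
    } else {
      rep(NA_real_, 3)
    }
  }, numeric(3)))
  colnames(betas) <- c("intercept", "val", "size")
  data.frame(gvkey = substr(colnames(ret)[i], 2, 9),  # same id rule as the question
             date = data$date[ends], betas)
}

cl <- makeCluster(2)                 # PSOCK cluster, the type used on Windows
clusterExport(cl, c("ret", "data"))  # workers need their own copies of the inputs
result <- parLapply(cl, seq_len(ncol(ret)), one.security)
stopCluster(cl)

beta.df <- do.call(rbind, result)    # stack once, on the master
dim(beta.df)  # 164 5
```

Returning one small data frame per security keeps the workers independent of each other; the single rbind at the end could equally be replaced by data.table::rbindlist, as in the combining code above.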
