Parallel computing in R (Windows): changing code from foreach %do% to foreach %dopar%
I have written code to run several time-series rolling regressions for multiple securities. Since the number of securities is more than 10,000, and there are more than 200 rolling windows for each security, the runtime in a sequential set-up (using foreach %do%) is 30 minutes.
I would like to implement foreach %dopar% for parallel computing instead, using the "doParallel" backend. Simply changing %do% to %dopar% in the code doesn't do the trick. I am new to parallel computing and hope you can help.
Here is the foreach %do% code:
library(foreach)
library(zoo)

sec = ncol(ret.zoo)
num.factors = 2
rows = nrow(ret.zoo) - 60 + 1
beta.temp = matrix(ncol = num.factors + 1, nrow = sec*rows)
gvkey.vec = matrix(ncol = 1, nrow = sec*rows)
d = 1
foreach(i=1:sec) %do% {
  df = merge(ret.zoo[,i], data)
  names(df) <- c("return", names(data))
  gvkey = substr(colnames(ret.zoo)[i],2,9)
  reg = function(z) {
    z.df = as.data.frame(z)
    ret = z.df[,which(names(z.df) == "return")]
    ret.no.na = ret[!is.na(ret)]
    if(length(ret.no.na) >= 30) {
      coef(lm(return ~ val + size, data = as.data.frame(z), na.action = na.omit))
    } else {
      as.numeric(rep(NA,num.factors + 1)) ## "+1" for the intercept value
    }
  }
  beta = rollapply(df, width = 60, FUN = reg, by.column = FALSE, align = "right")
  beta.temp[d:(d+rows-1),] = beta
  gvkey.vec[d:(d+rows-1),] = gvkey
  d = d+rows
}
beta.df = data.frame(secid = gvkey.vec, date = rep(index(beta), sec), beta.temp)
colnames(beta.df) <- c("gvkey", "date", "intercept", "val", "size")
In order to enable parallel computing using %dopar%, I have called and registered the "doParallel" backend.
Thank you very much!
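For reference, here is a minimal sketch of registering the doParallel backend with an explicit cluster, a common pattern on Windows. With %dopar% each iteration runs in a separate worker process, so assignments to objects such as beta.temp inside the loop body are not visible in the main session; the loop body has to return its result instead. The worker count of 2 and the toy loop body are assumptions for illustration only, not part of the original code.

```r
library(foreach)
library(doParallel)

# Assumption: two workers; adjust to the machine
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Toy loop: each iteration returns a value rather than
# writing into a matrix that lives in the main session
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

# Release the worker processes when done
parallel::stopCluster(cl)

print(squares)
```

Results come back in iteration order by default, so they can be matched to the securities afterwards.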
Update
Here is my first try:
library(doParallel) ## parallel backend for the foreach function
registerDoParallel()

sec = ncol(ret.zoo)
num.factors = 2
rows = nrow(ret.zoo) - 60 + 1
result <- foreach(i=1:sec) %dopar% {
  library(zoo)
  library(stats)
  df = merge(ret.zoo[,i], data)
  names(df) <- c("return", names(data))
  gvkey = substr(colnames(ret.zoo)[i],2,9)
  reg = function(z) {
    z.df = as.data.frame(z)
    ret = z.df[,which(names(z.df) == "return")]
    ret.no.na = ret[!is.na(ret)]
    if(length(ret.no.na) >= 30) {
      coef(lm(return ~ val + size, data = as.data.frame(z), na.action = na.omit))
    } else {
      as.numeric(rep(NA,num.factors + 1)) ## "+1" for the intercept value
    }
  }
  rollapply(df, width = 60, FUN = reg, by.column = FALSE, align = "right")
}
beta.df = do.call('combine', result)
This works until the end of the loop. However, the last line, beta.df = do.call('combine', result), gives the following error:
Error in do.call("combine", result) : could not find function "combine"
How can I combine the output of result? It is a list rather than a data frame.
Thanks,
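The error arises because base R has no function called combine, so do.call cannot resolve the name. One possible fix, sketched here under the assumption that every element of result is a matrix with the same columns, is to stack the elements with rbind. The toy list below is a hypothetical stand-in for result.

```r
# Hypothetical stand-in for `result`: one coefficient matrix per security
coef.names <- c("intercept", "val", "size")
result <- list(
  matrix(1:6,  nrow = 2, dimnames = list(NULL, coef.names)),
  matrix(7:12, nrow = 2, dimnames = list(NULL, coef.names))
)

# Stack all list elements row-wise, then convert to a data frame
beta.mat <- do.call(rbind, result)
beta.df <- as.data.frame(beta.mat)

print(dim(beta.df))
```

For a list with many thousands of elements, do.call(rbind, ...) can be slow, which is one reason to prefer a dedicated list-binding function.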
Here is a way of combining the results from the different clusters into a data frame (very efficient from a runtime standpoint):
library(data.table) ## provides rbindlist
lstdata <- Map(as.data.frame, result)
dfdata <- rbindlist(lstdata)
beta.df = as.data.frame(dfdata)
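Because the combined rows no longer carry the security identifier, it may help to attach it inside the loop before binding. The sketch below illustrates that idea on toy data; the ids vector, the window column and the numeric values are hypothetical placeholders for the per-security rolling-regression output, and two workers is an arbitrary choice.

```r
library(foreach)
library(doParallel)
library(data.table)

cl <- parallel::makeCluster(2)  # assumption: two workers
registerDoParallel(cl)

# Hypothetical identifiers standing in for the gvkey of each security
ids <- c("A", "B", "C")

# Each iteration returns a small data frame tagged with its identifier
result <- foreach(i = seq_along(ids)) %dopar% {
  data.frame(gvkey = ids[i], window = 1:2, intercept = i * c(0.1, 0.2))
}

parallel::stopCluster(cl)

# rbindlist accepts a list of data frames directly
beta.df <- as.data.frame(rbindlist(result))
print(beta.df)
```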