Applying a set of operations across several data frames in R


I've been learning R for a project, and have been unable to Google a solution to my current problem.

I have ~100 CSV files and need to perform the exact same set of operations on each of them. I've read them in as separate objects (which I assume is improper R style), but I've been unable to write a function that can loop through them. Each CSV becomes a data frame containing several pieces of information, including a column of dates in decimal-year form. I need to create 2 new columns containing the year and the day of year. I've figured out how to do this manually, but I need to find a way to automate the process. Here's what I've been doing:

    # setup
    library(lubridate)  # used to check for leap years
    df.00 <- data.frame(site = seq(1:10), date = runif(10, 1980, 2000))

    # what needs to be done
    df.00$doy <- NA  # make an empty column where I'm going to place the day of year
    df.00$year <- floor(df.00$date)  # grabs the year from the date column
    df.00$dday <- df.00$date - df.00$year  # fraction of the year; an intermediate step
    # multiply the fraction of the year by 365, or by 366 if it's a leap year, to give the day of year
    df.00$doy[which(leap_year(df.00$year))] <- round(df.00$dday[which(leap_year(df.00$year))] * 366)
    df.00$doy[which(!leap_year(df.00$year))] <- round(df.00$dday[which(!leap_year(df.00$year))] * 365)

The above, while inelegant, does what I need it to. However, I need to do the same for the other data frames, df.01 - df.99, and so far I've been unable to place it in a function or a loop. If I place it in a function:

    funtest <- function(x) {
        x$doy <- NA
    }

then funtest(df.00) does nothing. I expect this comes down to my understanding of how functions work in R. If I wrap it in a loop instead:

    for (i in c(df.00)) {
        i$doy <- NA
    }

I get "In i$doy <- NA : Coercing LHS to a list" several times, which tells me the loop isn't treating the data frame as a single unit but is perhaps looking at each column in the frame.

I'd appreciate any insight on what I should be doing. I feel I could have solved this using bash and awk, but I'd like to be less incompetent at using R.

The most efficient and direct way is to use a list.

  1. Put all of the CSVs in one folder.
  2. Grab the list of files in that folder,
    e.g.: files <- dir('path/to/folder', full.names=TRUE)
  3. Iteratively read the files into a list of data.frames,
    e.g.: df.list <- lapply(files, read.csv, <additional args>)
  4. Apply your function iteratively to each data.frame, e.g.: lapply(df.list, myfunc, <additional args>)
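The four steps above can be sketched end to end as follows. This is a minimal sketch, assuming plain comma-separated files with a header row; the temporary folder, the example file names and the `add_year()` helper are hypothetical stand-ins so the block runs on its own.

```r
# Step 1 (stand-in): a temporary folder holding three small example CSVs.
# Replace `folder` with the path to your own folder of CSV files.
folder <- tempfile("csv_demo")
dir.create(folder)
set.seed(1)
for (k in 0:2) {
  demo <- data.frame(site = 1:5, date = runif(5, 1980, 2000))
  write.csv(demo, file.path(folder, sprintf("df_%02d.csv", k)), row.names = FALSE)
}

# Step 2: list the CSV files in the folder, with full paths
files <- dir(folder, pattern = "\\.csv$", full.names = TRUE)

# Step 3: read every file into a list of data.frames, named by file
df.list <- lapply(files, read.csv)
names(df.list) <- basename(files)

# Step 4: apply one function to every data.frame in the list
add_year <- function(df) {   # hypothetical example function
  df$year <- floor(df$date)
  return(df)
}
df.list <- lapply(df.list, add_year)
```

Extra arguments to `read.csv()` (for example `skip` or `stringsAsFactors`) can be passed straight through `lapply()` after the function name, as in step 3.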

Since your df's are already loaded and have nice, convenient names, you can grab them using the following:

    nms <- c(paste0("df.0", 0:9), paste0("df.", 10:99))
    df.list <- lapply(nms, get)
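A minimal sketch of the same lookup using two swapped-in base R functions rather than the code above: `sprintf()` with `%02d` builds the zero-padded names in one call, and `mget()` fetches all the objects at once and keeps their names on the resulting list. The three small data frames are hypothetical stand-ins for `df.00` to `df.99`.

```r
# Hypothetical stand-ins for df.00 ... df.99
df.00 <- data.frame(site = 1:3, date = c(1980.5, 1990.25, 1999.9))
df.01 <- data.frame(site = 1:3, date = c(1981.1, 1992.0, 2000.3))
df.02 <- data.frame(site = 1:3, date = c(1985.7, 1996.5, 1998.2))

# %02d zero-pads the number, giving "df.00", "df.01", "df.02"
nms <- sprintf("df.%02d", 0:2)

# mget() looks up every name at once and returns a *named* list,
# so each element remembers which object it came from
df.list <- mget(nms)
names(df.list)
```

For 0:99, `sprintf("df.%02d", 0:99)` produces the same vector as the two `paste0()` calls combined.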

Then take what you have in the # what needs to be done portion and put it inside a function, e.g.:

    myfunc <- function(df) {
        # whatever you want done to a single df
        return(df)
    }

and lapply it accordingly:

    df.list <- lapply(df.list, myfunc)
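Filled in with the steps from the question, `myfunc` might look like the sketch below. It assumes the date column is called `date`, as in the question, and swaps `lubridate::leap_year()` for a hypothetical base-R helper `is_leap()` so it runs without extra packages. `ifelse()` picks 365 or 366 per row in place of the two `which()` assignments, and the year fraction is kept as a local variable instead of a `dday` column.

```r
# Hypothetical base-R stand-in for lubridate::leap_year()
is_leap <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

# The "what needs to be done" steps, applied to a single data frame
myfunc <- function(df) {
  df$year <- floor(df$date)                        # whole year
  frac    <- df$date - df$year                     # fraction of the year elapsed
  ndays   <- ifelse(is_leap(df$year), 366, 365)    # days in that year
  df$doy  <- round(frac * ndays)                   # day of year, rounded as in the question
  return(df)                                       # hand the modified copy back
}

# Two small stand-in data frames collected in a list
df.list <- list(
  df.00 = data.frame(site = 1:2, date = c(1980.5, 1999.2)),
  df.01 = data.frame(site = 1:2, date = c(2000.5, 1990.8))
)
df.list <- lapply(df.list, myfunc)
df.list$df.00
```

Because `myfunc` returns the modified data frame, `lapply()` collects the results into a new list, which is then assigned back to `df.list`.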

On a separate note, regarding functions:

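The reason funtest "does nothing" is that you are not having it return anything useful. That is to say, it is doing something: it adds the column to its own local copy of x. But R functions work on copies of their arguments, so df.00 itself is never changed, and when the function finishes, that modified copy is simply discarded.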

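You need to include a return(.) statement in your function, and then assign the result back, e.g. df.00 <- funtest(df.00). Alternatively, the value of the last expression evaluated in the function is used as the return value; but if that last line is an assignment, as in funtest, the function returns the assigned value (here NA, invisibly) rather than the data frame, which is why one needs to be cautious. The cleanest option (in my opinion) is to use return(.).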
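A small sketch of the behaviour described above, using a hypothetical one-column data frame: the original `funtest` returns the value of its final assignment (`NA`), invisibly, and leaves the caller's data frame untouched, while a version ending in `return(x)` (here given the hypothetical name `funtest2`) hands back the modified copy, which then has to be assigned back.

```r
df <- data.frame(site = 1:3)

# Version from the question: the last expression is an assignment
funtest <- function(x) {
  x$doy <- NA
}

out <- withVisible(funtest(df))
print(out$visible)   # FALSE: this is why nothing appears at the console
print(out$value)     # the value of the final assignment, not a data frame
unchanged <- names(df)
print(unchanged)     # df still has only "site": the function changed a copy

# Hypothetical fixed version: return the modified copy and assign it back
funtest2 <- function(x) {
  x$doy <- NA
  return(x)
}
df <- funtest2(df)
print(names(df))
```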


Regarding the for loop on a data.frame:

As you observed, using for (i in somedataframe) {...} iterates over the columns of the data.frame.
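The sketch below, with hypothetical example data, shows that column-wise iteration. It is also why `i$doy <- NA` in the question warns about coercing the left-hand side to a list: `i` is a plain vector, not a data frame. Looping over a list of data frames by index, and storing each result back with `[[<-`, processes whole data frames instead.

```r
df.00 <- data.frame(site = 1:3, date = c(1980.5, 1990.2, 1999.8))

# Looping over a data.frame visits its columns, one vector at a time
col_classes <- character(0)
for (i in df.00) {
  col_classes <- c(col_classes, class(i))
}
print(col_classes)   # one entry per column: "integer", then "numeric"

# To process several data frames, loop over a list of them by index
# and write each modified data frame back into the list
df.list <- list(df.00 = df.00, df.01 = df.00)
for (k in seq_along(df.list)) {
  d <- df.list[[k]]
  d$year <- floor(d$date)
  df.list[[k]] <- d
}
```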

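You can iterate over rows using apply(), but be aware that apply() first converts the data.frame to a matrix, so each row is passed to your function as an atomic vector of a single type (character, if any column is character). $ does not work on such a vector, so index by name with [ ] instead, and don't forget the return:

    apply(mydf, MARGIN = 1, function(x) { x["doy"] <- ...; return(x) })  # don't forget the return

The result comes back as a matrix with one column per original row. For the day-of-year calculation, though, no row loop is needed: the column operations in the question already work on whole columns at once.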
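A quick sketch of the apply() caveat, with a hypothetical data frame that mixes numeric and character columns: inside apply() each row is a character vector, so `$` raises an error, whereas the vectorised column assignment needs no loop at all.

```r
mydf <- data.frame(site = 1:2, date = c(1980.5, 1999.2), name = c("a", "b"))

# apply() coerces the data frame to a matrix first; because one column is
# character, the whole matrix (and so every row passed to FUN) is character
row_types <- apply(mydf, MARGIN = 1, function(x) class(x))
print(row_types)

# $ does not work on those atomic row vectors, so this call errors
msg <- tryCatch(
  apply(mydf, MARGIN = 1, function(x) x$doy),
  error = function(e) conditionMessage(e)
)
print(msg)

# No row loop is needed here: assign the whole column at once
mydf$year <- floor(mydf$date)
```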
