How do you subset a data frame in R based on a minimum sample size?


Let's say I have a data frame with two levels of factors that looks like this:

    factor1    factor2    value
    a          1          0.75
    a          1          0.34
    a          2          1.21
    a          2          0.75
    a          2          0.53
    b          1          0.42
    b          2          0.21
    b          2          0.18
    b          2          1.42

etc.
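To try the answer below, here is a minimal sketch that rebuilds the example rows as an R data frame. It assumes the first five rows have factor1 equal to "a" and keeps factor1 as a character column (stringsAsFactors = FALSE), which is the case the main ave() approach relies on.

```r
# Reproducible version of the example data (first level assumed to be "a")
mydf <- data.frame(
  factor1 = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE  # keep factor1 as character, not factor
)
mydf
```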

How can I subset the data frame ("df", if you will) based on the condition that the combination of factor1 and factor2 (fact1*fact2) has more than, say, 2 observations? Can I use the length argument in subset to do this?

Assuming your data.frame is called mydf, you can use ave to create a logical vector to subset with:

    mydf[with(mydf, as.logical(ave(factor1, factor1, factor2,
                                   FUN = function(x) length(x) > 2))), ]
    #   factor1 factor2 value
    # 3       a       2  1.21
    # 4       a       2  0.75
    # 5       a       2  0.53
    # 7       b       2  0.21
    # 8       b       2  0.18
    # 9       b       2  1.42

Here's ave counting the combinations. Notice that ave returns an object with the same length as the number of rows in the data.frame (this makes it convenient for subsetting).

    > with(mydf, ave(factor1, factor1, factor2, FUN = length))
    [1] "2" "2" "3" "3" "3" "1" "3" "3" "3"
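As a quick cross-check of those counts, table() can tabulate the same factor1/factor2 combinations. It reports one count per combination rather than one per row, which is why ave is the handier tool when the goal is a row filter. A minimal sketch, reusing the reconstructed example data (first level assumed to be "a"):

```r
mydf <- data.frame(
  factor1 = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE
)

# One cell per factor1/factor2 combination: a/1 = 2, a/2 = 3, b/1 = 1, b/2 = 3
counts <- with(mydf, table(factor1, factor2))
counts
```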

The next step is to compare that length with your threshold. For that, we need an anonymous function as our FUN argument.

    > with(mydf, ave(factor1, factor1, factor2, FUN = function(x) length(x) > 2))
    [1] "FALSE" "FALSE" "TRUE"  "TRUE"  "TRUE"  "FALSE" "TRUE"  "TRUE"  "TRUE"

Almost there... but since the first argument is a character vector, our output is a character vector too. We want it as.logical so that we can use it directly for subsetting.
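The as.logical step is only needed because a character column is passed as the first argument to ave. A variation not taken from the original answer is to pass the numeric value column instead: ave then returns numeric group sizes that can be compared with > 2 directly. The same expression also works inside subset(), which evaluates it with the columns of mydf in scope, so this is one way to use a length-based condition in subset. A sketch assuming the reconstructed example data:

```r
mydf <- data.frame(
  factor1 = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42),
  stringsAsFactors = FALSE
)

# Numeric x, so ave() returns numeric counts: c(2, 2, 3, 3, 3, 1, 3, 3, 3)
group_n <- with(mydf, ave(value, factor1, factor2, FUN = length))
kept <- mydf[group_n > 2, ]
kept

# Equivalent filter written with subset()
kept2 <- subset(mydf, ave(value, factor1, factor2, FUN = length) > 2)
```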


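Note that if factor1 is a factor, ave doesn't work on it as the first argument (writing the counts back into a factor gives NA values with "invalid factor level" warnings), in which case you'll need something like: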

    mydf[with(mydf, as.logical(ave(as.character(factor1), factor1, factor2,
                                   FUN = function(x) length(x) > 2))), ]
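Another option, not from the original answer, is to give ave the integer row positions from seq_along() as its first argument, so the class of factor1 no longer matters; only x has to accept the counts, while the grouping arguments can stay factors. A sketch with both grouping columns stored as factors (an assumption for illustration):

```r
mydf <- data.frame(
  factor1 = factor(c("a", "a", "a", "a", "a", "b", "b", "b", "b")),
  factor2 = factor(c(1, 1, 2, 2, 2, 1, 2, 2, 2)),
  value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42)
)

# Integer x of the right length: ave() overwrites each position with the
# size of that row's factor1/factor2 group
group_n <- with(mydf, ave(seq_along(factor1), factor1, factor2, FUN = length))
kept <- mydf[group_n > 2, ]
kept
```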
