Reshaping data in R without using dcast (reshape2)
My dcast R code is not running anymore. I have the problem discussed here: "segfault in R using reshape2 package, dcast".
The bug has not yet been fixed, so I am looking for other ways of achieving the dcast output. Suggestions are appreciated!
Below is a small dput of my dataset. Basically, there is 1 entry per species per survey id ("eid"). I would like 1 entry per survey id ("eid"), with the species as columns holding their associated value ("value"), i.e., wide format.
    > dput(sample)
    structure(list(eid = c("l00155/69/2000-09-06", "q99107/178/1999-08-23",
    "g02192/1/2002-07-08", "g97158/1/1997-10-26", "q06091/2/2006-07-04",
    "l00004/171/2000-03-01", "g11094/15/2011-09-05", "q04127/16/2004-07-28",
    "q02122/230/2002-10-29", "g08002/6/2008-02-03", "q99006/143/1999-02-17",
    "q08053/3/2008-06-12", "q99128/22/1999-08-19", "l00177/83/2000-12-18",
    "q05122/11/2005-08-30", "q04156/44/2004-10-29", "l01097/69/2001-06-26",
    "g08004/169/2008-05-14", "q03041/26/2003-06-14", "g98115/60/1998-09-11",
    "g00002/20/2000-01-17", "g00002/20/2000-01-17", "g00054/1/2000-05-31",
    "g00054/1/2000-05-31"), tspp.name = structure(c(13L, 13L, 13L, 13L,
    16L, 13L, 13L, 4L, 13L, 13L, 13L, 13L, 13L, 11L, 4L, 13L, 13L, 13L,
    13L, 20L, 13L, 13L, 24L, 24L), .Label = c("american plaice",
    "american sand lance", "arctic cod", "atlantic cod", "atlantic halibut",
    "atlantic herring", "bigeye tuna", "black dogfish", "bluefin tuna",
    "capelin", "greenland halibut", "lookdown", "northern shrimp",
    "ocean quahog", "porbeagle", "redfishes", "slenteye headlightfish",
    "smooth flounder", "spiny dogfish", "striped pink shrimp",
    "summer flounder", "white hake", "winter flounder", "witch flounder",
    "yellowtail flounder"), class = "factor"), elasmo.name = structure(c(26L,
    30L, 30L, 30L, 30L, 25L, 21L, 30L, 30L, 30L, 30L, 21L, 30L, 5L, 30L,
    30L, 30L, 21L, 30L, 30L, 14L, 21L, 24L, 21L), .Label = c("arctic skate",
    "atlantic sharpnose shark", "barndoor skate", "basking shark",
    "black dogfish", "blue shark", "deepsea cat shark", "greenland shark",
    "jensen's skate", "little skate", "manta", "ocean quahog",
    "oceanic whitetip shark", "porbeagle", "portuguese shark", "rough sagre",
    "roughtail stingray", "round skate", "sharks", "shortfin mako", "skates",
    "smooth skate", "soft skate", "spiny dogfish", "spinytail skate",
    "thorny skate", "white shark", "white skate", "winter skate", "na"),
    class = "factor"), elasmo.discard = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
    1, 0, 25, 0, 0, 0, 1, 0, 0, 1, 1, 15, 25)), .Names = c("eid",
    "tspp.name", "elasmo.name", "elasmo.discard"), class = "data.frame",
    row.names = c("18496", "488791", "87549", "236671", "139268", "15606",
    "11132", "115531", "93441", "159675", "403751", "42587", "485941",
    "19285", "130395", "119974", "73826", "7953", "99124", "351461", "71",
    "72", "184", "185"))
In the end, I wish to obtain this:
    library(reshape2)  # dcast() is provided by reshape2, not plyr
    test <- dcast(sample, ... ~ elasmo.name, value.var = "elasmo.discard", fun.aggregate = sum)
    test
Note that the dcast code works on this sample, but I get a fatal error when I run it on the overall dataset, which has 145349 rows.
Many thanks!
This is the pre-Hadley method: first aggregate to get the sums, then reshape.
    # d is the data frame from the question (called sample there)
    foo <- aggregate(d[, 4, drop = FALSE], by = d[, 1:3], sum)
    reshape(foo, v.names = "elasmo.discard", idvar = c("eid", "tspp.name"),
            timevar = "elasmo.name", direction = "wide")
If the first part is slow, you may be able to use fewer columns in the "by" part; it looks like tspp.name is defined by eid, and if so, don't aggregate on it and instead add it back in after the fact.
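As a rough sketch of that idea (the toy data frame below is made up and only borrows the question's column names; it assumes each eid really has a single tspp.name), one could aggregate on eid and elasmo.name alone and merge tspp.name back afterwards:

```r
# Toy data with the question's column names; the values are invented.
d <- data.frame(
  eid            = c("a", "a", "b", "b", "c"),
  tspp.name      = c("northern shrimp", "northern shrimp",
                     "witch flounder", "witch flounder", "atlantic cod"),
  elasmo.name    = c("porbeagle", "skates", "skates", "skates", "na"),
  elasmo.discard = c(1, 1, 15, 25, 0),
  stringsAsFactors = FALSE
)

# Aggregate on eid and elasmo.name only, leaving tspp.name out of "by".
sums <- aggregate(d["elasmo.discard"],
                  by = d[c("eid", "elasmo.name")], FUN = sum)

# One row per eid with its tspp.name; stop if eid does not determine it.
lookup <- unique(d[c("eid", "tspp.name")])
stopifnot(!anyDuplicated(lookup$eid))

# Long to wide: one row per eid, one column per elasmo.name.
wide <- reshape(sums, v.names = "elasmo.discard", idvar = "eid",
                timevar = "elasmo.name", direction = "wide")

# Add tspp.name back in after the fact.
wide <- merge(lookup, wide, by = "eid")
wide
```

Note that reshape() leaves NA for eid/elasmo.name combinations that never occur, so replace those with 0 afterwards if zeros are what you want.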
If the second part is slow, perhaps try one of the methods here: https://stackoverflow.com/a/9617424/210673.
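Separately from whatever the linked answer suggests, one base R option that may be worth timing is xtabs(), which sums and widens in a single step. This is only a sketch on made-up toy data using the question's column names, and it drops tspp.name, which would have to be merged back in by eid:

```r
# Toy data with the question's column names; the values are invented.
d <- data.frame(
  eid            = c("a", "a", "b", "b", "c"),
  elasmo.name    = c("porbeagle", "skates", "skates", "skates", "na"),
  elasmo.discard = c(1, 1, 15, 25, 0),
  stringsAsFactors = FALSE
)

# xtabs() sums the left-hand side within each eid x elasmo.name cell.
tab <- xtabs(elasmo.discard ~ eid + elasmo.name, data = d)

# Turn the contingency table into a plain wide data frame.
wide <- as.data.frame.matrix(tab)
wide <- data.frame(eid = rownames(wide), wide,
                   row.names = NULL, check.names = FALSE)
wide
```

Unlike reshape(), this fills combinations that never occur with 0 rather than NA.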
To get better help on speeding this up, provide an appropriate example (perhaps built using sample or rep) that code can be tested on. The speed of any solution depends on how many unique combinations of each variable there are.
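Something along these lines could serve as such a test case. Everything here is an assumption apart from the 145349 row count and the column names taken from the question: the number of surveys, the species lists and the value range are invented, so adjust them to resemble the real data.

```r
set.seed(1)        # arbitrary seed so the example is repeatable
n     <- 145349    # row count quoted in the question
n_eid <- 20000     # assumed number of distinct surveys

# Hypothetical survey ids, each tied to a single tspp.name.
eids        <- sprintf("eid%05d", seq_len(n_eid))
tspp_by_eid <- sample(c("northern shrimp", "atlantic cod", "redfishes"),
                      n_eid, replace = TRUE)

idx <- sample(n_eid, n, replace = TRUE)
d <- data.frame(
  eid            = eids[idx],
  tspp.name      = tspp_by_eid[idx],
  elasmo.name    = sample(c("skates", "porbeagle", "spiny dogfish", "na"),
                          n, replace = TRUE),
  elasmo.discard = sample(0:25, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Number of unique eid/elasmo.name combinations, which drives the run time.
nrow(unique(d[c("eid", "elasmo.name")]))

# Time the aggregation step on the simulated data.
system.time(
  foo <- aggregate(d["elasmo.discard"],
                   by = d[c("eid", "tspp.name", "elasmo.name")], FUN = sum)
)
```

Changing n_eid or the length of the species vectors alters the number of unique combinations, which makes it easy to see how the timings scale.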