dataframe - Reshape wide data frame to long format in R using column name information
I am having trouble converting a complicated database output from wide form to long format. It has a couple of hundred rows and ~1,000 columns. It looks like this:
The problem is that the variable column of df1_long, or the column names of df1_wide, contain information that I want to select upon. I would rather have something like this:
    id     part  task  subgroup  type  result
    ind_a  a     12    aa        2     yes
    ind_a  a     12    bb        2     yes
    ind_a  b     12    aa        3     opt_1
    ind_a  b     13    aa        4     100
    ind_b  a     12    aa        2     no
    ind_b  a     12    bb        2     yes
    ind_b  b     12    aa        3     opt_2
    ind_b  b     13    aa        4     50
    ind_c  a     12    aa        2     no
    ind_c  a     12    bb        2     no
    ind_c  b     12    aa        3     opt_1
    ind_c  b     13    aa        4     200
I don't mind if the numerical values/results are transformed to characters.
Is there an easy way to accomplish this with the stack() or reshape() function? Or do I have to code my own function that performs strsplit commands on the column names to extract the relevant information and store it in separate new columns?
The long format is much easier to work with and to ask questions of the data set. Coding >1,000 columns with cryptic column names looks like a nightmare to me.
OK, fine. I give in.
Here's the data:
    df1_wide <- data.frame(v1 = c("ind_a", "ind_b", "ind_c"),
                           v2 = c("y", "n", "n"),
                           v3 = c("y", "y", "n"),
                           v4 = c("op1_1", "opt_2", "opt_1"),
                           v5 = c(100, 50, 200))
    rownames(df1_wide) <- letters[1:3]
    colnames(df1_wide) <- c("id", "a_t12_aa (type # 2)", "a_t12_bb (type # 2)",
                            "b_t12_aa (type # 3)", "b_t13_aa (type # 4)")
    df1_wide
    #      id a_t12_aa (type # 2) a_t12_bb (type # 2) b_t12_aa (type # 3) b_t13_aa (type # 4)
    # a ind_a                   y                   y               op1_1                 100
    # b ind_b                   n                   y               opt_2                  50
    # c ind_c                   n                   n               opt_1                 200
This is what I have done so far:
    library(reshape2)  # provides melt()
    df1_long <- melt(df1_wide, id.vars = "id")
This sounds like what you want:
    cbind(df1_long["id"],
          colsplit(gsub("\\s|\\(type|\\)|#", " ", df1_long$variable),
                   pattern = "_|\\s+",
                   names = c("part", "task", "subgroup", "type")),
          df1_long["value"])
    #       id part task subgroup type value
    # 1  ind_a    a  t12       aa    2     y
    # 2  ind_b    a  t12       aa    2     n
    # 3  ind_c    a  t12       aa    2     n
    # 4  ind_a    a  t12       bb    2     y
    # 5  ind_b    a  t12       bb    2     y
    # 6  ind_c    a  t12       bb    2     n
    # 7  ind_a    b  t12       aa    3 op1_1
    # 8  ind_b    b  t12       aa    3 opt_2
    # 9  ind_c    b  t12       aa    3 opt_1
    # 10 ind_a    b  t13       aa    4   100
    # 11 ind_b    b  t13       aa    4    50
    # 12 ind_c    b  t13       aa    4   200
I can't guarantee that the regular expression used in gsub will work on your actual data, but it should be enough to point you in the right direction.
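With ~1,000 columns it may be worth checking which names a pattern fails to cover before trusting the split. Here is a minimal base R sketch of that idea, assuming every name looks like "a_t12_aa (type # 2)". The helper parse_names and the third test name are made up for illustration, and the anchored pattern is my guess at your naming scheme, not something I have run against your real data. Unlike the colsplit call, it also drops the leading "t" from the task so that you get 12 rather than t12.

```r
# Hypothetical helper: parse names shaped like "a_t12_aa (type # 2)".
# Names that do not fit the assumed pattern are returned with NA pieces.
parse_names <- function(x) {
  pattern <- "^([^_]+)_t([0-9]+)_([^ ]+) \\(type # ([0-9]+)\\)$"
  m <- regmatches(x, regexec(pattern, x))
  ok <- lengths(m) == 5  # full match plus four capture groups
  out <- data.frame(variable = x,
                    part = NA_character_, task = NA_character_,
                    subgroup = NA_character_, type = NA_character_,
                    stringsAsFactors = FALSE)
  if (any(ok)) {
    parts <- do.call(rbind, m[ok])
    out[ok, c("part", "task", "subgroup", "type")] <- parts[, 2:5, drop = FALSE]
  }
  out
}

# The last name is invented to show what a non-matching column looks like.
nms <- c("a_t12_aa (type # 2)", "b_t13_aa (type # 4)", "oddly named column")
parsed <- parse_names(nms)

# Names the pattern did not cover; inspect these before reshaping.
unmatched <- parsed$variable[is.na(parsed$part)]
unmatched
```

If unmatched comes back empty on your real column names, the pattern covers everything; otherwise you know exactly which names need a tweak.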
In the future, please note that sharing a screenshot of your data is pretty useless. Please share data like you did in this question, so that others can copy and paste to get started on trying things out.
You may want to consider renaming your variables along the lines of a_t12_aa_2 in the future, in which case the colsplit step would just be colsplit(variable, "_", c("part", "task", "subgroup", "type")).
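If you cannot change the names at the source, one way to get there is to rewrite them in R first. This is only a sketch in base R, assuming every non-id column ends in " (type # N)"; the vectors old and new and the matrix pieces are names I made up for illustration. The strsplit line stands in for the colsplit call so that the example runs without reshape2.

```r
# Column names as in the question's example data.
old <- c("id", "a_t12_aa (type # 2)", "a_t12_bb (type # 2)",
         "b_t12_aa (type # 3)", "b_t13_aa (type # 4)")

# Replace the trailing " (type # N)" with "_N"; "id" is left untouched
# because it does not match the pattern.
new <- sub(" \\(type # ([0-9]+)\\)$", "_\\1", old)
new

# With uniform names, a plain split on "_" recovers the four pieces.
pieces <- do.call(rbind, strsplit(new[-1], "_", fixed = TRUE))
colnames(pieces) <- c("part", "task", "subgroup", "type")
pieces
```

You would then assign the result with colnames(df1_wide) <- new before melting, so that the variable column already holds the clean names.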