dataframe - Reshape wide data frame to long format in R using column name information

January 15, 2012

I am having trouble converting a complicated database output from wide form to long format. It has a couple of hundred rows and ~1,000 columns. It looks like this:

[Screenshots not preserved: the wide data frame and the melt() result]

The problem is that the variable column of df1_long (that is, the column names of df1_wide) contains information I want to select on. Instead, I would like to have this:

    id     part  task  subgroup  type  result
    ind_a  a     12    aa        2     yes
    ind_a  a     12    bb        2     yes
    ind_a  b     12    aa        3     opt_1
    ind_a  b     13    aa        4     100
    ind_b  a     12    aa        2     no
    ind_b  a     12    bb        2     yes
    ind_b  b     12    aa        3     opt_2
    ind_b  b     13    aa        4     50
    ind_c  a     12    aa        2     no
    ind_c  a     12    bb        2     no
    ind_c  b     12    aa        3     opt_1
    ind_c  b     13    aa        4     200

I don't mind if the numerical values/results are transformed to characters.

Is there an easy way to accomplish this with the stack() or reshape() function? Or do I have to code a function that performs strsplit() calls on the column names to extract the relevant information and store it in separate new columns?
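For comparison, here is a minimal base R sketch of the stack() route. It is not the answer's method: it uses a made-up two-column subset of the data, assumes every measure column follows the naming scheme <part>_t<task>_<subgroup> (type # <n>), and pulls the fields out of the names with one regular expression via sub() rather than strsplit(). stack() ignores non-vector columns such as factors, so the measure columns are converted to character first, which fits the note above that character results are acceptable. The leading "t" is dropped from the task field to match the desired output.

```r
# Hypothetical two-column subset of the question's df1_wide.
wide <- data.frame(id = c("ind_a", "ind_b", "ind_c"),
                   x1 = c("y", "n", "n"),
                   x2 = c(100, 50, 200),
                   stringsAsFactors = FALSE)
names(wide) <- c("id", "a_t12_aa (type # 2)", "b_t13_aa (type # 4)")
measure <- setdiff(names(wide), "id")

# stack() ignores non-vector columns (e.g. factors), so coerce to character first.
wide[measure] <- lapply(wide[measure], as.character)
stacked <- stack(wide[measure])  # columns: values, ind (source column name)

# Assumed naming scheme: "<part>_t<task>_<subgroup> (type # <n>)".
pattern <- "^([^_]+)_t([^_]+)_([^ ]+) \\(type # ([0-9]+)\\)$"
col_name <- as.character(stacked$ind)

long <- data.frame(
  id       = rep(wide$id, times = length(measure)),  # stack() goes column by column
  part     = sub(pattern, "\\1", col_name),
  task     = sub(pattern, "\\2", col_name),
  subgroup = sub(pattern, "\\3", col_name),
  type     = as.integer(sub(pattern, "\\4", col_name)),
  result   = as.character(stacked$values),
  stringsAsFactors = FALSE
)
print(long)
```

On the real data the pattern would need checking against all ~1,000 names: sub() returns a non-matching name unchanged, so a mismatch would silently end up in every parsed field.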

The long format is much easier to work with and to query. Coding against >1,000 columns with cryptic column names looks like a nightmare to me.

OK, fine. I give in.

Here's the data:

    df1_wide <- data.frame(v1 = c("ind_a", "ind_b", "ind_c"),
                           v2 = c("y", "n", "n"),
                           v3 = c("y", "y", "n"),
                           v4 = c("op1_1", "opt_2", "opt_1"),
                           v5 = c(100, 50, 200))
    rownames(df1_wide) <- letters[1:3]
    colnames(df1_wide) <- c("id", "a_t12_aa (type # 2)", "a_t12_bb (type # 2)",
                            "b_t12_aa (type # 3)", "b_t13_aa (type # 4)")
    df1_wide
    #      id a_t12_aa (type # 2) a_t12_bb (type # 2) b_t12_aa (type # 3) b_t13_aa (type # 4)
    # a ind_a                   y                   y               op1_1                 100
    # b ind_b                   n                   y               opt_2                  50
    # c ind_c                   n                   n               opt_1                 200

This is what I have done so far:

    library(reshape2)  # provides melt() and colsplit()
    df1_long <- melt(df1_wide, id.vars = "id")
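If you would rather not load a package for this step, base R's reshape() can build the same id/variable/value layout that melt() returns. This is a sketch on a hypothetical two-column subset of the data: the measure columns are converted to character first so the stacked value column has a single type, and times = measure stores each original column name in the variable column.

```r
# Hypothetical two-column subset of the question's df1_wide.
wide <- data.frame(id = c("ind_a", "ind_b", "ind_c"),
                   x1 = c("y", "n", "n"),
                   x2 = c(100, 50, 200),
                   stringsAsFactors = FALSE)
names(wide) <- c("id", "a_t12_aa (type # 2)", "b_t13_aa (type # 4)")
measure <- setdiff(names(wide), "id")

# Give the future value column one common type.
wide[measure] <- lapply(wide[measure], as.character)

long <- reshape(wide,
                direction = "long",
                varying   = list(measure),  # all measure columns stack into one
                v.names   = "value",
                timevar   = "variable",     # holds the original column name
                times     = measure,
                idvar     = "id")
rownames(long) <- NULL
print(long)
```

The variable column can then be split into part, task, subgroup and type just as in the answer.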

This sounds like what you want:

    cbind(df1_long["id"],
          colsplit(gsub("\\s|\\(type|\\)|#", " ", df1_long$variable),
                   pattern = "_|\\s+",
                   names = c("part", "task", "subgroup", "type")),
          df1_long["value"])
    #       id part task subgroup type value
    # 1  ind_a    a  t12       aa    2     y
    # 2  ind_b    a  t12       aa    2     n
    # 3  ind_c    a  t12       aa    2     n
    # 4  ind_a    a  t12       bb    2     y
    # 5  ind_b    a  t12       bb    2     y
    # 6  ind_c    a  t12       bb    2     n
    # 7  ind_a    b  t12       aa    3 op1_1
    # 8  ind_b    b  t12       aa    3 opt_2
    # 9  ind_c    b  t12       aa    3 opt_1
    # 10 ind_a    b  t13       aa    4   100
    # 11 ind_b    b  t13       aa    4    50
    # 12 ind_c    b  t13       aa    4   200
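If you would rather not depend on reshape2 for the splitting step, the same parsing can be sketched with base R's strsplit(). This sketch assumes the names look like the sample data's and reuses the same gsub() cleanup. The stopifnot() check guards against a name that splits into the wrong number of pieces, which rbind() would otherwise recycle, sometimes without any warning.

```r
# Hypothetical column names in the same format as df1_wide.
variable <- c("a_t12_aa (type # 2)", "a_t12_bb (type # 2)",
              "b_t12_aa (type # 3)", "b_t13_aa (type # 4)")

# Same cleanup as the gsub() call above, then trim leftover edge spaces.
cleaned <- trimws(gsub("\\s|\\(type|\\)|#", " ", variable))

# Split on underscores or runs of whitespace: "a_t12_aa     2" -> "a" "t12" "aa" "2".
pieces <- strsplit(cleaned, "_|\\s+")
stopifnot(all(lengths(pieces) == 4))  # every name must yield exactly four fields

parsed <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
names(parsed) <- c("part", "task", "subgroup", "type")
parsed$type <- as.integer(parsed$type)
print(parsed)
```

With the melted data, setting variable <- as.character(df1_long$variable) first and then calling cbind(df1_long["id"], parsed, df1_long["value"]) should give the same layout as the colsplit() version.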

I can't guarantee that the regular expression used in gsub() will work on your actual data, but it should be enough to point you in the right direction.
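One way to find out whether a regular expression holds up on the real ~1,000 columns is to test every name against the expected layout before splitting. This is a minimal sketch, assuming the layout <part>_t<digits>_<subgroup> (type # <digits>). The column names are made up, including "notes", which deliberately does not fit.

```r
# Hypothetical column names; "notes" deliberately breaks the assumed layout.
cols <- c("id", "a_t12_aa (type # 2)", "b_t13_aa (type # 4)", "notes")

# Assumed layout: <part>_t<digits>_<subgroup> (type # <digits>)
layout <- "^[^_]+_t[0-9]+_[^ ]+ \\(type # [0-9]+\\)$"

measure <- setdiff(cols, "id")
bad <- measure[!grepl(layout, measure)]

if (length(bad) > 0) {
  message("Columns that do not match the expected layout: ",
          paste(bad, collapse = ", "))
}
```

On the real data, cols would be names(df1_wide). Any column listed in bad either needs its own rule or should be excluded before melting.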

In the future, please note that sharing a screenshot of your data is pretty useless. Please share data the way you did in your edited question, so that others can copy and paste it and get started trying things out.

You may want to consider renaming your variables along the lines of a_t12_aa_2 in the future; in that case, the colsplit step would simply be colsplit(variable, "_", c("part", "task", "subgroup", "type")).
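The renaming itself can be scripted rather than done by hand. This sketch assumes every name ends in the " (type # n)" suffix shown in the sample: sub() with a back-reference moves the type number into a fourth underscore-separated field, after which a plain split on "_" is enough. The example names are hypothetical.

```r
old <- c("a_t12_aa (type # 2)", "b_t13_aa (type # 4)")

# Replace the trailing " (type # n)" with "_n": "a_t12_aa (type # 2)" -> "a_t12_aa_2".
new <- sub(" \\(type # ([0-9]+)\\)$", "_\\1", old)
print(new)
# [1] "a_t12_aa_2" "b_t13_aa_4"

# With four underscore-separated fields, a fixed split on "_" recovers them.
fields <- do.call(rbind, strsplit(new, "_", fixed = TRUE))
colnames(fields) <- c("part", "task", "subgroup", "type")
print(fields)
```

On the question's data this would be names(df1_wide)[-1] <- sub(" \\(type # ([0-9]+)\\)$", "_\\1", names(df1_wide)[-1]), leaving the id column alone.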
