regex - Possible to change the record delimiter in R?
Is it possible to manipulate the record/observation/row delimiter when reading in data (i.e. with read.table) from a text file? It's straightforward to adjust the field delimiter using sep="", but I haven't found a way to change the record delimiter from the end-of-line character.
I am trying to read in pipe-delimited text files in which many of the entries are long strings that include carriage returns. R treats these CRs as end-of-line, which begins a new row incorrectly and screws up the number of records and the field order.
I would like to use a different delimiter instead of a CR. As it turns out, each row begins with the same string, so if I could use \nstring to identify the true end-of-line, the table would import correctly. Here's a simplified example of what one of the text files might look like.
    v1,v2,v3,v4
    string,a,5,some text
    string,b,2,more text and
    more text
    string,b,7,some different text
    string,a,,
It should read into R as

    v1      v2  v3   v4
    string  a   5    some text
    string  b   2    more text and more text
    string  b   7    some different text
    string  a   N/A  N/A
I can open the files in a text editor and clean them with find/replace before reading them in, but a systematic solution within R would be great. Thanks for your help.
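For reference, here is a minimal reproduction of the problem, assuming the simplified example above is held in a string. The variable names `txt` and `bad` are made up for illustration, and the example uses commas as in the sample data rather than pipes.

```r
# Sample data from the question; the second record is split over two lines.
txt <- "v1,v2,v3,v4
string,a,5,some text
string,b,2,more text and
more text
string,b,7,some different text
string,a,,"

# A plain read.csv treats the continuation line as a record of its own,
# so the result has one row too many and "more text" lands in column v1.
bad <- read.csv(text = txt, as.is = TRUE)
nrow(bad)
bad$v1
```

The stray row is the symptom described above: the continuation line is parsed as a new record instead of being appended to the previous one.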
We can read the lines in and collapse them afterwards. g will have the value 0 for the header, 1 for the next line (and for any follow-on lines that go with it) and so on. tapply collapses the lines according to g, giving l2, and finally we re-read the lines:
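    lines <- "v1,v2,v3,v4
    string,a,5,some text
    string,b,2,more text and
    more text
    string,b,7,some different text
    string,a,,"

    # read the raw physical lines
    l <- readLines(textConnection(lines))
    # a new record starts at each line beginning with "string"; here g is 0 1 2 2 3 4
    g <- cumsum(grepl("^string", l))
    # paste together the lines that belong to the same record
    l2 <- tapply(l, g, paste, collapse = " ")
    # re-read the collapsed lines as CSV
    df <- read.csv(text = l2, as.is = TRUE)
    # treat empty strings in v4 as missing
    df$v4[ df$v4 == "" ] <- NA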
This gives:

    > df
          v1 v2 v3                      v4
    1 string  a  5               some text
    2 string  b  2 more text and more text
    3 string  b  7     some different text
    4 string  a NA                    <NA>
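The same idea could be wrapped in a small helper for the pipe-delimited files mentioned in the question. This is only a sketch: the function name `read_multiline_records`, its arguments, and the temporary sample file are assumptions for illustration, and it only works if every true record starts with a known pattern.

```r
# Hypothetical helper (not from the original answer): reads a delimited file
# whose records may span several physical lines.
read_multiline_records <- function(path, start_pattern, sep = "|") {
  raw_lines <- readLines(path)
  # Each line matching start_pattern opens a new record; the header stays in group 0.
  group <- cumsum(grepl(start_pattern, raw_lines))
  # Join continuation lines onto the record they belong to.
  records <- as.vector(tapply(raw_lines, group, paste, collapse = " "))
  read.table(text = records, sep = sep, header = TRUE, quote = "",
             comment.char = "", na.strings = "", stringsAsFactors = FALSE)
}

# Usage with a temporary pipe-delimited copy of the sample data.
tmp <- tempfile(fileext = ".txt")
writeLines(c("v1|v2|v3|v4",
             "string|a|5|some text",
             "string|b|2|more text and",
             "more text",
             "string|b|7|some different text",
             "string|a||"), tmp)
out <- read_multiline_records(tmp, "^string")
out
```

Setting `na.strings = ""` makes empty fields come back as NA directly, so the separate replacement step for v4 is not needed here. Disabling `quote` and `comment.char` is a cautious choice for free-text fields, which may contain quote or `#` characters.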