regex - Possible to change the record delimiter in R?


Is it possible to manipulate the record/observation/row delimiter when reading in data (i.e. with read.table) from a text file? It's straightforward to adjust the field delimiter using sep="", but I haven't found a way to change the record delimiter from the end-of-line character.

I am trying to read in pipe-delimited text files in which many of the entries are long strings that include carriage returns. R treats these CRs as end-of-line, so it begins a new row incorrectly, which screws up the number of records and the field order.

I would like to use a different delimiter instead of a CR. As it turns out, each row begins with the same string, so if I could use \nstring to identify the true end-of-line, the table would import correctly. Here's a simplified example of what one of the text files might look like.

    v1,v2,v3,v4
    string,a,5,some text
    string,b,2,more text and
    more text
    string,b,7,some different text
    string,a,,

This should read into R as

    v1      v2       v3      v4
    string  a        5       some text
    string  b        2       more text and more text
    string  b        7       some different text
    string  a        N/A     N/A

I can open the files in a text editor and clean them with a find/replace before reading them in, but a systematic solution within R would be great. Thanks for your help.

We can read the lines in and collapse them afterwards. g will have the value 0 for the header, 1 for the next line (and any follow-on lines that go with it), and so on. tapply collapses the lines according to g, giving l2, and then we re-read the collapsed lines:

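    lines <- "v1,v2,v3,v4
    string,a,5,some text
    string,b,2,more text and
    more text
    string,b,7,some different text
    string,a,,"

    # read the raw physical lines
    l <- readLines(textConnection(lines))

    # group id: increments at every line that starts a new record
    g <- cumsum(grepl("^string", l))
    # paste each group's physical lines back into one record
    l2 <- tapply(l, g, paste, collapse = " ")

    df <- read.csv(text = l2, as.is = TRUE)
    df$v4[ df$v4 == "" ] <- NA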

This gives:

    > df
          v1 v2 v3                      v4
    1 string  a  5               some text
    2 string  b  2 more text and more text
    3 string  b  7     some different text
    4 string  a NA                    <NA>
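For what it's worth, the "\nstring as the record delimiter" idea from the question can also be sketched more literally. This is not part of the answer above, just an alternative under some assumptions: the whole file fits in memory as a single string, every real record starts with the literal text string, and no embedded line break is followed by that same text. The names txt and records are made up for the illustration.

```r
# The sample data as one string, with the stray line break inside record 2.
# For a real file, something like paste(readLines(path), collapse = "\n")
# would build the same kind of string (path being a hypothetical file name).
txt <- "v1,v2,v3,v4\nstring,a,5,some text\nstring,b,2,more text and\nmore text\nstring,b,7,some different text\nstring,a,,"

# Split only at newlines that are immediately followed by "string";
# the lookahead keeps "string" itself in the next record.
records <- strsplit(txt, "\n(?=string)", perl = TRUE)[[1]]

# Any newline still left sits inside a field, so turn it into a space.
records <- gsub("\n", " ", records, fixed = TRUE)

df <- read.csv(text = records, as.is = TRUE)
df$v4[df$v4 == ""] <- NA
```

For the pipe-delimited files mentioned in the question, passing sep = "|" to read.csv should be the only change needed, and files with Windows line endings would likely need "\r\n" handled in both patterns.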
