tm - How to convert a corpus to a data.frame with metadata in R
How can I convert a corpus into a data frame in R that also contains the metadata? I tried the suggestion from the question on converting a corpus into a data.frame in R, but the resulting data frame only contains the text lines from all docs in the corpus. I also need the document ID and maybe the line number of each text line, in two columns. So, how can I extend this command:
dataframe <- data.frame(text=unlist(sapply(mycorpus, `[`, "content")), stringsAsFactors=FALSE)
to get this data?
I tried
dataframe <- data.frame(id=sapply(corpus, meta(corpus, "id")), text=unlist(sapply(corpus, `[`, "content")), stringsAsFactors=F)
but it didn't help; I got the error message "Error in match.fun(FUN) : 'meta(corpus, "id")' is not a function, character or symbol" (translated from the German R locale).
The corpus is extracted from plain text files; here is an example:
> str(corpus)
[...]
 $ 1178531510 :List of 2
  ..$ content: chr [1:67] " uberrasch sagt [...] gemacht echt schad verursacht" ...
  ..$ meta   :List of 7
  .. ..$ author       : chr(0)
  .. ..$ datetimestamp: POSIXlt[1:1], format: "2015-08-16 14:44:11"
  .. ..$ description  : chr(0)
  .. ..$ heading      : chr(0)
  .. ..$ id           : chr "1178531510" # <--- this is the ID I want in the data.frame
  .. ..$ language     : chr "de"
  .. ..$ origin       : chr(0)
  .. ..- attr(*, "class")= chr "TextDocumentMeta"
  ..- attr(*, "class")= chr [1:2] "PlainTextDocument" "TextDocument"
[...]
Many thanks in advance :)
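There are two problems. First, you should not repeat the argument corpus inside sapply: its second argument must be a function (here meta), and any extra arguments such as "id" are passed after it. Second, multi-paragraph texts are turned into character vectors of length > 1, which you should paste together before unlisting.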
dataframe <- data.frame(id=sapply(corpus, meta, "id"), text=unlist(lapply(sapply(corpus, '[', "content"), paste, collapse="\n")), stringsAsFactors=FALSE)
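The question also asks for a line number per text line, which the command above does not give, because it collapses each document into a single string. Below is a minimal base-R sketch of that variant. It assumes each document is a list with a `content` character vector and a `meta$id` string, as in the `str()` output in the question. The `docs` list and the `doc_to_rows` helper are hypothetical stand-ins made up for illustration, not part of tm.

```r
# Hypothetical stand-in for a tm corpus: a named list of documents, each with
# a `content` character vector (one element per text line) and a `meta` list.
docs <- list(
  "1178531510" = list(content = c("first line", "second line"),
                      meta    = list(id = "1178531510", language = "de")),
  "1178531511" = list(content = "only line",
                      meta    = list(id = "1178531511", language = "de"))
)

# Build one data.frame per document: ID, line number, and text of each line.
doc_to_rows <- function(doc) {
  data.frame(id   = rep(doc$meta$id, length(doc$content)),
             line = seq_along(doc$content),
             text = doc$content,
             stringsAsFactors = FALSE)
}

# Stack the per-document frames into one long data.frame.
dataframe <- do.call(rbind, lapply(docs, doc_to_rows))
rownames(dataframe) <- NULL
print(dataframe)
```

With a real corpus, `lapply(corpus, doc_to_rows)` should work the same way as long as the documents have the structure shown in the question; I have not tested it against other document classes.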