r - Predicting LDA topics for new data
It looks like this question may have been asked a few times before (here and here), but it has not yet been answered. I'm hoping that is due to the previous ambiguity of the question(s) asked, as indicated by the comments. I apologize if I am breaking protocol by asking a similar question again; I assumed those questions would not be seeing any new answers.
Anyway, I am new to Latent Dirichlet Allocation and am exploring its use as a means of dimension reduction for textual data. Ultimately I would like to extract a smaller set of topics from a large bag of words and build a classification model using those topics as a few variables in the model. I've had success in running LDA on a training set, but the problem I am having is being able to predict which of those same topics appear in some other test set of data. I am using R's topicmodels package right now, but if there is a way to do this using some other package I am open to that as well.
Here is an example of what I am trying to do:
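    library(topicmodels)
    data(AssociatedPress)

    # Split the document-term matrix into training and test documents
    train <- AssociatedPress[1:100]
    test <- AssociatedPress[101:150]

    # Fit a 5-topic LDA model on the training documents
    train.lda <- LDA(train, 5)
    topics(train.lda)

    # How can I predict the topic(s) from "train.lda" for each document in "test"?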
With the help of Ben's superior document-reading skills, I believe this is possible using the posterior() function.
    library(topicmodels)
    data(AssociatedPress)

    train <- AssociatedPress[1:100]
    test <- AssociatedPress[101:150]

    # Fit the model on the training documents and list each one's most likely topic
    train.lda <- LDA(train, 5)
    (train.topics <- topics(train.lda))
    # [1] 4 5 5 1 2 3 1 2 1 2 1 3 2 3 3 2 2 5 3 4 5 3 1 2 3 1 4 4 2 5 3 2 4 5 1 5 4 3 1 3 4 3 2 1 4 2 4 3 1 2 4 3 1 1 4 4 5
    # [58] 3 5 3 3 5 3 2 3 4 4 3 4 5 1 2 3 4 3 5 5 3 1 2 5 5 3 1 4 2 3 1 3 2 5 4 5 5 1 1 1 4 4 3

    # Compute topic posteriors for the unseen documents, then take the most probable topic per document
    test.topics <- posterior(train.lda, test)
    (test.topics <- apply(test.topics$topics, 1, which.max))
    # [1] 3 5 5 5 2 4 5 4 2 2 3 1 3 3 2 4 3 1 5 3 5 3 1 2 2 3 4 1 2 2 4 4 3 3 5 5 5 2 2 5 2 3 2 3 3 5 5 1 2 2
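Since the stated goal is to use the topics as a few variables in a classification model, it may be more useful to keep the full matrix of per-document topic probabilities rather than only the single most likely topic. The following is a minimal sketch of that idea, not a tested recipe: the names train.features and test.features, the topic1 to topic5 column labels, and the fixed seed are assumptions added here for illustration, and the explicit row indexing with a trailing comma is used only to make the document subsetting unambiguous.

```r
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

# Explicit row indexing: rows of a document-term matrix are documents
train <- AssociatedPress[1:100, ]
test  <- AssociatedPress[101:150, ]

# The seed is an assumption added for reproducibility; it is not in the original post
train.lda <- LDA(train, k = 5, control = list(seed = 1))

# posterior() without new data returns the distributions for the fitted documents;
# with new data it infers distributions for documents the model has not seen
train.features <- as.data.frame(posterior(train.lda)$topics)
test.features  <- as.data.frame(posterior(train.lda, test)$topics)

# Hypothetical column names so both frames share the same predictor names
names(train.features) <- paste0("topic", 1:5)
names(test.features)  <- paste0("topic", 1:5)

dim(train.features)
dim(test.features)
```

Each row of these data frames is one document and each column is the probability of one topic, so the rows sum to 1. Because of that, one column is redundant and some classifiers may need it dropped before fitting.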