machine learning - R: Naive Bayes classifier bases decision only on a-priori probabilities

June 15, 2010

I'm trying to classify tweets according to their sentiment into 3 categories (buy, hold, sell). I'm using R and the package e1071.

I have 2 data frames: 1 training set and 1 set of new tweets whose sentiment needs to be predicted.

The trainingset data frame:

    +--------------------------------------------------+
    text                           | sentiment
    this stock buy                 | buy
    markets crash in tokyo         | sell
    everybody excited new products | hold
    +--------------------------------------------------+

Now I want to train the model using the tweet text trainingset[,2] and the sentiment category trainingset[,4].

    classifier <- naiveBayes(trainingset[,2], as.factor(trainingset[,4]), laplace=1)

Looking into the elements of classifier with

    classifier$tables$x

I find that the conditional probabilities are calculated. There are different probabilities for every tweet concerning buy, hold and sell. So far so good.

However, when I predict the training set with:

    predict(classifier, trainingset[,2], type="raw")

I get a classification based only on the a-priori probabilities, which means every tweet is classified as hold (because "hold" had the largest share among the sentiments). So every tweet has the same probabilities for buy, hold and sell:

    +--------------------------------------------------+
    id | buy   | hold | sell
    1  | 0.25  | 0.5  | 0.25
    2  | 0.25  | 0.5  | 0.25
    3  | 0.25  | 0.5  | 0.25
    .. | ..... | .... | ...
    n  | 0.25  | 0.5  | 0.25
    +--------------------------------------------------+

Any ideas what I'm doing wrong? I'd appreciate your help!

Thanks

It looks like you trained the model using whole sentences as inputs, while it seems that you want to use words as input features.
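For reference, a word-level naive Bayes model scores a class for a tweet as shown below. This is the standard textbook form, not something specific to e1071; the symbols c (a sentiment class) and w_1, ..., w_n (the words of one tweet) are introduced here only for illustration.

```latex
P(c \mid w_1, \dots, w_n) \;\propto\; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

With a whole sentence as the single feature, the product has only one factor, so the model cannot share any evidence between tweets that merely have words in common.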

Usage:

    ## S3 method for class 'formula'
    naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
    ## Default S3 method:
    naiveBayes(x, y, laplace = 0, ...)

    ## S3 method for class 'naiveBayes'
    predict(object, newdata,
      type = c("class", "raw"), threshold = 0.001, ...)

Arguments:

    x: A numeric matrix, or a data frame of categorical and/or
       numeric variables.

    y: Class vector.

In particular, if you train naiveBayes this way:

    x <- c("john likes cake", "marry likes cats and john")
    y <- as.factor(c("good", "bad"))
    bayes <- naiveBayes(x, y)

you get a classifier that is only able to recognize these 2 sentences:

    Naive Bayes Classifier for Discrete Predictors

    Call:
    naiveBayes.default(x = x, y = y)

    A-priori probabilities:
    y
     bad good
     0.5  0.5

    Conditional probabilities:
          x
    y      john likes cake marry likes cats and john
      bad                0                         1
      good               1                         0

To achieve a word-level classifier, you need to run it with words as inputs:

    x <- c("john", "likes", "cake", "marry", "likes", "cats", "and", "john")
    y <- as.factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))
    bayes <- naiveBayes(x, y)

You get:

    Naive Bayes Classifier for Discrete Predictors

    Call:
    naiveBayes.default(x = x, y = y)

    A-priori probabilities:
    y
      bad  good
    0.625 0.375

    Conditional probabilities:
          x
    y            and      cake      cats      john     likes     marry
      bad  0.2000000 0.0000000 0.2000000 0.2000000 0.2000000 0.2000000
      good 0.0000000 0.3333333 0.0000000 0.3333333 0.3333333 0.0000000
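The numbers in this output can be checked by hand with base R, without e1071: the priors are the class shares of y, and each conditional probability is a word count divided by the number of words in that class (no Laplace smoothing, matching the default laplace = 0). A small sketch, where the names priors and cond are only illustrative:

```r
# Same toy data as in the word-level example
x <- c("john", "likes", "cake", "marry", "likes", "cats", "and", "john")
y <- factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))

# A-priori probabilities: share of each class among the 8 labelled words
priors <- prop.table(table(y))

# Conditional probabilities: per-class word counts divided by the row totals
cond <- prop.table(table(y, x), margin = 1)

print(priors)
print(round(cond, 7))
```

For example, cond["bad", "and"] is 1/5 because "and" is one of the five words labelled bad, and cond["good", "cake"] is 1/3 because "cake" is one of the three words labelled good.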

In general, R is not well suited for processing NLP data; Python (or at least Java) would be a better choice.

To convert a sentence into words, you can use the strsplit function:

    unlist(strsplit("john likes cake", " "))
    [1] "john"  "likes" "cake"
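Building on strsplit, one possible way to apply this to the tweets is to turn each tweet into a row of word-presence features and train on that data frame instead of on the raw text. The sketch below is an assumption about how this could be done, not the original poster's code: the helpers tokenize and word_features, the "yes"/"no" coding and the sample new tweet are all hypothetical, and the toy rows are taken from the question's table. The feature building uses base R only; the e1071 calls run only if the package is installed.

```r
# Split each text into lowercase words, dropping empty tokens (base R only).
tokenize <- function(texts) {
  lapply(strsplit(tolower(texts), "[^a-z]+"), function(tk) tk[nzchar(tk)])
}

# One factor column per vocabulary word: "yes" if the word occurs in the text.
# Fixed levels keep training and prediction data aligned.
word_features <- function(texts, vocab) {
  tokens <- tokenize(texts)
  cols <- lapply(vocab, function(w) {
    present <- vapply(tokens, function(tk) w %in% tk, logical(1))
    factor(ifelse(present, "yes", "no"), levels = c("no", "yes"))
  })
  names(cols) <- vocab
  # check.names = FALSE keeps words such as "in" as column names unchanged
  data.frame(cols, check.names = FALSE)
}

# Toy training data (illustrative rows from the question)
train_text <- c("this stock buy", "markets crash in tokyo",
                "everybody excited new products")
train_sentiment <- factor(c("buy", "sell", "hold"))

vocab <- sort(unique(unlist(tokenize(train_text))))
train_x <- word_features(train_text, vocab)

# Train and predict only when e1071 is available
if (requireNamespace("e1071", quietly = TRUE)) {
  classifier <- e1071::naiveBayes(train_x, train_sentiment, laplace = 1)
  new_x <- word_features("tokyo markets crash again", vocab)  # hypothetical tweet
  print(predict(classifier, new_x, type = "raw"))
}
```

Each column of train_x is now one word, so the conditional probability tables are per word rather than per sentence. Words in a new tweet that are not in vocab (here "again") are simply ignored by this sketch.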
