machine learning - R: Naive Bayes classifier bases decision only on a-priori probabilities
I'm trying to classify tweets according to their sentiment into three categories (buy, hold, sell). I'm using R and the package e1071.
I have two data frames: one training set and one set of new tweets whose sentiment needs to be predicted.
trainingset data frame:
+--------------------------------------------------+
text                           | sentiment
this stock buy                 | buy
markets crash in tokyo         | sell
everybody excited new products | hold
+--------------------------------------------------+
Now I want to train the model using the tweet text, trainingset[,2], and the sentiment category, trainingset[,4].
classifier <- naiveBayes(trainingset[,2], as.factor(trainingset[,4]), laplace=1)
Looking into the elements of the classifier with
classifier$tables$x
I find that the conditional probabilities are calculated. There are different probabilities for every tweet concerning buy, hold and sell. So far so good.
However, when I predict the training set with:
predict(classifier, trainingset[,2], type="raw")
I get a classification based only on the a-priori probabilities, which means every tweet is classified as hold (because "hold" had the largest share among the sentiments). So every tweet has the same probabilities for buy, hold and sell:
+--------------------------------------------------+
id | buy  | hold | sell
1  | 0.25 | 0.5  | 0.25
2  | 0.25 | 0.5  | 0.25
3  | 0.25 | 0.5  | 0.25
.. | .... | .... | ....
n  | 0.25 | 0.5  | 0.25
+--------------------------------------------------+
Any ideas what I'm doing wrong? I appreciate your help!
Thanks
It looks like you trained the model using whole sentences as inputs, while it seems that you want to use words as your input features.
Usage:
## S3 method for class 'formula'
naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
## Default S3 method:
naiveBayes(x, y, laplace = 0, ...)
## S3 method for class 'naiveBayes'
predict(object, newdata, type = c("class", "raw"), threshold = 0.001, ...)
Arguments:
x: A numeric matrix, or a data frame of categorical and/or numeric variables.
y: Class vector.
In particular, if you train naiveBayes this way:
x <- c("john likes cake", "marry likes cats and john")
y <- as.factor(c("good", "bad"))
bayes <- naiveBayes(x, y)
you get a classifier that is able to recognize just these two sentences:
Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
 bad good
 0.5  0.5

Conditional probabilities:
      x
y      john likes cake marry likes cats and john
  bad                0                         1
  good               1                         0
To achieve a word-level classifier, you need to run it with words as inputs:
x <- c("john", "likes", "cake", "marry", "likes", "cats", "and", "john")
y <- as.factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))
bayes <- naiveBayes(x, y)
You get:
Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x, y = y)

A-priori probabilities:
y
  bad  good
0.625 0.375

Conditional probabilities:
      x
y            and      cake      cats      john     likes     marry
  bad  0.2000000 0.0000000 0.2000000 0.2000000 0.2000000 0.2000000
  good 0.0000000 0.3333333 0.0000000 0.3333333 0.3333333 0.0000000
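The numbers in that word-level output are plain relative frequencies, so they can be checked by hand. The following is only a sketch in base R (no e1071 needed) that recomputes them from the same toy x and y. It assumes laplace = 0, which is the default in the usage shown earlier. The names prior, counts and conditional are made up for this illustration.

```r
# Same toy data as in the word-level example
x <- c("john", "likes", "cake", "marry", "likes", "cats", "and", "john")
y <- factor(c("good", "good", "good", "bad", "bad", "bad", "bad", "bad"))

# Class shares: these are the a-priori probabilities
prior <- prop.table(table(y))

# Word counts per class, normalised so that each class row sums to 1
counts <- table(y, x)
conditional <- prop.table(counts, margin = 1)

print(prior)
print(round(conditional, 7))
```

For example, "cake" appears once among the three "good" words, which gives 1/3, and never among the five "bad" words, which gives 0.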
In general, R is not well suited for processing NLP data; Python (or at least Java) would be a better choice.
To convert a sentence into words, you can use the strsplit function:
unlist(strsplit("john likes cake", " "))
[1] "john" "likes" "cake"
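If whole tweets have to be scored rather than single words, one possible way to combine strsplit with naiveBayes is to give every tweet one yes/no column per vocabulary word. This is a different setup from the one-word-per-observation example in this answer, and it is only a sketch: it assumes e1071 is installed, the tweets and labels are invented toy data, and to_features, vocab and the "w_" column prefix are hypothetical helpers, not part of e1071.

```r
library(e1071)  # assumed to be installed; provides naiveBayes()

# Invented toy training data, not the asker's real tweets
tweets <- c("this stock is a buy",
            "buy this stock now",
            "markets crash in tokyo",
            "sell before the crash")
sentiment <- factor(c("buy", "buy", "sell", "sell"))

# Split each tweet into lowercase words and collect the vocabulary
tokens <- strsplit(tolower(tweets), " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Hypothetical helper: one row per tweet, one yes/no factor per vocabulary word
to_features <- function(token_list, vocab) {
  cols <- lapply(vocab, function(w) {
    present <- vapply(token_list, function(tk) w %in% tk, logical(1))
    factor(ifelse(present, "yes", "no"), levels = c("no", "yes"))
  })
  names(cols) <- paste0("w_", vocab)  # prefix avoids reserved words such as "in"
  as.data.frame(cols)
}

train_x <- to_features(tokens, vocab)
model <- naiveBayes(train_x, sentiment, laplace = 1)

# New tweets get the same columns; words outside the vocabulary are ignored
new_tokens <- strsplit(tolower(c("markets crash again", "a strong buy")),
                       " ", fixed = TRUE)
new_x <- to_features(new_tokens, vocab)

pred <- predict(model, new_x, type = "class")
raw <- predict(model, new_x, type = "raw")
print(pred)
print(raw)
```

The point of the fixed factor levels and the shared column names is that predict has to see new data in the same shape as the training data. With this layout the raw probabilities should differ from tweet to tweet instead of repeating the class shares.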