python - y from sklearn.datasets.make_classification


In sklearn.datasets.make_classification, how is the class y calculated? Let's say I run this:

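from sklearn.datasets import make_classification

# n_redundant defaults to 2, so it has to be set to 0 here; otherwise
# n_informative + n_redundant would exceed n_features and raise a ValueError.
x, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, n_clusters_per_class=1,
                           random_state=0)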

What formula is used to come up with the y's from the x's? The documentation touches on this when it talks about the informative features:

The number of informative features. Each class is composed of a number of gaussian clusters each located around the vertices of a hypercube in a subspace of dimension n_informative. For each cluster, informative features are drawn independently from N(0, 1) and then randomly linearly combined within each cluster in order to add covariance. The clusters are then placed on the vertices of the hypercube.

Thanks,

G

The y is not calculated. Every row in x gets an associated label in y according to the class that the row was generated for (notice the n_classes variable). Some of these labels may then be flipped if flip_y is greater than zero, to create noise in the labeling.
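One way to see this for yourself is a small check, offered as a sketch rather than a description of the library's internals. It assumes shuffle=False keeps the rows in the order they were generated (cluster by cluster) and that with the default equal class weights each of the two clusters gets 500 of the 1000 samples. The flip_y value of 0.5 is exaggerated on purpose so the label noise is easy to spot; the default is 0.01.

```python
import numpy as np
from sklearn.datasets import make_classification

# Shared settings; shuffle=False keeps rows in generation order.
params = dict(n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
              n_classes=2, n_clusters_per_class=1, shuffle=False,
              random_state=0)

# flip_y=0 turns label noise off entirely.
x, y = make_classification(flip_y=0, **params)

# The labels come out in two solid blocks, one per cluster:
# they follow the cluster a row belongs to, not a formula on x.
print(np.unique(y[:500]), np.unique(y[500:]))

# With flip_y > 0, a random subset of labels is replaced by random classes.
_, y_noisy = make_classification(flip_y=0.5, **params)
flipped_fraction = (y_noisy != y).mean()
print(flipped_fraction)
```

Only part of the selected rows end up with a different label, because a row picked for flipping can be reassigned to the class it already had.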

Edit: giving an example

For example, assume you want 2 classes, 1 informative feature, and 4 data points in total. Assume that the 2 class centroids are generated randomly and they happen to be 1.0 and 3.0. Every data point that gets generated around the first class (value 1.0) gets the label y=0, and every data point that gets generated around the second class (value 3.0) gets the label y=1. For example, the x1's for the first class might happen to be 1.2 and 0.7. For the second class, the 2 points might be 2.8 and 3.1. You now have 4 data points, and you know which class each one was generated for, so your final data will be:

y    x1
0    1.2
0    0.7
1    2.8
1    3.1

As you can see, nothing is calculated; you simply assign the class as you randomly generate the data.
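The toy example can be mimicked in a few lines of NumPy. This is only an illustrative sketch, not the code make_classification actually runs: the centroids 1.0 and 3.0 are taken from the example, while the spread of 0.3, the seed, and all variable names are arbitrary assumptions, so the drawn values will differ from the table.

```python
import numpy as np

rng = np.random.default_rng(0)

centroids = np.array([1.0, 3.0])  # class centers from the example above
points_per_class = 2              # 4 data points in total

x1_parts, y_parts = [], []
for label, center in enumerate(centroids):
    # Draw points around this class's center (0.3 is an arbitrary spread).
    x1_parts.append(center + 0.3 * rng.standard_normal(points_per_class))
    # The label is just the index of the class being generated.
    y_parts.append(np.full(points_per_class, label))

x1 = np.concatenate(x1_parts)
y = np.concatenate(y_parts)

for label, value in zip(y, x1):
    print(label, round(value, 2))
```

The label is written down at the moment each point is drawn, which is the whole point: y records where a row came from rather than being derived from x1 afterwards.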

