python - y from sklearn.datasets.make_classification
In sklearn.datasets.make_classification, how is the class y calculated? Let's say I run this:
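from sklearn.datasets import make_classification
# n_redundant defaults to 2, so it has to be set to 0 when n_features=2 and n_informative=2
x, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, n_classes=2, n_clusters_per_class=1, random_state=0)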
What formula is used to come up with the y's from the x's? The documentation touches on this when it talks about the informative features:
The number of informative features. Each class is composed of a number of gaussian clusters each located around the vertices of a hypercube in a subspace of dimension n_informative. For each cluster, informative features are drawn independently from N(0, 1) and then randomly linearly combined in order to add covariance. The clusters are then placed on the vertices of the hypercube.
thanks,
g
The y is not calculated. Every row in x gets an associated label in y according to the class the row was generated for (notice the n_classes variable). Some of these labels are possibly flipped if flip_y is greater than zero, to create noise in the labeling.
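To see this for yourself, here is a minimal sketch, assuming scikit-learn and NumPy are installed. It reuses the settings from the question. The shuffle=False and n_redundant=0 arguments and the variable names are my own additions: shuffle=False keeps the rows in the order the clusters were generated, so you can tell which cluster each row came from.

```python
import numpy as np
from sklearn.datasets import make_classification

# Settings borrowed from the question. shuffle=False (an assumption added
# for this illustration) keeps the rows in cluster order.
params = dict(n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
              n_classes=2, n_clusters_per_class=1, shuffle=False,
              random_state=0)

x_clean, y_clean = make_classification(flip_y=0.0, **params)
x_noisy, y_noisy = make_classification(flip_y=0.2, **params)

# Without flipping, the label is simply the class of the cluster a row was
# generated for: one block of 0s followed by one block of 1s.
print(np.bincount(y_clean))
print(bool(np.all(np.diff(y_clean) >= 0)))

# With flip_y > 0, some rows generated for the first cluster carry label 1.
print(int((y_noisy[:500] != 0).sum()))
```

With the default shuffle=True the rows are permuted afterwards, which is why the labels look mixed up in normal use.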
Edit: giving an example
For example, assume you want 2 classes, 1 informative feature, and 4 data points in total. Assume that the 2 class centroids will be generated randomly and they happen to be 1.0 and 3.0. Every data point that gets generated around the first class (value 1.0) gets the label y=0, and every data point that gets generated around the second class (value 3.0) gets the label y=1. For example, the x1's for the first class might happen to be 1.2 and 0.7. For the second class, the 2 points might be 2.8 and 3.1. You now have 4 data points, and you know for which class each was generated, so your final data will be:
y  x1
0  1.2
0  0.7
1  2.8
1  3.1
As you can see, there is nothing calculated; you simply assign the class as you randomly generate the data.
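The toy example above can be sketched in a few lines of NumPy. This is not scikit-learn's actual implementation, only an illustration of the idea. The centroids 1.0 and 3.0 come from the example, while the spread of 0.3 and all variable names are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

centroids = np.array([1.0, 3.0])  # class centres taken from the example
n_per_class = 2                   # 4 data points in total

# The label comes first: it is just the index of the class a point is
# generated for. Nothing is computed from x1.
y = np.repeat(np.arange(len(centroids)), n_per_class)

# Each feature value is then drawn around its own class centroid.
# The spread of 0.3 is an arbitrary choice for this illustration.
x1 = centroids[y] + rng.normal(scale=0.3, size=y.shape)

for label, value in zip(y, x1):
    print(label, round(float(value), 2))
```

The order of operations is the point here: y is fixed before x1 is drawn, so there is no formula going from x to y.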