Iris flower data set: Use of the data set



Unsatisfactory k-means clustering result (the data set does not cluster into the known classes) and the actual species, visualized using ELKI.



An example of the so-called metro map for the Iris data set. Only a small fraction of Iris virginica is mixed with Iris versicolor. All other samples of the different Iris species belong to different nodes.


Based on Fisher's linear discriminant model, this data set became a typical test case for many statistical classification techniques in machine learning, such as support vector machines.
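As a rough illustration of a linear discriminant of the kind Fisher used, the following Python sketch computes a two-class Fisher direction w = S_w^-1 (m1 - m0), where S_w is the pooled within-class scatter matrix and m0, m1 are the class means, for Iris versicolor versus Iris virginica. It is not Fisher's original four-variable computation. To stay short and dependency-free it uses only petal length and petal width for the first five samples of each species in the commonly distributed (UCI) copy of the data set, and the midpoint threshold between the projected class means is an assumed, simple decision rule.

```python
# Petal length and petal width (cm) for the first five samples of each
# species in the commonly distributed (UCI) copy of the Iris data set.
VERSICOLOR = [(4.7, 1.4), (4.5, 1.5), (4.9, 1.5), (4.0, 1.3), (4.6, 1.5)]
VIRGINICA = [(6.0, 2.5), (5.1, 1.9), (5.9, 2.1), (5.6, 1.8), (5.8, 2.2)]


def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """2x2 scatter matrix: sum over points of (x - m)(x - m)^T."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    return [[sxx, sxy], [sxy, syy]]


def fisher_lda(class0, class1):
    """Return (w, threshold) with w = S_w^-1 (m1 - m0)."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Pooled within-class scatter S_w = [[a, b], [c, d]].
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]
    c, d = s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]
    det = a * d - b * c
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form 2x2 inverse of S_w applied to the mean difference.
    w = ((d * dx - b * dy) / det, (-c * dx + a * dy) / det)

    def project(p):
        return w[0] * p[0] + w[1] * p[1]

    # Assumed decision rule: threshold midway between projected class means.
    threshold = (project(m0) + project(m1)) / 2
    return w, threshold


def predict(w, threshold, p):
    """1 (virginica side) if the projection exceeds the threshold, else 0."""
    return 1 if w[0] * p[0] + w[1] * p[1] > threshold else 0


w, t = fisher_lda(VERSICOLOR, VIRGINICA)
preds = [predict(w, t, p) for p in VERSICOLOR + VIRGINICA]
print(preds)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

On this tiny subset all ten samples fall on the correct side of the threshold. For the full 150 samples and all four measurements, a library implementation such as scikit-learn's `LinearDiscriminantAnalysis` would be the more practical choice.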


The use of this data set in cluster analysis, however, is not common, since the data set contains only two clusters with rather obvious separation. One of the clusters contains Iris setosa, while the other contains both Iris virginica and Iris versicolor and is not separable without the species information Fisher used. This makes the data set a good example for explaining the difference between supervised and unsupervised techniques in data mining: Fisher's linear discriminant model can only be obtained when the object species are known, and class labels and clusters are not necessarily the same.
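The point that clusters and class labels need not coincide can be checked with a small k-means run. The sketch below is plain Python k-means (Lloyd's algorithm) with k = 2 and deterministic farthest-first seeding, an assumption made so the result is reproducible. It again uses only petal length and petal width for the first five samples of each species. The species labels are not used during clustering, only for the cross-tabulation printed at the end.

```python
from collections import Counter

# Petal length and petal width (cm), first five samples of each species.
SETOSA = [(1.4, 0.2), (1.4, 0.2), (1.3, 0.2), (1.5, 0.2), (1.4, 0.2)]
VERSICOLOR = [(4.7, 1.4), (4.5, 1.5), (4.9, 1.5), (4.0, 1.3), (4.6, 1.5)]
VIRGINICA = [(6.0, 2.5), (5.1, 1.9), (5.9, 2.1), (5.6, 1.8), (5.8, 2.2)]
POINTS = SETOSA + VERSICOLOR + VIRGINICA
LABELS = ["setosa"] * 5 + ["versicolor"] * 5 + ["virginica"] * 5


def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster index per point."""
    centers = farthest_first(points, k)
    assignment = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new == assignment:
            break  # converged: no point changed cluster
        assignment = new
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return assignment


clusters = kmeans(POINTS, 2)
for j in range(2):
    # Cross-tabulate the species labels (unused by k-means) per cluster.
    print(j, Counter(lab for lab, a in zip(LABELS, clusters) if a == j))
```

On this subset one cluster holds exactly the setosa samples and the other holds all versicolor and virginica samples together, mirroring the two obvious clusters described in the text. The algorithm only sees geometry, so nothing in it ties the clusters to the three species.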


Nevertheless, all three species of Iris are separable in the projection onto the nonlinear branching principal component. The data set is approximated by the closest tree, with some penalty for an excessive number of nodes and for bending and stretching. The so-called "metro map" is then constructed: the data points are projected onto the closest node, and for each node a pie diagram of the projected points is prepared, with the area of the pie proportional to the number of projected points. It is clear from the diagram (left) that the absolute majority of samples of the different Iris species belong to different nodes. Only a small fraction of Iris virginica is mixed with Iris versicolor (the mixed blue-green nodes in the diagram). Therefore, the three species of Iris (Iris setosa, Iris virginica and Iris versicolor) are separable by the unsupervised procedures of nonlinear principal component analysis. To discriminate them, it is sufficient to select the corresponding nodes on the principal tree.
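The projection and pie-chart step of the metro map can be sketched as follows. This covers only the bookkeeping after a tree has been fitted: the four node positions (hypothetical names A to D) are hand-picked in petal length/width space for illustration, not obtained by fitting an elastic principal tree with its node, bending and stretching penalties. The data are again the first five samples per species. Each sample is assigned to its closest node, species are counted per node, and the pie radius is taken as the square root of the node's count so that pie area is proportional to the number of projected points.

```python
import math
from collections import Counter

# Petal length and petal width (cm), first five samples of each species.
SETOSA = [(1.4, 0.2), (1.4, 0.2), (1.3, 0.2), (1.5, 0.2), (1.4, 0.2)]
VERSICOLOR = [(4.7, 1.4), (4.5, 1.5), (4.9, 1.5), (4.0, 1.3), (4.6, 1.5)]
VIRGINICA = [(6.0, 2.5), (5.1, 1.9), (5.9, 2.1), (5.6, 1.8), (5.8, 2.2)]
SAMPLES = ([(p, "setosa") for p in SETOSA]
           + [(p, "versicolor") for p in VERSICOLOR]
           + [(p, "virginica") for p in VIRGINICA])

# Hypothetical, hand-picked node positions; a real metro map would obtain
# these by fitting a principal tree, which this sketch does not do.
NODES = {"A": (1.4, 0.2), "B": (4.5, 1.4), "C": (5.0, 1.7), "D": (5.9, 2.3)}


def nearest_node(p):
    """Name of the node closest to p in squared Euclidean distance."""
    return min(NODES, key=lambda n: (p[0] - NODES[n][0]) ** 2 + (p[1] - NODES[n][1]) ** 2)


def pie_data(samples):
    """Species counts per node after projecting each sample onto its nearest node."""
    counts = {n: Counter() for n in NODES}
    for p, species in samples:
        counts[nearest_node(p)][species] += 1
    return counts


counts = pie_data(SAMPLES)
for node, c in counts.items():
    n_points = sum(c.values())
    radius = math.sqrt(n_points)  # area ~ radius**2 ~ number of projected points
    print(node, dict(c), f"radius={radius:.2f}")
```

With these nodes, A, B and D each hold a single species, while node C receives one versicolor and one virginica sample, a miniature version of a mixed blue-green node. Selecting nodes, as the text describes, then amounts to mapping each node to the species it contains.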