Author: Mr. Asadi Srinivasulu
Date of Publication: 15th March 2017
Abstract: Clustering is the process of organizing objects into groups whose members are similar to one another and differ significantly from the members of other groups. There are two approaches, viz. pre-clustering and post-clustering. Pre-clustering is an unsupervised learning approach that assigns labels to objects in unlabeled data. The pre-clustering approaches considered here are Dark Block Extraction (DBE), Cluster Count Extraction (CCE) and co-VAT (a variant of Visual Assessment of Cluster Tendency, VAT). The present work focuses on the pre-clustering approach. These algorithms have the following limitations: i) DBE cannot handle large data; ii) CCE suffers because of perplexing; iii) co-VAT works only with rectangular data. Our work proposes Extended Dark Block Extraction (EDBE), Extended Cluster Count Extraction (ECCE) and Extended co-VAT to overcome these limitations. Integrating the pre-clustering and post-clustering approaches results in the following five steps: 1) Extracting a VAT image of an input dissimilarity matrix. 2) Performing image segmentation on the VAT image to obtain a binary image, followed by directional morphological filtering. 3) Applying a distance transform to the filtered binary image and smoothing the pixel values along the main diagonal axis of the image to form a smoothed signal. 4) Applying a first-order derivative and the fast Fourier transform to the smoothed signal to detect major peaks and valleys. 5) Applying a post-clustering approach, i.e. the k-means algorithm, to the major peaks and valleys in order to obtain refined clusters. The proposed algorithms, viz. EDBE, ECCE and Extended co-VAT, use VAT in combination with several image processing techniques and are applied to various real-world data sets such as IRIS, WINE and image data sets. These extended approaches use a Reordered Dissimilarity Image (RDI) that highlights potential clusters as a set of 'dark blocks' along the diagonal of the image.
Simulation results show that EDBE, ECCE and Extended co-VAT outperform DBE, CCE and co-VAT in terms of time complexity and accuracy on labeled and unlabeled data.
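Step 1 above, extracting a VAT image from a dissimilarity matrix, can be illustrated with a minimal Python sketch. It assumes the standard VAT reordering (start from an endpoint of the largest dissimilarity, then repeatedly append the unselected object closest to the selected set) and is not the EDBE, ECCE or Extended co-VAT implementation. The function names `vat_order` and `vat_image` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def vat_order(D):
    """Return a VAT-style ordering of the objects in a square dissimilarity matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = [k for k in range(n) if k != i]
    while remaining:
        # Dissimilarities between already selected objects and the remaining ones.
        sub = D[np.ix_(order, remaining)]
        # Pick the remaining object that is closest to any selected object.
        j = remaining[int(np.argmin(sub.min(axis=0)))]
        order.append(j)
        remaining.remove(j)
    return np.array(order)

def vat_image(D):
    """Reorder rows and columns of D to obtain the reordered dissimilarity image."""
    D = np.asarray(D, dtype=float)
    order = vat_order(D)
    return D[np.ix_(order, order)], order

# Toy data (an assumption for this sketch): two well separated 1-D groups,
# deliberately interleaved so the raw matrix shows no block structure.
points = np.array([0.0, 5.0, 0.1, 5.1, 0.2, 5.2, 0.3, 5.3])
labels = (points > 2.5).astype(int)
D = np.abs(points[:, None] - points[None, :])

rdi, order = vat_image(D)
# Number of times the group label changes along the VAT ordering.
switches = int(np.sum(np.diff(labels[order]) != 0))
print(order, switches)
```

For well separated groups such as these, the group label should change only once along the ordering, so the reordered matrix `rdi` shows two low-dissimilarity (dark) blocks on its diagonal. Segmentation, morphological filtering, the distance transform and peak detection in steps 2 to 4 would then operate on `rdi`.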