Date of Publication: 21st February 2018
Abstract: Cluster ensembles have been shown to outperform any single standard clustering algorithm in accuracy and robustness across different data collections. This meta-learning formalism also helps users overcome the dilemma of selecting an appropriate technique and its corresponding parameters for a given data set. Almost two decades after the first publication of its kind, the method has proven effective in many problem domains, especially microarray data analysis and its downstream applications. Recently, it has been greatly extended in terms of both theoretical modelling and deployment to problem solving. This survey attempts to match this emerging attention by providing the fundamental basis and theoretical details of state-of-the-art methods found in the present literature. It covers the range of ensemble generation strategies, the summarization and representation of ensemble members, and the topic of consensus clustering. The review also includes different applications and extensions of cluster ensembles, and highlights several research issues and challenges.
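The three stages named in the abstract (ensemble generation, representation of ensemble members, and consensus clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the survey: the toy data, the plain k-means base clusterer, the pairwise co-association matrix used as the ensemble representation, the 0.5 threshold, and the function names `kmeans`, `co_association` and `consensus`.

```python
import random

def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means (illustrative base clusterer); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each non-empty cluster's center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def co_association(partitions, n):
    """Fraction of ensemble members that place points i and j in the same cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]

def consensus(matrix, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the members; return connected components."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]

# 1) Ensemble generation: vary the number of clusters k and the random seed.
configs = [(2, 0), (2, 1), (2, 2), (3, 3), (3, 4)]
partitions = [kmeans(points, k, seed) for k, seed in configs]

# 2) Representation: summarize the members as a pairwise co-association matrix.
matrix = co_association(partitions, len(points))

# 3) Consensus: derive a single final partition from the matrix.
consensus_labels = consensus(matrix)
print(consensus_labels)
```

The co-association matrix is convenient here because it does not depend on how each member numbers its clusters, so members with different k can be combined without aligning labels first. The thresholded connected-components step is only the simplest possible consensus function; other consensus functions could be substituted at stage 3 without changing stages 1 and 2.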