Author: Prathima V.R
Date of Publication: 30th June 2018
Abstract: Data mining and machine learning have been important research topics in recent years. Because high-dimensional data are sparsely distributed and subject to the curse of dimensionality, most traditional clustering algorithms lose their effectiveness in high-dimensional space. Clustering of high-dimensional data has therefore become an active research area. In data mining, clustering can serve as a tool for data exploration or for prediction. With the growth of large, high-dimensional data sets such as DNA arrays, images, GPS data, and bag-of-words document representations, the goal of clustering is to group data points so that they can be represented more efficiently and the data can be better understood. In this context, we discuss the pitfalls of high-dimensional data clustering along with the relevant concepts and algorithms, and then study the SSC, SSSC, and SMRS algorithms. This paper proposes a subset-based algorithm that automatically determines the optimal number of clusters in high-dimensional data. The main aim is to design an algorithm of reasonable complexity that computes representatives and clusters high-dimensional data accurately. Our contribution is an algorithm, named Hierarchical Sparse Representatives, that uses a divide-and-conquer strategy to compute the representatives within a reasonable time.