Outliers Analysis with Fuzzy Clustering Model

V. Deneshkumar; K. Senthamarai Kannan; M. Manikandan

Abstract

Outlier detection is important in many fields. In statistics, an outlier is an observation that is numerically distant from the rest of the data. Handling outlying observations in a data set is one of the most important tasks in data pre-processing. Large databases can be classified in an unsupervised manner using clustering and classification algorithms. Fuzzy C-means (FCM) is a clustering method that was developed by Dunn (1973) and improved by Bezdek (1981). It allows one piece of data to belong to two or more clusters and is frequently used in pattern recognition. This paper presents a proposed method, based on a fuzzy approach, that combines outlier analysis with a clustering technique. A clustering validation technique adaptively evaluates the results of the clustering algorithm. A numerical example using the iris data set is provided for illustration.
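As a rough illustration of the fuzzy C-means idea mentioned in the abstract, the following is a minimal sketch of the standard FCM updates in NumPy. It is not the authors' proposed method: the function names, the fuzzifier value m = 2, the stopping tolerance, and the distance-based outlier score are all assumptions chosen for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy C-means (illustrative sketch, not the paper's method).

    X: (n, d) data array; c: number of clusters; m > 1: fuzzifier.
    Returns (centers, U), where U[i, j] is the membership of point i
    in cluster j and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # u_ij = d_ij^(-2/(m-1)) / sum_k d_ik^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def nearest_center_distance(X, centers):
    """Hypothetical outlier score: distance to the closest cluster center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1)
```

A point that lies far from every cluster center receives a large score from `nearest_center_distance`, which is one simple way to rank candidate outliers after clustering. Other criteria, such as low silhouette values, could be substituted.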

Keywords

Outlier Detection, Fuzzy Clustering, Silhouette Index, FCM Algorithm, and Random Number Simulation

References

Ahmad .A and L. Dey., “A k-mean clustering algorithm for mixed numeric and categorical data”, Data and Knowledge Engineering Elsevier Publication, Vol.63, pp.503-527, 2007.

Almeida. J, Barbosa.L, Pais. A and S. Formosinho., “Improving Hierarchical cluster Analysis: A new method with outlier detection and automatic clustering”, Chemometrics and intelligent laboratory systems, pp.208-217, 2007.

Barnett. V and Lewis.T., Outliers in Statistical Data, John Wiley & Sons, New York, 1994.

Chaira.T, “A Novel Intuitionistic Fuzzy C-Means Clustering Algorithm and Its Application to Medical Images”, Applied Soft Computing, Vol.11, pp.1711–1717, 2011.

Dan Li, Hong Gu, Liyong Zhang., “A Fuzzy C-Means Clustering Algorithm Based on Nearest-Neighbor Intervals for Incomplete Data”, Expert Systems with Applications, Vol.37, pp.6942–6947, 2010.

David Ben-Arieh and Deep Kumar Gullipalli., Data Envelopment Analysis of clinics with sparse data: Fuzzy clustering approach, Computers & Industrial Engineering, vol.63, pp.13–21, 2012.

Deneshkumar. V and Senthamarai kannan. K. “Outliers Detection in Time Series Data Mining”, CiiT International Journal of Data Mining Knowledge Engineering, Vol.3 (6), pp.1-5. 2011.

Hawkins. D.M. Identification of Outliers, Chapman and Hall, 1980.

Hodge, V.J. “A survey of outlier detection methodologies”, Kluwer Academic Publishers, Netherlands, January, 43, 2004.

Iliadis. L.S., Vangeloudh. M., Spartalis.S.., “An Intelligent System Employing an Enhanced Fuzzy C-Means Clustering Model: Application in the Case of Forest Fires”, Computers and Electronics in Agriculture, Vol.70, pp. 276–284, 2010.

Iris Data Base in: http://archive.ics.uci.edu/ml/datasets/Iris

Jiang .M.F, Tseng S.S and Su. C.M. “Two-phase clustering process for outliers detection”, Pattern recognition letters, Vol.22, pp.691-700, 2001.

Krista Rizman Z., “An efficient k-means clustering algorithm”, Pattern Recognition Letters, Vol.29, 1385–1391, 2008.

Kuo Lung Wu., “Analysis of Parameter Selections for Fuzzy C-Means”, Pattern Recognition, Vol.45, pp.407–415, 2012.

Moh’d Belal Al- Zoubi, “An Effective Clustering-Based Approach for Outlier Detection”, European Journal of Scientific Research, Vol.28 (2), pp.310-316, 2009.

Nedret Billor and Gulsen Kiral, “A Comparison of Multiple Outlier Detection Methods for Regression Data”, Communications in Statistics, Vol.37, pp.521-545, 2008.

Penny K.I and Jolliffe I.T., “A Comparison of Multivariate Outlier Detection Methods for Clinical Laboratory Safety Data”, Journal of the Royal statistical Society. Series D (The Statistician), Vol.50 (3), pp.295-308, 2001.

Petrovskiy, M.I., “Outlier Detection Algorithms in Data Mining Systems”, Programming and Computer Software, Vol.29 (4), pp.228–23, 2003.

Rousseeuw. P.J., “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis”, Journal of Computational and Applied Mathematics, Vol.20, pp.53-65, 1987.

Sadaaki Miyamoto, Hidetomo Ichihashi and Katsuhiro Honda., Algorithms for Fuzzy Clustering, Springer- Verlag. 2008.

Tina Geweniger, Dietlind Zulke, Barbara Hammer, Thomas Villmann., “Median Fuzzy C-Means for Clustering Dissimilarity Data”, Neurocomputing, Vol.73, pp.1109–1116, 2010.

Wen-Liang Hung, Miin-Shen Yang and E. Stanley Lee., “A Robust Clustering Procedure for Fuzzy Data”, Computers & Mathematics with Applications. Vol 60(1), pp.151-165, 2010.

Yu, D. Sheikholeslami G. and Zhang, A., “FindOut: Finding Outliers in Very Large Datasets”. Knowledge and Information Systems, Springer-Verlag, London, Vol.4, pp.387-412, 2002.

This work is licensed under a Creative Commons Attribution 3.0 License.
