
Clustering Based Outlier Detection Using K-Means Strategy

S. Vasuki, Dr. K. Subramanian

Abstract


An outlier is an observation that appears to deviate markedly from the other observations in a sample. The system described here is intended to expose what happens during the exchanges between a client and a server. In practice, users know how to submit a request for a particular need and how to receive a response to it, but few know how information is actually searched for inside a large database. Clustering is one of the best-known techniques for organizing information efficiently in a database: it groups similar objects, where similarity may be defined by data content or by other factors. Outlier detection is one of the main branches of data mining and deserves further research attention from the data mining community. Feature selection is an effective technique for text classification, and combining it with k-means produces more effective results. Words in the feature vector are grouped, and a header is formed for each group based on a similarity test. Each cluster is formed from the behavior of a text relative to other texts and from the average (mean) value. Identical words are grouped into the same cluster, which improves data maintenance; through this process, user searches are also categorized and handled in a probabilistic, analytical manner. This paper primarily compares outlier detection methods based on clustering and association rule applications, and seeks to show that the present approach is efficient at finding outliers and representing each outlier as the cluster head.
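
The general idea of clustering-based outlier detection with k-means can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature selection step, the similarity test, or how an outlier is promoted to cluster head, so none of those are modeled here. The sketch assumes numeric feature vectors, a deterministic start from the first k points, and a hypothetical rule that flags a point when its distance to its own centroid exceeds `factor` times the mean of all such distances. The names `kmeans`, `find_outliers`, and `factor` are illustrative only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    # Assumption: start from the first k points so the run is reproducible.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels


def find_outliers(points, k, factor=2.0):
    """Flag points that lie unusually far from their own centroid.

    Hypothetical rule: distance > factor * mean distance over all points.
    """
    centroids, labels = kmeans(points, k)
    dists = [euclidean(p, centroids[lab]) for p, lab in zip(points, labels)]
    threshold = factor * sum(dists) / len(dists)
    return [p for p, d in zip(points, dists) if d > threshold]


# Made-up sample: two tight groups plus one stray point.
sample = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (8.0, 8.0), (8.1, 7.9), (7.9, 8.2),
    (4.5, 12.0),
]
print(find_outliers(sample, k=2))  # prints [(4.5, 12.0)]
```

On this sample the stray point is absorbed into the second cluster but sits much farther from that centroid than any other member, so it is the only point flagged. The threshold rule is one of several reasonable choices; a per-cluster threshold or a small-cluster criterion would be equally valid variations.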

Keywords


Clustering, Data Mining, Outlier Detection.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.