Document Clustering Using Firefly Algorithm
Abstract
Document clustering is an important technique that has been widely employed in Information Retrieval (IR). Various clustering techniques have been reported, but the effectiveness of most of them depends on the number of clusters, k, being specified in advance. Such an approach may not be suitable, as prior knowledge of the document collection is often unavailable. Various swarm-based clustering techniques have been proposed to address this problem, and this paper explores the adaptation of the Firefly Algorithm (FA) to document clustering. We extend the work on the Gravitation Firefly Algorithm (GFA) by introducing a relocate mechanism that moves already-assigned documents to a different cluster when necessary. The proposed clustering algorithm, known as GFA_R, is tested on a benchmark dataset drawn from the 20Newsgroups collection. External and relative quality metrics obtained with GFA_R are compared against those obtained with the standard GFA. The results show that extending GFA to GFA_R yields better-quality clustering.
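The abstract does not specify how GFA or its relocate step works, so the following is only a minimal, hypothetical sketch of the general idea, not the authors' GFA_R. It assumes that documents are plain term-frequency vectors compared by cosine similarity; that a document's "brightness" is its summed similarity to the rest of the collection; that the standard firefly attractiveness beta(r) = beta0 * exp(-gamma * r^2), with r = 1 - cosine similarity, decides which documents a seed attracts; and that relocation means moving a document to the cluster centroid it is most similar to. Because clusters are seeded by brightness and grown by an attractiveness threshold, k emerges from the data rather than being fixed beforehand. The names `firefly_style_cluster`, `threshold`, `beta0`, `gamma` and `max_iter` are illustrative only.

```python
import math
from collections import Counter


def tf_vectors(docs):
    # Hypothetical preprocessing: lowercase and split on whitespace into term counts.
    return [Counter(doc.lower().split()) for doc in docs]


def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attractiveness(sim, beta0=1.0, gamma=1.0):
    # Standard firefly attractiveness beta0 * exp(-gamma * r^2);
    # using r = 1 - cosine similarity as the distance is an assumption.
    r = 1.0 - sim
    return beta0 * math.exp(-gamma * r * r)


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds the term counts
    return {t: w / len(vectors) for t, w in total.items()}


def firefly_style_cluster(docs, threshold=0.5, beta0=1.0, gamma=1.0, max_iter=10):
    """Hypothetical firefly-inspired clustering with a relocation pass (not GFA_R itself)."""
    vecs = tf_vectors(docs)
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    # Brightness of each firefly (document): summed similarity to all other documents.
    brightness = [sum(sim[i]) - sim[i][i] for i in range(n)]
    labels = [-1] * n
    k = 0
    # Phase 1: the brightest unassigned document seeds a new cluster and attracts
    # every unassigned document whose attractiveness reaches the threshold.
    for seed in sorted(range(n), key=lambda i: -brightness[i]):
        if labels[seed] != -1:
            continue
        labels[seed] = k
        for j in range(n):
            if labels[j] == -1 and attractiveness(sim[seed][j], beta0, gamma) >= threshold:
                labels[j] = k
        k += 1
    # Phase 2 (hypothetical relocation): move a document to a cluster whose centroid
    # is strictly more similar than its current one; stop when nothing moves.
    for _ in range(max_iter):
        members = {}
        for i, c in enumerate(labels):
            members.setdefault(c, []).append(vecs[i])
        cents = {c: centroid(vs) for c, vs in members.items()}
        moved = False
        for i in range(n):
            best = max(cents, key=lambda c: cosine(vecs[i], cents[c]))
            if cosine(vecs[i], cents[best]) > cosine(vecs[i], cents[labels[i]]):
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels


if __name__ == "__main__":
    corpus = [
        "apple banana fruit",
        "apple banana fruit juice",
        "car engine wheel",
        "car engine wheel road",
    ]
    print(firefly_style_cluster(corpus))
```

On this toy corpus the two fruit documents receive one label and the two car documents another, without k being given. Cluster labels may be non-contiguous if relocation empties a cluster; that is a simplification of this sketch.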