
Synonym based Document Clustering using Thesaurus

A. Rajeswari, Dr. M. Kannan

Abstract


A synonym-based document clustering approach is proposed to cluster more of the documents related to a user query. Synonyms of the query word are obtained from an online thesaurus. Document clustering is one of the concepts in data mining, and many techniques are used for it. In the existing method, words and their synonyms are stored in a database by the user. The user must store the words one by one, which takes considerable time, and sometimes not all of the words can be stored in the database. Handling also becomes complex when a word has more than one synonym. In the proposed method, synonyms are obtained from thesaurus.com (an online library). Documents matching either the user-entered keyword or its synonyms are clustered. The TF-IDF method, implemented in C#.NET, is used to rank the clustered documents, which gives more relevant and accurate results for the user query. For experimental purposes, we used a set of text files. The proposed method performs better than the existing method, and there is no need to maintain a database.
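
The pipeline described above (expand the keyword with synonyms, collect matching documents, rank them by TF-IDF) can be sketched roughly as follows. This is a minimal illustration and not the authors' C#.NET implementation: it is written in Python for brevity, the small `SYNONYMS` dictionary is a hypothetical stand-in for a thesaurus.com lookup, and the function names, the tokenizer, and the `+ 1.0` IDF smoothing term are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical stand-in for an online thesaurus lookup (assumption:
# the abstract fetches synonyms from thesaurus.com instead).
SYNONYMS = {"car": ["automobile", "vehicle"]}

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase the text and split it into words, dropping edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def expand_query(keyword, thesaurus=SYNONYMS):
    """Return the keyword followed by its synonyms from the thesaurus."""
    word = keyword.lower()
    return [word] + [s for s in thesaurus.get(word, []) if s != word]


def rank_documents(keyword, docs, thesaurus=SYNONYMS):
    """Rank documents containing the keyword or a synonym by summed TF-IDF.

    docs maps a document name to its text. Documents that contain none of
    the expanded terms are left out of the result.
    """
    terms = expand_query(keyword, thesaurus)
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for name, words in tokens.items():
        if not words:
            continue
        counts = Counter(words)
        score = 0.0
        for term in terms:
            # Document frequency: number of documents containing the term.
            df = sum(1 for other in tokens.values() if term in other)
            if df == 0:
                continue
            tf = counts[term] / len(words)
            idf = math.log(n_docs / df) + 1.0  # smoothing is an assumption
            score += tf * idf
        if score > 0:
            scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = {
        "a.txt": "The car is fast.",
        "b.txt": "An automobile and a vehicle.",
        "c.txt": "Bananas are yellow.",
    }
    for name, score in rank_documents("car", sample):
        print(name, round(score, 3))
```

In this sketch a document that mentions only synonyms of the keyword is still retrieved and ranked, which is the behaviour the abstract attributes to the proposed method; a real system would replace `SYNONYMS` with an actual thesaurus query.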


Keywords


Document Clustering, Synonym-Based Search, TF-IDF, Thesaurus.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.