A Semantic Search Engine using Semantic Similarity Measure Between Words

M. Karthiga; S. Sankarananth

doi:10.36039/AA042013008

Abstract

Measuring semantic similarity between words is useful in information retrieval and in many other applications. In the proposed work, a semantic similarity measure is used to build a model Semantic Search Engine. The engine relies on two resources, a Technical Database covering computer technology and a Semantic Similarity database, to retrieve the Web page for a query word. When a query word is entered in the user interface, the search engine first looks for the word in the Technical Database; if the word is present, the corresponding Web page is displayed. If the word is not present, the query word is looked up in the Semantic Similarity database, and any words similar to the query word are displayed to the user as recommendations. The user selects one of the recommended words, and the result page is retrieved accordingly. The semantic similarity measure between words is evaluated using both the Pearson correlation coefficient and the Spearman correlation coefficient. The time taken to retrieve the relevant Web page with the semantic search engine is compared with that of a normal search engine. Precision and recall are calculated for the semantic search engine, and the results are compared with those of a normal search engine.
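
The two-stage lookup described in the abstract, together with the precision and recall computation, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the dictionary contents, the URLs, the similarity scores, and the function names `search`, `resolve`, and `precision_recall` are all hypothetical, and the ranking of recommendations by score is an assumption the abstract does not state.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for the Technical Database: word -> Web page.
TECHNICAL_DB: Dict[str, str] = {
    "compiler": "https://example.org/pages/compiler",
    "interpreter": "https://example.org/pages/interpreter",
    "router": "https://example.org/pages/router",
}

# Hypothetical stand-in for the Semantic Similarity database:
# query word -> list of (similar word, similarity score). Scores are invented.
SIMILARITY_DB: Dict[str, List[Tuple[str, float]]] = {
    "translator": [("interpreter", 0.79), ("compiler", 0.82)],
    "gateway": [("router", 0.74)],
}


def search(query: str) -> Tuple[Optional[str], List[str]]:
    """Return (page, recommendations) for a query word.

    Stage 1: exact lookup in the Technical Database.
    Stage 2: if absent, return similar words as recommendations,
    ordered by descending similarity (an assumed ordering).
    """
    word = query.strip().lower()
    if word in TECHNICAL_DB:
        return TECHNICAL_DB[word], []
    candidates = sorted(
        SIMILARITY_DB.get(word, []), key=lambda pair: pair[1], reverse=True
    )
    return None, [similar for similar, _score in candidates]


def resolve(selected: str) -> Optional[str]:
    """Retrieve the page for the recommendation the user selected."""
    return TECHNICAL_DB.get(selected.strip().lower())


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Standard set-based precision and recall for one query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    page, recommendations = search("translator")
    print(page, recommendations)
    if recommendations:
        print(resolve(recommendations[0]))
```

In this sketch a direct hit returns the page with an empty recommendation list, while a miss returns no page and leaves the choice among the recommended words to the user, mirroring the selection step in the abstract.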

Keywords

Information retrieval, Precision, Recall, Search engine, user generated content

This work is licensed under a Creative Commons Attribution 3.0 License.