Analysis of Anchor Text based on Pattern Growth Graph Algorithm for Name Alias Detection System
Abstract
Identifying the correct alias for a person's name plays a crucial role in information retrieval, sentiment analysis, and person name disambiguation, as well as in biomedical fields. Traditional systems address lexical ambiguity but lag on the problem of referential ambiguity. In this paper we focus on referential ambiguity in order to extract the correct alias for a given name. Given a person's name, optionally with context data such as location or organization, the system retrieves the top-K snippets from a web search engine and uses lexical patterns to extract candidate aliases. To find the correct alias among the candidates, we apply anchor text analysis based on links, forming a graph whose links are classified as in-links and out-links. The anchor text analysis uses a co-training algorithm for preprocessing and then prepares a set of anchor text words. To rank the nodes of the graph, we integrate several similarity measures for word relations, such as the Dice and Jaccard coefficients, along with degree distribution and clustering coefficient. Our method thereby gives more promising results than the previous baseline method in terms of precision and recall.
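The ranking step can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy data, and the unweighted average used to combine the measures are all assumptions made for illustration. The sketch assumes each name or candidate alias is associated with the set of URLs whose anchor texts mention it, and that an undirected word co-occurrence graph is available as an adjacency map.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def clustering_coefficient(graph: dict, node: str) -> float:
    """Local clustering coefficient of a node in an undirected graph."""
    neighbors = graph.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count neighbor pairs that are themselves connected.
    links = sum(
        1
        for u, v in combinations(neighbors, 2)
        if v in graph.get(u, set()) or u in graph.get(v, set())
    )
    return 2 * links / (k * (k - 1))

def rank_candidates(name, candidates, anchor_urls, graph):
    """Rank candidates by an unweighted average of the three scores
    (the averaging scheme is an assumption, not taken from the paper)."""
    scores = {}
    target = anchor_urls.get(name, set())
    for cand in candidates:
        urls = anchor_urls.get(cand, set())
        scores[cand] = (
            jaccard(target, urls)
            + dice(target, urls)
            + clustering_coefficient(graph, cand)
        ) / 3
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: URLs whose anchor texts contain each string.
anchor_urls = {
    "Jane Doe": {"u1", "u2", "u3", "u4"},
    "JD": {"u2", "u3", "u4"},
    "Acme": {"u4", "u5", "u6"},
}
# Hypothetical undirected co-occurrence graph (adjacency sets).
graph = {
    "Jane Doe": {"JD", "Acme", "engineer"},
    "JD": {"Jane Doe", "engineer"},
    "Acme": {"Jane Doe"},
    "engineer": {"Jane Doe", "JD"},
}

ranking = rank_candidates("Jane Doe", ["JD", "Acme"], anchor_urls, graph)
print(ranking)
```

In this toy setup the candidate that shares more linking URLs with the name and sits in a tighter neighborhood of the graph is ranked first. A real system would also need the degree distribution term and a principled way to weight the measures, both of which are left out here.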
This work is licensed under a Creative Commons Attribution 3.0 License.