
Disambiguating the Appearances of People by an Automatic Discovery Model of Personal Name

D. Deepika, P. Betty

Abstract


Searching for information is one of the most common activities of internet users. Retrieving information about a person from a web search engine becomes difficult when that person is known by nicknames, aliases, or several different names. Knowing the exact names that refer to a person is therefore important for information retrieval, sentiment analysis, personal name disambiguation, and relation extraction. An approach based on automatically extracted lexical patterns is used to efficiently extract a large set of candidate aliases from snippets retrieved from a web search engine. The proposed method comprises two main components: pattern extraction, and pseudonym extraction and ranking. First, using a seed list of name-alias pairs, the method extracts lexical patterns that are frequently used on the web to convey alias information. The extracted patterns are then used to find candidate pseudonyms or aliases for a given name. Finally, various ranking scores, defined using the hyperlink structure of the web and page counts retrieved from a search engine, identify the correct aliases among the extracted candidates. Reducing the generation of infrequent candidates speeds up the mining process considerably, making the algorithm more efficient.
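The two components described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample snippets, the regular expressions, and the `min_count` threshold are all assumptions made for illustration, and ranking by extraction frequency is a simple stand-in for the hyperlink-based and page-count-based scores the abstract mentions.

```python
import re
from collections import Counter

# Hypothetical seed name-alias pairs and search-engine snippets (illustrative only).
SEEDS = [("William Gates", "Bill Gates"), ("Robert Zimmerman", "Bob Dylan")]
SNIPPETS = [
    "William Gates also known as Bill Gates founded the company.",
    "Robert Zimmerman also known as Bob Dylan wrote many songs.",
    "Hideki Matsui also known as Godzilla plays baseball.",
    "Hideki Matsui also known as Godzilla signed a contract.",
    "Hideki Matsui also known as Matsui Hideki in Japan.",
]


def extract_patterns(seed_pairs, snippets, min_count=2):
    """Collect the text found between a seed name and its alias.

    Patterns seen fewer than min_count times are dropped, which keeps
    infrequent patterns from generating candidates later on.
    """
    counts = Counter()
    for name, alias in seed_pairs:
        # Lazily capture up to 40 characters between the name and the alias.
        regex = re.compile(
            re.escape(name) + r"\s+(.{1,40}?)\s+" + re.escape(alias)
        )
        for snippet in snippets:
            for match in regex.finditer(snippet):
                counts[match.group(1)] += 1
    return [p for p, c in counts.most_common() if c >= min_count]


def extract_candidates(name, patterns, snippets):
    """Apply each pattern after the given name and count what follows.

    A candidate is assumed to be one to three capitalized tokens.
    Returns (candidate, frequency) pairs, most frequent first.
    """
    token = r"[A-Z][\w'-]*"
    counts = Counter()
    for pattern in patterns:
        regex = re.compile(
            re.escape(name) + r"\s+" + re.escape(pattern)
            + r"\s+(" + token + r"(?:\s" + token + r"){0,2})"
        )
        for snippet in snippets:
            for match in regex.finditer(snippet):
                counts[match.group(1)] += 1
    return counts.most_common()


if __name__ == "__main__":
    patterns = extract_patterns(SEEDS, SNIPPETS)
    print(patterns)
    print(extract_candidates("Hideki Matsui", patterns, SNIPPETS))
```

In this sketch the frequency threshold plays the role of the candidate reduction mentioned in the abstract: patterns that occur only once never reach the candidate-extraction step.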


Keywords


Information Retrieval, Web Mining, Information Mining, Text Analysis.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.