Detecting and Removing Duplicate Records from Multiple Web Databases
This work is licensed under a Creative Commons Attribution 3.0 License.