An Optimized Approach to Record Deduplication
This work is licensed under a Creative Commons Attribution 3.0 License.