
Challenges for De-Identification Policies using MapReduce in Big Data

N. M. Abroja, P. Bhuvana, R. Sathish Kumar

Abstract


Many data owners are obliged to release data in a variety of real-world applications, since it is important to discover the valuable information hidden behind that data. The motivation for this work is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies the issues and challenges MapReduce faces in handling Big Data, with the aim of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research. Big Data analytics (machine learning and interactive analytics) involves online processing, security, and privacy. In addition, current efforts aimed at refining and extending MapReduce to address the identified challenges are presented. Consequently, by identifying the issues and challenges MapReduce faces when handling Big Data, this study informs future Big Data research.
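To illustrate the kind of massively parallel de-identification pass the abstract refers to, the following is a minimal sketch of a MapReduce-style job written in Python: the map step generalizes quasi-identifiers and the reduce step counts equivalence-class sizes so that classes smaller than k can be flagged. The record fields (age, zip, diagnosis), the generalization rules, the function names, and the single-process simulation of the shuffle phase are all illustrative assumptions, not the implementation described in this paper.

from collections import defaultdict

def map_record(record):
    # Map phase: generalize quasi-identifiers and emit (equivalence class, 1).
    decade = (record["age"] // 10) * 10
    age_bucket = f"{decade}-{decade + 9}"        # e.g. 34 -> "30-39"
    zip_prefix = record["zip"][:3] + "**"        # truncate the ZIP code
    yield ((age_bucket, zip_prefix), 1)

def reduce_class(key, counts):
    # Reduce phase: total the records that fall into one equivalence class.
    return key, sum(counts)

def find_violations(records, k=2):
    # Simulate the shuffle step by grouping mapper output on its key.
    groups = defaultdict(list)
    for rec in records:
        for key, one in map_record(rec):
            groups[key].append(one)
    class_sizes = dict(reduce_class(key, counts) for key, counts in groups.items())
    # Equivalence classes smaller than k would violate k-anonymity under this policy.
    return {cls: n for cls, n in class_sizes.items() if n < k}

if __name__ == "__main__":
    sample = [
        {"age": 34, "zip": "37215", "diagnosis": "flu"},
        {"age": 36, "zip": "37214", "diagnosis": "cold"},
        {"age": 52, "zip": "90210", "diagnosis": "asthma"},
    ]
    print(find_violations(sample, k=2))

In a real deployment the grouping would be performed by the framework's shuffle phase across many nodes; the sketch only shows how a generalization policy and a k-anonymity check split naturally into map and reduce steps.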


Keywords


Big Data, De-Identification Policies, MapReduce Overview, Policy Generation.






This work is licensed under a Creative Commons Attribution 3.0 License.