
A Survey on Data Anonymization Approaches Using MapReduce Framework on Cloud

Rahul H. Jadhav, Dr. R.B. Ingle

Abstract


Many cloud users need to exchange private data over the Internet for data analysis and mining; for example, sensitive patient data may need to be shared with a health research centre for analysis. Such data demands strong privacy, which is maintained through privacy preservation techniques, and anonymization is one of the most important of these. Data anonymization has several approaches, notably top-down (specialization) and bottom-up (generalization). Both use k-anonymity as the stopping criterion: the top-down approach specializes the data until a further step would violate k-anonymity, while the bottom-up approach generalizes the data until k-anonymity is satisfied. In this paper we compare the performance of both approaches with respect to k-anonymity. For parallel processing of large-scale data on the cloud we use the MapReduce framework, which harnesses the computation power of the cloud by distributing work across many nodes. Both anonymization approaches perform efficiently under the k-anonymity requirement. Over the last several years, scalability and efficiency problems have arisen in data anonymization. The main challenges are therefore scalability and efficiency, compounded by the rapidly growing scale of data on the cloud, that is, Big Data.
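
As an illustration of the k-anonymity criterion mentioned above, the following is a minimal single-machine sketch, not the implementation surveyed in the paper. It mimics the MapReduce pattern in plain Python: a map step emits one count per quasi-identifier group and a reduce step sums the counts. The toy records, the choice of age and ZIP code as quasi-identifiers, and the names `generalize`, `map_phase`, `reduce_phase`, and `is_k_anonymous` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, diagnosis). Age and ZIP code are
# assumed to be the quasi-identifiers; diagnosis is the sensitive value.
RECORDS = [
    (34, "41101", "flu"),
    (36, "41105", "asthma"),
    (38, "41109", "flu"),
    (52, "41201", "diabetes"),
    (57, "41207", "flu"),
]

def raw_key(record):
    """Quasi-identifier values exactly as stored (no generalization)."""
    age, zip_code, _ = record
    return (age, zip_code)

def generalize(record):
    """One generalization step: age -> decade band, ZIP -> 3-digit prefix."""
    age, zip_code, _ = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", zip_code[:3] + "**")

def map_phase(records, key_fn):
    """Map: emit (quasi-identifier group, 1) for every record."""
    return [(key_fn(r), 1) for r in records]

def reduce_phase(pairs):
    """Reduce: sum the emitted counts per quasi-identifier group."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts

def is_k_anonymous(records, key_fn, k):
    """True if every quasi-identifier group contains at least k records."""
    counts = reduce_phase(map_phase(records, key_fn))
    return min(counts.values()) >= k

print(is_k_anonymous(RECORDS, raw_key, 2))     # every raw record is unique
print(is_k_anonymous(RECORDS, generalize, 2))  # groups of size 3 and 2
```

In a bottom-up run, a generalization step like `generalize` would be repeated until the check passes; in a top-down run, the data would start fully generalized and be specialized for as long as the check still passes. On a real cluster the map and reduce steps would be distributed across nodes rather than executed in one process.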


Keywords


Data Anonymization, Top-Down Approach, Bottom-Up Approach, MapReduce Framework, Cloud, Privacy Preservation.




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.