An Efficient Privacy Preserving Classification Tree Technique in K-Anonymity for Secure Data Mining and Data Publishing

P. Deivanai; J. Jesu Vedha Nayahi; Dr. V. Kavitha

Abstract

In recent data mining applications, an effective way to preserve privacy is to anonymize a dataset that includes private information before it is released for mining. To anonymize the dataset, its content is manipulated so that the records adhere to k-anonymity. Two common manipulation techniques used to achieve k-anonymity are generalization and suppression. However, generalization has a major drawback: it requires a manually generated domain hierarchy taxonomy for every quasi-identifier in the dataset on which k-anonymity is to be enforced. In this paper, a new method for achieving k-anonymity based on suppression is proposed. The method performs efficient multi-dimensional suppression, i.e., values are suppressed only in certain records, depending on other attribute values, without the need for manually produced domain hierarchy trees. It identifies attributes that have less influence on the classification of the data records and suppresses them where needed to comply with k-anonymity. The method was evaluated on several datasets to compare its accuracy with that of other k-anonymity-based methods. Additionally, a revised version of the kACTUS algorithm, called ‘CombS’, can be used.
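To make the idea of suppressing values only in certain records concrete, the following is a minimal Python sketch, not the algorithm proposed in the paper. It assumes the quasi-identifiers have already been ranked from least to most influential on classification; the `suppression_order` argument, the function names, the `*` marker, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter

SUPPRESSED = "*"  # hypothetical marker for a suppressed value


def group_sizes(records, quasi_ids):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in quasi_ids) for r in records)


def is_k_anonymous(records, quasi_ids, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    return all(n >= k for n in group_sizes(records, quasi_ids).values())


def suppress_locally(records, quasi_ids, k, suppression_order):
    """Suppress attributes one at a time, only in records whose group is too small.

    suppression_order is ASSUMED to list quasi-identifiers from least to most
    influential on classification; how that ranking is obtained is outside
    this sketch. The input records are not modified.
    """
    out = [dict(r) for r in records]
    for attr in suppression_order:
        sizes = group_sizes(out, quasi_ids)
        for r in out:
            if sizes[tuple(r[a] for a in quasi_ids)] < k:
                r[attr] = SUPPRESSED
    return out


# Toy data (invented for illustration only).
records = [
    {"age": "30", "zip": "111", "class": "yes"},
    {"age": "30", "zip": "111", "class": "no"},
    {"age": "30", "zip": "112", "class": "yes"},
    {"age": "30", "zip": "113", "class": "no"},
]
quasi_ids = ["age", "zip"]

anonymized = suppress_locally(records, quasi_ids, k=2, suppression_order=["zip", "age"])
```

In this toy run only the last two records lose their `zip` value, while the first two are left untouched because they already form a group of size 2. The sketch does not guarantee k-anonymity in general (the suppressed records may still form a group smaller than k), so a caller would check the result with `is_k_anonymous` and handle any remaining records separately.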

Keywords

Privacy Preserving Data Mining, k-Anonymity, Decision Trees, Classification

This work is licensed under a Creative Commons Attribution 3.0 License.
