Perturbation Based Technique for Privacy Preserving Clustering of High Dimensional Data

R. VidyaBanu; N. Nagaveni

Abstract

Privacy of personal data is a fundamental human right. The free and transparent flow of data enabled by rapid advances in data processing techniques and internet technology has heightened concerns about privacy, and reluctance to provide personal information could impede the success of data mining. Data privacy is becoming an important concern in the business, academic, defense and health care domains. Privacy-preserving data mining (PPDM) addresses these issues by striking a balance between privacy preservation and knowledge discovery. We propose a novel transformation technique based on linear component analysis for privacy-preserving clustering, which protects the privacy of confidential data. We evaluate the performance of this technique with the classical k-means clustering algorithm, and demonstrate its effectiveness through experiments on synthetic data sets of varying dimensions. Clustering accuracy is measured with the Adjusted Rand Index before and after the privacy-preserving transformation. Based on our results, we conclude that our method is an effective and feasible technique for building data mining models from perturbed data.
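The abstract describes an evaluation protocol (cluster the original data with k-means, cluster the transformed data, compare the two results with the Adjusted Rand Index) but does not say how the linear-component-analysis transformation matrix is built. The sketch below illustrates only that protocol, under loudly stated assumptions: a random orthogonal matrix is a hypothetical stand-in for the paper's transformation, the data are synthetic Gaussian blobs, and k-means uses a deterministic farthest-first seeding. All function names (make_blobs, random_orthogonal, transform, kmeans, adjusted_rand_index, run_experiment) are illustrative, not the authors' code.

```python
import math
import random
from collections import Counter


def make_blobs(n_per_cluster, centers, spread, rng):
    """Synthetic data set: Gaussian blobs around `centers`, plus true labels."""
    points, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append([c + rng.gauss(0.0, spread) for c in center])
            labels.append(label)
    return points, labels


def random_orthogonal(dim, rng):
    """Hypothetical stand-in transformation matrix: orthonormal rows obtained by
    Gram-Schmidt on Gaussian random vectors (NOT the paper's construction)."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove components along rows already accepted
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip (near) linearly dependent draws
            rows.append([a / norm for a in v])
    return rows


def transform(points, matrix):
    """Perturb every record x into M x before it is released for mining."""
    return [[sum(m * x for m, x in zip(row, p)) for row in matrix] for p in points]


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def comb2(n):
    return n * (n - 1) // 2


def adjusted_rand_index(a, b):
    """ARI between two labelings of the same records (1.0 = identical partitions)."""
    sum_ij = sum(comb2(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb2(c) for c in Counter(a).values())
    sum_b = sum(comb2(c) for c in Counter(b).values())
    expected = sum_a * sum_b / comb2(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def run_experiment(seed=0, dim=5):
    """Return (ARI of k-means vs. truth, ARI of k-means before vs. after perturbation)."""
    rng = random.Random(seed)
    centers = [[0.0] * dim, [8.0] * dim, [8.0 * (i % 2) for i in range(dim)]]
    data, truth = make_blobs(50, centers, 1.0, rng)
    perturbed = transform(data, random_orthogonal(dim, rng))
    before = kmeans(data, 3)
    after = kmeans(perturbed, 3)
    return adjusted_rand_index(truth, before), adjusted_rand_index(before, after)


if __name__ == "__main__":
    print(run_experiment())
```

Calling run_experiment(dim=...) with different values loosely mimics experiments on synthetic data of varying dimensions. Because an orthogonal matrix preserves Euclidean distances, k-means sees the same geometry before and after this stand-in transformation, so the second score should be at or very near 1; a transformation that does not preserve distances carries no such guarantee.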

Keywords

Adjusted Rand Index, K-Means, Linear Components Analysis, Transformation Matrix.
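For reference, the Adjusted Rand Index listed in the keywords is, in its standard Hubert and Arabie form, computed from the contingency table of two partitions of the same n records. Here n_{ij} is the number of records placed in cluster i by the first partition and cluster j by the second, and a_i and b_j are the row and column sums of that table:

```latex
\mathrm{ARI} =
\frac{\sum_{ij}\binom{n_{ij}}{2}
      - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}
     {\tfrac{1}{2}\left[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\right]
      - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}
```

It equals 1 when the two partitions are identical up to relabelling and has expected value 0 for random labelings, so a value near 1 when comparing clusterings of the original and perturbed data indicates that the transformation preserved the cluster structure.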

This work is licensed under a Creative Commons Attribution 3.0 License.