

Preserving Privacy by Quantizing
Abstract
Advances in data mining have made it possible to extract sensitive information from published data. The web, by providing a platform for data publishing, together with the growth of automated software technologies, has aggravated the problem of personal privacy. Sensitive data must therefore be anonymized before being published on the web. A number of anonymization methods have been proposed, including data partitioning, data swapping, generalization, suppression, randomization, perturbation, and secure multiparty computation. This paper discusses a perturbation method: the domain values of the private table are grouped using clustering algorithms, and each value is then represented by its cluster head to anonymize the table. This replacement decreases the utility of the published data, so care must be taken to maintain a balance between utility and privacy during anonymization. F-measure and distortion are the metrics deployed to measure the utility of the perturbed data.

This work is licensed under a Creative Commons Attribution 3.0 License.