
A Brief Survey on Privacy Preserving Data Mining Techniques

N. P. Nethravathi, Vaibhav J. Desai, P. Deepa Shenoy, M. Indiramma, K.R. Venugopal


With the onset of the digital revolution, organizations increasingly maintain huge amounts of information in their databases and use data mining tools to extract useful information for business intelligence. The availability of this digital information, however, raises the risk of privacy leakage. In many business domains, leakage of clients' personal information, whether directly or through data mining tools, can cost a company its competitive edge and lead to loss of revenue and customer churn. Companies are therefore turning to encryption and other data transformation methods to keep data private, but mining tools that invoke algorithms such as clustering and classification may not work properly on the transformed data. In this paper, we analyze privacy-preserving data mining solutions and the privacy leakage that can occur in them through indirect means. The main objective of this paper is to identify open areas of research in privacy-preserving data mining.
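To make the point about transformed data concrete, here is a minimal sketch, assuming a hypothetical two-attribute toy data set. It compares two transformations: a rotation-based perturbation, which multiplies each record by a secret rotation matrix and therefore preserves Euclidean distances, and a value-by-value hash encoding that loosely stands in for encryption. The names `rotate`, `hash_encode`, `pairwise_distances` and `nearest_neighbour`, the key angle `THETA = 0.7` and the records are all illustrative assumptions, and the hash encoding is not a real encryption scheme.

```python
import hashlib
import math

# Hypothetical toy records: (age, annual spend in thousands).
RECORDS = [(25.0, 40.0), (27.0, 42.0), (60.0, 90.0), (62.0, 88.0)]


def rotate(points, theta):
    """Rotation-based perturbation: apply a 2x2 rotation by angle theta
    (the data owner's secret) to every record. Rotations are orthogonal,
    so all pairwise Euclidean distances are preserved."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def hash_encode(points):
    """Illustrative stand-in for value-wise encryption (NOT real encryption):
    each value is replaced by a number in [0, 100) taken from its SHA-256 digest."""
    def enc(v):
        digest = hashlib.sha256(repr(v).encode()).digest()
        return int.from_bytes(digest[:8], "big") % 10_000 / 100.0
    return [(enc(x), enc(y)) for x, y in points]


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of records."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


def nearest_neighbour(points, i):
    """Index of the record closest to record i, the basic step behind kNN and k-means."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


THETA = 0.7  # hypothetical secret key
original = pairwise_distances(RECORDS)
rotated = pairwise_distances(rotate(RECORDS, THETA))
hashed = pairwise_distances(hash_encode(RECORDS))

print(all(math.isclose(a, b) for a, b in zip(original, rotated)))  # True
# Hash-encoded distances bear no relation to the original distances.
print(all(math.isclose(a, b) for a, b in zip(original, hashed)))
print([nearest_neighbour(RECORDS, i) for i in range(4)])                 # [1, 0, 3, 2]
print([nearest_neighbour(rotate(RECORDS, THETA), i) for i in range(4)])  # [1, 0, 3, 2]
```

Distance-based mining such as nearest-neighbour classification or k-means gives the same answer on the rotated records as on the originals, whereas the hash-encoded records lose all distance structure. The same distance preservation can also leak information, for example to an attacker who already knows a few original records, which is the kind of indirect leakage the survey examines.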


Keywords: Transformation Strategy, Privacy Preserving Data Mining, Cryptography, Wavelet Transformation, Correlation Analysis.



Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.