
A Tool Prototype for Privacy Preserving Distributed Data Mining and Analytics

Swapnil Shrivastava, Supriya N Pal, N. Sarat Chandra Babu, Sudhanshu Raman, Milindra Pratap Singh


Big Data analytics techniques are becoming increasingly important in various scientific and commercial domains for intelligence gathering and decision making. Data privacy is one of the key challenges hindering the intended implementation of these techniques. In several government and private organizations, data stores are located at different sites, and bringing the data together in a centralized location for analysis is not possible due to privacy concerns. Hence, there is a strong need for a data mining tool that can create a privacy-preserved, comprehensive view of data stored across standalone repositories or silos. Several Privacy Preserving Distributed Data Mining algorithms and techniques are available in the literature. However, they are not readily available as tools or libraries. In this paper we present the design and implementation of a tool prototype for Privacy Preserving Data Mining and Analytics in a distributed environment. The benefit and usefulness of the tool prototype are demonstrated through a Census data case study. We strongly believe that a complete implementation of this tool would enable effective usage, efficient development, and evaluation of various Privacy Preserving Distributed Data Mining techniques, which in turn could address the Big Data privacy challenge.


Privacy Preserving Distributed Data Mining, Shamir Secret Sharing, Secure Multiparty Computation, K-Means Clustering
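
To make the keywords concrete, the following minimal sketch shows how Shamir secret sharing can let several data silos compute a joint sum (the kind of aggregate a distributed K-means step needs) without any site revealing its own value. It is an illustration under stated assumptions, not the tool prototype described in the paper: the field modulus `PRIME`, the function names, and the per-site counts in the usage note are all hypothetical, parties are assumed to follow the protocol honestly, and network communication is replaced by in-memory lists.

```python
import secrets

# Assumed field modulus: the Mersenne prime 2^61 - 1 (illustrative choice).
PRIME = 2**61 - 1


def make_shares(secret, threshold, n_parties):
    """Split `secret` into n_parties Shamir shares; any `threshold` of them reconstruct it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_parties + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def secure_sum(private_values, threshold):
    """Each site shares its value; parties add the shares they hold; only the total is rebuilt."""
    n = len(private_values)
    all_shares = [make_shares(v, threshold, n) for v in private_values]
    summed = []
    for k in range(n):
        # Party k holds the k-th share from every site and adds them locally.
        x = all_shares[0][k][0]
        y = sum(site_shares[k][1] for site_shares in all_shares) % PRIME
        summed.append((x, y))
    # Shares of a sum are the sums of shares, so any `threshold` parties can rebuild the total.
    return reconstruct(summed[:threshold])
```

As a usage example, `secure_sum([120, 340, 95], threshold=2)` returns 555 for three hypothetical per-site counts, while each party only ever sees random-looking shares of the other sites' values. The sketch relies on the additive property of Shamir shares; a real deployment would also need authenticated channels and a threat model, which are outside its scope.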







Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.