
An Estimation of Privacy in Incremental Data Mining

V. Rajalakshmi, G.S. Anandha Mala, R. Balasubramanian


Data are values of qualitative or quantitative variables belonging to a set of items. In recent years, advances in hardware technology have increased the capability to store and record personal data about consumers and individuals, which has raised concerns that such data may be misused for a variety of purposes. Data may describe a business transaction, a medical record, bank details, educational details, and so on. The use of technology for data collection and analysis has grown at an unprecedented rate over the last couple of decades. Much of this information contains private details that the owner does not want to disclose, yet such data are the sources for data mining, which yields "facts" that are not obvious to human analysts of the data. When sensitive data are given directly to a mining process, individual privacy is seriously compromised, so the data are modified before being presented for mining. The difficulty is that the altered data must still produce mining results similar to those obtained from the original data. This problem has given rise to the area of privacy preservation in data mining, which lies at the intersection of data mining and information security. A known drawback in this area is that the additional processing required to enforce privacy degrades the performance of the data mining algorithm and can lead to incorrect mining results. Motivated by this trade-off, this paper examines data metrics that determine the quality of the following existing privacy-preserving algorithms, which define various methods for implementing privacy in incremental data: Correlation-aware Anonymization of High-dimensional Data (CAHD) [1], Privacy-Preserving Outlier Detection Through Random Nonlinear Data Distortion (PRND) [2], Privacy-Preserving Data Aggregation (PPDA) [3], and Privacy-Preserving Incremental Data sets (PRID) [4]. The major metrics considered for evaluation are data utility, privacy, and computational time, and the detailed performance of each algorithm is discussed.
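The central idea of the abstract, modifying data before mining while keeping the mining result close to the original and then scoring the outcome on utility, privacy, and computational time, can be illustrated with a small sketch. This is not any of the evaluated algorithms (CAHD, PRND, PPDA, or PRID). It assumes simple additive Gaussian perturbation in the spirit of the randomization approach of Agrawal and Srikant, and the metric definitions (relative error of the mean for utility, standard deviation of the per-record distortion for privacy, wall-clock time for cost) as well as all function names are hypothetical choices made for illustration.

```python
import random
import statistics
import time


# Hypothetical helper: additive zero-mean Gaussian perturbation (not the paper's method).
def perturb_additive(values, noise_std, seed=0):
    """Return a copy of `values` with independent zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_std) for v in values]


# Hypothetical utility metric: how far an aggregate (the mean) drifts after perturbation.
def utility_loss(original, perturbed):
    """Relative error of the mean; 0.0 means the aggregate is unchanged.
    Assumes the original mean is nonzero."""
    m0 = statistics.mean(original)
    m1 = statistics.mean(perturbed)
    return abs(m1 - m0) / abs(m0)


# Hypothetical privacy metric: spread of the per-record distortion.
def privacy_level(original, perturbed):
    """Population std. dev. of (perturbed - original); larger means
    individual values are harder to recover."""
    diffs = [p - o for o, p in zip(original, perturbed)]
    return statistics.pstdev(diffs)


def evaluate(values, noise_std, seed=0):
    """Score one perturbation setting on utility, privacy, and computational time."""
    start = time.perf_counter()
    perturbed = perturb_additive(values, noise_std, seed)
    elapsed = time.perf_counter() - start
    return {
        "utility_loss": utility_loss(values, perturbed),
        "privacy": privacy_level(values, perturbed),
        "seconds": elapsed,
    }


if __name__ == "__main__":
    # Synthetic records (e.g. salaries); made-up data for illustration only.
    data = [float(x) for x in range(1000, 2000, 10)]
    for sigma in (0.0, 10.0, 100.0):
        print(sigma, evaluate(data, sigma))
```

For a fixed seed the noise scales linearly with `noise_std`, so raising it increases the privacy score while the mean-based utility loss grows as well, which is the trade-off the abstract describes. A real evaluation would replace the mean with the output of the actual mining task.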



Keywords: Data Mining, Privacy Preservation, Perturbation, Quality Metrics, Anonymization




References

[1] Gabriel Ghinita, Panos Kalnis, and Yufei Tao, "Anonymous Publication of Sensitive Transactional Data," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 2, Feb. 2011.

[2] Kanishka Bhaduri, Mark D. Stefanski, and Ashok N. Srivastava, "Privacy-Preserving Outlier Detection Through Random Nonlinear Data Distortion," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 1, Feb. 2011.

[3] Arijit Ukil, "Privacy Preserving Data Aggregation in Wireless Sensor Networks," 2010 Sixth International Conference on Wireless and Mobile Communications.

[4] Yingjie Wu, Zhihui Sun, and Xiaodong Wang, "Privacy Preserving k-Anonymity for Re-publication of Incremental Datasets," 2009 World Congress on Computer Science and Information Engineering.

[5] J. Gitanjali, J. Indumathi, and N. Ch. Sriman Narayana Iyengar, "A Pristine Clean Cabalistic Foruity Strategize Based Approach for Incremental Data Stream Privacy Preserving Data Mining," 2010 IEEE 2nd International Advance Computing Conference, pp. 410–415.

[6] Z. Huang, W. Du, and B. Chen, "Deriving Private Information from Randomized Data," Proc. ACM SIGMOD, pp. 37–48, 2005.

[7] A. Narayanan and V. Shmatikov, "Robust de-anonymization of large sparse datasets," Proc. IEEE SSP, pp. 111–125, 2008.

[8] K. Liu, H. Kargupta, and J. Ryan, "Random projection-based multiplicative data perturbation for privacy preserving distributed data mining," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 92–106, Jan. 2006.

[9] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, Jul. 2009.

[10] P. S. Bradley, U. M. Fayyad, and C. Reina, "Scaling clustering algorithms to large databases," Proc. KDD, pp. 9–15, 1998.

[11] Fatih Altiparmak and Hakan Ferhatosmanoglu, "Incremental Maintenance of Online Summaries over Multiple Streams," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, Feb. 2008.

[12] Bi-Ru Dai and Li-Hsiang Chiang, "Hiding Frequent Patterns in the Updated Database," IEEE, 2010.

[13] Jia Yubo, Duan Yuntao, and Wang Yongli, "An Incremental Updating Algorithm for Online Mining Association Rules," 2009 International Conference on Web Information Systems and Mining.

[14] Huidong Jin and K.-S. Leung, "Scalable Model-Based Clustering for Large Databases Based on Data Summarization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 11, Nov. 2005.

[15] Shuguo Han, Wee Keong Ng, and Li Wan, "Privacy-Preserving Gradient Descent Methods," IEEE Transactions on Knowledge and Data Engineering, Mar. 2010.

[16] Xindong Wu, Gong-Qing Wu, Fei Xie, Zhu Zhu, Xue-Gang Hu, Hao Lu, and Huiqian Li, "News Filtering and Summarization on the Web," IEEE Intelligent Systems, vol. 25, no. 5, Sept.–Oct. 2010.

[17] R. Agrawal and R. Srikant, "Privacy Preserving Data Mining," Proc. ACM SIGMOD, pp. 439–450, 2000.



Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.