Deriving Private Information from Randomized Dataset using Data Reorganization Techniques
Abstract
Publishing data about individuals without revealing sensitive information about them is an important problem. Existing techniques enforce privacy-preserving paradigms, such as k-anonymity and l-diversity, while minimizing the information loss incurred in the anonymization process (i.e., maximizing data utility). These techniques work well for fixed-schema data with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transaction data (or basket data), which involves hundreds or even thousands of dimensions, rendering existing methods unusable. This paper presents two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is performed efficiently through locality-sensitive hashing (LSH). The second category applies a data transformation that captures the correlation in the underlying data: Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss through an efficient linear-time heuristic.
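To make the Gray encoding-based sorting idea concrete, the following is a minimal sketch, not the method as implemented in the paper. It assumes each transaction is a tuple of 0/1 item indicators, orders transactions by their position in the reflected binary Gray code sequence, and then cuts the ordered list into consecutive groups of at least k rows. The function names (`gray_rank`, `gray_sort`, `chunk_groups`) are hypothetical, and the fixed-size chunking is only a stand-in for the linear-time grouping heuristic, which the abstract does not describe in detail. It also does not check l-diversity.

```python
from typing import List, Sequence, Tuple

Transaction = Tuple[int, ...]  # assumed representation: one 0/1 flag per item


def gray_rank(bits: Sequence[int]) -> int:
    """Position of a bit vector in the reflected binary Gray code sequence.

    Decoding uses the prefix XOR of the bits, most significant bit first.
    """
    rank = 0
    acc = 0
    for b in bits:
        acc ^= b                  # running XOR of all bits seen so far
        rank = (rank << 1) | acc  # append the decoded binary digit
    return rank


def gray_sort(transactions: Sequence[Transaction]) -> List[Transaction]:
    """Order transactions by Gray rank so that neighbours tend to share items."""
    return sorted(transactions, key=gray_rank)


def chunk_groups(ordered: Sequence[Transaction], k: int) -> List[List[Transaction]]:
    """Naive grouping (an assumption, not the paper's heuristic):
    consecutive chunks of size k, with a short tail merged into the previous group."""
    groups = [list(ordered[i:i + k]) for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


# Hypothetical toy data: five transactions over four items.
example = [
    (1, 0, 0, 1),
    (0, 1, 1, 0),
    (1, 0, 0, 0),
    (0, 1, 1, 1),
    (1, 1, 0, 0),
]
example_groups = chunk_groups(gray_sort(example), k=2)
```

In this toy input, the two transactions that differ only in the last item, (0, 1, 1, 0) and (0, 1, 1, 1), end up adjacent after sorting and fall into the same group, which is the property that keeps information loss low when group members are generalized together.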
This work is licensed under a Creative Commons Attribution 3.0 License.