### Deriving Private Information from Randomized Dataset using Data Reorganization Techniques

#### Abstract

Publishing data about individuals without revealing sensitive information about them is an important problem. The objective is to enforce privacy-preserving paradigms, such as k-anonymity and l-diversity, while minimizing the information loss incurred in the anonymization process (i.e., maximizing data utility). Existing methods work well for fixed-schema data with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transaction data (or basket data), which involves hundreds or even thousands of dimensions, rendering existing methods unusable. This paper presents two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH). The second category applies a data transformation that captures the correlation in the underlying data: Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss through an efficient linear-time heuristic.
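
The abstract does not spell out how Gray encoding-based sorting is carried out, so the following is only a minimal sketch under stated assumptions. It assumes each transaction is a binary item vector, orders transactions by their position in the reflected Gray code sequence, and then cuts the ordered list into groups of at least `k` rows in a single pass. The function names (`gray_rank`, `gray_sort`, `group_consecutive`) and the fixed-size chunking are hypothetical illustrations, not the paper's actual grouping heuristic.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def gray_rank(bits: Sequence[int]) -> int:
    """Position of a bit vector in the reflected Gray code sequence.

    Bits are read most significant first. Each binary bit is the XOR of
    the previous binary bit and the current Gray bit.
    """
    rank = 0
    prev = 0
    for g in bits:
        prev ^= g
        rank = (rank << 1) | prev
    return rank


def gray_sort(rows: Sequence[Row]) -> List[Row]:
    """Order binary transactions by Gray rank so neighbours tend to be similar."""
    return sorted(rows, key=gray_rank)


def group_consecutive(rows: Sequence[Row], k: int) -> List[List[Row]]:
    """Assumed simplification: chunk the ordered rows into groups of size k.

    A short final chunk is merged into the previous group so that every
    group holds at least k rows whenever len(rows) >= k.
    """
    groups = [list(rows[i:i + k]) for i in range(0, len(rows), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups


# Toy data: five transactions over four items (1 = item purchased).
transactions = [
    (1, 0, 0, 1),
    (0, 1, 1, 0),
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 1, 1, 1),
]
ordered = gray_sort(transactions)
groups = group_consecutive(ordered, k=2)
```

Sorting dominates the cost in this sketch; the grouping step itself is one linear pass over the ordered rows, which is the property the abstract attributes to its heuristic.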

This work is licensed under a Creative Commons Attribution 3.0 License.