
Handling Missing Information for Approximate Association Rule Mining

Dinesh J. Prajapati, Jagruti H. Prajapati

Abstract


Data mining typically searches the concatenation of multiple databases, which usually contain some missing data along with a variable percentage of inaccurate data, pollution, outliers and noise. Data warehouses usually have some missing values due to unavailable data, and these affect both the number and the quality of the generated rules. Missing values create a problem when extracting useful information from the data set, and handling missing data without affecting the quality of the data is a challenging task. Association rule algorithms identify patterns in the database. Approximate association rule mining allows data that approximately matches a pattern to contribute toward the overall support of that pattern. This approach is also useful for processing missing data, which contributes probabilistically to the support of possibly matching patterns. An Apriori-like candidate-generation-and-test approach may encounter serious challenges when mining datasets with long patterns. The Hotspot algorithm is faster than some recently reported frequent pattern mining methods, and it can also mine many interesting patterns efficiently. The actual data mining process deals significantly with prediction, estimation, classification, pattern recognition and the development of association rules. Therefore, the significance of the analysis depends heavily on the accuracy of the database and on the sample data chosen for training and testing. The issue of missing data must be addressed, because ignoring it can introduce bias into the models being evaluated and lead to inaccurate data mining conclusions. The objective of this paper is to perform the data mining process effectively on a database with missing information.
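
The idea that missing data contributes probabilistically to the support of possibly matching patterns can be illustrated with a minimal Python sketch. It is not the method of the paper: the function names `value_probabilities` and `approximate_support`, the use of `None` as the missing-value marker, and the assumption that a missing value matches with the observed relative frequency of the wanted value (treated independently per attribute) are all illustrative choices.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing value


def value_probabilities(rows, attribute):
    """Estimate P(attribute = value) from rows where the attribute is observed."""
    counts = Counter(
        row[attribute] for row in rows if row.get(attribute) is not MISSING
    )
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def approximate_support(rows, pattern):
    """Fractional support of `pattern` (a dict attribute -> wanted value).

    A row with an observed, conflicting value contributes 0. A row with a
    missing value contributes the estimated probability that the missing
    value equals the wanted one (illustrative independence assumption).
    """
    probs = {attr: value_probabilities(rows, attr) for attr in pattern}
    total = 0.0
    for row in rows:
        weight = 1.0
        for attr, wanted in pattern.items():
            observed = row.get(attr)
            if observed is MISSING:
                weight *= probs[attr].get(wanted, 0.0)
            elif observed != wanted:
                weight = 0.0
                break
        total += weight
    return total / len(rows) if rows else 0.0


# Hypothetical toy data, not taken from the paper.
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": None},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": None, "play": "no"},
]
pattern = {"outlook": "sunny", "play": "no"}
print(approximate_support(rows, pattern))
```

On this toy data only one of the four rows matches the pattern exactly, so the exact-match support is 1/4. The two rows with a missing value each add a fractional weight of 2/3, which raises the approximate support to 7/12.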

Keywords


Data Cleansing, Data Mining, Knowledge Discovery, Missing Values, Preprocessing.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.