
Empirical Evaluation of Grid-Based Text Mining Tasks

R. Boobathiraj, J. Geetha

Abstract


The enormous amount of information stored in unstructured texts cannot simply be used for further processing by computers, which typically handle text as simple sequences of character strings. Text mining is the process of extracting interesting information and knowledge from unstructured text. This study presents an overview of our research activities aimed at the efficient use of Grid infrastructure to solve various text mining tasks. Grid-enabling these tasks was driven mainly by the increasing volume of processed data. Integrating text mining services into a distributed service-oriented system opens up many possibilities for building distributed text mining services. Three data-driven distributed approaches to text mining have been proposed: induction of decision trees, the GHSOM (Growing Hierarchical Self-Organizing Map) clustering algorithm, and the FCA (Formal Concept Analysis) method. The objective of this work was to evaluate the concept proposed by M. Sarnovský; the evaluation yielded favorable results.
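Assuming that "data-driven" here means the document collection, rather than the algorithm, is split across Grid nodes, the shared pattern behind the three approaches can be sketched as partition, compute locally, then merge. The Python sketch below illustrates only that generic pattern. Threads stand in for Grid nodes, and a deliberately simple local task (counting the document frequency of each term) stands in for building a decision tree, GHSOM, or FCA model on each partition. All function names (`partition`, `local_model`, `merge`, `distributed_df`) are hypothetical and are not taken from the authors' implementation or from the GridMiner framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def partition(docs: List[str], n_parts: int) -> List[List[str]]:
    """Split the collection into n_parts contiguous chunks of near-equal size."""
    size, rem = divmod(len(docs), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `rem` chunks receive one extra document each.
        end = start + size + (1 if i < rem else 0)
        parts.append(docs[start:end])
        start = end
    return parts


def local_model(docs: Iterable[str]) -> Counter:
    """Hypothetical stand-in for a local model: document frequency per term."""
    df = Counter()
    for doc in docs:
        # set() ensures each term is counted at most once per document.
        df.update(set(doc.lower().split()))
    return df


def merge(models: Iterable[Counter]) -> Counter:
    """Combine the local results; document frequencies simply add up."""
    total = Counter()
    for m in models:
        total.update(m)
    return total


def distributed_df(docs: List[str], n_workers: int = 3) -> Counter:
    """Data-driven distribution: one partition per worker, then a merge step."""
    parts = partition(docs, n_workers)
    # Threads play the role of Grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        local = list(pool.map(local_model, parts))
    return merge(local)
```

For example, `distributed_df(["grid text mining", "text clustering on the grid", "decision trees for text"], n_workers=2)` yields a count of 3 for `text` and 2 for `grid`, the same as calling `local_model` on the whole collection. Document frequencies combine by plain addition, so the distributed and sequential results coincide here. Merging real local models, such as decision trees or local conceptual models, is considerably harder; that is the problem addressed by the combination approaches in the reference list (e.g., Butka, Sarnovský, Bednár, 2008).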

Keywords


Decision Trees, GHSOM, Grid, Text Mining.


References


Miller, G. A.: The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychological Review, 63, pp. 81-97, 1956.

Frawley, W., Piatetsky-Shapiro, G., Matheus, C.: Knowledge Discovery in Databases: An Overview, AI Magazine, pp. 213-228, 1992.

Foster, I., Kesselman, C.: Computational Grids, in The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.

Manning, C. D., Schütze, H.: Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999.

Yang, Y., Liu, X.: A Re-examination of Text Categorization Methods, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 42-49, 1999.

Joachims, T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features, in Proceedings of the European Conference on Machine Learning, 1998.

Li, B., Chen, Y., Yu, S.: A Comparative Study on Automatic Categorization Methods for Chinese Search Engine, in Proceedings of the Eighth Joint International Computer Conference, Zhejiang University Press, Hangzhou, pp. 117-120, 2002.

Kohonen, T.: Self-Organizing Maps, Springer-Verlag, Berlin, 1995.

Dittenbach, M., Rauber, A., Merkl, D.: The Growing Hierarchical Self-Organizing Map, in Proceedings of the International Joint Conference on Neural Networks, Como, Italy, 2000.

Ganter, B., Wille, R.: Formal Concept Analysis, Springer-Verlag, 1997.

Krajci, S.: Clustering Algorithm Via Fuzzy Concepts, in Proceedings of DATESO workshop, Ostrava, Czech Republic, pp. 94-100, 2003.

Butka, P.: Combination of Problem Reduction Techniques and Fuzzy FCA Approach for Building of Conceptual Models from Textual Documents (in Slovak), in Znalosti 2006, 5th annual conference, Ostrava, Czech Republic, pp. 71-82, 2006.

Belohlavek, R.: Concept Lattices and Formal Concept Analysis (in Czech), in Znalosti 2004, 3rd annual conference, Brno, Czech Republic, pp. 66-84, 2004.

Quan, T. T., Hui, S. C., Cao, T. H.: A Fuzzy FCA-based Approach to Conceptual Clustering for Automatic Generation of Concept Hierarchy on Uncertainty Data, in Proceedings of CLA conference, Ostrava, Czech Republic, pp. 1-12, 2004.

Bednar, P., Butka, P., Paralič, J.: Java Library for Support of Text Mining and Retrieval, in Proceedings of Znalosti 2005, 4th annual conference, Stara Lesna, Slovakia, pp. 162-169, 2005.

Brezany, P., Janciak, I., Sarnovsky, M.: Text Mining within the GridMiner Framework, in 2nd Dialogue Workshop, Edinburgh, UK, 2006.

Janciak, I., Sarnovsky, M., Tjoa, A. M., Brezany, P.: Distributed Classification of Textual Documents on the Grid, in High Performance Computing and Communications, HPCC, LNCS 4208, Munich, Germany, September 13-15, pp. 710-718, 2006.

Sarnovský, M., Butka, P., Safko, V.: Distributed Clustering of Textual Documents in the Grid Environment (in Slovak), in Znalosti 2008, 7th annual conference, Bratislava, Slovakia, pp. 192-203, 2008.

Butka, P., Sarnovský, M., Bednár, P.: One Approach to Combination of FCA-based Local Conceptual Models for Text Analysis: Grid-based Approach, in Proceedings of SAMI 2008, IEEE Conference, Herlany, Slovakia, pp. 131-135, 2008.

Butka, P., Zeher, M.: Simple Approach to Combination of FCA-based Local Conceptual Models for Text Analysis, in Proceedings of the 7th International Workshop on Data Analysis, WDA 2006, Košice, Slovakia, pp. 1-10, 2006.

Han, J., Kamber, M.: Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.

Akhgar, B.: Strategic Information Systems beyond Technology: A Knowledge Management Perspective, Systems Management Publications IR, 2(3), 1999.




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.