
Design and Implementation of the Cloud-Based Application for Text Mining Tasks

Martin Sarnovsky

Abstract


The main objective of the presented work was to design and implement a visualization solution for text mining tasks. The specific goal was to design a visualization of decision tree classifiers focused mainly on aspects specific to text classification. The visualization tool is based on commonly used decision tree visualization techniques, modified to cope with the conditions of visualizing large data. The solution was implemented in the Processing language on top of a Java-based text mining library and tested on several standard textual document collections. It was integrated into a web portal providing text mining services, where it adds a module that gives visual feedback on the constructed models and so enhances the experience of solving text mining tasks.
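The abstract does not describe the tool's internal data structures or layout algorithm. Purely as an illustration, the following Java sketch (Processing sketches are translated to Java) shows one common way to lay out a binary decision tree whose inner nodes test whether a term occurs in a document: leaves receive consecutive horizontal slots, each inner node is centred over its children, and subtrees below a chosen depth are collapsed into a single marker node, which is one simple way of keeping large trees readable. All names (`TreeLayoutSketch`, `layout`, `visible`, `classify`, `maxDepth`) and the toy tree are hypothetical; they are not taken from the described implementation or from the underlying Java library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: layout of a decision tree over term-presence tests. */
public class TreeLayoutSketch {

    /** Inner nodes test for a term; leaves carry a predicted category. */
    static final class Node {
        final String term;   // split term for inner nodes, null for leaves
        final String label;  // category for leaves, null for inner nodes
        final Node absent;   // branch taken when the term does not occur
        final Node present;  // branch taken when the term occurs
        double x;            // horizontal position assigned by layout()
        int depth;           // vertical level assigned by layout()
        boolean collapsed;   // true if this inner node hides its subtree

        private Node(String term, String label, Node absent, Node present) {
            this.term = term; this.label = label; this.absent = absent; this.present = present;
        }
        static Node leaf(String label) { return new Node(null, label, null, null); }
        static Node split(String term, Node absent, Node present) {
            return new Node(term, null, absent, present);
        }
        boolean isLeaf() { return term == null; }
    }

    /**
     * Assigns depth and x: leaves and collapsed nodes take consecutive slots
     * left to right; inner nodes are centred over their two children.
     * Inner nodes at depth >= maxDepth are collapsed. Returns the next free slot.
     */
    static int layout(Node n, int depth, int maxDepth, int nextSlot) {
        n.depth = depth;
        n.collapsed = !n.isLeaf() && depth >= maxDepth;
        if (n.isLeaf() || n.collapsed) {
            n.x = nextSlot;
            return nextSlot + 1;
        }
        nextSlot = layout(n.absent, depth + 1, maxDepth, nextSlot);
        nextSlot = layout(n.present, depth + 1, maxDepth, nextSlot);
        n.x = (n.absent.x + n.present.x) / 2.0;
        return nextSlot;
    }

    /** Collects, in pre-order, the nodes that would actually be drawn. */
    static void visible(Node n, List<Node> out) {
        out.add(n);
        if (!n.isLeaf() && !n.collapsed) {
            visible(n.absent, out);
            visible(n.present, out);
        }
    }

    /** Classifies a document, represented as its set of terms, by walking the tree. */
    static String classify(Node n, Set<String> terms) {
        while (!n.isLeaf()) {
            n = terms.contains(n.term) ? n.present : n.absent;
        }
        return n.label;
    }

    public static void main(String[] args) {
        Node tree = Node.split("goal",
                Node.split("election", Node.leaf("other"), Node.leaf("politics")),
                Node.split("match", Node.leaf("other"), Node.leaf("sport")));
        layout(tree, 0, 10, 0);
        List<Node> nodes = new ArrayList<>();
        visible(tree, nodes);
        for (Node n : nodes) {
            String text = n.isLeaf() ? "[" + n.label + "]" : n.term + (n.collapsed ? " (+)" : "?");
            System.out.println(String.format(Locale.ROOT, "depth=%d x=%.1f %s", n.depth, n.x, text));
        }
    }
}
```

In a Processing sketch, `depth` and `x` would be scaled to screen coordinates before drawing. Lowering `maxDepth` (for example to 1 in the toy tree) replaces the two depth-1 subtrees with collapsed nodes marked `(+)`, which an interactive view could expand on demand.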


Keywords


Text Mining, Classification, Decision Trees, Visualization.

This work is licensed under a Creative Commons Attribution 3.0 License.