
Text Mining: Knowledge Discovery from Unstructured Data

Rafi Ahmad Khan, Sabreen Kanth

Abstract


Everyday academic, economic, business and social activities generate huge amounts of new data and information. This body of data, which is predicted to grow at an enormous rate each year, has significant societal and economic value. Automatically detecting similarity between texts and determining the relevance of documents to a given query are therefore essential to exploiting the potential of this data. Text mining is the technique used to automatically discover and extract information and knowledge from large volumes of semi-structured or unstructured data. In view of its importance, this paper discusses the concept of text mining and provides a framework for a step-by-step model of Textual Data Mining. It also discusses text mining tools, applications and limitations.
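As an illustration of what determining the relevance of documents to a query can look like in practice, the following is a minimal sketch that ranks documents by TF-IDF cosine similarity. It is not the framework proposed in the paper. The function names, the smoothed IDF formula, the choice to include the query when computing document frequencies, and the sample documents are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per token list."""
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per text
    # Smoothed IDF (an assumption): terms found in every text keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query, documents):
    """Return (score, document index) pairs, most relevant first."""
    # Simplification: the query is treated as one more text when computing IDF.
    token_lists = [tokenize(d) for d in documents] + [tokenize(query)]
    vectors = tfidf_vectors(token_lists)
    query_vec = vectors[-1]
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors[:-1])]
    return sorted(scores, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample documents, used only to exercise the sketch.
documents = [
    "Text mining extracts knowledge from unstructured documents.",
    "The data warehouse stores structured sales records.",
    "Information retrieval ranks documents against a query.",
]
ranking = rank_documents("knowledge from unstructured text", documents)
for score, index in ranking:
    print(f"{score:.3f}  {documents[index]}")
```

The same cosine function also serves for text-to-text similarity: passing two document vectors instead of a query vector and a document vector yields a similarity score between the two texts.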


Keywords


Textual Data Mining (TDM), Data Mining (DM), Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), Data Warehouse (DW), Intermediate Form (IF).

