A General Study on Information Extraction and Ontologies
An ontology is a model of the world, represented as a tangled tree of linked concepts. Concepts are language-independent abstract entities, not words. English words and phrases are used to label them in the ontology only as a simplifying convention.
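To make the "tangled tree" idea concrete, here is a minimal Python sketch, not taken from the paper. It assumes that "tangled" means a concept may have more than one parent. The class name `Concept`, the method `ancestors`, and the toy concepts are all hypothetical; the English labels are only tags for language-independent nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent concept; `label` is only a human-readable tag."""
    label: str
    parents: list[Concept] = field(default_factory=list)

    def ancestors(self) -> set[str]:
        """Collect the labels of every concept reachable through parent links."""
        seen: set[str] = set()
        stack = list(self.parents)
        while stack:
            node = stack.pop()
            if node.label not in seen:
                seen.add(node.label)
                stack.extend(node.parents)
        return seen


# Hypothetical toy ontology: COUNTRY has two parents, so the tree is "tangled".
entity = Concept("ENTITY")
place = Concept("PLACE", [entity])
organization = Concept("ORGANIZATION", [entity])
country = Concept("COUNTRY", [place, organization])
```

Calling `country.ancestors()` walks both parent chains, so a concept inherits from every branch it is linked to rather than from a single path to the root.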
Information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information, that is, categorized and contextually and semantically well-defined data from a particular domain, from unstructured machine-readable documents.
IE aims to retrieve certain types of information from natural language text by processing it automatically. For example, an information extraction system might retrieve information about geopolitical indicators of countries from a set of web pages while ignoring other types of information. IE is an ontology-driven process. It is not a mere text-filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model.
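The difference between keyword filtering and model-driven interpretation can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the system described in the paper: the `DOMAIN_MODEL` dictionary, its attribute cues and units, the function name `extract`, and the sample text are all hypothetical.

```python
import re

# Hypothetical partial domain model: one concept, its known instances,
# and the attributes (with a cue word and an expected unit) it may carry.
DOMAIN_MODEL = {
    "COUNTRY": {
        "instances": {"France", "Japan", "Brazil"},
        "attributes": {
            "population": {"cue": "population", "unit": "million"},
            "gdp": {"cue": "GDP", "unit": "trillion"},
        },
    }
}


def extract(text):
    """Return (country, attribute, value) triples licensed by the domain model."""
    concept = DOMAIN_MODEL["COUNTRY"]
    facts = []
    # Naive sentence split: a terminator followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        countries = [c for c in concept["instances"]
                     if re.search(rf"\b{re.escape(c)}\b", sentence)]
        # Keep a sentence only if it mentions exactly one known instance.
        if len(countries) != 1:
            continue
        for name, spec in concept["attributes"].items():
            pattern = (rf"\b{spec['cue']}\b[^.\d]*?"
                       rf"(\d+(?:\.\d+)?)\s+{spec['unit']}")
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match:
                facts.append((countries[0], name, float(match.group(1))))
    return facts


sample = ("France has a population of about 68 million. "
          "The population of rabbits is 3 million. "
          "Japan reported a GDP of roughly 4.2 trillion dollars.")
facts = extract(sample)
```

A plain keyword filter on "population" would also return the sentence about rabbits. Here that sentence is discarded because nothing in it can be attached to an instance of the COUNTRY concept, which is the sense in which extracted text is interpreted against the domain model.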
Ontology-based information extraction has recently emerged as a subfield of information extraction. Here, ontologies, which provide formal and explicit specifications of conceptualizations, play a crucial role in the information extraction process.
This work is licensed under a Creative Commons Attribution 3.0 License.