A General Study on Information Extraction and Ontologies

S. Malathi


An ontology is a model of the world, represented as a tangled tree of linked concepts. Concepts are language-independent abstract entities, not words. They are expressed in the ontology using English words and phrases only as a simplifying convention.
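As a rough illustration of the "tangled tree" idea, the sketch below models concepts as nodes that may have more than one parent, so the structure is a directed acyclic graph rather than a strict tree. The class name `Concept`, the identifiers `C0` to `C3`, and the sample concepts are assumptions made for this example only; the English labels are kept separate from the identifiers to reflect that concepts are not words.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent concept; label_en is only a naming convention."""
    concept_id: str
    label_en: str
    parents: list = field(default_factory=list)

    def ancestors(self):
        """Return the ids of all concepts reachable through parent links."""
        seen = set()
        stack = list(self.parents)
        while stack:
            node = stack.pop()
            if node.concept_id not in seen:
                seen.add(node.concept_id)
                stack.extend(node.parents)
        return seen


# Hypothetical mini-ontology: "country" has two parents, which is what
# makes the tree "tangled".
entity = Concept("C0", "entity")
region = Concept("C1", "geographic region", [entity])
polity = Concept("C2", "political entity", [entity])
country = Concept("C3", "country", [region, polity])

print(sorted(country.ancestors()))
```

Renaming `label_en` or adding labels in other languages would leave the links between concepts unchanged, since the links refer to the concept objects themselves.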

Information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information, that is, categorized, contextually and semantically well-defined data from a given domain, from unstructured machine-readable documents.

Information extraction aims to retrieve certain types of information from natural language text by processing it automatically. For example, an information extraction system might retrieve information about geopolitical indicators of countries from a set of web pages while ignoring other types of information. IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model.
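The difference between keyword filtering and interpretation against a domain model can be sketched in a few lines. In the minimal example below, the dictionary `DOMAIN_MODEL`, its `Country` concept, the two slots, the regular expressions, and the fictional sample text are all assumptions made for illustration; a real system would use a far richer model and more robust linguistic analysis.

```python
import re


def parse_count(s):
    """Interpret a matched string such as '1,200,000' as an integer."""
    return int(s.replace(",", ""))


# Hypothetical partial domain model: one concept with two typed slots.
# Each slot pairs a surface pattern with a function that interprets the match.
DOMAIN_MODEL = {
    "Country": {
        "capital": (
            r"[Tt]he capital of (?P<subject>[A-Z]\w+) is (?P<value>[A-Z]\w+)",
            str,
        ),
        "population": (
            r"(?P<subject>[A-Z]\w+) has a population of (?P<value>[\d,]+)",
            parse_count,
        ),
    }
}


def extract(text):
    """Fill slots of the domain model from text; everything else is ignored."""
    records = {}
    for concept, slots in DOMAIN_MODEL.items():
        for slot, (pattern, interpret) in slots.items():
            for match in re.finditer(pattern, text):
                key = (concept, match.group("subject"))
                records.setdefault(key, {})[slot] = interpret(match.group("value"))
    return records


# Fictional sample text; the last sentence is outside the domain model.
TEXT = (
    "The capital of Ruritania is Strelsau. "
    "Ruritania has a population of 1,200,000. "
    "The weather in Strelsau was mild last week."
)

print(extract(TEXT))
```

The extracted values are attached to a concept instance and converted to the slot's type, while the sentence about the weather produces nothing because the model has no slot for it.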

Ontology-based information extraction has recently emerged as a subfield of information extraction. Here, ontologies, which provide formal and explicit specifications of conceptualizations, play a crucial role in the information extraction process.


Ontologies, Information extraction (IE) and RDF.

