A Proposal for the Semantic based Report Generation of Related HTML Documents

A.M. Abirami, Dr. A. Askarunisa


Most web pages today are published as plain HTML. A great deal of data exists in these pages, but there is little or no support for generating reports from multiple related HTML pages. For example, information about individual people may be stored in separate HTML pages, with no way to produce a collective report on a particular item of information across all of them. Most of the time, this is done manually. This paper proposes a semantic-based approach for generating reports from HTML pages using semantic technologies such as OWL, RDF and SPARQL. As a first step, the required HTML pages are navigated and the information in their tables and lists is collected. The data is pre-processed and formatted as a CSV file to make further processing easier. OWL files are created for the corresponding domain and act as a dictionary for the application. The CSV contents are separated based on the OWL files and the rules. The separated contents are stored in RDF format, and SPARQL is used to query the RDF file. The proposed model can thus be a handy tool for management staff to generate reports readily, without spending much manual effort.
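
The pipeline outlined in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It uses only the Python standard library, so several parts are stand-ins: the `VOCABULARY` dictionary plays the role of the OWL file, plain `(subject, property, value)` tuples stand in for RDF, and a simple filter function stands in for a SPARQL query. The sample HTML, the `ex:` names, and all function names are invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Step 1 and 2: collect table cells and format them as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(parser.rows)
    return buf.getvalue()


# Hypothetical stand-in for the OWL dictionary: column header -> property.
VOCABULARY = {"Name": "ex:name", "Department": "ex:department"}


def csv_to_triples(csv_text, subject_prefix="ex:person"):
    """Step 3 and 4: keep only columns known to the vocabulary, as triples."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, record in enumerate(reader, start=1):
        subject = f"{subject_prefix}{i}"
        for column, value in record.items():
            prop = VOCABULARY.get(column)
            if prop is not None:
                triples.append((subject, prop, value))
    return triples


def query(triples, prop, value):
    """Step 5: stand-in for a SPARQL SELECT with one triple pattern."""
    return [s for s, p, o in triples if p == prop and o == value]


# Invented sample page; the Phone column is not in VOCABULARY and is dropped.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Department</th><th>Phone</th></tr>
  <tr><td>Asha</td><td>CSE</td><td>1234</td></tr>
  <tr><td>Ravi</td><td>ECE</td><td>5678</td></tr>
</table>
"""

if __name__ == "__main__":
    triples = csv_to_triples(html_table_to_csv(SAMPLE_HTML))
    print(query(triples, "ex:department", "CSE"))
```

In a fuller implementation, the tuples would presumably be serialized as RDF and queried with a SPARQL engine, as the abstract describes; the filter here only mimics a single-pattern query.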



