
A Proposal for the Semantic based Report Generation of Related HTML Documents

A.M. Abirami, Dr.A. Askarunisa


Most web pages today are published as plain HTML. A great deal of data is available in them, but there is little or no support for generating reports across separate but related HTML pages. For example, information about each individual person may be stored in its own HTML page, and there is no way to produce a collective report on a particular item of information for all of those people; most of the time this is done manually. This paper proposes a semantic-based approach for generating reports from HTML pages using semantic technologies such as OWL, RDF and SPARQL. As a first step, the required HTML pages are navigated and the information in their tables and lists is collected. The data is pre-processed and formatted as a CSV file to make further processing easier. OWL files are created for the corresponding domain and act as a dictionary for the application. The CSV contents are separated according to the OWL files and the rules. The separated contents are stored in RDF format, and SPARQL is used to query the RDF file. The proposed model can thus be a handy tool that lets management generate reports readily, without spending much manual effort.
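The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, so a plain dictionary stands in for the OWL domain dictionary, a list of tuples stands in for the RDF store, and a simple filter stands in for a SPARQL query. The page contents, the `ex:` prefix, and every function and variable name are hypothetical, chosen only for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Step 1: collect table rows from one HTML page."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def pages_to_csv(pages):
    """Step 2: flatten label/value rows from all pages into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for page_id, html in pages.items():
        for row in extract_rows(html):
            if len(row) == 2:  # keep only label/value pairs
                writer.writerow([page_id, row[0], row[1]])
    return buf.getvalue()


def csv_to_triples(csv_text, vocabulary):
    """Step 3: keep rows whose label is in the vocabulary, as triples."""
    triples = []
    for page_id, label, value in csv.reader(io.StringIO(csv_text)):
        if label in vocabulary:
            triples.append((f"ex:{page_id}", vocabulary[label], value))
    return triples


def report(triples, predicate):
    """Step 4: stand-in for a SPARQL SELECT on a single predicate."""
    return sorted((s, o) for s, p, o in triples if p == predicate)


# Hypothetical input: one HTML page per person, each with a property table.
PAGES = {
    "alice": "<table><tr><td>Name</td><td>Alice</td></tr>"
             "<tr><td>Department</td><td>CSE</td></tr></table>",
    "bob": "<table><tr><td>Name</td><td>Bob</td></tr>"
           "<tr><td>Department</td><td>IT</td></tr></table>",
}

# Stand-in for the OWL dictionary: maps table labels to property names.
VOCABULARY = {"Name": "ex:name", "Department": "ex:department"}

csv_text = pages_to_csv(PAGES)
triples = csv_to_triples(csv_text, VOCABULARY)
department_report = report(triples, "ex:department")
```

Here `department_report` gathers the department of every person across all pages in one list, which is the kind of collective report the abstract says is otherwise produced manually. A real system along the lines proposed would replace the tuple list with an RDF store and the `report` filter with a SPARQL query.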









This work is licensed under a Creative Commons Attribution 3.0 License.