A Survey on the Classification of Dark Web using Unclassified Ontology Method

M. Sreekrishna, B. Chitra, A. Naveenkumar

Abstract


The deep web is the portion of the web that is not part of the surface web. Because of the large volume of data it holds, the deep web has gained considerable attention in recent years. Traditional search engines cannot retrieve content in the deep web, because its pages do not exist until they are created dynamically as the result of a specific search. The deep web is found to be larger in magnitude than the surface web. Furthermore, it mostly comprises online domain-specific databases, which are accessed through web query interfaces. To make extraction relevant to the user, it is necessary to classify deep web databases. In this paper, an unclassified ontology-based web classification method is used to classify the data in the deep web. This method works on a completely unclassified set of data and uses the Wikipedia category network to analyze the meta-information of the deep web sources. The experimental results are found to give a more accurate and fine-grained classification than existing approaches.

Keywords


Deep Web, Ontology, Semantic Information Retrieval, Semantic Search, Wikipedia

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.