A New Approach to Improve the Performance of Page Content Ranking in Web Content Mining

R. Gunasundari, S. Karthikeyan

Abstract


The Internet is a huge collection of highly unstructured data, which makes it enormously difficult to search for and retrieve valuable information. Today's web search capabilities, networking, and computational efficiency give users high bandwidth and very fast download speeds, but the time wasted browsing through uninteresting documents is enormous. The unstructured nature of information sources on the Web makes automated discovery of Web information difficult. The goal of this paper is to design a new method in the Web Content Mining category and to describe its prototype implementation and first experiments. The proposed method addresses the problem of determining a relevance ranking of web pages with respect to a given query.


Keywords


Content Mining, Data Mining, Search Engines, Soft Computing, Web Mining

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.