
Web Content Mining Applications and Methods-A Survey

Roma Yadav, S.R. Tandan


This survey paper is concerned with approaches to web content mining and the various applications of web mining. The World Wide Web (WWW) is a vast source of information that is accessible and searchable, and it also serves as an enormous communication channel for the whole world. The Web contains a collection of hyperlinks, text, and images. Web mining methods provide a powerful framework for data extraction. This review aims to give a structured and comprehensive overview of the literature in the area of Web data extraction methods and applications.


Web Content Mining, Application, Methods, Web Mining







Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.