
Knowledge Discovery from Semantic Web Data using Data Mining Techniques: A Survey

Brinda S. Pujara, Sahista S. Machchhar


The current Web contains data in many forms, such as text, images, video, and audio, so web data is highly unstructured and heterogeneous. With today's fast-growing technology, there is a great need for knowledge discovery over web data so that the data presented to a user is relevant to that user. This in turn calls for techniques that make data machine-understandable. The Semantic Web offers ways to discover and extract knowledge from the Web by using semantics, that is, metadata, together with data mining techniques. This paper gives a brief overview of semantic web mining and describes data mining techniques such as association rule mining, classification, and clustering for semantic web data, all of which are helpful for knowledge discovery. A comparison of these data mining techniques is also provided.
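To make the idea of mining association rules from semantic web data more concrete, the following is a minimal sketch, not the authors' method. It assumes the data is already available as RDF-style (subject, predicate, object) triples, treats each subject as one transaction whose items are `predicate=object` pairs, and runs a plain Apriori-style level-wise search for frequent itemsets and rules. The triples, the `ex:` identifiers, the function names, and the support and confidence thresholds are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical RDF-style triples (subject, predicate, object).
# The "ex:" identifiers are illustrative placeholders, not a real vocabulary.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Patient"),
    ("ex:alice", "ex:hasDiagnosis", "ex:Diabetes"),
    ("ex:alice", "ex:takes", "ex:Insulin"),
    ("ex:bob", "rdf:type", "ex:Patient"),
    ("ex:bob", "ex:hasDiagnosis", "ex:Diabetes"),
    ("ex:bob", "ex:takes", "ex:Insulin"),
    ("ex:carol", "rdf:type", "ex:Patient"),
    ("ex:carol", "ex:hasDiagnosis", "ex:Asthma"),
    ("ex:carol", "ex:takes", "ex:Salbutamol"),
]


def transactions_from_triples(triples):
    """Group triples by subject; each subject becomes one transaction
    whose items are "predicate=object" strings."""
    grouped = defaultdict(set)
    for subject, predicate, obj in triples:
        grouped[subject].add(f"{predicate}={obj}")
    return [frozenset(items) for items in grouped.values()]


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search. Returns {itemset: support}, where
    support is the fraction of transactions containing the itemset."""
    n = len(transactions)
    candidates = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset is frequent.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules X -> Y with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[lhs] is always present.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


if __name__ == "__main__":
    transactions = transactions_from_triples(TRIPLES)
    frequent = frequent_itemsets(transactions, min_support=0.5)
    for lhs, rhs, sup, conf in association_rules(frequent, min_confidence=0.9):
        print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

With `min_support=0.5` and `min_confidence=0.9`, the Asthma and Salbutamol items are pruned at the first level, and rules such as `ex:hasDiagnosis=ex:Diabetes -> ex:takes=ex:Insulin` are reported. The sketch deliberately ignores ontology hierarchies, literals, and the scale of real semantic web graphs, which practical approaches would need to handle.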


Keywords: Association Rules, Classification, Clustering, Data Mining, Semantic Web Mining, Machine Learning, Soft Computing.




This work is licensed under a Creative Commons Attribution 3.0 License.