
A Visual Search Engine for Searching Tamil Web Pages Using Web Community Mining and Natural Language Processing

A. Vijaya Kathiravan, M. Reka

Abstract


With growing interest in Tamil and the growth of the Internet, the amount of Tamil data doubles every 12-14 months and will increase even more dramatically in the coming year. With an enormous amount of Tamil data stored in web pages, it is increasingly important to develop powerful tools for analyzing such data and mining interesting patterns from it. There is strong interest in employing data mining methods to generate models of Tamil-related web pages that form web communities. A web community is a collection of web pages that share a similar interest, implicitly or explicitly. This paper proposes a new initiative for forming Tamil web communities, with a concise introduction to web community mining. The main intention of this paper is to employ web community mining techniques to provide better search engine results and to visualize those results as Tamil web communities using a suitable visualization tool. This paper exploits visualization, web community mining, and natural language processing (NLP) techniques. Visualization is the graphical presentation of information, with the goal of providing the viewer with a qualitative understanding of the information contents. This paper focuses on selecting the visualization tool best suited for displaying search engine results; various visualization techniques are also described. Community mining will benefit all Tamil lovers who want to become well-versed in a Tamil domain of their own interest. Tamil research publications and literature in Tamil are grouped using bibliometric analysis. By forming people communities (i.e., groups of people with similar interests) using social network analysis, domain knowledge in Tamil can be shared. Hence, web community mining may play an important role in forming Tamil web communities for easily gathering Tamil resources and documents of similar interest from the ocean of the web.


Keywords


Visualization, Search engine, Web community mining, Information retrieval, Tamil communities, Social network analysis, Bibliometric analysis, Community mining, Natural Language Processing (NLP), Tree graph, Map graph, Bipartite graph.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.