
Alignment of English-Hindi Sentences

Shweta Dubey, Tarun Dhar Diwan

Abstract


The methodology in this paper exploits a parallel English-Hindi word dictionary after syntactic and semantic analysis of the English-Hindi source text. We apply the methodology to English and Hindi sentences, but it can also be used for other languages. A large parallel corpus for the English-Hindi language pair is not usually available, so the proposed system uses two strategies to overcome this problem. The first strategy is normalization of tagged English and Hindi sentences: the normalization process yields an equal number of words in the English and Hindi sentences so that an exact alignment can be found for each word. The second strategy is mapping English-Hindi sentences using the parallel English-Hindi word dictionary. This dictionary contains normalized English-Hindi words, which can be aligned more readily than with previous alignment approaches. Fortunately, the word alignment task is well known, and some alignment algorithms are freely available, which provides a strong background for this research. Hence the proposed system is very successful at capturing the meaning of each expression that a human being produces in natural language.
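The two strategies described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary `EN_HI_DICT`, the article list `EN_DROP`, and the functions `normalize_english` and `align` are hypothetical names introduced here. Normalization is reduced to lowercasing and dropping English articles, so the sketch does not equalize the word counts of the two sentences as the paper's normalization does.

```python
# Toy illustration only: the dictionary, the article list, and the function
# names are assumptions made for this sketch, not taken from the paper.

# Hypothetical parallel dictionary: normalized English word -> Hindi word.
EN_HI_DICT = {
    "ram": "राम",
    "eats": "खाता",
    "mango": "आम",
}

# Assumed normalization rule: English articles have no separate Hindi word,
# so they are dropped before mapping.
EN_DROP = {"a", "an", "the"}


def normalize_english(sentence):
    """Lowercase the sentence, split on whitespace, and drop articles."""
    return [w for w in sentence.lower().split() if w not in EN_DROP]


def align(en_sentence, hi_sentence):
    """Pair each normalized English word with its Hindi dictionary entry.

    A word is paired with None when it has no dictionary entry or when
    its entry does not occur in the Hindi sentence.
    """
    hi_words = hi_sentence.split()
    pairs = []
    for en in normalize_english(en_sentence):
        hi = EN_HI_DICT.get(en)
        pairs.append((en, hi if hi in hi_words else None))
    return pairs


pairs = align("Ram eats a mango", "राम आम खाता है")
for en, hi in pairs:
    print(en, "->", hi)
```

Words without a match are paired with `None`, which makes gaps in the dictionary visible instead of silently skipping them. A fuller system would also need part-of-speech tags and multi-word expression grouping, as the keywords suggest.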


Keywords


Normalization, Tagging, Local Word Grouping, Word Mapping, Part of Speech, Word Dictionary, Multi Word Expressions


