Efficient Calculation of Maximum Likelihood Estimation for Authorship Attribution Using Lexical and Syntactic Features

R. Padmamala; E. Kannan; V. Prema

Abstract

This paper compares two models for authorship attribution based on a Bayesian approach. Authorship attribution is the task of ascertaining the actual author of a particular text: when two authors, say A1 and A2, both claim to have written a particular essay, the real author must be identified. Such a problem is usually solved by computing the maximum likelihood estimate (MLE) for the authors under dispute, i.e., a probabilistic model is trained for author A1 and another for author A2, and the MLE is then calculated using those models. This method is known as the Bayesian approach. It requires an unknown text and a large text sample from each of the two authors. The maximum likelihood can be calculated with unigram, bigram, or trigram models. Usually unigrams are chosen: the number of occurrences of each unigram is counted, the corresponding probabilities are calculated, and the author with the higher probability is taken to be the actual author. This is the method commonly used for authorship attribution. This paper also uses a second method that considers singleton unigrams, that is, the words that occur only once in the text under dispute, or “the unknown text”. The paper concentrates on vocabulary usage as the means of ascertaining the original author. It also proposes a more advanced method that uses further grammatical features, such as syntactic features. Both the singleton unigram model and the unigram model are used to find the maximum likelihood estimate.
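The two models described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`tokenize`, `train_unigram`, `log_likelihood`, `attribute`), the regex tokenizer, the add-one smoothing, and the use of log probabilities are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens; the paper's tokenizer is not specified.
    return re.findall(r"[a-z']+", text.lower())


def train_unigram(sample):
    """Return (unigram counts, total token count) for one author's training sample."""
    counts = Counter(tokenize(sample))
    return counts, sum(counts.values())


def log_likelihood(words, model, vocab_size):
    """Sum of log P(word | author), with add-one smoothing (an assumption)."""
    counts, total = model
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)


def attribute(unknown, sample_a1, sample_a2, singletons_only=False):
    """Score the unknown text under each author's unigram model.

    With singletons_only=True, only words that occur exactly once in the
    unknown text are scored (the singleton unigram variant).
    """
    words = tokenize(unknown)
    if singletons_only:
        freq = Counter(words)
        words = [w for w in words if freq[w] == 1]
    m1 = train_unigram(sample_a1)
    m2 = train_unigram(sample_a2)
    # Shared vocabulary so both models are smoothed over the same set of words.
    vocab = set(m1[0]) | set(m2[0]) | set(words)
    s1 = log_likelihood(words, m1, len(vocab))
    s2 = log_likelihood(words, m2, len(vocab))
    return ("A1" if s1 >= s2 else "A2"), s1, s2
```

Calling `attribute(unknown, sample_a1, sample_a2)` scores every token of the unknown text, while passing `singletons_only=True` restricts scoring to the words that occur once in it. Ties are resolved in favour of A1 here, which is an arbitrary choice for the sketch.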

Keywords

Unigrams, Singleton Unigrams, Tokenizer, Bayesian Approach, Syntax, POS Tagger, Parser

This work is licensed under a Creative Commons Attribution 3.0 License.