
An Implementation of Citation Parser Based On Sequence Alignment

R. Shiva Shankar, K.V.S.S.R. Murthy, V.M.N.S.S.V.K.R. Gupta, D. Ravibabu


Most network services fail to provide accurate data about the publishers of academic publications because of the drastic increase in their number around the world. The main problem is that different publishers use different citation styles [1]: each formats its citation strings in one of thousands of styles, which makes it difficult for researchers to extract accurate metadata. To avoid these difficulties, a parser called “BibPro” is implemented in this project. The parser converts the components of a citation into a representation of their structural properties. Its main principle is to create a template database containing a set of sequence templates for the different formats and then to use the most suitable sequence template to parse a given citation string [2]. It provides accurate, high-quality data representation for academic publications. Using machine learning techniques, the parser also provides efficient extraction templates for effective implementation.
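The template-matching principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the symbol encoding (A for a run of words, N for a run of numbers, punctuation kept as-is), the two templates, the field codes, and the names `to_form`, `parse`, `TEMPLATES`, and `FIELD_NAMES` are all assumptions made for illustration, and Python's `difflib.SequenceMatcher` stands in for whatever alignment method the parser actually uses.

```python
import re
from difflib import SequenceMatcher

# Group 1 = number, group 2 = word, group 3 = single punctuation mark.
TOKEN_RE = re.compile(r"(\d+)|([^\W\d_]+)|([^\w\s])")

# Hypothetical field codes used in the templates below; "_" marks a separator.
FIELD_NAMES = {"a": "author", "t": "title", "j": "journal",
               "v": "volume", "p": "pages", "y": "year"}

# Hypothetical template database: style name -> (form sequence, field codes).
# Both strings have equal length; position i of the codes labels symbol i.
TEMPLATES = {
    "year-last":   ("A.A.A.A.A,N:N-N,N.",   "a_a_a_t_j_v_p_p_y_"),
    "year-second": ("A,A.A.(N).A.A,N,N-N.", "a_a_a__y__t_j_v_p_p_"),
}


def to_form(citation):
    """Encode a citation as a symbol string plus the text behind each symbol."""
    symbols, chunks = [], []
    for m in TOKEN_RE.finditer(citation):
        tok = m.group()
        # Numbers become "N", words become "A", punctuation stands for itself.
        sym = {1: "N", 2: "A"}.get(m.lastindex, tok)
        if symbols and sym in ("A", "N") and sym == symbols[-1]:
            chunks[-1] += " " + tok  # merge adjacent words (or numbers)
        else:
            symbols.append(sym)
            chunks.append(tok)
    return "".join(symbols), chunks


def parse(citation, templates=TEMPLATES):
    """Pick the most similar template, then label the aligned chunks."""
    form, chunks = to_form(citation)
    name = max(templates,
               key=lambda n: SequenceMatcher(None, templates[n][0], form).ratio())
    t_form, t_codes = templates[name]
    fields = {}
    # Each matching block aligns a stretch of the template with the citation.
    for block in SequenceMatcher(None, t_form, form).get_matching_blocks():
        for k in range(block.size):
            code = t_codes[block.a + k]
            if code != "_":
                fields.setdefault(FIELD_NAMES[code], []).append(chunks[block.b + k])
    return name, fields


if __name__ == "__main__":
    print(parse("T. A. Brooks. Evidence of complex citer motivations. "
                "Journal of the American Society for Information Science, "
                "37:34-36, 1986."))
```

Because the fields are assigned through the alignment rather than by position, a citation whose form only partly matches a template (for example, an author with one initial instead of two) can still be labelled from the aligned stretch. A real template database would hold far more forms and would need a more robust encoding for titles that contain punctuation.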


Keywords: BibPro, BLAST, CRAMP, INFOMAP, FLUX-CiM


Hetzner, Erik. "A simple method for citation metadata extraction using hidden markov models.", pp. 280-284. ACM, 2008.

Giles, C. Lee, Kurt D. Bollacker, and Steve Lawrence. "CiteSeer: An automatic citation indexing system.", pp. 89-98. ACM, 1998.

Han, Hui, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, and Edward A. Fox. "Automatic document metadata extraction using support vector machines." In Digital Libraries, 2003. On, pp. 37-48. IEEE, 2003.

Takasu, Atsuhiro. "Bibliographic attribute extraction from erroneous references based on a statistical model." In Digital Libraries, 2003. on, pp. 49-60. IEEE, 2003.

Huang, and Wen-Chang Lin. "Extracting citation metadata from online publication lists using BLAST." Springer Berlin Heidelberg, 2004.

Chen, Hung-Yu Kao, and Jan-Ming Ho. "BibPro: A citation parser based on sequence alignment techniques." pp. 1175-1180. IEEE, 2008.

Lee, Dongwon, Jaewoo Kang, Prasenjit Mitra, C. Lee Giles, and Byung-Won On. "Are your citations clean?" Communications of the ACM 50, no. 12 (2007): 33-38.

Eytan Adar and Jeremy Hylton. On-the-fly hyperlink creation for page images. In Proceedings of Digital Libraries ’95, June 1995.

T. A. Brooks. Evidence of complex citer motivations. Journal of the American Society for Information Science, 37:34–36, 1986.

Robert D. Cameron. A universal citation database as a catalyst for reform in scholarly communication. First Monday, 2(4), 1997.

Donna Bergmark and Carl Lagoze, "An Architecture for Automatic Reference Linking," TR 2000-1820, October, 2000.

Mike Jewell, "ParaTools Reference Parsing Toolkit - Version 1.0 Released," Vol. 9, No.2, Feb., 2003.

Donna Bergmark, "Automatic Extraction of Reference Linking Information from Online Documents," TR 2000-1821, November, 2000.

D. Lee, J. Kang, P. Mitra, C. L. Giles, and B.-W. On, “Are your citations clean?,” Commun. ACM, vol. 50, pp. 33-38, 2007.


This work is licensed under a Creative Commons Attribution 3.0 License.