
Retrieval of Keyphrase Automatically from Video Lectures using Semi Supervised Machine Learning Algorithm

P. Saranya, U.K. Balamurali


A few advanced lecture browsers synchronize text with the lecture video and allow search within the transcript. However, these systems are of little use if, for instance, a student wants to quickly scan the contents of a particular lecture in a series of lectures. This project therefore proposes a system that automatically generates and displays section-wise annotations from lecture transcripts. The approach uses a simpler, keyphrase-based annotation technique that strikes a middle ground between detailed annotation and basic video tagging. It uses a supervised machine learning algorithm, based on a Naive Bayes classifier, to extract relevant keyphrases. The goal of keyphrase extraction is to generate an optimal set of phrases appearing in the lecture that best summarizes its content. The project shows that combining automatic keyphrase extraction with segmentation enhances the functionality of a lecture browser system.
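As a rough illustration of how a Naive Bayes classifier can rank candidate keyphrases in one transcript segment, the sketch below may help. It is not the authors' implementation: the abstract does not state which features or thresholds are used, so the two features (how often a phrase occurs and how early it first appears), the fixed bins, the stopword list, and every function and class name here are assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "is", "to", "and", "we", "for", "on"}


def phrase_features(text, max_len=2):
    """Map each candidate phrase to (frequency bin, first-occurrence bin)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts, first_seen = Counter(), {}
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            # Skip truncated n-grams and phrases that start or end with a stopword.
            if len(gram) < n or gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            counts[phrase] += 1
            first_seen.setdefault(phrase, i / len(words))
    # Hypothetical fixed bins: frequency capped at 3; early (0) vs. late (1) first use.
    return {p: (min(c, 3), 0 if first_seen[p] < 0.33 else 1) for p, c in counts.items()}


class NaiveBayes:
    """Naive Bayes over discrete features with Laplace smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
        self.values = defaultdict(set)              # feature index -> values seen in training
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.feature_counts[(label, j)][value] += 1
                self.values[j].add(value)
        return self

    def prob_key(self, row):
        """Posterior probability that a phrase with these features is a keyphrase."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n_label in self.class_counts.items():
            log_p = math.log(n_label / total)
            for j, value in enumerate(row):
                seen = self.feature_counts[(label, j)][value]
                log_p += math.log((seen + 1) / (n_label + len(self.values[j])))
            log_scores[label] = log_p
        top = max(log_scores.values())
        scaled = {label: math.exp(s - top) for label, s in log_scores.items()}
        return scaled.get(True, 0.0) / sum(scaled.values())


def train(labeled_segments):
    """labeled_segments: iterable of (transcript text, set of hand-assigned keyphrases)."""
    rows, labels = [], []
    for text, gold in labeled_segments:
        for phrase, feats in phrase_features(text).items():
            rows.append(feats)
            labels.append(phrase in gold)
    return NaiveBayes().fit(rows, labels)


def extract_keyphrases(model, text, k=3):
    """Return the k candidate phrases with the highest keyphrase probability."""
    feats = phrase_features(text)
    return sorted(feats, key=lambda p: model.prob_key(feats[p]), reverse=True)[:k]
```

In this sketch, `train` would be called once on segments whose keyphrases were assigned by hand, and `extract_keyphrases` would then be applied to each segment produced by the segmentation step to obtain its section-wise annotation. Raw frequency stands in for a corpus-level weight such as TF-IDF only to keep the example self-contained.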


Keywords: Automatic Keyphrase Extraction, Lecture Browser, Segmentation, Transcript








This work is licensed under a Creative Commons Attribution 3.0 License.