
Recognition of Audiovisual Celebrity in Unrestrained Web Videos

Vijay Dhir, Amit Kamra, Rakesh Kumar, Baljeet Saini


A large number of video clips are available online. Users who upload videos supply the indexing information themselves, as a video title and a set of keywords. Automated extraction of content from large-scale video collections is a challenging and still unsolved problem. The proposed method finds the audiovisual mapping. All pieces of information are learned automatically, without any human supervision. We present results on 1200 videos and show the effectiveness of the method on a per-celebrity basis.


Speaker recognition, face recognition, diagonal covariances, equal error rates.







This work is licensed under a Creative Commons Attribution 3.0 License.