Recognition of Audiovisual Celebrity in Unrestrained Web Videos

Vijay Dhir; Amit Kamra; Rakesh Kumar; Baljeet Saini

Abstract

Many video clips are available online. Users upload these videos and supply indexing information in the form of a title and a set of keywords. Automatically extracting content information from large-scale video collections is a challenging and still unsolved problem. The proposed method finds the audiovisual mapping between faces and voices. All components are trained automatically, without any human supervision. We present results on 1,200 videos and show the effectiveness of the method on a per-celebrity basis.

Keywords

Speaker recognition, Face recognition, Diagonal covariances, Equal Error Rates.
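The keywords name Equal Error Rates as an evaluation measure, and the abstract reports effectiveness per celebrity. The abstract does not say how the EER is computed, so the following is only a minimal sketch of the standard definition: the EER is the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). It assumes that each celebrity has a list of genuine (target) scores and a list of impostor scores, and that a trial is accepted when its score is at or above the threshold. The function names, celebrity labels, and scores are hypothetical toy values, not results from the paper.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (false acceptance rate, false rejection rate) at a threshold.

    Assumption: a trial is accepted when its score is >= threshold.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> float:
    """Approximate the EER by trying every observed score as a threshold.

    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    candidates = sorted(set(genuine) | set(impostor))
    # A threshold above every score covers the "reject everything" point.
    candidates.append(float("inf"))
    best_gap, best_eer = float("inf"), 1.0
    for t in candidates:
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Hypothetical per-celebrity (genuine, impostor) scores, for illustration only.
    scores = {
        "celebrity_a": ([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65]),
        "celebrity_b": ([0.95, 0.85], [0.05, 0.15]),
    }
    for name, (gen, imp) in scores.items():
        print(name, equal_error_rate(gen, imp))
    # celebrity_a 0.25
    # celebrity_b 0.0
```

Sweeping only the observed scores gives a discrete approximation. With few trials per celebrity, FAR and FRR may never be exactly equal, which is why the sketch reports their mean at the closest point.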

This work is licensed under a Creative Commons Attribution 3.0 License.
