Finding Celebrity in Web Videos using Audiovisual Recognition

V.M. Gayathri, Dr. R. Nedunchelian

Abstract


The number of video clips available online is growing at a staggering pace. Conventionally, user-supplied metadata text, such as the title of the video and a set of keywords, has been the only source of indexing information for user-uploaded videos. Automated extraction of video content for unconstrained, large-scale video databases is a challenging and still unsolved problem. In this paper, we present an audiovisual celebrity recognition system for automatic tagging of unconstrained web videos. Earlier work on audiovisual person recognition depends on the assumption that the person in the video is speaking and that the features extracted from the audio and visual domains are associated with each other throughout the video. However, this assumption does not hold for unconstrained web videos. The proposed method finds the audiovisual mapping and hence improves upon the association assumption. Considering the scale of the application, all components of the system are trained automatically without any human supervision. We present results on 26,000 videos and show the effectiveness of the method on a per-celebrity basis.


Keywords


Speaker Recognition

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.