
Speaker Recognition Application using MFCC GUI Concept

Bhargav Ravat, Arani Shah, Arjun Bambhaniya, Anjali Diwan

Abstract


Speech, iris, face, and fingerprint are the fundamental parameters used in designing a biometric authentication system. Such systems help verify the identity of an authenticated person. The voice is an information-rich signal. Speech-based biometric authentication has been the most successful of these, owing to the ease of acquiring voice and the uniqueness associated with it. Speech processing has emerged as one of the important application areas of digital signal processing. Research fields within speech processing include speech recognition, speaker recognition, speech synthesis, and speech coding. The objective of automatic speaker recognition is to extract, characterize, and recognize information about the speaker's identity. The mel frequency cepstral coefficient (MFCC) is one of the most important features used across many kinds of speech applications. The fundamental aim of this paper is to design a voice-operated safety box using MFCC and a GUI in MATLAB. The system developed is able to recognize a specific user by extracting various characteristics of the speech signal. It was tested on speech samples from male and female speakers of various age groups. The accuracy obtained with MFCC is much higher than that of other techniques such as Dynamic Time Warping and Perceptual Linear Prediction. Another advantage is ease of use, which is provided by the GUI.

Keywords


Mel Frequency Cepstral Coefficient (MFCC), GUI, Speech Processing, FFT, DCT, Hamming Window, Feature Extraction.
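The keywords name the usual stages of MFCC feature extraction: Hamming window, FFT, mel filtering, and DCT. As a rough illustration of how those stages fit together for a single frame, here is a minimal sketch in Python using only the standard library. It is not the authors' MATLAB implementation. The frame length, sample rate, number of mel filters, and number of coefficients are assumed values chosen for illustration, and a naive DFT stands in for the FFT to keep the code short. Pre-emphasis, framing of a longer recording, and the speaker-matching step are omitted.

```python
import math

def hamming(n):
    # Hamming window of length n
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    # Naive O(N^2) DFT standing in for the FFT; returns bins 0..N/2
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters with centers equally spaced on the mel scale
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    # Unnormalized DCT-II, keeping the first n_coeffs outputs
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    # n_filters and n_coeffs are assumed defaults, not values from the paper
    windowed = [x * w for x, w in zip(frame, hamming(len(frame)))]
    spec = power_spectrum(windowed)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # small floor avoids log(0)
    return dct2(log_energies, n_coeffs)

if __name__ == "__main__":
    # Synthetic 440 Hz tone as a stand-in for one frame of recorded speech
    rate, size = 8000, 256
    tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(size)]
    print(mfcc_frame(tone, rate))
```

In a speaker recognition setting, vectors like the one returned by `mfcc_frame` would be computed for every frame of an utterance and then compared against stored templates for each enrolled user. That comparison step is outside the scope of this sketch.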

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.