Text-Independent Speaker Identification using Residual Feature Extraction Technique

S. Selva Nidhyananthan; R. Shantha Selva Kumari; G. Jaffino

Abstract

Mel Frequency Cepstral Coefficients (MFCC) are derived mainly to represent the spectral envelope, or formant structure, of the vocal tract system. In this paper, a new feature extraction technique, Wavelet Octave Coefficients Of Residues (WOCOR), is proposed to capture the spectro-temporal source excitation characteristics embedded in the linear predictive (LP) residual signal. The WOCOR vocal source information reflects the pitch frequency and phase contained in the residual signal. WOCOR features are called vocal source features because they depend on the source of the speech, namely the pitch generated by the vocal folds. WOCOR is generated by applying a pitch-synchronous wavelet transform to the residual signal, which captures the spectro-temporal characteristics of the excitation signal. Experimental evaluation is carried out on the TIMIT database with 630 speakers using a Gaussian Mixture Model (GMM) and a Naive Bayesian classifier. The results show that speaker identification based on GMM modeling outperforms speaker identification based on the Naive Bayesian classifier: with WOCOR feature extraction, GMM modeling achieves an increase of 6.69% in speaker identification efficiency.
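The feature pipeline summarized in the abstract (LP residual, then a pitch-synchronous wavelet transform) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Haar wavelet, the LP order of 12, four octaves with four temporal sub-groups each, an analysis window of two pitch periods, a pitch period that is already known, and the synthetic test signal. The function names (`lp_residual`, `wocor_like`, and so on) are hypothetical.

```python
import math

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (zeros assumed outside it)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns a[0..order] with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # frame is already perfectly predicted
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lp_residual(x, order=12):
    """Inverse-filter the frame: e[n] = x[n] + sum_j a[j] * x[n - j]."""
    a = levinson_durbin(autocorr(x, order), order)
    return [sum(a[j] * x[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(x))]

def haar_octaves(segment, levels=4):
    """Haar detail coefficients per octave, finest octave first (assumed wavelet)."""
    approx, octaves, s = list(segment), [], math.sqrt(2.0)
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:
            approx.append(0.0)  # zero-pad odd lengths
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / s for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / s for i in pairs]
        octaves.append(detail)
    return octaves

def wocor_like(residual_segment, levels=4, groups=4):
    """One value per (octave, temporal sub-group): the 2-norm of its coefficients."""
    feats = []
    for detail in haar_octaves(residual_segment, levels):
        size = max(1, math.ceil(len(detail) / groups))
        for g in range(groups):
            chunk = detail[g * size:(g + 1) * size]
            feats.append(math.sqrt(sum(c * c for c in chunk)))
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]  # unit-norm feature vector

def synth_voiced(n=400, period=80):
    """Toy voiced sound: impulse train through a stable two-pole resonator."""
    y = [0.0] * n
    for i in range(n):
        exc = 1.0 if i % period == 0 else 0.0
        y[i] = exc + 1.3 * (y[i - 1] if i >= 1 else 0.0) \
                   - 0.8 * (y[i - 2] if i >= 2 else 0.0)
    return y

PERIOD = 80                                    # assumed known pitch period (samples)
speech = synth_voiced(period=PERIOD)
residual = lp_residual(speech, order=12)
segment = residual[PERIOD:3 * PERIOD]          # window spanning two pitch periods
feats = wocor_like(segment)
print(len(feats), [round(f, 3) for f in feats[:4]])
```

In a full system, one such vector would be computed per pitch-synchronous window of voiced speech, and the vectors for each speaker would then be modeled by a classifier such as the GMM or Naive Bayesian classifier named in the abstract. Pitch estimation and voicing detection are omitted from this sketch.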

Keywords

Naive Bayesian Classifier, Feature Extraction, GMM, Speaker Identification, WOCOR

This work is licensed under a Creative Commons Attribution 3.0 License.