Speech Recognition in Noisy Conditions using Radon Transform and Discrete Cosine Transform from the Features Derived from Gammatone Filter Bank (GTFB)

Yogesh S. Angal, Mangesh S. Deshpande, Raghunath S. Holambe, Rajan H. Chile

Abstract


This paper presents a new feature extraction technique for speech recognition based on a Gammatone Filter Bank (GTFB), the Radon Transform (RT), and the Discrete Cosine Transform (DCT). In the proposed scheme, speech-specific features are extracted by applying image processing techniques to the patterns obtained by passing the speech signal through the Gammatone Filter Bank. Radon projections of these patterns are computed for twenty-six orientations, capturing the acoustic characteristics of the Gammatone Filter Bank output. Applying the DCT to the Radon projections yields low-dimensional feature vectors. The technique is computationally efficient, robust to session variations, and insensitive to additive noise. Its performance is evaluated on the Texas Instruments-46 (TI-46) speech database in the presence of additive white Gaussian noise at SNRs from 30 dB down to -5 dB. In noisy environments, the proposed algorithm improves recognition performance compared with popular existing algorithms such as Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), and Perceptual Linear Prediction (PLP).
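
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the filter bank configuration, the Radon discretization, or the number of DCT coefficients retained. The sketch therefore assumes a random 32 x 40 array as a stand-in for the gammatone filter bank output, a simple nearest-bin Radon approximation, and 8 retained DCT coefficients per projection. Only the twenty-six orientations and the additive white Gaussian noise model come from the abstract; all function names are hypothetical.

```python
import numpy as np


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)


def radon_projections(image, n_angles=26):
    """Approximate Radon projections by binning pixels along each orientation."""
    rows, cols = image.shape
    y, x = np.mgrid[0:rows, 0:cols]
    x = x - (cols - 1) / 2.0  # centre the coordinate system on the image
    y = y - (rows - 1) / 2.0
    n_bins = int(np.ceil(np.hypot(rows, cols))) + 1
    offset = (n_bins - 1) / 2.0
    angles = np.arange(n_angles) * np.pi / n_angles  # evenly spaced over [0, pi)
    sinogram = np.zeros((n_angles, n_bins))
    for i, theta in enumerate(angles):
        # Signed position of each pixel along the projection axis.
        s = x * np.cos(theta) + y * np.sin(theta)
        idx = np.rint(s + offset).astype(int)
        sinogram[i] = np.bincount(idx.ravel(), weights=image.ravel(),
                                  minlength=n_bins)
    return sinogram


def dct2_rows(x):
    """Orthonormal DCT-II of each row, built from an explicit cosine basis."""
    n = x.shape[1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return x @ basis.T


def radon_dct_features(image, n_angles=26, n_coeffs=8):
    """Keep the first n_coeffs DCT coefficients of each Radon projection."""
    sinogram = radon_projections(image, n_angles)
    coeffs = dct2_rows(sinogram)
    return coeffs[:, :n_coeffs].ravel()


rng = np.random.default_rng(0)
# Assumed stand-in for a gammatone filter bank output (channels x frames).
tf_image = rng.random((32, 40))
features = radon_dct_features(tf_image)
print(features.shape)  # (208,) = 26 orientations x 8 coefficients
```

Because the DCT concentrates most of a smooth projection's energy in its low-order coefficients, truncating each projection to a few coefficients is what reduces the feature dimension. To mimic the noisy test conditions, `add_awgn` would be applied to the speech waveform before the filter bank, at SNR values between 30 dB and -5 dB.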


Keywords


Speech Recognition, Gammatone Filters, Feature Extraction, Radon Transform, Discrete Cosine Transform.



Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.