Speech Recognition in Noisy Conditions using Radon Transform and Discrete Cosine Transform from the Features Derived from Gammatone Filter Bank (GTFB)
Abstract
This paper presents a new feature extraction technique for speech recognition based on a Gammatone Filter Bank (GTFB), the Radon Transform (RT), and the Discrete Cosine Transform (DCT). In the proposed scheme, speech-specific features are extracted by applying image processing techniques to the patterns obtained by passing the speech signal through the Gammatone Filter Bank. Radon projections at twenty-six orientations capture the acoustic characteristics of the Gammatone Filter Bank output. Applying the DCT to the Radon projections yields low-dimensional feature vectors. The technique is computationally efficient, robust to session variations, and insensitive to additive noise. The performance of the proposed algorithm is evaluated in the presence of additive white Gaussian noise at SNRs from 30 dB down to -5 dB on the Texas Instruments-46 (TI-46) speech database. In noisy environments, the proposed algorithm improves the performance of the speech recognition system compared with popular existing algorithms such as Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), and Perceptual Linear Prediction (PLP).
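The processing chain summarised in the abstract (time-frequency pattern, Radon projections, DCT, plus noise added at a chosen SNR) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the angle set, the number of DCT coefficients kept, or whether the DCT is one- or two-dimensional, so the uniform angle spacing over 0 to 180 degrees, the 1-D orthonormal DCT-II, `n_coeffs=4`, and all function names below are assumptions made for illustration. The gammatone filter bank itself is not implemented; a small synthetic matrix stands in for its output.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = rng or random.Random(0)
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def radon_projection(image, theta_deg):
    """Pixel-driven Radon projection of a 2-D list at one angle (nearest-bin sums)."""
    rows, cols = len(image), len(image[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    half = int(math.ceil(math.hypot(rows, cols) / 2.0))
    bins = [0.0] * (2 * half + 1)
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    for r in range(rows):
        for q in range(cols):
            # Signed distance of this pixel from the image centre along the angle.
            offset = (q - cx) * c + (r - cy) * s
            bins[int(round(offset)) + half] += image[r][q]
    return bins


def dct_ii(x, n_coeffs):
    """First n_coeffs coefficients of the orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n_coeffs):
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out


def radon_dct_features(image, n_angles=26, n_coeffs=4):
    """Concatenate truncated DCTs of Radon projections (assumed angle spacing)."""
    features = []
    for a in range(n_angles):
        theta = 180.0 * a / n_angles  # assumption: uniform spacing over [0, 180)
        features.extend(dct_ii(radon_projection(image, theta), n_coeffs))
    return features


# Synthetic stand-in for a gammatone time-frequency pattern (8 channels x 16 frames).
toy = [[math.sin(0.3 * r) * math.cos(0.2 * q) + 1.0 for q in range(16)]
       for r in range(8)]
features = radon_dct_features(toy)
print(len(features))  # 26 angles x 4 coefficients = 104
```

Keeping only the low-order DCT coefficients of each projection is what makes the feature vector low-dimensional; the number kept trades compactness against detail. In an experiment of the kind described, `add_awgn` would be applied to the waveform before the filter bank, once per SNR level.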
This work is licensed under a Creative Commons Attribution 3.0 License.