
Speech Recognition of Isolated Words in Noisy Conditions Using Radon Transform and Discrete Cosine Transform Based Features Derived from Speech Spectrogram

Yogesh S. Angal, Pawan K. Ajmera, Raghunath S. Holambe, Rajan H. Chile


This paper presents a new feature extraction technique for speech recognition using the Radon Transform (RT) and the Discrete Cosine Transform (DCT). A spectrogram is a time-varying spectrum (forming an image) that shows how the spectral density of a signal varies with time. In the proposed scheme, speech-specific features are extracted by applying image processing techniques to the patterns available in the spectrogram. The Radon transform is used to derive effective acoustic features from the speech spectrogram. The proposed technique computes Radon projections for nine orientations and captures the acoustic characteristics of the speech spectrogram. Applying the DCT to the Radon projections yields low-dimensional feature vectors. The technique is computationally efficient, speaker-independent, robust to session variations, and insensitive to additive noise. Radon projections for seven orientations capture the acoustic characteristics of the spectrogram. The performance of the proposed algorithm has been evaluated in the presence of additive white Gaussian noise at signal-to-noise ratios (SNR) from 30 dB to -5 dB on the Texas Instruments-46 (TI-46) speech database. In noisy environments, the proposed technique performs much better than existing popular algorithms.
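The pipeline described in the abstract (spectrogram, Radon projections at several orientations, DCT, truncation to a short feature vector) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The frame length, hop size, number of projection bins, number of retained DCT coefficients, and the evenly spaced angle set are all assumptions, since the abstract does not state them. The Radon projection is a coarse nearest-bin approximation, the DCT-II is written out explicitly to avoid extra dependencies, and a synthetic chirp stands in for a TI-46 utterance. The function names are hypothetical.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise scaled to give the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT image: rows are frequency bins, columns are frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T

def radon_projection(image, theta_deg, n_bins):
    """Approximate Radon projection: accumulate each pixel into the bin
    given by its signed distance along the direction theta_deg."""
    rows, cols = image.shape
    y, x = np.mgrid[0:rows, 0:cols]
    x = x - (cols - 1) / 2.0
    y = y - (rows - 1) / 2.0
    theta = np.deg2rad(theta_deg)
    s = x * np.cos(theta) + y * np.sin(theta)
    half_diag = 0.5 * np.hypot(rows, cols)
    idx = np.round((s + half_diag) / (2.0 * half_diag) * (n_bins - 1)).astype(int)
    idx = np.clip(idx, 0, n_bins - 1)
    return np.bincount(idx.ravel(), weights=image.ravel(), minlength=n_bins)

def dct_ii(x):
    """Unnormalized DCT-II computed from its cosine basis matrix."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (m + 0.5) * k / n) @ x

def radon_dct_features(signal, angles_deg, n_bins=64, n_coeffs=8):
    """Concatenate the first n_coeffs DCT coefficients of each projection."""
    spec = spectrogram(signal)
    parts = [dct_ii(radon_projection(spec, a, n_bins))[:n_coeffs]
             for a in angles_deg]
    return np.concatenate(parts)

# Synthetic stand-in for an utterance (assumption: 1 s sampled at 8 kHz).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2.0 * np.pi * (300.0 + 400.0 * t) * t)
noisy = add_awgn(clean, snr_db=10.0, rng=rng)

# Illustrative angle set only; the orientations used in the paper are not given here.
angles = np.linspace(0.0, 180.0, num=7, endpoint=False)
features = radon_dct_features(noisy, angles)
```

Keeping only the leading DCT coefficients of each projection is what makes the feature vector low-dimensional: with the assumed settings, seven projections of 64 bins each reduce to 7 × 8 = 56 values.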


Speech Recognition, Spectrogram, Feature Extraction, Radon Transform, Discrete Cosine Transform.







This work is licensed under a Creative Commons Attribution 3.0 License.