Arabic Speech Recognition by MFCC and Bionic Wavelet Transform using a Multi Layer Perceptron for Voice Control
Abstract
In this paper, we propose a new technique for single-speaker Arabic speech recognition over a reduced vocabulary. The first step uses our own speech database of Arabic words, recorded by a single speaker for a voice-command application. The second step extracts features from those recorded words, and the third step classifies the extracted features. Feature extraction is performed by first computing the Mel Frequency Cepstral Coefficients (MFCCs) of each recorded word, then applying the Bionic Wavelet Transform (BWT) to the vector obtained by concatenating those MFCCs. The resulting bionic wavelet coefficients are in turn concatenated to form one input of a Multi-Layer Perceptron (MLP) used for feature classification. In the MLP training and test phases, we used eleven Arabic words, each repeated twenty-five times by the same speaker. A simulation program used to test the performance of the proposed technique showed a classification rate of 99.39%.
This work is licensed under a Creative Commons Attribution 3.0 License.