
A New Speaker Recognition System with Combined Feature Extraction Techniques in Continuous Speech

Pooja Jaiswal, Praveen Chouksey, Rohit Miri

Abstract


Speech technology can be broken down into speech synthesis, speaker recognition and speech recognition. Synthesis refers to the process of automatically producing speech using a computer. The other two main areas both involve speech as input: whereas the objective of speaker recognition is to identify an individual based on his or her voice, speech recognition attempts to automatically understand the linguistic content of an utterance. Humans communicate information via speech easily and efficiently despite many complications, including background noise, disfluencies of natural speech (stammers, filled pauses, false starts, etc.) and the inherent variability of human speech. The latter represents the greatest challenge for automatic speech recognition, and will be examined from the viewpoint of the following three areas of linguistic study:

1. Phonetics: Articulation and Acoustics

2. Phonology: Phonemes, Phonotactics and Coarticulation

3. Prosody: Stress, Pitch and Rhythm

A speaker verification system is composed of two distinct phases, a training phase and a testing phase. Each of them can be seen as a sequence of independent modules. The first and foremost module is the feature extraction module, which extracts speaker-specific information from the speech signal. This is the foundation module on which the performance of the entire system relies. The next module is the speaker modeling module, which represents the speaker's vocal and acoustic characteristics. The choice of model depends mainly on the type of speech to be used, the desired performance, the ease of training and updating, and storage and computation considerations. The final module makes a decision based on the training and testing phases. The system outputs a binary decision: either accept or reject the identity claim of the speaker. Success in speaker verification depends on extracting and modeling the speaker-dependent characteristics of the speech signal, which can effectively distinguish one speaker from another.


Keywords


Speech Technology, Stream of Sound, Desired Performance, Efficient Communication, Graphical User Interface





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.