
Human Computer Interactions using Adaptive Speech-Text-Speech Conversions

B. Kanisha, Dr. G. Balakrishanan


In this paper, we present an adaptive method for speech-text-speech (STS) conversion and discuss issues relevant to the development of advanced human-computer interaction. The existing STS method is modified by replacing its speech recognition stage with an adaptive one. The new system helps visually impaired people by compensating for noise in the input speech, using the recursive least squares method of dynamic time warping, and by adjusting the parameters of the HMM speech model. Input speech consisting of a plurality of speech frames can be adjusted using the refinement normalisation method. These issues are especially important for robust speech recognition. This recognition method can be combined with the unit selection method of speech synthesis, leading to adaptive human-computer interaction.
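As a rough illustration of the dynamic time warping step named in the abstract, the following minimal sketch computes the classic DTW alignment cost between two one-dimensional feature sequences. It is not the authors' implementation: the function name `dtw_distance`, the absolute-difference local cost, and the scalar features are assumptions made for the example, and the recursive least squares noise compensation is not shown.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences (illustrative sketch only)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: absolute difference between scalar features.
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


if __name__ == "__main__":
    # The same contour spoken at two different speeds aligns at zero cost.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

In a recogniser the scalar samples would typically be replaced by per-frame feature vectors with a vector distance as the local cost; that substitution is likewise an assumption rather than something the abstract specifies.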


Speech Recognition, Speech Synthesis, STS Method







This work is licensed under a Creative Commons Attribution 3.0 License.