A Survey on Spoken Content Retrieval

Vaishnavi S Ramu; Sanjay Aradhyamath

Abstract

Spoken content retrieval is the task of indexing and retrieving spoken material directly from audio recordings rather than from accompanying text descriptions. The conventional approach first converts the audio into text with automatic speech recognition (ASR). One benefit of speech recognition systems is that they reduce the number of misspelt keywords that typists may produce when typing. Once the spoken content has been transcribed into text or into a lattice (a compact graph of alternative recognition hypotheses), a text retrieval engine analyses the ASR output to find relevant information. This cascade architecture is well suited to cases where ASR accuracy is sufficiently high. This paper reviews the significant technological contributions of this line of research: its theories, concepts, approaches, and successes. It covers two principal directions: 1) Modified ASR for Retrieval: ASR is still cascaded with text retrieval, but the ASR is modified or optimised for retrieving spoken content; 2) Interactive Retrieval and Efficient Presentation of Retrieved Objects: an interactive retrieval technique that incorporates user interactions can yield better retrieval results and a better user experience.
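
The cascade described above can be sketched in a few lines. The following Python is a minimal illustration under stated assumptions, not the authors' implementation: the confusion-network input format, the function names (`expected_counts`, `build_index`, `search`, `feedback`), the log-scaled idf weighting, and the `beta` parameter are all hypothetical choices made for this example. Indexing uses expected term counts (sums of posterior probabilities), one common way to exploit lattice-style ASR output instead of a single best transcript. The `feedback` step is a Rocchio-style query expansion, swapped in here only to illustrate how a user interaction can refine a query.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical ASR output format (an assumption, not taken from the survey):
# a confusion network, i.e. a list of time slots, each holding the competing
# word hypotheses for that slot together with their posterior probabilities.
ConfusionNetwork = List[List[Tuple[str, float]]]


def expected_counts(cn: ConfusionNetwork) -> Dict[str, float]:
    """Expected term frequency: sum of a word's posteriors over all slots."""
    counts: Dict[str, float] = defaultdict(float)
    for slot in cn:
        for word, posterior in slot:
            counts[word] += posterior
    return dict(counts)


def build_index(docs: Dict[str, ConfusionNetwork]) -> Dict[str, Dict[str, float]]:
    """Inverted index mapping each term to {doc_id: expected count}."""
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for doc_id, cn in docs.items():
        for word, count in expected_counts(cn).items():
            index[word][doc_id] = count
    return dict(index)


def search(query: Dict[str, float], index: Dict[str, Dict[str, float]],
           n_docs: int) -> List[Tuple[str, float]]:
    """Rank documents by the sum of query weight * expected count * idf."""
    scores: Dict[str, float] = defaultdict(float)
    for term, weight in query.items():
        postings = index.get(term)
        if not postings:
            continue  # the recogniser never hypothesised this term
        idf = math.log(1.0 + n_docs / len(postings))
        for doc_id, count in postings.items():
            scores[doc_id] += weight * count * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def feedback(query: Dict[str, float], relevant_cn: ConfusionNetwork,
             beta: float = 0.5) -> Dict[str, float]:
    """Rocchio-style expansion from one item the user marked as relevant."""
    counts = expected_counts(relevant_cn)
    total = sum(counts.values())
    expanded = dict(query)
    if total == 0:
        return expanded  # nothing to learn from an empty document
    for term, count in counts.items():
        # Add the normalised expected counts of the relevant item, scaled by beta.
        expanded[term] = expanded.get(term, 0.0) + beta * count / total
    return expanded


if __name__ == "__main__":
    docs = {
        "talk_a": [[("speech", 0.9), ("peach", 0.1)],
                   [("retrieval", 0.6), ("removal", 0.4)]],
        "talk_b": [[("peach", 0.7), ("speech", 0.3)], [("pie", 1.0)]],
        "talk_c": [[("weather", 1.0)], [("report", 1.0)]],
    }
    index = build_index(docs)
    print(search({"speech": 1.0}, index, len(docs)))
    # Simulated user interaction: the user marks talk_a as relevant.
    expanded = feedback({"speech": 1.0}, docs["talk_a"])
    print(search(expanded, index, len(docs)))
```

In this toy example a 1-best transcript of `talk_b` would contain "peach" rather than "speech", so a text engine working on 1-best output would miss it entirely; indexing expected counts keeps it retrievable, only with a lower score than `talk_a`.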

Keywords

Automatic Speech Recognition, Interactive Retrieval, Speech Recognition System, Spoken Content Retrieval.

This work is licensed under a Creative Commons Attribution 3.0 License.