Methods for Audio Classification & Segmentation

Madhuri P. Borawake; Rameshwar Kawitkar

Abstract

This paper describes the development of an audio segmentation and classification system. Audio segmentation is an essential preprocessing step in several audio processing applications and has a significant impact on, for example, speech recognition performance. Many existing works on audio classification address the problem of classifying known homogeneous audio segments. In this work, audio recordings are divided into acoustically similar regions and classified into basic audio types such as speech, music or silence. The audio features used include real cepstral coefficients, linear predictive cepstral coefficients, Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR) and Short Term Energy (STE), combined with the aim of a 100% result. These features were extracted from audio files stored in .WAV format. The possible use of features extracted directly from MPEG audio files is also considered. Statistical methods are used to segment and classify audio signals using these features. The classification methods used include the Gaussian Mixture Model (GMM) and the k-means algorithm. The implemented system is shown to achieve an accuracy rate of more than 95% for discrete audio classification.
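As an illustration of two of the frame-level features named in the abstract, the following minimal sketch computes Short Term Energy and Zero Crossing Rate in plain Python. It is not the authors' implementation: the frame length, the synthetic test signal, the helper names and the `ENERGY_THRESHOLD` value are all assumptions chosen for the example, and the final thresholding step is a simple stand-in for the GMM and k-means classifiers used in the paper.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into frames; a ragged tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Synthetic signal (assumption): 0.1 s of silence, then 0.1 s of a
# 440 Hz tone, both sampled at 8 kHz.
SAMPLE_RATE = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(800)]
signal = silence + tone

# Non-overlapping 25 ms frames (200 samples each).
frames = frame_signal(signal, frame_len=200, hop=200)
features = [(short_term_energy(f), zero_crossing_rate(f)) for f in frames]

# Hypothetical threshold: a stand-in for a trained classifier.
ENERGY_THRESHOLD = 1e-4
labels = ["silence" if ste < ENERGY_THRESHOLD else "non-silence"
          for ste, _ in features]

for (ste, zcr), label in zip(features, labels):
    print(f"STE={ste:.4f}  ZCR={zcr:.3f}  {label}")
```

In a fuller system, the per-frame feature pairs would be passed to a statistical classifier rather than compared against a fixed threshold.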

Keywords

Audio Content Analysis, Segmentation, Classification, GMM, k-means, MFCC, ZCR, STE, MPEG

This work is licensed under a Creative Commons Attribution 3.0 License.
