
Aria - A VAE Model using Spectrogram for Music

Preeti R Prajapati, B. R. Janani, G. Priyanka


Music travels as sound waves and is part of every aspect of day-to-day life through entertainment, advertising, games, and more. Its composition involves pitch, rhythm, dynamics, timbre, and texture. With advances in technology, electronic devices can be used to create new and unusual types of music. Although music is easily available, a person with no knowledge of music composition cannot readily make copyright-free music customized to personal taste. We address this by deploying a Deep Learning (DL) model that generates music from inputs given by the user. We set out to build a Deep Neural Network (DNN) that would ultimately compose music by understanding music theory and create something completely new. The architecture used in our research is a Variational Autoencoder (VAE) trained on spectrograms.
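To make the closing sentence of the abstract concrete, the following is a minimal NumPy sketch of the two ingredients it names: turning audio into a magnitude spectrogram, and the sampling step and loss that a VAE is typically trained with. It is an illustration under stated assumptions, not the authors' implementation. The function names, the frame length of 256, the hop of 128, the squared-error reconstruction term, and the 440 Hz test tone are all hypothetical choices made for the example; the encoder and decoder networks themselves are omitted.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier transform magnitudes, shape (frames, frame_len // 2 + 1).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, the usual VAE reparameterization step."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO: squared reconstruction error plus KL(q(z|x) || N(0, I)).

    Squared error is an assumption; other reconstruction terms are common too.
    """
    recon = np.sum((x - x_recon) ** 2)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl

# Toy input: one second of a 440 Hz tone sampled at 8 kHz (illustrative values).
sr = 8000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
```

In a full model, an encoder network would map each spectrogram (or spectrogram patch) to `mu` and `log_var`, `reparameterize` would draw a latent vector, and a decoder would produce `x_recon`; new music would then come from decoding latent vectors sampled from the prior and inverting the resulting spectrogram to audio.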


Music, Deep Learning, VAE, Spectrogram.







This work is licensed under a Creative Commons Attribution 3.0 License.