Deep Neural Network for Music Source Separation in TensorFlow

This work is from Jeju Machine Learning Camp 2017

  • Co-author: Mark Kwon (hjkwon0609@gmail.com)
  • Final work will be done in Jeju ML Camp. Please check here.
  • Take a look at the demo!

Intro

Deep neural networks have recently been applied in many fields and have improved results on a wide range of tasks. Applying them to MIR (Music Information Retrieval) tasks has likewise brought substantial performance improvements. Music source separation is the task of separating the individual sources in a recording, here the singing voice from the accompaniment in music such as pop songs. In this project, I implement a deep neural network model for music source separation in TensorFlow.

Implementations

  • I used Po-Sen Huang's deep recurrent neural network (RNN) model [2, 3].
    • 3 RNN layers + 2 dense layers + 2 time-frequency masking layers
  • For training, I used the iKala dataset introduced in [1] together with the publicly available MIR-1K dataset.
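
To make the time-frequency masking step more concrete, here is a minimal NumPy sketch of the soft mask described in [2, 3]. It is not the code from this repository: the function name, argument names, and the small `eps` constant are assumptions made for illustration. The idea is that the two dense layers each produce a magnitude estimate (one for the voice, one for the accompaniment), and the masking layers rescale those estimates so that together they add up to the mixture spectrogram.

```python
import numpy as np


def soft_time_frequency_mask(est_vocals, est_music, mixture_mag, eps=1e-8):
    """Apply a soft time-frequency mask to a mixture magnitude spectrogram.

    est_vocals, est_music: raw network estimates for the two sources.
    mixture_mag: magnitude spectrogram of the input mixture.
    All arrays share one shape, e.g. (frames, frequency_bins).
    eps is a hypothetical constant added here to avoid division by zero.
    """
    abs_vocals = np.abs(est_vocals)
    abs_music = np.abs(est_music)
    denom = abs_vocals + abs_music + eps
    # Each source receives its share of the mixture energy in every bin.
    vocals = abs_vocals / denom * mixture_mag
    music = abs_music / denom * mixture_mag
    return vocals, music


# Toy data standing in for real spectrograms and network outputs.
rng = np.random.RandomState(0)
mixture_mag = np.abs(rng.randn(4, 513))
est_vocals = rng.randn(4, 513)
est_music = rng.randn(4, 513)

vocals, music = soft_time_frequency_mask(est_vocals, est_music, mixture_mag)
```

Because the two masks sum to (almost exactly) one in every bin, the masked outputs are constrained to add up to the mixture, which is the motivation given in [2] for optimizing the masks jointly with the network.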

Requirements

  • Numpy >= 1.3.0
  • TensorFlow == 1.2
  • librosa == 0.5.1

Usage

  • Configuration
    • config.py: set dataset path appropriately.
  • Training
    • python train.py
    • Check the loss graph in TensorBoard.
  • Evaluation
    • python eval.py
    • Check the results in TensorBoard (audio tab).

References

  1. Zhe-Cheng Fan, Tak-Shing T. Chan, Yi-Hsuan Yang, and Jyh-Shing R. Jang, "Music Signal Processing Using Vector Product Neural Networks", Proc. of the First Int. Workshop on Deep Learning and Music joint with IJCNN, May 2017.
  2. P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, "Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2136–2147, Dec. 2015.
  3. P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, "Singing-Voice Separation From Monaural Recordings Using Deep Recurrent Neural Networks", International Society for Music Information Retrieval Conference (ISMIR), 2014.
  4. Tohru Nitta, "A backpropagation algorithm for neural networks based on 3D vector product. In Proc. IJCNN", Proc. of IJCAI, 2007.