Music Generation using Text Generation

In this post, we’ll look at Josh Bloom’s approach to music generation; a step-by-step description can be found here.

Josh uses the Humdrum `**kern` format, developed at Ohio State University, to represent music in a character-based format, where each note is denoted by a group of symbols. For example, a quarter-note middle C is represented as `4c`.
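To make this concrete, here is a minimal sketch of what a `**kern` spine can look like (a hypothetical four-note example, not taken from Josh’s data): the spine opens with `**kern`, each line holds one duration-plus-pitch token, `=1` marks a barline, and `*-` terminates the spine.

```
**kern
*M4/4
4c
4d
4e
4f
=1
*-
```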

Using the `**kern` format, it is now possible to treat the music as text, allowing the use of a text-generating neural network to generate more music. In his blog, Josh uses Andrej Karpathy’s character-level recurrent neural network (char-rnn).
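The key idea is that once the score is plain text, the usual character-level pipeline applies. The sketch below (illustrative Python, not Karpathy’s actual code) shows the preprocessing step: build a character vocabulary from the `**kern` text and encode it as integers, which is what a char-rnn trains on.

```python
# Illustrative character-level preprocessing for **kern text.
# The model would be trained to predict the next integer in `encoded`;
# sampled integers are decoded back to characters to produce new music.

kern_text = "**kern\n4c\n4d\n4e\n4f\n=1\n*-\n"  # tiny hypothetical excerpt

# Build a character vocabulary and the two lookup tables.
chars = sorted(set(kern_text))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}

# Encode the text as a sequence of integers (the model's input),
# then decode it back to verify the mapping is lossless.
encoded = [char_to_idx[ch] for ch in kern_text]
decoded = "".join(idx_to_char[i] for i in encoded)
assert decoded == kern_text
```

Because the representation is just characters, the network needs no knowledge of music theory; any structure in the output (key, rhythm) is learned purely from patterns in the training text.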

On a machine with an Intel i5-7500 processor and an NVIDIA GeForce GTX 1070 GPU, training for 50 epochs took me about 30 minutes. After training, I got the following result when the training data came from Mozart: mozart.mp3
And from Beethoven: beethoven.mp3

The music has good rhythm, and the piece stays mostly in the correct key. However, there is no clear melody, and the left hand has far too many notes for a classical piece.

Written on July 12, 2018