Machine Learning. Artificial Intelligence.
Each button generates an original music file in the style of that game. For details, see below.
Unlike other types of music files, MIDI files don't contain a representation of the waveforms of the sounds, but the notes that an instrument would play. Given a set of MIDI files with only piano tracks, the model takes in 100 notes and tries to guess what the next note is. After countless iterations of guessing and checking, the model starts doing a fair job of guessing what comes next. To generate its own music, you can start the model off with some notes, let it guess the next note, and feed that note back into the model, starting a loop that lets the model go off and create its own score.
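As a rough illustration, here is what that generation loop might look like in Python. The trained `model`, the 100-note window, and the integer encoding of notes are assumptions drawn from the description above, not the project's actual code.

```python
import numpy as np

SEQUENCE_LENGTH = 100  # the model sees 100 notes of context

def generate_notes(model, seed_notes, n_vocab, length=500):
    """Autoregressive generation: predict a note, append it, and
    slide the 100-note window forward.

    `model` (a trained Keras model), `seed_notes` (at least 100
    integer-encoded notes), and `n_vocab` (the number of distinct
    notes) are assumed to come from the training pipeline."""
    pattern = list(seed_notes[-SEQUENCE_LENGTH:])
    output = []
    for _ in range(length):
        x = np.reshape(pattern, (1, SEQUENCE_LENGTH, 1)) / float(n_vocab)
        prediction = model.predict(x, verbose=0)
        index = int(np.argmax(prediction))  # most likely next note
        output.append(index)
        pattern.append(index)   # feed the guess back in...
        pattern = pattern[1:]   # ...and drop the oldest note
    return output
```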
The model is a type of neural network. It takes inputs, in our case a series of notes, and performs layers of transformations until it comes out with a single output note. The parameters of the transformations start off randomized, and if the output is incorrect, the parameters are adjusted to try to get the correct output. These adjustments are how the model is trained.
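To make "layers of transformations" concrete, here is a toy forward pass in NumPy with randomly initialized parameters. The layer sizes and the note vocabulary are made up for illustration; this is not the project's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers of transformations with randomly initialized parameters:
# 100 input notes in, a probability for each possible next note out.
n_vocab = 358  # illustrative: number of distinct notes in the training set
W1 = rng.normal(size=(100, 256)); b1 = np.zeros(256)
W2 = rng.normal(size=(256, n_vocab)); b2 = np.zeros(n_vocab)

def forward(notes):  # notes: array of 100 normalized note values
    hidden = np.tanh(notes @ W1 + b1)   # first transformation
    logits = hidden @ W2 + b2           # second transformation
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()          # probability of each next note

next_note = int(np.argmax(forward(rng.random(100))))
```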
The process by which the parameters are adjusted is called backpropagation. In backpropagation, the error between the desired output and the actual output is used to change the parameters of the layer just before the output. How much each parameter changes is calculated by gradient descent: each parameter takes a small step against the gradient of the error. The resulting change is then treated as the error of that layer and is used in the same way to adjust the layer before it. The changes move backward through the layers all the way to the input; in other words, the errors propagate back through the whole model.
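Here is a minimal, self-contained example of a gradient-descent update for a single parameter. It is a sketch of the idea, not the training code used here; Keras handles all of this automatically.

```python
import numpy as np

# One gradient-descent loop for a single linear layer, y = w * x.
# The error is propagated back as dE/dw, and w moves against the gradient.
x, target = 2.0, 10.0
w = np.random.default_rng(1).normal()  # parameter starts off randomized
learning_rate = 0.01

for step in range(100):
    y = w * x                  # forward pass
    error = y - target         # difference from the desired output
    grad = error * x           # backpropagated gradient dE/dw (E = 0.5 * error**2)
    w -= learning_rate * grad  # gradient descent update
```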
Since we are dealing with music, the notes played in the past have a strong influence on the next note played. This makes it necessary for the model to have some kind of memory. This model has memory in the form of Long Short-Term Memory (LSTM) layers. During training, an LSTM layer tunes a forget gate and a remember gate as it learns which past notes are most relevant to the future. In this way longer-term patterns in the music can be remembered, rather than just what is in the immediate past.
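A typical Keras stack for this kind of next-note prediction looks something like the following. The layer sizes and vocabulary size are illustrative guesses, not the exact network described above.

```python
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

n_vocab = 358  # illustrative: number of distinct notes in the training set

model = Sequential([
    # Each input is a window of 100 notes; the LSTM's gates learn
    # which past notes to remember and which to forget.
    LSTM(512, input_shape=(100, 1), return_sequences=True),
    Dropout(0.3),
    LSTM(512),
    Dense(256, activation="relu"),
    # One probability per possible next note.
    Dense(n_vocab, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
```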
There are several ways the model could be further developed.
There is no sense of the beginning or end of a song. The songs the model is trained on are treated as if they were one long song it is trying to learn, instead of separate tracks. So when a song is generated, there is a sharp cutoff rather than a planned ending. A way to deal with this might be to encode the start and end of a song as a type of note itself, as sketched below.
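For example, a hypothetical encoding could reserve two extra "notes" in the vocabulary for song boundaries. The `note_to_int` mapping is assumed to exist in the training pipeline and to include the two new tokens.

```python
# Hypothetical encoding: reserve two extra "notes" for song boundaries.
START, END = "<start>", "<end>"

def encode_songs(songs, note_to_int):
    """Concatenate songs into one training sequence, but mark the
    boundaries so the model can learn where pieces begin and end."""
    sequence = []
    for song in songs:  # each song is a list of note names
        sequence.append(note_to_int[START])
        sequence.extend(note_to_int[n] for n in song)
        sequence.append(note_to_int[END])
    return sequence
```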
The generator only makes piano music. Generating a full orchestra would require the right training data and a larger network to handle so many instruments.
The models are currently trained on the music of a single game. They might perform better if the music were instead categorized by use case, e.g. battle music or menu-screen music.
A demonstration of OpenAI's GPT-2 text generator. Given a starting prompt, the generator will create about a hundred more words of text.
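The demo's implementation isn't described here, but a minimal way to reproduce the behavior, assuming the Hugging Face transformers library and the original `gpt2` weights, would be:

```python
from transformers import pipeline

# Load GPT-2 and continue the prompt with roughly a hundred more tokens.
generator = pipeline("text-generation", model="gpt2")
prompt = "In a hole in the ground there lived"
result = generator(prompt, max_new_tokens=100, do_sample=True)
print(result[0]["generated_text"])
```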