Project: AI Music Generator
To create an automatic AI music generator, we first need to understand music theory.
In this project, we will build a model that generates piano music from the dataset that we feed in.
Let's review some basic theory about the piano:
- A piano octave spans 12 distinct notes (the 7 natural notes plus the 5 accidental notes).
- The natural notes are: C, D, E, F, G, A, B
- The accidental notes are: C#, D#, F#, G#, A#
- One such set of notes is called an octave, and a piano has n octaves depending upon its size (number of keys).
- The left end of the piano produces low-frequency (low-pitched) notes, whereas the right end produces high-frequency (high-pitched) notes.
- A chord is a group of notes played simultaneously.
Note: A piece of music is a sequence of notes and chords played one after another. How melodious it sounds depends on the order and combination of the notes and chords being played.
Data
The dataset consists of music files in MIDI format. MIDI stands for Musical Instrument Digital Interface.
Download the dataset from below:
Imports
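The import cell is not reproduced here; a plausible set of imports for this project would be the following (the exact list is an assumption based on the libraries used later):

import glob
import numpy as np
from music21 import converter, instrument, note, chord, stream
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation
from keras.utils import to_categorical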
TASK
So, the idea is that we'll read all the MIDI files and extract their components (notes and chords) so that we can use them later for preprocessing and training the model.
1. Reading a midi file:
midi = converter.parse("midi_songs/EyesOnMePiano.mid")
music21.converter contains tools for loading music from various file formats, whether from disk, from the web, or from text, into music21.stream.Score objects (or other similar stream objects).
The most powerful and easy-to-use tool is the parse() function. Simply provide a filename, URL, or text string and, if the format is supported, a Score will be returned.
2. Playing the parsed file:
midi.show('midi')
This method loads the music and provides functionality to play and stop the music.
3. Displaying the song in the text format:
midi.show('text')
Viewing the song in text format, if we look closely, we will notice that it is structured this way:
- Container
- Sub-Container
- notes
- notes
- chords
- Sub-Container
- notes
- chords
- Sub-Container
- chords
- notes
- notes
This means the main container contains multiple sub-containers, within which the notes and chords are present separately. (Note: here the container is not a Python list but a music21 Score.)
We can interpret this as a list of lists containing the notes and the chords.
4. Flattening the object and checking the length:
elements_to_parse = midi.flat.notes
len(elements_to_parse)
So, we'll flatten the structure so that all the elements are present within a single list, because what ultimately matters are the notes and the chords themselves.
5. Checking the timing of the notes and the chords played:
for e in elements_to_parse:
    print(e, e.offset)
We can read the offset property on the elements to check the time at which they were played.
Some examples of the output are:
By now, it is clear that the iterator elements_to_parse contains only the notes and the chords. Now it's our turn to decide what to do with them.
6. Storing the Pitch:
Okay, so let me simplify: a note is a single pitch, and if we capture its pitch we can use it later when forming new music. Similarly, a chord consists of multiple notes, so we can extract the notes in the chord and store their pitches.
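The original extraction code is not reproduced here; a minimal sketch of what it could look like follows (assuming the parsed file from step 1 and the '+'-joined chord encoding described below):

from music21 import note, chord

notes = []
for element in elements_to_parse:
    # A Note: store its pitch as a string
    if isinstance(element, note.Note):
        notes.append(str(element.pitch))
    # A Chord: join its normalOrder values with '+'
    elif isinstance(element, chord.Chord):
        notes.append('+'.join(str(n) for n in element.normalOrder))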
*In the code above:*
- Calling .pitch on a Note element returns the pitch of the note (as a Pitch object, along with its type).
- Therefore, we convert it to a string so that the later mapping to integers is efficient when feeding it to the LSTM.
- In the case of chords, a chord may consist of multiple notes (or even a single note), so we concatenate its values using '+'.
- normalOrder returns the list of values that corresponds to the chord. E.g., for the chord A6+C4, normalOrder may return '1+6' after joining.
- In this scenario, we can seek help from two modules of the library, namely note and chord.
- We check which of these classes each element is an instance of, and operate on it as specified above.
Now, we need to preprocess all the music files and extract their notes and chords. We use glob for this purpose and also print the name of the file currently being parsed.
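A minimal sketch of that preprocessing loop (the midi_songs/ path is taken from step 1, and the per-element logic is the same instance check shown above):

import glob
from music21 import converter, note, chord

notes = []
for file in glob.glob("midi_songs/*.mid"):
    print("Parsing:", file)
    midi = converter.parse(file)
    # Flatten and keep only notes and chords, as in step 4
    for element in midi.flat.notes:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            notes.append('+'.join(str(n) for n in element.normalOrder))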
At this point, our list contains 60498 elements, which are either notes or chords. But not all of them are unique, so we can find the unique elements by typecasting the list into a set and inspecting a few of them:
Prepare Sequential Data for LSTM
Steps to consider in this stage:
- Let's take 100 timesteps of input and then predict one new output.
- So we'll set sequence_length to 100.
- As we know, our LSTM is not going to work on string inputs.
- We need to map these strings to integers.
- So we'll create a dictionary and map each unique element to an integer value.
Now,
- Since our current input values range from 0 to 359,
- we'll normalize them so that they range between 0 and 1 (see the sketch after this list).
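Putting those steps together, a minimal sketch of the sequence preparation (it assumes the notes list built above; names such as note_to_int and n_vocab are assumptions, since the original code cell is not shown):

import numpy as np
from keras.utils import to_categorical

sequence_length = 100
pitchnames = sorted(set(notes))                 # unique notes/chords
n_vocab = len(pitchnames)
note_to_int = {n: i for i, n in enumerate(pitchnames)}

network_input = []
network_output = []
for i in range(len(notes) - sequence_length):
    seq_in = notes[i:i + sequence_length]       # 100 elements as input
    seq_out = notes[i + sequence_length]        # the element to predict
    network_input.append([note_to_int[n] for n in seq_in])
    network_output.append(note_to_int[seq_out])

# Reshape for the LSTM and normalize the integers to the 0-1 range
network_input = np.reshape(network_input, (len(network_input), sequence_length, 1)) / n_vocab
# One-hot encode the outputs for the softmax layer
network_output = to_categorical(network_output)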
Creating Model
By now, we have created supervised training data, and next we need to create and train a model.
The first layer of the model is an LSTM with 512 units and input shape (100, 1). Since this is not the last LSTM layer, we will return sequences.
To reduce overfitting and improve the performance of the model, we will add a dropout of 0.3 after every layer.
Further, we are going to add two more LSTM layers.
At the end of the LSTMs, we add a Dense layer, whose weights are updated during backpropagation, to map the output to the vocabulary of notes and chords.
Also, we are going to use softmax as the activation function of the final layer.
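A minimal sketch of that architecture in Keras (the 512-unit LSTMs and the 0.3 dropout come from the description above; the optimizer and loss are assumptions):

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

model = Sequential()
# First LSTM layer: 512 units, input shape (100, 1), returning sequences
model.add(LSTM(512, input_shape=(100, 1), return_sequences=True))
model.add(Dropout(0.3))
# Two more LSTM layers, each followed by dropout
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(512))
model.add(Dropout(0.3))
# Dense layer over the vocabulary, with softmax activation
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')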
*The training of the model on Colab took nearly 10 hours.*
Predictions
To predict a music sequence, we need to give the model some input. The network input right now is a NumPy array, but we'll convert it back to a list, so that the network input is a list of lists and each data point consists of 100 elements.
Taking the prediction:
We will take a data point, generate the next note, and append it to that data point's sequence; then we'll discard the first element and use the remaining elements to generate the next element. This is done repeatedly.
To carry out this approach, we can start with any random data point. And for the output, we need to create a dictionary that maps the integers back to their respective string values so that the corresponding notes and chords can be played.
We also determine the number of elements we want to generate. (Note: these elements will combine to form the music.)
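A minimal sketch of that generation loop (the count of 200 generated elements and the helper names int_to_note and prediction_output are assumptions):

import numpy as np

int_to_note = {i: n for i, n in enumerate(pitchnames)}

# Start from a random data point in the prepared input
start = np.random.randint(0, len(network_input) - 1)
pattern = list(network_input[start].flatten())

prediction_output = []
for _ in range(200):                          # number of elements to generate
    prediction_input = np.reshape(pattern, (1, len(pattern), 1))
    prediction = model.predict(prediction_input, verbose=0)
    index = np.argmax(prediction)             # most probable next element
    prediction_output.append(int_to_note[index])
    # Append the new element (normalized) and discard the first one
    pattern.append(index / n_vocab)
    pattern = pattern[1:]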
Creating Midi File
Now, we have to create a MIDI file from the predictions that we got, so that it can be played.
We will start with an offset (time value) of 0 and gradually increase it as notes and chords are added.
Here, again, we will have to check each pattern, whether it is a chord or a note, and process it accordingly.
If the element is a note, we will create a Note object using the note module; and if the element is a chord, we will create a Chord object using the chord module, so that it is converted into something playable.
How do we check whether an element is a chord?
If the element contains a '+' (multiple notes), it must be a chord; and if the element is just a digit (a single value from normalOrder), it is also treated as a chord.
Since this whole dataset is piano music, we set the instrument of the stored notes to Piano.
In the end, we increase the offset so that the timestamps grow linearly as the notes/chords are appended.
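A minimal sketch of that reconstruction step (the fixed offset increment of 0.5 and the output filename are assumptions):

from music21 import instrument, note, chord, stream

offset = 0
output_notes = []
for pattern in prediction_output:
    # A '+' or a bare digit means the element encodes a chord
    if ('+' in pattern) or pattern.isdigit():
        chord_notes = []
        for current_note in pattern.split('+'):
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # Otherwise the element is a single note stored by pitch
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)
    offset += 0.5   # fixed increment so timestamps grow linearly

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='generated_music.mid')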
So, this is how we can train a model to generate music. That said, there are a few drawbacks to this model.
Drawbacks
- We have trained on many samples and generated a new one, but the model does not know how a song should start or end.
- The offset is fixed; it is not variable.
- We could add more instruments; for this, we can create a new container for each instrument and then proceed.
- We could train for more epochs.