
Music Recommendation


As music lovers ourselves, we set out to build a music preference classifier that uses machine learning to adapt to each user’s personal taste. We started the project with a single user’s Spotify music dataset from Kaggle (“Spotify Song Attributes”)1, using data visualization to better understand the types of music the user likes and to select the input attributes for the modeling process. For data preparation, we used Principal Component Analysis (PCA) to turn a set of correlated variables into a set of linearly uncorrelated variables. From there, we built ten machine learning models and compared their accuracy. We then combined the three best-performing models, K-nearest neighbors, Support Vector Machine and Neural Network, into a single predictor of music preferences.
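
As a rough illustration of the data preparation step, here is a minimal sketch of feature scaling followed by PCA using only NumPy. The data is synthetic, and the column names (`energy`, `loudness`, `valence`) and the helper names `standardize` and `pca` are assumptions for illustration, not the project's actual code.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - mean) / std

def pca(X_scaled, n_components):
    """Project standardized data onto its top principal components."""
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(X_scaled, full_matrices=False)
    components = Vt[:n_components]
    explained = singular_values**2 / np.sum(singular_values**2)
    return X_scaled @ components.T, explained[:n_components]

# Synthetic stand-in for song attributes: two correlated columns
# plus one independent column (hypothetical data).
rng = np.random.default_rng(0)
energy = rng.normal(size=200)
loudness = 0.9 * energy + 0.1 * rng.normal(size=200)
valence = rng.normal(size=200)
X = np.column_stack([energy, loudness, valence])

scores, explained_ratio = pca(standardize(X), n_components=2)

# The projected columns are linearly uncorrelated by construction.
print(np.corrcoef(scores.T).round(6))
print(explained_ratio)
```

Scaling first matters because PCA is driven by variance, so an attribute measured on a large scale would otherwise dominate the leading components.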


Data Visualization and Model Building

1) Innovative Algorithm:

The music filtering method Spotify currently uses is called Collaborative Filtering. While this algorithm is widely adopted on the Internet, it is not perfect. One bias it introduces, on Spotify in particular, is that popular songs are generally more likely to be recommended to users than non-mainstream ones. To minimize this bias and make recommendations more personalized, we aim to build a music classifier that is based on the music itself, regardless of the choices of any other users.

The Spotify API platform describes each song with 16 features: duration, acousticness, danceability, speechiness, loudness, energy, valence, etc. To grasp the potential correlation between the attributes and their impact on whether the user likes a song, we cleaned the data and chose 13 of the 16 attributes of each song to run an OLS regression. Based on the resulting t-statistics and p-values, we observed multiple attributes with statistical significance. We then divided the original dataset into two groups according to the user’s preference, plotted the probability density function (PDF) of each attribute, and found distinct differences between the two groups. Afterward, a pairplot of these attributes again suggested that the attributes are strongly related in a nonlinear way. Hence, we need more sophisticated models to understand and predict a user’s music preference for each song.
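
The regression step can be sketched as follows. This is a minimal NumPy version that computes OLS coefficients and t-statistics only; the data is synthetic, the two attributes and the rule generating the `liked` labels are hypothetical, and the helper `ols_t_values` is not the project's code. Turning t-statistics into p-values would additionally need a Student-t distribution, for example from a statistics library.

```python
import numpy as np

def ols_t_values(X, y):
    """Return OLS coefficients and their t-statistics (intercept first)."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    dof = n - design.shape[1]                  # residual degrees of freedom
    sigma2 = residuals @ residuals / dof       # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, coef / np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n_songs = 500
danceability = rng.uniform(0, 1, n_songs)
speechiness = rng.uniform(0, 1, n_songs)
# Hypothetical labels: "liked" depends on danceability only.
liked = (danceability + 0.3 * rng.normal(size=n_songs) > 0.5).astype(float)

coef, t_values = ols_t_values(np.column_stack([danceability, speechiness]), liked)
for name, t in zip(["intercept", "danceability", "speechiness"], t_values):
    print(f"{name:>12}: t = {t:.2f}")
```

Because the target is a 0/1 label, this is a linear probability model; it is useful for screening attributes, not as the final classifier.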


2) Self-studying Machine Learning Models:

Before setting up the models, we first performed feature scaling so that attributes measured on different scales would not bias Principal Component Analysis. We then applied Principal Component Analysis to reduce dimensionality and improve training efficiency. We built 10 machine learning models, seven of which were not covered in class, such as SVM and Neural Network. The largest challenge we encountered was building the Neural Network. To get a model that met our requirements, we built it entirely on our own so that we could customize its parameters, writing the code for the sigmoid function, forward propagation, backward propagation, etc. We also addressed the potential overfitting problem by incorporating regularization terms into the cost function used for gradient descent. Furthermore, to tailor the models to our specific problem, we tuned them with techniques such as grid search on a validation set.
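
To make the hand-built network more concrete, here is a minimal sketch of a one-hidden-layer network with a sigmoid activation, forward and backward propagation, and an L2 regularization term in the cost. The architecture, the hyperparameters (`hidden`, `lam`, `lr`, `epochs`) and the synthetic data are assumptions for illustration; the project's actual network may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Forward propagation through one hidden layer."""
    a1 = sigmoid(X @ params["W1"] + params["b1"])
    a2 = sigmoid(a1 @ params["W2"] + params["b2"])
    return a1, a2

def cost(y, a2, params, lam):
    """Cross-entropy loss plus an L2 penalty on the weights (not the biases)."""
    m = y.shape[0]
    eps = 1e-12  # avoids log(0)
    data_loss = -np.mean(y * np.log(a2 + eps) + (1 - y) * np.log(1 - a2 + eps))
    penalty = lam / (2 * m) * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))
    return data_loss + penalty

def backward(X, y, a1, a2, params, lam):
    """Backward propagation: gradients of the regularized cost."""
    m = y.shape[0]
    d2 = (a2 - y) / m                           # output-layer error
    d1 = (d2 @ params["W2"].T) * a1 * (1 - a1)  # hidden-layer error
    return {
        "W2": a1.T @ d2 + lam / m * params["W2"],
        "b2": d2.sum(axis=0),
        "W1": X.T @ d1 + lam / m * params["W1"],
        "b1": d1.sum(axis=0),
    }

def train(X, y, hidden=8, lam=0.1, lr=0.5, epochs=2000, seed=0):
    """Full-batch gradient descent; returns parameters and the cost history."""
    rng = np.random.default_rng(seed)
    params = {
        "W1": rng.normal(scale=0.5, size=(X.shape[1], hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.5, size=(hidden, 1)),
        "b2": np.zeros(1),
    }
    history = []
    for _ in range(epochs):
        a1, a2 = forward(X, params)
        history.append(cost(y, a2, params, lam))
        grads = backward(X, y, a1, a2, params, lam)
        for name in params:
            params[name] -= lr * grads[name]
    return params, history

# Synthetic, nonlinearly separable data: label 1 inside the unit circle.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = (np.sum(X ** 2, axis=1) < 1.0).astype(float).reshape(-1, 1)

params, history = train(X, y)
_, probabilities = forward(X, params)
accuracy = np.mean((probabilities > 0.5) == (y == 1))
print(f"cost: {history[0]:.3f} -> {history[-1]:.3f}, training accuracy: {accuracy:.2f}")
```

A grid search in this setting would simply call `train` for each candidate pair of `hidden` and `lam`, and keep the combination with the best accuracy on a held-out validation set.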

3) General Suitability:

We did not want to stop at a model that predicts the music preference of a single user; we wanted something more practical. So we turned the algorithm into a commercial prototype by adding two extra modules. The first module asks users for the Spotify IDs of several playlists of songs they like and dislike, and uses the Spotify API to automatically turn these playlists into dataframes with 16 attributes for each song. This user input is then used to train our models and tune their parameters to predict which songs the user will like. The second module takes any new playlist the user is interested in and uses the trained model to recommend, in just a few seconds, the songs in that playlist that the user may like.
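
The two modules might be wired together roughly as below. This sketch assumes the per-track attributes have already been fetched from the Spotify API into plain dictionaries; the `name` key, the five-attribute subset in `FEATURES`, the sample tracks and all function names are hypothetical. A small K-nearest neighbors vote stands in for the trained model.

```python
import numpy as np

# Assumed subset of the per-track attributes named in the text above.
FEATURES = ["danceability", "energy", "valence", "acousticness", "speechiness"]

def to_matrix(tracks):
    """Turn a list of per-track attribute dicts into a 2-D float array."""
    return np.array([[track[name] for name in FEATURES] for track in tracks], dtype=float)

def build_training_set(liked_tracks, disliked_tracks):
    """Module 1: stack liked (label 1) and disliked (label 0) tracks into X, y."""
    X = np.vstack([to_matrix(liked_tracks), to_matrix(disliked_tracks)])
    y = np.concatenate([np.ones(len(liked_tracks)), np.zeros(len(disliked_tracks))])
    return X, y

def knn_predict(X_train, y_train, X_new, k=3):
    """Majority vote among the k nearest training songs (Euclidean distance)."""
    dists = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def recommend(liked_tracks, disliked_tracks, new_tracks, k=3):
    """Module 2: return the names of new tracks predicted as 'liked'."""
    X, y = build_training_set(liked_tracks, disliked_tracks)
    predictions = knn_predict(X, y, to_matrix(new_tracks), k=k)
    return [t["name"] for t, p in zip(new_tracks, predictions) if p == 1]

def make_track(name, values):
    """Helper for building hypothetical sample tracks."""
    return {"name": name, **dict(zip(FEATURES, values))}

liked = [
    make_track("Track A", [0.80, 0.80, 0.70, 0.10, 0.05]),
    make_track("Track B", [0.85, 0.75, 0.70, 0.10, 0.05]),
    make_track("Track C", [0.90, 0.85, 0.75, 0.05, 0.05]),
]
disliked = [
    make_track("Track D", [0.20, 0.20, 0.30, 0.90, 0.05]),
    make_track("Track E", [0.25, 0.15, 0.30, 0.85, 0.05]),
    make_track("Track F", [0.30, 0.25, 0.35, 0.90, 0.05]),
]
new_playlist = [
    make_track("Track G", [0.82, 0.78, 0.70, 0.12, 0.05]),
    make_track("Track H", [0.25, 0.20, 0.30, 0.85, 0.05]),
]

print(recommend(liked, disliked, new_playlist))  # prints ['Track G']
```

In a real pipeline the attributes would be scaled before computing distances, and the predictor would be whichever tuned model performed best for that user.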

4) Successful Validation:

To validate the results, we tested our prototype on our group members. As input, each of us provided two lists of songs: one of songs we liked and one of songs we disliked. We each contributed about 1400 songs on average. After obtaining the attributes of these songs from the Spotify API, we trained the models and obtained accuracies of over 70%, with some reaching 90%.



Type: Other
Release Date: Aug 08, 2019
Last Updated: Feb 22, 2021


