Music Recommendation



As music lovers ourselves, we set out to build a music preference classifier that uses machine learning to adapt to each user's personal taste. We started with a single user's Spotify dataset from Kaggle ("Spotify Song Attributes") [1]. We used data visualization to understand which types of music the user likes and to select the input attributes for modeling. For data preparation, we used Principal Component Analysis (PCA) to turn a set of correlated variables into a set of linearly uncorrelated variables. From there, we built ten machine learning models and compared their accuracy. We then combined the three best-performing models, K-nearest neighbors, Support Vector Machine, and Neural Network, into a single predictor of music preferences.
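
The PCA step described above can be illustrated with a minimal, standard-library sketch. It is not the project's code: the three attributes and their values are synthetic, and the eigenvectors are found by power iteration with deflation purely to keep the example dependency-free. In practice a library such as scikit-learn would normally be used.

```python
import math
import random


def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance_matrix(rows):
    """Covariance matrix of data whose columns are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    vec = [rng.uniform(-1, 1) for _ in matrix]
    for _ in range(iterations):
        vec = [sum(a * v for a, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector.
    value = sum(v * sum(a * w for a, w in zip(row, vec)) for v, row in zip(vec, matrix))
    return value, vec


def pca(rows, n_components):
    """Project standardized data onto its first principal components."""
    data = standardize(rows)
    cov = covariance_matrix(data)
    d = len(cov)
    components = []
    for _ in range(n_components):
        value, vec = top_eigenvector(cov)
        components.append(vec)
        # Deflation: remove the direction just found before searching again.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    projected = [[sum(x * w for x, w in zip(row, comp)) for comp in components] for row in data]
    return projected, components


# Synthetic songs: [energy, loudness, acousticness], deliberately correlated.
rng = random.Random(42)
songs = []
for _ in range(200):
    energy = rng.random()
    loudness = -20 + 15 * energy + rng.gauss(0, 2)
    acousticness = 1 - energy + rng.gauss(0, 0.1)
    songs.append([energy, loudness, acousticness])

projected, components = pca(songs, n_components=2)
print("covariance before PCA:", covariance_matrix(standardize(songs))[0][1])
print("covariance after PCA: ", covariance_matrix(projected)[0][1])
```

The two printed values compare the covariance of the first two standardized attributes with the covariance of the first two principal components; the latter should be numerically close to zero.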

Data Visualization and Model Building

1) Innovative Algorithm:

Spotify's current recommendation method is collaborative filtering. Although this approach is widely adopted across the Internet, it is not perfect. One bias it introduces on Spotify is that popular songs are more likely to be recommended than non-mainstream ones. To reduce this bias and make recommendations more personal, we aim to build a music classifier based on the music itself, independent of other users' choices.

The Spotify API describes each song with 16 features, including duration, acousticness, danceability, speechiness, loudness, energy, and valence. To understand how these attributes relate to one another and how they affect whether the user likes a song, we cleaned the data and selected 13 of the 16 attributes for an OLS regression. Based on the resulting t-statistics and p-values, several attributes were statistically significant. We then split the dataset into two groups by the user's preference and plotted the probability density function (pdf) of each attribute, which showed distinct differences between the two groups. A pairplot of these attributes further confirmed that they are strongly related in a nonlinear way. Hence, more sophisticated models are needed to understand and predict a user's preference for each song.
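
To make the significance check concrete, here is a simplified sketch that regresses the like/dislike label on a single attribute and computes the t-statistic of the slope. It is an assumption-laden reduction of the 13-attribute regression described above: the danceability values and labels are invented for illustration, and the p-value is omitted because it requires the t-distribution, which the Python standard library does not provide.

```python
import math


def simple_ols(x, y):
    """Ordinary least squares of y on one regressor x.

    Returns (slope, intercept, t_statistic_of_slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    return slope, intercept, slope / se_slope


# Hypothetical data: danceability of 12 songs and whether the user liked each.
danceability = [0.85, 0.78, 0.90, 0.66, 0.72, 0.81, 0.45, 0.38, 0.52, 0.60, 0.41, 0.35]
liked = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]

slope, intercept, t_stat = simple_ols(danceability, liked)
print(f"slope={slope:.3f} intercept={intercept:.3f} t={t_stat:.2f}")
```

A slope t-statistic well above roughly 2 in absolute value is the usual rule-of-thumb signal that the attribute is worth keeping as a model input.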

2) Self-Studied Machine Learning Models:

Before building models, we applied feature scaling to reduce bias from attributes measured on different scales, as preparation for Principal Component Analysis. We then applied PCA to reduce dimensionality and improve training efficiency. We built 10 machine learning models, seven of which were not covered in class, such as SVM and Neural Network. The biggest challenge was the Neural Network. To get a model we could fully customize, we implemented it entirely on our own, including the sigmoid function, forward propagation, and backward propagation. We also addressed potential overfitting by adding regularization terms to the cost function used in gradient descent. Finally, to tailor the models to our problem, we tuned them with grid search on a validation set.
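
The from-scratch network described above might look roughly like the following sketch. The architecture (one hidden layer), the L2 form of the regularization, the hyperparameters, and the toy data are all assumptions made for illustration; the original implementation is not shown in this write-up.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """One hidden layer, sigmoid activations, L2-regularized cross-entropy."""

    def __init__(self, n_inputs, n_hidden, reg_lambda=0.01, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.reg_lambda = reg_lambda

    def forward(self, x):
        """Forward propagation: returns hidden activations and output."""
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, output

    def cost(self, xs, ys):
        """Mean cross-entropy plus an L2 penalty on the weights (not biases)."""
        m, eps = len(xs), 1e-12
        loss = 0.0
        for x, y in zip(xs, ys):
            p = self.forward(x)[1]
            loss -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        penalty = sum(w * w for row in self.w1 for w in row) + sum(w * w for w in self.w2)
        return loss / m + self.reg_lambda / (2 * m) * penalty

    def train_step(self, xs, ys, learning_rate):
        """One full-batch gradient descent step using backward propagation."""
        m = len(xs)
        grad_w1 = [[0.0] * len(row) for row in self.w1]
        grad_b1 = [0.0] * len(self.b1)
        grad_w2 = [0.0] * len(self.w2)
        grad_b2 = 0.0
        for x, y in zip(xs, ys):
            hidden, p = self.forward(x)
            delta_out = p - y  # gradient at the output pre-activation
            grad_b2 += delta_out
            for j, h in enumerate(hidden):
                grad_w2[j] += delta_out * h
                delta_hidden = delta_out * self.w2[j] * h * (1 - h)
                grad_b1[j] += delta_hidden
                for i, xi in enumerate(x):
                    grad_w1[j][i] += delta_hidden * xi
        for j in range(len(self.w2)):
            for i in range(len(self.w1[j])):
                reg = self.reg_lambda * self.w1[j][i]
                self.w1[j][i] -= learning_rate * (grad_w1[j][i] + reg) / m
            self.b1[j] -= learning_rate * grad_b1[j] / m
            self.w2[j] -= learning_rate * (grad_w2[j] + self.reg_lambda * self.w2[j]) / m
        self.b2 -= learning_rate * grad_b2 / m


# Toy, already-scaled inputs (two features per song) with like/dislike labels.
xs = [[-1.2, -0.8], [-0.9, -1.1], [-1.0, 0.3], [-0.4, -0.6],
      [0.8, 0.5], [1.1, -0.2], [0.9, 1.0], [0.3, 0.7]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

net = TinyNetwork(n_inputs=2, n_hidden=4)
initial_cost = net.cost(xs, ys)
for _ in range(2000):
    net.train_step(xs, ys, learning_rate=0.5)
final_cost = net.cost(xs, ys)
predictions = [1 if net.forward(x)[1] >= 0.5 else 0 for x in xs]
print(f"cost before: {initial_cost:.3f}, after: {final_cost:.3f}")
```

The regularization strength `reg_lambda`, the hidden-layer size, and the learning rate are the kind of settings a grid search over a validation set would choose.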

3) General Suitability:

We did not want to stop at a model that predicts one user's music preferences; we wanted something more practical. So we turned the algorithm into a commercial prototype by adding two modules. The first asks users for the Spotify IDs of several playlists of songs they like and dislike, and uses the Spotify API to turn these playlists automatically into dataframes holding the 16 attributes of each song. This user input is then used to train our models and tune their parameters to predict which songs the user will like. The second module takes any new playlist the user is interested in and uses the trained model to recommend, within a few seconds, the songs in it that the user may like.
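
The flow of the two modules can be sketched as follows. This is only an illustration: the song dictionaries stand in for what the first module would fetch from the Spotify API, all names and values are hypothetical, only three attributes are used, and K-nearest neighbors is chosen here simply because it is one of the three models named in the overview.

```python
import math
from collections import Counter

# Subset of song attributes used in this sketch (the project uses more).
FEATURES = ("danceability", "energy", "valence")


def to_vector(song):
    return [song[name] for name in FEATURES]


def train(liked_playlist, disliked_playlist):
    """Module 1 (sketch): turn labeled playlists into vectors and labels."""
    vectors = [to_vector(s) for s in liked_playlist + disliked_playlist]
    labels = [1] * len(liked_playlist) + [0] * len(disliked_playlist)
    return vectors, labels


def knn_predict(vectors, labels, query, k=3):
    """Majority vote among the k training songs closest to the query."""
    nearest = sorted((math.dist(vec, query), label) for vec, label in zip(vectors, labels))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


def recommend(model, new_playlist, k=3):
    """Module 2 (sketch): names of songs in a new playlist predicted as liked."""
    vectors, labels = model
    return [s["name"] for s in new_playlist
            if knn_predict(vectors, labels, to_vector(s), k) == 1]


# Hypothetical playlists; in the prototype these would come from the Spotify API.
liked = [
    {"name": "Song A", "danceability": 0.82, "energy": 0.75, "valence": 0.70},
    {"name": "Song B", "danceability": 0.78, "energy": 0.80, "valence": 0.65},
    {"name": "Song C", "danceability": 0.88, "energy": 0.70, "valence": 0.80},
]
disliked = [
    {"name": "Song D", "danceability": 0.30, "energy": 0.25, "valence": 0.20},
    {"name": "Song E", "danceability": 0.35, "energy": 0.30, "valence": 0.15},
    {"name": "Song F", "danceability": 0.25, "energy": 0.40, "valence": 0.30},
]
new_playlist = [
    {"name": "Song X", "danceability": 0.80, "energy": 0.78, "valence": 0.72},
    {"name": "Song Y", "danceability": 0.28, "energy": 0.33, "valence": 0.22},
    {"name": "Song Z", "danceability": 0.85, "energy": 0.72, "valence": 0.68},
]

model = train(liked, disliked)
recommendations = recommend(model, new_playlist)
print(recommendations)  # ['Song X', 'Song Z']
```

Keeping training and recommendation as separate functions mirrors the split between the two modules: the model is fitted once per user and can then score any number of new playlists.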

4) Successful Validation:

To validate the results, we tested the prototype on our group members. Each of us supplied two lists of songs: one of songs we liked and one of songs we disliked, about 1,400 songs per person on average. After obtaining the attributes of these songs from the Spotify API, we trained the models and obtained accuracies above 70%, with some reaching 90%.
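
A per-user accuracy check of this kind can be sketched as a simple hold-out evaluation. Everything below is an assumption for illustration: the data are synthetic, the 75/25 split is arbitrary, and a nearest-centroid classifier stands in for the project's actual models, so the printed numbers say nothing about the results reported above.

```python
import random


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_centroids(train):
    """Stand-in classifier: the mean feature vector of each label."""
    sums, counts = {}, {}
    for features, label in train:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the features."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], features)))


def accuracy(centroids, test):
    correct = sum(predict(centroids, features) == label for features, label in test)
    return correct / len(test)


def synthetic_user(seed, n_per_class=20):
    """Hypothetical user: liked songs cluster high, disliked songs cluster low."""
    rng = random.Random(seed)
    liked = [([rng.gauss(0.7, 0.15), rng.gauss(0.7, 0.15)], 1) for _ in range(n_per_class)]
    disliked = [([rng.gauss(0.3, 0.15), rng.gauss(0.3, 0.15)], 0) for _ in range(n_per_class)]
    return liked + disliked


results = {}
for user_id in (1, 2, 3):
    train_set, test_set = split_train_test(synthetic_user(seed=user_id))
    results[user_id] = accuracy(fit_centroids(train_set), test_set)
    print(f"user {user_id}: hold-out accuracy {results[user_id]:.2f}")
```

Evaluating on songs the model never saw during training is what makes the accuracy figure meaningful for each person.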

Type:	Other
Release Date:	Aug 08, 2019
Last Updated:	Feb 22, 2021



By Cici Zhao
