Almost every time we go online, we encounter recommender systems: whether you are listening to your favorite song on Spotify, binge-watching a TV show on Netflix or buying a new laptop on Amazon. Although we all know these recommendation engines exist, it is less well known which algorithms lie behind their recommendations.
Building our own book recommender with Python
To get a better understanding of the algorithms used in recommender systems we decided to build a recommender ourselves! With COVID-19 making us more housebound than ever, a topic for our recommendation engine was quickly found; we decided to build a book recommender using Python.
Although there are ready-to-use packages for building recommender systems (such as Surprise), we decided to build our own recommendation system.
Taking you through the process: notebooks
In this notebook we explain, step by step, how we built our recommendation engine. Our notebooks consist of the following steps:
- Transform the information supplied by the user and append it to the training data.
- Create references for both books and raters that map each one to its row/column, so that books and raters can later be looked up in the data.
- Create an empty sparse matrix with the same dimensions as the data and fill it with the book ratings. We chose to work with sparse matrices because most people have rated only a few books, so most elements of the matrix are zeros. This is not strictly necessary, but it greatly speeds up the calculations.
- Create a sparse vector with the user's ratings.
- Quantify the similarity between the user and all other people in the data using the cosine similarity.
- Now that we have calculated the similarities, select the top x people in the training data whose book ratings are most similar to the user's, based on cosine similarity. At this step we also filter the results based on the minimum number (N) of books that need to match.
- Select the books that were rated by the top x people but not by the user, and calculate the average rating of each of these books among the top x.
- Select as many of the top-rated books as the user wishes to receive and display the results.
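The steps above can be sketched in a few dozen lines of plain Python. This is a minimal, hypothetical sketch rather than the code from our notebook: the toy ratings and the names `recommend`, `top_x`, `min_count` and `n_results` are made up for illustration, and the sparse matrix is stored as a dictionary of dictionaries (only non-zero ratings are kept) instead of, for example, a SciPy sparse matrix. It also assumes that N is applied as the minimum number of the top x people who must have rated a book.

```python
import math
from collections import defaultdict

# Hypothetical toy training data: (rater, book, rating) triples.
TRAINING = [
    ("ann", "Dune", 5), ("ann", "Emma", 3), ("ann", "Ulysses", 4),
    ("bob", "Dune", 4), ("bob", "Emma", 2), ("bob", "Hamlet", 5),
    ("cat", "Emma", 5), ("cat", "Ulysses", 2), ("cat", "Hamlet", 4),
    ("dan", "Dune", 5), ("dan", "Hamlet", 4), ("dan", "Beloved", 3),
]


def build_index(items):
    """Map each distinct item to a row/column number."""
    return {item: i for i, item in enumerate(sorted(set(items)))}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {column: value}."""
    dot = sum(value * v[col] for col, value in u.items() if col in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(training, user_ratings, top_x=2, min_count=1, n_results=3):
    if not user_ratings:
        raise ValueError("the user must rate at least one book")

    # Step 1: append the user's ratings to the training data under a reserved id.
    user_id = "__user__"
    data = training + [(user_id, book, r) for book, r in user_ratings.items()]

    # Step 2: references from raters/books to row/column numbers (and back for books).
    rater_idx = build_index(r for r, _, _ in data)
    book_idx = build_index(b for _, b, _ in data)
    books = {i: b for b, i in book_idx.items()}

    # Step 3: sparse matrix as {row: {column: rating}}; absent entries are implicit zeros.
    matrix = defaultdict(dict)
    for rater, book, rating in data:
        matrix[rater_idx[rater]][book_idx[book]] = rating

    # Step 4: the user's sparse rating vector.
    user_row = rater_idx[user_id]
    user_vec = matrix[user_row]

    # Step 5: cosine similarity between the user and every other rater.
    sims = {row: cosine(user_vec, vec) for row, vec in matrix.items() if row != user_row}

    # Step 6: keep the top x most similar raters.
    neighbours = sorted(sims, key=sims.get, reverse=True)[:top_x]

    # Step 7: collect the neighbours' ratings of books the user has not rated and
    # average them, keeping only books rated by at least min_count (N) neighbours.
    collected = defaultdict(list)
    for row in neighbours:
        for col, rating in matrix[row].items():
            if col not in user_vec:
                collected[col].append(rating)
    averages = {books[col]: sum(rs) / len(rs)
                for col, rs in collected.items() if len(rs) >= min_count}

    # Step 8: return the n_results books with the highest average rating.
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n_results]


if __name__ == "__main__":
    print(recommend(TRAINING, {"Dune": 5, "Ulysses": 4}))
    # prints [('Hamlet', 4.0), ('Emma', 3.0), ('Beloved', 3.0)]
```

For a realistic dataset a library sparse matrix and a vectorized cosine similarity would be much faster than these Python loops; the dictionary version only keeps the individual steps easy to follow.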
Choosing the best parameters
By searching the internet we found many example scripts for recommenders, but no advice on choosing the best parameters. For our book recommender we didn't know the best values for two parameters: x, the number of most similar people selected for the recommendations, and N, the minimum number of times a book must be reviewed by the top x people to be part of the final recommendations. N=1 can result in recommendations based on outliers, but also in very good recommendations that are less common. A higher N automatically requires a higher x, since enough of the selected people must have reviewed the same book. This means fewer outliers, but possibly more generic recommendations. We ran a test among our colleagues to find the best parameters. In the notebook you can read which parameters worked best in our case, with our dataset.
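To make the trade-off between x and N concrete, here is a minimal, hypothetical sketch of only the final aggregation step. The neighbour ratings and the function name `recommend_books` are made up for illustration; the list is assumed to be already sorted from most to least similar person, as it would be after the similarity step.

```python
from collections import defaultdict

# Hypothetical ratings of the user's most similar people, sorted from most to
# least similar. Each dict maps book -> rating.
NEIGHBOURS = [
    {"Emma": 3},
    {"Hamlet": 4, "Beloved": 3},
    {"Emma": 2, "Hamlet": 5},
    {"Hamlet": 3, "Ulysses": 5},
]


def recommend_books(neighbours, x, n_min):
    """Average the ratings of the top-x people, keeping only books that at
    least n_min of them have rated; highest average first."""
    ratings = defaultdict(list)
    for person in neighbours[:x]:
        for book, rating in person.items():
            ratings[book].append(rating)
    averages = {book: sum(rs) / len(rs) for book, rs in ratings.items() if len(rs) >= n_min}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for x, n_min in [(4, 1), (2, 2), (3, 2)]:
        print(f"x={x}, N={n_min}:", recommend_books(NEIGHBOURS, x, n_min))
    # x=4, N=1: [('Ulysses', 5.0), ('Hamlet', 4.0), ('Beloved', 3.0), ('Emma', 2.5)]
    # x=2, N=2: []
    # x=3, N=2: [('Hamlet', 4.5), ('Emma', 2.5)]
```

In this toy data, N=1 lets a single 5-star rating put Ulysses at the top. With N=2 and x=2 no book qualifies, and only after raising x to 3 does each remaining recommendation rest on at least two ratings.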
Take a glance yourself …
If you are curious about what we did, then take a look at our notebook.