Building a book recommender from scratch

1 April 2021

We encounter recommender systems almost every day we go online, whether we are listening to a favorite song on Spotify, binge-watching a TV show on Netflix or buying a new laptop on Amazon. Although we all know these recommendation engines exist, the algorithms behind their recommendations are less well known.


Building our own book recommender with Python

To get a better understanding of the algorithms used in recommender systems we decided to build a recommender ourselves! With COVID-19 making us more housebound than ever, a topic for our recommendation engine was quickly found; we decided to build a book recommender using Python.

Although there are ready-to-use packages for building recommender systems (such as Surprise), we decided to build our own recommendation system.


Taking you through the process: notebooks

In this notebook we explain step by step how we built our recommendation engine. Our notebook consists of the following steps:

  1. Transform the information supplied by the user and append it to the training data.
  2. Create references for both books and raters that equal their row/column index, so that the books and raters can later be looked up in the data.
  3. Create an empty sparse matrix with the same dimensions as the data and fill it with the book ratings. We chose sparse matrices because most people have rated only a few books, so most elements of the matrix are zeros. This is not strictly necessary, but it greatly speeds up the calculation.
  4. Create a sparse vector with the ratings of the user.
  5. Quantify the similarity between the user and all other people in the data, using cosine similarity.
  6. Select the top x people in the training data whose book ratings are most similar to the user's, based on cosine similarity. At this step we also filter based on the minimum number (N) of books that need to match.
  7. Select the books that were rated by the top x but not by the user, and calculate the average rating of each book from the ratings of the top x.
  8. Select the number of top-rated books the user wishes to receive and display the results.
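
To make steps 2 to 5 more concrete, here is a minimal sketch using NumPy and SciPy. It is not the code from our notebook: the toy ratings, the names `rater_idx`, `book_idx` and `cosine_to_all`, and the choice of `scipy.sparse.csr_matrix` are assumptions made purely for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy data: (rater, book, rating) triples, with the user
# already appended to the training data (step 1).
ratings = [
    ("ann", "Dune", 5), ("ann", "Emma", 3),
    ("bob", "Dune", 4), ("bob", "Ulysses", 2),
    ("cat", "Emma", 5), ("cat", "Ulysses", 4),
    ("user", "Dune", 5), ("user", "Emma", 2),
]

# Step 2: references that map each rater/book to its row/column.
raters = sorted({r for r, _, _ in ratings})
books = sorted({b for _, b, _ in ratings})
rater_idx = {r: i for i, r in enumerate(raters)}
book_idx = {b: j for j, b in enumerate(books)}

# Step 3: sparse rater-by-book matrix; unrated books stay implicit zeros.
rows = [rater_idx[r] for r, _, _ in ratings]
cols = [book_idx[b] for _, b, _ in ratings]
vals = [float(v) for _, _, v in ratings]
matrix = csr_matrix((vals, (rows, cols)), shape=(len(raters), len(books)))

# Step 4: sparse 1 x n_books vector with the ratings of the user.
user_vec = matrix[rater_idx["user"]]


def cosine_to_all(matrix, vec):
    """Step 5: cosine similarity between vec and every row of matrix."""
    dots = matrix.dot(vec.T).toarray().ravel()
    row_norms = np.sqrt(np.asarray(matrix.multiply(matrix).sum(axis=1)).ravel())
    vec_norm = np.sqrt(vec.multiply(vec).sum())
    denom = row_norms * vec_norm
    # Raters without any ratings get similarity 0 instead of a division error.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)


similarity = cosine_to_all(matrix, user_vec)
```

In this toy example `similarity` holds one score per rater, in the row order given by `rater_idx`. The user's own row scores 1, so it has to be excluded before picking the most similar people.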


Choosing the best parameters

Searching the internet, we found many example scripts for recommenders, but no advice on choosing the best parameters. For our book recommender we did not know the best values for x, the number of most similar people selected for the recommendations, or for N, the minimum number of times a book must be reviewed by the top x people to be part of the final recommendations. N=1 can produce recommendations based on outliers, but also very good recommendations that are not that common. A higher N automatically requires a higher x. This means fewer outliers, but possibly too general recommendations. We ran a test among our colleagues to find the best parameters. In the notebook you can read which parameters worked best in our case, with our dataset.


Take a glance yourself …

If you are curious about what we did, then take a look at our notebook.

