
If, like me, you are a book lover who devours books, you are probably also familiar with the problem of bringing enough books while traveling. This is where the e-book entered my life, and it is definitely here to stay. But how to keep track of all my e-books? And how do I remember what they’re about by just seeing their title? And, the worst of all problems, how to pick the next book to read from the huge virtual pile of books?

After struggling with this for years, I finally decided it was time to solve this problem properly. Why not create a Python script to list all my books, retrieve their ISBNs and find their summaries on Google Books, ending up with an Excel file containing all this information? And yes, this is as simple as it sounds! I’ve used the brilliantly simple isbntools package and combined it with the urllib package to get the summary from Google Books. Easy does it! And the result looks like this:

If you’re curious about which book to read next now that you’ve got such a nice list of all the books you’ve already read, you should definitely check out our fantastic book recommender to help you find your next best book.

Take a tour through the code


Let’s start by importing packages and initializing our input information, like the link to Google Books, the file path where my books can be found, and the desired name of the Excel file we want to create.

import os, re, time
import pandas as pd
from isbntools.app import *
import urllib.request
import json

# the link where to retrieve the book summary (Google Books API, queried by ISBN)
base_api_link = "https://www.googleapis.com/books/v1/volumes?q=isbn:"

# the directory where the books can be found and the current list of books (if exists)
bookdir = "C:/MyDocuments/Boeken" #os.getcwd()
current_books = os.path.join(bookdir, 'boekenlijst.xlsx')

Check for an existing file with books

Next we want to check whether we’ve run this script before and whether there is already a list of books. If so, we will leave the current list untouched and only add the new books to the file.

# check to see if there is a list with books available already
# if this file does not exist already, this script will create it automatically
if os.path.exists(current_books):
    my_current_books = pd.read_excel(current_books, dtype='object')
else:
    my_current_books = pd.DataFrame(columns=["index", "ISBN", "summary", "location"])

Retrieve all books

Now we walk through all folders in the given directory and only look for pdf and epub files. In my case these were the only files that I would consider to be books.

# create an empty dictionary
my_books = {}

print("Starting to list all books (epub and pdf) in the given directory")

# create a list of all books (epub or pdf files) in the directory and all its subdirectories
# r=root, d=directories, f=files
for r, d, f in os.walk(bookdir):
    for file in f:
        if (file.endswith(".epub")) or (file.endswith(".pdf")):
            # Remove the text _Jeanine _<13-digit number> and the file extension from the filename
            booktitle = re.sub(r'\.(epub|pdf)$', '', re.sub(r'_Jeanine _\d{13}', '', file))
            booklocation = re.sub(bookdir, '', r)
            my_books[booktitle] = booklocation

print(f"Found {len(my_books.keys())} books in the given directory")
print(f"Found {len(my_current_books)} books in the existing list of books")

Only process the newly found books

We don’t want to search online again for the ISBNs and summaries of books we’ve already processed. Therefore we remove from the newly found books every title that already appears in the existing list (if it existed). If there wasn’t such a list yet, nothing happens during this step.

# only keep the books that were not already in the list
if len(my_current_books) > 0:
    for d in my_current_books["index"]:
        try:
            del my_books[d]
        except KeyError:
            pass  # if a key is not found, this is no problem

print(f"There are {len(my_books.keys())} books that were not already in the list of books")

Find the books online

For each new book, we search for its ISBN using the book title (the name of the file). Using this ISBN, we then try to find the summary of the book on Google Books.

# try to get more information on each book
i = 0
for book, location in my_books.items():
    print(f"Processing: {book}, this is number {i+1} in the list")
    isbn = ""
    summary = ""
    try:
        # retrieve ISBN
        isbn = isbn_from_words(book)
        # retrieve book information from Google Books
        if len(isbn) > 0:
            with urllib.request.urlopen(base_api_link + isbn) as f:
                text = f.read()
            decoded_text = text.decode("utf-8")
            obj = json.loads(decoded_text)
            volume_info = obj["items"][0]
            summary = re.sub(r"\s+", " ", volume_info["searchInfo"]["textSnippet"])
    except Exception as e:
        print(f"got an error when looking for {book}, the error is: {e}")
    my_books[book] = {"location": location, "ISBN": isbn, "summary": summary}
    i += 1
    # sleep to prevent a 429 (too many requests) error in the API request to get the ISBN
    time.sleep(1)
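
For reference, here is roughly what the loop above digs out of the Google Books response. The JSON below is a trimmed, made-up example (the real response carries many more fields), but the extraction path is the same one the loop uses:

```python
import json
import re

# a trimmed, hypothetical Google Books response for one ISBN query
sample_response = json.dumps({
    "totalItems": 1,
    "items": [{
        "volumeInfo": {"title": "The Hobbit"},
        "searchInfo": {"textSnippet": "Bilbo Baggins   enjoys a quiet life..."}
    }]
})

obj = json.loads(sample_response)
volume_info = obj["items"][0]
# collapse runs of whitespace, just like the main loop does
summary = re.sub(r"\s+", " ", volume_info["searchInfo"]["textSnippet"])
print(summary)  # Bilbo Baggins enjoys a quiet life...
```

If a book has no `searchInfo` key (which happens for some volumes), the lookup raises a `KeyError`, which is exactly what the `except` branch in the loop catches.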

Store the list of books in Excel

Finally, we combine the existing list of books with all the new books we’ve found and store this complete overview in the Excel file. We’re all done now and good to go.

# write to Excel
all_books = pd.DataFrame(data=my_books)
all_books = (all_books.T)
all_books = all_books.reset_index()
all_books = pd.concat([my_current_books, all_books])
all_books.to_excel(current_books, index=False)

It’s definitely not perfect yet, since the ISBN is matched using the title, which won’t be correct all the time, but I had a lot of fun creating this script, and I hope you’ll have fun using it!
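
One way to make the matching more robust, and this is my own sketch rather than part of the script, is to compare the filename-derived title against the canonical title that the metadata lookup returns for the found ISBN (e.g. `meta(isbn)["Title"]` from isbntools, which needs a network call), and only accept the ISBN when the two titles are similar enough:

```python
from difflib import SequenceMatcher


def titles_match(file_title: str, meta_title: str, threshold: float = 0.6) -> bool:
    """Rough check whether a filename-derived title and a title from book
    metadata refer to the same book, using a case-insensitive similarity ratio.
    The 0.6 threshold is a guess and may need tuning for your filenames."""
    ratio = SequenceMatcher(None, file_title.lower(), meta_title.lower()).ratio()
    return ratio >= threshold


print(titles_match("the hobbit", "The Hobbit"))         # identical apart from case -> True
print(titles_match("the hobbit", "A Game of Thrones"))  # clearly different titles
```

Books that fail the check could be written to the Excel file with an empty ISBN, so you can fill them in by hand instead of silently getting the wrong summary.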

Go to Gitlab to find the whole script and requirements.
