Taking part in a Kaggle competition was truly an exciting experience; you can read about what we learned along the way here. We built a model that computes a readability/text complexity score for a short text. Although we did not win the competition, we enjoyed working on this problem so much that we didn’t feel like letting go of our model just yet. Our team had a thousand and one ideas about how we and our colleagues could use it, so we decided to create a package and an API. This way, anyone can either reuse our code for their own model (via the package) or simply use our model to get a sense of the complexity of their text (via the API).
For some of us, this was the first time building a package from scratch. It was interesting to learn which elements are required for a fully working Python package and how it should be structured. In the first version of the package, we included the same data pre-processing steps that we used on the data for model training. In our API we included our pre-trained ‘award non-winning’ model, which is now available to all of you!
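To give a flavour of what such a pre-processing step might look like, here is a minimal sketch. The function name and the exact cleaning operations are illustrative assumptions, not the package’s actual implementation; see the repo for the real steps.

```python
import re


def preprocess(text: str) -> str:
    """Normalise a raw text excerpt before scoring.

    Hypothetical example: the package's actual pre-processing
    steps may differ. Here we only trim the text and collapse
    runs of whitespace into single spaces.
    """
    text = text.strip()
    text = re.sub(r"\s+", " ", text)  # collapse newlines, tabs, double spaces
    return text


print(preprocess("A  short\n example\ttext."))  # → A short example text.
```

Keeping the same pre-processing for training and inference matters: a model scored on text cleaned differently from its training data can produce misleading complexity scores.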
Since we were collaborating as a team of six developers, we used GitLab so that we could work independently on the various elements of the package and API while maintaining version control, rather than spending forever merging all our scripts together at the end.
To get good predictions, we use a transformer model as one of the steps. Deep learning models like this benefit greatly from a GPU, so we trained ours on Azure Machine Learning. The model was then deployed on our Azure platform in a Docker container.
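A container for serving a model like this could be sketched with a Dockerfile along the following lines. The base image, file names, port, and entry point are all illustrative assumptions, not our actual deployment configuration.

```dockerfile
# Illustrative serving image; the real Azure deployment may differ
FROM python:3.9-slim

WORKDIR /app

# install the scoring service's dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# copy the pre-trained model weights and the scoring service
COPY model/ ./model/
COPY app.py .

EXPOSE 8080
CMD ["python", "app.py"]
```

Packaging the model and its dependencies into one image is what lets the same artefact run identically during testing and behind the live API.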
Are you curious to try the API or the package for yourself? You can try out the API below by entering your own (English) text in the input field. The package can be found in our GitLab repo, and we have also prepared a tutorial notebook for you. Go ahead: check whether you can write a text that is more complicated than the highest-scoring text in our data, or use your own reference set and try to beat that instead!
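If you would rather call the API from a script than from the input field, a client might look like the sketch below. The endpoint URL and the JSON shapes of the request and response are placeholders; check the repo and tutorial notebook for the real ones.

```python
import json
from urllib import request

# Placeholder endpoint: substitute the real URL from our GitLab repo
API_URL = "https://example.com/readability/score"


def build_payload(text: str) -> bytes:
    """Encode a text as the JSON body we assume the API expects."""
    return json.dumps({"text": text}).encode("utf-8")


def parse_response(body: str) -> float:
    """Extract the complexity score from an assumed JSON response shape."""
    return float(json.loads(body)["score"])


if __name__ == "__main__":
    req = request.Request(
        API_URL,
        data=build_payload("An example sentence to score."),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        print(parse_response(resp.read().decode("utf-8")))
```

Scoring a whole reference set is then just a loop over your texts, keeping the highest score as the bar to beat.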