
Natural Language Processing (NLP) is a subfield of Artificial Intelligence that has existed for some time. In recent years the field has developed rapidly: AI models can now not only analyze human language but also generate it. Several so-called language models can generate human-like text. Probably the best-known example is GPT-3, created by OpenAI, which shows what is currently possible in language generation.

The impressive capabilities of GPT-3 prompted us to start a project to learn more about language generation. Our goal is to take an existing language model and fine-tune it for another task. The Hugging Face 🤗 infrastructure seemed the best fit for this project. At Cmotions we are big fans of the Hugging Face infrastructure and services. This data science platform and community provides many integrated tools to build, train, or deploy models based on open-source code. More generally, it is a place where engineers, scientists, and researchers gather to share ideas, code, and best practices, to learn, to get support, or to contribute to open-source projects. In short, Hugging Face is the “home of machine learning”.

The main focus of Hugging Face is Transformers, a library that provides the community with APIs to access and use state-of-the-art pre-trained models from the Hugging Face Hub. Reusing existing state-of-the-art models instead of training your own from scratch lowers computing costs and reduces the carbon footprint. Whether you are looking for high-performance natural language understanding and generation or for computer vision and audio tasks, this is the place to go.
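As an illustration of what using a pre-trained model from the Hub can look like, here is a minimal sketch that assumes the `transformers` package and a backend such as PyTorch are installed. The function name `generate_text` and its default values are our own choices for this example, and `gpt2` is used only because it is a small, public model.

```python
def generate_text(prompt: str, model_name: str = "gpt2", max_new_tokens: int = 40) -> str:
    """Continue `prompt` with a pre-trained causal language model from the Hub."""
    # Imported inside the function so this file can be loaded before transformers is installed.
    from transformers import pipeline

    # The model is downloaded on first use and cached locally afterwards.
    generator = pipeline("text-generation", model=model_name)
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    # The pipeline returns a list with one dict per generated sequence.
    return outputs[0]["generated_text"]
```

Calling `generate_text("Yesterday, all my troubles")` returns the prompt followed by a sampled continuation, so the result differs from run to run.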

Models that you have trained or datasets that you have created can easily be shared with the community on the Hugging Face Hub. Models are hosted as Git repositories, with all the benefits of versioning and reproducibility. Sharing a model or dataset on the Hub means that anyone can use it without having to train a model of their own. Isn’t that great? As a bonus, sharing a model on the Hub automatically deploys a hosted Inference API for it, a sort of instant demo.

Our Beatles project

Now back to our project: what did we do? We wanted to see whether we could train an existing language model so that it produces Beatles-like text, maybe even Beatles-like songs! First, we searched for suitable language models. From the Hugging Face Hub we chose GPT-Medium, GPT-Neo-125m and Bloom-560m, all pre-trained language models shared by members of the Hugging Face community. To teach these pre-trained models to write like the Beatles, they needed additional training on Beatles lyrics. We therefore created a dataset with all Beatles lyrics and fine-tuned the models on it. We put the newly trained models on the Hugging Face Hub, so that both we and the community can use them for text generation. With these models, we were able to generate lyrics based on an input prompt. We evaluated several models and compared settings to see which gave the best result. In the figure below, an example of a generated text is shown.
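The fine-tuning step can be sketched roughly as follows. This is not the code we used for the project. It is a minimal illustration that assumes the `transformers` and `datasets` packages and PyTorch are installed, and that the lyrics are available as a list with one string per song. The function name `fine_tune`, the base model `gpt2`, the output directory and all hyperparameter values are placeholders for this example.

```python
from typing import List


def fine_tune(lyrics: List[str], base_model: str = "gpt2", output_dir: str = "beatles-model") -> None:
    """Fine-tune a causal language model on song lyrics (one string per song)."""
    # Heavy dependencies are imported lazily; they are assumed to be installed.
    from datasets import Dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    # GPT-2 style tokenizers have no padding token, so reuse the end-of-text token.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base_model)

    def tokenize(batch):
        # Truncate long songs so every example fits the model's context window.
        return tokenizer(batch["text"], truncation=True, max_length=512)

    dataset = Dataset.from_dict({"text": lyrics})
    dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

    # mlm=False selects causal language modelling: labels are copied from the inputs.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,  # illustrative values, not the settings used in the project
        per_device_train_batch_size=4,
    )
    trainer = Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator)
    trainer.train()

    # Save locally; after logging in, trainer.push_to_hub() would upload the model to the Hub.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
```

The saved directory can afterwards be loaded by name in the same way as any model from the Hub.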


But what are song lyrics without a song title? So we extended our scope and used the summarization model ‘story-to-title’ to create a title for the generated lyrics as well. This model, created by Caleb Zearing, is based on a T5 model and was trained on a large collection of movie descriptions and titles. It proved valuable for our Beatles song titles: given generated lyrics, it generates a corresponding title.
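Generating a title from lyrics with a T5-style model could look like the sketch below. It assumes `transformers` and PyTorch are installed. The Hub id `czearing/story-to-title` is our assumption about where the model lives, and the function name and generation settings are illustrative only.

```python
def generate_title(lyrics: str, model_name: str = "czearing/story-to-title") -> str:
    """Generate a short title for `lyrics` with a sequence-to-sequence model."""
    # Imported lazily; transformers and PyTorch are assumed to be installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Truncate long lyrics so the input fits the encoder.
    inputs = tokenizer(lyrics, return_tensors="pt", truncation=True, max_length=512)
    # Beam search with a small token budget, since a title is only a few words long.
    output_ids = model.generate(**inputs, max_new_tokens=16, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```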

Only one thing was still missing: a cover. Wouldn’t it be great if we could generate an album cover image based on the title? With the introduction of so-called Stable Diffusion models, a text prompt can be used to generate a detailed image. Stable Diffusion is a deep learning text-to-image model released in 2022. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and image-to-image translation guided by a text prompt.
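A rough sketch of how a title and a style can be turned into a cover image is shown below. It assumes the `diffusers` package and PyTorch are installed. The helper names, the prompt wording and the `model_id` argument are our own choices for this example, and `model_id` has to be the id of a Stable Diffusion checkpoint on the Hub that you have access to.

```python
def build_cover_prompt(title: str, style: str) -> str:
    """Combine a song title and a visual style into one text prompt."""
    return f"album cover for the song '{title.strip()}', {style.strip()} style"


def generate_cover(title: str, style: str, model_id: str):
    """Return a PIL image generated from the title and style."""
    # Imported lazily; diffusers and PyTorch are assumed to be installed.
    from diffusers import StableDiffusionPipeline

    # Loads the checkpoint on CPU by default, which works but is slow.
    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    result = pipe(build_cover_prompt(title, style))
    # The pipeline output holds a list of PIL images, one per prompt.
    return result.images[0]
```

Keeping the prompt construction in its own function makes it easy to experiment with the wording without reloading the model.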

Putting it all together in a Hugging Face Space

When you are ready to show the world what you have made, Hugging Face Spaces makes it easy to present your work. Spaces are a great way to showcase your work to a non-technical audience. In our case we used the Gradio library to build our demo. Gradio wraps a Python function in a user interface, and the result can be launched in a Jupyter notebook, in a Colab notebook, on your own website, or hosted on Hugging Face Spaces.

We wanted to combine the lyrics, title and cover generation functions into a good-looking interface, and Gradio has all we need to showcase what we have done. In the example code snippet below we simply call the Gradio Interface constructor.

gr.Interface(fn=generate_all  # placeholder name for the lyrics, title and cover function
             , inputs=[input_box, temperature, top_p, given_input_style]
             , outputs=[gen_title, gen_lyrics, gen_image, gen_image_style]
             , title=title
             , css=css
             , description=description
             , article=article
             , allow_flagging='never'
             )

The function passed to the interface is the one we wrote to generate the lyrics, title and cover. The inputs consist of the input prompt, some model parameters (temperature and top_p), and a list of styles to choose from for the album cover. Lastly, the outputs are defined. That is all there is to it. Our Beatles lyrics demo can be found here, as well as the code to reproduce it.
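For readers who want to try this pattern locally, the interface call can be filled out into a small runnable sketch. It assumes the `gradio` package is installed. The component types, labels, value ranges and style choices are our guesses for illustration, and `generate_beatles` is a placeholder that only echoes its inputs instead of calling any model.

```python
from typing import Optional, Tuple


def generate_beatles(prompt: str, temperature: float, top_p: float, style: str) -> Tuple[str, str, Optional[object], str]:
    """Placeholder for the real generation function: returns title, lyrics, image, style."""
    lyrics = f"{prompt} (placeholder lyrics, temperature={temperature}, top_p={top_p})"
    title = prompt.strip().title() or "Untitled"
    # None leaves the image output empty; a real function would return a generated image.
    return title, lyrics, None, style


def build_demo():
    """Create the Gradio interface; call .launch() on the result to start it."""
    # Imported lazily; gradio is assumed to be installed.
    import gradio as gr

    # Input components, in the same order as the function's arguments.
    input_box = gr.Textbox(label="Input prompt")
    temperature = gr.Slider(minimum=0.1, maximum=1.5, value=0.7, label="Temperature")
    top_p = gr.Slider(minimum=0.1, maximum=1.0, value=0.9, label="Top-p")
    given_input_style = gr.Dropdown(
        choices=["pop art", "watercolor", "photo"], value="pop art", label="Cover style"
    )

    # Output components, in the same order as the function's return values.
    gen_title = gr.Textbox(label="Title")
    gen_lyrics = gr.Textbox(label="Lyrics")
    gen_image = gr.Image(label="Cover")
    gen_image_style = gr.Textbox(label="Style used")

    return gr.Interface(
        fn=generate_beatles,
        inputs=[input_box, temperature, top_p, given_input_style],
        outputs=[gen_title, gen_lyrics, gen_image, gen_image_style],
        title="Beatles lyrics generator",
    )
```

Running `build_demo().launch()` starts a local web server with the interface. Replacing the placeholder with real generation functions does not require any change to the interface code, as long as the arguments and return values keep the same order.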

Principal Consultant & Data Scientist
Simone Spierings
Senior Consultant