Tell me what you read: online reading segmentation for marketing personalization

12 February 2019

During the MIE 2019 event on 7 February, Margot Rozendaal (De Persgroep) and Jurriaan Nagelkerke (Cmotions) explained how De Persgroep personalizes content for online readers by using online reading segmentation. In their presentation, the duo showed which segmentations De Persgroep uses to create more relevance for its readers, and how behavioural segmentation brings the online user closer to the business. Topic modelling and segmentation techniques were used to create nine different online reading profiles. These segments are used in the online channels to approach visitors with offers that match their news interests.


The reason

De Persgroep (AD, Volkskrant, Trouw, various regional titles) is in the middle of a digital transition. In 2018 alone, De Persgroep saw the number of online subscribers increase by 86%. With a monthly total of 28 million articles read, consumers clearly know how to find the different news platforms. About 1.3 million subscribers are presented daily with all kinds of news, from current affairs in international politics to the latest showbiz news. That means a lot of data is available: millions of visitors each week, billions of clicks on articles, information on reading time and the type of device that was used. This results in about 70 GB of usable data each day.


Optimization of the Customer Life Cycle

The Customer Intelligence team at De Persgroep uses a large number of data sources to provide various internal customer groups with insights and data products. Online data is increasingly used for this, among other things to optimize the Customer Life Cycle. Most visitors are anonymous. Data from non-anonymous visitors can be combined with CRM data, such as information about current subscriptions, the way a subscriber was recruited and data about the household. All this data is used in a number of data products, such as dashboards, reports and models. The models try, for example, to predict behaviour, such as the ideal timing for cross-selling or renewing a subscription.


Segmentations within the Customer Life Cycle

To optimize the customer life cycle, De Persgroep also makes use of segmentations. In essence, this is about creating groups of customers that are internally homogeneous, yet heterogeneous in relation to each other. It is important to determine beforehand which insight you want to gain from a segmentation, Margot emphasizes. On the one hand, this dictates the data you need to put in; on the other hand, it determines what you can do with the result. Margot compares this to making pasta: once you have made spaghetti from flour and eggs, you can never turn it into lasagna. This is the reason De Persgroep uses multiple segmentations.


Need-driven segmentation

The first segmentation Margot discussed is a need-driven segmentation. It is based on a questionnaire that was sent to 5,000 consumers. The questionnaire contained questions about the participants’ views on current affairs, their news consumption, how often they watch the news, their viewing behaviour on other media, etc. This resulted in four different segments: experts, fans, addicts and passives.
The insight into the need-driven segments is used for developing ‘above the line’ communication for customers, geared to their intentions for consuming news. At the same time, a need arose within the business, which resulted in a request to the Customer Intelligence team to deliver these segments in selections. An example is a selection of the ‘experts’ for De Volkskrant at household level.


Intention vs. actual behaviour

The need-based segment is only known for people who filled out the questionnaire. This is why Margot and Jurriaan tried to predict the need segments in a number of different ways. Based on sociodemographic data, subscription data and online clicks, they examined whether the segments showed differences. If so, it would be possible to predict for anyone who did not fill out the questionnaire whether they are an ‘addict’, an ‘expert’, a ‘fan’ or a ‘passive’. Unfortunately, the segments hardly differed on the characteristics that were examined. Only one segment, the ‘passives’, showed some distinguishing differences. For De Persgroep, this is not exactly a segment that yields a large turnover…
The conclusion was that there is a difference between the intentions of subscribers and their actual behaviour. When you are asked what you would like to read about the United Kingdom, you might say that you want to follow the news about Brexit. In reality, it turns out that you mostly like reading about Prince Harry and Princess Meghan.


Preferences of online visitors

To address all online visitors based on their preferences, a segmentation based solely on online behaviour was chosen. This resulted in a combination of three segmentations:

  • ‘How’ segmentation: How do you read news? On a smartphone, tablet or desktop?
  • ‘When’ segmentation: When do you read the news? During office hours, or rather in the evening?
  • ‘What’ segmentation: What are you actually reading? Which topics have your interest?
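
As an illustration of the ‘How’ and ‘When’ questions above, the following minimal Python sketch derives a dominant device and a dominant part of the day per visitor from a click log. The click records, the field layout, and the daypart boundaries (office hours from 9:00 to 17:00) are assumptions made for this example only; the presentation did not describe how De Persgroep defines them.

```python
from collections import Counter

# Hypothetical click log: (visitor_id, device, hour_of_day).
clicks = [
    ("v1", "smartphone", 10),
    ("v1", "smartphone", 12),
    ("v1", "desktop", 14),
    ("v2", "tablet", 20),
    ("v2", "tablet", 21),
    ("v2", "desktop", 11),
]


def daypart(hour):
    """Map an hour (0-23) to a part of the day (assumed boundaries)."""
    if 9 <= hour < 17:
        return "office hours"
    if 17 <= hour < 24:
        return "evening"
    return "night/early morning"


def most_common(values):
    """Return the value that occurs most often in the list."""
    return Counter(values).most_common(1)[0][0]


def how_when_profile(click_log):
    """Summarize per visitor how (device) and when (daypart) they read."""
    devices, dayparts = {}, {}
    for visitor, device, hour in click_log:
        devices.setdefault(visitor, []).append(device)
        dayparts.setdefault(visitor, []).append(daypart(hour))
    return {
        visitor: {
            "how": most_common(devices[visitor]),
            "when": most_common(dayparts[visitor]),
        }
        for visitor in devices
    }


profiles = how_when_profile(clicks)
for visitor, profile in profiles.items():
    print(visitor, profile)
```

Taking the single most frequent value is the simplest possible summary; a real segmentation would more likely keep the full distribution over devices and dayparts per visitor.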

Jurriaan elaborated on the ‘What’ segmentation. A segmentation based on reading interests was made in Python in two steps, using text analysis:

  1. What are the articles about? (Topic modelling)
  2. What are visitors reading about? (Cluster analysis)


Topic modelling

In his presentation, Jurriaan explained why existing classifications of articles, such as the section of the website in which an article is placed, could not serve as a basis for the segmentation of online visitors. This is why topic modelling was used to identify the topics and to link all articles to these topics. Topic modelling is a statistical technique to identify topics in a collection of documents, in this case news articles. The analysis of 400,000 articles eventually yielded 30 reading interests. These interests are clearly distinguishable from each other, are at a similar level of detail and are independent of time. This is important because for the current application (segmenting online visitors on reading interests based on their long-term reading behaviour) De Persgroep is looking for a trusted tool that will remain usable for a long time.
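
To make the idea of topic modelling concrete, here is a minimal Python sketch of one common technique: Latent Dirichlet Allocation (LDA), fitted with collapsed Gibbs sampling. The presentation did not state which algorithm was used, so LDA is an assumption here, as are the function name fit_lda, the hyperparameters and the four toy articles. At the scale of 400,000 articles one would use an optimized library rather than this pure-Python loop.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling; docs are lists of tokens."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables, filled from a random initial topic for every token.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    for d, doc in enumerate(docs):
        assignments.append([])
        for word in doc:
            k = rng.randrange(n_topics)
            assignments[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w = word_id[word]
                k = assignments[d][i]
                # Remove the token from the counts, then resample its topic.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic distribution per document; each row sums to 1.
    theta = [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab


# Toy articles, invented for this example.
articles = [
    "election parliament minister vote coalition".split(),
    "minister vote debate parliament election".split(),
    "match goal striker league coach".split(),
    "coach league goal match transfer".split(),
]
theta, topic_word, vocab = fit_lda(articles, n_topics=2)
dominant = [max(range(2), key=lambda t: row[t]) for row in theta]
print(dominant)
```

Each row of theta is the topic mix of one article; linking an article to its dominant topic, as in the last lines, is one simple way to give every article a topic label.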


Cluster analysis

Once the articles have been assigned topics, a reading profile can be made for each online visitor. Based on these reading profiles, a cluster analysis (k-means) resulted in nine segments of reading interests. The reading interests within a segment are as homogeneous as possible, while the segments differ from each other as much as possible.
The biggest segment is current affairs (Nieuws van de Dag), a group of readers who return often to follow the latest news. It is the biggest segment for all national and regional titles. Although all segments occur at all titles, their relative sizes clearly differ per title. Titles like De Volkskrant and Trouw have more readers in the Current Affairs segment, while for AD and the regional newspapers the Sport segment attracts more readers. Surprisingly, the segment Cultuur en Vermaak (Culture and Entertainment) is relatively large at the more ‘serious’ newspaper De Volkskrant.
Extensive profiling makes the different segments relevant for the readers. Jurriaan indicated that online visitors are currently assigned to the segments in real time, so the customer experience can be adapted accordingly. Monitoring the behaviour of the segments with dashboards provides insight into shifting behaviour. It also makes clear whether or not personalization based on these insights works.
After some interesting discussions about the various forms of segmentation and the online behavioural segmentation, Margot and Jurriaan ended their presentation with three tips for the audience:
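
The step from topic-labelled articles to reading segments can be sketched in Python as follows. This is a minimal illustration, not the implementation used by De Persgroep: it uses three topics and two clusters instead of the 30 reading interests and nine segments mentioned in the presentation, and the visitor counts and the function names (reading_profile, kmeans, assign_segment) are invented for the example. The assign_segment helper shows one plausible way to place a new visitor in an existing segment, namely by nearest centroid.

```python
import random

TOPICS = ["politics", "sport", "culture"]

# Hypothetical number of articles read per topic, per visitor.
reads = {
    "v1": {"politics": 8, "sport": 1, "culture": 1},
    "v2": {"politics": 7, "sport": 2, "culture": 1},
    "v3": {"politics": 9, "sport": 1},
    "v4": {"politics": 1, "sport": 8, "culture": 1},
    "v5": {"politics": 2, "sport": 7, "culture": 1},
    "v6": {"sport": 9, "culture": 1},
}


def reading_profile(topic_counts, topics):
    """Turn raw counts into the share of reading per topic."""
    total = sum(topic_counts.values())
    return [topic_counts.get(t, 0) / total if total else 0.0 for t in topics]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_segment(profile, centroids):
    """Return the index of the nearest centroid."""
    return min(
        range(len(centroids)),
        key=lambda c: squared_distance(profile, centroids[c]),
    )


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm) on lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [assign_segment(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


profiles = [reading_profile(reads[v], TOPICS) for v in reads]
labels, centroids = kmeans(profiles, k=2)
print(dict(zip(reads, labels)))

# Assign a new visitor to the nearest existing segment.
new_visitor = reading_profile({"politics": 15, "sport": 3, "culture": 2}, TOPICS)
print(assign_segment(new_visitor, centroids))
```

Using shares rather than raw counts keeps heavy and light readers with the same interests in the same segment, which fits the goal of segmenting on what visitors read rather than how much.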

Tip 1:
Don’t forget the pasta! (Make sure the type of segmentation fits your goal. Once you have chosen spaghetti…)

Tip 2:
Segmentation = More Art than Science. (There is no ‘best solution’ for optimization based solely on data. There is a best solution based on application. Be creative!)

Tip 3:
Test, test, test (Segments offer the opportunity to increase relevance. It only takes the right content.)
