Exploring data science and machine learning in open council information

19 March 2018

VNG Realisatie (the association of municipal councils in the Netherlands) invited Cmotions to an orientation day at the Data Science Hub in Den Bosch. We were welcomed at a beautiful former monastery overlooking the canal by Pim Bliek, the Open Accountability Project Manager at VNG Realisatie. The mission for the day was to explore how far data science can add value in strengthening the monitoring role of municipal councils.

 

Council Information

Civil servants work hard every day to support and improve their municipality. These civil servants are directed by the Council Secretary, who in turn reports to the Board of the Mayor and Aldermen. It is the municipal council’s job to monitor whether they are doing their jobs properly. To do so, councillors have to work through large quantities of documents, known as council information. Indicators are drawn up for the implementation of laws, and the councillors scrutinise these closely to assess performance. For example, the government drew up a series of indicators on the Public Records Act and the Children Act, which municipalities have to meet.

 

Data science as added value

The question at this orientation day was whether we could apply data science and machine learning to go through all reports (council information) automatically and then read the indicators from them automatically. We soon concluded that, although machine learning, and deep learning in particular, is currently making great leaps forward in text analysis, this would still be too much to ask at this stage. For the rest of the day we focused on developing a conceptual model of where data science can offer added value in the short term and with easily achievable effort.

Together we came up with the idea of first determining the context before reading the indicators. The aim is therefore not to read the indicators automatically, but to make it possible for councillors and other clients to do this efficiently and effectively themselves.

We concluded that it would be best to focus first on reducing the amount of irrelevant information and surfacing more relevant information, so that councillors would not have to read 60 articles across 60 pages, but five articles across five pages.

 

Using the search engine as a channel

We felt the search engine was the ideal channel for this, given that it is the first channel councillors access to retrieve their information. We proposed three steps for this:

  1. Use text analysis to improve the search engine:
    a. There is likely to be a Pareto effect: 20% of the topics account for 80% of the search queries. Synonyms, however, pose problems for searching. For example, Children’s Act, childcare, toddler law and after-school supervision are all related terms. Start by manually compiling dictionaries based on knowledge of jargon and frequently occurring combinations of words.
    b. This process can subsequently be improved with topic modelling.
    c. Add metadata to articles, such as recognised entities and the topics assigned by the topic model.
  2. Find other relevant articles by developing recommendation systems:
    a. content-based recommenders: “the following articles are about the same subject:”
    b. user-based recommenders: “other people also read”
  3. Reduce the amount of content in articles:
    a. summarise text
    b. “the indicator appears on page 56”

With the input from this interesting day, VNG Realisatie can get to work looking into further possibilities. We were delighted to be able to play our part in this orientation day on data science!
