How to Promote Awareness and Integrate Ethics into Your Data-Driven Organization
Businesses have long been unable to function without data: concepts such as data literacy and data-driven working have become commonplace, not to mention machine learning and AI. Organizations rely more and more on data when making day-to-day decisions. As a result, attention to making fair and responsible data-driven decisions is growing. Rightly so, because the harm caused by unethical decisions can be considerable for the people affected and, by extension, for your own organization.
A well-known challenge is implementing data-driven processes in an ethical manner, which extends far beyond analyzing the results of predictive models. Ethical practices should be incorporated into every step of the process, making it the responsibility of every employee. A deep understanding of ethical data-driven practices is crucial for all members of an organization.
Ethics as a concept can be closely tied to regulation such as the General Data Protection Regulation (GDPR, known in Dutch as the AVG). Examples include gaining access to the data in the first place (who, or which role, may view which data?), as well as which data may be used and for how long. In this article we deliberately do not focus on this specific regulation; instead, we emphasize ethics as an ongoing process.
We will guide you through a step-by-step approach to embedding ethics as a design principle in your data-driven processes, using the widely recognized CRISP-DM framework. This article offers an overview of CRISP-DM, focusing on the ethical considerations within each step, with subsequent articles diving deeper into each phase.
CRISP-DM: A Foundation for Ethical Data-Driven Processes
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a framework for data-driven processes. It offers a structured approach and is geared toward collaboration between the different roles involved in such a process. CRISP-DM is widely used, but out of the box it offers no guidance on ethics. If you want to know more about CRISP-DM, read this article, which covers it in more detail.
Rather than creating lengthy, seldom-used documents, we advocate for concise, actionable guidelines that facilitate ethical discussions and address key concerns for future projects.
Business understanding
The first step involves defining the project’s objectives and expected outcomes, ensuring that a broad range of stakeholders from various departments and roles are involved. A key ethical question to consider here is:
Who is the end user, how will they be affected by the analysis/model, and what unintended consequences might arise?
When answering this question, it will become clear whether ethical objections can be raised against the intended goal. Sub-questions that can help with this include, for example:
- What is the goal of the project, and could it raise any ethical concerns?
- What data is needed, and has it been used in previous projects with any identified issues?
- Was the data collected for a purpose aligned with this project’s objectives?
- How will the results be applied in practice, and what impact could this have on the end user?
- How do we mitigate potential biases in the data, and how will we evaluate our results?
Ethical oversight must be a continuous process, not limited to a single assessment at the project’s outset.
Data understanding
This phase involves exploring and comprehending the data’s relevance to the project, with the data analyst playing a key role in evaluating data quality. Central to this is the principle of data minimization—using the least sensitive data necessary to achieve the desired outcome. Ethical considerations at this stage include:
What biases might already exist in the data, and is it still suitable for use? Is the data fairly distributed and representative of the target group?
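One way to make the representativeness question concrete is to compare each group's share in the dataset with its share in the target group. The sketch below is only an illustration: the age bands, the counts, and the reference shares are hypothetical, and it assumes that trustworthy reference shares for the target group are available at all.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Return (share in sample - share in reference) for each reference group."""
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical data: age bands in the dataset versus assumed shares in the target group.
sample = ["18-34"] * 60 + ["35-54"] * 30 + ["55+"] * 10
reference = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

gaps = representation_gaps(sample, reference)
for group, gap in gaps.items():
    # A positive gap means the group is over-represented in the dataset.
    print(f"{group}: {gap:+.2f}")
```

A large positive or negative gap does not by itself make the data unusable, but it is a concrete finding to bring back to the stakeholders from the business understanding step.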
Other critical questions include:
- Are there known biases, and what measures can be taken to minimize them?
- Are there tests we can run to assess the influence of bias on the final results?
- Could the data contain hidden information that might lead to unintended biases? (It is well known, for example, that a postal code area can also represent income levels.)
- Does the data reflect the intended target group accurately?
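The postal code example above can be tested rather than assumed. A minimal sketch, using the correlation ratio (eta squared) as one possible measure of how much of the variation in a sensitive numeric attribute is explained by a categorical feature. The choice of measure, the column names, and the values are assumptions made for illustration; a value close to 1 suggests the feature acts as a proxy.

```python
from collections import defaultdict
from statistics import mean


def correlation_ratio(categories, values):
    """Eta squared: share of the variance in `values` explained by `categories` (0 to 1)."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    overall = mean(values)
    total_ss = sum((v - overall) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # no variation to explain
    between_ss = sum(len(vs) * (mean(vs) - overall) ** 2 for vs in groups.values())
    return between_ss / total_ss


# Hypothetical records: postal code area and yearly income in thousands.
postal_codes = ["1011", "1011", "1011", "9999", "9999", "9999"]
incomes = [20, 22, 24, 60, 62, 64]

eta_squared = correlation_ratio(postal_codes, incomes)
print(f"Share of income variance explained by postal code: {eta_squared:.2f}")
```

In this made-up example nearly all income variation is explained by postal code, so dropping the income column while keeping the postal code would not remove the income signal from the data.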
As with all stages, ethical concerns related to business goals should be revisited.
Data preparation
In this stage, the data analyst prepares the data for the next phases. The challenges identified in the previous step are resolved where possible, and measures are taken to minimize those that remain. The goal is to create a reliable dataset with as little bias as possible for the next phase. The analyst must consider how bias can be removed from the data. For example, she focuses on the question:
Are you aware of unavoidable biases, and what steps are being taken to mitigate them?
It is important to document the potential risks arising from the known bias and how the results from this step will be applied in the workflow. It is also essential to record the data preparation process itself, so that the decisions made along the way remain transparent. Finally, the analyst must ensure that the preparation process does not introduce new biases.
Other important questions in this phase include:
- Which data needs to be pseudonymized before it can be used?
- How do you ensure that data cannot be traced back to an individual after anonymization?
- Does the preparation process unintentionally create bias?
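The first two questions can be explored with a small experiment. The sketch below assumes a keyed hash (HMAC-SHA256) as the pseudonymization technique and a k-anonymity style count as the re-identification check; both are illustrative choices rather than a prescribed method, and the field names, records, and key are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash; the same input always gives the same token."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())


# Placeholder key: in practice it would come from a secret store, not from source code.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical customer records.
records = [
    {"customer_id": "C001", "postal_code": "1011", "birth_year": 1980},
    {"customer_id": "C002", "postal_code": "1011", "birth_year": 1980},
    {"customer_id": "C003", "postal_code": "9999", "birth_year": 1955},
]

prepared = [{**r, "customer_id": pseudonymize(r["customer_id"], SECRET_KEY)} for r in records]
k = smallest_group_size(prepared, ["postal_code", "birth_year"])
print(f"Smallest group sharing postal code and birth year: {k}")
```

Here the smallest group has size 1: even with the customer ID replaced, one person is unique on postal code and birth year and could be traced back. Replacing identifiers is therefore pseudonymization, not anonymization, and the remaining columns need their own review.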
Modeling
In the Modeling step, it is again primarily the data analyst who takes the lead. Don’t be misled by the name “Modeling”: this step is not only about developing predictive models. It can also involve creating reports and descriptive analyses, or leveraging generative AI (such as GPT or other large language models). The central question here is:
Does the product produce undesirable, unfair, or biased results, and do I understand how my model arrives at its predictions?
At this stage, it is important to continually verify whether the analyst has validated all her assumptions. Some in-depth questions that can assist in this process include:
- Which principles can I use to test my output for bias and ethics? Read more here about how you can do this with the Python package Fairlearn.
- Can I develop unit tests that provide insight into specific examples and the outcomes they would lead to (e.g., testing a facial recognition model by showing both men and women)?
- What metric should I use to select the best model? What are the implications of choosing this metric? (e.g., if you are predicting cancer, you may accept more ‘false positives’ than if you are predicting customer churn).
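The questions above come together when model output is broken down per group. The sketch below computes the selection rate (share of positive predictions) and the recall for each group by hand, plus the gap in selection rates between groups. Libraries such as Fairlearn offer per-group metrics of this kind; this version deliberately uses only the standard library, and the labels, predictions, and group values are made up.

```python
from collections import defaultdict


def group_metrics(y_true, y_pred, groups):
    """Selection rate and recall per group for binary labels (1 = positive)."""
    stats = defaultdict(lambda: {"n": 0, "pred_pos": 0, "actual_pos": 0, "true_pos": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        s = stats[group]
        s["n"] += 1
        s["pred_pos"] += int(pred == 1)
        s["actual_pos"] += int(truth == 1)
        s["true_pos"] += int(truth == 1 and pred == 1)
    return {
        group: {
            "selection_rate": s["pred_pos"] / s["n"],
            # Recall is undefined for a group without actual positives.
            "recall": s["true_pos"] / s["actual_pos"] if s["actual_pos"] else None,
        }
        for group, s in stats.items()
    }


# Hypothetical labels, predictions, and a sensitive attribute.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]

metrics = group_metrics(y_true, y_pred, groups)
rates = [m["selection_rate"] for m in metrics.values()]
parity_gap = max(rates) - min(rates)

for group, m in metrics.items():
    print(group, m)
print(f"Selection rate gap between groups: {parity_gap:.2f}")
```

A check like this can be turned into an automated test by asserting that the gap stays below a threshold the team has agreed on. Which metric to guard depends on the cost of each type of error: in the cancer example a low recall for one group is the more serious finding, while for churn a difference in selection rate may matter more.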
Evaluation
In this step, you evaluate the performance and outcomes of the model or analysis, and determine to what extent they meet the objectives set in step one, the business understanding. Here, you should challenge yourself and the team with the question: do I want to publish this, and what would the reactions be? This also brings back the earlier question:
What is the impact of the outcome of an analysis or model on the user?
Further questions include:
- Has unintended bias been introduced in the data or model?
- Are the results transparent and explainable?
- What are the consequences of publishing it?
Deployment
This step involves the implementation and launch of a product. Be transparent about any limitations the model may have that could introduce bias. These should have been identified and documented in the previous steps. It is also important to closely monitor how the product evolves over time and whether any unintended side effects or outcomes emerge later on. If such issues arise, is it clear and transparent where they stem from and how they can be addressed? One of the key monitoring questions that must continue to be asked after the product goes live is:
Are adjustments needed in earlier steps?
Be aware that new factors can influence the outcome of the application and may still cause unintended impacts, such as affecting the privacy of the end user. Therefore, consider questions such as:
- What are the long-term consequences of the choices made and the use of the data?
- How is the data stored, and what impact will this have on its usage in the future?
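Monitoring after go-live can start with something as simple as comparing the makeup of incoming data with the data the product was built on. The sketch below uses the population stability index (PSI) for this. PSI is one common choice and is assumed here for illustration; the age bands and shares are hypothetical, and the threshold for raising an alert is left to the team.

```python
import math


def population_stability_index(baseline_shares, live_shares, floor=1e-6):
    """Sum of (live - baseline) * ln(live / baseline) over all categories."""
    psi = 0.0
    for category in set(baseline_shares) | set(live_shares):
        # Floor missing or zero shares so the logarithm stays defined.
        base = max(baseline_shares.get(category, 0.0), floor)
        live = max(live_shares.get(category, 0.0), floor)
        psi += (live - base) * math.log(live / base)
    return psi


# Hypothetical shares per age band: training data versus last month's production data.
baseline = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
last_month = {"18-34": 0.50, "35-54": 0.40, "55+": 0.10}

psi = population_stability_index(baseline, last_month)
print(f"PSI for age band: {psi:.3f}")
```

A PSI of zero means the distributions are identical; the larger the value, the further the live population has moved from the baseline. A clear shift is a signal to revisit the earlier steps, starting with whether the data still reflects the intended target group.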
Conclusion
This first article provides practical guidance for fostering awareness and incorporating ethics into your organization’s data-driven processes. Our key message is that ethical behavior should permeate every aspect of the organization, rather than resting on one individual or department.
In our upcoming articles, we will delve deeper into the various roles within an organization and further elaborate on the model outlined above. For instance, how do you initiate and navigate the conversation around ethics from different roles?