Understanding AI Bias

Chris Mauck

June 7, 2024 • 4+ minute read


Originally appeared in LinkedIn Future Singularity

As artificial intelligence (AI) systems grow increasingly common in decision-making processes that affect people's lives, we must understand the various types of bias that can arise in machine learning models. AI, like humans, can exhibit biases; however, AI bias is encoded mathematically in training data and model design rather than arising from cognitive biases and heuristics. If left unaddressed, AI bias can produce discriminatory and harmful outcomes that perpetuate societal stereotypes.

Bias in machine learning refers to the systematic errors introduced by overly simplistic assumptions, which cause a model to miss the underlying patterns in data. In contrast, variance describes a model's sensitivity to noise or random fluctuations in the training data, which can lead to overfitting.

Types of Bias

There are several key categories of bias in AI systems:

Algorithmic bias: Arises from a flaw in the algorithm that carries out the underlying computations. Even if the training data is neutral, poor algorithmic design or implementation can still lead to biased outcomes.

Sample bias: The training data is too small or not representative of the entire population, resulting in skewed results. This can happen when data is scarce or skewed toward specific groups, producing AI systems that perform well for some groups but poorly for others; a simple representation check is sketched after this list.

Prejudice bias: Real-world human biases and stereotypes are baked into the AI system via training data. This occurs when data reflects societal preconceptions, which the AI subsequently learns and reinforces.

Measurement bias: Arises from inaccurate or unbalanced procedures for measuring and collecting data. This can happen when the tools or methods used to gather data are themselves flawed or biased.

Exclusion bias: Occurs when important data points or groups are systematically excluded from the training data. This results in a lack of representation of certain groups or circumstances in the AI system's decision-making process.

Selection bias: The data chosen for training is too small or does not represent the full distribution. This can produce skewed results and poor generalization.

Recall bias: Concerns variations in how human annotators label or categorize training data. This might occur due to subjective interpretations or errors in judgment, making the training labels less reliable.
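
Several of these categories, particularly sample, selection, and exclusion bias, can be caught early by comparing the makeup of the training set against a trusted external reference such as census figures. A minimal sketch in Python follows; the group column, the data, and the population shares are all hypothetical.

```python
# A minimal representation check for sample/selection/exclusion bias.
# The "group" column, the data, and the population shares are hypothetical.
import pandas as pd

# Hypothetical training data with a demographic attribute.
train = pd.DataFrame({"group": ["A", "A", "A", "A", "A", "A", "B", "B"]})

# Reference shares from an external source, e.g. census data (assumed).
population_share = {"A": 0.60, "B": 0.40}

train_share = train["group"].value_counts(normalize=True)
for group, expected in population_share.items():
    observed = train_share.get(group, 0.0)
    # Flag groups whose share deviates noticeably from the reference;
    # the 10-point tolerance here is an arbitrary illustration.
    if abs(observed - expected) > 0.10:
        print(f"group {group}: {observed:.0%} of training data vs "
              f"{expected:.0%} of population -- possible sampling bias")
```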

Bias and Variance

While eliminating bias is crucial for AI fairness, we also have to keep variance from becoming excessive. Variance refers to a model's tendency to capture noise and random fluctuations in the training data rather than the genuine underlying patterns. High variance causes overfitting: the model performs well on the data it was trained on but fails to generalize to new samples.

There is an inherent tradeoff between a model's bias and variance. Simple linear models have high bias but low variance; they routinely miss nonlinear relationships in the data. Highly complex models, such as deep neural networks, have low bias but high variance; they may overfit idiosyncrasies of the training data.

The goal is to strike the most effective balance of bias and variance for a given predictive task. Cross-validation, which trains a model on one data split and evaluates it on another, can help determine whether bias or variance is the larger source of error. Addressing high bias may mean adding features or interactions, or switching to a more flexible model type. High variance is often reduced by adding more training data, applying regularization techniques such as dropout, or adjusting pruning parameters.
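
As a rough illustration of that diagnostic, the sketch below compares training and cross-validation scores for a deliberately simple model and a deliberately flexible one on synthetic nonlinear data. It assumes scikit-learn; the dataset and model choices are illustrative only.

```python
# Diagnosing bias vs. variance by comparing train and validation scores.
# Assumes scikit-learn; synthetic data for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_validate

# Synthetic nonlinear data: a noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 3 * np.sin(X[:, 0]) + rng.normal(scale=0.5, size=200)

for name, model in [("linear model", LinearRegression()),
                    ("unpruned tree", DecisionTreeRegressor(random_state=0))]:
    scores = cross_validate(model, X, y, cv=5, return_train_score=True)
    # Low scores on BOTH splits suggest high bias (underfitting); a large
    # train/validation gap suggests high variance (overfitting).
    print(f"{name}: train R^2 = {scores['train_score'].mean():.2f}, "
          f"validation R^2 = {scores['test_score'].mean():.2f}")
```

Expect the unpruned tree to score near-perfectly on its training folds but noticeably worse on held-out folds (a variance gap), while the linear model shows a much smaller gap but a lower ceiling on both splits (a bias floor).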

Essentially, because of the required simplifications and data approximations, every machine learning model will carry some degree of bias, variance, or both. The key is to check models for specific sources of systematic bias, such as those described above, as they are being developed. Models that may be biased should undergo extensive testing to ensure that different demographic groups do not receive systematically different outcomes.

Working on AI systems requires diverse, multidisciplinary teams that include ethicists, subject matter experts, and members of affected communities. These outside viewpoints can highlight blind spots that the model developers may have failed to consider.

Mitigating Bias in AI

To reduce these biases, strong safeguards must be implemented throughout the AI development process, from data collection and preparation through algorithm design, testing, and deployment.

Representative and Diverse Data: Make sure the training set is broad, varied, and accurately depicts the population as a whole. This entails proactively searching out and incorporating data from underrepresented groups.

Bias Audits: Regularly audit AI systems to detect and reduce biases. This means evaluating the AI on diverse subgroups and making any necessary modifications to the model or data; a minimal per-group audit is sketched after this list.

Transparent Methodologies: Use transparent and consistent techniques for data gathering and annotation. Document the data sources, gathering techniques, and any possible biases in the process.

Algorithmic Fairness: Take fairness into account when designing and testing algorithms. Fairness-aware machine learning approaches can address biases in both the training data and the algorithmic design; the sketch after this list includes one simple fairness metric.

Continuous Monitoring: Following deployment, AI systems should be continually monitored to ensure that they function correctly and fairly across a range of scenarios and populations. Over time, biases can be addressed with the aid of feedback loops and updates based on real-world performance.
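
To make the audit and fairness points above concrete, here is a minimal sketch of a per-group evaluation plus one simple fairness metric, the demographic parity difference (the gap in positive-prediction rates between groups). The audit data, labels, and group names are all hypothetical.

```python
# A minimal per-group bias audit with one simple fairness metric.
# All data, labels, and group names below are hypothetical.
import pandas as pd

# Hypothetical audit set: predictions, true labels, and a sensitive
# attribute collected for evaluation purposes only.
audit = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 1, 0, 0],
    "y_pred": [1, 0, 1, 0, 0, 1, 1, 1],
    "group":  ["A", "A", "A", "A", "B", "B", "B", "B"],
})

for group, rows in audit.groupby("group"):
    accuracy = (rows["y_true"] == rows["y_pred"]).mean()
    positive_rate = rows["y_pred"].mean()  # share given the favorable outcome
    print(f"group {group}: accuracy = {accuracy:.2f}, "
          f"positive rate = {positive_rate:.2f}")

# Demographic parity difference: the gap in positive-prediction rates
# across groups. Values far from 0 flag a disparity worth investigating.
rates = audit.groupby("group")["y_pred"].mean()
print(f"demographic parity difference: {rates.max() - rates.min():.2f}")
```

Libraries such as Fairlearn package this and related metrics, but the underlying arithmetic is just a comparison of group-level rates, which makes such audits easy to build into a routine test suite.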

Accountability

Beyond technical mitigations, we need robust governance frameworks and accountability measures for AI bias. Clearly defined processes should be in place for evaluating live models, providing affected parties with recourse, and establishing incident response plans for discriminatory AI errors. Regulations covering sensitive industries such as healthcare, employment, housing, and finance could mandate AI fairness procedures and prohibit discriminatory practices.

Conclusion

AI bias is a hard problem with no perfect answers. However, considering AI's growing influence on critical decisions, ignoring it is simply not an option. By developing a nuanced understanding of the various sources of bias, their interaction with variance, and the available remedies, we can work to design AI systems that uphold human rights, broaden opportunity, and create a more equitable society.

Further Reading

Some helpful resources that inspired my recap on this important topic:

  1. Machines and Trust: How to Mitigate AI Bias
  2. AI bias: What it is and why it matters
  3. Machine learning bias (AI bias)
  4. Machine Bias (the COMPAS case)
  5. How We Analyzed the COMPAS Recidivism Algorithm

By exploring these resources, you can gain a deeper understanding of how bias arises in AI systems and the challenges involved in mitigating it.