When Data Misleads Us: Bias in Datasets and Models


Data carries an aura of authority.

Numbers feel solid. Charts look persuasive. Models produce outputs with an air of precision. In technical contexts, it is easy to assume that data-driven decisions are inherently fairer, more rational, and less biased than human judgment alone.

But data does not speak for itself.

Every dataset is the product of human choices — choices about what to collect, how to measure it, which records to keep, and how to interpret what remains. When those choices go unexamined, data can mislead with confidence.

Bias does not disappear when decisions become data-driven. It often becomes harder to see.

Where Bias Enters the Data Pipeline

Bias can enter at every stage of the data lifecycle.

Collection
Data reflects what is observable and valued. If certain groups are underrepresented or excluded from collection, their experiences simply do not appear. Absence is not neutral; it is a form of distortion.

Measurement
What we choose to measure — and how we measure it — shapes outcomes. Proxy variables stand in for complex realities, often imperfectly. When proxies are treated as truth, nuance is lost.

Cleaning and Preprocessing
Missing values are dropped. Outliers are removed. Categories are merged. Each of these steps involves judgment. What looks like noise may be someone’s reality (a short sketch below makes this concrete).

Labelling and Interpretation
Labels encode assumptions. Who decides what counts as “success”, “risk”, or “normal”? These decisions reflect values, not facts.
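
To make the cleaning step concrete, here is a minimal sketch, using toy data and hypothetical column names, of how a routine dropna() can quietly change who a dataset represents:

```python
# Toy data with hypothetical columns: income goes unreported far more
# often for one region, so dropping missing values removes that
# region's records disproportionately.
import pandas as pd

df = pd.DataFrame({
    "region": ["north"] * 6 + ["south"] * 6,
    "income": [30, 32, 35, 31, 29, 33, 28, None, 26, 27, None, None],
})

print(df["region"].value_counts(normalize=True))       # 50/50 before cleaning
cleaned = df.dropna()                                   # a "standard" step
print(cleaned["region"].value_counts(normalize=True))  # roughly 67/33 after
```

Nothing here is malicious; the skew is a by-product of a step many pipelines apply without comment.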

Bias here is rarely malicious. It is procedural. And because it is procedural, it often goes unquestioned.

The Seduction of the Average

One of the most common ways data misleads is through averages.

Averages smooth variation. They create a sense of stability. They are easy to communicate. But they can also hide meaningful differences.

When systems are optimised for the average user, those at the margins are often poorly served. Outcomes may look acceptable overall while being harmful for particular groups.

This is not a flaw in mathematics. It is a limitation in interpretation.

Understanding distributions, variance, and subgroup performance is essential for responsible analysis. Without it, models may reinforce existing inequities while appearing neutral.
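
To see this with fabricated numbers, compare an overall mean error with its per-group breakdown:

```python
# Fabricated toy numbers: the overall mean error looks modest, but a
# per-group breakdown shows one group is served far worse.
import pandas as pd

results = pd.DataFrame({
    "group":     ["a"] * 8 + ["b"] * 2,
    "abs_error": [1, 2, 1, 2, 1, 2, 1, 2, 9, 10],
})

print("overall mean error:", results["abs_error"].mean())   # 3.1
print(results.groupby("group")["abs_error"].mean())         # a: 1.5, b: 9.5
```

An evaluation that reports only the first number would call this system acceptable; the breakdown tells a different story.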

Model Bias and Feedback Loops

Models do not merely reflect data; they act on it.

When model outputs influence future data collection — through recommendations, predictions, or automated decisions — feedback loops form. Early biases become amplified over time.

For example:

  • A recommendation system trained on past behaviour shapes future behaviour.
  • A risk model influences who receives scrutiny, shaping the data it later learns from (a small simulation follows this list).
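
As a minimal sketch of the second loop, with invented numbers, suppose both groups have the same true incident rate but receive different levels of scrutiny:

```python
# Invented numbers. Both groups misbehave at the same true rate, but
# incidents are only observed where the system looks, so retraining on
# observed counts preserves the initial skew indefinitely.
true_rate = {"a": 0.10, "b": 0.10}   # identical underlying behaviour
scrutiny  = {"a": 0.45, "b": 0.55}   # small initial bias toward group b

for round_num in range(4):
    # Incidents are only observed in inspected records.
    observed = {g: 1000 * scrutiny[g] * true_rate[g] for g in scrutiny}
    total = sum(observed.values())
    # "Retrain": allocate next round's scrutiny by observed incident share.
    scrutiny = {g: observed[g] / total for g in observed}
    print(round_num, scrutiny)   # the 45/55 split never corrects itself
```

In this deterministic toy the skew merely persists rather than grows; add noise, or an allocation rule that reacts more aggressively to observed counts, and it can widen.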

Without intervention, these loops reinforce the status quo.

Bias here is dynamic. It evolves with the system.

The Myth of “Just the Data”

One of the most persistent defences against critiques of bias is the phrase: “We’re just using the data.”

This framing suggests inevitability — as though outcomes are unavoidable consequences rather than design choices.

But data does not arrive with instructions. Analysts decide how it is used, which metrics matter, and what trade-offs are acceptable.

Objectivity is not achieved by denying responsibility. It is achieved by owning it.

Recognising Bias Without Paralysis

Acknowledging bias does not mean abandoning data-driven work. It means practising it responsibly.

Practical steps include:

  • examining data provenance and gaps (a small sketch follows this list),
  • analysing subgroup outcomes,
  • stress-testing assumptions,
  • documenting limitations clearly,
  • involving diverse perspectives in review.
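
As a minimal sketch of the first step, with hypothetical group names and assumed reference shares, a simple representation audit compares a dataset's group shares against a known reference population:

```python
# Hypothetical groups and assumed reference shares (e.g. census figures).
import pandas as pd

reference = {"a": 0.50, "b": 0.30, "c": 0.20}
df = pd.DataFrame({"group": ["a"] * 70 + ["b"] * 25 + ["c"] * 5})

shares = df["group"].value_counts(normalize=True)
for g, expected in reference.items():
    actual = shares.get(g, 0.0)
    # The 50% threshold is an arbitrary illustration, not a standard.
    flag = "  <-- underrepresented" if actual < 0.5 * expected else ""
    print(f"{g}: dataset {actual:.2f} vs reference {expected:.2f}{flag}")
```

The point is not the threshold but the habit: make representation visible before modelling begins.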

Transparency matters. So does humility.

No dataset is complete. No model is final. Responsible practitioners remain open to revision.

Bias, Trust, and Accountability

When data-driven systems influence real lives, trust becomes paramount.

Trust is not built through complexity or confidence. It is built through openness, explanation, and accountability.

Being honest about uncertainty does not weaken credibility. It strengthens it.

Building With Care

Data can illuminate. It can also obscure.

The difference lies not in the data itself, but in how we approach it. Bias thrives when assumptions go unchallenged and limitations go unacknowledged.

As this month continues, the invitation is not to abandon data, but to handle it with care — recognising its power, its limits, and its impact.

When data misleads, it is rarely because it is wrong. It is because we have asked it the wrong questions — or listened without humility.
