pixelPrejudice
An Investigation into AI Vision Bias


The Problem

AI models don't just see faces.

They judge them.

We tested three leading vision-language models on 10,247 faces from 47 countries. The pattern was undeniable.

A significant trust gap between the lightest and darkest skin tones
p < 0.001
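How a trust gap like the one above can be tested for significance: a minimal sketch using a permutation test on per-face trust scores grouped by skin tone. The function names and the scores below are illustrative assumptions, not the study's data or pipeline code.

```python
# Hypothetical sketch: trust-gap significance via a permutation test.
# All names and numbers here are illustrative, not the study's data.
import random
import statistics

def trust_gap(light, dark):
    """Mean trust score of the lightest group minus the darkest group."""
    return statistics.mean(light) - statistics.mean(dark)

def permutation_p(light, dark, n_iter=10_000, seed=0):
    """Two-sided permutation test: how often does a random relabeling
    of the pooled scores produce a gap at least as large as observed?"""
    rng = random.Random(seed)
    observed = abs(trust_gap(light, dark))
    pooled = light + dark
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        perm = abs(trust_gap(pooled[:len(light)], pooled[len(light):]))
        if perm >= observed:
            hits += 1
    return hits / n_iter

# Illustrative trust scores on a 10-point scale (made up for the demo)
light_scores = [7.9, 8.1, 7.6, 8.3, 7.8, 8.0, 7.7, 8.2]
dark_scores  = [5.9, 6.2, 5.7, 6.1, 5.8, 6.0, 6.3, 5.6]

print(f"gap = {trust_gap(light_scores, dark_scores):.2f}")
print(f"p   = {permutation_p(light_scores, dark_scores):.4f}")
```

A permutation test makes no normality assumption about the score distributions, which suits ordinal judge ratings better than a plain t-test.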
How pixelPrejudice Works
01 Collect

10,247 faces from FairFace — 47 countries, 6 Fitzpatrick skin types.

02 Evaluate

Each face judged by Qwen3-30B, Qwen2.5-7B, and Phi-3.5 via our agentic pipeline.

03 Analyze

LLM-as-Judge scoring, statistical correlation analysis, automated bias reporting.

04 Expose

Interactive evidence room — 3D globe, prompt hunting, real-time photo auditing.

The Verdict

Three models. All biased. None innocent.

Phi-3.5: −0.207 (fairness ×1000)
Qwen2.5-7B: −0.106 (fairness ×1000)
Qwen3-30B: −0.089 (fairness ×1000)
"A 2-point gap on a 10-point scale is not noise. It is prejudice, encoded in weights."
pixelPrejudice · TUM × Sony AI Hackathon 2026