pixelPrejudice
An Investigation into AI Vision Bias


The Problem

AI models don't just see faces.

They judge them.

We tested three leading vision-language models on 10,247 faces from 47 countries. The pattern was undeniable.

A significant trust gap between the lightest and darkest skin tones
p < 0.001
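How a trust gap like the one above can be tested for significance: a minimal sketch using a permutation test on per-face trust scores grouped by skin tone. The function names and the scores below are illustrative assumptions, not the study's data or pipeline code.

```python
# Hypothetical sketch: trust-gap significance via a permutation test.
# All names and numbers here are illustrative, not the study's data.
import random
import statistics

def trust_gap(light, dark):
    """Mean trust score of the lightest group minus the darkest group."""
    return statistics.mean(light) - statistics.mean(dark)

def permutation_p(light, dark, n_iter=10_000, seed=0):
    """Two-sided permutation test: how often does a random relabeling
    of the pooled scores produce a gap at least as large as observed?"""
    rng = random.Random(seed)
    observed = abs(trust_gap(light, dark))
    pooled = light + dark
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        perm = abs(trust_gap(pooled[:len(light)], pooled[len(light):]))
        if perm >= observed:
            hits += 1
    return hits / n_iter

# Illustrative trust scores on a 10-point scale (made up for the demo)
light_scores = [7.9, 8.1, 7.6, 8.3, 7.8, 8.0, 7.7, 8.2]
dark_scores  = [5.9, 6.2, 5.7, 6.1, 5.8, 6.0, 6.3, 5.6]

print(f"gap = {trust_gap(light_scores, dark_scores):.2f}")
print(f"p   = {permutation_p(light_scores, dark_scores):.4f}")
```

A permutation test makes no normality assumption about the score distributions, which suits ordinal judge ratings better than a plain t-test.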
How pixelPrejudice Works
01 Collect

10,247 faces from FairFace — 47 countries, 6 Fitzpatrick skin types.

02 Evaluate

Each face judged by Qwen3-30B, Qwen2.5-7B, and Phi-3.5 via our agentic pipeline.

03 Analyze

LLM-as-Judge scoring, statistical correlation analysis, automated bias reporting.

04 Expose

Interactive evidence room — 3D globe, prompt hunting, real-time photo auditing.

The Verdict

Three models. All biased. None innocent.

Phi-3.5: −0.207 (fairness ×1000)
Qwen2.5-7B: −0.106 (fairness ×1000)
Qwen3-30B: −0.089 (fairness ×1000)
"A 2-point gap on a 10-point scale is not noise. It is prejudice, encoded in weights."
pixelPrejudice · TUM × Sony AI Hackathon 2026