Research from the University of Washington Information School found that AI resume screeners preferred white-associated names over Black-associated names 85% of the time. That's not a bug report from a niche product. It's a finding about AI tools that are, right now, filtering candidates at companies you've heard of. Understanding what the research actually says — and what it means for organizations using these tools — is not optional.
---
The UW study used a methodology borrowed from decades of audit research on human hiring bias: resume correspondence testing. Researchers sent matched resumes to AI screening systems — resumes identical in qualifications but with names that varied in racial association. The AI systems preferred white-associated names 85% of the time. That disparity held across qualification levels and industries.
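You can run the same kind of correspondence test against whatever screener you use. Below is a minimal sketch in Python, assuming a hypothetical `score_resume()` wrapper around your vendor's API; the name lists, resume template, and trial count are illustrative, not drawn from the UW study.

```python
import random

# Hypothetical wrapper around your vendor's screening API.
# Replace with the actual call your tool exposes; this is an assumption.
def score_resume(resume_text: str) -> float:
    raise NotImplementedError("wire this to your screening tool")

# Illustrative name lists acting as demographic proxies (extend per your audit plan).
WHITE_ASSOC_NAMES = ["Emily Walsh", "Greg Baker"]
BLACK_ASSOC_NAMES = ["Lakisha Washington", "Jamal Jefferson"]

# A resume template with identical qualifications; only the name varies.
RESUME_TEMPLATE = """{name}
Senior Financial Analyst, 6 years experience
BS Finance, CPA, SQL, forecasting, variance analysis
"""

def preference_rate(trials: int = 200, seed: int = 0) -> float:
    """Fraction of matched pairs where the white-associated name scores higher."""
    rng = random.Random(seed)
    white_wins = 0
    ties = 0
    for _ in range(trials):
        w = rng.choice(WHITE_ASSOC_NAMES)
        b = rng.choice(BLACK_ASSOC_NAMES)
        score_w = score_resume(RESUME_TEMPLATE.format(name=w))
        score_b = score_resume(RESUME_TEMPLATE.format(name=b))
        if score_w > score_b:
            white_wins += 1
        elif score_w == score_b:
            ties += 1
    # Exclude ties so the rate reflects actual preferences.
    decided = trials - ties
    return white_wins / decided if decided else 0.0

# A rate far from 0.5 on resumes that differ only by name is evidence of name-based bias.
```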
This is consequential for several reasons.
The mechanism matters.
AI resume screeners are biased because they were trained on historical hiring data that reflects historical human bias. If an organization's historical hiring data shows that people named Jamal were less likely to advance past screening than people named James — regardless of qualifications — and the AI is trained to identify candidates who look like successful past hires, it will learn to prefer James. The AI isn't doing something humans didn't already do. It's doing it faster, at scale, and without the social friction that sometimes causes humans to pause.
This is precisely the point I've made before: the problem isn't that AI is biased. The problem is that we fed it biased data and then expected it to be objective. The AI is doing its job. The job was specified incorrectly.
The VoxDev findings complicate the picture.
A separate 2025 study by An et al., summarized on VoxDev, found that LLM-based screening tools systematically favor female candidates in some contexts while disadvantaging Black male candidates. The pattern isn't simply "AI replicates historical discrimination against women." It's more complex, and more unpredictable. Different model architectures and training datasets produce different bias patterns. A tool that appears to favor female applicants in one context may disadvantage them in another. This means you cannot infer your AI tool's bias profile from the vendor's marketing.
Automation bias compounds the problem.
The UW study's second finding is equally important: when humans reviewed AI recommendations that contained bias, they mirrored that bias 90% of the time in severe cases. Human reviewers aren't catching the AI's bias. They're amplifying it with the authority of a human decision. The bias starts in the data, moves into the model, and then gets transmitted through the human review process that was supposed to serve as a check.
Bias at every stage: training data, model output, human ratification. That's the full pipeline.
What this means for employers using AI screening tools today.
First, the legal exposure is already here. California's FEHA ADS regulations make employers jointly liable with vendors for discriminatory outcomes from automated systems. NYC Local Law 144 requires annual independent bias audits with public results. If you're using an AI screening tool and can't demonstrate you've conducted a bias audit with adverse impact analysis across protected classes, you're exposed.
Second, the practical implication isn't to abandon AI screening. The implication is to audit it. Organizations need to test their own AI tools — not rely on vendor-provided bias documentation — with a methodology that mirrors the UW approach: controlled testing with matched qualifications across name-based demographic proxies and disaggregated outcome analysis across the full candidate pipeline.
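The disaggregated outcome analysis reduces to a small amount of arithmetic: selection rates per group, and impact ratios against the best-treated group, in the spirit of the four-fifths rule and the impact-ratio metric NYC Local Law 144 audits report. A minimal sketch follows, assuming you can export screening outcomes with a demographic field; the column names and the 0.8 threshold are placeholders, not legal advice.

```python
import pandas as pd

# Placeholder export of screening outcomes; in practice, pull this from your ATS.
# "advanced" = 1 if the AI tool passed the candidate to the next stage.
df = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "advanced": [1,    1,   0,   1,   0,   0,   0,   1,   1,   1],
})

# Selection rate per demographic group.
rates = df.groupby("group")["advanced"].mean()

# Impact ratio: each group's rate divided by the highest group's rate.
impact_ratios = rates / rates.max()

print(rates)
print(impact_ratios)

# Common screening heuristic: impact ratios below 0.8 (the four-fifths rule)
# flag a group for closer review. It is a heuristic, not a legal safe harbor.
flagged = impact_ratios[impact_ratios < 0.8]
print("Groups needing review:", list(flagged.index))
```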
Third, training data governance is the upstream fix. Organizations that supply their own historical hiring data to AI vendors for model training or fine-tuning need to audit that data for demographic disparity before training begins. Garbage in, garbage out — except the garbage in AI hiring systems shows up as discrimination at scale.
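The same disaggregation belongs upstream, on the historical decisions that would become training labels. A minimal sketch, assuming your export has past screening outcomes, a demographic field, and a rough qualification measure; all column names and values are placeholders.

```python
import pandas as pd

# Historical screening decisions that would become training labels.
# Column names are placeholders; map them to your own schema.
hist = pd.DataFrame({
    "group":    ["A", "A", "B", "B", "A", "B", "A", "B"],
    "yrs_exp":  [3,    8,   3,   8,   3,   8,   8,   3],
    "advanced": [1,    1,   0,   1,   1,   0,   1,   0],
})

# Raw disparity in the labels the model would learn to reproduce.
print(hist.groupby("group")["advanced"].mean())

# Disparity within comparable qualification bands is the sharper signal:
# the gap that persists when experience is held roughly constant.
hist["exp_band"] = pd.cut(hist["yrs_exp"], bins=[0, 5, 40], labels=["0-5", "5+"])
print(hist.groupby(["exp_band", "group"], observed=True)["advanced"].mean())
```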
The 13% finding is underappreciated.
The UW study found that implicit association test completion before reviewing AI recommendations reduced human bias amplification by 13%. That's meaningful — but it also tells you the magnitude of the problem. After a structural intervention designed to disrupt bias, the residual effect is still substantial. IAT training is worth doing. It is not a solution.
The organizations that handle this well aren't the ones that switch off AI screening because the research is scary. They're the ones that run real bias audits, understand what their specific tools are doing to specific candidate populations, and have human review processes designed to actually catch disparate impact — not ratify it.
---
Quick Hits
Five practical steps to reduce AI screening bias now.
1. Conduct a baseline adverse impact analysis on your current AI tool's outputs, before auditors or plaintiffs ask.
2. Audit your historical training data for demographic disparity.
3. Negotiate bias testing transparency into vendor contracts.
4. Implement structured override training with a focus on AI failure modes.
5. Establish a regular cadence of disaggregated outcome audits, not just of AI recommendations but of final hiring decisions (a stage-by-stage sketch follows below).
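That last step is the one most organizations skip: comparing disparity at the AI-recommendation stage with disparity at the final-decision stage, which tells you whether human review is narrowing the gap or ratifying it. A minimal sketch with placeholder columns and illustrative data:

```python
import pandas as pd

# Placeholder pipeline export: one row per candidate, one flag per stage.
pipe = pd.DataFrame({
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "ai_advance": [1,    1,   1,   0,   1,   1,   0,   0],
    "final_hire": [1,    1,   0,   0,   1,   0,   0,   0],
})

def impact_ratio(frame: pd.DataFrame, outcome: str) -> pd.Series:
    """Impact ratio per group for a given stage outcome."""
    rates = frame.groupby("group")[outcome].mean()
    return rates / rates.max()

ai_stage = impact_ratio(pipe, "ai_advance")
final_stage = impact_ratio(pipe, "final_hire")

# If a group's final-stage ratio is lower than its AI-stage ratio,
# human review is widening the gap rather than catching it.
print(pd.DataFrame({"ai_stage": ai_stage, "final_stage": final_stage}))
```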
Training data is the root cause — and the hardest thing to fix.
Fixing AI bias at the model output level without fixing the training data is addressing a symptom. The root cause is training datasets built from historical hiring decisions that reflected historical discrimination. Upstream interventions (auditing training data, excluding demographic proxies, testing retrained models for disparate impact before deployment) are necessary and chronically underinvested in. Most organizations don't even have visibility into what data their AI vendors train on.
Implicit association tests reduce bias by 13% — worth doing, not sufficient.
The UW research finding that IAT completion reduced human bias amplification by 13% is practically useful. It means structured awareness training before human review of AI recommendations has a measurable effect. It also means that awareness training alone is insufficient if the AI outputs contain systematic bias. The IAT helps at the margin. Fixing the model and the training data is the substantive fix.
---
The Operator's Take
I work at the intersection of AI and hiring decisions every day. The UW findings don't surprise me — they confirm what anyone who's built AI products on hiring data already knows: the data carries the past, and the AI learns from the past.
What concerns me isn't that these research findings exist. Research revealing problems is how you fix them. What concerns me is how many organizations respond to this research by adding a human review step and calling it done — without measuring whether that human review is actually reducing disparate impact, and without auditing whether the AI's outputs are systematically disadvantaging candidate populations in their specific context.
The bias isn't theoretical. It's in the pipeline, right now, at organizations that believe they're running a fair hiring process because a human signs off on every reject. That belief is comforting. The UW research suggests it's mostly wrong.
---
Running a real bias audit requires a structured methodology — one that covers training data inputs, model output analysis, human review outcomes, and final hiring decisions across protected classes. The checklist I built is based on regulatory requirements and research methodology, not vendor marketing. Use it to audit what you're already running before someone else does.
Get it here → AI Bias Audit Checklist ($29 on Gumroad)