"We always have a human review AI recommendations before any hiring decision." I've heard some version of this from almost every enterprise HR leader I've spoken to about AI governance. It sounds rigorous. Research out of the University of Washington suggests it's largely ineffective — and in some cases, actively counterproductive. This is worth sitting with.
---
Here's what the UW study actually found: when humans were given biased AI recommendations, they mirrored the AI's bias approximately 90% of the time in severe cases. Bias dropped only 13% when participants completed an implicit association test before reviewing the AI recommendations. In other words, the humans in the loop weren't functioning as a check on the AI; they were a conduit for the AI's bias to become a human decision.
This is automation bias, and it's one of the most robust findings in cognitive psychology. When humans receive authoritative-seeming outputs from systems they perceive as objective — and AI systems are typically perceived as more objective than human judgment — they defer. They don't critically evaluate. They anchor on what the system told them and adjust only marginally from there.
The compliance implications are significant. Eighty percent of organizations using AI hiring tools say they don't reject applicants without human review. That statistic is meant to be reassuring. Given the UW research, it mostly demonstrates that these organizations have built a liability transfer mechanism that doesn't actually work as advertised. The human review isn't catching the bias. It's ratifying it.
So what does real human oversight actually look like?
Structured override training is non-negotiable.
You cannot simply tell a recruiter to "use their judgment" when reviewing an AI recommendation and expect that to work. Humans need structured training that explicitly identifies common AI failure modes — over-reliance on resume keywords, demographic proxy signals, credential inflation — and teaches them to evaluate candidates against independent criteria before reviewing the AI score. If your AI training program consists of a 45-minute onboarding video, it's not producing real oversight.
Override rates as a metric.
If your human reviewers are overriding AI recommendations at a rate approaching zero, that's not a sign the AI is working well — it's a sign humans aren't functioning as an independent check. Healthy override rates vary by role and context, but a system where humans override AI recommendations zero to two percent of the time is a system where human review has become a rubber stamp. Measure it. Set expectations around what an appropriate override rate looks like.
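To make that concrete, here's a minimal sketch of what measuring override rates could look like. The review-log field names, reviewer IDs, and the two-percent flag threshold are all illustrative assumptions, not a standard or any vendor's schema; the point is that this metric is a few lines of code away from any ATS export.

```python
from collections import Counter

# Hypothetical review log: one record per AI-assisted screening decision.
# Field names and the flag threshold are illustrative, not from any ATS.
reviews = [
    {"reviewer": "r_014", "ai_recommendation": "reject",  "human_decision": "reject"},
    {"reviewer": "r_014", "ai_recommendation": "reject",  "human_decision": "advance"},
    {"reviewer": "r_022", "ai_recommendation": "advance", "human_decision": "advance"},
    {"reviewer": "r_022", "ai_recommendation": "reject",  "human_decision": "reject"},
    # ... in practice, a full review period exported from your ATS
]

RUBBER_STAMP_THRESHOLD = 0.02  # assumed flag level; calibrate per role and context

totals, overrides = Counter(), Counter()
for r in reviews:
    totals[r["reviewer"]] += 1
    if r["human_decision"] != r["ai_recommendation"]:
        overrides[r["reviewer"]] += 1

for reviewer in sorted(totals):
    rate = overrides[reviewer] / totals[reviewer]
    flag = "  <- possible rubber stamp" if rate <= RUBBER_STAMP_THRESHOLD else ""
    print(f"{reviewer}: {overrides[reviewer]}/{totals[reviewer]} overrides ({rate:.0%}){flag}")
```

Once this runs monthly, the conversation shifts from "do we have human review?" to "which reviewers never disagree with the machine, and why?"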
Disaggregated outcome audits.
Human-in-the-loop processes need to be audited for disparate impact, not just the AI's initial outputs. If the AI recommendations show no disparity but the final human decisions do, you have a problem that your AI audit won't catch. If the AI shows disparity and the human review doesn't reduce it, that's the UW finding playing out in your organization. Audit the full pipeline: AI output, human review outcome, and final decision — disaggregated by protected class.
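As a sketch of what that full-pipeline audit could look like, the following computes selection rates by group at each stage and flags any stage that fails the EEOC four-fifths rule of thumb. The record shape, stage names, and group labels are assumed for illustration, and the four-fifths threshold is a screening heuristic, not a legal safe harbor.

```python
# Full-pipeline audit sketch: selection rates by protected class at each
# stage, flagged against the EEOC four-fifths rule of thumb. The record
# shape, stage names, and group labels are illustrative assumptions.

candidates = [
    # one record per applicant: group label plus pass/fail at each stage
    {"group": "A", "ai": True,  "human": True,  "final": True},
    {"group": "A", "ai": True,  "human": False, "final": False},
    {"group": "B", "ai": True,  "human": False, "final": False},
    {"group": "B", "ai": False, "human": False, "final": False},
    # ... in practice, the full applicant flow exported from your ATS
]

STAGES = ["ai", "human", "final"]
FOUR_FIFTHS = 0.8

groups = sorted({c["group"] for c in candidates})
for stage in STAGES:
    rates = {
        g: sum(c[stage] for c in candidates if c["group"] == g)
           / sum(1 for c in candidates if c["group"] == g)
        for g in groups
    }
    top = max(rates.values())
    for g in groups:
        ratio = rates[g] / top if top else 0.0
        flag = "  <- fails four-fifths check" if ratio < FOUR_FIFTHS else ""
        print(f"{stage}: group {g} rate {rates[g]:.0%}, impact ratio {ratio:.2f}{flag}")
```

Comparing the flags across the three stages is what reveals the UW pattern: a disparity that appears at the AI stage and survives, unchanged, through human review to the final decision.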
Make override easy and visible.
One underappreciated design issue: most ATS workflows make it easier to accept an AI recommendation than to override it. The path of least resistance is confirmation. If you want humans to exercise genuine independent judgment, the workflow needs to make that judgment visible and make override frictionless. This is a product design problem masquerading as a policy problem.
Train for specific AI failure modes, not general skepticism.
Telling recruiters to "be skeptical of AI" doesn't produce useful skepticism. Training recruiters on the specific failure modes of the AI tools they're using — which candidate populations are underrepresented in training data, which signals correlate with race or gender in the model — produces actionable skepticism. This requires vendors to share more about their models than they typically do by default. Negotiate that transparency into your contracts.
The underlying economics of this matter. If "human in the loop" governance isn't functioning as a bias check, organizations are spending resources on compliance theater that exposes them to legal liability rather than reducing it. The question isn't whether to have human oversight; it's whether your human oversight is designed to actually work.
---
Quick Hits
Automation bias is a feature, not a bug.
Humans defaulting to authoritative system outputs isn't irrational — in most domains, automated systems are more reliable than individual human judgment. The problem is that AI hiring tools are not always more reliable, especially for underrepresented candidate populations. Automation bias becomes dangerous precisely where AI reliability is lowest. Recruiters need to know where their AI tools fail, not just where they succeed.
Designing effective override processes: the details that matter.
Effective override processes include: a required field for override rationale, visibility of override rates to managers, regular review of override patterns for consistency, and periodic calibration sessions where reviewers discuss divergent decisions. None of this is complicated. Almost no organization has all of it in place. Implementing these design elements costs less than one EEOC investigation.
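For the first element, a minimal sketch of what a validated override record could look like, assuming a hypothetical OverrideRecord type and an arbitrary minimum rationale length:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class OverrideRecord:
    """Hypothetical override record; field names and the minimum rationale
    length are assumptions, not any vendor's schema."""
    reviewer_id: str
    candidate_id: str
    ai_recommendation: str  # e.g. "reject"
    human_decision: str     # e.g. "advance"
    rationale: str          # required: why the reviewer diverged from the AI
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        if self.human_decision == self.ai_recommendation:
            raise ValueError("Not an override: human decision matches the AI.")
        if len(self.rationale.strip()) < 20:  # arbitrary floor; tune it
            raise ValueError("A substantive override rationale is required.")
```

Records like these are also what make the other elements possible: override-rate dashboards, pattern review, and calibration sessions are all just queries over the rationale and decision fields.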
Manager AI training gaps are the overlooked compliance risk.
Most AI training in HR organizations targets recruiters. Hiring managers — who often make final decisions and are the humans "in the loop" for managerial roles — receive far less training. If a hiring manager is anchoring on an AI score they barely understand and can't interrogate, the human review is meaningless. Training needs to extend to every human who touches an AI-informed decision.
---
The Operator's Take
I think about human oversight constantly, because the gap between what it actually means and what it's claimed to mean determines whether AI hiring tools produce fair outcomes at scale.
The companies getting this right share a few things: they've measured their override rates, they know where their AI tools are weakest, and they've designed workflows that make genuine independent review the path of least resistance rather than the exception. The companies getting it wrong are checking a compliance box with a training module and calling it governance.
The difference shows up eventually — in disparate impact audits, in EEOC charges, in litigation. The UW research isn't academic. It's a description of what's happening in HR offices across the country right now.
---
Auditing for AI bias requires a structured process — not a one-time check, but a repeatable methodology that catches disparate impact before regulators or plaintiffs do. My checklist covers data inputs, model outputs, human review processes, and final decision outcomes — the full pipeline.
Get it here → AI Bias Audit Checklist ($29 on Gumroad)