AI hiring tools that score candidates on facial expressions and tone create legal and audit risks for employers

AI hiring tools that score tone, eye contact, and facial expressions face growing legal exposure - and most employers can't explain how those scores were produced. New rules in NYC, Illinois, and the EU now require transparency and bias audits.

Categorized in: AI News, Human Resources
Published on: Apr 21, 2026

The Legal Risk Hidden in Your AI Hiring Tools

Somewhere in your organization's hiring stack sits an AI system scoring candidates. If you approved it, ask yourself this: If a candidate, auditor or regulator challenged one of those scores, could your team explain how it was produced?

Not "the vendor says it's accurate." Not "the model was trained on historical data." A specific, documented explanation of what criteria were evaluated, how the candidate performed against them, and why those criteria matter for the job.

For a growing number of organizations using AI video interview scoring tools, the honest answer is no. As regulatory frameworks move from guidance to enforcement, that answer is becoming a liability.

What the system is actually measuring

Many video interview platforms score tone of voice, pace, eye contact, facial expressions and fluency alongside or instead of the actual content of responses. The assumption is that these signals predict job performance or cultural fit. The evidence for that assumption is weak. The evidence that these measurements introduce systematic, legally significant bias is much stronger.

Several major vendors removed facial analysis features after regulatory pressure and public scrutiny. That acknowledgment, that criteria marketed as objective were neither reliable nor fair, raises a harder question: if those criteria were in production and no one caught it until outside pressure forced a change, what else is still being measured that shouldn't be?

The EEOC has made clear that employers are liable under Title VII for discriminatory outcomes from AI hiring tools, regardless of whether they were built in-house or purchased from a vendor.

Regulatory frameworks are tightening. New York City's Local Law 144 requires annual independent bias audits of automated employment decision tools and public disclosure of results. Illinois requires notice and consent before AI evaluates video interviews. The EU AI Act, with high-risk provisions taking full effect in August, explicitly classifies employment AI as high-risk and mandates transparency, explainability and human oversight.

The common requirement across all of them: Can you explain what your AI is measuring, and can you demonstrate it's measuring the right things?

The accountability problem

Consider a hiring decision challenged by a candidate, internal audit or regulator. The question is how the decision was made. "The AI scored them lower" is not a defensible answer in any of those contexts.

That score can't be traced to specific job-relevant criteria. It can't be explained to the candidate. It won't satisfy an auditor. If the system's logic is proprietary and opaque, the organization has no way to produce a defensible answer even if it wants to.

Organizations often adopt black-box scoring tools with legitimate intentions: reduce human bias and create consistency. Those are sound goals. But a system whose internal logic can't be questioned, explained or audited just obscures bias. It doesn't reduce it. And when bias becomes difficult to see, it becomes more difficult to address.

When a system produces plausible outcomes that are wrong in ways not immediately visible, the failure compounds before it surfaces. The cost of discovering it late is almost always higher than the cost of building it right from the start.

What defensible scoring looks like

There is a meaningful difference between AI that scores interviews and AI that scores interviews in a way that can be explained and defended. The distinction is structural.

Defensible scoring starts before any candidate records a response. It starts with the job itself. What competencies does the role require? What does strong performance against each competency look like? From those answers, explicit rubrics are developed: descriptions of what high-quality, adequate and weak responses look like for each dimension being evaluated.

Those rubrics are reviewed and approved by the hiring team before scoring begins. When responses come in, the AI evaluates what candidates actually said against those pre-defined criteria. Not tone. Not pacing. Not facial expression. The content of their communication, measured against a standard the hiring team set and can explain.

Criterion-level scores roll up to an overall assessment, and every part of that chain is visible and auditable. The human remains meaningfully in the loop. The AI generates a starting point by identifying relevant competencies and drafting rubric criteria from the job description, but the standard is owned by the people responsible for the hire.
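
To make that chain concrete, here is a minimal sketch, in Python, of how rubric criteria and criterion-level scores could be represented so that the roll-up to an overall assessment stays inspectable. The class names, fields and weighting scheme are illustrative assumptions for this article, not a description of any particular vendor's implementation.

```python
from dataclasses import dataclass, field

# Hypothetical structures for illustration only: a rubric the hiring team
# reviews and approves before any candidate is scored, plus criterion-level
# results that roll up to an overall assessment with a visible audit trail.

@dataclass
class Criterion:
    competency: str   # e.g. "stakeholder communication"
    strong: str       # what a high-quality response looks like
    adequate: str     # what an acceptable response looks like
    weak: str         # what a weak response looks like
    weight: float     # relative importance, set by the hiring team


@dataclass
class CriterionScore:
    criterion: Criterion
    score: int        # 1 = weak, 2 = adequate, 3 = strong
    evidence: str     # quote or summary drawn from the candidate's answer


@dataclass
class Assessment:
    candidate_id: str
    scores: list[CriterionScore] = field(default_factory=list)

    def overall(self) -> float:
        """Weighted average of criterion scores; every input is inspectable."""
        total_weight = sum(s.criterion.weight for s in self.scores)
        return sum(s.score * s.criterion.weight for s in self.scores) / total_weight

    def audit_trail(self) -> list[dict]:
        """One record per criterion: what was measured, how, and with what evidence."""
        return [
            {
                "competency": s.criterion.competency,
                "score": s.score,
                "weight": s.criterion.weight,
                "evidence": s.evidence,
            }
            for s in self.scores
        ]
```

The specifics matter less than the property the structure enforces: an overall score can always be decomposed back into the criteria, weights and evidence the hiring team approved.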

If a hiring manager can't look at a scoring rubric and explain what it's evaluating and why, it should not be deployed. That is not a burden on the tool. It is the minimum condition for using it responsibly.

Four questions for governance

What specifically is the system scoring?
Request an explicit list of evaluation criteria. If the answer includes anything beyond the content of candidate responses, ask for the validation data connecting those criteria to actual job performance outcomes.

Are the criteria derived from job requirements?
Generic rubrics applied uniformly across roles create standardized evaluation, not structured evaluation. Legitimate scoring starts from the specific competencies required for the specific role.

Can the criteria be reviewed, modified and approved before scoring begins?
If the rubrics are fixed and opaque, your organization is not in control of its own evaluation standard. That is a governance gap.

Can any score be explained to a candidate or a regulator?
This is the accountability test. If the explanation requires "the AI said so" rather than pointing to specific, documented criteria and how a candidate performed against them, the process will not withstand scrutiny.

Well-designed systems answer these questions directly. The ones that can't are telling you something important about the tradeoffs their creators made.
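
As a purely hypothetical illustration of that accountability test, the sketch below assembles a candidate- or auditor-facing explanation from documented criterion-level records rather than from an opaque score. The field names and example data are invented for this article.

```python
# Illustrative only: turning documented criterion-level results into an
# explanation that points to specific criteria and performance against them,
# rather than "the AI said so". Field names are assumptions for the sketch.

def explain_score(candidate_id: str, records: list[dict]) -> str:
    """Build a plain-language explanation from documented criterion records."""
    lines = [f"Assessment summary for candidate {candidate_id}:"]
    for r in records:
        lines.append(
            f"- {r['competency']}: rated '{r['level']}' (weight {r['weight']}). "
            f"Evidence: {r['evidence']}"
        )
    return "\n".join(lines)


print(explain_score(
    "C-1042",
    [
        {
            "competency": "stakeholder communication",
            "level": "strong",
            "weight": 0.4,
            "evidence": "Described how they aligned three teams on a delayed launch.",
        },
        {
            "competency": "prioritization under constraints",
            "level": "adequate",
            "weight": 0.6,
            "evidence": "Outlined a triage approach but did not address trade-offs.",
        },
    ],
))
```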

Why this matters now

The EU AI Act deadline arrives in August, forcing organizations with global operations or EU-based candidates to evaluate their technology. But getting this right isn't just a regulatory exercise; it's practical.

When hiring teams can see exactly how a score was produced, they use it. When they can't explain it, they override it or work around it, and the efficiency gains disappear. The tools that will last in enterprise hiring stacks are the ones transparent enough that the humans responsible for those decisions trust them.

That's not a high bar. But it requires being precise about what any given AI system is actually measuring, and honest about whether that's what you actually want to know.

For HR leaders overseeing these systems, exploring the AI for Human Resources resources can help build organizational competency in evaluating and governing AI hiring tools. Those responsible for strategy may benefit from the AI Learning Path for CHROs, which addresses workforce analytics, recruitment automation, and responsible AI implementation.

