Explainable AI for Accurate Differentiation of Voice Disorders Through Acoustic Analysis
Deep learning models analyzed voice recordings and classified voice disorders with 99.44% accuracy using explainable AI techniques. The result is a non-invasive, transparent diagnostic aid that supports clinicians.

Differentiability of Voice Disorders through Explainable AI
The voice reflects various health conditions, and detecting disorders early is crucial for effective treatment. Traditional phoniatric exams rely on acoustic analysis of vocal signals, but this requires specialist equipment and expertise. Recent advances in AI offer promising alternatives by analyzing voice recordings to identify pathologies automatically. This article explores a study in which deep learning and explainable AI techniques were applied to classify voice disorders with high accuracy and transparency.
Voice Disorders and Their Categories
Voice disorders arise from anatomical, functional, or paralytic issues affecting voice production. They can be broadly grouped into three categories relevant to this study:
- Hyperkinetic Dysphonia: Characterized by excessive muscular contraction, leading to a strained, labored voice quality. Conditions include vocal cord nodules, polyps, and Reinke’s edema.
- Hypokinetic Dysphonia: Caused by reduced vocal fold closure, resulting in a breathy, weak voice. Includes vocal fold paralysis, glottic insufficiency, and laryngitis.
- Reflux Laryngitis: Inflammation from gastric acid reflux causing chronic hoarseness and other symptoms.
Diagnosis usually involves laryngoscopy, an invasive procedure to inspect vocal fold anatomy. Acoustic analysis offers a non-invasive alternative by measuring voice features from recorded sounds.
Data and Methods
The study used the publicly available VOICED dataset, which contains recordings from 208 adults—150 with voice disorders and 58 healthy controls. Each participant provided a 5-second recording of the vowel /a/, captured with a mobile phone microphone in controlled conditions.
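For readers who want to work with the data directly, the sketch below shows how a single VOICED recording could be loaded with the Python wfdb package. The record name and file layout are assumptions to be checked against the dataset's PhysioNet documentation.

```python
# Minimal sketch of loading one VOICED recording with the wfdb package.
# The record name "voice001" and the local file layout are assumptions; see the
# dataset documentation on PhysioNet for the actual naming convention.
import wfdb

record = wfdb.rdrecord("voice001")   # reads voice001.dat / voice001.hea
signal = record.p_signal[:, 0]       # mono voice signal as a float array
fs = record.fs                       # sampling rate reported in the header

print(f"{len(signal)} samples at {fs} Hz "
      f"({len(signal) / fs:.1f} s of the sustained /a/ vowel)")
```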
Recordings were pre-processed to remove noise using a low-pass FIR filter with a Hanning window. Each 5-second audio sample was split into overlapping 250 ms segments, generating 36 segments per recording. These segments were converted into Mel spectrograms, a time-frequency representation that aligns with human auditory perception.
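The following Python sketch illustrates this pipeline. The filter order, cutoff frequency, segment overlap, and Mel parameters are not specified in the article and are chosen here only for illustration.

```python
# Illustrative pre-processing sketch: low-pass FIR filtering with a Hanning window,
# overlapping 250 ms segments, and one Mel spectrogram per segment.
# All numeric parameters below are assumptions, not the study's exact settings.
import numpy as np
import librosa
from scipy.signal import firwin, filtfilt

def preprocess(signal, fs, cutoff_hz=3000, n_taps=101,
               seg_len_s=0.25, hop_s=0.135, n_mels=64):
    # Low-pass FIR filter designed with a Hanning window
    taps = firwin(n_taps, cutoff_hz, window="hann", fs=fs)
    clean = filtfilt(taps, [1.0], signal)

    # Overlapping 250 ms segments; a ~135 ms hop yields roughly the 36 segments
    # per 5-second recording mentioned above (assumption)
    seg_len, hop = int(seg_len_s * fs), int(hop_s * fs)
    segments = [clean[i:i + seg_len]
                for i in range(0, len(clean) - seg_len + 1, hop)]

    # One Mel spectrogram (in dB) per segment
    return [librosa.power_to_db(
                librosa.feature.melspectrogram(y=seg.astype(np.float32), sr=fs,
                                               n_fft=512, hop_length=128,
                                               n_mels=n_mels))
            for seg in segments]
```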
For classification, transfer learning was applied with three pre-trained convolutional neural networks (CNNs): OpenL3, Yamnet, and VGGish. Models were fine-tuned on the 8-class problem, which includes seven voice disorder categories plus healthy voices. The dataset was split 70/30 for training and testing, with 5-fold cross-validation used to ensure robustness.
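As a rough illustration of the transfer-learning idea, the sketch below extracts OpenL3 embeddings and trains a lightweight classifier head on them. The study fine-tuned the pre-trained CNNs themselves, so this is a simplification; the `recordings` and `labels` variables are hypothetical placeholders, and all hyperparameters are assumptions.

```python
# Simplified transfer-learning sketch: frozen OpenL3 embeddings feed a small
# classifier head, with a 70/30 split and 5-fold cross-validation as in the study.
import numpy as np
import openl3
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

def embed(audio, fs):
    # One 512-dimensional embedding per analysis window, averaged per recording
    emb, _ = openl3.get_audio_embedding(audio, fs, content_type="env",
                                        input_repr="mel128", embedding_size=512)
    return emb.mean(axis=0)

# `recordings` (list of (signal, fs) pairs) and `labels` (8-class targets)
# are placeholders for the prepared VOICED data.
X = np.stack([embed(sig, fs) for sig, fs in recordings])
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("5-fold CV accuracy:", cross_val_score(clf, X_train, y_train, cv=5).mean())
print("Held-out accuracy:", clf.score(X_test, y_test))
```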
Explainable AI (XAI) for Transparent Diagnosis
To address the black-box nature of deep networks, the study used an explainability technique called Occlusion Sensitivity. This method systematically masks parts of the input spectrogram and measures how the model's confidence changes. By averaging these sensitivity maps across samples, the researchers identified which time-frequency regions the model relied on for classification.
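A minimal, framework-agnostic sketch of occlusion sensitivity is shown below. The patch size, stride, fill value, and the `predict_proba` callable are assumptions rather than the study's exact settings.

```python
# Occlusion sensitivity sketch for a spectrogram classifier. `predict_proba`
# stands in for any model that maps a Mel spectrogram to class probabilities.
import numpy as np

def occlusion_sensitivity(spec, predict_proba, target_class,
                          patch=(8, 8), stride=(4, 4), fill=None):
    fill = spec.min() if fill is None else fill        # value used to mask a patch
    base = predict_proba(spec)[target_class]           # unoccluded confidence
    heat = np.zeros_like(spec, dtype=float)
    count = np.zeros_like(spec, dtype=float)

    # Slide an occluding patch over the spectrogram and record the confidence drop
    for i in range(0, spec.shape[0] - patch[0] + 1, stride[0]):
        for j in range(0, spec.shape[1] - patch[1] + 1, stride[1]):
            occluded = spec.copy()
            occluded[i:i + patch[0], j:j + patch[1]] = fill
            drop = base - predict_proba(occluded)[target_class]
            heat[i:i + patch[0], j:j + patch[1]] += drop
            count[i:i + patch[0], j:j + patch[1]] += 1

    return heat / np.maximum(count, 1)   # average confidence drop per time-frequency bin
```

Averaging the resulting maps over all test samples of a class yields the class-level relevance regions described above.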
This approach introduces the concept of differentiability, describing how distinct the features of different voice disorders are from the model’s perspective. Understanding these discriminative features aids clinicians in trusting AI decisions and may reveal new acoustic biomarkers for various pathologies.
Results
The OpenL3 model achieved the highest accuracy of 99.44% across all eight classes. While some classes like Glottic Insufficiency had slightly lower precision (~98.2%), overall performance remained excellent. Yamnet and VGGish also performed well but with marginally lower accuracy.
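Per-class figures like these can be obtained from held-out predictions with scikit-learn's classification report, continuing the simplified embedding sketch above.

```python
# Continuing the earlier sketch: per-class precision and recall on the held-out set.
from sklearn.metrics import classification_report

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, digits=3))
```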
Explainability maps highlighted that the model classifies voices based on the presence or absence of specific frequency patterns and vocal intensities within short 250 ms windows. These insights confirm that the model leverages physiologically relevant features rather than arbitrary cues.
Implications and Future Directions
This work demonstrates that combining transfer learning with explainability methods can produce highly accurate and interpretable voice disorder classifiers. Such tools can support clinicians by offering rapid, non-invasive screening, especially useful in telemedicine or resource-limited settings.
Although AI-based diagnosis does not replace specialist consultation or laryngoscopy, it provides valuable decision support and verification. Moreover, voice analysis as a biomarker extends beyond voice disorders; similar techniques could aid in detecting diseases like Parkinson’s or type 2 diabetes from vocal patterns.
Access to Data and Code
The VOICED dataset is publicly available for research purposes on PhysioNet. The related source code for generating Mel spectrograms, transfer learning models, and explainability maps can be found on Zenodo.