AI Model Identifies Advanced Heart Failure in Routine Ultrasounds
Researchers at Weill Cornell Medicine, Cornell Tech, and Columbia University have built a machine learning system that detects advanced heart failure using standard echocardiograms, ultrasounds already performed at nearly every hospital. The model achieved 85 percent accuracy in identifying high-risk patients without requiring the specialized exercise tests that most community hospitals cannot perform.
Advanced heart failure affects roughly 200,000 Americans and carries a one-year survival rate below 50 percent. Yet fewer than 6,000 patients per year receive advanced treatments such as heart transplants or mechanical pumps. The gap between need and treatment often begins in the clinic, where severity goes unrecognized.
The Diagnostic Problem
The standard for assessing advanced heart failure is cardiopulmonary exercise testing, which measures peak VO₂, the amount of oxygen the body consumes during intense exertion. This test reveals how hard the heart actually works under stress and determines who needs urgent intervention.
The problem is access. The test requires specialized equipment, trained personnel, and protocols that most community hospitals don't maintain. Patients who never reach academic medical centers, or who are never referred for testing, fall through without an accurate severity assessment.
How the Model Works
The research team asked whether machine learning could detect patterns that human reviewers often miss. They built a system analyzing multiple types of ultrasound data simultaneously: video images of the heart's chambers, valve motion patterns, and Doppler signals measuring blood flow. The model combined these imaging inputs with electronic health record data including patient age, body mass index, and standard clinical measurements.
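The study's actual architecture is not reproduced here, but the description above suggests a late-fusion design: one encoder per input type, with the resulting embeddings combined before a prediction head. The following sketch illustrates that general idea with hypothetical layer sizes, random weights, and made-up features; none of the dimensions or values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w):
    """Toy linear encoder with ReLU: maps one modality into a shared embedding space."""
    return np.maximum(w @ x, 0.0)

# Made-up per-modality feature vectors (sizes are illustrative only).
video_feat   = rng.normal(size=64)            # chamber/valve video features
doppler_feat = rng.normal(size=32)            # Doppler blood-flow features
ehr_feat     = np.array([71.0, 28.4, 0.35])   # e.g. age, BMI, a clinical measurement

# One encoder per modality, each projecting into a 16-dimensional space.
w_video, w_doppler, w_ehr = (rng.normal(size=(16, d)) * 0.1
                             for d in (64, 32, 3))

# Late fusion: concatenate the three embeddings.
fused = np.concatenate([encoder(video_feat, w_video),
                        encoder(doppler_feat, w_doppler),
                        encoder(ehr_feat, w_ehr)])

# Regression head predicting peak VO2 from the fused embedding.
w_head = rng.normal(size=48) * 0.1
peak_vo2_pred = float(w_head @ fused)
print(f"predicted peak VO2 (arbitrary units): {peak_vo2_pred:.2f}")
```

The design choice this sketch highlights is that each modality gets its own encoder, so imaging streams with very different shapes and statistics can be mapped into a common space before a single head makes the prediction.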
Researchers trained the model on 1,000 patients at NewYork-Presbyterian/Columbia University Irving Medical Center, then tested it on 127 patients across three different hospitals. This external validation is critical: it forces the model to prove itself outside the environment where it was built.
Performance was consistent. The system scored 0.849 on the training hospitals and 0.870 on the external validation group. The slightly better external performance is unusual and encouraging.
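The article does not name the metric behind the 0.849 and 0.870 scores; figures on a 0-to-1 scale like these are often area under the ROC curve (AUROC), which measures how well a model's scores rank high-risk patients above low-risk ones. As a generic illustration of scoring a frozen model on an internal and an external cohort, here is a minimal pairwise AUROC computation on entirely made-up labels and predictions:

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy cohorts: 1 = advanced heart failure, 0 = not (labels, model scores).
internal = ([1, 0, 1, 0, 1, 0], [0.9, 0.3, 0.5, 0.6, 0.7, 0.2])
external = ([1, 0, 1, 0],       [0.8, 0.5, 0.6, 0.7])

print(f"internal AUROC: {auroc(*internal):.3f}")  # 0.889 on these toy data
print(f"external AUROC: {auroc(*external):.3f}")  # 0.750 on these toy data
```

The key point of external validation is that the model and its threshold are fixed before the external cohort is scored; only the evaluation data changes between the two calls.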
Where It Falls Short
Accuracy dropped in patients aged 60 and older. The researchers attribute this to smaller representation of older patients in the training data and greater clinical complexity in that group. Age-related changes in heart structure are harder to interpret, and the model reflects that difficulty.
Accuracy also varied across racial groups and imaging modalities. Spectral Doppler data was particularly sensitive to differences between hospitals, suggesting that variation in how data is collected or calibrated affects predictions.
All four hospitals involved are in the New York area, which may not reflect equipment, patient populations, or clinical practices elsewhere. The external validation group was relatively small at 127 patients. Additionally, ultrasound scans and exercise tests were not always performed at the same point in a patient's care (sometimes they were weeks or months apart), meaning the model was occasionally predicting values measured before or after the imaging data it analyzed.
Next Steps
The model consistently outperformed earlier approaches to estimating peak VO₂ from non-exercise data across hospitals it had never seen during training. The research team's vision is integration: a model running quietly in the background of a hospital's imaging system, generating a risk estimate whenever an echocardiogram is processed and flagging the result alongside the standard report.
A clinician seeing an elevated prediction could then refer the patient for formal cardiopulmonary testing, accelerating a pathway to advanced care that currently takes months or never happens at all. For patients in smaller hospitals, rural settings, or places without exercise testing infrastructure, that shift could mean the difference between receiving advanced treatment and never being identified as needing it.
The work requires prospective clinical trials to validate performance in real-world deployment, but the foundation is solid enough to warrant that next step.
The study was published March 3 in npj Digital Medicine.