Pillar-0 Sets a New Bar for 3D Medical Imaging AI
A research team from UC Berkeley and UCSF has released Pillar-0, an open-source vision-language model that reads CT and MRI scans as full 3D volumes and identifies hundreds of clinical findings from a single study. In validation across chest CT, abdomen CT, brain CT, and breast MRI at UCSF, Pillar-0 reached an average AUC of 0.87 over 350+ findings, about 10% to 17% higher than the strongest publicly available baselines.
For busy radiology services processing hundreds of exams a day, this isn't just another benchmark. It's a practical path to triage, second reads, and faster turnaround without adding more strain to the team.
Key results at a glance
- AUC 0.87 across 350+ findings; outperformed Google's MedGemma (0.76), Microsoft's MI2 (0.75), and Alibaba's Lingshu (0.70) on UCSF data.
- General-purpose backbone: finetuning improved the lung cancer risk model Sybil-1 by 7% in external validation at Massachusetts General Hospital.
- Brain CT hemorrhage finetuning beat all baselines using only one-quarter of the training data.
- Direct 3D interpretation enables recognition of hundreds of conditions from a single CT or MRI exam.
"Pillar-0 outperforms models from Google, Microsoft and Alibaba by over 10% across 366 tasks and four modalities; it also runs an order of magnitude faster and finetunes with minimal effort," said Adam Yala, Assistant Professor of Computational Precision Health at UC Berkeley and UCSF.
What's different under the hood
Most prior models treat CT or MRI as stacks of 2D slices. Pillar-0 processes full 3D volumes directly. The team introduced Atlas, a new architecture that is over 150x faster than traditional vision transformers on an abdomen CT, bringing training and inference within reach for academic and health system teams.
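To make the 2D-versus-3D distinction concrete, here is a minimal, illustrative sketch in PyTorch. It is not the Atlas architecture; it only contrasts slice-wise 2D patch tokenization with direct volumetric 3D patch tokenization on a hypothetical abdomen CT, and shows why token count, and therefore attention cost, is the bottleneck a 3D-native design has to solve.

```python
# Illustrative only: not the Atlas architecture. A toy comparison of
# slice-wise 2D tokenization vs direct 3D volumetric tokenization.
import torch
import torch.nn as nn

# Hypothetical abdomen CT volume: (batch, channels, depth, height, width)
volume = torch.randn(1, 1, 240, 512, 512)

# 2D approach: treat the exam as a stack of slices and tokenize each one.
patch2d = nn.Conv2d(1, 768, kernel_size=16, stride=16)
tokens_per_slice = patch2d(volume[:, :, 0]).flatten(2).shape[-1]   # 32*32 = 1,024
tokens_2d = tokens_per_slice * volume.shape[2]                     # 245,760 tokens

# 3D approach: tokenize the full volume with volumetric patches.
patch3d = nn.Conv3d(1, 768, kernel_size=(8, 16, 16), stride=(8, 16, 16))
tokens_3d = patch3d(volume).flatten(2).shape[-1]                   # 30*32*32 = 30,720

print(f"2D slice tokens:  {tokens_2d:,}")
print(f"3D volume tokens: {tokens_3d:,}")
# Standard self-attention scales roughly with the square of the token count,
# so tokenization and attention design dominate the cost of full-volume models.
```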
"We implemented innovations across data, pretraining, and neural network design to make 3D practical at scale," said Kumar Krishna Agrawal, PhD candidate at UC Berkeley and first author.
To ground evaluation in clinical utility, the team also released RaTE, a clinically focused benchmark built from questions and findings radiologists handle every day. "Existing 2D question-answer tests don't reflect real diagnostic work," said Dr. Maggie Chung, Assistant Professor in Radiology at UCSF. "RaTE lets any hospital test or finetune Pillar-0 on their own data."
Why this matters for care teams
More than 500 million CT and MRI scans are performed annually. That volume creates capacity gaps that ripple across the entire care pathway: delays in diagnosis, longer lengths of stay, and added risk. A fast, adaptable 3D model that generalizes across body regions can help close that gap.
- Case prioritization: flag high-risk findings to reduce critical result delays.
- Second reads: consistent assistance on common and subtle findings.
- Workflow efficiency: triage low-yield exams and support off-hours coverage.
- Research enablement: finetune quickly for local protocols and populations.
How to evaluate Pillar-0 responsibly at your site
- Define goals up front: triage, QA, second read, or research-only. Pick clear metrics such as AUC, sensitivity at fixed specificity, and time-to-report (see the metric sketch after this list).
- Start with de-identified retrospective data and reader studies. Compare against current practice, not just published baselines.
- Validate on your scanners, protocols, and patient mix. Track subgroup performance (age, sex, device, contrast use).
- Plan integration early: DICOM routing, PACS/RIS hooks, study-level flags, and fail-safe fallbacks.
- Establish human-in-the-loop review and escalation paths. Document when to trust, when to override, and how to audit.
- Address compliance: PHI handling, access controls, monitoring, and model update governance. Review FDA pathways for clinical deployment.
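If you want to operationalize the metrics above, the following sketch shows one way to compute AUC, sensitivity at a fixed specificity, and per-subgroup breakdowns with scikit-learn. The column names and placeholder data are hypothetical, not part of the Pillar-0 or RaTE release; swap in your own de-identified, exam-level scores and labels.

```python
# Minimal site-evaluation sketch. Assumes one row per study with a model
# score, a ground-truth label, and subgroup attributes of interest.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve

def sensitivity_at_specificity(y_true, y_score, target_specificity=0.90):
    """Sensitivity at the ROC operating point closest to the target specificity."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    specificity = 1.0 - fpr
    idx = np.argmin(np.abs(specificity - target_specificity))
    return tpr[idx], specificity[idx]

# Hypothetical results table (replace with real retrospective data).
df = pd.DataFrame({
    "label":   np.random.randint(0, 2, 500),        # 1 = finding present
    "score":   np.random.rand(500),                  # model probability
    "scanner": np.random.choice(["A", "B"], 500),    # subgroup of interest
})

print("Overall AUC:", roc_auc_score(df["label"], df["score"]))
sens, spec = sensitivity_at_specificity(df["label"], df["score"], 0.90)
print(f"Sensitivity {sens:.2f} at specificity {spec:.2f}")

# Track subgroup performance (age, sex, device, contrast use) the same way.
for scanner, grp in df.groupby("scanner"):
    print(f"Scanner {scanner} AUC:", roc_auc_score(grp["label"], grp["score"]))
```

The same pattern extends to reader studies: compute the identical metrics for current practice and for the model-assisted workflow so the comparison is against your baseline, not a published one.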
For context on regulatory expectations, see the FDA's resources on AI/ML-enabled medical devices. Radiology teams may also find RSNA's AI resources helpful.
Who built it
Kumar Krishna Agrawal (UC Berkeley) led development of Atlas, the architecture behind Pillar-0. Dr. Adam Yala holds appointments at UC Berkeley and UCSF; his breast cancer risk work has been validated on more than two million mammograms across 72+ hospitals in 22 countries. Dr. Maggie Chung is a UCSF radiologist focused on bringing AI safely into clinical practice.
What's next
The team is releasing the full codebase, trained models, evaluation tools, and data pipelines to enable independent validation and local finetuning. They plan to extend support across more modalities and move toward grounded report generation.
"Transparency is essential," said Yala. "Open-sourcing lets the community verify performance and build on top of our work."
Skills and training for your team
If you're planning pilots, upskilling clinicians, data scientists, and IT staff on practical AI workflows pays off quickly. Explore role-based AI learning paths and current courses: Courses by job and Latest AI courses.