$152M OMAI Project to Build Transparent, Open AI for Science as UNM Social Scientist Joins

UNM's Sarah Dreier joins AI2-led OMAI, a $152M push for transparent, open AI built for reproducible science. Expect open models, provenance-first data, and extensible tools.

Published on: Sep 21, 2025


AI is only as good as its data. Most large models were trained on the open internet, which means noise, bias, and zero visibility into sources. That's a nonstarter for scientific work where reproducibility, provenance, and audit trails are non-negotiable.

Sarah Dreier, assistant professor of political science at the University of New Mexico, is joining the Open Multimodal AI Infrastructure to Accelerate Science (OMAI) project to fix that. She is the sole social scientist on the team, focusing on dataset curation and practical data needs for scientific workflows like literature analysis and code generation, backed by a $600,000 allocation.

Led by the Allen Institute for AI (AI2), OMAI is a $152 million effort to build a fully open suite of AI models and infrastructure for U.S. science. Funding includes $75 million from the U.S. National Science Foundation and $77 million from Nvidia, supporting the broader federal push for trustworthy, high-performance AI in research.

"The engineers training these models don't know what the data is," Dreier noted, pointing to the gap between opaque training corpora and the demands of rigorous science. Her goal: models that are more transparent, more open, and more flexible for real research pipelines.

Noah Smith, who directs natural language processing research at AI2 and teaches at the University of Washington, emphasized why openness matters. Many LLMs are closed: their data, training tools, and methods are private, which blocks inspection, adaptation, and reuse. "Open models are essential for transparency, reproducibility, and collaboration, the core of how scientific progress happens."

Other investigators on the five-year project include UW's Hanna Hajishirzi, University of Hawai'i at Hilo's Travis Mandel, and the University of New Hampshire's Samuel Carton. The team plans to release open models, tools, and compute infrastructure to help scientists move faster with fewer blind spots.

According to Smith, the models will help researchers parse vast literatures, generate code and visualizations, and connect new findings to prior work. Expect impact across materials science, protein function prediction, and energy research.

Why this matters to engineers and researchers

  • Auditability by design: Open data pipelines, model cards, and documented training sets allow you to verify sources, assess bias, and replicate results.
  • Domain-focused training: Curated corpora for political science, sociology, biology, and engineering ensure models are tuned for real tasks like hypothesis generation, code authoring, and data analysis.
  • Tooling you can extend: Open checkpoints, evaluation suites, and APIs let teams fine-tune, add retrieval, and integrate with lab systems without license constraints.
  • Data rights and provenance: Licensing, usage permissions, and lineage tracking reduce legal and compliance risk, especially for government- and grant-funded work.
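To make the provenance idea above concrete, here is a minimal sketch of a per-document lineage record. All field names and the `DatasetRecord` class are hypothetical illustrations, not OMAI's actual schema; the point is that license, origin, and every transformation travel with the data.

```python
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class DatasetRecord:
    """Hypothetical provenance record for one training document."""
    source_url: str                  # where the document was collected
    license: str                     # usage permission, e.g. "CC-BY-4.0"
    collected_on: str                # ISO date of collection
    domain: str                      # e.g. "political science"
    transformations: List[str] = field(default_factory=list)  # cleaning steps applied

    def add_step(self, step: str) -> None:
        """Append a processing step so the lineage stays auditable."""
        self.transformations.append(step)

record = DatasetRecord(
    source_url="https://example.org/paper.pdf",
    license="CC-BY-4.0",
    collected_on="2025-01-15",
    domain="political science",
)
record.add_step("pdf_to_text")
record.add_step("dedupe")
print(asdict(record))
```

A record like this is what lets a downstream auditor answer "where did this training example come from, and are we allowed to use it?" without re-running the pipeline.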

What to expect over the next five years

  • Public releases of models, datasets, and benchmarks with transparent documentation.
  • Provenance-first data pipelines and clear governance for contributions from partner universities.
  • Support for multimodal inputs (text, code, visuals) to match real scientific artifacts and workflows.
  • Community collaboration: opportunities to contribute datasets, evaluations, and domain expertise.

How to prepare your team now

  • Inventory clean, licensed datasets. Add data statements that spell out collection methods, intended use, and limitations.
  • Stand up retrieval pipelines with strict provenance so model outputs cite sources. Avoid mixing unknown web corpora into training or fine-tuning.
  • Create evaluation protocols for literature synthesis, code generation, and scientific QA that reflect your lab's acceptance criteria.
  • Train your staff on open model workflows and governance.
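The retrieval-with-provenance step above can be sketched in a few lines. This is a toy illustration, not a production pipeline: the corpus, DOIs, and word-overlap scoring are placeholder assumptions. The design point is that every passage returned to the model carries its source identifier, so generated answers can cite it.

```python
def retrieve(query: str, corpus: list[tuple[str, str]], k: int = 2) -> list[dict]:
    """Rank (text, source_id) passages by word overlap with the query.

    Each hit keeps its source_id attached so downstream outputs can cite it.
    """
    q_words = set(query.lower().split())
    scored = []
    for text, source_id in corpus:
        overlap = len(q_words & set(text.lower().split()))
        scored.append({"text": text, "source": source_id, "score": overlap})
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:k]

# Toy corpus: (passage, source identifier) pairs with placeholder DOIs.
corpus = [
    ("Protein function prediction with open models", "doi:10.0000/aaa"),
    ("Energy grid optimization survey", "doi:10.0000/bbb"),
    ("Open models for materials science discovery", "doi:10.0000/ccc"),
]

hits = retrieve("open models for protein function", corpus)
for h in hits:
    print(f"{h['text']}  [cite: {h['source']}]")
```

In practice you would swap the overlap score for a real embedding index, but the invariant to preserve is the same: no passage enters the context window without a citable source attached.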

OMAI's promise is simple: make high-utility AI for science that anyone can inspect, extend, and trust. For updates, follow announcements from the Allen Institute for AI and program news from the U.S. National Science Foundation.