SeisModal: Earthquake Data Fuels a Science-First Foundation Model
Researchers from five U.S. Department of Energy national laboratories are building SeisModal, a multimodal AI foundation model trained on global seismic data. The aim: answer science questions with minimal retraining, starting with challenges tied to nuclear nonproliferation.
Think of SeisModal as a base layer for scientific reasoning. It learns from large, high-quality datasets and adapts to new tasks across disciplines without starting from scratch each time.
Why it matters
- Foundation models are becoming standard tools for research. SeisModal brings that capability to scientific data beyond text and code.
- It's trained on publicly available, high-quality data from the National Earthquake Information Center, spanning more than 16,000 seismic events.
- It's multimodal: it ingests waveforms (time series), magnitude and intensity, location, timing, and even text and imagery.
- It can reason over complex time series, enabling signal detection and analysis methods that many general-purpose language models miss.
What SeisModal is built to do
SeisModal integrates multiple data streams into a single representation of each event. That unified view helps researchers make confident inferences even when some inputs are missing, by leaning on whatever is available and consistent across modalities.
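SeisModal's internals aren't described in detail here, but the idea of leaning on whichever modalities are present can be sketched as masked fusion of per-modality embeddings. This is an illustrative sketch only; the function names, shapes, and averaging strategy are assumptions, not SeisModal's actual design:

```python
import numpy as np

def fuse_event(embeddings, present):
    """Average per-modality embedding vectors, skipping missing modalities.

    embeddings: (n_modalities, d) array, one row per modality encoder output
    present:    (n_modalities,) boolean mask, True where data was observed

    Missing modalities are excluded from the average rather than filled
    with zeros, so the fused vector reflects only observed inputs.
    """
    if not present.any():
        raise ValueError("at least one modality must be observed")
    return embeddings[present].mean(axis=0)

# Hypothetical event: waveform and metadata embeddings observed, imagery missing
d = 8
waveform = np.full(d, 0.5)   # stand-in for a waveform encoder output
metadata = np.full(d, -0.1)  # stand-in for magnitude/location/time embedding
imagery = np.zeros(d)        # placeholder; no image for this event

emb = np.stack([waveform, metadata, imagery])
present = np.array([True, True, False])
fused = fuse_event(emb, present)
print(fused)  # averages only the two observed modalities
```

Excluding absent modalities from the average, rather than zero-filling them, keeps the fused representation from being dragged toward an arbitrary placeholder value.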
"We're creating a foundation model with broad capability that can be applied to multiple problems in science with minimal retraining for each application," said Karl Pazdernik, chief data scientist at Pacific Northwest National Laboratory (PNNL) and science lead for the Steel Thread effort.
"SeisModal can reason over complex time series data such as seismic waveforms, which is an advance over many current large language models," added Ian Stewart, one of the model's lead architects at PNNL. "The ability to detect these signals and other uncommon data types opens the door to a wider variety of scientific analysis methods."
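For context on what "detecting these signals" means on raw waveforms, a classic seismology baseline is the short-term/long-term average (STA/LTA) trigger, which flags sudden increases in signal energy. This is a standard textbook method shown for illustration; it is not SeisModal's algorithm, and the trace below is synthetic:

```python
import numpy as np

def sta_lta(x, n_sta, n_lta):
    """Causal STA/LTA ratio: short-term vs. long-term average energy.

    High values flag a sudden energy increase, the classic cue for a
    seismic phase arrival. Returns 0 where the LTA window is not yet full.
    """
    e = np.concatenate([[0.0], np.cumsum(x ** 2)])  # prefix sums of energy
    i = np.arange(n_lta, len(x) + 1)                # positions with a full LTA window
    sta = (e[i] - e[i - n_sta]) / n_sta             # trailing short-term average
    lta = (e[i] - e[i - n_lta]) / n_lta             # trailing long-term average
    ratio = np.zeros(len(x))
    ratio[n_lta - 1:] = sta / np.maximum(lta, 1e-12)
    return ratio

# Synthetic trace: background noise, then a higher-amplitude burst at sample 1200
rng = np.random.default_rng(1)
trace = rng.normal(0.0, 0.1, 2000)
trace[1200:1400] += rng.normal(0.0, 1.0, 200)

ratio = sta_lta(trace, n_sta=20, n_lta=200)
onset = int(np.argmax(ratio > 5.0))  # first sample where the trigger fires
print(onset)
```

Learned models aim to go beyond fixed-threshold detectors like this one, picking up weaker or more unusual signals that a simple energy ratio misses.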
The dataset: scale, quality, transparency
The team is training on data maintained by the National Earthquake Information Center, chosen for its scale, accessibility, and consistency. The collection covers thousands of events with detailed waveform records, event metadata, and context: key ingredients for generalizable performance across diverse scientific tasks.
Trust, provenance, and security
Steel Thread's goal is a model that scientists can trust. That means knowing exactly what went into training, documenting data origin and processing, and describing model security and usability limits. SeisModal offers a clear example of building on transparent data to support reproducible science and defensible conclusions.
Who's building it
- Project: Steel Thread, funded by the National Nuclear Security Administration's Office of Defense Nuclear Nonproliferation R&D.
- Labs: Pacific Northwest National Laboratory (PNNL), Lawrence Livermore, Los Alamos, Oak Ridge (Chengping Chai), and Sandia (Lisa Linville).
- SeisModal lead architects at PNNL: Sai Munikoti and Ian Stewart.
Applications researchers care about
- Event discrimination and characterization, including signals relevant to underground testing.
- Seismic monitoring workflows that blend waveforms with auxiliary data and text reports.
- General scientific analysis where time series meets imagery and unstructured context.
For practitioners
If you're building models that mix time series, text, and imagery, or you need a science-first base model, SeisModal points to a practical path: high-quality public data, clear provenance, and minimal per-task retraining.
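In practice, "minimal per-task retraining" usually means freezing a pretrained backbone and fitting only a small task-specific head on its features. A toy sketch under that assumption, with a stand-in backbone rather than any real SeisModal API:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(x):
    """Stand-in for a pretrained encoder: a fixed projection, never updated."""
    W = np.ones((x.shape[1], 16)) * 0.1  # frozen weights
    return np.tanh(x @ W)

def train_linear_head(X, y, lr=0.1, epochs=200):
    """Fit only a small logistic-regression head on frozen features."""
    Z = frozen_backbone(X)
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # sigmoid predictions
        grad = p - y                             # logistic-loss gradient
        w -= lr * Z.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy task: separate two synthetic "event types" using only the head
X = np.vstack([rng.normal(-1, 0.3, (50, 4)), rng.normal(1, 0.3, (50, 4))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = train_linear_head(X, y)
p = 1.0 / (1.0 + np.exp(-(frozen_backbone(X) @ w + b)))
acc = ((p > 0.5) == y).mean()
print(round(acc, 2))
```

Only `w` and `b` are trained here; the backbone's weights stay fixed, which is why adapting to a new task is cheap compared with full retraining.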