Teenage stargazer uses AI to map 1.5 million hidden objects in the cosmos
In Pasadena, California, an 18-year-old high school senior, Matteo Paz, turned a summer research gig at Caltech into a discovery spree. He built a machine-learning pipeline that scanned NASA's NEOWISE archives and flagged 1.5 million candidates for previously unknown objects. The work earned him the Regeneron Science Talent Search's top prize of $250,000 and a seat at the table in a fast-moving discipline.
This isn't hype. It's a clean example of pairing public data with practical AI to move science faster. If you work in research, pay attention to the playbook behind it.
Why this matters for research teams
- NEOWISE gathered billions of infrared observations over a decade; most sat untouched due to scale and noise. Paz showed those archives are far from "done."
- His pipeline separates moving targets from static stars across sequential frames, then cross-matches candidates to existing catalogs to limit false hits.
- 1.5 million candidates add to the solar system census and support planetary defense by surfacing potential near-Earth objects.
- The approach generalizes: legacy datasets can be reprocessed to extract more science without new instruments.
Inside the model
Paz trained a neural network to pick out faint signals in noisy infrared data. Borrowing from computer vision, the model compares scans over time, flags motion, and rejects artifacts.
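The exact architecture isn't public in this article, but the core idea of comparing sequential observations can be sketched in a few lines. A minimal illustration, assuming detections have already been matched across epochs into NumPy arrays of sky positions; the function name and thresholds are hypothetical, not Paz's code.

```python
import numpy as np

def flag_movers(positions_by_epoch, motion_threshold_arcsec=2.0):
    """Flag sources whose position drifts between epochs.

    positions_by_epoch: array of shape (n_epochs, n_sources, 2) holding
    (ra, dec) in degrees for the same matched sources at each epoch.
    Returns a boolean mask: True where drift exceeds the threshold.
    """
    # Displacement of each source relative to its first-epoch position
    # (small-angle approximation; ignores the cos(dec) factor for brevity).
    drift_deg = positions_by_epoch - positions_by_epoch[0]
    drift_arcsec = np.linalg.norm(drift_deg, axis=-1) * 3600.0

    # A source "moves" if any epoch shows drift above the threshold;
    # static stars should stay within astrometric noise.
    return (drift_arcsec > motion_threshold_arcsec).any(axis=0)

# Toy example: 3 epochs, 2 sources; the second source drifts.
epochs = np.array([
    [[150.000, 2.000], [150.100, 2.100]],
    [[150.000, 2.000], [150.101, 2.101]],
    [[150.000, 2.000], [150.103, 2.102]],
])
print(flag_movers(epochs))  # [False  True]
```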
Speed was critical. The system processed each image in a sub-millisecond window, then routed detections through validation steps. Cross-referencing with public catalogs cut down on spurious alerts and kept the candidate list usable for follow-up.
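The article doesn't say which catalogs or tooling Paz used for that cross-referencing; as an assumption, here is how a cross-match against a known-object catalog is commonly done with astropy, keeping only candidates with no nearby counterpart.

```python
import astropy.units as u
from astropy.coordinates import SkyCoord

def novel_candidates(cand_ra, cand_dec, cat_ra, cat_dec, radius_arcsec=3.0):
    """Return a mask selecting candidates with no catalog match within radius."""
    candidates = SkyCoord(ra=cand_ra * u.deg, dec=cand_dec * u.deg)
    catalog = SkyCoord(ra=cat_ra * u.deg, dec=cat_dec * u.deg)

    # For each candidate, find the nearest catalog source and its separation.
    _, sep2d, _ = candidates.match_to_catalog_sky(catalog)

    # Anything farther than the match radius from every known source stays
    # on the "potentially new" list; the rest are treated as already known.
    return sep2d > radius_arcsec * u.arcsec

# Toy usage: two candidates, one sitting on top of a cataloged star.
mask = novel_candidates(
    cand_ra=[150.0000, 151.2500],
    cand_dec=[2.0000, -1.3000],
    cat_ra=[150.0001, 10.0],
    cat_dec=[2.0001, 5.0],
)
print(mask)  # [False  True]
```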
Scale and throughput
The NEOWISE archive includes more than 200 billion data points. Manual review is impossible, and classic pipelines stall under that load.
In six weeks, the AI processed the entire set and surfaced the 1.5 million candidates. That's the difference between a backlog and a workable queue for telescopes to verify.
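To see what that throughput implies, a quick back-of-the-envelope calculation; the 200 billion and six-week figures come from the paragraphs above, and the per-item rate is just derived arithmetic, not a reported benchmark.

```python
# Rough throughput implied by the figures quoted above.
data_points = 200e9            # > 200 billion NEOWISE data points
weeks = 6
seconds = weeks * 7 * 24 * 3600

rate_per_second = data_points / seconds
print(f"{rate_per_second:,.0f} data points per second")  # ~55,000/s
print(f"{1e6 / rate_per_second:.1f} microseconds each")  # ~18 µs per point
```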
Mentorship and institutional support
At Caltech, Paz worked with astronomers including Dmitry Duev to refine model parameters and evaluation rules. The partnership bridged domain knowledge and modern ML practice.
This is the template: give motivated students access to experts, compute, and open data. You don't need a new observatory to produce high-value results.
Verification and publication
Paz published a single-author, peer-reviewed paper in The Astronomical Journal, detailing the pipeline and results. That step matters: the community can test, replicate, and extend the work.
For context on the data source, see NASA's NEOWISE mission overview at JPL. For publication standards and archives, see The Astronomical Journal.
Community response and broader implications
Researchers and enthusiasts on X drew parallels to Clyde Tombaugh's Pluto discovery: the same principle of spotting motion, at a different scale and speed. The buzz wasn't just praise; it pointed to obvious next steps with missions like JWST and ESA's Gaia.
Expect many of the 1.5 million candidates to be main-belt asteroids, distant galaxies, or stellar remnants. Even a modest confirmation rate meaningfully improves models of solar system dynamics and galactic structure.
Challenges that mattered
Noisy infrared data is messy. Cosmic rays, hot pixels, and instrument artifacts can mimic motion. Paz iterated through multiple model versions, benchmarking against known objects to keep precision and recall in check.
The guardrails (catalog cross-matching and human-in-the-loop checks) keep the science honest and the follow-up load realistic.
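The article doesn't describe Paz's evaluation code; as a sketch of that kind of benchmarking, here is a minimal precision/recall check of model flags against a set of known objects (scikit-learn is an assumed tool here; any metrics implementation would do).

```python
from sklearn.metrics import precision_score, recall_score

# 1 = real object, 0 = artifact (cosmic ray, hot pixel, etc.).
# "truth" comes from a benchmark set of already-confirmed objects;
# "predicted" is what the model flagged on the same inputs.
truth     = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

print("precision:", precision_score(truth, predicted))  # 0.75
print("recall:   ", recall_score(truth, predicted))      # 0.75

# Track both across model versions: high precision keeps the follow-up
# queue clean; high recall keeps real movers from being filtered out.
```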
What's next
Follow-up observations will confirm or reject candidates and refine orbits. Teams are already exploring similar pipelines for Gaia's stellar catalog and other survey data.
Paz plans to continue in astrophysics or computer science. Meanwhile, labs are adapting his methods to new datasets. That's the signal: AI is becoming a standard instrument in observational work.
Practical playbook for labs and PIs
- Audit your archives. Identify high-noise, high-volume data where simple heuristics underperform.
- Start simple. Train a lightweight model on sequential frames to separate static sources from movers.
- Bake in validation. Cross-match against known catalogs; add artifact filters; log uncertainty and provenance.
- Close the loop. Prioritize candidates for follow-up and track outcomes to retrain and improve.
- Share outputs. Release candidate lists with confidence scores to invite replication and refinement (a minimal example follows this list).
- Upskill the team. Provide structured training for scientists who are adopting ML.
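As a concrete version of the "share outputs" step, here is one way to write a candidate list with confidence scores and provenance; the column names, file name, and values are hypothetical, not from Paz's release.

```python
import csv

# Hypothetical candidate records: position, model confidence, and provenance.
candidates = [
    {"ra_deg": 150.0213, "dec_deg": 2.2371, "confidence": 0.97,
     "source_frames": "neowise_2015_0331;neowise_2015_0402"},
    {"ra_deg": 151.8842, "dec_deg": -1.0158, "confidence": 0.64,
     "source_frames": "neowise_2018_0112;neowise_2018_0114"},
]

with open("candidates.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(candidates[0].keys()))
    writer.writeheader()
    writer.writerows(candidates)
    # Confidence and provenance columns let other teams prioritize
    # follow-up and reproduce the selection.
```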
Industry and ethics
Tech and aerospace teams already use AI for planning and analysis, and work like this shows how to cut cost and time-to-insight on massive datasets. But verification stays non-negotiable.
Open methods, transparent datasets, and peer review prevent overconfidence. AI can propose; humans decide what becomes part of the scientific record.
The human factor
Paz credits early exposure to public lectures and family support for sparking his interest. He wrote the paper himself, an unusual feat in a field dominated by large collaborations.
The takeaway isn't that teenagers have an advantage. It's that curiosity, access, and focused execution can move the needle, regardless of age or title.
Key takeaways for science teams
- Legacy datasets still hold value; scale is a solvable problem with the right pipeline.
- Sequential analysis beats single-frame inspection for moving-object discovery.
- Validation layers (catalog cross-matching and artifact rejection) turn raw detections into science-ready candidates.
- Open tools and public data lower the barrier for meaningful contributions.
The bigger picture: as more surveys come online, backlogs will grow. Teams that treat AI as an instrument (tested, documented, and integrated) will extract more science from the same photons.