Jülich researchers build open-source AI that extracts numerical data from scientific papers automatically

Jülich Research Centre built Quinex, an open-source AI that pulls numerical data from scientific papers and structures it for analysis. It hits 98% accuracy on numbers and units across fields from energy to biomedicine.

Published on: Apr 19, 2026

Researchers build AI system to extract numbers from scientific papers

Jülich Research Centre has developed an AI framework that automatically identifies numerical data buried in scientific publications, assigns units, and structures the information for analysis. The system, called Quinex, is meant to spare researchers from manually extracting quantitative information from thousands of papers.

Scientific papers contain crucial numbers (efficiency rates, temperatures, costs, emissions) scattered throughout the text. As publication volume grows, manually reviewing all relevant studies for a single research question has become impractical. Quinex converts unstructured text like "Efficiency levels of 63 to 71 percent are assumed for 2025" into structured datasets with context: the year, measurement method, and source.
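To make the target format concrete, the sketch below turns the example sentence into one structured record. It is only an illustration: Quinex relies on trained language models, whereas this sketch uses a simple regular expression, and the `QuantityRecord` class, its field names, and the `extract` function are hypothetical and not part of Quinex.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantityRecord:
    """Hypothetical record layout; not Quinex's actual schema."""
    lower: float          # lower bound of the reported range
    upper: float          # upper bound of the reported range
    unit: str             # normalized unit label
    year: Optional[int]   # year the value refers to, if one is mentioned
    source_text: str      # sentence the values were taken from


# Matches ranges such as "63 to 71 percent" or "63 to 71 %".
RANGE = re.compile(r"(\d+(?:\.\d+)?)\s+to\s+(\d+(?:\.\d+)?)\s*(?:percent|%)")
# Matches a four-digit year starting with 19 or 20.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract(sentence: str) -> Optional[QuantityRecord]:
    """Return a record for the first percentage range found, or None."""
    match = RANGE.search(sentence)
    if match is None:
        return None
    year_match = YEAR.search(sentence)
    return QuantityRecord(
        lower=float(match.group(1)),
        upper=float(match.group(2)),
        unit="percent",
        year=int(year_match.group(0)) if year_match else None,
        source_text=sentence,
    )


record = extract("Efficiency levels of 63 to 71 percent are assumed for 2025")
print(record)
```

A pattern like this only covers one phrasing, which is why a trained model is better suited to the varied wording found in real papers.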

How it works

The framework uses open-source language models trained specifically to recognize and classify quantitative information. Compared with large proprietary systems, Quinex is relatively small and resource-efficient.

The system achieves 98% accuracy in identifying numbers and units. For classifying what was measured and the entities involved, accuracy drops to 87% and 82%, respectively, which is still high enough for reliable trend analysis across large document sets.

Tested across fields

Researchers tested Quinex on thousands of scientific abstracts from multiple disciplines. The system successfully extracted data on electricity production costs, human oxygen uptake, earthquake magnitudes, and photovoltaic material properties. The automatically derived values matched reference data closely.

This demonstrates the framework's ability to analyze large volumes of academic literature and identify reliable trends across diverse research fields.

Built-in transparency

Quinex is not error-free. The system reliably extracts numbers and units directly from the text, so the model cannot fabricate them. However, misinterpretations occur when important context is scattered throughout a paper.

Every extracted number can be traced back to its source text. The developers emphasize that Quinex supports researchers but does not replace human judgment. Researchers remain responsible for interpreting results.
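One common way to keep extracted values traceable is to store the character offsets of each value alongside it, so the original wording can always be recovered from the source text. The sketch below illustrates that idea under stated assumptions: the `find_numbers` function and the dictionary keys are hypothetical, and nothing here describes how Quinex implements traceability internally.

```python
import re

# Matches integers and simple decimals such as "63" or "3.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def find_numbers(text: str) -> list[dict]:
    """Return each number with the character span it was copied from."""
    return [
        {"value": float(m.group(0)), "start": m.start(), "end": m.end()}
        for m in NUMBER.finditer(text)
    ]


text = "Efficiency levels of 63 to 71 percent are assumed for 2025"
spans = find_numbers(text)

for span in spans:
    # Slicing the source with the stored offsets recovers the exact string
    # the value came from, which is what makes the value checkable.
    print(span["value"], repr(text[span["start"]:span["end"]]))
```

Because every value is copied from a span of the source rather than generated, a reviewer can check any record against the sentence it came from.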

The team plans to expand Quinex with domain-specific datasets and models to improve accuracy and flexibility across different research areas.

Open access

Jülich is releasing Quinex as an open-source project. The move allows researchers worldwide to test, modify, and adapt the system for their own fields, from energy research to chemistry and biomedicine.

The framework is available at https://go.fzj.de/quinex.


