The Limits of AI in Materials Science
Researchers at Friedrich Schiller University Jena have examined how current vision-language models perform on scientific tasks. Their study reveals that while these models handle simple recognition tasks well, they struggle with more complex scientific reasoning and data integration.
Evaluating AI Fairly with a New Method
One major challenge in AI research is fairly assessing multimodal systems—those that process both text and images—especially when it's unclear which data the models have encountered during training. The team at Jena developed an innovative evaluation framework to address this problem, enabling a systematic analysis of the strengths and weaknesses of AI systems applied to scientific work.
Multimodal AI models are seen as potential assistants for researchers, capable of supporting tasks from literature review to data interpretation. This study sought to determine whether these models truly hold promise for aiding daily scientific workflows.
Testing with Over 1,100 Realistic Scientific Tasks
The research team created MaCBench, an evaluation set consisting of more than 1,100 tasks drawn from typical scientific activities. These tasks cover three key areas:
- Extracting data from scientific literature
- Understanding laboratory and simulation experiments
- Interpreting measurement results
Examples include analyzing spectroscopy data, assessing laboratory safety, and interpreting crystal structures. The study tested leading AI models on their ability to process and link visual and textual scientific information—an essential skill for effective scientific assistance.
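To make the setup concrete, here is a minimal sketch of how a benchmark of this kind might be run against a vision-language model. The task schema, the field names, and the query_model stub are illustrative assumptions, not the authors' published code; MaCBench defines its own task format.

```python
import base64
from dataclasses import dataclass

@dataclass
class Task:
    """One multimodal benchmark item: an image plus a question with a known answer.
    Illustrative schema only; the real benchmark defines its own format."""
    image_path: str   # e.g. a rendered crystal structure or a plotted spectrum
    question: str     # e.g. "Which space group does this structure belong to?"
    answer: str       # ground-truth label used for scoring

def encode_image(path: str) -> str:
    """Read an image file and return it base64-encoded for an API payload."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def query_model(image_b64: str, question: str) -> str:
    """Placeholder for a call to the vision-language model under test.
    In practice this would send the image and question to the model's API
    and return its free-text answer."""
    raise NotImplementedError("plug in the model under evaluation here")

def evaluate(tasks: list[Task]) -> float:
    """Exact-match accuracy over a list of tasks."""
    correct = 0
    for task in tasks:
        prediction = query_model(encode_image(task.image_path), task.question)
        correct += prediction.strip().lower() == task.answer.strip().lower()
    return correct / len(tasks)
```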
Strengths in Simple Recognition, Weaknesses in Complex Reasoning
The results show a clear pattern: AI models excel at identifying laboratory equipment and extracting standardized data, often with near-perfect accuracy. However, they struggle significantly with spatial analysis and combining information from multiple sources.
Interestingly, the models performed better when information was presented as text rather than images, indicating that integrating different data types remains a challenge. Moreover, performance correlated strongly with how frequently test materials appeared online, suggesting reliance on pattern recognition rather than true scientific comprehension.
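The ablation behind this text-versus-image finding can be pictured as follows: the same information is posed to the model once as an image and once as an equivalent text rendering, and the two accuracies are compared. A minimal sketch, assuming paired renderings exist for each task; ask_with_image and ask_with_text are hypothetical query functions, not names from the paper.

```python
def modality_gap(tasks, ask_with_image, ask_with_text) -> float:
    """Accuracy difference between text and image presentations of the
    same tasks (e.g. a table of peak positions versus a plotted spectrum).
    A positive value means the model did better when given text.
    `tasks` follows the Task shape from the earlier sketch."""
    img_correct = sum(ask_with_image(t) == t.answer for t in tasks)
    txt_correct = sum(ask_with_text(t) == t.answer for t in tasks)
    return (txt_correct - img_correct) / len(tasks)
```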
Implications for Future AI Scientific Assistants
These findings highlight areas requiring improvement before AI systems can be fully trusted in research environments. Enhancing spatial perception and multimodal data integration is essential for future AI assistants to provide reliable scientific support.
This study provides practical guidance for developing AI tools better suited to the demands of natural sciences, moving beyond surface-level recognition toward deeper analytical capabilities.
Further Information
Original publication: Alampara et al., “Probing the limitations of multimodal language models for chemistry and materials research,” Nature Computational Science (2025), DOI: 10.1038/s43588-025-00836-3
Contact:
Dr. Kevin Maik Jablonka
Institute of Organic Chemistry and Macromolecular Chemistry
Email: kevin.jablonka@uni-jena.de
Phone: +49 3641 9-48564