Xuan Wang’s AI System Transforms How Scientists Search and Analyze Research Data
Xuan Wang uses AI to organize vast collections of scientific papers, making their contents easier to search and analyze. Her NSF-funded project supports researchers and aims to make medical records more useful.

Xuan Wang’s CAREER Project Makes Sense of Scientific Discovery
Scientific research produces vast volumes of data, but extracting meaningful insights from this growing pool remains a challenge. Xuan Wang, an assistant professor of computer science with a background in biological research, is using artificial intelligence (AI) to address this issue. Her project focuses on building an AI system that organizes large collections of published scientific papers, making the data easy to search and analyze.
Wang recently received a five-year, $400,000 Faculty Early Career Development Program (CAREER) award from the U.S. National Science Foundation (NSF) to support this work. The goal is to accelerate discoveries by creating tools that can automatically extract reliable information from extensive scientific texts across various fields.
From Biology to Computer Science
Wang’s journey began in biochemistry, where she encountered the tedious task of manually comparing genes from the human genome to experimental data. She realized this process could be automated. This insight led her to study statistics and computer science, culminating in a Ph.D. focused on AI methods for data analysis.
Her aim is clear: to develop an AI system capable of rapidly and accurately reading through thousands of scientific papers, organizing their content so researchers can quickly find and digest relevant information.
Extracting Meaningful Data Across Disciplines
The CAREER award enables Wang’s team to extend their work with Children’s National Hospital, developing automated extraction tools for electronic health records. These records contain vast amounts of unstructured clinical notes that, if organized effectively, could help doctors locate critical information swiftly and improve patient care.
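To make the idea of turning unstructured notes into searchable fields concrete, here is a minimal Python sketch. It is not the team's method: the note text, the field names, the `KNOWN_MEDICATIONS` vocabulary, and the regular expression are all hypothetical, chosen only to illustrate what "automated extraction" from free text can look like at its simplest.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NoteRecord:
    """Structured fields pulled from one free-text note (hypothetical schema)."""
    temperature_c: Optional[float]
    medications: List[str]


# Hypothetical vocabulary; a real system would rely on a curated drug lexicon.
KNOWN_MEDICATIONS = ("amoxicillin", "ibuprofen", "albuterol")

# Matches phrases such as "Temp 38.5 C" or "temperature: 37 C".
TEMP_PATTERN = re.compile(
    r"temp(?:erature)?\s*(?:of|:)?\s*(\d{2}(?:\.\d)?)\s*c\b",
    re.IGNORECASE,
)


def extract_record(note: str) -> NoteRecord:
    """Pull a temperature reading and known medication names out of a note."""
    match = TEMP_PATTERN.search(note)
    temperature = float(match.group(1)) if match else None
    lowered = note.lower()
    medications = [name for name in KNOWN_MEDICATIONS if name in lowered]
    return NoteRecord(temperature_c=temperature, medications=medications)


# Invented example note, for illustration only.
note = (
    "Patient seen for cough. Temp 38.5 C. "
    "Started amoxicillin; continue albuterol as needed."
)
record = extract_record(note)
print(record)
```

Hand-written rules like these break down quickly on real clinical language, which is one reason language models are attractive for this kind of task.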
Another collaboration involves PubMed, a comprehensive biomedical literature database. Wang’s team will beta-test their AI system on over 38 million citations, ensuring it can handle large-scale datasets.
While healthcare is a primary focus, the project aims to serve many scientific and engineering fields. Decades of research across biology, physics, math, and more have produced extensive documentation stored in databases. However, even experts often struggle to locate and synthesize relevant information efficiently.
Balancing Large and Small Language Models
Wang’s research compares large and small language models, two AI approaches, to determine which is better at extracting and structuring scientific information for search and analysis.
- Large language models are powerful and fast but costly and prone to errors such as hallucination, where they generate information that is not present in the source data. Their output requires significant oversight and structuring.
- Small language models are simpler, less expensive, and customizable. They can be hosted and fine-tuned on local systems, making them attractive for many research domains.
Determining the right balance between these models is essential to building reliable AI tools for scientific discovery.
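One simple form of the oversight that hallucination calls for is a grounding check: keep an extracted value only if it actually appears in the source text. The Python sketch below is an illustrative assumption, not the project's approach; the function names, the example sentence, and the `extracted` dictionary (standing in for a model's output) are all invented for this example.

```python
import re
from typing import Dict, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def split_grounded(
    extracted: Dict[str, str], source: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate extracted values found verbatim in the source from those that are not."""
    source_norm = normalize(source)
    grounded: Dict[str, str] = {}
    unsupported: Dict[str, str] = {}
    for field, value in extracted.items():
        value_norm = normalize(value)
        # Empty values are treated as unsupported rather than trivially matching.
        if value_norm and value_norm in source_norm:
            grounded[field] = value
        else:
            unsupported[field] = value
    return grounded, unsupported


# Invented source sentence and hypothetical model output, for illustration only.
source = (
    "We measured a binding affinity of 12 nM for the mutant protein "
    "in HEK293 cells."
)
extracted = {"affinity": "12 nM", "cell_line": "HEK293", "organism": "mouse"}

grounded, unsupported = split_grounded(extracted, source)
print("grounded:", grounded)
print("unsupported:", unsupported)
```

Exact substring matching is deliberately strict. It would flag a correct paraphrase as unsupported, so in practice it serves as a first filter that routes doubtful values to a human reviewer, not as a final verdict.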
Impact on Medical Research and Beyond
Wang’s work connects her original passion for biochemistry with the practical power of AI. By creating systems that organize and make sense of massive scientific datasets, she aims to accelerate innovation in healthcare and other fields.
Her project demonstrates how targeted AI applications can help researchers cut through information overload, enabling faster, more accurate advances in science.