MIT researchers build million-chart dataset to improve AI chart interpretation

MIT researchers built ChartNet, a dataset of over one million charts to train AI models to read visual data. Smaller open-source models trained on it outperformed larger commercial models at tasks like data extraction and chart summarization.

Published on: Jun 05, 2026

Researchers at MIT and the MIT-IBM Computing Research Lab built ChartNet, a dataset of over one million charts designed to teach artificial intelligence models how to interpret visual data. The resource could help businesses extract information from financial reports and scientific figures more accurately.

Current vision-language models (AI systems that process both images and text) often struggle with charts because the task requires understanding visual elements, numbers, and language simultaneously. A company using even the latest commercial model may still receive incomplete or inaccurate summaries of chart data.

The team created ChartNet using a two-step synthetic data generation process. An automated system first converts existing chart images into code, then iteratively modifies that code to create variations by changing the chart type, data values, colors, and topics. This approach generated hundreds of variations from a single seed chart.
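
The second step, iterative modification, can be illustrated with a minimal Python sketch. It is not the researchers' pipeline: ChartNet's system edits actual chart code, while this example swaps in a small, hypothetical ChartSpec record and random perturbation to show how one seed can fan out into distinct variations. Every name here (ChartSpec, vary, generate_variations, and the option lists) is an assumption made for illustration.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical stand-in for the chart code ChartNet stores; the real
# pipeline modifies plotting code, not a small spec like this one.
@dataclass(frozen=True)
class ChartSpec:
    chart_type: str
    topic: str
    color: str
    values: tuple

# Illustrative option lists, not taken from the dataset.
CHART_TYPES = ("bar", "line", "scatter", "pie")
COLORS = ("tab:blue", "tab:orange", "tab:green", "tab:red")
TOPICS = ("revenue", "rainfall", "enrollment", "energy use")

def vary(spec, rng):
    """Return a copy of spec with one randomly chosen attribute changed."""
    name = rng.choice(("chart_type", "topic", "color", "values"))
    if name == "values":
        # Scale each value by a random factor between 0.5 and 1.5.
        scaled = tuple(round(v * rng.uniform(0.5, 1.5), 2) for v in spec.values)
        return replace(spec, values=scaled)
    options = {"chart_type": CHART_TYPES, "topic": TOPICS, "color": COLORS}[name]
    choices = [o for o in options if o != getattr(spec, name)]
    return replace(spec, **{name: rng.choice(choices)})

def generate_variations(seed, n, rng_seed=0):
    """Iteratively modify the seed, collecting n distinct variations."""
    rng = random.Random(rng_seed)
    seen, out, current = {seed}, [], seed
    while len(out) < n:
        current = vary(current, rng)  # each step builds on the previous one
        if current not in seen:
            seen.add(current)
            out.append(current)
    return out

seed = ChartSpec("bar", "revenue", "tab:blue", (12.0, 18.5, 9.3))
variations = generate_variations(seed, 5)
for v in variations:
    print(v)
```

Because each variation is derived from the previous one rather than from the seed, the changes accumulate, which is one plausible way a single seed could yield many visibly different charts.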

Each chart in the dataset includes the underlying code, a text description, numerical data in table form, and question-and-answer pairs to train models on accurate interpretation. Human experts annotated a subset of the data to ensure quality and provide validity guarantees.
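
As a rough picture of what one such entry might look like, the Python sketch below bundles the four components into a single record and serializes it to JSON. The field names, the human_verified flag, and the sample values are assumptions for illustration; the article does not describe ChartNet's actual schema or file format.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ChartRecord:
    # All field names are hypothetical, not ChartNet's published schema.
    code: str                                     # code that renders the chart
    description: str                              # text description
    table: list                                   # numerical data as rows
    qa_pairs: list = field(default_factory=list)  # (question, answer) pairs
    human_verified: bool = False                  # expert-annotated subset

record = ChartRecord(
    code="plt.bar(['Q1', 'Q2'], [12.0, 18.5])",
    description="Bar chart of quarterly revenue; Q2 is higher than Q1.",
    table=[
        {"quarter": "Q1", "revenue": 12.0},
        {"quarter": "Q2", "revenue": 18.5},
    ],
    qa_pairs=[("Which quarter has the highest revenue?", "Q2")],
)

# One JSON object per record is a common layout for training data.
line = json.dumps(asdict(record))
print(line)
```

Keeping the code, table, description, and questions together means the same chart can supply a training example for each of the tasks the researchers evaluated.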

When trained on ChartNet, smaller open-source models outperformed much larger commercial models on tasks like data extraction and chart summarization. This performance gap could allow smaller firms with limited budgets to deploy effective AI tools without expensive commercial alternatives.

The researchers tested ChartNet by training IBM's Granite Vision models and other open-source systems on four key tasks: chart reconstruction, data extraction, summarization, and answering questions about charts. All models improved across every task.

An automated quality check process verified that generated code was executable and rendered charts were accurate and legible. "We don't want to just be generating diverse samples. We also want the information to be presented in a meaningful way," said Jovana Kondic, the MIT electrical engineering and computer science graduate student who led the work.
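
The executability part of such a check can be sketched in a few lines of Python. This is an assumption about how one might test generated code, not the team's implementation, and it covers only whether the code runs; judging whether a rendered chart is accurate and legible would need a separate step. The function name is_executable and the sample snippets are hypothetical.

```python
import subprocess
import sys

def is_executable(code, timeout_s=10.0):
    """Return True if code runs to completion in a fresh Python process."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,   # keep the candidate's output off the console
            timeout=timeout_s,     # treat hung scripts as failures
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

# Hypothetical generated snippets: one valid, two broken.
candidates = {
    "ok": "values = [1, 2, 3]\nprint(sum(values))",
    "syntax_error": "print('unterminated",
    "runtime_error": "print(1 / 0)",
}
results = {name: is_executable(src) for name, src in candidates.items()}
print(results)
```

Running each candidate in a separate process keeps a crash or infinite loop in generated code from taking down the checker itself.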

The lack of high-quality training data has been a major bottleneck for developing chart-reading AI. Most existing datasets contain limited chart images pulled from the internet and lack the scale and supporting information needed for reliable model training.

The dataset is open-source and available for researchers and practitioners training their own models. The work appears in a paper presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

The team plans to expand ChartNet by adding more complex chart types and incorporating feedback from the research community. Practitioners can use the human-annotated subset to fine-tune existing models for specific applications.

For professionals working with generative AI and language models, ChartNet addresses a concrete gap: enabling AI systems to handle data analysis tasks that currently require human interpretation.

