MIT's ChartNet Dataset Trains AI to Interpret Financial Charts
New open-source resource enables smaller AI models to outperform commercial giants at extracting insights from business visualizations.

MIT tackles a critical AI weakness
Enterprises increasingly rely on generative AI to digest the charts and visualizations that populate financial reports and market analyses. Yet even advanced vision-language models frequently stumble when asked to interpret these documents, which demand simultaneous understanding of visual patterns, numerical data, and linguistic context.
Researchers from MIT and the MIT-IBM Computing Research Lab have developed ChartNet, a comprehensive training resource designed to close this performance gap. The dataset contains over one million synthetic chart images, each paired with the underlying code, numerical tables, textual descriptions, and question-answer sets that teach models how to reason about visual data.
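To make the pairing concrete, here is a minimal sketch of what one such training record might look like. The class and field names (ChartRecord, QAPair, image_path and so on) are illustrative assumptions, not the published ChartNet schema, and the sample values are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class ChartRecord:
    """One hypothetical record: a chart image plus its aligned annotations."""
    image_path: str               # rendered chart image
    plot_code: str                # code that reproduces the chart
    table: list[dict[str, object]]  # underlying numerical data, one dict per row
    description: str              # textual summary of the chart
    qa_pairs: list[QAPair] = field(default_factory=list)


def check_record(record: ChartRecord) -> list[str]:
    """Return a list of missing annotations; an empty list means the record is complete."""
    problems = []
    if not record.plot_code.strip():
        problems.append("missing plot code")
    if not record.table:
        problems.append("missing data table")
    if not record.description.strip():
        problems.append("missing description")
    if not record.qa_pairs:
        problems.append("missing question-answer pairs")
    return problems


# Invented sample data, for illustration only.
example = ChartRecord(
    image_path="charts/000001.png",
    plot_code="ax.bar(['Q1', 'Q2', 'Q3', 'Q4'], [12.0, 15.5, 14.2, 18.9])",
    table=[
        {"quarter": "Q1", "revenue": 12.0},
        {"quarter": "Q2", "revenue": 15.5},
        {"quarter": "Q3", "revenue": 14.2},
        {"quarter": "Q4", "revenue": 18.9},
    ],
    description="Revenue rises from Q1 to Q4, with a small dip in Q3.",
    qa_pairs=[QAPair("Which quarter had the highest revenue?", "Q4")],
)
```

Keeping the code, table, description and questions on the same record is what lets a model learn that all four describe the same image.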
When trained on ChartNet, several open-source vision-language models significantly outperformed much larger commercial alternatives on tasks including data extraction and chart summarization, according to findings that will be presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Why it matters
Chart interpretation is a bottleneck for AI deployment in finance, consulting, and research-intensive industries where visual data drives decision-making. By enabling smaller, open-source models to exceed the capabilities of expensive proprietary systems, ChartNet could give organizations with limited budgets access to sophisticated AI tools. The resource addresses a fundamental shortage of training data that has constrained progress in multimodal AI development.
Synthetic data generation at scale
The ChartNet team built its dataset using a two-stage synthetic generation pipeline. The system first translates existing chart images into executable code, then systematically modifies aspects such as chart type, data values, topics, and visual styling to create hundreds of variations from each seed image.
"We can start from a single chart that we use as a seed and come up with hundreds of augmentations of it. This is how we were able to build a dataset with more than a million diverse images," says Jovana Kondic, an MIT electrical engineering and computer science graduate student who led the research.
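The seed-and-augment idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the team's pipeline: it starts from a chart that has already been turned into a small specification, and the names SEED_SPEC, augment and to_plot_code are hypothetical. The article does not describe how ChartNet actually translates images into code or chooses variations.

```python
import itertools
import random

# Hypothetical seed: a chart already recovered as a small specification.
SEED_SPEC = {
    "chart_type": "bar",
    "title": "Quarterly revenue",
    "labels": ["Q1", "Q2", "Q3", "Q4"],
    "values": [12.0, 15.5, 14.2, 18.9],
    "color": "tab:blue",
}

# Names of matplotlib Axes methods that accept (x, y, color=...).
CHART_TYPES = ["bar", "plot", "scatter"]
COLORS = ["tab:blue", "tab:orange", "tab:green"]


def augment(seed, n_value_variants=2, rng_seed=0):
    """Vary chart type, color and data values to produce many specs from one seed."""
    rng = random.Random(rng_seed)  # fixed seed keeps the output reproducible
    variants = []
    for chart_type, color in itertools.product(CHART_TYPES, COLORS):
        for _ in range(n_value_variants):
            # Perturb each value by up to 20 percent in either direction.
            values = [round(v * rng.uniform(0.8, 1.2), 2) for v in seed["values"]]
            variants.append(
                {**seed, "chart_type": chart_type, "color": color, "values": values}
            )
    return variants


def to_plot_code(spec):
    """Emit matplotlib source code (as a string) that would render the spec."""
    return "\n".join([
        "import matplotlib.pyplot as plt",
        "fig, ax = plt.subplots()",
        f"ax.{spec['chart_type']}({spec['labels']!r}, {spec['values']!r}, "
        f"color={spec['color']!r})",
        f"ax.set_title({spec['title']!r})",
        "fig.savefig('chart.png')",
    ])


variants = augment(SEED_SPEC)
```

With three chart types, three colors and two value perturbations, one seed yields 18 specifications here; adding more axes of variation, such as topic or layout, multiplies that count quickly.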
Automated quality checks verify that generated code executes correctly and produces accurate, clean visualizations. The dataset also includes human-annotated examples that practitioners can use to fine-tune models for specific applications.
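A basic version of the "does the code run" check might look like the sketch below. It is an assumption about how such a filter could work, not the researchers' implementation: the function name passes_execution_check, the output filename and the timeout are all hypothetical, and the check covers only execution and the presence of an output file, not whether the rendered chart is accurate or clean.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_execution_check(code, output_name="chart.png", timeout_s=30.0):
    """Run generated chart code in a scratch directory.

    Returns True only if the script exits cleanly and writes a non-empty
    file called output_name. Visual quality is not assessed here.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "render.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,          # relative output paths land in the scratch dir
                capture_output=True,
                timeout=timeout_s,    # guard against scripts that hang
            )
        except subprocess.TimeoutExpired:
            return False
        output = Path(workdir) / output_name
        return (
            result.returncode == 0
            and output.is_file()
            and output.stat().st_size > 0
        )
```

Running each script in its own subprocess and temporary directory keeps one failing sample from affecting the rest, though a production pipeline would also need proper sandboxing before executing machine-generated code.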
Performance gains across model sizes
The researchers tested ChartNet by training IBM's Granite Vision models alongside other open-source systems of varying scales. The dataset improved accuracy across chart reconstruction, data extraction, summarization, and question-answering tasks.
"The finance industry thrives on charts. If vision-language models can extract information out of charts, like descriptions of trends, that facilitates a lot of workflows that happen downstream," notes Dhiraj Joshi, a senior scientist at IBM Research and co-author of the paper.
Unlike previous training datasets that focused narrowly on simple chart questions, ChartNet provides the multimodal annotations necessary for robust interpretation. The team plans to expand the dataset with additional complexity levels and incorporate feedback from the research community.
The work was funded in part by the MIT-IBM Computing Research Lab, and details were first reported by MIT News.
This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
