---
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
---

# CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images

This repository contains the **CodePlot-CoT** model, a core component of the paper [CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images](https://huggingface.co/papers/2510.11718). CodePlot-CoT is a code-driven Chain-of-Thought (CoT) paradigm that enables Vision Language Models (VLMs) to "think with images" when solving mathematical problems. Instead of generating pixel-based images directly, the model outputs executable plotting code to represent its "visual thoughts". This code is executed to render a precise figure, which is fed back to the model as visual input for subsequent reasoning steps. The model is built on the Qwen2.5-VL architecture and is compatible with the `transformers` library.
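The generate, execute, re-input cycle described above can be sketched as a small control loop. This is a minimal, hypothetical illustration, not the authors' inference code: `extract_plot_code`, `reasoning_loop`, `generate`, and `render` are illustrative names, and it assumes the model wraps its plotting code in a fenced `python` block, which may not match CodePlot-CoT's actual output format (see the GitHub repository for the real pipeline).

```python
import re
from typing import Callable, List, Optional

# Assumption: plotting code arrives inside a fenced ```python block.
# The fence is built programmatically so this snippet nests safely in Markdown.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_plot_code(text: str) -> Optional[str]:
    """Return the first fenced Python block in `text`, or None if absent."""
    match = CODE_BLOCK.search(text)
    return match.group(1) if match else None


def reasoning_loop(
    generate: Callable[[List[object]], str],
    render: Callable[[str], bytes],
    question: str,
    max_rounds: int = 4,
) -> List[object]:
    """Alternate between model text turns and code-rendered images.

    `generate` (hypothetical hook) maps the conversation so far to the next
    model turn; `render` (hypothetical hook) executes plotting code and
    returns the rendered image bytes.
    """
    history: List[object] = [question]
    for _ in range(max_rounds):
        reply = generate(history)
        history.append(reply)
        code = extract_plot_code(reply)
        if code is None:
            # No visual thought in this turn: treat it as the final answer.
            break
        # Render the figure and feed it back as a new visual input.
        history.append(render(code))
    return history
```

In a real setup, `generate` would wrap a Qwen2.5-VL-style model call through `transformers`, and `render` would execute the code (for example with matplotlib, saving to PNG), ideally in a sandbox, since the code is model-generated.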

For more details, please refer to the [project homepage](https://math-vr.github.io) and the [GitHub repository](https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT).

## Citation

If you find this work helpful, please consider citing our paper:

```bibtex
@article{duan2025codeplot,
  title={CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images},
  author={Duan, Chengqi and Sun, Kaiyue and Fang, Rongyao and Zhang, Manyuan and Feng, Yan and Luo, Ying and Liu, Yufang and Wang, Ke and Pei, Peng and Cai, Xunliang and others},
  journal={arXiv preprint arXiv:2510.11718},
  year={2025}
}
```