📜 Biography

I am a researcher at Shanghai AI Lab, collaborating closely with Dr. Kaipeng Zhang and Dr. Wenqi Shao. I received my Ph.D. degree in 2025 from Beijing Institute of Technology (BIT), advised by Prof. Yuwei Wu and Prof. Yunde Jia; my Master's degree in 2020 from Northeastern University, supervised by Prof. Shukuan Lin; and my Bachelor's degree in 2017 from Harbin University of Science and Technology.

My research interests include:

  • vision-and-language
  • image/video generation
  • multimodal large language models
  • internet-augmented generation
  • compositional generalization

🎓 Education

  • 2020.09 - 2025.03, Ph.D. in CS, Beijing Institute of Technology, Beijing, China
  • 2017.09 - 2020.01, Master in CS, Northeastern University, Shenyang, Liaoning, China
  • 2013.09 - 2017.06, Bachelor in CS, Harbin University of Science and Technology, Harbin, Heilongjiang, China

⚡ Preprint

* indicates equal contribution

+ indicates corresponding author

arXiv 2025

Sekai: A Video Dataset towards World Exploration

arXiv 2025

IA-T2I: Internet-Augmented Text-to-Image Generation

  • Chuanhao Li*, Jianwen Sun*, Yukang Feng*, Mingliang Zhai, Yifan Chang, and Kaipeng Zhang+.
  • [arXiv 2025] [paper]
arXiv 2025

A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation

  • Yukang Feng*, Jianwen Sun*, Chuanhao Li, Zizhen Li, Jiaxin Ai, Fanrui Zhang, Yifan Chang, Sizhuo Zhou, Shenglin Zhang, Yu Dai, and Kaipeng Zhang+.
  • [arXiv 2025] [paper]
arXiv 2025

ARMOR: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy

  • Jianwen Sun*, Yukang Feng*, Chuanhao Li, Fanrui Zhang, Zizhen Li, Jiaxin Ai, Sizhuo Zhou, Pengfei Zhou, Yu Dai, Shenglin Zhang, and Kaipeng Zhang+.
  • [arXiv 2025] [paper] [code]
arXiv 2025

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

  • Pengfei Zhou*, Fanrui Zhang*, Xiaopeng Peng*, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You+, and Kaipeng Zhang+.
  • [arXiv 2025] [paper] [code]
arXiv 2025

SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model

  • Yifan Chang*, Yukang Feng*, Jianwen Sun*, Jiaxin Ai, Chuanhao Li, S. Kevin Zhou, and Kaipeng Zhang+.
  • [arXiv 2025] [paper]

📝 Selected Publications

IJCAI 2025

Multi-Sourced Compositional Generalization in Visual Question Answering

  • Chuanhao Li*, Wenbo Ye*, Zhen Li, Yuwei Wu+, and Yunde Jia.
  • [IJCAI 2025] [paper] [code]
CVPR 2025

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

  • Pengfei Zhou*, Xiaopeng Peng*, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, and Kaipeng Zhang+.
  • [CVPR 2025] [Oral] (Top 3.3%) [paper] [code]
ICLR 2025

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

  • Fanqing Meng*, Jin Wang*, Chuanhao Li*, Quanfeng Lu, Hao Tian, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang+, and Wenqi Shao+.
  • [ICLR 2025] [paper] [code]
AAAI 2025

Consistency of Compositional Generalization across Multiple Levels

  • Chuanhao Li*, Zhen Li*, Chenchen Jing+, Xiaomeng Fan, Wenbo Ye, Yuwei Wu+, and Yunde Jia.
  • [AAAI 2025] [paper] [code]
NeurIPS 2024

SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge

  • Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu+, Ping Luo, Yu Qiao, and Kaipeng Zhang+.
  • [NeurIPS 2024] [paper] [code]
NeurIPS 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models

  • Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao+, and Kaipeng Zhang+.
  • [NeurIPS 2024] [Spotlight] [paper] [code]
ECCV 2024

Compositional Substitutivity of Visual Reasoning for Visual Question Answering

  • Chuanhao Li*, Zhen Li*, Chenchen Jing+, Yuwei Wu+, Mingliang Zhai, and Yunde Jia.
  • [ECCV 2024] [paper] [code]
EMNLP 2024

In-Context Compositional Generalization for Large Vision-Language Models

  • Chuanhao Li, Chenchen Jing, Zhen Li, Mingliang Zhai, Yuwei Wu+, and Yunde Jia.
  • [EMNLP 2024] [Main Conference] [paper]
TOMM 2024

Adversarial Sample Synthesis for Visual Question Answering

  • Chuanhao Li, Chenchen Jing, Zhen Li, Yuwei Wu+, and Yunde Jia.
  • [TOMM 2024] [paper]
CVPR 2023

Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language

  • Chuanhao Li, Zhen Li, Chenchen Jing+, Yunde Jia, and Yuwei Wu+.
  • [CVPR 2023] [paper] [code]
AAAI 2022

Learning the Dynamics of Visual Relational Reasoning via Reinforced Path Routing

  • Chenchen Jing, Yunde Jia, Yuwei Wu, Chuanhao Li, and Qi Wu.
  • [AAAI 2022] [paper]

🏅 Selected Awards

  • 2023.01, the second prize in the multi-modal technology innovation competition of the first “Xingzhi Cup” National Artificial Intelligence Innovation Application Competition
  • 2016.05, the first prize in the CCPC Heilongjiang Collegiate Programming Contest
  • 2015.05, the first prize in the CCPC Heilongjiang Collegiate Programming Contest
  • 2014.07, the silver medal in the ACM-ICPC Collegiate Programming Contest Shanghai Invitational

🏛️ Academic Activities

💻 Work Experience

  • 2025.04 - Present, Researcher, Shanghai AI Lab, Shanghai, China
  • 2024.01 - 2025.04, Intern, Shanghai AI Lab, Shanghai, China
  • 2019.07 - 2019.10, Intern, UISEE, Beijing, China