MentisOculi: Revealing the Limits of Reasoning with Mental Imagery

1 MPI for Intelligent Systems, 2 ELLIS Institute Tübingen, 3 ETH Zurich, 4 University of Tübingen

MentisOculi enables comparison of different visual reasoning strategies across model families.

MentisOculi comprises five visual reasoning tasks designed to be best solved with mental imagery.

Collectively, the tasks require models to solve multi-step reasoning problems with geometric constraints. Success hinges on the ability to maintain a visual representation with high fidelity and consistent geometry under affine transformations. Each task is procedurally generated across five difficulty levels, scaling with the number of operations required from one (left) to five (right).
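To make the idea of difficulty scaling with the number of operations concrete, here is a minimal sketch of a procedural generator that chains affine transformations. It is an illustration only, not the benchmark's generator: the operation set (rotation, translation, uniform scaling), the parameter ranges, and the names `sample_task` and `solve` are all assumptions introduced here.

```python
import math
import random

# Hypothetical illustration: the operations and parameter ranges below are
# assumptions, not the actual MentisOculi task definitions.

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def rotation(degrees):
    """3x3 homogeneous matrix for a counter-clockwise rotation about the origin."""
    r = math.radians(degrees)
    c, s = math.cos(r), math.sin(r)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def translation(dx, dy):
    """3x3 homogeneous matrix for a shift by (dx, dy)."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def scaling(factor):
    """3x3 homogeneous matrix for uniform scaling about the origin."""
    return [[factor, 0.0, 0.0], [0.0, factor, 0.0], [0.0, 0.0, 1.0]]


def compose(first, second):
    """Return the matrix that applies `first`, then `second`."""
    return [[sum(second[i][k] * first[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]


def transform(matrix, point):
    """Apply a homogeneous 3x3 matrix to a 2D point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


def sample_task(level, seed):
    """Sample `level` named affine operations; `level` plays the role of difficulty."""
    if not 1 <= level <= 5:
        raise ValueError("level must be between 1 and 5")
    rng = random.Random(seed)  # seeded so each task is reproducible
    steps = []
    for _ in range(level):
        kind = rng.choice(["rotate", "translate", "scale"])
        if kind == "rotate":
            angle = rng.choice([90, 180, 270])
            steps.append((f"rotate {angle}", rotation(angle)))
        elif kind == "translate":
            dx, dy = rng.randint(-3, 3), rng.randint(-3, 3)
            steps.append((f"translate ({dx}, {dy})", translation(dx, dy)))
        else:
            factor = rng.choice([0.5, 2.0])
            steps.append((f"scale {factor}", scaling(factor)))
    return steps


def solve(steps, point):
    """Ground-truth answer: compose all steps in order and apply them to `point`."""
    total = IDENTITY
    for _, matrix in steps:
        total = compose(total, matrix)
    return transform(total, point)
```

Calling `sample_task(3, seed=0)` yields three named operations, and `solve` gives the position a point must end up in after all of them. A solver that tracks the state step by step must keep every intermediate result accurate, since an error at any step carries into all later ones.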

Abstract

Frontier models are transitioning from multimodal large language models (MLLMs) that merely ingest visual information to unified multimodal models (UMMs) capable of native interleaved generation. This shift has sparked interest in using intermediate visualizations as a reasoning aid, akin to human mental imagery. Central to this idea is the ability to form, maintain, and manipulate visual representations in a goal-oriented manner. To evaluate and probe this capability, we develop MentisOculi, a procedural, stratified suite of multi-step reasoning problems amenable to visual solution, tuned to challenge frontier models. Evaluating visual strategies ranging from latent tokens to explicit generated imagery, we find they generally fail to improve performance. Analysis of UMMs specifically exposes a critical limitation: While they possess the textual reasoning capacity to solve a task and can sometimes generate correct visuals, they suffer from compounding generation errors and fail to leverage even ground-truth visualizations. Our findings suggest that despite their inherent appeal, visual thoughts do not yet benefit model reasoning. MentisOculi establishes the necessary foundation to analyze and close this gap across diverse model families.

Results

BibTeX

@article{zeller2026mentisoculi,
  title={{MENTISOCULI}: Revealing the Limits of Reasoning with Mental Imagery},
  author={Zeller, Jana and Wiedemer, Thadd{\"a}us and Li, Fanfei and Klein, Thomas and Mayilvahanan, Prasanna and Bethge, Matthias and Wichmann, Felix and Cotterell, Ryan and Brendel, Wieland},
  journal={arXiv preprint arXiv:2602.02465},
  year={2026},
  note={Preprint. January 31, 2026},
  url={https://jana-z.github.io/mentis-oculis}
}