Research
I'm interested in computer vision, natural language processing, and how we can combine both to build
more intelligent systems that can reason about and navigate the world in multiple modalities.
MentisOculi: Revealing the Limits of Reasoning with Mental Imagery
Jana Zeller,
Thaddäus Wiedemer,
Fanfei Li,
Thomas Klein,
Prasanna Mayilvahanan,
Matthias Bethge,
Felix Wichmann,
Ryan Cotterell,
Wieland Brendel
Under Review
project page /
arXiv /
code
Can models reason better with intermediate visualizations, akin to human mental imagery?
MentisOculi evaluates this across frontier models and finds that visual thoughts do not yet improve
reasoning.
Highlight: Learning Visual Prompts for Vision-Language Models
Jana Zeller,
Aleksandar (Suny) Shtedritski,
Christian Rupprecht
CVPR (Emergent Visual Abilities and Limits of Foundation Models), 2025
project page /
code
We automatically discover visual prompts for CLIP. Interestingly, they look like red circles.
Treating Video as an Image
Jana Zeller,
Deva Ramanan,
Jonathon Luiten
Preprint, 2022
preprint /
poster /
video
We speed up video processing by merging frames together and passing them through the same vision
model only once.
This website template is borrowed from Jon Barron. Thanks!