Benchmarking VLMs’ Reasoning About Persuasive Atypical Images

Published in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025

We study how vision–language models (VLMs) interpret persuasive advertisements that portray objects atypically. We introduce three new tasks that evaluate the visual reasoning abilities of VLMs and multimodal large language models (MLLMs) on atypical imagery. We further compare the visual reasoning capabilities of VLMs and LLMs by proposing an atypicality-aware chain-of-thought prompting method. Our findings show that current VLMs and MLLMs struggle to reason over atypical images in creative ads and tend to rely on shallow visual cues (e.g., object recognition), leading to significantly more semantic errors than LLMs when faced with semantically challenging negatives.