CogVL: Cognitive Foundations for Multimodal Models

Date: June 3, 2026 — 1pm–6pm

Location: Denver, Colorado @ CVPR 2026 — Rooms 610/612

The CogVL Workshop @ CVPR 2026 brings together vision, language, and cognitive science to move beyond surface-level intelligence toward models that reason, generalize, and make reliable decisions in the real world.

Schedule

Note: The schedule is subject to change.

Time Slot	Event	Speaker
1:00 pm – 1:10 pm	Welcome and Opening Remarks
1:10 pm – 1:50 pm	35 min Keynote + 5 min Q&A	Katerina Fragkiadaki (Carnegie Mellon University)
1:50 pm – 2:30 pm	Oral Presentations (4 x 10 min)
	MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models	Anh Thai, Stefan Stojanov, Zixuan Huang, Bikram Boote, James Matthew Rehg
	Multimodal Graph-of-Thoughts: Hypothesis-Verification Graphs for Multimodal Reasoning in Vision-Language Models	Irina Belyaeva
	Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions	Saurav Sengupta, Nazanin Moradinasab, Jiebei Liu, Donald E. Brown
	Relational Visual Similarity	Thao Nguyen, Sicheng Mo, Krishna Kumar Singh, Yilin Wang, Jing Shi, Nicholas Kolkin, Eli Shechtman, Yong Jae Lee, Yuheng Li
2:30 pm – 3:10 pm	35 min Keynote + 5 min Q&A	Judith E. Fan (Stanford University)
3:10 pm – 3:30 pm	Break
3:30 pm – 4:10 pm	35 min Keynote + 5 min Q&A	Alane Suhr (UC Berkeley / BAIR)
4:10 pm – 4:50 pm	35 min Keynote + 5 min Q&A	Trevor Darrell (UC Berkeley / BAIR)
4:50 pm – 5:00 pm	Closing Remarks + Awards
5:00 pm – 6:00 pm	Poster Session

Keynote Speakers

Trevor Darrell

Berkeley EECS / BAIR

Katerina Fragkiadaki

Carnegie Mellon University

Judy Fan

Stanford Psychology & CS

Alane Suhr

Berkeley EECS / BAIR

Despite impressive perceptual and reasoning capabilities, vision-language models (VLMs) still struggle with systematic generalization, sample efficiency, commonsense reasoning, and trustworthy decision-making. The CogVL workshop provides a forum for researchers across computer vision, natural language processing, and cognitive science to explore how cognitively inspired frameworks can address these limitations.

The workshop is motivated by growing interest in whether cognitive principles such as counterfactual thinking, theory of mind, compositional reasoning, and causal inference can offer a blueprint for more adaptable, robust, and context-aware multimodal intelligence. The half-day program features invited keynote talks, a panel discussion with leading experts, and presentations of accepted papers. CogVL will also host the BlackSwan Challenge, which evaluates abductive reasoning (inferring hidden causes) and defeasible reasoning (revising conclusions in light of new visual evidence) on unexpected events in videos.

Announcements

🚨 BlackSwan Challenge Deadline Extended! The BlackSwan Challenge deadline has been extended to May 1, 2026 (AoE), previously April 15. See the submission details for more information.
🚨 Deadline Extended! The paper submission deadline has been extended to March 6, 2026 (AoE).
The submission site is now open! Submit your papers via OpenReview. Deadline: March 6, 2026 (AoE), extended from March 1, 2026.
The BlackSwan Challenge has launched! See the submission instructions page for details on how to participate.
Follow the CogVL Workshop on X!