BlackSwan Challenge

Submission Details & Evaluation Guidelines

About the Dataset

BlackSwan is a benchmark for evaluating VLMs' ability to reason about unexpected events through abductive and defeasible reasoning tasks. Our tasks either artificially limit the visual information provided to a model while asking it about a hidden unexpected event, or provide new visual evidence that could overturn an existing hypothesis.

We release our data with two splits:

  • Validation Split: Ground truth labels are accessible for model development.
  • Test Split: Ground truth labels are hidden; email your predictions to the organizers for evaluation.

We encourage participants to use the validation split during development and to submit predictions from a single final model for test-set evaluation. The validation set contains 827 videos and the test set contains 828 videos (an approximately 50/50 split). Overall, the dataset comprises over 3,800 MCQ tasks spanning 1,655 videos. The challenge evaluates MCQ tasks only.

Evaluation

We evaluate model performance using accuracy — the proportion of correctly answered MCQ questions. In addition to overall accuracy, we report per-subtask scores to provide deeper insights into model reasoning capabilities.

Detective Score

Questions where only the pre-event and post-event clips are provided as context. Tests abductive reasoning — inferring the hidden cause of an unexpected event.

Reporter Score

Questions where the entire video is provided as context. Tests defeasible reasoning — revising a hypothesis when new visual evidence appears.
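For development on the validation split, the scoring described above can be reproduced with a short script. The sketch below is not the organizers' official evaluation code; in particular, the ground-truth format and the "detective"/"reporter" subtask labels are assumptions made for illustration:

```python
from collections import defaultdict

def score(predictions, ground_truth):
    """Compute overall and per-subtask accuracy.

    predictions:  {question_id: chosen 0-indexed option}
    ground_truth: {question_id: (correct option, subtask)}
    where subtask is e.g. "detective" or "reporter" (assumed labels).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, (answer, subtask) in ground_truth.items():
        total["overall"] += 1
        total[subtask] += 1
        if predictions.get(qid) == answer:
            correct["overall"] += 1
            correct[subtask] += 1
    # Accuracy = correctly answered questions / total questions
    return {k: correct[k] / total[k] for k in total}

# Toy example with made-up question IDs:
gt = {"0": (0, "detective"), "1": (2, "detective"), "2": (1, "reporter")}
preds = {"0": 0, "1": 1, "2": 1}
scores = score(preds, gt)  # overall 2/3, detective 1/2, reporter 1/1
```

Questions missing from the predictions dictionary are simply counted as incorrect here; the official evaluation may handle missing answers differently.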

Submission Guidelines

How to Submit

Email your predictions to the organizers at cogvl2026@googlegroups.com with the subject line:

BlackSwan Challenge Submission — [Team Name]

Please include in the email body: your team name, affiliation, a brief description of your approach, and attach the JSON prediction file described below. The organizers will evaluate your submission and reply with your overall accuracy, Detective score, and Reporter score.

MCQ Prediction File Format

Attach a JSON file to your email where each key is a question ID (string) and each value is the index of the selected MCQ option (0-indexed integer).

Example:

{
  "0": 0,
  "1": 0,
  "2": 1,
  "3": 0,
  "4": 1,
  "5": 1
}
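Before emailing, it is worth sanity-checking that the file parses as valid JSON with the expected types. The helper below is a minimal check we suggest, not an official validator; it assumes only the format stated above (string keys, 0-indexed integer values):

```python
import json

def validate_predictions(path):
    """Check that a prediction file maps string question IDs to
    non-negative integer option indices, and return the parsed dict."""
    with open(path) as f:
        preds = json.load(f)
    assert isinstance(preds, dict), "top level must be a JSON object"
    for qid, choice in preds.items():
        # JSON object keys are always strings, but the values need checking.
        # bool is a subclass of int in Python, so exclude it explicitly.
        assert isinstance(choice, int) and not isinstance(choice, bool), \
            f"value for {qid!r} must be an integer, got {choice!r}"
        assert choice >= 0, f"option index for {qid!r} must be >= 0"
    return preds
```

Running it on the example file above would return the parsed dictionary unchanged; a string or negative value would raise an AssertionError naming the offending question ID.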

Terms and Conditions

By accessing, downloading, or using our data, you acknowledge that the BlackSwan team does not own the copyright to these videos and that they are solely provided for non-commercial research and/or educational purposes. This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

BibTeX Citation

To cite the BlackSwan dataset, please use:

@misc{chinchure2024blackswanabductivedefeasible,
  title={Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events},
  author={Aditya Chinchure and Sahithya Ravi and Raymond Ng and Vered Shwartz and Boyang Li and Leonid Sigal},
  year={2024},
  eprint={2412.05725},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.05725}
}