BlackSwan Challenge Leaderboard

MCQ Test Phase — MCQ Test Split

Results

The leaderboard shows overall Accuracy, as well as the accuracy of the MCQ and YN variants of the Detective and Reporter tasks separately. For the human baseline, please refer to our paper.

B = Baseline, P = Participant

Rank  Team                                     Type  Accuracy  Detective_Accuracy  Reporter_Accuracy
1     Hunter (v1) (Anonymous Affiliation)      P     76.28     72.57               82.26
2     cola_lover (v1) (Anonymous Affiliation)  P     75.94     72.33               81.75
3     Boat (v3) (Jiangnan University)          P     73.38     66.35               84.70
4     UBC-ViL (Baseline-GPT4o)                 B     69.06     63.18               78.53
5     IMG_AI                                         64.17     56.86               75.96
6     UBC-ViL (Baseline-Gemini)                B     62.20     57.09               70.60
7     ASU_Computer_Vision (v2)                 P     62.06     56.22               71.47
8     chenpuka (v1) (Xidian University)        P     61.02     55.66               69.67
9     UBC-ViL (Baseline-LlavaVideo)            B     60.63     54.55               70.44
10    casia-base                                     59.94     52.07               72.62
11    UBC-ViL (Baseline-VideoLlama2-7B)        B     53.15     53.27               52.96
12    UBC-ViL (Baseline-VILA-7B)               B     50.49     49.44               52.19
13    longAI                                         39.27     37.44               45.14
14    UBC-ViL (Baseline-VideoChat2)            B     36.66     28.55               49.74
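
For readers who want to reproduce this kind of breakdown on their own predictions, the following is a minimal sketch of how an overall accuracy and per-task accuracies can be computed. It is not the challenge's official scoring script: the function name `accuracy_by_task`, the `(task, predicted, gold)` record format, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return percentage accuracy per task plus an "Overall" entry.

    `records` is an iterable of (task, predicted, gold) tuples.
    This record format is hypothetical, not the official submission format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, gold in records:
        total[task] += 1
        correct[task] += int(predicted == gold)

    # Per-task accuracy: correct answers over questions in that task.
    scores = {task: 100.0 * correct[task] / total[task] for task in total}

    # Overall accuracy pools all questions, so it is weighted by task size
    # rather than being a plain mean of the per-task scores.
    n = sum(total.values())
    scores["Overall"] = 100.0 * sum(correct.values()) / n if n else 0.0
    return scores


# Toy data for illustration only; these are not leaderboard results.
records = [
    ("Detective", "A", "A"),
    ("Detective", "B", "C"),
    ("Reporter", "A", "A"),
    ("Reporter", "C", "C"),
]
scores = accuracy_by_task(records)
print(scores)
```

Because the overall score in this sketch pools all questions, it falls between the per-task scores and sits closer to whichever task has more questions.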