BlackSwan Challenge Leaderboard

MCQ Test Phase — MCQ Test Split

Results

The leaderboard shows overall Accuracy, as well as the accuracy on the MCQ and YN variants of the Detective and Reporter tasks separately. For the human baseline, please refer to our paper.
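As a rough illustration of how an overall score and per-task scores of this kind can be computed, here is a minimal sketch. It is not the challenge's scoring script: the record format `(task, predicted, gold)` and the function name `accuracy_by_task` are hypothetical, and the sketch assumes the overall accuracy is pooled over all questions rather than averaged across tasks. That assumption is at least consistent with the table, where the overall score is not the simple mean of the two task columns.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return percentage accuracy per task plus a pooled "Overall" score.

    `records` is an iterable of (task, predicted, gold) tuples.
    This format is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, gold in records:
        total[task] += 1
        correct[task] += int(predicted == gold)

    # Per-task accuracy, in percent.
    scores = {task: 100.0 * correct[task] / total[task] for task in total}

    # Pooled accuracy over every question, regardless of task.
    n = sum(total.values())
    scores["Overall"] = 100.0 * sum(correct.values()) / n if n else 0.0
    return scores


# Toy data, not taken from the challenge.
example = [
    ("Detective", "A", "A"),
    ("Detective", "B", "C"),
    ("Detective", "C", "C"),
    ("Reporter", "A", "A"),
]
scores = accuracy_by_task(example)
print(scores)
```

With pooling, a task that contributes more questions weighs more heavily in the overall score, so the overall value can sit closer to one task's accuracy than to the other's.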

B = Baseline, P = Participant

Rank	Team	Accuracy	Detective_Accuracy	Reporter_Accuracy
1	Hunter (v1) (Anonymous Affiliation) P	76.28	72.57	82.26
2	cola_lover (v1) (Anonymous Affiliation) P	75.94	72.33	81.75
3	Boat (v3) (Jiangnan University) P	73.38	66.35	84.70
4	UBC-ViL (Baseline-GPT4o) B	69.06	63.18	78.53
5	IMG_AI	64.17	56.86	75.96
6	UBC-ViL (Baseline-Gemini) B	62.20	57.09	70.60
7	ASU_Computer_Vision (v2) P	62.06	56.22	71.47
8	chenpuka (v1) (Xidian University) P	61.02	55.66	69.67
9	UBC-ViL (Baseline-LlavaVideo) B	60.63	54.55	70.44
10	casia-base	59.94	52.07	72.62
11	UBC-ViL (Baseline-VideoLlama2-7B) B	53.15	53.27	52.96
12	UBC-ViL (Baseline-VILA-7B) B	50.49	49.44	52.19
13	longAI	39.27	37.44	45.14
14	UBC-ViL (Baseline-VideoChat2) B	36.66	28.55	49.74
