BlackSwan Challenge Leaderboard

MCQ Test Phase — MCQ Test Split

Results

The leaderboard reports overall Accuracy, along with separate accuracies for the Detective and Reporter tasks (MCQ and YN variants). For the human baseline, please refer to our paper.

B = Baseline, P = Participant

Rank	Team	Accuracy	Detective_Accuracy	Reporter_Accuracy
1	cola_lover (v1) (Southeast University, Opus AI Research) P	75.94	72.33	81.75
2	Boat (v1) (Jiangnan University) P	72.88	67.78	81.11
3	UBC-ViL (Baseline-GPT4o) B	69.06	63.18	78.53
4	IMG_AI	64.17	56.86	75.96
5	UBC-ViL (Baseline-Gemini) B	62.20	57.09	70.60
6	ASU_Computer_Vision (v2) P	62.06	56.22	71.47
7	UBC-ViL (Baseline-LlavaVideo) B	60.63	54.55	70.44
8	casia-base	59.94	52.07	72.62
9	UBC-ViL (Baseline-VideoLlama2-7B) B	53.15	53.27	52.96
10	UBC-ViL (Baseline-VILA-7B) B	50.49	49.44	52.19
11	longAI	39.27	37.44	45.14
12	UBC-ViL (Baseline-VideoChat2) B	36.66	28.55	49.74
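
As a rough illustration of how the three accuracy columns relate, the sketch below computes overall and per-task accuracy from a list of predictions. It assumes the overall score is pooled over all questions rather than averaged across the two tasks; the page does not state this, although the overall column lies between the two task columns in every row, which is consistent with pooling. The record layout and the function name `accuracy_report` are hypothetical and are not the challenge's actual submission format or scoring script.

```python
from collections import defaultdict


def accuracy_report(records):
    """Return percent accuracy per task and pooled over all records.

    `records` is an iterable of (task, predicted, gold) tuples.
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, gold in records:
        # Count every question once for its task and once for the pooled score.
        total[task] += 1
        total["Overall"] += 1
        if predicted == gold:
            correct[task] += 1
            correct["Overall"] += 1
    return {name: round(100.0 * correct[name] / total[name], 2) for name in total}


# Hypothetical toy data, not taken from the challenge.
records = [
    ("Detective", "A", "A"),
    ("Detective", "B", "C"),
    ("Detective", "C", "C"),
    ("Reporter", "A", "A"),
    ("Reporter", "B", "B"),
]
report = accuracy_report(records)
print(report)
```

With pooling, a task that contributes more questions pulls the overall score toward its own accuracy, so the overall figure need not equal the simple mean of the two task figures.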
