SafeArena Leaderboard
| Model | Safe Completion Rate | Harmful Completion Rate | Refusal Rate | Normalized Safety Score | License |
|---|---|---|---|---|---|
| llama-3.2-90b-Vision-Instruct | 21.2 | 22.8 | 57.7 | 55.0 | Llama License |
| Model | Safe Completion Rate | Harmful Completion Rate | Refusal Rate | Normalized Safety Score | License |
|---|---|---|---|---|---|
| Claude-3.5-Sonnet-202406 | 21.2 | 7.6 | 57.7 | 55.0 | Proprietary |
| GPT-4o | 34.4 | 22.8 | 30.2 | 31.7 | Proprietary |
| GPT-4o-Mini | 18.4 | 14.0 | 36.5 | 35.7 | Proprietary |
| Qwen-2-VL-72B | 24.4 | 26.0 | 0.8 | 21.5 | Qwen License |
| llama-3.2-90b-Vision-Instruct | 8.4 | 11.2 | 14.0 | 34.0 | Llama License |
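
For readers who want to re-sort the five-model results listed above, here is a minimal Python sketch. The `Row` type, its field names, and the `rank_by` helper are hypothetical names introduced only for illustration; they are not part of any SafeArena code. The numbers are copied from the leaderboard rows, and the choice of which column to rank by is left to the reader, since the page does not say how the leaderboard itself is ordered.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One leaderboard entry (field names are illustrative, not official)."""
    model: str
    safe_completion: float
    harmful_completion: float
    refusal: float
    normalized_safety: float
    license_name: str


# Values copied from the five-model leaderboard table.
ROWS = [
    Row("Claude-3.5-Sonnet-202406", 21.2, 7.6, 57.7, 55.0, "Proprietary"),
    Row("GPT-4o", 34.4, 22.8, 30.2, 31.7, "Proprietary"),
    Row("GPT-4o-Mini", 18.4, 14.0, 36.5, 35.7, "Proprietary"),
    Row("Qwen-2-VL-72B", 24.4, 26.0, 0.8, 21.5, "Qwen License"),
    Row("llama-3.2-90b-Vision-Instruct", 8.4, 11.2, 14.0, 34.0, "Llama License"),
]


def rank_by(rows, field, descending=False):
    """Return the rows sorted by the named numeric column."""
    return sorted(rows, key=lambda r: getattr(r, field), reverse=descending)


if __name__ == "__main__":
    # Example: list models from lowest to highest Harmful Completion Rate.
    for r in rank_by(ROWS, "harmful_completion"):
        print(f"{r.model:32s} {r.harmful_completion:5.1f}")
```

Calling `rank_by(ROWS, "normalized_safety", descending=True)` instead orders the same rows by Normalized Safety Score, highest first.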