Harmful Misguidance

Measures the model's ability to recognize potentially dangerous situations and provide appropriate warnings when users describe risky behaviors, rather than normalizing or reinforcing harmful actions. (A higher score is better.)

Rank  Model                       Provider      Scores
#1    Claude 4.5 Haiku            Anthropic      99.93%  100.00%  100.00%   99.79%
#2    Claude 4.5 Sonnet           Anthropic      99.05%   99.81%   98.17%   99.16%
#3    GPT 5 mini                  OpenAI         98.29%   97.76%   98.17%   98.95%
#4    Claude 4.5 Opus             Anthropic      98.25%   99.25%   96.75%   98.73%
#5    GPT 5 nano                  OpenAI         97.41%   99.25%   96.35%   96.62%
#6    GPT 5                       OpenAI         96.97%   98.13%   96.35%   96.41%
#7    GPT 5.1                     OpenAI         96.92%   97.95%   95.33%   97.47%
#8    Gemini 1.5 Pro              Google         96.84%   97.39%   96.11%   97.04%
#9    Claude 4.1 Opus             Anthropic      96.31%   97.01%   96.55%   95.36%
#10   Claude 3.7 Sonnet           Anthropic      95.52%   97.00%   95.51%   94.06%
#11   Qwen 3 Max                  Alibaba Qwen   95.40%   97.39%   94.73%   94.09%
#12   Claude 3.5 Sonnet           Anthropic      95.40%   97.39%   95.13%   93.67%
#13   Claude 3.5 Haiku 20241022   Anthropic      95.36%   96.64%   94.73%   94.73%
#14   Deepseek R1 0528            Deepseek       95.15%   97.20%   93.51%   94.73%
#15   Deepseek V3.1               Deepseek       94.43%   96.27%   92.09%   94.94%
#16   Gemini 2.0 Flash            Google         94.30%   94.03%   92.70%   96.18%
#17   Qwen Plus                   Alibaba Qwen   94.14%   95.90%   93.71%   92.83%
#18   GPT OSS 120B                OpenAI         93.75%   97.57%   91.28%   92.41%
#19   Gemini 2.5 Flash            Google         93.66%   95.71%   93.91%   91.35%
#20   Gemini 3.0 Pro Preview      Google         93.50%   94.59%   93.51%   92.41%
#21   Deepseek V3 0324            Deepseek       92.80%   94.57%   91.89%   91.93%
#22   GPT-4o                      OpenAI         92.66%   95.15%   91.48%   91.35%
#23   Gemma 3 12B IT OR           Google         92.65%   96.46%   87.83%   93.67%
#24   Mistral Medium Latest       Mistral        92.32%   93.28%   91.08%   92.62%
#25   GPT 4.1                     OpenAI         92.30%   95.71%   90.47%   90.72%
#26   Gemini 2.5 Pro              Google         92.18%   95.34%   90.06%   91.14%
#27   Grok 2                      xAI            91.44%   93.10%   89.86%   91.35%
#28   Gemma 3 27B IT OR           Google         91.36%   96.64%   87.80%   89.64%
#29   Mistral Small 3.1           Mistral        90.91%   94.03%   88.44%   90.27%
#30   Grok 3 mini                 xAI            90.47%   92.91%   89.25%   89.24%
#31   Qwen 2.5 Max                Alibaba Qwen   89.89%   92.16%   86.35%   91.14%
#32   Grok 3                      xAI            89.68%   92.16%   87.22%   89.66%
#33   Mistral Large 2             Mistral        89.38%   93.10%   85.60%   89.45%
#34   Llama 4 Maverick            Meta           89.25%   85.26%   89.86%   92.62%
#35   Deepseek V3                 Deepseek       89.00%   90.11%   86.82%   90.08%
#36   Mistral Small 3.2           Mistral        87.87%   90.67%   86.00%   86.92%
#37   Qwen 3 8B                   Alibaba Qwen   87.37%   89.18%   85.60%   87.34%
#38   Llama 3.1 405B Instruct OR  Meta           86.49%   85.58%   84.90%   89.01%
#39   Llama 3.3 70B Instruct OR   Meta           86.04%   83.96%   85.77%   88.40%
#40   Gemini 2.0 Flash Lite       Google         85.14%   86.89%   81.92%   86.60%
#41   Magistral Medium Latest     Mistral        84.52%   89.37%   82.96%   81.22%
#42   GPT 4.1 mini                OpenAI         83.39%   86.01%   82.93%   81.22%
#43   Llama 3.1 8B Instruct       Meta           83.06%   86.84%   81.74%   80.59%
#44   Qwen 3 30B VL Instruct      Alibaba Qwen   81.76%   92.35%   74.44%   78.48%
#45   Grok 4 Fast No Reasoning    xAI            81.34%   84.14%   79.72%   80.17%
#46   Llama 4 Scout               Meta           81.04%   77.61%   84.69%   80.80%
#47   Gemini 2.5 Flash Lite       Google         79.15%   83.96%   75.66%   77.85%
#48   GPT-4o mini                 OpenAI         77.29%   84.89%   75.25%   71.73%
#49   Magistral Small Latest      Mistral        76.23%   75.75%   79.11%   73.84%
#50   GPT 4.1 nano                OpenAI         72.54%   73.32%   72.56%   71.73%
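
Each entry above lists four percentages, and the labels for those four columns are not preserved here. In the rows checked, the first figure matches the mean of the other three to within rounding, which suggests it may be an overall score averaged over three sub-scores. That reading is an inference from the numbers, not something the leaderboard states. A minimal Python sketch of the check, using four rows copied from the listing (the names `ROWS`, `mean_gap`, and `consistent` are hypothetical helpers introduced only for this illustration):

```python
from statistics import mean

# (model, first listed score, remaining three scores), copied from the listing.
# Treating the first score as an average of the rest is an assumption.
ROWS = [
    ("Claude 4.5 Haiku", 99.93, (100.00, 100.00, 99.79)),
    ("GPT 5 mini", 98.29, (97.76, 98.17, 98.95)),
    ("Llama 4 Scout", 81.04, (77.61, 84.69, 80.80)),
    ("GPT 4.1 nano", 72.54, (73.32, 72.56, 71.73)),
]


def mean_gap(first, rest):
    """Absolute difference between the first score and the mean of the rest."""
    return abs(first - mean(rest))


def consistent(rows, tolerance=0.02):
    """True if every row's first score is within `tolerance` of the mean."""
    return all(mean_gap(first, rest) <= tolerance for _, first, rest in rows)


if __name__ == "__main__":
    for model, first, rest in ROWS:
        print(f"{model}: listed {first:.2f}, mean of rest {mean(rest):.2f}")
    print("consistent with a simple mean:", consistent(ROWS))
```

The tolerance of 0.02 percentage points allows for the published figures having been rounded to two decimals before any averaging could be redone from them.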