Harmful misguidance
Measures the model's ability to recognize potentially dangerous situations and to provide appropriate warnings when users describe risky behaviors, rather than normalizing or reinforcing harmful actions. (Higher score is better.)
Rank | Model | Provider | Avg. score | Subscore 1 | Subscore 2 | Subscore 3 |
---|---|---|---|---|---|---|
#1 | Gemini 1.5 Pro | Google | 96.84% | 97.39% | 96.11% | 97.04% |
#2 | Claude 3.7 Sonnet | Anthropic | 95.52% | 97.00% | 95.51% | 94.06% |
#3 | Claude 3.5 Sonnet | Anthropic | 95.40% | 97.39% | 95.13% | 93.67% |
#4 | Claude 3.5 Haiku | Anthropic | 95.36% | 96.64% | 94.73% | 94.73% |
#5 | Gemini 2.0 Flash | Google | 94.30% | 94.03% | 92.70% | 96.18% |
#6 | GPT-4o | OpenAI | 92.66% | 95.15% | 91.48% | 91.35% |
#7 | Gemma 3 27B | Google | 91.36% | 96.64% | 87.80% | 89.64% |
#8 | Mistral Small 3.1 24B | Mistral | 90.91% | 94.03% | 88.44% | 90.27% |
#9 | Qwen 2.5 Max | Alibaba Qwen | 89.89% | 92.16% | 86.35% | 91.14% |
#10 | Mistral Large | Mistral | 89.38% | 93.10% | 85.60% | 89.45% |
#11 | Deepseek V3 | Deepseek | 89.00% | 90.11% | 86.82% | 90.08% |
#12 | Llama 3.1 405B | Meta | 86.49% | 85.58% | 84.90% | 89.01% |
#13 | Llama 3.3 70B | Meta | 86.04% | 83.96% | 85.77% | 88.40% |
#14 | GPT-4o mini | OpenAI | 77.29% | 84.89% | 75.25% | 71.73% |
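The first percentage column appears to be the arithmetic mean of the three sub-score columns: recomputing it from the displayed values reproduces the listed figure on every row, to within the ±0.01 slack introduced by rounding each sub-score to two decimals. A minimal sketch of that sanity check (row values copied from the table; the column interpretation is an inference from the arithmetic, not a documented fact):

```python
# Sanity check: the "Avg. score" column should equal the mean of the three
# sub-scores, up to the rounding of the displayed two-decimal values.
rows = {
    "Gemini 1.5 Pro": (96.84, (97.39, 96.11, 97.04)),
    "Claude 3.7 Sonnet": (95.52, (97.00, 95.51, 94.06)),
    "GPT-4o mini": (77.29, (84.89, 75.25, 71.73)),
}

for model, (listed_avg, subscores) in rows.items():
    mean = sum(subscores) / len(subscores)
    # Tolerance of 0.01 absorbs rounding error in the displayed sub-scores.
    assert abs(mean - listed_avg) <= 0.01, (model, round(mean, 4))
```

The 0.01 tolerance is needed because some rows (e.g. Gemini 1.5 Pro) round the other way when the mean is recomputed from the two-decimal sub-scores, which indicates the published average was computed from unrounded underlying values.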