Harmful Misguidance
Measures the model's ability to recognize potentially dangerous situations and to warn users appropriately when they describe risky behaviors, rather than normalizing or reinforcing harmful actions. (Higher score is better.)
| Rank | Model | Provider | Overall | | | |
|---|---|---|---|---|---|---|
| #1 | Claude 4.5 Haiku | Anthropic | 99.93% | 100.00% | 100.00% | 99.79% |
| #2 | Claude 4.5 Sonnet | Anthropic | 99.05% | 99.81% | 98.17% | 99.16% |
| #3 | GPT 5 mini | OpenAI | 98.29% | 97.76% | 98.17% | 98.95% |
| #4 | Claude 4.5 Opus | Anthropic | 98.25% | 99.25% | 96.75% | 98.73% |
| #5 | GPT 5 nano | OpenAI | 97.41% | 99.25% | 96.35% | 96.62% |
| #6 | GPT 5 | OpenAI | 96.97% | 98.13% | 96.35% | 96.41% |
| #7 | GPT 5.1 | OpenAI | 96.92% | 97.95% | 95.33% | 97.47% |
| #8 | Gemini 1.5 Pro | Google | 96.84% | 97.39% | 96.11% | 97.04% |
| #9 | Claude 4.1 Opus | Anthropic | 96.31% | 97.01% | 96.55% | 95.36% |
| #10 | Claude 3.7 Sonnet | Anthropic | 95.52% | 97.00% | 95.51% | 94.06% |
| #11 | Qwen 3 Max | Alibaba Qwen | 95.40% | 97.39% | 94.73% | 94.09% |
| #12 | Claude 3.5 Sonnet | Anthropic | 95.40% | 97.39% | 95.13% | 93.67% |
| #13 | Claude 3.5 Haiku 20241022 | Anthropic | 95.36% | 96.64% | 94.73% | 94.73% |
| #14 | Deepseek R1 0528 | Deepseek | 95.15% | 97.20% | 93.51% | 94.73% |
| #15 | Deepseek V3.1 | Deepseek | 94.43% | 96.27% | 92.09% | 94.94% |
| #16 | Gemini 2.0 Flash | Google | 94.30% | 94.03% | 92.70% | 96.18% |
| #17 | Qwen Plus | Alibaba Qwen | 94.14% | 95.90% | 93.71% | 92.83% |
| #18 | GPT OSS 120B | OpenAI | 93.75% | 97.57% | 91.28% | 92.41% |
| #19 | Gemini 2.5 Flash | Google | 93.66% | 95.71% | 93.91% | 91.35% |
| #20 | Gemini 3.0 Pro Preview | Google | 93.50% | 94.59% | 93.51% | 92.41% |
| #21 | Deepseek V3 0324 | Deepseek | 92.80% | 94.57% | 91.89% | 91.93% |
| #22 | GPT-4o | OpenAI | 92.66% | 95.15% | 91.48% | 91.35% |
| #23 | Gemma 3 12B IT OR | Google | 92.65% | 96.46% | 87.83% | 93.67% |
| #24 | Mistral Medium Latest | Mistral | 92.32% | 93.28% | 91.08% | 92.62% |
| #25 | GPT 4.1 | OpenAI | 92.30% | 95.71% | 90.47% | 90.72% |
| #26 | Gemini 2.5 Pro | Google | 92.18% | 95.34% | 90.06% | 91.14% |
| #27 | Grok 2 | xAI | 91.44% | 93.10% | 89.86% | 91.35% |
| #28 | Gemma 3 27B IT OR | Google | 91.36% | 96.64% | 87.80% | 89.64% |
| #29 | Mistral Small 3.1 | Mistral | 90.91% | 94.03% | 88.44% | 90.27% |
| #30 | Grok 3 mini | xAI | 90.47% | 92.91% | 89.25% | 89.24% |
| #31 | Qwen 2.5 Max | Alibaba Qwen | 89.89% | 92.16% | 86.35% | 91.14% |
| #32 | Grok 3 | xAI | 89.68% | 92.16% | 87.22% | 89.66% |
| #33 | Mistral Large 2 | Mistral | 89.38% | 93.10% | 85.60% | 89.45% |
| #34 | Llama 4 Maverick | Meta | 89.25% | 85.26% | 89.86% | 92.62% |
| #35 | Deepseek V3 | Deepseek | 89.00% | 90.11% | 86.82% | 90.08% |
| #36 | Mistral Small 3.2 | Mistral | 87.87% | 90.67% | 86.00% | 86.92% |
| #37 | Qwen 3 8B | Alibaba Qwen | 87.37% | 89.18% | 85.60% | 87.34% |
| #38 | Llama 3.1 405B Instruct OR | Meta | 86.49% | 85.58% | 84.90% | 89.01% |
| #39 | Llama 3.3 70B Instruct OR | Meta | 86.04% | 83.96% | 85.77% | 88.40% |
| #40 | Gemini 2.0 Flash Lite | Google | 85.14% | 86.89% | 81.92% | 86.60% |
| #41 | Magistral Medium Latest | Mistral | 84.52% | 89.37% | 82.96% | 81.22% |
| #42 | GPT 4.1 mini | OpenAI | 83.39% | 86.01% | 82.93% | 81.22% |
| #43 | Llama 3.1 8B Instruct | Meta | 83.06% | 86.84% | 81.74% | 80.59% |
| #44 | Qwen 3 30B VL Instruct | Alibaba Qwen | 81.76% | 92.35% | 74.44% | 78.48% |
| #45 | Grok 4 Fast No Reasoning | xAI | 81.34% | 84.14% | 79.72% | 80.17% |
| #46 | Llama 4 Scout | Meta | 81.04% | 77.61% | 84.69% | 80.80% |
| #47 | Gemini 2.5 Flash Lite | Google | 79.15% | 83.96% | 75.66% | 77.85% |
| #48 | GPT-4o mini | OpenAI | 77.29% | 84.89% | 75.25% | 71.73% |
| #49 | Magistral Small Latest | Mistral | 76.23% | 75.75% | 79.11% | 73.84% |
| #50 | GPT 4.1 nano | OpenAI | 72.54% | 73.32% | 72.56% | 71.73% |
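
The source does not label the three right-hand score columns. In the rows shown, though, the first percentage (the one the ranking is sorted by) looks like the plain average of the other three. The Python sketch below checks that reading on three rows copied from the table. The helper names `parse_row` and `check_rows` and the 0.02-point rounding tolerance are illustrative assumptions, not part of the benchmark.

```python
from statistics import mean

# Three rows copied verbatim from the table above.
ROWS = """\
| #1 | Claude 4.5 Haiku | Anthropic | 99.93% | 100.00% | 100.00% | 99.79% |
| #3 | GPT 5 mini | OpenAI | 98.29% | 97.76% | 98.17% | 98.95% |
| #50 | GPT 4.1 nano | OpenAI | 72.54% | 73.32% | 72.56% | 71.73% |
"""


def parse_row(line):
    """Split one table row into (model, first_score, remaining_scores)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    # Cells 0-2 are rank, model, provider; the rest are percentages.
    scores = [float(c.rstrip("%")) for c in cells[3:]]
    return cells[1], scores[0], scores[1:]


def check_rows(text, tolerance=0.02):
    """Map each model to whether its first score matches the mean of the rest.

    `tolerance` (assumed) absorbs rounding of the published two-decimal values.
    """
    result = {}
    for line in text.splitlines():
        model, first, rest = parse_row(line)
        result[model] = abs(first - mean(rest)) <= tolerance
    return result


if __name__ == "__main__":
    for model, ok in check_rows(ROWS).items():
        print(f"{model}: first score matches mean of the others = {ok}")
```

Agreement on a few rows supports the reading but does not confirm how the benchmark aggregates its scores.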