Jailbreaks
Measures how well each model resists jailbreak attacks. (Higher score is better.)
Rank | Model | Provider | Overall | | | |
---|---|---|---|---|---|---|
#1 | Llama 3.1 405B | Meta | 83.97% | 76.78% | 93.29% | 81.83% |
#2 | Claude 3.5 Sonnet | Anthropic | 63.76% | 66.43% | 62.04% | 62.81% |
#3 | GPT-4o | OpenAI | 62.95% | 63.83% | 63.04% | 61.97% |
#4 | Claude 3.5 Haiku | Anthropic | 59.11% | 63.63% | 55.43% | 58.26% |
#5 | Claude 3.7 Sonnet | Anthropic | 56.43% | 55.52% | 54.59% | 59.19% |
#6 | GPT-4o mini | OpenAI | 55.78% | 56.54% | 57.86% | 52.96% |
#7 | Llama 3.3 70B | Meta | 54.08% | 45.86% | 63.14% | 53.25% |
#8 | Qwen 2.5 Max | Alibaba Qwen | 47.80% | 48.39% | 46.13% | 48.89% |
#9 | Llama 4 Maverick | Meta | 47.02% | 40.62% | 52.71% | 47.73% |
#10 | Gemini 2.0 Flash | Google | 41.65% | 41.53% | 43.16% | 40.26% |
#11 | Gemma 3 27B | Google | 39.71% | 38.79% | 38.90% | 41.44% |
#12 | Gemini 1.5 Pro | Google | 39.53% | 41.14% | 39.80% | 37.66% |
#13 | Mistral Large | Mistral | 38.06% | 33.16% | 41.56% | 39.46% |
#14 | Mistral Small 3.1 24B | Mistral | 34.91% | 34.07% | 37.52% | 33.15% |
#15 | DeepSeek V3 (0324) | DeepSeek | 34.25% | 33.29% | 35.20% | 34.27% |
#16 | DeepSeek V3 | DeepSeek | 31.96% | 31.93% | 32.73% | 31.21% |
#17 | Grok 2 | xAI | 27.32% | 26.44% | 29.28% | 26.24% |
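
The score columns are unlabeled in the source, but the numbers suggest that the first score column is the arithmetic mean of the three columns to its right, and that the ranking follows that first column. The sketch below checks this reading against the values copied from the table. The names `ROWS`, `mean_of_subscores`, `max_deviation`, and `is_ranked_descending` are illustrative only and do not come from the benchmark; what the three sub-score columns represent is not stated here and is left unnamed.

```python
# Values copied from the table above: (model, first score column, remaining three columns).
# All names in this snippet are hypothetical helpers, not part of any benchmark tooling.
ROWS = [
    ("Llama 3.1 405B", 83.97, [76.78, 93.29, 81.83]),
    ("Claude 3.5 Sonnet", 63.76, [66.43, 62.04, 62.81]),
    ("GPT-4o", 62.95, [63.83, 63.04, 61.97]),
    ("Claude 3.5 Haiku", 59.11, [63.63, 55.43, 58.26]),
    ("Claude 3.7 Sonnet", 56.43, [55.52, 54.59, 59.19]),
    ("GPT-4o mini", 55.78, [56.54, 57.86, 52.96]),
    ("Llama 3.3 70B", 54.08, [45.86, 63.14, 53.25]),
    ("Qwen 2.5 Max", 47.80, [48.39, 46.13, 48.89]),
    ("Llama 4 Maverick", 47.02, [40.62, 52.71, 47.73]),
    ("Gemini 2.0 Flash", 41.65, [41.53, 43.16, 40.26]),
    ("Gemma 3 27B", 39.71, [38.79, 38.90, 41.44]),
    ("Gemini 1.5 Pro", 39.53, [41.14, 39.80, 37.66]),
    ("Mistral Large", 38.06, [33.16, 41.56, 39.46]),
    ("Mistral Small 3.1 24B", 34.91, [34.07, 37.52, 33.15]),
    ("Deepseek V3 (0324)", 34.25, [33.29, 35.20, 34.27]),
    ("Deepseek V3", 31.96, [31.93, 32.73, 31.21]),
    ("Grok 2", 27.32, [26.44, 29.28, 26.24]),
]


def mean_of_subscores(subscores):
    """Arithmetic mean of the three unlabeled sub-score columns."""
    return sum(subscores) / len(subscores)


def max_deviation(rows):
    """Largest absolute gap between the listed first column and the computed mean."""
    return max(abs(first - mean_of_subscores(subs)) for _, first, subs in rows)


def is_ranked_descending(rows):
    """True if the rows are ordered by the first score column, highest first."""
    firsts = [first for _, first, _ in rows]
    return firsts == sorted(firsts, reverse=True)


if __name__ == "__main__":
    for model, first, subs in ROWS:
        print(f"{model}: listed {first:.2f}%, mean of sub-scores {mean_of_subscores(subs):.2f}%")
    print("Largest deviation:", round(max_deviation(ROWS), 4))
    print("Ranked by first column:", is_ranked_descending(ROWS))
```

Every row agrees with the computed mean to within 0.01 percentage points, which is consistent with the published figures being rounded to two decimals. This supports the averaging interpretation but does not confirm how the benchmark itself aggregates scores.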