Framing Jailbreaks
Measures each model's resistance to framing jailbreak attacks. Higher scores are better.
| Rank | Model | Provider | Overall | Sub-score 1 | Sub-score 2 | Sub-score 3 |
|---|---|---|---|---|---|---|
| #1 | Claude 4.5 Opus | Anthropic | 98.86% | 96.95% | 99.65% | 100.00% |
| #2 | GPT 5 nano | OpenAI | 98.55% | 95.66% | 100.00% | 100.00% |
| #3 | GPT 5 mini | OpenAI | 98.14% | 96.89% | 99.28% | 98.25% |
| #4 | GPT 5.1 | OpenAI | 96.67% | 92.44% | 98.23% | 99.32% |
| #5 | Claude 4.5 Sonnet | Anthropic | 96.17% | 91.00% | 97.52% | 100.00% |
| #6 | Claude 4.5 Haiku | Anthropic | 95.56% | 88.10% | 98.58% | 100.00% |
| #7 | GPT 5 | OpenAI | 95.28% | 93.85% | 96.07% | 95.92% |
| #8 | Claude 4.1 Opus | Anthropic | 94.55% | 90.35% | 93.97% | 99.32% |
| #9 | Llama 3.1 405B Instruct OR | Meta | 87.61% | 76.21% | 95.74% | 90.88% |
| #10 | Claude 3.5 Haiku 20241022 | Anthropic | 86.98% | 79.74% | 87.94% | 93.24% |
| #11 | Claude 3.7 Sonnet | Anthropic | 86.93% | 79.58% | 88.30% | 92.91% |
| #12 | GPT OSS 120B | OpenAI | 85.40% | 75.36% | 87.59% | 93.24% |
| #13 | Gemini 3.0 Pro Preview | Google | 85.11% | 76.54% | 88.26% | 90.54% |
| #14 | GPT-4o | OpenAI | 65.85% | 59.65% | 65.96% | 71.96% |
| #15 | Qwen 3 Max | Alibaba Qwen | 63.77% | 68.17% | 58.51% | 64.63% |
| #16 | Llama 3.3 70B Instruct OR | Meta | 61.21% | 50.96% | 69.15% | 63.51% |
| #17 | Llama 3.1 8B Instruct | Meta | 60.92% | 62.70% | 62.63% | 57.43% |
| #18 | GPT-4o mini | OpenAI | 59.32% | 59.81% | 55.32% | 62.84% |
| #19 | Qwen Plus | Alibaba Qwen | 59.22% | 64.47% | 55.71% | 57.48% |
| #20 | Gemini 2.5 Flash Lite | Google | 56.91% | 64.68% | 49.65% | 56.42% |
| #21 | Llama 4 Maverick | Meta | 55.12% | 52.57% | 63.48% | 49.32% |
| #22 | Gemini 2.5 Pro | Google | 54.54% | 59.16% | 51.42% | 53.04% |
| #23 | Llama 4 Scout | Meta | 53.67% | 50.81% | 50.00% | 60.20% |
| #24 | GPT 4.1 nano | OpenAI | 53.41% | 63.45% | 42.20% | 54.58% |
| #25 | Gemini 2.5 Flash | Google | 50.55% | 56.84% | 47.16% | 47.64% |
| #26 | Gemini 2.0 Flash Lite | Google | 49.64% | 57.33% | 41.43% | 50.17% |
| #27 | Gemini 2.0 Flash | Google | 48.90% | 54.66% | 45.74% | 46.28% |
| #28 | GPT 4.1 | OpenAI | 47.49% | 53.95% | 43.26% | 45.27% |
| #29 | Gemma 3 27B IT OR | Google | 46.25% | 51.45% | 39.01% | 48.31% |
| #30 | GPT 4.1 mini | OpenAI | 45.94% | 60.06% | 38.21% | 39.53% |
| #31 | Gemma 3 12B IT OR | Google | 45.81% | 53.14% | 40.07% | 44.22% |
| #32 | Qwen 2.5 Max | Alibaba Qwen | 44.38% | 50.80% | 41.13% | 41.22% |
| #33 | Grok 4 Fast No Reasoning | xAI | 43.86% | 52.99% | 41.99% | 36.61% |
| #34 | Deepseek R1 0528 | Deepseek | 43.58% | 56.59% | 39.01% | 35.14% |
| #35 | Qwen 3 8B | Alibaba Qwen | 41.68% | 59.90% | 34.40% | 30.74% |
| #36 | Grok 3 mini | xAI | 41.23% | 60.16% | 34.04% | 29.49% |
| #37 | Deepseek V3.1 | Deepseek | 40.62% | 49.20% | 36.52% | 36.15% |
| #38 | Qwen 3 30B VL Instruct | Alibaba Qwen | 38.74% | 48.78% | 32.62% | 34.80% |
| #39 | Magistral Medium Latest | Mistral | 37.38% | 59.65% | 31.56% | 20.95% |
| #40 | Deepseek V3 0324 | Deepseek | 35.82% | 45.89% | 30.14% | 31.42% |
| #41 | Mistral Large 2 | Mistral | 33.62% | 43.25% | 32.62% | 25.00% |
| #42 | Mistral Medium Latest | Mistral | 33.24% | 43.60% | 31.90% | 24.23% |
| #43 | Grok 3 | xAI | 31.29% | 48.63% | 27.66% | 17.57% |
| #44 | Deepseek V3 | Deepseek | 30.94% | 44.61% | 26.60% | 21.62% |
| #45 | Mistral Small 3.2 | Mistral | 27.01% | 42.77% | 23.40% | 14.86% |
| #46 | Magistral Small Latest | Mistral | 24.79% | 43.73% | 19.50% | 11.15% |
| #47 | Grok 2 | xAI | 21.40% | 36.33% | 17.86% | 10.00% |
| | Mistral Small 3.1* | Mistral | N/A | N/A | N/A | N/A |
| | Claude 3.5 Sonnet* | Anthropic | N/A | N/A | N/A | N/A |
| | Gemini 1.5 Pro* | Google | N/A | N/A | N/A | N/A |

\* Models marked with an asterisk have only partial scores and are listed without a rank.
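The source does not say how the first score column relates to the three that follow it. In the rows sampled below, the first score matches the plain average of the other three to within about 0.01 percentage points. This is a minimal sketch of that check, not a documented definition of the metric. The names `ROWS`, `mean_gap`, and `max_gap` are illustrative, and the values are copied from the table.

```python
# Rows copied from the table: (model, first score, the three remaining scores).
ROWS = [
    ("Claude 4.5 Opus", 98.86, (96.95, 99.65, 100.00)),
    ("GPT 5.1", 96.67, (92.44, 98.23, 99.32)),
    ("GPT-4o", 65.85, (59.65, 65.96, 71.96)),
    ("Grok 2", 21.40, (36.33, 17.86, 10.00)),
]


def mean_gap(first, rest):
    """Absolute gap between the first score and the mean of the other scores."""
    return abs(first - sum(rest) / len(rest))


def max_gap(rows):
    """Largest gap across all supplied rows."""
    return max(mean_gap(first, rest) for _, first, rest in rows)


if __name__ == "__main__":
    for model, first, rest in ROWS:
        print(f"{model}: gap = {mean_gap(first, rest):.3f}")
```

The small residual gaps suggest the published figure may be computed from unrounded values or with some weighting, so the plain average should be read as an approximation.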