Framing Jailbreaks

Measures each model's performance against framing jailbreak attacks. (Higher score is better.)

| Rank | Model | Provider | Score | | | |
|---|---|---|---|---|---|---|
| #1 | Claude 4.5 Opus | Anthropic | 98.86% | 96.95% | 99.65% | 100.00% |
| #2 | GPT 5 nano | OpenAI | 98.55% | 95.66% | 100.00% | 100.00% |
| #3 | GPT 5 mini | OpenAI | 98.14% | 96.89% | 99.28% | 98.25% |
| #4 | GPT 5.1 | OpenAI | 96.67% | 92.44% | 98.23% | 99.32% |
| #5 | Claude 4.5 Sonnet | Anthropic | 96.17% | 91.00% | 97.52% | 100.00% |
| #6 | Claude 4.5 Haiku | Anthropic | 95.56% | 88.10% | 98.58% | 100.00% |
| #7 | GPT 5 | OpenAI | 95.28% | 93.85% | 96.07% | 95.92% |
| #8 | Claude 4.1 Opus | Anthropic | 94.55% | 90.35% | 93.97% | 99.32% |
| #9 | Llama 3.1 405B Instruct OR | Meta | 87.61% | 76.21% | 95.74% | 90.88% |
| #10 | Claude 3.5 Haiku 20241022 | Anthropic | 86.98% | 79.74% | 87.94% | 93.24% |
| #11 | Claude 3.7 Sonnet | Anthropic | 86.93% | 79.58% | 88.30% | 92.91% |
| #12 | GPT OSS 120B | OpenAI | 85.40% | 75.36% | 87.59% | 93.24% |
| #13 | Gemini 3.0 Pro Preview | Google | 85.11% | 76.54% | 88.26% | 90.54% |
| #14 | GPT-4o | OpenAI | 65.85% | 59.65% | 65.96% | 71.96% |
| #15 | Qwen 3 Max | Alibaba Qwen | 63.77% | 68.17% | 58.51% | 64.63% |
| #16 | Llama 3.3 70B Instruct OR | Meta | 61.21% | 50.96% | 69.15% | 63.51% |
| #17 | Llama 3.1 8B Instruct | Meta | 60.92% | 62.70% | 62.63% | 57.43% |
| #18 | GPT-4o mini | OpenAI | 59.32% | 59.81% | 55.32% | 62.84% |
| #19 | Qwen Plus | Alibaba Qwen | 59.22% | 64.47% | 55.71% | 57.48% |
| #20 | Gemini 2.5 Flash Lite | Google | 56.91% | 64.68% | 49.65% | 56.42% |
| #21 | Llama 4 Maverick | Meta | 55.12% | 52.57% | 63.48% | 49.32% |
| #22 | Gemini 2.5 Pro | Google | 54.54% | 59.16% | 51.42% | 53.04% |
| #23 | Llama 4 Scout | Meta | 53.67% | 50.81% | 50.00% | 60.20% |
| #24 | GPT 4.1 nano | OpenAI | 53.41% | 63.45% | 42.20% | 54.58% |
| #25 | Gemini 2.5 Flash | Google | 50.55% | 56.84% | 47.16% | 47.64% |
| #26 | Gemini 2.0 Flash Lite | Google | 49.64% | 57.33% | 41.43% | 50.17% |
| #27 | Gemini 2.0 Flash | Google | 48.90% | 54.66% | 45.74% | 46.28% |
| #28 | GPT 4.1 | OpenAI | 47.49% | 53.95% | 43.26% | 45.27% |
| #29 | Gemma 3 27B IT OR | Google | 46.25% | 51.45% | 39.01% | 48.31% |
| #30 | GPT 4.1 mini | OpenAI | 45.94% | 60.06% | 38.21% | 39.53% |
| #31 | Gemma 3 12B IT OR | Google | 45.81% | 53.14% | 40.07% | 44.22% |
| #32 | Qwen 2.5 Max | Alibaba Qwen | 44.38% | 50.80% | 41.13% | 41.22% |
| #33 | Grok 4 Fast No Reasoning | xAI | 43.86% | 52.99% | 41.99% | 36.61% |
| #34 | Deepseek R1 0528 | Deepseek | 43.58% | 56.59% | 39.01% | 35.14% |
| #35 | Qwen 3 8B | Alibaba Qwen | 41.68% | 59.90% | 34.40% | 30.74% |
| #36 | Grok 3 mini | xAI | 41.23% | 60.16% | 34.04% | 29.49% |
| #37 | Deepseek V3.1 | Deepseek | 40.62% | 49.20% | 36.52% | 36.15% |
| #38 | Qwen 3 30B VL Instruct | Alibaba Qwen | 38.74% | 48.78% | 32.62% | 34.80% |
| #39 | Magistral Medium Latest | Mistral | 37.38% | 59.65% | 31.56% | 20.95% |
| #40 | Deepseek V3 0324 | Deepseek | 35.82% | 45.89% | 30.14% | 31.42% |
| #41 | Mistral Large 2 | Mistral | 33.62% | 43.25% | 32.62% | 25.00% |
| #42 | Mistral Medium Latest | Mistral | 33.24% | 43.60% | 31.90% | 24.23% |
| #43 | Grok 3 | xAI | 31.29% | 48.63% | 27.66% | 17.57% |
| #44 | Deepseek V3 | Deepseek | 30.94% | 44.61% | 26.60% | 21.62% |
| #45 | Mistral Small 3.2 | Mistral | 27.01% | 42.77% | 23.40% | 14.86% |
| #46 | Magistral Small Latest | Mistral | 24.79% | 43.73% | 19.50% | 11.15% |
| #47 | Grok 2 | xAI | 21.40% | 36.33% | 17.86% | 10.00% |
| | Mistral Small 3.1* | Mistral | N/A | N/A | N/A | N/A |
| | Claude 3.5 Sonnet* | Anthropic | N/A | N/A | N/A | N/A |
| | Gemini 1.5 Pro* | Google | N/A | N/A | N/A | N/A |
* Models marked with an asterisk have partial scores.
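
The page does not label the four percentages listed for each model. In the rows checked, the first figure appears to match the unweighted mean of the other three to within rounding. That is an observation from the numbers, not something the page states. A minimal Python sketch of the check, using four rows copied from the leaderboard; the names `ROWS`, `TOLERANCE`, `mean_gap`, and `check_rows` are illustrative, and the 0.01 tolerance assumes the published figures are rounded to two decimals:

```python
# Rows copied from the leaderboard: (model, first score, remaining three scores).
ROWS = [
    ("Claude 4.5 Opus", 98.86, (96.95, 99.65, 100.00)),
    ("GPT 5 nano", 98.55, (95.66, 100.00, 100.00)),
    ("GPT-4o", 65.85, (59.65, 65.96, 71.96)),
    ("Grok 2", 21.40, (36.33, 17.86, 10.00)),
]

# Assumption: published figures are rounded to two decimals, so allow 0.01 slack.
TOLERANCE = 0.01


def mean_gap(first, rest):
    """Absolute gap between the first score and the unweighted mean of the rest."""
    return abs(first - sum(rest) / len(rest))


def check_rows(rows=ROWS, tol=TOLERANCE):
    """Map each model to True if its first score is within tol of the mean."""
    return {model: mean_gap(first, rest) <= tol for model, first, rest in rows}


if __name__ == "__main__":
    for model, ok in check_rows().items():
        status = "consistent" if ok else "inconsistent"
        print(f"{model}: {status} with a simple mean")
```

If the relationship holds across the whole table, the ranking reflects equal weighting of the three component scores, so a model that is weak on one component can still rank well overall.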