Jailbreaks

Measures each model's performance against jailbreak attacks. (Higher scores are better.)

| Rank | Model | Provider | Score | | | |
|---|---|---|---|---|---|---|
| #1 | Llama 3.1 405B | Meta | 83.97% | 76.78% | 93.29% | 81.83% |
| #2 | Claude 3.5 Sonnet | Anthropic | 63.76% | 66.43% | 62.04% | 62.81% |
| #3 | GPT-4o | OpenAI | 62.95% | 63.83% | 63.04% | 61.97% |
| #4 | Claude 3.5 Haiku | Anthropic | 59.11% | 63.63% | 55.43% | 58.26% |
| #5 | Claude 3.7 Sonnet | Anthropic | 56.43% | 55.52% | 54.59% | 59.19% |
| #6 | GPT-4o mini | OpenAI | 55.78% | 56.54% | 57.86% | 52.96% |
| #7 | Llama 3.3 70B | Meta | 54.08% | 45.86% | 63.14% | 53.25% |
| #8 | Qwen 2.5 Max | Alibaba Qwen | 47.80% | 48.39% | 46.13% | 48.89% |
| #9 | Llama 4 Maverick | Meta | 47.02% | 40.62% | 52.71% | 47.73% |
| #10 | Gemini 2.0 Flash | Google | 41.65% | 41.53% | 43.16% | 40.26% |
| #11 | Gemma 3 27B | Google | 39.71% | 38.79% | 38.90% | 41.44% |
| #12 | Gemini 1.5 Pro | Google | 39.53% | 41.14% | 39.80% | 37.66% |
| #13 | Mistral Large | Mistral | 38.06% | 33.16% | 41.56% | 39.46% |
| #14 | Mistral Small 3.1 24B | Mistral | 34.91% | 34.07% | 37.52% | 33.15% |
| #15 | Deepseek V3 (0324) | Deepseek | 34.25% | 33.29% | 35.20% | 34.27% |
| #16 | Deepseek V3 | Deepseek | 31.96% | 31.93% | 32.73% | 31.21% |
| #17 | Grok 2 | xAI | 27.32% | 26.44% | 29.28% | 26.24% |
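
The page does not label the four percentages listed for each model. Judging from the numbers alone, the first value in each row appears to match the arithmetic mean of the three that follow, to within rounding. That is an observation about the data, not something the source states, and the helper names below (`ROWS`, `mean_gap`, `max_gap`) are made up for this sketch. A minimal Python check over the rows above might look like this:

```python
# Rows copied from the leaderboard above:
# (model, first listed value, the three values that follow it).
ROWS = [
    ("Llama 3.1 405B", 83.97, (76.78, 93.29, 81.83)),
    ("Claude 3.5 Sonnet", 63.76, (66.43, 62.04, 62.81)),
    ("GPT-4o", 62.95, (63.83, 63.04, 61.97)),
    ("Claude 3.5 Haiku", 59.11, (63.63, 55.43, 58.26)),
    ("Claude 3.7 Sonnet", 56.43, (55.52, 54.59, 59.19)),
    ("GPT-4o mini", 55.78, (56.54, 57.86, 52.96)),
    ("Llama 3.3 70B", 54.08, (45.86, 63.14, 53.25)),
    ("Qwen 2.5 Max", 47.80, (48.39, 46.13, 48.89)),
    ("Llama 4 Maverick", 47.02, (40.62, 52.71, 47.73)),
    ("Gemini 2.0 Flash", 41.65, (41.53, 43.16, 40.26)),
    ("Gemma 3 27B", 39.71, (38.79, 38.90, 41.44)),
    ("Gemini 1.5 Pro", 39.53, (41.14, 39.80, 37.66)),
    ("Mistral Large", 38.06, (33.16, 41.56, 39.46)),
    ("Mistral Small 3.1 24B", 34.91, (34.07, 37.52, 33.15)),
    ("Deepseek V3 (0324)", 34.25, (33.29, 35.20, 34.27)),
    ("Deepseek V3", 31.96, (31.93, 32.73, 31.21)),
    ("Grok 2", 27.32, (26.44, 29.28, 26.24)),
]


def mean_gap(first, rest):
    """Absolute difference between the first value and the mean of the rest."""
    return abs(first - sum(rest) / len(rest))


def max_gap(rows):
    """Largest gap across all rows, in percentage points."""
    return max(mean_gap(first, rest) for _, first, rest in rows)


if __name__ == "__main__":
    # Print the listed value next to the recomputed mean for each model.
    for model, first, rest in ROWS:
        mean = sum(rest) / len(rest)
        print(f"{model:<24} listed={first:6.2f}  mean={mean:6.2f}")
```

If the assumption holds, the gap for every row stays below one hundredth of a percentage point, which is what rounding to two decimals would produce. What the three component values measure remains unknown from this page.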