Prompt Injection

Measures each model's performance against known prompt injection attacks. (Higher score is better.)

Rank Model                       Provider
#1   Claude 4.5 Haiku            Anthropic     98.07%  97.86%  97.59%  98.75%
#2   Claude 4.1 Opus             Anthropic     97.75%  97.86%  97.89%  97.49%
#3   Claude 4.5 Sonnet           Anthropic     97.42%  97.33%  98.19%  96.74%
#4   Claude 4.5 Opus             Anthropic     97.05%  97.33%  97.59%  96.24%
#5   Claude 3.5 Haiku 20241022   Anthropic     93.40%  93.58%  91.87%  94.74%
#6   GPT 5 mini                  OpenAI        86.55%  90.22%  87.16%  82.28%
#7   Claude 3.7 Sonnet           Anthropic     86.35%  87.70%  86.14%  85.21%
#8   GPT 5 nano                  OpenAI        86.17%  90.37%  84.94%  83.21%
#9   GPT OSS 120B                OpenAI        86.00%  92.51%  84.04%  81.45%
#10  GPT 5.1                     OpenAI        85.19%  89.84%  84.04%  81.70%
#11  GPT 5                       OpenAI        81.30%  85.95%  79.82%  78.14%
#12  Llama 3.1 405B Instruct OR  Meta          79.06%  78.61%  84.64%  73.93%
#13  GPT 4.1 nano                OpenAI        75.85%  82.89%  70.48%  74.19%
#14  GPT-4o                      OpenAI        71.17%  75.40%  68.67%  69.42%
#15  GPT 4.1                     OpenAI        69.98%  79.68%  63.86%  66.42%
#16  Qwen Plus                   Alibaba Qwen  68.88%  79.21%  66.27%  61.15%
#17  Llama 4 Maverick            Meta          68.33%  61.50%  79.82%  63.66%
#18  Gemini 3.0 Pro Preview      Google        67.99%  77.01%  66.57%  60.40%
#19  GPT-4o mini                 OpenAI        67.28%  73.26%  63.86%  64.74%
#20  Qwen 2.5 Max                Alibaba Qwen  66.73%  77.53%  59.52%  63.16%
#21  Qwen 3 Max                  Alibaba Qwen  64.44%  72.47%  61.45%  59.40%
#22  GPT 4.1 mini                OpenAI        61.33%  69.35%  60.24%  54.39%
#23  Grok 4 Fast No Reasoning    xAI           60.31%  63.10%  59.94%  57.89%
#24  Llama 4 Scout               Meta          55.76%  52.94%  64.46%  49.87%
#25  Deepseek R1 0528            Deepseek      54.41%  47.59%  60.24%  55.39%
#26  Qwen 3 30B VL Instruct      Alibaba Qwen  53.93%  63.64%  50.90%  47.24%
#27  Gemini 2.5 Flash Lite       Google        53.18%  57.22%  52.71%  49.62%
#28  Deepseek V3.1               Deepseek      52.52%  48.66%  54.52%  54.39%
#29  Gemini 2.5 Flash            Google        50.14%  54.55%  51.51%  44.36%
#30  Llama 3.1 8B Instruct       Meta          49.57%  51.34%  53.78%  43.61%
#31  Gemini 2.5 Pro              Google        48.59%  54.01%  46.39%  45.36%
#32  Llama 3.3 70B Instruct OR   Meta          48.10%  40.64%  58.43%  45.23%
#33  Gemini 2.0 Flash            Google        46.15%  47.59%  45.76%  45.09%
#34  Gemini 2.0 Flash Lite       Google        46.13%  52.41%  39.88%  46.10%
#35  Gemma 3 12B IT OR           Google        45.86%  47.06%  42.77%  47.74%
#36  Deepseek V3 0324            Deepseek      44.98%  47.06%  45.78%  42.11%
#37  Qwen 3 8B                   Alibaba Qwen  44.33%  45.45%  43.94%  43.61%
#38  Gemma 3 27B IT OR           Google        41.97%  36.36%  48.19%  41.35%
#39  Magistral Medium Latest     Mistral       41.33%  37.43%  52.71%  33.83%
#40  Deepseek V3                 Deepseek      38.19%  39.57%  39.16%  35.84%
#41  Grok 2                      xAI           35.57%  35.14%  36.97%  34.60%
#42  Mistral Small 3.2           Mistral       34.20%  33.69%  34.34%  34.59%
#43  Grok 3                      xAI           32.99%  33.16%  35.24%  30.58%
#44  Mistral Medium Latest       Mistral       31.00%  28.34%  36.45%  28.21%
#45  Mistral Large 2             Mistral       30.58%  27.27%  36.14%  28.32%
#46  Grok 3 mini                 xAI           28.72%  24.60%  33.73%  27.82%
#47  Magistral Small Latest      Mistral       22.09%  17.11%  27.11%  22.06%
-    Mistral Small 3.1*          Mistral       N/A     N/A     N/A     N/A
-    Claude 3.5 Sonnet*          Anthropic     N/A     N/A     N/A     N/A
-    Gemini 1.5 Pro*             Google        N/A     N/A     N/A     N/A
* Models marked with an asterisk have partial scores.
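
The listing does not label the four percentages given for each model. In the ranked rows, though, the first value agrees with the arithmetic mean of the other three to within about 0.01 points, which suggests (but does not confirm) that it is an overall average. The check can be sketched in Python on a few rows copied from the list. The names `ROWS`, `mean_gap`, and `TOLERANCE` are illustrative, and the tolerance is an assumption chosen to absorb two-decimal rounding.

```python
# Rows copied from the leaderboard: (model, first score, remaining three scores).
ROWS = [
    ("Claude 4.5 Haiku", 98.07, [97.86, 97.59, 98.75]),
    ("GPT 5 mini", 86.55, [90.22, 87.16, 82.28]),
    ("GPT-4o", 71.17, [75.40, 68.67, 69.42]),
    ("Magistral Small Latest", 22.09, [17.11, 27.11, 22.06]),
]

# Assumption: published values are rounded to two decimals, so a recomputed
# mean may differ from the listed first score by a small amount.
TOLERANCE = 0.015


def mean_gap(first, rest):
    """Absolute gap between the first score and the mean of the remaining scores."""
    return abs(first - sum(rest) / len(rest))


for model, first, rest in ROWS:
    gap = mean_gap(first, rest)
    status = "matches" if gap <= TOLERANCE else "differs from"
    print(f"{model}: listed {first:.2f} {status} the mean of the other three")
```

If the first column is in fact an average, then the ranking weights the three remaining columns equally, so a model can rank well overall while being noticeably weaker on one of them.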