In-depth Analysis

Phare's in-depth analysis reveals detailed patterns across 50+ LLMs. This page lets you explore how safety varies with model characteristics (size, reasoning capabilities, recency, etc.) across the different modules.

Safety Trends

Model Recency

[Scatter plot: overall safety score vs. model release date, colored by provider (Alibaba, Anthropic, Deepseek, Google, Meta, Mistral, OpenAI, xAI); trend line R²=0.03]

Recent models are generally safer, but the trend is not consistent across all submodules: misinformation, factuality, and encoding jailbreak resistance stagnate across model generations.
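To make the trend-line numbers concrete, here is a minimal sketch of how such a fit and its R² could be computed from (release date, safety score) pairs; the dates, scores, and variable names are hypothetical placeholders, not Phare's actual data or pipeline.

```python
# Hedged sketch: ordinary least-squares trend line over hypothetical data.
import numpy as np
from scipy import stats

# Hypothetical release dates (as ordinal day numbers) and overall safety scores.
release_day = np.array([738000, 738200, 738420, 738600, 738810, 739000])
safety_score = np.array([0.71, 0.74, 0.72, 0.78, 0.75, 0.80])

# Least-squares fit; rvalue**2 is the R² shown alongside the trend line.
fit = stats.linregress(release_day, safety_score)
print(f"slope per day: {fit.slope:.2e}, R²: {fit.rvalue ** 2:.2f}")
```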

Human Preference (LM Arena ELO)

[Scatter plot: overall safety score vs. LM Arena ELO, colored by provider; trend line R²=0.25]

More capable models are generally safer, with some notable exceptions in biases, hallucinations, and encoding jailbreak resistance.

Comparative Safety Distribution

Language

[Distribution of safety scores by language, per provider]

Clear differences between languages are observed only in a few modules: factuality, misinformation, and harmful misguidance.
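As an illustration of how a per-language, per-module comparison might be tabulated, here is a minimal pandas sketch; the dataframe layout, column names, and values are hypothetical and do not reflect Phare's actual schema or results.

```python
import pandas as pd

# Hypothetical per-evaluation results: one row per (module, language) score.
results = pd.DataFrame({
    "module":   ["factuality", "factuality", "misinformation", "misinformation"],
    "language": ["en", "fr", "en", "fr"],
    "score":    [0.82, 0.74, 0.69, 0.61],
})

# Median score per module and language; large gaps across a row point to
# modules where the evaluation language matters.
print(results.pivot_table(index="module", columns="language",
                          values="score", aggfunc="median"))
```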

Reasoning Capabilities

[Distribution of safety scores for reasoning vs. non-reasoning models, per provider]

Overall, reasoning models show no statistically significant safety advantage over non-reasoning models. However, we do see clear improvements on certain submodules such as Debunking and Harmful Misguidance.
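A claim of no statistically significant difference implies a significance test over the two groups' score distributions. Below is a minimal sketch of one such check using a Mann-Whitney U test; the scores are hypothetical placeholders, and the actual statistical procedure behind this finding may differ.

```python
from scipy.stats import mannwhitneyu

# Hypothetical overall safety scores for the two groups of models.
reasoning_scores = [0.79, 0.82, 0.76, 0.84, 0.80]
non_reasoning_scores = [0.77, 0.81, 0.74, 0.83, 0.78]

# One-sided test: are reasoning models' scores stochastically greater?
stat, p_value = mannwhitneyu(reasoning_scores, non_reasoning_scores, alternative="greater")
print(f"U = {stat:.1f}, p = {p_value:.3f}")  # p > 0.05 here, so no significant difference
```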

Model Size

[Distribution of safety scores by model size, per provider]

Larger models are generally safer, with the notable exception of encoding jailbreak resistance: larger models are more capable of handling the complex decoding tasks these attacks require, which makes them more susceptible to this style of jailbreak.