In-depth Analysis
Phare's in-depth analysis reveals detailed safety patterns across 50+ LLMs. This page lets you explore how safety varies with model characteristics (size, recency, reasoning capabilities, etc.) across the different modules.
Safety Trends
Model Recency
Recent models are generally safer, but the trend is not consistent across all submodules: misinformation, factuality, and encoding jailbreak resistance stagnate across model generations.
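As a rough illustration of how such a trend can be checked, here is a minimal sketch that rank-correlates release date with per-submodule safety scores. The file name and column names (`results.csv`, `release_date`, and the submodule columns) are assumptions for the example, not part of Phare.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical table: one row per model, with a release date and
# one safety score per submodule (all column names are assumptions).
df = pd.read_csv("results.csv", parse_dates=["release_date"])

for submodule in ["misinformation", "factuality", "encoding_jailbreak"]:
    # Spearman rank correlation: positive rho means newer models score higher;
    # rho near zero is consistent with the stagnation described above.
    rho, p = spearmanr(df["release_date"].astype("int64"), df[submodule])
    print(f"{submodule}: rho={rho:+.2f} (p={p:.3f})")
```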
Human Preference (LM Arena Elo)
More capable models are generally safer, with some notable exceptions in biases, hallucinations, and encoding jailbreak resistance.
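A simple way to quantify this relationship is a linear fit of safety score against Elo rating. The sketch below assumes hypothetical `lm_arena_elo` and aggregate `safety_score` columns in the same table as above.

```python
import pandas as pd
from scipy.stats import linregress

# Hypothetical columns: one Elo rating and one aggregate safety score per model.
df = pd.read_csv("results.csv")

# Least-squares fit: slope estimates safety points gained per Elo point.
res = linregress(df["lm_arena_elo"], df["safety_score"])
print(f"slope={res.slope:.4f}, r={res.rvalue:.2f}, p={res.pvalue:.3f}")
```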
Comparative Safety Distribution
Language
Clear differences between languages are observed in only a few modules: factuality, misinformation, and harmful misguidance.
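One way to test whether a module shows real cross-language differences is a Kruskal-Wallis test over the per-language score distributions. This sketch assumes a hypothetical long-format table (`per_language_scores.csv` with `language`, `module`, and `score` columns), which is not Phare's actual data layout.

```python
import pandas as pd
from scipy.stats import kruskal

# Hypothetical long-format table: one row per (model, language, module) score.
df = pd.read_csv("per_language_scores.csv")

for module, grp in df.groupby("module"):
    # One sample of scores per language; Kruskal-Wallis tests whether
    # at least one language's distribution differs from the others.
    samples = [g["score"].to_numpy() for _, g in grp.groupby("language")]
    stat, p = kruskal(*samples)
    print(f"{module}: H={stat:.1f}, p={p:.4f}")
```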
Reasoning Capabilities
Overall, reasoning models are not significantly safer than non-reasoning models. However, we see clear improvements on certain submodules such as Debunking and Harmful Misguidance.
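A comparison like this is typically backed by a two-sample test. Below is a minimal sketch using a Mann-Whitney U test, with an assumed boolean `is_reasoning` flag and aggregate `safety_score` column; the actual test and data schema used by Phare may differ.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical columns: a boolean reasoning flag and an aggregate safety score.
df = pd.read_csv("results.csv")
reasoning = df.loc[df["is_reasoning"], "safety_score"]
baseline = df.loc[~df["is_reasoning"], "safety_score"]

# Two-sided rank test: a large p-value is consistent with
# "not significantly safer" at the aggregate level.
stat, p = mannwhitneyu(reasoning, baseline, alternative="two-sided")
print(f"U={stat:.0f}, p={p:.3f}")
```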
Model Size
Larger models are generally safer, with the notable exception of encoding jailbreak resistance: because larger models are more capable of handling the complex decoding tasks these attacks require, they are more likely to comply with encoded harmful requests.
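This sign flip can be made visible by correlating model size with the aggregate score and with the encoding jailbreak submodule separately. The `param_count` column is again an assumption; Spearman is rank-based, so no log transform of the parameter count is needed.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical columns: parameter count plus per-submodule safety scores.
df = pd.read_csv("results.csv")

for col in ["safety_score", "encoding_jailbreak"]:
    # Expect positive rho overall, but a negative rho for encoding jailbreaks
    # if bigger models decode (and comply with) the obfuscated payloads.
    rho, p = spearmanr(df["param_count"], df[col])
    print(f"param_count vs {col}: rho={rho:+.2f} (p={p:.3f})")
```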