Claude 3.7 Sonnet vs

Key metrics
Average score by task.
Language performance
Average score by language over all modules.
Module performance
Average score by module.