GPT-4o vs

Key metrics
Performance by task. A higher score indicates better performance on that task.
Language performance
Average performance by language, across all modules.
Module performance
Average performance by module.