LLM Performance Leaderboard
Interactive comparison of large language models across multiple benchmarks
| Rank | Model Name | Organization | License | Arena Elo Score | Votes |
|---|---|---|---|---|---|
| 1 | Grok-3-Preview-02-24 | xAI | Proprietary | 1412 | 3,364 |
| 2 | GPT-4.5-Preview | OpenAI | Proprietary | 1411 | 3,242 |
| 3 | Gemini-2.0-Flash-Thinking-Exp-01-21 | Google | Proprietary | 1384 | 17,487 |
| 4 | Gemini-2.0-Pro-Exp-02-05 | Google | Proprietary | 1380 | 15,466 |
| 5 | ChatGPT-4o-latest (2025-01-29) | OpenAI | Proprietary | 1377 | 17,221 |
| 6 | DeepSeek-R1 | DeepSeek | MIT | 1363 | 8,580 |
| 7 | Gemini-2.0-Flash-001 | Google | Proprietary | 1357 | 13,257 |
| 8 | o1-2024-12-17 | OpenAI | Proprietary | 1352 | 19,785 |
| 9 | Qwen2.5-Max | Alibaba | Proprietary | 1336 | 11,930 |
| 10 | o3-mini-high | OpenAI | Proprietary | 1329 | 9,102 |
| 11 | DeepSeek-V3 | DeepSeek | DeepSeek | 1318 | 22,007 |
| 12 | GLM-4-Plus-0111 | Zhipu | Proprietary | 1311 | 6,035 |
| 13 | Qwen-Plus-0125 | Alibaba | Proprietary | 1310 | 6,054 |
| 14 | Claude 3.7 Sonnet | Anthropic | Proprietary | 1309 | 4,254 |
| 15 | Gemini-2.0-Flash-Lite-Preview-02-05 | Google | Proprietary | 1308 | 12,774 |
| 16 | Step-2-16K-Exp | StepFun | Proprietary | 1305 | 5,132 |
| 17 | o1-mini | OpenAI | Proprietary | 1304 | 54,923 |
| 18 | o3-mini | OpenAI | Proprietary | 1304 | 15,463 |
| 19 | Gemini-1.5-Pro-002 | Google | Proprietary | 1302 | 57,551 |
| 20 | Grok-2-08-13 | xAI | Proprietary | 1288 | 67,038 |
| 21 | Yi-Lightning | 01.AI | Proprietary | 1287 | 28,946 |
| 22 | Claude 3.5 Sonnet (20241022) | Anthropic | Proprietary | 1284 | 59,139 |
| 23 | Deepseek-v2.5-1210 | DeepSeek | DeepSeek | 1279 | 7,247 |
| 24 | Athene-v2-Chat-72B | Nexusflow | Athene V2 | 1275 | 26,092 |
| 25 | GPT-4o-mini-2024-07-18 | OpenAI | Proprietary | 1272 | 66,710 |
| 26 | Hunyuan-Large-2025-02-10 | Tencent | Proprietary | 1271 | 3,860 |
| 27 | Gemini-1.5-Flash-002 | Google | Proprietary | 1271 | 36,979 |
| 28 | Llama-3.1-405B-Instruct-bf16 | Meta | Llama 3.1 | 1269 | 34,228 |
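
To give a rough sense of what the score gaps in the table mean, the sketch below converts a difference in Arena Elo score into an expected head-to-head preference rate. It assumes the standard Elo logistic formula with a 400-point scale. The page does not state how its scores are computed, so that is an assumption, and the function name `expected_win_probability` is hypothetical.

```python
def expected_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Chance that model A is preferred over model B.

    Assumes the standard Elo logistic model with a 400-point scale;
    the leaderboard itself does not confirm this convention.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores copied from the table above.
grok_3_preview = 1412   # rank 1
gpt_45_preview = 1411   # rank 2
llama_31_405b = 1269    # rank 28

# A one-point gap is close to a coin flip.
print(f"{expected_win_probability(grok_3_preview, gpt_45_preview):.3f}")

# A 143-point gap gives the higher-rated model a clear edge.
print(f"{expected_win_probability(grok_3_preview, llama_31_405b):.3f}")
```

Under this assumption, adjacent ranks separated by only a few points are hard to tell apart, especially for models with few votes.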
Data aggregated from multiple benchmark sources • Last updated: March 2025