Evaluating LLMs on their Wordle-solving capabilities

| Rank | Model | Provider | Success Rate | Avg Guesses | Avg Cost |
|---|---|---|---|---|---|
| 1 | gpt-5 | openai | | 3.59 | $0.20 |
| 2 | claude-sonnet-4.6 | anthropic | | 3.70 | $0.14 |
| 3 | gpt-5.4 | openai | | 3.57 | $0.09 |
| 4 | claude-opus-4.6 | anthropic | | 3.58 | $0.14 |
| 5 | gpt-5.1 | openai | | 3.60 | $0.13 |
| 6 | grok-4.20-beta | x-ai | | 3.68 | $0.07 |
| 7 | gemini-3-pro-preview | google | | 3.53 | $0.14 |
| 8 | gpt-5-mini | openai | | 3.62 | $0.04 |
| 9 | claude-opus-4.5 | anthropic | | 3.62 | $0.29 |
| 10 | gemini-3.1-flash-lite-preview | google | | 3.69 | $0.01 |
| 11 | gpt-5.4-mini | openai | | 3.72 | $0.05 |
| 12 | grok-4.1-fast | x-ai | | 3.73 | $0.01 |
| 13 | claude-sonnet-4.5 | anthropic | | 3.63 | $0.08 |
| 14 | gemma-4-31b-it | google | | 3.56 | $0.00 |
| 15 | gpt-5.2 | openai | | 3.68 | $0.03 |
| 16 | kimi-k2.5 | moonshotai | | 3.59 | $0.05 |
| 17 | gpt-5.4-nano | openai | | 3.72 | $0.01 |
| 18 | qwen3-max-thinking | qwen | | 3.74 | $0.24 |
| 19 | gpt-5-nano | openai | | 3.69 | $0.01 |
| 20 | mimo-v2-flash | xiaomi | | 3.65 | $0.00 |
| 21 | qwen3-next-80b-a3b-thinking | qwen | | 3.69 | $0.04 |
| 22 | mimo-v2-pro | xiaomi | | 3.77 | $0.03 |
| 23 | claude-haiku-4.5 | anthropic | | 3.71 | $0.08 |
| 24 | gemini-3.1-pro-preview | google | | 3.64 | $0.19 |
| 25 | glm-5-turbo | z-ai | | 3.73 | $0.03 |
| 26 | gemini-2.5-pro | google | | 3.60 | $0.16 |
| 27 | gpt-oss-120b | openai | | 3.74 | $0.01 |
| 28 | gemini-2.5-flash | google | | 3.83 | $0.03 |
| 29 | qwen3.5-397b-a17b | qwen | | 3.63 | $0.08 |
| 30 | glm-5 | z-ai | | 3.59 | $0.06 |
| 31 | mistral-small-2603 | mistralai | | 3.98 | $0.01 |
| 32 | nemotron-3-nano-30b-a3b | nvidia | | 3.66 | $0.01 |
| 33 | gpt-oss-20b | openai | | 3.85 | $0.01 |
| 34 | gemini-3-flash-preview | google | | 3.46 | $0.02 |
| 35 | qwen3.5-122b-a10b | qwen | | 3.56 | $0.14 |
| 36 | qwen3.5-27b | qwen | | 3.61 | $0.06 |
| 37 | gemma-4-26b-a4b-it | google | | 3.41 | $0.01 |
| 38 | glm-5.1 | z-ai | | 3.50 | $0.08 |
| 39 | glm-4.7 | z-ai | | 3.67 | $0.07 |
| 40 | minimax-m2.5 | minimax | | 4.24 | $0.01 |
| 41 | qwen3.5-35b-a3b | qwen | | 3.79 | $0.04 |
| 42 | minimax-m2.7 | minimax | | 4.14 | $0.01 |
| 43 | mistral-large-2512 | mistralai | | 3.70 | $0.00 |
| 44 | glm-4.7-flash | z-ai | | 4.00 | $0.01 |
| 45 | trinity-large-thinking | arcee-ai | | 2.67 | $0.04 |

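The leaderboard doesn't spell out the scoring loop, but the underlying task is standard Wordle: after each guess, the model receives green/yellow/gray feedback per letter. As an illustrative sketch only (the function name and marker characters here are my own, not the harness actually used), the feedback rule, including the duplicate-letter subtlety, can be implemented as:

```python
from collections import Counter

def wordle_feedback(guess: str, answer: str) -> str:
    """Score a guess against the answer.
    'G' = right letter, right spot (green); 'Y' = right letter,
    wrong spot (yellow); '.' = no match (gray). Duplicates follow
    the real game's rule: greens are consumed first, then yellows
    are awarded left-to-right from the unmatched answer letters."""
    guess, answer = guess.upper(), answer.upper()
    result = ["."] * len(guess)
    remaining = Counter()
    # First pass: mark greens and pool the unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            remaining[a] += 1
    # Second pass: award yellows only while the pool has that letter.
    for i, g in enumerate(guess):
        if result[i] == "." and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)
```

The two-pass structure matters for words like MUMMY in the hard-word list below: a repeated letter in the guess earns at most as many green/yellow marks as it has occurrences in the answer.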
| Rank | Word |
|---|---|
| 1 | WAFER |
| 2 | PIPER |
| 3 | EXPEL |
| 4 | EAGER |
| 5 | BOOZY |
| 6 | CATCH |
| 7 | MUMMY |
| 8 | PROXY |
| 9 | GULLY |
| 10 | GOUGE |

| Rank | Model |
|---|---|
| 1 | trinity-large-thinking |
| 2 | glm-4.7-flash |
| 3 | gemma-4-26b-a4b-it |
| 4 | gemini-3-flash-preview |
| 5 | glm-5.1 |
| 6 | glm-4.7 |
| 7 | nemotron-3-nano-30b-a3b |
| 8 | gpt-oss-20b |
| 9 | qwen3.5-35b-a3b |
| 10 | gpt-oss-120b |