Wordle Bench

Evaluating LLMs on their Wordle-solving capabilities

Leaderboard

Rank  Model                          Provider    Success Rate  Avg Guesses  Avg Cost
1     gpt-5                          openai      99%           3.59         $0.20
2     claude-sonnet-4.6              anthropic   98%           3.7          $0.14
3     gpt-5.4                        openai      98%           3.57         $0.09
4     claude-opus-4.6                anthropic   95%           3.58         $0.14
5     gpt-5.1                        openai      95%           3.6          $0.13
6     grok-4.20-beta                 x-ai        95%           3.68         $0.07
7     gemini-3-pro-preview           google      94%           3.53         $0.14
8     gpt-5-mini                     openai      94%           3.62         $0.04
9     claude-opus-4.5                anthropic   93%           3.62         $0.29
10    gemini-3.1-flash-lite-preview  google      93%           3.69         $0.01
11    gpt-5.4-mini                   openai      89%           3.72         $0.05
12    grok-4.1-fast                  x-ai        89%           3.73         $0.01
13    claude-sonnet-4.5              anthropic   87%           3.63         $0.08
14    gemma-4-31b-it                 google      87%           3.56         $0.00
15    gpt-5.2                        openai      87%           3.68         $0.03
16    kimi-k2.5                      moonshotai  86%           3.59         $0.05
17    gpt-5.4-nano                   openai      86%           3.72         $0.01
18    qwen3-max-thinking             qwen        86%           3.74         $0.24
19    gpt-5-nano                     openai      85%           3.69         $0.01
20    mimo-v2-flash                  xiaomi      85%           3.65         $0.00
21    qwen3-next-80b-a3b-thinking    qwen        83%           3.69         $0.04
22    mimo-v2-pro                    xiaomi      81%           3.77         $0.03
23    claude-haiku-4.5               anthropic   80%           3.71         $0.08
24    gemini-3.1-pro-preview         google      75%           3.64         $0.19
25    glm-5-turbo                    z-ai        75%           3.73         $0.03
26    gemini-2.5-pro                 google      72%           3.6          $0.16
27    gpt-oss-120b                   openai      72%           3.74         $0.01
28    gemini-2.5-flash               google      63%           3.83         $0.03
29    qwen3.5-397b-a17b              qwen        63%           3.63         $0.08
30    glm-5                          z-ai        63%           3.59         $0.06
31    mistral-small-2603             mistralai   61%           3.98         $0.01
32    nemotron-3-nano-30b-a3b        nvidia      58%           3.66         $0.01
33    gpt-oss-20b                    openai      55%           3.85         $0.01
34    gemini-3-flash-preview         google      52%           3.46         $0.02
35    qwen3.5-122b-a10b              qwen        52%           3.56         $0.14
36    qwen3.5-27b                    qwen        51%           3.61         $0.06
37    gemma-4-26b-a4b-it             google      49%           3.41         $0.01
38    glm-5.1                        z-ai        48%           3.5          $0.08
39    glm-4.7                        z-ai        46%           3.67         $0.07
40    minimax-m2.5                   minimax     42%           4.24         $0.01
41    qwen3.5-35b-a3b                qwen        38%           3.79         $0.04
42    minimax-m2.7                   minimax     22%           4.14         $0.01
43    mistral-large-2512             mistralai   10%           3.7          $0.00
44    glm-4.7-flash                  z-ai        7%            4.0          $0.01
45    trinity-large-thinking         arcee-ai    3%            2.67         $0.04

Hardest words to solve

Rank Word
1 WAFER
2 PIPER
3 EXPEL
4 EAGER
5 BOOZY
6 CATCH
7 MUMMY
8 PROXY
9 GULLY
10 GOUGE
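The list does not explain why these words are hard. One shared trait: eight of the ten (all except WAFER and PROXY) contain a repeated letter, and repeated letters are a common source of confusion when reading Wordle feedback. Below is a minimal sketch of the usual two-pass scoring rule (greens first, then yellows limited by how many copies of each letter remain). The `score_guess` name and the `G`/`Y`/`.` encoding are assumptions; the page does not say how the benchmark reports feedback to models.

```python
from collections import Counter


def score_guess(guess: str, target: str) -> str:
    """Return per-letter feedback for a guess against the target word.

    'G' = right letter, right position; 'Y' = letter is in the word but
    elsewhere; '.' = letter absent, or all its copies are already accounted for.
    """
    guess, target = guess.upper(), target.upper()
    if len(guess) != len(target):
        raise ValueError("guess and target must have the same length")

    feedback = ["."] * len(guess)
    unmatched = Counter()  # target letters not used by an exact match

    # Pass 1: exact matches are marked green and consume their target letter.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            feedback[i] = "G"
        else:
            unmatched[t] += 1

    # Pass 2: remaining guess letters turn yellow only while unmatched
    # copies of that letter are left in the target.
    for i, g in enumerate(guess):
        if feedback[i] == "." and unmatched[g] > 0:
            feedback[i] = "Y"
            unmatched[g] -= 1

    return "".join(feedback)
```

For example, `score_guess("EERIE", "EAGER")` returns `"GYY.."`: the first E is green, the second is yellow, and the third gets no credit because EAGER has only two Es. Likewise `score_guess("PAPPY", "PIPER")` returns `"G.G.."`, since both Ps in PIPER are already matched exactly.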

Models with the most errors

Rank Model
1 trinity-large-thinking
2 glm-4.7-flash
3 gemma-4-26b-a4b-it
4 gemini-3-flash-preview
5 glm-5.1
6 glm-4.7
7 nemotron-3-nano-30b-a3b
8 gpt-oss-20b
9 qwen3.5-35b-a3b
10 gpt-oss-120b