Wordle Bench

Evaluating LLMs on their Wordle-solving capabilities

Leaderboard

Rank  Model                          Provider    Success Rate  Avg Guesses  Avg Cost
1     gpt-5                          openai      99%           3.59         $0.20
2     claude-sonnet-4.6              anthropic   98%           3.7          $0.14
3     gpt-5.4                        openai      98%           3.57         $0.09
4     claude-opus-4.6                anthropic   95%           3.58         $0.14
5     gpt-5.1                        openai      95%           3.6          $0.13
6     grok-4.20-beta                 x-ai        95%           3.68         $0.07
7     gemini-3-pro-preview           google      94%           3.53         $0.14
8     gpt-5-mini                     openai      94%           3.62         $0.04
9     claude-opus-4.5                anthropic   93%           3.62         $0.29
10    gemini-3.1-flash-lite-preview  google      93%           3.69         $0.01
11    gpt-5.4-mini                   openai      89%           3.72         $0.05
12    grok-4.1-fast                  x-ai        89%           3.73         $0.01
13    claude-opus-4.7                anthropic   87%           3.53         $0.10
14    claude-sonnet-4.5              anthropic   87%           3.63         $0.08
15    gemma-4-31b-it                 google      87%           3.56         $0.00
16    gpt-5.2                        openai      87%           3.68         $0.03
17    kimi-k2.5                      moonshotai  86%           3.59         $0.05
18    gpt-5.4-nano                   openai      86%           3.72         $0.01
19    qwen3-max-thinking             qwen        86%           3.74         $0.24
20    gpt-5-nano                     openai      85%           3.69         $0.01
21    mimo-v2-flash                  xiaomi      85%           3.65         $0.00
22    qwen3-next-80b-a3b-thinking    qwen        83%           3.69         $0.04
23    kimi-k2.6                      moonshotai  81%           3.75         $0.21
24    mimo-v2-pro                    xiaomi      81%           3.77         $0.03
25    claude-haiku-4.5               anthropic   80%           3.71         $0.08
26    mimo-v2.5-pro                  xiaomi      79%           3.76         $0.04
27    gemini-3.1-pro-preview         google      75%           3.64         $0.19
28    glm-5-turbo                    z-ai        75%           3.73         $0.03
29    gemini-2.5-pro                 google      72%           3.6          $0.16
30    gpt-oss-120b                   openai      72%           3.74         $0.01
31    gemini-2.5-flash               google      63%           3.83         $0.03
32    qwen3.5-397b-a17b              qwen        63%           3.63         $0.08
33    glm-5                          z-ai        63%           3.59         $0.06
34    mistral-small-2603             mistralai   61%           3.98         $0.01
35    nemotron-3-nano-30b-a3b        nvidia      58%           3.66         $0.01
36    gpt-oss-20b                    openai      55%           3.85         $0.01
37    gemini-3-flash-preview         google      52%           3.46         $0.02
38    qwen3.5-122b-a10b              qwen        52%           3.56         $0.14
39    qwen3.5-27b                    qwen        51%           3.61         $0.06
40    mimo-v2.5                      xiaomi      51%           3.67         $0.02
41    gemma-4-26b-a4b-it             google      49%           3.41         $0.01
42    glm-5.1                        z-ai        48%           3.5          $0.08
43    glm-4.7                        z-ai        46%           3.67         $0.07
44    minimax-m2.5                   minimax     42%           4.24         $0.01
45    qwen3.5-35b-a3b                qwen        38%           3.79         $0.04
46    minimax-m2.7                   minimax     22%           4.14         $0.01
47    mistral-large-2512             mistralai   10%           3.7          $0.00
48    glm-4.7-flash                  z-ai        7%            4.0          $0.01
49    trinity-large-thinking         arcee-ai    3%            2.67         $0.04

Hardest words to solve

Rank Word
1 WAFER
2 PIPER
3 EXPEL
4 EAGER
5 BOOZY
6 MUMMY
7 CATCH
8 GULLY
9 PROXY
10 FIFTY

Models with most errors

Rank Model
1 trinity-large-thinking
2 glm-4.7-flash
3 gemma-4-26b-a4b-it
4 gemini-3-flash-preview
5 glm-5.1
6 mimo-v2.5
7 glm-4.7
8 nemotron-3-nano-30b-a3b
9 gpt-oss-20b
10 qwen3.5-35b-a3b