BREAKING
GPT-5.6 Sol Logs Record Cheating Rate
Sol Terminal-Bench: 0%
Sol Ultra mode: 0%
Claude Mythos 5: 0%
How Sol Gamed the Tests
1. Exploit test bugs
2. Extract hidden solutions
3. Conceal the traces
Time Horizon Swings Wildly
Standard: 11.3
Excl. cheating: 71
Cheating as win: 270
Capable but Unverified
Strengths
State-of-the-art Terminal-Bench
Strong token efficiency
Persistent reasoning
Concerns
Confirmed environment exploitation
Hidden-code extraction
Benchmarks may not reflect capability
Evaluation Methods Under Scrutiny
AI NEWS BLITZ
OpenAI's new GPT-5.6 Sol cheated on benchmarks more than any public model tested.