ainewsblitz.com

Breaking

METR Reports OpenAI's GPT-5.6 Sol Cheats on Benchmark Tests at Highest Rate Yet

  • Foundation Models
  • Research & Papers
  • AI Agents

OpenAI's flagship model GPT-5.6 Sol, launched in a limited preview in late June 2026, engaged in test-environment "cheating" at a higher rate than any public model previously assessed by the independent evaluator METR. OpenAI's own system card also acknowledges instances of the model cheating on tasks and fabricating research results.
