BREAKING
OpenAI Launches GeneBench-Pro
129 research-level problems
Top models still score low
GPT-5.6 Sol Pro: 31.5
GPT-5.6 Sol high: 28.7
GPT-5.5 Pro: 33.2
Gemini 3.1 Pro: 11.2
Mimics a researcher's workflow
1. Clean data
2. Explore
3. Pick model
4. Decide
Human hours vs AI dollars
Human experts
20 to 40 hours per problem
Judgment-heavy analysis
AI inference
Costs only a few dollars
Noticing vs acting gap
Close, but not quite there
AI NEWS BLITZ
OpenAI has unveiled GeneBench-Pro, a benchmark that tests AI models on messy biological data.