BREAKING
OpenAI Launches GeneBench-Pro
129 research-level problems
Top models still score low
GPT-5.6 Sol Pro: 31.5
GPT-5.6 Sol high: 28.7
GPT-5.5 Pro: 33.2
Gemini 3.1 Pro: 11.2
Mimics a researcher's workflow
1. Clean data
2. Explore
3. Pick model
4. Decide
Human hours vs AI dollars
Human experts
● 20 to 40 hours per problem
● Judgment-heavy analysis
AI inference
● Costs only a few dollars
● Noticing vs acting gap
Close, but not quite there
AI NEWS BLITZ
OpenAI has unveiled GeneBench-Pro, a benchmark that tests AI models on messy biology data.