AI News Blitz

BREAKING

OpenAI Launches GeneBench-Pro

129 research-level problems

Top models still score low

GPT-5.6 Sol Pro: 31.5

GPT-5.6 Sol high: 28.7

GPT-5.5 Pro: 33.2

Gemini 3.1 Pro: 11.2

Mimics a researcher's workflow

1. Clean data

2. Explore

3. Pick model

4. Decide

Human hours vs AI dollars

Human experts

● 20 to 40 hours per problem

● Judgment-heavy analysis

AI inference

● Costs only a few dollars

● Gap between noticing issues and acting on them

Close, but not quite there


OpenAI has unveiled GeneBench-Pro, a benchmark that tests AI models on analyzing messy biology data.