ainewsblitz.com

Anthropic Examines the Barriers for AI Agents in Biology, Urges Database Redesign

On June 8, 2026, Anthropic published a Science Blog post titled "Paving the way for agents in biology," analyzing why AI agents have not advanced as rapidly in biology as in coding. The piece argues that the structure of biological databases is itself a fundamental bottleneck for agents and calls for new infrastructure to make data retrieval reliable.

Author Laura Luebbert and colleagues likened biological databases to "cities built before cars (agents) existed." Non-standard formats, scattered databases, web UIs designed around manual clicking, and ambiguous metadata make these resources extremely difficult for agents that autonomously chain tools together. By contrast, the software world (GitHub, APIs, package managers) was built to be operated programmatically from the start, which let coding agents like Claude Code move ahead first. That, the authors suggest, explains the gap in pace between coding and biology.

To test this hypothesis, the team built a benchmark called VirBench, comprising 120 queries across 40 pathogens. A typical task asked for "sequences for ZEBOV (Zaire ebolavirus, TaxID 3052462), human-derived, from Africa, dated on or after January 1, 2014, at least 15,200 bases long, with no more than 1,900 ambiguous characters," the kind of complex, conditional retrieval that arises in real research. State-of-the-art science agents, including Claude Sonnet 4 and Opus 4.7, Stanford's Biomni, Edison Analysis, and GPT-5.2-pro/5.5, were tasked with pulling viral sequences from the NCBI Virus database.
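To make the shape of such a query concrete, here is a minimal Python sketch that expresses the quoted ZEBOV conditions as a structured filter over in-memory records. It is an illustration only, not VirBench's code or the gget virus interface: the `SequenceRecord` and `VirusQuery` types, their field names, the mapping of "human-derived" to the host string "Homo sapiens", and the rule that any character other than A, C, G, T, or U counts as ambiguous are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical record layout; real database fields will differ."""
    accession: str
    taxid: int
    host: str
    region: str
    collection_date: Optional[date]
    sequence: str


@dataclass(frozen=True)
class VirusQuery:
    """Hypothetical structured form of one retrieval task."""
    taxid: int
    host: str
    region: str
    min_collection_date: date
    min_length: int
    max_ambiguous: int


def count_ambiguous(sequence: str) -> int:
    # Assumption: anything that is not A, C, G, T, or U is "ambiguous".
    return sum(1 for base in sequence.upper() if base not in "ACGTU")


def matches(record: SequenceRecord, query: VirusQuery) -> bool:
    # Every condition must hold; records with no collection date are excluded.
    return (
        record.taxid == query.taxid
        and record.host.lower() == query.host.lower()
        and record.region.lower() == query.region.lower()
        and record.collection_date is not None
        and record.collection_date >= query.min_collection_date
        and len(record.sequence) >= query.min_length
        and count_ambiguous(record.sequence) <= query.max_ambiguous
    )


def retrieve(records: Iterable[SequenceRecord], query: VirusQuery) -> list[str]:
    # Sorting the accessions makes the output independent of input order.
    return sorted(r.accession for r in records if matches(r, query))


# The conditions quoted in the article, written as a structured query.
ZEBOV_QUERY = VirusQuery(
    taxid=3052462,
    host="Homo sapiens",  # assumption: how "human-derived" is encoded
    region="Africa",
    min_collection_date=date(2014, 1, 1),
    min_length=15_200,
    max_ambiguous=1_900,
)
```

Once the conditions are written down as explicit fields, the same inputs always yield the same sorted list of accessions. An agent that clicks through a web UI or improvises query parameters has no such guarantee.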

The results showed that even cutting-edge models struggled with consistency and accuracy. Average accuracy ranged widely, from 16.9% at the low end to 91.3% at best. More troubling was the lack of reproducibility: running the exact same query three times yielded wildly different numbers of sequences: 106, then 15, then 5.

| Metric | Agent alone | With gget virus |
| --- | --- | --- |
| Accuracy range | 16.9%–91.3% | Near 100% |
| Reproducibility of identical query | 106 → 15 → 5, highly variable | Stable |

This variability is more than a technical nuisance. The post offers examples of how unstable retrieval directly distorts downstream analysis: estimates of the time to most recent common ancestor (TMRCA) in phylogenetic analysis drift far off (to 1922 or to April, in the examples given), and therapeutic epitope analysis is affected as well. In settings such as outbreak response and drug design, where accuracy can be a matter of life and death, such instability can be fatal.

As a solution, Anthropic proposed adding a "deterministic retrieval layer" such as gget virus behind the agent. By inserting a deterministic tool that returns the same result every time, accuracy reportedly improved to nearly 100%. The post argues that biological databases themselves should be "redesigned so agents can use them."
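One simple way to check whether a retrieval step "returns the same result every time" is to run it several times and compare the sets of accessions it returns. The Python sketch below does this. The function names and the choice of mean pairwise Jaccard similarity as a score are assumptions for illustration, not the metric reported in the post, and the accession strings in the usage note are placeholders.

```python
from itertools import combinations
from typing import Callable, Iterable, Sequence


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap between two result sets: 1.0 if identical, 0.0 if disjoint."""
    if not a and not b:
        return 1.0  # two empty results agree with each other
    return len(a & b) / len(a | b)


def reproducibility_score(runs: Sequence[Iterable[str]]) -> float:
    """Mean pairwise Jaccard similarity across repeated runs of one query."""
    sets = [frozenset(run) for run in runs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0  # a single run has nothing to disagree with
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def is_reproducible(retrieve: Callable[[], Iterable[str]], repeats: int = 3) -> bool:
    """Call a zero-argument retrieval function repeatedly and compare results."""
    results = [frozenset(retrieve()) for _ in range(repeats)]
    return all(result == results[0] for result in results)
```

For example, three runs that return `{"A", "B", "C"}`, `{"A", "B"}`, and `{"A"}` score 0.5 under this measure, while three identical runs score 1.0. A deterministic layer should pass `is_reproducible` by construction, which is what makes it a useful check to run against an agent's own retrieval path.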

The analysis builds on Anthropic's work since the October 2025 launch of "Claude for Life Sciences." The company has strengthened the ability to query databases directly via MCP servers and expanded partnerships and case studies with Manifold (the Terra platform), 10x Genomics, the Broad Institute, Stanford, Axiom Bio, Schrödinger, and the Allen Institute. In practice, Biomni reportedly analyzed 450 wearable-data datasets in 35 minutes (equivalent to three weeks of human work) and discovered a novel transcription factor from gene-activity data on 336,000 cells.

Among developers and researchers, natural-language single-cell and spatial analysis, along with direct database querying, correlation analysis, and toxicity prediction, are seen as delivering "new scale and efficiency." At the same time, caution persists: biology still requires lab-based verification before a result can be considered settled, and replacing bench work itself remains difficult. What this analysis brings into focus is a structural challenge that precedes model capability: biology's data foundations are not optimized for the age of agents, and how that infrastructure is built may well shape the future of science AI.