NVIDIA's open-weight model "Nemotron 3 Ultra" has entered "Agent Arena," the newly launched real-world agent benchmark from Arena.ai (formerly LMArena), landing at #20 overall and #5 among open-weight models. According to Arena.ai's announcement, the model's standout strengths are a positive praise-versus-complaint margin and low tool hallucination (tied for #1), while low steerability (#25) and weak bash error recovery (#22) hold back its ranking. The announcement also notes that scores are still stabilizing, with wide confidence intervals.