一句话摘要
R1 推理模型首次获得 Function Calling 和 JSON 输出能力,推理 benchmark 全面提升,使推理模型可在 Agent 场景中直接使用。
详细描述
deepseek-reasoner upgraded to DeepSeek-R1-0528. Major benchmark improvements: AIME 2025 70.0→87.5, GPQA 71.5→81.0, LCB_v6 63.5→73.3, Aider 57.0→71.6. New features: JSON Output support, Function Calling support. Tau-bench scores: 53.5 (Airline) / 63.9 (Retail). Also optimized front-end development and reduced hallucinations.
reasoner 模型升级至 R1-0528,AIME 2025 提升 17.5 分,新增 JSON 输出和 Function Calling 支持。Tau-bench:53.5(航空)/ 63.9(零售)。前端开发优化,幻觉显著降低。
原文摘录
deepseek-reasoner Model Upgraded to DeepSeek-R1-0528: Enhanced Reasoning Capabilities. JSON Output & Function Calling Support. Function call performance: Tau-bench score: 53.5 (Airline) / 63.9 (Retail).