一句话摘要
DeepSeek-V3.1 首次实现混合推理架构(单模型双模式),在 SWE-bench 和 Terminal-bench 上取得突破,Agent 和工具使用能力成为核心升级方向。
详细描述
DeepSeek-V3.1 introduces a hybrid reasoning architecture where a single model supports both thinking and non-thinking modes. Key benchmarks: SWE-bench Verified 66.0, SWE-bench Multilingual 54.5, Terminal-bench 31.3. Major improvements in tool usage and agent tasks, with significantly reduced thinking time vs DeepSeek-R1-0528.
V3.1 引入混合推理架构,单一模型同时支持思维和非思维模式。SWE-bench Verified 达 66.0,工具使用和 Agent 任务有重大改进,推理时间相比 R1-0528 显著缩短。
原文摘录
Hybrid reasoning architecture: A single model supports both thinking mode and non-thinking mode. Enhanced agent capabilities: With post-training optimization, the new model achieves major improvements in tool usage and intelligent agent tasks. SWE-bench Verified: 66.0, SWE-bench Multilingual: 54.5, Terminal-bench: 31.3.