一句话摘要
V3-0324 在推理 benchmark 上全面提升,修复了此前 V3 版本的 Function Calling 问题,中文写作对齐 R1 风格。
详细描述
deepseek-chat upgraded to DeepSeek-V3-0324. Benchmark improvements: MMLU-Pro 75.9→81.2, GPQA 59.1→68.4, AIME 39.6→59.4, LiveCodeBench 39.2→49.2. Improved front-end web development, Chinese writing aligned with R1 style, enhanced Function Calling accuracy.
deepseek-chat 升级至 V3-0324,MMLU-Pro 提升至 81.2,AIME 提升 19.8 分。前端开发、中文写作对齐 R1 风格,Function Calling 准确性提高。
原文摘录
MMLU-Pro: 75.9 → 81.2 (+5.3), GPQA: 59.1 → 68.4 (+9.3), AIME: 39.6 → 59.4 (+19.8), LiveCodeBench: 39.2 → 49.2 (+10.0). Function Calling Improvements: Increased accuracy in Function Calling, fixing issues from previous V3 versions.