AI Competitive Intelligence System

One-sentence summary

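DeepSeek-V2-0628 posts large gains on math and reasoning benchmarks, and its Arena-Hard win rate against GPT-4-0314 rises from 41.6% to 68.3% (about 1.6x), making it the most significant capability jump of the V2 era.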

Detailed description

deepseek-chat upgraded to DeepSeek-V2-0628. HumanEval 79.88%→84.76%, MATH 55.02%→71.02%, BBH 78.56%→83.40%. Arena-Hard win rate vs GPT-4-0314 increased from 41.6% to 68.3%. Role-playing capabilities significantly enhanced.

In short: HumanEval rises to 84.76% and MATH to 71.02%, the Arena-Hard win rate against GPT-4-0314 climbs from 41.6% to 68.3%, and role-playing ability is markedly stronger.
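
To put the reported jumps on a common footing, the arithmetic can be sketched as below. This is only an illustrative calculation over the figures quoted in this entry; the names `reported`, `gains`, and `summary` are hypothetical helpers, not part of any DeepSeek tooling.

```python
# Scores as quoted in this entry, in percent: (before, after).
reported = {
    "HumanEval Pass@1": (79.88, 84.76),
    "MATH ACC@1": (55.02, 71.02),
    "BBH": (78.56, 83.40),
    "Arena-Hard win rate vs GPT-4-0314": (41.6, 68.3),
}


def gains(before, after):
    """Return (absolute gain in percentage points, ratio after/before)."""
    return round(after - before, 2), round(after / before, 2)


# Hypothetical summary table: benchmark name -> (points gained, ratio).
summary = {name: gains(b, a) for name, (b, a) in reported.items()}

for name, (points, ratio) in summary.items():
    print(f"{name}: +{points} points, {ratio}x")
```

Read this way, MATH gains 16.0 points and the Arena-Hard win rate gains 26.7 points, a ratio of roughly 1.64x, which is a large jump but short of a full doubling.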

Source excerpt

HumanEval Pass@1 79.88% -> 84.76%, MATH ACC@1 55.02% -> 71.02%, BBH 78.56% -> 83.40%. In the Arena-Hard evaluation, the win rate against GPT-4-0314 increased from 41.6% to 68.3%.

DeepSeek-V2-0628: improved reasoning and role-playing capabilities

Implications for us
