Model Evaluation on 棱镜空间

Recent content in Model Evaluation on 棱镜空间: https://pengjiyuan.github.io/tags/%E6%A8%A1%E5%9E%8B%E8%AF%84%E4%BC%B0/

GLM-5.1 Deep Dive: An Open-Source Model Breaks 1,700 Steps of Autonomous Execution for the First Time, Completing Complex Tasks Independently over 8 Hours
https://pengjiyuan.github.io/articles/glm-5-1-long-horizon-agent-2026/
Thu, 09 Apr 2026

Z.ai's GLM-5.1, built on a 754-billion-parameter MoE architecture, is the first open-source model to achieve 1,700 consecutive tool calls and 8 hours of autonomous work. This article takes a close look at its "staircase optimization" technical approach, the key metrics on which it surpasses Opus 4.6 on SWE-Bench Pro, and its far-reaching implications for the open-source agent ecosystem.