Spotlighting for prompt shields (preview)

https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new ↗

对我们的启示

💡

可借鉴：间接注入攻击是 Agent 场景的重要安全威胁，Spotlighting 思路可借鉴，但技术方案可自行设计。

战略视角

Spotlighting 通过标记外部文档的信任级别来防御间接 prompt injection，这对 Agent 场景尤为重要——Agent 经常需要处理来自邮件、网页、上传文档等不受信来源的内容。相比直接拦截，Spotlighting 更温和且不影响正常功能。我们在 Agent Runtime 安全设计中可借鉴这种「标记信任边界」的思路，尤其是在 MCP Server 调用外部数据源的场景中。

一句话摘要

Prompt shields 新增 Spotlighting 功能，通过标记文档信任级别防御间接注入攻击。

详细描述

Spotlighting enhances protection against indirect prompt injection attacks by tagging input documents with special formatting to indicate lower trust to the model.

原文摘录

Spotlighting is a sub-feature of prompt shields that enhances protection against indirect (embedded document) attacks by tagging input documents with special formatting to indicate lower trust to the model.