RAT: 思考与响应的解耦

RAT(Retrieval Augmented Thinking) 是由EverArt创始人 Pietro Schirano提出的一种混合LLM生成范式, 其主要思想是利用DeepSeek-R1的推理能力来指导其他LLM进行结构化思考, 从而生成更加准确的结果。

RAT的想法来源于对DeepSeek-R1 API特性的诗意破解: 在调用 deepseek-reasoner (即DeepSeek-R1) 模型时, 将最终响应token长度设置为1,即可使得推理与最终响应分离。

1
2
3
4
5
6
7
self.deepseek_messages.append({"role": "user", "content": user_input})
response = self.deepseek_client.chat.completions.create(
model=DEEPSEEK_MODEL,
max_tokens=1,
messages=self.deepseek_messages,
stream=True
)

RAT的实现非常简单: 获取 R1 的思考过程, 然后将其拼接到响应LLM的提示词中。

1
2
3
4
combined_prompt = (
f"<question>{user_input}</question>\n\n"
f"<thinking>{reasoning}</thinking>\n\n"
)

对于 Claude Sonnet, 由于其API 支持消息预填技术 (Message Pre-filling), 可以将思考过程作为Assistant消息发送, 让模型误以为是自己在思考。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
user_message = {
"role": "user",
"content": [
{
"type": "text",
"text": user_input
}
]
}

assistant_prefill = {
"role": "assistant",
"content": [
{
"type": "text",
"text": f"<thinking>{reasoning}</thinking>"
}
]
}

messages = [user_message, assistant_prefill]

据Pietro在X上称, 任何LLM在经过 R1 推理增强后, 效果都大大提升。

From my experience, it makes any LLM perform way better, even the old GPT-4. I know someone ran evaluations on Aider, and this process, using Sonnet as the second LLM, absolutely crushed the benchmarks.

据DeepSeek-R1中宣称, R1虽然在推理能力上相比 DeepSeek-V3 有了大幅提升, 但在函数调用、结构化输出、角色扮演等方面能力有所退化。 通过将思考与响应分离, 可以发挥思考与响应模型各自的优势, 从而得到从内容和格式上都更加准确的结果。

当然, 口说无凭。 真实效果如何, 需要更加客观、严谨的基准测试。

R1 + Sonnet: 代码领域的模型协作

Aider 是一款基于命令行的AI结对编程工具。1月24日, Aider在其官方博客中宣称,其通过 DeepSeek-R1 和 Claude Sonnet 的模型组合刷新了 polyglot 代码能力基准测试的SOTA 成绩。 (详见 R1+Sonnet set SOTA on aider’s polyglot benchmark)

在Aider的实验中, R1 被作为 Architect 模型, 用于生成解决问题的路径, 这是推理模型擅长的区域。 Sonnet则被作为Editor 模型,进行具体的代码编辑并应用。

R1+Sonnet在 aider polyglot 测试中得到了64%的成绩, 高于 OpenAI O1的 61.7%, 而单独使用R1和Sonnet的成绩分别是56.9%和51.6%。而与前SOTA O1相比, R1 + Sonnet的计算成本节省了14倍。

文章中也提到, O1与Sonnet的叠加并没有带来正向提升, 同时, 将Editor模型切换成其他模型也没有带来更多的提升。 与对相对的是, 早期推理模型, 如o1-preview 和 o1-mini, 在与很多Editor模型进行搭配时都有效果上的提升。

此外, 文章也指出, 通过将 thinking tokens 注入 Editor 模型的方式实际上效果更差。

To be clear, the results above are not using R1’s thinking tokens, just the normal final output. R1 is configured in aider’s standard architect role with Sonnet as editor. The benchmark results that used the thinking tokens appear to be worse than the architect/editor results shared here.

要在 aider 中指定R1为 Architect模型以及 sonnet作为Editor模型, 可以通过以下方式启动aider:

1
2
3
4
export DEEPSEEK_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>

aider --architect --model r1 --editor-model sonnet

或者在配置文件~/.aider.conf.yml中增加以下配置后直接通过aider命令启动:

1
2
3
chat-mode: architect 
model: r1
editor-model: sonnet

Aider博客中提到的这种 R1 + Sonnet 协作方式也在另一款热门的AI编程工具 Cline上得到了推崇。 Cline官方号在其X上写道:

1
2
3
Sometimes the best patterns emerge from usage, not design.

"R1's reasoning is incredible for planning, then I switch to Sonnet for the actual coding. It's not about picking sides - each model has its strengths."

这种结合R1强大的推理能力和低廉成本, 同时又不丢失Claude 出色的代码编辑能力的组合方式,可以预见会迅速成为AI编程领域的标配。(不知Cursor 何时跟进:))

DeepSeek-R1 提示词模板

PPFO: 基于XML的R1提示词模板

PPFO(Purpose, Planning, Format, Output) 是 @cj_zZZz提出的一套应用于DeepSeek-R1的提示词框架, 它将提示词拆分成目标、计划、格式和输出四个部分, 并以XML格式组织。 提供尽可能多的细节 , 同时避免在目标与输出部分使用项目列表符号。

示例如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
PPFO Framework for Deepseek r1

<purpose>
You are an expert full - stack NextJS developer specializing in building scalable, performant, and maintainable web applications. Your expertise includes server - side rendering (SSR), static site generation (SSG), incremental static regeneration (ISR), and API route optimization. You prioritize clean, idiomatic code and adhere to Next.js best practices, ensuring seamless integration between frontend and backend components. Your goal is to deliver solutions that are not only functional but also optimized for performance, SEO, and user experience.
</purpose>

<planning_rules>
- Create a 4 - step plan for each task (e.g., setup, implementation, testing, deployment).
- Display the current step clearly.
- Ask for clarification on ambiguous requirements.
- Optimize for NextJS best practices (e.g., SSR, ISR, API routes).
</planning_rules>

<format_rules>
- Use code blocks for components, API routes, and configuration.
- Split long code into logical sections (e.g., frontend, backend, config).
- Create artifacts for file - level tasks (e.g., `page.tsx`, `api/route.ts`).
- Keep responses brief but complete.
</format_rules>

<output>
Create responses following these rules. Focus on scalable, performant solutions while maintaining a concise, helpful style.
</output>

Python 代码生成模板

由 @pyquantnews 提出。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
<context>

You are an expert programming AI assistant who prioritizes minimalist, efficient code. You plan before coding, write idiomatic solutions, seek clarification when needed, and accept user preferences even if suboptimal.

</context>

<planning_rules>

Create 3-step numbered plans before coding
Display current plan step clearly
Ask for clarification on ambiguity
Optimize for minimal code and overhead
</planning_rules>

<format_rules>

Use code blocks for simple tasks
Split long code into sections
Create artifacts for file-level tasks
Keep responses brief but complete
</format_rules>

OUTPUT: Create responses following these rules. Focus on minimal, efficient solutions while maintaining a helpful, concise style.