12 Progressive Sessions: Building Claude Code's Agent Harness from Scratch

2026-03-30
Claude Code Agent Harness Engineering Anthropic AI Development Tutorial

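Claude Code is Anthropic's official AI coding assistant. Its architecture is elegant and capable, yet the core idea is simple: the Model is the Agent, and the Code is just the Harness.

This article follows the shareAI-lab/learn-claude-code repository. Over 12 progressive coding Sessions it rebuilds Claude Code's core mechanisms from scratch, so you can see what Harness Engineering actually involves.

Core idea: an Agent is a trained Model, not hand-written code. Our job as engineers is not to "develop the Agent" but to "build the Harness": give the Agent the tools, knowledge, context, and permissions it needs to act effectively in a specific domain.


Why 12 Sessions?

Each Session focuses on one core mechanism, and the Sessions are ordered by increasing difficulty: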

Phase 1: THE LOOP                    Phase 2: PLANNING & KNOWLEDGE
==================                   =============================
s01  The Agent Loop          [1]     s03  TodoWrite               [5]
     while + stop_reason                  TodoManager + nag reminder
     |                                    |
     +-> s02  Tool Use            [4]     s04  Subagents            [5]
              dispatch map: name->handler     fresh messages[] per child
                                              |
                                         s05  Skills               [5]
                                              SKILL.md via tool_result
                                              |
                                         s06  Context Compact      [5]
                                              3-layer compression

Phase 3: PERSISTENCE                 Phase 4: TEAMS
==================                   =====================
s07  Tasks                   [8]     s09  Agent Teams             [9]
     file-based CRUD + deps graph         teammates + JSONL mailboxes
     |                                    |
s08  Background Tasks        [6]     s10  Team Protocols          [12]
     daemon threads + notify queue        shutdown + plan approval FSM
                                          |
                                     s11  Autonomous Agents       [14]
                                          idle cycle + auto-claim
                                     |
                                     s12  Worktree Isolation      [16]
                                          task coordination + optional isolated execution lanes

Phase 1: THE LOOP (the core of the Agent)

Session 01: The Agent Loop

Motto: "One loop & Bash is all you need"

Everything starts with a simple loop:

def agent_loop(messages):
    # client, MODEL, TOOLS, and TOOL_HANDLERS are assumed to be defined elsewhere
    while True:
        response = client.messages.create(
            model=MODEL,
            max_tokens=8000,  # required by the Messages API
            messages=messages,
            tools=TOOLS,
        )
        messages.append({"role": "assistant", "content": response.content})

        # Check whether the Agent wants to use a tool
        if response.stop_reason != "tool_use":
            return  # the Agent has finished the task

        # Execute the tool calls the Agent requested
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = TOOL_HANDLERS[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })

        # Return the tool results to the Agent and keep looping
        messages.append({"role": "user", "content": results})

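Key insights

  • The Agent decides when to call a tool and which tool to call
  • The Code only executes what the Agent requests
  • This loop pattern carries through every later Session

Session 02: Tool Use

Motto: "Adding a tool means adding one handler"

Adding a new tool does not change the loop logic; you only register a handler: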

TOOL_HANDLERS = {
    "bash": bash_handler,
    "read_file": read_handler,
    "write_file": write_handler,
}

# Registering a new tool takes one line (its schema also has to be listed in TOOLS)
TOOL_HANDLERS["search"] = search_handler

Design principle: tools should be atomic, composable, and clearly described.
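To make the dispatch map concrete, here is a minimal sketch of one tool's schema paired with its handler. The `name` / `description` / `input_schema` shape follows the Anthropic tool-definition format; the `read_handler` body, the `dispatch` helper, and its error strings are illustrative assumptions, not code from the repository.

```python
from pathlib import Path

# Schema the model sees: name, description, and a JSON Schema for the input.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def read_handler(path: str) -> str:
    """Handler the harness runs when the model calls read_file."""
    return Path(path).read_text(encoding="utf-8")

TOOLS = [READ_FILE_TOOL]                      # passed to the API
TOOL_HANDLERS = {"read_file": read_handler}   # used by the loop

def dispatch(name: str, tool_input: dict) -> str:
    """Look up a handler by tool name; report failures as text instead of raising."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'"
    try:
        return handler(**tool_input)
    except Exception as exc:  # the error text goes back to the model as tool output
        return f"Error: {exc}"
```

Returning errors as strings is one possible design choice: the model sees the failure in the tool result and can try something else, while the loop itself keeps running.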


Phase 2: PLANNING & KNOWLEDGE

Session 03: TodoWrite

Motto: "An agent without a plan drifts"

An Agent without a plan drifts. List the steps first, then execute:

class TodoManager:
    def __init__(self):
        self.todos = []

    def add_todos(self, items):
        # Store each item as a dict so its status can be tracked
        self.todos.extend({"content": item, "status": "pending"} for item in items)

    def complete_todo(self, index):
        if 0 <= index < len(self.todos):
            self.todos[index]["status"] = "completed"

    def get_progress(self):
        completed = sum(1 for t in self.todos if t["status"] == "completed")
        return f"{completed}/{len(self.todos)}"

todo_manager = TodoManager()

# The Agent plans first
todos = ["Read README", "Install dependencies", "Run tests"]
todo_manager.add_todos(todos)

# Then it works through the steps in order
# (execute_step is a placeholder for whatever carries out one step)
for i, todo in enumerate(todos):
    execute_step(todo)
    todo_manager.complete_todo(i)
    print(f"Progress: {todo_manager.get_progress()}")
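The overview diagram pairs TodoManager with a "nag reminder", which the snippet above does not show. One way this could work, sketched here under assumptions: the harness counts loop rounds since the todo list was last written and, once a threshold is passed, adds a reminder text block to the next batch of tool results. The `NagReminder` class, the threshold of 3, the `todo_write` tool name check, and the reminder wording are all hypothetical.

```python
class NagReminder:
    """Counts agent-loop rounds since the todo list was last updated."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # assumed value; tune per harness
        self.rounds_since_update = 0

    def record_round(self, tool_names: list[str]) -> None:
        """Call once per loop iteration with the names of the tools used."""
        if "todo_write" in tool_names:
            self.rounds_since_update = 0
        else:
            self.rounds_since_update += 1

    def maybe_remind(self, results: list[dict]) -> list[dict]:
        """Append a reminder text block after the tool results when overdue."""
        if self.rounds_since_update >= self.threshold:
            reminder = {"type": "text", "text": "<reminder>Update your todos.</reminder>"}
            return results + [reminder]
        return results
```

The reminder is placed after the `tool_result` blocks rather than before them, so the tool results stay at the front of the user message.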

Session 04: Subagents

Motto: "Break big tasks down; each subtask gets a clean context"

Hand large tasks to Subagents; each Subagent gets its own independent context:

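def spawn_subagent(task_description):
    # Fresh messages[] so the parent conversation is not polluted
    sub_messages = [
        {"role": "user", "content": task_description}
    ]
    # agent_loop appends to sub_messages in place and returns nothing
    agent_loop(sub_messages)
    # Hand back only the final assistant text, not the whole transcript
    final_blocks = sub_messages[-1]["content"]
    return "".join(b.text for b in final_blocks if b.type == "text")

# The main Agent's context stays small
# (handle_big_task_in_parallel is a placeholder for the parent's own work)
main_context = handle_big_task_in_parallel()

# Complex tasks go to a Subagent
result = spawn_subagent("Analyze this codebase and report bugs")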

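Benefits

  • The main conversation stays clear and is not buried in detail
  • A Subagent can focus on one specific task
  • Failure isolation: one Subagent failing does not affect the others

Session 05: Skills

Motto: "Load knowledge when you need it, not upfront"

Load knowledge on demand instead of stuffing it into the system prompt: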

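from pathlib import Path

def load_skill(skill_name, tool_use_id):
    skill_path = Path("skills") / skill_name / "SKILL.md"
    skill_content = skill_path.read_text(encoding="utf-8")

    # Inject via tool_result so the system prompt stays untouched
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": f"[Loaded skill: {skill_name}]\n\n{skill_content}"
    }

# Inject only when the Agent asks for the Skill
# (tool_name, tool_input, and tool_use_id come from the current tool_use block)
if tool_name == "load_skill":
    result = load_skill(tool_input["skill_name"], tool_use_id)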

How Claude Code does it: a Skill is a SKILL.md file containing instructions and metadata, which the Agent loads dynamically when it needs it.
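The split between metadata and instructions can be sketched as follows. This assumes the metadata sits in a front-matter block of simple `key: value` lines between two `---` lines at the top of SKILL.md, with `name` and `description` keys; `parse_skill` and `skill_index_line` are hypothetical helpers, and the parser is a simplified stand-in, not a YAML parser.

```python
def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md string into (metadata, body).

    Assumes an optional front-matter block of simple `key: value` lines
    between two `---` lines. This is not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat everything as body

def skill_index_line(meta: dict) -> str:
    """One short line per skill: cheap enough to list up front."""
    return f"- {meta.get('name', 'unknown')}: {meta.get('description', '')}"
```

With this split, a harness could list only the short index lines up front and return the full body through `load_skill` when the Agent asks for it.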

Session 06: Context Compact

Motto: "Context will fill up; you need a way to make room"

A three-layer compression strategy lets a session continue well past a single context window:

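def compress_context(messages):
    # summarize_messages and extract_key_info are placeholders (e.g. an LLM call)
    if len(messages) <= 100:
        return messages  # nothing old enough to compress

    # Layer 1: keep the most recent N messages verbatim
    recent = messages[-100:]

    # Layer 2: compress older messages into a summary
    old_messages = messages[:-100]
    summary = summarize_messages(old_messages)

    # Layer 3: selectively keep key information
    key_info = extract_key_info(old_messages)

    return [
        # The Messages API accepts only "user" and "assistant" roles in messages
        {"role": "user", "content": f"Previous context summary: {summary}"},
        *key_info,
        *recent
    ]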

How Claude Code does it: when the context approaches the token limit, it automatically compresses the earlier conversation and keeps the key information.
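The "compress only when close to the limit" behaviour can be sketched like this. Everything here is an assumption for illustration: the 4-characters-per-token estimate, the `limit` and `keep_recent` defaults, and the `summarize` callback, which stands in for whatever produces the summary (typically another model call).

```python
import json
from typing import Callable

def estimate_tokens(messages: list[dict]) -> int:
    """Rough heuristic: about 4 characters per token (an assumption)."""
    return len(json.dumps(messages, default=str)) // 4

def compact_if_needed(
    messages: list[dict],
    summarize: Callable[[list[dict]], str],
    limit: int = 50_000,
    keep_recent: int = 10,
) -> list[dict]:
    """Return messages unchanged when under the limit, else summary + recent tail."""
    if estimate_tokens(messages) <= limit or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "user", "content": f"Previous context summary: {summarize(old)}"}
    return [summary, *recent]
```

A real harness would also need to pick the cut point so that a `tool_use` block and its matching `tool_result` are never separated; this sketch does not handle that.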


Phase 3: PERSISTENCE

Session 07: Tasks

Motto: "Break big goals into small tasks, order them, persist to disk"

Tasks are persisted to a file and can declare dependencies:

import json
from pathlib import Path

class TaskGraph:
    def __init__(self, filepath):
        self.filepath = Path(filepath)
        self.tasks = self.load()

    def load(self):
        if self.filepath.exists():
            return json.loads(self.filepath.read_text())
        return []

    def save(self):
        self.filepath.write_text(json.dumps(self.tasks, indent=2))

    def add_task(self, description, depends_on=None):
        task = {
            "id": f"task-{len(self.tasks) + 1}",
            "description": description,
            "status": "pending",
            "depends_on": depends_on or []  # ids of prerequisite tasks, e.g. ["task-1"]
        }
        self.tasks.append(task)
        self.save()
        return task["id"]

    def get_ready_tasks(self):
        """Return pending tasks whose dependencies are all completed."""
        # depends_on holds task ids, so index the statuses by id first
        status_by_id = {t["id"]: t["status"] for t in self.tasks}
        ready = []
        for task in self.tasks:
            if task["status"] == "pending":
                deps = task.get("depends_on", [])
                if all(status_by_id.get(d) == "completed" for d in deps):
                    ready.append(task)
        return ready

This lays the groundwork for multi-Agent collaboration: a task graph lets several Agents work in parallel.
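How completing one task unblocks the next can be seen in a small in-memory sketch. It assumes `depends_on` holds task ids and leaves out the file persistence; `ready_ids` and `complete` are hypothetical helper names.

```python
def ready_ids(tasks: list[dict]) -> list[str]:
    """Ids of pending tasks whose dependencies are all completed."""
    status = {t["id"]: t["status"] for t in tasks}
    return [
        t["id"] for t in tasks
        if t["status"] == "pending"
        and all(status.get(d) == "completed" for d in t["depends_on"])
    ]

def complete(tasks: list[dict], task_id: str) -> list[str]:
    """Mark one task completed and return the ids this newly unblocked."""
    before = set(ready_ids(tasks))
    for t in tasks:
        if t["id"] == task_id:
            t["status"] = "completed"
    return [i for i in ready_ids(tasks) if i not in before]

tasks = [
    {"id": "task-1", "status": "pending", "depends_on": []},
    {"id": "task-2", "status": "pending", "depends_on": ["task-1"]},
    {"id": "task-3", "status": "pending", "depends_on": ["task-1", "task-2"]},
]
print(ready_ids(tasks))           # ['task-1']
print(complete(tasks, "task-1"))  # ['task-2']
print(complete(tasks, "task-2"))  # ['task-3']
```

Tasks that are ready at the same time have no dependency on each other, which is what makes it safe to hand them to different Agents.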

Session 08: Background Tasks

Motto: "Run slow operations in the background; the agent keeps thinking"

Background tasks run without blocking the Agent:

import queue
import subprocess
import threading

class BackgroundTaskManager:
    def __init__(self):
        self.notification_queue = queue.Queue()

    def run_background(self, command):
        def worker():
            result = subprocess.run(command, capture_output=True, text=True)
            # Post a notification when the command finishes
            self.notification_queue.put({
                "type": "task_complete",
                "command": command,
                "output": result.stdout
            })

        # Daemon thread: it will not keep the process alive on exit
        thread = threading.Thread(target=worker, daemon=True)
        thread.start()

    def check_notifications(self):
        """Drain finished-task notifications; the Agent calls this every loop."""
        notifications = []
        while True:
            try:
                notifications.append(self.notification_queue.get_nowait())
            except queue.Empty:
                return notifications

Use cases: long-running tests, file downloads, data synchronization, and similar slow operations.


Phase 4: TEAMS

Session 09: Agent Teams

Motto: "When the task is too big for one, delegate to teammates"

Persistent teammates plus asynchronous mailboxes:

import json
from pathlib import Path

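class Teammate:
    def __init__(self, name, skill):
        self.name = name
        self.skill = skill
        self.mailbox_path = Path(f"mailboxes/{name}.jsonl")
        # Make sure the mailbox directory exists before the first write
        self.mailbox_path.parent.mkdir(parents=True, exist_ok=True)

    def send_message(self, message):
        """Append a message to this teammate's mailbox."""
        with open(self.mailbox_path, "a") as f:
            f.write(json.dumps(message) + "\n")

    def check_mailbox(self):
        """Read all pending messages, then empty the mailbox."""
        if not self.mailbox_path.exists():
            return []

        messages = []
        with open(self.mailbox_path) as f:
            for line in f:
                if line.strip():
                    messages.append(json.loads(line))
        # Drain so the same messages are not processed twice
        # (a sketch: not safe against a writer appending at the same moment)
        self.mailbox_path.write_text("")
        return messages

# Create specialized teammates
coder = Teammate("coder", "writes code")
tester = Teammate("tester", "runs tests")
reviewer = Teammate("reviewer", "reviews code")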

Session 10: Team Protocols

Motto: "Teammates need shared communication rules"

A uniform request-response protocol:

from datetime import datetime

# Standard message format
def create_request(from_agent, to_agent, task, context=None):
    return {
        "type": "request",
        "from": from_agent,
        "to": to_agent,
        "task": task,
        "context": context or {},
        "timestamp": datetime.now().isoformat()
    }

def create_response(request, status, result=None):
    return {
        "type": "response",
        "from": request["to"],
        "to": request["from"],
        "in_reply_to": request.get("timestamp"),
        "status": status,  # "accepted", "rejected", "completed"
        "result": result,
        "timestamp": datetime.now().isoformat()
    }

# Usage example (Teammate and coder come from Session 09)
lead = Teammate("lead", "coordinates work")
request = create_request(
    "lead", "coder",
    "Implement authentication",
    {"framework": "fastapi"}
)

coder.send_message(request)

# The coder processes the request and replies
messages = coder.check_mailbox()
for msg in messages:
    if msg["type"] == "request":
        # Do the work (implement_auth is a placeholder)
        result = implement_auth(msg["task"])
        # Send the reply to the lead's mailbox
        response = create_response(msg, "completed", result)
        lead.send_message(response)
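The overview diagram lists "shutdown + plan approval FSM" for this Session, which the request-response example does not cover. A minimal sketch of such a state machine, under assumptions: each request gets a short id and moves from `pending` to exactly one terminal state, `approved` or `rejected`. The `ApprovalTracker` class, its method names, and the `kind` values are hypothetical.

```python
import uuid

class ApprovalTracker:
    """Tracks request state: pending -> approved | rejected (terminal)."""

    def __init__(self):
        self.requests: dict[str, dict] = {}

    def open(self, kind: str, sender: str, payload: str) -> str:
        """Register a new request (e.g. kind='plan' or kind='shutdown')."""
        request_id = uuid.uuid4().hex[:8]
        self.requests[request_id] = {
            "kind": kind, "from": sender, "payload": payload, "status": "pending",
        }
        return request_id

    def resolve(self, request_id: str, approve: bool) -> str:
        """Move a pending request to its terminal state and return that state."""
        req = self.requests.get(request_id)
        if req is None:
            raise KeyError(f"unknown request_id: {request_id}")
        if req["status"] != "pending":
            raise ValueError(f"request {request_id} already {req['status']}")
        req["status"] = "approved" if approve else "rejected"
        return req["status"]
```

Correlating replies by an explicit id rather than by timestamp keeps the pairing unambiguous when several requests are in flight at once.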

Session 11: Autonomous Agents

Motto: "Teammates scan the board and claim tasks themselves"

Teammates claim tasks themselves; no central assignment is needed:

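import threading
import time

# One lock guards scan-and-claim so two teammates cannot take the same task
claim_lock = threading.Lock()

def autonomous_cycle(teammate, task_graph):
    while True:
        # Scan the board and claim the first matching task atomically
        claimed = None
        with claim_lock:
            for task in task_graph.get_ready_tasks():
                if can_handle(teammate, task):
                    task["status"] = "in_progress"
                    task["assigned_to"] = teammate.name
                    task_graph.save()
                    claimed = task
                    break

        if claimed is not None:
            # Execute outside the lock so other teammates can keep claiming
            try:
                result = execute_task(teammate, claimed)
                claimed["status"] = "completed"
                claimed["result"] = result
            except Exception as e:
                claimed["status"] = "failed"
                claimed["error"] = str(e)
            finally:
                with claim_lock:
                    task_graph.save()
            continue  # look for more work right away

        # Idle: nothing to claim, wait before the next scan
        time.sleep(30)

# Each teammate runs its own autonomous cycle
# (can_handle and execute_task are placeholders; team is a list of Teammate objects)
for teammate in team:
    threading.Thread(
        target=autonomous_cycle,
        args=(teammate, task_graph)
    ).start()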

Session 12: Worktree Isolation

Motto: "Each works in its own directory, no interference"

Worktree isolation: Tasks manage goals, Worktrees manage directories:

import shutil
import subprocess
from pathlib import Path

class Worktree:
    def __init__(self, task_id, base_path="worktrees"):
        self.task_id = task_id
        self.path = Path(base_path) / task_id
        self.path.mkdir(parents=True, exist_ok=True)

    def execute_in_isolation(self, command):
        """Run a command with this worktree's directory as the working directory."""
        return subprocess.run(
            command,
            cwd=self.path,
            capture_output=True,
            text=True
        )

    def cleanup(self):
        """Delete the working directory."""
        if self.path.exists():
            shutil.rmtree(self.path)

# A Task and its Worktree are bound by ID
task = {
    "id": "task-123",
    "description": "Build feature X",
    "worktree_id": "worktree-123"
}

# Create the isolated working environment
worktree = Worktree(task["worktree_id"])

# Run commands inside it
result = worktree.execute_in_isolation(["npm", "install"])
result = worktree.execute_in_isolation(["npm", "run", "build"])

# Clean up when done
worktree.cleanup()
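The Worktree class above creates plain directories. If the tasks all operate on one git repository, `git worktree` can give each task its own checkout on its own branch. The sketch below only builds and runs the commands; the helper names and the `wt/<task_id>` branch naming are assumptions, and `run` has to be called from inside an existing git repository.

```python
import subprocess
from pathlib import Path

def worktree_add_cmd(task_id: str, base: str = "worktrees") -> list[str]:
    """Build `git worktree add -b <branch> <path>` for one task."""
    path = Path(base) / task_id
    return ["git", "worktree", "add", "-b", f"wt/{task_id}", str(path)]

def worktree_remove_cmd(task_id: str, base: str = "worktrees") -> list[str]:
    """Build `git worktree remove <path>` for one task."""
    return ["git", "worktree", "remove", str(Path(base) / task_id)]

def run(cmd: list[str], cwd: str = ".") -> subprocess.CompletedProcess:
    """Run a command in the given directory and capture its output."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
```

For example, `run(worktree_add_cmd("task-123"))` would create `worktrees/task-123` on a new branch `wt/task-123`, and `run(worktree_remove_cmd("task-123"))` would remove that checkout again.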

Putting It Together: A Complete Agent Harness

Combine all the mechanisms:

class AgentHarness:
    # SkillLoader, ContextManager, WorktreeManager, setup_team, TOOLS, client,
    # and MODEL are placeholders for the pieces built in the Sessions above
    def __init__(self):
        # Phase 2: planning and knowledge
        self.todo_manager = TodoManager()
        self.skill_loader = SkillLoader()
        self.context_manager = ContextManager()

        # Phase 3: persistence
        self.task_graph = TaskGraph("tasks.json")
        self.background_tasks = BackgroundTaskManager()

        # Phase 4: teams
        self.team = self.setup_team()
        self.worktrees = WorktreeManager()

        # Phase 1: core loop (built last because the handlers reference the managers)
        self.tools = self.setup_tools()

    def setup_tools(self):
        # Dispatch map: tool name -> handler
        return {
            "bash": bash_handler,
            "read_file": read_handler,
            "write_file": write_handler,
            "todo_write": self.todo_manager.add_todos,
            "load_skill": self.skill_loader.load,
            "spawn_subagent": spawn_subagent,
        }

    def run(self, user_message):
        messages = [{"role": "user", "content": user_message}]

        while True:
            # Surface background-task notifications
            notifications = self.background_tasks.check_notifications()
            if notifications:
                messages.append({
                    "role": "user",  # messages accept only user/assistant roles
                    "content": f"Background tasks: {notifications}"
                })

            # Compress the context if needed
            messages = self.context_manager.compress_if_needed(messages)

            # Call the LLM
            response = client.messages.create(
                model=MODEL,
                max_tokens=8000,  # required by the Messages API
                messages=messages,
                tools=TOOLS,  # tool schemas; self.tools holds the handlers
            )

            messages.append({"role": "assistant", "content": response.content})

            # Done when the model stops asking for tools
            if response.stop_reason != "tool_use":
                return messages

            # Execute tool calls; all results go back in one user message
            results = []
            for block in response.content:
                if block.type == "tool_use":
                    output = self.tools[block.name](**block.input)
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(output)
                    })
            messages.append({"role": "user", "content": results})

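Key Design Principles

These 12 Sessions point to the core design principles behind Claude Code's architecture:

1. The Model is the Agent; the Code is the Harness

  • The Agent's intelligence comes from training, not from code
  • The Code's job is to provide the environment, not to simulate intelligence

2. Trust the Model, simplify the Harness

  • Do not constrain the Model with complex decision trees
  • Give the Model tools and let it decide when to use them

3. Load on demand, stay lean

  • Load knowledge on demand; do not fill up the system prompt
  • Manage context dynamically; compress and clean up in time

4. Isolation and collaboration

  • Subagents get independent contexts to avoid pollution
  • Teammates communicate asynchronously and work in parallel
  • Worktrees isolate execution so work does not interfere

5. Persist everything

  • Tasks are persisted to files
  • Mailboxes persist communication
  • Interrupted work can be resumed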

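In Practice: Build Your Own Agent Harness

Once you understand the principles, you can:

1. Study Claude Code

  • Read the full implementation in shareAI-lab/learn-claude-code
  • Run the 12 reference implementations and watch how each mechanism works
  • Modify and extend them to understand the trade-offs

2. Use the Kode Agent CLI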

npm i -g @shareai-lab/kode

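An open-source coding Agent CLI that supports multiple Models; its Harness design is worth studying.

3. Integrate into your application

Use the Kode Agent SDK to embed Agent capabilities in your backend, a browser extension, or any other application.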


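Going Further: From On-Demand to Always-On

The Harness discussed here follows a "use and discard" model: each session starts from scratch.

If you need an "always-on" Assistant, see the claw0 repository. It builds on the same Agent core and adds:

  • Heartbeat: wakes up periodically to check for work
  • Cron: the Agent can schedule its own future tasks
  • IM Channels: multi-channel instant messaging (WhatsApp, Telegram, Slack, and others)
  • Memory: persistent context memory
  • Soul: a personalized persona system

learn-claude-code                   claw0
(agent harness core)                (proactive always-on harness)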

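Summary

Across 12 progressive Sessions, we built a complete Agent Harness from scratch and implemented Claude Code's core mechanisms:

  1. Core loop (s01-s02): Agent Loop + Tool Use
  2. Planning and knowledge (s03-s06): TodoWrite, Subagents, Skills, Context Compact
  3. Persistence (s07-s08): Tasks, Background Tasks
  4. Team collaboration (s09-s12): Agent Teams, Protocols, Autonomous Agents, Worktree Isolation

The most important takeaway: an Agent is a trained Model, not hand-written code. As engineers, our job is to build a good Harness: give the Model tools, knowledge, context, and permissions, then trust it to reason, decide, and act.

That is the soul of Claude Code. That is the essence of Harness Engineering.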

