技能库 / 开发编程 / Claude API 集成

全部分类开发编程创意写作商业办公教育学习效率工具数据分析技术集成通用语言翻译文档生成

Claude API 集成

在应用中正确调用 Claude API：鉴权、消息格式、流式与错误处理等。

v1.0.0 已认证

作者 / 来源

github-anthropics

在来源站打开

登录后收藏登录后加入合集

安装方式

CLI 安装（推荐）

claw install oss-anthropic-claude-api

需要安装 CLAW CLI

手动下载安装

下载 ZIP 后解压到技能目录即可安装。若在桌面客户端 WebView中直接下载出现异常，本站会改为提示页 + 原始链接，请按页内说明操作。

下载 ZIP (oss-anthropic-claude-api-v1.0.0.zip)

触发指令

/claude-api

跨平台安装指引

该技能声明兼容以下 1 个平台，将 ZIP 解压到对应目录即可被识别。

支持矩阵

Claude Code Coding Agent

macOS / Linux：~/.claude/skills/

Windows：%USERPROFILE%\.claude\skills\

unzip oss-anthropic-claude-api-v1.0.0.zip -d ~/.claude/skills/

目录不存在时请先 mkdir -p 创建；启用 Skill 后请重启对应 Agent 让配置生效。

使用指南

用 Claude 构建 LLM 应用

帮助使用 Claude API 与 Anthropic SDK 构建 LLM 应用。按需求选对 产品面（surface），检测项目语言，再读对应语言的文档节选。

默认约定

除非用户另有要求：

模型：使用 Claude Opus 4.6，模型字符串 claude-opus-4-6。
思考：凡略复杂任务，默认 thinking: { type: "adaptive" }（自适应思考）。
流式：输入长、输出长或 max_tokens 高时，默认流式，避免请求超时；若不需要逐事件处理，可用 SDK 的 .get_final_message() / .finalMessage() 取完整回复。

语言检测

读代码示例前，先判断用户用的语言：

看项目文件推断：
- *.py、requirements.txt、pyproject.toml 等 → Python，读 python/
- *.ts、*.tsx、package.json、tsconfig.json → TypeScript，读 typescript/
- 仅 *.js、*.jsx 且无 .ts → 仍按 TypeScript SDK 读 typescript/
- *.java、pom.xml、build.gradle → Java，读 java/
- *.kt、*.kts → Kotlin，用 Java SDK，读 java/
- *.scala、build.sbt → Scala，读 java/
- *.go、go.mod → Go，读 go/
- *.rb、Gemfile → Ruby，读 ruby/
- *.cs、*.csproj → C#，读 csharp/
- *.php、composer.json → PHP，读 php/
多语言并存： 看用户当前文件或问题指向哪一侧；仍模糊则问：「检测到 Python 与 TypeScript，Claude API 集成主要用哪边？」
无法推断： 用 AskUserQuestion 给选项：Python、TypeScript、Java、Go、Ruby、cURL/原始 HTTP、C#、PHP；若无该工具，默认展示 Python 并说明可换语言。
不支持的语言（Rust、Swift、C++、Elixir 等）：建议 curl/ 的原始 HTTP，并说明可能有社区 SDK；也可提供 Python/TS 作参考。
需要 cURL/原始 HTTP： 读 curl/。

各语言功能支持（Tool Runner / Agent SDK）

| 语言 | Tool Runner | Agent SDK | 说明 | |------|-------------|-----------|------| | Python | 有（beta） | 有 | 完整，@beta_tool | | TypeScript | 有（beta） | 有 | 完整，betaZodTool + Zod | | Java | 有（beta） | 无 | 注解类 beta tool | | Go | 有（beta） | 无 | toolrunner 包 | | Ruby | 有（beta） | 无 | BaseTool + tool_runner | | cURL | 不适用 | 不适用 | 原始 HTTP | | C# | 无 | 无 | 官方 SDK | | PHP | 有（beta） | 无 | BetaRunnableTool + toolRunner() |

该用哪种产品面？

从简： 默认选 能满足需求的最简单一层。单次调用与工作流覆盖多数场景；只有真需要 开放-ended、模型主导探索 时再上升为 Agent。

| 场景 | 层级 | 推荐 | 原因 | |------|------|------|------| | 分类、摘要、抽取、问答 | 单次调用 | Claude API | 一问一答 | | 批处理或向量 | 单次调用 | Claude API | 专用端点 | | 多步流水线，逻辑由代码编排 | 工作流 | Claude API + 工具 | 你控制循环 | | 自定义工具的智能体 | Agent | Claude API + 工具 | 最灵活 | | 需要内置文件/网页/终端 | Agent | Agent SDK | 内置工具、安全、MCP | | 编程助手类 Agent | Agent | Agent SDK | 面向该场景 | | 要内置权限与护栏 | Agent | Agent SDK | 安全能力开箱即用 |

说明： Agent SDK 适合要 内置文件/网页/终端、权限、MCP 的场景。若 工具全自建，用 Claude API 即可——可用 tool runner 自动循环，或手写循环做审批、日志、条件分支。

决策树（简述）

单次 LLM（分类/摘要/抽取/问答）→ Claude API
是否 Claude 本人 要读写文件、浏览网页、执行 shell（不是你读好再喂给模型）？是 → Agent SDK
多步工作流 + 自有工具 → Claude API + 工具
开放轨迹 + 自有工具 → Claude API 的 agentic 循环

是否该做「Agent」？

先过四问：复杂度（是否多步且难事先写死）、价值（是否值得更高成本与延迟）、可行性（模型是否擅长）、错误成本（能否用测试/审查/回滚兜住）。任一为「否」则留在更简单层级。

架构要点

统一走 POST /v1/messages；工具与输出约束都是该端点能力，不是另一套 API。

用户自定义工具：用装饰器、Zod 或原始 JSON 定义；SDK tool runner 负责请求、执行函数、循环直到结束；也可手写循环精细控制。
服务端工具：Anthropic 托管；代码执行在服务端；Computer use 可托管或自托管。
结构化输出：约束回复格式（output_config.format）与工具参数（strict: true）；推荐 client.messages.parse()。已弃用 顶层 output_format，改用 output_config: { format: {...} }。
配套端点：Batches、Files、Token 计数、Models（GET /v1/models 等）服务于上述流程。

当前模型（缓存日期见上游 SKILL）

| 模型 | ID | 上下文 | 输入 $/1M | 输出 $/1M | |------|-----|--------|-----------|-----------| | Claude Opus 4.6 | claude-opus-4-6 | 200K（1M beta） | $5.00 | $25.00 | | Claude Sonnet 4.6 | claude-sonnet-4-6 | 200K（1M beta） | $3.00 | $15.00 | | Claude Haiku 4.5 | claude-haiku-4-5 | 200K | $1.00 | $5.00 |

除非用户明确指定其他模型，否则始终用 claude-opus-4-6。 不要擅自为省钱改 Sonnet/Haiku。

只用表中精确 ID，不要自造日期后缀（如误用 claude-sonnet-4-5-20250514）。用户要旧版时读 shared/models.md 取官方 ID。

实时能力查询： 用户问上下文、是否支持 vision/thinking/effort 等时，用 Models API（client.models.retrieve / list），见 shared/models.md。

思考与 Effort（速查）

Opus 4.6 — 自适应思考（推荐）： thinking: { type: "adaptive" }；不要在 Opus 4.6 上使用 budget_tokens（已弃用）。用户说要「扩展思考」「思考预算」时，仍用 Opus 4.6 + adaptive。

Effort（GA，无需 beta 头）： output_config: { effort: "low"|"medium"|"high"|"max" }（在 output_config 内，非顶层）；默认 high；max 仅 Opus 4.6；与 Sonnet 4.5 / Haiku 4.5 组合会报错。可与 adaptive 联用。

Sonnet 4.6： 支持 adaptive；同样弃用 budget_tokens。

仅当用户明确要求旧模型时： 如 Sonnet 4.5 等可用 thinking: { type: "enabled", budget_tokens: N }，且 budget_tokens < max_tokens（至少 1024）。不要仅因用户提到 budget_tokens 就换旧模型——优先 Opus 4.6 + adaptive。

压缩（Compaction）速查

Beta，Opus 4.6 / Sonnet 4.6。 长对话可能超 200K 时可开服务端压缩；接近阈值（默认约 150K）时自动摘要更早上下文。需 beta 头 compact-2026-01-12。

关键： 每轮要把 response.content 整体 拼回 messages，不能只抽文本丢掉压缩块，否则 丢失压缩状态。

示例见 {lang}/claude-api/README.md 的 Compaction 节；更多见 shared/live-sources.md。

提示缓存（Prompt Caching）速查

前缀匹配： 前缀任意字节变化会使其后缓存全部失效。渲染顺序：tools → system → messages。稳定内容（固定 system、确定性工具列表）放前，易变内容（时间戳、每次不同的 ID）放在最后一个 cache_control 断点之后。

顶层自动缓存： messages.create() 上 cache_control: { type: "ephemeral" } 最简单；每请求最多 4 个断点；可缓存前缀约 ≥1024 tokens，更短则 静默不缓存。

用 usage.cache_read_input_tokens 验证；反复为 0 说明有 静默失效因素（如 system 里 datetime.now()、JSON 键无序、工具集每轮变化等）。

详见 shared/prompt-caching.md 与各语言 README 对应章节。

阅读指南

按语言进入 {lang}/ 下文件夹，按需打开：

快捷：

单次文本任务 → {lang}/claude-api/README.md
聊天 UI / 流式展示 → 加 {lang}/claude-api/streaming.md
超长对话 → README 中 Compaction
缓存优化 → shared/prompt-caching.md + README 缓存节
工具/智能体 → README + shared/tool-use-concepts.md + {lang}/claude-api/tool-use.md
批处理 → batches.md
跨请求复用文件 → files-api.md
内置工具 Agent → {lang}/agent-sdk/README.md + patterns.md

Agent SDK 仅 Python / TypeScript 有完整文档树。

何时用 WebFetch 拉最新文档

用户要「最新」、本地缓存疑似过时、或问到本文未覆盖的功能时；URL 列表在 shared/live-sources.md。

常见陷阱

向 API 传文件或长内容时 不要静默截断；超长则告知用户并讨论分块、摘要等。
Opus 4.6 / Sonnet 4.6：用 adaptive，不用 budget_tokens。
Opus 4.6 禁止助手消息 prefill（会 400）；改用结构化输出或 system 约束格式。
max_tokens 不要太小，否则思考/输出中途被截断需重试；非流式默认可给约 16000；流式可更高（如约 64000）。分类等极短输出再降低。
128K 输出需流式 + get_final_message 等，避免 HTTP 超时。
Opus 4.6 工具调用 input 的 JSON 转义可能与旧版不同；始终 JSON.parse / json.loads，不要对原始字符串做脆弱匹配。
结构化输出用 output_config.format，不要用已弃用的 output_format。
不要重复造 SDK 轮子：用 stream.finalMessage()、SDK 异常类型、SDK 导出类型（MessageParam、Tool 等），不要自造一套弱类型接口。
需要报告/文档/图表时，代码执行沙箱预装 docx/pptx/matplotlib/pillow/pypdf 等，可考虑 Files API 返回文件，而非只打 stdout。

# Building LLM-Powered Applications with Claude

This skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.

## Defaults

Unless the user requests otherwise:

For the Claude model version, please use Claude Opus 4.6, which you can access via the exact model string `claude-opus-4-6`. Please default to using adaptive thinking (`thinking: {type: "adaptive"}`) for anything remotely complicated. And finally, please default to streaming for any request that may involve long input, long output, or high `max_tokens` — it prevents hitting request timeouts. Use the SDK's `.get_final_message()` / `.finalMessage()` helper to get the complete response if you don't need to handle individual stream events

---

## Language Detection

Before reading code examples, determine which language the user is working in:

1. **Look at project files** to infer the language:

   - `*.py`, `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile` → **Python** — read from `python/`
   - `*.ts`, `*.tsx`, `package.json`, `tsconfig.json` → **TypeScript** — read from `typescript/`
   - `*.js`, `*.jsx` (no `.ts` files present) → **TypeScript** — JS uses the same SDK, read from `typescript/`
   - `*.java`, `pom.xml`, `build.gradle` → **Java** — read from `java/`
   - `*.kt`, `*.kts`, `build.gradle.kts` → **Java** — Kotlin uses the Java SDK, read from `java/`
   - `*.scala`, `build.sbt` → **Java** — Scala uses the Java SDK, read from `java/`
   - `*.go`, `go.mod` → **Go** — read from `go/`
   - `*.rb`, `Gemfile` → **Ruby** — read from `ruby/`
   - `*.cs`, `*.csproj` → **C#** — read from `csharp/`
   - `*.php`, `composer.json` → **PHP** — read from `php/`

2. **If multiple languages detected** (e.g., both Python and TypeScript files):

   - Check which language the user's current file or question relates to
   - If still ambiguous, ask: "I detected both Python and TypeScript files. Which language are you using for the Claude API integration?"

3. **If language can't be inferred** (empty project, no source files, or unsupported language):

   - Use AskUserQuestion with options: Python, TypeScript, Java, Go, Ruby, cURL/raw HTTP, C#, PHP
   - If AskUserQuestion is unavailable, default to Python examples and note: "Showing Python examples. Let me know if you need a different language."

4. **If unsupported language detected** (Rust, Swift, C++, Elixir, etc.):

   - Suggest cURL/raw HTTP examples from `curl/` and note that community SDKs may exist
   - Offer to show Python or TypeScript examples as reference implementations

5. **If user needs cURL/raw HTTP examples**, read from `curl/`.

### Language-Specific Feature Support

| Language   | Tool Runner | Agent SDK | Notes                                 |
| ---------- | ----------- | --------- | ------------------------------------- |
| Python     | Yes (beta)  | Yes       | Full support — `@beta_tool` decorator |
| TypeScript | Yes (beta)  | Yes       | Full support — `betaZodTool` + Zod    |
| Java       | Yes (beta)  | No        | Beta tool use with annotated classes  |
| Go         | Yes (beta)  | No        | `BetaToolRunner` in `toolrunner` pkg  |
| Ruby       | Yes (beta)  | No        | `BaseTool` + `tool_runner` in beta    |
| cURL       | N/A         | N/A       | Raw HTTP, no SDK features             |
| C#         | No          | No        | Official SDK                          |
| PHP        | Yes (beta)  | No        | `BetaRunnableTool` + `toolRunner()`   |

---

## Which Surface Should I Use?

> **Start simple.** Default to the simplest tier that meets your needs. Single API calls and workflows handle most use cases — only reach for agents when the task genuinely requires open-ended, model-driven exploration.

| Use Case                                        | Tier            | Recommended Surface       | Why                                     |
| ----------------------------------------------- | --------------- | ------------------------- | --------------------------------------- |
| Classification, summarization, extraction, Q&A  | Single LLM call | **Claude API**            | One request, one response               |
| Batch processing or embeddings                  | Single LLM call | **Claude API**            | Specialized endpoints                   |
| Multi-step pipelines with code-controlled logic | Workflow        | **Claude API + tool use** | You orchestrate the loop                |
| Custom agent with your own tools                | Agent           | **Claude API + tool use** | Maximum flexibility                     |
| AI agent with file/web/terminal access          | Agent           | **Agent SDK**             | Built-in tools, safety, and MCP support |
| Agentic coding assistant                        | Agent           | **Agent SDK**             | Designed for this use case              |
| Want built-in permissions and guardrails        | Agent           | **Agent SDK**             | Safety features included                |

> **Note:** The Agent SDK is for when you want built-in file/web/terminal tools, permissions, and MCP out of the box. If you want to build an agent with your own tools, Claude API is the right choice — use the tool runner for automatic loop handling, or the manual loop for fine-grained control (approval gates, custom logging, conditional execution).

### Decision Tree

```
What does your application need?

1. Single LLM call (classification, summarization, extraction, Q&A)
   └── Claude API — one request, one response

2. Does Claude need to read/write files, browse the web, or run shell commands
   as part of its work? (Not: does your app read a file and hand it to Claude —
   does Claude itself need to discover and access files/web/shell?)
   └── Yes → Agent SDK — built-in tools, don't reimplement them
       Examples: "scan a codebase for bugs", "summarize every file in a directory",
                 "find bugs using subagents", "research a topic via web search"

3. Workflow (multi-step, code-orchestrated, with your own tools)
   └── Claude API with tool use — you control the loop

4. Open-ended agent (model decides its own trajectory, your own tools)
   └── Claude API agentic loop (maximum flexibility)
```

### Should I Build an Agent?

Before choosing the agent tier, check all four criteria:

- **Complexity** — Is the task multi-step and hard to fully specify in advance? (e.g., "turn this design doc into a PR" vs. "extract the title from this PDF")
- **Value** — Does the outcome justify higher cost and latency?
- **Viability** — Is Claude capable at this task type?
- **Cost of error** — Can errors be caught and recovered from? (tests, review, rollback)

If the answer is "no" to any of these, stay at a simpler tier (single call or workflow).

---

## Architecture

Everything goes through `POST /v1/messages`. Tools and output constraints are features of this single endpoint — not separate APIs.

**User-defined tools** — You define tools (via decorators, Zod schemas, or raw JSON), and the SDK's tool runner handles calling the API, executing your functions, and looping until Claude is done. For full control, you can write the loop manually.

**Server-side tools** — Anthropic-hosted tools that run on Anthropic's infrastructure. Code execution is fully server-side (declare it in `tools`, Claude runs code automatically). Computer use can be server-hosted or self-hosted.

**Structured outputs** — Constrains the Messages API response format (`output_config.format`) and/or tool parameter validation (`strict: true`). The recommended approach is `client.messages.parse()` which validates responses against your schema automatically. Note: the old `output_format` parameter is deprecated; use `output_config: {format: {...}}` on `messages.create()`.

**Supporting endpoints** — Batches (`POST /v1/messages/batches`), Files (`POST /v1/files`), Token Counting, and Models (`GET /v1/models`, `GET /v1/models/{id}` — live capability/context-window discovery) feed into or support Messages API requests.

---

## Current Models (cached: 2026-02-17)

| Model             | Model ID            | Context        | Input $/1M | Output $/1M |
| ----------------- | ------------------- | -------------- | ---------- | ----------- |
| Claude Opus 4.6   | `claude-opus-4-6`   | 200K (1M beta) | $5.00      | $25.00      |
| Claude Sonnet 4.6 | `claude-sonnet-4-6` | 200K (1M beta) | $3.00      | $15.00      |
| Claude Haiku 4.5  | `claude-haiku-4-5`  | 200K           | $1.00      | $5.00       |

**ALWAYS use `claude-opus-4-6` unless the user explicitly names a different model.** This is non-negotiable. Do not use `claude-sonnet-4-6`, `claude-sonnet-4-5`, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours.

**CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes.** For example, use `claude-sonnet-4-5`, never `claude-sonnet-4-5-20250514` or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read `shared/models.md` for the exact ID — do not construct one yourself.

A note: if any of the model strings above look unfamiliar to you, that's to be expected — that just means they were released after your training data cutoff. Rest assured they are real models; we wouldn't mess with you like that.

**Live capability lookup:** The table above is cached. When the user asks "what's the context window for X", "does X support vision/thinking/effort", or "which models support Y", query the Models API (`client.models.retrieve(id)` / `client.models.list()`) — see `shared/models.md` for the field reference and capability-filter examples.

---

## Thinking & Effort (Quick Reference)

**Opus 4.6 — Adaptive thinking (recommended):** Use `thinking: {type: "adaptive"}`. Claude dynamically decides when and how much to think. No `budget_tokens` needed — `budget_tokens` is deprecated on Opus 4.6 and Sonnet 4.6 and must not be used. Adaptive thinking also automatically enables interleaved thinking (no beta header needed). **When the user asks for "extended thinking", a "thinking budget", or `budget_tokens`: always use Opus 4.6 with `thinking: {type: "adaptive"}`. The concept of a fixed token budget for thinking is deprecated — adaptive thinking replaces it. Do NOT use `budget_tokens` and do NOT switch to an older model.**

**Effort parameter (GA, no beta header):** Controls thinking depth and overall token spend via `output_config: {effort: "low"|"medium"|"high"|"max"}` (inside `output_config`, not top-level). Default is `high` (equivalent to omitting it). `max` is Opus 4.6 only. Works on Opus 4.5, Opus 4.6, and Sonnet 4.6. Will error on Sonnet 4.5 / Haiku 4.5. Combine with adaptive thinking for the best cost-quality tradeoffs. Use `low` for subagents or simple tasks; `max` for the deepest reasoning.

**Sonnet 4.6:** Supports adaptive thinking (`thinking: {type: "adaptive"}`). `budget_tokens` is deprecated on Sonnet 4.6 — use adaptive thinking instead.

**Older models (only if explicitly requested):** If the user specifically asks for Sonnet 4.5 or another older model, use `thinking: {type: "enabled", budget_tokens: N}`. `budget_tokens` must be less than `max_tokens` (minimum 1024). Never choose an older model just because the user mentions `budget_tokens` — use Opus 4.6 with adaptive thinking instead.

---

## Compaction (Quick Reference)

**Beta, Opus 4.6 and Sonnet 4.6.** For long-running conversations that may exceed the 200K context window, enable server-side compaction. The API automatically summarizes earlier context when it approaches the trigger threshold (default: 150K tokens). Requires beta header `compact-2026-01-12`.

**Critical:** Append `response.content` (not just the text) back to your messages on every turn. Compaction blocks in the response must be preserved — the API uses them to replace the compacted history on the next request. Extracting only the text string and appending that will silently lose the compaction state.

See `{lang}/claude-api/README.md` (Compaction section) for code examples. Full docs via WebFetch in `shared/live-sources.md`.

---

## Prompt Caching (Quick Reference)

**Prefix match.** Any byte change anywhere in the prefix invalidates everything after it. Render order is `tools` → `system` → `messages`. Keep stable content first (frozen system prompt, deterministic tool list), put volatile content (timestamps, per-request IDs, varying questions) after the last `cache_control` breakpoint.

**Top-level auto-caching** (`cache_control: {type: "ephemeral"}` on `messages.create()`) is the simplest option when you don't need fine-grained placement. Max 4 breakpoints per request. Minimum cacheable prefix is ~1024 tokens — shorter prefixes silently won't cache.

**Verify with `usage.cache_read_input_tokens`** — if it's zero across repeated requests, a silent invalidator is at work (`datetime.now()` in system prompt, unsorted JSON, varying tool set).

For placement patterns, architectural guidance, and the silent-invalidator audit checklist: read `shared/prompt-caching.md`. Language-specific syntax: `{lang}/claude-api/README.md` (Prompt Caching section).

---

## Reading Guide

After detecting the language, read the relevant files based on what the user needs:

### Quick Task Reference

**Single text classification/summarization/extraction/Q&A:**
→ Read only `{lang}/claude-api/README.md`

**Chat UI or real-time response display:**
→ Read `{lang}/claude-api/README.md` + `{lang}/claude-api/streaming.md`

**Long-running conversations (may exceed context window):**
→ Read `{lang}/claude-api/README.md` — see Compaction section

**Prompt caching / optimize caching / "why is my cache hit rate low":**
→ Read `shared/prompt-caching.md` + `{lang}/claude-api/README.md` (Prompt Caching section)

**Function calling / tool use / agents:**
→ Read `{lang}/claude-api/README.md` + `shared/tool-use-concepts.md` + `{lang}/claude-api/tool-use.md`

**Batch processing (non-latency-sensitive):**
→ Read `{lang}/claude-api/README.md` + `{lang}/claude-api/batches.md`

**File uploads across multiple requests:**
→ Read `{lang}/claude-api/README.md` + `{lang}/claude-api/files-api.md`

**Agent with built-in tools (file/web/terminal):**
→ Read `{lang}/agent-sdk/README.md` + `{lang}/agent-sdk/patterns.md`

### Claude API (Full File Reference)

Read the **language-specific Claude API folder** (`{language}/claude-api/`):

1. **`{language}/claude-api/README.md`** — **Read this first.** Installation, quick start, common patterns, error handling.
2. **`shared/tool-use-concepts.md`** — Read when the user needs function calling, code execution, memory, or structured outputs. Covers conceptual foundations.
3. **`{language}/claude-api/tool-use.md`** — Read for language-specific tool use code examples (tool runner, manual loop, code execution, memory, structured outputs).
4. **`{language}/claude-api/streaming.md`** — Read when building chat UIs or interfaces that display responses incrementally.
5. **`{language}/claude-api/batches.md`** — Read when processing many requests offline (not latency-sensitive). Runs asynchronously at 50% cost.
6. **`{language}/claude-api/files-api.md`** — Read when sending the same file across multiple requests without re-uploading.
7. **`shared/prompt-caching.md`** — Read when adding or optimizing prompt caching. Covers prefix-stability design, breakpoint placement, and anti-patterns that silently invalidate cache.
8. **`shared/error-codes.md`** — Read when debugging HTTP errors or implementing error handling.
9. **`shared/live-sources.md`** — WebFetch URLs for fetching the latest official documentation.

> **Note:** For Java, Go, Ruby, C#, PHP, and cURL — these have a single file each covering all basics. Read that file plus `shared/tool-use-concepts.md` and `shared/error-codes.md` as needed.

### Agent SDK

Read the **language-specific Agent SDK folder** (`{language}/agent-sdk/`). Agent SDK is available for **Python and TypeScript only**.

1. **`{language}/agent-sdk/README.md`** — Installation, quick start, built-in tools, permissions, MCP, hooks.
2. **`{language}/agent-sdk/patterns.md`** — Custom tools, hooks, subagents, MCP integration, session resumption.
3. **`shared/live-sources.md`** — WebFetch URLs for current Agent SDK docs.

---

## When to Use WebFetch

Use WebFetch to get the latest documentation when:

- User asks for "latest" or "current" information
- Cached data seems incorrect
- User asks about features not covered here

Live documentation URLs are in `shared/live-sources.md`.

## Common Pitfalls

- Don't truncate inputs when passing files or content to the API. If the content is too long to fit in the context window, notify the user and discuss options (chunking, summarization, etc.) rather than silently truncating.
- **Opus 4.6 / Sonnet 4.6 thinking:** Use `thinking: {type: "adaptive"}` — do NOT use `budget_tokens` (deprecated on both Opus 4.6 and Sonnet 4.6). For older models, `budget_tokens` must be less than `max_tokens` (minimum 1024). This will throw an error if you get it wrong.
- **Opus 4.6 prefill removed:** Assistant message prefills (last-assistant-turn prefills) return a 400 error on Opus 4.6. Use structured outputs (`output_config.format`) or system prompt instructions to control response format instead.
- **`max_tokens` defaults:** Don't lowball `max_tokens` — hitting the cap truncates output mid-thought and requires a retry. For non-streaming requests, default to `~16000` (keeps responses under SDK HTTP timeouts). For streaming requests, default to `~64000` (timeouts aren't a concern, so give the model room). Only go lower when you have a hard reason: classification (`~256`), cost caps, or deliberately short outputs.
- **128K output tokens:** Opus 4.6 supports up to 128K `max_tokens`, but the SDKs require streaming for values that large to avoid HTTP timeouts. Use `.stream()` with `.get_final_message()` / `.finalMessage()`.
- **Tool call JSON parsing (Opus 4.6):** Opus 4.6 may produce different JSON string escaping in tool call `input` fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with `json.loads()` / `JSON.parse()` — never do raw string matching on the serialized input.
- **Structured outputs (all models):** Use `output_config: {format: {...}}` instead of the deprecated `output_format` parameter on `messages.create()`. This is a general API change, not 4.6-specific.
- **Don't reimplement SDK functionality:** The SDK provides high-level helpers — use them instead of building from scratch. Specifically: use `stream.finalMessage()` instead of wrapping `.on()` events in `new Promise()`; use typed exception classes (`Anthropic.RateLimitError`, etc.) instead of string-matching error messages; use SDK types (`Anthropic.MessageParam`, `Anthropic.Tool`, `Anthropic.Message`, etc.) instead of redefining equivalent interfaces.
- **Don't define custom types for SDK data structures:** The SDK exports types for all API objects. Use `Anthropic.MessageParam` for messages, `Anthropic.Tool` for tool definitions, `Anthropic.ToolUseBlock` / `Anthropic.ToolResultBlockParam` for tool results, `Anthropic.Message` for responses. Defining your own `interface ChatMessage { role: string; content: unknown }` duplicates what the SDK already provides and loses type safety.
- **Report and document output:** For tasks that produce reports, documents, or visualizations, the code execution sandbox has `python-docx`, `python-pptx`, `matplotlib`, `pillow`, and `pypdf` pre-installed. Claude can generate formatted files (DOCX, PDF, charts) and return them via the Files API — consider this for "report" or "document" type requests instead of plain stdout text.