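Installation
Manual download and install
Download the ZIP and extract it into your skills directory to install. If downloading directly inside the desktop client's WebView fails, the site shows a notice page with the original link instead; follow the instructions on that page.
Download ZIP (oss-superpowers-lab-finding-duplicate-functions-v1.0.0.zip)
Trigger command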
/finding-duplicate-fu
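Cross-platform installation guide
This skill declares compatibility with the following 1 platform. Extract the ZIP into the corresponding directory and it will be recognized.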
unzip oss-superpowers-lab-finding-duplicate-functions-v1.0.0.zip -d ~/.claude/skills/
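If the target directory does not exist, create it with mkdir -p. After enabling the Skill, restart the corresponding agent so the configuration takes effect.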
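Usage guide
Finding "duplicate-intent" functions
Audit a codebase for semantic duplicates: functions whose names or implementations differ but that do the same thing. Projects with LLM contributors often reinvent the wheel; a classical tool such as jscpd only catches textual similarity and misses "same intent, different implementation."
Method: classical extraction plus LLM clustering and duplicate detection by intent.
When to use
- Many contributors (human or LLM) have piled up code over time and utility functions have proliferated
- You suspect the same logic has been implemented several times
- You want to merge duplicates before a major refactor
- jscpd has already handled syntactic duplicates and you want to cover the semantic layer
Quick reference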
| Phase | Tool | Model | Output |
|------|------|------|------|
| 1 Extract | scripts/extract-functions.sh | - | catalog.json |
| 2 Categorize | scripts/categorize-prompt.md | haiku | categorized.json |
| 3 Split | scripts/prepare-category-analysis.sh | - | categories/*.json |
| 4 Detect | scripts/find-duplicates-prompt.md | opus (per category) | duplicates/*.json |
| 5 Report | scripts/generate-report.sh | - | report.md |
| 6 Human review | - | human | merge/delete duplicates |
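Phase notes
1 Extract:
./scripts/extract-functions.sh src/ -o catalog.json
Common options: -o output file, -c lines of context, -t file glob, --include-tests (tests are excluded by default).
2 Categorize: use a haiku subagent with categorize-prompt.md; input catalog.json, output categorized.json.
3 Split: ./scripts/prepare-category-analysis.sh categorized.json ./categories writes one file per category; categories with fewer than 3 functions can skip deep analysis.
4 Detect: for each file under categories/, dispatch an opus subagent with find-duplicates-prompt.md and write the output to duplicates/{category}.json.
5 Report: ./scripts/generate-report.sh ./duplicates ./duplicates-report.md.
6 Human review: for high-confidence entries, confirm the surviving function has tests, update callers, delete the duplicates, then run the tests.
High-risk duplicate zones
utils/, helpers/, validation, error formatting, path handling, string/date formatting, API response shaping, and similar.
Common mistakes
Extracting too many small internal functions; skipping categorization and running duplicate detection on the whole codebase (noisy); using haiku for the final duplicate detection (it tends to miss subtle semantics); deleting duplicates without tests.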
# Finding Duplicate-Intent Functions
## Overview
LLM-generated codebases accumulate semantic duplicates: functions that serve the same purpose but were implemented independently. Classical copy-paste detectors (jscpd) find syntactic duplicates but miss "same intent, different implementation."
This skill combines two techniques: classical extraction of a function catalog, followed by LLM-powered intent clustering. The process described under "Process" breaks this into six phases.
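As a small illustration of "same intent, different implementation", the two helpers below both upper-case a string. They are written in shell only to match the other snippets in this document (the skill itself targets TS/JS sources), and the names `to_upper` and `shout` are invented for the example. The two bodies share almost no tokens, so a textual copy-paste detector would probably not pair them.

```bash
# Two hypothetical helpers with the same intent (upper-case a string)
# but independent implementations.

# Implementation 1: translate characters with tr.
to_upper() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Implementation 2: use awk's built-in toupper().
shout() {
  printf '%s\n' "$1" | awk '{ print toupper($0) }'
}
```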
## When to Use
- Codebase has grown organically with multiple contributors (human or LLM)
- You suspect utility functions have been reimplemented multiple times
- Before major refactoring to identify consolidation opportunities
- After jscpd has been run and syntactic duplicates are already handled
## Quick Reference
| Phase | Tool | Model | Output |
|-------|------|-------|--------|
| 1. Extract | `scripts/extract-functions.sh` | - | `catalog.json` |
| 2. Categorize | `scripts/categorize-prompt.md` | haiku | `categorized.json` |
| 3. Split | `scripts/prepare-category-analysis.sh` | - | `categories/*.json` |
| 4. Detect | `scripts/find-duplicates-prompt.md` | opus | `duplicates/*.json` |
| 5. Report | `scripts/generate-report.sh` | - | `report.md` |
## Process
```dot
digraph duplicate_detection {
rankdir=TB;
node [shape=box];
extract [label="1. Extract function catalog\n./scripts/extract-functions.sh"];
categorize [label="2. Categorize by domain\n(haiku subagent)"];
split [label="3. Split into categories\n./scripts/prepare-category-analysis.sh"];
detect [label="4. Find duplicates per category\n(opus subagent per category)"];
report [label="5. Generate report\n./scripts/generate-report.sh"];
review [label="6. Human review & consolidate"];
extract -> categorize -> split -> detect -> report -> review;
}
```
### Phase 1: Extract Function Catalog
```bash
./scripts/extract-functions.sh src/ -o catalog.json
```
Options:
- `-o FILE`: Output file (default: stdout)
- `-c N`: Lines of context to capture (default: 15)
- `-t GLOB`: File types (default: `*.ts,*.tsx,*.js,*.jsx`)
- `--include-tests`: Include test files (excluded by default)
Test files (`*.test.*`, `*.spec.*`, `__tests__/**`) are excluded by default since test utilities are less likely to be consolidation candidates.
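The options can be combined. The wrapper below is a minimal sketch that narrows extraction to TypeScript sources and captures more context per function. It uses only the flags listed above; the function name `extract_ts_catalog` and the `EXTRACT_SCRIPT` override variable are assumptions introduced for this example.

```bash
# Wrapper around Phase 1. EXTRACT_SCRIPT is a hypothetical override so the
# wrapper can be pointed at a different script location.
extract_ts_catalog() {
  local src_dir="$1"
  local out_file="${2:-catalog.json}"
  local script="${EXTRACT_SCRIPT:-./scripts/extract-functions.sh}"
  # -c 25: capture 25 lines of context instead of the default 15
  # -t:    restrict the file glob to TypeScript sources
  "$script" "$src_dir" -o "$out_file" -c 25 -t '*.ts,*.tsx'
}

# Usage:
#   extract_ts_catalog src/ catalog.json
```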
### Phase 2: Categorize by Domain
Dispatch a **haiku** subagent using the prompt in `scripts/categorize-prompt.md`.
Insert the contents of `catalog.json` where indicated in the prompt template. Save output as `categorized.json`.
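Filling the template can be scripted. The sketch below assumes the template marks the insertion point with a line containing only a placeholder; the placeholder text `{{CATALOG}}` and the function name `render_prompt` are hypothetical, so check `scripts/categorize-prompt.md` for the marker it actually uses.

```bash
# Fill a prompt template with the catalog. Any line that consists solely of
# the marker is replaced by the catalog file's contents; other lines pass
# through unchanged. The marker text is an assumption.
render_prompt() {
  local template="$1" catalog="$2" marker="$3" line
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$line" = "$marker" ]; then
      cat "$catalog"
    else
      printf '%s\n' "$line"
    fi
  done < "$template"
}

# Usage (hypothetical marker):
#   render_prompt scripts/categorize-prompt.md catalog.json '{{CATALOG}}' > prompt.txt
```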
### Phase 3: Split into Categories
```bash
./scripts/prepare-category-analysis.sh categorized.json ./categories
```
Creates one JSON file per category. Only categories with 3+ functions are worth analyzing.
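The 3+ threshold can be applied mechanically. This sketch assumes `jq` is installed and that each category file is a top-level JSON array with one element per function; neither is confirmed by this document, so adjust the filter to the real file shape. The function name `categories_worth_analyzing` is made up for the example.

```bash
# List category files that hold at least MIN functions (default 3).
# Assumption: each file is a JSON array, so jq's `length` is the function count.
categories_worth_analyzing() {
  local categories_dir="${1:-./categories}" min="${2:-3}" file count
  for file in "$categories_dir"/*.json; do
    [ -e "$file" ] || continue                 # empty directory: skip the literal glob
    count="$(jq 'length' "$file")" || continue # unreadable or invalid JSON: skip
    if [ "$count" -ge "$min" ]; then
      printf '%s\n' "$file"
    fi
  done
}
```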
### Phase 4: Find Duplicates (Per Category)
For each category file in `./categories/`, dispatch an **opus** subagent using the prompt in `scripts/find-duplicates-prompt.md`.
Save each output as `./duplicates/{category}.json`.
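Dispatching the subagents happens in the agent, not in shell, but the input/output pairing can be listed up front. The helper below is a sketch; its name `list_detection_jobs` is invented here, and the directory defaults follow the paths used in the text above.

```bash
# Print one "input<TAB>output" pair per category file, so each pair can be
# handed to a separate opus subagent.
list_detection_jobs() {
  local categories_dir="${1:-./categories}"
  local duplicates_dir="${2:-./duplicates}"
  local file category
  for file in "$categories_dir"/*.json; do
    [ -e "$file" ] || continue            # no matches: the glob stays literal
    category="$(basename "$file" .json)"
    printf '%s\t%s\n' "$file" "$duplicates_dir/$category.json"
  done
}
```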
### Phase 5: Generate Report
```bash
./scripts/generate-report.sh ./duplicates ./duplicates-report.md
```
Produces a prioritized markdown report grouped by confidence level.
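Before running the report script, it may help to check that Phase 4 produced a result for each category. The check below is a sketch with a hypothetical name, `missing_duplicate_outputs`. A missing file is not necessarily an error, since small categories may have been skipped on purpose.

```bash
# Print the name of every category that has no non-empty result file yet.
# Prints nothing when every category has output.
missing_duplicate_outputs() {
  local categories_dir="${1:-./categories}" duplicates_dir="${2:-./duplicates}"
  local file category
  for file in "$categories_dir"/*.json; do
    [ -e "$file" ] || continue
    category="$(basename "$file" .json)"
    [ -s "$duplicates_dir/$category.json" ] || printf '%s\n' "$category"
  done
}
```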
### Phase 6: Human Review
Review the report. For HIGH confidence duplicates:
1. Verify the recommended survivor has tests
2. Update callers to use the survivor
3. Delete the duplicates
4. Run tests
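For step 2, a plain text search is often enough to find the call sites that need updating. The helper below is a rough sketch (the name `find_callers` is invented); it is not syntax-aware, so it also reports the definition itself and any mention in comments or strings.

```bash
# List lines that mention a function name as a whole word, restricted to the
# file types the extractor scans by default.
find_callers() {
  local name="$1" src_dir="${2:-src}"
  grep -rnwF \
    --include='*.ts' --include='*.tsx' \
    --include='*.js' --include='*.jsx' \
    -e "$name" "$src_dir"
}

# Usage:
#   find_callers formatDate src
```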
## High-Risk Duplicate Zones
Focus extraction on these areas first - they accumulate duplicates fastest:
| Zone | Common Duplicates |
|------|-------------------|
| `utils/`, `helpers/`, `lib/` | General utilities reimplemented |
| Validation code | Same checks written multiple ways |
| Error formatting | Error-to-string conversions |
| Path manipulation | Joining, resolving, normalizing paths |
| String formatting | Case conversion, truncation, escaping |
| Date formatting | Same formats implemented repeatedly |
| API response shaping | Similar transformations for different endpoints |
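The directory-named zones in the first row can be located with `find`. This sketch covers only that row; the other zones are defined by what the code does, not where it lives. The function name `list_high_risk_dirs` is an assumption, and skipping `node_modules` is a choice made for this example.

```bash
# Print directories named utils, helpers, or lib under a source root,
# skipping anything inside node_modules. Each result can be passed to
# extract-functions.sh as the source directory.
list_high_risk_dirs() {
  local root="${1:-src}"
  find "$root" -type d -name node_modules -prune -o \
    -type d \( -name utils -o -name helpers -o -name lib \) -print | sort
}
```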
## Common Mistakes
**Extracting too much**: Focus on exported functions and public methods. Internal helpers are less likely to be duplicated across files.
**Skipping the categorization step**: Going straight to duplicate detection on the full catalog produces noise. Categories focus the comparison.
**Using haiku for duplicate detection**: Haiku is cost-effective for categorization but misses subtle semantic duplicates. Use Opus for the actual duplicate analysis.
**Consolidating without tests**: Before deleting duplicates, ensure the survivor has tests covering all use cases of the deleted functions.
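To see why coverage of every use case matters, consider two duplicates that agree on common inputs but differ on an edge case. The sketch below uses hypothetical shell helpers to match this document's other snippets; for TS/JS code the same comparison would normally live in the project's test runner.

```bash
# Two hypothetical duplicates that both return the last path component.
leaf_via_basename() { basename "$1"; }
leaf_via_expansion() { printf '%s\n' "${1##*/}"; }

# Print every input on which the two implementations disagree.
diff_behaviour() {
  local fn_a="$1" fn_b="$2" input
  shift 2
  for input in "$@"; do
    if [ "$("$fn_a" "$input")" != "$("$fn_b" "$input")" ]; then
      printf '%s\n' "$input"
    fi
  done
}

# Usage: only the trailing-slash input is reported as a disagreement.
#   diff_behaviour leaf_via_basename leaf_via_expansion src/utils/date.ts src/utils/
```

Callers that relied on the trailing-slash behaviour of the deleted function would break silently unless the survivor's tests include that case.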