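Finding Semantically Duplicate Functions

Audits a codebase for semantically duplicated implementations: functions and logic blocks that do the same thing or are highly similar.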

v1.0.0

Author / Source

github-obra

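Installation

CLI install (recommended)

claw install oss-superpowers-lab-finding-duplicate-functions

Requires the CLAW CLI.

Manual download

Download the ZIP and extract it into the skills directory to install. If downloading directly inside the desktop client's WebView fails, the site falls back to a notice page with the original link; follow the instructions on that page.

Download ZIP (oss-superpowers-lab-finding-duplicate-functions-v1.0.0.zip)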

Trigger command

/finding-duplicate-fu

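Cross-platform installation guide

This skill declares compatibility with the following platform. Extract the ZIP into the corresponding directory and it will be recognized.

Support matrix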

Claude Code Coding Agent

macOS / Linux: ~/.claude/skills/

Windows: %USERPROFILE%\.claude\skills\

unzip oss-superpowers-lab-finding-duplicate-functions-v1.0.0.zip -d ~/.claude/skills/

If the directory does not exist, create it first with mkdir -p. After enabling the skill, restart the agent so the configuration takes effect.

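Usage guide

Finding duplicate-intent functions

Audits a codebase for semantic duplicates: functions whose names or implementations differ but that do the same thing. Projects with LLM contributors tend to reinvent the wheel, and a traditional tool such as jscpd only catches textual similarity, so it misses "same intent, different implementation."

Method: classical extraction, followed by LLM clustering and duplicate detection by intent.

When to use

The codebase has grown over a long time with many contributors (human or LLM) and utility functions have proliferated
You suspect the same logic has been implemented more than once
Before a major refactor, to find duplicates worth merging
jscpd has already covered syntactic duplicates and you need the semantic layer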

Quick reference

| Phase | Tool | Model | Output |
|-------|------|-------|--------|
| 1 Extract | scripts/extract-functions.sh | - | catalog.json |
| 2 Categorize | scripts/categorize-prompt.md | haiku | categorized.json |
| 3 Split | scripts/prepare-category-analysis.sh | - | categories/*.json |
| 4 Detect | scripts/find-duplicates-prompt.md | opus (one per category) | duplicates/*.json |
| 5 Report | scripts/generate-report.sh | - | report.md |
| 6 Human review | - | human | merge or delete duplicates |

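Phases

1 Extract:

./scripts/extract-functions.sh src/ -o catalog.json

Common options: -o output file, -c lines of context, -t file glob, --include-tests (test files are excluded by default).

2 Categorize: run a haiku subagent with categorize-prompt.md, taking catalog.json as input and producing categorized.json.

3 Split: ./scripts/prepare-category-analysis.sh categorized.json ./categories writes one file per category. Categories with fewer than 3 functions can skip the in-depth analysis.

4 Detect: for each file under categories/, dispatch an opus subagent with find-duplicates-prompt.md and write the output to duplicates/{category}.json.

5 Report: ./scripts/generate-report.sh ./duplicates ./duplicates-report.md.

6 Human review: for high-confidence entries, confirm the surviving function has tests, update the callers, delete the duplicates, then run the tests.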

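High-risk duplicate zones

utils/, helpers/, validation, error formatting, path handling, string and date formatting, API response shaping, and similar areas.

Common mistakes

Extracting too many small internal functions; skipping categorization and running duplicate detection over the whole codebase (noisy); using haiku for the final duplicate detection (it tends to miss subtle semantics); deleting duplicates without tests.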

# Finding Duplicate-Intent Functions

## Overview

LLM-generated codebases accumulate semantic duplicates: functions that serve the same purpose but were implemented independently. Classical copy-paste detectors (jscpd) find syntactic duplicates but miss "same intent, different implementation."

This skill combines two techniques: classical extraction of a function catalog, followed by LLM-powered intent clustering.
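
To make "same intent, different implementation" concrete, here is a small illustrative pair. It is written in shell only to match the other snippets in this document (the skill itself targets TypeScript/JavaScript files by default), and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Two hypothetical helpers written independently. Both strip one trailing
# slash from a path, but they share almost no text, so a text-similarity
# detector has little to match on.

# Version A: parameter expansion.
strip_trailing_slash() {
  printf '%s\n' "${1%/}"
}

# Version B: same intent, implemented with sed.
remove_end_slash() {
  printf '%s\n' "$1" | sed 's#/$##'
}

strip_trailing_slash "src/utils/"   # prints: src/utils
remove_end_slash "src/utils/"       # prints: src/utils
```

A pair like this is what the intent-clustering step is meant to surface: the names and bodies differ, but a reader (or a model) asked "what is this function for?" would give the same answer for both.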

## When to Use

- Codebase has grown organically with multiple contributors (human or LLM)
- You suspect utility functions have been reimplemented multiple times
- Before major refactoring to identify consolidation opportunities
- After jscpd has been run and syntactic duplicates are already handled

## Quick Reference

| Phase | Tool | Model | Output |
|-------|------|-------|--------|
| 1. Extract | `scripts/extract-functions.sh` | - | `catalog.json` |
| 2. Categorize | `scripts/categorize-prompt.md` | haiku | `categorized.json` |
| 3. Split | `scripts/prepare-category-analysis.sh` | - | `categories/*.json` |
| 4. Detect | `scripts/find-duplicates-prompt.md` | opus | `duplicates/*.json` |
| 5. Report | `scripts/generate-report.sh` | - | `report.md` |

## Process

```dot
digraph duplicate_detection {
  rankdir=TB;
  node [shape=box];

  extract [label="1. Extract function catalog\n./scripts/extract-functions.sh"];
  categorize [label="2. Categorize by domain\n(haiku subagent)"];
  split [label="3. Split into categories\n./scripts/prepare-category-analysis.sh"];
  detect [label="4. Find duplicates per category\n(opus subagent per category)"];
  report [label="5. Generate report\n./scripts/generate-report.sh"];
  review [label="6. Human review & consolidate"];

  extract -> categorize -> split -> detect -> report -> review;
}
```

### Phase 1: Extract Function Catalog

```bash
./scripts/extract-functions.sh src/ -o catalog.json
```

Options:
- `-o FILE`: Output file (default: stdout)
- `-c N`: Lines of context to capture (default: 15)
- `-t GLOB`: File types (default: `*.ts,*.tsx,*.js,*.jsx`)
- `--include-tests`: Include test files (excluded by default)

Test files (`*.test.*`, `*.spec.*`, `__tests__/**`) are excluded by default since test utilities are less likely to be consolidation candidates.
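
The exclusion patterns above can be sketched as a small predicate. This is not taken from `extract-functions.sh`, whose internals are not shown here; `is_test_file` is a hypothetical helper, and treating `__tests__/**` as "any path with a `__tests__` directory segment" is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when a path matches the documented
# test-file patterns (*.test.*, *.spec.*, __tests__/**), 1 otherwise.
is_test_file() {
  local path="$1"
  local base="${path##*/}"   # file name without its directories

  # *.test.* and *.spec.* are checked against the file name only.
  case "$base" in
    *.test.*|*.spec.*) return 0 ;;
  esac

  # __tests__/** is read as: some directory segment is exactly __tests__.
  case "/$path" in
    */__tests__/*) return 0 ;;
  esac

  return 1
}
```

For example, `is_test_file src/utils/date.test.ts` succeeds, while `is_test_file src/utils/date.ts` fails, so only the latter would be catalogued under the default settings.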

### Phase 2: Categorize by Domain

Dispatch a **haiku** subagent using the prompt in `scripts/categorize-prompt.md`.

Insert the contents of `catalog.json` where indicated in the prompt template. Save output as `categorized.json`.

### Phase 3: Split into Categories

```bash
./scripts/prepare-category-analysis.sh categorized.json ./categories
```

Creates one JSON file per category. Only categories with 3+ functions are worth analyzing.

### Phase 4: Find Duplicates (Per Category)

For each category file in `./categories/`, dispatch an **opus** subagent using the prompt in `scripts/find-duplicates-prompt.md`.

Save each output as `./duplicates/{category}.json`.
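
Because Phase 4 runs once per category, it helps to track which categories still need a subagent run. The sketch below only maps category files to their expected output paths and lists the ones without a result; it does not dispatch any subagent. Both function names are hypothetical, and the directory defaults mirror the paths used above.

```bash
#!/usr/bin/env bash
# Map a category file to the path where its Phase 4 output should be saved,
# e.g. ./categories/validation.json -> ./duplicates/validation.json
duplicates_path_for() {
  local category_file="$1"
  local out_dir="${2:-./duplicates}"
  local category
  category="$(basename "$category_file" .json)"
  printf '%s/%s.json\n' "$out_dir" "$category"
}

# Print every category file that does not yet have a duplicates result.
pending_categories() {
  local categories_dir="${1:-./categories}"
  local out_dir="${2:-./duplicates}"
  local file
  for file in "$categories_dir"/*.json; do
    [ -e "$file" ] || continue   # the glob matched nothing
    [ -e "$(duplicates_path_for "$file" "$out_dir")" ] || printf '%s\n' "$file"
  done
}
```

Running `pending_categories` after a partial run prints the category files that still need to be handed to a subagent, which makes it easier to resume without re-analyzing finished categories.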

### Phase 5: Generate Report

```bash
./scripts/generate-report.sh ./duplicates ./duplicates-report.md
```

Produces a prioritized markdown report grouped by confidence level.

### Phase 6: Human Review

Review the report. For HIGH confidence duplicates:
1. Verify the recommended survivor has tests
2. Update callers to use the survivor
3. Delete the duplicates
4. Run tests
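
For step 2, a rough way to locate callers is a word-level text search for the duplicate's name. The sketch below assumes `grep` with `--include` support (GNU and BSD grep both provide it); `find_callers` is a hypothetical helper, and the file extensions follow the default file types from Phase 1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list source files that still mention a duplicate by
# name, so each caller can be switched to the survivor.
# This is a plain word-level text search, not a real reference finder: it
# also reports the file that defines the function, and it can match comments
# or unrelated identifiers that happen to share the name.
find_callers() {
  local name="$1"
  local root="${2:-src}"
  grep -rlw \
    --include='*.ts' --include='*.tsx' \
    --include='*.js' --include='*.jsx' \
    -e "$name" "$root" | sort
}
```

Once the list is empty apart from the defining file, the duplicate can be deleted and the test suite run as in steps 3 and 4. For anything beyond a quick check, an editor's "find references" or the TypeScript compiler will be more reliable than a text search.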

## High-Risk Duplicate Zones

Focus extraction on these areas first, since they accumulate duplicates fastest:

| Zone | Common Duplicates |
|------|-------------------|
| `utils/`, `helpers/`, `lib/` | General utilities reimplemented |
| Validation code | Same checks written multiple ways |
| Error formatting | Error-to-string conversions |
| Path manipulation | Joining, resolving, normalizing paths |
| String formatting | Case conversion, truncation, escaping |
| Date formatting | Same formats implemented repeatedly |
| API response shaping | Similar transformations for different endpoints |

## Common Mistakes

**Extracting too much**: Focus on exported functions and public methods. Internal helpers are less likely to be duplicated across files.

**Skipping the categorization step**: Going straight to duplicate detection on the full catalog produces noise. Categories focus the comparison.

**Using haiku for duplicate detection**: Haiku is cost-effective for categorization but misses subtle semantic duplicates. Use Opus for the actual duplicate analysis.

**Consolidating without tests**: Before deleting duplicates, ensure the survivor has tests covering all use cases of the deleted functions.