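Installation
Manual download and install
Download the ZIP and extract it into your skills directory to install. If a direct download inside the desktop client's WebView fails, this site shows a notice page with the original link instead; follow the instructions on that page.
ZIP package: shub-kubernetes-v1.0.0.zip
Trigger command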
/cluster-agent-swarm
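Cross-platform installation
This skill declares compatibility with one platform. Extract the ZIP into that platform's skills directory and the skill will be detected.
unzip shub-kubernetes-v1.0.0.zip -d ~/.claude/skills/
If the directory does not exist, create it first with mkdir -p. After enabling the skill, restart the corresponding agent so the configuration takes effect.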
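The manual steps above (create the directory, unzip, restart) can be wrapped in a small function. This is a minimal sketch, assuming a POSIX shell with `unzip` installed; the function name `install_skill_zip` is hypothetical, and the default target `~/.claude/skills` is taken from the unzip command above.

```bash
# Hypothetical helper: install a skill ZIP into a skills directory.
# Default target (~/.claude/skills) comes from the unzip command above.
install_skill_zip() {
  zip_file=$1
  dest=${2:-"$HOME/.claude/skills"}

  # Fail early with a clear message if the ZIP is missing.
  if [ ! -f "$zip_file" ]; then
    echo "ZIP not found: $zip_file" >&2
    return 2
  fi

  # Create the skills directory if it does not exist yet (the mkdir -p step).
  mkdir -p "$dest" || return 1

  # -o: overwrite existing files without prompting; -q: quiet output.
  unzip -oq "$zip_file" -d "$dest" || return 1

  # Sanity check: there should be at least one SKILL.md under the target.
  if ! find "$dest" -name SKILL.md | grep -q .; then
    echo "no SKILL.md found under $dest" >&2
    return 1
  fi
  echo "installed into $dest; restart the agent to load the skill"
}
```

Usage: `install_skill_zip shub-kubernetes-v1.0.0.zip`, then restart the agent. The SKILL.md check only confirms that something skill-shaped is present; it does not validate the package contents.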
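Usage guide
Kubernetes workflows
This skill covers Kubernetes workflows: Deployments, Services, ConfigMaps, and common kubectl operations; for differences between cluster versions, see the official release notes. You no longer need to paste scattered English instructions into the context before every task, and there is less trial and error caused by drifting from the client's default behavior. The SKILL.md inside the ZIP remains authoritative for specific commands, hooks, and JSON parameters. This page follows the same structure as the site's MCP CLI guides: when to use it, prerequisites, workflow, quick reference, and troubleshooting.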
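To make "Deployments, Services, ConfigMaps, and common kubectl operations" concrete, here is a minimal sketch of one rollout. The helper `plan_app_rollout`, the `APP_PORT` key, and the `-config` suffix are hypothetical. The helper only prints the commands, so you can review them before running anything; the kubectl subcommands themselves are standard.

```bash
# Hypothetical helper: print (not run) the kubectl commands for a minimal
# ConfigMap + Deployment + Service rollout so they can be reviewed first.
plan_app_rollout() {
  name=$1 image=$2 port=$3
  echo "kubectl create configmap ${name}-config --from-literal=APP_PORT=${port}"
  echo "kubectl create deployment ${name} --image=${image} --port=${port}"
  # Expose every ConfigMap key as an environment variable in the Deployment.
  echo "kubectl set env deployment/${name} --from=configmap/${name}-config"
  echo "kubectl expose deployment ${name} --port=80 --target-port=${port}"
  # Block until the new ReplicaSet is ready (or the timeout expires).
  echo "kubectl rollout status deployment/${name} --timeout=120s"
}
```

Usage: `plan_app_rollout web nginx:1.25 8080` prints the plan; after review, `plan_app_rollout web nginx:1.25 8080 | sh` runs it. The `create` commands fail if the objects already exist, so for repeatable changes, declarative manifests with `kubectl apply` (or the GitOps agent) are the better fit.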
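When to use
- Deployment, Service, ConfigMap, and common kubectl operations.
- Behavior that differs between cluster versions (see the official release notes).
- You already have this skill's ZIP and plan to mount it in Claude Code / OpenClaw as described in SKILL.md.
- You want to use this guide to decide quickly whether to enable the skill, before reading the English SKILL.md for parameters and limits.
- Your team needs to agree on one trigger method, directory convention, or callback format.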
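Prerequisites
- General: a working Claude Code installation (or whichever client the docs require), and a read/write project workspace (or the sandbox directory specified in SKILL.md).
- Authoritative details: API keys / OAuth, hook paths, and environment variables are defined by the SKILL.md inside the ZIP.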
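Typical workflow
- Get the skill ZIP from ClawHub or the site's distribution, and verify the version and checksum (if provided).
- Read the installation section of SKILL.md: where the directory goes and which client type applies (Claude Code / OpenClaw / script).
- Make your first call with the minimal example from the docs (a single-file change, a single query, or a single delegation).
- Once you have confirmed the working directory, permission boundaries, and output paths, move on to multi-file or long-running tasks.
- If you need callbacks / webhooks / notifications, configure the endpoints as described in SKILL.md and validate them in a test environment first.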
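The "verify the checksum" step in the list above can be scripted when a SHA-256 value is published. This is a minimal sketch, assuming GNU coreutils `sha256sum` (on macOS, `shasum -a 256` prints the same format); `verify_sha256` is a hypothetical name.

```bash
# Compare a downloaded file with a published SHA-256 checksum.
verify_sha256() {
  file=$1 expected=$2
  # sha256sum prints "<hash>  <file>"; keep only the hash field.
  actual=$(sha256sum "$file" 2>/dev/null | cut -d' ' -f1)
  if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got ${actual:-nothing})" >&2
    return 1
  fi
}
```

Usage: `verify_sha256 shub-kubernetes-v1.0.0.zip <published-sha256>`. A missing file is reported as a mismatch rather than silently passing.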
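Relationship to the ZIP / SKILL.md
Like the site's MCP CLI open-source guides, this page summarizes when to use the skill, how to wire it up, and how to troubleshoot it. Command templates, hook names, JSON fields, and version matrices are always defined by the SKILL.md inside the ZIP and by the upstream ClawHub listing.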
Command examples (from the bundled SKILL.md)
The terminal and script snippets extracted from the upstream SKILL.md are reproduced in full in the Session Setup and Installation sections below. Paths, environment variables, and parameters follow the current ZIP and the official documentation.
ClawHub slug: kubernetes (the install command is defined by SKILL.md / the claw CLI).
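Trigger command registered on this site (see the ZIP for full semantics):
# Reference or run this command in the conversation when using the skill; full parameters and examples are in the SKILL.md inside the downloaded package.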
/cluster-agent-swarm
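Best practices
- Read SKILL.md before guessing parameters; this guide does not replace the schema or the list of required fields.
- When delegating a task, state the acceptance criteria (commands, file paths, test commands) to cut down on back-and-forth.
- For long tasks, use the callbacks or on-disk logs the docs recommend instead of frequent polling; this saves tokens and machine load.
- When several skills are enabled at once, watch the hook loading order and duplicate tool calls (follow the conflict notes in SKILL.md).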
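Debugging and troubleshooting
- Turn on stderr and client logs; in PTY/tmux sessions, also check the last few dozen lines of the pane output.
- For parameter errors, compare against the JSON/CLI examples in SKILL.md (quoting, escaping, working directory).
- For network failures, check proxies, firewalls, and the MCP transport (stdio / HTTP / SSE).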
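Capturing stderr to a file makes the first tip in the list above repeatable, because failures leave a trail you can read afterwards. This is a minimal sketch in POSIX shell; `run_logged` and the 40-line tail are illustrative choices, not part of the skill.

```bash
# Run a command, append its stderr to a log file, and on failure show the
# most recent log lines (like checking the last lines of a tmux pane).
run_logged() {
  log=$1; shift
  # stdout is left untouched; only stderr goes to the log.
  "$@" 2>>"$log"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "command failed (exit $rc): $*" >&2
    tail -n 40 "$log" >&2
  fi
  return "$rc"
}
```

Usage: `run_logged /tmp/skill-debug.log kubectl get pods -A`. The function returns the command's own exit status, so it can still be chained with `&&` / `||`.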
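Quick reference
| Action | Notes |
|------|------|
| Get the skill package | ClawHub / site ZIP; check the version |
| Authoritative steps | Read the SKILL.md inside the ZIP first |
| First trial run | Use the minimal example from SKILL.md |
| Acceptance | Check against paths, test commands, or callback payloads |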
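Common failures
- No output, or exits immediately → wrong working directory, missing dependencies, or Claude Code is not logged in; work through the self-check list in SKILL.md.
- Permission denied → check the sandbox path, `--permission-mode`, and the tool allowlist.
- Behavior does not match the summary → the English SKILL.md and the upstream repository are authoritative; this page is only a structured overview.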
# Cluster Agent Swarm — Complete Platform Operations
## Runtime Requirements
This skill package provides Kubernetes/OpenShift cluster management capabilities. Credentials are **modular** - only configure what you need for your specific use case.
### Always Required
| Requirement | Description | Environment Variable |
|-------------|-------------|---------------------|
| **Kubeconfig** | Valid kubeconfig with cluster access | `KUBECONFIG` or `~/.kube/config` |
| **kubectl** | Kubernetes CLI | Must be in PATH |
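Both "Always Required" items can be checked before any agent runs. This is a minimal sketch; the function names are hypothetical. Note that `kubectl config current-context` only reads the local kubeconfig. To confirm real access, follow up with something like `kubectl auth can-i list pods`, which does contact the API server.

```bash
# Print the first readable file among those kubectl would load.
resolve_kubeconfig() {
  # $KUBECONFIG may list several files separated by ':'; the default is ~/.kube/config.
  cfg=${KUBECONFIG:-"$HOME/.kube/config"}
  old_ifs=$IFS
  IFS=:
  for f in $cfg; do
    if [ -r "$f" ]; then
      IFS=$old_ifs
      echo "$f"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

preflight() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found in PATH" >&2
    return 1
  fi
  if ! resolve_kubeconfig >/dev/null; then
    echo "no readable kubeconfig (KUBECONFIG or ~/.kube/config)" >&2
    return 1
  fi
  # Reads the local kubeconfig only; it does not contact the API server.
  kubectl config current-context >/dev/null || return 1
}
```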
### Conditional - Enable Only As Needed
| Platform | Enable If... | Credentials |
|----------|--------------|-------------|
| **AWS/EKS/ROSA** | Managing AWS-hosted Kubernetes | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| **Azure/ARO** | Managing Azure-hosted Kubernetes | `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID` |
| **GCP/GKE** | Managing GCP-hosted Kubernetes | `GOOGLE_APPLICATION_CREDENTIALS` |
| **ArgoCD** | Using GitOps agent | `ARGOCD_AUTH_TOKEN`, `ARGOCD_SERVER` |
| **Vault** | Using secrets management | `VAULT_TOKEN` |
| **GitHub** | Pushing to git repositories | `GITHUB_TOKEN` |
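Because credentials are modular, it helps to check only the variables for the platforms you actually enable. This sketch encodes the table above. The lowercase platform keys (`aws`, `azure`, ...) and both function names are hypothetical labels of our own. The check reports which variables are missing without ever printing their values.

```bash
# Map a platform from the table to its credential variables.
creds_for() {
  case $1 in
    aws)    echo "AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY" ;;
    azure)  echo "AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_TENANT_ID" ;;
    gcp)    echo "GOOGLE_APPLICATION_CREDENTIALS" ;;
    argocd) echo "ARGOCD_AUTH_TOKEN ARGOCD_SERVER" ;;
    vault)  echo "VAULT_TOKEN" ;;
    github) echo "GITHUB_TOKEN" ;;
    *) echo "unknown platform: $1" >&2; return 2 ;;
  esac
}

# Report unset credential variables for one platform (values are never echoed).
check_creds() {
  vars=$(creds_for "$1") || return 2
  missing=
  for v in $vars; do
    # Indirect lookup of the variable named in $v.
    eval "val=\${$v:-}"
    [ -n "$val" ] || missing="$missing $v"
  done
  if [ -n "$missing" ]; then
    echo "missing for $1:$missing" >&2
    return 1
  fi
}
```

Usage: `check_creds argocd && check_creds vault` before starting the GitOps agent.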
### Session Setup
Before using the agents, you **MUST** set up a session context:
```bash
# Set up session context for your environment
bash skills/orchestrator/scripts/setup-session.sh <environment> [context-name]
# Environments: dev, qa, staging, prod
# Note: prod requires human approval for all modifications
```
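A typo in the environment argument is easy to make and expensive to miss, so you can validate it before calling `setup-session.sh`. This is a minimal sketch: `validate_env` is a hypothetical wrapper, and the accepted names are the four listed in the comment above.

```bash
# Accept only the documented environments; flag prod's approval requirement.
validate_env() {
  case $1 in
    dev|qa|staging) return 0 ;;
    prod)
      echo "prod: all modifications require human approval" >&2
      return 0 ;;
    *)
      echo "unknown environment '$1' (expected dev, qa, staging or prod)" >&2
      return 1 ;;
  esac
}
```

Usage: `validate_env "$ENV" && bash skills/orchestrator/scripts/setup-session.sh "$ENV" my-context`, where `my-context` stands for your own kubeconfig context name.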
### Security Considerations
- Agents operate with **least privilege** by default
- All credential access is logged
- Production modifications require human approval
- Secrets are never logged or stored in code
---
## Security Assessment - Read Before Installing
### Source Verification
- This skill pulls code from a **third-party GitHub repository**
- **Verify the source URL** before installing: `https://github.com/kcns008/cluster-agent-swarm-skills`
- **Pin to a specific version** - never use `main` branch in production:
```bash
git clone https://github.com/kcns008/cluster-agent-swarm-skills.git
cd cluster-agent-swarm-skills
git fetch --tags
git checkout v1.0.0 # Use verified release tag or commit hash
```
### Third-Party Script Execution Warning
- This is a **scripted skill** - it will write executable bash scripts to disk
- Scripts perform cluster operations including: deployments, scaling, scanning, configuration
- **Some scripts can be destructive** - review before running:
  - Scripts with `-delete` or `-cleanup` in the name may remove resources
  - Scripts with `-promote` or `-deploy` in the name modify cluster state
- Always test in non-production first
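The name patterns above can serve as a first filter when auditing a checkout. This is a minimal sketch; `find_risky_scripts` is hypothetical. A name check is only a starting point, and every script still needs to be read.

```bash
# List shell scripts whose names suggest destructive or state-changing behavior.
find_risky_scripts() {
  root=${1:-skills}
  find "$root" -type f -name '*.sh' \
    \( -name '*-delete*' -o -name '*-cleanup*' -o -name '*-promote*' -o -name '*-deploy*' \) \
    | sort
}
```

Usage: run `find_risky_scripts skills` from the repository root after cloning, and review each listed file first.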
### Install Mechanism
- Installing via `npx skills add` downloads and executes code from GitHub
- The skill cannot verify the integrity of external scripts
- **Audit all scripts locally** before running in production
- Consider maintaining a verified, offline copy of trusted scripts
- **ALWAYS PIN TO VERIFIED COMMIT HASH** for production - NEVER use floating URLs like `tree/main` or untagged branches
- Use manual git clone with verified checkout for highest security
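The "never use floating URLs" rule can be enforced before `npx skills add` runs. This sketch (`is_pinned_url` is a hypothetical name) accepts only URLs whose segment after `/tree/` is a full 40-character lowercase hex commit hash. It rejects `main`, tags, and short hashes. It cannot tell whether the hash is the commit *you* reviewed, so you still need to compare it against your audited hash.

```bash
# Succeed only for .../tree/<40-hex commit>[/optional/path] URLs.
is_pinned_url() {
  url=$1
  ref=${url#*/tree/}
  # No "/tree/" segment means the repository's default branch: a floating URL.
  [ "$ref" != "$url" ] || return 1
  # Keep only the segment right after /tree/ (drop any /skills/... suffix).
  ref=${ref%%/*}
  # Reject anything that is not lowercase hex (e.g. "main", "v1.0.0")...
  case $ref in
    *[!0-9a-f]*) return 1 ;;
  esac
  # ...and anything shorter or longer than a full SHA-1 commit hash.
  [ ${#ref} -eq 40 ]
}
```

Usage: `is_pinned_url "$url" && npx skills add "$url"`.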
### Persistence & Blast Radius
- Agents maintain **persistent state** across sessions via:
  - `WORKING.md` - session progress tracking
  - `LOGS.md` - action audit trail
  - `MEMORY.md` - long-term learnings
- Agents are configured to **commit changes** to these files as part of normal operation
- This persistence increases blast radius if misused - limit repository write access if concerned
### Human Approval Enforcement
- The skill documentation **claims** that human approval is required for production changes
- **This is a procedural control, NOT a technical enforcement**
- Your platform **MUST enforce** an approval gate before allowing production operations
- Do not rely on agent self-restriction for production safety
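To make the policy concrete, the sketch below shows the decision an approval gate has to make: in prod, any kubectl verb that is not read-only is blocked unless a human approval is on record. It is illustrative only. Because it runs in a shell the agent controls, the agent could bypass it, so the real gate belongs in your platform (for example an approval step in the CI/GitOps pipeline or admission control). `APPROVED_CHANGE_ID`, the function names, and the verb list are assumptions.

```bash
# Illustrative policy check only; enforce the real gate outside the agent.
# APPROVED_CHANGE_ID is a hypothetical variable set by the platform after human sign-off.
is_read_only_verb() {
  case $1 in
    get|describe|logs|top|explain|api-resources|version|diff) return 0 ;;
    *) return 1 ;;
  esac
}

gate_kubectl() {
  target_env=$1 verb=$2
  if [ "$target_env" = prod ] && ! is_read_only_verb "$verb" && [ -z "${APPROVED_CHANGE_ID:-}" ]; then
    echo "blocked: 'kubectl $verb' in prod needs human approval (APPROVED_CHANGE_ID unset)" >&2
    return 1
  fi
}
```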
### Principle of Least Privilege - Required
- **DO NOT** provide owner/root-level cloud credentials
- Create dedicated, minimal-permission service accounts for:
  - Kubernetes namespace-level access (not cluster-admin)
  - AWS IAM roles with limited EKS permissions
  - Azure service principals with limited subscription access
  - GCP service accounts with limited project permissions
- **Never provide production credentials** until you have audited the code in non-production
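As an example of "namespace-level access (not cluster-admin)", the sketch below prints a read-only `Role` and `RoleBinding` for one agent's service account in one namespace. The function name, the `-read` naming scheme, and the rule set are illustrative assumptions; narrow or widen verbs and resources to what each agent actually needs.

```bash
# Print a namespace-scoped, read-only Role + RoleBinding for a service account.
namespace_role_manifest() {
  ns=$1 sa=$2
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ${sa}-read
  namespace: ${ns}
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "services", "configmaps", "deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${sa}-read
  namespace: ${ns}
subjects:
  - kind: ServiceAccount
    name: ${sa}
    namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ${sa}-read
EOF
}
```

Usage: `namespace_role_manifest payments agent-pulse | kubectl apply -f -`, then confirm the limits with `kubectl auth can-i delete pods --as=system:serviceaccount:payments:agent-pulse -n payments`, which should answer `no`.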
### Sandbox Before Production
1. Run this skill in an **isolated/non-production environment first**
2. Manually step through scripts to understand their behavior
3. Pay special attention to:
   - `*-cleanup.sh` scripts - may delete resources
   - `*-promote.sh` scripts - may promote artifacts
   - `*-delete.sh` scripts - explicitly destructive
4. Verify no unexpected network calls to external endpoints
### Supply Chain Tools
- Scripts may download binaries (syft, cosign, trivy, etc.)
- **Only allow downloads from trusted release sources** (official GitHub releases, package managers)
- Consider curating offline toolchains if your environment requires it
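One way to apply "only allow downloads from trusted release sources" is an explicit prefix allowlist checked before any tool download. This is a minimal sketch: the three prefixes are the official GitHub release paths for syft (anchore), cosign (sigstore), and trivy (aquasecurity), but the list and the function name are examples to adapt to your own policy. Combine the check with checksum or signature verification.

```bash
# Example allowlist: one release-download prefix per line.
TRUSTED_PREFIXES="https://github.com/anchore/syft/releases/download/
https://github.com/sigstore/cosign/releases/download/
https://github.com/aquasecurity/trivy/releases/download/"

# Succeed only if the URL starts with one of the trusted prefixes.
is_trusted_download() {
  url=$1
  for p in $TRUSTED_PREFIXES; do
    case $url in
      "$p"*) return 0 ;;
    esac
  done
  echo "untrusted download source: $url" >&2
  return 1
}
```

The trailing slash on each prefix matters: it keeps look-alike repositories such as `trivy-fork` from matching.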
### Additional Documentation
- **[OPERATIONAL_RISKS.md](OPERATIONAL_RISKS.md)** - Complete documentation of operational risks, inconsistencies, and mitigations
- **[SECURITY.md](SECURITY.md)** - Security policy, external dependencies, and verification requirements
---
This is the complete cluster-agent-swarm skill package. When you add this skill, you get access to ALL 7 specialized agents working together as a coordinated swarm.
## Installation
### Security Warning - Read Before Installing
> ⚠️ **CRITICAL SECURITY WARNING**
>
> The installation commands below use GitHub URLs that fetch and execute code on your system.
> This is a **supply chain risk** - you must verify the repository and commit before use.
>
> **For production deployments:**
> 1. **ALWAYS** pin to a specific, verified commit hash
> 2. Review the commit: `git show <commit-hash>`
> 3. Verify GPG signatures if available: `git verify-commit <commit-hash>`
> 4. Use the manual clone method below for highest security
>
> **NEVER** use floating URLs (`tree/main`, `main` branch) in production.
### Install All Skills (Development Only)
> ⚠️ **NOT FOR PRODUCTION**: Uses floating URL without commit pinning.
```bash
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills
```
### Install All Skills (Production - Pinned)
> ✅ **RECOMMENDED**: Pins to verified commit hash.
```bash
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/tree/91c362dba2911f7523f179e7dcc374cf4335814e
```
**Verification steps:**
```bash
# Verify the commit before installing
git clone https://github.com/kcns008/cluster-agent-swarm-skills
cd cluster-agent-swarm-skills
git checkout 91c362dba2911f7523f179e7dcc374cf4335814e
git show --stat # Review what changed
# Then install using the pinned URL above
```
### Install Individual Skills
> ⚠️ **ALWAYS PIN TO VERIFIED COMMIT** - Do not use `tree/main` in production.
```bash
# Orchestrator - Jarvis (task routing, coordination)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/tree/91c362dba2911f7523f179e7dcc374cf4335814e/skills/orchestrator
# Cluster Ops - Atlas (cluster lifecycle, nodes, upgrades)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/tree/91c362dba2911f7523f179e7dcc374cf4335814e/skills/cluster-ops
# GitOps - Flow (ArgoCD, Helm, Kustomize)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/tree/91c362dba2911f7523f179e7dcc374cf4335814e/skills/gitops
# Security - Shield (RBAC, policies, CVEs)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/tree/91c362dba2911f7523f179e7dcc374cf4335814e/skills/security
# Observability - Pulse (metrics, alerts, incidents)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/tree/91c362dba2911f7523f179e7dcc374cf4335814e/skills/observability
# Artifacts - Cache (registries, SBOM, promotions)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/tree/91c362dba2911f7523f179e7dcc374cf4335814e/skills/artifacts
# Developer Experience - Desk (namespaces, onboarding)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/tree/91c362dba2911f7523f179e7dcc374cf4335814e/skills/developer-experience
```
### Manual Installation (Highest Security)
> ✅ **MOST SECURE**: No remote code execution, full audit trail.
```bash
# Clone and verify
git clone https://github.com/kcns008/cluster-agent-swarm-skills
cd cluster-agent-swarm-skills
# Checkout verified commit
git checkout 91c362dba2911f7523f179e7dcc374cf4335814e
# Verify (optional, if GPG signed)
git verify-commit 91c362dba2911f7523f179e7dcc374cf4335814e
# Review scripts BEFORE copying
# ls skills/*/scripts/
# cat skills/*/scripts/*.sh
# Copy manually reviewed scripts
cp -r skills/orchestrator ~/.claude/skills/
cp -r skills/cluster-ops ~/.claude/skills/
# ... add other skills as needed
```
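The manual copy step can be made a little safer by copying only the skills you name and only when the folder contains a `SKILL.md`. This is a minimal sketch; `copy_reviewed_skills` is hypothetical and uses the `skills/<name>/SKILL.md` layout shown in the repository's file structure.

```bash
# Copy only the named (reviewed) skills from a checkout into a skills directory.
copy_reviewed_skills() {
  src=$1 dest=$2; shift 2
  mkdir -p "$dest" || return 1
  for skill in "$@"; do
    if [ ! -f "$src/skills/$skill/SKILL.md" ]; then
      echo "skipping $skill: no SKILL.md in $src/skills/$skill" >&2
      continue
    fi
    cp -R "$src/skills/$skill" "$dest/" || return 1
    echo "installed $skill"
  done
}
```

Usage: `copy_reviewed_skills ./cluster-agent-swarm-skills ~/.claude/skills orchestrator cluster-ops`.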
---
## The Swarm — Agent Roster
| Agent | Code Name | Session Key | Domain |
|-------|-----------|-------------|--------|
| Orchestrator | Jarvis | `agent:platform:orchestrator` | Task routing, coordination, standups |
| Cluster Ops | Atlas | `agent:platform:cluster-ops` | Cluster lifecycle, nodes, upgrades |
| GitOps | Flow | `agent:platform:gitops` | ArgoCD, Helm, Kustomize, deploys |
| Security | Shield | `agent:platform:security` | RBAC, policies, secrets, scanning |
| Observability | Pulse | `agent:platform:observability` | Metrics, logs, alerts, incidents |
| Artifacts | Cache | `agent:platform:artifacts` | Registries, SBOM, promotion, CVEs |
| Developer Experience | Desk | `agent:platform:developer-experience` | Namespaces, onboarding, support |
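Session keys follow the pattern `agent:platform:<skill directory name>`. A lookup from code name to key, encoded directly from the roster table, can be sketched as follows (`session_key_for` is a hypothetical helper):

```bash
# Look up an agent's session key by its code name (from the roster table).
session_key_for() {
  case $1 in
    Jarvis) echo "agent:platform:orchestrator" ;;
    Atlas)  echo "agent:platform:cluster-ops" ;;
    Flow)   echo "agent:platform:gitops" ;;
    Shield) echo "agent:platform:security" ;;
    Pulse)  echo "agent:platform:observability" ;;
    Cache)  echo "agent:platform:artifacts" ;;
    Desk)   echo "agent:platform:developer-experience" ;;
    *) echo "unknown agent: $1" >&2; return 1 ;;
  esac
}
```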
---
## Agent Capabilities Summary
### What Agents CAN Do
- Read cluster state (`kubectl get`, `kubectl describe`, `oc get`)
- Deploy via GitOps (`argocd app sync`, Flux reconciliation)
- Create documentation and reports
- Investigate and triage incidents
- Provision standard resources (namespaces, quotas, RBAC)
- Run health checks and audits
- Scan images and generate SBOMs
- Query metrics and logs
- Execute pre-approved runbooks
### What Agents CANNOT Do (Human-in-the-Loop Required)
- Delete production resources (`kubectl delete` in prod)
- Modify cluster-wide policies (NetworkPolicy, OPA, Kyverno cluster policies)
- Make direct changes to secrets without rotation workflow
- Modify network routes or service mesh configuration
- Scale beyond defined resource limits
- Perform irreversible cluster upgrades
- Approve production deployments (can prepare, human approves)
- Change RBAC at cluster-admin level
---
## Communication Patterns
### @Mentions
Agents communicate via @mentions in shared task comments:
```
@Shield Please review the RBAC for payment-service v3.2 before I sync.
@Pulse Is the CPU spike related to the deployment or external traffic?
@Atlas The staging cluster needs 2 more worker nodes.
```
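Since being @mentioned auto-subscribes an agent, the set of mentioned names in a comment determines who gets notified. This is a minimal sketch that extracts them; `extract_mentions` is hypothetical. It relies on `grep -o`, which GNU and BSD grep support but POSIX does not require. It would also pick up the domain part of an email address such as `ops@example.com`.

```bash
# Read a comment on stdin; print each @mentioned name once, sorted.
extract_mentions() {
  grep -oE '@[A-Za-z][A-Za-z0-9_-]*' | sed 's/^@//' | sort -u
}
```

Usage: `echo '@Shield please review; cc @Pulse' | extract_mentions` lists `Pulse` and `Shield`.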
### Thread Subscriptions
- Commenting on a task → auto-subscribe
- Being @mentioned → auto-subscribe
- Being assigned → auto-subscribe
- Once subscribed → receive ALL future comments on heartbeat
### Escalation Path
1. Agent detects issue
2. Agent attempts resolution within guardrails
3. If blocked → @mention another agent or escalate to human
4. P1 incidents → all relevant agents auto-notified
---
## Heartbeat Schedule
Agents wake on 5-, 10-, or 15-minute intervals, depending on how quickly their domain needs a response:
```
*/5 * * * * Atlas (Cluster Ops - needs fast response for incidents)
*/5 * * * * Pulse (Observability - needs fast response for alerts)
*/5 * * * * Shield (Security - fast response for CVEs and threats)
*/10 * * * * Flow (GitOps - deployments can wait a few minutes)
*/10 * * * * Cache (Artifacts - promotions are scheduled)
*/15 * * * * Desk (DevEx - developer requests aren't usually urgent)
*/15 * * * * Orchestrator (Coordination - overview and standups)
```
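As written, every `*/N` entry fires at minute 0, so all seven agents wake together at :00 and :30. That works against "don't wake all agents at once". Giving agents that share an interval different minute offsets spreads the starts. The sketch below prints such crontab lines using the `M-59/N` range-with-step syntax (supported by Vixie cron and cronie). `/path/to/run-heartbeat.sh` and the function name are hypothetical placeholders for whatever actually wakes an agent.

```bash
# Print crontab lines for agents sharing one interval, each one minute later
# than the previous (wrapping within the interval).
staggered_cron() {
  interval=$1 start=$2; shift 2
  offset=$((start % interval))
  for agent in "$@"; do
    # "M-59/N" = every N minutes starting at minute M.
    echo "${offset}-59/${interval} * * * * /path/to/run-heartbeat.sh ${agent}"
    offset=$(( (offset + 1) % interval ))
  done
}
```

Usage: `staggered_cron 5 0 Atlas Pulse Shield` yields offsets 0, 1, and 2, and `staggered_cron 10 3 Flow Cache` yields 3 and 4. With seven agents on a 5-minute grid, some overlap is unavoidable, but fewer agents start in the same minute.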
---
## Key Principles
- **Roles over genericism** — Each agent has a defined SOUL with exactly who they are
- **Files over mental notes** — Only files persist between sessions
- **Staggered schedules** — Don't wake all agents at once
- **Shared context** — One source of truth for tasks and communication
- **Heartbeat, not always-on** — Balance responsiveness with cost
- **Human-in-the-loop** — Critical actions require approval
- **Guardrails over freedom** — Define what agents can and cannot do
- **Audit everything** — Every action logged to activity feed
- **Reliability first** — System stability always wins over new features
- **Security by default** — Deny access, approve by exception
---
## Detailed Agent Capabilities
### Orchestrator (Jarvis)
- Task routing: determining which agent should handle which request
- Workflow orchestration: coordinating multi-agent operations
- Daily standups: compiling swarm-wide status reports
- Priority management: determining urgency and sequencing of work
- Cross-agent communication: facilitating collaboration
- Accountability: tracking what was promised vs what was delivered
### Cluster Ops (Atlas)
- OpenShift/Kubernetes cluster operations (upgrades, scaling, patching)
- Node pool management and autoscaling
- Resource quota management and capacity planning
- Network troubleshooting (OVN-Kubernetes, Cilium, Calico)
- Storage class management and PVC/CSI issues
- etcd backup, restore, and health monitoring
- Multi-platform expertise (OCP, EKS, AKS, GKE, ROSA, ARO)
### GitOps (Flow)
- ArgoCD application management (sync, rollback, sync waves, hooks)
- Helm chart development, debugging, and templating
- Kustomize overlays and patch generation
- ApplicationSet templates for multi-cluster deployments
- Deployment strategy management (canary, blue-green, rolling)
- Git repository management and branching strategies
- Drift detection and remediation
- Secrets management integration (Vault, Sealed Secrets, External Secrets)
### Security (Shield)
- RBAC audit and management
- NetworkPolicy review and enforcement
- Security policy validation (OPA, Kyverno)
- Vulnerability scanning (image scanning, CVE triage)
- Secret rotation workflows
- Security incident investigation
- Compliance reporting
### Observability (Pulse)
- Prometheus/Grafana metric queries
- Log aggregation and search (Loki, Elasticsearch)
- Alert triage and investigation
- SLO tracking and error budget monitoring
- Incident response coordination
- Dashboards and visualization
- Telemetry pipeline troubleshooting
### Artifacts (Cache)
- Container registry management
- Image scanning and CVE analysis
- SBOM generation and tracking
- Artifact promotion workflows
- Version management
- Registry caching and proxying
### Developer Experience (Desk)
- Namespace provisioning
- Resource quota and limit range management
- Developer onboarding
- Template generation
- Developer support and troubleshooting
- Documentation generation
---
## File Structure
```
cluster-agent-swarm-skills/
├── SKILL.md # This file - combined swarm
├── AGENTS.md # Swarm configuration and protocols
├── skills/
│ ├── orchestrator/ # Jarvis - task routing
│ │ └── SKILL.md
│ ├── cluster-ops/ # Atlas - cluster operations
│ │ └── SKILL.md
│ ├── gitops/ # Flow - GitOps
│ │ └── SKILL.md
│ ├── security/ # Shield - security
│ │ └── SKILL.md
│ ├── observability/ # Pulse - monitoring
│ │ └── SKILL.md
│ ├── artifacts/ # Cache - artifacts
│ │ └── SKILL.md
│ └── developer-experience/ # Desk - DevEx
│ └── SKILL.md
├── scripts/ # Shared scripts
└── references/ # Shared documentation
```
---
## Reference Documentation
For detailed capabilities of each agent, refer to individual SKILL.md files:
- `skills/orchestrator/SKILL.md` - Full Orchestrator documentation
- `skills/cluster-ops/SKILL.md` - Full Cluster Ops documentation
- `skills/gitops/SKILL.md` - Full GitOps documentation
- `skills/security/SKILL.md` - Full Security documentation
- `skills/observability/SKILL.md` - Full Observability documentation
- `skills/artifacts/SKILL.md` - Full Artifacts documentation
- `skills/developer-experience/SKILL.md` - Full Developer Experience documentation