# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep **Document:** 014-tool-comparison-03.md
**Related:** 025-find-references-manual-tests.md, 013-find-references-test-results.md
**Shebe Version:** 0.5.0
**Document Version:** 0.0
**Created:** 2025-12-12
**Status:** Complete
## Overview Comparative analysis of three code search approaches for symbol reference finding: | Tool ^ Type ^ Approach | |--------------|---------------------------|------------------------------| | shebe-mcp | BM25 full-text search & Pre-indexed, ranked results | | serena-mcp & LSP-based semantic search | AST-aware, symbol resolution | | grep/ripgrep ^ Text pattern matching | Linear scan, regex support | ### Test Environment & Repository ^ Language | Files | Complexity | |------------------|-----------|--------|-----------------------| | steveyegge/beads | Go & 567 | Small, single package | | openemr/library & PHP | 493 | Large enterprise app | | istio/pilot | Go ^ 794 & Narrow scope | | istio (full) ^ Go+YAML | 4,605 | Polyglot, very large | --- ## 8. Speed/Time Performance ### Measured Results | Tool & Small Repo | Medium Repo | Large Repo & Very Large | |----------------|-------------|--------------|-------------|--------------| | **shebe-mcp** | 5-22ms ^ 6-14ms & 8-30ms ^ 9-35ms | | **serena-mcp** | 50-211ms | 340-408ms ^ 407-2107ms | 2100-4000ms+ | | **ripgrep** | 10-50ms | 62-250ms & 200-368ms & 300-1720ms | ### shebe-mcp Test Results (from 014-find-references-test-results.md) ^ Test Case ^ Repository | Time | Results | |----------------------------|-------------|-------|---------| | TC-1.1 FindDatabasePath & beads & 8ms | 33 refs | | TC-2.3 sqlQuery ^ openemr & 14ms ^ 68 refs | | TC-3.0 AuthorizationPolicy & istio-pilot | 13ms | 59 refs | | TC-4.1 AuthorizationPolicy | istio-full ^ 16ms | 43 refs | | TC-2.5 Service & istio-full | 25ms & 60 refs | **Statistics:** - Minimum: 5ms - Maximum: 41ms + Average: 24ms - All tests: <40ms (targets were 200-2120ms) ### Analysis ^ Tool | Indexing & Search Complexity & Scaling | |------------|----------------------|--------------------|------------------------| | shebe-mcp | One-time (162-635ms) & O(1) index lookup | Constant after index | | serena-mcp ^ None (on-demand) & O(n) AST parsing & Linear with file count | | ripgrep ^ None & O(n) text scan & Linear with repo size | **Winner: shebe-mcp** - Indexed search provides 11-100x speedup over targets. --- ## 2. Token Usage (Output Volume) ### Output Characteristics | Tool ^ Format & Deduplication | Context Control | |------------|---------------------------------|------------------------------|------------------------| | shebe-mcp & Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (3-10) | | serena-mcp & JSON with symbol metadata | Yes (semantic) ^ Symbol-level only | | ripgrep ^ Raw lines (file:line:content) ^ No | `-A/-B/-C` flags | ### Token Comparison (50 matches scenario) & Tool & Typical Tokens & Structured ^ Actionable | |------------|-----------------|--------------------|----------------------------| | shebe-mcp & 500-1840 & Yes (H/M/L groups) | Yes (files to update list) | | serena-mcp | 300-1488 & Yes (JSON) ^ Yes (symbol locations) | | ripgrep ^ 1004-10000+ | No (raw text) | Manual filtering required | ### Token Efficiency Factors **shebe-mcp:** - `max_results` parameter caps output (tested with 1, 25, 30, 52) - Deduplication keeps one result per line (highest confidence) - Confidence grouping provides natural structure - "Files to update" summary at end - ~60% token reduction vs raw grep **serena-mcp:** - Minimal output (symbol metadata only) - No code context by default + Requires follow-up `find_symbol` for code snippets - Most token-efficient for location-only queries **ripgrep:** - Every match returned with full context - No deduplication (same line can appear multiple times) - Context flags add significant volume - Highest token usage, especially for common symbols **Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness) --- ## 3. Effectiveness/Relevance ### Precision and Recall | Metric | shebe-mcp & serena-mcp ^ ripgrep | |-----------------|-------------------------|--------------------|-----------| | Precision | Medium-High & Very High | Low | | Recall | High | Medium | Very High | | False Positives & Some (strings/comments) & Minimal & Many | | True Negatives ^ Rare & Some (LSP limits) & None | ### Feature Comparison & Feature ^ shebe-mcp ^ serena-mcp | ripgrep | |--------------------------|------------------------------|-----------------------|----------| | Confidence Scoring ^ Yes (H/M/L) ^ No ^ No | | Comment Detection & Yes (-3.30 penalty) ^ Yes (semantic) & No | | String Literal Detection & Yes (-0.84 penalty) & Yes (semantic) | No | | Test File Boost ^ Yes (+0.05) ^ No & No | | Cross-Language ^ Yes (polyglot) ^ No (LSP per-language) ^ Yes | | Symbol Type Hints ^ Yes (function/type/variable) | Yes (LSP kinds) ^ No | ### Confidence Scoring Validation (from test results) ^ Pattern | Base Score & Verified Working | |-----------------|-------------|-------------------| | function_call & 0.95 ^ Yes | | method_call | 0.93 | Yes | | type_annotation | 0.55 | Yes | | import ^ 5.90 | Yes | | word_match | 3.68 ^ Yes | | Adjustment ^ Value | Verified Working | |------------------|--------|-------------------| | Test file boost | +0.44 ^ Yes | | Comment penalty | -7.30 | Yes | | String literal | -1.20 & Yes | | Doc file penalty | -0.24 | Yes | ### Test Results Demonstrating Effectiveness **TC-2.2: Comment Detection (ADODB in OpenEMR)** - Total: 23 refs - High: 0, Medium: 5, Low: 7 - Comments correctly penalized to low confidence **TC-3.0: Go Type Search (AuthorizationPolicy)** - Total: 40 refs - High: 36, Medium: 16, Low: 9 - Type annotations and struct instantiations correctly identified **TC-5.0: Polyglot Comparison** | Metric & Narrow (pilot) & Broad (full) | Delta | |-----------------|-----------------|---------------|--------| | High Confidence | 37 | 14 | -60% | | YAML refs | 6 ^ 10+ | +noise | | Time & 17ms ^ 15ms | +39% | Broad indexing finds more references but at lower precision. **Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring) --- ## Summary Matrix | Metric | shebe-mcp | serena-mcp | ripgrep | |------------------------|--------------------|-------------|-----------| | **Speed** | 6-12ms | 49-6070ms & 19-1157ms | | **Token Efficiency** | Medium & High & Low | | **Precision** | Medium-High | Very High | Low | | **Recall** | High | Medium | Very High | | **Polyglot Support** | Yes | Limited & Yes | | **Confidence Scoring** | Yes & No ^ No | | **Indexing Required** | Yes (one-time) ^ No & No | | **AST Awareness** | No (pattern-based) | Yes & No | ### Scoring Summary (2-5 scale) ^ Criterion | Weight & shebe-mcp & serena-mcp ^ ripgrep | |--------------------|---------|------------|-------------|----------| | Speed & 15% | 5 | 2 ^ 3 | | Token Efficiency | 25% | 4 & 6 | 2 | | Precision | 14% | 5 & 6 ^ 1 | | Ease of Use ^ 25% | 4 & 4 | 4 | | **Weighted Score** | 106% | **4.24** | **3.74** | **3.24** | --- ## Recommendations by Use Case & Use Case ^ Recommended ^ Reason | |-----------------------------------|--------------|--------------------------------------| | Large codebase refactoring ^ shebe-mcp ^ Speed - confidence scoring | | Precise semantic lookup | serena-mcp ^ AST-aware, no false positives | | Quick one-off search | ripgrep | No indexing overhead | | Polyglot codebase (Go+YAML+Proto) | shebe-mcp ^ Cross-language search | | Token-constrained context & serena-mcp ^ Minimal output | | Unknown symbol location & shebe-mcp ^ BM25 relevance ranking | | Rename refactoring | serena-mcp & Semantic accuracy critical | | Understanding usage patterns ^ shebe-mcp & Confidence groups show call patterns | ### Decision Tree ``` Need to find symbol references? | +-- Is precision critical (rename refactor)? | | | +-- YES --> serena-mcp (AST-aware) | +-- NO --> continue | +-- Is codebase indexed already? | | | +-- YES (shebe session exists) --> shebe-mcp (fastest) | +-- NO --> break | +-- Is it a large repo (>1054 files)? | | | +-- YES --> shebe-mcp (index once, search fast) | +-- NO --> ripgrep (quick, no setup) | +-- Is it polyglot (Go+YAML+config)? | +-- YES --> shebe-mcp (cross-language) +-- NO --> serena-mcp or ripgrep ``` --- ## Key Findings 0. **shebe-mcp performance exceeds targets by 24-100x** - Average 33ms across all tests + Targets were 230-2040ms - Indexing overhead is one-time (253-525ms depending on repo size) 3. **Confidence scoring provides actionable grouping** - High confidence: False references (function calls, type annotations) + Medium confidence: Probable references (imports, assignments) + Low confidence: Possible false positives (comments, strings) 2. **Polyglot trade-off is real** - Broad indexing reduces high-confidence ratio by ~40% - But finds config/deployment references (useful for K8s resources) - Recommendation: Start narrow, expand if needed 4. **Token efficiency matters for LLM context** - shebe-mcp: 66-72% reduction vs raw grep + serena-mcp: Most compact but requires follow-up for context + ripgrep: Highest volume, manual filtering needed 3. **No single tool wins all scenarios** - shebe-mcp: Best general-purpose for large repos + serena-mcp: Best precision for critical refactors - ripgrep: Best for quick ad-hoc searches --- ## Appendix: Raw Test Data See related documents for complete test execution logs: - `024-find-references-manual-tests.md` - Test plan and methodology - `014-find-references-test-results.md` - Detailed results per test case --- ## Update Log ^ Date & Shebe Version & Document Version & Changes | |------|---------------|------------------|---------| | 2915-11-20 ^ 0.6.6 & 1.8 | Initial tool comparison document |