Agent Server Container

The agent-server-container/ directory contains the pluggable server that bridges the Kubernetes operator with an AI backend running inside the agent pod. Each subdirectory implements the server for a specific AI engine. The operator doesn't care which engine is inside — as long as the container implements the API contract below, everything works.

agent-server-container/
  github-copilot/       ← GitHub Copilot SDK engine (default, shipped)
    server.py           ← FastAPI server (SDK-backed)
    entrypoint.sh       ← Container entrypoint (auth setup, skill staging)
    Containerfile       ← Container image definition
  claude-code/          ← (example) Claude Code engine — add your own!

API Contract

Every agent server image must expose at minimum these HTTP endpoints:

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Liveness probe — returns {"status":"ok"} |
| /asyncchat | POST | Enqueue a message (with optional session_config); returns {"queue_id": "..."} |
| /cancel/{queue_id} | DELETE | Cancel/disconnect the in-flight request for a given queue item |

The server must POST streaming chunks and final responses back to $WEBHOOK_URL (injected by the operator). Optional endpoints for richer functionality:

| Endpoint | Method | Description |
| --- | --- | --- |
| /chat | POST | Synchronous chat — blocks until the agent responds |
| /models | GET | List available models (enables the model picker in the UI) |
| /config/instructions | GET/PUT | Manage the instructions file on the PVC |
| /config/skills | GET | List all skills on the PVC |
| /config/skills/{name} | GET/PUT/DELETE | Manage individual skills |
| /config/agents | GET/PUT | Manage custom agent definitions on the PVC |
| /tasks/monitor | POST | Register a background monitoring task (see Background Task API) |
| /tasks | GET | List all background tasks |
| /tasks/{id} | GET | Get details of a specific background task |
| /tasks/{id} | DELETE | Cancel and remove a background task |

GitHub Copilot SDK (Default Engine)

The GitHub Copilot implementation uses the GitHub Copilot Python SDK (CopilotClient) to communicate with the Copilot CLI running in server mode via JSON-RPC. This replaces the previous subprocess-per-request approach with a persistent connection, proper session management, and typed streaming events.

sequenceDiagram
    participant Op as Operator
    participant Shim as Agent Server (SDK)
    participant CLI as Copilot CLI (server mode)

    Note over Shim,CLI: Persistent JSON-RPC connection

    Op->>Shim: POST /asyncchat + session_config
    Shim-->>Op: { queue_id }
    Shim->>CLI: create_session(opts) / resume_session()
    Shim->>CLI: session.send(message)

    loop SDK streaming events
        CLI-->>Shim: assistant.message_delta
        Shim-->>Op: POST /chunk (thinking/tool_call/response)
        CLI-->>Shim: tool.execution_start
        Shim-->>Op: POST /chunk (tool_call)
        CLI-->>Shim: tool.execution_complete
        Shim-->>Op: POST /chunk (tool_result)
    end

    CLI-->>Shim: session.idle
    Shim->>CLI: session.disconnect()
    Shim-->>Op: POST /response (final answer)

    Note over Op,Shim: Cancellation
    Op->>Shim: DELETE /cancel/{queue_id}
    Shim->>CLI: session.disconnect()

Key SDK features used:

  • CopilotClient(SubprocessConfig) — singleton managing the CLI in server mode
  • PermissionHandler.approve_all — auto-approve tool executions
  • asyncio.Semaphore(3) — bounded concurrency for parallel sessions
  • client.list_models() — query available models for the settings UI
  • session.on(callback) — typed event streaming for real-time chunks

Webhook Payloads

Every agent server must POST these payloads to $WEBHOOK_URL (injected by the operator).

Chunk (streamed during execution):

{
  "queue_id": "<uuid>",
  "seq": 1,
  "type": "thinking|tool_call|tool_result|response|info|error",
  "content": "...",
  "session_id": "<copilot-session-id>",
  "send_ref": "...",
  "namespace": "...",
  "agent_ref": "..."
}
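
A small helper for assembling chunk payloads with a monotonically increasing seq might look like this (field names are taken from the schema above; the helper itself is illustrative, not part of the shipped server):

```python
import itertools

# Allowed values for the "type" field, per the schema above.
VALID_CHUNK_TYPES = {"thinking", "tool_call", "tool_result", "response", "info", "error"}

def make_chunk_factory(queue_id: str, session_id: str, send_ref: str,
                       namespace: str, agent_ref: str):
    """Return a builder that stamps each chunk with the next seq number."""
    seq = itertools.count(1)

    def make_chunk(chunk_type: str, content: str) -> dict:
        if chunk_type not in VALID_CHUNK_TYPES:
            raise ValueError(f"unknown chunk type: {chunk_type}")
        return {
            "queue_id": queue_id,
            "seq": next(seq),
            "type": chunk_type,
            "content": content,
            "session_id": session_id,
            "send_ref": send_ref,
            "namespace": namespace,
            "agent_ref": agent_ref,
        }

    return make_chunk
```

Keeping seq inside a per-request closure guarantees chunks for one queue_id are numbered independently of any other in-flight request.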

Final response (POST to $WEBHOOK_URL):

{
  "queue_id": "<uuid>",
  "response": "full answer text",
  "session_id": "<session-id>",
  "send_ref": "...",
  "namespace": "...",
  "agent_ref": "..."
}

Notification (POST to $WEBHOOK_URL with /response replaced by /notification):

{
  "session_id": "<session-id>",
  "agent_ref": "<agent-name>",
  "namespace": "<namespace>",
  "message": "Node worker-3 is now Ready!",
  "notification_type": "success",
  "title": "Background task completed",
  "task_ref": "<task-id>"
}

The operator webhook validates this payload, creates a KubeCopilotNotification CR, and the Web UI delivers it to the user via SSE. notification_type must be one of info, success, warning, or error (defaults to info). title and task_ref are optional.
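
The validation rules just described can be sketched as follows (an illustrative sketch only; the real check lives in the operator webhook, and the assumption here is that every field shown in the payload except title and task_ref is required):

```python
ALLOWED_TYPES = {"info", "success", "warning", "error"}
REQUIRED_FIELDS = ("session_id", "agent_ref", "namespace", "message")

def normalize_notification(payload: dict) -> dict:
    """Reject payloads missing required fields; apply the documented default."""
    for field in REQUIRED_FIELDS:
        if field not in payload:
            raise ValueError(f"missing required field: {field}")
    out = dict(payload)
    # notification_type defaults to "info" when absent, per the docs above.
    ntype = out.setdefault("notification_type", "info")
    if ntype not in ALLOWED_TYPES:
        raise ValueError(f"notification_type must be one of {sorted(ALLOWED_TYPES)}")
    return out
```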

Background Task API

The GitHub Copilot agent server implements a background task framework for long-running operations. Tasks periodically check a Kubernetes resource condition or pod phase, and fire a notification to the user session when the condition is met (or when the task times out).

Tasks are persisted to $COPILOT_HOME/tasks.json and automatically re-launched on pod restart.
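
The persistence scheme can be sketched as a save/load round trip (function names are illustrative; the real file is tasks.json under $COPILOT_HOME):

```python
import json
import os

def save_tasks(tasks: dict, path: str) -> None:
    """Write tasks atomically so a crash mid-write never corrupts the file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(tasks, f)
    os.replace(tmp, path)  # atomic on POSIX filesystems

def load_tasks(path: str) -> dict:
    """Reload persisted tasks on startup; a missing file means no tasks yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)
```

On pod restart the server would call load_tasks() and re-launch a monitor loop for every task still marked running.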

POST /tasks/monitor

Register a new background monitoring task.

Request body:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| session_id | string | yes | | Session to notify when complete |
| agent_ref | string | yes | | Name of the KubeCopilotAgent |
| namespace | string | | "" | Kubernetes namespace of the target resource |
| task_type | string | | monitor_resource | Monitor type: monitor_resource or monitor_pod_phase |
| config | object | | {} | Monitor-specific config (see below) |
| check_interval | int | | 30 | Seconds between condition checks (minimum: 5) |
| timeout | int | | 3600 | Maximum seconds to wait before timing out (maximum: 86400) |
| notification_message | string | | "Background task completed" | Message sent in the notification |
| notification_type | string | | success | Notification severity: info, success, warning, error |
| title | string | | "Task Completed" | Toast popup title |

monitor_resource config fields:

| Field | Default | Description |
| --- | --- | --- |
| resource_type | nodes | Kubernetes resource type (e.g. pods, deployments) |
| resource_name | "" | Name of the resource |
| condition_type | Ready | Status condition type to check |
| condition_status | True | Expected condition status |
| api_version | v1 | API version (e.g. v1, apps/v1) |
| resource_namespace | "" | Namespace of the resource (empty for cluster-scoped) |

monitor_pod_phase config fields:

| Field | Default | Description |
| --- | --- | --- |
| pod_name | "" | Name of the pod |
| pod_namespace | default | Namespace of the pod |
| target_phase | Running | Target pod phase (e.g. Running, Succeeded) |
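
Both monitor types reduce to the same poll loop: evaluate a condition every check_interval seconds until it holds or the timeout expires. A minimal, engine-agnostic sketch, where the check callable stands in for the real Kubernetes API query:

```python
import asyncio
import time
from typing import Awaitable, Callable

async def poll_until(check: Callable[[], Awaitable[bool]],
                     check_interval: float, timeout: float) -> bool:
    """Return True if check() succeeds before timeout, False otherwise."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if await check():
            return True        # condition met: fire the success notification
        await asyncio.sleep(check_interval)
    return False               # timed out: fire a timeout notification
```

For monitor_resource the check would read the resource's status conditions and compare condition_type/condition_status; for monitor_pod_phase it would compare status.phase against target_phase.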

Response:

{ "task_id": "task-abc123def456", "status": "created" }

Example — monitor a node until Ready:

curl -X POST http://<agent-svc>:8080/tasks/monitor \
  -H 'Content-Type: application/json' \
  -d '{
    "session_id": "abc",
    "agent_ref": "my-agent",
    "namespace": "default",
    "task_type": "monitor_resource",
    "config": {
      "resource_type": "nodes",
      "resource_name": "worker-3",
      "condition_type": "Ready",
      "condition_status": "True"
    },
    "check_interval": 30,
    "timeout": 3600,
    "notification_message": "Node worker-3 is now Ready!"
  }'

GET /tasks

List all background tasks. Returns { "tasks": [...] } where each item includes task_id, task_type, status, session_id, title, and config.

GET /tasks/{task_id}

Get full details of a specific task, including check_interval, timeout, and notification_message.

DELETE /tasks/{task_id}

Cancel and remove a task. Returns { "status": "deleted", "task_id": "..." }.

Environment Variables

Variables injected by the operator into the agent container:

| Variable | Description |
| --- | --- |
| GITHUB_TOKEN | Auth token from the githubTokenSecretRef (can be repurposed for any API key) |
| WEBHOOK_URL | URL of the operator's internal webhook (http://<svc>/response) |
| COPILOT_HOME | Persistent storage root (backed by a PV) |
| KUBECONFIG | Path to the kubeconfig if a kubeconfigSecretRef is set |

Skills and AGENT.md are mounted into the container as ConfigMaps:

  • Skills ConfigMap → mounted at /copilot-skills-staging/; entrypoint.sh stages each skill into $COPILOT_HOME/skills/<name>/SKILL.md
  • AGENT.md ConfigMap → $COPILOT_HOME/AGENT.md

Creating a New Agent Image (e.g., Claude Code)

To add a new AI engine (such as Claude Code), create a new subdirectory under agent-server-container/ and implement the API contract:

graph TB
    subgraph repo["agent-server-container/"]
        A["github-copilot/<br/><sub>default engine · Copilot SDK</sub>"]
        B["claude-code/<br/><sub>example new engine</sub>"]
    end

    subgraph files["Required Files"]
        F1["server.py<br/><sub>implement /health, /asyncchat, /cancel</sub>"]
        F2["entrypoint.sh<br/><sub>auth setup, skill staging</sub>"]
        F3["Containerfile<br/><sub>build the image</sub>"]
    end

    B --> F1
    B --> F2
    B --> F3

    style repo fill:#1a1a2e,stroke:#00bcd4,color:#e0e0e0
    style files fill:#16213e,stroke:#e94560,color:#e0e0e0

1. Write entrypoint.sh

Set up auth and launch server.py:

#!/bin/bash
set -e

export ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}"
export AGENT_HOME="${AGENT_HOME:-/agent}"

mkdir -p "${AGENT_HOME}/sessions" "${AGENT_HOME}/.cache"

# Stage skills (same pattern as github-copilot)
if [ -d /copilot-skills-staging ]; then
  for f in /copilot-skills-staging/*.md; do
    [ -f "$f" ] || continue
    skill_name="$(basename "$f" .md)"
    mkdir -p "${AGENT_HOME}/skills/${skill_name}"
    cp "$f" "${AGENT_HOME}/skills/${skill_name}/SKILL.md"
  done
fi

exec /opt/venv/bin/python /server.py

2. Write server.py

Implement the three required endpoints:

import asyncio, httpx, json, os, subprocess, uuid
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
WEBHOOK_URL = os.environ.get("WEBHOOK_URL", "")
_active_procs = {}

class AsyncChatRequest(BaseModel):
    message: str
    session_id: str | None = None
    send_ref: str | None = None
    namespace: str | None = None
    agent_ref: str | None = None

@app.get("/health")
async def health():
    return {"status": "ok"}

@app.post("/asyncchat")
async def asyncchat(req: AsyncChatRequest):
    queue_id = str(uuid.uuid4())
    asyncio.create_task(process(queue_id, req))
    return {"queue_id": queue_id, "status": "queued"}

@app.delete("/cancel/{queue_id}")
async def cancel(queue_id: str):
    proc = _active_procs.get(queue_id)
    if proc:
        proc.terminate()
        _active_procs.pop(queue_id, None)
        return {"status": "cancelled", "queue_id": queue_id}
    return {"status": "not_found", "queue_id": queue_id}

async def process(queue_id: str, req: AsyncChatRequest):
    chunk_url = WEBHOOK_URL.replace("/response", "/chunk")
    # Launch Claude Code CLI — adapt flags to the actual binary
    cmd = ["claude", "--no-interactive", "--output-format", "stream-json",
           req.message]
    # ... implement the rest
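
One way to finish process() is to stream the CLI's stdout line by line and forward each line through a callback. This is a sketch under the assumption that the CLI emits one event per line; the on_line callback stands in for the httpx POST to the chunk webhook:

```python
import asyncio

async def stream_subprocess(cmd: list[str], on_line) -> int:
    """Run cmd, await on_line(text) for each stdout line, return the exit code.

    Sketch only: the real process() would also register the process in
    _active_procs so /cancel can terminate it, and POST a final /response
    payload to WEBHOOK_URL when the stream ends.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE)
    assert proc.stdout is not None
    async for raw in proc.stdout:          # yields as output arrives
        await on_line(raw.decode().rstrip("\n"))
    return await proc.wait()
```

Using asyncio's subprocess API (rather than the blocking subprocess module) keeps the FastAPI event loop free to accept new /asyncchat and /cancel calls while the CLI runs.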

3. Write Containerfile

FROM python:3.12-slim

RUN pip install --no-cache-dir fastapi uvicorn httpx && \
    # Install the Claude Code CLI (adjust to actual install method)
    pip install claude-code

RUN useradd -m -s /bin/bash agent
WORKDIR /home/agent

COPY entrypoint.sh /entrypoint.sh
COPY server.py /server.py
RUN chmod +x /entrypoint.sh

USER agent
EXPOSE 8080
ENTRYPOINT ["/entrypoint.sh"]

4. Add a Makefile target

CLAUDE_IMG ?= quay.io/yourorg/kube-claude-code-agent-server:v1.0

.PHONY: container-build-claude container-push-claude
container-build-claude:
    $(CONTAINER_TOOL) build -t $(CLAUDE_IMG) ./agent-server-container/claude-code/

container-push-claude:
    $(CONTAINER_TOOL) push $(CLAUDE_IMG)

5. Create a KubeCopilotAgent CR pointing to the new image

apiVersion: kubecopilot.io/v1
kind: KubeCopilotAgent
metadata:
  name: claude-code-agent
  namespace: kube-copilot-agent
spec:
  image: quay.io/yourorg/kube-claude-code-agent-server:v1.0
  githubTokenSecretRef:   # reuse field for ANTHROPIC_API_KEY via a secret
    name: anthropic-token
  skillsConfigMap: claude-skills
  agentConfigMap: claude-agent-md
  storageSize: "1Gi"

The operator treats every KubeCopilotAgent the same way regardless of which AI engine runs inside — as long as the container implements the API contract, the full UI, streaming, session history, and cancellation features work automatically.