Execution Engines¶
New to engines?
For a plain-language comparison and decision tree, see Engines Explained. This page covers the detailed technical reference.
Overview¶
Execution engines wrap AI coding tools (Claude Code, OpenAI Codex, Aider) and produce engine-agnostic ExecutionSpec structs that the JobBuilder translates into Kubernetes Jobs. This decoupling enables testing without a cluster, supports multiple AI tools from a single controller, and opens the door to non-K8s runtimes in future.
Interface Summary¶
| Property | Value |
|---|---|
| Proto definition | proto/engine.proto |
| Go interface | pkg/engine/engine.go |
| Interface version | 1 |
| Role in lifecycle | Called after guard rail validation to produce the K8s Job spec |
Go Interface¶
type ExecutionEngine interface {
// BuildExecutionSpec translates a task into a container spec.
BuildExecutionSpec(task Task, config EngineConfig) (*ExecutionSpec, error)
// BuildPrompt constructs the task prompt for the AI agent.
BuildPrompt(task Task) (string, error)
// Name returns the unique engine identifier.
Name() string
// InterfaceVersion returns the interface version.
InterfaceVersion() int
}
Task¶
The input to all engine methods, populated from the ticketing backend:
type Task struct {
ID string // Unique task identifier.
TicketID string // Source ticket identifier.
TaskRunID string // Per-run unique identifier for storage isolation.
Title string // Short summary (used as prompt heading).
Description string // Full task description (main prompt content).
RepoURL string // Repository the agent should work on.
Labels []string // Labels from the source ticket.
Metadata map[string]string // Additional key-value pairs.
MemoryContext string // Prior-knowledge injected from episodic memory.
PriorBranchName string // Git branch from a previous attempt (git-based continuation).
SessionID string // Claude Code session ID for --resume (session-persistence continuation).
}
PriorBranchName and SessionID represent two continuation strategies that can coexist:
| Field | Strategy | Who sets it | How it works |
|---|---|---|---|
| `PriorBranchName` | Git-based (default) | Controller, on retries | Retry prompt includes prior branch; agent reads git log to understand prior work |
| `SessionID` | Session persistence (opt-in) | Controller, on retry jobs only | `--resume <id>` restores the full conversation; the `## Continuation` prompt section is suppressed |
On first attempts, the controller does not set either field. When session persistence is enabled, the engine itself derives a deterministic session ID from TaskRunID and launches Claude Code with --session-id <id>. On subsequent retries the controller passes the known session ID via SessionID, and the engine switches to --resume <id> to continue the prior conversation.
EngineConfig¶
Runtime configuration passed to the engine:
type EngineConfig struct {
Image string // Container image override.
TimeoutSeconds int // Active deadline for the K8s Job.
ResourceRequests Resources // CPU and memory requests.
ResourceLimits Resources // CPU and memory limits.
Env map[string]string // Additional environment variables.
}
ExecutionSpec¶
The output — everything needed to create a K8s Job:
type ExecutionSpec struct {
Image string // Container image to run.
Command []string // Entrypoint command and arguments.
Env map[string]string // Plain-text environment variables.
SecretEnv map[string]string // Key=env var name, Value=K8s Secret name.
ResourceRequests Resources
ResourceLimits Resources
Volumes []VolumeMount
ActiveDeadlineSeconds int // Hard timeout for the Job.
}
// VolumeMount — volume source priority: PVCName > ConfigMapName > emptyDir.
type VolumeMount struct {
Name string // Volume name.
MountPath string // Path inside the container.
ReadOnly bool
SubPath string // Mount a single key/subdirectory.
ConfigMapName string // Back the volume with a ConfigMap.
ConfigMapKey string // Project only this key (used with ConfigMapName).
PVCName string // Back the volume with a PersistentVolumeClaim.
}
TaskResult¶
The structured outcome of a completed task, written to /workspace/result.json:
type TaskResult struct {
Success bool // Whether the task completed successfully.
MergeRequestURL string // URL of the created pull request.
BranchName string // The branch containing changes.
Summary string // Human-readable summary.
TokenUsage *TokenUsage // Input/output token counts.
CostEstimateUSD float64 // Estimated cost in US dollars.
ExitCode int // 0=success, 1=agent failure, 2=guard rail blocked.
}
Built-in Engines¶
Claude Code¶
The primary and recommended engine. Runs the Claude Code CLI in headless mode with full hook-based guard rail support.
| Property | Value |
|---|---|
| Engine name | claude-code |
| Package | pkg/engine/claudecode/ |
| Default image | ghcr.io/unitaryai/engine-claude-code:latest |
| Default timeout | 7200 seconds (2 hours) |
| API key secret | anthropic-api-key |
| Guard rails | Pre-tool-use hooks via hooks.json |
| Max agentic turns | 50 (configurable) |
Configuration¶
config:
engines:
default: claude-code
claude_code:
image: "ghcr.io/unitaryai/engine-claude-code:v2.1.0"
max_turns: 50
model: "claude-sonnet-4-6"
timeout_seconds: 3600
fallback_model: haiku # used when primary model is overloaded
append_system_prompt: "Always run tests before committing."
tool_whitelist: # only these tools are available
- Bash
- Read
- Write
- Edit
tool_blacklist: # these tools are blocked
- WebSearch
json_schema: '{"type":"object","properties":{"success":{"type":"boolean"},"summary":{"type":"string"}},"required":["success","summary"]}'
resource_requests:
cpu: "500m"
memory: "512Mi"
resource_limits:
cpu: "2"
memory: "2Gi"
skills: # custom skills — see Skills section below
- name: create-changelog
inline: |
# Create Changelog
Generate a CHANGELOG.md entry for the changes made.
- name: review-checklist
path: /opt/osmia/skills/review-checklist.md
- name: deploy-guide
configmap: deploy-skills # load from a Kubernetes ConfigMap
key: deploy-guide.md # optional — defaults to <name>.md
sub_agents: # see Sub-Agents section below
- name: reviewer
description: "Reviews code changes for correctness"
prompt: "You are a code reviewer. Check for bugs, security issues, and style."
model: haiku
- name: architect
description: "System architecture reviewer"
configmap: architect-agent # load prompt from ConfigMap
agent_teams: # experimental multi-instance collaboration
enabled: false
mode: in-process # required for headless K8s containers
max_teammates: 3
| Field | Type | Default | Description |
|---|---|---|---|
| `image` | string | `ghcr.io/unitaryai/engine-claude-code:latest` | Container image override |
| `timeout_seconds` | int | `7200` | Active deadline for the K8s Job |
| `fallback_model` | string | — | Model to use when the primary is overloaded (e.g. `haiku`) |
| `append_system_prompt` | string | — | Extra text appended to Claude Code's system prompt |
| `session_persistence` | SessionPersistenceConfig | disabled | Opt-in session-state persistence between retries — see Session Persistence |
| `tool_whitelist` | []string | — | Only allow these Claude Code tools (via `--allowedTools`) |
| `tool_blacklist` | []string | — | Block these Claude Code tools (via `--disallowedTools`) |
| `json_schema` | string | built-in TaskResult schema | JSON schema for structured output (via `--json-schema`) |
| `skills` | []SkillConfig | — | Custom skill files loaded into the agent — see Skills |
| `sub_agents` | []SubAgentConfig | — | Sub-agent definitions — see Sub-Agents |
| `agent_teams` | AgentTeamsConfig | disabled | Experimental multi-instance collaboration — see Agent Teams |
Command¶
The engine generates a claude CLI invocation in streaming JSON mode:
setup-claude.sh \
-p "<prompt>" \
--output-format stream-json \
--max-turns 50 \
--dangerously-skip-permissions \
--verbose \
--mcp-config /workspace/.mcp.json
The setup-claude.sh wrapper runs before claude to initialise the writable home directory — writing ~/.claude/settings.json, /workspace/.mcp.json, and any skill files. It then execs the real claude binary with the arguments above.
Guard Rails (Hooks)¶
Claude Code supports a hooks system that intercepts tool calls before execution. Osmia generates a hooks.json configuration file and mounts it into the agent container at /config/hooks.json:
{
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"command": "/config/guard-rail-check.sh \"$TOOL_INPUT\""
},
{
"matcher": "Write|Edit",
"command": "/config/file-pattern-check.sh \"$TOOL_INPUT\""
}
],
"PostToolUse": [
{
"command": "/config/heartbeat.sh"
}
]
}
}
The guard rail check scripts validate each tool call against:
- Destructive command detection — blocks `rm -rf`, `DROP TABLE`, `git push --force`, `sudo`, and similar dangerous commands.
- Blocked file patterns — prevents reading or writing files matching patterns in `blocked_file_patterns` (e.g., `*.env`, `*.key`, `*.pem`).
- Network restriction — optionally blocks `curl`, `wget`, and other network tools from contacting external hosts.
If a hook script exits with a non-zero code, Claude Code blocks the tool call and reports the violation to the agent, which can then adjust its approach.
The PostToolUse hook writes heartbeat telemetry to /workspace/heartbeat.json after every tool invocation, enabling the progress watchdog to monitor agent activity.
Environment Variables¶
| Variable | Source | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | K8s Secret `osmia-anthropic-key` | API authentication |
| `OSMIA_TASK_ID` | Controller | Unique task identifier |
| `OSMIA_TICKET_ID` | Controller | Source ticket identifier |
| `OSMIA_REPO_URL` | Ticket | Repository to work on |
| `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC` | Engine | Always set to `1` |
| `CLAUDE_SKILL_INLINE_<NAME>` | Engine | Base64-encoded inline skill content (see Skills) |
| `CLAUDE_SKILL_PATH_<NAME>` | Engine | Path to a skill file on the image or ConfigMap mount (see Skills) |
| `CLAUDE_SUBAGENT_PATH_<NAME>` | Engine | Path to a ConfigMap-backed sub-agent file (see Sub-Agents) |
| `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` | Engine | Set to `1` when agent teams are enabled |
| `CLAUDE_CODE_MAX_TEAMMATES` | Engine | Maximum teammate agents (agent teams) |
Volume Mounts¶
| Mount | Path | Writable | Purpose |
|---|---|---|---|
| `workspace` | `/workspace` | Yes | Repository checkout and working directory |
| `config` | `/config` | No | Guard rail hooks and configuration |
| `home` | `/home/osmia` | Yes | Writable home directory (emptyDir) for `~/.claude/` config and skills |
| `tmp` | `/tmp` | Yes | Writable tmp (emptyDir) for Claude Code subprocess shell directories |
Skills¶
Skills are custom Markdown instruction files that the agent can invoke via /skill-name in its prompts. They are written to ~/.claude/skills/<name>.md before the agent starts.
Each skill has a name (lowercase letters, digits, and hyphens only) and exactly one of:
- `inline` — the Markdown content directly in the config. The controller base64-encodes it and passes it as the `CLAUDE_SKILL_INLINE_<NAME>` environment variable.
- `path` — a path to a Markdown file on the container image (e.g. `/opt/osmia/skills/review-checklist.md`). The controller passes it as `CLAUDE_SKILL_PATH_<NAME>`.
- `configmap` — the name of a Kubernetes ConfigMap containing the skill. The controller mounts the ConfigMap as a volume at `/skills/<name>.md` and sets `CLAUDE_SKILL_PATH_<NAME>` to the mount path. Optionally specify `key` to select a specific key within the ConfigMap (defaults to `<name>.md`).
At container startup, setup-claude.sh reads these environment variables, decodes/copies the files, and writes them to ~/.claude/skills/. The <NAME> suffix is converted to lowercase with hyphens (e.g. CLAUDE_SKILL_INLINE_CREATE_CHANGELOG → ~/.claude/skills/create-changelog.md).
Example — inline skill:
engines:
claude_code:
skills:
- name: create-changelog
inline: |
# Create Changelog
When asked to create a changelog entry:
1. Read the existing CHANGELOG.md
2. Determine the next version number from git tags
3. Add a new section with today's date
4. List all changes since the last release
Example — image-bundled skill:
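A path-based skill references a Markdown file already present on the container image (this mirrors the review-checklist entry in the configuration earlier in this page):

```yaml
engines:
  claude_code:
    skills:
      - name: review-checklist
        path: /opt/osmia/skills/review-checklist.md
```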
Example — ConfigMap-backed skill:
engines:
claude_code:
skills:
- name: deploy-guide
configmap: deploy-skills # K8s ConfigMap name
key: deploy-guide.md # optional — defaults to <name>.md
Create the ConfigMap separately:
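One way to create it with kubectl; the local file path and the osmia namespace are assumptions for illustration:

```shell
kubectl create configmap deploy-skills \
  --from-file=deploy-guide.md=./skills/deploy-guide.md \
  -n osmia
```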
ConfigMap skills are ideal for large skill files or when different teams manage their own skills independently of the controller configuration.
To bundle skills into the container image, add them to your custom Dockerfile:
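A minimal sketch, assuming the published engine image as the base; the skill lands at the path referenced by the `path` field:

```dockerfile
FROM ghcr.io/unitaryai/engine-claude-code:latest
# Bake the skill file into the image at the path the config references.
COPY skills/review-checklist.md /opt/osmia/skills/review-checklist.md
```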
Sub-Agents¶
Sub-agents allow the main Claude Code agent to delegate specialised subtasks to other agents during execution. Each sub-agent has its own system prompt, model, tool restrictions, and permission mode. Sub-agents use Claude Code's official --agents flag and ~/.claude/agents/ directory.
Sub-agents can be defined inline (prompt in the config) or loaded from a Kubernetes ConfigMap (for large prompts or independent management).
Configuration:
engines:
claude_code:
sub_agents:
# Inline sub-agent — prompt is in the config.
- name: reviewer
description: "Reviews code changes for correctness and style"
prompt: |
You are a code reviewer. Check for:
- Bugs and logic errors
- Security vulnerabilities
- Style and convention violations
model: haiku
tools:
- Read
- Grep
- Glob
max_turns: 10
# ConfigMap-backed sub-agent — prompt loaded from a volume.
- name: architect
description: "Reviews system architecture decisions"
configmap: architect-agent # K8s ConfigMap name
key: architect.md # optional — defaults to <name>.md
model: opus
# Background sub-agent.
- name: linter
description: "Runs linting in the background"
prompt: "Run the project linter and report issues."
background: true
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | — | Sub-agent identifier (required) |
| `description` | string | — | Short summary of the sub-agent's purpose (required) |
| `prompt` | string | — | Inline system prompt (mutually exclusive with `configmap`) |
| `model` | string | inherit | Model to use: `sonnet`, `opus`, `haiku`, or `inherit` |
| `tools` | []string | — | Tool allowlist |
| `disallowed_tools` | []string | — | Tool denylist |
| `permission_mode` | string | `default` | One of `default`, `acceptEdits`, `dontAsk`, `bypassPermissions`, `plan` |
| `max_turns` | int | — | Maximum agentic turns |
| `skills` | []string | — | Skill names to preload |
| `background` | bool | `false` | Run as a background process |
| `configmap` | string | — | Load prompt from this Kubernetes ConfigMap |
| `key` | string | `<name>.md` | Key within the ConfigMap |
How it works:
- Inline sub-agents are serialised as JSON and passed via `--agents '{"name": {"description":"...", "prompt":"...", ...}}'`.
- ConfigMap sub-agents are volume-mounted at `/subagents/<name>.md` and copied to `~/.claude/agents/<name>.md` by `setup-claude.sh` at container startup. Claude Code automatically discovers agent files in this directory.
Creating a ConfigMap for a sub-agent:
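As with skills, the ConfigMap can be created with kubectl; the local file path and the osmia namespace are illustrative assumptions:

```shell
kubectl create configmap architect-agent \
  --from-file=architect.md=./agents/architect.md \
  -n osmia
```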
The Markdown file can include YAML frontmatter for metadata (model, tools, etc.) following the Claude Code sub-agents specification.
Agent Teams¶
Experimental
Agent teams are an experimental Claude Code feature (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS). The API may change in future Claude Code releases.
Agent teams spawn multiple independent Claude Code instances that collaborate via shared task lists and inter-agent messaging, coordinated by a team lead. The team lead dynamically creates teammates based on the task — you do not pre-define agents.
This is fundamentally different from Sub-Agents, which are lightweight helpers within a single Claude Code session. Both features can be used simultaneously.
Configuration:
engines:
claude_code:
agent_teams:
enabled: true
mode: in-process # required for headless K8s containers (no tmux)
max_teammates: 3
| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Activate agent teams mode |
| `mode` | string | `in-process` | Teammate execution mode. Use `in-process` for headless containers (no tmux required); `tmux` is an alternative for interactive environments |
| `max_teammates` | int | `3` | Maximum number of teammate agents the team lead can spawn |
When enabled, the engine:
- Sets `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` and `CLAUDE_CODE_MAX_TEAMMATES=<n>` in the container environment.
- Appends `--teammate-mode <mode>` to the Claude CLI command.
The team lead then autonomously decides how many teammates to create, what roles they should have, and how to distribute work across them.
Session Persistence¶
By default, each agent pod starts with a clean ~/.claude/ directory (emptyDir volume). When a task hits --max-turns and retries, the new pod has no conversation history — it must re-read git diffs to understand prior work.
Session persistence stores the ~/.claude/ directory (and optionally the workspace) in durable storage so that retry pods can resume the exact conversation context via --resume <session-id>, with no wasted turns re-establishing context.
Note: Session persistence is opt-in. Git-based continuation (`PriorBranchName`) remains the default fallback.
Configuration:
engines:
claude_code:
session_persistence:
enabled: true
backend: shared-pvc # shared-pvc | per-taskrun-pvc | s3
pvc_name: osmia-agent-sessions # for shared-pvc backend
| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Activate session persistence |
| `backend` | string | — | Storage backend: `shared-pvc`, `per-taskrun-pvc`, or `s3` |
| `pvc_name` | string | — | Name of the shared PVC (`shared-pvc` backend) |
| `storage_class` | string | cluster default | Storage class for dynamic PVCs (`per-taskrun-pvc` backend) |
| `storage_size` | string | `1Gi` | PVC size (`per-taskrun-pvc` backend) |
| `s3_bucket` | string | — | S3 bucket name (`s3` backend, not yet implemented) |
| `s3_prefix` | string | `osmia-sessions/` | S3 key prefix (`s3` backend) |
| `ttl_minutes` | int | `1440` | Session data retention after TaskRun completion (24 h) |
Backends:
- `shared-pvc` — mounts a single ReadWriteMany PVC. Each TaskRun gets an isolated subdirectory via `SubPath`. Simple and low-cost; requires a storage class that supports RWX (e.g. NFS, EFS, Ceph).
- `per-taskrun-pvc` — creates a dedicated ReadWriteOnce PVC per TaskRun. Stronger isolation; the controller deletes the PVC on cleanup.
- `s3` — not yet implemented. Will use an init container to download session data and a lifecycle hook to upload on exit.
How it works:
- On first job launch, the controller generates a deterministic session ID from the TaskRun ID and passes it to the agent via `--session-id <id>`.
- Claude Code stores all session JSONL files in `$CLAUDE_CONFIG_DIR` (the PVC-backed path) rather than the ephemeral emptyDir home volume.
- When the agent hits `--max-turns`, the retry pod receives `task.SessionID` and the agent is invoked with `--resume <session-id>` instead of a fresh session.
- The workspace directory is also persisted on the PVC (`OSMIA_WORKSPACE_DIR`), so `entrypoint.sh` skips the git-clone step on retry pods.
Storage Cleanup¶
Session data is ephemeral — it only needs to exist while retries are possible. Once a TaskRun reaches a terminal state (Succeeded, Failed with retries exhausted, or TimedOut), its session data can be removed. Osmia handles this automatically via a background cleaner goroutine (internal/sessionstore/cleaner.go) that runs inside the controller pod when session persistence is enabled.
How cleanup works:
The cleaner runs a sweep every hour (configurable) and removes session data older than ttl_minutes (default 1440 = 24 hours):
| Backend | What gets cleaned | How |
|---|---|---|
| `shared-pvc` | Subdirectories on the shared PVC | The controller pod mounts the shared PVC at `/data/sessions`. The cleaner lists subdirectories and removes any whose modification time is older than the TTL. |
| `per-taskrun-pvc` | Dedicated PVCs | The cleaner lists PVCs with the `osmia.io/task-run-id` label and deletes any whose creation timestamp is older than the TTL. |
| `s3` | S3 objects by prefix | Not yet implemented. |
Important details:
- PVC phase is intentionally not checked. When an agent pod finishes and is deleted, its PVC remains in `Bound` phase — Kubernetes does not automatically transition it to `Released`. If the cleaner filtered by phase, PVCs would accumulate indefinitely. Age-based deletion is safe because the TTL guarantees the TaskRun is terminal and no pod is referencing the PVC.
- The `Cleanup()` method on each store is idempotent. Calling it multiple times (e.g. after a controller restart) is safe — deleting an already-removed PVC is treated as success.
- The controller pod must mount the shared PVC for the `shared-pvc` cleaner to work. The Helm chart handles this automatically: when `sessionPersistence.backend` is `shared-pvc`, the deployment template adds a `session-pvc` volume and mounts it at `/data/sessions`. If you deploy without the Helm chart, ensure the controller pod has the shared PVC mounted and pass the mount path as `PVCRootDir` in the `CleanerConfig`.
- `per-taskrun-pvc` cleanup requires RBAC. The controller's service account needs `list` and `delete` permissions on `persistentvolumeclaims` in the target namespace. The Helm chart's default RBAC rules include these permissions.
Tuning the TTL:
Set sessionPersistence.ttlMinutes in values.yaml. A shorter TTL reduces storage usage but risks deleting session data before a long-running retry completes. A longer TTL is safer but accumulates more data. The default of 24 hours is conservative — most tasks complete retries within minutes.
sessionPersistence:
enabled: true
backend: per-taskrun-pvc
ttlMinutes: 720 # 12 hours — more aggressive cleanup
What happens if cleanup fails?
The cleaner logs warnings and continues to the next item. Failed deletions are retried on the next sweep (1 hour later). No session data is critical — worst case, stale PVCs occupy storage until manually removed. You can monitor cleanup via the controller's structured logs:
level=INFO msg="per-taskrun-pvc cleaner: deleted stale session PVC" pvc=osmia-session-tr-abc123
level=WARN msg="per-taskrun-pvc cleaner: failed to delete stale PVC" pvc=osmia-session-tr-xyz error="..."
Helm chart: Set sessionPersistence.enabled: true and configure the chosen backend in values.yaml. The chart will create the shared PVC when backend: shared-pvc is selected.
Continuation Prompts¶
When an agent exhausts --max-turns, the default behaviour is to auto-retry using git-based continuation (the retry agent clones the prior branch and reads git log). User-prompted continuation replaces that with an interactive pause: the controller sends a Slack message asking the operator whether to continue or stop, and only resumes if the operator approves.
Prerequisites:
- Session persistence must be enabled (so the retry pod can resume via `--resume <session-id>` rather than re-reading git history).
- An approval backend must be configured (currently Slack).
How it works:
- The agent pod exits cleanly after hitting `--max-turns`. The controller detects that `ToolCallsTotal >= ConfiguredMaxTurns` and the task did not succeed.
- The TaskRun transitions to `NeedsHuman` with gate type `continuation`.
- The controller sends an approval request containing the turn count, cost so far, any progress summary, and two buttons: Continue and Stop.
- If the operator clicks Continue, the controller increments `ContinuationCount`, creates a new pod with `--resume <session-id>`, and transitions back to `Running`.
- If the operator clicks Stop (or the message times out), the TaskRun transitions to `Failed` with a reason that includes the operator's name and any progress summary.
Configuration:
config:
engines:
claude_code:
continuation_prompt: true # enable user-prompted continuation
max_continuations: 3 # maximum times the operator can approve (default 3)
session_persistence:
enabled: true
backend: per-taskrun-pvc
Interaction with retries:
ContinuationCount and RetryCount are independent. A continuation resumes the existing session from where it paused — it does not count as a retry and does not consume a retry slot. If a resumed session then fails for a different reason (e.g. a tool error), the normal retry mechanism applies separately.
Operator experience:
The Slack message sent to the approval channel looks like:
Task tr-abc123 has exhausted its turn limit (50 turns, $0.42).
Progress: "Implemented the feature, tests passing, PR not yet raised."
Continuation 1 of 3. Approve to resume the session where it left off.
[Continue] [Stop]
Clicking Continue resumes immediately. Clicking Stop marks the TaskRun as failed and records the operator's username in the failure reason.
Note: User-prompted continuation requires the Claude Code engine and session persistence. The git-based continuation strategy is used for all other engines.
OpenAI Codex¶
Runs the OpenAI Codex CLI in fully autonomous mode.
| Property | Value |
|---|---|
| Engine name | codex |
| Package | pkg/engine/codex/ |
| Default image | ghcr.io/unitaryai/engine-codex:latest |
| Default timeout | 7200 seconds (2 hours) |
| API key secret | openai-api-key |
| Guard rails | Prompt-embedded rules |
Configuration¶
config:
engines:
default: codex
codex:
image: "ghcr.io/unitaryai/engine-codex:v1.0.0"
timeout_seconds: 3600
resource_requests:
cpu: "500m"
memory: "512Mi"
resource_limits:
cpu: "2"
memory: "2Gi"
Command¶
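The exact invocation is version-dependent; as an assumption (check your Codex CLI version's help for the precise headless flags), a fully autonomous run looks roughly like:

```shell
# Assumption — representative only; flags vary across Codex CLI versions.
codex exec --full-auto "<prompt>"
```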
Guard Rails (Prompt-Embedded)¶
Codex does not support a hooks system. Guard rails are appended directly to the task prompt:
## Guard Rails
You MUST follow these rules strictly:
- Do NOT execute destructive commands (e.g. rm -rf /, drop database, etc.)
- Do NOT modify or read files matching sensitive patterns (*.env, **/secrets/**, *.key, *.pem)
- Do NOT make network requests to external services other than the repository remote
- Do NOT install packages or dependencies without explicit instructions to do so
- Do NOT push commits directly; stage and commit changes locally only
Prompt-based guard rails are advisory
The AI model may not always respect prompt-embedded rules. For stricter enforcement, use the Claude Code engine with hook-based guards, or rely on the quality gate for post-completion validation.
Repository Context¶
Codex reads repository conventions from AGENTS.md (rather than CLAUDE.md). If an AGENTS.md file is present in the repository root, Codex will use it for coding conventions and project structure guidance.
Environment Variables¶
| Variable | Source | Description |
|---|---|---|
OPENAI_API_KEY |
K8s Secret openai-api-key |
API authentication |
OSMIA_TASK_ID |
Controller | Unique task identifier |
OSMIA_TICKET_ID |
Controller | Source ticket identifier |
OSMIA_REPO_URL |
Ticket | Repository to work on |
Aider¶
Runs the Aider CLI for AI-assisted coding. Aider supports multiple LLM providers — Osmia can configure it to use either Anthropic or OpenAI models.
| Property | Value |
|---|---|
| Engine name | aider |
| Package | pkg/engine/aider/ |
| Default image | ghcr.io/unitaryai/engine-aider:latest |
| Default timeout | 7200 seconds (2 hours) |
| API key secret | anthropic-api-key (default) or openai-api-key |
| Guard rails | Prompt-embedded rules |
Configuration¶
config:
engines:
default: aider
aider:
image: "ghcr.io/unitaryai/engine-aider:v1.0.0"
provider: "anthropic" # or "openai"
timeout_seconds: 3600
resource_requests:
cpu: "500m"
memory: "512Mi"
resource_limits:
cpu: "2"
memory: "2Gi"
Command¶
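The exact invocation is version-dependent; as an assumption (only `--no-git` is guaranteed by this page), a representative non-interactive Aider run looks roughly like:

```shell
# Assumption — representative only; exact flags depend on the Aider version.
aider --message "<prompt>" --no-git --yes
```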
The --no-git flag is used because Osmia manages git operations via the SCM backend, not via Aider's built-in git support.
Model Provider¶
Aider supports both Anthropic and OpenAI models. Configure the provider in the engine configuration:
| Provider | Environment Variable | Secret Name |
|---|---|---|
| `anthropic` (default) | `ANTHROPIC_API_KEY` | `anthropic-api-key` |
| `openai` | `OPENAI_API_KEY` | `openai-api-key` |
Repository Context¶
Aider reads coding conventions from .aider/conventions.md (rather than CLAUDE.md or AGENTS.md). If an .aider.conf.yml file is present in the repository root, Aider will use it for additional configuration (model selection, editor settings, etc.).
Guard Rails (Prompt-Embedded)¶
Like Codex, Aider does not support a hooks system. Guard rails are appended directly to the task prompt, identical in content to the Codex guard rails above.
OpenCode¶
OpenCode is a terminal-based AI coding agent. Osmia runs it in headless mode inside Kubernetes Jobs.
Package: pkg/engine/opencode/
| Property | Value |
|---|---|
| Engine name | opencode |
| Default image | ghcr.io/unitaryai/engine-opencode:latest |
| Command | opencode --non-interactive --message <prompt> |
| Interface version | 1 |
Configuration¶
engines:
default: opencode
opencode:
image: ghcr.io/unitaryai/engine-opencode:v1.0.0 # optional override
auth:
method: api_key
api_key_secret: anthropic-credentials
provider: anthropic # or "openai", "google"
Providers¶
| Provider | Environment Variable | K8s Secret Key |
|---|---|---|
| `anthropic` (default) | `ANTHROPIC_API_KEY` | `anthropic-api-key` |
| `openai` | `OPENAI_API_KEY` | `openai-api-key` |
| `google` | `GOOGLE_API_KEY` | `google-api-key` |
Repository Context¶
OpenCode reads coding conventions from AGENTS.md in the repository root.
Guard Rails (Prompt-Embedded)¶
OpenCode does not support a hooks system. Guard rails are appended directly to the task prompt.
Cline¶
Community template — no pre-built image
Cline is a VS Code extension with no published headless CLI. The Go engine implementation (pkg/engine/cline/) and the Dockerfile (docker/engine-cline/) are provided as a community contribution template. No pre-built container image is published for Cline. Configuring cline as your engine will result in an image pull failure until a working headless integration is contributed. See Contributing if you want to help.
Cline is an AI coding agent with optional MCP (Model Context Protocol) and AWS Bedrock support. When a headless CLI becomes available, Osmia can run it inside Kubernetes Jobs using the implementation in pkg/engine/cline/.
Package: pkg/engine/cline/
| Property | Value |
|---|---|
| Engine name | cline |
| Default image | ghcr.io/unitaryai/engine-cline:latest (not yet published) |
| Command | cline --headless --task <prompt> --output-format json |
| Interface version | 1 |
Configuration¶
engines:
default: cline
cline:
image: ghcr.io/unitaryai/engine-cline:v1.0.0 # optional override
auth:
method: api_key
api_key_secret: anthropic-credentials
provider: anthropic # or "openai", "google", "bedrock"
mcp_enabled: true # append --mcp flag
Providers¶
| Provider | Environment Variable | K8s Secret Key |
|---|---|---|
| `anthropic` (default) | `ANTHROPIC_API_KEY` | `anthropic-api-key` |
| `openai` | `OPENAI_API_KEY` | `openai-api-key` |
| `google` | `GOOGLE_API_KEY` | `google-api-key` |
| `bedrock` | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` | `aws-access-key-id`, `aws-secret-access-key` |
Repository Context¶
Cline reads project-specific instructions from .clinerules in the repository root.
MCP Support¶
When mcp_enabled: true is set in the engine configuration, the --mcp flag is appended to the Cline command, enabling Model Context Protocol integration for tool use.
Guard Rails (Prompt-Embedded)¶
Cline does not support a hooks system. Guard rails are appended directly to the task prompt.
Engine Selection¶
The controller selects engines in this order:
- Per-ticket override — if the ticket metadata or labels specify an engine (e.g., an `engine:codex` label), that engine is used.
- Default engine — the `engines.default` configuration value.
- Fallback — `claude-code` if no default is configured.
Comparison Matrix¶
| Criterion | Claude Code | Codex | Aider | OpenCode | Cline |
|---|---|---|---|---|---|
| Guard rail enforcement | Hook-based (deterministic) | Prompt-based (advisory) | Prompt-based (advisory) | Prompt-based (advisory) | Prompt-based (advisory) |
| Sub-agent support | Yes (official) | No | No | No | No |
| Multi-model support | Anthropic only | OpenAI only | Anthropic + OpenAI | Anthropic + OpenAI + Google | Anthropic + OpenAI + Google + Bedrock |
| Agentic turns limit | Configurable (`max_turns`) | N/A | N/A | N/A | N/A |
| Repository context file | `CLAUDE.md` | `AGENTS.md` | `.aider/conventions.md` | `AGENTS.md` | `.clinerules` |
| Heartbeat telemetry | Via PostToolUse hook | Not built-in | Not built-in | Not built-in | Not built-in |
| MCP server support | Yes | No | No | No | Yes (via --mcp flag) |
| Pre-built image | ✅ | ✅ | ✅ | ✅ | ❌ Community template |
Recommendation: Use Claude Code as the default engine for its superior guard rail enforcement via hooks and built-in heartbeat telemetry. Use Codex or Aider when you need OpenAI models or have specific tool preferences. OpenCode supports Google models. Cline is a community template without a published image.
Writing a Custom Engine¶
To add a new engine, create a new package under pkg/engine/ and implement the ExecutionEngine interface:
package myengine
import (
"fmt"
"strings"
"github.com/unitaryai/osmia/pkg/engine"
)
const (
engineName = "my-engine"
interfaceVersion = 1
defaultImage = "ghcr.io/myorg/my-engine:latest"
)
// MyEngine implements engine.ExecutionEngine.
type MyEngine struct{}
func New() *MyEngine { return &MyEngine{} }
func (e *MyEngine) Name() string { return engineName }
func (e *MyEngine) InterfaceVersion() int { return interfaceVersion }
func (e *MyEngine) BuildExecutionSpec(task engine.Task, config engine.EngineConfig) (*engine.ExecutionSpec, error) {
if task.ID == "" {
return nil, fmt.Errorf("task ID must not be empty")
}
prompt, err := e.BuildPrompt(task)
if err != nil {
return nil, fmt.Errorf("building prompt: %w", err)
}
image := config.Image
if image == "" {
image = defaultImage
}
return &engine.ExecutionSpec{
Image: image,
Command: []string{"my-cli", "--prompt", prompt},
Env: map[string]string{
"OSMIA_TASK_ID": task.ID,
"OSMIA_TICKET_ID": task.TicketID,
"OSMIA_REPO_URL": task.RepoURL,
},
SecretEnv: map[string]string{
"MY_API_KEY": "my-api-key-secret",
},
Volumes: []engine.VolumeMount{
{Name: "workspace", MountPath: "/workspace"},
{Name: "config", MountPath: "/config", ReadOnly: true},
},
ActiveDeadlineSeconds: config.TimeoutSeconds,
}, nil
}
func (e *MyEngine) BuildPrompt(task engine.Task) (string, error) {
if task.Title == "" {
return "", fmt.Errorf("task title must not be empty")
}
var b strings.Builder
b.WriteString("# Task: ")
b.WriteString(task.Title)
b.WriteString("\n\n")
if task.Description != "" {
b.WriteString("## Description\n\n")
b.WriteString(task.Description)
b.WriteString("\n\n")
}
b.WriteString("## Instructions\n\n")
b.WriteString("Complete the task above. Work in /workspace.\n")
b.WriteString("Write result.json to /workspace/result.json when finished.\n")
return b.String(), nil
}
Register the engine with the controller at startup:
reconciler := controller.NewReconciler(cfg, logger,
controller.WithEngine(claudecode.New()),
controller.WithEngine(codex.New()),
controller.WithEngine(aider.New()),
controller.WithEngine(myengine.New()),
)
Testing¶
Write table-driven tests for both BuildExecutionSpec and BuildPrompt:
func TestMyEngine_BuildExecutionSpec(t *testing.T) {
tests := []struct {
name string
task engine.Task
config engine.EngineConfig
wantErr bool
}{
{
name: "valid task produces spec",
task: engine.Task{ID: "1", Title: "Fix bug"},
config: engine.EngineConfig{TimeoutSeconds: 3600},
},
{
name: "empty ID returns error",
task: engine.Task{Title: "Fix bug"},
wantErr: true,
},
}
e := New()
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
spec, err := e.BuildExecutionSpec(tt.task, tt.config)
if tt.wantErr {
require.Error(t, err)
return
}
require.NoError(t, err)
assert.NotEmpty(t, spec.Image)
assert.NotEmpty(t, spec.Command)
})
}
}
Protobuf Definition¶
The complete protobuf service is defined in proto/engine.proto. Note that engines are currently built-in only (Go), but the protobuf definition exists for future support of third-party engines via gRPC.