Guard Rails

Overview

RoboDev provides layered safety boundaries — guard rails — to ensure autonomous AI agents operate within enterprise-approved limits. Guard rails are applied at multiple levels for defence in depth.

New to guard rails?

For a plain-language introduction, see Guard Rails Overview. This page covers the detailed configuration reference.

sequenceDiagram
    participant Ticket as Incoming Ticket
    participant L1 as 1. Controller Validation
    participant L2 as 2. Engine Hooks
    participant L3 as 3. Quality Gate
    participant L4 as 4. Watchdog

    Ticket->>L1: Check allowed repos, task types, limits
    L1->>L2: Pass — launch agent
    Note over L2: Intercept tool calls<br/>in real time
    L2->>L3: Agent finishes
    Note over L3: Scan for secrets,<br/>OWASP patterns
    L3->>L4: Continuous monitoring
    Note over L4: Detect loops, stalls,<br/>cost overruns

1. Controller-Level Guards

Applied before a job is created. Configured in robodev-config.yaml:

guardrails:
  max_cost_per_job: 50.00
  max_concurrent_jobs: 5
  max_job_duration_minutes: 120
  allowed_repos:
    - "org/frontend-*"
    - "org/backend-*"
  blocked_file_patterns:
    - "*.env"
    - "**/secrets/**"
    - "**/credentials/**"
  require_human_approval_before_mr: false
  allowed_task_types:
    - "dependency_upgrade"
    - "test_fix"
    - "bug_fix"
    - "documentation"

What Each Guard Does

| Guard | Effect |
|---|---|
| max_cost_per_job | Terminates jobs exceeding the USD budget |
| max_concurrent_jobs | Queues new tickets when limit is reached |
| max_job_duration_minutes | Sets activeDeadlineSeconds on K8s Jobs |
| allowed_repos | Rejects tickets for repositories not matching glob patterns |
| blocked_file_patterns | Injected into engine hooks to prevent modification |
| require_human_approval_before_mr | Pauses before MR creation for human sign-off |
| allowed_task_types | Rejects tickets with disallowed task types |
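
The admission checks in the table can be pictured with a small Python sketch. It is illustrative only and is not RoboDev's controller code: the `Guardrails` class and `admit_ticket` function are hypothetical names, the glob matching uses Python's `fnmatch` rather than whatever matcher the controller uses, and treating an empty `allowed_repos` list as "reject everything" is an assumption.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Guardrails:
    """Hypothetical subset of the guardrails block in robodev-config.yaml."""
    allowed_repos: list[str] = field(default_factory=list)
    allowed_task_types: list[str] = field(default_factory=list)
    max_concurrent_jobs: int = 5


def admit_ticket(cfg, repo, task_type, running_jobs):
    """Return (decision, reason), where decision is 'reject', 'queue', or 'launch'."""
    # allowed_repos: the repository must match at least one glob pattern.
    if not any(fnmatchcase(repo, pattern) for pattern in cfg.allowed_repos):
        return "reject", f"repository {repo!r} matches no allowed_repos pattern"
    # allowed_task_types: exact match against the configured list.
    if task_type not in cfg.allowed_task_types:
        return "reject", f"task type {task_type!r} is not in allowed_task_types"
    # max_concurrent_jobs: valid tickets wait in the queue instead of failing.
    if running_jobs >= cfg.max_concurrent_jobs:
        return "queue", "max_concurrent_jobs reached"
    return "launch", "all controller-level guards passed"
```

Checking the repository and task type before the concurrency limit means an invalid ticket is rejected immediately and never occupies a queue slot.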

2. Engine-Level Guards (Claude Code Hooks)

Only applies to Claude Code

Engine hooks are available only for the Claude Code engine. Other engines (Codex, Aider, OpenCode, Cline) rely on prompt-based rules, which are advisory and not enforced.

Applied inside the execution container via Claude Code's hooks system. RoboDev generates a settings.json file mounted into the container:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "/opt/robodev/hooks/block-dangerous-commands.sh"
          }
        ]
      },
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "/opt/robodev/hooks/block-sensitive-files.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/opt/robodev/hooks/heartbeat.sh"
          }
        ]
      }
    ]
  }
}

Blocked Commands

The block-dangerous-commands.sh hook blocks:

- rm -rf / and similar destructive commands
- curl | bash, wget | bash (remote code execution)
- eval with untrusted input
- sudo (privilege escalation)
- chmod 777 (insecure permissions)
- git push --force to main/master
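
To make the command rules concrete, here is a hedged Python sketch of pattern-based detection. The shipped hook is a shell script whose actual rules are not shown on this page, so every regular expression below is an assumption, and `find_violation` is a hypothetical helper. The `eval` rule is omitted because "untrusted input" cannot be recognised by a simple pattern.

```python
import re

# Illustrative patterns only; they approximate the categories listed above
# and are not taken from block-dangerous-commands.sh.
DANGEROUS_PATTERNS = [
    (re.compile(r"\brm\s+-(?:[a-zA-Z]*r[a-zA-Z]*f|[a-zA-Z]*f[a-zA-Z]*r)[a-zA-Z]*\s+/(?:\s|$)"),
     "recursive delete of /"),
    (re.compile(r"\b(?:curl|wget)\b[^|]*\|\s*(?:ba)?sh\b"),
     "piping a download into a shell"),
    (re.compile(r"\bsudo\b"),
     "privilege escalation"),
    (re.compile(r"\bchmod\s+(?:-R\s+)?777\b"),
     "world-writable permissions"),
    (re.compile(r"\bgit\s+push\b(?=.*(?:--force\b|\s-f\b))(?=.*\b(?:main|master)\b)"),
     "force push to main/master"),
]


def find_violation(command):
    """Return a short reason if the command matches a blocked pattern, else None."""
    for pattern, reason in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return reason
    return None
```

Pattern matching of this kind is deliberately coarse: it can be bypassed by obfuscated commands and can flag harmless ones, which is one reason the hook is only one layer among several.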

Blocked Files

The block-sensitive-files.sh hook blocks writes to:

- .env* files
- **/credentials/**
- **/secrets/**
- *.pem, *.key (private keys)

Custom patterns can be added via the BLOCKED_FILE_PATTERNS environment variable.
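
The following Python sketch shows how a PreToolUse hook of this kind could work. It relies on the Claude Code hook protocol, in which the hook receives a JSON event on stdin and an exit code of 2 blocks the tool call and returns stderr to the agent. Everything else is an assumption: the real hook is a shell script, the comma-separated format of BLOCKED_FILE_PATTERNS is a guess, and `load_patterns`, `is_blocked`, and `run_hook` are hypothetical names.

```python
import json
import os
import sys
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

DEFAULT_PATTERNS = [".env*", "**/credentials/**", "**/secrets/**", "*.pem", "*.key"]


def load_patterns(env=None):
    """Defaults plus extras from BLOCKED_FILE_PATTERNS (comma-separated: an assumption)."""
    env = os.environ if env is None else env
    raw = env.get("BLOCKED_FILE_PATTERNS", "")
    return DEFAULT_PATTERNS + [p.strip() for p in raw.split(",") if p.strip()]


def is_blocked(path, patterns):
    """Match the rooted path and the bare file name against each glob."""
    posix = PurePosixPath(path.replace("\\", "/"))
    # Prefixing "/" lets "**/secrets/**" also match a top-level secrets/ directory.
    rooted = "/" + str(posix).lstrip("/")
    return any(
        fnmatchcase(rooted, pattern) or fnmatchcase(posix.name, pattern)
        for pattern in patterns
    )


def run_hook(stdin_text, patterns):
    """Return the hook exit code for one PreToolUse event: 0 allows, 2 blocks."""
    event = json.loads(stdin_text)
    path = event.get("tool_input", {}).get("file_path", "")
    if path and is_blocked(path, patterns):
        print(f"Blocked write to sensitive path: {path}", file=sys.stderr)
        return 2
    return 0
```

A script wrapping this sketch would end with `sys.exit(run_hook(sys.stdin.read(), load_patterns()))`. Note that `fnmatch` lets `*` match path separators, which is stricter than a typical gitignore-style matcher; a real implementation may differ.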

3. Custom Guard Rails via Markdown (Planned)

Not yet wired

The guardrails.md injection path is not currently wired in the controller. The TaskProfileConfig struct has a Workflow field and the promptbuilder package exists, but the controller builds execution specs directly from ticket fields rather than routing through the promptbuilder. This feature is on the roadmap.

The intention is that users will provide a guardrails.md file (mounted from a ConfigMap) that the prompt builder appends to every agent prompt, giving the agent advisory rules such as:

# Guard Rails

## Never Do
- Never modify CI/CD pipeline configuration files
- Never change database migration files

## Always Do
- Always run the full test suite before creating an MR

4. Per-Task-Type Permission Profiles (Partially Implemented)

Config schema only

task_profiles is present in the config schema and values are stored, but per-task-type file pattern restrictions (allowed_file_patterns, blocked_file_patterns) are not enforced at runtime. The controller reads AllowedTaskTypes for validation but does not yet apply profile-level constraints to agent pods.

The task_profiles config structure is defined for future enforcement:

guardrails:
  task_profiles:
    dependency_upgrade:
      allowed_file_patterns:
        - "pyproject.toml"
        - "requirements*.txt"
      max_cost_per_job: 30.00
      max_job_duration_minutes: 60

    bug_fix:
      blocked_file_patterns:
        - "**/migrations/**"
        - "**/auth/**"
      max_cost_per_job: 50.00

    documentation:
      allowed_file_patterns:
        - "*.md"
        - "docs/**"
      blocked_commands:
        - "git push"
      max_cost_per_job: 10.00

The controller selects the profile based on ticket labels or the ticket_type field from the ticketing backend.
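
Because profile enforcement is not yet implemented, the sketch below is purely speculative. It illustrates one plausible reading of the schema in Python: profile-level limits override the global guardrails values, blocked patterns take precedence over allowed ones, and a profile without allowed_file_patterns permits any file that is not blocked. All three semantics, and the function names, are assumptions rather than documented behaviour.

```python
from fnmatch import fnmatchcase


def _matches(path, patterns):
    """True if the path (bare or rooted with a leading slash) matches any glob."""
    rooted = "/" + path.lstrip("/")
    return any(fnmatchcase(path, p) or fnmatchcase(rooted, p) for p in patterns)


def effective_limits(guardrails, task_type):
    """Profile values override the global limits when present (assumed semantics)."""
    profile = guardrails.get("task_profiles", {}).get(task_type, {})
    keys = ("max_cost_per_job", "max_job_duration_minutes")
    return {key: profile.get(key, guardrails.get(key)) for key in keys}


def file_permitted(profile, path):
    """Blocked patterns win; an allow-list, if present, must match (assumed semantics)."""
    if _matches(path, profile.get("blocked_file_patterns", [])):
        return False
    allowed = profile.get("allowed_file_patterns")
    return True if not allowed else _matches(path, allowed)
```

Under these assumptions, a documentation task with the configuration above would inherit the global max_job_duration_minutes while using its own max_cost_per_job of 10.00.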

5. Quality Gate

An optional post-completion review that runs as a separate K8s Job:

quality_gate:
  enabled: true
  mode: "post-completion"
  engine: claude-code
  max_cost_per_review: 5.00
  security_checks:
    scan_for_secrets: true
    check_owasp_patterns: true
    verify_guardrail_compliance: true
    check_dependency_cves: true
  on_failure: "retry_with_feedback"

The quality gate is read-only — it cannot modify the repository.
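
This page does not describe how scan_for_secrets is implemented; the review runs through the configured engine. As a rough illustration of the idea only, the Python sketch below scans the added lines of a unified diff for two well-known secret formats. The pattern set and the `scan_diff_for_secrets` function are hypothetical, and a production scanner would cover far more formats.

```python
import re

# Two widely recognised secret shapes, used here purely as examples.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan_diff_for_secrets(diff_text):
    """Return (line_number, label) pairs for suspicious added lines in a unified diff."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

The function only reads the diff text and returns findings, which mirrors the read-only nature of the gate: reporting is separate from any remediation.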

6. Progress Watchdog

Detects agents that are stalled, looping, or unproductive during execution:

progress_watchdog:
  enabled: true
  check_interval_seconds: 60
  min_consecutive_ticks: 2
  research_grace_period_minutes: 5
  loop_detection_threshold: 10
  thrashing_token_threshold: 80000
  stall_idle_seconds: 300
  cost_velocity_max_per_10_min: 15.00
  unanswered_human_timeout_minutes: 30

Detection Rules

| Rule | Detects | Action |
|---|---|---|
| Loop detection | Same tool call repeated N times | Terminate with feedback |
| Thrashing | High token use, no file changes | Warn, then terminate |
| Stall | No tool calls for N seconds | Terminate |
| Cost velocity | Spending > $X per 10 minutes | Warn |
| Telemetry failure | Heartbeat stopped advancing | Warn |
| Unanswered human | NeedsHuman with no response | Terminate and notify |

All terminate actions require the anomaly to persist for at least min_consecutive_ticks checks to prevent false positives.
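
The persistence requirement can be sketched in Python as a simple streak counter. This is an illustration under stated assumptions and not the watchdog's implementation: `TickDebouncer`, `is_stalled`, and `is_looping` are hypothetical names, and the sketch assumes that "repeated N times" means N consecutive identical calls and that a single clean check resets the streak.

```python
from dataclasses import dataclass


@dataclass
class TickDebouncer:
    """Counts consecutive anomalous checks for one detection rule."""
    min_consecutive_ticks: int = 2
    streak: int = 0

    def observe(self, anomaly_detected):
        """Record one watchdog check; return True once termination is warranted."""
        # A clean check resets the streak (assumed behaviour).
        self.streak = self.streak + 1 if anomaly_detected else 0
        return self.streak >= self.min_consecutive_ticks


def is_stalled(seconds_since_last_tool_call, stall_idle_seconds=300):
    """Stall rule: no tool calls for at least stall_idle_seconds."""
    return seconds_since_last_tool_call >= stall_idle_seconds


def is_looping(recent_calls, loop_detection_threshold=10):
    """Loop rule: the last N recorded tool calls are all identical."""
    if len(recent_calls) < loop_detection_threshold:
        return False
    tail = recent_calls[-loop_detection_threshold:]
    return all(call == tail[0] for call in tail)
```

In this sketch, each check would call something like `debouncer.observe(is_stalled(idle_seconds))`, so a single slow tool call that trips the stall rule once does not terminate the job unless the next check sees the same condition.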