Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]¶
Added¶
AWS Secrets Manager Backend¶
- Built-in AWS Secrets Manager secrets backend. New
aws-secrets-managerbackend (pkg/plugin/secrets/awssm/) reads secrets from AWS Secrets Manager via the AWS SDK v2 default credential chain. Supports IRSA on EKS, cross-account access via STS AssumeRole, configurable cache TTL (default 5 minutes), and thesecret-name#json-fieldkey format. URI scheme:aws-sm://. Configured viasecret_resolver.backendsinrobodev-config.yaml.
Session Continuation After Max Turns¶
- Retry jobs now continue from prior work rather than starting from scratch. When a Claude Code agent hits
--max-turnsand the pod exits, the next retry agent clones the branch that was pushed during the previous session instead of re-cloning from the default branch. - Deterministic branch naming. The base prompt now instructs the agent to create and push to
robodev/<ticket-id>throughout execution, committing frequently rather than waiting until the end. This branch name is predictable even ifresult.jsonwas never written (e.g. pod killed before the stop hook ran). Task.PriorBranchNamefield. Theengine.Taskstruct has a newPriorBranchNamefield. When set,BuildPromptemits a## Continuationsection that clones that branch with--depth=50and asks the agent to review prior commits before continuing.launchRetryJobbranch propagation. The controller now setsPriorBranchNamefromtr.Result.BranchNamewhen available, or falls back torobodev/<ticketID>for retries where the pod was killed before writingresult.json.
Configurable Max Turns for Claude Code¶
max_turnsis now configurable for the Claude Code engine. Setengines.claude_code.max_turnsin your config to override the default of 50 turns. Increase this for tasks that require more steps, such as large refactors or multi-file changes. The value is passed directly to Claude Code's--max-turnsflag.
ConfigMap-Backed Skills¶
- Skills can now be loaded from Kubernetes ConfigMaps. Set
configmap(and optionallykey) on a skill config entry instead ofinlineorpath. The ConfigMap is volume-mounted into the agent container and copied to~/.claude/skills/at startup. This avoids bloating the controller config with large Markdown files and allows teams to manage skills independently.
Sub-Agent Support¶
- New
sub_agentsconfiguration for Claude Code sub-agents. Sub-agents are lightweight helpers within a single session, using the official--agentsflag format ({"name": {"description":"...", ...}}). Supports inline prompts and ConfigMap-backed prompts (mounted to~/.claude/agents/). Full feature set: model selection, tool allow/deny lists, permission modes, max turns, skills, and background mode. This is a separate feature from agent teams (which spawn multiple independent instances).
ConfigMap Volume Support¶
engine.VolumeMountnow supports ConfigMap sources. NewConfigMapName,ConfigMapKey, andSubPathfields allow the jobbuilder to emit ConfigMap volume sources instead of emptyDir. This is the foundation for ConfigMap-backed skills and sub-agents.
Proto/Plugin Consistency¶
proto/scm.protoupdated to match Go interface. AddedListReviewComments,ReplyToComment,ResolveThread, andGetDiffRPCs with their request/response message types. Third-party SCM plugins written against the protobuf API can now implement the full v2 contract.- Plugin host now validates interface versions at load time.
LoadPluginchecks the plugin's declaredinterface_versionagainst the controller's expected version before spawning the subprocess. Mismatched plugins are rejected immediately with a structured error instead of silently loading.
Agent Teams Fixed and Wired¶
- Agent teams are now properly wired in the controller. Previously,
WithTeamsConfigwas never called inmain.go. Agent teams now correctly setCLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1,CLAUDE_CODE_MAX_TEAMMATES, and pass--teammate-modeto the Claude CLI. The incorrect--agentsflag (which is a sub-agents feature, not an agent teams feature) has been removed. Agent teams and sub-agents are separate, complementary features that can be used simultaneously.
Documentation¶
- Skills and agent teams documented. The engine reference (
docs/plugins/engines.md) now covers custom skills (inline and image-bundled), agent teams configuration (modes, default agents per task type), and all previously-undocumented Claude Code fields (fallback_model,tool_whitelist,tool_blacklist,json_schema,no_session_persistence,append_system_prompt). The configuration reference (docs/getting-started/configuration.md) includes expanded examples. Thesetup-claude.shstartup flow is also described.
Approval Flow End-to-End¶
- Full approval flow wiring. Slack approval buttons (
robodev_approval_{taskRunID}_{i}) are now routed through a newApprovalHandlerinterface on the webhook server, which delegates toReconciler.ResolveApproval. Pre-start approval launches the job; pre-merge approval completes the task. Rejections transition to Failed and mark the ticket. RequestApprovalcalled at both gates. The controller now calls the approval backend'sRequestApprovalat both the pre-start and pre-merge gates so humans actually receive the notification.NeedsHuman → Failedstate transition. The TaskRun state machine now allows rejection from the NeedsHuman state.- Watchdog monitors NeedsHuman task runs. The watchdog loop now iterates NeedsHuman runs (for unanswered-human timeout) and cancels pending approval requests on termination.
Intelligent Routing Wired into Engine Selection¶
IntelligentSelectoris now the primary engine selector when routing is enabled. It uses the default engine selector as its fallback via a newSetFallbackmethod, ensuring cold-start tasks still get a sensible engine order.
Real Diff for Code Review Gate¶
GetDiffadded toscm.Backendinterface. GitHub and GitLab SCM backends now implementGetDiffusing their respective compare APIs. ThebaseBranchparameter is resolved automatically: GitHub usesHEAD(the repo's default branch) when empty; GitLab fetches the project'sdefault_branchfrom the API. The review gate fetches the actual diff before callingReviewDiff, instead of passing an empty string.scm.InterfaceVersionbumped from 1 to 2 to signal the contract change to external plugins.
Runtime Cost Enforcement¶
-
max_cost_per_jobwatchdog rule. A newcheckTotalCostrule in the watchdog terminates jobs whose cumulative cost exceedsguardrails.max_cost_per_job. Cost data is fed from the live stream:CostEvent.CostUSDis propagated toTaskRun.CostUSDviaConsumeStreamEvent, andbuildHeartbeatuses it for each watchdog tick. Falls back to the result-level cost for agents that do not emit live cost events. -
Slack approval callback failures now return 500. When
ResolveApprovalfails, the webhook returns a non-2xx status so Slack retries the delivery instead of silently losing the callback.
Fixed¶
Local Development¶
make local-upnow boots cleanly on kind without extra Helm overrides. The local controller image is built and loaded under the chart'sghcr.io/unitaryai/robodev/controllerrepository, and the dev Helm overlay now leaves ticketing unconfigured so the controller falls back to the built-in noop backend for credential-free smoke testing.
Security / Correctness¶
-
Slack approval action ID mismatch resolved. The webhook previously filtered
robodev_approve_*/robodev_reject_*prefixes, which never matched the actualrobodev_approval_*format generated by the Slack approval backend. Callbacks are now correctly detected, parsed, and routed. -
GitHub polling no longer picks up pull requests. The
/issuesendpoint returns both issues and pull requests. Items with a non-nilpull_requestfield are now filtered out before the polling backend emits tickets, preventing RoboDev from treating a labelled PR as a task. -
Webhook trigger labels auto-derived from ticketing config. When
webhook.github.trigger_labelsis not explicitly set andticketing.backendis"github", the webhook now falls back to thelabelslist from the GitHub ticketing backend config. Deployments that do not use the GitHub polling backend must settrigger_labelsexplicitly if label gating is desired. -
Webhook adapter propagates processing errors.
HandleWebhookEventnow returns the first error encountered rather than always returningnil. This causes the webhook server to respond with a non-2xx status so senders (GitHub, GitLab) will retry the delivery. -
Code review gate is now actually a gate. When
code_review.enabled: trueand the review backend returnspassed: false, the TaskRun is now transitioned toFailedand the ticket is marked failed. Previously the controller logged the outcome and then unconditionally succeeded the run. -
Intelligent routing no longer panics on cold start.
IntelligentSelector.SelectEnginespreviously calleds.fallback.SelectEngineswith a nil receiver when fingerprint data was insufficient. A nil-fallback guard now returns the available engine list in order instead of panicking. -
Routing advertises only configured engines. The
availableEngineslist in the routing initialisation was built by ranging over amap[string]boolwithout checking the boolean value, so unconfigured engine names were always included. Only engines whose configuration is non-nil (or whose name matchesengines.default) are now advertised. -
Label removal is now URL-safe.
removeLabelin the GitHub ticketing backend now usesurl.PathEscapeon the label name, preventing broken DELETE calls for labels with spaces or special characters. -
Removed unused
RequireHumanApprovalBeforeMRconfig field. Approval usesApprovalGatesinstead.
Documentation¶
- Corrected wrong config keys in README and getting-started guides:
max_cost_per_task_usd→max_cost_per_job,max_duration_minutes→max_job_duration_minutes. - Fixed
runAsUser: 1000→10000in security docs. - Changed "Seven layered" → "Six layered" in README; removed prompt injection from layers list (it is a threat, not a layer).
- Moved PRM from "Coming Soon" to implemented in guardrails overview.
- Added note that engine hooks only apply to Claude Code.
- Added note that only the
memoryTaskRun store backend is currently supported. - Fixed secondary docs overclaiming guardrails.md injection and task-profile file-pattern enforcement:
architecture.md,security.md,concepts/guardrails-overview.md,getting-started/kubernetes.mdall now accurately describe layers 3 & 4 as advisory/on-roadmap. - Fixed
docs/scaling.mdcontradiction: KEDA note no longer claims leader election is active; multi-tenancy section notes that namespace-per-tenant runtime isolation is on the roadmap. - Fixed
docs/plugins/secrets.mdoverclaiming: "Third-party plugins are available" replaced with accurate status. 1Password and Azure KV backends are not implemented; documented as planned or community-implementable. Added built-in Vault backend documentation (was undocumented). Added External Secrets Operator integration guide for AWS Secrets Manager. - Added built-in AWS Secrets Manager backend to roadmap (
docs/roadmap.md). - Fixed Slack webhook API docs: action ID prefix corrected from
robodev_approve_torobodev_approval_; clarified that approval callbacks are routed to the approval handler, not forwarded as tickets. - Fixed
docs/concepts/engines.md: tournament warning no longer callsmax_predicted_cost_per_joban "approval gate" — it is an auto-rejection threshold.
[0.2.0] — 2026-03-05¶
Added¶
Helm Chart Persistence¶
- New
persistencevalues block — whenpersistence.enabled: true, a PVC is provisioned and mounted at/datafor SQLite stores (memory, routing, watchdog). Uses the cluster default storage class; override viapersistence.storageClass. - Memory and routing subsystems now survive pod restarts when persistence is enabled.
Memory and PRM in Live Deployments¶
memoryandprmsubsystems are now enabled viavalues-live-local.yamland confirmed working end-to-end against a real kind cluster.- Controller now logs
memory context injected into prompt(with fact/insight/issue counts) when prior knowledge is retrieved and passed to an agent. - Controller logs
memory extraction completedwith node and edge counts after each successful task run.
Shortcut: Configurable Completed State¶
- New
completed_state_nameconfig key for the Shortcut backend. When set,MarkCompletetransitions stories to that named state (e.g."Ready for Review") rather than always using the first done-type state in the workflow.
Fixed¶
Helm Chart RBAC¶
- Added
pods/execwithcreateverb to the controllerClusterRole. Previously the PRM hint file injection and cleanup would fail with a 403 — this is now resolved.
Changed¶
CI Workflow¶
- Added
workflow_dispatchtrigger to the CI workflow so it can be manually run against any branch without requiring a push event.
Documentation¶
- Configuration Reference — expanded ticketing section with full config reference for GitHub Issues, Shortcut, and Linear, including all keys and field tables.
- Plugins / Ticketing — added Shortcut and Linear built-in backend sections with configuration examples, behaviour tables, and permissions notes.
- Setup Guide: Linear + Slack — new step-by-step guide covering API key creation, team ID lookup, label setup, secrets, config, deploy, and troubleshooting.
- Setup Guide: Shortcut + Slack — added
completed_state_name,exclude_labels, and multi-workflow tip. - Roadmap — moved Item 20 to completed section; marked documentation CI/CD items done; removed stale note about intelligence subsystems being unwired.
- Fixed broken anchor link in
index.mdpointing to the Competitive Execution section.
Added¶
PR/MR Comment Response (Item 20)¶
pkg/plugin/scm—ReviewCommenttype;ListReviewComments,ReplyToComment, andResolveThreadadded to theBackendinterface.pkg/plugin/scm/github— GitHub REST implementation:ListReviewCommentsmerges review-comment and issue-comment endpoints sorted by creation time;ReplyToCommenttries the review reply endpoint and falls back to issue comment;ResolveThreadis a documented no-op (GitHub REST does not support thread resolution).pkg/plugin/scm/gitlab— GitLab REST implementation:ListReviewCommentsfetches notes withsystem: falsefilter;ReplyToCommentresolves discussion ID then posts a discussion note;ResolveThreadcallsPUT …/discussions/{id}withresolved: true.internal/reviewpoller— new package implementing the review response polling subsystem:types.go—Classificationenum,ClassifiedComment,TrackedPR,FollowUpRequest.classifier.go—RuleBasedClassifier(bot-author ignore list, keyword tiers for informational/warning/error) andLLMClassifier(ChainOfThought with rule-based fallback; disabled by default).poller.go—PollerwithRegister,DrainFollowUps, andStart(background ticker); routes SCM calls viascmBackendorscmRouter; enforcesMaxFollowUpJobsper PR; optionally replies to comments acknowledging receipt.internal/config—ReviewResponseConfigstruct withenabled,poll_interval_minutes,min_severity,max_follow_up_jobs,reply_to_comments,resolve_threads,llm_classifier; validation rejects invalidmin_severityvalues.internal/taskrun—ParentTicketID,ReviewCommentID,ReviewThreadID,ReviewPRURLfields onTaskRunfor follow-up lifecycle tracking.internal/controller—reviewPollerfield,WithReviewPolleroption,handleFollowUpComplete(posts ticket comment, replies to review comment, resolves thread),processFollowUpTask(builds and launches a K8s follow-up Job), andscmForhelper; drain inreconcileOnce; register inhandleJobComplete.cmd/robodev/main.go— review response subsystem wiring underreview_response.enabled.tests/integration/review_response_test.go— 9 integration tests covering bot-ignore, requires-action, informational, follow-up emission, processed-ID idempotency, max-follow-up limit, merged-PR untracking, reply-on-action, and config validation.
E2E Workflow Pipeline Tests¶
hack/fake-agent/— standalone Go module with a pure-stdlib fake agent binary. ReadsROBODEV_SCENARIOand emits a pre-scripted NDJSON event stream, then exits with the appropriate code. Scenarios:success,loop,thrash,fail,tournament_a,tournament_b,judge. Built into a scratch container image (UID 10000, read-only root FS) viahack/fake-agent/Dockerfile.tests/e2e/workflow_helpers_test.go— shared helpers for workflow E2E tests:workflowFakeEngine,workflowSecondEngine,mockWorkflowTicketing,testLogWriter, and utility functions (ensureNamespace,waitForTicketComplete,waitForTicketFailed,runReconcilerInBackground, etc.).tests/e2e/workflow_test.go— seven end-to-end workflow tests that exercise the full pipeline against a real kind cluster (real K8s Jobs, real pod log streaming):TestWorkflowHappyPath— ticket → Job → NDJSON stream →StateSucceededTestWorkflowJobFailure— non-zero exit → retry exhaustion →StateFailedTestWorkflowEngineChainFallback— primary fails → fallback engine succeedsTestWorkflowPRMHintDelivery— looping agent triggers PRM nudge interventionTestWorkflowWatchdogTermination— cost-thrashing agent terminated by watchdogTestWorkflowSequentialTasksMemory— episodic memory injected into second taskTestWorkflowTournamentEndToEnd— 2 candidates + 1 judge → winner selected- Makefile — new targets
fake-agent-image,fake-agent-load, ande2e-workflow-test(FAKE_AGENT_IMAGEvariable; runsTestWorkflow*suite).
SQLite Persistence for Routing, Estimator, and Watchdog Stores¶
internal/routing/sqlite.go—SQLiteFingerprintStoreimplementingFingerprintStore; schema:fingerprints(engine_name PK, data_json, updated_at); uses a shadow JSON struct to avoid serialising the embeddedsync.RWMutexinEngineFingerprint.internal/estimator/sqlite.go—SQLiteEstimatorStoreimplementingEstimatorStore; schema:prediction_outcomeswith index onengine; kNN similarity search runs in Go using Euclidean distance over the complexity feature vector.internal/watchdog/sqlite.go—SQLiteProfileStoreimplementingProfileStore; schema:calibrated_profileswith composite primary key(repo_pattern, engine, task_type).- All three stores use WAL mode,
INSERT OR REPLACEupserts, andmodernc.org/sqlite(no CGO). - Config:
StoragePathadded toRoutingConfigandEstimatorConfig;CalibrationStorePathadded toAdaptiveCalibrationConfig. When empty the existing in-memory store is used. cmd/robodev/main.goconditionally constructs SQLite stores when the path is non-empty.- Unit tests (
sqlite_test.goin each package) cover save/get round-trips, upsert semantics, full-scanList, and persistence across close/reopen usingt.TempDir().
Security Hardening¶
- Memory graph tenant isolation (
internal/memory/store.go): ListNodesnow accepts atenantID stringparameter; an empty string returns all nodes (administrative / startup use only).DeleteNodenow accepts atenantID stringparameter and rejects cross-tenant deletes with an explicit error.SaveEdgevalidates that both endpoint nodes share the sametenant_id; cross-tenant edges are rejected before any write.nodeTenantIDhelper returns("", nil)for nodes absent from the store so that edges to unregistered (external) nodes are still accepted.internal/memory/graph.goupdated:LoadFromStorepasses""to load all tenants;PruneStalepasses each node's own tenant ID when callingDeleteNode.- Adversarial tests added for cross-tenant
ListNodes,DeleteNode, andSaveEdge. - Diagnosis prompt injection (
internal/diagnosis/): - New
sanitise.goexportssanitiseForPrompt(s string, maxLen int) string— strips ASCII control characters (< 0x20 except\nand\t) and truncates tomaxLen. analyser.gocallssanitiseForPromptonWatchdogReasonandResult.Summarybefore building the classifier text.retry_builder.gowraps the corrective prescription in XML-style delimiters (<previous-attempt-output>) with an explicit instruction not to follow injected content.- Tournament judge prompt injection (
internal/tournament/judge.go): - Each candidate diff is wrapped with
<!-- CANDIDATE-DIFF-BEGIN -->/<!-- CANDIDATE-DIFF-END -->comment markers. - Instructions section clarified: treat diff content as data only; correctness outweighs cost efficiency.
- Config validation (
internal/config/validate.go,internal/config/config.go): - New
validate.gowithvalidateNonNegativeFloat,validatePositiveInt,validateFraction, andvalidateStorePathhelpers. Config.Validate()checksGuardRailsbounds,Routing.EpsilonGreedyin [0, 1],Memory.PruneThresholdin [0, 1],CompetitiveExecution.DefaultCandidates≥ 2 when non-zero,PRM.HintFilePath, and all newStoragePathfields for path-traversal safety.Load()callscfg.Validate()before returning; invalid configuration is rejected at startup.
LLM V2 — Optional LLM-backed Scoring, Extraction, and Classification¶
- Rate-limited LLM client (
internal/llm/ratelimited.go): RateLimitedClientwraps anyClientand enforces a minimum gap between requests (1s / rps). Thread-safe viasync.Mutex.NewRateLimitedClient(inner Client, rps float64) *RateLimitedClient.- PRM LLM scorer (
internal/prm/scorer_llm.go): LLMScoreruses aChainOfThoughtmodule (inputs:tool_calls,task_description; outputs:scoreint 1–10,reasoning) and falls back to the rule-basedScoreron error or out-of-range score.NewLLMScorer(client llm.Client, fallback *Scorer) *LLMScorer.- Config:
UseLLMScoring booladded toPRMConfig; wired inmain.go. - Memory LLM extractor (
internal/memory/extractor_llm.go): LLMExtractorcalls an LLM module (inputs:task_description,outcome,summary,engine,repo_url; outputs:facts,patternsJSON arrays) and merges results with the V1 rule-based extractor. Falls back to V1 on error.NewLLMExtractor(client llm.Client, v1 *Extractor, logger *slog.Logger) *LLMExtractor.- Config:
UseLLMExtraction booladded toMemoryConfig; wired inmain.go. - Diagnosis LLM analyser (
internal/diagnosis/analyser_llm.go): LLMAnalyserclassifies failure modes via LLM (inputs:watchdog_reason,result_summary,tool_call_count,files_changed; outputs:failure_mode,confidence,evidence). Unknown mode strings trigger fallback to the rule-basedAnalyser.NewLLMAnalyser(client llm.Client, fallback *Analyser, logger *slog.Logger) *LLMAnalyser.- Config:
UseLLMClassification booladded toDiagnosisConfig; wired inmain.go.
Integration Tests (tests/integration/)¶
subsystems_test.go(//go:build integration) — 8 scenario tests:TestPRMInterventions— 10 identical tool calls →ActionNudgereturned.TestMemoryAccumulation— extract from 10 fake task runs →QueryForTaskreturns non-empty context.TestWatchdogCalibrationConverges— 15 observations →RefreshProfile→ calibrated thresholds differ from defaults.TestDiagnosisOnLoopingCalls— looping tool calls →ModelConfusiondiagnosis; retry prompt contains injection-defence delimiters.TestRoutingConvergence— 20 outcomes (claude-code 90%, aider 25%) →SelectEnginespicks claude-code as primary ≥ 8/10 times.TestCostEstimatorAccuracy— 10 outcomes →Predictreturns cost within 3× of mean.TestTournamentFlow—StartTournament→ 2×OnCandidateComplete→BeginJudging→SelectWinner→ assertsStateSelected.TestTournamentJudgeInjectionDefense— prompt containsCANDIDATE-DIFF-BEGINmarkers and injection instruction.all_features_test.go(//go:build integration) —TestAllFeaturesEnabledsmoke test: wires PRM + memory + watchdog + routing + estimator + diagnosis + transcript + tournament into aReconciler, processes two tickets, and asserts no panics with all task runs reaching a terminal state.
PRM Hint File Writer (internal/controller/controller.go)¶
writeHintFile(ctx, taskRunID, content string) error— delivers PRM hint content directly to the running agent pod via the Kubernetes exec API (remotecommand.NewSPDYExecutor). Usesteerather than a shell script to avoid injection. Caches pod names from the stream reader to avoid repeated pod lookups.cleanupHintFile(ctx, taskRunID string)— best-effortrm -fof the hint file on task completion (5 s timeout, errors are logged but not propagated).cleanupPodName(taskRunID string)— removes the cached pod name entry after task completion.validateHintPath(path string) error— rejects paths containing..components to prevent path traversal; called before every exec operation.recordPRMHintupdated to deliver hints asynchronously (10 s timeout goroutine) instead of only logging.buildK8sClient()inmain.gonow returns*rest.Configalongside the client;WithRestConfig(cfg)option passes it to the controller.
Tournament Coordinator Wiring (internal/controller/controller.go, cmd/robodev/main.go)¶
ProcessTicketnow detects tournament-eligible tasks: whentournamentCoordinatoris set,competitive_execution.enabled=true,default_candidates >= 2, andlen(engines) >= 2, it callslaunchTournamentinstead of launching a single job.launchTournament(ctx, ticket)— creates N parallel candidateTaskRuns + K8s Jobs, registers the tournament withtournament.Coordinator.StartTournament, records role/ID maps, and starts stream readers for claude-code candidates. The first candidate is stored under the standard idempotency key (ticketID-1) to prevent double-processing.handleCandidateComplete(ctx, tr, tournamentID)— transitions the candidate to Succeeded, callsOnCandidateComplete, and triggerslaunchJudgewhen the early-termination threshold is met. Lagging candidates' stream readers are cancelled before judging begins.launchJudge(ctx, tournamentID)— pre-generates the judge task run ID, callsBeginJudgingatomically (preventing duplicate judges under concurrent candidate completions), builds a judge prompt viatournament.JudgePromptBuilder, and launches the judge K8s Job.handleJudgeComplete(ctx, tr, tournamentID)— parses aJudgeDecisionJSON object from the judge's result summary (extracting the first{...}block to tolerate surrounding prose), callsSelectWinner, marks the ticket complete with the winner's result, and sends notifications. Defaults to candidate 0 when parsing fails.handleJobCompleteandhandleJobFailedboth dispatch viataskRunRolemap to the above handlers before normal flow; both callcleanupHintFileandcleanupPodNameon completion.WithTournamentCoordinator(c)option wired inmain.gowhenconfig.CompetitiveExecution.Enabledis true.go.sumupdated forgithub.com/gorilla/websocket,github.com/moby/spdystream, andgithub.com/mxk/go-flowrate(transitive deps ofk8s.io/client-go/tools/remotecommand).
Item 23 — Skills, Subagents, and Per-Task MCP Plugins¶
Custom skills for Claude Code (pkg/engine/claudecode/skills.go, pkg/engine/claudecode/engine.go)
- New Skill struct with Name, Inline, and Path fields
- SkillEnvVars(skills []Skill) map[string]string converts skills to container environment
variables: inline skills are base64-encoded into CLAUDE_SKILL_INLINE_<NAME>; path-based
skills use CLAUDE_SKILL_PATH_<NAME>
- WithSkills(skills []Skill) functional option on ClaudeCodeEngine; skill env vars are
injected into ExecutionSpec.Env during BuildExecutionSpec
Configuration (internal/config/config.go)
- New SkillConfig struct with Name, Path, Inline fields added to
ClaudeCodeEngineConfig as Skills []SkillConfig
- MCPServers []string added to TaskProfileConfig for future per-profile MCP server overrides
Container startup (docker/engine-claude-code/setup-claude.sh)
- setup-claude.sh now reads CLAUDE_SKILL_INLINE_* and CLAUDE_SKILL_PATH_* env vars and
writes the corresponding Markdown files to ~/.claude/skills/<name>.md before starting
the agent, making them available as /skill-name invocations
Main wiring (cmd/robodev/main.go)
- cfg.Engines.ClaudeCode.Skills translated to claudecode.Skill{} slice and passed to
claudecode.New(claudecode.WithSkills(...)) at startup
Tests (pkg/engine/claudecode/skills_test.go, engine_test.go)
- 8 unit tests covering inline encoding, path skills, mixed skills, empty skill handling,
base64 round-trip, env var key generation, and execution spec integration
Phase 2 Wiring — Diagnosis, Calibration, Routing, Estimator, SCM Router, Transcript¶
Diagnosis subsystem (internal/diagnosis/, internal/controller/controller.go)
- WithDiagnosis(analyser, retryBuilder) option wires the causal failure analyser into the controller
- handleJobFailed now runs Analyser.Analyse before deciding whether to retry; ShouldRetry skips retry when the same failure mode recurs or when InfraFailure is diagnosed; RetryBuilder.Build enriches the retry prompt with corrective instructions and optionally switches engine based on diagnosis
- DiagnosisRecord appended to tr.DiagnosisHistory on every diagnosis cycle
- Fixed: launchRetryJob method added — same-engine retries were previously left in StateRetrying indefinitely with no new job being created
Watchdog + adaptive calibration (internal/watchdog/, internal/controller/controller.go)
- WithWatchdog(wd) and WithWatchdogCalibration(cal, pr) options wire the watchdog and calibration pipeline
- Watchdog background loop started inside Reconciler.Run via wd.Start
- ConsumeStreamEvent wired into the stream reader for real-time telemetry updates
- Per-task-run heartbeat tracking (heartbeats, heartbeatSeqs maps) feeds watchdog.Check
- recordTaskOutcome calls calibrator.Record + profileResolver.RefreshProfile after every terminal task run
- main.go initialises Calibrator, MemoryProfileStore, ProfileResolver and uses NewWithCalibration constructor
Intelligent routing (internal/routing/, internal/controller/controller.go)
- WithIntelligentSelector(sel) option; recordTaskOutcome calls RecordOutcome on completion
- main.go initialises MemoryFingerprintStore + IntelligentSelector when routing.enabled: true
Cost/duration estimator (internal/estimator/, internal/controller/controller.go)
- WithEstimator(predictor, scorer) option; ProcessTicket calls ComplexityScorer.Score + Predictor.Predict before job creation; auto-rejects when predicted cost exceeds the configured threshold
- recordTaskOutcome calls estimatorPredictor.RecordOutcome on every terminal task run
- main.go initialises with in-memory store when estimator.enabled: true
Multi-SCM routing — item 22 (internal/scmrouter/)
- New internal/scmrouter package: Router.For(repoURL) selects the correct backend by matching the repo URL host against configured glob patterns; falls back to the first backend when no pattern matches
- WithSCMRouter(router) option on Reconciler
- main.go: checks cfg.SCM.Backends array first; falls back to legacy single-backend cfg.SCM.Backend for backwards compatibility
- SCMBackendEntry struct and Backends []SCMBackendEntry added to SCMConfig in internal/config/config.go
Transcript storage — item 21 (pkg/plugin/transcript/)
- New pkg/plugin/transcript package: TranscriptSink interface with Append and Flush
- New pkg/plugin/transcript/local: LocalSink writes NDJSON audit files (one .jsonl per task run) to a configured directory
- WithTranscriptSink(sink) option on Reconciler; stream reader wires Append as an event processor; transcript flushed when stream forwarder exits
- AuditConfig + TranscriptConfig added to internal/config/config.go; local backend activated when audit.transcript.backend: local and path are set
Agent Stream Logging Processor (internal/agentstream/logging.go)¶
- New
NewLoggingEventProcessor(logger *slog.Logger) StreamEventProcessor— logs each NDJSON stream event as a human-readable structuredslogline in the controller's own logs, giving operators a clean view of agent activity without touching (or losing) the raw pod logs - Log format per event type:
tool_use→ INFO"agent tool call"withtoolandinput(first 80 chars of args);content→ DEBUG"agent content"withroleonly (content text deliberately elided to avoid logging LLM output);result→ INFO"agent result"withsuccess,summary,mr_url;cost→ INFO"agent cost"with token counts and USD; system/unknown → DEBUG"agent system event" - Comprehensive table-driven tests covering all event types, nil-Parsed guards, content elision at both INFO and DEBUG levels, and the 80-character args truncation boundary
Multi-Workflow Shortcut Support (pkg/plugin/ticketing/shortcut/shortcut.go)¶
- New exported
WorkflowMappingstruct pairing a trigger state name with an in-progress state name - New
WithWorkflowMappings(mappings []WorkflowMapping) Option— configures multiple trigger states;PollReadyTicketsissues one API call per trigger state and merges/deduplicates results by story ID InitresolvestriggerStateIDfor every mapping; backward-compatible: singleWithWorkflowStateName/WithInProgressStateNamepair is automatically synthesised into a single mappingMarkInProgressnow selects the correct in-progress state by matching the story's currentworkflow_state_idagainst the mapping whose trigger state it was found in; falls back to the first mapping with a warning if no match is foundWorkflowStateID()returns the first mapping's resolved trigger state ID (for webhook filtering)- New
WorkflowMappings() []WorkflowMappingaccessor - New
pollQueryhelper to avoid duplicating query-build/parse logic across multiple trigger states cmd/robodev/main.goinitShortcutBackendreads theworkflowsarray from the Shortcut config map and callsWithWorkflowMappings; the legacyworkflow_state_name/in_progress_state_namekeys continue to work unchanged- New tests: multi-mapping Init, multi-workflow poll merge, overlap deduplication, mapping selection in
MarkInProgress
Configurable Code Review Gate (internal/config/config.go, internal/controller/controller.go)¶
- New
CodeReviewConfigstruct withenabled,backend,wait_for_comments,timeout_minutes,token_secretfields; added ascode_review:top-level key inConfig - Controller
handleJobCompletenow respectsconfig.CodeReview.Enabled: whenfalse(the default) no review wait occurs; whentrueand areviewBackendis wired, callsReviewDiffwith a configurable timeout (default 15 minutes) and logs the outcome before proceeding - New
ShortcutWorkflowstruct (trigger_state,in_progress_state) exported frominternal/config/for typed config parsing
Roadmap (docs/roadmap.md)¶
- Added Phase K (near-term) items 21–23: Transcript Storage & Audit Log, Multi-SCM Backend Routing, Skills/Subagents/Per-Task MCP Plugins
- Added Phase L (longer-term) items 24–25: Non-Standard Task Types (requires design doc), Supervisor Agent (requires design doc)
- Updated summary table
All Plugin Backends Wired into Controller (cmd/robodev/main.go)¶
- Linear ticketing:
initLinearBackendreads token secret, team_id, state_filter, labels, and exclude_labels from config - Discord notifications:
initDiscordChannelreads webhook_url from config - Telegram notifications:
initTelegramChannelreads token secret, chat_id, and optional thread_id from config - Slack approval:
initApprovalBackendreads channel_id and token secret; controller gainsWithApprovalBackend(approval.Backend)option - GitHub/GitLab SCM:
initSCMBackendreads token secret and optional base_url; controller gainsWithSCMBackend(scm.Backend)option - CodeRabbit review:
initReviewBackendreads api_key_secret; controller gainsWithReviewBackend(review.Backend)option - K8s/Vault secrets resolver:
initSecretsResolveriteratescfg.SecretResolver.Backendsand registers each backend by scheme; controller gainsWithSecretsResolver(*secretresolver.Resolver)option - Aider/Codex engines: wired in when
cfg.Engines.Aider/cfg.Engines.Codexare non-nil config.ApprovalConfigstruct added to top-levelConfig(mirrors ReviewConfig / SCMConfig pattern)- Notifications loop refactored from if/else chain to switch so Discord and Telegram have equal first-class handling
- Startup wiring test added (
cmd/robodev/init_test.go, 20 tests) — calls every init function directly with a fake Kubernetes client to verify that all supported backend strings reach their init function rather than falling through to the unsupported-backend error branch
Shortcut Backend Wired into Controller (cmd/robodev/main.go)¶
initShortcutBackendhelper function readstoken_secret,workflow_state_name,in_progress_state_name,owner_mention_name, andexclude_labelsfrom config, fetches the API token from a Kubernetes Secret, and callsInit()to resolve human-readable names to Shortcut API identifiers at startup"shortcut"case added to the ticketing backend selection block — previously only"github"was wired in- Webhook server automatically configured with
WithShortcutTargetStateIDwhen the Shortcut backend is active, so only story transitions to the trigger state generate webhook events
Fixed¶
- Add environment variable stripping to Codex, Cline, and OpenCode engine entrypoints (previously only Claude Code had it)
- Fix mermaid diagram in
docs/concepts/prm.mdthat failed to render on GitHub (replaced≥/<with safe alternatives) - Fix sandbox status contradiction in
docs/roadmap.md(summary said not started, but Phase C items were all complete) - Update Docker Compose quickstart and Documentation Site status in
docs/roadmap.mdto reflect shipped work - Sync all 13 items in
docs/improvements-plan.mdwith actual implementation status (12 items were marked "Planned" despite being implemented)
Added¶
DSPy-Inspired LLM Abstraction (internal/llm/)¶
- New
internal/llm/package providing a typed, composable LLM abstraction inspired by DSPy Signaturetype defining typed input/output fields for structured LLM interactionsModuleinterface withPredict(single LLM call) andChainOfThought(step-by-step reasoning) implementationsAdapterconverting signatures + inputs into formatted prompts and parsing structured JSON responsesClientinterface withAnthropicClientimplementation usingnet/httponly (no external SDK)Budgettracker with per-subsystem spend limits, token counting, and concurrent-safe access- Full table-driven unit test suite for all types, modules, adapter, client (with
httptestserver), and budget
PRM and Memory Controller Integration¶
- PRM wired into controller:
WithPRMConfigreconciler option, PRM evaluator lifecycle (creation instartStreamReader, cleanup inhandleJobComplete/handleJobFailed), real-time event processing viaWithEventProcessor, hint recording with Prometheus metrics - Memory wired into controller:
WithMemoryreconciler option accepting graph, extractor, and query engine; knowledge extraction on task completion and failure via background goroutines; memory context query before execution spec building - Memory context in prompts:
MemoryContextfield onengine.Task, propagated throughBuildPrompt,BuildPromptWithProfile, andBuildPromptWithTeamsin the prompt builder template - Memory initialisation in main.go: SQLite store opening, graph loading, extractor and query engine creation, background decay goroutine with configurable interval and prune threshold, graceful store close on shutdown
- PRM initialisation in main.go: Config mapping from
config.PRMConfigtoprm.Config, logger wiring - Backwards compatible: PRM disabled by default (
prm.enabled: false), Memory disabled by default (memory.enabled: false); no behavioural change when features are off - New controller unit tests:
TestWithPRMConfig,TestWithMemory,TestProcessTicketWithMemoryContext,TestHandleJobCompleteWithMemory,TestHandleJobFailedWithMemory - New integration tests:
tests/integration/prm_controller_test.go(PRM wiring, disabled no-op),tests/integration/memory_controller_test.go(memory extraction wiring, context injection, disabled no-op)
Bleeding-Edge Agentic Engineering Features (Scaffolding)¶
Note: PRM and Memory are now wired into the controller and functional when enabled. The remaining five features (Diagnosis, Routing, Estimator, Tournament, Adaptive Watchdog) have complete packages with types, core logic, unit tests, and integration tests, but are not yet wired into the controller. See
docs/roadmap.mdPhase I for the full integration plan.
Controller-Level Process Reward Model (Feature 2) — Real-Time Agent Coaching¶
- New
internal/prm/package: rule-based step scoring from tool call patterns, trajectory tracking with pattern detection (sustained decline, plateau, oscillation, recovery), and intervention decision logic (continue/nudge/escalate) Scorerevaluates rolling windows of tool calls: penalises repetition, rewards productive patterns (read→edit, edit→test), and tracks tool diversityTrajectorydetects patterns across score history with configurable window lengthInterventionDecidertriggers soft nudges (hint file at/workspace/.robodev-hint.md) or watchdog escalation based on score thresholds and trajectory patternsEvaluatorties scoring, trajectory, and intervention into a single entry point wired into the streaming pipeline viaWithEventProcessorPRMConfigin controller config:evaluation_interval,window_size,score_threshold_nudge,score_threshold_escalate,hint_file_path,max_budget_usd- Prometheus metrics:
prm_step_scoreshistogram,prm_interventions_totalcounter by action,prm_trajectory_patterns_totalcounter by pattern
Cross-Task Episodic Memory with Temporal Knowledge Graph (Feature 3)¶
- New
internal/memory/package: persistent knowledge graph accumulating structured lessons from every TaskRun across all engines, repos, and tenants - Node types:
Fact(with temporal validity and confidence decay),Pattern(recurring observations),EngineProfile(per-engine capability data) - Edge relations:
relates_to,contradicts,supersedeswith weighted connections - SQLite-backed storage (
modernc.org/sqlite, pure Go) with auto-migration on startup - Heuristic knowledge extractor: harvests facts from task outcomes, watchdog events, engine fallbacks, and tool call patterns
- Temporal query engine: retrieves relevant facts weighted by recency, confidence, and decay rate with cross-tenant isolation
MemoryConfigin controller config:store_path,decay_interval_hours,prune_threshold,max_facts_per_query,tenant_isolation- Prometheus metrics:
memory_nodes_totalgauge by type,memory_queries_totalcounter,memory_extractions_totalcounter by outcome,memory_confidence_distributionhistogram
Self-Healing Retry with Causal Diagnosis (Feature 4)¶
- New
internal/diagnosis/package: structured failure diagnosis pipeline replacing blind retries with informed corrective action - Failure mode classifier:
WrongApproach,DependencyMissing,TestMisunderstanding,ScopeCreep,PermissionBlocked,ModelConfusion,InfraFailure— rule-based pattern matching on tool call sequences, event content, and watchdog reasons - Template-based prescription generator: per-failure-mode corrective instructions using safe
text/template(prevents prompt injection from agent output) - Retry builder: composes original prompt + diagnosis prescription + optional engine switch recommendation
DiagnosisHistory []DiagnosisRecordfield on TaskRun prevents repeating the same diagnosisDiagnosisConfigin controller config:max_diagnoses_per_task,enable_engine_switch- Prometheus metrics:
diagnosis_totalcounter by failure mode,diagnosis_engine_switches_totalcounter,diagnosis_retry_success_totalcounter
Adaptive Watchdog Calibration (Feature 5)¶
- New
internal/watchdog/calibrator.go: running percentile statistics (P50, P90, P99) per (repo pattern, engine, task type) for key telemetry signals — token consumption rate, tool call frequency, file change rate, cost velocity, duration, consecutive identical tools - New
internal/watchdog/profiles.go: calibrated threshold profiles with cold-start fallback (minimum 10 completed TaskRuns before overriding static defaults), best-match profile resolution (exact → partial → global → static) AdaptiveCalibrationConfigin watchdog config:min_sample_count,percentile_threshold(p50/p90/p99),cold_start_fallback- Prometheus metrics:
watchdog_calibrated_thresholdgauge by (signal, repo_pattern, engine, task_type),watchdog_calibration_samplesgauge,watchdog_calibration_overrides_totalcounter
Engine Fingerprinting and Intelligent Task Routing (Feature 6)¶
- New
internal/routing/package: statistical engine profiles built from historical task outcomes, replacing static fallback chains with data-driven routing EngineFingerprintwith Laplace-smoothed success rates across dimensions: task type, repo language, repo size, complexityIntelligentSelectorimplementingEngineSelectorinterface: weighted composite scoring with epsilon-greedy exploration (default ε=0.1) to discover new engine capabilities- In-memory fingerprint store with thread-safe concurrent access
RoutingConfigin controller config:epsilon_greedy,min_samples_for_routing,store_path- Prometheus metrics:
routing_engine_selected_totalcounter by engine,routing_exploration_totalcounter,routing_fingerprint_samplesgauge by engine,routing_success_rategauge by (engine, dimension, value)
Predictive Cost and Duration Estimation (Feature 7)¶
- New
internal/estimator/package: pre-execution cost ($) and duration (minutes) prediction based on task complexity and historical data - Multi-dimensional complexity scorer: description complexity, label complexity, repo size, task type complexity → weighted composite score
- k-nearest-neighbours predictor (k=5): finds similar historical tasks, returns [P25, P75] cost/duration ranges with confidence based on sample count
- Cold-start defaults per engine when insufficient historical data
- New guard rail:
max_predicted_cost_per_job— auto-reject tasks predicted to exceed cost threshold EstimatorConfigin controller config:max_predicted_cost_per_job,default_cost_per_engine,default_duration_per_engine- Prometheus metrics:
estimator_predictions_totalcounter,estimator_predicted_costhistogram,estimator_auto_rejections_totalcounter,estimator_prediction_accuracyhistogram
Competitive Execution with Tournament Selection (Feature 1)¶
- New
internal/tournament/package: launch N parallel K8s Jobs with different engines/strategies, judge results, and select the best solution Coordinatormanages tournament lifecycle: start parallel candidates, track completions, trigger early termination, launch judge, select winnerJudgeBuilderconstructs judge Jobs with side-by-side diff comparison prompts and structured JSON output- Tournament-aware TaskRun fields:
TournamentID,CandidateIndex,TournamentState(Competing/Judging/Selected/Eliminated) CompetitiveExecutionConfigin controller config:default_candidates,judge_engine,early_termination_threshold,max_concurrent_tournaments- Prometheus metrics:
tournament_totalcounter,tournament_candidates_totalcounter by engine,tournament_winner_engine_totalcounter by engine,tournament_cost_totalhistogram,tournament_duration_secondshistogram
Strategic Roadmap Phase A-E (Items 1-8)¶
Enhanced Claude Code Engine (Item 1)¶
DefaultTaskResultSchemaconst with JSON schema for structured task result output- Functional options pattern:
WithFallbackModel,WithToolWhitelist,WithJSONSchema - Conditional CLI flag construction:
--output-format stream-json,--json-schema,--fallback-model,--no-session-persistence,--append-system-prompt,--allowedTools,--disallowedTools - Extended
EngineConfigwithFallbackModel,ToolWhitelist,ToolBlacklist,JSONSchema,AppendSystemPrompt,NoSessionPersistence,StreamingEnabledfields - Extended
ClaudeCodeEngineConfigwith matching YAML fields
Real-Time Agent Streaming (Item 2)¶
- New
internal/agentstream/package: NDJSON event types (ToolCallEvent,ContentDeltaEvent,CostEvent,ResultEvent), K8s pod log stream reader, event forwarder to watchdog and notifications - Watchdog
ConsumeStreamEventmethod for live tool call tracking, cost monitoring, and heartbeat updates from streaming agents - Controller stream reader lifecycle: starts per active Claude Code job, cancels on completion/failure
StreamingConfigwithEnabledandLiveNotificationsfields--verboseflag added when streaming is enabled for richer event data
Engine Fallback Chains (Item 3)¶
EngineSelectorinterface withDefaultEngineSelectorimplementation- Ticket label override (
robodev:engine:<name>) for per-ticket engine selection FallbackEngines []stringinEnginesConfigfor ordered fallback chainEngineAttemptsandCurrentEnginetracking on TaskRun- Automatic fallback to next engine in
handleJobFailedbefore exhausting retries launchFallbackJobhelper with unique job IDs per attempt
Agent Sandbox Integration (Item 4)¶
SandboxBuilderimplementingJobBuilderwith gVisor/Kata RuntimeClass, SandboxClaim annotations, and warm pool labelsExecutionConfigwithBackend("job"/"sandbox"/"local") andSandboxConfig(runtime class, warm pool, env stripping)- Helm templates:
runtimeclass-gvisor.yamlandsandboxwarmpool.yaml(gated bysandbox.enabled) - Environment variable stripping in Claude Code entrypoint (guarded by
ENV_STRIPPING)
Multi-Agent Coordination Phase 1 (Item 5)¶
TeamsConfigwith enabled, mode, and max_teammatesTeamsFlagsgenerating--teammate-modeCLI flagTeamsEnvVarssettingCLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMSandCLAUDE_CODE_MAX_TEAMMATESWithTeamsConfigfunctional option on Claude Code engine- Team coordination prompt section in
BuildPromptWithTeams
TDD Workflow Mode (Item 6)¶
WorkflowInstructions()function returning structured instructions for "tdd" and "review-first" modesTaskProfileConfigwithWorkflow,ToolWhitelist,ToolBlacklistfields- Workflow instructions injected between task description and guard rails in prompt template
BuildPromptWithProfilemethod for profile-aware prompt construction
Approval Workflows and Audit Trail (Item 7)¶
TaskRunStoreinterface withSave,Get,List,ListByTicketIDmethodsMemoryStorethread-safe in-memory implementation- Pre-start and pre-merge approval gates in controller
- Queued → NeedsHuman state transition for approval holds
- Slack approval callback parsing (
robodev_approve_*/robodev_reject_*actions) ApprovalGatesandApprovalCostThresholdUSDin guard rails config
Local Development Mode (Docker Compose)¶
- DockerBuilder (
internal/jobbuilder/docker.go) implementingcontroller.JobBuilderfor local Docker execution — produces K8s Job objects annotated withrobodev.io/execution-backend: local - Builder selection in
cmd/robodev/main.go: readsexecution.backendconfig to choose between standard ("job"), sandbox, or local Docker builder - Noop ticketing file-watcher mode:
NewWithTaskFileconstructor reads tasks from a local YAML file, enabling local development without a real ticketing provider FileTaskstruct for YAML task definitions with ID, title, description, repo URL, and labelsdocker-compose.yamlfor running the controller outside Kubernetes with config and workspace volume mounts- Makefile targets:
compose-upandcompose-downfor Docker Compose lifecycle - Table-driven tests for DockerBuilder (backend annotation, security context, volumes, env vars, resources)
- Tests for noop file-watcher (valid YAML, empty file, malformed YAML, missing file, in-progress filtering)
Governance: Approval Workflows, Audit Trail, and TaskRun Store¶
- TaskRunStore interface (
internal/taskrun/store.go) withSave,Get,List, andListByTicketIDmethods for persistent TaskRun storage - MemoryStore in-memory implementation of TaskRunStore with thread-safe operations
- Approval gates configuration:
approval_gateslist andapproval_cost_threshold_usdin GuardRailsConfig (values: "pre_start", "high_cost", "pre_merge") - Pre-start approval gate: when "pre_start" is configured, TaskRun transitions Queued → NeedsHuman instead of launching a K8s Job, requiring human approval before execution begins
- Pre-merge approval gate: when "pre_merge" is configured, completed jobs transition Running → NeedsHuman instead of marking the ticket complete, requiring human approval before merge
- TaskRunStoreConfig with backend selection ("memory", "sqlite", "postgres") and SQLite path configuration — future durable backends can implement the TaskRunStore interface
- Reconciler now persists TaskRun state to the store after every state transition (creation, approval hold, running, completion, failure, retry)
WithTaskRunStorereconciler option; defaults to MemoryStore when not provided- Slack webhook handler now parses
robodev_approve_*androbodev_reject_*action IDs from approval callbacks, extracting task run IDs and logging structured approval/rejection events (stub for future resolution wiring) - Queued → NeedsHuman added as a valid state transition in the TaskRun state machine
- Table-driven tests for MemoryStore (save/get, list, filter by ticket ID, not-found error)
- Controller tests for pre-start approval gate blocking job creation, pre-merge gate holding completion, and store persistence verification
- Config loading tests for new governance fields (approval gates, cost threshold, taskrun store backend)
Integration Test Suite¶
- Comprehensive three-tier integration test architecture: Tier 1 (E2E against Kind cluster), Tier 2 (in-process with fake K8s client), Tier 3 (in-process, no K8s)
- No-op ticketing backend (
pkg/plugin/ticketing/noop/) as fallback for webhook-only and test deployments — prevents nil-pointer panics when no ticketing backend is configured - Tier 3 tests: engine spec validation across all 5 engines, JobBuilder security hardening verification
- Tier 2 tests: full reconciler pipeline, webhook-to-reconciler integration, TaskRun state machine lifecycle, guard rails with real reconciler wiring, secret resolver parsing and policy enforcement
- Tier 1 E2E tests: webhook signature validation (GitHub HMAC-SHA256), container security contexts, Helm resource verification, NetworkPolicy rules, Prometheus metrics endpoints
- Test orchestration script (
hack/run-integration-tests.sh) producing markdown reports suitable forclaude -panalysis - Test Helm values overlay (
hack/values-test.yaml) with webhook secrets and NetworkPolicy enabled - Makefile targets:
deploy-test,integration-test,test-report,test-all
Webhook Receiver & Event-Driven Ingestion¶
- HTTP webhook server (
internal/webhook/) with route handlers for GitHub, GitLab, Slack, Shortcut, and a configurable generic handler - GitHub HMAC-SHA256 signature validation (
X-Hub-Signature-256), issue event parsing foropenedandlabeledactions - GitLab secret token validation (
X-Gitlab-Token), issue and merge request event parsing - Slack signing secret validation (
X-Slack-Signature) with 5-minute replay attack prevention - Shortcut webhook handler with optional HMAC signature validation and story state change parsing
- Generic webhook handler with configurable HMAC or bearer token auth and dot-notation JSON field mapping
- Webhook server wired into
cmd/robodev/main.gowith graceful shutdown support - New
WebhookConfigin controller configuration for per-source secrets
Task-Scoped Secret Resolution¶
- Secret resolver (
internal/secretresolver/) parsing<!-- robodev:secrets -->HTML comment blocks androbodev:secret:label prefixes - Policy engine validating environment variable names against allowed/blocked glob patterns, URI scheme restrictions, and tenant scoping
- Multi-backend resolver dispatching secrets by URI scheme (
vault://,k8s://,alias://) to registered backends - Structured audit logging (secret names only, never values) for compliance and debugging
- New
SecretResolverConfigandVaultSecretsConfigtypes in controller configuration
HashiCorp Vault Secrets Backend¶
- Vault backend (
pkg/plugin/secrets/vault/) implementingsecrets.Backendinterface - Kubernetes auth method: reads ServiceAccount token and authenticates to Vault's
/v1/auth/kubernetes/loginendpoint - KV v2 secret reads with token caching for performance
- Uses
net/httpdirectly — no external Vault client library dependency
OpenCode Execution Engine¶
- OpenCode engine (
pkg/engine/opencode/) implementingExecutionEnginefor the OpenCode CLI - Command:
opencode --non-interactive --message <prompt>, context file:AGENTS.md - Supports Anthropic, OpenAI, and Google model providers
- OpenCode engine Dockerfile (
docker/engine-opencode/) withnode:22-slimbase, OpenCode CLI, and non-root user - OpenCode entrypoint script with repo cloning, multi-provider support, and semantic exit codes
- Makefile targets:
docker-build-engine-opencode,docker-build-dev-engine-opencode
Cline Execution Engine¶
- Cline engine (
pkg/engine/cline/) implementingExecutionEnginefor the Cline CLI - Command:
cline --headless --task <prompt> --output-format json, context file:.clinerules - Supports Anthropic, OpenAI, Google, and AWS Bedrock model providers
- Optional MCP (Model Context Protocol) support via
WithMCPEnabledoption and--mcpflag - Cline engine Dockerfile (
docker/engine-cline/) withnode:22-slimbase, headless CLI, and non-root user - Cline entrypoint script with repo cloning, MCP toggle (
MCP_ENABLED), JSON output, and semantic exit codes - Makefile targets:
docker-build-engine-cline,docker-build-dev-engine-cline
Shortcut Ticketing Backend¶
- Shortcut.com backend (
pkg/plugin/ticketing/shortcut/) implementingticketing.Backend - REST API v3 integration with story search, label management, comments, and completion
- Auth via
Shortcut-Tokenheader, configurable workflow state ID and label filtering
Linear Ticketing Backend¶
- Linear backend (
pkg/plugin/ticketing/linear/) implementingticketing.Backend - GraphQL API integration with issue queries, state transitions, comments, and label management
- Auth via raw API key, configurable team ID, state filter, and label filtering
Telegram Notification Channel¶
- Telegram channel (
pkg/plugin/notifications/telegram/) implementingnotifications.Channel - Bot API
sendMessageendpoint with Markdown formatting and optional topic-based thread support - Builder options:
WithAPIURL,WithHTTPClient,WithThreadID
Discord Notification Channel¶
- Discord channel (
pkg/plugin/notifications/discord/) implementingnotifications.Channel - Webhook-based with colour-coded rich embeds (green success, red failure, blue info)
- No auth library needed — webhook URL contains the token
NetworkPolicy & Security Hardening¶
- Agent NetworkPolicy (
networkpolicy-agent.yaml): deny all ingress, egress allowed to DNS (53), HTTPS (443), SSH (22) only - Controller NetworkPolicy (
networkpolicy-controller.yaml): allow webhook and metrics ingress, egress to DNS, HTTPS, and K8s API - PodDisruptionBudget (
pdb.yaml): configurableminAvailable/maxUnavailable, defaults tominAvailable: 1 - All templates gated by
networkPolicy.enabledandpdb.enabledvalues - Webhook Service template (
service-webhook.yaml) exposing webhook port separately from metrics - Webhook Ingress template (
ingress.yaml) with className, annotations, hosts, paths, and TLS support - Webhook container port added to controller Deployment template (conditional on
webhook.enabled) - New Helm values:
webhook(withservice,ingresssub-config),networkPolicy,pdbsections
GitHub Backend Filtering¶
- GitHub ticketing backend now supports filtering by assignee, milestone, and issue state in addition to labels
- Added client-side label exclusion to prevent re-pickup of in-progress and failed issues (default:
in-progress,robodev-failed) - New functional options:
WithAssignee,WithMilestone,WithState,WithExcludeLabels - Labels filter is now optional — omitting it enables assignee-only or milestone-only workflows
- Refactored
PollReadyTicketsURL construction to useurl.Valuesfor safer query parameter encoding
Live End-to-End Testing¶
- Wired up
cmd/robodev/main.gowith full backend initialisation: K8s client, GitHub ticketing, Claude Code engine, job builder, and Slack notifications - Controller now reads secrets from Kubernetes at startup (GitHub token, Slack token) using config-driven secret references
- Added
hack/setup-secrets.shinteractive script for provisioning K8s secrets (GitHub token, Anthropic API key, Slack bot token) - Added
hack/values-live.yamlHelm values overlay for live testing with real backends (conservative guardrails: $10 cost cap, 30min timeout, max 2 jobs) - Added Makefile targets:
setup-secrets,live-deploy,live-up,live-redeployfor full live testing workflow - In-cluster and kubeconfig fallback for K8s client creation (supports both deployed and local dev)
Local Development & E2E Testing¶
- Kind cluster configuration (
hack/kind-config.yaml) with two-node topology and host port mapping - Local dev Helm values overlay (
hack/values-dev.yaml) withpullPolicy: Never, disabled leader election, and NodePort access - End-to-end smoke test suite (
tests/e2e/) covering deployment readiness, health endpoints, metrics, RBAC, and resource creation - Makefile targets for full local workflow:
check-prereqs,kind-create,kind-delete,docker-build-dev,kind-load,deploy,undeploy,local-up,local-down,local-redeploy,e2e-test,logs - Configurable service type in Helm chart (supports NodePort for local development)
- Local development workflow documentation in CONTRIBUTING.md
Phase 1: Core Framework & Abstractions¶
- Protobuf definitions for all 7 plugin interfaces: ticketing, notifications, approval, secrets, review, SCM, engine (plus shared common.proto)
- buf.yaml and buf.gen.yaml for protobuf linting and Go/gRPC code generation
- Makefile targets:
proto-lintandproto-genfor protobuf workflow - All plugin Go interfaces: ticketing.Backend, notifications.Channel, approval.Backend, secrets.Backend, review.Backend, scm.Backend
- gRPC plugin host (pkg/plugin/host.go) using hashicorp/go-plugin with version handshake, crash detection, and restart with exponential backoff
- JobBuilder (internal/jobbuilder) translating ExecutionSpec to K8s batch/v1.Job with security contexts, tolerations, and labels
- Progress watchdog (internal/watchdog) with anomaly detection rules: loop detection, thrashing, stall, cost velocity, telemetry failure
- Cost tracker (internal/costtracker) with per-engine token rates and budget checking
- Prompt builder (internal/promptbuilder) with guard rails injection and task profile support
- Controller reconciliation loop with ticket polling, idempotency, guard rails validation, job lifecycle management, and retry logic
- Health endpoints (/healthz, /readyz) and Prometheus metrics serving in main.go
- Graceful shutdown with signal handling (SIGTERM, SIGINT)
Phase 2: Claude Code Engine + GitHub + Slack¶
- Claude Code execution engine (pkg/engine/claudecode) with BuildExecutionSpec, BuildPrompt, and hooks generation
- Claude Code hooks system for guard rails: PreToolUse blockers, PostToolUse heartbeat, Stop handler
- GitHub Issues ticketing backend (pkg/plugin/ticketing/github) with REST API integration
- GitHub SCM backend (pkg/plugin/scm/github) for branch and pull request management
- Kubernetes Secrets backend (pkg/plugin/secrets/k8s) with secret retrieval and env var building
- Slack notification channel (pkg/plugin/notifications/slack) with Block Kit formatted messages
- Slack approval backend (pkg/plugin/approval/slack) with interactive messages and callback handling
Phase 2D: Dockerfiles & Helm Chart¶
- Multi-stage controller Dockerfile (golang:1.23-alpine builder to distroless runtime)
- Claude Code engine Dockerfile with Node.js, Claude Code CLI, git, gh, python3, guard rail hooks
- Claude Code entrypoint.sh with repo cloning, CLAUDE.md injection, and semantic exit codes
- Guard rail hooks: block-dangerous-commands.sh (PreToolUse/Bash) and block-sensitive-files.sh (PreToolUse/Write|Edit)
- OpenAI Codex engine Dockerfile with Node.js, Codex CLI, and full-auto entrypoint
- Helm chart: plugin init container support via values.yaml plugins array and shared emptyDir volume
- Helm chart: metrics Service, leader election flag, and post-install NOTES.txt
- Makefile targets: docker-build-controller, docker-build-engine-claude-code, docker-build-engine-codex, docker-build, helm-lint
- .dockerignore for clean container builds
Phase 3: Codex + Aider + GitLab + CodeRabbit¶
- OpenAI Codex execution engine (pkg/engine/codex) with prompt-based guard rails
- Aider execution engine (pkg/engine/aider) with conventions.md support
- Engine registry (pkg/engine/registry.go) for engine selection and management
- GitLab SCM backend (pkg/plugin/scm/gitlab) with merge request support
- CodeRabbit review backend (pkg/plugin/review/coderabbit) for quality gate integration
Phase 4: Agent Teams, Scaling & Multi-Tenancy¶
- Claude Code agent teams configuration (pkg/engine/claudecode/teams.go) with in-process mode
- Multi-tenancy support via namespace-per-tenant model in config
- Quality gate configuration with security checks
- Progress watchdog configuration in main config
- Extended engine configs with auth settings (API key, Bedrock, Vertex, setup-token)
- Karpenter NodePool example (examples/karpenter/) with spot instances and taints
- KEDA ScaledObject example (examples/keda/) for Prometheus-based scaling
- Example configurations: github-slack, gitlab-teams, enterprise (with full feature set)
- Example third-party plugins: Jira (Python) and Microsoft Teams (TypeScript)
- Grafana dashboard JSON (charts/robodev/dashboards/)
- Scaling documentation (docs/scaling.md)
Phase 5: Community & Documentation¶
- Comprehensive README with architecture diagram, quick start, and full feature overview
- Guard rails documentation (docs/guardrails.md) covering all six layers
- Plugin interface documentation: ticketing, notifications, secrets, engines
- Expanded CI pipeline: protobuf linting, Helm linting, Docker build verification
- Release workflow with cosign image signing and syft SBOM generation
- GitHub issue templates (bug report, feature request, plugin request)
- Pull request template with checklist
- Comprehensive architecture documentation with system diagrams and component details
- Full security documentation with threat model, defence in depth, and container hardening
- Getting started guide with step-by-step quick start, configuration reference, and troubleshooting
- Plugin development guide covering Go built-in and gRPC third-party plugin authoring
- Expanded plugin interface docs: ticketing (269 lines), notifications (246 lines), secrets (261 lines), engines (500 lines) with full RPC coverage, implementation guidance, and design considerations
Documentation Site (MkDocs Material)¶
- MkDocs Material documentation site with deep purple/amber theme, dark mode, search, and code copy
- Landing page with feature cards, Mermaid architecture diagram, and full project layout reference
- Dual quick start paths: Docker Compose (K8s newbies) and Kubernetes (experienced users)
- Four newcomer-facing concept pages: What is RoboDev?, TaskRun Lifecycle, Engines Explained, Guard Rails Overview
- Full configuration reference page documenting all config sections from
internal/config/config.go - Troubleshooting guide covering controller, agent, webhook, notification, and watchdog issues
- Mermaid diagrams: system architecture, TaskRun state machine, job lifecycle sequence, guard rail layers, engine decision tree
- Community section with contributing guide, code of conduct, security policy, roadmap, and changelog
- GitHub Actions workflow for automatic deployment to GitHub Pages on push to main
- Makefile targets:
docs-serveanddocs-buildfor local development - Admonition boxes added to plugin documentation for tips, warnings, and interface info
- Existing docs updated: ASCII diagrams replaced with Mermaid, internal links fixed for MkDocs compatibility
getting-started.mdsplit intogetting-started/subdirectory (kubernetes, docker-compose, configuration, troubleshooting)
Infrastructure¶
- Go module and core skeleton: controller entrypoint, config loading, TaskRun state machine, Prometheus metrics, ExecutionEngine interface
- CI pipeline with lint, test, build, proto-lint, helm-lint, docker-build jobs
- Helm chart with deployment, RBAC, ConfigMap, ServiceMonitor, Service templates
- Community files: CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md
- Comprehensive table-driven tests for all packages (20 test suites)