Skip to content

LLM Abstraction Layer

RoboDev includes a DSPy-inspired LLM abstraction package (internal/llm/) that provides typed, composable LLM interactions for all intelligent subsystems. Rather than making ad-hoc API calls, subsystems define signatures that declare their inputs and outputs, and the framework handles prompt formatting, response parsing, and cost tracking automatically.

Why a Dedicated LLM Layer?

RoboDev's subsystems (PRM scoring, memory extraction, causal diagnosis, tournament judging) all need to call LLMs, but with very different schemas and cost constraints. Without a shared abstraction, each subsystem would re-implement prompt formatting, JSON parsing, error handling, and budget enforcement. The internal/llm/ package provides this shared foundation.

Key benefits:

  • Type safety — signatures define expected input/output fields with types, catching mismatches at build time rather than runtime
  • Composability — modules can be chained (e.g. chain-of-thought wraps predict)
  • Budget enforcement — per-subsystem spend limits prevent runaway costs
  • No external dependencies — uses only net/http and encoding/json from the standard library
  • Testability — the Client interface makes it trivial to mock LLM calls in tests

Core Concepts

Signatures

A Signature defines a typed contract for an LLM interaction, inspired by DSPy:

sig := llm.Signature{
    Name:        "ScoreToolCall",
    Description: "Evaluate an agent's recent tool call for productivity",
    InputFields: []llm.Field{
        {Name: "tool_calls", Description: "recent tool call sequence", Type: llm.FieldTypeString, Required: true},
        {Name: "context",    Description: "task context",              Type: llm.FieldTypeString, Required: false},
    },
    OutputFields: []llm.Field{
        {Name: "score",     Description: "productivity score 1-10",   Type: llm.FieldTypeInt,    Required: true},
        {Name: "reasoning", Description: "explanation of the score",  Type: llm.FieldTypeString, Required: true},
    },
}

The signature is validated before use — it must have a name and at least one output field. Input fields marked as Required are checked at prompt-formatting time.

Field Types

Type Go Representation Description
string string Free-form text
int int Integer values
float float64 Floating-point values
bool bool Boolean values
json map[string]any Arbitrary JSON objects

The adapter automatically coerces LLM responses to the declared type (e.g. JSON 7.0 → Go int(7) for FieldTypeInt).

Modules

A Module is a composable LLM operation that takes named inputs and returns named outputs:

type Module interface {
    Forward(ctx context.Context, inputs map[string]any) (map[string]any, error)
    GetSignature() Signature
}

Two built-in modules are provided:

Predict

The simplest module — a single LLM call with signature-based prompt formatting:

predict := llm.NewPredict(sig, client, budget)
result, err := predict.Forward(ctx, map[string]any{
    "tool_calls": "Read main.go, Edit main.go, Bash go test",
    "context":    "fixing a nil pointer dereference",
})
// result["score"] = 8  (int)
// result["reasoning"] = "productive sequence: read, edit, test"  (string)

ChainOfThought

Wraps Predict with an additional "reasoning" step that encourages step-by-step thinking:

cot := llm.NewChainOfThought(sig, client, budget)
result, err := cot.Forward(ctx, inputs)
// result["reasoning"] = "Step 1: the agent read the file..."
// result["score"] = 8
// result["verdict"] = "productive"

ChainOfThought automatically adds a reasoning output field and modifies the system prompt to encourage structured thinking before answering.

Client

The Client interface abstracts LLM API calls:

type Client interface {
    Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error)
}

An AnthropicClient implementation is included, using net/http directly with no SDK dependency:

client := llm.NewAnthropicClient("sk-ant-...",
    llm.WithDefaultModel("claude-sonnet-4-20250514"),
    llm.WithBaseURL("https://api.anthropic.com"),  // or custom endpoint
)

The client supports:

  • API key authentication via the X-API-Key header
  • Model selection per-request or via default
  • Token counting in responses (input and output tokens)
  • Context cancellation and timeouts
  • Configurable HTTP client for proxies or custom TLS

Budget

The Budget type tracks cumulative spend for a subsystem and refuses calls when the budget is exhausted:

budget := llm.NewBudget(0.50) // $0.50 maximum

// Before each LLM call, the module checks:
err := budget.Check() // returns error if spent >= max

// After each call, tokens are recorded:
budget.RecordTokens(inputTokens, outputTokens)

// Query remaining budget:
remaining := budget.Remaining() // $0.35
spent := budget.Spent()         // $0.15

Budgets are thread-safe and can be shared across multiple modules within a subsystem.

Prompt Formatting

The adapter converts a Signature + inputs into a structured LLM request:

System prompt (auto-generated):

You are executing the "ScoreToolCall" operation.
Evaluate an agent's recent tool call for productivity

## Output Format

Respond with a JSON object containing the following fields:

- `score` (int) (required): productivity score 1-10
- `reasoning` (string) (required): explanation of the score

Respond ONLY with the JSON object. No markdown fences, no explanation.

User message (auto-generated from inputs):

## tool_calls

(recent tool call sequence)

Read main.go, Edit main.go, Bash go test

## context

(task context)

fixing a nil pointer dereference

The response is expected to be a JSON object matching the output fields. The parser handles common LLM response quirks:

  • Strips markdown code fences if present
  • Coerces numeric types (JSON float64 → Go int)
  • Reports clear errors for missing required fields

Architecture

internal/llm/
├── types.go       — Signature, Field, FieldType definitions
├── module.go      — Module interface, Predict, ChainOfThought
├── adapter.go     — Prompt formatting and response parsing
├── client.go      — Client interface + AnthropicClient
├── budget.go      — Budget tracking with spend limits
└── *_test.go      — Table-driven tests for all components

The entire package uses only standard library imports (net/http, encoding/json, fmt, sync, context). No external SDK dependencies.

Usage by Subsystems

Subsystem How It Uses internal/llm/
PRM (v2) ChainOfThought module with a scoring signature to evaluate agent tool calls
Memory (v2) Predict module with an extraction signature to identify facts from TaskRun data
Diagnosis (v2) ChainOfThought module with a classification signature to diagnose failure modes
Tournament Judge ChainOfThought module with a judging signature to compare candidate diffs

Each subsystem creates its own Budget to enforce per-subsystem cost limits independently.

Testing

All components have table-driven unit tests:

  • types_test.go — signature validation, field name helpers
  • adapter_test.go — prompt formatting, response parsing, type coercion, edge cases (markdown fences, missing fields)
  • client_test.go — HTTP request/response validation using httptest.NewServer, error handling, context cancellation
  • module_test.go — Predict and ChainOfThought with mock clients, budget integration
  • budget_test.go — spend tracking, limit enforcement, unlimited budgets, reset

To run the tests:

go test ./internal/llm/... -v