Ralph Wiggum Loop Architecture

Overview

The Ralph Wiggum Loop is a continuous execution pattern designed for AI agent workflows. Named after the Simpsons character ("I'm helping!"), the pattern drives tasks to completion through simple, relentless iteration.

Key Concepts

1. External Loop Pattern

Unlike internal AI chat loops, this is an external bash-style loop:

while True:
    if all_tasks_complete():
        break
    execute_next_task()
    run_tests()
    commit_changes()
    if max_iterations_reached():
        break

2. Filesystem as Memory

  • The codebase itself serves as persistent memory
  • Task status is saved to tasks.json after each iteration
  • Git commits provide a history of changes
  • No reliance on chat history or session state

3. Task Workflow

Each iteration follows this sequence:

  1. Load tasks from tasks.json
  2. Find next pending task
  3. Execute task via Copilot SDK (or agent)
  4. Run tests to verify changes
  5. Commit successful changes to git
  6. Update task status
  7. Save tasks back to file
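
The seven steps above can be sketched as a single iteration function. This is a minimal Python sketch, not the actual wreck-it implementation (which is Rust); `execute_task`, `run_tests`, and `commit_changes` are placeholder hooks standing in for the agent, test runner, and git layers.

```python
import json
from pathlib import Path

def execute_task(task): return True   # placeholder: agent / Copilot SDK call
def run_tests(): return True          # placeholder: test runner
def commit_changes(task): pass        # placeholder: git commit

def run_iteration(task_file: str = "tasks.json") -> bool:
    """One loop iteration. Returns True if a task was attempted."""
    tasks = json.loads(Path(task_file).read_text())           # 1. load tasks
    task = next((t for t in tasks if t["status"] == "pending"), None)
    if task is None:                                          # 2. next pending
        return False
    ok = execute_task(task)                                   # 3. execute task
    ok = ok and run_tests()                                   # 4. verify
    if ok:
        commit_changes(task)                                  # 5. commit
    task["status"] = "completed" if ok else "failed"          # 6. update status
    Path(task_file).write_text(json.dumps(tasks, indent=2))   # 7. save back
    return True
```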

4. Safety Mechanisms

  • Max Iterations: Prevents infinite loops and cost overruns
  • Test Validation: Only commits changes that pass tests
  • Status Tracking: Failed tasks are marked and can be reviewed
  • Git History: Every change is tracked and reversible

Architecture

Components

1. RalphLoop

Core loop implementation that:

  • Manages iteration state
  • Loads/saves task state
  • Orchestrates agent execution
  • Controls loop lifecycle

2. AgentClient

Interface to the Copilot SDK:

  • Executes individual tasks
  • Reads codebase context
  • Runs tests
  • Commits changes

3. TaskManager

Handles task persistence:

  • Loads tasks from JSON
  • Saves task state
  • Finds next pending task

4. TuiApp

Terminal UI for monitoring:

  • Shows current iteration
  • Displays task status
  • Shows real-time logs
  • Allows pause/resume

Data Flow

┌─────────────┐
│ tasks.json  │
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│ RalphLoop        │
│  - iteration     │
│  - state         │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ AgentClient      │◄────── GitHub Models / Copilot SDK
│  - execute_task  │
│  - run_tests     │
│  - commit        │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Codebase         │
│ (git repo)       │
└──────────────────┘

Usage Patterns

Simple Task List

[
  {
    "id": "1",
    "description": "Add user authentication",
    "status": "pending"
  },
  {
    "id": "2",
    "description": "Add tests for authentication",
    "status": "pending"
  }
]

Multi-step Engineering

  1. Create a comprehensive task list
  2. Set appropriate max_iterations
  3. Start the loop
  4. Monitor progress in TUI
  5. Review commits as tasks complete

Recovery from Failures

  • Failed tasks remain in the list
  • Review the error logs
  • Adjust task description if needed
  • Restart the loop

Configuration

Environment Variables

  • GITHUB_TOKEN: GitHub token for GitHub Models API access (recommended)
  • COPILOT_API_TOKEN: GitHub Copilot API token (when using --model-provider copilot)

Command Line Options

  • --task-file: Path to task JSON (default: tasks.json)
  • --max-iterations: Safety limit (default: 100)
  • --work-dir: Repository directory (default: .)
  • --api-endpoint: API endpoint
  • --api-token: API token (or set COPILOT_API_TOKEN env var)
  • --model-provider: github-models, copilot, copilot-autopilot, or llama
  • --verify-command: Custom verification command
  • --evaluation-mode: command or agent-file
  • --headless: Run without TUI for CI environments
  • --reflection-rounds: Critic-actor rounds (default: 2, 0 to disable)
  • --replan-threshold: Failures before re-planning (default: 2, 0 to disable)
  • --ralph: Named ralph context from .wreck-it/config.toml; use all to run every ralph sequentially (headless only)
  • --goal: Generate tasks from a natural-language goal before starting
  • --max-cost: Maximum cumulative estimated API cost in USD (GitHub Models only)
  • --budget-strategy: greedy, critical-path (default), or conservative — used with --max-cost
  • --prompt-dir: Directory of per-role / per-task system-prompt template files
  • --work-dir-map: Per-role or per-task working directory overrides (ROLE_OR_ID=PATH, repeatable)
  • --notify-webhook: HTTP URL(s) to notify on task status transitions (repeatable)
  • --github-issues: Enable GitHub Issues integration
  • --github-repo: GitHub repository for Issues integration (owner/repo)
  • --max-autopilot-continues: Maximum autopilot continuation steps for copilot-autopilot provider

Best Practices

  1. Start Small: Test with 2-3 simple tasks first
  2. Clear Descriptions: Make task descriptions specific and actionable
  3. Set Realistic Limits: Use max_iterations based on task complexity
  4. Monitor Progress: Watch the TUI for unexpected behavior
  5. Review Commits: Check git history regularly
  6. Incremental Tasks: Break large features into smaller tasks

Limitations

  • Requires well-defined tasks
  • Best for tasks with clear success criteria
  • Testing must be automated
  • Works within single repository
  • Respects max iterations limit

Agent Swarm Capabilities

The following features extend the base Ralph Wiggum Loop into a full agent swarm orchestrator. All features work together and are exercised by the end-to-end integration test in cli/src/integration_eval.rs.

Role-Based Routing

Each task carries an AgentRole field that determines which type of agent should handle it:

| Role | Purpose |
|------|---------|
| ideas | Research, explore, and generate follow-up tasks |
| implementer (default) | Write code and make changes |
| evaluator | Review and validate completed work |
Tasks without a role field default to implementer for backward compatibility. The filter_tasks_by_role helper routes tasks to the appropriate agent pool.
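
A minimal Python sketch of that routing rule (the real helper lives in the Rust codebase; only the behaviour described above is modeled here):

```python
def filter_tasks_by_role(tasks: list[dict], role: str) -> list[dict]:
    """Return the tasks that should be routed to agents of `role`.
    Tasks without a role field default to "implementer"."""
    return [t for t in tasks if t.get("role", "implementer") == role]
```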

Dynamic Task Generation

An ideas (or any) agent can append new tasks to the task file at runtime:

  • generate_task_id(tasks, prefix) – produces a unique <prefix>N ID.
  • append_task(path, task) – validates and appends a task, enforcing:
    • Duplicate-ID rejection.
    • Circular-dependency detection (DFS).
    • A safety cap of MAX_TASKS (500) to prevent runaway generation.
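
The validation rules can be illustrated with a Python sketch; the real functions operate on the task file in Rust, but the checks (duplicate IDs, DFS cycle detection, the MAX_TASKS cap) are the ones listed above:

```python
MAX_TASKS = 500  # safety cap against runaway task generation

def has_cycle(tasks: list[dict], new_task: dict) -> bool:
    """DFS over depends_on edges, including the candidate task."""
    graph = {t["id"]: t.get("depends_on", []) for t in tasks}
    graph[new_task["id"]] = new_task.get("depends_on", [])
    visiting, done = set(), set()
    def dfs(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: a cycle exists
        visiting.add(node)
        if any(dfs(dep) for dep in graph.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False
    return any(dfs(n) for n in graph)

def append_task(tasks: list[dict], task: dict) -> None:
    """Validate and append; raises ValueError when a rule is violated."""
    if any(t["id"] == task["id"] for t in tasks):
        raise ValueError(f"duplicate task id: {task['id']}")
    if len(tasks) >= MAX_TASKS:
        raise ValueError("task cap reached")
    if has_cycle(tasks, task):
        raise ValueError("circular dependency")
    tasks.append(task)
```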

Intelligent Scheduling (TaskScheduler)

TaskScheduler::schedule replaces the simple first-pending scan with a multi-factor scoring algorithm. Tasks are ordered from highest to lowest score before each iteration:

| Factor | Effect |
|--------|--------|
| priority (×10) | Higher-priority tasks run sooner |
| complexity (×2, inverted) | Simpler tasks are preferred (quick wins) |
| Dependency fan-out (×5) | Tasks that unblock more work run first |
| failed_attempts (×3, penalty) | Repeatedly failing tasks back off |
| Time since last attempt (≤60 pts) | Idle tasks avoid starvation |

Only tasks whose depends_on list is fully satisfied (all dependencies in Completed status) are eligible.

Agent Memory Persistence

HeadlessState.memory is a free-form string log that grows across cron invocations. Each phase handler appends a line describing what happened (task triggered, PR created, merge result, etc.). Because HeadlessState is serialised to .wreck-it-state.json after every run, subsequent invocations start with full knowledge of previous actions.

Headless / Cloud-Agent Mode

In CI environments the loop does not run a local AI model. Instead it drives a cloud coding-agent state machine:

NeedsTrigger → create GitHub issue → assign Copilot
AgentWorking → poll for linked PR
NeedsVerification → merge PR when checks pass
Completed → mark task done, advance to next

State is persisted between cron invocations so the machine resumes correctly after each scheduled run.
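
The phase progression above amounts to a small state machine. A Python sketch of the transition rule (the real headless runner is Rust; `event_ok` stands in for the external check each phase performs, such as "issue created" or "PR merged"):

```python
# Transition table for the cloud-agent phases described above.
TRANSITIONS = {
    "NeedsTrigger": "AgentWorking",       # issue created, Copilot assigned
    "AgentWorking": "NeedsVerification",  # linked PR found
    "NeedsVerification": "Completed",     # PR merged after checks pass
}

def advance(state: str, event_ok: bool) -> str:
    """Move to the next phase only when the phase's external event succeeded;
    otherwise stay put so the next cron invocation retries."""
    if not event_ok or state not in TRANSITIONS:
        return state
    return TRANSITIONS[state]
```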

For webhook-driven operation via a GitHub App (real-time event processing as an alternative or complement to cron), see the GitHub App Integration guide.

Parallel Task Execution

When TaskScheduler::schedule returns more than one ready task, the loop spawns a separate AgentClient per task and executes them concurrently via tokio::spawn. Results are merged back into the shared LoopState once all handles complete.
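
The fan-out/join shape can be illustrated with Python's asyncio as an analogy to tokio::spawn; this is not the wreck-it code, just the concurrency pattern it describes, with `execute` standing in for a per-task AgentClient call:

```python
import asyncio

async def run_parallel(ready_tasks: list[dict], execute) -> list[dict]:
    """Execute every ready task concurrently, then gather the results
    once all handles complete (mirroring tokio::spawn + join)."""
    handles = [asyncio.create_task(execute(t)) for t in ready_tasks]
    return await asyncio.gather(*handles)  # results in task order
```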

Task Lifecycle Kinds

Each task has a kind field that controls its lifecycle after completion:

| Kind | Behaviour |
|------|-----------|
| milestone (default) | Completes permanently. Standard one-shot work. |
| recurring | Resets to pending after completion, subject to an optional cooldown_seconds delay. |

Recurring tasks are ideal for long-running goals that periodically need fresh work — for example keeping documentation up-to-date or maintaining a test coverage threshold. Before each scheduling pass the headless runner calls reset_recurring_tasks, which moves any completed recurring task whose cooldown has elapsed back to pending.

Example task file mixing both kinds, with an agent-evaluated precondition on the docs task so it only runs when source files have changed:

[
  {
    "id": "docs",
    "description": "Review project structure and update documentation",
    "status": "pending",
    "kind": "recurring",
    "cooldown_seconds": 86400,
    "precondition_prompt": "Check if any source files have been modified since the last documentation update."
  },
  {
    "id": "coverage",
    "description": "Review test coverage. If below 90%, create and execute a plan to increase it",
    "status": "pending",
    "kind": "recurring",
    "cooldown_seconds": 604800
  },
  {
    "id": "auth",
    "description": "Implement OAuth2 authentication",
    "status": "pending"
  }
]

Agent-Evaluated Preconditions

While cooldown_seconds provides a simple timer-based gate for recurring tasks, many real-world workflows need more nuanced checks. The precondition_prompt field lets an evaluation agent decide whether a task should run in a given iteration.

When a task has a precondition_prompt, the ralph loop spawns a dedicated precondition-evaluation agent before execution. The agent receives the task description, the precondition criteria, and access to the working directory. If the agent determines the precondition is satisfied it writes a marker file (.task-precondition-met); the loop checks for this file and only proceeds when it is present. If the precondition is not met the task is skipped and remains in pending status for the next iteration.
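
The marker-file handshake can be sketched in Python; `evaluate` stands in for the precondition-evaluation agent, and only the gate logic described above is modeled:

```python
from pathlib import Path

MARKER = ".task-precondition-met"

def precondition_met(work_dir: str, evaluate) -> bool:
    """Run the evaluation agent, then check for the marker file it may
    have written. Stale markers are cleared so they never leak between runs."""
    marker = Path(work_dir) / MARKER
    marker.unlink(missing_ok=True)
    evaluate(work_dir)          # agent writes MARKER iff the precondition holds
    met = marker.exists()
    marker.unlink(missing_ok=True)
    return met
```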

This is a key building block for powerful ralph loops: combined with recurring tasks and the intelligent scheduler, agent-evaluated preconditions allow you to build sophisticated, self-regulating automation that responds to the actual state of your codebase rather than just a clock.

Examples

A recurring documentation task that only runs when source files have actually changed:

{
  "id": "docs",
  "description": "Review project structure and update documentation",
  "status": "pending",
  "kind": "recurring",
  "cooldown_seconds": 86400,
  "precondition_prompt": "Check if any .rs source files have been modified since the last documentation update. Only proceed if the documentation may be stale."
}

A test-coverage guardian that only activates when coverage drops:

{
  "id": "coverage",
  "description": "Add tests to bring coverage back above 90%",
  "status": "pending",
  "kind": "recurring",
  "cooldown_seconds": 604800,
  "precondition_prompt": "Run the test suite and check if code coverage is below 90%. Only proceed if coverage is insufficient."
}

Implementation: cli/src/types.rs (Task.precondition_prompt), cli/src/agent.rs (AgentClient::evaluate_precondition), cli/src/ralph_loop.rs (precondition gate in run_single_task / run_parallel_tasks).


Per-Task Agent Memory

Each task accumulates a persistent memory log across iterations. After every execution attempt, wreck-it appends an entry (outcome + short summary) to a Markdown file at .wreck-it-memory/{task_id}.md. Before the agent is invoked for the next attempt, the memory file is loaded and prepended to the prompt, giving the agent full knowledge of prior outcomes.

.wreck-it-memory/
├── auth-impl.md # "Attempt 1 - Failure: missing import…"
└── auth-tests.md # "Attempt 1 - Success: all tests pass"

This prevents the agent from repeating the same mistakes and enables incremental progress across many cron invocations.
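
The append/load cycle is simple enough to sketch in Python (the real AgentMemory is Rust; the entry format follows the example tree above):

```python
from pathlib import Path

MEMORY_DIR = Path(".wreck-it-memory")

def record_attempt(task_id: str, attempt: int, outcome: str, summary: str) -> None:
    """Append one attempt line to the task's Markdown memory file."""
    MEMORY_DIR.mkdir(exist_ok=True)
    with (MEMORY_DIR / f"{task_id}.md").open("a") as f:
        f.write(f"Attempt {attempt} - {outcome}: {summary}\n")

def load_context(task_id: str) -> str:
    """Return prior memory to prepend to the next prompt (empty if none)."""
    path = MEMORY_DIR / f"{task_id}.md"
    return path.read_text() if path.exists() else ""
```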

Implementation: cli/src/agent_memory.rs (AgentMemory, load_context, record_attempt).


Epics, Sub-tasks, and the Project Management API

Tasks support a lightweight epic / sub-task hierarchy via two optional fields on Task:

| Field | Purpose |
|-------|---------|
| parent_id | ID of the parent (epic) task. When set, this task is a sub-task. |
| labels | Free-form strings for categorization (board columns, tags, etc.). |

A task with no parent_id that has other tasks pointing to it via parent_id is treated as an epic. The scheduler is unaffected — all tasks are eligible regardless of hierarchy. The hierarchy is primarily a data-organization feature consumed by external tools and the FFI layer.
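
The epic rule above can be sketched in Python; these helpers mirror the behaviour described in the text, not the actual ProjectManager API (the 0.0 return for an epic with no sub-tasks is an assumption):

```python
def list_sub_tasks(tasks: list[dict], epic_id: str) -> list[dict]:
    return [t for t in tasks if t.get("parent_id") == epic_id]

def list_epics(tasks: list[dict]) -> list[dict]:
    """An epic is any task that other tasks point to via parent_id."""
    parents = {t["parent_id"] for t in tasks if t.get("parent_id")}
    return [t for t in tasks if t["id"] in parents]

def epic_progress(tasks: list[dict], epic_id: str) -> float:
    """Fraction of the epic's sub-tasks that are completed."""
    subs = list_sub_tasks(tasks, epic_id)
    if not subs:
        return 0.0
    return sum(t.get("status") == "completed" for t in subs) / len(subs)
```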

The ProjectManager (cli/src/project_api.rs) exposes a high-level CRUD API on top of the task file:

list_tasks / get_task
list_epics / list_sub_tasks / epic_progress
create_task / create_sub_task / update_task / delete_task / move_task

A C-compatible FFI layer (cli/src/ffi.rs) wraps ProjectManager as extern "C" functions for consumption from Swift or any other C-ABI consumer. All data is exchanged as JSON-encoded C strings. Callers must free returned strings with wreck_it_free_string.

Implementation: cli/src/project_api.rs (ProjectManager, TaskUpdate), cli/src/ffi.rs (wreck_it_list_tasks, wreck_it_create_task, etc.).


Parallel Persistent Ralph Loops (Multi-Ralph)

A repository can define multiple independent ralph contexts in .wreck-it/config.toml using [[ralphs]] table arrays. Each ralph has its own task file and state file so loops are fully isolated.

state_branch = "wreck-it-state"
state_root = ".wreck-it"

[[ralphs]]
name = "docs"
task_file = "docs-tasks.json"
state_file = ".docs-state.json"

[[ralphs]]
name = "coverage"
task_file = "coverage-tasks.json"
state_file = ".coverage-state.json"

Select which ralph to run via the --ralph CLI flag:

wreck-it run --headless --ralph docs
wreck-it run --headless --ralph coverage

# Run every ralph defined in config.toml sequentially (headless only)
wreck-it run --headless --ralph all

Each ralph can be driven by a separate GitHub Actions workflow so they run on independent schedules:

# .github/workflows/ralph-docs.yml
- run: ./target/release/wreck-it run --headless --ralph docs

# .github/workflows/ralph-coverage.yml
- run: ./target/release/wreck-it run --headless --ralph coverage

When no --ralph flag is provided, wreck-it falls back to the default single-ralph behaviour (task file and state file come from the headless config or CLI flags).

Single-Workflow Alternative

Multi-ralph is not required. A single workflow with a single task file that contains both milestone and recurring tasks is often sufficient. The scheduler handles both kinds transparently, and recurring tasks automatically reset after their cooldown elapses.

LLM-Powered Dynamic Task Planning (wreck-it plan)

The wreck-it plan --goal "..." sub-command (and the optional --goal flag for wreck-it run) converts a natural-language goal into a structured tasks.json via the configured LLM.

wreck-it plan --goal "Build a REST API with authentication" --output tasks.json

The planner prompt instructs the model to emit a JSON array of tasks with id, description, phase, and optional depends_on fields. The output is validated (no empty IDs, no duplicate IDs, phase ≥ 1) before being written to disk.
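
A Python sketch of those validation checks (the real parse_and_validate_plan is Rust; only the three rules named above are modeled):

```python
import json

def parse_and_validate_plan(raw: str) -> list[dict]:
    """Parse the model's JSON output and enforce: no empty IDs,
    no duplicate IDs, phase >= 1."""
    tasks = json.loads(raw)
    seen = set()
    for t in tasks:
        tid = t.get("id", "")
        if not tid:
            raise ValueError("empty task id")
        if tid in seen:
            raise ValueError(f"duplicate task id: {tid}")
        seen.add(tid)
        if t.get("phase", 1) < 1:
            raise ValueError(f"task {tid}: phase must be >= 1")
    return tasks
```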

Implementation: cli/src/planner.rs (TaskPlanner, parse_and_validate_plan).


Critic-Actor Reflection Loop

After the actor agent completes a task and before tests run, a lightweight critic prompt reads the git diff and evaluates it against the original task description. The critic returns a structured CriticResult { score, issues, approved }. If not approved, the actor is re-invoked with the critic's issues as additional context (up to reflection_rounds, default 2).

reflection_rounds = 2   # 0 disables reflection

Implementation: cli/src/agent.rs (CriticResult, reflection loop in the AgentClient execution path).


Adaptive Re-Planning on Failure

After replan_threshold consecutive task failures (default 2), wreck-it invokes a re-planner agent that receives: the original task list, the failed task, the error output, and the current git status. The re-planner may: (a) rewrite the failed task description, (b) split it into smaller sub-tasks, or (c) inject a prerequisite task. The modified task list is persisted and the loop continues.

replan_threshold = 2   # 0 disables re-planning

Validation guards: duplicate IDs, circular dependencies, and completed tasks that must not be rolled back.

Implementation: cli/src/replanner.rs (TaskReplanner, parse_and_validate_replan, build_replan_prompt).


Plan Migration (Cloud Agent Replanning)

Cloud agents (e.g. Copilot coding agent) work on the main branch and cannot modify the state branch directly. To support agent-driven replanning, agents write new or revised task plans as JSON files in .wreck-it/plans/ on the main branch. At the start of each headless iteration the runner migrates these plans into the state branch:

  1. An agent PR merges → .wreck-it/plans/feature-dev-tasks.json--batch-01.json lands on main.
  2. The headless cron fires → migrate_pending_plans() reads the plans directory.
  3. The plans are merged into the state-branch task file and the consumed plan files are removed.
  4. The normal iteration loop picks up the new or revised tasks.

Targeted routing: Plan files use a {target}--{label}.json naming convention to specify which task file they target. For example, feature-dev-tasks.json--assessor.json merges into feature-dev-tasks.json on the state branch. Plain filenames (no -- separator) merge into the current ralph's default task file.

Merge rules: New task IDs are appended; existing non-completed tasks are replaced (allows plan updates); completed tasks are never overwritten.
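
The three merge rules can be sketched in Python (the real merge_pending_tasks lives in the Rust core; only the rules stated above are modeled):

```python
def merge_pending_tasks(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Append new IDs, replace non-completed tasks in place,
    never overwrite completed tasks."""
    by_id = {t["id"]: t for t in existing}
    for task in incoming:
        current = by_id.get(task["id"])
        if current is None:
            existing.append(task)      # new task ID: append
            by_id[task["id"]] = task
        elif current.get("status") != "completed":
            current.clear()
            current.update(task)       # revised task: replace
        # completed tasks are left untouched
    return existing
```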

Implementation: cli/src/plan_migration.rs (migrate_pending_plans), core/src/plan_migration.rs (merge_pending_tasks).


Typed Artefact Store / Context Chain

Tasks declare optional inputs (references to upstream artefacts) and outputs (artefacts to persist after completion). When a task completes its declared outputs are read from disk and stored in .wreck-it-artefacts.json. Downstream tasks that declare inputs have those artefacts injected into their agent prompt automatically.

{
  "id": "design-1",
  "outputs": [{ "kind": "summary", "name": "spec", "path": "spec.md" }]
}
{
  "id": "impl-1",
  "inputs": ["design-1/spec"],
  "outputs": [{ "kind": "json", "name": "code", "path": "api.rs" }]
}

Implementation: cli/src/artefact_store.rs (ArtefactManifest, persist_output_artefacts, resolve_input_artefacts).


Gastown Cloud Runtime Integration

Tasks can declare runtime: "gastown" to offload execution to the gastown cloud agent service. wreck-it acts as a workflow DAG producer: it serialises the task graph and submits it to the gastown orchestrator. Gastown handles horizontal scaling, durable checkpointing, and capability negotiation.

{ "id": "heavy-task", "description": "...", "runtime": "gastown" }

Integration is enabled by setting both gastown_endpoint and gastown_token in the configuration. When either is absent, tasks fall back to local execution.

| Integration point | Implementation |
|-------------------|----------------|
| wreck-it → gastown | GastownClient::build_dag / serialise_dag |
| gastown → wreck-it | GastownClient::apply_status_events |

Implementation: cli/src/gastown_client.rs (GastownClient, WorkflowDag, DagNode, GastownStatusEvent).


Openclaw Provenance Tracking and Export

Every task execution is recorded as a provenance entry capturing: task ID, agent role, model, prompt hash, git diff hash, tool calls, timestamp, and outcome. Records are stored in .wreck-it-provenance/<task-id>-<ts>.json.

# Inspect the provenance chain for a single task
wreck-it provenance --task impl-1

# Export the full run as an openclaw-compatible JSON document
wreck-it export-openclaw --output run.openclaw.json

The openclaw export (OpenclawDocument) contains the complete task graph annotated with all provenance records and artefact links, ready to load into the openclaw plan-graph visualiser.

Implementation: cli/src/provenance.rs (ProvenanceRecord, persist_provenance_record, load_provenance_records), cli/src/openclaw.rs (OpenclawDocument, build_document, serialise_document).


Security Gate

A security_gate task skips the LLM entirely and runs a security audit scanner:

  • Rust (Cargo.toml present) → cargo audit --json
  • Node.js (package.json present) → npm audit --json

Findings are serialised to JSON and written to the path declared in the task's first output artefact (defaulting to .wreck-it/security-findings.json). The task fails when critical or high severity vulnerabilities are found. Downstream implementer tasks can consume the artefact to self-remediate.

[
  { "id": "sec", "role": "security_gate", "description": "Audit dependencies",
    "outputs": [{ "kind": "json", "name": "findings", "path": ".wreck-it/security-findings.json" }] },
  { "id": "fix", "description": "Fix critical vulnerabilities from findings",
    "inputs": ["sec/findings"], "depends_on": ["sec"] }
]

Implementation: cli/src/security_gate.rs (SecurityGateFindings, run_security_gate), core/src/types.rs (AgentRole::SecurityGate).


Fan-Out / Fan-In Sub-Task Spawning

An ideas-role task can dynamically spawn a set of parallel sub-tasks by writing a SubTaskManifest artefact. After the task completes the runner:

  1. Parses each SubTaskManifest artefact as a SubTaskManifestSpec.
  2. Creates one Task per entry in sub_tasks at parent.phase + 1.
  3. Optionally creates a fan-in aggregator Task at parent.phase + 2 with depends_on set to all sibling IDs and inputs pre-populated from their declared outputs.
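
The three steps above can be sketched in Python. The manifest's JSON shape here is an assumption for illustration; the real logic lives in the Rust fan-out module:

```python
def spawn_fan_out(parent: dict, manifest: dict, tasks: list[dict]) -> None:
    """Create one task per manifest entry at parent.phase + 1 and an
    optional fan-in aggregator at parent.phase + 2 that depends on
    all the spawned siblings."""
    phase = parent.get("phase", 1)
    sibling_ids = []
    for spec in manifest["sub_tasks"]:
        tasks.append({**spec, "phase": phase + 1})
        sibling_ids.append(spec["id"])
    fan_in = manifest.get("fan_in")
    if fan_in:
        tasks.append({**fan_in, "phase": phase + 2, "depends_on": sibling_ids})
```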

This pattern enables dynamic parallelism without pre-defining the task list:

{
  "id": "planner",
  "role": "ideas",
  "description": "Analyse the codebase and produce a parallel refactor plan",
  "outputs": [{ "kind": "sub_task_manifest", "name": "plan", "path": ".wreck-it/plan.json" }]
}

Implementation: cli/src/fan_out.rs (detect_and_spawn_fan_out), core/src/types.rs (SubTaskManifestSpec, SubTaskSpec, FanInSpec, ArtefactKind::SubTaskManifest).


OTEL Tracing

Every task execution can be wrapped in an OTEL span exported to any OTLP-compatible collector (Jaeger, Honeycomb, Grafana Cloud, etc.). Spans carry:

  • Task id, description, role, phase, complexity, priority
  • Model name, prompt_tokens, completion_tokens, estimated cost in USD
  • Task outcome (success / failure) and retry attempt number

Configure via the [otel] section in your wreck-it.toml:

[otel]
endpoint = "http://localhost:4318" # OTLP HTTP base URL
service_name = "my-project" # optional, defaults to "wreck-it"

# Optional per-header overrides, e.g. Honeycomb API key:
# [otel.headers]
# "x-honeycomb-team" = "YOUR_API_KEY"

When otel is absent or endpoint is empty, no spans are created.

Implementation: cli/src/otel.rs (init_otel, TaskSpan, shutdown_otel).


Kanban Integration

wreck-it can synchronise task status with external project-management boards. The KanbanProvider trait abstracts operations on Linear, JIRA, and Trello so the rest of the codebase is provider-agnostic.

Configure via kanban_provider in .wreck-it/config.toml or wreck-it.toml:

kanban_provider = "linear"   # or "jira" / "trello"
kanban_api_key = "lin_api_…"
kanban_team_id = "…"

Each iteration, sync_kanban_inbound() pulls description edits and new comments from the board back into the local task definitions; outbound sync pushes status changes and attaches GitHub Issue / PR URLs to the board item.

Implementation: cli/src/kanban/ (KanbanProvider, LinearProvider, JiraProvider, TrelloProvider).


Log Source Ingest

wreck-it can pull error/exception log entries from a structured log platform and automatically create tasks for an agent to triage and fix:

[log_source]
provider = "seq" # or "cloudflare"
url = "http://seq-host:5341"
api_key = "your-seq-api-key"
filter = "outcome = 'exception'" # Seq CLEF filter expression

Each iteration, sync_log_source_inbound() queries the platform for new entries since the last run, deduplicates them via the kanban-source: label prefix, and creates pending implementer tasks from each unique error.

Implementation: cli/src/log_source/ (LogSourceProvider, SeqProvider, CloudflareProvider).


All Horizon 2–3 features are exercised together in the eval7_full_horizon2_horizon3_acceptance_gate test in cli/src/integration_eval.rs. The scenario:

  1. wreck-it plan generates a four-task "Build REST API" plan (impl-5).
  2. Role-based routing assigns specialist agents to each task (impl-1).
  3. Artefact chaining passes the design spec into the implementation task (impl-8).
  4. Provenance records are written for every completed step (impl-10).
  5. The review task fails twice — adaptive re-planning splits it into two smaller tasks (impl-7).
  6. Gastown DAG serialisation is verified for the review tasks (impl-9).
  7. The full run is exported as an openclaw document (impl-10).
  8. Agent memory persists across simulated cron invocations (impl-3).

Future Enhancements

  • Plugin hooks for custom role types
  • Task dependency visualization in the TUI
  • Interactive task editing from within the TUI