Harnesses and Shims

Hankweave doesn't implement its own AI agent. It orchestrates existing ones—Claude Code, Gemini CLI, and potentially others—through a unified interface. When Claude Code gets new tools or Gemini improves its reasoning, you get those benefits automatically. No waiting for Hankweave to catch up.

🎯

Who is this for? This page is for developers building on Hankweave (Track 3) and contributors working on the runtime (Track 4). If you're writing hanks, you don't need to understand this layer—just specify your model and Hankweave handles the rest.

The Philosophy

Why orchestrate existing agents rather than building one from scratch?

Orchestration gives you immediate access to whatever your agent can do. When Claude Code adds a new tool, it's available in your hanks. When Gemini improves its context window, your Gemini codons benefit. Hankweave doesn't need to reimplement file operations, shell access, or tool use—it inherits battle-tested implementations from the agents themselves.

The trade-off is that Hankweave is bound by the underlying agent's capabilities:

  • Feature support: Hankweave can only use features the agent exposes.
  • Complexity: Log formats must be translated between the agent and the runtime.
  • Continuation: Cross-codon session continuity depends on agent-specific features.

Hankweave makes this trade-off deliberately. Agent development moves fast—building a custom agent would mean perpetually chasing feature parity. Orchestration lets Hankweave focus on what makes it valuable: checkpoints, execution isolation, and debugging capabilities.

Architecture Overview

Here's how the pieces fit together:

[Diagram: Harness Architecture]

Two paths, one destination. Regardless of which model you use, Hankweave sees the same log format. That's the key insight—shims translate foreign agents into a lingua franca the runtime understands.

Claude Agent SDK Integration

For Anthropic models—Claude Opus, Sonnet, and Haiku—Hankweave uses the official Claude Agent SDK. Unlike shims, the SDK runs in-process rather than as a subprocess.

How Detection Works

The CodonRunner decides which path to take by checking the model's provider ID:

TypeScript
static canRun(model: ModelInfo): boolean {
  const supportedProviders = ["anthropic", "google"];
  return supportedProviders.includes(model.providerId.toLowerCase());
}

When the provider is "anthropic", Hankweave routes to ClaudeAgentSDKManager. Everything else goes through a shim.
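The routing itself amounts to a small dispatch on the provider ID. Here's an illustrative sketch (the real CodonRunner does more than this):

```typescript
// Hedged sketch of the routing decision described above.
// ModelInfo mirrors the canRun signature; the rest is illustrative.
interface ModelInfo {
  providerId: string;
}

type Harness = "claude-agent-sdk" | "shim";

function selectHarness(model: ModelInfo): Harness {
  // Anthropic models run in-process via the Claude Agent SDK;
  // every other supported provider goes through a subprocess shim.
  return model.providerId.toLowerCase() === "anthropic"
    ? "claude-agent-sdk"
    : "shim";
}

console.log(selectHarness({ providerId: "anthropic" })); // → "claude-agent-sdk"
console.log(selectHarness({ providerId: "google" }));    // → "shim"
```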

What the SDK Manager Does

The ClaudeAgentSDKManager sits between Hankweave and the Claude Agent SDK, handling the details that make orchestration work:

  • Session management creates and resumes sessions using the SDK's session continuation features.
  • System prompt appending injects additional instructions via the systemPrompt.append option.
  • Environment variable passing forwards HANKWEAVE_* variables (with prefix stripped) along with essential system variables.
  • Template variable replacement substitutes <%EXECUTION_DIR%>, <%DATA_DIR%>, and the legacy <%PROJECT_DIR%>.
  • Log output writes SDK messages to Claude-compatible JSONL format.
  • Synthetic PIDs generate process IDs in the 900000+ range for compatibility with process management APIs.

Session Continuation

When a codon uses continuationMode: "continue-previous", the SDK manager passes the previous session ID:

TypeScript
if (codon.continuationMode === "continue-previous" && previousSessionId) {
  options.continue = true;
  options.resume = previousSessionId;
}

This maintains conversation context across codons. The new codon doesn't start fresh—it continues exactly where the previous one left off, with full memory of what came before.

Environment Variables

Not all environment variables reach the agent. The SDK manager filters and transforms them:

| Category | Pattern | Behavior |
| --- | --- | --- |
| Essential system | PATH, HOME, USER, etc. | Passed directly |
| Claude Code | CLAUDE_CODE_* | Passed directly (for OAuth) |
| Anthropic | ANTHROPIC_* (except ANTHROPIC_API_KEY) | Passed directly |
| Hankweave | HANKWEAVE_* (except RUNTIME_ and SENTINEL_) | Prefix stripped, then passed (see configuration reference) |
| Codon-specific | codon.env | Overrides any existing vars |
⚠️

ANTHROPIC_API_KEY is excluded when OAuth tokens are present to avoid authentication conflicts. The SDK will prefer CLAUDE_CODE_OAUTH_TOKEN if it's set.
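A simplified sketch of these filtering rules follows. It excludes ANTHROPIC_API_KEY unconditionally, whereas the real manager only drops it when an OAuth token is present, and the "essential" list here is an assumption:

```typescript
// Illustrative sketch of the filtering rules in the table above.
// The exact logic in ClaudeAgentSDKManager may differ.
const ESSENTIAL = new Set(["PATH", "HOME", "USER", "SHELL", "TMPDIR"]); // assumed set

function filterEnv(env: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (ESSENTIAL.has(key)) {
      out[key] = value; // essential system vars: passed directly
    } else if (key.startsWith("CLAUDE_CODE_")) {
      out[key] = value; // OAuth-related vars: passed directly
    } else if (key.startsWith("ANTHROPIC_") && key !== "ANTHROPIC_API_KEY") {
      out[key] = value; // Anthropic vars, minus the API key
    } else if (
      key.startsWith("HANKWEAVE_") &&
      !key.startsWith("HANKWEAVE_RUNTIME_") &&
      !key.startsWith("HANKWEAVE_SENTINEL_")
    ) {
      out[key.slice("HANKWEAVE_".length)] = value; // strip prefix, then pass
    }
  }
  return out;
}

const filtered = filterEnv({
  PATH: "/usr/bin",
  HANKWEAVE_FOO: "bar",
  HANKWEAVE_RUNTIME_X: "internal",
  ANTHROPIC_API_KEY: "sk-...",
});
console.log(filtered); // → { PATH: "/usr/bin", FOO: "bar" }
```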

Self-Test

Before running anything expensive, you can verify the SDK is properly configured. A self-test runs diagnostics to check that the SDK is installed, accessible, and authenticated.

TypeScript
const result = await manager.runSelfTest();
// Returns: { shim, agent, checks, overall }

The self-test runs several diagnostics:

  • Installation: Checks if the SDK is installed and importable.
  • Executable: Verifies the Claude CLI executable can be found (or resolved internally by the SDK).
  • Authentication: Confirms an API key or OAuth token is configured.
  • Endpoint: Validates any custom base URL.

The Shim System

For non-Anthropic models, Hankweave uses shims—command-line adapters that translate between agent-specific formats and Claude-compatible JSONL.

Why Shims Exist

Every AI CLI tool does things differently. Output formats vary—streaming JSON, NDJSON, custom protocols. Session continuation mechanisms differ. Tool names aren't consistent (one agent's read_file is another's ReadFile). Error reporting styles range from structured JSON to plain text stack traces.

Shims normalize all of this. From Hankweave's perspective, every agent speaks the same language.

The Standardized Interface

Every shim implements the same command-line interface, accepting these arguments:

  • --model {modelId} (required): Specifies which model to use.
  • -p (required): Reads prompt content from stdin.
  • --resume {sessionId} (optional): Continues from a previous session.
  • --append-system-prompt {text} (optional): Injects additional system instructions.
  • --self-test (optional): Runs environment diagnostics and exits.
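Putting the interface together, a caller could invoke a shim roughly like this. The shim name, session ID, and prompt below are placeholder values; how Hankweave itself spawns shims is not shown here:

```typescript
import { spawn } from "node:child_process";

// Build the standardized argv described above; kept pure for clarity.
function shimArgs(opts: { model: string; resume?: string; appendSystemPrompt?: string }): string[] {
  const args = ["--model", opts.model, "-p"]; // -p: prompt arrives on stdin
  if (opts.resume) args.push("--resume", opts.resume);
  if (opts.appendSystemPrompt) args.push("--append-system-prompt", opts.appendSystemPrompt);
  return args;
}

// Invoke a shim as a subprocess, streaming its JSONL output.
// "gemini-cli-shim" and the session ID are placeholders.
function runShim(prompt: string): void {
  const shim = spawn("gemini-cli-shim", shimArgs({
    model: "google/gemini-2.0-flash",
    resume: "session-abc123",
  }));
  shim.stdout.on("data", (chunk) => process.stdout.write(chunk)); // Claude-compatible JSONL
  shim.stdin.write(prompt);
  shim.stdin.end();
}

console.log(shimArgs({ model: "google/gemini-2.0-flash", resume: "s-1" }).join(" "));
// → "--model google/gemini-2.0-flash -p --resume s-1"
```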

Log Output Format

Shims write Claude-compatible JSONL to stdout—one JSON object per line. Four message types make up the format:

System messages:

JSON
{
  "type": "system",
  "subtype": "init",
  "cwd": "/path/to/execution",
  "session_id": "session-abc123",
  "tools": ["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
  "model": "google/gemini-2.0-flash",
  "permissionMode": "bypassPermissions"
}

Assistant messages:

JSON
{
  "type": "assistant",
  "message": {
    "id": "msg_abc123",
    "type": "message",
    "role": "assistant",
    "model": "google/gemini-2.0-flash",
    "content": [
      { "type": "text", "text": "I'll read that file..." },
      { "type": "tool_use", "id": "toolu_xyz", "name": "Read", "input": { "file_path": "..." } }
    ],
    "stop_reason": "tool_use"
  }
}

User messages (tool results):

JSON
{
  "type": "user",
  "message": {
    "role": "user",
    "content": [
      { "type": "tool_result", "tool_use_id": "toolu_xyz", "content": "file contents..." }
    ]
  }
}

Result messages:

JSON
{
  "type": "result",
  "subtype": "success",
  "is_error": false,
  "duration_ms": 12345,
  "num_turns": 5,
  "result": "Request completed successfully",
  "session_id": "session-abc123",
  "usage": { "input_tokens": 1000, "output_tokens": 500 }
}

Tool Name Normalization

Different agents name their tools differently. Shims translate them all to a standard set:

| Native Name | Normalized |
| --- | --- |
| read_file, readFile, file_read | Read |
| write_file, writeFile | Write |
| edit_file, editFile, str_replace_editor | Edit |
| run_shell_command, bash, shell | Bash |
| list_directory, ls, list | LS |
| glob, find_files | Glob |
| grep, search_files, search | Grep |
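One way to implement this mapping is a lookup table with a pass-through for unknown names. This is a sketch built from the table above; real shims may also normalize casing or handle agent-specific aliases:

```typescript
// Illustrative lookup table built from the mapping above.
const TOOL_NAME_MAP: Record<string, string> = {
  read_file: "Read", readFile: "Read", file_read: "Read",
  write_file: "Write", writeFile: "Write",
  edit_file: "Edit", editFile: "Edit", str_replace_editor: "Edit",
  run_shell_command: "Bash", bash: "Bash", shell: "Bash",
  list_directory: "LS", ls: "LS", list: "LS",
  glob: "Glob", find_files: "Glob",
  grep: "Grep", search_files: "Grep", search: "Grep",
};

function normalizeToolName(native: string): string {
  // Unknown tools pass through unchanged so nothing is silently dropped.
  return TOOL_NAME_MAP[native] ?? native;
}

console.log(normalizeToolName("run_shell_command")); // → "Bash"
console.log(normalizeToolName("custom_tool"));       // → "custom_tool"
```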

Self-Test Protocol

When you invoke a shim with --self-test, it checks its dependencies—CLI tools, API keys, anything else it needs. Then it outputs a JSON result to stdout and exits with code 0 if everything passed, or 1 if something failed.

Here's what the output looks like:

JSON
{
  "shim": {
    "name": "gemini-cli-shim",
    "version": "1.0.0"
  },
  "agent": {
    "name": "gemini-cli",
    "version": "0.1.2",
    "found": true
  },
  "checks": [
    { "name": "gemini_cli_found", "passed": true, "message": "Gemini CLI found at /usr/local/bin/gemini" },
    { "name": "api_key", "passed": true, "message": "API key found in environment" }
  ],
  "overall": {
    "passed": true,
    "message": "All checks passed"
  }
}

The Gemini Shim (Reference Implementation)

Hankweave ships with a reference shim for Google's Gemini models. It wraps the gemini CLI and translates its output to the standard format. If you're building your own shim, this is a good place to start.

How It Works

[Diagram: Gemini Shim Sequence]

Event Translation

The shim maps Gemini CLI events to their Claude equivalents:

| Gemini Event | Claude Output |
| --- | --- |
| init | system message with session ID |
| message (role: assistant) | Accumulated into assistant message |
| tool_use | tool_use block in assistant message |
| tool_result | user message with tool_result |
| result | result message with final stats |

The shim recognizes shortcuts for Gemini models, such as flash and pro. The LlmProviderRegistry (covered next) is the single source of truth for resolving all model shortcuts to their full IDs.

Model Detection and Routing

When a codon specifies a model, the CodonRunner figures out which path to take:

[Diagram: Model Routing]

Currently two providers are supported: anthropic models run through the Claude Agent SDK in-process, while google models go through the Gemini shim as a subprocess.

Choosing between providers: Anthropic models execute in-process (lower latency, shared memory), while Google models run as isolated subprocesses (more fault isolation, higher overhead). For reliability-critical hanks, the subprocess isolation of Google models is valuable. For latency-sensitive work, Anthropic's in-process execution is faster. You can mix both in the same hank—each codon routes to its appropriate harness.

LlmProviderRegistry

Behind model detection sits the LlmProviderRegistry—a singleton that manages model resolution, provider health checks, and cost calculation. It's the source of truth for what models exist and how to reach them.

Model Resolution

The registry is forgiving about model names. It resolves them through three phases:

First, it tries an exact match: anthropic/claude-sonnet-4-20250514 matches directly. If that fails, it attempts inferred provider resolution—claude-sonnet infers Anthropic from the name. As a last resort, fuzzy matching kicks in—claude-sonet (a typo) finds the closest model name. Fuzzy matching is helpful for development but unreliable for production—always use full model IDs in critical hanks.

TypeScript
const result = registry.resolveModel({
  providerId: "anthropic",    // Optional: narrow to provider
  model: "sonnet",            // Accepts shortcuts, full IDs, fuzzy names
  ignoreBlockList: true       // Skip blocklist checks (default: true)
});
 
// result.matchType: "exact" | "exact-with-inferred-provider" | "fuzzy"

Model Shortcuts

The registry has built-in shortcuts for convenience:

| Shortcut | Resolves To | Provider |
| --- | --- | --- |
| opus | claude-opus (latest) | Anthropic |
| sonnet | claude-sonnet (latest) | Anthropic |
| haiku | claude-haiku (latest) | Anthropic |
| pro | gemini-2.0-pro (latest) | Google |
| flash | gemini-2.0-flash (latest) | Google |

Provider Health Checks

Before running an expensive hank, you can verify that your providers are reachable:

TypeScript
await registry.performHealthChecks();
const status = registry.getProviderStatus();
// Map<string, { status: "available" | "not-configured" | "failed", healthy: boolean, ... }>

Health checks use the cheapest available model for each provider to minimize cost. If something's misconfigured, you'll find out before burning tokens.

Cost Calculation

The registry also tracks pricing, so you can calculate costs programmatically:

TypeScript
const cost = registry.calculateCost(
  "anthropic/claude-sonnet-4-20250514",
  {
    inputTokens: 10000,
    outputTokens: 5000,
    cacheReadTokens: 2000,
    cacheCreationTokens: 1000
  }
);
// Returns cost in USD, or null if pricing unavailable
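Under the hood, the arithmetic is per-token pricing. Here's a sketch using hypothetical per-million-token rates (the numbers are illustrative, not actual Anthropic prices, and the registry's real pricing model may differ):

```typescript
// Hypothetical pricing table; rates are illustrative, not real prices.
interface Pricing {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number;
  cacheReadPerMTok: number;
  cacheCreationPerMTok: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}

function calculateCost(p: Pricing, u: Usage): number {
  // Sum each token category at its rate, then convert from per-million.
  return (
    (u.inputTokens * p.inputPerMTok +
      u.outputTokens * p.outputPerMTok +
      u.cacheReadTokens * p.cacheReadPerMTok +
      u.cacheCreationTokens * p.cacheCreationPerMTok) / 1_000_000
  );
}

const cost = calculateCost(
  { inputPerMTok: 3, outputPerMTok: 15, cacheReadPerMTok: 0.3, cacheCreationPerMTok: 3.75 },
  { inputTokens: 10000, outputTokens: 5000, cacheReadTokens: 2000, cacheCreationTokens: 1000 }
);
console.log(cost.toFixed(6)); // → "0.109350" (USD)
```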

Building Your Own Shim

Want to add support for a new AI provider? Here's what it takes.

1. Implement the CLI Interface

Your shim needs to accept --model {modelId} to specify the model, -p to read the prompt from stdin, --resume {sessionId} for optional session continuation, --append-system-prompt {text} for optional system prompt injection, and --self-test for diagnostic mode.

2. Output Claude-Compatible JSONL

Write one JSON object per line to stdout. Start with a system message containing the session ID. Follow with assistant messages (including tool use blocks) and user messages (containing tool results). End with a result message that includes usage stats.

3. Normalize Tool Names

Map your agent's tool names to the standard set: Read, Write, Edit, Bash, Glob, Grep, LS. This is what lets Hankweave understand tool usage regardless of which agent is running.

4. Handle Errors Gracefully

Even when things go wrong, your shim should emit a complete message sequence: a system message (even with placeholder data), a synthetic assistant message explaining the error, and a result message with subtype: "error". This keeps downstream processing consistent.
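That error sequence can be sketched as follows. The session ID, field values, and message wording are illustrative; only the three-message shape and subtype: "error" come from the protocol described above:

```typescript
// Build a complete, Claude-compatible message sequence for a failed run.
function buildErrorSequence(sessionId: string, message: string): string[] {
  return [
    // 1. System message, even if only placeholder data is available.
    { type: "system", subtype: "init", session_id: sessionId, tools: [] },
    // 2. Synthetic assistant message explaining the failure.
    {
      type: "assistant",
      message: { role: "assistant", content: [{ type: "text", text: `The agent failed: ${message}` }] },
    },
    // 3. Result message marked as an error.
    { type: "result", subtype: "error", is_error: true, result: message, session_id: sessionId },
  ].map((msg) => JSON.stringify(msg)); // one JSON object per line
}

for (const line of buildErrorSequence("session-unknown", "agent exited with code 127")) {
  console.log(line);
}
```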

5. Implement Self-Test

Return a JSON object that describes your shim name and version, the underlying agent name, version, and whether it was found, individual checks with pass/fail status, and an overall result. See the Gemini shim's self-test output for the expected format.

Polymorphic Connector Pattern

A notable application of this architecture is the polymorphic connector pattern: using AI to build adapters for other AI. Hankweave's own Gemini shim was built this way.

The process works like this: start with the Claude JSONL schema as your target format. Give an agent the documentation for whatever CLI you're wrapping. Have it generate a shim that translates between formats. Iterate until the translation is accurate.

This pattern works for any agent with documented output format, session continuation support, and tool use capabilities. If an agent meets those criteria, you can have an AI write most of the adapter for you.

The polymorphic connector concept is one of Hankweave's use cases—you can build hanks that generate adapters, parsers, or translators based on documentation.

Debugging Agent Integration

When things go wrong at the harness layer, here's where to look.

Check Logs

Agent logs live in .hankweave/runs/{runId}/{codonId}-claude.log. These are raw JSONL from the agent or shim—exactly what Hankweave received. If something's being mistranslated, you'll see it here.
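Because the logs are JSONL, they're easy to inspect programmatically. A sketch that pulls the final result message out of a log (the inline log here stands in for reading the actual file):

```typescript
// Parse a Claude-compatible JSONL log and pull out the final result message.
function finalResult(logText: string): any {
  const messages = logText
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
  // A well-behaved agent or shim emits the result message last.
  return [...messages].reverse().find((m) => m.type === "result");
}

// In practice, read the log file from .hankweave/runs/ instead of inlining it.
const text = [
  JSON.stringify({ type: "system", subtype: "init", session_id: "s-1" }),
  JSON.stringify({ type: "result", subtype: "success", is_error: false, num_turns: 2 }),
].join("\n");

console.log(finalResult(text)); // the parsed result message
```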

Verify Self-Test

Before digging deeper, verify that the agent is even available:

Shell
hankweave --validate

This runs self-tests for all models used in your hank. If something's misconfigured, the self-test will tell you.

Common Issues

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| "Claude CLI not found" | Claude Code not installed | Install from anthropic.com |
| "Gemini CLI not found" | Gemini CLI not in PATH | Install from geminicli.com |
| "No API key found" | Missing environment variable | Set ANTHROPIC_API_KEY or GOOGLE_API_KEY |
| "Invalid session ID" | Session expired or invalid | Session continuation failed; remove --resume |
| Model resolution failed | Unknown model name | Check spelling; use a full ID like anthropic/claude-sonnet-4-20250514 |

Summary

Hankweave's harness architecture is a bet on orchestration over implementation.

The Claude Agent SDK handles Anthropic models in-process. Shims translate other agents to a common format. The registry resolves models and tracks provider health. And the polymorphic pattern shows how AI can build its own adapters—Hankweave eating its own dog food.

You don't need to understand this layer to use Hankweave. But if you want to extend it to new providers, this is where you start.