
Performance and Cost Tracking

Hankweave automatically tracks costs and token usage. Every LLM call from a codon or sentinel is metered and recorded, available for inspection both in real time and after a run completes.

🎯 Who is this for? Users optimizing hank costs (Track 2: Writing Hanks) and developers building monitoring tools (Track 3: Building on Hankweave).

How Cost Tracking Works

Hankweave calculates costs using per-model pricing data from its LLM Provider Registry. When an agent makes an LLM call, the runtime extracts token usage from the response and computes costs based on input tokens, output tokens, and cache operations.

Cost calculation pipeline

Token Categories

Every LLM call tracks four token categories:

| Category | Description | Typical Impact |
|---|---|---|
| Input Tokens | Tokens sent to the model (prompt, context, files) | Usually the largest cost component |
| Output Tokens | Tokens generated by the model | Higher per-token cost than input |
| Cache Read Tokens | Tokens read from the prompt cache | Significantly cheaper than input |
| Cache Write Tokens | Tokens written to the prompt cache | One-time cost; enables future savings |

Cost Calculation

The registry calculates costs per million tokens using model-specific pricing:

TypeScript
const inputCost = (usage.inputTokens / 1_000_000) * modelCost.input;
const outputCost = (usage.outputTokens / 1_000_000) * modelCost.output;
const cacheReadCost = (usage.cacheReadTokens / 1_000_000) * modelCost.cache_read;
const cacheWriteCost = (usage.cacheWriteTokens / 1_000_000) * modelCost.cache_write;
 
const totalCost = inputCost + outputCost + cacheReadCost + cacheWriteCost;

Prompt caching can substantially reduce costs. When a codon uses the continue-previous continuation mode, prompts that overlap with previous turns can be served from the cache at a reduced cost.
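
To make the arithmetic concrete, here is the same formula applied to a cold call and a cache-warmed call. The prices below are illustrative placeholders, not real rates:

```typescript
// Hypothetical per-million-token prices (placeholders, not real rates).
const modelCost = { input: 3.0, output: 15.0, cache_read: 0.3, cache_write: 3.75 };

function callCost(usage: {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}): number {
  return (
    (usage.inputTokens / 1_000_000) * modelCost.input +
    (usage.outputTokens / 1_000_000) * modelCost.output +
    (usage.cacheReadTokens / 1_000_000) * modelCost.cache_read +
    (usage.cacheWriteTokens / 1_000_000) * modelCost.cache_write
  );
}

// Same total prompt size; the second call serves most of it from cache.
const cold = callCost({ inputTokens: 20_000, outputTokens: 3_000, cacheReadTokens: 0, cacheWriteTokens: 0 });
const warm = callCost({ inputTokens: 5_000, outputTokens: 3_000, cacheReadTokens: 15_000, cacheWriteTokens: 0 });
```

At these placeholder rates, the cache-warmed call costs roughly 40% less even though the total prompt size is unchanged.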

Where Costs Appear

Hankweave reports costs in multiple places, each serving a different purpose.

State File

The state file tracks cumulative costs per codon. Running codons show currentCost and currentTokens, while completed codons have finalCost and finalTokens:

JSON
{
  "codons": [
    {
      "codonId": "generate-schema",
      "status": "completed",
      "finalCost": 0.0234,
      "finalTokens": {
        "inputTokens": 15420,
        "outputTokens": 3210,
        "cacheWriteTokens": 0,
        "cacheReadTokens": 12500
      }
    }
  ]
}

See State File for the complete schema.
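
As a sketch, you can total completed-codon costs straight from this file. The path you pass in is up to you — point it at your actual state file:

```typescript
import * as fs from "node:fs";

// Illustrative shape, following the example above; only the fields
// this sketch reads are declared.
interface CodonEntry {
  codonId: string;
  status: string;
  finalCost?: number;
}

// Sum finalCost across completed codons in a state file.
function totalFinalCost(statePath: string): number {
  const state = JSON.parse(fs.readFileSync(statePath, "utf8")) as { codons: CodonEntry[] };
  return state.codons
    .filter((c) => c.status === "completed" && typeof c.finalCost === "number")
    .reduce((sum, c) => sum + (c.finalCost ?? 0), 0);
}
```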

Events

Cost updates are emitted as events during execution:

| Event | When | Data |
|---|---|---|
| token.usage | During codon execution | Current tokens and cost for the codon |
| codon.completed | After codon finishes | Final cost, tokens, and sentinel costs |
| sentinel.output | After sentinel LLM call | Per-call cost and tokens |

State Snapshots

The state.snapshot event includes aggregated costs:

JSON
{
  "type": "state.snapshot",
  "data": {
    "currentCost": 0.0234,
    "totalTokens": {
      "inputTokens": 15420,
      "outputTokens": 3210,
      "cacheWriteTokens": 0,
      "cacheReadTokens": 12500
    }
  }
}

Codon vs. Sentinel Costs

Hankweave tracks codon and sentinel costs separately because they serve different purposes and often use different models.

Codon and sentinel cost tracking

Codon Costs

Codon costs come from the main agent (such as Claude Code or a shim-based agent) executing the codon's work. They typically make up the majority of a run's costs.

Codon costs appear in:

  • currentCost / finalCost on codon execution objects
  • token.usage events during execution
  • codon.completed events when finished

Sentinel Costs

Sentinel costs come from parallel LLM calls for observation. Each sentinel tracks its cumulative cost independently.

Sentinel costs appear in:

  • The totalCost field within the sentinels object on a codon execution
  • sentinel.output events with per-call costs
  • Individual sentinel state with the totalCost field

JSON
{
  "sentinels": {
    "executed": [
      {
        "id": "cost-tracker",
        "model": "anthropic/claude-haiku",
        "totalCost": 0.0012,
        "llmCallCount": 8
      }
    ],
    "totalCost": 0.0012
  }
}

⚠️ Sentinel costs add up. A sentinel using the immediate execution strategy triggers on every matching event. In a long run, frequent sentinel calls can become significant. Use debounce, count, or timeWindow strategies to batch events and reduce LLM calls.

The Cost Cache

The state manager maintains a cost cache for fast queries without recalculating from the full state history:

TypeScript
// Fast queries via cache
const currentRunCost = stateManager.getCurrentRunCost();
const totalCost = stateManager.getTotalCost();

The cache rebuilds automatically after any cost-related state transition, so cost queries are O(1) rather than O(n) in the number of codons.

When the Cache Rebuilds

| Event | Cache Behavior |
|---|---|
| CostsUpdated | Full rebuild |
| CostsIncremented | Full rebuild |
| CodonFinalCostSet | Full rebuild |
| RunStarted | Reset current run cost |
| RunCompleted / RunFailed | Reset current run cost |
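
A minimal sketch of this policy — the event names come from the table above, but the class shape is illustrative, not the real state manager:

```typescript
type CostEvent =
  | "CostsUpdated" | "CostsIncremented" | "CodonFinalCostSet"
  | "RunStarted" | "RunCompleted" | "RunFailed";

class CostCacheSketch {
  private currentRunCost = 0;

  // Rebuild from the current run's per-codon costs on cost events,
  // reset at run boundaries; reads stay O(1) either way.
  apply(event: CostEvent, codonCostsThisRun: number[]): void {
    switch (event) {
      case "CostsUpdated":
      case "CostsIncremented":
      case "CodonFinalCostSet":
        this.currentRunCost = codonCostsThisRun.reduce((a, b) => a + b, 0);
        break;
      case "RunStarted":
      case "RunCompleted":
      case "RunFailed":
        this.currentRunCost = 0;
        break;
    }
  }

  getCurrentRunCost(): number {
    return this.currentRunCost;
  }
}
```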

Performance Considerations

Beyond direct LLM costs, several factors influence Hankweave's performance and resource usage.

Event Journal Growth

The event journal (.hankweave/events/events.jsonl) grows with every journaled event. For long-running hanks or hanks with verbose sentinels, this file can become large.

What's normal:

  • Short runs (< 10 codons): A few hundred KB
  • Medium runs (10-50 codons): 1-10 MB
  • Long runs or loops: 10-100+ MB

When to be concerned:

  • Journal file over 500 MB
  • Noticeable latency in event queries

Mitigation:

  • Archive old runs. The journal is append-only, so you can safely truncate the file after backing it up.
  • Use reportToWebsocket.triggers: false on verbose sentinels to reduce events.
  • For extremely long runs, consider splitting them into multiple hanks.
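
A minimal archive-then-truncate sketch for the first mitigation (run it between runs, not while a hank is executing; paths are whatever you choose):

```typescript
import * as fs from "node:fs";

// Back up the append-only journal, then empty the live file.
function archiveJournal(journalPath: string, archivePath: string): void {
  fs.copyFileSync(journalPath, archivePath); // back up first
  fs.truncateSync(journalPath, 0);           // then truncate the live file
}
```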

Sentinel Queue Sizing

Sentinels use fixed queue limits to prevent unbounded memory growth:

| Queue | Limit | Behavior When Full |
|---|---|---|
| Trigger queue | 100 triggers | Drops oldest trigger |
| Event buffer | 10,000 events | Drops oldest trigger's events |

These limits exist to prevent memory exhaustion during high-throughput scenarios. If you see "Queue full" or "Event buffer exceeded" log messages, your sentinel is receiving events faster than it can process them.

Solutions:

  • Switch from immediate to a debounce or timeWindow strategy.
  • Increase trigger selectivity with more specific conditions.
  • Use a faster model for the sentinel.
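
The drop-oldest behavior described above can be sketched as a bounded queue (illustrative, not the runtime's actual implementation):

```typescript
class BoundedQueue<T> {
  private items: T[] = [];
  constructor(private readonly limit: number) {}

  push(item: T): void {
    if (this.items.length >= this.limit) {
      this.items.shift(); // queue full: drop the oldest entry
    }
    this.items.push(item);
  }

  get size(): number { return this.items.length; }
  peekOldest(): T | undefined { return this.items[0]; }
}
```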

Loop Efficiency

Hankweave uses lazy loop expansion—only one iteration exists in the execution plan at a time. This design has several performance implications:

Benefits:

  • Memory-efficient: The plan doesn't grow until an iteration starts.
  • Supports unbounded loops: contextExceeded loops can run indefinitely.
  • Enables mid-iteration termination: A hank can be stopped cleanly within a loop.

Considerations:

  • Each iteration requires a cheap plan expansion operation.
  • Debugging tools show only the currently expanded iterations.
  • The runtime cannot "see ahead" to future iterations in the plan.

Checkpoint Overhead

Every checkpoint creates a git commit in the shadow repository. Git operations are fast but not free:

| Operation | Typical Time | When |
|---|---|---|
| git add | 10-50 ms | After each checkpoint |
| git commit | 50-100 ms | After each checkpoint |
| git checkout | 100-500 ms | During rollback |
| git log | 10-50 ms | Listing checkpoints |

For hanks with many small codons, checkpoint overhead can become noticeable. The overhead scales with the number of tracked files, not their size, as git uses content-addressed storage efficiently.

Memory vs. File Storage

Hankweave offers two event storage backends:

Storage backends comparison

| Backend | Use Case | Memory | Persistence |
|---|---|---|---|
| FileEventStorage | Production | Low | Yes |
| MemoryEventStorage | Testing | 100 events max | No |

FileEventStorage uses chunked reads (10,000 events per chunk) and streaming to handle large journals without loading everything into memory.

MemoryEventStorage auto-trims to a maximum of 100 events (dropping the oldest 10% when the limit is reached), making it suitable only for tests.
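
The trim rule can be sketched as follows (constants from the description above; the function name is hypothetical):

```typescript
const MAX_EVENTS = 100;

// Once the cap is reached, drop the oldest 10% of events.
function trimEvents<T>(events: T[]): T[] {
  if (events.length < MAX_EVENTS) return events;
  return events.slice(Math.floor(MAX_EVENTS * 0.1));
}
```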

Optimizing Costs

Model Selection by Task

Different tasks have different model requirements. Hankweave lets you specify models per codon, so use them strategically:

| Task Type | Suggested Model Class | Why |
|---|---|---|
| Planning, architecture | High reasoning (opus, o1) | Complex decision-making |
| Code generation | Medium reasoning (sonnet) | Balance of capability and cost |
| Simple transformations | Fast (haiku, gemini-flash) | Speed over sophistication |
| Validation, formatting | Fast (haiku, gemini-flash) | Quick feedback loops |

Sentinel Model Selection

Sentinels often don't need top-tier models. A narrator or cost tracker works well with a faster, cheaper model like Haiku:

JSON
{
  "id": "narrator",
  "model": "anthropic/claude-haiku",
  "trigger": { "event": "*" },
  "execution": { "strategy": "debounce", "milliseconds": 5000 }
}

Sentinel-specific API keys let you use different billing for sentinel calls. Set the HANKWEAVE_SENTINEL_ANTHROPIC_API_KEY environment variable to separate sentinel costs from codon costs.

Prompt Caching

Claude models support prompt caching, which can significantly reduce costs for codons that share context. To maximize cache hits:

  1. Use continue-previous mode when codons build on previous work.
  2. Place stable content early in prompts (e.g., system prompts, reference documentation).
  3. Keep changing content late in prompts (e.g., the current task, recently modified files).
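
A sketch of that ordering — stable prefix first, volatile content last (names and structure are illustrative, not a Hankweave API):

```typescript
// Assemble a prompt so the cache-friendly prefix lines up across turns.
function buildPrompt(opts: {
  systemPrompt: string;  // stable across turns
  referenceDocs: string; // stable across turns
  currentTask: string;   // changes every turn
  recentFiles: string;   // changes every turn
}): string {
  return [opts.systemPrompt, opts.referenceDocs, opts.currentTask, opts.recentFiles].join("\n\n");
}
```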

Batching with Execution Strategies

For sentinels that don't need real-time responses, batching reduces LLM calls:

Batching strategies

| Strategy | Best For | Configuration |
|---|---|---|
| immediate | Critical alerts | N/A |
| debounce | Activity summaries | 5-30 seconds typical |
| count | Periodic digests | 5-20 events typical |
| timeWindow | Regular reports | 30-300 seconds typical |

A sentinel that fires once per minute instead of on every event can cut its LLM call count by a factor of 60 or more.

Monitoring During Execution

Real-Time Cost Tracking

Connect a WebSocket client to receive cost updates as they happen:

TypeScript
client.on('token.usage', (event) => {
  console.log(`Codon ${event.data.codonId}: $${event.data.cost.toFixed(4)}`);
});
 
client.on('sentinel.output', (event) => {
  console.log(`Sentinel ${event.data.sentinelId}: $${event.data.cost.toFixed(4)}`);
});

Cost-Tracking Sentinel

Build a sentinel that monitors costs in real time:

JSON
{
  "id": "cost-tracker",
  "model": "anthropic/claude-haiku",
  "trigger": {
    "event": ["token.usage", "codon.completed"]
  },
  "execution": {
    "strategy": "debounce",
    "milliseconds": 10000
  },
  "userPromptText": "Summarize spending: <%= JSON.stringify(it.events) %>"
}

What's Normal, What's Concerning

Typical Cost Patterns

| Scenario | Expected Cost Range | Notes |
|---|---|---|
| Simple 3-codon hank | $0.05 - $0.50 | Depends on model choices |
| Data processing pipeline | $0.50 - $5.00 | More for complex schemas |
| Long validation loop | $1.00 - $20.00 | Depends on iterations |
| With active sentinels | Add 10-30% | Model-dependent |

Warning Signs

Runaway costs:

  • Cost growing rapidly without visible progress.
  • A contextExceeded loop that should have terminated.
  • A sentinel with an immediate strategy on high-frequency events.

Performance issues:

  • State file writes taking > 1 second.
  • Event queries timing out.
  • Log parsing interval warnings.

Resource exhaustion:

  • "Queue full" messages from sentinels.
  • Git operations failing.
  • Lock file heartbeat warnings.

Next Steps

To optimize your hank's costs:

  1. Review the state file after a run to see where costs accumulated.
  2. Consider sentinel execution strategies for batching.
  3. Use appropriate model selection for each codon.