Building a Hank
This tutorial walks you through building a real hank from scratch—a data codebook generator that takes CSV files and produces documented, validated schemas. By the end, you'll understand how to think in Hankweave.
Who is this for? This tutorial is for "Track 2" readers who want to create their own hanks. You should have already read Getting Started, Codons, and Hanks. This guide assumes you have Hankweave installed and working.
What We're Building
A data codebook is documentation for a dataset—it explains what each column means, what values are valid, and how tables relate. We'll build a hank that:
- Observes CSV files and writes structured notes.
- Generates Zod schemas from those observations.
- Validates the schemas through iterative refinement.
- Documents everything in human-readable Markdown.
This is a realistic workflow. The techniques apply to any multi-step AI pipeline.
The CCEPL Philosophy
Before writing code, let's talk about how to think when building hanks. CCEPL is the development loop:
- Code — Write the hank configuration.
- Capture — Run it and capture what happens.
- Execute — Watch the agents work.
- Polish — Refine based on the results.
- Loop — Repeat until it works.
This isn't a waterfall process. You write a minimal codon, run it, see what happens, adjust, and run again. Think of the agent as your REPL—each iteration teaches you something about what works and what doesn't.
Key Insight: Don't try to design the perfect hank upfront. Start small and iterate fast. A working codon that does 80% of what you want is better than a perfect design that doesn't run.
Setup
Create Project Structure
mkdir data-codebook
cd data-codebook
mkdir -p prompts data templates
Add Sample Data
Create two CSV files to use as our source data. Place them in the data/ directory. First, `data/users.csv`:
id,name,email,created_at,status
1,Alice Smith,alice@example.com,2024-01-15,active
2,Bob Jones,bob@example.com,2024-02-20,inactive
3,Carol White,carol@example.com,2024-03-10,active
Then `data/orders.csv`:
order_id,user_id,product,amount,currency,order_date
1001,1,Widget A,29.99,USD,2024-01-20
1002,1,Widget B,49.99,USD,2024-01-22
1003,2,Widget A,29.99,USD,2024-02-25
Create Initial Hank File
Start with a minimal hank.json. This file defines the entire workflow.
{
"meta": {
"name": "Data Codebook Generator",
"version": "0.1.0",
"description": "Generate documented schemas from CSV files"
},
"hank": []
}
We'll add codons to the hank array as we go. This is the CCEPL way—start empty and build incrementally.
Step 1: Observation Codon
The first codon observes the data. It reads the CSV files and writes structured notes about what it finds.
Write the Prompt
This prompt tells the AI agent what to look for in the data.
# Data Observation Task
Examine the CSV files in the `read_only_data_source/data/` directory.
For each CSV file:
1. List all columns with their inferred data types.
2. Note any patterns (IDs, dates, enums, foreign keys).
3. Identify relationships between files (e.g., `user_id` in `orders.csv` references `id` in `users.csv`).
4. Record sample values and constraints.
Create a file called `notes/observations.md` with your findings. Structure it clearly with headers for each CSV file.
Be thorough but concise. Focus on what a schema author would need to know.
Add the Codon
Update your hank.json to include the first codon in the hank array:
{
"meta": {
"name": "Data Codebook Generator",
"version": "0.1.0",
"description": "Generate documented schemas from CSV files"
},
"hank": [
{
"id": "observe",
"name": "Observe Data Structure",
"model": "haiku",
"continuationMode": "fresh",
"promptFile": "./prompts/observe.md",
"rigSetup": [
{
"type": "command",
"command": {
"run": "mkdir -p notes",
"workingDirectory": "project"
}
}
],
"checkpointedFiles": ["notes/**/*"]
}
]
}
Let's break this down:
| Field | Why |
|---|---|
| `model: "haiku"` | Observation is straightforward—use a fast, cheap model. |
| `continuationMode` | `fresh` is required for the first codon in a sequence. |
| `rigSetup` | Creates the `notes` directory before the agent starts. |
| `checkpointedFiles` | Tracks everything in `notes/` so we get file change events. |
Run It
hankweave --data ./data
Watch the TUI. You'll see:
- The rig setup running (`mkdir -p notes`).
- Claude Haiku analyzing your CSVs.
- Files being written to the `notes/` directory.
- A checkpoint being created.
When it completes, check the output:
# Find the execution directory from the TUI output, then:
cat ~/.hankweave-executions/{your-execution}/notes/observations.md
Evaluate and Iterate
Now for the Polish phase—the most important part. Read what the agent wrote and ask yourself:
- Did it find all the columns?
- Did it identify the relationships (user_id → users.id)?
- Is the output structured well for the next codon?
If not, edit prompts/observe.md and run again. Maybe you need to be more specific about the output format or request more detail on data types.
CCEPL in action: You just completed a full loop. You wrote config, ran it, captured results, evaluated, and are now ready to polish. This cycle repeats for every codon you add.
Step 2: Schema Generation Codon
With observations in hand, we can generate the actual Zod schemas. This codon reads what the first one wrote and produces source code.
Create the Template
We'll provide the AI with a TypeScript project template. This gives it a consistent structure to work with, ensuring packages, configurations, and entry points are in predictable locations.
Create the project template directory:
mkdir -p templates/typescript/src
Create the package.json for the template:
{
"name": "data-schemas",
"version": "1.0.0",
"type": "module",
"scripts": {
"typecheck": "tsc --noEmit",
"test": "bun test"
},
"dependencies": {
"zod": "^3.22.0"
},
"devDependencies": {
"typescript": "^5.0.0",
"@types/bun": "latest"
}
}
And the tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"outDir": "./dist"
},
"include": ["src/**/*"]
}
Write the Prompt
This prompt guides the agent in writing the schema code.
# Schema Generation Task
Read the observations in `notes/observations.md`.
Based on those observations, create Zod schemas for each CSV file. Place them in the `src/schemas/` directory.
Requirements:
- One schema file per CSV (e.g., `src/schemas/users.ts`, `src/schemas/orders.ts`).
- Include JSDoc comments explaining each field.
- Add appropriate validations (email format, date strings, enums).
- Create a barrel export in `src/schemas/index.ts`.
Example schema structure:
import { z } from 'zod';
/**
* Represents a user record from the users.csv file.
*/
export const UserSchema = z.object({
id: z.number().int().positive(),
name: z.string().min(1),
email: z.string().email(),
  created_at: z.string().date(), // date-only values like "2024-01-15"; z.string().datetime() would reject them
status: z.enum(['active', 'inactive'])
});
export type User = z.infer<typeof UserSchema>;
Focus on accuracy. The schemas should be able to validate the real data from the CSV files.
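One subtlety to watch for: most CSV parsers hand you every value as a string, so a schema that declares z.number() will reject raw rows unless the values are coerced first. Below is a hedged sketch of what a generated schema for orders.csv might look like with coercion; the field choices mirror the sample data above, and currency is an enum only because every sample row happens to be USD.

import { z } from 'zod';

/**
 * Hypothetical generated schema for orders.csv. A sketch of plausible
 * output, not a guarantee of what the codon will produce.
 */
export const OrderSchema = z.object({
  order_id: z.coerce.number().int().positive(), // "1001" -> 1001
  user_id: z.coerce.number().int().positive(),  // references users.id
  product: z.string().min(1),
  amount: z.coerce.number().positive(),         // "29.99" -> 29.99
  currency: z.enum(['USD']),                    // all sample rows are USD
  order_date: z.string().date(),                // date-only; requires zod >= 3.23
});

export type Order = z.infer<typeof OrderSchema>;

The barrel export the prompt asks for is then just export * from './users'; and export * from './orders'; in src/schemas/index.ts.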
Add the Codon
Add the `generate` codon to your `hank.json`, after the `observe` codon:
// ... observe codon from Step 1 ...
{
"id": "generate",
"name": "Generate Zod Schemas",
"model": "sonnet",
"continuationMode": "fresh",
"promptFile": "./prompts/generate.md",
"rigSetup": [
{
"type": "copy",
"copy": {
"from": "./templates/typescript",
"to": "src"
}
},
{
"type": "command",
"command": {
"run": "bun install",
"workingDirectory": "lastCopied"
}
}
],
"checkpointedFiles": [
"src/schemas/**/*.ts",
"src/package.json"
]
}
// ... rest of hank array ...
Notice:
- Model upgrade: Schema generation is more complex than observation, so we use the more capable `sonnet` model.
- `continuationMode: "fresh"`: We're reading from files created by the previous step, not continuing a conversation.
- Rig setup: The `rigSetup` first copies the template, then runs `bun install` inside the new directory (`lastCopied`) to install dependencies. See Rigs for more on working directories.
Why Not continue-previous?
You might wonder why we don't continue the conversation from the observation codon. Two reasons:
- Different models: We're switching from `haiku` to `sonnet`. Different models can't share a continuous session.
- Context isolation: The observations are stored in a file. Reading from a file is more reliable and inspectable than relying on accumulated conversational context, which can drift.
Design Principle: Prefer file-based handoffs over context continuation when you can. Files are persistent, inspectable, and explicit.
Run and Evaluate
hankweave --data ./data
The first codon runs (or is skipped if its output already exists), then the second begins. Watch the agent:
- Read your observations.
- Create schema files.
- Install dependencies.
Check the results:
cat ~/.hankweave-executions/{your-execution}/src/schemas/users.ts
Does it match your data? Are the types correct? Is the `status` enum limited to the values that actually appear (`active`, `inactive`)?
Step 3: Validation Loop
AI-generated code rarely works perfectly on the first try. Type errors, missing imports, and incorrect assumptions about the data are expected. Loops handle this by giving the agent a chance to fix its own mistakes.
Write the Validation Prompt
This prompt tells the agent how to check its work and fix errors.
# Schema Validation Task
Your generated schemas may have errors. Your task is to find and fix them.
First, run the TypeScript type checker to find static errors:
cd src && bun run typecheck
If there are type errors, read the error messages carefully, fix the schema files, and run typecheck again to verify.
Next, test that the schemas can parse the actual data:
cd src && bun test
If tests fail, adjust the schemas to correctly represent the data in the CSV files.
Continue this process of checking and fixing until both typecheck and test pass, or explain what is blocking you.
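One caveat: the template defines a test script, but nothing in the earlier steps actually creates test files, so bun test has nothing to run unless the agent (or you) writes some. A minimal sketch of what such a test could look like, assuming the generated src/schemas/users.ts exports UserSchema as in the earlier example:

// src/schemas/users.test.ts (hypothetical; not created by any codon above)
import { describe, expect, test } from 'bun:test';
import { UserSchema } from './users';

// A row shaped like data/users.csv, with numeric fields already coerced.
const sampleRow = {
  id: 1,
  name: 'Alice Smith',
  email: 'alice@example.com',
  created_at: '2024-01-15',
  status: 'active',
};

describe('UserSchema', () => {
  test('accepts a valid row', () => {
    expect(UserSchema.safeParse(sampleRow).success).toBe(true);
  });

  test('rejects a malformed email', () => {
    const bad = { ...sampleRow, email: 'not-an-email' };
    expect(UserSchema.safeParse(bad).success).toBe(false);
  });
});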
Add the Loop
Add the `validate` loop to `hank.json` after the `generate` codon:
// ... generate codon from Step 2 ...
{
"type": "loop",
"id": "validate",
"name": "Schema Validation Loop",
"description": "Iteratively fix schema issues",
"terminateOn": {
"type": "iterationLimit",
"limit": 3
},
"codons": [
{
"id": "fix-schemas",
"name": "Validate and Fix Schemas",
"model": "sonnet",
"continuationMode": "continue-previous",
"promptFile": "./prompts/validate.md",
"checkpointedFiles": ["src/schemas/**/*.ts"],
"rigSetup": [
{
"type": "command",
"command": {
"run": "bun run typecheck",
"workingDirectory": "src"
},
"allowFailure": true
}
]
}
]
}
// ... rest of hank array ...
Key points:
- `type: "loop"`: This defines a loop block, not a single codon.
- `iterationLimit: 3`: The agent gets a maximum of 3 attempts to fix issues.
- `continuationMode: "continue-previous"`: Each iteration inside the loop remembers the previous attempts, allowing the agent to learn from its mistakes.
- `allowFailure: true`: We expect the `typecheck` command might fail—that's the whole point of the loop. This setting prevents a failure from stopping the hank.
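If a mental model helps, the loop behaves roughly like the sketch below. This is illustrative only: runCodon is a hypothetical stand-in, not a Hankweave API, and the exact termination semantics are documented in Loops.

// Conceptual model of the validate loop, not Hankweave internals.
type Session = { transcript: string[] };

// Stand-in for one agent run; a real run would invoke the model.
async function runCodon(id: string, resume?: Session): Promise<Session> {
  const prior = resume ?? { transcript: [] };
  return { transcript: [...prior.transcript, `ran ${id}`] };
}

let session: Session | undefined;
for (let i = 1; i <= 3; i++) {            // terminateOn: iterationLimit 3
  // continue-previous: resume the same conversation each iteration,
  // so earlier error output and fixes stay in the agent's context.
  session = await runCodon('fix-schemas', session);
}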
Why an Iteration Limit?
We use iterationLimit instead of a condition like contextExceeded because:
- It provides predictable costs (3 iterations maximum).
- Type errors should be fixable in a few passes. If not, there's likely a deeper issue.
- It preserves context for any codons that might run after the loop.
Context Budgeting: If you used contextExceeded as the termination condition, the agent would keep trying to fix errors until its context window filled up, which could be expensive. iterationLimit gives you direct control over the cost.
Step 4: Documentation Codon
The code is working—now make it understandable for humans. This final codon takes the validated schemas and produces documentation.
Write the Prompt
# Documentation Generation Task
Create comprehensive documentation for the schemas in `src/schemas/`.
Generate a file at `docs/CODEBOOK.md` with the following sections:
1. **Overview**: A brief summary of the datasets this codebook covers.
2. **Schemas**: For each schema, provide:
- Table name and purpose.
- Field-by-field documentation, explaining what the data represents.
- Example valid and invalid values for key fields.
- Relationships to other tables.
3. **Usage**: A short guide on how to import and use the generated schemas in a TypeScript project.
The documentation should be clear enough for non-developers to understand. Focus on the meaning of the data, not just the technical types.
Also, create a `docs/CHANGELOG.md` file summarizing the schemas that were generated.
Add the Codon
Add the document codon to hank.json after the validate loop:
// ... validate loop from Step 3 ...
{
"id": "document",
"name": "Generate Documentation",
"model": "sonnet",
"continuationMode": "fresh",
"promptFile": "./prompts/document.md",
"rigSetup": [
{
"type": "command",
"command": {
"run": "mkdir -p docs",
"workingDirectory": "project"
}
}
],
"checkpointedFiles": ["docs/**/*"],
"outputFiles": [
{
"copy": ["src/schemas/**/*.ts", "docs/**/*"],
"beforeCopy": [
{
"type": "command",
"command": {
"run": "bun run typecheck",
"workingDirectory": "src"
}
}
]
}
]
}
// ... rest of hank array ...
Notice:
- `continuationMode: "fresh"`: We start with a clean slate for documentation to ensure it's based on the final, validated files.
- `outputFiles`: This block defines the final artifacts of the hank. It copies the schemas and docs to the results directory.
- `beforeCopy`: As a final sanity check, it runs `typecheck` one last time. If the check fails, the hank stops and nothing is copied, ensuring only valid code is output.
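Once the artifacts are copied out, a downstream project can consume the schemas directly. Here is a sketch of the kind of snippet the prompt's Usage section might produce (the import path depends on where you place the output):

// Hypothetical consumer of the generated schemas.
import { UserSchema, type User } from './src/schemas';

// parse() throws on invalid input; safeParse() returns a result object.
const user: User = UserSchema.parse({
  id: 3,
  name: 'Carol White',
  email: 'carol@example.com',
  created_at: '2024-03-10',
  status: 'active',
});
console.log(user.status); // 'active'

const result = UserSchema.safeParse({ id: -1, name: '', email: 'nope' });
if (!result.success) {
  console.error(result.error.issues); // explains exactly which fields failed
}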
Step 5: Add Sentinels
So far, each codon has run in sequence. Sentinels are different: they run in parallel, observing the workflow without blocking it. Think of them as background monitors that watch what's happening and produce their own output.
Understanding Sentinel Templating
Sentinels use Eta templating syntax (<%= %>) to dynamically insert data from the execution into their prompts. Inside a template, you have access to several variables:
- `it.events`: An array of the events that triggered the sentinel.
- `it.timestamp`: The time when the sentinel fired.
- `it.context`: The current execution context.
For example, <%= JSON.stringify(it.events, null, 2) %> inserts all triggering events as formatted JSON. See Sentinels for a full reference.
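Because Eta is an ordinary templating library, you can experiment with this syntax outside Hankweave. A standalone sketch (install the eta package first; the event object shown is made up, since real token.usage payloads may differ):

import { Eta } from 'eta';

const eta = new Eta();
const template = 'Events so far:\n<%= JSON.stringify(it.events, null, 2) %>';

// `it` is Eta's name for the data object supplied at render time.
const rendered = eta.renderString(template, {
  events: [{ type: 'token.usage', inputTokens: 1200, outputTokens: 300 }],
});
console.log(rendered);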
Cost Tracker Sentinel
Create sentinels/cost-tracker.sentinel.json to watch token usage (create the sentinels/ directory first if it doesn't exist).
{
"id": "cost-tracker",
"name": "Cost Tracker",
"description": "Track token usage and costs",
"model": "haiku",
"trigger": {
"type": "event",
"on": ["token.usage"]
},
"execution": {
"strategy": "timeWindow",
"milliseconds": 30000
},
"userPromptText": "Summarize the token usage so far. Calculate total input tokens, output tokens, and estimated cost. List the most expensive operations.\n\nEvents:\n<%= JSON.stringify(it.events, null, 2) %>",
"joinString": "\n\n---\n\n"
}
Progress Narrator Sentinel
Create sentinels/narrator.sentinel.json to generate human-readable progress updates.
{
"id": "narrator",
"name": "Progress Narrator",
"description": "Human-readable progress updates",
"model": "haiku",
"trigger": {
"type": "event",
"on": ["assistant.action", "tool.result"]
},
"execution": {
"strategy": "debounce",
"milliseconds": 10000
},
"systemPromptText": "You are a technical writer summarizing AI agent progress. Be concise and factual.",
"userPromptText": "Based on these events, write a brief paragraph about what the agent just accomplished:\n\n<%= JSON.stringify(it.events, null, 2) %>",
"joinString": "\n\n"
}
Attach Sentinels to Codons
Update your codons to attach the sentinels. For example, add a sentinels array to the generate codon:
// In the "generate" codon definition
"sentinels": [
{
"sentinelConfig": "./sentinels/narrator.sentinel.json"
},
{
"sentinelConfig": "./sentinels/cost-tracker.sentinel.json"
}
]
Advanced Sentinel Configuration: For more control, you can add a `settings` object alongside `sentinelConfig`:
{
"sentinelConfig": "./sentinels/narrator.sentinel.json",
"settings": {
"failCodonIfNotLoaded": false,
"outputPaths": { "logFile": "narrator-output.md" }
}
}
See Sentinels for details on all available settings.
Add the same sentinels array to the fix-schemas codon inside the validate loop and to the document codon. Sentinels are cheap—they use haiku and process events in batches, providing valuable insight with minimal overhead.
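Keep in mind that the cost tracker's summary is model-written, so treat its arithmetic as approximate. If you need exact totals, you can post-process the captured events deterministically. A sketch (the event field names here are assumptions, not a documented Hankweave schema):

// Hypothetical shape of a token.usage event; verify against real events.
interface TokenUsageEvent {
  codonId: string;
  inputTokens: number;
  outputTokens: number;
}

// Sum usage overall and per codon.
function summarizeUsage(events: TokenUsageEvent[]) {
  const totals = { input: 0, output: 0 };
  const byCodon = new Map<string, number>();
  for (const e of events) {
    totals.input += e.inputTokens;
    totals.output += e.outputTokens;
    byCodon.set(e.codonId, (byCodon.get(e.codonId) ?? 0) + e.inputTokens + e.outputTokens);
  }
  return { totals, byCodon };
}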
The Complete Hank
Here is the full hank.json file, putting all the pieces together.
{
"meta": {
"name": "Data Codebook Generator",
"version": "1.0.0",
"description": "Generate documented schemas from CSV files"
},
"hank": [
{
"id": "observe",
"name": "Observe Data Structure",
"model": "haiku",
"continuationMode": "fresh",
"promptFile": "./prompts/observe.md",
"rigSetup": [
{
"type": "command",
"command": {
"run": "mkdir -p notes",
"workingDirectory": "project"
}
}
],
"checkpointedFiles": ["notes/**/*"]
},
{
"id": "generate",
"name": "Generate Zod Schemas",
"model": "sonnet",
"continuationMode": "fresh",
"promptFile": "./prompts/generate.md",
"rigSetup": [
{
"type": "copy",
"copy": {
"from": "./templates/typescript",
"to": "src"
}
},
{
"type": "command",
"command": {
"run": "bun install",
"workingDirectory": "lastCopied"
}
}
],
"checkpointedFiles": ["src/schemas/**/*.ts", "src/package.json"],
"sentinels": [
{ "sentinelConfig": "./sentinels/narrator.sentinel.json" },
{ "sentinelConfig": "./sentinels/cost-tracker.sentinel.json" }
]
},
{
"type": "loop",
"id": "validate",
"name": "Schema Validation Loop",
"description": "Iteratively fix schema issues",
"terminateOn": {
"type": "iterationLimit",
"limit": 3
},
"codons": [
{
"id": "fix-schemas",
"name": "Validate and Fix Schemas",
"model": "sonnet",
"continuationMode": "continue-previous",
"promptFile": "./prompts/validate.md",
"checkpointedFiles": ["src/schemas/**/*.ts"],
"rigSetup": [
{
"type": "command",
"command": {
"run": "bun run typecheck",
"workingDirectory": "src"
},
"allowFailure": true
}
],
"sentinels": [
{ "sentinelConfig": "./sentinels/narrator.sentinel.json" }
]
}
]
},
{
"id": "document",
"name": "Generate Documentation",
"model": "sonnet",
"continuationMode": "fresh",
"promptFile": "./prompts/document.md",
"rigSetup": [
{
"type": "command",
"command": {
"run": "mkdir -p docs",
"workingDirectory": "project"
}
}
],
"checkpointedFiles": ["docs/**/*"],
"outputFiles": [
{
"copy": ["src/schemas/**/*.ts", "docs/**/*"],
"beforeCopy": [
{
"type": "command",
"command": {
"run": "bun run typecheck",
"workingDirectory": "src"
}
}
]
}
],
"sentinels": [
{ "sentinelConfig": "./sentinels/narrator.sentinel.json" },
{ "sentinelConfig": "./sentinels/cost-tracker.sentinel.json" }
]
}
]
}
Run the Full Hank
Execute the complete workflow with a single command:
hankweave --data ./data
Watch the TUI as the hank progresses through each step:
- observe: Haiku analyzes the CSVs and writes notes (fast and cheap).
- generate: Sonnet creates schemas from the observations.
- validate: The loop runs `typecheck`, giving the agent up to 3 chances to fix issues.
- document: Sonnet writes documentation and copies the final, validated artifacts to the output directory.
All the while, your sentinels will run in parallel, generating progress summaries and cost reports.
What You've Learned
You just built a real-world hank from scratch. Here are the core patterns you used:
| Pattern | Where Used |
|---|---|
| Model selection by task | Haiku for observation, Sonnet for generation |
| File-based handoffs | Observations → Schema generation |
| Rig setup with templates | Copying the TypeScript project structure |
| Validation loops | Iterative schema fixing |
| `allowFailure` in loops | Typecheck might fail—that's expected |
| Sentinels for monitoring | Progress and cost tracking |
| Finalizing with `outputFiles` | Copying results to the output directory |
Debugging Tips
When things go wrong:
- Check the observation notes: Did the agent see your data correctly in the first step? The entire pipeline depends on this.
- Read the agent logs: You can find detailed JSONL logs for each agent interaction in the execution directory (e.g., `~/.hankweave-executions/{runId}/`).
- Inspect checkpoints: The TUI shows created checkpoints. You can use them to roll back to a known good state and try a different approach.
- Iterate on prompts: If the agent misunderstands a task, clarify the prompt with more specific instructions or better examples.
For example, to investigate why a codon failed, you can inspect its log file:
# Find the relevant execution directory and log file
cat ~/.hankweave-executions/{runId}/fix-schemas#1-sonnet.log
Look for error messages from tools or assistant reasoning that seems flawed. See Debugging for comprehensive troubleshooting strategies.
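Since the logs are JSONL, you can also skim them programmatically. A quick sketch using Bun (this just surfaces records that mention an error; the record fields are whatever your log actually contains):

// skim-log.ts (run with: bun skim-log.ts <path-to-log>)
const path = process.argv[2] ?? '';
const text = await Bun.file(path).text();

for (const line of text.split('\n').filter(Boolean)) {
  let record: unknown;
  try {
    record = JSON.parse(line); // one JSON object per line
  } catch {
    continue; // skip any non-JSON lines
  }
  if (JSON.stringify(record).toLowerCase().includes('error')) {
    console.log(record);
  }
}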
Adapting This Pattern
This tutorial built a data codebook, but the structure applies to many other problems. The same Observe → Generate → Validate → Document pipeline works for:
- Code generation: Observe specs → Generate code → Run tests → Document the API.
- Document processing: Extract text → Transform to structured data → Validate against a schema → Format for output.
- Automated testing: Generate test cases → Run them → Fix failing code → Report the results.
The underlying principles transfer to any multi-step AI workflow: start with cheap observation, use more capable models for generation, validate with controlled iteration, document with a clean context, and monitor everything with parallel sentinels.
Related Pages
- Codons — Understand each field in a codon.
- Loops — Learn about termination modes and constraints.
- Rigs — See all available setup operations and path handling options.
- Sentinels — Explore triggers, strategies, and structured output.
- Checkpoints — Understand how to roll back when things go wrong.
Next Steps
Now that you've built a working hank:
- Add more sophisticated sentinels that produce structured output or use conversational mode.
- Experiment with the `contextExceeded` termination condition for open-ended tasks.
- Apply these patterns to build hanks for your own use cases.
- Read Loops and Checkpoints to learn advanced control strategies.