Building a Hank
This tutorial walks you through building a real hank from scratch—a data codebook generator that takes CSV files and produces documented, validated schemas. By the end, you'll understand how to think in Hankweave.
Who is this for? This tutorial is for "Track 2" readers who want to create their own hanks. You should have already read Getting Started, Codons, and Hanks. This guide assumes you have Hankweave installed and working.
What We're Building
A data codebook is documentation for a dataset—it explains what each column means, what values are valid, and how tables relate. We'll build a hank that:
- Observes CSV files and writes structured notes.
- Generates Zod schemas from those observations.
- Validates the schemas through iterative refinement.
- Documents everything in human-readable Markdown.
This is a realistic workflow. The techniques apply to any multi-step AI pipeline.
The CCEPL Philosophy
Before writing code, let's talk about how to think when building hanks. CCEPL is the development loop:
- Code — Write the hank configuration.
- Capture — Run it and capture what happens.
- Execute — Watch the agents work.
- Polish — Refine based on the results.
- Loop — Repeat until it works.
This isn't a waterfall process. You write a minimal codon, run it, see what happens, adjust, and run again. Think of the agent as your REPL—each iteration teaches you something about what works and what doesn't.
Key Insight: Don't try to design the perfect hank upfront. Start small and iterate fast. A working codon that does 80% of what you want is better than a perfect design that doesn't run.
Setup
Create Project Structure
mkdir data-codebook
cd data-codebook
mkdir -p prompts data templates
Add Sample Data
Create two CSV files to use as our source data. Place them in the data/ directory. First, `data/users.csv`:
id,name,email,created_at,status
1,Alice Smith,alice@example.com,2024-01-15,active
2,Bob Jones,bob@example.com,2024-02-20,inactive
3,Carol White,carol@example.com,2024-03-10,active
Then `data/orders.csv`:
order_id,user_id,product,amount,currency,order_date
1001,1,Widget A,29.99,USD,2024-01-20
1002,1,Widget B,49.99,USD,2024-01-22
1003,2,Widget A,29.99,USD,2024-02-25
Create Initial Hank File
Start with a minimal hank.json. This file defines the entire workflow.
{
"meta": {
"name": "Data Codebook Generator",
"version": "0.1.0",
"description": "Generate documented schemas from CSV files"
},
"hank": []
}
We'll add codons to the hank array as we go. This is the CCEPL way—start empty and build incrementally.
Step 1: Observation Codon
The first codon observes the data. It reads the CSV files and writes structured notes about what it finds.
Write the Prompt
This prompt tells the AI agent what to look for in the data.
# Data Observation Task
Examine the CSV files in the `read_only_data_source/data/` directory.
For each CSV file:
1. List all columns with their inferred data types.
2. Note any patterns (IDs, dates, enums, foreign keys).
3. Identify relationships between files (e.g., `user_id` in `orders.csv` references `id` in `users.csv`).
4. Record sample values and constraints.
Create a file called `notes/observations.md` with your findings. Structure it clearly with headers for each CSV file.
Be thorough but concise. Focus on what a schema author would need to know.
Add the Codon
Update your hank.json to include the first codon in the hank array:
{
"meta": {
"name": "Data Codebook Generator",
"version": "0.1.0",
"description": "Generate documented schemas from CSV files"
},
"hank": [
{
"id": "observe",
"name": "Observe Data Structure",
"model": "haiku",
"continuationMode": "fresh",
"promptFile": "./prompts/observe.md",
"rigSetup": [
{
"type": "command",
"command": {
"run": "mkdir -p notes",
"workingDirectory": "project"
}
}
],
"checkpointedFiles": ["notes/**/*"]
}
]
}
Let's break this down:
| Field | Why |
|---|---|
| `model: "haiku"` | Observation is straightforward—use a fast, cheap model. |
| `continuationMode` | `fresh` is required for the first codon in a sequence. |
| `rigSetup` | Creates the `notes` directory before the agent starts. |
| `checkpointedFiles` | Tracks everything in `notes/` so we get file change events. |
Run It
hankweave --data ./data
Watch the TUI. You'll see:
- The rig setup running (`mkdir -p notes`).
- Claude Haiku analyzing your CSVs.
- Files being written to the `notes/` directory.
- A checkpoint being created.
When it completes, check the output:
# Find the execution directory from the TUI output, then:
cat ~/.hankweave-executions/{your-execution}/notes/observations.md
Evaluate and Iterate
Now for the Polish phase—the most important part. Read what the agent wrote and ask yourself:
- Did it find all the columns?
- Did it identify the relationships (user_id → users.id)?
- Is the output structured well for the next codon?
If not, edit prompts/observe.md and run again. Maybe you need to be more specific about the output format or request more detail on data types.
CCEPL in action: You just completed a full loop. You wrote config, ran it, captured results, evaluated, and are now ready to polish. This cycle repeats for every codon you add.
Step 2: Schema Generation Codon
With observations in hand, we can generate the actual Zod schemas. This codon reads what the first one wrote and produces source code.
Create the Template
We'll provide the AI with a TypeScript project template. This gives it a consistent structure to work with, ensuring packages, configurations, and entry points are in predictable locations.
Create the project template directory:
mkdir -p templates/typescript/src
Create the package.json for the template:
{
"name": "data-schemas",
"version": "1.0.0",
"type": "module",
"scripts": {
"typecheck": "tsc --noEmit",
"test": "bun test"
},
"dependencies": {
"zod": "^3.22.0"
},
"devDependencies": {
"typescript": "^5.0.0",
"@types/bun": "latest"
}
}
And the tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"outDir": "./dist"
},
"include": ["src/**/*"]
}
Write the Prompt
This prompt guides the agent in writing the schema code.
# Schema Generation Task
Read the observations in `notes/observations.md`.
Based on those observations, create Zod schemas for each CSV file. Place them in the `src/schemas/` directory.
Requirements:
- One schema file per CSV (e.g., `src/schemas/users.ts`, `src/schemas/orders.ts`).
- Include JSDoc comments explaining each field.
- Add appropriate validations (email format, date strings, enums).
- Create a barrel export in `src/schemas/index.ts`.
Example schema structure:
import { z } from 'zod';
/**
* Represents a user record from the users.csv file.
*/
export const UserSchema = z.object({
id: z.number().int().positive(),
name: z.string().min(1),
email: z.string().email(),
  created_at: z.string().date(), // date-only values like "2024-01-15"; z.string().datetime() would reject them
status: z.enum(['active', 'inactive'])
});
export type User = z.infer<typeof UserSchema>;
Focus on accuracy. The schemas should be able to validate the real data from the CSV files.
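One subtlety to watch for: most CSV parsers hand you every value as a string, so a schema that declares z.number() will reject raw rows unless the values are coerced first. Below is a hedged sketch of what a generated schema for orders.csv might look like with coercion; the field choices mirror the sample data above, and currency is an enum only because every sample row happens to be USD.

import { z } from 'zod';

/**
 * Hypothetical generated schema for orders.csv. A sketch of plausible
 * output, not a guarantee of what the codon will produce.
 */
export const OrderSchema = z.object({
  order_id: z.coerce.number().int().positive(), // "1001" -> 1001
  user_id: z.coerce.number().int().positive(),  // references users.id
  product: z.string().min(1),
  amount: z.coerce.number().positive(),         // "29.99" -> 29.99
  currency: z.enum(['USD']),                    // all sample rows are USD
  order_date: z.string().date(),                // date-only; requires zod >= 3.23
});

export type Order = z.infer<typeof OrderSchema>;

The barrel export the prompt asks for is then just export * from './users'; and export * from './orders'; in src/schemas/index.ts.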
Add the Codon
Add the `generate` codon to your `hank.json`, after the `observe` codon:
// ... observe codon from Step 1 ...
{
"id": "generate",
"name": "Generate Zod Schemas",
"model": "sonnet",
"continuationMode": "fresh",
"promptFile": "./prompts/generate.md",
"rigSetup": [
{
"type": "copy",
"copy": {
"from": "./templates/typescript",
"to": "src"
}
},
{
"type": "command",
"command": {
"run": "bun install",
"workingDirectory": "lastCopied"
}
}
],
"checkpointedFiles": [
"src/schemas/**/*.ts",
"src/package.json"
]
}
// ... rest of hank array ...
Notice:
- Model upgrade: Schema generation is more complex than observation, so we use the more capable `sonnet` model.
- `continuationMode: "fresh"`: We're reading from files created by the previous step, not continuing a conversation.
- Rig setup: The `rigSetup` first copies the template, then runs `bun install` inside the new directory (`lastCopied`) to install dependencies. See Rigs for more on working directories.
Why Not continue-previous?
You might wonder why we don't continue the conversation from the observation codon. Two reasons:
- Different models: We're switching from `haiku` to `sonnet`. Different models can't share a continuous session.
- Context isolation: The observations are stored in a file. Reading from a file is more reliable and inspectable than relying on accumulated conversational context, which can drift.
Design Principle: Prefer file-based handoffs over context continuation when you can. Files are persistent, inspectable, and explicit.
Run and Evaluate
hankweave --data ./data
The first codon runs (or is skipped if its output already exists), then the second begins. Watch the agent:
- Read your observations.
- Create schema files.
- Install dependencies.
Check the results:
cat ~/.hankweave-executions/{your-execution}/src/schemas/users.ts
Does it match your data? Are the types correct? Is the `status` enum limited to the values that actually appear (`active`, `inactive`)?
Step 3: Validation Loop
AI-generated code rarely works perfectly on the first try. Type errors, missing imports, and incorrect assumptions about the data are expected. Loops handle this by giving the agent a chance to fix its own mistakes.
Write the Validation Prompt
This prompt tells the agent how to check its work and fix errors.
# Schema Validation Task
Your generated schemas may have errors. Your task is to find and fix them.
First, run the TypeScript type checker to find static errors:
cd src && bun run typecheck
If there are type errors, read the error messages carefully, fix the schema files, and run typecheck again to verify.
Next, test that the schemas can parse the actual data:
cd src && bun test
If tests fail, adjust the schemas to correctly represent the data in the CSV files.
Continue this process of checking and fixing until both typecheck and test pass, or explain what is blocking you.
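One caveat: the template defines a test script, but nothing in the earlier steps actually creates test files, so bun test has nothing to run unless the agent (or you) writes some. A minimal sketch of what such a test could look like, assuming the generated src/schemas/users.ts exports UserSchema as in the earlier example:

// src/schemas/users.test.ts (hypothetical; not created by any codon above)
import { describe, expect, test } from 'bun:test';
import { UserSchema } from './users';

// A row shaped like data/users.csv, with numeric fields already coerced.
const sampleRow = {
  id: 1,
  name: 'Alice Smith',
  email: 'alice@example.com',
  created_at: '2024-01-15',
  status: 'active',
};

describe('UserSchema', () => {
  test('accepts a valid row', () => {
    expect(UserSchema.safeParse(sampleRow).success).toBe(true);
  });

  test('rejects a malformed email', () => {
    const bad = { ...sampleRow, email: 'not-an-email' };
    expect(UserSchema.safeParse(bad).success).toBe(false);
  });
});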
Add the Loop
Add the `validate` loop to `hank.json` after the `generate` codon:
// ... generate codon from Step 2 ...
{
"type": "loop",
"id": "validate",
"name": "Schema Validation Loop",
"description": "Iteratively fix schema issues",
"terminateOn": {
"type": "iterationLimit",
"limit": 3
},
"codons": [
{
"id": "fix-schemas",
"name": "Validate and Fix Schemas",
"model": "sonnet",
"continuationMode": "continue-previous",
"promptFile": "./prompts/validate.md",
"checkpointedFiles": ["src/schemas/**/*.ts"],
"rigSetup": [
{
"type": "command",
"command": {
"run": "bun run typecheck",
"workingDirectory": "src"
},
"allowFailure": true
}
]
}
]
}
// ... rest of hank array ...
Key points:
- `type: "loop"`: This defines a loop block, not a single codon.
- `iterationLimit: 3`: The agent gets a maximum of 3 attempts to fix issues.
- `continuationMode: "continue-previous"`: Each iteration inside the loop remembers the previous attempts, allowing the agent to learn from its mistakes.
- `allowFailure: true`: We expect the `typecheck` command might fail—that's the whole point of the loop. This setting prevents a failure from stopping the hank.
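If a mental model helps, the loop behaves roughly like the sketch below. This is illustrative only: runCodon is a hypothetical stand-in, not a Hankweave API, and the exact termination semantics are documented in Loops.

// Conceptual model of the validate loop, not Hankweave internals.
type Session = { transcript: string[] };

// Stand-in for one agent run; a real run would invoke the model.
async function runCodon(id: string, resume?: Session): Promise<Session> {
  const prior = resume ?? { transcript: [] };
  return { transcript: [...prior.transcript, `ran ${id}`] };
}

let session: Session | undefined;
for (let i = 1; i <= 3; i++) {            // terminateOn: iterationLimit 3
  // continue-previous: resume the same conversation each iteration,
  // so earlier error output and fixes stay in the agent's context.
  session = await runCodon('fix-schemas', session);
}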
Why an Iteration Limit?
We use iterationLimit instead of a condition like contextExceeded because:
- It provides predictable costs (3 iterations maximum).
- Type errors should be fixable in a few passes. If not, there's likely a deeper issue.
- It preserves context for any codons that might run after the loop.
Context Budgeting: If you used contextExceeded as the termination condition, the agent would keep trying to fix errors until its context window filled up, which could be expensive. iterationLimit gives you direct control over the cost.
Step 4: Documentation Codon
The code is working—now make it understandable for humans. This final codon takes the validated schemas and produces documentation.
Write the Prompt
# Documentation Generation Task
Create comprehensive documentation for the schemas in `src/schemas/`.
Generate a file at `docs/CODEBOOK.md` with the following sections:
1. **Overview**: A brief summary of the datasets this codebook covers.
2. **Schemas**: For each schema, provide:
- Table name and purpose.
- Field-by-field documentation, explaining what the data represents.
- Example valid and invalid values for key fields.
- Relationships to other tables.
3. **Usage**: A short guide on how to import and use the generated schemas in a TypeScript project.
The documentation should be clear enough for non-developers to understand. Focus on the meaning of the data, not just the technical types.
Also, create a `docs/CHANGELOG.md` file summarizing the schemas that were generated.
Add the Codon
Add the document codon to hank.json after the validate loop:
// ... validate loop from Step 3 ...
{
"id": "document",
"name": "Generate Documentation",
"model": "sonnet",
"continuationMode": "fresh",
"promptFile": "./prompts/document.md",
"rigSetup": [
{
"type": "command",
"command": {
"run": "mkdir -p docs",
"workingDirectory": "project"
}
}
],
"checkpointedFiles": ["docs/**/*"],
"outputFiles": [
{
"copy": ["src/schemas/**/*.ts", "docs/**/*"],
"beforeCopy": [
{
"type": "command",
"command": {
"run": "bun run typecheck",
"workingDirectory": "src"
}
}
]
}
]
}
// ... rest of hank array ...
Notice:
- `continuationMode: "fresh"`: We start with a clean slate for documentation to ensure it's based on the final, validated files.
- `outputFiles`: This block defines the final artifacts of the hank. It copies the schemas and docs to the results directory.
- `beforeCopy`: As a final sanity check, it runs `typecheck` one last time. If the check fails, the hank stops and nothing is copied, ensuring only valid code is output.
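Once the artifacts are copied out, a downstream project can consume the schemas directly. Here is a sketch of the kind of snippet the prompt's Usage section might produce (the import path depends on where you place the output):

// Hypothetical consumer of the generated schemas.
import { UserSchema, type User } from './src/schemas';

// parse() throws on invalid input; safeParse() returns a result object.
const user: User = UserSchema.parse({
  id: 3,
  name: 'Carol White',
  email: 'carol@example.com',
  created_at: '2024-03-10',
  status: 'active',
});
console.log(user.status); // 'active'

const result = UserSchema.safeParse({ id: -1, name: '', email: 'nope' });
if (!result.success) {
  console.error(result.error.issues); // explains exactly which fields failed
}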
Step 5: Add Sentinels
So far, each codon has run in sequence. Sentinels are different: they run in parallel, observing the workflow without blocking it. Think of them as background monitors that watch what's happening and produce their own output.
Understanding Sentinel Templating
Sentinels use Eta templating syntax (<%= %>) to dynamically insert data from the execution into their prompts. Inside a template, you have access to several variables:
- `it.events`: An array of the events that triggered the sentinel.
- `it.timestamp`: The time when the sentinel fired.
- `it.context`: The current execution context.
For example, <%= JSON.stringify(it.events, null, 2) %> inserts all triggering events as formatted JSON. See Sentinels for a full reference.
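Because Eta is an ordinary templating library, you can experiment with this syntax outside Hankweave. A standalone sketch (install the eta package first; the event object shown is made up, since real token.usage payloads may differ):

import { Eta } from 'eta';

const eta = new Eta();
const template = 'Events so far:\n<%= JSON.stringify(it.events, null, 2) %>';

// `it` is Eta's name for the data object supplied at render time.
const rendered = eta.renderString(template, {
  events: [{ type: 'token.usage', inputTokens: 1200, outputTokens: 300 }],
});
console.log(rendered);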
Cost Tracker Sentinel
Create sentinels/cost-tracker.sentinel.json to watch token usage (create the sentinels/ directory first if it doesn't exist).
{
"id": "cost-tracker",
"name": "Cost Tracker",
"description": "Track token usage and costs",
"model": "haiku",
"trigger": {
"type": "event",
"on": ["token.usage"]
},
"execution": {
"strategy": "timeWindow",
"milliseconds": 30000
},
"userPromptText": "Summarize the token usage so far. Calculate total input tokens, output tokens, and estimated cost. List the most expensive operations.\n\nEvents:\n<%= JSON.stringify(it.events, null, 2) %>",
"joinString": "\n\n---\n\n"
}
Progress Narrator Sentinel
Create sentinels/narrator.sentinel.json to generate human-readable progress updates.
{
"id": "narrator",
"name": "Progress Narrator",
"description": "Human-readable progress updates",
"model": "haiku",
"trigger": {
"type": "event",
"on": ["assistant.action", "tool.result"]
},
"execution": {
"strategy": "debounce",
"milliseconds": 10000
},
"systemPromptText": "You are a technical writer summarizing AI agent progress. Be concise and factual.",
"userPromptText": "Based on these events, write a brief paragraph about what the agent just accomplished:\n\n<%= JSON.stringify(it.events, null, 2) %>",
"joinString": "\n\n"
}
Attach Sentinels to Codons
Update your codons to attach the sentinels. For example, add a sentinels array to the generate codon:
// In the "generate" codon definition
"sentinels": [
{
"sentinelConfig": "./sentinels/narrator.sentinel.json"
},
{
"sentinelConfig": "./sentinels/cost-tracker.sentinel.json"
}
]
Advanced Sentinel Configuration: For more control, you can add a `settings` object alongside `sentinelConfig`:
{
"sentinelConfig": "./sentinels/narrator.sentinel.json",
"settings": {
"failCodonIfNotLoaded": false,
"outputPaths": { "logFile": "narrator-output.md" }
}
}
See Sentinels for details on all available settings.
Add the same sentinels array to the fix-schemas codon inside the validate loop and to the document codon. Sentinels are cheap—they use haiku and process events in batches, providing valuable insight with minimal overhead.
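Keep in mind that the cost tracker's summary is model-written, so treat its arithmetic as approximate. If you need exact totals, you can post-process the captured events deterministically. A sketch (the event field names here are assumptions, not a documented Hankweave schema):

// Hypothetical shape of a token.usage event; verify against real events.
interface TokenUsageEvent {
  codonId: string;
  inputTokens: number;
  outputTokens: number;
}

// Sum usage overall and per codon.
function summarizeUsage(events: TokenUsageEvent[]) {
  const totals = { input: 0, output: 0 };
  const byCodon = new Map<string, number>();
  for (const e of events) {
    totals.input += e.inputTokens;
    totals.output += e.outputTokens;
    byCodon.set(e.codonId, (byCodon.get(e.codonId) ?? 0) + e.inputTokens + e.outputTokens);
  }
  return { totals, byCodon };
}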
The Complete Hank
Here is the full hank.json file, putting all the pieces together.
{
"meta": {
"name": "Data Codebook Generator",
"version": "1.0.0",
"description": "Generate documented schemas from CSV files"
},
"hank": [
{
"id": "observe",
"name": "Observe Data Structure",
"model": "haiku",
"continuationMode": "fresh",
"promptFile": "./prompts/observe.md",
"rigSetup": [
{
"type": "command",
"command": {
"run": "mkdir -p notes",
"workingDirectory": "project"
}
}
],
"checkpointedFiles": ["notes/**/*"]
},
{
"id": "generate",
"name": "Generate Zod Schemas",
"model": "sonnet",
"continuationMode": "fresh",
"promptFile": "./prompts/generate.md",
"rigSetup": [
{
"type": "copy",
"copy": {
"from": "./templates/typescript",
"to": "src"
}
},
{
"type": "command",
"command": {
"run": "bun install",
"workingDirectory": "lastCopied"
}
}
],
"checkpointedFiles": ["src/schemas/**/*.ts", "src/package.json"],
"sentinels": [
{ "sentinelConfig": "./sentinels/narrator.sentinel.json" },
{ "sentinelConfig": "./sentinels/cost-tracker.sentinel.json" }
]
},
{
"type": "loop",
"id": "validate",
"name": "Schema Validation Loop",
"description": "Iteratively fix schema issues",
"terminateOn": {
"type": "iterationLimit",
"limit": 3
},
"codons": [
{
"id": "fix-schemas",
"name": "Validate and Fix Schemas",
"model": "sonnet",
"continuationMode": "continue-previous",
"promptFile": "./prompts/validate.md",
"checkpointedFiles": ["src/schemas/**/*.ts"],
"rigSetup": [
{
"type": "command",
"command": {
"run": "bun run typecheck",
"workingDirectory": "src"
},
"allowFailure": true
}
],
"sentinels": [
{ "sentinelConfig": "./sentinels/narrator.sentinel.json" }
]
}
]
},
{
"id": "document",
"name": "Generate Documentation",
"model": "sonnet",
"continuationMode": "fresh",
"promptFile": "./prompts/document.md",
"rigSetup": [
{
"type": "command",
"command": {
"run": "mkdir -p docs",
"workingDirectory": "project"
}
}
],
"checkpointedFiles": ["docs/**/*"],
"outputFiles": [
{
"copy": ["src/schemas/**/*.ts", "docs/**/*"],
"beforeCopy": [
{
"type": "command",
"command": {
"run": "bun run typecheck",
"workingDirectory": "src"
}
}
]
}
],
"sentinels": [
{ "sentinelConfig": "./sentinels/narrator.sentinel.json" },
{ "sentinelConfig": "./sentinels/cost-tracker.sentinel.json" }
]
}
]
}
Run the Full Hank
Execute the complete workflow with a single command:
hankweave --data ./data
Watch the TUI as the hank progresses through each step:
- observe: Haiku analyzes the CSVs and writes notes (fast and cheap).
- generate: Sonnet creates schemas from the observations.
- validate: The loop runs `typecheck`, giving the agent up to 3 chances to fix issues.
- document: Sonnet writes documentation and copies the final, validated artifacts to the output directory.
All the while, your sentinels will run in parallel, generating progress summaries and cost reports.
What You've Learned
You just built a real-world hank from scratch. Here are the core patterns you used:
| Pattern | Where Used |
|---|---|
| Model selection by task | Haiku for observation, Sonnet for generation |
| File-based handoffs | Observations → Schema generation |
| Rig setup with templates | Copying the TypeScript project structure |
| Validation loops | Iterative schema fixing |
| `allowFailure` in loops | Typecheck might fail—that's expected |
| Sentinels for monitoring | Progress and cost tracking |
| Finalizing with `outputFiles` | Copying results to the output directory |
Debugging Tips
When things go wrong:
- Check the observation notes: Did the agent see your data correctly in the first step? The entire pipeline depends on this.
- Read the agent logs: You can find detailed JSONL logs for each agent interaction in the execution directory (e.g., `~/.hankweave-executions/{runId}/`).
- Inspect checkpoints: The TUI shows created checkpoints. You can use them to roll back to a known good state and try a different approach.
- Iterate on prompts: If the agent misunderstands a task, clarify the prompt with more specific instructions or better examples.
For example, to investigate why a codon failed, you can inspect its log file:
# Find the relevant execution directory and log file
cat ~/.hankweave-executions/{runId}/fix-schemas#1-sonnet.log
Look for error messages from tools or assistant reasoning that seems flawed. See Debugging for comprehensive troubleshooting strategies.
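Since the logs are JSONL, you can also skim them programmatically. A quick sketch using Bun (this just surfaces records that mention an error; the record fields are whatever your log actually contains):

// skim-log.ts (run with: bun skim-log.ts <path-to-log>)
const path = process.argv[2] ?? '';
const text = await Bun.file(path).text();

for (const line of text.split('\n').filter(Boolean)) {
  let record: unknown;
  try {
    record = JSON.parse(line); // one JSON object per line
  } catch {
    continue; // skip any non-JSON lines
  }
  if (JSON.stringify(record).toLowerCase().includes('error')) {
    console.log(record);
  }
}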
Adapting This Pattern
This tutorial built a data codebook, but the structure applies to many other problems. The same Observe → Generate → Validate → Document pipeline works for:
- Code generation: Observe specs → Generate code → Run tests → Document the API.
- Document processing: Extract text → Transform to structured data → Validate against a schema → Format for output.
- Automated testing: Generate test cases → Run them → Fix failing code → Report the results.
The underlying principles transfer to any multi-step AI workflow: start with cheap observation, use more capable models for generation, validate with controlled iteration, document with a clean context, and monitor everything with parallel sentinels.
Related Pages
- Codons — Understand each field in a codon.
- Loops — Learn about termination modes and constraints.
- Rigs — See all available setup operations and path handling options.
- Sentinels — Explore triggers, strategies, and structured output.
- Checkpoints — Understand how to roll back when things go wrong.
Next Steps
Now that you've built a working hank:
- Add more sophisticated sentinels that produce structured output or use conversational mode.
- Experiment with the `contextExceeded` termination condition for open-ended tasks.
- Apply these patterns to build hanks for your own use cases.
- Read Loops and Checkpoints to learn advanced control strategies.