Philosophy & When to Use Hankweave

Hankweave is a specialized tool—intentionally opinionated and narrow. Used correctly, it's extraordinary. Used incorrectly, it's worse than just talking to Claude directly. This guide helps you decide if Hankweave fits your needs and understand the thinking behind its design.

Who is this for? Anyone considering Hankweave, or anyone who wants to understand why it works the way it does. Read this before building complex Hanks—the mental models here will save you debugging time later.

Is Hankweave Right for You?

Task Horizon

Different tasks have different "horizons": the length of time, or the number of operations, needed to complete them productively.

| Horizon | Scope | Examples | Best Tool |
| --- | --- | --- | --- |
| Ultra-short | Single LLM call | Find a fact in a document | LLM directly |
| Short | ~10 calls, ~1 minute | Search the web, combine results | Coding agent |
| Medium | Hundreds of calls, ~20 minutes | Follow references, compile a report | Coding agent with steering |
| Long | Thousands of calls, hours to days | Understand a complex dataset, build a system | Hankweave |

Coding agents are incredible for ultra-short through medium horizon tasks—you can steer, intervene, and course-correct in real time. Hankweave is designed for tasks that extend into the long horizon, where human attention can't scale, where reliability across many runs matters, and where the work needs to survive beyond one session.

If the task fits comfortably in a single Claude Code session, you probably don't need Hankweave.

When Hankweave Fits

You've built something that works, and want it to keep working. You spent three hours with Claude Code and got something amazing—a data pipeline, a code review system, a document processor. It works! But when you try to run it again next week, it fails in new ways. Or it works differently. Or you can't remember exactly what you did. Hankweave gives you a place to freeze what worked so you can run it reliably.

You want to share AI capability across a team. The organization has a few people who can do incredible things with AI, but that ability is stuck in their heads. They can't hand it off. They can't take vacation without things breaking. Hankweave makes the "how" explicit and shareable.

You have complex workflows where reliability matters. You can see how AI could automate something valuable—data processing, code migration, document analysis—but agentic systems are often too brittle to put in production. The proof of concept works 80% of the time when you need 99%. Hankweave is infrastructure for that kind of reliability.

You're building AI workflows for others. As a consultant or contractor, you need to hand over something that works without you. Not a conversation transcript. Not a prayer. Something your client can run, maintain, and improve.

When Hankweave Doesn't Fit

You're exploring something for the first time. You don't know what the output looks like. You don't know what the intermediate steps are. You're charting new territory. Use a coding agent directly—that's what they're for. Hankweave comes after the exploration phase, not during it.

You need something once. You're going to run this one time, get your answer, and never think about it again. Just talk to Claude. Or GPT. Or Gemini.

You need constant steering. The work requires decisions every few minutes based on what the agent produces. It's highly interactive, more conversation than execution. Hankweave is designed for minimal intervention—it's infrastructure, not a conversation partner.

The task has too many judgment calls. A "judgment call" is a decision that's contested even among humans. Some tasks are dense with them—creative writing, strategic decisions, anything where "it depends" is the honest answer. For these, a human needs to be in the loop. Hankweave can surface judgment calls, but if every step is a judgment call, interactive work is better.

The Honest Tradeoff

Hanks are harder to write than just generating output. You can't just describe what you want and get a working Hank. The process involves working with a coding agent first, understanding what actually works, and then freezing that into structure. More upfront effort.

But Hanks are vastly easier to debug, maintain, and improve. When something breaks, you know where to look. When you learn something new, you know where to put it. When someone else needs to use it, they can.

Think of it like the difference between a quick script and a proper codebase. The script is faster to write; the codebase is faster to live with.

The Track Bike Analogy: Hankweave is a track bike, not a pickup truck. A track bike is terrible for getting groceries—uncomfortable, impractical, overkill. But on a track, it's extraordinary. If you need a pickup truck, get a pickup truck. Coding agents are great pickup trucks. If you need a track bike, we built one.

The Mental Models

These principles guide Hankweave's design. Understanding them helps you reason through problems and make good decisions.

Brownfield Over Greenfield

Here's something that took us a while to understand: models getting better only helps greenfield work, not brownfield.

Every improvement in AI—more intelligence, longer context, better tool use—makes it easier to build something new from scratch. Wonderful. But it doesn't help with what happens next.

The moment something becomes semi-successful, it becomes brownfield. People use it. They depend on it. It needs to keep working. It accumulates edge cases. It needs maintenance.

Greenfield is where you discover what works. Brownfield is where you make it last.

Most AI tooling focuses on greenfield—making it easier to start, making the first experience magical. That's valuable, but it's only half the story.

Hankweave focuses on brownfield. It's not optimized for the first run; it's optimized for the hundredth run. For the run where something breaks at 3 AM. For the run where a new team member needs to understand what's happening. For the run six months from now when you've forgotten everything.

The Amber and the Mosquito

Here's a metaphor that captures what Hankweave does: Hankweave is amber. Your working AI workflow is the mosquito.

The value of amber isn't the amber itself—it's what it preserves. The mosquito's DNA. Information that would otherwise be lost to time.

When you work with a coding agent, valuable behavior emerges—patterns that work, sequences of operations that solve your problem, prompts that get good results. But this behavior is ephemeral. It exists in the conversation, in the moment, and then it's gone.

You can't give it to someone else. You can't run it again with confidence. You can't systematically improve it.

Hankweave is how you trap the mosquito. You freeze the working behavior into structure. Now you can give it to other people, run it reliably, improve it systematically, and extract the DNA later when you need something similar.

This reframes when to use Hankweave: after you have something worth preserving. Not before. Not during the messy exploration phase. After.

Chesterton's Fence

When inheriting code—or a Hank—the instinct is to "clean it up." Make it elegant. Resist this instinct.

Those ugly parts encode knowledge. That weird workaround exists because of a bug that bit someone. That special case handles an edge case they learned about the hard way. That verbose prompt addresses a model behavior that caused problems before.

Don't remove something until you understand why it was put there.

A codon that seems unnecessarily complex might be handling an edge case. A rig that seems redundant might be preventing a failure mode. A prompt that seems verbose might be addressing a model behavior that bit someone before.

The goal isn't elegant Hanks. It's Hanks that work—and that accumulate wisdom over time.

Antibrittle: Channeling Variability

Intelligence is stochastic. Humans can't do the same thing exactly the same way twice—that's why sports exist. LLMs are the same. Run the same prompt twice and get different results.

This variability often shows up as brittleness: random failures, path-dependent bugs, confident mistakes.

The naive solution is to try to eliminate variability. Pin everything down. Be maximally prescriptive. Remove all degrees of freedom. This makes things worse. Over-constrained systems are fragile. They work only in the exact conditions they were designed for. Any deviation—a slightly different input, a model update, a changed environment—and they shatter.

The better approach is to build systems that channel variability productively—that benefit from randomness rather than being destroyed by it.

How Hankweave does this:

| Principle | How It Helps |
| --- | --- |
| Behavior, not calls | Success is measured at the codon level, not individual LLM calls. A few failures along the way are fine if the overall behavior is correct. |
| Trenches | Codon boundaries are hard walls. Problems in one codon don't leak into the next. Blast radius is contained. |
| Receipts | Everything is traceable. When something goes wrong, you can follow the chain back. Five nines of accountability instead of five nines of reliability. |
| Regions of freedom | Some codons are tight (parsing, validation). Others are loose (exploration, analysis). Match the constraint level to the task. |

Fresh Eyes by Default

Accumulated context is usually noise. Use `continuationMode: "fresh"` unless there's a specific reason for `continue-previous`.

When you use `continue-previous`, you're trusting that everything in the accumulated context is still relevant and helpful. Usually it isn't. The agent anchors on early decisions, irrelevant details pile up, and the context fills with exploration that no longer matters.

Reset state often, and clearly. Each codon should be a clean room. Pass information through files, not accumulated conversation.
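As a rough illustration, a two-codon sequence with fresh context at each step might look like the sketch below. The `continuationMode` values come from this page; every other field name (`codons`, `name`, `prompt`, `output`) is an assumption made for illustration, not the actual Hank schema.

```jsonc
// Hypothetical sketch only: apart from continuationMode, field names are assumptions.
{
  "codons": [
    {
      "name": "analyze",
      "prompt": "./prompts/analyze.md",
      "continuationMode": "fresh",     // clean room: no inherited conversation
      "output": "./work/analysis.md"   // hand off through a file, not accumulated context
    },
    {
      "name": "synthesize",
      "prompt": "./prompts/synthesize.md",
      "continuationMode": "fresh",     // fresh eyes again; reads ./work/analysis.md
      "output": "./work/report.md"
    }
  ]
}
```

The shape is the point: each codon starts clean and reads what it needs from files the previous codon wrote.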

Less Is More

This is the mantra: less is more maintainable.

When you generate a Hank, the instinct is to handle everything. Create a codon for every possible step. Add prompts covering every edge case. Be comprehensive.

This creates unmaintainable systems.

A 50-codon Hank with thousand-line prompts might work once. But when it breaks—and it will—you'll have no idea where to look. The complexity overwhelms any ability to debug or improve.

A 5-codon Hank with focused prompts is easier to understand, debug, and improve. When it breaks, you know where to look.

The tradeoff is real: simpler Hanks take more thought to design. You have to decide what's essential. You have to trust the model to handle details rather than specifying everything.

But simpler Hanks last longer. They're easier for others to understand. They're easier for you to understand six months from now.

When in doubt, do less.

The Development Loop: CCEPL

We call it CCEPL: Claude Code Eval Print Loop. (Works with any coding agent; the name stuck.)

The key insight: architects don't design Hanks, they discover them.

```text
Work with Agent → Observe What Worked → Distill Into Prompt → Freeze Into Codon → Run & Iterate
```

Phase 1: Exploration

Don't start by writing a Hank. Start by doing the task yourself, with a coding agent.

Open Claude Code (or Cursor, or whatever). Talk through the problem. Solve it. Don't worry about structure or "doing it right." Just get it working.

This is greenfield work. Explore. Make mistakes. Follow tangents.

You can't freeze something that doesn't exist yet. A Hank encodes a workflow that works. If you haven't discovered that workflow through actual use, you're guessing.

If possible, do the task two or three times. Each time you'll notice things—prompts that worked, information you had to provide, where the agent went wrong. These observations become the Hank's structure.

Phase 2: Observation

After getting something working, stop and look at what happened.

What did the agent need to know? Go through the conversation. Every time you provided information—context, constraints, preferences, clarifications—that's something the agent needed but didn't have.

80-90% of the time, when an agent goes wrong, it's because it didn't know one of your preferences and made a judgment call in a different direction. It used Python instead of TypeScript. It structured code differently. It chose a different library. These aren't failures of intelligence—they're failures of information.

What did the agent do that you could have given it? Did it write a setup script? You could provide one. Did it figure out an API? You could give it documentation. Anything the agent had to figure out is a source of variance.

Where are the natural breakpoints? If you were handing this off to someone else, where would you draw the line? "I'll do research, you implement." "I'll set up, you code." These handoff points are where codon boundaries probably should be.

Phase 3: Sketching the Skeleton

Before writing individual prompts, sketch the overall structure:

```text
# Code Review Hank - Structure

## Phases

1. **Analyze** (tight) - Categorize files, identify entry points
2. **Deep Review** (loose) - Freedom to explore, follow interesting threads
3. **Synthesize** (tight) - Generate structured report following template
4. **Refine Loop** (2-3 iterations) - Fresh eyes each time

## Key Decisions

- Use Opus for synthesis (complex reasoning)
- Use Sonnet for analysis (cheaper, good enough)
- Fresh context everywhere except within refine loop iterations
```

This forces thinking about structure before details. Identifies where tight vs. loose constraints are needed. Reveals dependencies between codons.

Phase 4: Freezing

Create the codon. Start minimal—one codon, one prompt, basic rig. See if it works.

The temptation is to anticipate everything. Resist this. You don't know what will actually break until you run it.
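A first freeze can be as small as a single codon pointing at the prompt you distilled and the setup that worked by hand during exploration. This is a hypothetical sketch: apart from `continuationMode`, the field names are illustrative assumptions rather than the real schema.

```jsonc
// Hypothetical minimal hank.json: illustrative field names, not the real schema.
{
  "codons": [
    {
      "name": "do-the-thing",
      "prompt": "./prompts/do-the-thing.md",   // the prompt distilled from exploration
      "rig": "./rigs/setup.sh",                // the basic setup you did by hand
      "continuationMode": "fresh"
    }
  ]
}
```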

Validate first:

```text
hankweave --validate ./hank.json ./your-data
```

This catches missing files, schema errors, invalid models, broken rigs—before you spend tokens.

Phase 5: Iteration

It probably won't work perfectly the first time.

When it fails, diagnose:

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Rig errors | Setup didn't work | Improve the rig |
| Agent confusion | Codon doing too much | Split into multiple codons |
| Wrong output | Prompt unclear | Revise the prompt |
| Different from expected | Missing preferences | Add them explicitly |

The Control-C point: When you run the codon and watch it execute, you'll eventually want to stop it. As you iterate, that point should get further away. First run: 30 seconds. Second run: 2 minutes. Third run: 5 minutes. When you can let it complete without intervention, the codon is frozen enough.

What This Means in Practice

Think in Problems, Not TODOs

A sequential TODO list linearizes tasks that are interdependent. "1. Find data source, 2. Write script, 3. Load data, 4. Analyze, 5. Output" looks clean but is fragile—what happens when step 4 fails because the data source from step 1 returns a different format than step 2 expected?

Instead, track problems: "Where does data come from? How do we get it usable? What analysis? What output does the user need?" This makes the stochastic nature of progress clearer.

Estimate Codon Weight

A codon is too heavy if it bundles three or more distinct tasks. Some examples of heavy codons and how to split them:

| Too heavy | Split into |
| --- | --- |
| "Parse doc, find stubs, read source, analyze, write notes" | Extract → Research → Synthesize |
| "Read codebase and generate docs" | Analyze structure → Generate docs |
| "Review and fix all issues" | Review (find issues) → Fix (address them) |

Prep codons are often worth it. A codon that extracts/lists/maps makes later codons simpler and more focused.
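In config terms, splitting the first row of the table might look roughly like the sketch below: three small codons chained through files rather than one heavy codon. As before, the field names are assumptions used for illustration, not the actual schema.

```jsonc
// Hypothetical sketch of the Extract → Research → Synthesize split (illustrative field names).
{
  "codons": [
    { "name": "extract",    "prompt": "./prompts/extract.md",    "output": "./work/stubs.md" },
    { "name": "research",   "prompt": "./prompts/research.md",   "output": "./work/findings.md" },
    { "name": "synthesize", "prompt": "./prompts/synthesize.md", "output": "./work/notes.md" }
  ]
}
```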

The "Just Do It First" Pattern

When someone asks you to build a Hank for a relatively simple task, the best response is often: "Let me just do this for you first, then we can freeze it into a Hank."

  1. Do the task directly, as a coding agent normally would
  2. See the process, the decisions, the outputs
  3. At the end, if it worked well, freeze it into a Hank
  4. The Hank is based on real experience, not speculation

This catches cases where a one-off task doesn't need Hank complexity.
