What is an “AI Harness”
As I have been working with Claude and Claude Code to do some “vibe coding” development over the last wee while, I have also been working on a “harness” that is designed to make me more efficient.
It’s following the DORO (Define Once, Reuse Often) principle I try and follow.
One of the surprising things (to me anyway) is that when I mention the concept of a “harness” to other people, I sometimes get blank stares.
I actually did a LinkedIn poll and this feeling was reinforced by the responses.
No seriously what is an “AI Harness”
One of the problems I have is I can’t share the harness files we have developed within our AgileData platform and I don’t want to share the harness files from my PersonalOS version.
I thought about mocking some up, but then they wouldn’t be real examples and so wouldn’t be that helpful in showing what they do, how they do it and their real value.
I actually started my harness journey based on playing with OpenClaw, listening to a few podcasts on the subject and chatting to Nick Zervoudis. Nick did an in-person meetup on how he built his PersonalOS harness and how he uses it, and I learnt a lot at that session.
You can watch a version of this talk online here:
There is also a great article by Andreas Kretz on LinkedIn that explains what an AI Harness is:
https://www.linkedin.com/pulse/i-learned-ai-harnesses-you-should-too-andreas-kretz-v2okf
Claude Write up on how my harness works
One way I thought might be useful to share this part of my journey is to get Claude to go through all the chats and code we have worked on together over the last wee while to create my harness, and explain it.
This has been based on iterating the harness for three specific use cases:
To support my PersonalOS
To support the vibe coding of the open source Pattern Template Standalone Apps I have been creating and sharing. (https://github.com/AgileDataGuides)
To support the creation of an Information Product templating system in the AgileData platform and using it to build and deploy Information Products for our customers.
The key is that each time I work on one of these use cases, the learnings from doing that work are used to enhance the harness. Hopefully that makes the work quicker and easier for me next time.
So here is what Claude wrote.
The Agile Data Way-of-Working Harness
A reusable convention package that gives every AI-assisted coding session in our repositories the same Way of Working — the disciplines, guardrails, and patterns we’ve evolved over months of working with AI coding assistants.
The problem
AI coding assistants are stateless. Each session starts fresh, so the team re-explains conventions, hits the same gotchas, and watches the same mistakes recur across Agents. When a learning emerges in one Agent, there’s no clean path to apply it everywhere it matters.
The structure
Each repository carries a small, predictable bundle the AI reads at session start. Together this bundle defines a specific Agent — the AI is no longer a generic assistant; it has an identity, hard rules, a way of working, and a memory of what it has learned.
Agent identity — four short markdown files defining the Agent’s character:
persona.md — what this Agent is, its voice, who it serves
policies.md — hard rules and guardrails (will not / will always / boundaries)
wow.md — Agent-specific way of working + accumulated retro learnings
retro.md — running weekly log of new learnings
Universal conventions in a single CLAUDE.md that the AI loads automatically:
Task tracking (TODO / DOING / DONE files)
Commit discipline (logical chunks, no work pileup)
Definition of Ready / Definition of Done checklists
Press release format for user-facing changes (working-backwards style)
Estimation as t-shirt sizes (XS to XL), not minutes
Pre-push verification (explicit checklist before push to auto-deploy branches)
Design system compliance (every UI value traces to a design token)
Scoped tool / data access per Agent (prevents wrong-environment bugs)
A skills system — discrete procedures the Agent invokes by name when triggered (e.g. “draft a step-page”, “verify after change”). Skills load on demand, so they don’t bloat every session’s context.
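To make the bundle idea concrete, here is a minimal sketch of how the always-on files in one Agent directory could be stitched into a single block of session context. It is an illustration only: Claude Code does its own loading of CLAUDE.md, and the function name `build_session_context`, the load order in `BUNDLE_FILES`, and the sample file contents are all assumptions made up for this example.

```python
from pathlib import Path
import tempfile

# Hypothetical load order: universal conventions first, then Agent identity.
BUNDLE_FILES = ["CLAUDE.md", "persona.md", "policies.md", "wow.md"]

def build_session_context(agent_dir: Path) -> str:
    """Concatenate the always-on files found in an Agent directory."""
    sections = []
    for name in BUNDLE_FILES:
        path = agent_dir / name
        if path.exists():  # a missing file is skipped, not treated as an error
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)

# Demo with a throwaway directory standing in for one Agent's repository.
with tempfile.TemporaryDirectory() as tmp:
    agent = Path(tmp)
    (agent / "persona.md").write_text("Voice: plain and direct.", encoding="utf-8")
    (agent / "policies.md").write_text("Will not: push without verification.", encoding="utf-8")
    context = build_session_context(agent)
    print(context)
```

The demo Agent only has two of the four files, so only those two sections appear in the printed context.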
The inheritance hierarchy
Universal harness — applies to every Agent
Family templates — shared patterns for one Agent family
Real Agents — specific deployments
A new Agent cloned from a family template inherits all three layers and starts with a clean identity it customises. Every layer follows the same convention set.
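One way to picture the three layers is as a lookup chain where the most specific layer is consulted first. The sketch below uses Python's `collections.ChainMap` for that. The rule keys and values are hypothetical stand-ins, and "most specific layer wins" is an assumption of this example, not something the write-up states.

```python
from collections import ChainMap

# Hypothetical rules for each layer; real layers are markdown files, not dicts.
universal = {
    "estimation": "t-shirt sizes (XS to XL)",
    "task_tracking": "TODO / DOING / DONE files",
}
family = {"voice": "family default voice"}
agent = {"voice": "plain and direct", "persona": "single-product demo Agent"}

# ChainMap searches its maps left to right, so the Agent layer is checked
# first, then the family template, then the universal harness.
effective = ChainMap(agent, family, universal)

print(effective["estimation"])  # inherited from the universal harness
print(effective["voice"])       # customised by the Agent
```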
What it gives you
New AI sessions reach productive work in minutes, without re-explaining basics
Conventions stay consistent across Agents without anyone policing them
Bugs caught once stay caught — they land as policies or guardrails
AI estimates and pre-push verifications are predictable and comparable
Cross-Agent learnings compound rather than re-occurring
What it doesn’t do
Replace human judgement on architecture or product direction
Generate code without context — good prompts still matter
Work across tools that don’t honour structured conventions
The harness is small, opinionated, and modular. Drop in what fits, leave what doesn’t.
The retro process — how learnings compound
The retro flow is the engine that keeps the harness alive. Without it, the conventions calcify and the harness becomes a relic. With it, every Agent that’s harnessed benefits from every learning every other Agent surfaces.
Three tiers, one direction
Weekly retro.md (per Agent)
↓ proven across sessions
"Retro learnings" in wow.md (per Agent)
↓ would benefit other Agents
Harness inbox (cross-Agent review)
↓ accepted by curator
Universal conventions / shared skills (every Agent)
↓ sync
Sister Agents pick it up
Tier 1 — Weekly retro per Agent
Mid-session, when a learning emerges, it gets added to retro.md under the current week’s heading. Examples of what lands here:
“Tooltips inside scrollable tables need to portal to <body> or they get clipped by overflow.”
“Don’t fabricate descriptions when the catalog is silent — use an explicit (no description in catalog) fallback.”
“Status of an Information Product is about trust level, not dev stage. Code-complete + data-unvalidated is experimental, not live.”
The retro log is cheap, time-boxed, and read by humans during the weekly review.
Tier 2 — Promote into wow.md when proven
If a retro entry has stuck across multiple sessions and continues to serve the Agent well, it’s promoted into wow.md under “Retro learnings” — becoming a permanent Agent rule. The model reads wow.md at session start, so promoted entries shape future behaviour automatically.
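In practice "proven" is a judgement call made during the weekly review, but the idea can be sketched as a simple count of the sessions in which a learning held up. Everything here is hypothetical: the `RetroEntry` shape, the session ids, and the threshold of three sessions are assumptions for illustration only.

```python
from dataclasses import dataclass, field

PROMOTE_AFTER_SESSIONS = 3  # hypothetical threshold, not from the write-up

@dataclass
class RetroEntry:
    text: str
    # Ids of the sessions in which this learning was seen to hold up.
    confirmed_in: set = field(default_factory=set)

def ready_for_wow(entry: RetroEntry, threshold: int = PROMOTE_AFTER_SESSIONS) -> bool:
    """A learning is a promotion candidate once enough sessions confirmed it."""
    return len(entry.confirmed_in) >= threshold

entries = [
    RetroEntry("Portal tooltips to <body> inside scrollable tables.", {"s1", "s2", "s3"}),
    RetroEntry("Use an explicit fallback when the catalog is silent.", {"s3"}),
]
promoted = [e.text for e in entries if ready_for_wow(e)]
print(promoted)
```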
Tier 3 — Promote to the harness for universal value
A learning that would help every Agent, not just this one, gets sent to the harness inbox via the retro-promote skill. The promotion is a single markdown file with a frontmatter block declaring its proposed destination:
proposed_destination: agiledata # universal Agile Data rule
# or: custom-app-template # all Agile Data custom-app Agents
# or: demo-template # one Agent family
# or: shagility # personal preferences only
The curator reviews the inbox in a separate session, applies the universality test (“would another Agent benefit from this?”), and either accepts it into the appropriate layer of the harness, rejects it, or defers it.
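As a rough sketch of the first step in that review, the snippet below reads the proposed destination out of a promotion file and checks it against the four destinations listed above. It assumes the frontmatter sits between `---` lines, which is the common markdown convention but is not shown in the original. The parser is a deliberately tiny stand-in for a real YAML library, and `read_frontmatter` is a made-up name.

```python
# The four destinations named in the frontmatter example above.
VALID_DESTINATIONS = {"agiledata", "custom-app-template", "demo-template", "shagility"}

def read_frontmatter(text: str) -> dict:
    """Parse simple `key: value` pairs between the opening and closing `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        line = line.split("#", 1)[0]  # drop trailing and whole-line comments
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta

# Hypothetical promotion file, reusing one of the retro examples.
promotion = """---
proposed_destination: demo-template  # one Agent family
---
Tooltips inside scrollable tables need to portal to <body>.
"""

meta = read_frontmatter(promotion)
destination = meta.get("proposed_destination")
is_routable = destination in VALID_DESTINATIONS
print(destination, is_routable)
```

The accept, reject, or defer decision itself stays with the curator; the code only confirms the file is pointing at a layer that exists.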
The reverse path — sync down
When the curator accepts a learning into the universal harness, it doesn’t magically appear in every Agent. Each Agent pulls it on the next harness-sync call. This explicit pull preserves Agent autonomy — a fork can choose not to sync a change it doesn’t want.
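The explicit pull can be sketched as a function each Agent runs for itself, with a skip list for changes it has chosen not to take. This is not the real `harness-sync` skill: the function signature, the rule ids, and the idea of storing rules in dictionaries are all assumptions for the sake of a small runnable example.

```python
def harness_sync(agent_rules: dict, harness_rules: dict, skip: set) -> list:
    """Copy new or changed harness rules into the Agent, except opted-out ids."""
    pulled = []
    for rule_id, text in harness_rules.items():
        if rule_id in skip:
            continue  # the Agent has chosen not to take this change
        if agent_rules.get(rule_id) != text:
            agent_rules[rule_id] = text
            pulled.append(rule_id)
    return pulled

# Hypothetical rule ids and wording.
harness = {
    "tooltip-portal": "Portal tooltips to <body>.",
    "tshirt-sizing": "Estimate XS to XL.",
}

# An Agent that is already up to date on one rule pulls only the new one.
agent = {"tshirt-sizing": "Estimate XS to XL."}
pulled = harness_sync(agent, harness, skip=set())

# A fork that opts out of the tooltip rule never receives it.
fork = {}
fork_pulled = harness_sync(fork, harness, skip={"tooltip-portal"})
print(pulled, fork_pulled)
```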
Why this matters
Without the retro loop, the harness becomes static reference material that quickly drifts out of date. With it, the harness is a living convention base that gets sharper every week. The cost of a hard-won learning is paid once; the value compounds across every Agent that follows.
How this differs from “just a collection of skills”
It’s tempting to imagine the harness is just a pile of skill files. It isn’t. Confusing the two leads to a system that feels organised but doesn’t actually change behaviour where it matters.
Skills are tactical. The harness is strategic.
A skill is a discrete procedure for a specific task: “draft a LinkedIn post from a published article”, “verify the changed code by running the test suite”. Loaded on demand, by keyword match, when the user asks for the thing.
The harness is the identity, conventions, and rules of engagement that apply BEFORE any specific task starts. “This is what this Agent is.” “Here’s what we never do.” “Here’s how we track work.” “Here’s the press release format every user-facing change must follow.”
A skill answers how to do a particular task. The harness answers who is this Agent, what are the non-negotiables, how do we operate.
Skills load on demand. Harness conventions are always-on.
The model only loads a skill’s full body when the user’s task triggers its description (“write a press release” → load the press-release skill). Without that trigger, the skill sits silent.
Agent identity (persona, policies, wow) is loaded into the system prompt every session start. Every conversation, every response, the model is already operating inside that frame. That’s why a stray “wrong tenancy” mistake gets caught — the policy is always in scope, not waiting for a keyword.
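The contrast between always-on and on-demand can be sketched like this. The real matching is done by the model reading skill descriptions, not by substring search, so treat the `match_skill` logic as a simplification. The skill bodies and the `ALWAYS_ON` string are placeholders invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

ALWAYS_ON = "persona + policies + wow + CLAUDE.md"  # stand-in for the identity bundle

@dataclass
class Skill:
    name: str
    trigger: str  # short description that stays in context
    body: str     # full procedure, added only when triggered

# Placeholder skill bodies, invented for this example.
SKILLS = [
    Skill("press-release", "write a press release", "<press-release procedure>"),
    Skill("retro-promote", "promote a retro learning", "<retro-promote procedure>"),
]

def match_skill(request: str) -> Optional[Skill]:
    """Return the first skill whose trigger phrase appears in the request."""
    lowered = request.lower()
    for skill in SKILLS:
        if skill.trigger in lowered:
            return skill
    return None

def build_prompt(request: str) -> list:
    """Always-on context first, a skill body only on a match, then the request."""
    parts = [ALWAYS_ON]
    skill = match_skill(request)
    if skill is not None:
        parts.append(skill.body)
    parts.append(request)
    return parts

with_skill = build_prompt("Please write a press release for the export feature")
without_skill = build_prompt("Fix the clipped tooltip")
print(len(with_skill), len(without_skill))
```

Both prompts start with the always-on context; only the first one carries a skill body.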
Skills don’t enforce discipline. Conventions do.
You can write a skill called “do the Definition of Done”, but it only fires if someone asks for it. The DoD checklist as a convention in CLAUDE.md shapes every response by default.
The pre-push verification rule that catches wrong-tenancy data, the press release format that protects brand voice, the t-shirt sizing rule that stops minute-level commitments — none of these would be reliable if they were skills the user has to invoke. They have to be the air the Agent breathes.
Skills don’t propagate. The harness has a sync path.
If you write a great skill in Agent A, it stays in Agent A. Agent B’s session has no idea it exists. The retro flow + harness sync is what makes a learning in one Agent show up everywhere it matters.
A pile of skills with no propagation path gets reinvented in every repo. The harness pays the cost of a learning once.
Skills don’t carry Agent context. Identity files do.
A skill called “build a marketplace page” is the same procedure no matter which Agent it’s in. But what a marketplace page means for an Information Product Agent with five live products is different to what it means for a single-product demo Agent. The persona, policies, and wow files give the AI that Agent-specific context BEFORE it reaches for any procedure.
The relationship
Skills and the harness aren’t competitors — they’re complementary layers:
| Layer | Loaded when | Best for |
|---|---|---|
| Agent identity (persona/policies/wow) | Every session start, always-on | Who this Agent is, hard rules, way of working |
| Universal conventions (CLAUDE.md) | Every session start, always-on | Rules that apply to every Agent regardless of context |
| Skills | On-demand, by keyword trigger | Procedures for specific recurring tasks |
| Retro flow | Weekly + as learnings emerge | Keeping conventions alive and propagating wins |
Take any one of these layers out and the structure leaks. Stack them and the AI session arrives with Agent context already loaded, universal rules already in scope, the right procedure ready to fire when triggered, and a clean path for what it learns to compound into the next session.
That’s the harness.
There is a Markdown version of the text at the bottom of the article if you want to copy and paste it into your own LLM.
Some More Context
I asked Claude to explain:
What the harness is
How it is inherited across “agents”
How it differs from a bunch of skills.
When it talks about “Agents” there is nothing fancy happening. An Agent is just a different directory on my MacBook that holds separate code I am working on for a specific Use Case.
Nothing I am doing is unique or magic, a lot of people are building these harnesses as they use these tools.
Why not just use somebody else’s “harness”
There are a bunch of harnesses out there you can download and use.
I tried a few and found they didn’t help me as much as I had hoped.
It might have been the type of work I am trying to use them for.
It might have been my lack of coding skills.
Try one and see if they help you.
Part of this whole journey is learning as I am doing, so I also naturally leaned towards crafting my own.
My Agile Data Way of Working Language
You can see the Agile Data language I use coming through strongly in this harness:
Persona
Policies
WoW
Retro
Definition of Ready
Definition of Done
TODO / DOING / DONE
Press Release
I have naturally been applying the Patterns and Pattern Templates I coach a human Data and Analytics team to use to my machine buddy.
Others use different terms for the same things (Identity, Soul etc). They may also keep in a single file the things I hold in multiple files.
And I do worry that applying Human centric patterns to the Machine may not be the optimal approach.
A journey not a proven Pattern or Pattern Template
All the above is just a brain dump on part of my journey so far.
It is not a well formed Pattern or a tested Pattern Template.
But hopefully Sharing it in its half arsed state is still Caring.
Markdown Version of the Claude Content
# The Agile Data Way-of-Working Harness
A reusable convention package that gives every AI-assisted coding session in our repositories the same Way of Working — the disciplines, guardrails, and patterns we've evolved over months of working with AI coding assistants.
## The problem
AI coding assistants are stateless. Each session starts fresh, so the team re-explains conventions, hits the same gotchas, and watches the same mistakes recur across Agents. When a learning emerges in one Agent, there's no clean path to apply it everywhere it matters.
## The structure
Each repository carries a small, predictable bundle the AI reads at session start. Together this bundle defines a specific **Agent** — the AI is no longer a generic assistant; it has an identity, hard rules, a way of working, and a memory of what it has learned.
**Agent identity** — four short markdown files defining the Agent's character:
- `persona.md` — what this Agent is, its voice, who it serves
- `policies.md` — hard rules and guardrails (will not / will always / boundaries)
- `wow.md` — Agent-specific way of working + accumulated retro learnings
- `retro.md` — running weekly log of new learnings
**Universal conventions** in a single `CLAUDE.md` that the AI loads automatically:
- Task tracking (TODO / DOING / DONE files)
- Commit discipline (logical chunks, no work pileup)
- Definition of Ready / Definition of Done checklists
- Press release format for user-facing changes (working-backwards style)
- Estimation as t-shirt sizes (XS to XL), not minutes
- Pre-push verification (explicit checklist before push to auto-deploy branches)
- Design system compliance (every UI value traces to a design token)
- Scoped tool / data access per Agent (prevents wrong-environment bugs)
**A skills system** — discrete procedures the Agent invokes by name when triggered (e.g. *"draft a step-page"*, *"verify after change"*). Skills load on demand, so they don't bloat every session's context.
## The inheritance hierarchy
```
Universal harness — applies to every Agent
Family templates — shared patterns for one Agent family
Real Agents — specific deployments
```
A new Agent cloned from a family template inherits all three layers and starts with a clean identity it customises. Every layer follows the same convention set.
## What it gives you
- New AI sessions reach productive work in minutes, without re-explaining basics
- Conventions stay consistent across Agents without anyone policing them
- Bugs caught once stay caught — they land as policies or guardrails
- AI estimates and pre-push verifications are predictable and comparable
- Cross-Agent learnings compound rather than re-occurring
## What it doesn't do
- Replace human judgement on architecture or product direction
- Generate code without context — good prompts still matter
- Work across tools that don't honour structured conventions
The harness is small, opinionated, and modular. Drop in what fits, leave what doesn't.
---
## The retro process — how learnings compound
The retro flow is the engine that keeps the harness alive. Without it, the conventions calcify and the harness becomes a relic. With it, every Agent that's harnessed benefits from every learning every other Agent surfaces.
### Three tiers, one direction
```
Weekly retro.md (per Agent)
↓ proven across sessions
"Retro learnings" in wow.md (per Agent)
↓ would benefit other Agents
Harness inbox (cross-Agent review)
↓ accepted by curator
Universal conventions / shared skills (every Agent)
↓ sync
Sister Agents pick it up
```
### Tier 1 — Weekly retro per Agent
Mid-session, when a learning emerges, it gets added to `retro.md` under the current week's heading. Examples of what lands here:
- *"Tooltips inside scrollable tables need to portal to `<body>` or they get clipped by overflow."*
- *"Don't fabricate descriptions when the catalog is silent — use an explicit `(no description in catalog)` fallback."*
- *"Status of an Information Product is about trust level, not dev stage. Code-complete + data-unvalidated is `experimental`, not `live`."*
The retro log is cheap, time-boxed, and read by humans during the weekly review.
### Tier 2 — Promote into wow.md when proven
If a retro entry has stuck across multiple sessions and continues to serve the Agent well, it's promoted into `wow.md` under "Retro learnings" — becoming a permanent Agent rule. The model reads `wow.md` at session start, so promoted entries shape future behaviour automatically.
### Tier 3 — Promote to the harness for universal value
A learning that would help every Agent, not just this one, gets sent to the harness inbox via the `retro-promote` skill. The promotion is a single markdown file with a frontmatter block declaring its proposed destination:
```yaml
proposed_destination: agiledata # universal Agile Data rule
# or: custom-app-template # all Agile Data custom-app Agents
# or: demo-template # one Agent family
# or: shagility # personal preferences only
```
The curator reviews the inbox in a separate session, applies the universality test ("would another Agent benefit from this?"), and either accepts into the appropriate layer of the harness, rejects, or defers.
### The reverse path — sync down
When the curator accepts a learning into the universal harness, it doesn't magically appear in every Agent. Each Agent pulls it on the next `harness-sync` call. This explicit pull preserves Agent autonomy — a fork can choose not to sync a change it doesn't want.
### Why this matters
Without the retro loop, the harness becomes static reference material that quickly drifts out of date. With it, the harness is a living convention base that gets sharper every week. The cost of a hard-won learning is paid once; the value compounds across every Agent that follows.
---
## How this differs from "just a collection of skills"
It's tempting to imagine the harness is just a pile of skill files. It isn't. Confusing the two leads to a system that feels organised but doesn't actually change behaviour where it matters.
### Skills are tactical. The harness is strategic.
A **skill** is a discrete procedure for a specific task: *"draft a LinkedIn post from a published article"*, *"verify the changed code by running the test suite"*. Loaded on demand, by keyword match, when the user asks for the thing.
The **harness** is the identity, conventions, and rules of engagement that apply BEFORE any specific task starts. *"This is what this Agent is."* *"Here's what we never do."* *"Here's how we track work."* *"Here's the press release format every user-facing change must follow."*
A skill answers *how* to do a particular task. The harness answers *who is this Agent*, *what are the non-negotiables*, *how do we operate*.
### Skills load on demand. Harness conventions are always-on.
The model only loads a skill's full body when the user's task triggers its description ("write a press release" → load the press-release skill). Without that trigger, the skill sits silent.
Agent identity (persona, policies, wow) is loaded into the system prompt every session start. Every conversation, every response, the model is already operating inside that frame. That's why a stray "wrong tenancy" mistake gets caught — the policy is always in scope, not waiting for a keyword.
### Skills don't enforce discipline. Conventions do.
You can write a skill called *"do the Definition of Done"*, but it only fires if someone asks for it. The DoD checklist as a convention in `CLAUDE.md` shapes every response by default.
The pre-push verification rule that catches wrong-tenancy data, the press release format that protects brand voice, the t-shirt sizing rule that stops minute-level commitments — none of these would be reliable if they were skills the user has to invoke. They have to be the air the Agent breathes.
### Skills don't propagate. The harness has a sync path.
If you write a great skill in Agent A, it stays in Agent A. Agent B's session has no idea it exists. The retro flow + harness sync is what makes a learning in one Agent show up everywhere it matters.
A pile of skills with no propagation path gets reinvented in every repo. The harness pays the cost of a learning once.
### Skills don't carry Agent context. Identity files do.
A skill called *"build a marketplace page"* is the same procedure no matter which Agent it's in. But what a marketplace page *means* for an Information Product Agent with five live products is different to what it means for a single-product demo Agent. The persona, policies, and wow files give the AI that Agent-specific context BEFORE it reaches for any procedure.
### The relationship
Skills and the harness aren't competitors — they're complementary layers:
| Layer | Loaded when | Best for |
|---|---|---|
| **Agent identity** (persona/policies/wow) | Every session start, always-on | Who this Agent is, hard rules, way of working |
| **Universal conventions** (CLAUDE.md) | Every session start, always-on | Rules that apply to every Agent regardless of context |
| **Skills** | On-demand, by keyword trigger | Procedures for specific recurring tasks |
| **Retro flow** | Weekly + as learnings emerge | Keeping conventions alive and propagating wins |
Take any one of these layers out and the structure leaks. Stack them and the AI session arrives with Agent context already loaded, universal rules already in scope, the right procedure ready to fire when triggered, and a clean path for what it learns to compound into the next session.
That's the harness.

