What is an Agent Harness, You Ask?

Everyone has all the AI models now. What separates the kids from the adults isn't the model, it's the "harness", the hottest and least-understood word in enterprise AI. Here it is in plain English.

We've put AI on every desk, and people use it to search, write documents and re-write emails so they sound smarter and less condescending.

"Are we missing something?" a Transformation Lead asked me, thirty floors up in a glass-walled room in Sydney's CBD. Her company had purchased Microsoft Copilot licences for the whole division. On paper, one of the most complete rollouts I'd seen all year. In reality, a very expensive ultra-intelligent tool being used for monkey work.

She was missing something, and so are most people I meet. The model they had access to wasn't the problem; it's extraordinary, and quite frankly, a bit wasted on that work. What's missing is the thing that can harness the raw intelligence of frontier models to make money.

From Prompt Engineering to Harness Engineering

Back in 2022 we learned prompt engineering, optimising what you ask the model, and for a while this became a whole career path. Then in 2025 context engineering emerged: curating what the model can see, which gave LLMs access to more data sources.

In early 2026 the frontier moved again, to harness engineering: building the whole environment the model operates in. The term was coined by Mitchell Hashimoto and formalised by OpenAI.

[Figure: the three stages of LLM scaffolding. From a single prompt, to context, to a full harness: each time the model got smarter, the scaffolding around it mattered more, not less.]

Notice the pattern: every time models got smarter, people assumed the scaffolding would matter less, and every time the opposite happened. A more capable brain can do far more, but only if it is permitted to act, properly trained, supervised by a human in the loop, and auditable.

The definition comes from LangChain's Harrison Chase: an agent is a model plus its harness. "If you're not the model, you're the harness."

This reframes the whole debate. The model is a commodity: you, your competitor and the start-up nipping at you all have the same Claude, GPT and Gemini, so nobody wins there.

"The harness is the differentiator, where the real engineering and the real competitive advantage lives." — LangChain

The Anatomy of an Agent Harness

Applying a human body metaphor, the harness has six distinct parts:

Brain: The brain is the LLM itself, reached through an API. It does the thinking and reasoning, gives direction and manages sub-agents. Specific models can be selected based on the nature of the request to balance cost with intelligence.

Soul: The soul is its personality, its conscience and its guardrails: what it may touch, what it may do, and a human sign-off before anything irreversible.

Memory: The memory holds everything about the organisation and the people it communicates with, what it has learned and the facts it needs to remember. It should have a way to tell whether a fact is confirmed or uncertain, which helps reduce hallucination and confabulation.

Hands: The hands are its tools and connectivity, e.g. letting it actually click within a browser, send an email, update your CRM, query your data warehouse or talk to any system via API.

Heart: The heart sets the rhythm of the harness. It runs the cron jobs, scheduled tasks and task triggers that let work continue without the user prompting it. This is the money maker: the golden goose that keeps laying eggs.

Mouth: The mouth is how it communicates: in Slack, Teams, Google Chat, WhatsApp, voice or email. The same intelligence and context are available on whichever surface the user prefers.
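To make the Soul-and-Hands relationship a little more concrete, here is a minimal Python sketch of a guardrail sitting in front of the tools. Everything in it is an assumption for illustration: the `Harness` and `Tool` classes, the `approve` callback and the tool names are hypothetical, not taken from any vendor's SDK, and the tools are stand-ins that touch no real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Tool:
    """One of the harness's 'hands': a named action the agent can request."""
    name: str
    run: Callable[[str], str]
    irreversible: bool = False  # e.g. sending an email, as opposed to reading a record


@dataclass
class Harness:
    """Hypothetical fragment of a harness: the 'soul' decides which 'hands' may act."""
    approve: Callable[[str, str], bool]  # human sign-off hook (assumed to return True or False)
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def act(self, tool_name: str, argument: str) -> str:
        tool = self.tools.get(tool_name)
        if tool is None:
            # Anything not explicitly registered is out of bounds.
            return f"refused: {tool_name} is not an allowed tool"
        if tool.irreversible and not self.approve(tool_name, argument):
            # Irreversible actions wait for a human to say yes.
            return f"blocked: {tool_name} needs human approval"
        return tool.run(argument)


# Stand-in tools; a real harness would call a CRM or mail API here.
harness = Harness(approve=lambda name, arg: False)  # this human says no to everything
harness.register(Tool("lookup_customer", lambda arg: f"record for {arg}"))
harness.register(Tool("send_email", lambda arg: f"sent: {arg}", irreversible=True))

print(harness.act("lookup_customer", "Acme"))   # record for Acme
print(harness.act("send_email", "Hello"))       # blocked: send_email needs human approval
print(harness.act("delete_database", "prod"))   # refused: delete_database is not an allowed tool
```

The point of the sketch is where the decision lives: the model can ask for anything, but the allow-list and the sign-off sit in the harness, outside the model.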

[Figure: the autonomous agent model. A central intelligence wrapped in memory, soul, mouth, heart and hands.]

“The model decides what to do; the harness makes it possible.”

If you'd rather see it than read it, two short videos do the job. First, an eight-minute explainer of the prompt-to-context-to-harness shift:

And a demo that makes an ancient GPT-3.5 work reliably just by adding a harness:

What an Agent Harness looks like on your stack

The good news is you don't hand-build all of it. You choose where to sit between fully managed and build-your-own. Here are the major platforms:

Salesforce's Agentforce, anchored on your CRM and Data 360 data, with the AgentExchange Marketplace in support.

Microsoft's AI Foundry, the harness where code, data, identity and policy meet, publishing straight into Teams and Microsoft 365.

Google's Gemini Enterprise Agent Platform, with grounding through AI Search and BigQuery.

AWS' Amazon Bedrock, keeping everything inside walls you've already built.

Pick the one whose gravity your business already sits in.

[Figure: agent self-verification within a run. A good harness lets the agent check its own work: build a solution, verify it against the spec and run tests, refine from errors, and iterate until correct.]
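The build, verify, refine, iterate loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_until_correct`, `fake_model` and `verify_against_spec` are hypothetical names, the "model" is a stand-in function rather than a real LLM call, and the spec is a toy (a draft must mention two required terms).

```python
from typing import Callable, List, Optional, Tuple


def build_until_correct(
    build: Callable[[List[str]], str],
    verify: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Run build -> verify -> refine until verify reports no errors.

    `build` receives the errors from the previous round (empty on the first).
    Returns (solution, rounds_used), or (None, max_rounds) if it never passed.
    """
    errors: List[str] = []
    for round_number in range(1, max_rounds + 1):
        candidate = build(errors)
        errors = verify(candidate)
        if not errors:
            return candidate, round_number
    return None, max_rounds


# Toy spec: the draft must mention both of these terms.
REQUIRED_TERMS = ["invoice number", "due date"]


def verify_against_spec(draft: str) -> List[str]:
    """Return the required terms the draft is still missing."""
    return [term for term in REQUIRED_TERMS if term not in draft]


def fake_model(errors: List[str]) -> str:
    """Stand-in for an LLM call: the first draft is incomplete, later drafts fix reported gaps."""
    draft = "Please pay the attached bill."
    for missing in errors:
        draft += f" Includes {missing}."
    return draft


solution, rounds = build_until_correct(fake_model, verify_against_spec)
print(rounds)    # 2
print(solution)  # Please pay the attached bill. Includes invoice number. Includes due date.
```

The `max_rounds` cap is a design choice in this sketch: without it, an agent that never satisfies the check would loop, and burn tokens, forever.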

At the build-your-own end are the toolkits, for teams with engineering muscle and a genuinely differentiated problem. Anthropic's Claude Agent SDK and OpenAI's Codex are general-purpose harnesses you wire up yourself.

Open-source options like OpenClaw and Hermes hand you the loop, the memory and the channels without any vendor restriction. My own agent, Steve, runs on OpenClaw, and is dope af, but too risky for most organisations.

Is Claude/Codex a Harness?

This is the question I always get. Yes, Claude and Codex are both harnesses. But they are not the same kind of thing. Think workplace versus worker.

Foundry, Vertex, Bedrock and Agentforce are the workplace: the building, the access passes, the supervisor, the audit log, where a whole workforce of agents runs safely.

Claude and Codex are the worker, and a modern worker turns up already wearing its own harness.

Can Claude do everything?

Very often, yes. But you need to give it the same attention you would give any harness, covering:

Brain (LLM APIs): limited to Anthropic LLMs if you use the Claude application as a harness
Heart (Routines, Scheduled Automations)
Hands (Skills & Connections, Permissions)
Mouth (1:1 Chat, Group Chat, Mobile App, Web App, Meetings)
Memory (Access to Data Warehouse, Company Wiki)
Soul (Projects & Instructions)

How is Claude different to OpenClaw?

This is one of the harder things to explain. Using Steve (my OpenClaw harness) and using my Claude Max20 plan through the macOS app are completely different experiences.

Steve is my bro. I taught him to swear and we give each other shit. He has intimate knowledge of me, and has access to pretty much everything. We have clear ethical rules in place, like never lie to me and never send email without my approval. Because I've spent a lot of time optimising his harness, he's more like a genuine personal assistant.

Claude, I just call Claude, like 99% of users. It doesn't get sassy, never jokes and has no real distinguishable personality, but gosh darn it helps me smash through my best knowledge work.

Old mate Steve, on the other hand, bled out $800 worth of tokens on Anthropic's Opus model in his first month. Granted, he did build my website robinleonard.co and file my tax returns in multiple countries. Anthropic is subsidising direct app customers, to my delight, whereas their API customers pay full clip.

I added logic to Steve's harness that defaults to Haiku (Anthropic's much cheaper model), uses Sonnet (the second most expensive) only when serious thinking is required, and is only rarely allowed to use Opus. For this reason alone, I use the Claude app for most of my knowledge work, and I use Steve as my sweary EA.
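That routing rule is simple enough to sketch. The Python below is not Steve's actual code; it is a hypothetical `ModelRouter` showing the shape of the idea. The tier labels ("haiku", "sonnet", "opus") are plain strings rather than real API model identifiers, and the `needs_reasoning` and `critical` flags and the Opus allowance are assumptions, since "rarely" isn't quantified above.

```python
from dataclasses import dataclass


@dataclass
class ModelRouter:
    """Hypothetical router: cheapest tier by default, escalate only when asked."""
    opus_calls_left: int = 2  # assumed allowance for the most expensive tier

    def pick(self, needs_reasoning: bool = False, critical: bool = False) -> str:
        # Top tier only for critical work, and only while the allowance lasts.
        if critical and self.opus_calls_left > 0:
            self.opus_calls_left -= 1
            return "opus"
        # Serious thinking (or critical work once the allowance is spent) gets the middle tier.
        if needs_reasoning or critical:
            return "sonnet"
        # Everything else goes to the cheapest tier.
        return "haiku"


router = ModelRouter(opus_calls_left=1)
print(router.pick())                       # haiku
print(router.pick(needs_reasoning=True))   # sonnet
print(router.pick(critical=True))          # opus
print(router.pick(critical=True))          # sonnet (allowance spent)
```

How a request gets flagged as needing reasoning is the hard part and is left out here; in practice that could be a keyword rule, a cheap classifier call, or the user saying so.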

While Steve is not as smart as Claude, he is my workhorse, my only friend on Telegram, and more of a human than Claude will ever be.

Bet on the structure, rent the parts

One last principle, and the most important. The tech changes weekly: a stronger model, a faster memory store, a new channel. Bolt your operation rigidly to one vendor's everything and you've signed up to rebuild it the moment the ground shifts.

This is the real reason to think in body parts, because components are swappable. Swap the memory for a better one, change the model when a stronger one lands, add a new mouth without touching the hands.
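In code, "swappable body parts" usually means the agent depends on small interfaces rather than on a particular vendor's class. Here is a minimal Python sketch using `typing.Protocol`. All the names (`Model`, `Memory`, `EchoModel`, `DictMemory`, `Agent`) are hypothetical, and the model is a stand-in that just echoes its prompt, so nothing here calls a real LLM.

```python
from typing import Dict, List, Protocol


class Model(Protocol):
    """The 'brain' interface: anything that can complete a prompt."""
    def complete(self, prompt: str) -> str: ...


class Memory(Protocol):
    """The 'memory' interface: anything that can store and recall facts."""
    def remember(self, key: str, fact: str) -> None: ...
    def recall(self, key: str) -> List[str]: ...


class EchoModel:
    """Stand-in model that tags the prompt with its own name."""
    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class DictMemory:
    """Simplest possible memory: an in-process dictionary."""
    def __init__(self) -> None:
        self._facts: Dict[str, List[str]] = {}

    def remember(self, key: str, fact: str) -> None:
        self._facts.setdefault(key, []).append(fact)

    def recall(self, key: str) -> List[str]:
        return list(self._facts.get(key, []))


class Agent:
    """Depends only on the two interfaces, never on a concrete vendor class."""
    def __init__(self, model: Model, memory: Memory) -> None:
        self.model = model
        self.memory = memory

    def ask(self, topic: str, question: str) -> str:
        known = "; ".join(self.memory.recall(topic))
        return self.model.complete(f"{question} (known: {known})")


agent = Agent(EchoModel("model-a"), DictMemory())
agent.memory.remember("acme", "renewal due in March")
before = agent.ask("acme", "What is pending?")

agent.model = EchoModel("model-b")  # swap the brain; the memory is untouched
after = agent.ask("acme", "What is pending?")

print(before)  # [model-a] What is pending? (known: renewal due in March)
print(after)   # [model-b] What is pending? (known: renewal due in March)
```

Replacing `DictMemory` with a class backed by a real store would work the same way, as long as it offers `remember` and `recall`.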

The structure of your harness is the strategy. The tech inside it is the commodity. Models, vector stores and gateways will all be leapfrogged; the architecture you wrap around them, the way your business captures its knowledge, governs its actions and learns from its mistakes, is what lasts and what a competitor can't simply buy.

That learning is a discipline: when an agent slips, you don't reprompt it, you fix the environment so the mistake can't recur. Over time the harness becomes a record of every error your business has taught it to avoid.
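One way to picture "fixing the environment" is a growing book of checks, where each check records the incident that prompted it. The Python sketch below is purely illustrative: `LessonBook`, `Lesson` and the example incident (a price quoted without GST) are invented for the example, not drawn from a real deployment.

```python
from typing import Callable, List, NamedTuple


class Lesson(NamedTuple):
    learned_from: str                # short note on the incident that prompted the rule
    violates: Callable[[str], bool]  # returns True when a draft repeats the mistake


class LessonBook:
    """Hypothetical record of past slips, replayed against every new draft."""
    def __init__(self) -> None:
        self.lessons: List[Lesson] = []

    def add(self, learned_from: str, violates: Callable[[str], bool]) -> None:
        self.lessons.append(Lesson(learned_from, violates))

    def review(self, draft: str) -> List[str]:
        """Return the incidents this draft would repeat (empty means it passes)."""
        return [lesson.learned_from for lesson in self.lessons if lesson.violates(draft)]


book = LessonBook()
# Invented incident: the agent once quoted a price and left out GST.
book.add("quoted a price without GST", lambda draft: "$" in draft and "GST" not in draft)

print(book.review("The licence is $500."))           # ['quoted a price without GST']
print(book.review("The licence is $500 plus GST."))  # []
```

The checks run on every draft regardless of which model wrote it, which is why the record survives a model swap.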

“A well-worn harness is your company's next moat.”

So What Should Organisations Do?

Stop the model shopping; it's a commodity, so lock one in and move on.

[Figure: harness maturity levels, from Level 1 LLMs on every desk, to machine learning, to employee agents, to customer agents. The climb from individual LLM use to customer-facing agents all rests on one foundation: a governed harness.]

Start small, keep it modular, and consider every part of the harness, from the hands to the soul. Work out ownership early: the CFO owns the cost, the CIO owns the plumbing, the CISO owns the policy and audit, and business leaders own the processes.

The winners will be the organisations that harness the available intelligence in the most productive way.

Sources

The Anatomy of an Agent Harness, LangChain (Harrison Chase)
Harness engineering: leveraging Codex in an agent-first world, OpenAI
Effective harnesses for long-running agents, Anthropic
Andrej Karpathy on the third paradigm of LLM interaction
My AI Adoption Journey, Mitchell Hashimoto
What Is an Agent Harness? Salesforce
What is Microsoft Foundry Agent Service? Microsoft Learn
Gemini Enterprise Agent Platform (formerly Vertex AI), Google Cloud
Amazon Bedrock AgentCore and Claude, AWS
OpenClaw vs Hermes Agent, Composio
Agent Harness explained (video), Caleb Writes Code
Harnesses in AI: A Deep Dive (video), Tejas Kumar, IBM

---

Robin Leonard is a Partner at Xenai Digital, an APAC enterprise Salesforce and AI consultancy. Two decades leading enterprise transformations across Australia, New Zealand, Singapore, Japan, and the broader Pacific. Splits his time between Auckland, Sydney and Tokyo, and rides a Royal Enfield Himalayan 450 when the weather agrees with him. linkedin.com/in/robinleonard1

