About Agent Harnesses

Brief overview of Agent Harnesses such as Claude Code, Codex, Opencode and Pi.

Agents

Overview

You may have heard the term agent harness throwed around here and there, but what exactly is a agent harness.

Lets understand what exactly a harness is in this blog.

  • What is harness is
  • how tools like Claude Code,Cursor actually work
  • why the harness affects output quality so much
  • and why the core of all this is much simpler than it looks

The model only generates text

At the heart of every coding agent there is a LLM. And regardless whatever this agents do, the model itself is still doing one basic thing:

Predicting Text

That's it. It's just guessing what's come after a given word over and over again.

It cannot "open files", "edit code", or "delete your production database :)"

So if this is the case, how does coding agents like Cursor and Claude Code can:

  • explore your project
  • search for files
  • edit code
  • run shell commands
  • and keep working step and step until a given task is completed.

The answer is tool calling, orchestrated by a harness


What is a harness?

A harness is the environment and tool layer around the model.

It is a software layer that:

  • tells the model what tools it has access to
  • receives tool calls from the model
  • decides whether those tool calls are safe to run
  • executes them
  • feeds the results back into the conversation
  • and keeps that whole loop going until the model is done

So when people are comparing different coding agents, they are not just comparing the models.

They're comparing:

  • Tools
  • tool descriptions
  • system prompt
  • the permission model
  • the context management

All of this can massively influence how good the final result is.


How tool calling works

Lets say you ask your agent:

What are the contents of this directory:

A plain chat model can’t actually inspect your filesystem.
But a harness can give it a tool like bash.

Instructions on how to use this tool are already present in the context, and the tool descriptions are send via API.

The model outputs special tokens that tells the harness, that it wants to call a tool with given parameters.

<tool: bash>
ls -la
</tool>

At that point, the model stops.

Then the harness takes over:

  1. It sees the tool call
  2. It runs ls -la
  3. It captures the output
  4. It appends that output to the chat history
  5. It sends the updated conversation back to the model
  6. The model continues from there

So every tool call creates a loop:

model → tool call → harness executes → result added to history → model continues

That’s the core architecture behind nearly every modern AI coding tool.

ReadingGrokking Simplicity