Once Claude Code starts spawning subagents and calling tools heavily, a single flat transcript would become noisy fast. It would also become expensive to keep in the model context. Claude Code solves both problems with two clean storage decisions:
- subagents get their own transcript files
- oversized tool output gets spilled into standalone files with previews kept inline
These two mechanisms tell you a lot about how the product is optimized for real agent workflows instead of simple request-response chats.
## Subagents are stored as parallel transcript streams
When Claude Code uses the Agent tool, it does not cram that side conversation into the main session file. Instead, it creates a subagents/ directory under the session artifact folder and writes separate JSONL files for each agent run.
The structure looks like this:
```
{session-id}/
├── subagents/
│   ├── agent-abc123.jsonl
│   └── agent-abc123.meta.json
└── tool-results/
```
That is the right choice for at least three reasons.
### The main transcript stays readable
If subagent work lived inline, the top-level session would become hard to follow quickly. You would get the manager agent, the explore agent, the fix agent, and their internal loops all mixed together. Separate files keep the main session understandable.
### Sidechains can get large
Subagent transcripts can grow into multi-megabyte files on their own. Splitting them out avoids bloating the parent transcript and keeps local resume and rendering paths lighter.
### Different agents may use different models
In the analyzed build, main sessions and subagents do not necessarily use the same model. Explore-style agents can use a cheaper, faster model while the main session runs on a larger one. That makes isolated storage even more useful because it keeps per-agent usage measurable.
## Why this matters for cost tracking
Subagents are not just an implementation detail. They change the accuracy of every downstream metric.
| If you include subagents | If you ignore subagents |
|---|---|
| Total token counts are closer to reality | Usage can be materially undercounted |
| Delegation-heavy sessions make sense | Parent sessions look strangely "cheap" |
| Model mix is measurable | Cheap helper models disappear from the dataset |
| Session cost attribution improves | Orchestration work gets misread as low-effort work |
That is the practical reason this storage split matters. A parser that stops at the parent transcript is not "close enough." It is missing execution surface.
## What changes inside a subagent transcript
The record format is mostly the same as the main transcript, but there are important differences.
Subagent records carry fields like:
- `isSidechain: true`
- `agentId`
- agent metadata in a sibling `.meta.json` file
That metadata can include the agent type and a short description of the assigned task. So from an analytics perspective, a subagent is not just "more tokens." It is a separate workstream with its own cost, timing, and purpose.
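As a sketch of what this looks like to a consumer: the records and fields below are simplified assumptions, with only `isSidechain`, `agentId`, and the sibling `.meta.json` metadata taken from the format described above.

```python
import json

# Hypothetical records illustrating the fields described above.
# The exact surrounding structure (usage shape, meta keys) is an assumption.
record = json.loads(
    '{"isSidechain": true, "agentId": "abc123", '
    '"usage": {"input_tokens": 1200, "output_tokens": 340}}'
)
meta = json.loads(
    '{"agentType": "explore", "description": "scan repo for usages"}'
)

def describe_subagent(record: dict, meta: dict) -> str:
    """Summarize a subagent run as its own workstream, not just extra tokens."""
    assert record.get("isSidechain"), "only sidechain records belong here"
    tokens = record["usage"]["input_tokens"] + record["usage"]["output_tokens"]
    return (
        f'{meta["agentType"]} agent {record["agentId"]}: '
        f'{tokens} tokens ({meta["description"]})'
    )

print(describe_subagent(record, meta))
# → explore agent abc123: 1540 tokens (scan repo for usages)
```

The point of pairing the transcript with the `.meta.json` sibling is that cost, timing, and purpose can all be attributed to a named workstream rather than folded into the parent.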
In Vibenalytics, that distinction turned out to matter far more than we expected. One of the most serious data-quality problems we hit was a subagent data loss bug.
The failure mode looked like this:
- Claude Code fired the parent `Stop` event.
- Vibenalytics synced immediately.
- The subagent kept writing to disk after that.
- The backend saw the parent prompt first and created records.
- The later subagent payload arrived for the same prompt index.
- The old backend logic treated it as already seen and skipped it.
The result was not a small discrepancy. Delegation-heavy sessions looked artificially cheap.
If you are reading this from the metrics angle, this is exactly why *What You Can Measure From Claude Code Transcripts* argues that subagents are not optional accounting detail.
We fixed it with a two-part design:
- a local `subagent-pending.json` state file to track in-flight agents
- sync deferral until all subagents complete, with a timeout fallback and `SessionEnd` as a hard stop
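A minimal sketch of that deferral logic. The state-file shape, the `ready_to_sync` helper, and the timeout value are all illustrative assumptions, not Vibenalytics' actual implementation:

```python
import json
import time
from pathlib import Path

PENDING_FILE = Path("subagent-pending.json")  # local state file from the fix
TIMEOUT_SECONDS = 300  # assumed timeout fallback; the real value is not documented

def load_pending() -> dict:
    """Read the in-flight agent registry, or start a fresh one."""
    if PENDING_FILE.exists():
        return json.loads(PENDING_FILE.read_text())
    return {"agents": {}, "started_at": time.time()}

def ready_to_sync(pending: dict, session_ended: bool) -> bool:
    """Defer sync until every in-flight subagent has finished writing.
    SessionEnd acts as a hard stop; the timeout prevents waiting forever
    on an agent that never reports completion."""
    if session_ended:
        return True
    if time.time() - pending["started_at"] > TIMEOUT_SECONDS:
        return True
    return all(state == "complete" for state in pending["agents"].values())
```

The essential design point is that the parent `Stop` event is demoted from "sync now" to "maybe sync": the decision waits on the pending registry instead of the event alone.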
That is the kind of issue you only find once you stop thinking of subagents as an abstract feature and start treating them as asynchronous writers in a real filesystem-backed runtime.
## Most local analytics miss subagents
This is where the storage design becomes a trap for shallow parsers. If you only scan the top-level session JSONL files and ignore subagents/*.jsonl, you will undercount usage, and in some workflows, badly.
That is especially true for search-heavy or delegation-heavy sessions where the main agent is mostly orchestrating and the real work is happening in sidechains.
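A correct parser therefore has to walk both surfaces. A minimal sketch in Python (the per-record `usage` layout here is an assumption; adapt it to the real record shape):

```python
import json
from pathlib import Path

def total_tokens(session_dir: Path) -> int:
    """Sum token usage across the parent transcript AND subagents/*.jsonl.
    A parser that only globs the top-level session files undercounts
    delegation-heavy sessions, because the sidechain usage is invisible."""
    files = list(session_dir.glob("*.jsonl")) + list(session_dir.glob("subagents/*.jsonl"))
    total = 0
    for path in files:
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            usage = json.loads(line).get("usage") or {}
            total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total
```

Dropping the second `glob` is exactly the shallow-parser trap: the code still runs, still returns a number, and the number is quietly wrong.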
This is one of the most interesting findings from the reverse-engineering work because it explains why two tools can read "the same session" and still disagree on usage. They are not necessarily reading the same execution surface.
Vibenalytics also had to handle a more pathological version of this problem: compaction ghost agents. After context compaction, Claude Code can create agent-acompact* files containing pre-compaction subagent data. If you naively discover all subagent files, you double-count tokens and line changes.
That was not hypothetical. We explicitly had to filter those files out because they contain data that already exists in the parent transcript. This is a good example of a recurring theme in the whole series: the hard part is not reading the files. The hard part is understanding which files are authoritative for which part of the session history.
That compaction-specific edge case is covered more directly in marble-origami: How Claude Code Handles Context Compaction.
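The filter itself is trivial once you know the ghost files exist. A sketch (the `agent-acompact` prefix comes from the behavior described above; the function name is ours):

```python
from pathlib import Path

def discover_subagent_files(subagents_dir: Path) -> list[Path]:
    """List subagent transcripts, excluding compaction ghost files.
    agent-acompact* files hold pre-compaction data that already exists
    in the parent transcript, so counting them double-counts tokens
    and line changes."""
    return [
        p for p in sorted(subagents_dir.glob("agent-*.jsonl"))
        if not p.name.startswith("agent-acompact")
    ]
```

The interesting part is not the two-line body; it is knowing that "discover every JSONL file" is the wrong spec in the first place.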
## Large tool output is handled as spillover
Claude Code also avoids another common failure mode: stuffing giant tool results directly into the transcript. When a tool output grows beyond the inline limit, Claude Code writes the full result to tool-results/{toolUseId}.txt and keeps only a small preview inline.
The design is straightforward:
- keep roughly 2KB of preview text in the transcript
- store the full output on disk
- insert a structured reference telling both the model and the human where the full file lives
This is a strong compromise. The model gets enough local context to decide what to do next, the human still has access to the full output, and the transcript stays compact enough to be useful.
You can think of the spillover behavior like this:
```
<persisted-output>
Output too large. Full output saved to: /path/to/file.txt
Preview:
[first 2000 characters]
</persisted-output>
```
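A sketch of the spillover write path, assuming a 2,000-character inline limit and the `tool-results/{toolUseId}.txt` naming described above (the real inline limit and wrapper text may differ):

```python
from pathlib import Path

PREVIEW_CHARS = 2000  # roughly 2KB of preview text stays inline (assumed limit)

def persist_tool_output(output: str, tool_use_id: str, results_dir: Path) -> str:
    """Inline small outputs; spill large ones to tool-results/ and return
    a structured reference with a preview, mirroring the pattern above."""
    if len(output) <= PREVIEW_CHARS:
        return output
    path = results_dir / f"{tool_use_id}.txt"
    path.write_text(output)
    return (
        "<persisted-output>\n"
        f"Output too large. Full output saved to: {path}\n"
        "Preview:\n"
        f"{output[:PREVIEW_CHARS]}\n"
        "</persisted-output>"
    )
```

Note that the return value is what goes back into the transcript: the model sees the preview and the path, while the full payload lives only on disk.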
That pattern is worth copying if you are building tools around long-running agents. It preserves auditability for the human without wasting context budget for the model.
The same principle is why Vibenalytics never sends raw tool output or transcript content to the backend. The useful thing for analytics is usually the metadata:
- that a tool ran
- how often it ran
- how large the output was
- how long it took
- which prompt or subagent produced it
That boundary is not a marketing line. It is part of the architecture.
### Why the "wx" flag matters
Claude Code writes spillover files with an exclusive create mode. That is a small implementation detail with a real payoff: parallel tool calls cannot silently overwrite each other's outputs.
For an agent runtime that can execute many operations in a short burst, that is exactly the kind of defensive filesystem behavior you want.
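Python's `'x'` open mode behaves like Node's `"wx"` flag, which makes the guarantee easy to demonstrate:

```python
from pathlib import Path

def write_exclusive(path: Path, data: str) -> bool:
    """Write with exclusive create ('x' mode, the equivalent of "wx"):
    the open fails instead of overwriting if a parallel tool call already
    wrote this file. Returns False when the file already existed."""
    try:
        with open(path, "x") as f:
            f.write(data)
        return True
    except FileExistsError:
        return False
```

With plain `"w"` mode, two concurrent writers racing on the same `{toolUseId}.txt` would silently clobber each other; with exclusive create, the second writer gets an explicit failure it can handle.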
## This is really a context management story
Subagent files and tool spillover files are not just storage tricks. They are context management primitives.
Claude Code is constantly balancing three competing goals:
- preserve a full audit trail
- keep local persistence practical
- avoid wasting context window on low-value bulk output
Separate sidechains and spillover previews solve all three reasonably well.
## The retention limit still matters
There is one catch. Even though Claude Code stores subagents and tool spillover intelligently, the local transcript system is still subject to the same approximate 30-day retention window as the rest of ~/.claude/.
So yes, the data exists, but not for long.
That matters a lot if you want to study:
- how often subagents are used over time
- which workflows rely most on delegation
- whether tool output volume is growing
- how model mix changes across weeks or months
A good local storage design solves recent-state persistence. It does not automatically solve historical analytics.
That is also why the retention issue deserves its own post: Claude Code's 30-Day Memory Problem.

## Subagent-heavy workflows are hard to measure locally
Vibenalytics tracks Claude Code activity with project attribution and long-term retention, which makes delegation-heavy sessions easier to understand after the local transcript window rolls forward.