Once Claude Code starts spawning subagents and calling tools heavily, a single flat transcript would become noisy fast. It would also become expensive to keep in the model context. Claude Code solves both problems with two clean storage decisions:
- subagents get their own transcript files
- oversized tool output gets spilled into standalone files with previews kept inline
These two mechanisms tell you a lot about how the product is optimized for real agent workflows instead of simple request-response chats.
## Subagents are stored as parallel transcript streams
When Claude Code uses the Agent tool, it does not cram that side conversation into the main session file. Instead, it creates a subagents/ directory under the session artifact folder and writes separate JSONL files for each agent run.
The structure looks like this:
```
{session-id}/
├── subagents/
│   ├── agent-abc123.jsonl
│   └── agent-abc123.meta.json
└── tool-results/
```
That is the right choice for at least three reasons.
### The main transcript stays readable
If subagent work lived inline, the top-level session would become hard to follow quickly. You would get the manager agent, the explore agent, the fix agent, and their internal loops all mixed together. Separate files keep the main session understandable.
### Sidechains can get large
Subagent transcripts can grow into multi-megabyte files on their own. Splitting them out avoids bloating the parent transcript and keeps local resume and rendering paths lighter.
### Different agents may use different models
In the analyzed build, main sessions and subagents do not necessarily use the same model. Explore-style agents can use a cheaper, faster model while the main session runs on a larger one. That makes isolated storage even more useful because it keeps per-agent usage measurable.
## Why this matters for cost tracking
Subagents are not just an implementation detail. They change the accuracy of every downstream metric.
| If you include subagents | If you ignore subagents |
|---|---|
| Total token counts are closer to reality | Usage can be materially undercounted |
| Delegation-heavy sessions make sense | Parent sessions look strangely "cheap" |
| Model mix is measurable | Cheap helper models disappear from the dataset |
| Session cost attribution improves | Orchestration work gets misread as low-effort work |
That is the practical reason this storage split matters. A parser that stops at the parent transcript is not "close enough." It is missing execution surface.
## What changes inside a subagent transcript
The record format is mostly the same as the main transcript, but there are important differences.
Subagent records carry fields like:
- `isSidechain: true`
- `agentId`
- agent metadata in a sibling `.meta.json` file
That metadata can include the agent type and a short description of the assigned task. So from an analytics perspective, a subagent is not just "more tokens." It is a separate workstream with its own cost, timing, and purpose.
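As a sketch of what this looks like to a consumer: the records and fields below are simplified assumptions, with only `isSidechain`, `agentId`, and the sibling `.meta.json` metadata taken from the format described above.

```python
import json

# Hypothetical records illustrating the fields described above.
# The exact surrounding structure (usage shape, meta keys) is an assumption.
record = json.loads(
    '{"isSidechain": true, "agentId": "abc123", '
    '"usage": {"input_tokens": 1200, "output_tokens": 340}}'
)
meta = json.loads(
    '{"agentType": "explore", "description": "scan repo for usages"}'
)

def describe_subagent(record: dict, meta: dict) -> str:
    """Summarize a subagent run as its own workstream, not just extra tokens."""
    assert record.get("isSidechain"), "only sidechain records belong here"
    tokens = record["usage"]["input_tokens"] + record["usage"]["output_tokens"]
    return (
        f'{meta["agentType"]} agent {record["agentId"]}: '
        f'{tokens} tokens ({meta["description"]})'
    )

print(describe_subagent(record, meta))
# → explore agent abc123: 1540 tokens (scan repo for usages)
```

The point of pairing the transcript with the `.meta.json` sibling is that cost, timing, and purpose can all be attributed to a named workstream rather than folded into the parent.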
In Vibenalytics, that distinction turned out to matter far more than we expected. One of the most serious data-quality problems we hit was a subagent data loss bug.
The failure mode looked like this:
- Claude Code fired the parent `Stop` event.
- Vibenalytics synced immediately.
- The subagent kept writing to disk after that.
- The backend saw the parent prompt first and created records.
- The later subagent payload arrived for the same prompt index.
- The old backend logic treated it as already seen and skipped it.
The result was not a small discrepancy. Delegation-heavy sessions looked artificially cheap.
If you are reading this from the metrics angle, this is exactly why *What You Can Measure From Claude Code Transcripts* argues that subagents are not optional accounting detail.
We fixed it with a two-part design:
- a local `subagent-pending.json` state file to track in-flight agents
- sync deferral until all subagents complete, with a timeout fallback and `SessionEnd` as a hard stop
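A minimal sketch of that deferral logic. The state-file shape, the `ready_to_sync` helper, and the timeout value are all illustrative assumptions, not Vibenalytics' actual implementation:

```python
import json
import time
from pathlib import Path

PENDING_FILE = Path("subagent-pending.json")  # local state file from the fix
TIMEOUT_SECONDS = 300  # assumed timeout fallback; the real value is not documented

def load_pending() -> dict:
    """Read the in-flight agent registry, or start a fresh one."""
    if PENDING_FILE.exists():
        return json.loads(PENDING_FILE.read_text())
    return {"agents": {}, "started_at": time.time()}

def ready_to_sync(pending: dict, session_ended: bool) -> bool:
    """Defer sync until every in-flight subagent has finished writing.
    SessionEnd acts as a hard stop; the timeout prevents waiting forever
    on an agent that never reports completion."""
    if session_ended:
        return True
    if time.time() - pending["started_at"] > TIMEOUT_SECONDS:
        return True
    return all(state == "complete" for state in pending["agents"].values())
```

The essential design point is that the parent `Stop` event is demoted from "sync now" to "maybe sync": the decision waits on the pending registry instead of the event alone.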
That is the kind of issue you only find once you stop thinking of subagents as an abstract feature and start treating them as asynchronous writers in a real filesystem-backed runtime.
## Most local analytics miss subagents
This is where the storage design becomes a trap for shallow parsers. If you only scan the top-level session JSONL files and ignore subagents/*.jsonl, you will undercount usage, and in some workflows, badly.
That is especially true for search-heavy or delegation-heavy sessions where the main agent is mostly orchestrating and the real work is happening in sidechains.
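A correct parser therefore has to walk both surfaces. A minimal sketch in Python (the per-record `usage` layout here is an assumption; adapt it to the real record shape):

```python
import json
from pathlib import Path

def total_tokens(session_dir: Path) -> int:
    """Sum token usage across the parent transcript AND subagents/*.jsonl.
    A parser that only globs the top-level session files undercounts
    delegation-heavy sessions, because the sidechain usage is invisible."""
    files = list(session_dir.glob("*.jsonl")) + list(session_dir.glob("subagents/*.jsonl"))
    total = 0
    for path in files:
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            usage = json.loads(line).get("usage") or {}
            total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total
```

Dropping the second `glob` is exactly the shallow-parser trap: the code still runs, still returns a number, and the number is quietly wrong.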
This is one of the most interesting findings from the reverse-engineering work because it explains why two tools can read "the same session" and still disagree on usage. They are not necessarily reading the same execution surface.
Vibenalytics also had to handle a more pathological version of this problem: compaction ghost agents. After context compaction, Claude Code can create agent-acompact* files containing pre-compaction subagent data. If you naively discover all subagent files, you double-count tokens and line changes.
That was not hypothetical. We explicitly had to filter those files out because they contain data that already exists in the parent transcript. This is a good example of a recurring theme in the whole series: the hard part is not reading the files. The hard part is understanding which files are authoritative for which part of the session history.
That compaction-specific edge case is covered more directly in marble-origami: How Claude Code Handles Context Compaction.
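The filter itself is trivial once you know the ghost files exist. A sketch (the `agent-acompact` prefix comes from the behavior described above; the function name is ours):

```python
from pathlib import Path

def discover_subagent_files(subagents_dir: Path) -> list[Path]:
    """List subagent transcripts, excluding compaction ghost files.
    agent-acompact* files hold pre-compaction data that already exists
    in the parent transcript, so counting them double-counts tokens
    and line changes."""
    return [
        p for p in sorted(subagents_dir.glob("agent-*.jsonl"))
        if not p.name.startswith("agent-acompact")
    ]
```

The interesting part is not the two-line body; it is knowing that "discover every JSONL file" is the wrong spec in the first place.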
## Large tool output is handled as spillover
Claude Code also avoids another common failure mode: stuffing giant tool results directly into the transcript. When a tool output grows beyond the inline limit, Claude Code writes the full result to tool-results/{toolUseId}.txt and keeps only a small preview inline.
The design is straightforward:
- keep roughly 2KB of preview text in the transcript
- store the full output on disk
- insert a structured reference telling both the model and the human where the full file lives
This is a strong compromise. The model gets enough local context to decide what to do next, the human still has access to the full output, and the transcript stays compact enough to be useful.
You can think of the spillover behavior like this:
```
<persisted-output>
Output too large. Full output saved to: /path/to/file.txt
Preview:
[first 2000 characters]
</persisted-output>
```
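A sketch of the spillover write path, assuming a 2,000-character inline limit and the `tool-results/{toolUseId}.txt` naming described above (the real inline limit and wrapper text may differ):

```python
from pathlib import Path

PREVIEW_CHARS = 2000  # roughly 2KB of preview text stays inline (assumed limit)

def persist_tool_output(output: str, tool_use_id: str, results_dir: Path) -> str:
    """Inline small outputs; spill large ones to tool-results/ and return
    a structured reference with a preview, mirroring the pattern above."""
    if len(output) <= PREVIEW_CHARS:
        return output
    path = results_dir / f"{tool_use_id}.txt"
    path.write_text(output)
    return (
        "<persisted-output>\n"
        f"Output too large. Full output saved to: {path}\n"
        "Preview:\n"
        f"{output[:PREVIEW_CHARS]}\n"
        "</persisted-output>"
    )
```

Note that the return value is what goes back into the transcript: the model sees the preview and the path, while the full payload lives only on disk.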
That pattern is worth copying if you are building tools around long-running agents. It preserves auditability for the human without wasting context budget for the model.
The same principle is why Vibenalytics never sends raw tool output or transcript content to the backend. The useful thing for analytics is usually the metadata:
- that a tool ran
- how often it ran
- how large the output was
- how long it took
- which prompt or subagent produced it
That boundary is not a marketing line. It is part of the architecture.
### Why the "wx" flag matters
Claude Code writes spillover files with an exclusive create mode. That is a small implementation detail with a real payoff: parallel tool calls cannot silently overwrite each other's outputs.
For an agent runtime that can execute many operations in a short burst, that is exactly the kind of defensive filesystem behavior you want.
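Python's `'x'` open mode behaves like Node's `"wx"` flag, which makes the guarantee easy to demonstrate:

```python
from pathlib import Path

def write_exclusive(path: Path, data: str) -> bool:
    """Write with exclusive create ('x' mode, the equivalent of "wx"):
    the open fails instead of overwriting if a parallel tool call already
    wrote this file. Returns False when the file already existed."""
    try:
        with open(path, "x") as f:
            f.write(data)
        return True
    except FileExistsError:
        return False
```

With plain `"w"` mode, two concurrent writers racing on the same `{toolUseId}.txt` would silently clobber each other; with exclusive create, the second writer gets an explicit failure it can handle.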
## This is really a context management story
Subagent files and tool spillover files are not just storage tricks. They are context management primitives.
Claude Code is constantly balancing three competing goals:
- preserve a full audit trail
- keep local persistence practical
- avoid wasting context window on low-value bulk output
Separate sidechains and spillover previews solve all three reasonably well.
## The retention limit still matters
There is one catch. Even though Claude Code stores subagents and tool spillover intelligently, the local transcript system is still subject to the same approximate 30-day retention window as the rest of ~/.claude/.
So yes, the data exists, but not for long.
That matters a lot if you want to study:
- how often subagents are used over time
- which workflows rely most on delegation
- whether tool output volume is growing
- how model mix changes across weeks or months
A good local storage design solves recent-state persistence. It does not automatically solve historical analytics.
That is also why the retention issue deserves its own post: Claude Code's 30-Day Memory Problem.

## Subagent-heavy workflows are hard to measure locally
Vibenalytics tracks Claude Code activity with project attribution and long-term retention, which makes delegation-heavy sessions easier to understand after the local transcript window rolls forward.