Claude Code Context Compaction: Inside marble-origami

Claude Code's internal context compaction system, marble-origami, shows where model-visible history ends and full transcript history continues. Here is how compaction boundaries actually work.

Martin Vančo

Long Claude Code sessions eventually hit the same wall every agent does: the context window is finite. What happens next is more interesting than "old messages get summarized." Inside Claude Code, context compaction is tracked explicitly in the transcript through records connected to an internal codename: marble-origami. That name is whimsical. The implementation is not.

Compaction is treated as a first-class event

When Claude Code decides to compact context, it does not hide the operation behind silent internal state. It writes dedicated transcript records, including:

  • marble-origami-commit
  • marble-origami-snapshot
  • a system record with subtype compact_boundary

This is a strong architectural choice because compaction changes what the model can still "see." That is not just UI trivia. It affects causality, debugging, and analysis. Claude Code preserves that distinction in the storage layer.

The key trick: two parent concepts

The most elegant detail is the split between:

  • parentUuid
  • logicalParentUuid

At the compaction boundary, parentUuid is set to null, which tells the chain-building logic that the current model-visible conversation starts here. But Claude Code also preserves the real lineage through logicalParentUuid, which lets the UI and analytical tooling reconstruct the broader history. This is the right way to model compaction because it admits an important truth: the full local transcript and the active model context are no longer the same thing.

Here is the essential logic in pseudocode:

// At the compaction boundary, sever the model-visible chain
// but preserve the true lineage for UI and analytics tooling.
if (record.subtype === 'compact_boundary') {
  record.parentUuid = null
  record.logicalParentUuid = previousVisibleMessageUuid
}

That tiny distinction is what lets the system tell the truth to both audiences:

  • the model, which only sees the compacted continuation
  • the UI or analytics layer, which may still need full lineage
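The two audiences correspond to two different chain walks over the same records. A minimal sketch in TypeScript: the record shape and helper names are illustrative, based only on the `uuid`, `parentUuid`, and `logicalParentUuid` fields described above.

```typescript
// Minimal record shape; field names beyond the three parent/uuid
// fields discussed above are illustrative assumptions.
interface TranscriptRecord {
  uuid: string;
  parentUuid: string | null;
  logicalParentUuid?: string | null;
}

// Walk backwards from a record, following a chosen parent pointer,
// and return the chain in chronological order.
function chainFrom(
  records: Map<string, TranscriptRecord>,
  startUuid: string,
  parentOf: (r: TranscriptRecord) => string | null | undefined
): TranscriptRecord[] {
  const chain: TranscriptRecord[] = [];
  let current = records.get(startUuid);
  while (current) {
    chain.unshift(current);
    const parent = parentOf(current);
    current = parent ? records.get(parent) : undefined;
  }
  return chain;
}

// Model-visible chain: stops at the compaction boundary,
// because parentUuid is null there.
const visibleChain = (recs: Map<string, TranscriptRecord>, uuid: string) =>
  chainFrom(recs, uuid, (r) => r.parentUuid);

// Full lineage: falls back to logicalParentUuid to cross the boundary.
const fullChain = (recs: Map<string, TranscriptRecord>, uuid: string) =>
  chainFrom(recs, uuid, (r) => r.parentUuid ?? r.logicalParentUuid);
```

The same start record yields a short chain for the model's view and the complete chain for tooling, which is exactly the split the two parent fields encode.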

If you have not read the general transcript structure first, Inside Claude Code Transcripts: Record Types, Trees, and Turn Reconstruction is the best companion piece for this section.

Why this matters more than it sounds

A lot of transcript tooling quietly assumes a session is one continuous visible conversation. That stops being true after compaction.

After a compact event:

  • the user may still see earlier history in the interface
  • the local transcript may still contain older records
  • the model is operating on a summarized version, not the original chain

If your parser ignores this distinction, you can build analyses that look coherent but are conceptually wrong.
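One way for a parser to avoid that mistake is to segment the chronological record stream at each `compact_boundary` record. A minimal sketch, assuming only that each record exposes the `subtype` field described above:

```typescript
// Split a chronological record stream into segments at each
// compact_boundary record. Only the final segment corresponds to
// what the model can still see; earlier segments are history.
function splitAtCompactions<T extends { subtype?: string }>(
  records: T[]
): T[][] {
  const segments: T[][] = [[]];
  for (const r of records) {
    if (r.subtype === "compact_boundary") segments.push([]);
    segments[segments.length - 1].push(r);
  }
  return segments.filter((s) => s.length > 0);
}
```

Any per-segment analysis (token sums, error attribution, turn counts) then operates on model-visible spans instead of on one falsely continuous conversation.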

Vibenalytics ended up caring about this more than a normal transcript viewer would, because compaction is not just part of session playback. It is part of the metric surface. We track PreCompact hook events and transcript compact_boundary records so compaction can appear as its own event type in analytics, with fields like trigger and pre-compaction token count.
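As a sketch of what that event surface might look like, the field names below are assumptions modeled on the fields just mentioned, not a documented schema:

```typescript
// Illustrative analytics event for a compaction; field names are
// assumptions based on the fields described above, not a real schema.
interface CompactionEvent {
  sessionId: string;
  project: string;
  trigger: string;               // e.g. automatic vs. manual compaction
  preCompactionTokens: number;   // context size just before compaction
}

// Aggregate compaction counts by any key, e.g. project or trigger.
function countBy(
  events: CompactionEvent[],
  key: (e: CompactionEvent) => string
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const k = key(e);
    counts.set(k, (counts.get(k) ?? 0) + 1);
  }
  return counts;
}
```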

Compaction is a signal, not just a maintenance event

marble-origami is useful for more than accurate reconstruction. It is also a high-value behavioral metric. Compaction tells you that a session accumulated enough context pressure to require summarization, which makes it a proxy for session shape.

A compacted session is often one of these:

  • a long debugging trace
  • a broad exploratory session
  • a multi-step implementation loop
  • a session with large tool outputs or many iterations

This opens up better questions than raw token totals.

For example:

  • Which projects compact most often?
  • Which tasks create the most context pressure?
  • Are compactions correlated with subagent use?
  • Does compaction frequency rise during certain hours or work patterns?

That is far more actionable than "this session used a lot of tokens."

Compaction is also a proxy for prompt design pressure

Once you start measuring compaction frequency, you get indirect feedback on workflow structure:

| Frequent compaction can suggest | Why it matters |
| --- | --- |
| Long exploratory sessions | Context is growing faster than execution is resolving |
| Overly broad prompts | The agent keeps carrying too much state forward |
| Heavy tool output | The transcript is accumulating bulk, not just intent |
| Repetitive debugging loops | The session keeps revisiting prior context |

This is where compaction stops being an internal implementation detail and becomes a useful product metric.

There was also a very practical reason to get compaction handling right: compaction ghost agents. Claude Code can create agent-acompact* subagent files around compaction boundaries. Those files contain pre-compaction data that already exists in the parent transcript. If you ingest them naively, you double-count both tokens and line changes.

That is a good illustration of the whole problem space. Compaction is not just a UI state transition. It changes what files exist on disk, which records are authoritative, and what a downstream analytics system has to filter out.
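An ingest pipeline can avoid the double counting by excluding those ghost files before aggregating. A minimal sketch: the filename test follows the `agent-acompact` pattern named above, and the exact matching rule is an assumption.

```typescript
// Detect subagent transcript files created around compaction
// boundaries. The "agent-acompact" prefix follows the pattern named
// in the article; the exact matching rule here is an assumption.
const isCompactionGhost = (filename: string): boolean =>
  filename.startsWith("agent-acompact");

// Keep only the authoritative transcript files for ingestion,
// so pre-compaction data is counted exactly once.
function authoritativeFiles(filenames: string[]): string[] {
  return filenames.filter((f) => !isCompactionGhost(f));
}
```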

That filtering problem is also part of the broader subagent story in How Claude Code Stores Subagents and Large Tool Results.

Compaction changes how you should measure turns

If you are reconstructing full conversation history, compaction means chronology alone is not enough. You need to distinguish between:

  • historical transcript continuity
  • model-visible continuity

That affects any tooling that tries to answer:

  • what the model had access to before a reply
  • which messages were preserved versus summarized
  • whether an error happened before or after a compaction boundary

This is exactly why the transcript stores the boundary explicitly. Claude Code's runtime needs this distinction, and serious analysis needs it too.
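The last question above, whether something happened before or after a compaction, needs nothing more than the explicit `compact_boundary` record and chronological order. A sketch with an illustrative record shape:

```typescript
// Illustrative record shape for this check.
interface Rec {
  uuid: string;
  subtype?: string;
}

// True if the record with the given uuid occurs at or after the most
// recent compact_boundary, i.e. it belongs to the model-visible
// continuation rather than the pre-compaction history.
function inCurrentContext(records: Rec[], uuid: string): boolean {
  let lastBoundary = -1;
  let target = -1;
  records.forEach((r, i) => {
    if (r.subtype === "compact_boundary") lastBoundary = i;
    if (r.uuid === uuid) target = i;
  });
  if (target === -1) throw new Error(`unknown record: ${uuid}`);
  return target >= lastBoundary;
}
```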

There is also a product lesson here

Compaction is a symptom of success. You do not need sophisticated context management in tiny toy sessions. You need it when users are doing real, messy, extended work. So marble-origami is evidence that Claude Code is designed for long-running sessions with actual workflow complexity.

That is worth noticing because many AI tools still behave like stateless chat surfaces with nicer branding. Claude Code does not. Its storage layer reflects a much more operational model.

The 30-day retention limit turns compaction into a disappearing signal

There is also a limit here. Compaction events are extremely useful for understanding workflow pressure over time, but they live inside the same local transcript system that is usually cleaned up after about 30 days.

So if you want to know:

  • whether compaction is becoming more common
  • which repositories trigger it repeatedly
  • how context pressure changes over months

...local storage alone will not get you there.

You have the event, but not the durable history.

This is the recurring theme in Claude Code's architecture: the local transcript is rich enough for deep analysis, but not retained long enough for durable observability.

That durability gap is the subject of Claude Code's 30-Day Memory Problem.

Track context pressure over time

Vibenalytics makes it easier to correlate compaction events, session cost, tool behavior, and project context so you can spot when workflows start pushing the model too hard.
