How to Reduce Claude Code Costs for Your Team (Without Changing Anything)

Most teams notice the Claude Code bill only after the first monthly invoice has already surprised them.

The pattern is consistent. A team adopts Claude Code, developers start using it daily, productivity improves noticeably. Then the first Anthropic invoice arrives. $1,400. $2,400. $4,800. The question that follows is always the same: how did we spend that much?

The answer is context accumulation at scale. And the fix does not require changing how your developers work.

TL;DR

Five developers running 8 Claude Code sessions per day at 30 turns each, averaging 50,000 input tokens per turn over 20 working days, accumulate roughly 240M input tokens per developer per month.
At Sonnet pricing of $3 per million input tokens, that is $720 per developer per month, or $3,600 per month for a five-person team.
Manual mitigations (asking developers to be careful, switching models, monitoring dashboards) underperform because they fight workflow rather than cost.
Automated context distillation at the proxy layer cuts 30-60% of costs depending on session patterns, with a 20% conservative benchmark floor. No developer behavior change required.
Setup is per-developer at roughly two minutes each, or about ten minutes in total for a five-person team.

The $3,600/Month Problem

A five-developer team on Claude Code at standard usage (8 sessions per day, 30 turns each, 20 working days per month) accumulates roughly 240 million input tokens per developer per month. At Sonnet's $3 per million, that is $720 per developer, or $3,600 across the team. This is not heavy usage; it is the floor cost for a team using the tool as intended.

Five developers. Each running 8 Claude Code sessions per day. Twenty working days per month. That is 160 sessions per developer per month.

Each session averages 30 turns. Each turn re-sends the growing conversation context to the API: tool results, file reads, prior messages, everything accumulated since turn 1. Context is smaller in early turns and larger in late ones; averaged across all 30 turns, a typical session carries approximately 50,000 input tokens per turn.

160 sessions × 30 turns × 50k tokens (average per turn) = 240M input tokens/developer/month
240M × $3.00/M (Sonnet pricing) = $720/developer/month
$720 × 5 developers = $3,600/month
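
The multiplication above can be checked with a short script. This is a minimal sketch of the article's cost model, not a billing tool: the function and parameter names are illustrative, and it counts input tokens at a flat rate, ignoring output tokens and any caching discounts.

```python
def monthly_input_cost(
    sessions_per_day: int = 8,
    working_days: int = 20,
    turns_per_session: int = 30,
    avg_tokens_per_turn: int = 50_000,
    price_per_million_usd: float = 3.00,
    developers: int = 5,
) -> dict:
    """Return monthly input-token volume and cost under the stated assumptions."""
    sessions = sessions_per_day * working_days
    tokens_per_dev = sessions * turns_per_session * avg_tokens_per_turn
    cost_per_dev = tokens_per_dev / 1_000_000 * price_per_million_usd
    return {
        "sessions_per_dev": sessions,
        "tokens_per_dev": tokens_per_dev,
        "cost_per_dev_usd": cost_per_dev,
        "cost_team_usd": cost_per_dev * developers,
    }


if __name__ == "__main__":
    # Print each figure with thousands separators.
    for key, value in monthly_input_cost().items():
        print(f"{key}: {value:,}")
```

Changing any one argument (for example `turns_per_session=45`) shows how quickly the monthly figure moves with session length.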

For more on the exact team cost breakdown at different scales, see Claude Code for teams.

Figure: stepped bar chart of the cost buildup: $4.50 per session, $720 per developer per month (×160 sessions), $3,600 per team per month (×5 developers).

Why Teams Spend More Than Solo Developers

Teams pay more per developer than solo users because of three structural patterns. Solo developers eventually internalize cost intuitions over months of usage; teams do not develop these intuitions collectively. Teams also run longer agentic sessions on shared work (debugging production issues, large refactors). And teams have no cross-developer signal about which sessions are getting expensive — each developer's bill is invisible to the others.

Solo developers naturally develop intuitions about Claude Code costs over time. They learn to start fresh context windows when a session gets long. They scope sessions to specific files or tasks. They restart when something feels slow.

Teams do not develop these intuitions collectively. Each developer manages their own sessions independently. There is no shared signal about when a session is getting expensive. No policy enforcement. No visibility into whether a five-hour refactor session is costing $15 or $50.

Teams also run longer sessions on average. Complex multi-developer tasks (debugging production issues, large-scale refactors, architectural decisions) tend to run longer and accumulate more context than typical solo sessions. The sessions that generate the most value also generate the highest bills.

What Does Not Work

Four common approaches to team cost reduction underperform in practice: asking developers to be careful, switching to a cheaper model, enforcing session length limits, and monitoring dashboards. Each addresses a symptom but leaves the structural cost driver intact, which is why teams that try them typically see brief reductions followed by drift back to baseline.

Asking developers to be more careful. Individual behavior changes are fragile. Developers are focused on the task, not on session economics. Even well-intentioned teams revert to natural patterns within a few weeks.

Switching models. Moving from Sonnet to Haiku reduces the per-token cost but does not change the accumulation mechanic. A session that accumulates 1.5 million tokens on Sonnet accumulates the same 1.5 million tokens on Haiku. The bill is lower; the problem is not solved.
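
A small sketch makes the point concrete: the token count is a property of the session, and only the price multiplier changes with the model. The Sonnet price is the article's figure; the cheaper-model price below is a placeholder chosen for illustration, not a quoted rate, and the names are hypothetical.

```python
SESSION_TOKENS = 1_500_000  # the example session from the text

# Price per million input tokens, in USD.
# "sonnet" uses the article's figure; the second entry is a placeholder.
PRICES_USD_PER_MILLION = {
    "sonnet": 3.00,
    "cheaper-model (placeholder price)": 1.00,
}


def session_cost(tokens: int, price_per_million: float) -> float:
    """Cost of sending `tokens` input tokens at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million


def compare(tokens: int = SESSION_TOKENS) -> dict:
    """Same token volume, priced under each model."""
    return {
        name: session_cost(tokens, price)
        for name, price in PRICES_USD_PER_MILLION.items()
    }


if __name__ == "__main__":
    for name, cost in compare().items():
        print(f"{name}: {SESSION_TOKENS:,} tokens -> ${cost:.2f}")
```

The token column is identical on every row; lowering the price does nothing to the volume being re-sent each turn.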

Enforcing session length limits. Hard limits interrupt work at arbitrary points. If a developer is in the middle of debugging a production issue, cutting their session because it has run 30 turns is not a viable policy.

Monitoring dashboards. Visibility is useful but not sufficient. Telling a team what they spent last week does not change what they do this week. Usage patterns are driven by workflow incentives, not by dashboard awareness.

What Actually Works

Of the approaches discussed here, context distillation at the proxy layer is the only one that reduces costs without requiring a change to developer workflow. The proxy intercepts each Claude Code request, removes redundant content (duplicate file reads, verbose shell output, stale conversation history), and forwards a smaller payload to Anthropic. Real-world reduction is 30-60% depending on session patterns; the conservative benchmark floor is 20% on standard sessions.
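
To illustrate one of the redundancy types named above, here is a minimal sketch of collapsing duplicate file reads in a conversation history. It is not The Distillery's implementation, which this article does not describe; the message shape (`kind`, `path`, `content`) and the four-characters-per-token estimate are assumptions made for the example.

```python
from typing import TypedDict


class Message(TypedDict):
    kind: str     # hypothetical labels, e.g. "user", "assistant", "file_read"
    path: str     # file path for file_read messages, "" otherwise
    content: str


def dedupe_file_reads(history: list[Message]) -> list[Message]:
    """Replace earlier identical reads of a file with a short stub,
    keeping the most recent copy intact."""
    last_index: dict[tuple[str, str], int] = {}
    for i, msg in enumerate(history):
        if msg["kind"] == "file_read":
            last_index[(msg["path"], msg["content"])] = i

    result: list[Message] = []
    for i, msg in enumerate(history):
        key = (msg["path"], msg["content"])
        if msg["kind"] == "file_read" and last_index[key] != i:
            stub = f"[earlier read of {msg['path']} omitted; identical copy appears later]"
            result.append({"kind": "file_read", "path": msg["path"], "content": stub})
        else:
            result.append(msg)
    return result


def rough_tokens(history: list[Message]) -> int:
    """Crude size estimate (about 4 characters per token); not a real tokenizer."""
    return sum(len(m["content"]) for m in history) // 4
```

Only byte-identical reads are collapsed, so a file that changed between reads is kept in full both times. That conservative choice trades some savings for the guarantee that no distinct content is dropped.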

From the developer's perspective there is no difference. Claude responds normally. The session continues. Each developer runs two commands (plus the one-time ANTHROPIC_BASE_URL environment variable described under per-developer setup):

npm install -g thedistillery
thedistillery start

That is the entire workflow change. Everything else is automated.

Structuring Team Sessions to Minimize Context Bleed

Context bleed happens when a session grows beyond its original scope — a developer starts debugging one module, pulls in three others for reference, reads a config file, runs a migration, and ends up with 80,000 tokens of context where 20,000 would have served the task. The fix is session scoping as a team habit: each session should have a declared entry point and a defined exit condition. In practice, this means one Claude Code session per task or sub-task, not one session per workday. When a developer finishes a feature branch and moves to a bug fix, they start a new session rather than continuing in the same context window. Over twenty working days, this discipline alone can reduce the average session token count by 20-30%, before any proxy optimization runs on top of it.

The Team Math

At the $3,600/month baseline for five developers, the 20% conservative floor saves $720/month in direct API costs. Real-world reductions of 30-60%, depending on session patterns, save $1,080 to $2,160/month. The savings scale proportionally with session volume and team size.

Figure: team cost reduction from a $3,600/month baseline for 5 developers: a 30% reduction saves $1,080/month; a 60% reduction saves $2,160/month.
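
The same calculation can be run against a team's own invoice. The sketch below takes the baseline as an input rather than assuming one; the function names are illustrative, and the default rates are the article's floor and real-world range.

```python
def monthly_savings(baseline_usd: float, reduction: float) -> float:
    """Dollars saved per month for a fractional reduction (0.20 means 20%)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_usd * reduction


def savings_table(
    baseline_usd: float,
    reductions: tuple[float, ...] = (0.20, 0.30, 0.60),
) -> list[tuple[int, float, float]]:
    """Rows of (reduction %, monthly saving, remaining monthly bill)."""
    rows = []
    for r in reductions:
        saved = monthly_savings(baseline_usd, r)
        rows.append((round(r * 100), saved, baseline_usd - saved))
    return rows


if __name__ == "__main__":
    # Replace 1000.0 with last month's actual API spend.
    for pct, saved, remaining in savings_table(1000.0):
        print(f"{pct}% reduction: save ${saved:,.2f}, pay ${remaining:,.2f}")
```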

Agentic sessions, the kind that teams run more often than solo developers, tend to produce higher reductions. Long multi-step sessions with repeated tool calls have more redundant content to distil. Teams running complex long-horizon tasks often see the upper end of the 30-60% range.

The proxy handles the distillation automatically. There are no rules to configure, no session parameters to tune, no workflow adjustments required from developers. It intercepts, distils, forwards.

Per-Developer vs Project-Level Setup

Two configuration patterns work for teams: per-developer setup (each developer installs the proxy locally and sets the env var in their shell profile) and project-level setup (a .distilleryrc.json file in the repository applies settings to anyone working in that codebase). The choice depends on how the team prefers to manage developer environment configuration.

Per-developer setup is simpler. Each developer runs npm install -g thedistillery, adds export ANTHROPIC_BASE_URL=http://localhost:3080 to their shell profile, and starts the proxy. Total time per developer is roughly two minutes. The setup persists across machine restarts because the env var is in the shell profile.

Project-level setup uses a .distilleryrc.json file checked into the repository root. The file specifies preset and compression settings that apply automatically when developers run Claude Code inside the project directory. This is the better option for teams that want consistent settings across the team without relying on individual shell configurations. The trade-off is that developers still need the proxy installed locally; the project file controls behavior, not installation.

For a five-developer team, the combined approach is typical: each developer installs locally (two minutes per developer, ten minutes total), and the project root has a .distilleryrc.json for consistent compression behavior across all team members.

How Teams Verify the Reduction

A 30-60% cost reduction claim is meaningful only if the team can confirm it on their own workflow. Two checks give the actual delta. The first is the benchmark itself, which runs the optimization pipeline against a fixed corpus of multi-turn coding fixtures and reports the reduction percentage (20% conservative floor). This sets the floor.

The second is per-session logging. Each developer's proxy records raw and optimized token counts per session in the local SQLite database. After a week of normal usage, thedistillery stats reports the per-session reduction across actual workflow sessions. For most teams, the real-session reduction sits between the 20% benchmark floor and the 50% heavy-session ceiling, depending on what kinds of sessions dominate the workload.

The verification matters because it converts an abstract claim into a concrete number for the specific team. A team that runs primarily short conversational sessions sees closer to 20%. A team that runs long agentic refactors sees closer to 40-50%. Both are normal; the per-team variation is what determines the actual monthly saving.
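
Given raw and optimized token counts per session, the reduction is straightforward to compute by hand. The sketch below assumes the counts have already been exported as (raw, optimized) pairs; it does not read the proxy's SQLite database, whose schema is not described here, and the function names are illustrative.

```python
def reduction_pct(raw_tokens: int, optimized_tokens: int) -> float:
    """Percentage of input tokens removed for one session or one total."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return (raw_tokens - optimized_tokens) * 100 / raw_tokens


def summarize(sessions: list[tuple[int, int]]) -> dict:
    """Per-session reductions plus the token-weighted overall reduction."""
    total_raw = sum(raw for raw, _ in sessions)
    total_optimized = sum(opt for _, opt in sessions)
    return {
        "per_session_pct": [reduction_pct(raw, opt) for raw, opt in sessions],
        "overall_pct": reduction_pct(total_raw, total_optimized),
    }
```

The overall figure is weighted by token volume, so it tracks the bill rather than the session count: a few long agentic sessions with high reduction move it more than many short ones.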

Getting Started for a Team

Installation is per-developer, not per-project. Each developer installs The Distillery locally and sets one environment variable in their shell profile. A .distilleryrc.json config file in the repository can additionally apply shared settings for everyone working in that codebase, but it does not replace the local install.

For a five-person team, the total setup time is approximately ten minutes across the team.

The reduction begins immediately. Every subsequent Claude Code request routes through the proxy, which applies distillation before forwarding to Anthropic. No restart required. No model change. No impact on Claude Code functionality.

Frequently Asked Questions

Q: Does the proxy require a server to run somewhere central?

No. Each developer runs the proxy locally on their own machine, listening on localhost:3080. There is no shared infrastructure to provision. API keys are sent only to Anthropic, never to an intermediary service, because the proxy forwards directly to Anthropic from each developer's local environment.

Q: Will distillation affect what Claude generates for our developers?

The distillation removes redundant transmission, not semantic content. When every suite passes, the one-line result "44 suites passed" carries the same information as 200 lines of per-test output. Claude's understanding of the result is unchanged because the relevant signal is preserved, so output quality should be unaffected while the bill is lower.
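
As an illustration of the idea, not of the proxy's actual rules, the sketch below collapses verbose test output into a one-line summary while keeping failure lines verbatim. The `PASS `/`FAIL ` line prefixes are an assumed output format; real test runners differ.

```python
def summarize_test_output(output: str) -> str:
    """Collapse per-suite PASS lines into a count; keep FAIL lines as-is."""
    lines = output.splitlines()
    passed = [line for line in lines if line.startswith("PASS ")]
    failed = [line for line in lines if line.startswith("FAIL ")]
    summary = f"{len(passed)} suites passed, {len(failed)} failed"
    # Failure lines carry detail that a count cannot, so they are not dropped.
    return "\n".join([summary, *failed])


if __name__ == "__main__":
    verbose = "\n".join(f"PASS suite_{i}" for i in range(44))
    print(summarize_test_output(verbose))
```

Passing suites compress to a count because their individual lines add nothing; anything that failed is forwarded unchanged.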

Q: How do we measure the actual reduction across the whole team?

Each developer's proxy logs raw and optimized token counts per session locally. A team can aggregate these logs (for example, with a weekly script that pulls each developer's thedistillery stats output) to compute team-level totals. The benchmark figure is a floor; team-level real-session figures vary by workflow, typically 20-50%.

Q: What is the per-seat cost if we want a managed version?

The Distillery is open source and runs locally per developer at no per-seat cost. There is no managed plan that adds per-seat fees. Teams pay only for the Anthropic API tokens they consume, which is the cost being reduced. The total team bill drops by the reduction percentage.

Q: How does this compare to switching to a cheaper model for the team?

Switching to Haiku reduces per-token cost but does not change the accumulation mechanic. The same number of tokens is still transmitted, just at a lower rate. Distillation reduces the volume of tokens transmitted, which compounds with whatever model is in use. Teams can do both: distil context and run a cheaper model where quality permits. The savings stack.

Install The Distillery and reduce your team's API costs starting today. Or see the teams breakdown for the full cost model at different team sizes.
