MCP+
Precision Context Management for MCP Agents
A post-processing layer that wraps your MCP clients and returns only the needles, not the haystack, so you cut context bloat and cost without changing your agent.
About
The promise of Model Context Protocol (MCP) servers is seamless tool integration, but the reality is often indiscriminate context bloat. Every time an agent fetches a 1000-line HTML document just to find a single button's ID, you aren't just paying for those tokens once — the context bloat compounds with every subsequent turn in the conversation. The task agent is forced to find needles in the haystack while you foot the bill for the hay. This inflates costs, exhausts context windows, and distracts even the most capable LLMs.
To mitigate this context bloat, we built MCP+: a server-, agent-, and task-agnostic post-processing layer designed to sit as a protective wrapper around your MCP clients. Instead of forcing your primary task agent to sift through a mountain of hay, MCP+ intercepts the tool outputs and hands over only the needles. By delivering a highly condensed context window, it achieves comparable reasoning performance while slashing associated inference costs by up to 75%.
The power of the expected_info argument
MCP+ acts as a pseudo-MCP server: a layer that requires zero changes to your existing agent's logic. To your task agent, it looks like the same MCP server with the same tools, but with one powerful new capability: the expected_info argument. This allows the agent to call a tool and define the specific needle it needs before the haystack even reaches its context.
It's the difference between calling a tool for a 2000-line nested JSON data dump and asking for the specific data values for user IDs X, Y, and Z — one is a context-clogging haystack; the other is the exact, token-efficient answer. By offloading the haystack sifting to a more economical model (e.g. GPT-5-mini or Gemini-2.5-Flash), you stop burning premium tokens on noise and start investing them in reasoning.
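To make the contrast concrete, here is a minimal sketch of the same tool call with and without the new argument. The tool name and payload shape are hypothetical; the only real addition MCP+ makes is the expected_info field:

```python
# Standard MCP call: the agent receives the full payload back.
standard_call = {
    "tool": "get_user_records",  # hypothetical tool name for illustration
    "arguments": {"user_ids": ["X", "Y", "Z"]},
}

# MCP+ call: identical, plus a declaration of the needle the agent needs.
# MCP+ filters the tool output down to whatever serves this request.
mcp_plus_call = {
    "tool": "get_user_records",
    "arguments": {
        "user_ids": ["X", "Y", "Z"],
        "expected_info": "Account balances for user IDs X, Y, and Z",
    },
}
```

Because the tool signature is otherwise unchanged, the agent can keep using its existing function-calling logic untouched.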
Example: Yahoo Finance MCP Server
Here's an example output from the Yahoo Finance MCP server, where the agent has called the tool get_historical_stock_prices with the arguments `ticker='MSFT'`, `start_date='2023-01-08'`, `end_date='2025-01-09'`:
Running the same task with MCP+ adds expected_info: e.g. "Closing prices for MSFT on 2023-01-09 and 2025-01-08 to calculate investment returns." With that, MCP+ returns only the relevant slice to the task agent:
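Once MCP+ has returned just those two closing prices, the remaining work for the task agent is a one-line computation. A minimal sketch with placeholder prices (not real MSFT closes):

```python
def total_return(start_close: float, end_close: float) -> float:
    """Simple holding-period return from two closing prices."""
    return (end_close - start_close) / start_close

# Hypothetical prices for illustration only (not real MSFT data):
print(round(total_return(225.0, 425.0), 4))  # 0.8889, i.e. an ~88.9% gain
```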
The token count drops from roughly 7,000 to 200 (a >95% reduction).
Intelligent activation and cost control
MCP+ uses an economical, low-latency model (e.g. GPT-5-mini) as a post-processing agent that performs both direct and code-based extraction to filter out the fluff while preserving the integrity of the original data. To keep the process efficient, we've included a customizable activation threshold: MCP+ only engages when the tool call output is large enough to threaten your context window or your budget. For smaller payloads, the data passes through untouched.
We also add guardrails that enforce a strict token ceiling, so the extracted output never exceeds the size of the original source. The result is a system that maximizes information density and delivers a consistent reduction in overhead without compromising performance.
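The gating logic described above can be sketched in a few lines. All names, the token heuristic, and the threshold value here are illustrative assumptions, not the actual MCP+ API; `extract` stands in for the call to the economical post-processing model:

```python
TOKEN_THRESHOLD = 2000  # hypothetical activation threshold (configurable)

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text and JSON.
    return len(text) // 4

def post_process(raw_output: str, expected_info: str, extract) -> str:
    """Sketch of MCP+'s activation and guardrail logic (illustrative only)."""
    # Small payloads pass through untouched: no extra model call, no latency.
    if estimate_tokens(raw_output) < TOKEN_THRESHOLD:
        return raw_output

    condensed = extract(raw_output, expected_info)

    # Guardrail: the extracted output must never exceed the original source.
    if estimate_tokens(condensed) >= estimate_tokens(raw_output):
        return raw_output
    return condensed
```

The pass-through path for small payloads is what keeps the wrapper cheap: the extra model call is only paid when there is real bloat to remove.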
Results
Measuring the impact of MCP+ on performance and inference costs
We ran a systematic evaluation across task domains, output formats, and agent LLMs. Using the MCP-Universe Benchmark, we measured the performance of Claude 4.0 Sonnet, GPT-5, and Gemini 3 Pro Preview as function-calling agents across:
- Browser Navigation: Playwright (raw HTML DOM)
- Financial Management: Yahoo Finance (structured JSON)
- Web Search: Google Search (Serper API payloads)
In each case we compared standard tool-use with an MCP+ configuration (powered by GPT-5-mini) to measure the delta in performance and inference cost. We found that MCP+ leads to consistent cost savings while maintaining comparable performance when averaged across multiple runs.
Demo
MCP+: Arbitrage for developers to preserve premium LLM context
For developers using Cursor or Claude Code, context bloat is a direct drain on your innovation budget. When an MCP tool dumps 10,000 tokens of irrelevant metadata into your chat, it's stealing the tokens your model needs to refactor that function or debug that edge case.
MCP+ lets you do an economic arbitrage: by offloading the haystack sifting to a cheaper model, you preserve your premium Claude Opus 4.5 context for what actually matters — writing code. While this orchestration step introduces a small latency overhead, the tradeoff is a leaner, more focused, and significantly more cost-effective inference cycle.
Here's a demo showcasing MCP+ in action in Cursor with Playwright:
Here's a demo showcasing MCP+ in action in Claude Code with Context7:
Here's a demo showcasing MCP+ in action in Agentforce Vibes with DX MCP:
Installation
Prerequisites: Python 3.10+ and an OpenAI API key (or credentials for another economical LLM provider of your choice).
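Registration follows the usual `mcpServers` convention that Cursor and Claude Code read from their MCP config files. The command, flags, and package name below are hypothetical — consult the project's actual install instructions for the real invocation:

```json
{
  "mcpServers": {
    "yahoo-finance-plus": {
      "command": "mcp-plus",
      "args": ["--wrap", "yahoo-finance", "--model", "gpt-5-mini"]
    }
  }
}
```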
Restart Cursor or Claude Code. Your servers now have `-plus` versions with intelligent filtering.