Context Engineering
Every LLM interaction builds up the same way: system prompt, user message, assistant response, tool calls, more user input, more responses. All of it accumulates into context, which grows with every turn.
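A minimal sketch of that accumulation (`call_llm` is a hypothetical stand-in for whatever chat-completion client you use, not a real API):

```python
def call_llm(messages: list[dict]) -> str:
    """Stand-in for a real chat-completion client; returns the assistant reply."""
    return f"(model reply to {len(messages)} messages)"

# Context is just a growing list of messages; every turn appends more.
context = [{"role": "system", "content": "You are a helpful agent."}]

def run_turn(user_input: str) -> str:
    context.append({"role": "user", "content": user_input})
    reply = call_llm(context)
    context.append({"role": "assistant", "content": reply})
    # Tool calls and tool results get appended here too; nothing ever shrinks.
    return reply
```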
This creates two problems. First, context size naturally limits how many turns you can have. Second, more context degrades response quality. Prompts, agents, MCP, tools: they're all just different ways to add text to context. And badly designed MCP servers, like GitHub's or Atlassian's, happily poison the context with irrelevant data: too many tools, too many parameters, massive JSON responses.
The real trick is keeping the context healthy over long runs: not too big, not too small, and filled with relevant data.
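One concrete way to keep the data relevant is to project tool responses down before they ever enter the context. A minimal sketch, assuming a GitHub-style issue payload (the field names are illustrative, not the exact API schema):

```python
def trim_issue(raw: dict) -> dict:
    # Project a massive API payload down to what the task actually needs;
    # drop avatars, URLs, reaction counts, and other noise.
    return {
        "title": raw.get("title"),
        "state": raw.get("state"),
        "labels": [label["name"] for label in raw.get("labels", [])],
        "body": (raw.get("body") or "")[:2000],  # cap long free text
    }
```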
The most popular approach now is using an LLM to "compress" the context when it gets close to limits, and the results are often catastrophic: valuable information gets lost. Making things worse, context building with LLMs is non-deterministic due to floating-point rounding errors (see the Thinking Machines paper), which is hilarious in itself: the same input produces different outputs, so we can't simply rewind history.
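For reference, that compression approach usually looks something like the sketch below (reusing the hypothetical `call_llm` stub from above; the budget and token counting are illustrative). The summary step is exactly where valuable details vanish:

```python
MAX_TOKENS = 100_000  # illustrative budget, not any model's real limit

def count_tokens(messages: list[dict]) -> int:
    # Crude proxy; a real implementation would use the model's tokenizer.
    return sum(len(m["content"]) // 4 for m in messages)

def maybe_compact(messages: list[dict]) -> list[dict]:
    if count_tokens(messages) < MAX_TOKENS:
        return messages
    head, middle, tail = messages[:1], messages[1:-10], messages[-10:]
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in middle)
    summary = call_llm([{
        "role": "user",
        "content": "Summarize this conversation, keeping key facts:\n" + transcript,
    }])
    # Lossy and non-deterministic: rerunning yields a different summary,
    # and whatever the model drops here is gone for good.
    return head + [{"role": "assistant", "content": summary}] + tail
```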
So we need better approaches, and they are emerging:
- Increase LLM determinism (same input -> same output)
- Anthropic's skills concept, emphasizing progressive disclosure (sketched after this list)
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context
- Agent Learning via Early Experience
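To make the progressive-disclosure item concrete: instead of loading every instruction file up front, expose a cheap index and pull full content only on demand. A minimal sketch, assuming a `skills/<name>/SKILL.md` layout with a one-line summary on the first line (loosely modeled on Anthropic's skills; the layout details are assumptions):

```python
from pathlib import Path

SKILL_DIR = Path("skills")  # assumed layout: skills/<name>/SKILL.md

def skill_index() -> str:
    # Cheap: only names and one-line summaries go into the system prompt.
    lines = []
    for f in sorted(SKILL_DIR.glob("*/SKILL.md")):
        summary = f.read_text().splitlines()[0]
        lines.append(f"- {f.parent.name}: {summary}")
    return "\n".join(lines)

def load_skill(name: str) -> str:
    # Expensive: the full instructions enter context only when the model
    # actually decides to use this skill.
    return (SKILL_DIR / name / "SKILL.md").read_text()
```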
And the fundamental question remains: how far can we get with the best possible context? And a closely related one: what is the best context?