Intelligence is cheap. Context is expensive.
Published on August 24, 2025
Introduction
After thousands of hours working with frontier LLMs, I've reached a counterintuitive conclusion: the models are already smart enough for most tasks. What varies wildly is how well we can give them what they actually need to succeed. The popular narrative focuses on model capabilities: reasoning, factual knowledge, code generation. But I think we're missing the real bottleneck. Performance differences between users aren't primarily about prompt engineering tricks or having access to the latest model. They're about something much harder: extracting and encoding the implicit context that determines whether output is actually useful.
The context gap
Here's a simple test. Ask Claude or ChatGPT to write a product requirements document for a feature at your company. Then ask a colleague to do the same task. The model will give you something that looks professional – proper sections, reasonable user stories, clean formatting. Your colleague will give you something that actually reflects how decisions get made at your company. They know that the CEO hates anything that increases support tickets. They remember that the last time someone proposed a similar feature, it got killed because of data privacy concerns. They understand that "quick win" in your organization means "ships in the current quarter without additional headcount."

This isn't because your colleague is smarter than Claude. It's because they have access to context that's nearly impossible to encode in a prompt: the unwritten rules, competing priorities, and organizational dynamics that determine what actually gets built.
A simple framework
Performance varies because of three factors:

Performance = Model × Task Clarity × Context

Model capability keeps rising, but it's already well beyond what's needed for many tasks. The variance in outcomes is increasingly dominated by how well you can specify what you want (Task Clarity) and surface the relevant information the model needs to succeed (Context). This is why the same person can get dramatically different results from the same model on seemingly similar tasks. It's not the model that's inconsistent; it's the quality of specification and context extraction.
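To make the multiplicative intuition concrete, here is a toy sketch. The scores are invented for illustration, not measured; the only point is that a weak factor drags down the whole product no matter how strong the model is.

```python
def expected_usefulness(model: float, task_clarity: float, context: float) -> float:
    """Toy multiplicative model: Performance = Model x Task Clarity x Context.

    Each factor is an illustrative score in [0, 1]; none of these numbers are real.
    """
    return model * task_clarity * context

# A frontier model with a vague ask and thin context...
print(expected_usefulness(model=0.95, task_clarity=0.4, context=0.2))  # ~0.08
# ...loses badly to the same model with a sharp spec and rich context.
print(expected_usefulness(model=0.95, task_clarity=0.9, context=0.9))  # ~0.77
```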
The five types of hidden context
When LLM output feels generic or misses the mark, it's usually because you haven't surfaced one of these categories:
1. Tacit knowledge
The things insiders "just know" but never write down. In my industry, everyone understands that customers under $10K ARR churn at completely different rates than enterprise clients, but this isn't documented anywhere. If I ask an LLM to analyze our customer retention without specifying this, the analysis will be fundamentally wrong.
2. Higher-order goals
The real objective that wins trade-offs when push comes to shove. You might ask for a marketing plan to "increase brand awareness," but if your actual constraint is that every initiative needs measurable ROI within 90 days, the model needs to know that. Otherwise you'll get a beautiful strategy that's completely unimplementable.
3. Local constraints
The physics of your specific environment. Budget limits, data availability, team capacity, regulatory requirements, technical debt. These aren't industry best practices; they're the specific limitations and resources you're working within right now.
4. Taste and format preferences
How output needs to look and sound to be accepted by your audience. Some executives want one-page memos with clear recommendations. Others prefer detailed analysis with caveats. Some organizations use specific terminology or avoid certain buzzwords. The substance might be identical, but the wrong presentation kills adoption.
5. Organizational theory of mind
Your live model of key stakeholders: their goals, constraints, communication styles, and current priorities. This is often the most important and hardest to articulate. The head of sales cares about anything that might disrupt Q4 numbers. The engineering manager is under pressure to reduce technical debt. The CEO has been burned by overly ambitious timelines.
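One way to make these categories operational is to treat them as a briefing you fill in before prompting. Here is a minimal sketch with hypothetical names of my own; this is not a standard structure, just one way to keep the five buckets in front of you.

```python
from dataclasses import dataclass, field

@dataclass
class ContextBriefing:
    """A pre-prompt checklist covering the five categories of hidden context."""
    tacit_knowledge: list[str] = field(default_factory=list)     # things insiders just know
    higher_order_goals: list[str] = field(default_factory=list)  # what wins trade-offs
    local_constraints: list[str] = field(default_factory=list)   # budget, capacity, regulation
    taste_and_format: list[str] = field(default_factory=list)    # how output must look and sound
    stakeholder_model: list[str] = field(default_factory=list)   # who cares about what, and why

    def to_prompt_block(self) -> str:
        """Render the briefing as a plain-text block to prepend to any prompt."""
        sections = {
            "Tacit knowledge": self.tacit_knowledge,
            "Higher-order goals": self.higher_order_goals,
            "Local constraints": self.local_constraints,
            "Taste and format": self.taste_and_format,
            "Stakeholders": self.stakeholder_model,
        }
        lines = ["Context:"]
        for title, items in sections.items():
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```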
A concrete example
Say you're tasked with evaluating whether to build a new integration with a popular tool. Most people prompt something like:
"Help me analyze whether our software company should build an integration with [Tool X]. Include pros and cons, resource requirements, and a recommendation."
That gets you a generic analysis that covers obvious points but misses what actually matters. Compare that to including just some of the relevant context:
"Help me analyze whether we should build an integration with [Tool X]. Context: We're a B2B fintech SaaS company in the with 50 enterprise customers averaging $25K ARR. Our engineering team of 8 is already committed through Q2. Our biggest competitor launched this integration 6 months ago, and we've lost 2 deals specifically citing its absence. Our CEO's top priority is reaching $2M ARR by year-end, but our Head of Engineering is concerned about technical debt from rushed integrations we built last year. The sales team wants this yesterday, but our product manager thinks we should focus on core platform stability. Need a recommendation that acknowledges these competing pressures."
The second prompt produces analysis that's actually useful because it optimizes for your real constraints and stakeholder concerns, not generic best practices.
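The same principle applies when you call a model programmatically instead of through a chat interface: the context travels with the request. Here is a minimal sketch using the OpenAI Python SDK (v1+); the model name, the system prompt, and the condensed context block are placeholders adapted from the example above, not a prescribed setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

context = """\
B2B fintech SaaS, 50 enterprise customers averaging $25K ARR.
Engineering team of 8 committed through Q2; competitor shipped this integration 6 months ago; 2 deals lost citing its absence.
CEO priority: $2M ARR by year-end. Head of Engineering worried about technical debt from last year's rushed integrations.
Sales wants it now; product wants to focus on core platform stability.
"""

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any capable chat model works, the point is the context
    messages=[
        {
            "role": "system",
            "content": "You are a pragmatic product strategist. Optimize for the constraints in the context block, not generic best practices.",
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\nShould we build an integration with [Tool X]? Give a recommendation that acknowledges these competing pressures.",
        },
    ],
)
print(response.choices[0].message.content)
```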
Agents as context extractors
This brings us to an important trend: agents are increasingly doing the context extraction work for us. Systems like Deep Research, Claude Code, or ChatGPT with a Google Drive or SharePoint connector enabled represent a shift in how we solve the context problem.

These systems work by automating context assembly. They iteratively search and plan, alternating between "think → look → compress → think", improving the context with each pass. They delegate to sub-agents that parallelize discovery: one hunts primary sources, another extracts metrics, a third builds a timeline. Each returns compressed, ranked snippets to an orchestrator that synthesizes everything into a final response.

The magic isn't just the final answer – it's the system's ability to go get what the answer depends on. When you point these agents at your Google Drive or SharePoint, performance jumps because the model no longer depends on you to manually ferry documents and tribal knowledge into the prompt. The system spends most of its tokens fetching and compressing the right internal context.
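To make the loop concrete, here is a deliberately simplified sketch. Every function name is a hypothetical stand-in for what these systems do internally, and the stub bodies just echo placeholder strings; the part that matters is the structure of the orchestration loop.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_queries(task: str, notes: list[str]) -> list[str]:
    # "Think": decide what evidence is still missing. Stubbed to a single pass.
    if notes:
        return []
    return [f"primary sources: {task}", f"key metrics: {task}", f"timeline: {task}"]


def search_and_extract(query: str) -> str:
    # "Look": a sub-agent hunts sources and returns a compressed, ranked snippet.
    return f"[compressed snippet for '{query}']"


def compress(snippets: list[str]) -> list[str]:
    # "Compress": keep only what would change the answer; drop the rest.
    return snippets


def synthesize(task: str, notes: list[str]) -> str:
    # The orchestrator writes the final answer from the accumulated context.
    return f"Recommendation for '{task}', grounded in {len(notes)} snippets."


def run_agent(task: str, max_passes: int = 3) -> str:
    notes: list[str] = []
    for _ in range(max_passes):
        queries = plan_queries(task, notes)                         # think
        if not queries:
            break
        with ThreadPoolExecutor() as pool:                          # parallel sub-agents
            snippets = list(pool.map(search_and_extract, queries))  # look
        notes.extend(compress(snippets))                            # compress
    return synthesize(task, notes)


print(run_agent("Should we build the [Tool X] integration?"))
```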
The partial solution
This automation solves a significant portion of the context problem. Agents excel at gathering what's already documented. They can learn your organization's writing style by reading past proposals, understand format preferences from old presentations, and even infer some tacit knowledge by analyzing patterns across documents, emails, and Slack conversations.

But this is only a partial solution. Much of the most important context – especially organizational dynamics, unstated preferences, and recent informal decisions – isn't expressed in text anywhere. The head of product's growing frustration with technical debt, the unspoken understanding that certain customer segments aren't worth pursuing, the CEO's evolving views on market timing. These insights live in conversations, body language, and institutional memory that never gets written down.

This creates an interesting dynamic: while agents democratize access to documented knowledge, they also increase the premium on context that can't be easily extracted. Skilled users who can surface this undocumented context will continue to extract significantly better performance from frontier models.
The capability paradox
As model capabilities expand, the frontier tasks become more context-hungry, not less. Better models don't just make existing tasks easier; they raise our expectations and make us attempt more complex, nuanced work that requires even richer context. When GPT-3 could barely write coherent paragraphs, we were impressed by any reasonable output. Now that GPT-5 can write sophisticated analysis, or complete a multi-hour task like building an entire website, we expect it to understand our specific industry, company dynamics, and strategic context. The ceiling rises, but so does the price of admission: clean specification plus the right evidence.

This means that as models get more capable, the variance between average users and sophisticated users may actually increase. The most skilled users will push these systems to tackle increasingly complex tasks that require deeper context extraction, while the average user trying the same task without the hard work of curating the right context will see underwhelming results.
Why this matters
We're entering a world where raw intelligence is increasingly commoditized, but the ability to effectively specify tasks and extract relevant context becomes the key differentiator. The people getting the most value from AI aren't necessarily the most technically sophisticated users, they're the ones who best understand their domain and can most clearly articulate the implicit context that shapes whether any solution will actually work. They've built a mental model of the LLM that helps them to see where it succeeds and fails, and how to provide the right harness to extract the best performance. If you feel like LLMs are underwhelming for complex work tasks, the solution probably isn't waiting for smarter models. It's developing the ability to recognize and extract the implicit context that usually stays in your head. Try choosing a difficult task that you'd normally expect a colleague to spend a couple of hours on. Something entirely text-based. Spent some time asking yourself: - What do I know about this domain that the model doesn't? - What constraints am I operating under that aren't obvious? - Who are the stakeholders, and what do they actually care about? - What would make output useful versus just correct? - What are the unwritten rules that determine success in this context? The models are already smart enough. The question is whether you can give them what they need to be useful.