Beyond RAG: Agentic Context Construction
Retrieval should feel less like a single query and more like an exploratory loop.
For years, retrieval systems have been obsessed with finding the perfect chunk.
The assumption was simple:
Question
↓
Vector search
↓
Top-k chunks
↓
Answer
If the answer wasn’t good enough, we built better embeddings.
Then coding agents arrived.
Cursor, Windsurf, Claude Code, Codex, Copilot, Google Antigravity.
And they accidentally demonstrated something surprising:
You don’t need perfect retrieval.
You need a way to build context.
A Simple Example
Suppose I ask:
What is the maximum transaction amount in this dataset?
Traditional semantic retrieval might embed the question and look for passages whose vectors sit close to it, which in practice means passages related to the idea of maximum.
But language is messy. The document might say:
- highest transaction
- top transaction
- largest purchase
- peak value
Embedding models are useful because they can connect these concepts. A vector search might correctly retrieve all of them.
But notice what happened.
The model wasn’t actually looking only for maximum. It was looking for a set of related concepts:
max
highest
top
largest
peak
In other words, for many retrieval tasks, semantic search behaves a lot like query expansion.
Important caveat: semantic search and query expansion are not the same mechanism. Semantic search relies on a learned latent representation in which related terms sit close together; query expansion generates those terms explicitly. But from an agent’s point of view, the effect is often similar:
semantic("max")
grep("(max|highest|top|largest|peak)")
Both are mechanisms for increasing recall.
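The explicit side of that comparison is easy to sketch. The snippet below is a minimal illustration, not a recommended implementation: the SYNONYMS table, expand_to_pattern, and grep are hypothetical names, and a real agent would more likely generate the variants with a language model or a thesaurus than hard-code them.

```python
import re

# Hypothetical synonym table; an agent would normally generate these variants.
SYNONYMS = {"max": ["max", "maximum", "highest", "top", "largest", "peak"]}

def expand_to_pattern(term: str) -> re.Pattern:
    """Build one case-insensitive regex matching the term or any listed variant."""
    variants = SYNONYMS.get(term, [term])
    alternation = "|".join(re.escape(v) for v in variants)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

def grep(pattern: re.Pattern, lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for every line the pattern matches."""
    return [(i, line) for i, line in enumerate(lines, start=1) if pattern.search(line)]

document = [
    "Quarterly summary of card activity.",
    "The highest transaction was recorded in March.",
    "Average basket size stayed flat.",
    "Peak value for a single purchase: 9,200 USD.",
]

hits = grep(expand_to_pattern("max"), document)
for line_number, line in hits:
    print(line_number, line)
```

Neither matching line contains the word max, yet both are found, which is the recall effect the comparison is pointing at.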
That raises the historical question: if semantic search is one powerful way to improve recall, why did embeddings become the whole story?
The Embedding Era
For a while, embeddings felt like the center of the universe.
In early 2023, text-embedding-ada-002 became the default mental model. Vector databases were everywhere, and the standard RAG pipeline hardened into a familiar sequence:
chunk
embed
index
retrieve
rerank
evaluate recall and precision
That made sense. If retrieval is a one-shot operation, better embeddings and faster vector search are the obvious levers. So the money and the attention followed: vector databases became a category, a pitch deck, and a default box in every RAG diagram.
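To make the one-shot shape concrete, here is a toy version of that pipeline. It is a sketch under heavy assumptions: the embed function is a bag-of-words counter standing in for a real embedding model, the index is a plain list rather than a vector database, and the rerank and evaluation steps are omitted. All function names are illustrative.

```python
import math
import re
from collections import Counter

def chunk(text: str) -> list[str]:
    """Split text into sentence-sized chunks (a deliberately naive chunker)."""
    return [s.strip() for s in text.split(".") if s.strip()]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """One shot: rank every chunk against the query and return the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

corpus = (
    "Revenue grew in the third quarter. "
    "The maximum transaction amount was 9200 USD. "
    "Refund volume stayed flat."
)
index = [(c, embed(c)) for c in chunk(corpus)]
top = retrieve("maximum transaction amount", index, k=1)
print(top)
```

Everything rides on that single retrieve call. If the right chunk is not in the top k, nothing downstream can recover it.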
Then the frontier got quieter.
OpenAI released text-embedding-3-small and text-embedding-3-large on January 25, 2024: cheaper, stronger, and with native support for shortening embeddings through a dimensions parameter. As of June 2026, OpenAI’s embeddings guide still centers on that generation. Anthropic never shipped its own embedding model; its Claude embeddings guide points developers to Voyage AI instead.
This is the quiet part. Not a crash. A fade. Vector databases followed a similar arc: pgvector brought similarity search into Postgres, incumbents bolted on vector indexes, and the standalone category started to feel more like plumbing than frontier.
Embedding work did not stop. Google announced Gemini Embedding 2 on March 10, 2026, with multimodal embeddings across text, images, video, audio, and documents. The open model world is active too: the MTEB leaderboard keeps moving, with recent state-of-the-art claims from models like Microsoft’s Harrier OSS and Qwen’s Qwen3 Embedding.
So the claim is not that embeddings are dead. It is narrower, and a little stranger: embeddings got good, got commoditized, and got quiet.
The visible energy in assistant systems shifted toward long context, tool use, file search, code execution, connectors, and agents. Less effort on making the first nearest-neighbor query perfect. More on giving the model ways to gather, inspect, and act on context.
Maybe embeddings became good enough for many use cases. Maybe the bottleneck moved elsewhere. Or maybe coding agents exposed a more general lesson: retrieval quality is not only about the first result. It is also about whether the system can keep looking.
What Coding Agents Do Instead
A coding agent rarely asks:
What are the top 10 chunks relevant to my problem?
For the transaction question, it does something more like:
search "max"
search "highest|largest|top"
open matching file or page
read nearby
follow table or reference
verify the value
search again if uncertain
The agent actively explores. Context emerges from that exploration process, not from a single retrieval result.
This is fundamentally different.
Retrieval Is a Process
Traditional RAG treats retrieval as an operation:
retrieve(query)
Coding agents treat retrieval as a loop:
while not confident:
    search()
    inspect()
    expand()
The difference sounds subtle.
It isn’t.
The first assumes the answer already exists inside the retrieved context.
The second assumes the context itself must be constructed.
That is why traditional RAG systems become obsessed with recall and precision. If retrieval is a one-shot operation, missing the right chunk is fatal. But retrieving too much irrelevant context is also harmful, because it wastes the context window and can distract the model.
The search-inspect-expand loop relaxes that pressure. The first search does not need to be perfect if the agent can recover: read nearby, follow references, inspect structure, and search again.
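A minimal sketch of that recovery behavior follows. The helper names (search, inspect, find_amount, construct_context) are hypothetical, and the confidence check is a deliberately crude assumption: the loop counts itself done once the inspected window contains a number followed by USD.

```python
import re
from typing import Optional

def search(lines: list[str], pattern: str) -> list[int]:
    """Return indexes of lines matching the regex, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, line in enumerate(lines) if rx.search(line)]

def inspect(lines: list[str], hit: int, radius: int = 1) -> list[str]:
    """Read the hit together with its neighboring lines."""
    return lines[max(0, hit - radius): hit + radius + 1]

def find_amount(window: list[str]) -> Optional[float]:
    """Assumed confidence check: a number followed by USD inside the window."""
    for line in window:
        m = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*USD", line)
        if m:
            return float(m.group(1).replace(",", ""))
    return None

def construct_context(lines: list[str], patterns: list[str]):
    """Search, inspect, and expand to the next pattern until a value is verified."""
    context: list[str] = []
    for pattern in patterns:                 # expand: try the next query variant
        for hit in search(lines, pattern):   # search
            window = inspect(lines, hit)     # inspect: read around the hit
            context.extend(l for l in window if l not in context)
            amount = find_amount(window)
            if amount is not None:
                return amount, context
    return None, context

lines = [
    "Section 4: Card activity",
    "The largest purchase appears in Table 7.",
    "Table 7: 9,200 USD on 2024-03-14.",
    "Refund volume stayed flat.",
]
amount, context = construct_context(lines, [r"\bmax(imum)?\b", r"highest|largest|top"])
print(amount, context)
```

In this example the first pattern finds nothing, and the line that matches the second pattern does not contain the value either. The answer comes from reading the neighboring line, which a one-shot top-1 lookup would not have done.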
But this only works if the system exposes those navigational primitives.
Codebases already have them: files, folders, imports, symbols, line numbers, and grep. Most document systems throw much of that structure away when they reduce a PDF into independent chunks.
Agentic Context Construction
This suggests a different abstraction.
Instead of:
Question
→ Retrieve context
→ Answer
we should think:
Question
→ Construct context
→ Answer
Imagine the same question inside a financial report:
What is the maximum transaction amount in this dataset?
A chunk-based system tries to retrieve the passage most likely to contain the answer. An agentic system builds a path: search for transaction language, open the matching section, inspect the surrounding table, check the unit and currency, follow any appendix reference, and only then answer.
The context window is not a bag of top-k chunks. It is the smallest working set the agent could assemble to solve the problem.
That is exactly how human engineers investigate a codebase. It is also how people read dense documents: search for a concept, open the section, read around it, jump to the table, follow the citation, return.
The process is navigational.
Not retrieval-oriented.
What Document Systems Should Expose
The lesson from coding agents is not that vector search is useless.
Nor is it that a handful of grep patterns can replace semantic retrieval. Semantic search is often better when the retrieval task depends on meaning, nuance, or language that does not share obvious keywords.
The point is that semantic search is one primitive in the process, not the whole retrieval system.
For documents, the practical takeaway is simple: preserve structure and expose navigation. Keep page numbers, section hierarchy, table boundaries, captions, citations, backlinks, neighboring chunks, and stable document IDs. Give the agent tools like BM25, semantic search, exact search, read_nearby, open_page, and follow_reference.
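One way this could look in practice is sketched below. The Chunk fields and the DocumentStore class are assumptions made for illustration, not an existing library; the method names mirror the tools listed above, and BM25 and semantic search are left out to keep the example short.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str          # stable ID, so references survive re-indexing
    doc_id: str
    page: int
    section: str           # position in the section hierarchy
    text: str
    references: list[str] = field(default_factory=list)  # chunk_ids this chunk points to

class DocumentStore:
    """Hypothetical store that keeps reading order and cross-references."""

    def __init__(self, chunks: list[Chunk]):
        self.chunks = chunks  # kept in reading order
        self.by_id = {c.chunk_id: i for i, c in enumerate(chunks)}

    def exact_search(self, term: str) -> list[Chunk]:
        """Case-insensitive substring search over chunk text."""
        return [c for c in self.chunks if term.lower() in c.text.lower()]

    def read_nearby(self, chunk_id: str, radius: int = 1) -> list[Chunk]:
        """Return the chunk plus its neighbors in reading order."""
        i = self.by_id[chunk_id]
        return self.chunks[max(0, i - radius): i + radius + 1]

    def open_page(self, doc_id: str, page: int) -> list[Chunk]:
        """Return every chunk on one page of one document."""
        return [c for c in self.chunks if c.doc_id == doc_id and c.page == page]

    def follow_reference(self, chunk_id: str) -> list[Chunk]:
        """Resolve the chunks that this chunk cites."""
        source = self.chunks[self.by_id[chunk_id]]
        return [self.chunks[self.by_id[r]] for r in source.references if r in self.by_id]

store = DocumentStore([
    Chunk("c1", "report", 12, "4 Card activity", "The largest purchase is listed in Appendix B.", ["c3"]),
    Chunk("c2", "report", 12, "4 Card activity", "Refund volume stayed flat."),
    Chunk("c3", "report", 40, "Appendix B", "Largest purchase: 9,200 USD."),
])

hit = store.exact_search("largest purchase")[0]
target = store.follow_reference(hit.chunk_id)
print(hit.chunk_id, [c.chunk_id for c in target])
```

The design choice worth noting is that none of these tools is clever. They work only because the page, section, ordering, and reference metadata were kept at ingestion time instead of being discarded during chunking.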
At that point, retrieval stops being the product.
Navigation becomes the product.
Once an agent can iterate, the exact retrieval primitive becomes less important than the ability to keep exploring. The agent’s job becomes:
Construct the smallest context necessary to solve the problem.
That is what coding agents accidentally taught us. They were not trying to fix RAG. They were just moving through a workspace, asking what to inspect next.
Maybe that is the real shift beyond RAG:
Not retrieval. Agentic Context Construction.