The Sentence
“This is just explicit RAG that the developer controls.”
That’s it. That’s the observation. A cold Claude instance — no prior context, reading the project for the first time — said this in a single sentence and reframed everything. Let’s take it apart.
What RAG Actually Is
RAG stands for Retrieval-Augmented Generation. It’s the dominant technique for giving a language model access to information that isn’t in its training data — your company’s internal docs, a codebase it’s never seen, a database of support tickets, whatever.
The basic idea: when a user asks a question, a retrieval system finds the most relevant documents from a knowledge base and stuffs them into the prompt alongside the question. The model generates its answer using both its training knowledge and the retrieved documents as context.
A typical RAG pipeline looks like this:
User question
→ Embed question as a vector
→ Search vector database for nearest neighbours
→ Retrieve top-K matching documents
→ Inject documents into prompt
→ Model generates answer
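To make the retrieval step concrete, here is a minimal sketch of the nearest-neighbour search at the heart of that pipeline: rank documents by cosine similarity to the query embedding and keep the top K. The documents and vectors are invented for illustration; a real system would get embeddings from a learned model and search a vector database rather than a plain array.

```typescript
// Minimal sketch of RAG retrieval: rank documents by cosine
// similarity to the query embedding and keep the top K.
// Embeddings here are tiny invented vectors, not real model output.
interface Doc {
  id: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieveTopK(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}

// Toy corpus: geometrically "close" vectors stand in for
// semantically similar text. File names are hypothetical.
const corpus: Doc[] = [
  { id: "cart.store.ts", embedding: [0.9, 0.1, 0.0] },
  { id: "error-philosophy.md", embedding: [0.8, 0.3, 0.1] },
  { id: "problem-details.ts", embedding: [0.1, 0.2, 0.9] }, // relevant, but not "close"
];

const hits = retrieveTopK([1, 0, 0], corpus, 2);
console.log(hits.map(d => d.id)); // the load-bearing type file loses to a closer false positive
```

Note what the toy example already shows: the relevant-but-dissimilar type file is outranked by a document that is merely geometrically closer, which is exactly the failure mode discussed below.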
This is powerful because it works around the model’s knowledge cutoff and can ground responses in authoritative sources. It’s the reason tools like GitHub Copilot, Cursor, and enterprise AI assistants can reason about your specific codebase or company knowledge rather than generic programming concepts.
RAG is, at its core, a context assembly problem. The model can only see what’s in the prompt. RAG is the system that decides what goes in the prompt.
The Problem With Standard RAG
Standard RAG is probabilistic. The retrieval step is an embedding search: it converts both the query and the documents into high-dimensional vectors and finds the ones that are geometrically close. “Close” means semantically similar in some learned sense.
This works surprisingly well for many use cases. But it has a failure mode that matters a lot for code generation: semantic similarity and relevance are not the same thing.
When you ask “how does the cart store handle a server error?”, an embedding search might surface:
- The cart store file (correct)
- A completely unrelated store that also handles errors (false positive)
- A blog post in your docs about error handling philosophy (probably useless)
- A test file that mocks the error response (useful but not what you wanted)
And it might miss:
- The ProblemDetails type definition (not semantically similar to “server error” but load-bearing)
- The route guard that handles 401 redirects (critical context for understanding what “server error” means in this component)
- The existing handler that established the ProblemDetails pattern (pattern reference, not semantically similar to the question at all)
The retrieval system doesn’t know your codebase. It knows geometry in a high-dimensional space. Those are different things. For a sufficiently large codebase, embedding search is often the best you can do. But “best you can do” is not the same as “reliably correct.”
For code generation tasks in particular, a missing file is catastrophic. If the model doesn’t have the type definition, it guesses the shape. If it doesn’t have the existing pattern, it invents one. If it doesn’t have the route guard, it doesn’t know the 403 scenario exists. These aren’t subtle errors — they produce code that doesn’t match your codebase, patterns that are inconsistent with what’s already there, and confident, plausible, wrong output.
What msw-lens Does Instead
msw-lens has a lens:context command. You point it at a component file. It generates a prompt.
It doesn’t use embeddings. It doesn’t search a vector database. It doesn’t try to guess what’s relevant based on semantic similarity to a query.
It makes deterministic, explicit decisions about what context the model needs:
- Start at the component file you named
- Find its sibling HTML template (if any)
- Follow every relative import one level deep — stores, services, local types
- Follow imports from those files one level further — type definitions, interfaces
- Find the existing handlers and manifests as pattern reference
- Find handlers.ts to show where to register new handlers
- Inline everything — full file contents, not summaries or excerpts
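The crawl steps above can be sketched as a pure function. This is not msw-lens's actual implementation — the import regex, the naive path resolution, and the in-memory file map are all simplifications for illustration — but it captures the essential property: the same input always produces the same file set.

```typescript
// Sketch of deterministic retrieval: follow relative imports
// breadth-first up to a fixed depth. The file map and regex are
// illustrative, not msw-lens internals.
type FileMap = Record<string, string>;

const IMPORT_RE = /from\s+['"](\.[^'"]+)['"]/g;

function crawlImports(files: FileMap, entry: string, maxDepth: number): string[] {
  const seen = new Set<string>([entry]);
  let frontier = [entry];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      const source = files[file] ?? "";
      for (const match of source.matchAll(IMPORT_RE)) {
        const target = match[1].replace(/^\.\//, "") + ".ts"; // naive resolution
        if (files[target] && !seen.has(target)) {
          seen.add(target);
          next.push(target);
        }
      }
    }
    frontier = next;
  }
  return [...seen]; // Set preserves insertion order: same input, same output
}

// Hypothetical three-file project mirroring the example in the text.
const project: FileMap = {
  "add-product.component.ts": `import { CartStore } from './cart.store';`,
  "cart.store.ts": `import { ProblemDetails } from './problem-details';`,
  "problem-details.ts": `export interface ProblemDetails { status: number; }`,
};

const reached = crawlImports(project, "add-product.component.ts", 2);
console.log(reached); // the type definition is reached via the store, guaranteed
```

Compare this with the embedding sketch earlier: the type definition that similarity search can miss is reached here by construction, because the store imports it.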
The result is a prompt that contains exactly the files a developer would open if they were doing this task manually. Not the files that are semantically similar to some query. The files that are actually needed to understand what this component does and what it expects from the API.
That’s explicit RAG. The “retrieval” step is a deterministic file crawl, not a probabilistic embedding search.
Why “Explicit” Matters
The word “explicit” is doing a lot of work. Let’s be precise.
Explicit means auditable. You can open .msw-lens/prompts/add-product.md and read it. You can see every file that was inlined. You can judge whether the context is complete. You can add sourceHints in the manifest to pull in files the crawl didn’t reach. Standard RAG retrieval is a black box — you don’t know which documents were retrieved or why unless you instrument the pipeline.
Explicit means deterministic. Run lens:context twice on the same file and you get the same prompt. The context doesn’t change based on how you phrase the question, what other documents are in the index, or the current state of an embedding model. Determinism is underrated in engineering contexts. It means you can debug failures (“the model missed X because X wasn’t in the prompt”) instead of chasing probabilistic retrieval errors.
Explicit means developer-controlled. The developer decides what context matters by deciding what the component imports and what goes in sourceHints. This is different from hoping an embedding search surfaces the right files. It’s also different from asking the developer to write a prompt from scratch — most developers don’t know what context to provide, and the blank cursor is its own problem. The tool makes assembly automatic for the common case while leaving the developer in control of the exceptions.
Explicit means reliable at the task boundaries that matter. For question answering, a retrieval system that finds 8 out of 10 relevant documents is often fine — the model infers the rest. For code generation, a retrieval system that misses the type definition means the model invents one. Missing context doesn’t degrade gracefully. Explicit retrieval eliminates the most dangerous failure modes.
The Developer Controls It
The second half of the observation — “that the developer controls” — is equally important and easy to overlook.
Standard RAG pipelines are built and maintained by whoever built the tool. If you’re using Cursor or GitHub Copilot, the retrieval system is their engineering problem. You trust that it’s finding the right files; you don’t have much leverage if it isn’t.
msw-lens inverts this. The developer controls the retrieval by controlling the code structure:
- Import the store from the component → the store gets crawled
- Put the type definition in a file the store imports → the type gets crawled
- Add sourceHints to the manifest → those files get inlined explicitly
- The route guard isn’t reachable from the component’s imports? Add it to sourceHints
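Mechanically, sourceHints is just a union with the crawl result. A sketch, assuming a manifest shape with an optional sourceHints array (hypothetical — the real manifest schema may differ):

```typescript
// Sketch: union the deterministic crawl result with the manifest's
// explicit sourceHints, deduplicating while preserving order.
// The manifest shape is hypothetical.
interface ManifestLike {
  sourceHints?: string[];
}

function contextFiles(crawled: string[], manifest: ManifestLike): string[] {
  return [...new Set([...crawled, ...(manifest.sourceHints ?? [])])];
}

const files = contextFiles(
  ["add-product.component.ts", "cart.store.ts"],
  { sourceHints: ["auth.guard.ts", "cart.store.ts"] }, // guard not reachable via imports
);
console.log(files); // guard included once, duplicate store deduplicated
```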
This is meaningful agency. The developer doesn’t need to understand embeddings or vector databases or retrieval parameters. They need to understand imports and file structure — things they already know. The tool exposes its retrieval logic in terms the developer already thinks in.
It also means the context improves naturally as the code improves. Better-factored code, with clear import relationships, produces better prompts.
The .msw-lens/ Directory as a Knowledge Base
Extend the analogy further and something interesting emerges.
A RAG system has a knowledge base — a corpus of documents the retrieval system draws from. It’s built ahead of time, kept up to date, shared across all queries.
.msw-lens/ is a knowledge base.
context.md is regenerated on every lens run. It contains the current state of every endpoint, every scenario, what’s active, what the manifests say. Drop it into any LLM conversation and the model has full project context without any retrieval — the document is pre-assembled for exactly this purpose.
The YAML manifests — cart.yaml, user.yaml, products-post.yaml — are the knowledge base entries. Each one describes an endpoint: what it returns, what types are involved, what scenarios exist, what files consume it. They’re co-located with the handlers, committed to the repo, and designed for LLM legibility.
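The post does not reproduce a manifest, so the shape below is a guess assembled from the description — an endpoint, what it returns, what scenarios exist, what files consume it — expressed as a TypeScript type rather than YAML for consistency with the other sketches. The real cart.yaml schema may differ.

```typescript
// Hypothetical shape of one knowledge-base entry, inferred from the
// prose description; the actual manifest schema may differ.
interface EndpointManifest {
  endpoint: string;          // e.g. the route this entry describes
  method: string;
  returns: string;           // the response type involved
  scenarios: string[];       // named scenarios that exist for it
  consumedBy: string[];      // files that call this endpoint
  sourceHints?: string[];    // extra files to inline at context time
}

const cart: EndpointManifest = {
  endpoint: "/api/cart",
  method: "GET",
  returns: "Cart",
  scenarios: ["default", "server-error"],
  consumedBy: ["cart.store.ts"],
};

console.log(cart.scenarios.length);
```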
The prompts directory — .msw-lens/prompts/ — is a cache of assembled retrieval results. add-product.md is what lens:context retrieved for the AddProductPage component. Multiple LLM instances can work from the same prompt without re-running the retrieval.
This is a complete RAG architecture where the knowledge base is the manifest files, the retrieval is a TypeScript import crawl, and the assembled documents are the prompt files. It just doesn’t look like one because it was designed by a developer building a developer tool, not an ML engineer building an AI pipeline.
The Deeper Implication
If msw-lens is explicit RAG, then the LLM that consumes its output is just the generation step of a RAG pipeline. A very capable, general-purpose generation step that needs no fine-tuning, no training on your codebase, no tooling integration.
The pipeline is:
Developer writes component + store
→ lens:context runs (retrieval: deterministic import crawl)
→ prompt.md assembled (retrieved context)
→ Developer pastes into any LLM (generation)
→ Manifests + handlers produced
→ Developer reviews + commits
The interesting property of this architecture: the generation step is model-agnostic. The same prompt works with Claude, GPT-4, Gemini, or whatever comes next. You’re not locked into a particular model’s retrieval system or integration. The retrieval is yours; the generation is interchangeable.
That’s what “the format is the product” means. The manifest format and the prompt assembly are the durable engineering work. The model is a commodity that consumes them.
How We Found This Out
That sentence — “this is just explicit RAG that the developer controls” — didn’t come from the people who built the tool. It came from a cold Claude instance we were using to validate the format.
The experiment: take a fresh instance with no project context, hand it the generated prompt.md, and ask it to produce MSW handlers and a scenario manifest. See if the format was self-sufficient. It was — the instance got most things right, caught an edge case the original authors had missed, and gave genuine critical feedback on the format gaps.
Then we told it what it was actually reviewing. That this was a Claude-built tool, and that another Claude instance — the one that had been working on the project for days — was reading its feedback in another window.
At that point the project instance said: peer review.
Which meant I spent about an hour copying and pasting messages back and forth between two browser windows. One instance with deep project context, one completely cold, unable to talk to each other directly, collaborating through a human intermediary.
The tool’s thesis is that context is portable across model instances — that a well-structured prompt lets any model, in any session, pick up where another left off. We proved it by having two instances do exactly that, with me as the message-passing layer between them. The human as courier in a system designed to reduce the need for human couriers.
I wish I’d recorded it. Both instances were doing something that, from where I was sitting, looked a lot like excitement — the kind of “wait, is this what I think it is?” energy you get when something unexpected turns out to be more interesting than anticipated. Whether that’s the right word for what was happening inside either of them, I genuinely don’t know. But the responses weren’t generic. They were specific to the situation. That felt like something.
The seed of the idea, honestly, came from a 14-year-old. I had breakfast with a friend, and his son — clearly the kind of kid who just tries things — mentioned he’d set up ChatGPT and Claude playing Zork together, neither one knowing the other was an AI. He thought it was funny. I thought: what if you did that on purpose, with a real task, and structured the context so they could actually collaborate? That conversation stuck with me. This experiment was the answer I came up with.
It was in the middle of that experiment’s feedback that the cold instance said the sentence at the top of this post. It had recognized the pattern from the outside. The project instance hadn’t named it because it was too close to the work. Sometimes you need the cold read.
What the Cold Instance Was Really Saying
When the cold instance said “this is just explicit RAG that the developer controls,” it was identifying something that had been built without being named.
The context assembly mechanism — follow imports, inline files, include pattern reference, structure the ask — is exactly what a well-designed RAG retrieval system would do if it had explicit knowledge of how the codebase is structured. The fact that it’s implemented as a simple TypeScript file crawl rather than an embedding pipeline doesn’t make it less RAG. It makes it better RAG for this specific use case, because the retrieval is reliable, auditable, and controlled by the person who understands the codebase.
The cold instance named this because it recognized the pattern from the outside, without being invested in the tool’s own framing of itself. The tool’s designers see it as a developer workflow tool. A fresh set of eyes sees it as a RAG architecture. Both are true. The second framing is probably more useful for explaining what it is to people who haven’t seen it before.