Most Legal AI Products Use the Same Foundation Models
The vast majority of legal AI products run on foundation models from OpenAI, Anthropic, or Google. When lawyers interact with a legal AI assistant, they query the same underlying models that power consumer tools like ChatGPT, Claude, or Gemini, wrapped in domain-specific interfaces and workflows.
Legal tech companies add a custom user interface, prompt engineering, and workflows around the model. The underlying reasoning engine is often the same one available to practitioners directly. It's like multiple car brands using the same engine manufacturer.
For simple tasks like summarizing a contract, drafting a standard letter, or answering a discrete legal question, the difference between a legal tech product and querying a frontier model directly is often modest. The real value legal tech vendors add lies in workflow integration, security infrastructure, and retrieval systems that connect models to large document collections.
Retrieval Systems Create Real Differentiation
Meaningful differentiation begins with how systems retrieve and structure information before sending it to the model. When a lawyer uploads contracts or case files and asks a question, the system needs a way to find relevant passages and feed them to the AI.
Most legal tech platforms use retrieval-augmented generation (RAG). Here's how it works: the system breaks documents into chunks and converts each into a vector embedding, a mathematical representation that captures semantic meaning. Your question receives the same treatment. The system then compares these representations to find the most relevant passages and sends only those passages to the model along with your question.
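The pipeline above can be sketched in a few lines. This is a deliberately simplified illustration: the bag-of-words "embedding" and toy contract chunks below stand in for the learned dense embeddings and document stores a real platform would use, but the chunk-embed-compare-select flow is the same.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    # Production RAG systems use learned dense embeddings instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk against the question and keep only the top k;
    # only these passages are sent to the model with the question.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Hypothetical contract chunks for illustration.
chunks = [
    "The supplier shall indemnify and hold harmless the buyer.",
    "Payment is due within thirty days of invoice.",
    "Either party may terminate with ninety days written notice.",
]
top = retrieve("Who must indemnify the buyer?", chunks, k=1)
```

Because the model only ever sees the retrieved passages, the quality of this ranking step, not the model itself, often determines whether the answer cites the right clause.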
RAG enables source citation and reduces hallucination by grounding responses in actual documents. Leading platforms have invested in custom embeddings trained specifically on legal text, so the system recognizes that "hold harmless" and "indemnification" are related concepts, even when a general-purpose model might miss the connection.
But standard RAG has a core limitation: retrieval typically occurs only once per query. The system cannot recognize that initial results raise new questions, follow citation chains, or identify gaps that warrant further search.
Agentic Systems Add Iterative Reasoning
A newer approach called agentic retrieval addresses this limitation by introducing an orchestration layer that plans, executes, evaluates, and re-plans retrieval steps iteratively. Rather than retrieving passages and generating an answer in a single pass, agentic systems assess whether retrieved context is sufficient, formulate follow-up queries when gaps remain, and continue searching until the question is adequately resolved.
This mirrors how a human investigator works: read, reason, notice what's missing, then search again. For complex investigative tasks, the accuracy gains appear significant. The improvement comes from architecture, not from using a "better" model.
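The read-reason-search-again loop can be made concrete with a small sketch. The toy corpus, document ids, and citation fields below are invented for illustration; in a real agentic system an LLM call would decide what is missing and formulate follow-up queries, whereas here the "re-plan" step simply follows citation chains, something single-pass RAG cannot do.

```python
# Hypothetical corpus: each document may cite others by id.
CORPUS = {
    "case_a": {"text": "Liability turns on the indemnity clause.", "cites": ["case_b"]},
    "case_b": {"text": "The indemnity clause is interpreted narrowly.", "cites": ["case_c"]},
    "case_c": {"text": "Narrow interpretation requires express language.", "cites": []},
}

def agentic_retrieve(start_id: str, max_rounds: int = 5) -> list[str]:
    seen: set[str] = set()
    frontier = [start_id]   # open questions still to resolve
    context: list[str] = []
    for _ in range(max_rounds):
        if not frontier:            # evaluate: nothing left unresolved, stop
            break
        next_frontier = []
        for doc_id in frontier:
            if doc_id in seen:
                continue
            seen.add(doc_id)
            doc = CORPUS[doc_id]
            context.append(doc["text"])         # execute: retrieve the document
            next_frontier.extend(doc["cites"])  # re-plan: its citations become new queries
        frontier = next_frontier
    return context

context = agentic_retrieve("case_a")
```

A single-pass system would stop after case_a; the loop keeps searching until the citation chain is exhausted, which is why the accuracy gains come from architecture rather than from a better model.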
Context Windows Have Hidden Limitations
Modern LLMs advertise impressive context windows. Gemini 3 Pro supports roughly 1 million tokens (about 750,000 English words), GPT-5.2 offers 400,000 tokens, and Claude Opus 4.5 provides 200,000. Yet research consistently shows that performance degrades as context length increases, a phenomenon called "context rot."
Even on basic retrieval tasks, performance declines in non-uniform ways as inputs grow longer. Models particularly struggle to recall information buried in the middle of long contexts. The implication is counterintuitive: dumping more documents into a large context window can produce worse results than carefully selecting what the model sees.
Context engineering treats the model's input as a design problem. Well-designed systems use hierarchical summarization, write intermediate findings to external memory, or rely on sub-agent architectures in which specialized components analyze subsets of documents and return structured outputs.
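Hierarchical summarization, one of the techniques named above, can be sketched as a simple map-reduce over documents. The `summarize` function here just truncates text to show the data flow; in a real system it would be an LLM call, and the fan-in would be tuned so each call stays well inside the window where the model performs reliably.

```python
def summarize(text: str, limit: int = 80) -> str:
    # Stand-in for an LLM summarization call: truncation keeps the
    # sketch self-contained while preserving the length contract.
    return text[:limit]

def hierarchical_summary(documents: list[str], fan_in: int = 2) -> str:
    # Map: summarize each document individually (leaf level).
    layer = [summarize(d) for d in documents]
    # Reduce: repeatedly merge small groups of summaries and re-summarize,
    # so no single call ever sees more than fan_in summaries at once.
    while len(layer) > 1:
        layer = [summarize(" ".join(layer[i:i + fan_in]))
                 for i in range(0, len(layer), fan_in)]
    return layer[0]
```

The design choice this illustrates: instead of concatenating every document into one giant prompt, each step keeps the model's input short, trading a single degraded long-context call for many reliable short ones.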
What This Means for Your Evaluation
For simple tasks, the gap between legal tech products and frontier models is smaller than marketing suggests. For complex work, architectural choices around retrieval and context management make a meaningful difference.
When evaluating tools, ask whether a system relies on basic RAG, agentic retrieval, or more sophisticated context engineering. Understanding what's actually under the hood positions you to deploy these tools more responsibly.
Supervision remains essential regardless of architecture.