Data, Not Wrappers: Why Law Firms Should Hire 30 Data Engineers Now

Law firms win AI with clean, machine-readable data and the pipelines that feed it, not with shiny chat wrappers. Kardos-Nyheim's advice: hire data engineers and build retrieval with citations.

Published on: Feb 19, 2026

"Hire 30 data engineers tomorrow": How law firms actually win the AI race

Most firms are still trying to bolt AI onto old workflows. Alexander Kardos-Nyheim says that's the wrong bet. The moat isn't a shiny interface - it's the quality of your proprietary legal data and the plumbing that feeds it to your models.

His blunt advice to any managing partner: hire data engineers now. Put them inside your document systems - your iManage, your SharePoint - and make your institutional knowledge machine-readable.

The real moat is data (not wrappers)

Feature-led "wrappers" age fast. A competitor can copy your interface in weeks. What's hard to copy is a clean, structured, well-governed corpus of firm knowledge that models can reason over and cite with confidence.

Kardos-Nyheim's point is simple: legal-specific models paired with curated firm data outperform generic chat apps. In his view, they hallucinate less, cite more reliably, and follow legal reasoning better because they've been fed the right inputs in the right format.

What this means for managing partners

  • Shift headcount: fewer junior lawyers, more data engineers. Your leverage comes from systems, not from bodies at the bottom of the pyramid.
  • Treat data as product: deduplicate, classify, tag, and version your precedents, playbooks, advice memos, and deal bibles.
  • Prioritize retrieval and citations: build retrieval pipelines that surface source passages with page/paragraph anchors so every answer is verifiable.
  • Adopt legal-tuned models: start with strong base LLMs, then fine-tune or use adapters on your domains (practice, jurisdiction, client context).
  • Measure outcomes: track drafting cycle time, review accuracy, citation error rate, write-offs avoided, and client satisfaction.
  • Rework delivery: productize repeatable work, price for outcomes, and let lawyers focus on judgment, negotiation, and strategy.
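
To make the retrieval-and-citations point concrete, here is a minimal Python sketch, assuming documents arrive as page text with paragraphs separated by blank lines. The names `Passage`, `chunk_document`, and `retrieve` are hypothetical, and simple term overlap stands in for the embeddings and rerankers a production pipeline would use. The part that matters is that every chunk keeps its page and paragraph anchor, so each retrieved passage can be cited.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str      # hypothetical DMS document identifier
    page: int        # 1-based page number in the source document
    paragraph: int   # 1-based paragraph number within that page
    text: str

    def citation(self) -> str:
        """Human-readable anchor shown next to every answer."""
        return f"{self.doc_id}, p. {self.page}, para. {self.paragraph}"


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chunk_document(doc_id: str, pages: list[str]) -> list[Passage]:
    """Split each page on blank lines so every chunk keeps its page/paragraph anchor."""
    passages = []
    for page_no, page_text in enumerate(pages, start=1):
        paras = [p.strip() for p in page_text.split("\n\n") if p.strip()]
        for para_no, para in enumerate(paras, start=1):
            passages.append(Passage(doc_id, page_no, para_no, para))
    return passages


def retrieve(query: str, passages: list[Passage], k: int = 3) -> list[tuple[Passage, float]]:
    """Rank passages by the share of query terms they contain; return the top k."""
    query_terms = tokenize(query)
    scored = []
    for passage in passages:
        overlap = len(query_terms & tokenize(passage.text))
        if overlap:
            scored.append((passage, overlap / len(query_terms)))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep document order
    return scored[:k]


pages = [
    "Definitions apply throughout.\n\nThe Purchase Price is USD 10,000,000.",
    "The Seller shall indemnify the Buyer for breaches of warranty.\n\n"
    "This Agreement is governed by English law.",
]
passages = chunk_document("SPA-demo", pages)
for passage, score in retrieve("who indemnifies the buyer", passages, k=1):
    print(passage.citation(), "->", passage.text)
```

On the two-page sample, the top hit is the indemnity paragraph, returned with its anchor `SPA-demo, p. 2, para. 1`. Swapping in a real embedding model would change the scoring, not the anchoring.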

Who are the "30 data engineers" you actually need?

  • Data engineers to build pipelines from your document management system (DMS) and SharePoint into secure data stores.
  • Knowledge engineers to design taxonomies, ontologies, and document schemas (clause, obligation, party, jurisdiction, effective dates).
  • Retrieval/ML engineers to implement chunking, embeddings, rerankers, and evaluation harnesses for citation quality and hallucination control.
  • Data product managers to set requirements with practice leaders and translate legal needs into backlog and metrics.
  • Security/infra engineers to enforce privilege boundaries, redaction, tenancy, audit logs, and KMS-managed encryption.
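
One way a knowledge engineer might express the schema fields listed above, together with the matter-level privilege boundary a security engineer would enforce, is sketched below in Python. The field names, the three privilege tiers, and the rule "staffed on the matter and cleared for the tier" are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Optional


class Privilege(IntEnum):
    """Hypothetical privilege tiers; higher values are more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    PRIVILEGED = 2


@dataclass(frozen=True)
class ClauseRecord:
    clause_id: str
    doc_type: str              # e.g. "SPA", "LPA", "advice memo"
    obligation: str
    parties: tuple[str, ...]
    jurisdiction: str
    effective: date
    expiry: Optional[date]     # None means no expiry date
    client_id: str
    matter_id: str
    privilege: Privilege


@dataclass(frozen=True)
class User:
    name: str
    matters: frozenset[str]    # matters the user is staffed on
    clearance: Privilege


def visible_to(user: User, record: ClauseRecord) -> bool:
    """Privilege boundary: staffed on the matter AND cleared for the record's tier."""
    return record.matter_id in user.matters and record.privilege <= user.clearance


def in_force(record: ClauseRecord, on: date) -> bool:
    """True if the clause is effective on the given date."""
    return record.effective <= on and (record.expiry is None or on < record.expiry)


record = ClauseRecord(
    clause_id="SPA-7.2",
    doc_type="SPA",
    obligation="Seller indemnifies Buyer for breach of warranty",
    parties=("Seller", "Buyer"),
    jurisdiction="England and Wales",
    effective=date(2024, 1, 1),
    expiry=None,
    client_id="C-001",
    matter_id="M-042",
    privilege=Privilege.PRIVILEGED,
)
associate = User("associate", frozenset({"M-042"}), Privilege.PRIVILEGED)
trainee = User("trainee", frozenset({"M-042"}), Privilege.INTERNAL)
outsider = User("other team", frozenset({"M-099"}), Privilege.PRIVILEGED)
print(visible_to(associate, record), visible_to(trainee, record), visible_to(outsider, record))
# True False False
```

Applying a check like `visible_to` before retrieval, rather than after generation, keeps restricted text out of the model's context entirely.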

Your first 90 days

  • Inventory and score your data: which repositories matter most (by practice, revenue, and reuse potential)? Identify duplicates and stale versions.
  • Define a common schema: document types, clause IDs, parties, jurisdictions, citations, effective/expiry, and matter metadata.
  • Stand up retrieval: build a pilot retrieval-augmented generation (RAG) pipeline for one high-value use case (e.g., M&A SPAs or fund docs) with strict citation display.
  • Evaluation before rollout: create a benchmark set of prompts, expected answers, and acceptable citations. Measure precision/recall and hallucination rates.
  • Data governance: retention rules, access controls by client/matter, privilege tiers, and approval workflows for model training data.
  • Talent plan: hire a lead data engineer, a knowledge engineer, and a retrieval engineer first - then scale.
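
The evaluation step can be sketched as a small Python harness that compares the citations a system returns against the citations a reviewing lawyer marked as acceptable for each benchmark prompt. This is a hypothetical example: the benchmark format is assumed, and the "fabricated rate" (citations pointing at sources that do not exist in the corpus) is only one narrow proxy for hallucination.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    prompt: str
    acceptable: set[str]   # citations a reviewing lawyer would accept


def score(items: list[BenchmarkItem], outputs: dict[str, list[str]],
          corpus_ids: set[str]) -> dict[str, float]:
    """Micro-averaged citation precision/recall plus a fabricated-citation rate."""
    correct = cited = expected = fabricated = 0
    for item in items:
        returned = set(outputs.get(item.prompt, []))
        correct += len(returned & item.acceptable)
        cited += len(returned)
        expected += len(item.acceptable)
        fabricated += len(returned - corpus_ids)   # cites a source that does not exist
    return {
        "precision": correct / cited if cited else 0.0,
        "recall": correct / expected if expected else 0.0,
        "fabricated_rate": fabricated / cited if cited else 0.0,
    }


corpus_ids = {"SPA p1 para2", "SPA p2 para1", "SPA p2 para2"}
items = [
    BenchmarkItem("Who gives the indemnity?", {"SPA p2 para1"}),
    BenchmarkItem("What is the purchase price?", {"SPA p1 para2"}),
    BenchmarkItem("Which law governs?", {"SPA p2 para2"}),
]
outputs = {
    "Who gives the indemnity?": ["SPA p2 para1"],
    "What is the purchase price?": ["SPA p1 para2", "SPA p9 para9"],  # second cite is invented
    "Which law governs?": [],                                          # answered without sources
}
print(score(items, outputs, corpus_ids))
```

Tracking these numbers on every release gives "evaluation before rollout" a concrete pass/fail gate; the thresholds themselves are for the firm to set.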

Why legal-specific models matter

Generic models tend to guess. Legal-tuned systems, grounded in your firm's data, recall. They're trained to recognize structure (citations, defined terms, exhibits), weigh authority, and avoid confident nonsense. They also handle formatting, clause extraction, and redline rationale far better when your data layer is clean.

The win isn't "more answers," it's verified answers with sources your partners would sign. That's how you defend quality and margin.
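
A "verified answer" can be enforced mechanically before anything reaches a partner. The Python sketch below assumes the model is prompted to write one claim per line with its sources in square brackets; that format and the function name `unsupported_claims` are hypothetical. It flags any claim with no citation, or with a citation that was not among the passages actually retrieved.

```python
import re

# Hypothetical answer format: one claim per line, sources in square brackets.
CITATION = re.compile(r"\[([^\[\]]+)\]")


def unsupported_claims(answer: str, retrieved: set[str]) -> list[str]:
    """Return claims that cite nothing, or cite a source that was not retrieved."""
    problems = []
    for line in answer.splitlines():
        claim = line.strip()
        if not claim:
            continue
        cites = CITATION.findall(claim)
        if not cites or any(c not in retrieved for c in cites):
            problems.append(claim)
    return problems


retrieved = {"SPA-demo, p. 2, para. 1", "SPA-demo, p. 2, para. 2"}
answer = (
    "The Seller indemnifies the Buyer for warranty breaches [SPA-demo, p. 2, para. 1]\n"
    "The Agreement is governed by English law [SPA-demo, p. 2, para. 2]\n"
    "The cap on liability is 20% of the price [SPA-demo, p. 5, para. 3]\n"
    "Completion is expected next quarter\n"
)
for claim in unsupported_claims(answer, retrieved):
    print("NEEDS REVIEW:", claim)
```

Here the liability-cap claim cites a passage that was never retrieved and the completion claim cites nothing, so both are flagged for review.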

What to stop doing

  • Stop shipping a ChatGPT-style wrapper and calling it innovation.
  • Stop dumping millions of PDFs into a vector database without cleaning, tagging, and version control.
  • Stop ignoring citations. If an output doesn't show its sources, you're creating risk, not leverage.
  • Stop pretending this is an IT side project. This is core to how you deliver legal work.
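
Cleaning before indexing can start simply. The Python sketch below, using assumed field names (`family`, `version`), keeps only the newest version of each precedent family and then drops exact duplicates after normalizing case and whitespace. Catching near-duplicates (for example, with MinHash) would be a natural next step but is not shown.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DocVersion:
    family: str    # hypothetical precedent family ID, e.g. "SPA-template"
    version: int
    text: str


def fingerprint(text: str) -> str:
    """SHA-256 of the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def clean_corpus(docs: list[DocVersion]) -> list[DocVersion]:
    # 1) Keep only the newest version of each family (dict preserves first-seen order).
    latest: dict[str, DocVersion] = {}
    for doc in docs:
        if doc.family not in latest or doc.version > latest[doc.family].version:
            latest[doc.family] = doc
    # 2) Drop any family whose latest text duplicates one already kept.
    seen: set[str] = set()
    kept = []
    for doc in latest.values():
        fp = fingerprint(doc.text)
        if fp not in seen:
            seen.add(fp)
            kept.append(doc)
    return kept


docs = [
    DocVersion("SPA-template", 1, "Share Purchase Agreement  v1"),
    DocVersion("SPA-template", 3, "Share Purchase Agreement v3"),
    DocVersion("SPA-template", 2, "Share Purchase Agreement v2"),
    DocVersion("SPA-copy", 1, "share purchase agreement   V3"),  # stray copy of v3
    DocVersion("NDA-template", 1, "Mutual NDA"),
]
print([(d.family, d.version) for d in clean_corpus(docs)])
# [('SPA-template', 3), ('NDA-template', 1)]
```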

The new pyramid

Expect fewer lawyers - and better ones. AI won't replace judgment, but it will compress the grunt work that props up the old model. Partners who invest in data and systems will ship higher-quality work, faster, with clearer provenance.

If you get the data layer right, your models become a force multiplier. If you don't, you'll spend the next year testing demos while competitors ship results.

Where to skill up next

For practical training on building firm-grade data pipelines, retrieval, and legal AI workflows, start here: AI for Legal. If you're hiring and upskilling technical teams, see: AI for IT & Development.

Kardos-Nyheim's message is clear: win with data, structure, and retrieval - then let your best lawyers do what they do best.

