Databricks sharpens Agent Bricks to push agentic AI from pilot to production
Databricks is rolling out new Agent Bricks capabilities aimed at a problem most IT leaders feel every quarter: agents that look good in demos but stall before production. The update centers on accuracy, governance, and data access, exactly the weak spots that keep projects stuck.
The most notable change is general availability of MLflow for Agent Quality and Observability, built to continuously evaluate and monitor agent behavior. Alongside that, Databricks is previewing a governed AI Gateway, an MCP Catalog for tool/data access, multi-agent supervision with MCP support, and a SQL function to extract context from unstructured content.
Why this matters
Enterprises shifted from basic chatbots to agentic systems in 2024, but building trustworthy autonomous workflows is hard. Poor accuracy, single-vendor lock-in, and governance gaps are common failure points.
According to William McKnight, president of McKnight Consulting, "The new capabilities are a significant update designed to instill confidence in moving AI agent projects from pilots to secure production by focusing on ensuring the AI is governed, open and accurate. A full agent lifecycle is covered."
What's new in Agent Bricks
- MLflow for Agent Quality and Observability (GA): Continuous agent evaluation, run tracking, and metrics to raise accuracy and reduce drift.
- AI Gateway (preview): A governed interface to manage agent connections to models like OpenAI's GPT-5, Google's Gemini, Anthropic's Claude Sonnet, and open source options.
- MCP Catalog in Marketplace (preview): Governance and lifecycle control for connecting agents to external tools and data sources via the Model Context Protocol (MCP).
- MCP support in Multi-Agent Supervisor (beta): Coordinate multi-step workflows across specialized agents with standardized tool access.
- ai_parse_document SQL function (preview): Extracts content from documents and tables so agents can ground decisions in unstructured data, not just rows and columns.
Only MLflow for Agent Quality and Observability is generally available today; the rest are in preview or beta.
Context and momentum
Agent Bricks launched in beta in June to help teams close the gap between prototypes and production. Databricks also made OpenAI models natively available across its platform as part of a $100 million partnership, broadening model choice without custom plumbing.
Devin Pratt, analyst at IDC, summed it up: "Collectively, these updates help organizations move agents from pilot to production with greater control and trust. This is about making enterprise agents trustworthy, accurate, governed and flexible on the data organizations already control."
How Databricks is framing the problem
Databricks points to three recurring blockers: low confidence in agent quality, lock-in to a single model provider, and security/governance exposure. The new releases target all three with evaluation pipelines, policy controls, and standardized tool access.
McKnight sees the biggest near-term upside in the MCP Catalog and ai_parse_document: they address governance, security, and data grounding, which are common reasons pilots stall. Pratt also highlights MLflow's evaluation workflows as critical for regulated or customer-facing uses.
How it stacks up
Competitors like Snowflake (Cortex Agents), Teradata, Informatica, AWS, Google Cloud, and Microsoft are all building agent tooling. Analysts note Databricks' edge is unifying data governance, model control, and agent evaluation inside a lakehouse architecture.
As Pratt puts it, this supports governed, data-centric AI operations while keeping development flexible and well-orchestrated.
Practical rollout plan for IT and engineering leaders
- Stand up evaluation early: Define task suites, golden datasets, and pass/fail thresholds in MLflow before integration work begins. Track regressions per agent and version (see the evaluation sketch after this list).
- Enforce model policy: Route all model calls through AI Gateway. Set guardrails for PII handling, cost ceilings, preferred model lists, and failover providers.
- Standardize tool access via MCP: Catalog external tools and data sources with clear approval paths, scopes, and audit trails. Use MCP in multi-agent workflows to avoid bespoke connectors.
- Ground agents in your data: Use ai_parse_document to extract context from PDFs, docs, and tables (see the parsing sketch after this list). Pair with retrieval policies that log sources and citations for auditability.
- Plan multi-model testing: Benchmark tasks across providers (OpenAI, Google, Anthropic, open source). Select by performance, latency, and cost, not brand.
- Operationalize observability: Monitor task success rates, tool-call accuracy, latency, and cost per task. Alert on drift and roll back to safe versions when needed (see the drift-check sketch after this list).
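A minimal sketch of the evaluation step, assuming MLflow's Python evaluation API, an agent wrapped as a plain callable, and a tiny hand-built golden dataset; the column names, the answer_my_question helper, and the questions themselves are placeholders, and the metrics returned depend on your MLflow version and installed evaluator dependencies.

```python
# Sketch: a baseline evaluation run for an agent with MLflow's evaluation API.
# The golden dataset, column names, and answer_my_question() wrapper are
# placeholders; swap in your own agent endpoint and task suite.
import mlflow
import pandas as pd

golden = pd.DataFrame({
    "question": [
        "What is the refund window for enterprise contracts?",
        "Which region hosts EU customer data?",
    ],
    "ground_truth": [
        "60 days from the invoice date.",
        "eu-west-1.",
    ],
})

def answer_my_question(df: pd.DataFrame) -> pd.Series:
    # Placeholder: call your deployed agent here and return one answer per row.
    return df["question"].apply(lambda q: "60 days from the invoice date.")

with mlflow.start_run(run_name="agent-eval-baseline"):
    results = mlflow.evaluate(
        model=answer_my_question,
        data=golden,
        targets="ground_truth",
        model_type="question-answering",
    )
    # Which metrics appear (exact match, readability, and so on) depends on your
    # MLflow version and optional dependencies; gate promotion on the ones that
    # match your pass/fail thresholds.
    print(results.metrics)
```

Promotion to the next environment can then be gated on these logged run metrics rather than on demo impressions.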
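For the document-grounding step, here is a hedged sketch of what calling the preview ai_parse_document function from a Databricks notebook could look like; the volume path, output table, and shape of the parsed result are assumptions, and the signature and return schema may change while the function is in preview.

```python
# Sketch (Databricks notebook): parse documents from a Unity Catalog volume so
# agents can ground answers in their content. The volume path and table name are
# hypothetical, and ai_parse_document is in preview, so confirm the current
# signature and return schema against the Databricks docs.
parsed = spark.sql("""
    SELECT
      path,                                  -- keep the source path for citations and audit
      ai_parse_document(content) AS parsed   -- preview function: structured content from raw bytes
    FROM read_files(
      '/Volumes/main/agents/contracts/',     -- hypothetical volume of source documents
      format => 'binaryFile'
    )
""")

# Persist extractions next to their source paths so retrieval can log citations.
parsed.write.mode("overwrite").saveAsTable("main.agents.parsed_contracts")
```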
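On the observability side, the check worth automating first is simple: compare a recent window of task outcomes against your agreed baseline and alert before quality drifts far enough to force a rollback. A generic sketch, with placeholder data and thresholds:

```python
# Sketch: a simple drift check on agent task success rate and latency. The
# eval_results DataFrame and thresholds are placeholders; in practice, pull
# these from your logged evaluation runs or serving telemetry.
import pandas as pd

eval_results = pd.DataFrame({
    "agent_version": ["v1", "v1", "v2", "v2", "v2"],
    "task_succeeded": [True, True, True, False, False],
    "latency_s": [1.2, 0.9, 1.1, 3.4, 2.8],
})

BASELINE_SUCCESS = 0.95   # agreed pass rate from your golden-dataset runs
MAX_P95_LATENCY_S = 2.0   # placeholder service-level objective

current = eval_results[eval_results["agent_version"] == "v2"]
success_rate = current["task_succeeded"].mean()
p95_latency = current["latency_s"].quantile(0.95)

if success_rate < BASELINE_SUCCESS or p95_latency > MAX_P95_LATENCY_S:
    # Hook this into your alerting and rollback process instead of printing.
    print(f"ALERT: v2 success={success_rate:.2f}, p95 latency={p95_latency:.1f}s; "
          "consider rolling back to the last safe version.")
```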
Governance and risk checklist
- Document decision boundaries for each agent; prevent actions outside scope.
- Require human-in-the-loop for high-impact or irreversible steps.
- Enable end-to-end audit logs: prompts, tool calls, retrieved content, model responses, and final actions.
- Run red-team tests for data exfiltration, prompt injection, and tool abuse.
- Track total cost per user or workflow to avoid surprises at scale.
Known friction points
Analysts still call out two gaps: ease of use and pricing clarity. If you're evaluating, include UX in your proof of concept and model your total cost of ownership with real workloads, especially integration-heavy pipelines.
Bottom line
Agent Bricks is maturing in the right places: accuracy, governance, and data access. If your agents are stuck at the pilot stage, the GA evaluation tooling plus governed model and tool access are worth testing against your highest-priority workflows.
Next steps
- Identify two production candidate workflows and define success metrics this week.
- Set up MLflow evaluation, route model calls through AI Gateway, and catalog required tools via MCP.
- Pilot ai_parse_document for unstructured data grounding and measure error rate reduction.
- Hold a go/no-go review after two weeks based on accuracy, latency, compliance, and cost.