Increased model testing and agentic AI reshape enterprise observability
Dynatrace built its name on observability and security for classic workloads. Now it's extending that discipline to AI workloads, giving teams a clearer picture of how modern applications behave under real use.
Speaking at KubeCon + CloudNativeCon NA, Alois Reitbauer, chief technology strategist at Dynatrace, summed it up: "We see a change in how people are building applications. In the past it was basically OpenAI, you used OpenAI and then it started to switch to other models. Now, we see people experimenting way more, like A-B testing models and the practice of … AI native engineering."
From single-model usage to model experimentation
Enterprises aren't sticking to one foundation model anymore. Teams are trialing multiple models, routing by task, and running A/B tests to measure quality, latency, and cost.
That shift demands a new level of traceability: which model was used, why the router picked it, what context was injected, and how it performed against a specific goal.
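To make that concrete, here is a minimal sketch of what recording a routing decision could look like. It uses only plain Python and standard-library logging; the model names, the 50/50 split, and the call_model stub are illustrative assumptions, not any vendor's API.

```python
import json
import logging
import random
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model_router")

# Hypothetical A/B candidates -- the model names are placeholders.
CANDIDATES = {"variant_a": "large-general-model", "variant_b": "small-fast-model"}

def call_model(model: str, prompt: str) -> str:
    """Stub standing in for a real inference client."""
    return f"[{model}] response to: {prompt[:40]}"

def route_and_record(task_type: str, prompt: str) -> dict:
    """Pick a variant, run it, and record which model was used, why it was
    chosen, and how it performed -- the traceability described above."""
    variant = random.choice(list(CANDIDATES))          # simple 50/50 A/B split
    model = CANDIDATES[variant]
    start = time.monotonic()
    output = call_model(model, prompt)
    record = {
        "task_type": task_type,
        "variant": variant,
        "model": model,
        "selection_rationale": "50/50 A/B experiment",
        "latency_ms": round((time.monotonic() - start) * 1000, 2),
        "output_chars": len(output),
    }
    log.info(json.dumps(record))                       # structured and searchable
    return record

route_and_record("summarization", "Summarize last week's incident reports.")
```

Even a record this small answers the questions above: which model ran, why it was picked, and how it performed for a given task.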
Agentic AI changes how you debug
Some newer models expose a "train of thought" that makes it easier to see how they arrived at an output. Helpful, but the job is far from done.
"Debugging AI in agentic applications is kind of different," Reitbauer said. "The more we move into more dynamic systems, like going more into this agentic world, the more the individual transactions will be different." In short: fewer repeatable paths, more unique runs. Observability has to capture intent, decisions, and outcomes for each step.
Guardrails and goal tracking over shoulder-watching
Agentic systems act on tasks, not step-by-step instructions. You won't inspect every move. You set constraints, define success criteria, and review outcomes.
"Guardrails are a key and guardrails started to emerge very early on," Reitbauer noted. "Really thinking to the next step about agentic, we have to track against goals and I think that's where business observability comes in. You're delegating a task, you're not looking AI over the shoulder."
Dynatrace's Azure move
Dynatrace announced its next-generation cloud operations solution for Microsoft Azure, including support for agent-driven patterns. That matters because many teams will build agents, tools, and model routing directly on Azure services.
If you're evaluating this path, review Microsoft's guidance on agent services for architecture and security baselines (Microsoft Learn: Azure AI agent services).
What to instrument now (practical checklist)
- Prompt, context, and tool calls: Log prompts, system instructions, retrieved context, and the exact tools an agent used. Mask PII and secrets. (A combined instrumentation sketch follows this list.)
- Model routing: Capture model/version, selection rationale, fallback events, and temperature/top-p values.
- Agent steps: Track each step, the tool invoked, input/output, and confidence signals.
- Evaluation signals: Store automatic scores (toxicity, hallucination checks, policy hits) and human ratings (RAG quality, relevance, helpfulness).
- Goal outcomes: Define the task up front and record success/failure, retries, and final business outcome.
- Cost and latency: Token usage, time per step, queue delays, and upstream/downstream service time.
- Guardrail events: Blocked prompts, redactions, policy violations, and safety interventions.
- Data lineage: Where context came from, embedding versions, and index timestamps.
- Version control: Tie model/app/agent config to git commits and feature flags for fast rollback.
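Pulling several of those items together, here is a minimal instrumentation sketch assuming the opentelemetry-api and opentelemetry-sdk Python packages. The span and attribute names (llm.model, agent.goal, and so on) are illustrative choices rather than an official semantic convention, and the redact() helper is a stand-in for real masking.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Export spans to the console for the sake of the example.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent.observability.sketch")

def redact(text: str) -> str:
    """Placeholder PII/secret masking; swap in your real redaction logic."""
    return text.replace("sk-live-123", "[REDACTED]")

with tracer.start_as_current_span("agent.run") as run:
    run.set_attribute("agent.goal", "draft release notes")        # goal outcomes
    run.set_attribute("app.git_commit", "a1b2c3d")                # version control

    with tracer.start_as_current_span("model.call") as call:
        call.set_attribute("llm.model", "example-model-v2")       # model routing
        call.set_attribute("llm.routing.rationale", "A/B variant b")
        call.set_attribute("llm.temperature", 0.2)
        call.set_attribute("llm.prompt", redact("Summarize commits... sk-live-123"))
        call.set_attribute("llm.tokens.total", 812)               # cost signal
        call.set_attribute("guardrail.violations", 0)             # guardrail events
```

Nesting the model-call span under the run span keeps each call tied to the agent run that triggered it, which makes per-run debugging and cost attribution straightforward.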
For engineering, ops, and product leaders
- Make goals first-class: Every agent action should roll up to a measurable outcome.
- Bias toward experiments: A/B test models, prompts, and routing policies. Keep the best, retire the rest.
- Build a safety net: Policy checks, rate limits, escalation paths, and human-in-the-loop for high-risk flows.
- Close the loop: Feed production outcomes back into prompts, retrieval, and router decisions.
- Standardize telemetry: Use consistent schemas so data can be searched, compared, and audited (one possible event shape is sketched after this list).
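One way to enforce that consistency is to agree on a single event shape up front. The sketch below expresses one possible schema as a Python TypedDict; the field names are assumptions for illustration, not a standard.

```python
from typing import Literal, Optional, TypedDict

class AIEvent(TypedDict):
    """One shared shape for AI telemetry so events can be searched and compared."""
    timestamp: str                 # ISO 8601
    run_id: str                    # ties every step of one agent run together
    event_type: Literal["model_call", "tool_call", "guardrail", "outcome"]
    model: str                     # model and version actually used
    prompt_hash: str               # hash rather than raw text, for auditability
    tokens: int
    latency_ms: float
    goal: str
    goal_met: Optional[bool]       # None until the run finishes
```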
Why this matters now
Traditional observability centers on services, endpoints, and traces. AI adds intent, context, and decisions that change on every run. If you can't see those layers, you'll struggle to diagnose failures, improve quality, or control cost.
The teams that log goals, instrument agent steps, and quantify outcomes will ship faster with fewer surprises. Everyone else will guess.
Skill up your team
If your organization is standing up agent projects, strengthen skills in prompting, evaluation, and AI safety. Start here: AI courses by job role