A research team from MIT and Microsoft has built a system called Murakkab that automatically optimizes multi-step AI workflows, slashing energy use and cloud costs without hurting performance. The system, presented at the USENIX Symposium on Operating Systems Design and Implementation, tackles a growing challenge: as agentic workflows become the backbone of cloud applications, their fragmented design often leads to overprovisioned resources and wasted computation.
Agentic workflows chain multiple AI models and external tools to handle complex tasks like analyzing video and answering questions. Developers typically must hardcode every technical choice (which models and tools to use, and in what order) and specify the hardware configuration upfront. This brittle approach forces a complete rework whenever a new model or accelerator appears, and it leaves cloud providers blind to how resources are actually being consumed.
The configuration conundrum
"Even if you wanted to do all this manually, it is unlikely that you'll be able to configure the workflow optimally because the space of possible configurations is so large," said Gohar Chaudhry, an EECS graduate student at MIT and lead author of the paper. The workflows combine black-box models and diverse tools from different vendors, each with its own settings, creating an explosion of combinatorial choices.
The result is a persistent efficiency gap. Cloud data centers allocate hardware without seeing inside the workflow, often over-allocating resources to be safe. For IT and development teams building agentic systems, this translates directly into inflated bills and avoidable carbon footprints.
How Murakkab adapts on the fly
Murakkab lets a developer describe what they want in plain language instead of coding every detail. The system automatically identifies the best models and tools, decides which components should run sequentially or in parallel, and selects suitable hardware configurations. It also adjusts these choices dynamically based on each user's priorities, such as minimizing cost or maximizing speed.
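As a rough illustration of the kind of priority-driven selection described above, the sketch below picks one configuration from a list of candidates by filtering on user constraints and then minimizing a chosen objective. It is not Murakkab's actual algorithm, which this article does not detail. The `Config` class, the `pick_config` function, the candidate names, and every number are hypothetical, invented only for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    """One hypothetical way to run a workflow (all fields are illustrative)."""
    name: str
    accuracy: float   # fraction of correct answers, 0..1
    latency_s: float  # end-to-end seconds per request
    cost_usd: float   # dollars per request
    energy_wh: float  # watt-hours per request


OBJECTIVES = {"latency_s", "cost_usd", "energy_wh"}


def pick_config(
    candidates: Sequence[Config],
    objective: str,
    min_accuracy: float,
    max_latency_s: Optional[float] = None,
) -> Optional[Config]:
    """Return the feasible candidate with the lowest value of `objective`.

    A candidate is feasible if it meets the accuracy floor and, when given,
    the latency ceiling. Returns None if no candidate is feasible.
    """
    if objective not in OBJECTIVES:
        raise ValueError(f"unknown objective: {objective!r}")
    feasible = [
        c for c in candidates
        if c.accuracy >= min_accuracy
        and (max_latency_s is None or c.latency_s <= max_latency_s)
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: getattr(c, objective))


# Made-up candidates: a model choice paired with a hardware choice.
CANDIDATES = [
    Config("large-model-8gpu", accuracy=0.92, latency_s=2.0, cost_usd=0.040, energy_wh=12.0),
    Config("medium-model-2gpu", accuracy=0.90, latency_s=3.5, cost_usd=0.012, energy_wh=3.5),
    Config("small-model-cpu", accuracy=0.81, latency_s=9.0, cost_usd=0.004, energy_wh=1.2),
]

# The same workflow resolves differently depending on the user's priority.
cheapest = pick_config(CANDIDATES, "cost_usd", min_accuracy=0.88)
fastest = pick_config(CANDIDATES, "latency_s", min_accuracy=0.88)
print(cheapest.name, fastest.name)
```

A real system would search a far larger space and would have to measure or predict these numbers rather than read them from a table; the point here is only that changing the objective or the constraints changes which configuration wins.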
When the cloud provider deploys the workflow, Murakkab gives it visibility across multiple workloads, so resources can be shared as efficiently as possible while still satisfying user constraints. This shifts the burden from manual tuning to automated, real-time decision-making. "Agentic workflows are getting very complicated and quickly becoming the backbone of what cloud providers are doing. Energy usage is a huge concern, so we need to be very careful about how efficient these workflows are. It is very easy to over-allocate resources, wasting energy and money. Enabling a cloud provider to intelligently make these workflows more resource-optimal is a win for everyone involved," Chaudhry said.
Murakkab's adaptive approach means a new GPU or model can be integrated without the developer rewriting the entire application.
Real-world gains in efficiency
Tests on agentic workloads for video Q&A and code generation showed striking results. Murakkab met user requirements while using only about 35 percent of the computation required by other methods. It consumed roughly 27 percent as much energy as the baseline and cost less than 25 percent as much. In one instance, the system cut energy consumption by more than an order of magnitude with only a 2 percent drop in accuracy.
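The figures above are stated as fractions of a baseline, which can be harder to compare than reductions or improvement factors. The short sketch below converts between the two. The `savings` helper is hypothetical, and the inputs are simply the rounded figures reported in this article, not new measurements.

```python
def savings(fraction_of_baseline: float) -> tuple[float, float]:
    """Convert 'uses X of the baseline' into (percent reduction, improvement factor)."""
    if fraction_of_baseline <= 0:
        raise ValueError("fraction_of_baseline must be positive")
    reduction_pct = (1.0 - fraction_of_baseline) * 100.0
    factor = 1.0 / fraction_of_baseline
    return reduction_pct, factor


# Rounded figures from the article, expressed as fractions of the baseline.
reported = {
    "computation": 0.35,  # "about 35 percent of the computation"
    "energy": 0.27,       # "roughly 27 percent as much energy"
    "cost": 0.25,         # "less than 25 percent", so this is an upper bound
}

for metric, fraction in reported.items():
    reduction_pct, factor = savings(fraction)
    print(f"{metric}: about {reduction_pct:.0f}% lower, about {factor:.1f}x better")
```

Read this way, the results amount to roughly 65 percent less computation, roughly 73 percent less energy (about a 3.7x improvement), and at least 75 percent lower cost (4x or better). "More than an order of magnitude" in the single best case means energy use fell below 10 percent of the baseline, a factor of more than 10.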
The platform also discovered an unexpectedly ideal configuration for a model that selects video frames, something a human developer would almost certainly miss. By constantly reevaluating the interplay of models, tools, and hardware, Murakkab finds efficiency gains that remain hidden in rigid, pre-set workflows.
Why this matters for IT, development, and research professionals
For anyone who builds, deploys, or relies on cloud-based AI services, Murakkab points to a future where agentic workflows no longer come with wasteful overhead. Rather than manually micro-optimizing components, teams can specify high-level goals and let the system handle the messy configuration work. That reduces both operational costs and the expertise barrier for deploying sophisticated multi-step AI.
Researchers and engineers can apply the same principles of dynamic, intent-driven optimization to other resource-intensive workloads. The paper offers a concrete blueprint for cutting energy use in data centers without sacrificing performance, a priority that directly affects budgets, sustainability targets, and the scalability of next-generation AI systems.