March 6, 2026
How Balyasny Asset Management built an AI research engine for investing
Balyasny Asset Management runs ~180 investment teams across asset classes and geographies. To keep conviction high and cycle times low in a flood of financial data, they built an Applied AI function: 20 researchers, engineers, and domain experts focused on AI-native tools that plug directly into team workflows. Their flagship: an AI research system that can reason, retrieve, and act like a skilled analyst without breaking compliance.
"AI is enabling our teams to apply first principles thinking faster, across more data, and with more structure." -Charlie Flanagan, Chief AI Officer
The problem with legacy research workflows
Analysts sift through thousands of sources (market data, broker notes, expert calls, and regulatory filings) under tight deadlines. Off-the-shelf tools struggle to handle structured and unstructured data together, lack orchestration, and fall short on institutional-grade compliance. Balyasny needed an AI system that thinks like an analyst, moves like software, and respects firmwide guardrails.
Four lessons from Balyasny's approach to AI at scale
1) Evaluate models before deploying them
Balyasny built a rigorous evaluation pipeline across 12+ dimensions: forecasting accuracy, numerical reasoning, scenario analysis, tool use, and resilience to noisy inputs. Tests run on internal benchmarks and proprietary data, surfacing strengths in the GPT-5.4 family, especially multi-step planning, tool execution, and reduced hallucinations. GPT-5.4 now operates as the reasoning engine alongside internal models, selected task-by-task on empirical performance.
"We evaluate models the way we evaluate investments: on fundamentals. GPT-5.4 proved it could plan, reason, and execute with real rigor." -Su Wang, Senior Research Scientist
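The article does not publish the evaluation harness itself, but the idea of scoring models per dimension on labeled cases can be sketched in a few lines. The dimension names, test cases, and the toy arithmetic "model" below are all illustrative placeholders, not Balyasny's actual benchmarks:

```python
from collections import defaultdict

# Hypothetical test cases: each case targets one evaluation dimension.
CASES = [
    {"dimension": "numerical_reasoning", "prompt": "2+2", "expected": "4"},
    {"dimension": "numerical_reasoning", "prompt": "10*3", "expected": "30"},
    {"dimension": "noise_tolerance", "prompt": "2 + 2 !!", "expected": "4"},
]

def toy_model(prompt: str) -> str:
    """Stand-in for a real model call; only handles simple arithmetic."""
    cleaned = "".join(ch for ch in prompt if ch in "0123456789+-*/ ")
    try:
        return str(eval(cleaned))
    except SyntaxError:
        return "error"

def evaluate(model, cases):
    """Return per-dimension accuracy so models can be compared task-by-task."""
    hits, totals = defaultdict(int), defaultdict(int)
    for case in cases:
        totals[case["dimension"]] += 1
        if model(case["prompt"]) == case["expected"]:
            hits[case["dimension"]] += 1
    return {dim: hits[dim] / totals[dim] for dim in totals}

print(evaluate(toy_model, CASES))
```

A real pipeline would swap `toy_model` for API calls and grow `CASES` into the 12+ dimensions described above; the per-dimension scores are what drive the task-by-task model selection.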
2) Put users and model builders in the same room
Balyasny involved OpenAI directly in user-facing workflows. Model teams observed live analyst sessions: where the system wins, where it fails, and what "good" looks like in production. The result: faster iterations, tighter feedback loops, and better behavior on finance-specific tasks. As a design partner on frontier releases, Balyasny surfaced insights from real analysts, not synthetic tests.
"We didn't just tell OpenAI what we needed. We showed them. And that made all the difference." -Jonathan Park, Product Manager
3) Design for feedback loops, not static tools
Because AI is embedded in daily workflows, Balyasny captures structured feedback in real time: user ratings, outcome audits, and tool execution quality. That loop improves both models and orchestration. Example: merger arbitrage teams needed agents to constantly re-evaluate deal probabilities as filings and press releases landed. The team extended agent planning and tool access, replacing a slow, manual workflow with real-time probabilistic monitoring.
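The article does not say how deal probabilities are recomputed; one standard way to do continuous re-evaluation (an illustrative sketch, not Balyasny's actual method) is a Bayesian update in odds space as each new event lands, with an analyst-assigned likelihood ratio per event type:

```python
def update_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayesian update in odds space: posterior odds = prior odds * LR."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical event stream with assumed likelihood ratios:
# LR > 1 favors deal completion, LR < 1 favors a break.
events = [
    ("second request from regulator", 0.5),
    ("merger agreement amended favorably", 2.0),
    ("shareholder vote passes", 3.0),
]

p = 0.70  # prior probability the deal closes
for name, lr in events:
    p = update_probability(p, lr)
    print(f"{name}: p(close) = {p:.2f}")
```

An agent wired this way only needs two things per incoming filing or press release: a classification of the event and a likelihood ratio, both of which are natural LLM outputs to audit.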
4) Centralize your AI system, and customize locally
An Applied AI team builds shared components-agent frameworks, toolchains, and compliance guardrails-and deploys them across strategies via a federated model. Each team gets scoped data and tools, while the platform scales centrally with consistent governance. This preserves speed and flexibility at the edge without sacrificing risk management.
"Our early investments in AI paid off. Today, every one of our investment teams can decide how to apply the latest AI to their process, in a secure environment and with real-time expert guidance." -Kevin Byrne, Chief Operating Officer
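The federated model described above can be pictured as a central registry of tools and data scopes with explicit per-team grants and deny-by-default authorization. The team names, tools, and data domains below are illustrative placeholders, not Balyasny's schema:

```python
from dataclasses import dataclass, field

@dataclass
class TeamScope:
    """Central platform defines the catalog; each team gets an explicit grant."""
    team: str
    allowed_tools: set = field(default_factory=set)
    data_domains: set = field(default_factory=set)

REGISTRY = {
    "macro": TeamScope("macro", {"web_search", "filings_lookup"}, {"rates", "fx"}),
    "merger_arb": TeamScope("merger_arb", {"filings_lookup", "deal_monitor"}, {"m&a"}),
}

def authorize(team: str, tool: str, domain: str) -> bool:
    """Deny by default: a call passes only if both tool and domain are granted."""
    scope = REGISTRY.get(team)
    return bool(scope) and tool in scope.allowed_tools and domain in scope.data_domains

print(authorize("macro", "web_search", "fx"))      # True
print(authorize("macro", "deal_monitor", "m&a"))   # False
```

Keeping the registry central is what makes governance consistent, while each team customizes only the grants, not the enforcement logic.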
A playbook delivering results in hours, not days
~95% of Balyasny investment teams now use the AI platform. Deep research tasks that took days finish in hours, with agents synthesizing tens of thousands of documents: filings, broker research, earnings, and expert calls. For context on filings, see the SEC EDGAR system.
A Central Bank Speech Analyst cut macro scenario analysis from 2 days to ~30 minutes. A Merger Arbitrage Superforecaster now updates deal probabilities continuously, replacing bespoke spreadsheets and manual alerts.
Confidence also increased. With scoped tools, traceable reasoning, and testable agents, analysts deliver structured, explainable insights that strengthen decision quality.
"It's like adding a teammate who never forgets, always cites sources, and double-checks the details before sending anything back." -Charlie Sweat, Portfolio Manager
The operating model behind the wins
- Evaluation-first: Internal benchmarks and red-team tests across reasoning, math, forecasting, and noise tolerance.
- Agentic by default: Multi-step planning, tool use, and continuous monitoring in production workflows.
- Compliance at the core: Data scoping, policy guardrails, audit trails, and explainability built into the stack.
- Federated deployment: Central platform; local customization by strategy, asset class, and data permissions.
- Live feedback loops: User evaluations, outcome audits, and automated checks drive weekly model and workflow improvements.
Build your own AI research engine: a practical roadmap
- Stand up an Applied AI team with engineering, research, and domain expertise. Give them a clear mandate and executive air cover.
- Define evaluation criteria linked to business outcomes (forecast error, scenario accuracy, tool success rate, time-to-insight).
- Create a model marketplace: frontier LLMs + internal models, selected per task based on measured performance.
- Adopt an agent framework with planning, retrieval, tool execution, and verification steps. Treat agents like products with owners.
- Instrument everything: log chains, tool calls, sources, and reasoning summaries for auditability and debugging.
- Enforce compliance and security centrally: data scoping, PII controls, content policies, and approval workflows.
- Embed feedback in the UI: quick ratings, error flags, and "request improvement" paths to close the loop.
- Prioritize high-value workflows first: e.g., event-driven monitoring, scenario analysis, and document synthesis at scale.
- Ship weekly: small improvements to tools, prompts, and routing often beat big-bang releases.
- Track ROI: adoption %, cycle-time reduction, accuracy lift, and incident rate; review with leadership monthly.
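The "model marketplace" step above reduces to a simple empirical router: pick whichever model has the best measured score for the task at hand, falling back to a default when no benchmark exists. Model names, task types, and scores below are placeholders:

```python
# Measured per-task scores from an evaluation pipeline (placeholder values).
SCOREBOARD = {
    "scenario_analysis": {"frontier-llm": 0.91, "internal-finance-model": 0.84},
    "entity_extraction": {"frontier-llm": 0.88, "internal-finance-model": 0.93},
}

def route(task_type: str, default: str = "frontier-llm") -> str:
    """Select the model with the highest measured score for this task type."""
    scores = SCOREBOARD.get(task_type)
    if not scores:
        return default  # fall back when no benchmark data exists yet
    return max(scores, key=scores.get)

print(route("scenario_analysis"))   # frontier-llm
print(route("entity_extraction"))   # internal-finance-model
```

Because the scoreboard is data, re-running the evaluation pipeline after a new model release updates routing without code changes, which is what keeps the marketplace honest.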
Metrics that matter
- Adoption rate by team and role
- Time-to-insight reduction for priority workflows
- Forecast accuracy and backtest performance impact
- Tool execution success and hallucination rate
- Auditability: % of outputs with sources and reasoning summaries
- Compliance exceptions and remediation time
- User satisfaction (CSAT) and retained usage over 90 days
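Several of these metrics fall out of plain aggregation over per-interaction logs, provided the logs carry the right fields. A toy sketch (the record schema is assumed, not Balyasny's):

```python
# Hypothetical interaction log records.
logs = [
    {"team": "macro", "has_sources": True,  "tool_calls": 4, "tool_errors": 0},
    {"team": "macro", "has_sources": False, "tool_calls": 2, "tool_errors": 1},
    {"team": "merger_arb", "has_sources": True, "tool_calls": 5, "tool_errors": 0},
]

def auditability(records) -> float:
    """Fraction of outputs that carried sources / reasoning summaries."""
    return sum(r["has_sources"] for r in records) / len(records)

def tool_success(records) -> float:
    """Share of tool calls that completed without error."""
    calls = sum(r["tool_calls"] for r in records)
    errors = sum(r["tool_errors"] for r in records)
    return (calls - errors) / calls

print(f"auditability: {auditability(logs):.0%}")
print(f"tool success: {tool_success(logs):.0%}")
```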
What's next on Balyasny's AI roadmap
Reinforcement Fine-Tuning (RFT) to sharpen behavior on complex, high-value tasks. Deeper agent orchestration across financial domains. Multimodal inputs (charts, statements, filings) to align models with how analysts actually work. And continuous evaluation of future frontier models for domain fit and security.