Catalog AI: How Abhishek Agrawal Made Amazon Search Feel Smarter, and What Product Teams Can Learn
If Amazon feels easier to use lately, with clearer titles, richer images, and better suggestions, you're noticing the output of Catalog AI. Built under the leadership of Abhishek Agrawal at Amazon Web Services, the system updates product listings with cleaner structure and richer detail. That upgrade feeds directly into predictive search, which now suggests relevant items in real time as you type.
For product developers, the story behind Catalog AI is a playbook: standardize data, instrument feedback loops, ship experiments early, and scale with automation when manual work hits its ceiling.
The Problem: Messy Catalogs Create Friction
Amazon's catalog relied on third-party sellers who entered inconsistent data. Titles varied, specs were incomplete, and product attributes were all over the place. That noise fed into search and confused shoppers.
Agrawal's team tackled the root cause. They created a shared glossary of attributes (dimensions, color, manufacturer) and auto-suggested the standard language as sellers typed. Consistency went up. Search got cleaner.
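In spirit, that auto-suggest step fits in a few lines. The glossary entries and the `suggest` helper below are illustrative assumptions, not Amazon's actual vocabulary or code:

```python
# A minimal sketch of glossary-backed auto-suggestion for seller input.
# The attribute names and values here are made-up examples.

GLOSSARY = {
    "color": ["red", "blue", "stainless steel"],
    "dimensions": ["height", "width", "depth"],
    "manufacturer": [],  # free text, but the key name itself is standardized
}

def suggest(attribute: str, prefix: str, limit: int = 5) -> list[str]:
    """Return standard values matching what the seller has typed so far."""
    values = GLOSSARY.get(attribute, [])
    prefix = prefix.lower()
    return [v for v in values if v.startswith(prefix)][:limit]
```

The point is not the matching logic, which any autocomplete library handles, but that the suggestions come from one controlled vocabulary shared by every authoring surface.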
From Glossary to Predictive Search
Once the data was structured, the team wired it into search. Type "red mixer," and real products appear under the search bar that actually match your intent. The loop is tight: structured inputs → clearer listings → better suggestions → faster decisions.
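That loop can be illustrated with a toy predictive-search function. The product records and matching rule here are simplified assumptions; the key idea is matching query tokens against structured attributes rather than raw title strings:

```python
# Illustrative sketch: predictive suggestions over structured listings.
# Product data and matching logic are simplified, hypothetical examples.

PRODUCTS = [
    {"title": "Red Stand Mixer", "color": "red", "category": "mixer"},
    {"title": "Red Hand Mixer", "color": "red", "category": "mixer"},
    {"title": "Blue Blender", "color": "blue", "category": "blender"},
]

def predict(query: str, limit: int = 5) -> list[str]:
    """Suggest titles whose structured attributes cover every query token."""
    tokens = query.lower().split()
    hits = []
    for p in PRODUCTS:
        attrs = {p["color"], p["category"]}
        if all(any(a.startswith(t) for a in attrs) for t in tokens):
            hits.append(p["title"])
    return hits[:limit]
```

Because the match runs over normalized attributes, "red mixer" finds both mixers even though neither title contains that exact phrase ordering.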
Scaling Up With LLMs
Manual normalization couldn't keep pace with the size of the catalog. In 2023, the team built an AI layer to ingest product information from across the Web and use large language models to fix titles, fill in missing specs, and correct errors. Listings became more complete and more consistent, at scale.
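A hedged sketch of the guardrail side of such a layer: whatever the model returns gets filtered against a fixed schema and controlled vocabulary before it touches a listing. The model call itself is out of scope here, and the schema, field names, and `validate` helper are hypothetical:

```python
# Sketch of a schema guardrail around an LLM normalizer. The schema and
# allowed values below are invented for illustration.

SCHEMA = {
    "title": str,
    "color": str,
    "weight_kg": float,
}
ALLOWED_COLORS = {"red", "blue", "black", "stainless steel"}

def validate(candidate: dict) -> dict:
    """Keep only fields that fit the schema; reject out-of-vocabulary values."""
    clean = {}
    for field, typ in SCHEMA.items():
        value = candidate.get(field)
        if not isinstance(value, typ):
            continue  # drop missing or mistyped fields rather than guess
        if field == "color" and value not in ALLOWED_COLORS:
            continue  # model invented a value outside the glossary
        clean[field] = value
    return clean
```

The design choice worth copying is that the model proposes and the schema disposes: nothing the LLM writes reaches the catalog without passing deterministic checks.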
According to a July report, Amazon projected the system could lift sales this year by roughly US $7.5 billion.
Earlier Lessons: Building Bing With Scarce Data
Before Amazon, Agrawal helped turn Microsoft's Live Search into Bing. The team lacked enough user data for reliable machine learning, especially for local queries. So they leaned on deterministic algorithms to extract structured signals like locations, dates, and prices.
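Deterministic extraction of that kind can be as plain as a few regular expressions. The patterns below are simplified illustrations, not Bing's actual rules:

```python
import re

# Deterministic extractors of the kind useful before ML signals exist.
# These patterns are toy examples covering prices and ISO-style dates.

PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract_signals(text: str) -> dict:
    """Pull structured signals out of free text with fixed rules."""
    return {
        "prices": PRICE_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }
```

Rules like these are brittle compared with learned models, but they produce structured signals on day one, with zero training data.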
He shipped a query clarifier to help users refine intent, then ranked results from most to least relevant. To validate improvements, the team built an online A/B experimentation platform that scored user engagement and performance across variants. That platform later scaled across Microsoft products.
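The scoring half of such a platform reduces, at its simplest, to comparing engagement rates across variants. This is a minimal sketch under assumed inputs, not the platform's real scorecard, which would also track statistical significance and guardrail metrics:

```python
# Minimal sketch of per-variant scoring in an A/B experiment.
# Input shape and metric are assumptions for illustration.

def score_variants(results: dict) -> str:
    """results maps variant -> (engaged_users, total_users).

    Returns the variant with the highest engagement rate."""
    rates = {v: engaged / total for v, (engaged, total) in results.items()}
    return max(rates, key=rates.get)
```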
Shipping Culture: Experimentation First
When Agrawal moved to Microsoft's Seattle campus, teams were shipping features and checking results after launch. He pushed for the experimentation platform to become a company-wide gate: test first, ship second. Within six months, it was in place and releases stabilized.
On Microsoft Teams, he addressed notification overload with "Trending," which surfaced the five most important messages. He also led early feature work for emoji reactions, screen sharing, and video calls: small quality improvements that add up when measured and iterated.
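"Trending"-style surfacing is, at heart, a top-k selection over scored messages. A minimal sketch, assuming each message already carries an importance score (the scoring model is the hard part and is omitted here):

```python
# Toy top-k surfacing over pre-scored messages. The "score" field and
# its meaning are assumptions; real importance scoring is a model of its own.

def trending(messages: list[dict], k: int = 5) -> list[dict]:
    """Return the k messages with the highest importance score."""
    return sorted(messages, key=lambda m: m["score"], reverse=True)[:k]
```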
Career Snapshot
- Roots: Grew up in Chirgaon, Uttar Pradesh, India. First saw a computer at the Indian Statistical Institute.
- Education: University of Allahabad (statistics) and Indian Statistical Institute (statistics and later computer science). Early exposure to fuzzy c-means clustering for medical imaging sparked his interest in AI.
- Industry: Novell (file sync), Microsoft (OS upgrades, Bing, experimentation platform, Teams), Amazon (Catalog AI).
- Community: IEEE senior member and active volunteer in the Seattle Section, organizing workshops on building autonomous AI agents and contributing as a peer reviewer.
Product Takeaways You Can Use
- Standardize language before you optimize search. Build a shared glossary and suggest it in the authoring UI. Garbage in, garbage out is real.
- When data is scarce, structure it. Deterministic rules and clarifiers can bootstrap quality until you have enough signals for machine learning.
- Make experimentation a default, not a project. Invest in an A/B platform that gates releases, tracks guardrail metrics, and auto-scores variants.
- Design for the moment of intent. Predictive suggestions should be driven by structured attributes that mirror how users think (color, size, brand, use case).
- Automate cleanup at scale with LLMs, and add guardrails. Use models to fill gaps and fix errors, but constrain outputs with your schema and glossary.
- Reduce cognitive load, then add features. Prioritize the alerts or content that matter most; polish comes after clarity.
A 30-60-90 Plan to Ship Your Own "Catalog AI"
- Days 0-30: Audit your catalog or content model. Define a controlled vocabulary for core attributes. Add inline suggestions to your authoring tools. Instrument baseline metrics (completion rates, search CTR, conversion, returns).
- Days 31-60: Wire the glossary into search and recommendations. Add a query clarifier for ambiguous intent. Stand up an A/B platform with automated scorecards and guardrails.
- Days 61-90: Pilot an LLM service that rewrites titles, fills missing specs, and flags inconsistencies. Constrain outputs with your schema. Run staged experiments, then ramp by segment.
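For the staged ramp in that last phase, a deterministic hash bucket is one common way to route a fixed fraction of each segment into the pilot. The segment names and fractions below are placeholders:

```python
import hashlib

# Hedged sketch of a staged ramp by segment: a stable hash of the item ID
# routes a fixed fraction of each segment into the pilot. Segments and
# fractions are illustrative placeholders.

RAMP = {"kitchen": 0.10, "electronics": 0.05}  # pilot fraction per segment

def in_pilot(item_id: str, segment: str) -> bool:
    """Deterministically decide whether an item is in the pilot group."""
    fraction = RAMP.get(segment, 0.0)
    bucket = int(hashlib.sha256(item_id.encode()).hexdigest(), 16) % 10_000
    return bucket < fraction * 10_000
```

Hashing (rather than random sampling) keeps assignment stable across runs, so an item never flips between pilot and control mid-experiment.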
Why This Works
You reduce entropy at the source, then feed that order into search and recommendations. As structure improves, each new feature compounds the gains. Experiments keep you honest, and AI scales what your team can't do by hand.
Further Resources
- Overview of controlled online experiments (A/B testing): Wikipedia
- Coverage of Amazon's projected sales lift from Catalog AI: Fox News
Level Up Your Team's AI Skills
If you're a product lead or engineer building with LLMs and search, structured learning helps. See curated options by role: Complete AI Training - Courses by Job.