Google launches Gemini Omni to generate and edit video from text, images and audio

Google launched Gemini Omni at I/O 2026, an AI model that generates and edits videos from text, images, audio, and video inputs via natural language. Gemini Omni Flash is live now for AI Plus, Pro, and Ultra subscribers.

Published on: May 21, 2026

Google Rolls Out Gemini Omni for AI Video Generation With Conversational Editing

Google announced Gemini Omni at I/O 2026, a new AI model that generates editable videos from text, images, audio and video inputs through natural language instructions. The first version, Gemini Omni Flash, is now available to Google AI Plus, Pro and Ultra subscribers through the Gemini app, Google Flow and YouTube Shorts.

The model marks Google's expansion into multimodal video creation, combining Gemini's reasoning abilities with generative tools to produce video outputs. Support for additional output formats, including images and audio, will roll out in the coming months.

Conversational Editing Replaces Repeated Prompts

Gemini Omni uses conversational editing, allowing users to refine videos through multiple instructions without restarting the creative process. Characters remain consistent across scenes, and edits retain context from earlier prompts.

Users can alter environments, change actions, add objects or introduce new elements while maintaining scene continuity. The model draws on an understanding of physics and broader contextual knowledge to make the results more realistic.

Combining Multiple Media Types Into Single Output

The system accepts existing videos, images, sketches and audio files as references and combines them into a single output. Gemini Omni also draws on knowledge of history, science and cultural context, so it can produce explainers and other visual storytelling formats as well as purely creative content.

Google also introduced avatar features that let users create digital versions of themselves, speaking in their own voice, for use in AI-generated videos.

What This Means for Marketers

For marketing professionals, Gemini Omni reduces friction in video production workflows. Conversational editing eliminates the need to restart after each revision, cutting iteration time for campaign videos, product demos and social content.

The ability to combine multiple media types (existing brand assets, audio voiceovers, reference images) into a single coherent video simplifies asset repurposing. Avatar creation enables personalized video content at scale without requiring on-camera talent or production crews.
