Google DeepMind launches Gemini Omni, a video generation and editing model that takes multimodal inputs

Google launched Gemini Omni Flash, a video model that creates and edits clips through conversational prompts that can mix text, image, audio and video inputs. It's available now to paid Gemini subscribers, with free access coming to YouTube Shorts this week.

Published on: May 22, 2026
Google Releases Gemini Omni, a Video Generation Model That Edits Through Conversation

Google has launched Gemini Omni Flash, a video generation model that creates and edits videos from any combination of text, image, audio and video inputs. The tool rolls out today to Google AI Plus, Pro and Ultra subscribers through the Gemini app and Google Flow, with free access launching this week on YouTube Shorts and YouTube Create.

The model represents a shift in how video editing works. Instead of clicking buttons and dragging timelines, users describe changes in natural language. A user can ask the system to "dim the lights" or "make the mirror ripple like liquid," and the model applies those edits while maintaining character consistency and physical accuracy across multiple turns.

How It Works

Gemini Omni combines multiple input types into a single output video. Users can reference an image, video clip and audio track in one prompt, and the model generates a new video that incorporates all three references cohesively.

The system handles iterative editing. A user can start with a video of a violinist, ask to move the violinist to a different environment, then make the violin invisible, then change the camera angle. Each instruction builds on the previous one without losing the original scene's context.

Physics modeling is a core feature. The model demonstrates understanding of gravity, kinetic energy and fluid dynamics, allowing it to generate realistic motion in complex scenarios like a marble rolling through a chain reaction or fluid responding to forces.

Content Verification and Safety

All videos created with Gemini Omni include an imperceptible SynthID digital watermark. Users can verify whether a video was generated with the tool through the Gemini app, Gemini in Chrome and Google Search.

The company has restricted certain capabilities during this initial rollout. Users can create videos with their own voice through an avatar feature, but the ability to edit audio and speech in existing videos remains in testing. Google said it is working to understand how to release that capability responsibly.

Availability and Rollout

Developers and enterprise customers will gain access via APIs in the coming weeks. Google plans to expand the Omni family to support image and audio generation as output modalities over time.

The launch builds on earlier multimodal work. Last year, Google added image generation and editing to Gemini, which the company said millions of people have used to restore photos and create designs from sketches.

