Google Releases Gemini Omni, a Video Generation Model That Edits Through Conversation
Google has launched Gemini Omni Flash, a video generation model that creates and edits videos from any combination of text, image, audio and video inputs. The tool rolls out today to Google AI Plus, Pro and Ultra subscribers through the Gemini app and Google Flow, with free access launching this week on YouTube Shorts and YouTube Create.
The model represents a shift in how video editing works. Instead of clicking buttons and dragging timelines, users describe changes in natural language. A user can ask the system to "dim the lights" or "make the mirror ripple like liquid," and the model applies those edits while maintaining character consistency and physical accuracy across multiple turns.
How It Works
Gemini Omni combines multiple input types into a single output video. Users can reference an image, video clip and audio track in one prompt, and the model generates a new video that incorporates all three references cohesively.
The system handles iterative editing. A user can start with a video of a violinist, ask to move the violinist to a different environment, then make the violin invisible, then change the camera angle. Each instruction builds on the previous one without losing the original scene's context.
Physics modeling is a core feature. The model demonstrates understanding of gravity, kinetic energy and fluid dynamics, allowing it to generate realistic motion in complex scenarios like a marble rolling through a chain reaction or fluid responding to forces.
Content Verification and Safety
All videos created with Gemini Omni include an imperceptible SynthID digital watermark. Users can verify whether a video was generated with the tool through the Gemini app, Gemini in Chrome and Google Search.
The company has restricted certain capabilities during this initial rollout. Users can create videos with their own voice through an avatar feature, but the ability to edit audio and speech in existing videos remains in testing. Google said it is working to understand how to bring that capability to users responsibly.
Availability and Rollout
Developers and enterprise customers will gain access via APIs in the coming weeks. Google plans to expand the Omni family to support image and audio generation as output modalities over time.
The launch builds on earlier multimodal work. Last year, Google added image generation and editing to Gemini, which the company said has been used by millions of people to restore photos and design from sketches.