Google DeepMind launches Gemini Omni, a video generation and editing model that takes multimodal inputs

Google launched Gemini Omni Flash, a video model that creates and edits clips through plain-text conversation using mixed inputs. It's available now to paid Gemini subscribers, with free access coming to YouTube Shorts this week.

Google Releases Gemini Omni, a Video Generation Model That Edits Through Conversation

Google has launched Gemini Omni Flash, a video generation model that creates and edits videos from any combination of text, image, audio and video inputs. The tool rolls out today to Google AI Plus, Pro and Ultra subscribers through the Gemini app and Google Flow, with free access launching this week on YouTube Shorts and YouTube Create.

The model represents a shift in how video editing works. Instead of clicking buttons and dragging timelines, users describe changes in natural language. A user can ask the system to "dim the lights" or "make the mirror ripple like liquid," and the model applies those edits while maintaining character consistency and physical accuracy across multiple turns.

How It Works

Gemini Omni combines multiple input types into a single output video. Users can reference an image, video clip and audio track in one prompt, and the model generates a new video that incorporates all three references cohesively.

The system handles iterative editing. A user can start with a video of a violinist, ask to move the violinist to a different environment, then make the violin invisible, then change the camera angle. Each instruction builds on the previous one without losing the original scene's context.
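
The multi-turn flow described above can be pictured as an edit history that grows with each instruction. The sketch below is purely illustrative: `EditSession`, `apply` and `context` are hypothetical names invented for this example, not part of any Google API, and nothing here calls a real model. It only shows why keeping every earlier instruction lets a later edit build on the same scene.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Hypothetical model of a multi-turn edit; not a Google API."""
    source: str  # label for the starting clip
    instructions: list[str] = field(default_factory=list)

    def apply(self, instruction: str) -> "EditSession":
        # Append each turn so later edits can see all earlier ones.
        self.instructions.append(instruction)
        return self

    def context(self) -> str:
        # Join the starting clip and every instruction so far into one string.
        steps = "; then ".join(self.instructions)
        return f"{self.source}: {steps}" if steps else self.source


session = EditSession("video of a violinist")
session.apply("move the violinist to a different environment")
session.apply("make the violin invisible")
session.apply("change the camera angle")
print(session.context())
```

Dropping the history and sending only the latest instruction would lose the earlier edits, which is the failure the article says the model avoids.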

Physics modeling is a core feature. The model demonstrates an understanding of gravity, kinetic energy and fluid dynamics, allowing it to generate realistic motion in complex scenarios such as a marble rolling through a chain reaction or fluid responding to forces.

Content Verification and Safety

All videos created with Gemini Omni include an imperceptible SynthID digital watermark. Users can verify whether a video was generated with the tool through the Gemini app, Gemini in Chrome and Google Search.

The company has restricted certain capabilities during this initial rollout. Users can create videos with their own voice through an avatar feature, but the ability to edit audio and speech in existing videos remains in testing. Google said it is working to understand how to release that capability responsibly.

Availability and Rollout

Developers and enterprise customers will gain access via APIs in the coming weeks. Google plans to expand the Omni family to support image and audio generation as output modalities over time.

The launch builds on earlier multimodal work. Last year, Google added image generation and editing to Gemini, which the company said has been used by millions of people to restore photos and design from sketches.
