Researchers Develop AI System for Text-Driven Video Editing
Researchers at the Katz School of Science and Health have built an AI system that edits and generates videos from written instructions. The system, called 4EV, lets users type commands like "make the car drive left" or "change the background to a beach," and the AI adjusts the video accordingly. The work was published in IEEE Access.
Video editing with AI is fundamentally harder than image generation. A video is hundreds or thousands of frames played in sequence, so the system must keep objects consistent across frames and make motion look natural, a task where human perception is unforgiving.
The 4EV system uses generative video technology built on diffusion models, the same class of tools behind image generators like Stable Diffusion. But the researchers added a critical layer: spatial-temporal attention, which teaches the AI to track both what appears in each frame and how objects move between frames.
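The article does not describe 4EV's architecture in detail, so what follows is only a generic sketch of factorized spatial-temporal attention. This is one common way video models split the two jobs described above: one attention step looks within each frame, and a second looks across frames. The sketch assumes a toy feature tensor of shape (frames, tokens per frame, channels) and uses plain NumPy with no learned weights. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the token axis (second to last).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def spatial_temporal_attention(video):
    """Hypothetical factorized attention on video features of shape (T, N, D).

    T = frames, N = spatial tokens per frame, D = channels.
    Assumption: Q = K = V = input; learned projections, multiple heads,
    and residual connections are omitted for brevity.
    """
    # Spatial step: within each frame, every token attends to the other
    # tokens of that same frame (what appears in the frame).
    spatial, spatial_w = attention(video, video, video)          # (T, N, D)
    # Temporal step: the token at each spatial position attends to the tokens
    # at that same position in all T frames (how that region changes over time).
    per_position = np.swapaxes(spatial, 0, 1)                    # (N, T, D)
    temporal, temporal_w = attention(per_position, per_position, per_position)
    return np.swapaxes(temporal, 0, 1), spatial_w, temporal_w   # output back to (T, N, D)

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 16, 32))  # toy input: 8 frames, 16 tokens per frame, 32 channels
out, spatial_w, temporal_w = spatial_temporal_attention(video)
print(out.shape, spatial_w.shape, temporal_w.shape)  # (8, 16, 32) (8, 16, 16) (16, 8, 8)
```

Splitting attention this way keeps the cost manageable. Each step compares tokens only within a frame (N by N) or only across time at one position (T by T). It never compares every token in every frame against every other token at once.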
The team created a custom training dataset called Motion4EV, containing videos paired with text descriptions of movement. This exposed the model to diverse motion patterns (objects following paths, zooming, changing direction) so it could learn motion dynamics.
A second innovation lets the system preserve motion from the original video while applying text-based changes. You could turn a bicycle into a motorcycle while keeping the exact same motion path. The researchers used a technique called attention map injection to ensure the AI modifies the right objects without disrupting the rest of the scene.
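The article does not spell out how 4EV implements attention map injection. A minimal sketch of the general idea follows, in the style popularized by Prompt-to-Prompt image editing. First, record the cross-attention map from a pass that uses the original prompt. Then reuse that map in a pass with the edited prompt, so each image region still attends to the same word positions while the content of the swapped word changes. Everything here is a hypothetical assumption, including the toy sizes, the random stand-ins for prompt embeddings, and the `cross_attention` helper. The two prompts are also assumed to have the same token length.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, text_keys, text_values, injected_map=None):
    """Hypothetical helper. queries: (P, D) patch features; text_keys/values: (L, D) prompt tokens."""
    scores = queries @ text_keys.T / np.sqrt(queries.shape[-1])  # (P, L)
    attn_map = softmax(scores, axis=-1)
    if injected_map is not None:
        # Injection: reuse the source pass's map so each patch keeps attending
        # to the same word positions, preserving layout (and, per frame, motion).
        attn_map = injected_map
    return attn_map @ text_values, attn_map

# Hypothetical toy setup: random vectors stand in for real prompt embeddings.
rng = np.random.default_rng(0)
P, L, D = 64, 5, 16
queries = rng.standard_normal((P, D))
src_keys, src_values = rng.standard_normal((L, D)), rng.standard_normal((L, D))  # e.g. "a bicycle ..."
edit_keys, edit_values = src_keys.copy(), src_values.copy()
edit_keys[2] = rng.standard_normal(D)    # swap one token, e.g. "bicycle" -> "motorcycle"
edit_values[2] = rng.standard_normal(D)

# Pass 1: original prompt, record the attention map.
_, source_map = cross_attention(queries, src_keys, src_values)
# Pass 2: edited prompt, with the source map injected in place of its own.
edited_out, used_map = cross_attention(queries, edit_keys, edit_values, injected_map=source_map)
print(np.allclose(used_map, source_map))  # True
```

Because the injected map fixes where each region looks, only the swapped token's values change the output. In a video model the same substitution would be repeated for every frame, which is one way the original motion path could carry over to the edited object.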
When tested against competing systems like Text2LIVE and CogVideo, 4EV scored higher on metrics measuring how closely the output matched the written prompt and how smoothly motion was preserved.
Practical Applications
Film and media production could use video editing tools like this to speed up visual effects and pre-production work. Game designers might generate animated scenes quickly. Teachers could create customized visual demonstrations by writing a prompt. The technology could also support virtual reality environments that change dynamically based on user input.
The research is still in development, but it demonstrates how AI can make video creation more accessible to people without technical editing expertise.