From song to screen for the cost of an API call: AutoMV's open-source AI makes full-length music videos

AutoMV tackles long-form music-to-video generation with a multi-agent pipeline that keeps the story, timing, and characters on track. Released as open source, it brings full-track video generation within reach.

Categorized in: AI News, Science and Research
Published on: Jan 07, 2026

Music-to-video generation has hit a wall: short clips look good, but long-form videos break down on story, timing, and character consistency. AutoMV, an open-source system led by researchers at Queen Mary University of London with partners at Beijing University of Posts and Telecommunications, Nanjing University, Hong Kong University of Science and Technology, and the University of Manchester, goes after that problem head-on. It's built to generate complete music videos directly from full-length songs.

The effort brings together music information retrieval, multimodal AI, and creative computing. The team includes Dr Emmanouil Benetos, PhD candidate Yinghao Ma, and collaborators Dr Changjae Oh and Chaoran Zhu from the Centre for Intelligent Sensing.

Why long-form music videos are hard for AI

  • Maintaining narrative flow over several minutes without visual drift.
  • Staying aligned with beats, sections, and time-aligned lyrics.
  • Keeping characters, scenes, and visual identity consistent.
  • Planning shots that make sense across verses, choruses, and bridges.

How AutoMV works

AutoMV operates like a virtual production crew. It first analyzes the song's structure, beats, and time-aligned lyrics to build a timeline that anchors editing decisions and narrative pacing.
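To make that concrete, here is a minimal sketch of what such a structural-analysis step could look like, assuming librosa for beat tracking and a simple chroma-based segmentation. The function name, section count, and output fields are illustrative assumptions, not AutoMV's actual code.

```python
import librosa

def build_timeline(audio_path: str, n_sections: int = 8) -> dict:
    """Coarse song timeline: global tempo plus sections with their beats."""
    y, sr = librosa.load(audio_path)

    # Beat grid: global tempo estimate and beat positions in seconds.
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)

    # Rough section boundaries via agglomerative clustering of chroma features.
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    boundary_frames = librosa.segment.agglomerative(chroma, n_sections)
    boundaries = list(librosa.frames_to_time(boundary_frames, sr=sr))
    boundaries.append(librosa.get_duration(y=y, sr=sr))

    sections = []
    for i, (start, end) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        sections.append({
            "id": i,
            "start": float(start),
            "end": float(end),
            "beats": [float(t) for t in beat_times if start <= t < end],
            # Time-aligned lyric lines would be attached here from a separate
            # alignment step (not shown).
            "lyrics": [],
        })
    return {"tempo": float(tempo), "sections": sections}
```

A timeline like this gives the downstream agents the hooks they need: section boundaries for scene breaks and beat positions for cut timing.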

Specialized agents then take over: a "screenwriter" drafts scene plans, a "director" translates them into visual prompts, and an "editor" assembles images and video clips into coherent sequences. A final verifier agent checks for coherence, identity consistency, and synchronization, and triggers regeneration when needed.
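The hand-off between those roles can be pictured as a small orchestration loop. The sketch below is an assumption-laden illustration: the agent interfaces, the rendering and scoring stubs, and the regeneration threshold are placeholders rather than AutoMV's implementation.

```python
from dataclasses import dataclass

@dataclass
class ScenePlan:
    section_id: int
    description: str         # screenwriter output: what happens in this scene
    visual_prompt: str = ""  # director output: prompt for the video backend
    clip_path: str = ""      # editor output: path to the assembled clip

def render_clip(prompt: str) -> str:
    # Placeholder for a text/image-to-video generation backend.
    return f"clips/{abs(hash(prompt))}.mp4"

def score_clip(path: str) -> float:
    # Placeholder for the verifier's coherence/identity/sync score in [0, 1].
    return 0.9

def screenwriter(section: dict) -> ScenePlan:
    # In practice an LLM would draft this from the section's lyrics and its
    # role in the song (verse, chorus, bridge); stubbed here.
    return ScenePlan(
        section_id=section["id"],
        description=f"Scene covering {section['start']:.1f}-{section['end']:.1f}s",
    )

def director(plan: ScenePlan) -> ScenePlan:
    plan.visual_prompt = f"Cinematic shot: {plan.description}; keep the lead character consistent"
    return plan

def editor(plan: ScenePlan) -> ScenePlan:
    plan.clip_path = render_clip(plan.visual_prompt)
    return plan

def verifier(plan: ScenePlan) -> float:
    return score_clip(plan.clip_path)

def produce(sections: list[dict], threshold: float = 0.7, max_retries: int = 2) -> list[ScenePlan]:
    plans = []
    for section in sections:
        plan = screenwriter(section)
        for _ in range(max_retries + 1):
            plan = editor(director(plan))
            if verifier(plan) >= threshold:
                break  # clip passes the coherence/identity/sync check
        plans.append(plan)
    return plans
```

The design point worth noting is the bounded retry loop: the verifier gates each clip and requests regeneration only until a quality threshold or retry limit is reached, which keeps long-form generation from stalling on a single difficult scene.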

The result is a full-length music video that follows the song end-to-end while keeping story and visuals consistent.

What the evaluations show

Human expert assessments indicate AutoMV outperforms existing commercial tools on alignment, narrative flow, and character consistency, narrowing the gap with professionally produced music videos.

There's also a cost shift: projects that typically require tens of thousands of pounds can move closer to the cost of an API call. That matters for independent musicians, educators, and small studios.

Open-source, reproducible, and ready for contributions

AutoMV is open-source and built for transparent, reproducible research. The team is inviting contributions on the codebase, benchmarking, and extensions to long-form, multimodal workflows.

Who this can help right now

  • Independent musicians and labels: Produce narrative videos for full tracks without a large budget.
  • Educators: Build classroom material that links music structure to visual storytelling.
  • Research labs: Test multi-agent planning, long-horizon control, and MIR-driven alignment on a real creative task.
  • HCI and evaluation teams: Study human preferences, editing criteria, and perceptual metrics for long-form generative video.

Technical angles worth exploring

  • Structure-aware planning: Using beat and section analysis to guide shot lists and transitions.
  • Identity preservation: Keeping character visual traits stable across scenes and costume changes.
  • Verifier loops: Automated checks for timing, continuity, and re-generation thresholds.
  • Evaluation protocols: Designing human-in-the-loop rating schemes and MIR-aligned quantitative metrics (see the alignment-metric sketch after this list).
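As an example of that last angle, here is a hedged sketch of one possible MIR-aligned metric: the share of shot cuts that land within a small tolerance of a beat. The tolerance value and the input format (cut and beat times in seconds) are assumptions for illustration.

```python
def beat_cut_alignment(cut_times: list[float],
                       beat_times: list[float],
                       tol: float = 0.1) -> float:
    """Fraction of cuts within `tol` seconds of the nearest beat."""
    if not cut_times or not beat_times:
        return 0.0
    hits = sum(
        1 for c in cut_times
        if min(abs(c - b) for b in beat_times) <= tol
    )
    return hits / len(cut_times)

# Example: three of the four cuts fall on or near the beat grid.
beats = [0.5, 1.0, 1.5, 2.0, 2.5]
cuts = [0.52, 1.48, 2.0, 2.23]
print(beat_cut_alignment(cuts, beats))  # 0.75
```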

If you work in music information retrieval, AutoMV is a practical testbed for structure-driven generation. For community context on MIR research, see ISMIR.

Get started

Read the paper, clone the repository, and try a full-track workflow. Share issues and benchmarks to help the community push toward longer, more coherent, and better-aligned multi-agent generation.

Exploring tooling around generative video for your lab or studio? Here's a curated overview of options: Generative Video Tools.

