From song to screen for the cost of an API call: AutoMV's open-source AI makes full-length music videos

AutoMV tackles long-form music-to-video generation with a multi-agent pipeline that keeps the story, timing, and characters on track. Released as open source, it brings full-track video generation within reach.

Categorized in: AI News, Science and Research
Published on: Jan 07, 2026

Music-to-video generation has hit a wall: short clips look good, but long-form videos break down on story, timing, and character consistency. AutoMV, an open-source system led by researchers at Queen Mary University of London with partners at Beijing University of Posts and Telecommunications, Nanjing University, Hong Kong University of Science and Technology, and the University of Manchester, goes after that problem head-on. It's built to generate complete music videos directly from full-length songs.

The effort brings together music information retrieval, multimodal AI, and creative computing. The team includes Dr Emmanouil Benetos, PhD candidate Yinghao Ma, and collaborators Dr Changjae Oh and Chaoran Zhu from the Centre for Intelligent Sensing.

Why long-form music videos are hard for AI

  • Maintaining narrative flow over several minutes without visual drift.
  • Staying aligned with beats, sections, and time-aligned lyrics.
  • Keeping characters, scenes, and visual identity consistent.
  • Planning shots that make sense across verses, choruses, and bridges.

How AutoMV works

AutoMV operates like a virtual production crew. It first analyzes the song's structure, beats, and time-aligned lyrics to build a timeline that anchors editing decisions and narrative pacing.
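To make that concrete, here is a minimal sketch of what such a structural-analysis step could look like, assuming librosa for beat tracking and a simple chroma-based segmentation. The function name, section count, and output fields are illustrative assumptions, not AutoMV's actual code.

```python
import librosa

def build_timeline(audio_path: str, n_sections: int = 8) -> dict:
    """Coarse song timeline: global tempo plus sections with their beats."""
    y, sr = librosa.load(audio_path)

    # Beat grid: global tempo estimate and beat positions in seconds.
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)

    # Rough section boundaries via agglomerative clustering of chroma features.
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    boundary_frames = librosa.segment.agglomerative(chroma, n_sections)
    boundaries = list(librosa.frames_to_time(boundary_frames, sr=sr))
    boundaries.append(librosa.get_duration(y=y, sr=sr))

    sections = []
    for i, (start, end) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        sections.append({
            "id": i,
            "start": float(start),
            "end": float(end),
            "beats": [float(t) for t in beat_times if start <= t < end],
            # Time-aligned lyric lines would be attached here from a separate
            # alignment step (not shown).
            "lyrics": [],
        })
    return {"tempo": float(tempo), "sections": sections}
```

A timeline like this gives the downstream agents the hooks they need: section boundaries for scene breaks and beat positions for cut timing.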

Specialized agents then take over: a "screenwriter" drafts scene plans, a "director" translates them into visual prompts, and an "editor" assembles images and video clips into coherent sequences. A final verifier agent checks for coherence, identity consistency, and synchronization, and triggers regeneration when needed.
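The hand-off between those roles can be pictured as a small orchestration loop. The sketch below is an assumption-laden illustration: the agent interfaces, the rendering and scoring stubs, and the regeneration threshold are placeholders rather than AutoMV's implementation.

```python
from dataclasses import dataclass

@dataclass
class ScenePlan:
    section_id: int
    description: str         # screenwriter output: what happens in this scene
    visual_prompt: str = ""  # director output: prompt for the video backend
    clip_path: str = ""      # editor output: path to the assembled clip

def render_clip(prompt: str) -> str:
    # Placeholder for a text/image-to-video generation backend.
    return f"clips/{abs(hash(prompt))}.mp4"

def score_clip(path: str) -> float:
    # Placeholder for the verifier's coherence/identity/sync score in [0, 1].
    return 0.9

def screenwriter(section: dict) -> ScenePlan:
    # In practice an LLM would draft this from the section's lyrics and its
    # role in the song (verse, chorus, bridge); stubbed here.
    return ScenePlan(
        section_id=section["id"],
        description=f"Scene covering {section['start']:.1f}-{section['end']:.1f}s",
    )

def director(plan: ScenePlan) -> ScenePlan:
    plan.visual_prompt = f"Cinematic shot: {plan.description}; keep the lead character consistent"
    return plan

def editor(plan: ScenePlan) -> ScenePlan:
    plan.clip_path = render_clip(plan.visual_prompt)
    return plan

def verifier(plan: ScenePlan) -> float:
    return score_clip(plan.clip_path)

def produce(sections: list[dict], threshold: float = 0.7, max_retries: int = 2) -> list[ScenePlan]:
    plans = []
    for section in sections:
        plan = screenwriter(section)
        for _ in range(max_retries + 1):
            plan = editor(director(plan))
            if verifier(plan) >= threshold:
                break  # clip passes the coherence/identity/sync check
        plans.append(plan)
    return plans
```

The design point worth noting is the bounded retry loop: the verifier gates each clip and requests regeneration only until a quality threshold or retry limit is reached, which keeps long-form generation from stalling on a single difficult scene.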

The result is a full-length music video that follows the song end-to-end while keeping story and visuals consistent.

What the evaluations show

Human expert assessments indicate AutoMV outperforms existing commercial tools on alignment, narrative flow, and character consistency, narrowing the gap with professionally produced music videos.

There's also a cost shift: projects that typically require tens of thousands of pounds can move closer to the cost of an API call. That matters for independent musicians, educators, and small studios.

Open-source, reproducible, and ready for contributions

AutoMV is open-source and built for transparent, reproducible research. The team is inviting contributions on the codebase, benchmarking, and extensions to long-form, multimodal workflows.

Who this can help right now

  • Independent musicians and labels: Produce narrative videos for full tracks without a large budget.
  • Educators: Build classroom material that links music structure to visual storytelling.
  • Research labs: Test multi-agent planning, long-horizon control, and MIR-driven alignment on a real creative task.
  • HCI and evaluation teams: Study human preferences, editing criteria, and perceptual metrics for long-form generative video.

Technical angles worth exploring

  • Structure-aware planning: Using beat and section analysis to guide shot lists and transitions.
  • Identity preservation: Keeping character visual traits stable across scenes and costume changes.
  • Verifier loops: Automated checks for timing, continuity, and re-generation thresholds.
  • Evaluation protocols: Designing human-in-the-loop rating schemes and MIR-aligned quantitative metrics (see the alignment-metric sketch after this list).
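As an example of that last angle, here is a hedged sketch of one possible MIR-aligned metric: the share of shot cuts that land within a small tolerance of a beat. The tolerance value and the input format (cut and beat times in seconds) are assumptions for illustration.

```python
def beat_cut_alignment(cut_times: list[float],
                       beat_times: list[float],
                       tol: float = 0.1) -> float:
    """Fraction of cuts within `tol` seconds of the nearest beat."""
    if not cut_times or not beat_times:
        return 0.0
    hits = sum(
        1 for c in cut_times
        if min(abs(c - b) for b in beat_times) <= tol
    )
    return hits / len(cut_times)

# Example: three of the four cuts fall on or near the beat grid.
beats = [0.5, 1.0, 1.5, 2.0, 2.5]
cuts = [0.52, 1.48, 2.0, 2.23]
print(beat_cut_alignment(cuts, beats))  # 0.75
```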

If you work in music information retrieval, AutoMV is a practical testbed for structure-driven generation. For community context on MIR research, see ISMIR.

Get started

Read the paper, clone the repository, and try a full-track workflow. Share issues and benchmarks to help the community push toward longer, more coherent, and better-aligned multi-agent generation.

Exploring tooling around generative video for your lab or studio? Here's a curated overview of options: Generative Video Tools.

