MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio
Published in NAACL 2025
We propose MM-StoryAgent, an open-source multi-modal, multi-agent framework for generating immersive narrated storybook videos. It achieves high-quality results by combining a multi-stage writing pipeline with collaborating expert agents across all modalities (image, speech, and sound effects). The project has received 85K+ visits on ModelScope.
Recommended citation: Xuenan Xu, Jiahao Mei, Chenliang Li, et al. "MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio." NAACL, 2025.
