Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2 
publications
EMID: An Emotional Aligned Dataset in Audio-Visual Modality
Published in ACM MM Workshop 2023, 2023
A high-quality music-image cross-modal matching dataset (30K+ pairs) using emotional consistency as the primary basis for cross-modal alignment.
Recommended citation: Jialing Zou*, Jiahao Mei*, Guangze Ye, et al. "EMID: An Emotional Aligned Dataset in Audio-Visual Modality." ACM MM Workshop, 2023.
Download Paper
TEAdapter: Supply Vivid Guidance for Controllable Text-to-Music Generation
Published in IEEE ICME 2024, 2024
A lightweight plugin for diffusion models that enables fine-grained, controllable music generation by extracting chord, melody, and instrument features.
Recommended citation: Jialing Zou*, Jiahao Mei*, Xudong Nan, et al. "TEAdapter: Supply Vivid Guidance for Controllable Text-to-Music Generation." IEEE ICME, 2024.
Download Paper
DashengTokenizer: One Layer is Enough for Unified Audio Understanding and Generation
Published in arXiv preprint, 2025
A unified continuous audio tokenizer for audio understanding and generation, outperforming mainstream codec/tokenizer baselines on speech, music, and environmental sound tasks.
Recommended citation: Heinrich Dinkel, Xingwei Sun, Gang Li, Jiahao Mei, et al. "DashengTokenizer: One Layer is Enough for Unified Audio Understanding and Generation." arXiv, 2025.
Download Paper
MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio
Published in NAACL 2025, 2025
An open-source multi-modal, multi-agent framework for automated generation of immersive narrated storybook videos. 85K+ visits on ModelScope.
Recommended citation: Xuenan Xu, Jiahao Mei, Chenliang Li, et al. "MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio." NAACL, 2025.
Download Paper
UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities
Published in arXiv preprint, 2025
The first fully open-source unified audio generation framework based on Flow Matching, with a novel Dual-Fusion mechanism supporting text, audio, and video inputs across 7 tasks.
Recommended citation: Xuenan Xu*, Jiahao Mei*, Zihao Zheng, et al. "UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities." arXiv, 2025.
Download Paper
LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment
Published in arXiv preprint, 2025
A latent affective representation alignment mechanism for continuous fine-grained emotion control in music generation using valence-arousal values.
Recommended citation: Jiahao Mei, Xuenan Xu, Zeyu Xie, et al. "LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment." arXiv, 2025.
Download Paper
Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text
Published in arXiv preprint, 2025
A unified text-to-audio framework for general audio scene generation, enabling end-to-end collaborative generation of speech, music, sound effects, and environmental sounds.
Recommended citation: Jiahao Mei, Heinrich Dinkel, Yadong Niu, et al. "Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text." arXiv, 2025.
Download Paper
WritingBench: A Comprehensive Benchmark for Generative Writing
Published in NeurIPS 2025, 2025
A comprehensive benchmark covering 6 domains, 100 sub-domains, and 1239 queries for long-form creative writing evaluation, with a dynamic evaluation framework achieving 83% human agreement.
Recommended citation: Yuning Wu, Jiahao Mei, Ming Yan, et al. "WritingBench: A Comprehensive Benchmark for Generative Writing." NeurIPS, 2025.
Download Paper
Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation
Published in ACM MM Workshop 2025, 2025
A Chinese calligraphy generation model based on Vision Mamba and a TripleLabel mechanism, trained on 1.9M+ calligraphy images.
Recommended citation: Kaiyuan Liu, Jiahao Mei, Hengyu Zhang, et al. "Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation." ACM MM Workshop, 2025.
Download Paper
talks
teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.
