Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2 
Published in ACM MM Workshop 2023, 2023
A high-quality music-image cross-modal matching dataset (30K+ pairs) using emotional consistency as the primary basis for cross-modal alignment.
Recommended citation: Jialing Zou*, Jiahao Mei*, Guangze Ye, et al. "EMID: An Emotional Aligned Dataset in Audio-Visual Modality." ACM MM Workshop, 2023.
Published in IEEE ICME 2024, 2024
A lightweight plugin for diffusion models that enables fine-grained controllable music generation by extracting chord, melody, and instrument features.
Recommended citation: Jialing Zou*, Jiahao Mei*, Xudong Nan, et al. "TEAdapter: Supply Vivid Guidance for Controllable Text-to-Music Generation." IEEE ICME, 2024.
Published in arXiv preprint, 2025
A unified continuous audio tokenizer for audio understanding and generation, outperforming mainstream codec/tokenizer baselines on speech, music, and environmental sound tasks.
Recommended citation: Heinrich Dinkel, Xingwei Sun, Gang Li, Jiahao Mei, et al. "DashengTokenizer: One Layer is Enough for Unified Audio Understanding and Generation." arXiv, 2025.
Published in NAACL 2025, 2025
An open-source multi-modal, multi-agent framework for automated generation of immersive narrated storybook videos. 85K+ visits on ModelScope.
Recommended citation: Xuenan Xu, Jiahao Mei, Chenliang Li, et al. "MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio." NAACL, 2025.
Published in arXiv preprint, 2025
The first fully open-source unified audio generation framework based on Flow Matching, with a novel Dual-Fusion mechanism supporting text, audio, and video inputs across 7 tasks.
Recommended citation: Xuenan Xu*, Jiahao Mei*, Zihao Zheng, et al. "UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities." arXiv, 2025.
Published in arXiv preprint, 2025
A latent affective representation alignment mechanism for continuous fine-grained emotion control in music generation using valence-arousal values.
Recommended citation: Jiahao Mei, Xuenan Xu, Zeyu Xie, et al. "LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment." arXiv, 2025.
Published in arXiv preprint, 2025
A unified text-to-audio framework for general audio scene generation, enabling end-to-end collaborative generation of speech, music, sound effects, and environmental sounds.
Recommended citation: Jiahao Mei, Heinrich Dinkel, Yadong Niu, et al. "Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text." arXiv, 2025.
Published in NeurIPS 2025, 2025
A comprehensive benchmark for long-form creative writing evaluation, covering 6 domains, 100 sub-domains, and 1,239 queries, with a dynamic evaluation framework that achieves 83% agreement with human judgments.
Recommended citation: Yuning Wu, Jiahao Mei, Ming Yan, et al. "WritingBench: A Comprehensive Benchmark for Generative Writing." NeurIPS, 2025.
Published in ACM MM Workshop 2025, 2025
A Chinese calligraphy generation model based on Vision Mamba and a TripleLabel mechanism, trained on 1.9M+ calligraphy images.
Recommended citation: Kaiyuan Liu, Jiahao Mei, Hengyu Zhang, et al. "Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation." ACM MM Workshop, 2025.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.