Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Posts

Future Blog Post

less than 1 minute read

Published: January 01, 2199

This post is dated in the future but still shows up because future-dated posts are displayed by default. To hide future-dated posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published: August 14, 2014

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published: August 14, 2013

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published: August 14, 2012

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Portfolio item number 1

Short description of portfolio item number 1

Portfolio item number 2

Short description of portfolio item number 2

Publications

EMID: An Emotional Aligned Dataset in Audio-Visual Modality

Published in ACM MM Workshop, 2023

A high-quality music-image cross-modal matching dataset (30K+ pairs) using emotional consistency as the primary basis for cross-modal alignment.

Recommended citation: Jialing Zou*, Jiahao Mei*, Guangze Ye, et al. "EMID: An Emotional Aligned Dataset in Audio-Visual Modality." ACM MM Workshop, 2023.

TEAdapter: Supply Vivid Guidance for Controllable Text-to-Music Generation

Published in IEEE ICME, 2024

A lightweight plugin for diffusion models that enables fine-grained, controllable music generation by extracting chord, melody, and instrument features.

Recommended citation: Jialing Zou*, Jiahao Mei*, Xudong Nan, et al. "TEAdapter: Supply Vivid Guidance for Controllable Text-to-Music Generation." IEEE ICME, 2024.

DashengTokenizer: One Layer is Enough for Unified Audio Understanding and Generation

Published as an arXiv preprint, 2025

A unified continuous audio tokenizer for audio understanding and generation, outperforming mainstream codec/tokenizer baselines on speech, music, and environmental sound tasks.

Recommended citation: Heinrich Dinkel, Xingwei Sun, Gang Li, Jiahao Mei, et al. "DashengTokenizer: One Layer is Enough for Unified Audio Understanding and Generation." arXiv, 2025.

MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio

Published in NAACL, 2025

An open-source, multi-modal, multi-agent framework that automates the generation of immersive narrated storybook videos. The project has received 85K+ visits on ModelScope.

Recommended citation: Xuenan Xu, Jiahao Mei, Chenliang Li, et al. "MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio." NAACL, 2025.

UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities

Published as an arXiv preprint, 2025

The first fully open-source unified audio generation framework based on Flow Matching, with a novel Dual-Fusion mechanism supporting text, audio, and video inputs across 7 tasks.

Recommended citation: Xuenan Xu*, Jiahao Mei*, Zihao Zheng, et al. "UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities." arXiv, 2025.

LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment

Published as an arXiv preprint, 2025

A latent affective representation alignment mechanism that enables continuous, fine-grained emotion control in music generation, conditioned on valence-arousal values.

Recommended citation: Jiahao Mei, Xuenan Xu, Zeyu Xie, et al. "LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment." arXiv, 2025.

Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text

Published as an arXiv preprint, 2025

A unified text-to-audio framework for general audio scene generation, enabling end-to-end collaborative generation of speech, music, sound effects, and environmental sounds.

Recommended citation: Jiahao Mei, Heinrich Dinkel, Yadong Niu, et al. "Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text." arXiv, 2025.

WritingBench: A Comprehensive Benchmark for Generative Writing

Published in NeurIPS, 2025

A comprehensive benchmark for evaluating long-form creative writing, covering 6 domains, 100 sub-domains, and 1,239 queries, with a dynamic evaluation framework that achieves 83% agreement with human judgments.

Recommended citation: Yuning Wu, Jiahao Mei, Ming Yan, et al. "WritingBench: A Comprehensive Benchmark for Generative Writing." NeurIPS, 2025.

Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation

Published in ACM MM Workshop, 2025

A Chinese calligraphy generation model built on Vision Mamba and a TripleLabel mechanism, trained on 1.9M+ calligraphy images.

Recommended citation: Kaiyuan Liu, Jiahao Mei, Hengyu Zhang, et al. "Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation." ACM MM Workshop, 2025.

Talks

Teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.

Jiahao Mei (梅嘉豪)


Pages

Page Not Found

About

Posts by Category

Posts by Collection

CV

CV

Page Archive

Publications

Sitemap

Posts by Tags

Blog posts

About Me (关于我)

Markdown Generator
