Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

This post will show up by default. To stop posts with a future date from being published, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

EMID: An Emotional Aligned Dataset in Audio-Visual Modality

Published in ACM MM Workshop, 2023

A high-quality music-image cross-modal matching dataset (30K+ pairs) using emotional consistency as the primary basis for cross-modal alignment.

Recommended citation: Jialing Zou*, Jiahao Mei*, Guangze Ye, et al. "EMID: An Emotional Aligned Dataset in Audio-Visual Modality." ACM MM Workshop, 2023.

MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio

Published in NAACL, 2025

An open-source multi-modal multi-agent story video generation framework achieving automated immersive narrated storybook video generation. 85K+ visits on ModelScope.

Recommended citation: Xuenan Xu, Jiahao Mei, Chenliang Li, et al. "MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio." NAACL, 2025.

UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities

Published as an arXiv preprint, 2025

The first fully open-source unified audio generation framework based on Flow Matching, with a novel Dual-Fusion mechanism supporting text, audio, and video inputs across 7 tasks.

Recommended citation: Xuenan Xu*, Jiahao Mei*, Zihao Zheng, et al. "UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities." arXiv, 2025.

LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment

Published as an arXiv preprint, 2025

A latent affective representation alignment mechanism for continuous fine-grained emotion control in music generation using valence-arousal values.

Recommended citation: Jiahao Mei, Xuenan Xu, Zeyu Xie, et al. "LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment." arXiv, 2025.

WritingBench: A Comprehensive Benchmark for Generative Writing

Published in NeurIPS, 2025

A comprehensive benchmark covering 6 domains, 100 sub-domains, and 1239 queries for long-form creative writing evaluation, with a dynamic evaluation framework achieving 83% human agreement.

Recommended citation: Yuning Wu, Jiahao Mei, Ming Yan, et al. "WritingBench: A Comprehensive Benchmark for Generative Writing." NeurIPS, 2025.

Talks

Teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.