WritingBench: A Comprehensive Benchmark for Generative Writing

Published in NeurIPS 2025

We propose WritingBench, an open-source, comprehensive benchmark for evaluating long-form creative writing, covering 6 domains, 100 sub-domains, and 1,239 queries. We also design a dynamic evaluation framework that reaches 83% agreement with human judgments, significantly outperforming static evaluation standards. When this framework is used to filter high-quality training data, a 7B-parameter model can approach the writing capabilities of closed-source state-of-the-art models.

Recommended citation: Yuning Wu, Jiahao Mei, Ming Yan, et al. "WritingBench: A Comprehensive Benchmark for Generative Writing." NeurIPS, 2025.