SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
Chenyu Wang, Paria Rashidinejad, DiJia Su, Song Jiang, Sid Wang, Siyan Zhao, Cai Zhou, Shannon Zejiang Shen, Feiyu Chen, Tommi Jaakkola, Yuandong Tian, and Bo Liu
preprint, 2025
Spotlight at the NeurIPS 2025 Efficient Reasoning Workshop