Publications

(2025). Meta-Learning Objectives for Preference Optimization. arXiv preprint arXiv:2411.06568.

(2025). Learning mirror maps in policy mirror descent. To appear in International Conference on Learning Representations.

(2023). A Novel Framework for Policy Mirror Descent with General Parametrization and Linear Convergence. Advances in Neural Information Processing Systems.

(2022). Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization. arXiv preprint arXiv:2209.15382.

(2021). Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning. arXiv preprint arXiv:2109.11692.
