Carlo Alfano

Applied Scientist

Amazon

Biography

I am an Applied Scientist at Amazon, working on training LLM-based evaluators.

I completed my PhD in the Department of Statistics at the University of Oxford, under the supervision of Patrick Rebeschini and George Deligiannidis. I was funded by EPSRC.

My research interests include reinforcement learning, LLM fine-tuning, optimization and learning theory. In particular, I focus on building and analyzing reinforcement learning algorithms using standard optimization tools, such as natural gradient descent and mirror descent.


Interests

Reinforcement Learning
LLM fine-tuning
Optimization

Education

DPhil in Statistics, 2020-2025
University of Oxford

MSc in Statistical Sciences, 2019-2020
University of Oxford

BSc in Statistics, Economics and Finance, 2016-2019
Sapienza University of Rome

Publications

Carlo Alfano, Aymen Al Marjani, Zeno Jonke, Amin Mantrach, Saab Mansour, Marcello Federico (2026). Multilingual Self-Taught Faithfulness Evaluators. To appear in Findings of the Association for Computational Linguistics: EACL 2026.


Carlo Alfano, Silvia Sapora, Jakob Nicolaus Foerster, Patrick Rebeschini, Yee Whye Teh (2025). Meta-Learning Objectives for Preference Optimization. Advances in Neural Information Processing Systems (NeurIPS 2025).


Carlo Alfano, Sebastian Rene Towers, Silvia Sapora, Chris Lu, Patrick Rebeschini (2025). Learning mirror maps in policy mirror descent. International Conference on Learning Representations (ICLR 2025).


Carlo Alfano, Rui Yuan, Patrick Rebeschini (2023). A Novel Framework for Policy Mirror Descent with General Parametrization and Linear Convergence. Advances in Neural Information Processing Systems (NeurIPS 2023).


Carlo Alfano, Patrick Rebeschini (2022). Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization. arXiv preprint: 2209.15382.


Carlo Alfano, Patrick Rebeschini (2021). Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning. arXiv preprint: 2109.11692.


Experience

Applied Scientist

Amazon

Jun 2025 – Present, Spain

Focused on training LLM-based evaluators for LLMs with synthetic data and reinforcement learning.

Applied Scientist Intern

Amazon

Sep 2024 – Feb 2025, Luxembourg

Focused on building LLM-based evaluators for LLM faithfulness.

Teaching Assistant

University of Oxford

Oct 2020 – Jun 2023, United Kingdom

Taught Courses:

Algorithmic Foundations of Learning
Advanced Simulation Methods

Supervisor

UNIQ+ DeepMind internship at the University of Oxford

Jun 2022 – Sep 2022, United Kingdom

Awards

G-Research Grant for PhD students and postdocs in quantitative fields

G-Research, Sep 2023

EPSRC DTP full scholarship

University of Oxford, Oct 2020 – Oct 2024

Full scholarship holder

Sapienza University of Rome, Oct 2016 – Oct 2019

Honorable mention

National Italian Math Olympic games, Apr 2016

Talks

Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization

We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-linear policy parametrizations in …

Sep 21, 2022, 2:00 PM, Oxford Mathematical Institute

Carlo Alfano