Advances in Stochastic Control and Optimal Stopping with Applications in Economics and Finance

Collection: Advances in Stochastic Control and Optimal Stopping with Applications in Economics and Finance

Organizer(s): Buckdahn, Rainer ; Ferrari, Giorgio ; Grigorova, Miryana ; Quenez, Marie-Claire ; Riedel, Frank
Date(s): 12/09/2022 - 16/09/2022
Related URL: https://conferences.cirm-math.fr/2600.html

Some recent progress for continuous-time reinforcement learning and regret analysis

By Xin Guo

Reinforcement learning (RL) has recently attracted substantial research interest. Much of the attention and success, however, has been in the discrete-time setting. Continuous-time RL, despite its natural analytical connection to stochastic control, remains largely unexplored, with limited progress to date. In particular, characterizing the sample efficiency of continuous-time RL algorithms through convergence rates remains a challenging open problem. In this talk, we discuss some recent advances in the convergence rate analysis for the episodic linear-convex RL problem and report a regret bound of order $O(\sqrt{N \ln N})$ for the greedy least-squares algorithm, where $N$ is the number of episodes. The approach is probabilistic: it involves establishing the stability of the associated forward-backward stochastic differential equation, studying the Lipschitz stability of feedback controls, and exploring the concentration properties of sub-Weibull random variables. In the special case of the linear-quadratic RL problem, the analysis reduces to the regularity and robustness of the associated Riccati equation and the sub-exponential properties of continuous-time least-squares estimators, which leads to a logarithmic regret.
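
To give a feel for the two regret orders quoted in the abstract, the following minimal Python sketch tabulates $\sqrt{N \ln N}$ and $\ln N$ for a few episode counts, along with the corresponding average per episode. It is only an illustration of growth rates: the constants hidden in the $O(\cdot)$ notation are not given in the abstract and are set to 1 here as an assumption, and the function names are hypothetical.

```python
import math


def sqrt_log_rate(n: int) -> float:
    """Growth rate sqrt(N ln N), with the unknown constant assumed to be 1."""
    return math.sqrt(n * math.log(n))


def log_rate(n: int) -> float:
    """Growth rate ln N, with the unknown constant assumed to be 1."""
    return math.log(n)


def average_per_episode(rate, n: int) -> float:
    """Cumulative rate divided by the number of episodes N."""
    return rate(n) / n


if __name__ == "__main__":
    print(f"{'N':>8} {'sqrt(N ln N)':>14} {'ln N':>8} {'avg (sqrt)':>12} {'avg (ln)':>12}")
    for n in (10, 100, 1_000, 10_000):
        print(
            f"{n:>8} {sqrt_log_rate(n):>14.2f} {log_rate(n):>8.2f} "
            f"{average_per_episode(sqrt_log_rate, n):>12.5f} "
            f"{average_per_episode(log_rate, n):>12.5f}"
        )
```

Both rates are sublinear in $N$, so the average per episode shrinks as $N$ grows; under the unit-constant assumption it shrinks markedly faster for the logarithmic rate.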

Video information

Citation data

  • DOI: 10.24350/CIRM.V.19959403
  • Cite this video: Guo, Xin (13/09/2022). Some recent progress for continuous-time reinforcement learning and regret analysis. CIRM. Audiovisual resource. DOI: 10.24350/CIRM.V.19959403
  • URL: https://dx.doi.org/10.24350/CIRM.V.19959403
