00:00:00 / 00:00:00
3 4

Some recent progress for continuous-time reinforcement learning and regret analysis

By Xin Guo

Recently, reinforcement learning (RL) has attracted substantial research interests. Much of the attention and success, however, has been for the discrete time setting. Continuous-time RL, despite its natural analytical connection to stochastic controls, has been largely unexplored and with limited progress. In particular, characterizing sample efficiency for continuous-time RL algorithms with convergence rate remains a challenging and open problem. In this talk, we will discuss some recent advances in the convergence rate analysis for the episodic linear-convex RL problem, and report a regret bound of the order $O(\sqrt{N \ln N})$ for the greedy least-squares algorithm, with $N$ the number of episodes. The approach is probabilistic, involving establishing the stability of the associated forward-backward stochastic differential equation, studying the Lipschitz stability of feedback controls, and exploring the concentration properties of sub-Weibull random variables. In the special case of the linear-quadratic RL problem, the analysis reduces to the regularity and robustness of the associated Riccati equation and the sub-exponential properties of continuous-time least-squares estimators, which leads to a logarithmic regret.

Information about the video

Citation data

  • DOI 10.24350/CIRM.V.19959403
  • Cite this video Guo Xin (9/13/22). Some recent progress for continuous-time reinforcement learning and regret analysis. CIRM. Audiovisual resource. DOI: 10.24350/CIRM.V.19959403
  • URL https://dx.doi.org/10.24350/CIRM.V.19959403

Last related questions on MathOverflow

You have to connect your Carmin.tv account with mathoverflow to add question

Ask a question on MathOverflow


  • Bookmark videos
  • Add videos to see later &
    keep your browsing history
  • Comment with the scientific
  • Get notification updates
    for your favorite subjects
Give feedback