By Xin Guo
Appears in collection: Advances in Stochastic Control and Optimal Stopping with Applications in Economics and Finance / Avancées en contrôle stochastique et arrêt optimal avec applications à l'économie et à la finance
Recently, reinforcement learning (RL) has attracted substantial research interest. Much of the attention and success, however, has been in the discrete-time setting. Continuous-time RL, despite its natural analytical connection to stochastic control, remains largely unexplored, with limited progress so far. In particular, characterizing the sample efficiency of continuous-time RL algorithms through convergence rates remains a challenging open problem. In this talk, we will discuss some recent advances in the convergence rate analysis for the episodic linear-convex RL problem, and report a regret bound of order $O(\sqrt{N \ln N})$ for the greedy least-squares algorithm, where $N$ is the number of episodes. The approach is probabilistic: it involves establishing the stability of the associated forward-backward stochastic differential equation, studying the Lipschitz stability of feedback controls, and exploring the concentration properties of sub-Weibull random variables. In the special case of the linear-quadratic RL problem, the analysis reduces to the regularity and robustness of the associated Riccati equation and the sub-exponential properties of continuous-time least-squares estimators, which leads to logarithmic regret.
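To make the linear-quadratic special case concrete, here is a minimal sketch of a greedy least-squares loop for a scalar controlled diffusion $dX_t = (a X_t + b u_t)\,dt + dW_t$ with running cost $q X_t^2 + r u_t^2$ on $[0, T]$. It is an illustration under stated assumptions, not the algorithm analysed in the talk: the scalar model, the Euler discretization, the ridge term, the clipping bound, and all function names and parameter values are hypothetical choices made for this example. In each episode the agent solves the Riccati equation for its current estimate of $(a, b)$, plays the resulting feedback control, and then refits $(a, b)$ by least squares on all increments observed so far.

```python
import math
import random


def riccati_gains(a, b, q, r, T, n):
    """Backward Euler sweep for -P' = 2aP - (b^2/r)P^2 + q with P(T) = 0.

    Returns the feedback gains K[k] = b * P(t_k) / r, so that u = -K[k] * x.
    """
    dt = T / n
    P = 0.0
    gains = [0.0] * n
    for k in range(n - 1, -1, -1):
        P += dt * (2.0 * a * P - (b * b / r) * P * P + q)
        gains[k] = b * P / r
    return gains


def simulate_episode(a, b, gains, T, x0, rng):
    """Euler-Maruyama simulation of dX = (aX + bu) dt + dW under u = -K(t) X."""
    n = len(gains)
    dt = T / n
    xs, us, dxs = [], [], []
    x = x0
    for k in range(n):
        u = -gains[k] * x
        dx = (a * x + b * u) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        xs.append(x)
        us.append(u)
        dxs.append(dx)
        x += dx
    return xs, us, dxs


def least_squares_drift(xs, us, dxs, dt, ridge=1e-8):
    """Discretized least-squares estimate of (a, b) from state increments.

    Solves the 2x2 normal equations; the small ridge term (an assumption of
    this sketch) keeps the Gram matrix invertible.
    """
    g11 = ridge + sum(x * x for x in xs) * dt
    g22 = ridge + sum(u * u for u in us) * dt
    g12 = sum(x * u for x, u in zip(xs, us)) * dt
    r1 = sum(x * dx for x, dx in zip(xs, dxs))
    r2 = sum(u * dx for u, dx in zip(us, dxs))
    det = g11 * g22 - g12 * g12
    return (g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det


def greedy_least_squares(true_a, true_b, episodes, T=1.0, n=100, seed=0):
    """Episodic loop: act greedily on the current estimate, then refit it."""
    rng = random.Random(seed)
    a_hat, b_hat = 0.0, 0.5      # hypothetical initial guess
    q, r, bound = 1.0, 1.0, 2.0  # hypothetical cost weights and clipping bound
    xs, us, dxs = [], [], []
    for _ in range(episodes):
        gains = riccati_gains(a_hat, b_hat, q, r, T, n)
        ex, eu, edx = simulate_episode(true_a, true_b, gains, T, 1.0, rng)
        xs += ex
        us += eu
        dxs += edx
        a_hat, b_hat = least_squares_drift(xs, us, dxs, T / n)
        # Project onto a bounded set so a poor early estimate cannot
        # produce an extreme gain (an assumption of this sketch).
        a_hat = max(-bound, min(bound, a_hat))
        b_hat = max(-bound, min(bound, b_hat))
    return a_hat, b_hat
```

Calling `greedy_least_squares(0.5, 1.0, episodes=20)` returns the final parameter estimate for this toy model. The sketch does not compute regret. It only shows how the Riccati solve and the least-squares fit interlock within each episode.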