Mathematics, Signal Processing and Learning / Mathématiques, traitement du signal et apprentissage

Collection Mathematics, Signal Processing and Learning / Mathématiques, traitement du signal et apprentissage

Organizer(s) Anthoine, Sandrine ; Chaux, Caroline ; Mélot, Clothilde ; Richard, Frédéric

Date(s) 25/01/2021 - 29/01/2021

Associated URL https://conferences.cirm-math.fr/2472.html

Reinforcement learning - lecture 1

By Alessandro Lazaric

Reinforcement learning (RL) studies the problem of learning how to optimally control a dynamical and stochastic environment. Unlike in supervised learning, an RL agent receives no direct supervision on which actions to take in order to maximize the long-term reward; it must learn from samples collected through direct interaction with the environment. RL algorithms combined with deep learning tools have recently achieved impressive results in a variety of problems ranging from recommendation systems to computer games, often reaching human-competitive performance (e.g., in the game of Go). In this course, we will review the mathematical foundations of RL and the most popular algorithmic strategies. In particular, we will build on the model of Markov decision processes (MDPs) to formalize the agent-environment interaction and ground RL algorithms in popular dynamic programming algorithms, such as value and policy iteration. We will study how such algorithms can be made online and incremental, and how to integrate approximation techniques from the deep learning literature. Finally, we will discuss the exploration-exploitation dilemma in the simpler bandit scenario as well as in the full RL case. Throughout the course, we will try to identify the main current limitations of RL algorithms and the main open questions in the field.

Theoretical part
- Introduction to reinforcement learning (recent advances and current limitations)
- How to model an RL problem: Markov decision processes (MDPs)
- How to solve an MDP: dynamic programming methods (value and policy iteration)
- How to solve an MDP from direct interaction: RL algorithms (Monte Carlo, temporal difference, SARSA, Q-learning)
- How to solve an MDP with approximation (aka deep RL): value-based (e.g., DQN) and policy gradient methods (e.g., REINFORCE, TRPO)
- How to efficiently explore an MDP: from bandit to RL
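As an illustration of the dynamic programming item above, here is a minimal sketch of value iteration on a tiny MDP. The two-state MDP, its transition probabilities and rewards, and the function names `value_iteration` and `greedy_policy` are assumptions made up for this example; they are not taken from the lecture.

```python
# Hypothetical toy MDP (an assumption for illustration, not from the lecture).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = [
    [  # state 0
        [(1.0, 0, 0.0)],                  # action 0: stay in state 0
        [(0.8, 1, 0.0), (0.2, 0, 0.0)],   # action 1: reach state 1 w.p. 0.8
    ],
    [  # state 1
        [(1.0, 1, 1.0)],                  # action 0: stay in state 1, reward 1
        [(1.0, 0, 0.0)],                  # action 1: go back to state 0
    ],
]


def backup(outcomes, V, gamma):
    """Expected one-step return of an action given current value estimates."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = [0.0 for _ in P]
    while True:
        V_new = [max(backup(o, V, gamma) for o in actions) for actions in P]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def greedy_policy(P, V, gamma=0.9):
    """Pick, in each state, the action with the largest one-step backup."""
    return [
        max(range(len(actions)), key=lambda a: backup(actions[a], V, gamma))
        for actions in P
    ]


V = value_iteration(P)
policy = greedy_policy(P, V)
print(V, policy)
```

In this toy problem the greedy policy moves from state 0 towards state 1 and then stays there to keep collecting the reward.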

Practical part
- Simple example of value iteration and Q-learning
- More advanced example with policy gradient
- Simple bandit example for exploration
- More advanced example for exploration in RL
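To give a flavour of the bandit exploration item above, the following is a minimal sketch of one well-known strategy, UCB1, on Bernoulli arms. The choice of UCB1, the arm means, the number of rounds and the function name `ucb1` are assumptions for illustration; the page does not say which bandit algorithm the practical session uses.

```python
import math
import random


def ucb1(arm_means, n_rounds, seed=0):
    """Run UCB1 on Bernoulli arms and return how often each arm was pulled.

    arm_means are hypothetical success probabilities, unknown to the learner.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total reward per arm
    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            # Optimism: empirical mean plus a confidence bonus that shrinks
            # as an arm is pulled more often.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.8], n_rounds=5000)
print(counts)
```

The confidence bonus forces occasional pulls of rarely tried arms (exploration), while the empirical mean concentrates most pulls on the arm that looks best (exploitation).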

Video information

Recording date 28/01/2021
Publication date 22/02/2021
Institute CIRM
License CC BY NC ND
Language English
Audience Researchers
Director(s) Guillaume Hennenfent
Format MP4

Citation data

DOI 10.24350/CIRM.V.19704903
Cite this video Lazaric, Alessandro (28/01/2021). Reinforcement learning - lecture 1. CIRM. Audiovisual resource. DOI: 10.24350/CIRM.V.19704903
URL https://dx.doi.org/10.24350/CIRM.V.19704903

All videos in the collection

01:30:12

published on 22 February 2021

Optimization - lecture 1

By Nelly Pustelnik

01:30:32

published on 22 February 2021

Optimization - lecture 2

By Nelly Pustelnik

01:05:51

published on 22 February 2021

Signal processing tutorial - part 1

By Laurent Oudre

01:35:45

published on 22 February 2021

Reinforcement learning - lecture 2

By Alessandro Lazaric

01:16:09

published on 22 February 2021

Basics in machine learning - lecture 1

By Marianne Clausel

01:25:24

published on 22 February 2021

Basics in machine learning - practical session 2

By Marianne Clausel

01:08:16

published on 22 February 2021

Signal processing tutorial - part 2

By Laurent Oudre

01:19:59

published on 22 February 2021

Reinforcement learning - lecture 4

By Alessandro Lazaric

01:16:05

published on 22 February 2021

Basics in machine learning - lecture 2

By Marianne Clausel

01:31:47

published on 22 February 2021

One signal processing view on deep learning - lecture 1

By Edouard Oyallon

01:27:31

published on 22 February 2021

Optimization - lecture 3

By Nelly Pustelnik

01:37:05

published on 22 February 2021

Reinforcement learning - lecture 3

By Alessandro Lazaric

01:28:13

published on 22 February 2021

Basics in machine learning - practical session 1

By Marianne Clausel

01:24:01

published on 22 February 2021

One signal processing view on deep learning - lecture 2

By Edouard Oyallon

01:06:33

published on 22 February 2021

Optimization - lecture 4

By Nelly Pustelnik

01:28:26

published on 22 February 2021

Reinforcement learning - lecture 1

By Alessandro Lazaric

Copyright Carmin.tv 2026
