Research Schools collection

Optimal vector quantization: from signal processing to clustering and numerical probability

By Gilles Pagès

Also appears in the collection: CEMRACS - Summer school: Numerical methods for stochastic models: control, uncertainty quantification, mean-field

Optimal vector quantization was originally introduced in signal processing as a discretization method for random signals, leading to an optimal trade-off between the speed of transmission and the quality of the transmitted signal. In machine learning, similar methods applied to a dataset are the historical core of the unsupervised classification methods known as “clustering”. In both cases, it appears as an optimal way to produce a set of weighted prototypes (or codebook) that forms a kind of skeleton of a dataset, of a signal and, more generally, from a mathematical point of view, of a probability distribution. In recent years, quantization has attracted renewed interest in various fields of application such as automatic classification, learning algorithms, optimal stopping and stochastic control, backward SDEs and, more generally, numerical probability. In all these applications, practical implementations of clustering/quantization methods rely, more or less, on two procedures (and their countless variants): Competitive Learning Vector Quantization (CLVQ), which is a stochastic gradient descent derived from the so-called distortion potential, and the (randomized) Lloyd procedure (also known as the k-means algorithm or “nuées dynamiques”), which is simply a fixed-point search procedure. Batch versions of these procedures can also be implemented when dealing with a dataset (or, more generally, a discrete distribution). More formally, if $\mu$ is a probability distribution on a Euclidean space $\mathbb{R}^d$, the optimal quantization problem at level $N$ boils down to exhibiting an $N$-tuple $(x_1^*,\dots,x_N^*)$, solution to

$\operatorname*{argmin}_{(x_1,\dots,x_N)\in(\mathbb{R}^d)^N} \int_{\mathbb{R}^d} \min_{1\le i\le N} |x_i-\xi|^2\,\mu(d\xi)$

and its distribution, i.e. the weights $\big(\mu(C(x_i^*))\big)_{1\le i\le N}$, where $\big(C(x_i^*)\big)_{1\le i\le N}$ is a (Borel) partition of $\mathbb{R}^d$ satisfying

$C(x_i^*)\subset \big\{\xi\in\mathbb{R}^d : |x_i^*-\xi|\le \min_{1\le j\le N}|x_j^*-\xi|\big\}$.
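To make the definition concrete, here is a minimal sketch. It assumes NumPy and a sample of $\mu$ stored as an array. The function names `voronoi_assign` and `weights_and_distortion` are hypothetical and do not come from the lecture. Given a candidate quantizer $x=(x_1,\dots,x_N)$, the sketch estimates the Voronoi weights $\mu(C(x_i))$ and the quadratic distortion by averaging over $M$ draws of $\mu$. When $\mu$ is the empirical measure of those points, the result is exact. Ties go to the smallest index, which is one way to pick a Borel partition satisfying the inclusion above.

```python
import numpy as np


def voronoi_assign(x, xi):
    """Nearest-codeword index and squared distance for each sample.

    x  : (N, d) array, the quantizer (codebook) x_1, ..., x_N
    xi : (M, d) array, sample points drawn from mu (or the dataset itself)
    Ties go to the smallest index, which selects one Borel Voronoi partition.
    """
    # d2[k, i] = |x_i - xi_k|^2, shape (M, N)
    d2 = ((xi[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1), d2.min(axis=1)


def weights_and_distortion(x, xi):
    """Estimate the weights mu(C(x_i)) and the quadratic distortion
    int min_i |x_i - xi|^2 mu(d xi) by averaging over the sample xi.
    Exact when mu is the empirical measure of xi."""
    idx, dmin = voronoi_assign(x, xi)
    weights = np.bincount(idx, minlength=len(x)) / len(xi)
    return weights, dmin.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xi = rng.standard_normal((100_000, 2))  # hypothetical mu: standard 2D Gaussian
    x = rng.standard_normal((10, 2))        # an arbitrary (non-optimal) 10-point quantizer
    w, dist = weights_and_distortion(x, xi)
    print(w.round(3), w.sum(), round(dist, 4))
```

The brute-force distance matrix keeps the sketch short. For large $M$ or $N$, a chunked loop or a k-d tree would be the natural replacement.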

To produce an unsupervised classification (or clustering) of a (large) dataset $(\xi_k)_{1\le k\le n}$, one considers its empirical measure

$\mu=\frac{1}{n}\sum_{k=1}^{n}\delta_{\xi_k}$

whereas in numerical probability $\mu = \mathcal{L}(X)$, where $X$ is a simulatable $\mathbb{R}^d$-valued random vector. In both situations, the CLVQ and Lloyd procedures rely on massive sampling of the distribution $\mu$. In the clustering case, the classification into $N$ clusters is given by the partition of the dataset induced by the Voronoi cells $C(x_i^*)$, $i = 1,\dots,N$, of the optimal quantizer. In the second case, which is of interest for solving nonlinear problems such as optimal stopping problems (variational inequalities in terms of PDEs) or stochastic control problems (HJB equations) in medium dimension, the idea is to produce a quantization tree optimally fitting the dynamics of (a time discretization of) the underlying structure process. We will briefly explore this vast panorama, with a focus on algorithmic aspects, where few theoretical results coexist with many heuristics in a burgeoning literature. We will present a few simulations in two dimensions.
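The two sampling-based procedures named above can be sketched in a few lines. This is an illustration under stated assumptions, not the lecture's implementation. It assumes the following:

- NumPy is available.
- A user-supplied `sampler(m, rng)` returns $m$ draws of $\mu$. The `gaussian_2d` sampler is a hypothetical example in dimension 2.
- The step schedule $\gamma_n = a/(b+n)$ is chosen for illustration only.
- Codewords whose Voronoi cell is empty are left unchanged in the Lloyd update.

Lloyd's procedure is written as the fixed-point iteration $x_i \leftarrow \mathbb{E}[\xi \mid \xi\in C(x_i)]$, with each conditional mean replaced by a Monte Carlo average. CLVQ is written as stochastic gradient descent on the distortion: each new draw moves only its nearest codeword (the "winner") toward itself.

```python
import numpy as np


def nearest(x, xi):
    """Index of the nearest codeword of x (N, d) for each row of xi (M, d)."""
    d2 = ((xi[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def randomized_lloyd(x0, sampler, n_iter=50, batch=20_000, rng=None):
    """Randomized Lloyd (k-means) procedure: the fixed-point iteration
    x_i <- E[xi | xi in C(x_i)], each conditional mean being replaced
    by a Monte Carlo average over a fresh batch of draws from mu."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        xi = sampler(batch, rng)
        idx = nearest(x, xi)
        for i in range(len(x)):
            cell = xi[idx == i]
            if len(cell) > 0:  # assumption: a codeword with an empty cell stays put
                x[i] = cell.mean(axis=0)
    return x


def clvq(x0, sampler, n_steps=50_000, a=1.0, b=100.0, rng=None):
    """CLVQ: stochastic gradient descent on the quadratic distortion.

    For each new draw xi_n:
      competitive phase: find the winner i, the codeword nearest to xi_n;
      learning phase:    x_i <- x_i - gamma_n * (x_i - xi_n).
    The factor 2 of the gradient is absorbed into gamma_n = a / (b + n),
    an illustrative schedule (sum gamma_n = inf, sum gamma_n^2 < inf)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for n, xi in enumerate(sampler(n_steps, rng), start=1):
        i = np.argmin(((x - xi) ** 2).sum(axis=1))
        x[i] -= a / (b + n) * (x[i] - xi)
    return x


def gaussian_2d(m, rng):
    """Hypothetical sampler: m i.i.d. draws of the standard 2D Gaussian."""
    return rng.standard_normal((m, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = gaussian_2d(10, rng)  # same random initial codebook for both methods
    print(randomized_lloyd(x0, gaussian_2d, rng=rng).round(3))
    print(clvq(x0, gaussian_2d, rng=rng).round(3))
```

Both calls return a 10-point codebook for the standard 2D Gaussian. The distortion is not convex, so the two procedures may settle on different local minima. To cluster a dataset `data` (an $(n, d)$ array) instead, a sampler such as `lambda m, rng: data[rng.integers(len(data), size=m)]` draws from its empirical measure.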

Video information

Recording date 19/07/2017
Publication date 31/07/2017
Institute CIRM
License CC BY NC ND
Language English
Audience Researchers, PhD students
Director(s) Guillaume Hennenfent
Format MP4

Citation data

DOI 10.24350/CIRM.V.19199603
Cite this video Pagès, Gilles (19/07/2017). Optimal vector quantization: from signal processing to clustering and numerical probability. CIRM. Audiovisual resource. DOI: 10.24350/CIRM.V.19199603
URL https://dx.doi.org/10.24350/CIRM.V.19199603

Bibliography

Duflo, M. (1996). Algorithmes stochastiques. Paris: Springer-Verlag - http://www.springer.com/fr/book/9783540606994
Gersho, A., & Gray, R.M. (1992). Vector Quantization and Signal Compression. Boston: Kluwer Academic Publishers - http://dx.doi.org/10.1007/978-1-4615-3626-0
Graf, S., & Luschgy, H. (2000). Foundations of quantization for probability distributions. Berlin: Springer - http://dx.doi.org/10.1007/BFb0103945
Kushner, H., & Yin, G.G. (2003). Stochastic approximation and recursive algorithms and applications. 2nd ed. New York: Springer - http://dx.doi.org/10.1007/b97441
Pagès, G. (2015). Introduction to vector quantization and its applications for numerics. ESAIM Proceedings and Surveys, 48, 29-79 - http://dx.doi.org/10.1051/proc/201448002
Pagès, G., & Printems, J. (2009). Optimal quantization for finance: from random vectors to stochastic processes. In A. Bensoussan, & Q. Zhang (Eds.), Handbook of numerical analysis. XV, Special volume, Mathematical modeling and numerical methods in finance (pp. 595-649). Amsterdam: Elsevier/North-Holland - http://dx.doi.org/10.1016/S1570-8659(08)00015-x
Pagès, G., & Wilbertz, B. (2011). Optimal Delaunay and Voronoi quantization schemes for pricing American style options. - https://hal.archives-ouvertes.fr/hal-00572709

Document(s)

http://smai.emath.fr/cemracs/cemracs17/Slides/pages.pdf

All videos in the collection

01:20:38

published 26 May 2016

Coloring graphs on surfaces

By Louis Esperet

01:41:57

published 2 June 2016

The local Langlands correspondence: functoriality, $L$-functions, gamma functions and the epsilon factors

By Dipendra Prasad

55:12

published 8 December 2016

Applications of algebra to automatic sequences and pattern avoidance - Lecture 1

By Jason P. Bell

01:29:39

published 17 January 2017

Characters, maps, free cumulants. Lecture 1: Characters, maps, free cumulants and Stanley character formula

By Piotr Sniady

01:00:49

published 9 February 2017

Entropy and mixing for multidimensional shifts of finite type - Lecture 1

By Ronnie Pavlov

01:00:17

published 9 February 2017

Entropy and mixing for multidimensional shifts of finite type - Lecture 2

By Ronnie Pavlov

01:04:05

published 9 February 2017

Entropy and mixing for multidimensional shifts of finite type - Lecture 3

By Ronnie Pavlov

01:08:53

published 30 March 2017

Le problème Graph Motif - Partie 1

By Guillaume Fertin

01:03:16

published 13 February 2018

Cluster algebras and categorification - Lecture 1

By Claire Amiot

01:13:29

published 24 April 2018

Introduction to quantum optics - Lecture 1

By Peter Zoller

01:07:31

published 27 June 2018

Structure of hyperbolic manifolds - Lecture 1

By Jessica Purcell

59:19

published 10 December 2018

Inexact gradient projection and fast data driven compressed sensing: theory and application

By Michael E. Davies

01:47:34

published 21 January 2019

Bayesian inference and mathematical imaging - Part 3: probability and convex optimisation

By Marcelo Pereyra

01:29:11

published 23 January 2019

Improving RNA secondary structure prediction

By Ronny Lorenz

55:55

published 7 February 2019

Topics on $K3$ surfaces - Lecture 1: $K3$ surfaces in the Enriques Kodaira classification and examples

By Alessandra Sarti

01:18:41

published 11 April 2019

Autour de la mesure de Plancherel sur les partitions d'entiers (une introduction aux processus de Schur) - Partie 1

By Jérémie Bouttier

01:19:10

published 26 May 2016

Bidimensionality and subexponential parameterized algorithms

By Dimitrios Thilikos

03:01:51

published 29 July 2016

The Portable Extensible Toolkit for Scientific Computing

By Matthew Knepley

02:59:55

published 29 July 2016

Krylov subspace solvers and preconditioners

By Kees Vuik

02:30:25

published 29 July 2016

Time parallel time integration

By Martin Gander

02:58:11

published 29 July 2016

Domain decomposition, hybrid methods, coarse space corrections

By Frédéric Nataf

01:01:30

published 1 August 2016

A gentle introduction to parallel programming using OpenMP

By François Broquedis

59:28

published 1 August 2016

OpenCL introduction

By Frédéric Desprez

02:55:45

published 29 July 2016

Algorithms for future emerging technologies

By Jack Dongarra

55:32

published 29 July 2016

Overview of architectures and programming language for parallel computing

By Jean-François Méhaut

03:06:20

published 1 August 2016

Tutorial with Freefem++

By Frédéric Hecht

03:01:17

published 1 August 2016

Reduced basis methods: approximation of PDE's, interpolation and a posteriori estimate

By Yvon Maday

03:00:50

published 3 August 2016

Introduction to data assimilation: Kalman filters and ensembles

By Vivien Mallet

02:34:09

published 3 August 2016

Data assimilation training course @ CEMRACS: introduction and variational algorithms

By Sophie Ricci

59:43

published 8 December 2016

About the domino problem on finitely generated groups - Lecture 1

By Nathalie Aubrun

01:06:42

published 8 December 2016

Automorphism groups of low complexity subshift - Lecture 2

By Samuel Petite

01:01:09

published 8 December 2016

Amenable groups - Lecture 2

By Laurent Bartholdi

01:07:30

published 8 December 2016

Logic, decidability and numeration systems - Lecture 1

By Émilie Charlier

01:27:18

published 17 January 2017

Characters, maps, free cumulants. Lecture 2: Characters, maps, free cumulants and randoms Young diagrams

By Piotr Sniady

51:29

published 17 January 2017

A new spectral theory for Schur polynomials and applications

By Alexander Moll

01:26:16

published 17 January 2017

Characters, maps, free cumulants. Lecture 3: Characters, maps, free cumulants and Kerov character polynomials

By Piotr Sniady

42:45

published 17 January 2017

Flat surfaces and combinatorics

By Élise Goujard

43:12

published 9 February 2017

Palindromes patterns

By Srecko Brlek

58:28

published 9 February 2017

Exemple d'Arnoux-Yoccoz, fractal de Rauzy, problème de Novikov : brins d'une guirlande éternelle

By Pascal Hubert

01:06:25

published 16 February 2017

Dynamics on homogeneous spaces and Diophantine approximation

By Anish Ghosh

01:07:18

published 15 February 2017

Dynamics on quotients of SL(2,C) by discrete subgroups - Lecture 2

By Barbara Schapira

01:05:27

published 15 February 2017

Shrinking targets on homogeneous spaces and improving Dirichlet's Theorem

By Dmitry Kleinbock

01:03:47

published 15 February 2017

Counting and equidistribution of integral representations by quadratic norm forms in positive characteristic?

By Frédéric Paulin

58:17

published 30 March 2017

Random cubic planar graphs revisited

By Juanjo Rué

49:35

published 26 July 2017

Numerical methods for mean field games - Lecture 1: Introduction to the system of PDEs and its interpretation. Uniqueness of classical solutions

By Yves Achdou

01:11:56

published 26 July 2017

Numerical methods for mean field games - Lecture 3: Variational MFG and related algorithms for solving the discrete system of nonlinear equations

By Yves Achdou

01:02:06

published 26 July 2017

Particle algorithm for McKean SDE: a short review on numerical analysis

By Mireille Bossy

01:51:23

published 26 July 2017

Cubature methods and applications

By Dan Crisan

58:51

published 30 July 2017

The Metropolis Hastings algorithm: introduction and optimal scaling of the transient phase

By Benjamin Jourdain

01:51:23

published 31 July 2017

Optimal vector quantization: from signal processing to clustering and numerical probability

By Gilles Pagès

01:02:50

published 31 July 2017

Dynamic formulations of optimal transportation and variational MFGs

By Jean-David Benamou

01:46:09

published 1 August 2017

Multilevel and multi-index sampling methods with applications - Lecture 1: Adaptive strategies for Multilevel Monte Carlo

By Raul Tempone

01:46:24

published 1 August 2017

Least squares regression Monte Carlo for approximating BSDES and semilinear PDES

By Plamen Turkedjiev

01:00:33

published 1 August 2017

Subsurface flow with uncertainty : applications and numerical analysis issues

By Julia Charrier

56:32

published 1 August 2017

Global sensitivity analysis in stochastic systems

By Olivier Le Maître

01:07:59

published 1 August 2017

Metamodels for uncertainty quantification and reliability analysis

By Stefano Marelli

01:51:57

published 1 August 2017

Multilevel and multi-index sampling methods with applications - Lecture 2: Multilevel and Multi-index Monte Carlo methods for the McKean-Vlasov equation

By Raul Tempone

01:50:04

published 3 August 2017

Lecture 1: Introduction to HPC, random generation, and OpenMP

By Jérôme Lelong

01:39:56

published 3 August 2017

Lecture 2: Introduction to HPC - MPI: design of parallel program and MPI

By Jérôme Lelong

01:06:53

published 28 November 2017

The undecidability of the domino problem

By Emmanuel Jeandel

01:28:30

published 28 November 2017

From combinatorial games to shape-symmetric morphisms

By Michel Rigo

01:38:52

published 28 November 2017

$S$-adic sequences: a bridge between dynamics, arithmetic, and geometry

By Jörg Thuswaldner

01:21:20

published 28 November 2017

Lecture on Delone sets and tilings

By Boris Solomyak

30:25

published 22 March 2018

Chemins à grands pas dans le quadrant

By Mireille Bousquet-Mélou

33:27

published 22 March 2018

Sur les mesures stationnaires des VLMC

By Nicolas Pouyanne

26:05

published 22 March 2018

Comptage et design multiple d'ARN

By Yann Ponty

01:22:56

published 26 July 2018

Multi-level mathematical models for cell migration in dense fibrous environments

By Luigi Preziosi

01:03:27

published 30 July 2018

Irreversible electroporation of liver malignancy: a new opportunity of curative treatment for patients not amenable to resection and thermal ablation

By Olivier Seror

01:23:52

published 30 July 2018

Fluid-structure interaction in the cardiovascular system. Lecture 1: Forward problems

By Jean-Frédéric Gerbeau

01:28:17

published 30 July 2018

Mathematics behind some phenomena in crowd motion : Stop and Go waves and Capacity Drop

By Bertrand Maury

01:33:10

published 30 July 2018

Data Assimilation: a deterministic vision, theory and applications. Lecture 1: Least-square estimation

By Philippe Moireau

01:10:22

published 26 July 2018

A new continuum theory for incompressible swelling materials

By Pierre Degond

01:18:16

published 30 July 2018

Fluid-structure interaction in the cardiovascular system. Lecture 2: Cardiac valves

By Jean-Frédéric Gerbeau

01:42:17

published 30 July 2018

Data Assimilation: a deterministic vision, theory and applications. Lecture 2: Asymptotic observers

By Philippe Moireau

01:10:38

published 21 March 2019

Posets, polynômes, et polytopes - Partie 1

By Kolja Knauer

01:26:05

published 21 March 2019

Transductions - Partie 1

By Emmanuel Filiot

01:12:01

published 21 March 2019

Posets, polynômes, et polytopes - Partie 2

By Kolja Knauer

01:27:49

published 25 March 2019

Transductions - Partie 2

By Pierre-Alain Reynier

57:37

published 25 March 2019

Toeplitz determinants, Painlevé equations, and special functions. Part I: an operator approach - Lecture 1

By Estelle Basor

50:42

published 28 March 2019

Toeplitz determinants, Painlevé equations, and special functions. Part II: a Riemann-Hilbert point of view - Lecture 1

By Alexander R. Its

01:03:03

published 28 March 2019

Toeplitz determinants, Painlevé equations, and special functions. Part II: a Riemann-Hilbert point of view - Lecture 2

By Alexander R. Its

54:50

published 27 March 2019

Random matrices, integrability, and number theory - Lecture 2

By Jonathan P. Keating

51:08

published 27 March 2019

Random matrices, integrability, and number theory - Lecture 1

By Jonathan P. Keating

47:51

published 25 March 2019

Toeplitz determinants, Painlevé equations, and special functions. Part I: an operator approach - Lecture 2

By Estelle Basor

55:45

published 28 March 2019

Determinantal point processes - Lecture 1

By Alexander Bufetov

57:40

published 28 March 2019

Toeplitz determinants, Painlevé equations, and special functions. Part II: a Riemann-Hilbert point of view - Lecture 3

By Alexander R. Its

01:03:00

published 27 March 2019

Random matrices, integrability, and number theory - Lecture 3

By Jonathan P. Keating

54:04

published 28 March 2019

Operator limits of beta ensembles - Lecture 1

By Brian Rider

55:06

published 28 March 2019

Determinantal point processes - Lecture 2

By Alexander Bufetov

59:07

published 27 March 2019

Random matrices, integrability, and number theory - Lecture 4

By Jonathan P. Keating

55:15

published 27 March 2019

Operator limits of beta ensembles - Lecture 2

By Brian Rider

47:38

published 2 April 2019

Toeplitz determinants, Painlevé equations, and special functions. Part I: an operator approach - Lecture 3

By Estelle Basor

01:02:04

published 28 March 2019

Determinantal point processes - Lecture 3

By Alexander Bufetov

42:23

published 29 March 2019

Correlation functions for some integrable systems with random initial data, theory and computation - Lecture 1

By Tamara Grava

50:47

published 29 March 2019

Correlation functions for some integrable systems with random initial data, theory and computation - Lecture 2

By Kenneth D. T.-R. McLaughlin

51:26

published 27 March 2019

Operator limits of beta ensembles - Lecture 3

By Brian Rider

47:52

published 28 March 2019

Operator limits of beta ensembles - Lecture 4

By Brian Rider

Copyright Carmin.tv 2025
