Imaging and machine learning

Collection: Imaging and machine learning

Date(s): 03/05/2024

Designing multimodal deep architectures for Visual Question Answering

By Matthieu Cord

Multimodal representation learning for text and images has been extensively studied in recent years. Currently, one of the most popular tasks in this field is Visual Question Answering (VQA). I will introduce this complex multimodal task, which aims at answering a question about an image. Solving this problem requires both visual and textual deep network models, and high-level interactions between these two modalities have to be carefully designed into the model in order to produce the right answer. This projection from the unimodal spaces into a multimodal one is meant to extract and model the relevant correlations between the two spaces. In addition, the model must be able to understand the full scene, focus its attention on the visual regions relevant to the question, and discard the useless information.
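The attend-then-fuse step described in the abstract can be sketched in plain Python. This is a toy illustration only, not the speaker's actual architecture: the function names, the dot-product attention scoring, and the elementwise-product fusion (a stand-in for richer bilinear fusion schemes used in VQA models) are all assumptions made for clarity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend_and_fuse(question_vec, region_feats):
    """Score each image region against the question embedding,
    softmax-normalise the scores into attention weights, pool the
    regions into one attended visual vector, and fuse it with the
    question by elementwise product."""
    scores = [dot(question_vec, r) for r in region_feats]
    weights = softmax(scores)
    dim = len(question_vec)
    attended = [sum(w * r[d] for w, r in zip(weights, region_feats))
                for d in range(dim)]
    fused = [q * v for q, v in zip(question_vec, attended)]
    return weights, fused

# Toy example: a 4-d question embedding and 3 image-region features.
q = [1.0, 0.0, 0.5, 0.0]
regions = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0],
           [0.2, 0.1, 0.3, 0.4]]
weights, fused = attend_and_fuse(q, regions)
```

In a real VQA model the question vector would come from a recurrent or transformer text encoder, the region features from a convolutional backbone or object detector, and the fused vector would feed a classifier over candidate answers; here the point is only the shape of the computation: question-conditioned attention over regions, then a joint multimodal representation.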

Information about the video

  • Date of recording 04/04/2019
  • Date of publication 10/05/2019
  • Institution IHP
  • Language English
  • Format MP4
  • Venue Institut Henri Poincaré

