Imaging and machine learning

Collection: Imaging and machine learning

Date(s): 03/05/2024

Designing multimodal deep architectures for Visual Question Answering

By Matthieu Cord

Multimodal representation learning for text and images has been extensively studied in recent years. Currently, one of the most popular tasks in this field is Visual Question Answering (VQA). I will introduce this complex multimodal task, which aims at answering a question about an image. Solving this problem requires both visual and textual deep network models, and high-level interactions between these two modalities have to be carefully designed into the model in order to produce the right answer. This projection from the unimodal spaces into a multimodal one is meant to extract and model the relevant correlations between the two spaces. In addition, the model must be able to understand the full scene, focus its attention on the visual regions relevant to the question, and discard the useless information.
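The attend-then-fuse step described in the abstract can be sketched in plain Python. This is a toy illustration only, not the speaker's actual architecture: the function names, the dot-product attention scoring, and the elementwise-product fusion (a stand-in for richer bilinear fusion schemes used in VQA models) are all assumptions made for clarity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend_and_fuse(question_vec, region_feats):
    """Score each image region against the question embedding,
    softmax-normalise the scores into attention weights, pool the
    regions into one attended visual vector, and fuse it with the
    question by elementwise product."""
    scores = [dot(question_vec, r) for r in region_feats]
    weights = softmax(scores)
    dim = len(question_vec)
    attended = [sum(w * r[d] for w, r in zip(weights, region_feats))
                for d in range(dim)]
    fused = [q * v for q, v in zip(question_vec, attended)]
    return weights, fused

# Toy example: a 4-d question embedding and 3 image-region features.
q = [1.0, 0.0, 0.5, 0.0]
regions = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0],
           [0.2, 0.1, 0.3, 0.4]]
weights, fused = attend_and_fuse(q, regions)
```

In a real VQA model the question vector would come from a recurrent or transformer text encoder, the region features from a convolutional backbone or object detector, and the fused vector would feed a classifier over candidate answers; here the point is only the shape of the computation: question-conditioned attention over regions, then a joint multimodal representation.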

Information about the video

  • Date of recording 04/04/2019
  • Date of publication 10/05/2019
  • Institution IHP
  • Language English
  • Format MP4
  • Venue Institut Henri Poincaré

