New challenges in high-dimensional statistics / Mathematical Statistics


Organizer(s): Klopp, Olga; Pouet, Christophe; Rakhlin, Alexander
Date(s): 16/12/2024 - 20/12/2024
Linked URL: https://conferences.cirm-math.fr/3055.html

Attention layers provably solve single-location regression

By Claire Boyer

Attention-based models, such as Transformers, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, where only one token in a sequence determines the output, and the position of that token is a latent random variable, retrievable via a linear projection of the input. To solve this task, we propose a dedicated predictor, which turns out to be a simplified version of a non-linear self-attention layer. We study its theoretical properties by showing its asymptotic Bayes optimality and by analyzing its training dynamics. In particular, despite the non-convex nature of the problem, the predictor effectively learns the underlying structure. This work highlights the capacity of attention mechanisms to handle sparse token information and internal linear structures.
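To make the setting concrete, here is a minimal NumPy sketch of the single-location regression task and of a simplified attention predictor evaluated with oracle parameters. The dimensions, the noise-free construction, the signal strength, and the use of a softmax nonlinearity are illustrative assumptions made for this sketch, not details taken from the talk.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical sizes, chosen only for illustration.
    L, d, n = 10, 32, 1000  # sequence length, token dimension, sample size

    # Assumed ground-truth directions: k_star flags the position of the
    # relevant token, v_star carries the linear signal read off that token.
    k_star = rng.standard_normal(d)
    k_star /= np.linalg.norm(k_star)
    v_star = rng.standard_normal(d)
    v_star /= np.linalg.norm(v_star)

    def sample_task(n_samples):
        """Single-location regression: one (random) token per sequence
        determines the label, and its position is recoverable through a
        linear projection of the input onto k_star."""
        X = rng.standard_normal((n_samples, L, d))
        J = rng.integers(0, L, size=n_samples)    # latent relevant position
        rows = np.arange(n_samples)
        X[rows, J] += 3.0 * k_star                # mark the relevant token
        Y = X[rows, J] @ v_star                   # label depends on it alone
        return X, Y

    def attention_predictor(X, k, v, beta=4.0):
        """Simplified single-head self-attention: scores along a key
        direction k select a token via softmax, and a value direction v
        reads the output off the selected token."""
        scores = X @ k                            # (n, L) token relevance
        w = np.exp(beta * (scores - scores.max(axis=1, keepdims=True)))
        w /= w.sum(axis=1, keepdims=True)         # softmax over positions
        return np.einsum('nl,nld,d->n', w, X, v)  # weighted linear readout

    X, Y = sample_task(n)
    Y_hat = attention_predictor(X, k_star, v_star)
    print("MSE with oracle directions:", float(np.mean((Y - Y_hat) ** 2)))

With the oracle directions the squared error is small, which illustrates why a single attention head suffices here: one linear direction locates the relevant token and another reads the output off it. Learning both directions from data, despite the non-convexity, is what the training-dynamics analysis addresses.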

Information about the video

Citation data

  • DOI: 10.24350/CIRM.V.20279403
  • Cite this video: Boyer, Claire (19/12/2024). Attention layers provably solve single-location regression. CIRM. Audiovisual resource. DOI: 10.24350/CIRM.V.20279403
  • URL: https://dx.doi.org/10.24350/CIRM.V.20279403

