The mean-field dynamics of transformers

By Philippe Rigollet

Appears in collection: New challenges in high-dimensional statistics / Statistique mathématique 2025

We develop a mathematical framework that interprets Transformer attention as an interacting particle system and studies its continuum (mean-field) limits. By idealizing attention on the sphere, we connect Transformer dynamics to Wasserstein gradient flows, synchronization models (Kuramoto), and mean-shift clustering. Central to our results is a global clustering phenomenon: tokens eventually cluster, but only after long metastable phases during which they are arranged into multiple clusters. We further analyze a tractable equiangular reduction to obtain exact clustering rates, show how commonly used normalization schemes alter contraction speeds, and identify a phase transition for long-context attention. The results highlight both the mechanisms that drive representation collapse and the regimes that preserve expressive, multi-cluster structure in deep attention architectures.
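To make the particle picture concrete, here is a minimal Python sketch; it is not taken from the talk. It assumes the idealized model often used in this line of work, in which the query, key and value matrices are all the identity and each token x_i on the unit sphere evolves as dx_i/dt = P_{x_i}((1/Z_i) Σ_j exp(β⟨x_i, x_j⟩) x_j), with Z_i = Σ_j exp(β⟨x_i, x_j⟩) and P_x(y) = y − ⟨x, y⟩x the projection onto the tangent space at x. The function names (attention_step, simulate, min_inner_product), the explicit Euler scheme with renormalization, and the parameter values (n = 8 tokens in d = 3, β = 1, step 0.05) are illustrative assumptions. The script tracks the smallest pairwise inner product, which approaches 1 as the tokens merge into a single cluster.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Rescale a vector so that it lies on the unit sphere."""
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]


def attention_step(xs, beta, dt):
    """One explicit Euler step of the assumed idealized self-attention
    dynamics (Q = K = V = identity), followed by renormalization.
    All tokens are updated simultaneously from the old positions."""
    new_xs = []
    for xi in xs:
        # Softmax weights exp(beta <x_i, x_j>) / Z_i, shifted by the max for stability.
        logits = [beta * dot(xi, xj) for xj in xs]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        z = sum(weights)
        # Attention output: softmax-weighted average of all tokens.
        avg = [sum(w * xj[k] for w, xj in zip(weights, xs)) / z for k in range(len(xi))]
        # Tangential projection P_x(y) = y - <x, y> x keeps the motion on the sphere.
        radial = dot(xi, avg)
        velocity = [a - radial * c for a, c in zip(avg, xi)]
        # Euler step, then project back onto the sphere.
        new_xs.append(normalize([c + dt * v for c, v in zip(xi, velocity)]))
    return new_xs


def min_inner_product(xs):
    """Smallest pairwise inner product; equals 1 when all tokens coincide."""
    n = len(xs)
    return min(dot(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n))


def simulate(n=8, d=3, beta=1.0, dt=0.05, steps=2000, seed=0):
    """Start from random tokens on the sphere and run the dynamics."""
    rng = random.Random(seed)
    xs = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
    start = min_inner_product(xs)
    for _ in range(steps):
        xs = attention_step(xs, beta, dt)
    return xs, start, min_inner_product(xs)


if __name__ == "__main__":
    tokens, start, end = simulate()
    print(f"min pairwise inner product: {start:.3f} -> {end:.6f}")
```

With a moderate β such as this one, the tokens typically merge into one cluster without a visible plateau. In this line of work, the long multi-cluster metastable phases mentioned in the abstract are reported for large β, so increasing beta is a natural experiment with the sketch.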

Citation data

  • DOI: 10.24350/CIRM.V.20425603
  • Cite this video: Rigollet, Philippe (16/12/2025). The mean-field dynamics of transformers. CIRM. Audiovisual resource. DOI: 10.24350/CIRM.V.20425603
  • URL: https://dx.doi.org/10.24350/CIRM.V.20425603
