ReLU and Softplus Neural Nets as Zero-Sum, Turn-Based, Stopping Games
By Yiannis Vlassopoulos
Appears in collection: 3rd Edition of Mathematics for and by Large Language Models
Large language models process vast sequences of input tokens by alternating between classical multi-layer perceptron layers and self-attention mechanisms. While the approximation capabilities of perceptrons are relatively well understood, those of attention mechanisms remain less explored. In this talk, I will compare the proof techniques and approximation results associated with these two types of layers, emphasizing key open questions that connect large language models with approximation theory in infinite-dimensional spaces representing input token distributions.