Appears in the collection: 2024 - PC2 - Random tensors and related topics
Random matrix theory combined with multilayer perceptrons (MLPs) forms a theoretical foundation for deep neural networks (DNNs). What role, then, do random tensors play in deep learning? In this talk, we show how random tensors arise in the analysis of the MLP-Mixer. The MLP-Mixer is a DNN used in image processing and a simplified model of the Vision Transformer (ViT). In both models, an input image is divided into tokens, arranged sequentially, and fed in as a second-order tensor. The MLP-Mixer performs both within-token and between-token operations using MLP blocks. Although it simply replaces the attention mechanism of ViT with MLPs, the MLP-Mixer achieves performance close to ViT's, highlighting the importance of data volume and tokenization. Specifically, this talk presents experimental results showing that high sparsity and large hidden-layer dimensions improve performance. To verify that these benefits do not depend on the model's specific structure, we intentionally disrupt that structure using tensor products and random permutation matrices.
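To make the within-token and between-token operations concrete, here is a minimal sketch of one Mixer block in PyTorch. The names (`MixerBlock`, `num_tokens`, `num_channels`, `hidden_dim`) and the shared hidden dimension are illustrative assumptions, not details taken from the talk.

```python
# A minimal sketch of one MLP-Mixer block (illustrative, not the talk's code).
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One Mixer block: a token-mixing MLP followed by a channel-mixing MLP."""
    def __init__(self, num_tokens: int, num_channels: int, hidden_dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(num_channels)
        # Token-mixing MLP: acts along the token axis (between tokens).
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, num_tokens),
        )
        self.norm2 = nn.LayerNorm(num_channels)
        # Channel-mixing MLP: acts along the channel axis (within each token).
        self.channel_mlp = nn.Sequential(
            nn.Linear(num_channels, hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, num_channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, tokens, channels): each image is a 2nd-order tensor.
        y = self.norm1(x).transpose(1, 2)          # (batch, channels, tokens)
        x = x + self.token_mlp(y).transpose(1, 2)  # mix between tokens
        x = x + self.channel_mlp(self.norm2(x))    # mix within tokens
        return x

# Usage: 196 tokens (a 14x14 patch grid) with 512 channels per token.
x = torch.randn(8, 196, 512)
block = MixerBlock(num_tokens=196, num_channels=512, hidden_dim=1024)
print(block(x).shape)  # torch.Size([8, 196, 512])
```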
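The structural disruption can be pictured as follows: a permutation of the form P_token ⊗ P_channel (a tensor product of permutation matrices) respects the token-channel factorization, whereas a generic random permutation of the flattened grid destroys it. The sketch below follows that reading; the exact construction in the talk may differ, and `permute_token_channel_grid` is a hypothetical helper name.

```python
# A hedged sketch of disrupting the token-channel structure with a random
# permutation. A permutation of the form P_token (x) P_channel would preserve
# the tensor-product structure; a generic permutation of the flattened grid,
# as used here, does not. This is an assumed construction, not the talk's.
import torch

def permute_token_channel_grid(x: torch.Tensor, perm: torch.Tensor) -> torch.Tensor:
    """Shuffle the flattened token-channel entries of each image with `perm`."""
    batch, tokens, channels = x.shape
    flat = x.reshape(batch, tokens * channels)   # flatten the 2nd-order tensor
    return flat[:, perm].reshape(batch, tokens, channels)

tokens, channels = 196, 512
perm = torch.randperm(tokens * channels)         # one fixed random permutation
x = torch.randn(8, tokens, channels)
x_disrupted = permute_token_channel_grid(x, perm)
```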