Monte Carlo guided Diffusion for Bayesian linear inverse problems
By Sylvain Le Corff
Linear and nonlinear schemes for forward model reduction and inverse problems - Lecture 1
By Olga Mula Hernandez
Appears in the collection: 2022 - T3 - WS1 - Non-Linear and High Dimensional Inference
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied to a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as incorporating a third derivative (the derivative of the Hessian in the leading eigenvector direction) that encourages drift toward wider minima.
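To make the oscillation claim concrete, here is a minimal sketch of the SAM update applied to a convex quadratic loss L(w) = 0.5 w^T A w. The step size eta, perturbation radius rho, curvature matrix A, and iteration count are illustrative choices, not values from the talk; the sketch only shows the qualitative behaviour described in the abstract: the iterate settles into a two-point cycle along the highest-curvature coordinate while the low-curvature coordinate decays toward the minimum.

```python
import numpy as np

def grad(A, w):
    """Gradient of the quadratic loss 0.5 * w^T A w."""
    return A @ w

def sam_step(A, w, eta=0.12, rho=0.05):
    """One SAM update: perturb the iterate toward the worst-case point within
    radius rho, then take a gradient step evaluated at that perturbed point."""
    g = grad(A, w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # normalized ascent direction
    return w - eta * grad(A, w + eps)

# Diagonal quadratic: the first coordinate carries the largest curvature.
A = np.diag([10.0, 1.0])
rng = np.random.default_rng(0)
w = rng.normal(size=2)

for _ in range(200):
    w = sam_step(A, w)

# The first coordinate flips sign between consecutive iterates (a cycle
# straddling the minimum in the top-curvature direction); the second
# coordinate has shrunk toward zero.
print("final iterate:", w)
print("next iterate: ", sam_step(A, w))
```

Running this prints two iterates whose first coordinates have opposite signs and roughly equal magnitude, matching the cycle the abstract describes for the quadratic case.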