2022 - T3 - WS1 - Non-Linear and High Dimensional Inference


Organizer(s) Aamari, Eddie ; Aaron, Catherine ; Chazal, Frédéric ; Fischer, Aurélie ; Hoffmann, Marc ; Le Brigant, Alice ; Levrard, Clément ; Michel, Bertrand
Date(s) 03/10/2022 - 07/10/2022
linked URL https://indico.math.cnrs.fr/event/7545/

We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates from one side of the minimum to the other along the direction of largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as a third derivative---the derivative of the Hessian in the leading eigenvector direction---that encourages drift toward wider minima.
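The cyclic behavior described in the abstract is easy to reproduce numerically. The sketch below applies the standard SAM update (ascend by a normalized-gradient perturbation of radius rho, then take a gradient step from the perturbed point) to a convex quadratic; the Hessian, rho, and eta values are illustrative choices, not from the talk.

```python
import numpy as np

# Convex quadratic f(w) = 0.5 * w^T H w, minimized at w = 0.
# Largest curvature is along axis 0.
H = np.diag([4.0, 1.0])

def grad(w):
    return H @ w

def sam_step(w, eta=0.4, rho=0.1):
    """One SAM update: perturb along the normalized gradient,
    then descend using the gradient at the perturbed point."""
    g = grad(w)
    eps = rho * g / np.linalg.norm(g)   # ascent perturbation of radius rho
    return w - eta * grad(w + eps)

rng = np.random.default_rng(0)
w = rng.standard_normal(2)
for _ in range(500):
    w = sam_step(w)

# Rather than converging to the minimum w = 0, the iterates settle into a
# two-point cycle that flips sign across the minimum along the top
# eigenvector direction (axis 0), while the low-curvature coordinate
# (axis 1) decays to zero.
print(w, sam_step(w))
```

With these parameters the limiting cycle along axis 0 can be computed in closed form: near the cycle the update reduces to x -> -0.6 x - 0.16 sign(x), whose period-2 fixed points are x = ±0.4.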

Information about the video

Citation data

  • DOI 10.57987/IHP.2022.T3.WS1.009
  • Cite this video Bartlett, Peter (06/10/2022). Convergence of Sharpness-Aware Minimization. IHP. Audiovisual resource. DOI: 10.57987/IHP.2022.T3.WS1.009
  • URL https://dx.doi.org/10.57987/IHP.2022.T3.WS1.009
