

Slow Convergence of Stochastic Optimization Algorithms Without Derivatives Is Avoidable
By Anne Auger


Deep Out-of-the-distribution Uncertainty Quantification for Data (Science) Scientists
By Nicolas Vayatis
By Francis Bach
Appears in collection: Optimization for Machine Learning / Optimisation pour l’apprentissage automatique
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions.
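For concreteness, one schematic way to write the kind of max-margin problem referred to in the abstract is sketched below; the notation $\mathcal{F}_1$ for the non-Hilbertian (variation-norm) function space, and the labels $y_i \in \{-1,+1\}$, are assumptions for illustration and are not spelled out in the abstract itself.

$$
\max_{\substack{f \in \mathcal{F}_1 \\ \|f\|_{\mathcal{F}_1} \le 1}} \; \min_{1 \le i \le n} \; y_i \, f(x_i),
$$

where $(x_i, y_i)_{i=1}^n$ denote the training pairs. In this reading, the gradient flow on the exponentially tailed (e.g. logistic) loss would converge, in direction, to a classifier attaining this margin, with the norm constraint taken in the non-Hilbertian space rather than in an RKHS.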