Center for Theoretical Biological Physics Research Seminar
David Schwab
The Graduate Center
City University of New York
“How Noise Affects the Hessian Spectrum in Over-parameterized Neural Networks”
Abstract: Stochastic gradient descent (SGD) is the core optimization method for deep neural networks and a key contributor to their resurgence. While some theoretical progress has been made, it remains unclear why SGD drives the learning dynamics in over-parameterized networks toward solutions that generalize well. Here we show that for over-parameterized networks with a degenerate valley in their loss landscape, SGD on average decreases the trace of the Hessian of the loss. We also show that isotropic noise in the non-degenerate subspace of the Hessian decreases its determinant. In addition to explaining SGD's role in sculpting the Hessian spectrum, this opens the door to new optimization approaches that guide models to solutions with better generalization. We test our results with experiments on toy models and deep neural networks.
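The effect described in the abstract can be illustrated on a minimal toy model of our own choosing (not taken from the talk): the loss L(w1, w2) = ½(w1·w2 − 1)², whose minima form a degenerate valley w1·w2 = 1 along which the Hessian trace w1² + w2² varies. The sketch below, assuming simple label noise added to each SGD step, starts at an unbalanced point in the valley and tracks the Hessian trace, which on average drifts down toward its minimum on the valley:

```python
import numpy as np

def run_noisy_sgd(steps=100_000, lr=0.02, noise_sd=0.5, seed=0):
    """Toy model: L(w1, w2) = 0.5 * (w1*w2 - 1)**2.

    Minima form the degenerate valley w1*w2 = 1; the Hessian trace
    there is w1**2 + w2**2, minimized at the balanced point w1 = w2 = 1.
    Each step uses a noisy residual (a stand-in for minibatch/label
    noise), so the gradient is stochastic.
    """
    rng = np.random.default_rng(seed)
    w1, w2 = 4.0, 0.25          # unbalanced minimum: w1*w2 = 1, trace ~ 16.06
    trace_start = w1**2 + w2**2
    for _ in range(steps):
        eps = rng.normal(0.0, noise_sd)      # per-step noise on the residual
        r = w1 * w2 - 1.0 + eps              # noisy residual
        g1, g2 = r * w2, r * w1              # gradient of the noisy loss
        w1 -= lr * g1
        w2 -= lr * g2
    trace_end = w1**2 + w2**2                # exact Hessian trace for this loss
    return trace_start, trace_end

if __name__ == "__main__":
    t0, t1 = run_noisy_sgd()
    print(f"Hessian trace: start {t0:.3f} -> end {t1:.3f}")
```

To first order the noise leaves w1² − w2² unchanged, but at second order each step shrinks it by a factor (1 − lr²·r²), so the iterate diffuses along the valley toward the balanced, minimum-trace solution, consistent with the abstract's claim.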
Bio: A biological physicist, David Schwab applies statistical physics and nonlinear dynamics to problems in biology. To explore these issues, he draws on a diverse set of analytical and computational tools such as statistical mechanics, dynamical systems theory, machine learning, and information theory. Previously he was an assistant professor at Northwestern University, where he focused on questions such as how neural networks perform memory and attention and how cells communicate and coordinate their behavior during development.
An Official Seminar of the Ph.D. Program in Systems, Synthetic and Physical Biology at Rice University