PhD Defenses

PHYSICS DISSERTATION DEFENSE: Brett Larsen

Date
Monday, September 19, 2022, 10:00 - 11:00 AM
Location
Neurosciences Building, Room S375

Ph.D. Candidate: Brett Larsen

Research Advisors: Shaul Druckmann and Surya Ganguli

Date: September 19, 2022
Time: 10:00 - 11:00 AM

Location: Neurosciences Building, Room S375


Zoom Link: https://stanford.zoom.us/j/99313124173

Zoom Password: email nickswan [at] stanford.edu to request the password.


Title: Optimization and High-Dimensional Loss Landscapes in Deep Learning


Abstract:
Despite deep learning's impressive success, many questions remain about how training such high-dimensional models behaves in practice and why it reliably produces useful networks. We employ an empirical approach, performing experiments guided by theoretical predictions, to study the following questions through the lens of the loss landscape.

(1) How do loss landscape properties affect the success or failure of weight pruning methods? Recent work on two fronts, the lottery ticket hypothesis and training restricted to random subspaces, has demonstrated that deep neural networks can be successfully optimized using far fewer degrees of freedom than the total number of parameters. In particular, lottery tickets, or sparse subnetworks capable of matching the full model's accuracy, can be identified via iterative pruning and retraining of the weights. We first provide a framework for the success of low-dimensional training in terms of the high-dimensional geometry of the loss landscape. We then leverage this framework both to better understand the success of lottery tickets and to predict how aggressively we can prune the weights at each iteration.

(2) What are the algorithmic advantages of recurrent connections in neural networks? One of the brain's most striking anatomical features is the ubiquity of lateral and recurrent connections. Yet while the computational abilities of feedforward networks have been studied extensively, understanding the role of recurrent computations that might explain their prevalence remains an important open challenge. We demonstrate that recurrent connections are efficient for performing tasks that can be solved via repeated, local propagation of information, and we propose that they can be combined with feedforward architectures for efficient computation across timescales.
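
For readers unfamiliar with the pruning procedure mentioned in the abstract, below is a minimal, hypothetical sketch of iterative magnitude pruning in PyTorch, the standard way lottery tickets are identified: train, prune the smallest-magnitude surviving weights, rewind the remaining weights to their initial values, and repeat. The names train_fn, prune_fraction, and rounds are illustrative placeholders, not code from the dissertation.

    # Hypothetical sketch of iterative magnitude pruning (assumed setup, not the author's code).
    import copy
    import torch
    import torch.nn as nn

    def iterative_magnitude_pruning(model, train_fn, prune_fraction=0.2, rounds=5):
        """Return per-layer binary masks defining a sparse subnetwork ('lottery ticket')."""
        initial_state = copy.deepcopy(model.state_dict())        # weights to rewind to
        masks = {name: torch.ones_like(p)                        # prune weight matrices only
                 for name, p in model.named_parameters() if p.dim() > 1}

        for _ in range(rounds):
            train_fn(model, masks)                               # placeholder: train with masks applied

            for name, param in model.named_parameters():
                if name not in masks:
                    continue
                # Threshold the magnitudes of the currently surviving weights
                # and prune the smallest prune_fraction of them.
                alive = param.detach().abs()[masks[name].bool()]
                threshold = torch.quantile(alive, prune_fraction)
                masks[name] *= (param.detach().abs() > threshold).float()

            # Rewind surviving weights to their initial values before the next round.
            model.load_state_dict(initial_state)
            for name, param in model.named_parameters():
                if name in masks:
                    param.data *= masks[name]

        return masks

In this sketch the mask only shrinks from round to round, since already-pruned weights have zero magnitude and never pass the threshold again; how aggressively prune_fraction can be set per round is exactly the kind of question the loss landscape framework in the abstract is meant to address.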