张驰浩 (Postdoc): Towards Understanding the Terminal Phase of Training of Deep Neural Networks




Academy of Mathematics and Systems Science, CAS
Colloquia & Seminars

Speaker: 张驰浩 (Postdoc), The University of Tokyo, Japan
Inviter: 张世华
Title:
Towards Understanding the Terminal Phase of Training of Deep Neural Networks
Time & Venue:
2021.10.28 08:00-08:40 S525
Abstract:
Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where the training error first vanishes. During TPT, the training error stays effectively zero while the training loss is pushed towards zero. Vardan Papyan et al. characterize the TPT as Neural Collapse (NC), involving four deeply interconnected phenomena: (NC1) cross-example within-class variability of the last-layer training activations collapses to zero, as the individual activations themselves collapse to their class-means; (NC2) the class-means collapse to the vertices of a Simplex Equiangular Tight Frame (ETF); (NC3) up to rescaling, the last-layer classifiers collapse to the class-means, or in other words to the Simplex ETF, i.e., to a self-dual configuration; (NC4) for a given activation, the classifier's decision collapses to simply choosing whichever class has the closest training class-mean, i.e., the Nearest Class-Center (NCC) decision rule. However, the NC described by Papyan et al. concerns only the behavior of the last layer of deepnets; the behavior of the intermediate layers is still unclear. In this talk, I will briefly introduce the NC phenomena and discuss future directions towards understanding the TPT of deepnets by investigating the behavior of the intermediate layers.
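As a concrete illustration of these definitions, the sketch below (an illustrative implementation, not code from the talk; the function names are hypothetical) constructs the C vertices of the Simplex ETF from (NC2) and computes the within-class collapse statistic Tr(Sigma_W pinv(Sigma_B)) / C that Papyan et al. use to track (NC1), where Sigma_W and Sigma_B are the within-class and between-class covariances of the last-layer activations.

```python
import numpy as np

def simplex_etf(C):
    """Vertices of a standard Simplex ETF in R^C (hypothetical helper).

    Returns a C x C matrix whose columns have unit norm and pairwise
    inner product -1/(C-1): the configuration that, per (NC2), the
    class-means approach during TPT (rank C-1, up to rotation).
    """
    return np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)

def nc1_metric(features, labels):
    """Within-class variability relative to between-class spread (NC1).

    features: (N, d) array of last-layer activations
    labels:   (N,) integer class labels in {0, ..., C-1}
    Returns Tr(Sigma_W @ pinv(Sigma_B)) / C, which tends to zero as
    activations collapse to their class-means.
    """
    classes = np.unique(labels)
    C = len(classes)
    N, d = features.shape
    global_mean = features.mean(axis=0)
    Sigma_W = np.zeros((d, d))  # within-class covariance
    Sigma_B = np.zeros((d, d))  # between-class covariance
    for c in classes:
        Xc = features[labels == c]
        mu_c = Xc.mean(axis=0)
        Sigma_W += (Xc - mu_c).T @ (Xc - mu_c) / N
        diff = (mu_c - global_mean)[:, None]
        Sigma_B += diff @ diff.T / C
    return np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / C
```

If the abstract's description holds, tracking nc1_metric over the epochs of TPT should show it decaying towards zero, while the centered and normalized matrix of class-means approaches the configuration returned by simplex_etf.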