Dr. Zhanxing Zhu is currently an assistant professor at the School of Mathematical Sciences, Peking University, and is also affiliated with the Center for Data Science, Peking University. He obtained his Ph.D. in machine learning from the University of Edinburgh in 2016. His research interests cover machine learning and its applications in various domains; he currently focuses on deep learning theory and optimization algorithms, reinforcement learning, and their applications. He has published more than 40 papers in top AI and machine learning journals and conferences, including NeurIPS, ICML, CVPR, and ACL, and has served multiple times as an area chair for the top AI conferences AAAI and AISTATS. He received the "2019 Alibaba Damo Young Fellow" award and was a "Best Paper Finalist" at CCS 2018, a flagship conference in computer security.
Whether deep learning is alchemy or science has long been debated, since its success relies largely on engineering design choices and tricks that lack a theoretical foundation. The underlying mechanism of deep learning remains mysterious, which severely limits its further development in both theory and applications.
In this talk, I will introduce some of our attempts to understand deep learning theoretically, focusing on the analysis of its training methods and tricks, including stochastic gradient descent, batch normalization, and knowledge distillation. (1) We analyze the implicit regularization property of stochastic gradient descent (SGD), i.e., we interpret why SGD finds minima that generalize well compared with the alternatives. (2) We comprehensively characterize the learning dynamics of batch normalization and weight decay, and show their benefits: avoiding vanishing/exploding gradients, not being trapped in sharp minima, and the drop in loss when the learning rate is decayed. (3) We also show the underlying mechanism of knowledge distillation, including its transfer risk bound, data efficiency, and distillation from an imperfect teacher. These findings shed some light on deep learning, take a step towards opening this black box, and inspire new algorithmic designs.
16:30 - 18:30
Meeting ID: 972 987 87202