




Sun Qi1, Tao Yunzhe2, Du Qiang3
1 Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China;
2 Amazon Web Services Artificial Intelligence, Seattle, WA 98121, USA;
3 Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, USA











