
OPERATOR SPLITTING METHODS FOR DECENTRALIZED OPTIMIZATION

Yin Wotao
DAMO Academy, Alibaba US; (on leave from) Department of Mathematics, University of California, Los Angeles, US
Received: 2019-07-09  Online: 2019-09-15  Published: 2019-08-21

About the author: Yin Wotao received his Ph.D. in operations research from Columbia University. He is currently a machine intelligence researcher at the DAMO Academy of Alibaba (US) and a tenured professor (on leave) in the Department of Mathematics at the University of California, Los Angeles (UCLA). His research focuses on innovative theory and applications in computing theory, optimization algorithms, machine learning, and signal processing. His honors include the 2008 NSF CAREER Award, a 2009 Alfred P. Sloan Research Fellowship, the 2016 Morningside Gold Medal of Applied Mathematics, and several conference and annual best-paper awards.
Abstract

In some multi-agent systems, limited communication and other constraints mean that an individual agent can only compute locally and then exchange data with its neighboring agents. Unlike traditional parallel and distributed computing, this form of data exchange relies on no central node or shared memory; it takes place only between neighboring nodes. Achieving a network-wide goal through such local data exchange is called decentralized computation. For example, starting from an arbitrary collection of numbers, all agents converge to the average of these numbers by repeatedly computing local averages. Decentralized computation has the advantage of being less prone to communication and computation bottlenecks and better suits distributed nodes, so it has been embraced by a number of applications.
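To make the local-averaging example concrete, here is a minimal numerical sketch (an illustration, not code from the paper): n agents sit on a ring, each round every agent replaces its value with a Metropolis-weighted average of its own and its two neighbors' values, and all values converge to the global average. The ring topology, the Metropolis weights, and all variable names are assumptions made for this illustration.

```python
import numpy as np

# Decentralized averaging ("average consensus") on a ring of n agents.
# The Metropolis weights below form a symmetric, doubly stochastic
# mixing matrix W, so the iteration x <- W x preserves the average
# and drives every agent's value to it.

n = 8
rng = np.random.default_rng(0)
x = rng.normal(size=n)            # agent i holds the number x[i]
target = x.mean()                 # the value all agents should reach

W = np.zeros((n, n))
for i in range(n):
    for j in ((i - 1) % n, (i + 1) % n):   # the two ring neighbors
        W[i, j] = 1.0 / 3.0       # Metropolis weight: 1/(1 + max degree)
    W[i, i] = 1.0 - W[i].sum()    # remaining mass stays with the agent

for _ in range(200):
    x = W @ x                     # one round of neighbor-only exchange

print(np.allclose(x, target))     # True: consensus on the average
```

Each multiplication by W uses only neighbor-to-neighbor communication, which is exactly the local data exchange described above; no node ever sees the whole vector x.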
This paper presents several decentralized algorithms for solving the consensus optimization problem, whose goal is for the variables at all nodes of the network to converge to one common value that minimizes the sum of all objective functions. This goal can be reached by generalizing the decentralized averaging method, but the resulting algorithms converge more slowly than ordinary (centralized) optimization algorithms, with a gap in the order of convergence. In recent years, a number of new decentralized algorithms have closed this order gap. Using the unified framework of operator splitting, this paper presents these methods in a simpler form than their original papers.
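To illustrate the order gap and how it can be closed, the following companion sketch reuses the same illustrative assumptions (ring network, Metropolis weights) with quadratic local objectives f_i(x) = (x - b_i)^2/2, whose consensus minimizer is the mean of the b_i. Plain decentralized gradient descent with a fixed step size stalls at an O(alpha) distance from the minimizer, while an EXTRA-style corrected iteration, one of the newer algorithms referred to above, converges to it exactly; the step size and all names are choices made for the illustration.

```python
import numpy as np

# Consensus optimization: minimize sum_i f_i(x) over a common x,
# with f_i(x) = 0.5 * (x - b[i])**2, so the minimizer is b.mean().
# Ring network with Metropolis mixing matrix W, as in the previous sketch.

n, alpha = 8, 0.05
rng = np.random.default_rng(1)
b = rng.normal(size=n)             # local data held by the agents

W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1.0 / 3.0
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

grad = lambda x: x - b             # stacked local gradients

# Decentralized gradient descent (DGD): mix, then take a local step.
# With a fixed step size it only reaches an O(alpha) neighborhood of
# the minimizer -- the order gap described above.
x = np.zeros(n)
for _ in range(2000):
    x = W @ x - alpha * grad(x)

# EXTRA-style iteration: a second mixing matrix (I + W)/2 and a
# difference of consecutive gradients cancel the DGD bias, giving
# exact convergence with the same fixed step size.
I = np.eye(n)
Wt = (I + W) / 2
y_old = np.zeros(n)
y = W @ y_old - alpha * grad(y_old)
for _ in range(2000):
    y, y_old = (I + W) @ y - Wt @ y_old - alpha * (grad(y) - grad(y_old)), y

print(np.abs(x - b.mean()).max())  # visibly nonzero: DGD's O(alpha) bias
print(np.abs(y - b.mean()).max())  # ~0: the corrected method is exact
```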
MR(2010) Subject Classification: 68Q85, 90C25, 93A14


Full-text PDF download:

http://www.computmath.com/jssx/CN/article/downloadArticleFile.do?attachType=PDF&id=69