1.国防科技大学信息系统工程重点实验室,湖南长沙 410073
2.军事科学院战略评估咨询中心,北京 100091
3.陆军炮兵防空兵学院信息工程系,安徽合肥 230031
[ "陇 盛 男,1998年1月出生于贵州省盘州市.硕士研究生.主要研究领域为机器学习,模式识别.E-mail: ls15186322349@163.com" ]
[ "陶 蔚 男,1991年出生于安徽省合肥市.博士,助理研究员,主要研究领域为机器学习." ]
[ "张泽东 男,1994年出生于山东省临清市.硕士研究生,主要研究领域为机器学习,模式识别.E-mail: l632783823@qq.com" ]
[ "陶 卿 男,1965年出生于安徽省合肥市.博士, 教授,博士生导师,CCF高级会员,主要研究领域为机器学习,模式识别, 应用数学.E-mail: qing.tao@ia.ac.cn" ]
收稿:2021-05-09,
修回:2021-11-02,
纸质出版:2022-09-25
移动端阅览
陇盛,陶蔚,张泽东等.非光滑强凸情形Adam型算法的最优收敛速率[J].电子学报,2022,50(09):2049-2059.
LONG Sheng,TAO Wei,ZHANG Ze-dong,et al.The Optimal Convergence Rate of Adam-Type Algorithms for Non-Smooth Strongly Convex Cases[J].ACTA ELECTRONICA SINICA,2022,50(09):2049-2059.
陇盛,陶蔚,张泽东等.非光滑强凸情形Adam型算法的最优收敛速率[J].电子学报,2022,50(09):2049-2059. DOI: 10.12263/DZXB.20210589.
LONG Sheng,TAO Wei,ZHANG Ze-dong,et al.The Optimal Convergence Rate of Adam-Type Algorithms for Non-Smooth Strongly Convex Cases[J].ACTA ELECTRONICA SINICA,2022,50(09):2049-2059. DOI: 10.12263/DZXB.20210589.
对于非光滑强凸问题,在线梯度下降(Online Gradient Decent,OGD)取适当步长参数可以得到对数阶后悔界.然而,这并不能使一阶随机优化算法达到最优收敛速率.为解决这一问题,研究者通常采取两种方案:其一是改进算法本身,另一种是修改算法输出方式.典型的Adam(Adaptive moment estimation)型算法SAdam(Strongly convex Adaptive moment estimation)采用了改进算法的方式,并添加了自适应步长策略和动量技巧,虽然得到更好的数据依赖的后悔界,但在随机情形仍然达不到最优.针对这个问题,本文改用加权平均的算法输出方式,并且重新设计与以往算法同阶的步长超参数,提出了一种名为WSAdam(Weighted average Strongly convex Adaptive moment estimation)的Adam型算法.证明了WSAdam达到了非光滑强凸问题的最优收敛速率.经过Reddi问题的测试和在非光滑强凸函数优化中的实验,验证了所提方法的有效性.
For non-smooth strongly convex problems
the logarithmic regret bound can be obtained by online gradient descent with the appropriate step size parameter. However
it cannot make the first order stochastic optimization algorithm achieve the optimal convergence rate. To solve this problem
researchers usually adopt two schemes: one is to modify the iterate of the algorithm
the other is to change the output mode of the algorithm. SAdam
a typical Adam-type algorithm
modify the algorithm by adding adaptive step size strategy and momentum technique. Although it obtains a tighter data dependent regret bound
it still cannot reach the optimal bound in the stochastic case. To solve this problem
this paper redesigns the step size scheme which is the same order as the previous algorithm
uses the weighted average algorithm output mode
and proposes an Adam-type algorithm named WSAdam. It is proved that WSAdam achieves the optimal convergence rate for non-smooth strongly convex problems. The validity of the proposed method is verified by the test of Reddi's problem and the experiment in the optimization of non-smooth strongly convex functions.
CUTKOSKY A . Parameter-free, dynamic, and strongly-adaptive online learning [C]// Proceedings of the 37th International Conference on Machine Learning . Virtual Event : ICML , 2020 : 2250 - 2259 .
ZINKEVICH M . Online convex programming and generalized infinitesimal gradient ascent [C]// Proceedings of the Twentieth International Conference on International Conference on Machine Learning . Washington DC : ICML , 2003 : 928 - 935 .
HAZAN E , KALAI A , KALE S , et al . Logarithmic regret algorithms for online convex optimization [C]// Proceedings of 19th Annual Conference on Learning Theory(COLT) . Berlin, Heidelberg : Springer , 2006 : 499 - 513 .
SHAMIR O , ZHANG T . Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes [C]// Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 . Atlanta : ICML , 2013 : I-71-I-79.
AGARWAL A , BARTLETT P L , RAVIKUMAR P , et al . Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization [J]. IEEE Transactions on Information Theory , 2012 , 58 ( 5 ): 3235 - 3249 .
陶卿 , 朱烨雷 , 罗强 , 等 . 一种基于Comid的非光滑损失随机坐标下降方法 [J]. 电子学报 , 2013 , 41 ( 4 ): 768 - 775 .
TAO Q , ZHU Y L , LUO Q , et al . A new comid-based stochastic coordinate descent method for non-smooth losses [J]. Acta Electronica Sinica , 2013 , 41 ( 4 ): 768 - 775 . (in Chinese)
HAZAN E , KALE S . Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization [J]. Journal of Machine Learning Research , 2011 : 421 - 436 .
RAKHLIN A , SHAMIR O , SRIDHARAN K . Making gradient descent optimal for strongly convex stochastic optimization [C]// Proceedings of the 29th International Conference on Machine Learning(ICML) . Edinburgh : ICML , 2012 : 1571 - 1578 .
LACOSTE-JULIEN S , SCHMIDT M , BATCH F . A Simpler Approach to Obtaining an convergence rate for projected stochastic subgradient descent [EB/OL].( 2012-12-10 ). http://arxiv.org/abs/1212.2002 http://arxiv.org/abs/1212.2002 .
邵言剑 , 陶卿 , 姜纪远 , 等 . 一种求解强凸优化问题的最优随机算法 [J]. 软件学报 , 2014 , 25 ( 9 ): 2160 - 2171 .
SHAO Y J , TAO Q , JIANG J Y , et al . Stochastic algorithm with optimal convergence rate for strongly convex optimization problems [J]. Journal of Software , 2014 , 25 ( 9 ): 2160 - 2171 . (in Chinese)
CUTKOSKY A . Anytime online-to-batch, optimism and acceleration [C]// Proceedings of the 36th International Conference on Machine Learning(ICML) . Long Beach : ICML , 2019 : 1446 - 1454 .
KINGMA D P , BA J . Adam: A method for stochastic optimization [C]// Proceedings of the 3th International Conference on Learning Representations(ICLR) . San Diego : ICLR , 2015 : 1 - 13 .
TIMOTHY D . Incorporating Nesterov momentum into Adam [EB/OL].( 2015-12-12 ). http://cs229.stanford.edu/proj2015/054_report.pdf http://cs229.stanford.edu/proj2015/054_report.pdf .
CHEN J H , ZHOU D R , TANG Y Q , et al . Closing the generalization gap of adaptive gradient methods in training deep neural networks [C]// Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence . California : International Joint Conferences on Artificial Intelligence Organization , 2020 : 3267 - 3275 .
TAO Wei , LONG Sheng , WU Gao-wei , et al . The role of momentum parameters in the optimal convergence of adaptive Polyak's heavy-ball methods [EB/OL].( 2021-02-15 ). http://arxiv.org/abs/2102.07314 http://arxiv.org/abs/2102.07314 .
REDDI S J , KALE S , KUMAR S . On the convergence of Adam and Beyond [EB/OL].( 2019-04-19 ). http://arxiv.org/abs/1904.09237 http://arxiv.org/abs/1904.09237 .
MUKKAMALA M C , HEIN M . Variants of RMSProp and Adagrad with logarithmic regret bounds [C]// Proceedings of the 34th International Conference on Machine Learning . Sydney : ICML , 2017 : 2545 - 2553 .
DUCHI J , HAZAN E , SINGER Y . Adaptive subgradient methods for online learning and stochastic optimization [J]. Journal of Machine Learning Research , 2011 , 12 ( 7 ): 257 - 269 .
CHEN Zai-yi , XU Yi , CHEN En-hong , et al . SADAGRAD: Strongly adaptive stochastic gradient methods [C]// Proceedings of the 35th International Conference on Machine Learning(ICML) . Stockholm : ICML , 2018 : 912 - 920 .
WANG Guang-hui , LU Shi-yin , et al . SAdam: A variant of Adam for strongly convex functions [EB/OL].( 2019-05-08 ). http://arxiv.org/abs/1905.02957 http://arxiv.org/abs/1905.02957 .
SHALEV-SHWARTZ S , SINGER Y , SREBRO N , et al . Pegasos: primal estimated sub-gradient solver for SVM [J]. Mathematical Programming , 2011 , 127 ( 1 ): 3 - 30 .
沈卉卉 , 李宏伟 . 基于动量方法的受限玻尔兹曼机的一种有效算法 [J]. 电子学报 , 2019 , 47 ( 1 ): 176 - 182 .
SHEN H H , LI H W . An effective algorithm of restricted boltzmann machine based on momentum method [J]. Acta Electronica Sinica , 2019 , 47 ( 1 ): 176 - 182 . (in Chinese)
姜纪远 , 陶卿 , 邵言剑 , 等 . 随机COMID的瞬时收敛速率分析 [J]. 电子学报 , 2015 , 43 ( 9 ): 1850 - 1858 .
JIANG J Y , TAO Q , SHAO Y J , et al . The analysis of convergence rate of individual COMID iterates [J]. Acta Electronica Sinica , 2015 , 43 ( 9 ): 1850 - 1858 . (in Chinese)
邹军华 , 段晔鑫 , 任传伦 , 等 . 基于噪声初始化、Adam-Nesterov方法和准双曲动量方法的对抗样本生成方法 [J]. 电子学报 , 2022 , 50 ( 1 ): 207 - 216 .
ZOU J H , DUAN Y X , REN C L , et al . Perturbation initialization, Adam-Nesterov and quasi-hyperbolic momentum for adversarial examples [J]. Acta Electronica Sinica , 2022 , 50 ( 1 ): 207 - 216 . (in Chinese)
罗会兰 , 袁璞 , 童康 . 基于深度学习的显著性目标检测方法综述 [J]. 电子学报 , 2021 , 49 ( 7 ): 1417 - 1427 .
LUO H L , YUAN P , TONG K . Review of the methods for salient object detection based on deep learning [J]. Acta Electronica Sinica , 2021 , 49 ( 7 ): 1417 - 1427 . (in Chinese)
0
浏览量
8
下载量
2
CSCD
关联资源
相关文章
相关作者
相关机构
京公网安备11010802024621