Acta Electronica Sinica ›› 2022, Vol. 50 ›› Issue (9): 2049-2059. DOI: 10.12263/DZXB.20210589

• Research Papers •

The Optimal Convergence Rate of Adam-Type Algorithms for Non-Smooth Strongly Convex Cases

LONG Sheng1,3, TAO Wei2, ZHANG Ze-dong3, TAO Qing3

  1. Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China
    2. Center for Strategic Assessment and Consulting, Academy of Military Science, Beijing 100091, China
    3. Department of Information Engineering, Army Academy of Artillery and Air Defense, Hefei, Anhui 230031, China
  • Received: 2021-05-09  Revised: 2021-11-02  Online: 2022-09-25  Published: 2022-10-26
    • Corresponding author: TAO Wei
    • About the authors:
    • LONG Sheng (male) was born in Panzhou, Guizhou Province, in January 1998. He is a master's degree candidate. His research interests include machine learning and pattern recognition. E-mail: ls15186322349@163.com
      TAO Wei (male) was born in Hefei, Anhui Province, in 1991. He holds a Ph.D. and is an assistant researcher. His research interest is machine learning.
      ZHANG Ze-dong (male) was born in Linqing, Shandong Province, in 1994. He is a master's degree candidate. His research interests include machine learning and pattern recognition. E-mail: l632783823@qq.com
      TAO Qing (male) was born in Hefei, Anhui Province, in 1965. He holds a Ph.D. and is a professor, Ph.D. supervisor, and senior member of CCF. His research interests include machine learning, pattern recognition, and applied mathematics. E-mail: qing.tao@ia.ac.cn
    • Supported by: National Natural Science Foundation of China (62076252)

Abstract:

For non-smooth strongly convex problems, online gradient descent (OGD) with a suitably chosen step-size parameter attains a logarithmic regret bound. This alone, however, does not let first-order stochastic optimization algorithms achieve the optimal convergence rate. To close this gap, researchers usually adopt one of two schemes: improving the algorithm's iteration itself, or changing how the algorithm forms its output. SAdam (Strongly convex Adaptive moment estimation), a typical Adam-type algorithm, takes the first route, adding an adaptive step-size strategy and a momentum technique; although this yields a tighter data-dependent regret bound, it still fails to reach the optimal rate in the stochastic case. To solve this problem, this paper instead changes the output mode to a weighted average of the iterates, redesigns the step-size hyperparameters to be of the same order as those of earlier algorithms, and proposes an Adam-type algorithm named WSAdam (Weighted-average Strongly convex Adaptive moment estimation). WSAdam is proved to achieve the optimal convergence rate for non-smooth strongly convex problems. Its effectiveness is verified on the Reddi counterexample problem and in experiments on optimizing non-smooth strongly convex functions.

Key words: non-smooth, strongly convex optimization, adaptive step size, momentum methods, Adam-type algorithms, weighted average, convergence rate
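
For context on why changing the output mode matters, the following known result for the projected stochastic subgradient method (a standard bound from the literature, not the paper's WSAdam theorem) illustrates the gap. For a $\lambda$-strongly convex objective $F$ whose stochastic subgradients are bounded by $G$, running the method with step size $\alpha_t = 2/(\lambda(t+1))$ and returning the weighted average

$$\bar{x}_T = \sum_{t=1}^{T} \frac{2t}{T(T+1)}\, x_t$$

yields

$$\mathbb{E}\big[F(\bar{x}_T)\big] - F(x^*) \le \frac{2G^2}{\lambda (T+1)} = O(1/T),$$

whereas returning the last iterate or the uniform average of the iterates only guarantees $O(\log T / T)$. Since $\Omega(1/T)$ is the lower bound for non-smooth strongly convex problems, averaging with weights proportional to $t$ attains the optimal rate, which is the sense in which "optimal" is used above.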

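A minimal sketch of the kind of update the abstract describes is given below, combining an Adam-type adaptive step size and momentum with a weighted-average output. The function name, the step size $\alpha_t = 2/(\mu(t+1))$, the weights $w_t = t$, and the beta defaults are illustrative assumptions for this sketch; they are not the paper's exact WSAdam hyperparameters.

import numpy as np

def weighted_avg_adam_sketch(grad, x0, T, mu, beta1=0.9, beta2=0.999, eps=1e-8):
    # grad(x, t): stochastic subgradient oracle; mu: strong-convexity modulus.
    # Hypothetical hyperparameters, chosen to mirror the weighted-averaging
    # result quoted above.
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)    # first-moment estimate (momentum technique)
    v = np.zeros_like(x)    # second-moment estimate (adaptive step size)
    avg = np.zeros_like(x)  # running weighted average of the iterates
    wsum = 0.0
    for t in range(1, T + 1):
        g = grad(x, t)
        m = beta1 * m + (1 - beta1) * g        # momentum
        v = beta2 * v + (1 - beta2) * g * g    # adaptive per-coordinate scaling
        alpha = 2.0 / (mu * (t + 1))           # O(1/t) step size
        x = x - alpha * m / (np.sqrt(v) + eps) # Adam-type update
        wsum += t                              # weight w_t = t
        avg += (t / wsum) * (x - avg)          # incremental weighted mean
    return avg                                 # output the average, not x

Returning avg rather than the last iterate x is exactly the change of output mode that the abstract credits with closing the gap between the logarithmic-regret guarantee and the optimal O(1/T) stochastic convergence rate.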

CLC number: