Abstract:
Objective Traditional optimization algorithms suffer from low training efficiency when training deep learning models, owing to the ever-growing number of model parameters and the increasing depth of networks. To address this problem, a Nadabelief optimization algorithm based on Nesterov acceleration is proposed to improve the efficiency of model training. Methods Firstly, the Adabelief algorithm is adopted in place of the Adam algorithm to alleviate the generalization problem; then, starting from the classical momentum term of the first-order moment, a Nesterov momentum acceleration mechanism is introduced on the basis of the Adabelief algorithm, so that the gradient update considers not only the gradient at the current moment but also draws on the historical accumulated gradients to correct the update magnitude, further improving the efficiency of the algorithm; finally, the regret bound of the algorithm is derived through theoretical analysis, which guarantees its convergence. Results To verify the performance of the algorithm, logistic regression experiments were conducted in the convex case, and image classification and language modeling experiments in the non-convex case; comparisons with Adam, Adabelief and other algorithms verify the superiority of the Nadabelief algorithm. Testing the algorithm under different initial learning rates further verifies its good robustness. Conclusion The experiments show that the proposed algorithm retains the generalization ability of the original Adabelief algorithm while achieving better convergence accuracy, and further improves efficiency when training deep learning models.
Keywords: adaptive algorithm; Nesterov momentum acceleration; deep learning; image recognition; language modeling
|
An Improved Adaptive Optimization Algorithm Based on Nesterov Acceleration
QIAN Zhen¹, LI Dequan²
|
1. School of Mathematics and Big Data, Anhui University of Science and Technology, Huainan 232001, Anhui, China
2. School of Artificial Intelligence, Anhui University of Science and Technology, Huainan 232001, Anhui, China
|
Abstract:
Objective Traditional optimization algorithms exhibit low training efficiency when training deep learning models due to the increasing number of model parameters and deeper network layers. To address this issue, a Nadabelief optimization algorithm based on Nesterov acceleration was proposed to improve the efficiency of model training. Methods Firstly, the Adabelief algorithm was employed in place of the Adam algorithm to mitigate the generalization problem. Subsequently, from the perspective of the classical momentum term of the first-order moment, the Nesterov momentum acceleration mechanism was incorporated into the Adabelief algorithm: during gradient updates, not only the gradient at the current moment was considered, but the historical cumulative gradients were also utilized to adjust the magnitude of the update, so as to further improve the efficiency of the algorithm. Finally, the regret bound of the algorithm was derived through theoretical analysis to ensure its convergence. Results To verify the performance of the algorithm, logistic regression experiments were conducted in the convex setting, while image classification and language modeling experiments were carried out in the non-convex setting. Comparisons with algorithms such as Adam and Adabelief demonstrated the superiority of the Nadabelief algorithm. Additionally, the algorithm's robustness was confirmed by testing it under various initial learning rates. Conclusion The experiments demonstrate that the proposed algorithm not only maintains the generalization capability of the original Adabelief algorithm but also achieves better convergence accuracy, further improving efficiency when training deep learning models.
Key words: adaptive algorithms; Nesterov momentum acceleration; deep learning; image recognition; language modeling
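To make the update rule described in the abstract concrete, the following is a minimal NumPy sketch of what a single Nadabelief step could look like, assuming it combines Adabelief's belief-based second moment with a NAdam-style Nesterov look-ahead applied to the first moment. The function name nadabelief_step, the exact form of the Nesterov correction, and all hyperparameter defaults are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nadabelief_step(theta, grad, m, s, t,
                    lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One hypothetical Nadabelief update for a parameter array.

    Sketch only: Adabelief-style second moment (variance of the gradient
    around its moving average) plus a NAdam-style Nesterov look-ahead on
    the first moment. The paper's exact update may differ.
    """
    # First moment: exponential moving average of gradients (classical momentum term).
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: EMA of the squared deviation of the gradient from its EMA
    # (Adabelief's "belief" in the current gradient direction).
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps
    # Bias corrections, as in Adam/Adabelief.
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    # Nesterov look-ahead: mix the momentum estimate with the current gradient,
    # so the step uses the historical accumulated gradients plus the fresh one.
    m_bar = beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t)
    # Parameter update scaled by the belief-based adaptive denominator.
    theta = theta - lr * m_bar / (np.sqrt(s_hat) + eps)
    return theta, m, s

# Toy usage: minimize f(x) = (x - 3)^2 with the sketched update rule.
theta = np.array([0.0])
m = np.zeros_like(theta)
s = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * (theta - 3.0)
    theta, m, s = nadabelief_step(theta, grad, m, s, t, lr=1e-2)
print(theta)  # approaches 3.0
```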