An Effective Algorithm of Restricted Boltzmann Machine Based on Momentum Method
SHEN Hui-hui1,2,3, LI Hong-wei1,3
1. School of Mathematics and Physics, China University of Geosciences, Wuhan, Hubei 430074, China;
2. School of Statistics & Information Management, Hubei University of Economics, Wuhan, Hubei 430205, China;
3. Hubei Subsurface Multi-scale Imaging Key Laboratory, China University of Geosciences, Wuhan, Hubei 430074, China
Abstract: Deep learning is transforming pattern recognition and machine learning, and has been successfully applied to language processing, image processing, signal processing, business and economics, and other fields. The restricted Boltzmann machine (RBM) is a powerful representation and generative model; however, a deep belief net (DBN), which consists of multiple stacked RBMs, takes longer to train. In this paper, an improved momentum method is applied to both the gradient ascent algorithm and the gradient descent algorithm, in order to raise classification accuracy and reduce training time. In keeping with the characteristics of gradient ascent, a rapidly ascending momentum method is used in the RBM pre-training phase, which greatly improves the learning speed. In keeping with the characteristics of gradient descent, an improved slowly descending momentum term is used in the fine-tuning stage to locate the optimum accurately. Recognition experiments on the MNIST dataset and the CMU-PIE face dataset show that the improved momentum algorithm effectively enhances image feature representation and improves both accuracy and computational efficiency.
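The abstract does not give the exact update rules, so the following is only a toy sketch of the general idea: classical (heavy-ball) momentum applied once to gradient ascent and once to gradient descent, each with its own momentum schedule. The function names, the linear schedules, the numeric constants, and the one-dimensional quadratic objectives are all illustrative assumptions; they stand in for RBM log-likelihood maximization and fine-tuning loss minimization and are not the paper's improved momentum terms.

```python
def momentum_step(param, velocity, grad, lr, mu, ascent):
    """One classical (heavy-ball) momentum update on a scalar parameter."""
    sign = 1.0 if ascent else -1.0  # ascent follows the gradient, descent opposes it
    velocity = mu * velocity + sign * lr * grad
    return param + velocity, velocity


def rising_momentum(epoch, start=0.5, end=0.9, ramp_epochs=5):
    """Hypothetical schedule: momentum grows linearly from start to end."""
    frac = min(epoch / ramp_epochs, 1.0)
    return start + (end - start) * frac


def falling_momentum(epoch, start=0.9, end=0.5, total_epochs=50):
    """Hypothetical schedule: momentum shrinks linearly from start to end."""
    frac = min(epoch / total_epochs, 1.0)
    return start + (end - start) * frac


def run(grad_fn, schedule, ascent, x0=0.0, lr=0.1, epochs=200):
    """Iterate momentum updates, reading the momentum value from the schedule."""
    x, v = x0, 0.0
    for epoch in range(epochs):
        x, v = momentum_step(x, v, grad_fn(x), lr, schedule(epoch), ascent)
    return x


# Toy stand-in for pre-training: maximize f(x) = -(x - 3)^2 by gradient ascent.
x_max = run(lambda x: -2.0 * (x - 3.0), rising_momentum, ascent=True)

# Toy stand-in for fine-tuning: minimize g(x) = (x + 1)^2 by gradient descent.
x_min = run(lambda x: 2.0 * (x + 1.0), falling_momentum, ascent=False)

print(x_max, x_min)
```

In this sketch the only difference between the two phases is the sign of the gradient term and the direction in which the momentum coefficient moves over the epochs; in an actual RBM the scalar parameter would be replaced by the weight and bias arrays, and the gradient by a contrastive-divergence estimate.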