College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
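YI Ling: male, born in 1994 in Tongren, Guizhou Province; M.S. His research interests include adaptive streaming media and deep reinforcement learning. E-mail: 2821210016@qq.com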
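LI Ze-ping: male, born in 1964 in Guiyang, Guizhou Province; Ph.D., graduated from the University of Electronic Science and Technology of China. He is currently a professor at the College of Computer Science and Technology, Guizhou University. His research interests include network distribution and video streaming. E-mail: zpli1@gzu.edu.cn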
Received: 2021-04-16
Revised: 2022-01-27
Published in print: 2022-05-25
YI Ling,LI Ze-ping.Research of Adaptive Bitrate Algorithm Based on Deep Reinforcement Learning[J].ACTA ELECTRONICA SINICA,2022,50(05):1192-1200. DOI: 10.12263/DZXB.20210487.
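Adaptive bitrate (ABR) algorithms are an effective way for video clients to improve users' quality of experience (QoE). To address the frequent rebuffering, video stalls, low picture quality, and inaccurate network throughput prediction of existing ABR algorithms, this paper proposes a deep reinforcement learning based ABR (DRLA) algorithm. DRLA trains a neural network on real network bandwidth data and, using the client's buffer occupancy and the measured network throughput, requests video at the optimal bitrate from the video server. First, DRLA optimizes the loss function $L$ with a baseline-function method and uses entropy-based random exploration to keep the loss function from converging to a local optimum. Second, it constrains the divergence between the new and old policies to limit the size of each update, which improves the algorithm's robustness. Finally, it finds the optimal policy through trust region optimization so that QoE is maximized. Experimental comparisons with existing ABR algorithms show that DRLA reduces training time and further improves both robustness and user QoE, and its effectiveness was verified in a real environment.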
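The abstract combines a baseline with an entropy term in the policy loss. The sketch below shows one common way these fit together for a discrete choice among bitrates: a single-step actor loss in the style of advantage actor-critic (A2C/A3C). It is a minimal illustration under assumptions, not the paper's loss. The function names, the entropy weight `beta = 0.01`, and the three-bitrate example are hypothetical. In a real system the logits would come from the policy network and the baseline from a critic network.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over the bitrate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H(pi) = -sum p log p, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_loss(
    logits: Sequence[float],
    action: int,
    ret: float,
    baseline: float,
    beta: float = 0.01,  # hypothetical entropy weight, not taken from the paper
) -> float:
    """One-step policy-gradient loss with a baseline and an entropy bonus.

    advantage = ret - baseline, where `ret` is the observed discounted return
    and `baseline` is a value estimate V(s). Subtracting the baseline lowers
    the variance of the gradient estimate without biasing it. The entropy
    term rewards spreading probability across bitrates, which keeps the
    policy exploring instead of collapsing early onto one bitrate.
    """
    probs = softmax(logits)
    advantage = ret - baseline
    # Minimizing this value ascends log pi(a|s) * A + beta * H(pi(.|s));
    # the advantage is treated as a constant (no gradient flows through it).
    return -(math.log(probs[action]) * advantage) - beta * entropy(probs)
```

For three equally likely bitrates, choosing bitrate 1 with return 2.0 and baseline 1.5 gives an advantage of 0.5 and a loss of 0.49·ln 3 (about 0.538): the log-probability term contributes 0.5·ln 3 and the entropy bonus subtracts 0.01·ln 3.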
Modern video players employ adaptive bitrate (ABR) algorithms to improve user quality of experience (QoE). Existing ABR algorithms, however, often suffer from frequent rebuffering, video freezes, low video quality, or inaccurate network throughput prediction. In this paper, we propose a deep reinforcement learning based ABR algorithm (DRLA). DRLA trains its neural network on real network bandwidth data and, using the client's buffer occupancy and measured network throughput, requests video at the best bitrate from the video server. DRLA optimizes the loss function with a baseline-function method and, to encourage exploration, adds an entropy regularization term to the update rule of the policy network. It then constrains the divergence between the new and old policies, and finally uses trust region optimization to find the policy that maximizes QoE. Compared with existing ABR algorithms on QoE metrics, DRLA reduces training time, is more robust, and further improves QoE, and the experimental results verify the effectiveness of the algorithm.
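The divergence constraint and the trust-region step can be illustrated together. The sketch below is a minimal, TRPO-style backtracking line search under stated assumptions: the policy is a categorical distribution over bitrates, the proposed step direction is taken as given (TRPO proper obtains it by conjugate gradient on the Fisher information matrix, which is omitted here), and the names `trust_region_step`, the bound `max_kl = 0.01`, and the toy advantages are hypothetical rather than the paper's settings. A candidate update is accepted only if it improves the surrogate objective while keeping KL(π_old ‖ π_new) within the bound.

```python
import math
from typing import Callable, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: logits -> bitrate probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """KL(pi_old || pi_new) for two categorical distributions."""
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


def trust_region_step(
    theta: list[float],
    step: list[float],
    policy: Callable[[list[float]], list[float]],
    surrogate: Callable[[list[float]], float],
    max_kl: float = 0.01,  # hypothetical trust-region radius
    backtrack: float = 0.5,
    max_tries: int = 10,
) -> list[float]:
    """Backtracking line search in the style of TRPO.

    Starting from the full proposed `step`, shrink it geometrically until the
    candidate parameters both improve the surrogate objective and keep
    KL(pi_old || pi_new) <= max_kl. If no candidate qualifies, keep `theta`.
    """
    p_old = policy(theta)
    base = surrogate(theta)
    scale = 1.0
    for _ in range(max_tries):
        candidate = [t + scale * s for t, s in zip(theta, step)]
        within_region = kl_divergence(p_old, policy(candidate)) <= max_kl
        if within_region and surrogate(candidate) > base:
            return candidate
        scale *= backtrack
    return theta


if __name__ == "__main__":
    # Toy setting (hypothetical): the parameters are the logits over three
    # bitrates, and each bitrate has a fixed advantage.
    advantages = [-1.0, 0.0, 1.0]

    def surrogate(theta: list[float]) -> float:
        # With the expectation computed exactly, TRPO's importance-weighted
        # surrogate E_old[(pi_new / pi_old) * A] reduces to this sum.
        return sum(p * a for p, a in zip(softmax(theta), advantages))

    new_theta = trust_region_step([0.0, 0.0, 0.0], [0.0, 0.0, 4.0], softmax, surrogate)
    print(new_theta)  # → [0.0, 0.0, 0.25]
```

In the toy run the full step (raising the third logit by 4) moves the policy too far, so the search halves it four times; the first candidate inside the bound, logits [0, 0, 0.25], has a KL divergence of about 0.007 and is returned. Bounding the KL divergence, rather than the raw parameter change, is what limits how much the action distribution itself can shift in one update.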