Research of Adaptive Bitrate Algorithm Based on Deep Reinforcement Learning

YI Ling; LI Ze-ping

doi:10.12263/DZXB.20210487

Research article | Updated: 2025-12-08

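- Title: Research of Adaptive Bitrate Algorithm Based on Deep Reinforcement Learning (original Chinese title: 基于深度强化学习的码率自适应算法研究)
- Journal: Acta Electronica Sinica (电子学报), 2022, Vol. 50, No. 5, pp. 1192-1200
- Author affiliation:
  
  College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
- Author biographies:
  
  YI Ling, male, born in 1994 in Tongren, Guizhou Province, holds a master's degree. His main research interests are adaptive streaming media and deep reinforcement learning. E-mail: 2821210016@qq.com
  
  LI Ze-ping, male, born in 1964 in Guiyang, Guizhou Province, received his Ph.D. from the University of Electronic Science and Technology of China. He is a professor at the College of Computer Science and Technology, Guizhou University. His main research interests are network distribution and video streaming. E-mail: zpli1@gzu.edu.cn
- Funding:
  
  National Natural Science Foundation of China (61462014)
- DOI: 10.12263/DZXB.20210487
  Chinese Library Classification (CLC) number: TP393
- Received: 2021-04-16
  
  Revised: 2022-01-27
  
  Published in print: 2022-05-25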
YI Ling, LI Ze-ping. Research of Adaptive Bitrate Algorithm Based on Deep Reinforcement Learning[J]. Acta Electronica Sinica, 2022, 50(05): 1192-1200. DOI: 10.12263/DZXB.20210487.

Abstract

Adaptive bitrate (ABR) algorithms are an effective way for video clients to improve user quality of experience (QoE). Existing ABR algorithms often suffer from frequent rebuffering, video freezes, low video quality, and inaccurate network throughput prediction. To address these problems, this paper proposes a deep reinforcement learning based ABR (DRLA) algorithm. DRLA trains a neural network on real network bandwidth data and, by collecting the client buffer occupancy and network throughput, requests video at the best bitrate from the video server. First, DRLA optimizes the loss function with a baseline function and, to encourage exploration and keep training from converging to a local optimum, adds an entropy regularization term to the update rule of the policy network. Second, DRLA uses a constraint to limit the divergence between the new and old policies at each update, which improves robustness. Finally, DRLA searches for the optimal policy by trust region optimization so as to maximize QoE. Compared with existing ABR algorithms, the experimental results show that DRLA reduces training time, is more robust, and further improves QoE; the effectiveness of the algorithm is also verified in a real environment.
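The abstract names three ingredients (a baseline, an entropy term, and a divergence constraint between the old and new policies) without giving formulas. The following is a minimal NumPy sketch of how such terms are commonly combined in trust-region style policy optimization. It is not the paper's implementation: every function name, the entropy weight `beta`, the KL bound `delta`, and the three-bitrate toy data are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Turn per-bitrate logits into action probabilities (row-wise)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(probs, eps=1e-12):
    """Shannon entropy of each policy distribution."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def kl_divergence(p_old, p_new, eps=1e-12):
    """KL(p_old || p_new) for each state."""
    return (p_old * (np.log(p_old + eps) - np.log(p_new + eps))).sum(axis=-1)

def surrogate_loss(p_old, p_new, actions, returns, baseline, beta=0.01):
    """Negative surrogate objective with baseline and entropy bonus.

    advantage = return - baseline reduces gradient variance;
    beta (assumed value) weights the entropy bonus that encourages exploration.
    """
    idx = np.arange(len(actions))
    advantage = returns - baseline
    ratio = p_new[idx, actions] / p_old[idx, actions]
    objective = (ratio * advantage).mean() + beta * entropy(p_new).mean()
    return -objective

def within_trust_region(p_old, p_new, delta=0.01):
    """Accept an update only if the mean KL stays below delta (assumed bound)."""
    return kl_divergence(p_old, p_new).mean() <= delta

# Toy data: two states, three candidate bitrates (all values hypothetical).
old_logits = np.array([[1.0, 0.5, 0.0], [0.2, 0.2, 0.2]])
new_logits = old_logits + np.array([[0.05, 0.0, -0.05], [0.0, 0.05, 0.0]])
p_old, p_new = softmax(old_logits), softmax(new_logits)
actions = np.array([0, 1])        # bitrate index chosen in each state
returns = np.array([2.0, 1.0])    # observed QoE-style returns
baseline = np.array([1.5, 1.2])   # baseline estimate per state

loss = surrogate_loss(p_old, p_new, actions, returns, baseline)
accepted = within_trust_region(p_old, p_new)
```

In this sketch a candidate policy is scored by `surrogate_loss` and kept only when `within_trust_region` holds; a full trust-region method would instead solve the constrained step directly, for example with a conjugate-gradient step and a line search.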
