Acta Electronica Sinica ›› 2022, Vol. 50 ›› Issue (4): 869-886. DOI: 10.12263/DZXB.20211209

Special topics: Cross-Disciplinary Integration and Innovation in Machine Learning; Long-Abstract Papers

A Survey on Deep Predictive Learning Based on Unlabeled Videos

PAN Min-ting1, WANG Yun-bo1, ZHU Xiang-ming1, GAO Si-yu1, LONG Ming-sheng2, YANG Xiao-kang1

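  1. AI Institute and MoE Key Laboratory of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai 201109, China
  2. School of Software, Tsinghua University, Beijing 100084, China
  • Received: 2021-09-01 Revised: 2022-02-17 Published: 2022-04-25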
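    • About the authors:
    • PAN Min-ting, female, was born in 1996 in Guigang, Guangxi. She is currently a Ph.D. candidate at Shanghai Jiao Tong University. Her main research interest is predictive models for video data. E-mail: panmt53@sjtu.edu.cn
      WANG Yun-bo (corresponding author), male, was born in 1989 in Changchun, Jilin. He is currently an assistant professor at the AI Institute, Shanghai Jiao Tong University. His main research interest is deep learning, especially predictive learning, spatiotemporal dynamical system modeling, and model-based reinforcement learning for decision-making. E-mail: yunbow@sjtu.edu.cn
      GAO Si-yu, female, was born in 1999 in Qingdao, Shandong. She is currently a master's student at Shanghai Jiao Tong University. Her main research interest is spatiotemporal sequence data prediction. E-mail: siyu.gao@sjtu.edu.cn
      LONG Ming-sheng, male, was born in 1985 in Hechi, Guangxi. He is currently an associate professor at Tsinghua University and a recipient of the National Science Fund for Excellent Young Scholars. His main research interests are the theory and algorithms of machine learning, especially transfer learning, deep learning, and machine learning methods for science. E-mail: mingsheng@tsinghua.edu.cn
      YANG Xiao-kang, male, was born in 1972 in Dongyang, Zhejiang. He is currently a professor at Shanghai Jiao Tong University, a Changjiang Scholar Distinguished Professor of the Ministry of Education, a recipient of the National Science Fund for Distinguished Young Scholars, and a Leading Talent in Innovation under the National Ten Thousand Talents Program. His main research interests are computer vision and machine learning. E-mail: xkyang@sjtu.edu.cn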
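    • Supported by:
    • National Natural Science Foundation of China (62106144); Science and Technology Major Project of Shanghai Municipality (2021SHZDZX0102); Shanghai Youth Science and Technology Talents Sailing Program (21Z510202133)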

A Survey on Deep Predictive Learning Based on Unlabeled Videos

PAN Min-ting1, WANG Yun-bo1, ZHU Xiang-ming1, GAO Si-yu1, LONG Ming-sheng2, YANG Xiao-kang1

  1. MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 201109, China
  2. School of Software, Tsinghua University, Beijing 100084, China
  • Received: 2021-09-01 Revised: 2022-02-17 Online: 2022-04-25 Published: 2022-04-25
    • Supported by:
    • National Natural Science Foundation of China (62106144); Science and Technology Major Project of Shanghai Municipality (2021SHZDZX0102); Shanghai Youth Science and Technology Talents Sailing Program (21Z510202133)

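Abstract:

Deep predictive learning based on video data (hereinafter "deep predictive learning") is a research direction at the intersection of deep learning, computer vision, and reinforcement learning. It is a key component of intelligent prediction and decision-making systems in scenarios such as weather forecasting, autonomous driving, and robotic visual control, and has become an active research area in machine learning in recent years. Deep predictive learning follows the self-supervised learning paradigm: it mines supervisory signals from the unlabeled video data itself and learns representations of the underlying spatiotemporal patterns. This paper presents a detailed survey of existing research on deep-learning-based video prediction. First, it outlines the research scope and cross-disciplinary application areas of deep predictive learning. Second, it summarizes the datasets and evaluation metrics commonly used in video prediction research. Third, it classifies and compares current mainstream deep predictive learning models from three perspectives: video prediction in the observation space, video prediction in the state space, and model-based visual decision-making. Finally, it analyzes the hot topics in deep predictive learning and discusses future research trends.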

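Extended Abstract
Deep predictive learning based on video data (hereinafter "deep predictive learning") is a research direction at the intersection of deep learning, computer vision, and reinforcement learning. It is a key component of intelligent prediction and decision-making systems in scenarios such as weather forecasting, autonomous driving, and robotic visual control, and has become an active research area in machine learning in recent years. Deep predictive learning follows the self-supervised learning paradigm: it mines supervisory signals from the unlabeled video data itself and learns representations of the underlying spatiotemporal patterns. Deep predictive learning is also closely related to reinforcement learning and visual decision-making algorithms. When an agent learns by interacting with the world, it must learn the motion of physical objects without annotated environment states. Predictive learning lets it estimate the feedback of the environment in the observation space conditioned on a given action sequence and model the moving objects in the scene that correspond to that action sequence, which leads to better visual control and decision-making. This paper presents a detailed survey of existing research on deep-learning-based video prediction. First, it outlines the research scope and cross-disciplinary application areas of deep predictive learning. Second, it classifies and compares current mainstream deep predictive learning models from three perspectives: video prediction in the observation space, video prediction in the state space, and model-based visual decision-making. Third, it summarizes the datasets and evaluation metrics commonly used in video prediction research. Finally, it analyzes the hot topics in deep predictive learning and discusses future research trends.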

Key words: deep learning, self-supervised learning, computer vision, video prediction, model-based visual decision-making

Abstract:

Deep predictive learning based on video data (hereinafter referred to as "deep predictive learning") is a research direction at the intersection of deep learning, computer vision, and reinforcement learning. It is a key part of intelligent prediction and decision-making systems in weather forecasting, autonomous driving, robotics, and other scenarios, and has become an active research area in machine learning in recent years. Deep predictive learning follows the self-supervised learning paradigm, using internal constraints from unlabeled video data to learn the underlying spatiotemporal patterns. In this paper, we review existing deep learning techniques for predictive learning in detail. First, we summarize the research scope and application fields of deep predictive learning. Second, we present the datasets and evaluation metrics commonly used in this research field. Third, we compare current mainstream deep predictive learning models from three perspectives: predictive models based on the observation space, predictive models based on the state space, and visual planning methods built on predictive models. Finally, we discuss the hot issues and future research directions in the field of deep predictive learning.

Extended Abstract
Deep predictive learning based on video data (hereinafter referred to as “deep predictive learning”) is a research direction at the intersection of deep learning, computer vision, and reinforcement learning. It is a key part of intelligent prediction and decision-making systems in weather forecasting, autonomous driving, robotics, and other scenarios, and has become an active research area in machine learning in recent years. Deep predictive learning follows the self-supervised learning paradigm, using internal constraints from unlabeled video data to learn the underlying spatiotemporal patterns. Moreover, deep predictive learning is closely related to reinforcement learning and visual decision-making algorithms. When an agent interacts with the world, it has to learn the motion of physical objects without labels for the environment state. With predictive learning, it can predict the feedback of the environment in the observation space conditioned on a given action sequence. Modeling the moving objects that correspond to the action sequence then yields better visual control and decision-making. In this paper, we review existing deep learning techniques for predictive learning in detail. First, we summarize the research scope and application fields of deep predictive learning. Second, we compare current mainstream deep predictive learning models from three perspectives: predictive models based on the observation space, predictive models based on the state space, and visual planning methods built on predictive models. Third, we present the datasets and evaluation metrics commonly used in this research field. Finally, we discuss the hot issues and future research directions in the field of deep predictive learning.

Key words: deep learning, self-supervised learning, computer vision, video prediction, model-based visual planning

CLC number: