[1] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.

[2] Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354-359.

[3] Zhou P, Chen H J, Yu Z K, et al. A review of cross-modal medical image prediction[J]. Acta Electronica Sinica, 2019, 47(1): 220-226. (in Chinese)

[4] Lowe R, Foerster J, Boureau Y L, et al. On the pitfalls of measuring emergent communication[A]. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems[C]. Montreal: AAMAS, 2019. 693-701.

[5] Wang X, Chen W, Wu J, et al. Video captioning via hierarchical reinforcement learning[A]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition[C]. Salt Lake City: CVPR, 2018. 4213-4222.

[6] Zheng X H, Sun X Q, Lü J X, et al. Behavior recognition based on deep learning and intelligent planning[J]. Acta Electronica Sinica, 2019, 47(8): 1661-1668. (in Chinese)

[7] Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization[A]. International Conference on Machine Learning[C]. Lille: ICML, 2015. 1889-1897.

[8] Wen J, Wang H J, Deng J, et al. Abnormal event detection based on deep learning[J]. Acta Electronica Sinica, 2020, 48(2): 308-313. (in Chinese)

[9] Abdallah S, Kaisers M. Addressing environment non-stationarity by repeating Q-learning updates[J]. Journal of Machine Learning Research, 2016, 17(1): 1582-1612.

[10] Foerster J N, Farquhar G, Afouras T, et al. Counterfactual multi-agent policy gradients[A]. Thirty-Second AAAI Conference on Artificial Intelligence[C]. New Orleans: AAAI, 2018. 2974-2982.

[11] Lowe R, Wu Y, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[A]. Advances in Neural Information Processing Systems[C]. Long Beach: NIPS, 2017. 6379-6390.

[12] Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[A]. International Conference on Machine Learning[C]. Stockholm: ICML, 2018. 1856-1865.

[13] Haarnoja T, Tang H, Abbeel P, et al. Reinforcement learning with deep energy-based policies[A]. Proceedings of the 34th International Conference on Machine Learning[C]. Sydney: ICML, 2017. 1352-1361.

[14] Das A, Kottur S, Moura J M F, et al. Learning cooperative visual dialog agents with deep reinforcement learning[A]. Proceedings of the IEEE International Conference on Computer Vision[C]. Venice: ICCV, 2017. 2951-2960.

[15] Cao Y, Tang T, Xu T H, Mu J C. Application of formal methods in train operation control systems[J]. Journal of Traffic and Transportation Engineering, 2010, 10(1): 112-126. (in Chinese)

[16] Wu S Q, Huang Z H, Cao Y. Tram right-of-way configuration and signaling system selection[J]. China Railway, 2014, (8): 97-99. (in Chinese)