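1. National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences, Beijing 100190
2. University of Chinese Academy of Sciences, Beijing 100049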
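WANG Yu-qing: male, born in 1987 in Nanyang, Henan. He graduated from the School of Computer Science, Beijing University of Posts and Telecommunications, in 2013 and then joined the Institute of Software, Chinese Academy of Sciences, where he is currently a Ph.D. candidate. His main research interests are computer architecture, operating systems, and cache systems. E-mail: yuqing@iscas.ac.cn
YANG Qiu-song: male, born in 1977 in Cangzhou, Hebei. He graduated from the Institute of Software, Chinese Academy of Sciences, in 2008 and has worked there since. He holds a Ph.D. and is a professor and doctoral supervisor. His main research interests are operating systems, software engineering, and system security.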
Received: 2021-02-23,
Revised: 2021-10-27,
Published in print: 2023-02-25
WANG Yu-qing, YANG Qiu-song, LI Ming-shu. A Cache Prefetching Mechanism Based on Hybrid Pattern Learning of Instruction Flow[J]. ACTA ELECTRONICA SINICA, 2023, 51(02): 342-354. DOI: 10.12263/DZXB.20210273.
Recent research on cache prefetching has focused on pattern-recognition-based prediction techniques, such as Lookahead, to compute the addresses of future memory requests. Such algorithms have two weaknesses: they struggle to learn the dependent cache misses that occur in memory access behavior, and they cannot precisely control when prefetch requests are issued and written back. To address these problems, this paper proposes IFBHP (Instruction Flow Based Hybrid Prediction), a cache prefetching algorithm based on branch prediction and hybrid pattern learning. IFBHP uses branch prediction to identify the memory access instructions in the program's future instruction flow, computes the address of each of those instructions in turn by learning several address correlation patterns, and writes the addresses into a memory access address queue. It uses a threshold to estimate when the future instruction flow will enter the processor's main pipeline, and in this way precisely controls the issue and write-back of the corresponding prefetch requests. Experiments show that, compared with STeMS (Spatio-Temporal Memory Streaming), ISB++ (Irregular Stream Buffer++), SANGAM, and IPCP (Instruction Pointer Classifier based spatial Prefetching), IFBHP reduces L1 data cache read misses by 31.58%, 28.85%, 17.85%, and 11.48% on average, respectively; the reported average reductions in L1 data cache write misses against the same four algorithms are likewise 31.58%, 28.85%, 17.85%, and 11.48%.
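The abstract gives no implementation detail, so the following is only a toy sketch of the three steps it names: learning several address patterns per memory instruction, walking the memory instructions on a predicted path to fill an address queue, and using a threshold to decide when queued prefetches are issued. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two patterns (`StridePattern`, `OffsetPattern`), the score-based selection in `HybridPredictor`, the use of "number of memory instructions ahead" as the distance measure, and all function names. The predicted path is simply passed in as a list of program counters; a real design would obtain it from the branch predictor.

```python
from collections import deque


class StridePattern:
    """Hypothetical pattern 1: a given PC advances by a constant stride."""

    def __init__(self):
        self.last = {}    # pc -> last address observed for this pc
        self.stride = {}  # pc -> most recent stride observed for this pc

    def predict(self, pc, prev_addr):
        if pc in self.stride:
            return self.last[pc] + self.stride[pc]
        return None

    def train(self, pc, addr, prev_addr):
        if pc in self.last:
            self.stride[pc] = addr - self.last[pc]
        self.last[pc] = addr


class OffsetPattern:
    """Hypothetical pattern 2: address = preceding access's address + fixed offset."""

    def __init__(self):
        self.offset = {}  # pc -> offset from the preceding memory access

    def predict(self, pc, prev_addr):
        if pc in self.offset and prev_addr is not None:
            return prev_addr + self.offset[pc]
        return None

    def train(self, pc, addr, prev_addr):
        if prev_addr is not None:
            self.offset[pc] = addr - prev_addr


class HybridPredictor:
    """Keeps every pattern trained and picks, per PC, the one that has been right most often."""

    def __init__(self):
        self.patterns = [StridePattern(), OffsetPattern()]
        self.score = {}  # (pc, pattern index) -> number of correct predictions

    def train(self, pc, addr, prev_addr):
        for i, pattern in enumerate(self.patterns):
            # Score the pattern on what it would have predicted, then update it.
            if pattern.predict(pc, prev_addr) == addr:
                self.score[(pc, i)] = self.score.get((pc, i), 0) + 1
            pattern.train(pc, addr, prev_addr)

    def predict(self, pc, prev_addr):
        best, best_score = None, -1
        for i, pattern in enumerate(self.patterns):
            guess = pattern.predict(pc, prev_addr)
            score = self.score.get((pc, i), 0)
            if guess is not None and score > best_score:
                best, best_score = guess, score
        return best


def fill_queue(predictor, predicted_pcs, prev_addr):
    """Compute one address per memory instruction on the predicted path, in order."""
    queue = deque()
    for distance, pc in enumerate(predicted_pcs, start=1):
        addr = predictor.predict(pc, prev_addr)
        if addr is None:
            break  # no pattern learned yet; stop rather than chain from a guess
        queue.append((distance, addr))
        prev_addr = addr  # later instructions may depend on this address
    return queue


def issue_ready(queue, threshold):
    """Pop and return the addresses whose distance is within the threshold."""
    issued = []
    while queue and queue[0][0] <= threshold:
        issued.append(queue.popleft()[1])
    return issued


# Train on a two-load loop: PC 0x10 strides by 64, PC 0x20 reads 8 bytes after it.
predictor = HybridPredictor()
prev = None
for pc, addr in [(0x10, 1000), (0x20, 1008), (0x10, 1064),
                 (0x20, 1072), (0x10, 1128), (0x20, 1136)]:
    predictor.train(pc, addr, prev)
    prev = addr

queue = fill_queue(predictor, [0x10, 0x20], prev)
issued = issue_ready(queue, threshold=1)
print(issued, list(queue))
```

In this sketch the second load's address is derived from the predicted address of the first, which is one simple way a chain of dependent accesses could be followed; entries beyond the threshold stay in the queue until the pipeline is assumed to have moved closer to them.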