Acta Electronica Sinica ›› 2012, Vol. 40 ›› Issue (11): 2145-2151. DOI: 10.3969/j.issn.0372-2112.2012.11.001

• Research Paper •

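Pre-Execution Directed Prefetching for In-Order Processors

DANG Xiang-lei, WANG Xiao-yin, TONG Dong, LU Jun-lin, CHENG Xu, WANG Ke-yi

  1. Microprocessor Research & Development Center, Peking University, Beijing 100871, China; Engineering Research Center of Microprocessor & System, Ministry of Education, Peking University, Beijing 100871, China
  • Received: 2011-12-14 Revised: 2012-03-26 Online: 2012-11-25 Published: 2012-11-25
  • Corresponding author: WANG Xiao-yin, female, born in 1983 in Baoding, Hebei. Postdoctoral researcher at the School of Electronics Engineering and Computer Science, Peking University. Her research interests include microprocessor architecture design, low-power design, and memory system performance optimization. E-mail: wangxiaoyin@mprc.pku.edu.cn
  • About the author: DANG Xiang-lei, male, born in 1988 in Tengzhou, Shandong. Ph.D. candidate at the School of Electronics Engineering and Computer Science, Peking University. His research interests include microprocessor architecture design, memory access performance optimization, and system-on-chip design. E-mail: dangxianglei@mprc.pku.edu.cn
  • Supported by:
    National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software" (No. 2009ZX01029-001-002); China Postdoctoral Science Foundation (No. 20110490208)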

Abstract: To improve the memory access performance of in-order processors, this paper proposes a pre-execution directed prefetching (PEDP) method. PEDP uses a stride prefetcher to cover regular access patterns and, when an L2 cache miss occurs, pre-executes the subsequent instructions to generate accurate prefetches for irregular access patterns as well. Combining the strengths of the two techniques improves prefetch coverage. PEDP also uses the actual memory access information captured ahead of time during pre-execution to guide the stride prefetcher's update process. Under this guidance, the stride prefetcher can issue prefetches earlier than pre-execution for stride-pattern addresses that both techniques can generate, which improves prefetch timeliness. In addition, PEDP uses an update filter to remove harmful updates to the stride prefetcher during the guidance process, which improves prefetch accuracy. Experimental results show that, on average, PEDP improves performance by 33.0% over the baseline processor. Compared with stride prefetching alone and pre-execution alone, PEDP improves performance by 16.2% and 7.3%, respectively.

Key words: data prefetching, pre-execution, memory latency tolerance, in-order processors
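
For readers unfamiliar with the stride prefetching component named in the abstract, the following is a minimal sketch of a generic, textbook-style stride prefetcher indexed by load PC. It is not the PEDP design: the class and method names (`StridePrefetcher`, `observe`), the confidence threshold, and the prefetch degree are all assumptions made for illustration, and the pre-execution engine and update filter are not modeled.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_addr: int       # most recent address seen for this load PC
    stride: int = 0      # last observed address delta
    confidence: int = 0  # consecutive times the same stride repeated


class StridePrefetcher:
    """Generic PC-indexed stride prefetcher (illustrative, hypothetical API)."""

    def __init__(self, threshold=2, degree=1):
        self.table = {}             # maps load PC -> StrideEntry
        self.threshold = threshold  # repeats required before prefetching
        self.degree = degree        # number of strides ahead to prefetch

    def observe(self, pc, addr):
        """Train on one access and return the addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            # First access from this PC: allocate an entry, nothing to predict.
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence += 1
        else:
            # Pattern changed: remember the new stride and restart confidence.
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence >= self.threshold:
            return [addr + entry.stride * k for k in range(1, self.degree + 1)]
        return []


if __name__ == "__main__":
    prefetcher = StridePrefetcher()
    for address in (1000, 1064, 1128, 1192):
        print(address, prefetcher.observe(0x400, address))
```

In this sketch the table is trained only by the calls to `observe`. The guidance described in the abstract would correspond, roughly, to also feeding it addresses produced by pre-executed loads, so that a stable stride is detected before the normal instruction stream reaches those loads. How PEDP selects or filters such updates is not specified in the abstract and is left out here.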

CLC Number: