WO2022247070A1 - A high-performance-oriented intelligent cache replacement strategy adapted to prefetching - Google Patents
A high-performance-oriented intelligent cache replacement strategy adapted to prefetching
- Publication number
- WO2022247070A1, PCT/CN2021/119290
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cache
- demand
- isvm
- predictor
- prefetch
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
- G06F12/0646—Configuration or reconfiguration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
- G06F12/0871—Allocation or management of cache space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the invention belongs to the field of computer cache architecture, and in particular relates to a high-performance-oriented intelligent cache replacement strategy adapted to prefetching.
- the performance of computer memory improves far more slowly than processor performance, forming a "memory wall" that hinders processor performance gains and makes the memory system one of the performance bottlenecks of the entire computer system.
- the last-level cache (LLC) alleviates the huge difference in latency and bandwidth between the CPU and DRAM, and improving the processor's memory subsystem is the key to alleviating the "memory wall" problem.
- One approach relies on well-designed cache replacement strategies to efficiently manage the on-chip last-level cache; these strategies dynamically adjust cache insertion priorities according to data reusability and importance, thereby reducing the interference of inserted cache lines with the LLC.
- Another mainstream approach to alleviating the "memory wall" problem is to use a hardware prefetcher, which prefetches data into the cache hierarchy before it is actually referenced. Although prefetching can hide memory latency and significantly improve performance, incorrect prefetches cause cache pollution and can severely degrade processor performance.
- a learning-based cache replacement strategy learns data reusability from past cache behavior to predict the insertion priority of future cache lines. For example, if a load instruction introduced cache lines that produced cache hits in the past, it is very likely that the same load instruction will introduce cache lines that also produce cache hits in the future.
- the cache replacement strategy approximates the optimal replacement decision by predicting the re-reference interval (Re-Reference Prediction Value, RRPV) of each cache line.
- the reuse interval indicates the relative importance of a cache line: a line with a small reuse interval is about to be reused, so it is inserted into the cache with high priority to guarantee that it remains resident, while a line with a large reuse interval is inserted with low priority to ensure that it is evicted as soon as possible.
- in learning-based cache replacement strategies, it is common to predict the reuse interval of a cache line based on the program counter (PC) of the memory instruction that caused the cache access; if most cache accesses from the same PC have similar reuse behavior, the PC can accurately predict the reuse interval of the cache line.
- SHiP proposes a PC-based reuse prediction algorithm to predict cache reuse behavior and use this prediction to guide cache insertion locations.
- Hawkeye reconstructs the Belady-MIN algorithm from past cache accesses, trains a PC-based predictor that learns from the MIN algorithm's decisions on past memory accesses, and then Hawkeye makes replacement decisions based on what the predictor has learned.
- Zhan et al. modeled cache replacement as a sequence labeling problem and used a Long Short-Term Memory (LSTM) network to train an offline predictor; because it takes a long history of past load instructions as input, prediction accuracy was improved.
- An online cache replacement strategy, Glider, was further proposed: features that compactly represent the program's long-term load instruction history are designed in hardware and fed to an online ISVM, and an ISVM table in hardware tracks the ISVM weights for each PC.
- ISVM-based online predictors provide accuracy and performance superior to the predictors used in state-of-the-art cache replacement strategies. However, the above studies did not consider the presence of prefetchers.
- when prefetching is present, prediction accuracy decreases because prefetch and demand requests are not distinguished.
- the cache pollution caused by prefetching also interferes with the management of cache space and reduces memory subsystem performance. From a cache management perspective, prefetch requests have different properties than demand requests; in general, a cache line inserted into the LLC by a demand request is more important to program performance than one inserted by a prefetch request.
- the present invention proposes an intelligent cache replacement strategy adapted to prefetching, which predicts the reuse of loaded cache lines at the granularity of request type. Taking as input the PC address of the currently accessed load instruction and the PC addresses of past load instructions in the memory access history, it designs different ISVM predictors for prefetch and demand requests, improving the accuracy of cache line reuse prediction in the presence of prefetching and better combining the performance gains of hardware prefetching and cache replacement.
- the present invention proposes an intelligent cache replacement strategy adapted to prefetching, which performs reuse prediction at the granularity of request type (demand or prefetch). First, some cache sets in the last-level cache are selected as the sampling set.
- the input data of the demand predictor includes the PC addresses of load instructions that generate demand accesses in the sampling set, together with the past PC addresses stored in the PCHR; the input data of the prefetch predictor includes the PC addresses of load instructions that trigger prefetch accesses in the sampling set, together with the past PC addresses stored in the PCHR. Second, a component DMINgen is added, which reconstructs the Demand-MIN algorithm in hardware to provide labels for the predictor's training data; labels are divided into positive and negative labels.
- the positive label indicates that the currently accessed cache line is cache-friendly and can be inserted into the cache.
- the negative label indicates that the currently accessed cache line is not cache-friendly and cannot be inserted into the cache.
- the ISVM-based prefetch predictor and the ISVM-based demand predictor are trained respectively according to the memory access behavior of the sample set.
- the training methods are the same, specifically: after the predictor reads the input data, it looks up the weights corresponding to the current input PC and the PCHR contents in the predictor's ISVM tables. If the label of the input data is a positive label, the weights are incremented by 1; otherwise, the weights are decremented by 1.
- the prediction process of the two predictors is the same, specifically: the demand predictor or the prefetch predictor is selected according to the type of the access request, and each ISVM table of the predictor consists of 16 weights.
- each ISVM table corresponds to a PC.
- the weights in the ISVM table are obtained by training; first, a 4-bit hash is created for each PC in the PCHR, which is used to find the weights corresponding to the current contents of the PCHR, and the weights are looked up in the corresponding ISVM table; then these weights are summed: if the sum is greater than or equal to the threshold, the currently loaded cache line is predicted to be cache-friendly and is inserted with high priority; if the sum is less than 0, the line is predicted not to meet the cache requirements and is inserted with low priority; in all other cases, the line is predicted to be cache-friendly with low confidence and is inserted with medium priority.
- the intelligent cache replacement strategy adapted to prefetching improves the accuracy of cache line reuse prediction when there is a prefetcher, avoids cache pollution caused by prefetching, and retains more useful data in the cache, thereby improving the performance of the memory subsystem.
- a high-performance intelligent cache replacement strategy adapted to prefetching distinguishes between prefetch and demand requests, using an ISVM-based prefetch predictor to predict the re-reference interval of cache lines loaded by prefetch accesses and an ISVM-based demand predictor to predict the re-reference interval of cache lines loaded by demand accesses, and performs cache replacement according to the prediction results.
- Each predictor corresponds to a set of ISVM tables.
- an ISVM table A corresponds to one PC address B.
- ISVM table A consists of PC address B and 16 ISVM weights, where the 16 weights correspond to the 16 PC addresses other than B that have appeared most frequently in the PCHR; the initial value of each weight is set to 0.
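The per-PC table structure described above can be sketched in software as follows. This is an illustrative model only, not the patented hardware: the class and variable names (`ISVMTable`, `Predictor`, `table_for`) are invented for this sketch, and dictionaries stand in for hardware tables.

```python
# Illustrative sketch of the ISVM table organization: one table per
# load-instruction PC, each holding 16 integer weights (initialized to 0),
# one weight per 4-bit hash bucket of the PCs that appear in the PCHR.

class ISVMTable:
    """ISVM table for one PC address: 16 integer weights, initial value 0."""
    def __init__(self, pc):
        self.pc = pc
        self.weights = [0] * 16  # one weight per 4-bit hash value

class Predictor:
    """A set of ISVM tables, one per PC. The demand predictor and the
    prefetch predictor are two separate instances of this structure."""
    def __init__(self):
        self.tables = {}  # pc -> ISVMTable

    def table_for(self, pc):
        # Allocate the table for this PC on first use.
        return self.tables.setdefault(pc, ISVMTable(pc))

demand_predictor = Predictor()
prefetch_predictor = Predictor()

t = demand_predictor.table_for(0x400A10)
assert len(t.weights) == 16 and all(w == 0 for w in t.weights)
```

The key point the sketch captures is that demand and prefetch requests never share weights: each request type updates and consults its own set of tables.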
- the training and prediction process of the predictor consists of the following steps:
- Step 1: select some cache sets in the last-level cache as the sampling set; the input data of the demand predictor includes the PC address of the load instruction that generates the demand access, and the past PC addresses stored in the PCHR;
- the input data of the prefetch predictor includes the PC address of the load instruction that triggers the prefetch access, and the past PC address stored in the PCHR;
- Step 2: add the component DMINgen, which reconstructs the Demand-MIN algorithm in hardware to provide training labels for the predictor's input data.
- the labels are divided into positive example labels and negative example labels.
- the positive example label indicates that the currently accessed cache line is cache-friendly and can be inserted into the cache.
- the negative instance label indicates that the currently accessed cache line is not cache-friendly and cannot be inserted into the cache.
- the specific generation method is as follows:
- for usage intervals that end with a prefetch access P (i.e., D-P and P-P), DMINgen determines that the currently accessed cache line will not generate a demand hit.
- in this case, a negative example label is generated for the PC that last accessed the cache line; usage intervals ending in P do not increase the number of demand hits, and evicting them provides space for other intervals that do produce demand hits, reducing the number of demand misses;
- for usage intervals that end with a demand access D (i.e., P-D and D-D), DMINgen determines that the currently accessed cache line will generate a demand hit. In this case, if the cache space is not full at any time during the usage interval, a positive example label is generated for the PC that last accessed the cache line; if the cache space is full at some moment during the usage interval, a negative example label is generated for the PC that last accessed the cache line;
- the usage interval refers to the time interval from one access to line X until the next access to line X; the usage interval of line X represents line X's demand on the cache, and is used to determine whether the reference to line X will result in a cache hit;
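The labeling rules of Step 2 can be condensed into a small decision function. This is a sketch under stated assumptions: the parameter `cache_full_during_interval` is a hypothetical stand-in for DMINgen's cache-occupancy tracking, which the text describes but does not specify in detail.

```python
# Sketch of DMINgen's label generation (Step 2).
# 'D' = demand access, 'P' = prefetch access. A usage interval is classified
# by the type of the access that ends it (D-D, P-D, P-P, or D-P); the label
# is assigned to the PC that last accessed the cache line.

def dmingen_label(interval_end_type, cache_full_during_interval):
    """Return +1 (positive label) or -1 (negative label)."""
    if interval_end_type == 'P':
        # D-P and P-P intervals cannot produce a demand hit -> negative label;
        # evicting such lines frees space for intervals that do produce hits.
        return -1
    # P-D and D-D intervals would produce a demand hit...
    if cache_full_during_interval:
        # ...but the line cannot be retained if the cache was ever full
        # during the interval -> negative label.
        return -1
    return +1

assert dmingen_label('P', False) == -1  # ends in prefetch: never a demand hit
assert dmingen_label('D', False) == +1  # demand hit, space available
assert dmingen_label('D', True) == -1   # demand hit possible, but cache full
```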
- Step 3: the ISVM-based prefetch predictor and the ISVM-based demand predictor are trained separately according to the memory access behavior of the sampling set. The training method is the same for both, specifically: after the predictor reads the input data, it looks up the weights corresponding to the current input PC and the PCHR contents in the predictor's ISVM tables; if the label of the input data is a positive example label, the weights are incremented by 1, otherwise they are decremented by 1; if the sum of the weights corresponding to the current input PC and PCHR in the ISVM table is already greater than the threshold, the weights are not updated this time.
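The Step 3 training rule can be sketched as follows. The function names and the particular 4-bit hash (`pc & 0xF`) are illustrative assumptions; the source specifies only that a 4-bit hash of each PCHR entry indexes one of the 16 weights.

```python
# Sketch of the Step 3 training update: the weights selected by the current
# PC's table and the PCHR hashes move toward the DMINgen label, and updates
# stop once the weight sum exceeds the threshold (anti-saturation).

def hash_4bit(pc):
    # Hypothetical 4-bit hash; the real hardware hash is not specified here.
    return pc & 0xF

def train(tables, pc, pchr, label, threshold):
    """tables: dict mapping pc -> list of 16 weights.
    pchr: list of past load-instruction PCs; label: +1 or -1 from DMINgen."""
    weights = tables.setdefault(pc, [0] * 16)
    idxs = [hash_4bit(p) for p in pchr]  # one 4-bit index per PCHR entry
    if sum(weights[i] for i in idxs) > threshold:
        # Already confident: skip the update so weights do not saturate,
        # letting the predictor react quickly to changes in program behavior.
        return
    delta = 1 if label > 0 else -1
    for i in idxs:
        weights[i] += delta
```

A usage example: training the (hypothetical) table for PC `0x40` once with a positive label raises the weights indexed by the PCHR entries by 1 each; a subsequent negative label lowers them back.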
- Step 4: to predict cache line reuse intervals at the granularity of request type, when the predictor is used for prediction, the demand predictor or the prefetch predictor is selected according to the type of the access request. Each ISVM table of the predictor consists of 16 weights, used to look up the weight values corresponding to different PCs in the PCHR; each ISVM table corresponds to one PC, and the weights in the table are obtained by training. First, a 4-bit hash is created for each PC in the PCHR, used to find the weights corresponding to the current contents of the PCHR, and the weights are looked up in the corresponding ISVM table. Then these weights are summed: if the sum is greater than or equal to the threshold, the currently loaded cache line is predicted to be cache-friendly and is inserted with high priority; if the sum is less than 0, the line is predicted not to meet the cache requirements and is inserted with low priority; in all other cases, the line is predicted to be cache-friendly with low confidence and is inserted with medium priority. The insertion priority represents the reusability and importance of the line: high-priority lines reside in the cache longer, and low-priority lines are evicted as early as possible.
- using two different predictors to separate demand and prefetch predictions gives better insight into the cache behavior of the load instructions that cause demand and prefetch accesses. For example, a load instruction that loads cache-friendly demand accesses but triggers incorrect prefetches will be classified as cache-friendly by the demand predictor and as cache-averse by the prefetch predictor.
- Step 5: when replacing a cache line, a low-priority cache line is selected as the eviction candidate; if no such line exists, the line that entered the cache earliest among the cache-friendly lines is evicted.
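Steps 4 and 5 together can be sketched as a prediction function plus a victim-selection function. The threshold value, the priority encoding, and the 4-bit hash are illustrative assumptions, not values fixed by the source (the embodiment later mentions a prediction threshold of 60).

```python
# Sketch of Steps 4-5: sum the weights selected by the PCHR hashes, map the
# sum to an insertion priority, then choose an eviction victim by priority.

HIGH, MEDIUM, LOW = 2, 1, 0  # illustrative priority encoding

def hash_4bit(pc):
    return pc & 0xF  # hypothetical 4-bit hash

def predict_priority(weights, pchr, threshold):
    """weights: the 16-entry ISVM table of the current PC (demand or
    prefetch predictor, chosen by request type)."""
    total = sum(weights[hash_4bit(p)] for p in pchr)
    if total >= threshold:
        return HIGH     # cache-friendly, high confidence
    if total < 0:
        return LOW      # cache-averse: evict as early as possible
    return MEDIUM       # cache-friendly, low confidence

def choose_victim(cache_set):
    """cache_set: list of (line, priority, insertion_time) tuples.
    Prefer a low-priority line; otherwise evict the oldest friendly line."""
    low = [e for e in cache_set if e[1] == LOW]
    if low:
        return low[0][0]
    return min(cache_set, key=lambda e: e[2])[0]
```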
- Distinguishing between prefetch and demand requests improves the accuracy of reuse interval prediction, which retains more useful cache lines, avoids the cache pollution caused by prefetching, and improves memory subsystem performance.
- the present invention has the following advantages:
- the hardware prefetcher improves performance by fetching useful data in advance, but there is interference between cache prefetching and the replacement strategy.
- the cache pollution caused by prefetching reduces processor performance.
- a replacement strategy that handles prefetch and demand requests in the same way reduces the performance gain of the replacement algorithm.
- considering the different natures of prefetch and demand requests, the intelligent cache replacement strategy adapted to prefetching trains different predictors for demand and prefetch requests, taking as input the current PC and the PC sequence of past load instructions in the memory access history to predict the insertion priority of cache lines loaded by demand and prefetch accesses.
- the intelligent cache replacement strategy adapted to prefetching reduces the interference of prefetching with the replacement strategy, improves the accuracy of reuse interval prediction, retains useful cache lines, avoids the cache pollution caused by prefetching, and better combines the performance advantages of hardware prefetching and cache replacement.
- Figure 1 is a framework diagram of an intelligent cache replacement strategy adapted to prefetch
- Figure 2 is a schematic diagram of how the Demand-MIN algorithm reduces the number of missing demands
- Figure 3 is a schematic diagram of an ISVM-based demand predictor and prefetch predictor
- Figure 4 is a comparison of the IPC between the cache replacement strategy adapted to prefetch and other cutting-edge replacement algorithms
- Figure 5 is a comparison chart of the LLC demand access miss rate between the cache replacement strategy adapted to prefetch and other cutting-edge replacement algorithms.
- the present invention relates to an intelligent cache replacement strategy adapted to prefetching, as shown in Figure 1. Its main components are the demand request predictor and the prefetch request predictor, which perform reuse interval prediction, and DMINgen, which simulates the Demand-MIN algorithm to provide labels for the predictors' training inputs.
- the design mainly consists of two parts: the training and the prediction of the demand predictor and the prefetch predictor.
- the input data of the demand predictor includes the PC address of the load instruction that generates the demand access, and the past PC address stored in the PCHR;
- the input data of the prefetch predictor includes the PC address of the load instruction that triggers the prefetch access, and the PC address stored in the PCHR.
- DMINgen emulates the Demand-MIN algorithm to provide training labels for the input data used to train the predictors, and the two predictors are trained based on the cache access behavior of the sampling set.
- the ISVM-based prefetch predictor is used to predict the re-reference interval of cache lines loaded by prefetch accesses, and the ISVM-based demand predictor is used to predict the re-reference interval of cache lines loaded by demand accesses; by distinguishing prefetch and demand request predictions, the accuracy of cache line reuse interval prediction is improved.
- Step 1: add a PC History Register (PCHR) to the hardware to save the PC history of past load instructions during program execution; a longer PC history can improve prediction accuracy. Some cache sets in the last-level cache are selected as the sampling set.
- the input data of the demand predictor includes the PC address of the load instruction that generates the demand access, and the past PC address stored in the PCHR; the input data of the prefetch predictor includes triggering prefetch The PC address of the load instruction accessed, and the past PC address stored in PCHR.
- Step 2 DMINgen provides training labels for the input data to train the predictor.
- DMINgen extends the concept of the usage interval defined in Hawkeye.
- each endpoint of a usage interval is identified at fine granularity as either a demand access (D) or a prefetch access (P).
- after distinguishing the request type, the usage intervals include D-D, P-D, P-P, and D-P;
- for usage intervals ending with a prefetch access, DMINgen determines that the currently accessed cache line will not generate a demand hit, and generates a negative example label for the PC that last accessed the cache line; DMINgen preferentially evicts such cache lines, since they do not generate demand hits;
- for usage intervals ending with a demand access, DMINgen determines that the currently accessed cache line will generate a demand hit. In this case, if the cache space is not full at any time during the usage interval, a positive example label is generated for the PC that last accessed the cache line; if the cache space is full at some moment during the usage interval, a negative example label is generated for the PC that last accessed the cache line;
- the figure shows a sequence of memory accesses, where accesses in dashed boxes are prefetch accesses and accesses in solid boxes are demand accesses.
- Step 3: part of the cache sets in the last-level cache is used as the sampling set.
- two different predictors are used to separate the training of demand and prefetch requests.
- the ISVM-based prefetch predictor and the ISVM-based demand predictor are respectively trained according to the memory access behavior of the sampling set.
- the training methods are the same, specifically: after the predictor reads the input data, it looks up the weights corresponding to the current input PC and the PCHR contents in the corresponding ISVM table; if the label of the input data is a positive label, the weights are incremented by 1, otherwise they are decremented by 1; if the sum of the weights corresponding to the current input PC and PCHR is greater than the threshold, the weights are not updated this time. Halting updates beyond the threshold prevents the weights from saturating at extreme values, so the predictor can respond quickly to changes in program behavior, improving prediction accuracy.
- the threshold in this example is dynamically chosen from a fixed set of thresholds (0, 30, 100, 300, and 3000).
- Step 4 in order to predict the cache line reuse interval with the request type as the granularity, as shown in Figure 3, the prediction process of the two predictors is the same, including the following steps:
- Each ISVM table of the predictor consists of 16 weights, used to look up the weight values corresponding to different PCs in the PCHR; each ISVM table corresponds to one PC.
- the weights in the ISVM table are obtained by training. First, a 4-bit hash is created for each PC in the PCHR; the 16 values representable by the 4-bit hash correspond to the 16 weights in the ISVM table, so each generated hash indexes a weight in the corresponding ISVM table. Then these weights are summed: if the sum is greater than or equal to the threshold, the currently loaded cache line is predicted to be cache-friendly and is inserted with high priority; if the sum is less than 0, the line is predicted not to meet the cache requirements and is inserted with low priority; in all other cases, the line is predicted to be cache-friendly with low confidence and is inserted with medium priority;
- for example, if the PCHR contains PC 1, PC 2, PC 6, PC 10, and PC 15, then, combined with the current PC and the type of the memory request that generated this access, the weights corresponding to the different PCs in the PCHR are looked up in the corresponding ISVM tables.
- in this example, the threshold value is 60.
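As a worked illustration of the example above (a PCHR containing PC 1, PC 2, PC 6, PC 10, and PC 15, with the prediction threshold set to 60), the following uses hypothetical weight values; the specific numbers are invented purely to show the arithmetic.

```python
# Hypothetical numbers only: five weights looked up for the PCHR entries
# PC 1, PC 2, PC 6, PC 10, PC 15, compared against the example threshold of 60.

THRESHOLD = 60
looked_up = {1: 20, 2: 15, 6: 10, 10: 12, 15: 8}  # pc -> weight (illustrative)

total = sum(looked_up.values())  # 20 + 15 + 10 + 12 + 8 = 65
if total >= THRESHOLD:
    decision = "insert with high priority"    # cache-friendly, high confidence
elif total < 0:
    decision = "insert with low priority"     # cache-averse
else:
    decision = "insert with medium priority"  # cache-friendly, low confidence

assert total == 65 and decision == "insert with high priority"
```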
- the priority of cache line insertion represents the reusability and importance of the line: high-priority lines stay in the cache longer and low-priority lines are evicted as early as possible, thereby retaining more useful cache lines in the cache.
- prefetch requests and demand requests have different properties.
- designing different predictors for them improves the accuracy of cache line reuse prediction and enables more accurate insertion and eviction decisions, keeping more useful cache lines in the cache, avoiding the cache pollution caused by prefetching, and better combining the performance advantages of hardware prefetching and cache replacement.
- cache prefetching and cache replacement are two independent cache management techniques, and the interaction between them causes performance differences across application programs.
- prefetching can hide memory latency and improve performance
- incorrect prefetching can cause cache pollution
- a cache replacement strategy that cannot adapt to prefetching can cause performance variation or even degradation. Therefore, we propose a new intelligent cache replacement strategy that adapts to prefetching: it distinguishes prefetch from demand requests, designs different ISVM-based predictors for the two request types, and applies machine learning to prefetch-enabled cache management, improving the accuracy of reuse interval prediction and reducing the interference of cache prefetching with the replacement strategy.
- This cache management strategy is evaluated using the simulation framework released by the 2nd JILP Cache Replacement Championship (CRC2).
- the framework is based on ChampSim; the following data were obtained experimentally.
- Figure 4 compares the IPC speedup over LRU of the Prefetch-Adaptive Intelligent Cache Replacement Policy (PAIC) and other state-of-the-art replacement algorithms.
- Figure 5 compares the LLC demand access miss rate of the prefetch-adapted cache replacement strategy with other state-of-the-art replacement algorithms; both are evaluated on the memory-intensive SPEC2006 and SPEC2017 benchmark suites.
- the intelligent cache replacement strategy adapted to prefetching achieves a 37.22% improvement over the baseline LRU (LRU without a prefetcher), compared with 32.93% for SHiP, 34.56% for Hawkeye, and 34.43% for Glider.
- Figure 5 shows that the intelligent cache replacement strategy adapted to prefetching reduces the average demand miss rate by 17.41% relative to the baseline LRU, while SHiP, Hawkeye, and Glider reduce the demand miss rate by 13.66%, 15.46%, and 15.48%, respectively.
- the intelligent cache replacement strategy adapted to prefetch effectively combines the performance advantages brought by hardware prefetch and cache replacement, ensuring the performance gain brought by the intelligent cache replacement strategy in the presence of hardware prefetch, and improving system performance.
Abstract
A high-performance-oriented intelligent cache replacement strategy adapted to prefetching: in the presence of a hardware prefetcher, it distinguishes prefetch from demand requests, using an ISVM (Integer Support Vector Machine)-based prefetch predictor to predict the re-reference interval of cache lines loaded by prefetch accesses, and an ISVM-based demand predictor to predict the re-reference interval of cache lines loaded by demand accesses. Taking as input the PC address of the currently accessed load instruction and the PC addresses of past load instructions in the memory access history, different ISVM predictors are designed for prefetch and demand requests, and reuse prediction is performed on loaded cache lines at the granularity of request type, improving the accuracy of cache line reuse prediction in the presence of prefetching and better combining the performance gains brought by hardware prefetching and cache replacement.
Description
本发明属于计算机体系缓存系统结构领域,具体涉及一种面向高性能的适应预取的智能缓存替换策略。
计算机内存的性能提升速度远远落后于处理器性能提升的速度,形成了阻碍处理器性能提升的“存储墙”,从而使得内存系统成为整个计算机系统的性能瓶颈之一。末级缓存(LLC)缓解了CPU和DRAM之间的延迟和带宽方面的巨大差异,改善处理器的内存子系统是缓解“存储墙”问题的关键。一种方法依赖于设计合理的高速缓存替换策略来有效地管理片上末级高速缓存,这些方法通过动态修改高速缓存插入,以对数据的重用性和重要性进行优先级排序,从而减少插入缓存行对LLC的干扰。另一种缓解“存储墙”问题的主流方法是使用硬件预取器,在实际引用之前将数据预取到缓存层次结构中,尽管预取可以隐藏内存延迟并显着提高性能,但是错误的预取会造成缓存污染,可能会严重降低处理器性能。
随着处理器核数的增加以及工作负载多样性和复杂性的增加,CPU处理器上的替换策略已经从越来越复杂的基于启发式的解决方案发展到基于学习的解决方案。基于学习的缓存替换策略是从过去的缓存行为中学习数据的重用性,以预测未来缓存行插入的优先级。例如,某个load指令在过去引入了产生缓存命中的缓存行,那么将来同一load指令很可能引入也将产生缓存命中的缓存行。
缓存替换策略通过预测高速缓存行的重引用间隔(Re-Reference Prediction Value,RRPV)来模拟最佳替换决策,重用间隔表示缓存行的相对重要性,重用间隔小的缓存行表示即将被重用,因此以高优先级在缓存中插入该行,保证该行可以保留在缓存中。重用间隔大的缓存行则以低优先级插入,保证尽快被驱逐。基于学习的缓存替换策略中,常见的是基于引起缓存访问的内存指令的程序计数器(Program Counter,PC)预测缓存行的重用间隔,如果来自同一个PC的大多数缓存访问具有相似的重用行为,则基于PC可以准确预测缓存行的重用间隔。例如,SHiP提出了一种基于PC的重用预测算法,以预测高速缓存 的重用行为并使用该预测来指导高速缓存的插入位置。Hawkeye根据过去的缓存访问来重建Belady-MIN算法,训练一个基于PC的预测器,从MIN算法对过去的内存访问进行的决策中学习,然后Hawkeye根据预测器所学的内容做出替换决策。
Zhan等人将缓存替换建模为序列标签问题,并采用长短时记忆网络模型(Long Short Term Memory,LSTM)训练离线预测器,由于输入长期的过去load指令的历史记录,提高了预测的准确率。进一步提出了在线缓存替换策略Glider,在硬件上设计了可以紧凑地表示程序长期的load指令历史记录的特征,输入到在线ISVM,硬件上使用ISVM表跟踪每个PC的ISVM的权重。基于ISVM的在线预测器提供的准确性和性能优于前沿的缓存替换策略中使用的预测器。然而上述的研究没有考虑存在预取器的情况,存在预取时,由于没有区分预取和需求请求,导致预测的准确率下降,预取造成的缓存污染也会干扰缓存空间的管理,降低内存子系统性能。从缓存管理的角度来看,预取请求具有与需求请求不同的属性,通常由需求请求插入LLC的缓存行比预取请求对程序的性能更重要。
本发明提出了一种适应预取的智能缓存替换策略,以请求类型为粒度对加载的缓存行进行重用预测,输入当前访存的load指令的PC地址和访存历史记录中过去load指令的PC地址,针对预取和需求请求设计不同的ISVM预测器,改善存在预取时缓存行重用预测的准确度,更好的融合硬件预取和缓存替换带来的性能提升。
发明内容
为了解决现代高性能处理器中普遍使用硬件预取器,但存在预取时最新的智能缓存替换策略表现出的性能增益下降的问题,本发明提出了适应预取的智能缓存替换策略,以请求类型(需求或预取)为粒度进行重用预测。首先,选取末级缓存中部分缓存组作为采样集,需求预测器的输入数据包括采样集中产生需求访问的load指令的PC地址,以及PCHR中存放的过去的PC地址;预取预测器的输入数据包括采样集中触发预取访问的load指令的PC地址,以及PCHR中存放的过去的PC地址;其次,增加组件DMINgen,在硬件上重构Demand-MIN算法为训练预测器的数据提供标签,标签分为正例标签和负例标 签,正例标签表示当前访问的缓存行是缓存友好的,可以插入缓存中,负例标签表示当前访问的缓存行是缓存不友好的,不能插入缓存中。预测器训练阶段,基于ISVM的预取预测器和基于ISVM的需求预测器根据采样集的访存行为分别进行训练,训练方式相同,具体为:预测器读取输入数据后,在预测器对应的ISVM表中查找与当前输入数据PC和PCHR对应的权重,如果输入数据对应的标签为正例标签,则权重将增加1;否则,权重将减1。使用预测器进行预测时,两种预测器的预测过程相同,具体为:根据访问的请求类型选择需求预测器或预取预测器进行预测,预测器的每个ISVM表由16个权重组成,用于查找PCHR中不同PC对应的权重值,其中每个ISVM表对应一个PC,ISVM表中的权重由训练得到;首先,为PCHR中的每个PC创建一个4位哈希,用于找到与PCHR当前内容相对应的权重,并在相应的ISVM表中查找权重;然后,将这些权重相加,如果总和大于或等于阈值,则预测当前加载的缓存行是缓存友好,并以高优先级插入它;如果总和小于0,则预测该行不符合缓存要求,对应低优先级插入该行;对于其余情况,则预测该行对缓存友好,且置信度较低,并以中等优先级插入该行。高优先级插入的行在缓存中保留更长的时间,低优先级的行则尽快从缓存中驱逐。适应预取的智能缓存替换策略提高了存在预取器时缓存行重用预测的准确度,避免预取造成的缓存污染,在缓存中保留更多有用的数据,从而提升内存子系统的性能。
The specific technical solution is as follows:
A high-performance-oriented prefetch-adaptive intelligent cache replacement policy distinguishes prefetch from demand requests: an ISVM-based prefetch predictor performs re-reference interval prediction for cache lines loaded by prefetch accesses, an ISVM-based demand predictor performs re-reference interval prediction for cache lines loaded by demand accesses, and cache replacement is carried out according to the prediction results. Each predictor corresponds to a group of ISVM tables. An ISVM table A corresponds to one PC address B and consists of PC address B and 16 ISVM weights, where the 16 weights correspond to the 16 PC addresses, other than B, that have appeared most frequently in the PCHR; all weights are initialized to 0.
The training and prediction procedure of the predictors comprises the following steps:
Step 1: select a subset of cache sets in the last-level cache as the sampling set. The input data of the demand predictor comprises the PC of the load instruction producing a demand access and the past PCs stored in the PCHR;
the input data of the prefetch predictor comprises the PC of the load instruction triggering a prefetch access and the past PCs stored in the PCHR.
Step 2: add the component DMINgen, which reconstructs the Demand-MIN algorithm in hardware to provide training labels for the predictors' input data. Labels are positive or negative: a positive label means the currently accessed cache line is cache-friendly and may be inserted into the cache; a negative label means it is cache-averse and should not be inserted. Labels are generated as follows:
For a usage interval that ends in a prefetch access P, i.e. D-P or P-P, DMINgen determines that the currently accessed cache line will not produce a demand hit and generates a negative label for the PC of the previous access to that line. Usage intervals ending in a prefetch do not increase the number of demand hits; evicting such lines frees space for other intervals that do produce demand hits, reducing demand misses.
For a usage interval that ends in a demand access D, i.e. P-D or D-D, DMINgen determines that the currently accessed cache line will produce a demand hit. In that case, if the cache is not full at any point during the usage interval, a positive label is generated for the PC of the previous access to that line; if the cache is full at some point during the interval, a negative label is generated for that PC.
A usage interval is the time interval from one access to line X until the next access to line X; it expresses line X's demand on the cache and is used to determine whether a reference to line X will result in a cache hit.
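The labeling rules of Step 2 can be sketched in software as follows. This is a simplified model using a Hawkeye-style occupancy vector to track cache fullness; DMINgen implements the equivalent logic in hardware, and the function name and data layout here are illustrative:

```python
def dmin_labels(accesses, capacity):
    """Label the PC of the *previous* access to a line whenever the line is
    accessed again (closing a usage interval).

    accesses: list of (pc, line, kind) tuples, kind in {"D", "P"}.
    Returns (pc, label) pairs; label True = positive (cache-friendly)."""
    occupancy = [0] * len(accesses)   # lines kept across each time slot
    last = {}                         # line -> (index, pc) of previous access
    labels = []
    for i, (pc, line, kind) in enumerate(accesses):
        if line in last:
            j, prev_pc = last[line]
            if kind == "P":
                # interval ends in a prefetch (D-P or P-P): no demand hit
                labels.append((prev_pc, False))
            else:
                # interval ends in a demand access (P-D or D-D)
                if all(occupancy[t] < capacity for t in range(j, i)):
                    labels.append((prev_pc, True))
                    for t in range(j, i):
                        occupancy[t] += 1   # reserve space: line is kept
                else:
                    labels.append((prev_pc, False))
        last[line] = (i, pc)
    return labels
```

The occupancy check mirrors the "cache not full at any point during the interval" condition: a positive label is only produced when the line could actually have been cached across the whole interval.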
Step 3: the ISVM-based prefetch predictor and the ISVM-based demand predictor are trained separately on the memory-access behavior of the sampling set, in the same way: after reading its input data, the predictor looks up, in its ISVM tables, the weights corresponding to the current input PC and the PCHR contents. If the input's label is positive, each weight is incremented by 1; otherwise, each weight is decremented by 1. If the sum of the weights corresponding to the current input PC and PCHR already exceeds the threshold, the weights are not updated for this input.
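The weight update in Step 3 amounts to a thresholded, perceptron-style integer update over one 16-entry ISVM table. A sketch (the function name is illustrative; `slots` stands for the 4-bit hash values derived from the PCHR contents):

```python
def isvm_train(table, slots, positive, threshold):
    """Update one ISVM table of 16 integer weights for a training input.

    table:     list of 16 weights for the trigger PC's ISVM table
    slots:     4-bit hash values (0..15) of the PCs currently in the PCHR
    positive:  label produced by DMINgen (True = cache-friendly)
    threshold: training threshold; above it, updates are skipped"""
    if sum(table[s] for s in slots) > threshold:
        return                    # already confident: skip, stay responsive
    step = 1 if positive else -1
    for s in slots:
        table[s] += step
```

Skipping the update once the sum exceeds the threshold is what keeps the weights from saturating, so the predictor can react quickly when the program's behavior changes.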
Step 4: to predict cache-line reuse intervals at request-type granularity, when a prediction is made, the demand or prefetch predictor is selected according to the request type of the access. Each ISVM table of a predictor consists of 16 weights used to look up the weight values for the different PCs in the PCHR; each ISVM table corresponds to one PC, and the weights in the table are obtained by training. First, a 4-bit hash is created for each PC in the PCHR to locate the weights corresponding to the current PCHR contents, and those weights are looked up in the corresponding ISVM tables. The weights are then summed: if the sum is greater than or equal to the threshold, the currently loaded cache line is predicted cache-friendly and inserted with high priority; if the sum is less than 0, the line is predicted cache-averse and inserted with low priority; otherwise, the line is predicted cache-friendly with low confidence and inserted with medium priority. Priority represents a line's reusability and importance: high-priority lines stay in the cache longer, and low-priority lines are evicted as early as possible.
Using two separate predictors for demand and prefetch requests gives better insight into the cache behavior of the load instructions that cause demand and prefetch accesses. For example, a load instruction whose demand accesses load cache-friendly lines but which triggers incorrect prefetches will be classified as cache-friendly by the demand predictor and cache-averse by the prefetch predictor.
Step 5: when replacing a cache line, a low-priority line is selected as the eviction candidate; if no such line exists, the cache-friendly line that entered the cache earliest is evicted.
Distinguishing prefetch from demand requests improves the accuracy of reuse-interval prediction, so more useful cache lines are retained and prefetch-induced cache pollution is avoided, improving memory-subsystem performance.
Compared with the prior art, the invention has the following advantages:
Hardware prefetchers improve performance by fetching useful data in advance, but cache prefetching and the replacement policy can interfere with each other: prefetch-induced cache pollution degrades processor performance, and a replacement policy that treats prefetch and demand requests identically loses part of the replacement algorithm's performance gain. Considering the different natures of prefetch and demand requests, the prefetch-adaptive intelligent cache replacement policy trains separate predictors for demand and prefetch requests, takes as input the current PC and the sequence of past load-instruction PCs in the access history, and predicts the insertion priority of cache lines loaded by demand and prefetch accesses. The policy mitigates the interference of prefetching with the replacement policy, improves the accuracy of reuse-interval prediction, retains useful cache lines while avoiding prefetch-induced cache pollution, and better combines the performance benefits of hardware prefetching and cache replacement.
Figure 1 is a framework diagram of the prefetch-adaptive intelligent cache replacement policy.
Figure 2 is a schematic diagram of how the Demand-MIN algorithm reduces the number of demand misses.
Figure 3 is a schematic diagram of the ISVM-based demand and prefetch predictors.
Figure 4 compares the IPC of the prefetch-adaptive cache replacement policy with other state-of-the-art replacement algorithms.
Figure 5 compares the LLC demand-access miss rate of the prefetch-adaptive cache replacement policy with other state-of-the-art replacement algorithms.
To make the objectives, technical solution, and advantages of the invention clearer, embodiments of the invention are described in detail below with reference to the drawings.
The invention is a prefetch-adaptive intelligent cache replacement policy. As shown in Figure 1, its main components are the demand-request predictor and the prefetch-request predictor, which make reuse-interval predictions, and DMINgen, which simulates the Demand-MIN algorithm to provide training labels for the predictors' inputs. The design comprises two parts: training and prediction of the demand and prefetch predictors. The input data of the demand predictor comprises the PC of the load instruction producing a demand access and the past PCs stored in the PCHR; the input data of the prefetch predictor comprises the PC of the load instruction triggering a prefetch access and the past PCs stored in the PCHR. DMINgen simulates the Demand-MIN algorithm to provide training labels for the predictors' input data, and the two predictors are trained on the cache-access behavior of the sampling set. On every cache access, the ISVM-based prefetch predictor performs re-reference interval prediction for cache lines loaded by prefetch accesses, and the ISVM-based demand predictor performs re-reference interval prediction for cache lines loaded by demand accesses; separating the predictions for prefetch and demand requests improves the accuracy of cache-line reuse-interval prediction. The specific steps are as follows:
Step 1: a PC History Register (PCHR) is added in hardware to hold the PC history of past load instructions during program execution; a longer PC history improves prediction accuracy. A subset of cache sets in the last-level cache is selected as the sampling set. The input data of the demand predictor comprises the PC of the load instruction producing a demand access and the past PCs stored in the PCHR; the input data of the prefetch predictor comprises the PC of the load instruction triggering a prefetch access and the past PCs stored in the PCHR.
Step 2: DMINgen provides training labels for the predictors' input data. DMINgen extends the usage-interval concept defined in Hawkeye, identifying at fine granularity whether each endpoint of a usage interval is a demand access (D) or a prefetch access (P); with request types distinguished, the usage intervals are D-D, P-D, P-P, and D-P.
For a usage interval ending in a prefetch access P, i.e. D-P or P-P, DMINgen determines that the currently accessed cache line will not produce a demand hit and generates a negative label for the PC of the previous access to that line; DMINgen preferentially evicts such lines, which produce no demand hits.
For a usage interval ending in a demand access D, i.e. P-D or D-D, DMINgen determines that the currently accessed cache line will produce a demand hit. In that case, if the cache is not full at any point during the usage interval, a positive label is generated for the PC of the previous access to that line; if the cache is full at some point during the interval, a negative label is generated for that PC.
As shown in Figure 2, the figure depicts an access sequence in which dashed boxes denote prefetch accesses and solid boxes denote demand accesses. For a cache that holds two lines and initially contains A and B, evicting A or B when line C is loaded into the full cache leads to different numbers of demand misses. In this sequence, at time t=1 DMINgen chooses to evict B: since B will be prefetched at t=2, the demand access to B at t=3 hits, and the demand access to A at t=4 likewise hits. DMINgen evicts B, whose next prefetch is farthest in the current access sequence, and generates a negative label for the PC of the previous access to B. Compared with evicting A, the line with the farthest reuse, at t=1, DMINgen saves one demand miss; reducing the number of demand misses improves program performance.
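The Figure 2 scenario can be replayed with a small Belady-style simulation comparing plain MIN (evict the line reused farthest, counting all accesses) against the Demand-MIN preference for evicting lines whose next access is a prefetch. The trace below is reconstructed from the description above and the tie-breaking rules are assumptions, not the patent's DMINgen hardware:

```python
def simulate(trace, capacity, policy):
    """trace: list of (line, kind), kind "D" = demand, "P" = prefetch.
    policy "MIN": evict the line whose next access (of any kind) is farthest.
    policy "Demand-MIN": prefer evicting a line whose next access is a
    prefetch or that is never accessed again."""
    def next_use(cache_line, t):
        for i in range(t + 1, len(trace)):
            if trace[i][0] == cache_line:
                return i, trace[i][1]
        return len(trace), None            # never accessed again
    cache, demand_misses = set(), 0
    for t, (line, kind) in enumerate(trace):
        if line in cache:
            continue
        if kind == "D":
            demand_misses += 1
        if len(cache) >= capacity:
            info = {c: next_use(c, t) for c in cache}
            if policy == "Demand-MIN":
                # lines next touched by a prefetch add no demand hits
                dead = [c for c, (i, k) in info.items() if k in ("P", None)]
                pool = dead if dead else list(cache)
            else:
                pool = list(cache)
            victim = max(pool, key=lambda c: info[c][0])
            cache.remove(victim)
        cache.add(line)
    return demand_misses

# Reconstructed Figure 2 trace: load A, B, C; prefetch B; demand B; demand A
trace = [("A", "D"), ("B", "D"), ("C", "D"),
         ("B", "P"), ("B", "D"), ("A", "D")]
```

On this trace, Demand-MIN incurs one fewer demand miss than MIN, matching the one-miss saving described above.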
Step 3: a subset of LLC cache sets serves as the sampling set, and training for demand and prefetch requests is separated into two different predictors according to the request type of the access. The ISVM-based prefetch predictor and the ISVM-based demand predictor are trained separately on the memory-access behavior of the sampling set, in the same way: after reading its input data, the predictor looks up, in its ISVM tables, the weights corresponding to the current input PC and the PCHR contents; if the input's label is positive, each weight is incremented by 1, otherwise each weight is decremented by 1. If the sum of the weights corresponding to the current input PC and PCHR already exceeds the threshold, the weights are not updated for this input; stopping updates above the threshold prevents the weights from all saturating at their extreme values, so the predictor can respond quickly to changes in program behavior and maintain prediction accuracy. In this embodiment, the threshold is chosen dynamically from a fixed set of values (0, 30, 100, 300, and 3000).
Step 4: to predict cache-line reuse intervals at request-type granularity, as shown in Figure 3, the prediction procedure is the same for both predictors and comprises the following steps:
The demand or prefetch predictor is selected according to the request type of the access. Each ISVM table of a predictor consists of 16 weights used to look up the weight values for the different PCs in the PCHR; each ISVM table corresponds to one PC, and the weights in the table are obtained by training. First, a 4-bit hash is created for each PC in the PCHR; the 16 values representable by the 4-bit hash correspond to the 16 weights in an ISVM table, so each generated value locates a weight in the corresponding ISVM table. The weights are then summed: if the sum is greater than or equal to the threshold, the currently loaded cache line is predicted cache-friendly and inserted with high priority; if the sum is less than 0, the line is predicted cache-averse and inserted with low priority; otherwise, the line is predicted cache-friendly with low confidence and inserted with medium priority.
For example, in Figure 3 the PCHR contains PC1, PC2, PC6, PC10, and PC15. Given the current PC that produced this access and the request type, we look up the weights for the PCHR's PCs in the corresponding ISVM tables, i.e. weight1, weight2, weight6, weight10, and weight15, and sum them. If the sum is greater than or equal to the threshold, we predict the currently accessed cache line is cache-friendly and insert it with high priority (RRPV=0); if the sum is less than 0, we predict the line is cache-averse and insert it with low priority (RRPV=7); otherwise, we judge the line cache-friendly with low confidence and insert it with medium priority (RRPV=2). In this embodiment the threshold is 60. The insertion priority of a cache line represents its reusability and importance: high-priority lines stay in the cache longer, and low-priority lines are evicted as early as possible, keeping more useful lines in the cache.
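The threshold comparison in this worked example can be sketched as follows. This is illustrative: it assumes the embodiment's threshold of 60 and takes the 4-bit hash slot values as given (the hash itself is computed elsewhere in hardware):

```python
def insertion_priority(weights, slots, threshold=60):
    """weights: the 16-entry ISVM table for the current trigger PC.
    slots: 4-bit hash values (0..15) of the PCs currently in the PCHR.
    Returns the RRPV to insert with: 0 (high), 2 (medium), or 7 (low)."""
    total = sum(weights[s] for s in slots)
    if total >= threshold:
        return 0      # cache-friendly: keep in the cache longest
    if total < 0:
        return 7      # cache-averse: evict as soon as possible
    return 2          # cache-friendly with low confidence
```

With the five weights of the Figure 3 example summing to at least 60, the line is inserted with RRPV=0; a negative sum yields RRPV=7, and anything in between yields RRPV=2.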
Step 5: when replacing a cache line, a line inserted with low priority (RRPV=7) is selected as the eviction candidate; if no such line exists, the cache-friendly line that entered the cache earliest is evicted.
Because prefetch and demand requests have different natures, designing separate predictors improves the accuracy of cache-line reuse prediction and enables more accurate decisions at insertion and eviction time. More useful cache lines are retained, prefetch-induced cache pollution is avoided, and the performance benefits of hardware prefetching and cache replacement are better combined.
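The victim selection of Step 5 can be sketched as follows (the per-line record layout, with an `age` field where a larger value means the line entered the cache earlier, is an assumption for illustration):

```python
def pick_victim(lines):
    """lines: list of dicts with keys 'tag', 'rrpv', and 'age'.
    Prefer any low-priority line (RRPV == 7); otherwise fall back to the
    cache-friendly line that entered the cache earliest (largest age)."""
    low = [l for l in lines if l["rrpv"] == 7]
    if low:
        return low[0]
    return max(lines, key=lambda l: l["age"])
```

The fallback matters: when every resident line was predicted cache-friendly, evicting the oldest one approximates first-in-first-out among friendly lines rather than discarding a recently inserted, likely-hot line.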
In the prefetch-adaptive intelligent cache replacement policy of the invention, cache prefetching and cache replacement are two independent cache-management techniques whose interaction causes performance variation across applications. Although prefetching can hide memory latency and improve performance, incorrect prefetches pollute the cache, and a replacement policy that cannot adapt to prefetching yields variable or even degraded performance. We therefore propose a new prefetch-adaptive intelligent cache replacement policy that distinguishes prefetch from demand requests, designs separate ISVM-based predictors for the two request types, and applies machine learning to prefetch-aware cache management, improving the accuracy of reuse-interval prediction and mitigating the interference of cache prefetching with the replacement policy; a prefetch-aware intelligent replacement policy better combines the performance benefits of hardware prefetching and cache replacement. The policy was evaluated with the simulation framework released for the 2nd JILP Cache Replacement Championship (CRC2), which is based on ChampSim, and the following data were obtained experimentally. Figure 4 compares the IPC speedup over LRU of the Prefetch-Adaptive Intelligent Cache Replacement Policy (PAIC) and other state-of-the-art replacement algorithms, and Figure 5 compares their LLC demand-access miss rates; both are evaluated on the memory-intensive SPEC2006 and SPEC2017 benchmark suites. In a single-core configuration with a prefetcher, PAIC improves performance by 37.22% over the baseline LRU (LRU without a prefetcher), versus 32.93% for SHiP, 34.56% for Hawkeye, and 34.43% for Glider. Figure 5 shows that PAIC reduces the average demand miss rate by 17.41% relative to the baseline LRU, versus 13.66% for SHiP, 15.46% for Hawkeye, and 15.48% for Glider. The prefetch-adaptive intelligent cache replacement policy effectively combines the performance benefits of hardware prefetching and cache replacement, preserving the performance gains of intelligent replacement in the presence of hardware prefetching and improving system performance.
Claims (4)
- A high-performance-oriented prefetch-adaptive intelligent cache replacement policy, characterized in that it distinguishes prefetch from demand requests, uses an ISVM-based prefetch predictor to perform re-reference interval prediction for cache lines loaded by prefetch accesses and an ISVM-based demand predictor to perform re-reference interval prediction for cache lines loaded by demand accesses, and performs cache replacement according to the prediction results, wherein each predictor corresponds to a group of ISVM tables, an ISVM table A corresponds to one PC address B, and ISVM table A consists of PC address B and 16 ISVM weights, the 16 ISVM weights corresponding to the 16 PC addresses, other than B, that have appeared most frequently in the PCHR, with all weights initialized to 0.
- The high-performance-oriented prefetch-adaptive intelligent cache replacement policy according to claim 1, characterized in that the training process is the same for both predictors and comprises the following steps: Training-data preparation phase. Step 1: select a subset of cache sets in the last-level cache as the sampling set; the input data of the demand predictor comprises the PC of the load instruction producing a demand access and the past PCs stored in the PCHR; the input data of the prefetch predictor comprises the PC of the load instruction triggering a prefetch access and the past PCs stored in the PCHR; the PCHR added in hardware holds the PC history of past load instructions during program execution. Step 2: add the component DMINgen, which reconstructs the Demand-MIN algorithm in hardware to provide labels for the predictors' training data; labels are positive or negative, a positive label meaning the currently accessed cache line is cache-friendly and may be inserted into the cache, and a negative label meaning it is cache-averse and should not be inserted, generated as follows: for a usage interval ending in a prefetch access P, i.e. D-P or P-P, DMINgen determines that the currently accessed cache line will not produce a demand hit and generates a negative label for the PC of the previous access to that line; for a usage interval ending in a demand access D, i.e. P-D or D-D, DMINgen determines that the currently accessed cache line will produce a demand hit and, in that case, generates a positive label for the PC of the previous access to that line if the cache is not full at any point during the usage interval, or a negative label for that PC if the cache is full at some point during the interval; a usage interval is the time interval from one access to line X until the next access to line X, expresses line X's demand on the cache, and is used to determine whether a reference to line X will result in a cache hit. Predictor training phase: the ISVM-based prefetch predictor and the ISVM-based demand predictor are trained separately on the memory-access behavior of the sampling set, in the same way: after reading its input data, the predictor looks up, in its ISVM tables, the weights corresponding to the current input PC and the PCHR contents; if the input's label is positive, each weight is incremented by 1, otherwise each weight is decremented by 1; if the sum of the weights corresponding to the current input PC and PCHR already exceeds the threshold, the weights are not updated for this input.
- The high-performance-oriented prefetch-adaptive intelligent cache replacement policy according to claim 1, characterized in that the prediction process is the same for both predictors and comprises the following steps: the demand or prefetch predictor is selected according to the request type of the access; first, a hash algorithm generates a 4-bit binary value C for each PC address in the PCHR, the 16 possible values corresponding to the 16 weights in an ISVM table, so that C locates the weight for each PCHR PC address in the ISVM table; the weights are then summed: if the sum is greater than or equal to the threshold, the currently loaded cache line is predicted cache-friendly and inserted with high priority; if the sum is less than 0, the line is predicted cache-averse and inserted with low priority; otherwise, the line is predicted cache-friendly with low confidence and inserted with medium priority; priority represents a line's reusability and importance, high-priority lines staying in the cache longer and low-priority lines being evicted as early as possible.
- The high-performance-oriented prefetch-adaptive intelligent cache replacement policy according to claim 1, characterized in that the cache replacement method is as follows: when replacing a cache line, a low-priority cache line is selected as the eviction candidate; if no such line exists, the cache-friendly line that entered the cache earliest is evicted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/719,304 US12093188B2 (en) | 2021-05-24 | 2022-04-12 | Prefetch-adaptive intelligent cache replacement policy for high performance |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110606031.1 | 2021-05-24 | ||
CN202110606031.1A CN113297098B (zh) | 2021-05-24 | 2021-05-24 | 一种面向高性能的适应预取的智能缓存替换策略 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/719,304 Continuation US12093188B2 (en) | 2021-05-24 | 2022-04-12 | Prefetch-adaptive intelligent cache replacement policy for high performance |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022247070A1 true WO2022247070A1 (zh) | 2022-12-01 |
Family
ID=77326496
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/119290 WO2022247070A1 (zh) | 2021-05-24 | 2021-09-18 | 一种面向高性能的适应预取的智能缓存替换策略 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113297098B (zh) |
WO (1) | WO2022247070A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117573574A (zh) * | 2024-01-15 | 2024-02-20 | 北京开源芯片研究院 | 一种预取方法、装置、电子设备及可读存储介质 |
CN118035135A (zh) * | 2024-02-29 | 2024-05-14 | 北京开元维度科技有限公司 | 一种缓存替换方法及存储介质 |
CN118295936A (zh) * | 2024-06-06 | 2024-07-05 | 北京开源芯片研究院 | 高速缓存替换策略的管理方法、装置及电子设备 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113297098B (zh) * | 2021-05-24 | 2023-09-01 | 北京工业大学 | 一种面向高性能的适应预取的智能缓存替换策略 |
US12093188B2 (en) | 2021-05-24 | 2024-09-17 | Beijing University Of Technology | Prefetch-adaptive intelligent cache replacement policy for high performance |
CN113760787B (zh) * | 2021-09-18 | 2022-08-26 | 成都海光微电子技术有限公司 | 多级高速缓存数据推送系统、方法、设备和计算机介质 |
CN114816734B (zh) * | 2022-03-28 | 2024-05-10 | 西安电子科技大学 | 一种基于访存特征的Cache旁路系统及其数据存储方法 |
CN116107926B (zh) * | 2023-02-03 | 2024-01-23 | 摩尔线程智能科技(北京)有限责任公司 | 缓存替换策略的管理方法、装置、设备、介质和程序产品 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170293571A1 (en) * | 2016-04-08 | 2017-10-12 | Qualcomm Incorporated | Cost-aware cache replacement |
US20180314533A1 (en) * | 2017-04-28 | 2018-11-01 | International Business Machines Corporation | Adaptive hardware configuration for data analytics |
CN113297098A (zh) * | 2021-05-24 | 2021-08-24 | 北京工业大学 | 一种面向高性能的适应预取的智能缓存替换策略 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3797498B2 (ja) * | 1996-06-11 | 2006-07-19 | ソニー株式会社 | メモリ制御装置およびメモリ制御方法、並びに画像生成装置 |
US7539608B1 (en) * | 2002-05-10 | 2009-05-26 | Oracle International Corporation | Techniques for determining effects on system performance of a memory management parameter |
US8626791B1 (en) * | 2011-06-14 | 2014-01-07 | Google Inc. | Predictive model caching |
US11003592B2 (en) * | 2017-04-24 | 2021-05-11 | Intel Corporation | System cache optimizations for deep learning compute engines |
2021
- 2021-05-24: CN application CN202110606031.1A, patent CN113297098B (Active)
- 2021-09-18: WO application PCT/CN2021/119290, publication WO2022247070A1 (Application Filing)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170293571A1 (en) * | 2016-04-08 | 2017-10-12 | Qualcomm Incorporated | Cost-aware cache replacement |
US20180314533A1 (en) * | 2017-04-28 | 2018-11-01 | International Business Machines Corporation | Adaptive hardware configuration for data analytics |
CN113297098A (zh) * | 2021-05-24 | 2021-08-24 | 北京工业大学 | 一种面向高性能的适应预取的智能缓存替换策略 |
Non-Patent Citations (1)
Title |
---|
Zhan Shi, Xiangru Huang, Akanksha Jain, Calvin Lin: "Applying Deep Learning to the Cache Replacement Problem", Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '52), ACM, New York, 12–16 October 2019, pages 413–425, XP058476987, ISBN: 978-1-4503-6938-1, DOI: 10.1145/3352460.3358319 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117573574A (zh) * | 2024-01-15 | 2024-02-20 | 北京开源芯片研究院 | 一种预取方法、装置、电子设备及可读存储介质 |
CN117573574B (zh) * | 2024-01-15 | 2024-04-05 | 北京开源芯片研究院 | 一种预取方法、装置、电子设备及可读存储介质 |
CN118035135A (zh) * | 2024-02-29 | 2024-05-14 | 北京开元维度科技有限公司 | 一种缓存替换方法及存储介质 |
CN118295936A (zh) * | 2024-06-06 | 2024-07-05 | 北京开源芯片研究院 | 高速缓存替换策略的管理方法、装置及电子设备 |
Also Published As
Publication number | Publication date |
---|---|
CN113297098A (zh) | 2021-08-24 |
CN113297098B (zh) | 2023-09-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022247070A1 (zh) | 一种面向高性能的适应预取的智能缓存替换策略 | |
US12093188B2 (en) | Prefetch-adaptive intelligent cache replacement policy for high performance | |
Michaud | Best-offset hardware prefetching | |
Nesbit et al. | AC/DC: An adaptive data cache prefetcher | |
US8856452B2 (en) | Timing-aware data prefetching for microprocessors | |
Gaur et al. | Bypass and insertion algorithms for exclusive last-level caches | |
US8683129B2 (en) | Using speculative cache requests to reduce cache miss delays | |
US7991956B2 (en) | Providing application-level information for use in cache management | |
US20200004692A1 (en) | Cache replacing method and apparatus, heterogeneous multi-core system and cache managing method | |
US8433852B2 (en) | Method and apparatus for fuzzy stride prefetch | |
US20130246708A1 (en) | Filtering pre-fetch requests to reduce pre-fetching overhead | |
US6629210B1 (en) | Intelligent cache management mechanism via processor access sequence analysis | |
Liang et al. | STEP: Sequentiality and thrashing detection based prefetching to improve performance of networked storage servers | |
WO2020073641A1 (zh) | 一种面向数据结构的图形处理器数据预取方法及装置 | |
JP7453360B2 (ja) | キャッシュアクセス測定デスキュー | |
Wenisch et al. | Making address-correlated prefetching practical | |
CN116820773A (zh) | 一种gpgpu寄存器缓存管理系统 | |
Teran et al. | Minimal disturbance placement and promotion | |
Yang et al. | A prefetch-adaptive intelligent cache replacement policy based on machine learning | |
WO2008149348A2 (en) | Method architecture circuit & system for providing caching | |
Zhang et al. | DualStack: A high efficient dynamic page scheduling scheme in hybrid main memory | |
US8191067B2 (en) | Method and apparatus for establishing a bound on the effect of task interference in a cache memory | |
Manivannan et al. | Runtime-assisted global cache management for task-based parallel programs | |
Zhang et al. | Locality protected dynamic cache allocation scheme on GPUs | |
Sun et al. | Cache coherence method for improving multi-threaded applications on multicore systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21942637 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21942637 Country of ref document: EP Kind code of ref document: A1 |