CN113656332B - CPU cache data prefetching method based on merging address difference value sequence - Google Patents


Info

Publication number
CN113656332B
CN113656332B (application CN202110962555.4A)
Authority
CN
China
Prior art keywords
difference
sequence segment
difference value
sequence
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110962555.4A
Other languages
Chinese (zh)
Other versions
CN113656332A (en
Inventor
蒋实知
慈轶为
杨秋松
李明树
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Advanced Research Institute of CAS
Original Assignee
Shanghai Advanced Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Advanced Research Institute of CAS filed Critical Shanghai Advanced Research Institute of CAS
Priority to CN202110962555.4A priority Critical patent/CN113656332B/en
Publication of CN113656332A publication Critical patent/CN113656332A/en
Application granted granted Critical
Publication of CN113656332B publication Critical patent/CN113656332B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0862 Caches with prefetch
    • G06F 12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0811 Multilevel cache hierarchies
    • G06F 12/0842 Multiprocessing or multitasking cache systems
    • G06F 12/0877 Cache access modes
    • G06F 12/0882 Page mode
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10 Providing a specific technical effect
    • G06F 2212/1016 Performance improvement
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a CPU cache data prefetching method based on a merged address difference sequence, comprising the following steps: collecting the current access information of the data cache to be prefetched and, each time current access information is collected, attempting to obtain information from a history information table and updating it to obtain the current difference sequence segment; updating the history information table, the difference mapping array and the difference sequence segment sub-table according to the current difference sequence segment, and removing its first difference to obtain the difference sequence to be predicted; using the difference sequence to be predicted to match, by multiple matching, the prefix subsequences of the complete sequence segments stored in the dynamic mapping pattern table, obtaining the best-matching complete sequence segment and the corresponding predicted target difference; and adding the access address in the current access information to the predicted target difference to obtain the predicted target address. The invention solves the problem that, in the prior art, multi-length difference sequences can only be stored by cascading multiple tables, and simplifies the storage and query logic of the memory-access pattern by storing everything in a single table.

Description

CPU cache data prefetching method based on merging address difference value sequence
Technical Field
The invention belongs to the technical field of CPU chip hardware architecture, and particularly relates to a CPU cache data prefetching method based on a merged address difference sequence.
Background
High memory-access latency is one of the major bottlenecks preventing CPU performance from improving. A cache data prefetcher loads data from memory into the cache in advance by predicting the data addresses that the CPU will need, thereby reducing the average memory-access latency and improving overall CPU performance. Coverage, accuracy and timeliness are the three major factors by which prefetcher performance is measured.
Current mainstream prefetching methods fall into two main categories: Spatial prefetching and Temporal prefetching. Temporal prefetching is not widely adopted in industry because its hardware cost is excessive and it requires additional operating-system support, so it is outside the scope of this discussion. Spatial prefetching exploits the correlation between address sequences within a smaller address region to predict the memory addresses likely to be accessed in that region in the future. Spatial prefetch designs include VLDP, SPP, SMS, etc. VLDP is an advanced multi-table, multi-length difference-sequence-matching prefetching method (see [Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian, Chris Wilkerson, Seth H. Pugsley, and Zeshan Chishti. 2015. Efficiently prefetching complex address patterns. In Proceedings of the 48th International Symposium on Microarchitecture, MICRO 2015, Waikiki, HI, USA, December 5-9, 2015, Milos Prvulovic (Ed.). ACM, 141-152. https://doi.org/10.1145/2830772.2830793]).
Disclosure of Invention
The invention aims to provide a CPU cache data prefetching method based on a merged address difference sequence, which solves the problem that, in the prior art, multi-length difference sequences can only be stored by cascading multiple tables, and simplifies the storage and query logic of the memory-access pattern by using a single table.
In order to achieve the above object, the present invention provides a method for prefetching CPU cache data based on a merged address difference sequence, including:
S0: providing a history information table, a difference mapping array and a difference sequence segment sub-table connected with the data cache to be prefetched, wherein the difference mapping array and the difference sequence segment sub-table form the dynamic mapping pattern table;
S1: acquiring the sequence of access requests from a CPU core with the data cache to be prefetched, so as to collect, via a bypass, the current access information in the sequence, where the access information comprises the access address and the corresponding PC (Program Counter) value; each time current access information is collected, attempting to obtain information from the history information table and updating it to obtain the current difference sequence segment; when the current difference sequence segment is obtained, updating the history information table, the difference mapping array and the difference sequence segment sub-table according to the current difference sequence segment, and removing the first difference to obtain the difference sequence to be predicted;
Step S2: using the difference sequence to be predicted to match, by multiple matching, the prefix subsequences of the complete sequence segments stored in the dynamic mapping pattern table, where multiple matching means first looking up and hitting a data item of the difference mapping array in the dynamic mapping pattern table, and then matching against each data item of the corresponding data group in the difference sequence segment sub-table with matching tags of several lengths, thereby obtaining the best-matching complete sequence segment and the corresponding predicted target difference;
S3: adding the access address in the current access information to the predicted target difference to obtain the predicted target address, and sending the predicted target address to a miss status handling register to wait for access, thereby realizing data prefetching.
The history information table is used for storing a plurality of history information table records; each record comprises a PC tag, a page tag, the last intra-page offset, a merged address difference sequence segment and a valid bit; the table is indexed by the low-order bits of the PC, and the high-order part of the PC is stored as the PC tag;
the difference mapping array is a multi-entry fully-associative cache comprising a plurality of data items, each data item comprising a leading difference, a confidence and a valid bit;
the difference sequence segment sub-table is a set-associative cache whose number of data groups matches the number of data items in the difference mapping array, each data group holding a plurality of data items, and each data item comprising a difference sequence segment, a confidence and a valid bit;
both the difference mapping array and the difference sequence segment sub-table use a processed count of each data item's hits as its confidence; the data groups of the difference sequence segment sub-table correspond one-to-one with the indices of the data items in the difference mapping array.
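As a concrete illustration, the three structures above can be modeled in software roughly as follows (a sketch only: the class and field names are mine, and the sizes of 128 history records, 16 mapping entries and 16×8 sub-table items are taken from the embodiment described later; the Python model is not part of the claimed hardware):

```python
from dataclasses import dataclass, field

@dataclass
class HistoryEntry:            # one record of the history information table
    pc_tag: int = 0            # hash of the PC high-order bits
    page_tag: int = 0          # hash of the page address
    last_offset: int = 0       # last intra-page offset
    deltas: list = field(default_factory=list)  # merged address-difference sequence segment
    valid: bool = False

@dataclass
class DeltaMapEntry:           # one entry of the difference mapping array
    head_delta: int = 0        # leading difference of a sequence segment
    confidence: int = 0
    valid: bool = False

@dataclass
class SeqSubEntry:             # one entry of the difference-sequence-segment sub-table
    deltas: tuple = ()         # the 3 trailing differences of a segment
    confidence: int = 0
    valid: bool = False

history_table = [HistoryEntry() for _ in range(128)]   # direct-mapped, indexed by PC low bits
delta_map = [DeltaMapEntry() for _ in range(16)]       # fully associative
seq_sub_table = [[SeqSubEntry() for _ in range(8)] for _ in range(16)]  # group i ↔ map entry i
```

The fixed positional pairing between `delta_map[i]` and `seq_sub_table[i]` mirrors the one-to-one correspondence stated above, so no explicit pointer between the two needs to be stored.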
In step S1, an attempt is made to obtain, from the history information table, the merged address difference sequence segment and the last intra-page offset corresponding to the same PC and page address according to the current access information; when they are successfully obtained, the current difference sequence segment is updated from the merged address difference sequence segment, the last intra-page offset and the current access information.
In step S1, updating the current difference sequence segment by attempting to obtain information from the history information table according to the current access information, and updating the history information table, the difference mapping array and the difference sequence segment sub-table according to the current difference sequence segment once it is obtained, includes:
S11: indexing the history information table with the low-order bits of the PC of the current access information, hashing the high-order bits of the PC to obtain a PC tag, and matching it against the indexed history information table record; if the match succeeds, obtaining the corresponding page tag, last intra-page offset and merged address difference sequence segment; matching the obtained page tag against the hash of the access address in the current access information; if this match also succeeds, performing step S12, otherwise ending the current flow;
S12: subtracting the last intra-page offset from the current intra-page offset to obtain the current difference, updating the history information table with the current difference, and confirming whether the number of past differences in the merged address difference sequence segment stored in the history information table is complete; if so, forming the current difference sequence segment of 4 differences from the current difference together with the merged address difference sequence segment, to serve as the difference sequence segment to be learned;
S13: querying the difference mapping array for a hit with the leading difference of the current difference sequence segment, and counting the hits of each data item in the difference mapping array to update each data item and its confidence;
S14: updating the corresponding data group in the difference sequence segment sub-table by utilizing the subsequent part of the current difference sequence segment according to the hit result of the step S13;
in the step S14, if the hit is successful, the number of the hit data item is obtained, the subsequent part of the current difference sequence segment is utilized to query each data item of the data set corresponding to the number of the data item in the hit difference sequence segment sub-table, the hit times of each data item are counted to update each data item and the confidence coefficient thereof, and the update strategy is the same as the update strategy of the difference mapping array; otherwise, if the hit fails, all data items in the data group of the difference sequence segment sub-table corresponding to the number of the data item with the smallest confidence number in the difference mapping array are emptied, the subsequent part of the current difference sequence segment is stored and recorded in the empty position of the difference sequence segment sub-table, and the initial confidence number of the data item is set to be 1.
In step S11, when a history information table record can be indexed and the PC tag of the current access information matches that of the indexed record, subsequent data extraction is performed to obtain the corresponding page tag, last intra-page offset and merged address difference sequence segment, and the valid bit of the record is set to 1. If the PC tags of the current access information and the indexed record differ: when the record's valid bit is 1, it is cleared to 0; otherwise, when the record's valid bit is already 0, the record's PC tag is replaced and its other data fields are cleared.
In step S12, if the merged address difference sequence segment stored in the history information table holds fewer than 3 differences, the flow ends.
In step S13, hits are counted to update each data item and its confidence as follows: when a data item in the difference mapping array is queried and hit, if the hit data item has not reached the saturation value, its confidence is incremented by 1; if it has reached the saturation value, the confidences of the other data items in the difference mapping array are halved and the hit data item stays at the saturation value. Otherwise, on a miss, the data item with the smallest confidence in the difference mapping array is replaced with the leading difference of the current difference sequence segment, and its initial confidence is set to 1.
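The saturating-confidence policy just described can be sketched as follows (a software model: the dict-based entry layout and the function name are mine; the saturation value 63 is from the embodiment described later):

```python
SATURATION = 63  # saturation value used in the embodiment

def update_delta_map(entries, head_delta):
    """Saturating-confidence update for the difference mapping array (a sketch).

    `entries` is a list of dicts with keys 'delta', 'conf', 'valid'.
    Returns the index of the hit or newly allocated entry."""
    for i, e in enumerate(entries):
        if e['valid'] and e['delta'] == head_delta:          # hit
            if e['conf'] < SATURATION:
                e['conf'] += 1
            else:                                            # at saturation: halve the others
                for j, other in enumerate(entries):
                    if j != i:
                        other['conf'] //= 2
            return i
    # miss: evict the entry with the smallest confidence, seed confidence at 1
    victim = min(range(len(entries)), key=lambda j: entries[j]['conf'])
    entries[victim] = {'delta': head_delta, 'conf': 1, 'valid': True}
    return victim
```

Halving the competitors at saturation, rather than capping silently, keeps relative ordering while letting a newly hot leading difference catch up.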
In step S2, an adaptive voting strategy is adopted to obtain the best-matching complete sequence segment and the corresponding predicted target difference according to the confidences of the matched prefix subsequences and their corresponding matching lengths.
The step S2 includes:
S21: querying the difference mapping array for a hit with the leading difference of the difference sequence to be predicted;
S22: according to the hit result of step S21, if the hit succeeds, the index of the hit data item is obtained; the subsequent differences of the difference sequence to be predicted form matching tags of different lengths, which are matched in parallel against each data item of the data group corresponding to that index in the difference sequence segment sub-table; the last difference of the difference sequence segment in each successfully matched data item of the sub-table is a candidate difference to be prefetched; otherwise the flow ends;
S23: computing the score of each candidate difference to be prefetched, where a candidate's score is the sum, over all difference sequence segments it belongs to, of the product of the segment's confidence and a voting factor; then summing the scores of all candidates and computing each candidate's share of the total; if a candidate's share exceeds the prefetch threshold, the complete sequence segment corresponding to that candidate is the best-matching complete sequence segment and that candidate is the predicted target difference; otherwise the flow ends;
the merged address difference sequence segment, the difference sequence segments of the difference sequence segment sub-table and the prefix subsequences are all 3 differences long, and the matching tag lengths are 1 difference and 2 differences.
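The voting of step S23 can be sketched as follows (the per-length voting factors and the threshold value here are illustrative assumptions; the patent text fixes neither at this point):

```python
def vote(candidates, threshold=0.55, factors={1: 1, 2: 2}):
    """Adaptive voting over matched segments (a sketch).

    `candidates` maps a candidate difference to a list of (confidence,
    match_len) pairs, one per segment that voted for it.  Returns the
    winning difference, or None when no candidate's share of the total
    score clears the threshold."""
    scores = {d: sum(conf * factors[ln] for conf, ln in votes)
              for d, votes in candidates.items()}
    total = sum(scores.values())
    if total == 0:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] / total > threshold else None
```

Weighting length-2 matches more than length-1 matches realizes the idea that longer matches carry more evidence, while still letting a high-confidence short match outvote a weak long one.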
The CPU cache data prefetching method based on the merged address difference sequence further comprises step S4: appending the predicted target difference obtained in step S2 to the end of the current difference sequence and removing the oldest difference to keep the sequence length unchanged, obtaining an updated difference sequence to be predicted; then repeating steps S2 and S3 and this updating step until the number of repetitions of steps S2 and S3 reaches a predetermined prediction depth, whereupon the flow ends.
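The recursive prediction of step S4 amounts to a sliding-window loop, sketched below (`predict_one` stands in for the steps-S2/S3 table lookup and is a placeholder of mine, not something defined by the patent):

```python
def recursive_predict(seq, predict_one, depth=4):
    """Recursive-prediction loop (a sketch of step S4).

    Each accepted prediction is appended to the sequence and the oldest
    difference is dropped, so the window length stays constant; the loop
    stops early when no confident match is found."""
    seq = list(seq)
    out = []
    for _ in range(depth):
        d = predict_one(tuple(seq))
        if d is None:        # no confident match: stop recursing
            break
        out.append(d)
        seq = seq[1:] + [d]  # slide the window: drop oldest, append prediction
    return out
```

Feeding predictions back into the window is what lets a single stored pattern generate several prefetches ahead of the demand stream.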
Therefore, by using a multiple-matching mechanism, the invention achieves matching of multiple-length sequences with a single table, which increases the matching probability under different program loads, effectively improves coverage, reduces the cache miss rate, and avoids the complex update-management logic of cascaded multiple tables. During updating, sequence segments of different lengths need not be updated one by one; only the fixed-length merged address difference sequence segment needs to be updated.
The invention applies different voting weights for different matching lengths, filtering out irrelevant difference sequence segments and improving prediction accuracy. The prior art (VLDP) considers only the longest matching sequence and ignores the effect of high-confidence short matching sequences.
The dynamic mapping pattern table uses the leading difference of the merged address difference sequence segment as its index, forms a mapping between it and a dynamically allocated storage location, and stores this mapping explicitly in the difference mapping array, so that the stored information features are decoupled from the index of the dynamic mapping pattern table, and the relation between information features and table positions is kept explicitly in a separate structure. By decoupling the mapping between metadata features and the difference sequence segment sub-table, the method ensures that frequently accessed sequence segments reside in the metadata table as much as possible, greatly reduces the prefetcher's cost, and still allows its performance to exceed that of other high-cost data prefetchers.
Drawings
Fig. 1 is a suitable environment and design framework diagram of the CPU cache data prefetching method based on the merged address difference sequence of the present invention.
FIG. 2 is a memory access pattern learning flow chart of a CPU cache data prefetching method based on a merged address difference sequence of the present invention.
FIG. 3 is a flow chart of program access mode prediction and prefetching for a CPU cache data prefetching method based on a merged address difference sequence of the present invention.
Fig. 4 is a schematic diagram of recursive prediction.
Detailed Description
The invention will now be described in further detail by way of specific examples and the accompanying drawings.
Fig. 1 shows the applicable environment and design structure of the CPU cache data prefetching method based on the merged address difference sequence of the present invention. The method is applicable to the L1 data cache of one core of a conventional CPU (this embodiment takes Core0 as an example), and can also be applied to the L2 and L3 data caches, with correspondingly reduced performance. Each core of a conventional CPU can adopt its own independent instance of the method, thereby realizing data prefetching for every core.
The CPU cache data prefetching method based on the merging address difference value sequence specifically comprises the following steps:
step S0: providing, as shown in fig. 1, a history information table 1, a difference mapping array 2 and a difference sequence segment sub-table 3 connected with the data cache to be prefetched, wherein the difference mapping array 2 and the difference sequence segment sub-table 3 form the dynamic mapping pattern table, and the history information table 1 is used for storing a plurality of history information table records;
In the present embodiment, the data cache to be prefetched is an L1-level data cache, however, in other embodiments, the data cache to be prefetched may be an L2-level data cache or an L3-level data cache.
The history information table 1 consists of a direct-mapped multi-entry table-format cache storing a plurality of history information table records. In this embodiment, the history information table 1 can store 128 records. Each record includes the following fields: PC tag (12 bits), page tag (8 bits), last intra-page offset (9 bits), merged address difference sequence segment (30 bits) and valid bit (1 bit), 60 bits in total.
The history information table 1 is indexed by the low-order bits of the Program Counter (PC) value, and stores the high-order part of the PC as the PC tag. The PC tag and the page tag are obtained by hashing the high-order part of the original PC and of the access address in the access information, respectively, to reduce cost. The 128 PC tags of the history information table 1 are independent and are used to confirm whether the content stored in a table entry matches the PC of a given piece of access information. The PC tag and page tag represent a particular PC and physical page, respectively. The merged address difference sequence segment in a history information table record has a fixed length, storing the 3 accumulated past differences. In this embodiment at most 3 differences can be stored, i.e. the merged address difference sequence segment consists of the 3 accumulated past differences, or of all past differences when there are fewer than 3. That is, there are two cases for the merged address difference sequence segment of the history information table 1: if it holds fewer than 3 differences, it consists of all past differences, and a newly produced difference is simply appended to the end of the sequence; if it already consists of 3 differences, the oldest difference is removed and the newly produced difference is appended to the end. In summary, while the merged address difference sequence segment holds fewer than three differences, only the history information table is updated, and the dynamic mapping pattern table (i.e. the difference mapping array 2 and the difference sequence segment sub-table 3) is not.
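The fixed-length segment update described above amounts to a simple sliding append, sketched here (the function name is mine; the maximum length of 3 is from the embodiment):

```python
MAX_SEG_LEN = 3  # the merged segment keeps at most 3 past differences

def append_delta(segment, delta):
    """Append a newly produced difference to the merged address-difference
    sequence segment, dropping the oldest difference once the fixed
    length has been reached (a sketch)."""
    segment = list(segment) + [delta]
    return segment[-MAX_SEG_LEN:]
```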
The merged address difference sequence segment is obtained by computing the intra-page offset differences of the address sequence; its values may be positive or negative. A difference here means an intra-page offset difference: the current difference is obtained by subtracting the last intra-page offset from the current intra-page offset, where an intra-page offset is the low-order intra-page part of an address. The last intra-page offset is thus the intra-page part of the address of the last access information, and the current intra-page offset is the intra-page part of the address of the current access information.
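A minimal sketch of this difference computation (the 9-bit offset width matches the record layout of the embodiment; splitting the address at exactly that width, and the function name, are my assumptions for illustration):

```python
OFFSET_BITS = 9  # intra-page offset width used in the embodiment's record layout

def intra_page_delta(cur_addr, last_offset):
    """Split an access address into (page, intra-page offset) and return the
    current difference = current offset minus the last offset (a sketch)."""
    cur_offset = cur_addr & ((1 << OFFSET_BITS) - 1)  # low-order intra-page part
    page = cur_addr >> OFFSET_BITS                    # page part used for the page tag
    return page, cur_offset, cur_offset - last_offset
```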
The difference mapping array 2 is a multi-entry fully-associative cache storing the mapping between the leading difference of the current address difference sequence segment and a data group of the difference sequence segment sub-table 3. The difference mapping array 2 comprises a plurality of data items (16 in this embodiment), each data item including the following fields: leading difference (10 bits), confidence (6 bits) and valid bit (1 bit). The leading differences of the merged address difference sequence segments are thus explicitly stored in a separate storage structure, decoupling the difference sequence from a direct mapping onto the difference sequence segment sub-table 3.
In each step of memory-access pattern learning, the confidences of the difference mapping array 2 are updated with the leading difference of the current difference sequence segment: when a data item in the difference mapping array 2 is queried and hit, if the hit data item has not reached the saturation value (63), its confidence is incremented by one; if it has reached the saturation value, the confidences of the other data items in the difference mapping array 2 are halved and the hit data item stays at the saturation value. If no data item is hit, the data item with the smallest confidence in the difference mapping array 2 is selected, its leading difference is replaced with the leading difference of the current difference sequence segment, and its initial confidence is set to 1.
The difference sequence segment sub-table 3 is a set-associative cache whose number of data groups matches the number of data items of the difference mapping array 2, each data group holding a plurality of data items; in this embodiment there are 16 data groups of 8 data items each, 128 data items in total. Each data item of the difference sequence segment sub-table 3 includes the following fields: difference sequence segment (30 bits), confidence (9 bits) and valid bit (1 bit).
The difference sequence segment field (30 bits) is sized to completely preserve the subsequent part of the current address difference sequence segment (i.e. all subsequent differences except the leading difference). In this embodiment the current address difference sequence segment contains 4 differences, so the segments in the difference sequence segment sub-table 3 are 3 differences long, i.e. a sub-table entry is complete when it holds 3 differences. Thus the 3 differences in the difference sequence segment sub-table 3 and the 1 difference stored in the difference mapping array 2 together form a difference sequence segment of 4 differences. The confidence of each data item of the difference sequence segment sub-table 3 follows the same update logic as the difference mapping array, with an upper limit of 511.
Both the difference mapping array 2 and the difference sequence segment sub-table 3 use a statistic derived from each data item's hit count as that item's confidence number.
The number of data sets in the difference sequence segment sub-table 3 and the number of data items in the difference mapping array 2 are fixed in hardware, so the two can be placed in direct one-to-one correspondence in hardware (i.e., each data set of the difference sequence segment sub-table 3 corresponds to the data item of the difference mapping array 2 with the same number). No explicitly stored numbering is therefore needed; this part is "static". The leader differences stored in the difference mapping array 2, by contrast, are variable; this part is "dynamic". Hence, to store the mapping relation between the difference sequence segment sub-table 3 and the difference mapping array 2, only one of the two sides of the relation needs to change dynamically (in this embodiment, the leader difference in the difference mapping array 2), not both. Because the set numbers of the difference sequence segment sub-table 3 correspond one-to-one with the data item numbers of the difference mapping array 2, once a query hits a data item of the difference mapping array 2, the set number of the difference sequence segment sub-table 3 corresponding to that data item's number is the set number of the subsequent sequence segment in the difference sequence segment sub-table 3. The obtained set number is used in both the memory access pattern learning (step S1) and memory access pattern prediction (step S2) described in detail below.
The valid bits of the difference mapping array 2 and the difference sequence segment sub-table 3 indicate at the hardware level whether a record is valid. An invalid record participates in no matching and is excluded from the matching structure.
Therefore, the dynamic mapping mode table uses the leader difference of the merging address difference sequence segment as its index (i.e., the metadata feature), maps it to a dynamically allocated storage position, and stores this mapping explicitly in the difference mapping array. The stored information feature is thereby decoupled from the index of the dynamic mapping mode table, and the relation between information feature and table position is kept explicitly in an independent structure. By decoupling the metadata features from the difference sequence segment sub-table in this way, the invention keeps the sequence segments accessed at high frequency resident in the metadata table as much as possible. The prefetcher's overhead is thus greatly reduced while its performance can still exceed that of other, higher-overhead data prefetchers.
Step S1: executing the program memory access pattern learning method: the data cache to be prefetched receives the sequence of access requests from each core of the CPU, and the current access information in this sequence is collected through a bypass; the access information comprises a memory access address and the corresponding program counter (PC) value (i.e., a PC value-address pair). Each time current access information is collected, an attempt is made to obtain information from the history information table 1 to update and obtain the current difference sequence segment. When a current difference sequence segment is obtained, the history information table 1, the difference mapping array 2 and the difference sequence segment sub-table 3 are updated according to it, and the leader difference is removed to obtain the difference sequence to be predicted. The resulting difference sequence to be predicted thus contains 3 consecutive differences.
The prefetching method of the present invention accesses information in the history information table both during memory access pattern learning (step S1) and during the memory access pattern prediction (step S2) described in detail below, and these two accesses can be merged into one: after the history information table has been updated with the current PC and memory address in the learning stage, the leader difference of the resulting current difference sequence segment (composed of 4 differences) is removed directly, and the remaining 3 consecutive differences (including the current difference) are used for the prediction stage.
There is no additional restriction on the PC, but the access information collected and stored in the same history information table 1 should belong to the same program. Note that the PC and the memory address are generally not in one-to-one correspondence: one PC typically corresponds to multiple memory addresses, so the same PC naturally maps to a sequence of memory addresses.
In the step S1, an attempt is made to obtain, from the history information table 1, a merging address difference sequence segment and a last intra-page offset corresponding to the same PC and page address according to the current access information, and when the merging address difference sequence segment and the last intra-page offset are successfully obtained, a current difference sequence segment is updated according to the merging address difference sequence segment, the last intra-page offset and the current access information.
Fig. 2 shows a complete program access pattern learning flow. As shown in fig. 2, in the step S1, the current difference sequence segment is updated by attempting to obtain information from the history information table 1 according to the current access information, and when the current difference sequence segment is obtained, the history information table 1, the difference mapping array 2 and the difference sequence segment sub-table 3 are updated according to the current difference sequence segment, which specifically includes:
step S11: index the history information table 1 with the low-order bits of the PC in the current access information, then hash the high-order bits of the PC to obtain a PC tag and match it against the indexed history information table record. If the match succeeds, obtain the corresponding page tag, last intra-page offset and merging address difference sequence segment; then match the obtained page tag against the hash of the memory address in the current access information. If this match also succeeds, proceed to step S12; otherwise end the current flow. This ensures that the current access information and the matched history information table record lie in the same physical page.
The tag is used to confirm that a stored history information table record matches a given PC. The current access information (PC-address pair) thus generates a PC tag from the high-order part of its PC and indexes the record in the history information table with the low-order part of its PC. Note that the addresses stored in the same data item of the history information table 1 all correspond to the same PC and should lie in the same physical page.
In step S11, when a history information table record is indexed, the current access information matches that record's PC tag, and the record's valid bit is 1, the subsequent data extraction can be performed to obtain the corresponding page tag, last intra-page offset and merging address difference sequence segment. If the PC tags of the current access information and the indexed record differ: when the record's valid bit is 1, the valid bit is set to 0; when the valid bit is already 0, the record's PC tag is replaced and all data fields corresponding to that PC in the history information table are cleared.
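A minimal sketch of this valid-bit replacement policy, using a single record with hypothetical field names ('valid', 'tag', 'data') that are not specified in the patent:

```python
def probe(entry, pc_tag):
    """Step S11 tag check: hit only when valid and tags match; a first
    mismatch only invalidates the record, a second mismatch takes it over."""
    if entry['valid'] == 1 and entry['tag'] == pc_tag:
        return True                # hit: caller extracts page tag, offset, segment
    if entry['tag'] != pc_tag:
        if entry['valid'] == 1:
            entry['valid'] = 0     # first mismatch: mark stale only
        else:
            entry['tag'] = pc_tag  # second mismatch: replace the PC tag
            entry['data'] = None   # and clear all data fields for that PC
            entry['valid'] = 1
    return False
```

This gives a resident PC one chance to survive a conflicting access before its record is reclaimed.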
Step S12: after successful matching, subtracting the last page offset from the current page offset to obtain a current difference value and updating the historical information table 1 by using the current difference value; and then confirming whether the number of past differences in the merging address difference sequence segments stored in the historical information table 1 is complete (namely, whether the number is 3), and if so, forming a current difference sequence segment consisting of 4 differences together with the merging address difference sequence segments, thereby being used as a difference sequence segment to be learned.
It should be noted that, in step S12, if the number of differences in the merged address difference sequence segment stored in the history information table 1 is less than 3, the process is ended to avoid updating the dynamic mapping pattern table. Thus, the history information table 1 needs to be updated when each access information arrives; the dynamic mapping mode table is updated only when the length of the current difference sequence segment meets the requirement (length is 4), and is not updated when the number of differences in the merging address difference sequence segment is less than 3.
The calculation formula of the intra-page offset PageOffset is:
PageOffset=(Address&((1<<12)-1))>>3,
wherein Address is the memory access address, << and >> are the bitwise left-shift and right-shift operations, & is bitwise AND, - is subtraction, and 1, 12 and 3 are the bit counts used in the bit operations.
When the page tag does not span the page, the current calculation formula of the difference Delta is:
Delta=CurrentPageOffset-LastPageOffset,
where LastPageOffset is the last intra-page offset and CurrentPageOffset is the current intra-page offset.
In addition, the page tag is used to detect page crossings, i.e., the addresses corresponding to the same PC may come from different physical pages. If a page crossing occurs, the calculation of the current difference must be adjusted. The purpose of recording the page tag is that, within the same page, the normal difference calculation above applies; across pages, a page-adjacent difference calculation is assumed by default. For example: the last access was at offset 59 of page A (with 64 offsets per page) and the current access is at offset 1 of page B. Assuming page B is the page immediately following A, the offset difference must be corrected to 1 - 59 + 64 = 6, rather than the direct subtraction result of -58.
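The offset and difference computations above can be sketched directly from the formulas; the 64-offsets-per-page constant follows the worked example, and this is an illustration rather than the patented hardware:

```python
def page_offset(address):
    # PageOffset = (Address & ((1 << 12) - 1)) >> 3: keep the 4 KiB
    # page-internal part of the address, then drop the low 3 bits
    return (address & ((1 << 12) - 1)) >> 3

def delta(cur_off, last_off, crossed_page=False, max_offset=64):
    if not crossed_page:
        return cur_off - last_off            # same page: direct subtraction
    return cur_off - last_off + max_offset   # assume the next adjacent page
```

Applying it to the example in the text, an access at offset 59 of one page followed by offset 1 of the next page yields a corrected difference of 6.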
The history information table 1 is updated with the current difference as follows: when the merging address difference sequence segment holds fewer than 3 differences, the stored segment becomes the combination of the existing differences and the current difference; when it already holds 3 differences, the first difference (i.e., the oldest difference) is removed from the stored merging address difference sequence segment before the current difference is appended.
The formula for updating (or forming) the merging address difference sequence segment is:
S_n = ((S_{n-1} << 10) | D) & ((1 << 10k) - 1),
where S denotes a difference sequence, S_n is the current merging address difference sequence segment, S_{n-1} is the previous merging address difference sequence segment, k is the fixed number of differences contained in the merging address difference sequence segment, and D is the current difference.
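For illustration, the segment-update formula can be exercised directly in code. The 10-bit difference width and k = 3 follow this embodiment; this is a sketch, not the hardware implementation:

```python
def push_delta(segment, d, k=3, bits=10):
    # S_n = ((S_{n-1} << bits) | D) & ((1 << (bits * k)) - 1):
    # shift in the newest difference, keep only the newest k differences
    mask = (1 << (bits * k)) - 1
    return ((segment << bits) | (d & ((1 << bits) - 1))) & mask
```

The mask discards the oldest difference once k differences are already packed, which matches the remove-oldest/append-newest update described for the history information table.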
Therefore, if step S11 matches no record in the history information table, or step S12 cannot form a current difference sequence segment of 4 differences to be learned (i.e., no current difference sequence segment is obtained), the flow ends: control returns directly to step S1 without performing any subsequent steps (step S13, step S14, step S2, step S3, step S4, etc.).
Step S13: inquiring and hitting the data items in the difference mapping array 2 by utilizing the leader difference value (namely the oldest difference value in a plurality of difference values) in the current difference value sequence section, and counting the hit times of each data item in the difference mapping array 2 to update each data item and the confidence coefficient thereof;
step S14: update the corresponding data set in the difference sequence segment sub-table 3 with the subsequent part of the current difference sequence segment, according to the hit result of step S13.
In step S14, if the query hit (i.e., it hit some data item of the difference mapping array 2), the number of the hit data item is obtained, each data item of the corresponding data set in the difference sequence segment sub-table 3 is queried with the subsequent part of the current difference sequence segment, and the hits are counted to update the data items and their confidence numbers, with the same update policy as the difference mapping array. Otherwise, on a miss, all data items in the data set of the difference sequence segment sub-table 3 corresponding to the number of the smallest-confidence data item of the difference mapping array 2 are cleared, the subsequent part of the current difference sequence segment is stored in the vacated position of the difference sequence segment sub-table 3, and its initial confidence number is set to 1. This completes one learning pass.
In the step S13, the specific way of counting the hit number of each data item to update each data item and its confidence number is as follows:
when the query hits a certain data item in the difference mapping array 2: if the hit data item has not reached the saturation value (here 63), its confidence number is increased by 1; if it has reached the saturation value, the confidence numbers of the other data items in the difference mapping array 2 are halved and the hit data item keeps the saturation value. Otherwise, on a miss (i.e., no data item is hit), the leader difference of the smallest-confidence data item in the difference mapping array 2 is replaced by the leader difference of the current difference sequence segment, and that data item's initial confidence number is set to 1.
In the step S14, the specific way to count the hit number of each data item to update each data item and its confidence number is as follows:
when the query hits a certain data item in the difference sequence segment sub-table 3: if the hit data item has not reached the saturation value (here 511), its confidence number is increased by 1; if it has reached the saturation value, the confidence numbers of the other data items in the difference sequence segment sub-table 3 are halved and the hit data item keeps the saturation value. On a miss (i.e., no data item is hit), the difference sequence segment of the smallest-confidence data item in the difference sequence segment sub-table 3 is replaced by the subsequent part of the current difference sequence segment, and that data item's initial confidence number is set to 1.
In summary, step S1 collects, for each PC, the last intra-page offset (i.e., from the address of the previous access) and combines it with the current intra-page offset (i.e., from the address of the current access) to compute and store the differences corresponding to that PC. The resulting consecutive differences form a fixed-length merging address difference sequence segment which, together with the corresponding current difference sequence segment, is continuously updated at run time through steps S11-S14 (i.e., by removing the oldest leader difference and appending the newest difference), with the history information table 1, the difference mapping array 2 and the difference sequence segment sub-table 3 updated accordingly.
Step S2: executing the program memory access pattern prediction method: use the difference sequence to be predicted to perform multiple matching against the prefix subsequences of the complete sequence segments stored in the dynamic mapping mode table, thereby obtaining the best-matching complete sequence segment and the corresponding prediction target difference.
As described above, if the current difference sequence segment is not generated in step S1, the flow is ended, and steps S2 to S4 are not performed.
The dynamic mapping mode table consists of the difference mapping array 2 and the difference sequence segment sub-table 3; the leader difference in the difference mapping array 2 and the difference sequence segment in the difference sequence segment sub-table 3 together form a complete sequence segment of the dynamic mapping mode table. A complete sequence segment stored in the dynamic mapping mode table splits into a prefix sub-sequence and a target difference, where the prefix sub-sequence is the sub-sequence formed by the first several differences of the complete sequence segment and the target difference is its last difference. Since the difference mapping array 2 holds only the leader difference, and in this embodiment a difference sequence segment in sub-table 3 is complete at 3 differences, a complete sequence segment stored in the dynamic mapping mode table contains 4 differences: the prefix sub-sequence consists of its first 3 differences and the target difference is its last difference. Accordingly, one part of the prefix sub-sequence (the first difference) is stored in the difference mapping array 2 and the rest (the two following differences) is stored in the difference sequence segment sub-table 3 as part of its difference sequence segment; the target difference is likewise stored in the difference sequence segment sub-table 3.
In step S2, multiple matching means first looking up and hitting a data item of the difference mapping array 2 in the dynamic mapping mode table, and then matching against each data item of the corresponding data set in the difference sequence segment sub-table 3 with matching tags of multiple lengths. Multiple matching comprises two stages: 1) extract the first difference of the difference sequence to be predicted and look up its mapping in the difference mapping array 2; 2) according to the found mapping, match the subsequent part of the difference sequence to be predicted against the corresponding data items in the difference sequence segment sub-table 3. A match succeeds only when both stages match.
Therefore, the invention utilizes a multiple matching mechanism, so that the matching of multiple length sequences can be realized by adopting a single table, the matching probability under different program loads is increased, the coverage rate is effectively improved, the cache miss rate is reduced, and complex update management logic of multi-table cascading is avoided. In the updating process, the sequence segments with different lengths do not need to be updated one by one, and only the merging address difference value sequence segments with fixed lengths need to be updated.
In step S2, when multiple matching is performed, an adaptive voting strategy is applied to the confidence numbers of the matched prefix subsequences and their corresponding matching lengths, so as to obtain the best-matching complete sequence segment and the corresponding prediction target difference. The adaptive voting strategy is a scoring method: the score of a candidate difference to be prefetched is the product of the confidence number of the difference sequence segment to which it belongs and the voting factor corresponding to its matching length. The final prefetch target is then determined by checking whether the proportion of each prediction target difference's score exceeds the prefetch threshold.
Step S3: performing data prefetching: add the prediction target difference to the memory access address in the current access information to obtain the target address, and send the target address to the miss status handling register (MSHR) to await the access to the target address, thereby realizing data prefetching.
In step S3, the miss status handling register is an existing component of the CPU cache that handles memory access requests caused by data misses.
Fig. 3 shows the program access mode prediction method of step S2 and the data prefetching method of step S3 according to the present invention.
As shown in fig. 3, the step S2 includes:
step S21: query and hit the data item in the difference mapping array 2 with the leader difference of the difference sequence to be predicted. The specific manner of querying and hitting the data items in the difference mapping array 2 is similar to step S13, but no data is updated or evicted.
Step S22: according to the hit result of step S21: if the hit succeeded, the number of the hit data item is obtained, the subsequent differences of the difference sequence to be predicted are used to form matching tags of length 1 difference and 2 differences respectively, and these tags are concurrently matched against each data item of the data set corresponding to that data item number in the difference sequence segment sub-table 3; the last difference of the difference sequence segment in each successfully matched data item of the difference sequence segment sub-table 3 is a candidate difference to be prefetched. Otherwise, if step S21 did not hit, the flow ends and prediction stops.
In step S22, the difference sequence segment sub-table 3 supports multi-length sequence matching by extracting prefixes of different lengths with different masks: for matching tags of length 1 difference and 2 differences, the corresponding masks are 0x3ff and 0xfffff respectively (10 and 20 bits, matching the 10-bit difference width), and the corresponding voting factors are 3 and 4.
In this embodiment, only matching tags of length 1 difference and 2 differences exist; there is no length-3 match, because the difference sequence segment in the difference sequence segment sub-table 3 has only 3 differences and its last difference serves as the prediction target. The voting factor is the weight given to matching sequences of different lengths when their scores are tallied in the vote.
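A sketch of the two-length matching of step S22, using tuples in place of packed bit fields. The assumption that a length-L tag consists of the newest L query differences, matched against the first L differences of a stored segment with the segment's last difference as the candidate, is an illustrative reading of the text, as is counting each stored segment once at its longest match:

```python
VOTE_FACTORS = {1: 3, 2: 4}  # voting factors per matching length, from the text

def match_set(query_tail, data_set):
    """query_tail: the 2 subsequent differences of the sequence to predict
    (oldest first). data_set: list of (segment, conf) pairs, where segment
    is a 3-difference tuple whose last difference is the prefetch candidate.
    Returns (candidate, conf, match_length) triples."""
    matches = []
    for seg, conf in data_set:
        for length in (2, 1):  # try the longer tag first
            if tuple(query_tail[-length:]) == seg[:length]:
                matches.append((seg[-1], conf, length))
                break          # count each stored segment once
    return matches
```

In hardware the same comparison is done on packed fields with the masks 0x3ff and the 2-difference mask, rather than on tuples.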
According to experimental statistics, under the current configuration, a multi-length matching process will generate 3.09 candidate differences to be prefetched on average, i.e. 2-4 candidate differences to be prefetched are usually generated.
Step S23: calculating the score of each candidate difference value to be prefetched, wherein the score is the sum of the products of confidence numbers and voting factors of the difference value sequence segments to which all candidate difference values to be prefetched belong; then, counting the total score of all the candidate difference values to be prefetched, and calculating the proportion of the score of each candidate difference value to be prefetched; if the proportion of the score of the pre-fetching candidate difference exceeds a pre-fetching threshold (in this embodiment, the pre-fetching threshold is set to be 50%), the complete sequence segment corresponding to the pre-fetching candidate difference is the complete sequence segment with the best match, and the pre-fetching candidate difference with the score exceeding the pre-fetching threshold is the prediction target difference. Thus, step S3 may be performed subsequently to trigger a prefetch. Otherwise, the complete sequence segment which can not find the best match meeting the condition is indicated, and the process is ended to stop the prefetching.
Step S4: performing recursive prediction: append the prediction target difference obtained in step S2 to the end of the current difference sequence and remove the oldest difference to keep the sequence length unchanged, obtaining an updated difference sequence to be predicted. Then repeat steps S2 and S3 together with this updating step, recursively performing a new round of matching, prediction and one data prefetch with the updated difference sequence, until the number of repetitions of steps S2 and S3 reaches the predetermined prediction depth, at which point the flow ends and recursive data prefetching stops.
In addition, as described above, if in step S2 no candidate's score proportion exceeds the prefetch threshold, the flow also ends, which likewise stops recursive data prefetching.
As described above, the invention attempts to update the current difference sequence segment every time current access information is collected, so as to trigger the prediction of step S2 and the prefetch of step S3 whenever a current difference sequence segment is obtained. Thus, according to step S4, once step S1 completes its first trigger (i.e., successfully obtains a current difference sequence segment), steps S2 and S3 are repeated according to the recursive prediction method of step S4 to prefetch as much data as possible. To limit the prefetcher's memory bandwidth occupation, the prediction depth in this embodiment is 8, i.e., each recursive prediction of step S4 generates at most 8 prefetches.
Fig. 4 illustrates the general flow of recursive prediction with a simplified example: the history information table is ignored, the length of the complete sequence segment of the dynamic mapping mode table matches this embodiment (difference sequence segments of 4 differences are preserved), confidence numbers are ignored, and no dynamic mapping mechanism is employed. Fig. 4 serves only to explain the general recursive prediction mechanism.
As shown in fig. 4, the dynamic mapping mode table consists of the difference mapping array 2 and the difference sequence segment sub-table 3; the leader difference in the difference mapping array 2 and the difference sequence segment in the difference sequence segment sub-table 3 together form a complete sequence segment of the dynamic mapping mode table. A complete sequence segment splits into a prefix sub-sequence (the first several differences) and a target difference (the last difference). Suppose the difference sequence to be predicted is {1,2,3}: it matches the first record in the table, yielding a prediction target difference of 4 and generating a prediction target address equal to the current address + 4. Prediction does not stop there: the predicted target difference 4 is appended to the tail of the difference sequence to be predicted and the oldest difference 1 is removed, producing a new sequence segment {2,3,4} that serves as the updated difference sequence to be predicted for the next match and prediction. This process of inferring the next prediction from the previous prediction result is called recursive prediction. The invention is built on this recursive-prediction framework, so it can predict multiple data at once whenever a current difference sequence segment is obtained.
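The recursion of Fig. 4 can be sketched as follows, with `predict` standing in for one round of steps S2-S3. The toy prediction table mapping a 3-difference window to its target follows the figure; function and parameter names are illustrative:

```python
def recursive_prefetch(window, predict, depth=8):
    """window: list of differences to predict from (oldest first).
    predict: callable returning the next difference or None. Returns up
    to `depth` predicted differences, sliding the window each round."""
    out = []
    for _ in range(depth):
        d = predict(tuple(window))
        if d is None:
            break                     # no confident match: stop the recursion
        out.append(d)                 # each predicted difference = one prefetch
        window = window[1:] + [d]     # drop the oldest, append the prediction
    return out
```

With the figure's example, a window {1,2,3} that maps to 4, and {2,3,4} that maps to 5, produces the prefetch differences [4, 5] before the recursion stops for lack of a match.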
Therefore, the invention utilizes a multiple matching mechanism, so that the matching of multiple length sequences can be realized by adopting a single table, the matching probability under different program loads is increased, the coverage rate is effectively improved, the cache miss rate is reduced, and complex update management logic of multi-table cascading is avoided. In the updating process, the sequence segments with different lengths do not need to be updated one by one, and only the merging address difference value sequence segments with fixed lengths need to be updated.
According to the invention, different voting weights are adopted for different matching lengths and irrelevant difference sequence segments are filtered out, improving prediction accuracy. The prior art (VLDP) considers only the longest matching sequence and ignores the effect of high-confidence short matching sequences.
The dynamic mapping mode table takes the leader difference of the merging address difference sequence segment as an index of the dynamic mapping mode table, forms mapping with a storage position allocated dynamically, and is stored in a difference mapping array in an explicit mode, so that the stored information characteristic and the index of the dynamic mapping mode table are decoupled, and the information characteristic and the table position relation are stored in an independent structure in an explicit mode. According to the method, through decoupling the mapping relation between the metadata characteristics and the difference sequence segment sub-table, the sequence segment with high frequency access is ensured to reside in the metadata table as much as possible, the cost of the prefetcher is greatly reduced, and the performance of the prefetcher can still exceed that of other high-cost data prefetchers.
In summary, the invention adopts the merging address difference sequence to realize multi-length sequence matching, improving prefetcher coverage while avoiding the management of multiple cascaded metadata tables and reducing the complexity of metadata management; the adaptive voting strategy addresses the difficulty of restoring the program access pattern under out-of-order CPU execution; finally, the dynamic metadata mapping mechanism avoids high hardware overhead while preserving prefetcher performance.
Experimental verification:
The experimental environment uses an Intel 10-core processor, 64 GB of memory and a 1 TB hard disk; the operating system is Ubuntu 20.04. Hardware architecture simulation was performed with the ChampSim simulator, whose configuration is shown in Table 1. SPEC CPU 2017 was used as the experimental workload.
Table 1 simulator configuration table
Experimental results show that the invention can achieve a 53.1% performance improvement when added to a CPU without a prefetcher.
The above embodiments are intended only to illustrate the technical solution of the invention, not to limit it; those skilled in the art may modify or substitute the technical solution without departing from the spirit and scope of the invention, whose protection scope is defined by the claims.

Claims (10)

1. A CPU cache data prefetching method based on merged address difference sequences, characterized by comprising the following steps:
step S0: providing a history information table (1), a difference mapping array (2), and a difference sequence segment sub-table (3), all connected to the data cache to be prefetched, wherein the difference mapping array (2) and the difference sequence segment sub-table (3) together form a dynamic mapping mode table;
step S1: obtaining the sequence of access requests from a CPU core via the data cache to be prefetched, and collecting the current access information in that sequence through a bypass, the access information comprising an access address and the corresponding PC, i.e., the program counter value; each time current access information is collected, attempting to obtain information from the history information table (1) and update it to produce a current difference sequence segment; when a current difference sequence segment is obtained, updating the history information table (1), the difference mapping array (2), and the difference sequence segment sub-table (3) according to the current difference sequence segment, and removing the first difference to obtain the difference sequence to be predicted;
step S2: using the difference sequence to be predicted to multiply-match the prefix subsequences of the complete sequence segments stored in the dynamic mapping mode table, wherein multiple matching means first looking up a hit data item in the difference mapping array (2) of the dynamic mapping mode table, then matching against each data item of the corresponding data set in the difference sequence segment sub-table (3) with matching tags of several lengths, thereby obtaining the best-matching complete sequence segment and the corresponding predicted target difference;
step S3: adding the predicted target difference to the access address in the current access information to obtain a predicted target address, and sending the predicted target address to a miss status handling register to await access, thereby realizing data prefetching.
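As a minimal illustration of the sequence handling in step S1 (not part of the claims), the following sketch derives a difference sequence segment from successive intra-page cache-line offsets and drops the first difference to form the sequence to be predicted; both function names are invented for the example.

```python
def to_delta_segment(offsets):
    """Successive intra-page offsets -> merged address difference segment."""
    return [b - a for a, b in zip(offsets, offsets[1:])]

def sequence_to_predict(delta_segment):
    """Drop the first (oldest) difference (claim 1, step S1)."""
    return delta_segment[1:]

# Accesses at line offsets 3, 5, 6, 9, 11 within one page:
seg = to_delta_segment([3, 5, 6, 9, 11])   # [2, 1, 3, 2], a 4-difference segment
query = sequence_to_predict(seg)           # [1, 3, 2]
```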
2. The CPU cache data prefetching method based on the merged address difference sequence according to claim 1, wherein the history information table (1) stores a plurality of records, each comprising a PC tag, a page tag, a last intra-page offset, a merged address difference sequence segment, and a valid bit; the history information table (1) is indexed by the low-order PC bits, and the high-order PC bits are stored as the PC tag;
the difference mapping array (2) is a fully associative cache comprising a plurality of data items, each containing a leading difference, a confidence count, and a valid bit;
the difference sequence segment sub-table (3) is a set-associative structure whose number of data sets equals the number of data items in the difference mapping array (2); each data set holds a plurality of data items, each comprising a difference sequence segment, a confidence count, and a valid bit;
the difference mapping array (2) and the difference sequence segment sub-table (3) both use a processed count of each data item's hits as its confidence; the data sets of the difference sequence segment sub-table (3) correspond one-to-one with the numbered data items of the difference mapping array (2).
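The three metadata structures of claim 2 can be modeled as plain records. A hedged sketch follows; the field names and Python types are assumptions for illustration, and bit widths and encodings are omitted entirely.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HistoryEntry:
    """One record of the history information table (1)."""
    pc_tag: int = 0
    page_tag: int = 0
    last_offset: int = 0                                     # last intra-page offset
    delta_segment: List[int] = field(default_factory=list)   # merged address differences
    valid: bool = False

@dataclass
class MapEntry:
    """One data item of the difference mapping array (2)."""
    leading_delta: int = 0
    confidence: int = 0
    valid: bool = False

@dataclass
class SegmentEntry:
    """One data item of a data set in the difference sequence segment sub-table (3)."""
    deltas: List[int] = field(default_factory=list)          # 3-difference segment tail
    confidence: int = 0
    valid: bool = False
```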
3. The method according to claim 2, wherein in step S1 the merged address difference sequence segment and the last intra-page offset corresponding to the same PC and page address are obtained from the history information table (1) according to the current access information, and, when both are obtained successfully, the current difference sequence segment is derived from the merged address difference sequence segment, the last intra-page offset, and the current access information.
4. The method for prefetching CPU cache data based on the merged address difference sequence according to claim 3, wherein in step S1, attempting to obtain information from the history information table (1) according to the current access information to update the current difference sequence segment, and, when a current difference sequence segment is obtained, updating the history information table (1), the difference mapping array (2), and the difference sequence segment sub-table (3) according to it, comprises:
step S11: indexing the history information table (1) with the low-order PC bits of the current access information, hashing the high-order PC bits to obtain a PC tag, and matching it against the indexed history information table record; if the match succeeds, obtaining the corresponding page tag, last intra-page offset, and merged address difference sequence segment; matching the obtained page tag against the hash of the access address in the current access information; if this also succeeds, proceeding to step S12, otherwise ending the current flow;
step S12: subtracting the last intra-page offset from the current intra-page offset to obtain the current difference, updating the history information table (1) with it, and checking whether the merged address difference sequence segment stored in the history information table (1) contains the full number of past differences; if so, the merged address difference sequence segment and the current difference together form a current difference sequence segment of 4 differences, which serves as the difference sequence segment to be learned;
step S13: querying the difference mapping array (2) for a hit with the leading difference of the current difference sequence segment, and counting the hits of each data item in the difference mapping array (2) to update each data item and its confidence;
step S14: updating the corresponding data set in the difference sequence segment sub-table (3) with the tail of the current difference sequence segment, according to the hit result of step S13;
in step S14, if the query hits, the number of the hit data item is obtained, the tail of the current difference sequence segment is used to query the data items of the corresponding data set in the difference sequence segment sub-table (3), and the hits of each data item are counted to update each data item and its confidence, using the same update policy as the difference mapping array; if the query misses, all data items are cleared from the data set of the difference sequence segment sub-table (3) corresponding to the data item with the smallest confidence in the difference mapping array (2), the tail of the current difference sequence segment is stored in the vacated position of the difference sequence segment sub-table (3), and its initial confidence is set to 1.
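A hedged sketch of the step S14 update policy, not part of the claims: `data_sets` models the difference sequence segment sub-table (3) as a list of data sets, each a list of `[deltas, confidence]` pairs. These names are invented for the example, and the hit-side confidence update is simplified (the full saturating policy of claim 6 is omitted here).

```python
def update_subtable(data_sets, slot, tail):
    """On a mapping-array hit: reinforce a matching segment tail in the
    hit slot's data set, or record it with an initial confidence of 1."""
    group = data_sets[slot]
    for item in group:
        if item[0] == list(tail):
            item[1] += 1              # existing segment: count another hit
            return
    group.append([list(tail), 1])     # new segment starts with confidence 1

def handle_map_miss(data_sets, victim_slot, tail):
    """On a mapping-array miss: the victim's whole data set is cleared
    and the new segment tail is stored with confidence 1."""
    data_sets[victim_slot] = [[list(tail), 1]]
```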
5. The method for prefetching CPU cache data based on the merged address difference sequence according to claim 4, wherein in step S11, the subsequent data extraction, yielding the corresponding page tag, last intra-page offset, and merged address difference sequence segment, is performed only when at least one history information table record can be indexed, the PC tag of the current access information matches that of the indexed record, and the record's valid bit is 1; if the PC tags differ, then, when the record's valid bit is 1, the valid bit is set to 0, and when the valid bit is already 0, the record's PC tag is replaced and its other data fields are cleared;
in step S12, if the merged address difference sequence segment stored in the history information table (1) contains fewer than 3 differences, the flow ends.
6. The method for prefetching CPU cache data based on the merged address difference sequence according to claim 4, wherein in step S13 the hits of each data item are counted to update each data item and its confidence as follows: when the query hits a data item in the difference mapping array (2), the hit item's confidence is incremented by 1 if it has not reached the saturation value; if it has reached the saturation value, the confidences of all other data items in the difference mapping array (2) are halved and the hit item keeps the saturation value; when the query misses, the data item with the smallest confidence in the difference mapping array (2) is replaced by the leading difference of the current difference sequence segment, with its initial confidence set to 1.
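A minimal sketch of the claim 6 confidence policy, not part of the claims. `SATURATION` is an assumed counter ceiling (the patent does not fix a value); `entries` is a plain list of confidence counts, one per mapping-array data item.

```python
SATURATION = 15   # assumed 4-bit counter ceiling (illustrative)

def update_confidence(entries, hit_index):
    """Increment the hit counter until saturation; once saturated,
    halve every other counter instead, so stale entries decay while
    the hot entry keeps its saturated value."""
    if entries[hit_index] < SATURATION:
        entries[hit_index] += 1
    else:
        for i in range(len(entries)):
            if i != hit_index:
                entries[i] //= 2
```

The halving step is what lets a once-hot but now stale difference lose ground to the currently dominant one without ever resetting the table.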
7. The method according to claim 2, wherein in step S2, during multiple matching, an adaptive voting strategy derives the best-matching complete sequence segment and the corresponding predicted target difference from the confidences of the matched prefix subsequences and their matching lengths.
8. The method for prefetching CPU cache data based on the merged address difference sequence according to claim 7, wherein said step S2 comprises:
step S21: querying the difference mapping array (2) for a hit with the leading difference of the difference sequence to be predicted;
step S22: according to the hit result of step S21, if the query hits, the number of the hit data item is obtained, and the subsequent differences of the difference sequence to be predicted form matching tags of different lengths, which are matched concurrently against the data items of the corresponding data set in the difference sequence segment sub-table (3); the last difference of each successfully matched data item's difference sequence segment becomes a candidate difference to be prefetched; if the query misses, the flow ends;
step S23: computing the score of each candidate difference to be prefetched, the score being the sum, over all difference sequence segments containing that candidate, of the product of the segment's confidence and its voting factor; then totaling the scores of all candidates and computing each candidate's share of the total; if a candidate's share exceeds the prefetch threshold, its complete sequence segment is the best-matching complete sequence segment and that candidate is the predicted target difference; otherwise the flow ends.
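A hedged sketch of the adaptive voting of step S23, not part of the claims. The voting factors and the prefetch threshold are assumed values: the patent only states that longer matches carry larger weights and that a candidate must exceed a threshold share of the total score.

```python
VOTE_FACTOR = {1: 1, 2: 3}   # weight per matching-tag length (assumed)
PREFETCH_THRESHOLD = 0.5     # minimum share of the total score (assumed)

def vote(candidates):
    """candidates: (candidate_delta, segment_confidence, match_length)
    triples. Returns the predicted target difference, or None when no
    candidate's share of the total score exceeds the threshold."""
    scores = {}
    for delta, conf, mlen in candidates:
        scores[delta] = scores.get(delta, 0) + conf * VOTE_FACTOR[mlen]
    total = sum(scores.values())
    if total == 0:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] / total > PREFETCH_THRESHOLD else None
```

For example, a candidate backed by a confidence-5 two-difference match outscores competing one-difference matches, while two evenly scored candidates yield no prediction, which is how the scheme filters out noise instead of always trusting the longest match.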
9. The CPU cache data prefetching method based on the merged address difference sequence according to claim 8, wherein the merged address difference sequence segment when complete, the difference sequence segments stored in the difference sequence segment sub-table (3), and the prefix subsequences all have a length of 3 differences, and the matching tags have lengths of 1 difference and 2 differences.
10. The CPU cache data prefetching method based on the merged address difference sequence according to claim 2, further comprising step S4: appending the predicted target difference obtained in step S2 to the end of the current difference sequence and removing the oldest difference to keep the sequence length constant, thereby obtaining an updated difference sequence to be predicted; steps S2 and S3 and this updating step are then repeated until the number of repetitions of steps S2 and S3 reaches a predetermined prediction depth, whereupon the flow ends.
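The look-ahead loop of claim 10 can be sketched as follows, not as part of the claims: each predicted difference is slid into the sequence and prediction repeats up to a fixed depth. The `predict` callable stands in for step S2 (returning a target difference or None); all names are invented for the example.

```python
def multi_depth_prefetch(seq, predict, base_addr, depth=4):
    """Generate up to `depth` look-ahead prefetch addresses from a
    difference sequence, re-predicting after each step (claim 10)."""
    seq = list(seq)
    addr, targets = base_addr, []
    for _ in range(depth):
        d = predict(seq)
        if d is None:
            break                 # no confident prediction: stop early
        addr += d
        targets.append(addr)      # issue this look-ahead prefetch address
        seq = seq[1:] + [d]       # drop the oldest difference, append the new one
    return targets
```

With a constant-stride predictor the loop simply walks ahead by the stride; with the voting predictor of claim 8 it stops as soon as no candidate clears the threshold.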
CN202110962555.4A 2021-08-20 2021-08-20 CPU cache data prefetching method based on merging address difference value sequence Active CN113656332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110962555.4A CN113656332B (en) 2021-08-20 2021-08-20 CPU cache data prefetching method based on merging address difference value sequence

Publications (2)

Publication Number Publication Date
CN113656332A CN113656332A (en) 2021-11-16
CN113656332B true CN113656332B (en) 2023-05-26

Family

ID=78480607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110962555.4A Active CN113656332B (en) 2021-08-20 2021-08-20 CPU cache data prefetching method based on merging address difference value sequence

Country Status (1)

Country Link
CN (1) CN113656332B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114116528B (en) * 2021-11-22 2022-11-11 深圳大学 Memory access address prediction method and device, storage medium and electronic equipment
CN114358179A (en) * 2021-12-31 2022-04-15 海光信息技术股份有限公司 Pre-fetch training method of processor, processing device, processor and computing equipment
CN114816734B (en) * 2022-03-28 2024-05-10 西安电子科技大学 Cache bypass system based on memory access characteristics and data storage method thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214146A (en) * 2011-07-28 2011-10-12 中国人民解放军国防科学技术大学 Step size adaptive Cache pre-fetching method and system
CN102521158A (en) * 2011-12-13 2012-06-27 北京北大众志微系统科技有限责任公司 Method and device for realizing data pre-fetching
CN103425600A (en) * 2013-08-23 2013-12-04 中国人民解放军国防科学技术大学 Address mapping method for flash translation layer of solid state drive
CN111143242A (en) * 2018-11-02 2020-05-12 华为技术有限公司 Cache prefetching method and device
CN111475535A (en) * 2020-03-09 2020-07-31 咪咕文化科技有限公司 Data storage and access method and device
CN112380148A (en) * 2020-11-30 2021-02-19 海光信息技术股份有限公司 Data transmission method and data transmission device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10037280B2 (en) * 2015-05-29 2018-07-31 Qualcomm Incorporated Speculative pre-fetch of translations for a memory management unit (MMU)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cache Replacement Policy Based on Memory-Access Pattern Prediction of Instruction Streams; Wang Yuqing et al.; Journal of Computer Research and Development; Vol. 59, No. 1; pp. 31-46 *

Similar Documents

Publication Publication Date Title
CN113656332B (en) CPU cache data prefetching method based on merging address difference value sequence
EP0157175B1 (en) Prefetching mechanism for a high speed buffer store
CN107515901B (en) Chain log storage structure and hash index structure thereof, data operation method, server and medium
US9582282B2 (en) Prefetching using a prefetch lookup table identifying previously accessed cache lines
CN106021128B (en) A kind of data pre-fetching device and its forecasting method based on stride and data dependence
US6427184B1 (en) Disk drive with prefetch and writeback algorithm for sequential and nearly sequential input/output streams
CN103294822B (en) A kind of based on active Hash with the high-efficiency caching method of Bloom filter
CN105279240A (en) Client origin information associative perception based metadata pre-acquisition method and system
US7499927B2 (en) Techniques for improving memory access patterns in tree-based data index structures
CN113986774A (en) Cache replacement system and method based on instruction stream and memory access mode learning
CN117093881B (en) Data compression method and device, electronic equipment and storage medium
CN112131218A (en) Hash table look-up method, device and equipment for gene comparison and storage medium
JP3149856B2 (en) Magnetic disk controller
CN114579479A (en) Low-pollution cache prefetching system and method based on instruction flow mixed mode learning
WO2023035654A1 (en) Offset prefetching method, apparatus for executing offset prefetching, computer device, and medium
Askitis et al. Cache-conscious collision resolution in string hash tables
US6990551B2 (en) System and method for employing a process identifier to minimize aliasing in a linear-addressed cache
US7254681B2 (en) Cache victim sector tag buffer
CN116383100A (en) Cache data prefetching method and system based on merging bit vector access mode
CN116954718A (en) Data prefetching method, device, electronic equipment and readable medium
US7917694B1 (en) Method and system for finding maximal stripes in cache memory with content addressable memory
CN115495394A (en) Data prefetching method and data prefetching device
CN115982162A (en) Message forwarding table query method and electronic equipment
US10749545B1 (en) Compressing tags in software and hardware semi-sorted caches
US7000082B2 (en) Cache victim sector tag buffer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant