CN113656332A - CPU cache data prefetching method based on merged address difference sequence - Google Patents

CPU cache data prefetching method based on merged address difference sequence

Info

Publication number
CN113656332A
Authority
CN
China
Prior art keywords
difference
sequence segment
current
data
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110962555.4A
Other languages
Chinese (zh)
Other versions
CN113656332B (en)
Inventor
蒋实知
慈轶为
杨秋松
李明树
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Advanced Research Institute of CAS
Original Assignee
Shanghai Advanced Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Advanced Research Institute of CAS filed Critical Shanghai Advanced Research Institute of CAS
Priority to CN202110962555.4A priority Critical patent/CN113656332B/en
Publication of CN113656332A publication Critical patent/CN113656332A/en
Application granted granted Critical
Publication of CN113656332B publication Critical patent/CN113656332B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
    • G06F 12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0811 Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • G06F 12/0842 Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G06F 12/0877 Cache access modes
    • G06F 12/0882 Page mode
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10 Providing a specific technical effect
    • G06F 2212/1016 Performance improvement
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a CPU cache data prefetching method based on a merged address difference sequence, which comprises the following steps: collecting the current access information of a data cache to be prefetched and, each time it is collected, attempting to acquire information from a history information table to update and obtain a current difference sequence segment; updating the history information table, a difference mapping array and a difference sequence segment sub-table according to the current difference sequence segment, and removing its first difference to obtain a difference sequence to be predicted; matching the difference sequence to be predicted, by multiple matching, against the prefix subsequences of the complete sequence segments stored in the dynamic mapping mode table to obtain the best-matching complete sequence segment and the corresponding predicted target difference; and adding the predicted target difference to the access address in the current access information to obtain the predicted target address. Where the prior art requires a cascade of multiple tables to store difference sequences of multiple lengths, the invention stores them in a single table, simplifying the storage and query logic for memory access patterns.

Description

CPU cache data prefetching method based on merged address difference sequence
Technical Field
The invention belongs to the technical field of CPU chip hardware architecture, and particularly relates to a CPU cache data prefetching method based on a merged address difference sequence.
Background
High access latency is one of the major bottlenecks that hinder CPU performance improvement. A cache data prefetcher predicts the data addresses that CPU computation will require and loads the data from memory into the cache in advance, thereby reducing the average access latency and improving overall CPU performance. Coverage, accuracy and timeliness are the three main metrics of prefetcher performance.
Currently, mainstream prefetching methods fall into two main categories: spatial prefetching and temporal prefetching. Temporal prefetching is not discussed here because it incurs too much hardware overhead and requires additional operating-system support. Spatial prefetching exploits the correlation between address sequences within a small address region to predict the memory addresses likely to be accessed in that region in the future. Prefetcher designs in the spatial category include VLDP, SPP, SMS, etc. Among them, VLDP is an advanced multi-table, multi-length difference sequence matching prefetching method (see [Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian, Chris Wilkerson, Seth H. Pugsley, and Zeshan Chishti. 2015. Efficiently prefetching complex address patterns. In Proceedings of the 48th International Symposium on Microarchitecture, MICRO 2015, Waikiki, HI, USA, December 5-9, 2015, Milos Prvulovic (Ed.). ACM, 141-152. https://doi.org/10.1145/2830772.2830793]).
Disclosure of Invention
The invention aims to provide a CPU cache data prefetching method based on a merged address difference sequence, which addresses the prior-art need for a cascade of multiple tables to store difference sequences of multiple lengths: only a single table is used for storage, simplifying the storage and query logic for memory access patterns.
In order to achieve the above object, the present invention provides a CPU cache data prefetching method based on a merged address difference sequence, comprising:
Step S0: providing a history information table, a difference mapping array and a difference sequence segment sub-table connected to the data cache to be prefetched, wherein the difference mapping array and the difference sequence segment sub-table form a dynamic mapping mode table;
Step S1: acquiring the sequence of memory access requests from a CPU core with the data cache to be prefetched, and collecting the current access information in the sequence through a bypass, wherein the access information comprises a memory access address and the corresponding PC, the PC being the program counter value; each time the current access information is collected, attempting to acquire information from the history information table to update and obtain a current difference sequence segment; when the current difference sequence segment is obtained, updating the history information table, the difference mapping array and the difference sequence segment sub-table according to the current difference sequence segment, and removing its first difference to obtain a difference sequence to be predicted;
Step S2: matching the difference sequence to be predicted, by multiple matching, against the prefix subsequences of the complete sequence segments stored in the dynamic mapping mode table, wherein multiple matching means first querying and hitting a data item of the difference mapping array in the dynamic mapping mode table, and then matching tags of several lengths against all data items of the corresponding data group in the difference sequence segment sub-table, so as to obtain the best-matching complete sequence segment and the corresponding predicted target difference;
Step S3: adding the predicted target difference to the memory access address in the current access information to obtain a predicted target address, and sending the predicted target address to the miss status handling registers to await the memory access, thereby realizing data prefetching.
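The flow of steps S1-S3 can be condensed into a short sketch. The following Python illustration is a minimal model only: the structure names (`history`, `learn`, `predict`), the dictionary representation and `PAGE_BITS = 12` (4 KiB pages) are assumptions for illustration; the 4-difference segment length follows the embodiment described below.

```python
# Minimal sketch of the S1-S3 flow. All names are illustrative; the
# 4-difference segment length follows the embodiment and PAGE_BITS = 12
# (4 KiB pages) is an assumption.
PAGE_BITS = 12

def page_offset(addr):
    """Low-order, intra-page part of the address."""
    return addr & ((1 << PAGE_BITS) - 1)

def on_access(state, pc, addr):
    """Handle one access: update history (S1), predict (S2), emit target (S3)."""
    rec = state['history'].get(pc)
    prefetch = None
    if rec is not None and rec['page'] == addr >> PAGE_BITS:
        delta = page_offset(addr) - rec['last_offset']        # current difference
        rec['segment'] = (rec['segment'] + [delta])[-4:]      # merged segment, oldest dropped
        if len(rec['segment']) == 4:
            state['learn'](rec['segment'])                    # update dynamic mapping mode table
            target_delta = state['predict'](rec['segment'][1:])  # drop first difference, then match
            if target_delta is not None:
                prefetch = addr + target_delta                # predicted target address
        rec['last_offset'] = page_offset(addr)
    else:
        state['history'][pc] = {'page': addr >> PAGE_BITS,
                                'last_offset': page_offset(addr), 'segment': []}
    return prefetch
```

With a trivial stand-in predictor that repeats the last difference, a constant-stride access stream yields a prefetch once the 4-difference segment is full.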
The history information table is used for storing a plurality of history information table records, each comprising a PC tag, a page tag, the last intra-page offset, a merged address difference sequence segment and a valid bit; the history information table is indexed by the low-order bits of the PC, and the high-order part of the PC is saved as the PC tag.
The difference mapping array is composed of a multi-entry fully-associative cache; it comprises a plurality of data items, each comprising a first difference, a confidence number and a valid bit.
The difference sequence segment sub-table is composed of a multi-way set-associative cache; the number of its data groups matches the number of data items in the difference mapping array, each data group holds a plurality of data items, and each data item comprises a difference sequence segment, a confidence number and a valid bit.
Both the difference mapping array and the difference sequence segment sub-table use the processed hit count of each data item, obtained by statistics, as that item's confidence number; and the data groups of the difference sequence segment sub-table correspond one-to-one, by number, to the data items in the difference mapping array.
In step S1, an attempt is made to acquire from the history information table the merged address difference sequence segment and the last intra-page offset corresponding to the same PC and page address as the current access information; when they are successfully acquired, the current difference sequence segment is updated from the merged address difference sequence segment, the last intra-page offset and the current access information.
In step S1, attempting to acquire information from the history information table according to the current access information to obtain the current difference sequence segment, and updating the history information table, the difference mapping array and the difference sequence segment sub-table once the current difference sequence segment is obtained, comprises:
Step S11: indexing the history information table with the low-order bits of the PC in the current access information, hashing the high-order bits of the PC to obtain a PC tag, and matching it against the indexed history information table record; if the match succeeds, acquiring the corresponding page tag, last intra-page offset and merged address difference sequence segment; matching the acquired page tag against the hash of the memory access address in the current access information; if these also match, proceeding to step S12, otherwise ending the current flow;
Step S12: subtracting the last intra-page offset from the current intra-page offset to obtain the current difference, and updating the history information table with it; confirming whether the number of past differences in the merged address difference sequence segment stored in the history information table is complete, and if so, forming from the current difference and the merged address difference sequence segment a current difference sequence segment consisting of 4 differences, to serve as the difference sequence segment to be learned;
Step S13: querying the data items in the difference mapping array with the first difference of the current difference sequence segment, and counting the hits of each data item to update the data items and their confidence numbers;
Step S14: updating the corresponding data group in the difference sequence segment sub-table with the subsequent portion of the current difference sequence segment, based on the hit result of step S13.
In step S14, if the query hits, the number of the hit data item is obtained, each data item of the data group with that number in the difference sequence segment sub-table is queried with the subsequent portion of the current difference sequence segment, and the hits of each data item are counted to update the data items and their confidence numbers, with the same update policy as for the difference mapping array. If the query misses, all data items are cleared in the data group of the difference sequence segment sub-table whose number corresponds to the data item with the minimum confidence number in the difference mapping array, the subsequent portion of the current difference sequence segment is stored at the cleared position, and the initial confidence number of that data item is set to 1.
In step S11, when a history information table record can be indexed and the current access information has the same PC tag as the indexed record, the subsequent data extraction is performed to obtain the corresponding page tag, last intra-page offset and merged address difference sequence segment, and the valid bit of the record is set to 1. If the PC tags of the current access information and the indexed record differ, then: if the valid bit of the record is 1, it is set to 0; otherwise, if the valid bit is 0, the PC tag of the record is replaced and the other data fields are cleared.
In step S12, if the number of differences in the merged address difference sequence segment stored in the history information table is less than 3, the flow ends.
In step S13, the hits of each data item are counted to update the data items and their confidence numbers as follows: when the query hits a data item in the difference mapping array, if the hit data item has not reached the saturation value, its confidence number is incremented by 1; if it has reached the saturation value, the confidence numbers of the other data items in the difference mapping array are halved and the hit data item keeps the saturation value. If the query misses, the data item with the minimum confidence number in the difference mapping array is replaced by the first difference of the current difference sequence segment, and its initial confidence number is set to 1.
In step S2, an adaptive voting strategy is used to obtain the best-matching complete sequence segment and the corresponding predicted target difference according to the confidence numbers of the matched prefix subsequences and their matching lengths.
The step S2 comprises:
Step S21: querying the data items in the difference mapping array with the first difference of the difference sequence to be predicted;
Step S22: according to the hit result of step S21, if the query hits, obtaining the number of the hit data item, forming matching tags of different lengths from the subsequent differences of the difference sequence to be predicted, and matching them concurrently against the data items of the data group with that number in the difference sequence segment sub-table, where the last difference of the difference sequence segment in each successfully matched data item is a prefetch candidate difference; otherwise, ending the flow;
Step S23: computing the score of each prefetch candidate difference, where a candidate's score is the sum, over all matched difference sequence segments yielding that candidate, of the product of the segment's confidence number and its voting factor; then totaling the scores of all prefetch candidate differences and computing each candidate's share of the total; if the share of one prefetch candidate difference exceeds the prefetch threshold, the complete sequence segment corresponding to it is the best-matching complete sequence segment and that candidate is the predicted target difference; otherwise, the flow ends.
The merged address difference sequence segment, the difference sequence segments of the difference sequence segment sub-table and the prefix subsequences all have a length of 3 differences, and the matching tag lengths comprise lengths of 1 difference and 2 differences.
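Steps S21-S23 can be sketched as follows. This is a hedged illustration, not the patented implementation: the per-length voting factors (1 and 4) and the 0.5 share threshold are assumed values, since the document specifies only that longer matches carry different weights and that a threshold is applied; the tag lengths of 1 and 2 differences follow the text.

```python
# Sketch of the multiple-match + adaptive-voting prediction (S21-S23).
# VOTE_FACTOR values and THRESHOLD are assumptions; the patent fixes only
# the matching tag lengths (1 and 2 differences).
VOTE_FACTOR = {1: 1, 2: 4}     # longer matches vote with more weight (assumed values)
THRESHOLD = 0.5                # assumed share a candidate must exceed

def predict(to_predict, group):
    """to_predict: 3 differences; its first difference is taken to have
    already hit the difference mapping array and selected `group`.
    group: list of (segment, confidence) pairs with 3-difference segments."""
    scores = {}
    for length, factor in VOTE_FACTOR.items():
        tag = tuple(to_predict[1:1 + length])        # matching tag from subsequent differences
        for segment, confidence in group:
            if tuple(segment[:length]) == tag:
                candidate = segment[-1]              # last difference = prefetch candidate
                scores[candidate] = scores.get(candidate, 0) + confidence * factor
    total = sum(scores.values())
    if not total:
        return None                                  # no match: flow ends
    best, score = max(scores.items(), key=lambda kv: kv[1])
    return best if score / total > THRESHOLD else None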
The CPU cache data prefetching method based on the merged address difference sequence further comprises step S4: appending the predicted target difference obtained in step S2 to the end of the current difference sequence and removing the oldest difference to keep the sequence length unchanged, obtaining an updated difference sequence to be predicted; then repeating steps S2 and S3 together with this update of the difference sequence to be predicted, until the number of repetitions of steps S2 and S3 reaches the predetermined prediction depth, whereupon the flow ends.
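The recursive prediction of step S4 amounts to feeding each predicted difference back into the sequence. A minimal sketch, in which `predict_fn` stands in for the step-S2 matching and all names are illustrative:

```python
# Sketch of step S4: repeat prediction (S2) and target generation (S3),
# appending each predicted difference and dropping the oldest, until the
# prediction depth is reached or prediction fails.
def recursive_prefetch(addr, to_predict, predict_fn, depth):
    targets = []
    seq = list(to_predict)
    for _ in range(depth):
        delta = predict_fn(seq)
        if delta is None:
            break                     # no confident match: flow ends early
        addr += delta                 # S3: next predicted target address
        targets.append(addr)
        seq = seq[1:] + [delta]       # append prediction, remove oldest difference
    return targets
```

With a constant-stride predictor and depth 3, three successive targets are produced; a predictor that never reaches the threshold produces none.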
Thus, by means of the multiple matching mechanism, the invention matches sequences of multiple lengths with a single table, which increases the matching probability under different program loads, effectively improves coverage, reduces the cache miss rate, and avoids the complex update and management logic of a multi-table cascade. During updating, sequence segments of different lengths need not be updated one by one; only a merged address difference sequence segment of fixed length is updated.
The invention applies different voting weights according to the matching length, filtering out irrelevant difference sequence segments and improving prediction accuracy. The prior art (VLDP) considers only the longest matching sequence and ignores the contribution of high-confidence short matching sequences.
The dynamic mapping mode table of the invention uses the first difference of the merged address difference sequence segment as its index, forms a mapping with a dynamically allocated storage position, and stores this mapping explicitly in the difference mapping array, thereby decoupling the stored information feature from the index of the dynamic mapping mode table and keeping the feature-to-position relation in an independent structure. By decoupling the mapping between metadata features and the difference sequence segment sub-table, the invention ensures that frequently accessed sequence segments reside in the metadata table as much as possible, greatly reduces the prefetcher's overhead, and still outperforms other, higher-overhead data prefetchers.
Drawings
FIG. 1 is a diagram of the design architecture and the environment for the merged address difference sequence based CPU cache data prefetching method of the present invention.
FIG. 2 is a memory access pattern learning flow diagram of a merged address difference sequence based CPU cache data prefetching method of the present invention.
FIG. 3 is a flow chart of program access pattern prediction and prefetching of the CPU cache data prefetching method based on the merged address difference sequence according to the invention.
Fig. 4 is a schematic diagram of recursive prediction.
Detailed Description
The present invention is described in further detail below with reference to specific examples and the attached drawings.
FIG. 1 shows the applicable environment and design architecture of the merged address difference sequence-based CPU cache data prefetching method of the present invention. The method is applicable to the L1 data cache of a core of a conventional CPU (in this embodiment, core 0, Core0); it is also applicable to the L2 and L3 data caches, with correspondingly reduced performance. Each core of a conventional CPU can employ its own independent instance of the merged address difference sequence-based CPU cache data prefetching method, thereby realizing data prefetching for every core.
The CPU cache data prefetching method based on the merging address difference sequence specifically comprises the following steps:
step S0: providing a history information table 1, a difference value mapping array 2 and a difference value sequence segment sub-table 3 which are connected with a data cache to be prefetched and are shown in fig. 1, wherein the difference value mapping array 2 and the difference value sequence segment sub-table 3 form a dynamic mapping mode table, and the history information table 1 is used for storing a plurality of items of history information table records;
in the present embodiment, the data cache to be prefetched is an L1 level data cache, however, in other embodiments, the data cache to be prefetched may also be an L2 data cache or an L3 level data cache.
The history information table 1 is a direct-mapped multi-entry table-format cache storing a plurality of history information table records. In the present embodiment, the history information table 1 stores 128 history information table records. Each record comprises the following fields: a PC tag (12 bits), a page tag (8 bits), the last intra-page offset (9 bits), a merged address difference sequence segment (30 bits) and a valid bit (1 bit), 60 bits in total.
The history information table 1 is indexed by the low-order bits of the program counter (PC) value, and the high-order part of the PC is stored as the PC tag. The PC tag and the page tag are obtained by hashing the high-order part of the original PC and the memory access address in the access information, respectively, which reduces overhead. The 128 PC tags of the history information table 1 are independent of one another and are used to determine whether the stored content of a history information table entry matches the PC of a given piece of access information. The PC tag and the page tag represent a particular PC and a particular physical page, respectively. The merged address difference sequence segment in a history table record has a fixed length, used to accumulate 3 past differences. In the present embodiment at most 3 differences can be stored, i.e. the merged address difference sequence segment consists of the 3 accumulated past differences, or of all past differences when fewer than 3 exist. That is, there are two cases: if the segment holds fewer than 3 differences, a newly generated difference is simply appended to the end of the sequence; if it already consists of 3 differences, the oldest difference is removed and the new difference is appended to the end. In summary, while the merged address difference sequence segment holds fewer than three differences, it stays in the history information table only and is not propagated into the dynamic mapping mode table (i.e. the difference mapping array 2 and the difference sequence segment sub-table 3).
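The fixed-length update of the merged segment described above can be sketched in two lines; the function name and list representation are illustrative, and the length limit of 3 follows the embodiment.

```python
# Sketch of the merged-segment update: below 3 differences the new one is
# appended; at 3 differences the oldest is dropped before appending.
def update_merged_segment(segment, new_difference, max_len=3):
    segment = segment + [new_difference]
    return segment[-max_len:]          # keep only the newest max_len differences
```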
The merged address difference sequence segment is computed from the intra-page offset differences of the address sequence, and its values may be positive or negative. A difference here is an intra-page offset difference: the current intra-page offset difference is the current intra-page offset minus the last intra-page offset, where the intra-page offset is the low-order, intra-page part of a paged address. Accordingly, the last intra-page offset is the intra-page low-order part of the paged address of the previous access information, and the current intra-page offset is that of the current access information.
The difference mapping array 2 is a multi-entry fully-associative cache storing the mapping between the first difference of the current address difference sequence segment and the data groups of the difference sequence segment sub-table 3. The difference mapping array 2 comprises a plurality of data items (16 in the present embodiment), each with the following fields: a first difference (10 bits), a confidence number (6 bits) and a valid bit (1 bit). The first difference of the merged address difference sequence segment is thus stored explicitly in a separate storage structure, decoupling the difference sequence from a direct mapping onto the difference sequence segment sub-table 3.
The confidence numbers of the difference mapping array 2 are updated with the first difference of the current difference sequence segment in each access pattern learning step. When the query hits a data item in the difference mapping array 2: if the hit data item has not reached the saturation value (63), its confidence number is incremented by one; if it has reached the saturation value, the confidence numbers of all other data items in the difference mapping array 2 are halved and the hit data item keeps the saturation value. If no data item is hit, the data item with the minimum confidence number in the difference mapping array 2 is selected, its first difference is replaced by the first difference of the current difference sequence segment, and its initial confidence number is set to 1.
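The saturating update policy just described can be sketched directly; the saturation value 63 and initial value 1 come from the text, while the list-of-dicts representation and function name are illustrative.

```python
# Sketch of the confidence-number update for the difference mapping array:
# hit below saturation -> +1; hit at saturation (63) -> halve all others;
# miss -> replace the minimum-confidence entry with initial confidence 1.
SATURATION = 63

def update_mapping(entries, first_difference):
    """entries: list of {'delta': int, 'conf': int} data items."""
    for e in entries:
        if e['delta'] == first_difference:            # query hit
            if e['conf'] < SATURATION:
                e['conf'] += 1
            else:                                     # saturated: halve the others
                for other in entries:
                    if other is not e:
                        other['conf'] //= 2
            return
    victim = min(entries, key=lambda e: e['conf'])    # miss: evict lowest confidence
    victim['delta'] = first_difference
    victim['conf'] = 1
```

Halving the competitors at saturation, rather than capping silently, lets a newly dominant first difference overtake stale entries within a few learning steps.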
The difference sequence segment sub-table 3 is a multi-way set-associative cache; the number of its data groups matches the number of data items in the difference mapping array 2, and each data group holds a plurality of data items. In the present embodiment there are 16 data groups of 8 data items each, 128 data items in total. Each data item of the difference sequence segment sub-table 3 comprises the following fields: a difference sequence segment (30 bits), a confidence number (9 bits) and a valid bit (1 bit).
Wherein the difference sequence segment (30bits) is set to completely store the subsequent part (i.e. all subsequent differences except the leading difference) of the current address difference sequence segment. In this embodiment, the number of differences in the current address difference sequence segment is 4, so the length of the difference sequence segment in the difference sequence segment sub-table 3 is the length of 3 differences, that is, the number of differences in the difference sequence segment sub-table 3 is complete when the number of differences is 3. Therefore, the difference sequence segment with the length of 4 differences can be formed by 3 differences in the difference sequence segment sub-table 3 and 1 difference stored in the difference mapping array 2. The confidence numbers for each data item in the difference sequence sub-table 3 are updated using the same update logic as the difference map array, with an upper numerical limit of 511.
Both the difference mapping array 2 and the difference sequence segment sub-table 3 derive the confidence number of each data item from statistics of its hit count.
The number of data groups in the difference sequence segment sub-table 3 and the number of data items in the difference mapping array 2 are fixed in hardware, so a one-to-one correspondence between them (i.e. each data group of the difference sequence segment sub-table 3 corresponds to one data item number of the difference mapping array 2) can be wired directly, without an explicitly stored group number; this correspondence is the static part. The head difference stored in the difference mapping array 2 can change; this is the dynamic part. When saving the mapping between the difference sequence segment sub-table 3 and the difference mapping array 2, only one side therefore needs to change dynamically (in this embodiment, the head difference of the difference mapping array 2), not both. Consequently, when a query with the head difference hits a data item of the difference mapping array 2, the number of that data item directly gives the data group of the difference sequence segment sub-table 3 that holds the subsequent sequence segments, and this group number is used in the memory access pattern learning (step S1) and memory access pattern prediction (step S2) steps detailed below.
The valid bits of the difference mapping array 2 and the difference sequence segment sub-table 3 indicate in the hardware design whether a record is valid. An invalid record is excluded from matching and comparison, so no matching logic needs to be spent on it.
The dynamic mapping mode table of the invention therefore uses the head difference of the merged address difference sequence segment as its index (i.e. the metadata feature), maps it to a dynamically allocated storage position, and stores this mapping explicitly in the difference mapping array. The stored information feature is thus decoupled from the index of the dynamic mapping mode table, with the feature-to-position relation kept in an independent structure. By decoupling the mapping between metadata features and the difference sequence segment sub-table, the invention keeps the most frequently accessed sequence segments resident in the metadata tables as far as possible. In this way the prefetcher's overhead is greatly reduced, while its performance can still exceed that of other high-overhead data prefetchers.
Step S1: learning the memory access pattern of the executing program, which includes: the data cache to be prefetched receives the sequence of memory access requests from each core of the CPU, and the current access information in that sequence is collected through a bypass; the access information comprises a memory access address and the program counter (PC) value corresponding to it (i.e. a PC-address pair). Each time current access information is collected, an attempt is made to obtain information from the history information table 1 and update it into a current difference sequence segment. When the current difference sequence segment is obtained, the history information table 1, the difference mapping array 2 and the difference sequence segment sub-table 3 are updated according to it, and at the same time its first difference is removed to obtain the difference sequence to be predicted; the resulting difference sequence to be predicted thus contains 3 consecutive differences.
The prefetching method of the invention needs to read the history information table both in the memory access pattern learning step (step S1) and in the memory access pattern prediction step (step S2) described later; these two accesses can be merged into one. After the history information table has been updated with the current PC and memory address in the learning stage, the resulting current difference sequence segment (composed of 4 differences) has its first difference removed for the prediction stage, i.e. 3 consecutive differences including the current difference are passed on to prediction.
The PCs are not otherwise restricted, but the access information collected into the same history information table 1 should belong to the same program. In general a PC and a memory access address can be treated as a one-to-one pair, because one PC corresponding to several access addresses can equivalently be regarded as several identical PCs each corresponding to one access address.
In step S1, based on the current access information, an attempt is made to fetch from the history information table 1 the merged address difference sequence segment and the last intra-page offset associated with the same PC and page address; when they are obtained successfully, the current difference sequence segment is updated from the merged address difference sequence segment, the last intra-page offset and the current access information.
Fig. 2 shows the flow of one complete round of program memory access pattern learning. As shown in fig. 2, the procedure in step S1 of attempting to obtain information from the history information table 1, updating it into the current difference sequence segment, and, once the current difference sequence segment is obtained, updating the history information table 1, the difference mapping array 2 and the difference sequence segment sub-table 3 accordingly, specifically comprises:
step S11: the history information table 1 is indexed with the low-order bits of the PC in the current access information, and the high-order bits of the PC are hashed into a PC tag that is matched against the indexed history information table record. If the match succeeds, the corresponding page tag, last intra-page offset and merged address difference sequence segment are obtained. The obtained page tag is then matched against the hash of the memory access address in the current access information; if these also match, step S12 is performed, otherwise the current flow ends. This ensures that the current access information and the matched history information table record lie in the same physical page.
The tag is a replaceable field used to confirm that a stored history information table record belongs to a given PC. The current access information (PC-address pair) thus generates a PC tag from the upper part of its PC and indexes a record of the history information table with the lower part. Note that all addresses stored in the same data entry of the history information table 1 correspond to the same PC and should lie in the same physical page.
In step S11, when a history information table record can be indexed and the current access information carries the same PC tag as that record, data extraction proceeds to obtain the corresponding page tag, last intra-page offset and merged address difference sequence segment, and the valid bit of the record is set to 1. If the PC tags differ: when the valid bit of the record is 1, it is cleared to 0; when the valid bit is already 0, the PC tag of the record is replaced and all data fields of that record in the history information table are cleared.
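The tag-match and valid-bit policy of step S11 can be sketched as follows; the record fields and the function name are illustrative assumptions, not taken from the patent text.

```python
# Hypothetical sketch of the PC-tag match and valid-bit policy of step S11:
# a tag hit confirms the record, a first mismatch only clears the valid bit,
# and a second mismatch replaces the record and empties its data fields.
def match_history_record(record, pc_tag):
    """Return True when the record can be used for data extraction."""
    if record["pc_tag"] == pc_tag:
        record["valid"] = 1            # confirm the record on a tag hit
        return True
    if record["valid"] == 1:
        record["valid"] = 0            # first mismatch: clear valid bit only
    else:
        record["pc_tag"] = pc_tag      # second mismatch: take over the record
        record["page_tag"] = 0
        record["last_offset"] = 0
        record["merged_segment"] = 0
        record["num_diffs"] = 0
    return False
```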
Step S12: after a successful match, the current difference is obtained by subtracting the last intra-page offset from the current intra-page offset, and the history information table 1 is updated with it. It is then checked whether the merged address difference sequence segment stored in the history information table 1 holds its full number of past differences (i.e. 3); if so, the current difference and the merged address difference sequence segment together form the current difference sequence segment of 4 differences, which serves as the difference sequence segment to be learned.
In step S12, if the merged address difference sequence segment stored in the history information table 1 holds fewer than 3 differences, the flow ends so that the dynamic mapping mode table is not updated. The history information table 1 is thus updated on every incoming piece of access information, while the dynamic mapping mode table is updated only when the current difference sequence segment reaches the required length of 4, and not while the merged address difference sequence segment still holds fewer than 3 differences.
The calculation formula of the intra-page offset PageOffset is as follows:
PageOffset=(Address&((1<<12)-1))>>3,
where Address is the memory access address, << and >> are the bitwise left-shift and right-shift operations, & is the bitwise AND, - is subtraction, and 1, 12 and 3 are the operand bit counts of the bit operations.
When the page tag shows that no page crossing has occurred, the current difference Delta is computed as:
Delta=CurrentPageOffset-LastPageOffset,
where LastPageOffset is the last intra-page offset and CurrentPageOffset is the current intra-page offset.
In addition, the page tag is used to detect page crossing, i.e. the addresses from the same PC may come from different physical pages. If a page crossing is found, the computation of the current difference must be adjusted. Recording page tags serves to decide whether the two accesses are on the same page, in which case the normal current-difference computation applies; across pages, a default computation that joins the pages is used. For example: the last access was at offset 59 of page A (maximum offset 64) and the current access is at offset 1 of page B. Page B is then assumed to be the page following A, and the offset difference is corrected to 1 - 59 + 64 = 6 instead of the direct subtraction -58.
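A minimal sketch of the offset and difference computations, assuming the formulas above; the page-crossing branch takes the number of offsets per page as an explicit parameter, since the worked example in the text uses a 64-offset page while the PageOffset formula (4 KB pages, shift by 3) would give 512.

```python
# Sketch of the PageOffset and Delta computations described above.
def page_offset(address, page_bits=12, block_shift=3):
    # PageOffset = (Address & ((1 << 12) - 1)) >> 3, per the text.
    return (address & ((1 << page_bits) - 1)) >> block_shift

def delta(cur_off, last_off, same_page, offsets_per_page):
    if same_page:
        return cur_off - last_off                  # normal same-page case
    # Page-crossing case: assume the current page follows the last one,
    # so the difference is corrected by one full page of offsets.
    return cur_off - last_off + offsets_per_page
```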
The history information table 1 is updated with the current difference as follows: when the merged address difference sequence segment holds fewer than 3 differences, it is updated to the combination of its existing differences and the current difference; when it already holds 3 differences, its first (i.e. oldest) difference is removed and the current difference is appended.
The merged address difference sequence segment is updated (or formed) as:
S_n = ((S_{n-1} << 10) | D) & ((1 << 10k) - 1),
where S denotes a difference sequence, S_n is the current merged address difference sequence segment, S_{n-1} is the previous merged address difference sequence segment, k is the fixed number of differences contained in the merged address difference sequence segment, and D is the current difference.
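A sketch of this shift-and-mask update, assuming 10-bit differences and k = 3 as in this embodiment; masking the incoming difference to 10 bits is an added assumption so that negative differences fit the field.

```python
DIFF_BITS = 10   # each difference occupies 10 bits of the packed segment
K = 3            # fixed number of differences kept in the merged segment

def update_merged_segment(prev_segment, current_diff):
    # Shift in the newest difference at the low end; the mask drops the
    # oldest difference once the segment already holds K of them.
    d = current_diff & ((1 << DIFF_BITS) - 1)   # differences may be negative
    return ((prev_segment << DIFF_BITS) | d) & ((1 << (DIFF_BITS * K)) - 1)
```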
Therefore, if step S11 fails to match any record in the history information table, or step S12 fails to form a current difference sequence segment of 4 differences, i.e. no current difference sequence segment is obtained, the flow ends: it returns directly to step S1 without performing any subsequent step (step S13, step S14, step S2, step S3, step S4, etc.).
Step S13: the data items of the difference mapping array 2 are searched with the first difference of the current difference sequence segment (i.e. the oldest of its differences), and the hit counts of the data items in the difference mapping array 2 are used to update each data item and its confidence number;
step S14: based on the hit result of step S13, the corresponding data group in the difference sequence segment sub-table 3 is updated with the subsequent part of the current difference sequence segment.
In step S14, if the hit succeeded (i.e. the query hit a data item of the difference mapping array 2), the number of the hit data item is obtained, the data items of the data group in the difference sequence segment sub-table 3 corresponding to that number are queried with the subsequent part of the current difference sequence segment, and the hit counts are used to update each data item and its confidence number, with the same update policy as for the difference mapping array. If the hit failed, all data items are cleared in the data group of the difference sequence segment sub-table 3 corresponding to the number of the data item with the lowest confidence in the difference mapping array 2, the subsequent part of the current difference sequence segment is stored at the cleared position, and the initial confidence number of that data item is set to 1. This completes one learning pass.
In step S13, the hit counts update the data items and their confidence numbers as follows:
when the query hits a data item of the difference mapping array 2: if the hit data item has not reached the saturation value (here 63), its confidence number is increased by 1; if it has reached the saturation value, the confidence numbers of all other data items in the difference mapping array 2 are halved and the hit data item stays at the saturation value. If the query misses (no data item is hit), the data item with the lowest confidence number in the difference mapping array 2 is replaced using the first difference of the current difference sequence segment, and its initial confidence number is set to 1.
In step S14, the hit counts update the data items and their confidence numbers as follows:
when the query hits a data item of the difference sequence segment sub-table 3: if the hit data item has not reached the saturation value (here 511), its confidence number is increased by 1; if it has reached the saturation value, the confidence numbers of the other data items in the difference sequence segment sub-table 3 are halved and the hit data item stays at the saturation value. If the query misses (no data item is hit), the data item with the lowest confidence number in the difference sequence segment sub-table 3 is replaced by the subsequent part of the current difference sequence segment, and its initial confidence number is set to 1.
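The shared saturating-counter logic of steps S13 and S14 can be sketched as one routine, parameterized by the saturation value (63 for the difference mapping array, 511 for the difference sequence segment sub-table); the function name and list representation are illustrative.

```python
# Sketch of the shared confidence-number update of steps S13/S14.
def update_confidence(confidences, hit_index, saturation):
    """Apply the hit/miss update; return the index of the touched item."""
    if hit_index is not None:                 # the query hit a data item
        if confidences[hit_index] < saturation:
            confidences[hit_index] += 1       # below saturation: increment
        else:
            for i in range(len(confidences)): # at saturation: halve the rest,
                if i != hit_index:            # hit item keeps the saturation
                    confidences[i] //= 2
        return hit_index
    victim = confidences.index(min(confidences))  # miss: evict lowest item
    confidences[victim] = 1                       # new item starts at 1
    return victim
```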
In summary, step S1 collects the last intra-page offset (from the last access information) for each PC, computes and stores the difference for each PC in combination with the current intra-page offset (from the current access information), and assembles the resulting consecutive differences into a fixed-length merged address difference sequence segment. Through steps S11-S14 the merged address difference sequence segment and the corresponding current difference sequence segment are continuously updated at run time (the oldest first difference is removed and the newest difference appended), and the history information table 1, the difference mapping array 2 and the difference sequence segment sub-table 3 are updated accordingly.
Step S2: predicting the memory access pattern of the executing program, which includes: the difference sequence to be predicted is matched, via multiple matching, against the prefix subsequences of the complete sequence segments stored in the dynamic mapping mode table, yielding the best-matching complete sequence segment and the corresponding predicted target difference.
As described above, if the current difference sequence segment is not generated in step S1, the flow ends without performing steps S2-S4.
The dynamic mapping mode table consists of the difference mapping array 2 and the difference sequence segment sub-table 3; a head difference in the difference mapping array 2 and a difference sequence segment in the difference sequence segment sub-table 3 together form a complete sequence segment of the dynamic mapping mode table. Each complete sequence segment divides into a prefix subsequence and a target difference: the prefix subsequence is the subsequence formed by the leading differences of the complete sequence segment, and the target difference is its last difference. Since the difference mapping array 2 stores only the head difference, and in this embodiment a stored segment in the difference sequence segment sub-table 3 is complete at 3 differences, a complete sequence segment holds 4 differences: the prefix subsequence is formed by its first 3 differences and the target difference is its last. One part of the prefix subsequence (the head difference) therefore resides in the difference mapping array 2, the rest (the next two differences) resides in the difference sequence segment sub-table 3 as part of its stored segments, and the target difference is also stored in the difference sequence segment sub-table 3.
In step S2, multiple matching means first querying the data items of the difference mapping array 2 in the dynamic mapping mode table and then matching against the data items of the corresponding data group in the difference sequence segment sub-table 3, where the matching tags come in several lengths. Multiple matching comprises two stages: 1) the first difference of the difference sequence to be predicted is extracted to look up the mapping relation in the difference mapping array 2; 2) according to the mapping found, the subsequent part of the difference sequence to be predicted is matched against the corresponding data items in the difference sequence segment sub-table 3. A match counts as successful only when both stages match.
The invention thus uses a multiple matching mechanism, so that sequences of several lengths can be matched with a single table. This raises the matching probability under different program loads, effectively improves coverage, reduces the cache miss rate, and avoids the complex update management logic of cascaded tables: during updates, sequence segments of different lengths need not be maintained one by one; only the fixed-length merged address difference sequence segment is updated.
In step S2, during multiple matching an adaptive voting strategy is applied according to the confidence numbers of the matched prefix subsequences and their matching lengths, to obtain the best-matching complete sequence segment and the corresponding predicted target difference. The adaptive voting strategy is a scoring method: the score contributed by a match is the product of the confidence number of the matched difference sequence segment and the voting factor corresponding to its matching length. The final prefetch target is then decided by checking whether the share of each predicted target difference's score exceeds the prefetch threshold.
Step S3: performing data prefetching: the predicted target difference is added to the memory access address in the current access information to obtain the predicted target address, which is sent to the Miss Status Handling Register (MSHR) to await the memory access of the predicted target address, thereby realizing data prefetching.
In step S3, the miss status handling register is the existing CPU cache component that handles memory access requests caused by data misses.
FIG. 3 shows the program access pattern prediction method of step S2 and the data pre-fetching method of step S3 according to the present invention.
As shown in fig. 3, the step S2 includes:
step S21: the first difference of the difference sequence to be predicted is used to query the data items of the difference mapping array 2. The query proceeds in the same manner as in step S13, but without updating or evicting any data.
Step S22: based on the hit result of step S21, if the hit succeeded, the number of the hit data item is obtained, the subsequent differences of the difference sequence to be predicted form matching tags of length 1 difference and length 2 differences, and these are concurrently matched against the data items of the data group in the difference sequence segment sub-table 3 corresponding to that number; the last difference of the difference sequence segment in each successfully matched data item of the difference sequence segment sub-table 3 is a candidate difference to be prefetched. If step S21 did not hit, the flow ends and prediction stops.
In step S22, the difference sequence segment sub-table 3 supports multi-length sequence matching by extracting prefixes of different lengths with different masks: for the matching tags of length 1 difference and length 2 differences, the masks are 0x3ff and 0xfffff respectively (10 and 20 bits, matching the 10-bit difference width), and the corresponding voting factors are 3 and 4.
However, since a difference sequence segment in the difference sequence segment sub-table 3 holds only 3 differences and its last difference serves as the prediction target, only matching tags of length 1 and length 2 exist in this embodiment; there is no length-3 match. The voting factor is the weight given to the scores of matched sequences of different lengths during voting.
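A sketch of the concurrent two-length match; the packing of the stored 30-bit sequence as (d1 << 20) | (d2 << 10) | target and the bit alignment of the tags are assumptions for illustration, since the text fixes only the masks and the voting factors (3 and 4).

```python
# Sketch of the multi-length match of step S22 within one data group.
def match_group(group, q1, q2):
    """group: list of (stored_sequence, confidence) pairs.
    q1, q2: the two subsequent differences of the sequence to be predicted
    (q2 being the most recent). Returns one (candidate_target, confidence,
    voting_factor) tuple per successful match."""
    tags = [((q2 & 0x3ff), 0x3ff, 3),                           # length-1 tag
            ((((q1 & 0x3ff) << 10) | (q2 & 0x3ff)), 0xfffff, 4)]  # length-2
    results = []
    for stored, confidence in group:
        prefix = stored >> 10        # drop the target (last) difference
        target = stored & 0x3ff      # candidate difference to be prefetched
        for tag, mask, factor in tags:
            if (prefix & mask) == tag:
                results.append((target, confidence, factor))
    return results
```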
Experimental statistics show that, under the current configuration, one multi-length matching pass generates 3.09 candidate differences to be prefetched on average, i.e. typically 2-4 candidates.
Step S23: the score of each candidate difference to be prefetched is computed as the sum, over all matches yielding that candidate, of the product of the confidence number of the matched difference sequence segment and its voting factor. The total score of all candidates is then accumulated and each candidate's share of it is computed. If the share of one candidate exceeds the prefetch threshold (set to 50% in this embodiment), the complete sequence segment corresponding to that candidate is the best-matching complete sequence segment and the candidate itself is the predicted target difference, after which step S3 can issue the prefetch. If no best-matching complete sequence segment satisfies the condition, the flow ends and prefetching stops.
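The scoring and threshold test of step S23 can be sketched as follows, with the 50% prefetch threshold of this embodiment; the candidates and their (confidence, voting factor) matches come from the multi-length matching of step S22, and the function name is illustrative.

```python
# Sketch of the adaptive voting of step S23: each candidate's score is the
# sum over its matches of confidence x voting factor; a candidate is chosen
# only when its share of the total score exceeds the prefetch threshold.
def vote(matches, threshold=0.5):
    """matches: (candidate_diff, confidence, voting_factor) tuples.
    Returns the predicted target difference, or None to stop prefetching."""
    scores = {}
    for diff, confidence, factor in matches:
        scores[diff] = scores.get(diff, 0) + confidence * factor
    total = sum(scores.values())
    if total == 0:
        return None
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best if best_score / total > threshold else None
```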
Step S4: performing recursive prediction: the predicted target difference obtained in step S2 is appended to the end of the current difference sequence and the oldest difference is removed so that the sequence length stays the same, yielding an updated difference sequence to be predicted. Steps S2 and S3 and this sequence update are then repeated, each round performing a new match, prediction and data prefetch on the updated difference sequence to be predicted, until the number of repetitions of steps S2 and S3 reaches the predetermined prediction depth, at which point the flow ends and recursive data prefetching stops.
Further, as described above, if in step S2 no prefetch candidate's score share exceeds the prefetch threshold, the flow likewise ends and recursive data prefetching stops.
As described above, the invention attempts to update the current difference sequence segment every time current access information is collected, triggering the prediction of step S2 and the prefetching of step S3 once the segment is obtained. According to step S4, once step S1 completes the first trigger successfully (i.e. obtains the current difference sequence segment), steps S2 and S3 are executed repeatedly under the recursive prediction of step S4 to prefetch as much data as possible. To limit the prefetcher's memory bandwidth usage, the prediction depth in this embodiment is 8, i.e. each recursive prediction of step S4 generates at most 8 prefetches.
Fig. 4 illustrates the general flow of recursive prediction with a simplified example: the history information table is omitted, the length of a complete sequence segment of the dynamic mapping mode table matches this embodiment (a difference sequence segment of 4 differences is kept), confidence numbers are omitted, and no dynamic mapping mechanism is employed. Fig. 4 serves to explain the general recursive prediction mechanism.
As shown in fig. 4, the dynamic mapping mode table consists of the difference mapping array 2 and the difference sequence segment sub-table 3, whose head differences and difference sequence segments together form the complete sequence segments; each complete sequence segment divides into a prefix subsequence (its leading differences) and a target difference (its last difference). Suppose the difference sequence to be predicted is {1, 2, 3}. It matches the first record in the table, so the predicted target difference is 4 and the predicted target address is generated as the current address + 4. Prediction does not stop there: the predicted target difference 4 is appended to the tail of the difference sequence to be predicted and the oldest difference 1 is removed, producing the new sequence segment {2, 3, 4} as the updated difference sequence to be predicted for the next match and prediction. This process of deriving the next prediction from the previous one is called recursive prediction. Because the invention is built on a recursive prediction framework, it can predict several data at once whenever a current difference sequence segment is obtained.
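The recursive loop of step S4, applied to the simplified table of Fig. 4, can be sketched as follows; the dictionary stands in for the dynamic mapping mode table, as in the figure's simplification, and the prediction depth of 8 is the embodiment's value.

```python
# Sketch of the recursive prediction loop of step S4 over the simplified
# Fig. 4 table: a mapping from a 3-difference prefix to the target difference
# (no confidence numbers, no dynamic mapping mechanism).
def recursive_prefetch(table, sequence, current_address, depth=8):
    prefetches = []
    seq = list(sequence)
    addr = current_address
    for _ in range(depth):
        target = table.get(tuple(seq))
        if target is None:           # no matching record: stop prefetching
            break
        addr += target               # predicted target address
        prefetches.append(addr)
        seq = seq[1:] + [target]     # drop oldest, append the prediction
    return prefetches
```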
The invention applies different voting weights to different matching lengths, filtering out irrelevant difference sequence segments and improving prediction accuracy, whereas the prior art (VLDP) considers only the longest matching sequence and ignores the effect of high-confidence short matching sequences.
The dynamic mapping mode table of the invention uses the head difference of the merged address difference sequence segment as its index, maps it to a dynamically allocated storage position, and stores the mapping explicitly in the difference mapping array, decoupling the stored information feature from the index of the dynamic mapping mode table and keeping the feature-to-position relation in an independent structure. By decoupling the mapping between metadata features and the difference sequence segment sub-table, the high-frequency sequence segments are kept resident in the metadata tables as far as possible, the prefetcher's overhead is greatly reduced, and its performance can still exceed that of other high-overhead data prefetchers.
In summary, the invention uses merged address difference sequences to match sequences of multiple lengths, which improves prefetcher coverage while avoiding the management of several cascaded metadata tables and reducing metadata management complexity; the adaptive voting strategy mitigates the difficulty of recovering the program access pattern under out-of-order CPU execution; and the dynamic metadata mapping mechanism preserves prefetcher performance while avoiding high hardware overhead.
Experimental verification:
The experimental environment used a 10-core Intel processor, 64 GB of memory, and a 1 TB hard disk, running Ubuntu 20.04. Hardware-architecture simulation was performed with the ChampSim simulator, configured as shown in Table 1. The SPEC 2017 benchmark suite served as the experimental workload.
Table 1 simulator configuration table
(The simulator configuration of Table 1 is provided as an image in the original filing and is not reproduced here.)
Experimental results show a 53.1% performance improvement when the method is added to a CPU without a prefetcher.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it; a person skilled in the art may modify or substitute the technical solution without departing from the spirit and scope of the invention, and the scope of protection is determined by the claims.

Claims (10)

1. A CPU cache data prefetching method based on a merged address difference sequence is characterized by comprising the following steps:
step S0: providing a historical information table (1), a difference value mapping array (2) and a difference value sequence segment sub-table (3) which are connected with a data cache to be prefetched, wherein the difference value mapping array (2) and the difference value sequence segment sub-table (3) form a dynamic mapping mode table;
step S1: acquiring the sequence of memory access requests issued by a CPU core to the data cache to be prefetched, and collecting the current memory access information in the sequence through a bypass, wherein the memory access information comprises a memory access address and the corresponding PC, the PC being the program counter value; each time current memory access information is collected, attempting to obtain information from the history information table (1) and update it to form a current difference sequence segment; when the current difference sequence segment is obtained, updating the history information table (1), the difference mapping array (2) and the difference sequence segment sub-table (3) according to the current difference sequence segment, and removing its first difference to obtain the difference sequence to be predicted;
step S2: performing multiple matching of the difference sequence to be predicted against the prefix subsequences of the complete sequence segments stored in the dynamic mapping mode table, wherein multiple matching means first querying and hitting a data item of the difference mapping array (2) and then matching against the data items of the corresponding data group in the difference sequence segment sub-table (3) with matching tags of several lengths, so as to obtain the best-matching complete sequence segment and the corresponding predicted target difference;
step S3: adding the predicted target difference to the memory access address in the current memory access information to obtain a predicted target address, and sending the predicted target address to a miss status handling register to await the memory access to the predicted target address, thereby realizing data prefetching.
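As a rough, single-stream illustration of steps S1 to S3, the toy model below collapses the history information table, the mapping array, the sub-table and the voting into one dictionary keyed by 3-difference prefixes, learning 4-difference segments from in-page offsets. Every name here, and the single-stream simplification, are assumptions for illustration only.

```python
class ToyPrefetcher:
    """Toy sketch of steps S1-S3 for a single access stream of page offsets."""

    SEG_LEN = 4  # a complete segment holds 4 differences (3 prefix + 1 target)

    def __init__(self):
        self.last_offset = None   # last in-page offset (history information)
        self.deltas = []          # merged address difference sequence
        self.patterns = {}        # 3-difference prefix -> predicted next difference

    def access(self, offset):
        """Record one access; return a predicted offset to prefetch, or None."""
        prefetch = None
        if self.last_offset is not None:
            delta = offset - self.last_offset          # step S1: current difference
            self.deltas.append(delta)
            if len(self.deltas) >= self.SEG_LEN:       # learn a complete segment
                seg = tuple(self.deltas[-self.SEG_LEN:])
                self.patterns[seg[:-1]] = seg[-1]
            key = tuple(self.deltas[-(self.SEG_LEN - 1):])
            if key in self.patterns:                   # step S2: match prefix
                prefetch = offset + self.patterns[key]  # step S3: target address
        self.last_offset = offset
        return prefetch
```

Feeding a unit-stride stream, the model starts predicting once one complete 4-difference segment has been learned.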
2. The CPU cache data prefetching method based on a merged address difference sequence according to claim 1, wherein the history information table (1) stores a plurality of history information table records, each record comprising a PC tag, a page tag, a last in-page offset, a merged address difference sequence segment and a valid bit; the history information table (1) is indexed by the low-order PC bits and stores the high-order PC bits as the PC tag;
the difference mapping array (2) is a block of fully associative cache with multiple entries, each data item comprising a first-item difference, a confidence count and a valid bit;
the difference sequence segment sub-table (3) is likewise a block of fully associative cache; its number of data groups equals the number of data items of the difference mapping array (2), each data group holds a plurality of data items, and each data item comprises a difference sequence segment, a confidence count and a valid bit;
both the difference mapping array (2) and the difference sequence segment sub-table (3) take the processed hit count of each data item as that item's confidence count; and the data groups of the difference sequence segment sub-table (3) correspond one-to-one to the data item numbers of the difference mapping array (2).
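The three structures described in this claim can be summarized as plain record types; the field names and default values below are illustrative assumptions only.

```python
from dataclasses import dataclass, field

@dataclass
class HistoryEntry:
    """One record of the history information table (1)."""
    pc_tag: int = 0
    page_tag: int = 0
    last_offset: int = 0                          # last in-page offset
    deltas: list = field(default_factory=list)    # merged address difference segment
    valid: bool = False

@dataclass
class MapEntry:
    """One data item of the difference mapping array (2)."""
    first_delta: int = 0                          # first-item difference
    confidence: int = 0
    valid: bool = False

@dataclass
class SegEntry:
    """One data item in a data group of the difference sequence segment sub-table (3)."""
    deltas: tuple = ()                            # difference sequence segment
    confidence: int = 0
    valid: bool = False
```

One `MapEntry` per data group ties the mapping array and the sub-table together, reflecting the one-to-one correspondence of group numbers stated above.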
3. The CPU cache data prefetching method based on a merged address difference sequence according to claim 2, wherein in step S1, the merged address difference sequence segment and the last in-page offset corresponding to the same PC and page address are looked up in the history information table (1) according to the current memory access information, and when both are successfully acquired, the current difference sequence segment is updated from the merged address difference sequence segment, the last in-page offset and the current memory access information.
4. The CPU cache data prefetching method based on a merged address difference sequence according to claim 3, wherein in step S1, obtaining the current difference sequence segment by attempting to acquire information from the history information table (1) according to the current memory access information, and updating the history information table (1), the difference mapping array (2) and the difference sequence segment sub-table (3) according to the current difference sequence segment once it is obtained, comprises:
step S11: indexing the history information table (1) with the low-order PC bits of the current memory access information, hashing the high-order PC bits to obtain a PC tag, and matching it against the indexed history information table record; if the match succeeds, acquiring the corresponding page tag, last in-page offset and merged address difference sequence segment; then matching the acquired page tag against the hash of the memory access address in the current memory access information; if the two match successfully, proceeding to step S12, otherwise ending the current flow;
step S12: subtracting the last in-page offset from the current in-page offset to obtain the current difference, updating the history information table (1) with the current difference, and checking whether the number of past differences in the merged address difference sequence segment stored in the history information table (1) is complete; if complete, combining the current difference with the merged address difference sequence segment into a current difference sequence segment of 4 differences, which serves as the difference sequence segment to be learned;
step S13: querying the data items of the difference mapping array (2) with the first difference of the current difference sequence segment, and counting the hits of each data item in the difference mapping array (2) to update each data item and its confidence count;
step S14: updating the corresponding data group in the difference sequence segment sub-table (3) with the subsequent part of the current difference sequence segment according to the hit result of step S13;
in step S14, if the query hits, the number of the hit data item is obtained, each data item of the corresponding data group in the difference sequence segment sub-table (3) is queried with the subsequent part of the current difference sequence segment, and the hits of each data item are counted to update each data item and its confidence count, the update policy being the same as that of the difference mapping array; otherwise, if the query misses, all data items are cleared from the data group of the difference sequence segment sub-table (3) corresponding to the number of the minimum-confidence data item in the difference mapping array (2), the subsequent part of the current difference sequence segment is stored at the cleared position, and the initial confidence count of that data item is set to 1.
5. The CPU cache data prefetching method based on a merged address difference sequence according to claim 4, wherein in step S11, when at least one history information table record can be indexed and the current memory access information shares the same PC tag with the indexed record, subsequent data extraction is performed to obtain the corresponding page tag, last in-page offset and merged address difference sequence segment, and the valid bit of the record is set to 1; when the PC tags differ, the valid bit is set to 0 if the record's valid bit is 1, whereas if the record's valid bit is 0, the record's PC tag is replaced and its other data fields are cleared;
and in step S12, if the number of differences in the merged address difference sequence segment stored in the history information table (1) is less than 3, the flow ends.
6. The method according to claim 4, wherein in step S13, the hits of each data item are counted to update each data item and its confidence count as follows: when the query hits a data item in the difference mapping array (2), if the hit data item has not reached the saturation value, its confidence count is incremented by 1; if it has reached the saturation value, the confidence counts of the other data items in the difference mapping array (2) are halved and the hit data item keeps the saturation value; otherwise, if the query misses, the minimum-confidence data item in the difference mapping array (2) is replaced with the first-item difference of the current difference sequence segment and its initial confidence count is set to 1.
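A minimal sketch of this saturating-confidence policy follows; the saturation value (15 here) is an assumed parameter, and the dict-based representation is for illustration only.

```python
SAT = 15  # assumed saturation value for confidence counts

def update_mapping(items, first_delta):
    """Update a list of {'delta': int, 'conf': int} items per the claim's policy."""
    for it in items:
        if it['delta'] == first_delta:               # query hit
            if it['conf'] < SAT:
                it['conf'] += 1                      # increment toward saturation
            else:
                for other in items:                  # saturated: halve the others,
                    if other is not it:              # hit item keeps the saturation value
                        other['conf'] //= 2
            return it
    victim = min(items, key=lambda it: it['conf'])   # miss: replace weakest item
    victim['delta'], victim['conf'] = first_delta, 1
    return victim
```

Halving competitors on saturation ages out stale first differences without ever letting a hot entry's confidence overflow.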
7. The CPU cache data prefetching method based on a merged address difference sequence according to claim 2, wherein in step S2, when performing multiple matching, an adaptive voting strategy obtains the best-matching complete sequence segment and the corresponding predicted target difference according to the confidence counts of the matched prefix subsequences and their respective matching lengths.
8. The CPU cache data prefetching method based on merged address difference value sequence as claimed in claim 7, wherein said step S2 comprises:
step S21: querying the data items of the difference mapping array (2) with the first difference of the difference sequence to be predicted;
step S22: according to the hit result of step S21, if the query hits, obtaining the number of the hit data item, forming matching tags of different lengths from the subsequent differences of the difference sequence to be predicted, and concurrently matching them against the data items of the corresponding data group in the difference sequence segment sub-table (3), wherein the last difference of the difference sequence segment in each successfully matched data item of the difference sequence segment sub-table (3) is a candidate difference to be prefetched; otherwise, ending the flow;
step S23: calculating the score of each candidate difference to be prefetched, the score being the sum, over all matches yielding that candidate, of the product of the confidence count of the matched difference sequence segment and its voting factor; then summing the scores of all candidate differences and computing each candidate's share of the total; if the share of one candidate difference exceeds the prefetch threshold, the complete sequence segment corresponding to that candidate is the best-matching complete sequence segment and that candidate is the predicted target difference; otherwise, the flow ends.
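Step S23's scoring and thresholding can be sketched as below. The voting factors supplied by the caller and the threshold value of 0.6 are assumptions for illustration; the claim fixes neither.

```python
def select_prediction(candidates, threshold=0.6):
    """Pick a predicted target difference from (delta, confidence, voting_factor)
    tuples, or return None if no candidate's score share exceeds the threshold."""
    scores = {}
    for delta, conf, factor in candidates:
        # score = sum over matches of confidence * voting factor
        scores[delta] = scores.get(delta, 0) + conf * factor
    total = sum(scores.values())
    if total == 0:
        return None
    best_delta, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_delta if best_score / total > threshold else None
```

Because every matched segment votes, a high-confidence short match can outweigh a weak long one, while the share threshold suppresses prefetches when the votes are split.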
9. The CPU cache data prefetching method based on a merged address difference sequence according to claim 8, wherein the length of a complete merged address difference sequence segment, the length of the difference sequence segments in the difference sequence segment sub-table (3), and the length of the prefix subsequence are all 3 differences, and the matching tag lengths comprise 1 difference and 2 differences.
10. The CPU cache data prefetching method based on a merged address difference sequence according to claim 2, further comprising step S4: appending the predicted target difference obtained in step S2 to the end of the current difference sequence and removing the oldest difference to keep the sequence length unchanged, thereby obtaining an updated difference sequence to be predicted; then repeating steps S2 and S3 together with this updating step until the number of repetitions of steps S2 and S3 reaches the predetermined prediction depth, whereupon the flow ends.
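Step S4's prediction-depth loop can be sketched as a sliding window over the difference sequence, with the actual prediction delegated to a caller-supplied function; the depth value and all names are illustrative assumptions.

```python
def multi_degree_prefetch(seq, base_addr, predict, depth=4):
    """Iteratively extend predictions to the given depth.

    seq: list of differences to be predicted; predict: function mapping a
    difference tuple to the next difference (or None); returns prefetch addresses.
    """
    addrs = []
    addr = base_addr
    for _ in range(depth):
        delta = predict(tuple(seq))
        if delta is None:            # no confident match: stop early
            break
        addr += delta                # step S3: target = current address + difference
        addrs.append(addr)
        seq = seq[1:] + [delta]      # step S4: append prediction, drop oldest
    return addrs
```

Each accepted prediction becomes part of the next query, so the sequence length stays fixed while the prefetch runs ahead by up to `depth` accesses.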
CN202110962555.4A 2021-08-20 2021-08-20 CPU cache data prefetching method based on merging address difference value sequence Active CN113656332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110962555.4A CN113656332B (en) 2021-08-20 2021-08-20 CPU cache data prefetching method based on merging address difference value sequence


Publications (2)

Publication Number Publication Date
CN113656332A true CN113656332A (en) 2021-11-16
CN113656332B CN113656332B (en) 2023-05-26

Family

ID=78480607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110962555.4A Active CN113656332B (en) 2021-08-20 2021-08-20 CPU cache data prefetching method based on merging address difference value sequence

Country Status (1)

Country Link
CN (1) CN113656332B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214146A (en) * 2011-07-28 2011-10-12 中国人民解放军国防科学技术大学 Step size adaptive Cache pre-fetching method and system
CN102521158A (en) * 2011-12-13 2012-06-27 北京北大众志微系统科技有限责任公司 Method and device for realizing data pre-fetching
CN103425600A (en) * 2013-08-23 2013-12-04 中国人民解放军国防科学技术大学 Address mapping method for flash translation layer of solid state drive
US20160350225A1 (en) * 2015-05-29 2016-12-01 Qualcomm Incorporated Speculative pre-fetch of translations for a memory management unit (mmu)
CN111143242A (en) * 2018-11-02 2020-05-12 华为技术有限公司 Cache prefetching method and device
CN111475535A (en) * 2020-03-09 2020-07-31 咪咕文化科技有限公司 Data storage and access method and device
CN112380148A (en) * 2020-11-30 2021-02-19 海光信息技术股份有限公司 Data transmission method and data transmission device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Yuqing et al.: "A cache replacement policy based on instruction-flow memory access pattern prediction", Journal of Computer Research and Development (《计算机研究与发展》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114116528A (en) * 2021-11-22 2022-03-01 深圳大学 Memory access address prediction method and device, storage medium and electronic equipment
CN114358179A (en) * 2021-12-31 2022-04-15 海光信息技术股份有限公司 Pre-fetch training method of processor, processing device, processor and computing equipment
CN114816734A (en) * 2022-03-28 2022-07-29 西安电子科技大学 Cache bypass system based on memory access characteristics and data storage method thereof
CN114816734B (en) * 2022-03-28 2024-05-10 西安电子科技大学 Cache bypass system based on memory access characteristics and data storage method thereof


Similar Documents

Publication Publication Date Title
CN113656332B (en) CPU cache data prefetching method based on merging address difference value sequence
EP0157175B1 (en) Prefetching mechanism for a high speed buffer store
US9582282B2 (en) Prefetching using a prefetch lookup table identifying previously accessed cache lines
CN106021128B (en) A kind of data pre-fetching device and its forecasting method based on stride and data dependence
US8904111B2 (en) Cache memory with CAM and SRAM sub-tags and generation control
CN105279240A (en) Client origin information associative perception based metadata pre-acquisition method and system
CN113986774A (en) Cache replacement system and method based on instruction stream and memory access mode learning
CN104077242A (en) Cache management method and device
US7499927B2 (en) Techniques for improving memory access patterns in tree-based data index structures
CN114579479A (en) Low-pollution cache prefetching system and method based on instruction flow mixed mode learning
WO2023035654A1 (en) Offset prefetching method, apparatus for executing offset prefetching, computer device, and medium
JP3149856B2 (en) Magnetic disk controller
US20050027963A1 (en) System and method for employing a process identifier to minimize aliasing in a linear-addressed cache
US7254681B2 (en) Cache victim sector tag buffer
CN116383100A (en) Cache data prefetching method and system based on merging bit vector access mode
JP3292189B2 (en) Processor performance data collection device and optimization method using the device
US7917694B1 (en) Method and system for finding maximal stripes in cache memory with content addressable memory
CN115495394A (en) Data prefetching method and data prefetching device
US10749545B1 (en) Compressing tags in software and hardware semi-sorted caches
US7000082B2 (en) Cache victim sector tag buffer
Zeng et al. Data locality exploitation in cache compression
JP3788121B2 (en) Cache server performance value calculation method and apparatus, and storage medium storing cache server performance value calculation program
CN118113345A (en) Adjustable self-adaptive data prefetching method and device and electronic equipment
Jiang et al. Matryoshka: A Coalesced Delta Sequence Prefetcher
CN112667530A (en) Data caching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant