CN117971731A - Hardware implementation device of LRU (least recently used) approximation algorithm, and updating method and device of LRU value - Google Patents
- Publication number
- CN117971731A (application number CN202311868919.8A)
- Authority
- CN
- China
- Prior art keywords
- lru
- cache
- value
- cache block
- paths
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/123—Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0877—Cache access modes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention provides a hardware implementation device of an LRU (least recently used) approximation algorithm, and an updating method and device of an LRU value. The hardware implementation device is applied to an N-way set-associative mapped Cache; the Cache comprises a plurality of sets, and each set comprises N ways of Cache blocks. The device comprises: a least recently used (LRU) register set, a buffer configuration unit, and an LRU updating unit. The LRU register set is arranged in the status flag bits of each Cache block and comprises M × log₂N registers, where M is the number of Cache blocks included in the Cache; the LRU register set is used for storing the LRU value of each Cache block. The LRU updating unit is used for updating the LRU value of a Cache block when the Cache hits or misses. The buffer configuration unit is used for initializing the LRU value of each Cache block and completing the numerical configuration of the buffer. The hardware implementation device of the LRU approximation algorithm provided by the invention can greatly improve the operation efficiency of the Cache.
Description
Technical Field
The present invention relates to the field of central processing units, and in particular, to a hardware implementation device of an LRU approximation algorithm, and an updating method and device of an LRU value.
Background
The chip multiprocessor (Chip Multiprocessors, CMP) architecture is a trend in future microprocessor architectures; as microprocessor architectures continue to evolve, improvements in memory speed lag far behind microprocessor processing frequencies. The CMP architecture integrates multiple microprocessor cores on one chip, where the multiple processor cores share a single main memory space, so the speed gap between the processors and the main memory becomes even more pronounced. Therefore, improving Cache performance is currently an urgent need.
Hit rate is one of the main indicators of Cache performance, and the Cache replacement algorithm directly affects the hit rate. Among replacement algorithms, the Least Recently Used (LRU) algorithm always selects, according to the usage of each block in a set, the block that has been unused for the longest time for replacement. It best reflects the locality of program behavior and achieves the highest hit rate.
Although the LRU algorithm is widely used and performs well, it requires considerable hardware support, which leads to high hardware overhead and low operation efficiency.
Disclosure of Invention
The invention provides a hardware implementation device of an LRU (least recently used) approximation algorithm, and a method and a device for updating an LRU value, which are used to solve the problems of high hardware overhead and low operation efficiency of the existing LRU algorithm during hardware implementation.
The invention provides a hardware implementation device of an LRU approximation algorithm, which is applied to an N-way set-associative mapped Cache, wherein the Cache comprises a plurality of sets, each set comprises N ways of Cache blocks, and the hardware implementation device of the LRU approximation algorithm comprises:
a least recently used (LRU) register set, a buffer configuration unit, and an LRU updating unit;
the LRU register set is arranged in the status flag bits of each Cache block, the LRU register set comprises M × log₂N registers, and M is the number of Cache blocks included in the Cache;
The LRU register set is used for storing the LRU value of each Cache block, and the registers are used for changing the LRU value of each Cache block;
The buffer configuration unit is used for initializing the LRU value of each Cache block and completing the numerical configuration of the buffer; the buffer is used for judging whether the LRU value of each Cache block needs to be updated, based on the LRU value of each Cache block and a preset LRU threshold, the value of which is configured by the CPU;
The LRU updating unit is configured to, when the hit Cache block is in the buffer, clear the LRU value of the target way corresponding to the hit Cache block and increment by one the LRU values of the Cache blocks in the buffer that are smaller than the LRU value of the target way.
In some embodiments, the LRU updating unit is further configured to:
maintain the LRU values of the other ways unchanged when the hit Cache block is in the buffer, wherein the other ways are the ways in the buffer whose LRU values are greater than or equal to the LRU value of the target way;
and maintain the LRU values of all ways in the target set corresponding to the hit Cache block unchanged when the hit Cache block is in the non-buffer area, wherein the non-buffer area comprises the Cache blocks whose LRU values are smaller than the preset threshold.
In some embodiments, the LRU updating unit is further configured to:
when the address accessing the Cache misses, select the Cache block of the way with the largest LRU value for replacement, and increment by one the LRU values of some of the ways, wherein those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value.
The invention provides a method for updating an LRU value, which is applied to the above hardware implementation device of the LRU approximation algorithm and comprises the following steps:
initializing the LRU value of each way of Cache block in the Cache;
determining whether the address accessing the Cache hits or misses;
when the address accessing the Cache misses, selecting the Cache block of the way with the largest LRU value for replacement, and incrementing by one the LRU values of some of the ways, wherein those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value;
when the address accessing the Cache hits, clearing the LRU value of the target way corresponding to the hit Cache block, and incrementing by one the LRU values of the Cache blocks in the buffer that are smaller than the LRU value of the target way, wherein the buffer comprises the Cache blocks whose LRU values are greater than or equal to a preset threshold.
In some embodiments, after determining whether the address accessing the Cache hits or misses, the method further comprises:
maintaining the LRU values of the other ways unchanged when the hit Cache block is in the buffer, wherein the other ways are the ways in the buffer whose LRU values are greater than or equal to the LRU value of the target way;
and maintaining the LRU values of all ways in the target set corresponding to the hit Cache block unchanged when the hit Cache block is in the non-buffer area, wherein the non-buffer area comprises the Cache blocks whose LRU values are smaller than the preset threshold.
In some embodiments, determining whether the address accessing the Cache hits or misses comprises:
comparing whether the address of the Cache matches the field of the Tag table;
determining that the address accessing the Cache hits if the address of the Cache matches the field of the Tag table;
and determining that the address accessing the Cache misses if the address of the Cache does not match the field of the Tag table.
The invention also provides an apparatus for updating an LRU value, which is applied to the above hardware implementation device of the LRU approximation algorithm and comprises:
an initialization module, configured to initialize the LRU value of each Cache block in the Cache;
a determining module, configured to determine whether the address accessing the Cache hits or misses;
and an updating module, configured to select the Cache block of the way with the largest LRU value for replacement when the address accessing the Cache misses, and to increment by one the LRU values of some of the ways, wherein those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor realizes the updating method of the LRU value according to any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of updating LRU values as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a method of updating an LRU value as described in any of the above.
According to the hardware implementation device of the LRU approximation algorithm and the updating method and device of the LRU value provided by the invention, a log₂N-bit LRU register set is added to each Cache block, and when the Cache hits or misses, the LRU updating unit updates the LRU values, so that the access history of each Cache block in each set is recorded. In addition, the buffer configuration unit allows the CPU to configure the size of the buffer, so that when the Cache hits, the LRU values of the ways in the buffer do not need to be updated on every access. On the premise of ensuring replacement accuracy, this effectively simplifies the complexity of implementing the LRU algorithm, can greatly improve the operation efficiency of the Cache, and effectively reduces the hardware implementation cost.
Drawings
In order to more clearly illustrate the technical solutions of the invention or of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly described below. It is obvious that the drawings in the following description show some embodiments of the invention, and that other drawings can be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a hardware implementation device of the LRU approximation algorithm provided by the present invention;
FIG. 2 is a schematic diagram of the internal structure of a Cache in the N-way set associative mapping manner provided by the invention;
FIG. 3 is a schematic diagram of an update flow of LRU values of a hardware implementation device of the LRU approximation algorithm provided by the present invention;
FIG. 4 is a flowchart illustrating a method for updating LRU values according to the present invention;
FIG. 5 is a schematic diagram of LRU values of a 4-way set associative Cache according to the method for updating LRU values provided by the present invention;
FIG. 6 is a schematic diagram of an apparatus for updating LRU values according to the present invention;
Fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the related art, three address mapping modes can be adopted between the Cache and the main memory: fully associative mapping, direct mapping, and set-associative mapping. A Cache using set-associative mapping has a low probability of block conflicts, high utilization, and a low block miss rate.
The hardware implementation device of the LRU approximation algorithm provided by the invention is a hardware implementation device of the LRU approximation algorithm under a set-associative mapped Cache structure. Specifically, it is applicable to an N-way set-associative mapped Cache, where N is a positive integer and N > 2.
The hardware implementation device of the LRU approximation algorithm, the updating method and the updating device of the LRU value of the present invention are described below with reference to fig. 1 to 7.
Fig. 1 is a schematic diagram of the hardware implementation device of the LRU approximation algorithm provided by the present invention. Referring to fig. 1, the hardware implementation device of the LRU approximation algorithm provided by the present invention is applied to an N-way set-associative mapped Cache, where the Cache includes a plurality of sets and each set includes N ways of Cache blocks, and the hardware implementation device of the LRU approximation algorithm includes:
a least recently used (LRU) register set 110, a buffer configuration unit 120, and an LRU updating unit 130;
the LRU register set 110 is arranged in the status flag bits of each Cache block; the LRU register set 110 comprises M × log₂N registers, where M is the number of Cache blocks included in the Cache;
the LRU register set 110 is configured to store the LRU value of each Cache block, and the registers are configured to change the LRU value of each Cache block;
the buffer configuration unit 120 is configured to initialize the LRU value of each Cache block and complete the numerical configuration of the buffer; the buffer is used to judge whether the LRU value of each Cache block needs to be updated, based on the LRU value of each Cache block and a preset LRU threshold configured by the central processing unit (Central Processing Unit, CPU).
The LRU updating unit 130 is configured to, when the hit Cache block is in the buffer, clear the LRU value of the target way corresponding to the hit Cache block and increment by one the LRU values of the Cache blocks in the buffer that are smaller than the LRU value of the target way.
Assume the N-way set-associative mapped Cache has M Cache blocks in total. Each set consists of N Cache blocks, i.e., each set has N ways of Cache blocks, and the number of sets of the Cache is Group = M/N. The internal structure of a Cache with N-way set-associative mapping is shown in Fig. 2.
The LRU register set 110 consists of M × log₂N registers and is used for storing the least recently used (LRU) value of each Cache block.
It should be noted that the LRU algorithm is a commonly used page replacement algorithm, which selects the page that has not been used for the longest time for elimination.
In actual implementation, an LRU register of log₂N bits needs to be added to the status flag bits (Status Tag bits) of each Cache block.
Taking a 4-way set-associative mapped Cache as an example, a 2-bit-wide LRU register needs to be added to the status flag bits of each Cache block, i.e., four 2-bit-wide LRU registers are added to each set, so the whole Cache needs M 2-bit-wide LRU registers in total.
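For illustration, the following minimal Python sketch computes this LRU register overhead; the function name and the example value M = 512 are assumptions made for the example, not figures taken from the patent.

```python
import math

def lru_register_bits(num_cache_blocks: int, num_ways: int) -> int:
    """Total LRU register bits for an N-way set-associative Cache.

    Each Cache block carries log2(N) LRU bits in its status flag bits,
    so the whole Cache needs M x log2(N) bits in total.
    """
    bits_per_block = int(math.log2(num_ways))  # log2(N); N is assumed to be a power of two
    return num_cache_blocks * bits_per_block

# Hypothetical example: a 4-way set-associative Cache with M = 512 Cache blocks
# needs 512 x 2 = 1024 LRU register bits (one 2-bit register per block).
print(lru_register_bits(num_cache_blocks=512, num_ways=4))  # -> 1024
```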
The buffer configuration unit 120 performs the initial binary encoding of the LRU values of the N ways in each set.
Taking the i-th set Group_i as an example, the initial encoding of each LRU value in Group_i is: Way0_i: (0)bin, Way1_i: (1)bin, …, Wayn_i: (n)bin.
Here Way0_i is the LRU value of way 0 of Group_i, Way1_i is the LRU value of way 1 of Group_i, and Wayn_i is the LRU value of way n of Group_i; the other sets are encoded in the same manner as Group_i.
When a replacement operation or a hit operation occurs to a Cache block, the LRU values of the corresponding set of Cache blocks are updated, and specific steps will be described in the following embodiments.
The LRU updating unit 130 is configured to update the LRU values when the Cache hits or misses; this operation does not cause overflow of the LRU values during the update process. The buffer configuration unit 120 mainly receives instructions sent by the CPU and is used to complete the numerical configuration of the buffer and the initialization of the LRU values.
The buffer configuration unit 120 is further configured to perform numerical configuration of the buffer.
First, the ways in each set of the Cache, i.e., Way0, Way1, …, Wayi, …, Wayn, are divided into two parts, Part0 and Part1, according to the size of their LRU values.
Part0 contains all the ways whose LRU value is smaller than M_lru, and Part1 contains all the ways whose LRU value is greater than or equal to M_lru, where M_lru is the preset LRU threshold.
It should be noted that the value of M_lru is configured by the CPU through a register; M_lru can be any integer between 2 and n, and the number of ways in Part0 and Part1 is adjusted by the value of M_lru.
Part1 is equivalent to a buffer used to judge whether the LRU values need to be updated when a Cache block hits, so that the LRU values do not need to be updated on every hit; the size of the buffer (the range of Part1) is adjusted by the value of M_lru.
Because the LRU value of each way changes continuously as Cache blocks are replaced, the members of Part0 and Part1 also change continuously with the LRU values.
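A minimal sketch of this partition, assuming one set is represented as a list of per-way LRU values; the function and variable names are illustrative only.

```python
def partition_ways(lru_values, m_lru):
    """Split the ways of one set into Part0 (non-buffer) and Part1 (buffer).

    lru_values[i] is the current LRU value of way i; m_lru is the preset
    LRU threshold configured by the CPU.  Part0 holds the ways whose LRU
    value is below the threshold, Part1 holds those at or above it.
    """
    part0 = [way for way, lru in enumerate(lru_values) if lru < m_lru]
    part1 = [way for way, lru in enumerate(lru_values) if lru >= m_lru]
    return part0, part1

# Initial state of a 4-way set: Way0=0, Way1=1, Way2=2, Way3=3
print(partition_ways([0, 1, 2, 3], m_lru=2))  # -> ([0, 1], [2, 3])
print(partition_ways([0, 1, 2, 3], m_lru=3))  # -> ([0, 1, 2], [3])
```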
Taking the i-th set Group_i of a 4-way set-associative mapped Cache as an example, the members of Part0 and Part1 change as follows:
In the initial state, the LRU value of each way is: Way0_i: 2'b00; Way1_i: 2'b01; Way2_i: 2'b10; Way3_i: 2'b11.
When M_lru = 2, Part0 contains the ways {Way0_i, Way1_i} and Part1 contains the ways {Way2_i, Way3_i}; when M_lru = 3, Part0 contains the ways {Way0_i, Way1_i, Way2_i} and Part1 contains the way {Way3_i}.
Assuming way 2 hits, the LRU value of Way2_i of the set is cleared, and the LRU values of the ways become: Way0_i: 2'b01; Way1_i: 2'b10; Way2_i: 2'b00; Way3_i: 2'b11.
Now, when M_lru = 2, Part0 contains the ways {Way2_i, Way0_i} and Part1 contains the ways {Way1_i, Way3_i}; when M_lru = 3, Part0 contains the ways {Way2_i, Way0_i, Way1_i} and Part1 contains the way {Way3_i}.
The LRU updating unit 130 is configured to update the LRU values when the LRU value of the hit Cache block falls in the buffer; otherwise, the LRU value of each way in the set remains unchanged. This is only an approximation of the LRU algorithm, but it effectively improves the operation efficiency of the Cache without affecting the hit rate.
If the Cache hits, the CPU fetches the content directly from the Cache, and the LRU updating unit 130 judges, according to the LRU value of the currently hit Cache block, whether to update the LRU value of each way in the set of the hit Cache block.
According to the hardware implementation device of the LRU approximation algorithm provided by the invention, a log₂N-bit LRU register set is added to each Cache block, and when the Cache hits or misses, the LRU updating unit updates the LRU values to record the access history of each Cache block in each set. In addition, the buffer configuration unit configures the buffer through the CPU, so that when the Cache hits, the LRU values of the ways in the buffer do not need to be updated on every access. On the premise of ensuring replacement accuracy, this effectively simplifies the complexity of implementing the LRU algorithm, can greatly improve the operation efficiency of the Cache, and effectively reduces the hardware implementation cost.
In some embodiments, the LRU updating unit 130 is further configured to:
maintain the LRU values of the other ways unchanged when the hit Cache block is in the buffer, wherein the other ways are the ways in the buffer whose LRU values are greater than or equal to the LRU value of the target way;
and maintain the LRU values of all ways in the target set corresponding to the hit Cache block unchanged when the hit Cache block is in the non-buffer area, wherein the non-buffer area comprises the Cache blocks whose LRU values are smaller than the preset threshold.
In actual execution, when the access address issued by the CPU falls within Group_i, assume the j-th Cache block (Way_ji) hits. It is first determined whether the hit Cache block Way_ji is in the non-buffer Part0 or the buffer Part1; if the hit Cache block is in the non-buffer Part0, the LRU values of all ways in the target set corresponding to the hit Cache block remain unchanged. The j-th way, i.e., the hit Cache block, corresponds to the target way.
If the hit Cache block Way_ji is in the buffer Part1, the LRU register bits of the j-th way are cleared, i.e., the LRU value of the target way is cleared to zero; meanwhile, the Cache blocks in the set whose LRU values are smaller than that of the j-th way are found and their LRU values are incremented by one, while the LRU values of the other ways remain unchanged.
In some embodiments, the LRU updating unit 130 is further configured to:
when the address accessing the Cache misses, select the Cache block of the way with the largest LRU value for replacement, and increment by one the LRU values of some of the ways, where those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value.
If the content to be accessed is not in the Cache, the Cache block with the largest LRU value in the set is selected for replacement. Assuming the LRU value of the j-th way of set Group_i is the largest, the LRU register bits of the j-th way (Way_ji) are cleared, the Cache blocks in the set whose LRU values are smaller than that of the j-th way are found and their LRU values are incremented by one, and the LRU values of the other ways remain unchanged. The update flow of the LRU values is shown in Fig. 3.
If the Cache misses, the CPU operates on the corresponding content in the main memory through the Cache, selects the Cache block of the way with the largest LRU value for replacement, and updates the LRU value of each way in the set where the replaced Cache block is located.
In actual execution, if the largest LRU value is that of the j-th way, Way_j, the LRU value of the j-th way is cleared to zero; the LRU values of all ways whose LRU values are greater than that of Way_j remain unchanged, and the LRU values of all ways whose LRU values are smaller than that of Way_j are incremented by one.
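A matching sketch of the miss-side update, again as an illustrative software model with assumed names; it selects the victim way and renumbers the remaining LRU values, which also shows why the values never overflow their log₂N-bit range.

```python
def update_on_miss(lru_values):
    """Update one set's LRU values on a miss.

    The way with the largest LRU value is the victim: its LRU value is
    cleared, and every way whose LRU value was smaller than the victim's
    is incremented by one.  Returns (victim_way, new_lru_values).
    Because only values below the maximum are incremented, no value ever
    exceeds N - 1, so the log2(N)-bit registers cannot overflow.
    """
    victim = max(range(len(lru_values)), key=lambda way: lru_values[way])
    victim_lru = lru_values[victim]
    updated = []
    for way, lru in enumerate(lru_values):
        if way == victim:
            updated.append(0)            # clear the replaced way
        elif lru < victim_lru:
            updated.append(lru + 1)      # smaller values are incremented by one
        else:
            updated.append(lru)
    return victim, updated

# Example: LRU values 1,2,0,3 -> way 3 is replaced and the values become 2,3,1,0.
print(update_on_miss([1, 2, 0, 3]))  # -> (3, [2, 3, 1, 0])
```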
In order to improve the operation efficiency of the Cache, the hardware implementation device of the LRU approximation algorithm provided by the invention modifies the LRU value updating method accordingly: when a Cache block is replaced, the LRU values of all ways in the set are updated; when the Cache hits, only some of the ways' LRU values inside the set are updated.
According to the principle of program locality, after a missed address causes a Cache block replacement, the probability that the address of that Cache block will be accessed in the near future is predicted to be high. Because updating and replacing the address of a Cache block takes several clock cycles, updating the LRU values of the ways at the same time as the Cache block address is updated does not affect the memory access efficiency of the whole Cache.
Therefore, when a Cache block address is replaced, the LRU values of the ways in the set of the replaced Cache block are updated. A hit on a Cache block indicates that the block hit shortly after its address was loaded into the Cache, and the LRU values of the ways in the set would normally be updated according to the update mechanism of the LRU algorithm. However, an efficient microprocessor design often adopts a shorter pipeline so that an instruction/data hit in the Cache can return early; that is, the fewer clock cycles the hit operation takes the better, and frequent LRU value update operations affect the access speed of the whole Cache.
For an N-way set-associative mapped Cache where N is greater than 2, a Cache block is replaced only when its LRU value is (n)bin, the maximum, so the LRU values do not need to be updated on every Cache hit.
The execution subject of the update method of LRU value provided by the present invention may be an electronic device, a component in an electronic device, an integrated circuit, or a chip. The electronic device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a cell phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (UMPC), netbook or Personal Digital Assistant (PDA), etc., and the non-mobile electronic device may be a server, network attached storage (Network Attached Storage, NAS), personal computer (personal computer, PC), television (TV), teller machine or self-service machine, etc., and the invention is not limited in particular.
The technical scheme of the present invention will be described in detail below by taking a computer as an example to execute the method for updating the LRU value provided by the present invention.
Fig. 4 is a flowchart illustrating a method for updating LRU values according to the present invention.
Referring to fig. 4, the method for updating the LRU value provided by the present invention is applied to the hardware implementation device of the LRU approximation algorithm in the above embodiments and includes:
step 410, initializing the LRU value of each way of Cache block in the Cache;
step 420, determining whether the address accessing the Cache hits or misses;
step 430, when the address accessing the Cache misses, selecting the Cache block of the way with the largest LRU value for replacement, and incrementing by one the LRU values of some of the ways, where those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value;
step 440, when the address accessing the Cache hits, clearing the LRU value of the target way corresponding to the hit Cache block, and incrementing by one the LRU values of the Cache blocks in the buffer that are smaller than the LRU value of the target way, where the buffer includes the Cache blocks whose LRU values are greater than or equal to the preset threshold.
In the related art, the future accesses to each address cannot be predicted, whereas the past accesses to an address are easy to track, so only the "recent past" can be used to predict the "near future".
The essence of the LRU algorithm is to select, for replacement, the address that has not been accessed for the longest time. When a Cache block replacement occurs, the block that is least likely to be used in the "near future" is selected for replacement, and this block is then marked as belonging to the range of addresses most likely to be accessed in the recent period.
The LRU algorithm predicts future Cache accesses from past accesses, so a mechanism is needed to record the accesses to each Cache block over a period of time.
The invention marks the access history of a Cache block in the recent period by updating the LRU values when a Cache block is replaced or hit. When the CPU accesses memory, it first judges whether the content to be accessed is in the Cache. If so, this is called a Cache hit; the CPU fetches the content directly from the Cache and updates the LRU value of each way in the set where the hit Cache block is located. If not, this is called a Cache miss; the CPU operates on the corresponding content in the main memory through the Cache, selects the way with the largest LRU value for replacement, and updates the LRU value of each way in the set where the replaced Cache block is located. However, such frequent LRU value updates do not improve the Cache hit rate; instead, they reduce the memory access efficiency of the Cache.
Taking a 4-way set-associative mapped Cache as an example, the initialized value of each way in each set is Way0: 2'b00; Way1: 2'b01; Way2: 2'b10; Way3: 2'b11. The LRU values of one set of the 4-way set-associative Cache are organized as shown in Fig. 5, where a replacement bit of "1" indicates that the Cache block of that way will be replaced when a Cache block replacement occurs.
Taking set Group_i of the N-way set-associative mapped Cache as an example, when a Cache block in the set hits or misses, the LRU value updating process of each way of Cache blocks in the set includes the following steps:
Step one: set the initial LRU values of the Cache blocks of each way; the initial values from Way0 to Wayn are:
Way0: (0)bin; Way1: (1)bin; …; Wayn: (n)bin.
In this step, a different initial LRU value is set for each way, and the values must satisfy:
LRU_init0 < LRU_init1 < LRU_init2 < … < LRU_initn.
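A one-line sketch of this initialization, assuming way i simply starts at value i, which satisfies the ordering constraint above; the function name is illustrative.

```python
def init_lru_values(num_ways):
    """Initial LRU encoding for one set: way i starts at value i, so
    LRU_init0 < LRU_init1 < ... < LRU_init(n) holds by construction."""
    return list(range(num_ways))

print(init_lru_values(4))  # -> [0, 1, 2, 3], i.e. 2'b00, 2'b01, 2'b10, 2'b11
```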
Step two: judge whether the address with which the CPU accesses the Cache hits; if it hits, go to step five; if it misses, go to step three.
In some embodiments, determining whether the address accessing the Cache hits or misses comprises:
comparing whether the address of the Cache matches the field of the Tag table;
determining that the address accessing the Cache hits if the address matches the field of the Tag table;
and determining that the address accessing the Cache misses if the address does not match the field of the Tag table.
In this step, the address of the Cache is compared with the field of the Tag table; a match indicates a hit, and no match indicates a miss.
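A minimal sketch of this hit/miss determination for a set-associative Cache, assuming the number of sets is a power of two and that the Tag table and valid bits are plain Python lists; the field split and helper names are assumptions, not the patent's concrete hardware.

```python
import math

def lookup(address, tag_table, valid_bits, num_sets, block_offset_bits):
    """Determine hit or miss by comparing the address Tag with the Tag table.

    The address is split into Tag | set index | block offset.  The Tag is
    compared with the stored Tag of every way in the selected set; a match
    on a valid way is a hit, otherwise the access misses.
    """
    index_bits = int(math.log2(num_sets))
    set_index = (address >> block_offset_bits) & (num_sets - 1)
    tag = address >> (block_offset_bits + index_bits)
    for way, stored_tag in enumerate(tag_table[set_index]):
        if valid_bits[set_index][way] and stored_tag == tag:
            return True, way   # Tag matches a valid way: the access hits
    return False, None         # no match: the access misses

# Hypothetical 2-set, 2-way example with 64-byte blocks (6 offset bits).
tags = [[0x12, 0x34], [0x56, 0x78]]
valid = [[True, True], [True, False]]
print(lookup((0x34 << 7) | 0x10, tags, valid, num_sets=2, block_offset_bits=6))  # -> (True, 1)
```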
Step three: when the address with which the CPU accesses the Cache misses, select the way with the largest LRU value for replacement.
Assuming the largest LRU value is that of the j-th way, Way_j, the LRU value of the j-th way is cleared to zero. Then go to step four.
Step four: the LRU values of the Cache blocks of all ways in the set whose LRU values are greater than that of Way_j remain unchanged; the LRU values of the Cache blocks of all ways whose LRU values are smaller than that of Way_j are incremented by one. Then go to step two.
Step five: when the address with which the CPU accesses the Cache hits, assume the Cache block of way Way_j is hit, and judge whether the Cache block of Way_j is in the non-buffer Part0 or the buffer Part1. If it is in the non-buffer Part0, go to step six; if it is in the buffer Part1, go to step seven.
In some embodiments, the method further comprises:
maintaining the LRU values of the other ways unchanged when the hit Cache block is in the buffer, where the other ways are the ways in the buffer whose LRU values are greater than or equal to the LRU value of the target way;
and maintaining the LRU values of all ways in the target set corresponding to the hit Cache block unchanged when the hit Cache block is in the non-buffer area, where the non-buffer area comprises the Cache blocks whose LRU values are smaller than the preset threshold.
Step six: when the hit Cache block is in the non-buffer area, maintain the LRU values of all ways in the target set corresponding to the hit Cache block unchanged, and go to step two.
Step seven: when the hit Cache block is in the buffer, clear the LRU value of Way_j; the LRU values of the Cache blocks of all ways whose LRU values are greater than that of Way_j remain unchanged, and the LRU values of the Cache blocks of all ways whose LRU values are smaller than that of Way_j are incremented by one. Then go to step two.
It will be appreciated that Way0, Way1, …, Wayn in these steps refer to the Cache blocks of the ways of set Group_i, and the LRU value updating process of each way of the other sets is the same as that of this set.
The method for updating the LRU value provided by the invention effectively simplifies the complexity of implementing the LRU algorithm on the premise of ensuring replacement accuracy, making the LRU algorithm more efficient to implement; it can greatly improve the operation efficiency of the Cache and effectively reduce the hardware implementation cost.
The LRU value updating device provided by the present invention will be described below, and the LRU value updating device described below and the LRU value updating method described above may be referred to correspondingly to each other.
Fig. 6 is a schematic structural diagram of the apparatus for updating LRU values provided by the present invention. Referring to fig. 6, the apparatus for updating the LRU value provided by the invention is applied to the hardware implementation device of the LRU approximation algorithm in the above embodiments, and the apparatus for updating the LRU value includes:
an initialization module 610, configured to initialize the LRU value of each Cache block in the Cache;
a determining module 620, configured to determine whether the address accessing the Cache hits or misses;
and an updating module 630, configured to select the Cache block of the way with the largest LRU value for replacement when the address accessing the Cache misses, and to increment by one the LRU values of some of the ways, where those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value.
The apparatus for updating the LRU value provided by the invention effectively simplifies the complexity of implementing the LRU algorithm on the premise of ensuring replacement accuracy, making the LRU algorithm more efficient to implement; it can greatly improve the operation efficiency of the Cache and effectively reduce the hardware implementation cost.
In some embodiments, the updating module 630 is further configured to:
after it is determined whether the address accessing the Cache hits or misses, maintain the LRU values of the other ways unchanged when the hit Cache block is in the buffer, where the other ways are the ways in the buffer whose LRU values are greater than or equal to the LRU value of the target way;
and maintain the LRU values of all ways in the target set corresponding to the hit Cache block unchanged when the hit Cache block is in the non-buffer area, where the non-buffer area comprises the Cache blocks whose LRU values are smaller than the preset threshold.
In some embodiments, the determining module 620 is specifically configured to:
compare whether the address of the Cache matches the field of the Tag table;
determine that the address accessing the Cache hits if the address matches the field of the Tag table;
and determine that the address accessing the Cache misses if the address does not match the field of the Tag table.
Fig. 7 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 7, the electronic device may include: a processor 710, a communications interface 720, a memory 730, and a communication bus 740, wherein the processor 710, the communications interface 720 and the memory 730 communicate with each other via the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform the method for updating the LRU value, applied to the hardware implementation device of the LRU approximation algorithm, which comprises:
initializing the LRU value of each way of Cache block in the Cache;
determining whether the address accessing the Cache hits or misses;
when the address accessing the Cache misses, selecting the Cache block of the way with the largest LRU value for replacement, and incrementing by one the LRU values of some of the ways, where those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value;
when the address accessing the Cache hits, clearing the LRU value of the target way corresponding to the hit Cache block, and incrementing by one the LRU values of the Cache blocks in the buffer that are smaller than the LRU value of the target way, where the buffer comprises the Cache blocks whose LRU values are greater than or equal to a preset threshold.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer readable storage medium; when the computer program is executed by a processor, the computer can perform the method for updating the LRU value provided by the above methods, which is applied to the hardware implementation device of the LRU approximation algorithm and comprises:
initializing the LRU value of each way of Cache block in the Cache;
determining whether the address accessing the Cache hits or misses;
when the address accessing the Cache misses, selecting the Cache block of the way with the largest LRU value for replacement, and incrementing by one the LRU values of some of the ways, where those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value;
when the address accessing the Cache hits, clearing the LRU value of the target way corresponding to the hit Cache block, and incrementing by one the LRU values of the Cache blocks in the buffer that are smaller than the LRU value of the target way, where the buffer comprises the Cache blocks whose LRU values are greater than or equal to a preset threshold.
In still another aspect, the present invention provides a non-transitory computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method for updating the LRU value provided by the above methods, which is applied to the hardware implementation device of the LRU approximation algorithm and comprises:
initializing the LRU value of each way of Cache block in the Cache;
determining whether the address accessing the Cache hits or misses;
when the address accessing the Cache misses, selecting the Cache block of the way with the largest LRU value for replacement, and incrementing by one the LRU values of some of the ways, where those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value;
when the address accessing the Cache hits, clearing the LRU value of the target way corresponding to the hit Cache block, and incrementing by one the LRU values of the Cache blocks in the buffer that are smaller than the LRU value of the target way, where the buffer comprises the Cache blocks whose LRU values are greater than or equal to a preset threshold.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without inventive effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A hardware implementation device of an LRU approximation algorithm, applied to an N-way set-associative mapped Cache, wherein the Cache comprises a plurality of sets, each set comprises N ways of Cache blocks, and the hardware implementation device of the LRU approximation algorithm comprises:
a least recently used (LRU) register set, a buffer configuration unit, and an LRU updating unit;
the LRU register set is arranged in the status flag bits of each Cache block, the LRU register set comprises M × log₂N registers, and M is the number of Cache blocks included in the Cache;
the LRU register set is used for storing the LRU value of each Cache block, and the registers are used for changing the LRU value of each Cache block;
the buffer configuration unit is used for initializing the LRU value of each Cache block and completing the numerical configuration of the buffer; the buffer is used for judging whether the LRU value of each Cache block needs to be updated, based on the LRU value of each Cache block and a preset LRU threshold, the value of which is configured by the CPU;
and the LRU updating unit is configured to, when the hit Cache block is in the buffer, clear the LRU value of the target way corresponding to the hit Cache block and increment by one the LRU values of the Cache blocks in the buffer that are smaller than the LRU value of the target way.
2. The hardware implementation device of the LRU approximation algorithm according to claim 1, wherein the LRU updating unit is further configured to:
maintain the LRU values of the other ways unchanged when the hit Cache block is in the buffer, wherein the other ways are the ways in the buffer whose LRU values are greater than or equal to the LRU value of the target way;
and maintain the LRU values of all ways in the target set corresponding to the hit Cache block unchanged when the hit Cache block is in the non-buffer area, wherein the non-buffer area comprises the Cache blocks whose LRU values are smaller than the preset threshold.
3. The hardware implementation device of the LRU approximation algorithm according to claim 1, wherein the LRU updating unit is further configured to:
when the address accessing the Cache misses, select the Cache block of the way with the largest LRU value for replacement, and increment by one the LRU values of some of the ways, wherein those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value.
4. A method for updating an LRU value, characterized in that it is applied to the hardware implementation device of the LRU approximation algorithm according to any one of claims 1 to 3, and the method for updating the LRU value comprises:
initializing the LRU value of each way of Cache block in the Cache;
determining whether the address accessing the Cache hits or misses;
when the address accessing the Cache misses, selecting the Cache block of the way with the largest LRU value for replacement, and incrementing by one the LRU values of some of the ways, wherein those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value;
when the address accessing the Cache hits, clearing the LRU value of the target way corresponding to the hit Cache block, and incrementing by one the LRU values of the Cache blocks in the buffer that are smaller than the LRU value of the target way, wherein the buffer comprises the Cache blocks whose LRU values are greater than or equal to a preset threshold.
5. The method for updating an LRU value according to claim 4, wherein after determining whether the address accessing the Cache hits or misses, the method further comprises:
maintaining the LRU values of the other ways unchanged when the hit Cache block is in the buffer, wherein the other ways are the ways in the buffer whose LRU values are greater than or equal to the LRU value of the target way;
and maintaining the LRU values of all ways in the target set corresponding to the hit Cache block unchanged when the hit Cache block is in the non-buffer area, wherein the non-buffer area comprises the Cache blocks whose LRU values are smaller than the preset threshold.
6. The method for updating an LRU value according to claim 4, wherein determining whether the address accessing the Cache hits or misses comprises:
comparing whether the address of the Cache matches the field of the Tag table;
determining that the address accessing the Cache hits if the address of the Cache matches the field of the Tag table;
and determining that the address accessing the Cache misses if the address of the Cache does not match the field of the Tag table.
7. An apparatus for updating an LRU value, characterized in that it is applied to the hardware implementation device of the LRU approximation algorithm according to any one of claims 1 to 4, and the apparatus for updating the LRU value comprises:
an initialization module, configured to initialize the LRU value of each Cache block in the Cache;
a determining module, configured to determine whether the address accessing the Cache hits or misses;
and an updating module, configured to select the Cache block of the way with the largest LRU value for replacement when the address accessing the Cache misses, and to increment by one the LRU values of some of the ways, wherein those ways are the ways whose LRU values are smaller than the largest LRU value in the set corresponding to the way with the largest LRU value.
8. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for updating an LRU value according to any one of claims 4 to 6.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements a method of updating LRU values according to any one of claims 4 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements a method of updating LRU values according to any of claims 4 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311868919.8A CN117971731A (en) | 2023-12-29 | 2023-12-29 | Hardware implementation device of LRU (least recently used) approximation algorithm, and updating method and device of LRU value |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117971731A (en) | 2024-05-03 |
Family
ID=90855559
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311868919.8A Pending CN117971731A (en) | 2023-12-29 | 2023-12-29 | Hardware implementation device of LRU (least recently used) approximation algorithm, and updating method and device of LRU value |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117971731A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||