WO2019128958A1 - Cache replacement technique - Google Patents

Cache replacement technique

Info

Publication number
WO2019128958A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
data
path
written
sample data
Prior art date
Application number
PCT/CN2018/123362
Other languages
English (en)
French (fr)
Inventor
朗诺斯弗洛里安
邬可俊
杨伟
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2019128958A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12 Replacement control
    • G06F12/121 Replacement control using replacement algorithms
    • G06F12/123 Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list

Definitions

  • The present invention relates to the field of storage technologies, and in particular, to a cache replacement technique.
  • A cache is a memory capable of high-speed data exchange. Because of its fast access rate, it exchanges data with the central processing unit (CPU) in preference to main memory.
  • When the CPU wants to read data, it first looks in the cache; if the data is found, it is read immediately and sent to the CPU for processing. If it is not found, the data is read from the relatively slow main memory and sent to the CPU for processing, and at the same time the data block containing it is loaded into the cache, so that subsequent reads of the whole block can be served from the cache without calling main memory again. In this way, the access speed of the computer system is improved.
  • Conventionally, the cache is implemented with static random access memory (SRAM), but the static leakage of SRAM causes high system power consumption. Moreover, SRAM cells are increasingly difficult to shrink, creating a bottleneck in hardware implementation and limiting the cache's storage space. With the development of storage technology, more and more non-volatile storage media, which offer fast access and low static power consumption, are gradually being used as caches. However, when a non-volatile memory (NVM) is used as the cache, although NVM has no static leakage problem, that is, its static power consumption is small, its write power consumption is large, so writing data into the cache still consumes considerable system power.
  • The embodiments of the present application provide a cache replacement technique that can reduce memory power consumption and improve memory access speed.
  • In a first aspect, the present application provides a cache replacement method. The method is applied to a computer system that includes a cache; the cache includes a cache controller and a storage medium, connected to the cache controller, for caching data, the storage medium being a non-volatile storage medium.
  • According to the method, after receiving a write request and determining, according to the access address, that no corresponding cache line is cached, the cache controller determines N candidate ways from the cache set corresponding to the access address. The cache includes multiple cache sets; each cache set contains M ways, each way containing one cache line; the value of N is not less than 2, and M is greater than N.
  • Further, the cache controller compares the data to be written with the sample data of each of the N candidate ways to obtain N Hamming distances, and takes the cache line in the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced. The sample data has the same length as the data to be written, and the Hamming distance indicates the number of corresponding bits in which two pieces of data of the same length differ.
  • Then, the cache controller writes the data to be written into the storage medium, the data to be written being used to replace the cache line to be replaced.
  • In the cache replacement method provided by the embodiments of the present invention, when a non-volatile storage medium is used as the cache, multiple candidate ways are selected from among the least recently used (LRU) ways of the cache set corresponding to the access address; the data to be written is compared with the sample data of these candidate ways to obtain multiple Hamming distances, and the cache line in the way corresponding to the smallest Hamming distance is taken as the cache line to be replaced. Because the candidate ways are selected from the LRU ways of the cache set, the impact of replacing a cache line on the cache hit ratio is reduced. Moreover, because the cache line to be replaced is chosen from the candidate ways according to the Hamming distance, which reflects the similarity of two pieces of data, the amount of data actually written when the data to be written replaces the victim cache line is reduced, saving write power and lowering the system's write overhead.
  • In a first possible implementation, the method further includes: the cache controller obtains the sample data of the i-th way in the cache set according to the cache line of the i-th way in each of the multiple cache sets, where the sample data of the i-th way has the same length as the cache line in the i-th way, the i-th way is any one of the M ways, and i is greater than or equal to 1 and less than or equal to M.
  • In a second possible implementation, the cache controller uses a fuzzy pseudo least recently used (PLRU) algorithm to determine the N candidate ways from among the least recently used (LRU) ways of the cache set, where N = 2^n and n is an integer not less than 1.
  • In a third possible implementation, the sample data of the i-th way is the same across the different cache sets of the cache, where i is greater than or equal to 0 and less than or equal to M-1.
  • In a fourth possible implementation, the method further includes: the cache controller counts, for each bit of the i-th way's cache line, the number of times a first preset value is written, and updates the corresponding bits in the sample data of the i-th way according to those counts, to obtain updated sample data of the i-th way. The first preset value is "1" or "0".
  • In a fifth possible implementation, the sample data of different ways in a cache set differ.
  • In a second aspect, an embodiment of the present application provides a computer system. The computer system includes a cache controller and a cache connected to the cache controller; the cache is a non-volatile memory, and the cache controller is configured to perform the cache replacement method of the first aspect and its various possible implementations.
  • In a third aspect, an embodiment of the present application provides a cache controller. The cache controller is applied to a computer system that includes a non-volatile cache, and includes modules configured to perform the cache replacement method of the first aspect and any of its possible implementations.
  • In a fourth aspect, the present application provides a computer program product, including a computer readable storage medium storing program code, the program code including instructions for performing at least one of the cache replacement methods of the first aspect and any of its implementations.
  • FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of set-associative mapping according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a cache replacement method according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of an access address according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a search tree according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a cache controller according to an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of the present invention. As shown in FIG. 1, the computer system 100 may include at least a processor 105, a cache 110, a memory controller 115, and a memory 120.
  • The processor 105 is the core of the computer system 100 and can invoke different software programs in the computer system 100 to implement different functions. For example, the processor 105 can access the cache 110 and the memory 120.
  • It can be understood that the processor 105 may be a central processing unit (CPU). Besides a CPU, the processor may also be an application-specific integrated circuit (ASIC) or one or more integrated circuits configured to implement the embodiments of the present invention. In practice, the computer system may also include multiple processors; for ease of description, the embodiments of the present invention take one processor as an example. In addition, the processor may be a single-core or multi-core processor; in a multi-core architecture, the processor may include multiple processor cores. For example, as shown in FIG. 1, the processor 105 may include one or more CPU cores 108. It can be understood that a CPU core is just one example of a processor core; the embodiments of the present invention do not limit the number of processors or the number of processor cores in a processor.
  • The cache 110 is a temporary memory located between the processor 105 and the memory 120 in the computer system 100. Because the cache 110 is fast, it is used for high-speed data exchange with the CPU core 108. The cache 110 caches part of the data in the memory 120 as well as data to be written to the memory 120. When the processor 105 accesses data, it first looks in the cache 110; only when the cache 110 does not hold the data the processor 105 needs does the processor 105 access the memory 120, which speeds up the processor's accesses.
  • Specifically, the cache 110 may include a cache controller 112 and a storage medium 114. The cache controller 112 is the control circuit in the cache 110 and can access the storage medium 114.
  • For example, the cache controller 112 may return data cached in the storage medium 114 to the processor 105 according to a read instruction sent by the processor 105, and may cache data to be written into the storage medium 114 of the cache 110 according to a write instruction of the processor 105. The cache controller 112 can also manage the data cached in the storage medium 114.
  • Those skilled in the art will know that the cache 110 may be integrated into the processor 105; in a multi-core processor system, each CPU core 108 may include a cache 110.
  • Typically, the cache 110 is composed of static random access memory (SRAM). However, SRAM cells are increasingly difficult to shrink, so the capacity of an SRAM cache is limited, and SRAM also suffers from problems such as static leakage. Therefore, with the development of storage technology, more and more non-volatile storage media are used as caches. For example, the storage medium 114 may include phase-change random access memory (PCM), resistive random access memory (RRAM), spin torque transfer random access memory (STT-RAM), and the like.
  • The memory controller 115 is the component of the computer system 100 that controls the memory 120 and exchanges data between the memory 120 and the processor 105 (e.g., a CPU). In practice, in one case the memory controller 115 may be located inside the north bridge chip; in another case, the memory controller 115 may be integrated into the processor 105, specifically on the substrate of the processor 105. It can be understood that when the memory controller 115 is located inside the north bridge chip, it must exchange data with the processor through the north bridge chip, which causes relatively large data latency; when the memory controller 115 is integrated into the processor 105, it can exchange data with the processor directly.
  • As shown in FIG. 1, the memory controller 115 may be connected to the memory 120 via a memory bus (e.g., a double data rate (DDR) bus). It can be understood that, in practice, the memory controller 115 may also communicate with the memory 120 through other types of buses, such as a PCI Express bus or a desktop management interface (DMI) bus.
  • The memory 120 is used to store the running software of the operating system, input and output data, and information exchanged with external storage. The memory 120 may also be called main memory and has the advantage of fast access. In traditional computer system architectures, dynamic random access memory (DRAM) is generally used as the memory 120. With the development of non-volatile memory (NVM) technology, new NVMs such as phase-change random access memory (PCM), resistive random access memory (RRAM), magnetic random access memory (MRAM), and ferroelectric random access memory (FRAM) are also gradually being used as memory. The specific storage medium type of the memory 120 is not limited in the embodiments of the present invention.
  • The cache 110 is primarily used to cache part of the data stored in main memory (e.g., the memory 120 in FIG. 1) and data to be written to main memory. Because the capacity of the cache is relatively small compared with main memory, the cache holds only a subset of the contents of main memory, and data exchange between the cache and main memory is in units of blocks.
  • To cache the data of main memory in the cache, some function must be applied to map main memory addresses into the cache; this is called address mapping. After the data in main memory is cached according to this mapping relationship, when the CPU executes a program, the main memory addresses in the program are translated into cache addresses.
  • Cache address mapping usually takes the form of direct mapping or set-associative mapping. In direct mapping, a block in main memory can be mapped only to one specific block of the cache. Direct mapping is the simplest address mapping: the hardware is simple, the cost is low, and address translation is fast. However, it is not flexible enough, and the cache's storage space is not fully utilized; because each memory block can be stored in only one fixed location in the cache, conflicts arise easily and cache efficiency drops.
  • To improve the cache hit ratio, more and more storage systems use set-associative mapping. In the embodiments of the present invention, the cache 110 and the memory 120 also use set-associative mapping; to facilitate understanding of the solution, the set-associative mapping used by the memory 120 and the cache 110 is described below.
  • In set-associative mapping, both main memory and the cache are divided into multiple sets, and the number of blocks in one set of main memory equals the number of sets in the cache. Which cache set a memory block is stored in is fixed, while which block within that set is flexible.
  • For example, suppose main memory is divided into 256 sets of 8 blocks each, and the cache is divided into 8 sets of 2 blocks each. Blocks 0, 8, and so on of main memory all map to set 0 of the cache, but each may be placed in block 0 or block 1 of cache set 0; blocks 1, 9, and so on of main memory all map to set 1 of the cache, but each may be placed in block 2 or block 3 of cache set 1.
  • In a set-associative cache, each set may contain 2, 4, 8, or 16 blocks; the cache is then called a 2-way, 4-way, 8-way, or 16-way set-associative cache, respectively. It should be noted that a "group" in the embodiments of the present invention may also be referred to as a "set".
  • In the embodiments of the present invention, the data in the memory 120 is mapped into the cache 110 using set-associative mapping. For ease of description, a data block in the memory 120 may also be called a memory block, and a data block in the cache 110 may be called a cache block or a cache line. Typically, a memory block may be 4 KB (kilobytes) in size, and a cache line may also be 4 KB. Understandably, in practice, memory blocks and cache lines may also be set to other sizes; the size of a memory block is the same as the size of a cache line.
  • FIG. 2 shows the mapping between the memory 120 and the cache 110 in an embodiment of the present invention; specifically, FIG. 2 illustrates set-associative mapping between memory and cache.
  • The cache 110 may include multiple cache sets, and each cache set may include multiple cache lines. Put differently, each cache set may include multiple ways of data, each way having a cache entry; that is, a cache entry indicates a specific way, or cache line, in a cache set.
  • For example, the NVM 10 includes multiple cache sets such as cache set 1 and cache set 2; cache set 1 contains three ways, indicated by the following three cache entries: cache entry 200_1, cache entry 200_2, and cache entry 200_3.
  • The storage space of the memory 120 is likewise divided into multiple memory sets: set 1 210_1, set 2 210_2, ..., set N 210_N. Under set-associative mapping, the memory block corresponding to any storage address in set 1 210_1 can be mapped to one cache set of the cache 110 but may be freely placed in any way of that cache set. For example, the memory block corresponding to any storage address in set 1 210_1 can be mapped into cache set 1 of the cache 110 and may freely occupy any way of cache set 1; in this manner, it can be placed in cache entry 200_1, cache entry 200_2, or cache entry 200_3 of cache set 1.
  • Further, as shown in FIG. 2, one cache entry corresponds to one row of data; put differently, one cache entry corresponds to one cache line. The cache 110 may include multiple rows, and each row may store multiple bytes of data.
  • Each cache entry includes at least a valid bit 201, a dirty bit 203, a tag 205, and data 207. It can be understood that, in practice, each cache entry may further include an error correcting code (ECC) to ensure the accuracy of the stored data.
  • The tag 205 is a part of the main memory address and indicates the location, in the memory 120, of the memory block mapped to the cache line. The data 207 is the data of the memory block cached in the cache line.
  • The valid bit 201 indicates the validity of the cache line: when the valid bit indicates valid, the data in the cache line is usable; when the valid bit indicates invalid, the data in the cache line is not usable.
  • The dirty bit 203 indicates whether the data in the cache line is the same as the data in the corresponding memory block. For example, when the dirty bit indicates dirty, the data portion of the cache line (Data 207 in FIG. 2) differs from the data in the corresponding memory block; put differently, the cache line contains new data. When the dirty bit indicates clean, the data in the cache line is the same as the data in the corresponding memory block. In practice, particular values may be used to indicate dirty or clean; this is not limited here.
  • The mapping between the cache 110 and the memory 120 has been described above.
  • To improve access speed, when accessing data the processor 105 may issue an access request, containing an access address, to the cache 110. The cache controller 112 first determines, according to the access address, whether the data the processor 105 requests is cached in the storage medium 114; put differently, the cache controller 112 first judges from the access address whether the access request can hit the cache.
  • When the access request hits, that is, when the data corresponding to the address to be accessed is cached in the cache, the cache controller 112 can return the requested data directly to the processor 105. When the access request misses, that is, when the data at the address to be accessed is not cached in the cache, the processor 105 accesses the memory 120; specifically, the data at the address to be accessed can be obtained from the memory 120 through the memory controller 115.
  • Because the cache space of the cache 110 is generally small, the cache 110 must continuously update its cached contents according to access patterns during data access, to meet changing access demands. Specifically, when a data access hits the cache, the data in the cache can be accessed directly, without replacing or updating any cache line. When a data access misses the cache, the cache controller 112 must determine a cache line to be replaced from among the currently cached lines, and replace it with the cache line of the new address read from memory.
  • The cache line is the smallest unit operated on by the cache controller 112. Put differently, when the cache controller 112 writes the data in the storage medium 114 to memory, it writes one line of data to memory in units of cache lines; when the cache controller 112 reads data from memory, it also reads in units of cache lines.
  • For ease of description, one cache line may denote the data of one cache line. "Replacing a cache line" in the embodiments of the present invention means replacing the data of one cache line in the cache with the data of one cache line read from memory.
  • Most prior-art cache replacement methods aim to optimize the cache hit ratio; that is, in the prior art, the cache line to be replaced is selected mainly on the basis of increasing the hit ratio. However, when NVM is used as the cache, writing data into NVM consumes considerable power, so prior-art cache replacement methods usually incur a large write overhead when writing the data of a write request into an NVM-based cache.
  • This is especially true under set-associative mapping: although set associativity can improve the cache hit ratio, the freedom to choose any cache line to be replaced within a cache set further increases the risk that an unsuitable choice of victim line raises the write overhead.
  • In view of this problem, the embodiments of the present invention provide a cache replacement method. In a computer system in which a non-volatile storage medium is used as the cache, with set-associative mapping between cache and memory, the method can reduce the cost of writing data during cache replacement while maintaining the cache hit ratio.
  • The cache replacement method provided by the embodiments of the present invention is described in detail below; the embodiments take the storage medium 114 in the cache 110 to be a non-volatile storage medium. FIG. 3 is a flowchart of a cache replacement method according to an embodiment of the present invention. The method is mainly performed by the cache controller 112 in the cache 110. As shown in FIG. 3, the method may include the following steps.
  • In step 302, the cache controller receives a write request, the write request containing the data to be written and an access address; the access address is a physical address in memory. Because the embodiments of the present invention mainly address the write overhead imposed on a non-volatile cache when writing data, a write request is used as the example.
  • In step 304, the cache controller determines, according to the access address, that no corresponding cache line is cached in the cache. Specifically, in this step, the cache controller 112 may use the tag in the access address to determine whether the address to be accessed hits the cache 110. Put differently, the cache 110 can judge, from the tag in the access address, whether the data at that address is cached. How the cache controller determines whether the access address hits the cache 110 is described below with reference to FIG. 4.
  • As shown in FIG. 4, the cache controller 112 may divide the access address 400 into three parts: a tag 402, a set index 404, and a block offset 406.
  • The set index 404 indicates which cache set of the cache 110 the memory block pointed to by the access address 400 maps to; the tag 402 indicates the location, in the memory 120, of the memory block pointed to by the access address 400.
  • The block offset 406 indicates the offset position of the data to be written within the row; that is, the block offset 406 determines at which position in the row the data to be written is written. A sketch of this decomposition is given below.
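  • As an illustration of this decomposition, the following is a minimal C sketch of how a controller might split a physical address into the three fields. The geometry (64-byte cache lines, 128 cache sets) and all names are assumptions made for the example; the patent does not fix these values.

    #include <stdint.h>

    /* Illustrative geometry (not fixed by the patent): 64-byte cache lines
     * and 128 cache sets, so the block offset needs log2(64) = 6 bits and
     * the set index log2(128) = 7 bits; the remaining high bits are the tag. */
    #define LINE_BYTES 64u
    #define NUM_SETS   128u

    struct addr_fields {
        uint64_t tag;        /* position of the memory block, as tag 402     */
        uint32_t set_index;  /* which cache set the block maps to, as 404    */
        uint32_t block_off;  /* byte offset within the line, as 406          */
    };

    static struct addr_fields split_address(uint64_t paddr)
    {
        struct addr_fields f;
        f.block_off = (uint32_t)(paddr % LINE_BYTES);
        f.set_index = (uint32_t)((paddr / LINE_BYTES) % NUM_SETS);
        f.tag       = paddr / (LINE_BYTES * NUM_SETS);
        return f;
    }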
  • In practice, upon receiving an access request, the cache controller 112 may first determine, according to the set index 404 portion of the access address 400, which cache set of the cache 110 the access address 400 belongs to. Under set-associative mapping, one cache set includes multiple ways; in other words, one cache set includes multiple cache lines. Therefore, after determining the cache set to which the access address 400 belongs, the cache controller 112 can compare the value of the tag 402 portion of the access address 400 with the tag bits (e.g., tag 205 in FIG. 2) in each way's cache entry of the cache set pointed to by the set index 404 (e.g., cache entry 200_1, cache entry 200_2, and cache entry 200_3 in FIG. 2) to determine whether the access address 400 hits the cache 110.
  • When the tag of the access address 400 is the same as the tag in some cache entry of the cache set, the data corresponding to the access address is cached in the cache 110. When the tag in the target address matches no cache entry's tag in the cache set, it is determined that the access request misses the cache 110; in this case, the memory 120 must be accessed further.
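  • The tag comparison across the ways of the indexed set can be sketched as follows. The entry layout is a simplified assumption; the patent's cache entries also carry a dirty bit and optional ECC, omitted here for brevity.

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified cache entry: only the fields needed for the hit check. */
    struct cache_entry {
        bool     valid;
        uint64_t tag;
        /* data, dirty bit, ECC, ... omitted */
    };

    /* Returns the index of the hitting way in `set`, or -1 on a miss.
     * `ways` is M, the associativity. A sketch of the check in step 304. */
    static int lookup_way(const struct cache_entry *set, int ways, uint64_t tag)
    {
        for (int w = 0; w < ways; w++) {
            if (set[w].valid && set[w].tag == tag)
                return w;   /* hit: tag matches a valid entry */
        }
        return -1;          /* miss: proceed to victim selection (step 306) */
    }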
  • In this step, if the cache controller 112 determines according to the above method that the access request misses the cache 110, that is, when the cache controller 112 determines that no cache line corresponding to the access address is cached in the cache 110, the method proceeds to step 306.
  • In step 306, the cache controller determines N candidate ways from the cache set corresponding to the access address.
  • The cache includes multiple cache sets; each cache set contains M ways, each way containing one cache line; the value of N is not less than 2, and M is greater than N.
  • The N candidate ways are all least recently used (LRU) ways. Specifically, after the cache controller determines, according to the access address in the access request, that the access request misses the cache 110, the cache line to be replaced must be further selected from the cache set corresponding to the access address. For ease of description, the embodiments of the present invention describe a cache set containing M ways as an example, where M is an integer greater than 2.
  • Because the cache 110 in the embodiments of the present invention is a non-volatile cache, selecting the cache line to be replaced with a conventional least recently used (LRU) algorithm might preserve the cache hit ratio but could incur a large write cost while writing the new data. Therefore, to preserve the hit ratio, the cache controller may use a fuzzy pseudo least recently used (PLRU) algorithm to select N candidate ways from among the LRU ways of the cache set, where N is an integer not less than 2 and less than M; specifically, N may be a power of 2 (N = 2^n, with n an integer not less than 1). The value of N may be preset according to the specific situation.
  • According to the PLRU algorithm, all the ways in a cache set can be indicated by a binary search tree.
  • Assuming the cache set corresponding to the access address contains four ways (way 0, way 1, way 2, and way 3), the search tree 500 shown in FIG. 5 can be used to indicate each way in the cache set.
  • In the search tree of FIG. 5, a node value of "0" indicates "go left to find a pseudo-LRU way", and a node value of "1" indicates "go right to find a pseudo-LRU way". For example, way 0 and way 1 can be found under a node value of "0", and way 2 and way 3 under a node value of "1".
  • The fuzzy PLRU algorithm provided by the embodiments of the present invention may select the corresponding number of candidate ways from the search tree according to a preset value of N.
  • As can be seen from the search tree shown in FIG. 5, when selecting candidate ways, if the value of the root node is not considered, two LRU ways can be selected; if the values of the root node and of the subtree root nodes are not considered, four LRU ways can be selected. Put differently, if the values of k levels of nodes are not considered, 2^k LRU ways can be selected, where the k levels of nodes exclude leaf nodes and k is less than or equal to log2(M). For example, if the cache set has 4 ways, its search tree has log2(4) = 2 levels of internal nodes, so k is at most 2.
  • Specifically, in this step, the search tree corresponding to the cache set may be searched according to the PLRU code of the cache set corresponding to the access address, to select the N candidate ways. For ease of description, take N = 2 as an example.
  • Suppose the fuzzy P-LRU code of the cache set corresponding to the access address is "01", and the search tree of the cache set is the search tree 500 shown in FIG. 5; the search tree of FIG. 5 has four ways, and two candidate ways are to be found. Since N = 2^k = 2, k = 1; that is, the value of one level of nodes, namely the root node "1" of the whole tree, can be disregarded, and only the two subtrees below it need be considered.
  • That is, when searching the search tree, the two subtrees below the root node "1" (the subtree 502 whose node value is "0" and the subtree 504 whose node value is "1") may be searched according to the P-LRU code "01" of the cache set, to select the two candidate ways.
  • Specifically, way 0 in the subtree 502 can be found according to the high-order bit "0" of the code "01", and way 3 in the subtree 504 according to the low-order bit "1" of the code "01"; thus, way 0 in subtree 502 and way 3 in subtree 504 serve as the two candidate ways, as sketched below.
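  • The following C sketch reproduces this candidate selection for the 4-way tree of FIG. 5 with N = 2. How the P-LRU bits are packed into a code is an assumption made for the example (high bit for the left subtree of ways 0/1, low bit for the right subtree of ways 2/3, matching the "01" walkthrough above); the patent does not prescribe a general M-way encoding here.

    /* Fuzzy PLRU candidate selection for the 4-way tree of FIG. 5, N = 2.
     * The root bit is ignored, which is what makes the selection "fuzzy"
     * and yields 2^1 = 2 candidates, one per subtree. A node bit of 0
     * means "the pseudo-LRU way is the left child". */
    static void fuzzy_plru_candidates_4way(unsigned code, int out_ways[2])
    {
        unsigned left_bit  = (code >> 1) & 1u;  /* 0 -> way 0, 1 -> way 1 */
        unsigned right_bit = code & 1u;         /* 0 -> way 2, 1 -> way 3 */

        out_ways[0] = (int)left_bit;            /* pseudo-LRU of subtree 502 */
        out_ways[1] = 2 + (int)right_bit;       /* pseudo-LRU of subtree 504 */
    }
    /* With code "01" (binary), this yields ways {0, 3}, as in the example. */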
  • The method then proceeds to step 308.
  • In step 308, the cache controller compares the data to be written with the sample data of each of the N candidate ways to obtain N Hamming distances.
  • The sample data has the same length as the data to be written, and the Hamming distance indicates the number of corresponding bits in which two pieces of data of the same length differ.
  • To reduce the power consumed when writing data, each way in a cache set is provided with one piece of sample data, whose length is the same as the length of a cache line.
  • To reduce storage overhead, the i-th ways of different cache sets share the same sample data, where the i-th way is any way in a cache set and the value of i is not greater than M; for example, i is greater than or equal to 0 and less than or equal to M-1. Under this scheme, the sample data of different ways within the same cache set are not necessarily the same.
  • In practice, in one approach, the sample data of each way can be generated randomly. To improve accuracy when sample data is generated randomly, the Hamming distance between any two different pieces of sample data is not less than a second preset value. For example, the second preset value may be 512/M, where M is the number of ways in a cache set.
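  • A minimal sketch of such random generation, using rejection sampling so that every pair of samples is at least 512/M bits apart, might look as follows. The 512-bit line width, the use of rand() as the entropy source, and the GCC/Clang __builtin_popcountll intrinsic are assumptions made for illustration.

    #include <stdint.h>
    #include <stdlib.h>

    #define SAMPLE_WORDS 8   /* assumed: 8 x 64 bits = a 512-bit cache line */

    static int hamming512(const uint64_t *a, const uint64_t *b)
    {
        int d = 0;
        for (int i = 0; i < SAMPLE_WORDS; i++)
            d += __builtin_popcountll(a[i] ^ b[i]);  /* differing bits */
        return d;
    }

    /* Fill samples[0..m-1] with random 512-bit patterns such that every
     * pair of samples is at least 512/m bits apart (the "second preset
     * value"). Redraws a candidate until the constraint holds. */
    static void gen_samples(uint64_t samples[][SAMPLE_WORDS], int m)
    {
        int min_dist = 512 / m;
        for (int i = 0; i < m; i++) {
            for (;;) {
                for (int w = 0; w < SAMPLE_WORDS; w++)
                    samples[i][w] = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
                int ok = 1;
                for (int j = 0; j < i; j++)
                    if (hamming512(samples[i], samples[j]) < min_dist) { ok = 0; break; }
                if (ok) break;   /* keep this sample; otherwise redraw */
            }
        }
    }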
  • In another case, the samples can also be generated dynamically. For example, at system startup, an initial value may be set for the sample data of the i-th way; the initial value may be 0000, or the first data written to the i-th way. To improve accuracy, the sample data can then be dynamically updated according to the data written to the i-th way.
  • Specifically, a first counter is provided for the i-th way, the first counter being used to track the number of write requests that write data to the i-th way. In addition, a second counter is set for each bit of the i-th way's cache line, the second counter being used to count the number of times the corresponding bit is written with a first preset value, the first preset value being "0" or "1"; that is, the second counter counts how many times the corresponding bit is written with "0" or written with "1". Accordingly, if the i-th way's cache line has P bits, P second counters must be set. It can be understood that, since the sample data has the same length as the cache line, the P second counters can also be considered to correspond to each bit of the sample data.
  • When the number of write requests to the i-th way reaches a first threshold, that is, when the value of the first counter reaches the first threshold, the cache controller may update the sample data according to the value of the second counter set for each bit of the i-th way's cache line. Specifically, when the second counter of the k-th bit shows that the k-th bit of the i-th way's cache line has been written with "1" more than a second threshold number of times, the value of the k-th bit of the sample data is set to "1"; when the second counter of the (k+1)-th bit shows that the (k+1)-th bit has been written with "1" no more than the second threshold number of times, the value of the (k+1)-th bit of the sample data is set to "0".
  • The second threshold is not greater than the first threshold; for example, the second threshold may be half of the first threshold. It can be understood that, in practice, the first and second thresholds may also be set as needed; for example, the first threshold may be 10 and the second threshold 8. This is not limited here.
  • Alternatively, instead of a first counter, a timer may be set for the i-th way; when the timer reaches a preset third threshold, the sample data is updated according to the values of the per-bit second counters. For example, the sample data of the i-th way may be updated every 30 minutes, each bit being set according to its count of "1" writes as described above. It can be understood that the i-th way here may be the i-th way of different cache sets.
  • Taking the i-th way as an example, the above briefly describes how the sample data of each way in a cache set is generated; a sketch of the counter-based update follows.
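  • Putting the first counter and the per-bit second counters together, the dynamic update might be sketched as below. The line width P, the example thresholds (10 and 8, taken from the text), the byte-per-bit representation, and the resetting of the counters after each update are assumptions made for illustration.

    #include <stdint.h>
    #include <string.h>

    #define P 512                 /* assumed cache line width in bits        */
    #define FIRST_THRESHOLD  10   /* example value from the text             */
    #define SECOND_THRESHOLD  8   /* example value from the text             */

    struct way_sampler {
        uint32_t write_count;     /* first counter: writes seen by way i     */
        uint32_t ones_count[P];   /* second counters: "1" writes per bit     */
        uint8_t  sample[P];       /* sample data of way i, one byte per bit  */
    };

    /* Called on every write of `data` (P bits, one byte per bit) to way i. */
    static void sampler_observe(struct way_sampler *s, const uint8_t data[P])
    {
        for (int k = 0; k < P; k++)
            s->ones_count[k] += data[k];       /* count "1" writes per bit */

        if (++s->write_count < FIRST_THRESHOLD)
            return;

        /* First counter reached the first threshold: rebuild the sample.
         * A bit becomes "1" if it was written "1" more than the second
         * threshold number of times, else "0". */
        for (int k = 0; k < P; k++)
            s->sample[k] = (s->ones_count[k] > SECOND_THRESHOLD) ? 1 : 0;

        s->write_count = 0;                    /* reset (assumed behavior) */
        memset(s->ones_count, 0, sizeof(s->ones_count));
    }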
  • After the N candidate ways have been determined in step 306, to reduce the cost of writing data, in step 308 the cache controller may compare the data to be written carried in the write request with the sample data of the N candidate ways, to obtain the N Hamming distances. For example, if step 306 determined that way 0 and way 3 shown in FIG. 5 are the two candidate ways, then in this step the cache controller may compare each bit of way 0's sample data and of way 3's sample data with the value of the corresponding bit of the data to be written, to obtain two Hamming distances: the Hamming distance between the data to be written and the sample data of way 0, and the Hamming distance between the data to be written and the sample data of way 3.
  • The Hamming distance is the number of corresponding bits in which two pieces of data of the same length differ. For example, if the data to be written is 0011, the sample data of way 0 is 0101, and the sample data of way 3 is 1000, then the data to be written and the sample data of way 0 differ in two corresponding bits; that is, the Hamming distance between the data to be written and way 0's sample data is 2. The data to be written and the sample data of way 3 differ in three corresponding bits; that is, the Hamming distance between the data to be written and way 3's sample data is 3.
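  • In practice this per-bit comparison reduces to an XOR followed by a population count. The sketch below checks the two 4-bit examples from the text; __builtin_popcountll is a GCC/Clang intrinsic, used here as an assumption for brevity.

    #include <stdint.h>
    #include <stdio.h>

    /* Hamming distance between two equal-length bit strings: XOR marks
     * the differing bits, popcount counts them. */
    static int hamming(const uint64_t *a, const uint64_t *b, int nwords)
    {
        int d = 0;
        for (int i = 0; i < nwords; i++)
            d += __builtin_popcountll(a[i] ^ b[i]);
        return d;
    }

    int main(void)
    {
        /* The 4-bit examples from the text: 0011 vs 0101 and 0011 vs 1000. */
        uint64_t wr = 0x3, s0 = 0x5, s3 = 0x8;
        printf("d(write, way0) = %d\n", hamming(&wr, &s0, 1)); /* prints 2 */
        printf("d(write, way3) = %d\n", hamming(&wr, &s3, 1)); /* prints 3 */
        return 0;
    }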
  • In step 310, the cache controller takes the cache line in the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced. The Hamming distance reflects how closely the data to be written approximates the value of a cache line in the cache: the smaller the Hamming distance between two pieces of data, the more similar they are. Because a way's sample data is obtained from the cache line in that way, the Hamming distance between the data to be written and a way's sample data also indicates how closely the data to be written approximates the cache line in that way.
  • Therefore, after obtaining the Hamming distances, the cache controller may take the cache line of the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced, so that the fewest bits are rewritten when the data to be written is written.
  • For example, continuing with way 0 and way 3 above, since the Hamming distance to way 0's sample data (2) is smaller than that to way 3's (3), the cache controller may select the cache line in way 0 as the cache line to be replaced.
  • In step 312, the cache controller writes the data to be written into the cache, the data to be written being used to replace the cache line to be replaced.
  • Specifically, the cache controller may write the data to be written bit by bit into way 0, replacing the cache line to be replaced. Because the Hamming distance between the data to be written and way 0's sample data is small, the data to be written is also more similar to the cache line to be replaced in way 0; thus, fewer bits need to be rewritten when writing the data. For example, if the data to be written is 0011 and way 0's cache line is 0101, only the middle two bits of the line need to be rewritten, as sketched below.
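  • The power saving comes from a differential write that skips bits already holding the right value. A sketch at 64-bit word granularity follows; a real NVM array would perform this per bit in hardware, and the interface here is assumed for illustration.

    #include <stdint.h>

    /* Differential write: rewrite only the parts of the victim line that
     * differ from the incoming data, which is where the benefit of
     * choosing a similar victim comes from. Returns the number of bit
     * flips actually performed in the NVM. */
    static int write_line_differential(uint64_t *victim, const uint64_t *incoming,
                                       int nwords)
    {
        int bits_flipped = 0;
        for (int i = 0; i < nwords; i++) {
            uint64_t diff = victim[i] ^ incoming[i];
            if (diff) {                      /* skip words that already match */
                bits_flipped += __builtin_popcountll(diff);
                victim[i] = incoming[i];     /* rewrite only differing words  */
            }
        }
        return bits_flipped;
    }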
  • It can be understood that, after the data to be written is written into the cache, the tag corresponding to the replaced cache line in the cache set may be replaced with the tag in the access address, so that the cache controller can handle subsequent access requests according to the replaced cache line in the cache set.
  • Further, after the cache line to be replaced in the cache set has been replaced, the replaced way no longer belongs to the LRU ways, so the fuzzy P-LRU code of the cache set must be updated.
  • For example, in the example shown in FIG. 5, after the cache line of way 0 is selected and replaced, the root node "0" of subtree 1 containing way 0 may be set to 1, and the fuzzy P-LRU code of the cache set is updated from "01" to "11", where the high-order "1" points to subtree 1 to the left of the root node and the low-order "1" points to subtree 2 to the right of the root node.
  • The cache controller can then handle subsequent access requests according to the updated fuzzy P-LRU code of the cache set, following the method provided by the embodiments of the present invention.
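  • For the 4-way example, this code update can be sketched as follows, under the same assumed 2-bit packing as in the selection sketch above: the bit of the subtree containing the replaced way is pointed at the sibling way, so the replaced way stops being pseudo-LRU.

    /* Update the 2-bit fuzzy P-LRU code of a 4-way set after `way` has
     * just been written. Replacing way 0 under code "01" yields "11",
     * as in the FIG. 5 example above. */
    static unsigned plru_update_4way(unsigned code, int way)
    {
        if (way < 2)                       /* left subtree 502: ways 0 and 1  */
            code = (code & 1u) | ((way == 0 ? 1u : 0u) << 1);
        else                               /* right subtree 504: ways 2 and 3 */
            code = (code & 2u) | (way == 2 ? 1u : 0u);
        return code;
    }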
  • In the cache replacement method provided by the embodiments of the present invention, when a non-volatile storage medium is used as the cache, multiple candidate ways are selected from among the least recently used (LRU) ways of the cache set corresponding to the access address; the data to be written is compared with the sample data of these candidate ways to obtain multiple Hamming distances, and the cache line in the way corresponding to the smallest Hamming distance is taken as the cache line to be replaced. Because the candidate ways are selected from the LRU ways of the cache set, the impact of replacing a cache line on the cache hit ratio is reduced. Moreover, because the cache line to be replaced is chosen from the candidate ways according to the Hamming distance, which reflects the similarity of two pieces of data, the amount of data actually written when the data to be written replaces the victim cache line is reduced, saving write power and lowering the system's write overhead.
  • FIG. 6 is a schematic structural diagram of a cache controller according to an embodiment of the present invention.
  • The cache controller shown in FIG. 6 may be the cache controller 112 in the computer system shown in FIG. 1. As shown in FIG. 6, the cache controller 600 may include the following modules.
  • The receiving module 602 is configured to receive a write request, the write request containing the data to be written and an access address.
  • The judging module 604 is configured to judge, according to the access address, whether a corresponding cache line is cached in the cache. When the judging module 604 determines that no corresponding cache line is cached in the cache, the selection module 606 is triggered.
  • The selection module 606 is configured to determine N candidate ways from the cache set corresponding to the access address.
  • The cache includes multiple cache sets; each cache set contains M ways, each way containing one cache line; the value of N is not less than 2, and M is greater than N.
  • The calculating module 608 is configured to compare the data to be written with the sample data of each of the N candidate ways to obtain N Hamming distances.
  • The sample data has the same length as the data to be written, and the Hamming distance indicates the number of corresponding bits in which two pieces of data of the same length differ.
  • The selection module 606 is further configured to take, according to the calculation result of the calculating module 608, the cache line in the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced.
  • The writing module 610 is configured to write the data to be written into the storage medium, the data to be written being used to replace the cache line to be replaced.
  • In practice, the selection module 606 is specifically configured to determine the N candidate ways from among the least recently used (LRU) ways of the cache set using a fuzzy pseudo least recently used (PLRU) algorithm.
  • Further, the cache controller 600 may also include a sample data processing module 612.
  • The sample data processing module 612 is configured to obtain the sample data of the i-th way in the cache set according to the cache line of the i-th way in each of the multiple cache sets, where the sample data of the i-th way has the same length as the cache line in the i-th way, the i-th way is any one of the M ways, and i is greater than or equal to 1 and less than or equal to M.
  • The sample data processing module 612 is further configured to count, for each bit of the i-th way's cache line, the number of times the first preset value is written, and to update the corresponding bits in the sample data of the i-th way according to those counts, to obtain updated sample data of the i-th way, where the first preset value is "1" or "0".
  • Optionally, the sample data of the i-th way is the same across different cache sets, where i is greater than or equal to 0 and less than or equal to M-1; the sample data of different ways within the same cache set differ.
  • The embodiments of the present invention further provide a computer program product for the cache replacement method, including a computer readable storage medium storing program code, the program code including instructions for executing the method flow described in any one of the foregoing method embodiments.
  • Those of ordinary skill in the art will understand that the foregoing storage medium includes various non-transitory machine readable media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random access memory (RAM), a solid state drive (SSD), or other non-volatile memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A cache replacement technique, applied to a computer system (100) that includes a non-volatile cache. According to the cache replacement technique, N candidate ways are determined from the cache set corresponding to an access address (306); the data to be written is compared with the sample data of each of the N candidate ways to obtain multiple Hamming distances (308), and the cache line in the way corresponding to the minimum of the N Hamming distances is taken as the cache line to be replaced (310). The cache replacement technique can reduce the write overhead of the computer system (100) while maintaining the cache hit ratio.

Description

Cache replacement technique
Technical field
The present invention relates to the field of storage technologies, and in particular, to a cache replacement technique.
Background
A cache is a memory capable of high-speed data exchange. Because of its fast access rate, it exchanges data with the central processing unit (CPU) in preference to main memory. When the CPU wants to read data, it first looks in the cache; if the data is found, it is read immediately and sent to the CPU for processing. If it is not found, the data is read from the relatively slow main memory and sent to the CPU for processing, and at the same time the data block containing it is loaded into the cache, so that subsequent reads of the whole block can be served from the cache without calling main memory again. In this way, the access speed of the computer system is improved.
Conventionally, the cache is implemented with static random access memory (SRAM), but the static leakage of SRAM causes high system power consumption. Moreover, SRAM cells are increasingly difficult to shrink, creating a bottleneck in hardware implementation and limiting the cache's storage space. With the development of storage technology, more and more non-volatile storage media, which offer fast access and low static power consumption, are gradually being used as caches. However, when a non-volatile memory (NVM) is used as the cache, although NVM has no static leakage problem, that is, its static power consumption is small, its write power consumption is large, so writing data into the cache still consumes considerable system power.
Summary
The embodiments of this application provide a cache replacement technique that can reduce memory power consumption and improve memory access speed.
In a first aspect, this application provides a cache replacement method. The method is applied to a computer system that includes a cache; the cache includes a cache controller and a storage medium, connected to the cache controller, for caching data, the storage medium being a non-volatile storage medium. According to the method, after receiving a write request and determining, according to the access address, that no corresponding cache line is cached in the cache, the cache controller determines N candidate ways from the cache set corresponding to the access address. The cache includes multiple cache sets; each cache set contains M ways, each way containing one cache line; the value of N is not less than 2, and M is greater than N. Further, the cache controller compares the data to be written with the sample data of each of the N candidate ways to obtain N Hamming distances, and takes the cache line in the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced. The sample data has the same length as the data to be written, and the Hamming distance indicates the number of corresponding bits in which two pieces of data of the same length differ. The cache controller then writes the data to be written into the storage medium, the data to be written being used to replace the cache line to be replaced.
In the cache replacement method provided by the embodiments of the present invention, when a non-volatile storage medium is used as the cache, multiple candidate ways are selected from among the least recently used (LRU) ways of the cache set corresponding to the access address; the data to be written is compared with the sample data of these candidate ways to obtain multiple Hamming distances, and the cache line in the way corresponding to the smallest Hamming distance is taken as the cache line to be replaced. Because the candidate ways are selected from the LRU ways of the cache set, the impact of replacing a cache line on the cache hit ratio is reduced. Moreover, because the cache line to be replaced is chosen from the candidate ways according to the Hamming distance, which reflects the similarity of two pieces of data, the amount of data actually written when the data to be written replaces the victim cache line is reduced, saving write power and lowering the system's write overhead.
With reference to the first aspect, in a first possible implementation, the method further includes: the cache controller obtains the sample data of the i-th way in the cache set according to the cache line of the i-th way in each of the multiple cache sets, where the sample data of the i-th way has the same length as the cache line in the i-th way, the i-th way is any one of the M ways, and i is greater than or equal to 1 and less than or equal to M.
With reference to the first aspect or its first possible implementation, in a second possible implementation, the cache controller uses a fuzzy pseudo least recently used (PLRU) algorithm to determine the N candidate ways from among the least recently used (LRU) ways of the cache set, where N = 2^n and n is an integer not less than 1.
With reference to the first aspect or its first or second possible implementation, in a third possible implementation, the sample data of the i-th way is the same across the different cache sets of the cache, where i is greater than or equal to 0 and less than or equal to M-1.
With reference to any one of the first to third possible implementations of the first aspect, in a fourth possible implementation, the method further includes: the cache controller counts, for each bit of the i-th way's cache line, the number of times a first preset value is written, and updates the corresponding bits in the sample data of the i-th way according to those counts, to obtain updated sample data of the i-th way. The first preset value is "1" or "0".
With reference to the first aspect or any one of its first to fourth possible implementations, in a fifth possible implementation, the sample data of different ways in a cache set differ.
In a second aspect, an embodiment of this application provides a computer system. The computer system includes a cache controller and a cache connected to the cache controller; the cache is a non-volatile memory, and the cache controller is configured to perform the cache replacement method of the first aspect and its various possible implementations.
In a third aspect, an embodiment of this application provides a cache controller. The cache controller is applied to a computer system that includes a non-volatile cache, and includes modules configured to perform the cache replacement method of the first aspect and any of its possible implementations.
In a fourth aspect, this application provides a computer program product, including a computer readable storage medium storing program code, the program code including instructions for performing at least one of the cache replacement methods of the first aspect and any of its implementations.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show merely some embodiments of the present invention.
FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of set-associative mapping according to an embodiment of the present invention;
FIG. 3 is a flowchart of a cache replacement method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an access address according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a search tree according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a cache controller according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings. Clearly, the described embodiments are merely some rather than all of the embodiments of the present invention.
FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of the present invention. As shown in FIG. 1, the computer system 100 may include at least a processor 105, a cache 110, a memory controller 115, and a memory 120. The processor 105 is the core of the computer system 100 and can invoke different software programs in the computer system 100 to implement different functions; for example, the processor 105 can access the cache 110 and the memory 120. It can be understood that the processor 105 may be a central processing unit (CPU). Besides a CPU, the processor may also be an application-specific integrated circuit (ASIC) or one or more integrated circuits configured to implement the embodiments of the present invention. In practice, the computer system may also include multiple processors; for ease of description, the embodiments of the present invention take one processor as an example. In addition, the processor may be a single-core or multi-core processor; in a multi-core architecture, the processor may include multiple processor cores. For example, as shown in FIG. 1, the processor 105 may include one or more CPU cores 108. It can be understood that a CPU core is just one example of a processor core; the embodiments of the present invention do not limit the number of processors or the number of processor cores in a processor.
The cache 110 is a temporary memory located between the processor 105 and the memory 120 in the computer system 100. Because the cache 110 is fast, it is used for high-speed data exchange with the CPU core 108. The cache 110 caches part of the data in the memory 120 as well as data to be written to the memory 120. When the processor 105 accesses data, it first looks in the cache 110; only when the cache 110 does not hold the data the processor 105 needs does the processor 105 access the memory 120, which speeds up the processor's accesses. Specifically, the cache 110 may include a cache controller 112 and a storage medium 114. The cache controller 112 is the control circuit in the cache 110 and can access the storage medium 114. For example, the cache controller 112 may return data cached in the storage medium 114 to the processor 105 according to a read instruction sent by the processor 105, and may cache data to be written into the storage medium 114 of the cache 110 according to a write instruction of the processor 105. The cache controller 112 can also manage the data cached in the storage medium 114. Those skilled in the art will know that the cache 110 may be integrated into the processor 105; in a multi-core processor system, each CPU core 108 may include a cache 110.
Typically, the cache 110 is composed of static random access memory (SRAM). However, SRAM cells are increasingly difficult to shrink, so the capacity of an SRAM cache is limited, and SRAM also suffers from problems such as static leakage. Therefore, with the development of storage technology, more and more non-volatile storage media are used as caches. For example, the storage medium 114 may include phase-change random access memory (PCM), resistive random access memory (RRAM), spin torque transfer random access memory (STT-RAM), and the like.
The memory controller 115 is the important component of the computer system 100 that controls the memory 120 and exchanges data between the memory 120 and the processor 105 (for example, a CPU). In practice, in one case the memory controller 115 may be located inside the north bridge chip; in another case, the memory controller 115 may be integrated into the processor 105, specifically on the substrate of the processor 105. It can be understood that when the memory controller 115 is located inside the north bridge chip, it must exchange data with the processor through the north bridge chip, which causes relatively large data latency; when the memory controller 115 is integrated into the processor 105, it can exchange data with the processor directly.
As shown in FIG. 1, the memory controller 115 may be connected to the memory 120 via a memory bus (for example, a double data rate (DDR) bus). It can be understood that, in practice, the memory controller 115 may also communicate with the memory 120 through other types of buses, such as a PCI Express bus or a desktop management interface (DMI) bus.
The memory 120 is used to store the running software of the operating system, input and output data, and information exchanged with external storage. The memory 120 may also be called main memory and has the advantage of fast access. In traditional computer system architectures, dynamic random access memory (DRAM) is usually used as the memory 120. With the development of non-volatile memory (NVM) technology, new NVMs such as phase-change random access memory (PCM), resistive random access memory (RRAM), magnetic random access memory (MRAM), and ferroelectric random access memory (FRAM) are also gradually being used as memory. The specific storage medium type of the memory 120 is not limited in the embodiments of the present invention.
Those skilled in the art will know that the cache 110 is mainly used to cache part of the data of main memory (for example, the memory 120 in FIG. 1) and data to be written to main memory. Because the capacity of the cache is relatively small compared with main memory, the cache holds only a subset of the contents of main memory, and data exchange between the cache and main memory is in units of blocks. To cache the data of main memory in the cache, some function must be applied to map main memory addresses into the cache; this is called address mapping. After the data in main memory is cached according to this mapping relationship, when the CPU executes a program, the main memory addresses in the program are translated into cache addresses. Cache address mapping usually takes the form of direct mapping or set-associative mapping. In direct mapping, a block in main memory can be mapped only to one specific block of the cache. Direct mapping is the simplest address mapping: the hardware is simple, the cost is low, and address translation is fast. However, it is not flexible enough, and the cache's storage space is not fully utilized; because each memory block can be stored in only one fixed location in the cache, conflicts arise easily and cache efficiency drops. To improve the cache hit ratio, more and more storage systems use set-associative mapping. In the embodiments of the present invention, the cache 110 and the memory 120 also use set-associative mapping; to facilitate understanding of the solution, the set-associative mapping used by the memory 120 and the cache 110 in the embodiments of the present invention is described below.
In set-associative mapping, both main memory and the cache are divided into multiple sets, and the number of blocks in one set of main memory equals the number of sets in the cache. There is a fixed mapping between each main-memory block and a cache set number, but the block can be freely mapped to any block within the corresponding cache set. In other words, which set a memory block is stored in is fixed, while which block within that set is flexible. For example, suppose main memory is divided into 256 sets of 8 blocks each, and the cache is divided into 8 sets of 2 blocks each. Blocks 0, 8, and so on of main memory all map to set 0 of the cache, but each may be placed in block 0 or block 1 of cache set 0; blocks 1, 9, and so on of main memory all map to set 1 of the cache, but each may be placed in block 2 or block 3 of cache set 1. In a set-associative cache, each set may contain 2, 4, 8, or 16 blocks; the cache is then called a 2-way, 4-way, 8-way, or 16-way set-associative cache, respectively. It should be noted that a "group" in the embodiments of the present invention may also be referred to as a "set".
In the embodiments of the present invention, the data in the memory 120 is mapped into the cache 110 using set-associative mapping. For ease of description, in the embodiments of the present invention a data block in the memory 120 may also be called a memory block, and a data block in the cache 110 may be called a cache block or a cache line. Typically, a memory block may be 4 KB (kilobytes) in size, and a cache line may also be 4 KB. It can be understood that, in practice, memory blocks and cache lines may also be set to other sizes; the size of a memory block is the same as the size of a cache line.
FIG. 2 shows the mapping between the memory 120 and the cache 110 in an embodiment of the present invention; specifically, FIG. 2 illustrates set-associative mapping between memory and cache. In FIG. 2, both main memory and the cache are divided into multiple sets. As shown in FIG. 2, the cache 110 may include multiple cache sets, and each cache set may include multiple cache lines. Put differently, each cache set may include multiple ways of data, each way having a cache entry; that is, a cache entry indicates a specific way, or cache line, in a cache set. For example, the NVM 10 includes multiple cache sets such as cache set 1 and cache set 2; cache set 1 contains three ways, indicated respectively by the three cache entries cache entry 200_1, cache entry 200_2, and cache entry 200_3. The storage space of the memory 120 is likewise divided into multiple memory sets: set 1 210_1, set 2 210_2, ..., set N 210_N. Under set-associative mapping, the memory block corresponding to any storage address in set 1 210_1 can be mapped to one cache set of the cache 110 but may be freely placed in any way of that cache set. For example, the memory block corresponding to any storage address in set 1 210_1 can be mapped into cache set 1 of the cache 110 and may freely occupy any way of cache set 1; in this manner, it can be placed in cache entry 200_1, cache entry 200_2, or cache entry 200_3 of cache set 1.
Further, as shown in FIG. 2, one cache entry corresponds to one row of data; put differently, one cache entry corresponds to one cache line. The cache 110 may include multiple rows, and each row may store multiple bytes of data. Each cache entry includes at least a valid bit 201, a dirty bit 203, a tag 205, and data 207. It can be understood that, in practice, each cache entry may further include error correcting code (ECC) information to ensure the accuracy of the stored data. The tag 205 is a part of the main memory address and indicates the location, in the memory 120, of the memory block mapped to the cache line. The data 207 is the data of the memory block cached in the cache line. The valid bit 201 indicates the validity of the cache line: when the valid bit indicates valid, the data in the cache line is usable; when the valid bit indicates invalid, the data in the cache line is not usable. The dirty bit 203 indicates whether the data in the cache line is the same as the data in the corresponding memory block. For example, when the dirty bit indicates dirty, the data portion of the cache line (Data 207 in FIG. 2) differs from the data in the corresponding memory block; put differently, when the dirty bit indicates dirty, the cache line contains new data. When the dirty bit indicates clean, the data in the cache line is the same as the data in the corresponding memory block. In practice, particular values may be used to indicate dirty or clean; this is not limited here.
The mapping between the cache 110 and the memory 120 has been described above. Those skilled in the art will know that, to improve access speed, when accessing data the processor 105 may issue an access request, containing an access address, to the cache 110. The cache controller 112 first determines, according to the access address, whether the data the processor 105 requests is cached in the storage medium 114; put differently, the cache controller 112 first judges from the access address whether the access request can hit the cache. When the access request hits the cache, that is, when the data corresponding to the address to be accessed is determined to be cached in the cache, the cache controller 112 can return the requested data directly to the processor 105. When the access request misses, that is, when the data at the address to be accessed is determined not to be cached in the cache, the processor 105 accesses the memory 120; specifically, the data at the address to be accessed can be obtained from the memory 120 through the memory controller 115.
Because the cache space of the cache 110 is generally small, during data access the cache 110 must continuously update its cached contents according to access patterns, to meet changing access demands. Specifically, when a data access hits the cache, the data in the cache can be accessed directly, without replacing or updating any cache line. When a data access misses the cache, the cache controller 112 must determine a cache line to be replaced from among the currently cached lines, and replace it with the cache line of the new address read from memory.
Those skilled in the art will know that the cache line is the smallest unit operated on by the cache controller 112. Put differently, when the cache controller 112 writes the data in the storage medium 114 to memory, it writes one line of data to memory in units of cache lines; when the cache controller 112 reads data from memory, it also reads in units of cache lines. For ease of description, in the embodiments of the present invention one cache line may denote the data of one cache line, and "replacing a cache line" in the embodiments of the present invention means replacing the data of one cache line in the cache with the data of one cache line read from memory.
Most prior-art cache replacement methods aim to optimize the cache hit ratio; that is, in the prior art, the cache line to be replaced is selected mainly on the basis of increasing the hit ratio. However, when NVM is used as the cache, writing data into NVM consumes considerable power, so prior-art cache replacement methods usually incur a large write overhead when writing the data of a write request into an NVM-based cache. This is especially true under set-associative mapping: although set associativity can improve the cache hit ratio, the freedom to choose any cache line to be replaced within a cache set further increases the risk that an unsuitable choice of victim cache line raises the write overhead. In view of this problem, the embodiments of the present invention provide a cache replacement method that, in a computer system using a non-volatile storage medium as the cache, with set-associative mapping between cache and memory, can reduce the cost of writing data during cache replacement while maintaining the cache hit ratio.
The cache replacement method provided by the embodiments of the present invention is described in detail below with reference to FIG. 1. It should be noted that the embodiments of the present invention are described taking the storage medium 114 in the cache 110 to be a non-volatile storage medium. FIG. 3 is a flowchart of a cache replacement method according to an embodiment of the present invention. The method is mainly performed by the cache controller 112 in the cache 110. As shown in FIG. 3, the method may include the following steps.
In step 302, the cache controller receives a write request, the write request containing the data to be written and an access address, where the access address is a physical address in memory. It should be noted that, because the embodiments of the present invention mainly address the write overhead imposed on a non-volatile cache when writing data, a write request is used as the example.
In step 304, the cache controller determines, according to the access address, that no corresponding cache line is cached in the cache. Specifically, in this step, the cache controller 112 may use the tag in the access address to determine whether the address to be accessed hits the cache 110; put differently, the cache 110 can judge, from the tag in the access address, whether the data at that address is cached. How the cache controller determines whether the access address hits the cache 110 is described below with reference to FIG. 4.
As shown in FIG. 4, the cache controller 112 may divide the access address 400 into three parts: a tag 402, a set index 404, and a block offset 406. The set index 404 indicates which cache set of the cache 110 the memory block pointed to by the access address 400 maps to; the tag 402 indicates the location, in the memory 120, of the memory block pointed to by the access address 400; and the block offset 406 indicates the offset position of the data to be written within the row, that is, it determines at which position in the row the data to be written is written. In practice, upon receiving an access request, the cache controller 112 may first determine, according to the set index 404 portion of the access address 400, which cache set of the cache 110 the access address 400 belongs to. Under set-associative mapping, one cache set includes multiple ways; in other words, one cache set includes multiple cache lines. Therefore, after determining the cache set to which the access address 400 belongs, the cache controller 112 may compare the value of the tag 402 portion of the access address 400 with the tag bits (for example, the tag 205 in FIG. 2) in each way's cache entry of the cache set pointed to by the set index 404 (for example, cache entry 200_1, cache entry 200_2, and cache entry 200_3 in FIG. 2) to determine whether the access address 400 hits the cache 110. When the tag of the access address 400 is the same as the tag in some cache entry of the cache set, the data corresponding to the access address is cached in the cache 110. When the tag in the target address matches no cache entry's tag in the cache set, it is determined that the access request misses the cache 110; in this case, the memory 120 must be accessed further.
In this step, if the cache controller 112 determines according to the above method that the access request misses the cache 110, that is, when the cache controller 112 determines that no cache line corresponding to the access address is cached in the cache 110, the method proceeds to step 306.
In step 306, the cache controller determines N candidate ways from the cache set corresponding to the access address. The cache includes multiple cache sets; each cache set contains M ways, each way containing one cache line; the value of N is not less than 2, and M is greater than N. The N candidate ways are all least recently used (LRU) ways. Specifically, after the cache controller determines, according to the access address in the access request, that the access request misses the cache 110, the cache line to be replaced must be further selected from the cache set corresponding to the access address. For ease of description, the embodiments of the present invention describe a cache set containing M ways as an example, where M is an integer greater than 2.
Because the cache 110 in the embodiments of the present invention is a non-volatile cache, selecting the cache line to be replaced from the cache set corresponding to the access address with the conventional least recently used (LRU) algorithm might preserve the cache hit ratio but could incur a large write cost while writing the new data. Therefore, to preserve the cache hit ratio, in the embodiments of the present invention the cache controller may use a fuzzy pseudo least recently used (PLRU) algorithm to select N candidate ways from among the LRU ways of the cache set, where N is an integer not less than 2 and N is less than M. Specifically, N may be a power of 2 (that is, N = 2^n), with n an integer not less than 1. It should be noted that the value of N may be preset according to the specific situation.
According to the PLRU algorithm, all the ways in a cache set can be indicated by a binary search tree. As shown in FIG. 5, assuming the cache set corresponding to the access address contains four ways (way 0, way 1, way 2, and way 3), the search tree 500 shown in FIG. 5 can be used to indicate each way in the cache set. In the search tree of FIG. 5, a node value of "0" indicates "go left to find a pseudo-LRU way", and a node value of "1" indicates "go right to find a pseudo-LRU way". For example, way 0 and way 1 can be found under a node value of "0", and way 2 and way 3 under a node value of "1".
The fuzzy PLRU algorithm provided by the embodiments of the present invention may select the corresponding number of candidate ways from the search tree according to a preset value of N. As can be seen from the search tree shown in FIG. 5, when selecting candidate ways, if the value of the root node is not considered, two LRU ways can be selected; if the values of the root node and of the subtree root nodes are not considered, four LRU ways can be selected. Put differently, when searching for candidate ways with the fuzzy PLRU algorithm, if the values of k levels of nodes are not considered, 2^k LRU ways can be selected, where the k levels of nodes exclude leaf nodes and k is less than or equal to log2(M). For example, if the cache set has 4 ways, its search tree has log2(4) = 2 levels of nodes, so the value of k is less than or equal to 2.
Specifically, in this step, the search tree corresponding to the cache set may be searched according to the PLRU code of the cache set corresponding to the access address, to select the N candidate ways. For ease of description, take N = 2 as an example. Suppose the fuzzy P-LRU code of the cache set corresponding to the access address is "01", and the search tree of the cache set is the search tree 500 shown in FIG. 5; the search tree of FIG. 5 has four ways, and two candidate ways are to be found. Since N = 2^k = 2, k = 1; that is, the value of one level of nodes, namely the root node "1" of the whole search tree, can be disregarded, and only the two subtrees below the root node "1" need be considered. That is, when searching the search tree, the two subtrees below the root node "1" (the subtree 502 whose node value is "0" and the subtree 504 whose node value is "1") may be searched according to the P-LRU code "01" of the cache set, to select the two candidate ways. Specifically, way 0 in the subtree 502 can be found according to the high-order bit "0" of the code "01", and way 3 in the subtree 504 according to the low-order bit "1" of the code "01"; thus, way 0 in subtree 502 and way 3 in subtree 504 serve as the two candidate ways. The method then proceeds to step 308.
In step 308, the cache controller compares the data to be written with the sample data of each of the N candidate ways to obtain N Hamming distances. The sample data has the same length as the data to be written, and the Hamming distance indicates the number of corresponding bits in which two pieces of data of the same length differ. To reduce the power consumed when writing data, in the embodiments of the present invention each way in a cache set is provided with one piece of sample data, whose length is the same as the length of a cache line. To reduce storage overhead, in the embodiments of the present invention the i-th ways of different cache sets share the same sample data, where the i-th way is any way in a cache set and the value of i is not greater than M; for example, i is greater than or equal to 0 and less than or equal to M-1. Under this scheme, the sample data of different ways within the same cache set are not necessarily the same.
In practice, in one approach, the sample data of each way can be generated randomly. To improve accuracy when sample data is generated randomly, the Hamming distance between any two different pieces of sample data is not less than a second preset value; for example, the second preset value may be 512/M, where M is the number of ways in a cache set.
In another case, the samples can also be generated dynamically. For ease of description, how the sample data of the i-th way is generated dynamically is taken as an example. Specifically, at system startup, an initial value may be set for the sample data of the i-th way; for example, the initial value may be 0000, or the first data written to the i-th way. To improve accuracy, the sample data can be dynamically updated according to the data written to the i-th way. Specifically, in the embodiments of the present invention, a first counter is provided for the i-th way, the first counter being used to track the number of write requests that write data to the i-th way. In addition, a second counter is set for each bit of the i-th way's cache line, the second counter being used to count the number of times the corresponding bit is written with a first preset value, the first preset value being "0" or "1"; that is, the second counter counts how many times the corresponding bit is written with "0" or written with "1". Accordingly, if the i-th way's cache line has P bits, P second counters must be set correspondingly. It can be understood that, since the sample data has the same length as the cache line, the P second counters can also be considered to correspond to each bit of the sample data. When the number of write requests to the i-th way reaches a first threshold, that is, when the value of the first counter reaches the first threshold, the cache controller may update the sample data according to the values of the second counters set for each bit of the i-th way's cache line. Specifically, when the second counter of the k-th bit shows that the k-th bit of the i-th way's cache line has been written with "1" more than a second threshold number of times, the value of the k-th bit of the sample data is set to "1"; when the second counter of the (k+1)-th bit shows that the (k+1)-th bit has been written with "1" no more than the second threshold number of times, the value of the (k+1)-th bit of the sample data is set to "0". The second threshold is not greater than the first threshold; for example, the second threshold may be half of the first threshold. It can be understood that, in practice, the first and second thresholds may also be set as needed; for example, the first threshold may be 10 and the second threshold 8. This is not limited here.
In addition, in practice, instead of setting a first counter for the i-th way, a timer may be set for the i-th way. When the timer of the i-th way reaches a preset third threshold, the sample data is updated according to the values of the second counters set for each bit of the i-th way's cache line.
For example, the sample data of the i-th way may be updated every 30 minutes. At each update, as described above, whether each bit is set to "1" may be determined according to the number of "1" writes recorded for that bit; for example, when the second counter of the k-th bit shows that the k-th bit has been written with "1" more than the second threshold number of times, the value of the k-th bit of the sample data is set to "1". It can be understood that the i-th way above may be the i-th way of different cache sets.
The above, taking the i-th way as an example, briefly described how the sample data of each way in a cache set is generated. After the N candidate ways have been determined in step 306, to reduce the cost of writing data, in step 308 the cache controller may compare the data to be written carried in the write request with the sample data of the N candidate ways, to obtain the N Hamming distances. For example, if step 306 determined that way 0 and way 3 shown in FIG. 5 are the two candidate ways, then in this step the cache controller may compare each bit of way 0's sample data and of way 3's sample data with the value of the corresponding bit of the data to be written, to obtain two Hamming distances: the Hamming distance between the data to be written and the sample data of way 0, and the Hamming distance between the data to be written and the sample data of way 3. The Hamming distance is the number of corresponding bits in which two pieces of data of the same length differ. For example, if the data to be written is 0011, the sample data of way 0 is 0101, and the sample data of way 3 is 1000, then the data to be written and the sample data of way 0 differ in the values of two corresponding bits, that is, the Hamming distance between the data to be written and way 0's sample data is 2; the data to be written and the sample data of way 3 differ in the values of three corresponding bits, that is, the Hamming distance between the data to be written and way 3's sample data is 3.
In step 310, the cache controller takes the cache line in the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced. The Hamming distance reflects how closely the data to be written approximates the value of a cache line in the cache; that is, the smaller the Hamming distance between two pieces of data, the more similar the two pieces of data. Because a way's sample data is obtained from the cache line in that way, the Hamming distance between the data to be written and a way's sample data can also indicate how closely the data to be written approximates the cache line in that way. Therefore, in the embodiments of the present invention, after obtaining the Hamming distances between the data to be written and the sample data of the N candidate ways, to reduce write power the cache controller may take the cache line of the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced, so that the fewest bits are rewritten when the data to be written is written. For example, continuing with way 0 and way 3 above, after obtaining a Hamming distance of 2 between the data to be written and way 0's sample data and a Hamming distance of 3 between the data to be written and way 3's sample data, in this step the cache controller may select the cache line in way 0 as the cache line to be replaced.
In step 312, the cache controller writes the data to be written into the cache, the data to be written being used to replace the cache line to be replaced. Specifically, the cache controller may write the data to be written bit by bit into way 0 to replace the cache line to be replaced. Because the Hamming distance between the data to be written and way 0's sample data is small, the data to be written is also more similar to the cache line to be replaced in way 0; thus, fewer bits need to be rewritten when writing the data to be written. For example, if the data to be written is 0011 and way 0's cache line is 0101, only the values of the middle two bits of the cache line need to be rewritten when writing the data to be written, which reduces to some extent the power consumed when writing data.
It can be understood that, after the data to be written is written into the cache, the tag corresponding to the replaced cache line in the cache set may be replaced with the tag in the access address, so that the cache controller can handle subsequent access requests according to the replaced cache line in the cache set.
Further, in the embodiments of the present invention, after the cache line to be replaced in the cache set has been replaced, the replaced way no longer belongs to the LRU ways, so the fuzzy P-LRU code of the cache set must be updated. For example, in the example shown in FIG. 5, after the cache line of way 0 is selected and replaced, the root node "0" of subtree 1 containing way 0 may be set to 1, and the fuzzy P-LRU code of the cache set is updated from "01" to "11", where the high-order "1" points to subtree 1 to the left of the root node and the low-order "1" points to subtree 2 to the right of the root node. The cache controller can then handle subsequent access requests according to the updated fuzzy P-LRU code of the cache set, following the method provided by the embodiments of the present invention.
In the cache replacement method provided by the embodiments of the present invention, when a non-volatile storage medium is used as the cache, multiple candidate ways are selected from among the least recently used (LRU) ways of the cache set corresponding to the access address; the data to be written is compared with the sample data of these candidate ways to obtain multiple Hamming distances, and the cache line in the way corresponding to the smallest Hamming distance is taken as the cache line to be replaced. Because the candidate ways are selected from the LRU ways of the cache set, the impact of replacing a cache line on the cache hit ratio is reduced. Moreover, because the cache line to be replaced is chosen from the candidate ways according to the Hamming distance, which reflects the similarity of two pieces of data, the amount of data actually written when the data to be written replaces the victim cache line is reduced, saving write power and lowering the system's write overhead.
图6为本发明实施例提供的一种缓存控制器的结构示意图。图6所示的缓存控制器可以为图1所示的计算机系统中的缓存控制器112。如图6所示,所述缓存控制器600可以包括下述模块。
接收模块602,用于接收写请求,所述写请求中包含有待写入的数据和访问地址。判断模块604,用于根据所述访问地址判断所述缓存中是否缓存有对应的缓存行cache line。当所述判断模块604确定所述缓存中没有缓存对应的 缓存行cache line时,触发选择模块606。
所述选择模块606,用于从所述访问地址对应的缓存集合中确定N个待选择路。其中,所述缓存中包括多个缓存集合,每个缓存集合中包含有M个路,每个路中包含有一个cache line,N的值不小于2,且M大于N。
计算模块608,用于分别将所述待写入数据与所述N个待选择路的样本数据进行比较以获得N个汉明距离。其中,所述样本数据与所述待写入数据的长度相同,所述汉明距离用于指示所述两个相同长度的数据具有的不同的对应位的数量。
所述选择模块606,还用于根据所述计算模块608的计算结果,将所述N个汉明距离中的最小值所对应的路中的cache line作为待替换的cache line。写入模块610,用于将所述待写入的数据写入所述存储介质中,所述待写入数据用于替换所述待替换的cache line。
In practical applications, the selection module 606 is specifically configured to determine the N candidate ways from the least recently used (LRU) ways of the cache set using a fuzzy pseudo least recently used (PLRU) algorithm.
Further, the cache controller 600 may also include a sample data processing module 612. The sample data processing module 612 is configured to obtain the sample data of the i-th way in each of the multiple cache sets according to the cache line of the i-th way in that cache set, where the sample data of the i-th way has the same length as the cache line in the i-th way, the i-th way is any one of the M ways, and i is greater than or equal to 1 and less than or equal to M.
In another case, the sample data processing module 612 is further configured to count, for each bit of the cache line of the i-th way, the number of times a first preset value is written to that bit, and to update the corresponding bits of the sample data of the i-th way according to the number of times the first preset value is written to each bit, to obtain updated sample data of the i-th way, where the first preset value includes "1" or "0".
Optionally, in the cache, the sample data of the i-th way in different cache sets is the same, where i is greater than or equal to 0 and less than or equal to M-1, and the sample data of different ways within the same cache set is different.
For detailed descriptions of the functions of the modules in the cache controller 600 provided in this embodiment of the present invention, reference may be made to the related descriptions in the cache replacement method of the foregoing embodiments; details are not repeated herein.
An embodiment of the present invention further provides a computer program product for the cache replacement method, including a computer-readable storage medium storing program code, where the instructions included in the program code are used to perform the method procedure described in any one of the foregoing method embodiments. A person of ordinary skill in the art can understand that the foregoing storage medium includes various non-transitory machine-readable media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random-access memory (RAM), a solid state drive (SSD), or another non-volatile memory.
It should be noted that the embodiments provided in this application are merely illustrative. A person skilled in the art can clearly understand that, for convenience and brevity of description, each of the foregoing embodiments has its own emphasis; for a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. The features disclosed in the embodiments of the present invention, the claims, and the accompanying drawings may exist independently or in combination. Features described in hardware form in the embodiments of the present invention may be implemented by software, and vice versa. This is not limited herein.

Claims (15)

  1. A cache replacement method, wherein the method is applied to a computer system that includes a cache, the cache includes a cache controller and a storage medium that is connected to the cache controller and configured to cache data, the storage medium is a non-volatile storage medium, and the method comprises:
    receiving, by the cache controller, a write request, wherein the write request contains data to be written and an access address;
    determining, by the cache controller according to the access address, that no corresponding cache line is cached in the cache;
    determining, by the cache controller, N candidate ways from a cache set corresponding to the access address, wherein the cache includes multiple cache sets, each cache set contains M ways, each way contains one cache line, the value of N is not less than 2, and M is greater than N;
    comparing, by the cache controller, the data to be written with sample data of each of the N candidate ways to obtain N Hamming distances, wherein the sample data has the same length as the data to be written, and the Hamming distance indicates the number of corresponding bits at which two pieces of data of the same length differ;
    using, by the cache controller, the cache line in the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced; and
    writing, by the cache controller, the data to be written into the storage medium, wherein the data to be written replaces the cache line to be replaced.
  2. The method according to claim 1, further comprising:
    obtaining, by the cache controller, sample data of an i-th way in each of the multiple cache sets according to the cache line of the i-th way in that cache set, wherein the sample data of the i-th way has the same length as the cache line in the i-th way, the i-th way is any one of the M ways, and i is greater than or equal to 1 and less than or equal to M.
  3. The method according to claim 1 or 2, wherein the determining, by the cache controller, of the N candidate ways from the cache set corresponding to the access address comprises:
    determining, by the cache controller, the N candidate ways from the least recently used (LRU) ways of the cache set using a fuzzy pseudo least recently used (PLRU) algorithm, wherein N = 2^n and n is an integer not less than 1.
  4. The method according to any one of claims 1 to 3, wherein the sample data of the i-th way in different cache sets of the cache is the same, and i is greater than or equal to 0 and less than or equal to M-1.
  5. The method according to any one of claims 2 to 4, further comprising:
    counting, by the cache controller, for each bit of the cache line of the i-th way, the number of times a first preset value is written to that bit, wherein the first preset value includes "1" or "0"; and
    updating, by the cache controller, the corresponding bits of the sample data of the i-th way according to the number of times the first preset value is written to each bit, to obtain updated sample data of the i-th way.
  6. A computer system, wherein the computer system includes a cache controller and a cache connected to the cache controller, the cache is a non-volatile memory, and the cache controller is configured to:
    receive a write request, wherein the write request contains data to be written and an access address;
    determine, according to the access address, that no corresponding cache line is cached in the cache;
    determine N candidate ways from a cache set corresponding to the access address, wherein the cache includes multiple cache sets, each cache set contains M ways, the value of N is not less than 2, and M is greater than N;
    compare the data to be written with sample data of each of the N candidate ways to obtain N Hamming distances, wherein the sample data has the same length as the data to be written, and the Hamming distance indicates the number of corresponding bits at which two pieces of data of the same length differ;
    use the cache line in the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced; and
    write the data to be written into the cache, wherein the data to be written replaces the cache line to be replaced.
  7. The computer system according to claim 6, wherein the cache controller is further configured to:
    obtain sample data of an i-th way in each of the multiple cache sets according to the cache line of the i-th way in that cache set, wherein the sample data of the i-th way has the same length as the cache line in the i-th way, the i-th way is any one of the M ways, and i is greater than or equal to 1 and less than or equal to M.
  8. The computer system according to claim 6 or 7, wherein the cache controller is configured to:
    determine the N candidate ways from the least recently used (LRU) ways of the cache set using a fuzzy pseudo least recently used (PLRU) algorithm, wherein N = 2^n and n is an integer not less than 1.
  9. The computer system according to any one of claims 6 to 8, wherein the sample data of the i-th way in different cache sets of the cache is the same, and i is greater than or equal to 1 and less than or equal to M.
  10. The computer system according to any one of claims 7 to 9, wherein the cache controller is further configured to:
    count, for each bit of the cache line of the i-th way, the number of times a first preset value is written to that bit, wherein the first preset value includes "1" or "0"; and
    update the corresponding bits of the sample data of the i-th way according to the number of times the first preset value is written to each bit, to obtain updated sample data of the i-th way.
  11. A cache controller, comprising:
    a receiving module, configured to receive a write request, wherein the write request contains data to be written and an access address;
    a determining module, configured to determine, according to the access address, that no corresponding cache line is cached in a cache, wherein the cache is a non-volatile memory;
    a selection module, configured to determine N candidate ways from a cache set corresponding to the access address, wherein the cache includes multiple cache sets, each cache set contains M ways, each way contains one cache line, the value of N is not less than 2, and M is greater than N;
    a calculation module, configured to compare the data to be written with sample data of each of the N candidate ways to obtain N Hamming distances, wherein the sample data has the same length as the data to be written, and the Hamming distance indicates the number of corresponding bits at which two pieces of data of the same length differ;
    the selection module being further configured to use the cache line in the way corresponding to the minimum of the N Hamming distances as the cache line to be replaced; and
    a writing module, configured to write the data to be written into the storage medium, wherein the data to be written replaces the cache line to be replaced.
  12. The cache controller according to claim 11, further comprising:
    a sample data processing module, configured to obtain sample data of an i-th way in each of the multiple cache sets according to the cache line of the i-th way in that cache set, wherein the sample data of the i-th way has the same length as the cache line in the i-th way, the i-th way is any one of the M ways, and i is greater than or equal to 1 and less than or equal to M.
  13. The cache controller according to claim 11 or 12, wherein the selection module is specifically configured to:
    determine the N candidate ways from the least recently used (LRU) ways of the cache set using a fuzzy pseudo least recently used (PLRU) algorithm, wherein N = 2^n and n is an integer not less than 1.
  14. The cache controller according to any one of claims 11 to 13, wherein the sample data of the i-th way in different cache sets of the cache is the same, and i is greater than or equal to 0 and less than or equal to M-1.
  15. The cache controller according to any one of claims 12 to 14, wherein the sample data processing module is further configured to:
    count, for each bit of the cache line of the i-th way, the number of times a first preset value is written to that bit, wherein the first preset value includes "1" or "0"; and
    update the corresponding bits of the sample data of the i-th way according to the number of times the first preset value is written to each bit, to obtain updated sample data of the i-th way.
PCT/CN2018/123362 2017-12-29 2018-12-25 Cache replacement technology WO2019128958A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711479277.7A CN110018971B (zh) 2017-12-29 Cache replacement technology
CN201711479277.7 2017-12-29

Publications (1)

Publication Number Publication Date
WO2019128958A1 (zh)

Family

ID=67063167

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123362 WO2019128958A1 (zh) Cache replacement technology

Country Status (2)

Country Link
CN (1) CN110018971B (zh)
WO (1) WO2019128958A1 (zh)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948282A * 2019-12-31 2021-06-11 北京忆芯科技有限公司 Computing acceleration system for fast data lookup
CN113190474B * 2021-04-30 2022-07-12 华中科技大学 Method and system for improving the energy efficiency of STT-MRAM approximate caching
CN113392043A * 2021-07-06 2021-09-14 南京英锐创电子科技有限公司 Cache data replacement method, apparatus, device, and storage medium
CN116737609A * 2022-03-04 2023-09-12 格兰菲智能科技有限公司 Method and apparatus for selecting a cache line for replacement
CN115794675B * 2023-01-19 2023-05-16 北京象帝先计算技术有限公司 Data writing method and apparatus, graphics processing system, electronic assembly, and electronic device
CN116644008B * 2023-06-16 2023-12-15 合芯科技有限公司 Cache replacement control method and apparatus


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8180969B2 (en) * 2008-01-15 2012-05-15 Freescale Semiconductor, Inc. Cache using pseudo least recently used (PLRU) cache replacement with locking
US20150286571A1 (en) * 2014-04-04 2015-10-08 Qualcomm Incorporated Adaptive cache prefetching based on competing dedicated prefetch policies in dedicated cache sets to reduce cache pollution
CN107463509B * 2016-06-05 2020-12-15 华为技术有限公司 Cache management method, cache controller, and computer system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1606735A * 2001-12-20 2005-04-13 英特尔公司 System and method for data replacement in cache ways
CN101156140A * 2005-02-07 2008-04-02 Nxp股份有限公司 Data processing system and cache replacement method
CN102043591A * 2010-11-24 2011-05-04 清华大学 Write operation method for PRAM
US20120246543A1 * 2011-03-25 2012-09-27 Ariel Szapiro Apparatus and method for fast tag hit
CN104298622A * 2013-07-17 2015-01-21 飞思卡尔半导体公司 Least recently used cache replacement implementation using FIFO

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117349199A * 2023-11-30 2024-01-05 摩尔线程智能科技(北京)有限责任公司 Cache management apparatus and system
CN117806992A * 2024-02-29 2024-04-02 山东云海国创云计算装备产业创新中心有限公司 Data block replacement method and apparatus, electronic device, and storage medium
CN117806992B (zh) * 2024-02-29 2024-06-07 山东云海国创云计算装备产业创新中心有限公司 Data block replacement method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN110018971B (zh) 2023-08-22
CN110018971A (zh) 2019-07-16

Similar Documents

Publication Publication Date Title
WO2019128958A1 (zh) Cache replacement technology
US11119940B2 (en) Sequential-write-based partitions in a logical-to-physical table cache
US10908821B2 (en) Use of outstanding command queues for separate read-only cache and write-read cache in a memory sub-system
CN109582214B (zh) 数据访问方法以及计算机系统
US9092321B2 (en) System and method for performing efficient searches and queries in a storage node
US11210020B2 (en) Methods and systems for accessing a memory
US11494311B2 (en) Page table hooks to memory types
US20120102273A1 (en) Memory agent to access memory blade as part of the cache coherency domain
US20230236747A1 (en) Accessing stored metadata to identify memory devices in which data is stored
US11016905B1 (en) Storage class memory access
US20140317337A1 (en) Metadata management and support for phase change memory with switch (pcms)
CN111512290B (zh) 文件页表管理技术
US20200278941A1 (en) Priority scheduling in queues to access cache data in a memory sub-system
US11397683B2 (en) Low latency cache for non-volatile memory in a hybrid DIMM
US20240020014A1 (en) Method for Writing Data to Solid-State Drive
WO2021143154A1 (zh) 一种缓存管理方法及装置
KR20220065817A (ko) 하이브리드 dimm의 전송 파이프라인에서의 데이터 의존도 관리
US20230120184A1 (en) Systems, methods, and devices for ordered access of data in block modified memory
CN112805692A (zh) 混合式双列直插式存储器模块中的高速缓存操作
CN116340203A (zh) 数据预读取方法、装置、处理器及预取器
CN116795736A (zh) 数据预读取方法、装置、电子设备和存储介质
US11995314B2 (en) Memory management
US20240211406A1 (en) Systems, methods, and apparatus for accessing data from memory or storage at a storage node
US11797183B1 (en) Host assisted application grouping for efficient utilization of device resources
US20220229552A1 (en) Computer system including main memory device having heterogeneous memories, and data management method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18895938

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18895938

Country of ref document: EP

Kind code of ref document: A1