WO2019062747A1 - Data access method and computer system - Google Patents

Data access method and computer system

Info

Publication number: WO2019062747A1
Application number: PCT/CN2018/107553
Authority: WO (WIPO/PCT)
Prior art keywords: memory, partition, access request, cache, data
Other languages: English (en), French (fr)
Inventors: 潘海洋 (Pan Haiyang), 陈明宇 (Chen Mingyu), 卢天越 (Lu Tianyue), 刘宇航 (Liu Yuhang)
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2019062747A1

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric digital data processing
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0626: Reducing size or complexity of storage systems
    • G06F3/0644: Management of space entities, e.g. partitions, extents, pools
    • G06F3/0685: Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Definitions

  • The present application relates to the field of memory technologies, and in particular, to a data access method and a computer system.
  • Abbreviations used herein: NVM (Non-Volatile Memory), PCM (Phase Change Memory), DRAM (Dynamic Random Access Memory).
  • In the course of implementing the present application, the inventor has found that in a computer system with a hybrid memory architecture, when the memory controller processes a memory access request, it first determines, according to the tag corresponding to the address in the memory access request, whether the memory access request hits the DRAM, so as to determine whether the data to be accessed is stored in the DRAM.
  • When the DRAM serves as a cache of the NVM, the tags corresponding to the addresses of the cache blocks buffered in the DRAM also need to occupy a large storage space. Therefore, in practical applications, the tags corresponding to the cache blocks are usually also stored in the DRAM, so the memory controller must first read the tags from the DRAM before it can judge whether a memory access request hits, which introduces an extra access delay.
  • The embodiments of the present application provide a data access method and a computer system, which solve the problem in the related art that an extra access delay introduces a large system overhead and degrades the performance of the computer system. The technical solution is as follows:
  • In a first aspect, a data access method is provided for a computer system, the computer system comprising a processor, a cache, a first memory, a second memory, and a memory controller, where the first memory is used to cache data in the second memory, the cache is used to cache the tags corresponding to at least part of the cache blocks in a second partition of the first memory, the first memory includes a first partition and the second partition, the second partition is configured to cache the cache blocks replaced from the first partition, one memory block in the second memory is mapped to one cache block of the first partition, one memory block in the second memory is mapped into one group of the second partition, and one group of the second partition includes a plurality of cache blocks.
  • Based on the above computer system, the embodiments of the present application can be divided into the following cases when implementing data access (a code sketch summarizing the four cases follows case 4).
  • Case 1: the access request hits the cache. In this case, the data access process is as follows:
  • The processor obtains a first physical address according to a first access address in a first access request; the processor determines, according to a first tag in the first physical address, whether the first access request hits the cache; when the first access request hits the cache, the processor sends a first memory access request carrying the first physical address to the memory controller; and the memory controller acquires, according to the first physical address, the first data to be accessed by the first access request from the second partition.
  • Case 2: the access request misses the cache, but the memory access request hits the first partition of the first memory. In this case, the data access process is as follows:
  • The processor obtains a second physical address according to a second access address in a second access request; when the processor determines, according to a second tag in the second physical address, that the second access request misses the cache, the processor sends a second memory access request carrying the second physical address to the memory controller;
  • The memory controller determines, according to the second tag, whether the second memory access request hits the first partition of the first memory; when the second memory access request hits the first partition of the first memory, the memory controller acquires the second data to be accessed by the second access request from the first partition.
  • Case 3: the access request misses the cache, and the memory access request misses the first partition of the first memory but hits the second partition. In this case, the data access process is as follows:
  • The processor obtains a third physical address according to a third access address in a third access request; when the processor determines, according to a third tag in the third physical address, that the third access request misses the cache, the processor sends a third memory access request carrying the third physical address to the memory controller;
  • The memory controller determines, according to the third tag, whether the third memory access request hits the first partition of the first memory; when the third memory access request misses the first partition of the first memory, the memory controller determines, according to the third tag, whether the third memory access request hits the second partition of the first memory;
  • When the third memory access request hits the second partition of the first memory, the memory controller acquires the third data to be accessed by the third access request from the second partition.
  • Case 4: the access request misses the cache, and the memory access request misses both the first partition and the second partition of the first memory. In this case, the data access process is as follows:
  • The processor obtains a fourth physical address according to a fourth access address in a fourth access request; when the processor determines, according to a fourth tag in the fourth physical address, that the fourth access request misses the cache, the processor sends a fourth memory access request carrying the fourth physical address to the memory controller;
  • When the memory controller determines, according to the fourth tag, that the fourth memory access request misses the first partition and the second partition of the first memory, the memory controller acquires, according to the fourth physical address, the fourth data to be accessed by the fourth access request from the second memory;
  • The memory controller stores the fourth data in the first partition of the first memory, and stores the data replaced out of the first partition in the second partition of the first memory.
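  • To make the order of the four cases concrete, the following C sketch shows the overall dispatch as described above; all type and helper names (tag_cache_hit, read_v_area, and so on) are hypothetical stand-ins, not interfaces defined by this application:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t addr_t;
    typedef struct { uint8_t bytes[64]; } data_t;

    extern bool   tag_cache_hit(addr_t a);   /* SRAM tag cache (case 1)          */
    extern bool   d_area_hit(addr_t a);      /* direct-mapped first partition    */
    extern bool   v_area_hit(addr_t a);      /* set associative second partition */
    extern data_t read_v_area(addr_t a);
    extern data_t read_d_area(addr_t a);
    extern data_t read_nvm(addr_t a);
    extern void   install_in_d_evict_to_v(addr_t a, data_t d);

    data_t access_data(addr_t a) {
        if (tag_cache_hit(a)) return read_v_area(a);  /* case 1: one DRAM access             */
        if (d_area_hit(a))    return read_d_area(a);  /* case 2: tag + data fetched together */
        if (v_area_hit(a))    return read_v_area(a);  /* case 3                              */
        data_t d = read_nvm(a);                       /* case 4: fetch from the NVM          */
        install_in_d_evict_to_v(a, d);                /* victim migrates to the V area       */
        return d;
    }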
  • In a second aspect, a computer system is provided, the computer system comprising a processor, a cache, a first memory, a second memory, and a memory controller, where the first memory is used to cache data in the second memory, the cache is used to cache the tags corresponding to at least part of the cache blocks in a second partition of the first memory, the first memory includes a first partition and the second partition, the second partition is used to cache the cache blocks replaced from the first partition, one memory block in the second memory is mapped to one cache block of the first partition, one memory block in the second memory is mapped into one group of the second partition, and one group of the second partition includes a plurality of cache blocks.
  • The processor and the memory controller are configured to execute the data access method described in the first aspect above.
  • The embodiment of the present application divides the first memory into two areas, namely the first partition, which uses direct mapping, and the second partition, which uses set associative mapping, and uses the cache to store the tags corresponding to part of the cache blocks in the second partition. Such a design combines the advantages of direct mapping and set associative mapping.
  • After receiving an access request, the processor can access the cache before accessing the first partition and the second partition of the first memory. If the access request hits the cache, the processor can control the memory controller to obtain the data to be accessed directly from the second partition of the first memory; since the system overhead of reading tags from the cache when determining whether the cache is hit is negligible, only one access to the first memory is needed to acquire the data. This effectively reduces the system overhead while ensuring a high hit rate, and improves the performance of the computer system.
  • FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the mapping between an NVM and a DRAM according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the composition of a target address according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the capacities of an SRAM, a DRAM, and an NVM according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a TDV structure according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a main memory address and the grouping of a DRAM according to an embodiment of the present application;
  • FIG. 7 is a flowchart of a data access method according to an embodiment of the present application;
  • FIG. 8 is a schematic structural diagram of a computer system according to an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of the present application. As shown in FIG. 1, the computer system 100 can include at least a processor 105, a memory controller 115, an NVM 120, and a DRAM 125, where the DRAM 125 and the NVM 120 are both memory of the computer system 100.
  • It can be understood that the connection relationship of the computer system shown in FIG. 1 is only one example of a computer system having a hybrid memory architecture, and the DRAM 125 and the NVM 120 shown in FIG. 1 are only one example of the multi-level memory in a computer system. In practical applications, the internal structure of the computer system is not specifically limited, and the computer system may include other memories besides the DRAM 125 and the NVM 120. The computer system in the embodiment of the present application only needs to include a first-level memory and a second-level memory that can be used as memory, where the first-level memory can support cache access; in other words, the computer system only needs to include at least two levels of memory, where the first-level memory is a cache of the second-level memory and is used for buffering part of the data in the second memory.
  • A computer system with a hybrid memory architecture provided by an embodiment of the present application is described below by taking FIG. 1 as an example.
  • The processor 105 is the core of the computer system 100, and the processor 105 can invoke different software programs in the computer system 100 to implement different functions; for example, the processor 105 can implement access to the DRAM 125 and the NVM 120. It can be understood that the processor 105 can be a Central Processing Unit (CPU). Besides a CPU, the processor may also be an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. For convenience of description, the embodiment of the present application takes one processor as an example; in practical applications, the computer system may further include multiple processors. In addition, the processor may be a single-core processor or a multi-core processor, and in a multi-core processor architecture, multiple processor cores can be included in one processor. For example, as shown in FIG. 1, one or more CPU cores 110 may be included in the processor 105. The number of processors and the number of processor cores in one processor are not limited in the embodiment of the present application.
  • The memory controller 115 is an important component of the computer system 100 that internally controls the memory and exchanges data between the memory and the processor 105 (e.g., a CPU). In practical applications, in one case the memory controller 115 can be located inside the north bridge chip; in another case, the memory controller 115 can be integrated into the processor 105 (as shown in FIG. 1), and specifically, the memory controller 115 can be integrated on the substrate of the processor 105. It can be understood that when the memory controller 115 is located inside the north bridge chip, the memory controller needs to exchange data with the processor through the north bridge chip, resulting in a large data delay; when the memory controller 115 is integrated into the processor 105, it can exchange data directly with the processor.
  • As shown in FIG. 1, the memory controller 115 can be coupled to an NVM controller 116 and a DRAM controller 118, where the DRAM controller 118 is used to control access to the DRAM 125 and the NVM controller 116 is used to control access to the NVM 120. The NVM controller 116 and the DRAM controller 118 may also be referred to as media controllers. It can be understood that, in practical applications, in one case the NVM controller 116 and the DRAM controller 118 can be independent of the memory controller 115; in another case, the NVM controller 116 and the DRAM controller 118 may be integrated into the memory controller 115 and logically form part of the memory controller 115 (as shown in FIG. 1). In the embodiment of the present application, the memory controller 115 can connect the NVM 120 and the DRAM 125 through a memory bus (for example, a double data rate (DDR) bus). It can be understood that, in practical applications, the NVM controller 116 can also communicate with the NVM 120 via other types of buses, such as a PCI Express bus or a Desktop Management Interface (DMI) bus.
  • It should be noted that, besides the manner shown in FIG. 1, the memory controller 115 can be connected to the NVM 120 and the DRAM 125 in other manners. For example, the memory controller 115 can be directly connected to the DRAM 125 through a memory bus and indirectly connected to the NVM 120 through the DRAM 125; in other words, the memory controller 115 is coupled to the DRAM 125 via a memory bus, and the DRAM 125 is coupled to the NVM 120 via a memory bus.
  • As mentioned above, in the computer system shown in FIG. 1, the DRAM 125 can be coupled to the processor 105 via a memory bus. The DRAM 125 has the advantage of fast access speed, and the processor 105 can access the DRAM 125 at high speed to perform read or write operations on it. The DRAM 125 is typically used to store the various running software of the operating system, input and output data, and information exchanged with external storage. However, the DRAM 125 is volatile: when the power is turned off, the information in the DRAM 125 is no longer retained.
  • Since the new types of NVM are byte-addressable and write data into the non-volatile medium in units of bits, they can be used as memory. In the embodiment of the present application, the NVM 120 can be used together with the DRAM 125 as the memory of the computer system 100. Compared with the DRAM 125, the NVM 120 is non-volatile and can therefore better preserve data. In the embodiment of the present application, a non-volatile memory that can be used as memory may be referred to as a storage class memory (SCM).
  • It should be noted that the DRAM is one kind of volatile memory, and in practical applications other random access memory (RAM) can also be used as the memory of the computer system, for example, static random access memory (SRAM).
  • The NVM 120 shown in FIG. 1 may include new types of non-volatile memory such as Phase-Change Random Access Memory (PCM), Resistive Random Access Memory (RRAM), Magnetic Random Access Memory (MRAM), or Ferroelectric Random Access Memory (FRAM); the specific type of the NVM in the embodiment of the present application is not limited here.
  • Since the access speed of the NVM 120 is relatively slow compared with that of the DRAM 125, the NVM 120 is generally used as the main memory of the system, and the DRAM 125 is used as a cache of the NVM 120 to compensate for the slow access speed of the main memory and to increase the memory access speed.
  • When the memory controller 115 receives a memory access request sent by the processor 105, it first determines whether the target address (i.e., the address of the memory block to be accessed) in the memory access request hits the DRAM 125, so as to determine whether the data to be accessed is stored in the DRAM 125. If the target address hits the DRAM 125, the memory controller 115 can directly acquire the data to be accessed from the DRAM 125 to shorten the access delay; if the memory controller 115 determines that the target address in the memory access request misses the DRAM 125, the memory controller 115 acquires the data to be accessed from the NVM 120.
  • Cache address mapping generally includes direct mapping and set associative mapping. Before explaining these two mapping modes, the Cache is explained in detail below. Generally, the basic unit of a Cache is a Cache Line, which may also be referred to as a cache block or cache line. The data stored in the main memory is divided in the same way, and the divided data blocks in the NVM 120 may also be referred to as memory blocks in the embodiment of the present application. For example, a memory block can be 4 KB (kilobytes) in size, and a cache block can likewise be 4 KB. It can be understood that, in practical applications, the sizes of the memory block and the cache line can also be set to other values, provided that the size of a memory block is the same as the size of a cache block.
  • In the direct mapping mode, a memory block in the main memory can only be mapped to one specific cache block of the Cache; that is, a memory block in the main memory is placed in a unique location in the Cache. For example, suppose the main memory has 16 memory blocks, numbered 0 to 15, and the Cache has 4 blocks. Then the 0th, 4th, 8th, and 12th blocks of the main memory can only be mapped to the 0th block of the Cache; the 1st, 5th, 9th, and 13th blocks of the main memory can only be mapped to the 1st block of the Cache, and so on (see the sketch below).
  • Direct mapping is the simplest address mapping mode: the hardware is simple, the cost is low, and address translation is fast. However, this mode is not flexible, and the storage space of the Cache is not fully utilized. Since each memory block can only be stored in one fixed location in the Cache, conflicts arise easily and the Cache efficiency decreases. For example, if a program needs to access both the 0th block and the 4th block of the main memory repeatedly, the best approach would be to keep both in the Cache at the same time; but because both blocks can only be copied into the 0th block of the Cache, even if other storage space in the Cache is empty it cannot be used, so the two blocks keep evicting each other from the Cache, resulting in a low hit rate.
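  • As a concrete illustration of the direct mapping example above (16 memory blocks, 4 cache blocks), the following C sketch computes the fixed cache block for each memory block; the function and constant names are illustrative, not part of this application:

    #include <stdio.h>

    /* Direct mapping: memory block i can only occupy cache block (i mod N).
     * Sketch of the 16-block main memory / 4-block Cache example above. */
    #define NUM_CACHE_BLOCKS 4

    static unsigned direct_map_index(unsigned memory_block) {
        return memory_block % NUM_CACHE_BLOCKS;  /* blocks 0, 4, 8, 12 -> cache block 0 */
    }

    int main(void) {
        for (unsigned b = 0; b < 16; b++)
            printf("memory block %2u -> cache block %u\n", b, direct_map_index(b));
        return 0;
    }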
  • In the set associative mapping mode, the main memory and the Cache are both divided into a number of groups, and the number of blocks in one group of the main memory is the same as the number of groups in the Cache. Set associative mapping uses direct mapping between the groups and fully associative mapping within a group. For example, suppose the main memory is divided into 256 groups of 8 blocks each, and the Cache is divided into 8 groups of 2 blocks each. Then the 0th block and the 8th block of the main memory are both mapped to the 0th group of the Cache, but may be placed in either the 0th block or the 1st block of that group; the 1st block and the 9th block of the main memory are both mapped to the 1st group of the Cache, but may be placed in either the 2nd block or the 3rd block of the Cache (the two blocks of group 1), and so on, as sketched below. A group in such a Cache, for example a 2-way set associative Cache or a 16-way set associative Cache, may also be referred to as a "set".
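  • As a matching illustration of the set associative example above (a Cache of 8 groups with 2 blocks each), the following C sketch shows that the group is fixed while the block within the group is chosen freely; all names are illustrative:

    #include <stdio.h>

    /* Set associative mapping: direct mapping between groups, fully
     * associative within a group. */
    #define NUM_SETS 8   /* the Cache has 8 groups          */
    #define NUM_WAYS 2   /* each group holds 2 cache blocks */

    static unsigned group_of(unsigned memory_block) {
        return memory_block % NUM_SETS;  /* blocks 0 and 8 both map to group 0 */
    }

    int main(void) {
        for (unsigned b = 0; b < 16; b++)
            printf("memory block %2u -> group %u (either of its %u blocks)\n",
                   b, group_of(b), NUM_WAYS);
        return 0;
    }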
  • In the embodiment of the present application, the DRAM 125 serves as a cache of the NVM 120 for buffering a portion of the memory blocks in the NVM 120. Therefore, the data in the main memory NVM 120 also needs to be mapped to the DRAM 125 in a certain mapping mode. In practical applications, the data in the NVM 120 is usually mapped into the DRAM 125 by direct mapping or set associative mapping.
  • FIG. 2 shows a schematic diagram of the mapping between the NVM 120 and the DRAM 125 in the embodiment of the present application. As shown in FIG. 2, the storage space of the NVM 120 can be divided into a number of different sets: set 1 210_1, set 2 210_2, ... set N 210_N. Each set is assigned a cache entry in the DRAM 125. For example, cache entry 200_1 is the cache entry reserved for any storage address in set 1 210_1, and cache entry 200_2 is the cache entry reserved for any storage address in set 2 210_2; in this way, the memory block corresponding to any address in set 1 210_1 can be mapped into cache entry 200_1. In the DRAM 125, one cache entry corresponds to one row of data; in other words, one cache entry corresponds to one Cache Line. The DRAM 125 may include a plurality of rows, and each row may store a plurality of bytes of data.
  • Each cache entry includes at least a valid bit 201, a dirty bit 203, a tag 205, and data 207. It can be understood that, in practical applications, each cache entry may further include an Error Correcting Code (ECC) field to ensure the accuracy of the stored data.
  • Among them, the tag 205 is a part of the main memory address and is used to indicate the location, in the main memory NVM 120, of the memory block mapped to this cache block; the data 207 refers to the data of the memory block cached in this cache block.
  • The valid bit 201 and the dirty bit 203 are both flag bits used to indicate the status of the cache line. The valid bit 201 is used to indicate the validity of the cache line: when the valid bit 201 indicates valid, the data in the cache line is available; when the valid bit 201 indicates invalid, the data in the cache line is not available.
  • The dirty bit 203 is used to indicate whether the data in the cache line is the same as the data in the corresponding memory block. When the dirty bit 203 indicates dirty, the data portion of the cache line (such as data 207 in FIG. 2) differs from the data in the corresponding memory block; in other words, the cache line contains newer data. When the dirty bit 203 indicates clean, the data in the cache line is the same as the data in the corresponding memory block.
  • In practical applications, the dirty bit 203 may indicate dirty or clean with a specified value. For example, when the dirty bit 203 is "1", it indicates dirty, meaning that the cache line contains new data; when the dirty bit 203 is "0", it indicates clean, meaning that the data in the cache line is the same as the data in the corresponding memory block. It can be understood that dirty and clean may also be identified by other values, which is not limited herein. A sketch of such a cache entry follows.
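  • The following C struct is a minimal sketch of one such cache entry; the field widths are illustrative assumptions, and a real DRAM cache keeps these bits inside the DRAM row itself, possibly with an ECC field appended:

    #include <stdint.h>

    struct cache_entry {
        unsigned valid : 1;   /* 1 = the data in this cache line is usable      */
        unsigned dirty : 1;   /* 1 = the line is newer than its memory block    */
        uint64_t tag;         /* locates the mapped memory block in main memory */
        uint8_t  data[64];    /* cached copy of one memory block (64 B assumed) */
    };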
  • It should be noted that FIG. 2 is a schematic diagram of the mapping when the NVM 120 and the DRAM 125 use the direct mapping mode. In practical applications, the data in the DRAM 125 can also be organized in the form of cache sets according to the set associative mapping mode. In this mode, multiple cache sets can be included in the DRAM 125, and each cache set can include multiple rows of data; in other words, each cache set can include multiple cache entries.
  • For example, the cache entry 200_1 and the cache entry 200_2 in the DRAM 200 shown in FIG. 2 can form one set, and set 1 210_1 in the NVM 210 can be mapped to either cache entry 200_1 or cache entry 200_2 in that set.
  • In practical applications, a set index is usually used to indicate the location, in the cache, of the cache line to which a memory block is mapped. It can be understood that in the direct mapping mode, the set index indicates the location of a single cache line in the cache, while in the set associative mapping mode, the set index indicates the location of a group of cache lines in the cache. For example, in the above example, when the 0th block and the 8th block of the main memory are both mapped to the 0th group of the Cache, the set index of the 0th group can be used to indicate the location in the Cache of the cache lines of the 0th group (including the 0th block and the 1st block).
  • In the embodiment of the present application, when receiving a memory access request, the memory controller 115 first determines whether the target address in the memory access request hits the DRAM 125, so as to determine whether the data to be accessed is stored in the DRAM 125. Specifically, when the memory controller 115 receives an access request including a target address from the processor 105, the memory controller 115 can determine whether the DRAM 125 is hit by using the tag in the target address; in other words, the memory controller 115 can determine, by the tag in the target address, whether the DRAM 125 has buffered the data at that address. Specifically, as shown in FIG. 3, the target address 300 can be divided into three parts: a tag 302, a set index 304, and a block offset 306.
  • The set index 304 is used to indicate to which cache set in the cache the memory block pointed to by the target address 300 is mapped; the tag 302 is used to indicate the location, in the main memory (for example, the NVM 120), of the memory block pointed to by the target address 300; and the block offset 306 is used to indicate the offset position of the data to be accessed within the row, that is, at which location in the row the data to be accessed lies.
  • In practical applications, the memory controller 115 can first determine, according to the set index 304 portion of the target address 300, to which cache set in the DRAM 125 the target address 300 belongs. Since a cache set includes at least one cache entry, in other words at least one cache line, after determining the cache set to which the target address 300 belongs, the memory controller 115 can compare the value of the tag 302 portion of the target address 300 with the tag bits (e.g., tag 205 in FIG. 2) of each cache entry (e.g., cache entry 200_1 and cache entry 200_2 in FIG. 2) in the cache set pointed to by the set index 304, to determine whether the target address 300 hits the DRAM 125.
  • If the tag in the target address is the same as the tag of a cache entry in the cache set, it is determined that the target address 300 hits the DRAM 125, and the memory controller 115 can directly access the DRAM 125 and return the target data buffered in the DRAM 125 to the CPU. If the tag in the target address is not the same as the tag of any cache entry in the cache set, it is determined that the target address 300 misses the DRAM 125, in which case the memory controller 115 needs to access the NVM 120 to retrieve the data to be accessed from the NVM 120. A C sketch of this lookup follows.
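  • The following C sketch illustrates this lookup under assumed field widths (64 B lines, a 24-bit set index); it splits the target address into tag / set index / block offset and compares the tag with every cache entry of the selected set. All names and widths are illustrative:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define OFFSET_BITS 6    /* 64 B cache lines (assumed) */
    #define SET_BITS    24   /* set-index width (assumed)  */

    struct cache_entry { bool valid; uint64_t tag; uint8_t data[64]; };

    uint64_t set_index(uint64_t addr) { return (addr >> OFFSET_BITS) & ((1ULL << SET_BITS) - 1); }
    uint64_t tag_bits(uint64_t addr)  { return addr >> (OFFSET_BITS + SET_BITS); }

    /* Returns the matching entry on a hit, or NULL on a miss (go to the NVM). */
    struct cache_entry *lookup(struct cache_entry *set, int ways, uint64_t addr) {
        for (int w = 0; w < ways; w++)
            if (set[w].valid && set[w].tag == tag_bits(addr))
                return &set[w];
        return NULL;
    }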
  • With the DRAM 125 as a cache of the NVM 120, the memory access time is shortened and the memory access speed is improved while it is ensured that the data in the main memory is not lost. However, in practical applications, when the DRAM 125 is used as a cache of the NVM 120, the tags also need to occupy a large storage space. For example, a cache entry is usually 64 B (bytes) in size, of which the tag usually occupies 4 B; if the size of the DRAM is 16 GB, the DRAM holds 16 GB / 64 B = 2^28 cache entries, so the tags occupy 2^28 x 4 B = 1 GB in total. Such a tag store is too large to be kept on chip; therefore, in practical applications, the tags are usually also stored in the DRAM.
  • The first point to be noted is that the capacity ratio between the DRAM serving as the NVM cache and the NVM may be 1:8, 1:16, 1:32, and so on, which is not specifically limited in the embodiment of the present application. For example, as shown in FIG. 4, the capacity of the on-chip cache SRAM on the processor can be 20 MB, the capacity of the DRAM serving as an off-chip cache can be 32 GB, and the capacity of the NVM can be 512 GB.
  • The second point to be noted is that, for direct mapping, since a memory block of the main memory is mapped to a fixed location in the cache, only the tag of one cache line needs to be read when determining whether the cache is hit. Therefore, by using burst technology or ECC encoding, the tag and the data to be accessed can be fetched together, so that if the cache is hit there is no need to access the cache again to obtain the data to be accessed, and the data fetched together with the tag is returned directly. That is, with a directly mapped cache, the cache only needs to be accessed once on a hit.
  • For set associative mapping, a memory block of the main memory can be mapped to any location within a fixed group, so the tags of multiple cache lines need to be read when determining whether the cache is hit, and the tag cannot simply be fetched together with the data to be accessed. Therefore, a set associative cache built in DRAM needs to access the DRAM twice on a hit: once for the tags and once for the data.
  • If the tag Cache technology is used on the basis of set associative mapping, the overhead of accessing tags can be reduced to a certain extent. The tag Cache technology stores, in the SRAM, part of the tags that are stored in the DRAM; this SRAM dedicated to buffering tags is called a tag Cache. However, limited by the small capacity of the SRAM, the tag Cache cannot buffer all the tags. The hit rate of the tag Cache is sensitive to its capacity: the larger the tag Cache, the more tags it stores and the higher its hit rate, so only when the capacity of the tag Cache is large enough can the overhead caused by accessing the tags stored in the DRAM be reduced effectively. Since the SRAM serving as a tag Cache usually has a small capacity, the tag Cache technology alone still has defects, and the performance of the computer system is still not high.
  • In view of this, the embodiment of the present application divides the DRAM into two areas, namely a Direct area (D area for short), which uses direct mapping, and a Victim area (V area for short), which uses set associative mapping, and uses the tag Cache to buffer the tags corresponding to part of the cache blocks in the V area. This design combines the advantages of direct mapping and set associative mapping: it can maintain a high hit rate while minimizing the system overhead caused by tag accesses, thereby improving the performance of the computer system.
  • Before the above two partitions are explained in detail, the structure of the DRAM in the embodiment of the present application is explained below with reference to FIG. 6. As shown in FIG. 6, the cache blocks into which the DRAM is divided are organized into a number of groups, and one group may be referred to as a set; a set in FIG. 6 specifically includes 32 cache blocks.
  • As shown in FIG. 6, 39 bits can be used to represent a main memory address, and the 39-bit main memory address can be split into three parts: tag + set index + in-block offset, where the in-block offset occupies 6 bits, the tag occupies the remaining 9 bits, and the set index occupies 24 bits (a field-extraction sketch follows). Since the D area uses direct mapping, its set index refers to the location of a cache line; since the V area uses set associative mapping, its set index refers to the location of a group.
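  • The following C sketch extracts the three fields of the 39-bit main memory address described above (9-bit tag, 24-bit set index, 6-bit in-block offset); the example address is arbitrary:

    #include <stdint.h>
    #include <stdio.h>

    /* 39-bit address: tag (9 bits) | set index (24 bits) | in-block offset (6 bits).
     * 2^24 sets x 32 blocks x 64 B per block matches the 32 GB DRAM of FIG. 4. */
    #define OFFSET_BITS 6
    #define INDEX_BITS  24

    int main(void) {
        uint64_t addr   = 0x1A2B3C4D5EULL & ((1ULL << 39) - 1);
        uint64_t offset = addr & ((1ULL << OFFSET_BITS) - 1);
        uint64_t index  = (addr >> OFFSET_BITS) & ((1ULL << INDEX_BITS) - 1);
        uint64_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
        printf("tag=0x%llx set=0x%llx offset=0x%llx\n",
               (unsigned long long)tag, (unsigned long long)index,
               (unsigned long long)offset);
        return 0;
    }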
  • As shown in FIG. 5, the embodiment of the present application divides each set into two zones, one of which forms the D zone and the other the V zone. The tags corresponding to the cache blocks in the D area are stored in the D area, while the tag Cache located in the SRAM is responsible for buffering the tags corresponding to part of the cache blocks in the V area. Since the tag Cache stores only the tags corresponding to part of the cache blocks in the V area, the tags corresponding to the remaining cache blocks in the V area are stored in the V area itself. Data replaced out of the D area enters the V area, and the tag Cache stores the tags corresponding to the data that is likely to be accessed frequently by the processor. In the embodiment of the present application, the foregoing structure may be referred to as a TDV (Tag Cache-Direct-Victim) structure.
  • In summary, the embodiment of the present application uses the TDV structure to divide the DRAM into the two partitions of the D area and the V area, where the D area uses direct mapping and the V area uses set associative mapping. This combines the low access delay of direct mapping on a hit with the high hit rate of set associative mapping, and the tag Cache is used to store the tags corresponding to part of the cache blocks in the V area, so that the tag Cache hit rate can still be guaranteed even in the scenario where a large-capacity DRAM is used as the off-chip cache.
  • In the embodiment of the present application, the cache refers to the tag Cache, the first memory refers to the DRAM, the second memory refers to the NVM, the first partition refers to the D zone, and the second partition refers to the V zone. That is, the cache is used to buffer the tags corresponding to at least part of the cache blocks in the second partition of the first memory, and the second partition is used to cache the data replaced from the first partition; one memory block in the second memory is mapped to one cache block of the first partition, that is, the first partition uses direct mapping, and one memory block in the second memory is mapped into one group of the second partition, that is, the second partition uses set associative mapping. A sketch of one DRAM set under this organization follows.
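  • The following C sketch shows one 32-block DRAM set under this TDV organization; the 24/8 split between the D zone and the V zone is an assumption for illustration, as the text only states that each set is divided into two zones:

    #include <stdint.h>

    #define BLOCKS_PER_SET 32
    #define D_ZONE_BLOCKS  24                              /* assumed split */
    #define V_ZONE_BLOCKS  (BLOCKS_PER_SET - D_ZONE_BLOCKS)

    struct line { uint16_t tag; uint8_t valid, dirty; uint8_t data[64]; };

    struct tdv_set {
        struct line d_zone[D_ZONE_BLOCKS];  /* direct mapped; tags kept in the D area      */
        struct line v_zone[V_ZONE_BLOCKS];  /* set associative; some tags in the tag Cache */
    };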
  • FIG. 7 is a flowchart of a data access method according to an embodiment of the present application. Referring to FIG. 7, the method process provided by the embodiment of the present application includes:
  • 701: The processor receives an access request, where the access request carries an access address, and the processor obtains the physical address of the data to be accessed according to the access address.
  • The processor needs to first convert the access address into a physical address and then perform the actual memory access based on the physical address. In other words, the processor can convert the access address carried in the received access request into the physical address of the data to be accessed through address translation.
  • 702: The processor determines, according to the tag in the obtained physical address, whether the received access request hits the cache. If the cache is hit, step 703 below is performed; if the cache is missed, step 704 below is performed.
  • Specifically, the obtained physical address can be split into three parts: the tag corresponding to the high-order address bits, the set index corresponding to the middle bits, and the in-block offset corresponding to the low-order bits. According to the set index, the tags corresponding to all the cache blocks in the addressed group are obtained from the tags stored in the cache, referred to as N tags in the embodiment of the present application; the tag in the physical address is then compared with each of the N tags, and when the tag in the obtained physical address matches one of the N tags, it is determined that the received access request hits the cache. In practical applications, the valid bit may be further checked; for example, only when the tag matches and the corresponding valid bit indicates valid does the processor determine that the received access request hits the cache, which is not specifically limited in this embodiment of the present application. A sketch of this comparison follows.
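  • The following C sketch illustrates the comparison in step 702: the tag from the physical address is compared against the N tags that the tag Cache holds for the addressed group, with the valid bit checked as suggested above. Types and names are illustrative assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    struct cached_tag { bool valid; uint16_t tag; };

    bool hits_tag_cache(const struct cached_tag *group_tags, int n, uint16_t addr_tag) {
        for (int i = 0; i < n; i++)
            if (group_tags[i].valid && group_tags[i].tag == addr_tag)
                return true;  /* hit: the data is in the second partition (V area) */
        return false;         /* miss: continue with step 704 (D-area check)       */
    }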
  • 703: The processor sends a memory access request carrying the obtained physical address to the memory controller, and the memory controller acquires, according to the physical address carried in the memory access request, the data to be accessed by the access request from the second partition, and returns the acquired data to the processor.
  • Since the access request hits the cache, the data to be accessed is stored in the second partition, so the memory controller acquires the data to be accessed from the second partition.
  • It should be noted that the overhead of accessing the tags is negligible when determining whether the access request hits the cache, so in this case the only overhead is a single access by the memory controller to the second partition of the first memory to obtain the data to be accessed. In other words, in this case the data read is completed with one access to the first memory, which reduces the access delay and the system overhead.
  • It should also be noted that, for convenience of description, the access request appearing in the foregoing steps may be referred to as a first access request, the access address as a first access address, the physical address as a first physical address, the memory access request as a first memory access request, and the data to be accessed as the first data.
  • 704: The processor sends a memory access request carrying the obtained physical address to the memory controller, and the memory controller determines, according to the tag in the physical address, whether the memory access request hits the first partition of the first memory. If the first partition is hit, step 705 below is performed; if the first partition is missed, step 706 below is performed.
  • Likewise, the obtained physical address can be split into three parts: the tag corresponding to the high-order bits, the block address corresponding to the middle bits, and the in-block offset corresponding to the low-order bits. According to the block address, the memory controller can obtain, from the tags stored in the first partition, the tag corresponding to the cache block indicated by the block address. If the tag in the obtained physical address matches the tag corresponding to the cache block indicated by the block address, the memory controller determines that the received memory access request hits the first partition. In practical applications, the valid bit may also be further checked before the memory controller determines a hit, which is not specifically limited in this embodiment of the present application.
  • It should be noted that, when reading the tag stored in the first partition, the memory controller can also read the data stored in the corresponding cache block together; that is, the first partition is accessed once and the tag and the data are both obtained. The reason the memory controller reads the data together with the tag is that, if the received memory access request is subsequently determined to hit the first partition, the first partition does not need to be accessed a second time to obtain the data to be accessed. That is, in the case of a hit, the memory controller can acquire the data with a single access to the first partition, so the access delay of direct mapping is small and the system overhead is reduced. A sketch of this probe follows.
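  • The following C sketch illustrates the D-area probe in step 704, where a single access returns tag and data together; dram_read_line() is a hypothetical stand-in for that single burst read:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    struct d_line { bool valid; uint16_t tag; uint8_t data[64]; };

    extern struct d_line dram_read_line(uint64_t block_addr);  /* one burst: tag + data */

    bool d_area_read(uint64_t block_addr, uint16_t addr_tag, uint8_t out[64]) {
        struct d_line line = dram_read_line(block_addr);   /* single DRAM access     */
        if (line.valid && line.tag == addr_tag) {
            memcpy(out, line.data, sizeof line.data);      /* hit: data already here */
            return true;
        }
        return false;                                      /* miss: check the V area */
    }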
  • 705: When the memory access request received by the memory controller hits the first partition, the memory controller returns the data to be accessed by the access request, obtained from the first partition, to the processor.
  • Since the memory access request hits the first partition, the data to be accessed is stored in the first partition, so the memory controller acquires the data to be accessed from the first partition.
  • It should be noted that, for convenience of description, the access request appearing in the foregoing steps may be referred to as a second access request, the access address as a second access address, the physical address as a second physical address, the memory access request as a second memory access request, and the data to be accessed as the second data.
  • 706: The memory controller determines, according to the tag in the physical address, whether the memory access request hits the second partition. If the second partition is hit, step 707 below is performed; if the second partition is missed, step 708 below is performed.
  • The memory controller determines, according to the tag in the obtained physical address, whether the memory access request hits the second partition; this judgment is similar to that in step 702 above and is not described here again.
  • It should be noted that, in order to shorten the access delay, the memory controller can access the second memory in parallel with accessing the second partition, that is, attempt to acquire the data from the second memory at the same time as it checks the second partition. If the received memory access request hits the second partition, the memory controller can read the data directly from the second partition; since reading data from the second memory is far slower than reading it from the second partition, the read already issued to the second memory can simply be abandoned. If the received memory access request misses the second partition, the flow of acquiring the data from the second memory has already been started, which is far better than starting the access to the second memory only after the miss in the second partition has been determined. A sketch of this overlap follows.
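  • The following C sketch illustrates this overlap; all helper names are hypothetical stand-ins, and a real memory controller would issue these as parallel commands:

    #include <stdbool.h>
    #include <stdint.h>

    typedef int nvm_request_t;

    extern nvm_request_t nvm_read_begin(uint64_t addr);    /* start the slow NVM read early */
    extern void          nvm_read_discard(nvm_request_t);  /* abandon an in-flight read     */
    extern bool          v_area_hit(uint64_t addr);        /* tag compare in the V area     */

    void v_area_probe(uint64_t addr) {
        nvm_request_t req = nvm_read_begin(addr);  /* issued before the outcome is known */
        if (v_area_hit(addr)) {
            nvm_read_discard(req);  /* hit: return the data from the V area (step 707) */
        } else {
            /* miss: the NVM data is already on its way (step 708) */
        }
    }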
  • 707: When the memory access request received by the memory controller hits the second partition, the memory controller acquires the third data to be accessed by the access request from the second partition, and returns the acquired data to the processor.
  • Since the memory access request hits the second partition, the data to be accessed is stored in the second partition, so the memory controller acquires the data to be accessed from the second partition.
  • It should be noted that, for convenience of description, the access request appearing in the foregoing steps may be referred to as a third access request, the access address as a third access address, the physical address as a third physical address, the memory access request as a third memory access request, and the data to be accessed as the third data.
  • 708: When the memory access request received by the memory controller misses the second partition, the memory controller acquires the data to be accessed by the access request from the second memory according to the physical address, and returns the acquired data to the processor.
  • Since the memory access request received by the memory controller hits neither the first partition nor the second partition, the memory controller acquires the data to be accessed directly from the second memory.
  • 709: The memory controller stores the data acquired from the second memory in the first partition, and stores the data replaced out of the first partition in the second partition of the first memory.
  • Because the first partition uses direct mapping, the data to be accessed needs to be mapped to a specific cache block of the first partition, so the data originally cached in that cache block is replaced, and the embodiment of the present application migrates the replaced data to the second partition. After the data replaced out of the first partition is stored in the second partition, when the processor subsequently requests access to the replaced data, the data can be acquired directly from the second partition; this avoids storing the replaced data back in the second memory, where subsequent accesses to the replaced data would read it slowly.
  • It should be noted that the data stored in the first partition changes, so in addition to performing the data replacement, the tag needs to be modified synchronously: the tag corresponding to the replaced data is updated to the tag corresponding to the data to be accessed. Similarly, after the data replaced out of the first partition is stored in the second partition, the tags stored in the V area are also updated synchronously. In short, when the data in the first partition or the second partition is updated, the corresponding tags are updated synchronously. In addition, the tags stored in the cache can also be updated; for example, if the processor frequently accesses certain data stored in the second partition, the tag corresponding to that data can be stored in the cache. Alternatively, after the data replaced out of the first partition is stored in the second partition, the tag corresponding to the replaced data may be stored directly in the cache, which is not specifically limited in this embodiment of the present application. A sketch of this miss handling follows.
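  • The following C sketch summarizes this miss handling (steps 708 and 709): the block fetched from the second memory is installed in its fixed D-area slot, the evicted block migrates into the V area together with its tag, and the tags of both partitions are updated synchronously. All helpers are hypothetical stand-ins:

    #include <stdint.h>

    typedef struct { uint16_t tag; uint8_t data[64]; } block_t;

    extern block_t nvm_read(uint64_t addr);                    /* fetch from the NVM     */
    extern block_t d_area_swap(uint64_t addr, block_t fresh);  /* install + update tag   */
    extern void    v_area_insert(block_t victim);              /* keep the victim nearby */

    void handle_full_miss(uint64_t addr) {
        block_t fresh  = nvm_read(addr);            /* step 708: read the second memory   */
        block_t victim = d_area_swap(addr, fresh);  /* step 709: replace in the D area    */
        v_area_insert(victim);                      /* victim stays close for later reuse */
    }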
  • It should be noted that, for convenience of description, the access request appearing in the foregoing steps may be referred to as a fourth access request, the access address as a fourth access address, the physical address as a fourth physical address, the memory access request as a fourth memory access request, and the data to be accessed as the fourth data.
  • In the method provided by the embodiment of the present application, after receiving an access request, the processor can first access the cache before accessing the first partition and the second partition of the DRAM. If the access request hits the cache, the processor controls the memory controller to obtain the data to be accessed directly from the second partition of the DRAM, which uses set associative mapping; since the system overhead of reading tags from the cache when determining whether the cache is hit is negligible, only one access to the first memory is needed to acquire the data, which effectively reduces the system overhead while ensuring a high hit rate and improves the performance of the computer system. If the access request misses the cache, the processor can also send a memory access request to the memory controller so that the memory controller determines whether the memory access request hits the first partition; when accessing the first partition to obtain the tag, the memory controller can also fetch the corresponding data together, so that on a hit in the first partition the data acquisition is completed with a single access to the DRAM, which likewise effectively reduces the system overhead and improves the performance of the computer system.
  • FIG. 8 is a schematic structural diagram of a computer system according to an embodiment of the present application. As shown in FIG. 8, the computer system includes a processor 801, a cache 802, a first memory 803, a second memory 804, and a memory controller 805. The first memory 803 is configured to cache data in the second memory 804; the cache 802 is configured to cache the tags corresponding to at least part of the cache blocks in the second partition of the first memory 803; the first memory 803 includes a first partition and the second partition; the second partition is configured to cache the cache blocks replaced from the first partition; one memory block in the second memory 804 is mapped to one cache block of the first partition; one memory block in the second memory 804 is mapped into one group of the second partition; and one group of the second partition includes multiple cache blocks.
  • The processor 801 is configured to obtain a first physical address according to the first access address in a first access request; the processor 801 is further configured to determine, according to the first tag in the first physical address, whether the first access request hits the cache 802; the processor 801 is further configured to: when the first access request hits the cache 802, send a first memory access request to the memory controller 805, where the first memory access request carries the first physical address; and the memory controller 805 is configured to acquire, according to the first physical address, the first data to be accessed by the first access request from the second partition.
  • The computer system provided by the embodiment of the present application divides the first memory into two areas, namely the first partition, which uses direct mapping, and the second partition, which uses set associative mapping, and uses the cache to store the tags corresponding to part of the cache blocks in the second partition. This design combines the advantages of direct mapping and set associative mapping. After receiving an access request, the processor can access the cache before accessing the first partition and the second partition of the first memory. If the access request hits the cache, the processor may control the memory controller to obtain the data to be accessed directly from the second partition of the first memory; since the system overhead of reading tags from the cache when determining whether the cache is hit is negligible, only one access to the first memory is required to obtain the data, which effectively reduces the system overhead while ensuring a high hit rate and improves the performance of the computer system.
  • Optionally, the processor 801 is further configured to obtain a second physical address according to the second access address in a second access request; the processor 801 is further configured to: when determining, according to the second tag in the second physical address, that the second access request misses the cache 802, send a second memory access request to the memory controller 805, where the second memory access request carries the second physical address; the memory controller 805 is further configured to determine, according to the second tag, whether the second memory access request hits the first partition of the first memory 803; and the memory controller 805 is further configured to: when the second memory access request hits the first partition of the first memory 803, acquire the second data to be accessed by the second access request from the first partition.
  • Optionally, the processor 801 is further configured to obtain a third physical address according to the third access address in a third access request; the processor 801 is further configured to: when determining, according to the third tag in the third physical address, that the third access request misses the cache 802, send a third memory access request to the memory controller 805, where the third memory access request carries the third physical address; the memory controller 805 is further configured to determine, according to the third tag, whether the third memory access request hits the first partition of the first memory 803; the memory controller 805 is further configured to: when the third memory access request misses the first partition of the first memory 803, determine, according to the third tag, whether the third memory access request hits the second partition of the first memory 803; and the memory controller 805 is further configured to: when the third memory access request hits the second partition of the first memory 803, acquire the third data to be accessed by the third access request from the second partition.
  • Optionally, the processor 801 is further configured to obtain a fourth physical address according to the fourth access address in a fourth access request; the processor 801 is further configured to: when determining, according to the fourth tag in the fourth physical address, that the fourth access request misses the cache 802, send a fourth memory access request to the memory controller 805, where the fourth memory access request carries the fourth physical address; the memory controller 805 is further configured to: when determining, according to the fourth tag, that the fourth memory access request misses the first partition and the second partition of the first memory 803, acquire, according to the fourth physical address, the fourth data to be accessed by the fourth access request from the second memory 804; and the memory controller 805 is further configured to store the fourth data in the first partition of the first memory 803, and store the data replaced out of the first partition in the second partition of the first memory 803.
  • A person skilled in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned may be a read-only memory, a magnetic disk, an optical disc, or the like.

Abstract

The present application provides a data access method and a computer system, and belongs to the field of memory technologies. The method is applied to a computer system that includes a processor, a cache, a first memory, a second memory, and a memory controller. The first memory is used to cache data in the second memory; the cache is used to cache the tags corresponding to part of the cache blocks in a second partition; the second partition of the first memory is used to cache the cache blocks replaced from a first partition; one memory block in the second memory is mapped to one cache block of the first partition and is mapped into one group, comprising multiple cache blocks, of the second partition. The method includes: the processor obtains a physical address according to the access address in an access request; when determining, according to the tag in the physical address, that the access request hits the cache, the processor sends a memory access request carrying the physical address to the memory controller; and the memory controller acquires the data from the second partition according to the physical address. The present application improves the performance of the computer system.

Description

Data access method and computer system

Technical field

This application relates to the field of memory technologies, and in particular, to a data access method and a computer system.

Background

With the development of memory technologies, non-volatile memory (NVM), such as phase change memory (PCM), is used more and more widely. NVM still retains data after the system is powered off, and it has advantages such as high density and good scalability; therefore, NVM is gradually being used as memory.

In the existing hybrid memory architecture, dynamic random access memory (DRAM) and non-volatile memory (NVM) can together serve as the memory of a computer system. Compared with DRAM, NVM has characteristics such as large capacity, easy scalability, non-volatility, and a relatively slow access speed; therefore, in a computer system the NVM is usually used as the main memory, while the DRAM serves as a cache of the NVM for buffering part of the data in the NVM.

In the course of implementing this application, the inventor found that in a computer system with a hybrid memory architecture, when the memory controller processes a memory access request, it first judges, according to the tag corresponding to the address in the memory access request, whether the memory access request hits the DRAM, so as to determine whether the data to be accessed is stored in the DRAM. When the DRAM serves as a cache of the NVM, the tags corresponding to the addresses of the cache blocks buffered in the DRAM also need to occupy a large storage space; therefore, in practical applications, the tags corresponding to the cache blocks are usually also stored in the DRAM. In this case, when the memory controller processes a memory access request, it needs to first read the tags from the DRAM, so as to compare the tag corresponding to the memory access request with the tags buffered in the DRAM and judge whether the memory access request hits the DRAM. Because this adds an access to the DRAM for reading the tags, an extra access delay is introduced, which causes a large system overhead and reduces the performance of the computer system.

Summary

The embodiments of this application provide a data access method and a computer system, which solve the problem in the related art that an extra access delay introduces a large system overhead and degrades the performance of the computer system. The technical solution is as follows:

In a first aspect, a data access method is provided, applied to a computer system, where the computer system includes a processor, a cache, a first memory, a second memory, and a memory controller; the first memory is used to cache data in the second memory; the cache is used to cache the tags corresponding to at least part of the cache blocks in a second partition of the first memory; the first memory includes a first partition and the second partition; the second partition is used to cache the cache blocks replaced from the first partition; one memory block in the second memory is mapped to one cache block of the first partition; one memory block in the second memory is mapped into one group of the second partition; and one group of the second partition includes multiple cache blocks.
Based on the foregoing computer system, data access in the embodiments of this application falls into the following cases.
Case 1: the access request hits the cache. In this case, the data access procedure is as follows:
The processor obtains a first physical address according to a first access address in a first access request; the processor determines, according to a first tag in the first physical address, whether the first access request hits the cache; when the first access request hits the cache, the processor sends a first memory access request to the memory controller, where the first memory access request carries the first physical address; and the memory controller obtains, from the second partition according to the first physical address, first data to be accessed by the first access request.
Case 2: the access request misses the cache, but the memory access request hits the first partition of the first memory. In this case, the data access procedure is as follows:
That is, in a first possible implementation of the first aspect, the processor obtains a second physical address according to a second access address in a second access request; when the processor determines, according to a second tag in the second physical address, that the second access request misses the cache, the processor sends a second memory access request to the memory controller, where the second memory access request carries the second physical address;
the memory controller determines, according to the second tag, whether the second memory access request hits the first partition of the first memory; and when the second memory access request hits the first partition of the first memory, the memory controller obtains, from the first partition, second data to be accessed by the second access request.
Case 3: the access request misses the cache, and the memory access request misses the first partition of the first memory but hits the second partition. In this case, the data access procedure is as follows:
That is, in a second possible implementation of the first aspect, the processor obtains a third physical address according to a third access address in a third access request; when the processor determines, according to a third tag in the third physical address, that the third access request misses the cache, the processor sends a third memory access request to the memory controller, where the third memory access request carries the third physical address;
the memory controller determines, according to the third tag, whether the third memory access request hits the first partition of the first memory; when the third memory access request misses the first partition of the first memory, the memory controller determines, according to the third tag, whether the third memory access request hits the second partition of the first memory; and
when the third memory access request hits the second partition of the first memory, the memory controller obtains, from the second partition, third data to be accessed by the third access request.
Case 4: the access request misses the cache, and the memory access request misses both the first partition and the second partition of the first memory. In this case, the data access procedure is as follows:
That is, in a third possible implementation of the first aspect, the processor obtains a fourth physical address according to a fourth access address in a fourth access request; when the processor determines, according to a fourth tag in the fourth physical address, that the fourth access request misses the cache, the processor sends a fourth memory access request to the memory controller, where the fourth memory access request carries the fourth physical address;
when the memory controller determines, according to the fourth tag, that the fourth memory access request misses both the first partition and the second partition of the first memory, the memory controller obtains, from the second memory according to the fourth physical address, fourth data to be accessed by the fourth access request; and
the memory controller stores the fourth data in the first partition of the first memory, and stores the data replaced out of the first partition in the second partition of the first memory.
According to a second aspect, a computer system is provided. The computer system includes a processor, a cache, a first memory, a second memory, and a memory controller, where the first memory is configured to cache data in the second memory; the cache is configured to cache the tags corresponding to at least some of the cache blocks in a second partition of the first memory; the first memory includes a first partition and the second partition; the second partition is configured to cache the cache blocks replaced out of the first partition; a memory block in the second memory is mapped to one cache block of the first partition; a memory block in the second memory is mapped to one group of the second partition; and one group of the second partition includes multiple cache blocks. The processor and the memory controller are configured to perform the data access method according to the first aspect.
The technical solutions provided in the embodiments of this application bring the following beneficial effects:
In the embodiments of this application, the first memory is divided into two partitions: a direct-mapped first partition and a set-associative second partition, and the cache is used to store the tags corresponding to some of the cache blocks in the second partition. This design combines the advantages of direct mapping and set-associative mapping. After receiving an access request, the processor can access the cache before accessing the first partition and the second partition of the first memory. If the access request hits the cache, the processor can control the memory controller to obtain the data to be accessed directly from the second partition of the first memory. Because the overhead of reading tags from the cache during the hit determination is negligible, only one access to the first memory is needed to obtain the data. Therefore, a high hit rate is ensured while the system overhead is effectively reduced, improving the performance of the computer system.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of this application;
FIG. 2 is a schematic diagram of mapping between an NVM and a DRAM according to an embodiment of this application;
FIG. 3 is a schematic diagram of the composition of a target address according to an embodiment of this application;
FIG. 4 is a schematic diagram of the capacities of an SRAM, a DRAM, and an NVM according to an embodiment of this application;
FIG. 5 is a schematic diagram of a TDV structure according to an embodiment of this application;
FIG. 6 is a schematic diagram of a main-memory address and DRAM groups according to an embodiment of this application;
FIG. 7 is a flowchart of a data access method according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a computer system according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of this application. As shown in FIG. 1, the computer system 100 may include at least a processor 105, a memory controller 115, an NVM 120, and a DRAM 125, where both the DRAM 125 and the NVM 120 serve as the memory of the computer system 100. It can be understood that the connection relationship of the computer system shown in FIG. 1 is merely one example of a computer system with a hybrid memory architecture, and the DRAM 125 and the NVM 120 shown in FIG. 1 are merely one example of multi-level memories in a computer system. In practice, the internal structure of the computer system is not specifically limited, and the computer system may further include memories other than the DRAM 125 and the NVM 120. It suffices that the computer system in the embodiments of this application includes a first-level memory and a second-level memory that can serve as the memory, where the first-level memory can support cache access. In other words, it suffices that the computer system in the embodiments of this application includes at least two levels of memory, where the first-level memory serves as a cache of the second-level memory and caches part of the data in the second-level memory. The following describes the computer system with a hybrid memory architecture provided in the embodiments of this application by using FIG. 1 as an example.
The processor 105 is the core of the computer system 100 and can invoke different software programs in the computer system 100 to implement different functions. For example, the processor 105 can access the DRAM 125 and the NVM 120. It can be understood that the processor 105 may be a central processing unit (CPU). Besides a CPU, the processor may also be an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application. For ease of description, one processor is used as an example in the embodiments of this application; in practice, the computer system may include multiple processors. In addition, the processor may be a single-core or multi-core processor. In a multi-core processor architecture, the processor may include multiple processor cores. For example, as shown in FIG. 1, the processor 105 may include one or more CPU cores 110. Neither the number of processors nor the number of processor cores in one processor is limited in the embodiments of this application.
The memory controller 115 is an important component inside the computer system 100 that controls the memory and exchanges data between the memory and the processor 105 (for example, a CPU). In practice, in one case, the memory controller 115 may be located inside a northbridge chip. In another case, the memory controller 115 may be integrated into the processor 105 (as shown in FIG. 1); specifically, the memory controller 115 may be integrated on the substrate of the processor 105. It can be understood that, when the memory controller 115 is located inside the northbridge chip, the memory controller needs to exchange data with the processor through the northbridge chip, which results in high data latency. When the memory controller 115 is integrated into the processor 105, it can exchange data with the processor directly.
As shown in FIG. 1, the memory controller 115 may be coupled to an NVM controller 116 and a DRAM controller 118, where the DRAM controller 118 is configured to control access to the DRAM 125, and the NVM controller 116 is configured to control access to the NVM 120. The NVM controller 116 and the DRAM controller 118 may also be called media controllers. It can be understood that, in practice, in one case, the NVM controller 116 and the DRAM controller 118 may be independent of the memory controller 115; in another case, the NVM controller 116 and the DRAM controller 118 may be integrated into the memory controller 115 and logically regarded as part of the memory controller 115 (as shown in FIG. 1). In the embodiments of this application, the memory controller 115 may be connected to the NVM 120 and the DRAM 125 through a memory bus (for example, a double data rate (DDR) bus). It can be understood that, in practice, the NVM controller 116 may also communicate with the NVM 120 through other types of buses such as a PCI Express bus or a desktop management interface (DMI) bus.
It should be noted that, besides the manner shown in FIG. 1, the memory controller 115 may be connected to the NVM 120 and the DRAM 125 in other manners. For example, the memory controller 115 may be directly connected to the DRAM 125 through the memory bus and indirectly connected to the NVM 120 through the DRAM 125. In other words, the memory controller 115 is connected to the DRAM 125 through the memory bus, and the DRAM 125 is connected to the NVM 120 through the memory bus.
As described above, in the computer system shown in FIG. 1, the DRAM 125 may be connected to the processor 105 through the memory bus. The DRAM 125 has the advantage of a fast access speed. The processor 105 can access the DRAM 125 at high speed and perform read or write operations on it. The DRAM 125 is usually used to store the running software of the operating system, input and output data, information exchanged with external storage, and so on. However, the DRAM 125 is volatile: once the power is turned off, the information in the DRAM 125 is no longer retained.
Because new types of NVM are byte-addressable and write data into the non-volatile memory in units of bits, they can be used as memory. In the embodiments of this application, the NVM 120 may serve, together with the DRAM 125, as the memory of the computer system 100. Compared with the DRAM 125, the NVM 120 is non-volatile and can therefore retain data better. In the embodiments of this application, a non-volatile memory that can be used as memory may be called a storage class memory (SCM).
It should be noted that DRAM is one kind of volatile memory, and in practice other random access memories (RAM) may also be used as the memory of the computer system; for example, a static random access memory (SRAM) may also be used. In the embodiments of this application, the NVM 120 shown in FIG. 1 may include new types of non-volatile memory such as phase-change random access memory (PCM), resistive random access memory (RRAM), magnetic random access memory (MRAM), or ferroelectric random access memory (FRAM); the specific type of the NVM in the embodiments of this application is not limited here.
Because the access speed of the NVM 120 is relatively slow compared with that of the DRAM 125, the NVM 120 is usually used as the main memory of the system, and the DRAM 125 is used as the cache of the NVM 120, so as to compensate for the slow access speed of the main memory NVM 120 and improve the memory access speed. As shown in FIG. 1, the DRAM 125 serves as the cache of the NVM 120. When the memory controller 115 receives a memory access request sent by the processor 105, it first determines whether the target address in the request (that is, the address of the memory block to be accessed) hits the DRAM 125, so as to determine whether the data to be accessed is stored in the DRAM 125. When determining that the target address in the request hits the DRAM 125, the memory controller 115 can obtain the data to be accessed directly from the DRAM 125, thereby shortening the access latency. Only when the memory controller 115 determines that the target address in the request misses the DRAM 125 does it obtain the data to be accessed from the NVM 120.
Those skilled in the art know that, because the capacity of a cache is small, the content stored in the cache is only a subset of the content of the main memory, and data is exchanged between the cache and the main memory in units of blocks. To cache data of the main memory in the cache, some function must be applied to locate main-memory addresses in the cache; this is called address mapping. After the data in the main memory is cached in the cache according to such a mapping relationship, when the CPU executes a program, the main-memory addresses in the program are translated into cache addresses. Common cache address mapping schemes are direct mapping and set-associative mapping. Before direct mapping and set-associative mapping are explained, the cache itself is first explained in detail.
Although the capacity of a cache is small compared with the main memory, it is much faster; the main function of a cache is therefore to store data that the processor may need to access frequently in the near future. In this way, the processor can read data directly from the cache instead of frequently accessing the slower main memory, thereby improving the processor's memory access speed. The basic unit of a cache is the cache line; in the embodiments of this application, a cache line may also be called a cache block. In addition, similarly to a cache being divided into multiple cache blocks, the data stored in the main memory is divided in a similar way. For ease of description, in the embodiments of this application, a data block divided out of the NVM 120 may be called a memory block. Typically, the size of a memory block may be 4 KB (kilobytes), and the size of a cache block may also be 4 KB. It can be understood that, in practice, the sizes of the memory block and the cache line may also be set to other values, as long as the size of a memory block is the same as the size of a cache block.
In the direct mapping scheme, a memory block in the main memory can only be mapped to one specific cache block of the cache; in other words, a memory block in the main memory is placed in a unique position in the cache. For example, suppose the main memory has 16 memory blocks numbered 0 to 15, and the cache has 4 blocks; then blocks 0, 4, 8, and 12 of the main memory can only be mapped to block 0 of the cache, blocks 1, 5, 9, and 13 can only be mapped to block 1 of the cache, and so on. Direct mapping is the simplest address mapping scheme: its hardware is simple, its cost is low, and its address translation is fast. However, this scheme is not flexible enough, and the storage space of the cache is not fully utilized. Because each memory block can be placed in only one fixed position in the cache, conflicts occur easily, which lowers the efficiency of the cache.
For example, if a program needs to reference blocks 0 and 4 of the main memory repeatedly, the best approach would be to copy both blocks into the cache at the same time. However, because blocks 0 and 4 of the main memory can only be copied to block 0 of the cache, they cannot occupy other storage space in the cache even if it is empty. As a result, these two blocks are alternately cached over and over, which lowers the hit rate.
In the set-associative mapping scheme, both the main memory and the cache are divided into multiple groups, and the number of blocks in one group (set) of the main memory equals the number of groups in the cache. Each block of the main memory has a fixed mapping relationship with a cache group number, but can be freely mapped to any block in the corresponding cache group. In other words, under this scheme, the group in which a memory block is placed is fixed, while the block within that group is flexible; that is, set-associative mapping uses direct mapping between groups but fully associative mapping within a group. For example, suppose the main memory is divided into 256 groups of 8 blocks each, and the cache is divided into 8 groups of 2 blocks each. Blocks 0, 8, and so on of the main memory are all mapped to group 0 of the cache, and may be placed in block 0 or block 1 of group 0; blocks 1, 9, and so on are all mapped to group 1 of the cache, and may be placed in block 2 or block 3 of group 1. In a cache using set-associative mapping, each group may contain 2, 4, 8, or 16 blocks, and the cache is correspondingly called a 2-way, 4-way, 8-way, or 16-way set-associative cache. It should be noted that a "group" in the embodiments of this application may also be called a "set".
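For illustration only (this sketch is not part of the original disclosure), the two placement rules above can be expressed in C, using the block counts from the examples just given:

    #include <stdio.h>

    /* Placement examples from the text: a 4-block direct-mapped cache,
     * and an 8-group, 2-way set-associative cache (2 blocks per group). */
    int main(void) {
        for (int b = 0; b < 16; b++) {
            int direct_slot = b % 4;  /* direct mapping: one fixed block    */
            int group       = b % 8;  /* set-associative: only the group is
                                         fixed; either block within it may
                                         hold the memory block              */
            printf("memory block %2d -> direct-mapped block %d; "
                   "group %d (either of its 2 blocks)\n",
                   b, direct_slot, group);
        }
        return 0;
    }

Running this prints, for instance, that memory blocks 0, 4, 8, and 12 all compete for direct-mapped block 0, while under set-associative mapping blocks 0 and 8 share group 0 but can occupy different blocks within it.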
In the embodiments of this application, because the DRAM 125 serves as the cache of the NVM 120 and caches some of the memory blocks in the NVM 120, the data in the main memory NVM 120 also needs to be mapped to the DRAM 125 according to some mapping scheme. In practice, direct mapping and set-associative mapping are usually used to map the data in the NVM 120 to the DRAM 125.
FIG. 2 is a schematic diagram of the mapping between the NVM 120 and the DRAM 125 in an embodiment of this application. As shown in FIG. 2, the storage space of the NVM 120 may be divided into multiple different cache sets: set 1 210_1, set 2 210_2, ..., set N 210_N. Each set is allocated one cache entry in the DRAM 125. For example, as shown in FIG. 2, cache entry 200_1 is the cache entry reserved for any storage address in set 1 210_1, and cache entry 200_2 is the cache entry reserved for any storage address in set 2 210_2. In this way, a memory block corresponding to any storage address in set 1 210_1 can be mapped to cache entry 200_1.
FIG. 2 also shows the organization of the data in the DRAM 125. As shown by the DRAM organization 200 in FIG. 2, one cache entry corresponds to one row of data; in other words, one cache entry corresponds to one cache line. The DRAM 125 may include multiple rows, and each row can store multiple bytes of data. Each cache entry includes at least a valid bit 201, a dirty bit 203, a tag 205, and data 207. It can be understood that, in practice, each cache entry may further include error correcting code (ECC) information to ensure the accuracy of the stored data. The tag 205 is part of the main-memory address and indicates the position, in the main memory NVM 120, of the memory block to which the cache block is mapped. The data 207 is the data of the memory block cached in the cache block.
The valid bit 201 and the dirty bit 203 are both flag bits used to indicate the state of the cache line. The valid bit 201 indicates the validity of the cache line: when the valid bit 201 indicates valid, the data in the cache line is usable; when it indicates invalid, the data in the cache line is unusable. The dirty bit 203 indicates whether the data in the cache line is the same as the data in the corresponding memory block. For example, when the dirty bit 203 indicates dirty, the data part of the cache line (data 207 in FIG. 2) differs from the data in the corresponding memory block; in other words, when the dirty bit 203 indicates dirty, the cache line contains new data. When the dirty bit 203 indicates clean, the data in the cache line is the same as the data in the corresponding memory block. In practice, a specific value may be used to indicate whether the dirty bit 203 is dirty or clean. For example, when the dirty bit 203 is "1", it indicates dirty, meaning the cache line contains new data; when it is "0", it indicates clean, meaning the data in the cache line is the same as the data in the corresponding memory block. It can be understood that other values may also be used to mark the cache line as dirty or clean, which is not limited here.
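As an illustrative sketch (the field widths are assumptions, not taken from the disclosure), a cache entry with the fields just described could be declared in C as follows; the 64-byte data field matches the entry size used later in this description:

    #include <stdint.h>

    /* One cache entry: valid bit, dirty bit, tag, and data. */
    typedef struct {
        unsigned int valid : 1;  /* 1 = the data in this line is usable     */
        unsigned int dirty : 1;  /* 1 = line holds newer data than memory   */
        uint64_t     tag;        /* locates the cached block in main memory */
        uint8_t      data[64];   /* the cached memory-block data            */
    } cache_entry_t;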
FIG. 2 is a schematic diagram of the mapping when the NVM 120 and the DRAM 125 use direct mapping. It can be understood that, in the embodiments of this application, the data in the DRAM 125 may also be organized in the form of cache sets according to set-associative mapping. In this scheme, the DRAM 125 may include multiple cache sets, and each cache set may include multiple lines of data; in other words, each cache set may include multiple cache entries. For example, cache entry 200_1 and cache entry 200_2 of the DRAM 200 shown in FIG. 2 may form one set, and set 1 210_1 in the NVM 210 may be mapped to cache entry 200_1 or cache entry 200_2 in that set.
After the data in the main memory is mapped to the cache, a set index is usually used to indicate the position in the cache of the cache line to which a memory block is mapped. It can be understood that, in the direct mapping scheme, the set index can indicate the position in the cache of the single cache line to which a memory block is mapped; in the set-associative mapping scheme, a set index can indicate the position in the cache of a group of cache lines. For example, in the above embodiment, when blocks 0, 8, and so on of the main memory are all mapped to group 0 of the cache, the set index of group 0 can indicate the position in the cache of the cache lines of group 0 (including block 0 and block 1).
As described above, when the memory controller 115 receives a memory access request, it first determines whether the target address in the request hits the DRAM 125, so as to determine whether the data to be accessed is stored in the DRAM 125. Specifically, when the memory controller 115 receives an access request containing a target address sent by the processor 105, the memory controller 115 can determine whether the DRAM 125 is hit according to the tag in the target address. In other words, the memory controller 115 can determine, according to the tag in the target address, whether the DRAM 125 caches the data at that address. Specifically, as shown in FIG. 3, the target address 300 can be divided into three parts: a tag 302, a set index 304, and a block offset 306. The set index 304 indicates which cache set in the cache the memory block pointed to by the target address 300 is mapped to; the tag 302 indicates the position of that memory block in the main memory (for example, the NVM 120); and the block offset 306 indicates the offset of the data to be accessed within the line, that is, at which position in the line the data to be accessed lies.
In practice, when the memory controller 115 receives a target address 300 sent by the processor 105, it may first determine, according to the set index 304 part of the target address 300, which cache set of the DRAM 125 the target address 300 belongs to. Because one cache set includes at least one cache entry, in other words at least one cache line, after determining the cache set to which the target address 300 belongs, the memory controller 115 can compare the value of the tag 302 part of the target address 300 with the tag bits (for example, tag 205 in FIG. 2) in each cache entry (for example, cache entry 200_1 and cache entry 200_2 in FIG. 2) of the cache set pointed to by the set index 304, so as to determine whether the target address 300 hits the DRAM 125.
When the tag of the target address 300 is the same as the tag in one of the cache entries of the cache set, the data to be accessed is cached in the DRAM 125. In this case, the memory controller 115 can access the DRAM 125 directly and return the target data cached in the DRAM 125 to the CPU. When the tag in the target address differs from the tags of the cache entries in the cache set, it is determined that the target address 300 misses the DRAM 125. In this case, the memory controller 115 needs to access the NVM 120 and obtain the data to be accessed from there.
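The hit check just described can be sketched in C for illustration (repeating the entry layout from the earlier sketch so the listing stands alone; this is an assumed model, not the controller's actual implementation):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        unsigned int valid : 1;
        unsigned int dirty : 1;
        uint64_t     tag;
        uint8_t      data[64];
    } cache_entry_t;

    /* Compare the tag 302 of the target address with the tag 205 of every
     * entry in the cache set selected by set index 304. */
    bool dram_hit(const cache_entry_t set[], int ways, uint64_t addr_tag,
                  const cache_entry_t **hit) {
        for (int w = 0; w < ways; w++) {
            if (set[w].valid && set[w].tag == addr_tag) {
                *hit = &set[w];  /* hit: data can be returned from DRAM 125 */
                return true;
            }
        }
        return false;            /* miss: fetch the data from NVM 120 */
    }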
By using the DRAM 125 as the cache of the NVM 120, the memory access time is shortened and the memory access speed is improved while ensuring that the data in the main memory is not lost on power failure. In practice, when the DRAM 125 serves as the cache of the NVM 120, the tags also occupy a relatively large storage space. Taking a typical cache entry size of 64 B (bytes) as an example, the tag usually occupies 4 B of it. If the DRAM is 16 GB in size, the tags amount to 1 GB. Therefore, in practice, the tags are usually also stored in the DRAM.
The above gives a complete introduction to a computer system involving a hybrid memory architecture. The first point to note is that, in such a computer system, the capacity ratio between the DRAM serving as the NVM cache and the NVM may be 1:8, 1:16, 1:32, and so on, which is not specifically limited in the embodiments of this application. For example, as shown in FIG. 4, the on-chip cache SRAM on the processor may have a capacity of 20 MB, the DRAM serving as the off-chip cache may have a capacity of 32 GB, and the NVM may have a capacity of 512 GB.
The second point to note is that, for direct mapping, because a memory block in the main memory is mapped to one fixed position in the cache, only the tag of a single cache line needs to be accessed during the hit determination. Therefore, by enlarging the burst or by using ECC encoding, the tag and the data to be accessed can be fetched together. In this way, if the cache is hit, there is no need to access the cache again to fetch the data to be accessed; the data fetched together with the tag can be returned directly. That is, with a direct-mapped cache, only one cache access may be needed on a hit. For set-associative mapping, because a memory block in the main memory can be mapped to any position in a fixed group, the tags of multiple cache lines need to be accessed during the hit determination, so the tag and the data to be accessed cannot be fetched together as in direct mapping. Therefore, with a set-associative cache, the DRAM needs to be accessed twice on a hit.
Based on the above analysis, when designing the structure of a DRAM serving as an NVM cache: if the DRAM uses only direct mapping, then because multiple memory blocks in the main memory are mapped to one fixed cache block in the DRAM, the number of data blocks corresponding to one cache block is significantly larger than with set-associative mapping, so compared with set-associative mapping it suffers from more conflicts and thus a lower hit rate. If only set-associative mapping is used, the overhead of accessing the tags is introduced, and this overhead cannot be ignored in DRAM because of its relatively high access latency; as a result, although set-associative mapping has a high hit rate, its performance is actually lower than that of direct mapping. Even if a tag cache technique is applied on top of set-associative mapping to reduce the tag-access overhead to some extent, for a DRAM serving as an off-chip cache, directly using a tag cache to cache the tags stored in the DRAM is limited by the small capacity of the SRAM: the tag cache cannot hold all the tags. Therefore, the tag cache technique still has deficiencies, and the performance of the computer system remains low.
The tag cache technique caches, in an SRAM, some of the tags stored in the DRAM; this SRAM dedicated to caching tags is called a tag cache. However, the hit rate of the tag cache is sensitive to its capacity: the larger the tag cache, the more tags it stores, and the higher its subsequent hit rate. Only if the tag cache is large enough can the overhead caused by accessing the tags stored in the DRAM be reduced. However, limited by factors such as high cost, the SRAM serving as the tag cache usually has a very small capacity.
In summary, simply using the above schemes directly cannot achieve high read efficiency. Therefore, for DRAM, when the tag cache cannot play its role because the SRAM capacity is too small, the key question becomes how to use the limited tag cache to maintain a high hit rate while minimizing the system overhead caused by accessing tags.
To this end, referring to FIG. 5, the embodiments of this application divide the DRAM into two regions: a direct-mapped region (Direct region, D region for short) and a Victim region (V region for short). The D region uses direct mapping, the V region uses set-associative mapping, and the tag cache is used to cache the tags corresponding to some of the cache blocks in the V region. This combines the advantages of direct mapping and set-associative mapping: a high hit rate can be maintained while the system overhead caused by accessing tags is minimized, thereby improving the performance of the computer system. Before the two regions are explained in detail, the structural organization of the DRAM in the embodiments of this application is first explained with reference to FIG. 6.
Referring to FIG. 6, the multiple cache blocks into which the DRAM is divided are grouped into multiple groups, where one group may be called a set; in FIG. 6, one set specifically includes 32 cache blocks. In the embodiments of this application, taking a main-memory capacity of 512 GB as an example, 39 bits can represent a main-memory address.
In addition, as shown in FIG. 6, a 39-bit main-memory address can be split into three parts: tag, set index, and intra-block offset. Because the size of a cache block is 64 bytes, the intra-block offset occupies 6 bits; of the remaining 33 bits, the tag occupies 9 bits and the set index occupies 24 bits. Regarding the set index: because the D region uses direct mapping, the set index indicates the position of a cache line; because the V region uses set-associative mapping, the set index indicates the position of a group.
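For illustration only (the sample address is arbitrary), the following C sketch splits a 39-bit main-memory address into the three fields just described, with the stated widths of 9, 24, and 6 bits:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* Keep only the low 39 bits of an example address. */
        uint64_t addr = 0x123456789BULL & ((1ULL << 39) - 1);

        uint64_t offset    = addr & 0x3F;             /* low 6 bits        */
        uint64_t set_index = (addr >> 6) & 0xFFFFFF;  /* middle 24 bits    */
        uint64_t tag       = (addr >> 30) & 0x1FF;    /* high 9 bits       */

        printf("tag=0x%llx set_index=0x%llx offset=0x%llx\n",
               (unsigned long long)tag,
               (unsigned long long)set_index,
               (unsigned long long)offset);
        return 0;
    }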
Under this structural organization of the DRAM, the embodiments of this application divide each set into two parts, one forming the D region and the other forming the V region. The tags corresponding to the cache blocks in the D region are stored in the D region, while the tag cache located in the SRAM is responsible for caching the tags corresponding to some of the cache blocks in the V region. It should be noted that, because the V region has a large capacity, the number of tags is also large, while the SRAM capacity is small; therefore, the tag cache usually stores only the tags corresponding to some of the cache blocks in the V region, and the tags corresponding to the remaining cache blocks in the V region are stored in the V region itself. It should also be noted that, in the embodiments of this application, data enters the V region after being replaced out of the D region, and the tag cache usually stores the tags corresponding to the data that the processor is likely to access frequently.
In the embodiments of this application, the above structure may be called a TDV (Tag Cache-Direct-Victim) cache. Through the TDV structure, the DRAM is divided into two partitions, the D region and the V region, where the D region uses direct mapping and the V region uses set-associative mapping. This not only combines the respective advantages of direct mapping (low access latency on a hit) and set-associative mapping (high hit rate), but also uses the tag cache to store the tags corresponding to some of the cache blocks in the V region, so the tag cache can still play its role in a scenario where a large-capacity DRAM is used as an off-chip cache.
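As an illustrative data-structure sketch (the 8/24 split of a set between the D and V regions is a hypothetical choice, not specified by the disclosure), the TDV organization could be modelled as follows:

    #include <stdint.h>

    typedef struct { uint8_t valid; uint64_t tag; uint8_t data[64]; } line_t;

    /* One 32-block DRAM set (FIG. 6), split into a D part and a V part. */
    #define SET_BLOCKS 32
    #define D_BLOCKS   8                         /* hypothetical split    */
    #define V_BLOCKS   (SET_BLOCKS - D_BLOCKS)

    typedef struct {
        line_t d_region[D_BLOCKS];  /* direct mapped; tags kept in DRAM    */
        line_t v_region[V_BLOCKS];  /* set associative; holds D evictions  */
    } tdv_set_t;

    /* One entry of the SRAM tag cache: for part of the V region it records
     * which tag sits in which way of which set. */
    typedef struct {
        uint32_t set_index;         /* 24-bit group address                */
        uint64_t tag;               /* 9-bit tag of the cached V block     */
        uint8_t  way;               /* V-region way holding the block      */
        uint8_t  valid;
    } tag_cache_entry_t;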
The data access method in the embodiments of this application is described in detail below with reference to this structure; see the following embodiments. It should be noted that, in the following embodiments, the cache refers to the tag cache, the first memory refers to the DRAM, the second memory refers to the NVM, the first partition refers to the D region, and the second partition refers to the V region. As described above, the cache is used to cache the tags corresponding to at least some of the cache blocks in the second partition of the first memory; the second partition is used to cache the data replaced out of the first partition; a memory block in the second memory is mapped to one cache block of the first partition, that is, the first partition uses direct mapping; and a memory block in the second memory is mapped to one group of the second partition, that is, the second partition uses set-associative mapping.
FIG. 7 is a flowchart of a data access method according to an embodiment of this application. Referring to FIG. 7, the procedure of the method provided in this embodiment includes:
701. The processor receives an access request carrying an access address, and obtains the physical address of the data to be accessed according to the access address.
Because software programs use logical addresses, after receiving an access address in the form of a logical address, the processor first needs to translate the access address into a physical address; only then can the actual memory access be performed based on the physical address. In other words, the processor can use address translation techniques to translate the access address carried in the received access request into the physical address of the data to be accessed.
702. The processor determines, according to the tag in the obtained physical address, whether the received access request hits the cache; if the cache is hit, step 703 below is performed; if the cache is missed, step 704 below is performed.
In this embodiment of this application, whether the received access request hits the cache may be determined in the following manner:
Because the cache stores the tags corresponding to some of the cache blocks of the second partition, and the second partition uses set-associative mapping, the obtained physical address can be split into three parts: the tag corresponding to the high-order bits, the address of one group of the second partition corresponding to the middle bits, and the intra-block offset corresponding to the low-order bits. Then, according to the obtained group address, the tags corresponding to all the cache blocks in this group (referred to as N tags in this embodiment) are obtained from the tags stored in the cache. Next, the tag in the obtained physical address is compared with each of the N tags; when the tag in the obtained physical address matches one of the N tags, it is determined that the received access request hits the cache.
In a possible implementation, after it is determined that the tag in the obtained physical address matches one of the N tags, the valid bit may be further checked; when the valid bit indicates that the data is usable, the processor determines that the received access request hits the cache. This is not specifically limited in this embodiment of this application.
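Step 702 can be sketched in C for illustration; tag_cache_lookup() is a hypothetical helper standing in for the SRAM tag-cache read, not an API from the disclosure:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helper: fills tags_out with the tags the tag cache
     * currently holds for the given group and returns how many (N). */
    extern int tag_cache_lookup(uint32_t group_addr, uint64_t tags_out[],
                                int max);

    /* Compare the tag decoded from the physical address with each of the
     * N tags cached for the group of the second partition. */
    bool hits_tag_cache(uint32_t group_addr, uint64_t addr_tag) {
        uint64_t tags[32];                /* at most one full group (FIG. 6) */
        int n = tag_cache_lookup(group_addr, tags, 32);
        for (int i = 0; i < n; i++) {
            if (tags[i] == addr_tag)
                return true;              /* hit: proceed to step 703 */
        }
        return false;                     /* miss: proceed to step 704 */
    }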
703. When the access request received by the processor hits the cache, the processor sends to the memory controller a memory access request carrying the obtained physical address; the memory controller obtains, from the second partition according to the physical address carried in the memory access request, the data to be accessed by the access request, and returns the obtained data to the processor.
Because the access request hits the cache, the data to be accessed is stored in the second partition, so the memory controller obtains the data to be accessed from the second partition.
It should be noted that, because the cache mentioned above is located on the processor, the overhead of accessing the tags during the determination of whether the access request hits the cache is negligible. Therefore, in this case, the only system overhead is one access by the memory controller to the second partition of the first memory to obtain the data to be accessed. In other words, in this case the data read is completed with a single access to the first memory, which reduces the access latency and the system overhead.
It should be noted that, when the data access procedure shown in steps 701 to 703 is performed in this embodiment, the access request appearing in the above steps may be called the first access request, the access address the first access address, the physical address the first physical address, the memory access request the first memory access request, and the data to be accessed the first data to be accessed.
704. When the access request received by the processor misses the cache, the processor sends to the memory controller a memory access request carrying the obtained physical address; the memory controller determines, according to the tag in the physical address, whether the memory access request hits the first partition of the first memory; if the first partition is hit, step 705 below is performed; if the first partition is missed, step 706 below is performed.
In this embodiment of this application, because the determination concerns whether the first partition is hit, and the first partition uses direct mapping, the obtained physical address can be split into three parts: the tag corresponding to the high-order bits, the block address of a cache block corresponding to the middle bits, and the intra-block offset corresponding to the low-order bits. Then, according to the obtained block address, the memory controller obtains, from the tags stored in the first partition, the tag corresponding to the cache block indicated by that block address. If the tag in the obtained physical address matches the tag corresponding to the cache block indicated by that block address, the memory controller determines that the received memory access request hits the first partition.
In a possible implementation, after it is determined that the tag in the obtained physical address matches the tag corresponding to the cache block indicated by this block address, the valid bit may be further checked; when the valid bit indicates that the data is usable, the memory controller determines that the received memory access request hits the first partition. This is not specifically limited in this embodiment of this application.
In a possible implementation, when accessing the first partition to obtain the tag, the memory controller may also read out the data stored in the corresponding cache block at the same time. In this embodiment, by enlarging the burst or by using ECC encoding, the tag and the data can be obtained in one access to the first partition.
The memory controller reads the data together with the tag so that, if the received memory access request is subsequently determined to hit the first partition, no further access to the first partition is needed to obtain the data to be accessed. That is, in the case of a hit, the memory controller can obtain the data with one access to the first partition, so the access latency of direct mapping is small, reducing the system overhead.
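For illustration, the combined tag-plus-data access of step 704 can be sketched as follows; read_d_line() is a hypothetical helper modelling the single enlarged-burst (or ECC-encoded) DRAM access:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* One D-region line: tag and data are laid out so that one access
     * returns both. */
    typedef struct { uint8_t valid; uint64_t tag; uint8_t data[64]; } d_line_t;

    extern d_line_t read_d_line(uint64_t block_addr);  /* one DRAM access */

    /* On a D-region hit, the data fetched alongside the tag is returned at
     * once; no second access to the first partition is needed. */
    bool d_region_access(uint64_t block_addr, uint64_t addr_tag,
                         uint8_t out[64]) {
        d_line_t line = read_d_line(block_addr);  /* tag + data together */
        if (line.valid && line.tag == addr_tag) {
            memcpy(out, line.data, sizeof line.data);
            return true;                           /* hit: go to step 705 */
        }
        return false;                              /* miss: go to step 706 */
    }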
705. When the memory access request received by the memory controller hits the first partition, the memory controller returns, to the processor, the data to be accessed by the access request that it obtained from the first partition.
Because the memory access request hits the first partition, the data to be accessed is stored in the first partition, so the memory controller obtains the data to be accessed from the first partition.
It should be noted that, when the data access procedure shown in steps 701, 702, 704, and 705 is performed in this embodiment, the access request appearing in the above steps may be called the second access request, the access address the second access address, the physical address the second physical address, the memory access request the second memory access request, and the data to be accessed the second data to be accessed.
706. When the memory access request received by the memory controller misses the first partition, the memory controller determines, according to the tag in the physical address, whether the memory access request hits the second partition; if the second partition is hit, step 707 below is performed; if the second partition is missed, step 708 below is performed.
In this embodiment of this application, if the cache is missed and the first partition is also missed, the memory controller further determines, according to the tag in the obtained physical address, whether the memory access request hits the second partition. The determination of whether the second partition is hit is similar to step 702 above and is not repeated here.
In a possible implementation, when both the cache and the first partition are missed, in addition to accessing the second partition to perform step 706, the memory controller may also access the second memory in parallel, attempting to obtain the data from the second memory.
That is, while accessing the second partition, the memory controller obtains the data to be accessed from the second memory according to the physical address. When the received memory access request hits the second partition, the memory controller can read the data directly from the second partition; because the data read speed of the second memory lags far behind that of the second partition, there is still time to stop the second memory from performing the data read. When the received memory access request misses the second partition, the procedure of obtaining the data from the second memory has already been started, so the effect is far better than starting to obtain the data from the second memory only after the memory access request is determined to miss the second partition.
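For illustration, this parallel variant of step 706 can be sketched as follows; all four helpers are hypothetical stand-ins for the controller's internal operations, not interfaces defined by the disclosure:

    #include <stddef.h>
    #include <stdint.h>

    extern void     nvm_read_start(uint64_t phys_addr);  /* begin slow NVM read */
    extern void     nvm_read_cancel(uint64_t phys_addr); /* abort it on a V hit */
    extern uint8_t *nvm_read_wait(uint64_t phys_addr);   /* wait for the result */
    extern uint8_t *v_region_lookup(uint32_t group_addr, uint64_t addr_tag);

    /* The NVM read is issued together with the V-region lookup and is
     * cancelled if the V region hits. */
    uint8_t *fetch_after_d_miss(uint64_t phys, uint32_t group, uint64_t tag) {
        nvm_read_start(phys);                    /* issued speculatively    */
        uint8_t *data = v_region_lookup(group, tag);
        if (data != NULL) {                      /* step 707: V-region hit  */
            nvm_read_cancel(phys);               /* NVM result not needed   */
            return data;
        }
        return nvm_read_wait(phys);              /* step 708: serve from NVM */
    }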
707. When the memory access request received by the memory controller hits the second partition, the memory controller obtains, from the second partition, the data to be accessed by the access request, and returns the obtained data to the processor.
Because the memory access request hits the second partition, the data to be accessed is stored in the second partition, so the memory controller obtains the data to be accessed from the second partition.
It should be noted that, when the data access procedure shown in steps 701, 702, 704, 706, and 707 is performed in this embodiment, the access request appearing in the above steps may be called the third access request, the access address the third access address, the physical address the third physical address, the memory access request the third memory access request, and the data to be accessed the third data to be accessed.
708. When the memory access request received by the memory controller misses the second partition, the memory controller obtains, from the second memory according to the physical address, the data to be accessed by the access request, and returns the obtained data to the processor.
Because the access request received by the processor misses the cache, and the memory access request received by the memory controller misses both the first partition and the second partition, the memory controller obtains the data to be accessed directly from the second memory.
709. The memory controller stores the data obtained from the second memory in the first partition, and stores the data replaced out of the first partition in the second partition of the first memory.
Because the first partition uses direct mapping, the data to be accessed must be mapped to specific cache blocks of the first partition. In this embodiment, to copy the data to be accessed into these specific cache blocks, the data originally cached in them needs to be replaced out, and in this embodiment the replaced data migrates to the second partition. After the data replaced out of the first partition is stored in the second partition, when the processor later requests access to the replaced data, the data can be obtained directly from the second partition; this avoids the slow data reads that would occur on subsequent accesses to the replaced data if it were stored in the second memory.
In a possible implementation, because the data stored in the first partition has changed, in addition to the data replacement, the tags also need to be updated synchronously: the tag corresponding to the replaced data is updated to the tag corresponding to the data to be accessed. Similarly, after the data replaced out of the first partition is stored in the second partition, the tags stored in the V region are also updated synchronously. In summary, when data in the first partition or the second partition is updated, the tags are updated synchronously. In addition, the tags stored in the cache may also be updated; for example, if the processor frequently accesses certain data stored in the second partition, the tag corresponding to that data can be stored in the cache. Alternatively, after the data replaced out of the first partition is stored in the second partition, the tag corresponding to the replaced data may be stored directly in the cache. This is not specifically limited in this embodiment of this application.
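For illustration, step 709 and the accompanying tag updates can be sketched as follows; the two helpers for the V region and the tag cache are hypothetical:

    #include <stdint.h>
    #include <string.h>

    typedef struct { uint8_t valid; uint64_t tag; uint8_t data[64]; } d_line_t;

    extern void v_region_insert(uint32_t group_addr, uint64_t tag,
                                const uint8_t data[64]);
    extern void tag_cache_insert(uint32_t group_addr, uint64_t tag);

    /* Install the block fetched from the second memory into its fixed
     * D-region slot, migrate the displaced block to the V region, and keep
     * the tags (and optionally the tag cache) consistent. */
    void install_from_nvm(d_line_t *slot, uint32_t group_addr,
                          uint64_t new_tag, const uint8_t new_data[64]) {
        if (slot->valid) {
            v_region_insert(group_addr, slot->tag, slot->data); /* D -> V  */
            tag_cache_insert(group_addr, slot->tag); /* optional, see above */
        }
        slot->tag = new_tag;               /* update the D-region tag       */
        memcpy(slot->data, new_data, 64);  /* install the new data          */
        slot->valid = 1;
    }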
It should be noted that, when the data access procedure shown in steps 701, 702, 704, 706, 708, and 709 is performed in this embodiment, the access request appearing in the above steps may be called the fourth access request, the access address the fourth access address, the physical address the fourth physical address, the memory access request the fourth memory access request, and the data to be accessed the fourth data to be accessed.
In the method provided in this embodiment of this application, after receiving an access request and before accessing the first partition and the second partition of the DRAM, the processor can first access the cache. If the access request hits the cache, the processor controls the memory controller to obtain the data to be accessed directly from the set-associative second partition of the DRAM. Because the overhead of reading tags from the cache during the hit determination is negligible, only one access to the first memory is needed to obtain the data. Therefore, a high hit rate is ensured while the system overhead is effectively reduced, improving the performance of the computer system.
In addition, if the access request misses the cache, the processor may send a memory access request to the memory controller so that the memory controller then determines whether the memory access request hits the first partition. When accessing the first partition to obtain the tag, the memory controller may also fetch the corresponding data together, so that on a hit in the first partition the data is obtained with one DRAM access, which effectively reduces the system overhead on a first-partition hit and improves the performance of the computer system.
FIG. 8 is a schematic structural diagram of a computer system according to an embodiment of this application. Referring to FIG. 8, the computer system includes a processor 801, a cache 802, a first memory 803, a second memory 804, and a memory controller 805, where the first memory 803 is configured to cache data in the second memory 804; the cache 802 is configured to cache the tags corresponding to at least some of the cache blocks in a second partition of the first memory 803; the first memory 803 includes a first partition and the second partition; the second partition is configured to cache the cache blocks replaced out of the first partition; a memory block in the second memory 804 is mapped to one cache block of the first partition; a memory block in the second memory 804 is mapped to one group of the second partition; and one group of the second partition includes multiple cache blocks;
the processor 801 is configured to obtain a first physical address according to a first access address in a first access request;
the processor 801 is further configured to determine, according to a first tag in the first physical address, whether the first access request hits the cache 802;
the processor 801 is further configured to: when the first access request hits the cache 802, send a first memory access request to the memory controller 805, where the first memory access request carries the first physical address;
the memory controller 805 is configured to obtain, from the second partition according to the first physical address, first data to be accessed by the first access request.
In the computer system provided in this embodiment of this application, the first memory is divided into two partitions: a direct-mapped first partition and a set-associative second partition, and the cache is used to store the tags corresponding to some of the cache blocks in the second partition. This design combines the advantages of direct mapping and set-associative mapping. After receiving an access request, the processor can access the cache before accessing the first partition and the second partition of the first memory. If the access request hits the cache, the processor can control the memory controller to obtain the data to be accessed directly from the second partition of the first memory. Because the overhead of reading tags from the cache during the hit determination is negligible, only one access to the first memory is needed to obtain the data. Therefore, a high hit rate is ensured while the system overhead is effectively reduced, improving the performance of the computer system.
In another possible implementation, the processor 801 is further configured to obtain a second physical address according to a second access address in a second access request;
the processor 801 is further configured to: when determining, according to a second tag in the second physical address, that the second access request misses the cache 802, send a second memory access request to the memory controller 805, where the second memory access request carries the second physical address;
the memory controller 805 is further configured to determine, according to the second tag, whether the second memory access request hits the first partition of the first memory 803;
the memory controller 805 is further configured to: when the second memory access request hits the first partition of the first memory 803, obtain, from the first partition, second data to be accessed by the second access request.
In another possible implementation, the processor 801 is further configured to obtain a third physical address according to a third access address in a third access request;
the processor 801 is further configured to: when determining, according to a third tag in the third physical address, that the third access request misses the cache 802, send a third memory access request to the memory controller 805, where the third memory access request carries the third physical address;
the memory controller 805 is further configured to determine, according to the third tag, whether the third memory access request hits the first partition of the first memory 803;
the memory controller 805 is further configured to: when the third memory access request misses the first partition of the first memory 803, determine, according to the third tag, whether the third memory access request hits the second partition of the first memory 803;
the memory controller 805 is further configured to: when the third memory access request hits the second partition of the first memory 803, obtain, from the second partition, third data to be accessed by the third access request.
In another possible implementation, the processor 801 is further configured to obtain a fourth physical address according to a fourth access address in a fourth access request;
the processor 801 is further configured to: when determining, according to a fourth tag in the fourth physical address, that the fourth access request misses the cache 802, send a fourth memory access request to the memory controller 805, where the fourth memory access request carries the fourth physical address;
the memory controller 805 is further configured to: when determining, according to the fourth tag, that the fourth memory access request misses both the first partition and the second partition of the first memory 803, obtain, from the second memory 804 according to the fourth physical address, fourth data to be accessed by the fourth access request;
the memory controller 805 is further configured to store the fourth data in the first partition of the first memory 803, and store the data replaced out of the first partition in the second partition of the first memory 803.
All the above optional technical solutions may be combined in any manner to form optional embodiments of this application.
A person of ordinary skill in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
The foregoing descriptions are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (8)

  1. A data access method, applied to a computer system, wherein the computer system comprises a processor, a cache, a first memory, a second memory, and a memory controller, the first memory is configured to cache data in the second memory, and the method comprises:
    obtaining, by the processor, a first physical address according to a first access address in a first access request;
    determining, by the processor according to a first tag in the first physical address, whether the first access request hits the cache, wherein the cache is configured to cache tags corresponding to at least some of the cache blocks in a second partition of the first memory, the first memory comprises a first partition and the second partition, the second partition is configured to cache the cache blocks replaced out of the first partition, a memory block in the second memory is mapped to one cache block of the first partition, a memory block in the second memory is mapped to one group of the second partition, and one group of the second partition comprises multiple cache blocks;
    when the first access request hits the cache, sending, by the processor, a first memory access request to the memory controller, wherein the first memory access request carries the first physical address; and
    obtaining, by the memory controller from the second partition according to the first physical address, first data to be accessed by the first access request.
  2. The method according to claim 1, wherein the method further comprises:
    obtaining, by the processor, a second physical address according to a second access address in a second access request;
    when the processor determines, according to a second tag in the second physical address, that the second access request misses the cache, sending, by the processor, a second memory access request to the memory controller, wherein the second memory access request carries the second physical address;
    determining, by the memory controller according to the second tag, whether the second memory access request hits the first partition of the first memory; and
    when the second memory access request hits the first partition of the first memory, obtaining, by the memory controller from the first partition, second data to be accessed by the second access request.
  3. The method according to claim 1, wherein the method further comprises:
    obtaining, by the processor, a third physical address according to a third access address in a third access request;
    when the processor determines, according to a third tag in the third physical address, that the third access request misses the cache, sending, by the processor, a third memory access request to the memory controller, wherein the third memory access request carries the third physical address;
    determining, by the memory controller according to the third tag, whether the third memory access request hits the first partition of the first memory;
    when the third memory access request misses the first partition of the first memory, determining, by the memory controller according to the third tag, whether the third memory access request hits the second partition of the first memory; and
    when the third memory access request hits the second partition of the first memory, obtaining, by the memory controller from the second partition, third data to be accessed by the third access request.
  4. The method according to claim 1, wherein the method further comprises:
    obtaining, by the processor, a fourth physical address according to a fourth access address in a fourth access request;
    when the processor determines, according to a fourth tag in the fourth physical address, that the fourth access request misses the cache, sending, by the processor, a fourth memory access request to the memory controller, wherein the fourth memory access request carries the fourth physical address;
    when the memory controller determines, according to the fourth tag, that the fourth memory access request misses both the first partition and the second partition of the first memory, obtaining, by the memory controller from the second memory according to the fourth physical address, fourth data to be accessed by the fourth access request; and
    storing, by the memory controller, the fourth data in the first partition of the first memory, and storing the data replaced out of the first partition in the second partition of the first memory.
  5. A computer system, wherein the computer system comprises a processor, a cache, a first memory, a second memory, and a memory controller, wherein the first memory is configured to cache data in the second memory; the cache is configured to cache tags corresponding to at least some of the cache blocks in a second partition of the first memory; the first memory comprises a first partition and the second partition; the second partition is configured to cache the cache blocks replaced out of the first partition; a memory block in the second memory is mapped to one cache block of the first partition; a memory block in the second memory is mapped to one group of the second partition; and one group of the second partition comprises multiple cache blocks;
    the processor is configured to obtain a first physical address according to a first access address in a first access request;
    the processor is further configured to determine, according to a first tag in the first physical address, whether the first access request hits the cache;
    the processor is further configured to: when the first access request hits the cache, send a first memory access request to the memory controller, wherein the first memory access request carries the first physical address; and
    the memory controller is configured to obtain, from the second partition according to the first physical address, first data to be accessed by the first access request.
  6. The computer system according to claim 5, wherein
    the processor is further configured to obtain a second physical address according to a second access address in a second access request;
    the processor is further configured to: when determining, according to a second tag in the second physical address, that the second access request misses the cache, send a second memory access request to the memory controller, wherein the second memory access request carries the second physical address;
    the memory controller is further configured to determine, according to the second tag, whether the second memory access request hits the first partition of the first memory; and
    the memory controller is further configured to: when the second memory access request hits the first partition of the first memory, obtain, from the first partition, second data to be accessed by the second access request.
  7. The computer system according to claim 5, wherein
    the processor is further configured to obtain a third physical address according to a third access address in a third access request;
    the processor is further configured to: when determining, according to a third tag in the third physical address, that the third access request misses the cache, send a third memory access request to the memory controller, wherein the third memory access request carries the third physical address;
    the memory controller is further configured to determine, according to the third tag, whether the third memory access request hits the first partition of the first memory;
    the memory controller is further configured to: when the third memory access request misses the first partition of the first memory, determine, according to the third tag, whether the third memory access request hits the second partition of the first memory; and
    the memory controller is further configured to: when the third memory access request hits the second partition of the first memory, obtain, from the second partition, third data to be accessed by the third access request.
  8. The computer system according to claim 5, wherein
    the processor is further configured to obtain a fourth physical address according to a fourth access address in a fourth access request;
    the processor is further configured to: when determining, according to a fourth tag in the fourth physical address, that the fourth access request misses the cache, send a fourth memory access request to the memory controller, wherein the fourth memory access request carries the fourth physical address;
    the memory controller is further configured to: when determining, according to the fourth tag, that the fourth memory access request misses both the first partition and the second partition of the first memory, obtain, from the second memory according to the fourth physical address, fourth data to be accessed by the fourth access request; and
    the memory controller is further configured to store the fourth data in the first partition of the first memory, and store the data replaced out of the first partition in the second partition of the first memory.
PCT/CN2018/107553 2017-09-29 2018-09-26 Data access method and computer system WO2019062747A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710911982.3A CN109582214B (zh) 2017-09-29 2017-09-29 Data access method and computer system
CN201710911982.3 2017-09-29

Publications (1)

Publication Number Publication Date
WO2019062747A1 true WO2019062747A1 (zh) 2019-04-04

Family

ID=65900574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107553 WO2019062747A1 (zh) Data access method and computer system

Country Status (2)

Country Link
CN (1) CN109582214B (zh)
WO (1) WO2019062747A1 (zh)


Also Published As

Publication number Publication date
CN109582214A (zh) 2019-04-05
CN109582214B (zh) 2020-04-28

