WO2022012307A1 - Data access method and processor system - Google Patents

Data access method and processor system

Info

Publication number
WO2022012307A1
Authority
WO
WIPO (PCT)
Prior art keywords: level cache, cache, data, memory, level
Application number
PCT/CN2021/102603
Other languages
English (en)
French (fr)
Inventor
周轶刚
栗炜
尹文
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022012307A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0608 Saving storage space on storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656 Data buffering arrangements
    • G06F 3/0658 Controller construction arrangements
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems

Definitions

  • the present application relates to the field of computing, and in particular, to a data access method and a processor system.
  • The structure and capacity of the cache are important performance indicators of the central processing unit (CPU) and have a great impact on the speed of the CPU.
  • The operating frequency of the cache in the CPU is extremely high; it can generally operate at the same frequency as the processor, and its efficiency is much higher than that of the system memory and the hard disk.
  • The CPU often needs to read the same data block repeatedly, and increasing the cache capacity can greatly improve the hit rate of CPU data reads, thereby improving system performance. However, the cache capacity is generally small.
  • The CPU cache can be divided into a first-level cache L1 and a second-level cache L2, and some high-end CPUs also have a third-level cache L3; the data stored in each level of cache is a part of the data stored in the next-level cache.
  • From L1 to L3, the technical difficulty and manufacturing cost of these three caches decrease successively, while their capacity increases successively.
  • the capacity of the above-mentioned third-level cache is limited, therefore, a large-capacity fourth-level cache L4 is also introduced in the prior art.
  • When the CPU wants to read the data to be accessed, it searches for the data sequentially in L1-L3. When none of the L1-L3 caches hit the data to be accessed, the controller of the L3 cache determines, based on the L4 tags stored in the L3 cache, whether the data is in L4; if not, it reads the data to be accessed from the memory.
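The prior-art lookup flow described above can be sketched in Python. The dictionaries standing in for each cache level and the function name are illustrative, not part of the patent.

```python
def prior_art_read(addr, l1, l2, l3, l4_tags_in_l3, l4, memory):
    """Prior-art flow: search L1-L3 in order; only after all three miss
    does the L3 controller consult the L4 tags it stores locally to
    decide between reading from L4 and reading from memory."""
    for cache in (l1, l2, l3):
        if addr in cache:            # hit in L1, L2, or L3
            return cache[addr]
    if addr in l4_tags_in_l3:        # L4 tag lookup happens inside L3
        return l4[addr]              # data is known to be in L4
    return memory[addr]              # otherwise fall back to memory
```

The `l4_tags_in_l3` set stored inside L3 is exactly the space cost that the application later removes.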
  • the present application provides a data access method and a processor system, which avoid storing lower-level cache tags in the upper-level cache, thereby avoiding the space waste of the upper-level cache.
  • The present application provides a data access method. The method is applied to a processor system, and the processor system includes a processor core, an upper-level cache, and a lower-level cache, wherein the upper-level cache and the lower-level cache are located in different dies. The method includes: when the upper-level cache does not store the data to be accessed, the upper-level cache sends a data read request to the lower-level cache and to the memory respectively, and receives the data to be accessed from whichever of the lower-level cache and the memory returns it first.
  • The upper-level cache and the lower-level cache each include hardware control logic, such as a cache controller, and a cache space.
  • the cache controller is used to communicate with the CPU core or the cache controller of other caches, and perform operations on the cache space of this level of cache, such as query, write and aging, etc.
  • the method of the first aspect may be executed by a cache controller of the upper-level cache.
  • In a possible implementation, receiving the data to be accessed from whichever of the lower-level cache and the memory returns it first includes: when the data to be accessed is received from the lower-level cache first, discarding the data read response that is returned by the memory and received later; or, when the data to be accessed is not stored in the lower-level cache, obtaining the data to be accessed returned from the memory.
  • Generally, the lower-level cache returns the data to be accessed to the upper-level cache faster than the memory does.
  • In this case, the upper-level cache returns the data to be accessed to the processor core and discards the copy of the data to be accessed that is returned later by the memory.
  • When the data to be accessed is not cached in the lower-level cache, the lower-level cache returns a query response (snoop response) to the upper-level cache, and the query response is used to indicate that no valid copy of the data to be accessed is cached in the lower-level cache.
  • In this case, the upper-level cache receives the data to be accessed from the memory.
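The first-response-wins behavior of the two cases above can be modeled with explicit latencies; the latency values and names below are assumptions for illustration, not timing from the patent.

```python
def read_first_response(addr, l4_data, l4_latency, mem_data, mem_latency):
    """The upper-level cache issues the read to the lower-level cache (L4)
    and to memory at the same time, keeps whichever answer arrives first,
    and discards the later one. Hardware timing is modeled here as simple
    (latency, value) pairs."""
    responses = [(mem_latency, mem_data[addr])]   # memory always answers
    if addr in l4_data:                           # L4 answers only on a hit
        responses.append((l4_latency, l4_data[addr]))
    responses.sort(key=lambda r: r[0])            # earliest response wins
    return responses[0][1]                        # the later copy is dropped
```

On an L4 miss the only response comes from memory, which matches the snoop-response case above.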
  • the die where the upper-level cache is located and the die where the lower-level cache is located are packaged together to form a processor chip, and the upper-level cache communicates with the lower-level cache through an inter-die bus.
  • the upper level cache and processor core can be on the same CPU die.
  • the lower-level cache is located outside the processor chip where the upper-level cache is located, and the upper-level cache communicates with the lower-level cache through an inter-chip bus.
  • In a possible implementation, the upper-level cache communicates with the memory through a first port of a memory controller on the processor chip, and the lower-level cache communicates with the memory through a second port of the memory controller on the processor chip.
  • the memory is a dual-ported random access memory (Dual-ported Random Access Memory, DPRAM)
  • the processor system further includes an IO die, and the IO die may be used to implement the connection between the upper-level cache and the lower-level cache and the memory.
  • the IO die connects the upper-level cache and the memory through the memory controller on the IO die, and connects the lower-level cache and the memory.
  • The lower-level cache communicates with the memory through a DDR bus (PHY) between the memory controller and the memory.
  • the lower-level cache updates the aged cache data into the memory through the DDR bus; or the lower-level cache obtains the data to be cached from the memory through the DDR bus.
  • the lower-level cache updates the aged cache data into the memory through the memory controller.
  • In a possible implementation, the upper-level cache sends the aged first cache line to the lower-level cache; if the first cache line hits in the lower-level cache, the lower-level cache updates its local copy with the received first cache line; if the first cache line misses in the lower-level cache, the first cache line is written into the lower-level cache.
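The lower-level cache's handling of a pushed-down line can be sketched as follows. FIFO replacement stands in for a real policy, and the function name and capacity model are assumptions.

```python
def push_down_evicted_line(lower_cache, capacity, addr, data):
    """Handling of a cache line aged out of the upper-level cache, as in
    the bullet above: on a hit the lower-level copy is updated in place;
    on a miss the line is written, first aging out a victim if the cache
    is full. FIFO replacement stands in for LRU here (an assumption)."""
    if addr in lower_cache:
        lower_cache[addr] = data           # hit: update the local copy
        return None
    victim = None
    if len(lower_cache) >= capacity:       # no free space: pick a victim
        victim_addr = next(iter(lower_cache))
        victim = (victim_addr, lower_cache.pop(victim_addr))
    lower_cache[addr] = data               # write the incoming line
    return victim                          # victim continues toward memory
```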
  • the upper-level cache is the third-level cache L3, and the lower-level cache is the fourth-level cache L4; or,
  • the upper-level cache is the second-level cache L2, and the lower-level cache is the third-level cache L3.
  • an embodiment of the present application further provides a processor system, including a processor core, an upper-level cache, and a lower-level cache, wherein the upper-level cache and the lower-level cache are located on different dies,
  • The upper-level cache is used to send a data read request to the lower-level cache and the memory respectively when the data to be accessed is not stored in the upper-level cache, so as to request the data to be accessed, and to receive the data to be accessed from whichever of the lower-level cache and the memory returns it first.
  • the actions of the second aspect may be performed by the cache controller of the upper-level cache.
  • An embodiment of the present application further provides a processor system, including a processor core, an upper-level cache, and a lower-level cache, wherein the upper-level cache and the lower-level cache are located on different dies, and the tag of the lower-level cache is not stored in the upper-level cache.
  • an embodiment of the present application further provides a processor system, including a processor core, a first-level cache, a second-level cache, a third-level cache, and a fourth-level cache, wherein the processor core, The first-level cache, the second-level cache and the third-level cache are located on the processor die, the fourth-level cache and the third-level cache are located on a different die,
  • The third-level cache is used to send a data read request to the fourth-level cache when none of the first-level cache, the second-level cache, and the third-level cache stores the data to be accessed requested by the processor core;
  • the fourth-level cache acquires the to-be-accessed data from the memory when the fourth-level cache does not store the to-be-accessed data.
  • the fourth-level cache and the third-level cache are in different dies in the same processor chip, and communicate with the third-level cache through an inter-die bus; or,
  • the fourth-level cache is located outside the processor chip where the third-level cache is located, and communicates with the third-level cache through an inter-chip bus.
  • an embodiment of the present application further provides a server, including a memory and the processor system according to any one of the foregoing second to fourth aspects.
  • An embodiment of the present application further provides a processor chip, including a processor core, a first cache, and a second cache, wherein the first cache and the second cache are located in different dies of the processor chip,
  • the first cache is configured to send a data read request to the second cache, so as to obtain the data to be accessed requested by the processor core;
  • the first cache acquires the data to be accessed from the memory when the data to be accessed is not stored in the second cache.
  • an embodiment of the present application further provides a processor chip, including a processor core and a first cache,
  • the first cache is used to send a data read request to the second cache outside the processor chip, so as to obtain the data to be accessed requested by the processor core;
  • the first cache acquires the data to be accessed from the memory when the data to be accessed is not stored in the second cache.
  • An embodiment of the present application further provides another data access method. The method is applied to a processor system, and the processor system includes a processor core, an upper-level cache, and a lower-level cache, wherein the tag of the lower-level cache is not stored in the upper-level cache. The method includes:
  • receiving the data to be accessed from whichever of the lower-level cache and the memory returns it first.
  • an embodiment of the present application further provides another processor system, including: a first die and a second die, the first die includes an upper-level cache and a memory controller, the second die includes a lower-level cache,
  • the lower-level cache is used to determine a cache line to be aged, and send the cache line to the first die;
  • the first die is used to send the cache line to the memory through a port of the memory controller.
  • An embodiment of the present application further provides another data access method. The method is applied to a processor system, and the processor system includes an upper-level cache and a lower-level cache, wherein the upper-level cache and the lower-level cache are located in different dies. The method includes:
  • the lower-level cache determines the cache line to be aged;
  • the lower-level cache sends the cache line to the memory through a memory controller port on the die where the upper-level cache is located.
  • an embodiment of the present application provides a cache controller, where the cache controller is configured to execute the methods of the first aspect, the eighth aspect, and the tenth aspect.
  • FIG. 1 is a schematic diagram of an example of a cache structure provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a cache line structure provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a processor system provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another processor system provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a method for a processor core to read data to be accessed according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of another processor system provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another processor system provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of another processor system provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a method for data access provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of another data access method provided by an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a method for aging (evict) cached data provided by an embodiment of the application.
  • Processors often contain multiple levels of cache. A cache may store instructions (instruction cache) or data (data cache), or the program and data may share a set of caches. In the embodiments of the present application, the cached content is collectively referred to as data, and the data to be accessed by the processor core is referred to as the data to be accessed.
  • FIG. 1 it is a schematic diagram of an example of a cache structure provided by an embodiment of the present application.
  • The processor core 0 and the processor core 1 each exclusively use their own L1 cache and L2 cache, and share the L3 cache.
  • The cache space is divided into multiple cache lines (CLs), and the size of each cache line can be 32 bytes, 64 bytes, etc.
  • the cache line is the smallest unit of data exchange between the cache and the memory.
  • Each cache line usually includes three parts: a flag bit (valid), a tag, and a block. The flag bit indicates whether the cache line is valid, the tag stores the address of the memory block corresponding to the cache line, and the block stores the data corresponding to the address of the memory block.
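The three-part cache line structure named above can be written down directly; the 64-byte block size is an assumed example, not a value fixed by the patent.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    """The three parts of a cache line described above: a valid flag, a
    tag holding the corresponding memory-block address bits, and the data
    block itself. A 64-byte block size is assumed for illustration."""
    valid: bool = False
    tag: int = 0
    block: bytes = bytes(64)
```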
  • When the processor core needs to read some data, it first searches in the cache: it looks up the cache line according to the address of the data to be accessed and checks the valid flag of that cache line. If the cache line found according to the address is marked valid, the data is read from the cache; this is called a cache hit. On a cache hit, the processor core reads the data from the cache.
  • the cache determines whether there is to-be-accessed data read by the processor core in the local cache by using the index and the address in the tag.
  • The address in the tag may specifically be the high-order bits, or part of the address bits, of the memory address of the data to be accessed, and the index may be the low-order bits of that memory address.
  • the method for judging whether the cache is hit may adopt a common method in the field, which is not limited in the embodiment of the present application.
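The tag/index decomposition and hit check described above can be sketched with bit slicing. The 64-byte line size and 1024-set geometry are assumed for illustration, and the direct-mapped organization is one common method, not one mandated by the patent.

```python
LINE_BITS = 6    # 64-byte cache lines -> 6 offset bits (assumed geometry)
INDEX_BITS = 10  # 1024 sets (assumed geometry)

def split_address(addr):
    """Decompose a memory address as described above: the index comes from
    the low-order bits, the tag from the remaining high-order bits."""
    offset = addr & ((1 << LINE_BITS) - 1)
    index = (addr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (LINE_BITS + INDEX_BITS)
    return tag, index, offset

def is_hit(sets, addr):
    """Direct-mapped hit check: the line selected by the index hits when
    its valid flag is set and its stored tag equals the address tag.
    Each entry of `sets` is a (valid, stored_tag) pair."""
    tag, index, _ = split_address(addr)
    valid, stored_tag = sets.get(index, (False, 0))
    return valid and stored_tag == tag
```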
  • FIG. 3 a schematic structural diagram of a processor system provided by an embodiment of the present application is shown.
  • The processor system includes a CPU die 30, a fourth-level cache 32, and a memory 34. The illustrated CPU die 30 includes a processor core 301, first-level caches L1I 302a and L1D 302b, a second-level cache 303, and a third-level cache 304.
  • the tag is used to represent address information of the data stored in the fourth-level cache, for example, the address of the data in the memory.
  • The third-level cache can determine, through the tag, whether the data to be accessed by the processor core is cached in the fourth-level cache. As the capacity of the fourth-level cache increases, the L4 tags occupy more and more space in the third-level cache.
  • the processor core sends an access request, the access request carries the memory address of the data to be accessed, and the cache queries whether the data to be accessed is locally cached according to the memory address.
  • a data access request is sent to the lower-level cache, and the data access request may be a load request.
  • the third-level cache determines whether the to-be-accessed data is cached locally.
  • When the third-level cache misses, the L4 tags stored in the third-level cache are used to determine whether the data to be accessed is stored in the fourth-level cache.
  • If the data to be accessed is cached in the fourth-level cache, a load request is sent to the fourth-level cache and the data to be accessed is read from the fourth-level cache; if the data to be accessed is not stored in the fourth-level cache, the third-level cache acquires the data to be accessed from the memory. Since the fourth-level cache is a large-capacity cache, its tags occupy valuable storage space in the third-level cache, thereby reducing the amount of data that can be stored in the third-level cache and affecting the performance of the processor system.
  • For example, a fourth-level cache with a capacity of 512MB may need to occupy about 25MB of space in the third-level cache to store its L4 tags, so that this 25MB of space in the third-level cache cannot be used as a data cache.
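The ~25MB figure above can be reproduced with a back-of-the-envelope calculation. The 64-byte line size and the 25 tag-plus-state bits per line are assumptions chosen to match the patent's number; the actual tag width depends on the address width and cache organization.

```python
l4_capacity = 512 * 2**20   # 512 MB fourth-level cache
line_size = 64              # bytes per cache line (assumed)
tag_bits_per_line = 25      # tag + state bits per line (assumed)

num_lines = l4_capacity // line_size             # 8,388,608 cache lines
tag_storage = num_lines * tag_bits_per_line // 8 # bytes of L3 space used
print(tag_storage // 2**20)                      # -> 25 (MB)
```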
  • an embodiment of the present application provides a data access method, which is applied to a processor system.
  • The processor system includes a processor core 401, an upper-level cache 402, a lower-level cache 403, and a memory 404. As shown in FIG. 5, the method for the processor core 401 to read the data to be accessed includes:
  • the upper-level cache 402 may send the data read request to the lower-level cache 403 and the memory 404 in parallel.
  • the tag tag of the lower-level cache is used to represent the information of the data cached in the lower-level cache.
  • the upper-level cache does not need to store the tag of the lower-level cache, which avoids the space waste of the upper-level cache and improves the space use efficiency of the upper-level cache.
  • the tag tag of the lower-level cache is not stored in the upper-level cache, and the tag is used to represent address information of the data stored in the lower-level cache.
  • the upper-level cache 402 is specifically configured to send a data read request to the memory controller of the memory 404, and the memory controller of the memory 404 performs the query and return operations of the data to be accessed.
  • The embodiment of the present application avoids storing the tags of the lower-level cache in the upper-level cache, thereby avoiding the waste of upper-level cache space and improving the space use efficiency of the upper-level cache.
  • the processor core 401 and the upper-level cache 402 are on the CPU die.
  • the upper-level cache 402 may also be located on a different die from the processor core 401 .
  • the upper-level cache is the third-level cache
  • the lower-level cache is the fourth-level cache.
  • In a possible implementation, the CPU die where the processor core is located includes the processor core, the first-level cache, and the second-level cache; the third-level cache is on a second, non-CPU die, and the fourth-level cache is on a third die. When the three dies are packaged together, they can communicate through an inter-die bus.
  • A cache exclusive to the processor core may also be included between the upper-level cache and the processor core.
  • When the processor core reads the data to be accessed, it first determines that the data to be accessed is not stored in the exclusive cache, and then queries whether the data to be accessed is cached in the upper-level cache.
  • the processor core 401 and the upper-level cache 402 may be packaged together, and the formed package structure is a processor chip.
  • a DDR (Double data rate) bus (PHY) exists between the memory 404 and the lower-level cache 403, and the lower-level cache 403 and the memory 404 can communicate directly through the DDR PHY.
  • the lower-level cache will update the aged cache data into the memory through the DDR PHY; or, the lower-level cache obtains the data to be cached from the memory through the DDR PHY.
  • each level of cache may include a cache controller, which is used to perform various operations on the cache of this level.
  • the cache controller is hardware control logic.
  • The upper-level cache 402 includes hardware control logic for executing cache content queries, determining whether the data to be accessed is hit, and control functions such as the separate sending of read requests.
  • the packaging structure may further include the lower-level cache 403 , and at this time, the die where the processor core 401 is located and the die where the lower-level cache 403 is located may be connected through an inter-die bus.
  • FIG. 7 it is a schematic structural diagram of another processor system provided by an embodiment of the present application.
  • The processor system includes a processor core 701, first-level caches L1D and L1I 702, a second-level cache 703, a third-level cache 704, a fourth-level cache 705, and a memory 706. The package structure 71 includes the processor core 701, the first-level caches L1D and L1I 702, the second-level cache 703, and the third-level cache 704, and includes at least two memory controller ports 711 and 712. The memory 706 has dual channels: one channel is connected to the first port 711 of the memory controller, and the other channel is connected to the second port 712 of the memory controller. The first port 711 of the memory controller is connected to the third-level cache 704, and the second port 712 of the memory controller is connected to the fourth-level cache 705.
  • The fourth-level cache 705 may also be located in the package structure.
  • the third-level cache 704 communicates with the memory through the first port of the memory controller on the processor chip, and the fourth-level cache 705 communicates with the memory through the second port of the memory controller on the processor chip. memory communication.
  • The fourth-level cache 705 may perform a cache line aging (evict) operation and a data prefetching (prefetch) operation through the second port of the memory controller.
  • FIG. 8 which is a schematic structural diagram of another processor system provided by an embodiment of the present application
  • The difference from FIG. 7 is that the processor system shown in FIG. 8 further includes an IO die 809. The IO die 809 is used for connecting the third-level cache 704 and the memory 706, and for connecting the fourth-level cache 705 and the memory 706.
  • the fourth-level cache 705 and the memory 706 are connected to a memory controller on the IO die 809, and the memory controller is specifically a DDR controller.
  • The connection between the fourth-level cache 705 and the IO die may be used only to perform the aging (evict) function.
  • an embodiment of the present application provides a schematic flowchart of a data access method, and the method includes:
  • Step 901 When the processor core performs a memory access operation and reads the data to be accessed, the data to be accessed is not found in the second-level cache; at this time, L2 misses.
  • Step 902 The processor core sends a read (Load) request to the third-level cache to request the data to be accessed.
  • Step 903 The data to be accessed is not queried in the third-level cache, at this time, L3 miss.
  • Step 904 The third-level cache sends a read (load) request to the fourth-level cache and to the memory controller respectively, for concurrently requesting the data to be accessed from the fourth-level cache and the memory in the case of an L3 miss.
  • Step 905 When the data to be accessed is found in the fourth-level cache, this indicates an L4 hit. At this time, the fourth-level cache returns the data to be accessed to the third-level cache, which returns the data to be accessed to the processor core.
  • The storage method and query method of the cache lines of the fourth-level cache may be the same as those of the third-level cache L3 in the prior art, which is not limited in the embodiments of the present application.
  • Step 906 The memory controller returns the to-be-accessed data queried from the memory to the third-level cache.
  • Step 907 The third-level cache determines that it has previously received the to-be-accessed data returned by the fourth-level cache, and discards the to-be-accessed data read from the memory by the memory controller.
  • In the prior art, the tags of the fourth-level cache are stored in the third-level cache, and in the case of an L3 miss, whether the data to be accessed is in the fourth-level cache is determined by querying the L4 tags stored in L3.
  • In the embodiments of the present application, the third-level cache does not need to store the tags of the fourth-level cache. When the processor core misses the data to be accessed in the L1-L3 caches, the third-level cache does not first determine through the L4 tags whether the data to be accessed can hit in the fourth-level cache; instead, it concurrently sends a data read request to the fourth-level cache and the memory and obtains the data to be accessed from whichever responds first. This eliminates the need to store the tags used for querying the fourth-level cache in the third-level cache, thereby avoiding the space waste of the third-level cache.
  • As shown in FIG. 10, another data access flow provided by an embodiment of the present application differs from the embodiment shown in FIG. 9 in that the data to be accessed is not stored in the fourth-level cache.
  • the method includes:
  • Steps 1001-1004 are the same as steps 901-904.
  • Step 1005 When the data to be accessed is not found in the fourth-level cache, this indicates an L4 miss, and the fourth-level cache notifies the third-level cache that the data to be accessed was not found. The response may be a read failure response.
  • Step 1006 The memory controller returns the data to be accessed queried from the memory to the third-level cache.
  • Step 1007 The third-level cache returns the data to be accessed obtained from the memory to the processor core.
  • FIG. 11 a schematic flowchart of a method for aging (evict) of cached data provided by an embodiment of the present application, the method includes:
  • Step 1101 The processor performs an evict operation on the first cache line in L2.
  • The first cache line may be a cache line that is determined, according to the prior art, to need aging;
  • Step 1102 Determine whether the first cache line to be aged hits in the third-level cache; if so, update the first cache line in the third-level cache; if it misses in the third-level cache, the first cache line needs to be written into the third-level cache.
  • If the third-level cache has no free storage space, in order to write the first cache line into the third-level cache, another cache line (the second cache line) needs to be aged out of the third-level cache.
  • the cache line to be aged can be selected from the cache according to the LRU algorithm or other methods.
  • Step 1103 The third-level cache performs an evict operation on the second cache line, and sends the second cache line to the fourth-level cache.
  • Step 1104 Determine whether the second cache line to be aged hits in the fourth-level cache; if so, update the second cache line in the fourth-level cache; if it misses in the fourth-level cache, the second cache line needs to be written into the fourth-level cache.
  • If the fourth-level cache has no free storage space, in order to write the second cache line into the fourth-level cache, another cache line (the third cache line) needs to be aged out of the fourth-level cache.
  • the cache line to be aged can be selected from the cache according to the LRU algorithm or other methods.
  • Step 1105 The fourth-level cache performs an evict operation on the third cache line, and sends the third cache line to the memory.
  • Step 1106 The memory controller receives the third cache line, and updates the data of the third cache line to the memory.
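Steps 1101-1106 can be sketched end-to-end as a cascade. The list of (cache, capacity) pairs stands in for L3 and L4, and FIFO replacement stands in for the LRU mentioned above; both are simplifying assumptions.

```python
def evict_cascade(line, caches, memory):
    """A line aged out of one level is pushed to the next level down: on a
    hit it updates that level in place and the cascade stops; on a miss it
    is written, and if the level was full its own victim continues down,
    with the last victim finally written to memory (steps 1101-1106)."""
    addr, data = line
    for cache, capacity in caches:
        if addr in cache:                 # hit: update in place, stop
            cache[addr] = data
            return
        victim = None
        if len(cache) >= capacity:        # full: age out a victim first
            v_addr = next(iter(cache))    # FIFO stand-in for LRU
            victim = (v_addr, cache.pop(v_addr))
        cache[addr] = data                # write the incoming line
        if victim is None:
            return                        # free space existed: stop here
        addr, data = victim               # victim cascades to next level
    memory[addr] = data                   # memory controller updates memory
```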
  • a DDR PHY interface exists between the fourth-level cache and the memory, and the fourth-level cache writes the third cache line to the memory through the DDR PHY interface;
  • The processor core, upper-level cache, and lower-level cache included in the aforementioned processor systems of the embodiments of the present application may be connected in various structures.
  • The processor system provided by the embodiments of the present application includes a processor core, an upper-level cache, and a lower-level cache, where the upper-level cache and the lower-level cache are located on different dies, and the tags of the lower-level cache are not stored in the upper-level cache.
  • The upper-level cache includes a first-level cache, a second-level cache, and a third-level cache, and the lower-level cache is a fourth-level cache.
  • Correspondingly, an embodiment of the present application further provides a processor system including a processor core, a first-level cache, a second-level cache, a third-level cache, and a fourth-level cache, where the processor core, the first-level cache, the second-level cache, and the third-level cache are located on the processor die, and the fourth-level cache and the third-level cache are located on different dies.
  • The third-level cache is used to send a data read request to the fourth-level cache when none of the first-level cache, the second-level cache, and the third-level cache stores the data to be accessed requested by the processor core.
  • The third-level cache acquires the data to be accessed from the memory when the fourth-level cache does not store the data to be accessed.
  • The fourth-level cache and the third-level cache are on different dies in the same processor chip, and the fourth-level cache communicates with the third-level cache through an inter-die bus; or,
  • the fourth-level cache is located outside the processor chip where the third-level cache is located and communicates with the third-level cache through an inter-chip bus.
  • Embodiments of the present application further provide a server, including a memory and a processor system as in the foregoing embodiments.
  • The lower-level cache and the upper-level cache may be co-packaged, or the lower-level cache and the upper-level cache may not be co-packaged but instead connected through an inter-chip bus.
  • The upper-level cache in the foregoing embodiments may be referred to as a first cache, and the lower-level cache may be referred to as a second cache.
  • An embodiment of the present application further provides a processor chip, including a processor core, a first cache, and a second cache, where the first cache and the second cache are located on different dies in the processor chip.
  • The first cache is configured to send a data read request to the second cache, so as to obtain the data to be accessed requested by the processor core.
  • The first cache acquires the data to be accessed from the memory when the data to be accessed is not stored in the second cache.
  • Embodiments of the present application further provide a processor chip, including a processor core and a first cache.
  • The first cache is used to send a data read request to a second cache outside the processor chip, so as to obtain the data to be accessed requested by the processor core.
  • The first cache acquires the data to be accessed from the memory when the data to be accessed is not stored in the second cache.
  • When the processor core misses the data to be accessed in the upper-level cache, the upper-level cache (for example, the third-level cache) can request the data to be accessed from the lower-level cache (for example, the fourth-level cache) without first determining whether the data would hit in the lower-level cache.
  • The tags of the lower-level cache therefore do not need to be stored in the upper-level cache, which avoids wasting space in the upper-level cache.
  • When the data to be accessed is not stored in the lower-level cache, the data to be accessed is requested from the memory.
  • The die in the foregoing embodiments of this application, also called a bare die or bare chip, is an unpackaged integrated circuit made of semiconductor material; the predetermined function of the integrated circuit is implemented on this small piece of semiconductor.
  • Integrated circuits are fabricated on a large semiconductor wafer through a number of steps such as photolithography, and the wafer is then divided into small square pieces, which are called bare dies.
  • The processor system may include multiple processors, and each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

This application discloses a data access method and a processor system. The processor system includes a processor core, an upper-level cache, and a lower-level cache, where the upper-level cache and the lower-level cache are located on different dies. The method includes: determining whether the upper-level cache holds the data to be accessed; when the upper-level cache does not store the data to be accessed, sending data read requests to the lower-level cache and to the memory respectively; and receiving the earlier-returned data to be accessed from either the lower-level cache or the memory. In this way, when the processor core misses the data to be accessed in the upper-level cache, the upper-level cache can request the data from the lower-level cache and the memory respectively, without first determining whether the data would hit in the lower-level cache. The tags of the lower-level cache therefore do not need to be stored in the upper-level cache, which avoids wasting space in the upper-level cache.

Description

Data access method and processor system
This application claims priority to Chinese Patent Application No. 202010671444.3, filed on July 13, 2020 and entitled "Data Access Method and Processor System", and to Chinese Patent Application No. 202011620284.6, filed with the China National Intellectual Property Administration on December 30, 2020 and entitled "Data Access Method and Processor System", both of which are incorporated herein by reference in their entirety.
Technical Field
This application relates to the field of computing, and in particular to a data access method and a processor system.
Background
In a server system, the structure and capacity of the cache are important performance indicators of the central processing unit (CPU) and have a strong influence on CPU speed. The cache inside the CPU runs at a very high frequency, generally at the same frequency as the processor, and is far more efficient than system memory and hard disks. In real workloads, the CPU often needs to read the same data blocks repeatedly, so increasing the cache capacity can substantially raise the hit rate of CPU reads and thereby improve system performance. However, because of CPU die area and cost constraints, cache capacity is generally small.
According to the order in which data is read and the closeness of coupling to the CPU, CPU caches can be divided into a first-level cache L1 and a second-level cache L2; some high-end CPUs also have a third-level cache L3. All of the data stored in each level of cache is a subset of the next level. The technical difficulty and manufacturing cost of these three caches decrease from level to level, while their capacities increase. Because the capacity of these three levels is limited, the prior art has also introduced a large-capacity fourth-level cache L4.
When the CPU needs to read data to be accessed, it searches for the data in L1 through L3 in turn. When the data misses in all of the L1 to L3 caches, the controller of the L3 cache determines, based on the L4 tags stored in L3, whether the data is in L4; if not, the data to be accessed is read from memory.
Summary
This application provides a data access method and a processor system that avoid storing the tags of a lower-level cache in an upper-level cache, thereby avoiding wasting space in the upper-level cache.
According to a first aspect, this application provides a data access method applied to a processor system. The processor system includes a processor core, an upper-level cache, and a lower-level cache, where the upper-level cache and the lower-level cache are located on different dies. The method includes: when the upper-level cache does not store data to be accessed, the upper-level cache sends data read requests to the lower-level cache and to the memory respectively, and receives the earlier-returned data to be accessed from either the lower-level cache or the memory.
In this way, when the data to be accessed misses in the upper-level cache, there is no need to first determine whether the data would hit in the lower-level cache; instead, data read requests are sent concurrently to the lower-level cache and the memory, and the data to be accessed is obtained from whichever of the two returns first. The upper-level cache therefore does not need to store the tags used to query the lower-level cache, which avoids wasting space in the upper-level cache.
In a possible implementation, the upper-level cache and the lower-level cache each include hardware control logic and cache space, for example a cache controller. The cache controller communicates with the CPU core or with the cache controllers of other caches, and performs operations on the cache space of its own level, for example lookup, write, and aging.
In another possible implementation, the method of the first aspect may be performed by the cache controller of the upper-level cache.
In another possible implementation, the receiving the earlier-returned data to be accessed from either the lower-level cache or the memory includes: when the data to be accessed returned by the lower-level cache is received first, discarding the data read response subsequently returned by the memory; or, when the lower-level cache does not store the data to be accessed, obtaining the data to be accessed returned by the memory.
Because the data access speed of the lower-level cache is far higher than that of the memory, when the data to be accessed is already cached in the lower-level cache, the lower-level cache returns the data to the upper-level cache faster than the memory does. In this case, after receiving the data to be accessed from the lower-level cache, the upper-level cache returns it to the processor core and discards the data subsequently returned by the memory. When the data to be accessed is not cached in the lower-level cache, the lower-level cache returns a snoop response to the upper-level cache, the snoop response indicating that no valid copy of the data to be accessed is cached in the lower-level cache, and the upper-level cache receives the data to be accessed from the memory.
In a possible implementation, the die where the upper-level cache is located and the die where the lower-level cache is located are co-packaged to form a processor chip, and the upper-level cache communicates with the lower-level cache through an inter-die bus.
The upper-level cache and the processor core may be on the same CPU die.
In another possible implementation, the lower-level cache is outside the processor chip where the upper-level cache is located, and the upper-level cache communicates with the lower-level cache through an inter-chip bus.
In a possible implementation, the upper-level cache communicates with the memory through a first port of a memory controller on the processor chip, and the lower-level cache communicates with the memory through a second port of the memory controller on the processor chip. The memory is a dual-ported random access memory (DPRAM).
In another possible implementation, the processor system further includes an IO die, which may be used to connect the upper-level cache and the lower-level cache to the memory. Specifically, the IO die connects the upper-level cache to the memory, and the lower-level cache to the memory, through a memory controller on the IO die.
In a possible implementation, a DDR bus (PHY) exists between the lower-level cache and the memory, through which the lower-level cache communicates with the memory.
Specifically, the lower-level cache updates aged cache data to the memory through the DDR bus, or obtains data to be cached from the memory through the DDR bus.
The lower-level cache updates the aged cache data to the memory through the memory controller.
In a possible implementation, the upper-level cache sends an aged first cache line to the lower-level cache; if the first cache line hits in the lower-level cache, the lower-level cache updates its local cache with the received first cache line; if the first cache line misses in the lower-level cache, the first cache line is written into the lower-level cache.
The upper-level cache is a third-level cache L3 and the lower-level cache is a fourth-level cache L4; or,
the upper-level cache is a second-level cache L2 and the lower-level cache is a third-level cache L3.
According to a second aspect, an embodiment of this application further provides a processor system, including a processor core, an upper-level cache, and a lower-level cache, where the upper-level cache and the lower-level cache are located on different dies, and
the upper-level cache is configured to, when it does not store data to be accessed, send data read requests to the lower-level cache and to the memory respectively to request the data to be accessed, and receive the earlier-returned data to be accessed from either the lower-level cache or the memory.
Specifically, the actions of the second aspect may be performed by the cache controller of the upper-level cache.
According to a third aspect, an embodiment of this application further provides a processor system, including a processor core, an upper-level cache, and a lower-level cache, where the upper-level cache and the lower-level cache are located on different dies, and the tags of the lower-level cache are not stored in the upper-level cache.
According to a fourth aspect, an embodiment of this application further provides a processor system, including a processor core, a first-level cache, a second-level cache, a third-level cache, and a fourth-level cache, where the processor core, the first-level cache, the second-level cache, and the third-level cache are located on a processor die, and the fourth-level cache and the third-level cache are located on different dies,
the third-level cache is configured to send a data read request to the fourth-level cache when none of the first-level cache, the second-level cache, and the third-level cache stores the data to be accessed requested by the processor core; and
the third-level cache obtains the data to be accessed from the memory when the fourth-level cache does not store the data to be accessed.
The fourth-level cache and the third-level cache are on different dies in the same processor chip, and the fourth-level cache communicates with the third-level cache through an inter-die bus; or,
the fourth-level cache is located outside the processor chip where the third-level cache is located and communicates with the third-level cache through an inter-chip bus.
According to a fifth aspect, an embodiment of this application further provides a server, including a memory and the processor system according to any one of the second to fourth aspects.
According to a sixth aspect, an embodiment of this application further provides a processor chip, including a processor core, a first cache, and a second cache, where the first cache and the second cache are located on different dies in the processor chip,
the first cache is configured to send a data read request to the second cache, to obtain the data to be accessed requested by the processor core; and
the first cache obtains the data to be accessed from the memory when the second cache does not store the data to be accessed.
According to a seventh aspect, an embodiment of this application further provides a processor chip, including a processor core and a first cache, where
the first cache is configured to send a data read request to a second cache outside the processor chip, to obtain the data to be accessed requested by the processor core; and
the first cache obtains the data to be accessed from the memory when the second cache does not store the data to be accessed.
According to an eighth aspect, an embodiment of this application further provides another data access method applied to a processor system. The processor system includes a processor core, an upper-level cache, and a lower-level cache, where the tags of the lower-level cache are not stored in the upper-level cache, and the method includes:
when the upper-level cache does not store data to be accessed, sending data read requests to the lower-level cache and to the memory respectively, to request the data to be accessed; and
receiving the earlier-returned data to be accessed from either the lower-level cache or the memory.
According to a ninth aspect, an embodiment of this application further provides another processor system, including a first die and a second die, where the first die includes an upper-level cache and a memory controller, and the second die includes a lower-level cache,
the lower-level cache is configured to determine a cache line to be aged and send the cache line to the first die; and
the first die is configured to send the cache line to the memory through a port of the memory controller.
According to a tenth aspect, an embodiment of this application further provides another data access method applied to a processor system. The processor system includes an upper-level cache and a lower-level cache, where the upper-level cache and the lower-level cache are located on different dies, and the method includes:
determining, by the lower-level cache, a cache line to be aged; and
sending, by the lower-level cache, the cache line to the memory through a memory controller port on the die where the upper-level cache is located.
According to an eleventh aspect, an embodiment of this application provides a cache controller configured to perform the methods of the first, eighth, and tenth aspects.
Brief Description of Drawings
FIG. 1 is a schematic diagram of an example cache structure provided by an embodiment of this application;
FIG. 2 is a schematic diagram of a cache line structure provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a processor system provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of another processor system provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of a method by which a processor core reads data to be accessed, provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of another processor system provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of another processor system provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of another processor system provided by an embodiment of this application;
FIG. 9 is a schematic flowchart of a data access method provided by an embodiment of this application;
FIG. 10 is a schematic flowchart of another data access method provided by an embodiment of this application;
FIG. 11 is a schematic flowchart of a method for aging (evicting) cached data provided by an embodiment of this application.
Detailed Description
The implementations of this application are described in further detail below with reference to the accompanying drawings.
A processor usually contains multiple levels of cache. In the first-level cache L1, instructions and data use separate caches L1I and L1D; in the second-level cache and the caches below the second level, instructions and data share one cache. In this application, cached content is referred to uniformly as data, and the data that the processor core needs to access is referred to as the data to be accessed.
FIG. 1 is a schematic diagram of an example cache structure provided by an embodiment of this application. Illustratively, processor core 0 and processor core 1 each have their own private L1 and L2 caches and share the L3 cache. The cache space is divided into multiple cache lines (CLs); each cache line may be 32 bytes, 64 bytes, and so on. The cache line is the smallest unit of data exchange between the cache and the memory.
FIG. 2 is a schematic diagram of a cache line structure provided by an embodiment of this application. Each cache line usually contains three parts: a valid flag, a tag, and a block. The valid flag indicates whether the cache line is valid, the tag stores the address of the memory block corresponding to the cache line, and the block stores the data at that address. When the processor core needs to read data, it first searches the cache: it locates a cache line according to the address of the data to be accessed and checks the valid flag of that line. If the valid flag of the located cache line is set, the data is read from the cache; this is called a cache hit. On a cache hit, the processor core reads the data from the cache, which usually takes a few clock cycles; on a cache miss, the processor core reads the data from memory, which generally takes tens or hundreds of clock cycles and affects the overall performance of the processor system. In a concrete implementation, the cache uses an index and the address in the tag to determine whether it locally holds the data the processor core wants to read. The address in the tag may be the high-order address bits, or some of the address bits, of the memory address of the data to be accessed, and the index may be the low-order address bits of that memory address. In the embodiments of this application, any method commonly used in the art may be employed to determine whether the cache hits; this is not limited by the embodiments of this application.
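The valid/tag/index lookup described above can be sketched as follows. This is an illustrative model, not code from this application; the geometry (64-byte lines, 8 entries, direct-mapped placement) and all names are assumptions chosen only to show the mechanism:

```python
LINE_SIZE = 64      # bytes per cache line (one of the sizes mentioned above)
NUM_LINES = 8       # tiny illustrative cache

class CacheLine:
    def __init__(self):
        self.valid = False   # valid flag: does this line hold real data?
        self.tag = None      # high-order address bits of the cached block
        self.block = None    # the cached data itself

class DirectMappedCache:
    def __init__(self):
        self.lines = [CacheLine() for _ in range(NUM_LINES)]

    def _split(self, addr):
        block_addr = addr // LINE_SIZE
        index = block_addr % NUM_LINES   # low-order bits select the line
        tag = block_addr // NUM_LINES    # high-order bits are compared
        return index, tag

    def lookup(self, addr):
        """Return the cached block on a hit, or None on a miss."""
        index, tag = self._split(addr)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            return line.block            # cache hit
        return None                      # cache miss

    def fill(self, addr, block):
        """Install a block fetched from memory into its cache line."""
        index, tag = self._split(addr)
        line = self.lines[index]
        line.valid, line.tag, line.block = True, tag, block
```

A hit costs one comparison against the stored tag plus the valid check, which is why a hit takes only a few cycles while a miss falls through to memory.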
FIG. 3 is a schematic structural diagram of a processor system provided by an embodiment of this application. Illustratively, the processor system includes a CPU die 30, a fourth-level cache 32, and a memory 34. The CPU die 30 includes a processor core 301, first-level caches L1I 302a and L1D 302b, a second-level cache 303, and a third-level cache 304.
When the third-level cache 304 stores the tags (L4 tags) of the fourth-level cache, the tags indicate the address information of the data stored in the fourth-level cache, for example, the address of the data in memory. Through these tags, the third-level cache can determine whether the data the processor core wants to access is cached in the fourth-level cache. In this case, the larger the capacity of the fourth-level cache, the more third-level cache space the L4 tags occupy.
Specifically, the processor core sends an access request carrying the memory address of the data to be accessed, and the cache checks, according to the memory address, whether it locally holds the data to be accessed. When the data misses in one level of cache, a data access request, which may be a load request, is sent to the next level of cache. Specifically, if the data misses in both the first-level and second-level caches, the third-level cache checks whether it locally holds the data; if it also misses, the third-level cache determines, according to the locally stored tags of the fourth-level cache, whether the fourth-level cache stores the data to be accessed. If the fourth-level cache holds the data, a load request is sent to the fourth-level cache and the data is read from it; if the fourth-level cache does not store the data, the third-level cache obtains it from memory. Because the fourth-level cache is a large-capacity cache, its tags occupy precious storage space in the third-level cache, reducing the amount of data the third-level cache can store and affecting the performance of the processor system. For example, a fourth-level cache with a capacity of 512 MB may need about 25 MB of space in the third-level cache to store the L4 tags, making that 25 MB unavailable for data caching.
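The 25 MB tag-store figure quoted above is consistent with a simple back-of-the-envelope estimate. In the sketch below, the 64-byte line size and the 25 bits of tag-plus-state per line are assumptions chosen to illustrate the arithmetic; they are not values given in this application:

```python
# Estimate the L4 tag storage that would occupy the third-level cache.
l4_bytes = 512 * 2**20               # 512 MB fourth-level cache (from the text)
line_bytes = 64                      # assumed cache-line size
num_lines = l4_bytes // line_bytes   # 8 Mi cache lines to track

bits_per_entry = 25                  # assumed tag + state bits per cache line
tag_store_bytes = num_lines * bits_per_entry // 8

# With these assumptions the tag store is exactly 25 MiB, matching the
# magnitude quoted in the text.
print(tag_store_bytes / 2**20)       # 25.0
```

The point of the estimate is that the tag store grows linearly with L4 capacity, so a larger off-die cache makes the on-die space cost correspondingly worse.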
To solve the above technical problem, an embodiment of this application provides a data access method applied to a processor system. As shown in FIG. 4, the processor system includes a processor core 401, an upper-level cache 402, a lower-level cache 403, and a memory 404. As shown in FIG. 5, the method by which the processor core 401 reads the data to be accessed includes:
501: Determine whether the upper-level cache 402 stores the data to be accessed.
502: When the upper-level cache 402 does not store the data to be accessed, send data read requests to the lower-level cache 403 and the memory 404 respectively, to request the data to be accessed.
Specifically, the upper-level cache 402 may send the data read requests to the lower-level cache 403 and the memory 404 in parallel.
503: Receive the earlier-returned data to be accessed from either the lower-level cache 403 or the memory 404.
The tags of the lower-level cache indicate information about the data cached in the lower-level cache.
In this way, when the data to be accessed misses in the upper-level cache, data read requests are sent concurrently to the lower-level cache and the memory, and the data is obtained from whichever returns first. The upper-level cache thus does not need to determine whether the data would hit in the lower-level cache, so the upper-level cache does not need to store the lower-level cache's tags, which avoids wasting upper-level cache space and improves its space utilization.
The upper-level cache does not store the tags of the lower-level cache; the tags indicate the address information of the data stored in the lower-level cache.
It should be noted that the upper-level cache 402 specifically sends the data read request to the memory controller of the memory 404, and the memory controller of the memory 404 performs the lookup and return of the data to be accessed.
Compared with the prior art, which stores the tags of the lower-level cache in the upper-level cache to avoid cross-die latency, the embodiments of this application avoid storing the lower-level cache's tags in the upper-level cache, which avoids wasting upper-level cache space and improves its space utilization.
In the example shown in FIG. 4, the processor core 401 and the upper-level cache 402 are on the CPU die. Unlike FIG. 4, in another possible implementation, the upper-level cache 402 may be on a different die from the processor core 401. For example, the upper-level cache is the third-level cache and the lower-level cache is the fourth-level cache; in this case, the CPU die containing the processor core includes the processor core, the first-level cache, and the second-level cache, the third-level cache is on a second, non-CPU die, and the fourth-level cache is on a third die. When the three dies are co-packaged, they can communicate through inter-die buses.
In a concrete implementation, similar to the foregoing embodiments, one or more caches private to the processor core may also exist between the upper-level cache and the processor core. When reading the data to be accessed, the processor core first determines that the data is not stored in its private caches, and then queries whether the data is cached in the upper-level cache. There may be one or more levels of cache between the upper-level cache and the processor core; this is not limited by the embodiments of this application.
Illustratively, the processor core 401 and the upper-level cache 402 may be co-packaged, and the resulting package structure is a processor chip. A DDR (double data rate) bus (PHY) exists between the memory 404 and the lower-level cache 403, through which the lower-level cache 403 and the memory 404 can communicate directly. Specifically, the lower-level cache updates aged cache data to the memory through the DDR PHY, or obtains data to be cached from the memory through the DDR PHY.
In possible implementations, each level of cache may include a cache controller used to perform the various operations on that level of cache.
Illustratively, the cache controller is hardware control logic. The upper-level cache 402 includes the hardware control logic, which performs control functions such as looking up cache contents, determining whether the data to be accessed hits, and sending the respective read requests.
Further, FIG. 6 is a schematic structural diagram of another processor system provided by an embodiment of this application. Unlike FIG. 4, the package structure may also include the lower-level cache 403; in this case, the die where the processor core 401 is located and the die where the lower-level cache 403 is located may be connected through an inter-die bus.
FIG. 7 is a schematic structural diagram of another processor system provided by an embodiment of this application. The processor system includes a processor core 701, first-level caches L1D and L1I 702, a second-level cache 703, a third-level cache 704, a fourth-level cache 705, and a memory 706. The package structure 71 includes the processor core 701, the first-level caches L1D and L1I 702, the second-level cache 703, and the third-level cache 704, and includes at least two memory controller ports 711 and 712. The memory 706 has two channels: one channel is connected to the first memory controller port 711 and the other channel is connected to the second memory controller port 712. The first memory controller port 711 is connected to the third-level cache 704, and the second memory controller port 712 is connected to the fourth-level cache 705.
Unlike FIG. 7, in a possible implementation, the fourth-level cache 705 may also be located inside the package structure.
The third-level cache 704 communicates with the memory through the first memory controller port on the processor chip, and the fourth-level cache 705 communicates with the memory through the second memory controller port on the processor chip.
The fourth-level cache 705 can perform cache line aging (evict) operations and data prefetch operations through its memory controller port.
FIG. 8 is a schematic structural diagram of another processor system provided by an embodiment of this application. Unlike FIG. 7, the processor system shown in FIG. 8 further includes an IO die 809, which is used to connect the third-level cache 704 to the memory 706 and to connect the fourth-level cache 705 to the memory 706. Specifically, the fourth-level cache 705 and the memory 706 are connected to a memory controller on the IO die 809; the memory controller is specifically a DDR controller. The connection between the fourth-level cache 705 and the IO die may be used only to perform the evict function.
FIG. 9, in combination with the processor system structures provided in the foregoing embodiments, is a schematic flowchart of a data access method provided by an embodiment of this application. The method includes:
Step 901: The processor core performs a memory access operation. When reading the data to be accessed, the data is not found in the second-level cache: an L2 miss.
Step 902: The processor core sends a load request to the third-level cache, to request the data to be accessed.
Step 903: The data to be accessed is not found in the third-level cache: an L3 miss.
Step 904: The third-level cache sends load requests to the fourth-level cache and the memory controller respectively, so as to request the data to be accessed concurrently from the fourth-level cache and the memory in the case of an L3 miss.
Step 905: When the data to be accessed is found in the fourth-level cache (an L4 hit), the fourth-level cache returns the data to the third-level cache, and the data is returned through the third-level cache to the processor core.
It should be noted that, in the embodiments of this application, the way cache lines are stored and looked up in the fourth-level cache may be the same as for the third-level cache L3 in the prior art; the embodiments of this application do not limit the internal implementation of the cache.
Step 906: The memory controller returns the data to be accessed, found in the memory, to the third-level cache.
Step 907: The third-level cache determines that it has already received the data to be accessed returned by the fourth-level cache, and discards the data read from the memory by the memory controller.
In the prior art, the third-level cache stores the tags of the fourth-level cache, and on an L3 miss it determines whether the fourth-level cache holds the data to be accessed by querying the L4 tags stored in L3. Unlike the prior art, in the above approach provided by the embodiments of this application, the third-level cache does not need to store the tags of the fourth-level cache. When the processor core misses the data to be accessed in the L1 to L3 caches, there is no need to first determine via tags whether the data would hit in the fourth-level cache; instead, data read requests are sent concurrently to the fourth-level cache and the memory, and the data is obtained from whichever returns first. The tags used to query the fourth-level cache therefore do not need to be stored in the third-level cache, which avoids wasting third-level cache space.
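The concurrent-fetch policy of steps 904 through 907 can be sketched as follows. This is an illustrative model only: the latency values, the simple sorted-event ordering, and the function names are assumptions not given in this application:

```python
L4_LATENCY = 20     # assumed cycles for a fourth-level cache response
MEM_LATENCY = 100   # assumed cycles for a memory response (far slower)

def l3_miss_fetch(addr, l4_contents, memory):
    """On an L3 miss, request `addr` from both L4 and memory; return
    (data, source) for the first valid response, dropping the later one."""
    responses = []
    if addr in l4_contents:
        # L4 hit: a fast response carrying the data (step 905)
        responses.append((L4_LATENCY, l4_contents[addr], "L4"))
    else:
        # L4 miss: only a snoop response, carrying no valid data (step 1005)
        responses.append((L4_LATENCY, None, "L4-miss"))
    # The memory controller always answers, just later (step 906)
    responses.append((MEM_LATENCY, memory[addr], "memory"))

    responses.sort()                      # order by arrival time
    for _, data, source in responses:
        if data is not None:
            return data, source           # later responses are discarded (step 907)
```

When L4 hits, its earlier response wins and the memory response is dropped; when L4 misses, the snoop response carries no data and the memory response is the one returned to the processor core.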
FIG. 10 is a schematic flowchart of another data access method provided by an embodiment of this application. It differs from the embodiment shown in FIG. 9 in that the fourth-level cache does not store the data to be accessed. The method includes:
Steps 1001 to 1004: the same as steps 901 to 904.
Step 1005: When the data to be accessed is not found in the fourth-level cache (an L4 miss), the fourth-level cache notifies the third-level cache that the data was not found; the response may indicate a read failure.
Step 1006: The memory controller returns the data to be accessed, found in the memory, to the third-level cache.
Step 1007: The third-level cache returns the data to be accessed, obtained from the memory, to the processor core.
FIG. 11 is a schematic flowchart of a method for aging (evicting) cached data provided by an embodiment of this application. The method includes:
Step 1101: The processor core performs an evict operation on a first cache line in L2. Illustratively, the first cache line may be a cache line determined to require aging according to the prior art.
Step 1102: Determine whether the first cache line to be aged hits in the third-level cache. If so, update the first cache line in the third-level cache; if the third-level cache does not store the first cache line (a cache line miss), the first cache line needs to be written into the third-level cache. In this case, if the third-level cache has no free storage space, another cache line (a second cache line) needs to be deleted from the third-level cache in order to write the first cache line into it. Specifically, the cache line to be aged may be selected from the cache according to the LRU algorithm or another method.
Step 1103: The third-level cache performs an evict operation on the second cache line and sends the second cache line to the fourth-level cache.
Step 1104: Determine whether the second cache line to be aged hits in the fourth-level cache. If so, update the second cache line in the fourth-level cache; if the fourth-level cache does not store the second cache line (a cache line miss), the second cache line needs to be written into the fourth-level cache. In this case, if the fourth-level cache has no free storage space, another cache line (a third cache line) needs to be aged out of the fourth-level cache in order to write the second cache line into it. Specifically, the cache line to be aged may be selected from the cache according to the LRU algorithm or another method.
Step 1105: The fourth-level cache performs an evict operation on the third cache line and sends the third cache line to the memory.
Step 1106: The memory controller receives the third cache line and updates the data of the third cache line to the memory.
In a specific implementation, a DDR PHY interface exists between the fourth-level cache and the memory, and the fourth-level cache writes the third cache line to the memory through the DDR PHY interface.
In another specific implementation, there is no direct connection between the fourth-level cache and the memory; the fourth-level cache writes the third cache line to the third-level cache, and the third cache line is written to the memory through the third-level cache.
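The eviction cascade of steps 1101 through 1106 can be sketched as follows, using LRU as one of the replacement policies mentioned above. The capacities, helper names, and use of ordered dictionaries are illustrative assumptions, not details from this application:

```python
from collections import OrderedDict

def evict_into(level, addr, data, capacity):
    """Receive an evicted line at `level`; return the line it displaces, if any."""
    if addr in level:                 # hit: update in place (steps 1102/1104)
        level[addr] = data
        level.move_to_end(addr)
        return None
    displaced = None
    if len(level) >= capacity:        # no free space: age out the LRU line
        displaced = level.popitem(last=False)
    level[addr] = data                # write the incoming line
    return displaced

l3, l4, memory = OrderedDict(), OrderedDict(), {}

def evict_from_l2(addr, data, l3_cap=2, l4_cap=2):
    victim = evict_into(l3, addr, data, l3_cap)       # steps 1101/1102
    if victim:
        victim = evict_into(l4, *victim, l4_cap)      # steps 1103/1104
    if victim:
        memory[victim[0]] = victim[1]                 # steps 1105/1106
```

Each level absorbs the line evicted from the level above; only when every level along the way is full does a line finally reach memory, which matches the chain L2 to L3 to L4 to memory in FIG. 11.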
It should be noted that the processor core, upper-level cache, and lower-level cache included in the foregoing processor systems of the embodiments of this application may be connected in various structures.
In a possible implementation, the processor system provided by the embodiments of this application includes a processor core, an upper-level cache, and a lower-level cache, where the upper-level cache and the lower-level cache are located on different dies, and the tags of the lower-level cache are not stored in the upper-level cache.
In another possible implementation, the upper-level cache includes the first-level, second-level, and third-level caches, and the lower-level cache is the fourth-level cache. Correspondingly, an embodiment of this application further provides a processor system including a processor core, a first-level cache, a second-level cache, a third-level cache, and a fourth-level cache, where the processor core, the first-level cache, the second-level cache, and the third-level cache are located on a processor die, and the fourth-level cache and the third-level cache are located on different dies.
The third-level cache is configured to send a data read request to the fourth-level cache when none of the first-level cache, the second-level cache, and the third-level cache stores the data to be accessed requested by the processor core.
The third-level cache obtains the data to be accessed from the memory when the fourth-level cache does not store the data to be accessed.
The fourth-level cache and the third-level cache are on different dies in the same processor chip, and the fourth-level cache communicates with the third-level cache through an inter-die bus; or,
the fourth-level cache is located outside the processor chip where the third-level cache is located and communicates with the third-level cache through an inter-chip bus.
An embodiment of this application further provides a server, including a memory and the processor system of any of the foregoing embodiments.
It should be noted that, when the processor system is packaged, the lower-level cache may be co-packaged with the upper-level cache, or may not be co-packaged with the upper-level cache but instead connected to it through an inter-chip bus. The upper-level cache in the foregoing embodiments may be referred to as a first cache, and the lower-level cache may be referred to as a second cache.
An embodiment of this application further provides a processor chip, including a processor core, a first cache, and a second cache, where the first cache and the second cache are located on different dies in the processor chip,
the first cache is configured to send a data read request to the second cache, to obtain the data to be accessed requested by the processor core; and
the first cache obtains the data to be accessed from the memory when the second cache does not store the data to be accessed.
An embodiment of this application further provides a processor chip, including a processor core and a first cache, where
the first cache is configured to send a data read request to a second cache outside the processor chip, to obtain the data to be accessed requested by the processor core; and
the first cache obtains the data to be accessed from the memory when the second cache does not store the data to be accessed.
When the processor core misses the data to be accessed in the upper-level cache, the upper-level cache (for example, the third-level cache) can request the data to be accessed from the lower-level cache (for example, the fourth-level cache) without first determining whether the data would hit in the lower-level cache, so the tags of the lower-level cache do not need to be stored in the upper-level cache, which avoids wasting space in the upper-level cache. When the lower-level cache does not store the data to be accessed, the data is then requested from the memory.
It should be noted that the die in the foregoing embodiments of this application, also called a bare die or bare chip, is an unpackaged integrated circuit made of semiconductor material; the predetermined function of the integrated circuit is implemented on this small piece of semiconductor. Usually, integrated circuits are fabricated on a large semiconductor wafer through a number of steps such as photolithography, and the wafer is then divided into small square pieces, which are called bare dies. A die carries only the bonding pads used for packaging; after packaging, the resulting chip provides external pins.
In a concrete implementation, as an embodiment, the processor system may include multiple processors, and each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, and the like made within the principles of this application shall fall within the protection scope of this application.

Claims (27)

  1. A data access method, wherein the method is applied to a processor system, the processor system comprising a processor core, an upper-level cache, and a lower-level cache, the upper-level cache and the lower-level cache being located on different dies, and the method comprising:
    when the upper-level cache does not store data to be accessed, sending data read requests to the lower-level cache and to a memory respectively, to request the data to be accessed; and
    receiving the earlier-returned data to be accessed from either the lower-level cache or the memory.
  2. The method according to claim 1, wherein the die where the upper-level cache is located and the die where the lower-level cache is located are co-packaged to form a processor chip, and the upper-level cache communicates with the lower-level cache through an inter-die bus.
  3. The method according to claim 1, wherein the lower-level cache is outside the processor chip where the upper-level cache is located, and the upper-level cache communicates with the lower-level cache through an inter-chip bus.
  4. The method according to claim 2 or 3, wherein the upper-level cache communicates with the memory through a first port of a memory controller on the processor chip, and the lower-level cache communicates with the memory through a second port of the memory controller on the processor chip.
  5. The method according to claim 2 or 3, wherein the processor system further comprises an IO die, and
    the IO die connects the upper-level cache to the memory, and the lower-level cache to the memory, through a memory controller on the IO die.
  6. The method according to claim 4 or 5, wherein the lower-level cache communicates with the memory through a DDR bus that exists between the memory controller and the memory.
  7. The method according to claim 6, wherein the method comprises:
    updating, by the lower-level cache, aged cache data to the memory through the DDR bus; or
    obtaining, by the lower-level cache, data to be cached from the memory through the DDR bus.
  8. The method according to claim 7, wherein the updating, by the lower-level cache, aged cache data to the memory through the DDR bus comprises:
    updating, by the lower-level cache, the aged cache data to the memory through the memory controller.
  9. The method according to any one of claims 1 to 8, wherein the method comprises:
    sending, by the upper-level cache, an aged first cache line to the lower-level cache;
    if the first cache line hits in the lower-level cache, updating, by the lower-level cache, its local cache with the received first cache line; and
    if the first cache line misses in the lower-level cache, writing the first cache line into the lower-level cache.
  10. The method according to any one of claims 1 to 9, wherein
    the upper-level cache is a third-level cache L3 and the lower-level cache is a fourth-level cache L4; or,
    the upper-level cache is a second-level cache L2 and the lower-level cache is a third-level cache L3.
  11. A processor system, comprising a first die and a second die, wherein the first die comprises an upper-level cache and a memory controller, and the second die comprises a lower-level cache,
    the lower-level cache is configured to determine a cache line to be aged and send the cache line to the first die; and
    the first die is configured to send the cache line to a memory through a port of the memory controller.
  12. A processor system, comprising a processor core, an upper-level cache, and a lower-level cache, wherein the upper-level cache and the lower-level cache are located on different dies, and
    the upper-level cache is configured to, when it does not cache data to be accessed, send data read requests to the lower-level cache and to a memory respectively to request the data to be accessed, and receive the earlier-returned data to be accessed from either the lower-level cache or the memory.
  13. The system according to claim 12, wherein
    the upper-level cache is specifically configured to, when the data to be accessed returned by the lower-level cache is received first, discard the data read response subsequently returned by the memory; or
    the upper-level cache is specifically configured to, when the lower-level cache does not store the data to be accessed, obtain the data to be accessed returned by the memory.
  14. The system according to claim 12 or 13, wherein
    the die where the upper-level cache is located and the die where the lower-level cache is located are co-packaged to form a processor chip, and the upper-level cache communicates with the lower-level cache through an inter-die bus.
  15. The system according to claim 12 or 13, wherein
    the lower-level cache is outside the processor chip where the upper-level cache is located, and the upper-level cache communicates with the lower-level cache through an inter-chip bus.
  16. The system according to claim 14 or 15, wherein
    the upper-level cache communicates with the memory through a first port of a memory controller on the processor chip, and the lower-level cache communicates with the memory through a second port of the memory controller on the processor chip.
  17. The system according to claim 14 or 15, wherein
    the processor system further comprises an IO die, and the IO die connects the upper-level cache to the memory, and the lower-level cache to the memory, through a memory controller on the IO die.
  18. The system according to claim 16 or 17, wherein
    the lower-level cache communicates with the memory through a DDR bus that exists between the memory controller and the memory.
  19. The system according to claim 18, wherein
    the lower-level cache is further configured to update aged cache data to the memory through the DDR bus; or
    the lower-level cache is further configured to obtain data to be cached from the memory through the DDR bus.
  20. The system according to claim 19, wherein
    the lower-level cache is specifically configured to update the aged cache data to the memory through the memory controller.
  21. The system according to any one of claims 12 to 20, wherein
    the upper-level cache is further configured to send an aged first cache line to the lower-level cache; and
    the lower-level cache is further configured to update its local cache with the received first cache line when the first cache line hits locally, and to write the first cache line into the local cache when the first cache line misses in the local cache.
  22. A processor system, comprising a processor core, an upper-level cache, and a lower-level cache, wherein the upper-level cache and the lower-level cache are located on different dies, and the tags of the lower-level cache are not stored in the upper-level cache.
  23. A processor system, comprising a processor core, a first-level cache, a second-level cache, a third-level cache, and a fourth-level cache, wherein the processor core, the first-level cache, the second-level cache, and the third-level cache are located on a processor die, and the fourth-level cache and the third-level cache are located on different dies,
    the third-level cache is configured to send a data read request to the fourth-level cache when none of the first-level cache, the second-level cache, and the third-level cache stores data to be accessed requested by the processor core; and
    the third-level cache obtains the data to be accessed from a memory when the fourth-level cache does not store the data to be accessed.
  24. The system according to the preceding claim, wherein the fourth-level cache and the third-level cache are on different dies in the same processor chip, and the fourth-level cache communicates with the third-level cache through an inter-die bus; or,
    the fourth-level cache is located outside the processor chip where the third-level cache is located and communicates with the third-level cache through an inter-chip bus.
  25. A server, comprising a memory and the processor system according to any one of claims 12 to 24.
  26. A processor chip, comprising a processor core, a first cache, and a second cache, wherein the first cache and the second cache are located on different dies in the processor chip,
    the first cache is configured to send a data read request to the second cache, to obtain data to be accessed requested by the processor core; and
    the first cache obtains the data to be accessed from a memory when the second cache does not store the data to be accessed.
  27. A processor chip, comprising a processor core and a first cache, wherein
    the first cache is configured to send a data read request to a second cache outside the processor chip, to obtain data to be accessed requested by the processor core; and
    the first cache obtains the data to be accessed from a memory when the second cache does not store the data to be accessed.
PCT/CN2021/102603 2020-07-13 2021-06-28 Data access method and processor system WO2022012307A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010671444.3 2020-07-13
CN202010671444 2020-07-13
CN202011620284.6 2020-12-30
CN202011620284.6A CN113934364A (zh) 2020-07-13 2020-12-30 Data access method and processor system

Publications (1)

Publication Number Publication Date
WO2022012307A1 true WO2022012307A1 (zh) 2022-01-20

Family

ID=79274128

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102603 WO2022012307A1 (zh) 2020-07-13 2021-06-28 Data access method and processor system

Country Status (2)

Country Link
CN (1) CN113934364A (zh)
WO (1) WO2022012307A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040133748A1 (en) * 2003-01-07 2004-07-08 Jaehyung Yang Unbalanced inclusive tags
CN104679669A (zh) * 2014-11-27 2015-06-03 华为技术有限公司 高速缓存cache存储器系统及访问缓存行cache line的方法
CN111078592A (zh) * 2019-12-27 2020-04-28 无锡中感微电子股份有限公司 一种低功耗片上系统的多级指令缓存


Also Published As

Publication number Publication date
CN113934364A (zh) 2022-01-14

Similar Documents

Publication Publication Date Title
US20230418759A1 (en) Slot/sub-slot prefetch architecture for multiple memory requestors
TWI545435B (zh) 於階層式快取處理器中之協調預取
US7861055B2 (en) Method and system for on-chip configurable data ram for fast memory and pseudo associative caches
JP6859361B2 (ja) 中央処理ユニット(cpu)ベースシステムにおいて複数のラストレベルキャッシュ(llc)ラインを使用してメモリ帯域幅圧縮を行うこと
WO2022178998A1 (zh) 一种基于SEDRAM的堆叠式Cache系统、控制方法和Cache装置
JP2010532517A (ja) 連想度を設定可能なキャッシュメモリ
US20130046934A1 (en) System caching using heterogenous memories
US20080086599A1 (en) Method to retain critical data in a cache in order to increase application performance
US8996815B2 (en) Cache memory controller
TWI393050B (zh) 促進多重處理器介面之板內建快取記憶體系統之記憶體裝置及方法及使用其之電腦系統
US20150363314A1 (en) System and Method for Concurrently Checking Availability of Data in Extending Memories
JP2014517387A (ja) 大型データキャッシュのための効率的なタグストレージ
US6934811B2 (en) Microprocessor having a low-power cache memory
US9058283B2 (en) Cache arrangement
JP7108141B2 (ja) データ領域を記憶するためのキャッシュ
JP2001043130A (ja) コンピュータシステム
WO2024066195A1 (zh) 缓存管理方法及装置、缓存装置、电子装置和介质
US20090006777A1 (en) Apparatus for reducing cache latency while preserving cache bandwidth in a cache subsystem of a processor
US20130191587A1 (en) Memory control device, control method, and information processing apparatus
JP5976225B2 (ja) スティッキー抜去エンジンを伴うシステムキャッシュ
US7882309B2 (en) Method and apparatus for handling excess data during memory access
WO2022012307A1 (zh) 数据访问的方法和处理器系统
JP2020531950A (ja) サービスレベル合意に基づいたキャッシング用の方法及びシステム
US11755477B2 (en) Cache allocation policy
US20240054072A1 (en) Metadata-caching integrated circuit device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21842487

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21842487

Country of ref document: EP

Kind code of ref document: A1