WO2019085649A1 - Cache access method, multi-level cache system, and computer system - Google Patents

Cache access method, multi-level cache system, and computer system

Info

Publication number
WO2019085649A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
dca
instruction
target
cluster
Prior art date
Application number
PCT/CN2018/105010
Other languages
English (en)
French (fr)
Inventor
陈俊锐
余谓为
崔鲁平
李琪
熊礼文
徐志通
李又麟
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2019085649A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 - Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • G06F 12/084 - Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F 12/0893 - Caches characterised by their organisation or structure
    • G06F 12/0897 - Caches characterised by their organisation or structure with two or more cache hierarchy levels

Definitions

  • the present application relates to the field of communications technologies, and in particular, to a cache access method, a multi-level cache system, and a computer system.
  • In a computer system, the cache (Cache) is a small-capacity memory between the central processing unit (CPU) and main memory (Memory). Its access speed is faster than that of Memory and close to that of the CPU, so it can supply instructions and data to the CPU at high speed and improve the execution speed of programs.
  • Cache Stashing is a technology provided by ARM to mitigate the latency of Memory access. In a multi-core processor system, multiple CPU cores are integrated into a cluster. When a CPU core of a source cluster needs to push data to a CPU core of a target cluster, Cache Stashing must first complete a data consistency (Snoop Stash) operation, after which the Level 1 Cache of the target cluster's CPU core initiates a Prefetch operation to retrieve the data from the CPU core of the source cluster.
  • The present application provides a cache access method, a multi-level cache system, and a computer system, which are used to reduce the operation steps when the source cluster reads data from or writes data to the target cluster, thereby reducing the latency and improving the CPU performance of the system.
  • A first aspect of the present application provides a cache access method, applied to a multi-level cache system, where the multi-level cache system includes a shared cache and at least two clusters, each cluster having at least one level 1 cache and a level 2 cache. The method includes:
  • the source L2 cache obtains a direct cache access (DCA) instruction, where the source L2 cache is the level 2 cache of the source cluster and the DCA instruction includes the cache identifier of the target level 1 cache in the target cluster;
  • the source L2 cache sends the DCA instruction to the shared cache;
  • the shared cache generates a DCA operation instruction according to the DCA instruction;
  • the shared cache sends the DCA operation instruction to the target level 1 cache through a target level 2 cache in the target cluster, so that the target level 1 cache writes the data or writes the data to the source cluster.
  • In this scheme, the source L2 cache acquires a DCA instruction carrying the cache identifier of the target L1 cache and sends it to the shared cache; the shared cache generates a DCA operation instruction according to the DCA instruction and then sends the DCA operation instruction to the target level 1 cache through the target level 2 cache in the target cluster, so that the target level 1 cache writes data or writes the data to the source cluster. The DCA technique therefore needs fewer steps than the existing Cache Stashing technology. For example, to push the data of the HAC in the source cluster into the target level 1 cache in the target cluster, the Cache Stashing technology requires the target level 1 cache to send a Prefetch to the shared cache, after which the shared cache carries the HAC data in a Fetch response fed back to the target level 1 cache; with DCA, the shared cache carries the HAC data in the DCA operation instruction itself, so when the shared cache sends the DCA operation instruction to the target level 1 cache through the target cluster's L2 Cache, the HAC data is already pushed into the target level 1 cache.
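  • For illustration, the write-direction flow above can be sketched in a few lines of C. This is a hedged sketch only: the type names, field names, and 64-byte line size are assumptions for illustration, not an encoding specified by the present application.

```c
/*
 * A minimal sketch of the write-direction DCA flow described above.
 * All type and field names (dca_instr_t, dca_fill_t, CACHE_LINE_BYTES, ...)
 * are illustrative assumptions, not the patent's actual encoding.
 */
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_BYTES 64

typedef enum { DCA_WRITE_THROUGH, DCA_WRITE_BACK, DCA_READ } dca_type_t;

typedef struct {
    dca_type_t type;                       /* read or write variant             */
    uint16_t   target_l1_id;               /* cache identifier of the target L1 */
    uint8_t    payload[CACHE_LINE_BYTES];  /* HAC data for the write variants   */
} dca_instr_t;

typedef struct {
    uint16_t target_l1_id;
    uint8_t  payload[CACHE_LINE_BYTES];
} dca_fill_t;

/*
 * Shared (L3) cache side: for a write-type DCA instruction, build the DCA
 * fill operation that carries the HAC data straight to the target L1,
 * so no Prefetch/Fetch-response round trip is needed.
 */
static dca_fill_t make_dca_fill(const dca_instr_t *in)
{
    dca_fill_t fill;
    fill.target_l1_id = in->target_l1_id;
    memcpy(fill.payload, in->payload, CACHE_LINE_BYTES);
    return fill;
}
```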
  • Before the source L2 cache sends the DCA instruction to the shared cache, the method further includes:
  • the source L2 cache sends a first probe instruction to the source L1 cache in the source cluster, so that the source L1 cache feeds back a first probe response, where the first probe instruction is used to perform the data consistency operation of the source cluster;
  • the source L2 cache obtains the first probe response fed back by the source L1 cache, and determines, according to the first probe response, that the source cluster has data consistency.
  • Data consistency can either be pre-guaranteed by the multi-level cache system, or be determined in the same way as in the Cache Stashing technology. In the latter case, after obtaining the DCA instruction, the source L2 cache initiates a first probe instruction to each level 1 cache in the source cluster according to the data consistency mechanism; each level 1 cache in the source cluster completes its data consistency operation according to the first probe instruction and feeds back a first probe response to the source L2 cache. After obtaining the first probe responses of all the level 1 caches, the source L2 cache can determine, according to those responses, that the source cluster has data consistency.
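  • As a rough illustration of the source-side consistency check, the following C sketch uses assumed interfaces (send_first_probe is a stand-in for the hardware probe transaction) and simply aggregates the first probe responses:

```c
/*
 * A sketch of the source-side consistency check described above, with
 * assumed interfaces: send_first_probe() stands in for whatever hardware
 * mechanism delivers a probe (Snoop) to one L1 cache and waits for its
 * first probe response. The stub below simply reports success.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool consistency_done; } first_probe_response_t;

/* Placeholder for the real probe transaction toward L1 cache 'l1_index'. */
static first_probe_response_t send_first_probe(size_t l1_index)
{
    (void)l1_index;
    first_probe_response_t rsp = { .consistency_done = true };
    return rsp;
}

/*
 * Source L2 cache: probe every L1 in the source cluster and forward the
 * DCA instruction to the shared cache only once every response confirms
 * the data consistency operation is complete.
 */
static bool source_cluster_has_consistency(size_t num_l1_caches)
{
    for (size_t i = 0; i < num_l1_caches; i++) {
        if (!send_first_probe(i).consistency_done)
            return false;   /* consistency not yet established */
    }
    return true;            /* safe to send the DCA instruction onward */
}
```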
  • Before the shared cache generates the DCA operation instruction according to the DCA instruction, the method further includes:
  • the shared cache sends a DCA probe instruction to a target L2 cache in the target cluster;
  • the target L2 cache sends a second probe instruction to all the L1 caches in the target cluster according to the DCA probe instruction, so that all the L1 caches in the target cluster feed back a second probe response, where the second probe instruction is used to perform the data consistency operation of the target cluster;
  • the target L2 cache receives the second probe responses fed back by all the level 1 caches in the target cluster, and feeds the second probe responses back to the shared cache;
  • the shared cache receives the second probe responses fed back by the target L2 cache, and determines, according to the second probe responses, that the target cluster has data consistency.
  • Again, data consistency can either be pre-guaranteed by the multi-level cache system or be determined as in the Cache Stashing technology. In the latter case, after the shared cache receives the DCA instruction, it must first confirm the data consistency of the target cluster according to the data consistency mechanism. It therefore sends the DCA probe instruction to the target L2 cache; after receiving the DCA probe instruction, the target L2 cache sends the second probe instruction to all the L1 caches in the target cluster so that they feed back second probe responses. The target L2 cache receives these second probe responses and feeds them back to the shared cache; once the shared cache has received the second probe responses of all the L1 caches, it determines that the target cluster has data consistency.
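  • The target-side check can be pictured the same way; in the following sketch, the probe fan-out through the target L2 cache and the aggregation of second probe responses are modeled with assumed, illustrative function names:

```c
/*
 * A sketch of the target-side check described above, under assumed
 * interfaces: the shared (L3) cache sends one DCA probe to the target L2,
 * which fans it out as second probe instructions to every L1 in the target
 * cluster and aggregates their responses before replying.
 */
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the second-probe transaction toward one target L1. */
static bool second_probe_l1(size_t l1_index)
{
    (void)l1_index;
    return true;   /* assume each L1 completes its consistency operation */
}

/* Target L2 cache: fan out the DCA probe and aggregate the responses. */
static bool target_l2_handle_dca_probe(size_t num_l1_caches)
{
    bool all_ok = true;
    for (size_t i = 0; i < num_l1_caches; i++) {
        if (!second_probe_l1(i))
            all_ok = false;
    }
    return all_ok;          /* forwarded to the shared cache as the reply */
}

/* Shared (L3) cache: the target cluster is consistent if the aggregated
 * second probe response from the target L2 reports success. */
static bool target_cluster_has_consistency(size_t num_l1_caches)
{
    return target_l2_handle_dca_probe(num_l1_caches);
}
```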
  • When the source cluster needs to write data into the target cluster, the DCA instruction is a DCA write-through instruction, the DCA write-through instruction further includes the data, and the DCA operation instruction is a DCA fill instruction;
  • in this case, the shared cache generating a DCA operation instruction according to the DCA instruction includes: the shared cache generates a DCA fill instruction, where the DCA fill instruction includes the data and is used to directly write the data into the target level 1 cache.
  • The working modes of a Cache include a write-through Cache mode and a write-back Cache mode. In the write-through mode, when the CPU writes data to Memory, in addition to updating the data in the Cache, the data is also written to DRAM. In the write-back mode, whenever the CPU writes data to Memory, it first updates only the data in the Cache, and the Cache writes the data back to DRAM later, when the bus is not busy.
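  • As a minimal software analogy of the two write policies (the real behavior lives in the cache controller; cache_line_t and dram_write are invented stand-ins):

```c
/*
 * A sketch of the two write policies described above. cache_line_t and
 * dram_write() are invented stand-ins; real hardware does this in the
 * cache controller, not in software.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t data;
    bool     dirty;   /* only meaningful for the write-back policy */
} cache_line_t;

/* Placeholder for the (slow) DRAM write path. */
static void dram_write(uint64_t addr, uint64_t data) { (void)addr; (void)data; }

/* Write-through: every store updates the cache line and DRAM together. */
static void store_write_through(cache_line_t *line, uint64_t addr, uint64_t data)
{
    line->data = data;
    dram_write(addr, data);
}

/* Write-back: a store updates only the cache line now and marks it dirty;
 * DRAM is updated later, when the line is written back (e.g. when the bus
 * is idle or the line is evicted). */
static void store_write_back(cache_line_t *line, uint64_t data)
{
    line->data  = data;
    line->dirty = true;
}
```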
  • If the multi-level cache system pre-guarantees data consistency, the write-back Cache mode is used and the DCA instruction is specifically a DCA write-back instruction; if data consistency is determined as in the Cache Stashing technology, the write-through Cache mode is used and the DCA instruction is specifically a DCA write-through instruction.
  • Therefore, when the source cluster needs to write data to the target cluster, the DCA instruction can be either a DCA write-through instruction or a DCA write-back instruction, with the data of the source cluster carried in the DCA instruction. When the DCA instruction is a DCA write-through instruction, the shared cache determines the target level 1 cache according to the cache identifier in the DCA write-through instruction, generates a DCA fill instruction containing the data of the source cluster, and then sends the DCA fill instruction to the target level 1 cache, so that the data is written directly into the target cluster.
  • Alternatively, the DCA instruction is a DCA write-back instruction, the DCA write-back instruction further includes the data, and the DCA operation instruction is a DCA fill instruction; in this case, the shared cache generating a DCA operation instruction according to the DCA instruction includes: the shared cache generates a DCA fill instruction, where the DCA fill instruction includes the data and is used to directly write the data into the target level 1 cache.
  • When the source cluster needs to write data to the target cluster and the DCA instruction is a DCA write-back instruction, the shared cache determines the target level 1 cache according to the cache identifier in the DCA write-back instruction and generates a DCA fill instruction containing the data of the source cluster; the DCA fill instruction is then sent to the target level 1 cache, so that the data is written directly into the target cluster.
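  • The correspondence between the Cache working mode and the DCA instruction variant can be summarized in a small sketch; the enum and function names below are assumptions, and in either case the shared cache ultimately produces a DCA fill instruction:

```c
/*
 * A sketch of the variant selection described above; names are assumptions.
 * When consistency is pre-guaranteed (write-back Cache mode) the write uses
 * a DCA write-back instruction; when consistency is established per request
 * (write-through Cache mode) it uses a DCA write-through instruction.
 * Either way, the shared cache answers with a DCA fill instruction.
 */
typedef enum { CACHE_MODE_WRITE_THROUGH, CACHE_MODE_WRITE_BACK } cache_mode_t;
typedef enum { DCA_INSTR_WRITE_THROUGH, DCA_INSTR_WRITE_BACK } dca_write_variant_t;

static dca_write_variant_t pick_dca_write_variant(cache_mode_t mode)
{
    return (mode == CACHE_MODE_WRITE_BACK) ? DCA_INSTR_WRITE_BACK
                                           : DCA_INSTR_WRITE_THROUGH;
}
```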
  • When the source cluster needs to read data from the target cluster, the DCA instruction is a DCA read instruction and the DCA operation instruction is a probe writeback instruction; in this case, the shared cache generating a DCA operation instruction according to the DCA instruction includes: the shared cache generates a probe writeback instruction, where the probe writeback instruction is used to instruct the target L2 cache to read the data from the target L1 cache.
  • Specifically, the shared cache determines the target level 1 cache according to the cache identifier in the DCA read instruction and then generates the probe writeback instruction, which instructs the target L2 cache to read the data from the target L1 cache.
  • After the shared cache sends the DCA operation instruction to the target level 1 cache through the target L2 cache in the target cluster, the method further includes:
  • the target level 1 cache receives the probe write back instruction
  • the target level 1 cache feeds a third probe response to the target level 2 cache according to the probe writeback instruction, where the third probe response includes the data;
  • the target secondary cache forwards the third probe response to the shared cache
  • the shared cache generates a DCA read response according to the third probe response, the DCA read response including the data;
  • the shared cache sends the DCA read response to the source L2 cache, such that the source L2 cache obtains the data of the target L1 cache according to the DCA read response.
  • In other words, after the target level 1 cache receives the probe writeback instruction, it feeds back a third probe response to the target level 2 cache according to that instruction, with the data that the source cluster needs to read included in the third probe response. The target L2 cache forwards the third probe response to the shared cache, the shared cache generates a DCA read response that includes the data according to the third probe response and sends the DCA read response to the source L2 cache, and after obtaining the DCA read response, the source L2 cache obtains from it the data of the target cluster that the source cluster needs to read.
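  • A hedged sketch of the read path, with assumed message types: the shared cache derives the probe writeback from the DCA read and later repackages the third probe response into the DCA read response for the source L2 cache:

```c
/*
 * A sketch of the read path above under assumed message types: the shared
 * (L3) cache turns a DCA read into a probe writeback aimed at the target L1
 * and later repackages the third probe response (forwarded by the target L2)
 * as the DCA read response for the source L2. All names are illustrative.
 */
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_BYTES 64

typedef struct { uint16_t target_l1_id; } probe_writeback_t;
typedef struct { uint8_t payload[CACHE_LINE_BYTES]; } third_probe_response_t;
typedef struct { uint8_t payload[CACHE_LINE_BYTES]; } dca_read_response_t;

/* Step 1: shared cache builds the probe writeback from the DCA read's
 * target cache identifier. */
static probe_writeback_t make_probe_writeback(uint16_t target_l1_id)
{
    probe_writeback_t pwb = { .target_l1_id = target_l1_id };
    return pwb;
}

/* Step 2: shared cache wraps the returned data into the DCA read response
 * that is sent back to the source L2 cache. */
static dca_read_response_t make_dca_read_response(const third_probe_response_t *rsp)
{
    dca_read_response_t out;
    memcpy(out.payload, rsp->payload, CACHE_LINE_BYTES);
    return out;
}
```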
  • A second aspect of the present application provides a multi-level cache system, including:
  • the source L2 cache is configured to obtain a direct cache access (DCA) instruction when the source cluster needs to read or write data in the target cluster, where the source L2 cache is an L2 cache in the source cluster;
  • the DCA instruction includes a cache identifier of a target level 1 cache in the target cluster;
  • the source L2 cache is further configured to send the DCA instruction to the shared cache.
  • the shared cache is configured to generate a DCA operation instruction according to the DCA instruction
  • the shared cache is further configured to send the DCA operation instruction to the target L1 cache through a target L2 cache in the target cluster, so that the target L1 cache writes the data or writes the data to the source cluster.
  • the source L2 cache is further configured to send a first probe instruction to the source L1 cache in the source cluster, so that the source L1 cache feeds back a first probe response, where the first probe instruction is used to perform the data consistency operation of the source cluster;
  • the source L2 cache is further configured to receive the first probe response fed back by the source L1 cache, and determine, according to the first probe response, that the source cluster has data consistency.
  • the shared cache is further configured to send a DCA probe instruction to a target secondary cache in the target cluster;
  • the shared cache is further configured to receive the second probe response fed back by the target L2 cache, and determine, according to the second probe response, that the target cluster has data consistency.
  • When the source cluster needs to write data to the target cluster, the DCA instruction is a DCA write-through instruction, the DCA write-through instruction further includes the data, and the DCA operation instruction is a DCA fill instruction;
  • the shared cache is further configured to determine the target level 1 cache according to the cache identifier in the DCA write-through instruction, and to acquire the data in the DCA write-through instruction;
  • the shared cache is further configured to generate a DCA fill instruction, where the DCA fill instruction includes the data and is used to directly write the data into the target level 1 cache.
  • If the multi-level cache system pre-guarantees data consistency, the write-back Cache mode is used and the DCA instruction is specifically a DCA write-back instruction; if data consistency is determined as in the Cache Stashing technology, the write-through Cache mode is used and the DCA instruction is specifically a DCA write-through instruction. Therefore, when the source cluster needs to write data to the target cluster, the DCA instruction can be either a DCA write-through instruction or a DCA write-back instruction, with the data of the source cluster carried in the DCA instruction.
  • When the DCA instruction is a DCA write-through instruction, the shared cache determines the target level 1 cache according to the cache identifier in the DCA write-through instruction and generates a DCA fill instruction containing the data of the source cluster; it then sends the DCA fill instruction to the target level 1 cache, so that the data is written directly into the target cluster.
  • Similarly, when the DCA instruction is a DCA write-back instruction, the shared cache is further configured to determine the target level 1 cache according to the cache identifier in the DCA write-back instruction, and to acquire the data in the DCA write-back instruction;
  • the shared cache is further configured to generate a DCA fill instruction, the DCA fill instruction includes the data, and the DCA fill instruction is used to directly write the data into the target level 1 cache.
  • When the source cluster needs to read data from the target cluster, the DCA instruction is a DCA read instruction and the DCA operation instruction is a probe writeback instruction;
  • the shared cache is further configured to determine the target level 1 cache according to the cache identifier in the DCA read instruction;
  • the shared cache is further configured to generate a probe writeback instruction, where the probe writeback instruction is used to instruct the target L2 cache to read the data from the target L1 cache.
  • the target level 1 cache is further configured to feed back a third probe response to the target level 2 cache according to the probe writeback instruction, where the third probe response includes the data;
  • the target secondary cache is further configured to forward the third probe response to the shared cache
  • A third aspect of the present application provides a computer system, including an external memory and a multi-level cache system connected by a bus. The multi-level cache system includes a shared cache and at least two clusters, each cluster having at least one level 1 cache and a level 2 cache.
  • In this computer system, the source L2 cache acquires a DCA instruction carrying the cache identifier of the target L1 cache and sends the DCA instruction to the shared cache; the shared cache generates a DCA operation instruction according to the DCA instruction and then sends the DCA operation instruction to the target level 1 cache through the target level 2 cache in the target cluster, so that the target level 1 cache writes data or writes the data to the source cluster. Because of the DCA technique, fewer steps are needed than in the existing Cache Stashing technology, which reduces the latency and improves the CPU performance of the system.
  • FIG. 1 is an architectural diagram of a multi-level cache system provided by the present application
  • FIG. 3 is a schematic flowchart of an embodiment of a multi-level caching method provided by the present application.
  • FIG. 5 is a schematic diagram of signaling of an embodiment of another multi-level caching method provided by the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of a multi-level cache system provided by the present application.
  • The present application provides a cache access method, a multi-level cache system, and a computer system, which are used to reduce the operation steps when the source cluster reads data from or writes data to the target cluster, thereby reducing the latency and improving the CPU performance of the system.
  • The local Cache is a small-capacity memory whose access speed is faster than that of main memory and close to that of the CPU, so it can improve the performance of CPU data access.
  • The Cache Stashing technology is based on a multi-level cache system; Figure 1 shows the architecture diagram of the multi-level cache system.
  • Take as an example the case in which data in cluster 1 is pushed to the CPU core 2 in cluster 2: cluster1 serves as the source cluster, cluster2 serves as the target cluster, and the L1 Cache corresponding to the CPU core 2 in cluster2 serves as the target L1 Cache.
  • When the HAC in the source cluster needs to push data to the L1 Cache of the CPU core 2 of the target cluster, the Cache Stashing flow is as follows:
  • the HAC in the source cluster initiates a push operation instruction (that is, a Snoop Stash) to the L2 Cache of the source cluster, and the identifier of the target L1 Cache is included in the Snoop Stash;
  • the L2 Cache of the source cluster initiates a data consistency operation instruction (that is, a Snoop) to each L1 Cache in the source cluster according to the data consistency mechanism. The data consistency mechanism mainly handles shared data to ensure that the shared data seen by each CPU core is correct and consistent; it must be implemented by the controllers of all levels of Cache, that is, by the Snoop mechanism of ARM, so that the object that initiates the Snoop has absolute authority to modify the data without causing consistency problems. Therefore, the source cluster's L2 Cache initiates the Snoop to ensure the data consistency of the source cluster;
  • each L1 Cache of the source cluster feeds back a data consistency operation response (that is, a Snoop Response) to the L2 Cache of the source cluster, and the L2 Cache completes the data consistency operation of the source cluster according to the Snoop Responses received from each L1 Cache;
  • after receiving the Snoop for Stash, the L2 Cache of the target cluster sends a target instruction to the target L1 Cache according to the address information of the target L1 Cache; the target instruction includes a Snoop and a prefetch trigger instruction, and the prefetch trigger instruction is used to trigger the target L1 Cache to send a prefetch command. Therefore, after receiving the target instruction, the target L1 Cache generates a prefetch instruction (Prefetch), sends it to the L2 Cache, and feeds back a Snoop response to the L2 Cache;
  • the L2 Cache of the target cluster sends a Snoop to the other L1 Caches of the target cluster, and each L1 Cache that receives the Snoop feeds back a Snoop response to the L2 Cache;
  • the L2 Cache of the target cluster collects the Snoop responses from the target L1 Cache and from the other L1 Caches;
  • the L2 Cache of the target cluster forwards the Prefetch sent by the target L1 Cache to the L3 Cache;
  • the L3 Cache feeds back a prefetch response (that is, a Fetch response) to the L2 Cache of the target cluster, where the Fetch response includes the data;
  • the L2 Cache of the target cluster forwards the Fetch response to the target L1 Cache, thereby pushing the HAC data into the target L1 Cache.
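  • For reference, the baseline Cache Stashing sequence above can be summarized hop by hop; the wording below is an illustrative paraphrase, not the patent's:

```c
/*
 * An assumed hop-by-hop summary of the baseline Cache Stashing sequence
 * described above (wording is illustrative). The point of contrast: the
 * DCA flow of this application removes the Prefetch / Fetch-response
 * round trip at the end.
 */
static const char *const cache_stashing_steps[] = {
    "HAC -> source L2                  : Snoop Stash (with target L1 identifier)",
    "source L2 -> each source L1       : Snoop (data consistency)",
    "each source L1 -> source L2       : Snoop response",
    "target L2 (on Snoop for Stash)    : target instruction to target L1 (Snoop + prefetch trigger)",
    "target L1 -> target L2            : Snoop response + Prefetch",
    "target L2 -> other target L1s     : Snoop; collect Snoop responses",
    "target L2 -> L3 Cache             : Prefetch (forwarded)",
    "L3 Cache -> target L2 -> target L1: Fetch response carrying the data",
};
```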
  • an embodiment of the present application provides a cache access method, including:
  • The multi-level cache system shown in FIG. 1 is taken as an example: cluster1 is the source cluster, cluster2 is the target cluster, the L1 Cache corresponding to CPU core 1 is used as the target L1 Cache, the L2 Cache in cluster1 is the source L2 cache (source L2 Cache), the L2 Cache in cluster2 is the target L2 cache (target L2 Cache), and the L3 Cache serves as the shared cache.
  • When CPU core 1 in cluster1 needs to read data from the CPU core 2 in cluster2, or to push data to the CPU core 2 in cluster2, the sender of the DCA instruction is CPU core 1 of cluster1, and CPU core 1 forwards the DCA instruction to the L2 Cache of cluster1 through its corresponding L1 Cache.
  • the source L2 cache sends the DCA instruction to the shared cache.
  • Since the DCA instruction received by the source L2 Cache includes the cache identifier of the target L1 Cache, the source L2 Cache can determine that the target L1 Cache is in cluster2; because the shared cache between the source cluster and the target cluster is the L3 Cache, the source L2 Cache sends the DCA instruction to the L3 Cache.
  • the shared cache generates a DCA operation instruction according to the DCA instruction.
  • After receiving the DCA instruction sent by the source L2 Cache, the L3 Cache generates a DCA operation instruction according to the DCA instruction; the DCA operation instruction may cause the target L1 Cache to write the data of the HAC in the source cluster, or cause the target L1 Cache to write its data to the source cluster's HAC.
  • The specific form of the DCA operation instruction is determined by the HAC of the source cluster that generates the DCA instruction: if the HAC needs to read data from the CPU core 2 in cluster2, the DCA instruction is a read-type instruction; if the HAC needs to push data to the CPU core 2 in cluster2, the DCA instruction is a write-type instruction.
  • the shared cache sends the DCA operation instruction to the target level 1 cache through the target level 2 cache in the target cluster.
  • Data consistency may be pre-guaranteed by the multi-level cache system, or it may be determined in the manner used in the Cache Stashing technology shown in FIG. 2 above. The following embodiment describes how data consistency is determined.
  • the source L2 cache obtains a DCA instruction.
  • the source L2 cache sends a first probe instruction to the source L1 cache in the source cluster.
  • After receiving the DCA instruction, the source L2 Cache initiates a first probe instruction (that is, a Snoop) to each L1 Cache in the source cluster according to the data consistency mechanism; the Snoop is used to perform the data consistency operation of the source cluster. After completing the data consistency operation according to the Snoop, each source L1 Cache in the source cluster feeds back a first probe response (that is, a Snoop response) to the source L2 Cache.
  • the source L2 cache obtains a first probe response of the source L1 cache feedback, and determines, according to the first probe response, that the source cluster has data consistency.
  • the source L2 Cache receives the Snoop response fed back by each source L1 Cache in the source cluster, and according to the Snoop response, it can be determined that the data consistency operation of the source cluster is completed, and the source cluster has data consistency.
  • Since the DCA instruction received by the source L2 Cache includes the cache identifier of the target L1 Cache, the source L2 Cache can determine that the target L1 Cache is in the target cluster; because the shared cache between the source cluster and the target cluster is the L3 Cache, the source L2 Cache sends the DCA instruction to the L3 Cache.
  • the target secondary cache sends a second probe instruction to all level 1 caches in the target cluster according to the DCA probe instruction.
  • the target L2 Cache receives the Snoop response fed back by all L1 Caches in the target cluster, and feeds back the Snoop response to the L3 Cache.
  • the shared cache receives the second probe response fed back by the target L2 cache, and determines, according to the second probe response, that the target cluster has data consistency.
  • the L3 Cache receives the Snoop response of all L1 Caches fed back by the target L2 Cache, and according to the Snoop response, can determine that the data consistency operation of the target cluster is completed, and the target cluster has data consistency.
  • the shared cache generates a DCA operation instruction according to the DCA instruction.
  • After receiving the DCA instruction sent by the source L2 Cache, the L3 Cache generates a DCA operation instruction according to the DCA instruction.
  • The specific form of the DCA operation instruction is determined by the HAC of the source cluster that generates the DCA instruction; for example, if the source cluster's HAC needs to push data to the target L1 Cache in the target cluster, the DCA instruction is a write-type instruction, the HAC data is carried in the DCA instruction, and the L3 Cache carries that data when generating the DCA operation instruction.
  • the shared cache sends the DCA operation instruction to the target level 1 cache through the target level 2 cache in the target cluster.
  • The L3 Cache can determine the target L1 Cache according to the cache identifier of the target L1 Cache carried in the DCA instruction, and thereby determine the target L2 Cache. The L3 Cache then sends the DCA operation instruction generated in step 409 to the target L1 Cache through the target L2 Cache in the target cluster, so that the target L1 Cache can obtain the data of the HAC according to the DCA operation instruction, or send the data to be read by the HAC to the L3 Cache according to the DCA operation instruction, after which the L3 Cache sends that data to the HAC.
  • The above describes in detail the cache access method when data consistency needs to be determined. Compared with the Cache Stashing technology shown in FIG. 2, this embodiment of the present application does not need to perform step 206 and step 208; therefore, even when data consistency must be determined, this embodiment can further reduce the latency compared with the Cache Stashing technology.
  • The above embodiments do not take the working mode of the Cache into consideration.
  • The working modes of a Cache include a write-through Cache mode and a write-back Cache mode. In the write-through mode, when the CPU writes data to Memory, in addition to updating the data in the Cache, the data is also written to dynamic random access memory (DRAM). In the write-back mode, whenever the CPU writes data to Memory, it first updates only the data in the Cache and lets the Cache write the data back to DRAM when the bus is not busy.
  • If the multi-level cache system pre-guarantees data consistency, the write-back Cache mode is naturally used, and the DCA instruction is specifically a DCA write-back instruction; if data consistency is determined according to the method in the Cache Stashing technology shown in FIG. 2, the write-through Cache mode is used, and the DCA instruction is specifically a DCA write-through instruction.
  • the DCA instruction is a DCA write through instruction (DCA write through), and the DCA operation instruction is a DCA fill instruction (ie, DCA fill);
  • an embodiment of the present application provides a cache access method, including:
  • the source secondary cache obtains a DCA write through
  • the multi-level cache system shown in FIG. 1 is taken as an example.
  • The HAC in cluster1 needs to write data to the CPU core 2 in cluster2; cluster1 is used as the source cluster, cluster2 as the target cluster, the L1 Cache corresponding to CPU core 1 as the target L1 Cache, the L2 Cache in cluster1 as the source L2 cache (source L2 Cache), the L2 Cache in cluster2 as the target L2 cache (target L2 Cache), and the L3 Cache as the shared cache.
  • The HAC in the source cluster initiates a DCA write through (that is, a DCA write-through instruction) carrying the cache identifier of the target L1 Cache to the source L2 Cache; the DCA write through contains the data that the HAC needs to write to the target L1 Cache, and the source L2 Cache receives the DCA write through sent by the HAC.
  • the source L2 cache sends a Snoop to the source L1 cache in the source cluster.
  • After receiving the DCA write through, the source L2 Cache sends a Snoop to each source L1 Cache in the source cluster according to the data consistency mechanism; after completing the data consistency operation according to the Snoop, each source L1 Cache feeds back a Snoop response to the source L2 Cache.
  • the source L2 cache obtains a Snoop response of the source L1 cache, and determines, according to the Snoop response, the source cluster has data consistency.
  • the source L2 Cache receives the Snoop response fed back by each source L1 Cache in the source cluster, and according to the Snoop response, it can be determined that the data consistency operation of the source cluster is completed, and the source cluster has data consistency.
  • the source L2 cache sends the DCA write through to the shared cache.
  • Since the DCA write through received by the source L2 Cache includes the cache identifier of the target L1 Cache, the source L2 Cache can determine that the target L1 Cache is in the target cluster; because the shared cache between the source cluster and the target cluster is the L3 Cache, the source L2 Cache sends the DCA write through to the L3 Cache.
  • The shared cache sends a DCA write-through probe instruction to the target L2 cache in the target cluster; that is, a Snoop for DCA write through is sent to the target L2 Cache so that the target L2 Cache performs the data consistency operation of the target cluster.
  • the target secondary cache sends a Snoop to all the primary caches in the target cluster according to the Snoop for DCA write through;
  • After receiving the Snoop for DCA write through, the target L2 Cache sends a Snoop to all L1 Caches in the target cluster; after completing the data consistency operation according to the Snoop, all the L1 Caches in the target cluster feed back Snoop responses to the target L2 Cache.
  • the target L2 cache receives the Snoop response of all the L1 caches in the target cluster, and feeds the Snoop response to the shared cache.
  • the shared cache receives the Snoop response of the target secondary cache feedback, and determines that the target cluster has data consistency according to the Snoop response.
  • the L3 Cache receives the Snoop response of all L1 Caches fed back by the target L2 Cache, and according to the Snoop response, can determine that the data consistency operation of the target cluster is completed, and the target cluster has data consistency.
  • After receiving the DCA write through sent by the source L2 Cache, the L3 Cache generates a DCA fill according to the DCA write through; after obtaining the data of the HAC from the DCA write through, it carries the data of the HAC in the DCA fill.
  • the shared cache sends the DCA fill to the target level 1 cache through the target level 2 cache in the target cluster.
  • After determining the target L1 Cache according to the cache identifier of the target L1 Cache carried in the DCA write through, the L3 Cache sends the generated DCA fill to the target L1 Cache through the target L2 Cache; because the DCA fill carries the HAC data, once the target L1 Cache obtains the DCA fill, the HAC data has been written to the target L1 Cache.
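  • For contrast with the Cache Stashing sequence summarized earlier, the DCA write-through flow of this embodiment can be paraphrased hop by hop (illustrative wording):

```c
/*
 * An assumed hop-by-hop summary of the DCA write-through flow described in
 * this embodiment; the wording is illustrative, not the patent's.
 */
static const char *const dca_write_through_steps[] = {
    "HAC -> source L2                  : DCA write through (target L1 identifier + data)",
    "source L2 -> each source L1       : Snoop; collect Snoop responses",
    "source L2 -> L3 Cache             : DCA write through (forwarded)",
    "L3 -> target L2 -> all target L1s : Snoop for DCA write through; collect responses",
    "L3 Cache                          : generate DCA fill carrying the HAC data",
    "L3 -> target L2 -> target L1      : DCA fill (data lands in the target L1)",
};
```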
  • an embodiment of the present application provides a cache access method, including:
  • the source secondary cache obtains DCA write-back
  • The HAC in the source cluster initiates a DCA write-back carrying the cache identifier of the target L1 Cache to the source L2 Cache of the source cluster; the DCA write-back contains the data that the HAC needs to write to the target L1 Cache, and the source L2 Cache receives the DCA write-back sent by the HAC.
  • the source L2 cache sends the DCA write-back to the shared cache.
  • Since the DCA write-back received by the source L2 Cache includes the cache identifier of the target L1 Cache, the source L2 Cache can determine that the target L1 Cache is in the target cluster; because the shared cache between the source cluster and the target cluster is the L3 Cache, the source L2 Cache sends the DCA write-back to the L3 Cache.
  • the shared cache generates a DCA fill according to the DCA write-back.
  • After receiving the DCA write-back sent by the source L2 Cache, the L3 Cache generates a DCA fill according to the DCA write-back; after obtaining the HAC data from the DCA write-back, it carries the HAC data in the DCA fill.
  • the shared cache sends the DCA fill to the target level 1 cache through the target level 2 cache in the target cluster.
  • The DCA instruction may thus specifically be either a DCA write-back instruction or a DCA write-through instruction, which makes the implementations of the scheme more diverse.
  • FIG. 5 and FIG. 6 describe the case in which the source cluster needs to write data into the target cluster; the following embodiment describes the case in which the source cluster needs to read data from the target cluster.
  • If the HAC needs to read data from the CPU core of the target cluster, the DCA instruction is a DCA read, and the L1 Cache corresponding to the CPU core of the target cluster actually needs to write the data back to the HAC through a write-back.
  • the shared cache generates DCA operation instructions according to the DCA instruction, including:
  • the shared cache determines the target level 1 cache according to the cache identifier in the DCA read instruction
  • the shared cache generates a probe writeback instruction, which is used to instruct the target secondary cache to read data from the target primary cache.
  • When the source cluster needs to read data from the target cluster and the multi-level cache system does not require cache consistency to be established, the shared cache generates the DCA operation instruction according to the DCA instruction as follows: the L3 Cache determines the target L1 Cache according to the cache identifier in the DCA read, and generates a Snoop to writeback, where the Snoop to writeback is used to instruct the target L2 Cache to read the data from the target L1 Cache.
  • the method further includes:
  • the target level 1 cache feeds back a third probe response to the target level 2 cache according to the probe writeback instruction, and the third probe response includes data;
  • the target L2 cache forwards the third probe response to the shared cache
  • the shared cache sends a DCA read response to the source L2 cache, so that the source L2 cache gets the data of the target L1 cache according to the DCA read response.
  • The above embodiments introduce the multi-level caching method; the multi-level cache system to which the method is applied is described in detail below.
  • an embodiment of the present application provides a multi-level cache system, including:
  • a shared cache 701 and at least two clusters, where the at least two clusters include a source cluster 702 and a target cluster 703; the source cluster 702 includes a source level 1 cache 7021 and a source level 2 cache 7022, and the target cluster 703 includes a target level 1 cache 7031 and a target level 2 cache 7032;
  • the source L2 cache 7022 is configured to acquire a DCA instruction when the source cluster 702 needs to read or write data in the target cluster 703, where the source L2 cache 7022 is an L2 cache in the source cluster 702 and the DCA instruction includes the cache identifier of the target level 1 cache 7031 in the target cluster 703;
  • the source L2 cache 7022 is further configured to send the DCA instruction to the shared cache 701;
  • the shared cache 701 is configured to generate a DCA operation instruction according to the DCA instruction;
  • the shared cache 701 is also used to send DCA operation instructions to the target L1 cache 7031 through the target L2 cache 7032 in the target cluster 703, such that the target L1 cache 7031 writes data or writes data to the source cluster 702.
  • In this system, the source L2 cache 7022 acquires a DCA instruction carrying the cache identifier of the target L1 cache 7031 and sends the DCA instruction to the shared cache 701; the shared cache 701 generates a DCA operation instruction according to the DCA instruction and then sends the DCA operation instruction to the target level 1 cache 7031 through the target level 2 cache 7032 in the target cluster 703, so that the target level 1 cache 7031 writes data or writes data to the source cluster 702.
  • Compared with the existing Cache Stashing technology, the steps are reduced. For example, to push the data of the HAC in the source cluster into the target L1 Cache in the target cluster, the Cache Stashing technology requires the target L1 Cache to send a Prefetch to the L3 Cache, after which the L3 Cache carries the data of the HAC in a Fetch response fed back to the target L1 Cache; in this application, the L3 Cache carries the data of the HAC in the DCA operation instruction, and when the L3 Cache sends the DCA operation instruction to the target L1 Cache through the L2 Cache of the target cluster, the data of the HAC is pushed into the target L1 Cache. It can be clearly seen that the steps in this embodiment of the present application are significantly fewer than in the Cache Stashing technology, thereby reducing the latency and improving the CPU performance of the system.
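  • The composition of the multi-level cache system can be sketched structurally as follows; the type names and fixed sizes are illustrative assumptions, with the reference numerals of FIG. 7 noted in comments:

```c
/*
 * A structural sketch of the multi-level cache system of this embodiment.
 * The type names and the fixed sizes are illustrative assumptions only;
 * the reference numerals in the comments follow FIG. 7.
 */
#include <stdint.h>

#define MAX_L1_PER_CLUSTER 4   /* assumed upper bound, one L1 per CPU core */
#define NUM_CLUSTERS       2   /* at least a source and a target cluster   */

typedef struct { uint16_t cache_id; } cache_t;

typedef struct {
    cache_t l1[MAX_L1_PER_CLUSTER]; /* level 1 caches, e.g. 7021 / 7031 */
    cache_t l2;                     /* level 2 cache,  e.g. 7022 / 7032 */
} cluster_t;

typedef struct {
    cluster_t clusters[NUM_CLUSTERS]; /* source cluster 702, target cluster 703 */
    cache_t   shared_l3;              /* shared cache 701 between the clusters  */
} multilevel_cache_system_t;
```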
  • the source L2 cache 7022 is further configured to send a first probe instruction to the source L1 cache 7021 in the source cluster 702, so that the source L1 cache 7021 feeds back a first probe response, where the first probe instruction is used to perform the data consistency operation of the source cluster 702;
  • the source L2 cache 7022 is further configured to receive the first probe response fed back by the source L1 cache 7021, and determine, according to the first probe response, that the source cluster 702 has data consistency.
  • When data consistency is determined in the manner of the Cache Stashing technology shown in FIG. 2, the source L2 cache 7022 sends the first probe instruction to the source L1 cache 7021 in the source cluster 702 so that the source L1 cache 7021 feeds back the first probe response; after receiving the first probe response fed back by the source L1 cache 7021, the source L2 cache 7022 can determine, according to the first probe response, that the source cluster 702 has data consistency.
  • the shared cache 701 is further configured to send a DCA probe instruction to the target secondary cache 7032 in the target cluster 703;
  • the target L2 cache 7032 is configured to send a second probe instruction to all the L1 caches in the target cluster 703 according to the DCA probe instruction, so that all the L1 caches in the target cluster 703 feed back a second probe response, where the second probe instruction is used to perform the data consistency operation of the target cluster 703;
  • the target second level cache 7032 is further configured to receive a second probe response of all the level 1 cache feedbacks in the target cluster 703, and feed back the second probe response to the shared cache 701;
  • the shared cache 701 is further configured to receive a second probe response fed back by the target secondary cache 7032, and determine, according to the second probe response, that the target cluster 703 has data consistency.
  • When data consistency is determined in the manner of the Cache Stashing technology shown in FIG. 2, it can be seen from the cache access method embodiment shown in FIG. 4, by comparison with the Cache Stashing technology shown in FIG. 2, that step 206 and step 208 need not be performed; therefore, when determining data consistency, the latency can be further reduced and the CPU performance of the system improved.
  • When the source cluster 702 needs to write data to the target cluster 703, the DCA instruction is a DCA write-through instruction, the DCA write-through instruction further includes the data, and the DCA operation instruction is a DCA fill instruction.
  • the shared cache 701 is further configured to determine the target level 1 cache 7031 according to the cache identifier in the DCA write-through instruction, and to acquire the data in the DCA write-through instruction;
  • the shared cache 701 is further configured to generate a DCA fill instruction, where the DCA fill instruction includes the data and is used to directly write the data into the target level 1 cache 7031.
  • If the multi-level cache system pre-guarantees data consistency, the write-back Cache mode is used and the DCA instruction is specifically a DCA write-back instruction; if data consistency is determined as in the Cache Stashing technology shown in FIG. 2, the write-through Cache mode is used and the DCA instruction is specifically a DCA write-through instruction. Therefore, when the source cluster 702 needs to write data to the target cluster 703, the DCA instruction can be either a DCA write-through instruction or a DCA write-back instruction. When the DCA instruction is a DCA write-through instruction, the shared cache 701 determines the target level 1 cache 7031 according to the cache identifier in the DCA write-through instruction, generates a DCA fill instruction containing the data of the source cluster 702, and then sends the DCA fill instruction to the target level 1 cache 7031, so that the data is written directly into the target cluster 703.
  • Alternatively, the DCA instruction is a DCA write-back instruction, the DCA write-back instruction also includes the data, and the DCA operation instruction is a DCA fill instruction.
  • the shared cache 701 is further configured to determine the target level 1 cache 7031 according to the cache identifier in the DCA write-back instruction, and to acquire the data in the DCA write-back instruction;
  • the multi-level cache system pre-guarantes the data consistency, the natural use is the write-back Cache method, and the DCA instruction is specifically the DCA write-back instruction; the data consistency is determined according to the method in the Cache Stashing technique shown in FIG.
  • the DCA instruction is specifically a DCA direct write instruction. Therefore, when the source cluster 702 needs to write data to the target cluster 703, and the source cluster 702 and the target cluster 703 have data consistency, the DCA instruction can be a DCA write-through instruction or a DCA write-back instruction when the source cluster 702 needs to put data.
  • the DCA operation instruction is a DCA fill instruction
  • the shared cache 701 determines the target level 1 cache 7031 according to the cache identifier in the DCA write-back instruction to generate a DCA fill instruction, DCA.
  • the padding instruction includes the data of the source cluster 702, and then the DCA padding instruction is sent to the target level 1 cache 7031 to directly write the data to the target cluster 703.
  • When the source cluster 702 needs to read data from the target cluster 703, the DCA instruction is a DCA read instruction and the DCA operation instruction is a probe writeback instruction.
  • the shared cache 701 is further configured to determine a target level 1 cache 7031 according to the cache identifier in the DCA read instruction;
  • the shared cache 701 is further configured to generate a probe writeback instruction, and the probe writeback instruction is used to instruct the target secondary cache 7032 to read data from the target primary cache 7031.
  • A specific case may be that each cluster of the multi-level cache system has only one CPU core, that is, only one level 1 cache; in this case, Cache consistency does not need to be determined, or the Cache consistency of the multi-level cache system can be ensured by a preset mechanism.
  • The shared cache 701 generates a DCA operation instruction according to the DCA instruction as follows: the shared cache 701 determines the target level 1 cache 7031 according to the cache identifier in the DCA read, and generates a Snoop to writeback, which is used to instruct the target level 2 cache 7032 to read data from the target level 1 cache 7031.
  • a target level 1 cache 7031 configured to receive a probe writeback instruction
  • the target level 1 cache 7031 is further configured to feed back a third probe response to the target level 2 cache 7032 according to the probe writeback instruction, where the third probe response includes data;
  • the target secondary cache 7032 is further configured to forward the third probe response to the shared cache 701;
  • the shared cache 701 is further configured to generate a DCA read response according to the third probe response, where the DCA read response includes the data;
  • the shared cache 701 is further configured to send a DCA read response to the source L2 cache 7022, such that the source L2 cache 7022 obtains data in the target L1 cache 7031 according to the DCA read response.
  • the target level cache 7031 feeds back a Snoop response to the shared cache 701 according to the Snoop to writeback, and the shared cache 701 generates a DCA read response according to the Snoop response, and generates a DCA read response.
  • the source L2 cache 7022 sends to the source L2 cache 7022, the source L2 cache 7022 reads the data in the target L1 cache 7031 according to the DCA read response, and then the HAC of the source cluster 702 or the L1 cache of the source cluster 702 requests the L1 cache of the data from The source secondary cache 7022 obtains data, thereby completing the source cluster 702 to read data from the Cache of the target cluster 703.
  • the embodiment of the present application further provides a computer system 800, including:
  • an external storage 82 and a multi-level cache system 81, where the external storage 82 and the multi-level cache system 81 are connected by a bus;
  • the multi-level cache system 81 includes a shared cache 801 and at least two clusters, each cluster having at least one level 1 cache and a level 2 cache; the at least two clusters include a source cluster 802 and a target cluster 803, the source cluster 802 includes a source level 1 cache 8021 and a source level 2 cache 8022, and the target cluster 803 includes a target level 1 cache 8031 and a target level 2 cache 8032;
  • the source level 2 cache 8022 is configured to receive a DCA instruction when the source cluster 802 needs to read or write data in the target cluster 803, where the source level 2 cache 8022 is the level 2 cache in the source cluster 802 and the DCA instruction includes the cache identifier of the target level 1 cache 8031 in the target cluster 803;
  • the source level 2 cache 8022 is further configured to send the DCA instruction to the shared cache 801;
  • the shared cache 801 is configured to generate a DCA operation instruction according to the DCA instruction;
  • the shared cache 801 is further configured to send the DCA operation instruction to the target level 1 cache 8031 through the target level 2 cache 8032 in the target cluster 803, so that the target level 1 cache 8031 writes the data, or writes the data into the source cluster 802.
  • compared with the Cache Stashing technique, fewer steps are required because the DCA technique is used; for example, to push the data of the HAC into the target L1 Cache, the Cache Stashing technique needs the Prefetch and Fetch response exchange, whereas this application only needs to push the HAC data directly into the target L1 Cache through the DCA operation instruction. Therefore, this application can reduce latency and improve the CPU performance of the system.
  • the embodiment of the present invention further provides a computer program product for implementing the access request processing method, including a computer readable storage medium storing program code, where the instructions included in the program code are used to execute the method procedure described in any one of the foregoing method embodiments.
  • a person of ordinary skill in the art can understand that the foregoing storage medium includes various non-transitory machine readable media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random access memory (RAM), a solid state disk (SSD), or other non-volatile memory.
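
The two data paths summarized in the bullets above can be made concrete with a small amount of code. The C++ sketches below are not part of the patent and use assumed, illustrative type and function names (DcaWrite, DcaFill, SnoopToWriteback and so on); they only model the message flow between cache levels, not any real cache hardware or coherence protocol. The first sketch covers the write path: the shared cache resolves the target level 1 cache from the cache identifier and turns a DCA write-through or write-back instruction into a single DCA fill instruction carrying the source cluster's data.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Illustrative message shapes; the patent does not define concrete encodings.
enum class DcaOp { WriteThrough, WriteBack };

struct DcaWrite {                         // DCA write-through / write-back instruction
    DcaOp op;
    std::uint32_t target_l1_id;           // cache identifier of the target level 1 cache
    std::vector<std::uint8_t> data;       // payload from the source cluster (e.g. its HAC)
};

struct DcaFill {                          // DCA fill instruction generated by the shared cache
    std::uint32_t target_l1_id;
    std::vector<std::uint8_t> data;
};

// Shared cache (L3) behaviour: resolve the target L1 from the cache identifier and
// forward the payload in one DCA fill instruction relayed by the target L2.
DcaFill sharedCacheHandleDcaWrite(const DcaWrite& req) {
    // A write-through request would first be preceded by Snoop-for-DCA probes to
    // establish target-cluster consistency; a write-back request assumes the system
    // already guarantees consistency. Both cases end in the same DCA fill.
    return DcaFill{req.target_l1_id, req.data};
}

int main() {
    DcaWrite req{DcaOp::WriteBack, 2, {0xDE, 0xAD, 0xBE, 0xEF}};
    DcaFill fill = sharedCacheHandleDcaWrite(req);
    std::cout << "DCA fill routed to target L1 #" << fill.target_l1_id
              << " carrying " << fill.data.size() << " bytes\n";
}
```

The second sketch, in the same spirit, covers the read path: the shared cache issues a Snoop to writeback, receives the data in the Snoop (third probe) response relayed by the target level 2 cache, and wraps it in a DCA read response for the source level 2 cache.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Assumed message shapes for the read path.
struct SnoopToWriteback { std::uint32_t target_l1_id; };
struct SnoopResponse    { std::vector<std::uint8_t> data; };
struct DcaReadResponse  { std::vector<std::uint8_t> data; };

// Target L1: returns the requested data in a Snoop response.
SnoopResponse targetL1HandleProbe(const SnoopToWriteback&) {
    return SnoopResponse{{1, 2, 3, 4}};            // stand-in for the cached data
}

// Target L2: relays the probe down and the response up.
SnoopResponse targetL2Forward(const SnoopToWriteback& probe) {
    return targetL1HandleProbe(probe);
}

// Shared cache (L3): issues the probe and converts the Snoop response into a
// DCA read response for the source L2 (from which the HAC then takes the data).
DcaReadResponse sharedCacheHandleDcaRead(std::uint32_t target_l1_id) {
    SnoopToWriteback probe{target_l1_id};
    SnoopResponse resp = targetL2Forward(probe);
    return DcaReadResponse{resp.data};
}

int main() {
    DcaReadResponse rr = sharedCacheHandleDcaRead(2);
    std::cout << "source L2 received " << rr.data.size()
              << " bytes read from the target L1\n";
}
```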


Abstract

This application discloses a cache access method, a multi-level cache system and a computer system, which are used to reduce the operation steps required when a source cluster reads data from or writes data to a target cluster, thereby reducing latency and improving the CPU performance of the system. The method in the embodiments of this application includes: when the source cluster needs to read or write data in the target cluster, a source level 2 cache obtains a DCA instruction, where the source level 2 cache is the level 2 cache in the source cluster and the DCA instruction includes the cache identifier of a target level 1 cache in the target cluster; the source level 2 cache sends the DCA instruction to a shared cache; the shared cache generates a DCA operation instruction according to the DCA instruction; and the shared cache sends the DCA operation instruction to the target level 1 cache through a target level 2 cache in the target cluster, so that the target level 1 cache writes the data or writes the data into the source cluster.

Description

一种缓存访问方法、多级缓存系统及计算机系统
本申请要求于2017年11月2日提交中国专利局、申请号为201711063243.X、申请名称为“一种缓存访问方法、多级缓存系统及计算机系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及通信技术领域,具体涉及一种缓存访问方法、多级缓存系统及计算机系统。
背景技术
随着中央处理器(Central Processing Unit,CPU)技术的发展,CPU对于内存Memory访问的延时latency问题越来越敏感,提高数据访问的效率,以及减少latency问题成为了提升CPU性能的关键。
缓存Cache是介于CPU和Memory之间的小容量存储器,存取速度比Memory快,接近CPU。它能高速地向CPU提供指令和数据,提高程序的执行速度。随着半导体器件集成度的不断提高,当前已出现了两级以上的多级Cache系统,每一级Cache的所有控制逻辑全部由该级Cache的内部控制器实现。Cache Stashing技术是由ARM公司提供的现有的解决Memory访问的latency问题的有效方案。当源集群cluster的CPU核需要把数据推送到目标cluster的CPU核时(目前多核处理器系统中通常把多个CPU核集成在一起,形成一个cluster),Cache Stashing技术需要先完成数据一致性的Snoop Stash操作,然后由目标cluster的CPU核的一级Cache发起Prefetch操作,将源cluster的CPU核中的数据取回。
但是,Cache Stashing技术存在操作步骤繁琐的缺点,latency问题并未得到有效的解决,CPU性能仍然受到latency问题困扰。
发明内容
本申请提供了一种缓存访问方法、多级缓存系统及计算机系统,用于减少源集群向目标集群读取或写入数据时的操作步骤,从而降低latency,提升了系统的CPU性能。
本申请第一方面提供一种缓存访问方法,应用于多级缓存系统,所述多级缓存系统包括共享缓存及至少两个集群,每个集群具有至少一个一级缓存及二级缓存,所述方法包括:
当源集群需要在目标集群读取或写入数据时,源二级缓存获取直接访问缓存DCA指令,所述源二级缓存为所述源集群的二级缓存,所述DCA指令包括所述目标集群中的目标一级缓存的缓存标识;
所述源二级缓存将所述DCA指令发送至所述共享缓存;
所述共享缓存根据所述DCA指令生成DCA操作指令;
所述共享缓存通过所述目标集群中的目标二级缓存将所述DCA操作指令发送至所述目标一级缓存,使得所述目标一级缓存写入所述数据或将所述数据写入所述源集群。
在源集群需要在目标集群读取或写入数据时,源二级缓存获取携带目标一级缓存的缓存标识的DCA指令,源二级缓存将DCA指令发送至共享缓存,共享缓存根据DCA指令生成DCA操作指令,然后通过目标集群中的目标二级缓存将DCA操作指令发送至目标一级缓存,使得目标一级缓存写入数据或将数据写入源获取。由于采用的是DCA技术,与现有的Cache Stashing技术相比步骤有所减少,例如,以将源集群中的HAC的数据推送进目标集群中的目标一级缓存中为例,Cache Stashing技术中需要通过目标一级缓存向共享缓存发送Prefetch,然后共享缓存将HAC的数据携带于Fetch response中,反馈给目标一级缓存;而实施例中,共享缓存将HAC的数据携带于DCA操作指令中,在共享缓存通过目标集群的L2 Cache将DCA操作指令发送到目标一级缓存时,就实现了将HAC的数据推送进目标一级缓存。可以明显的看出,相比于Cache Stashing技术,步骤明显减少了,因此降低latency,提升了系统的CPU性能。
结合本申请第一方面,本申请第一方面第一实施方式中,所述源二级缓存将所述DCA指令发送至所述共享缓存之前,还包括:
所述源二级缓存向所述源集群中的源一级缓存发送第一探查指令,使得所述源一级缓存反馈第一探查回应,所述第一探查指令用于进行所述源集群的数据一致性操作;
所述源二级缓存获取所述源一级缓存反馈的第一探查回应,根据所述第一探查回应确定所述源集群具有数据一致性。
考虑到数据一致性机制可以是多级缓存系统预先保证了,也可以是按照Cache Stashing技术中的方式确定数据一致性的,如果是按照Cache Stashing技术中的方式确定的,那么需要当源二级缓存获取到DCA指令之后,根据数据一致性机制对源集群的中的各个一级缓存发起第一探查指令,源集群的中的各个一级缓存根据第一探查指令完成数据一致性操作后,向源二级缓存反馈第一探查回应,源二级缓存获取各个一级缓存反馈的第一探查回应,根据第一探查回应就能够确定源集群具有数据一致性。
结合本申请第一方面第一实施方式,本申请第一方面第二实施方式中,所述共享缓存根据所述DCA指令生成DCA操作指令之前,还包括:
所述共享缓存向所述目标集群中的目标二级缓存发送DCA探查指令;
所述目标二级缓存根据所述DCA探查指令向所述目标集群中的所有一级缓存发送第二探查指令,使得所述目标集群中的所有一级缓存反馈第二探查回应,所述第二探查指令用于进行所述目标集群的数据一致性操作;
所述目标二级缓存接收所述目标集群中的所有一级缓存反馈的第二探查回应,并将所述第二探查回应反馈至所述共享缓存;
所述共享缓存接收所述目标二级缓存反馈的第二探查回应,根据所述第二探查回应确定所述目标集群具有数据一致性。
考虑到数据一致性机制可以是多级缓存系统预先保证了,也可以是按照Cache Stashing技术中的方式确定数据一致性的,如果是按照Cache Stashing技术中的方式确定的,当共享缓存接收到DCA指令之后,根据数据一致性机制需要先确认目标集群的数据一致性,因此需要向目标二级缓存发送DCA探查指令,目标二级缓存接收到DCA探查指令 之后,向目标集群中所有一级缓存发送第二探查指令,使得所有一级缓存反馈第二探查回应,目标二级缓存接收到第二探查回应,并将第二探查回应反馈至共享缓存,共享缓存接收到所有一级缓存的第二探查回应后,确定目标集群具有数据一致性。
结合本申请第一方面第二实施方式,本申请第一方面第三实施方式中,当所述源集群需要将数据写入所述目标集群时,所述DCA指令为DCA直写指令,所述DCA直写指令还包括所述数据,所述DCA操作指令为DCA填充指令,
所述共享缓存根据所述DCA指令生成DCA操作指令,包括:
所述共享缓存根据所述DCA直写指令中的所述缓存标识确定所述目标一级缓存,并获取所述DCA直写指令中的所述数据;
所述共享缓存生成DCA填充指令,所述DCA填充指令包括所述数据,所述DCA填充指令用于直接将所述数据写入所述目标一级缓存。
目前Cache的工作方式包括直写式Cache方式和回写式Cache方式,直写式Cache方式是:当CPU要将数据写入内存时,除了更新Cache上的数据外,也将数据写在DRAM中,以维持Memory与Cache的数据一致性;回写式Cache方式是:每当CPU要将数据写入Memory时,只会先更新Cache上的数据,随后再让Cache在总线不塞车的时候,才把数据写回DRAM。那么多级缓存系统预先保证了数据一致性的前提下,采用的自然是回写式Cache方式,DCA指令具体为DCA回写指令;按照Cache Stashing技术中的方式确定数据一致性的情况下,采用的是直写式Cache方式,DCA指令具体为DCA直写指令。因此,当源集群需要将数据写入目标集群时,DCA指令可以为DCA直写指令或DCA填充指令,源集群的数据包含于DCA指令中,当源集群需要将数据写入目标集群时,并且DCA指令为DCA直写指令时,共享缓存根据DCA直写指令中的缓存标识确定目标一级缓存,生成DCA填充指令,DCA填充指令包含源集群的数据,那么将DCA填充指令发送至目标一级缓存,就能实现直接将数据写入目标集群。
结合本申请第一方面,本申请第一方面第四实施方式中,当所述源集群需要将数据写入所述目标集群,并且所述源集群和所述目标集群具有数据一致性时,所述DCA指令为DCA回写指令,所述DCA回写指令还包括所述数据,所述DCA操作指令为DCA填充指令,
所述共享缓存根据所述DCA指令生成DCA操作指令,包括:
所述共享缓存根据所述DCA回写指令中的所述缓存标识确定所述目标一级缓存,并获取所述DCA回写指令中的所述数据;
所述共享缓存生成DCA填充指令,所述DCA填充指令包括所述数据,所述DCA填充指令用于直接将所述数据写入所述目标一级缓存。
目前Cache的工作方式包括直写式Cache方式和回写式Cache方式,直写式Cache方式是:当CPU要将数据写入内存时,除了更新Cache上的数据外,也将数据写在DRAM中,以维持Memory与Cache的数据一致性;回写式Cache方式是:每当CPU要将数据写入Memory时,只会先更新Cache上的数据,随后再让Cache在总线不塞车的时候,才把数据写回DRAM。那么多级缓存系统预先保证了数据一致性的前提下,采用的自然是回写式Cache方式,DCA指令具体为DCA回写指令;按照Cache Stashing技术中的方式确定数据一致性 的情况下,采用的是直写式Cache方式,DCA指令具体为DCA直写指令。因此,当源集群需要将数据写入目标集群,并且源集群和目标集群具有数据一致性时,DCA指令可以为DCA直写指令或DCA填充指令,源集群的数据包含于DCA指令中,当源集群需要将数据写入目标集群时,并且DCA指令为DCA直写指令时,DCA指令为DCA回写指令,共享缓存根据DCA回写指令中的缓存标识确定目标一级缓存,生成DCA填充指令,DCA填充指令包含源集群的数据,那么将DCA填充指令发送至目标一级缓存,就能实现直接将数据写入目标集群。
结合本申请第一方面第二实施方式,本申请第一方面第五实施方式中,当所述源集群需要从所述目标集群读取数据,并且所述多级缓存系统不要求缓存一致性时,所述DCA指令为DCA读取指令,所述DCA操作指令为探查回写指令,
所述共享缓存根据所述DCA指令生成DCA操作指令,包括:
所述共享缓存根据所述DCA读取指令中的所述缓存标识确定所述目标一级缓存;
所述共享缓存生成探查回写指令,所述探查回写指令用于指示所述目标二级缓存从所述目标一级缓存读取所述数据。
上述是源集群需要将数据写入目标集群时,如果源集群需要从目标集群读取数据时,而且每个CPU中只具有一个一级缓存或者多级缓存系统的Cache一致性能够保证时,共享缓存根据DCA指令生成DCA操作指令具体为:共享缓存根据DCA读取指令中的缓存标识确定目标一级缓存,共享缓存生成探查回写指令,探查回写指令用于指示目标二级缓存从目标L1 Cache读取数据。
结合本申请第一方面第五实施方式,本申请第一方面第六实施方式中,所述共享缓存通过所述目标集群中的目标二级缓存将所述DCA操作指令发送至所述目标一级缓存之后,还包括:
所述目标一级缓存接收所述探查回写指令;
所述目标一级缓存根据所述探查回写指令向所述目标二级缓存反馈第三探查回应,所述第三探查回应包括所述数据;
所述目标二级缓存将所述第三探查回应转发至所述共享缓存;
所述共享缓存根据所述第三探查回应生成DCA读取回应,所述DCA读取回应包括所述数据;
所述共享缓存将所述DCA读取回应发送至所述源二级缓存,使得所述源二级缓存根据所述DCA读取回应得到所述目标一级缓存的所述数据。
目标一级缓存接收探查回写指令之后,目标一级缓存根据探查回写指令向目标二级缓存反馈第三探查回应,并且将源集群需要读取的数据包含在第三探查回应中,目标二级缓存将第三探查回应转发至共享缓存,共享缓存根据第三探查回应生成探查回写指令,并且探查回写指令中包括有数据,共享缓存将探查回写指令发送至源二级缓存,使得源二级缓存获取到探查回写指令后,就能得到探查回写指令中包含的源集群需要读取的目标集群的数据。
本申请第二方面提供一种多级缓存系统,包括:
共享缓存及至少两个集群,每个集群具有至少一个一级缓存及二级缓存;
源二级缓存,用于当所述源集群需要在所述目标集群读取或写入数据时,获取直接访问缓存DCA指令,所述源二级缓存为所述源集群中的二级缓存,所述DCA指令包括所述目标集群中的目标一级缓存的缓存标识;
所述源二级缓存,还用于将所述DCA指令发送至所述共享缓存;
所述共享缓存,用于根据所述DCA指令生成DCA操作指令;
所述共享缓存,还用于通过所述目标集群中的目标二级缓存将所述DCA操作指令发送至所述目标一级缓存,使得所述目标一级缓存写入所述数据或将所述数据写入所述源集群。
在源集群需要在目标集群读取或写入数据时,源二级缓存获取携带目标一级缓存的缓存标识的DCA指令,源二级缓存将DCA指令发送至共享缓存,共享缓存根据DCA指令生成DCA操作指令,然后通过目标集群中的目标二级缓存将DCA操作指令发送至目标一级缓存,使得目标一级缓存写入数据或将数据写入源获取。由于采用的是DCA技术,与现有的Cache Stashing技术相比步骤有所减少,例如,以将源集群中的HAC的数据推送进目标集群中的目标一级缓存中为例,Cache Stashing技术中需要通过目标一级缓存向共享缓存发送Prefetch,然后共享缓存将HAC的数据携带于Fetch response中,反馈给目标一级缓存;而实施例中,共享缓存将HAC的数据携带于DCA操作指令中,在共享缓存通过目标集群的L2 Cache将DCA操作指令发送到目标一级缓存时,就实现了将HAC的数据推送进目标一级缓存。可以明显的看出,相比于Cache Stashing技术,步骤明显减少了,因此降低latency,提升了系统的CPU性能。
结合本申请第二方面,本申请第二方面第一实施方式中,
所述源二级缓存,还用于向所述源集群中的源一级缓存发送第一探查指令,使得所述源一级缓存反馈第一探查回应,所述第一探查指令用于进行所述源集群的数据一致性操作;
所述源二级缓存,还用于接收所述源一级缓存反馈的第一探查回应,根据所述第一探查回应确定所述源集群具有数据一致性。
考虑到数据一致性机制可以是多级缓存系统预先保证了,也可以是按照Cache Stashing技术中的方式确定数据一致性的,如果是按照Cache Stashing技术中的方式确定的,那么需要当源二级缓存获取到DCA指令之后,根据数据一致性机制对源集群的中的各个一级缓存发起第一探查指令,源集群的中的各个一级缓存根据第一探查指令完成数据一致性操作后,向源二级缓存反馈第一探查回应,源二级缓存获取各个一级缓存反馈的第一探查回应,根据第一探查回应就能够确定源集群具有数据一致性。
结合本申请第二方面第一实施方式,本申请第二方面第二实施方式中,
所述共享缓存,还用于向所述目标集群中的目标二级缓存发送DCA探查指令;
所述目标二级缓存,用于根据所述DCA探查指令向所述目标集群中的所有一级缓存发送第二探查指令,使得所述目标集群中的所有一级缓存反馈第二探查回应,所述第二探查指令用于进行所述目标集群的数据一致性操作;
所述目标二级缓存,还用于接收所述目标集群中的所有目标一级缓存反馈的第二探查回应,并将所述第二探查回应反馈至所述共享缓存;
所述共享缓存,还用于接收所述目标二级缓存反馈的第二探查回应,根据所述第二探 查回应确定所述目标集群具有数据一致性。
考虑到数据一致性机制可以是多级缓存系统预先保证了,也可以是按照Cache Stashing技术中的方式确定数据一致性的,如果是按照Cache Stashing技术中的方式确定的,当共享缓存接收到DCA指令之后,根据数据一致性机制需要先确认目标集群的数据一致性,因此需要向目标二级缓存发送DCA探查指令,目标二级缓存接收到DCA探查指令之后,向目标集群中所有一级缓存发送第二探查指令,使得所有一级缓存反馈第二探查回应,目标二级缓存接收到第二探查回应,并将第二探查回应反馈至共享缓存,共享缓存接收到所有一级缓存的第二探查回应后,确定目标集群具有数据一致性。
结合本申请第二方面第二实施方式,本申请第二方面第三实施方式中,当所述源集群需要将数据写入所述目标集群时,所述DCA指令为DCA直写指令,所述DCA直写指令还包括所述数据,所述DCA操作指令为DCA填充指令,
所述共享缓存,还用于根据所述DCA直写指令中的所述缓存标识确定所述目标一级缓存,并获取所述DCA直写指令中的所述数据;
所述共享缓存,还用于生成DCA填充指令,所述DCA填充指令包括所述数据,所述DCA填充指令用于直接将所述数据写入所述目标一级缓存。
目前Cache的工作方式包括直写式Cache方式和回写式Cache方式,直写式Cache方式是:当CPU要将数据写入内存时,除了更新Cache上的数据外,也将数据写在DRAM中,以维持Memory与Cache的数据一致性;回写式Cache方式是:每当CPU要将数据写入Memory时,只会先更新Cache上的数据,随后再让Cache在总线不塞车的时候,才把数据写回DRAM。那么多级缓存系统预先保证了数据一致性的前提下,采用的自然是回写式Cache方式,DCA指令具体为DCA回写指令;按照Cache Stashing技术中的方式确定数据一致性的情况下,采用的是直写式Cache方式,DCA指令具体为DCA直写指令。因此,当源集群需要将数据写入目标集群时,DCA指令可以为DCA直写指令或DCA回写指令,源集群的数据包含于DCA指令中,当源集群需要将数据写入目标集群时,并且DCA指令为DCA直写指令时,共享缓存根据DCA直写指令中的缓存标识确定目标一级缓存,生成DCA填充指令,DCA填充指令包含源集群的数据,那么将DCA填充指令发送至目标一级缓存,就能实现直接将数据写入目标集群。
结合本申请第二方面,本申请第二方面第四实施方式中,所述源集群需要将数据写入所述目标集群,并且所述源集群和所述目标集群具有数据一致性时,所述DCA指令为DCA回写指令,所述DCA回写指令还包括所述数据,所述DCA操作指令为DCA填充指令,
所述共享缓存,还用于根据所述DCA回写指令中的所述缓存标识确定所述目标一级缓存,并获取所述DCA回写指令中的所述数据;
所述共享缓存,还用于生成DCA填充指令,所述DCA填充指令包括所述数据,所述DCA填充指令用于直接将所述数据写入所述目标一级缓存。
目前Cache的工作方式包括直写式Cache方式和回写式Cache方式,直写式Cache方式是:当CPU要将数据写入内存时,除了更新Cache上的数据外,也将数据写在DRAM中,以维持Memory与Cache的数据一致性;回写式Cache方式是:每当CPU要将数据写入 Memory时,只会先更新Cache上的数据,随后再让Cache在总线不塞车的时候,才把数据写回DRAM。那么多级缓存系统预先保证了数据一致性的前提下,采用的自然是回写式Cache方式,DCA指令具体为DCA回写指令;按照Cache Stashing技术中的方式确定数据一致性的情况下,采用的是直写式Cache方式,DCA指令具体为DCA直写指令。因此,当源集群需要将数据写入目标集群,并且源集群和目标集群具有数据一致性时,DCA指令可以为DCA直写指令或DCA填充指令,源集群的数据包含于DCA指令中,当源集群需要将数据写入目标集群时,并且DCA指令为DCA直写指令时,DCA操作指令为DCA填充指令,共享缓存根据DCA回写指令中的缓存标识确定目标一级缓存,生成DCA填充指令,DCA填充指令包含源集群的数据,那么将DCA填充指令发送至目标一级缓存,就能实现直接将数据写入目标集群。
结合本申请第二方面第二实施方式,本申请第二方面第五实施方式中,当所述源集群需要从所述目标集群读取数据,并且所述多级缓存系统不要求缓存一致性时,所述DCA指令为DCA读取指令,所述DCA操作指令为探查回写指令,
所述共享缓存,还用于根据所述DCA读取指令中的所述缓存标识确定所述目标一级缓存;
所述共享缓存,还用于生成探查回写指令,所述探查回写指令用于指示所述目标二级缓存从所述目标一级缓存读取所述数据。
上述是源集群需要将数据写入目标集群时,如果源集群需要从目标集群读取数据时,而且每个集群中只具有一个一级缓存或者多级缓存系统的Cache一致性能够保证时,共享缓存根据DCA指令生成DCA操作指令具体为:共享缓存根据DCA读取指令中的缓存标识确定目标一级缓存,共享缓存生成探查回写指令,探查回写指令用于指示目标二级缓存从目标L1 Cache读取数据。
结合本申请第二方面第五实施方式,本申请第二方面第六实施方式中,
所述目标一级缓存,用于接收所述探查回写指令;
所述目标一级缓存,还用于根据所述探查回写指令向所述目标二级缓存反馈第三探查回应,所述第三探查回应包括所述数据;
所述目标二级缓存,还用于将所述第三探查回应转发至所述共享缓存;
所述共享缓存,还用于根据所述第三探查回应生成DCA读取回应,所述DCA读取回应包括所述数据;
所述共享缓存,还用于将所述DCA读取回应发送至所述源二级缓存,使得所述源二级缓存根据所述DCA读取回应得到所述目标一级缓存的所述数据。
目标一级缓存接收探查回写指令之后,目标一级缓存根据探查回写指令向目标二级缓存反馈第三探查回应,并且将源集群需要读取的数据包含在第三探查回应中,目标二级缓存将第三探查回应转发至共享缓存,共享缓存根据第三探查回应生成探查回写指令,并且探查回写指令中包括有数据,共享缓存将探查回写指令发送至源二级缓存,使得源二级缓存获取到探查回写指令后,就能得到探查回写指令中包含的源集群需要读取的目标集群的数据。
本申请第三方面提供一种计算机系统,包括:
外存及多级缓存系统,所述外存与所述多级缓存系统通过总线连接;
所述多级缓存系统包括共享缓存及至少两个集群,每个集群具有至少一个一级缓存及二级缓存;
当所述源集群需要在所述目标集群读取或写入数据时,所述源二级缓存获取直接访问缓存DCA指令,所述源二级缓存为所述集群中的二级缓存,所述DCA指令包括所述目标集群中的目标一级缓存的缓存标识;
所述源二级缓存将所述DCA指令发送至所述共享缓存;
所述共享缓存根据所述DCA指令生成DCA操作指令;
所述共享缓存通过所述目标集群中的目标二级缓存将所述DCA操作指令发送至所述目标一级缓存,使得所述目标一级缓存写入所述数据或将所述数据写入所述集群。
在计算机系统中,外存及多级缓存系统,外存与多级缓存系统通过总线连接,多级缓存系统包括共享缓存及至少两个集群,每个集群具有至少一个一级缓存及二级缓,在源集群需要在目标集群读取或写入数据时,源二级缓存获取携带目标一级缓存的缓存标识的DCA指令,源二级缓存将DCA指令发送至共享缓存,共享缓存根据DCA指令生成DCA操作指令,然后通过目标集群中的目标二级缓存将DCA操作指令发送至目标一级缓存,使得目标一级缓存写入数据或将数据写入源获取。由于采用的是DCA技术,与现有的Cache Stashing技术相比步骤有所减少,因此降低latency,提升了系统的CPU性能。
附图说明
为了更清楚地说明本申请实施例技术方案,下面将对实施例和现有技术描述中所需要使用的附图作简单地介绍。
图1为本申请提供的多级缓存系统的架构图;
图2为本申请提供的Cache Stashing技术的信令图;
图3为本申请提供的一个多级缓存方法的实施例流程示意图;
图4为本申请提供的一个多级缓存方法的实施例信令示意图;
图5为本申请提供的另一个多级缓存方法的实施例信令示意图;
图6为本申请提供的再一个多级缓存方法的实施例信令示意图;
图7为本申请提供的一个多级缓存系统的实施例结构示意图;
图8为本申请提供的一个计算机系统的实施例结构示意图。
具体实施方式
本申请提供了一种缓存访问方法、多级缓存系统及计算机系统,用于减少源集群向目标集群读取或写入数据时的操作步骤,从而降低latency,提升了系统的CPU性能。
下面将结合本申请中的附图,对本申请中的技术方案进行清楚、完整地描述。
首先简单介绍本申请应用的系统构架或场景。
随着CPU技术的发展,CPU对于Memory访问的latency问题越来越敏感,提高数据访问的效率,以及减少latency问题成为了提升CPU性能的关键。想要提升CPU性能最主要是达到以下5点:
1.把CPU从data/message的处理中解放出来;
因为data/message的处理会消耗大量的时间,让CPU处于准备(pending)状态,严重影响CPU的每一时钟周期内所执行的指令数(Instruction Per Clock,IPC),一旦CPU从这些繁重data/message中解放出来,就能处理更多的执行,就能显著提升CPU的IPC,从而提高CPU的总体性能。
2.减少CPU由于处理data/message而产生的latency;
3.通过close-by本地Cache来提升CPU访问数据的性能;
由于离CPU越近的Memory,CPU对其进行访问所需的时间也就越少,CPU由于内存访问(Memory Access)产生的latency也就越短,性能提升,本地Cache是小容量存储器,存取速度比主存快,接近CPU,从而可以提升CPU访问数据的性能。
4.通过一片透明的Cache的做法来提高兼容性;
透明的Cache是相对于本地Memory的概念而言,本地Memory通常是大小确定的,通过其进行数据的搬运,一旦搬运的数据超出其大小范围,数据有可能被污染,而Cache对CPU来说是透明的,不需要感知其存储空间的大小,如果超出则由Memory的数据一致性机制来保护。
5.允许加速器直接对data/message进行操作。
允许加速器的计算机系统相对于不允许加速器的计算机系统而言,只能通过CPU的Prefetch指令等手段,加速对data/message的处理,效率偏低,会影响CPU的总体性能,加速器能更快的提速对于data/message的操作。
采用Cache技术能够满足上述的第1、2、3和4点,但是由于没有涉及到加速器直接对data/message进行操作这一点,因此无法满足第5点;而采用直接访问内存(Direct Memory Access,DMA)技术的话,由于本地Cache存在,虽然能同时满足以上第1、2、3和5点,但是由于透明的Cache不存在,那么不能满足第4点。
那么以上的Cache技术和DMA方式都不能同时满足以上的5点,因此CPU性能的提升仍然有空间,而ARM公司提供的Cache Stashing技术是可以同时满足以上5点的,Cache Stashing技术是基于多级缓存系统的,如图1所示为多级缓存系统的架构图。图1中具有三级Cache,cluster1和cluster2分别具有两个CPU核,每一个CPU核都具有L1 Cache(一级缓存),而且每个cluster中包含一个L2 Cache(二级缓存),而L3 Cache(三级缓存)作为共享缓存,处理两个cluster之间的数据一致性,不处于cluster1和cluster2中,需要说明的是,在图1的多级缓存系统中只列举两个cluster,而且每个cluster包括两个CPU核,在实际应用中可能具有更多的cluster,并且每一个cluster中还可能包含更多的CPU核,多级缓存系统也不限制在只包括三级缓存,可能还有更高层级的缓存。而硬件加速器(Hardware Accelerator Controller,HAC)是专用的定点功能外设,用于处理特定功能,特定算法的模块,可以减轻CPU核的负担,因此,使用HAC就提供了一种高性价 比的方法,以增加CPU核的计算能力。在图1的多级缓存系统的基础上结合图2所示的指令交互图,对Cache Stashing技术主要实现方式进行如下描述:
201、以cluster1中的HAC将数据推送到cluster2中的CPU核2为例进行说明,其中,cluster1作为源cluster,cluster2作为目标cluster,而cluster2中的CPU核2对应的L1 Cache作为目标L1 Cache。当源cluster中的HAC需要将数据推送至目标cluster的CPU核2的L1 Cache时,源cluster中的HAC向源cluster的L2 Cache发起推送操作指令(即Snoop Stash),Snoop Stash中包括目标L1 Cache的地址信息以及数据;
202、由源cluster的L2 Cache根据数据一致性机制对源cluster的中的各个L1 Cache发起数据一致性操作指令(即Snoop),数据一致性机制主要是为了处理共享数据,保证各CPU核看到是共享数据都是正确和一致的,这需要通过各级Cache的控制器来实现,也就是ARM公司的Snoop机制,让发起Snoop的对象拥有绝对权限去修改数据,而不会引起一致性的问题,因此源cluster的L2 Cache发起Snoop,是为了确保源cluster的数据一致性;
203、源cluster的各个L1 Cache接收到Snoop后,向源cluster的L2 Cache反馈数据一致性操作回应(即Snoop Response),L2 Cache根据接收到各L1 Cache的Snoop Response,完成源cluster的数据一致性操作;
204、源cluster的L2 Cache接收到各L1 Cache的Snoop Response后,完成了源cluster的数据一致性操作,根据多cluster间的数据一致性机制将推送操作请求(Snoop Stash request)发送至L3 Cache(共享缓存),Snoop Stash request中包括源cluster的数据一致性信息及目标L1 Cache的地址信息以及数据;
205、L3 Cache接收到Snoop Stash request之后,能够确定源cluster的数据一致性,并且根据数据一致性机制向目标cluster的L2 Cache发送推送操作指令(即Snoop for Stash),Snoop for Stash包括目标L1 Cache的地址信息;
206、目标cluster的L2 Cache接收到Snoop for Stash之后,根据目标L1 Cache的地址信息向目标L1 Cache发送目标指令,该目标指令中包括Soonp和预取触发指令,预取触发指令用于触发目标L1 Cache发送预取指令,因此,目标L1 Cache接收目标指令后,生成预取指令(即Prefetch),发送到L2 Cache,并且向L2 Cache反馈Snoop response;
207、目标cluster的L2 Cache对目标cluster的其他L1 Cache发送Snoop,收到Snoop的L1 Cache向L2 Cache反馈Snoop response;
208、目标cluster的L2 Cache收集来自目标L1 Cache的Snoop response;
209、目标cluster的L2 Cache收集来自其他L1 Cache的Snoop response;
210、目标cluster的L2 Cache根据目标L1 Cache和其他L1 Cache的Snoop response,完成目标cluster的数据一致性操作,将目标cluster的数据一致性信息反馈给L3 Cache,从而使得L3 Cache完成了多cluster间的数据一致性操作;
211、目标L1 Cache在接收到目标指令之时,根据预取触发指令可以确定源cluster中HAC需要推送的数据存储于L3 Cache中,因此,向L2 Cache发送Prefetch,Prefetch用于通知L3 Cache可以将数据发送至目标L1 Cache了;
212、目标cluster的L2 Cache向L3 Cache转发目标L1 Cache发送的Prefetch;
213、L3 Cache接收到Prefetch之后,向目标cluster的L2 Cache反馈预取回应(即Fetch response),Fetch response中包括数据;
214、目标cluster的L2 Cache将Fetch response转发给目标L1 Cache,从而实现了将HAC的数据推送进目标L1 Cache中。
虽然以上图2所示的Cache Stashing技术可以同时满足以上5点,但是从步骤201-步骤214可以看出,完成一次数据推送,需要完成的步骤很多,latency问题的解决还不够完善,CPU性能还是会受到latency问题的影响,而本申请需要解决的问题就是通过减少现有的Cache Stashing技术中的步骤,来进一步提升多级缓存系统的CPU性能,下面通过实施例进行具体说明。
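
Purely as an editorial illustration of the step reduction argued above (not part of the patent), the following C++ snippet lists the push-path message sequence of the Cache Stashing flow of steps 201-214 next to the DCA flow proposed by this application; the grouping of steps is an assumption and the counts are indicative only.

```cpp
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Push path of the prior Cache Stashing flow (steps 201-214, grouped).
    std::vector<std::string> cache_stashing = {
        "Snoop Stash from HAC to source L2",
        "Snoop / Snoop responses inside the source cluster",
        "Snoop Stash request to the L3 cache",
        "Snoop for Stash to the target L2",
        "target instruction to the target L1",
        "Snoop / Snoop responses inside the target cluster",
        "Prefetch from the target L1",
        "Prefetch forwarded to the L3 cache",
        "Fetch response to the target L2",
        "Fetch response forwarded to the target L1"};

    // Push path of the DCA flow proposed in this application.
    std::vector<std::string> dca = {
        "DCA instruction from HAC to source L2",
        "DCA instruction forwarded to the L3 cache",
        "optional Snoop for DCA / Snoop responses",
        "DCA fill to the target L2",
        "DCA fill forwarded to the target L1"};

    std::cout << "Cache Stashing push steps: " << cache_stashing.size() << "\n";
    std::cout << "DCA push steps:            " << dca.size() << "\n";
}
```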
请参阅图3,本申请实施例提供一种缓存访问方法,包括:
301、源二级缓存获取DCA指令;
本实施例中,以图1所示的多级缓存系统为例,当cluster1中的HAC需要从cluster2中的CPU核2中读取数据,或者,将数据推送到cluster2中的CPU核2中时,此时cluster1作为源cluster,cluster2作为目标cluster,而CPU核2对应的L1 Cache作为目标L1 Cahe,cluster1中的L2 Cache作为源二级缓存(源L2 Cache),cluster2中的L2 Cache作为目标二级缓存(目标L2 Cache),L3 Cache作为共享缓存,源cluster中的HAC发送直接访问缓存(Direct Cache Access,DCA)指令到源cluster的源L2 Cache,DCA指令中包括目标L1 Cache的缓存标识及需要读取或者推送的数据,源L2 Cache接收到HAC发送的DCA指令。DCA技术是:发送方可以使用DCA指令将数据直接写入到接收方,或者,发送方可以使用DCA指令从接收方中读取数据,最重要的是接收方为Cache。DMA技术是:发送方可以使用DMA指令从接收方读取数据或将数据写入接收方,而接收方是内存。那么比较DCA技术与DMA技术,由于Cache处于Memory和CPU核之间,那么显然采用DCA技术比采用DMA技术的latency问题更小。
需要说明的是,如果是cluster1中的CPU核1需要从cluster2中的CPU核2中读取数据,或者,将数据推送到cluster2中的CPU核2中时,此时DCA指令的发送者就是cluster1的CPU核1,CPU核1通过对应的L1 Cache将HAC指令转发到cluster1的L2 Cache。
302、源二级缓存将DCA指令发送至共享缓存;
本实施例中,由于源L2 Cache接收到的DCA指令中包括目标L1 Cache的缓存标识,那么源L2 Cache能够确定目标L1 Cache是cluster2中的,由于源cluster和目标cluster之间的共享缓存是L3 Cache,因此,源L2 Cache将DCA指令发送到L3 Cache。
303、共享缓存根据DCA指令生成DCA操作指令;
本实施例中,L3 Cache接收到源L2 Cache发送的DCA指令之后,根据DCA指令生成DCA操作指令,DCA操作指令可以使得目标L1 Cache写入源cluster中HAC的数据,或者,将目标L1 Cache中的数据写入源cluster的HAC中。DCA操作指令具体的形式,需要以生成DCA指令的源集群的HAC的需求为准,例如,如果HAC需要从cluster2中的CPU核2中读取数据,那么DCA指令涉及的就是读取类型的指令;如果HAC需要将数据推送到cluster2中的CPU核2中,那么DCA指令涉及的就是写入类型的指令。
304、共享缓存通过目标集群中的目标二级缓存将DCA操作指令发送至目标一级缓存。
本实施例中,L3 Cache能够根据DCA指令中携带的目标L1 Cache的缓存标识确定目标L1 Cache,从而能够确定目标L2 Cache,生成DCA操作指令之后,L3 Cache通过目标cluster中的目标L2 Cache,将DCA操作指令发送到目标L1 Cache中,从而将HAC的数据写入到目标cluster的L1 Cache中,或者从目标cluster的L1 Cache中将数据读回并写入到HAC中。
本申请实施例中,在源集群需要在目标集群读取或写入数据时,源二级缓存获取携带目标一级缓存的缓存标识的DCA指令,源二级缓存将DCA指令发送至共享缓存,共享缓存根据DCA指令生成DCA操作指令,然后通过目标集群中的目标二级缓存将DCA操作指令发送至目标一级缓存,使得目标一级缓存写入数据或将数据写入源集群。与图2所示的Cache Stashing技术相比,由于本申请实施例采用的是DCA技术,步骤有所减少,例如,以将源cluster中的HAC的数据推送进目标cluster中的目标L1 Cache中为例,Cache Stashing技术中需要通过目标L1 Cache向L3 Cache发送Prefetch,然后L3 Cache将HAC的数据携带于Fetch response中,反馈给目标L1 Cache;而本申请实施例中,L3 Cache将HAC的数据携带于DCA操作指令中,在L3 Cache通过目标cluster的L2 Cache将DCA操作指令发送到目标L1 Cache时,就实现了将HAC的数据推送进目标L1 Cache。可以明显的看出,相比于Cache Stashing技术,本申请实施例中的步骤明显减少了,因此降低latency,提升了系统的CPU性能。
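
Not part of the patent text: the short C++ sketch below illustrates one possible, assumed way the shared cache could resolve the cache identifier carried in the DCA instruction of step 301 into the cluster whose level 2 cache must relay the DCA operation instruction in step 304, using the two-cluster, two-cores-per-cluster topology of FIG. 1. The mapping itself is an illustrative assumption; the patent only requires that the shared cache can resolve the identifier.

```cpp
#include <cstdint>
#include <iostream>
#include <stdexcept>

constexpr std::uint32_t kL1PerCluster = 2;   // FIG. 1: two CPU cores, hence two L1s, per cluster

struct Route {
    std::uint32_t cluster_id;   // cluster whose L2 relays the DCA operation instruction
    std::uint32_t l1_index;     // which L1 inside that cluster is the target
};

// Assumed mapping from the cache identifier in the DCA instruction to a route.
Route resolveTargetL1(std::uint32_t target_l1_id, std::uint32_t num_clusters) {
    std::uint32_t cluster = target_l1_id / kL1PerCluster;
    if (cluster >= num_clusters)
        throw std::out_of_range("unknown target L1 cache identifier");
    return Route{cluster, target_l1_id % kL1PerCluster};
}

int main() {
    // A DCA instruction naming L1 #2 is relayed by the second cluster's L2
    // and delivered to that cluster's first L1 cache.
    Route r = resolveTargetL1(2, 2);
    std::cout << "relay via cluster " << r.cluster_id
              << ", deliver to local L1 " << r.l1_index << "\n";
}
```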
上述图3所示的实施例中,考虑到数据一致性机制可以是多级缓存系统预先保证了,也可以是按照以上图2所示的Cache Stashing技术中的方式确定数据一致性的。下面通过实施例对如何确定数据一致性的进行详细说明。
请参阅图4,本申请实施例提供一种缓存访问方法,包括:
401、源二级缓存获取DCA指令;
详情请参阅图3所示实施例的步骤301。
402、源二级缓存向源集群中的源一级缓存发送第一探查指令;
本实施例中,当源L2 Cache接收到DCA指令之后,根据数据一致性机制对源cluster的中的各个L1 Cache发起第一探查指令(即Snoop),Snoop用于进行源cluster的数据一致性操作,源cluster的中的各个源L1 Cache根据Snoop完成数据一致性操作后,会向源L2 Cache反馈第一探查回应(即Snoop response)。
403、源二级缓存获取源一级缓存反馈的第一探查回应,根据第一探查回应确定源集群具有数据一致性;
本实施例中,源L2 Cache接收源cluster的中的各个源L1 Cache反馈的Snoop response,根据Snoop response就能够确定源cluster的数据一致性操作完成,源cluster具有数据一致性。
404、源二级缓存将DCA指令发送至共享缓存;
本实施例中,由于源L2 Cache接收到的DCA指令中包括目标L1 Cache的缓存标识,那么源L2 Cache能够确定目标L1 Cache是目标cluster中的,由于源cluster和目标 cluster之间的共享缓存是L3 Cache,因此,源L2 Cache将DCA指令发送到L3 Cache。
405、共享缓存向目标集群中的目标二级缓存发送DCA探查指令;
本实施例中,当L3 Cache接收到DCA指令之后,根据数据一致性机制需要先确认目标cluster的数据一致性,因此需要向目标L2 Cache发送DCA探查指令(即Snoop for DCA)。
406、目标二级缓存根据DCA探查指令向目标集群中的所有一级缓存发送第二探查指令;
本实施例中,目标L2 Cache接收到Snoop for DCA之后,向目标cluster中所有L1 Cache发送第二探查指令(即Snoop),Snoop用于进行目标cluster的数据一致性操作,目标cluster的中的所有L1 Cache根据Snoop完成数据一致性操作后,会向目标L2 Cache反馈第二探查回应(即Snoop response)。
407、目标二级缓存接收目标集群中所有一级缓存反馈的第二探查回应,并将第二探查回应反馈至共享缓存;
本实施例中,目标L2 Cache接收目标cluster中所有L1 Cache反馈的Snoop response,并将Snoop response反馈至L3 Cache。
408、共享缓存接收目标二级缓存反馈的第二探查回应,根据第二探查回应确定目标集群具有数据一致性;
本实施例中,L3 Cache接收目标L2 Cache反馈的所有L1 Cache的Snoop response,根据Snoop response就能够确定目标cluster的数据一致性操作完成,目标cluster具有数据一致性。
409、共享缓存根据DCA指令生成DCA操作指令;
本实施例中,L3 Cache接收到源L2 Cache发送的DCA指令之后,根据DCA指令生成DCA操作指令,DCA操作指令具体的形式,需要以生成DCA指令的源cluster的HAC的需求为准,例如,如果源cluster的HAC需要将数据推送到目标cluster中的目标L1 Cache中,那么DCA指令涉及的就是写入类型的指令,而且HAC的数据携带在DCA指令中,而L3 Cache生成的DCA操作指令中就包含有HAC的数据;如果源cluster的HAC需要从目标cluster中的目标L1 Cache中读取数据,那么DCA指令涉及的就是读取类型的指令,此时L3 Cache生成的DCA操作指令是为了从目标L1 Cache读取到数据。
410、共享缓存通过目标cluster中的目标二级缓存将DCA操作指令发送至目标一级缓存。
本实施例中,L3 Cache能够根据DCA指令中携带的目标L1 Cache的缓存标识确定目标L1 Cache,从而能够确定目标L2 Cache,生成DCA操作指令之后,L3 Cache通过目标cluster中的目标L2 Cache,将步骤409中生成的DCA操作指令发送到目标L1 Cache中,使得目标L1 Cache能够根据DCA操作指令获得HAC的数据,或者,根据DCA操作指令将HAC所要读取的数据发送给L3 Cache,L3 Cache再将数据发送到HAC。
本申请实施例中,详细介绍了需要确定数据一致性时的缓存访问方法,与图2所示的Cache Stashing技术对比可以看出,本申请实施例无需执行步骤206和步骤208,因此,在确定数据一致性的时候,本申请实施例与Cache Stashing技术相比,还可以进一步的降 低latency。
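
As an editorial illustration of the probe fan-out and fan-in in steps 405 to 408 (assumed data structures rather than the patent's design), the following C++ sketch shows a target level 2 cache probing every level 1 cache in its cluster and reporting a single aggregated consistency answer back towards the shared cache.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Assumed per-L1 answer: whether the L1 finished its local consistency operation.
struct SnoopResponse { bool consistent; };

// Stand-in for one L1 cache handling the second probe instruction (step 406).
SnoopResponse l1HandleSnoop(std::size_t /*l1_index*/) {
    return SnoopResponse{true};          // in this sketch every L1 succeeds
}

// Target L2 behaviour for a Snoop for DCA (steps 406-407): probe every L1 in
// the cluster, collect the responses, and report one aggregated answer to L3.
bool targetL2SnoopForDca(std::size_t num_l1_in_cluster) {
    std::vector<SnoopResponse> responses;
    for (std::size_t i = 0; i < num_l1_in_cluster; ++i)
        responses.push_back(l1HandleSnoop(i));

    for (const SnoopResponse& r : responses)
        if (!r.consistent)
            return false;                // any miss means consistency is not yet established
    return true;                         // step 408: shared cache sees the target cluster as consistent
}

int main() {
    std::cout << std::boolalpha
              << "target cluster consistent: " << targetL2SnoopForDca(2) << "\n";
}
```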
在以上实施例中,Cache的工作方式并未考虑进去。目前Cache的工作方式包括直写式Cache方式和回写式Cache方式,直写式Cache方式是:当CPU要将数据写入内存时,除了更新Cache上的数据外,也将数据写在动态随机存取存储器(Dynamic Random Access Memory,DRAM)中,以维持Memory与Cache的数据一致性;回写式Cache方式是:每当CPU要将数据写入Memory时,只会先更新Cache上的数据,随后再让Cache在总线不塞车的时候,才把数据写回DRAM。那么多级缓存系统预先保证了数据一致性的前提下,采用的自然是回写式Cache方式,DCA指令具体为DCA回写指令;按照图2所示的Cache Stashing技术中的方式确定数据一致性的情况下,采用的是直写式Cache方式,DCA指令具体为DCA直写指令。
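
The difference between the write-through and write-back cache modes described above can be shown with a toy model; the following C++ sketch is illustrative only, with an unordered map standing in for DRAM, and is not meant to reflect any real cache implementation.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Toy model of the two cache working modes; "dram" stands in for Memory.
struct ToyCache {
    std::unordered_map<std::uint64_t, std::uint32_t> lines;
    std::unordered_map<std::uint64_t, std::uint32_t>& dram;
    bool write_through;

    void write(std::uint64_t addr, std::uint32_t value) {
        lines[addr] = value;             // the cache line is always updated
        if (write_through)
            dram[addr] = value;          // write-through: Memory is updated immediately
        // write-back: Memory is only updated later, when the line is flushed
    }

    void flush(std::uint64_t addr) {     // write-back path, "when the bus is idle"
        if (!write_through && lines.count(addr) != 0)
            dram[addr] = lines[addr];
    }
};

int main() {
    std::unordered_map<std::uint64_t, std::uint32_t> dram;
    ToyCache wt{{}, dram, true};         // write-through cache
    ToyCache wb{{}, dram, false};        // write-back cache

    wt.write(0x100, 1);                  // DRAM sees the value at once
    wb.write(0x200, 2);                  // DRAM not updated yet
    std::cout << "entries in DRAM after writes: " << dram.size() << "\n";   // prints 1
    wb.flush(0x200);
    std::cout << "entries in DRAM after flush:  " << dram.size() << "\n";   // prints 2
}
```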
下面通过(一)和(二)两个实施例,对源集群需要将数据写入目标集群的情况时,不同的Cache的工作方式下进行说明。
(一)、当源集群需要将数据写入目标集群时,DCA指令为DCA直写指令(即DCA write through),DCA操作指令为DCA填充指令(即DCA fill);
请参阅图5,本申请实施例提供一种缓存访问方法,包括:
501、源二级缓存获取DCA write through;
本实施例中,以图1所示的多级缓存系统为例,当cluster1中的HAC需要将数据写入cluster 2中的CPU核2时,cluster 1作为源cluster,cluster 2作为目标cluster,cluster1的CPU核2对应的L1 Cache作为目标L1 Cahe,cluster1中的L2 Cache作为源二级缓存(源L2 Cache),cluster2中的L2 Cache作为目标二级缓存(目标L2 Cache),L3 Cache作为共享缓存,源cluster中的HAC发起一次带有目标L1 Cache的缓存标识的DCA write through(即DCA直写指令)到源L2 Cache,并且DCA write through包含有HAC需要写入目标L1 Cache的数据,源L2 Cache接收到HAC发送的DCA write through。
502、源二级缓存向源集群中的源一级缓存发送Snoop;
本实施例中,当源L2 Cache接收到DCA write through之后,根据数据一致性机制向源cluster中的各个源L1 Cache发送Snoop,源L1 Cache根据Snoop完成数据一致性操作后,会向源L2 Cache反馈Snoop response。
503、源二级缓存获取源一级缓存反馈的Snoop response,根据Snoop response确定源集群具有数据一致性;
本实施例中,源L2 Cache接收源cluster中的各个源L1 Cache反馈的Snoop response,根据Snoop response就能够确定源cluster的数据一致性操作完成,源cluster具有数据一致性。
504、源二级缓存将DCA write through发送至共享缓存;
本实施例中,由于源L2 Cache接收到的DCA write through中包括目标L1 Cache的缓存标识,那么源L2 Cache能够确定目标L1 Cache是目标cluster中的,由于源cluster和目标cluster之间的共享缓存是L3 Cache,因此,源L2 Cache将DCA write through 发送到L3 Cache。
505、共享缓存向目标集群中的目标二级缓存发送DCA直写探查指令;
本实施例中,当L3 Cache接收到DCA write through之后,根据数据一致性机制需要先确保目标cluster具有数据一致性,因此,向目标L2 Cache发送Snoop for DCA write through,以使得目标L2 Cache对目标cluster进行数据一致性操作。
506、目标二级缓存根据Snoop for DCA write through向目标cluster中的所有一级缓存发送Snoop;
本实施例中,目标L2 Cache接收到Snoop for DCA write through之后,向目标cluster中所有L1 Cache发送Snoop,目标cluster的中的所有L1 Cache根据Snoop完成数据一致性操作后,会向目标L2 Cache反馈Snoop response。
507、目标二级缓存接收目标集群中所有一级缓存反馈的Snoop response,并将Snoop response反馈至共享缓存;
本实施例中,目标L2 Cache接收目标cluster中所有L1 Cache反馈的Snoop response,并将Snoop response反馈至L3 Cache。
508、共享缓存接收目标二级缓存反馈的Snoop response,根据Snoop response确定目标集群具有数据一致性;
本实施例中,L3 Cache接收目标L2 Cache反馈的所有L1 Cache的Snoop response,根据Snoop response就能够确定目标cluster的数据一致性操作完成,目标cluster具有数据一致性。
509、共享缓存根据DCA write through生成DCA fill;
本实施例中,L3 Cache接收到源L2 Cache发送的DCA write through之后,根据DCA write through生成DCA fill,并且从DCA write through中获得HAC的数据之后,将HAC的数据携带于DCA fill中。
510、共享缓存通过目标集群中的目标二级缓存将DCA fill发送至目标一级缓存。
本实施例中,L3 Cache根据DCA write through中携带的目标L1 Cache的缓存标识确定目标L1 Cache之后,将生成的DCA fill通过目标L2 Cache发送到目标L1 Cache,由于DCA fill中就携带有HAC的数据,因此,目标L1 Cache获取到DCA fill时,HAC的数据就已经写入到目标L1 Cache中。
(二)、当源集群需要将数据写入目标集群,并且多级缓存系统的数据一致性机制已经确保源集群和目标集群具有数据一致性时,DCA指令为DCA回写指令(即DCA write-back),DCA操作指令为DCA填充指令(即DCA fill)。
请参阅图6,本申请实施例提供一种缓存访问方法,包括:
601、源二级缓存获取DCA write-back;
本实施例中,当cluster1中的HAC需要将数据写入cluster 2中的CPU核2时,cluster 1作为源cluster,cluster 2作为目标cluster,cluster1的CPU核2对应的L1 Cache作为目标L1 Cahe,cluster1中的L2 Cache作为源二级缓存(源L2 Cache),cluster2中的L2 Cache作为目标二级缓存(目标L2 Cache),L3 Cache作为共享缓存,源cluster中 的HAC发起一次带有目标L1 Cache的缓存标识的DCA write-back到源集群的源L2 Cache,并且DCA write-back包含有HAC需要写入目标L1 Cache的数据,源L2 Cache接收到HAC发送的DCA write-back。
602、源二级缓存将DCA write-back发送至共享缓存;
本实施例中,由于源L2 Cache接收到的DCA write-back中包括目标L1 Cache的缓存标识,那么源L2 Cache能够确定目标L1 Cache是目标cluster中的,由于源cluster和目标cluster之间的共享缓存是L3 Cache,因此,源L2 Cache将DCA write-back发送到L3 Cache。
603、共享缓存根据DCA write-back生成DCA fill;
本实施例中,L3 Cache接收到源L2 Cache发送的DCA write-back之后,根据DCA write-back生成DCA fill,并且从DCA write through中获得HAC的数据之后,将HAC的数据携带于DCA fill中。
604、共享缓存通过目标集群中的目标二级缓存将DCA fill发送至目标一级缓存。
本实施例中,L3 Cache根据DCA write-back中携带的目标L1 Cache的缓存标识确定目标L1 Cache之后,将生成的DCA fill通过目标L2 Cache发送到目标L1 Cache,由于DCA fill中就携带有HAC的数据,因此,目标L1 Cache获取到DCA fill时,HAC的数据就已经写入到目标L1 Cache中。
以上图5和图6所示实施例中,详细介绍了当源集群需要将数据写入目标集群时,结合数据一致性的确定方式,选择不同的Cache工作方式,从而具体采用不同的DCA指令,DCA指令具体可以是DCA回写指令和DCA直写指令,使得方案的实施更加多样化。
上述图5和图6的实施例中介绍的是源集群需要将数据写入目标集群时,下面通过实施例对源集群需要从目标集群读取数据进行说明,而以上描述的实施例中介绍的都是每个集群中包括了两个以上的L1 Cache的情形,那么需要进行数据一致性操作,而如果每个集群中只具有一个L1 Cache或者多级缓存系统的Cache一致性能够保证时,HAC需要从目标集群的CPU核读取数据的话,DCA指令为DCA read,而目标集群的CPU核对应的L1 Cache实际上需要通过回写方式将数据写入到HAC中。下面通过实施例进行详细说明。
可选的,本申请的一些实施例中,当源集群需要从目标集群读取数据,并且多级缓存系统不要求缓存一致性时,DCA指令为DCA读取指令,DCA操作指令为探查回写指令,
共享缓存根据DCA指令生成DCA操作指令,包括:
共享缓存根据DCA读取指令中的缓存标识确定目标一级缓存;
共享缓存生成探查回写指令,探查回写指令用于指示目标二级缓存从目标一级缓存读取数据。
本申请实施例中,当源集群需要从目标集群读取数据,并且多级缓存系统不要求缓存一致性时,共享缓存根据DCA指令生成DCA操作指令具体为:L3 Cache根据DCA read中的缓存标识确定目标L1 Cache,L3 Cache生成Snoop to writeback,Snoop to writeback用于指示目标L2 Cache从目标L1 Cache读取数据。
可选的,本申请的一些实施例中,共享缓存通过目标集群中的目标二级缓存将DCA操作指令发送至目标一级缓存之后,还包括:
目标一级缓存接收探查回写指令;
目标一级缓存根据探查回写指令向目标二级缓存反馈第三探查回应,第三探查回应包括数据;
目标二级缓存将第三探查回应转发至共享缓存;
共享缓存根据第三探查回应生成DCA读取回应,DCA读取回应包括数据;
共享缓存将DCA读取回应发送至源二级缓存,使得源二级缓存根据DCA读取回应得到目标一级缓存的数据。
本申请实施例中,在目标L1 Cache接收到Snoop to writeback之后,目标L1 Cache根据Snoop to writeback向目标L2 Cache反馈Snoop response,并且将HAC需要读取的数据包含在Snoop response中,目标L2 Cache将Snoop response转发至L3 Cache,L3 Cache根据Snoop response生成DCA read response,并且DCA read response中包括有Snoop response中的数据,L3 Cache将DCA read response发送至源L2 Cache,使得源L2 Cache接收到DCA read response后,就能得到DCA read response中包含的HAC需要读取的目标L1 Cache中的数据,然后源L2 Cache再将DCA read response反馈给发送DCA read的HAC,实现源cluster的HAC读取到目标cluster的目标L1 Cache中的数据。
上述实施例介绍的是多级缓存方法,下面对多级缓存方法应用的多级缓存系统进行详细说明。
请参阅图7,本申请实施例提供一种多级缓存系统,包括:
共享缓存701及至少两个集群,至少两个集群中包括源集群702和目标集群703,源集群702包括源一级缓存7021及源二级缓存7022,目标集群703包括目标一级缓存7031及目标二级缓存7032;
源二级缓存7022,用于当源集群702需要在目标集群703读取或写入数据时,获取DCA指令,源二级缓存7022为源集群702中的二级缓存,DCA指令包括目标集群703中的目标一级缓存7031的缓存标识;
源二级缓存7022,还用于将DCA指令发送至共享缓存701;
共享缓存701,用于根据DCA指令生成DCA操作指令;
共享缓存701,还用于通过目标集群703中的目标二级缓存7032将DCA操作指令发送至目标一级缓存7031,使得目标一级缓存7031写入数据或将数据写入源集群702。
本申请实施例中,在源集群702需要在目标集群703读取或写入数据时,源二级缓存7022获取携带目标一级缓存7031的缓存标识的DCA指令,源二级缓存7022将DCA指令发送至共享缓存701,共享缓存701根据DCA指令生成DCA操作指令,然后通过目标集群703中的目标二级缓存7032将DCA操作指令发送至目标一级缓存7031,使得目标一级缓存7031写入数据或将数据写入源集群702。与图2所示的Cache Stashing技术相比,由于本申请实施例采用的是DCA技术,步骤有所减少,例如,以将源cluster中的HAC的数据推送进 目标cluster中的目标L1 Cache中为例,Cache Stashing技术中需要通过目标L1 Cache向L3 Cache发送Prefetch,然后L3 Cache将HAC的数据携带于Fetch response中,反馈给目标L1 Cache;而本申请实施例中,L3 Cache将HAC的数据携带于DCA操作指令中,在L3 Cache通过目标cluster的L2 Cache将DCA操作指令发送到目标L1 Cache时,就实现了将HAC的数据推送进目标L1 Cache。可以明显的看出,相比于Cache Stashing技术,本申请实施例中的步骤明显减少了,因此降低latency,提升了系统的CPU性能。
可选的,本申请的一些实施例中,
源二级缓存7022,还用于向源集群702中的源一级缓存7021发送第一探查指令,使得源一级缓存7021反馈第一探查回应,第一探查指令用于进行源集群702的数据一致性操作;
源二级缓存7022,还用于接收源一级缓存7021反馈的第一探查回应,根据第一探查回应确定源集群702具有数据一致性。
本申请实施例中,采用图2所示的Cache Stashing技术的方式确定数据一致性,具体为源二级缓存7022向源集群702中的源一级缓存7021发送第一探查指令,使得源一级缓存7021反馈第一探查回应,源二级缓存7022接收到源一级缓存7021反馈的第一探查回应后,根据第一探查回应就能够确定源集群702具有数据一致性。
可选的,本申请的一些实施例中,
共享缓存701,还用于向目标集群703中的目标二级缓存7032发送DCA探查指令;
目标二级缓存7032,用于根据DCA探查指令向目标集群703中的所有一级缓存发送第二探查指令,使得目标集群703中的所有一级缓存反馈第二探查回应,第二探查指令用于进行目标集群703的数据一致性操作;
目标二级缓存7032,还用于接收目标集群703中的所有一级缓存反馈的第二探查回应,并将第二探查回应反馈至共享缓存701;
共享缓存701,还用于接收目标二级缓存7032反馈的第二探查回应,根据第二探查回应确定目标集群703具有数据一致性。
本申请实施例中,采用图2所示的Cache Stashing技术的方式确定数据一致性,根据图4所示的缓存访问方法实施例中可以得到,与图2所示的Cache Stashing技术对比可以看出,本申请无需执行步骤206和步骤208,因此,在确定数据一致性的时候,还可以进一步的降低latency,提升了系统的CPU性能。
可选的,本申请的一些实施例中,当源集群702需要将数据写入目标集群703时,DCA指令为DCA直写指令,DCA直写指令还包括数据,DCA操作指令为DCA填充指令,
共享缓存701,还用于根据DCA直写指令中的缓存标识确定目标一级缓存7031,并获取DCA直写指令中的数据;
共享缓存701,还用于生成DCA填充指令,DCA填充指令包括数据,DCA填充指令用于直接将数据写入目标一级缓存7031。
本申请实施例中,目前Cache的工作方式包括直写式Cache方式和回写式Cache方式,直写式Cache方式是:当CPU要将数据写入内存时,除了更新Cache上的数据外,也将数 据写在DRAM中,以维持Memory与Cache的数据一致性;回写式Cache方式是:每当CPU要将数据写入Memory时,只会先更新Cache上的数据,随后再让Cache在总线不塞车的时候,才把数据写回DRAM。那么多级缓存系统预先保证了数据一致性的前提下,采用的自然是回写式Cache方式,DCA指令具体为DCA回写指令;按照图2所示的Cache Stashing技术中的方式确定数据一致性的情况下,采用的是直写式Cache方式,DCA指令具体为DCA直写指令。因此,当源集群702需要将数据写入目标集群703时,DCA指令可以为DCA直写指令或DCA回写指令,当源集群702需要将数据写入目标集群703时,并且DCA指令为DCA直写指令时,共享缓存701根据DCA直写指令中的缓存标识确定目标一级缓存7031,生成DCA填充指令,DCA填充指令包含源集群702的数据,那么将DCA填充指令发送至目标一级缓存7031,就能实现直接将数据写入目标集群703。
可选的,本申请的一些实施例中,当源集群702需要将数据写入目标集群703,并且源集群702和目标集群703具有数据一致性时,DCA指令为DCA回写指令,DCA回写指令还包括数据,DCA操作指令为DCA填充指令,
共享缓存701,还用于根据DCA回写指令中的缓存标识确定目标一级缓存7031,并获取DCA回写指令中的数据;
共享缓存701,还用于生成DCA填充指令,DCA填充指令包括数据,DCA填充指令用于直接将数据写入目标一级缓存7031。
本申请实施例中,目前Cache的工作方式包括直写式Cache方式和回写式Cache方式,直写式Cache方式是:当CPU要将数据写入内存时,除了更新Cache上的数据外,也将数据写在DRAM中,以维持Memory与Cache的数据一致性;回写式Cache方式是:每当CPU要将数据写入Memory时,只会先更新Cache上的数据,随后再让Cache在总线不塞车的时候,才把数据写回DRAM。那么多级缓存系统预先保证了数据一致性的前提下,采用的自然是回写式Cache方式,DCA指令具体为DCA回写指令;按照图2所示的Cache Stashing技术中的方式确定数据一致性的情况下,采用的是直写式Cache方式,DCA指令具体为DCA直写指令。因此,当源集群702需要将数据写入目标集群703,并且源集群702和目标集群703具有数据一致性时,DCA指令可以为DCA直写指令或DCA回写指令,当源集群702需要将数据写入目标集群703时,并且DCA指令为DCA回写指令时,DCA操作指令为DCA填充指令,共享缓存701根据DCA回写指令中的缓存标识确定目标一级缓存7031,生成DCA填充指令,DCA填充指令包含源集群702的数据,那么将DCA填充指令发送至目标一级缓存7031,就能实现直接将数据写入目标集群703。
可选的,本申请的一些实施例中,当源集群702需要从目标集群703读取数据,并且多级缓存系统不要求缓存一致性时,DCA指令为DCA读取指令,DCA操作指令为探查回写指令,
共享缓存701,还用于根据DCA读取指令中的缓存标识确定目标一级缓存7031;
共享缓存701,还用于生成探查回写指令,探查回写指令用于指示目标二级缓存7032从目标一级缓存7031读取数据。
本申请实施例中,当源集群702需要从目标集群703读取数据,并且多级缓存系统不 要求Cache一致性时,具体情况可以是,多级缓存系统的每个集群中只具有一个CPU核,即只有一个一级缓存,此时Cache一致性不需要确定,或者多级缓存系统的Cache一致性通过预先的设置能够保证了,共享缓存701根据DCA指令生成DCA操作指令具体为:共享缓存701根据DCA read中的缓存标识确定目标一级缓存7031,共享缓存701生成Snoop to writeback,Snoop to writeback用于指示目标二级缓存7032从目标一级缓存7031读取数据。
可选的,本申请的一些实施例中,
目标一级缓存7031,用于接收探查回写指令;
目标一级缓存7031,还用于根据探查回写指令向目标二级缓存7032反馈第三探查回应,第三探查回应包括数据;
目标二级缓存7032,还用于将第三探查回应转发至共享缓存701;
共享缓存701,还用于根据第三探查回应生成DCA读取回应,生成DCA读取回应,DCA读取回应包括数据;
共享缓存701,还用于将DCA读取回应发送至源二级缓存7022,使得源二级缓存7022根据DCA读取回应得到目标一级缓存7031中的数据。
本申请实施例中,目标一级缓存7031接收Snoop to writeback之后,目标一级缓存7031根据Snoop to writeback向共享缓存701反馈Snoop response,共享缓存701根据Snoop response生成DCA read response,并将DCA read response发送至源二级缓存7022,使得源二级缓存7022根据DCA read response读取到目标一级缓存7031中的数据,然后源集群702的HAC或者源集群702的需求该数据的一级缓存7021从源二级缓存7022获得数据,从而完成源集群702从目标集群703的Cache中读取数据。
如图8所示,本申请实施例提供一种计算机系统800,包括:
外存82及多级缓存系统81,外存82及多级缓存系统81通过总线连接;
多级缓存系统81包括共享缓存801及至少两个集群,每个集群具有至少一个一级缓存及二级缓存,至少两个集群中包括源集群802和目标集群803,源集群802包括源一级缓存8021及源二级缓存8022,目标集群803包括目标一级缓存8031及目标二级缓存8032;
源二级缓存8022,用于当源集群802需要在目标集群803读取或写入数据时,接收DCA指令,源二级缓存8022为源集群802中的二级缓存,DCA指令包括目标集群803中的目标一级缓存8031的缓存标识;
源二级缓存8022,还用于将DCA指令发送至共享缓存801;
共享缓存801,用于根据DCA指令生成DCA操作指令;
共享缓存801,还用于通过目标集群803中的目标二级缓存8032将DCA操作指令发送至目标一级缓存8031,使得目标一级缓存8031写入数据或将数据写入源集群802。
本申请实施例中,在源集群802需要在目标集群803读取或写入数据时,源二级缓存802接收携带目标一级缓存8031的缓存标识的DCA指令,源二级缓存8022将DCA指令发 送至共享缓存801,共享缓存801根据DCA指令生成DCA操作指令,然后通过目标集群803中的目标二级缓存8032将DCA操作指令发送至目标一级缓存8031,使得目标一级缓存8031写入数据或将数据写入源集群802。与图2所示的Cache Stashing技术相比,由于采用的是DCA技术,步骤有所减少,例如,将HAC的数据推送进目标L1 Cache中,Cache Stashing技术中需要通过Prefetch-Fetch response的操作来,而本申请只需要通过DCA操作指令直接将,HAC的数据推送进目标L1 Cache。因此,本申请可以降低latency,提升了系统的CPU性能。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
本发明实施例还提供一种实现访问请求处理方法的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令用于执行前述任意一个方法实施例所述的方法流程。本领域普通技术人员可以理解,前述的存储介质包括:U盘、移动硬盘、磁碟、光盘、随机存储器(Random-Access Memory,RAM)、固态硬盘(Solid State Disk,SSD)或者其他非易失性存储器(non-volatile memory)等各种可以存储程序代码的非短暂性的(non-transitory)机器可读介质。
需要说明的是,本申请所提供的实施例仅仅是示意性的。所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。在本发明实施例、权利要求以及附图中揭示的特征可以独立存在也可以组合存在。在本发明实施例中以硬件形式描述的特征可以通过软件来执行,反之亦然。在此不做限定。

Claims (15)

  1. 一种缓存访问方法,其特征在于,应用于多级缓存系统,所述多级缓存系统包括共享缓存及至少两个集群,每个集群具有至少一个一级缓存及二级缓存,所述方法包括:
    当源集群需要在目标集群读取或写入数据时,源二级缓存获取直接访问缓存DCA指令,所述源二级缓存为所述源集群的二级缓存,所述DCA指令包括所述目标集群中的目标一级缓存的缓存标识;
    所述源二级缓存将所述DCA指令发送至所述共享缓存;
    所述共享缓存根据所述DCA指令生成DCA操作指令;
    所述共享缓存通过所述目标集群中的目标二级缓存将所述DCA操作指令发送至所述目标一级缓存,使得所述目标一级缓存写入所述数据或将所述数据写入所述源集群。
  2. 根据权利要求1所述的方法,其特征在于,所述源二级缓存将所述DCA指令发送至所述共享缓存之前,还包括:
    所述源二级缓存向所述源集群中的源一级缓存发送第一探查指令,使得所述源一级缓存反馈第一探查回应,所述第一探查指令用于进行所述源集群的数据一致性操作;
    所述源二级缓存获取所述源一级缓存反馈的第一探查回应,根据所述第一探查回应确定所述源集群具有数据一致性。
  3. 根据权利要求2所述的方法,其特征在于,所述共享缓存根据所述DCA指令生成DCA操作指令之前,还包括:
    所述共享缓存向所述目标集群中的目标二级缓存发送DCA探查指令;
    所述目标二级缓存根据所述DCA探查指令向所述目标集群中的所有一级缓存发送第二探查指令,使得所述目标集群中的所有一级缓存反馈第二探查回应,所述第二探查指令用于进行所述目标集群的数据一致性操作;
    所述目标二级缓存接收所述目标集群中的所有一级缓存反馈的第二探查回应,并将所述第二探查回应反馈至所述共享缓存;
    所述共享缓存接收所述目标二级缓存反馈的第二探查回应,根据所述第二探查回应确定所述目标集群具有数据一致性。
  4. 根据权利要求3所述的方法,其特征在于,当所述源集群需要将数据写入所述目标集群时,所述DCA指令为DCA直写指令,所述DCA直写指令还包括所述数据,所述DCA操作指令为DCA填充指令,
    所述共享缓存根据所述DCA指令生成DCA操作指令,包括:
    所述共享缓存根据所述DCA直写指令中的所述缓存标识确定所述目标一级缓存,并获取所述DCA直写指令中的所述数据;
    所述共享缓存生成DCA填充指令,所述DCA填充指令包括所述数据,所述DCA填充指令用于直接将所述数据写入所述目标一级缓存。
  5. 根据权利要求1所述的方法,其特征在于,当所述源集群需要将数据写入所述目标集群,并且所述源集群和所述目标集群具有数据一致性时,所述DCA指令为DCA回写指令,所述DCA回写指令还包括所述数据,所述DCA操作指令为DCA填充指令,
    所述共享缓存根据所述DCA指令生成DCA操作指令,包括:
    所述共享缓存根据所述DCA回写指令中的所述缓存标识确定所述目标一级缓存,并获取所述DCA回写指令中的所述数据;
    所述共享缓存生成DCA填充指令,所述DCA填充指令包括所述数据,所述DCA填充指令用于直接将所述数据写入所述目标一级缓存。
  6. 根据权利要求3所述的方法,其特征在于,当所述源集群需要从所述目标集群读取数据,并且所述多级缓存系统不要求缓存一致性时,所述DCA指令为DCA读取指令,所述DCA操作指令为探查回写指令,
    所述共享缓存根据所述DCA指令生成DCA操作指令,包括:
    所述共享缓存根据所述DCA读取指令中的所述缓存标识确定所述目标一级缓存;
    所述共享缓存生成探查回写指令,所述探查回写指令用于指示所述目标二级缓存从所述目标一级缓存读取所述数据。
  7. 根据权利要求6所述的方法,其特征在于,所述共享缓存通过所述目标集群中的目标二级缓存将所述DCA操作指令发送至所述目标一级缓存之后,还包括:
    所述目标一级缓存接收所述探查回写指令;
    所述目标一级缓存根据所述探查回写指令向所述目标二级缓存反馈第三探查回应,所述第三探查回应包括所述数据;
    所述目标二级缓存将所述第三探查回应转发至所述共享缓存;
    所述共享缓存根据所述第三探查回应生成DCA读取回应,所述DCA读取回应包括所述数据;
    所述共享缓存将所述DCA读取回应发送至所述源二级缓存,使得所述源二级缓存根据所述DCA读取回应得到所述目标一级缓存的所述数据。
  8. 一种多级缓存系统,其特征在于,包括:
    共享缓存及至少两个集群,每个集群具有至少一个一级缓存及二级缓存;
    源二级缓存,用于当所述源集群需要在所述目标集群读取或写入数据时,获取直接访问缓存DCA指令,所述源二级缓存为所述源集群中的二级缓存,所述DCA指令包括所述目标集群中的目标一级缓存的缓存标识;
    所述源二级缓存,还用于将所述DCA指令发送至所述共享缓存;
    所述共享缓存,用于根据所述DCA指令生成DCA操作指令;
    所述共享缓存,还用于通过所述目标集群中的目标二级缓存将所述DCA操作指令发送至所述目标一级缓存,使得所述目标一级缓存写入所述数据或将所述数据写入所述源集群。
  9. 根据权利要求8所述的系统,其特征在于,
    所述源二级缓存,还用于向所述源集群中的源一级缓存发送第一探查指令,使得所述源一级缓存反馈第一探查回应,所述第一探查指令用于进行所述源集群的数据一致性操作;
    所述源二级缓存,还用于接收所述源一级缓存反馈的第一探查回应,根据所述第一探查回应确定所述源集群具有数据一致性。
  10. 根据权利要求9所述的系统,其特征在于,
    所述共享缓存,还用于向所述目标集群中的目标二级缓存发送DCA探查指令;
    所述目标二级缓存,用于根据所述DCA探查指令向所述目标集群中的所有一级缓存发送第二探查指令,使得所述目标集群中的所有一级缓存反馈第二探查回应,所述第二探查指令用于进行所述目标集群的数据一致性操作;
    所述目标二级缓存,还用于接收所述目标集群中的所有目标一级缓存反馈的第二探查回应,并将所述第二探查回应反馈至所述共享缓存;
    所述共享缓存,还用于接收所述目标二级缓存反馈的第二探查回应,根据所述第二探查回应确定所述目标集群具有数据一致性。
  11. 根据权利要求10所述的系统,其特征在于,当所述源集群需要将数据写入所述目标集群时,所述DCA指令为DCA直写指令,所述DCA直写指令还包括所述数据,所述DCA操作指令为DCA填充指令,
    所述共享缓存,还用于根据所述DCA直写指令中的所述缓存标识确定所述目标一级缓存,并获取所述DCA直写指令中的所述数据;
    所述共享缓存,还用于生成DCA填充指令,所述DCA填充指令包括所述数据,所述DCA填充指令用于直接将所述数据写入所述目标一级缓存。
  12. 根据权利要求8所述的系统,其特征在于,所述源集群需要将数据写入所述目标集群,并且所述源集群和所述目标集群具有数据一致性时,所述DCA指令为DCA回写指令,所述DCA回写指令还包括所述数据,所述DCA操作指令为DCA填充指令,
    所述共享缓存,还用于根据所述DCA回写指令中的所述缓存标识确定所述目标一级缓存,并获取所述DCA回写指令中的所述数据;
    所述共享缓存,还用于生成DCA填充指令,所述DCA填充指令包括所述数据,所述DCA填充指令用于直接将所述数据写入所述目标一级缓存。
  13. 根据权利要求12所述的系统,其特征在于,当所述源集群需要从所述目标集群读取数据,并且所述多级缓存系统不要求缓存一致性时,所述DCA指令为DCA读取指令,所述DCA操作指令为探查回写指令,
    所述共享缓存,还用于根据所述DCA读取指令中的所述缓存标识确定所述目标一级缓存;
    所述共享缓存,还用于生成探查回写指令,所述探查回写指令用于指示所述目标二级缓存从所述目标一级缓存读取所述数据。
  14. 根据权利要求13所述的系统,其特征在于,
    所述目标一级缓存,用于接收所述探查回写指令;
    所述目标一级缓存,还用于根据所述探查回写指令向所述目标二级缓存反馈第三探查回应,所述第三探查回应包括所述数据;
    所述目标二级缓存,还用于将所述第三探查回应转发至所述共享缓存;
    所述共享缓存,还用于根据所述第三探查回应生成DCA读取回应,所述DCA读取回应包括所述数据;
    所述共享缓存,还用于将所述DCA读取回应发送至所述源二级缓存,使得所述源二级 缓存根据所述DCA读取回应得到所述目标一级缓存的所述数据。
  15. 一种计算机系统,其特征在于,包括:
    外存及多级缓存系统,所述外存与所述多级缓存系统通过总线连接;
    所述多级缓存系统包括共享缓存及至少两个集群,每个集群具有至少一个一级缓存及二级缓存;
    当所述源集群需要在所述目标集群读取或写入数据时,所述源二级缓存获取直接访问缓存DCA指令,所述源二级缓存为所述集群中的二级缓存,所述DCA指令包括所述目标集群中的目标一级缓存的缓存标识;
    所述源二级缓存将所述DCA指令发送至所述共享缓存;
    所述共享缓存根据所述DCA指令生成DCA操作指令;
    所述共享缓存通过所述目标集群中的目标二级缓存将所述DCA操作指令发送至所述目标一级缓存,使得所述目标一级缓存写入所述数据或将所述数据写入所述集群。
PCT/CN2018/105010 2017-11-02 2018-09-11 一种缓存访问方法、多级缓存系统及计算机系统 WO2019085649A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711063243.XA CN109753445B (zh) 2017-11-02 2017-11-02 一种缓存访问方法、多级缓存系统及计算机系统
CN201711063243.X 2017-11-02

Publications (1)

Publication Number Publication Date
WO2019085649A1 true WO2019085649A1 (zh) 2019-05-09

Family

ID=66332832

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/105010 WO2019085649A1 (zh) 2017-11-02 2018-09-11 一种缓存访问方法、多级缓存系统及计算机系统

Country Status (2)

Country Link
CN (1) CN109753445B (zh)
WO (1) WO2019085649A1 (zh)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11023375B1 (en) * 2020-02-21 2021-06-01 SiFive, Inc. Data cache with hybrid writeback and writethrough
CN112416251B (zh) * 2020-11-24 2023-02-10 上海壁仞智能科技有限公司 计算系统
CN115174673B (zh) * 2022-06-29 2023-11-03 北京奕斯伟计算技术股份有限公司 具备低延迟处理器的数据处理装置、数据处理方法及设备
CN115858408A (zh) * 2022-12-29 2023-03-28 南京维拓科技股份有限公司 一种工业设计流程中设计参数的传递方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7555597B2 (en) * 2006-09-08 2009-06-30 Intel Corporation Direct cache access in multiple core processors
CN103959239A (zh) * 2011-11-30 2014-07-30 英特尔公司 对使用前缀的isa指令的条件执行支持
US20140297960A1 (en) * 2011-01-21 2014-10-02 Commissariat A L'energie Atomique Et Aux Energies Alternatives Multi-core system and method of data consistency
CN105740164A (zh) * 2014-12-10 2016-07-06 阿里巴巴集团控股有限公司 支持缓存一致性的多核处理器、读写方法、装置及设备

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7334089B2 (en) * 2003-05-20 2008-02-19 Newisys, Inc. Methods and apparatus for providing cache state information
CN104346294B (zh) * 2013-07-31 2017-08-25 华为技术有限公司 基于多级缓存的数据读/写方法、装置和计算机系统


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022129869A1 (en) * 2020-12-16 2022-06-23 Arm Limited Cache stash relay
GB2616399A (en) * 2020-12-16 2023-09-06 Advanced Risc Mach Ltd Cache stash relay

Also Published As

Publication number Publication date
CN109753445A (zh) 2019-05-14
CN109753445B (zh) 2022-12-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18872618

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18872618

Country of ref document: EP

Kind code of ref document: A1