WO2022246848A1 - Distributed cache system and data caching method - Google Patents

Distributed cache system and data caching method

Info

Publication number
WO2022246848A1
WO2022246848A1 (PCT/CN2021/096988)
Authority
WO
WIPO (PCT)
Prior art keywords
data
time
node
request
cache
Prior art date
Application number
PCT/CN2021/096988
Other languages
English (en)
French (fr)
Inventor
何涛
于东浩
兰可嘉
李瑛
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202180093084.6A priority Critical patent/CN116848516A/zh
Priority to PCT/CN2021/096988 priority patent/WO2022246848A1/zh
Publication of WO2022246848A1 publication Critical patent/WO2022246848A1/zh

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 — Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 — Addressing or allocation; Relocation
    • G06F 12/08 — Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0804 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating

Definitions

  • the present application relates to the field of storage, in particular to a distributed cache system and a data cache method.
  • the distributed cache system includes a data management node and multiple data request nodes.
  • the data management node is responsible for managing the data in the address space (multiple addresses). Multiple data request nodes can request the data management node to cache data in a certain address. That is, multiple copies of the same data can be cached on multiple data request nodes.
  • to ensure cache consistency between the multiple copies and the original data, if the data at the address becomes invalid (for example, a write operation occurs), the data management node sends a message to each data request node that has cached a copy, notifying it that the data at the address has expired. When the number of data request nodes that must be notified is large, the bandwidth of the entire system drops significantly.
  • Embodiments of the present application provide a distributed cache system and a data cache method, which are used to increase the bandwidth of the distributed cache system.
  • a distributed cache system including a data management node, a data request node, and a memory.
  • the data management node is used to manage the cache coherence of the data in the memory; the data request node is used to send a first request message to the data management node.
  • the first request message includes the first time and the first address in the memory, and is used to request caching of the target data at the first address; the first time is used to indicate the expiration time of the target data cached in the data request node; the data management node is further used to send a first response message to the data request node and to update a second time according to the first time, where the first response message includes the target data, and the second time is used to indicate the latest expiration time at which the target data is cached by other nodes.
  • when the data request node requests the data management node to cache data, the two complete the negotiation of the expiration time of the data cached in the data request node.
  • when the expiration time is reached, the data cached in the data request node is invalidated automatically, and no further invalidation interaction between the data management node and the storage nodes is required, so the system bandwidth is not reduced; moreover, only one maximum expiration time is recorded per address, so the resource overhead is small.
  • the data management node includes a time home agent and a home agent; the home agent is used to perform cache coherence management on the data in the memory; the time home agent is used to receive the first request message, acquire the target data from the home agent, send the first response message, and update the second time according to the first time.
  • the home agent remains responsible for cache coherence management, that is, it still communicates using the MESI protocol for compatibility with the existing technology; the newly added time home agent is responsible for the communication of the timestamp protocol (i.e., time management) and also communicates with the home agent according to the MESI protocol.
  • the first time may be a relative time, in which case the first response message further includes a third time, where the third time is the first time minus the transmission delay between the data management node and the data request node. This makes it convenient for the data request node to determine the absolute time at which the cached target data expires.
  • the first time is an absolute time.
  • the data management node further includes a first caching agent; the first caching agent is used to request the home agent to exclusively read the target data before the second time; the home agent is also used to request the time home agent to invalidate the target data cached by other nodes; the time home agent is further used to indicate to the home agent, after the second time, that the target data cached by other nodes has become invalid; the home agent is also used to send the target data to the first caching agent. That is to say, when the caching agent in the data management node requests to exclusively read the target data, the home agent sends the target data to the first caching agent only after the target data cached by all other nodes has become invalid.
  • the data request node is further configured to request the data management node to cache the target data at the first address after the first caching agent requests to exclusively read the target data and before the second time;
  • the data management node is further configured to send the target data to the data request node after the second time. That is, before the second time, new data request nodes are no longer allowed to obtain the target data at the first address for caching; the data request node can be instructed to keep re-requesting until the second time ends, or the request can be blocked and the target data at the first address returned to the data request node only after the second time. This prevents the second time from being extended and prevents the first caching agent in the data management node from being unable to read the target data exclusively as soon as possible.
  • the data request node includes a time caching agent and a second caching agent; the second caching agent is used to request the time caching agent to cache the target data; the time caching agent is used to send the first request message, receive the first response message, and send the target data to the second caching agent.
  • the second caching agent remains responsible for cache coherence management, that is, it still communicates using the MESI protocol for compatibility with the existing technology; the newly added time caching agent is responsible for the communication of the timestamp protocol (i.e., time management) and also communicates with the second caching agent according to the MESI protocol.
  • the time caching agent is further configured to request the second caching agent to invalidate the cached target data after the first time. Once the data request node reaches the first time, the cached target data expires.
  • a data caching method, including: a data request node sends a first request message to a data management node, where the first request message includes a first time and a first address in a memory, and the first request message is used to request caching of the target data at the first address; the first time is used to indicate the expiration time of the target data cached in the data request node; the data management node is used to perform cache coherence management on the data in the memory; the data management node sends a first response message to the data request node and updates a second time according to the first time, where the first response message includes the target data, and the second time is used to indicate the latest expiration time at which the target data is cached by other nodes.
  • that the data management node sends the first response message to the data request node and updates the second time according to the first time includes: the time home agent of the data management node acquires the target data from the home agent of the data management node, sends the first response message, and updates the second time according to the first time, where the home agent is used to perform cache coherence management on the data in the memory.
  • the first time may be a relative time, in which case the first response message further includes a third time, where the third time is the first time minus the transmission delay between the data management node and the data request node.
  • the first time is an absolute time.
  • the first caching agent of the data management node requests the home agent to exclusively read the target data before the second time; the home agent requests the time home agent to invalidate the target data cached by other nodes; the time home agent indicates to the home agent, after the second time, that the target data cached by other nodes has become invalid; the home agent sends the target data to the first caching agent.
  • the data request node requests the data management node to cache the target data at the first address after the first caching agent requests to exclusively read the target data and before the second time;
  • the data management node sends the target data to the data request node after the second time.
  • that the data request node sends the first request message to the data management node includes: the second caching agent of the data request node requests the time caching agent of the data request node to cache the target data;
  • the time caching agent sends the first request message to the data management node.
  • the method further includes: the time caching agent requests the second caching agent to invalidate the cached target data after the first time.
  • a computer-readable storage medium, where instructions are stored in the computer-readable storage medium; the instructions run on a distributed cache system, so that the distributed cache system executes the method described in the second aspect and any implementation thereof.
  • a computer program product including instructions is provided, and the instructions run on a distributed cache system, so that the distributed cache system executes the method described in the second aspect and any implementation manner thereof.
  • FIG. 1 is a schematic structural diagram of a processor with multiple cores provided in an embodiment of the present application
  • FIG. 2 is a schematic diagram of the architecture of a distributed cache system provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of another distributed cache system provided by an embodiment of the present application.
  • FIG. 4 is a first schematic flow diagram of a data caching method provided by an embodiment of the present application.
  • FIG. 5 is a second schematic flow diagram of a data caching method provided by an embodiment of the present application.
  • FIG. 6 is a third schematic flow diagram of a data caching method provided by an embodiment of the present application.
  • FIG. 7 is a fourth schematic flowchart of a data caching method provided by an embodiment of the present application.
  • cache coherence: the operating speed of the processor is much faster than the read/write speed of the memory. If the processor, when reading from or writing to the memory, waited for the operation to complete before handling other tasks, its efficiency would be reduced.
  • therefore, a cache (such as a level-1 or level-2 cache) can be placed between the core of the processor and the memory; the read/write speed of the cache is faster than that of the memory but slower than the operating speed of the processor.
  • when the processor writes data to the memory, it can first write the data into the cache and then handle other tasks, and a direct memory access (DMA) device stores the data into the memory; similarly, when the processor reads data from the memory, the DMA device first writes the data from the memory into the cache, and the processor then reads the data from the cache.
  • a cache 12 is provided for each core; when different cores of the same processor store copies of the data at the same address in the memory 13 through their respective caches 12, these copies have a cache coherence problem with the original data. When the original data is modified (i.e., invalidated), the copies must be updated again, otherwise the cached copies become inconsistent with the original data.
  • the distributed cache system includes a data management node 21 , a data request node 22 and a memory 23 .
  • both the data management node 21 and the data request node 22 include a core (not shown) and a cache.
  • the data management node 21 is responsible for cache coherence management of the data in the memory 23; when multiple caches (including the cache in the data management node 21 and the caches in the data request nodes 22) store copies of the data at the same address in the memory 23, these copies likewise have a cache coherence problem with the original data.
  • devices that require cache coherence can follow the modified exclusive shared invalid (MESI) protocol, which defines four states for a cache line (the smallest caching unit): the exclusive (E) state, the modified (M) state, the shared (S) state, and the invalid (I) state.
  • the E state indicates that the cache line is valid, the data in the cache is consistent with the data in the memory, and the data exists only in the local cache; such cached data can be called E-state data, and the data management node 21 has the permission to apply for the E state.
  • the M state indicates that the cache line is valid and the data has been modified, so the data in the cache is inconsistent with the data in the memory; such cached data can be called M-state data, and the data management node 21 has the permission to apply for the M state.
  • the S state indicates that the cache line is valid, the data in the cache is consistent with the data in the memory, and the data exists in multiple caches; such cached data can be called S-state data, and both the data management node 21 and the data request node 22 have the permission to apply for the S state.
  • the I state indicates that the cache line is invalid, that is, the data is not stored in the cache but used directly; such data can be called I-state data, and both the data management node 21 and the data request node 22 have the permission to apply for the I state.
  • the processor involved in this embodiment of the present application may be a chip.
  • for example, it can be a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a microcontroller unit (MCU), a programmable logic device (PLD), or another integrated chip.
  • the memory involved in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory may be flash memory (Flash).
  • Volatile memory can be random access memory (RAM), which acts as external cache memory.
  • by way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate (DDR) memory, enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • in the distributed cache system, the module in the data management node 21 that is responsible for cache coherence management of the data in the memory is the home agent (HA), and the module in the data management node 21 and in each data request node 22 that is responsible for MESI state management of the corresponding cache is a cache agent (CA).
  • for example, CA0 in processor 0 is responsible for MESI state management of cache 0 in processor 0, CA1 in processor 1 for cache 1 in processor 1, CA2 in processor 2 for cache 2 in processor 2, and CA3 in processor 3 for cache 3 in processor 3.
  • when data is to be stored from a cache to an address in the memory, or from an address in the memory into a cache, the corresponding CA requests the corresponding permission from the HA according to the MESI protocol to guarantee cache coherence.
  • for example, when processor 0 wants to store the data in cache 0 to an address in the memory 23, CA0 in processor 0 can send a modification message to the HA in processor 0 to write the M-state data into the memory 23; when processor 1 wants to store the data in cache 1 to an address in the memory 23, CA1 in processor 1 can send a modification message to the HA in processor 0 to write the M-state data into the memory 23.
  • for another example, when processor 0 wants to store the data at an address in the memory 23 into cache 0, CA0 in processor 0 can send a read exclusive (RE) message to the HA in processor 0 to request reading E-state data, or CA0 can send a read shared (RS) message to the HA in processor 0 to request reading S-state data; when processor 1 wants to store the data at an address in the memory 23 into cache 1, CA1 in processor 1 can send a read shared (RS) message to the HA in processor 0 to request reading S-state data.
  • for another example, when processor 0 wants to use the data at an address in the memory 23 directly without storing it in cache 0, CA0 in processor 0 can send a read invalid (RI) message to the HA in processor 0 to request reading I-state data; when processor 1 wants to use the data at an address in the memory 23 directly without storing it in cache 1, CA1 in processor 1 can send a read invalid (RI) message to the HA in processor 0 to request reading I-state data.
  • for the HA, to ensure the cache coherence of multiple copies, the HA maintains a directory that records the MESI state of the data at a given address.
  • the directory can be recorded in a precise or a fuzzy manner:
  • in the precise manner, if a CA requests the HA to cache the data at an address, the HA creates a vector in the directory that indicates the MESI state of the data at the address and the identifier of the CA requesting to cache the data at that address; if multiple CAs request to cache the data at the same address, the HA creates multiple vectors for that address.
  • in the fuzzy manner, the HA allocates one vector in the directory for each address whose data is requested to be cached, and the vector indicates the MESI state of the data at the address; that is, even if multiple CAs request to cache the data at the same address, the HA creates only one vector for that address.
  • the data request node has no right to request to read the E-state data from the data management node, and the data request node has the right to request to read the I-state data or read the S-state data from the data management node.
  • if the data request node requests to read I-state data, i.e., it uses the data at an address of the memory directly without caching it, then if the data at the address becomes invalid (for example, a write operation occurs), the HA does not need to notify the data request node that the data at the address is invalid.
  • if the data request node requests to read S-state data from the data management node, i.e., the data request node caches the data at an address of the memory, then if the data at the address becomes invalid (for example, a write operation occurs), the HA consults the directory and sends an invalidation snoop request message to each CA that has cached the data at the address, to notify each CA that the data at the address has become invalid.
  • in the fuzzy manner, the HA needs to broadcast the invalidation snoop request message to all CAs or multicast it to some CAs; that is, the number of CAs to be notified is large, and many of them have not cached the data at the address, so many invalidation snoop request messages are useless.
  • in a large-scale distributed cache system these messages are transmitted over long distances for a long time, which significantly reduces the bandwidth of the entire system; in addition, to avoid conflicts, the HA and the CAs may need multiple handshakes, or network ordering guarantees may be required.
  • although the fuzzy manner reduces the directory resources compared with the precise manner, the overhead of directory resources is still very large for a large-scale distributed cache system.
  • moreover, once the directory structure managed by the HA is determined, it limits the number of data request nodes that can be attached, so the scalability is very limited.
  • the embodiments of the present application provide a distributed cache system and a data cache method, which are applicable to the above-mentioned scenario where a data request node requests a data management node to read S-state data.
  • the CA in each data request node notifies the HA in the data management node of the expiration time of its cached copy of the data at the same address; when the expiration time of a data request node's cached data is reached, the data cached by that node is invalidated automatically.
  • when the latest expiration time among all cached copies of the data at the address is reached, the HA of the data management node can determine that all cached copies of the data at that address have expired.
  • a time home agent (THA) can be added at the data management node 21 (such as the THA in CPU0), and a time cache agent (TCA) can be added at each data request node 22 (such as TCA1 in CPU1 and TCA2 in CPU2); the cache coherence management function of the HA and the MESI state management function of the CA are as described with reference to Figure 2, the difference being that the HA and the CAs now communicate indirectly through the THA and the TCAs.
  • the data management node and the data request node execute a data caching method as shown in Figure 4:
  • the data request node sends a first request message to the data management node.
  • the first request message includes the first time and the first address in the memory, and is used to request caching of the target data at the first address; in other words, the first request message is used to request reading S-state data from the first address, and it may be a read S-state data (RS) message.
  • the first time is used to indicate the invalidation time of the cache of the target data in the data request node, that is, after the first time passes, the target data cached by the data request node becomes invalid.
  • the first time may be a relative time (for example, X milliseconds after the current moment) or an absolute time (for example, X o'clock X minutes X seconds X microseconds).
  • the CA of the data request node requests the TCA to cache the target data at the first address, and the TCA sends the first request message to the THA of the data management node; after receiving the first request message, the THA of the data management node requests the HA to cache the target data at the first address.
  • CA1 in CPU1 sends an RS1 message to TCA1 in CPU1, and the RS1 message includes the first address.
  • TCA1 builds the first mapping table (referred to as "build table 1" in the figure) and sends an RS2 message to the THA in CPU0.
  • the first mapping table indicates the mapping relationship between the CA1 identifier and the THA identifier, so that when a response message is subsequently received from the THA, it is determined according to the first mapping table that the response message should be forwarded to CA1.
  • the RS2 message includes the first address and a first time 1; exemplarily, the first time 1 may be time T1.
  • after receiving the RS2 message, the THA in CPU0 builds a second mapping table (referred to as "build table 2" in the figure) and sends an RS3 message to the HA in CPU0.
  • the second mapping table indicates the mapping relationship between the HA identifier and the TCA1 identifier, so that when a response message is subsequently received from the HA, it is determined according to the second mapping table that the response message should be forwarded to TCA1.
  • the RS3 message includes the first address.
  • CA2 in CPU2 sends an RS4 message to TCA2 in CPU2, and the RS4 message includes the first address.
  • TCA2 builds a third mapping table (referred to as "build table 3" in the figure) and sends an RS5 message to the THA in CPU0.
  • the third mapping table indicates the mapping relationship between the CA2 identifier and the THA identifier, so that when a response message is subsequently received from the THA, it is determined according to the third mapping table that the response message should be forwarded to CA2.
  • the RS5 message includes the first address and a first time 2; exemplarily, the first time 2 may be time T2.
  • after receiving the RS5 message, the THA in CPU0 builds a fourth mapping table (referred to as "build table 4" in the figure) and sends an RS6 message to the HA in CPU0; the fourth mapping table indicates the mapping relationship between the HA identifier and the TCA2 identifier, so that when a response message is subsequently received from the HA, it is determined according to the fourth mapping table that the response message should be forwarded to TCA2.
  • the RS6 message includes the first address.
  • the data management node sends a first response message to the data request node, and updates the second time according to the first time.
  • the second time is used to indicate the latest expiration time at which the target data is cached by other nodes (i.e., the data request nodes that have cached the target data). If multiple data request nodes request to cache the target data at the same address, each data request node sends its own first time, and the data management node selects the latest first time to update the second time; after the second time, the data management node can determine that the target data cached by every data request node has expired. In the example of FIG. 5, the second time is time T2. It should be noted that the data management node may update the second time according to the first time at step S401 or step S402, which is not limited in this application.
  • the first response message includes the target data.
  • optionally, when the first time is a relative time, the first response message may also include a third time, where the third time is the first time minus the transmission delay between the data management node and the data request node; this makes it convenient for the data request node to determine the absolute time at which the cached target data expires, and this absolute time is consistent with the data management node.
  • the HA of the data management node sends the target data to the THA of the data management node; the THA of the data management node sends the first response message to the TCA of the data request node and updates the second time according to the first time; and the TCA of the data request node sends the target data to the CA of the data request node.
  • the HA in CPU0 sends a D1 message to the THA in CPU0, and the D1 message includes the target data.
  • THA in CPU0 sends a D2 message (that is, a first response message) to TCA1 in CPU1, where the D2 message includes target data, and may optionally include a third time 1 .
  • TCA1 in CPU1 sends a D3 message to CA1 in CPU1, and the D3 message includes target data.
  • HA in CPU0 sends a D4 message to THA in CPU0, and the D4 message includes the target data.
  • THA in CPU0 sends a D5 message (that is, a first response message) to TCA2 in CPU2, where the D5 message includes target data, and may optionally include a third time 2 .
  • TCA2 in CPU2 sends a D6 message to CA2 in CPU2, and the D6 message includes target data.
  • after the first time, the data request node may invalidate the cached target data.
  • specifically, the TCA of the data request node requests the CA of the data request node to invalidate the cached target data.
  • after time T1, TCA1 in CPU1 sends an invalidation snoop request message (Snp1) to CA1 in CPU1 to request invalidation of the cached target data; after CA1 in CPU1 clears the target data stored in the cache, it sends an invalidation snoop response message (Rsp1) to TCA1 in CPU1, and TCA1 in CPU1 deletes the first mapping table (referred to as "delete table 1" in the figure).
  • similarly, after time T2, TCA2 in CPU2 sends an invalidation snoop request message (Snp2) to CA2 in CPU2 to request invalidation of the cached target data; after CA2 in CPU2 clears the target data stored in the cache, it sends an invalidation snoop response message (Rsp2) to TCA2 in CPU2, and TCA2 in CPU2 deletes the third mapping table (referred to as "delete table 3" in the figure).
  • after the second time, the data management node determines that the target data cached by all data request nodes has expired; at this point, the data management node may delete the mapping tables related to the target address.
  • CPU0 may delete the second mapping table (referred to as “delete table 2" in the figure) and the fourth mapping table (referred to as “delete table 4" in the figure) after time T2.
  • if the CA in the data management node requests the HA in the data management node to exclusively read the target data before the second time, that is, requests to read E-state data (RE), the HA requests the THA in the data management node to invalidate the target data cached by other nodes.
  • the THA indicates to the HA after the second time that the target data cached by other nodes has become invalid, and the HA sends the target data to the CA in the data management node; that is, the CA in the data management node reads the target data exclusively.
  • exemplarily, CA0 in CPU0 sends an RE1 message to the HA in CPU0 before time T2 to request to exclusively read the target data; the HA in CPU0 sends an invalidation snoop request message (Snp3) to the THA in CPU0; the THA in CPU0 sends an invalidation snoop response message (Rsp3) to the HA in CPU0 after time T2; and the HA in CPU0 sends a D7 message to CA0 in CPU0, where the D7 message includes the target data.
  • further, if the CA in the data management node requests the HA in the data management node to exclusively read the target data before the second time, data request nodes are blocked from requesting to cache the target data at the first address before the second time. That is, before the second time, new data request nodes are no longer allowed to obtain the target data at the first address for caching; a data request node can be instructed to keep re-requesting until the second time ends, or the request can be blocked and the target data at the first address returned to the data request node only after the second time. This prevents the second time from being extended and prevents the CA in the data management node from being unable to read the target data exclusively as soon as possible.
  • when the data request node requests the data management node to cache data, the two complete the negotiation of the expiration time of the data cached in the data request node; when that time is up, the data cached in the data request node expires automatically, and no further invalidation interaction between the data management node and the storage nodes is required, so the system bandwidth is not reduced, and only one maximum expiration time is recorded per address, so the resource overhead is small.
  • the embodiment of the present application also provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and the instructions are run on the distributed cache system, so that the distributed cache system executes the method in FIG. 4 .
  • the embodiment of the present application also provides a computer program product including instructions, and the instructions run on the distributed cache system, so that the distributed cache system executes the method in FIG. 4 .
  • the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a division by logical function; in actual implementation, there may be other division methods; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • in the above embodiments, all or part may be implemented by software, hardware, firmware, or any combination thereof; when implemented by a software program, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, integrating one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (Solid State Disk, SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A distributed cache system and a data caching method, relating to the field of storage and used to increase the bandwidth of a distributed cache system. The distributed cache system includes a data management node, a data request node, and a memory. The data management node is configured to perform cache coherence management on the data in the memory; the data request node is configured to send a first request message to the data management node (S401), where the first request message includes a first time and a first address in the memory, the first request message is used to request caching of the target data at the first address, and the first time is used to indicate the expiration time of the target data cached in the data request node; the data management node is further configured to send a first response message to the data request node and update a second time according to the first time (S402), where the first response message includes the target data, and the second time is used to indicate the latest expiration time at which the target data is cached by other nodes.

Description

Distributed cache system and data caching method

Technical Field

The present application relates to the field of storage, and in particular to a distributed cache system and a data caching method.

Background

A distributed cache system includes a data management node and multiple data request nodes. The data management node is responsible for managing the data in an address space (multiple addresses), and the data request nodes can request the data management node to cache the data at a certain address; that is, multiple copies of the same data can be cached on multiple data request nodes.

To ensure cache consistency between the copies and the original data, if the data at an address becomes invalid (for example, a write operation occurs), the data management node sends a message to each data request node that has cached a copy, notifying it that the data at the address has become invalid. When the number of data request nodes that must be notified is large, the bandwidth of the entire system drops significantly.

Summary

Embodiments of the present application provide a distributed cache system and a data caching method, which are used to increase the bandwidth of a distributed cache system.

To achieve the above objective, the embodiments of the present application adopt the following technical solutions:
In a first aspect, a distributed cache system is provided, including a data management node, a data request node, and a memory. The data management node is configured to perform cache coherence management on the data in the memory. The data request node is configured to send a first request message to the data management node, where the first request message includes a first time and a first address in the memory, and the first request message is used to request caching of the target data at the first address; the first time is used to indicate the expiration time of the target data cached in the data request node. The data management node is further configured to send a first response message to the data request node and to update a second time according to the first time, where the first response message includes the target data, and the second time is used to indicate the latest expiration time at which the target data is cached by other nodes.

In the distributed cache system provided by the embodiments of the present application, when the data request node requests the data management node to cache data, the two complete the negotiation of the expiration time of the data cached in the data request node. When the expiration time is reached, the data cached in the data request node is invalidated automatically, and no further invalidation interaction between the data management node and the storage nodes is required; therefore the system bandwidth is not reduced, and only one maximum expiration time needs to be recorded per address, so the resource overhead is small.

In a possible implementation, the data management node includes a time home agent and a home agent. The home agent is configured to perform cache coherence management on the data in the memory; the time home agent is configured to receive the first request message, acquire the target data from the home agent, send the first response message, and update the second time according to the first time. The home agent remains responsible for cache coherence management, that is, it still communicates according to the MESI protocol for compatibility with the existing technology; the newly added time home agent is responsible for the communication of the timestamp protocol (i.e., time management) and also communicates with the home agent according to the MESI protocol.

In a possible implementation, the first time is a relative time, and the first response message further includes a third time, where the third time is the first time minus the transmission delay between the data management node and the data request node. This makes it convenient for the data request node to determine the absolute time at which the cached target data expires.

In a possible implementation, the first time is an absolute time.

In a possible implementation, the data management node further includes a first caching agent. The first caching agent is configured to request the home agent to exclusively read the target data before the second time; the home agent is further configured to request the time home agent to invalidate the target data cached by other nodes; the time home agent is further configured to indicate to the home agent, after the second time, that the target data cached by other nodes has become invalid; the home agent is further configured to send the target data to the first caching agent. That is, when the caching agent in the data management node requests to exclusively read the target data, the home agent sends the target data to the first caching agent only after the target data cached by all other nodes has become invalid.

In a possible implementation, the data request node is further configured to request the data management node to cache the target data at the first address after the first caching agent requests to exclusively read the target data and before the second time; the data management node is further configured to send the target data to the data request node after the second time. That is, before the second time, new data request nodes are no longer allowed to obtain the target data at the first address for caching; the data request node may be instructed to keep re-requesting until the second time ends, or the request may be blocked and the target data at the first address returned to the data request node only after the second time. This prevents the second time from being extended and prevents the first caching agent in the data management node from being unable to read the target data exclusively as soon as possible.

In a possible implementation, the data request node includes a time caching agent and a second caching agent. The second caching agent is configured to request the time caching agent to cache the target data; the time caching agent is configured to send the first request message to the data management node, receive the first response message, and send the target data to the second caching agent. The second caching agent remains responsible for cache coherence management, that is, it still communicates according to the MESI protocol for compatibility with the existing technology; the newly added time caching agent is responsible for the communication of the timestamp protocol (i.e., time management) and also communicates with the second caching agent according to the MESI protocol.

In a possible implementation, the time caching agent is further configured to request the second caching agent to invalidate the cached target data after the first time. Once the data request node reaches the first time, the cached target data expires.
In a second aspect, a data caching method is provided, including: a data request node sends a first request message to a data management node, where the first request message includes a first time and a first address in a memory, and the first request message is used to request caching of the target data at the first address; the first time is used to indicate the expiration time of the target data cached in the data request node; the data management node is configured to perform cache coherence management on the data in the memory; the data management node sends a first response message to the data request node and updates a second time according to the first time, where the first response message includes the target data, and the second time is used to indicate the latest expiration time at which the target data is cached by other nodes.

In a possible implementation, that the data management node sends the first response message to the data request node and updates the second time according to the first time includes: the time home agent of the data management node acquires the target data from the home agent of the data management node, sends the first response message, and updates the second time according to the first time, where the home agent is configured to perform cache coherence management on the data in the memory.

In a possible implementation, the first time is a relative time, and the first response message further includes a third time, where the third time is the first time minus the transmission delay between the data management node and the data request node.

In a possible implementation, the first time is an absolute time.

In a possible implementation, the method further includes: the first caching agent of the data management node requests the home agent to exclusively read the target data before the second time; the home agent requests the time home agent to invalidate the target data cached by other nodes; the time home agent indicates to the home agent, after the second time, that the target data cached by other nodes has become invalid; the home agent sends the target data to the first caching agent.

In a possible implementation, the method further includes: the data request node requests the data management node to cache the target data at the first address after the first caching agent requests to exclusively read the target data and before the second time; the data management node sends the target data to the data request node after the second time.

In a possible implementation, that the data request node sends the first request message to the data management node includes: the second caching agent of the data request node requests the time caching agent of the data request node to cache the target data; the time caching agent sends the first request message to the data management node.

In a possible implementation, the method further includes: the time caching agent requests the second caching agent to invalidate the cached target data after the first time.

In a third aspect, a computer-readable storage medium is provided, where instructions are stored in the computer-readable storage medium; when the instructions run on a distributed cache system, the distributed cache system executes the method described in the second aspect and any implementation thereof.

In a fourth aspect, a computer program product including instructions is provided; when the instructions run on a distributed cache system, the distributed cache system executes the method described in the second aspect and any implementation thereof.

For the technical effects of the second to fourth aspects, refer to the technical effects of the first aspect and any implementation thereof.
Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a processor with multiple cores according to an embodiment of the present application;

FIG. 2 is a schematic architectural diagram of a distributed cache system according to an embodiment of the present application;

FIG. 3 is a schematic architectural diagram of another distributed cache system according to an embodiment of the present application;

FIG. 4 is a first schematic flowchart of a data caching method according to an embodiment of the present application;

FIG. 5 is a second schematic flowchart of a data caching method according to an embodiment of the present application;

FIG. 6 is a third schematic flowchart of a data caching method according to an embodiment of the present application;

FIG. 7 is a fourth schematic flowchart of a data caching method according to an embodiment of the present application.

Detailed Description
First, some concepts involved in the present application are described:

Cache coherence: the operating speed of a processor is much faster than the read/write speed of the memory. If, when performing read/write operations on the memory, the processor waited for each operation to complete before handling other tasks, its efficiency would be reduced. Therefore, a cache (for example, a level-1 or level-2 cache) can be placed between the processor core and the memory; the read/write speed of the cache is faster than that of the memory but slower than the operating speed of the processor. When the processor writes data to the memory, it can first write the data into the cache and then handle other tasks, and a direct memory access (DMA) device stores the data into the memory; similarly, when the processor reads data from the memory, the DMA device first writes the data from the memory into the cache, and the processor then reads the data from the cache.

As shown in FIG. 1, for a processor 11 that includes multiple cores, a cache 12 is provided for each core. When different cores of the same processor store copies of the data at the same address in the memory 13 through their respective caches 12, these copies have a cache coherence problem with the original data: when the original data is modified (i.e., invalidated), the copies must also be updated, otherwise the cached copies become inconsistent with the original data.

As shown in FIG. 2, a distributed cache system includes a data management node 21, a data request node 22, and a memory 23. There may be one or more data request nodes 22, and the data management node 21 and the data request nodes 22 may be processors located on different hosts. Both the data management node 21 and the data request node 22 include a core (not shown) and a cache. The data management node 21 is responsible for cache coherence management of the data in the memory 23; when multiple caches (including the cache in the data management node 21 and the caches in the data request nodes 22) store copies of the data at the same address in the memory 23, these copies likewise have a cache coherence problem with the original data.

Devices that require cache coherence can follow the modified exclusive shared invalid (MESI) protocol, which defines four states for a cache line (the smallest caching unit): the exclusive (E) state, the modified (M) state, the shared (S) state, and the invalid (I) state. The E state indicates that the cache line is valid, the data in the cache is consistent with the data in the memory, and the data exists only in the local cache; such cached data can be called E-state data, and the data management node 21 has the permission to apply for the E state. The M state indicates that the cache line is valid and the data has been modified, so the data in the cache is inconsistent with the data in the memory; such cached data can be called M-state data, and the data management node 21 has the permission to apply for the M state. The S state indicates that the cache line is valid, the data in the cache is consistent with the data in the memory, and the data exists in multiple caches; such cached data can be called S-state data, and both the data management node 21 and the data request node 22 have the permission to apply for the S state. The I state indicates that the cache line is invalid, that is, the data is not stored in the cache but is used directly; such data can be called I-state data, and both the data management node 21 and the data request node 22 have the permission to apply for the I state.
The processor involved in the embodiments of the present application may be a chip. For example, it may be a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a microcontroller unit (MCU), a programmable logic device (PLD), or another integrated chip.

The memory involved in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a flash memory (Flash). The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate (DDR) memory, enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).

As shown in FIG. 2, in the distributed cache system, the module in the data management node 21 responsible for cache coherence management of the data in the memory is the home agent (HA), and the module in the data management node 21 and in each data request node 22 responsible for MESI state management of the corresponding cache is a cache agent (CA). For example, CA0 in processor 0 is responsible for MESI state management of cache 0 in processor 0, CA1 in processor 1 for cache 1 in processor 1, CA2 in processor 2 for cache 2 in processor 2, and CA3 in processor 3 for cache 3 in processor 3.

When the data management node 21 or a data request node 22 wants to store the data in a cache to an address in the memory, or to store the data at an address in the memory into a cache, the corresponding CA requests the corresponding permission from the HA according to the MESI protocol, to guarantee cache coherence.

For example, when processor 0 wants to store the data in cache 0 to an address in the memory 23, CA0 in processor 0 can send a modification message to the HA in processor 0 to write the M-state data into the memory 23. When processor 1 wants to store the data in cache 1 to an address in the memory 23, CA1 in processor 1 can send a modification message to the HA in processor 0 to write the M-state data into the memory 23.

For another example, when processor 0 wants to store the data at an address in the memory 23 into cache 0, CA0 in processor 0 can send a read exclusive (RE) message to the HA in processor 0 to request reading E-state data, or CA0 in processor 0 can send a read shared (RS) message to the HA in processor 0 to request reading S-state data. When processor 1 wants to store the data at an address in the memory 23 into cache 1, CA1 in processor 1 can send a read shared (RS) message to the HA in processor 0 to request reading S-state data.

For another example, when processor 0 wants to use the data at an address in the memory 23 directly without storing it in cache 0, CA0 in processor 0 can send a read invalid (RI) message to the HA in processor 0 to request reading I-state data. When processor 1 wants to use the data at an address in the memory 23 directly without storing it in cache 1, CA1 in processor 1 can send a read invalid (RI) message to the HA in processor 0 to request reading I-state data.

For the HA, to guarantee the cache coherence of multiple copies, the HA maintains a directory that records the MESI state of the data at a given address. The directory may use a precise recording mode or a fuzzy recording mode:

In the precise recording mode, if a CA requests the HA to cache the data at an address, the HA creates a vector in the directory; the vector indicates the MESI state of the data at the address and the identifier of the CA that requested to cache the data at the address. If multiple CAs request to cache the data at the same address, the HA creates multiple vectors for that address.

In the fuzzy recording mode, the HA allocates one vector in the directory for each address whose data is requested to be cached; the vector indicates the MESI state of the data at the address. That is, even if multiple CAs request to cache the data at the same address, the HA creates only one vector for that address.
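The difference between the two recording modes comes down to whether the directory keeps the identity of each sharer. A minimal sketch with illustrative names (the actual vector layout is not specified by the application):

```python
class PreciseDirectory:
    """Precise mode: one vector per (address, requesting CA) pair."""
    def __init__(self):
        self.vectors = {}  # address -> list of (MESI state, CA identifier)

    def record(self, address: int, state: str, ca_id: str) -> None:
        # One vector per requesting CA: invalidations can be targeted,
        # but the vector count grows with the number of sharers.
        self.vectors.setdefault(address, []).append((state, ca_id))

class FuzzyDirectory:
    """Fuzzy mode: a single vector per address, with no CA identities."""
    def __init__(self):
        self.vectors = {}  # address -> MESI state only

    def record(self, address: int, state: str, ca_id: str) -> None:
        # The CA identity is deliberately not stored, so an invalidation
        # must later be broadcast or multicast to CAs that may hold a copy.
        self.vectors[address] = state
```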
A data request node has no permission to request the data management node to read E-state data; it has permission to request the data management node to read I-state data or S-state data.

If a data request node requests the data management node to read I-state data, i.e., the data request node uses the data at an address of the memory directly without caching it, then in these scenarios, if the data at the address becomes invalid (for example, a write operation occurs), the HA does not need to notify the data request node that the data at the address has become invalid.

If a data request node requests the data management node to read S-state data, i.e., the data request node caches the data at an address of the memory, then if the data at the address becomes invalid (for example, a write operation occurs), the HA consults the directory and sends an invalidation snoop request message to each CA that has cached the data at the address, to notify the CAs that the data at the address has become invalid.

For the precise recording mode, when the scale of the entire distributed cache system is large, the number of vectors the HA must record is very large, resulting in a very large directory resource overhead. In addition, once the directory structure managed by the HA is determined, it limits the number of data request nodes that can be attached, so the scalability is very limited.

For the fuzzy recording mode, the HA must broadcast the invalidation snoop request message to all CAs or multicast it to some CAs; that is, the number of CAs to be notified is large, and many of them have not cached the data at the address, so many of the invalidation snoop request messages are useless. In a large-scale distributed cache system these messages are transmitted over long distances for a long time, significantly reducing the bandwidth of the entire system; moreover, to avoid conflicts, the HA and the CAs may need multiple handshakes, or network ordering guarantees may be required. Although the fuzzy recording mode reduces the directory resources relative to the precise recording mode, the directory overhead is still very large for a large-scale distributed cache system. In addition, similar to the precise recording mode, once the directory structure managed by the HA is determined, it limits the number of data request nodes that can be attached, so the scalability is very limited.

To this end, embodiments of the present application provide a distributed cache system and a data caching method, applicable to the scenario described above in which a data request node requests the data management node to read S-state data. The CA in each data request node notifies the HA in the data management node of the expiration time of its cached copy of the data at the same address. When the expiration time of the data cached by a data request node is reached, the data cached by that node is invalidated automatically; when the latest expiration time among all cached copies of the data at the address is reached, the HA of the data management node can determine that all cached copies of the data at the address have expired. In other words, through the expiration times of the cached data on the data request node side and the latest expiration time on the data management node side, data invalidation happens automatically; the data management node and the storage nodes do not need to interact, so the system bandwidth is not reduced, and only one maximum expiration time needs to be recorded per address, so the resource overhead is small.
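The core of the scheme is that the HA no longer tracks who holds a copy, only the latest time at which any copy can still be alive. A minimal sketch of that bookkeeping, under assumed names (`TimestampHome`, `read_shared`) that are not from the patent:

```python
import time

class TimestampHome:
    """Per-address state on the data management node: one latest expiration
    time per address, regardless of how many request nodes cache the data."""
    def __init__(self):
        self.latest_expiry = {}  # address -> latest expiration ("second time")

    def read_shared(self, address: int, expiry: float) -> None:
        # Each requester sends its own expiration ("first time"); only the
        # maximum needs to be kept, so the overhead per address is constant.
        current = self.latest_expiry.get(address, 0.0)
        self.latest_expiry[address] = max(current, expiry)

    def all_copies_expired(self, address: int) -> bool:
        # After the second time, every cached copy has invalidated itself,
        # so no invalidation messages ever cross the network.
        return time.monotonic() > self.latest_expiry.get(address, 0.0)
```

Here `expiry` is assumed to be an absolute timestamp on a clock shared with the requesters; the relative-time variant is handled by the third time discussed below.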
As shown in FIG. 3, a time home agent (THA) can be added to the data management node 21 (for example, the THA in CPU0), and a time cache agent (TCA) can be added to each data request node 22 (for example, TCA1 in CPU1 and TCA2 in CPU2). The cache coherence management function of the HA and the MESI state management function of the CA are as described with reference to FIG. 2. The difference from FIG. 2 is that the direct communication between the HA and the CAs becomes indirect communication through the THA and the TCAs: the newly added THA and TCAs are responsible for the communication of the timestamp protocol (i.e., time management), the THA is also responsible for communicating with the HA according to the MESI protocol, and each TCA is also responsible for communicating with its CA according to the MESI protocol.

Specifically, the data management node and the data request node execute a data caching method as shown in FIG. 4:

S401. The data request node sends a first request message to the data management node.

The first request message includes the first time and the first address in the memory, and is used to request caching of the target data at the first address; in other words, the first request message is used to request reading S-state data from the first address, and it may be a read S-state data (RS) message.

The first time is used to indicate the expiration time of the target data cached in the data request node; that is, after the first time passes, the target data cached by the data request node becomes invalid. The first time may be a relative time (for example, X milliseconds after the current moment) or an absolute time (for example, X o'clock, X minutes, X seconds, X microseconds).

Specifically, the CA of the data request node requests the TCA to cache the target data at the first address, and the TCA sends the first request message to the THA of the data management node; after receiving the first request message, the THA of the data management node requests the HA of the data management node to cache the target data at the first address.

Exemplarily, as shown in FIG. 5, for CPU1's request to read S-state data from CPU0, CA1 in CPU1 sends an RS1 message to TCA1 in CPU1, where the RS1 message includes the first address. TCA1 builds a first mapping table (referred to as "build table 1" in the figure) and sends an RS2 message to the THA in CPU0. The first mapping table indicates the mapping relationship between the CA1 identifier and the THA identifier, so that when a response message is subsequently received from the THA, it can be determined according to the first mapping table that the response message should be forwarded to CA1. The RS2 message includes the first address and a first time 1; exemplarily, the first time 1 may be time T1. After receiving the RS2 message, the THA in CPU0 builds a second mapping table (referred to as "build table 2" in the figure) and sends an RS3 message to the HA in CPU0. The second mapping table indicates the mapping relationship between the HA identifier and the TCA1 identifier, so that when a response message is subsequently received from the HA, it can be determined according to the second mapping table that the response message should be forwarded to TCA1. The RS3 message includes the first address.

Similarly, as shown in FIG. 5, for CPU2's request to read S-state data from CPU0, CA2 in CPU2 sends an RS4 message to TCA2 in CPU2, where the RS4 message includes the first address. TCA2 builds a third mapping table (referred to as "build table 3" in the figure) and sends an RS5 message to the THA in CPU0. The third mapping table indicates the mapping relationship between the CA2 identifier and the THA identifier, so that when a response message is subsequently received from the THA, it can be determined according to the third mapping table that the response message should be forwarded to CA2. The RS5 message includes the first address and a first time 2; exemplarily, the first time 2 may be time T2. After receiving the RS5 message, the THA in CPU0 builds a fourth mapping table (referred to as "build table 4" in the figure) and sends an RS6 message to the HA in CPU0. The fourth mapping table indicates the mapping relationship between the HA identifier and the TCA2 identifier, so that when a response message is subsequently received from the HA, it can be determined according to the fourth mapping table that the response message should be forwarded to TCA2. The RS6 message includes the first address.
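The mapping tables built in FIG. 5 are simple per-transaction forwarding tables: each agent records where the eventual response must be routed back. A sketch with hypothetical names:

```python
class ForwardingAgent:
    """Sketch of the table a TCA or THA keeps per in-flight request (FIG. 5)."""
    def __init__(self, name: str):
        self.name = name
        self.table = {}  # transaction id -> (address, agent to answer)

    def on_request(self, txn_id: int, address: int, requester: str) -> None:
        # "Build table n": remember who asked, so the response can be routed back.
        self.table[txn_id] = (address, requester)

    def on_response(self, txn_id: int, payload: bytes):
        # Route the response back using the table; the entry is dropped later
        # ("delete table n") once the cached copy has expired.
        address, requester = self.table[txn_id]
        return requester, address, payload
```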
S402. The data management node sends a first response message to the data request node and updates the second time according to the first time.

The second time is used to indicate the latest expiration time at which the target data is cached by other nodes (i.e., the data request nodes that have cached the target data). If multiple data request nodes request to cache the target data at the same address, each data request node sends its own first time, and the data management node selects the latest first time to update the second time. After the second time, the data management node can determine that the target data cached by every data request node has expired; in the example of FIG. 5, the second time is time T2. It should be noted that the data management node may update the second time according to the first time at step S401 or step S402, which is not limited in this application.
The first response message includes the target data. Optionally, when the first time is a relative time, the first response message may further include a third time, where the third time is the first time minus the transmission delay between the data management node and the data request node; this makes it convenient for the data request node to determine the absolute time at which the cached target data expires, and this absolute time is consistent with the data management node.
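The third time is plain arithmetic: the remaining lifetime is shortened by the one-way delay so that the requester's local deadline coincides with the absolute expiration the management node expects. A sketch (how the delay is measured is not specified here and is treated as an assumed input):

```python
def third_time(first_time_relative_ms: float, one_way_delay_ms: float) -> float:
    """Remaining lifetime as seen by the data request node.

    first_time_relative_ms: requested relative cache lifetime ("first time").
    one_way_delay_ms: transmission delay between the two nodes (assumed known).
    """
    return first_time_relative_ms - one_way_delay_ms

# Example: a 100 ms lease whose response spent 2 ms in flight leaves the
# requester a 98 ms budget, expiring at the same absolute moment that the
# data management node records as the copy's expiration.
remaining_ms = third_time(100.0, 2.0)  # -> 98.0
```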
Specifically, the HA of the data management node sends the target data to the THA of the data management node; the THA of the data management node sends the first response message to the TCA of the data request node and updates the second time according to the first time; and the TCA of the data request node sends the target data to the CA of the data request node.

Exemplarily, as shown in FIG. 5, for CPU1's request to read S-state data from CPU0, the HA in CPU0 sends a D1 message to the THA in CPU0, where the D1 message includes the target data. The THA in CPU0 sends a D2 message (i.e., a first response message) to TCA1 in CPU1, where the D2 message includes the target data and, optionally, the third time 1. TCA1 in CPU1 sends a D3 message to CA1 in CPU1, where the D3 message includes the target data.

Similarly, as shown in FIG. 5, for CPU2's request to read S-state data from CPU0, the HA in CPU0 sends a D4 message to the THA in CPU0, where the D4 message includes the target data. The THA in CPU0 sends a D5 message (i.e., a first response message) to TCA2 in CPU2, where the D5 message includes the target data and, optionally, the third time 2. TCA2 in CPU2 sends a D6 message to CA2 in CPU2, where the D6 message includes the target data.

After the first time, the data request node may invalidate the cached target data; specifically, the TCA of the data request node requests the CA of the data request node to invalidate the cached target data.
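On the request-node side the invalidation is purely local: when the first time elapses, the TCA snoops its own CA, and nothing is sent to the management node. A sketch using a timer (class and method names are hypothetical):

```python
import threading

class TimeCacheAgent:
    """Sketch of a TCA that invalidates its CA's copy when the lease ends."""
    def __init__(self, cache_agent):
        self.ca = cache_agent  # local CA holding the cached copy

    def install_lease(self, address: int, lifetime_s: float) -> None:
        # After the first time passes, issue the local invalidation snoop (Snp);
        # no message crosses the network.
        timer = threading.Timer(lifetime_s, self._expire, args=(address,))
        timer.daemon = True
        timer.start()

    def _expire(self, address: int) -> None:
        self.ca.invalidate(address)  # CA clears the copy and answers (Rsp);
        # the TCA can then delete its mapping-table entry for this address.
```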
Exemplarily, as shown in FIG. 5, for CPU1's request to read S-state data from CPU0, after time T1, TCA1 in CPU1 sends an invalidation snoop request message (Snp1) to CA1 in CPU1 to request invalidation of the cached target data. After CA1 in CPU1 clears the target data stored in the cache, it sends an invalidation snoop response message (Rsp1) to TCA1 in CPU1, and TCA1 in CPU1 deletes the first mapping table (referred to as "delete table 1" in the figure).

Similarly, as shown in FIG. 5, for CPU2's request to read S-state data from CPU0, after time T2, TCA2 in CPU2 sends an invalidation snoop request message (Snp2) to CA2 in CPU2 to request invalidation of the cached target data. After CA2 in CPU2 clears the target data stored in the cache, it sends an invalidation snoop response message (Rsp2) to TCA2 in CPU2, and TCA2 in CPU2 deletes the third mapping table (referred to as "delete table 3" in the figure).

After the second time, the data management node determines that the target data cached by all data request nodes has expired; at this point, the data management node may delete the mapping tables related to the target address.

Exemplarily, as shown in FIG. 5, CPU0 may delete the second mapping table (referred to as "delete table 2" in the figure) and the fourth mapping table (referred to as "delete table 4" in the figure) after time T2.

In addition, if the CA in the data management node requests the HA in the data management node to exclusively read the target data before the second time, that is, requests to read E-state data (RE), the HA requests the THA in the data management node to invalidate the target data cached by other nodes; the THA indicates to the HA after the second time that the target data cached by other nodes has become invalid, and the HA then sends the target data to the CA in the data management node, so that the CA in the data management node reads the target data exclusively.

Exemplarily, as shown in FIG. 6, CA0 in CPU0 sends an RE1 message to the HA in CPU0 before time T2 to request to exclusively read the target data; the HA in CPU0 sends an invalidation snoop request message (Snp3) to the THA in CPU0 to request invalidation of the target data cached by other nodes; the THA in CPU0 sends an invalidation snoop response message (Rsp3) to the HA in CPU0 after time T2; and the HA in CPU0 sends a D7 message to CA0 in CPU0, where the D7 message includes the target data.
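The exclusive read in FIG. 6 therefore reduces to the HA waiting out the second time before granting E-state. A sketch (the sleep stands in for the THA withholding Rsp3 until T2; both timestamps are assumed to be on the same clock):

```python
import time

def exclusive_read(latest_expiry: dict, address: int) -> None:
    """Grant an exclusive (E-state) read only after the second time.

    latest_expiry maps address -> latest expiration time ("second time").
    Models HA -> THA Snp3 followed by THA's Rsp3 after T2 in FIG. 6.
    """
    remaining = latest_expiry.get(address, 0.0) - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)  # THA answers only once every lease has lapsed
    # Every remotely cached copy has self-invalidated by now, so the HA can
    # hand the target data to the requesting CA (the D7 message).
```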
Further, if the CA in the data management node requests the HA in the data management node to exclusively read the target data before the second time, data request nodes are blocked from requesting to cache the target data at the first address before the second time. That is, before the second time, new data request nodes are no longer allowed to obtain the target data at the first address for caching; a data request node may be instructed to keep re-requesting until the second time ends, or the request may be blocked and the target data at the first address returned to the data request node only after the second time. This prevents the second time from being extended and prevents the CA in the data management node from being unable to read the target data exclusively as soon as possible.

Exemplarily, as shown in FIG. 7, suppose CA0 in CPU0 requests the HA in CPU0 to exclusively read the target data before time T2, and the HA in CPU0 has sent an invalidation snoop request message (Snp3) to the THA in CPU0; then CPU1 requests to read S-state data from CPU0: CA1 in CPU1 sends an RS1 message to TCA1 in CPU1, and TCA1 sends an RS2 message to the THA in CPU0. In this case, the THA in CPU0 sends an RS3 message to the HA in CPU0 only after the second time; the HA in CPU0 then sends a D1 message to the THA in CPU0, the THA in CPU0 sends a D2 message to TCA1 in CPU1, and TCA1 in CPU1 sends a D3 message to CA1 in CPU1.
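While an exclusive read is pending, the THA in FIG. 7 simply holds any new shared-read until the second time, so a fresh lease can never push the second time further out. A sketch of the two policies mentioned above, with hypothetical names:

```python
import time

def handle_rs_during_exclusive(second_time: float, policy: str = "block") -> str:
    """A new shared read (RS) arrives after an exclusive read was requested.

    "retry": tell the requester to keep re-sending until the second time ends.
    "block": park the request and serve it only after the second time, so the
    second time is never extended while the exclusive read is outstanding.
    """
    now = time.monotonic()
    if now <= second_time:
        if policy == "retry":
            return "RETRY_LATER"       # requester re-requests, as in FIG. 7
        time.sleep(second_time - now)  # block until all leases have lapsed
    return "SERVE"                     # forward RS3 to the HA and return data
```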
With the distributed cache system and data caching method provided by the embodiments of the present application, when the data request node requests the data management node to cache data, the two complete the negotiation of the expiration time of the data cached in the data request node. When the expiration time is reached, the data cached in the data request node is invalidated automatically, and no further invalidation interaction between the data management node and the storage nodes is required; therefore the system bandwidth is not reduced, and only one maximum expiration time needs to be recorded per address, so the resource overhead is small.

An embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium; when the instructions run on a distributed cache system, the distributed cache system executes the method in FIG. 4.

An embodiment of the present application further provides a computer program product including instructions; when the instructions run on a distributed cache system, the distributed cache system executes the method in FIG. 4.

It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division into units is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

In the above embodiments, all or part may be implemented by software, hardware, firmware, or any combination thereof. When implemented by a software program, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

  1. A distributed cache system, comprising a data management node, a data request node, and a memory, wherein the data management node is configured to perform cache coherence management on the data in the memory;
    the data request node is configured to send a first request message to the data management node, wherein the first request message comprises a first time and a first address in the memory, and the first request message is used to request caching of target data at the first address; the first time is used to indicate an expiration time of the target data cached in the data request node;
    the data management node is further configured to send a first response message to the data request node, and to update a second time according to the first time, wherein the first response message comprises the target data, and the second time is used to indicate the latest expiration time at which the target data is cached by other nodes.
  2. The distributed cache system according to claim 1, wherein the data management node comprises a time home agent and a home agent;
    the home agent is configured to perform cache coherence management on the data in the memory;
    the time home agent is configured to receive the first request message, acquire the target data from the home agent, send the first response message, and update the second time according to the first time.
  3. The distributed cache system according to any one of claims 1-2, wherein the first time is a relative time, and the first response message further comprises a third time, the third time being the first time minus the transmission delay between the data management node and the data request node.
  4. The distributed cache system according to any one of claims 1-2, wherein the first time is an absolute time.
  5. The distributed cache system according to any one of claims 2-4, wherein the data management node further comprises a first caching agent;
    the first caching agent is configured to request the home agent to exclusively read the target data before the second time;
    the home agent is further configured to request the time home agent to invalidate the target data cached by other nodes;
    the time home agent is further configured to indicate to the home agent, after the second time, that the target data cached by other nodes has become invalid;
    the home agent is further configured to send the target data to the first caching agent.
  6. The distributed cache system according to claim 5, wherein
    the data request node is further configured to request the data management node to cache the target data at the first address after the first caching agent requests to exclusively read the target data and before the second time;
    the data management node is further configured to send the target data to the data request node after the second time.
  7. The distributed cache system according to any one of claims 1-6, wherein the data request node comprises a time caching agent and a second caching agent;
    the second caching agent is configured to request the time caching agent to cache the target data;
    the time caching agent is configured to send the first request message to the data management node, receive the first response message, and send the target data to the second caching agent.
  8. The distributed cache system according to claim 7, wherein the time caching agent is further configured to request the second caching agent to invalidate the cached target data after the first time.
  9. A data caching method, comprising:
    sending, by a data request node, a first request message to a data management node, wherein the first request message comprises a first time and a first address in a memory, and the first request message is used to request caching of target data at the first address; the first time is used to indicate an expiration time of the target data cached in the data request node; the data management node is configured to perform cache coherence management on the data in the memory;
    sending, by the data management node, a first response message to the data request node, and updating a second time according to the first time, wherein the first response message comprises the target data, and the second time is used to indicate the latest expiration time at which the target data is cached by other nodes.
  10. The method according to claim 9, wherein the sending, by the data management node, of the first response message to the data request node and the updating of the second time according to the first time comprise:
    acquiring, by a time home agent of the data management node, the target data from a home agent of the data management node, sending the first response message, and updating the second time according to the first time, wherein the home agent is configured to perform cache coherence management on the data in the memory.
  11. The method according to any one of claims 9-10, wherein the first time is a relative time, and the first response message further comprises a third time, the third time being the first time minus the transmission delay between the data management node and the data request node.
  12. The method according to any one of claims 9-10, wherein the first time is an absolute time.
  13. The method according to any one of claims 10-12, further comprising:
    requesting, by a first caching agent of the data management node, the home agent to exclusively read the target data before the second time;
    requesting, by the home agent, the time home agent to invalidate the target data cached by other nodes;
    indicating, by the time home agent to the home agent after the second time, that the target data cached by other nodes has become invalid;
    sending, by the home agent, the target data to the first caching agent.
  14. The method according to claim 13, further comprising:
    requesting, by the data request node, the data management node to cache the target data at the first address after the first caching agent requests to exclusively read the target data and before the second time;
    sending, by the data management node, the target data to the data request node after the second time.
  15. The method according to any one of claims 9-14, wherein the sending, by the data request node, of the first request message to the data management node comprises:
    requesting, by a second caching agent of the data request node, the time caching agent of the data request node to cache the target data;
    sending, by the time caching agent, the first request message to the data management node.
  16. The method according to claim 15, further comprising:
    requesting, by the time caching agent after the first time, the second caching agent to invalidate the cached target data.
PCT/CN2021/096988 2021-05-28 2021-05-28 Distributed cache system and data caching method WO2022246848A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180093084.6A 2021-05-28 2021-05-28 Distributed cache system and data caching method
PCT/CN2021/096988 2021-05-28 2021-05-28 Distributed cache system and data caching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/096988 WO2022246848A1 (zh) 2021-05-28 2021-05-28 分布式缓存系统和数据缓存方法

Publications (1)

Publication Number Publication Date
WO2022246848A1 (zh)

Family

ID=84229468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096988 WO2022246848A1 (zh) 2021-05-28 2021-05-28 分布式缓存系统和数据缓存方法

Country Status (2)

Country Link
CN (1) CN116848516A (zh)
WO (1) WO2022246848A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101625663A (zh) * 2008-07-07 2010-01-13 英特尔公司 Satisfying memory ordering requirements between partial reads and non-snoop accesses
CN105279034A (zh) * 2015-10-26 2016-01-27 北京皮尔布莱尼软件有限公司 Coherent cache control system and method
US20180165213A1 (en) * 2016-12-12 2018-06-14 Intel Corporation Method and apparatus for memory consistency using cache coherency protocols


Also Published As

Publication number Publication date
CN116848516A (zh) 2023-10-03

Similar Documents

Publication Publication Date Title
JP4737691B2 (ja) Snoop filter for exclusive ownership
JP5833282B2 (ja) Multi-level cache coherency domain system and method of configuring the Share-F state in a local domain of such a system
JP3399501B2 (ja) Explicit coherence using split-phase control
US7814279B2 (en) Low-cost cache coherency for accelerators
KR100952589B1 Operating memory within a shared memory system
US8205045B2 (en) Satisfying memory ordering requirements between partial writes and non-snoop accesses
US20160092362A1 (en) Memory network to route memory traffic and i/o traffic
US6772298B2 (en) Method and apparatus for invalidating a cache line without data return in a multi-node architecture
US20150189039A1 (en) Memory Data Access Method and Apparatus, and System
US9684597B1 (en) Distributed cache coherent shared memory controller integrated with a protocol offload network interface card
US6920532B2 (en) Cache coherence directory eviction mechanisms for modified copies of memory lines in multiprocessor systems
US6934814B2 (en) Cache coherence directory eviction mechanisms in multiprocessor systems which maintain transaction ordering
JP2005519391A Method and system for cache coherence in a DSM multiprocessor system without share-vector growth
US6925536B2 (en) Cache coherence directory eviction mechanisms for unmodified copies of memory lines in multiprocessor systems
JP2022528630A System, method, and apparatus for accessing shared memory
JPH10214222A Coherence method for connecting a computer system to a coherent domain, and apparatus therefor
WO2020038466A1 (zh) Data prefetching method and apparatus
WO2004091136A2 (en) Multi-node system in which global address generated by processing subsystem includes global to local translation information
WO2022246848A1 (zh) Distributed cache system and data caching method
WO2015035882A1 (zh) Node-controller-based request response method and apparatus
CN110083548B (zh) Data processing method and related network element, device, and system
CN108415873B (zh) Forwarding responses to snoop requests
US8484422B2 (en) Maintaining data coherence by using data domains
US20210397560A1 (en) Cache stashing system
WO2019149031A1 (zh) Data processing method and apparatus applied to a node system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21942419

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180093084.6

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21942419

Country of ref document: EP

Kind code of ref document: A1