WO2012149812A1 - Method for preventing node controller deadlock and node controller - Google Patents

Method for preventing node controller deadlock and node controller

Info

Publication number
WO2012149812A1
WO2012149812A1 (PCT/CN2011/081393)
Authority
WO
WIPO (PCT)
Prior art keywords
node
message
system address
cached
request message
Application number
PCT/CN2011/081393
Other languages
English (en)
French (fr)
Inventor
赵亚飞
戴若星
褚小伟
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN2011800021394A (CN102439571B)
Priority to EP11864625.6A (EP2568379B1)
Priority to PCT/CN2011/081393 (WO2012149812A1)
Publication of WO2012149812A1
Priority to US13/708,670 (US20130111150A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0891Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0808Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0817Cache consistency protocols using directory methods
    • G06F12/0828Cache consistency protocols using directory methods with concurrent directory accessing, i.e. handling multiple concurrent coherency transactions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0833Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means in combination with broadcast means (e.g. for invalidation or updating)

Definitions

  • The present invention relates to the field of computers, and in particular to a method for preventing node controller deadlock and a node controller, applied to a non-uniform memory access (NUMA) system.
  • The structure of a conventional NUMA system is shown in FIG. 1: it consists of three nodes, a first node N0, a second node N1, and a third node N2, connected by a node network (Node Interconnect).
  • The first node N0 includes a node controller (NC), a system memory (not shown in FIG. 1), a home agent (HA) that manages the system addresses of the system memory, a processor unit (not shown in FIG. 1), and the caching agent (CA) of the processor unit. The components within the first node N0 are locally interconnected, while the first node N0, the second node N1, and the third node N2 are connected by the Node Interconnect.
  • The second node N1 and the third node N2 may be nodes containing only a processor unit and its CA, or may be complex nodes with a structure similar to that of the first node N0.
  • The HA of the first node N0 manages the system addresses of the system memory, and the CAs of the first node N0, the second node N1, and the third node N2 can access system memory through these system addresses.
  • When the second node N1 and/or the third node N2 needs to request the system address A, it initiates a request message and transmits it to the NC of the first node N0, which forwards the request message to the HA of the first node N0.
  • After receiving the request message, the HA of the first node N0 transmits the system address A to the NC of the first node N0, which then transmits it to the second node N1 and/or the third node N2.
  • When the CA of the first node N0 also needs the system address A, it initiates a request message and transmits it to the HA of the first node N0. After receiving the request message initiated by the CA, the HA of the first node N0 must initiate a snoop message, which it transmits to the NC of the first node N0; the NC of the first node N0 then transmits the snoop message to the second node N1.
  • After receiving the snoop message, the second node N1 transmits a feedback message to the NC of the first node N0, which forwards it to the HA of the first node N0, so that the HA of the first node N0 learns whether the copy of system address A cached by the second node N1 is valid. The HA of the first node N0 can then transmit the stored system address A to the CA of the first node N0 over the local interconnect, completing the processing of the request message initiated by the CA.
  • Because the HA of the first node N0 learns the state of the copy cached by the second node N1 before transmitting the system address A to the CA, the system address used by the CA of the first node N0 and by the second node N1 stays consistent, meeting the requirements of the cache coherence protocol of the NUMA system.
  • However, a queuing policy is set on the NC of the first node N0: a request message (Request) from the second node N1 or the third node N2 enters the processing queue first, so the snoop message (Probe) initiated by the HA of the first node N0 may be blocked behind that Request.
  • A queuing policy is likewise set on the HA of the first node N0: the Request initiated by the CA of the first node N0 enters the processing queue first, so the Request forwarded by the NC of the first node N0 may be blocked behind the CA's Request.
  • As a result, a blocking ring as shown in FIG. 2 forms between the NC and the HA of the first node N0, causing the NC of the first node N0 to deadlock; an NC deadlock permanently ties up NUMA system resources and eventually causes the NUMA system to crash.
  • The embodiments of the present invention provide a method for preventing node controller deadlock and a node controller, applied mainly to a NUMA system, which can prevent the node controller from deadlocking and thereby prevent the NUMA system from degrading or crashing due to deadlock.
  • A method for preventing node controller deadlock, applied to a NUMA system, includes:
  • The node controller of the local node receives a request message sent by any node and writes the request message into a processing queue; the request message is used to request a system address.
  • The node controller monitors whether a cache data block containing the system address is cached on other nodes; if so, it invalidates the cache data blocks containing the system address that are cached on the other nodes. Consequently, when the node controller receives a first snoop message transmitted by the home agent of the local node, it responds with a feedback message to the home agent directly, avoiding writing the first snoop message into the processing queue where it would be blocked by the request message. The first snoop message is used to monitor whether the system address is cached on the other nodes; the feedback message is used to indicate that the system address cached on the other nodes is invalid, so that the home agent transmits the stored system address to the cache agent of the local node.
  • The node controller transmits the request message that has been written into the processing queue to the home agent of the local node.
  • A node controller is applied to a NUMA system; the node controller is located in a local node of the NUMA system and includes:
  • a receiving unit, configured to receive a request message sent by any node and write the request message into a processing queue unit, where the request message is used to request a system address;
  • a monitoring unit, configured to monitor whether a cache data block containing the system address is cached on other nodes;
  • a processing unit, configured to invalidate the cache data blocks containing the system address that are cached on the other nodes when the monitoring result of the monitoring unit is yes;
  • the receiving unit is further configured to receive a first snoop message transmitted by the home agent of the local node, where the first snoop message is used to monitor whether the system address is cached on the other nodes;
  • a transmitting unit, configured to respond with a feedback message to the home agent directly when the receiving unit receives the first snoop message, preventing the receiving unit from writing the first snoop message into the processing queue unit where it would be blocked by the request message; the feedback message is used to indicate that the system address cached on the other nodes is invalid, so that the home agent transmits the stored system address to the cache agent of the local node;
  • the processing queue unit is configured to store the request message written by the receiving unit;
  • the transmitting unit is further configured to transmit the request message that has been written into the processing queue unit to the home agent.
  • A NUMA system includes a local node and other nodes besides the local node, the local node including a node controller, a home agent, and a cache agent, wherein:
  • the node controller receives a request message of the other nodes and writes the request message into a processing queue, the request message being used to request a system address; the node controller monitors whether a cache data block containing the system address is cached on the other nodes and, if so, invalidates the cache data blocks containing the system address cached on the other nodes, so that when the node controller receives a first snoop message transmitted by the home agent it responds with a feedback message to the home agent directly, avoiding writing the first snoop message into the processing queue where it would be blocked by the request message;
  • the first snoop message is used to monitor whether the system address is cached on the other nodes; the feedback message is used to indicate that the system address cached on the other nodes is invalid, so that the home agent transmits its stored system address to the cache agent; the node controller transmits the request message that has been written into the processing queue to the home agent.
  • Thus, after receiving a request message sent by any node and writing it into the processing queue, the node controller of the local node first monitors whether a cache data block containing the system address is cached on the other nodes. If monitoring finds such a cache data block on another node, the node controller invalidates it. When the node controller subsequently receives the first snoop message transmitted by the home agent of the local node, it no longer needs to transmit the first snoop message to the other nodes, because it has already invalidated the cache data blocks containing the system address cached there; responding with a feedback message to the HA directly suffices. This prevents the node controller from writing the first snoop message into the processing queue where it would be blocked by the request message, unlocks the interdependent blocking ring between the node controller and the HA, prevents the node controller from deadlocking, and prevents the NUMA system from crashing due to a node controller deadlock.
  • Figure 1 is a schematic structural view of a conventional NUMA system
  • FIG. 2 is a schematic flow chart of a NC deadlock in a conventional NUMA system
  • FIG. 3 is a schematic flowchart of a method for preventing a deadlock of a node controller according to Embodiment 1 of the present invention
  • FIG. 4 is a schematic flowchart of a method for preventing deadlock of a node controller according to Embodiment 2 of the present invention
  • FIG. 5 is a schematic structural diagram of a node controller according to Embodiment 3 of the present invention
  • FIG. 6 is a schematic structural diagram of another node controller according to Embodiment 3 of the present invention
  • FIG. 7 is a schematic structural diagram of a NUMA system according to Embodiment 4 of the present invention.
  • Embodiments of the present invention provide a method for preventing node controller deadlock, a node controller, and a NUMA system, which can prevent the node controller from deadlocking and prevent the NUMA system from crashing due to a node controller deadlock.
  • the following description will be made by way of specific examples.
  • Embodiment 1:
  • FIG. 3 is a schematic diagram of a method for preventing a deadlock of a node controller according to Embodiment 1 of the present invention, which is applied to a NUMA system. As shown in FIG. 3, the method may include the following steps:
  • The node controller (NC) of the local node receives a request message sent by any node and writes the request message into a processing queue; the request message is used to request a system address.
  • A queuing policy is set on the NC: after a request message for requesting a system address is transmitted to the NC, the NC writes the request message into the processing queue to wait, and then processes the queued messages in turn according to the processing order.
  • any node can access system memory based on the requested system address.
  • Any such node may be a central processing unit (CPU) or a symmetric multi-processing (SMP) system, which is not limited in the embodiments of the present invention.
  • The NC monitors whether a cache data block (Cache Line) containing the system address is cached on other nodes; if so, the NC invalidates the cache data blocks containing the system address cached on the other nodes, so that when the NC receives the first snoop message transmitted by the HA of the local node it responds with a feedback message to the HA directly, preventing the NC from writing the first snoop message into the processing queue where it would be blocked by the request message.
  • The first snoop message is used to monitor whether the system address is cached on the other nodes; the feedback message is used to indicate that the system address cached on the other nodes is invalid, so that the HA transmits the stored system address to the CA of the local node.
  • other nodes may be CPUs, or may be SMP systems, which are not limited in the embodiment of the present invention.
  • the implementation process of the NC monitoring whether the cached data block including the above system address is cached on another node may include the following steps:
  • The NC transmits a second snoop message SnpData to the other nodes; the second snoop message SnpData is used to monitor whether a cache data block containing the system address is cached on the other nodes.
  • The NC receives the response messages RspS transmitted by the other nodes; a response message RspS is used to indicate whether a cache data block containing the system address is cached on the corresponding node.
  • In this way, the NC can monitor the other nodes and learn whether cache data blocks containing the system address are cached on them.
  • The implementation process in which the NC invalidates the cache data blocks containing the system address cached on the other nodes may be as follows:
  • The NC transmits an indication message SnpInvXtoI to the other nodes, where the indication message SnpInvXtoI is used to instruct the other nodes to delete the cached cache data blocks containing the system address or set them to unavailable.
  • The NC may further receive an indication response message RspI transmitted by another node, where the indication response message RspI is transmitted by the other node after it has, according to the indication of the indication message SnpInvXtoI, deleted the cached cache data block containing the system address or set it to unavailable.
  • The specific implementation by which the other node deletes its cached cache data block or sets it to unavailable is common knowledge to those skilled in the art and is not described in detail herein.
  • After receiving the indication response messages RspI transmitted by the other nodes, the NC learns that the other nodes have invalidated the cache data blocks containing the system address. Consequently, even when the NC subsequently receives the first snoop message from the HA of the local node, it does not need to transmit it to the other nodes again, which prevents the NC from writing the first snoop message into the processing queue where it would be blocked by the request message.
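The snoop-then-invalidate sequence above (SnpData/RspS followed by SnpInvXtoI/RspI) can be sketched roughly as follows; the class and method names are our own illustrative stand-ins for hardware behaviour, not the patent's implementation:

```python
# Illustrative model: the NC snoops remote nodes for an address, invalidates
# any cached copies, and records that the address is now clean remotely.
class RemoteNode:
    def __init__(self, cache):
        self.cache = set(cache)        # system addresses this node caches

    def snp_data(self, addr):          # models SnpData -> RspS
        return addr in self.cache

    def snp_inv_x_to_i(self, addr):    # models SnpInvXtoI -> RspI
        self.cache.discard(addr)       # delete / mark unavailable
        return "RspI"

def pre_invalidate(nc_clean, nodes, addr):
    """Invalidate `addr` on every remote node and record it as clean."""
    for node in nodes:
        if node.snp_data(addr):        # response says the node caches addr
            assert node.snp_inv_x_to_i(addr) == "RspI"
    nc_clean.add(addr)                 # NC now knows no remote copy exists

n1, n2 = RemoteNode({"B"}), RemoteNode({"A", "C"})
clean = set()
pre_invalidate(clean, [n1, n2], "A")
print("A" in n2.cache, "A" in clean)   # False True
```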
  • the NC transmits a request message that has been written into the processing queue to the HA.
  • After invalidating the cache data blocks containing the system address cached on the other nodes, the NC transmits the request message written into the processing queue to the HA of the local node, so that the HA can transmit the stored system address to the NC as the request message asks; the NC then transmits the system address to the requesting node, which can use it to access system memory.
  • The NC may also send the feedback message to the HA, so that the HA learns from the indication of the feedback message that the system address cached by the other nodes is invalid; the HA can then transmit the stored system address to the CA, so that the CA can use the system address for system memory access, meeting the requirements of the cache coherence protocol of the NUMA system.
  • In practice, the NC sends the feedback message to the HA after receiving the first snoop message transmitted by the HA, completing a handshake. The NC does not need to transmit the first snoop message to the other nodes, so the first snoop message cannot be blocked by the request messages in the processing queue, which prevents the NUMA system from crashing due to an NC deadlock.
  • The order between the NC transmitting the request message and receiving the first snoop message is not limited, as long as the NC has invalidated the cache data blocks containing the system address cached by the other nodes before receiving the first snoop message.
  • Because the NC has already learned that the other nodes have invalidated the cache data blocks containing the system address, the NC does not need to transmit the first snoop message to the other nodes when it receives it from the HA, and so avoids writing the first snoop message into the processing queue where it would be blocked by the request message. Since the deadlock arises because the NC and the HA form an interdependent blocking ring, preventing the first snoop message from being blocked by the request message on the NC unlocks the blocking ring that causes the deadlock, preventing the NUMA system from crashing due to an NC deadlock.
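The dispatch rule implied here — a snoop for an address already invalidated remotely is answered at once and never enters the queue — might be sketched like this (our own modelling; the names are hypothetical):

```python
# Illustrative NC dispatch rule: snoops for remotely-clean addresses are
# answered immediately; everything else queues behind pending requests.
from collections import deque

def nc_dispatch(msg, queue, remotely_clean):
    kind, addr = msg
    if kind == "Snoop" and addr in remotely_clean:
        return "Feedback"              # respond to the HA directly
    queue.append(msg)                  # normal queuing policy applies
    return "Queued"

queue = deque([("Request", "A")])      # a remote request is already waiting
print(nc_dispatch(("Snoop", "A"), queue, remotely_clean={"A"}))  # Feedback
print(len(queue))                      # 1: the snoop never joined the queue
```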
  • The request message transmitted by the CA is used to request the same system address as the request message transmitted by any node, so the system address used by the CA stays consistent with that used by the requesting node, meeting the requirements of the NUMA system's cache coherence protocol.
  • The system address requested by the CA and by the request message transmitted by any node may be any one of multiple system addresses managed by the HA.
  • In summary, after receiving a request message sent by any node and writing it into the processing queue, the node controller of the local node first monitors whether a cache data block containing the system address is cached on the other nodes and, if so, invalidates it. When the node controller subsequently receives the first snoop message transmitted by the home agent of the local node, it no longer needs to transmit the first snoop message to the other nodes, because it has already invalidated the cache data blocks containing the system address cached there; responding with a feedback message to the HA directly suffices. This prevents the node controller from writing the first snoop message into the processing queue where it would be blocked by the request message, unlocks the interdependent blocking ring between the node controller and the HA, prevents the node controller from deadlocking, and prevents the NUMA system from crashing due to a node controller deadlock.
  • Embodiment 2:
  • FIG. 4 is a schematic diagram of a method for preventing a deadlock of a node controller according to a second embodiment of the present invention, which is applied to a NUMA system.
  • This embodiment describes the method for preventing node controller deadlock by taking a NUMA system that conforms to the QuickPath Interconnect (QPI) protocol as an example.
  • Suppose the third node N2 in the NUMA system caches a cache data block (Cache Line) containing the system address A.
  • the method can include the following steps:
  • The NC of the first node N0 receives the request message RdData, sent by the second node N1 to request the system address A, and writes the request message RdData into the processing queue.
  • The NC of the first node N0 transmits a snoop message SnpData to the third node N2; the snoop message SnpData is used to monitor whether a cache data block containing the system address A is cached on the third node N2.
  • The NC of the first node N0 receives the response message RspS transmitted by the third node N2; the response message RspS indicates that the third node N2 caches a Cache Line containing the system address A.
  • the NC of the first node N0 transmits the indication message SnpInvXtoI to the third node N2, where the indication message SnpInvXtoI is used to instruct the third node N2 to invalidate the cached Cache Line containing the system address A.
  • The NC of the first node N0 receives the indication response message RspI sent by the third node N2; the indication response message RspI is transmitted by the third node N2 after it has, according to the indication of the indication message SnpInvXtoI, invalidated the cached Cache Line containing the system address A.
  • The third node N2 invalidates the cached Cache Line containing the system address A; specifically, the third node N2 deletes the cached Cache Line containing the system address A or sets it to unavailable.
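The two invalidation options just mentioned (delete the Cache Line, or keep it but mark it unavailable) can be illustrated with a toy cache. The MESI-style state letters below are our assumption; the patent leaves the concrete mechanism to those skilled in the art:

```python
# Toy cache: address -> coherence state. Invalidation either removes the
# line or flips its state to "I" (Invalid / unavailable).
def invalidate(cache, addr, delete=True):
    if addr not in cache:
        return
    if delete:
        del cache[addr]                # remove the Cache Line entirely
    else:
        cache[addr] = "I"              # keep the line but mark it Invalid

n2_delete = {"A": "S", "B": "E"}       # N2's cache, delete variant
n2_mark = {"A": "S", "B": "E"}         # N2's cache, mark-unavailable variant
invalidate(n2_delete, "A", delete=True)
invalidate(n2_mark, "A", delete=False)
print("A" in n2_delete, n2_mark["A"])  # False I
```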
  • The NC of the first node N0 transmits the request message RdData that has been written into the processing queue to the HA of the first node N0.
  • The NC of the first node N0 receives the snoop message SnpData transmitted by the HA of the first node N0; this snoop message SnpData is triggered when the HA of the first node N0 receives, from the CA of the first node N0, a request message RdData requesting the system address A.
  • The NC of the first node N0 immediately transmits the feedback message RspCnflt to the HA of the first node N0 upon receiving the SnpData transmitted by the HA of the first node N0.
  • The HA of the first node N0 writes the request message RdData transmitted by the CA (requesting the system address A) and the request message RdData transmitted by the NC of the first node N0 into its processing queue and processes them in turn according to the processing order. Specifically, the HA of the first node N0 transmits the system address A to the CA according to the request message RdData transmitted by the CA, and transmits the system address A to the NC of the first node N0 according to the request message RdData transmitted by the NC; the NC of the first node N0 then transmits the system address A to the second node N1.
  • When the NC of the first node N0 receives the snoop message SnpData transmitted by the HA of the first node N0, the NC already knows that the Cache Line containing the system address A no longer exists on any other node of the NUMA system (i.e., on the second node N1 or the third node N2). The NC of the first node N0 therefore does not need to forward the snoop message SnpData and can transmit the feedback message RspCnflt to the HA of the first node N0 directly, avoiding the snoop message SnpData being written into the processing queue and blocked by the request message RdData. The interdependent blocking ring is thus unlocked on the NC of the first node N0, preventing the NC of the first node N0 from deadlocking.
  • That is, the NC of the first node N0 is configured with a snoop-first policy for out-of-domain requests. A so-called out-of-domain request is a request message for a system address that enters the HA of the first node N0 from another node. When the NC of the first node N0, acting as the HA's proxy, finds that such a request cannot be satisfied directly, the NC of the first node N0 performs the out-of-domain snoop first: the Cache Line containing the system address A that is cached outside the requester (here, on the third node N2) is invalidated first.
  • Thus, when the NC of the first node N0 receives the snoop message transmitted by the HA of the first node N0, although the NC of the first node N0 is still processing the request message from the second node N1, it can nevertheless send the feedback message for the snoop message to the HA of the first node N0. In this way, the HA of the first node N0 can process the request message from the CA first and then continue to process the request message transmitted by the NC of the first node N0.
  • In the method described above, the NC of the first node N0 of the NUMA system first invalidates the Cache Line containing the system address A cached on the third node N2, and then transmits the request message written into the processing queue to the HA of the first node N0. When the NC of the first node N0 receives the snoop message transmitted by the HA of the first node N0, it has already invalidated the Cache Line containing the system address A cached on the third node N2, so it does not need to forward the snoop message to the third node N2. This prevents the NC of the first node N0 from writing the snoop message into the processing queue where it would be blocked by the request message, unlocks the interdependent blocking ring between the NC and the HA of the first node N0, prevents the NC of the first node N0 from deadlocking, and prevents the NUMA system from crashing due to a deadlock of the NC of the first node N0.
  • Embodiment 3:
  • FIG. 5 shows a node controller according to Embodiment 3 of the present invention, applied to a NUMA system. The node controller provided in this embodiment is located in a local node of the NUMA system, and the node controller may include:
  • a receiving unit 501, configured to receive a request message sent by any node and write the request message into the processing queue unit 505, where the request message is used to request a system address;
  • the monitoring unit 502 is configured to monitor whether cached data blocks including the foregoing system address are cached on other nodes;
  • the processing unit 503 is configured to: when the monitoring result of the monitoring unit 502 is YES, invalidate the cached data block that is cached on the other node and includes the system address;
  • the receiving unit 501 is further configured to receive a first snoop message transmitted by the home agent of the local node, where the first snoop message is used to monitor whether the system address is cached on the other nodes;
  • a transmitting unit 504, configured to respond with a feedback message to the home agent directly when the receiving unit 501 receives the first snoop message, preventing the receiving unit 501 from writing the first snoop message into the processing queue unit where it would be blocked by the request message; the feedback message is used to indicate that the system address cached on the other nodes is invalid, so that the home agent transmits the stored system address to the cache agent of the local node;
  • a processing queue unit 505, configured to store a request message written by the receiving unit 501;
  • the transmitting unit 504 is also operative to transmit a request message that has been written to the processing queue unit to the home agent.
  • FIG. 6 shows another node controller according to Embodiment 3 of the present invention, applied to a NUMA system.
  • The node controller shown in FIG. 6 is an optimization of the node controller shown in FIG. 5.
  • The node controller shown in FIG. 6 is likewise located in a node of the NUMA system.
  • The monitoring unit 502 may include:
  • a first module 5021, configured to transmit a second snoop message to the other nodes, the second snoop message being used to monitor whether a cached data block containing the system address is cached on the other nodes;
  • a second module 5022, configured to receive a response message transmitted by the other nodes, the response message indicating whether a cached data block containing the system address is cached on the other nodes.
  • The processing unit 503 is specifically configured to invalidate the cached data block containing the system address on the third node when the response message received by the second module 5022 indicates that such a block is cached on the third node.
  • The processing unit 503 may include a third module 5031, configured to transmit an indication message SnpInvXtoI to the other nodes, the indication message SnpInvXtoI instructing the other nodes to delete their cached data blocks containing the system address or mark them unavailable.
  • The processing unit 503 may further include a fourth module 5032, configured to receive an indication response message RspI transmitted by the other nodes; RspI is transmitted after the other nodes, following the indication of SnpInvXtoI, have deleted their cached data blocks containing the system address or marked them unavailable.
  • After the home agent learns from the feedback message returned by the transmitting unit 504 that the system address cached on the other nodes is invalid, the home agent transmits the system address it stores to the caching agent, so that the caching agent can use the system address for access.
  • The request message transmitted by the CA of the node also requests the above system address; that is, the CA requests the same system address as the request message transmitted by any node, so the CA and that node use a consistent system address, satisfying the requirements of the NUMA system's cache coherence protocol.
  • A node may be a CPU or an SMP system; the embodiments of the present invention do not limit this.
  • After receiving the request message sent by any node and writing it into the processing queue, the node controller of the node first monitors whether a cached data block containing the system address is cached on other nodes; if so, it invalidates those blocks. When the node controller later receives the first snoop message from the local node's home agent, it no longer needs to forward that message to the other nodes, because their cached copies have already been invalidated; it simply responds with a feedback message to the HA. This prevents the node controller from writing the first snoop message into the processing queue, where the request message would block it, breaking the mutually dependent blocking loop between the node controller and the HA, preventing node controller deadlock, and keeping the NUMA system from crashing because of it.
  • Embodiment 4:
  • FIG. 7 shows a NUMA system according to Embodiment 4 of the present invention.
  • The NUMA system includes a local node 701 and other nodes 702 besides the local node 701.
  • The structure of the local node 701 is similar to that of the first node N0 in FIG. 1.
  • The difference is that the NC of the local node 701 has the same structure as the node controller shown in FIG. 5 or the node controller shown in FIG. 6.
  • The node controller of the local node 701 receives a request message from the other nodes 702 and writes the request message into a processing queue, the request message being used to request a system address.
  • The node controller of the local node 701 monitors whether a cached data block containing the system address is cached on the other nodes 702; if so, it invalidates those blocks, so that on receiving the first snoop message transmitted by the home agent it responds with a feedback message to the home agent directly, avoiding writing the first snoop message into the processing queue where the request message would block it.
  • The first snoop message is used to snoop whether the system address is cached on the other nodes 702; the feedback message indicates that the system address cached on the other nodes 702 is invalid, so that the home agent transmits the stored system address to the caching agent and the node controller of the local node 701 transmits the request message
  • already written into the processing queue to the home agent.
  • The first snoop message is transmitted to the node controller after the home agent receives the request message sent by the caching agent; the request message transmitted by the caching agent is used to request the system address.
  • The system address requested by the caching agent is the same as the system address requested by the other nodes 702.
  • After receiving the request message sent by the other nodes 702 and writing it into the processing queue, the node controller of the local node 701 first monitors whether the other nodes 702 cache a data block containing the system address; if so, it invalidates those blocks.
  • When the node controller later receives the first snoop message from the home agent, it no longer needs to forward it to the other nodes 702, because their cached copies have already been invalidated; it simply responds with a feedback message
  • to the home agent. This prevents the node controller from writing the first snoop message into the processing queue, where the request message would block it, breaking the mutually dependent blocking loop between the node controller and the HA, preventing node controller deadlock, and keeping the NUMA system from crashing because of it.
  • All or part of the steps of the above method embodiments may be carried out by hardware under the direction of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments.
  • The storage medium includes media that can store program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • The method for preventing node controller deadlock, the node controller, and the NUMA system provided by the embodiments of the present invention have been described in detail above; the principles and implementations of the present invention are explained herein through specific examples.

Description

Method for Preventing Node Controller Deadlock, and Node Controller

Technical Field

The present invention relates to the computer field, and in particular to a method for preventing node controller deadlock and a node controller, applied to a Non-Uniform Memory Access (NUMA) system.
Background

The structure of a conventional NUMA system is shown in FIG. 1. It consists of three nodes: a first node N0, a second node N1, and a third node N2, connected by a node network (Node Interconnect). The first node N0 contains a node controller (NC), a block of system memory (not shown in FIG. 1), a home agent (HA) that manages the system addresses of that memory, a processor unit (not shown in FIG. 1), and the caching agent (CA) of that processor unit. As shown in FIG. 1, the components inside the first node N0 are locally interconnected (Local Interconnect), while the NC of the first node N0, the second node N1, and the third node N2 are connected by the node interconnect. The second node N1 and the third node N2 may be nodes that contain only a processor unit and its CA, or they may be complex nodes structured similarly to the first node N0.
In the NUMA system of FIG. 1, the HA of the first node N0 manages the system addresses of the system memory; the CA of the first node N0, the second node N1, and the third node N2 use these system addresses to access the system memory. Suppose the HA of the first node N0 manages system address A. When the second node N1 and/or the third node N2 needs to request system address A, it issues a request message to the NC of the first node N0, which forwards the request to the HA of the first node N0; after receiving the request, the HA passes system address A back to the NC, which delivers it to the second node N1 and/or the third node N2. Further, suppose the second node N1 has cached system address A and the CA of the first node N0 also needs to request system address A. The CA then issues its own request message to the HA of the first node N0. On receiving the CA's request, the HA issues a snoop message to the NC of the first node N0, which forwards it to the second node N1. After receiving the snoop message, the second node N1 sends a feedback message to the NC, which relays it to the HA, so that the HA learns that the copy of system address A cached by the second node is valid. The HA can then deliver its stored system address A to the CA over the local network, completing the handling of the CA's request. In this process, after learning that the copy of system address A cached on the second node N1 is valid, the HA passes system address A to the CA over the local network, so that the CA can use system address A to access system memory. This keeps the system address used by the CA of the first node N0 consistent with that used by the second node N1, satisfying the cache coherence protocol (Cache Coherence Protocol) of the NUMA system.
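The handshake described above can be replayed as a toy script. The data structures and helper names here are assumptions for illustration only, not the patent's implementation:

```python
# Toy replay of the coherent handshake: the HA snoops remote holders of an
# address through the NC before handing the address to its local CA.

def handle_ca_request(addr, ha_memory, remote_caches):
    """HA serves its local CA's request for addr, snooping each remote
    node (via the NC) first so both users end up coherent."""
    events = []
    for node, cached in remote_caches.items():
        events.append(f"HA -> NC -> {node}: Probe({addr})")
        state = "valid" if addr in cached else "absent"
        events.append(f"{node} -> NC -> HA: Feedback({addr}, {state})")
    # Only after the snoop round-trip does the HA release the address.
    events.append(f"HA -> CA: {addr} = {ha_memory[addr]}")
    return events

events = handle_ca_request("A", {"A": "data"}, {"N1": {"A"}})
for e in events:
    print(e)
```

With one remote holder (N1) the trace is three events: the probe, the "valid" feedback, and the final hand-off to the CA.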
In practice, in the NUMA system of FIG. 1, the NC of the first node N0 applies a queuing policy: a request message (Request) from the second node N1 or the third node N2 enters the processing queue first, so a snoop message (Probe) initiated by the HA of the first node N0 may be blocked behind that Request. At the same time, the HA of the first node N0 also applies a queuing policy: a Request initiated by the CA of the first node N0 enters its processing queue first, so a Request forwarded by the NC may in turn be blocked behind the CA's Request. A blocking loop, shown in FIG. 2, thus forms between the NC and the HA of the first node N0, deadlocking the NC. An NC deadlock permanently consumes NUMA system resources and eventually crashes the NUMA system.
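The two queuing policies and the resulting blocking loop can be sketched as a minimal dependency-cycle check. All names and the `waits_on` map are invented for illustration; this is not the patent's implementation:

```python
from collections import deque

def has_blocking_cycle(start, waits_on):
    """Follow the waits-on chain from a queue-head message; revisiting a
    message means a dependency cycle, i.e. deadlock."""
    seen = set()
    msg = start
    while msg is not None and msg not in seen:
        seen.add(msg)
        msg = waits_on.get(msg)
    return msg is not None

# NC queue: N1's Request entered before the HA's Probe.
# HA queue: the CA's Request entered before the NC-forwarded Request.
nc_queue = deque(["Request(N1)", "Probe(HA)"])
ha_queue = deque(["Request(CA)", "Request(NC)"])

# Who must complete before whom, reading the two FIFO heads:
waits_on = {
    "Request(N1)": "Request(NC)",  # served once HA handles the forwarded copy
    "Request(NC)": "Request(CA)",  # HA's FIFO: CA's request is ahead of it
    "Request(CA)": "Probe(HA)",    # CA's request needs the snoop result back
    "Probe(HA)": "Request(N1)",    # NC's FIFO: N1's request is ahead of the probe
}
print(has_blocking_cycle(nc_queue[0], waits_on))  # True: the loop of FIG. 2
```

Removing any one edge of `waits_on` (which is what the invention does by answering the probe without enqueuing it) breaks the cycle.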
Summary of the Invention

In view of the above defect, embodiments of the present invention provide a method for preventing node controller deadlock and a node controller, mainly applied to a NUMA system, which can prevent node controller deadlock and thus avoid the performance degradation or crash of the NUMA system that deadlock causes.

A method for preventing node controller deadlock, applied to a NUMA system, includes:

the node controller of the local node receives a request message sent by any node and writes the request message into a processing queue, the request message being used to request a system address;

the node controller monitors whether a cached data block containing the system address is cached on other nodes; if so, it invalidates the cached data blocks containing the system address on the other nodes, so that when the node controller receives a first snoop message transmitted by the home agent of the local node it responds with a feedback message to the home agent directly, avoiding writing the first snoop message into the processing queue where the request message would block it; the first snoop message is used to snoop whether the system address is cached on the other nodes; the feedback message is used to indicate that the system address cached on the other nodes is invalid, so that the home agent transmits the system address it stores to the caching agent of the local node;

the node controller transmits the request message already written into the processing queue to the home agent of the first node.
A node controller, applied to a NUMA system and located in a local node of the NUMA system, includes:

a receiving unit, configured to receive a request message sent by any node and write the request message into a processing queue, the request message being used to request a system address;

a monitoring unit, configured to monitor whether a cached data block containing the system address is cached on other nodes;

a processing unit, configured to invalidate the cached data blocks containing the system address on the other nodes when the monitoring result of the monitoring unit is yes;

the receiving unit is further configured to receive a first snoop message transmitted by the home agent of the local node, the first snoop message being used to snoop whether the system address is cached on the other nodes; a transmitting unit, configured to respond with a feedback message to the home agent directly when the receiving unit receives the first snoop message, preventing the receiving unit from writing the first snoop message into the processing queue unit where the request message would block it; the feedback message is used to indicate that the system address cached on the other nodes is invalid, so that the home agent transmits the system address it stores to the caching agent of the local node;

the processing queue unit, configured to store the request messages written by the receiving unit;

the transmitting unit is further configured to transmit the request message already written into the processing queue unit to the home agent.
A NUMA system includes a local node and other nodes besides the local node, the local node including a node controller, a home agent, and a caching agent, wherein:

the node controller receives a request message from the other nodes and writes the request message into a processing queue, the request message being used to request a system address; the node controller monitors whether a cached data block containing the system address is cached on the other nodes, and if so, invalidates the cached data blocks containing the system address on the other nodes, so that when the node controller receives a first snoop message transmitted by the home agent it responds with a feedback message to the home agent directly, avoiding writing the first snoop message into the processing queue where the request message would block it; the first snoop message is used to snoop whether the system address is cached on the other nodes; the feedback message is used to indicate that the system address cached on the other nodes is invalid, so that the home agent transmits the system address it stores to the caching agent; the node controller transmits the request message already written into the processing queue to the home agent.

In the embodiments of the present invention, after receiving a request message sent by any node and writing it into the processing queue, the node controller of the local node first monitors whether a cached data block containing the system address is cached on other nodes. If such a block is found, the node controller invalidates it. When the node controller later receives the first snoop message transmitted by the home agent of the local node, it no longer needs to forward that message to the other nodes, because their cached data blocks containing the system address have already been invalidated; it simply responds with a feedback message to the HA. This keeps the node controller from writing the first snoop message into the processing queue, where the request message would block it, thereby breaking the mutually dependent blocking loop between the node controller and the HA, preventing node controller deadlock, and keeping the NUMA system from crashing because of such a deadlock.
Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed for the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of a conventional NUMA system;

FIG. 2 is a schematic flowchart of an NC deadlock in a conventional NUMA system;

FIG. 3 is a schematic flowchart of a method for preventing node controller deadlock according to Embodiment 1 of the present invention; FIG. 4 is a schematic flowchart of a method for preventing node controller deadlock according to Embodiment 2 of the present invention; FIG. 5 is a schematic structural diagram of a node controller according to Embodiment 3 of the present invention; FIG. 6 is a schematic structural diagram of another node controller according to Embodiment 3 of the present invention; FIG. 7 is a schematic structural diagram of a NUMA system according to Embodiment 4 of the present invention.
Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Embodiments of the present invention provide a method for preventing node controller deadlock, a node controller, and a NUMA system, which can prevent node controller deadlock and keep the NUMA system from crashing because of it. Specific embodiments are described below.
Embodiment 1:

Referring to FIG. 3, FIG. 3 shows a method for preventing node controller deadlock according to Embodiment 1 of the present invention, applied to a NUMA system. As shown in FIG. 3, the method may include the following steps:

301. The node controller (NC) of the local node receives a request message sent by any node and writes the request message into a processing queue; the request message is used to request a system address.

In a NUMA system, the NC applies a queuing policy: after a request message for a system address reaches the NC, the NC must write it into the processing queue, where requests are handled in order of processing priority (that is, their order of arrival).

In a NUMA system, any node can access system memory through the requested system address. The local node may be a central processing unit (CPU) or a symmetric multiprocessing (SMP) system; the embodiments of the present invention do not limit this.
302. The NC monitors whether a cached data block (Cache Line) containing the system address is cached on other nodes; if so, the NC invalidates the cached data blocks containing the system address on those nodes, so that when the NC receives the first snoop message transmitted by the HA of the local node, it responds with a feedback message to the HA directly, avoiding writing the first snoop message into the processing queue where the request message would block it. The first snoop message is used to snoop whether the system address is cached on the other nodes; the feedback message indicates that the system address cached on the other nodes is invalid, so that the HA transmits the system address it stores to the CA of the local node. In a NUMA system, the other nodes may be CPUs or SMP systems; the embodiments of the present invention do not limit this.

As an optional implementation, the NC may monitor whether a cached data block containing the system address is cached on other nodes as follows:

A1. The NC transmits a second snoop message SnpData to the other nodes; the second snoop message SnpData is used to monitor whether a cached data block containing the system address is cached on the other nodes.

B1. The NC receives a response message RspS transmitted by the other nodes; the response message RspS indicates whether a cached data block containing the system address is cached on the other nodes.

Through steps A1 and B1, the NC can thus monitor the other nodes and learn whether any of them caches a data block containing the system address.
As an optional implementation, the NC may invalidate the cached data blocks containing the system address on the other nodes as follows:

the NC transmits an indication message SnpInvXtoI to the other nodes; the indication message SnpInvXtoI instructs the other nodes to delete their cached data blocks containing the system address or mark them unavailable.

Further, the NC may also receive an indication response message RspI transmitted by the other nodes; RspI is transmitted after the other nodes, following the indication of SnpInvXtoI, have deleted their cached data blocks containing the system address or marked them unavailable.

In a NUMA system, after receiving the indication message SnpInvXtoI from the NC, the other nodes can delete their cached data blocks containing the system address or mark them unavailable as the message indicates. How a node deletes a cached data block or marks it unavailable is common knowledge to those skilled in the art and is not detailed here.

For the NC, once it receives the indication response message RspI from the other nodes, it knows that they have invalidated their cached data blocks containing the system address. Even if the NC later receives the first snoop message from the HA of the local node, it no longer needs to forward that message to the other nodes, which avoids writing it into the processing queue where the request message would block it.
303. The NC transmits the request message already written into the processing queue to the HA.

In this embodiment, after detecting that other nodes cache data blocks containing the system address and invalidating those blocks, the NC transmits the request message to the HA of the local node, so that the HA can, as the request asks, transmit the stored system address to the NC, which delivers it to the second node; the second node can then use the system address to access system memory.

In this embodiment, after receiving the first snoop message from the HA, the NC can transmit a feedback message to the HA, so that the HA learns from the feedback that the system address cached by the other nodes is invalid. The HA can then transmit its stored system address to the CA, letting the CA use that system address to access system memory, which satisfies the cache coherence protocol of the NUMA system.

In this embodiment, if the NC finds that no other node caches a data block containing the system address, then after receiving the first snoop message from the HA it can likewise transmit a feedback message to the HA, letting the HA transmit its stored system address to the CA and completing the handshake. The NC again has no need to forward the first snoop message to other nodes, so the message is never written into the processing queue to be blocked by the request message, and the NUMA system is kept from crashing due to NC deadlock.
In this embodiment, the order in which the NC transmits the request message and receives the first snoop message is not limited, as long as the NC has invalidated the cached data blocks containing the system address on the other nodes before receiving the first snoop message. Because the NC already knows in advance that the other nodes have invalidated those cached data blocks, it does not need to forward the first snoop message to them when it arrives from the HA, which prevents the message from being written into the processing queue and blocked by the request message. Deadlock arises because a mutually dependent blocking loop forms between the NC and the HA; as long as the first snoop message is never blocked by the request message at the NC, that loop is broken and the NUMA system cannot crash from NC deadlock.
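The invariant just described — remote copies are invalidated before the first snoop message can arrive, so the snoop is answered at once instead of entering the queue — can be sketched as follows. Class, field, and message names are assumptions for illustration, not the patent's implementation:

```python
# Minimal sketch of pre-invalidation at the NC: invalidate remote copies of
# the requested address first, then answer a later HA snoop immediately.

class NodeController:
    def __init__(self, other_nodes):
        self.other_nodes = other_nodes   # node name -> set of cached addresses
        self.queue = []                  # processing queue (FIFO)
        self.invalidated = set()         # addresses whose remote copies are gone

    def on_remote_request(self, addr):
        self.queue.append(("Request", addr))
        # Step 302: snoop the other nodes and invalidate any cached copy
        # (SnpData, then SnpInvXtoI / RspI in the QPI-style flow).
        for cached in self.other_nodes.values():
            cached.discard(addr)
        self.invalidated.add(addr)

    def on_ha_snoop(self, addr):
        # Answer the HA directly instead of enqueuing the snoop message.
        if addr in self.invalidated:
            return "feedback"            # remote copies are already invalid
        self.queue.append(("Snoop", addr))  # would risk blocking otherwise
        return None

nc = NodeController({"N2": {"A"}})
nc.on_remote_request("A")
print(nc.on_ha_snoop("A"))    # the snoop is answered without touching the queue
print(nc.other_nodes["N2"])   # N2's copy of A was invalidated
```

The snoop never joins the queue behind the pending request, which is exactly the edge removed from the blocking loop.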
In this embodiment, the request message transmitted by the CA also requests the above system address; that is, the CA requests the same system address as the request message transmitted by any node, so the CA and that node use a consistent system address, satisfying the cache coherence protocol of the NUMA system. The system address requested by the CA and by any node may be any one of the multiple system addresses managed by the HA.

In the embodiments of the present invention, after receiving a request message sent by any node and writing it into the processing queue, the node controller of the local node first monitors whether a cached data block containing the system address is cached on other nodes. If such a block is found, the node controller invalidates it. When the node controller later receives the first snoop message transmitted by the home agent of the local node, it no longer needs to forward that message to the other nodes, because their cached data blocks containing the system address have already been invalidated; it simply responds with a feedback message to the HA. This keeps the node controller from writing the first snoop message into the processing queue, where the request message would block it, thereby breaking the mutually dependent blocking loop between the node controller and the HA, preventing node controller deadlock, and keeping the NUMA system from crashing because of such a deadlock.
Embodiment 2:

Referring to FIG. 4, FIG. 4 shows a method for preventing node controller deadlock according to Embodiment 2 of the present invention, applied to a NUMA system. Embodiment 2 takes a NUMA system that complies with the Quick Path Interconnect (QPI) protocol as an example to introduce the method provided by the embodiments of the present invention. Further, in Embodiment 2 it is assumed that the third node N2 of the NUMA system caches a cached data block (Cache Line) containing system address A. As shown in FIG. 4, the method may include the following steps:
401. The NC of the first node N0 receives a request message RdData for system address A transmitted by the second node N1, and writes the request message RdData into the processing queue.

402. The NC of the first node N0 transmits a snoop message SnpData to the third node N2; the snoop message SnpData is used to monitor whether a cached data block containing system address A is cached on the third node N2.

403. The NC of the first node N0 receives a response message RspS transmitted by the third node N2; the response message RspS indicates that a Cache Line containing system address A is cached on the third node N2.

404. The NC of the first node N0 transmits an indication message SnpInvXtoI to the third node N2; the indication message SnpInvXtoI instructs the third node N2 to invalidate its cached Cache Line containing system address A.

405. The NC of the first node N0 receives an indication response message RspI transmitted by the third node N2; RspI is transmitted after the third node N2, as SnpInvXtoI indicates, has invalidated its cached Cache Line containing system address A.

In this embodiment, invalidating the cached Cache Line containing system address A means that the third node N2 deletes that Cache Line or marks it unavailable.
406. The NC of the first node N0 transmits the request message RdData already written into the processing queue to the HA of the first node N0.

407. The NC of the first node N0 receives a snoop message SnpData transmitted by the HA of the first node N0; the HA transmits this SnpData upon receiving a request message RdData for system address A transmitted by the CA of the first node N0.

408. On receiving the snoop message SnpData from the HA of the first node N0, the NC of the first node N0 immediately transmits a feedback message RspCnflt to the HA.

In this embodiment, the HA of the first node N0 can write both the RdData request from the CA and the RdData request forwarded by the NC into its processing queue and handle them in order of processing priority (that is, order of arrival). Specifically, the HA can transmit system address A to the CA according to the CA's RdData request, and transmit system address A to the NC according to the NC's RdData request, with the NC then delivering system address A to the second node N1.

In this embodiment, by the time the NC of the first node N0 receives the snoop message SnpData from the HA, the NC already knows that no node on its other side (including the second node N1 and the third node N2) holds a Cache Line for system address A. The NC therefore has no need to forward SnpData to that side and can transmit the feedback message RspCnflt to the HA at once. This prevents the NC from writing SnpData into the processing queue, where the RdData request would block it, thus breaking the mutually dependent blocking loop at the NC of the first node N0 and preventing the NC from deadlocking.
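Steps 401 through 408 can be replayed as a simple trace. The message names follow the description above; the node and cache handling is an assumed sketch, not the patent's implementation:

```python
# Replaying the QPI-style flow of steps 401-408 as a scripted trace.

def run_flow():
    trace = []
    n2_cache = {"A"}                     # N2 holds a Cache Line for address A
    queue = []                           # NC processing queue

    trace.append("401 N1 -> NC : RdData(A)"); queue.append("RdData(A)")
    trace.append("402 NC -> N2 : SnpData(A)")
    hit = "A" in n2_cache
    trace.append("403 N2 -> NC : RspS" if hit else "403 N2 -> NC : miss")
    if hit:
        trace.append("404 NC -> N2 : SnpInvXtoI(A)"); n2_cache.discard("A")
        trace.append("405 N2 -> NC : RspI")
    trace.append("406 NC -> HA : RdData(A)")
    trace.append("407 HA -> NC : SnpData(A)")
    # 408: every remote copy of A is already invalid, so the NC answers the
    # HA immediately; SnpData(A) never enters the queue behind RdData(A).
    trace.append("408 NC -> HA : RspCnflt")
    return trace, queue, n2_cache

trace, queue, n2_cache = run_flow()
for line in trace:
    print(line)
```

After the run, the queue holds only the RdData request and N2's cache is empty — the HA's snoop was answered without ever being queued.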
As an optional implementation, besides the queuing policy, the following policy may also be set on the NC of the first node N0 in this embodiment:

when the NC of the first node N0 receives a request message for a system address from other nodes and needs to make a further request to the HA of the first node N0 (that is, the NC cannot act as a proxy for the HA), it must first invalidate every Cache Line containing that system address on the side where those other nodes sit, and only then may it issue the cross-domain request.

Here, a cross-domain request means a request message for a system address passing from the other nodes into the HA of the first node N0.

As shown in FIG. 4, after the NC of the first node N0 receives the request message from the second node N1 and finds that, acting as HA proxy, it cannot satisfy the request itself, the NC first follows the policy above and invalidates the Cache Line for system address A cached on the third node N2 on the second node N1's side before issuing the cross-domain request. When the NC of the first node N0 then receives the snoop message from the HA, the NC can respond to it with a feedback message to the HA of the first node, even though it is already handling the request message from the second node N1. The HA of the first node N0 can thus finish handling the request message from the CA first, and then continue with the request message forwarded by the NC of the first node N0.
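The added policy — no cross-domain request until every Cache Line for the address on the requesting side is invalid — can be expressed as a small guard. The helper names are assumptions for illustration:

```python
# Guard for the NC policy: a request may cross into the HA's domain only
# after every cached copy of the address on the requesting side is gone.

def may_cross_domain(addr, side_caches):
    """side_caches: caches on the requester's side (node -> cached addresses)."""
    return all(addr not in cached for cached in side_caches.values())

def invalidate_side(addr, side_caches):
    """Invalidate each holder's copy (the SnpInvXtoI / RspI exchange)."""
    for cached in side_caches.values():
        cached.discard(addr)

side = {"N2": {"A", "B"}}
assert not may_cross_domain("A", side)   # must invalidate first
invalidate_side("A", side)
assert may_cross_domain("A", side)       # cross-domain request now allowed
assert side["N2"] == {"B"}               # only address A was invalidated
```

The guard is what guarantees that, by the time the HA's snoop arrives, the NC can answer it from local knowledge alone.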
In this embodiment, after receiving the request message transmitted by the second node N1 and writing it into the processing queue, the NC of the first node N0 of the NUMA system first invalidates the Cache Line containing system address A cached on the third node N2, and only then transmits the queued request message to the HA of the first node N0. When the NC of the first node N0 receives the snoop message transmitted by the HA, it has already invalidated the Cache Line containing system address A on the third node N2, so it need not forward the snoop message to the third node N2. This keeps the NC of the first node N0 from writing the snoop message into the processing queue, where the request message would block it, breaking the mutually dependent blocking loop between the NC and the HA of the first node N0, preventing the NC of the first node N0 from deadlocking, and keeping the NUMA system from crashing because of that deadlock.
Embodiment 3:

Referring to FIG. 5, FIG. 5 shows a node controller according to Embodiment 3 of the present invention, applied to a NUMA system. The node controller provided in this embodiment is located in a node of the NUMA system and may include:

a receiving unit 501, configured to receive a request message sent by any node and write the request message into a processing queue 505, the request message being used to request a system address;

a monitoring unit 502, configured to monitor whether a cached data block containing the system address is cached on other nodes;

a processing unit 503, configured to invalidate the cached data blocks containing the system address on the other nodes when the monitoring result of the monitoring unit 502 is yes;

the receiving unit 501 is further configured to receive a first snoop message transmitted by the home agent of the local node, the first snoop message being used to snoop whether the system address is cached on the other nodes;

a transmitting unit 504, configured to respond with a feedback message to the home agent directly when the receiving unit 501 receives the first snoop message, preventing the receiving unit 501 from writing the first snoop message into the processing queue unit where the request message would block it; the feedback message indicates that the system address cached on the other nodes is invalid, so that the home agent transmits the system address it stores to the caching agent of the local node;

a processing queue unit 505, configured to store the request messages written by the receiving unit 501;
The transmitting unit 504 is further configured to transmit the request message already written into the processing queue unit to the home agent. Referring also to FIG. 6, FIG. 6 shows another node controller according to Embodiment 3 of the present invention, applied to a NUMA system. The node controller shown in FIG. 6 is an optimization of the node controller shown in FIG. 5 and is likewise located in a node of the NUMA system. In the node controller shown in FIG. 6, the monitoring unit 502 may include:

a first module 5021, configured to transmit a second snoop message to the other nodes, the second snoop message being used to monitor whether a cached data block containing the system address is cached on the other nodes;

a second module 5022, configured to receive a response message transmitted by the other nodes, the response message indicating whether a cached data block containing the system address is cached on the other nodes.

Correspondingly, the processing unit 503 is specifically configured to invalidate the cached data block containing the system address on the third node when the response message received by the second module 5022 indicates that such a block is cached on the third node.

Further, in the node controller shown in FIG. 6, the processing unit 503 may include: a third module 5031, configured to transmit an indication message SnpInvXtoI to the other nodes, the indication message SnpInvXtoI instructing the other nodes to delete their cached data blocks containing the system address or mark them unavailable.

Further, in the node controller shown in FIG. 6, the processing unit 503 may also include: a fourth module 5032, configured to receive an indication response message RspI transmitted by the other nodes; RspI is transmitted after the other nodes, following the indication of SnpInvXtoI, have deleted their cached data blocks containing the system address or marked them unavailable.
In this embodiment, after the home agent learns from the feedback message returned by the transmitting unit 504 that the system address cached on the other nodes has been invalidated, it can transmit the system address stored by the home agent to the caching agent, so that the caching agent can use that system address for access.

In this embodiment, the request message transmitted by the CA of the local node also requests the above system address; that is, the CA requests the same system address as the request message transmitted by any node, so the CA and that node use a consistent system address, satisfying the cache coherence protocol of the NUMA system.

In a NUMA system, a node may be a CPU or an SMP system; the embodiments of the present invention do not limit this.

In the embodiments of the present invention, after receiving a request message sent by any node and writing it into the processing queue, the node controller of the local node first monitors whether a cached data block containing the system address is cached on other nodes. If such a block is found, the node controller invalidates it. When the node controller later receives the first snoop message transmitted by the home agent of the local node, it no longer needs to forward that message to the other nodes, because their cached data blocks containing the system address have already been invalidated; it simply responds with a feedback message to the HA. This keeps the node controller from writing the first snoop message into the processing queue, where the request message would block it, thereby breaking the mutually dependent blocking loop between the node controller and the HA, preventing node controller deadlock, and keeping the NUMA system from crashing because of such a deadlock.
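The unit decomposition of FIG. 5 and FIG. 6 can be sketched as cooperating classes. The unit numbers come from the description; the method bodies are illustrative assumptions only:

```python
# Compact sketch of the FIG. 5 / FIG. 6 unit structure.

class MonitoringUnit:                        # unit 502 (modules 5021/5022)
    def __init__(self, other_nodes):
        self.other_nodes = other_nodes       # node name -> cached addresses
    def snoop(self, addr):
        """Second snoop message out, response messages back: who holds addr?"""
        return [n for n, c in self.other_nodes.items() if addr in c]

class ProcessingUnit:                        # unit 503 (modules 5031/5032)
    def __init__(self, other_nodes):
        self.other_nodes = other_nodes
    def invalidate(self, addr, holders):
        """SnpInvXtoI to each holder; RspI confirms deletion."""
        for n in holders:
            self.other_nodes[n].discard(addr)

class NodeController:
    def __init__(self, other_nodes):
        self.queue = []                              # processing queue unit 505
        self.monitor = MonitoringUnit(other_nodes)
        self.process = ProcessingUnit(other_nodes)
        self.clean = set()                           # addresses with no remote copy
    def receive_request(self, addr):                 # receiving unit 501
        self.queue.append(addr)
        holders = self.monitor.snoop(addr)
        if holders:
            self.process.invalidate(addr, holders)
        self.clean.add(addr)
    def receive_snoop(self, addr):                   # answered by unit 504
        return "feedback" if addr in self.clean else None

nodes = {"N1": set(), "N2": {"A"}}
nc = NodeController(nodes)
nc.receive_request("A")
print(nc.receive_snoop("A"))   # the HA's snoop is answered directly
print(nodes["N2"])             # N2's copy of A is gone
```

The point of the split is that unit 504 can answer the home agent from `clean` alone, without consulting the queue that unit 505 manages.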
Embodiment 4:

Referring to FIG. 7, FIG. 7 shows a NUMA system according to Embodiment 4 of the present invention. The NUMA system shown in FIG. 7 includes a local node 701 and other nodes 702 besides the local node 701. The structure of the local node 701 is similar to that of the first node N0 in FIG. 1, except that the NC of the local node 701 has the same structure as the node controller shown in FIG. 5 or the node controller shown in FIG. 6.
The node controller of the local node 701 receives a request message from the other nodes 702 and writes the request message into a processing queue, the request message being used to request a system address;

the node controller of the local node 701 monitors whether a cached data block containing the system address is cached on the other nodes 702; if so, it invalidates the cached data blocks containing the system address on the other nodes 702, so that when the node controller receives the first snoop message transmitted by the home agent it responds with a feedback message to the home agent directly, avoiding writing the first snoop message into the processing queue where the request message would block it. The first snoop message is used to snoop whether the system address is cached on the other nodes 702; the feedback message indicates that the system address cached on the other nodes 702 is invalid, so that the home agent transmits the system address it stores to the caching agent, and the node controller of the local node 701 transmits the request message already written into the processing queue to the home agent.

The first snoop message is transmitted to the node controller after the home agent receives the request message transmitted by the caching agent; the request message transmitted by the caching agent is used to request the above system address.

The system address requested by the caching agent is the same as the system address requested by the other nodes 702. In the embodiments of the present invention, after receiving the request message sent by the other nodes 702 and writing it into the processing queue, the node controller of the local node 701 first monitors whether the other nodes 702 cache a data block containing the system address. If such a cached data block is found on the other nodes 702, the node controller invalidates it. When the node controller later receives the first snoop message transmitted by the home agent, it no longer needs to forward that message to the other nodes 702, because their cached data blocks containing the system address have already been invalidated; it simply responds with a feedback message to the home agent. This keeps the node controller from writing the first snoop message into the processing queue, where the request message would block it, breaking the mutually dependent blocking loop between the node controller and the HA, preventing node controller deadlock, and keeping the NUMA system from crashing because of it.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be carried out by hardware under the direction of program instructions. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The storage medium includes media that can store program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. The method for preventing node controller deadlock, the node controller, and the NUMA system provided by the embodiments of the present invention have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the descriptions of the above embodiments are only meant to help understand the method and core idea of the present invention. Meanwhile, those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the application scope. In summary, the content of this specification shall not be construed as limiting the present invention.

Claims

1. A method for preventing node controller deadlock, applied to a non-uniform memory access system, comprising:

the node controller of a local node receiving a request message sent by any node and writing the request message into a processing queue, the request message being used to request a system address;

the node controller monitoring whether a cached data block containing the system address is cached on other nodes, and if so, invalidating the cached data block containing the system address in the caches of the other nodes, so that when the node controller receives a first snoop message transmitted by a home agent of the local node it responds with a feedback message to the home agent directly, avoiding writing the first snoop message into the processing queue where the request message would block it, wherein the first snoop message is used to snoop whether the system address is cached on the other nodes, and the feedback message is used to indicate that the system address cached on the other nodes is invalid, so that the home agent transmits the system address it stores to a caching agent of the local node; and

the node controller transmitting the request message already written into the processing queue to the home agent of the first node.
2. The method according to claim 1, wherein the node controller monitoring whether a cached data block containing the system address is cached on other nodes comprises:

the node controller transmitting a second snoop message to the other nodes, the second snoop message being used to monitor whether a cached data block containing the system address is cached on the other nodes; and

the node controller receiving a response message transmitted by the other nodes, the response message being used to indicate whether a cached data block containing the system address is cached on the other nodes.
3. The method according to claim 1 or 2, wherein invalidating the cached data block containing the system address in the caches of the other nodes comprises:

the node controller transmitting an indication message SnpInvXtoI to the other nodes, the indication message SnpInvXtoI being used to instruct the other nodes to delete the cached data block containing the system address from their caches or mark it unavailable.
4. The method according to claim 3, further comprising: the node controller receiving an indication response message RspI transmitted by the other nodes, the indication response message RspI being transmitted after the other nodes, according to the indication of the indication message SnpInvXtoI, have deleted their cached data block containing the system address or marked it unavailable.
5. The method according to claim 1 or 2, wherein the first snoop message is transmitted to the node controller after the home agent receives a request message transmitted by the caching agent of the local node, the request message transmitted by the caching agent being used to request the system address.
6. The method according to claim 1 or 2, wherein the local node is a central processing unit or a symmetric multiprocessing (SMP) system.
7. A node controller, applied to a non-uniform memory access system and located in a local node of the non-uniform memory access system, the node controller comprising:

a receiving unit, configured to receive a request message sent by any node and write the request message into a processing queue, the request message being used to request a system address;

a monitoring unit, configured to monitor whether a cached data block containing the system address is cached on other nodes;

a processing unit, configured to invalidate the cached data block containing the system address cached on the other nodes when the monitoring result of the monitoring unit is yes;

wherein the receiving unit is further configured to receive a first snoop message transmitted by a home agent of the local node, the first snoop message being used to snoop whether the system address is cached on the other nodes; a transmitting unit, configured to respond with a feedback message to the home agent directly when the receiving unit receives the first snoop message, preventing the receiving unit from writing the first snoop message into a processing queue unit where the request message would block it, the feedback message being used to indicate that the system address cached on the other nodes is invalid, so that the home agent transmits the system address it stores to a caching agent of the local node;

the processing queue unit, configured to store the request message written by the receiving unit; and

the transmitting unit is further configured to transmit the request message already written into the processing queue unit to the home agent.
8. The node controller according to claim 7, wherein the monitoring unit comprises:

a first module, configured to transmit a second snoop message to the other nodes, the second snoop message being used to monitor whether a cached data block containing the system address is cached on the other nodes; and

a second module, configured to receive a response message transmitted by the other nodes, the response message being used to indicate whether a cached data block containing the system address is cached on the other nodes.
9. The node controller according to claim 7 or 8, wherein the processing unit comprises:

a third module, configured to transmit an indication message SnpInvXtoI to the other nodes when the monitoring result of the monitoring unit is yes, the indication message SnpInvXtoI being used to instruct the other nodes to delete their cached data block containing the system address or mark it unavailable.
10. The node controller according to claim 9, wherein the processing unit further comprises:

a fourth module, configured to receive an indication response message RspI transmitted by the other nodes, the indication response message RspI being transmitted after the other nodes, according to the indication of the indication message SnpInvXtoI, have deleted their cached data block containing the system address or marked it unavailable.
11. The node controller according to claim 7 or 8, wherein the first snoop message is transmitted to the node controller after the home agent receives a request message transmitted by the caching agent of the local node, the request message transmitted by the caching agent being used to request the system address.
12. The node controller according to claim 7 or 8, wherein the local node is a central processing unit or a symmetric multiprocessing (SMP) system.
13. A non-uniform memory access system, comprising a local node and other nodes besides the local node, the local node comprising a node controller, a home agent, and a caching agent, wherein: the node controller receives a request message from the other nodes and writes the request message into a processing queue, the request message being used to request a system address; the node controller monitors whether a cached data block containing the system address is cached on the other nodes, and if so, invalidates the cached data block containing the system address cached on the other nodes, so that when the node controller receives a first snoop message transmitted by the home agent it responds with a feedback message to the home agent directly, avoiding writing the first snoop message into the processing queue where the request message would block it, the first snoop message being used to snoop whether the system address is cached on the other nodes, and the feedback message being used to indicate that the system address cached on the other nodes is invalid, so that the node controller transmits the request message already written into the processing queue to the home agent.
14. The non-uniform memory access system according to claim 13, wherein the first snoop message is transmitted to the node controller after the home agent receives a request message transmitted by the caching agent, the request message transmitted by the caching agent being used to request the system address.
PCT/CN2011/081393 2011-10-27 2011-10-27 一种防止节点控制器死锁的方法及节点控制器 WO2012149812A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN2011800021394A CN102439571B (zh) 2011-10-27 2011-10-27 Method and apparatus for preventing node controller deadlock
EP11864625.6A EP2568379B1 (en) 2011-10-27 2011-10-27 Method for preventing node controller deadlock and node controller
PCT/CN2011/081393 WO2012149812A1 (zh) 2011-10-27 2011-10-27 Method for preventing node controller deadlock, and node controller
US13/708,670 US20130111150A1 (en) 2011-10-27 2012-12-07 Method for preventing deadlock of node controller, and node controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/081393 WO2012149812A1 (zh) 2011-10-27 2011-10-27 Method for preventing node controller deadlock, and node controller

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/708,670 Continuation US20130111150A1 (en) 2011-10-27 2012-12-07 Method for preventing deadlock of node controller, and node controller

Publications (1)

Publication Number Publication Date
WO2012149812A1 true WO2012149812A1 (zh) 2012-11-08

Family

ID=45986240

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/081393 WO2012149812A1 (zh) 2011-10-27 2011-10-27 Method for preventing node controller deadlock, and node controller

Country Status (4)

Country Link
US (1) US20130111150A1 (zh)
EP (1) EP2568379B1 (zh)
CN (1) CN102439571B (zh)
WO (1) WO2012149812A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140040561A1 (en) * 2012-07-31 2014-02-06 Futurewei Technologies, Inc. Handling cache write-back and cache eviction for cache coherence
CN103488606B (zh) * 2013-09-10 2016-08-17 华为技术有限公司 基于节点控制器的请求响应方法和装置
CN103870435B (zh) * 2014-03-12 2017-01-18 华为技术有限公司 服务器及数据访问方法
US10104019B2 (en) * 2014-05-27 2018-10-16 Magnet Forensics Inc. Systems and methods for locating application-specific data on a remote endpoint computer
CN104035888B (zh) * 2014-06-11 2017-08-04 华为技术有限公司 一种缓存数据的方法及存储设备
CN106484725B (zh) * 2015-08-31 2019-08-20 华为技术有限公司 一种数据处理方法、装置和系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1264872A (zh) * 1999-02-26 2000-08-30 国际商业机器公司 用于避免因冲突的失效事务而造成的活锁的方法和系统
CN101621714A (zh) * 2008-06-30 2010-01-06 华为技术有限公司 节点、数据处理系统和数据处理方法
CN102026099A (zh) * 2010-11-16 2011-04-20 西安电子科技大学 无线体域网中自适应低时延媒体接入控制方法

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6338122B1 (en) * 1998-12-15 2002-01-08 International Business Machines Corporation Non-uniform memory access (NUMA) data processing system that speculatively forwards a read request to a remote processing node
US6760809B2 (en) * 2001-06-21 2004-07-06 International Business Machines Corporation Non-uniform memory access (NUMA) data processing system having remote memory cache incorporated within system memory
US7225298B2 (en) * 2003-04-11 2007-05-29 Sun Microsystems, Inc. Multi-node computer system in which networks in different nodes implement different conveyance modes
US7856534B2 (en) * 2004-01-15 2010-12-21 Hewlett-Packard Development Company, L.P. Transaction references for requests in a multi-processor network
JP4848771B2 (ja) * 2006-01-04 2011-12-28 株式会社日立製作所 キャッシュ一貫性制御方法およびチップセットおよびマルチプロセッサシステム
US8205045B2 (en) * 2008-07-07 2012-06-19 Intel Corporation Satisfying memory ordering requirements between partial writes and non-snoop accesses
US9529636B2 (en) * 2009-03-26 2016-12-27 Microsoft Technology Licensing, Llc System and method for adjusting guest memory allocation based on memory pressure in virtual NUMA nodes of a virtual machine
US8327228B2 (en) * 2009-09-30 2012-12-04 Intel Corporation Home agent data and memory management

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1264872A (zh) * 1999-02-26 2000-08-30 国际商业机器公司 用于避免因冲突的失效事务而造成的活锁的方法和系统
CN101621714A (zh) * 2008-06-30 2010-01-06 华为技术有限公司 节点、数据处理系统和数据处理方法
CN102026099A (zh) * 2010-11-16 2011-04-20 西安电子科技大学 无线体域网中自适应低时延媒体接入控制方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2568379A4 *

Also Published As

Publication number Publication date
EP2568379A4 (en) 2014-02-19
EP2568379A1 (en) 2013-03-13
US20130111150A1 (en) 2013-05-02
EP2568379B1 (en) 2016-04-27
CN102439571A (zh) 2012-05-02
CN102439571B (zh) 2013-08-28

Similar Documents

Publication Publication Date Title
US7434006B2 (en) Non-speculative distributed conflict resolution for a cache coherency protocol
US7613882B1 (en) Fast invalidation for cache coherency in distributed shared memory system
WO2012149812A1 (zh) 一种防止节点控制器死锁的方法及节点控制器
KR100634932B1 (ko) 멀티프로세서 시스템 내에서의 캐시 일관성에서 사용하기위한 전송 상태
US8171095B2 (en) Speculative distributed conflict resolution for a cache coherency protocol
US5893160A (en) Deterministic distributed multi-cache coherence method and system
US7568073B2 (en) Mechanisms and methods of cache coherence in network-based multiprocessor systems with ring-based snoop response collection
KR100880059B1 (ko) 효율적인 이홉(two-hop) 캐시 일관성 프로토콜
WO2014146425A1 (zh) 在多级缓存一致性域系统局部域构造Share-F状态的方法
US7818509B2 (en) Combined response cancellation for load command
US7752397B2 (en) Repeated conflict acknowledgements in a cache coherency protocol
JP2006505868A (ja) マルチクラスタのロックのための方法および装置
US7506108B2 (en) Requester-generated forward for late conflicts in a cache coherency protocol
US20080005486A1 (en) Coordination of snoop responses in a multi-processor system
WO2012109906A1 (zh) 访问高速缓冲存储器的方法及非真实缓存代理
US7017012B2 (en) Distributed storage cache coherency system and method
TWI753093B (zh) 針對監聽請求的轉發回應
US8972663B2 (en) Broadcast cache coherence on partially-ordered network
KR20230070033A (ko) 캐시 라인 축출을 위한 다중 레벨 캐시 코히런시 프로토콜

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180002139.4

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2011864625

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11864625

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE