WO2015143810A1 - Node fault detection method and apparatus - Google Patents
- Publication number
- WO2015143810A1 (application PCT/CN2014/083256)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- echo
- identification information
- wwpn
- response message
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
Definitions
- the present invention relates to the field of communications, and in particular to a node fault detection method and apparatus.
- in the related art, Fibre Channel (FC) networks offer good transmission characteristics such as high bandwidth and low latency and are widely used in storage networks.
- typically, host nodes are connected to magnetic array (disk array) nodes through switching nodes.
- data is exchanged between the two nodes (the host node and the magnetic array node) that have established a connection; on the data plane, each terminal device node (host or magnetic array) or switching node is isolated and cannot learn how the entire network is connected.
- the reachability and validity of the data path are determined by the upper layer of the Fibre Channel.
- when a fault occurs, the upstream node or the switching node connected to it cannot learn that a node downstream of the fault point has failed. The upstream and downstream nodes therefore keep sending data frames, and the fault is handled only after the uppermost layer (host and disk array) detects a timeout.
- the current FC protocol provides no dedicated mechanism for checking connection validity or detecting faults. For a transmission medium with strict latency requirements such as FC, invalid frame transmission caused by a node failure degrades network traffic and seriously affects the experience of FC network users. When the network has many levels, detecting connections and disconnections between nodes by sending frames over the high-speed interfaces also consumes users' service bandwidth.
- a small number of magnetic arrays are often connected to a large number of host nodes and serve them simultaneously. If a switching node or a magnetic array node fails, fault recovery must not affect the services of other host nodes. However, the host nodes and magnetic arrays do not maintain the topology of the entire network; nodes only register their identities with the switching node through the FC-GS-6 protocol. In this situation, a fault has to be maintained, recovered, and managed manually, which is likely to affect running services.
- the current recovery method is essentially to inspect the actual physical network connections, check the alarm information of the network management tool, find the physical node that matches the alarm, and trace the physical cabling between the initiating node and the faulty node to locate the fault.
- this method cannot meet the requirements for fast fault location and resolution in a complex networking environment.
- moreover, network deployments change over time, so maintenance based on previous experience cannot meet the need for rapid network maintenance.
- in the FC protocol, an FC device accesses the Name Server through a well-known address identifier, and the Common Transport (CT) protocol defined by FC-GS-6 allows a client to obtain the address identifiers and attributes of devices in the FC switched network, for example:
- GPN_ID gets the port name
- GNN_ID gets the node name
- GCS_ID gets the class of service
- GFT_ID gets the FC-4 types
- GPT_ID gets the port type.
- the host node and the magnetic array node can query only scattered information about other nodes through the switching node; there is no direct logical relationship between them, so no unified view of the network can be presented.
- the loopback diagnostic (echo) command is specified in the Fibre Channel FC-LS-2 protocol.
- the receiver of an echo request returns the payload that follows the command code to the initiator of the echo command, in the order in which it was received. This provides a way to transmit a data frame and have its payload returned.
- the returned payload is used for simple loopback diagnostics.
- each sequence used to transmit an echo command or response consists of a single frame.
- the echo used in the FC protocol therefore supports only simple loopback diagnosis and cannot obtain the identification information of the nodes through which the echo message passes.
- a node fault detection method includes: sending an echo message to a downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are terminal device nodes in the FC network; obtaining, according to the echo response message, the identification information of each normally working node between the current node and the destination node; and determining, from the acquired identification information of each node, whether a faulty node exists.
- obtaining the identification information of each node according to the echo response message includes: receiving an echo response message from the destination node, where the information carried in the echo response message includes the World Wide Port Name (WWPN) identification information of each node through which the echo message is forwarded hop by hop between the current node and the destination node; and parsing the echo response message and extracting the WWPN identification information of all nodes from it.
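- obtaining the identification information of each node according to the echo response message includes: receiving an echo response message from the intermediate node, where the information carried in the echo response message includes the WWPN identification information of each node through which the echo message is forwarded between the current node and the intermediate node, and the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of it; and parsing the echo response message and extracting the WWPN identification information of all nodes from it.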
- determining, from the acquired identification information of each node, whether a faulty node exists includes: determining whether the WWPN identification information extracted from the echo response message covers all nodes between the current node and the destination node; if not, determining the type of the faulty node according to the extracted WWPN identification information. If the faulty node is a terminal device node, its state information is set directly to the fault state; if the faulty node is a switching node, the state information of the faulty node and of all its leaf nodes is set to the fault state.
- the method further includes: determining, according to the extracted WWPN identification information, the connection relationships between normally working nodes and the state information of all connection relationships, and generating a network topology diagram.
- sending the echo message to the downstream leaf node connected to the current node includes: sending echo messages at a first preset period; if sending fails or no echo response message is received within a preset duration, switching from the first preset period to a second preset period and sending the echo message N consecutive times, where the second preset period is shorter than the first preset period and N is a positive integer greater than 1; whether the N consecutive echo messages succeed determines whether to continue sending echo messages.
- a node failure detecting apparatus is provided.
- the node fault detection apparatus includes: a sending module, configured to send an echo message to a downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are terminal device nodes in the FC network; an obtaining module, configured to obtain, according to the echo response message, the identification information of each normally working node between the current node and the destination node; and a determining module, configured to determine, from the acquired identification information of each node, whether a faulty node exists.
- the obtaining module includes: a first receiving unit, configured to receive an echo response message from the destination node, where the information carried in the echo response message includes the World Wide Port Name (WWPN) identification information of each node through which the echo message is forwarded hop by hop between the current node and the destination node; and a first extracting unit, configured to parse the echo response message and extract the WWPN identification information of all nodes from it.
- WWPN: World Wide Port Name
- the obtaining module includes: a second receiving unit, configured to receive an echo response message from the intermediate node, where the information carried in the echo response message includes the WWPN identification information of each node through which the echo message is forwarded between the current node and the intermediate node, and the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of the intermediate node; the intermediate node is a switching node through which the echo message sent by the current node to the destination node passes in the FC network; and a second extracting unit, configured to parse the echo response message and extract the WWPN identification information of all nodes from it.
- the determining module includes: a determining unit, configured to determine whether the WWPN identification information extracted from the echo response message covers all nodes between the current node and the destination node; and a processing unit, configured to, when the determining unit outputs no, determine the type of the faulty node according to the extracted WWPN identification information. If the faulty node is a terminal device node, its state information is set directly to the fault state; if the faulty node is a switching node, the state information of the faulty node and of all its leaf nodes is set to the fault state.
- the foregoing apparatus further includes: a generating module, configured to determine, according to the extracted WWPN identification information, a connection relationship between the nodes that are normally working and state information of all the connection relationships, and generate a network topology relationship diagram.
- the echo message is sent to the downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are both terminal device nodes in the FC network; the identification information of each normally working node between the current node and the destination node is obtained according to the echo response message; and whether a faulty node exists is determined from the acquired identification information. That is, the current node actively sends an echo message to the downstream leaf node, receives the echo response message, and extracts the identification information of each node through which the echo message passed, without any node information being configured manually; from the extracted identification information, it promptly learns of data-frame timeouts at downstream nodes caused by node failures.
- FIG. 1 is a flowchart of a node fault detection method according to an embodiment of the present invention
- FIG. 2 is a flowchart of a method for acquiring FC network connection information according to a preferred embodiment of the present invention
- FIG. 3 is a flowchart of a method for incrementally acquiring FC networking connection information according to a preferred embodiment of the present invention
- FIG. 4a is a schematic diagram of a data format of an Echo payload according to a preferred embodiment of the present invention
- FIG. 4b is a schematic diagram of the data format of an Echo reply according to a preferred embodiment of the present invention
- FIG. 5 is a schematic diagram of acquiring FC network topology information according to a preferred embodiment of the present invention
- FIG. 6 is a schematic diagram of acquiring FC network topology information in an incremental manner according to a preferred embodiment of the present invention
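- FIG. 7 is a schematic diagram of rapid detection and location of connection faults according to a preferred embodiment of the present invention
- FIG. 8 is a schematic diagram of a fast connection-failure detection mode according to a preferred embodiment of the present invention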
- FIG. 9 is a block diagram of a node failure detection apparatus according to an embodiment of the present invention
- FIG. 10 is a structural block diagram of a node fault detection apparatus according to a preferred embodiment of the present invention
- FIG. 1 is a flowchart of a node fault detection method according to an embodiment of the present invention. As shown in FIG. 1, the method may include the following steps: Step S102: send an echo message to a downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are both terminal device nodes in the FC network; Step S104: obtain, according to the echo response message, the identification information of each normally working node between the current node and the destination node; Step S106: determine, from the acquired identification information of each node, whether a faulty node exists.
- the current node actively sends an echo message to the downstream leaf node, receives the echo response message, and extracts the identification information of each node through which the echo message passed, without any node information being configured manually. From the extracted identification information, it promptly learns of data-frame timeouts at downstream nodes caused by a node failure. This solves the problem in the related art that, after a connection between switching nodes or terminal device nodes in an FC network fails, the nodes sending and receiving data cannot detect the problem quickly. FC network connection failures are thus detected rapidly, so the nodes sending and receiving data can find a connection failure quickly and respond in time.
- the node roles in the preferred embodiment of the present invention may be divided into an initiator (for example, the current node) and a receiver (for example, the destination node), and the two ends of the connection may be configured as follows:
- One end is the initiator and the other end is the receiver
- obtaining the identification information of each node according to the echo response message may include the following operations: Step S1: receive an echo response message from the destination node, where the information carried in the echo response message includes the World Wide Port Name (WWPN) identification information of each node through which the echo message is forwarded between the current node and the destination node; Step S2: parse the echo response message and extract the WWPN identification information of all nodes from it.
- the initiating node (corresponding to the current node) sends an echo message toward the other nodes (including the intermediate nodes and the destination node); each receiving node (corresponding to an intermediate node) adds its own WWPN identification information to the packet and forwards it to the next-hop node.
- after receiving the echo packet, the destination node adds its own WWPN identification information to that of the nodes the packet has already passed through and returns the complete list to the initiating node.
- when the initiating node receives the echo response message, it therefore learns the nodes on the path without any node information being configured manually.
- obtaining the identification information of each node according to the echo response message may include the following steps: Step S3: receive an echo response message from the intermediate node, where the information carried in the echo response message includes the WWPN identification information of each node through which the echo message is forwarded between the current node and the intermediate node, and the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of it; the intermediate node is a switching node through which the echo message sent by the current node to the destination node passes in the FC network; Step S4: parse the echo response message and extract the WWPN identification information of all nodes from it.
- when the initiating node initiates an echo message, a non-destination node (corresponding to an intermediate node) that receives it directly collects the WWPN identification information of all its downstream nodes, carries it in the echo reply message, and returns it to the initiating node. Each node is thus responsible only for reporting information about its own downstream nodes, which, when the network is large, improves the efficiency of topology discovery during network initialization.
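The downstream-collection behaviour of a non-destination node can be sketched as follows. This is a minimal Python illustration, assuming each switching node keeps a list of its downstream neighbours and knows which of them are working; the `Node` class, `working_downstream_leaves`, `intermediate_reply` and the reply dictionary layout are hypothetical, and labels such as `"D1"` stand in for real 64-bit WWPNs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    wwpn: str                      # label standing in for the node's WWPN
    is_switch: bool                # True = switching node, False = terminal device
    up: bool = True                # whether the node is currently working
    downstream: list[Node] = field(default_factory=list)


def working_downstream_leaves(node: Node) -> list[str]:
    """Collect WWPNs of normally working terminal (leaf) nodes below `node`."""
    leaves: list[str] = []
    for child in node.downstream:
        if not child.up:
            continue               # a failed node hides everything behind it
        if child.is_switch:
            leaves.extend(working_downstream_leaves(child))
        else:
            leaves.append(child.wwpn)
    return leaves


def intermediate_reply(node: Node, path_wwpns: list[str]) -> dict:
    """Hypothetical echo reply built by a non-destination switching node."""
    return {
        "path": path_wwpns + [node.wwpn],                  # nodes traversed so far
        "downstream_leaves": working_downstream_leaves(node),
    }


s2 = Node("S2", True, up=False, downstream=[Node("D2", False)])
s1 = Node("S1", True, downstream=[Node("D1", False), s2, Node("D3", False, up=False)])
print(intermediate_reply(s1, ["H1"]))
# {'path': ['H1', 'S1'], 'downstream_leaves': ['D1']}
```

Because each switching node reports its whole working subtree in one reply, the initiator does not need a separate probe per leaf; this is one way to read the efficiency gain described above for large networks.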
- determining, from the acquired identification information of each node, whether a faulty node exists may include the following operations: Step S5: determine whether the WWPN identification information extracted from the echo response message covers all nodes between the current node and the destination node; Step S6: if not, determine the type of the faulty node according to the extracted WWPN identification information. When the faulty node is a terminal device node, its state information is set directly to the fault state; when the faulty node is a switching node, the state information of the faulty node and of all its leaf nodes is set to the fault state.
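The state update in Step S6 can be sketched as a recursive walk over the stored topology. This minimal Python sketch assumes the topology is held as a tree of `TopoNode` objects and that states live in a plain dictionary keyed by WWPN; both representations, the `mark_fault` helper, and the node labels are hypothetical. In line with the topology display behaviour described in this document, it marks every node downstream of a failed switching node, which includes all of its leaf nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TopoNode:
    wwpn: str
    is_switch: bool
    children: list[TopoNode] = field(default_factory=list)   # downstream nodes


def mark_fault(node: TopoNode, state: dict[str, str]) -> None:
    """Set `node` to the fault state; for a switching node, also mark everything below it."""
    state[node.wwpn] = "fault"
    if node.is_switch:
        for child in node.children:
            mark_fault(child, state)


h3, h4 = TopoNode("H3", False), TopoNode("H4", False)
s2 = TopoNode("S2", True, [h3, h4])
h5 = TopoNode("H5", False)
state = {w: "ok" for w in ("S2", "H3", "H4", "H5")}

mark_fault(s2, state)   # switching node: S2, H3 and H4 become "fault"
print(state)            # {'S2': 'fault', 'H3': 'fault', 'H4': 'fault', 'H5': 'ok'}
```

A terminal device node has no children in this model, so `mark_fault` on it changes only that node's own entry.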
- in the network topology, a faulty link is displayed in the fault state, and if the faulty node is a switching node, the nodes downstream of it are also displayed in the fault state. This helps the network administrator repair the node fault, and the restored state is displayed once the fault is repaired, which can greatly improve the manageability of Fibre Channel networks.
- Step S7: determine, according to the extracted WWPN identification information, the connection relationships between normally working nodes and the state information of all connection relationships, and generate a network topology diagram.
- each intermediate node receives, in the initiating node's echo message, the WWPN identifier of every node the message passed through on the way to that node, and updates its own network topology information.
- the network topology diagram may be generated on the network management system, whose display interface shows the connection relationships between the nodes and the status of every connection, allowing the user to perceive the network connections and their states intuitively.
- each node may further compare the stored topology information with the information carried in the newly returned response message, and synchronously update the connection state of the networking.
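Building the topology from returned WWPN lists and re-synchronising it against newer responses can be sketched as follows. This minimal Python illustration assumes that each successful response yields an ordered WWPN list from the initiator to the responder, that a link is an (upstream, downstream) pair, and that a previously known link no new response confirms is treated as faulty. `build_links` and `update_link_states` are hypothetical helpers, and the comparison only makes sense if the new probes cover the same destinations as the stored topology.

```python
from __future__ import annotations


def build_links(paths: list[list[str]]) -> set[tuple[str, str]]:
    """Turn ordered WWPN lists into a set of (upstream, downstream) links."""
    links: set[tuple[str, str]] = set()
    for path in paths:
        links.update(zip(path, path[1:]))
    return links


def update_link_states(stored: set[tuple[str, str]],
                       new_paths: list[list[str]]) -> dict[tuple[str, str], str]:
    """Compare stored links with those confirmed by the latest echo responses."""
    current = build_links(new_paths)
    states = {link: ("up" if link in current else "fault") for link in stored}
    for link in current - stored:
        states[link] = "up"        # newly discovered link
    return states


stored = build_links([["H1", "S1", "S2", "D1"]])
states = update_link_states(stored, [["H1", "S1"]])   # only S1 answered this round
```

Here `states` maps `("H1", "S1")` to `"up"` and the two links behind S1 to `"fault"`, which a network management view could render directly.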
- based on whether responses return, the sending node can quickly determine the connection status downstream of the switching node, prompt the user to repair the faulty path, cancel data frames waiting to be sent to the faulty node, and report the connection fault to the application.
- sending an echo message to the downstream leaf node connected to the current node may include the following operations: Step S8: send echo messages at a first preset period; Step S9: if sending fails or no echo response message is received within a preset duration, switch from the first preset period to a second preset period and send the echo message N consecutive times, where the second preset period is shorter than the first preset period and N is a positive integer greater than 1; whether the N consecutive echo messages succeed determines whether to continue sending echo messages.
- Step S202: the initiating node carries its own WWPN identifier in the echo message.
- Step S204: another node receives the echo message sent by the initiating node, feeds its own WWPN identifier back to the initiating node, and forwards the packet.
- Step S206: determine whether the receiving node is the destination node; if not, go to step S208; if so, go to step S210. Step S208: the node adds its own WWPN identifier to the echo and forwards it to the next hop, then go to step S206.
- Step S210: the destination node adds its own WWPN identifier and returns the response.
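The S202 to S210 flow (FIG. 2) can be simulated in a few lines. This minimal Python sketch assumes a single known forwarding path given as a list of WWPN labels, with `path[0]` as the initiating node and `path[-1]` as the destination; `full_path_echo` and the `("hop", ...)` / `("final", ...)` response tuples are hypothetical, and a node listed in `failed` simply drops the echo.

```python
from __future__ import annotations


def full_path_echo(path: list[str], failed: frozenset[str] = frozenset()) -> list[tuple]:
    """Return the responses the initiating node would receive, in arrival order."""
    carried = [path[0]]                    # S202: initiator carries its own WWPN
    responses: list[tuple] = []
    for node in path[1:]:
        if node in failed:
            break                          # echo is lost; no further responses
        carried.append(node)               # S204/S208: node adds its WWPN
        if node == path[-1]:
            responses.append(("final", list(carried)))   # S210: full list returned
        else:
            responses.append(("hop", node))               # S204: per-hop feedback
    return responses


print(full_path_echo(["H1", "S1", "S2", "D1"]))
# [('hop', 'S1'), ('hop', 'S2'), ('final', ['H1', 'S1', 'S2', 'D1'])]
print(full_path_echo(["H1", "S1", "S2", "D1"], failed=frozenset({"S2"})))
# [('hop', 'S1')]
```

When a node on the path has failed, the initiator receives hop responses only up to the node before it and never gets a final list; that gap is what the completeness check in Step S5 detects.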
- FIG. 3 is a flowchart of a method for incrementally acquiring FC networking connection information according to a preferred embodiment of the present invention.
- Step S302: start setting up the FC network.
- Step S304: the initiating node sends an echo packet with a hop count of 1, carrying its own WWPN identifier.
- Step S306: the node that receives the echo packet determines from the hop count whether it should process the packet; if not, go to step S308; if so, go to step S310. Step S308: the hop count does not match this node, so it forwards the echo message to the next hop; go to step S306. Step S310: the hop count matches this node, so the node returns its own identifier and its next-hop identifier in a response to the initiating node.
- Step S312: after receiving the echo response, the initiating node increments the hop count to obtain the identifier of the next hop, and the process goes to step S306.
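The text does not say how a node decides that the hop count matches it. The sketch below assumes a TTL-style rule: each node that forwards the echo decrements the remaining count, and the node that receives a count of 1 answers. `deliver`, `incremental_discovery` and the `failed` set are a hypothetical Python simulation of FIG. 3, where each loop iteration stands for one separate echo exchange. As in the FIG. 6 walk-through, each responder returns only its own WWPN.

```python
from __future__ import annotations


def deliver(path: list[str], hops: int, failed: frozenset[str]) -> str | None:
    """Carry one echo with hop count `hops` along path[1:]; return the responder."""
    remaining = hops
    for node in path[1:]:
        if node in failed:
            return None                # echo lost: the initiator times out
        if remaining == 1:
            return node                # S310: hop count matches this node
        remaining -= 1                 # S308: not for this node, forward it
    return None


def incremental_discovery(path: list[str],
                          failed: frozenset[str] = frozenset()) -> list[str]:
    """Learn node WWPNs one hop at a time, as in FIG. 3 (simulated)."""
    learned: list[str] = []
    hops = 1                           # S304: first echo carries hop count 1
    while True:
        node = deliver(path, hops, failed)
        if node is None:
            return learned             # no response: fault at or before this hop
        learned.append(node)
        if node == path[-1]:
            return learned             # destination reached
        hops += 1                      # S312: increment and probe the next hop


print(incremental_discovery(["H1", "S1", "S2", "D1"]))                         # ['S1', 'S2', 'D1']
print(incremental_discovery(["H1", "S1", "S2", "D1"], frozenset({"S2"})))      # ['S1']
```

Compared with the full-path flow, this trades more round trips for smaller packets, and the length of `learned` shows directly how far the path is intact.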
- FIG. 4a is a schematic diagram of the data format of an Echo payload according to a preferred embodiment of the present invention.
- FIG. 4b is a schematic diagram of the data format of an Echo reply according to a preferred embodiment of the present invention. As shown in FIG. 4a and FIG. 4b, two identifier fields are added to the data fields of both the Echo payload and the Echo reply: WWPN Num indicates the number of WWPNs the echo request has carried, and Type indicates whether the node is a terminal device node or a switching node.
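FIG. 4a and FIG. 4b are not reproduced here, so the field widths below are assumptions, not the patent's layout: a 1-byte `WWPN Num` followed by one entry per node, each a 1-byte `Type` and an 8-byte WWPN (WWPNs are 64-bit names). The Type codes `TERMINAL = 0` and `SWITCH = 1`, the per-entry placement of Type, the example WWPN values, and the helper names are hypothetical. The sketch uses Python's `struct` module to show how a forwarding node could append its own entry and bump the count.

```python
from __future__ import annotations

import struct

TERMINAL, SWITCH = 0, 1          # hypothetical Type codes
_COUNT = struct.Struct(">B")     # assumed 1-byte WWPN Num field (max 255 entries)
_ENTRY = struct.Struct(">BQ")    # assumed 1-byte Type + 8-byte (64-bit) WWPN


def encode(entries: list[tuple[int, int]]) -> bytes:
    """Pack (type_code, wwpn) pairs behind a WWPN Num count."""
    out = bytearray(_COUNT.pack(len(entries)))
    for type_code, wwpn in entries:
        out += _ENTRY.pack(type_code, wwpn)
    return bytes(out)


def decode(payload: bytes) -> list[tuple[int, int]]:
    """Unpack the WWPN Num count and the (type_code, wwpn) entries that follow."""
    (count,) = _COUNT.unpack_from(payload, 0)
    offset = _COUNT.size
    entries = []
    for _ in range(count):
        entries.append(_ENTRY.unpack_from(payload, offset))
        offset += _ENTRY.size
    return entries


def append_self(payload: bytes, type_code: int, wwpn: int) -> bytes:
    """What a forwarding node could do: add its own entry and bump WWPN Num."""
    return encode(decode(payload) + [(type_code, wwpn)])


request = encode([(TERMINAL, 0x10000000C9A1B2C3)])        # initiator's own entry
forwarded = append_self(request, SWITCH, 0x200000110A0B0C0D)
for type_code, wwpn in decode(forwarded):
    print("switch" if type_code == SWITCH else "terminal", f"{wwpn:016x}")
# terminal 10000000c9a1b2c3
# switch 200000110a0b0c0d
```

Decoding and re-encoding keeps the sketch short; a real implementation would more likely patch the count byte in place and append the new entry.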
- the initiating host node H1 carries its own WWPN identifier in the echo message and sends the echo message toward the destination node, the magnetic array node D1.
- the switching node S1 receives the echo message, appends the WWPN identifier of S1 to it, determines that it is not the destination node, and forwards the packet to the next hop; at the same time, S1 also returns its own WWPN identifier to the host node H1 in an echo response.
- the host node H2 receives the echo message, appends the WWPN identifier of H2 to it, determines that it is not the destination node, and forwards the packet to the next hop.
- H2 also returns its own WWPN identifier to the host node H1 in an echo response.
- the switching node S3 receives the echo packet, appends the WWPN identifier of S3 to it, determines that it is not the destination node, and forwards the packet to the next hop.
- the switching node S3 also returns its own WWPN identifier to the host node H1 in an echo response.
- the switching node S2 receives the echo message, appends the WWPN identifier of S2 to it, determines that it is not the destination node, and forwards the packet to the next hop; at the same time, S2 also returns its own WWPN identifier to the host node H1 in an echo response.
- the switching node S4 receives the echo message, appends the WWPN identifier of S4 to it, determines that it is not the destination node, and forwards the packet to the next hop; at the same time, S4 also returns its own WWPN identifier to the host node H1 in an echo response.
- the switching node S5 receives the echo packet, appends the WWPN identifier of S5 to it, determines that it is not the destination node, and forwards the packet to the next hop; at the same time, S5 also returns its own WWPN identifier to the host node H1 in an echo response.
- the destination node D1 receives the echo packet, determines that it is the destination node, attaches the WWPN identifier of D1 to the echo packet, and returns it in response.
- the initiating node H1 receives the response packets of the intermediate nodes, parses them, and extracts the WWPN identification information they contain. This tells it which nodes on the path from the initiating node to the destination node have returned their WWPN identifiers, from which the connectivity of the intermediate nodes can be judged.
- the initiating node H1 receives the response packet of the destination node D1 and, from the WWPN identifier list returned by D1, constructs a logical connection topology from the initiating node to the destination node to display the networking topology.
- FIG. 6 is a schematic diagram of acquiring FC networking topology information incrementally according to a preferred embodiment of the present invention. As shown in FIG. 6, the process is as follows.
- the initiating host node H1 carries its own WWPN identifier in the echo message and sets the hop count carried in the echo packet to 1.
- the switching node S1 receives the echo packet with a hop count of 1, attaches the WWPN identifier of S1 to the echo response, and returns it to the host node H1.
- after recording the WWPN identification information of the switching node S1, the host node H1 sends an echo packet with a hop count of 2.
- the switching node S1 determines from the hop count that the packet is not for it to process and forwards it.
- the host node H2 receives the echo packet with a hop count of 2, determines from the hop count that it should process the packet, attaches the WWPN identifier of H2 to the echo response, and returns it to the host node H1.
- the host node H1 sends an echo packet with a hop count of 3.
- the switching node S2 receives the echo packet with a hop count of 3, determines from the hop count that it should process the packet, attaches the WWPN identifier of S2 to the echo response, and returns it to the host node H1.
- after recording the WWPN identification information of the switching node S2, the host node H1 sends an echo packet with a hop count of 4.
- the switching node S4 receives the echo packet with a hop count of 4, determines from the hop count that it should process the packet, attaches the WWPN identifier of S4 to the echo response, and returns it to the host node H1.
- after recording the WWPN identification information of the switching node S4, the host node H1 sends an echo packet with a hop count of 5.
- the switching node S5 receives the echo packet with a hop count of 5, determines from the hop count that it should process the packet, attaches the WWPN identifier of S5 to the echo response, and returns it to the host node H1. After recording the WWPN identification information of the switching node S5, the host node H1 sends an echo packet with a hop count of 6.
- the destination node D1 receives the echo packet with a hop count of 6, determines that it is the destination node, attaches the WWPN identifier of D1 to the echo response, and returns it. After receiving the response packet, the initiating node H1 parses the WWPN identifier list and constructs a logical connection topology from the initiating node to the destination node to display the networking topology.
- FIG. 7 is a schematic diagram of rapid detection and location of connection faults according to a preferred embodiment of the present invention.
- FIG. 7 shows the FC networking topology as seen from the perspective of the magnetic array node D1.
- if the switching node S2 fails, the related-art approach usually has to wait until the data frames sent by D1 to H3 and H4 time out or are discarded before D1 can detect the failure of H3 and H4. Even then, it is unclear where among the nodes between D1 and H3 the fault actually lies, and the administrator can only troubleshoot manually during repair.
- when D1 sends an echo message to H3, the intermediate nodes S4, S3, and S2 each return their WWPN identifiers to D1.
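How these responses let D1 pinpoint the failure can be sketched as follows: walking the expected path in order, the first node that did not answer is the suspected fault point, and the link into it is the suspected faulty link. This minimal Python sketch assumes the expected path D1 → S4 → S3 → S2 → H3 is already known from the stored topology (FIG. 7 is not reproduced here, so this ordering is inferred from the text) and that the set of responders has been collected from the echo responses; `locate_fault` is a hypothetical helper.

```python
def locate_fault(expected_path, responders):
    """Return (last_good, suspect) for the first silent node, or None if all answered.

    expected_path: WWPNs from the initiator to the destination, from stored topology.
    responders: set of WWPNs that returned an echo response.
    """
    last_good = expected_path[0]          # the initiator itself
    for wwpn in expected_path[1:]:
        if wwpn not in responders:
            return last_good, wwpn        # fault at `wwpn` or on the link into it
        last_good = wwpn
    return None


path = ["D1", "S4", "S3", "S2", "H3"]
print(locate_fault(path, {"S4", "S3"}))               # ('S3', 'S2')
print(locate_fault(path, {"S4", "S3", "S2", "H3"}))   # None
```

With S2 down, only S4 and S3 answer, so the sketch points at S2 and the S3 to S2 link instead of leaving the administrator to trace the whole path by hand.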
- FIG. 8 is a schematic diagram of a quick connection detection method according to a preferred embodiment of the present invention.
- connectivity is checked by sending echo messages.
- the detection mode can be divided into slow detection and fast detection.
- slow detection sends an echo message every 2 s (corresponding to the first preset period). If a probe fails, the node enters fast detection mode; it also enters fast detection mode if I/O on the connection fails or times out.
- fast detection sends an echo message every 100 ms (corresponding to the second preset period), three in a row (in this preferred embodiment, N is 3). Only if all three succeed does the node switch back to slow detection mode; if all three fail, the connection is considered invalid; otherwise detection continues. This allows node failures to be discovered quickly and the network-wide topology to be maintained effectively.
- The node fault detection apparatus may include a sending module 10, configured to send an echo packet to a downstream leaf node connected to the current node, where the echo packet is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are both terminal device nodes in the FC network; and an obtaining module 20, configured to obtain, according to the echo response message, the identification information of each normally working node between the current node and the destination node.
- A determining module 30 is configured to determine, from the obtained identification information of each node, whether a faulty node exists.
- The obtaining module 20 may include a first receiving unit 200, configured to receive an echo response message from the destination node, where the information carried in the echo response message includes the World Wide Port Name (WWPN) identification information of each node through which the echo packet is forwarded, hop by hop, between the current node and the destination node.
- WWPN: World Wide Port Name.
- A first extracting unit 202 is configured to parse the echo response packet and extract the WWPN identification information of all nodes from it.
- Preferably, as shown in FIG. 10, the obtaining module 20 may include a second receiving unit 204, configured to receive an echo response message from an intermediate node, where the information carried in the echo response message includes the WWPN identification information of each node through which the echo packet is forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of it. The intermediate node is a switch node through which the echo packet sent from the current node to the destination node in the FC network passes.
- The determining module 30 may include a judging unit 300, configured to judge whether the WWPN identification information extracted from the echo response message is the identification information of all nodes between the current node and the destination node.
- A processing unit 302 is configured to determine, when the output of the judging unit is negative, the type of the failed node according to the extracted WWPN identification information.
- The apparatus may further include a generating module 40, configured to determine, according to the extracted WWPN identification information, the connection relationships between normally working nodes and the state information of all connection relationships, and to generate a network topology diagram.
- The above embodiments achieve the following technical effects (note that these effects are achievable by some preferred embodiments), which serve to improve the reliability and manageability of Fibre Channel network systems.
- The technical solution provided by the embodiments of the present invention quickly reports the fault state of a connection through echo messages and can therefore quickly locate faults in the FC network.
- Building a network-wide topology on the node side effectively solves the problem that the FC-GS-6 protocol can obtain only node information but not the network-wide topology. The topology state at the time of a fault can be located quickly, a fault can be repaired without affecting the normal operation of other nodes, and the difficulty and workload of managing host and disk array networks are greatly reduced.
- When the initiating node of an echo packet receives the response, it obtains the topology information of the FC network.
- When the FC network is created, each node can automatically obtain the full-network topology without manual configuration of each node's topology information.
- The solution also addresses the problem that topology changes in the FC network during a connection failure may cause service anomalies on other nodes; this makes FC networking simpler, requires no manual intervention when nodes are dynamically added or removed, and improves system reliability and manageability.
- The modules or steps of the present invention can be implemented by a general-purpose computing device, either concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device and stored in a storage device for execution by that device; in some cases the steps shown or described may be performed in an order different from the one given here, or they may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
- Thus, the invention is not limited to any specific combination of hardware and software.
- The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and changes.
- As described above, the node fault detection method and apparatus provided by the embodiments of the present invention have the following beneficial effects:
- By quickly reporting the fault state of a connection through echo messages, the technical solution provided by the embodiments of the present invention can quickly locate faults in the FC network.
- Building a network-wide topology on the node side solves the problem that the FC-GS-6 protocol can obtain only node information but not the network-wide topology, and allows the topology state at the time of a fault to be located quickly.
- Repairing a fault in the FC network does not affect the normal operation of other nodes; the difficulty and workload of managing host and disk array networks are greatly reduced, and the FC network can be better used, managed, and maintained.
- When the initiating node of an echo packet receives the response, it obtains the topology information of the FC network.
- Each node can automatically obtain the full-network topology without manual configuration of each node's topology information. This also solves the problem that topology changes in the FC network during a connection failure may cause service anomalies on other nodes, makes FC networking simpler, requires no manual intervention when nodes are dynamically added or removed, and improves system reliability and manageability.
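The hop-count walk summarized in the bullets above (each reply lets the initiator raise the hop count by one until the destination answers) can be pictured with a small simulation. This is a minimal Python sketch under stated assumptions, not the patented implementation: the path is modeled as an ordered list, node names such as `S1` stand in for real WWPNs, a dead node simply swallows the frame (seen by the initiator as a timeout), and `Node`, `send_echo`, `discover`, and `max_hops` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    wwpn: str                     # hypothetical stand-in for the node's WWPN
    is_destination: bool = False
    alive: bool = True


def send_echo(path: List[Node], hop: int) -> Optional[Tuple[str, bool]]:
    """Deliver one echo carrying `hop` along `path`.

    Nodes before position `hop` only forward the frame; the node at that
    position answers with its WWPN. A dead node on the way drops the frame,
    which the initiator sees as a timeout (None).
    """
    for position, node in enumerate(path, start=1):
        if not node.alive:
            return None
        if position == hop:
            return node.wwpn, node.is_destination
    return None  # hop count points past the end of the path


def discover(path: List[Node], max_hops: int = 16) -> Tuple[List[str], bool]:
    """Initiator loop: raise the hop count by one after every reply."""
    learned: List[str] = []
    for hop in range(1, max_hops + 1):
        reply = send_echo(path, hop)
        if reply is None:
            return learned, False  # no reply: the node at this hop (or the link to it) is unreachable
        wwpn, reached_destination = reply
        learned.append(wwpn)
        if reached_destination:
            return learned, True
    return learned, False


if __name__ == "__main__":
    fig6_path = [Node("S1"), Node("H2"), Node("S2"), Node("S4"), Node("S5"),
                 Node("D1", is_destination=True)]
    print(discover(fig6_path))   # (['S1', 'H2', 'S2', 'S4', 'S5', 'D1'], True)
    fig6_path[3].alive = False   # take S4 down
    print(discover(fig6_path))   # (['S1', 'H2', 'S2'], False)
```

With the FIG. 6 path S1, H2, S2, S4, S5, D1, `discover` learns all six identifiers in order; once S4 is marked down, the walk stops after S2, which tells the initiator that the fourth hop or the link leading to it is at fault.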
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
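The present invention discloses a node fault detection method and apparatus. In the method, an echo message is sent to a downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and a destination node is abnormal, and the current node and the destination node are both terminal device nodes in an FC network; identification information of each normally working node between the current node and the destination node is obtained according to an echo response message; and whether a faulty node exists is determined from the obtained identification information of each node. The technical solution provided by the present invention enables fast detection of FC network connection faults, so that the nodes sending and receiving data can quickly discover connection faults and handle them in time.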
Description
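Node Fault Detection Method and Apparatus
Technical Field
The present invention relates to the field of communications, and in particular to a node fault detection method and apparatus.
Background
At present, Fibre Channel (FC) networks in the related art have good transmission characteristics such as high bandwidth and low latency, which has led to their wide use in storage networks.
In a networking model where host nodes connect to disk array nodes through switch nodes, data is exchanged between two nodes that have established a connection (a host node and a disk array node). The data plane always remains isolated from the terminal device nodes (hosts and disk arrays) and the switch nodes, which cannot learn the connection status of the whole network. The reachability and validity of a data path are determined by the upper layers of Fibre Channel: when a switch node fails, the upstream nodes or switch nodes connected to it cannot know that the nodes downstream of the fault point have failed, and they continue to send data frames until the top layer (hosts and disk arrays) detects a timeout and only then handles the fault.
The current FC protocols provide no dedicated mechanism for checking connection validity or detecting faults. For a transmission medium with strict latency requirements such as FC, invalid frame transmission after a node failure affects network traffic and seriously degrades the user experience of the FC network. Moreover, when the network hierarchy is deep, having all nodes check connectivity with one another by exchanging frames over high-speed interfaces also consumes the user's service bandwidth.
In a typical host and disk array network, a small number of disk arrays connect to a large number of host nodes simultaneously and serve them at the same time. If a switch node or disk array node fails, fault recovery must not affect the services of other host nodes. Because host nodes and disk arrays do not maintain the network-wide topology, and all nodes merely register their identities with the switch nodes through the FC-GS-6 protocol, manual maintenance, recovery, and management when a fault occurs are very likely to affect running services.
Current recovery methods basically consist of inspecting the physical connections of the actual network, checking the alarm information in network management tools, finding the physical node that matches the alarm, and tracing the physical cables between the originating physical node and the faulty physical node before the fault can be located. This cannot meet the requirement of locating and resolving faults quickly in complex networks. In addition, as disk array networks are upgraded and reconfigured and device nodes are added, the network deployment changes, and past maintenance experience can no longer meet the need for fast network maintenance.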
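For an FC device, the current FC protocol accesses the Name Server through a well-known address identifier and uses the Common Transport protocol defined in FC-GS-6 to allow clients to obtain the address identifiers and attributes of devices attached to the FC switched fabric: GPN_ID is used to obtain the port name, GNN_ID the node name, GCS_ID the class of service, GFT_ID the FC-4 types, GPT_ID the port type, and so on. Host nodes and disk array nodes can only query scattered information about other nodes through the switch nodes; this information has no direct logical relationship and cannot provide a unified view of the network.
The Fibre Channel FC-LS-2 protocol defines the loopback diagnostic (echo) command. The recipient of an echo request returns the payload following the command code, in the order received, to the originator of the echo command in a reply sequence. This provides a way to transmit a data frame and to perform a simple loopback diagnostic by returning the payload contents. The sequence may contain only one frame, which carries the echo command and its reply. However, the echo used in current FC protocols can only perform simple loopback diagnostics and cannot obtain the identifiers of the nodes the echo message passes through.
Summary
The embodiments of the present invention provide a node fault detection method and apparatus, so as to at least solve the problem in the related art that, after a connection between switch nodes or terminal device nodes in an FC network fails, the nodes sending and receiving data cannot detect the failure quickly.
According to one aspect of the embodiments of the present invention, a node fault detection method is provided.
The node fault detection method according to the embodiments of the present invention includes: sending an echo message to a downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are both terminal device nodes in the FC network; obtaining, according to an echo response message, the identification information of each normally working node between the current node and the destination node; and determining, from the obtained identification information of each node, whether a faulty node exists.
Preferably, obtaining the identification information of each node according to the echo response message includes: receiving the echo response message from the destination node, where the information carried in the echo response message includes the World Wide Port Name (WWPN) identification information of each node through which the echo message is forwarded, hop by hop, between the current node and the destination node; and parsing the echo response message and extracting the WWPN identification information of all nodes from it.
Preferably, obtaining the identification information of each node according to the echo response message includes: receiving the echo response message from an intermediate node, where the information carried in the echo response message includes the WWPN identification information of each node through which the echo message is forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of that intermediate node, the intermediate node being a switch node in the FC network through which the echo message sent by the current node to the destination node passes; and parsing the echo response message and extracting the WWPN identification information of all nodes from it.
Preferably, determining from the obtained identification information of each node whether a faulty node exists includes: judging whether the WWPN identification information extracted from the echo response message is the identification information of all nodes between the current node and the destination node; if not, determining the type of the faulty node according to the extracted WWPN identification information: when the faulty node is a terminal device node, directly setting the state information of the faulty node to the fault state; when the faulty node is a switch node, setting the state information of the faulty node and of all leaf nodes below it to the fault state.
Preferably, after the WWPN identification information of all nodes is extracted from the echo response message, the method further includes: determining, according to the extracted WWPN identification information, the connection relationships between normally working nodes and the state information of all connection relationships, and generating a network topology diagram.
Preferably, sending the echo message to the downstream leaf node connected to the current node includes: sending the echo message according to a first preset period; if sending fails or no echo response message is received within a preset duration, changing the first preset period to a second preset period and sending the echo message N consecutive times, where the second preset period is shorter than the first preset period, N is a positive integer greater than 1, and whether the N consecutive echo messages succeed determines whether to continue sending echo messages.
According to another aspect of the embodiments of the present invention, a node fault detection apparatus is provided.
The node fault detection apparatus according to the embodiments of the present invention includes: a sending module, configured to send an echo message to a downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are both terminal device nodes in the FC network; an obtaining module, configured to obtain, according to an echo response message, the identification information of each normally working node between the current node and the destination node; and a determining module, configured to determine, from the obtained identification information of each node, whether a faulty node exists.
Preferably, the obtaining module includes: a first receiving unit, configured to receive the echo response message from the destination node, where the information carried in the echo response message includes the World Wide Port Name (WWPN) identification information of each node through which the echo message is forwarded hop by hop between the current node and the destination node; and a first extracting unit, configured to parse the echo response message and extract the WWPN identification information of all nodes from it.
Preferably, the obtaining module includes: a second receiving unit, configured to receive the echo response message from an intermediate node, where the information carried in the echo response message includes the WWPN identification information of each node through which the echo message is forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of that intermediate node, the intermediate node being a switch node in the FC network through which the echo message sent by the current node to the destination node passes; and a second extracting unit, configured to parse the echo response message and extract the WWPN identification information of all nodes from it.
Preferably, the determining module includes: a judging unit, configured to judge whether the WWPN identification information extracted from the echo response message is the identification information of all nodes between the current node and the destination node; and a processing unit, configured to, when the output of the judging unit is negative, determine the type of the faulty node according to the extracted WWPN identification information: when the faulty node is a terminal device node, directly set the state information of the faulty node to the fault state; when the faulty node is a switch node, set the state information of the faulty node and of all leaf nodes below it to the fault state.
Preferably, the apparatus further includes: a generating module, configured to determine, according to the extracted WWPN identification information, the connection relationships between normally working nodes and the state information of all connection relationships, and to generate a network topology diagram.
With the embodiments of the present invention, an echo message is sent to a downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are both terminal device nodes in the FC network; the identification information of each normally working node between the current node and the destination node is obtained according to an echo response message; and whether a faulty node exists is determined from the obtained identification information. That is, the current node actively sends echo messages to downstream leaf nodes, receives echo response messages, and extracts from them the identification information of each node the echo message passed through, without any manual configuration of node information; from the extracted identification information it promptly learns of data-frame timeouts at downstream nodes caused by a node failure. This solves the problem in the related art that, after a connection between switch nodes or terminal device nodes in an FC network fails, the nodes sending and receiving data cannot detect the failure quickly, and thereby enables fast detection of FC network connection faults, so that the nodes sending and receiving data can quickly discover connection faults and handle them in time.
Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and form part of this application; the exemplary embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a node fault detection method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for obtaining FC network connection information according to a preferred embodiment of the present invention;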
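FIG. 3 is a flowchart of a method for incrementally obtaining FC network connection information according to a preferred embodiment of the present invention;
FIG. 4a is a schematic diagram of the data format of the Echo payload according to a preferred embodiment of the present invention;
FIG. 4b is a schematic diagram of the data format of the Echo reply according to a preferred embodiment of the present invention;
FIG. 5 is a schematic diagram of obtaining FC network topology information according to a preferred embodiment of the present invention;
FIG. 6 is a schematic diagram of incrementally obtaining FC network topology information according to a preferred embodiment of the present invention;
FIG. 7 is a schematic diagram of fast detection and localization of connection faults according to a preferred embodiment of the present invention;
FIG. 8 is a schematic diagram of the fast connection-fault detection mode according to a preferred embodiment of the present invention;
FIG. 9 is a structural block diagram of a node fault detection apparatus according to an embodiment of the present invention;
FIG. 10 is a structural block diagram of a node fault detection apparatus according to a preferred embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with each other.
FIG. 1 is a flowchart of a node fault detection method according to an embodiment of the present invention. As shown in FIG. 1, the method may include the following steps:
Step S102: send an echo message to a downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are both terminal device nodes in the FC network;
Step S104: obtain, according to an echo response message, the identification information of each normally working node between the current node and the destination node;
Step S106: determine, from the obtained identification information of each node, whether a faulty node exists.
In the related art, after a connection between switch nodes or terminal device nodes in an FC network fails, the nodes sending and receiving data cannot detect the failure quickly. With the method shown in FIG. 1, the current node actively sends echo messages to downstream leaf nodes, receives echo response messages, and extracts from them the identification information of each node the echo message passed through, without any manual configuration of node information; from the extracted identification information it promptly learns of data-frame timeouts at downstream nodes caused by a node failure. This solves the problem in the related art that, after a connection between switch nodes or terminal device nodes in an FC network fails, the nodes sending and receiving data cannot detect the failure quickly, and thereby enables fast detection of FC network connection faults, so that the nodes sending and receiving data can quickly discover connection faults and handle them in time.
It should be noted that in the preferred embodiments the node roles can be divided into initiator (for example, the current node above) and receiver (for example, the destination node above), and the two ends of a connection can be configured as follows: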
(1) one end is the initiator and the other end is the receiver;
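(2) both ends are initiators.
However, under no circumstances may both ends of a connection be configured as receivers.
Preferably, in step S104, obtaining the identification information of each node according to the echo response message may include the following operations:
Step S1: receive the echo response message from the destination node, where the information carried in the echo response message includes the World Wide Port Name (WWPN) identification information of each node through which the echo message is forwarded hop by hop between the current node and the destination node;
Step S2: parse the echo response message and extract the WWPN identification information of all nodes from it.
In the preferred embodiment, the initiating node (the current node above) sends an echo message toward other nodes (including the intermediate nodes and the destination node above). Each receiving node along the way (the intermediate node above), upon receiving the echo message, adds its own WWPN identification information to the message and forwards it to the next-level (next-hop) node. Finally, upon receiving the echo message, the destination node returns to the initiating node its own WWPN identification information together with the WWPN identification information of every node the message passed through on its way. When the initiating node receives the echo response message, no manual configuration of node information is needed. After a node has obtained the WWPN identification information of all nodes, it can pair terminal nodes and switch nodes in turn according to their node attribute types.
Preferably, in step S104, obtaining the identification information of each node according to the echo response message may include the following steps:
Step S3: receive the echo response message from an intermediate node, where the information carried in the echo response message includes the WWPN identification information of each node through which the echo message is forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of that intermediate node, the intermediate node being a switch node in the FC network through which the echo message sent by the current node to the destination node passes;
Step S4: parse the echo response message and extract the WWPN identification information of all nodes from it.
In the preferred embodiment, in addition to network-wide topology discovery as described above, the initiating node sends an echo message and a non-destination node (the intermediate node above), on receiving it, directly collects all WWPN identification information downstream of itself and returns it to the initiating node in an echo reply message. In this way each node is responsible only for reporting information about its own downstream nodes, which improves discovery efficiency during network-wide initialization when the network is large.
Preferably, in step S106, determining from the obtained identification information of each node whether a faulty node exists may include the following operations:
Step S5: judge whether the WWPN identification information extracted from the echo response message is the identification information of all nodes between the current node and the destination node;
Step S6: if not, determine the type of the faulty node according to the extracted WWPN identification information: when the faulty node is a terminal device node, directly set the state information of the faulty node to the fault state; when the faulty node is a switch node, set the state information of the faulty node and of all leaf nodes below it to the fault state.
In the preferred embodiment, whether a connection between nodes has failed can be judged from the WWPN identification information reported by each node. If a failure has occurred, the faulty link in the network topology is displayed in the fault state; if the faulty node is a switch node, the nodes downstream of that switch node are also displayed in the fault state. This makes it easier for network administrators to deal with node faults, and the state after a fault is repaired is displayed promptly, which greatly improves the manageability of Fibre Channel.
Preferably, in step S2 or step S4, after the WWPN identification information of all nodes is extracted from the echo response message, the following step may also be included:
Step S7: determine, according to the extracted WWPN identification information, the connection relationships between normally working nodes and the state information of all connection relationships, and generate a network topology diagram.
In the preferred embodiment, a node (the intermediate node above) receives the WWPN identifiers of the nodes traversed from the initiating node to itself and updates its own network topology information. In addition, after a node has obtained the WWPN identifiers of all nodes in the network, a network topology diagram can be generated in the network management system, and the connection relationships of the nodes and the states of all connection relationships can be shown on the display interface, so that users can see the network connections and their states intuitively. Furthermore, each node can compare its stored topology information with the information carried in newly returned response messages and update the connection state of the network accordingly. When a node fault occurs in the FC network, the sending node can quickly determine the state of the connections downstream of a switch node from the returned messages, prompt the user to repair the faulty path, cancel the data frames destined for the faulty node, and report a connection fault to the application.
Preferably, in step S102, sending the echo message to the downstream leaf node connected to the current node may include the following operations:
Step S8: send the echo message according to a first preset period;
Step S9: if sending fails or no echo response message is received within a preset duration, change the first preset period to a second preset period and send the echo message N consecutive times, where the second preset period is shorter than the first preset period, N is a positive integer greater than 1, and whether the N consecutive echo messages succeed determines whether to continue sending echo messages.
The preferred implementation above is further described below with reference to the preferred embodiments shown in FIG. 2 to FIG. 8.
FIG. 2 is a flowchart of a method for obtaining FC network connection information according to a preferred embodiment of the present invention. As shown in FIG. 2, the method may include the following steps:
Step S202: the initiating node carries its own WWPN identifier in the echo message;
Step S204: the other nodes receive the echo message sent by the initiating node, feed their own WWPN identifiers back to the initiating node, and forward the message;
Step S206: judge whether this is the destination node; if not, go to step S208; if so, go to step S210;
Step S208: the node obtains its own WWPN identifier and forwards the echo to the next hop; go to step S206;
Step S210: append this node's own WWPN identifier and return a response;
Step S212: the initiating node receives the echo response message and obtains the WWPN identification information of the other nodes.
FIG. 3 is a flowchart of a method for incrementally obtaining FC network connection information according to a preferred embodiment of the present invention. As shown in FIG. 3, the method may include the following steps: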
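Step S302: start creating the FC network;
Step S304: the initiating node sends an echo message with a hop count of 1 carrying its own WWPN identifier;
Step S306: the node that receives the echo message judges from the hop count whether it should process the message; if not, go to step S308; if so, go to step S310;
Step S308: the hop count does not match this node; forward the echo message to the next hop and go to step S306;
Step S310: the hop count matches this node; the node returns its own identifier and the identifier of its next hop in response to the initiating node;
Step S312: after receiving the echo response, the initiating node increments the hop count to obtain the identifier of the next hop, and step S306 is performed again.
FIG. 4a is a schematic diagram of the data format of the Echo payload according to a preferred embodiment of the present invention. FIG. 4b is a schematic diagram of the data format of the Echo reply according to a preferred embodiment of the present invention. As shown in FIG. 4a and FIG. 4b, two identifier fields are added to the data fields of both the Echo payload and the Echo reply: WWPN Num indicates the number of WWPNs the echo request already carries, and Type indicates whether the node is a terminal device node or a switch node.
FIG. 5 is a schematic diagram of obtaining FC network topology information according to a preferred embodiment of the present invention. As shown in FIG. 5, the initiating host node H1 carries the WWPN identifier of node H1 in an echo message and sends the echo message toward the destination node, disk array node D1.
Switch node S1 receives the echo message, appends the WWPN identifier of S1 to the echo message, determines that it is not the destination node, and forwards the message to the next hop; at the same time, switch node S1 also returns its own WWPN identifier to host node H1 in an echo response.
Host node H2 receives the echo message, appends the WWPN identifier of H2 to the echo message, determines that it is not the destination node, and forwards the message to the next hop; at the same time, node H2 also returns its own WWPN identifier to host node H1 in an echo response.
Switch node S3 receives the echo message, appends the WWPN identifier of S3 to the echo message, determines that it is not the destination node, and forwards the message to the next hop; at the same time, switch node S3 also returns its own WWPN identifier to host node H1 in an echo response.
Switch node S2 receives the echo message, appends the WWPN identifier of S2 to the echo message, determines that it is not the destination node, and forwards the message to the next hop; at the same time, switch node S2 also returns its own WWPN identifier to host node H1 in an echo response.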
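Switch node S4 receives the echo message, appends the WWPN identifier of S4 to the echo message, determines that it is not the destination node, and forwards the message to the next hop; at the same time, switch node S4 also returns its own WWPN identifier to host node H1 in an echo response.
Switch node S5 receives the echo message, appends the WWPN identifier of S5 to the echo message, determines that it is not the destination node, and forwards the message to the next hop; at the same time, switch node S5 also returns its own WWPN identifier to host node H1 in an echo response.
Destination node D1 receives the echo message, determines that it is the destination node, appends the WWPN identifier of D1 to the echo message, and returns it as a response.
Initiating node H1 receives the response messages from the intermediate nodes, parses them, and extracts the WWPN identification information. From this it can determine which nodes on the path from the initiating node to the destination node have validly returned their WWPN identifiers, and hence judge the connectivity of the intermediate nodes. In addition, when initiating node H1 receives the response message from destination node D1, it can build, from the WWPN identifier list returned by D1, a logical connection topology from the initiating node to the destination node to present the network topology.
FIG. 6 is a schematic diagram of incrementally obtaining FC network topology information according to a preferred embodiment of the present invention. As shown in FIG. 6, the initiating host node H1 carries the WWPN identifier of node H1 in the echo message and sets the hop count carried in the echo message to 1.
Switch node S1 receives the echo message with a hop count of 1 and returns the WWPN identifier of S1 to host node H1 in an echo response. After recording the WWPN identification information of switch node S1, host node H1 sends an echo message with a hop count of 2. Switch node S1 determines from the hop count that it should not process this message and forwards it.
Host node H2 receives the echo message with a hop count of 2, determines from the hop count that it should process the message, and returns the WWPN identifier of H2 to host node H1 in an echo response. After recording the WWPN identification information of node H2, host node H1 sends an echo message with a hop count of 3.
Switch node S2 receives the echo message with a hop count of 3, determines from the hop count that it should process the message, and returns the WWPN identifier of S2 to host node H1 in an echo response. After recording the WWPN identification information of switch node S2, host node H1 sends an echo message with a hop count of 4.
Switch node S4 receives the echo message with a hop count of 4, determines from the hop count that it should process the message, and returns the WWPN identifier of S4 to host node H1 in an echo response. After recording the WWPN identification information of switch node S4, host node H1 sends an echo message with a hop count of 5.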
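Switch node S5 receives the echo message with a hop count of 5, determines from the hop count that it should process the message, and returns the WWPN identifier of S5 to host node H1 in an echo response. After recording the WWPN identification information of switch node S5, host node H1 sends an echo message with a hop count of 6.
Destination node D1 receives the echo message with a hop count of 6, determines that it is the destination node, appends the WWPN identifier of D1 to the echo message, and returns it as a response. After receiving the response message, initiating node H1 parses the WWPN identifier list in it and builds a logical connection topology from the initiating node to the destination node to present the network topology.
FIG. 7 is a schematic diagram of fast detection and localization of connection faults according to a preferred embodiment of the present invention. FIG. 7 shows the FC network topology as seen from disk array node D1.
When switch node S2 fails, the technical solution in the related art usually has to wait until the data frames sent by D1 to H3 and H4 time out or are discarded before D1 can detect the failure of H3 and H4; even then, D1 does not know at which of the connected nodes between D1 and H3 the fault actually occurred, and the administrator can only troubleshoot manually during repair.
With the technical solution provided by the preferred embodiment of the present invention, when D1 sends an echo message to H3, the intermediate nodes S4, S3, and S2 each return their own WWPN identifiers to D1 separately. When S2 fails, D1 receives responses only from intermediate nodes S4 and S3 and none from S2, so D1 can quickly detect that S2 has failed. At this point, the disk array network management software can raise an alarm to prompt the administrator to repair S2.
In addition, from the network-wide topology diagram and the WWPN identifier list returned for H3, it can be determined that H3 and H4, which are connected downstream of S2, have been disconnected; the application layer on D1 can then be notified to stop the data frames from D1 to H3 and H4, reducing wasted bandwidth and improving FC bandwidth utilization.
FIG. 8 is a schematic diagram of the fast connection-fault detection mode according to a preferred embodiment of the present invention. As shown in FIG. 8, connectivity detection is performed with echo messages, and the detection mode can be divided into slow detection and fast detection. Slow detection sends one echo message every 2 s (the first preset period above); if sending fails, fast detection mode is entered; if I/O on the connection fails or times out, fast detection mode is also entered. Fast detection mode sends one echo message every 100 ms (the second preset period above), three times in a row (N is 3 in this preferred embodiment). Only when all three succeed does it switch back to slow detection mode; if all three fail, connectivity is considered lost; otherwise detection continues. This allows node faults to be discovered quickly and the network-wide topology to be maintained effectively.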
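The slow/fast probing policy of FIG. 8 is essentially a small state machine. The Python sketch below models only its decision logic, under assumptions: timers and actual frame transmission are left to the caller, each probe outcome arrives as a boolean, and after a fully failed burst the monitor keeps probing in fast mode (the text does not say what happens once connectivity is declared lost). `EchoMonitor`, `on_result`, and `on_io_failure` are hypothetical names; the 2 s, 100 ms, and N = 3 values come from the preferred embodiment.

```python
from enum import Enum
from typing import List

SLOW_PERIOD_S = 2.0   # first preset period (FIG. 8)
FAST_PERIOD_S = 0.1   # second preset period (FIG. 8)
BURST_SIZE = 3        # N in the preferred embodiment


class Mode(Enum):
    SLOW = "slow"
    FAST = "fast"


class EchoMonitor:
    """Decision logic for slow/fast echo probing (timers and I/O are left to the caller)."""

    def __init__(self, burst_size: int = BURST_SIZE) -> None:
        self.mode = Mode.SLOW
        self.burst_size = burst_size
        self.burst: List[bool] = []   # outcomes of the current fast burst
        self.link_down = False

    def next_interval(self) -> float:
        """Seconds to wait before sending the next echo."""
        return SLOW_PERIOD_S if self.mode is Mode.SLOW else FAST_PERIOD_S

    def _enter_fast(self) -> None:
        self.mode = Mode.FAST
        self.burst = []

    def on_io_failure(self) -> None:
        """A failed or timed-out I/O on the connection also triggers fast detection."""
        if self.mode is Mode.SLOW:
            self._enter_fast()

    def on_result(self, ok: bool) -> None:
        """Record whether one echo was sent and answered within the timeout."""
        if self.mode is Mode.SLOW:
            if not ok:
                self._enter_fast()
            return
        self.burst.append(ok)
        if len(self.burst) < self.burst_size:
            return
        if all(self.burst):
            self.mode = Mode.SLOW        # N successes: back to slow probing
            self.link_down = False
        elif not any(self.burst):
            self.link_down = True        # N failures: connectivity considered lost
        # Reset the burst; unless all N succeeded, the monitor stays in fast mode.
        self.burst = []
```

A caller would wait `next_interval()` seconds, send one echo, and pass the outcome to `on_result`; `link_down` is the flag an application-level handler could watch before cancelling frames bound for the unreachable node.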
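FIG. 9 is a structural block diagram of a node fault detection apparatus according to an embodiment of the present invention. As shown in FIG. 9, the node fault detection apparatus may include: a sending module 10, configured to send an echo message to a downstream leaf node connected to the current node, where the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are both terminal device nodes in the FC network; an obtaining module 20, configured to obtain, according to an echo response message, the identification information of each normally working node between the current node and the destination node; and a determining module 30, configured to determine, from the obtained identification information of each node, whether a faulty node exists.
The apparatus shown in FIG. 9 solves the problem in the related art that, after a connection between switch nodes or terminal device nodes in an FC network fails, the nodes sending and receiving data cannot detect the failure quickly, and thereby enables fast detection of FC network connection faults, so that the nodes sending and receiving data can quickly discover connection faults and handle them in time.
Preferably, as shown in FIG. 10, the obtaining module 20 may include: a first receiving unit 200, configured to receive the echo response message from the destination node, where the information carried in the echo response message includes the World Wide Port Name (WWPN) identification information of each node through which the echo message is forwarded hop by hop between the current node and the destination node; and a first extracting unit 202, configured to parse the echo response message and extract the WWPN identification information of all nodes from it.
Preferably, as shown in FIG. 10, the obtaining module 20 may include: a second receiving unit 204, configured to receive the echo response message from an intermediate node, where the information carried in the echo response message includes the WWPN identification information of each node through which the echo message is forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of that intermediate node, the intermediate node being a switch node in the FC network through which the echo message sent by the current node to the destination node passes; and a second extracting unit 206, configured to parse the echo response message and extract the WWPN identification information of all nodes from it.
Preferably, as shown in FIG. 10, the determining module 30 may include: a judging unit 300, configured to judge whether the WWPN identification information extracted from the echo response message is the identification information of all nodes between the current node and the destination node; and a processing unit 302, configured to, when the output of the judging unit is negative, determine the type of the faulty node according to the extracted WWPN identification information: when the faulty node is a terminal device node, directly set the state information of the faulty node to the fault state; when the faulty node is a switch node, set the state information of the faulty node and of all leaf nodes below it to the fault state.
Preferably, as shown in FIG. 10, the apparatus may further include: a generating module 40, configured to determine, according to the extracted WWPN identification information, the connection relationships between normally working nodes and the state information of all connection relationships, and to generate a network topology diagram.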
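Steps S5 and S6, as carried out by judging unit 300 and processing unit 302, compare the WWPNs that actually answered with the nodes expected on the path, and then widen the fault to the downstream leaf nodes when a switch is the culprit. The Python sketch below is illustrative only: it assumes the stored topology is a simple parent-to-children map and that the first silent node on the expected path is the faulty one, which matches the FIG. 7 example (S4 and S3 answer, S2 does not). All function names and the small topology are hypothetical, and node labels stand in for WWPNs.

```python
from typing import Dict, List, Optional, Set

FAULT = "fault"
NORMAL = "normal"


def first_silent_node(expected_path: List[str], responders: Set[str]) -> Optional[str]:
    """Judging step: return the first node on the stored path that did not reply."""
    for wwpn in expected_path:
        if wwpn not in responders:
            return wwpn
    return None  # every expected node answered: no fault on this path


def leaves_below(node: str, children: Dict[str, List[str]]) -> List[str]:
    """Collect every leaf (node without children) under `node` in the stored topology."""
    leaves: List[str] = []
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        below = children.get(current, [])
        if below:
            stack.extend(below)
        else:
            leaves.append(current)
    return leaves


def mark_faults(expected_path: List[str], responders: Set[str],
                node_type: Dict[str, str], children: Dict[str, List[str]],
                state: Dict[str, str]) -> Optional[str]:
    """Processing step: update `state` in place and return the faulty node, if any."""
    faulty = first_silent_node(expected_path, responders)
    if faulty is None:
        return None
    state[faulty] = FAULT
    if node_type[faulty] == "switch":
        # A failed switch takes every leaf node below it out of service as well.
        for leaf in leaves_below(faulty, children):
            state[leaf] = FAULT
    return faulty


if __name__ == "__main__":
    # Hypothetical FIG. 7-like chain seen from D1: S4 -> S3 -> S2 -> {H3, H4}.
    children = {"S4": ["S3"], "S3": ["S2"], "S2": ["H3", "H4"]}
    node_type = {"S4": "switch", "S3": "switch", "S2": "switch",
                 "H3": "terminal", "H4": "terminal"}
    state = {n: NORMAL for n in node_type}
    faulty = mark_faults(["S4", "S3", "S2", "H3"], {"S4", "S3"},
                         node_type, children, state)
    print(faulty)   # S2
    print(sorted(n for n, s in state.items() if s == FAULT))   # ['H3', 'H4', 'S2']
```

If instead every switch answers and only H3 stays silent, the same call marks H3 alone, mirroring the terminal-node branch of step S6.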
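From the above description it can be seen that the above embodiments achieve the following technical effects (note that these are effects achievable by some preferred embodiments): to improve the reliability and manageability of Fibre Channel network systems, the technical solution provided by the embodiments of the present invention quickly reports the fault state of connections through echo messages and can quickly locate FC network faults. In addition, by building a network-wide topology diagram on the node side, it effectively solves the problem that the FC-GS-6 protocol can obtain only node information but not the network-wide topology; the topology state at the time of a fault can be located quickly, repairing a fault does not affect the normal operation of other nodes, the difficulty and workload of managing host and disk array networks are greatly reduced, and the FC network can be better used, managed, and maintained. When the initiating node of an echo message receives the response, it obtains the topology information of the FC network; when this is applied while the FC network is being created, each node can automatically obtain the full-network topology without manual configuration of each node's topology information. It also solves the problem that topology changes in the FC network during a connection fault may cause service anomalies on other nodes, makes FC networking simpler, requires no manual intervention when nodes are dynamically added or removed, and improves system reliability and manageability.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from the one given here, or they may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Industrial Applicability
As described above, the node fault detection method and apparatus provided by the embodiments of the present invention have the following beneficial effects: to improve the reliability and manageability of Fibre Channel network systems, the technical solution provided by the embodiments of the present invention quickly reports the fault state of connections through echo messages and can quickly locate FC network faults. In addition, by building a network-wide topology diagram on the node side, it effectively solves the problem that the FC-GS-6 protocol can obtain only node information but not the network-wide topology; the topology state at the time of a fault can be located quickly, repairing a fault does not affect the normal operation of other nodes, the difficulty and workload of managing host and disk array networks are greatly reduced, and the FC network can be better used, managed, and maintained. When the initiating node of an echo message receives the response, it obtains the topology information of the FC network; when this is applied while the FC network is being created, each node can automatically obtain the full-network topology without manual configuration of each node's topology information. It also solves the problem that topology changes in the FC network during a connection fault may cause service anomalies on other nodes, makes FC networking simpler, requires no manual intervention when nodes are dynamically added or removed, and improves system reliability and manageability.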
Claims
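1. A node fault detection method, comprising:
sending a loopback diagnostic (echo) message to a downstream leaf node connected to a current node, wherein the echo message is used to detect whether a link between the current node and a destination node is abnormal, and the current node and the destination node are both terminal device nodes in a Fibre Channel (FC) network;
obtaining, according to an echo response message, identification information of each normally working node between the current node and the destination node; and
determining, from the obtained identification information of each node, whether a faulty node exists.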
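2. The method according to claim 1, wherein obtaining the identification information of each node according to the echo response message comprises:
receiving the echo response message from the destination node, wherein information carried in the echo response message comprises: World Wide Port Name (WWPN) identification information of each node through which the echo message is forwarded hop by hop between the current node and the destination node; and
parsing the echo response message and extracting WWPN identification information of all nodes from the echo response message.
3. The method according to claim 1, wherein obtaining the identification information of each node according to the echo response message comprises:
receiving the echo response message from an intermediate node, wherein information carried in the echo response message comprises: WWPN identification information of each node through which the echo message is forwarded hop by hop between the current node and the intermediate node, and WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of the intermediate node, the intermediate node being a switch node in the FC network through which the echo message sent by the current node to the destination node passes; and parsing the echo response message and extracting WWPN identification information of all nodes from the echo response message.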
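4. The method according to claim 1, wherein determining, from the obtained identification information of each node, whether a faulty node exists comprises:
judging whether WWPN identification information extracted from the echo response message is identification information of all nodes between the current node and the destination node; and
if not, determining a type of the faulty node according to the extracted WWPN identification information; when the faulty node is a terminal device node, directly setting state information of the faulty node to a fault state; and when the faulty node is a switch node, setting state information of the faulty node and of all leaf nodes below the faulty node to the fault state.
5. The method according to claim 2 or 3, further comprising, after extracting the WWPN identification information of all nodes from the echo response message:
determining, according to the extracted WWPN identification information, connection relationships between normally working nodes and state information of all connection relationships, and generating a network topology diagram.
6. The method according to claim 1, wherein sending the echo message to the downstream leaf node connected to the current node comprises: sending the echo message according to a first preset period; and
if sending fails or no echo response message is received within a preset duration, changing the first preset period to a second preset period and sending the echo message N consecutive times, wherein the second preset period is shorter than the first preset period, N is a positive integer greater than 1, and whether the N consecutive echo messages succeed is used to determine whether to continue sending the echo message.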
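7. A node fault detection apparatus, comprising:
a sending module, configured to send a loopback diagnostic (echo) message to a downstream leaf node connected to a current node, wherein the echo message is used to detect whether a link between the current node and a destination node is abnormal, and the current node and the destination node are both terminal device nodes in a Fibre Channel (FC) network;
an obtaining module, configured to obtain, according to an echo response message, identification information of each normally working node between the current node and the destination node; and a determining module, configured to determine, from the obtained identification information of each node, whether a faulty node exists.
8. The apparatus according to claim 7, wherein the obtaining module comprises: a first receiving unit, configured to receive the echo response message from the destination node, wherein information carried in the echo response message comprises: World Wide Port Name (WWPN) identification information of each node through which the echo message is forwarded hop by hop between the current node and the destination node; and
a first extracting unit, configured to parse the echo response message and extract WWPN identification information of all nodes from the echo response message.
9. The apparatus according to claim 7, wherein the obtaining module comprises: a second receiving unit, configured to receive the echo response message from an intermediate node, wherein information carried in the echo response message comprises: WWPN identification information of each node through which the echo message is forwarded hop by hop between the current node and the intermediate node, and WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of the intermediate node, the intermediate node being a switch node in the FC network through which the echo message sent by the current node to the destination node passes; and
a second extracting unit, configured to parse the echo response message and extract WWPN identification information of all nodes from the echo response message.
10. The apparatus according to claim 7, wherein the determining module comprises: a judging unit, configured to judge whether WWPN identification information extracted from the echo response message is identification information of all nodes between the current node and the destination node; and
a processing unit, configured to, when an output of the judging unit is negative, determine a type of the faulty node according to the extracted WWPN identification information; when the faulty node is a terminal device node, directly set state information of the faulty node to a fault state; and when the faulty node is a switch node, set state information of the faulty node and of all leaf nodes below the faulty node to the fault state.
11. The apparatus according to claim 8 or 9, further comprising: a generating module, configured to determine, according to the extracted WWPN identification information, connection relationships between normally working nodes and state information of all connection relationships, and to generate a network topology diagram.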
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410111574.6 | 2014-03-24 | ||
CN201410111574.6A CN104954153A (zh) | 2014-03-24 | 2014-03-24 | Node fault detection method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015143810A1 (zh) | 2015-10-01 |
Family
ID=54168530
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/083256 WO2015143810A1 (zh) | Node fault detection method and apparatus | 2014-03-24 | 2014-07-29 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104954153A (zh) |
WO (1) | WO2015143810A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108833194A (zh) * | 2018-09-10 | 2018-11-16 | Guangdong Power Grid Co., Ltd. | Method and apparatus for locating faulty equipment, and equipment management terminal |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
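CN106656387B (zh) * | 2015-10-30 | 2018-09-07 | Huawei Technologies Co., Ltd. | Method, node and system for detecting a clock synchronization path |
CN110611532B (zh) * | 2018-06-14 | 2021-03-05 | China Mobile Group Design Institute Co., Ltd. | Optical cable connector apparatus and system |
CN109257246A (zh) * | 2018-08-09 | 2019-01-22 | Huawei Technologies Co., Ltd. | Method, apparatus and system for detecting latency |
CN109787869B (zh) * | 2019-03-29 | 2020-11-06 | New H3C Technologies Co., Ltd. | Path fault detection method and device |
CN112566123B (zh) * | 2019-09-09 | 2023-03-28 | China Mobile Communications Co., Ltd. Research Institute | Method and apparatus for determining an abnormal network node |
CN112202492B (zh) * | 2020-09-01 | 2021-10-01 | China Mobile Group Guangdong Co., Ltd. | Optical cable fault localization method, apparatus and electronic device |
CN112838962B (zh) * | 2020-12-31 | 2022-10-18 | China UnionPay Co., Ltd. | Performance bottleneck detection method and apparatus for a big data cluster |
CN113395319B (zh) * | 2021-04-26 | 2022-09-16 | Economic and Technological Research Institute of State Grid Jiangxi Electric Power Co., Ltd. | Network fault sensing method, system, electronic device and storage medium |
CN113254253B (zh) * | 2021-07-14 | 2021-11-02 | Cloudwise (Beijing) Technology Co., Ltd. | Data processing method, system and device |
CN117978647A (zh) * | 2022-10-26 | 2024-05-03 | Datang Mobile Communications Equipment Co., Ltd. | Network node information maintenance method, device, apparatus and storage medium |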
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
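CN101459547A (zh) * | 2007-12-13 | 2009-06-17 | Huawei Technologies Co., Ltd. | Method and system for detecting faults in a label forwarding path |
CN101986604A (zh) * | 2010-10-29 | 2011-03-16 | ZTE Corporation | Link fault localization method and system for a packet transport network |
CN101989934A (zh) * | 2009-08-06 | 2011-03-23 | ZTE Corporation | Method and system for fault detection and localization in a data ring network |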
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8549542B1 (en) * | 2008-09-30 | 2013-10-01 | Emc Corporation | Correlating information from modeled and non-modeled domains |
US7940682B2 (en) * | 2008-12-15 | 2011-05-10 | At&T Intellectual Property I, L.P. | Systems configured to automatically identify open shortest path first (OSPF) protocol problems in a network and related computer program products and methods |
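CN102386973A (zh) * | 2011-11-10 | 2012-03-21 | Zhejiang University of Technology | Fault detection and localization method based on out-of-band signaling in light-trail networks |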
2014
- 2014-03-24 CN CN201410111574.6A patent/CN104954153A/zh active Pending
- 2014-07-29 WO PCT/CN2014/083256 patent/WO2015143810A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN104954153A (zh) | 2015-09-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 14887368; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 14887368; Country of ref document: EP; Kind code of ref document: A1 |