CN104954153A - Method and device for node fault detection - Google Patents

Info

Publication number
CN104954153A
Authority
CN
China
Prior art keywords
node
identification information
echo
wwpn
response message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410111574.6A
Other languages
Chinese (zh)
Inventor
牛克强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201410111574.6A priority Critical patent/CN104954153A/en
Priority to PCT/CN2014/083256 priority patent/WO2015143810A1/en
Publication of CN104954153A publication Critical patent/CN104954153A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and a device for node fault detection. In the method, an echo message is sent to a downstream leaf node connected to the current node, the echo message being used to detect whether the link between the current node and a destination node is abnormal, where the current node and the destination node are both terminal equipment nodes in an FC network; the identification information of each normally working node between the current node and the destination node is obtained from an echo response message; and whether a faulty node exists is determined from the obtained identification information. The technical scheme provided by the invention enables rapid detection of connection faults in an FC network, so that the nodes sending and receiving data can quickly discover a connection fault and handle it in time.

Description

Node fault detection method and device
Technical field
The present invention relates to the field of communications, and in particular to a node fault detection method and device.
Background
At present, Fibre Channel (FC) networks in the related art offer good transmission characteristics such as high bandwidth and low latency, and are therefore widely used in storage networks.
In a network model in which host nodes are connected to disk array nodes through switch nodes, data is exchanged between two nodes that have established a connection (a host node and a disk array node). At the data plane, the terminal equipment nodes (hosts and disk arrays) and the switch nodes are always isolated from one another, and none of them can learn the connection status of the whole network. The reachability and validity of a data path are determined by the upper layers of Fibre Channel. That is, when a switch node fails, the upstream nodes or switch nodes connected to it cannot know that the nodes downstream of the fault point are affected. Upstream and downstream nodes keep sending data frames until the uppermost layer (host and disk array) detects a timeout, and only then is fault handling performed.
The current FC protocols provide no dedicated mechanism for checking connection validity or detecting faults. For a transmission medium with latency requirements as strict as those of FC, the invalid frame transmissions that follow a node failure affect network traffic and seriously degrade the user's experience of the FC network. Moreover, when the network has many levels, detecting whether links are up or down by exchanging frames between all nodes also consumes user service bandwidth on the high-speed interfaces.
In a typical network of hosts and disk arrays, a small number of disk arrays are connected to, and serve, a large number of host nodes at the same time. If a switch node or disk array node fails, fault recovery must not affect the services of the other host nodes. Because host nodes and disk arrays do not maintain the topology of the whole network, and all nodes merely register their identities with the switch nodes through the FC-GS-6 protocol, manual maintenance, recovery and management after a fault are very likely to affect running services.
The current recovery approach is essentially to inspect the physical connections of the actual network, check the alarm information in the network management tool, find the physical node that matches the alarm, and trace the physical cabling between the originating physical node and the faulty physical node before the fault can be located. This cannot meet the requirement of quickly locating and resolving faults in a complex network environment. In addition, as the disk array network is upgraded and reconstructed and device nodes are added, the network deployment changes, so maintenance experience gained on the earlier deployment can no longer meet the need for fast network maintenance.
Under the current FC protocols, an FC device accesses the Name Server through a well-known address identifier, and the Common Transport protocol defined in FC-GS-6 allows a client to obtain the address identifiers and attributes of the devices attached to the FC switching network. For example, GPN_ID obtains the port name, GNN_ID obtains the node name, GCS_ID obtains the class of service, GFT_ID obtains the FC-4 types, and GPT_ID obtains the port type. Host nodes and disk array nodes can only query fragmentary information about other nodes through the switch nodes. This information carries no direct logical relationships, so a unified view of the network cannot be presented.
The loopback diagnostic (Echo) command is defined in the Fibre Channel FC-LS-2 protocol. The recipient of an Echo request returns the payload that follows the command code, as received, to the initiator of the Echo command in a reply sequence. This provides a means of transmitting a data frame and of performing a simple loopback diagnostic by returning the payload. The sequence may contain only one frame, which carries the Echo command or its response.
However, the Echo used in the current FC protocols only implements a simple loopback diagnostic and cannot obtain the identifiers of the nodes that the echo message passes through.
Summary of the invention
The invention provides a node fault detection method and device, to at least solve the problem in the related art that, after a connection between switch nodes or terminal equipment nodes in an FC network fails, the nodes sending and receiving data cannot quickly perceive the failure.
According to one aspect of the present invention, a node fault detection method is provided.
The node fault detection method according to an embodiment of the present invention comprises: sending an echo message to a downstream leaf node connected to the current node, wherein the echo message is used to detect whether the link between the current node and a destination node is abnormal, and the current node and the destination node are both terminal equipment nodes in an FC network; obtaining, according to an echo response message, the identification information of each normally working node between the current node and the destination node; and determining, from the obtained identification information of each node, whether a faulty node exists.
Preferably, obtaining the identification information of each node according to the echo response message comprises: receiving an echo response message from the destination node, wherein the information carried in the echo response message comprises the World Wide Port Name (WWPN) identification information of each node through which the echo message was forwarded hop by hop between the current node and the destination node; and parsing the echo response message and extracting the WWPN identification information of all the nodes from it.
Preferably, obtaining the identification information of each node according to the echo response message comprises: receiving an echo response message from an intermediate node, wherein the information carried in the echo response message comprises the WWPN identification information of each node through which the echo message was forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of the intermediate node, and the intermediate node is a switch node in the FC network through which the echo message sent by the current node to the destination node passes; and parsing the echo response message and extracting the WWPN identification information of all the nodes from it.
Preferably, determining from the obtained identification information of each node whether a faulty node exists comprises: judging whether the WWPN identification information extracted from the echo response message covers all the nodes between the current node and the destination node; if not, determining the type of the faulty node according to the extracted WWPN identification information; when the faulty node is a terminal equipment node, directly setting the state information of the faulty node to the fault state; and when the faulty node is a switch node, setting the state information of the faulty node and of all leaf nodes subordinate to it to the fault state.
Preferably, after the WWPN identification information of all the nodes is extracted from the echo response message, the method further comprises: determining, according to the extracted WWPN identification information, the connection relationships between the normally working nodes and the state information of all the connection relationships, and generating a network topology diagram.
Preferably, sending the echo message to the downstream leaf node connected to the current node comprises: sending echo messages at a first preset period; and, if sending fails or no echo response message is received within a preset duration, adjusting the first preset period to a second preset period and sending N echo messages consecutively, wherein the second preset period is shorter than the first preset period, N is a positive integer greater than 1, and the result of whether the N consecutive echo messages are sent successfully is used to determine whether to continue sending echo messages.
According to another aspect of the present invention, a node fault detection device is provided.
The node fault detection device according to an embodiment of the present invention comprises: a sending module, configured to send an echo message to a downstream leaf node connected to the current node, wherein the echo message is used to detect whether the link between the current node and a destination node is abnormal, and the current node and the destination node are both terminal equipment nodes in an FC network; an acquisition module, configured to obtain, according to an echo response message, the identification information of each normally working node between the current node and the destination node; and a determination module, configured to determine, from the obtained identification information of each node, whether a faulty node exists.
Preferably, the acquisition module comprises: a first receiving unit, configured to receive an echo response message from the destination node, wherein the information carried in the echo response message comprises the World Wide Port Name (WWPN) identification information of each node through which the echo message was forwarded hop by hop between the current node and the destination node; and a first extraction unit, configured to parse the echo response message and extract the WWPN identification information of all the nodes from it.
Preferably, the acquisition module comprises: a second receiving unit, configured to receive an echo response message from an intermediate node, wherein the information carried in the echo response message comprises the WWPN identification information of each node through which the echo message was forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of the intermediate node, and the intermediate node is a switch node in the FC network through which the echo message sent by the current node to the destination node passes; and a second extraction unit, configured to parse the echo response message and extract the WWPN identification information of all the nodes from it.
Preferably, the determination module comprises: a judging unit, configured to judge whether the WWPN identification information extracted from the echo response message covers all the nodes between the current node and the destination node; and a processing unit, configured to, when the output of the judging unit is negative, determine the type of the faulty node according to the extracted WWPN identification information, directly set the state information of the faulty node to the fault state when the faulty node is a terminal equipment node, and set the state information of the faulty node and of all leaf nodes subordinate to it to the fault state when the faulty node is a switch node.
Preferably, the device further comprises: a generation module, configured to determine, according to the extracted WWPN identification information, the connection relationships between the normally working nodes and the state information of all the connection relationships, and to generate a network topology diagram.
In the embodiments of the present invention, an echo message is sent to a downstream leaf node connected to the current node, wherein the echo message is used to detect whether the link between the current node and a destination node is abnormal, and the current node and the destination node are both terminal equipment nodes in an FC network; the identification information of each normally working node between the current node and the destination node is obtained according to an echo response message; and whether a faulty node exists is determined from the obtained identification information. In other words, the current node actively sends an echo message to the downstream leaf node, receives the echo response message, and extracts from it the identification information of each node the echo message passed through, without any node information having to be configured manually. From the extracted identification information, the current node learns in time of data frame timeouts at downstream nodes caused by node failure. This solves the problem in the related art that, after a connection between switch nodes or terminal equipment nodes in an FC network fails, the nodes sending and receiving data cannot quickly perceive the failure. Connection faults in the FC network are thus detected rapidly, so that the nodes sending and receiving data can quickly discover a connection fault and handle it in time.
Brief description of the drawings
The drawings described herein are provided for further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the node fault detection method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for obtaining FC network connection information according to a preferred embodiment of the present invention;
Fig. 3 is a flowchart of a method for incrementally obtaining FC network connection information according to a preferred embodiment of the present invention;
Fig. 4a is a schematic diagram of the data format of the Echo payload according to a preferred embodiment of the present invention;
Fig. 4b is a schematic diagram of the data format of the Echo reply according to a preferred embodiment of the present invention;
Fig. 5 is a schematic diagram of obtaining FC network topology information according to a preferred embodiment of the present invention;
Fig. 6 is a schematic diagram of incrementally obtaining FC network topology information according to a preferred embodiment of the present invention;
Fig. 7 is a schematic diagram of rapid detection and location of connection faults according to a preferred embodiment of the present invention;
Fig. 8 is a schematic diagram of a rapid connection fault detection mode according to a preferred embodiment of the present invention;
Fig. 9 is a structural block diagram of the node fault detection device according to an embodiment of the present invention;
Fig. 10 is a structural block diagram of the node fault detection device according to a preferred embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another.
Fig. 1 is a flowchart of the node fault detection method according to an embodiment of the present invention. As shown in Fig. 1, the method may comprise the following processing steps:
Step S102: send an echo message to a downstream leaf node connected to the current node, wherein the echo message is used to detect whether the link between the current node and a destination node is abnormal, and the current node and the destination node are both terminal equipment nodes in an FC network;
Step S104: obtain, according to an echo response message, the identification information of each normally working node between the current node and the destination node;
Step S106: determine, from the obtained identification information of each node, whether a faulty node exists.
In the related art, after a connection between switch nodes or terminal equipment nodes in an FC network fails, the nodes sending and receiving data cannot quickly perceive the failure. With the method shown in Fig. 1, the current node actively sends an echo message to the downstream leaf node, receives the echo response message, and extracts from it the identification information of each node the echo message passed through, without any node information having to be configured manually. From the extracted identification information, the current node learns in time of data frame timeouts at downstream nodes caused by node failure. This solves the problem described above, and connection faults in the FC network are detected rapidly, so that the nodes sending and receiving data can quickly discover a connection fault and handle it in time.
It should be noted that the node roles in a preferred embodiment of the present invention can be divided into initiator (for example, the current node above) and recipient (for example, the destination node above), and the two ends of a connection can be configured as follows:
(1) one end is an initiator and the other end is a recipient;
(2) both ends are initiators.
However, under no circumstances may both ends of a connection be configured as recipients.
Preferably, in step S104, obtaining the identification information of each node according to the echo response message may comprise the following operations:
Step S1: receive an echo response message from the destination node, wherein the information carried in the echo response message comprises the World Wide Port Name (WWPN) identification information of each node through which the echo message was forwarded hop by hop between the current node and the destination node;
Step S2: parse the echo response message and extract the WWPN identification information of all the nodes from it.
In a preferred embodiment, the initiating node (equivalent to the current node above) sends an echo message toward the other nodes (including the intermediate nodes and the destination node above). Each receiving node along the way (equivalent to an intermediate node above), after receiving the echo message, adds its own WWPN identification information to the message and forwards it to the next-level (next-hop) node. The final destination node, after receiving the echo message, adds its own WWPN identification information and returns it to the initiating node together with the WWPN identification information of every node the message passed through on its way there. The initiating node thus learns each node's information from the echo response message without any manual configuration. Once a node has obtained the WWPN identification information of all nodes, it can match them in turn to terminal nodes and switch nodes according to the node attribute type.
Preferably, in step S104, obtaining the identification information of each node according to the echo response message may comprise the following steps:
Step S3: receive an echo response message from an intermediate node, wherein the information carried in the echo response message comprises the WWPN identification information of each node through which the echo message was forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of the intermediate node, and the intermediate node is a switch node in the FC network through which the echo message sent by the current node to the destination node passes;
Step S4: parse the echo response message and extract the WWPN identification information of all the nodes from it.
In a preferred embodiment, as an alternative to the full-network topology discovery described above, the initiating node sends an echo message, and a non-destination node (equivalent to the intermediate node above), after receiving it, directly collects all the WWPN identification information downstream of itself and returns it to the initiating node in the echo reply message. In this way each node is only responsible for the information of its own downstream nodes, which improves the efficiency of discovery during whole-network initialization when the network is large.
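The downstream collection performed by an intermediate node can be sketched as a tree walk. The following is a minimal illustration only, assuming the network below the intermediate node is modelled as a dictionary from a node name to its child nodes; the function name, the `children` and `working` structures and the node names are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Set


def collect_working_leaves(node: str,
                           children: Dict[str, List[str]],
                           working: Set[str]) -> List[str]:
    """Return the identifiers of normally working leaf nodes below `node`.

    `children` and `working` are illustrative stand-ins for whatever the
    intermediate node actually knows about its downstream ports.
    """
    leaves: List[str] = []
    for child in children.get(node, []):
        if child not in working:
            continue  # a failed node hides everything below it
        if not children.get(child):
            leaves.append(child)  # no children of its own: a leaf
        else:
            leaves.extend(collect_working_leaves(child, children, working))
    return leaves


# Hypothetical network: S1 connects to S2 and H2; S2 connects to D1 and D2.
children = {"S1": ["S2", "H2"], "S2": ["D1", "D2"]}
working = {"S1", "S2", "H2", "D1"}  # D2 has failed
print(collect_working_leaves("S1", children, working))
```

In this sketch a failed switch contributes none of its subordinate leaves, which matches the rule that the leaves below a faulty switch node are themselves treated as faulty.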
Preferably, in step S106, determining from the obtained identification information of each node whether a faulty node exists may comprise the following operations:
Step S5: judge whether the WWPN identification information extracted from the echo response message covers all the nodes between the current node and the destination node;
Step S6: if not, determine the type of the faulty node according to the extracted WWPN identification information; when the faulty node is a terminal equipment node, directly set the state information of the faulty node to the fault state; when the faulty node is a switch node, set the state information of the faulty node and of all leaf nodes subordinate to it to the fault state.
In a preferred embodiment, the WWPN identification information reported by each node can be used to judge whether a connection between nodes has failed. If a failure has occurred, the faulty link is shown in the fault state in the network topology; if the faulty node is a switch node, the nodes downstream of that switch node are also shown in the fault state. This makes it easier for the network administrator to maintain and repair node faults, and the state after a fault has been repaired is displayed in time, which greatly improves the manageability of Fibre Channel.
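Steps S5 and S6 can be sketched as a comparison between the expected node list and the reported one, followed by state propagation. This is a hedged illustration: the function name, the `"terminal"`/`"switch"` strings, the `"ok"`/`"fault"` states and the `downstream_leaves` table are assumptions made for the example, not structures defined in the patent.

```python
from typing import Dict, Iterable, List


def mark_faults(expected: List[str],
                reported: Iterable[str],
                node_type: Dict[str, str],
                downstream_leaves: Dict[str, List[str]]) -> Dict[str, str]:
    """Set the state of every expected node to "ok" or "fault".

    A node is considered faulty when its identifier is missing from the
    identifiers extracted from the echo response. For a faulty switch node,
    all of its subordinate leaf nodes are marked faulty as well.
    """
    reported_set = set(reported)
    status = {wwpn: "ok" for wwpn in expected}
    for wwpn in expected:
        if wwpn in reported_set:
            continue
        status[wwpn] = "fault"
        if node_type.get(wwpn) == "switch":
            for leaf in downstream_leaves.get(wwpn, []):
                status[leaf] = "fault"
    return status


# Hypothetical path H1 -> S1 -> S2 -> D1, where S2 also serves leaf D2.
expected = ["H1", "S1", "S2", "D1"]
node_type = {"H1": "terminal", "S1": "switch", "S2": "switch", "D1": "terminal"}
downstream_leaves = {"S2": ["D1", "D2"]}
print(mark_faults(expected, ["H1", "S1"], node_type, downstream_leaves))
```

Here only H1 and S1 answered, so S2 is marked faulty and the fault state is propagated to its leaves D1 and D2.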
Preferably, in step S2 or step S4, after the WWPN identification information of all the nodes is extracted from the echo response message, the method may further comprise the following step:
Step S7: determine, according to the extracted WWPN identification information, the connection relationships between the normally working nodes and the state information of all the connection relationships, and generate a network topology diagram.
In a preferred embodiment, a node (equivalent to the intermediate node above) that receives the WWPN identifiers of the nodes the message passed through between the initiating node and itself updates its own network topology information. In addition, after a node has obtained the WWPN identifiers of the nodes in the whole network, a network topology diagram can be generated on the network management system, and the connection relationships between the nodes and the states of all the connections can be shown on the display interface, so that the user can see the network connections and their states intuitively. Each node can also compare its stored topology information with the information carried in a newly returned response message and update the connection states of the network accordingly. When a node in the FC network fails, the sending node can quickly determine the connection state downstream of the switch node from the responses that are returned, prompt the user to repair the failed path, cancel the data frames that were to be sent to the faulty node, and report the connection fault to the application.
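The comparison between stored topology information and a newly returned response can be sketched as follows. This is a minimal illustration under the assumption that the ordered identifier list in a response describes a chain, so that each pair of consecutive identifiers is one connection; the function names and the `"up"`/`"down"` labels are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def links_from_path(path: List[str]) -> List[Link]:
    """Turn an ordered list of identifiers into (upstream, downstream) links."""
    return list(zip(path, path[1:]))


def update_link_states(stored_links: List[Link],
                       reported_path: List[str]) -> Dict[Link, str]:
    """Mark each stored link "up" if the latest response still contains it."""
    current = set(links_from_path(reported_path))
    return {link: ("up" if link in current else "down") for link in stored_links}


# Topology learned earlier from a complete response (hypothetical names).
stored = links_from_path(["H1", "S1", "S2", "D1"])
# A later response only carries identifiers up to S1.
for link, state in update_link_states(stored, ["H1", "S1"]).items():
    print(link, state)
```

A display layer could then draw the links marked "down" in the fault state, as the paragraph above describes for the network management interface.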
Preferably, in step S102, sending the echo message to the downstream leaf node connected to the current node may comprise the following operations:
Step S8: send echo messages at a first preset period;
Step S9: if sending fails or no echo response message is received within a preset duration, adjust the first preset period to a second preset period and send N echo messages consecutively, wherein the second preset period is shorter than the first preset period, N is a positive integer greater than 1, and the result of whether the N consecutive echo messages are sent successfully is used to determine whether to continue sending echo messages.
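Steps S8 and S9 can be sketched as a probing loop that switches to a shorter period after a failure. The source does not spell out how the result of the N consecutive messages is evaluated, so this sketch assumes that probing stops and a fault is reported only when all N fail, and that the normal period resumes otherwise. The function name, the `send_echo` callback and `max_rounds` are hypothetical; the waits are recorded rather than slept so the example runs instantly.

```python
from typing import Callable, List, Tuple


def run_probe(send_echo: Callable[[], bool],
              first_period: float,
              second_period: float,
              n: int,
              max_rounds: int) -> Tuple[str, List[float]]:
    """Probe at `first_period`; after a failure, send `n` echoes at `second_period`.

    `send_echo` returns True when an echo response arrived in time.
    Returns the final state and the sequence of waits that were used.
    """
    waits: List[float] = []
    for _ in range(max_rounds):
        if send_echo():
            waits.append(first_period)  # healthy: keep the normal period
            continue
        results = []
        for _ in range(n):  # burst of n echoes at the shorter period
            waits.append(second_period)
            results.append(send_echo())
        if not any(results):
            return "fault", waits  # assumed rule: all n failed, stop probing
    return "ok", waits


# Simulated link: one good response, then the peer stops answering.
responses = iter([True, False, False, False, False])
print(run_probe(lambda: next(responses), 10.0, 1.0, n=3, max_rounds=5))
```

Shortening the period only after a first failure keeps the probing overhead low on a healthy link while still confirming a suspected fault quickly.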
The preferred implementation process above is further described below with reference to the preferred embodiments shown in Fig. 2 to Fig. 8.
Fig. 2 is a flowchart of a method for obtaining FC network connection information according to a preferred embodiment of the present invention. As shown in Fig. 2, the method may comprise the following processing steps:
Step S202: the initiator node carries its own WWPN identifier in the echo message;
Step S204: another node receives the echo message sent by the initiator node, feeds its own WWPN identifier back to the initiator node, and forwards the message;
Step S206: judge whether this node is the destination node; if not, continue with step S208; if so, go to step S210;
Step S208: this node adds its own WWPN identifier and forwards the echo message to the next hop; go to step S206;
Step S210: this node appends its own WWPN identifier and returns a response;
Step S212: the initiator node receives the echo response message and obtains the WWPN identification information of the other nodes.
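The flow of steps S202 to S212 can be sketched as a walk along the path in which every reachable node adds its identifier. This is a simplified model, assuming a single linear path and a set of working nodes; the function name and node names are hypothetical, and real forwarding between FC ports is not modelled.

```python
from typing import List, Set, Tuple


def forward_echo(path: List[str], working: Set[str]) -> Tuple[List[str], bool]:
    """Simulate an echo message travelling from path[0] to path[-1].

    Returns the identifiers recorded along the way and whether the
    destination was reached. Each reachable node appends its identifier
    (S204/S208); the destination appends its own and replies (S210).
    """
    recorded = [path[0]]  # S202: the initiator carries its own identifier
    for node in path[1:]:
        if node not in working:
            return recorded, False  # nothing is heard from beyond a failed node
        recorded.append(node)
    return recorded, True


path = ["H1", "S1", "S2", "D1"]  # hypothetical initiator-to-destination path
print(forward_echo(path, {"H1", "S1", "S2", "D1"}))
print(forward_echo(path, {"H1", "S1", "D1"}))  # S2 has failed
```

In the second call the initiator still learns that S1 is alive, because each intermediate node also feeds its identifier back in step S204, and the first identifier missing from the list points at the fault location.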
Fig. 3 is a flowchart of a method for incrementally obtaining FC network connection information according to a preferred embodiment of the present invention. As shown in Fig. 3, the method may comprise the following processing steps:
Step S302: start creating the FC network;
Step S304: the initiating node sends an echo message with a hop count of 1 that carries its own WWPN identifier;
Step S306: the node receiving the echo message judges from the hop count whether the message is to be processed by this node; if not, go to step S308; if so, go to step S310;
Step S308: the hop count does not match this node, so the echo message is forwarded to the next hop; go to step S306;
Step S310: the hop count matches this node, so this node returns its own identifier and the identifier of its next hop in a response to the initiating node;
Step S312: after receiving the echo response, the initiating node increments the hop count in order to obtain the identifier of the next hop, and continues with step S306.
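The incremental discovery of steps S304 to S312 resembles a hop-limited probe that is repeated with a growing hop count. The sketch below is illustrative only, assuming a single linear path; the function names and the use of `None` for "no response" or "no further hop" are assumptions for the example.

```python
from typing import List, Optional, Set, Tuple


def echo_with_hop(path: List[str], hop: int,
                  working: Set[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (node id, next-hop id) from the node `hop` hops away, or None.

    Nodes whose position does not match the hop count simply forward the
    message (S308); the matching node answers (S310).
    """
    for i in range(1, hop + 1):
        if i >= len(path) or path[i] not in working:
            return None  # the message never reaches the target hop
    next_hop = path[hop + 1] if hop + 1 < len(path) else None
    return path[hop], next_hop


def discover(path: List[str], working: Set[str]) -> List[str]:
    """Initiator side: raise the hop count until no further hop is reported."""
    found = [path[0]]
    hop = 1  # S304: start with a hop count of 1
    while True:
        reply = echo_with_hop(path, hop, working)
        if reply is None:
            break
        node, next_hop = reply
        found.append(node)
        if next_hop is None:
            break
        hop += 1  # S312: increment and probe the next hop
    return found


path = ["H1", "S1", "H2", "S2"]  # hypothetical chain of nodes
print(discover(path, set(path)))
print(discover(path, {"H1", "S1", "S2"}))  # H2 has failed
```

Because each response names the responder's next hop, the initiator knows which identifier to expect next and can tell a missing reply from the end of the path.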
Fig. 4a is a schematic diagram of the data format of the Echo payload according to a preferred embodiment of the present invention. Fig. 4b is a schematic diagram of the data format of the Echo reply according to a preferred embodiment of the present invention. As shown in Fig. 4a and Fig. 4b, two identification fields are added to the data field of both the Echo payload and the Echo reply: WWPN Num indicates the number of WWPNs that the echo request has carried so far, and Type indicates whether the node is a terminal equipment node or a switch node.
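How a node might append its entry to such a data field can be sketched with Python's `struct` module. The text does not give field widths or ordering, so the layout here is purely an assumption: a 1-byte WWPN Num counter followed by one 9-byte entry per node (1-byte Type, then the 8-byte WWPN, big-endian). The type codes and the example WWPN values are likewise hypothetical.

```python
import struct
from typing import List, Tuple

TYPE_TERMINAL = 0  # assumed code for a terminal equipment node
TYPE_SWITCH = 1    # assumed code for a switch node
ENTRY = struct.Struct(">BQ")  # Type (1 byte) + WWPN (8 bytes), no padding


def append_wwpn(data: bytes, wwpn: int, node_type: int) -> bytes:
    """Return a new data field with this node's entry added and WWPN Num incremented."""
    count = data[0] if data else 0
    return bytes([count + 1]) + data[1:] + ENTRY.pack(node_type, wwpn)


def parse_entries(data: bytes) -> List[Tuple[int, int]]:
    """Extract the (Type, WWPN) pairs announced by the WWPN Num counter."""
    count = data[0]
    return [ENTRY.unpack_from(data, 1 + ENTRY.size * i) for i in range(count)]


field = append_wwpn(b"", 0x10000000C9000001, TYPE_TERMINAL)  # initiator
field = append_wwpn(field, 0x2000000533000002, TYPE_SWITCH)  # first switch
for node_type, wwpn in parse_entries(field):
    print(node_type, f"{wwpn:016x}")
```

Carrying the counter in front lets a receiver parse the list without knowing the path length in advance, and the Type value is what allows the initiator to tell terminal nodes from switch nodes when it marks faults.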
Fig. 5 is a schematic diagram of obtaining FC network topology information according to a preferred embodiment of the present invention. As shown in Fig. 5, the initiator host node H1 carries the WWPN identifier of node H1 in an echo message and sends the echo message toward the destination node, disk array node D1.
Switch node S1 receives the echo message, appends the WWPN identifier of S1 to the echo message, determines that it is not the destination node, and forwards the message to the next hop. At the same time, switch node S1 also appends its own WWPN identifier to an echo response and returns it to host node H1.
Host node H2 receives the echo message, appends the WWPN identifier of H2 to the echo message, determines that it is not the destination node, and forwards the message to the next hop. At the same time, host node H2 also appends its own WWPN identifier to an echo response and returns it to host node H1.
Switch node S3 receives the echo message, appends the WWPN identifier of S3 to the echo message, determines that it is not the destination node, and forwards the message to the next hop. At the same time, switch node S3 also appends its own WWPN identifier to an echo response and returns it to host node H1.
Switch node S2 receives the echo message, appends the WWPN identifier of S2 to the echo message, determines that it is not the destination node, and forwards the message to the next hop. At the same time, switch node S2 also appends its own WWPN identifier to an echo response and returns it to host node H1.
Switch node S4 receives the echo message, appends the WWPN identifier of S4 to the echo message, determines that it is not the destination node, and forwards the message to the next hop. At the same time, switch node S4 also appends its own WWPN identifier to an echo response and returns it to host node H1.
Switch node S5 receives the echo message, appends the WWPN identifier of S5 to the echo message, determines that it is not the destination node, and forwards the message to the next hop. At the same time, switch node S5 also appends its own WWPN identifier to an echo response and returns it to host node H1.
Destination node D1 receives the echo message, determines that it is the destination node, appends the WWPN identifier of D1 to the echo message, and returns it as a response.
The initiating node H1 receives the response messages from the intermediate nodes, parses them and extracts the WWPN identification information they contain. It can then tell which nodes on the path from the initiating node to the destination node returned a valid WWPN identifier, and from this judge the connectivity of the intermediate nodes. In addition, when the initiating node H1 receives the response message from the destination node D1, it can construct the logical connection topology from the initiating node to the destination node from the WWPN identifier list returned by D1, in order to present the network topology.
Fig. 6 is a schematic diagram of incrementally obtaining FC networking topology information according to a preferred embodiment of the invention. As shown in Fig. 6, the initiating host node H1 carries the WWPN identifier of node H1 in the echo message and sets the hop count carried in the echo message to 1.
Switching node S1 receives the echo message with hop count 1, places the WWPN identifier of S1 in an echo response, and returns it to host node H1. After recording the WWPN identification information of switching node S1, host node H1 sends an echo message with hop count 2. Switching node S1 determines from the hop count that this message is not to be processed by this node and forwards it.
Host node H2 receives the echo message with hop count 2, determines from the hop count that it is to be processed by this node, places the WWPN identifier of H2 in an echo response, and returns it to host node H1. After recording the WWPN identification information of node H2, host node H1 sends an echo message with hop count 3.
Switching node S2 receives the echo message with hop count 3, determines from the hop count that it is to be processed by this node, places the WWPN identifier of S2 in an echo response, and returns it to host node H1. After recording the WWPN identification information of switching node S2, host node H1 sends an echo message with hop count 4.
Switching node S4 receives the echo message with hop count 3, determines from the hop count that it is to be processed by this node, places the WWPN identifier of S4 in an echo response, and returns it to host node H1. After recording the WWPN identification information of switching node S4, host node H1 sends an echo message with hop count 4.
Switching node S5 receives the echo message with hop count 3, determines from the hop count that it is to be processed by this node, places the WWPN identifier of S5 in an echo response, and returns it to host node H1. After recording the WWPN identification information of switching node S5, host node H1 sends an echo message with hop count 4.
Destination node D1 receives the echo message with hop count 4, determines that it is the destination node, appends the WWPN identifier of D1 to the echo message, and returns it as a response. After receiving the response message, the initiating node H1 parses the WWPN identifier list it contains and constructs the logical connection topology from the initiating node to the destination node, which represents the form of the network topology.
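The incremental procedure resembles probing a path one hop at a time. The following sketch assumes a single linear path and models "the node reached at hop count k" as the k-th entry of a list; the function name probe_by_hop_count and the failed parameter are hypothetical and are not taken from the patent text.

```python
def probe_by_hop_count(path, failed=frozenset()):
    """Discover the nodes on `path` one hop at a time.

    path: ordered node names from the initiating node to the destination
          (names stand in for WWPN identifiers; an assumption of this sketch).
    failed: nodes that do not answer. A faulty node also blocks every later
            hop, so probing stops at the first missing answer.
    Returns the list of WWPNs recorded by the initiating node, in order.
    """
    discovered = [path[0]]                 # the initiator knows its own WWPN
    for hop_count in range(1, len(path)):
        responder = path[hop_count]        # node that handles this hop count
        if responder in failed:
            break                          # no echo response for this hop count
        discovered.append(responder)       # record the WWPN, then raise the hop count
    return discovered


full = probe_by_hop_count(["H1", "S1", "S2", "D1"])
partial = probe_by_hop_count(["H1", "S1", "S2", "D1"], failed={"S2"})
```

The recorded list grows by one identifier per round, so the point at which it stops growing indicates where connectivity is lost.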
Fig. 7 is a schematic diagram of fast detection and location of a connection fault according to a preferred embodiment of the invention. As shown in Fig. 7, the FC networking topology information seen from the perspective of disk array node D1 is as follows:
When switching node S2 fails, the technical solution adopted in the related art usually has to wait until the data frames sent from D1 to H3 and H4 time out or are dropped before D1 can perceive that H3 and H4 are unreachable. Even then, D1 does not know at which of the connected nodes between D1 and H3 the fault occurred, so the administrator can only investigate manually when repairing it.
With the technical solution provided by the preferred embodiment of the invention, when D1 sends an echo message to H3, the intermediate nodes S4, S3, and S2 each return their own WWPN identifier to D1. When S2 fails, D1 receives responses only from intermediate nodes S4 and S3 and receives no response from S2, so D1 can quickly perceive that S2 has failed. An alarm can then be sent to the administrator through the disk array network management software, prompting the administrator to repair the S2 fault.
In addition, according to the full-network topology map and the WWPN identifier list returned by H3, it can be determined that H3 and H4, which are connected downstream of S2, are disconnected. The application layer on D1 can then be notified to stop sending data frames from D1 to H3 and H4, which reduces wasted bandwidth and improves FC bandwidth utilization.
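The localisation step in this example amounts to comparing the expected path with the set of nodes that actually answered. The sketch below assumes the expected path is already known from the topology map; the function name locate_fault is hypothetical, and node names stand in for WWPN identifiers.

```python
def locate_fault(expected_path, responders):
    """Return the first node on the expected path that did not respond.

    expected_path: ordered node names, starting with the initiating node.
    responders: set of node names whose echo responses were received.
    Returns None when every node after the initiator responded.
    """
    for node in expected_path[1:]:      # skip the initiating node itself
        if node not in responders:
            return node                 # nearest silent node is the suspected fault
    return None


# D1 probes H3 through S4, S3, S2; only S4 and S3 answer.
suspect = locate_fault(["D1", "S4", "S3", "S2", "H3"], {"S4", "S3"})
```

Only the nearest silent node is reported, because nodes behind it cannot answer whether or not they are healthy.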
Fig. 8 is a schematic diagram of the fast detection mode for connection faults according to a preferred embodiment of the invention. As shown in Fig. 8, echo messages are used for connectivity detection, and detection is divided into a slow detection mode and a fast detection mode. In slow detection mode, an echo message is sent every 2 s (corresponding to the first predetermined period above). If sending fails, fast detection mode is entered; fast detection mode is also entered if I/O on the connection fails or times out. In fast detection mode, an echo message is sent every 100 ms (corresponding to the second predetermined period above), 3 times in succession (N is 3 in this preferred embodiment). Detection switches back to slow mode only if all 3 attempts succeed; if all 3 fail, connectivity is considered lost; otherwise fast detection continues. In this way, node faults can be found quickly and the full-network topology can be maintained effectively.
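The two detection modes can be viewed as a small state machine. The sketch below is one possible reading of the description, with assumptions stated explicitly: the class name Detector and the "failed" state label are hypothetical, the failed slow-mode echo is not counted among the 3 fast-mode attempts, and a mixed result starts a fresh group of 3.

```python
SLOW_PERIOD_S = 2.0    # first predetermined period (2 s)
FAST_PERIOD_S = 0.1    # second predetermined period (100 ms)
N = 3                  # consecutive echo messages sent in fast mode


class Detector:
    """Tracks the detection mode from a stream of echo results."""

    def __init__(self):
        self.mode = "slow"
        self.window = []               # results of the current fast-mode group

    def period(self):
        """Interval to wait before sending the next echo message."""
        return SLOW_PERIOD_S if self.mode == "slow" else FAST_PERIOD_S

    def on_echo_result(self, ok):
        """Feed one result (True = success) and return the resulting mode."""
        if self.mode == "failed":
            return self.mode           # connectivity already declared lost
        if self.mode == "slow":
            if not ok:                 # send failure or I/O failure/timeout
                self.mode = "fast"
                self.window = []
            return self.mode
        self.window.append(ok)         # fast mode: collect N results
        if len(self.window) == N:
            if all(self.window):
                self.mode = "slow"     # all N succeeded
            elif not any(self.window):
                self.mode = "failed"   # all N failed
            self.window = []           # mixed result: keep detecting in fast mode
        return self.mode
```

A caller would sleep for period() seconds between echo messages and pass each outcome to on_echo_result.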
Fig. 9 is a structural block diagram of the node fault detection device according to an embodiment of the invention. As shown in Fig. 9, the node fault detection device may comprise: a sending module 10, configured to send an echo message to the downstream leaf node connected to the current node, wherein the echo message is used to detect whether the link between the current node and the destination node is abnormal, and the current node and the destination node are terminal device nodes in an FC network; an acquisition module 20, configured to obtain, according to the echo response message, the identification information of each normally working node between the current node and the destination node; and a determination module 30, configured to determine, from the obtained identification information of each node, whether a faulty node exists.
The device shown in Fig. 9 solves the problem in the related art that, after a connection between switching nodes or terminal device nodes in an FC network fails, the nodes sending and receiving data cannot perceive the failure quickly. It thereby achieves fast detection of FC network connection faults, so that the nodes sending and receiving data can find a connection fault quickly and respond in time.
Preferably, as shown in Fig. 10, the acquisition module 20 may comprise: a first receiving unit 200, configured to receive the echo response message from the destination node, wherein the information carried in the echo response message comprises the World Wide Port Name (WWPN) identification information of each node that the echo message passes through as it is forwarded hop by hop between the current node and the destination node; and a first extraction unit 202, configured to parse the echo response message and extract the WWPN identification information of all of the nodes from it.
Preferably, as shown in Fig. 10, the acquisition module 20 may comprise: a second receiving unit 204, configured to receive the echo response message from an intermediate node, wherein the information carried in the echo response message comprises the WWPN identification information of each node that the echo message passes through as it is forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of that intermediate node, and the intermediate node is a switching node in the FC network through which the echo message sent by the current node to the destination node passes; and a second extraction unit 206, configured to parse the echo response message and extract the WWPN identification information of all of the nodes from it.
Preferably, as shown in Fig. 10, the determination module 30 may comprise: a judging unit 300, configured to judge whether the WWPN identification information extracted from the echo response message is the identification information of all of the nodes between the current node and the destination node; and a processing unit 302, configured to determine, when the output of the judging unit is no, the type of the faulty node according to the extracted WWPN identification information. When the faulty node is a terminal device node, the state information of the faulty node is set directly to the fault state. When the faulty node is a switching node, the state information of the faulty node and of all leaf nodes subordinate to it is set to the fault state.
Preferably, as shown in Fig. 10, the above device may further comprise: a generation module 40, configured to determine, according to the extracted WWPN identification information, the connection relationships between the normally working nodes and the state information of all the connection relationships, and to generate a network topology diagram.
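The processing unit's rule can be sketched as follows. This is an illustration under assumptions, not the device itself: the function name mark_fault_state, the "terminal"/"switch" type labels, and the tree-shaped children mapping are hypothetical, and every node below a faulty switching node is treated as a subordinate node.

```python
def mark_fault_state(state, node_type, children, faulty):
    """Return a copy of `state` with the faulty node marked as "fault".

    state: {node: "normal" | "fault"}
    node_type: {node: "terminal" | "switch"}
    children: {node: [directly attached downstream nodes]}
    faulty: name of the node determined to be faulty.
    """
    new_state = dict(state)
    new_state[faulty] = "fault"
    if node_type[faulty] == "switch":
        # A faulty switching node also cuts off everything beneath it.
        pending = list(children.get(faulty, []))
        while pending:
            node = pending.pop()
            new_state[node] = "fault"
            pending.extend(children.get(node, []))
    return new_state


nodes = ["D1", "S4", "S3", "S2", "H3", "H4"]
state = {n: "normal" for n in nodes}
node_type = {"D1": "terminal", "S4": "switch", "S3": "switch",
             "S2": "switch", "H3": "terminal", "H4": "terminal"}
children = {"D1": ["S4"], "S4": ["S3"], "S3": ["S2"], "S2": ["H3", "H4"]}
after_s2 = mark_fault_state(state, node_type, children, "S2")
```

In this example a fault at S2 marks S2, H3, and H4, whereas a fault at terminal node H3 marks only H3.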
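One way to picture the generation module is to treat each returned WWPN list as an ordered path and to record a link between every pair of neighbouring entries. The sketch below rests on that assumption; the function name build_topology is hypothetical, and node names stand in for WWPN identifiers.

```python
def build_topology(wwpn_lists):
    """Derive the set of links implied by ordered WWPN lists.

    wwpn_lists: iterable of lists, each ordered from the initiating node
                towards the node that returned the response.
    Returns a sorted list of (upstream, downstream) pairs.
    """
    links = set()
    for wwpn_list in wwpn_lists:
        # Adjacent entries in a list were traversed consecutively.
        for upstream, downstream in zip(wwpn_list, wwpn_list[1:]):
            links.add((upstream, downstream))
    return sorted(links)


links = build_topology([
    ["D1", "S4", "S3", "S2", "H3"],
    ["D1", "S4", "S3", "S2", "H4"],
])
```

Links shared by several paths are stored once, so the result describes the merged topology rather than the individual paths.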
As can be seen from the above description, the above embodiments achieve the following technical effects (it should be noted that these are effects that certain preferred embodiments can achieve). To improve the reliability and manageability of a Fibre Channel network system, the technical solution provided by the embodiments of the invention feeds back the fault state of a connection quickly through echo messages, so that FC networking faults can be located quickly. In addition, by building a full-network topology map on the node side, it effectively solves the problem that the FC-GS-6 protocol yields only node information and not the full-network topology. The topology state at the time a fault occurs can be located quickly, the normal operation of other nodes is not affected while the fault is repaired, and the difficulty and workload of managing host and disk array networking are significantly reduced, so that the FC network can be better used, managed, and maintained. When the initiating node of an echo message receives the response to that message, it can obtain the topology information of the FC network. When this is used while creating an FC network, each node can obtain the full-network topology automatically, without manual configuration of each node's topology information. It also addresses the problem that topology changes in the FC network during a connection fault may cause service anomalies on other nodes. FC networking becomes simpler, no manual intervention is needed when nodes are dynamically added or removed, and the reliability and manageability of the system are improved.
Obviously, those skilled in the art should understand that each module or step of the invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or the modules or steps can each be made into separate integrated circuit modules, or several of them can be made into a single integrated circuit module. Thus, the invention is not restricted to any specific combination of hardware and software.
The foregoing describes only the preferred embodiments of the invention and is not intended to limit the invention. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (11)

1. A node fault detection method, characterized by comprising:
sending a loopback diagnostic (echo) message to a downstream leaf node connected to a current node, wherein the echo message is used to detect whether the link between the current node and a destination node is abnormal, and the current node and the destination node are terminal device nodes in a Fibre Channel (FC) network;
obtaining, according to an echo response message, the identification information of each normally working node between the current node and the destination node;
determining, from the obtained identification information of each node, whether a faulty node exists.
2. The method according to claim 1, characterized in that obtaining the identification information of each node according to the echo response message comprises:
receiving the echo response message from the destination node, wherein the information carried in the echo response message comprises the World Wide Port Name (WWPN) identification information of each node that the echo message passes through as it is forwarded hop by hop between the current node and the destination node;
parsing the echo response message and extracting the WWPN identification information of all of the nodes from the echo response message.
3. The method according to claim 1, characterized in that obtaining the identification information of each node according to the echo response message comprises:
receiving the echo response message from an intermediate node, wherein the information carried in the echo response message comprises the WWPN identification information of each node that the echo message passes through as it is forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of the intermediate node, and the intermediate node is a switching node in the FC network through which the echo message sent by the current node to the destination node passes;
parsing the echo response message and extracting the WWPN identification information of all of the nodes from the echo response message.
4. The method according to claim 1, characterized in that determining, from the obtained identification information of each node, whether a faulty node exists comprises:
judging whether the WWPN identification information extracted from the echo response message is the identification information of all of the nodes between the current node and the destination node;
if not, determining the type of the faulty node according to the extracted WWPN identification information; when the faulty node is a terminal device node, setting the state information of the faulty node directly to a fault state; when the faulty node is a switching node, setting the state information of the faulty node and of all leaf nodes subordinate to the faulty node to the fault state.
5. The method according to claim 2 or 3, characterized in that, after the WWPN identification information of all of the nodes is extracted from the echo response message, the method further comprises:
determining, according to the extracted WWPN identification information, the connection relationships between the normally working nodes and the state information of all the connection relationships, and generating a network topology diagram.
6. The method according to claim 1, characterized in that sending the echo message to the downstream leaf node connected to the current node comprises:
sending the echo message according to a first predetermined period;
if sending fails or the echo response message is not received within a preset duration, adjusting the first predetermined period to a second predetermined period and sending the echo message N consecutive times, wherein the value of the second predetermined period is less than that of the first predetermined period, N is a positive integer greater than 1, and the result of whether the N consecutive echo messages were sent successfully is used to determine whether to continue sending the echo message.
7. A node fault detection device, characterized by comprising:
a sending module, configured to send a loopback diagnostic (echo) message to a downstream leaf node connected to a current node, wherein the echo message is used to detect whether the link between the current node and a destination node is abnormal, and the current node and the destination node are terminal device nodes in a Fibre Channel (FC) network;
an acquisition module, configured to obtain, according to an echo response message, the identification information of each normally working node between the current node and the destination node;
a determination module, configured to determine, from the obtained identification information of each node, whether a faulty node exists.
8. The device according to claim 7, characterized in that the acquisition module comprises:
a first receiving unit, configured to receive the echo response message from the destination node, wherein the information carried in the echo response message comprises the World Wide Port Name (WWPN) identification information of each node that the echo message passes through as it is forwarded hop by hop between the current node and the destination node;
a first extraction unit, configured to parse the echo response message and extract the WWPN identification information of all of the nodes from the echo response message.
9. The device according to claim 7, characterized in that the acquisition module comprises:
a second receiving unit, configured to receive the echo response message from an intermediate node, wherein the information carried in the echo response message comprises the WWPN identification information of each node that the echo message passes through as it is forwarded hop by hop between the current node and the intermediate node, together with the WWPN identification information, collected by the intermediate node, of all normally working leaf nodes downstream of the intermediate node, and the intermediate node is a switching node in the FC network through which the echo message sent by the current node to the destination node passes;
a second extraction unit, configured to parse the echo response message and extract the WWPN identification information of all of the nodes from the echo response message.
10. The device according to claim 7, characterized in that the determination module comprises:
a judging unit, configured to judge whether the WWPN identification information extracted from the echo response message is the identification information of all of the nodes between the current node and the destination node;
a processing unit, configured to determine, when the output of the judging unit is no, the type of the faulty node according to the extracted WWPN identification information; when the faulty node is a terminal device node, to set the state information of the faulty node directly to a fault state; and when the faulty node is a switching node, to set the state information of the faulty node and of all leaf nodes subordinate to the faulty node to the fault state.
11. The device according to claim 8 or 9, characterized in that the device further comprises:
a generation module, configured to determine, according to the extracted WWPN identification information, the connection relationships between the normally working nodes and the state information of all the connection relationships, and to generate a network topology diagram.
CN201410111574.6A 2014-03-24 2014-03-24 Method and device for node fault detection Pending CN104954153A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410111574.6A CN104954153A (en) 2014-03-24 2014-03-24 Method and device for node fault detection
PCT/CN2014/083256 WO2015143810A1 (en) 2014-03-24 2014-07-29 Node fault detection method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410111574.6A CN104954153A (en) 2014-03-24 2014-03-24 Method and device for node fault detection

Publications (1)

Publication Number Publication Date
CN104954153A true CN104954153A (en) 2015-09-30

Family

ID=54168530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410111574.6A Pending CN104954153A (en) 2014-03-24 2014-03-24 Method and device for node fault detection

Country Status (2)

Country Link
CN (1) CN104954153A (en)
WO (1) WO2015143810A1 (en)

Also Published As

Publication number Publication date
WO2015143810A1 (en) 2015-10-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150930
