CN102164051A - Service-oriented fault detection and positioning method - Google Patents

Publication number
CN102164051A
Authority
CN
China
Prior art keywords
node
message
fault
service
qos
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011101294244A
Other languages
Chinese (zh)
Other versions
CN102164051B (en
Inventor
曲桦
赵季红
刘佳飞
王力
李煜伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN 201110129424 priority Critical patent/CN102164051B/en
Publication of CN102164051A publication Critical patent/CN102164051A/en
Application granted granted Critical
Publication of CN102164051B publication Critical patent/CN102164051B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a service-oriented fault detection and positioning method. The method puts forward, for the first time, the concepts of a supplemental chord and quality of service (QoS) triggering. A QoS index of a service forwarding path is monitored in real time, and the method is started when that index no longer satisfies the transmission needs of the service. The position on the path with undesired QoS is then retrieved, the type of the service fault is determined and reported to the source node, and the position of the fault is finally analyzed. The method can satisfy the fault detection and positioning needs of different types of services, reduce the network overhead caused by fault detection and positioning, determine the types of service faults, judge their causes, and determine the specific positions of the faults.

Description

Service-oriented fault detection and localization method
Technical field
The present invention relates to fault detection and localization in IP networks, and in particular provides a technical scheme for service-oriented fault detection and localization in an IP network.
Background
With the development of communication technology and rising user expectations for service QoS (Quality of Service), services have become the main driver of communication network development. A service here means a specific transmission request that a user makes to the network: the network must establish a transmission link that satisfies the user's request and carry the user-initiated service. Against this background new services keep emerging, and common ones such as VoIP, IPTV and 3G services are all built on IP technology. However, unexpected events during service transmission can severely degrade the QoS of the services carried in the network until the user's service experience can no longer be satisfied; that is, a service fault occurs. Service faults include service interruptions caused by misconfigured parameters in software or hardware, and service congestion caused by, for example, an inappropriate routing policy. If service faults are not handled promptly and efficiently, they become a serious obstacle to improving user experience and inevitably lower user satisfaction. To ensure reliable service transmission, operators need to introduce a service-oriented fault detection and localization method into IP networks so that the network has good fault detection and localization capability. This helps service faults recover quickly, which improves network survivability, service quality and user satisfaction.
Fault detection and fault localization in today's IP networks are implemented by the Hello mechanism and the loopback mechanism respectively. However, their detection times are both on the order of seconds, and neither supports service-oriented fault detection and localization, so both have limitations.
IP networks currently use the following two fault detection mechanisms:
The first is the slow Hello mechanism. In IP routing, without hardware assistance, detection with this mechanism takes a long time. For example, OSPF (Open Shortest Path First) Hello and IS-IS (Intermediate System to Intermediate System) Hello need detection times on the order of seconds, and RSVP (Resource Reservation Protocol) Hello even needs on the order of 10 seconds. The slow Hello mechanism is mainly used for traditional Internet services such as WWW, FTP and email; for newer real-time services such as online gaming, VoIP, IPTV and 3G, such detection times are intolerable to users.
The second is the fast Hello mechanism. To address the shortcomings of traditional networks in fault detection, the IETF proposed Bidirectional Forwarding Detection (BFD), a lightweight, millisecond-level, independent Hello detection mechanism for detecting channel faults between adjacent nodes. These faults include interface faults, data link faults and faults of the node itself. BFD can perform real-time detection over any medium and at any protocol layer, and its detection time and overhead ranges are specified broadly enough to be adjusted dynamically to actual conditions. Specifically, given a destination address and other parameters, BFD can create, delete and modify a BFD session, detect faults and obtain fault information. BFD has two operating modes plus an Echo function, which can reduce part of the network overhead, but it still has limitations.
Both fault detection methods have the following deficiencies: (a) adjacent systems periodically send probe messages to each other, so fast detection of sub-second intermittent faults is not possible; (b) because the detection runs between neighbor nodes, it causes fairly serious network jitter; (c) faults are detected by sending large numbers of messages, so in a harsh network environment the load they add cannot be ignored; (d) although the existing methods maintain network connectivity well, they do not support service-oriented fault detection.
Fault localization in IP networks mainly uses the loopback method:
When the two end nodes detect a fault on the path, one of them sends a loopback message along the path and thereby becomes the source end node of that message. Each node on the path returns the message to the source end node after receiving it. The source end node determines where the path fault lies according to which nodes on the path return the loopback message.
This fault localization method has the following deficiencies: (a) it searches for the fault position step by step in a recursive manner, but only in one direction, so the localization information from the two sides of the fault cannot be combined and the method cannot tell whether a node or a link has failed; (b) it has some localization capability but does not support service-oriented fault localization.
Summary of the invention
The main purpose of the present invention is to provide a service-oriented fault detection and localization method that is triggered by service QoS, so that only a few probe messages are needed to determine whether a service fault is congestion or interruption. If it is congestion, part of the service flow is diverted; if it is an interruption, the method determines whether the interruption is caused by a link fault or a node fault and determines the specific position of the fault.
To achieve this purpose, the technical scheme of the present invention is as follows:
Overall, the invention provides a service-oriented fault detection and localization method comprising the following steps:
[301] Establish an end-to-end service forwarding path and supplement a "chord". Supplementing a "chord" means that, after the optimal service forwarding path from the source node to the destination node has been established, the set of other paths that can carry data between the source node and the destination node is found; this set is the supplemented "chord". To guarantee the connectivity of the "chord", multi-hop keep-alive messages are run on it. The destination node then monitors the service QoS in real time and feeds a sudden QoS drop back to the source node, which triggers the detection and localization method;
[302] The source node searches toward downstream nodes, and at the same time the destination node searches toward upstream nodes, for the last node on the path that messages can reach;
[303] Each such last node periodically sends confirmation messages to its unreachable neighbor node to determine whether the fault is service congestion or service interruption;
[304] The last node generates a service interruption or congestion alarm and sends it to the source node;
[305] The last node generates a service fault location message and sends it to the source node, and the source node derives the cause and position of the service fault by analysis.
Both the fault search message and the fault confirmation message carry service data of the service that triggered the fault detection and localization method; the response messages sent back must strip this service data to reduce the transmission load on the network.
The specific implementation of this technical scheme is as follows:
[301] After the source node receives a request for a certain class of service from the destination node, it combines the network parameters with the QoS request of the service and establishes an optimal service forwarding path from the source node to the destination node. At the same time it supplements a "chord" between the source node and the destination node, where the supplemented "chord" is the set of reachable transmission paths between the two nodes other than the working path. The destination node is then responsible for monitoring the QoS indices of the service in real time. When QoS performance drops suddenly, the destination node generates a QoS trigger message and feeds it back to the source node over the previously established "chord", which triggers the fault detection and localization method.
The services include broadband Internet access, IPTV, VoIP, online gaming, 3G services and so on. The techniques for establishing the optimal forwarding path may differ between services, but that is not the focus of the present invention.
The service QoS indices include bandwidth, delay, jitter and packet loss rate.
The QoS indices of a given class of service are monitored as follows. First, the source node divides service data into grades according to QoS, then marks their priority with one of the 64 DSCP (Differentiated Services Code Point) values encoded in six bits of the ToS (Type of Service) field of the IPv4/IPv6 header. If every node in the network topology is configured to classify data by DSCP mark, the nodes can easily identify packets from the source node and monitor in real time the QoS of the services classified and marked at that node. For example, when several end-to-end service forwarding paths cross or overlap at some node, that node can tell the forwarding paths apart by the DSCP value, source IP address and destination IP address in each packet, and monitor the QoS parameters of the different service data.
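As a minimal sketch of the classification step, the following Python reads the DSCP from a ToS byte and builds the (source address, destination address, DSCP) key by which a node could tell service flows apart. It assumes the RFC 2474 layout, in which the DSCP occupies the six high-order bits of the byte; the function names and the sample addresses are hypothetical and do not come from the patent.

```python
def dscp_from_tos(tos_byte: int) -> int:
    """Return the 6-bit DSCP held in the high-order bits of the ToS byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("ToS byte must fit in 8 bits")
    return tos_byte >> 2  # drop the two low-order (ECN) bits


def flow_key(src_ip: str, dst_ip: str, tos_byte: int) -> tuple:
    """Key used to separate service flows that cross at the same node."""
    return (src_ip, dst_ip, dscp_from_tos(tos_byte))


# Two flows between the same endpoints are still distinguished by DSCP.
voice = flow_key("10.0.0.1", "10.0.0.6", 0xB8)  # ToS 0xB8 carries DSCP 46
bulk = flow_key("10.0.0.1", "10.0.0.6", 0x00)   # ToS 0x00 carries DSCP 0
```

A node could keep one set of delay, jitter and loss counters per such key, which is one way to realize the per-service monitoring described above.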
The QoS trigger message carries QoS diagnostic information, namely which one or more service QoS indices have deteriorated suddenly, for example a sudden increase in the packet loss rate of the service.
The process of supplementing the "chord" is as follows. After combining the network parameters with the QoS of the service, the source node establishes the optimal service forwarding path for the service, denoted Path. The set of all paths that can carry data between the source node and the destination node is denoted A, and the set of all paths in A other than Path is called the "chord" and denoted B; different paths in B may of course overlap to varying degrees. Path is an element of A and B is a subset of A. They are related as follows:
$$\mathrm{Path} \in A, \qquad B \subset A, \qquad B = A \setminus \{\mathrm{Path}\}$$
On "chord" B, the source node periodically sends multi-hop keep-alive messages to the destination node, or the destination node sends them to the source node. As long as the response message from the peer is received, the forwarding path between the source node and the destination node is connected in real time and "chord" B can be considered to be working normally; the intermediate nodes traversed during forwarding need not be considered. All paths in "chord" B are then collectively called keep-alive paths.
[302] The source node generates a fault search message and sends it hop by hop to each downstream node in order to find the last node on the forwarding path that the search message can reach. When a downstream node receives the search message, it replies to the previous-hop node with a response message, encapsulates in the search message whether this node is available, and sends the message on to its next-hop node. This process repeats until some node fails to receive, within the search time, a response message from its next-hop node. Meanwhile the destination node also generates a fault search message and sends it hop by hop to each upstream node; the procedure is similar to that of the source node.
The source node and the destination node each generate one fault search message, forwarded along the service forwarding path and along its reverse path respectively.
The nodes that do not receive a response message are called the last nodes reachable by the search message, denoted $U_E$ (Upstream End node, the upstream reachable end node) and $D_E$ (Downstream End node, the downstream reachable end node).
The search time is $(1 + 0.1 \times N_1) \times t_{ETI}$, where $t_{ETI}$ is the interval between a node sending the search message to its neighbor node and receiving the response message that the neighbor replies, and $N_1$ is any integer greater than or equal to 0. $N_1$ is chosen by weighing the QoS diagnostic information carried in the QoS trigger message; for example, the stricter the delay requirement of the service, the smaller $N_1$.
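The search stage can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the hop-by-hop exchange is reduced to a callback `responds(node)` that says whether a node answered within the search time, and the names `search_timeout`, `last_reachable` and the example path are hypothetical.

```python
def search_timeout(t_eti: float, n1: int) -> float:
    """Search time (1 + 0.1 * N1) * t_ETI, in the same unit as t_eti."""
    if n1 < 0:
        raise ValueError("N1 must be an integer >= 0")
    return (1 + 0.1 * n1) * t_eti


def last_reachable(path, responds):
    """Walk the path hop by hop and return the last node whose next hop
    did not answer; if every hop answers, the final node is returned."""
    current = path[0]
    for next_hop in path[1:]:
        if not responds(next_hop):
            return current
        current = next_hop
    return current


# Example: working path A..F with node D silent.
path = ["A", "B", "C", "D", "E", "F"]
silent = {"D"}
u_e = last_reachable(path, lambda n: n not in silent)        # from the source
d_e = last_reachable(path[::-1], lambda n: n not in silent)  # from the destination
```

Running the same walk from both ends brackets the fault: here `u_e` is C and `d_e` is E, the two nodes on either side of the silent node.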
[303] Node $U_E$ generates a fault confirmation message and periodically sends it to its unreachable next-hop node. A next-hop node that receives a confirmation message immediately replies to $U_E$ with a response message. If $U_E$ receives any response message within the fault confirmation time, the drop in service QoS is caused by service congestion, i.e. the service fault is service congestion. Otherwise, if the reachable end node receives no response message within the confirmation time, the drop in service QoS is considered to be caused by a service interruption, i.e. the service fault is a service interruption. Node $D_E$ operates in the same way on the reverse path of the service forwarding path.
The confirmation time is $N_2 \times t_{ETI}$. $N_2$ is chosen according to the QoS grade expected for this class of service; for example, the stricter the delay requirement of the service, the smaller $N_2$.
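The confirmation rule reduces to a single comparison against the confirmation time. The sketch below assumes that response arrival times are measured from the start of the confirmation stage and uses hypothetical names (`classify_fault`, `reply_times`); it only illustrates the decision, not the periodic sending itself.

```python
def confirm_timeout(t_eti: float, n2: int) -> float:
    """Confirmation time N2 * t_ETI, in the same unit as t_eti."""
    return n2 * t_eti


def classify_fault(reply_times, t_eti: float, n2: int) -> str:
    """Return 'congestion' if any response arrived within the confirmation
    time, otherwise 'interruption'."""
    deadline = confirm_timeout(t_eti, n2)
    if any(t <= deadline for t in reply_times):
        return "congestion"
    return "interruption"
```

A late or missing reply is treated the same way: with `t_eti = 0.1` and `n2 = 3`, a single reply after 0.25 s indicates congestion, while a reply after 0.5 s or no reply at all indicates interruption.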
Causes of service congestion include bursts from several links toward one link, transmission from a high-speed link onto a low-speed link, non-critical services crowding out critical services, and excessive service traffic.
A service interruption is caused by a node fault or a link fault.
After the above stages, the reachable end nodes are renamed fault locating nodes. They lie on either side of the fault position and are adjacent to it. The one upstream of the fault position is called the upstream fault locating node, and the one downstream is called the downstream fault locating node.
[304] The upstream fault locating node generates a service fault alarm message and sends it to the source node; the alarm carries the information of service congestion or service interruption. On receiving the alarm message from the upstream fault locating node, the source node knows the type of service fault. The downstream fault locating node does nothing at this stage.
[305] After the type of service fault has been confirmed, the upstream and downstream fault locating nodes each generate a fault location message and send it to the source node. Along the way, each normal node that the location message passes encapsulates its node ID (Identification Code) in the message. After receiving the location messages from both directions, the source node determines the position of the service fault by the corresponding computation.
The location message generated by the upstream fault locating node is sent to the source node along the reverse of the service forwarding path. The location message generated by the downstream fault locating node is sent to the destination node along the direction of the service forwarding path, and is then sent by the destination node to the source node along the previously established "chord".
This technical scheme further comprises:
The source node compares the location information from both directions with the explicit route of the carried service that it stores. If the information of some nodes is missing from the location information, the service congestion or interruption is considered to have occurred at those nodes; if the node information is complete, the congestion or interruption is considered to have occurred on the link between the upstream and downstream fault locating nodes. In other words, when the location information shows that the upstream and downstream fault locating nodes are neighbors, the service fault was caused by a fault on the link between them; when it shows that they are not neighbors, the service fault was caused by a fault of the node or nodes between them, and XORing the location information with the stored explicit route yields the IDs of the faulty nodes.
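The comparison at the source node can be sketched with set operations, where the symmetric difference plays the role of the XOR. The sketch assumes that each location message is available as a list of node IDs whose first entry is the fault locating node that generated it, and that only IDs of nodes on the working path are reported; `locate_fault` and the single-letter IDs are hypothetical.

```python
def locate_fault(explicit_route, upstream_ids, downstream_ids):
    """Compare reported node IDs with the stored explicit route.

    Returns ("node", [ids]) when nodes are missing from the reports, or
    ("link", (u, d)) when every node reported and the fault therefore lies
    on the link between the two fault locating nodes.
    """
    reported = set(upstream_ids) | set(downstream_ids)
    diff = set(explicit_route) ^ reported  # symmetric difference = set XOR
    missing = [node for node in explicit_route if node in diff]
    if missing:
        return ("node", missing)
    return ("link", (upstream_ids[0], downstream_ids[0]))


route = ["A", "B", "C", "D", "E", "F"]
node_case = locate_fault(route, ["C", "B", "A"], ["E", "F"])
link_case = locate_fault(route, ["D", "C", "B", "A"], ["E", "F"])
```

In the first call node D never reports, so it is returned as the faulty node; in the second call every node reports, so the fault is placed on the link between D and E.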
Description of the drawings
Fig. 1 shows the data structure of the service-oriented fault detection and localization message;
Fig. 2 shows the operating interval of each stage of the service-oriented fault detection and localization method and the sending direction of the corresponding messages;
Fig. 3 is the flow chart of the service-oriented fault detection and localization method;
Fig. 4 is the network topology used in the specific embodiment;
Fig. 5 shows the established service forwarding path, the selected "chord", and three service fault scenarios;
Fig. 6 is a schematic diagram of the system structure for service-oriented fault detection and localization.
Embodiment
To make the purpose, technical scheme and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
The invention is applicable to detecting a sudden fault of a service carried on a service path. It can distinguish whether the service fault is congestion or interruption, diagnose the cause of the fault, and locate where the fault occurred.
Fig. 1 is a schematic diagram of the data structure of the service-oriented fault detection and localization message. The fields relevant to the present invention are as follows:
The Vers field (Version) indicates the version of the message and is 3 bits long. The message is currently version 1, so this field is set to 001.
The M field (Multi-Hop-Alive) is 1 bit long and is set when the source node starts transmitting the service to the destination node. When this field is 1 the message is a multi-hop keep-alive message, which is sent periodically and only on the "chord" and is responsible for the real-time connectivity of the "chord".
The Q field (QoS Trigger) is 1 bit long and is set when the destination node detects a sudden drop in the QoS performance of a class of service. When this field is 1 the message is a QoS trigger message, which marks entry into the QoS trigger stage and starts the service-oriented fault detection and localization method.
The S field (Search) is 1 bit long and is set when the source node receives the QoS trigger message fed back by the destination node. When this field is 1 the message is a fault search message, which marks entry into the fault search stage and is used to find the reachable end nodes.
The C field (Confirm) is 1 bit long and is set when a reachable end node has been found. When this field is 1 the message is a fault confirmation message, which marks entry into the fault confirmation stage and is used to confirm the type of service fault.
The A field (Alarm) is 1 bit long and is set when the service fault type has been confirmed. When this field is 1 the message is a fault alarm message, which marks entry into the fault alarm stage; the upstream fault locating node sends the fault alarm to the source node to inform it of the type of service fault.
The L field (Location) is 1 bit long and is set together with the A field when the service fault type has been confirmed. When this field is 1 the message is a fault location message, which marks entry into the fault location stage; the upstream and downstream fault locating nodes each send their fault location information to the source node, informing it of the IDs of the upstream and downstream fault locating nodes and of the normally working nodes on the service forwarding path.
The E field (Echo) is 1 bit long and is set when a response is required. A value of 1 means that a response message must be returned to the message sender; a value of 0 means that no response is needed.
The QD field (QoS Alarm Diagnostic) is 4 bits long and is used together with the Q field to indicate which service QoS index or indices dropped suddenly. From the low-order bit to the high-order bit the bits stand for delay, packet loss, packet jitter and bandwidth: 0001 means delay, 0010 packet loss, 0100 packet jitter and 1000 bandwidth. When several indices drop suddenly, the corresponding bits are all set; for example, 0111 means that delay, packet loss and packet jitter deteriorated at the same time.
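The QD encoding is a plain 4-bit mask, which the following Python sketch encodes and decodes. The bit assignments are the ones given above; the names `QD_BITS`, `encode_qd` and `decode_qd` are hypothetical.

```python
# Bit assignments of the QD field, low-order bit first.
QD_BITS = {
    0b0001: "delay",
    0b0010: "packet loss",
    0b0100: "jitter",
    0b1000: "bandwidth",
}


def encode_qd(indices) -> int:
    """Set one bit per degraded QoS index."""
    by_name = {name: bit for bit, name in QD_BITS.items()}
    value = 0
    for name in indices:
        value |= by_name[name]
    return value


def decode_qd(value: int) -> list:
    """List the degraded QoS indices, low-order bit first."""
    return [name for bit, name in QD_BITS.items() if value & bit]
```

For example, `decode_qd(0b0111)` lists delay, packet loss and jitter, matching the 0111 case described above.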
The C/I field (Service Congestion/Interruption) is 2 bits long and is used together with the Q field to give the type of service fault. 01 means service congestion, 10 means service interruption, and 00 and 11 mean that the service has not failed.
The N field is 8 bits long. Used with the S field it selects the fault search time, as a multiple applied to $t_{ETI}$; used with the C field it selects the fault confirmation time, $N_2 \times t_{ETI}$. The value of $N_2$ is chosen according to the QoS grade expected for the service.
The Length field indicates the length of the fault detection and localization message and is 8 bits long.
The ID field (Session Identification Code) is 32 bits long and holds a unique non-zero value generated by the sender to identify different sessions.
The ETI field (Echo Receive Time Interval) is 32 bits long and holds the interval between the sender sending this message to its neighbor node and receiving the response message that the neighbor replies.
The LDI field (Location Diagnostic Information) is 32 bits long and stores the IDs of the upstream and downstream fault locating nodes and the ID of each normal node that the location message passes.
The SDP field (Service Data Patch) stores the service data carried when search and confirmation messages are sent; its length depends on the service.
The Reserve field is a reserved field, 32 bits long.
If all of the above fields are set to 0, the current message is invalid or no operation is required.
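The field widths listed above add up neatly: Vers (3), the seven 1-bit flags, QD (4), C/I (2), N (8) and Length (8) fill exactly one 32-bit word, followed by the 32-bit ID and ETI fields. The Python sketch below packs and unpacks these three words. The bit order within the first word follows the order in which the fields are listed and is an assumption, since Fig. 1 is not reproduced here; the LDI, SDP and Reserve fields are left out for brevity, and all function names are hypothetical.

```python
import struct

FLAGS = ("M", "Q", "S", "C", "A", "L", "E")  # assumed order, M in the highest bit


def pack_header(vers, flags, qd, ci, n, length, session_id, eti) -> bytes:
    """Pack the first three 32-bit words in network byte order."""
    word = vers & 0x7
    for name in FLAGS:
        word = (word << 1) | (1 if name in flags else 0)
    word = (word << 4) | (qd & 0xF)
    word = (word << 2) | (ci & 0x3)
    word = (word << 8) | (n & 0xFF)
    word = (word << 8) | (length & 0xFF)
    return struct.pack("!III", word, session_id, eti)


def unpack_header(data: bytes) -> dict:
    """Inverse of pack_header for the first 12 bytes of a message."""
    word, session_id, eti = struct.unpack("!III", data[:12])
    flag_bits = (word >> 22) & 0x7F
    return {
        "vers": word >> 29,
        "flags": {name for i, name in enumerate(FLAGS) if flag_bits & (1 << (6 - i))},
        "qd": (word >> 18) & 0xF,
        "ci": (word >> 16) & 0x3,
        "n": (word >> 8) & 0xFF,
        "length": word & 0xFF,
        "id": session_id,
        "eti": eti,
    }


# A QoS trigger message (Q set, response requested) reporting packet loss.
raw = pack_header(1, {"Q", "E"}, 0b0010, 0, 3, 12, 0xABCD, 5000)
header = unpack_header(raw)
```

Keeping the flags in one word means a node can tell the message type with a single mask, which suits the stage-by-stage processing described for the S, C, A and L fields.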
Fig. 2 shows the operating interval of each stage of the service-oriented fault detection and localization method and the sending direction of the corresponding messages. The source node is denoted S, and is denoted S` when it is reached over the supplemented "chord"; the destination node is D; the upstream reachable end node is $U_E$ and the downstream reachable end node is $D_E$. The sending directions of the messages are as follows:
QoS trigger stage: the operating interval is D → S`. The QoS trigger message is sent by D to S` along the previously established "chord".
Fault search stage: the operating intervals are D → $D_E$ and S → $U_E$. The fault search message generated by D is forwarded hop by hop toward $D_E$ along the reverse of the service forwarding path, and the one generated by S is forwarded hop by hop toward $U_E$ along the service forwarding path.
Fault confirmation stage: the operating interval is $D_E$ → $U_E$ or $U_E$ → $D_E$. The fault confirmation message generated by $U_E$ is sent periodically to its next-hop node along the service forwarding path, and the one generated by $D_E$ is sent periodically to its next-hop node along the reverse of the service forwarding path.
Fault alarm stage: the operating interval is $U_E$ → S. The fault alarm message generated by $U_E$ is forwarded hop by hop to S along the reverse of the service forwarding path.
Fault location stage: the operating intervals are $U_E$ → S and $D_E$ → D → S`. The fault location message generated by $U_E$ is forwarded hop by hop to S along the reverse of the service forwarding path. The one generated by $D_E$ is first forwarded hop by hop to D along the service forwarding path and then sent by D to S` along the previously established "chord".
The technical scheme of the present invention is described below through a specific embodiment. Fig. 3 is the flow chart of the service-oriented fault detection and localization method, which comprises:
Step [301]: establish the end-to-end service forwarding path and supplement the "chord"; the destination node monitors the service QoS in real time and feeds a sudden QoS drop back to the source node, triggering the detection and localization method;
In Fig. 4, node F issues a request for a certain class of service. If node A can provide that service, node A responds to the request and, after combining the network parameters with the QoS parameters of the request, establishes the optimal end-to-end service forwarding path from source node A to destination node F, A → B → C → D → E → F, and starts transmitting the service. Node A is then the source node and node F the destination node of this service transmission. All paths from node A to node F other than Path serve as the "chord", and multi-hop keep-alive messages run on it. Fig. 4(a) is a mesh topology, in which no links need to be added when choosing the "chord". Fig. 4(b) is a tree topology, in which links must be added when choosing the "chord": after links D-M and I-F are added in Fig. 4(b), the "chord" from node F to node A has several keep-alive paths.
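Choosing the "chord" amounts to enumerating the paths between source and destination and removing the working path. The Python sketch below does this with a depth-first search over a small undirected graph. The topology is a hypothetical example, not the one in Fig. 4, and the function names are assumptions.

```python
def simple_paths(graph, src, dst, prefix=None):
    """Enumerate all loop-free paths from src to dst by depth-first search."""
    prefix = (prefix or []) + [src]
    if src == dst:
        return [prefix]
    paths = []
    for neighbor in graph.get(src, ()):
        if neighbor not in prefix:  # never revisit a node on this path
            paths.extend(simple_paths(graph, neighbor, dst, prefix))
    return paths


def chord(graph, src, dst, working_path):
    """Set B: every path in A (all src-to-dst paths) except the working path."""
    all_paths = simple_paths(graph, src, dst)
    return [p for p in all_paths if p != list(working_path)]


# Hypothetical topology, given as an adjacency list.
graph = {
    "A": ["B", "G"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "G": ["A", "B", "F"],
    "F": ["C", "G"],
}
working = ["A", "B", "C", "F"]
backup_paths = chord(graph, "A", "F", working)
```

In this example four loop-free paths connect A and F, so the "chord" holds the three that are not the working path; in a tree topology the same enumeration would return an empty list until extra links are added.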
Fig. 5(a) shows the established service forwarding path Path and the "chord". Destination node F is responsible for monitoring the QoS indices of Path. When node F observes that one or more of its QoS indices no longer meet the QoS requirements of the service, for example when the packet loss rate of a VoIP service exceeds 5%, the jitter exceeds 60 ms or the delay exceeds 400 ms, node F generates a QoS trigger message and sends it along the "chord" to node A, which then triggers the fault detection and localization method.
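The trigger condition at the destination node can be sketched as a threshold check that also produces the QD diagnostic bits. The thresholds are the VoIP figures quoted above (5% loss, 60 ms jitter, 400 ms delay) and the bit positions follow the QD field definition; the function name and argument names are hypothetical, and how the measurements are obtained is outside this sketch.

```python
# VoIP limits quoted in the embodiment.
MAX_LOSS = 0.05       # packet loss rate
MAX_JITTER_MS = 60
MAX_DELAY_MS = 400


def qos_trigger(loss: float, jitter_ms: float, delay_ms: float) -> int:
    """Return the QD bits for the violated indices; 0 means no trigger."""
    qd = 0
    if delay_ms > MAX_DELAY_MS:
        qd |= 0b0001  # delay
    if loss > MAX_LOSS:
        qd |= 0b0010  # packet loss
    if jitter_ms > MAX_JITTER_MS:
        qd |= 0b0100  # jitter
    return qd
```

A non-zero result is the point at which the destination node would build a QoS trigger message (Q field set) and send it over the "chord".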
Step [302]: the source node searches toward downstream nodes, and at the same time the destination node searches toward upstream nodes, for the last node on the path that messages can reach;
In Fig. 5(a), node A generates a fault search message carrying service data and sends it to its next-hop node B. Suppose link A-B and node B are both normal: node B replies to node A with a response message immediately after receiving the search message. If node A receives node B's response message within the search time $(1 + 0.1 \times N_1) \times t_{ETI}$, the supposition holds, i.e. link A-B and node B are both normal and node B is reachable. Node B in turn sends the search message on to its next-hop node C, and subsequent nodes are handled in the same way. When some node does not receive the response message from its next-hop node within the search time $(1 + 0.1 \times N_1) \times t_{ETI}$, the search message cannot reach that next-hop node, and this node is taken to be the upstream reachable end node. For example, if node $U_E$ in Fig. 2 does not receive the response message from its next-hop node within $(1 + 0.1 \times N_1) \times t_{ETI}$, node $U_E$ is taken to be the upstream reachable end node.
Node F also generates the same kind of fault search message and sends it hop by hop to each upstream node; the operating principle is the same. For example, if node $D_E$ in Fig. 2 does not receive the response message from its previous-hop node within $(1 + 0.1 \times N_1) \times t_{ETI}$, node $D_E$ is taken to be the downstream reachable end node.
In summary, a transmission fault is considered to have occurred on the path between node $U_E$ and node $D_E$.
Wherein, node U EWith node D ECan be a pair of neighbor node, as the relation of node D among Fig. 5 (b) and node E; Perhaps, node U EWith node D EAlso can not be a pair of neighbor node, they have a same neighbor node, and they have a same neighbor node D as the node C among Fig. 5 (c) and node E; Perhaps node U EWith node D ECan not be a pair of neighbor node also, occur a plurality of nodes or link failure between them, as Node B among Fig. 5 (d) and node E, the neighbor node of Node B be that the neighbor node of node C, node E is node D.Just introduced the situation of three kinds of faults so: link failure, node failure, multiple faults.
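The two-sided search and the three situations of Fig. 5 can be sketched as a small simulation. This is an illustrative model, not the patent's implementation: it assumes the source and destination themselves are healthy, treats a hop as successful only when both the link and the receiving node are healthy, and uses hypothetical names (find_end_nodes, classify_gap).

```python
def find_end_nodes(path, failed_nodes=(), failed_links=()):
    """Return (U_E, D_E): the last node reached from the source side and
    the last node reached from the destination side."""
    bad_nodes = set(failed_nodes)
    bad_links = {frozenset(link) for link in failed_links}

    def hop_ok(sender, receiver):
        # A response comes back only if the link and the receiver both work.
        return (frozenset((sender, receiver)) not in bad_links
                and receiver not in bad_nodes)

    u = 0  # search from the source toward the downstream nodes
    while u + 1 < len(path) and hop_ok(path[u], path[u + 1]):
        u += 1
    d = len(path) - 1  # search from the destination toward the upstream nodes
    while d > 0 and hop_ok(path[d], path[d - 1]):
        d -= 1
    return path[u], path[d]


def classify_gap(path, u_e, d_e):
    """Map the distance between U_E and D_E to the three situations."""
    gap = path.index(d_e) - path.index(u_e)
    if gap <= 0:
        return "no fault found"
    if gap == 1:
        return "link failure"
    if gap == 2:
        return "node failure"
    return "multiple faults"
```

For the path A-B-C-D-E-F, a failed link D-E gives (D, E) as in Fig. 5(b), a failed node D gives (C, E) as in Fig. 5(c), and failed nodes C and D give (B, E) as in Fig. 5(d).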
Step [303]: each of the above last reachable nodes periodically sends confirmation messages to its unreachable neighbor node to determine whether the fault is service congestion or service interruption;
In Fig. 2, node U_E periodically sends fault confirmation messages to its next-hop node. If node U_E receives any response message from its next-hop node within N_2 × t_ETI, the type of service fault between U_E and its next-hop node is considered to be service congestion; if node U_E receives no response message from its next-hop node within N_2 × t_ETI, the type of service fault between U_E and its next-hop node is considered to be service interruption. Node D_E periodically sends fault confirmation messages to its previous-hop node. If node D_E receives any response message from its previous-hop node within N_2 × t_ETI, the type of service fault between D_E and its previous-hop node is considered to be service congestion; if node D_E receives no response message from its previous-hop node within N_2 × t_ETI, the type of service fault between D_E and its previous-hop node is considered to be service interruption. From this point on, node U_E and node D_E are referred to as the upstream and downstream fault location nodes, respectively.
In Fig. 5(b), node D periodically sends fault confirmation messages to its next-hop node E. If node D receives any response message within N_2 × t_ETI, service congestion has occurred between node D and node E; otherwise, it is a service interruption. Node E periodically sends fault confirmation messages to its previous-hop node D. If node E receives any response message within N_2 × t_ETI, service congestion has occurred between node E and node D; otherwise, it is a service interruption.
In Fig. 5(c), node C periodically sends fault confirmation messages to its next-hop node D. If node C receives any response message within N_2 × t_ETI, service congestion has occurred between node C and node D; otherwise, it is a service interruption. Node E periodically sends fault confirmation messages to its previous-hop node D. If node E receives any response message within N_2 × t_ETI, service congestion has occurred between node E and node D; otherwise, it is a service interruption.
In Fig. 5(d), node B periodically sends fault confirmation messages to its next-hop node C. If node B receives any response message within N_2 × t_ETI, service congestion has occurred between node B and node C; otherwise, it is a service interruption. Node E periodically sends fault confirmation messages to its previous-hop node D. If node E receives any response message within N_2 × t_ETI, service congestion has occurred between node E and node D; otherwise, it is a service interruption.
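The decision rule of the confirmation stage reduces to one question: did any response arrive inside the window N_2 × t_ETI? A minimal sketch, assuming response arrival times are measured in milliseconds from the start of the confirmation stage and using the hypothetical name classify_fault:

```python
def classify_fault(response_times_ms, n2, t_eti_ms):
    """Return "congestion" if at least one response arrived within the
    confirmation window N_2 * t_ETI, otherwise "interruption"."""
    window = n2 * t_eti_ms
    for arrival in response_times_ms:
        if 0 <= arrival <= window:
            # The neighbor is still alive, just slow: congestion.
            return "congestion"
    # Nothing came back in time: the neighbor or link is down.
    return "interruption"
```

With N_2 = 3 and t_ETI = 10 ms, a single response at 25 ms is enough to classify the fault as congestion, whereas a response arriving at 31 ms falls outside the window and the fault is classified as an interruption.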
Step [304]: the above last reachable node generates a service interruption or congestion alarm message and sends it to the source node;
In Fig. 2, node U_E generates a fault alarm message and sends it to source node S along the path U_E → S.
In Fig. 5(b), (c), and (d), the fault alarm message is generated by node D, node C, and node B, respectively, and is sent to source node A along the paths D → C → B → A, C → B → A, and B → A, respectively. After node A receives the alarm message, it parses the message to learn the type of service fault.
Step [305]: the above last reachable nodes generate service fault location messages and send them to the source node, and the source node determines the cause and position of the service fault by analysis.
In Fig. 2, the fault location message generated by node U_E is sent to source node S along the path U_E → S, and the ID of every normal node it passes between node U_E and node S is encapsulated in the message. The fault location message generated by node D_E is sent to source node S along the path D_E → D → S`, and the ID of every normal node it passes between node D_E and node S` is encapsulated in the message.
In Fig. 5(b), the fault location message generated by node D is sent to node A along D → C → B → A, and the transit nodes C and B each encapsulate their own ID in the message. The fault location message generated by node E is sent to node A along E → F → A`, and transit node F encapsulates its own ID in the message. The location information that node A receives is: node D is the upstream fault location node, node E is the downstream fault location node, and nodes C, B, and F are all normal. Node A can therefore conclude that the service fault lies on link D-E.
In Fig. 5(c), the fault location message generated by node C is sent to node A along C → B → A, and transit node B encapsulates its own ID in the message. The fault location message generated by node E is sent to node A along E → F → A`, and transit node F encapsulates its own ID in the message. The location information that node A receives is: node C is the upstream fault location node, node E is the downstream fault location node, and nodes B and F are both normal. Node A can therefore conclude that the service fault lies between C and E on the path; an exclusive-OR (XOR) operation between these node IDs and the explicit route shows that node D has failed.
In Fig. 5(d), the fault location message generated by node B is sent to node A along B → A. The fault location message generated by node E is sent to node A along E → F → A`, and transit node F encapsulates its own ID in the message. The location information that node A receives is: node B is the upstream fault location node, node E is the downstream fault location node, and node F is normal. Node A can therefore conclude that the service fault lies between B and E on the path; an XOR operation between these node IDs and the explicit route shows that either nodes C and D, or link B-C and node D, or node C and link D-E have failed.
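The comparison against the explicit route can be sketched with a set symmetric difference, which plays the role of the XOR operation on node IDs. This is an illustration under stated assumptions: each location message is modeled as a list of node IDs that begins with the fault location node, the source counts itself as normal, and the name locate_fault is hypothetical.

```python
def locate_fault(explicit_route, upstream_ids, downstream_ids):
    """Return (U_E, D_E, suspects), where suspects are the route nodes
    that appear in neither location message."""
    source = explicit_route[0]
    reported = {source} | set(upstream_ids) | set(downstream_ids)
    if not reported <= set(explicit_route):
        raise ValueError("location message contains an ID outside the route")
    # XOR of the two ID sets: IDs present in exactly one of them.
    missing = set(explicit_route) ^ reported
    suspects = [node for node in explicit_route if node in missing]
    return upstream_ids[0], downstream_ids[0], suspects
```

For the route A-B-C-D-E-F, the three cases of Fig. 5 give an empty suspect list with U_E = D and D_E = E (fault on link D-E), the single suspect D, and the two suspects C and D. In the last case the result only bounds the fault: as the text notes, the source cannot tell from the IDs alone whether both nodes or an adjacent link has failed.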
To implement the above method, the present invention also provides a service-oriented fault detection and positioning system; Fig. 6 is a schematic structural diagram of each node in the path.
Each node comprises a message generation module 601, a message receiving and processing module 602, and a packet forwarding module 603. Because network faults occur at random, different failed nodes or links in the network produce different upstream and downstream reachable end nodes, so the type of message generated by the internal modules of each node differs under different conditions. The function of each module is as follows:
The message generation module 601 is used to generate the service-oriented fault detection and positioning messages and to process the relevant information handed over by the message receiving and processing module 602. Module 601 modifies the corresponding field in each stage of the method: field Q is set when QoS fails to meet the requirements, field S is set in the retrieval stage, field C in the confirmation stage, field A in the alarm stage, field L in the location stage, and so on. After module 601 has completed its own task, it hands the message over to the packet forwarding module 603;
The message receiving and processing module 602 is used to receive the message sent by module 603 of the sending node and to process it. It then hands the message information that needs to be changed, such as fields Q, S, C, A, L, and E, over to the message generation module 601, and hands the remaining message information over to the packet forwarding module 603;
The packet forwarding module 603 is used to forward messages to module 602 of the next node and to send response messages to module 602 of the previous node.
The message generation module 601 is further used to generate, at the source node, the multi-hop keep-alive message that runs on the "chord" when the source node begins transmitting the service to the destination node; to generate, at the destination node, the QoS alarm message when QoS performance degrades sharply; to set field S, field C, field A, and field L in the different processing stages; and to set field E, which indicates whether the message receiver needs to reply with a response message;
The message receiving and processing module 602 is further used to record the time at which a response message is received and, together with the sending time recorded by module 603, to calculate the interval between sending a message and receiving its response. Module 602 provides in the message the metric QD of the sharp QoS degradation of the service and the type C/I of the service fault. This module also generates the fault location information LDI and generates different unique session identifiers in the different stages. After the source node receives a location message, module 602 extracts the location information from it and performs the positioning computation;
The packet forwarding module 603 is further used, when the S field and the C field are set, to indicate that this message is a data packet being transmitted and is not used for fault detection and positioning; at the same time it records the time at which the message is sent and the value of N_2, and finally marks the length of the message.
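The stage-dependent fields handled by module 601 can be sketched as bit flags. The text names the fields Q, S, C, A, L, and E but does not give their encoding, so the bit positions, the stage names, and the helper fields_for below are all assumptions made for illustration.

```python
from enum import IntFlag


class MsgField(IntFlag):
    """One bit per field named in the text; bit positions are assumed."""
    Q = 1   # QoS no longer meets the requirement
    S = 2   # retrieval (search) stage
    C = 4   # confirmation stage
    A = 8   # alarm stage
    L = 16  # location stage
    E = 32  # receiver must reply with a response message


# Hypothetical stage names mapped to the field set in that stage.
STAGE_FIELD = {
    "qos_trigger": MsgField.Q,
    "retrieval": MsgField.S,
    "confirmation": MsgField.C,
    "alarm": MsgField.A,
    "location": MsgField.L,
}


def fields_for(stage: str, needs_response: bool = False) -> MsgField:
    """Return the field bits a message would carry in the given stage."""
    fields = STAGE_FIELD[stage]
    if needs_response:
        fields |= MsgField.E
    return fields
```

For example, a retrieval message that expects a reply would carry fields_for("retrieval", needs_response=True), that is, S and E together.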
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific embodiments of the invention should not be regarded as limited to this description. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of patent protection determined by the submitted claims.

Claims (7)

1. A service-oriented fault detection and positioning method, characterized by comprising the following steps:
1) after the end-to-end service forwarding path is established and the "chord" is supplemented, the destination node monitors the QoS of the service in real time and feeds back information about a sharp QoS degradation to the source node, triggering the detection and positioning method;
2) the source node, working toward the downstream nodes, and the destination node, working toward the upstream nodes, simultaneously search the path for the last node that the message can reach;
3) each of the above last reachable nodes periodically sends confirmation messages to its unreachable neighbor node to determine whether the fault is service congestion or service interruption;
4) the above last reachable node generates a service interruption or congestion alarm and sends it to the source node;
5) the above last reachable nodes generate service fault location messages and send them to the source node, and the source node determines the cause and position of the service fault by analysis.
2. The service-oriented fault detection and positioning method according to claim 1, characterized in that: supplementing the "chord" means that, after the optimal service forwarding path from the source node to the destination node has been established, a set of other paths capable of transmitting data between the source node and the destination node is sought, and this set is the supplemented "chord"; to guarantee the connectivity of the "chord", multi-hop keep-alive messages need to run on it; QoS triggering means that the destination node distinguishes different services according to information fields such as the DSCP value and the IP address and then monitors the QoS metrics of each service separately; when, at some moment, the QoS metrics of a service arriving at the destination differ considerably from the target values, the destination node generates a QoS trigger message, passes it to the source node over the "chord" established in advance, and thereby starts the method.
3. The service-oriented fault detection and positioning method according to claim 1, characterized in that: the retrieval means that the source node generates a fault retrieval message and sends it hop by hop to each downstream node along the service forwarding path in order to find the last node on the forwarding path that the retrieval message can reach; after a downstream node receives the retrieval message, it replies to its previous-hop node with a response message, encapsulates in the retrieval message the information on whether this node is available, and sends the message on to its next-hop node; this process repeats until some node does not receive the response message from its next-hop node within the retrieval time (1 + 0.1 × N_1) × t_ETI, where t_ETI is the time interval from when a node sends the retrieval message to its neighbor node until it receives the response message returned by that neighbor node, and N_1, which appears only in this retrieval stage, is the multiplier of 0.1 × t_ETI, whose value is chosen according to the QoS target values of the service; for example, when the service has a strict delay requirement among its QoS metrics, N_1 takes a smaller value; meanwhile, the destination node also generates a fault retrieval message and sends it hop by hop to each upstream node along the service forwarding path, operating in the same way as the source node; finally, the fault retrieval messages generated by the source node and the destination node arrive at the two sides of the fault position, namely node U_E and node D_E.
4. The service-oriented fault detection and positioning method according to claim 1, characterized in that: service faults are divided into congestion and interruption, and fault confirmation means confirming the type of service fault by the following method: node U_E generates a fault confirmation message and periodically sends it to its unreachable next-hop node; if the next-hop node receives the confirmation message, it immediately replies to the reachable end node with a response message; if node U_E receives any response message within the fault confirmation time N_2 × t_ETI, the QoS degradation of the service is caused by service congestion, that is, the service fault is service congestion, where N_2, which appears only in this confirmation stage, is the multiplier of t_ETI, whose value is chosen according to the QoS target values of the service (for example, when the service has a strict delay requirement among its QoS metrics, N_2 takes a smaller value); otherwise, if U_E receives no response message within the confirmation time N_2 × t_ETI, the QoS degradation is considered to be caused by service interruption, that is, the service fault is service interruption; node D_E operates in the same way as node U_E, along the reverse of the service forwarding path.
5. The service-oriented fault detection and positioning method according to claim 1, characterized in that: the fault alarm means that node U_E generates a service fault alarm message and sends it to the source node; the alarm message carries the information of service congestion or service interruption, together with the QoS metrics degraded by the congestion or interruption; after receiving the alarm message sent by node U_E, the source node knows the type of service fault; in this stage, node D_E takes no action.
6. The service-oriented fault detection and positioning method according to claim 1, characterized in that: the cause of a service fault may be a node failure or a link failure, and the position where the service fault occurs is the position of the failed node or link; fault location means determining the cause of the service fault and finding the specific position where it occurs; after the type of service fault has been determined in the fault confirmation stage, the fault location message generated by node U_E is sent to the source node along the reverse of the service forwarding path, while the fault location message generated by node D_E is sent to the destination node along the direction of the service forwarding path and is then sent by the destination node to the source node along the "chord" established in advance; in this process, each normal node that a location message passes encapsulates its own node ID in the message, and after receiving the location messages from both directions the source node determines the position of the service fault by the corresponding computation.
7. The service-oriented fault detection and positioning method according to claim 1, characterized in that: the location information is processed as follows: the source node compares the location information from the two directions with the explicit route information it stores; if the information of some nodes is found to be missing from the location information, service congestion or interruption is considered to have occurred at those nodes, which normally lie between nodes U_E and D_E; if the node information is found to be complete, service congestion or service interruption is considered to have occurred on the link between nodes U_E and D_E; that is, when the location information shows that nodes U_E and D_E are neighbor nodes, the service fault is caused by a fault on the link between U_E and D_E, and when the location information shows that nodes U_E and D_E are not neighbor nodes, the service fault is caused by a failure of the nodes between U_E and D_E: performing an exclusive-OR (XOR) operation between the node IDs in the location information and the node IDs in the stored explicit route yields the IDs of the failed nodes.
CN 201110129424 2011-05-18 2011-05-18 Service-oriented fault detection and positioning method Expired - Fee Related CN102164051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110129424 CN102164051B (en) 2011-05-18 2011-05-18 Service-oriented fault detection and positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110129424 CN102164051B (en) 2011-05-18 2011-05-18 Service-oriented fault detection and positioning method

Publications (2)

Publication Number Publication Date
CN102164051A true CN102164051A (en) 2011-08-24
CN102164051B CN102164051B (en) 2013-11-06

Family

ID=44465039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110129424 Expired - Fee Related CN102164051B (en) 2011-05-18 2011-05-18 Service-oriented fault detection and positioning method

Country Status (1)

Country Link
CN (1) CN102164051B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1968156A (en) * 2006-08-30 2007-05-23 华为技术有限公司 Ethernet device link failure detection method and its system
CN101123483A (en) * 2007-09-11 2008-02-13 华为技术有限公司 Detection method and device for service link
CN101640617A (en) * 2008-07-30 2010-02-03 华为技术有限公司 Method, system and device for detecting and positioning network failure


Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102611568B (en) * 2011-12-21 2016-03-30 华为技术有限公司 A kind of failure service path diagnostic method and device
CN102611568A (en) * 2011-12-21 2012-07-25 华为技术有限公司 Failure service path diagnosis method and device
WO2014206207A1 (en) * 2013-06-29 2014-12-31 华为技术有限公司 Route withdrawal method and network device
CN107241206A (en) * 2016-03-29 2017-10-10 中兴通讯股份有限公司 The method and device that a kind of business service state judges
CN107819594B (en) * 2016-09-12 2022-08-02 中兴通讯股份有限公司 Network fault positioning method and device
CN107819594A (en) * 2016-09-12 2018-03-20 中兴通讯股份有限公司 network failure locating method and device
CN109644122A (en) * 2016-09-22 2019-04-16 华为技术有限公司 Resource share method, network node and relevant device
CN106789223B (en) * 2016-12-13 2019-05-21 中国联合网络通信集团有限公司 A kind of Interactive Internet TV IPTV service quality determining method and system
CN106789223A (en) * 2016-12-13 2017-05-31 中国联合网络通信集团有限公司 A kind of IPTV IPTV service quality determining method and system
CN108632053A (en) * 2017-03-16 2018-10-09 中兴通讯股份有限公司 The processing method and processing device of business information
CN109309928A (en) * 2017-07-26 2019-02-05 华为技术有限公司 D2D chain circuit detecting method, relevant apparatus and system
CN109309928B (en) * 2017-07-26 2021-01-29 华为技术有限公司 D2D link detection method, related device and system
CN109962801A (en) * 2017-12-25 2019-07-02 中国移动通信集团福建有限公司 Communication quality exception localization method, device, equipment and medium
CN109962801B (en) * 2017-12-25 2022-06-21 中国移动通信集团福建有限公司 Communication quality abnormity positioning method, device, equipment and medium
WO2019137052A1 (en) * 2018-01-11 2019-07-18 华为技术有限公司 Method and device for network operation and maintenance
CN109413689A (en) * 2018-11-30 2019-03-01 公安部沈阳消防研究所 A kind of Radio Link pull-off network detecting method
CN109787838B (en) * 2019-02-25 2022-02-18 武汉晟联智融微电子科技有限公司 Method for avoiding fault relay node in multi-hop network
CN109787838A (en) * 2019-02-25 2019-05-21 武汉晟联智融微电子科技有限公司 Evade the method for failed trunk node in multihop network
CN110879892A (en) * 2019-09-30 2020-03-13 口碑(上海)信息技术有限公司 Service processing method, device, equipment and computer readable storage medium
CN113422690A (en) * 2020-03-02 2021-09-21 烽火通信科技股份有限公司 Service quality degradation prediction method and system
WO2023060985A1 (en) * 2021-10-14 2023-04-20 中兴通讯股份有限公司 Fault locating method and system, and computer-readable storage medium
CN114448785A (en) * 2022-03-18 2022-05-06 新浪网技术(中国)有限公司 Method and device for positioning fault network equipment and electronic equipment
CN116827817A (en) * 2023-04-12 2023-09-29 国网河北省电力有限公司信息通信分公司 Data link state monitoring method, device, monitoring system and storage medium

Also Published As

Publication number Publication date
CN102164051B (en) 2013-11-06


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131106

Termination date: 20160518