CN101483547B - Evaluation method and system for network burst affair - Google Patents


Info

Publication number
CN101483547B
Authority
CN
China
Prior art keywords
link
network
router
flow
data
Legal status
Expired - Fee Related
Application number
CN2009100076279A
Other languages
Chinese (zh)
Other versions
CN101483547A (en)
Inventor
陈庶樵
曹敏
兰巨龙
张建辉
王雨
卜佑军
窦睿彧
Current Assignee
PLA Information Engineering University
Original Assignee
PLA Information Engineering University
Application filed by PLA Information Engineering University
Priority to CN2009100076279A
Publication of CN101483547A
Application granted
Publication of CN101483547B
Expired - Fee Related

Abstract

The present invention discloses a method for measuring and evaluating sudden destructive network events (burst destructive events). The method comprises: passively sensing and actively detecting burst events; merging the results and then filtering and analyzing them; grading the event according to the merged result so as to locate the cause of the fault; deciding from the grade whether the protocol mechanism should be adjusted; and checking the overall state of the adjusted network to make sure the adjustment is correct. The invention also provides a system for measuring burst destructive network events, comprising a router control module, an array management algorithm module, a network management module and an off-line calculation module. The problem the invention addresses is how to give a router traffic-awareness, so that it can determine whether the links and nodes of the whole network are operating normally and whether a burst destructive event has occurred, and report the result promptly so that the protocol mechanism can be adjusted. The invention provides a criterion for protocol adjustment, thereby improving the availability and reliability of network nodes and links and ultimately guaranteeing the network's quality of service.

Description

Network burst event evaluation method and system
Technical field
The present invention relates to the fields of computer networks and information security, and in particular to a method and system for measuring and evaluating network burst events.
Background art
With the rapid development of network technology, multi-next-hop network protocols have emerged. In a network running a multi-next-hop protocol, traffic balancing is achieved through parallel transmission.
Network self-healing requires detecting and locating faults in the network. Fast, effective fault detection and accurate identification of the fault type are the basis of, and an important means for, improving network availability and reliability and guaranteeing QoS.
One prior-art approach detects and locates network faults through SNMP-based fault management in the network system.
Fault information is obtained by polling and by receiving Trap messages. However, duplicate alarm messages are too numerous and no fault-correlation analysis is provided, which makes network management configuration complex and requires a large amount of manual work. Because this fault-management model lacks flexibility and adaptability, it struggles to meet the fault-management needs of modern networks. Fault information in a network is complex: a single fault event often produces a series of fault alarms. Passing all of these alarms to the network administrator unprocessed overloads the administrator, produces excessive redundant information and consumes network bandwidth, creating a vicious circle and making network management inefficient.
Another prior-art approach detects and locates network faults with the network diagnostic program Ping and with route tracing (Traceroute).
Ping is a common network diagnostic program (command) used to determine whether the local host can exchange (send and receive) data with another host. However, Ping can only detect whether the link between two given nodes is faulty; it cannot locate where the fault originates. The industry therefore locates faults with Traceroute. In practice, Ping statistics are commonly used to judge whether a link is up, and Traceroute is used to locate the fault point. These tools have several weaknesses. When a link carries heavy traffic but drops packets only occasionally, Ping has difficulty detecting it. Ping and Traceroute are prone to false reports. Ping cannot send probe packets rapidly for long periods, since this is generally rate-limited, so detection is inaccurate. When there are many paths between the endpoints of the link under test, Ping packets follow only one of them; if that path differs from the path taken by the real data, the test result is invalid. Finally, making Traceroute fast requires sending several times as many probe packets before a fault is detected, which consumes considerable bandwidth.
The prior art also detects and locates network faults with Bidirectional Forwarding Detection (BFD).
BFD is a Hello-packet mechanism that judges whether a link is available by checking whether a predetermined number of consecutive packets have been lost. However, when a link is available but drops packets only intermittently, BFD cannot detect and report the problem. BFD is a point-to-point detection mechanism and cannot quickly locate the point of a link failure. When there are multiple paths between source and destination, the detection packets follow only one of them, which may differ from the path of the actual data packets, producing false reports.
In addition, the prior art uses the Multiprotocol Label Switching (MPLS) Operation, Administration and Maintenance (OAM) technology proposed by the International Telecommunication Union (ITU) to detect and locate network faults.
Because an MPLS label switched path (LSP) is established segment by segment and LSPs can be nested, faults are detected and located through the Forward Defect Indicator (FDI) mechanism. This approach, however, works only in MPLS networks and not in IP networks, so it cannot provide fault detection in existing IP networks. It, too, judges link availability from consecutive packet loss. Because fault location with FDI depends on FDI packets sent by the outer LSP, nodes on the link cannot be located when the outer LSP does not run OAM.
Most other prior-art fault detection methods target single-next-hop networks and mainly detect connectivity failures; they do not grade faults by urgency and cannot handle burst destructive events. Existing fault detection methods therefore cannot simply be transplanted to multi-next-hop networks. The defining characteristic of a burst event is its suddenness, so measuring and evaluating it quickly and accurately is the basis for discovering it promptly and effectively and, ultimately, for keeping node and link reliability from being threatened. For burst destructive events in IP networks, detection speed therefore matters more than accuracy. The problem to be solved for multi-next-hop networks is how to give a router traffic-awareness, so that it knows whether the links across the whole network and its own node are working normally and whether a burst destructive event has occurred.
Summary of the invention
In view of this, the invention provides a method and system for measuring and evaluating burst events that can detect network burst events promptly and judge them accurately.
The method for measuring and evaluating network burst events provided by an embodiment of the invention comprises:
detecting and obtaining traffic-change information for each link associated with the router;
detecting the total data volume forwarded by the router to obtain change information for the forwarded total;
merging and analyzing the obtained per-link traffic information and the change information for the forwarded total;
judging from the merged analysis result whether a burst event has occurred.
An embodiment of the invention also provides a system for measuring and evaluating network burst events, comprising:
a first detecting unit, configured to detect and obtain traffic-change information for each link associated with the router;
a second detecting unit, configured to detect the total data volume forwarded by the router and obtain change information for the forwarded total;
a processing unit, configured to merge and analyze the obtained per-link traffic information and the change information for the forwarded total;
a judging unit, configured to judge from the merged analysis result whether a burst event has occurred.
In the method for measuring and evaluating burst destructive events provided by the embodiments, the traffic-change information of each link associated with the router is detected and obtained; the total data volume forwarded by the router is detected to obtain its change information; the per-link traffic information and the forwarded-total change information are merged and analyzed; and from this it is judged whether a burst event has occurred.
The invention adopts a distributed traffic-anomaly detection strategy: each node performs traffic prediction and anomaly detection independently during data collection, so the added network load is small. No extra hardware support is needed and resource overhead is low. The approach preserves its own security and the reliability and availability of the managed network segment, detects burst destructive events in time, and provides early-warning information to emergency responders.
Description of drawings
Fig. 1 is a schematic diagram of the architecture of the system for metric evaluation of network burst destructive events;
Fig. 2 is a schematic diagram of the metric evaluation of network burst destructive events provided by the invention;
Fig. 3 is a flow chart of the method for metric evaluation of network burst destructive events provided in an embodiment of the invention;
Fig. 4 is a flow chart of abrupt-change detection for the traffic of each link associated with the router in an embodiment of the invention;
Fig. 5 is a flow chart of abrupt-change detection for the total data volume forwarded by the router in an embodiment of the invention.
Embodiment
The invention defines as a burst destructive event any abrupt change in link traffic, or in the volume of packets forwarded by the router, that is caused by a fault and may lead to congestion. Such events are guarded against through detection by the router itself and assessed through prediction, without adding extra equipment.
The network system 100 contains the following modules:
Network management unit 110: essentially the existing SNMP management protocol module. It reads from its MIB the information needed for detection, such as the router's forwarded data volume and the traffic on each link, provides this data to the management unit, and also supplies data for the off-line threshold calculation.
Management unit 120 consists mainly of detection module 121 and sensing module 122. Sensing module 122 amounts to adding a forwarding-state detection module to the router, which improves the router's own reliability and controllability in the network environment. A current router only matches packets against rules and then decides whether to forward or drop them; it cannot detect anomalies in the forwarded data volume. Once the router can detect sudden changes in the forwarded data volume, it can judge whether forwarding is abnormal at that moment. If it is, the cause may be an attack or the failure of a neighboring node device; the router then sends alarm information, announces the cause to its other neighbors, and a corresponding strategy is adopted according to the cause. One attribute is defined for an abrupt change here: the amplitude μ of change in the router's forwarded data volume. A change smaller than μ is treated as normal, for example during a network peak period on a special occasion. μ sets the algorithm's sensitivity to burst events; an abrupt change must exceed μ in amplitude. Once these normal causes are excluded, the remaining abrupt change is treated as an anomaly. This is the problem the passive sensing module addresses: perceiving the router's forwarding state.
Detection module 121 of management unit 120 actively senses the link state in the router, that is, link utilization. By detecting abrupt changes it learns whether link traffic is abnormal, likewise sends alarm information promptly, and corresponding measures are taken according to the alarm so that the abnormal link is avoided. Current link detection mostly judges congestion by probing link utilization, but this increases network load and adds overhead. The invention detects abrupt changes with both rising and falling trends from the characteristics of link traffic, so it detects not only congestion but also link failure, whereas most current change-detection algorithms detect changes in one direction only. Two attributes are defined for an abrupt change here. The first is its duration Tp: a change shorter than Tp is treated as normal traffic white noise. The second is its amplitude δ: a change smaller than δ is treated as a normal trend variation of the network itself. Tp and δ are manually tuned parameters that set the algorithm's sensitivity to burst events; a detected fault must last longer than Tp and exceed δ in amplitude. Once these normal causes are excluded, the remaining abrupt change is treated as an anomaly. This is the problem the active detection module addresses: perceiving link utilization.
Router control unit 130 mainly comprises tracking prediction module 131, alarm module 132 and control module 133. Because detection module 121 and sensing module 122 detect only part of the network, after the abnormal results are announced and the protocol is adjusted, tracking prediction is still needed to verify that the adjustment is correct. By grasping the overall state of network traffic and combining it with the adjusted routing matrix, tracking prediction module 131 can learn whether the faulty links or nodes in the network are now avoided and whether the overall traffic state of the adjusted network is normal. This is the problem tracking prediction module 131 addresses: monitoring and controlling traffic across the whole network. Alarm module 132 mainly collects the alarm information announced by detection module 121 and sensing module 122 and filters and analyzes it. Control module 133 processes the alarm information, announces it to targeted recipients, and assesses the state of the whole network from the information submitted by tracking prediction module 131.
Off-line threshold computing unit 140 mainly collects variables from the MIB of the SNMP network management protocol: the interface-group variables ifInOctets and ifOutOctets (octet counters, from which bytes per second are derived) and the IP-group variables ipForwDatagrams and ipInReceives (datagram counters, from which packets per second are derived). From these it computes, by formula, the decision thresholds used by the online sensing module. It also records faulty links or nodes, statistically analyzes their failure probabilities, and periodically announces frequently failing nodes or links to the whole network so that they can be avoided during route selection or traffic balancing. It further compiles statistics on failure causes to understand how network failure causes are distributed.
The present invention is described in further detail below with reference to the accompanying drawings.
The invention detects abrupt changes with both rising and falling trends from the obtained link traffic information, so it detects not only congestion but also link failure.
Specifically, the embodiments detect burst destructive events in a distributed way. Thresholds are computed off-line to reduce the router's computational load; the faulty link or node is located, and directed announcements are used to avoid link-state storms; and the failure probability of each link is recorded to identify bottleneck links, providing a basis for optimizing overall network resources. After a burst destructive event occurs and the protocol has been adjusted, the network settles into a new stable state. Fault detection is meant to measure changes in the system state, so it should not keep raising alarms once the state has changed; the embodiments avoid this by starting a back-off timer.
Referring to Fig. 2 and Fig. 3, the method for metric evaluation of network burst destructive events provided by the embodiment comprises the following steps:
Step S01: detect and obtain the traffic-change information of each link associated with the router.
Referring to Fig. 4, abrupt changes in the traffic of each link associated with the router are detected as follows.
Fast change detection requires low time overhead, so the embodiment reduces the time spent dividing and maintaining buckets.
Step 101: read the traffic statistics collected via the SNMP management protocol in the network management module to form a data stream $x_1, x_2, \ldots, x_n$.
Step 102: compute the prefix sum of each element of $x_1, x_2, \ldots, x_n$, $x'_i = f_{ss}(i)$, that is,
$$f_{ss}(i) = \sum_{j=1}^{i} x_j$$
which yields a second data stream $x'_1, x'_2, \ldots, x'_n$.
Step 103: divide the data stream into buckets for storage. The bucketing rule is: $x'_1$ is placed in a bucket by itself, and every two subsequent adjacent values are placed together in one bucket, as shown in the figures for the two cases:
[Figures: bucket layouts for even n and for odd n]
Step 104: when a new value arrives, it is added to each stored value in turn; the buckets then hold the aggregates for window lengths 2 to n+1.
Step 105: to decide quickly whether an abrupt change in traffic has occurred, the embodiment limits detection to a data stream of length n. The aggregate for window length n+1 is therefore discarded, the newly obtained aggregate for window length n is moved into the corresponding position of the original bucket, and so on for each bucket in turn. The first bucket is then empty and the new value is placed in it. In this way the number of buckets stays constant and the buckets still represent window lengths 1 to n.
Limiting the stream length to n keeps the number of buckets constant. The last bucket holds $x'_n$, the largest of the prefix sums $x'_1, x'_2, \ldots, x'_n$. When new data arrives, the smallest window, that is, the value in the first bucket, is updated first. The reason follows from the detection principle: if no abrupt change occurs on the window of length L, the algorithm does not examine windows of length L+1 and beyond, and a later change is attributed to jitter. Jitter means that the data at some point in the stream reaches a notable value that is still not large enough to count as an abrupt change; when detection runs over long windows, several such small variations can accumulate and be mistaken for an abrupt change. Updating the aggregates of small windows first therefore speeds up detection.
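Steps 102 to 105 amount to maintaining, for every window length i = 1, ..., n, the sum F(w_i) of the i most recent samples. The following Python sketch shows that bookkeeping under two assumptions not stated in the text: x_1 is the most recent sample (the ordering under which adding a new value to every stored sum yields contiguous windows of length 2 to n+1), and the buckets are kept as a flat list rather than the odd/even table of step 106. The function names are hypothetical.

```python
from itertools import accumulate


def initial_window_sums(samples_newest_first):
    """Step 102: S[i-1] = x'_i = F(w_i) = x_1 + ... + x_i, with x_1 the newest sample."""
    return list(accumulate(samples_newest_first))


def update_window_sums(window_sums, x_new):
    """Steps 104-105: adding x_new to each stored sum turns F(w_i) into the sum
    over a window of length i + 1. The length-(n + 1) sum is discarded and
    x_new alone becomes the new length-1 window, so the bucket count stays n."""
    n = len(window_sums)
    return [x_new] + [s + x_new for s in window_sums[: n - 1]]


sums = initial_window_sums([5, 3, 2, 1])  # [5, 8, 10, 11]
sums = update_window_sums(sums, 7)        # [7, 12, 15, 17]
```

Each update touches every stored sum once, so it costs O(n); the bucket layout in the text aims to cut maintenance and lookup time further, which this flat list does not attempt.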
Step 106: store all results in a table, one bucket per row, with odd- and even-indexed values in two separate columns. To retrieve the aggregate $F(w_{2i})$, only the even column needs to be searched; other lookups simply go to the position corresponding to the window length. This speeds up lookups.
Step 107: the values of the aggregate function $F(w_i)$ on sliding windows of known length $i$ ($0 \le i \le n$) are obtained by the steps above; the aggregate function $F$ is summation. Since $F(w_i) = \sum_{j=1}^{i} x_j$, an abrupt change with a rising trend is described by the inequality $F(w_i) \ge \beta\,(F(w_{2i}) - F(w_i))$ with $\beta > 1$, and one with a falling trend by $F(w_i) \le \alpha\,(F(w_{2i}) - F(w_i))$ with $0 < \alpha < 1$. Adding $F(w_{2i}) - F(w_i)$ to both sides of each inequality gives:
$$F(w_{2i}) \ge (\beta + 1)\,(F(w_{2i}) - F(w_i)), \quad \beta > 1$$
$$F(w_{2i}) \le (\alpha + 1)\,(F(w_{2i}) - F(w_i)), \quad 0 < \alpha < 1$$
Step 108: the algorithm runs each time a new element $x_n$ of the data stream arrives and is inserted into the histogram, to detect whether $x_n$ causes an abrupt change. It proceeds layer by layer from layer 1 to layer n/2; each layer consists of the two most recent adjacent sliding windows of equal length on the stream $x$. Taking layer $i$ as an example, the algorithm checks whether an abrupt change has occurred between the two adjacent windows of length $i$. Detection takes two steps: first compute $F(w_i)$ and $F(w_{2i})$, then test whether either of the two inequalities above holds to decide whether an abrupt change has occurred, and output an alarm signal if it has.
For example, for window length 1, compare $x'_1$ with $x'_2 - x'_1$;
for window length 2, compare $x'_2$ with $x'_4 - x'_2$;
for window length 3, compare $x'_3$ with $x'_6 - x'_3$.
This continues until an abrupt change is detected or the window length reaches n/2, where detection stops; newly arriving data undergo the same computation and comparison.
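The layered scan of steps 107 and 108 can be sketched as follows, with window_sums[i-1] holding F(w_i). This is an illustrative sketch, not the patented implementation: alpha = 0.5 and beta = 2.0 are placeholder values (the text only requires 0 < α < 1 and β > 1), samples are assumed to be non-negative and ordered newest first, and skipping a layer where both windows sum to zero is an added guard. The function name is hypothetical.

```python
from itertools import accumulate


def detect_abrupt_change(window_sums, alpha=0.5, beta=2.0):
    """Layered test of steps 107-108.

    window_sums[i-1] = F(w_i), the sum of the i most recent samples.
    Layer i compares the newest window F(w_i) with the adjacent older window
    F(w_2i) - F(w_i). Returns (i, "up") or (i, "down") for the first layer
    that satisfies an inequality, or None if layers 1..n//2 are all quiet.
    """
    if not (beta > 1 and 0 < alpha < 1):
        raise ValueError("need beta > 1 and 0 < alpha < 1")
    n = len(window_sums)
    for i in range(1, n // 2 + 1):
        recent = window_sums[i - 1]                 # F(w_i)
        previous = window_sums[2 * i - 1] - recent  # F(w_2i) - F(w_i)
        if recent == 0 and previous == 0:
            continue  # idle link: both inequalities would hold trivially
        if recent >= beta * previous:
            return i, "up"
        if recent <= alpha * previous:
            return i, "down"
    return None


spike = list(accumulate([10, 1, 1, 1, 1, 1]))  # newest sample first
print(detect_abrupt_change(spike))  # (1, 'up')
```

Returning at the first layer that fires mirrors the text's rule that detection stops once an abrupt change is found, so small windows are always checked before large ones.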
Step 109: record the faulty link and store it on the server that computes the off-line thresholds; statistical analysis gives its failure probability. Frequently failing links are periodically announced to the whole network so that they can be avoided in route selection or traffic balancing. Links that become overloaded are analyzed in the same way, and each link is assigned a weight from the result, providing a basis for route selection during traffic balancing.
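The text does not specify how the failure probability is estimated. A minimal sketch, assuming it is the fraction of observation periods in which a link was recorded as failed, and that frequently failing links are those at or above a hypothetical announcement threshold; the weighting rule for overloaded links is not modeled. The same bookkeeping would apply to the node IDs recorded in step 210.

```python
from collections import Counter


def failure_probabilities(failure_log, observation_periods):
    """Fraction of observation periods in which each link was recorded as failed.

    failure_log holds one link ID per period in which that link failed.
    """
    if observation_periods <= 0:
        raise ValueError("observation_periods must be positive")
    counts = Counter(failure_log)
    return {link: n / observation_periods for link, n in counts.items()}


def frequently_failing(probabilities, threshold):
    """Links to announce network-wide: estimated probability >= threshold (hypothetical)."""
    return sorted(link for link, p in probabilities.items() if p >= threshold)


probs = failure_probabilities(["L1", "L3", "L1", "L1", "L2"], observation_periods=10)
print(frequently_failing(probs, threshold=0.2))  # ['L1']
```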
Step S02: detect the total data volume forwarded by the router to obtain the change information for the forwarded total.
Referring to Fig. 5, the specific steps are as follows.
Take a time series as an example. A sliding window in the sequence serves as the reference, and the next measurement adjacent to the window is tested for anomalous change: the current point is judged abnormal if it falls outside the upper and lower adaptive-threshold confidence interval, that is, when its change amplitude exceeds the interval. As the window advances step by step along the sequence, every point of the traffic observation sequence is tested. The detection algorithm's decision function is a likelihood ratio between a point and the sliding window; it tests for anomalous change between an individual point and a local segment, which highlights short-term anomalies without missing other anomalous events.
Step 201: periodically read from the SNMP management information base (MIB) the interface-group variables ifInOctets and ifOutOctets and the IP-group variables ipForwDatagrams and ipInReceives. ifInOctets and ifOutOctets count the bytes received and sent on a device interface, and ipInReceives and ipForwDatagrams count the packets the router receives and forwards; differencing successive readings gives the per-second rates (bytes per second, packets per second).
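In the standard MIB definitions these four objects are cumulative Counter32 values, so per-second rates come from differencing successive polls. A Python sketch, assuming the readings have already been retrieved (the SNMP request itself is not shown) and that a counter wraps at most once between polls; on fast interfaces the 64-bit IF-MIB counters ifHCInOctets and ifHCOutOctets are the usual way to avoid wrap ambiguity. counter_rate is a hypothetical helper name.

```python
COUNTER32_MODULUS = 2 ** 32  # SNMP Counter32 values wrap around at 2^32


def counter_rate(previous, current, interval_s):
    """Per-second rate between two Counter32 readings taken interval_s apart.

    The modulo handles a single wrap-around; more than one wrap within the
    polling interval cannot be detected from two readings.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ((current - previous) % COUNTER32_MODULUS) / interval_s


# ifInOctets read 60 s apart: 600 octets in 60 s gives 10.0 bytes per second
print(counter_rate(1_000, 1_600, 60))  # 10.0
```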
Step 202: use a sliding-window model, that is, consider only the N most recent values in the management information base (N is the sliding-window size). As data keep arriving, the statistics are continuously updated and the window slides forward.
Step 203: adaptively sample the MIB variables in the sliding window, namely the four variables of step 201, and arrange each in chronological order as a time series; each time series is then a traffic observation sequence.
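Steps 202 and 203 keep, for each of the four series, only the N most recent values. A sketch using one bounded deque per series; the class and method names are hypothetical, and samples are assumed to arrive at fixed intervals (the adaptive sampling of step 203 is not modeled).

```python
from collections import deque

# The four MIB-derived series listed in step 201.
SERIES = ("ifInOctets", "ifOutOctets", "ipForwDatagrams", "ipInReceives")


class MibWindows:
    """Holds the N most recent values of each series (steps 202 and 203)."""

    def __init__(self, size):
        self._windows = {name: deque(maxlen=size) for name in SERIES}

    def add(self, sample):
        """Append one sampling instant; sample maps each series name to a value."""
        for name in SERIES:
            # A bounded deque silently drops its oldest value when full.
            self._windows[name].append(sample[name])

    def sequence(self, name):
        """Observation sequence for one series, oldest value first."""
        return list(self._windows[name])


windows = MibWindows(size=3)
for k in range(5):
    windows.add({name: k for name in SERIES})
print(windows.sequence("ipInReceives"))  # [2, 3, 4]
```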
Step 204: at time points $\ldots, t_{n-1}, t_n, t_{n+1}, \ldots$ there is an observation sequence
$$\ldots, X(t_{n-1}), X(t_n), X(t_{n+1}), \ldots$$
Let $X(t_n) = y_t$ denote the measured value at $t = n$. Assuming $y_{t+1}, y_{t+2}, \ldots, y_{t+N+1}$ is locally stationary, zero-mean the sliding time window $y_{t+1}, y_{t+2}, \ldots, y_{t+N+1}$ as follows:
$$\bar{y} = \frac{1}{N+1} \sum_{i=1}^{N+1} y_{t+i}$$
$$x_{t+i} = y_{t+i} - \bar{y}, \quad i = 1, 2, \ldots, N+1$$
Step 205: fit an AR(2) model to the zero-mean sequence to obtain the residuals $\{e_{t+1}, e_{t+2}, \ldots, e_{t+N}, e_{t+N+1}\}$:
$$e_{t+i} = x_{t+i} - \hat{\varphi}_1 x_{t+i-1} - \hat{\varphi}_2 x_{t+i-2}, \quad i = 1, 2, \ldots, N+1$$
where $\hat{\varphi}_1$ and $\hat{\varphi}_2$ are the model coefficients.
Step 206: determine the decision function $W_t(N+1)$:
$$W_t(N+1) = \frac{e_{t+N+1}}{\hat{\delta}}$$
where
$$\hat{\delta} = \frac{e_{t+1}^2 + e_{t+2}^2 + \cdots + e_{t+N+1}^2}{N+1}$$
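Steps 204 to 206 can be sketched in plain Python. The text does not say how the AR(2) coefficients are estimated, so this sketch assumes Yule-Walker estimates from the sample autocorrelations. It also assumes that the two values preceding the window, which the AR(2) residuals for i = 1 and i = 2 need, are taken as the mean (zero after centering), and it computes δ̂ exactly as defined in step 206, as a mean of squared residuals. Function names are hypothetical.

```python
def zero_mean(window):
    """Step 204: subtract the mean of the N + 1 values in the window."""
    mean = sum(window) / len(window)
    return [y - mean for y in window]


def yule_walker_ar2(x):
    """AR(2) coefficients from the lag-1 and lag-2 sample autocorrelations
    (Yule-Walker estimates; the estimation method is an assumption)."""
    c0 = sum(v * v for v in x)
    if c0 == 0:
        return 0.0, 0.0
    r1 = sum(x[i] * x[i - 1] for i in range(1, len(x))) / c0
    r2 = sum(x[i] * x[i - 2] for i in range(2, len(x))) / c0
    denom = 1 - r1 * r1
    if denom == 0:
        return 0.0, 0.0
    return r1 * (1 - r2) / denom, (r2 - r1 * r1) / denom


def decision_value(window):
    """Steps 204-206: return W_t(N+1) = e_{t+N+1} / delta_hat for one window."""
    x = zero_mean(window)
    phi1, phi2 = yule_walker_ar2(x)
    padded = [0.0, 0.0] + x  # the two pre-window values are taken as the mean (0)
    residuals = [padded[k] - phi1 * padded[k - 1] - phi2 * padded[k - 2]
                 for k in range(2, len(padded))]
    # delta_hat as defined in step 206: mean of the squared residuals.
    delta_hat = sum(e * e for e in residuals) / len(residuals)
    return residuals[-1] / delta_hat if delta_hat > 0 else 0.0


# N = 20, so each window holds N + 1 = 21 values; the last one jumps upward.
print(decision_value([10.0] * 20 + [30.0]) > 0)  # True
```

The sign of W_t(N+1) follows the direction of the newest value's departure from the AR(2) fit, which is what step 208 then compares against the adaptive bounds.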
Step 207: determine the adaptive thresholds $U(t+1)$ and $L(t+1)$.
New behavior is folded into the previous behavior by the update mechanism described by the following formula:
$$p(t+1) = \alpha\,[\,y_{t+N+1} - y_{t+N}\,] + y_{t+N}$$
where $p(t+1)$ is the predicted threshold value at time t+1, that is, the normal model at time t+1; $y_{t+N}$ is the measured value at time t; $y_{t+N+1}$ is the measured value at time t+1; and $\alpha$ is a weighting constant that controls the weight of new data in the model and thus how quickly the model adapts to local behavior. This $\alpha$ is distinct from the $\alpha$ of step 107.
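The update of step 207 is a one-line computation. A sketch with α defaulting to the value 1.1 chosen in step 209; predict_normal is a hypothetical name.

```python
def predict_normal(previous, current, alpha=1.1):
    """Step 207: p(t+1) = alpha * (y_{t+N+1} - y_{t+N}) + y_{t+N}.

    previous is y_{t+N} (time t), current is y_{t+N+1} (time t+1);
    alpha = 1.1 is the weighting constant chosen in step 209.
    """
    return alpha * (current - previous) + previous


print(round(predict_normal(100.0, 110.0), 6))  # 111.0
```

With α = 1 the prediction equals the latest measurement; with α > 1, as chosen in step 209, it extrapolates the most recent change slightly beyond that measurement.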
Step 208: a model of normal behavior has thus been established, including its refresh. If the current measurement matched this normal model exactly, it would clearly be normal, but that criterion is too strict because real data never match a theoretical model exactly. A tolerance band, the confidence interval, is therefore set, and a measurement inside it is judged normal:
$$U(t+1) = p(t+1) + 3\sigma$$
$$L(t+1) = p(t+1) - 3\sigma$$
where $\sigma$ is the standard deviation. The confidence interval is computed in the same way as the normal-behavior model, except that the normal-behavior data come from the measurements themselves, whereas the interval comes from the standard deviation of those measurements. For example, if one measurement is taken at the same time every day, each time of day has 7 measurements from the past week, and their standard deviation can be computed. Adding this standard deviation to the normal-behavior model gives the upper boundary $U(t+1)$, and subtracting it gives the lower boundary $L(t+1)$. Adding different multiples of the standard deviation yields tolerance bands of different levels; 2 to 3 standard deviations are generally used. The judgment criterion is therefore:
when $-L(t+1) \le W_t(N+1) \le U(t+1)$, the point is judged normal;
when $W_t(N+1) > U(t+1)$ or $W_t(N+1) < -L(t+1)$, it is judged abnormal.
The standard deviation is calculated as follows:
$$d_m = \frac{1}{n}\,(d_1 + d_2 + \cdots + d_n)$$
[Formula image: standard deviation of $d_1, \ldots, d_n$ about $d_m$]
$$n = 7$$
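A sketch of the step 208 bounds and judgment. It assumes the population standard deviation (dividing by n) of the same-time measurements from the previous n = 7 days, uses k = 3 as in the formulas for U(t+1) and L(t+1), and applies the judgment criterion exactly as written, including the minus sign on L(t+1). Function names are hypothetical.

```python
from statistics import pstdev


def tolerance_sigma(same_time_history):
    """Standard deviation of the measurements taken at the same time of day on
    the previous n days (n = 7 in the text); the population form is assumed."""
    return pstdev(same_time_history)


def confidence_bounds(prediction, sigma, k=3.0):
    """U(t+1) = p(t+1) + k*sigma and L(t+1) = p(t+1) - k*sigma."""
    return prediction + k * sigma, prediction - k * sigma


def is_normal(w, upper, lower):
    """Judgment criterion as written: normal when -L(t+1) <= W_t(N+1) <= U(t+1)."""
    return -lower <= w <= upper


sigma = tolerance_sigma([10, 12, 11, 13, 9, 10, 12])  # sqrt(12/7), about 1.31
upper, lower = confidence_bounds(prediction=100.0, sigma=2.0)  # (106.0, 94.0)
print(is_normal(50.0, upper, lower))  # True
```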
Step 209: parameter selection. The model has three parameters that must be specified: the order p of the AR model, the sliding-window size N, and the weighting constant α. According to time-series theory, in practical applications the order of the AR model does not exceed 2. In general the AR order p and the sequence length N must satisfy the constraint:
$$0 \le p \le 0.1N$$
Here N = 20 is chosen, that is, the sliding window size is 20, and the weighting constant α is 1.1. With N = 20 the constraint gives p ≤ 2, consistent with the AR(2) model of step 205.
Step 210: record the ID of the faulty node and store it on the server that computes the off-line thresholds; statistical analysis gives its failure probability. Frequently failing nodes are periodically announced to the whole network so that they can be avoided in route selection or traffic balancing.
Step S03: merge and analyze the obtained per-link traffic information and the change information for the forwarded total.
The specific method is as follows.
Consider a router with four ports. Because the probability of several faults occurring at once is very small, only a single link or node failure is assumed here; concurrent faults are not considered.
Case 1: the result of step S01 (link traffic) is an abrupt change with a falling trend, say an anomaly in the traffic of link 1, and the result of step S02 (forwarded total) is also a falling-trend change. The event is judged to be a failure of this link. The reasoning: when the link fails its traffic drops, so the total volume forwarded by the router drops accordingly; and since only single faults are considered, the other links and the node are normal.
Case 2: the result of step S01 is a falling-trend change, say in the traffic of link 1, while the result of step S02 is a rising-trend change. The event is judged to be congestion. Unlike case 1, the router's forwarded total increases, so the node is normal. The increase arises because, after link 1 becomes faulty, rerouted traffic is loaded onto other links, raising traffic on the adjacent links, which shows up directly as an increase in the router's forwarded volume. The sudden drop in link 1's traffic reflects the packet loss caused by congestion.
Case 3: the result of step S01 is a rising-trend change, say in the traffic of link 1, and the result of step S02 is also a rising-trend change. The event is judged to be an attack. Attacks generally target a node, but the attack traffic reaches the node over links, so both increase together. This differs from case 2: the router's forwarded total increases in both cases, but in case 2 the traffic of the associated link decreases, which does not fit how an attack actually unfolds.
Case 4: the result of step S01 is a rising-trend change, say in the traffic of link 1, while the result of step S02 is a falling-trend change. The event is judged to be a failure of this node. Although link traffic is rising, the router's forwarded volume is falling: the node has failed and traffic cannot be forwarded normally, and the rise in the single link's traffic is caused by the node being unable to work normally.
The results of Steps 101 and 102 are merged according to the above analysis and graded into anomaly levels; the grading of the results of Steps S02 and S03 is shown in the following table:
| Result | Code | Cause | Grade |
|---|---|---|---|
| Link traffic decreases; total forwarded data decreases | 00 | Link failure | Level III |
| Link traffic decreases; total forwarded data increases | 01 | Congestion | Level I |
| Link traffic increases; total forwarded data increases | 10 | Attack | Level II |
| Link traffic increases; total forwarded data decreases | 11 | Neighbor node failure | Level IV |
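The grading table is a lookup on the pair of trend directions reported by Steps 101 and 102. The following is a minimal Python sketch, assuming each detector has already reduced its result to an up/down direction; the names `Trend`, `GRADE_TABLE`, and `classify` are hypothetical and not part of the patent.

```python
from enum import Enum


class Trend(Enum):
    DOWN = "down"
    UP = "up"


# (Step 101 link trend, Step 102 forwarded-total trend) -> (code, cause, grade),
# transcribed from the grading table above.
GRADE_TABLE = {
    (Trend.DOWN, Trend.DOWN): ("00", "link failure", "III"),
    (Trend.DOWN, Trend.UP): ("01", "congestion", "I"),
    (Trend.UP, Trend.UP): ("10", "attack", "II"),
    (Trend.UP, Trend.DOWN): ("11", "neighbor node failure", "IV"),
}


def classify(link_trend: Trend, total_trend: Trend) -> tuple:
    """Return (code, cause, grade) for a single-fault observation."""
    return GRADE_TABLE[(link_trend, total_trend)]


print(classify(Trend.DOWN, Trend.UP))  # ('01', 'congestion', 'I')
```

Because only single faults are assumed, the four direction pairs are exhaustive, so a plain dictionary lookup is enough.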
The above merges the results and assigns anomaly levels, but the results require further filtering and analysis. Filtering means screening the alarm information first in order to reduce false alarms; three thresholds are defined here for excluding false alarms. Step 101 detects abrupt changes in link traffic, and an abrupt change does not necessarily mean that an anomaly has occurred, so two decision thresholds are defined for it: the duration $T_p$ of the traffic change and its amplitude $\delta$. If a change lasts less than $T_p$, it is regarded as white noise on normal traffic; if its amplitude is less than $\delta$, it is regarded as a normal variation in the network's own trend. Step 102 detects the router's overall forwarded volume; although a confidence interval has already been set, an alarm may still be caused by normal network variation, so a change amplitude $\mu$ for the router's forwarded data volume is also defined: if the amplitude is less than $\mu$, it is regarded as the normal variation of a network during peak periods on special occasions. The values of the three thresholds should be chosen according to the actual conditions of the network in which the method is deployed. Filtering here is only an analysis performed after alarms have been raised, and it cannot completely eliminate false or erroneous alarms.
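The three filters can be sketched as a single predicate applied after an alarm is raised. This sketch assumes amplitudes are expressed as relative changes (fractions of the baseline) and durations in seconds; the patent leaves units and threshold values to the deployment, and `Thresholds` and `keep_alarm` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    tp: float     # T_p: minimum duration of a link-traffic change (assumed seconds)
    delta: float  # delta: minimum relative amplitude of a link-traffic change
    mu: float     # mu: minimum relative amplitude of the forwarded-volume change


def keep_alarm(duration: float, link_amp: float, total_amp: float,
               th: Thresholds) -> bool:
    """Return True if an alarm survives all three false-alarm filters."""
    if duration < th.tp:            # too short: white noise on normal traffic
        return False
    if abs(link_amp) < th.delta:    # too small: the network's own trend
        return False
    if abs(total_amp) < th.mu:      # too small: normal (e.g. peak-period) variation
        return False
    return True


th = Thresholds(tp=30.0, delta=0.2, mu=0.1)   # illustrative values only
print(keep_alarm(5.0, -0.5, 0.3, th))   # False: change lasted less than T_p
print(keep_alarm(60.0, -0.5, 0.3, th))  # True
```

Taking absolute values lets the same thresholds apply to both upward and downward changes, matching the four cases in the table.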
Step S04: determine, according to the merged result, whether a burst damage event has occurred.
The specific method is:
Define a triple vector $A_i = \langle L_j, R_j, t \rangle$, where $A_i$ denotes an alarm group, $L_j$ and $R_j$ denote the fault type, and $t$ denotes the time at which the fault occurred. An event of Level I in the table above, in which the total amount of forwarded data increases, is preliminarily judged to be a burst damage event that affects network connectivity; events of Level II and above generate a trigger for protocol adjustment. When a burst damage event is detected in the network, the router that detected the anomaly sends this triple to its neighbor nodes; messages are sent in a directed manner to avoid a link-state advertisement (LSA) storm. When protocol adjustment is needed, the initiating node generates an adjustment message that is sent along the routing direction, and each intermediate node performs the corresponding adjustment according to the protocol upon receiving it. A backoff timer is started at the same time, and the node or link is not monitored while the timer runs; this prevents fault detection from picking up the change in system state and raising alarms again while the state is being adjusted.
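The alarm triple, the directed notification to neighbors, and the backoff timer can be sketched as follows. The transport hook `send`, the injectable `clock`, and the per-element suppression map are assumptions made for illustration; they are not the patent's message format, and all names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Alarm:
    """Triple A_i = <L_j, R_j, t> from Step S04."""
    link_fault: str    # L_j
    router_fault: str  # R_j
    t: float           # time at which the fault occurred


@dataclass
class Detector:
    neighbors: List[str]
    backoff: float                       # suppression window in seconds
    send: Callable[[str, Alarm], None]   # hypothetical transport hook
    clock: Callable[[], float] = time.monotonic
    _until: Dict[str, float] = field(default_factory=dict, init=False)

    def report(self, element: str, alarm: Alarm) -> bool:
        """Notify direct neighbors unless `element` is still in backoff."""
        now = self.clock()
        if now < self._until.get(element, float("-inf")):
            return False                 # state is being adjusted: stay silent
        for neighbor in self.neighbors:  # directed, not flooded (avoids LSA storms)
            self.send(neighbor, alarm)
        self._until[element] = now + self.backoff
        return True
```

For example, with `backoff=10.0`, a second `report("link1", ...)` within ten seconds of the first returns `False` and sends nothing; injecting `clock` keeps the timer testable without sleeping.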
Step S05: assess the overall traffic of the adjusted network to verify its correctness.
The specific steps are as follows:
Step 501: following the Kalman filter formulation, the whole network is treated as one system whose quantities of interest are the traffic values of the individual links, and this system is modeled. The control input is generally 0, system (process) noise is ignored here, and the observation noise is additive white Gaussian noise. The state equations of the system at time k are as follows:
$$\beta(k\mid k-1) = F(k,k-1)\,\beta(k-1\mid k-1)$$
$$P(k\mid k-1) = F(k,k-1)\,P(k-1\mid k-1)\,F(k,k-1)^{\mathsf T}$$
Step 502: the measured values are obtained through the network management protocol SNMP and correspond directly to the network traffic, so the observation matrix is C = 1 (the identity). This yields the following recursive formulas, where R is the covariance of the observation noise:
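$$\beta(k\mid k) = \beta(k\mid k-1) + K_g(k)\,\bigl(x(k) - \beta(k\mid k-1)\bigr)$$
$$K_g(k) = P(k\mid k-1)\,\bigl(P(k\mid k-1) + R\bigr)^{-1}$$
$$P(k\mid k) = \bigl(I - K_g(k)\bigr)\,P(k\mid k-1)$$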
Step 503: obtain the quantities corresponding to the model.
How each of these quantities is obtained is described below.
$F(k,k-1)$ is the state-transition matrix; here the transition matrix of the system is the routing matrix. $\beta(k)$ is the system state at time k; $x(k)$ is the measured value, which can be obtained through SNMP; $P(k)$ is the estimation-error covariance, which can be computed from $\beta(k) - \beta(k\mid k)$.
Step 504: for the Kalman filter to start working, it must be given two initial values at time zero, $X(0\mid 0)$ (the initial state estimate) and $P(0\mid 0)$; that is, initial values must be chosen. Here time K is taken as the initial time: $X(0\mid 0)$ is set to the per-link traffic values read from SNMP at time K, and $P(0\mid 0)$ is set to the constant 0. From these, the overall network traffic at the subsequent times K+1, K+2, ... is estimated.
Step 505: when the network topology changes, the recursion must be recomputed using the updated routing matrix.
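Steps 501 to 505 can be sketched with NumPy. This is a minimal, hedged sketch that assumes the transition matrix F is square (links × links), the observation matrix is the identity, process noise is zero as stated in Step 501, and R is the observation-noise covariance; `LinkTrafficKalman` and its method names are hypothetical.

```python
import numpy as np


class LinkTrafficKalman:
    """Per-link traffic estimator following Steps 501-505 (hedged sketch)."""

    def __init__(self, F, R, beta0, P0):
        self.F = np.asarray(F, dtype=float)         # F(k, k-1), assumed square
        self.R = np.asarray(R, dtype=float)         # observation-noise covariance
        self.beta = np.asarray(beta0, dtype=float)  # X(0|0): SNMP reading at time K
        self.P = np.asarray(P0, dtype=float)        # P(0|0)

    def predict(self):
        """beta(k|k-1) = F beta(k-1|k-1);  P(k|k-1) = F P F^T (no process noise)."""
        self.beta = self.F @ self.beta
        self.P = self.F @ self.P @ self.F.T
        return self.beta

    def update(self, x):
        """Fold in the SNMP measurement x(k); observation matrix C = I."""
        Kg = self.P @ np.linalg.inv(self.P + self.R)
        self.beta = self.beta + Kg @ (np.asarray(x, dtype=float) - self.beta)
        self.P = (np.eye(self.beta.size) - Kg) @ self.P
        return self.beta

    def set_transition(self, F):
        """Step 505: continue the recursion with the matrix for the new topology."""
        self.F = np.asarray(F, dtype=float)


kf = LinkTrafficKalman(F=np.eye(2), R=np.eye(2), beta0=[10.0, 20.0], P0=np.eye(2))
kf.predict()
print(kf.update([12.0, 20.0]))  # [11. 20.]
```

With process noise ignored, the choice $P(0\mid 0) = 0$ from Step 504 keeps $P(k\mid k-1)$, and therefore $K_g(k)$, at zero, so the estimate is propagated by F alone and the SNMP measurements are not folded in; this is why the sketch takes `P0` as an explicit parameter.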
Step 506: the model is evaluated by the mean squared error (MSE). Let $\delta_i = X_i - X_i'$ be the difference between the actual value $X_i$ and the estimated value $X_i'$; then
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \delta_i^{2}$$
The traffic value about to arrive is predicted, and the real traffic obtained afterwards is compared with the prediction to judge whether their absolute difference exceeds a preset threshold. Let the threshold be $T$, and let the difference between the real traffic $X_t$ and the predicted traffic $X_t^{*}$ at time $t$ be $d_t = |X_t - X_t^{*}|$. This difference is converted into a fault quantity for further judgment: when $d_t - \varepsilon > T$, a traffic anomaly has occurred, meaning that the network still contains abnormal links or nodes after the protocol adjustment. Here $\varepsilon = E[e_t^{2}]$ is the prediction error.
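The evaluation of Step 506 and the residual test can be sketched in plain Python. Here ε is passed in directly; estimating it as the mean of past squared prediction errors (that is, by calling `mse` on a history of predictions) is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def mse(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Step 506: MSE = (1/n) * sum(delta_i^2), with delta_i = X_i - X_i'."""
    if len(actual) != len(estimated) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)


def still_abnormal(x_t: float, x_pred: float, T: float, eps: float) -> bool:
    """Residual test: abnormal when d_t - eps > T, where d_t = |X_t - X_t*|."""
    d_t = abs(x_t - x_pred)
    return d_t - eps > T


history_actual = [100.0, 102.0, 98.0]
history_pred = [101.0, 100.0, 99.0]
eps = mse(history_actual, history_pred)   # (1 + 4 + 1) / 3 = 2.0
print(still_abnormal(130.0, 104.0, T=10.0, eps=eps))  # True: 26 - 2 > 10
```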
The embodiment of the invention also provides a system for measuring and evaluating network burst damage events, comprising:
a first detecting unit, configured to detect and obtain the traffic change information of each link related to the router;
a second detecting unit, configured to detect the total amount of data forwarded by the router and obtain the change information of the total forwarded data;
a processing unit, configured to merge and analyze the obtained per-link traffic information and the change information of the total forwarded data;
an identifying unit, configured to determine, according to the merged and analyzed result, whether a burst event has occurred.
The system further comprises:
a route adjustment unit, configured to adjust routes: when it is determined that the link has failed, a neighbor node has failed, or a node associated with the link is under attack, the relevant routes are adjusted.
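The units of the system can be wired together as in the following sketch. It assumes each detecting unit reports `"up"`, `"down"`, or `None` (no abrupt change, treated here as no event), and that route adjustment is triggered only for the three causes listed above, not for congestion; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# (link trend, forwarded-total trend) -> cause, as in the grading table
CAUSES = {
    ("down", "down"): "link failure",
    ("down", "up"): "congestion",
    ("up", "up"): "attack",
    ("up", "down"): "neighbor node failure",
}
# Causes for which the route adjustment unit acts
REROUTE_ON = {"link failure", "neighbor node failure", "attack"}


@dataclass
class BurstEventSystem:
    first_detector: Callable[[], Optional[str]]   # per-link trend, or None
    second_detector: Callable[[], Optional[str]]  # forwarded-total trend, or None
    adjust_route: Callable[[str], None]           # route adjustment unit

    def run_once(self) -> Optional[str]:
        """Processing and identifying units: merge both trends and decide."""
        link, total = self.first_detector(), self.second_detector()
        if link is None or total is None:
            return None                  # assumed: no abrupt change on one side
        cause = CAUSES[(link, total)]
        if cause in REROUTE_ON:
            self.adjust_route(cause)
        return cause
```

Passing the detectors and the route adjuster as callables keeps each unit replaceable, which mirrors the patent's remark that the units may run on one device or be distributed.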
In the method for measuring and evaluating burst damage events provided by the embodiment of the invention, the traffic change information of each link related to the router is detected and obtained; the total amount of data forwarded by the router is detected to obtain its change information; the obtained per-link traffic information and the change information of the total forwarded data are merged and analyzed; and it is thereby determined whether a burst event has occurred.
The invention adopts a distributed traffic anomaly detection strategy: each node performs traffic prediction and anomaly detection independently during data collection, so the added network load is small. No extra hardware support is required and the resource overhead is low. The method safeguards its own security as well as the reliability and availability of the network segment it manages, detects network burst damage events in time, and provides early-warning information for emergency response personnel.
Those skilled in the art should understand that each unit or step of the invention described above can be implemented by a general-purpose computing device. The units or steps can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can be made into individual integrated circuit modules, or several of the units or steps can be made into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A method for measuring and evaluating network burst events, characterized by comprising:
detecting and obtaining the traffic change information of each link related to a router;
detecting the total amount of data forwarded by the router to obtain the change information of the total forwarded data;
merging and analyzing the obtained per-link traffic change information and the change information of the total forwarded data;
determining, according to the merged and analyzed result, whether a burst event has occurred;
when a burst damage event is detected in the network, sending, by the router that detected the anomaly, triple vector information to its neighbor nodes; and when protocol adjustment is needed, generating, by an initiating node, an adjustment message that is sent along the routing direction, each intermediate node performing the corresponding adjustment according to the protocol after receiving the adjustment message.
2. The method according to claim 1, characterized in that detecting and obtaining the traffic change information of each link related to the router specifically comprises:
collecting the link traffic values related to the router and aggregating the obtained data flows;
storing the aggregation results in odd and even groups, and then comparing the aggregate sets of adjacent equal-length windows, using the aggregation result as the threshold for measuring the magnitude of change; within a predetermined length of time, determining from the amplitude of change of the network traffic relative to past historical traffic whether a burst damage event has occurred, and recording the link that has failed as a faulty link.
3. The method according to claim 1, characterized in that detecting the total amount of data forwarded by the router to obtain the change information of the total forwarded data specifically comprises:
collecting information on the amount of data forwarded by the router and adaptively sampling it to obtain a time series;
choosing a sliding window in the time series as a reference and examining the next measurement adjacent to the sliding window to obtain the amplitude of change of the data traffic; when the amplitude of change exceeds a confidence interval, determining that the measurement is abnormal, and recording the node that has failed as a faulty node.
4. The method according to claim 1, characterized in that determining, according to the merged and analyzed result, whether a burst event has occurred specifically comprises:
if the link traffic changes abruptly and decreases to a predetermined value, and the total forwarded data changes abruptly and decreases to a predetermined amount, determining that the link has failed;
if the link traffic changes abruptly and decreases to a predetermined value, and the total forwarded data changes abruptly and increases to a predetermined amount, determining that the link is congested;
if the link traffic changes abruptly and increases to a predetermined value, and the total forwarded data changes abruptly and decreases to a predetermined amount, determining that a neighbor node has failed;
if the link traffic changes abruptly and increases to a predetermined value, and the total forwarded data changes abruptly and increases to a predetermined amount, determining that a node associated with the link is under attack.
5. The method according to claim 4, characterized in that
an abrupt change in the link traffic is one in which the traffic increases or decreases for at least a predetermined duration and the amplitude of the increase or decrease reaches a predetermined threshold.
6. The method according to claim 1, characterized by further comprising:
performing route adjustment when it is determined that the link has failed, a neighbor node has failed, or a node associated with the link is under attack.
7. A system for measuring and evaluating network burst events, characterized by comprising:
a first detecting unit, configured to detect and obtain the traffic change information of each link related to a router;
a second detecting unit, configured to detect the total amount of data forwarded by the router and obtain the change information of the total forwarded data;
a processing unit, configured to merge and analyze the obtained per-link traffic change information and the change information of the total forwarded data;
an identifying unit, configured to determine, according to the merged and analyzed result, whether a burst event has occurred; wherein, when a burst damage event is detected in the network, the router that detected the anomaly sends triple vector information to its neighbor nodes, and when protocol adjustment is needed, an initiating node generates an adjustment message that is sent along the routing direction, each intermediate node performing the corresponding adjustment according to the protocol after receiving the adjustment message.
8. The system according to claim 7, characterized by further comprising:
a route adjustment unit, configured to adjust routes, wherein the relevant routes are adjusted when it is determined that the link has failed, a neighbor node has failed, or a node associated with the link is under attack.
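Claim 3 describes the Step 102 detector only in outline. A minimal sketch follows, assuming the confidence interval is the reference window's mean ± z standard deviations (a Gaussian assumption the patent does not state) and omitting adaptive sampling; `window_anomalies` is a hypothetical name.

```python
import statistics
from typing import List, Sequence, Tuple


def window_anomalies(series: Sequence[float], window: int,
                     z: float = 3.0) -> List[Tuple[int, str]]:
    """Flag each point that falls outside mean +/- z*stdev of the preceding window."""
    flagged = []
    for i in range(window, len(series)):
        ref = series[i - window:i]            # sliding reference window
        centre = statistics.fmean(ref)
        spread = z * statistics.pstdev(ref)
        value = series[i]                     # next adjacent measurement
        if value > centre + spread:
            flagged.append((i, "up"))
        elif value < centre - spread:
            flagged.append((i, "down"))
    return flagged


print(window_anomalies([100, 101, 99, 100, 102, 98, 100, 160], window=5))  # [(7, 'up')]
```

The direction tag is what the merging step consumes: an `"up"` or `"down"` result for the forwarded total is combined with the per-link trend to select a row of the grading table.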

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100076279A CN101483547B (en) 2009-02-12 2009-02-12 Evaluation method and system for network burst affair

Publications (2)

Publication Number Publication Date
CN101483547A CN101483547A (en) 2009-07-15
CN101483547B true CN101483547B (en) 2011-05-11

Family

ID=40880489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100076279A Expired - Fee Related CN101483547B (en) 2009-02-12 2009-02-12 Evaluation method and system for network burst affair

Country Status (1)

Country Link
CN (1) CN101483547B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355381B (en) * 2011-08-18 2014-03-12 网宿科技股份有限公司 Method and system for predicting flow of self-adaptive differential auto-regression moving average model
US9948532B2 (en) * 2013-12-11 2018-04-17 Mitsubishi Electric Corporation Information processing apparatus, information processing method, and computer readable medium
CN103872640B (en) * 2014-03-21 2016-05-11 国家电网公司 A kind of distribution power automation terminal unit off-line fault rapidly self-healing control method
CN104156524B (en) * 2014-08-01 2018-03-06 河海大学 The Aggregation Query method and system of transport data stream
WO2016112484A1 (en) * 2015-01-12 2016-07-21 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for router maintenance
CN105072418B (en) * 2015-08-27 2019-01-15 浙江宇视科技有限公司 A kind of method and apparatus that judgement monitoring frontend is offline
CN105376614A (en) * 2015-10-27 2016-03-02 南京创维信息技术研究院有限公司 Video quality optimizing method and device
CN105592151A (en) * 2015-12-18 2016-05-18 畅捷通信息技术股份有限公司 Data-processing method and device
US20180005127A1 (en) * 2016-06-29 2018-01-04 Alcatel-Lucent Usa Inc. Predicting problem events from machine data
JP6611353B2 (en) * 2016-08-01 2019-11-27 クラリオン株式会社 Image processing device, external recognition device
CN106656590B (en) * 2016-12-14 2019-09-27 北京亿阳信通科技有限公司 A kind for the treatment of method and apparatus of network equipment alarm information storm
WO2019153337A1 (en) * 2018-02-12 2019-08-15 深圳前海达闼云端智能科技有限公司 Network quality evaluation method and apparatus, network detection device, and readable storage medium
CN108881295A (en) 2018-07-24 2018-11-23 瑞典爱立信有限公司 For detecting and solving the method and the network equipment of anomalous routes
CN109639485A (en) * 2018-12-13 2019-04-16 国家电网有限公司 The monitoring method and device of electricity consumption acquisition communication link
CN110034956A (en) * 2019-03-27 2019-07-19 广州供电局有限公司 Network Data Control method, apparatus, computer equipment and storage medium
CN112132495A (en) * 2019-06-25 2020-12-25 顺丰科技有限公司 State machine quantization method, device, equipment and storage medium
CN110995525A (en) * 2019-10-31 2020-04-10 北京直真科技股份有限公司 Router detection method based on maintenance matrix
CN111147899B (en) * 2019-12-16 2023-05-23 南京亚信智网科技有限公司 Fault early warning method and device
CN111404770B (en) * 2020-02-29 2022-11-11 华为技术有限公司 Network device, data processing method, device and system and readable storage medium
US11432313B2 (en) * 2020-05-29 2022-08-30 Apple Inc. Detecting cellular network bottlenecks through analysis of resource allocation patterns

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030926A (en) * 2006-02-28 2007-09-05 华为技术有限公司 Method for controlling network data flow of global microwave access inter-operation
CN101123587A (en) * 2007-09-13 2008-02-13 杭州华三通信技术有限公司 Traffic control method and device for switch service flow

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110511

Termination date: 20200212