CN108923975A

CN108923975A - A traffic behavior analysis method based on a distributed network

Info

Publication number: CN108923975A
Application number: CN201810728186.0A
Authority: CN
Inventors: 马海寿; 谢逸; 王臻
Original assignee: National Sun Yat Sen University
Current assignee: Sun Yat Sen University; National Sun Yat Sen University
Priority date: 2018-07-05
Filing date: 2018-07-05
Publication date: 2018-11-30
Anticipated expiration: 2038-07-05
Also published as: CN108923975B

Abstract

The present invention provides a traffic behavior analysis method based on a distributed network. The method comprises: deploying a network traffic collection scheme; collecting historical traffic data; training a model; obtaining a traffic behavior model; collecting real-time traffic data; and estimating the global traffic behavior of the network. The invention treats the entire distributed network as a whole. By collecting traffic information at network nodes and exploiting the spatio-temporal context of node traffic behavior, it analyzes the intrinsic behavior states of network traffic, monitors the global traffic behavior of the network, and can assist network management tasks such as resource scheduling and anomaly detection.

Description

A traffic behavior analysis method based on a distributed network

Technical field

The present invention relates to the field of network management applications, and more particularly to a traffic behavior analysis method based on a distributed network.

Background art

With the rapid development of network information technology, the unprecedented growth of network scale, and the widespread use of network applications, networks have become deeply integrated into politics, the economy, culture, and other fields. The diversity of networks brings great convenience to people's work and life; at the same time, their complexity makes network management and maintenance harder and confronts network administrators with many difficulties. The rollout of IPv6 is driving the transition of network protocols from IPv4 to IPv6, so IPv4/IPv6 dual-stack operation is common, which makes network faults harder to troubleshoot. Access networks of many kinds, including wireless local area networks, wireless metropolitan area networks, and public mobile communication networks, connect to the Internet and enlarge its scale, and this heterogeneity complicates operation and maintenance. The emergence of cloud computing, the rise of social networks, and the development of multimedia technology make application traffic complex and variable; in severe cases it can crowd out the bandwidth of regular services, which challenges the optimal allocation of network bandwidth. In addition, a steady stream of network security problems disrupts the normal operation of operator and enterprise networks and often causes economic losses, so security management has become an important task for network administrators.

To solve the above problems in complex network environments, strengthen network management capabilities, and establish a stable and secure network environment, academia and industry have proposed many methods for analyzing network behavior. These fall into single-point-oriented and multi-point-oriented traffic analysis methods.

Among the single-point-oriented methods, the paper "Zhao D, Traore I, Sayed B, et al. Botnet detection based on traffic behavior analysis and flow intervals[J]. Computers & Security, 2013, 39(4): 2-16." proposes a network traffic analysis method that mainly analyzes traffic features in a communication network, including source and destination IP addresses, source and destination ports, protocol, and packet length, in order to detect zombie (bot) hosts. The method is deployed at key network nodes, analyzes the traffic features of those nodes, and detects zombie hosts with a decision tree algorithm. A published patent discloses an anomaly detection method based on network traffic analysis: through in-depth analysis of IP packets it proposes a fairly complete initial feature set for network traffic, dynamically selects the feature subset used for anomaly detection according to the type of network anomaly, and finally uses a Bayes classifier to predict the class of unknown samples from that subset. NSFOCUS has released a network traffic analysis product (NSFOCUS network traffic analysis system, http://www.nsfocus.com.cn/products/details_22_2.html) that collects traffic information from routing devices through the Simple Network Management Protocol (SNMP), NetFlow, and similar protocols, and analyzes it along several dimensions, including the traffic conditions, traffic composition, and traffic trends in the network.

Among the multi-point-oriented methods, the paper "Jiang D, Xu Z, Zhang P, et al. A transform domain-based anomaly detection approach to network-wide traffic[J]. Journal of Network & Computer Applications, 2014, 40(C): 292-306." proposes a transform-domain anomaly detection method for studying network-wide traffic behavior. It uses the traffic of origin-destination (OD) pairs that share the same destination node, treats the traffic as a time series, obtains its time-frequency signal with the S-transform, and detects network-wide abnormal traffic by comparing the different characteristics of normal and abnormal traffic in the high-frequency components. The paper "Li Y, Luo X, Qian Y, et al. Network-Wide Traffic Anomaly Detection and Localization Based on Robust Multivariate Probabilistic Calibration Model[J]. Mathematical Problems in Engineering, 2015, 2015(1): 1-26." proposes a method for detecting and localizing network-wide abnormal traffic. It measures the traffic of OD pairs, such as packet counts, byte counts, and flow counts, to build a traffic matrix, constructs a model of normal traffic behavior using a latent-variable probabilistic method based on the multivariate t distribution, and performs anomaly detection and localization by evaluating the Mahalanobis distance of samples. A patent by Li Zhipeng discloses a network traffic analysis system and method: the system first collects the raw traffic information of each node through a traffic collection module, then extracts the application-layer traffic information from it, and finally compares application-layer traffic statistics to determine whether abnormal traffic exists in the application system, achieving application-layer analysis based on network traffic. A patent by Guo Zulong discloses a distributed network traffic analysis system and method: the system first collects traffic information through a traffic collection module, then extracts the network-layer, transport-layer, and application-layer information from the raw traffic, and processes it to analyze the total traffic, IP-to-IP traffic data, IP-layer network data, and application-layer protocol information. Colasoft has released a full-traffic security analysis product (Colasoft network full-traffic security analysis system, https://app.huaweicloud.com/product/00301-55020-0--0) that captures and stores all traffic on network links, analyzes the complete data, and is highly sensitive to abnormal network behavior.

The above methods can solve various network problems to some extent, but they also have limitations:

(1) A single-point-oriented traffic analysis method can only obtain the traffic passing through that node and can only analyze local traffic information, so it is difficult to gain an understanding of overall network traffic behavior. Network management, however, cannot rely on local information alone or address only local problems; globally optimal schemes and strategies must be formulated from a network-wide perspective.

(2) Multi-point-oriented traffic analysis methods usually correlate the traffic data of multiple nodes, but they do not exploit network topology information or the temporal and spatial context relations created by network interconnection. They therefore struggle to characterize the intrinsic traffic behavior states among network nodes and the overall traffic behavior state of the whole network. For a network with the characteristics of a complex system, a superposition of local views can hardly reflect network-wide behavior.

Summary of the invention

To overcome the limitations of the prior art, the present invention proposes a traffic behavior analysis method based on a distributed network. The method treats the network as a whole and uses the spatio-temporal context of node traffic behavior to analyze global network traffic behavior. It can reveal the intrinsic traffic behavior states among network nodes and the overall traffic behavior state of the whole network, giving network administrators a global understanding of the network they manage.

To achieve this object, the following technical solution is adopted:

A traffic behavior analysis method based on a distributed network, capable of analyzing the global traffic behavior of the network, specifically comprising:

Model training stage: historical network traffic data are collected as training data for the model, and a network traffic behavior model is obtained;

Study stage: the collected real-time traffic data are input into the trained network traffic behavior model, and the global traffic behavior of the network is obtained by iterative calculation under the maximum a posteriori (MAP) estimation criterion.

Preferably, before traffic data are collected, the method further includes deploying network probes in the managed network to collect traffic data. Specifically, probes are deployed at network nodes to collect traffic data of different granularities and different protocol layers, and the data are transmitted to a traffic analysis center for analysis.

Preferably, the model training stage specifically comprises: determining the structure of the network traffic behavior model and estimating the model parameters;

Determining the structure of the network traffic behavior model:

The traffic behavior information of the distributed network is divided into two layers: a hidden state layer and an observation data layer. The observation data layer consists of the node traffic data measured by the network probes. The hidden state layer consists of the behavior patterns of the network nodes; it represents the intrinsic driving factors in the network that directly drive the external manifestation of node traffic. Hidden states and observations are represented by random variables, so the hidden state layer and the observation data layer form two random fields, namely the hidden state field and the observation field;

Definition of mathematical symbols: in a network with $N$ nodes, $\mathcal{N}=\{1,2,\dots,N\}$ denotes the set of network nodes and $x_{t,n}\in\mathcal{X}$ denotes the $n$-th node in the $t$-th time slot, where $\mathcal{X}$ denotes the set of all spatio-temporal node positions, with $t\in[1,T]$, $n\in\mathcal{N}$, and $T$ the number of time slots. $S_{t,n}$ denotes the hidden state variable of node $x_{t,n}$ and $s_{t,n}\in\mathcal{M}$ denotes an instance of the random variable $S_{t,n}$, where $\mathcal{M}$ is the set of hidden states; then $S=\{S_{t,n}: x_{t,n}\in\mathcal{X}\}$ is the family of hidden state random variables defined on $\mathcal{X}$. $S$ can therefore represent the hidden state field on $[1,T]$, and $s\in\mathcal{S}$ denotes one configuration of $S$, where $\mathcal{S}$ is the set of all possible configurations of the hidden state field. Similarly, $O_{t,n}$ denotes the observation variable of node $x_{t,n}$ and $o_{t,n}\in\mathcal{K}$ denotes an instance of the random variable $O_{t,n}$, where $\mathcal{K}$ is the set of observation values; then $O=\{O_{t,n}: x_{t,n}\in\mathcal{X}\}$ is the family of observation random variables defined on $\mathcal{X}$. $O$ can therefore represent the observation field on $[1,T]$, and $o\in\mathcal{O}$ denotes one configuration of $O$, where $\mathcal{O}$ is the set of all possible configurations of the observation field;

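A Hidden Markov Random Field (HMRF) model is used to characterize the spatio-temporal evolution relationship between the hidden state field and the observation field;

For the hidden state field, one assumption is introduced: spatially, the state of a node depends only on the states of its one-hop neighbor nodes; temporally, it depends only on the node's own state at the previous time. Based on statistical learning methods, the probability of the hidden state field can be obtained by the following formula:

$$\Pr[S=s\mid\lambda]\approx\prod_{x_{t,n}\in\mathcal{X}}\Pr\big[s_{t,n}\mid s_{\mathcal{X}\setminus x_{t,n}},\lambda\big]=\prod_{x_{t,n}\in\mathcal{X}}\Pr\big[s_{t,n}\mid s_{\mathcal{N}^{S}_{t,n}},s_{\mathcal{N}^{T}_{t,n}},\lambda\big]\qquad(1)$$

where $\mathcal{X}\setminus x_{t,n}$ denotes the set of spatio-temporal node positions of the network excluding node $x_{t,n}$, $s_{\mathcal{N}^{S}_{t,n}}$ and $s_{\mathcal{N}^{T}_{t,n}}$ denote the states of the spatial neighbors and of the temporal neighbor of node $x_{t,n}$ respectively, and $\lambda$ denotes the hidden state field parameters;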

The local probability in formula (1) is obtained by the following formula:

where $m$ denotes a node state. The temporal transition probability is calculated from the temporal hidden state transition probability matrix $A$, which is the state transition probability matrix of the hidden state from time $t$ to time $t+1$; that is, the temporal transitions of the hidden state form a first-order Markov chain. With $M$ the number of hidden states, the matrix $A$ is given by:

$$A=\begin{bmatrix}P_{11}&\cdots&P_{1M}\\ \vdots&\ddots&\vdots\\ P_{M1}&\cdots&P_{MM}\end{bmatrix}$$

where the subscripts $i$ and $j$ of $P_{ij}$ denote the hidden states of the node at times $t$ and $t+1$ respectively. The spatial transition probability is obtained by the following formula:

where $U_{t,n}(m)$ denotes the edge energy function, $\mathcal{N}^{S}_{t,n}$ denotes the spatial neighbor nodes of node $x_{t,n}$, and $|\mathcal{N}^{S}_{t,n}|$ denotes the number of spatial neighbor nodes of node $x_{t,n}$. The potential function is defined as $V_{t,n}(m)=\mathrm{num}\cdot\alpha$, where the parameter $\alpha$ characterizes the strength of the mutual influence between the current node and its spatial neighbor nodes, and $\mathrm{num}$ denotes the number of spatial neighbor nodes whose state differs from the state of the current node;
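As an illustration of how the potential function shapes the spatial term, the following minimal Python sketch turns V_t,n(m) = num·α into a probability over the candidate states of one node. The Gibbs form (the exponential of the negative potential, normalized over all states) is an assumption made here for illustration, as are the function name `spatial_prob` and its arguments; any normalization by the number of spatial neighbors is left out.

```python
import numpy as np

def spatial_prob(neighbor_states, num_states, alpha):
    """Probability of each candidate state m for one node, given its one-hop neighbors.

    Assumed form: Pr[m] is proportional to exp(-V(m)), with V(m) = num * alpha and
    num the number of spatial neighbors whose state differs from m.
    """
    neighbor_states = np.asarray(neighbor_states)
    energy = np.array(
        [alpha * np.sum(neighbor_states != m) for m in range(num_states)],
        dtype=float,
    )
    # Subtract the minimum energy before exponentiating for numerical stability.
    weights = np.exp(-(energy - energy.min()))
    return weights / weights.sum()

# Two neighbors in state 0 and one in state 1: state 0 is the most likely.
probs = spatial_prob([0, 0, 1], num_states=3, alpha=1.0)
```

A larger `alpha` concentrates the distribution on the state shared by most neighbors, while `alpha = 0` makes all states equally likely, matching the role the text gives to α as the strength of neighbor influence.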

For the observation field, the node observations are obtained by the network probes, i.e. the observation field $o$ of the network is known data. If the observation of a node depends only on the state of that node, the output probability of the observation field driven by the hidden states is obtained by the following formula:

$$\Pr[O=o\mid S=s,\theta]=\prod_{(t,n)}\Pr\big[O_{t,n}=o_{t,n}\mid S_{t,n}=s_{t,n},\theta_{s_{t,n}}\big]\qquad(2)$$

where the subscript $(t,n)$ of the product symbol ranges over all $x_{t,n}\in\mathcal{X}$, and $\Pr[O_{t,n}=k\mid S_{t,n}=m,\theta_m]$ denotes the probability that node $n$ outputs observation $k$ at time $t$ given that its state is $m$. For ease of calculation, the observations $O_{t,n}$ are discretized and frequencies are used to approximate probabilities; that is, the conditional probability is approximated by the frequency distribution of the observations under state $m$. The parameter $\theta_m$ denotes the distribution parameters of the observations under a specific state; here it is represented by the output probability matrix $B$, referred to as the observation field parameters. With $M$ hidden states and $K$ discrete observation values, the matrix $B$ is given by:

$$B=\begin{bmatrix}P_{11}&\cdots&P_{1K}\\ \vdots&\ddots&\vdots\\ P_{M1}&\cdots&P_{MK}\end{bmatrix}$$

where $P_{mk}$ denotes the probability that a node in state $m$ outputs observation $k$;

The structure of the network traffic behavior model is thus determined. The network traffic behavior model is characterized by the HMRF model, so the model parameters are Ω={A, α, B}.
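The parameter set Ω={A, α, B} can be held in a small container. The sketch below is one possible representation, assuming NumPy arrays for the two matrices; the class name `HMRFParams` and the method `is_valid` are hypothetical and are not part of the described method.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class HMRFParams:
    A: np.ndarray   # (M, M) temporal hidden state transition probabilities
    alpha: float    # strength of the interaction between spatial neighbors
    B: np.ndarray   # (M, K) output probabilities of observation k under state m

    def is_valid(self) -> bool:
        """Check shapes and that every row of A and B is a probability distribution."""
        m = self.A.shape[0]
        if self.A.shape != (m, m) or self.B.ndim != 2 or self.B.shape[0] != m:
            return False
        rows_ok = np.allclose(self.A.sum(axis=1), 1.0) and np.allclose(self.B.sum(axis=1), 1.0)
        nonneg = (self.A >= 0).all() and (self.B >= 0).all()
        return bool(rows_ok and nonneg and self.alpha > 0)

params = HMRFParams(
    A=np.array([[0.9, 0.1], [0.2, 0.8]]),
    alpha=1.0,
    B=np.array([[0.7, 0.3, 0.0], [0.1, 0.1, 0.8]]),
)
```

Checking that each row sums to 1 guards against passing raw frequency counts where probabilities are expected.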

Estimating the model parameters:

After the historical traffic data have been collected and the structure of the traffic behavior model has been determined, the model parameters Ω={A, α, B} are trained on the historical traffic data. To ease practical engineering use, frequencies are used to approximate probabilities during the calculation, so the observations O_t,n must be discretized before the calculation;
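The text does not specify how the observations are discretized. A minimal sketch, assuming equal-width binning between the smallest and largest observed value (the function name `discretize` and the parameter `num_levels` are illustrative), could look like this:

```python
import numpy as np

def discretize(observations, num_levels):
    """Map real-valued observations to integer levels 0 .. num_levels - 1 (equal-width bins)."""
    obs = np.asarray(observations, dtype=float)
    lo, hi = obs.min(), obs.max()
    if hi == lo:
        # All observations are identical, so they all fall into level 0.
        return np.zeros(obs.shape, dtype=int)
    # Interior bin edges only, so the maximum value lands in the last level.
    edges = np.linspace(lo, hi, num_levels + 1)[1:-1]
    return np.digitize(obs, edges)

levels = discretize([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], num_levels=4)
```

The resulting integer levels can then index the columns of the output probability matrix B directly.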

The training process takes the historical traffic data o, i.e. the network node observations, as input and outputs the model parameters Ω={A, α, B}. The steps for estimating the model parameters are as follows:

(3-1) Initialize the iteration round counter i, the iteration stop condition Iter, and the initial hidden state field s⁽¹⁾;

The iteration round counter i is initialized to 1. The iteration stop condition is set as a number of iterations Iter, which from experience is preferably 5 to 8. Alternatively, the stop condition can be a threshold on the change in the parameters between two successive iterations, with iteration stopping when the change falls below the given threshold. The initial hidden state field s⁽¹⁾ is initialized by applying a clustering algorithm to the historical traffic observations. The number of clusters is determined by the actual network monitoring requirements and corresponds to the number of behavior states of a network node. It therefore reflects the granularity at which network traffic behavior is characterized: the more behavior states, the finer the traffic behavior that can be characterized;

(3-2) Update the model parameters according to the current configuration of the hidden state field. The temporal hidden state transition probability matrix $A$ is updated from the frequencies of state jumps over time: if the number of times a node is in state $i$ at time $t$ and moves to state $j$ at time $t+1$ is denoted $A_{ij}$, the estimate of the state transition probability $P_{ij}$ in $A$ is given by:

$$\hat{P}_{ij}=\frac{A_{ij}}{\sum_{j'}A_{ij'}}$$

The value of $\alpha$ is determined empirically, preferably between 0.5 and 10. The larger $\alpha$ is, the stronger the interaction between nodes and the greater the influence of the neighbor states on the state of the current node, and vice versa. The output probability matrix $B$ is updated from the frequency distribution of the observations output under each state: if the number of samples in which the state is $m$ and the observation is $k$ is $B_{mk}$, the estimate of the output probability $P_{mk}$ in $B$ is given by:

$$\hat{P}_{mk}=\frac{B_{mk}}{\sum_{k'}B_{mk'}}$$

Meanwhile, the iteration round counter is incremented by 1, i.e. i = i + 1;

(3-3) Determine whether the stop condition is met, i.e. whether i > Iter;

3-3-1) If the result is no, update the hidden state field s⁽ⁱ⁾ using the behavior state estimation process. Its inputs are the historical traffic data o and the current model parameters, and its output is the updated hidden state field s⁽ⁱ⁾. The behavior state estimation process uses the current hidden state field s⁽ⁱ⁻¹⁾ as its initial hidden state field;

Return to step (3-2);

3-3-2) If the result is yes, output the final model parameters Ω={A, α, B};

Following the above steps, the model parameters Ω={A, α, B} can be obtained by training on the historical traffic data and serve as the traffic behavior model.
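Steps (3-1) to (3-3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: observations are already discretized into integer levels in a `(T, N)` array, the clustering in step (3-1) is a simple one-dimensional k-means, α is kept at a fixed empirical value, and the behavior state estimation process of step 3-3-1) is passed in as the hypothetical callable `update_states`. All function names are illustrative.

```python
import numpy as np

def init_states(obs, num_states, num_iters=20):
    """Step (3-1): initial hidden state field from 1-D k-means on the observations."""
    flat = np.asarray(obs, dtype=float).ravel()
    centers = np.quantile(flat, np.linspace(0.0, 1.0, num_states))
    for _ in range(num_iters):
        labels = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
        for m in range(num_states):
            if np.any(labels == m):
                centers[m] = flat[labels == m].mean()
    labels = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
    return labels.reshape(np.shape(obs))

def normalize_rows(counts):
    """Turn counts into row-wise frequencies; rows without samples become uniform."""
    sums = counts.sum(axis=1, keepdims=True)
    uniform = np.full_like(counts, 1.0 / counts.shape[1])
    return np.divide(counts, sums, out=uniform, where=sums > 0)

def estimate_A(states, num_states):
    """Step (3-2): frequency of jumps from state i at time t to state j at time t+1."""
    counts = np.zeros((num_states, num_states))
    np.add.at(counts, (states[:-1].ravel(), states[1:].ravel()), 1.0)
    return normalize_rows(counts)

def estimate_B(states, obs, num_states, num_levels):
    """Step (3-2): frequency of observation k among samples whose state is m."""
    counts = np.zeros((num_states, num_levels))
    np.add.at(counts, (states.ravel(), obs.ravel()), 1.0)
    return normalize_rows(counts)

def train(obs, num_states, num_levels, alpha, update_states, max_iters=5):
    """Alternate parameter updates and state updates until i > max_iters."""
    states = init_states(obs, num_states)
    i = 1
    while True:
        A = estimate_A(states, num_states)                    # step (3-2)
        B = estimate_B(states, obs, num_states, num_levels)   # step (3-2)
        i += 1
        if i > max_iters:                                     # step (3-3)
            return A, alpha, B
        states = update_states(obs, (A, alpha, B), states)    # step 3-3-1)
```

With `max_iters=5` the parameters are updated five times and the hidden state field four times, which mirrors the control flow of the step list. Rows of A or B that receive no samples are set to a uniform distribution here; that fallback is a choice made for this sketch.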

Preferably, the process of the study stage is:

From the obtained traffic behavior model and the collected real-time traffic data of the network nodes, the intrinsic traffic behavior states of the network nodes can be estimated.

This process is equivalent to estimating a configuration $\hat{s}$ of the hidden state field given the model parameters $\Omega$ and the observation field $o$. Under the MAP estimation criterion, finding the optimal hidden state field estimate $\hat{s}$ is equivalent to solving:

$$\hat{s}=\arg\max_{s}\Pr[s\mid o,\Omega]$$

By Bayes' theorem, $\Pr[s\mid o,\Omega]=\dfrac{\Pr[o\mid s,\Omega]\cdot\Pr[s\mid\Omega]}{\Pr[o]}$. Since $\Pr[o]$ is a constant, $\Pr[s\mid o,\Omega]\propto\Pr[o\mid s,\Omega]\cdot\Pr[s\mid\Omega]$, where the prior probability $\Pr[s\mid\Omega]$ and the likelihood $\Pr[o\mid s,\Omega]$ are calculated by formulas (1) and (2) respectively, that is:

$$\hat{s}=\arg\max_{s}\ \Pr[o\mid s,\Omega]\cdot\Pr[s\mid\Omega]$$

The globally optimal hidden state field of the network is obtained by iterative calculation. The behavior state estimation process takes the real-time traffic data o and the model parameters Ω={A, α, B} as input and outputs the estimated hidden state field $\hat{s}$ of the network. Its steps are as follows:

(4-1) Initialize the iteration round counter i, the initial hidden state field s⁽⁰⁾, and the iteration stop condition Iter;

The iteration round counter i is initialized to 1. The state field is initialized either from prior knowledge of the relationship between the state field and the observation field, or by applying a clustering algorithm to the observation field. The iteration stop condition is set as a number of iterations Iter, which from experience is preferably 3 to 5;

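(4-2) For each node in the network, traverse all possible state values and, under the MAP estimation criterion, select the state value with the highest probability as the node's estimate for the current iteration round, which is equivalent to solving:

$$s^{(i)}_{t,n}=\arg\max_{m}\ \Pr\big[O_{t,n}=o_{t,n}\mid S_{t,n}=m,\theta_m\big]\cdot\Pr\big[S_{t,n}=m\mid s_{\mathcal{N}^{S}_{t,n}},s_{\mathcal{N}^{T}_{t,n}},\lambda\big]$$

where $s_{\mathcal{N}^{S}_{t,n}}$ and $s_{\mathcal{N}^{T}_{t,n}}$ are the current state estimates of the node's spatial neighbors and temporal neighbor;

Meanwhile, the iteration round counter is incremented by 1, i.e. i = i + 1;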

(4-3) Determine whether the stop condition is met, i.e. whether i > Iter;

4-3-1) If the result is no, return to step (4-2) and update the state of each node again;

4-3-2) If the result is yes, output the final state field estimate $\hat{s}$. The global traffic behavior state of the network is thus obtained, and the network administrator can use it to monitor network-wide behavior.
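Steps (4-1) to (4-3) amount to a node-by-node iterative update of the hidden state field. The Python sketch below illustrates this under several assumptions that go beyond the text: observations are discretized integer levels in a `(T, N)` array, the network topology is given as a list of one-hop neighbor indices per node (`adjacency`), the temporal prior uses only the node's own state in the previous time slot, and the spatial prior takes the Gibbs form exp(−α·num). The function and argument names are illustrative.

```python
import numpy as np

def estimate_states(obs, A, B, alpha, adjacency, init_states, num_iters=3):
    """Iteratively pick, for every node and time slot, the state with the highest posterior."""
    states = np.array(init_states, dtype=int)   # step (4-1): initial hidden state field
    num_slots, num_nodes = obs.shape
    num_states = A.shape[0]
    eps = 1e-12                                 # avoids log(0) for zero-probability entries
    for _ in range(num_iters):                  # steps (4-2) and (4-3)
        for t in range(num_slots):
            for n in range(num_nodes):
                # Likelihood of the observed level under each candidate state.
                log_post = np.log(B[:, obs[t, n]] + eps)
                # Temporal prior from the node's own state in the previous slot.
                if t > 0:
                    log_post = log_post + np.log(A[states[t - 1, n], :] + eps)
                # Spatial prior: penalize disagreement with one-hop neighbors.
                neigh = states[t, np.asarray(adjacency[n], dtype=int)]
                disagree = np.array([np.sum(neigh != m) for m in range(num_states)])
                log_post = log_post - alpha * disagree
                states[t, n] = int(np.argmax(log_post))
    return states

# Three nodes in a line; one noisy observation at time slot 1, node 1.
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.8, 0.2], [0.2, 0.8]])
obs = np.zeros((3, 3), dtype=int)
obs[1, 1] = 1
estimated = estimate_states(obs, A, B, alpha=1.0,
                            adjacency=[[1], [0, 2], [1]], init_states=obs.copy())
```

In this small example the isolated observation at node 1 is overridden by the temporal and spatial context, so the estimated state field is uniform. Working with log-probabilities keeps the product of likelihood and prior numerically stable.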

Compared with the prior art, the technical solution of the present invention has the following beneficial effects: the present invention discloses a traffic behavior analysis method based on a distributed network. Network probes are deployed in the managed network area to collect node traffic data, and a network traffic behavior model is established using the spatio-temporal context of node traffic behavior. The method can estimate the intrinsic behavior states of network traffic, gives network administrators a global understanding of the managed network, and guides network management work such as resource scheduling and anomaly detection.

Brief description of the drawings

Fig. 1 is a schematic diagram of the overall flow of the method;

Fig. 2 is a schematic diagram of the actual deployment framework of the method;

Fig. 3 is a schematic diagram of the traffic behavior information of a network node in the method;

Fig. 4 is a flow chart of model parameter estimation in the method;

Fig. 5 is a flow chart of behavior state estimation in the method;

Fig. 6 is a state diagram of the packet arrival patterns of network nodes at different times in the embodiment;

Fig. 7 shows the fitted probability density functions of the normalized observations corresponding to the packet arrival patterns in the embodiment.

Specific embodiment

The drawings are for illustrative purposes only and shall not be construed as limiting this patent. To better illustrate the embodiment, some components in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of the actual product;

Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings. The technical solution of the present invention is further described below with reference to the drawings and embodiments.

The present invention overcomes the limitations of the prior art by treating the network as a whole and using the spatio-temporal context correlation of node traffic behavior. This correlation arises from the interconnection of the network itself and from the correlation of network traffic over successive times: adjacent network nodes interact, so the traffic behavior of adjacent nodes is similar, and the traffic behavior of a node at successive times is similar. The method can reveal the intrinsic traffic behavior states among network nodes and the overall traffic behavior state of the whole network, giving network administrators a global understanding of the managed network and guiding network management work such as resource scheduling and anomaly detection.

Overall framework

A traffic behavior analysis method based on a distributed network. The method is a network management application that analyzes the global traffic behavior of the network. As shown in Fig. 1, its overall flow includes six steps: step S1, deploy the network traffic collection scheme; step S2, collect historical traffic data; step S3, train the model; step S4, obtain the traffic behavior model; step S5, collect real-time traffic data; step S6, estimate the global traffic behavior of the network.

Step S1 is implemented by deploying network probes at network nodes; the captured node traffic data are transmitted to the traffic analysis center for further analysis;

Step S2 means that the network probes collect traffic data at the network nodes to serve as training data for the traffic behavior model;

Step S3 means training, from the collected historical traffic data, a model that can characterize the spatio-temporal behavior of network traffic. The method uses the Hidden Markov Random Field (HMRF) mathematical model to model the dynamic evolution of distributed network traffic;

Step S4 means that the traffic analysis center obtains the traffic behavior model;

Step S5 means that, in practical use, the network probes collect in real time the network traffic data to be analyzed;

Step S6 means that the traffic behavior model is used to estimate the global traffic behavior state of the network from the real-time traffic data. The network administrator can use it to monitor network-wide behavior and to guide network management work such as resource scheduling and anomaly detection.

The method of the invention is executed as follows: network probes are deployed in the managed network to collect historical traffic data, which are fed into the model training process as training data, and the corresponding traffic behavior model is obtained by training. In practical use, the collected real-time traffic data are input into the trained traffic behavior model, and the global traffic behavior of the network is obtained by iterative calculation under the maximum a posteriori (MAP) estimation criterion. This enables monitoring of network-wide behavior and assists further network management work.

Each step of the method is described in detail below with reference to Fig. 1.

Step S1, deploy the network traffic collection scheme

To analyze network traffic behavior, a network traffic collection scheme must first be deployed. As shown in Fig. 2, the method deploys network probes on the nodes of the managed network to collect network traffic and transmits the collected traffic data to the traffic analysis center for subsequent traffic behavior analysis. Deploying the collection scheme mainly includes the following sub-steps: step S1-1, deploy probes at network nodes; step S1-2, the network probes collect network traffic data; step S1-3, the network probes communicate with the traffic analysis center.

Step S1-1, deploy probes at network nodes. The scheme is applicable to different network scenarios, including the Internet built from traditional routers and switches, SDN-based networks, NFV-based networks, and hybrids of these. A deployed network probe is a functional entity. It can be a physical device, such as a dedicated server or a hardware probe, or a software function integrated into a network device, such as the NetFlow or sFlow function on a router or switch, an SNMP-agent service, or a virtual probe implemented through NFV.

Step S1-2, the network probes collect network traffic data. The method can collect traffic data of different granularities at network nodes, including packet-level records, flow-level records, and traffic statistics records. Different collection schemes are used for traffic data of different granularities.

To capture complete packet-level records at a network node, a server is deployed at the location of the network device, and complete packet information is captured through port mirroring, including the capture timestamp, IP addresses, ports, protocol, and packet length. Acting as the network probe, the server stores the traffic data, preprocesses it, and communicates with the traffic analysis center. To collect flow-level traffic at a network node, the NetFlow or sFlow function is enabled on network devices that support it, and IP traffic information is collected on all ports of the device. The traffic information in NetFlow or sFlow messages mainly includes packet size, traffic per second, and total traffic. The NetFlow or sFlow function is integrated into the network device as the network probe and exports the information to the traffic analysis center as flow messages. To collect traffic statistics at a network node, an SNMP-based collection method is used: the SNMP-agent service is started on the network device and acts as the network probe of the node. The traffic statistics of the device are stored in a prescribed format in the local Management Information Base (MIB), and the traffic analysis center requests the MIB data from the network device to collect the node's various traffic data.

Step S1-3, the network probes communicate with the traffic analysis center. Network probes that capture complete packet information use client-server communication, with each probe transmitting its local traffic data to the traffic analysis center. In the NetFlow- or sFlow-based collection scheme, the network probe actively sends the collected traffic data to the traffic analysis center, which acts as the collector. In the SNMP-based collection scheme, the traffic analysis center acts as the SNMP manager and actively requests traffic statistics from the network probes, which act as SNMP agents.

With the above scheme, the network traffic collection scheme is deployed, traffic information of different granularities and different protocol layers can be collected at network nodes, and the traffic can be processed according to actual network management requirements.
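For the SNMP-based collection described above, interface statistics in the MIB are typically cumulative counters, so the traffic analysis center has to turn two successive readings into a rate. The sketch below shows one way to do this, assuming a 32-bit counter that wraps around at most once between polls; the function name `counter_rate` is hypothetical, and the SNMP polling itself is not shown.

```python
def counter_rate(prev_value, curr_value, interval_s, bits=32):
    """Average rate between two readings of a cumulative counter.

    The modulo handles a single wrap-around of a `bits`-wide counter
    between the two polls (an assumption of this sketch).
    """
    delta = (curr_value - prev_value) % (1 << bits)
    return delta / interval_s

# 500 bytes counted over a 5-second polling interval.
rate = counter_rate(100, 600, interval_s=5)
```

Using a shorter polling interval, or a 64-bit counter where the device offers one, reduces the chance that the counter wraps more than once between polls.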

Step S2, collect historical traffic data

Following the network traffic collection scheme deployed in step S1, the traffic analysis center collects historical traffic data at the network nodes through the network probes, to serve as training data for the model training process. Historical traffic data include traffic data of different granularities and different protocol layers, such as byte counts, packet arrival rates, IP addresses, and application protocols. They can also be traffic information that has been processed further, for example by applying a Fourier or wavelet transform to the basic time-domain traffic data, or by calculating the information entropy of traffic statistics variables.

To reduce the traffic sent from the network probes to the traffic analysis center, some preliminary processing, such as computing frequency-domain signals or information entropy, can be deployed on the network probes, so that only the processing results are sent to the traffic analysis center.

In this way, the traffic analysis center obtains the historical traffic data of the network nodes and can use them to train the network traffic behavior model.
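As an example of the preliminary processing that could run on a probe, the following sketch computes the information entropy of a traffic statistics variable, here the source addresses seen in one time slot. The choice of variable and the function name `shannon_entropy` are illustrative assumptions; the text does not fix which statistics are used.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Source addresses observed in one time slot (illustrative data).
slot_sources = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2"]
entropy = shannon_entropy(slot_sources)
```

Sending one entropy value per time slot instead of the raw packet records is the kind of reduction in probe-to-center traffic that the text describes.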

Step S3, training pattern

In step s3, it needs to complete 2 sub-steps：Step S3-1, determines traffic behavior model structure, step S3-2, Estimate model parameter.

In step S3-1, firstly, introducing this method to the modeling approach of network-flow characteristic.As shown in figure 3, for net The traffic behavior information of node is divided into two parts by a node (such as switch or router) in network, this method：It is considerable It surveys and unobservable part.Wherein, Observable part refers to the data on flows that acquisition can be directly measured by network probe, such as The information such as byte number, data packet arrival rate, IP address, application protocol, these measured values reflect the external table of node flow behavior It is existing, hereinafter referred to as " observation ".Unobservable part refers to the internal factor of driving node flow external manifestation, as behavior pattern, Inherent mechanism etc., these factors can not be obtained by network probe measurement, can only be estimated according to the observables of node, under Text is known as " hidden state ".

Extending this to a distributed network, as shown in Fig. 2, the modeling method divides the traffic behavior information of the distributed network into two layers: a hidden state layer and an observation data layer. The observation data layer consists of the network node traffic data measured by the network probes. The hidden state layer consists of the behavior patterns of the network nodes; it represents the internal driving factors of the network, which directly drive the external manifestation of node traffic. Hidden states and observations are represented here by random variables, so the hidden state layer and the observation data layer form two random fields, namely the hidden state field and the observation field (hereafter, "state" is used interchangeably with "hidden state", and "state field" with "hidden state field").

Second, step S3-1 defines the mathematical notation used in this method. In a network with $N$ nodes, $\mathcal{N}$ denotes the set of network nodes and $x_{t,n}$ denotes the $n$-th node in the $t$-th time slot, with $x_{t,n} \in \mathcal{X}$, where $\mathcal{X}$ denotes the set of all spatio-temporal node positions, $|\mathcal{X}| = N \times T$, and $T$ is the number of time slots. $S_{t,n}$ denotes the hidden state variable of node $x_{t,n}$, and $s_{t,n} \in \mathcal{M}$ denotes a realization of the random variable $S_{t,n}$, where $\mathcal{M}$ is the hidden state set; then $S = \{S_{t,n} : x_{t,n} \in \mathcal{X}\}$ is the family of hidden-state random variables defined on $\mathcal{X}$. $S$ can therefore denote the hidden state field on $[1, T]$, and $s \in \mathcal{S}$ a configuration of $S$, where $\mathcal{S}$ is the set of all possible configurations of the hidden state field. Similarly, $O_{t,n}$ denotes the observation variable of node $x_{t,n}$, and $o_{t,n} \in \mathcal{K}$ denotes a realization of the random variable $O_{t,n}$, where $\mathcal{K}$ is the set of observation values; then $O = \{O_{t,n} : x_{t,n} \in \mathcal{X}\}$ is the family of observation random variables defined on $\mathcal{X}$. $O$ can therefore denote the observation field on $[1, T]$, and $o \in \mathcal{O}$ a configuration of $O$, where $\mathcal{O}$ is the set of all possible configurations of the observation field.

Finally, step S3-1 characterizes the spatio-temporal evolution relationship between the hidden state field and the observation field using a hidden Markov random field (HMRF) model.

For the hidden state field, this modeling method introduces one important assumption: in space, the state of a node depends only on the states of its one-hop neighbor nodes; in time, it depends only on the node's own state at the previous moment. Based on statistical learning, the probability of the state field is given by the following formula:

In formula (1), the conditioning set is the set of spatio-temporal node positions of the network excluding node $x_{t,n}$; under the assumption above, it reduces to the spatial-neighbor states and the temporal-neighbor state of node $x_{t,n}$; and $\lambda$ denotes the state field parameters.

The local probability in formula (1) is obtained by the following formula:

where $m$ denotes a node state. The temporal transition probability is computed from the temporal hidden-state transition probability matrix $A$, which gives the transition probabilities of the hidden state from time $t$ to time $t+1$; that is, the temporal hidden-state transitions form a first-order Markov chain. The matrix $A$ is expressed as:

$$A = [P_{ij}], \qquad P_{ij} = \Pr[S_{t+1,n} = j \mid S_{t,n} = i]$$

where the subscripts $i$ and $j$ of $P_{ij}$ denote the hidden states of a node at times $t$ and $t+1$, respectively. The spatial transition probability is obtained by the following formula:

where $U_{t,n}(m)$ denotes the energy function of node $x_{t,n}$ in state $m$, which is computed over the set $\mathcal{N}_{t,n}$ of spatial neighbor nodes of $x_{t,n}$, with $|\mathcal{N}_{t,n}|$ the number of spatial neighbor nodes. The potential function is defined as $V_{t,n}(m) = \mathrm{num} \cdot \alpha$, where the parameter $\alpha$ characterizes the strength of the mutual influence between the current node and its spatial neighbor nodes, and $\mathrm{num}$ is the number of spatial neighbor nodes whose state differs from the state of the current node.
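The potential function above can be sketched in a few lines of Python. The function `potential` follows the definition in the text directly. The function `spatial_probabilities` is only an assumption: it normalizes exp(-V) over the candidate states in the usual Gibbs form and uses the potential directly as the energy, which may differ from the energy function and spatial probability formula of the original patent. All function and parameter names are hypothetical.

```python
import math


def potential(state, neighbor_states, alpha):
    """V(m) = num * alpha, where num counts the spatial neighbors whose
    state differs from the candidate state m."""
    num = sum(1 for s in neighbor_states if s != state)
    return num * alpha


def spatial_probabilities(neighbor_states, states, alpha):
    """Probability of each candidate state given the neighbor states.

    ASSUMPTION: Gibbs form exp(-V(m)) / sum over m' of exp(-V(m')),
    with the potential V used directly as the energy.
    """
    weights = {m: math.exp(-potential(m, neighbor_states, alpha)) for m in states}
    z = sum(weights.values())
    return {m: w / z for m, w in weights.items()}
```

Under this assumed form, a larger α penalizes disagreement with the neighbors more strongly, so a node is pulled harder toward the state held by most of its spatial neighbors.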

For the observation field, the network node observations are obtained directly through the traffic collection scheme deployed in step S1; that is, the observation field of the network, $o = \{o_{t,n} : x_{t,n} \in \mathcal{X}\}$, is given data. This method assumes that the observation of a node depends only on the state of that node. The output probability of the observation field driven by the hidden states is obtained by the following formula:

$$\Pr[o \mid s] = \prod_{(t,n)} \Pr[O_{t,n} = o_{t,n} \mid S_{t,n} = s_{t,n}]$$

where $P_{mk} = \Pr[O_{t,n} = k \mid S_{t,n} = m]$ denotes the probability that a node in state $m$ outputs observation $k$; these probabilities form the output probability matrix $B = [P_{mk}]$.

This completes step S3-1, which determines the structure of the traffic behavior model. The traffic behavior model is characterized by the HMRF model, so the model parameters are $\Omega = \{A, \alpha, B\}$. Step S3-2, estimating the model parameters, is described next.

In step S3-2, once the historical traffic data have been collected and the structure of the traffic behavior model has been determined, the model parameters $\Omega = \{A, \alpha, B\}$ are trained from the historical traffic data. For ease of practical engineering, this method approximates probabilities by frequencies during computation, so the observations $O_{t,n}$ must be discretized beforehand. The parameter estimation flow is shown in Fig. 4: the training process takes the historical traffic data $o$ (the network node observations) as input and outputs the model parameters $\Omega = \{A, \alpha, B\}$. The steps of the parameter estimation flow are as follows:

(1) Initialize the iteration counter $i$, the iteration stopping condition Iter, and the initial state field $s^{(1)}$.

The iteration counter $i$ is initialized to 1. The stopping condition is set as a maximum number of iterations Iter, empirically preferably 5 to 8; alternatively, the stopping condition can be a threshold on the parameter change between two successive iterations, with iteration stopping when the change falls below the given threshold. The initial state field $s^{(1)}$ is initialized from the historical traffic observations using a clustering algorithm such as Kmeans. The number of clusters is determined by the actual network monitoring requirements and corresponds to the number of behavior states of the network nodes. It therefore reflects the granularity at which network traffic behavior is characterized: the more behavior states, the finer the traffic behavior that can be characterized.

(2) Update the model parameters according to the current configuration of the state field. The temporal hidden-state transition probability matrix $A$ is updated from the frequencies of temporal state transitions: if the number of times a node is in state $i$ at time $t$ and moves to state $j$ at time $t+1$ is denoted $A_{ij}$, then the estimate of the state transition probability $P_{ij}$ in $A$ is obtained by the following formula:

$$\hat{P}_{ij} = \frac{A_{ij}}{\sum_{j'} A_{ij'}}$$

Meanwhile iteration wrap count adds 1, i.e. i=i+1.

(3) judge whether to meet stop condition, that is, judge i > Iter.

If 1) be judged as NO, state field s is updated according to estimation behavior state process in step S6⁽ⁱ⁾, input data is to go through History data on flows o, "current" model parameterOutput data is to update state field s⁽ⁱ⁾, wherein behavior shape is estimated in step S6 (1) step original state field of state process step uses current state field s^(i-1)。

Return step (2).

If 2) be judged as YES, final mask parameter Ω={ A, α, B } is exported.
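The frequency-based parameter update in step (2) can be sketched as follows. This is a minimal illustration that assumes states and discretized observations are encoded as integers starting at 0; the function names and the uniform fallback for states that never occur are assumptions, not part of the patent. The same row normalization is applied to the output probability matrix B, using joint counts of (state, observation) pairs.

```python
def normalize_rows(counts):
    """Turn each row of counts into relative frequencies.

    ASSUMPTION: a row with no counts falls back to a uniform distribution.
    """
    result = []
    for row in counts:
        total = sum(row)
        if total:
            result.append([c / total for c in row])
        else:
            result.append([1.0 / len(row)] * len(row))
    return result


def estimate_transition_matrix(state_sequences, num_states):
    """Estimate A from per-node state sequences ordered by time slot."""
    counts = [[0] * num_states for _ in range(num_states)]
    for seq in state_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1  # one observed transition prev -> nxt
    return normalize_rows(counts)


def estimate_output_matrix(states, observations, num_states, num_symbols):
    """Estimate B from paired (state, discretized observation) samples."""
    counts = [[0] * num_symbols for _ in range(num_states)]
    for m, k in zip(states, observations):
        counts[m][k] += 1  # state m emitted observation symbol k
    return normalize_rows(counts)
```

For example, `estimate_transition_matrix([[0, 0, 1, 1, 0]], 2)` counts one transition of each kind and returns `[[0.5, 0.5], [0.5, 0.5]]`.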

Step S4: Obtain the traffic behavior model

The HMRF model proposed by this method characterizes the spatio-temporal evolution of network traffic behavior and reveals, in physical terms, the causes of the external manifestation of network traffic. The model parameters are $\Omega = \{A, \alpha, B\}$: the temporal hidden-state transition matrix $A$ characterizes how traffic behavior patterns change over time, the spatial state field parameter $\alpha$ characterizes the interaction between spatially neighboring nodes, and the output probability matrix $B$ characterizes the relationship between observations and hidden states.

The traffic behavior model runs at the traffic analysis center and can be applied flexibly according to actual network management requirements. For the different types of historical traffic data obtained in step S2, such as byte counts, packet arrival rates, IP addresses, and application protocols, the traffic analysis center can train different models according to step S3, such as a packet arrival rate model, a traffic IP address model, an application protocol model, or a model that fuses several kinds of traffic data.

In this way, the traffic analysis center obtains different traffic behavior models for use in actual traffic behavior analysis and can provide users with multi-dimensional traffic behavior analysis.

Step S5: Collect real-time traffic data

According to the network traffic collection scheme deployed in step S1, real-time traffic data can be collected in practical applications.

At the traffic analysis center, depending on the deployment strategy of the network administrator, the network probes can be queried in real time or polled periodically to obtain the network traffic data to be analyzed, including byte counts, packet arrival rates, IP addresses, application protocols, and similar information. These data reflect the external manifestation of traffic in the current network environment.

According to the actual monitoring requirements, a specific type of traffic data is selected, discretized, and then fed as observations into the corresponding traffic behavior model, which is used to estimate the network-wide traffic behavior.
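The discretization step mentioned above is not specified further in the text. One simple option, shown here only as an assumption, is equal-width binning of a continuous measurement (for example a packet arrival rate) into integer symbols; the function name `discretize` and its parameters are hypothetical.

```python
def discretize(values, num_bins, low, high):
    """Map each value to an integer bin index in [0, num_bins - 1].

    ASSUMPTION: equal-width bins over [low, high]; values outside the
    range are clamped to the first or last bin.
    """
    if num_bins <= 0 or high <= low:
        raise ValueError("need num_bins > 0 and high > low")
    width = (high - low) / num_bins
    symbols = []
    for v in values:
        idx = int((v - low) / width)
        symbols.append(min(max(idx, 0), num_bins - 1))
    return symbols
```

The bin boundaries used on real-time data should match those used when the model was trained, because the output probability matrix B is indexed by these symbols.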

In this way, the traffic analysis center collects the real-time traffic data.

Step S6: Estimate the network-wide traffic behavior

At the traffic analysis center, the internal traffic behavior states of the network nodes can be estimated from the traffic behavior model obtained in step S4 and the real-time node traffic data collected in step S5. The network administrator can thereby monitor the behavior of the whole network, and the result can guide network management tasks such as resource scheduling and anomaly detection.

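According to Bayes' theorem,

$$\Pr[s \mid o, \Omega] = \frac{\Pr[o \mid s, \Omega]\,\Pr[s \mid \Omega]}{\Pr[o]}$$

Since $\Pr[o]$ is a constant, $\Pr[s \mid o, \Omega] \propto \Pr[o \mid s, \Omega] \cdot \Pr[s \mid \Omega]$. The prior probability $\Pr[s \mid \Omega]$ and the likelihood $\Pr[o \mid s, \Omega]$ are computed by formulas (1) and (2), respectively, that is, by the following formula: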

This method obtains the globally optimal state field of the network by iterative computation; the behavior state estimation flow is shown in Fig. 5. The inputs are the real-time traffic data $o$ and the model parameters $\Omega = \{A, \alpha, B\}$, and the output is the estimated network state field $\hat{s}$. The steps of the behavior state estimation flow are as follows:

(1) Initialize the iteration counter $i$, the initial state field $s^{(0)}$, and the iteration stopping condition Iter.

The iteration counter $i$ is initialized to 1. The state field is initialized either from prior knowledge of the relationship between the state field and the observation field, or by applying a clustering algorithm to the observation field. The stopping condition is set as a maximum number of iterations Iter, empirically preferably 3 to 5.

(2) For each node in the network, traverse all possible state values and, according to the maximum a posteriori (MAP) estimation criterion, select the state value with the highest probability as the node's estimate for the current iteration, which is equivalent to solving the following formula:

At the same time, the iteration counter is incremented by 1, i.e., $i = i + 1$.

(3) Check whether the stopping condition is met, i.e., whether $i >$ Iter.

1) If not, return to step (2) and update the state of each node again.

2) If so, output the final estimated state field $\hat{s}$. This yields the network-wide traffic behavior state, from which the network administrator can monitor the behavior of the whole network.
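The iterative estimation flow above can be sketched in Python as a node-by-node MAP update. This is an illustration under stated assumptions rather than the patent's exact formula: the local score of a candidate state is taken to be the log output probability, plus the log temporal transition probability from the node's previous-slot state, minus α times the number of disagreeing spatial neighbors. States are updated in place, and all names (`estimate_states`, `obs`, `neighbors`, `init_states`, `max_iter`) are hypothetical.

```python
import math


def estimate_states(obs, neighbors, A, B, alpha, init_states, max_iter=5):
    """Iteratively re-estimate the state of every node in every time slot.

    obs[t][n]         discretized observation of node n in slot t
    neighbors[n]      indices of the one-hop spatial neighbors of node n
    A[i][j], B[m][k]  transition and output probability matrices
    init_states[t][n] initial state field (e.g. from clustering)
    """
    num_slots, num_nodes, num_states = len(obs), len(neighbors), len(A)
    states = [row[:] for row in init_states]
    eps = 1e-12  # avoids log(0) for zero-probability entries
    for _ in range(max_iter):
        for t in range(num_slots):
            for n in range(num_nodes):
                best_state, best_score = states[t][n], -math.inf
                for m in range(num_states):
                    # ASSUMED local score: output term + temporal term - spatial penalty
                    score = math.log(B[m][obs[t][n]] + eps)
                    if t > 0:
                        score += math.log(A[states[t - 1][n]][m] + eps)
                    num = sum(1 for j in neighbors[n] if states[t][j] != m)
                    score -= alpha * num
                    if score > best_score:
                        best_state, best_score = m, score
                states[t][n] = best_state
    return states
```

As a usage example, take three nodes in a line over two slots, with `A = [[0.9, 0.1], [0.1, 0.9]]`, `B = [[0.8, 0.2], [0.2, 0.8]]`, `alpha = 1.0`, and the state field initialized to the observations. An isolated observation of symbol 1 at the middle node in the second slot is smoothed to state 0, because both its spatial neighbors and its own previous state support state 0.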

In particular, because this method treats the network as a whole, the estimated network behavior state is a globally optimal result that represents the most probable states of the network nodes, and tasks such as resource scheduling and anomaly detection carried out on the basis of this state follow plans and strategies that are optimal overall.

The meaning of the behavior state differs with the type of traffic data collected, so this traffic behavior analysis method can analyze network traffic along multiple dimensions. For example, when monitoring the packet arrival rate of network nodes, the traffic behavior state reflects the packet arrival pattern of the current node. Once the network-wide behavior state has been estimated, the packet arrival patterns of different regions of the network at different moments are known. When a node's arrival pattern is in a high-traffic condition, the network administrator can avoid starting services that consume large amounts of bandwidth and give priority to high-priority bandwidth demands; when the arrival pattern is in a low-traffic condition, the restrictions can be relaxed appropriately. Likewise, when monitoring application-protocol traffic at network nodes, the traffic behavior state reflects the application protocol composition of the current node. Once the network-wide behavior state has been estimated, the application protocol composition of different regions at different moments is known; for nodes where P2P traffic takes up a large share, the administrator can limit its rate where appropriate to keep regular services running.

Having estimated the network-wide traffic behavior state, this method supports network-wide traffic scheduling, such as load balancing or green networking. Based on the estimated traffic-volume states of the different nodes, a scheduling algorithm can be designed from a global perspective that moves traffic from high-traffic nodes to low-traffic nodes, balancing the traffic load across the network. When nodes in the network are in a low-traffic state, some nodes can be shut down, subject to network connectivity and link utilization constraints, so as to minimize network energy consumption while maintaining basic network performance.

Because this method exploits the spatio-temporal context of network node traffic, it can estimate the network-wide traffic behavior state and reveal the connection between network behavior states and the external manifestation of traffic. This helps network administrators establish a network-wide view, survey the situation and trends of the network, and grasp information such as network load and the usage of network application resources. Modeling based on historical traffic data captures the distribution and trend characteristics of the network in time, space, and traffic direction, which helps administrators mine business demands, hot spots, and trends in depth and supports network planning and design. In a model built with multi-dimensional traffic data as observations, the spatio-temporal variation of the traffic behavior state reflects the spatio-temporal evolution of network traffic; the spatio-temporal variation of node states can thus be used to build a network-wide model of normal traffic behavior and support anomaly detection.

Embodiment

This embodiment illustrates the advantages of the method by analyzing the packet arrival patterns of network nodes. As shown in Fig. 6, the example network contains 50 network nodes and 88 links; its topology comes from a German research network (topology information source: http://sndlib.zib.de/home.action). Traffic data are captured at the different nodes of the managed network. Here the packet arrival rate is used as the node observation and the node's packet arrival pattern as the traffic behavior state. The distributed-network-based traffic behavior analysis method is used to perceive the packet arrival patterns of different regions of the network at different times, thereby monitoring the packet arrival patterns of the whole network. Following implementation steps S1 to S6: first, the traffic collection scheme is deployed across the network; second, historical packet arrival rate data are collected at the network nodes; third, the model is trained on the historical packet arrival rate data; fourth, the traffic analysis center obtains the packet arrival rate model; fifth, real-time packet arrival rate data are collected at the network nodes and fed into the packet arrival rate model; finally, iterative computation yields a global state map characterizing the packet arrival patterns of the network. On this basis the network administrator can survey the situation and trends of the network, grasp the network load, and carry out applications such as green networking and resource scheduling.

The packet arrival pattern states of the network nodes at different moments are shown in Fig. 6, where different colors represent the different packet arrival patterns of the network nodes. Only two of the arrival patterns are shown in the figure: gray nodes correspond to mode 1 in Fig. 7 and dark nodes to mode 2 in Fig. 7. Fig. 7 shows the fitted probability density functions of the normalized observations corresponding to these two arrival patterns, reflecting the output distribution of the observations under different states. From the estimated network-wide packet arrival patterns, the network administrator can perceive in real time the packet arrival patterns of the different nodes. Knowing the relationship between a node's arrival pattern and its packet arrival rate, the administrator can derive the node load across the whole network from the behavior states, which supports applications such as green networking and resource scheduling.

Since this method is an unsupervised learning process and acts as a multi-class classifier, it is compared with the typical clustering method Kmeans. The arrival pattern of each network node is estimated from the actual packet arrival rate. With 5 packet arrival patterns, the overall accuracy and macro-F1 of Kmeans and this method are compared in Table 1, and the precision, recall, and F1 under the different arrival patterns (states) are compared in Table 2. For performance evaluation, this method uses overall accuracy, macro-F1, precision, recall, and F1 as metrics. Precision, recall, and F1 are common metrics for binary classification: precision P is the fraction of samples estimated as positive that are truly positive, recall R is the fraction of truly positive samples that are correctly estimated, and F1 is the harmonic mean of precision and recall, i.e., F1 = 2PR/(P+R). These three metrics measure the performance of this method on each individual state value; higher values indicate better performance. Overall accuracy is the fraction of all samples that are estimated correctly, and macro-F1 is the arithmetic mean of the per-state F1 values; these two metrics measure the overall performance of the model, and again higher is better.
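The evaluation metrics defined above can be computed as in the following minimal Python sketch. Each state is treated in turn as the positive class (one-vs-rest). The function names are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the text specifies.

```python
def per_state_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one state, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose state is estimated correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Arithmetic mean of the per-state F1 values."""
    return sum(per_state_metrics(y_true, y_pred, lab)[2] for lab in labels) / len(labels)
```

For an unsupervised method, the estimated state indices first have to be matched to the reference states before these metrics are meaningful; how that matching was done is not described in the text.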

Table 1. Overall accuracy and macro-F1 of Kmeans and this method

The experimental results show that this method outperforms Kmeans on every performance metric. The reason is as follows. Network nodes in different arrival patterns may produce the same arrival rate; that is, the observations under different modes overlap. Because the behavior of network nodes exhibits temporal continuity and spatial correlation, the HMRF model of this method takes the state information of the spatio-temporal neighbor nodes into account and can therefore better identify which state a node belongs to. Kmeans, in contrast, assigns states directly from the observation data and so cannot distinguish node states as well. This is the gain that the spatio-temporal behavior information introduced by HMRF brings to the estimation of network behavior states. The state values estimated by this method are therefore more accurate and describe the state of the network as a whole. Using the behavior states estimated by this method to monitor network traffic data of each dimension is more accurate, and the network management schemes and strategies formulated on this basis are optimal overall.

Table 2. Precision, recall, and F1 of Kmeans and this method when the number of states is 5

It should be noted that this embodiment is only one example of the method; the method is not limited to analyzing network traffic data of a single dimension, such as the packet arrival rate. The distributed-network-based traffic behavior analysis method can provide multi-dimensional network traffic analysis: with traffic data of several dimensions as observations, it estimates the distribution of traffic behavior states over the whole network. The network state represents the operating mode of the current network and reveals the internal traffic behavior state of each network node as well as the overall traffic behavior state of the whole network. The network administrator can thus know the network-wide traffic data and the corresponding operating mode information, analyze the spatio-temporal distribution of the behavior states, and monitor network traffic behavior along multiple dimensions.

Obviously, the above embodiment of the present invention is merely an example given to illustrate the invention clearly and is not a limitation on the embodiments of the invention. Those of ordinary skill in the art can make other variations or changes in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

1. A traffic behavior analysis method based on a distributed network, characterized by comprising:

a model training stage: collecting historical network traffic data as training data for model training, to obtain a network traffic behavior model;

a learning stage: inputting the collected real-time traffic data into the trained network traffic behavior model, and obtaining the network-wide traffic behavior by iterative computation using the maximum a posteriori estimation criterion.

2. The method according to claim 1, characterized by further comprising, before collecting traffic data, deploying network probes in the managed network to collect traffic data; specifically, probes are deployed on the network nodes to collect traffic data of different granularities and different protocol layers at the network nodes and to transmit the data to the traffic analysis center for data analysis.

3. The method according to claim 1, characterized in that the model training stage specifically comprises: determining the network traffic behavior model structure and estimating the model parameters;

determining the network traffic behavior model structure:

the traffic behavior information of the distributed network is divided into two layers: a hidden state layer and an observation data layer; the observation data layer consists of the network node traffic data measured by the network probes; the hidden state layer consists of the behavior patterns of the network nodes and represents the internal driving factors of the network, which directly drive the external manifestation of node traffic; hidden states and observations are represented by random variables, so the hidden state layer and the observation data layer form two random fields, namely the hidden state field and the observation field;

defining the mathematical notation: in a network with $N$ nodes, $\mathcal{N}$ denotes the set of network nodes and $x_{t,n}$ denotes the $n$-th node in the $t$-th time slot, with $x_{t,n} \in \mathcal{X}$, where $\mathcal{X}$ denotes the set of all spatio-temporal node positions, $|\mathcal{X}| = N \times T$, and $T$ is the number of time slots; $S_{t,n}$ denotes the hidden state variable of node $x_{t,n}$, and $s_{t,n} \in \mathcal{M}$ denotes a realization of the random variable $S_{t,n}$, where $\mathcal{M}$ is the hidden state set; then $S = \{S_{t,n} : x_{t,n} \in \mathcal{X}\}$ is the family of hidden-state random variables defined on $\mathcal{X}$; $S$ can therefore denote the hidden state field on $[1, T]$, and $s \in \mathcal{S}$ a configuration of $S$, where $\mathcal{S}$ is the set of all possible configurations of the hidden state field; similarly, $O_{t,n}$ denotes the observation variable of node $x_{t,n}$, and $o_{t,n} \in \mathcal{K}$ denotes a realization of the random variable $O_{t,n}$, where $\mathcal{K}$ is the set of observation values; then $O = \{O_{t,n} : x_{t,n} \in \mathcal{X}\}$ is the family of observation random variables defined on $\mathcal{X}$; $O$ can therefore denote the observation field on $[1, T]$, and $o \in \mathcal{O}$ a configuration of $O$, where $\mathcal{O}$ is the set of all possible configurations of the observation field;

for the hidden state field, one assumption is introduced: in space, the state of a node depends only on the states of its one-hop neighbor nodes; in time, it depends only on the node's own state at the previous moment; based on statistical learning, the probability of the hidden state field is given by the following formula:

in formula (1), the conditioning set is the set of spatio-temporal node positions of the network excluding node $x_{t,n}$; under the assumption above, it reduces to the spatial-neighbor states and the temporal-neighbor state of node $x_{t,n}$; and $\lambda$ denotes the hidden state field parameters;

the local probability in formula (1) is obtained by the following formula:

where $m$ denotes a node state; the temporal transition probability is computed from the temporal hidden-state transition probability matrix $A$, which gives the transition probabilities of the hidden state from time $t$ to time $t+1$, that is, the temporal hidden-state transitions form a first-order Markov chain; the matrix $A$ is expressed as:

$$A = [P_{ij}], \qquad P_{ij} = \Pr[S_{t+1,n} = j \mid S_{t,n} = i]$$

where $U_{t,n}(m)$ denotes the energy function of node $x_{t,n}$ in state $m$, which is computed over the set $\mathcal{N}_{t,n}$ of spatial neighbor nodes of $x_{t,n}$, with $|\mathcal{N}_{t,n}|$ the number of spatial neighbor nodes; the potential function is defined as $V_{t,n}(m) = \mathrm{num} \cdot \alpha$, where the parameter $\alpha$ characterizes the strength of the mutual influence between the current node and its spatial neighbor nodes, and $\mathrm{num}$ is the number of spatial neighbor nodes whose state differs from the state of the current node;

where the subscript $(t, n)$ of the product symbol ranges over $x_{t,n} \in \mathcal{X}$, and $\Pr[O_{t,n} = k \mid S_{t,n} = m, \theta_m]$ denotes the probability that node $n$ at time $t$ outputs observation $k$ given that its state is $m$; for ease of computation, the observation $O_{t,n}$ is discretized and frequencies are used to approximate probabilities, that is, the conditional probability is approximated by the frequency distribution of the observations in state $m$; the parameter $\theta_m$ denotes the distribution parameters of the observations in a given state and is represented here by the output probability matrix $B$, called the observation field parameter; the matrix $B$ is expressed as:

$$B = [P_{mk}], \qquad P_{mk} = \Pr[O_{t,n} = k \mid S_{t,n} = m]$$

the network traffic behavior model structure is thus determined; the network traffic behavior model is characterized by the HMRF model, so the model parameters are $\Omega = \{A, \alpha, B\}$;

estimating the model parameters:

once the historical traffic data have been collected and the traffic behavior model structure has been determined, the model parameters $\Omega = \{A, \alpha, B\}$ are trained from the historical traffic data; for ease of practical engineering, frequencies are used to approximate probabilities during computation, so the observations $O_{t,n}$ must be discretized beforehand;

the training process takes the historical traffic data $o$ (the network node observations) as input and outputs the model parameters $\Omega = \{A, \alpha, B\}$; the steps of the parameter estimation flow are as follows:

the iteration counter $i$ is initialized to 1; the stopping condition is set as a maximum number of iterations Iter, empirically preferably 5 to 8; alternatively, the stopping condition can be a threshold on the parameter change between two successive iterations, with iteration stopping when the change falls below the given threshold; the initial hidden state field $s^{(1)}$ is initialized from the historical traffic observations using a clustering algorithm; the number of clusters is determined by the actual network monitoring requirements and corresponds to the number of behavior states of the network nodes; it therefore reflects the granularity at which network traffic behavior is characterized: the more behavior states, the finer the traffic behavior that can be characterized;

(3-2) updating the model parameters according to the current configuration of the hidden state field: the temporal hidden-state transition probability matrix $A$ is updated from the frequencies of temporal state transitions; if the number of times a node is in state $i$ at time $t$ and moves to state $j$ at time $t+1$ is denoted $A_{ij}$, then the estimate of the state transition probability $P_{ij}$ in $A$ is obtained by the following formula:

$$\hat{P}_{ij} = \frac{A_{ij}}{\sum_{j'} A_{ij'}}$$

the value of $\alpha$ is determined empirically, preferably between 0.5 and 10; the larger $\alpha$ is, the stronger the interaction between nodes and the greater the influence of the neighbor nodes' states on the state of the current node, and vice versa; the output probability matrix $B$ is updated from the frequency distribution of the observations output in each state; if the number of samples in which the state is $m$ and the observation is $k$ is denoted $B_{mk}$, then the estimate of the output probability $P_{mk}$ in $B$ is obtained by the following formula:

$$\hat{P}_{mk} = \frac{B_{mk}}{\sum_{k'} B_{mk'}}$$

Meanwhile iteration wrap count adds 1, i.e. i=i+1；

(3-3) judges whether to meet stop condition, that is, judges i > Iter；

If 3-3-1) being judged as NO, hidden state field s is updated according to estimation behavior state process⁽ⁱ⁾, input data is historical traffic Data o, "current" model parameterOutput data is to update hidden state field s⁽ⁱ⁾, wherein estimate the initial of behavior state process Hidden state field uses current hidden state field s^(i-1)；

Return step (3-2)；

through the above steps, the model parameters $\Omega = \{A, \alpha, B\}$ are obtained by training on the historical traffic data and serve as the traffic behavior model.

4. The method according to claim 3, characterized in that the process of the learning stage is:

estimating the internal traffic behavior states of the network nodes from the obtained traffic behavior model and the collected real-time node traffic data,

the above process is equivalent to estimating a configuration $s \in \mathcal{S}$ of the hidden state field given the model parameters $\Omega$ and the observation field $o$; according to the maximum a posteriori estimation criterion, finding the optimal estimate $\hat{s}$ of the hidden state field is equivalent to solving the following formula:

$$\hat{s} = \arg\max_{s \in \mathcal{S}} \Pr[s \mid o, \Omega]$$

according to Bayes' theorem,

$$\Pr[s \mid o, \Omega] = \frac{\Pr[o \mid s, \Omega]\,\Pr[s \mid \Omega]}{\Pr[o]}$$

since $\Pr[o]$ is a constant, $\Pr[s \mid o, \Omega] \propto \Pr[o \mid s, \Omega] \cdot \Pr[s \mid \Omega]$; the prior probability $\Pr[s \mid \Omega]$ and the likelihood $\Pr[o \mid s, \Omega]$ are computed by formulas (1) and (2), respectively, that is, by the following formula:

the globally optimal hidden state field of the network is obtained by iterative computation; in the behavior state estimation flow, the inputs are the real-time traffic data $o$ and the model parameters $\Omega = \{A, \alpha, B\}$, and the output is the estimated hidden state field $\hat{s}$ of the network; the steps of the behavior state estimation flow are as follows:

the iteration counter $i$ is initialized to 1; the state field is initialized either from prior knowledge of the relationship between the state field and the observation field, or by applying a clustering algorithm to the observation field; the stopping condition is set as a maximum number of iterations Iter, empirically preferably 3 to 5;

(4-2) for each node in the network, traversing all possible state values and, according to the maximum a posteriori estimation criterion, selecting the state value with the highest probability as the node's estimate for the current iteration, which is equivalent to solving the following formula:

at the same time, the iteration counter is incremented by 1, i.e., $i = i + 1$;

(4-3) checking whether the stopping condition is met, i.e., whether $i >$ Iter;

4-3-2) if so, outputting the final estimated state field $\hat{s}$, thereby obtaining the network-wide traffic behavior state, from which the network administrator can monitor the behavior of the whole network.