Full-network anomaly detection and positioning method based on robust multivariate probability calibration model
(I) the technical field
The invention relates to a network anomaly detection method, in particular to a full-network anomaly detection positioning method based on a robust multivariate probability calibration model.
(II) background Art
Under the current internet environment, various network abnormal events are layered endlessly, large-scale network intrusion such as DDos attack, botnet and the like brings serious threats to the safety operation of the internet, and network congestion, network failure and the like also seriously affect the service quality of the internet, so that the detection and the positioning of network abnormal behaviors are very necessary. Meanwhile, network anomalies are various and rapid in change, and are often hidden in complex and huge background traffic, so that great difficulty is brought to detection and positioning of the network anomalies.
The method is characterized in that a plurality of researches are carried out for network anomaly detection, system logs, audit information and the like of a host are used as data sources, and a method for anomaly detection based on the host is provided by adopting methods such as data mining and the like; using end-to-end round-trip delay, packet loss rate and other performance measurement data, and adopting a unitary time sequence analysis method to provide a single-path-based anomaly detection method; network flow data such as SNMP and NetFlow using a single link are provided, and an anomaly detection method based on the single link is provided by adopting methods such as machine learning and wavelet analysis. The method mainly focuses on local information of the network, the monitoring range is limited, but with the continuous expansion of the network scale, the data transmission speed is continuously accelerated, a plurality of network anomalies present strong global characteristics, the influences of the network anomalies disperse into a plurality of links or paths in the network, and the information is not obvious in local representation. By adopting the analysis and detection method based on the host, the single path or the single link, comprehensive measurement and global analysis can not be carried out on the network, and the detection precision is difficult to ensure.
In view of the above problems, lakhina et al propose a full network-wide anomaly detection method based on principal component analysis subspace PCA for the first time, comprehensively utilize flow statistical information of multiple paths to construct a normal model, and determine whether an anomaly occurs by judging whether the current situation deviates from the normal model; the subsequent research is along the thought of the whole network detection, and researches are carried out on the aspects of the time-space expansibility, the robustness, the real-time property, the abnormal measure and the like of the detection algorithm, so that the content of the whole network abnormal detection is enriched. The method comprehensively utilizes the flow data of the whole network, obtains the detection performance superior to a single-node, single-path or single-link method, and simultaneously adopts an abnormal detection method for establishing a normal model and comparing with the normal model without establishing an abnormal feature library, so that the known abnormal condition and the unknown abnormal condition can be detected, and the application range is wide. The whole network anomaly detection method based on the flow statistics improves the detection performance by introducing network information with wider range and more dimensionality, but the method also faces some practical problems when being deployed on a large-scale high-speed backbone network: firstly, the acquisition range is enlarged, acquisition devices are increased, and the network speed is increased, so that the condition of data acquisition loss caused by the failure of part of devices or the condition of data flow loss in the transmission process must be considered, and the abnormal detection method becomes unusable due to incomplete data; secondly, the actual flow of the backbone network is huge in data quantity and very complex, the accuracy of normal model construction is affected by various hidden abnormal flows, so that model parameters are difficult to select, the stability of an abnormal detection method is extremely difficult to guarantee, and some deliberate attack flows can also severely poison a detector; thirdly, the method can find the abnormity, but is also deficient in the aspect of abnormity positioning.
Dorothy Denning proposed a statistical model of network anomaly detection as early as 1987, and network anomaly detection becomes more important with the development of the Internet. According to the detection range, the method is distinguished by the traditional anomaly detection method based on a host, a single path and a single link, and the research of Yegneswaran and the like finds that the occurrence and development of network anomaly show the global change trend, the detection range is increased, and the improvement of anomaly detection effect exceeds the linear relation, so that the basis of the whole network detection method is laid; in 2004, lakhina et al put forward a full-network anomaly detection method based on flow statistics for the first time, reveal the low-dimensional characteristic of source-destination (OD) flow matrix, and use the flow statistics information of multiple paths comprehensively to construct a normal model, and determine whether an anomaly occurs by judging whether the current situation deviates from the normal model; subsequently, ling Huang et al improve an anomaly detection method based on PCA by using a stochastic matrix perturbation (stochastic matrix perturbation) theory, and provide a low-overhead full-network anomaly detection method of distributed PCA; the Brauckhoff et al generalizes PCA to obtain a Galerkin-based Karhunen-love transformation expansion calculation method and uses the method for detecting the whole network abnormality; ahmed et al propose a kernel recursive least square (kernel recursive least squares) on-line detection method for detecting timeliness. The method lacks research on actual network environment problems (such as abnormal noise interference and data loss) and abnormal positioning problems in network abnormality detection.
Ringberg et al conducted an in-depth study on the influence of abnormal flow noise on the performance of the detector, and indicated that large flow abnormality can cause the deviation of a normal model of a PCA detector, thereby increasing the false alarm rate of abnormal detection; rubinstein et al further studied the poisoning problem of the detector and quantitatively studied the poisoning ability of some poisoning attack methods on PCA anomaly detectors. Qianyukui et al enumerates 3 kinds of toxicity attack mechanisms, and proposes a toxicity defense mechanism RPCA based on robust PCA by using methods such as projection tracking and the like. The method mainly analyzes the harmfulness of different toxic attack modes to the PCA detector, researches the defense mechanism of the PCA detector in a targeted manner, and lacks the research on a common normal modeling method under a complex abnormal noise environment.
The data missing problem is a common practical problem in a plurality of fields, and how to obtain enough useful information from incomplete data is a problem to be solved urgently. Wangli Xu et al propose solutions to the goodness-of-fit inspection problem of models with time-varying coefficients missing for different types of variables, respectively; in the network field, yin Zhang et al indicates that data loss is faced in network traffic data measurement on a large-scale high-speed backbone network, and proposes a sparse normalized matrix decomposition (sparse normalized matrix factorization) method to completely interpolate and fill the missing data.
Network anomaly detection can determine when an anomaly occurs, and anomaly location is a very challenging task. Ringberg et al have studied and analyzed the deficiency that the detection method of the whole network anomaly based on PCA exists in the aspect of the abnormal localization; eriksson et al propose a base detect (basis detect) full network anomaly detection and location method, but the method can only locate the border router, and cannot provide more detailed anomaly location information.
(III) summary of the invention
The technical problem to be solved by the invention is as follows: the method can process complete data and data loss, has strong resistance to abnormal noise interference, has low sensitivity to model parameters, and has stable performance.
The technical scheme of the invention is as follows:
a data source model:
early Internet traffic research focused primarily on the temporal characteristics of single-link packets in an Internet Service Provider (ISP), thereby yielding traffic characteristics that were self-similar, long-correlated, etc. However, an ISP often contains hundreds or thousands of links, and the internet is composed of tens of thousands of such ISPs, and when viewed in such a large background, the spatial characteristics of traffic are highlighted, and the traffic matrix, which is a compact and concise description of traffic between nodes in a given network architecture, is a common model structure in network traffic research. Meanwhile, the analysis of the traffic data of all links of the whole network is a task which is difficult to complete, the OD flow traffic matrix is used for carrying out the traffic analysis of the whole network more directly and clearly, and the Point of presence (Point of presence) level traffic matrix is adopted as a researched data source.
PoP level traffic matrix: assuming that an Autonomous System (AS) has N PoP nodes, the traffic between any pair of PoP nodes is continuously measured in a certain cycle, the inflow node of the traffic is the source node, the outflow node is the destination node, and thus it can be referred to AS source-destination (OD) flow, and then the measurement values are arranged into an N × D matrix X, which is referred to AS a PoP level traffic matrix of the AS (AS shown in fig. 1), where N represents the number of measured cycles and D represents the number of OD traffic measurement values (D = N × N). Flow matrix ith row representationThe ith period is a vector formed by OD flow measurement values; column j represents the vector of which the j-th OD consists of the sequences of time measurements. Any element x of the traffic matrix ij Representing some measure of traffic at the jth OD of the ith cycle. The flow measure adopted by the invention is the flow size (byte number, packet number or flow number).
Data loss problem:
in the data acquisition process, the data volume of the high-speed backbone network is huge and the transmission is fast, so that the burden of acquisition equipment is increased and the stability is reduced, and data loss occurs; data loss may also occur during data transmission due to network congestion, device or link failure.
The loss of traffic data in network measurement is not all completely randomized, and in many cases the loss is highly structured, and 4 loss mechanisms are proposed herein to describe the situation of data loss that may be encountered in the network data acquisition and transmission process.
(1) Complete random deletion
A completely random miss refers to any element X in the traffic matrix X ij The loss, q, with a random probability, may occur as a result of occasional congestion at the flow measuring device, or as a result of random data loss due to unreliable transmission mechanisms used for the measurement data.
(2) Random absence of time period
The rows of the traffic matrix correspond to the acquisition cycles of the traffic, and the random loss of the time period refers to the loss of any row of elements in the traffic matrix X with a probability q, which may cause the loss of the acquired data in the time period due to overload of a storage device or program failure caused by excessive data volume during the centralized processing of the measured data.
(3) OD random deletions
The columns of the traffic matrix correspond to the OD flows, and random loss of OD corresponds to the loss of any column of elements in the traffic matrix X with probability q. This may occur due to OD source or destination identification errors caused by flow filtering or acquisition procedure errors, and link or router failures may also cause loss of data on the relevant OD.
(4) Random block deletion
A block random miss refers to a certain sub-matrix of the traffic matrix X being lost with a probability q. Such a structured absence may occur in case of failure of the acquisition device or a full memory lasting several acquisition cycles, with a data absence corresponding to a plurality of adjacent rows and columns of the traffic matrix. If the missing sub-block is set as a certain row of the traffic matrix, the time period random missing is converted, and if the missing sub-block is set as a certain column, the OD random missing is converted.
Modeling problem under abnormal noise:
under the condition that a sample is known, a maximum likelihood estimation is often used for estimating model parameters for establishing a model, the probability distribution function of known data is needed by using the method, but the probability distribution of the data is difficult to describe accurately, and the normal distribution has good operational property, so that the data generally meets the characteristic of the normal distribution. In the linear Gaussian regression model, the maximum likelihood estimation and the least square estimation are equivalent, least square fitting is carried out in an abnormal noise environment, and the deviation of fitting can be caused by outlier sample points (namely, a small amount of samples which have overlarge distribution difference with most samples in the sample set), so that the accuracy of the model is influenced. Therefore, in an actual internet environment, the acquired traffic contains complex abnormal traffic as an outlier of the traffic population, and the establishment of the model is influenced, and how to construct the traffic normality model more accurately under the circumstance is an urgent problem to be solved.
Abnormal positioning problem:
the network anomaly detection can find the anomaly and determine the time when the anomaly occurs, but in order to more accurately grasp the fault or safety problem in the network, the anomaly location is needed to be carried out in a targeted way. The abnormal positioning in the flow matrix X is the position X for determining the matrix row-column interleaving corresponding to the abnormal occurrence ij A row of X corresponds to the time when the anomaly occurred, and a column of X corresponds to the location (i.e., on some OD or ODs) where the anomaly occurred.
A full-network anomaly detection and positioning method based on Robust Multivariate Probabilistic Calibration Model (RMPCM) comprises the following steps:
step 1, modeling normal flow: establishing a normal model by using the collected flow data;
step 2, flow abnormity detection: measuring whether the sample is abnormal or not by using the Mahalanobis distance between the sample and the normal model;
step 3, abnormal OD positioning: the location of the anomaly occurrence is located by analysis of the contribution to the flow of the anomalous sample OD.
The normal flow modeling in the step 1 comprises the following steps: and (3) constructing a normal model under the condition of abnormal noise data and constructing a normal model under the condition of missing flow data.
The method for constructing the normal model under the abnormal noise data comprises the following steps: introducing multivariate t distribution into an implicit variable probability model, and solving the problem of normal model construction under abnormal noise data;
A. replacing normal distribution with t distribution, and establishing a hidden variable probability model:
let the hidden vector t of each d dimension i All come from a D-dimensional feature vector x i The linear probability projection of (1), wherein D is more than or equal to D, so as to establish a hidden variable probability model; in order to solve the problem that a Gaussian noise model is too sensitive to an abnormal observation value, t distribution is selected to replace normal distribution, unit variance t distribution is selected to serve as prior distribution of hidden vectors, and a probability model is as follows:
p(t i )=S(t i |0,I d ,v) (5)
p(x i |t i )=S(x i |Wt i +μ,τI D ,v) (6)
wherein mu is a position vector, I is a unit matrix, v is the degree of freedom of t distribution, W is a projection matrix, and tau is a model parameter to be solved;
solving the probability model by using maximum likelihood estimation directly, expanding a t distribution model into an infinite Gaussian mixture model with the same mean value, wherein the prior distribution of the t distribution model is gamma distribution, and the parameters are only related to the degree of freedom v of the t distribution:
wherein Ga (u | alpha, beta) = beta α u α-1 e -βu The/gamma (alpha) is a probability density function of gamma distribution, and the lambda is a scale matrix; according to the formula (7), introducing an implicit variable u, u being a scalar, u: ga (v/2 ), if t: t (mu, lambda, v), then t | u: N (mu, u, v) -1 Λ); the following hidden variable models were obtained:
in order to obtain the model parameters { μ, τ, W, v }, it is also necessary to obtain u | x and t | x, u, which can be obtained from equations (9) and (10):
p(x i |u i )=∫p(x i |t i ,u i )p(t i |u i )dt=N(μ,WW T +τI D /u i ) (11)
let Ψ = WW T +τI D For a matrix of D × D, the edge distribution of x follows a t distribution, which can be expressed as: p (x) i )=S(μ,Ψ,v),
From equation (8) and equation (11), the following can be obtained:
wherein, delta 2 =(x i -μ) T Ψ -1 (x i μ) from the equations (9) and (10):
p(t i |x i ,u i )∝p(x i |t i ,u i )p(t i |u i )=N(M -1 W T (x i -μ),τΜ -1 /u i ) (13)
Wherein M = W T W+τI d For a d x d matrix, t | x obeys a t distribution, i.e., p (t) i |x i )=S(M -1 W T (x i -μ),τΜ -1 ,v);
B. Carrying out model parameter solution:
the model parameters to be solved are omega = { mu, tau, W, v, d }, and firstly, the number d of reserved pivot elements is determined to be 5 by using a PCA-based lithograph; for { mu, tau, W, v }, the maximum likelihood estimation is adopted to carry out parameter estimation of the model, and the log-likelihood function is as follows:
wherein N is the number of samples;
in order to simplify the calculation, a REM (Rapid approximation-maximization) algorithm is adopted to obtain the maximum likelihood estimation of the model parameters, the REM can obviously improve the convergence speed of the algorithm, the REM algorithm comprises two stages, each stage adopts an EM algorithm to estimate different parameters, and then circulation of the two stages is carried out iteratively until the convergence condition is met;
the first stage is as follows:
this phase does not take t into account i Estimating only the parameter mu;
the log-likelihood function is:
e, step E:
according to the Jensen inequality, when p (u) i )=p(u i |x i ) Then, the equal sign is established, and L is established 1 Lower bound of (1), byObtaining L from the formula (11) 1 The expectation, see equation (16),<·> represents the calculation expectation:
wherein, the first and the second end of the pipe are connected with each other,
and M:
will be provided with<L 1 &And calculating the partial derivative of mu and making the partial derivative equal to 0 to obtain a mu estimated value:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
The log-likelihood function is:
e, step E:
when p (t) i ,u i )=p(t i ,u i |x i ) Then, L can be obtained from the formula (8), the formula (9) and the formula (10) 2 The expectation of (2):
wherein the content of the first and second substances,<u i >, see formula (17),
<t i >=M -1 W T (x i -μ) (21)
<u i t i >=<u i ><t i > (22)
and M:
maximization<L 2 &And obtaining an updating formula of the model parameters { W, tau, v }:
parameter(s)The estimate of (d) is obtained by a linear search of the following formula:
the normal model construction method under the condition of the flow data loss comprises the following steps:
sample data x for D dimension i If data in certain dimensions of the model are missing, the data can still be used for estimating model parameters; sample data x i Divided into observed and missing values, i.e.
Then sample data x i The mean and covariance matrices of (a) may be expressed as:
due to the fact thatThen
Wherein the content of the first and second substances,
A. replacing normal distribution with t distribution to establish a hidden variable probability model;
B. carrying out model parameter solving: determining the number d of the reserved pivot elements as 5, and performing model parameter estimation under missing data by using a REM algorithm for { mu, tau, W, v }:
the first stage is as follows:
e, step E:
wherein the content of the first and second substances,
<u i (x i -μ)(x i -μ) T >=Q i +<u i >(z i -μ)(z i -μ) T (31)
is sample data x i When some of the dimensions are missing, the number of observable dimensions remain;
and M:
will be provided with<L′ 1 &And g, calculating a partial derivative of mu and enabling the partial derivative to be equal to 0 to obtain a mu estimated value:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
E, step E:
wherein the content of the first and second substances,
m, step:
maximization<L′ 2 &And obtaining an updating formula of the model parameters { W, tau, v }:
the update of the parameter v is found by:
the specific method for detecting abnormal flow in step 2 is as follows:
selecting a metric for complex flow data to judge abnormal flow in the complex flow data; two strategies are commonly used for judging which abnormal points are in data, namely judging Hotelling's T 2 Whether the Square Prediction Error (SPE) exceeds the threshold value or not is judged to determine whether the sample point is an orthogonal abnormal point (orthogonal abnormal point); because the invention establishes a probability model, the Mahalanobis distance of the sample can be simply adopted to carry out flow anomaly detection, and the squared Mahalanobis distance of the complete data sample is delta 2 :
δ 2 =(x i -μ) T Ψ -1 (x i -μ) (38)
For samples with some dimension missing, their mahalanobis distance is squared to find their expectationInstead of:
the abnormality is interpreted using a normally distributed "3 σ" control chart, namely: when the time sequence is abnormal, the corresponding distribution exceeds the control limit; let us denote the mahalanobis distance squared time series as δ 2 (t) mean value of μ M Variance ofThe setting of the "3 σ" control map is as follows:
center line: CL = μ M
An upper control limit: UCL = μ M +3σ M
Lower control limits: LCL = μ M -3σ M
According to the normal distribution rule, the method comprises the following steps:
where Φ (-) is the probability distribution function of a standard normal distribution,
the discriminant formula of the 3 sigma control chart is as follows:
and (3 sigma) control chart is adopted to judge the abnormity, namely when the deviation of the value from the mean value exceeds 3 times of standard deviation, the occurrence of an abnormal event is judged, and the confidence coefficient is 99.74%.
The specific method for positioning the abnormal OD in the step 3 is as follows:
determining an abnormal sample, and further determining which dimensions of the sample (the sample dimension corresponds to the OD) the abnormal values in should be responsible for the abnormal sample, namely abnormal OD positioning; let the abnormal sample be x a In total, D dimensions, x j (j =1, \8230;, D) is x a The j-dimension variable is subjected to abnormal OD positioning by adopting the following contribution analysis method, and the method can be divided into two stages:
stage one: repeating the following steps a1 to c1 each time j =1, 2, 3, and D in sequence to obtain D execution results;
step a1, abnormal samples x a X in (2) j Assuming a missing, an observation and a missing value are obtained:and
abnormal sample x a The mean and covariance matrices of (a) are expressed as:
due to the fact thatThen
Step b1, calculatingThe conditional probability distribution yields the corresponding z a And Q a :
Computing an anomalous sample x a Squared mahalanobis distance of
Step c1, calculating x j Degree of contribution of (2)I.e. x a Andthe squared difference of the mahalanobis distance of (a), which represents the degree of reduction of the anomaly measure of the abnormal sample after removing a certain dimension;
of D execution results, ifLess than a threshold for anomaly discrimination, i.e. degree of contributionIf the value is greater than the set abnormal judgment value, the abnormal sample x is a Of (j) th dimension pair x a Is responsible for the abnormality of (c);
and a second stage:
if not found in stage oneIf the value is less than the threshold value of the abnormal judgment, the abnormal sample x is indicated a If the anomaly of a single dimension is not sufficient to cause an anomaly in the sample, then the following operations are performed:
step a2, according to the contribution degree of each dimensionSequencing the data of each dimension from large to small;
step b2, removing the data of the two dimensions arranged at the top according to the sequence arranged in the step a2, and if the squared Mahalanobis distance calculated at the moment is smaller than the threshold value for judging the abnormality, the two corresponding dimensions are used for matching the sample x together a Is responsible for the abnormality of (c); otherwise, continuously removing the data of the dimension arranged at the top, calculating the squared Mahalanobis distance once the data of the dimension arranged at the top is removed until the calculated squared Mahalanobis distance is smaller than the threshold value for judging the abnormality, and at this time, the data of all the removed dimensions are used for the sample x together a Is responsible for the abnormality of (a).
The invention has the beneficial effects that:
the multivariate t distribution is introduced into a hidden variable probability model instead of normal distribution, a normal model of a flow matrix is further established, abnormal flow detection is carried out by comparing the Mahalanobis distance (Mahalanobis distance) between a sample and the normal model, and abnormal OD positioning is further realized by a contribution analysis method. The method solves the problem of abnormal detection under the condition of incomplete data to be detected by establishing a hidden variable probability model, solves the problem of interference of abnormal noise on a normal model by introducing multivariate t distribution into a probability modeling process, and improves the detection precision; the invention has the advantages of better robustness, wide application scene, capability of processing complete data and data loss, stronger resistance to abnormal noise interference, lower sensitivity to model parameters, stable performance and reduction of complex parameter debugging work in actual deployment.
(IV) description of the drawings
FIG. 1 is a schematic view of a traffic matrix;
FIG. 2 is a diagram of the relationship steps of the RMPCM method;
FIG. 3 is a diagram of normal modeling steps under abnormal noise;
FIG. 4 is a graph model of RMPCM (shaded nodes are observed vectors, arrows indicate conditional correlations of random vectors);
FIG. 5 is a diagram of normal modeling steps in the absence of data;
FIG. 6 is a diagram illustrating a positioning procedure of abnormal ODs;
FIG. 7 is a diagram showing the results of detection under the condition of no missing of measured data;
FIG. 8 is a diagram showing the detection results of measured data under 4 deletion mechanisms;
FIG. 9 is a graph of sensitivity analysis of PCA (left side of the graph) and RMPCM (right side of the graph) to principal component numbers;
FIG. 10 is a graph of sensitivity analysis of PCA (left side of the graph) and RMPCM (right side of the graph) to flow measurements.
(V) detailed description of the preferred embodiments
The method for detecting and positioning the full-network anomaly based on a Robust Multivariate Probabilistic Calibration Model (RMPCM) comprises the following steps:
step 1, modeling normal flow: establishing a normal model by using the collected flow data;
step 2, flow abnormity detection: measuring whether the sample is abnormal or not by using the Mahalanobis distance between the sample and the normal model;
step 3, abnormal OD positioning: the location of the anomaly occurrence is located by analyzing the contribution to the flow of the anomalous sample OD.
The relationship between the abnormal event in the network and the network abnormal detection and location is (as shown in fig. 2): abnormal events can affect the statistics of partial flows in the network, the change of the statistics corresponding to the network flow total triggers an abnormal detector to alarm, and a network manager extracts the abnormal flows in the network through abnormal positioning after receiving the alarm, and then analyzes and determines the abnormal events;
the application of the network anomaly detection method in the actual internet environment often encounters some practical problems, such as the loss of traffic data in the transmission and collection processes, and the model offset caused by the anomaly traffic interference in the network.
Under the condition that flow data are incomplete, the traditional network anomaly detection method cannot be applied, a Bayes statistical calculation method is adopted in the invention, but the complexity of network flow data cannot be directly applied to obtain posterior mean value estimation and progressive variance thereof, an implicit variable probability model is introduced, namely, some potential data are added on the basis of known measurement data, so that the calculation is simplified to complete parameter estimation, a data missing part and unknown parameters can be taken as the potential data in the process, and an EM (expectation-maximization) algorithm is adopted to obtain Maximum Likelihood Estimation (MLE) of model parameters.
The probability distribution of known data is needed during maximum likelihood estimation, and the probability distribution is generally assumed to meet normal distribution, but because the actual network flow contains some abnormal noise flow, the deviation of parameter estimation is too large by adopting the normal distribution assumption, and therefore multivariate t distribution is introduced to replace multivariate normal distribution. Compared with normal distribution, t distribution has a heavy tail characteristic, abnormal samples have lower weight in the maximum likelihood estimation process after the t distribution is introduced, the influence on parameter estimation can be reduced, and the specific explanation is as follows:
the sample data is x i (i =1, \ 8230;, N) assuming that it satisfies an independent identically distributed normal random vector, denoted as
Wherein N is the number of samples, D is the dimension of the samples, mu is the mean vector function, the parameter is theta, sigma is the covariance matrix of DxD, and the parameter isWe adopt the t distribution model to replace the above normal distribution model, denoted as
Similarly, D is the dimension of the sample, μ is the location parameter (location parameter) vector, Λ is the scale matrix (scale matrix), and v is t The degree of freedom of distribution (degrees of freedom), with probability density function (p.d.f.) is:
wherein, delta 2 =(x-μ) T Λ -1 (x- μ) is the square of the mahalanobis distance of x to μ, Γ (·) is the gamma function; when v < ∞andmaximum likelihood estimation is used for parameter estimation, the robustness of t-distribution model to outlier interference is better than that of normal distribution model, mainly embodied in that outliers with larger mahalanobis distance contribute relatively less to model parameter estimation, and the parameter theta estimation is used belowExplanation is given by way of example:
for the normal distribution model (1), its maximum likelihood estimate satisfies the equation:where l is the log-likelihood function, A i Is mu i A matrix of partial derivatives to θ; under the t distribution model (2), the maximum likelihood estimation satisfies the equation:wherein, the first and the second end of the pipe are connected with each other,
for the weight of sample i, see asIncrease w of i Decreasing; abnormal samples have a larger delta i Smaller w i Therefore, the sensitivity to abnormal samples can be reduced by using t distribution in model parameter estimation, and the robustness of the t distribution model is better compared with that of a normal distribution model; v is an adjusting parameter of the t distribution model for robustness of outliers, and increases along with the reduction of v; when v → ∞, the normal distribution model is equivalent to the maximum likelihood estimation of the t distribution model.
The normal flow modeling in the step 1 comprises the following steps: and (3) constructing a normal model under the condition of abnormal noise data and constructing a normal model under the condition of missing flow data.
The method for constructing the normal model under the abnormal noise data comprises the following steps: introducing multivariate t distribution into an implicit variable probability model, and solving the problem of normal model construction under abnormal noise data (as shown in figure 3);
A. and (3) replacing normal distribution with t distribution to establish a hidden variable probability model:
let the hidden vector t of each d dimension i Are all from a D-dimensional feature vector x i Linear probability projection of (1), wherein D ≧ D, is establishedEstablishing a hidden variable probability model; in order to solve the problem that a Gaussian noise model is too sensitive to an abnormal observation value, t distribution is selected to replace normal distribution, unit variance t distribution is selected to serve as prior distribution of hidden vectors, and a probability model is as follows:
p(t i )=S(t i |0,I d ,v) (5)
p(x i |t i )=S(x i |Wt i +μ,τI D ,v) (6)
wherein mu is a position vector, I is a unit matrix, v is the degree of freedom of t distribution, W is a projection matrix, and tau is a model parameter to be solved;
solving the probability model by using maximum likelihood estimation directly, expanding a t distribution model into an infinite Gaussian mixture model with the same mean value, wherein the prior distribution of the t distribution model is gamma distribution, and parameters are only related to the degree of freedom v of the t distribution:
wherein Ga (u | α, β) = β α u α-1 e -βu The/gamma (alpha) is a probability density function of gamma distribution, and the lambda is a scale matrix; according to the formula (7), introducing an implicit variable u, u being a scalar, u: ga (v/2 ), if t: t (mu, lambda, v), then t | u: N (mu, u, v) -1 Λ); the following hidden variable model (as shown in fig. 4) was obtained:
in order to obtain the model parameters { μ, τ, W, v }, it is also necessary to obtain u | x and t | x, u, which can be obtained from equations (9) and (10):
p(x i |u i )=∫p(x i |t i ,u i )p(t i |u i )dt=N(μ,WW T +τI D /u i ) (11)
let Ψ = WW T +τI D For a matrix of D × D, the edge distribution of x follows a t distribution, which can be expressed as: p (x) i )=S(μ,Ψ,v),
From equation (8) and equation (11), the following can be obtained:
wherein, delta 2 =(x i -μ) T Ψ -1 (x i μ), which can be determined from equation (9) and equation (10):
p(t i |x i ,u i )∝p(x i |t i ,u i )p(t i |u i )=N(M -1 W T (x i -μ),τΜ -1 /u i ) (13)
wherein M = W T W+τI d For a d x d matrix, t | x obeys a t distribution, i.e., p (t) i |x i )=S(M -1 W T (x i -μ),τΜ -1 ,v);
B. Carrying out model parameter solving:
the model parameters to be solved are omega = { mu, tau, W, v, d }, and firstly, the number d of reserved pivot elements is determined to be 5 by using a PCA-based lithograph; for { mu, tau, W, v }, the maximum likelihood estimation is adopted to carry out parameter estimation of the model, and the log likelihood function is as follows:
wherein N is the number of samples;
in order to simplify calculation, a REM (Rapid estimation-maximization) algorithm is adopted to obtain the maximum likelihood estimation of the model parameters, the REM can obviously improve the convergence speed of the algorithm, the REM algorithm comprises two stages, each stage adopts an EM algorithm to estimate different parameters, and then circulation of the two stages is carried out iteratively until the convergence condition is met;
the first stage is as follows:
this phase does not take t into account i Estimating only the parameter mu;
the log-likelihood function is:
e, step E:
according to the Jensen inequality, when p (u) i )=p(u i |x i ) Then, the equal sign is established, and L is established 1 The lower bound of (2), L is obtained from the formula (11) 1 The expectation, see equation (16),<·>, representing the computational expectation:
wherein, the first and the second end of the pipe are connected with each other,
m, step:
will be provided with<L 1 &And g, calculating a partial derivative of mu and enabling the partial derivative to be equal to 0 to obtain a mu estimated value:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
The log-likelihood function is:
e, step E:
when p (t) i ,u i )=p(t i ,u i |x i ) Then, L can be obtained from the formula (8), the formula (9) and the formula (10) 2 The expectation of (2):
wherein, the first and the second end of the pipe are connected with each other,<u i >, see formula (17),
<t i >=M -1 W T (x i -μ) (21)
<u i t i >=<u i ><t i > (22)
m, step:
maximization<L 2 &And gt, obtaining an updating formula of model parameters { W, tau, v }:
parameter(s)The estimate of (d) is obtained by a linear search of the following formula:
the method for constructing the normal model under the condition of missing flow data comprises the following steps (as shown in figure 5):
sample data x for D dimension i If data on certain dimensions of the model are missing, the data can still be used for estimating model parameters; sample data x i Divided into observed and missing values, i.e.
Then sample data x i The mean and covariance matrices of (a) can be expressed as:
due to the fact thatThen the
Wherein, the first and the second end of the pipe are connected with each other,
A. replacing normal distribution with t distribution to establish a hidden variable probability model;
B. carrying out model parameter solution: determining the number d of the reserved pivot elements as 5, and performing model parameter estimation under the condition of missing data by using a REM algorithm for { mu, tau, W, v }:
the first stage is as follows:
e, step E:
wherein, the first and the second end of the pipe are connected with each other,
<u i (x i -μ)(x i -μ) T >=Q i +<u i >(z i -μ)(z i -μ) T (31)
is sample data x i When some dimensions are missing, the number of observable dimensions remain;
m, step:
will be < L' 1 The partial derivative of μ is taken and made equal to 0, and the μ estimate is obtained:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
E, step E:
wherein the content of the first and second substances,
m, step:
l 'of maximization' 2 >. Obtaining an update formula of model parameters { W, τ, v }:
the update of the parameter v is obtained by:
the specific method for detecting the abnormal flow in the step 2 is as follows:
for complex flow data, selecting a metric to determine abnormal flow therein; two strategies are commonly used for judging which abnormal points are in data, namely judging Hotelling's T 2 Whether the Square Prediction Error (SPE) exceeds the threshold value or not is judged to determine whether the sample point is an orthogonal abnormal point (orthogonal abnormal point) or not; because the invention establishes the probability model, the probability model can be simply adoptedDetecting abnormal flow by the Mahalanobis distance of the sample, wherein the squared Mahalanobis distance of the complete data sample is delta 2 :
δ 2 =(x i -μ) T Ψ -1 (x i -μ) (38)
For samples with some dimension missing, their Mahalanobis distance squared to find their expectationInstead of:
the abnormality is interpreted using a normally distributed "3 σ" control chart, namely: when the time sequence is abnormal, the corresponding distribution exceeds the control limit; let us denote the mahalanobis distance squared time series as δ 2 (t) mean value of μ M Variance isThe setting of the "3 σ" control map is as follows:
center line: CL = μ M
An upper control limit: UCL = μ M +3σ M
Lower control limits: LCL = μ M -3σ M
According to the normal distribution rule, the method comprises the following steps:
where Φ (-) is the probability distribution function of a standard normal distribution,
the discriminant formula of the "3 σ" control map is:
and (3 sigma) control chart is adopted to judge the abnormity, namely when the deviation of the value from the mean value exceeds 3 times of standard deviation, the occurrence of an abnormal event is judged, and the confidence coefficient is 99.74%.
The specific method for positioning the abnormal OD in the step 3 is as follows:
determining an abnormal sample, and further determining which dimensions of the sample (the sample dimension corresponds to the OD) the abnormal values in should be responsible for the abnormal sample, namely abnormal OD positioning; let the abnormal sample be x a In total, D dimensions, x j (j =1, \ 8230;, D) is x a The j-dimension variable is used for abnormal OD positioning by adopting the following contribution analysis method (the step is shown in FIG. 6), and the method can be divided into two stages:
stage one: repeating the following steps a1 to c1 each time j =1, 2, 3, and D in sequence to obtain D execution results;
step a1, abnormal samples x a X in (2) j Assuming a missing, an observation and a missing value are obtained:and
abnormal sample x a Is expressed as:
due to the fact thatThen the
Step b1, calculatingConditional probability distributionTo give the corresponding z a And Q a :
Computing an anomalous sample x a Squared mahalanobis distance of
Step c1, calculating x j Degree of contribution ofI.e. x a And withThe squared difference of the mahalanobis distance of (a), which represents the degree of reduction of the anomaly measure of the anomalous sample after removing a certain dimension;
of D execution results, ifLess than a threshold for anomaly discrimination, i.e. degree of contributionIf the value is greater than the set abnormal judgment value, the abnormal sample x is a Of (j) th dimension pair x a Is responsible for the abnormality of;
and a second stage:
if not found in stage oneIf the value is less than the threshold value of the abnormal judgment, the abnormal sample x is indicated a If the anomaly of a single dimension is not enough to cause the sample anomaly, then the following operations are performed:
step a2, contribution degree according to each dimensionSequencing the data of each dimension from large to small;
step b2, removing the data of the two dimensions arranged at the top according to the sequence arranged in the step a2, and if the squared Mahalanobis distance calculated at the moment is smaller than the threshold value for judging the abnormality, the two corresponding dimensions are used for matching the sample x together a Is responsible for the abnormality of; otherwise, continuously removing the data of the dimension arranged at the top, calculating the squared Mahalanobis distance once the data of the dimension arranged at the top is removed until the calculated squared Mahalanobis distance is smaller than the threshold value of the abnormal judgment, and at the moment, the data of all the removed dimensions are used for the sample x together a Is responsible for the abnormality of (a).
And (3) analyzing algorithm complexity:
in the RMPCM algorithm, the main computational overhead is the inversion of the metric matrix Ψ of the traffic matrix and the number of iterations of the EM algorithm. Ψ is a matrix of D × D, D is the dimension of the sample X, corresponding to the number of columns of the flow matrix, i.e., the number of ODs (n × n), and Ψ is directly calculated in the calculation process -1 The complexity of the algorithm is greatly influenced, and psi can be obtained by utilizing the Woodbury matrix identity equation -1 =(WW T +τI D ) -1 =τ -1 I D -τ -1 WM -1 W T Wherein M = W T W+τI d For the matrix of dXd, the PCA dimension reduction method is used to determine the reserved principal component number D, and D < D is known, so that the inversion of the matrix psi for solving the matrix of the dXD is converted into the inversion of the matrix M for solving the matrix of the dXd, and the computational complexity is greatly simplified. The time complexity is O (Td) 2 ). The time complexity of the algorithm is also related to the iteration times of the EM algorithm, and the iteration times in the calculation by adopting the rapid EM algorithm are generally less than 15. A Matlab is adopted to configure a computer with a core i 7.5 GHz CPU and a 4GB memory in a win7 system, an RMPCM detection algorithm is executed on a data set in a subsequent experiment, the running time of complete data is not more than 2.53s, and the running time of missing data is less than 5s, which is shown in Table 1. Will Matla if actually deployedb the code conversion to binary executable program will be further accelerated in computation speed.
TABLE 1RMPCM Algorithm execution time
Analyzing network measured data:
data set:
the data set of backbone network Abilene commonly used in network flow research is selected from the actually measured data set, and main users of the Abilene network are universities and scientific research institutions in the United states. Because the data in 2003 is complete and more methods are adopted for reference, the invention selects NetFlow data of 11 PoP nodes from 15 days 12 and 12 days 21 and 12 months in 2003, obtains the entry point and the exit point of each flow according to BGP and ISIS routing tables, and obtains the OD flow size and the flow matrix, which are shown in table 2. The data set is used for carrying out detection performance evaluation and sensitivity analysis under the condition of lacking data.
Table 2 Abilene traffic matrix dataset
And (3) detecting abnormality under data missing:
the detection results under the condition of no missing of the measured data are shown in fig. 7. When an abnormality detection experiment under the condition of actual measurement data set deletion is carried out, a data set with the number of groups (P) is selected to be sequentially tested according to the 4 deletion mechanisms provided above. In order to compare the similarities and differences of the RMPCM method under various deletion mechanisms and analyze the factors influencing the detection performance, the same loss rate is set under 3 mechanisms of complete random deletion, time period random deletion and OD random deletion to compare the influences of different mechanisms on the detection performance, and the influences of the same loss rate and different block sizes on the detection performance are compared under the 4 th mechanism of block random deletion. In order to eliminate the contingency of the experiment of losing data, 10 times of experiments are carried out, the average value of the experiments is taken, and an ROC curve is drawn by taking the detection result of RMPCM under the complete data as a reference.
In the completely random deletion experiment, 3 kinds of loss rates are selected, the loss data accounts for 10%,20%, and 50% of the total data volume of the traffic matrix, and only partial results are displayed in space, for example, fig. 8 (a) shows detection curves and ROC curves with the loss rates of 10%,50% in sequence; in the time period random deletion experiment, the number of cycles of random deletion is set to be 200, 400 and 1000 respectively, and since the total time period is 2010, the data amount discarded by the time period random deletion and the complete random deletion is similar, and partial results are shown in (b) of fig. 8; in the OD random missing experiment, since the whole column of the algorithm restriction matrix cannot be all empty, half of the adjacent data is selected to be empty, the total number of missing ODs is set to be 24, 48, 121, and 121, respectively, so that the discarded data amount of the OD random missing experiment is similar to the former two, and partial results are shown in (c) of fig. 8. Experiments of the three mechanisms show that the main factor influencing the detection rate is the loss of abnormal data, and meanwhile, the fact that the false alarm rate under the same detection rate is gradually increased along with the increase of the data loss rate and the performance shown by an ROC curve is gradually deteriorated is also found; compared with the three mechanisms, the detection performance is minimally affected by complete random deletion, the detection performance is randomly deleted in one time period, and the OD random deletion is maximal, the analysis reason is related to the size of the data quantity of each structured deletion, so that a 4 th block random deletion experiment is carried out, different block sizes are set in the experiment, but the same loss quantity is kept, three block sizes are respectively set, namely 5, 16, 40, the number of the deleted blocks is respectively 2000, 200, 30, and the loss quantity is kept accounting for about 20% of the total data quantity, and as shown in (d) in fig. 8, the ROC curve shows that the detection performance is greatly affected by the data quantity of the structured deletion, and the detection performance is obviously reduced when the block size is increased to 40.
And (3) sensitivity analysis:
the sensitivity analysis of the invention is mainly aimed at two situations: the sensitivity analysis of the number d of the reserved pivot elements is performed, and the sensitivity analysis of the flow measurement is performed. The PCA method was selected for comparison.
Selecting a flow number (F) data set to carry out a principal component number sensitivity analysis experiment, wherein the experimental result is shown in fig. 9, (a) and (c) respectively reserve detection curves with principal component numbers of 4 and 5 for PCA, visible curve profiles are completely different, and the detection results are very different; in fig. 9, (b) and (d) are detection curves corresponding to the RMPCM, respectively, and both the curve profiles and the detection results are consistent. The detection condition of the RMPCM method pivot number of 2 to 10 is verified in the experiment, the detection results are basically consistent, and high robustness is kept.
Data sets B, P and F are respectively selected to carry out sensitivity analysis of flow measurement, experiments set the threshold value of the cumulative variance contribution rate of the pivot element to be 0.85, the results are shown in figure 10, the curve profiles obtained by the PCA method under 3 measurement types are completely different, and the detection results are also greatly different; the detection curve profile of the RMPCM method is approximate, detection results are different but have larger relation, the number of the detected abnormity is the largest by taking the number of packets as the measurement, the result obtained by taking the number of streams as the measurement is contained in the abnormity detected by taking the number of packets as the measurement, the percentage is 59.6%, and the same result obtained by taking the number of bytes as the measurement is also contained in the abnormity obtained by taking the number of packets as the measurement, and the percentage is 43.5%. Due to the fact that no abnormal mark of the measured data set exists, the actual situation of the abnormal situation cannot be known, but the statistics obtained by taking the number of streams, the number of bytes and the number of packets as the measurement values are related, corresponding abnormal detection results are also overlapped, the results obtained by the PCA method conflict with one another at multiple places, the flow measurement degree is sensitive, and compared with the RMPCM method, the RMPCM method is high in robustness, and the obtained results are more reasonable.
The simulation experiment, the simulation platform experiment and the analysis of the internet measured data show that: the detection performance of the RMPCM method is superior to that of the PCA method and the ANTIDOTE method, the robustness is good, whether the data to be detected is complete or not and whether the detection environment has strong interference or not, the method shows relatively stable detection performance and has lower sensitivity to model parameters.