CN105119734B - Whole network abnormality detection localization method based on healthy and strong multivariate probability calibrating patterns - Google Patents

Whole network abnormality detection localization method based on healthy and strong multivariate probability calibrating patterns Download PDF

Info

Publication number
CN105119734B
CN105119734B CN201510414996.5A CN201510414996A CN105119734B CN 105119734 B CN105119734 B CN 105119734B CN 201510414996 A CN201510414996 A CN 201510414996A CN 105119734 B CN105119734 B CN 105119734B
Authority
CN
China
Prior art keywords
model
abnormal
data
distribution
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510414996.5A
Other languages
Chinese (zh)
Other versions
CN105119734A (en
Inventor
钱叶魁
叶立新
朱少卫
李宇翀
杜江
黄浩
杨瑞朋
雒朝峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA AIR DEFENCE FORCES ACADEMY SCHOOL
Original Assignee
PLA AIR DEFENCE FORCES ACADEMY SCHOOL
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA AIR DEFENCE FORCES ACADEMY SCHOOL filed Critical PLA AIR DEFENCE FORCES ACADEMY SCHOOL
Priority to CN201510414996.5A priority Critical patent/CN105119734B/en
Publication of CN105119734A publication Critical patent/CN105119734A/en
Application granted granted Critical
Publication of CN105119734B publication Critical patent/CN105119734B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/142Network analysis or design using statistical or mathematical methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823Errors, e.g. transmission errors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Nitrogen And Oxygen Or Sulfur-Condensed Heterocyclic Ring Systems (AREA)
  • Medicines Containing Antibodies Or Antigens For Use As Internal Diagnostic Agents (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a kind of whole network abnormality detection localization method based on healthy and strong multivariate probability calibrating patterns;The whole network abnormality detection localization method contains the following steps:Step 1, normal discharge modeling:Normality model is established using the data on flows collected;Normal discharge modeling is divided into:There is the normal model structure under deletion condition containing the normal model structure under extraordinary noise data and data on flows;Step 2, Traffic anomaly detection:It is whether abnormal that the sample is weighed using the mahalanobis distance of sample and normality model;Step 3, exception OD positioning:The position occurred extremely by the contribution analysis positioning flowed to exceptional sample OD;The present invention can both handle partial data, can also handle the situation of shortage of data, stronger and relatively low to the sensitiveness of model parameter to the resistance of extraordinary noise interference, and performance is stablized.

Description

Full-network anomaly detection and positioning method based on robust multivariate probability calibration model
(I) the technical field
The invention relates to a network anomaly detection method, in particular to a full-network anomaly detection positioning method based on a robust multivariate probability calibration model.
(II) background Art
Under the current internet environment, various network abnormal events are layered endlessly, large-scale network intrusion such as DDos attack, botnet and the like brings serious threats to the safety operation of the internet, and network congestion, network failure and the like also seriously affect the service quality of the internet, so that the detection and the positioning of network abnormal behaviors are very necessary. Meanwhile, network anomalies are various and rapid in change, and are often hidden in complex and huge background traffic, so that great difficulty is brought to detection and positioning of the network anomalies.
The method is characterized in that a plurality of researches are carried out for network anomaly detection, system logs, audit information and the like of a host are used as data sources, and a method for anomaly detection based on the host is provided by adopting methods such as data mining and the like; using end-to-end round-trip delay, packet loss rate and other performance measurement data, and adopting a unitary time sequence analysis method to provide a single-path-based anomaly detection method; network flow data such as SNMP and NetFlow using a single link are provided, and an anomaly detection method based on the single link is provided by adopting methods such as machine learning and wavelet analysis. The method mainly focuses on local information of the network, the monitoring range is limited, but with the continuous expansion of the network scale, the data transmission speed is continuously accelerated, a plurality of network anomalies present strong global characteristics, the influences of the network anomalies disperse into a plurality of links or paths in the network, and the information is not obvious in local representation. By adopting the analysis and detection method based on the host, the single path or the single link, comprehensive measurement and global analysis can not be carried out on the network, and the detection precision is difficult to ensure.
In view of the above problems, lakhina et al propose a full network-wide anomaly detection method based on principal component analysis subspace PCA for the first time, comprehensively utilize flow statistical information of multiple paths to construct a normal model, and determine whether an anomaly occurs by judging whether the current situation deviates from the normal model; the subsequent research is along the thought of the whole network detection, and researches are carried out on the aspects of the time-space expansibility, the robustness, the real-time property, the abnormal measure and the like of the detection algorithm, so that the content of the whole network abnormal detection is enriched. The method comprehensively utilizes the flow data of the whole network, obtains the detection performance superior to a single-node, single-path or single-link method, and simultaneously adopts an abnormal detection method for establishing a normal model and comparing with the normal model without establishing an abnormal feature library, so that the known abnormal condition and the unknown abnormal condition can be detected, and the application range is wide. The whole network anomaly detection method based on the flow statistics improves the detection performance by introducing network information with wider range and more dimensionality, but the method also faces some practical problems when being deployed on a large-scale high-speed backbone network: firstly, the acquisition range is enlarged, acquisition devices are increased, and the network speed is increased, so that the condition of data acquisition loss caused by the failure of part of devices or the condition of data flow loss in the transmission process must be considered, and the abnormal detection method becomes unusable due to incomplete data; secondly, the actual flow of the backbone network is huge in data quantity and very complex, the accuracy of normal model construction is affected by various hidden abnormal flows, so that model parameters are difficult to select, the stability of an abnormal detection method is extremely difficult to guarantee, and some deliberate attack flows can also severely poison a detector; thirdly, the method can find the abnormity, but is also deficient in the aspect of abnormity positioning.
Dorothy Denning proposed a statistical model of network anomaly detection as early as 1987, and network anomaly detection becomes more important with the development of the Internet. According to the detection range, the method is distinguished by the traditional anomaly detection method based on a host, a single path and a single link, and the research of Yegneswaran and the like finds that the occurrence and development of network anomaly show the global change trend, the detection range is increased, and the improvement of anomaly detection effect exceeds the linear relation, so that the basis of the whole network detection method is laid; in 2004, lakhina et al put forward a full-network anomaly detection method based on flow statistics for the first time, reveal the low-dimensional characteristic of source-destination (OD) flow matrix, and use the flow statistics information of multiple paths comprehensively to construct a normal model, and determine whether an anomaly occurs by judging whether the current situation deviates from the normal model; subsequently, ling Huang et al improve an anomaly detection method based on PCA by using a stochastic matrix perturbation (stochastic matrix perturbation) theory, and provide a low-overhead full-network anomaly detection method of distributed PCA; the Brauckhoff et al generalizes PCA to obtain a Galerkin-based Karhunen-love transformation expansion calculation method and uses the method for detecting the whole network abnormality; ahmed et al propose a kernel recursive least square (kernel recursive least squares) on-line detection method for detecting timeliness. The method lacks research on actual network environment problems (such as abnormal noise interference and data loss) and abnormal positioning problems in network abnormality detection.
Ringberg et al conducted an in-depth study on the influence of abnormal flow noise on the performance of the detector, and indicated that large flow abnormality can cause the deviation of a normal model of a PCA detector, thereby increasing the false alarm rate of abnormal detection; rubinstein et al further studied the poisoning problem of the detector and quantitatively studied the poisoning ability of some poisoning attack methods on PCA anomaly detectors. Qianyukui et al enumerates 3 kinds of toxicity attack mechanisms, and proposes a toxicity defense mechanism RPCA based on robust PCA by using methods such as projection tracking and the like. The method mainly analyzes the harmfulness of different toxic attack modes to the PCA detector, researches the defense mechanism of the PCA detector in a targeted manner, and lacks the research on a common normal modeling method under a complex abnormal noise environment.
The data missing problem is a common practical problem in a plurality of fields, and how to obtain enough useful information from incomplete data is a problem to be solved urgently. Wangli Xu et al propose solutions to the goodness-of-fit inspection problem of models with time-varying coefficients missing for different types of variables, respectively; in the network field, yin Zhang et al indicates that data loss is faced in network traffic data measurement on a large-scale high-speed backbone network, and proposes a sparse normalized matrix decomposition (sparse normalized matrix factorization) method to completely interpolate and fill the missing data.
Network anomaly detection can determine when an anomaly occurs, and anomaly location is a very challenging task. Ringberg et al have studied and analyzed the deficiency that the detection method of the whole network anomaly based on PCA exists in the aspect of the abnormal localization; eriksson et al propose a base detect (basis detect) full network anomaly detection and location method, but the method can only locate the border router, and cannot provide more detailed anomaly location information.
(III) summary of the invention
The technical problem to be solved by the invention is as follows: the method can process complete data and data loss, has strong resistance to abnormal noise interference, has low sensitivity to model parameters, and has stable performance.
The technical scheme of the invention is as follows:
a data source model:
early Internet traffic research focused primarily on the temporal characteristics of single-link packets in an Internet Service Provider (ISP), thereby yielding traffic characteristics that were self-similar, long-correlated, etc. However, an ISP often contains hundreds or thousands of links, and the internet is composed of tens of thousands of such ISPs, and when viewed in such a large background, the spatial characteristics of traffic are highlighted, and the traffic matrix, which is a compact and concise description of traffic between nodes in a given network architecture, is a common model structure in network traffic research. Meanwhile, the analysis of the traffic data of all links of the whole network is a task which is difficult to complete, the OD flow traffic matrix is used for carrying out the traffic analysis of the whole network more directly and clearly, and the Point of presence (Point of presence) level traffic matrix is adopted as a researched data source.
PoP level traffic matrix: assuming that an Autonomous System (AS) has N PoP nodes, the traffic between any pair of PoP nodes is continuously measured in a certain cycle, the inflow node of the traffic is the source node, the outflow node is the destination node, and thus it can be referred to AS source-destination (OD) flow, and then the measurement values are arranged into an N × D matrix X, which is referred to AS a PoP level traffic matrix of the AS (AS shown in fig. 1), where N represents the number of measured cycles and D represents the number of OD traffic measurement values (D = N × N). Flow matrix ith row representationThe ith period is a vector formed by OD flow measurement values; column j represents the vector of which the j-th OD consists of the sequences of time measurements. Any element x of the traffic matrix ij Representing some measure of traffic at the jth OD of the ith cycle. The flow measure adopted by the invention is the flow size (byte number, packet number or flow number).
Data loss problem:
in the data acquisition process, the data volume of the high-speed backbone network is huge and the transmission is fast, so that the burden of acquisition equipment is increased and the stability is reduced, and data loss occurs; data loss may also occur during data transmission due to network congestion, device or link failure.
The loss of traffic data in network measurement is not all completely randomized, and in many cases the loss is highly structured, and 4 loss mechanisms are proposed herein to describe the situation of data loss that may be encountered in the network data acquisition and transmission process.
(1) Complete random deletion
A completely random miss refers to any element X in the traffic matrix X ij The loss, q, with a random probability, may occur as a result of occasional congestion at the flow measuring device, or as a result of random data loss due to unreliable transmission mechanisms used for the measurement data.
(2) Random absence of time period
The rows of the traffic matrix correspond to the acquisition cycles of the traffic, and the random loss of the time period refers to the loss of any row of elements in the traffic matrix X with a probability q, which may cause the loss of the acquired data in the time period due to overload of a storage device or program failure caused by excessive data volume during the centralized processing of the measured data.
(3) OD random deletions
The columns of the traffic matrix correspond to the OD flows, and random loss of OD corresponds to the loss of any column of elements in the traffic matrix X with probability q. This may occur due to OD source or destination identification errors caused by flow filtering or acquisition procedure errors, and link or router failures may also cause loss of data on the relevant OD.
(4) Random block deletion
A block random miss refers to a certain sub-matrix of the traffic matrix X being lost with a probability q. Such a structured absence may occur in case of failure of the acquisition device or a full memory lasting several acquisition cycles, with a data absence corresponding to a plurality of adjacent rows and columns of the traffic matrix. If the missing sub-block is set as a certain row of the traffic matrix, the time period random missing is converted, and if the missing sub-block is set as a certain column, the OD random missing is converted.
Modeling problem under abnormal noise:
under the condition that a sample is known, a maximum likelihood estimation is often used for estimating model parameters for establishing a model, the probability distribution function of known data is needed by using the method, but the probability distribution of the data is difficult to describe accurately, and the normal distribution has good operational property, so that the data generally meets the characteristic of the normal distribution. In the linear Gaussian regression model, the maximum likelihood estimation and the least square estimation are equivalent, least square fitting is carried out in an abnormal noise environment, and the deviation of fitting can be caused by outlier sample points (namely, a small amount of samples which have overlarge distribution difference with most samples in the sample set), so that the accuracy of the model is influenced. Therefore, in an actual internet environment, the acquired traffic contains complex abnormal traffic as an outlier of the traffic population, and the establishment of the model is influenced, and how to construct the traffic normality model more accurately under the circumstance is an urgent problem to be solved.
Abnormal positioning problem:
the network anomaly detection can find the anomaly and determine the time when the anomaly occurs, but in order to more accurately grasp the fault or safety problem in the network, the anomaly location is needed to be carried out in a targeted way. The abnormal positioning in the flow matrix X is the position X for determining the matrix row-column interleaving corresponding to the abnormal occurrence ij A row of X corresponds to the time when the anomaly occurred, and a column of X corresponds to the location (i.e., on some OD or ODs) where the anomaly occurred.
A full-network anomaly detection and positioning method based on Robust Multivariate Probabilistic Calibration Model (RMPCM) comprises the following steps:
step 1, modeling normal flow: establishing a normal model by using the collected flow data;
step 2, flow abnormity detection: measuring whether the sample is abnormal or not by using the Mahalanobis distance between the sample and the normal model;
step 3, abnormal OD positioning: the location of the anomaly occurrence is located by analysis of the contribution to the flow of the anomalous sample OD.
The normal flow modeling in the step 1 comprises the following steps: and (3) constructing a normal model under the condition of abnormal noise data and constructing a normal model under the condition of missing flow data.
The method for constructing the normal model under the abnormal noise data comprises the following steps: introducing multivariate t distribution into an implicit variable probability model, and solving the problem of normal model construction under abnormal noise data;
A. replacing normal distribution with t distribution, and establishing a hidden variable probability model:
let the hidden vector t of each d dimension i All come from a D-dimensional feature vector x i The linear probability projection of (1), wherein D is more than or equal to D, so as to establish a hidden variable probability model; in order to solve the problem that a Gaussian noise model is too sensitive to an abnormal observation value, t distribution is selected to replace normal distribution, unit variance t distribution is selected to serve as prior distribution of hidden vectors, and a probability model is as follows:
p(t i )=S(t i |0,I d ,v) (5)
p(x i |t i )=S(x i |Wt i +μ,τI D ,v) (6)
wherein mu is a position vector, I is a unit matrix, v is the degree of freedom of t distribution, W is a projection matrix, and tau is a model parameter to be solved;
solving the probability model by using maximum likelihood estimation directly, expanding a t distribution model into an infinite Gaussian mixture model with the same mean value, wherein the prior distribution of the t distribution model is gamma distribution, and the parameters are only related to the degree of freedom v of the t distribution:
wherein Ga (u | alpha, beta) = beta α u α-1 e -βu The/gamma (alpha) is a probability density function of gamma distribution, and the lambda is a scale matrix; according to the formula (7), introducing an implicit variable u, u being a scalar, u: ga (v/2 ), if t: t (mu, lambda, v), then t | u: N (mu, u, v) -1 Λ); the following hidden variable models were obtained:
in order to obtain the model parameters { μ, τ, W, v }, it is also necessary to obtain u | x and t | x, u, which can be obtained from equations (9) and (10):
p(x i |u i )=∫p(x i |t i ,u i )p(t i |u i )dt=N(μ,WW T +τI D /u i ) (11)
let Ψ = WW T +τI D For a matrix of D × D, the edge distribution of x follows a t distribution, which can be expressed as: p (x) i )=S(μ,Ψ,v),
From equation (8) and equation (11), the following can be obtained:
wherein, delta 2 =(x i -μ) T Ψ -1 (x i μ) from the equations (9) and (10):
p(t i |x i ,u i )∝p(x i |t i ,u i )p(t i |u i )=N(M -1 W T (x i -μ),τΜ -1 /u i ) (13)
Wherein M = W T W+τI d For a d x d matrix, t | x obeys a t distribution, i.e., p (t) i |x i )=S(M -1 W T (x i -μ),τΜ -1 ,v);
B. Carrying out model parameter solution:
the model parameters to be solved are omega = { mu, tau, W, v, d }, and firstly, the number d of reserved pivot elements is determined to be 5 by using a PCA-based lithograph; for { mu, tau, W, v }, the maximum likelihood estimation is adopted to carry out parameter estimation of the model, and the log-likelihood function is as follows:
wherein N is the number of samples;
in order to simplify the calculation, a REM (Rapid approximation-maximization) algorithm is adopted to obtain the maximum likelihood estimation of the model parameters, the REM can obviously improve the convergence speed of the algorithm, the REM algorithm comprises two stages, each stage adopts an EM algorithm to estimate different parameters, and then circulation of the two stages is carried out iteratively until the convergence condition is met;
the first stage is as follows:
this phase does not take t into account i Estimating only the parameter mu;
the log-likelihood function is:
e, step E:
according to the Jensen inequality, when p (u) i )=p(u i |x i ) Then, the equal sign is established, and L is established 1 Lower bound of (1), byObtaining L from the formula (11) 1 The expectation, see equation (16),<·&gt represents the calculation expectation:
wherein, the first and the second end of the pipe are connected with each other,
and M:
will be provided with<L 1 &And calculating the partial derivative of mu and making the partial derivative equal to 0 to obtain a mu estimated value:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
The log-likelihood function is:
e, step E:
when p (t) i ,u i )=p(t i ,u i |x i ) Then, L can be obtained from the formula (8), the formula (9) and the formula (10) 2 The expectation of (2):
wherein the content of the first and second substances,<u i &gt, see formula (17),
<t i >=M -1 W T (x i -μ) (21)
<u i t i >=<u i ><t i > (22)
and M:
maximization<L 2 &And obtaining an updating formula of the model parameters { W, tau, v }:
parameter(s)The estimate of (d) is obtained by a linear search of the following formula:
the normal model construction method under the condition of the flow data loss comprises the following steps:
sample data x for D dimension i If data in certain dimensions of the model are missing, the data can still be used for estimating model parameters; sample data x i Divided into observed and missing values, i.e.
Then sample data x i The mean and covariance matrices of (a) may be expressed as:
due to the fact thatThen
Wherein the content of the first and second substances,
A. replacing normal distribution with t distribution to establish a hidden variable probability model;
B. carrying out model parameter solving: determining the number d of the reserved pivot elements as 5, and performing model parameter estimation under missing data by using a REM algorithm for { mu, tau, W, v }:
the first stage is as follows:
e, step E:
wherein the content of the first and second substances,
<u i (x i -μ)(x i -μ) T >=Q i +<u i >(z i -μ)(z i -μ) T (31)
is sample data x i When some of the dimensions are missing, the number of observable dimensions remain;
and M:
will be provided with<L′ 1 &And g, calculating a partial derivative of mu and enabling the partial derivative to be equal to 0 to obtain a mu estimated value:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
E, step E:
wherein the content of the first and second substances,
m, step:
maximization<L′ 2 &And obtaining an updating formula of the model parameters { W, tau, v }:
the update of the parameter v is found by:
the specific method for detecting abnormal flow in step 2 is as follows:
selecting a metric for complex flow data to judge abnormal flow in the complex flow data; two strategies are commonly used for judging which abnormal points are in data, namely judging Hotelling's T 2 Whether the Square Prediction Error (SPE) exceeds the threshold value or not is judged to determine whether the sample point is an orthogonal abnormal point (orthogonal abnormal point); because the invention establishes a probability model, the Mahalanobis distance of the sample can be simply adopted to carry out flow anomaly detection, and the squared Mahalanobis distance of the complete data sample is delta 2
δ 2 =(x i -μ) T Ψ -1 (x i -μ) (38)
For samples with some dimension missing, their mahalanobis distance is squared to find their expectationInstead of:
the abnormality is interpreted using a normally distributed "3 σ" control chart, namely: when the time sequence is abnormal, the corresponding distribution exceeds the control limit; let us denote the mahalanobis distance squared time series as δ 2 (t) mean value of μ M Variance ofThe setting of the "3 σ" control map is as follows:
center line: CL = μ M
An upper control limit: UCL = μ M +3σ M
Lower control limits: LCL = μ M -3σ M
According to the normal distribution rule, the method comprises the following steps:
where Φ (-) is the probability distribution function of a standard normal distribution,
the discriminant formula of the 3 sigma control chart is as follows:
and (3 sigma) control chart is adopted to judge the abnormity, namely when the deviation of the value from the mean value exceeds 3 times of standard deviation, the occurrence of an abnormal event is judged, and the confidence coefficient is 99.74%.
The specific method for positioning the abnormal OD in the step 3 is as follows:
determining an abnormal sample, and further determining which dimensions of the sample (the sample dimension corresponds to the OD) the abnormal values in should be responsible for the abnormal sample, namely abnormal OD positioning; let the abnormal sample be x a In total, D dimensions, x j (j =1, \8230;, D) is x a The j-dimension variable is subjected to abnormal OD positioning by adopting the following contribution analysis method, and the method can be divided into two stages:
stage one: repeating the following steps a1 to c1 each time j =1, 2, 3, and D in sequence to obtain D execution results;
step a1, abnormal samples x a X in (2) j Assuming a missing, an observation and a missing value are obtained:and
abnormal sample x a The mean and covariance matrices of (a) are expressed as:
due to the fact thatThen
Step b1, calculatingThe conditional probability distribution yields the corresponding z a And Q a :
Computing an anomalous sample x a Squared mahalanobis distance of
Step c1, calculating x j Degree of contribution of (2)I.e. x a Andthe squared difference of the mahalanobis distance of (a), which represents the degree of reduction of the anomaly measure of the abnormal sample after removing a certain dimension;
of D execution results, ifLess than a threshold for anomaly discrimination, i.e. degree of contributionIf the value is greater than the set abnormal judgment value, the abnormal sample x is a Of (j) th dimension pair x a Is responsible for the abnormality of (c);
and a second stage:
if not found in stage oneIf the value is less than the threshold value of the abnormal judgment, the abnormal sample x is indicated a If the anomaly of a single dimension is not sufficient to cause an anomaly in the sample, then the following operations are performed:
step a2, according to the contribution degree of each dimensionSequencing the data of each dimension from large to small;
step b2, removing the data of the two dimensions arranged at the top according to the sequence arranged in the step a2, and if the squared Mahalanobis distance calculated at the moment is smaller than the threshold value for judging the abnormality, the two corresponding dimensions are used for matching the sample x together a Is responsible for the abnormality of (c); otherwise, continuously removing the data of the dimension arranged at the top, calculating the squared Mahalanobis distance once the data of the dimension arranged at the top is removed until the calculated squared Mahalanobis distance is smaller than the threshold value for judging the abnormality, and at this time, the data of all the removed dimensions are used for the sample x together a Is responsible for the abnormality of (a).
The invention has the beneficial effects that:
the multivariate t distribution is introduced into a hidden variable probability model instead of normal distribution, a normal model of a flow matrix is further established, abnormal flow detection is carried out by comparing the Mahalanobis distance (Mahalanobis distance) between a sample and the normal model, and abnormal OD positioning is further realized by a contribution analysis method. The method solves the problem of abnormal detection under the condition of incomplete data to be detected by establishing a hidden variable probability model, solves the problem of interference of abnormal noise on a normal model by introducing multivariate t distribution into a probability modeling process, and improves the detection precision; the invention has the advantages of better robustness, wide application scene, capability of processing complete data and data loss, stronger resistance to abnormal noise interference, lower sensitivity to model parameters, stable performance and reduction of complex parameter debugging work in actual deployment.
(IV) description of the drawings
FIG. 1 is a schematic view of a traffic matrix;
FIG. 2 is a diagram of the relationship steps of the RMPCM method;
FIG. 3 is a diagram of normal modeling steps under abnormal noise;
FIG. 4 is a graph model of RMPCM (shaded nodes are observed vectors, arrows indicate conditional correlations of random vectors);
FIG. 5 is a diagram of normal modeling steps in the absence of data;
FIG. 6 is a diagram illustrating a positioning procedure of abnormal ODs;
FIG. 7 is a diagram showing the results of detection under the condition of no missing of measured data;
FIG. 8 is a diagram showing the detection results of measured data under 4 deletion mechanisms;
FIG. 9 is a graph of sensitivity analysis of PCA (left side of the graph) and RMPCM (right side of the graph) to principal component numbers;
FIG. 10 is a graph of sensitivity analysis of PCA (left side of the graph) and RMPCM (right side of the graph) to flow measurements.
(V) detailed description of the preferred embodiments
The method for detecting and positioning the full-network anomaly based on a Robust Multivariate Probabilistic Calibration Model (RMPCM) comprises the following steps:
step 1, modeling normal flow: establishing a normal model by using the collected flow data;
step 2, flow abnormity detection: measuring whether the sample is abnormal or not by using the Mahalanobis distance between the sample and the normal model;
step 3, abnormal OD positioning: the location of the anomaly occurrence is located by analyzing the contribution to the flow of the anomalous sample OD.
The relationship between the abnormal event in the network and the network abnormal detection and location is (as shown in fig. 2): abnormal events can affect the statistics of partial flows in the network, the change of the statistics corresponding to the network flow total triggers an abnormal detector to alarm, and a network manager extracts the abnormal flows in the network through abnormal positioning after receiving the alarm, and then analyzes and determines the abnormal events;
the application of the network anomaly detection method in the actual internet environment often encounters some practical problems, such as the loss of traffic data in the transmission and collection processes, and the model offset caused by the anomaly traffic interference in the network.
Under the condition that flow data are incomplete, the traditional network anomaly detection method cannot be applied, a Bayes statistical calculation method is adopted in the invention, but the complexity of network flow data cannot be directly applied to obtain posterior mean value estimation and progressive variance thereof, an implicit variable probability model is introduced, namely, some potential data are added on the basis of known measurement data, so that the calculation is simplified to complete parameter estimation, a data missing part and unknown parameters can be taken as the potential data in the process, and an EM (expectation-maximization) algorithm is adopted to obtain Maximum Likelihood Estimation (MLE) of model parameters.
The probability distribution of known data is needed during maximum likelihood estimation, and the probability distribution is generally assumed to meet normal distribution, but because the actual network flow contains some abnormal noise flow, the deviation of parameter estimation is too large by adopting the normal distribution assumption, and therefore multivariate t distribution is introduced to replace multivariate normal distribution. Compared with normal distribution, t distribution has a heavy tail characteristic, abnormal samples have lower weight in the maximum likelihood estimation process after the t distribution is introduced, the influence on parameter estimation can be reduced, and the specific explanation is as follows:
the sample data is x i (i =1, \ 8230;, N) assuming that it satisfies an independent identically distributed normal random vector, denoted as
Wherein N is the number of samples, D is the dimension of the samples, mu is the mean vector function, the parameter is theta, sigma is the covariance matrix of DxD, and the parameter isWe adopt the t distribution model to replace the above normal distribution model, denoted as
Similarly, D is the dimension of the sample, μ is the location parameter (location parameter) vector, Λ is the scale matrix (scale matrix), and v is t The degree of freedom of distribution (degrees of freedom), with probability density function (p.d.f.) is:
wherein, delta 2 =(x-μ) T Λ -1 (x- μ) is the square of the mahalanobis distance of x to μ, Γ (·) is the gamma function; when v < ∞andmaximum likelihood estimation is used for parameter estimation, the robustness of t-distribution model to outlier interference is better than that of normal distribution model, mainly embodied in that outliers with larger mahalanobis distance contribute relatively less to model parameter estimation, and the parameter theta estimation is used belowExplanation is given by way of example:
for the normal distribution model (1), its maximum likelihood estimate satisfies the equation:where l is the log-likelihood function, A i Is mu i A matrix of partial derivatives to θ; under the t distribution model (2), the maximum likelihood estimation satisfies the equation:wherein, the first and the second end of the pipe are connected with each other,
for the weight of sample i, see asIncrease w of i Decreasing; abnormal samples have a larger delta i Smaller w i Therefore, the sensitivity to abnormal samples can be reduced by using t distribution in model parameter estimation, and the robustness of the t distribution model is better compared with that of a normal distribution model; v is an adjusting parameter of the t distribution model for robustness of outliers, and increases along with the reduction of v; when v → ∞, the normal distribution model is equivalent to the maximum likelihood estimation of the t distribution model.
The normal flow modeling in the step 1 comprises the following steps: and (3) constructing a normal model under the condition of abnormal noise data and constructing a normal model under the condition of missing flow data.
The method for constructing the normal model under the abnormal noise data comprises the following steps: introducing multivariate t distribution into an implicit variable probability model, and solving the problem of normal model construction under abnormal noise data (as shown in figure 3);
A. and (3) replacing normal distribution with t distribution to establish a hidden variable probability model:
let the hidden vector t of each d dimension i Are all from a D-dimensional feature vector x i Linear probability projection of (1), wherein D ≧ D, is establishedEstablishing a hidden variable probability model; in order to solve the problem that a Gaussian noise model is too sensitive to an abnormal observation value, t distribution is selected to replace normal distribution, unit variance t distribution is selected to serve as prior distribution of hidden vectors, and a probability model is as follows:
p(t i )=S(t i |0,I d ,v) (5)
p(x i |t i )=S(x i |Wt i +μ,τI D ,v) (6)
wherein mu is a position vector, I is a unit matrix, v is the degree of freedom of t distribution, W is a projection matrix, and tau is a model parameter to be solved;
solving the probability model by using maximum likelihood estimation directly, expanding a t distribution model into an infinite Gaussian mixture model with the same mean value, wherein the prior distribution of the t distribution model is gamma distribution, and parameters are only related to the degree of freedom v of the t distribution:
wherein Ga (u | α, β) = β α u α-1 e -βu The/gamma (alpha) is a probability density function of gamma distribution, and the lambda is a scale matrix; according to the formula (7), introducing an implicit variable u, u being a scalar, u: ga (v/2 ), if t: t (mu, lambda, v), then t | u: N (mu, u, v) -1 Λ); the following hidden variable model (as shown in fig. 4) was obtained:
in order to obtain the model parameters { μ, τ, W, v }, it is also necessary to obtain u | x and t | x, u, which can be obtained from equations (9) and (10):
p(x i |u i )=∫p(x i |t i ,u i )p(t i |u i )dt=N(μ,WW T +τI D /u i ) (11)
let Ψ = WW T +τI D For a matrix of D × D, the edge distribution of x follows a t distribution, which can be expressed as: p (x) i )=S(μ,Ψ,v),
From equation (8) and equation (11), the following can be obtained:
wherein, delta 2 =(x i -μ) T Ψ -1 (x i μ), which can be determined from equation (9) and equation (10):
p(t i |x i ,u i )∝p(x i |t i ,u i )p(t i |u i )=N(M -1 W T (x i -μ),τΜ -1 /u i ) (13)
wherein M = W T W+τI d For a d x d matrix, t | x obeys a t distribution, i.e., p (t) i |x i )=S(M -1 W T (x i -μ),τΜ -1 ,v);
B. Carrying out model parameter solving:
the model parameters to be solved are omega = { mu, tau, W, v, d }, and firstly, the number d of reserved pivot elements is determined to be 5 by using a PCA-based lithograph; for { mu, tau, W, v }, the maximum likelihood estimation is adopted to carry out parameter estimation of the model, and the log likelihood function is as follows:
wherein N is the number of samples;
in order to simplify calculation, a REM (Rapid estimation-maximization) algorithm is adopted to obtain the maximum likelihood estimation of the model parameters, the REM can obviously improve the convergence speed of the algorithm, the REM algorithm comprises two stages, each stage adopts an EM algorithm to estimate different parameters, and then circulation of the two stages is carried out iteratively until the convergence condition is met;
the first stage is as follows:
this phase does not take t into account i Estimating only the parameter mu;
the log-likelihood function is:
e, step E:
according to the Jensen inequality, when p (u) i )=p(u i |x i ) Then, the equal sign is established, and L is established 1 The lower bound of (2), L is obtained from the formula (11) 1 The expectation, see equation (16),<·&gt, representing the computational expectation:
wherein, the first and the second end of the pipe are connected with each other,
m, step:
will be provided with<L 1 &And g, calculating a partial derivative of mu and enabling the partial derivative to be equal to 0 to obtain a mu estimated value:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
The log-likelihood function is:
e, step E:
when p (t) i ,u i )=p(t i ,u i |x i ) Then, L can be obtained from the formula (8), the formula (9) and the formula (10) 2 The expectation of (2):
wherein, the first and the second end of the pipe are connected with each other,<u i &gt, see formula (17),
<t i >=M -1 W T (x i -μ) (21)
<u i t i >=<u i ><t i > (22)
m, step:
maximization<L 2 &And gt, obtaining an updating formula of model parameters { W, tau, v }:
parameter(s)The estimate of (d) is obtained by a linear search of the following formula:
the method for constructing the normal model under the condition of missing flow data comprises the following steps (as shown in figure 5):
sample data x for D dimension i If data on certain dimensions of the model are missing, the data can still be used for estimating model parameters; sample data x i Divided into observed and missing values, i.e.
Then sample data x i The mean and covariance matrices of (a) can be expressed as:
due to the fact thatThen the
Wherein, the first and the second end of the pipe are connected with each other,
A. replacing normal distribution with t distribution to establish a hidden variable probability model;
B. carrying out model parameter solution: determining the number d of the reserved pivot elements as 5, and performing model parameter estimation under the condition of missing data by using a REM algorithm for { mu, tau, W, v }:
the first stage is as follows:
e, step E:
wherein, the first and the second end of the pipe are connected with each other,
<u i (x i -μ)(x i -μ) T >=Q i +<u i >(z i -μ)(z i -μ) T (31)
is sample data x i When some dimensions are missing, the number of observable dimensions remain;
m, step:
will be < L' 1 The partial derivative of μ is taken and made equal to 0, and the μ estimate is obtained:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
E, step E:
wherein the content of the first and second substances,
m, step:
l 'of maximization' 2 >. Obtaining an update formula of model parameters { W, τ, v }:
the update of the parameter v is obtained by:
the specific method for detecting the abnormal flow in the step 2 is as follows:
for complex flow data, selecting a metric to determine abnormal flow therein; two strategies are commonly used for judging which abnormal points are in data, namely judging Hotelling's T 2 Whether the Square Prediction Error (SPE) exceeds the threshold value or not is judged to determine whether the sample point is an orthogonal abnormal point (orthogonal abnormal point) or not; because the invention establishes the probability model, the probability model can be simply adoptedDetecting abnormal flow by the Mahalanobis distance of the sample, wherein the squared Mahalanobis distance of the complete data sample is delta 2
δ 2 =(x i -μ) T Ψ -1 (x i -μ) (38)
For samples with some dimension missing, their Mahalanobis distance squared to find their expectationInstead of:
the abnormality is interpreted using a normally distributed "3 σ" control chart, namely: when the time sequence is abnormal, the corresponding distribution exceeds the control limit; let us denote the mahalanobis distance squared time series as δ 2 (t) mean value of μ M Variance isThe setting of the "3 σ" control map is as follows:
center line: CL = μ M
An upper control limit: UCL = μ M +3σ M
Lower control limits: LCL = μ M -3σ M
According to the normal distribution rule, the method comprises the following steps:
where Φ (-) is the probability distribution function of a standard normal distribution,
the discriminant formula of the "3 σ" control map is:
and (3 sigma) control chart is adopted to judge the abnormity, namely when the deviation of the value from the mean value exceeds 3 times of standard deviation, the occurrence of an abnormal event is judged, and the confidence coefficient is 99.74%.
The specific method for positioning the abnormal OD in the step 3 is as follows:
determining an abnormal sample, and further determining which dimensions of the sample (the sample dimension corresponds to the OD) the abnormal values in should be responsible for the abnormal sample, namely abnormal OD positioning; let the abnormal sample be x a In total, D dimensions, x j (j =1, \ 8230;, D) is x a The j-dimension variable is used for abnormal OD positioning by adopting the following contribution analysis method (the step is shown in FIG. 6), and the method can be divided into two stages:
stage one: repeating the following steps a1 to c1 each time j =1, 2, 3, and D in sequence to obtain D execution results;
step a1, abnormal samples x a X in (2) j Assuming a missing, an observation and a missing value are obtained:and
abnormal sample x a Is expressed as:
due to the fact thatThen the
Step b1, calculatingConditional probability distributionTo give the corresponding z a And Q a :
Computing an anomalous sample x a Squared mahalanobis distance of
Step c1, calculating x j Degree of contribution ofI.e. x a And withThe squared difference of the mahalanobis distance of (a), which represents the degree of reduction of the anomaly measure of the anomalous sample after removing a certain dimension;
of D execution results, ifLess than a threshold for anomaly discrimination, i.e. degree of contributionIf the value is greater than the set abnormal judgment value, the abnormal sample x is a Of (j) th dimension pair x a Is responsible for the abnormality of;
and a second stage:
if not found in stage oneIf the value is less than the threshold value of the abnormal judgment, the abnormal sample x is indicated a If the anomaly of a single dimension is not enough to cause the sample anomaly, then the following operations are performed:
step a2, contribution degree according to each dimensionSequencing the data of each dimension from large to small;
step b2, removing the data of the two dimensions arranged at the top according to the sequence arranged in the step a2, and if the squared Mahalanobis distance calculated at the moment is smaller than the threshold value for judging the abnormality, the two corresponding dimensions are used for matching the sample x together a Is responsible for the abnormality of; otherwise, continuously removing the data of the dimension arranged at the top, calculating the squared Mahalanobis distance once the data of the dimension arranged at the top is removed until the calculated squared Mahalanobis distance is smaller than the threshold value of the abnormal judgment, and at the moment, the data of all the removed dimensions are used for the sample x together a Is responsible for the abnormality of (a).
And (3) analyzing algorithm complexity:
in the RMPCM algorithm, the main computational overhead is the inversion of the metric matrix Ψ of the traffic matrix and the number of iterations of the EM algorithm. Ψ is a matrix of D × D, D is the dimension of the sample X, corresponding to the number of columns of the flow matrix, i.e., the number of ODs (n × n), and Ψ is directly calculated in the calculation process -1 The complexity of the algorithm is greatly influenced, and psi can be obtained by utilizing the Woodbury matrix identity equation -1 =(WW T +τI D ) -1 =τ -1 I D-1 WM -1 W T Wherein M = W T W+τI d For the matrix of dXd, the PCA dimension reduction method is used to determine the reserved principal component number D, and D < D is known, so that the inversion of the matrix psi for solving the matrix of the dXD is converted into the inversion of the matrix M for solving the matrix of the dXd, and the computational complexity is greatly simplified. The time complexity is O (Td) 2 ). The time complexity of the algorithm is also related to the iteration times of the EM algorithm, and the iteration times in the calculation by adopting the rapid EM algorithm are generally less than 15. A Matlab is adopted to configure a computer with a core i 7.5 GHz CPU and a 4GB memory in a win7 system, an RMPCM detection algorithm is executed on a data set in a subsequent experiment, the running time of complete data is not more than 2.53s, and the running time of missing data is less than 5s, which is shown in Table 1. Will Matla if actually deployedb the code conversion to binary executable program will be further accelerated in computation speed.
TABLE 1RMPCM Algorithm execution time
Analyzing network measured data:
data set:
the data set of backbone network Abilene commonly used in network flow research is selected from the actually measured data set, and main users of the Abilene network are universities and scientific research institutions in the United states. Because the data in 2003 is complete and more methods are adopted for reference, the invention selects NetFlow data of 11 PoP nodes from 15 days 12 and 12 days 21 and 12 months in 2003, obtains the entry point and the exit point of each flow according to BGP and ISIS routing tables, and obtains the OD flow size and the flow matrix, which are shown in table 2. The data set is used for carrying out detection performance evaluation and sensitivity analysis under the condition of lacking data.
Table 2 Abilene traffic matrix dataset
And (3) detecting abnormality under data missing:
the detection results under the condition of no missing of the measured data are shown in fig. 7. When an abnormality detection experiment under the condition of actual measurement data set deletion is carried out, a data set with the number of groups (P) is selected to be sequentially tested according to the 4 deletion mechanisms provided above. In order to compare the similarities and differences of the RMPCM method under various deletion mechanisms and analyze the factors influencing the detection performance, the same loss rate is set under 3 mechanisms of complete random deletion, time period random deletion and OD random deletion to compare the influences of different mechanisms on the detection performance, and the influences of the same loss rate and different block sizes on the detection performance are compared under the 4 th mechanism of block random deletion. In order to eliminate the contingency of the experiment of losing data, 10 times of experiments are carried out, the average value of the experiments is taken, and an ROC curve is drawn by taking the detection result of RMPCM under the complete data as a reference.
In the completely random deletion experiment, 3 kinds of loss rates are selected, the loss data accounts for 10%,20%, and 50% of the total data volume of the traffic matrix, and only partial results are displayed in space, for example, fig. 8 (a) shows detection curves and ROC curves with the loss rates of 10%,50% in sequence; in the time period random deletion experiment, the number of cycles of random deletion is set to be 200, 400 and 1000 respectively, and since the total time period is 2010, the data amount discarded by the time period random deletion and the complete random deletion is similar, and partial results are shown in (b) of fig. 8; in the OD random missing experiment, since the whole column of the algorithm restriction matrix cannot be all empty, half of the adjacent data is selected to be empty, the total number of missing ODs is set to be 24, 48, 121, and 121, respectively, so that the discarded data amount of the OD random missing experiment is similar to the former two, and partial results are shown in (c) of fig. 8. Experiments of the three mechanisms show that the main factor influencing the detection rate is the loss of abnormal data, and meanwhile, the fact that the false alarm rate under the same detection rate is gradually increased along with the increase of the data loss rate and the performance shown by an ROC curve is gradually deteriorated is also found; compared with the three mechanisms, the detection performance is minimally affected by complete random deletion, the detection performance is randomly deleted in one time period, and the OD random deletion is maximal, the analysis reason is related to the size of the data quantity of each structured deletion, so that a 4 th block random deletion experiment is carried out, different block sizes are set in the experiment, but the same loss quantity is kept, three block sizes are respectively set, namely 5, 16, 40, the number of the deleted blocks is respectively 2000, 200, 30, and the loss quantity is kept accounting for about 20% of the total data quantity, and as shown in (d) in fig. 8, the ROC curve shows that the detection performance is greatly affected by the data quantity of the structured deletion, and the detection performance is obviously reduced when the block size is increased to 40.
And (3) sensitivity analysis:
the sensitivity analysis of the invention is mainly aimed at two situations: the sensitivity analysis of the number d of the reserved pivot elements is performed, and the sensitivity analysis of the flow measurement is performed. The PCA method was selected for comparison.
Selecting a flow number (F) data set to carry out a principal component number sensitivity analysis experiment, wherein the experimental result is shown in fig. 9, (a) and (c) respectively reserve detection curves with principal component numbers of 4 and 5 for PCA, visible curve profiles are completely different, and the detection results are very different; in fig. 9, (b) and (d) are detection curves corresponding to the RMPCM, respectively, and both the curve profiles and the detection results are consistent. The detection condition of the RMPCM method pivot number of 2 to 10 is verified in the experiment, the detection results are basically consistent, and high robustness is kept.
Data sets B, P and F are respectively selected to carry out sensitivity analysis of flow measurement, experiments set the threshold value of the cumulative variance contribution rate of the pivot element to be 0.85, the results are shown in figure 10, the curve profiles obtained by the PCA method under 3 measurement types are completely different, and the detection results are also greatly different; the detection curve profile of the RMPCM method is approximate, detection results are different but have larger relation, the number of the detected abnormity is the largest by taking the number of packets as the measurement, the result obtained by taking the number of streams as the measurement is contained in the abnormity detected by taking the number of packets as the measurement, the percentage is 59.6%, and the same result obtained by taking the number of bytes as the measurement is also contained in the abnormity obtained by taking the number of packets as the measurement, and the percentage is 43.5%. Due to the fact that no abnormal mark of the measured data set exists, the actual situation of the abnormal situation cannot be known, but the statistics obtained by taking the number of streams, the number of bytes and the number of packets as the measurement values are related, corresponding abnormal detection results are also overlapped, the results obtained by the PCA method conflict with one another at multiple places, the flow measurement degree is sensitive, and compared with the RMPCM method, the RMPCM method is high in robustness, and the obtained results are more reasonable.
The simulation experiment, the simulation platform experiment and the analysis of the internet measured data show that: the detection performance of the RMPCM method is superior to that of the PCA method and the ANTIDOTE method, the robustness is good, whether the data to be detected is complete or not and whether the detection environment has strong interference or not, the method shows relatively stable detection performance and has lower sensitivity to model parameters.

Claims (3)

1. A full-network anomaly detection and positioning method based on a robust multivariate probability calibration model is characterized by comprising the following steps: comprises the following steps:
step 1, modeling normal flow: establishing a normal model by using the acquired flow data;
step 2, flow abnormity detection: measuring whether the sample is abnormal or not by using the Mahalanobis distance between the sample and the normal model;
step 3, abnormal OD positioning: analyzing and positioning the position of the abnormal occurrence through the contribution to the abnormal sample OD flow;
the normal flow modeling in the step 1 comprises the following steps: constructing a normal model containing abnormal noise data and constructing a normal model under the condition that flow data is missing;
the method for constructing the normal model under the abnormal noise data comprises the following steps:
A. replacing normal distribution with t distribution, and establishing a hidden variable probability model:
is provided with each d Implicit vector t of dimension i All come from a D-dimensional feature vector x i The linear probability projection of (1), wherein D is more than or equal to D, t distribution is selected to replace normal distribution, unit variance t distribution is selected as prior distribution of hidden vectors, and a probability model is as follows:
p(t i )=S(t i |0,I d ,v) (5)
p(x i |t i )=S(x i |Wt i +μ,τI D ,v) (6)
wherein mu is a position vector, I is a unit matrix, v is the degree of freedom of t distribution, W is a projection matrix, and tau is a model parameter to be solved;
expanding a t distribution model into an infinite Gaussian mixture model with the same mean value, wherein the prior distribution of the t distribution model is gamma distribution, and the parameters are only related to the degree of freedom v of the t distribution:
wherein Ga (u | alpha, beta) = beta α u α-1 e -βu The/gamma (alpha) is a probability density function of gamma distribution, and the lambda is a scale matrix; according to the formula (7), introducing an implicit variable u, u being a scalar, u: ga (v/2 ), if t: t (mu, lambda, v), then t | u: N (mu, u, v) -1 Λ); the following was obtainedHidden variable model:
in order to obtain the model parameters { μ, τ, W, v }, it is also necessary to obtain u | x and t | x, u, which can be obtained from equations (9) and (10):
p(x i |u i )=∫p(x i |t i ,u i )p(t i |u i )dt=N(μ , WW T +τI D /u i ) (11)
let Ψ = WW T +τI D For a matrix of D × D, the edge distribution of x follows a t distribution, which can be expressed as:
p(x i )=S(μ,Ψ,v),
from equation (8) and equation (11), the following can be obtained:
wherein, delta 2 =(x i -μ) T Ψ -1 (x i μ), which can be determined from equation (9) and equation (10):
p(t i |x i ,u i )∝p(x i |t i ,u i )p(t i |u i )=N(M -1 W T (x i -μ),τΜ -1 /u i ) (13)
wherein M = W T W+τI d For a d x d matrix, t | x obeys a t distribution, i.e., p (t) i |x i )=S(M -1 W T (x i -μ),τΜ -1 ,v);
B. Carrying out model parameter solving:
the model parameters to be solved are Ω = { μ, τ, W, v, d }, and first, the number d of the reserved pivot elements is determined to be 5; for { mu, tau, W, v }, the maximum likelihood estimation is adopted to carry out parameter estimation of the model, and the log-likelihood function is as follows:
wherein N is the number of samples;
calculating the maximum likelihood estimation of the model parameters by adopting an REM algorithm, wherein the REM algorithm comprises two stages, each stage adopts an EM algorithm to estimate different parameters, and then, the two stages are circulated iteratively until a convergence condition is met;
the first stage is as follows:
this phase does not take t into account i Estimating only the parameter mu;
the log-likelihood function is:
e, step E:
according to the Jensen inequality, when p (u) i )=p(u i |x i ) When the equal sign is established, L is established 1 The lower bound of (2), L is obtained from the formula (11) 1 The expectation, see equation (16),<·&gt represents the calculation expectation:
wherein the content of the first and second substances,
and M:
will be provided with<L 1 &And g, calculating a partial derivative of mu and enabling the partial derivative to be equal to 0 to obtain a mu estimated value:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
The log-likelihood function is:
e, step E:
when p (t) i ,u i )=p(t i ,u i |x i ) Then, L can be obtained from the formula (8), the formula (9) and the formula (10) 2 The expectation of (2):
wherein, the first and the second end of the pipe are connected with each other,<u i &gt, see formula (17),
<t i >=M -1 W T (x i -μ) (21)
<u i t i >=<u i ><t i > (22)
and M:
maximization<L 2 &And obtaining an updating formula of the model parameters { W, tau, v }:
parameter(s)The estimate of (d) is obtained by a linear search of the following formula:
the normal model construction method under the condition of the flow data loss comprises the following steps:
sample data x i Divided into observed and missing values, i.e.
Then sample data x i The mean and covariance matrices of (a) may be expressed as:
due to the fact thatThen
Wherein the content of the first and second substances,
A. replacing normal distribution with t distribution to establish a hidden variable probability model;
B. carrying out model parameter solving: determining the number d of the reserved pivot elements as 5, and performing model parameter estimation under missing data by using a REM algorithm for { mu, tau, W, v }:
the first stage is as follows:
e, step E:
wherein, the first and the second end of the pipe are connected with each other,
<u i (x i -μ)(x i -μ) T >=Q i +<u i >(z i -μ)(z i -μ) T (31)
is sample data x i When some dimensions are missing, the number of observable dimensions remain;
and M:
will be provided with<L 1 ′&And calculating the partial derivative of mu and making the partial derivative equal to 0 to obtain a mu estimated value:
and a second stage:
adding an implicit variable t i Estimating the parameters { tau, W, v } and substituting the estimated value of mu obtained
E, step E:
wherein, the first and the second end of the pipe are connected with each other,
and M:
maximization<L 2 ′&And gt, obtaining an updating formula of model parameters { W, tau, v }:
the update of the parameter v is obtained by:
2. the method for detecting and positioning the full-network anomaly based on the robust multivariate probability calibration model as claimed in claim 1, wherein the method comprises the following steps: the specific method for detecting abnormal flow in step 2 is as follows:
by usingThe Mahalanobis distance of the sample is used for detecting the abnormal flow, and the squared Mahalanobis distance of the complete data sample is delta 2
δ 2 =(x i -μ) T Ψ -1 (x i -μ) (38)
For samples with some dimension missing, their Mahalanobis distance squared to find their expectationInstead of:
the abnormality is interpreted using a normally distributed "3 σ" control chart, namely: when the time sequence is abnormal, the corresponding distribution exceeds the control limit; let us denote the mahalanobis distance squared time series as δ 2 (t) mean value of μ M Variance isThe setting of the "3 σ" control map is as follows:
center line: CL = μ M
An upper control limit: UCL = μ M +3σ M
Lower control limits: LCL = μ M -3σ M
According to the normal distribution rule, the method comprises the following steps:
where Φ (-) is the probability distribution function of a standard normal distribution,
the discriminant formula of the "3 σ" control map is:
and (3 sigma) control chart is adopted to judge the abnormity, namely when the deviation of the value from the mean value exceeds 3 times of standard deviation, the occurrence of an abnormal event is judged, and the confidence coefficient is 99.74%.
3. The method for detecting and positioning the full-network anomaly based on the robust multivariate probability calibration model as claimed in claim 1, wherein the method comprises the following steps: the specific method for positioning the abnormal OD in the step 3 is as follows:
let the abnormal sample be x a In total, there are D dimensions, x j (j =1, \ 8230;, D) is x a The j-dimension variable is used for positioning abnormal OD by adopting the following contribution analysis method, and the method can be divided into two stages:
stage one: repeating the following steps a1 to c1 each time j =1, 2, 3, and D in sequence to obtain D execution results;
step a1, abnormal samples x a X in (2) j Assuming a missing, an observation and a missing value are obtained:andabnormal sample x a The mean and covariance matrices of (a) are expressed as:
due to the fact thatThen
Step b1, calculatingConditional probability distribution yielding corresponding z a And Q a :
Computing an anomalous sample x a Squared mahalanobis distance of
Step c1, calculating x j Degree of contribution ofI.e. x a Andthe squared difference of the mahalanobis distance of (a);
among the D execution results, ifLess than a threshold for anomaly discrimination, i.e. degree of contributionIf the value is greater than the set abnormal judgment value, the abnormal sample x is a J-th dimension pair x a Is responsible for the abnormality of;
and a second stage:
if not found in stage oneIf the value is less than the threshold value of the abnormal judgment, the abnormal sample x is indicated a If the anomaly of a single dimension is not enough to cause the sample anomaly, then the following operations are performed:
step a2, according to the contribution degree of each dimensionSequencing the data of each dimension from large to small;
step b2, removing the data of the two dimensions arranged at the top according to the sequence arranged in the step a2, and if the squared Mahalanobis distance calculated at the moment is smaller than the threshold value of the abnormal judgment, the two corresponding dimensions are used for matching the sample x together a Is responsible for the abnormality of (c); otherwise, continuously removing the data of the dimension arranged at the top, calculating the squared Mahalanobis distance once the data of the dimension arranged at the top is removed until the calculated squared Mahalanobis distance is smaller than the threshold value of the abnormal judgment, and at the moment, the data of all the removed dimensions are used for the sample x together a Is responsible for the abnormality of (a).
CN201510414996.5A 2015-07-15 2015-07-15 Whole network abnormality detection localization method based on healthy and strong multivariate probability calibrating patterns Active CN105119734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510414996.5A CN105119734B (en) 2015-07-15 2015-07-15 Whole network abnormality detection localization method based on healthy and strong multivariate probability calibrating patterns

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510414996.5A CN105119734B (en) 2015-07-15 2015-07-15 Whole network abnormality detection localization method based on healthy and strong multivariate probability calibrating patterns

Publications (2)

Publication Number Publication Date
CN105119734A CN105119734A (en) 2015-12-02
CN105119734B true CN105119734B (en) 2018-04-17

Family

ID=54667632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510414996.5A Active CN105119734B (en) 2015-07-15 2015-07-15 Whole network abnormality detection localization method based on healthy and strong multivariate probability calibrating patterns

Country Status (1)

Country Link
CN (1) CN105119734B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106101071B (en) * 2016-05-27 2019-04-05 杭州安恒信息技术股份有限公司 A kind of method of the defence link drain type CC attack of Behavior-based control triggering
CN106453392B (en) * 2016-11-14 2019-04-09 中国人民解放军防空兵学院 Whole network exception stream recognition method based on traffic characteristic distribution
CN106972968B (en) * 2017-03-29 2020-04-24 北京理工大学 Network abnormal flow detection method based on cross entropy and Mahalanobis distance
CN107426199B (en) * 2017-07-05 2020-10-30 浙江鹏信信息科技股份有限公司 Method and system for detecting and analyzing network abnormal behaviors
CN107688658A (en) * 2017-09-05 2018-02-13 北京奇艺世纪科技有限公司 The localization method and device of a kind of abnormal data
CN108733812B (en) * 2018-05-21 2021-09-14 华东师范大学 Identification method of abnormal data points in time series data based on global information
CN109164351A (en) * 2018-09-03 2019-01-08 北京许继电气有限公司 Internet of things equipment Analysis on monitoring data method and system based on time series
CN110012421A (en) * 2019-03-18 2019-07-12 青岛民航凯亚系统集成有限公司 A kind of airport Working vehicle position location abnormal point detecting method
CN109787846A (en) * 2019-03-27 2019-05-21 湖北大学 A kind of 5G network service quality exception monitoring and prediction technique and system
CN111027721B (en) * 2019-11-22 2023-07-18 中国铁道科学研究院集团有限公司通信信号研究所 System fault positioning method
CN111817875B (en) * 2020-06-03 2022-06-28 华为技术有限公司 Method and device for detecting network fault
CN112235134B (en) * 2020-09-29 2022-08-09 中国人民解放军战略支援部队信息工程大学 Router ownership detection method and system based on IP connection probability classification
CN113949550B (en) * 2021-10-08 2022-09-09 东北大学 DDoS attack detection method under SDN environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101388885A (en) * 2008-07-23 2009-03-18 成都市华为赛门铁克科技有限公司 Detection method and system for distributed denial of service
CN101534305A (en) * 2009-04-24 2009-09-16 中国科学院计算技术研究所 Method and system for detecting network flow exception
CN102075383A (en) * 2010-12-29 2011-05-25 深圳市永达电子股份有限公司 Neural network-based low amplitude network flow anomaly detection method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101388885A (en) * 2008-07-23 2009-03-18 成都市华为赛门铁克科技有限公司 Detection method and system for distributed denial of service
CN101534305A (en) * 2009-04-24 2009-09-16 中国科学院计算技术研究所 Method and system for detecting network flow exception
CN102075383A (en) * 2010-12-29 2011-05-25 深圳市永达电子股份有限公司 Neural network-based low amplitude network flow anomaly detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MOADA-SVR:一种基于支持向量回归的多元在线异常检测方法;钱叶魁 等;《通信学报》;20110225;第32卷(第2期);第4节 *

Also Published As

Publication number Publication date
CN105119734A (en) 2015-12-02

Similar Documents

Publication Publication Date Title
CN105119734B (en) Whole network abnormality detection localization method based on healthy and strong multivariate probability calibrating patterns
JP6716297B2 (en) System and method for predictive reliability mining
JP2019061565A (en) Abnormality diagnostic method and abnormality diagnostic device
CN107493277B (en) Large data platform online anomaly detection method based on maximum information coefficient
US20080270088A1 (en) Method and system for causal modeling and outlier detection
US10003985B1 (en) System and method for determining reliability of nodes in mobile wireless network
Chikhaoui et al. A new granger causal model for influence evolution in dynamic social networks: The case of dblp
CN110138614B (en) Tensor model-based online network flow anomaly detection method and system
CN114978568A (en) Data center management using machine learning
US20170124782A1 (en) Methods for detecting one or more aircraft anomalies and devices thereof
Chang et al. Addressing multicollinearity in semiconductor manufacturing
Hou et al. Detection and characterization of network anomalies in large-scale RTT time series
CN110460622A (en) A kind of network anomaly detection method based on Situation Awareness prediction technique
CN109101820B (en) Web application security vulnerability prediction method based on execution flow graph
Xu et al. An improved methodology for outlier detection in dynamic datasets
Zhang et al. Using the expected detection delay to assess the performance of different multivariate statistical process monitoring methods for multiplicative and drift faults
Chen et al. An efficient solution to locate sparsely congested links by network tomography
Ni et al. Information dynamic spectrum characterizes system instability toward critical transitions
CN117319047A (en) Network path analysis method and system based on network security anomaly detection
Li et al. Network-wide traffic anomaly detection and localization based on robust multivariate probabilistic calibration model
Singh et al. Unified framework for developing two dimensional software reliability growth models with change point
Younis et al. Towards the Impact of Security Vunnerabilities in Software Design: A Complex Network-Based Approach
CN109756379A (en) A kind of network performance abnormality detection and localization method based on the decomposition of matrix difference
Muthukumar et al. Denoising internet delay measurements using weak supervision
Nguyen et al. Uncertainty quantification for model parameters and hidden state variables in Bayesian dynamic linear models

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Qian Yekui

Inventor after: Ye Lixin

Inventor after: Zhu Shaowei

Inventor after: Li Yuli

Inventor after: Du Jiang

Inventor after: Huang Hao

Inventor after: Yang Ruipeng

Inventor after: Luo Chaofeng

Inventor before: Qian Yekui

Inventor before: Jiang Wenfeng

Inventor before: Huang Hao

Inventor before: Zou Fuchun

Inventor before: Li Yuli

Inventor before: Liu Fengrong

Inventor before: Ye Lixin

Inventor before: Zhao Xin

Inventor before: Zhang Zhaoguang

Inventor before: Li Bainan

Inventor before: Liu Jianqiang

Inventor before: Li Tao

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant