CN114745187A

CN114745187A - Internal network anomaly detection method and system based on POP flow matrix

Info

Publication number: CN114745187A
Application number: CN202210412983.4A
Authority: CN
Inventors: 刘翔宇; 朱诗兵; 李玉巍; 王宇; 熊达鹏
Original assignee: Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Current assignee: Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority date: 2022-04-19
Filing date: 2022-04-19
Publication date: 2022-07-12
Anticipated expiration: 2042-04-19
Also published as: CN114745187B

Abstract

The invention discloses an internal network anomaly detection method and system based on a POP flow matrix. The method comprises: acquiring flow data among nodes to construct the POP flow matrix; encoding each flow data in the POP flow matrix into a vector; extracting frequency domain features of each flow data based on the vectors; performing data dimensionality reduction on the frequency domain features to obtain a low-dimensional data matrix; constructing a training set based on the low-dimensional data matrix; training a generative adversarial network (GAN) on the training set to obtain an anomaly detection model; and performing anomaly detection based on the anomaly detection model. The method performs frequency analysis on the traffic exchanged among a plurality of nodes, so it captures not only the frequency domain information of a single node but also the cross-correlation information among all the nodes, and can detect anomalies in the traffic of all the nodes in the server.

Description

Internal network anomaly detection method and system based on POP flow matrix

Technical Field

The invention relates to the technical field of abnormal data detection methods, in particular to a method and a system for detecting an internal network abnormality based on a POP flow matrix.

Background

Traditional malicious traffic detection analyzes traffic characteristics against preset rules in order to identify malicious traffic. The rules aim to protect legitimate network users from network attacks, but a rule base has difficulty detecting zero-day attacks. Compared with rule-based methods, machine-learning-based methods can effectively identify zero-day malicious traffic and can supplement traditional fixed-rule methods (namely signature-based NIDS). Unfortunately, because of the processing overhead of machine learning algorithms, existing detection methods have low detection accuracy and cannot process high-rate traffic; most of them can only be deployed offline and cannot achieve real-time detection, especially in high-performance networks.

Meanwhile, an attacker can easily interfere with and evade the above methods by injecting noise (e.g., packets generated by benign applications) into the attack traffic. Traditional flow-level methods detect attacks by analyzing flow-level statistics, which can cause significant detection delays; in addition, evasion attacks can easily bypass traditional flow-level detection that relies on coarse-grained flow-level statistics, while packet-level detection struggles to detect covert attack patterns robustly.

In addition, current traffic anomaly detection mainly targets frequency anomalies between two nodes. Its drawback is that the frequencies of the traffic exchanged among a plurality of nodes cannot be analyzed simultaneously; that is, anomalies in the traffic of all the nodes in the server cannot be detected.

Disclosure of Invention

In view of the above problems, the invention provides an internal network anomaly detection method based on a POP flow matrix, designed around the characteristics of the high-performance network of an internal system: the traffic throughput is huge, so real-time packet-level anomaly detection is difficult; and, unlike Internet users, the node traffic within the system generally has strong periodicity and repeatability and the number of nodes is limited, which lays a foundation for anomaly detection using the traffic information among the nodes. The method extracts and analyzes the frequency domain features of the network traffic through the discrete Fourier transform, then visualizes these features between network nodes and detects anomalies in them, and uses a machine learning algorithm to achieve real-time, robust malicious traffic detection. That is, the method performs frequency analysis on the traffic exchanged among a plurality of nodes, captures not only the frequency domain information of a single node but also the cross-correlation information among all the nodes, and can detect anomalies in the traffic of all the nodes in the server.

The invention also provides an internal network anomaly detection system based on the POP traffic matrix.

The first technical scheme adopted by the invention is as follows: a method for detecting internal network anomalies based on a POP flow matrix, comprising the following steps:

S100: acquiring flow data among nodes to construct a POP flow matrix;

S200: encoding each flow data in the POP flow matrix into a vector;

S300: extracting frequency domain features of each flow data based on the vectors;

S400: performing data dimensionality reduction on the frequency domain features of the flow data to obtain a low-dimensional data matrix, and constructing a training set based on the low-dimensional data matrix;

S500: training a generative adversarial network (GAN) on the training set to obtain an anomaly detection model, the anomaly detection model comprising a loss function;

S600: performing anomaly detection based on the anomaly detection model.

Preferably, step S200 includes: encoding all the flow data detected within a detection step, and converting the flow data into a vector representation.

Preferably, step S300 further includes: normalizing the frequency domain features with the Min-Max algorithm.

Preferably, step S400 includes: inputting the frequency domain features of each flow data into a respective deep autoencoder to perform data dimensionality reduction.

Preferably, step S400 further includes: plotting the low-dimensional data matrix as a heat map.

Preferably, the loss function in step S500 is expressed by the following formula:

L(z_γ)＝(1-λ)L_R(z_γ)+λL_D(z_γ)

wherein L(z_γ) is the total loss; L_D(z_γ) is the discrimination loss; L_R(z_γ) is the reconstruction error loss; λ is a weighting coefficient.

Preferably, the step S600 includes:

obtaining an anomaly score of the POP flow matrix to be detected based on the loss function; and judging whether the internal network is abnormal based on an adaptive discrimination threshold and the anomaly score of the POP flow matrix to be detected: if the anomaly score of the POP flow matrix to be detected is larger than the adaptive discrimination threshold, the internal network is judged to be abnormal.

Preferably, the anomaly score is represented by the following formula:

A(x)＝(1-λ)R(x)+λD(x)

wherein A(x) is the anomaly score; R(x) is the reconstruction error score; D(x) is the discrimination score; λ is a weighting coefficient.

Preferably, the adaptive discrimination threshold is represented by the following formula:

$$T = A(i), \qquad i = \lfloor N_T \times (1 - \rho) \rfloor$$

in the formula, T is the adaptive discrimination threshold; A(i) is the anomaly score of the i-th POP flow matrix sample after sorting in ascending order; the value of i is the number of POP flow matrix samples multiplied by (1-ρ), rounded down; ρ is the proportion of abnormal data in the flow data; N_T is the number of POP flow matrix samples.

The second technical scheme adopted by the invention is as follows: an internal network anomaly detection system based on a POP flow matrix comprises a POP flow matrix construction module, an encoding module, a frequency domain feature extraction module, a data dimension reduction module, an anomaly detection model generation module and an anomaly detection module;

the POP flow matrix construction module is used for acquiring flow data among the nodes to construct a POP flow matrix;

the encoding module is used for encoding each flow data in the POP flow matrix into a vector;

the frequency domain feature extraction module is used for extracting frequency domain features of each flow data based on the vectors;

the data dimension reduction module is used for performing data dimension reduction on the frequency domain characteristics of the flow data to obtain a low-dimensional data matrix; constructing a training set based on the low-dimensional data matrix;

the anomaly detection model generation module is used for training a generative adversarial network (GAN) on the training set to obtain an anomaly detection model, and the anomaly detection model comprises a loss function;

the anomaly detection module is used for carrying out anomaly detection based on the anomaly detection model.

The beneficial effects of the above technical scheme are that:

(1) The method for detecting internal network anomalies based on the POP flow matrix extracts and analyzes the frequency domain features of the network traffic through the discrete Fourier transform, then visualizes these features between network nodes and detects anomalies in them, and uses a machine learning algorithm to achieve real-time, robust malicious traffic detection.

(2) The prior art mainly targets frequency anomaly detection between two nodes; for example, the frequency of the traffic from a to b is normally F1 and becomes F2 when an anomaly occurs. By contrast, the method disclosed by the invention captures not only the frequency domain information of a single node but also the cross-correlation information between nodes, and can detect anomalies in the traffic of all nodes in a server.

(3) The invention designs an internal network anomaly detection method based on a POP flow matrix around the characteristics of the high-performance network of an internal system. The method uses the DFT to extract the frequency domain features of the traffic, analyzes the traffic characteristics of the internal system from the frequency domain perspective, and finally performs anomaly detection with a GAN, thereby achieving, to a certain extent, real-time anomaly detection for high-throughput, high-performance networks while maintaining detection accuracy.

(4) The invention uses a matrix of autoencoders to reduce the data dimensionality, converting M×M×K_s-dimensional information into M×M-dimensional information, which can be viewed approximately as reducing a hyperspectral image to a 2-dimensional planar image. The GAN is then used directly to perform anomaly detection on the numerical (pixel) matrix. (By analogy, a digital image acquired by a digital camera, a scanner, CT, magnetic resonance imaging or the like appears to human eyes as a picture, whereas a digital device records the numerical value of each point in the image; a color image is a multi-channel M×M array of values and a grayscale image is a single-channel M×M array of values.)

Drawings

Fig. 1 is a schematic flowchart of an internal network anomaly detection method based on a POP traffic matrix according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an internal network anomaly detection method based on a POP traffic matrix according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a POP traffic matrix provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of the detection step according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the new matrix obtained after encoding each traffic data in the POP traffic matrix into a vector according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a data cube (the frequency domain features of each flow data) provided by an embodiment of the invention;

Fig. 7 is a schematic structural diagram of a deep autoencoder according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of data dimensionality reduction by deep autoencoders according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of an internal network anomaly detection model based on multi-dimensional information according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention will be described in further detail with reference to the drawings and examples. The following detailed description of the embodiments and the accompanying drawings are provided to illustrate the principles of the invention and are not intended to limit the scope of the invention, which is defined by the claims, i.e., the invention is not limited to the preferred embodiments described.

In the description of the present invention, it is to be noted that, unless otherwise specified, "a plurality" means two or more; the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance; the specific meaning of the above terms in the present invention can be understood as appropriate to those of ordinary skill in the art.

Example one

As shown in fig. 1 and fig. 2, an embodiment of the present invention provides a method for detecting an internal network anomaly based on a POP traffic matrix, which includes the following steps:

S100: acquiring flow data among nodes to construct a POP flow matrix;

a POP (point-to-point) traffic matrix is a concept introduced to describe traffic from a network-wide viewpoint, between domains or within a domain; the traffic matrix reflects the traffic between every pair of source node and destination node in a network;

as shown in fig. 3, assuming that M user nodes exist in a network, a flow probe is set up in the server to monitor, over a period of time ω, the amount of traffic between each node and every other node within each time period T_s, so that N pieces of flow data are obtained in total; a POP flow matrix is constructed based on the N pieces of flow data; here t is the frame number of the flow data, and i and j are the coordinates in the traffic matrix: for example, the entry with (i, j)＝(1, 2) represents the traffic from node 1 to node 2, and the entry with (i, j)＝(1, M) represents the traffic from node 1 to node M.
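As a rough illustration of step S100, the sketch below accumulates probe records into one M × M matrix per frame t. The record layout `(frame, source, destination, size)`, the function name `build_pop_matrices` and the zero-based node indices are assumptions made for this example; the source does not specify how the probe reports traffic.

```python
import numpy as np

def build_pop_matrices(records, num_nodes, num_frames):
    """Accumulate (frame, src, dst, size) records into one M x M matrix per frame."""
    pop = np.zeros((num_frames, num_nodes, num_nodes))
    for t, src, dst, size in records:
        pop[t, src, dst] += size      # entry (i, j) holds the traffic from node i to node j
    return pop

# Hypothetical probe output: (frame index t, source node, destination node, traffic size)
records = [(0, 0, 1, 500.0), (0, 0, 1, 250.0), (0, 2, 0, 100.0), (1, 1, 2, 40.0)]
pop = build_pop_matrices(records, num_nodes=3, num_frames=2)
print(pop[0])
```

Each slice `pop[t]` plays the role of the POP flow matrix for frame t; rows are source nodes and columns are destination nodes.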

S200: encoding each flow data in the POP flow matrix into a vector;

as shown in fig. 4, with a detection step ω_T along the time direction, the flow data within each detection step ω_T are converted into a vector representation. Taking the traffic from node 1 to node 2 as an example, ω_T flow data are detected within one detection step ω_T, and these flow data are encoded into a single vector S_12. In general, the flow data from the i-th node to the j-th node within the detection step are encoded into the vector S_ij, where 1≤i≤M and 1≤j≤M; by analogy, a new matrix as shown in fig. 5 is obtained.

The invention encodes the flow information between each pair of nodes into vectors, thereby reducing the data scale and the cost of subsequent processing.
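The encoding in step S200 can be pictured as slicing the time series of POP flow matrices into windows of length ω_T. The sketch below assumes non-overlapping windows and drops any trailing frames that do not fill a window; the source does not state either choice, and the name `encode_vectors` is made up for this example.

```python
import numpy as np

def encode_vectors(pop, step):
    """Split an (N, M, M) series into (N_T, M, M, step) windows; trailing frames are dropped."""
    n_frames, m, _ = pop.shape
    n_windows = n_frames // step                  # assumed: N_T = floor(N / step)
    trimmed = pop[: n_windows * step]
    windows = trimmed.reshape(n_windows, step, m, m)
    return windows.transpose(0, 2, 3, 1)          # result[w, i, j] is the vector S_ij of window w

# Toy series: N = 6 frames of a 2 x 2 matrix, detection step of 3 frames
pop = np.arange(6 * 2 * 2, dtype=float).reshape(6, 2, 2)
vectors = encode_vectors(pop, step=3)
print(vectors[0, 0, 1])                           # traffic of node pair (0, 1) in the first window
```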

S300: dividing the encoded vector into a plurality of frames, and performing a discrete Fourier transform (DFT) on each frame so as to extract the frequency domain features of each flow data;

the invention extracts frequency domain features from high-speed traffic by using the discrete Fourier transform. S_ij is the flow data vector from the i-th node to the j-th node within the detection step ω_T; it contains ω_T elements, the n-th of which is written S_ij(n) below.

Performing the discrete Fourier transform on the vector S_ij in the time window T gives the frequency domain features of the flow data in the time window T:

$$F_{ijk} = \sum_{n=1}^{\omega_T} S_{ij}(n)\, e^{-\sqrt{-1}\, 2\pi k n / \omega_T}, \qquad 1 \le k \le \omega_T$$

in the formula, F_ij is the frequency domain feature after the Fourier transform; F_ijk is the k-th frequency component, in the frequency domain, of the flow data from the i-th node to the j-th node within the detection step ω_T; S_ij is the flow data vector from the i-th node to the j-th node within the detection step ω_T; n and k are the indices in the discrete Fourier transform formula, where n is the index of the time-domain sample and k is the index of the sampling point on the frequency spectrum, i.e. of the k-th complex exponential signal after the DFT; ω_T is the detection step.

Since the frequency features output by the discrete Fourier transform are complex numbers and cannot be used directly as the input of a machine learning classifier, the modulus of each complex number is calculated, and the complex data are converted into real data by the following formulas:

$$F_{ijk} = a_{ijk} + \sqrt{-1}\, b_{ijk}, \qquad 1 \le k \le \omega_T$$

$$p_{ijk} = \sqrt{a_{ijk}^2 + b_{ijk}^2}$$

$$p_{ijk} = p_{ij(\omega_T - k)}$$

in the formula, F_ijk is the k-th frequency component, in the frequency domain, of the flow data from the i-th node to the j-th node within the detection step ω_T; a_ijk is the real part after the DFT; b_ijk is the imaginary part after the DFT (these follow from the basic formulas of the Fourier transform and of the modulus); S_ij is the flow data vector from the i-th node to the j-th node within the detection step ω_T; p_ijk is the modulus of F_ijk; p_ij is the modulus of F_ij. The last formula expresses the symmetry of the modulus spectrum of real-valued data.
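A minimal sketch of the DFT and modulus computation, using NumPy's `numpy.fft.fft` and `numpy.abs`. The example vector and the name `frequency_modulus` are illustrative only. Note that NumPy numbers the components from 0 (the sum of the samples), whereas the text counts k from 1 to ω_T, so the symmetry appears here as `p[k] == p[N - k]` for k = 1 … N-1.

```python
import numpy as np

def frequency_modulus(s_ij):
    """Return the modulus of every DFT component of a real traffic vector."""
    f_ij = np.fft.fft(s_ij)          # complex components a + i*b
    return np.abs(f_ij)              # sqrt(a^2 + b^2) for each component

# Hypothetical traffic from node 1 to node 2 over one detection step of 8 frames
s_12 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
p_12 = frequency_modulus(s_12)
print(p_12)
```

Because of this symmetry, roughly half of the moduli are redundant for real-valued traffic; whether the feature depth K_s in the text corresponds to the non-redundant half is not stated in the source.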

Further, in one embodiment, so that the frequency domain features work better in the machine learning classifier, they are normalized with the Min-Max algorithm, which limits all feature values to the interval [0, 1]; the Min-Max formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

in the formula, x' is the normalized feature value; x is the feature value of the input sample; x_min and x_max are respectively the minimum value and the maximum value among the sample feature values.

The invention normalizes the modulus of the frequency domain representation generated by the DFT, which prevents floating-point overflow caused by numerical instability during machine learning training; p_ij becomes p'_ij after normalization. Through the above steps, as shown in fig. 6, the data in one detection step can be output as an M × M × K_s data cube. The data cube holds the frequency domain features of each flow data: the M × M plane (length × width) contains the mutual information between nodes at the same frequency, and the depth (also referred to as the height) represents the different frequencies.
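The Min-Max step can be sketched as follows. The example normalizes over the whole cube with a single minimum and maximum; the source does not say whether the extremes are taken globally or per feature, so that choice, the `eps` guard and the function name are assumptions.

```python
import numpy as np

def min_max_normalize(p, eps=1e-12):
    """Scale all values of p into [0, 1] using the global minimum and maximum."""
    p = np.asarray(p, dtype=float)
    p_min, p_max = p.min(), p.max()
    span = p_max - p_min
    if span < eps:                     # constant input: avoid dividing by zero
        return np.zeros_like(p)
    return (p - p_min) / span

# Toy M x M x K_s cube of moduli with M = 2 and K_s = 2
cube = np.array([[[2.0, 4.0], [6.0, 8.0]],
                 [[10.0, 12.0], [14.0, 18.0]]])
normalized = min_max_normalize(cube)
print(normalized)
```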

S400: inputting the frequency domain features of each flow data into a respective deep autoencoder to perform data dimensionality reduction so as to obtain a low-dimensional data matrix, and constructing a training set based on the low-dimensional data matrix;

to facilitate the work of the subsequent classifier, the invention uses a plurality of deep autoencoders to perform data dimensionality reduction; the structure of a deep autoencoder is shown in FIG. 7. The input sample is compressed by the encoder to obtain the low-dimensional representation p_cij of the original sample features, and p_cij is then reconstructed by the decoder to obtain the reconstructed sample p″_ij. The encoder and the decoder consist entirely of fully connected layers, and the activation function is the tanh function. The input sample p′_ij is the normalized frequency domain feature vector of the flow data from the i-th node to the j-th node.

The compressed sample is obtained by the following formula:

p_cij＝h(p′_ij；θ_e)

in the formula, p_cij is the compressed sample; p′_ij is the input sample; θ_e are the encoder parameters.

The reconstruction error p_rij of the autoencoder is obtained by the following formula:

p_rij＝f(p′_ij；p″_ij)

in the formula, p_rij is the reconstruction error; p′_ij is the input sample; p″_ij is the reconstructed sample;

wherein p″_ij＝g(p_cij；θ_d), and θ_d are the decoder parameters.
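To make the encoder h and decoder g concrete, here is a forward-pass sketch of a small fully connected autoencoder with tanh activations that compresses a K_s-dimensional vector to a single value. The layer sizes, the random (untrained) weights and all function names are assumptions for illustration; the source gives neither the layer widths nor the training procedure, and training is omitted here.

```python
import numpy as np

def init_autoencoder(k_s, hidden=4, seed=0):
    """Create random weights for K_s -> hidden -> 1 -> hidden -> K_s (untrained)."""
    rng = np.random.default_rng(seed)
    shapes = [(k_s, hidden), (hidden, 1), (1, hidden), (hidden, k_s)]
    return [(rng.normal(0.0, 0.5, s), np.zeros(s[1])) for s in shapes]

def encode(p_in, params):
    """Encoder h: two fully connected tanh layers down to one value."""
    h = p_in
    for w, b in params[:2]:
        h = np.tanh(h @ w + b)
    return h

def decode(p_c, params):
    """Decoder g: two fully connected tanh layers back to K_s values."""
    h = p_c
    for w, b in params[2:]:
        h = np.tanh(h @ w + b)
    return h

params = init_autoencoder(k_s=6)
p_in = np.linspace(0.0, 1.0, 6)        # stand-in for a normalized feature vector
p_c = encode(p_in, params)             # compressed sample, one value
p_rec = decode(p_c, params)            # reconstructed sample, K_s values
print(p_c, p_rec)
```

In practice the weights would be fitted by minimizing the reconstruction error on normal traffic before the compressed values are used.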

In particular, the reconstruction error p_rij has 2-dimensional features, namely the Euclidean distance and the cosine similarity. The Euclidean distance is expressed by the following formula:

$$L_1(p'_{ij};\, p''_{ij}) = \sqrt{\sum_{n=1}^{K_s} \left(p'_{ijn} - p''_{ijn}\right)^2}$$

in the formula, L₁(p′_ij；p″_ij) is the Euclidean distance between p′_ij and p″_ij; p′_ij is the input sample; p″_ij is the reconstructed sample; p′_ijn is the n-th element of the vector p′_ij; p″_ijn is the n-th element of the vector p″_ij; p′_ij and p″_ij are 1×K_s vectors;

the cosine similarity is expressed by the following formula:

$$L_2(p'_{ij};\, p''_{ij}) = \frac{\sum_{n=1}^{K_s} p'_{ijn}\, p''_{ijn}}{\sqrt{\sum_{n=1}^{K_s} {p'_{ijn}}^{2}}\; \sqrt{\sum_{n=1}^{K_s} {p''_{ijn}}^{2}}}$$

in the formula, L₂(p′_ij；p″_ij) is the cosine similarity between p′_ij and p″_ij; p′_ij is the input sample; p″_ij is the reconstructed sample; p′_ijn is the n-th element of the vector p′_ij; p″_ijn is the n-th element of the vector p″_ij.
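The two reconstruction-error features can be computed as in the sketch below. The function name and the zero-norm guard are additions made for this example.

```python
import numpy as np

def reconstruction_features(p_in, p_rec):
    """Return (Euclidean distance, cosine similarity) between input and reconstruction."""
    p_in = np.asarray(p_in, dtype=float)
    p_rec = np.asarray(p_rec, dtype=float)
    euclidean = float(np.sqrt(np.sum((p_in - p_rec) ** 2)))
    denom = np.linalg.norm(p_in) * np.linalg.norm(p_rec)
    cosine = float(np.dot(p_in, p_rec) / denom) if denom > 0 else 0.0
    return euclidean, cosine

p_in = np.array([0.0, 3.0, 4.0])       # toy input sample
p_rec = np.array([0.0, 0.0, 4.0])      # toy reconstructed sample
dist, cos = reconstruction_features(p_in, p_rec)
print(dist, cos)
```

A good reconstruction gives a distance near 0 and a cosine similarity near 1.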

As shown in fig. 8, M × M deep autoencoders are used to reduce the frequency information from K_s dimensions to 1 dimension, giving the low-dimensional representation (compressed sample) p_cij of the input information; finally an M × M low-dimensional data matrix R^M×M is output. N_T such matrices are output in total, and the training set is constructed from all the low-dimensional data matrices.

Further, in one embodiment, to facilitate visualization of the flow monitoring, the low-dimensional data matrix R^M×M is plotted directly as a heat map.

S500: training a generative adversarial network (GAN) on the training set to obtain an anomaly detection model, the anomaly detection model including a loss function;

the generative adversarial network (GAN) consists of two adversarial modules, a generator G and a discriminator D. The generator G can be seen as a mapping G(z) from samples z to the manifold χ in matrix space; it learns the distribution P_g over the normal data x from one-dimensional input noise vectors sampled uniformly from the hidden space Z. In this case, the network architecture of the generator G is equivalent to a convolutional decoder composed of strided convolutions. The discriminator D is a standard convolutional neural network that maps a data matrix to a single scalar value D(·). The output D(·) can be interpreted as the probability that the given input is a real image x sampled from the training data χ rather than an image G(z) generated by the generator G. D and G are optimized through the minimax value function min_G max_D V(G, D).
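The text does not spell out V(G, D). Assuming it is the standard GAN value function, with p_data denoting the distribution of the normal training matrices and p_z the distribution of the noise z (both symbols are introduced here for illustration), it would read:

```latex
\min_{G}\,\max_{D}\; V(G, D)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The first term rewards D for assigning high probability to real matrices; the second rewards D for rejecting generated matrices and is the term that G tries to reduce.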

The training set consists of N_T data matrices of size M × M. From each matrix, K sub-matrices x_k,m ∈ χ of size c × c are sampled at random positions, where k＝1, 2, …, K. During training, only normal data are used to train the generative adversarial network in an unsupervised manner, so that the network learns the manifold of normal data.
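The random sub-matrix sampling can be sketched as follows. Uniform sampling of the top-left corner and the function name `sample_submatrices` are assumptions; the source only says that the positions are random.

```python
import numpy as np

def sample_submatrices(matrix, c, k, seed=None):
    """Sample k sub-matrices of size c x c at random positions of a square matrix."""
    rng = np.random.default_rng(seed)
    m = matrix.shape[0]
    if c > m:
        raise ValueError("patch size c must not exceed the matrix size M")
    rows = rng.integers(0, m - c + 1, size=k)     # top-left corners, upper bound exclusive
    cols = rng.integers(0, m - c + 1, size=k)
    return np.stack([matrix[r:r + c, q:q + c] for r, q in zip(rows, cols)])

# Toy 6 x 6 low-dimensional data matrix whose entries encode their own position
matrix = np.arange(36, dtype=float).reshape(6, 6)
patches = sample_submatrices(matrix, c=3, k=5, seed=0)
print(patches.shape)
```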

The discriminator is trained to label real samples as "true" and samples drawn from P_g as "false" as reliably as possible. The generator G is trained to fool D by minimizing V(G)＝log[1-D(G(z))], which is equivalent to maximizing V(G)＝D(G(z)). During adversarial training, the generator gradually improves its ability to generate similar, realistic matrices, and the discriminator improves its ability to tell real matrices from generated ones; the anomaly detection model is finally obtained through this training.

The invention performs anomaly detection on the frequency domain features of the traffic matrix using an unsupervised anomaly detection model based on a generative adversarial network (GAN), exploiting the GAN's property of training the generator and the discriminator simultaneously to distinguish generated data from real data.

When adversarial training is complete, the generator has learned the mapping from the hidden-space representation z to the real (normal) image x (POP traffic matrix):

G(z)＝z→x

Because the hidden space is smooth and continuous, sampling from two nearby points in the hidden space should in theory generate two visually similar images. Given an input image x (the POP flow matrix to be detected), the goal is to find the point z in the hidden space whose image G(z) is visually most similar to the query image x and lies on the manifold χ. How similar x and G(z) can be depends on how well the given input image follows the data distribution p_g on which the generator was trained. To find the best z, a point z₁ is sampled at random from the hidden-space distribution Z and fed to the trained generator to obtain the generated image G(z₁). A loss function defined on the generated image G(z₁) provides the gradient for updating the coefficients of z₁, which moves the position in the hidden space to a new point z₂; and so on, until after γ iterations the most similar image G(z_γ) is found. In this way the position of z in the latent space Z is optimized iteratively through multiple backpropagation steps.
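The iterative search for z can be illustrated with a deliberately simplified stand-in: a fixed linear "generator" whose gradient can be written by hand, and a loss consisting only of the sum of absolute differences between x and G(z). In the method described here the generator is a trained convolutional network and the gradient would come from backpropagation, so everything below (the linear map `w`, the step size, the function names) is an assumption made for illustration.

```python
import numpy as np

def generate(z, w):
    """Toy stand-in for a trained generator G: a fixed linear map to a c x c matrix."""
    c = int(round(np.sqrt(w.shape[0])))
    return (w @ z).reshape(c, c)

def residual_loss(x, g_z):
    """Sum of absolute differences between the query matrix and the generated one."""
    return float(np.sum(np.abs(x - g_z)))

def search_latent(x, w, gamma=200, lr=0.01, seed=0):
    """Run gamma update steps on z and return (best z, best loss, loss at z_1)."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1.0, 1.0, size=w.shape[1])      # z_1, sampled at random
    first_loss = residual_loss(x, generate(z, w))
    best_z, best_loss = z.copy(), first_loss
    for _ in range(gamma):
        diff = x - generate(z, w)
        grad = -w.T @ np.sign(diff).ravel()          # subgradient of sum|x - w z| w.r.t. z
        z = z - lr * grad
        loss = residual_loss(x, generate(z, w))
        if loss < best_loss:                         # keep the most similar image found so far
            best_z, best_loss = z.copy(), loss
    return best_z, best_loss, first_loss

rng = np.random.default_rng(1)
w = rng.normal(size=(16, 4))                         # hypothetical generator weights, 4 x 4 output
x = generate(rng.uniform(-1.0, 1.0, size=4), w)      # a query that lies on the toy manifold
z_best, loss_best, loss_first = search_latent(x, w)
print(loss_first, loss_best)
```

A query far from the manifold of the generator would leave a large remaining loss after the search, which is what the anomaly score exploits.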

The anomaly detection model includes a loss function, which comprises a reconstruction error loss and a discrimination loss;

the reconstruction error loss measures the difference between the input POP flow matrix x to be detected and the matrix G(z_γ) generated in matrix space, and is expressed by the following formula:

L_R(z_γ)＝∑|x-G(z_γ)|

in the formula, L_R(z_γ) is the reconstruction error loss; x is the POP flow matrix to be detected; G(z_γ) is the matrix generated in matrix space;

the discrimination loss is expressed by the following formula:

L_D(z_γ)＝∑|f(x)-f(G(z_γ))|

in the formula, L_D(z_γ) is the discrimination loss, in which the output of an intermediate layer f(·) of the discriminator is used to capture the statistics of a matrix; x is the POP flow matrix to be detected; G(z_γ) is the matrix generated in matrix space;

the loss function is defined as a weighted sum of the two components, and the total loss function is expressed by the following formula:

L(z_γ)＝(1-λ)L_R(z_γ)+λL_D(z_γ)

wherein L(z_γ) is the total loss; L_D(z_γ) is the discrimination loss; L_R(z_γ) is the reconstruction error loss; λ is a weighting coefficient that sets the relative weights of the two loss terms.
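The weighted loss can be evaluated as in this sketch. The feature map `toy_features` is a hypothetical stand-in for the discriminator's intermediate layer f(·), and the value λ = 0.1 is an arbitrary choice for the example; the source gives neither.

```python
import numpy as np

def residual_loss(x, g_z):
    """L_R: sum of absolute element-wise differences."""
    return float(np.sum(np.abs(x - g_z)))

def discrimination_loss(x, g_z, features):
    """L_D: sum of absolute differences between intermediate features."""
    return float(np.sum(np.abs(features(x) - features(g_z))))

def total_loss(x, g_z, features, lam=0.1):
    """L = (1 - lam) * L_R + lam * L_D."""
    return (1.0 - lam) * residual_loss(x, g_z) + lam * discrimination_loss(x, g_z, features)

def toy_features(m):
    """Hypothetical stand-in for f(.): column means followed by row means."""
    return np.concatenate([m.mean(axis=0), m.mean(axis=1)])

x = np.array([[1.0, 2.0], [3.0, 4.0]])      # matrix to be detected
g_z = np.array([[1.0, 1.0], [3.0, 5.0]])    # matrix produced by the generator
loss = total_loss(x, g_z, toy_features, lam=0.1)
print(loss)
```

Evaluated at the final latent point z_γ, the two terms supply the scores R(x) and D(x) used for the anomaly score.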

S600: performing anomaly detection based on the anomaly detection model;

when anomaly identification is performed on new data (the POP flow matrix x to be detected), the POP flow matrix to be detected first undergoes frequency domain analysis and data dimensionality reduction to obtain a low-dimensional data matrix, which is then input into the anomaly detection model for anomaly detection;

anomaly detection evaluates the POP flow matrix x to be detected and judges whether it is a normal image or an abnormal image, specifically as follows:

the loss function of the anomaly detection model is used to map the POP flow matrix x to be detected into the hidden space; at each of the γ update iterations, the loss function evaluates how similar the generated matrix G(z_γ) is to the images (training set) seen during adversarial training. The anomaly score A(x) can therefore be obtained directly from the mapping loss function; it expresses how well the POP flow matrix x to be detected fits the model of normal matrices:

A(x)＝(1-λ)R(x)+λD(x)

wherein A(x) is the anomaly score; R(x) is the reconstruction error score; D(x) is the discrimination score; λ is a weighting coefficient that sets the relative weights of R(x) and D(x). For abnormal matrix data the anomaly detection model produces a large anomaly score, while a small anomaly score means that a very similar matrix was seen during training. Once an adaptive threshold has been set, the anomaly score can be used directly for the decision.

Whether the internal network is abnormal is judged based on the adaptive discrimination threshold T and the anomaly score of the POP flow matrix to be detected: if the anomaly score of the POP flow matrix to be detected is larger than the adaptive discrimination threshold T, the internal network is judged to be abnormal. The threshold is determined as follows:

the invention obtains the adaptive discrimination threshold T according to the energy of the sample data and the proportion ρ of abnormal data, specifically as follows:

for N pieces of flow data with detection step ω_T, N_T samples are obtained after DFT processing. The anomaly score of each sample is calculated by the unsupervised anomaly detection model based on the generative adversarial network, all samples are sorted in ascending order of anomaly score, and the adaptive discrimination threshold T is expressed by the following formula:

$$T = A(i), \qquad i = \lfloor N_T \times (1 - \rho) \rfloor$$

in the formula, T is the adaptive discrimination threshold; A(i) is the anomaly score of the i-th POP flow matrix sample after sorting in ascending order; the value of i is the number of POP flow matrix samples multiplied by (1-ρ), rounded down to an integer; ρ is the proportion of abnormal data in the flow data. In practice the ρ value of a data set cannot be known in advance, and a suitable ρ value is determined by a binary search (bisection) method; N_T is the number of POP flow matrix samples.
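A short sketch of the threshold rule, with i counted from 1 as in the text. The clamping of i to the range 1 … N_T and the example scores are additions for this illustration, and the search for a suitable ρ is not shown.

```python
import math
import numpy as np

def adaptive_threshold(scores, rho):
    """Return A(i) with i = floor(N_T * (1 - rho)), scores sorted ascending, i counted from 1."""
    ordered = np.sort(np.asarray(scores, dtype=float))
    n_t = ordered.size
    i = math.floor(n_t * (1.0 - rho))
    i = min(max(i, 1), n_t)              # keep the 1-based index inside [1, N_T]
    return float(ordered[i - 1])

# Hypothetical anomaly scores of N_T = 8 samples, assuming 25 % of them are abnormal
scores = [0.4, 9.0, 0.1, 0.6, 0.3, 5.0, 0.2, 0.5]
threshold = adaptive_threshold(scores, rho=0.25)
flags = [s > threshold for s in scores]   # True marks a sample judged abnormal
print(threshold, flags)
```

With these numbers i is 6, so the threshold is the sixth smallest score and the two largest scores are flagged.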

The invention uses a matrix of autoencoders to reduce the data dimensionality, converting M×M×K_s-dimensional information into M×M-dimensional information, which can be viewed approximately as reducing a hyperspectral image to a 2-dimensional planar image; the GAN is then used directly to detect anomalies in the numerical (pixel) matrix. That is, the data in the internal network anomaly detection process are in matrix form; the GAN as applied in the invention processes the matrix data directly, with no value-to-pixel conversion and no step of converting the matrix into an image, so anomaly detection is performed by the GAN directly on the flow data matrix.

Example two

As shown in fig. 9, an embodiment of the present invention provides an internal network anomaly detection system based on a POP traffic matrix, which includes a POP traffic matrix construction module, an encoding module, a frequency domain feature extraction module, a data dimension reduction module, an anomaly detection model generation module, and an anomaly detection module;

the frequency domain feature extraction module is used for extracting the frequency domain features of the flow data based on the vectors;

the anomaly detection model generation module is used for training a generative adversarial network (GAN) on the training set to obtain an anomaly detection model;

and the anomaly detection module is used for carrying out anomaly detection based on the anomaly detection model.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description covers only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims

1. An internal network anomaly detection method based on a POP flow matrix, characterized by comprising the following steps:

S100: acquiring flow data among nodes to construct a POP flow matrix;

S200: encoding each flow data in the POP flow matrix into a vector;

S500: training a generative adversarial network based on the training set to generate an anomaly detection model, the anomaly detection model including a loss function;

S600: performing anomaly detection based on the anomaly detection model.

2. The internal network anomaly detection method according to claim 1, wherein said step S200 comprises: encoding all the flow data detected within the detection step and converting the flow data into a vector representation.

3. The internal network anomaly detection method according to claim 1, wherein said step S300 further comprises: normalizing the frequency domain features using the Min-Max algorithm.
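For illustration only, Min-Max normalization rescales values to the range [0, 1] by x' = (x - min) / (max - min). The following minimal Python sketch assumes the normalization is applied over a whole feature array; the function name min_max_normalize and the handling of a constant input (returning zeros) are hypothetical choices, not taken from the source.

```python
import numpy as np


def min_max_normalize(features, eps=1e-12):
    """Rescale features to [0, 1] using the global minimum and maximum."""
    features = np.asarray(features, dtype=float)
    lo, hi = features.min(), features.max()
    if hi - lo < eps:
        # Assumption: a constant input maps to all zeros to avoid 0/0.
        return np.zeros_like(features)
    return (features - lo) / (hi - lo)


if __name__ == "__main__":
    print(min_max_normalize([2.0, 4.0, 6.0]))
```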

4. The internal network anomaly detection method according to claim 1, wherein said step S400 comprises: inputting the frequency domain features of each flow data into a respective deep autoencoder to perform data dimension reduction.

5. The internal network anomaly detection method according to claim 1, wherein said step S400 further comprises: plotting the low-dimensional data matrix as a heat map.

6. The method according to claim 1, wherein the loss function in step S500 is expressed by the following formula:

L(z_γ) = (1-λ)L_R(z_γ) + λL_D(z_γ)

wherein L(z_γ) is the total loss; L_D(z_γ) is the discrimination loss; L_R(z_γ) is the reconstruction error loss; and λ is a weighting coefficient.
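As a worked illustration of the weighted sum in the loss formula, the following minimal Python sketch combines a reconstruction error loss and a discrimination loss. The function name total_loss, the restriction of λ to the range [0, 1], and the numeric values are assumptions made for this example and do not come from the source.

```python
def total_loss(reconstruction_loss, discrimination_loss, lam):
    """Return (1 - lam) * L_R + lam * L_D.

    Assumption: lam lies in [0, 1], so the result is a convex combination.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is assumed to lie in [0, 1]")
    return (1.0 - lam) * reconstruction_loss + lam * discrimination_loss


if __name__ == "__main__":
    # Hypothetical values: L_R = 0.5, L_D = 2.0, lam = 0.1
    # (1 - 0.1) * 0.5 + 0.1 * 2.0 = 0.45 + 0.20 = 0.65
    print(total_loss(0.5, 2.0, 0.1))
```

A small λ makes the reconstruction error dominate the total, while λ close to 1 shifts the weight to the discrimination term.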

7. The method according to claim 1, wherein the step S600 comprises:

8. The internal network anomaly detection method according to claim 7, characterized in that said anomaly score is expressed by the following formula:

A(x) = (1-λ)R(x) + λD(x)

9. The internal network anomaly detection method according to claim 7, characterized in that said adaptive discrimination threshold is expressed by the following formula:

In the formula, T is the adaptive discrimination threshold; A(i) is the anomaly score of the i-th POP flow matrix sample after sorting in ascending order; i equals the number of POP flow matrix samples multiplied by (1-ρ); ρ is the ratio of abnormal data to the flow data; and N_T is the number of POP flow matrix samples.
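The symbol definitions above suggest that the threshold is the anomaly score at rank N_T × (1-ρ) in the ascending ordering. The following minimal Python sketch is written under that reading. Rounding the rank down to an integer, clamping it to the range 1..N_T, and treating scores strictly above T as abnormal are assumptions of this sketch, and the function name adaptive_threshold is hypothetical.

```python
import math

import numpy as np


def adaptive_threshold(scores, rho):
    """Return the score at 1-based rank floor(N_T * (1 - rho)), ascending."""
    sorted_scores = np.sort(np.asarray(scores, dtype=float))
    n_t = sorted_scores.size
    # Assumption: the rank is rounded down; 1e-9 guards against float error.
    rank = int(math.floor(n_t * (1.0 - rho) + 1e-9))
    rank = min(max(rank, 1), n_t)  # keep the rank inside 1..N_T
    return float(sorted_scores[rank - 1])


if __name__ == "__main__":
    scores = [0.8, 0.1, 0.5, 0.3, 0.7, 0.2, 0.6, 0.4]
    T = adaptive_threshold(scores, rho=0.25)
    # Assumption: samples scoring strictly above T are flagged as abnormal.
    flagged = [s for s in scores if s > T]
    print(T, flagged)
```

With 8 samples and ρ = 0.25 the rank is 6, so the threshold is the sixth smallest score and the two highest-scoring samples are flagged, which matches the assumed abnormal ratio.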

10. An internal network anomaly detection system based on a POP flow matrix is characterized by comprising a POP flow matrix construction module, a coding module, a frequency domain feature extraction module, a data dimension reduction module, an anomaly detection model generation module and an anomaly detection module;