CN114745187A - Internal network anomaly detection method and system based on POP flow matrix - Google Patents


Info

Publication number
CN114745187A
Authority
CN
China
Prior art keywords
matrix
flow
anomaly detection
data
pop
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210412983.4A
Other languages
Chinese (zh)
Other versions
CN114745187B (en)
Inventor
刘翔宇
朱诗兵
李玉巍
王宇
熊达鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Original Assignee
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority to CN202210412983.4A
Publication of CN114745187A
Application granted
Publication of CN114745187B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The invention discloses an internal network anomaly detection method and system based on a POP flow matrix. The method comprises: obtaining flow data among nodes to construct the POP flow matrix; encoding each flow data in the POP flow matrix into a vector; extracting frequency domain features of each flow data based on the vectors; performing data dimensionality reduction on the frequency domain features of the flow data to obtain a low-dimensional data matrix; constructing a training set based on the low-dimensional data matrix; training a generative adversarial network on the training set to generate an anomaly detection model; and performing anomaly detection based on the anomaly detection model. The method performs frequency analysis on the traffic exchanged among a plurality of nodes, so that it captures not only the frequency domain information of each single node but also the cross-correlation information among all nodes, and can perform anomaly detection on the traffic conditions of all nodes in the server.

Description

Internal network anomaly detection method and system based on POP flow matrix
Technical Field
The invention relates to the technical field of anomalous data detection, and in particular to a method and a system for detecting internal network anomalies based on a POP flow matrix.
Background
Traditional malicious traffic detection analyzes traffic characteristics against preset rules in order to identify malicious traffic. The rules aim to protect legitimate network users from network attacks; however, rule-base detection has difficulty detecting zero-day attacks. Compared with rule-based methods, machine-learning-based methods can effectively identify zero-day malicious traffic and can supplement the traditional fixed-rule methods (namely signature-based NIDS). Unfortunately, owing to the processing overhead of machine learning algorithms, existing detection methods have low detection accuracy and cannot process high-rate traffic; most of them can only be deployed offline and cannot achieve real-time detection, especially in high-performance networks.
Meanwhile, an attacker can easily interfere with and evade the above methods by injecting noise (e.g., packets generated by benign applications) into the attack traffic. Traditional flow-level methods detect attacks by analyzing flow-level statistics, which can cause significant detection delays; in addition, evasion attacks can easily bypass traditional flow-level detection that relies on coarse-grained flow-level statistics, while packet-level detection has difficulty achieving robust detection of covert attack patterns.
In addition, current traffic anomaly detection mainly targets frequency anomalies between two nodes. Its defect is that the frequency of the traffic exchanged among a plurality of nodes cannot be analyzed simultaneously; that is, anomaly detection cannot be performed on the traffic conditions of all nodes in the server.
Disclosure of Invention
In view of the above problems, the invention aims to provide an internal network anomaly detection method based on a POP flow matrix, designed around the characteristics of the high-performance network of an internal system: the traffic throughput is huge, so real-time packet-level anomaly detection is difficult; and, unlike the traffic of Internet users, the node traffic inside the system generally has strong periodicity and repeatability and the number of nodes is limited, which lays a foundation for anomaly detection using the traffic information among nodes. The method effectively extracts and analyzes the frequency domain features of network traffic through the discrete Fourier transform, further visualizes the frequency domain features of the traffic among network nodes and detects anomalies in them, and uses a machine learning algorithm to achieve real-time, robust malicious traffic detection. In other words, the method performs frequency analysis on the traffic exchanged among a plurality of nodes, captures not only the frequency domain information of each single node but also the cross-correlation information among all nodes, and can perform anomaly detection on the traffic conditions of all nodes in the server.
The invention also provides an internal network anomaly detection system based on the POP traffic matrix.
The first technical scheme adopted by the invention is as follows: a method for detecting internal network anomalies based on a POP flow matrix, comprising the following steps:
S100: acquiring flow data among nodes to construct a POP flow matrix;
S200: encoding each flow data in the POP flow matrix into a vector;
S300: extracting frequency domain features of each flow data based on the vectors;
S400: performing data dimensionality reduction on the frequency domain features of the flow data to obtain a low-dimensional data matrix; constructing a training set based on the low-dimensional data matrix;
S500: training a generative adversarial network on the training set to generate an anomaly detection model, the anomaly detection model comprising a loss function;
S600: performing anomaly detection based on the anomaly detection model.
Preferably, the step S200 includes: encoding all the flow data detected within the detection step, and converting the flow data into a vector representation.
Preferably, the step S300 further includes: normalizing the frequency domain features using the Min-Max algorithm.
Preferably, the step S400 includes: inputting the frequency domain features of each flow data into a respective deep autoencoder to perform data dimensionality reduction.
Preferably, the step S400 further includes: plotting the low-dimensional data matrix as a heat map.
Preferably, the loss function in step S500 is expressed by the following formula:
$$L(z_\gamma) = (1-\lambda)\,L_R(z_\gamma) + \lambda\,L_D(z_\gamma)$$
where $L(z_\gamma)$ is the total loss; $L_D(z_\gamma)$ is the discrimination loss; $L_R(z_\gamma)$ is the reconstruction error loss; $\lambda$ is a weighting coefficient.
Preferably, the step S600 includes:
obtaining an anomaly score of the POP flow matrix to be detected based on the loss function; and judging whether the internal network is abnormal based on an adaptive discrimination threshold and the anomaly score of the POP flow matrix to be detected: if the anomaly score of the POP flow matrix to be detected is larger than the adaptive discrimination threshold, the internal network is judged to be abnormal.
Preferably, the anomaly score is represented by the following formula:
$$A(x) = (1-\lambda)\,R(x) + \lambda\,D(x)$$
where $A(x)$ is the anomaly score; $R(x)$ is the reconstruction error score; $D(x)$ is the discrimination score; $\lambda$ is a weighting coefficient.
Preferably, the adaptive discrimination threshold is represented by the following formula:
$$T = A(i), \quad i = \lfloor N_T \times (1-\rho) \rfloor$$
where $T$ is the adaptive discrimination threshold; $A(i)$ is the anomaly score of the $i$-th POP flow matrix sample after sorting in ascending order; the value of $i$ is the number of POP flow matrix samples multiplied by $(1-\rho)$, rounded down; $\rho$ is the proportion of abnormal data in the flow data; $N_T$ is the number of POP flow matrix samples.
The second technical scheme adopted by the invention is as follows: an internal network anomaly detection system based on a POP flow matrix, comprising a POP flow matrix construction module, an encoding module, a frequency domain feature extraction module, a data dimension reduction module, an anomaly detection model generation module and an anomaly detection module;
the POP flow matrix construction module is used for acquiring flow data among the nodes to construct a POP flow matrix;
the encoding module is used for encoding each flow data in the POP flow matrix into a vector;
the frequency domain feature extraction module is used for extracting frequency domain features of each flow data based on the vectors;
the data dimension reduction module is used for performing data dimensionality reduction on the frequency domain features of the flow data to obtain a low-dimensional data matrix, and for constructing a training set based on the low-dimensional data matrix;
the anomaly detection model generation module is used for training a generative adversarial network on the training set to generate an anomaly detection model, the anomaly detection model comprising a loss function;
the anomaly detection module is used for performing anomaly detection based on the anomaly detection model.
The beneficial effects of the above technical scheme are that:
(1) The method for detecting internal network anomalies based on the POP flow matrix effectively extracts and analyzes the frequency domain features of network traffic through the discrete Fourier transform, further visualizes the frequency domain features of the traffic among network nodes and detects anomalies in them, and uses a machine learning algorithm to achieve real-time, robust malicious traffic detection.
(2) The prior art mainly targets frequency anomaly detection between two nodes (for example, the frequency from a to b is normally F1 and becomes F2 when an anomaly occurs). By contrast, the method of the invention captures not only the frequency domain information of each single node but also the cross-correlation information among nodes, and can perform anomaly detection on the traffic conditions of all nodes in a server.
(3) The invention designs an internal network anomaly detection method based on a POP flow matrix around the characteristics of the high-performance network of an internal system. The method uses the DFT to extract the frequency domain features of the traffic, analyzes the traffic characteristics of the internal system from the frequency domain perspective, and finally performs anomaly detection with a GAN, thereby achieving, to a certain extent, real-time anomaly detection for high-throughput, high-performance networks while maintaining detection accuracy.
(4) The invention uses a matrix of autoencoders to reduce the data dimensionality, converting the $M \times M \times K_s$-dimensional information into $M \times M$-dimensional information, which is comparable to reducing a hyperspectral image to a 2-dimensional planar image. The GAN is then applied directly to the resulting numeric (pixel) matrix for anomaly detection. (For example, a digital image acquired by a digital camera, a scanner, CT or magnetic resonance imaging appears to the human eye as a picture, but a digital device sees it as a record of the numeric value of each point in the image; a color image is a multi-channel $M \times M$ image, and a grayscale image is a single-channel $M \times M$ image.)
Drawings
Fig. 1 is a schematic flowchart of an internal network anomaly detection method based on a POP flow matrix according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an internal network anomaly detection method based on a POP flow matrix according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a POP flow matrix provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the detection step according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the new matrix obtained after encoding each flow data in the POP flow matrix into a vector according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a data cube (the frequency domain features of each flow data) provided by an embodiment of the invention;
Fig. 7 is a schematic structural diagram of a deep autoencoder according to an embodiment of the present invention;
Fig. 8 is a diagram illustrating data dimensionality reduction by deep autoencoders according to one embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an internal network anomaly detection model based on multi-dimensional information according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in further detail with reference to the drawings and examples. The following detailed description of the embodiments and the accompanying drawings are provided to illustrate the principles of the invention and are not intended to limit the scope of the invention, which is defined by the claims, i.e., the invention is not limited to the preferred embodiments described.
In the description of the present invention, it is to be noted that, unless otherwise specified, "a plurality" means two or more; the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance; the specific meaning of the above terms in the present invention can be understood as appropriate to those of ordinary skill in the art.
Example one
As shown in fig. 1 and fig. 2, an embodiment of the present invention provides a method for detecting an internal network anomaly based on a POP traffic matrix, which includes the following steps:
S100: acquiring flow data among nodes to construct a POP flow matrix;
A POP (point-to-point) traffic matrix is a concept introduced to describe traffic from a network-wide viewpoint, either between domains or within a domain; the traffic matrix reflects the traffic between all pairs of source nodes and destination nodes in a network;
as shown in fig. 3, assume that M user nodes exist in a network. By setting up a traffic probe in the server, the volume of traffic between each node and every other node within each interval $\omega_{Ts}$ is monitored over a period of time, and N pieces of flow data are obtained in total; a POP flow matrix is constructed based on the N pieces of flow data. Each entry of the matrix is indexed by t, the frame number of the flow data, and by i and j, the coordinates in the traffic matrix; for example, the entry with i = 1 and j = 2 represents the traffic from node 1 to node 2, and the entry with i = 1 and j = M represents the traffic from node 1 to node M.
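Step S100 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the traffic probe emits records of the form (frame, source node, destination node, volume) with 0-based indices; this record format and the function name `build_pop_matrices` are hypothetical and not taken from the patent.

```python
def build_pop_matrices(records, num_nodes, num_frames):
    """Return one M x M traffic matrix per frame.

    records: iterable of (frame, src, dst, volume) tuples; the tuple layout
    and the 0-based indices are assumptions made for this sketch.
    """
    matrices = [[[0.0] * num_nodes for _ in range(num_nodes)]
                for _ in range(num_frames)]
    for frame, src, dst, volume in records:
        # Accumulate all traffic seen from src to dst within the frame.
        matrices[frame][src][dst] += volume
    return matrices


records = [
    (0, 0, 1, 120.0),   # frame 0: node 0 -> node 1
    (0, 0, 1, 30.0),    # same pair, same frame: volumes are summed
    (0, 2, 0, 15.0),
    (1, 0, 1, 90.0),
]
pop = build_pop_matrices(records, num_nodes=3, num_frames=2)
print(pop[0][0][1], pop[1][0][1])
```

Entry `pop[t][i][j]` then plays the role of the traffic from node i to node j in frame t.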
S200: encoding each flow data in the POP flow matrix into a vector;
as shown in fig. 4, along the time direction, the flow data within each detection step $\omega_T$ are converted into a vector representation. Taking the traffic from node 1 to node 2 as an example, $\omega_T$ flow values are observed within the detection step $\omega_T$, and these values are encoded into a single vector. Applying the same encoding to every pair of nodes (i, j), with $1 \le i \le M$ and $1 \le j \le M$, yields the new matrix shown in fig. 5.
The invention encodes the flow information between each node into vectors, thereby reducing the data scale and the cost of subsequent processing.
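The encoding of step S200 might be sketched as follows. The sketch assumes non-overlapping detection steps and a list of per-frame M x M matrices as input; both choices, and the name `encode_vectors`, are illustrative assumptions rather than details stated in the patent.

```python
def encode_vectors(matrices, step):
    """Group frames into windows of `step` frames and return, per window,
    an M x M grid whose (i, j) cell holds the vector of traffic values
    from node i to node j inside that window.

    Non-overlapping windows are an assumption of this sketch.
    """
    num_nodes = len(matrices[0])
    windows = []
    for start in range(0, len(matrices) - step + 1, step):
        window = matrices[start:start + step]
        grid = [[[frame[i][j] for frame in window]
                 for j in range(num_nodes)]
                for i in range(num_nodes)]
        windows.append(grid)
    return windows


# Four frames of a 2-node network; traffic grows linearly over time.
frames = [[[0.0, float(t + 1)], [10.0 * (t + 1), 0.0]] for t in range(4)]
windows = encode_vectors(frames, step=2)
print(windows[0][0][1], windows[1][1][0])
```

Each cell of the resulting grid corresponds to one vector of the matrix shown in fig. 5.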
S300: dividing the encoded vectors into a plurality of frames, and performing a discrete Fourier transform (DFT) on each frame so as to extract the frequency domain features of each flow data;
the invention extracts frequency domain features from high-speed traffic using the discrete Fourier transform. Let $S_{ij}$ be the flow data vector from the $i$-th node to the $j$-th node within the detection step $\omega_T$:
$$S_{ij} = [S_{ij1}, S_{ij2}, \ldots, S_{ij\omega_T}]$$
Performing the discrete Fourier transform on the vector $S_{ij}$ within the time window yields the frequency domain features of the flow data in that window:
$$F_{ij} = \mathrm{DFT}(S_{ij})$$
$$F_{ijk} = \sum_{n=1}^{\omega_T} S_{ijn}\, e^{-\mathrm{j}\frac{2\pi}{\omega_T}kn}, \quad 1 \le k \le \omega_T$$
where $F_{ij}$ is the frequency domain feature after the Fourier transform; $F_{ijk}$ is the component, at frequency $2\pi k/\omega_T$, of the flow data from the $i$-th node to the $j$-th node within the detection step $\omega_T$; $S_{ij}$ is the flow data vector from the $i$-th node to the $j$-th node within the detection step $\omega_T$, and $S_{ijn}$ is its $n$-th element; $\mathrm{j}$ in the exponent is the imaginary unit (distinct from the node index $j$); $n$ and $k$ are the indices in the discrete Fourier transform formula, where $k$ is the index of a sampling point on the frequency spectrum, i.e. of the $k$-th complex exponential after the DFT; $\omega_T$ is the detection step.
Since the frequency features output by the discrete Fourier transform are complex numbers, which cannot be used directly as the input of a machine learning classifier, the modulus of each complex number is computed, converting the complex data into real data by the following formulas:
$$F_{ijk} = a_{ijk} + \mathrm{j}\,b_{ijk}, \quad 1 \le k \le \omega_T$$
$$a_{ijk} = \sum_{n=1}^{\omega_T} S_{ijn} \cos\frac{2\pi k n}{\omega_T}$$
$$b_{ijk} = -\sum_{n=1}^{\omega_T} S_{ijn} \sin\frac{2\pi k n}{\omega_T}$$
$$p_{ijk} = \sqrt{a_{ijk}^2 + b_{ijk}^2}$$
$$p_{ijk} = p_{ij(\omega_T - k)}$$
where $F_{ijk}$ is the component, at frequency $2\pi k/\omega_T$, of the flow data from the $i$-th node to the $j$-th node within the detection step $\omega_T$; $a_{ijk}$ is the real part after the DFT; $b_{ijk}$ is the imaginary part after the DFT (these are the basic formulas of the Fourier transform and of the modulus); $\mathrm{j}$ is the imaginary unit; $S_{ij}$ is the flow data vector from the $i$-th node to the $j$-th node within the detection step $\omega_T$, and $S_{ijn}$ is its $n$-th element; $p_{ijk}$ is the modulus of $F_{ijk}$; $p_{ij}$ is the modulus of $F_{ij}$.
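The modulus computation of step S300 can be checked on a small example. The sketch below assumes the textbook DFT with indices n and k running from 1 to the detection step, which is one convention compatible with the symmetry relation stated above; the function name `dft_modulus` and the sample vector are hypothetical.

```python
import cmath
import math


def dft_modulus(s):
    """Return [|F_1|, ..., |F_N|] for a real vector s of length N, with
    F_k = sum over n = 1..N of s_n * exp(-1j * 2 * pi * k * n / N).

    The 1-based index convention is an assumption of this sketch.
    """
    n_points = len(s)
    moduli = []
    for k in range(1, n_points + 1):
        f_k = sum(value * cmath.exp(-2j * math.pi * k * n / n_points)
                  for n, value in enumerate(s, start=1))
        moduli.append(abs(f_k))   # modulus = sqrt(real^2 + imag^2)
    return moduli


# Traffic from one node to another that repeats with a period of 2 frames.
s_12 = [4.0, 1.0, 4.0, 1.0, 4.0, 1.0, 4.0, 1.0]
p_12 = dft_modulus(s_12)
print([round(p, 6) for p in p_12])
```

For this strictly periodic vector, the energy concentrates in the component k = 4 (the period-2 oscillation) and in k = 8 (the mean level), which illustrates why periodic internal traffic is compact in the frequency domain.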
Further, in one embodiment, so that the frequency domain features can be processed better by the machine learning classifier, the frequency domain features are normalized using the Min-Max algorithm, which limits all feature values to the interval [0, 1]; the Min-Max formula is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x'$ is the normalized feature value; $x$ is the feature value of the input sample; $x_{\min}$ and $x_{\max}$ are respectively the minimum and maximum of the sample feature values.
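A minimal sketch of the Min-Max normalization follows. The handling of a constant input (where the maximum equals the minimum) is not specified in the text; mapping such input to zeros is an assumption of this sketch.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1] using their own min and max."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant feature carries no information, map to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


print(min_max_normalize([2.0, 4.0, 6.0]))
```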
The invention normalizes the modulus of the frequency domain representation generated by the DFT, which prevents floating-point overflow caused by numerically unstable values during machine learning training; $p_{ij}$ becomes $p'_{ij}$ after normalization. Through the above steps, as shown in fig. 6, the data in one detection step can be output as an $M \times M \times K_s$ data cube. The data cube holds the frequency domain features of each flow data: each $M \times M$ (length × width) plane contains the mutual information between nodes at the same frequency, and the depth (also referred to as the height) indexes the different frequencies.
S400: inputting the frequency domain features of each flow data into a respective deep autoencoder to perform data dimensionality reduction so as to obtain a low-dimensional data matrix, and constructing a training set based on the low-dimensional data matrix;
to facilitate the work of the subsequent classifier, the invention uses a plurality of deep autoencoders to perform data dimensionality reduction. The structure of a deep autoencoder is shown in fig. 7: the input sample is compressed by the encoder to obtain the low-dimensional representation $p_{cij}$ of the original sample features, and $p_{cij}$ is then reconstructed by the decoder to obtain the reconstructed sample $p''_{ij}$. The encoder and the decoder consist entirely of fully connected layers, and the activation function is the tanh function. The input sample $p'_{ij}$ is:
$$p'_{ij} = [p'_{ij1}, p'_{ij2}, \ldots, p'_{ijK_s}]^{\mathsf{T}}$$
The compressed sample is obtained by the following formula:
$$p_{cij} = h(p'_{ij}; \theta_e)$$
where $p_{cij}$ is the compressed sample; $p'_{ij}$ is the input sample; $\theta_e$ are the encoder parameters.
The reconstruction error $p_{rij}$ of the autoencoder is obtained by the following formula:
$$p_{rij} = f(p'_{ij}; p''_{ij})$$
where $p_{rij}$ is the reconstruction error; $p'_{ij}$ is the input sample; $p''_{ij}$ is the reconstructed sample;
and $p''_{ij} = g(p_{cij}; \theta_d)$, where $\theta_d$ are the decoder parameters.
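The forward pass of one such autoencoder can be sketched in a few lines. The sketch uses a single fully connected tanh layer for the encoder and one for the decoder, with random untrained weights; the layer count, the weight initialization and the helper names `dense_tanh` and `init_layer` are assumptions for illustration, and no training loop is shown.

```python
import math
import random


def dense_tanh(vector, weights, biases):
    """One fully connected layer followed by the tanh activation."""
    return [math.tanh(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
k_s = 5                                     # number of frequency components
enc_w, enc_b = init_layer(k_s, 1, rng)      # stands in for theta_e
dec_w, dec_b = init_layer(1, k_s, rng)      # stands in for theta_d

p_in = [0.1, 0.9, 0.3, 0.0, 0.5]            # normalized input sample
p_c = dense_tanh(p_in, enc_w, enc_b)        # compressed sample, 1 value
p_out = dense_tanh(p_c, dec_w, dec_b)       # reconstructed sample, K_s values
print(len(p_c), len(p_out))
```

In practice the parameters would be fitted by minimizing the reconstruction error over the training samples, typically with a deep learning framework rather than hand-written layers.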
In particular, the reconstruction error $p_{rij}$ has 2-dimensional features, namely the Euclidean distance and the cosine similarity. The Euclidean distance is expressed by the following formula:
$$L_1(p'_{ij}; p''_{ij}) = \lVert p'_{ij} - p''_{ij} \rVert_2 = \sqrt{\sum_{n=1}^{K_s} \left(p'_{ijn} - p''_{ijn}\right)^2}$$
where $L_1(p'_{ij}; p''_{ij})$ is the Euclidean distance between $p'_{ij}$ and $p''_{ij}$; $p'_{ij}$ is the input sample; $p''_{ij}$ is the reconstructed sample; $p'_{ijn}$ is the $n$-th row of the column vector $p'_{ij}$; $p''_{ijn}$ is the $n$-th row of the column vector $p''_{ij}$; $p'_{ij}$ and $p''_{ij}$ are column vectors of length $K_s$;
the cosine similarity is expressed by the following formula:
$$L_2(p'_{ij}; p''_{ij}) = \frac{\sum_{n=1}^{K_s} p'_{ijn}\, p''_{ijn}}{\sqrt{\sum_{n=1}^{K_s} {p'_{ijn}}^2}\,\sqrt{\sum_{n=1}^{K_s} {p''_{ijn}}^2}}$$
where $L_2(p'_{ij}; p''_{ij})$ is the cosine similarity between $p'_{ij}$ and $p''_{ij}$; $p'_{ij}$ is the input sample; $p''_{ij}$ is the reconstructed sample; $p'_{ijn}$ is the $n$-th row of the column vector $p'_{ij}$; $p''_{ijn}$ is the $n$-th row of the column vector $p''_{ij}$.
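The two reconstruction error features can be computed as in the following sketch, which uses the standard definitions of Euclidean distance and cosine similarity. Returning 0 for a zero-norm vector is an assumption of the sketch, since the text does not cover that case.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared element-wise differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0   # assumption: undefined similarity is reported as 0
    return dot / (norm_a * norm_b)


p_input = [0.0, 3.0]
p_reconstructed = [4.0, 0.0]
features = (euclidean_distance(p_input, p_reconstructed),
            cosine_similarity(p_input, p_reconstructed))
print(features)
```

A good reconstruction gives a small distance and a similarity close to 1; the orthogonal pair above is the opposite extreme.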
As shown in fig. 8, $M \times M$ deep autoencoders are used to reduce the frequency information from $K_s$ dimensions to 1 dimension, yielding the low-dimensional representation (compressed sample) $p_{cij}$ of the input information; finally, the $M \times M$ low-dimensional data matrix $R_{M \times M}$ is output. In total, $N_T$ low-dimensional data matrices are output, and a training set is constructed from all of them, where
$$R_{M \times M} = \begin{bmatrix} p_{c11} & \cdots & p_{c1M} \\ \vdots & \ddots & \vdots \\ p_{cM1} & \cdots & p_{cMM} \end{bmatrix}$$
Further, in one embodiment, to facilitate visual traffic monitoring, the low-dimensional data matrix $R_{M \times M}$ is plotted as a heat map.
S500: training a generative adversarial network (GAN) on the training set to obtain an anomaly detection model, the anomaly detection model including a loss function;
the generative adversarial network (GAN) consists of two adversarial modules, a generator G and a discriminator D. The generator G can be seen as a mapping G(z) of samples z into the matrix-space manifold $\chi$; it learns the distribution $P_g$ over the normal data x from uniformly distributed one-dimensional input noise vectors z sampled from the latent space Z. In this case, the network architecture of the generator G is equivalent to a convolutional decoder built from a stack of strided convolutions. The discriminator D is a standard convolutional neural network that maps a data matrix to a single scalar value D(·); the output D(·) can be interpreted as the probability that the given input is a real image x sampled from the training data $\chi$ rather than a G(z) generated by the generator G. D and G are optimized through the minimax objective $\min_G \max_D V(G, D)$.
The known training set contains $N_T$ data matrices of size $M \times M$, denoted $x_m$ with $m = 1, 2, \ldots, N_T$. From each matrix, K sub-matrices $x_{k,m} \in \chi$ of size $c \times c$, with $k = 1, 2, \ldots, K$, are sampled at random positions. During training, the GAN is trained in an unsupervised manner using only normal data, so that it learns the manifold of normal data.
The discriminator is trained to label real samples as "true" and samples drawn from $P_g$ as "false" as accurately as possible. The generator G is trained to fool D by minimizing $V(G) = \log[1 - D(G(z))]$, which is equivalent to maximizing $V(G) = D(G(z))$. During adversarial training, the generator gradually improves its ability to generate realistic matrices, the discriminator improves its ability to distinguish real matrices from generated ones, and training finally yields the anomaly detection model.
The invention detects anomalies in the frequency domain features of the traffic matrix using an unsupervised anomaly detection model based on a generative adversarial network; exploiting the characteristics of the GAN, the generator and the discriminator are trained simultaneously, with the discriminator learning to distinguish generated data from real data.
When adversarial training is complete, the generator has learned the mapping from the latent space representation z to a real (normal) image x (POP flow matrix):
$$G(z) = z \mapsto x$$
Because the latent space is smooth and continuous, sampling from two nearby points in the latent space should, in theory, generate two visually similar images. Given an input image x (the POP flow matrix to be detected), the goal is to find a point z in the latent space that corresponds to an image G(z) which is visually most similar to the query image x and which lies on the manifold $\chi$. The degree of similarity between x and G(z) depends on how well the given input image follows the data distribution $p_g$ used to train the generator. To find the best z, a point $z_1$ is first sampled at random from the latent space distribution Z and input into the trained generator to obtain the generated image $G(z_1)$. A loss function defined on $G(z_1)$ provides the gradients for updating the coefficients of $z_1$, which moves the position in the latent space to a new point $z_2$; this is repeated, optimizing the position of z in the latent space Z through multiple backpropagation steps, until the $\gamma$-th iteration yields the most similar image $G(z_\gamma)$.
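The iterative search in the latent space can be illustrated on a toy problem. The sketch below replaces the trained convolutional generator with a hypothetical two-parameter function, uses only the absolute residual between x and G(z) as the loss, and estimates gradients by central finite differences instead of backpropagation; all three substitutions are assumptions made to keep the example self-contained.

```python
def generator(z):
    """Toy stand-in for a trained generator: maps a 2-D latent point to a
    flattened symmetric 2 x 2 matrix. Not the patent's network."""
    return [z[0], z[1], z[1], z[0]]


def residual_loss(x, z):
    """Sum of absolute differences between the query x and G(z)."""
    return sum(abs(a - b) for a, b in zip(x, generator(z)))


def search_latent(x, z_start, steps, lr=0.05, eps=1e-4):
    """Move z for `steps` iterations along the negative loss gradient,
    estimated here with central finite differences."""
    z = list(z_start)
    for _ in range(steps):
        grad = []
        for d in range(len(z)):
            z_hi = z[:d] + [z[d] + eps] + z[d + 1:]
            z_lo = z[:d] + [z[d] - eps] + z[d + 1:]
            grad.append((residual_loss(x, z_hi) - residual_loss(x, z_lo))
                        / (2 * eps))
        z = [value - lr * g for value, g in zip(z, grad)]
    return z


x_query = [0.8, 0.2, 0.2, 0.8]     # matrix to be explained by the generator
z_1 = [0.0, 0.0]                   # starting point in the latent space
z_gamma = search_latent(x_query, z_1, steps=50)
print(residual_loss(x_query, z_1), residual_loss(x_query, z_gamma))
```

A query that lies on the generator's manifold, as here, ends with a small residual; a query far from anything the generator can produce would keep a large residual, which is what the anomaly score exploits.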
The anomaly detection model includes a loss function comprising a reconstruction error loss and a discrimination loss;
the reconstruction error loss measures the difference between the input POP flow matrix x to be detected and the matrix $G(z_\gamma)$ generated in the matrix space, and is expressed by the following formula:
$$L_R(z_\gamma) = \sum \left| x - G(z_\gamma) \right|$$
where $L_R(z_\gamma)$ is the reconstruction error loss; $x$ is the POP flow matrix to be detected; $G(z_\gamma)$ is the matrix generated in the matrix space;
the discrimination loss is expressed by the following formula:
$$L_D(z_\gamma) = \sum \left| f(x) - f(G(z_\gamma)) \right|$$
where $L_D(z_\gamma)$ is the discrimination loss; $f(\cdot)$ is the output of an intermediate layer of the discriminator, used to characterize the statistics of a matrix; $x$ is the POP flow matrix to be detected; $G(z_\gamma)$ is the matrix generated in the matrix space;
the loss function is defined as a weighted sum of the two components, the total loss function being represented by the following formula:
$$L(z_\gamma) = (1-\lambda)\,L_R(z_\gamma) + \lambda\,L_D(z_\gamma)$$
where $L(z_\gamma)$ is the total loss; $L_D(z_\gamma)$ is the discrimination loss; $L_R(z_\gamma)$ is the reconstruction error loss; $\lambda$ is a weighting coefficient that sets the relative weight of the two components.
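The weighted loss can be sketched as follows. The intermediate discriminator features f(·) are replaced here by row sums of the matrix, a purely hypothetical stand-in chosen so that the example runs without a trained discriminator; the value of the weighting coefficient is likewise only illustrative.

```python
def residual_loss(x, g):
    """L_R: summed absolute difference between two matrices."""
    return sum(abs(a - b)
               for row_x, row_g in zip(x, g)
               for a, b in zip(row_x, row_g))


def toy_features(matrix):
    """Hypothetical stand-in for the discriminator's intermediate layer."""
    return [sum(row) for row in matrix]


def discrimination_loss(x, g):
    """L_D: summed absolute difference between feature vectors."""
    return sum(abs(a - b)
               for a, b in zip(toy_features(x), toy_features(g)))


def total_loss(x, g, lam):
    """Weighted sum (1 - lam) * L_R + lam * L_D."""
    return (1 - lam) * residual_loss(x, g) + lam * discrimination_loss(x, g)


x = [[1.0, 0.0], [0.0, 1.0]]       # matrix to be detected
g = [[0.5, 0.0], [0.0, 1.0]]       # matrix produced by the generator
print(residual_loss(x, g), discrimination_loss(x, g), total_loss(x, g, 0.1))
```

The same weighted sum, evaluated at the final latent point, yields the anomaly score used in the detection step.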
S600: performing anomaly detection based on the anomaly detection model;
when the new data (POP flow matrix x to be detected) is subjected to abnormity identification, the POP flow matrix to be detected is subjected to frequency domain analysis and data dimension reduction processing to obtain a low-dimensional data matrix, and the low-dimensional data matrix is input into an abnormity detection model for abnormity detection;
the abnormal detection comprises evaluating a POP flow matrix x to be detected, judging whether the POP flow matrix x is a normal image or an abnormal image, and specifically comprises the following steps:
the loss function of the anomaly detection model is used for mapping the POP flow matrix x to be detected to a hidden space, and the loss function evaluates the generated matrix G (z) when updating iteration gamma timesγ) Degree of similarity of matching with the images (training set) input in the confrontation training; therefore, the abnormal score a (x) can be directly obtained from the mapping loss function, and the score represents the fitting degree of the POP traffic matrix x to be detected and the normal matrix model:
A(x)=(1-λ)R(x)+λD(x)
where A(x) is the anomaly score; R(x) is the reconstruction error score; D(x) is the discrimination score; λ is a weighting coefficient representing the weights of R(x) and D(x). For abnormal matrix data the anomaly detection model produces a larger anomaly score, whereas a smaller anomaly score means that a very similar matrix was seen during training. The anomaly score can be judged directly once an adaptive threshold has been set.
Whether the internal network is abnormal is judged from the adaptive discrimination threshold T and the anomaly score of the POP flow matrix to be detected: if the anomaly score of the POP flow matrix to be detected is larger than the adaptive discrimination threshold T, the internal network is judged to be abnormal. The judgment is made as follows:
Figure BDA0003603004330000101
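The scoring and decision rule described above can be sketched as follows. The function names are hypothetical, and the strict "larger than" comparison follows the wording of the text rather than the formula image, which is not reproduced here.

```python
def anomaly_score(r_x, d_x, lam):
    # A(x) = (1 - lambda) * R(x) + lambda * D(x)
    return (1.0 - lam) * r_x + lam * d_x

def is_abnormal(score, threshold):
    # Abnormal only when the score is strictly larger than the threshold T.
    return score > threshold

score = anomaly_score(r_x=3.0, d_x=1.0, lam=0.5)
print(score, is_abnormal(score, threshold=1.5))  # 2.0 True
```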
the invention obtains the adaptive discrimination threshold T from the energy of the sample data and the proportion ρ of abnormal data, specifically as follows:
for N pieces of flow data, let ω_T be the detection step length; after DFT processing, N_T POP flow matrix samples are obtained. The anomaly score of each sample is calculated by the unsupervised anomaly detection model based on the generative adversarial network, all samples are sorted in ascending order of anomaly score, and the adaptive discrimination threshold T is expressed by the following formula:
Figure BDA0003603004330000102
where T is the adaptive discrimination threshold; A(i) is the anomaly score of the i-th POP flow matrix sample after sorting in ascending order; the value of i is the number of POP flow matrix samples multiplied by (1 − ρ), rounded down to an integer; ρ is the anomaly ratio of the flow data, i.e., the proportion of abnormal data. The ρ value of a data set cannot be known in advance in practice, so a suitable ρ value is determined by a binary (half-interval) search; N_T is the number of POP flow matrix samples.
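A minimal sketch of the threshold selection follows. It assumes that the formula amounts to T = A(i) with i = floor(N_T × (1 − ρ)) counted from 1, which is how the variable definitions read; the formula image itself is not reproduced here, so this reading is an assumption, and the function name is hypothetical. The search for ρ is left out because its stopping criterion is not given in the text.

```python
import math

def adaptive_threshold(scores, rho):
    # scores: anomaly scores of the N_T POP flow matrix samples (any order)
    # rho: assumed proportion of abnormal data, 0 <= rho < 1
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    ordered = sorted(scores)                  # ascending order
    n_t = len(ordered)
    i = math.floor(n_t * (1.0 - rho))         # 1-based index, rounded down
    i = max(1, min(i, n_t))                   # clamp to a valid position
    return ordered[i - 1]

scores = [0.9, 0.1, 0.4, 0.2, 0.3, 0.8, 0.5, 0.7]
print(adaptive_threshold(scores, rho=0.25))   # 0.7
```

Here N_T = 8 and ρ = 0.25 give i = 6, so the threshold is the sixth smallest score and the two samples scoring above it would be flagged.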
The invention uses an autoencoder for data dimension reduction, converting M×M×K_s-dimensional information into M×M-dimensional information; this is roughly analogous to reducing a hyperspectral image to a 2-dimensional planar image. Anomaly detection is then performed directly on the numeric (pixel) matrix with the GAN. In other words, the data in the internal network anomaly detection process is in matrix form: the GAN is applied to the matrix data directly, no value-to-pixel conversion is needed, no matrix is converted into an image, and anomaly detection is performed by the GAN directly on the flow data matrix.
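The shape change described above can be illustrated with a stand-in encoder. This sketch is not the patent's autoencoder: it assumes each flow's K_s frequency features are encoded separately into one value, and it replaces the trained encoder with a single linear layer plus tanh whose weights are hypothetical.

```python
import numpy as np

def encode_to_matrix(features, weights, bias=0.0):
    # features: array of shape (M, M, K_s), one K_s-vector per node pair
    # weights:  array of shape (K_s,), stand-in for trained encoder weights
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if features.ndim != 3 or features.shape[0] != features.shape[1]:
        raise ValueError("features must have shape (M, M, K_s)")
    if weights.shape != (features.shape[2],):
        raise ValueError("weights must have shape (K_s,)")
    # Matrix product over the last axis collapses K_s, leaving (M, M).
    return np.tanh(features @ weights + bias)

rng = np.random.default_rng(0)
features = rng.random((4, 4, 16))         # M = 4, K_s = 16, toy data
weights = rng.normal(size=16)
print(encode_to_matrix(features, weights).shape)  # (4, 4)
```

The resulting M×M array is what would be passed to the GAN directly as a matrix, with no image conversion step.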
Example two
As shown in fig. 9, an embodiment of the present invention provides an internal network anomaly detection system based on a POP traffic matrix, which includes a POP traffic matrix construction module, an encoding module, a frequency domain feature extraction module, a data dimension reduction module, an anomaly detection model generation module, and an anomaly detection module;
the POP flow matrix construction module is used for acquiring flow data among the nodes to construct a POP flow matrix;
the encoding module is used for encoding each flow data in the POP flow matrix into a vector;
the frequency domain feature extraction module is used for extracting the frequency domain features of the flow data based on the vectors;
the data dimension reduction module is used for performing data dimension reduction on the frequency domain characteristics of the flow data to obtain a low-dimensional data matrix; constructing a training set based on the low-dimensional data matrix;
the anomaly detection model generation module is used for training a generative adversarial network on the training set to generate an anomaly detection model;
and the anomaly detection module is used for carrying out anomaly detection based on the anomaly detection model.
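As a rough illustration of the frequency domain feature extraction module, the sketch below applies a DFT to one flow vector and normalizes the result with Min-Max scaling. Using the magnitude spectrum as the feature, and the function names themselves, are assumptions made for this example; the text only states that DFT processing and Min-Max normalization are used.

```python
import numpy as np

def frequency_features(vector):
    # Magnitude of the discrete Fourier transform of one flow vector.
    return np.abs(np.fft.fft(np.asarray(vector, dtype=float)))

def min_max_normalize(features):
    # Scale features linearly into [0, 1]; a constant input maps to zeros.
    features = np.asarray(features, dtype=float)
    lo, hi = features.min(), features.max()
    if hi == lo:
        return np.zeros_like(features)
    return (features - lo) / (hi - lo)

flow_vector = [1.0, 1.0, 1.0, 1.0]        # toy constant traffic
spectrum = frequency_features(flow_vector)
print(min_max_normalize(spectrum))
```

A constant flow vector has all of its energy in the zero-frequency component, so only the first normalized feature is non-zero.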
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An internal network anomaly detection method based on a POP flow matrix, characterized by comprising the following steps:
S100: acquiring flow data among nodes to construct a POP flow matrix;
S200: encoding each flow data in the POP flow matrix into a vector;
S300: extracting frequency domain features of each flow data based on the vectors;
S400: performing data dimension reduction on the frequency domain features of the flow data to obtain a low-dimensional data matrix; constructing a training set based on the low-dimensional data matrix;
S500: training a generative adversarial network on the training set to generate an anomaly detection model, the anomaly detection model including a loss function;
S600: performing anomaly detection based on the anomaly detection model.
2. The internal network anomaly detection method according to claim 1, wherein said step S200 comprises: encoding all flow data detected within the detection step length and converting it into a vector representation.
3. The internal network anomaly detection method according to claim 1, wherein said step S300 further comprises: normalizing the frequency domain features with the Min-Max algorithm.
4. The internal network anomaly detection method according to claim 1, wherein said step S400 comprises: inputting the frequency domain features of each flow data into a deep autoencoder to perform data dimension reduction.
5. The internal network anomaly detection method according to claim 1, wherein said step S400 further comprises: plotting the low-dimensional data matrix as a heat map.
6. The method according to claim 1, wherein the loss function in step S500 is expressed by the following formula:
L(z_γ) = (1 − λ) L_R(z_γ) + λ L_D(z_γ)
where L(z_γ) is the total loss; L_D(z_γ) is the discrimination loss; L_R(z_γ) is the reconstruction error loss; λ is a weighting coefficient.
7. The method according to claim 1, wherein the step S600 comprises:
obtaining an anomaly score of the POP flow matrix to be detected based on the loss function; and judging whether the internal network is abnormal based on the adaptive discrimination threshold and the anomaly score of the POP flow matrix to be detected, wherein the internal network is judged to be abnormal if the anomaly score of the POP flow matrix to be detected is larger than the adaptive discrimination threshold.
8. The internal network anomaly detection method according to claim 7, characterized in that said anomaly score is expressed by the following formula:
A(x)=(1-λ)R(x)+λD(x)
where A(x) is the anomaly score; R(x) is the reconstruction error score; D(x) is the discrimination score; λ is a weighting coefficient.
9. The internal network anomaly detection method according to claim 7, characterized in that said adaptive discrimination threshold is expressed by the following formula:
Figure FDA0003603004320000021
where T is the adaptive discrimination threshold; A(i) is the anomaly score of the i-th POP flow matrix sample after sorting in ascending order; the value of i is the number of POP flow matrix samples multiplied by (1 − ρ); ρ is the anomaly ratio of the flow data, i.e., the proportion of abnormal data; N_T is the number of POP flow matrix samples.
10. An internal network anomaly detection system based on a POP flow matrix, characterized by comprising a POP flow matrix construction module, an encoding module, a frequency domain feature extraction module, a data dimension reduction module, an anomaly detection model generation module and an anomaly detection module;
the POP flow matrix construction module is used for acquiring flow data among the nodes to construct a POP flow matrix;
the encoding module is used for encoding each flow data in the POP flow matrix into a vector;
the frequency domain feature extraction module is used for extracting the frequency domain features of the flow data based on the vectors;
the data dimension reduction module is used for performing data dimension reduction on the frequency domain characteristics of the flow data to obtain a low-dimensional data matrix; constructing a training set based on the low-dimensional data matrix;
the anomaly detection model generation module is used for training a generative adversarial network on the training set to generate an anomaly detection model, and the anomaly detection model comprises a loss function;
the anomaly detection module is used for carrying out anomaly detection based on the anomaly detection model.
CN202210412983.4A 2022-04-19 2022-04-19 Internal network anomaly detection method and system based on POP flow matrix Active CN114745187B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210412983.4A CN114745187B (en) 2022-04-19 2022-04-19 Internal network anomaly detection method and system based on POP flow matrix

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210412983.4A CN114745187B (en) 2022-04-19 2022-04-19 Internal network anomaly detection method and system based on POP flow matrix

Publications (2)

Publication Number Publication Date
CN114745187A true CN114745187A (en) 2022-07-12
CN114745187B CN114745187B (en) 2022-11-01

Family

ID=82283996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210412983.4A Active CN114745187B (en) 2022-04-19 2022-04-19 Internal network anomaly detection method and system based on POP flow matrix

Country Status (1)

Country Link
CN (1) CN114745187B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190124045A1 (en) * 2017-10-24 2019-04-25 Nec Laboratories America, Inc. Density estimation network for unsupervised anomaly detection
CN109948117A (en) * 2019-03-13 2019-06-28 南京航空航天大学 A kind of satellite method for detecting abnormality fighting network self-encoding encoder
CN111222638A (en) * 2019-11-21 2020-06-02 湖南大学 Network anomaly detection method and device based on neural network
CN112597831A (en) * 2021-02-22 2021-04-02 杭州安脉盛智能技术有限公司 Signal abnormity detection method based on variational self-encoder and countermeasure network
CN113747441A (en) * 2021-08-03 2021-12-03 西安交通大学 Mobile network flow abnormity detection method and system based on feature dimension reduction

Also Published As

Publication number Publication date
CN114745187B (en) 2022-11-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant