CN115694947B

CN115694947B - Network encryption traffic threat sample generation mechanism method based on countermeasure generation DQN

Info

Publication number: CN115694947B
Application number: CN202211316059.2A
Authority: CN
Inventors: 杨进; 梁炜恒; 梁刚; 朱云飞; 陈晨; 李果
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2024-04-16
Anticipated expiration: 2042-10-26
Also published as: CN115694947A

Abstract

The invention discloses a network encrypted traffic threat sample generation mechanism method based on the countermeasure generation DQN. Data preprocessing converts the data in the original network encrypted traffic data packets into the data format required by the countermeasure generation DQN threat sample generation model, and the resulting data is input into a category labeling processing program for category labeling. The method decomposes network encrypted traffic threat sample generation into a series of modules, the final step of which outputs the generated network encrypted traffic threat sample data. The improved sample generation module selects the next action according to information about the initial candidate environment, selects a candidate environment similar to the original network encrypted traffic data according to that action, and stores the action experience in an experience space for training and learning of the neural network; these steps are repeated until the improved sample generation module can generate network encrypted traffic threat sample data similar to the original network encrypted traffic data.

Description

Network encryption traffic threat sample generation mechanism method based on countermeasure generation DQN

Technical Field

The invention relates to the fields of network security technology and the like, in particular to a network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN.

Background

With the continuous development of internet technology and growing application demands, many internet applications encrypt their network communication traffic during data transmission. After encryption, much of the plaintext information in network traffic data packets becomes invisible, and traditional network traffic identification methods are no longer as accurate as they once were. As encryption methods continue to improve and develop, they pose a great challenge to network traffic identification, and deep learning, with its strong learning capability and wide applicability, has been rapidly applied in the network traffic identification field.

The multi-layer perceptron is a neural network, which consists of an input layer, an output layer and a plurality of hidden layer neurons, each layer is provided with a plurality of neurons densely connected to the adjacent layers, and the network structure of the model is shown in figure 1. The neurons take a weighted sum of their input data and produce an output by a nonlinear activation function. Because of the large number of parameters that need to be learned by the multi-layer perceptron model, the model is often very complex, inefficient, and difficult to train against complex problems. The current network traffic identification field does not use a deep multi-layer perceptron alone any more, and only a few layers of fully connected neurons are used as a small part of other models. For data identifying threat features in network encrypted traffic, the multi-layer perceptron model requires a large amount of network encrypted traffic threat sample data due to its complexity.

Like the multi-layer perceptron, a convolutional neural network is composed of several layers of neurons with learnable parameters. The multi-layer perceptron does not handle high-dimensional inputs well, because they lead to a large number of parameters in the hidden layers; the convolutional neural network addresses this problem with convolutional layers. A convolutional layer uses a set of convolution kernels with a small number of parameters, and the same set of kernels is applied across the whole input to produce the output of the next layer. Sharing one set of kernels within a layer significantly reduces the number of parameters and also helps the model achieve translation invariance. A pooling layer is used for downsampling after one or more convolutional layers, and a fully connected layer is attached after the last hidden layer.

A recurrent neural network is a neural network that contains a recurrent structure for storing sequence information. It is designed specifically for sequence data: its output depends not only on the most recent input but also on earlier inputs. Recurrent neural networks have been successfully applied to speech recognition, time series prediction, translation, and language modeling. A challenge for conventional recurrent neural networks is that vanishing and exploding gradients make it hard to learn dependencies between inputs that are far apart; gated variants such as long short-term memory networks address this problem by adding a set of gates that control when information is stored or deleted.

The autoencoder is also a neural network model, whose hidden layers are significantly smaller than its input and output layers; the network structure of the model is shown in Fig. 2. The internal encoded representation of the autoencoder can be used for data compression or dimensionality reduction. Multi-layer perceptrons, convolutional neural networks and recurrent neural networks can all be used as part of an autoencoder model, and autoencoders are widely used to initialize the weights of deep neural networks. Autoencoders have several variants, such as the denoising autoencoder, which is trained to output the complete input sample from incomplete sample data, forcing the model to be more robust, and the variational autoencoder, which is used to generate virtual data from the target data distribution. A more complex architecture is the stacked autoencoder, in which multiple autoencoders are stacked so that the output of each is the input of the next, and the whole model is trained greedily layer by layer.

Using deep learning to identify network encrypted traffic samples solves many of the problems encountered by traditional network traffic identification methods, but certain limitations remain. For example, the features in network encrypted traffic change continuously; or the network encrypted traffic data set is class-imbalanced, with several times more data in one class than in another, so that the identification model, in order to raise its recognition rate, identifies all minority-class samples as majority-class samples and therefore generalizes poorly. The real recognition rate of the identification model on network encrypted traffic threat sample data can be improved only by generating, through different methods, network encrypted traffic threat sample data that is as similar as possible to the existing network encrypted traffic data, thereby increasing the amount of network encrypted traffic threat sample data.

Disclosure of Invention

The invention aims to provide a network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, which combines the countermeasure generation neural network with the DQN, fully utilizes the advantages of the countermeasure generation neural network in the aspect of generating a small sample data set and combines the advantages of the DQN algorithm in action decision optimization.

The invention is realized by the following technical scheme: a network encrypted traffic threat sample generation mechanism method based on the countermeasure generation DQN, comprising the following steps:

1) Converting, through data preprocessing, the data in the original network encrypted traffic data packets into the data format required by the countermeasure generation DQN threat sample generation model;

2) Inputting the data obtained in the step 1) into a category labeling processing program to carry out category labeling;

3) After the step 2), using a sample generation module as a tool for generating network encrypted traffic threat sample data, and generating the network encrypted traffic threat sample data based on the current small sample network encrypted traffic data;

4) The sample generation module mainly comprises a data generation sub-module and a resolution sub-module, wherein the data generation sub-module generates network encryption traffic threat sample data by utilizing different noise parameters, the generated network encryption traffic threat sample data and original network encryption traffic data are sent to the resolution sub-module together, the resolution sub-module performs feature extraction on the input data, and judges whether the input data is the original network encryption traffic data or the generated network encryption traffic threat sample data, and meanwhile the resolution sub-module and the data generation sub-module in the sample generation module are trained until the loss functions of the two sub-modules are approaching to stability;

5) The original network encrypted traffic data and the generated network encrypted traffic threat sample data are input together into an improved sample generation module based on the countermeasure generation DQN for training and learning; the module interacts with the environment through a mechanism of continuous exploration and recognition, and generates network encrypted traffic threat sample data that is as similar as possible by maximizing the final expected reward value;

6) Judging whether the end condition for training and learning of the countermeasure generation DQN threat sample generation model is met; if not, the next round of training and learning starts again from step 3), and training is repeated multiple times to keep generating different network encrypted traffic threat sample data; if so, the process ends and the finally generated network encrypted traffic threat sample data is output.

Further, in order to better realize the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following setting mode is adopted: generating network encrypted traffic threat sample data based on the current small sample network encrypted traffic data specifically comprises: dynamically generating network encrypted traffic threat sample data using the information in existing network encrypted traffic packets, including the network protocol (e.g., UDP, TCP, FTP or HTTP), the application (e.g., Tencent QQ, WeChat or a browser), the traffic type (e.g., browsing video, downloading files or chatting), the website interacted with, the user behavior (e.g., submitting a form request or sending a message), the operating system and the browser.

Further, in order to better realize the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following setting mode is adopted: when network encrypted traffic threat sample data is generated based on the current small sample network encrypted traffic data, the input further comprises the following features: the header, time series, payload and statistical features of the data packets. Because time-series features are hardly affected by traffic encryption, they are widely applied to network encrypted traffic data generation; in network encrypted traffic data, the first few packets, which contain protocol handshake information, are typically not encrypted and are also applied to network encrypted traffic data generation; the number of statistical features and the input dimensions are limited, and different statistical features are selected according to the type of network encrypted traffic data to be generated.
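
As an illustration of the time-series and statistical features described above, the following minimal sketch computes a few per-flow features from packet sizes and timestamps. The `Packet` fields, the choice of features and the `first_n` parameter are assumptions made for illustration; the patent does not specify them.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Packet:
    timestamp: float  # seconds since the flow started (hypothetical field)
    size: int         # packet length in bytes (hypothetical field)
    outbound: bool    # True for client-to-server packets (hypothetical field)


def flow_features(packets, first_n=3):
    """Return simple time-series and statistical features for one flow."""
    sizes = [p.size for p in packets]
    # Inter-arrival gaps survive encryption because they do not depend on payload bytes.
    gaps = [b.timestamp - a.timestamp for a, b in zip(packets, packets[1:])]
    return {
        "first_sizes": sizes[:first_n],   # sizes of the first few (handshake) packets
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "outbound_ratio": sum(p.outbound for p in packets) / len(packets),
    }


flow = [
    Packet(0.00, 120, True),
    Packet(0.05, 1500, False),
    Packet(0.15, 60, True),
    Packet(0.30, 1500, False),
]
features = flow_features(flow)
```

Which statistical features to keep would depend on the type of traffic being generated, as the paragraph above notes.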

Further, in order to better realize the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following setting mode is adopted: the starting position of the data generation sub-module is connected with a random noise generator, the random noise generator is used for generating a group of random noise parameters, generating network encryption traffic threat sample data together with input data, and then the data generation sub-module updates parameters of each neuron in the neural network according to the noise parameters and the judgment result output by the resolution sub-module to generate the network encryption traffic threat sample data.
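
A minimal sketch of the random noise generator feeding the data generation sub-module is given below, using NumPy. The two-layer network, the layer sizes and the names `sample_noise` and `DataGenerator` are assumptions for illustration only; the sketch shows the forward pass and omits the parameter update driven by the resolution sub-module.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_noise(batch, noise_dim):
    """Random noise generator: one noise vector per sample to be generated."""
    return rng.standard_normal((batch, noise_dim))


class DataGenerator:
    """Hypothetical two-layer generator: (noise, input features) -> generated features."""

    def __init__(self, noise_dim, cond_dim, out_dim, hidden=16):
        self.w1 = rng.standard_normal((noise_dim + cond_dim, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, out_dim)) * 0.1
        self.b2 = np.zeros(out_dim)

    def forward(self, noise, cond):
        # Noise parameters and input data are combined before the first layer.
        x = np.concatenate([noise, cond], axis=1)
        h = np.tanh(x @ self.w1 + self.b1)
        # Sigmoid keeps every generated feature in (0, 1), i.e. normalized features.
        return 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))


gen = DataGenerator(noise_dim=8, cond_dim=4, out_dim=6)
cond = rng.random((5, 4))                      # stand-in for preprocessed traffic features
fake_a = gen.forward(sample_noise(5, 8), cond)
fake_b = gen.forward(sample_noise(5, 8), cond)  # different noise, same input data
```

Different noise vectors yield different generated samples for the same input data, which is what lets the sub-module produce varied threat samples.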

Further, in order to better realize the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following setting mode is adopted: the resolution submodule is a network structure trained by the input original network encrypted flow data, judges whether the network encrypted flow data generated by the data generation submodule has threat features or not, and returns a judging result to the data generation submodule until loss values of the resolution submodule and the data generation submodule tend to be stable; the resolution submodule judges the input original network encryption traffic data and the generated network encryption traffic threat sample data according to the set resolution function, and the positive value indicates that the resolution submodule can distinguish the original network encryption traffic data and the generated network encryption traffic threat sample data, and the negative value indicates that the resolution submodule cannot determine whether the input data is the original network encryption traffic data or the generated network encryption traffic threat sample data.
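
The patent does not give the resolution function itself. As a stand-in, the sketch below trains a logistic-regression discriminator with binary cross-entropy to separate "original" from "generated" feature vectors; the synthetic data, the learning rate and every function name are assumptions, and the sign convention of the score is this sketch's own rather than the patent's.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def train_discriminator(real, fake, lr=0.5, steps=200):
    """Fit a linear discriminator: label 1 for original data, 0 for generated data."""
    x = np.vstack([real, fake])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    w, b, losses = np.zeros(x.shape[1]), 0.0, []
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        losses.append(bce(p, y))
        grad = p - y                        # gradient of BCE w.r.t. the logit
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, losses


rng = np.random.default_rng(1)
real = rng.normal(1.0, 0.5, size=(64, 3))   # stand-in for original traffic features
fake = rng.normal(-1.0, 0.5, size=(64, 3))  # stand-in for poorly generated samples
w, b, losses = train_discriminator(real, fake)
real_hit_rate = float(np.mean(sigmoid(real @ w + b) > 0.5))
```

In a full adversarial loop the generator would be updated from this judgment until both losses stop changing appreciably, at which point the discriminator can no longer tell the two sources apart.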

Further, in order to better realize the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following setting mode is adopted: the step 5) comprises the following specific steps:

5.1) Before training and learning, the improved sample generation module randomly selects an initial interaction environment and then selects the next action A in that environment. Following the continuous exploration and recognition mechanism of the countermeasure generation DQN threat sample generation model, with a set probability it selects, through the neural network, the action with the maximum expected reward value, and with the remaining probability it randomly selects an action in the action space;

5.2) After the action is selected, the improved sample generation module executes the action in the initial environment, and the environment returns the next environment S_N and the reward value R; the improved sample generation module then stores the current environment S, the selected action A, the reward value R and the next environment S_N in an experience space as one experience;

5.3) The improved sample generation module changes the current environment to the next environment S_N and repeats step 5.2) until the experience space reaches a threshold;

5.4) After the experience space reaches the threshold, the neural network in the improved sample generation module starts to update: samples of a fixed size are randomly selected from the experience space, the reward value R and the next environment S_N are fed into the neural network to calculate the expected reward value y, the result is fed into the Q network to calculate the loss value, and the Q network is updated according to the loss value;

5.5) The improved sample generation module continues to interact with the environment, generating the four values of the current environment S, the selected action A, the reward value R and the next environment S_N as experiences and placing them in the experience space; step 5.4) is then repeated, with samples repeatedly selected at random to update the Q network, until the loss value of the Q network tends to be stable;

5.6) After the parameters of the Q network in the improved sample generation module have been updated a certain number of times, the parameters of the countermeasure generation neural network in the improved sample generation module are updated, and finally the generated network encrypted traffic threat sample data is output.
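
Steps 5.1 to 5.5 can be sketched on a toy problem as follows. Everything concrete here is an assumption for illustration: the five-position environment, the reward, the discount 0.9, and the use of a per-(environment, action) weight array in place of a real Q network. Step 5.6 (updating the countermeasure generation network) is not shown.

```python
import random
from collections import deque

import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9


def env_step(state, action):
    """Toy environment: action 1 moves right, action 0 moves left; reward 1 at the last state."""
    nxt = min(N_STATES - 1, max(0, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, threshold=32, batch=16, lr=0.1, seed=0):
    rng = random.Random(seed)
    q_net = np.zeros((N_STATES, N_ACTIONS))  # stand-in "Q network": one weight per (S, A)
    experience = deque(maxlen=1000)          # the experience space
    greedy_prob = 0.1                        # chance of trusting the Q network
    for _ in range(episodes):
        s = rng.randrange(N_STATES - 1)      # 5.1: random initial environment
        for _ in range(50):
            if rng.random() < greedy_prob:   # 5.1: action with maximum expected reward
                a = int(np.argmax(q_net[s]))
            else:                            # 5.1: otherwise a random action
                a = rng.randrange(N_ACTIONS)
            s_n, r, done = env_step(s, a)    # 5.2: environment returns S_N and R
            experience.append((s, a, r, s_n, done))
            s = s_n                          # 5.3: move to the next environment
            if len(experience) >= threshold: # 5.4: update once enough experience exists
                for es, ea, er, es_n, ed in rng.sample(list(experience), batch):
                    y = er if ed else er + GAMMA * q_net[es_n].max()
                    q_net[es, ea] += lr * (y - q_net[es, ea])  # step on 0.5 * (y - Q)^2
            if done:
                break
        greedy_prob = min(0.95, greedy_prob + 0.005)  # trust the Q network more over time
    return q_net


q_values = train()
policy = q_values.argmax(axis=1)  # learned action for each environment
```

In this toy setting the learned policy moves right from every non-terminal position, which is the shortest route to the reward.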

Further, in order to better realize the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following setting mode is adopted: the probability of selecting the action with the maximum expected reward value is a floating point number between 0 and 100, and it is incremented with the number of training steps, i.e. the improved sample generation module trusts the selection of the Q network more and more.
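
A small sketch of such an incrementing probability, expressed on the 0 to 100 scale mentioned above, might look like this. The starting value, the increment per step and the function names are assumptions; the patent does not state them.

```python
import random


def greedy_percent(step, start=10.0, increment=0.01, ceiling=100.0):
    """Percent chance (0 to 100) of trusting the Q network at a given training step."""
    return min(ceiling, start + increment * step)


def select_action(q_values, percent, rng):
    """Pick the highest-valued action with the given percent chance, else a random one."""
    if rng.random() * 100.0 < percent:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))


rng = random.Random(0)
early = greedy_percent(0)       # mostly exploring
late = greedy_percent(10**6)    # capped at the ceiling
action = select_action([0.1, 0.9, 0.3], late, rng)
```

Starting low keeps early training exploratory; the cap prevents the value from leaving the stated 0 to 100 range.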

Further, in order to better realize the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following setting mode is adopted: when the loss value of the Q network is calculated, the loss value is realized through the following formula:

Where Loss represents the loss value of the Q network, Q(S) represents the reward value returned by the Q network for the current environment S according to the action, and y represents the expected reward value calculated by the neural network from the input reward value R and the next environment S_N.
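
The formula itself is not present in this text. One form consistent with the symbol definitions, and standard for DQN, is the squared error between y and Q(S); this is offered as an assumption, and the expression in the granted patent may differ (for example by averaging over the sampled batch). The discount coefficient γ and the maximization over actions a' are likewise assumptions taken from the usual DQN target.

```latex
\mathrm{Loss} = \bigl(y - Q(S)\bigr)^{2},
\qquad
y = R + \gamma \max_{a'} Q(S_{N}, a')
```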

Compared with the prior art, the invention has the following advantages:

The invention realizes the generation of network encryption traffic threat sample data based on the countermeasure generation DQN model, combines deep learning with reinforcement learning, solves the input problem of high-dimensional data in the traditional reinforcement learning, decomposes the network encryption traffic threat sample generation method into a series of modules, and finally outputs the generated network encryption traffic threat sample data; the improved sample generation module selects a corresponding next action according to the information related to the initial candidate environment, then selects a candidate environment similar to the original network encryption traffic data according to the action, stores action experience into an experience space to perform training learning of the neural network, and repeats the steps until the improved sample generation module can generate network encryption traffic threat sample data similar to the original network encryption traffic data. According to the invention, the network encryption traffic threat sample data which is similar as possible is generated according to the existing network encryption traffic data, and the number of the network encryption traffic threat sample data is increased, so that the real recognition rate of the network encryption traffic threat sample data by the network encryption traffic recognition model is improved.

The invention combines the countermeasure generation neural network with the DQN, making full use of the advantages of the countermeasure generation neural network in generating small sample data sets together with the advantages of the DQN algorithm in action decision optimization, so as to generate network encrypted traffic threat sample data similar to the original network encrypted traffic data; this solves the problem of the low recognition rate of small sample traffic data in unbalanced network encrypted traffic data sets.

Drawings

Fig. 1 is a diagram of a multi-layer perceptron model.

Fig. 2 is a diagram of an automatic encoder model.

Fig. 3 is a diagram of a Markov decision process.

Fig. 4 is an overall flowchart of the algorithm.

Detailed Description

The present invention will be described in further detail with reference to examples, but embodiments of the present invention are not limited thereto.

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. The following detailed description of the embodiments, as presented in the figures, is therefore not intended to limit the scope of the invention as claimed, but merely represents selected embodiments. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.

Explanation of terms:

DQN: DQN is an algorithm that combines deep learning with reinforcement learning. In the traditional reinforcement learning algorithm Q-learning, the storage size of the Q space (the Q table) is limited, whereas the states in practical applications are nearly infinite, so a Q space large enough to store an oversized state space cannot be constructed. A neural network, however, can take the state and the action as input and output the Q value of the action, so the Q value no longer needs to be recorded in the Q space and is instead predicted directly by the neural network.
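
To make the storage argument concrete, the sketch below compares the number of entries a Q table would need with the parameter count of a function approximator that takes the state and the action as input. The linear model, its size and the names `q_table_entries` and `LinearQ` are illustrative assumptions, not the network used by the invention.

```python
import numpy as np


def q_table_entries(n_features, levels, n_actions):
    """Entries needed when every feature can take `levels` discrete values."""
    return (levels ** n_features) * n_actions


class LinearQ:
    """Hypothetical approximator: Q(state, action) from a fixed number of weights."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.n_actions = n_actions
        self.w = rng.normal(0.0, 0.01, n_features + n_actions)
        self.b = 0.0

    @property
    def n_params(self):
        return self.w.size + 1  # weights plus one bias

    def q_value(self, state, action):
        one_hot = np.zeros(self.n_actions)
        one_hot[action] = 1.0
        # State and action are fed in together; the output is the predicted Q value.
        return float(np.concatenate([state, one_hot]) @ self.w + self.b)

    def best_action(self, state):
        return max(range(self.n_actions), key=lambda a: self.q_value(state, a))


table_size = q_table_entries(n_features=10, levels=256, n_actions=4)
net = LinearQ(n_features=10, n_actions=4)
chosen = net.best_action(np.zeros(10))
```

Ten byte-valued features already require 4 x 256^10 table entries, while the approximator's parameter count does not grow with the number of distinct states.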

The invention is based on the following theoretical basis:

Identifying threat sample data in network encrypted traffic, that is, identifying network encrypted traffic data with threat features, is very important for applications such as network quality-of-service control, network resource planning, malware detection and intrusion detection. Because of this importance, many different network encrypted traffic identification methods have been developed over the years to suit different application scenarios and changing requirements. In recent years, however, continuing advances in communication technology, including encryption and port obfuscation, have posed new challenges for identifying network encrypted traffic. Traffic identification technology has evolved significantly over time. The simplest approach identifies traffic by port number, but its accuracy keeps decreasing because emerging applications either use well-known port numbers to hide their traffic or do not use standard port numbers. Although inaccurate, port-number identification is still widely used in practice, alone or together with other identification methods. Traffic identification has also relied on payload or packet inspection, which looks for features or keywords in network traffic packets, but these methods apply only to unencrypted traffic data and have high computational overhead. A new generation of identification techniques based on traffic statistics has therefore emerged; these rely on statistical or time-series features, which lets them process both encrypted and unencrypted network traffic data, and they typically use machine learning algorithms such as random forests and K nearest neighbors. However, the performance of these identification methods depends largely on the design of the algorithm, which limits their versatility.

Because deep learning removes the need for experts to extract the threat features of network encrypted traffic, and instead extracts features automatically by training a neural network, it is a very suitable method for identifying network encrypted traffic, especially when new classes of network encrypted traffic keep appearing and old classes keep changing. Another important advantage of deep learning is that it has considerably stronger learning capacity than traditional machine learning methods and can therefore learn more complex patterns. Combining these two advantages, deep learning, as an end-to-end approach, can learn the nonlinear relationship between the original input and the corresponding output without decomposing the problem into the sub-problems of feature extraction and threat sample recognition. Deep learning, however, requires a large amount of labeled data and sufficient computing power to achieve the goal of identifying network encrypted traffic.

A generative adversarial network (the countermeasure generation network referred to in this document) is an unsupervised neural network that simultaneously trains a generative model, intended to generate simulated data from a target distribution, and a discriminative model, intended to distinguish real data from generated data; both models typically consist of neural networks. The generative model is first trained to maximize the error probability of the discriminative model; then real data and generated data are input, the generative model parameters are revised, and the discriminative model is trained to minimize its error probability. This process is repeated until the models converge. Although a generative adversarial network is difficult to train to convergence, it has been used in many applications, such as creating simulated images, generating 3D models from images, improving image quality, and generating sample data for data-sparse applications. The generative model may be used to address the problem of data set imbalance in network encrypted traffic identification, that is, the situation where the number of samples per class in the network encrypted traffic data set varies greatly, in which case deep learning often has difficulty correctly predicting the classes with little data. The most common and simplest way of handling an unbalanced data set is to oversample the minority class by copying its samples, or to undersample the majority class by deleting some of its samples. The auxiliary classifier generative adversarial network is used to generate the synthetic samples required for classification tasks; its main difference from the ordinary generative adversarial network is that it takes both random noise and class labels as input, so as to generate samples with class labels.
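
The two baseline resampling strategies mentioned above, copying minority-class samples and deleting majority-class samples, can be sketched in a few lines. The function names and the toy labels are assumptions for illustration; this is the simple baseline, not the generative approach of the invention.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Copy randomly chosen minority-class samples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_s, out_l = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_s.append(rng.choice(pool))
            out_l.append(cls)
    return out_s, out_l


def undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_s, out_l = [], []
    for cls in counts:
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for s in rng.sample(pool, target):
            out_s.append(s)
            out_l.append(cls)
    return out_s, out_l


samples = list(range(10))
labels = ["benign"] * 8 + ["threat"] * 2
over_s, over_l = oversample(samples, labels)
under_s, under_l = undersample(samples, labels)
```

Oversampling only repeats existing threat samples and undersampling discards benign ones, which is why generating new, similar threat samples is attractive for small classes.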

The Markov decision process is a model with the Markov property, in which the change to the next environment depends not only on the current environment but also on the currently selected action. A Markov decision process typically consists of a five-tuple:

(S,A,{P_sa},γ,R)

where S represents the set of environments, i.e. the set of all possible environments in a system; A represents the set of actions, i.e. the set of all actions the model may take in a system; P_sa represents the environment transition probability: moving from one environment in S to the next requires selecting an action from A, and P_sa is the probability distribution over the environments that the model transitions to after executing the selected action a ∈ A in the current environment s ∈ S; γ represents the discount coefficient: when γ=0 the model considers only the immediate reward value and ignores the long-term return, and when γ=1 future reward values and the immediate reward value are equally important; R is the reward function, which calculates a reward value from the current environment and the selected action. Initially the model randomly selects an environment s₁ and selects an action a₁ from A to execute; the model transitions to the next environment s₂ according to the probability distribution P_sa, then selects an action a₂ to execute and transitions to the next environment s₃, and so on until the end condition of model training and learning is met. The specific process is shown in Fig. 3.
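
The roles of P_sa and γ can be illustrated with a short sketch. The transition table, the environment and action names, and the reward sequence below are made-up values for illustration only.

```python
import random

# Hypothetical transition probabilities P_sa: (environment, action) -> distribution.
P_SA = {
    ("s1", "a1"): {"s2": 0.8, "s1": 0.2},
    ("s2", "a2"): {"s3": 1.0},
}


def sample_next(state, action, rng):
    """Draw the next environment from the distribution P_sa for (state, action)."""
    dist = P_SA[(state, action)]
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


def discounted_return(rewards, gamma):
    """Sum of rewards where the reward k steps ahead is weighted by gamma ** k."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rng = random.Random(0)
next_env = sample_next("s2", "a2", rng)
rewards = [1.0, 2.0, 3.0]
only_immediate = discounted_return(rewards, 0.0)   # gamma = 0: first reward only
equal_weight = discounted_return(rewards, 1.0)     # gamma = 1: plain sum
in_between = discounted_return(rewards, 0.5)
```

With γ = 0 only the first reward counts, with γ = 1 all rewards count equally, and intermediate values trade off the two.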

Example 1:

The invention designs a network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, which comprises the following steps:

1) Converting data in an original network encrypted traffic data packet (original network encrypted traffic data) into the data format required by the countermeasure generation DQN threat sample generation model through data preprocessing;

2) Inputting the data obtained in the step 1) into a category labeling processing program to carry out category labeling;

3) After the step 2), a sample generation module is used as the tool for generating network encrypted traffic threat sample data, and the network encrypted traffic threat sample data is generated based on the current small sample network encrypted traffic data, specifically: network encrypted traffic threat sample data is dynamically generated using the network protocol (such as UDP, TCP, FTP or HTTP), application program (such as Tencent QQ, WeChat or a browser), traffic type (such as browsing video, downloading files or chatting), website interacted with, user behavior (such as submitting a form request or sending a message), operating system and browser;

4) The sample generation module mainly comprises a data generation sub-module and a resolution sub-module, wherein the data generation sub-module generates network encryption traffic threat sample data by utilizing different noise parameters, the generated network encryption traffic threat sample data and original network encryption traffic data are sent to the resolution sub-module together, the resolution sub-module performs feature extraction on input data (the generated network encryption traffic threat sample data and the original network encryption traffic data) and judges whether the input data is the original network encryption traffic data or the generated network encryption traffic threat sample data, and meanwhile the resolution sub-module and the data generation sub-module in the sample generation module are trained until the loss functions of the two sub-modules approach to stability;

Example 2:

This embodiment is further optimized on the basis of the above embodiment, and features identical to the foregoing technical solution are not repeated here. To better implement the network encrypted traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following setting mode is adopted in particular: when network encrypted traffic threat sample data is generated based on the current small sample network encrypted traffic data, the input further comprises the following features: the header, time series, payload and statistical features of the data packets. Because time-series features are hardly affected by traffic encryption, they are widely applied to network encrypted traffic data generation; in network encrypted traffic data, the first few packets, which contain protocol handshake information, are typically not encrypted and are also applied to network encrypted traffic data generation; the number of statistical features and the input dimensions are limited, and different statistical features are selected according to the type of network encrypted traffic data to be generated.

Example 3:

This embodiment is further optimized on the basis of any one of the above embodiments; features identical to the foregoing technical solution are not repeated here. To better implement the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following arrangement is adopted in particular: a random noise generator is connected at the starting position of the data generation sub-module. The random noise generator generates a group of random noise parameters, which are used together with the input data to generate network encryption traffic threat sample data; the data generation sub-module then updates the parameters of each neuron in its neural network according to the noise parameters and the judgment result output by the resolution sub-module, so as to generate the network encryption traffic threat sample data.

Example 4:

This embodiment is further optimized on the basis of any one of the above embodiments; features identical to the foregoing technical solution are not repeated here. To better implement the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following arrangement is adopted in particular: the resolution submodule is a network structure trained on the input original network encrypted traffic data. It judges whether the network encrypted traffic data generated by the data generation submodule have threat features and returns the judgment result to the data generation submodule, until the loss values of the resolution submodule and the data generation submodule stabilize. The resolution submodule judges the input original network encryption traffic data and the generated network encryption traffic threat sample data according to a set resolution function: a positive value indicates that the resolution submodule can distinguish the original network encryption traffic data from the generated network encryption traffic threat sample data, and a negative value indicates that the resolution submodule cannot determine whether the input data are original network encryption traffic data or generated network encryption traffic threat sample data.

Example 5:

The embodiment is further optimized on the basis of any one of the embodiments, and the same points as the technical scheme are not repeated here, so as to further better realize the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, and particularly adopt the following setting mode: the step 5) comprises the following specific steps:

Example 6:

This embodiment is further optimized on the basis of any one of the above embodiments; features identical to the foregoing technical solution are not repeated here. To better implement the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, the following arrangement is adopted in particular: the probability with which the action having the largest expected reward value is selected is a floating-point number between 0 and 100, and it is incremented as the number of training steps increases; that is, the improved sample generation module trusts the selection of the Q network more and more.
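A minimal Python sketch of such a rising selection probability is given below. It assumes a linear increase, which the source does not specify; the function names `greedy_percent` and `select_action` and the default values of `start`, `increment` and `cap` are hypothetical.

```python
import random


def greedy_percent(step, start=10.0, increment=0.05, cap=100.0):
    """Return the greedy-selection probability, as a value from 0 to 100, for a training step.

    Assumed schedule: linear growth from `start`, clipped at `cap`.
    """
    return min(cap, start + increment * step)


def select_action(q_values, percent, rng):
    """Pick the highest-valued action with probability percent/100, otherwise a random action."""
    if rng.random() < percent / 100.0:
        # Trust the Q network: take the action with the largest expected reward value.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Explore: take any action from the action space.
    return rng.randrange(len(q_values))
```

With this schedule the module explores mostly at random early in training and relies almost entirely on the Q network once the value reaches 100.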

Example 7:

The embodiment is further optimized on the basis of any one of the embodiments, and the same points as the technical scheme are not repeated here, so as to further better realize the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, and particularly adopt the following setting mode: when the loss value of the Q network is calculated, the loss value is realized through the following formula:

Example 8:

This embodiment is further optimized on the basis of any one of the above embodiments; features identical to the foregoing technical solution are not repeated here. With reference to Fig. 4, the network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN includes the following steps:

Preparation stage of the original network encrypted traffic data: the original network encrypted traffic data are extracted from the original network encrypted traffic data packets.

Data preprocessing: the original network encrypted traffic data are converted into the data format required by the countermeasure generation DQN threat sample generation model.

Data category labeling stage: the preprocessed data are input into a category labeling processing program for category labeling.

Sample generation module processing stage: after the data categories are labeled, the sample generation module is used as the tool for generating network encryption traffic threat samples, and network encryption traffic threat sample data are generated based on the current small-sample network encryption traffic data. Usable network encryption traffic threat sample data are generated dynamically from the data packet information of the existing network encryption traffic data, such as the network protocol (e.g. UDP, TCP, FTP or HTTP), application program (e.g. Tencent QQ, WeChat or a browser), traffic type (e.g. browsing video, downloading files or chatting), interaction website, user behavior (e.g. submitting a form request or sending a message), operating system and browser. The available network encryption traffic threat sample data are input into the sample generation module. The data generation submodule of the sample generation module generates network encryption traffic threat sample data using different noise parameters, and the generated network encryption traffic threat sample data are sent to the resolution submodule together with the original network encryption traffic data. The resolution submodule performs feature extraction on the input data (the generated network encryption traffic threat sample data and the original network encryption traffic data) and judges whether the input network encryption traffic data are original network encryption traffic data or generated network encryption traffic threat sample data. The resolution submodule and the data generation submodule in the sample generation module are trained until the loss functions of the two submodules stabilize. When network encrypted traffic threat sample data are generated based on the current small-sample network encrypted traffic data, the input further includes the following input features: the header, time sequence, payload and statistical features of the data packets. Because time-sequence features are hardly affected by traffic encryption, they are widely applied to network encryption traffic data generation. In network encrypted traffic data, the first few packets, which contain protocol handshake information, are typically not encrypted and are therefore also applied to network encrypted traffic data generation. The number of statistical features and the input dimensions are limited, and different statistical features are selected according to the type of network encrypted traffic data to be generated.
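The joint training of the data generation submodule and the resolution submodule described in this stage can be sketched as a toy example in Python. This is not the patented model: it assumes one-dimensional data, a linear generator, a logistic discriminator, the standard cross-entropy losses of a generative adversarial network, and a moving-average test for "loss stabilizes". The name `train_toy_gan` and all parameters and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(v):
    # Numerically safe logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def train_toy_gan(real_samples, steps=2000, lr=0.05, tol=1e-4, window=200, seed=0):
    """Train generator g(z) = a*z + b against discriminator D(x) = sigmoid(w*x + c)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # data generation submodule parameters
    w, c = 0.1, 0.0   # resolution submodule parameters
    eps = 1e-12       # keeps log() away from zero
    history = []
    for _ in range(steps):
        x_real = rng.choice(real_samples)
        z = rng.gauss(0.0, 1.0)          # random noise parameter
        x_fake = a * z + b               # generated sample
        s_real = sigmoid(w * x_real + c)
        s_fake = sigmoid(w * x_fake + c)
        d_loss = -math.log(s_real + eps) - math.log(1.0 - s_fake + eps)
        g_loss = -math.log(s_fake + eps)
        # Gradients of d_loss with respect to w and c.
        grad_w = -(1.0 - s_real) * x_real + s_fake * x_fake
        grad_c = -(1.0 - s_real) + s_fake
        # Gradient of g_loss with respect to the generated sample,
        # computed with the discriminator values from before this step's update.
        grad_x = -(1.0 - s_fake) * w
        w -= lr * grad_w
        c -= lr * grad_c
        a -= lr * grad_x * z
        b -= lr * grad_x
        history.append((d_loss, g_loss))
        # Assumed stopping rule: both moving-average losses changed by less than tol.
        if len(history) >= 2 * window:
            recent = history[-window:]
            earlier = history[-2 * window:-window]
            drift = max(
                abs(sum(r[i] for r in recent) - sum(e[i] for e in earlier)) / window
                for i in (0, 1)
            )
            if drift < tol:
                break
    return (a, b), (w, c), history
```

In a realistic setting both submodules would be multi-layer neural networks operating on the packet feature vectors rather than on single numbers; the sketch only shows how the two losses are computed from the same pair of real and generated inputs and how training stops once both losses level off.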

Improved sample generation module processing stage: the original network encryption traffic data and the generated network encryption traffic threat sample data are input together into the improved sample generation module based on the countermeasure generation DQN for training and learning. The module interacts with the environment through a mechanism of continuous exploration and identification, and generates network encryption traffic threat sample data that are as similar as possible to the original data by maximizing the final expected reward value. The specific steps are as follows:

5.1) Before training and learning, the improved sample generation module randomly selects an initial interaction environment and then selects a next action A in that initial environment. Following the continuous exploration and identification mechanism of the countermeasure generation DQN threat sample generation model, with a set probability the next action is the action with the largest expected reward value as selected by the neural network, and with the complementary probability an action is selected at random from the action space;

5.2) After the action is selected, the improved sample generation module executes it in the initial environment; the environment then returns a next environment S_N and a reward value R, and the improved sample generation module stores the current environment S, the selected action A and the reward value R in an experience space as experience;

5.3) The improved sample generation module changes the current environment to the next environment S_N and repeats step 5.2) until the experience space reaches a threshold;

5.4) After the experience space reaches the threshold, the neural network in the improved sample generation module is updated: a fixed-size batch of samples is randomly selected from the experience space, the sampled reward value R and next environment S_N are sent to the neural network to calculate an expected reward value y, and the result is sent to the Q network to calculate a loss value, where Loss denotes the loss value of the Q network, Q(S) denotes the reward value returned by the Q network for the current environment S according to the action, and y denotes the expected reward value calculated by the neural network from the input reward value R and next environment S_N; the Q network is then updated according to the loss value;

5.5) The improved sample generation module continues to interact with the environment, producing the four values of current environment S, selected action A, reward value R and next environment S_N as experience and placing them in the experience space; step 5.4) is then repeated, with samples continually selected at random to update the Q network, until the loss value of the Q network stabilizes;

5.6) After the parameters of the Q network in the improved sample generation module have been updated a certain number of times, the parameters of the countermeasure generation neural network in the improved sample generation module are updated, and the finally generated network encryption traffic threat sample data are output;

The probability of selecting the action with the largest expected reward value is a floating-point number between 0 and 100 and is incremented as the number of training steps increases; that is, the improved sample generation module trusts the selection of the Q network more and more.
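The interaction, experience storage and Q-network update steps of this stage can be sketched in Python as follows. Several simplifications are assumptions rather than the patented method: lookup tables stand in for the Q network and for the network that computes y, the environment is a hypothetical five-state chain, the expected reward value is taken as y = R + gamma * max Q(S_N) as in standard DQN, and the loss is taken as the squared difference between y and Q(S), since the loss formula itself is not reproduced in this text. All names and hyperparameters are illustrative.

```python
import random

N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def env_step(state, action):
    # Hypothetical toy environment: move left (0) or right (1); reward 1.0 on reaching GOAL.
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def train(steps=3000, buffer_threshold=50, batch_size=16,
          gamma=0.9, lr=0.1, sync_every=20, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # stand-in for the Q network
    target = [row[:] for row in q]                    # stand-in for the network computing y
    experience, losses, updates = [], [], 0
    state = rng.randrange(N_STATES - 1)               # random initial environment
    for step in range(steps):
        # Probability (0 to 100) of trusting the Q table grows with the step count.
        greedy_percent = min(100.0, 10.0 + 0.05 * step)
        if rng.random() < greedy_percent / 100.0:
            action = max(range(N_ACTIONS), key=lambda a: q[state][a])
        else:
            action = rng.randrange(N_ACTIONS)
        nxt, reward = env_step(state, action)
        experience.append((state, action, reward, nxt))  # store (S, A, R, S_N)
        state = rng.randrange(N_STATES - 1) if nxt == GOAL else nxt
        if len(experience) < buffer_threshold:
            continue  # keep collecting until the experience space reaches the threshold
        batch = rng.sample(experience, min(batch_size, len(experience)))
        total = 0.0
        for s, a, r, s_n in batch:
            # Assumed target: reward plus discounted best value of the next environment.
            y = r if s_n == GOAL else r + gamma * max(target[s_n])
            error = y - q[s][a]
            total += error ** 2          # assumed squared-error loss
            q[s][a] += lr * error        # move Q(S, A) toward y
        losses.append(total / len(batch))
        updates += 1
        if updates % sync_every == 0:
            # After a fixed number of Q updates, refresh the second table from the Q table.
            target = [row[:] for row in q]
    return q, losses
```

Calling `q, losses = train()` returns the learned table and the per-update loss history; a flat tail in `losses` corresponds to the condition that the loss value of the Q network stabilizes.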

Judging whether the model training end condition is met: it is judged whether the training and learning end condition of the countermeasure generation DQN threat sample generation model is met. If not, the next round of training and learning starts from step 3), and training is repeated multiple times to continuously generate different network encryption traffic threat sample data; if so, the detection ends and the finally generated network encryption traffic threat sample data are output.

The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and any simple modification, equivalent variation, etc. of the above embodiment according to the technical matter of the present invention fall within the scope of the present invention.

Claims

1. A network encryption traffic threat sample generation mechanism method based on the countermeasure generation DQN, characterized by comprising the following steps:

5) The original network encryption traffic data and the generated network encryption traffic threat sample data are input together into an improved sample generation module based on the countermeasure generation DQN for training and learning; the module interacts with the environment through a mechanism of continuous exploration and identification, and network encryption traffic threat sample data that are as similar as possible are generated by maximizing the final expected reward value; this step comprises the following specific steps:

5.1) Before training and learning, the improved sample generation module randomly selects an initial interaction environment and then selects a next action A in the initial environment; following the continuous exploration and identification mechanism of the countermeasure generation DQN threat sample generation model, with a set probability the next action is the action with the maximum expected reward value as selected by the neural network, and with the complementary probability an action is selected at random from the action space; the set probability is a floating-point value ranging from 0 to 100 and increases as the number of training steps increases, that is, the improved sample generation module trusts the selection of the Q network more and more;

5.6) After the parameters of the Q network in the improved sample generation module have been updated a certain number of times, the parameters of the countermeasure generation neural network in the improved sample generation module are updated, and the finally generated network encryption traffic threat sample data are output;

2. The network encrypted traffic threat sample generation mechanism method based on countermeasure generation DQN of claim 1, wherein: generating network encrypted traffic threat sample data based on the current small sample network encrypted traffic data specifically comprises: dynamically generating network encryption traffic threat sample data by utilizing the data packet information of the existing network encryption traffic data, including network protocols, application programs, traffic types, interaction websites, user behaviors, operating systems and browsers.

3. The network encrypted traffic threat sample generation mechanism method based on countermeasure generation DQN according to claim 1 or 2, characterized in that: when the network encrypted traffic threat sample data are generated based on the current small sample network encrypted traffic data, the input further comprises the following input features: the header, time sequence, payload and statistical features of the data packets.

4. The network encrypted traffic threat sample generation mechanism method based on countermeasure generation DQN of claim 1, wherein: a random noise generator is connected at the starting position of the data generation sub-module; the random noise generator generates a group of random noise parameters, which are used together with the input data to generate network encryption traffic threat sample data; the data generation sub-module then updates the parameters of each neuron in its neural network according to the noise parameters and the judgment result output by the resolution sub-module, so as to generate the network encryption traffic threat sample data.

5. The network encrypted traffic threat sample generation mechanism method based on countermeasure generation DQN according to claim 1 or 2, characterized in that: the resolution submodule is a network structure trained on the input original network encrypted traffic data; it judges whether the network encrypted traffic data generated by the data generation submodule have threat features and returns the judgment result to the data generation submodule, until the loss values of the resolution submodule and the data generation submodule stabilize; the resolution submodule judges the input original network encryption traffic data and the generated network encryption traffic threat sample data according to a set resolution function, wherein a positive value indicates that the resolution submodule can distinguish the original network encryption traffic data from the generated network encryption traffic threat sample data, and a negative value indicates that the resolution submodule cannot determine whether the input data are original network encryption traffic data or generated network encryption traffic threat sample data.

6. The network encrypted traffic threat sample generation mechanism method based on countermeasure generation DQN of claim 1, wherein: when the loss value of the Q network is calculated, the loss value is realized through the following formula: