CN111917765A

CN111917765A - Network attack traffic generation system based on a generative adversarial network

Info

Publication number: CN111917765A
Application number: CN202010742886.2A
Authority: CN
Inventors: 杨华; 温泉; 王晓菲; 李宁; 张茜
Original assignee: Beijing Institute of Computer Technology and Applications
Current assignee: Beijing Institute of Computer Technology and Applications
Priority date: 2020-07-29
Filing date: 2020-07-29
Publication date: 2020-11-10

Abstract

The invention relates to a network attack traffic generation system based on a generative adversarial network (GAN), and belongs to the technical field of network security. By applying a generative adversarial network algorithm, the invention generates network attack traffic through training. The generated traffic simulates attack traffic in a network environment; it can be used to verify how well a security protection system handles abnormal data, and it can also serve as a source of attack traffic in a cyber range.

Description

Network attack traffic generation system based on a generative adversarial network

Technical Field

The invention belongs to the technical field of network security, and particularly relates to a network attack traffic generation system based on a generative adversarial network.

Background

In cyberspace, a cyber attack can be described as any malicious activity that attempts to compromise a network. This definition covers a very broad range of network behavior, such as attempting to undermine network stability, obtain files without authorization, or escalate access privileges. Currently, the threats to computer network security fall mainly into two categories: threats to the information in the network and threats to the equipment in the network. The security, integrity and availability of information can be protected only by ensuring physical security, network system security, data security, information content security and information infrastructure security.

To verify the security of a network environment, the cyber range (network security shooting range) is an important means of supporting cyberspace security technology verification, network tool testing, attack-defense exercises and network risk assessment. A virtualization platform is built in which physical equipment and environments, as well as computing and storage resources, can be flexibly shared, forming a simulated experimental environment for real-world cyberspace networks and equipment and improving the capacity for virtual exercise and training services. The range can also be used for adversarial exercises, hands-on teaching, tool evaluation and similar activities, so that personnel can practice in near-realistic scenarios before carrying out tasks, learn from previously accumulated technical and tactical experience, and effectively improve their ability to solve practical problems.

A key and difficult problem in cyber range construction is simulating network attack traffic that is close to that of a real network. Realistic attack traffic, on the one hand, allows the protection capability of a security protection system to be verified more accurately and, on the other hand, produces network attack events close to real ones, thereby better improving the problem-solving ability of personnel.

Disclosure of Invention

(I) Technical problem to be solved

The technical problem to be solved by the invention is as follows: how to design a network attack traffic generation system.

(II) Technical scheme

In order to solve the above technical problem, the present invention provides a network attack traffic generation system based on a generative adversarial network, comprising: a generative adversarial network (GAN), a traffic generator and a proxy system;

the GAN is used for learning the feature distribution of the target traffic and generating traffic features, and the generated traffic features are then sent to the traffic generator;

the traffic generator is used for generating a randomized packet sequence according to the traffic features, mixing the packet sequence with real attack traffic, and turning the result into simulated traffic by means of a traffic generation algorithm;

the proxy system comprises a local proxy server, which is used for morphing the simulated traffic and loading the payload into it to obtain the required final simulated attack traffic as output.

Preferably, the GAN consists of a generator and a discriminator, wherein the generator, after receiving real data samples, generates new data samples by learning the sample features through training, and the discriminator acts as a classifier that judges whether its input is real data or a generated sample;

let z be random noise, x the input sample data, G the generative model, D the discriminative model (a binary classifier), V the traffic feature (value) function, and $E_{P(x)}(\alpha)$ the expectation of α over the distribution P(x); the GAN is then expressed as:

$$\min_G \max_D V(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_z(z)}[\log(1 - D(G(z)))]$$

wherein log D(x) is the discriminator's judgment of real data and log(1 - D(G(z))) represents the judgment of the new data samples generated by training. Through the continuous max-min game, G and D are optimized alternately until the two models reach Nash equilibrium. Vanishing gradients may occur when minimizing the GAN objective, making it difficult for the objective to update the generator. LSGANs (least squares GANs) penalize samples far from the decision boundary, and the gradients of these samples determine the direction of gradient descent; cross entropy, by contrast, does not take distance into account but only whether a sample is classified correctly. The objective function of the LSGAN discriminator is:

$$\min_D V_{LSGAN}(D) = \tfrac{1}{2} E_{x \sim P_{data}(x)}[(D(x) - b)^2] + \tfrac{1}{2} E_{z \sim P_z(z)}[(D(G(z)) - a)^2]$$

the objective function of the GAN, minimized over the generator G, is:

$$\min_G V_{LSGAN}(G) = \tfrac{1}{2} E_{z \sim P_z(z)}[(D(G(z)) - c)^2]$$

wherein a, b and c are variable parameters which, during model training, satisfy b - c = 1 and b - a = 2.

Preferably, the traffic generator generates a specific packet sequence according to the traffic features produced by the GAN and mixes it with the packet sequences of real attack traffic to generate simulated traffic; the traffic generation algorithm used in the mixing uses the cumulative representation of traffic (CUMUL) to guide the traffic generation process. For a packet sequence $P_a = [p_{a1}, p_{a2}, \ldots, p_{ai}]$, the absolute value of $p_{ai}$ indicates the length of the i-th packet, $p_{ai} > 0$ indicates that the i-th packet is an outbound packet, and $p_{ai} < 0$ indicates that the i-th packet is a received packet; the CUMUL representation of the traffic is the sequence $c = [c_1, c_2, \ldots, c_N]$, where $c_0 = 0$, $c_i = c_{i-1} + p_i$ for $i \in \{1, 2, \ldots, N\}$, $p_i$ is the i-th element of the packet sequence, and N denotes the length of the sequence.

Preferably, the local proxy server morphs the simulated traffic according to the generated traffic pattern and outputs the morphed traffic as the final simulated attack traffic.

Preferably, when the GAN is used to generate traffic features, the Batch_Size parameter of the GAN is selected in one of two ways: first, using the full data set; second, using the minimum number of samples, i.e. training on only one sample at a time.

Preferably, when the GAN is used to generate traffic features, the imbalance ratio is set according to the data input for the first round of GAN training as:

$$\mathrm{num} = N^{-} / N^{+}$$

wherein $N^{-}$ is the number of UAL samples and $N^{+}$ is the number of NORMAL samples; as the GAN is used repeatedly to generate samples, num increases continuously until it reaches 1, at which point sample generation stops and the sample proportions are balanced.

Preferably, when the GAN is used to generate traffic features, different types of samples require different numbers of iterations, and the number of GAN iterations is:

$$\mathrm{count} = (N^{+} - N') / x$$

wherein count is the number of iterations required for a given data type; N' is the initial number of samples of the data type to be generated; and x is the Batch_Size parameter set for the GAN.
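As an illustration of the imbalance ratio and the iteration count defined above, the following minimal Python sketch computes both values and checks that generating x samples per iteration brings the ratio to 1. The function names and the sample counts (200 UAL, 1000 NORMAL, batch size 50) are hypothetical, and rounding the count up to a whole number of iterations is an assumption that the text does not state.

```python
import math


def imbalance_ratio(n_minority, n_majority):
    """num = N- / N+ (minority-class count over majority-class count)."""
    return n_minority / n_majority


def iteration_count(n_majority, n_initial, batch_size):
    """count = (N+ - N') / x, rounded up (the rounding is an assumption)."""
    return math.ceil((n_majority - n_initial) / batch_size)


# Hypothetical sample counts, chosen only for illustration.
n_ual, n_normal, batch_size = 200, 1000, 50

print(imbalance_ratio(n_ual, n_normal))                # 0.2
count = iteration_count(n_normal, n_ual, batch_size)
print(count)                                           # 16

# Each GAN iteration is assumed to add batch_size generated UAL samples.
for _ in range(count):
    n_ual += batch_size
print(imbalance_ratio(n_ual, n_normal))                # 1.0
```

With these numbers the ratio starts at 0.2 and reaches 1 after 16 iterations, which is the stopping condition described above.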

The invention also provides a method for generating network attack traffic by using the system.

Preferably, the method comprises the following steps:

the GAN learns the feature distribution of the target traffic and generates traffic features, and the generated traffic features are then sent to the traffic generator;

the traffic generator generates a randomized packet sequence according to the traffic features, mixes the packet sequence with real attack traffic, and turns the result into simulated traffic by means of a traffic generation algorithm;

the local proxy server morphs the simulated traffic and loads the payload into it to obtain the required final simulated attack traffic as output.

The invention also provides an application of the system in the technical field of network security.

(III) Advantageous effects

By applying a generative adversarial network algorithm, the invention generates network attack traffic through training. The generated traffic simulates attack traffic in a network environment; it can be used to verify how well a security protection system handles abnormal data, and it can also serve as a source of attack traffic in a cyber range.

Drawings

FIG. 1 is a basic framework diagram of a generative adversarial network;

FIG. 2 is a block diagram of the network attack traffic generation system based on a generative adversarial network according to the present invention;

FIG. 3 is a flowchart of the network attack traffic generation system based on a generative adversarial network according to the present invention.

Detailed Description

In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.

The network attack traffic generation system based on a generative adversarial network comprises a GAN (Generative Adversarial Network), a traffic generator and a proxy system; the overall design of the system is shown in FIG. 2.

The GAN is used for learning the feature distribution of the target traffic and generating traffic features, and the generated traffic features are then sent to the traffic generator. The traffic generator is used for generating a randomized packet sequence according to the traffic features, mixing the packet sequence with real attack traffic, and turning the result into simulated traffic by means of a traffic generation algorithm. The proxy system comprises a local proxy server, which is used for morphing the simulated traffic and loading the payload into it to obtain the required final simulated attack traffic as output.

The generative adversarial network follows the two-player zero-sum game of game theory and consists of a generator and a discriminator. After receiving real data samples, the generator generates new data samples by learning the sample features. The discriminator acts as a classifier and judges whether its input is real data or a generated sample. The basic framework of a GAN is shown in FIG. 1.

With z denoting random noise, x the input sample data, G the generative model and D the discriminative model, the GAN is expressed as:

$$\min_G \max_D V(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_z(z)}[\log(1 - D(G(z)))]$$

wherein log D(x) is the discriminator's judgment of real data and log(1 - D(G(z))) represents its judgment of generated data. Through the continuous max-min game, G and D are optimized alternately until the two models reach Nash equilibrium. Vanishing gradients may occur when minimizing the GAN objective, so that the objective can no longer update the generator and GAN training becomes unstable. LSGANs (least squares GANs) address this problem by penalizing samples far from the decision boundary; the gradients of these samples determine the direction of gradient descent. Cross entropy, by contrast, does not take distance into account but only whether a sample is classified correctly. The objective function of the LSGAN discriminator is:

$$\min_D V_{LSGAN}(D) = \tfrac{1}{2} E_{x \sim P_{data}(x)}[(D(x) - b)^2] + \tfrac{1}{2} E_{z \sim P_z(z)}[(D(G(z)) - a)^2]$$

the objective function of the GAN, minimized over the generator G, is:

$$\min_G V_{LSGAN}(G) = \tfrac{1}{2} E_{z \sim P_z(z)}[(D(G(z)) - c)^2]$$

wherein a, b and c are variable parameters. When a, b and c satisfy b - c = 1 and b - a = 2 during model training, the model alleviates training instability and improves the diversity of the features generated by the GAN.
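A minimal Python sketch of the least-squares losses is given below. It assumes the standard LSGAN form, in which the discriminator minimizes ½·E[(D(x) - b)²] + ½·E[(D(G(z)) - a)²] and the generator minimizes ½·E[(D(G(z)) - c)²]; the choice a = -1, b = 1, c = 0 is one setting that satisfies b - c = 1 and b - a = 2. The function names and the discriminator scores are illustrative and are not taken from the invention.

```python
def mean_squared_distance(scores, target):
    """Average of (score - target)^2 over a batch of discriminator outputs."""
    return sum((s - target) ** 2 for s in scores) / len(scores)


def discriminator_loss(d_real, d_fake, a=-1.0, b=1.0):
    """Least-squares discriminator loss: real scores pulled to b, fake scores to a."""
    return 0.5 * mean_squared_distance(d_real, b) + 0.5 * mean_squared_distance(d_fake, a)


def generator_loss(d_fake, c=0.0):
    """Least-squares generator loss: scores of generated samples pulled to c."""
    return 0.5 * mean_squared_distance(d_fake, c)


# Hypothetical discriminator outputs for a batch of real and generated samples.
d_real = [0.9, 0.8, 1.1]
d_fake = [-0.7, -0.9, 0.4]

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake))
```

Because the penalty grows with the squared distance from the target value, a generated sample that is classified on the correct side but lies far from the target still contributes a gradient, which is the property the paragraph above attributes to LSGANs.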

The traffic generator generates a specific packet sequence according to the traffic features produced by the GAN and mixes it with the packet sequences of real attack traffic to generate simulated attack traffic. In the traffic generation algorithm, the invention uses the cumulative representation of traffic (CUMUL) to guide the traffic generation process. For a packet sequence $P_a = [p_{a1}, p_{a2}, \ldots, p_{ai}]$, the absolute value of $p_{ai}$ indicates the length of the i-th packet, $p_{ai} > 0$ indicates that the i-th packet is an outbound packet, and $p_{ai} < 0$ indicates that the i-th packet is a received packet. The CUMUL representation of the traffic is the sequence $c = [c_1, c_2, \ldots, c_N]$, where $c_0 = 0$, $c_i = c_{i-1} + p_i$ for $i \in \{1, 2, \ldots, N\}$, $p_i$ is the i-th element of the packet sequence, and N denotes the length of the sequence.
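The cumulative representation can be sketched in a few lines of Python as a running sum over the signed packet lengths. The function name `cumul` and the example packet sequence are hypothetical; only the recurrence c_0 = 0, c_i = c_{i-1} + p_i comes from the description.

```python
from itertools import accumulate


def cumul(packets):
    """Return [c_1, ..., c_N] where c_0 = 0 and c_i = c_{i-1} + p_i."""
    return list(accumulate(packets))


# Hypothetical packet sequence: positive = outbound packet, negative = received packet,
# absolute value = packet length.
packets = [120, -1500, -1500, 60, -400]

print(cumul(packets))  # [120, -1380, -2880, -2820, -3220]
```

The resulting sequence has the same length N as the packet sequence, and its last element is the net signed byte count of the flow.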

The proxy system comprises a local proxy server. The client first connects to the local proxy server and sends it the simulated traffic generated by the traffic generator; the local proxy server then morphs the simulated traffic according to the generated traffic pattern and outputs the morphed traffic as the final simulated attack traffic.

The working process of the network attack traffic generation system is described below, taking pre-collected unauthorized local superuser login attack traffic (UAL) as an example. With reference to FIG. 3, the process comprises the following steps:

Step 1: Extracting the feature types of the data

The UAL network traffic is characterized as follows:

Step 2: Data sample feature analysis

(1) Numericalization. Character-type features are converted into numerical features.

(2) Standardization. During analysis, the numericalized data is first standardized.

(3) Normalization. The values are normalized to the interval [0, 1], yielding a data set suitable for training in the GAN.
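The three preprocessing steps can be sketched as follows. The description does not say which encoding or scaling formulas are used, so this sketch assumes integer label encoding for numericalization, z-score scaling for standardization and min-max scaling for normalization; the function names and the sample column are hypothetical.

```python
from statistics import mean, pstdev


def numericalize(column):
    """Map each distinct symbolic value to an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(value, len(codes)) for value in column]


def standardize(column):
    """Z-score scaling (assumed): subtract the mean, divide by the population std."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


def normalize(column):
    """Min-max scaling (assumed): map the values onto the interval [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


# Hypothetical character-type feature column.
protocol = ["tcp", "udp", "tcp", "icmp"]

encoded = numericalize(protocol)   # [0, 1, 0, 2]
scaled = normalize(standardize(encoded))
print(encoded)
print(scaled)
```

Each feature column is processed independently, and after the last step every value lies in [0, 1], as required for the GAN input.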

Step 3: Data sample generation

The latent features of the UAL data type are trained with the GAN in batches to generate sufficient sample data, which is then mixed into the original data to address the shortage of samples of this type. The Batch_Size parameter of the GAN must be chosen, and there are two options: first, using the full data set, which trains more representative sample features; second, using the minimum number of samples, i.e. training on only one sample at a time, which lets the function converge fastest. In this example, 50 UAL traffic data items are selected as the whole block of data input to the GAN model.

The imbalance ratio is set according to the data input for the first round of GAN training as:

$$\mathrm{num} = N^{-} / N^{+}$$

where $N^{-}$ is the number of UAL samples and $N^{+}$ is the number of NORMAL samples. As the GAN is used repeatedly to generate samples, num increases continuously until it reaches 1. At that point sample generation stops and the sample proportions are balanced. The number of iterations also differs for different types of samples in this process; the number of GAN generation iterations is:

$$\mathrm{count} = (N^{+} - N') / x$$

where count is the number of iterations required for a given data type, N' is the initial number of samples of the data type to be generated, and x is the Batch_Size set for the GAN. After the UAL samples $X^{+}$ are introduced into the GAN model, the GAN function changes as follows:

Step 4: The traffic features generated in Step 3 and the real attack traffic are input into the traffic generator, which outputs, from the generated traffic, the traffic that is most similar to the other traffic models.

Step 5: The local proxy server morphs the traffic and loads the payload into the generated traffic, yielding the required attack traffic.

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims

1. A system for generating network attack traffic based on a generative adversarial network, comprising: a generative adversarial network (GAN), a traffic generator and a proxy system;

2. The system of claim 1, wherein the GAN is comprised of a generator that generates new data samples by self-learning sample feature training after receiving real data samples, and a discriminator that determines whether the input is real data or generated sample data;

the goal function of GAN is:

3. The system of claim 1, wherein the traffic generator generates a specific packet sequence according to the traffic features produced by the GAN and mixes it with the packet sequences of real attack traffic to generate simulated traffic, and the traffic generation algorithm used in the mixing uses the cumulative representation of traffic (CUMUL) to guide the traffic generation process; for a packet sequence $P_a = [p_{a1}, p_{a2}, \ldots, p_{ai}]$, the absolute value of $p_{ai}$ indicates the length of the i-th packet, $p_{ai} > 0$ indicates that the i-th packet is an outbound packet, and $p_{ai} < 0$ indicates that the i-th packet is a received packet; the CUMUL representation of the traffic is the sequence $c = [c_1, c_2, \ldots, c_N]$, where $c_0 = 0$, $c_i = c_{i-1} + p_i$ for $i \in \{1, 2, \ldots, N\}$, $p_i$ is the i-th element of the packet sequence, and N denotes the length of the sequence.

4. The system of claim 1, wherein the local proxy server morphs the simulated traffic according to the generated traffic pattern and outputs the morphed traffic as the final simulated attack traffic.

5. The system of claim 2, wherein, when the GAN is used to generate traffic features, the Batch_Size parameter of the GAN is selected in one of two ways: first, using the full data set; second, using the minimum number of samples, i.e. training on only one sample at a time.

6. The system of claim 5, wherein, when the GAN is used to generate traffic features, an imbalance ratio is set according to the data input for the first round of GAN training as:

$$\mathrm{num} = N^{-} / N^{+}$$

7. The system of claim 6, wherein, when the GAN is used to generate traffic features, different types of samples require different numbers of iterations, and the number of GAN iterations is:

$$\mathrm{count} = (N^{+} - N') / x$$

8. A method of implementing network attack traffic generation using the system of any one of claims 1 to 7.

9. The method of claim 8, comprising the steps of:

10. Use of a system according to any of claims 1 to 7 in the field of network security technology.