CN115021965A - Method and system for generating attack data for an intrusion detection system based on a generative adversarial network - Google Patents

Publication number
CN115021965A
CN115021965A CN202210485160.4A CN202210485160A CN115021965A CN 115021965 A CN115021965 A CN 115021965A CN 202210485160 A CN202210485160 A CN 202210485160A CN 115021965 A CN115021965 A CN 115021965A
Authority
CN
China
Prior art keywords
attack
data
sample
data sample
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210485160.4A
Other languages
Chinese (zh)
Other versions
CN115021965B (en)
Inventor
孟博
杨杰
王德军
魏增颂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shijiazhuang Citic Youlian Software Co ltd
South Central Minzu University
Original Assignee
Shijiazhuang Citic Youlian Software Co ltd
South Central University for Nationalities
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shijiazhuang Citic Youlian Software Co ltd and South Central University for Nationalities
Priority to CN202210485160.4A
Publication of CN115021965A
Application granted
Publication of CN115021965B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441Countermeasures against malicious traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method and a system for generating attack data for an intrusion detection system based on a generative adversarial network (GAN). Feature analysis is first performed on the acquired data traffic, features are then screened with a random forest algorithm, and the data set is preprocessed by removing zero values and null values and uniformly sampling the various kinds of attack data. The constructed GAN model comprises a generator, a converter and a discriminator. The generator takes random noise as input and generates new data samples through a multilayer neural network. The converter combines the non-attack features of the generated data samples with the attack features of attack behavior data samples to form new attack samples, which are passed to the discriminator. The discriminator is trained jointly on real data and the data samples produced by the converter, and the training result parameters are passed back to the generator for iterative training. In addition, the attack performance of the attack samples is evaluated by testing them against an intrusion detection system based on a deep belief network.

Description

Method and system for generating attack data for an intrusion detection system based on a generative adversarial network
Technical Field
The invention relates to the technical field of information security, and in particular to a method and a system for generating attack data for an intrusion detection system based on a generative adversarial network.
Background
Intrusion detection, as an active security defense technology, has been widely researched. With the development of machine learning and deep learning algorithms in particular, detection algorithms have become more varied. Attacks against deep-learning-based intrusion detection systems have also been studied extensively. The network-based intrusion detection system (IDS) is an important branch of intrusion detection: it monitors a network, collects information from data packets, and observes and analyzes real-time network traffic to detect intrusion behavior in the network.
Since the concept of deep learning was proposed, constructing nonlinear network structures composed of multiple hidden layers for data classification has become a major trend. "Depth" refers to the number of hidden layers in a neural network: a traditional neural network contains only 2-3 hidden layers, whereas a deep network can contain up to 150. The layers are connected in sequence, and each layer receives the output of the previous layer as its input. For example, an autoencoder consists of an encoder and a decoder that produces a reconstruction; it can represent both linear and nonlinear transformations and is widely used for dimensionality reduction in intrusion detection. A deep belief network (DBN) is a directed deep neural network consisting of multiple layers of restricted Boltzmann machines (RBMs) and one back-propagation (BP) layer. Features extracted by each hidden layer make the training data of the following layer more representative, which helps with detection on complex high-dimensional data, and DBNs have already been applied to intrusion detection.
Intrusion detection algorithms are numerous, and they have improved the detection efficiency and accuracy of detection systems. Research on ensuring the security and reliability of these systems, however, is lacking. Existing methods for generating attack traffic against network-based intrusion detection systems require many iterations, are computationally inefficient, and take a long time to generate perturbations.
Disclosure of Invention
The invention provides a method and a system for generating attack data for an intrusion detection system based on a generative adversarial network, which solve, or at least partially solve, the technical problem that attack data is generated inefficiently and with poor effect in the prior art.
To solve the above technical problem, a first aspect of the present invention provides a method for generating attack data for an intrusion detection system based on a generative adversarial network, comprising:
S1: acquiring data traffic, wherein the data traffic comprises normal network behavior data traffic and attack behavior data traffic;
S2: performing feature analysis on the acquired data traffic with a traffic analysis tool to obtain a relevant data set, wherein the relevant data set comprises normal network behavior data samples and attack behavior data samples, and both kinds of samples contain attack features and non-attack features;
S3: performing feature screening with a random forest algorithm, marking the attack features and non-attack features of the data samples in the relevant data set, and then preprocessing the feature-marked data set;
S4: constructing a generative adversarial network model comprising a generator, a converter and a discriminator, wherein the generator learns the feature distribution of the normal network behavior data samples and generates attack data samples; the converter combines the non-attack features contained in the generated attack data samples with the attack features contained in the attack behavior data samples to form new attack data samples; the discriminator is a binary classifier that is trained jointly on the normal network behavior data samples in the relevant data set and the new attack data samples produced by the converter, and judges whether an input data sample is a real data sample or a generated data sample; the training result parameters are then passed to the generator for iterative training, yielding the trained generative adversarial network model;
S5: generating target attack data with the trained generative adversarial network model.
In one embodiment, after step S5, the method further comprises:
setting up an intrusion detection system based on a deep belief network, and testing the attack performance of the generated target attack data against it.
In one embodiment, performing feature screening with a random forest algorithm in step S3 to mark the attack features and non-attack features of the data samples in the relevant data set includes:
performing feature screening with a random forest algorithm, marking the features whose importance ranking meets a preset condition as attack features, and marking the remaining features as non-attack features.
In one embodiment, the preprocessing of the feature labeled data set in step S3 includes:
removing abnormal data from the feature-marked data set, deleting data containing infinite values and null values, and converting date values into timestamps.
In one embodiment, in step S4, when performing the iterative training, the loss function is:
$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x,y)\sim\gamma}\left[\|x - y\|\right]$$
where $P_r$ is the probability distribution of the real data samples and $P_g$ is the probability distribution of the generated data samples. $W(P_r, P_g)$ is the Wasserstein distance between $P_r$ and $P_g$, and $\Pi(P_r, P_g)$ is the set of all possible joint distributions whose marginals are $P_r$ and $P_g$. For each joint distribution $\gamma$ in this set, a pair of samples x and y is obtained by sampling from $\gamma$, $\|x - y\|$ is the distance between the pair, and $\mathbb{E}_{(x,y)\sim\gamma}[\|x - y\|]$ is the expected distance between sample pairs under $\gamma$. The operator $\inf$ denotes the infimum (greatest lower bound) of this expected value over all joint distributions.
Based on the same inventive concept, a second aspect of the present invention provides a system for generating attack data for an intrusion detection system based on a generative adversarial network, comprising:
a data traffic acquisition module for acquiring data traffic, wherein the data traffic comprises normal network behavior data traffic and attack behavior data traffic;
a feature analysis module for performing feature analysis on the acquired data traffic with a traffic analysis tool to obtain a relevant data set, wherein the relevant data set comprises normal network behavior data samples and attack behavior data samples, and both kinds of samples contain attack features and non-attack features;
a feature screening and preprocessing module for performing feature screening with a random forest algorithm, marking the attack features and non-attack features of the data samples in the relevant data set, and then preprocessing the feature-marked data set;
a model construction and training module for constructing a generative adversarial network model comprising a generator, a converter and a discriminator, wherein the generator learns the feature distribution of the normal network behavior data samples and generates attack data samples; the converter combines the non-attack features contained in the generated attack data samples with the attack features contained in the attack behavior data samples to form new attack data samples; the discriminator is a binary classifier that is trained jointly on the normal network behavior data samples in the relevant data set and the new attack data samples produced by the converter, and judges whether an input data sample is a real data sample or a generated data sample; the training result parameters are then passed to the generator for iterative training, yielding the trained generative adversarial network model;
an attack data generation module for generating target attack data with the trained generative adversarial network model.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
The invention provides a method for generating attack data for an intrusion detection system based on a generative adversarial network. The data set is divided into normal network behavior data samples and attack behavior data samples. The normal network behavior data samples are used as model input for training, while selected attack features of the attack behavior data samples are combined with the non-attack features of the generated attack data samples and do not directly participate in model training. Feature screening is performed with a random forest algorithm, and the top-ranked features by importance are identified as attack features. Combining the non-attack features of the generated attack data samples with the attack features of the attack behavior data samples to form new attack data samples both preserves the attack capability of the attack samples and reduces the time and space consumed by the model algorithm, which improves the effect and efficiency of attack data generation.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a basic framework diagram of a generative adversarial network provided in an embodiment of the present invention;
Fig. 2 is an architecture diagram of the method for generating attack data for an intrusion detection system based on a generative adversarial network provided in an embodiment of the present invention;
Fig. 3 is a flowchart of the method for generating attack data for an intrusion detection system based on a generative adversarial network provided in an embodiment of the present invention.
Detailed Description
The invention provides a method and a system for generating attack data for an intrusion detection system based on a generative adversarial network.
To achieve the above object, the main concept of the present invention is as follows:
First, data traffic is acquired and feature analysis is performed on it with a traffic analysis tool. Feature screening is then performed with a random forest algorithm, the attack features and non-attack features of the data samples in the relevant data set are marked, and the data set is divided into normal network behavior data samples and attack behavior data samples. A generative adversarial network model is then constructed. The generator learns the feature distribution of the normal network behavior data samples and generates attack data samples; the converter combines the non-attack features of the generated attack data samples with the attack features of the attack behavior data samples to form new attack data samples; the discriminator is trained jointly on the normal network behavior data samples in the data set and the new attack data samples produced by the converter, and the training result parameters are passed to the generator for iterative training. Finally, target attack data is generated with the trained generative adversarial network model.
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
Example one
This embodiment of the invention provides a method for generating attack data for an intrusion detection system based on a generative adversarial network, comprising the following steps:
S1: acquiring data traffic, wherein the data traffic comprises normal network behavior data traffic and attack behavior data traffic;
S2: performing feature analysis on the acquired data traffic with a traffic analysis tool to obtain a relevant data set, wherein the relevant data set comprises normal network behavior data samples and attack behavior data samples, and both kinds of samples contain attack features and non-attack features;
S3: performing feature screening with a random forest algorithm, marking the attack features and non-attack features of the data samples in the relevant data set, and then preprocessing the feature-marked data set;
S4: constructing a generative adversarial network model, wherein the generator learns the feature distribution of the normal network behavior data samples and generates attack data samples; the converter combines the non-attack features contained in the generated attack data samples with the attack features contained in the attack behavior data samples to form new attack data samples; the discriminator is a binary classifier that is trained jointly on the normal network behavior data samples in the relevant data set and the new attack data samples produced by the converter, and judges whether an input data sample is a real data sample or a generated data sample; the training result parameters are then passed to the generator for iterative training, and finally target attack data is generated with the trained generative adversarial network model.
Specifically, the generative adversarial network (GAN) is based on the idea of a two-player zero-sum game and has been a research hotspot since it was proposed. Through research in recent years, GANs have been applied in many fields. The method uses a GAN to generate attack data; in practical application, the generated attack data deceives an intrusion detection system that uses deep learning, which makes the method an effective attack scheme targeting the weaknesses of deep-learning-based intrusion detection systems. To construct an attack that bypasses detection by a deep-learning-based intrusion detection system, the first problem to solve is how to generate data traffic that can evade detection. An intrusion detection system based on a deep belief network uses a deep neural network for feature extraction, perception and learning, so the generated data traffic must conform to the traffic features and probability distribution learned in this way.
The second technical problem is ensuring that the generated data traffic retains attack capability while bypassing the intrusion detection system. Otherwise, the generated data samples are ordinary data traffic and cannot attack the target server or the target user.
To solve these problems, this embodiment provides a method for generating attack data for an intrusion detection system based on a generative adversarial network. Following the zero-sum game idea of the GAN, a generator and a discriminator are constructed and trained iteratively against each other. A converter is placed between the generator and the discriminator so that the non-attack features of the generated data are retained and combined with the attack features of existing attack behavior data samples, yielding attack data with strong attack capability.
Fig. 3 is a flowchart of the method for generating attack data for an intrusion detection system based on a generative adversarial network according to an embodiment of the present invention.
In a specific implementation, the data traffic in step S1 may be downloaded by the user. Step S3 performs feature screening with a random forest algorithm and preprocessing, and then separates the normal network behavior data samples from the attack behavior data samples. For example, the label of a normal network behavior data sample is Benign, and the label of an attack behavior data sample is the attack type of the data. The normal network behavior data samples are used as model input and participate in model training. The attack features of the attack behavior data samples are combined during training with the non-attack features of the attack data samples produced by the generator, and do not directly participate in model training.
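The label-based split described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the field name `Label`, the label string `Benign`, and the sample rows are assumptions chosen to resemble CICFlowMeter output.

```python
from collections import defaultdict

# Assumption: normal traffic carries this label string; attack rows carry
# their attack type (for example "DoS" or "Botnet") as the label.
BENIGN_LABEL = "Benign"


def split_by_label(samples, label_key="Label"):
    """Separate normal samples from attack samples grouped by attack type."""
    normal = []
    attacks = defaultdict(list)
    for sample in samples:
        label = sample[label_key]
        if label == BENIGN_LABEL:
            normal.append(sample)
        else:
            attacks[label].append(sample)
    return normal, dict(attacks)


# Hypothetical rows for illustration only.
rows = [
    {"Flow Duration": 120, "Label": "Benign"},
    {"Flow Duration": 15, "Label": "DoS"},
    {"Flow Duration": 300, "Label": "Benign"},
    {"Flow Duration": 9, "Label": "Botnet"},
]
normal, attacks = split_by_label(rows)
print(len(normal), sorted(attacks))
```

In this sketch only `normal` would feed model training, while `attacks` would supply attack features to the converter.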
Referring to Figs. 1-2, Fig. 1 is a basic framework diagram of a generative adversarial network provided in an embodiment of the present invention, and Fig. 2 is an architecture diagram of the method for generating attack data for an intrusion detection system based on a generative adversarial network according to an embodiment of the present invention.
The generative adversarial network model constructed in step S4 consists of a generator, a converter and a discriminator. The input of the generator is a one-dimensional random variable and its output is the learned traffic features. Because the network learns the feature distribution of normal network behavior data samples, the generated attack data samples contain both attack features and non-attack features but generally have no attack capability; the attack features they contain must be replaced with the attack features of real attack behavior data for the samples to gain attack capability. The converter combines the non-attack features of the attack data samples produced by the generator with the attack features of the attack traffic. The discriminator acts as a binary classifier that judges whether its input is real data features or generated sample features. The traffic features in the data set and the traffic features produced by the converter are trained on jointly, and the training result parameters are passed to the generator for iterative training.
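The converter step can be illustrated with a short sketch. The function name `convert`, the feature names, and the numeric values are hypothetical; the only behavior taken from the text is that non-attack features come from the generated sample and attack features come from a real attack behavior sample.

```python
def convert(generated_sample, attack_sample, attack_features):
    """Combine generated non-attack features with real attack features.

    Values for names in attack_features are copied from the real attack
    sample; every other feature keeps the generator's value.
    """
    combined = dict(generated_sample)
    for name in attack_features:
        combined[name] = attack_sample[name]
    return combined


# Hypothetical feature marking and normalized values, for illustration only.
attack_features = {"Flow Duration", "Fwd Pkts/s"}
generated = {"Flow Duration": 0.42, "Fwd Pkts/s": 0.10, "Pkt Len Mean": 0.77}
real_attack = {"Flow Duration": 0.03, "Fwd Pkts/s": 0.95, "Pkt Len Mean": 0.20}

new_sample = convert(generated, real_attack, attack_features)
print(new_sample)
```

Because the combination is a direct copy rather than a learned transformation, it adds no trainable parameters, which matches the text's point about limiting time and space cost.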
In one embodiment, after step S4, the method further comprises:
setting up an intrusion detection system based on a deep belief network, and testing the attack performance of the generated target attack data against it.
Specifically, network intrusion detection methods based on deep learning are a current research hotspot. A deep belief network consists of multiple RBMs and one layer of BP neural network. Training proceeds mainly as follows. The RBMs are trained layer by layer: the hidden layer vector of each RBM is obtained by mapping its visible layer vector, and is then used as the visible layer vector of the next RBM. A BP neural network is added after the last RBM, and the output vector of the last RBM is used as the input vector of the BP neural network. A generative adversarial network can be constructed to target the deep belief network, with its generator and discriminator built from convolutional neural networks. Through the zero-sum game, attack data with attack capability is finally generated, and this attack data can bypass the detection of the intrusion detection system.
In one embodiment, the step S3 of performing feature screening by using a random forest algorithm to mark attack features and non-attack features of the data samples in the relevant data sets includes:
performing feature screening with a random forest algorithm, marking the features whose importance ranking meets a preset condition as attack features, and marking the remaining features as non-attack features.
Specifically, both the normal network behavior data samples and the attack behavior data samples contain attack features and non-attack features, and the random forest algorithm is used to separate the two.
In a specific implementation, a feature that meets the preset condition is a feature with a preset priority level, which indicates how important the feature is in determining whether a data sample has attack capability.
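The marking rule can be sketched as a top-k cut on importance scores. The scores below are invented for illustration, and the choice of a fixed `top_k` as the "preset condition" is an assumption; in practice the scores might come from a trained random forest, for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`, although the source does not name a library.

```python
def mark_features(importances, top_k):
    """Rank features by importance and mark the top_k as attack features."""
    ranked = sorted(importances, key=lambda name: importances[name], reverse=True)
    return ranked[:top_k], ranked[top_k:]


# Hypothetical importance scores, for illustration only.
importances = {
    "Flow Duration": 0.31,
    "Fwd Pkts/s": 0.27,
    "Pkt Len Mean": 0.12,
    "Init Fwd Win Byts": 0.30,
}
attack, non_attack = mark_features(importances, top_k=2)
print("attack features:", attack)
print("non-attack features:", non_attack)
```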
In one embodiment, the preprocessing of the feature labeled data set in step S3 includes:
removing abnormal data from the feature-marked data set, deleting data containing infinite values and null values, and converting date values into timestamps.
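A minimal sketch of this preprocessing step, using only the Python standard library, is shown below. The column names, the `dd/mm/YYYY HH:MM:SS` date format, and the decision to interpret dates as UTC are assumptions made for the example and are not stated in the source.

```python
import calendar
import math
from datetime import datetime

# Assumption: dates are stored as day/month/year strings.
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def to_unix_timestamp(text):
    """Parse a date string and return seconds since the epoch (as UTC)."""
    parsed = datetime.strptime(text, TIMESTAMP_FORMAT)
    return calendar.timegm(parsed.timetuple())


def is_bad(value):
    """True for null values and for infinite or NaN floats."""
    return value is None or (isinstance(value, float) and not math.isfinite(value))


def preprocess(rows, date_key="Timestamp"):
    """Drop rows with null/infinite values and convert dates to timestamps."""
    cleaned = []
    for row in rows:
        if any(is_bad(v) for k, v in row.items() if k != date_key):
            continue
        new_row = dict(row)
        new_row[date_key] = to_unix_timestamp(row[date_key])
        cleaned.append(new_row)
    return cleaned


# Hypothetical rows for illustration only.
rows = [
    {"Timestamp": "14/02/2018 08:31:01", "Flow Byts/s": 1250.0},
    {"Timestamp": "14/02/2018 08:31:02", "Flow Byts/s": float("inf")},
    {"Timestamp": "14/02/2018 08:31:03", "Flow Byts/s": None},
    {"Timestamp": "14/02/2018 08:31:04", "Flow Byts/s": float("nan")},
]
cleaned = preprocess(rows)
print(cleaned)
```

Only the first row survives in this example, because the other three contain an infinite, null, or NaN value.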
In one embodiment, in step S4, when performing the iterative training, the loss function is:
$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x,y)\sim\gamma}\left[\|x - y\|\right]$$
where $P_r$ is the probability distribution of the real data samples and $P_g$ is the probability distribution of the generated data samples. $W(P_r, P_g)$ is the Wasserstein distance between $P_r$ and $P_g$, and $\Pi(P_r, P_g)$ is the set of all possible joint distributions whose marginals are $P_r$ and $P_g$. For each joint distribution $\gamma$ in this set, a pair of samples x and y is obtained by sampling from $\gamma$, $\|x - y\|$ is the distance between the pair, and $\mathbb{E}_{(x,y)\sim\gamma}[\|x - y\|]$ is the expected distance between sample pairs under $\gamma$. The operator $\inf$ denotes the infimum (greatest lower bound) of this expected value over all joint distributions.
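To make the Wasserstein distance concrete, the sketch below computes it for the simplest case: two one-dimensional empirical distributions with the same number of equally weighted points. In that special case the optimal coupling pairs the sorted samples, so the infimum reduces to a mean of absolute differences. This is an illustration of the definition, not the training loss as it would be estimated inside a WGAN, and the sample values are invented.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    With equal weights and equal sizes, pairing the sorted values is an
    optimal joint distribution, so no search over couplings is needed.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


real = [0.0, 1.0, 2.0]
far = [5.0, 6.0, 7.0]
near = [1.0, 2.0, 3.0]

print(wasserstein_1d(real, far))   # 5.0
print(wasserstein_1d(real, near))  # 1.0
```

The distance shrinks smoothly as the generated samples move toward the real ones, even when the two sets do not overlap at all.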
Specifically, $P_r$ is the probability distribution of the real data sample x, and $P_g$ is the probability distribution of the generated data samples. The generative adversarial network mainly learns a mapping from a random variable z to the real data sample x, where z follows a normal distribution. The generator defines a differentiable function g(z) with parameters $\theta_g$, whose outputs are the generated data. A discriminator function f(x) with parameters $\theta_d$ represents the probability that x is real data, and the discriminator is trained to maximize the cost function. $L(f, \theta_d)$ is the cost function of the generative adversarial network:
$$L(f, \theta_d) = \int_x \left( P_r(x)\,\log f(x) + P_g(x)\,\log\left(1 - f(x)\right) \right) dx$$
where $\int_x$ denotes integration over x against the probability density functions of $P_r$ and $P_g$.
The optimal discriminator can be derived theoretically:
$$D^*(x) = \frac{P_r(x)}{P_r(x) + P_g(x)}$$
where $D^*(x)$ is the optimal discriminator function obtained when the generator is fixed, and $P_r(x)$ and $P_g(x)$ are the probability densities of $P_r$ and $P_g$ respectively. The difference between $P_r$ and $P_g$ can be measured by the KL divergence. The JS divergence is a variant of the KL divergence, and $JSD(P_r(x) \,\|\, P_g(x))$, the JS divergence between $P_r$ and $P_g$, can likewise measure the difference between them. Substituting the optimal discriminator, the cost function $L(f, \theta_d)$ of the generative adversarial network can be rewritten as a training criterion expressed through the JS divergence:
$$L(f, \theta_d) = -2\log 2 + 2\,JSD(P_r(x) \,\|\, P_g(x))$$
However, when the probability distributions $P_r$ and $P_g$ do not overlap, gradient descent cannot obtain useful gradient information between the two distributions. WGAN therefore uses the Wasserstein distance instead of the Jensen-Shannon divergence; that is, the final loss function adopted is the Wasserstein distance formula.
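The relationship between the cost at the optimal discriminator and the JS divergence, and the reason non-overlapping distributions give no useful gradient, can be checked numerically on small discrete distributions. The sketch below is a hedged illustration: the three distributions are invented, and sums over bins stand in for the integrals.

```python
import math


def kl(p, q):
    """KL divergence between discrete distributions (0 * log 0 taken as 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cost_at_optimal_discriminator(p_r, p_g):
    """Evaluate the GAN cost with D*(x) = P_r(x) / (P_r(x) + P_g(x))."""
    total = 0.0
    for pr, pg in zip(p_r, p_g):
        if pr + pg == 0:
            continue
        d_star = pr / (pr + pg)
        if pr > 0:
            total += pr * math.log(d_star)
        if pg > 0:
            total += pg * math.log(1 - d_star)
    return total


# Hypothetical distributions over six bins.
p_r = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
p_g_overlap = [0.25, 0.25, 0.25, 0.25, 0.0, 0.0]
p_g_near = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]   # disjoint, close to p_r
p_g_far = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]    # disjoint, far from p_r

for p_g in (p_g_overlap, p_g_near, p_g_far):
    lhs = cost_at_optimal_discriminator(p_r, p_g)
    rhs = -2 * math.log(2) + 2 * jsd(p_r, p_g)
    print(round(lhs, 6), round(rhs, 6), round(jsd(p_r, p_g), 6))
```

For both disjoint cases the JS divergence equals log 2 regardless of how far apart the distributions are, so the cost is flat and carries no signal about which direction improves the generator. This is the situation the Wasserstein distance is meant to avoid.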
Through a continuing minimax game and continual optimization, the generator and the discriminator finally reach a Nash equilibrium, at which the discriminator can no longer tell whether the data produced by the generator is real sample data or generated sample data.
To ensure that the generated data traffic has attack capability, a converter is created between the generator and the intrusion detection system. The converter combines the non-attack features contained in the generated attack data samples with the attack features contained in the attack behavior data samples to form new data samples. This preserves the attack capability of the attack samples, and because the converter combines the features directly, it also reduces the time and space consumed by the model algorithm.
In a specific example, the data set used is CSE-CIC-IDS-2018. This data set is a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). It aims to generate diverse and comprehensive benchmark data sets for intrusion detection based on user profiles, which contain abstract representations of the events and behaviors seen on the network. The profiles are combined to generate a set of different data sets, each with a unique set of features covering part of the evaluation domain. The data set contains 7 different attack scenarios: Brute-force, Heartbleed, Botnet, DoS, DDoS, Web attacks, and infiltration of the network from inside.
Feature analysis is performed on the data traffic with a traffic analysis tool to obtain the relevant data set. The traffic features can be extracted with CICFlowMeter, a network traffic flow generator written in Java. This yields FlowID, SourceIP, DestinationIP, SourcePort, DestinationPort and more than 80 network flow features.
And performing feature selection through a random forest algorithm, and marking attack features and non-attack features of data samples in the related data sets. And then preprocessing the data set, and clearing abnormal data in the data set. And classifying the normal network behavior data samples and the attack behavior data samples, wherein the normal network behavior data samples are used as the input of the model to participate in model training, the attack characteristics of the attack behavior data samples are directly combined with the non-attack characteristics of the attack data samples generated by the generator in the training process, and the attack characteristics do not directly participate in model training.
The converter combines the attack features from the CSE-CIC-IDS-2018 data set with the non-attack features produced by the generator and sends the combined samples to the discriminator for binary classification. The discriminator and the generator are trained iteratively until enough sample data is generated. Finally, the generated target attack data is passed to an intrusion detection system based on a deep belief network, which serves as the detector, to test the attack performance of the generated data.
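One way to quantify the attack performance of the generated data is the share of generated attack samples that the detector fails to flag. The source does not name a specific metric, so the evasion rate below is an assumption, and `toy_detector` is a hypothetical stand-in for the deep-belief-network detector.

```python
from typing import Callable, Iterable, Sequence


def evasion_rate(detector: Callable[[Sequence[float]], bool],
                 samples: Iterable[Sequence[float]]) -> float:
    """Fraction of samples that the detector does NOT flag as attacks."""
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    missed = sum(1 for s in samples if not detector(s))
    return missed / len(samples)


def toy_detector(sample: Sequence[float]) -> bool:
    """Hypothetical detector: flags a sample when feature 0 exceeds 0.5."""
    return sample[0] > 0.5


# Hypothetical generated attack samples with two features each.
generated_attacks = [[0.9, 0.1], [0.2, 0.7], [0.4, 0.3], [0.8, 0.8]]
print(evasion_rate(toy_detector, generated_attacks))  # 0.5
```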
In a specific implementation, the attack features are selected according to the type of attack traffic. The attack features of each attack type can be combined with the non-attack features of the generated attack data samples to simulate a variety of attacks, including but not limited to DoS, Brute-force, Heartbleed, and Botnet attacks.
In general, the invention is a method for generating attack data for an intrusion detection system based on a generative adversarial network. Its object is to generate attack data that carries attack features and can bypass detection by an intrusion detection system based on a deep belief network.
First, a traffic analysis tool performs feature analysis on the acquired data traffic. Features are then screened with a random forest algorithm, and the data set is preprocessed: zero values and null values are removed and the various kinds of attack data are sampled uniformly. The constructed generative adversarial network model comprises a generator, a converter, and a discriminator. The generator takes random noise as input and produces new data samples through a multilayer neural network. The converter combines the non-attack features of the generated data samples with the attack features of real data samples (attack behavior data samples) to form new attack samples and passes them to the discriminator. The discriminator is trained on the real data together with the samples produced by the converter, and the training result parameters are passed back to the generator for iterative training. In addition, the attack performance of the attack samples is evaluated through detection by the intrusion detection system based on the deep belief network.
It is worth noting that the data set is divided into normal network behavior data samples and attack behavior data samples. The normal network behavior data samples are used as model input for training. Selected attack features of the attack behavior data samples are combined with the non-attack features of the generated attack data; these attack features do not directly take part in model training.
The attack data generated by the method of the invention can mount effective network attacks against deep-learning-based intrusion detection systems. Depending on which attack behavior data samples are selected, the method combines their specific attack features with the non-attack features of the generated attack samples to simulate a variety of attacks, including but not limited to DoS, Brute-force, Heartbleed, and Botnet attacks. Because a random forest algorithm is used for feature screening, the top-ranked features by importance are identified as attack features and combined with the non-attack features of the generated samples, so the attack capability of the generated samples is retained efficiently.
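The rank-based marking of attack features can be sketched as follows. The importance scores are assumed to come from a trained random forest (for example, the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`); the feature names, the scores, and the cut-off `top_k` are hypothetical, since the source only speaks of a preset condition on the importance rank.

```python
def mark_features(importances: dict[str, float], top_k: int):
    """Split feature names into (attack, non_attack) lists by importance rank."""
    if not 0 <= top_k <= len(importances):
        raise ValueError("top_k must be between 0 and the number of features")
    # Highest importance first; the top_k best-ranked features become attack features.
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:top_k], ranked[top_k:]


# Hypothetical importance scores; names and values are illustrative only.
scores = {
    "flow_duration": 0.31,
    "dst_port": 0.05,
    "fwd_pkt_len_mean": 0.22,
    "init_win_bytes": 0.12,
    "protocol": 0.30,
}
attack, non_attack = mark_features(scores, top_k=2)
print(attack)      # ['flow_duration', 'protocol']
print(non_attack)  # ['fwd_pkt_len_mean', 'init_win_bytes', 'dst_port']
```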
Example two
Based on the same inventive concept, this embodiment provides a system for generating attack data for an intrusion detection system based on a generative adversarial network, comprising:
a data traffic acquisition module, configured to acquire data traffic, wherein the data traffic comprises normal network behavior data traffic and attack behavior data traffic;
a feature analysis module, configured to perform feature analysis on the acquired data traffic with a traffic analysis tool to obtain a related data set, wherein the related data set comprises normal network behavior data samples and attack behavior data samples, both of which contain attack features and non-attack features;
a feature screening and preprocessing module, configured to screen features with a random forest algorithm, mark the attack features and non-attack features of the data samples in the related data set, and then preprocess the feature-marked data set;
a model construction and training module, configured to construct a generative adversarial network model comprising a generator, a converter, and a discriminator, wherein the generator learns the feature distribution of the normal network behavior data samples and generates attack data samples; the converter combines the non-attack features of the generated attack data samples with the attack features of the attack behavior data samples to form new attack data samples; the discriminator is a binary classifier that is trained on the normal network behavior data samples in the related data set together with the new attack data samples produced by the converter and judges whether an input data sample is a real data sample or a generated data sample; and the training result parameters are then passed to the generator for iterative training to obtain the trained generative adversarial network model;
and an attack data generation module, configured to generate target attack data with the trained generative adversarial network model.
Since the system described in the second embodiment of the invention is the system used to implement the method of the first embodiment for generating attack data for an intrusion detection system based on a generative adversarial network, a person skilled in the art can understand the specific structure of the system from the method described in the first embodiment, and the details are not repeated here. All systems used to implement the method of the first embodiment fall within the intended scope of protection of the invention.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A method for generating attack data for an intrusion detection system based on a generative adversarial network, comprising:
S1: acquiring data traffic, wherein the data traffic comprises normal network behavior data traffic and attack behavior data traffic;
S2: performing feature analysis on the acquired data traffic with a traffic analysis tool to obtain a related data set, wherein the related data set comprises normal network behavior data samples and attack behavior data samples, both of which contain attack features and non-attack features;
S3: screening features with a random forest algorithm, marking the attack features and non-attack features of the data samples in the related data set, and then preprocessing the feature-marked data set;
S4: constructing a generative adversarial network model comprising a generator, a converter, and a discriminator, wherein the generator learns the feature distribution of the normal network behavior data samples and generates attack data samples; the converter combines the non-attack features of the generated attack data samples with the attack features of the attack behavior data samples to form new attack data samples; the discriminator is a binary classifier that is trained on the normal network behavior data samples in the related data set together with the new attack data samples produced by the converter and judges whether an input data sample is a real data sample or a generated data sample; and the training result parameters are then passed to the generator for iterative training to obtain the trained generative adversarial network model;
S5: generating target attack data with the trained generative adversarial network model.
2. The method for generating attack data for an intrusion detection system based on a generative adversarial network as recited in claim 1, wherein after step S5, the method further comprises:
setting up an intrusion detection system based on a deep belief network to test the attack performance of the generated target attack data.
3. The method as claimed in claim 1, wherein screening features with a random forest algorithm and marking the attack features and non-attack features of the data samples in the related data set in step S3 comprises:
screening features with the random forest algorithm, marking the features whose importance rank meets a preset condition as attack features, and marking the remaining features as non-attack features.
4. The method as claimed in claim 1, wherein preprocessing the feature-marked data set in step S3 comprises:
removing abnormal data from the feature-marked data set, deleting data that contains infinite values or null values, and converting date values into timestamps.
5. The method as claimed in claim 1, wherein the iterative training in step S4 uses the following loss function:
$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]$$
wherein $P_r$ is the probability distribution of the real data samples, $P_g$ is the probability distribution of the generated data samples, $W(P_r, P_g)$ is the Wasserstein distance between $P_r$ and $P_g$, and $\Pi(P_r, P_g)$ is the set of all possible joint distributions that combine $P_r$ and $P_g$; for each joint distribution $\gamma$ in this set, a pair of samples $x$ and $y$ is obtained by sampling from $\gamma$, $\lVert x - y \rVert$ is the distance between the samples, $\mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]$ is the expected distance between the sample pair under the joint distribution $\gamma$, and $\inf_{\gamma \in \Pi(P_r, P_g)}$ denotes the infimum (greatest lower bound) of this expected value over all such joint distributions.
6. A system for generating attack data for an intrusion detection system based on a generative adversarial network, comprising:
a data traffic acquisition module, configured to acquire data traffic, wherein the data traffic comprises normal network behavior data traffic and attack behavior data traffic;
a feature analysis module, configured to perform feature analysis on the acquired data traffic with a traffic analysis tool to obtain a related data set, wherein the related data set comprises normal network behavior data samples and attack behavior data samples, both of which contain attack features and non-attack features;
a feature screening and preprocessing module, configured to screen features with a random forest algorithm, mark the attack features and non-attack features of the data samples in the related data set, and then preprocess the feature-marked data set;
a model construction and training module, configured to construct a generative adversarial network model comprising a generator, a converter, and a discriminator, wherein the generator learns the feature distribution of the normal network behavior data samples and generates attack data samples; the converter combines the non-attack features of the generated attack data samples with the attack features of the attack behavior data samples to form new attack data samples; the discriminator is a binary classifier that is trained on the normal network behavior data samples in the related data set together with the new attack data samples produced by the converter and judges whether an input data sample is a real data sample or a generated data sample; and the training result parameters are then passed to the generator for iterative training to obtain the trained generative adversarial network model;
and an attack data generation module, configured to generate target attack data with the trained generative adversarial network model.
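As an illustration of the Wasserstein distance referred to in claim 5, the one-dimensional case can be computed directly. For two empirical samples of equal size on the real line with cost |x - y|, the optimal joint distribution pairs the sorted values, so the distance reduces to the mean absolute difference of the sorted samples. This is a sketch of the quantity itself under those assumptions, not of the patent's training procedure; the source does not state how the distance is estimated during training, and the sample values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal coupling matches sorted values pairwise.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical values of a single feature for real and generated samples.
real = [0.0, 1.0, 2.0, 3.0]
generated = [1.5, 0.5, 3.5, 2.5]
print(wasserstein_1d(real, generated))  # 0.5
```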

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210485160.4A CN115021965B (en) 2022-05-06 2022-05-06 Method and system for generating attack data of intrusion detection system based on generation type countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210485160.4A CN115021965B (en) 2022-05-06 2022-05-06 Method and system for generating attack data of intrusion detection system based on generation type countermeasure network

Publications (2)

Publication Number Publication Date
CN115021965A true CN115021965A (en) 2022-09-06
CN115021965B CN115021965B (en) 2024-04-02

Family

ID=83068968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210485160.4A Active CN115021965B (en) 2022-05-06 2022-05-06 Method and system for generating attack data of intrusion detection system based on generation type countermeasure network

Country Status (1)

Country Link
CN (1) CN115021965B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331245A (en) * 2022-10-12 2022-11-11 中南民族大学 Table structure identification method based on image instance segmentation
CN117527451A (en) * 2024-01-08 2024-02-06 国网江苏省电力有限公司苏州供电分公司 Network intrusion detection method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110061869A (en) * 2019-04-09 2019-07-26 中南民族大学 A kind of network path classification method and device based on keyword
WO2020192849A1 (en) * 2019-03-28 2020-10-01 Conti Temic Microelectronic Gmbh Automatic identification and classification of adversarial attacks
CN113395280A (en) * 2021-06-11 2021-09-14 成都为辰信息科技有限公司 Anti-confusion network intrusion detection method based on generation of countermeasure network
CN113922985A (en) * 2021-09-03 2022-01-11 西南科技大学 Network intrusion detection method and system based on ensemble learning
CN114091661A (en) * 2021-11-24 2022-02-25 北京工业大学 Oversampling method for improving intrusion detection performance based on generation countermeasure network and k-nearest neighbor algorithm

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020192849A1 (en) * 2019-03-28 2020-10-01 Conti Temic Microelectronic Gmbh Automatic identification and classification of adversarial attacks
CN110061869A (en) * 2019-04-09 2019-07-26 中南民族大学 A kind of network path classification method and device based on keyword
CN113395280A (en) * 2021-06-11 2021-09-14 成都为辰信息科技有限公司 Anti-confusion network intrusion detection method based on generation of countermeasure network
CN113922985A (en) * 2021-09-03 2022-01-11 西南科技大学 Network intrusion detection method and system based on ensemble learning
CN114091661A (en) * 2021-11-24 2022-02-25 北京工业大学 Oversampling method for improving intrusion detection performance based on generation countermeasure network and k-nearest neighbor algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIHAN XIAO 等: "An intrusion detection model based on feature reduction and convolutional neural networks", 《IEEE ACCESS》, 12 March 2019 (2019-03-12) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331245A (en) * 2022-10-12 2022-11-11 中南民族大学 Table structure identification method based on image instance segmentation
CN115331245B (en) * 2022-10-12 2023-02-03 中南民族大学 Table structure identification method based on image instance segmentation
CN117527451A (en) * 2024-01-08 2024-02-06 国网江苏省电力有限公司苏州供电分公司 Network intrusion detection method, device, electronic equipment and storage medium
CN117527451B (en) * 2024-01-08 2024-04-02 国网江苏省电力有限公司苏州供电分公司 Network intrusion detection method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115021965B (en) 2024-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant