CN112087447A - Rare attack-oriented network intrusion detection method - Google Patents

Publication number
CN112087447A
Authority
CN
China
Prior art keywords
attack
rare
data
network
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010928410.8A
Other languages
Chinese (zh)
Other versions
CN112087447B (en)
Inventor
钱俊彦
沈荔萍
翟仲毅
赵岭忠
李�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wanganxin Technology Co ltd
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University
Priority to CN202010928410.8A
Publication of CN112087447A
Application granted
Publication of CN112087447B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a rare attack-oriented network intrusion detection method. First, feature extraction is performed on unbalanced data using a genetic programming algorithm and a random forest to obtain an optimized subset. Next, the data are separated to construct a common attack set and a rare attack set. A joint attack classifier based on a convolutional neural network is then trained using the common attack set and the rare attack set. Finally, network data are detected using the trained joint attack classifier. By detecting common attacks in the network data first and rare attacks second, the invention ensures that both sub-classifiers of the joint attack classifier learn effectively even when the network data are unbalanced, so that both common and rare attacks can be detected and the detection of rare attacks is improved.

Description

Rare attack-oriented network intrusion detection method
Technical Field
The invention relates to the technical field of network intrusion detection, in particular to a rare attack-oriented network intrusion detection method.
Background
Researchers have studied network intrusion detection for many years. Heberlein proposed identifying anomalous behavior by monitoring the communication between network users. Later, to improve the capability and efficiency of recognizing network attacks, Mark Crobie introduced an active agent mechanism into intrusion detection research. Through continued research, network intrusion detection technologies have developed rapidly; however, they still suffer from low identification accuracy and a high false alarm rate, and they cannot respond to some novel network attacks in time.
In recent years, conventional machine learning techniques have been applied to network intrusion detection. These techniques model the features of network data and then use the learned model to predict future network behavior or to classify ongoing network behavior. Commonly used learning algorithms include decision trees, random forests, support vector machines, and the k-nearest neighbor algorithm. For example, Shinjinn Horng proposed a network intrusion detection method based on a support vector machine, which includes basic attack feature extraction and classification functions and which identifies DoS attacks and Probe attacks better than some earlier work. As another example, an intrusion detection method combining the k-nearest neighbor algorithm with a random forest has been proposed: the k-nearest neighbor algorithm preprocesses the dataset, and a random forest algorithm then trains a classifier on the newly obtained dataset, which improves detection performance.
With the development of deep learning, and given its good performance in image recognition, natural language processing, behavior recognition, and similar fields, researchers have introduced deep learning into network intrusion detection. Deep learning can learn the mapping from input to output directly from network data in order to extract attack features. This process requires little human intervention, which reduces manual effort and lowers the error rate. The deep learning models commonly used in network intrusion detection include the deep belief network (DBN), the convolutional neural network, the recurrent neural network, and its variant, the long short-term memory (LSTM) network. For example, Nadeem et al. proposed a DBN-based intrusion detection method that combines unsupervised and supervised learning algorithms and is more robust when processing massive network data. As another example, Staudemeyer modeled network traffic as time-series data and used it to train an LSTM recurrent neural network; experimental results show that this approach improves network intrusion detection performance.
Although many well-performing network intrusion detection methods based on conventional machine learning and deep learning have been proposed, they struggle when the distribution of attack categories in a network dataset is seriously unbalanced, that is, when rare attacks account for less than 2% of the total. In that case the rare attack categories are overwhelmed by the dominant non-rare attack categories (common attacks). Existing network intrusion detection methods then cannot effectively learn the features of the rare attack categories, which degrades the accuracy of the classification model and results in a low detection rate for rare attacks.
Disclosure of Invention
The invention aims to solve the problem that existing network intrusion detection methods detect rare attacks poorly in unbalanced network data, and provides a rare attack-oriented network intrusion detection method.
To solve this problem, the invention adopts the following technical solution:
The rare attack-oriented network intrusion detection method comprises the following steps:
Step 1: collect network attack data and process it into network data labeled with attack types, to serve as the training set;
Step 2: perform feature selection on the training set using a genetic programming algorithm, generating a sub-dataset with the selected features;
Step 3: evaluate the accuracy of the sub-dataset using a random forest, and calculate the fitness value of the sub-dataset from that accuracy. If the fitness value of the sub-dataset reaches the target fitness value, end the iteration, take the sub-dataset as the optimized dataset, and go to step 4; otherwise, take the sub-dataset as the training set, return to step 2, and continue iterating;
Step 4: divide the optimized dataset into a common attack set and a rare attack set according to the attack-type labels of the network data;
Step 5: first, construct an initial common attack classification model using a convolutional neural network, and train it on the common attack set to obtain the common attack classifier; then, construct an initial rare attack classification model from the common attack classifier, and train it on the rare attack set to obtain the rare attack classifier; finally, cascade the common attack classifier and the rare attack classifier to form the joint attack classifier;
Step 6: send the network data to be detected, collected in real time, to the joint attack classifier for classification. Within the joint attack classifier, the data is first sent to the common attack classifier for a first detection that judges whether a common attack is present: if so, the data is judged to be under a common attack; if not, the data is sent to the rare attack classifier for a second detection that judges whether a rare attack is present: if so, the data is judged to be under a rare attack; if not, the data is judged to be normal, unattacked network data.
In step 3 above, the fitness value $f_{fitness}$ of the sub-dataset is:
$$f_{fitness} = \frac{score}{n}$$
where score is the accuracy of the sub-dataset as evaluated by the random forest, and n is the number of trees in the random forest.
To prevent an endless loop in which no sub-dataset ever reaches the required target fitness value, step 3 also applies an iteration limit: if the preset number of iterations is reached and the fitness value of the sub-dataset still has not reached the target, the sub-dataset with the highest fitness value across all iterations is taken as the optimized dataset. This ensures that the best sub-dataset found is passed to the subsequent steps and supports the accuracy of the subsequent classification.
Compared with the prior art, the invention has the following characteristics:
1. The invention performs feature extraction on unbalanced data using a genetic programming algorithm and a random forest to obtain an optimized subset, which eliminates the influence of redundant data on rare attack patterns. In addition, the data are separated to construct a common attack set and a rare attack set, which balances the data distribution among attack types.
2. The invention provides a joint attack classifier based on a convolutional neural network, so that the rare attack classifier can learn efficiently from a small amount of sample data, which improves the ability to detect rare attacks. First, a common attack classifier is trained on the common attack types, for which a huge amount of data is available, and a suitable model is obtained by adjusting its parameters. Then, following the idea of transfer learning, the common attack classifier is used as the initialization model of the rare attack classifier, which is subsequently fine-tuned on the rare attack set. The two convolutional neural networks trained on the respective subsets form the common attack classifier and the rare attack classifier, and these two sub-classifiers are connected in series to construct the joint attack classifier used to detect rare attack types.
3. By detecting common attacks in the network data first and rare attacks second, the invention ensures that both sub-classifiers of the joint attack classifier learn effectively even when the network data are unbalanced, so that both common and rare attacks can be detected and the detection of rare attacks is improved.
Drawings
Fig. 1 shows the structure of the joint attack classifier.
Fig. 2 is a flowchart of the rare attack-oriented network intrusion detection method.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to specific examples.
In order to solve the problem of low detection performance of rare attacks caused by unbalanced data distribution among attack categories in the network intrusion detection process, the key technology adopted by the invention is as follows:
(1) A feature extraction technique based on genetic programming and random forests
In general, the data used for network intrusion detection not only contains a large amount of redundant information but is also unevenly distributed among attack categories. In unbalanced network data, redundant data seriously impairs the classifier's ability to classify rare attack data, which lowers overall network intrusion detection performance and raises the false alarm rate. To eliminate the influence of redundant data, the network data is processed before the attack classifier is trained. A commonly used technique is feature extraction, which removes redundant attack features and keeps the important information. To eliminate information that has a negative effect on detection and to highlight rare attack features, the invention cleans redundant data by combining genetic programming and random forests. The method first finds subgroups of the network data using a genetic programming algorithm, then evaluates each subgroup using a random forest algorithm, and finally selects the subgroup with the highest fitness value and forms a new dataset from the features of that subgroup that appear in the random forest. This process effectively prevents the generation of an overfitted feature set.
The genetic algorithm is designed according to the principle of biological evolution: a population representing potential solutions to a problem evolves continuously until an optimized solution set is found. The basic process is as follows:
The first step: randomly initialize N individuals, denote the resulting population as P(0), and set the number of iterations T of the population evolution.
The second step: select representative individuals that perform well, and pass them on to the next generation either directly or after crossover pairing.
The third step: apply evolution operations to the new population, changing the values of its individuals.
The fourth step: the population P(t) produces the next-generation population P(t+1) through crossover, mutation, and selection. When t equals T, the iteration ends and the population evolution is complete.
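The four steps above can be sketched as a short loop. This is only an illustrative sketch, not the patented procedure: the bit-string encoding, the top-half (truncation) selection, the single-point crossover, the elitism, and all parameter names and defaults are assumptions introduced here, and the toy fitness simply counts the 1 bits.

```python
import random

def evolve(fitness, n_bits=12, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Basic genetic algorithm over bit strings; returns the best individual seen."""
    rng = random.Random(seed)
    # Step 1: randomly initialise the population P(0).
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for t in range(generations):  # t = 0 .. T-1
        # Step 2: keep well-performing individuals as parents (assumed: top half).
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [parents[0][:]]  # the best parent is inherited directly
        # Steps 3-4: crossover and mutation produce P(t+1).
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)            # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best

# Toy problem (hypothetical): maximise the number of 1 bits.
best = evolve(fitness=sum)
print(best, sum(best))
```

Because a fixed seed is used, repeated runs return the same individual; in practice the seed, population size, and mutation rate would all be tuned.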
Genetic programming is an extension of the genetic algorithm. It is typically used as an optimization technique that searches for the solution set of a particular problem while forming a population of candidate solutions to that problem. Briefly, a genetic programming algorithm can be expressed by the following formula:
$$g_{t+1} = f_{fitness}(g_t), \quad t = 0, 1, 2, \ldots, n$$
The algorithm first randomly initializes a population $g_0$ as the original solution set of the problem to be solved. It then uses the fitness function $f_{fitness}$ to select several individuals from the current population $g_t$ and operates on them to form the next-generation population $g_{t+1}$. The selection process generates a tree that defines rules and functions, and the tree evolves through genetic operations such as crossover, mutation, and reproduction until the iterations are complete.
The random forest is an ensemble estimator that uses averaging to control overfitting. It is built as follows: first, different sample sets are constructed from the original data; then a decision tree is trained on each sample set; finally, all the decision trees are combined into a forest. Once the forest is formed, every tree votes on the same classification target, and the forest assigns the target to the category that receives the most votes.
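The sampling and voting described above can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the "trees" are hand-written threshold rules rather than trained decision trees, and the feature names `duration`, `failed_logins`, and `bytes` are hypothetical.

```python
import random
from collections import Counter

def bootstrap(samples, rng):
    """Draw a sample set of the same size from the original data, with replacement."""
    return [rng.choice(samples) for _ in samples]

def forest_predict(trees, sample):
    """Each tree votes; the forest returns the label with the most votes."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

# Three hypothetical "trees", each a simple threshold rule on one feature.
trees = [
    lambda s: "attack" if s["duration"] > 10 else "normal",
    lambda s: "attack" if s["failed_logins"] > 3 else "normal",
    lambda s: "attack" if s["bytes"] > 1000 else "normal",
]
sample = {"duration": 12, "failed_logins": 5, "bytes": 200}
print(forest_predict(trees, sample))  # attack (two votes against one)

# Each tree would normally be trained on its own bootstrap sample set.
resampled = bootstrap(list(range(10)), random.Random(0))
```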
The random forest effectively reduces the generation of overfitted solution sets, so it is used as a filter to evaluate each solution set obtained from the genetic programming algorithm. First, the individuals of a solution set are obtained from the genetic programming algorithm, and a temporary dataset X is constructed from the original dataset according to these individuals. X contains a large number of samples $(x_i, y_i)$, $i = 1, \ldots, l$, where $x_i \in \mathbb{R}^n$, n denotes the feature dimension of the dataset, and $y_i \in \mathbb{Z}^+$ denotes the class of the variable Y. Next, the random forest classifies the new dataset X, which yields the fitness value of that dataset. Finally, the dataset with the highest fitness value is selected from the generated population, and the features of that dataset that appear in the random forest are selected to construct the final optimized dataset.
The feature extraction process based on genetic programming and random forests is as follows:
1) Set the parameters of the genetic programming algorithm and the number of iterations.
2) Initialize the parameters and the iteration count. Each population represents a solution set of a specific problem, and the individuals in the population represent features in the dataset.
3) The new dataset obtained after one pass of genetic programming is called the sub-dataset after feature selection.
4) Evaluate the accuracy of the sub-dataset using the random forest algorithm, and calculate the fitness value of each sub-dataset using the fitness function, so that the sub-dataset produced by each round of feature selection is evaluated.
The fitness function is defined as follows:
$$f_{fitness} = \frac{score}{n}$$
where score is the classification accuracy obtained for each sub-dataset using the random forest, and n is the number of trees in the forest.
5) Repeat the above process until the fitness value of the sub-dataset reaches the target fitness value or the preset number of iterations is reached. If the fitness value reaches the target, the iteration ends and that sub-dataset is taken as the optimized dataset; if the target is still not reached when the iterations finish, the sub-dataset with the highest fitness value across all iterations is taken as the optimized dataset.
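The iteration and its two stopping conditions can be sketched as follows. This is a simplified illustration under several assumptions: dropping one random feature per round stands in for the genetic programming step, `score_fn` is a hypothetical callable standing in for the random forest accuracy, and `toy_score` together with the `INFORMATIVE` feature indices is invented purely so the sketch runs. Only the fitness formula and the stopping rules come from the text.

```python
import random

def fitness(score, n_trees):
    """Fitness as defined in the text: f_fitness = score / n."""
    return score / n_trees

def select_features(n_features, score_fn, n_trees, target, max_iter, seed=0):
    rng = random.Random(seed)
    current = list(range(n_features))        # start from the full training set
    best, best_fit, history = None, float("-inf"), []
    for _ in range(max_iter):
        # Stand-in for the genetic programming step: drop one random feature.
        if len(current) > 1:
            drop = rng.choice(current)
            current = [f for f in current if f != drop]
        fit = fitness(score_fn(current), n_trees)
        history.append(fit)
        if fit > best_fit:                   # remember the best sub-dataset so far
            best, best_fit = current, fit
        if fit >= target:                    # target reached: stop iterating
            break
    # If the target was never reached, the best sub-dataset of all iterations is used.
    return best, best_fit, history

INFORMATIVE = {0, 1, 2}                      # hypothetical useful feature indices

def toy_score(features):
    """Toy accuracy: rewards informative features, penalises the rest."""
    kept = set(features)
    good, noise = len(kept & INFORMATIVE), len(kept - INFORMATIVE)
    return max(0.0, min(1.0, 0.6 + 0.1 * good - 0.02 * noise))

best, best_fit, history = select_features(8, toy_score, n_trees=10,
                                           target=0.095, max_iter=20)
print(best, best_fit, len(history))
```

With this toy score the target of 0.095 is unreachable, so the run exercises the fallback branch and returns the best sub-dataset seen across all 20 iterations.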
(2) A data separation technique
To address the low detection capability for rare attacks caused by unbalanced attack data, the invention separates the attack dataset and constructs a joint attack classifier. Take the NSL-KDD dataset as an example: it contains few training samples of the U2R and R2L types, so a conventional machine learning or deep learning classifier trained directly on the original dataset has difficulty identifying these two attack types. The invention therefore separates the dataset into a common attack set and a rare attack set, which reduces the imbalance in the distribution of attack data. The common attack set consists of a large amount of data of the common attack types plus part of the normal network traffic; the rare attack set consists of the rare attack types, which contain few records, plus part of the normal network traffic. When the subsets are constructed, the normal-type data is divided between them by random selection. Compared with the original data, the distributions within the common attack set and the rare attack set are more balanced, and attack classifiers trained on these two datasets achieve better classification performance. When handling unbalanced data, a common approach is to reduce the imbalance between classes by sampling the data and to build an attack classifier that learns the attack patterns of all classes together. The separation method, meanwhile, provides the data basis for designing a joint attack classifier that addresses the low detection performance on rare attacks caused by data imbalance. After separation, the common attack set contains many samples and the rare attack set contains few.
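The separation can be sketched as a simple partition by attack category. The sketch makes several assumptions that are not stated in the text: DoS and Probe are treated as the common categories, normal traffic is split evenly between the two sets, and each record is represented as a `(features, category)` tuple. The text only says that U2R and R2L are rare and that normal data is divided by random selection.

```python
import random

RARE = {"U2R", "R2L"}        # rare categories named in the NSL-KDD example
COMMON = {"DoS", "Probe"}    # assumed common categories

def separate(records, seed=0):
    """Split (features, category) records into a common set and a rare set."""
    rng = random.Random(seed)
    normal = [r for r in records if r[1] == "Normal"]
    rng.shuffle(normal)                      # random division of normal traffic
    half = len(normal) // 2                  # assumed 50/50 split
    common_set = [r for r in records if r[1] in COMMON] + normal[:half]
    rare_set = [r for r in records if r[1] in RARE] + normal[half:]
    return common_set, rare_set

# Tiny synthetic dataset; the single-number feature vectors are placeholders.
records = ([([i], "DoS") for i in range(6)] + [([i], "Probe") for i in range(3)]
           + [([0], "U2R")] + [([i], "R2L") for i in range(2)]
           + [([i], "Normal") for i in range(8)])
common_set, rare_set = separate(records)
print(len(common_set), len(rare_set))  # 13 7
```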
(3) A joint attack classifier based on deep learning and traditional machine learning
Data separation balances the distribution among attack types, and a joint attack classifier trained on the new datasets eliminates the influence of common attack patterns on rare attack patterns. However, after separation the rare attack set contains few samples, which may impair the performance of the rare attack classifier. Therefore, given that the common attack set is very large and the rare attack set is small, a deep learning technique is used to construct the common attack classifier, a traditional machine learning technique is used to construct the rare attack classifier, and the two attack classifiers are connected to form the joint attack classifier. First, a common attack classifier is constructed using a convolutional neural network, and its training parameters are adjusted on the common attack set to obtain a suitable common attack classifier. Then the idea of transfer learning is introduced: the common attack classifier serves as the source model that initializes the rare attack classifier, and the training parameters are fine-tuned on the rare attack set to obtain a suitable rare attack classifier. Finally, the two classifiers are connected to form the joint attack classifier, as shown in Fig. 1.
The convolutional neural network reduces the number of parameters in the network mainly through weight sharing, and most conventional convolutional neural networks are built for tasks such as image recognition and video processing. In a convolutional neural network, the convolutional layer extracts high-dimensional features of the data, and the pooling layer reduces the dimension of the resulting feature set, lowering the computational complexity. Image data is generally still large after several convolutional layers, so a reasonable pooling operation effectively reduces the computational cost. Network data is different: after feature extraction, mapping it to a higher-dimensional space with a convolutional neural network makes the resulting features very sparse, and reducing the dimension of these sparse features by pooling would lose important information in the data and thus degrade the training of the attack classifier. The convolutional neural network is trained with the back-propagation algorithm: first, given the input data, the output is computed through a series of high-dimensional mapping operations; then an error function measures the difference between the output and the true label of the input sample; finally, the weights are updated through back propagation. The convolutional neural network used here consists of an input layer, a convolutional layer, a fully connected layer, and a classification layer.
The joint attack classifier of the invention is based on convolutional neural networks and uses two convolutional neural networks with the same structure. To eliminate the influence of common attacks in the dataset on rare attacks, the network data is separated into a common attack set and a rare attack set. In practice, when the data or tasks at hand are related to existing ones and the data available for the problem to be solved is insufficient, transfer learning can be applied. Transfer learning improves the learning efficiency of a new model by sharing well-trained model parameters with it; the new model is then fine-tuned on its own task. To give the rare attack classifier effective classification performance despite the small sample size of the rare attack set, the invention applies transfer learning when training the attack classifiers.
The specific algorithm for training the common attack classifier is as follows:
Input: a common attack dataset $X = (x_1, x_2, \ldots, x_n)$, where n is the number of samples in the dataset; the number of filters T; and the number of fully connected layers L.
Output: the class y of the input sample.
The first step: for each filter $t \in [1, T]$, initialize the filter weight $W_t$ and bias $b_t$ such that $W_t = 0$, $b_t = 0$. Then compute $f_t = \tanh(W_t X + b_t)$, the new feature set obtained after the input data passes through the convolution operation.
The second step: for each fully connected layer $l \in [1, L]$, initialize the layer weight $W_l$ and bias $b_l$ such that $W_l = 0$, $b_l = 0$. Then define $h_0 = f$ and compute the output of the l-th fully connected layer: $h_l = \mathrm{relu}(W_l h_{l-1} + b_l)$.
The third step: compute the output of the final classification layer: $y = \mathrm{softmax}(W_L h_{L-1} + b_L)$.
The fourth step: compute the loss function and update the parameters using the gradient descent algorithm.
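The forward computation in the first three steps can be sketched in plain Python. This is an illustration under assumptions rather than the patented network: the 1-D kernel width, the layer sizes, and the helper names are invented here, and the weights are drawn from small random values instead of the zero initialization stated above (with tanh and relu, all-zero weights would produce identical, zero-valued activations). The gradient update of the fourth step is omitted; only the cross-entropy loss is shown.

```python
import math
import random

def conv1d(x, kernel, bias):
    """f_t = tanh(W_t * x + b_t) for one filter sliding over the feature vector."""
    k = len(kernel)
    return [math.tanh(sum(w * x[i + j] for j, w in enumerate(kernel)) + bias)
            for i in range(len(x) - k + 1)]

def dense(h, W, b):
    return [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)                               # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def build(d, T, k, hidden, n_classes, seed=0):
    """Create T filters of width k and the fully connected layers (assumed sizes)."""
    rng = random.Random(seed)
    filters = [([rng.uniform(-0.1, 0.1) for _ in range(k)], 0.0) for _ in range(T)]
    sizes = [T * (d - k + 1)] + hidden + [n_classes]
    layers = [([[rng.uniform(-0.1, 0.1) for _ in range(i)] for _ in range(o)],
               [0.0] * o) for i, o in zip(sizes, sizes[1:])]
    return filters, layers

def forward(x, filters, layers):
    h = [v for kernel, bias in filters for v in conv1d(x, kernel, bias)]  # h_0 = f
    for W, b in layers[:-1]:
        h = relu(dense(h, W, b))             # h_l = relu(W_l h_{l-1} + b_l)
    W, b = layers[-1]
    return softmax(dense(h, W, b))           # y = softmax(W_L h_{L-1} + b_L)

def cross_entropy(y, label):
    return -math.log(y[label])

rng = random.Random(1)
x = [rng.random() for _ in range(41)]        # one sample with 41 features
filters, layers = build(d=41, T=3, k=3, hidden=[16], n_classes=4)
y = forward(x, filters, layers)
print(y, cross_entropy(y, 0))
```

No pooling layer appears in the sketch, matching the layer list given in the text (input, convolution, fully connected, classification).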
First, the common attack model is trained, with the stochastic gradient descent algorithm used to update the weights and biases. The resulting common attack classifier serves as the initialization model. To train the rare attack classifier, all weight matrices W and biases b of the common attack classifier are first obtained; once the rare attack classifier has been constructed, it is initialized with these parameters. The specific procedure is as follows:
Input: a rare attack dataset $X' = (x'_1, x'_2, \ldots, x'_{n'})$, where n' is the number of samples in the dataset; the number of filters T; the number of fully connected layers L; and the weights $W_t$ and biases $b_t$ obtained from the common attack classifier.
Output: the class y' of the input sample.
The first step: for each filter $t \in [1, T]$ and each fully connected layer $l \in [1, L]$, initialize the weights $W'_t$ and biases $b'_t$ such that $W'_t = W_t$, $b'_t = b_t$.
The second step: given the rare attack input samples, perform forward propagation to obtain the output, and then fine-tune the parameters of the classifier on the rare attack set using the back-propagation algorithm.
The two classifiers are trained independently: the common attack classifier learns the common attack patterns and the normal pattern, and the rare attack classifier learns the rare attack patterns and the normal pattern. In the detection stage, the two attack classifiers are connected to form the joint attack classifier. The network data is first classified by the common attack classifier; if the result is the normal type, the data is examined again by the rare attack classifier.
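The cascade in the detection stage can be sketched as a two-stage decision function. The two classifiers are passed in as hypothetical callables that return a label; the threshold rules and the feature names `conn_rate` and `root_shell` below are placeholders for trained models, not part of the described method.

```python
def joint_classify(sample, common_clf, rare_clf):
    """Run the common attack classifier first, then the rare attack classifier."""
    label = common_clf(sample)
    if label != "normal":
        return label                 # common attack found at the first detection
    label = rare_clf(sample)
    if label != "normal":
        return label                 # rare attack found at the second detection
    return "normal"                  # neither classifier reported an attack

# Placeholder classifiers standing in for the two trained networks.
common_clf = lambda s: "DoS" if s["conn_rate"] > 100 else "normal"
rare_clf = lambda s: "U2R" if s["root_shell"] else "normal"

print(joint_classify({"conn_rate": 500, "root_shell": False}, common_clf, rare_clf))  # DoS
print(joint_classify({"conn_rate": 3, "root_shell": True}, common_clf, rare_clf))     # U2R
print(joint_classify({"conn_rate": 3, "root_shell": False}, common_clf, rare_clf))    # normal
```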
Referring to Fig. 2, the rare attack-oriented network intrusion detection method specifically includes the following steps:
Step 1: collect network attack data and process it into network data labeled with attack types, to serve as the training set.
The classic datasets in the field of network intrusion detection are the KDD99 dataset and the NSL-KDD dataset. The KDD99 dataset was collected and created by American researchers who simulated different network attacks in a real network environment, and it is widely used to evaluate network intrusion detection methods. Although the original KDD99 dataset contains a huge number of attack samples, many of them are repeated and redundant, and a large number of repeated samples in the training set biases the learning process of the classifier. The NSL-KDD dataset is an upgraded version of the KDD99 dataset that eliminates redundant samples from the training set and deletes duplicate records from the test set. The NSL-KDD dataset is therefore used in the study of the present invention. The NSL-KDD dataset consists of network traffic data. Each record corresponds to a sequence of TCP packets within a certain time window, during which data is transmitted from a source address to a destination address under a given protocol. Each sample includes 41 feature attributes and 1 sample class label.
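A record with 41 feature attributes and 1 class label can be split as sketched below. The comma-separated layout and the function name `parse_record` are assumptions made for illustration, and the example line is synthetic rather than taken from the dataset; any columns after the label are simply ignored.

```python
def parse_record(line, n_features=41):
    """Split one comma-separated record into its feature list and class label."""
    fields = line.strip().split(",")
    if len(fields) < n_features + 1:
        raise ValueError(f"expected at least {n_features + 1} fields, got {len(fields)}")
    return fields[:n_features], fields[n_features]

# Synthetic record: 41 placeholder feature values followed by the label.
line = ",".join(["0", "tcp", "http", "SF"] + ["0"] * 37 + ["normal"])
features, label = parse_record(line)
print(len(features), label)  # 41 normal
```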
Step 2: perform feature selection on the training set using a genetic programming algorithm to generate a sub-dataset containing the selected features.
Step 3: evaluate the accuracy of the sub-dataset using a random forest, and calculate the fitness value of the sub-dataset from that accuracy. The fitness value $f_{fitness}$ of the sub-dataset is:
Figure BDA0002669293110000071
where score is the accuracy of the sub-dataset as evaluated by the random forest, and n is the number of trees in the random forest.
If the fitness value of the sub-dataset reaches the target fitness value, the iteration ends, the sub-dataset is taken as the optimized dataset, and the method proceeds to step 4; otherwise, the sub-dataset is used as the training set and the method returns to step 2 to continue iterating.
If the preset number of iterations is reached and the fitness value of the sub-dataset has still not reached the target fitness value, the sub-dataset with the highest fitness value across all iterations is taken as the optimized dataset, and the method proceeds to step 4.
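The control flow of steps 2 and 3 (iterate until the target is reached, otherwise fall back to the best sub-dataset seen) can be sketched as follows. The genetic-programming selection and the random-forest-based fitness are passed in as callables and replaced by toy stand-ins here, because the fitness formula itself is not reproduced in this text; `optimise_features`, `propose_subset`, `fitness`, and the toy feature names are hypothetical.

```python
def optimise_features(features, propose_subset, fitness, target, max_iters):
    """Iterate feature selection until the target fitness or the iteration cap is hit."""
    best_subset, best_fit = None, float("-inf")
    current = list(features)
    for _ in range(max_iters):
        current = propose_subset(current)  # step 2: feature selection (stubbed)
        fit = fitness(current)             # step 3: fitness of the sub-dataset (stubbed)
        if fit > best_fit:
            best_subset, best_fit = list(current), fit
        if fit >= target:
            return current, fit            # target reached: stop early
    # Iteration cap reached: fall back to the best sub-dataset over all iterations.
    return best_subset, best_fit

# Toy stand-ins (assumptions): drop the last feature each round, and score a
# subset by the fraction of its features that are "useful".
USEFUL = {"f1", "f2"}

def drop_last(subset):
    return subset[:-1] if len(subset) > 1 else subset

def toy_fitness(subset):
    return len(USEFUL & set(subset)) / len(subset)

all_features = ["f1", "f2", "f3", "f4", "f5", "f6"]
reached = optimise_features(all_features, drop_last, toy_fitness, target=0.9, max_iters=10)
capped = optimise_features(all_features, drop_last, toy_fitness, target=0.9, max_iters=2)
```

In the first call the target is met after four rounds; in the second the cap of two rounds is hit first, so the best sub-dataset seen so far is returned instead.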
Step 4: divide the optimized dataset into a common attack set and a rare attack set according to the attack-type labels of the network data.
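A minimal sketch of this split is given below. Which attack categories count as rare is an assumption here (U2R and R2L as rare, DoS and Probe as common, following the usual NSL-KDD category grouping); normal records are placed in both sets because both classifiers also learn normal patterns. The function name and record layout are hypothetical.

```python
# Assumed grouping of NSL-KDD attack categories; adjust to the actual label scheme.
RARE_ATTACKS = {"U2R", "R2L"}
COMMON_ATTACKS = {"DoS", "Probe"}
NORMAL = "normal"

def split_by_attack_type(records):
    """Split (features, label) records into a common attack set and a rare attack set."""
    common_set, rare_set = [], []
    for features, label in records:
        if label == NORMAL:
            # Both classifiers learn the normal pattern, so normal data goes to both sets.
            common_set.append((features, label))
            rare_set.append((features, label))
        elif label in RARE_ATTACKS:
            rare_set.append((features, label))
        elif label in COMMON_ATTACKS:
            common_set.append((features, label))
        else:
            raise ValueError(f"unknown attack label: {label!r}")
    return common_set, rare_set

records = [([0.1], "normal"), ([0.9], "DoS"), ([0.4], "U2R"), ([0.7], "Probe"), ([0.3], "R2L")]
common_set, rare_set = split_by_attack_type(records)
```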
Step 5: first, construct an initial common attack classification model using a convolutional neural network, and train it on the common attack set to obtain the common attack classifier; then, construct an initial rare attack classification model from the common attack classifier, and train it on the rare attack set to obtain the rare attack classifier; finally, cascade the common attack classifier and the rare attack classifier to form the joint attack classifier.
Step 6: send the network data to be detected, collected in real time, to the joint attack classifier for classification. In the joint attack classifier, the network data to be detected is first sent to the common attack classifier for a first detection, which determines whether a common attack is present: if so, the network data to be detected is judged to be under a common attack; if not, the network data to be detected is sent to the rare attack classifier for a second detection, which determines whether a rare attack is present: if so, the network data to be detected is judged to be under a rare attack; if not, the network data to be detected is judged to be normal network data that has not been attacked.
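The two-stage decision in step 6 can be sketched as a small cascade function. The classifiers are represented as plain callables that return either "normal" or an attack label; `joint_classify` and the threshold-based stub classifiers are hypothetical and only illustrate the control flow, not the trained models.

```python
def joint_classify(sample, common_clf, rare_clf):
    """Cascade: run the common attack classifier first, then the rare attack classifier."""
    label = common_clf(sample)
    if label != "normal":
        return "common attack", label
    # Only data judged normal by the first stage is examined again.
    label = rare_clf(sample)
    if label != "normal":
        return "rare attack", label
    return "normal", "normal"

# Stub classifiers (assumptions) keyed on a single hypothetical feature each.
def stub_common_clf(sample):
    return "DoS" if sample["packet_rate"] > 1000 else "normal"

def stub_rare_clf(sample):
    return "U2R" if sample["root_shell"] else "normal"

flood = {"packet_rate": 5000, "root_shell": False}
escalation = {"packet_rate": 10, "root_shell": True}
benign = {"packet_rate": 10, "root_shell": False}
```

Because the second stage only sees data that the first stage judged normal, the rare attack classifier needs to separate rare attacks from normal traffic only, which matches how the two classifiers are trained.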
It should be noted that although the embodiments described above are illustrative, they do not limit the present invention, and the present invention is therefore not restricted to the specific embodiments described above. Other embodiments obtained by those skilled in the art under the teaching of the present invention, without departing from its principles, are considered to be within the scope of protection of the present invention.

Claims (3)

1. A rare attack-oriented network intrusion detection method, characterized by comprising the following steps:
step 1: collecting network attack data, and processing the collected network attack data into network data labeled with attack types as a training set;
step 2: performing feature selection on the training set using a genetic programming algorithm to generate a sub-dataset containing the selected features;
step 3: evaluating the accuracy of the sub-dataset using a random forest, and calculating the fitness value of the sub-dataset from the accuracy: if the fitness value of the sub-dataset reaches the target fitness value, ending the iteration, taking the sub-dataset as the optimized dataset, and proceeding to step 4; otherwise, using the sub-dataset as the training set and returning to step 2 to continue iterating;
step 4: dividing the optimized dataset into a common attack set and a rare attack set according to the attack-type labels of the network data;
step 5: first, constructing an initial common attack classification model using a convolutional neural network, and training it on the common attack set to obtain a common attack classifier; then, constructing an initial rare attack classification model from the common attack classifier, and training it on the rare attack set to obtain a rare attack classifier; finally, cascading the common attack classifier and the rare attack classifier to form a joint attack classifier;
step 6: sending the network data to be detected, collected in real time, to the joint attack classifier for classification; in the joint attack classifier, first sending the network data to be detected to the common attack classifier for a first detection to determine whether a common attack is present: if so, judging that the network data to be detected is under a common attack; if not, sending the network data to be detected to the rare attack classifier for a second detection to determine whether a rare attack is present: if so, judging that the network data to be detected is under a rare attack; if not, judging that the network data to be detected is normal network data that has not been attacked.
2. The rare attack-oriented network intrusion detection method according to claim 1, wherein, in step 3, the fitness value $f_{fitness}$ of the sub-dataset is:
Figure FDA0002669293100000011
where score is the accuracy of the sub-dataset as evaluated by the random forest, and n is the number of trees in the random forest.
3. The rare attack-oriented network intrusion detection method according to claim 1, wherein, in step 3, if the preset number of iterations is reached and the fitness value of the sub-dataset has still not reached the target fitness value, the sub-dataset with the highest fitness value across all iterations is taken as the optimized dataset, and the method proceeds to step 4.
CN202010928410.8A 2020-09-07 2020-09-07 Rare attack-oriented network intrusion detection method Active CN112087447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010928410.8A CN112087447B (en) 2020-09-07 2020-09-07 Rare attack-oriented network intrusion detection method

Publications (2)

Publication Number Publication Date
CN112087447A true CN112087447A (en) 2020-12-15
CN112087447B CN112087447B (en) 2022-05-06

Family

ID=73732619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010928410.8A Active CN112087447B (en) 2020-09-07 2020-09-07 Rare attack-oriented network intrusion detection method

Country Status (1)

Country Link
CN (1) CN112087447B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504332A (en) * 2014-12-29 2015-04-08 南京大学 Negative selection intrusion detection method based on secondary mobile node strategy
US20190281076A1 (en) * 2017-02-27 2019-09-12 Amazon Technologies, Inc. Intelligent security management
CN107231384A (en) * 2017-08-10 2017-10-03 北京科技大学 A kind of ddos attack detection defence method cut into slices towards 5g networks and system
CN108694476A (en) * 2018-06-29 2018-10-23 山东财经大学 A kind of convolutional neural networks Stock Price Fluctuation prediction technique of combination financial and economic news
CN109829514A (en) * 2019-03-07 2019-05-31 西安电子科技大学 A kind of network inbreak detection method, device, computer equipment and storage medium
CN110941794A (en) * 2019-11-27 2020-03-31 浙江工业大学 Anti-attack defense method based on universal inverse disturbance defense matrix

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
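KOK CHIN KHOR et al.: "A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection", Applied Intelligence *
SONG Wenzhan et al.: "An ensemble evolutionary classification algorithm that limits the output model size" (一种限制输出模型规模的集成进化分类算法), Journal of Data Acquisition and Processing (《数据采集与处理》) *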

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989354A (en) * 2021-01-27 2021-06-18 中标软件有限公司 Attack detection method based on neural network and focus loss
CN113518063A (en) * 2021-03-01 2021-10-19 广东工业大学 Network intrusion detection method and system based on data enhancement and BilSTM
CN113114618A (en) * 2021-03-02 2021-07-13 西安电子科技大学 Internet of things equipment intrusion detection method based on traffic classification recognition
CN113315790A (en) * 2021-07-29 2021-08-27 湖南华菱电子商务有限公司 Intrusion flow detection method, electronic device and storage medium
CN113609480A (en) * 2021-08-12 2021-11-05 广西师范大学 Multi-path learning intrusion detection method based on large-scale network flow
CN113609480B (en) * 2021-08-12 2023-04-28 广西师范大学 Multipath learning intrusion detection method based on large-scale network flow
CN113824725A (en) * 2021-09-24 2021-12-21 中国人民解放军国防科技大学 Network safety monitoring analysis method and system based on causal machine learning
CN114500071A (en) * 2022-02-10 2022-05-13 江苏大学 Self-adaptive fingerprint attack method and system for dynamic growth of target website
CN114500071B (en) * 2022-02-10 2024-04-16 江苏大学 Self-adaptive fingerprint attack method and system aiming at dynamic growth of target website

Also Published As

Publication number Publication date
CN112087447B (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230619

Address after: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee after: Guangzhou Dayu Chuangfu Technology Co.,Ltd.

Address before: 541004 No. 15 Yucai Road, Qixing District, Guilin, the Guangxi Zhuang Autonomous Region

Patentee before: Guangxi Normal University

TR01 Transfer of patent right

Effective date of registration: 20230725

Address after: 518000 floor 7, building 10, new material port, Changyuan, high tech middle school, Yuehai Street Science Park community, Nanshan District, Shenzhen, Guangdong

Patentee after: Shenzhen wanganxin Technology Co.,Ltd.

Address before: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee before: Guangzhou Dayu Chuangfu Technology Co.,Ltd.