CN109299741B - Network attack type identification method based on multi-layer detection

Network attack type identification method based on multi-layer detection

Info

Publication number
CN109299741B
CN109299741B
Authority
CN
China
Prior art keywords
classification
data
data set
training
classification module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811146113.7A
Other languages
Chinese (zh)
Other versions
CN109299741A (en)
Inventor
胡昌振
吕坤
孙冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Publication of CN109299741A publication Critical patent/CN109299741A/en
Application granted granted Critical
Publication of CN109299741B publication Critical patent/CN109299741B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L 63/1416 Event detection, e.g. attack signature detection

Abstract

The invention relates to a network attack type identification method based on multilayer detection, and belongs to the technical field of information security. The specific operation steps are: step one, acquire original training data and preprocess them; step two, construct an integrated classification model; step three, train the integrated classification model; step four, preprocess the test data; step five, classify the test data. Compared with the prior art, the method has the following advantages. First, the SMOTE algorithm is adopted to up-sample minority-class samples and down-sample majority-class samples, solving the sample-imbalance problem in the data set. Second, an integrated model is adopted, improving detection accuracy and recall. Third, the fruit fly optimization algorithm (FOA) is combined with the Support Vector Machine (SVM) to realize optimal, adaptive selection of the parameters C and γ in the SVM.

Description

Network attack type identification method based on multi-layer detection
Technical Field
The invention relates to a network attack type identification method based on multilayer detection, and belongs to the technical field of information security.
Background
In cyberspace, the number and scale of network attacks have increased dramatically in recent years. The basic types of network attack include Denial of Service (DoS), unauthorized remote host access (Remote-to-Local, R2L), unauthorized super-user access (User-to-Root, U2R) and probing (Probing), each of which comprises several sub-attack types. To detect these attacks effectively, deploying efficient intrusion detection systems has become an urgent task.
The commonly used network attack detection methods are as follows. First, rule-based detection; its disadvantages are that new intrusions are difficult to detect, and editing the rules is time-consuming and highly dependent on the known intrusion knowledge base. Second, entropy-based detection relying on the distribution of network traffic features; its defect is that entropy expresses randomness, so abnormal traffic that does not disturb randomness cannot be detected. Third, detection based on machine learning, such as neural networks, support vector machines and clustering algorithms; these methods can detect new intrusions and are widely applied in current intrusion detection, but their results are strongly affected by data imbalance and by the parameters of the algorithm model.
Disclosure of Invention
The invention aims to solve the problems of unbalanced network attack detection data sets and low accuracy and recall rate of a network attack classification algorithm, and provides a network attack type identification method based on multilayer detection.
The invention is realized by the following technical scheme.
The invention provides a network attack type identification method based on multilayer detection, which comprises the following specific operation steps:
step one, acquiring original training data and preprocessing.
Step 1.1: and acquiring network attack data to form an original training data set. The network attack data includes numerical features and character discrete features. The character discrete type features include: protocol type, service type, and connection error identification.
Step 1.2: each piece of original training data in the original training data set is converted into a numerical type original training data feature vector. The method specifically comprises the following steps:
Step 1.2.1: extract the character discrete features from each piece of data and encode each of them as a one-hot vector, one one-hot vector per character discrete feature.
Step 1.2.2: constructing a numerical characteristic vector by using the value of the numerical characteristic in each piece of data;
step 1.2.3: and merging the numerical characteristic vector in the step 1.2.2 with all the one-hot vectors obtained in the step 1.2.1.
Through the operation of the steps, a numerical-type original training data feature vector is obtained corresponding to an original training data.
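For illustration, a minimal Python sketch of this preprocessing step is given below; the library (scikit-learn ≥ 1.2), function names and column lists are assumptions, as the patent does not prescribe an implementation.

```python
# A sketch of step 1.2, assuming scikit-learn >= 1.2 and hypothetical column names.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

def to_feature_vectors(records, char_cols, num_cols):
    """One-hot encode the character discrete features and merge them
    with the numerical feature vector (steps 1.2.1-1.2.3)."""
    char_part = np.array([[r[c] for c in char_cols] for r in records])
    num_part = np.array([[r[c] for c in num_cols] for r in records], dtype=float)
    encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
    onehot = encoder.fit_transform(char_part)   # one one-hot block per character feature
    return np.hstack([num_part, onehot]), encoder

# Hypothetical usage with the three character features named in step 1.1:
# X_raw, encoder = to_feature_vectors(rows, ["PROTOCOL_TYPE", "SERVICE", "FLAG"], numeric_cols)
```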
Step 1.3: the problem of unbalanced quantity of each type of data of the original training data set is solved through data down sampling and data up sampling. The method specifically comprises the following steps:
case 1: if the amount of data of a certain type (denoted by symbol a) in the original training data set is much larger than that of data of other types, the amount of a type a is reduced by using a data down-sampling method, specifically: a part of data is randomly extracted from the data of the type A to reduce the data of the type A.
Case 2: if the number of a certain type (denoted by symbol B) in the original training data set is much lower than the number of other types of data, the data up-sampling method is adopted to increase the number of B types of data.
The data up-sampling algorithm is the SMOTE (Synthetic Minority Oversampling Technique) algorithm.
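A minimal sketch of step 1.3 follows, assuming the imbalanced-learn library; the patent names only the SMOTE algorithm, not a library, so the calls below are an illustrative assumption.

```python
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

def rebalance(X, y, down_counts, up_counts, k_neighbors=5):
    """Case 1: randomly down-sample majority classes to the counts in down_counts.
    Case 2: up-sample minority classes to the counts in up_counts with SMOTE."""
    X, y = RandomUnderSampler(sampling_strategy=down_counts).fit_resample(X, y)
    X, y = SMOTE(sampling_strategy=up_counts, k_neighbors=k_neighbors).fit_resample(X, y)
    return X, y
```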
The original training data set after one-hot coding, data down-sampling and data up-sampling is called the basic training data set, denoted by the symbol X. The symbol x_ij denotes the j-th feature of the i-th piece of data in X, i ∈ [1, n], where n is the number of data items in X.
step 1.4: the data in the basic training data set X is normalized by equation (1).
x'_ij = (x_ij − AVG_j) / STD_j    (1)

where x'_ij is the value obtained after normalizing x_ij; AVG_j is the mean of the j-th feature over all data in the basic training data set X, computed by formula (2); and STD_j is the standard deviation of the j-th feature over all data in X, computed by formula (3).

AVG_j = (1/n) × Σ_{i=1}^{n} x_ij    (2)

STD_j = sqrt( (1/n) × Σ_{i=1}^{n} (x_ij − AVG_j)² )    (3)
After the basic training data set is preprocessed through the operation of the first step, a training data set is obtained and is represented by a symbol X'.
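A sketch of the standardization of formulas (1)-(3) is given below; note that step 4.3 later reuses the training-set statistics AVG_j and STD_j on the test data. The zero-variance guard is an added assumption, not part of the patent text.

```python
import numpy as np

def standardize_fit(X):
    """Formulas (2) and (3): per-feature mean AVG_j and standard deviation STD_j."""
    avg = X.mean(axis=0)
    std = X.std(axis=0)                 # population standard deviation, ddof = 0
    std[std == 0.0] = 1.0               # assumed guard against constant features
    return avg, std

def standardize_apply(X, avg, std):
    """Formula (1): x'_ij = (x_ij - AVG_j) / STD_j."""
    return (X - avg) / std
```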
And step two, constructing an integrated classification model.
The integrated classification model comprises a GBDT (Gradient Boosting Decision Tree) classifier, a KNN classifier and a stacking classifier.
The GBDT classifier learns with the Boosting idea by iteratively constructing Classification And Regression Trees (CART). Let f_{t−1}(x) denote the GBDT classifier obtained after iteration t−1, where t is a positive integer; let f_t(x) denote the classifier obtained after iteration t; let L(y, f_{t−1}(x)) and L(y, f_t(x)) denote the loss functions of those classifiers; and let h_t(x) denote the fitting function learned in round t. In round t of GBDT learning, the task is to find the h_t(x) for which L(y, f_t(x)) in formula (4) takes its minimum value; the minimizing h_t(x) is found by fitting the negative gradient of the loss function.

L(y, f_t(x)) = L(y, f_{t−1}(x) + h_t(x))    (4)
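As an illustration, scikit-learn's GradientBoostingClassifier implements this negative-gradient fitting of CART trees; the library choice and hyperparameters below are assumptions, not the patent's specification.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Each boosting round t fits a CART tree h_t(x) to the negative gradient of the
# loss L(y, f_{t-1}(x)), realizing the update in formula (4).
gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=3)
# gbdt.fit(X_train, y_binary)     # y_binary: DoS vs. other labels from step 3.1.1
```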
The KNN classifier is used for classifying DoS (Denial of Service) type data and predicting their subtypes. The parameter K of the KNN classifier is set to 3.
The stacking classifier is used for classifying non-DoS type data. It is divided into a primary classification model and a secondary classification model. The primary model has an upper layer and a lower layer. The upper layer is formed by connecting 3 xgboost (eXtreme Gradient Boosting) classification module groups, 1 SVM (Support Vector Machine) classification module group, 1 GBDT classification module group and 1 RF (Random Forest) classification module group in parallel. Each xgboost classification module group is formed by connecting m xgboost classification modules in parallel, each SVM classification module group by m SVM classification modules in parallel, each GBDT classification module group by m GBDT classification modules in parallel, and each RF classification module group by m RF classification modules in parallel; m is a manually set value, m ∈ [3, 8]. The lower layer of the primary model is a splicing and voting module.
And the output ends of the 3 xgboost classification module groups, the 1 SVM classification module group, the 1 GBDT classification module group and the 1 RF classification module group at the upper layer of the primary model are respectively connected with the input ends of the splicing and voting modules at the lower layer of the primary model. In the training phase, the splicing and voting module is used for: and combining the output results of each xgboost classification module group, SVM classification module group, GBDT classification module group and RF classification module group of the upper layer of the primary model to obtain a vector matrix called stacking vector matrix. In the testing stage, the splicing and voting module is used for: corresponding to a piece of test data, output results of each xgboost classification module group, SVM classification module group, GBDT classification module group and RF classification module group of the upper layer of the primary model are respectively voted, each classification module group obtains a classification result, and then the classification results are combined to obtain a 1 x 6 stacking feature vector.
The secondary model is an SVM classifier, and the input of the secondary model is the stacking feature vector generated by the primary model.
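The following structural sketch (an assumed, simplified API) shows how the test-stage path through the stacking classifier can be organized: a per-group majority vote over its m modules, the six votes merged into a 1 × 6 stacking vector, and the secondary SVM deciding on that vector. Class labels are assumed to be integer-encoded so the vector can feed the SVM.

```python
from collections import Counter
import numpy as np

class StackingClassifier:
    def __init__(self, module_groups, secondary_svm):
        # module_groups: 3 xgboost + 1 SVM + 1 GBDT + 1 RF groups, each a list
        # of m trained modules; secondary_svm: the trained secondary-model SVM.
        self.module_groups = module_groups
        self.secondary_svm = secondary_svm

    def predict_one(self, x):
        votes = []
        for group in self.module_groups:
            preds = [m.predict(x.reshape(1, -1))[0] for m in group]
            votes.append(Counter(preds).most_common(1)[0][0])       # per-group vote
        stacking_vec = np.array(votes, dtype=float).reshape(1, -1)  # 1 x 6 stacking vector
        return self.secondary_svm.predict(stacking_vec)[0]
```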
The SVM classifier adopts the fruit fly optimization algorithm (FOA) to optimally select the SVM kernel function parameter (denoted by the symbol γ) and the penalty parameter (denoted by the symbol C). The specific operation steps are as follows:
Step 2.1: initialize the SVM kernel parameter γ and penalty parameter C, γ ∈ [0.001, 5], C ∈ [0.001, 5]. Set the starting position of the fruit fly swarm as (C_begin, γ_begin), where C_begin = C and γ_begin = γ.
Step 2.2: the population size (denoted by the symbol popsize), the number of iterations (denoted by the symbol epoch), and the search distance (denoted by the symbol val) of the penalty parameter C are setCRepresentation) and the search distance of the kernel parameter y (denoted by the symbol val)γRepresentation). popsize E [8,15 ]],epoch≥5,valC∈[0.05,0.5],valγ∈[0.001,0.01]。
Step 2.3: calculating the position of the pth fruit fly at the next moment according to the formulas (6) to (7), and using the symbol (C)pp) Denotes p ∈ [1, popsize >]。
Cp=Cbegin+valC×ε (6)
γp=γbegin+valγ×ε (7)
Wherein ε is a random value in the range of [ -1,1 ].
Step 2.4: if the penalty parameter C is less than 0.001 at the moment, C is 0.001; if C >5, then C ═ 5. If γ <0.001, γ is 0.001; when γ is greater than 5, γ is 5.
Step 2.5: and (4) calculating the fitness function values of the positions of all the drosophila flies obtained in the step 2.3 according to a formula (8).
Fit(Cpp)=accuracy(Cpp) (8)
Wherein, Fit (C)pp) The fitness function value of the position of the pth fruit fly is shown; accuracy (C)pp) Representing the SVM classifier at parameter (C)qq) Upper cross validation generated accuracy, Cq=Cpq=γp
Step 2.6, finding the maximum value (using the maximum value) in the fitness function values corresponding to the positions of all fruit flies at the current momentSymbol FitmaxRepresentation), and FitmaxThe corresponding position is judged to be Fit at the momentmaxIf the fitness function value is higher than the fitness function value of the initial position, Fit is usedmaxThe corresponding position replaces the initial position while saving the FitmaxThen the next iteration is performed. If it is at that time FitmaxIf the fitness function value is lower than the fitness function value of the initial position, the step 2.3 to the step 2.6 are repeatedly executed until the iteration times reach the epoch times, and the operation is finished.
The connection relation of the integrated classification model is as follows: external data enters the integrated classification model through the input end of the GBDT classifier; the output end of the GBDT classifier is respectively connected with the input ends of the KNN classifier and the stacking classifier; and the output of the KNN classifier and the stacking classifier is used as the external output of the integrated classification model.
And step three, training an integrated classification model.
And training an integrated classification model on the basis of the operation of the step one and the operation of the step two. The method specifically comprises the following steps:
step 3.1: the GBDT classifier is trained. The method specifically comprises the following steps:
Step 3.1.1: label the data in the training data set X' by category, as one of 2 classes: DoS type and other type.
Step 3.1.2: the GBDT classifier is trained using the labeled training data set X'.
Through the operation of step 3.1, the trained GBDT classifier is obtained.
Step 3.2: and training the KNN classifier. The method specifically comprises the following steps:
Step 3.2.1: construct a DoS type data set from the data labeled as DoS type in the training data set X', denoted by the symbol X'_1.

Step 3.2.2: label the data in the DoS type data set X'_1 with fine-grained classes. The data in X'_1 are subdivided into: smurf attacks, neptune attacks, back attacks, teardrop attacks, pod attacks, and Other attacks.
Step 3.2.3: to DoS type data set X'1Performing data down-sampling processing according to the subdivision type to solve the DoS type data set X'1The quantity of each subdivision type data is unbalanced; the data set after data down-sampling, called KNN training data set, is represented by symbol X1And (4) showing.
Step 3.2.4: training dataset X using KNN1And training the KNN classifier.
Through the operation of step 3.2, the trained KNN classifier is obtained.
Step 3.3: training a stacking classifier. The method specifically comprises the following steps:
Step 3.3.1: construct a stacking training data set from the data labeled as other types in the training data set X', denoted by the symbol X_2, and then label its data with fine-grained classes. The data in X_2 are subdivided into: Normal, Probe, U2L (unauthorized super-user access) and R2L (Remote-to-Local, unauthorized remote host access).
Step 3.3.2: training data set X2Is divided into m subsets, called 1 st subset, 2 nd subset, … …, m subsets. The number of data per subset is denoted by the symbol M, which is a positive integer.
Step 3.3.3: the set of RF classification modules is trained. The method specifically comprises the following steps:
step 3.3.3.1: the temporary variable is denoted by the symbol h, h ∈ [1, m ]. The initial value of h is set to 1.
Step 3.3.3.2: training data set X2As verification data. Then, using stacking training data set X2As training data, to train an untrained RF classification module of the set of RF classification modules.
Step 3.3.3.3: and inputting the data of the h-th subset into the trained RF classification module in step 3.3.3.2 for classification, so as to obtain an M × 1 vector matrix.
Step 3.3.3.4: if h < m, the value of h is incremented by 1 and steps 3.3.3.2 through 3.3.3.4 are repeated. Otherwise, the operation of step 3.3.3.5 is performed.
Step 3.3.3.5: and merging the classification results of the 1 st subset to the m th subset obtained in the step 3.3.3.2 to obtain a classification result of the data of the stacking training data set in the RF classification module group, and sending the classification result to the splicing and voting module.
Through the operations of steps 3.3.3.1 to 3.3.3.5, training of the RF classification module group is completed, and the classification result of the stacking training data set X_2 in the RF classification module group is obtained. The same out-of-fold procedure recurs for the other module groups below; a generic sketch follows.
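A generic sketch of this per-group training loop, assuming scikit-learn's clone as a convenience for copying an untrained module:

```python
import numpy as np
from sklearn.base import clone

def train_module_group(base_module, subsets_X, subsets_y):
    """Train the m modules of one classification module group: the h-th module is
    validated on the h-th subset and trained on the remaining m-1 subsets."""
    m = len(subsets_X)
    modules, oof_parts = [], []
    for h in range(m):
        X_tr = np.vstack([subsets_X[k] for k in range(m) if k != h])
        y_tr = np.concatenate([subsets_y[k] for k in range(m) if k != h])
        module = clone(base_module).fit(X_tr, y_tr)
        modules.append(module)
        oof_parts.append(module.predict(subsets_X[h]))  # the M x 1 vector of step 3.3.3.3
    return modules, np.concatenate(oof_parts)           # merged result for the group
```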
Step 3.3.4: training the SVM classification module group. The method specifically comprises the following steps:
step 3.3.4.1: the temporary variable is denoted by the symbol h, h ∈ [1, m ]. The initial value of h is set to 1.
Step 3.3.4.2: training data set X2As verification data. Then, using stacking training data set X2The other data is used as training data to train an untrained SVM classification module in the SVM classification module group.
Step 3.3.4.3: and inputting the data of the h-th subset into the SVM classification module trained in the step 3.3.4.2 for classification, so as to obtain an M × 1 vector matrix.
Step 3.3.4.4: if h < m, the value of h is incremented by 1 and steps 3.3.4.2 through 3.3.4.4 are repeated. Otherwise, the operation of step 3.3.4.5 is performed.
Step 3.3.4.5: merging the classification results of the 1 st subset to the m th subset obtained in the step 3.3.4.2 to obtain a stacking training data set X2The classification result of the data in the SVM classification module group is sent to the splicing and voting module.
Through the operations from step 3.3.4.1 to step 3.3.4.5, training of the SVM classification module group is completed, and a classification result of data of a stacking training data set in the SVM classification module group is obtained.
Step 3.3.5: and training the GBDT classification module group. The method specifically comprises the following steps:
step 3.3.5.1: the temporary variable is denoted by the symbol h, h ∈ [1, m ]. The initial value of h is set to 1.
Step 3.3.5.2: training data set X2As verification data. Then, using stacking training data set X2As training data, training an untrained GBDT classification module in the GBDT classification module group.
Step 3.3.5.3: and inputting the data of the h-th subset into the GBDT classification module trained in the step 3.3.5.2 for classification, so as to obtain an M × 1 vector matrix.
Step 3.3.5.4: if h < m, the value of h is incremented by 1 and steps 3.3.5.2 through 3.3.5.4 are repeated. Otherwise, the operation of step 3.3.5.5 is performed.
Step 3.3.5.5: merging the classification results of the 1 st subset to the m th subset obtained in the step 3.3.5.2 to obtain a stacking training data set X2And (4) the classification result of the data in the GBDT classification module group is sent to the splicing and voting module.
Through the operations from step 3.3.5.1 to step 3.3.5.5, the training of the GBDT classification module group is completed, and the classification result of the data of the stacking training data set in the GBDT classification module group is obtained.
Step 3.3.6: and training an XGBOOST classification module group. The method specifically comprises the following steps:
step 3.3.6.1: the temporary variable is denoted by the symbol h, h ∈ [1, m ]. The initial value of h is set to 1.
Step 3.3.6.2: training data set X2As verification data. Then, using stacking training data set X2The other data of XGBOOST classification module group is used as training data to train an untrained XGBOOST classification module in the XGBOOST classification module group.
Step 3.3.6.3: and inputting the h-th subset data into the trained XGBOOST classification module in step 3.3.6.2 for classification to obtain an Mx 1 vector matrix.
Step 3.3.6.4: if h < m, the value of h is incremented by 1 and steps 3.3.6.2 through 3.3.6.4 are repeated. Otherwise, the operation of step 3.3.6.5 is performed.
Step 3.3.6.5: and merging the classification results of the 1 st subset to the m th subset obtained in the step 3.3.6.2 to obtain the classification result of the data of the stacking training data set in the XGBOOST classification module group, and sending the classification result to the splicing and voting module.
Through the operations of steps 3.3.6.1 to 3.3.6.5, training of the XGBOOST classification module group is completed, and the classification result of the stacking training data set X_2 in the XGBOOST classification module group is obtained.
Step 3.3.7: repeating the step 3.3.6 for 2 times to finish the training of the other 2 XGB OST classification module groups and obtain a stacking training data set X2The data in the other 2 XGBOOST classification module groups are classified into results and sent to the splicing and voting module.
Step 3.3.8: the splicing and voting module carries out the stacking training data set X obtained from the step 3.3.3 to the step 3.3.72Combining the classification results of all the classification module groups to obtain a vector matrix of P multiplied by 6, namely a stacking vector matrix; wherein P represents a stacking training data set X2The amount of data of (c).
Step 3.3.9: inputting the stacking vector matrix obtained in the step 3.3.8 into a secondary model SVM classifier of a stacking classifier, and performing training operation to obtain a trained stacking classifier.
And finishing the training of the stacking classifier through the operation of the steps to obtain a trained integrated classification model.
And step four, preprocessing the test data. The method specifically comprises the following steps:
step 4.1: and acquiring network attack data to form an original test data set. The network attack data includes numerical features and character discrete features. The character discrete type features are as follows: protocol type, service type, and connection error identification.
Step 4.2: each piece of original test data in the original test data set is converted into a numerical type original test data feature vector. The method specifically comprises the following steps:
Step 4.2.1: extract the character discrete features from each piece of data and encode each of them as a one-hot vector, one one-hot vector per character discrete feature.
Step 4.2.2: constructing a numerical characteristic vector by using the value of the numerical characteristic in each piece of data;
step 4.2.3: and merging the numerical characteristic vector in the step 4.2.2 with the one-hot vector obtained in the step 4.2.1.
Through the above operations, one numerical original test data feature vector is obtained for each piece of original test data.
The original test data set after one-hot coding is called the basic test data set, denoted by the symbol X_test; the symbol x_test,ij denotes the j-th feature of the i-th piece of data in X_test.
Step 4.3: the basic test data set X is given by the formula (5)testThe data in (1) is normalized.
Figure GDA0001883560090000081
Wherein, x'test,ijAs data xtest,ijData obtained after normalization processing; AVGjThe average value of the jth feature of all data in the basic training data set X obtained in the step 1.4 is obtained; STDjThe standard deviation of the jth feature of all data in the basic training data set X obtained in step 1.4.
After the operation of step four, the basic test data set is preprocessed to obtain the test data set, denoted by the symbol X'_test.
And step five, classifying the test data.
And inputting the test data obtained through the preprocessing in the fourth step into the integrated classification model trained in the third step for classification. The method comprises the following specific steps:
step 5.1: inputting a piece of test data obtained through the preprocessing in the step four into the GBDT classifier, and if the classification result is the DoS type, executing the operation in the step 5.2; if the classification result is of non-DoS type, the operation of step 5.3 is performed.
Step 5.2: and inputting the test data into a KNN classifier for classification to obtain and output a final classification result, and finishing the operation.
Step 5.3: and respectively inputting the test data into each RF classification module in the RF classification module group, and outputting the classification result to the splicing and voting module after classification operation. And the splicing and voting module votes the output result of the RF classification module group to determine the classification result.
Step 5.4: and respectively inputting the test data into each GBDT classification module in the GBDT classification module group, and outputting classification results to a splicing and voting module after classification operation. And the splicing and voting module votes the output result of the GBDT classification module group to determine the classification result.
Step 5.5: and respectively inputting the test data into each SVM classification module in the SVM classification module group, and outputting a classification result to a splicing and voting module after classification operation. And the splicing and voting module votes the output result of the SVM classification module group to determine the classification result.
Step 5.6: and respectively inputting the test data into each xgboost classification module in one xgboost classification module group, and outputting the classification result to a splicing and voting module after classification operation. And the splicing and voting module votes the output result of the xgboost classification module group to determine the classification result.
Step 5.7: and repeating the operation of the step 5.6 for 2 times to obtain the classification results of the other 2 xgboost classification module groups.
Step 5.8: and (5) combining the results of the steps from 5.3 to 5.7 to obtain a 1 × 6 stacking vector.
Step 5.9: and (4) inputting the 1 × 6 stacking vector obtained in the step 5.8 into a secondary model SVM classifier of the stacking classifier, performing classification operation to obtain and output a classification result of the test data, and finishing the operation.
Advantageous effects
Compared with the prior art, the network attack type identification method based on multilayer detection has the following advantages that:
the smote algorithm is adopted to carry out up-sampling on a few samples and carry out down-sampling on a plurality of samples, and the problem of unbalanced samples in a data set is solved.
Secondly, an integrated classification model is adopted, so that the detection accuracy and recall rate are improved.
And thirdly, the fruit fly optimization algorithm (FOA) is combined with the Support Vector Machine (SVM) to realize optimal, adaptive selection of the parameters C and γ in the SVM.
Drawings
Fig. 1 is an operation flowchart of a network attack type identification method based on multi-layer detection in an embodiment of the present invention.
FIG. 2 is a block diagram of an integrated classification model in accordance with an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and an embodiment of the technical scheme.
The network attack type identification method based on multi-layer detection provided by the invention has the operation flow as shown in figure 1, and the specific operation steps are as follows.
Step one, acquiring original training data and preprocessing.
Step 1.1: and acquiring network attack data to form an original training data set. The experiment adopts a KDD99 data set, the data distribution in the original training data set is shown in Table 1, and the data distribution comprises Normal data, DoS data, PROBE data, U2L data, R2L data and five types of data. Wherein the distribution of the subtypes of DoS class data is shown in table 2. Each piece of normal data or attack data is composed of 41 features, as shown in table 3, in which the values of three discrete features, "pro col _ TYPE", "SERVICE", "FLAG" are character labels, and the values of the other features are numerical values.
TABLE 1 Data distribution of the KDD99 original training data set

Category                NORMAL   DoS      PROBE   U2L   R2L
Original training set   97278    391485   4107    52    1126

TABLE 2 DoS attack subtype data distribution in the KDD99 original training data set

Category                Back   neptune   pod   smurf    teardrop   Other
Original training set   2203   107201    264   280790   979        21
TABLE 3 The 41 feature components of the KDD99 data set (the table is reproduced as images in the original publication and is not shown here)
Step 1.2: each piece of original training data in the original training data set is converted into a numerical type original training data feature vector. The method specifically comprises the following steps:
Step 1.2.1: extract the three character discrete features "PROTOCOL_TYPE", "SERVICE" and "FLAG" from each piece of data and encode each of them as a one-hot vector, one one-hot vector per character discrete feature.
Step 1.2.2: constructing a numerical characteristic vector by using the value of the numerical characteristic in each piece of data;
step 1.2.3: and merging the numerical characteristic vector in the step 1.2.2 with all the one-hot vectors obtained in the step 1.2.1.
Through the operation of the steps, a numerical-type original training data feature vector is obtained corresponding to an original training data.
Step 1.3: the problem of unbalanced quantity of each type of data of the original training data set is solved through data down sampling and data up sampling. The method specifically comprises the following steps:
the Normal data type has a much larger number of samples and DoS types than the other types. 10000 pieces of data are randomly extracted from the Normal type data, and the number of the Normal type data is reduced. The DoS type data is composed of a plurality of subtypes, samples of smurf attack data and neptune attack data are far more than the number of other subtypes, the two subtypes of data are downsampled, the smurf samples are randomly drawn for 14000, and the neptune samples are randomly drawn for 8533.
The data amount of PROBE, U2L and R2L types in the original training data is less than that of DoS and Normal data, and the three types of data are up-sampled by adopting an SMOTE algorithm.
In the invention, the PROBE samples are up-sampled to 2 times their original number, with the neighbor count in the corresponding SMOTE algorithm set to 3; the R2L samples are up-sampled to 4 times, with the neighbor count set to 3; and the U2L samples are up-sampled to 40 times, with the neighbor count set to 10. After up-sampling there are 8214 PROBE samples, 4504 R2L samples and 2080 U2L samples; a sketch of these settings is given below.
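A sketch of these settings, assuming the imbalanced-learn library; the target counts reproduce the expansion ratios above.

```python
from imblearn.over_sampling import SMOTE

def upsample_minorities(X, y):
    # PROBE: 4107 -> 8214 (2x, k = 3); R2L: 1126 -> 4504 (4x, k = 3);
    # U2L: 52 -> 2080 (40x, k = 10), matching tables 1 and 4.
    for label, target, k in [("PROBE", 8214, 3), ("R2L", 4504, 3), ("U2L", 2080, 10)]:
        X, y = SMOTE(sampling_strategy={label: target}, k_neighbors=k).fit_resample(X, y)
    return X, y
```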
The original training data set after one-hot coding, data down-sampling and data up-sampling is called the basic training data set, denoted by the symbol X; x_ij denotes the j-th feature of the i-th piece of data in X, i ∈ [1, n], with n = 54798. The data distribution of the resulting basic training data set and of its DoS subtypes is shown in tables 4 and 5.
TABLE 4 Data distribution of the basic training data set X

Category             NORMAL   DoS     PROBE   U2L    R2L
Basic training set   10000    30000   8214    2080   4504

TABLE 5 DoS attack subtype data distribution in the basic training data set X

Category             Back   neptune   pod   smurf   teardrop   Other
Basic training set   2203   8533      264   14000   979        21
Step 1.4: the data in the basic training data set X is normalized by equation (1).
x'_ij = (x_ij − AVG_j) / STD_j    (1)

where x'_ij is the value obtained after normalizing x_ij; AVG_j is the mean of the j-th feature over all data in the basic training data set X, computed by formula (2); and STD_j is the standard deviation of the j-th feature over all data in X, computed by formula (3).

AVG_j = (1/n) × Σ_{i=1}^{n} x_ij    (2)

STD_j = sqrt( (1/n) × Σ_{i=1}^{n} (x_ij − AVG_j)² )    (3)
After the basic training data set is preprocessed through the operation of the first step, a training data set is obtained and is represented by a symbol X'.
And step two, constructing an integrated classification model.
The integrated classification model comprises a GBDT (Gradient Boosting Decision Tree) classifier, a KNN classifier and a stacking classifier.
The GBDT classifier learns with the Boosting idea by iteratively constructing Classification And Regression Trees (CART). Let f_{t−1}(x) denote the GBDT classifier obtained after iteration t−1, where t is a positive integer; let f_t(x) denote the classifier obtained after iteration t; let L(y, f_{t−1}(x)) and L(y, f_t(x)) denote the loss functions of those classifiers; and let h_t(x) denote the fitting function learned in round t. In round t of GBDT learning, the task is to find the h_t(x) for which L(y, f_t(x)) in formula (4) takes its minimum value; the minimizing h_t(x) is found by fitting the negative gradient of the loss function.

L(y, f_t(x)) = L(y, f_{t−1}(x) + h_t(x))    (4)
The KNN classifier is used for classifying DoS (Denial of Service) type data and predicting their subtypes. The parameter K of the KNN classifier is set to 3.
The stacking classifier is used for classifying non-DoS type data. It is divided into a primary classification model and a secondary classification model. The primary model has an upper layer and a lower layer; the upper layer is formed by connecting 3 xgboost classification module groups, 1 SVM classification module group, 1 GBDT classification module group and 1 RF classification module group in parallel. Each xgboost classification module group is formed by connecting m xgboost classification modules in parallel, each SVM classification module group by m SVM classification modules in parallel, each GBDT classification module group by m GBDT classification modules in parallel, and each RF classification module group by m RF classification modules in parallel. In this embodiment m is set to 5. The lower layer of the primary model is a splicing and voting module.
And the output ends of the 3 xgboost classification module groups, the 1 SVM classification module group, the 1 GBDT classification module group and the 1 RF classification module group at the upper layer of the primary model are respectively connected with the input ends of the splicing and voting modules at the lower layer of the primary model. In the training phase, the splicing and voting module is used for: and combining the output results of each xgboost classification module group, SVM classification module group, GBDT classification module group and RF classification module group of the upper layer of the primary model to obtain a vector matrix called stacking vector matrix.
The structure of the integrated classification model is shown in fig. 2.
In the testing stage, the splicing and voting module is used for: corresponding to a piece of test data, output results of each xgboost classification module group, SVM classification module group, GBDT classification module group and RF classification module group of the upper layer of the primary model are respectively voted, each classification module group obtains a classification result, and then the classification results are combined to obtain a 1 x 6 stacking feature vector.
The secondary model is an SVM classifier, and the input of the secondary model is the stacking feature vector generated by the primary model.
The SVM classifier adopts the fruit fly optimization algorithm (FOA) to optimally select the SVM kernel function parameter (denoted by the symbol γ) and the penalty parameter (denoted by the symbol C). The specific operation steps are as follows:
Step 2.1: initialize the SVM kernel parameter γ and penalty parameter C, γ ∈ [0.001, 5], C ∈ [0.001, 5]. In this embodiment γ is set to 0.01 and C to 0.5, i.e. the starting position of the fruit fly swarm is (C_begin, γ_begin) with C_begin = C = 0.5 and γ_begin = γ = 0.01.
Step 2.2: the population size (denoted by the symbol popsize), the number of iterations (denoted by the symbol epoch), and the search distance (denoted by the symbol val) of the penalty parameter C are setCRepresentation) and the search distance of the kernel parameter y (denoted by the symbol val)γRepresentation). popsize 10, epoch 5, valC=0.1,valγ=0.001。
Step 2.3: calculating the position of the pth fruit fly at the next moment according to the formulas (6) to (7), and using the symbol (C)pp) Denotes p ∈ [1, popsize >]。
Cp=Cbegin+valC×ε (6)
γp=γbegin+valγ×ε (7)
Wherein ε is a random value in the range of [ -1,1 ].
Step 2.4: if the penalty parameter C is less than 0.001 at the moment, C is 0.001; if C >5, then C ═ 5. If γ <0.001, γ is 0.001; when γ is greater than 5, γ is 5.
Step 2.5: and (4) calculating the fitness function values of the positions of all the drosophila flies obtained in the step 2.3 according to a formula (8).
Fit(Cpp)=accuracy(Cpp) (8)
Wherein, Fit (C)pp) The fitness function value of the position of the pth fruit fly is shown; accuracy (C)pp) Representing the SVM classifier at parameter (C)qq) Upper cross validation generated accuracy, Cq=Cpq=γp
Step 2.6, finding the maximum value (using the symbol Fit) in the fitness function values corresponding to the positions of all fruit flies at the current momentmaxRepresentation), and FitmaxThe corresponding position is judged to be Fit at the momentmaxIf the fitness function value is higher than the fitness function value of the initial position, Fit is usedmaxThe corresponding position replaces the initial position while saving the FitmaxThen the next iteration is performed. If it is at that time FitmaxIf the fitness function value is lower than the fitness function value of the initial position, the step 2.3 to the step 2.6 are repeatedly executed until the iteration times reach the epoch times, and the operation is finished.
The connection relation of the integrated classification model is as follows: external data enters the integrated classification model through the input end of the GBDT classifier; the output end of the GBDT classifier is respectively connected with the input ends of the KNN classifier and the stacking classifier; and the output of the KNN classifier and the stacking classifier is used as the external output of the integrated classification model.
And step three, training an integrated classification model.
And training an integrated classification model on the basis of the operation of the step one and the operation of the step two. The method specifically comprises the following steps:
step 3.1: the GBDT classifier is trained. The method specifically comprises the following steps:
Step 3.1.1: label the data in the training data set X' by category, as one of 2 classes: DoS (Denial of Service) type and other type. The data distribution is shown in table 6.
TABLE 6 Data distribution of the GBDT classifier training set

Data type   DoS     non-DoS (U2L, R2L, Normal, Probe)
Number      30000   24798
Step 3.1.2: the GBDT classifier is trained using the labeled training data set X'.
Through the operation of step 3.1, the trained GBDT classifier is obtained.
Step 3.2: and training the KNN classifier. The method specifically comprises the following steps:
Step 3.2.1: construct a DoS type data set from the data labeled as DoS type in the training data set X', denoted by the symbol X'_1.
Step 3.2.2: to DoS type data set X'1The data in (1) is marked for fine classification. The DoS type dataset, in symbol X'1The data in (1) are subdivided into: smurf attacks, neptune attacks, back attacks, teardrop attacks, pod attacks, and Other.
Step 3.2.3: to DoS type data set X'1Performing data down-sampling processing according to the subdivision type; the number of types of data of smurf and neptune is far more than that of other types of data, 5000 pieces of smurf data and 4000 pieces of neptune data are randomly extracted. The data set after data down-sampling, called KNN training data set, is represented by symbol X1And (4) showing. The data distribution is shown in table 7.
TABLE 7 Training data distribution of the KNN classifier

DoS subtype   Back   neptune   pod   smurf   teardrop   Other
Number        2203   4000      264   5000    979        21
Step 3.2.4: training dataset X using KNN1And training the KNN classifier.
Through the operation of step 3.2, the trained KNN classifier is obtained.
Step 3.3: training a stacking classifier. The method specifically comprises the following steps:
Step 3.3.1: construct a stacking training data set from the data labeled as other types in the training data set X', denoted by the symbol X_2, and then label its data with fine-grained classes. The data in X_2 are subdivided into: Normal, PROBE, U2L and R2L. The data distribution is shown in table 8.
TABLE 8 Training data distribution of the stacking model

Data type   NORMAL   PROBE   U2L    R2L
Number      10000    8214    2080   4504
Step 3.3.2: training data set X2Is divided evenly into 5 subsets, referred to as the 1 st subset, the 2 nd subset, … …, and the 5 th subset, respectively. The number of data per subset is denoted by the symbol M, which is a positive integer.
Step 3.3.3: the set of RF classification modules is trained. The method specifically comprises the following steps:
step 3.3.3.1: the temporary variable is denoted by the symbol t, t ∈ [1,5 ]. The initial value of t is set to 1.
Step 3.3.3.2: training data set X2As verification data, t e [1,5]]. Then, using stacking training data set X2As training data, to train an untrained RF classification module of the set of RF classification modules.
Step 3.3.3.3: and inputting the data of the t-th subset into the RF classification module trained in step 3.3.3.2 for classification, so as to obtain an mx 1 vector matrix.
Step 3.3.3.4: if t <5, the value of t is incremented by 1 and steps 3.3.3.2 through 3.3.3.4 are repeated. Otherwise, the operation of step 3.3.3.5 is performed.
Step 3.3.3.5: and merging the classification results of the 1 st subset to the 5 th subset obtained in the step 3.3.3.2 to obtain a classification result of the data of the stacking training data set in the RF classification module group, and sending the classification result to the splicing and voting module.
Through the operations of steps 3.3.3.1 to 3.3.3.5, training of the RF classification module group is completed, and the classification result of the stacking training data set X_2 in the RF classification module group is obtained.
Step 3.3.4: training the SVM classification module group. The method specifically comprises the following steps:
step 3.3.4.1: the temporary variable is denoted by the symbol t, t ∈ [1,5 ]. The initial value of t is set to 1.
Step 3.3.4.2: training data set X2As verification data, t e [1,5]]. Then, using stacking training data set X2The other data is used as training data to train an untrained SVM classification module in the SVM classification module group.
Step 3.3.4.3: and inputting the data of the t-th subset into the SVM classification module trained in the step 3.3.4.2 for classification, so as to obtain an M × 1 vector matrix.
Step 3.3.4.4: if t <5, the value of t is incremented by 1 and steps 3.3.4.2 through 3.3.4.4 are repeated. Otherwise, the operation of step 3.3.4.5 is performed.
Step 3.3.4.5: merging the classification results of the 1 st subset to the 5 th subset obtained in the step 3.3.4.2 to obtain a stacking training data set X2The classification result of the data in the SVM classification module group is sent to the splicing and voting module.
Through the operations from step 3.3.4.1 to step 3.3.4.5, training of the SVM classification module group is completed, and a classification result of data of a stacking training data set in the SVM classification module group is obtained.
Step 3.3.5: and training the GBDT classification module group. The method specifically comprises the following steps:
step 3.3.5.1: the temporary variable is denoted by the symbol t, t ∈ [1,5 ]. The initial value of t is set to 1.
Step 3.3.5.2: training data set X2As verification data, t e [1,5]]. Then, using stacking training data set X2As training data, training an untrained GBDT classification module in the GBDT classification module group.
Step 3.3.5.3: and inputting the data of the t-th subset into the GBDT classification module trained in the step 3.3.5.2 for classification, so as to obtain an M × 1 vector matrix.
Step 3.3.5.4: if t <5, the value of t is incremented by 1 and steps 3.3.5.2 through 3.3.5.4 are repeated. Otherwise, the operation of step 3.3.5.5 is performed.
Step 3.3.5.5: merging the classification results of the 1 st subset to the 5 th subset obtained in the step 3.3.5.2 to obtain a stacking training data set X2And (4) the classification result of the data in the GBDT classification module group is sent to the splicing and voting module.
Through the operations from step 3.3.5.1 to step 3.3.5.5, the training of the GBDT classification module group is completed, and the classification result of the data of the stacking training data set in the GBDT classification module group is obtained.
Step 3.3.6: and training an XGBOOST classification module group. The method specifically comprises the following steps:
step 3.3.6.1: the temporary variable is denoted by the symbol t, t ∈ [1,5 ]. The initial value of t is set to 1.
Step 3.3.6.2: training data set X2As verification data, t e [1,5]]. Then, using stacking training data set X2The other data of XGBOOST classification module group is used as training data to train an untrained XGBOOST classification module in the XGBOOST classification module group.
Step 3.3.6.3: and inputting the data of the t-th subset into the XGBOOST classification module trained in the step 3.3.6.2 for classification to obtain an Mx 1 vector matrix.
Step 3.3.6.4: if t <5, the value of t is incremented by 1 and steps 3.3.6.2 through 3.3.6.4 are repeated. Otherwise, the operation of step 3.3.6.5 is performed.
Step 3.3.6.5: and merging the classification results of the 1 st subset to the 5 th subset obtained in the step 3.3.6.2 to obtain the classification result of the data of the stacking training data set in the XGBOOST classification module group, and sending the classification result to the splicing and voting module.
Through the operations of steps 3.3.6.1 to 3.3.6.5, training of the XGBOOST classification module group is completed, and the classification result of the stacking training data set X_2 in the XGBOOST classification module group is obtained.
Step 3.3.7: repeating the step 3.3.6 for 2 times to finish the training of the other 2 XGB OST classification module groups and obtain a stacking training data set X2The data in the other 2 XGBOOST classification module groups are classified into results and sent to the splicing and voting module.
Step 3.3.8: the splicing and voting module carries out the stacking training data set X obtained from the step 3.3.3 to the step 3.3.72Combining the classification results of all the classification module groups to obtain a vector matrix of P multiplied by 6, namely a stacking vector matrix; wherein P represents a stacking training data set X2The amount of data of (c).
Step 3.3.9: inputting the stacking vector matrix obtained in the step 3.3.8 into a secondary model SVM classifier of a stacking classifier, and performing training operation to obtain a trained stacking classifier.
And finishing the training of the stacking classifier through the operation of the steps to obtain a trained integrated classification model.
And step four, preprocessing the test data.
Step 4.1: and acquiring network attack data to form an original test data set. As described in step 1.1, the experiment used a KDD99 dataset with 41 signature components per test data, as shown in table 3. The values of three discrete features "PROTOCOL _ TYPE", "SERVICE" and "FLAG" are character labels, and the values of the other features are numerical values. The data distribution and DoS subtype distribution in the original test data set are shown in tables 9 and 10.
TABLE 9 Data distribution of the original test data set

Type     NORMAL   DoS      PROBE   U2L   R2L
Number   60593    229853   4166    228   16189

TABLE 10 DoS attack subtype data distribution of the original test data set

Type     Back   neptune   pod   smurf    teardrop   Other
Number   1098   58001     87    164091   12         6564
Step 4.2: each piece of original test data in the original test data set is converted into a numerical type original test data feature vector. The method specifically comprises the following steps:
Step 4.2.1: extract the character discrete features from each piece of data and encode each of them as a one-hot vector, one one-hot vector per character discrete feature.
Step 4.2.2: constructing a numerical characteristic vector by using the value of the numerical characteristic in each piece of data;
step 4.2.3: and merging the numerical characteristic vector in the step 4.2.2 with the one-hot vector obtained in the step 4.2.1.
Through the above operations, one numerical original test data feature vector is obtained for each piece of original test data.
The original test data set after one-hot coding is called the basic test data set, denoted by the symbol X_test; the symbol x_test,ij denotes the j-th feature of the i-th piece of data in X_test.
Step 4.3: the basic test data set X is given by the formula (5)testThe data in (1) is normalized.
Figure GDA0001883560090000211
Wherein, x'test,ijAs data xtest,ijData obtained after normalization processing; AVGjThe average value of the jth feature of all data in the basic training data set X obtained in the step 1.4 is obtained; STDjThe standard deviation of the jth feature of all data in the basic training data set X obtained in step 1.4.
After the operation of step four, the basic test data set is preprocessed to obtain the test data set, denoted by the symbol X'_test.
And step five, classifying the test data.
And inputting the test data obtained through the preprocessing in the fourth step into the integrated classification model trained in the third step for classification. The method comprises the following specific steps:
Step 5.1: input each piece of test data preprocessed in step four into the GBDT classifier; if the classification result is the DoS type, perform the operation of step 5.2; if it is a non-DoS type, perform the operation of step 5.3.
Step 5.2: input the test data into the KNN classifier for classification, obtain and output the final classification result, and end the operation.
Step 5.3: input the test data into each RF classification module in the RF classification module group; after the classification operation, output the results to the splicing and voting module, which votes on the outputs of the RF classification module group to determine its classification result.
Step 5.4: input the test data into each GBDT classification module in the GBDT classification module group; after the classification operation, output the results to the splicing and voting module, which votes on the outputs of the GBDT classification module group to determine its classification result.
Step 5.5: input the test data into each SVM classification module in the SVM classification module group; after the classification operation, output the results to the splicing and voting module, which votes on the outputs of the SVM classification module group to determine its classification result.
Step 5.6: input the test data into each xgboost classification module in one xgboost classification module group; after the classification operation, output the results to the splicing and voting module, which votes on the outputs of the xgboost classification module group to determine its classification result.
Step 5.7: repeat the operation of step 5.6 twice to obtain the classification results of the other 2 xgboost classification module groups.
Step 5.8: merge the results of steps 5.3 to 5.7 into a 1 × 6 stacking vector.
Step 5.9: input the 1 × 6 stacking vector of step 5.8 into the secondary-model SVM classifier of the stacking classifier; the classification operation yields and outputs the classification result of the test data.
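The decision flow of steps 5.1-5.9 can be summarized in code. This is a sketch, not the authors' implementation: it assumes fitted scikit-learn-style components with integer-encoded class labels, and the names gbdt_gate, knn_dos, module_groups and secondary_svm are illustrative.

```python
import numpy as np
from collections import Counter

DOS_LABEL = 1  # hypothetical integer code for the DoS class

def classify_one(x, gbdt_gate, knn_dos, module_groups, secondary_svm):
    """Classify one preprocessed test vector x of shape (1, n_features).

    module_groups: the six groups of steps 5.3-5.7 (3 xgboost, 1 SVM,
    1 GBDT, 1 RF), each a list of m fitted classifiers.
    """
    if gbdt_gate.predict(x)[0] == DOS_LABEL:          # step 5.1
        return knn_dos.predict(x)[0]                  # step 5.2: DoS subtype
    stack_vec = []
    for group in module_groups:                       # steps 5.3-5.7
        votes = [clf.predict(x)[0] for clf in group]
        stack_vec.append(Counter(votes).most_common(1)[0][0])  # group vote
    stack_vec = np.asarray(stack_vec).reshape(1, -1)  # step 5.8: 1 x 6 vector
    return secondary_svm.predict(stack_vec)[0]        # step 5.9
```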
Finally, the prediction results are evaluated. The predictions on the test set obtained in step five are considered on two indexes, accuracy and recall. The results are shown in Tables 11 and 12.
TABLE 11  Prediction results for NORMAL, DoS, PROBE, U2L and R2L data in the test set

Type       NORMAL   DoS      PROBE    U2L      R2L
Accuracy   75.67%   99.89%   83.59%   7.54%    84.63%
Recall     99.23%   97.41%   93.11%   23.24%   10.91%
TABLE 12  Prediction results for the DoS attack subtype data

Type       smurf    neptune   back     pod      teardrop
Accuracy   99.99%   99.40%    68.49%   51.50%   29.27%
Recall     99.98%   99.85%    100%     99.85%   100%
Table 11 shows the accuracy and recall of this method's classification of the five data types NORMAL, PROBE, DoS, U2L and R2L. Table 12 shows the classification accuracy and recall of this method on the DoS attack subtype data. The experimental results show that, on a data set whose classes are extremely unbalanced and whose data distributions are inconsistent, the method achieves good accuracy and recall on the test set.
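For reproduction, the per-class figures of Tables 11 and 12 can be computed with scikit-learn; the labels below are toy stand-ins, and reading the tables' per-class accuracy as what scikit-learn reports as precision is an interpretation, not something stated in the original:

```python
from sklearn.metrics import classification_report

# Toy stand-ins: in practice y_true are the test-set labels and
# y_pred the outputs of step five.
y_true = ["NORMAL", "DoS", "DoS", "PROBE", "R2L", "NORMAL"]
y_pred = ["NORMAL", "DoS", "PROBE", "PROBE", "NORMAL", "NORMAL"]

# Per-class precision and recall for every class label.
print(classification_report(y_true, y_pred, zero_division=0))
```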

Claims (2)

1. A network attack type identification method based on multilayer detection is characterized in that: the specific operation steps are as follows:
step one, acquiring original training data and preprocessing the original training data;
step 1.1: acquiring network attack data to form an original training data set; the network attack data comprises numerical characteristics and character discrete characteristics; the character discrete type features include: protocol type, service type and connection error identification;
step 1.2: converting each piece of original training data in an original training data set into a numerical original training data feature vector; the method specifically comprises the following steps:
step 1.2.1: extracting character discrete type features from each piece of data, and respectively coding the character discrete type features in one-hot vector form, wherein one character discrete type feature corresponds to one one-hot vector;
step 1.2.2: constructing a numerical characteristic vector by using the value of the numerical characteristic in each piece of data;
step 1.2.3: merging the numerical characteristic vector in the step 1.2.2 with all the one-hot vectors obtained in the step 1.2.1;
obtaining a numerical type original training data feature vector corresponding to an original training data through the operation of the step;
step 1.3: the problem of unbalanced quantity of various types of data of an original training data set is solved through data down-sampling and data up-sampling;
an original training data set after one-hot coding, data down-sampling and data up-sampling is called a basic training data set and is represented by the symbol X; the symbol x_ij denotes the jth feature of the ith piece of data of the basic training data set X, i ∈ [1, n], where n is the number of pieces of data in the basic training data set X;
step 1.4: standardizing the data in the basic training data set X by formula (1);

x'_ij = (x_ij − AVG_j) / STD_j   (1)

wherein x'_ij is the data obtained after standardizing x_ij; AVG_j is the mean of the jth feature over all data in the basic training data set X, computed by formula (2); STD_j is the standard deviation of the jth feature over all data in the basic training data set X, computed by formula (3);

AVG_j = (1/n) Σ_{i=1}^{n} x_ij   (2)

STD_j = sqrt( (1/n) Σ_{i=1}^{n} (x_ij − AVG_j)² )   (3)
after the basic training data set is preprocessed through the operation of the first step, a training data set is obtained and is represented by a symbol X';
step two, constructing an integrated classification model;
the integrated classification model comprises a GBDT classifier, a KNN classifier and a stacking classifier;
the GBDT classifier learns by iteratively constructing classification-and-regression trees (CART) following the Boosting idea; the symbol f_{t-1}(x) denotes the GBDT classifier obtained in the (t-1)th iteration, where t is a positive integer; the symbol f_t(x) denotes the GBDT classifier obtained in the t-th iteration; the symbol L(y, f_{t-1}(x)) denotes the loss function of the GBDT classifier obtained in the (t-1)th iteration; the symbol L(y, f_t(x)) denotes the loss function of the GBDT classifier obtained in the t-th iteration; the symbol h_t(x) denotes the fitting function learned in the t-th round; in the learning process of the GBDT classifier, the t-th iteration finds the h_t(x) for which L(y, f_t(x)) in formula (4) takes its minimum value; the minimizing h_t(x) is found by fitting the negative gradient of the loss function;

L(y, f_t(x)) = L(y, f_{t-1}(x) + h_t(x))   (4)
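As a concrete illustration of formula (4): with squared-error loss the negative gradient at f_{t-1}(x) is simply the residual y − f_{t-1}(x), so each round fits h_t(x) to the current residuals. A toy sketch with scikit-learn regression trees (the regression form is chosen because the idea is easiest to see there; all values are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

f = np.full_like(y, y.mean())   # f_0(x): constant initial model
for t in range(50):             # t-th boosting round
    residual = y - f            # negative gradient of squared loss at f_{t-1}
    h = DecisionTreeRegressor(max_depth=2).fit(X, residual)  # fit h_t(x)
    f += h.predict(X)           # f_t(x) = f_{t-1}(x) + h_t(x), as in formula (4)
```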
the KNN classifier is used for classifying the DoS type data and predicting the subtype of the DoS type data; setting a parameter K of the KNN classifier to be 3;
the stacking classifier is used for classifying non-DoS type data; the stacking classifier is divided into a primary classification model and a secondary classification model; the primary classification model has an upper layer and a lower layer, the upper layer being formed by connecting 3 xgboost classification module groups, 1 SVM classification module group, 1 GBDT classification module group and 1 RF classification module group in parallel; each xgboost classification module group is formed by connecting m xgboost classification modules in parallel, each SVM classification module group by connecting m SVM classification modules in parallel, each GBDT classification module group by connecting m GBDT classification modules in parallel, and each RF classification module group by connecting m RF classification modules in parallel; m is a manually set value, m ∈ [3, 8]; the lower layer of the primary classification model is the splicing and voting module;
the output ends of the 3 xgboost classification module groups, the 1 SVM classification module group, the 1 GBDT classification module group and the 1 RF classification module group at the upper layer of the primary classification model are respectively connected with the input ends of the splicing and voting modules at the lower layer of the primary classification model; in the training phase, the splicing and voting module is used for: combining output results of each xgboost classification module group, SVM classification module group, GBDT classification module group and RF classification module group at the upper layer of the primary classification model to obtain a vector matrix called stacking vector matrix; in the testing stage, the splicing and voting module is used for: corresponding to a piece of test data, voting output results of each xgboost classification module group, SVM classification module group, GBDT classification module group and RF classification module group at the upper layer of the primary classification model respectively, obtaining a classification result by each classification module group, and then combining the classification results to obtain a 1 x 6 stacking feature vector;
the secondary classification model is an SVM classifier, optimized with the FOA (fruit fly optimization) algorithm; its input is the stacking feature vectors generated by the primary classification model; the specific method is as follows:
step 2.1: initialize the SVM kernel parameter γ and the penalty parameter C, γ ∈ [0.001, 5], C ∈ [0.001, 5]; set the starting position of the fruit fly to (C_begin, γ_begin), where C_begin = C and γ_begin = γ;
step 2.2: set the population size popsize, the number of iterations epoch, the search distance val_C of the penalty parameter C, and the search distance val_γ of the kernel parameter γ; where popsize ∈ [8, 15], epoch ≥ 5, val_C ∈ [0.05, 0.5], val_γ ∈ [0.001, 0.01];
step 2.3: compute the position of the pth fruit fly at the next moment by formulas (6)-(7), denoted (C_p, γ_p), p ∈ [1, popsize];

C_p = C_begin + val_C × ε   (6)
γ_p = γ_begin + val_γ × ε   (7)

wherein ε is a random value in the range [-1, 1];
step 2.4: if the penalty parameter C < 0.001, set C = 0.001; if C > 5, set C = 5; if γ < 0.001, set γ = 0.001; if γ > 5, set γ = 5;
step 2.5: compute the fitness function value of each fruit fly position obtained in step 2.3 by formula (8);

Fit(C_p, γ_p) = accuracy(C_p, γ_p)   (8)

wherein Fit(C_p, γ_p) is the fitness function value of the position of the pth fruit fly; accuracy(C_p, γ_p) is the accuracy produced by cross-validation of the SVM classifier with parameters (C_q, γ_q), where C_q = C_p and γ_q = γ_p;
step 2.6: find the maximum Fit_max among the fitness function values of all fruit fly positions at the current moment, and the position corresponding to Fit_max; if Fit_max is higher than the fitness function value of the initial position, replace the initial position with the position corresponding to Fit_max, save Fit_max, and proceed to the next iteration; if Fit_max is lower than the fitness function value of the initial position, repeat steps 2.3 to 2.6 until the number of iterations reaches epoch, then end the operation;
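A minimal sketch of this FOA search, assuming scikit-learn's SVC with 3-fold cross-validation accuracy as the fitness of formula (8); X_stack and y_stack stand for the stacking vectors and their labels, and the defaults are illustrative values within the ranges of step 2.2:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def foa_svm(X_stack, y_stack, popsize=10, epochs=5,
            val_C=0.1, val_gamma=0.005, C0=1.0, gamma0=0.1):
    """Fruit fly search over (C, gamma) following steps 2.1-2.6."""
    rng = np.random.default_rng(0)
    best_C, best_gamma, best_fit = C0, gamma0, -np.inf   # fly start position
    for _ in range(epochs):
        for _ in range(popsize):
            eps = rng.uniform(-1.0, 1.0)   # epsilon shared by formulas (6)-(7)
            C = float(np.clip(best_C + val_C * eps, 0.001, 5))       # step 2.4
            g = float(np.clip(best_gamma + val_gamma * eps, 0.001, 5))
            fit = cross_val_score(SVC(C=C, gamma=g),   # formula (8): CV accuracy
                                  X_stack, y_stack, cv=3).mean()
            if fit > best_fit:             # step 2.6: move to the best fly found
                best_fit, best_C, best_gamma = fit, C, g
    return SVC(C=best_C, gamma=best_gamma).fit(X_stack, y_stack)
```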
the connection relation of the integrated classification model is as follows: external data enters the integrated classification model through the input end of the GBDT classifier; the output end of the GBDT classifier is respectively connected with the input ends of the KNN classifier and the stacking classifier; the output of the KNN classifier and the stacking classifier is used as the external output of the integrated classification model;
step three, training an integrated classification model;
training an integrated classification model on the basis of the operation of the first step and the operation of the second step; the method specifically comprises the following steps:
step 3.1: training a GBDT classifier; the method specifically comprises the following steps:
step 3.1.1: put category labels on the data in the training data set X'; the data in the training data set X' are marked into 2 classes, the DoS type and the Other type;
step 3.1.2: training a GBDT classifier by using the marked training data set X';
obtaining a trained GBDT classifier through the operation of the step 3.1;
step 3.2: training a KNN classifier; the method specifically comprises the following steps:
step 3.2.1: construct a DoS type data set from the data marked as the DoS type in the training data set X', denoted by the symbol X'_1;
step 3.2.2: put fine-grained labels on the data in the DoS type data set X'_1; the data in the DoS type data set X'_1 are subdivided into: smurf attacks, neptune attacks, back attacks, teardrop attacks, pod attacks, and Other attacks;
step 3.2.3: apply data down-sampling to the DoS type data set X'_1 by subdivided type, to resolve the imbalance in the quantity of data of each subdivided type in the DoS type data set X'_1; the data set after down-sampling is called the KNN training data set, denoted by the symbol X_1;
step 3.2.4: train the KNN classifier with the KNN training data set X_1;
obtaining a trained KNN classifier through the operation of the step 3.2;
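Step 3.2 reduces to a few lines with scikit-learn; X1 and y1 below are toy stand-ins for the KNN training data set X_1 and its fine-grained subtype labels:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X1 = rng.random((60, 10))  # stand-in for the down-sampled DoS data set X_1
y1 = rng.choice(["smurf", "neptune", "back", "teardrop", "pod", "Other"], 60)

knn_dos = KNeighborsClassifier(n_neighbors=3).fit(X1, y1)  # K = 3, per step two
```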
step 3.3: training a stacking classifier; the method specifically comprises the following steps:
step 3.3.1: construct a stacking training data set from the data marked as the Other type in the training data set X', denoted by the symbol X_2; then put fine-grained labels on the data: the data in the stacking training data set X_2 are subdivided into: Normal, Probe, U2L, R2L;
step 3.3.2: evenly divide the data of the stacking training data set X_2 into m subsets, called the 1st subset, the 2nd subset, ..., the m-th subset; the quantity of data in each subset is denoted by the symbol M, M a positive integer;
step 3.3.3: training a group of RF classification modules; the method specifically comprises the following steps:
step 3.3.3.1: the temporary variable is represented by the symbol h, h ∈ [1, m ]; setting the initial value of h to 1;
step 3.3.3.2: take the h-th subset of the stacking training data set X_2 as validation data; then, using the remaining data of the stacking training data set X_2 as training data, train an untrained RF classification module in the RF classification module group;
step 3.3.3.3: inputting the data of the h-th subset into the RF classification module trained in the step 3.3.3.2 for classification to obtain an M x 1 vector matrix;
step 3.3.3.4: if h < m, increasing the value of h by 1, and repeating the steps 3.3.3.2 to 3.3.3.4; otherwise, the operation of step 3.3.3.5 is performed;
step 3.3.3.5: merge the classification results of the 1st through m-th subsets obtained in step 3.3.3.3 to obtain the classification result of the stacking training data set's data in the RF classification module group, and send it to the splicing and voting module;
through the operations of steps 3.3.3.1 to 3.3.3.5, the training of the RF classification module group is completed and the classification result of the data of the stacking training data set X_2 in the RF classification module group is obtained (a code sketch of this out-of-fold procedure is given after step three);
step 3.3.4: training an SVM classification module group; the method specifically comprises the following steps:
step 3.3.4.1: the temporary variable is represented by the symbol h, h ∈ [1, m ]; setting the initial value of h to 1;
step 3.3.4.2: take the h-th subset of the stacking training data set X_2 as validation data; then, using the remaining data of the stacking training data set X_2 as training data, train an untrained SVM classification module in the SVM classification module group;
step 3.3.4.3: input the data of the h-th subset into the SVM classification module trained in step 3.3.4.2 for classification, obtaining an M × 1 vector matrix;
step 3.3.4.4: if h < m, increasing the value of h by 1, and repeating the steps 3.3.4.2 to 3.3.4.4; otherwise, the operation of step 3.3.4.5 is performed;
step 3.3.4.5: merge the classification results of the 1st through m-th subsets obtained in step 3.3.4.3 to obtain the classification result of the stacking training data set X_2 in the SVM classification module group, and send it to the splicing and voting module;
completing training of the SVM classification module group through operations from step 3.3.4.1 to step 3.3.4.5, and obtaining a classification result of data of a stacking training data set in the SVM classification module group;
step 3.3.5: training a GBDT classification module group; the method specifically comprises the following steps:
step 3.3.5.1: the temporary variable is represented by the symbol h, h ∈ [1, m ]; setting the initial value of h to 1;
step 3.3.5.2: take the h-th subset of the stacking training data set X_2 as validation data; then, using the remaining data of the stacking training data set X_2 as training data, train an untrained GBDT classification module in the GBDT classification module group;
step 3.3.5.3: input the data of the h-th subset into the GBDT classification module trained in step 3.3.5.2 for classification, obtaining an M × 1 vector matrix;
step 3.3.5.4: if h < m, increasing the value of h by 1, and repeating the steps 3.3.5.2 to 3.3.5.4; otherwise, executing the operation of step 3.3.5.5;
step 3.3.5.5: merge the classification results of the 1st through m-th subsets obtained in step 3.3.5.3 to obtain the classification result of the stacking training data set X_2 in the GBDT classification module group, and send it to the splicing and voting module;
completing the training of the GBDT classification module group through the operations from step 3.3.5.1 to step 3.3.5.5, and obtaining the classification result of the data of the stacking training data set in the GBDT classification module group;
step 3.3.6: training an XGBOOST classification module group; the method specifically comprises the following steps:
step 3.3.6.1: the temporary variable is represented by the symbol h, h ∈ [1, m ]; setting the initial value of h to 1;
step 3.3.6.2: take the h-th subset of the stacking training data set X_2 as validation data; then, using the remaining data of the stacking training data set X_2 as training data, train an untrained XGBOOST classification module in the XGBOOST classification module group;
step 3.3.6.3: input the data of the h-th subset into the XGBOOST classification module trained in step 3.3.6.2 for classification, obtaining an M × 1 vector matrix;
step 3.3.6.4: if h < m, increasing the value of h by 1, and repeating the steps 3.3.6.2 to 3.3.6.4; otherwise, the operation of step 3.3.6.5 is performed;
step 3.3.6.5: merge the classification results of the 1st through m-th subsets obtained in step 3.3.6.3 to obtain the classification result of the stacking training data set's data in the XGBOOST classification module group, and send it to the splicing and voting module;
through the operations of steps 3.3.6.1 to 3.3.6.5, the training of the XGBOOST classification module group is completed and the classification result of the data of the stacking training data set X_2 in the XGBOOST classification module group is obtained;
step 3.3.7: repeat step 3.3.6 twice to complete the training of the other 2 XGBOOST classification module groups, obtain the classification results of the stacking training data set X_2 in the other 2 XGBOOST classification module groups, and send them to the splicing and voting module;
step 3.3.8: the splicing and voting module merges the classification results of the stacking training data set X_2 in all the classification module groups, obtained in steps 3.3.3 to 3.3.7, into a P × 6 vector matrix, i.e. the stacking vector matrix; wherein P denotes the number of pieces of data in the stacking training data set X_2;
step 3.3.9: inputting the stacking vector matrix obtained in the step 3.3.8 into a secondary classification model SVM classifier of a stacking classifier, and performing training operation to obtain a trained stacking classifier;
completing training of a stacking classifier through the operation of the steps to obtain a trained integrated classification model;
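The per-group procedure of steps 3.3.3 to 3.3.9 is out-of-fold stacking: each module group contributes one column of held-out predictions, the six columns form the P × 6 stacking vector matrix, and that matrix trains the secondary SVM. A compact sketch, assuming scikit-learn, the xgboost package, and labels integer-encoded as 0-3 for Normal/Probe/U2L/R2L:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.svm import SVC
from xgboost import XGBClassifier

def group_oof_column(make_clf, X2, y2, m=5):
    """Steps 3.3.x.1-3.3.x.5: train m modules, each predicting its
    held-out subset; merge the m M-by-1 results into one column."""
    col = np.empty(len(X2), dtype=int)
    for train_idx, hold_idx in KFold(n_splits=m).split(X2):
        clf = make_clf().fit(X2[train_idx], y2[train_idx])
        col[hold_idx] = clf.predict(X2[hold_idx])
    return col

def train_stacking(X2, y2, m=5):
    """Six groups (3 xgboost + SVM + GBDT + RF) -> P x 6 matrix -> SVM."""
    makers = [lambda s=s: XGBClassifier(random_state=s) for s in range(3)]
    makers += [lambda: SVC(), lambda: GradientBoostingClassifier(),
               lambda: RandomForestClassifier()]
    stack = np.column_stack([group_oof_column(mk, X2, y2, m) for mk in makers])
    return SVC().fit(stack, y2)  # secondary classification model of step 3.3.9
```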
step four, preprocessing the test data; the method specifically comprises the following steps:
step 4.1: acquiring network attack data to form an original test data set; the network attack data comprises numerical characteristics and character discrete characteristics; the character discrete type features are as follows: protocol type, service type and connection error identification;
step 4.2: converting each piece of original test data in the original test data set into a numerical original test data feature vector; the method specifically comprises the following steps:
step 4.2.1: extracting character discrete type features from each piece of data, and respectively coding the character discrete type features in one-hot vector form, wherein one character discrete type feature corresponds to one one-hot vector;
step 4.2.2: constructing a numerical characteristic vector by using the value of the numerical characteristic in each piece of data;
step 4.2.3: merging the numerical characteristic vector in the step 4.2.2 with the one-hot vector obtained in the step 4.2.1;
obtaining a numerical value type original test data characteristic vector corresponding to an original test data through the operation of the step;
the original test data set after one-hot coding is called the basic test data set, denoted by the symbol X_test; the symbol x_test,ij denotes the jth feature of the ith piece of data of the basic test data set X_test;
step 4.3: standardizing the data in the basic test data set X_test by formula (5);

x'_test,ij = (x_test,ij − AVG_j) / STD_j   (5)

wherein x'_test,ij is the data obtained after standardizing x_test,ij; AVG_j is the mean of the jth feature of all data in the basic training data set X obtained in step 1.4; STD_j is the standard deviation of the jth feature of all data in the basic training data set X obtained in step 1.4;
after the operations of step 4, the basic test data set has been preprocessed into the test data set, denoted by the symbol X'_test;
step five, classifying the test data;
inputting the test data obtained through the pretreatment in the fourth step into the integrated classification model trained in the third step for classification; the method comprises the following specific steps:
step 5.1: inputting a piece of test data obtained through the preprocessing in the step four into the GBDT classifier, and if the classification result is the DoS type, executing the operation in the step 5.2; if the classification result is the non-DoS type, executing the operation of the step 5.3;
step 5.2: inputting the test data into a KNN classifier for classification to obtain and output a final classification result, and finishing the operation;
step 5.3: respectively inputting the test data into each RF classification module in the RF classification module group, and outputting classification results to a splicing and voting module after classification operation; the splicing and voting module votes the output result of the RF classification module group to determine a classification result;
step 5.4: respectively inputting the test data into each GBDT classification module in the GBDT classification module group, and outputting classification results to a splicing and voting module after classification operation; the output result of the GBDT classification module group is voted by the splicing and voting module to determine a classification result;
step 5.5: respectively inputting the test data into each SVM classification module in the SVM classification module group, and outputting classification results to a splicing and voting module after classification operation; the splicing and voting module votes the output result of the SVM classification module group to determine a classification result;
step 5.6: respectively inputting the test data into each xgboost classification module in an xgboost classification module group, and outputting a classification result to a splicing and voting module after classification operation; the splicing and voting module votes the output result of the xgboost classification module group to determine a classification result;
step 5.7: repeating the operation of the step 5.6 for 2 times to obtain the classification results of the other 2 xgboost classification module groups;
step 5.8: combining the results of the steps 5.3 to 5.7 to obtain a 1 × 6 stacking vector;
step 5.9: inputting the 1 × 6 stacking vector obtained in the step 5.8 into a secondary classification model SVM classifier of the stacking classifier, performing classification operation to obtain and output a classification result of the test data, and ending the operation.
2. The network attack type identification method based on multi-layer detection as claimed in claim 1, characterized in that: in step 1.3, the problem of unbalanced quantity of each type of data in the original training data set is solved through data down-sampling and data up-sampling, specifically:
case 1: if the quantity of data of a certain type A in the original training data set is far greater than that of the other types, reduce the quantity of type A data by data down-sampling, specifically: randomly extract a part of the data from the type A data, so as to reduce the type A data;
case 2: if the number of a certain type B in the original training data set is far lower than that of other types of data, increasing the number of the type B data by adopting a data up-sampling method;
the data up-sampling algorithm is a SMOTE algorithm.
CN201811146113.7A 2018-06-15 2018-09-29 Network attack type identification method based on multi-layer detection Active CN109299741B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018106201892 2018-06-15
CN201810620189 2018-06-15

Publications (2)

Publication Number Publication Date
CN109299741A CN109299741A (en) 2019-02-01
CN109299741B true CN109299741B (en) 2022-03-04

Family

ID=65165024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811146113.7A Active CN109299741B (en) 2018-06-15 2018-09-29 Network attack type identification method based on multi-layer detection

Country Status (1)

Country Link
CN (1) CN109299741B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903840B (en) * 2019-02-28 2021-05-11 数坤(北京)网络科技有限公司 Model integration method and device
CN110213222B (en) * 2019-03-08 2021-12-10 东华大学 Network intrusion detection method based on machine learning
CN109994216A (en) * 2019-03-21 2019-07-09 上海市第六人民医院 A kind of ICD intelligent diagnostics coding method based on machine learning
CN110162558B (en) * 2019-04-01 2023-06-23 创新先进技术有限公司 Structured data processing method and device
CN110802601B (en) * 2019-11-29 2021-02-26 北京理工大学 Robot path planning method based on fruit fly optimization algorithm
CN111431849B (en) * 2020-02-18 2021-04-16 北京邮电大学 Network intrusion detection method and device
CN111680742A (en) * 2020-06-04 2020-09-18 甘肃电力科学研究院 Attack data labeling method applied to new energy plant station network security field
CN113408617A (en) * 2021-06-18 2021-09-17 湘潭大学 XGboost and Stacking model fusion-based non-invasive load identification method
CN113625319B (en) * 2021-06-22 2023-12-05 北京邮电大学 Non-line-of-sight signal detection method and device based on ensemble learning
CN113922985B (en) * 2021-09-03 2023-10-31 西南科技大学 Network intrusion detection method and system based on ensemble learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733530B2 (en) * 2016-12-08 2020-08-04 Resurgo, Llc Machine learning model evaluation in cyber defense

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598813A (en) * 2014-12-09 2015-05-06 西安电子科技大学 Computer intrusion detection method based on integrated study and semi-supervised SVM
CN106973038A (en) * 2017-02-27 2017-07-21 同济大学 Network inbreak detection method based on genetic algorithm over-sampling SVMs
CN106899440A (en) * 2017-03-15 2017-06-27 苏州大学 A kind of network inbreak detection method and system towards cloud computing
CN108154178A (en) * 2017-12-25 2018-06-12 北京工业大学 Semi-supervised support attack detection method based on improved SVM-KNN algorithms

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on DDoS Attack Detection Methods and Mitigation Mechanisms Based on Software-Defined Networking; Li Hefei; China Master's Theses Full-text Database, Information Science and Technology; 2015-12-15; pp. 21-43 *


Similar Documents

Publication Publication Date Title
CN109299741B (en) Network attack type identification method based on multi-layer detection
CN107633255B (en) Rock lithology automatic identification and classification method under deep learning mode
CN110287983B (en) Single-classifier anomaly detection method based on maximum correlation entropy deep neural network
CN109492026B (en) Telecommunication fraud classification detection method based on improved active learning technology
CN110213222A (en) Network inbreak detection method based on machine learning
CN111181939A (en) Network intrusion detection method and device based on ensemble learning
CN109902740B (en) Re-learning industrial control intrusion detection method based on multi-algorithm fusion parallelism
CN108958217A (en) A kind of CAN bus message method for detecting abnormality based on deep learning
CN109446804B (en) Intrusion detection method based on multi-scale feature connection convolutional neural network
CN111062036A (en) Malicious software identification model construction method, malicious software identification medium and malicious software identification equipment
CN113542241B (en) Intrusion detection method and device based on CNN-BiGRU hybrid model
CN111507385B (en) Extensible network attack behavior classification method
CN109886284B (en) Fraud detection method and system based on hierarchical clustering
CN112560596B (en) Radar interference category identification method and system
CN113922985A (en) Network intrusion detection method and system based on ensemble learning
CN110245693B (en) Key information infrastructure asset identification method combined with mixed random forest
CN110414587A (en) Depth convolutional neural networks training method and system based on progressive learning
CN114492768A (en) Twin capsule network intrusion detection method based on small sample learning
CN115801374A (en) Network intrusion data classification method and device, electronic equipment and storage medium
CN109583519A (en) A kind of semisupervised classification method based on p-Laplacian figure convolutional neural networks
CN115577357A (en) Android malicious software detection method based on stacking integration technology
CN111178196B (en) Cell classification method, device and equipment
CN113067798A (en) ICS intrusion detection method and device, electronic equipment and storage medium
CN110581840B (en) Intrusion detection method based on double-layer heterogeneous integrated learner
CN109617864B (en) Website identification method and website identification system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant