CN115567367A - Network fault detection method based on multiple boosting ensemble learning - Google Patents

Network fault detection method based on multiple boosting ensemble learning

Info

Publication number: CN115567367A
Application number: CN202211153019.0A
Authority: CN (China)
Prior art keywords: model, sub-decision group, weight, training sample, network
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 周静, 卢建平, 孙强, 程史靓, 黄蔚, 冯鑫, 夏榕泽, 石昌友, 韩欢
Current Assignee: Army Engineering University of PLA
Original Assignee: Army Engineering University of PLA
Application filed by: Army Engineering University of PLA


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06: Management of faults, events, alarms or notifications
    • H04L 41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14: Network analysis or design
    • H04L 41/145: Network analysis or design involving simulating, designing, planning or modelling of a network

Abstract

The invention relates to a network fault detection method based on multiple boosting ensemble learning, which comprises the following steps: inputting a network connection record into a trained fault detection model and outputting the corresponding network fault predicted value. During training, a training sample set is first obtained; the training sample set is then input into the convolutional neural network of the fault detection model for preliminary learning, yielding a number of CNN base classifiers; next, the adaptive boosting (AdaBoost) algorithm focuses on the training samples misclassified while training the CNN base classifiers, and the corresponding AB models are constructed from those classifiers; corresponding sub-decision groups are then built from the AB models, and the outputs of the sub-decision groups are weighted and summed by the MultiBoost algorithm to generate the corresponding network fault predicted value, thereby optimizing the fault detection model. The method is well suited to network fault detection and can guarantee both the accuracy and the generalization error of network fault detection, thereby improving the overall effect of network fault detection.

Description

Network fault detection method based on multiple boosting ensemble learning
Technical Field
The invention relates to the technical field of deep learning networks, and in particular to a network fault detection method based on multiple boosting ensemble learning.
Background
With the advance of informatization and intelligentization, the services running on communication networks are increasingly diverse, network scale keeps growing, network structure grows ever more complex, and the probability and impact of network fault events continue to increase. The causes of network failure are innumerable, and such fault events cause significant economic losses and social impact. The efficiency and accuracy of network fault detection directly affect whether the network operates normally and the quality of its services, so researching a high-performance network fault diagnosis model to guarantee normal network operation is of great importance.
Against the low value density of network connection record data, traditional network fault diagnosis models have appeared one after another; for example, many researchers adopted expert-system diagnosis, signal-processing diagnosis, statistical-analysis diagnosis, state-estimation diagnosis, and the like. However, these traditional methods struggle with the hierarchical, diffusive, and uncertain character of network faults, and can hardly cope with the fault diagnosis problem of complex networks under today's big data background. Traditional network supervision systems and manual troubleshooting can hardly weaken the influence of irrelevant or redundant features. New fault diagnosis techniques capable of handling today's complex network systems therefore need to be explored.
The applicant has found that machine learning offers strong learning capability, and machine learning based methods are often used for network fault diagnosis, such as fault diagnosis models built on the BP, RBF, and CNN algorithms. However, network faults feature high feature attribute dimensionality, low data value density, and large data volume, so the existing BP and RBF algorithms are not suited to high-precision identification of network faults. At the same time, a single base learner can hardly guarantee both the accuracy and the generalization error of fault diagnosis, so the accuracy of network fault detection remains low. How to design a method that suits network fault detection while guaranteeing detection accuracy and generalization error is therefore an urgent technical problem to be solved.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the technical problem to be solved by the invention is: how to provide a network fault detection method based on multiple boosting ensemble learning that is well suited to network fault detection and can guarantee the accuracy and generalization error of network fault detection, thereby improving the overall effect of network fault detection.
In order to solve the technical problems, the invention adopts the following technical scheme:
A network fault detection method based on multiple boosting ensemble learning comprises the following steps:
s1: acquiring a network connection record to be detected;
s2: inputting the network connection record into the trained fault detection model, and outputting a corresponding network fault predicted value;
the fault detection model comprises T sub-decision groups, each sub-decision group comprises $I_t$ AB models, and each AB model comprises K CNN base classifiers;
during training, a number of network connection records carrying network fault labels are first obtained to construct a training sample set; the training sample set is then input into the convolutional neural network of the fault detection model for preliminary learning to obtain a number of CNN base classifiers; next, the adaptive boosting algorithm focuses on the training samples misclassified while training the CNN base classifiers, and the corresponding AB model is constructed from those classifiers; corresponding sub-decision groups are then built from the AB models, and the outputs of the T sub-decision groups are weighted and summed by the MultiBoost algorithm to generate the corresponding network fault predicted value; finally, the training loss is calculated from the network fault predicted value and the corresponding network fault label so as to optimize the fault detection model;
s3: and taking the network fault predicted value output by the fault detection model as a network fault detection result of the network connection record.
Preferably, in step S2, the convolutional neural network of the fault detection model is a LeNet-5 model, which comprises an input layer, convolutional layers, sampling layers, a fully connected layer, and an output layer.
Preferably, the fault detection model is trained by:
S201: acquire a number of network connection records carrying network fault labels to construct a training sample set; then preprocess the training sample set to obtain a preprocessed data set, and set the weight of each training sample in the preprocessed data set;
S202: initialize the flag variable $I_t$ and let t = 1;
S203: initialize the number of AB models in the sub-decision group, q = 1;
S204: initialize the number of CNN base classifiers in the AB model, k = 1;
S205: sample N times with replacement according to the weight of each training sample to obtain the base-classifier data set ${}^tD_q^k$ of the kth CNN base classifier in the qth AB model of the tth sub-decision group, and input it into the convolutional neural network to output the corresponding CNN base classifier $M({}^tD_q^k)$;
S206: if k < K, set k = k + 1 and return to step S205; otherwise, construct the qth AB model $AB({}^tD_q)$ in the tth sub-decision group from the obtained K CNN base classifiers using the following formula, calculate the output error ${}^te_q$ of the AB model $AB({}^tD_q)$, and then adjust the weight of each training sample in the preprocessed data set according to the output error ${}^te_q$; at the same time, obtain the corresponding AB model data set vector ${}^tD_q = \{{}^tD_q^1, {}^tD_q^2, \ldots, {}^tD_q^K\}$;

$$AB({}^tD_q) = \arg\max_y \sum_{k=1}^{K} \log\!\left(\frac{1}{{}^t\beta_q^k}\right) \mathbb{1}\!\left[M({}^tD_q^k)(x) = y\right]$$

in the formula: $AB({}^tD_q)$ represents the qth AB model in the tth sub-decision group; $M({}^tD_q^k)$ represents the kth CNN base classifier in the qth AB model of the tth sub-decision group; ${}^t\beta_q^k$ represents the output weight of the kth CNN base classifier in the qth AB model of the tth sub-decision group;

wherein,

$${}^t\beta_q^k = \frac{{}^t\varepsilon_q^k}{1 - {}^t\varepsilon_q^k}$$

in the formula: ${}^t\varepsilon_q^k$ represents the ratio of the number of samples not correctly classified by the kth CNN base classifier in the qth AB model of the tth sub-decision group to the number of all samples;
S207: if q < $I_t$, set q = q + 1 and return to step S204; otherwise, construct the tth sub-decision group from the obtained $I_t$ AB models, and calculate the output of the tth sub-decision group and its weight $\alpha_t$ using the following formula; at the same time, obtain the corresponding sub-decision group data set vector ${}^tD = \{{}^tD_1, {}^tD_2, \ldots, {}^tD_{I_t}\}$;

$$\alpha_t = \log\!\left[(1 - {}^te_q)/{}^te_q\right];$$

in the formula: $\alpha_t$ represents the weight of the tth sub-decision group; ${}^te_q$ represents the output error of the qth AB model in the tth sub-decision group;
S208: if t < T, set t = t + 1 and return to step S203 to enter the next sub-decision group; otherwise, classify the training samples with each of the T sub-decision groups, and use the following formula to take the network fault category with the largest accumulated weight as the network fault predicted value;

$$MB^*({}^tD_q) = \arg\max_y \sum_{t=1}^{T} \alpha_t\, \mathbb{1}\!\left[AB_t(x) = y\right], \qquad {}^t\beta_q = \frac{{}^te_q}{1 - {}^te_q};$$

in the formula: $MB^*({}^tD_q)$ represents the T sub-decision groups classifying the training samples and taking the network fault category with the largest weight as the network fault predicted value of the network connection record; $Y_t$ represents the label vector formed by the network fault labels of the training samples in the AB model data set vector ${}^tD_q$, i.e., the network fault ground truth; $\alpha_t$ represents the weight of the tth sub-decision group; ${}^tD_q$ represents the input data set vector of the qth AB model in the tth sub-decision group, i.e., the data sets of the K CNN base classifiers in that AB model; ${}^t\beta_q$ represents the output weight of the qth AB model in the tth sub-decision group within the fault detection model.
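To make the loop structure of steps S202 to S208 concrete, the following Python sketch mirrors the nesting of sub-decision groups, AB models, and CNN base classifiers. It is an illustrative reading of the procedure, not the patented implementation: `train_cnn` is a caller-supplied training routine standing in for step S205, `adjust_weights` is a hypothetical helper implementing the three-case weight rule given further below for step S206, and the AdaBoost.M1-style weighted vote is an assumption consistent with the formulas above.

```python
import numpy as np

def train_multiboost(X, Y, T, I, K, train_cnn, n_classes):
    """Sketch of steps S202-S208: T sub-decision groups, each holding I[t]
    AB models, each AB model combining K CNN base classifiers."""
    N = len(X)
    w = np.ones(N)                                   # training sample weights
    groups = []                                      # [(alpha_t, [AB model, ...]), ...]
    for t in range(T):
        ab_models, e = [], 0.5
        for q in range(I[t]):
            classifiers = []
            for k in range(K):                       # steps S204-S205
                idx = np.random.choice(N, size=N, replace=True, p=w / w.sum())
                clf = train_cnn(X[idx], Y[idx])      # CNN base classifier
                eps = float(np.mean(clf.predict(X[idx]) != Y[idx]))
                beta_k = eps / max(1.0 - eps, 1e-12)
                classifiers.append((max(beta_k, 1e-12), clf))
            def ab(x, cs=classifiers):               # AdaBoost.M1-style weighted vote
                votes = np.zeros((len(x), n_classes))
                for beta, clf in cs:
                    pred = clf.predict(x)
                    votes[np.arange(len(x)), pred] += np.log(1.0 / beta)
                return votes.argmax(axis=1)
            wrong = ab(X) != Y
            e = float(np.sum(w * wrong) / N)         # output error t_e_q
            adjust_weights(w, wrong, e)              # three-case rule of step S206
            ab_models.append(ab)
        alpha_t = np.log((1.0 - e) / max(e, 1e-12))  # sub-decision group weight
        groups.append((alpha_t, ab_models))
    return groups

def predict_multiboost(groups, x, n_classes):
    """Step S208: weighted vote of the T sub-decision groups."""
    votes = np.zeros((len(x), n_classes))
    for alpha_t, ab_models in groups:
        for ab in ab_models:
            votes[np.arange(len(x)), ab(x)] += alpha_t
    return votes.argmax(axis=1)
```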
Preferably, in step S201, the training sample set is preprocessed by the following steps:
S2011: the training sample set $D = \{(X_1, Y_1), \ldots, (X_i, Y_i), \ldots, (X_N, Y_N)\}$, $i = 1, 2, \ldots, N$, contains N training samples, where $X_i = \{x_{ij}, j = 1, 2, \ldots, M\}$ denotes the ith training sample, $Y = \{Y_i, i = 1, 2, \ldots, N\}$ denotes the label vector of the training samples, and M denotes the original feature attribute dimension of a training sample;
S2012: numericalize the feature attribute vector $X = \{X_1, \ldots, X_j, \ldots, X_M\}$, $j = 1, 2, \ldots, M$;
S2013: normalize the numericalized feature attribute vector X to obtain the normalized feature attribute vector $\bar{X}$, and thereby obtain the numericalized and normalized preprocessed data set $\bar{D}$;
Preferably, in step S2013, the feature attribute vector is normalized by the following formula:

$$\bar{x}_{ij} = \frac{x'_{ij} - X_{j\min}}{X_{j\max} - X_{j\min}}$$

wherein,

$$x'_{ij} = \frac{x_{ij} - A_j}{S_j}$$

in the formula: $\bar{X}_j$ (with components $\bar{x}_{ij}$) represents the jth normalized feature attribute vector; $X_j$ represents the jth feature attribute vector after numericalization; $x_{ij}$ represents the jth feature attribute of the ith training sample; $A_j$, $S_j$, $X_{j\min}$, $X_{j\max}$ represent the mean, variance, minimum, and maximum of the jth feature attribute $X_j$, respectively.
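As a concrete illustration of steps S2011 to S2013, the sketch below numericalizes symbolic attributes and applies the standardize-then-rescale normalization above. It is a minimal reading under the assumption that the records arrive as a pandas DataFrame; the column handling is not specified by the patent, and $S_j$ is taken here as the standard deviation.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Numericalize (S2012) and normalize (S2013) every feature attribute."""
    out = df.copy()
    # S2012: map symbolic attributes (e.g. protocol type) to integer codes
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].astype("category").cat.codes
    # S2013: standardize with mean A_j and S_j, then rescale to [0, 1]
    for col in out.columns:
        a_j, s_j = out[col].mean(), out[col].std()
        z = (out[col] - a_j) / (s_j if s_j > 0 else 1.0)
        lo, hi = z.min(), z.max()
        out[col] = (z - lo) / (hi - lo) if hi > lo else 0.0
    return out
```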
Preferably, in step S205, the flag variable $I_t$ is used as follows: if $I_t = t$, the weight of every training sample in the preprocessed data set $\bar{D}$ is reset to its initial uniform value; otherwise the weights are adjusted according to the output error ${}^te_q$; then sampling with replacement is performed according to the weight of each training sample in the preprocessed data set;
Preferably, in step S206, the output error ${}^te_q$ of the AB model is calculated by the following formula:

$${}^te_q = \frac{1}{N}\sum_{i=1}^{N} weight(X_i)\,\mathbb{1}\!\left[AB(X_i) \neq Y_i\right]$$

in the formula: ${}^te_q$ represents the output error of the qth AB model in the tth sub-decision group; $weight(X_i)$ represents the weight of training sample $X_i$; $AB(X_i)$ represents the output of the AB model.
Preferably, in step S206, the weight of each training sample in the preprocessed data set is adjusted as follows:
1) if ${}^te_q > 0.5$, the corresponding AB model is discarded, and the weight of each training sample is recalculated from the continuous Poisson distribution;
2) if ${}^te_q = 0$, the output weight is set to ${}^t\beta_q = 10^{-10}$, and the weight of each training sample is recalculated from the continuous Poisson distribution;
3) if $0 < {}^te_q < 0.5$, the output weight is set to ${}^t\beta_q = {}^te_q/(1 - {}^te_q)$; at the same time, among the training samples of the AB model data set vector ${}^tD_q$, the weight of each misclassified sample is divided by $2\,{}^te_q$ and the weight of each correctly classified sample is divided by $2(1 - {}^te_q)$, with a minimum weight of $10^{-8}$.
Preferably, the continuous Poisson distribution is expressed by the following formula:

$$P = -\log\!\left(\frac{Random(1, 2, \ldots, 999)}{1000}\right)$$

in the formula: P represents the probability value, i.e., the weight of the training sample; $Random(1, 2, \ldots, 999)$ indicates an integer generated at random from 1 to 999.
Preferably, after the fault detection model is trained, its performance is evaluated by the accuracy of each class of training samples, the F1 value, the detection rate TPR, and the false alarm rate FPR;
1) the accuracy of each class of training samples is calculated by the following formula:

$$Acc_{test} = \frac{N_a}{N}$$

in the formula: $Acc_{test}$ represents the accuracy of each class of training samples; $N_a$ represents the number of correctly classified samples; N represents the total number of samples;
2) the F1 value is calculated by the following formula:

$$F1 = \frac{2PR}{P + R}$$

in the formula: F1 represents the F1 value; P represents the precision; R represents the recall;
3) the detection rate TPR and the false alarm rate FPR are calculated by the following formulas:

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$

in the formula: TP represents the number of samples that are actually positive and predicted positive by the fault diagnosis model; FN represents the number of samples that are actually positive and predicted negative; FP represents the number of samples that are actually negative and predicted positive; TN represents the number of samples that are actually negative and predicted negative.
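The four indices can be computed directly from a confusion matrix. A minimal sketch in the binary positive/negative framing of the formulas above (for the multi-class case it would be applied per class):

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray, positive):
    """Accuracy, F1, detection rate TPR, and false alarm rate FPR."""
    acc = float(np.mean(y_true == y_pred))                 # Acc = N_a / N
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    p = tp / (tp + fp) if tp + fp else 0.0                 # precision
    r = tp / (tp + fn) if tp + fn else 0.0                 # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0             # F1 = 2PR/(P+R)
    tpr = tp / (tp + fn) if tp + fn else 0.0               # detection rate
    fpr = fp / (fp + tn) if fp + tn else 0.0               # false alarm rate
    return acc, f1, tpr, fpr
```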
The network fault detection method based on multiple boosting ensemble learning has the following beneficial effects:
In the invention, inputting the network connection records into a convolutional neural network for preliminary learning exploits the kernel sharing, scale invariance, and strong learning capability of convolutional neural networks to accommodate the high feature attribute dimensionality, low data value density, and large volume of network fault data, so the method suits network fault detection well and improves its accuracy. Meanwhile, the adaptive boosting algorithm focuses on the training samples misclassified while training the CNN base classifiers, so its strong bias-reduction capability lowers the network fault classification error and further improves detection accuracy. In addition, the outputs of the sub-decision groups are weighted and summed by the MultiBoost algorithm to generate the corresponding network fault predicted value, so the variance-reduction advantage of the MultiBoost algorithm raises classification accuracy, copes with the low value density of fault data, and reduces the generalization error of network fault detection. The method is therefore well suited to network fault detection and guarantees both its accuracy and its generalization error, improving the overall effect of network fault detection.
Drawings
For a better understanding of the objects, technical solutions, and advantages of the present invention, the invention is described below with reference to the accompanying drawings, in which:
FIG. 1 is a logic block diagram of a network fault detection method based on multiple boosting ensemble learning;
FIG. 2 is a network architecture diagram of a convolutional neural network;
FIG. 3 is a network architecture diagram during fault detection model training;
FIG. 4 is a network architecture diagram of a sub-decision group in a fault detection model;
FIG. 5 is a box plot of the feature attributes of network failures;
fig. 6 is a schematic diagram comparing the fault detection model of the present invention with other existing models.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
It should be noted that like reference numbers and letters refer to like items in the following figures; once an item is defined in one figure, it need not be further defined or explained in subsequent figures. In the description of the present invention, terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings or those in which the product of the invention is conventionally placed in use; they are used only for convenience of description and simplification, and do not indicate or imply that the devices or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first", "second", "third", and the like are used solely to distinguish one element from another and are not to be construed as indicating or implying relative importance. The terms "horizontal", "vertical", and the like do not require the components to be absolutely horizontal or vertical; they may be slightly inclined. For example, "horizontal" merely means that a direction is closer to horizontal than to vertical, not that the structure must be perfectly level. In the description of the present invention, it should also be noted that, unless otherwise explicitly stated or limited, the terms "disposed", "mounted", "connected", and "coupled" are to be construed broadly: for example, as fixedly connected, detachably connected, or integrally connected; as mechanically or electrically connected; as directly connected or indirectly connected through intervening media; or as internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
The following is further detailed by the specific embodiments:
the embodiment is as follows:
the embodiment of the invention discloses a network fault detection method based on multiple promotion ensemble learning.
As shown in fig. 1, the method for detecting network faults based on multiple boosting ensemble learning includes:
s1: acquiring a network connection record to be detected;
s2: inputting the network connection record into the trained fault detection model, and outputting a corresponding network fault predicted value;
the fault detection model comprises T sub-decision groups, each sub-decision group comprises $I_t$ AB models, and each AB model comprises K CNN base classifiers;
during training, a number of network connection records with network fault labels are first obtained to construct a training sample set; the training sample set is then input into the convolutional neural network of the fault detection model for preliminary learning to obtain a number of CNN base classifiers; next, the adaptive boosting algorithm (AdaBoost, AB) focuses on the training samples misclassified while training the CNN base classifiers, and the corresponding AB model is constructed from those classifiers; corresponding sub-decision groups are then built from the AB models, and the outputs of the T sub-decision groups are weighted and summed by the MultiBoost (MB) algorithm to generate the corresponding network fault predicted value; finally, the training loss is calculated from the network fault predicted value and the corresponding network fault label so as to optimize the fault detection model;
in this embodiment, in the training sample, the network fault labels carried by the sample have two functions: judging whether an input sample is classified correctly or wrongly, if the input sample is a wrongly-classified sample, increasing the weight of the sample in the next iteration of the algorithm, and otherwise, reducing the weight of the sample, thereby optimizing a fault detection model; secondly, training loss (such as cross entropy loss) is calculated through the network fault predicted value and the corresponding network fault label, and then the existing mature means is combined with the training loss to optimize a fault detection model. In the test sample, the sample carries a label for calculating performance indexes of the fault detection model, such as accuracy, F1 value, detection rate TPR and false alarm rate FPR.
S3: and taking the network fault predicted value output by the fault detection model as a network fault detection result of the network connection record.
In this embodiment, a sequence of network connection records over a certain period may be input into the fault detection model, which judges whether a network fault occurs and detects the corresponding network fault type (i.e., the network fault predicted value), such as normal records (Normal), denial of service (DOS), surveillance and other probing activities (Probing), illegal access from a remote machine (R2L), and illegal access of ordinary users to local superuser privileges (U2R).
According to the invention, inputting the network connection records into the convolutional neural network for preliminary learning exploits the kernel sharing, scale invariance, and strong learning capability of convolutional neural networks to accommodate the high feature attribute dimensionality, low data value density, and large volume of network fault data, so the method suits network fault detection well and improves its accuracy. Meanwhile, the adaptive boosting algorithm focuses on the training samples misclassified while training the CNN base classifiers, so its strong bias-reduction capability lowers the network fault classification error and further improves detection accuracy. In addition, the outputs of the sub-decision groups are weighted and summed by the MultiBoost algorithm to generate the corresponding network fault predicted value, so the variance-reduction advantage of the MultiBoost algorithm raises classification accuracy, copes with the low value density of fault data, and further reduces the generalization error of network fault detection. The method is therefore well suited to network fault detection and guarantees both its accuracy and its generalization error, improving the overall effect of network fault detection.
In a specific implementation, the convolutional neural network (CNN) algorithm offers a degree of rotation invariance and translation invariance, and its convolution and pooling operations share convolution kernels, which can reduce the dimensionality of the input data. Considering that the feature attribute dimension of a network connection record is 41, the invention uses a CNN as the base classifier model. In addition, considering the requirements of reducing memory occupation and computation, the invention adopts the mature LeNet-5 model with an improved pad-then-convolve method, which avoids shrinking the input dimensions during convolution and losing information.
As shown in fig. 2, the LeNet-5 model comprises an input layer, convolutional layers, sampling layers, a fully connected layer, and an output layer;
the expression function of the output layer is given by the following formula:

$$y = \mathrm{softmax}(\omega P + b);$$

in the formula: ω represents the fully connected layer weight matrix; P represents the pooled feature matrix; b represents the bias term.
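A minimal tf.keras sketch of a LeNet-5-style base classifier with a softmax output as in the formula above, using "same" padding in line with the pad-then-convolve improvement mentioned earlier. The filter counts and the 36 x 36 input are illustrative assumptions (the experimental section sets W1 = H1 = 36), not the patented configuration.

```python
import tensorflow as tf

def build_lenet5(input_shape=(36, 36, 1), n_classes=5) -> tf.keras.Model:
    """LeNet-5-style CNN: conv -> pool -> conv -> pool -> dense -> softmax."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(6, kernel_size=2, strides=1, padding="same",
                               activation="relu"),              # convolutional layer C1
        tf.keras.layers.AveragePooling2D(pool_size=2),          # sampling layer S1
        tf.keras.layers.Conv2D(16, kernel_size=2, strides=1, padding="same",
                               activation="relu"),              # convolutional layer C2
        tf.keras.layers.AveragePooling2D(pool_size=2),          # sampling layer S2
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(84, activation="relu"),           # fully connected layer
        tf.keras.layers.Dense(n_classes, activation="softmax")  # y = softmax(wP + b)
    ])
```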
K groups of balanced data are sampled from the preprocessed data set as training sample sets and passed to the CNN model, which is trained to obtain the corresponding CNN base classification models. Specifically, K balanced data sets are sampled from the preprocessed data set $\bar{D}$ and form the data set vector $\{D_1, D_2, \ldots, D_K\}$; each $D_k$ is input as a training sample set into the LeNet-5 CNN model until all K groups of data are trained, and combining the pooled feature matrix P, the fully connected layer weight matrix ω, and the bias term b gives the output layer value $Z_k = \mathrm{softmax}(\omega_k P_k + b_k)$, yielding the K corresponding CNN base classification models $\{Z_1, Z_2, \ldots, Z_K\}$. Each data set $D_k$ is drawn from the preprocessed data set $\bar{D}$, and the total number of base-classification samples across the K groups is less than or equal to N.
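One way to draw the K balanced groups described above is class-stratified sampling with replacement. A sketch, assuming each group takes an equal number of records per fault class (the exact balancing scheme is not spelled out in the text):

```python
import numpy as np

def sample_balanced_groups(X, Y, k_groups: int, per_class: int):
    """Draw K class-balanced data sets from the preprocessed data set."""
    classes = np.unique(Y)
    groups = []
    for _ in range(k_groups):
        idx = np.concatenate([
            np.random.choice(np.flatnonzero(Y == c), size=per_class, replace=True)
            for c in classes
        ])
        np.random.shuffle(idx)
        groups.append((X[idx], Y[idx]))
    return groups
```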
Inputting the network connection records into the convolutional neural network for preliminary learning exploits the kernel sharing, scale invariance, and strong learning capability of the convolutional neural network to accommodate the high feature attribute dimensionality, low data value density, and large volume of network fault data, making the method well suited to network fault detection and improving its accuracy.
In a specific implementation, MultiBoost (MB) is a serial ensemble algorithm composed of several sub-decision groups. The MultiBoost ensemble algorithm defines a flag variable $I_t$ ($t = 1, 2, \ldots$) that judges the number of iterations of a sub-decision group, indicating that the tth sub-decision group comprises $I_t$ base classifiers constructed by the AB algorithm. The K obtained CNN base classification models $\{Z_1, Z_2, \ldots, Z_K\}$ are input into the qth AB algorithm model of the tth sub-decision group to obtain that AB model; training continues through the CNN models and the corresponding AB models until all T sub-decision groups are trained, after which all sub-decision groups are weighted and summed according to their corresponding weights, finally yielding the MB ensemble model.
The multiple boosting ensemble algorithm based on CNN base classifiers is realized in four steps: obtaining the training sample set, inputting the data, the ensemble model learning process, and outputting the diagnosis result. As shown in figs. 3 and 4, the fault detection model is trained as follows:
Data set $\bar{D}$: a data set containing N preprocessed training samples.
Cycle count T: the number of sub-decision groups.
Cycle count q: the number of AB models in a sub-decision group.
Cycle count K: the number of CNN base classifiers in an AB model.
CNN base classifier function: $M = CNN(\cdot)$.
Integer $I_t$: the sub-decision-group iteration end flag variable.
S201: acquire a number of network connection records with network fault labels to construct a training sample set; then preprocess the training sample set to obtain a preprocessed data set, and set the weight of each training sample in the preprocessed data set;
S202: initialize the flag variable $I_t$ and let t = 1;
S203: initialize the number of AB models in the sub-decision group, q = 1;
S204: initialize the number of CNN base classifiers in the AB model, k = 1;
S205: sample N times with replacement according to the weight of each training sample to obtain the base-classifier data set ${}^tD_q^k$ of the kth CNN base classifier in the qth AB model of the tth sub-decision group, and input it into the convolutional neural network to output the corresponding CNN base classifier $M({}^tD_q^k)$.
The flag variable $I_t$ is used as follows: if $I_t = t$, the weight of every training sample in the preprocessed data set $\bar{D}$ is reset to its initial uniform value; otherwise the weights are adjusted according to the output error ${}^te_q$; then sampling with replacement is performed according to the weight of each training sample in the preprocessed data set.
S206: if k < K, set k = k + 1 and return to step S205; otherwise, construct the qth AB model $AB({}^tD_q)$ in the tth sub-decision group from the obtained K CNN base classifiers using the following formula, calculate the output error ${}^te_q$ of the AB model $AB({}^tD_q)$, and then adjust the weight of each training sample in the preprocessed data set according to the output error ${}^te_q$; at the same time, obtain the corresponding AB model data set vector (also called an average data set) ${}^tD_q = \{{}^tD_q^1, {}^tD_q^2, \ldots, {}^tD_q^K\}$;

$$AB({}^tD_q) = \arg\max_y \sum_{k=1}^{K} \log\!\left(\frac{1}{{}^t\beta_q^k}\right) \mathbb{1}\!\left[M({}^tD_q^k)(x) = y\right]$$

in the formula: $AB({}^tD_q)$ represents the qth AB model in the tth sub-decision group; $M({}^tD_q^k)$ represents the kth CNN base classifier in the qth AB model of the tth sub-decision group; ${}^t\beta_q^k$ represents the output weight of the kth CNN base classifier in the qth AB model of the tth sub-decision group;

wherein,

$${}^t\beta_q^k = \frac{{}^t\varepsilon_q^k}{1 - {}^t\varepsilon_q^k}$$

in the formula: ${}^t\varepsilon_q^k$ represents the ratio of the number of samples not correctly classified by the kth CNN base classifier in the qth AB model of the tth sub-decision group to the number of all samples.
The output error ${}^te_q$ of the AB model is calculated by the following formula:

$${}^te_q = \frac{1}{N}\sum_{i=1}^{N} weight(X_i)\,\mathbb{1}\!\left[AB(X_i) \neq Y_i\right]$$

in the formula: ${}^te_q$ represents the output error of the qth AB model in the tth sub-decision group; $weight(X_i)$ represents the weight of training sample $X_i$; $AB(X_i)$ represents the output of the AB model.
The weight of each training sample in the preprocessed data set is adjusted as follows:
1) if ${}^te_q > 0.5$, the corresponding AB model is discarded, and the weight of each training sample is recalculated from the continuous Poisson distribution;
2) if ${}^te_q = 0$, the output weight is set to ${}^t\beta_q = 10^{-10}$, and the weight of each training sample is recalculated from the continuous Poisson distribution;
3) if $0 < {}^te_q < 0.5$, the output weight is set to ${}^t\beta_q = {}^te_q/(1 - {}^te_q)$; at the same time, among the training samples of the AB model data set vector ${}^tD_q$, the weight of each misclassified sample is divided by $2\,{}^te_q$ and the weight of each correctly classified sample is divided by $2(1 - {}^te_q)$, with a minimum weight of $10^{-8}$.
The continuous Poisson distribution is expressed by the following formula:

$$P = -\log\!\left(\frac{Random(1, 2, \ldots, 999)}{1000}\right)$$

in the formula: P represents the probability value, i.e., the weight of the training sample; $Random(1, 2, \ldots, 999)$ indicates an integer generated at random from 1 to 999.
S207: if q < $I_t$, set q = q + 1 and return to step S204; otherwise, construct the tth sub-decision group from the obtained $I_t$ AB models, and calculate the output of the tth sub-decision group and its weight $\alpha_t$ using the following formula; at the same time, obtain the corresponding sub-decision group data set vector ${}^tD = \{{}^tD_1, {}^tD_2, \ldots, {}^tD_{I_t}\}$;

$$\alpha_t = \log\!\left[(1 - {}^te_q)/{}^te_q\right];$$

in the formula: $\alpha_t$ represents the weight of the tth sub-decision group; ${}^te_q$ represents the output error of the qth AB model in the tth sub-decision group;
S208: if t < T, set t = t + 1 and return to step S203 to enter the next sub-decision group; otherwise, classify the training samples with each of the T sub-decision groups, and use the following formula to take the network fault category with the largest accumulated weight as the network fault predicted value;

$$MB^*({}^tD_q) = \arg\max_y \sum_{t=1}^{T} \alpha_t\, \mathbb{1}\!\left[AB_t(x) = y\right], \qquad {}^t\beta_q = \frac{{}^te_q}{1 - {}^te_q};$$

in the formula: $MB^*({}^tD_q)$ represents the T sub-decision groups classifying the training samples and taking the network fault category with the largest weight as the network fault predicted value of the network connection record; $Y_t$ represents the label vector formed by the network fault labels of the training samples in the data set vector ${}^tD_q$, i.e., the network fault ground truth; $\alpha_t$ represents the weight of the tth sub-decision group; ${}^tD_q$ represents the input data set vector of the qth AB model in the tth sub-decision group, i.e., the data sets of the K CNN base classifiers in that AB model; ${}^t\beta_q$ represents the output weight of the qth AB model in the tth sub-decision group within the fault detection model.
According to the method, the adaptive boosting algorithm focuses on the training samples misclassified while training the CNN base classifiers, so its strong bias-reduction capability lowers the network fault classification error and further improves detection accuracy; meanwhile, the outputs of the sub-decision groups are weighted and summed by the MultiBoost algorithm to generate the corresponding network fault predicted value, so the variance-reduction advantage of the MultiBoost algorithm raises classification accuracy, copes with the low value density of fault data, and reduces the generalization error of network fault detection.
In a specific implementation, the number of network connection records in the training sample set is large, and the massive network data contain many invalid feature attributes; irrelevant or redundant features increase the space and time consumed by the algorithm and may also reduce the accuracy of fault diagnosis. On the other hand, from the overall statistical analysis of network operation data, the values of the feature attributes differ widely, and the value ranges of some attributes are quite complex. Running data mining algorithms directly on the raw network operation data set is very complex, consumes large amounts of manpower and material resources, and yields unsatisfactory fault diagnosis results. Therefore, to reduce the dependence on measurement units and weaken the influence of differing feature attribute scales on the diagnosis algorithm, the feature attributes need to be numericalized and normalized.
The training sample set is preprocessed by the following steps:
S2011: the training sample set $D = \{(X_1, Y_1), \ldots, (X_i, Y_i), \ldots, (X_N, Y_N)\}$, $i = 1, 2, \ldots, N$, contains N training samples, where $X_i = \{x_{ij}, j = 1, 2, \ldots, M\}$ denotes the ith training sample, $Y = \{Y_i, i = 1, 2, \ldots, N\}$ denotes the label vector of the training samples, and M denotes the original feature attribute dimension of a training sample;
S2012: numericalize the feature attribute vector $X = \{X_1, \ldots, X_j, \ldots, X_M\}$, $j = 1, 2, \ldots, M$;
S2013: normalize the numericalized feature attribute vector X to obtain the normalized feature attribute vector $\bar{X}$, and thereby obtain the numericalized and normalized preprocessed data set $\bar{D}$.
The feature attribute vector is normalized by the following formula:

$$\bar{x}_{ij} = \frac{x'_{ij} - X_{j\min}}{X_{j\max} - X_{j\min}}$$

wherein,

$$x'_{ij} = \frac{x_{ij} - A_j}{S_j}$$

in the formula: $\bar{X}_j$ (with components $\bar{x}_{ij}$) represents the jth normalized feature attribute vector; $X_j$ represents the jth feature attribute vector after numericalization; $x_{ij}$ represents the jth feature attribute of the ith training sample; $A_j$, $S_j$, $X_{j\min}$, $X_{j\max}$ represent the mean, variance, minimum, and maximum of the jth feature attribute $X_j$, respectively.
In this embodiment, the preprocessed data set is one-hot encoded after the numericalization and normalization. One-hot encoding encodes categorical variables: assuming L label classes, the integer values 0 to L-1 are converted into binary vectors, and the one-hot code corresponding to each of the L labels is a binary string in which only the bit corresponding to that integer is 1 and all other bits are 0. Details are shown in Table 4.
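A small illustration of this one-hot encoding for L = 5 classes (integer labels 0 to 4):

```python
import numpy as np

def one_hot(labels: np.ndarray, n_classes: int = 5) -> np.ndarray:
    """Convert integer labels 0..L-1 into binary vectors with a single 1 bit."""
    encoded = np.zeros((len(labels), n_classes), dtype=np.int8)
    encoded[np.arange(len(labels)), labels] = 1
    return encoded

# e.g. one_hot(np.array([0, 3])) -> [[1,0,0,0,0], [0,0,0,1,0]]
```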
By numericalizing and normalizing the training sample set, the invention reduces the dependence of the fault detection model on measurement units and weakens the influence of differing feature attribute scales on the model, further improving the performance, and thus the effect, of network fault detection.
In other preferred embodiments, the high-value-density data set may also be generated by the XGBoost algorithm:
based on the XGBoost algorithm, calculate the gain of each feature attribute $\bar{X}_j$, $j = 1, 2, \ldots, M$, of the preprocessed data set $\bar{D}$:

$$G_j = \frac{1}{2}\left[\frac{G_{jL}^2}{H_{jL} + \lambda} + \frac{G_{jR}^2}{H_{jR} + \lambda} - \frac{(G_{jL} + G_{jR})^2}{H_{jL} + H_{jR} + \lambda}\right] - \gamma$$

in the formula: $G_j$ represents the gain of feature attribute $\bar{X}_j$; $G_{jL}^2/(H_{jL} + \lambda)$, $G_{jR}^2/(H_{jR} + \lambda)$, and $(G_{jL} + G_{jR})^2/(H_{jL} + H_{jR} + \lambda)$ represent the left-subtree score, the right-subtree score, and the score obtained when the tree is not split, respectively; $G_{jL}$, $H_{jL}$, $G_{jR}$, $H_{jR}$ represent the first-order and second-order gradients of the left subtree and of the right subtree in the second-order Taylor expansion of the XGBoost loss function; λ expresses how simple a tree is desired to be, with larger values demanding a simpler tree structure; γ represents the complexity cost of adding a new leaf node;
set a threshold η: if $G_j < \eta$, remove the corresponding feature attribute; otherwise keep it;
repeat the above steps until the gains of all M feature attributes have been compared; if in this process the gains of b feature attributes are smaller than the set threshold η, those b feature attributes are removed, yielding the high-value-density data set $\tilde{D}$ with feature attribute vector $\tilde{X} = \{\tilde{X}_1, \ldots, \tilde{X}_{M-b}\}$, where b represents the number of removed feature attributes.
By evaluating the importance of the feature attributes of the training samples in the preprocessed data set and screening features through the XGBoost algorithm, the invention performs feature attribute importance evaluation and feature screening on the network fault data, so that irrelevant or redundant feature attributes can be removed to obtain a high-value-density data set for training the fault detection model, balancing the efficiency and accuracy of fault detection model training.
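A sketch of gain-based feature screening with the xgboost library follows. The booster settings and the use of the library's reported "gain" importance as a stand-in for the per-feature gain $G_j$ above are assumptions; the threshold η comes from the text above.

```python
import numpy as np
import xgboost as xgb

def select_features(X: np.ndarray, y: np.ndarray, eta: float):
    """Keep only feature attributes whose XGBoost gain reaches the threshold eta."""
    booster = xgb.train({"objective": "multi:softmax",
                         "num_class": int(y.max()) + 1},
                        xgb.DMatrix(X, label=y), num_boost_round=50)
    gains = booster.get_score(importance_type="gain")   # {"f0": gain, ...}
    keep = [j for j in range(X.shape[1])
            if gains.get(f"f{j}", 0.0) >= eta]          # G_j < eta -> removed
    return X[:, keep], keep
```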
In a specific implementation, after the fault detection model is trained, its performance is evaluated by the accuracy of each class of training samples, the F1 value, the detection rate TPR, and the false alarm rate FPR;
1) the accuracy of each class of training samples is calculated by the following formula:

$$Acc_{test} = \frac{N_a}{N}$$

in the formula: $Acc_{test}$ represents the accuracy of each class of training samples; $N_a$ represents the number of correctly classified samples; N represents the total number of samples;
2) the F1 value (the F1-score reconciles the model's recall and precision) is calculated by the following formula:

$$F1 = \frac{2PR}{P + R}$$

in the formula: F1 represents the F1 value; P represents the precision; R represents the recall;
3) the detection rate TPR and the false alarm rate FPR are calculated by the following formulas:

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$

in the formula: TP represents the number of samples that are actually positive and predicted positive by the fault diagnosis model; FN represents the number of samples that are actually positive and predicted negative; FP represents the number of samples that are actually negative and predicted positive; TN represents the number of samples that are actually negative and predicted negative.
The invention can evaluate the performance of the fault detection model well through the per-class accuracy, F1 value, detection rate, and false alarm rate of the training samples, and can thus effectively obtain the fault detection model with the best performance, further improving the effect of network fault detection.
In order to better illustrate the advantages of the technical solution of the present invention, the following experiments are disclosed in this example.
The experiment uses the KDD Cup 99 data set, recognized by network security researchers, which consists of TCP dump network connection data simulating the operation of the U.S. Air Force LAN. The data set contains 450,000 connection records; the feature attribute description of network faults and a sample network connection record are shown in Table 1 and Table 2, respectively. Sampling was performed according to the quantity ratios shown in Table 1: 10% of the data set was taken as the training sample set and 5% as the test data set. To verify the effectiveness of the proposed algorithm, the experiment imports the TensorFlow module and implements the algorithm in Python.
Table 1: Feature attribute description of network failures (table available only as an image in the original publication)

Table 2: A sample of network connection record data (table available only as an image in the original publication)
(1) Feature attribute preprocessing
To simulate a real environment and test the robustness of the algorithm, 50 dB random noise was added to the original data set, and the data were numericalized and normalized. The statistical analysis of each feature attribute is shown in fig. 5; the box plot shows that the values of some features are almost all zero, so these features have almost no influence on the classification result and removing them achieves dimensionality reduction. The removed feature attributes are shown in Table 3. In this experiment the parameters were set to N = 450000, M = 41, b = 5, and the preprocessed data set has the feature attribute vector $\tilde{X} = \{\tilde{X}_1, \ldots, \tilde{X}_{36}\}$.

Table 3: Feature attributes of the removed network failures (table available only as an image in the original publication)
(2) Establishing the CNN base classifier
The categorical variables are encoded by one-hot encoding: the integer values 0 to 4 are converted into binary vectors in which only the bit corresponding to the integer is 1 and all other bits are 0; the result is shown in Table 4.

Table 4: Classification codes corresponding to the 5 network failure classes (table available only as an image in the original publication)
Input layer parameter settings of the CNN base classifier: $W_1 = 36$, $H_1 = 36$; parameters of convolutional layers C1 and C2: convolution kernel size 2 × 2, stride 1, no padding; parameters of sampling layers S1 and S2: stride 1, no padding; output layer parameters: the softmax function with 5 fault classes. In the experiment, the number of base classification models was set to K = 6; K groups of balanced data sets were sampled, forming the data set vector $\{D_1, D_2, \ldots, D_6\}$.
(3) Network fault diagnosis with the MB serial ensemble model based on CNN base classifiers
The parameters initializing the fault detection model of the invention (hereinafter also referred to as the CNN+HB model) are as follows: the number of training samples is set, and the number of sub-decision groups is T = 5; the number of AB algorithm models contained in each sub-decision group is then calculated, and each AB algorithm model contains K = 6 base classification models. After training, the accuracy and F1 value of the proposed network fault diagnosis model are calculated. The mean diagnostic accuracies and F1 values for Normal, DOS, Probing, R2L, and U2R are shown in Table 5. The detection rate TPR and false alarm rate FPR of the proposed fault detection model are calculated to be 0.92 and 2.16, respectively.

Table 5: Fault diagnosis results of the CNN+HB model (table available only as an image in the original publication)
Table 5 lists the diagnostic accuracy of the CNN+HB model for each network type: the minimum is 90.43%, the maximum is 95.84%, and the F1 index reaches 0.964, demonstrating the feasibility of the invention.
To demonstrate the effectiveness and high accuracy of the model, the experiment compares the proposed CNN+HB model with the LSTM and VSM models; the comparison results are shown in fig. 6.
Fig. 6 shows the comparison of network fault diagnosis accuracy of the proposed CNN+HB model, the LSTM model, and the VSM model as the number of iterations increases. As can be seen from fig. 6, the average diagnostic accuracy of the CNN+HB, LSTM, and VSM models leveled off by 30 training epochs, and at 22 training epochs the diagnostic accuracy of the three models was 95.4%, 90.7%, and 89.1%, respectively. In addition, since CNN+HB integrates multiple CNN base classification models, the diagnosis error of the integrated CNN+HB model is smaller, i.e., the fault diagnosis accuracy is improved.
It should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting the technical solutions, and those skilled in the art should understand that the technical solutions of the present invention can be modified or substituted with equivalent solutions without departing from the spirit and scope of the technical solutions, and all should be covered in the claims of the present invention.

Claims (10)

1. A network fault detection method based on multiple boosting ensemble learning, characterized by comprising the following steps:
s1: acquiring a network connection record to be detected;
s2: inputting the network connection record into the trained fault detection model, and outputting a corresponding network fault predicted value;
the fault detection model comprises T sub-decision groups, each sub-decision group comprises $I_t$ AB models, and each AB model comprises K CNN base classifiers;
during training, a number of network connection records carrying network fault labels are first obtained to construct a training sample set; the training sample set is then input into the convolutional neural network of the fault detection model for preliminary learning to obtain a number of CNN base classifiers; next, the adaptive boosting algorithm focuses on the training samples misclassified while training the CNN base classifiers, and the corresponding AB model is constructed from those classifiers; corresponding sub-decision groups are then built from the AB models, and the outputs of the T sub-decision groups are weighted and summed by the MultiBoost algorithm to generate the corresponding network fault predicted value; finally, the training loss is calculated from the network fault predicted value and the corresponding network fault label so as to optimize the fault detection model;
s3: and taking the network fault predicted value output by the fault detection model as a network fault detection result of the network connection record.
2. The network fault detection method based on multiple boosting ensemble learning according to claim 1, characterized in that: in step S2, the convolutional neural network of the fault detection model is a LeNet-5 model comprising an input layer, convolutional layers, sampling layers, a fully connected layer, and an output layer.
3. The network fault detection method based on multiple boosting ensemble learning according to claim 1, characterized in that the fault detection model is trained by:
S201: acquire a number of network connection records carrying network fault labels to construct a training sample set; then preprocess the training sample set to obtain a preprocessed data set, and set the weight of each training sample in the preprocessed data set;
S202: initialize the flag variable $I_t$ and let t = 1;
S203: initialize the number of AB models in the sub-decision group, q = 1;
S204: initialize the number of CNN base classifiers in the AB model, k = 1;
S205: sample N times with replacement according to the weight of each training sample to obtain the base-classifier data set ${}^tD_q^k$ of the kth CNN base classifier in the qth AB model of the tth sub-decision group, and input it into the convolutional neural network to output the corresponding CNN base classifier $M({}^tD_q^k)$;
S206: if k < K, set k = k + 1 and return to step S205; otherwise, construct the qth AB model $AB({}^tD_q)$ in the tth sub-decision group from the obtained K CNN base classifiers using the following formula, calculate the output error ${}^te_q$ of the AB model $AB({}^tD_q)$, and then adjust the weight of each training sample in the preprocessed data set according to the output error ${}^te_q$; at the same time, obtain the corresponding AB model data set vector ${}^tD_q = \{{}^tD_q^1, {}^tD_q^2, \ldots, {}^tD_q^K\}$;

$$AB({}^tD_q) = \arg\max_y \sum_{k=1}^{K} \log\!\left(\frac{1}{{}^t\beta_q^k}\right) \mathbb{1}\!\left[M({}^tD_q^k)(x) = y\right]$$

in the formula: $AB({}^tD_q)$ represents the qth AB model in the tth sub-decision group; $M({}^tD_q^k)$ represents the kth CNN base classifier in the qth AB model of the tth sub-decision group; ${}^t\beta_q^k$ represents the output weight of the kth CNN base classifier in the qth AB model of the tth sub-decision group;

wherein,

$${}^t\beta_q^k = \frac{{}^t\varepsilon_q^k}{1 - {}^t\varepsilon_q^k}$$

in the formula: ${}^t\varepsilon_q^k$ represents the ratio of the number of samples not correctly classified by the kth CNN base classifier in the qth AB model of the tth sub-decision group to the number of all samples;
S207: if q < $I_t$, let q = q + 1 and return to step S204; otherwise, construct the tth sub-decision group from the $I_t$ AB models obtained, and calculate the output of the tth sub-decision group and its weight $\alpha_t$ by the following formula; at the same time, obtain the corresponding sub-decision group data set vector ${}^tD = \{{}^tD_1, \dots, {}^tD_{I_t}\}$;

$$\alpha_t = \log\!\left[\frac{1 - {}^te_q}{{}^te_q}\right]$$

in the formula: $\alpha_t$ represents the weight of the tth sub-decision group; ${}^te_q$ represents the output error of the qth AB model in the tth sub-decision group;
S208: if t < T, let t = t + 1 and return to step S203 to train the next sub-decision group; otherwise, classify the training samples with the T sub-decision groups respectively, and take the network fault category with the maximum weight as the network fault predicted value, calculated by the following formulas:

$$MB^*({}^tD_q) = \arg\max_{y \in Y_t} \sum_{t=1}^{T} \alpha_t\,\mathbb{1}\{AB({}^tD_q) = y\}, \qquad {}^t\beta_q = \frac{{}^te_q}{1 - {}^te_q};$$

in the formula: $MB^*({}^tD_q)$ represents the classification of the training samples by the T sub-decision groups, taking the network fault category with the maximum weight as the network fault predicted value of the network connection record; $Y_t$ represents the label vector formed by the network fault labels of the training samples in the AB model data set vector ${}^tD_q$, i.e. the network fault true values; $\alpha_t$ represents the weight of the tth sub-decision group; ${}^tD_q$ represents the input data set vector of the qth AB model in the tth sub-decision group, i.e. the data sets of the K CNN base classifiers in that AB model; ${}^t\beta_q$ represents the output weight of the qth AB model in the tth sub-decision group within the fault detection model.
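A condensed Python sketch of the S201–S208 loop structure, offered purely for illustration: it assumes `fit_base(X, y)` trains one CNN base classifier and returns a per-sample callable, `I[t]` gives the number of AB models in group t, and the sample re-weighting shown is the simplified case 3) of claim 8; all helper names are hypothetical:

```python
import numpy as np

def weighted_vote(models, weights, n_classes):
    """Combine classifiers by weighted majority vote."""
    def vote(x):
        scores = np.zeros(n_classes)
        for m, w in zip(models, weights):
            scores[m(x)] += w
        return int(np.argmax(scores))
    return vote

def train_multiboost(X, y, T, I, K, fit_base, n_classes, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    w = np.full(N, 1.0 / N)                               # S201: uniform weights
    groups, alphas = [], []
    for t in range(T):                                    # S202 / S208
        ab_models, betas = [], []
        for q in range(I[t]):                             # S203 / S207
            bases, base_w = [], []
            for k in range(K):                            # S204 / S206
                idx = rng.choice(N, N, replace=True, p=w / w.sum())   # S205
                clf = fit_base(X[idx], y[idx])
                pred = np.array([clf(xi) for xi in X])
                eps = min(max(np.average(pred != y, weights=w), 1e-10), 1 - 1e-10)
                bases.append(clf)
                base_w.append(np.log((1.0 - eps) / eps))  # base output weight
            ab = weighted_vote(bases, base_w, n_classes)  # the q-th AB model
            ab_pred = np.array([ab(xi) for xi in X])
            e = min(max(np.average(ab_pred != y, weights=w), 1e-10), 1 - 1e-10)
            miss = ab_pred != y                           # claim 8, case 3)
            w = np.where(miss, w / (2.0 * e), w / (2.0 * (1.0 - e)))
            w = np.maximum(w, 1e-8)                       # weight floor 10^-8
            ab_models.append(ab)
            betas.append(np.log((1.0 - e) / e))
        groups.append(weighted_vote(ab_models, betas, n_classes))
        alphas.append(betas[-1])                          # alpha_t = log[(1-e)/e]
    return weighted_vote(groups, alphas, n_classes)       # final MB* classifier
```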
4. The method of claim 3, wherein in step S201 the training sample set is preprocessed by the following steps:
S2011: the training sample set $D = \{(X_1, Y_1), \dots, (X_i, Y_i), \dots, (X_N, Y_N)\}$, $i = 1, 2, \dots, N$, contains N training samples, where $X_i = \{x_{ij}\}$, $j = 1, 2, \dots, M$, represents the ith training sample, $Y = \{Y_i\}_{i=1}^{N}$ represents the label vector of the training samples, and M represents the original feature attribute dimension of a training sample;
S2012: numericalizing the feature attribute vector $X = \{X_1, \dots, X_j, \dots, X_M\}$, $j = 1, 2, \dots, M$;
S2013: normalizing the numericalized feature attribute vector X to obtain the normalized feature attribute vector $\bar{X}$, and thereby the preprocessed data set $\bar{D}$ after numericalization and normalization.
5. The method of claim 4, wherein in step S2013 each feature attribute is first standardized and then the feature attribute vector is normalized by the following formula:

$$\bar{x}_{ij} = \frac{x'_{ij} - X'_{j\min}}{X'_{j\max} - X'_{j\min}}$$

wherein

$$x'_{ij} = \frac{x_{ij} - A_j}{S_j}$$

in the formula: $\bar{X}_j$ represents the jth normalized feature attribute vector; $X_j$ represents the jth numericalized feature attribute vector; $x_{ij}$ represents the jth feature attribute of the ith training sample; $A_j$, $S_j$, $X_{j\min}$ and $X_{j\max}$ respectively represent the mean, variance, minimum and maximum of the jth feature attribute $X_j$.
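An illustrative numpy sketch of steps S2012–S2013; whether the standardization (using $A_j$, $S_j$) precedes the min-max scaling is an assumption made here, since the claim lists all four statistics without fixing the order:

```python
import numpy as np

def preprocess(X):
    """Normalize numericalized features: z-score each column, then
    min-max scale the standardized values into [0, 1]."""
    A, S = X.mean(axis=0), X.std(axis=0) + 1e-12    # A_j, S_j per attribute
    Z = (X - A) / S                                 # standardization
    zmin, zmax = Z.min(axis=0), Z.max(axis=0)       # column minima / maxima
    return (Z - zmin) / (zmax - zmin + 1e-12)       # normalized feature matrix
```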
6. The method of claim 3, wherein in step S205 the flag variable $I_t$ controls the sampling as follows: if $I_t = t$, the weight of each training sample in the preprocessed data set $\bar{D}$ is reset to

$$weight(X_i) = \frac{1}{N}, \quad i = 1, 2, \dots, N;$$

otherwise, the weight of each training sample is adjusted according to the output error ${}^te_q$; sampling with replacement is then carried out according to the weight of each training sample in the preprocessed data set.
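A short sketch of this sampling step under the assumption that the data are held as numpy arrays; the reset-to-uniform branch corresponds to the flagged case $I_t = t$, and the function name is hypothetical:

```python
import numpy as np

def resample(X, y, w, I_t, t, rng):
    """Weight reset (if flagged) followed by sampling with replacement."""
    N = len(X)
    if I_t == t:
        w = np.full(N, 1.0 / N)            # reset: weight(X_i) = 1/N
    # otherwise w is assumed already adjusted from the output error (claim 8)
    idx = rng.choice(N, size=N, replace=True, p=w / w.sum())
    return X[idx], y[idx], w
```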
7. The method of claim 3, wherein in step S206 the output error ${}^te_q$ of the AB model is calculated by the following formula:

$${}^te_q = \frac{\sum_{i=1}^{N} weight(X_i)\,\mathbb{1}\{AB(X_i) \neq Y_i\}}{\sum_{i=1}^{N} weight(X_i)}$$

in the formula: ${}^te_q$ represents the output error of the qth AB model in the tth sub-decision group; $weight(X_i)$ represents the weight of training sample $X_i$; $AB(X_i)$ represents the output of the AB model for $X_i$.
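Equivalently, as a one-function numpy sketch (names hypothetical, and the AB model assumed to be a per-sample callable):

```python
import numpy as np

def output_error(ab, X, y, w):
    """Weighted misclassification rate of one AB model."""
    miss = np.array([ab(xi) for xi in X]) != y
    return np.average(miss, weights=w)   # misclassified weight / total weight
```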
8. The method of claim 7, wherein in step S206 the weight of each training sample in the preprocessed data set is adjusted as follows:
1) if ${}^te_q > 0.5$, the corresponding AB model is discarded, and the weight of each training sample is recalculated from the continuous Poisson distribution;
2) if ${}^te_q = 0$, the output weight is set to ${}^t\beta_q = 10^{-10}$, and the weight of each training sample is recalculated from the continuous Poisson distribution;
3) if $0 < {}^te_q < 0.5$, the output weight is set to ${}^t\beta_q = {}^te_q / (1 - {}^te_q)$; at the same time, among the training samples in the AB model data set vector ${}^tD_q$, the weight of each misclassified sample is divided by $2\,{}^te_q$ and the weight of each correctly classified sample is divided by $2(1 - {}^te_q)$, with a minimum weight of $10^{-8}$.
9. The method of claim 8, wherein the continuous Poisson distribution is expressed by the following formula:

$$P = -\log\!\left(\frac{Random(1, 2, \dots, 999)}{1000}\right)$$

in the formula: P represents the probability value, i.e. the weight of the training sample; $Random(1, 2, \dots, 999)$ represents an integer generated uniformly at random from 1 to 999.
10. The method of claim 1, wherein after the fault detection model is trained, its performance is evaluated in terms of the per-class accuracy, the F1 value, the detection rate TPR and the false alarm rate FPR;
1) the accuracy over each class of training samples is calculated by the following formula:

$$Acc_{test} = \frac{N_a}{N}$$

in the formula: $Acc_{test}$ represents the accuracy over the training samples of a class; $N_a$ represents the number of correctly classified samples; N represents the total number of samples;
2) the F1 value is calculated by the following formula:

$$F1 = \frac{2PR}{P + R}$$

in the formula: F1 represents the F1 value; P represents the precision; R represents the recall;
3) the detection rate TPR and the false alarm rate FPR are calculated by the following formulas:

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$

in the formula: TP represents the number of samples that are actually positive and predicted positive by the fault detection model; FN represents the number of samples that are actually positive and predicted negative; FP represents the number of samples that are actually negative and predicted positive; TN represents the number of samples that are actually negative and predicted negative.
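A compact numpy sketch of these metrics for one fault class treated as the positive class (function name hypothetical):

```python
import numpy as np

def evaluate(y_true, y_pred, positive):
    """Accuracy, F1, detection rate (TPR) and false alarm rate (FPR)."""
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    acc = float(np.mean(y_pred == y_true))          # N_a / N
    p = tp / max(tp + fp, 1)                        # precision
    r = tp / max(tp + fn, 1)                        # recall
    f1 = 2 * p * r / max(p + r, 1e-12)              # F1 = 2PR / (P + R)
    tpr = tp / max(tp + fn, 1)                      # detection rate
    fpr = fp / max(fp + tn, 1)                      # false alarm rate
    return acc, f1, tpr, fpr
```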
CN202211153019.0A 2022-09-21 2022-09-21 Network fault detection method based on multiple promotion ensemble learning Pending CN115567367A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211153019.0A CN115567367A (en) 2022-09-21 2022-09-21 Network fault detection method based on multiple promotion ensemble learning

Publications (1)

Publication Number Publication Date
CN115567367A true CN115567367A (en) 2023-01-03

Family

ID=84741436

Country Status (1)

Country Link
CN (1) CN115567367A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821730A (en) * 2023-08-30 2023-09-29 北京科锐特科技有限公司 Fan fault detection method, control device and storage medium
CN116821730B (en) * 2023-08-30 2024-02-06 北京科锐特科技有限公司 Fan fault detection method, control device and storage medium

Similar Documents

Publication Publication Date Title
CN112784881B (en) Network abnormal flow detection method, model and system
CN109408389B (en) Code defect detection method and device based on deep learning
CN112070128B (en) Transformer fault diagnosis method based on deep learning
CN105224872B (en) A kind of user's anomaly detection method based on neural network clustering
CN111353153B (en) GEP-CNN-based power grid malicious data injection detection method
CN110751108B (en) Subway distributed vibration signal similarity determination method
CN109934269B (en) Open set identification method and device for electromagnetic signals
CN110929843A (en) Abnormal electricity consumption behavior identification method based on improved deep self-coding network
CN111046961B (en) Fault classification method based on bidirectional long-time and short-time memory unit and capsule network
CN111353373A (en) Correlation alignment domain adaptive fault diagnosis method
CN109446804B (en) Intrusion detection method based on multi-scale feature connection convolutional neural network
CN110940523A (en) Unsupervised domain adaptive fault diagnosis method
CN113569243A (en) Deep semi-supervised learning network intrusion detection method based on self-supervised variation LSTM
CN114760098A (en) CNN-GRU-based power grid false data injection detection method and device
CN114048468A (en) Intrusion detection method, intrusion detection model training method, device and medium
CN113705396A (en) Motor fault diagnosis method, system and equipment
CN112488142A (en) Radar fault prediction method and device and storage medium
CN115018512A (en) Electricity stealing detection method and device based on Transformer neural network
CN115567367A (en) Network fault detection method based on multiple promotion ensemble learning
CN113988177A (en) Water quality sensor abnormal data detection and fault diagnosis method
CN115438743A (en) Improved serial integration method based on CNN-based classifier
CN114003900A (en) Network intrusion detection method, device and system for secondary system of transformer substation
CN115545111B (en) Network intrusion detection method and system based on clustering self-adaptive mixed sampling
CN116400168A (en) Power grid fault diagnosis method and system based on depth feature clustering
CN116307097A (en) Energy storage system residual electric quantity prediction method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination