CN117811801A

CN117811801A - Model training method, device, equipment and medium

Info

Publication number: CN117811801A
Application number: CN202311845143.8A
Authority: CN
Inventors: 吴爽; 白燕妮; 王硕; 林志祥; 兰晓军
Original assignee: Tianyi Safety Technology Co Ltd
Current assignee: Tianyi Safety Technology Co Ltd
Priority date: 2023-12-28
Filing date: 2023-12-28
Publication date: 2024-04-02

Abstract

The application discloses a model training method, device, equipment and medium, and relates to the technical field of network security. The method comprises: acquiring an original data set and a bidirectional temporal convolutional network (Bi-TCN) model to be trained; performing feature classification on the original data set to obtain a plurality of attack features in the original data set; performing an iterative operation based on a plurality of target attack features, the iterative operation including: performing data dimension expansion on each piece of network traffic data corresponding to the plurality of target attack features to obtain a plurality of reference data, and performing traffic classification on the plurality of reference data to obtain a plurality of extension data; taking the extension data obtained in each iteration as the extension data corresponding to the plurality of attack features to obtain an extended data set; and training the Bi-TCN model to be trained based on the original data set and the extended data set to obtain a trained Bi-TCN model. By performing data dimension expansion on the attack features multiple times, the data imbalance problem is addressed and the accuracy of network traffic detection results is improved.

Description

Model training method, device, equipment and medium

Technical Field

The present application relates to the field of network security technologies, and in particular, to a model training method, device, equipment, and medium.

Background

With the rapid development of information technology, infrastructure of all kinds has become inseparable from the network and is therefore inevitably exposed to attacks carried in network traffic. The security of network traffic plays an important role in the smooth operation of infrastructure systems. Current network traffic detection approaches fall mainly into three categories: rule-based detection, statistics-based detection and machine-learning-based detection.

Rule-based network traffic detection creates rules from prior knowledge of attacks, such as features that an attack is known to exhibit, and uses those rules to detect threat traffic. Statistics-based network traffic detection detects anomalies by establishing the statistical distribution of intrusion patterns.

However, rule-based traffic detection requires different rules to be created manually for different network traffic, and the rules are difficult to update, so detection accuracy and efficiency are low. Statistics-based network traffic detection has a very high computational cost and a very limited capacity for processing large volumes of data. As a result, more and more researchers are applying machine-learning-based network traffic detection to the security protection of infrastructure systems.

At present, attack detection systems for network traffic are mainly built by establishing an attack detection model from a public network traffic data set, but most such data sets suffer from data imbalance. Building an attack detection model from an imbalanced network traffic data set reduces the accuracy of the results when the model is subsequently used to detect network traffic.

Disclosure of Invention

The embodiments of the application provide a model training method, device, equipment and medium, which address the problem that building an attack detection model from a network traffic data set with imbalanced data reduces the accuracy of the results when the model is used to detect network traffic.

In a first aspect, an embodiment of the present application provides a model training method, where the method includes:

acquiring an original data set and a bidirectional temporal convolutional network (Bi-TCN) model to be trained; the original data set includes a plurality of network traffic data;

performing feature classification on the original data set to obtain a plurality of attack features in the original data set;

performing iterative operation based on a plurality of target attack characteristics until the similarity between a plurality of expansion data of the current iteration and a plurality of network flow data in the original data set is greater than a similarity threshold value, and ending the iterative operation; the iterative operation includes: performing data dimension expansion on each network flow data corresponding to the target attack characteristics to obtain a plurality of reference data; performing flow classification on the plurality of reference data to obtain the plurality of extension data; if the iteration is the first iteration, the target attack features are the attack features; if the iteration is not the first iteration, the target attack features are attack features corresponding to the expansion data obtained in the previous iteration;

Taking a plurality of expansion data obtained by each iteration as expansion data corresponding to a plurality of attack characteristics to obtain an expansion data set;

and training the Bi-TCN model to be trained based on the original data set and the extension data set to obtain a trained Bi-TCN model.

In some embodiments, after obtaining an extended data set and before training the Bi-TCN model to be trained based on the original data set and the extended data set, obtaining a trained Bi-TCN model, the method further comprises:

respectively performing quality inspection on each network flow expansion data in the expansion data set to obtain network flow expansion data conforming to a preset quality detection rule;

and taking the network flow expansion data which accords with the preset quality detection rule as the network flow expansion data in the expansion data set.

In some embodiments, the network traffic extension data conforming to the preset quality detection rule includes:

network traffic extension data that does not include illegal values;

network traffic extension data contained in the extended data set when the maximum mean discrepancy between the extended data set and the original data set is smaller than a preset difference threshold;

network traffic extension data including a plurality of preset attack features; the preset attack features are used for performing feature classification on the original data set.
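As a non-limiting illustration, the three quality detection rules above can be sketched in Python as follows. The application does not specify which values count as illegal, which kernel is used for the maximum mean discrepancy, or the value of the threshold; the negative-value rule, the Gaussian kernel, `gamma`, `mmd_threshold` and the `has_attack_features` callback are all assumptions made for this sketch.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors (assumed kernel)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def mmd(xs, ys, gamma=1.0):
    """Biased estimate of the maximum mean discrepancy between two sample sets."""
    def mean_kernel(p, q):
        return sum(rbf(a, b, gamma) for a in p for b in q) / (len(p) * len(q))
    squared = mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors

def has_illegal_values(sample):
    """Assumption: NaN, infinite and negative feature values are treated as illegal."""
    return any(math.isnan(v) or math.isinf(v) or v < 0 for v in sample)

def quality_filter(extended, original, mmd_threshold=0.5,
                   has_attack_features=lambda sample: True):
    """Keep generated samples that pass all three rules; return [] if the set fails the MMD rule."""
    kept = [s for s in extended
            if not has_illegal_values(s) and has_attack_features(s)]
    if not kept or mmd(kept, original) >= mmd_threshold:
        return []
    return kept

original = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15]]
extended = [[0.1, 0.2], [float("nan"), 0.3], [0.2, 0.2]]
print(quality_filter(extended, original))
```

The sample containing NaN is discarded by the first rule, and the remaining samples are kept only because their distribution is close to that of the original data.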

In some embodiments, the classifying the features of the original data set to obtain a plurality of attack features in the original data set includes:

for each network traffic data in the original dataset:

extracting the characteristics of the network traffic data to obtain a plurality of characteristic information corresponding to the network traffic data;

and taking the characteristic information which is the same as any one of the preset attack characteristics as the attack characteristics in the original data set.

In some embodiments, the performing data dimension expansion on each network traffic data corresponding to the multiple target attack features to obtain multiple reference data includes:

for each network traffic data, the following operations are performed:

forming a one-dimensional feature vector from the target attack features corresponding to the network traffic data, and multiplying the one-dimensional feature vector by an identity matrix to obtain a first matrix;

determining the row number and the column number of each parameter in a two-dimensional matrix according to the number of target attack features corresponding to the network flow data;

For each parameter in the two-dimensional matrix: if the row number and the column number of the parameter are the same, the value of the parameter is zero; if the row number and the column number of the parameter are different, determining a column vector corresponding to the row number and a column vector corresponding to the column number based on the first matrix, and determining a value of the parameter based on the column vector corresponding to the row number and the column vector corresponding to the column number;

inputting the two-dimensional matrix into a generator of a CL-WGAN model to generate the plurality of reference data.

In some embodiments, the classifying the traffic of the plurality of reference data to obtain the plurality of extension data includes:

inputting the plurality of reference data into a convolutional bidirectional long short-term memory neural network (CNN-BiLSTM), and performing feature extraction on the plurality of reference data to obtain feature vectors;

extracting forward and reverse features of the feature vectors to obtain sequence feature vectors in two directions, and fusing the sequence feature vectors in the two directions to obtain flow classification results of the plurality of reference data;

taking the reference data whose traffic classification result indicates attack network traffic as the extension data; the attack network traffic is network traffic having characteristic information identical to any one of the preset attack features.

In some embodiments, training the Bi-TCN model to be trained based on the original data set and the extended data set to obtain a trained Bi-TCN model, including:

taking the original data set and the extension data set as training data sets, inputting the Bi-TCN model to be trained, and obtaining classification results of all network flow data in the training data sets;

determining a loss function of the Bi-TCN model to be trained based on the classification result and the labeling data in the training data set; the labeling data in the training data set represents whether each network traffic data in the training data set is attack network traffic or not;

and adjusting the weight parameters and the bias parameters of the Bi-TCN model to be trained based on the loss function of the Bi-TCN model to be trained until the classification result is consistent with the labeling data in the training data set, thereby obtaining the trained Bi-TCN model.
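As a non-limiting illustration, the training loop described above (classify, compute a loss against the labeling data, adjust weight and bias parameters, stop when the classification results agree with the labels) can be sketched as follows. The application does not name the loss function or the optimizer; cross-entropy, plain gradient descent, the learning rate `lr`, the `max_epochs` safeguard and the use of a single linear softmax layer in place of the full Bi-TCN model are assumptions made to keep the sketch short.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, bias, x):
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def predict(weights, bias, x):
    probs = predict_proba(weights, bias, x)
    return probs.index(max(probs))

def train_step(weights, bias, x, label, lr=0.5):
    """One gradient-descent update; returns the cross-entropy loss before the update."""
    probs = predict_proba(weights, bias, x)
    loss = -math.log(probs[label])
    for c, row in enumerate(weights):
        grad = probs[c] - (1.0 if c == label else 0.0)  # d(loss)/d(logit_c)
        for i, v in enumerate(x):
            row[i] -= lr * grad * v
        bias[c] -= lr * grad
    return loss

def fit(weights, bias, data, max_epochs=100):
    """Train until every prediction matches its label or max_epochs is reached."""
    history = []
    for _ in range(max_epochs):
        losses = [train_step(weights, bias, x, y) for x, y in data]
        history.append(sum(losses) / len(losses))
        if all(predict(weights, bias, x) == y for x, y in data):
            break
    return history

# Label 0 = normal traffic, label 1 = attack traffic (toy two-feature samples).
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]
fit(weights, bias, data)
```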

In some embodiments, the Bi-TCN model includes an input layer, a convolution layer, a full connection layer, and a softmax layer, the convolution layer including n residual modules, each residual module including a causal dilated convolution unit; the dilation factor of the causal dilated convolution unit increases exponentially from one residual module to the next; the method further comprises:

inputting the original data set into the convolution layer, performing forward and reverse feature extraction in each residual module according to the dilation factor of its causal dilated convolution unit to obtain sequence feature vectors in two directions, fusing the sequence feature vectors in the two directions, and inputting the result into the next residual module, until the last residual module outputs the feature vector after convolution processing;

inputting the feature vector subjected to convolution processing into the full connection layer, and determining a reference feature vector based on weight parameters and bias parameters in the trained Bi-TCN model;

and inputting the reference feature vector output by the full connection layer into a softmax layer for activation operation to obtain classification results of a plurality of network flow data in the original data set.
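As a non-limiting illustration, the convolution layer described above can be sketched for a single-channel sequence as follows. A causal dilated convolution computes each output from the current position and earlier positions spaced `dilation` steps apart; running it on the sequence and on the reversed sequence gives the features in the two directions. Fusion by element-wise addition, the ReLU activation, the fixed kernel and the base-2 dilation schedule are assumptions for this sketch and are not stated in the application.

```python
def causal_dilated_conv(seq, kernel, dilation):
    """y[t] = sum_k kernel[k] * seq[t - k * dilation], with zero padding on the left."""
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * seq[idx]
        out.append(acc)
    return out

def bidirectional_residual_block(seq, kernel, dilation):
    """Forward and reverse causal dilated convolutions, fused and added back to the input."""
    forward = causal_dilated_conv(seq, kernel, dilation)
    backward = causal_dilated_conv(seq[::-1], kernel, dilation)[::-1]
    fused = [f + b for f, b in zip(forward, backward)]    # assumed fusion: addition
    return [x + max(0.0, h) for x, h in zip(seq, fused)]  # ReLU, then residual connection

def bi_tcn_conv_layer(seq, kernel, n_blocks):
    """Stack n residual blocks whose dilation grows exponentially: 1, 2, 4, ..."""
    for i in range(n_blocks):
        seq = bidirectional_residual_block(seq, kernel, dilation=2 ** i)
    return seq

features = bi_tcn_conv_layer([1.0, 2.0, 3.0, 4.0], kernel=[1.0, 1.0], n_blocks=2)
```

Doubling the dilation in each block lets the receptive field grow exponentially with depth while the number of parameters grows only linearly.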

In a second aspect, embodiments of the present application provide a model training apparatus, the apparatus including:

the first acquisition module is used for acquiring an original data set and a Bi-directional time convolution network Bi-TCN model to be trained; the original dataset includes a plurality of network traffic data;

the classification module is used for classifying the characteristics of the original data set to obtain a plurality of attack characteristics in the original data set;

The expansion module is used for carrying out iterative operation based on a plurality of target attack characteristics until the similarity between a plurality of expansion data of the iteration and a plurality of network flow data in the original data set is greater than a similarity threshold value, and ending the iterative operation; the iterative operation includes: performing data dimension expansion on each network flow data corresponding to the target attack characteristics to obtain a plurality of reference data; performing flow classification on the plurality of reference data to obtain the plurality of extension data; if the iteration is the first iteration, the target attack features are the attack features; if the iteration is not the first iteration, the target attack features are attack features corresponding to the expansion data obtained in the previous iteration;

the extended data set determining module is used for taking a plurality of extended data obtained by each iteration as extended data corresponding to the attack characteristics to obtain an extended data set;

and the training module is used for training the Bi-TCN model to be trained based on the original data set and the extension data set to obtain a trained Bi-TCN model.

In some embodiments, the apparatus further comprises:

The quality detection module is used for respectively carrying out quality inspection on each network flow expansion data in the expansion data set to obtain network flow expansion data conforming to a preset quality detection rule;

network traffic extension data that does not include illegal values;

In some embodiments, the classification module is specifically configured to:

for each network traffic data in the original dataset:

In some embodiments, the expansion module is specifically configured to:

for each network traffic data, the following operations are performed:

In some embodiments, the expansion module is specifically configured to:

In some embodiments, the training module is specifically configured to:

In some embodiments, the Bi-TCN model includes an input layer, a convolution layer, a full connection layer, and a softmax layer, the convolution layer including n residual modules, each residual module including a causal dilated convolution unit; the dilation factor of the causal dilated convolution unit increases exponentially from one residual module to the next;

the device also comprises a detection module for:

In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:

The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the model training method described above.

In a fourth aspect, embodiments of the present application provide a storage medium storing a computer program which, when executed by a processor of an electronic device, enables the electronic device to perform the above-described model training method.

In a fifth aspect, embodiments of the present application provide a computer program product, which when executed by an electronic device, enables the electronic device to implement the above-described model training method provided by the present application.

The technical scheme provided by the embodiment of the application at least brings the following beneficial effects:

in the embodiment of the application, an original data set and a Bi-directional time convolution network Bi-TCN model to be trained are obtained; the original data set comprises a plurality of network traffic data; classifying the characteristics of the original data set to obtain a plurality of attack characteristics in the original data set; performing iterative operation based on the target attack characteristics until the similarity between the expansion data of the iteration and the network flow data in the original data set is greater than a similarity threshold value, and ending the iterative operation; the iterative operation includes: performing data dimension expansion on each network flow data corresponding to a plurality of target attack features to obtain a plurality of reference data; flow classification is carried out on the plurality of reference data to obtain a plurality of extension data; if the iteration is the first iteration, the target attack features are attack features; if the iteration is not the first iteration, the target attack features are attack features corresponding to the expansion data obtained in the previous iteration; the multiple expansion data obtained by each iteration are used as expansion data corresponding to the multiple attack characteristics, and an expansion data set is obtained; training the Bi-TCN model to be trained based on the original data set and the extended data set to obtain a trained Bi-TCN model.

Therefore, the original data set is classified to obtain the attack features corresponding to the attack network traffic, which has a smaller data volume; data dimension expansion is then performed on the attack features multiple times to generate an extended data set, which increases the amount of attack network traffic having those attack features; the Bi-TCN model is then trained on the extended data set together with the original data set. This addresses the data imbalance problem and in turn improves the accuracy of the results when the model is used to detect network traffic.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

fig. 1 is a schematic system structure diagram of a model training method according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of a model training method according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of a CL-WGAN model according to an embodiment of the present application;

FIG. 4 is a schematic flow chart of data dimension expansion according to an embodiment of the present application;

fig. 5 is a flow chart of a flow classification provided in an embodiment of the present application;

fig. 6 is a schematic flow chart of training a Bi-TCN model according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of a Bi-TCN model according to an embodiment of the present application;

fig. 8 is a flow chart of a method for detecting network traffic data according to an embodiment of the present application;

FIG. 9 is a sub-block diagram of a convolutional layer provided in an embodiment of the present application;

fig. 10 is a schematic structural diagram of a model training device according to an embodiment of the present application;

fig. 11 is a schematic hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Wherein the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

Also, in the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. The term "and/or" merely describes an association between associated objects and indicates that three relations may exist; for example, A and/or B may indicate three cases: A exists alone, A and B exist together, or B exists alone. In addition, in the description of the embodiments of the present application, "plural" means two or more.

The terms "first," "second," and the like are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first", "second", or the like may explicitly or implicitly include one or more such features, and in the description of embodiments of the present application, unless otherwise indicated, "a plurality" means two or more.

To facilitate understanding of the present application, some technical terms related to the present application are described below:

1. Data imbalance: the number of samples in the same data set differs greatly between categories, which makes the features of the minority samples difficult to extract. Data imbalance is mainly divided into two types: imbalance in large data distributions and imbalance in small data distributions. Imbalance in a large data distribution means that, in data whose overall scale is large, the samples of a certain class account for a small proportion; imbalance in a small data distribution means that the overall scale of the data is small and the number of samples of a certain class is small.

2. Temporal convolutional network (TCN): a relatively new model that combines convolutional neural networks with time series modeling and is used for classifying time series data.

3. Generative adversarial network (GAN): comprises a generation network and a discrimination network. Inspired by the zero-sum game in game theory, the data generation problem in a GAN can be regarded as an adversarial zero-sum game between the discrimination network and the generation network, and the model is optimized in the course of this game; GANs are widely used in image-related deep learning.
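As a non-limiting illustration of the data imbalance term defined above, the degree of imbalance of a labeled data set can be measured by comparing class counts. The label names and the choice of the majority-to-minority ratio as the measure are assumptions for this sketch.

```python
from collections import Counter

def class_proportions(labels):
    """Fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def imbalance_ratio(labels):
    """Number of majority-class samples per minority-class sample."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

labels = ["normal"] * 95 + ["attack"] * 5
print(class_proportions(labels))
print(imbalance_ratio(labels))
```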

In view of this, the embodiments of the present application provide a model training method, apparatus, device, and medium, which are used to solve the problem that using a network traffic data set with unbalanced data to build an attack detection model may reduce the accuracy of a detection result of detecting network traffic using the model later.

The inventive concept of the embodiments of the present application is as follows: the original data set is classified to obtain the attack features corresponding to the attack network traffic, which has a smaller data volume; data dimension expansion is then performed on the attack features multiple times to generate an extended data set, which increases the amount of attack network traffic having those attack features; the Bi-TCN model is then trained on the extended data set together with the original data set. This addresses the data imbalance problem and in turn improves the accuracy of the results when the model is used to detect network traffic.

In order to construct a real-time, efficient and accurate attack detection model, the embodiments of the application, on the one hand, ensure that the data set contains sufficient attack traffic features and, on the other hand, require that the model be able to learn the attack features of the traffic for classification. Accordingly, on the data side, to address the drop in abnormal traffic detection accuracy caused by the imbalanced proportion of traffic data in public data sets, abnormal traffic data with a small data volume is supplemented by sample generation; the generated traffic passes quality detection at a high rate, the data imbalance problem is addressed, and a balanced data set is obtained. On the model side, the existing temporal convolutional network is improved into a bidirectional temporal convolutional network model, which better captures traffic feature information over longer time spans while keeping the space complexity of the model low.

Fig. 1 is a schematic system structure diagram of a model training method according to an embodiment of the present application. First, data imbalance processing is performed on the original data set using a CL-WGAN model; the balanced data then undergoes preprocessing such as data integration, data cleaning and data normalization; finally, model training and model testing are performed on the Bi-TCN deep learning model, which carries out network traffic detection and outputs a classification result, namely the attack network traffic and the normal network traffic in the original data set.

In order to further explain the technical solutions provided in the embodiments of the present application, the following details are described with reference to the accompanying drawings and the detailed description. Although the embodiments of the present application provide the method operational steps as shown in the following embodiments or figures, more or fewer operational steps may be included in the method based on routine or non-inventive labor. In steps where there is logically no necessary causal relationship, the execution order of the steps is not limited to the execution order provided by the embodiments of the present application.

Referring to fig. 2, a flow chart of a model training method according to an embodiment of the present application is shown. The method comprises the steps as shown in fig. 2:

In step 201, an original data set and a bidirectional temporal convolutional network (Bi-TCN) model to be trained are obtained; the original data set includes a plurality of network traffic data.

After the original data set is acquired, data preprocessing such as data cleaning and normalization is performed on the original data in the original data set.
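As a non-limiting illustration, data cleaning and normalization of numeric traffic features can be sketched as follows. The application does not state which cleaning rule or normalization scheme is used; dropping rows that contain NaN or infinite values and per-column min-max scaling to the range [0, 1] are assumptions for this sketch.

```python
import math

def clean(rows):
    """Drop rows containing NaN or infinite values (one possible cleaning rule)."""
    return [r for r in rows if all(math.isfinite(v) for v in r)]

def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column is mapped to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

raw = [[1.0, 10.0], [3.0, 10.0], [float("nan"), 5.0], [2.0, 10.0]]
prepared = min_max_normalize(clean(raw))
print(prepared)
```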

In step 202, the original data set is feature classified to obtain a plurality of attack features in the original data set.

In some embodiments, feature classification of the original dataset to obtain a plurality of attack features in the original dataset may be performed as: for each network traffic data in the original dataset:

extracting the characteristics of the network traffic data to obtain a plurality of characteristic information corresponding to the network traffic data; and taking the characteristic information which is the same as any one of the preset attack characteristics as the attack characteristics in the original data set.

The preset attack features can be set according to practical experience or actual needs, and are not limited in the present application.

In a specific implementation, extension data is generated only for attack features, so the original data set needs to be divided into attack network traffic and normal network traffic. To this end, a plurality of attack features are preset and stored in the server; the features of the network traffic data in the original data set are then extracted and compared with the preset attack features, so that the features of all data in the original data set are divided into attack features and non-attack features: a feature identical to any one of the preset attack features is an attack feature, and any other feature is a non-attack feature.
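As a non-limiting illustration, comparing extracted features with the preset attack features can be sketched as a set membership test. The feature representation as (name, value) pairs and the example entries in `PRESET_ATTACK_FEATURES` are hypothetical and are not taken from the application.

```python
# Hypothetical preset attack features, stored as (feature name, feature value) pairs.
PRESET_ATTACK_FEATURES = {
    ("tcp_flags", "SYN_ONLY_BURST"),
    ("payload_pattern", "SQL_INJECTION"),
}

def split_features(features, preset=PRESET_ATTACK_FEATURES):
    """Divide the extracted features into attack features and non-attack features."""
    attack = [f for f in features if f in preset]
    non_attack = [f for f in features if f not in preset]
    return attack, non_attack

def is_attack_traffic(features, preset=PRESET_ATTACK_FEATURES):
    """Traffic counts as attack traffic if any feature equals a preset attack feature."""
    return any(f in preset for f in features)

flow = [("protocol", "TCP"), ("payload_pattern", "SQL_INJECTION")]
attack, non_attack = split_features(flow)
```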

In step 203, performing iterative operation based on the multiple target attack features until the similarity between the multiple pieces of expanded data of the current iteration and the multiple pieces of network traffic data in the original data set is greater than a similarity threshold value, and ending the iterative operation; the iterative operation includes: performing data dimension expansion on each network flow data corresponding to a plurality of target attack features to obtain a plurality of reference data; and carrying out flow classification on the plurality of reference data to obtain a plurality of extension data.

If the iteration is the first iteration, the target attack features are attack features; if the iteration is not the first iteration, the target attack features are attack features corresponding to the expansion data obtained in the previous iteration.
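As a non-limiting illustration, the control flow of the iterative operation can be sketched as follows. The callbacks `expand` (data dimension expansion and generation), `classify` (traffic classification) and `similarity` (similarity to the original data set) stand in for components the application describes elsewhere; their signatures and the `max_iters` safeguard are assumptions for this sketch.

```python
def build_extended_dataset(attack_samples, expand, classify, similarity,
                           threshold, max_iters=10):
    """Accumulate the extension data of every iteration until it is similar enough."""
    extended = []
    targets = list(attack_samples)       # first iteration: the original attack data
    for _ in range(max_iters):
        reference = [r for sample in targets for r in expand(sample)]
        new_data = [r for r in reference if classify(r)]  # keep attack traffic only
        extended.extend(new_data)
        if not new_data or similarity(new_data) > threshold:
            break                        # similarity threshold exceeded: stop iterating
        targets = new_data               # next iteration starts from this extension data
    return extended

# Toy stand-ins: "generation" halves a value, everything positive counts as attack,
# and similarity grows as the generated values shrink.
result = build_extended_dataset(
    attack_samples=[8.0],
    expand=lambda s: [s / 2],
    classify=lambda r: r > 0,
    similarity=lambda data: 1.0 / (1.0 + max(data)),
    threshold=0.4,
)
print(result)
```

In this toy run the similarity values are 0.2, about 0.33 and 0.5, so the loop stops after the third iteration and the extension data of all three iterations is retained.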

In specific implementation, as shown in fig. 3, when the CL-WGAN model is used to perform data unbalance processing on the original data set, the original data set is input into the CL-WGAN model, and the data dimension expansion unit, the generator, the classification control unit and the discriminator in the model are used to finally generate an expanded data set.

In some embodiments, when performing data dimension expansion on each network traffic data corresponding to a plurality of target attack features to obtain a plurality of reference data, the operations shown in fig. 4 are performed for each network traffic data respectively:

In step 401, a one-dimensional feature vector is formed from the target attack features corresponding to the network traffic data, and the one-dimensional feature vector is multiplied by an identity matrix to obtain a first matrix;

in step 402, determining a row number and a column number of each parameter in the two-dimensional matrix according to the number of target attack features corresponding to the network traffic data;

in step 403, for each parameter in the two-dimensional matrix: if the row number and the column number of the parameter are the same, the value of the parameter is zero; if the row number and the column number of the parameter are different, determining a column vector corresponding to the row number and a column vector corresponding to the column number based on the first matrix, and determining a value of the parameter based on the column vector corresponding to the row number and the column vector corresponding to the column number;

in step 404, the two-dimensional matrix is input into a generator of the CL-WGAN model, and a plurality of reference data are generated.

In specific implementation, the data dimension expansion operation is performed on the network traffic corresponding to the attack characteristics in the original data set, and the specific steps are as follows:

the target attack characteristics corresponding to the network traffic data are formed into a one-dimensional characteristic vector, and one network traffic corresponding to the attack characteristics is assumed to be expressed as X= [ X ] ₁ ,…,x _t ,…,x _n ]Wherein x is ₁ Representing certain characteristic information. The network traffic X may be converted into a two-dimensional matrix X' by way of data dimension expansion for each network traffic.

Specifically, the one-dimensional eigenvector is multiplied by the identity matrix to obtain a first matrix, as shown in formula (1):

then for i e n]Obtaining a matrix F corresponding to each characteristic information i according to the formula (1) and the formula (2) _i ：

Next, for each parameter in the two-dimensional matrix, a row number j and a column number k, j, k ε [ n ]]Defining the value x of each parameter in the two-dimensional matrix according to equation (1) _jk ：

where F_j is the matrix F_j corresponding to characteristic information j in formula (2), that is, the column vector of the j-th column of the first matrix XI_m;

Finally, after the value of each parameter in the two-dimensional matrix has been calculated according to formula (3), the two-dimensional matrix X' is obtained:

Thus, the characteristic information x_1 in the network traffic X is converted into X_1' = [x_{11}, …, x_{1t}, …, x_{1n}], and by analogy each remaining item of characteristic information x_i is converted into the corresponding X_i' = [x_{i1}, …, x_{it}, …, x_{in}]. The original one-dimensional vector is thereby converted into a two-dimensional matrix that contains all characteristic information of the original network traffic X, which gives the data after data dimension expansion. The dimension-expanded two-dimensional matrix is input into the generator to obtain generated data, and the generated data is converted back into vector form through the inverse dimension-conversion operation; that is, a plurality of reference flows (reference data) corresponding to the network traffic are obtained through the inverse operation of formulas (1), (2) and (3).
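As an illustration only, steps 402 to 403 may be sketched in Python as follows. Formulas (1) to (3) are not reproduced in this text, so two assumptions are made here and should be read as placeholders: the first matrix is taken to be the feature vector placed on the diagonal of an identity-shaped matrix, and the value of each off-diagonal parameter is taken to be the Euclidean distance between the two column vectors. The names expand_to_matrix and pair_fn are hypothetical; any other pairwise function can be passed as pair_fn.

```python
import math


def expand_to_matrix(x, pair_fn=None):
    """Expand a 1-D feature vector of length n into an n x n matrix.

    Diagonal entries are zero; entry (j, k) with j != k is computed from
    column j and column k of the first matrix.
    """
    n = len(x)
    # Assumed first matrix: x[j] on the diagonal, zeros elsewhere,
    # so that column j carries the characteristic information x[j].
    first = [[x[j] if i == j else 0.0 for j in range(n)] for i in range(n)]
    columns = [[first[i][j] for i in range(n)] for j in range(n)]

    if pair_fn is None:
        # Hypothetical stand-in for formula (3): Euclidean distance.
        def pair_fn(a, b):
            return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    return [
        [0.0 if j == k else pair_fn(columns[j], columns[k]) for k in range(n)]
        for j in range(n)
    ]


matrix = expand_to_matrix([1.0, 2.0, 3.0])
```

With the assumed distance function the resulting matrix is symmetric with a zero diagonal, which matches the rule in step 403 that a parameter whose row number equals its column number is zero.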

In some embodiments, traffic classification is performed on the plurality of reference data to obtain a plurality of extension data, which may be performed as:

inputting the plurality of reference data into a convolutional bidirectional long short-term memory neural network (CNN-BiLSTM), and extracting features of the plurality of reference data to obtain feature vectors;

extracting forward and reverse features of the feature vectors to obtain sequence feature vectors in two directions, and fusing the sequence feature vectors in the two directions to obtain flow classification results of a plurality of reference data;

taking, as extension data, the reference data whose traffic classification result indicates attack network traffic; the attack network traffic is network traffic having the same characteristic information as any one of the preset attack characteristics.

In the implementation, as shown in fig. 5, the plurality of reference flows (reference data) corresponding to the network traffic X are input into a pre-trained CNN-BiLSTM classification control unit for a sample classification check, which yields a traffic classification result. The traffic classification result comprises flows that can be correctly classified and flows that cannot be correctly classified, namely attack network traffic and normal network traffic. The reference data that pass the check, that is, the traffic data that can be correctly classified, are input into the discriminator; the reference data that do not pass the check, that is, the traffic data that cannot be correctly classified, are discarded directly. The result of the discriminator is taken as the expansion data of the current iteration and fed back to the generator, and the data dimension expansion and traffic classification operations are repeated until a high-quality extended data set is finally generated.
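The classification check described above may be sketched, purely as an illustration, as a filter over the generated reference data. The function select_extension_data and the toy classifier are hypothetical; the toy classifier merely stands in for the pre-trained CNN-BiLSTM classification control unit, and the discriminator stage is omitted.

```python
def select_extension_data(reference_data, classify):
    """Keep only the reference samples that the classifier labels as attack traffic."""
    kept = []
    for sample in reference_data:
        if classify(sample) == "attack":
            kept.append(sample)  # passes the check and is kept for this iteration
        # samples labelled otherwise are discarded
    return kept


def toy_classifier(sample):
    # Hypothetical stand-in for the pre-trained CNN-BiLSTM unit.
    return "attack" if sum(sample) > 1.0 else "normal"


reference = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.6]]
extension = select_extension_data(reference, toy_classifier)
```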

The CNN-BiLSTM classification control unit mainly comprises an input layer, a CNN (Convolutional Neural Network) layer, a BiLSTM (Bi-directional Long Short-Term Memory network) layer, a connection layer and an output layer. The training data set first enters the input layer; the CNN layer then extracts the relations among the features and generates a feature vector; the feature vector then enters the BiLSTM layer, which learns the patterns among the features; finally, a classification result is obtained through the output layer.

The description of each layer structure of the CNN-BiLSTM classification control unit is as follows:

Input layer: reads the preprocessed reference data into the model. For example, for reference data with a batch length of n, the input vector may be represented as X = [x_1, …, x_t, …, x_n].

CNN layer: extracts features from the reference data of the input layer to obtain the relations among the features, constructs the important feature vectors, and provides them to the BiLSTM layer. The CNN layer can be further divided into a convolution layer, a pooling layer and a full connection layer. The convolution layer and the pooling layer mainly extract the features of the input vector, and the full connection layer then outputs the feature vector.

BiLSTM layer: performs forward and reverse feature extraction on the feature vectors extracted by the CNN layer to obtain sequence feature vectors in two directions, learns the relations between the time-series features, and fuses the sequence feature vectors in the two directions according to those relations to obtain a new feature vector. Here, a forward LSTM network and a reverse LSTM network are used, so that two interconnected LSTMs operate on the input sequence and the data passes through the recurrent neural network in both the forward and the reverse direction, which better captures the context-related characteristic information of the traffic data.

Output layer: the new feature vector obtained by the BiLSTM layer passes through the output layer to obtain the final traffic classification result.

In step 204, the plurality of expansion data obtained in each iteration is used as expansion data corresponding to the plurality of attack features, and an expansion data set is obtained.

The condition for ending the iterative operation in the present application is as follows: the iterative operation ends when the similarity between the plurality of expansion data of the current iteration and the plurality of network traffic data in the original data set is greater than a similarity threshold. Here, the similarity refers to the proportion, among the plurality of expansion data of the current iteration, of expansion data that are identical to network traffic data in the original data set. For example, if 10 pieces of expansion data are generated in the current iteration and 9 of them are identical to network traffic data in the original data set, the similarity is 90%, which is greater than the similarity threshold of 70%, so the iterative operation ends.
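The stopping condition can be illustrated with a short Python sketch that reproduces the 9-out-of-10 example above. The function names and the representation of each piece of traffic data as a tuple are assumptions made for illustration.

```python
def similarity(extension_data, original_data):
    """Share of the expansion data that is identical to some original traffic record."""
    if not extension_data:
        return 0.0
    original = {tuple(record) for record in original_data}
    same = sum(1 for record in extension_data if tuple(record) in original)
    return same / len(extension_data)


def should_stop(extension_data, original_data, threshold=0.7):
    """The iteration ends once the similarity exceeds the similarity threshold."""
    return similarity(extension_data, original_data) > threshold


original = [(i, i + 1) for i in range(20)]
# 9 generated records coincide with original records, 1 does not.
extension = original[:9] + [(99, 100)]
stop = should_stop(extension, original, threshold=0.7)
```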

A plurality of expansion data is obtained in each iteration; all of these expansion data are then used as the expansion data corresponding to the plurality of attack features to generate the extended data set.

In some embodiments, after obtaining the extended data set and before training the Bi-TCN model to be trained based on the original data set and the extended data set to obtain the trained Bi-TCN model, the model training method provided in the present application may further be executed as follows:

Respectively performing quality inspection on each network flow expansion data in the expansion data set to obtain network flow expansion data conforming to a preset quality detection rule; and taking the network flow expansion data which accords with the preset quality detection rule as the network flow expansion data in the expansion data set.

Wherein, network flow expansion data which accords with a preset quality detection rule comprises:

network traffic extension data that does not include illegal values;

network traffic extension data included in the extended data set when the maximum mean difference value between the extended data set and the original data set is smaller than a preset difference threshold;

In the implementation, after the extended data set is obtained, quality inspection is carried out on the network traffic extension data in the extended data set at both the overall and the local level. Overall, the maximum mean difference value (Maximum Mean Discrepancy, MMD) between the extended data set and the original data set is calculated; locally, each piece of network traffic extension data is checked for illegal values and for attack characteristics; wherein:

Illegal value check: check whether each piece of network traffic extension data in the extended data set contains illegal values. The illegal values include at least one of the following: character data that does not exist in the original data set, multiple decimal points, and null values. For example, it can be checked whether character-type data appears in the extended data set although the original data set contains none, whether a value contains more than one decimal point, and whether null values are present. Network traffic extension data that does not include illegal values is taken as network traffic extension data conforming to the preset quality detection rule.

Maximum mean difference value check: check whether the original data set and the extended data set originate from the same distribution. That is, assuming D_s = (x_1, x_2, …, x_n) ~ P(x) and D_t = (y_1, y_2, …, y_n) ~ Q(y), the MMD value between D_s and D_t is calculated. The smaller the MMD value, the more similar the two data sets and the better the model performance. When the maximum mean difference value between the extended data set and the original data set is smaller than the preset difference threshold, the network traffic extension data included in the extended data set is taken as network traffic extension data conforming to the preset quality detection rule.

The preset difference threshold value can be set according to experience, or can be set according to actual needs, and the preset difference threshold value is not limited in this application.
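For illustration, one common way to estimate the MMD between two sample sets is the biased estimate of the squared MMD, mean k(x, x') + mean k(y, y') − 2 · mean k(x, y), with a kernel function k. The application does not specify the kernel or the estimator, so the Gaussian kernel, the bandwidth sigma and the function names below are assumptions.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two equally long numeric vectors."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets xs and ys."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


original = [[0.0, 0.0], [1.0, 1.0]]
near = [[0.1, 0.0], [1.0, 0.9]]   # close to the original samples
far = [[5.0, 5.0], [6.0, 6.0]]    # far from the original samples
mmd_near = mmd_squared(original, near)
mmd_far = mmd_squared(original, far)
```

An extended data set would pass this check when the computed value is below the preset difference threshold.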

Attack feature check: check whether the network traffic extension data in the extended data set contains the plurality of preset attack characteristics. Network traffic extension data that includes the plurality of preset attack characteristics is taken as network traffic extension data conforming to the preset quality detection rule. The preset attack characteristics are those used for classifying the features of the original data set. The attack feature check is based on the principle of the attack. Taking the Reconnaissance attack in the Modbus protocol as an example, the attack features are command_address (command address) and response_address (response address); that is, the generated network traffic extension data for this attack should include command_address and response_address, whereas data for non-attack traffic should not include these features.

Finally, the network traffic extension data in the extended data set that conforms to the preset quality detection rule is retained as the network traffic extension data in the extended data set, and the network traffic extension data that does not conform to the preset quality detection rule is deleted from the extended data set, so that it is not used in the training process of the Bi-TCN model.
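The two record-level checks may be sketched as follows. This is an illustration under stated assumptions: each record is represented as a dictionary of field names to values, the original data set is assumed to be purely numeric (so any non-numeric text counts as illegal character data), and all function names are hypothetical. Only the feature names command_address and response_address come from the example above.

```python
def has_illegal_value(record):
    """Return True if a field is null, non-numeric text, or has several decimal points."""
    for value in record.values():
        if value is None or value == "":
            return True                      # null value
        if isinstance(value, float) and value != value:
            return True                      # NaN is treated as a null value
        if isinstance(value, (int, float)):
            continue
        text = str(value)
        if text.count(".") > 1:
            return True                      # more than one decimal point
        try:
            float(text)
        except ValueError:
            return True                      # character data (numeric data assumed)
    return False


def has_attack_features(record, required_features):
    """Return True if every required attack feature is present and not null."""
    return all(record.get(name) is not None for name in required_features)


def filter_extension_data(records, required_features):
    """Keep the records that pass both record-level checks."""
    return [
        r for r in records
        if not has_illegal_value(r) and has_attack_features(r, required_features)
    ]


RECONNAISSANCE_FEATURES = ("command_address", "response_address")
records = [
    {"command_address": 4, "response_address": 4, "length": "16"},
    {"command_address": 4, "response_address": None, "length": "16"},
    {"command_address": 4, "response_address": 4, "length": "1.6.0"},
    {"command_address": 4, "length": 16},
]
kept = filter_extension_data(records, RECONNAISSANCE_FEATURES)
```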

In step 205, training the Bi-TCN model to be trained based on the original data set and the extended data set to obtain a trained Bi-TCN model.

In some embodiments, training the Bi-TCN model to be trained based on the original data set and the extended data set to obtain a trained Bi-TCN model may be performed as:

taking the original data set and the extension data set as training data sets, inputting a Bi-TCN model to be trained, and obtaining classification results of all network flow data in the training data sets;

determining a loss function of the Bi-TCN model to be trained based on the classification result and the labeling data in the training data set; the labeling data in the training data set represents whether each network flow data in the training data set is attack network flow or not;

As shown in fig. 6, the extended data set is mixed with the original data set to obtain a new data set. Data preprocessing, including data integration, data cleaning and data normalization, is then performed on the new data set, and the new data set is divided into a training data set and a test data set. For example, the new data set is divided into 10 parts, of which 9 parts are used as the training data set and 1 part as the test data set, and 10-fold cross-validation is performed.
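The 9:1 division with 10-fold cross-validation can be illustrated by the following sketch, in which each of the 10 parts serves once as the test data set. The function name k_fold_splits and the fixed random seed are assumptions; the samples here are plain integers standing in for traffic records.

```python
import random


def k_fold_splits(samples, k=10, seed=0):
    """Yield (training set, test set) pairs; each fold is the test set exactly once."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_indices = folds[i]
        train_indices = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield [samples[j] for j in train_indices], [samples[j] for j in test_indices]


splits = list(k_fold_splits(list(range(100)), k=10))
```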

Then, the training data set is fed into the Bi-TCN model for training to obtain a classification result. A loss function of the Bi-TCN model to be trained is determined based on the classification result and the labeling data in the training data set, and whether the model has converged is judged according to the loss function. The optimizer iteratively updates the model according to the loss function until the model converges, yielding the trained Bi-TCN model. The test data set is then input into the trained Bi-TCN model, 10-fold cross-validation is performed on the trained Bi-TCN model, and a test result is output.

In some embodiments, as shown in fig. 7, the trained Bi-TCN model includes an input layer, a convolution layer, a full connection layer and a softmax layer. The convolution layer includes n residual modules (Residual blocks), each of which includes two causal hole convolution units (Dilated Causal Conv), two weight normalization units (WeightNorm), two activation units (ReLU) and two regularization units (Dropout); the expansion coefficients of the causal hole convolution units increase exponentially from one residual module to the next.

Therefore, after training the Bi-TCN model, the present application inputs the original data set into the trained Bi-TCN model to obtain classification results of a plurality of network traffic data in the original data set, that is, attack network traffic and normal network traffic in the original data set, which may be specifically executed as steps shown in fig. 8:

In step 801, inputting the original data set into the convolution layer, performing forward and reverse feature extraction in each residual module according to the expansion coefficient of the causal hole convolution unit to obtain sequence feature vectors in two directions, fusing the sequence feature vectors in the two directions, and inputting the result into the next residual module, until the last residual module outputs the feature vector after convolution processing;

in step 802, inputting the feature vector after convolution processing into a full connection layer, and determining a reference feature vector based on weight parameters and bias parameters in the trained Bi-TCN model;

in step 803, the reference feature vector output by the full connection layer is input into the softmax layer for activation operation, so as to obtain classification results of a plurality of network traffic data in the original data set.

Before the original data set is input into the convolution layer, the original data set is processed according to dimension information of the original data set, the size of a preset sliding window and the step length of the preset sliding window to obtain a processed original data set, and then the processed original data set is input into the convolution layer.

The size of the preset sliding window and the step length of the preset sliding window can be set according to experience or according to actual needs, which is not limited in this application.

Specifically, the time-series input vector of the original data set may be expressed as X = [x_1, x_2, …, x_{t-1}, x_t], where t is the batch length of the original data set. The input x_i at time i can be expressed as x_i = [r_1, r_2, …, r_{k-1}, r_k], where k is the dimension of the data at that time. The input vector X is cut using a sliding window: if the size of the sliding window is m, sliding cuts are made along the time sequence, and each cut yields a two-dimensional matrix of size m × k. When the step size of the sliding window is s, a three-dimensional matrix of size p × m × k is finally obtained, where p = t − s is the number of times the window slides. The original two-dimensional matrix is thus converted into a three-dimensional matrix, which covers all sequence information while preserving the temporal characteristics.
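The sliding-window cut can be sketched as follows. Note one assumption: this sketch only keeps complete windows, so the number of windows it produces is floor((t − m) / s) + 1, the usual count for a window of size m and step s, which may differ from the expression for p given above. The function name sliding_windows is hypothetical.

```python
def sliding_windows(sequence, m, s):
    """Cut a t x k sequence into windows of m consecutive time steps, moving s steps each time."""
    t = len(sequence)
    return [sequence[start:start + m] for start in range(0, t - m + 1, s)]


# t = 6 time steps, k = 2 features per time step
X = [[float(i), float(i) * 2.0] for i in range(6)]
windows = sliding_windows(X, m=3, s=1)  # nested list of shape p x m x k
```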

The three-dimensional matrix is then input into the convolutional layer for training. The convolutional layer consists of n temporal blocks connected in series; each temporal block is a residual module, which prevents performance from degrading during model training. As the depth of the network increases, the expansion coefficient d of the causal hole convolution increases exponentially (d = 2^{n-1}). A larger expansion coefficient d enlarges the receptive field, so that the model can learn from information further in the past while the required network depth is reduced.
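The effect of the exponentially growing expansion coefficient on the receptive field can be illustrated numerically. The sketch assumes that the n-th residual module uses d = 2^(n−1), that each module contains two causal hole convolution units, and that the convolution kernel size is 3; the kernel size is not given in the text and is purely an assumption.

```python
def dilations(num_blocks):
    """Expansion coefficient of each residual module: 1, 2, 4, ..."""
    return [2 ** (n - 1) for n in range(1, num_blocks + 1)]


def receptive_field(num_blocks, kernel_size=3, convs_per_block=2):
    """Number of input time steps that can influence one output time step.

    Each convolution with expansion coefficient d widens the receptive
    field by (kernel_size - 1) * d time steps.
    """
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations(num_blocks))


coefficients = dilations(4)   # [1, 2, 4, 8]
field = receptive_field(4)    # 1 + 2 * 2 * 15 = 61
```

Under these assumptions, four residual modules already cover 61 time steps, whereas the same four modules with a constant expansion coefficient of 1 would cover only 17.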

Fig. 9 shows a sub-structure diagram of the convolutional layer. The input value of each module is the linearly superposed output value obtained through the skip mechanism in the residual connection of the previous module. This process is repeated to complete the convolution, and a two-dimensional matrix of size m × k is obtained as the final output. In each residual module, forward and reverse feature extraction is performed according to the expansion coefficient of the causal hole convolution unit to obtain sequence feature vectors in two directions; the sequence feature vectors in the two directions are fused and input into the next residual module, until the last residual module outputs the feature vector after convolution processing. For example, if the expansion coefficient is 1, one neuron out of every two (the gray neurons in the figure) is selected, the characteristic information of the three adjacent neurons corresponding to the input layer is acquired from both the forward and the reverse direction, and the sequence feature vectors in the two directions are fused and input into the neurons of the next hidden layer. The other hidden layers obtain their sequence feature vectors in the same way, and the result is finally output at the output layer.
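The forward and reverse feature extraction with fusion can be illustrated on a one-dimensional sequence. This is a simplified sketch and not the structure of fig. 9: it uses a single channel, zero padding, no weight normalization, activation or residual connection, and it assumes that fusion is an element-wise sum, which the text does not specify. The function names are hypothetical.

```python
def causal_dilated_conv(x, weights, dilation):
    """Causal dilated 1-D convolution: output t depends only on x[t], x[t-d], x[t-2d], ..."""
    out = []
    for t in range(len(x)):
        total = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:          # positions before the start act as zero padding
                total += w * x[idx]
        out.append(total)
    return out


def bidirectional_conv(x, weights, dilation):
    """Extract features in the forward and the reverse direction, then fuse them."""
    forward = causal_dilated_conv(x, weights, dilation)
    # The reverse direction applies the same causal filter to the reversed sequence.
    backward = causal_dilated_conv(x[::-1], weights, dilation)[::-1]
    return [f + b for f, b in zip(forward, backward)]  # assumed fusion: element-wise sum


fused_d1 = bidirectional_conv([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=1)
fused_d2 = bidirectional_conv([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2)
```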

Then, the feature vector after convolution processing, namely the two-dimensional matrix of size m × k, undergoes a linear dimension-reduction operation in the full connection layer: it is multiplied by the weight parameter and the bias parameter is added, giving a one-dimensional reference feature vector. Finally, the one-dimensional reference feature vector undergoes an activation operation in the softmax layer, which maps the plurality of outputs into the (0, 1) interval to obtain the probability distribution of each network traffic data in the original data set.

The probability distribution represents the probability that the network flow data is normal network flow, if the probability is larger than a preset probability threshold value, the network flow data is determined to be normal network flow, otherwise, the network flow data is attack network flow, and a classification result is obtained.

The preset probability threshold value can be set according to experience, or can be set according to actual needs, and the method is not limited in this application.
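The softmax activation and the threshold decision can be sketched as follows. The two-class output order (index 0 for normal traffic, index 1 for attack traffic), the default threshold of 0.5 and the function names are assumptions made for illustration.

```python
import math


def softmax(values):
    """Map raw outputs into the (0, 1) interval so that they sum to 1."""
    peak = max(values)                       # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify_flow(reference_vector, threshold=0.5):
    """Return "normal" if the probability of normal traffic exceeds the threshold."""
    p_normal = softmax(reference_vector)[0]  # assumed: index 0 is the normal class
    return "normal" if p_normal > threshold else "attack"


label_a = classify_flow([2.0, 0.0])
label_b = classify_flow([0.0, 2.0])
```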

The model training method provided by the embodiments of the present application can be used to expand attack traffic samples: near-real, high-quality attack traffic samples are constructed from a small sample of real attack traffic data, which balances the distribution ratio of normal traffic to attack traffic and enriches the attack characteristics of the attack traffic. This facilitates building and optimizing subsequent attack detection models, and the samples can also be used by security researchers for experiments, thereby reducing network security risk.

The model trained by the model training method can also be used to detect attack events and block attacks: the Bi-TCN model trained in the present application detects attack traffic efficiently; the model has low complexity and consumes few resources on the machine where it is deployed, so detection results can be obtained accurately and in real time and attacks can be blocked in real time.

The model training method provided by the embodiments of the present application can be used to enrich attack cases for network target ranges (cyber ranges): the method for generating the extended data set in the model training method can generate near-real attack traffic for building network target ranges in different environments, and provides security personnel of each system with an environment for simulating real attack scenarios, so that the generated traffic is put to effective use in the network target ranges and is highly usable and practical.

Based on the foregoing description, in the embodiment of the present application, an original data set and a Bi-directional time convolution network Bi-TCN model to be trained are obtained; the original data set comprises a plurality of network traffic data; classifying the characteristics of the original data set to obtain a plurality of attack characteristics in the original data set; performing iterative operation based on the target attack characteristics until the similarity between the expansion data of the iteration and the network flow data in the original data set is greater than a similarity threshold value, and ending the iterative operation; the iterative operation includes: performing data dimension expansion on each network flow data corresponding to a plurality of target attack features to obtain a plurality of reference data; flow classification is carried out on the plurality of reference data to obtain a plurality of extension data; if the iteration is the first iteration, the target attack features are attack features; if the iteration is not the first iteration, the target attack features are attack features corresponding to the expansion data obtained in the previous iteration; the multiple expansion data obtained by each iteration are used as expansion data corresponding to the multiple attack characteristics, and an expansion data set is obtained; training the Bi-TCN model to be trained based on the original data set and the extended data set to obtain a trained Bi-TCN model.

Based on the same technical conception, the embodiment of the application also provides a model training device, and the principle of solving the problem of the model training device is similar to that of the model training method, so that the implementation of the model training device can be referred to the implementation of the model training method, and the repetition is omitted.

Fig. 10 is a schematic structural diagram of a model training apparatus provided in an embodiment of the present application, where the apparatus includes a first obtaining module 1001, a classifying module 1002, an expanding module 1003, an expanding data set determining module 1004, and a training module 1005; wherein:

a first obtaining module 1001, configured to obtain an original data set and a Bi-directional time convolution network Bi-TCN model to be trained; the original dataset includes a plurality of network traffic data;

The classification module 1002 is configured to perform feature classification on the original data set to obtain a plurality of attack features in the original data set;

the expansion module 1003 is configured to perform an iterative operation based on a plurality of target attack features, until a similarity between a plurality of expansion data of the current iteration and a plurality of network traffic data in the original data set is greater than a similarity threshold, and end the iterative operation; the iterative operation includes: performing data dimension expansion on each network flow data corresponding to the target attack characteristics to obtain a plurality of reference data; performing flow classification on the plurality of reference data to obtain the plurality of extension data; if the iteration is the first iteration, the target attack features are the attack features; if the iteration is not the first iteration, the target attack features are attack features corresponding to the expansion data obtained in the previous iteration;

an extended data set determining module 1004, configured to obtain an extended data set by using a plurality of extended data obtained by each iteration as extended data corresponding to the plurality of attack features;

and a training module 1005, configured to train the Bi-TCN model to be trained based on the original data set and the extended data set, to obtain a trained Bi-TCN model.

In some embodiments, the apparatus further comprises:

network traffic extension data that does not include illegal values;

In some embodiments, the classification module 1002 is specifically configured to:

for each network traffic data in the original dataset:

In some embodiments, the expansion module 1003 is specifically configured to:

for each network traffic data, the following operations are performed:

In some embodiments, the expansion module 1003 is specifically configured to:

In some embodiments, the training module 1005 is specifically configured to:

the device also comprises a detection module for:

In this embodiment of the present application, the division of the modules is schematically only one logic function division, and there may be another division manner in actual implementation, and in addition, each functional module in each embodiment of the present application may be integrated in one processor, or may exist separately and physically, or two or more modules may be integrated in one module. The coupling of the individual modules to each other may be achieved by means of interfaces which are typically electrical communication interfaces, but it is not excluded that they may be mechanical interfaces or other forms of interfaces. Thus, the modules illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed in different locations on the same or different devices. The integrated modules may be implemented in hardware or in software functional modules.

Having described the model training method and apparatus of exemplary embodiments of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.

An electronic device 130 implemented according to such an embodiment of the present application is described below with reference to fig. 11. The electronic device 130 shown in fig. 11 is merely an example, and should not be construed to limit the functionality and scope of use of embodiments of the present application in any way.

As shown in fig. 11, the electronic device 130 is embodied in the form of a general-purpose electronic device. Components of electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 connecting the various system components, including the memory 132 and the processor 131.

The at least one memory 132 stores a computer program executable by the at least one processor 131, which when executed by the at least one processor 131 causes the at least one processor 131 to perform the steps of any of the model training methods provided in the embodiments of the present application.

Bus 133 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, and a local bus using any of a variety of bus architectures.

Memory 132 may include readable media in the form of volatile memory such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.

Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), one or more devices that enable a user to interact with the electronic device 130, and/or any device (e.g., router, modem, etc.) that enables the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur through an input/output (I/O) interface 135. Also, electronic device 130 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 130, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

In an exemplary embodiment, a storage medium is also provided; when a computer program in the storage medium is executed by a processor of an electronic device, the electronic device is capable of performing any of the above-described model training methods. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product is also provided, which, when executed by an electronic device, is capable of implementing the steps of any of the model training methods provided herein.

Also, a computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The program product for model training in embodiments of the present application may take the form of a CD-ROM and include program code that can run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, such as a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., over the Internet using an Internet service provider).

It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the units described above may be embodied in one unit in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided among a plurality of units.

Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application also includes such modifications and variations.

Claims

1. A method of model training, the method comprising:

2. The method of claim 1, wherein after obtaining an extended data set and before training the Bi-TCN model to be trained based on the original data set and the extended data set to obtain a trained Bi-TCN model, the method further comprises:

3. The method of claim 2, wherein the network traffic extension data conforming to a preset quality detection rule comprises:

network traffic extension data that does not include illegal values;

4. The method of claim 1, wherein classifying the features of the original data set to obtain a plurality of attack features in the original data set comprises:

for each network traffic data in the original dataset:

5. The method of claim 1, wherein performing data dimension expansion on the network traffic data corresponding to the plurality of target attack features to obtain a plurality of reference data comprises:

for each network traffic data, the following operations are performed:

6. The method of claim 5, wherein performing traffic classification on the plurality of reference data to obtain the plurality of extension data comprises:

7. The method of claim 1, wherein training the Bi-TCN model to be trained based on the original data set and the extended data set to obtain a trained Bi-TCN model comprises:

8. The method of any of claims 1-7, wherein the Bi-TCN model comprises an input layer, a convolution layer, a fully connected layer, and a softmax layer, the convolution layer comprising n residual modules, each residual module comprising a causal dilated convolution unit; the dilation factor of the causal dilated convolution unit increases exponentially from one residual module to the next; the method further comprises:

9. A model training apparatus, the apparatus comprising:

10. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

11. A storage medium, wherein a computer program stored in the storage medium, when executed by a processor of an electronic device, causes the electronic device to perform the method of any of claims 1-8.
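
The following is a non-limiting illustrative sketch, not part of the claims, of the convolution structure recited in claim 8: a stack of residual modules, each containing a causal dilated convolution whose dilation factor grows exponentially. It is a minimal pure-Python sketch covering a single direction of the Bi-TCN only. The function names, the growth base of 2, the kernel size of 2, the shared weights, and the ReLU activation are all assumptions made for illustration and are not taken from the application.

```python
from typing import List, Sequence


def dilation_schedule(n_modules: int, base: int = 2) -> List[int]:
    """Dilation factor per residual module: base**0, base**1, ...

    The base of 2 is an assumption; the claim only requires exponential growth.
    """
    return [base ** i for i in range(n_modules)]


def causal_dilated_conv(x: Sequence[float],
                        weights: Sequence[float],
                        dilation: int) -> List[float]:
    """1-D causal dilated convolution.

    y[t] = sum_j weights[j] * x[t - j * dilation], where samples before the
    start of the sequence count as zero, so y[t] never depends on x[t+1:].
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_module(x: Sequence[float],
                    weights: Sequence[float],
                    dilation: int) -> List[float]:
    """Causal dilated convolution, ReLU (assumed), then a skip connection."""
    conv = causal_dilated_conv(x, weights, dilation)
    activated = [max(0.0, v) for v in conv]
    return [a + b for a, b in zip(x, activated)]


def residual_stack(x: Sequence[float],
                   weights: Sequence[float],
                   n_modules: int) -> List[float]:
    """Apply n residual modules with exponentially increasing dilation.

    Sharing one weight vector across modules is a simplification.
    """
    y = list(x)
    for d in dilation_schedule(n_modules):
        y = residual_module(y, weights, d)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of past time steps (inclusive) that one output can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    ds = dilation_schedule(4)
    print(ds, receptive_field(2, ds))
```

Under these assumptions, four modules with kernel size 2 give each output a view of 16 consecutive time steps, which is why exponential dilation lets a shallow stack cover long traffic sequences. A bidirectional variant would additionally run a second stack over the time-reversed sequence; how the two directions are combined is not specified in this section and is left out of the sketch.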