CN115563610A - Method and device for training and identifying intrusion detection model

Method and device for training and identifying intrusion detection model

Info

Publication number
CN115563610A
Authority
CN
China
Prior art keywords
model
training
layer
task
meta
Prior art date
Legal status
Granted
Application number
CN202211546247.4A
Other languages
Chinese (zh)
Other versions
CN115563610B (en)
Inventor
左严
杨萍萍
王正荣
王祥伟
汤斌
包寅杰
贾俊铖
胡梦娜
Current Assignee
Jiangsu New Hope Technology Co ltd
Original Assignee
Jiangsu New Hope Technology Co ltd
Priority date
Filing date: 2022-12-05
Publication date: 2023-01-03
Application filed by Jiangsu New Hope Technology Co., Ltd.
Priority to CN202211546247.4A
Publication of CN115563610A
Application granted
Publication of CN115563610B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention relates to a training method, an identification method, and a device for an intrusion detection model. The method comprises: acquiring a sample data set, establishing a classification model, and training the classification model with a meta-training method based on MAML (Model-Agnostic Meta-Learning). The classification model is a multi-channel CNN model comprising an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer, a LeakyReLU activation function, a two-dimensional max-pooling layer, and a Dropout layer. A splicing layer concatenates the local features extracted by the different channels into a new feature vector and is followed in turn by a fully connected layer and an output layer. By combining a deep neural network with the meta-learning training idea, the detection method effectively overcomes the problem that a model cannot be trained when attack sample data are insufficient.

Description

Method and device for training and identifying intrusion detection model
Technical Field
The invention relates to the field of intrusion detection, and in particular to a training method, an identification method, and a device for an intrusion detection model.
Background
For specific, previously seen attack types, most deep learning methods can identify network attacks accurately, provided that massive data and sufficient computational resources are available. However, the current Internet environment is constantly changing and new attack patterns emerge endlessly. Zero-day attacks, for example, exploit security vulnerabilities for which no patch exists and can launch highly destructive attacks on a system or software application as soon as the vulnerability is discovered. Faced with a new attack, a deep model must be retrained, which demands many samples and is very time consuming. Yet security agencies can rarely collect enough attack instances in a short time to support model training. This leads to the problem that the model cannot be trained because the number of samples is insufficient.
Disclosure of Invention
In view of the above, it is necessary to provide a training method for an intrusion detection model that solves the existing problems. The method addresses the problem that a model cannot be trained when attack sample data are insufficient: it trains a classifier with good generalization ability from limited samples and achieves fast learning and detection of new attack samples.
A method for training an intrusion detection model, comprising:
acquiring a sample data set,
establishing a classification model, and
training the classification model with a meta-training method based on Model-Agnostic Meta-Learning (MAML).
This detection method, which combines a deep neural network with the meta-learning training idea, effectively solves the problem that a model cannot be trained due to insufficient attack sample data.
In one embodiment, the classification model is a multi-channel CNN model.
In one embodiment, the multi-channel CNN model includes:
an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer, a LeakyReLU activation function, a two-dimensional max-pooling layer, and a Dropout layer,
a splicing layer for concatenating the local features extracted from the plurality of different channels into a new feature vector,
and a fully connected layer and an output layer arranged in turn after the splicing layer.
In one embodiment, the probability distribution of the label y at the output layer is calculated by a Softmax activation function.
In one embodiment, the sample data set comprises a meta-training set Dmeta-train, consisting of a sample set and a query set, and a meta-test set Dmeta-test, consisting of a support set and a test set,
after the classification model is trained, a meta-test stage is entered, the meta-test stage comprising a fine-tuning stage and a verification stage,
the fine-tuning stage comprises: when the model needs to adapt to a new specific task, the pre-trained model parameters $\theta^{*}$ and the sample data on the support set are used to fine-tune the model parameters, as shown in the following formula:

$$\theta' = \theta^{*} - \alpha \nabla_{\theta^{*}} L_{T_i}\!\left(f_{\theta^{*}}; P_i\right)$$

where $P_i$ denotes the support set of the $i$-th task, $\alpha$ is the learning rate shared between different tasks in the internal update step, and $L_{T_i}(f_{\theta^{*}})$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta^{*}$,
the verification stage comprises: after the fine-tuning stage, a new model $f_{\theta'}$ parameterized by the group of parameters $\theta'$ is obtained; on the test set, the new model $f_{\theta'}$ is evaluated and the results are averaged to avoid chance outcomes.
In one embodiment, training the classification model with the MAML-based meta-training method specifically comprises: training based on a dual gradient update, including an internal update and an external update,
in the internal update stage, the training loss $L_{T_i}(f_\theta)$ on each task $T_i$ is first calculated using the sample-set data $S_i$, and the local parameter $\theta$ of each task $T_i$ is updated along the direction of gradient descent, as shown in the following formula:

$$\theta_i' = \theta - \alpha \nabla_{\theta} L_{T_i}\!\left(f_\theta\right)$$

where $\alpha$ is the learning rate shared between different tasks in the internal update step and $L_{T_i}(f_\theta)$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta$; through this loss, a gradient update is applied to the initial parameters $\theta$ of the internal model corresponding to task $T_i$, yielding a weakly supervised model with updated, task-preferring parameters $\theta_i'$,
in the external update stage, a weighted gradient update mechanism is adopted to minimize the deviation of each specific task from the initial model; specifically, a gradient-update weight $w_i$ is set for each task $T_i$, and the weights are updated as follows:

$$w_i^{t+1} = w_i^{t} - \eta \, \nabla_{w_i} L^{t}$$

where $L^{t}$ denotes the total loss after one iteration, $\eta$ denotes the weight learning rate, and $t$ denotes the iteration number,
furthermore, these weights must satisfy the normalization condition $\sum_i w_i = 1$, so the obtained weights are further normalized, as shown in the following formula:

$$w_i \leftarrow \frac{w_i}{\sum_{j=1}^{k} w_j}$$

then the locally updated parameters $\theta_i'$ are obtained through training on the query set, the loss $L_{T_i}(f_{\theta_i'})$ is obtained with the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameter $\theta$ of the global network is updated, as shown in the following formula:

$$\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i} w_i \, L_{T_i}\!\left(f_{\theta_i'}\right)$$

where $\beta$ denotes the learning rate of the external update,
after repeated iterations, the value of the loss function keeps decreasing, the network model gradually converges, and a trained model $f_{\theta^{*}}$ is finally obtained.
An intrusion detection identification method comprising:
acquiring intrusion data to be identified;
and calling an intrusion detection model obtained with the above training method, and processing the intrusion data to be identified to obtain a processing result.
An intrusion detection identification device comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling the intrusion detection model obtained by adopting the training method of the intrusion detection model and processing the intrusion data to be identified so as to obtain a processing result.
A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to execute operations corresponding to the method.
A computer apparatus, comprising: a processor, a memory, a communication interface, and a communication bus, through which the processor, the memory, and the communication interface communicate with one another; the memory stores at least one executable instruction that causes the processor to perform the operations corresponding to the method.
Drawings
Fig. 1 is a flowchart of a method for training an intrusion detection model according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a multi-channel CNN model of an embodiment of the present application.
Fig. 3 is a flowchart of a meta-training phase of MAML-based network anomaly detection according to an embodiment of the present application.
FIG. 4 is a Loss plot of a model during training for an embodiment of the present application.
FIG. 5 is a comparison of run times for different models.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In view of the above, it is necessary to provide a method for training an intrusion detection model to solve the existing problems. The method can better solve the problem that the model cannot be trained due to insufficient attack sample data. The method trains a classifier with good generalization capability by using limited samples, and realizes rapid learning and detection of new attack samples.
As shown in fig. 1, an embodiment of the present application provides a method for training an intrusion detection model, the method comprising: acquiring a sample data set, establishing a classification model, and training the classification model with a MAML-based meta-training method.
In one embodiment, the classification model is a multi-channel CNN model, which the present application optimizes. Specifically, as shown in fig. 2, the multi-channel CNN model comprises an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer (Conv2D), a LeakyReLU activation function, a two-dimensional max-pooling layer (MaxPooling2D), and a Dropout layer with the dropout rate set to 0.2. The model further comprises a splicing layer, which concatenates the local features extracted from the plurality of different channels into a new feature vector, followed in turn by a fully connected part and an output layer. In fig. 2 there are two fully connected layers, FC(32, 8) and FC(1), the parameters in parentheses being the output dimensions; the splicing layer corresponds to the concatenation block in fig. 2.
The optimized multi-channel CNN model of the present application is described in detail below by way of example.
First, assume that sample x is a one-dimensional vector containing d features, defined as follows:

$$x = [c_1, c_2, \ldots, c_d]$$

where $c_i$ represents the $i$-th feature of the sample. To fit the input format of a two-dimensional convolution, every sample is reshaped to 1 × d × 1, the three dimensions representing height, width, and number of channels respectively;
a Block is defined for each channel, each Block comprises a convolution layer with different convolution kernel sizes, a network Block comprises a plurality of parallel blocks, input data are respectively input into the blocks, feature detection is carried out at different positions of the input data, and local features are extracted from different space channels of a multi-channel vector. According to experimental studies, here three parallel convolutional layer trains are used, with the window sizes of the convolutional kernels set to 1 × 3, 1 × 4 and 1 × 5, with a step size of 1 × 1, as shown in the following equation:
Figure 35616DEST_PATH_IMAGE020
where d is the dimension of the input x, c is the characteristic of x, and wj and bj represent the weight and deviation of the offset matrix in the jth channel convolution operation, respectively. kj denotes the convolution kernel size. σ is an activation function that selects LeakyReLU, which accelerates learning convergence by mapping nonlinearities into the data. Unlike relus, leakyreu can avoid overfitting and solve the problem of dead relus by assigning a non-zero slope to all negative values, i.e., some neurons in the network may never be updated. After three independent convolution operations, in order to reduce the complexity of the network, a max pooling layer is used to connect the outputs of the convolution layers, specifically see the following formula:
Figure 322241DEST_PATH_IMAGE021
the maximum pooling layer can filter out the features with weaker correlation by down-sampling the feature map of the upper layer, and reserve the strongest correlated information for the lower layer, thereby effectively reducing overfitting.
Next, in the splicing layer, the local features extracted from the three different channels are concatenated to form a new feature vector, as shown in the following formula:

$$z = F\!\left(C\!\left(p_1, p_2, p_3\right)\right)$$

where C denotes the concatenation operation and F denotes the flatten operation, which adjusts the data dimension to one dimension to fit the input of the fully connected layers. The fully connected part combines the extracted features to make the final decision; it comprises two hidden layers with 32 and 8 neurons respectively and uses the LeakyReLU activation function to strengthen the learning ability of the network, so that the model can learn from the feature-map space globally. The probability distribution of the label y at the output layer is calculated by the Softmax activation function:

$$P(y_i \mid x) = \frac{e^{z_i}}{\sum_{k=1}^{K} e^{z_k}}$$

where $y_i$ represents the $i$-th output label value; in the experimental setting, K = 2. The parameter settings of the network model are detailed in Table 1.
Table 1 Parameter settings (the table is reproduced as an image in the original publication)
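To make the architecture concrete, the following is a minimal PyTorch sketch of the multi-channel CNN described above. The kernel windows (1 × 3, 1 × 4, 1 × 5), the stride, the Dropout rate of 0.2, and the FC sizes (32, 8, 1) follow the text; the number of convolution filters and the 1 × 2 pooling window are assumptions, since Table 1 is available only as an image, and the single-logit output with a sigmoid stands in for the two-class Softmax described above.

```python
import torch
import torch.nn as nn

class MultiChannelCNN(nn.Module):
    """Three parallel Blocks (Conv2D -> LeakyReLU -> MaxPool2D -> Dropout 0.2)
    with kernel windows 1x3, 1x4, 1x5, concatenated and fed to FC(32) -> FC(8)
    -> FC(1). Filter count and the 1x2 pooling window are assumptions."""

    def __init__(self, d: int, filters: int = 16):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, filters, kernel_size=(1, k), stride=(1, 1)),
                nn.LeakyReLU(),
                nn.MaxPool2d(kernel_size=(1, 2)),
                nn.Dropout(0.2),
            )
            for k in (3, 4, 5)
        )
        # flattened width per Block: filters * floor((d - k + 1) / 2)
        width = sum(filters * ((d - k + 1) // 2) for k in (3, 4, 5))
        self.head = nn.Sequential(
            nn.Linear(width, 32), nn.LeakyReLU(),
            nn.Linear(32, 8), nn.LeakyReLU(),
            nn.Linear(8, 1),  # single logit; sigmoid/BCE stands in for the K=2 Softmax
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d), reshaped to (batch, channels=1, height=1, width=d)
        x = x.view(x.size(0), 1, 1, -1)
        feats = [torch.flatten(b(x), start_dim=1) for b in self.blocks]
        return self.head(torch.cat(feats, dim=1))
```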
The multi-channel CNN model is a neural network designed specifically for small-sample learning, and its training regime differs from traditional supervised learning. Instead of simply dividing the entire data set into a training set and a test set, a meta-training set containing multiple tasks is generated from the source data set, each task comprising a sample set and a query set, to mirror the meta-test set, which comprises a support set and a test set. The following shows how small-sample tasks are generated from the raw data set.
Given a data set comprising normal samples and N attack-type samples,

$$D = \{(x_i, y_i)\}, \qquad x_i \in \mathbb{R}^{d}, \qquad y_i \in \{0, 1, \ldots, N\},$$

where 0 represents the normal type and the other labels represent N different types of attack, the given data set is divided by label category into N + 1 subsets:

$$D = D_0 \cup D_1 \cup \cdots \cup D_N,$$

where $D_t$ denotes the set of all samples $(x_i, y_i)$ with $y_i = t$. Meta-learning requires the neural network to deal with tasks it has never seen, so an attack category must be selected to simulate the small-sample scenario of a new attack in real life. For convenience of description, attack category N is selected as the new network-attack category and is excluded during training; the remaining N - 1 attack classes are the known attacks used for training.
First, an attack sample set $D_r$ is randomly selected as the source of attack data in the task set, where r belongs to $\{1, \ldots, N-1\}$:

$$r = \operatorname{Random}(1, N-1).$$

Next, K samples each are randomly drawn from the normal sample set $D_0$ and the attack sample set $D_r$ to form the sample set of the task, as shown in the following formulas:

$$S_{\mathrm{normal}} = \operatorname{Sample}(D_0, K), \qquad S_{\mathrm{attack}} = \operatorname{Sample}(D_r, K),$$

$$S = S_{\mathrm{normal}} \cup S_{\mathrm{attack}},$$
where $\operatorname{Random}(\cdot)$ generates a random value and $\operatorname{Sample}(D, K)$ randomly draws K samples from the data set D. The query set Q is sampled in the same way as the sample set and comprises H normal samples and H samples of attack class r, as shown in the following formulas:

$$Q_{\mathrm{normal}} = \operatorname{Sample}(D_0 \setminus S, H), \qquad Q_{\mathrm{attack}} = \operatorname{Sample}(D_r \setminus S, H),$$

$$Q = Q_{\mathrm{normal}} \cup Q_{\mathrm{attack}},$$

where S and Q denote the sample set and the query set of each task respectively and are guaranteed to contain no duplicate samples, i.e. $S \cap Q = \varnothing$. Each generated task thus includes 2K samples for training and 2H samples for validation. This process is repeated n times to construct n task sets for training. The task-sampling steps of the meta-test set, represented by a support set P and a test set T respectively, are the same as those of the meta-training set; the difference is that the attack samples are drawn from the held-out subset $D_N$ and the process is repeated m times. In total n + m tasks are therefore generated, n of which form the meta-training set and m the meta-test set. Across the n training tasks the sample sets contain 2K × n samples and the query sets 2H × n samples; the support sets contain 2K × m samples and the test sets 2H × m samples.
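As an illustration of the task-construction procedure just described, the following sketch builds one two-way task from a labeled data set. It assumes the data are given as a feature matrix X and a label vector y with 0 for normal and 1..N for attack types; all function names are illustrative, not from the original.

```python
import numpy as np

def split_by_label(X, y):
    """D_t: all samples whose label is t (0 = normal, 1..N = attack types)."""
    return {t: X[y == t] for t in np.unique(y)}

def sample_task(subsets, known_attacks, K, H, rng):
    """One 2-way task: a sample set S with K normal + K attack instances and a
    query set Q with H normal + H attack instances, S and Q disjoint."""
    r = rng.choice(known_attacks)               # Random(1, N-1): pick an attack class
    def draw(pool):
        idx = rng.choice(len(pool), size=K + H, replace=False)
        return pool[idx[:K]], pool[idx[K:]]     # disjoint sample/query halves
    s_norm, q_norm = draw(subsets[0])           # Sample(D_0, .)
    s_atk, q_atk = draw(subsets[r])             # Sample(D_r, .)
    S = (np.vstack([s_norm, s_atk]), np.array([0] * K + [1] * K))
    Q = (np.vstack([q_norm, q_atk]), np.array([0] * H + [1] * H))
    return S, Q

# n meta-training tasks over the known attacks, m meta-test tasks on the
# held-out class N, e.g.:
# rng = np.random.default_rng(0)
# subsets = split_by_label(X, y)
# train_tasks = [sample_task(subsets, list(range(1, N)), 5, 15, rng) for _ in range(n)]
# test_tasks  = [sample_task(subsets, [N], 5, 15, rng) for _ in range(m)]
```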
After the task set is generated, it is fed into the optimized multi-channel CNN network for training. Unlike the training mode of traditional supervised learning, small-sample classification with the MAML framework requires two stages: a meta-training stage and a meta-test stage. The basic idea is to learn, starting from a randomly initialized parameter $\theta$ and the distribution over tasks, an initialization $f_\theta$ whose parameters do not necessarily give the best performance on the classes of data provided during the meta-training stage, but can adapt quickly to a new task containing an unknown attack.
In the meta-training stage, training is based on a dual gradient update comprising two modules, an internal update module and an external update module, implemented as shown in fig. 3.
In the internal update stage, the training loss $L_{T_i}(f_\theta)$ on each task $T_i$ is first calculated using the sample-set data $S_i$, and the local parameter $\theta$ of each task $T_i$ is updated along the direction of gradient descent, as shown in the following formula:

$$\theta_i' = \theta - \alpha \nabla_{\theta} L_{T_i}\!\left(f_\theta\right)$$

where $\alpha$ is the learning rate shared between different tasks in the internal update step and $L_{T_i}(f_\theta)$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta$. Through this loss, a gradient update is applied to the initial parameters $\theta$ of the internal model corresponding to task $T_i$, yielding a weakly supervised model with updated parameters $\theta_i'$ that is biased toward, and detects well, the specific attack of the corresponding task.
In the external update stage, a weighted gradient update mechanism is adopted to minimize the deviation of each specific task from the initial model. Specifically, a gradient-update weight $w_i$ is set for each task $T_i$; the goal of the weight update is to set $w_i$ to the optimal value that minimizes the objective at the next iteration t. Learning the weights automatically keeps the model away from local optima, which alleviates overfitting and makes convergence more stable. The weight update is performed as shown in the following formula:

$$w_i^{t+1} = w_i^{t} - \eta \, \nabla_{w_i} L^{t}$$

where $L^{t}$ denotes the total loss value after one iteration, $\eta$ denotes the weight learning rate, and t denotes the iteration number.
Furthermore, these weights must satisfy the normalization condition

$$\sum_{i=1}^{k} w_i = 1$$

so the obtained weights are further normalized, as shown in the following formula:

$$w_i \leftarrow \frac{w_i}{\sum_{j=1}^{k} w_j}$$

where k is the number of tasks.
Then the locally updated parameters $\theta_i'$ are obtained through training on the query set, the loss $L_{T_i}(f_{\theta_i'})$ is obtained with the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameter $\theta$ of the global network is updated, as shown in the following formula:

$$\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i} w_i \, L_{T_i}\!\left(f_{\theta_i'}\right)$$

where $\beta$ denotes the learning rate of the external update. After multiple iterations, the value of the loss function keeps decreasing, the network model gradually converges, and a trained model $f_{\theta^{*}}$ is finally obtained.
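The dual gradient update can be sketched as follows in PyTorch (torch.func.functional_call requires PyTorch 2.x). The inner and outer updates follow the formulas above; the weight update is a reconstruction of the weighted-gradient mechanism (a gradient step on the weighted total loss with respect to each $w_i$, followed by renormalization), and plain gradient steps replace the Adam optimizer used in the experiments for the sake of clarity.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def task_loss(model, params, x, y):
    # binary cross-entropy of the model evaluated under a given parameter dict
    logits = functional_call(model, params, (x,)).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, y)

def meta_train_step(model, tasks, w, alpha, beta, eta):
    """One dual-gradient iteration. `tasks` is a list of ((xs, ys), (xq, yq))
    pairs; `w` is a tensor holding one outer-update weight per task."""
    params = dict(model.named_parameters())
    names, theta = list(params), list(params.values())
    q_losses = []
    for (xs, ys), (xq, yq) in tasks:
        # internal update: theta_i' = theta - alpha * grad L_Ti(f_theta) on S_i
        g = torch.autograd.grad(task_loss(model, params, xs, ys),
                                theta, create_graph=True)
        fast = {n: p - alpha * gi for n, p, gi in zip(names, theta, g)}
        # loss of the adapted model f_{theta_i'} on the query set Q_i
        q_losses.append(task_loss(model, fast, xq, yq))
    q_losses = torch.stack(q_losses)
    outer = (w * q_losses).sum()          # weighted total loss of the batch
    meta_g = torch.autograd.grad(outer, theta)
    with torch.no_grad():
        for p, gi in zip(theta, meta_g):
            p -= beta * gi                # external update of the global theta
        # weight update: d(outer)/dw_i equals the task loss L_i, so step the
        # weights down that gradient, then renormalize to sum to one
        w -= eta * q_losses.detach()
        w.clamp_(min=1e-8)
        w /= w.sum()
    return outer.detach()
```

The weights can be initialized uniformly, e.g. `w = torch.full((len(tasks),), 1.0 / len(tasks))`, so the first iteration reduces to the standard averaged MAML outer update.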
In the meta-test stage, m tasks are randomly sampled in order to verify the generalization ability of the model while avoiding accidental results.
Specifically, the sample data set comprises a meta-training set Dmeta-train, consisting of a sample set and a query set, and a meta-test set Dmeta-test, consisting of a support set and a test set. After training of the classification model is completed, the meta-test stage is started; it comprises a fine-tuning stage and a verification stage.
The fine-tuning stage comprises: when the model needs to adapt to a new specific task, the pre-trained model parameters $\theta^{*}$ and the sample data on the support set are used to fine-tune the model parameters. The aim of fine-tuning is to preserve the detection performance of the model on an attack type it has never seen by executing only a few iteration steps on a small number of samples, so that the model adapts quickly to the new task. The update is implemented as shown in the following formula,

$$\theta' = \theta^{*} - \alpha \nabla_{\theta^{*}} L_{T_i}\!\left(f_{\theta^{*}}; P_i\right)$$

where $P_i$ denotes the support set of the $i$-th task, $\alpha$ is the learning rate shared between different tasks in the internal update step, and $L_{T_i}(f_{\theta^{*}})$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta^{*}$.
The verification stage comprises: after the fine-tuning stage, a new model $f_{\theta'}$ parameterized by the group of parameters $\theta'$ is obtained; on the test set, the new model $f_{\theta'}$ is evaluated and the results are averaged to avoid chance outcomes.
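A sketch of this meta-test stage under the same assumptions as the earlier code: the pre-trained parameters $\theta^{*}$ are fine-tuned on the support set for a few gradient steps, and the adapted model is then scored on the test set. The number of fine-tuning steps (5 here) is an assumption, as the text only says a few iteration steps are executed.

```python
import copy
import torch
import torch.nn.functional as F

def fine_tune_and_eval(model, support, test, alpha, steps=5):
    """Meta-test for one task: adapt theta* on the support set P_i for a few
    gradient steps, then score the adapted model f_theta' on the test set T_i."""
    adapted = copy.deepcopy(model)          # keep theta* intact for the next task
    opt = torch.optim.SGD(adapted.parameters(), lr=alpha)
    xs, ys = support
    for _ in range(steps):                  # fine-tuning stage
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(adapted(xs).squeeze(-1), ys)
        loss.backward()
        opt.step()
    adapted.eval()                          # verification stage
    xt, yt = test
    with torch.no_grad():
        preds = (torch.sigmoid(adapted(xt).squeeze(-1)) > 0.5).float()
        return (preds == yt).float().mean().item()

# averaged over the m meta-test tasks to avoid chance results:
# acc = sum(fine_tune_and_eval(model, P, T, alpha) for P, T in test_tasks) / m
```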
The above-described method of the present application is specifically evaluated by experiments as follows.
The first part provides the experimental setup, hyper-parameters, performance indicators, and simulation environment. The second part is the experimental performance evaluation and analysis: the proposed method (abbreviated MCCML) is compared with reference methods, the effectiveness of each component is demonstrated through ablation experiments, and the experimental results are analyzed in detail to verify the performance of the method. The code in the experiments is implemented with the PyTorch framework.
Specifically, the experimental setup and the hyper-parameters include the following.
The optimal hyper-parameters used by the model, obtained through empirical rules and extensive experiments, are listed in Table 2. The external update performs a global optimization of the model, so the experiments set the value of β larger than α. In the training stage, the number of attack samples K in each task is set to 5; however, to avoid chance results in the model-test stage, the number of attack samples H per task is set to 15. In addition, once forward propagation is complete, the back-propagation procedure of small-sample training is similar to that of traditional supervised learning. Since the small-sample anomaly-detection task is formulated as a supervised two-class problem, there is no data-imbalance issue, and the loss function used during training is the binary cross-entropy. To train the proposed model better, the experiments update the network parameters with the Adam optimization method, based on stochastic gradient descent (SGD).
Table 2 Hyper-parameter settings (the table is reproduced as an image in the original publication)
Existing public data sets are generated manually in particular environments containing many normal and abnormal samples and are not suited to the small-sample problem. For small-sample learning in network intrusion detection, the task sets must be reconstructed according to the attack-type labels. Therefore, a small portion of samples is extracted from the existing public data set CICIDS2017 as the data source, packaged into tasks, and reassembled into several task sets containing the normal and specific attack samples required by the experiments. Finally, the five most typical attacks in the CICIDS2017 data set (DDoS, BruteForce, PortScan, Bot, and Web) are selected for the experiments. In addition, data preprocessing is a necessary step before model training, so preprocessing operations are applied to the data. As shown in Table 3, there are 5 groups of experiments; each group selects one attack to simulate detection of a truly unknown sample and three of the remaining four attack types for training, so each group comprises 4 parallel experiments. Each group of experiments is repeated several times and the average is taken as the final evaluation result, making the model evaluation as accurate as possible.
Table 3 Experimental grouping (the table is reproduced as an image in the original publication)
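Since the preprocessing steps are not spelled out, the following is a hedged sketch of one plausible preparation of a CICIDS2017 slice: filtering to BENIGN plus one attack class, removing infinite and missing values, min-max scaling, and binary label mapping. The column name "Label" and the use of scikit-learn are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess_cicids(df: pd.DataFrame, attack: str):
    """Keep BENIGN plus one attack class, drop inf/NaN rows, min-max scale the
    numeric features, and map labels to 0 (normal) / 1 (attack)."""
    df = df[df["Label"].isin(["BENIGN", attack])]
    feats = (df.drop(columns=["Label"])
               .select_dtypes(include=[np.number])      # numeric features only
               .replace([np.inf, -np.inf], np.nan)
               .dropna(axis=0))
    y = (df.loc[feats.index, "Label"] == attack).astype(int).to_numpy()
    X = MinMaxScaler().fit_transform(feats)              # scale to [0, 1]
    return X, y
```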
The following are experimental performance evaluations and analyses.
The performance of the proposed MAML-based method for detecting new attacks is verified below. The number of iterations is set by observing the variation of the training loss. Fig. 4 shows the loss curve of the model over 100 iterations. As the figure shows, with continued training of the neural network the loss function converges quickly in the early iterations and stays at a relatively stable level after 60 iterations, with slight oscillations. The number of iterations (episodes) is therefore set to 100.
To evaluate the performance of the proposed MCCML method and its fitting and generalization ability, it is compared with widely used reference classifiers, including traditional machine learning algorithms: K-Nearest Neighbors (KNN) and Random Forest (RF); and ensemble learning algorithms: AdaBoost, Bagging, and Gradient Boosting Decision Tree (GBDT). The reference methods also include classical deep learning algorithms for comparison: MLP and multi-channel CNN (the same base network structure as in MCCML, trained with the traditional supervised learning procedure). All the above methods are evaluated on the same reference data set to achieve a fair comparison of detection performance on the new task.
Table 4 lists the performance of the proposed method and the reference methods in identifying each unknown attack category, including accuracy, recall, and the F1 index; the bold entries are the best detection results for each tested attack category. The last three columns of Table 4 can be viewed as a set of ablation experiments demonstrating, by comparison, the effectiveness of the three components: the multi-channel CNN, the meta-learning framework, and the weighted gradient update. From Table 4 it can be seen that: (1) compared with the fully connected method, the multi-channel convolution improves every index by 3% on average; (2) compared with the traditional network training mode, the meta-learning training for small-sample learning proposed here improves overall performance by 6-7%; (3) in small-sample scenarios some shallow learning methods even outperform deep learning, because deep learning depends on large sample sets and too little training data causes overfitting and poor performance; (4) the average gradient-update rule of MAML can bias the initial model too strongly toward certain existing tasks and hinder adaptation to new tasks, whereas weighting the gradient updates makes the model more general and reduces over-specialization on particular tasks. In summary, compared with traditional machine learning and deep neural networks, the proposed MCCML method delivers a better detection effect: it is generally superior to the reference methods on all indexes, and even its worst detection result is comparable to the best result among the reference methods.
Table 4 Detection performance of the proposed and reference methods (the table is reproduced as an image in the original publication)
To highlight the training efficiency of the proposed model, fig. 5 compares the running time per iteration of the different models. The experimental results show that the computation speed of the proposed method is clearly higher than that of pure deep learning methods. Time consumption is one of the weaknesses of deep learning; training with the meta-learning idea achieves faster detection and higher performance. The running time of each iteration of the proposed method is 0.652 s, comparable to the training efficiency of machine learning. Since small-sample learning is a relatively new topic in the field of network intrusion detection, little related work is available for comparison and no reference sample set suitable for testing exists. Therefore, the CICIDS2017 open-source data set is used to reconstruct a detection task set dedicated to small-sample learning, and several related studies that use the CICIDS2017 data set are selected for a benchmark comparison. Classifying abnormal traffic as normal is far more harmful than classifying normal traffic as abnormal, so the comparison focuses on recall, the metric of greatest interest for network intrusion prevention systems. Table 5 compares the proposed MCCML algorithm with the Siamese, AE-CGAN-RF, and ANID methods.
Table 5 Recall comparison with Siamese, AE-CGAN-RF, and ANID (the table is reproduced as an image in the original publication)
It should be noted that not all reference models use the same data set size. Neither AE-CGAN-RF nor ANID is a small-sample detection method; both require a large number of samples for training. The experimental results show that the MCCML method achieves competitive performance on new tasks containing unknown attacks, with a high detection rate on new attack samples, 95.22% on average, outperforming all other reference detection methods. In addition, compared with the similar small-sample method Siamese, MAML outperforms the Siamese network model in the field of network anomaly detection.
On the basis, an embodiment of the present application further provides an intrusion detection identification method, including:
acquiring intrusion data to be identified; calling the intrusion detection model obtained with the above training method; and processing the intrusion data to be identified to obtain a processing result.
On the basis, an embodiment of the present application further provides an intrusion detection and identification device, including:
a data acquisition module used for acquiring intrusion data to be identified, and a data processing module used for calling an intrusion detection model obtained with the above training method and processing the intrusion data to be identified to obtain a processing result.
On the basis, the embodiment of the present application further provides a computer storage medium, where at least one executable instruction is stored in the computer storage medium, and the executable instruction causes a processor to execute the operation corresponding to the method.
On the basis of the foregoing, an embodiment of the present application further provides a computer apparatus, including: a processor, a memory, a communication interface, and a communication bus, through which the processor, the memory, and the communication interface communicate with one another; the memory stores at least one executable instruction that causes the processor to perform the operations corresponding to the method.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for training an intrusion detection model, comprising:
acquiring a sample data set,
establishing a classification model, and
training the classification model with a MAML-based meta-training method.
2. The method of claim 1, wherein the classification model is a multi-channel CNN model.
3. The method of claim 2, wherein the multi-channel CNN model comprises:
an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer, a LeakyReLU activation function, a two-dimensional max-pooling layer, and a Dropout layer,
a splicing layer for concatenating the local features extracted from the plurality of different channels into a new feature vector,
and a fully connected layer and an output layer arranged in turn after the splicing layer.
4. The method for training an intrusion detection model according to claim 3, wherein the probability distribution of the label y at the output layer is calculated by a Softmax activation function.
5. The method for training an intrusion detection model according to claim 1, wherein the sample data set comprises a meta-training set Dmeta-train and a meta-test set Dmeta-test, the meta-training set Dmeta-train comprising a sample set and a query set and the meta-test set Dmeta-test comprising a support set and a test set,
after the training of the classification model is finished, a meta-test stage is entered, the meta-test stage comprising a fine-tuning stage and a verification stage,
the fine-tuning stage comprises: when the model needs to adapt to a new specific task, the pre-trained model parameters $\theta^{*}$ and the sample data on the support set are used to fine-tune the model parameters, as shown in the following formula:

$$\theta' = \theta^{*} - \alpha \nabla_{\theta^{*}} L_{T_i}\!\left(f_{\theta^{*}}; P_i\right)$$

where $P_i$ denotes the support set of the $i$-th task, $\alpha$ is the learning rate shared between the different tasks in the internal update step, and $L_{T_i}(f_{\theta^{*}})$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta^{*}$,
the verification stage comprises: after the fine-tuning stage, a new model $f_{\theta'}$ parameterized by the group of parameters $\theta'$ is obtained, and the new model $f_{\theta'}$ is evaluated on the test set, the results being averaged to avoid chance outcomes.
6. The method for training an intrusion detection model according to claim 1, wherein training the classification model with the MAML-based meta-training method specifically comprises: training based on a dual gradient update, including an internal update and an external update,
in the internal update stage, the training loss $L_{T_i}(f_\theta)$ on each task $T_i$ is first calculated using the sample-set data $S_i$, and the local parameter $\theta$ of each task $T_i$ is updated along the direction of gradient descent, as shown in the following formula:

$$\theta_i' = \theta - \alpha \nabla_{\theta} L_{T_i}\!\left(f_\theta\right)$$

where $\alpha$ is the learning rate shared between different tasks in the internal update step and $L_{T_i}(f_\theta)$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta$; a gradient update is applied through this loss to the initial parameters $\theta$ of the internal model corresponding to task $T_i$, yielding a weakly supervised model with updated, task-preferring parameters $\theta_i'$,
in the external update stage, a weighted gradient update mechanism is adopted to minimize the deviation of each specific task from the initial model; specifically, a gradient-update weight $w_i$ is set for each task $T_i$, and the weights are updated as follows:

$$w_i^{t+1} = w_i^{t} - \eta \, \nabla_{w_i} L^{t}$$

where $L^{t}$ denotes the total loss after one iteration, $\eta$ denotes the weight learning rate, and $t$ denotes the iteration number,
furthermore, these weights must satisfy the normalization condition $\sum_{i=1}^{k} w_i = 1$, so the obtained weights are further normalized, as shown in the following formula:

$$w_i \leftarrow \frac{w_i}{\sum_{j=1}^{k} w_j}$$

then the locally updated parameters $\theta_i'$ are obtained through training on the query set, the loss $L_{T_i}(f_{\theta_i'})$ is obtained with the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameter $\theta$ of the global network is updated, as shown in the following formula:

$$\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i} w_i \, L_{T_i}\!\left(f_{\theta_i'}\right)$$

where $\beta$ denotes the learning rate of the external update,
after multiple iterations, the value of the loss function keeps decreasing, the network model gradually converges, and a trained model $f_{\theta^{*}}$ is finally obtained.
7. An intrusion detection identification method, comprising:
acquiring intrusion data to be identified;
invoking an intrusion detection model obtained with the method for training an intrusion detection model according to any one of claims 1 to 6, and processing the intrusion data to be identified to obtain a processing result.
8. An intrusion detection recognition device, comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling an intrusion detection model obtained by adopting the training method of the intrusion detection model according to any one of claims 1 to 6, and processing the intrusion data to be identified so as to obtain a processing result.
9. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the method of any one of claims 1 to 7.
10. A computer device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus, and the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the method according to any one of claims 1 to 7.
CN202211546247.4A (filed 2022-12-05) Training method, recognition method and device for intrusion detection model; granted as CN115563610B (Active)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211546247.4A 2022-12-05 2022-12-05 Training method, recognition method and device for intrusion detection model (granted as CN115563610B)

Publications (2)

Publication Number Publication Date
CN115563610A (application publication) 2023-01-03
CN115563610B (granted publication) 2023-05-30

Family

ID=84770287

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365659A (en) * 2019-06-26 2019-10-22 浙江大学 A kind of building method of network invasion monitoring data set under small sample scene
CN110808945A (en) * 2019-09-11 2020-02-18 浙江大学 Network intrusion detection method in small sample scene based on meta-learning
CN113037730A (en) * 2021-02-27 2021-06-25 中国人民解放军战略支援部队信息工程大学 Network encryption traffic classification method and system based on multi-feature learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618353A (en) * 2022-10-21 2023-01-17 北京珞安科技有限责任公司 Identification system and method for industrial production safety
CN115618353B (en) * 2022-10-21 2024-01-23 北京珞安科技有限责任公司 Industrial production safety identification system and method
CN116389175A (en) * 2023-06-07 2023-07-04 鹏城实验室 Flow data detection method, training method, system, equipment and medium
CN116389175B (en) * 2023-06-07 2023-08-22 鹏城实验室 Flow data detection method, training method, system, equipment and medium
CN116821907A (en) * 2023-06-29 2023-09-29 哈尔滨工业大学 Drop-MAML-based small sample learning intrusion detection method
CN116821907B (en) * 2023-06-29 2024-02-02 哈尔滨工业大学 Drop-MAML-based small sample learning intrusion detection method

Also Published As

Publication number Publication date
CN115563610B (en) 2023-05-30


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant