CN115563610A - Method and device for training and identifying intrusion detection model

Method and device for training and identifying intrusion detection model

Info

Publication number
CN115563610A
Authority
CN
China
Prior art keywords
model
training
layer
task
meta
Prior art date
Legal status
Granted
Application number
CN202211546247.4A
Other languages
Chinese (zh)
Other versions
CN115563610B (en)
Inventor
左严
杨萍萍
王正荣
王祥伟
汤斌
包寅杰
贾俊铖
胡梦娜
Current Assignee
Jiangsu New Hope Technology Co ltd
Original Assignee
Jiangsu New Hope Technology Co ltd
Priority date
Filing date: 2022-12-05
Publication date: 2023-01-03
Application filed by Jiangsu New Hope Technology Co., Ltd.
Priority to CN202211546247.4A
Publication of CN115563610A
Application granted
Publication of CN115563610B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention relates to a training method, an identification method, and a device for an intrusion detection model. The method comprises: acquiring a sample data set, establishing a classification model, and training the classification model with a meta-training method based on MAML (Model-Agnostic Meta-Learning). The classification model is a multi-channel CNN model comprising an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer, a LeakyReLU activation function, a two-dimensional max-pooling layer, and a Dropout layer. A splicing layer concatenates the local features extracted by the different channels into a new feature vector and is followed in turn by a fully connected layer and an output layer. By combining a deep neural network with the meta-learning training idea, the detection method effectively overcomes the problem that a model cannot be trained when attack sample data are insufficient.

Description

Method and device for training and identifying intrusion detection model
Technical Field
The invention relates to the field of intrusion detection, and in particular to a training method, an identification method, and a device for an intrusion detection model.
Background
For specific, previously seen attack types, most deep learning methods can identify network attacks accurately, provided that massive data and sufficient computational resources are available. However, the current Internet environment is constantly changing and new attack patterns emerge endlessly. Zero-day attacks, for example, exploit security vulnerabilities for which no patch exists and can launch highly destructive attacks on a system or software application as soon as the vulnerability is discovered. Faced with a new attack, a deep model must be retrained, which demands many samples and is very time consuming. Yet security agencies can rarely collect enough attack instances in a short time to support model training. This leads to the problem that the model cannot be trained because the number of samples is insufficient.
Disclosure of Invention
In view of the above, it is necessary to provide a training method for an intrusion detection model that solves the existing problems. The method addresses the problem that a model cannot be trained when attack sample data are insufficient: it trains a classifier with good generalization ability from limited samples and achieves fast learning and detection of new attack samples.
A method for training an intrusion detection model, comprising:
acquiring a sample data set,
establishing a classification model, and
training the classification model with a meta-training method based on Model-Agnostic Meta-Learning (MAML).
This detection method, which combines a deep neural network with the meta-learning training idea, effectively solves the problem that a model cannot be trained due to insufficient attack sample data.
In one embodiment, the classification model is a multi-channel CNN model.
In one embodiment, the multi-channel CNN model includes:
an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer, a LeakyReLU activation function, a two-dimensional max-pooling layer, and a Dropout layer,
a splicing layer for concatenating the local features extracted from the plurality of different channels into a new feature vector,
and a fully connected layer and an output layer arranged in turn after the splicing layer.
In one embodiment, the probability distribution of the label y at the output layer is calculated by a Softmax activation function.
In one embodiment, the sample data set comprises a meta-training set Dmeta-train, consisting of a sample set and a query set, and a meta-test set Dmeta-test, consisting of a support set and a test set,
after the classification model is trained, a meta-test stage is entered, the meta-test stage comprising a fine-tuning stage and a verification stage,
the fine-tuning stage comprises: when the model needs to adapt to a new specific task, the pre-trained model parameters $\theta^{*}$ and the sample data on the support set are used to fine-tune the model parameters, as shown in the following formula:

$$\theta' = \theta^{*} - \alpha \nabla_{\theta^{*}} L_{T_i}\!\left(f_{\theta^{*}}; P_i\right)$$

where $P_i$ denotes the support set of the $i$-th task, $\alpha$ is the learning rate shared between different tasks in the internal update step, and $L_{T_i}(f_{\theta^{*}})$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta^{*}$,
the verification stage comprises: after the fine-tuning stage, a new model $f_{\theta'}$ parameterized by the group of parameters $\theta'$ is obtained; on the test set, the new model $f_{\theta'}$ is evaluated and the results are averaged to avoid chance outcomes.
In one embodiment, training the classification model with the MAML-based meta-training method specifically comprises: training based on a dual gradient update, including an internal update and an external update,
in the internal update stage, the training loss $L_{T_i}(f_\theta)$ on each task $T_i$ is first calculated using the sample-set data $S_i$, and the local parameter $\theta$ of each task $T_i$ is updated along the direction of gradient descent, as shown in the following formula:

$$\theta_i' = \theta - \alpha \nabla_{\theta} L_{T_i}\!\left(f_\theta\right)$$

where $\alpha$ is the learning rate shared between different tasks in the internal update step and $L_{T_i}(f_\theta)$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta$; through this loss, a gradient update is applied to the initial parameters $\theta$ of the internal model corresponding to task $T_i$, yielding a weakly supervised model with updated, task-preferring parameters $\theta_i'$,
in the external update stage, a weighted gradient update mechanism is adopted to minimize the deviation of each specific task from the initial model; specifically, a gradient-update weight $w_i$ is set for each task $T_i$, and the weights are updated as follows:

$$w_i^{t+1} = w_i^{t} - \eta \, \nabla_{w_i} L^{t}$$

where $L^{t}$ denotes the total loss after one iteration, $\eta$ denotes the weight learning rate, and $t$ denotes the iteration number,
furthermore, these weights must satisfy the normalization condition $\sum_i w_i = 1$, so the obtained weights are further normalized, as shown in the following formula:

$$w_i \leftarrow \frac{w_i}{\sum_{j=1}^{k} w_j}$$

then the locally updated parameters $\theta_i'$ are obtained through training on the query set, the loss $L_{T_i}(f_{\theta_i'})$ is obtained with the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameter $\theta$ of the global network is updated, as shown in the following formula:

$$\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i} w_i \, L_{T_i}\!\left(f_{\theta_i'}\right)$$

where $\beta$ denotes the learning rate of the external update,
after repeated iterations, the value of the loss function keeps decreasing, the network model gradually converges, and a trained model $f_{\theta^{*}}$ is finally obtained.
An intrusion detection identification method comprising:
acquiring intrusion data to be identified;
and calling an intrusion detection model obtained with the above training method, and processing the intrusion data to be identified to obtain a processing result.
An intrusion detection identification device comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling the intrusion detection model obtained by adopting the training method of the intrusion detection model and processing the intrusion data to be identified so as to obtain a processing result.
A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to execute operations corresponding to the method.
A computer apparatus, comprising: a processor, a memory, a communication interface, and a communication bus, through which the processor, the memory, and the communication interface communicate with one another; the memory stores at least one executable instruction that causes the processor to perform the operations corresponding to the method.
Drawings
Fig. 1 is a flowchart of a method for training an intrusion detection model according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a multi-channel CNN model of an embodiment of the present application.
Fig. 3 is a flowchart of a meta-training phase of MAML-based network anomaly detection according to an embodiment of the present application.
FIG. 4 is a Loss plot of a model during training for an embodiment of the present application.
FIG. 5 is a comparison of run times for different models.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In view of the above, it is necessary to provide a method for training an intrusion detection model to solve the existing problems. The method can better solve the problem that the model cannot be trained due to insufficient attack sample data. The method trains a classifier with good generalization capability by using limited samples, and realizes rapid learning and detection of new attack samples.
As shown in fig. 1, an embodiment of the present application provides a method for training an intrusion detection model, the method comprising: acquiring a sample data set, establishing a classification model, and training the classification model with a MAML-based meta-training method.
In one embodiment, the classification model is a multi-channel CNN model, which the present application optimizes. Specifically, as shown in fig. 2, the multi-channel CNN model comprises an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer (Conv2D), a LeakyReLU activation function, a two-dimensional max-pooling layer (MaxPooling2D), and a Dropout layer with the dropout rate set to 0.2. The model further comprises a splicing layer, which concatenates the local features extracted from the plurality of different channels into a new feature vector, followed in turn by a fully connected part and an output layer. In fig. 2 there are two fully connected layers, FC(32, 8) and FC(1), the parameters in parentheses being the output dimensions; the splicing layer corresponds to the concatenation block in fig. 2.
The optimized multi-channel CNN model of the present application is described in detail below by way of example.
First, assume that sample x is a one-dimensional vector containing d features, defined as follows:

$$x = [c_1, c_2, \ldots, c_d]$$

where $c_i$ represents the $i$-th feature of the sample. To fit the input format of a two-dimensional convolution, every sample is reshaped to 1 × d × 1, the three dimensions representing height, width, and number of channels respectively;
a Block is defined for each channel, each Block comprises a convolution layer with different convolution kernel sizes, a network Block comprises a plurality of parallel blocks, input data are respectively input into the blocks, feature detection is carried out at different positions of the input data, and local features are extracted from different space channels of a multi-channel vector. According to experimental studies, here three parallel convolutional layer trains are used, with the window sizes of the convolutional kernels set to 1 × 3, 1 × 4 and 1 × 5, with a step size of 1 × 1, as shown in the following equation:
Figure 35616DEST_PATH_IMAGE020
where d is the dimension of the input x, c is the characteristic of x, and wj and bj represent the weight and deviation of the offset matrix in the jth channel convolution operation, respectively. kj denotes the convolution kernel size. σ is an activation function that selects LeakyReLU, which accelerates learning convergence by mapping nonlinearities into the data. Unlike relus, leakyreu can avoid overfitting and solve the problem of dead relus by assigning a non-zero slope to all negative values, i.e., some neurons in the network may never be updated. After three independent convolution operations, in order to reduce the complexity of the network, a max pooling layer is used to connect the outputs of the convolution layers, specifically see the following formula:
Figure 322241DEST_PATH_IMAGE021
the maximum pooling layer can filter out the features with weaker correlation by down-sampling the feature map of the upper layer, and reserve the strongest correlated information for the lower layer, thereby effectively reducing overfitting.
Next, in the splicing layer, the local features extracted from the three different channels are concatenated to form a new feature vector, as shown in the following formula:

$$z = F\!\left(C\!\left(p_1, p_2, p_3\right)\right)$$

where C denotes the concatenation operation and F denotes the flatten operation, which adjusts the data dimension to one dimension to fit the input of the fully connected layers. The fully connected part combines the extracted features to make the final decision; it comprises two hidden layers with 32 and 8 neurons respectively and uses the LeakyReLU activation function to strengthen the learning ability of the network, so that the model can learn from the feature-map space globally. The probability distribution of the label y at the output layer is calculated by the Softmax activation function:

$$P(y_i \mid x) = \frac{e^{z_i}}{\sum_{k=1}^{K} e^{z_k}}$$

where $y_i$ represents the $i$-th output label value; in the experimental setting, K = 2. The parameter settings of the network model are detailed in Table 1.
Table 1 Parameter settings (the table is reproduced as an image in the original publication)
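To make the architecture concrete, the following is a minimal PyTorch sketch of the multi-channel CNN described above. The kernel windows (1 × 3, 1 × 4, 1 × 5), the stride, the Dropout rate of 0.2, and the FC sizes (32, 8, 1) follow the text; the number of convolution filters and the 1 × 2 pooling window are assumptions, since Table 1 is available only as an image, and the single-logit output with a sigmoid stands in for the two-class Softmax described above.

```python
import torch
import torch.nn as nn

class MultiChannelCNN(nn.Module):
    """Three parallel Blocks (Conv2D -> LeakyReLU -> MaxPool2D -> Dropout 0.2)
    with kernel windows 1x3, 1x4, 1x5, concatenated and fed to FC(32) -> FC(8)
    -> FC(1). Filter count and the 1x2 pooling window are assumptions."""

    def __init__(self, d: int, filters: int = 16):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, filters, kernel_size=(1, k), stride=(1, 1)),
                nn.LeakyReLU(),
                nn.MaxPool2d(kernel_size=(1, 2)),
                nn.Dropout(0.2),
            )
            for k in (3, 4, 5)
        )
        # flattened width per Block: filters * floor((d - k + 1) / 2)
        width = sum(filters * ((d - k + 1) // 2) for k in (3, 4, 5))
        self.head = nn.Sequential(
            nn.Linear(width, 32), nn.LeakyReLU(),
            nn.Linear(32, 8), nn.LeakyReLU(),
            nn.Linear(8, 1),  # single logit; sigmoid/BCE stands in for the K=2 Softmax
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d), reshaped to (batch, channels=1, height=1, width=d)
        x = x.view(x.size(0), 1, 1, -1)
        feats = [torch.flatten(b(x), start_dim=1) for b in self.blocks]
        return self.head(torch.cat(feats, dim=1))
```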
The multi-channel CNN model is a neural network designed specifically for small-sample learning, and its training regime differs from traditional supervised learning. Instead of simply dividing the entire data set into a training set and a test set, a meta-training set containing multiple tasks is generated from the source data set, each task comprising a sample set and a query set, to mirror the meta-test set, which comprises a support set and a test set. The following shows how small-sample tasks are generated from the raw data set.
Given a data set comprising normal samples and N attack-type samples,

$$D = \{(x_i, y_i)\}, \qquad x_i \in \mathbb{R}^{d}, \qquad y_i \in \{0, 1, \ldots, N\},$$

where 0 represents the normal type and the other labels represent N different types of attack, the given data set is divided by label category into N + 1 subsets:

$$D = D_0 \cup D_1 \cup \cdots \cup D_N,$$

where $D_t$ denotes the set of all samples $(x_i, y_i)$ with $y_i = t$. Meta-learning requires the neural network to deal with tasks it has never seen, so an attack category must be selected to simulate the small-sample scenario of a new attack in real life. For convenience of description, attack category N is selected as the new network-attack category and is excluded during training; the remaining N - 1 attack classes are the known attacks used for training.
First, an attack sample set $D_r$ is randomly selected as the source of attack data in the task set, where r belongs to $\{1, \ldots, N-1\}$:

$$r = \operatorname{Random}(1, N-1).$$

Next, K samples each are randomly drawn from the normal sample set $D_0$ and the attack sample set $D_r$ to form the sample set of the task, as shown in the following formulas:

$$S_{\mathrm{normal}} = \operatorname{Sample}(D_0, K), \qquad S_{\mathrm{attack}} = \operatorname{Sample}(D_r, K),$$

$$S = S_{\mathrm{normal}} \cup S_{\mathrm{attack}},$$
where $\operatorname{Random}(\cdot)$ generates a random value and $\operatorname{Sample}(D, K)$ randomly draws K samples from the data set D. The query set Q is sampled in the same way as the sample set and comprises H normal samples and H samples of attack class r, as shown in the following formulas:

$$Q_{\mathrm{normal}} = \operatorname{Sample}(D_0 \setminus S, H), \qquad Q_{\mathrm{attack}} = \operatorname{Sample}(D_r \setminus S, H),$$

$$Q = Q_{\mathrm{normal}} \cup Q_{\mathrm{attack}},$$

where S and Q denote the sample set and the query set of each task respectively and are guaranteed to contain no duplicate samples, i.e. $S \cap Q = \varnothing$. Each generated task thus includes 2K samples for training and 2H samples for validation. This process is repeated n times to construct n task sets for training. The task-sampling steps of the meta-test set, represented by a support set P and a test set T respectively, are the same as those of the meta-training set; the difference is that the attack samples are drawn from the held-out subset $D_N$ and the process is repeated m times. In total n + m tasks are therefore generated, n of which form the meta-training set and m the meta-test set. Across the n training tasks the sample sets contain 2K × n samples and the query sets 2H × n samples; the support sets contain 2K × m samples and the test sets 2H × m samples.
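As an illustration of the task-construction procedure just described, the following sketch builds one two-way task from a labeled data set. It assumes the data are given as a feature matrix X and a label vector y with 0 for normal and 1..N for attack types; all function names are illustrative, not from the original.

```python
import numpy as np

def split_by_label(X, y):
    """D_t: all samples whose label is t (0 = normal, 1..N = attack types)."""
    return {t: X[y == t] for t in np.unique(y)}

def sample_task(subsets, known_attacks, K, H, rng):
    """One 2-way task: a sample set S with K normal + K attack instances and a
    query set Q with H normal + H attack instances, S and Q disjoint."""
    r = rng.choice(known_attacks)               # Random(1, N-1): pick an attack class
    def draw(pool):
        idx = rng.choice(len(pool), size=K + H, replace=False)
        return pool[idx[:K]], pool[idx[K:]]     # disjoint sample/query halves
    s_norm, q_norm = draw(subsets[0])           # Sample(D_0, .)
    s_atk, q_atk = draw(subsets[r])             # Sample(D_r, .)
    S = (np.vstack([s_norm, s_atk]), np.array([0] * K + [1] * K))
    Q = (np.vstack([q_norm, q_atk]), np.array([0] * H + [1] * H))
    return S, Q

# n meta-training tasks over the known attacks, m meta-test tasks on the
# held-out class N, e.g.:
# rng = np.random.default_rng(0)
# subsets = split_by_label(X, y)
# train_tasks = [sample_task(subsets, list(range(1, N)), 5, 15, rng) for _ in range(n)]
# test_tasks  = [sample_task(subsets, [N], 5, 15, rng) for _ in range(m)]
```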
After the task set is generated, it is fed into the optimized multi-channel CNN network for training. Unlike the training mode of traditional supervised learning, small-sample classification with the MAML framework requires two stages: a meta-training stage and a meta-test stage. The basic idea is to learn, starting from a randomly initialized parameter $\theta$ and the distribution over tasks, an initialization $f_\theta$ whose parameters do not necessarily give the best performance on the classes of data provided during the meta-training stage, but can adapt quickly to a new task containing an unknown attack.
In the meta-training stage, training is based on a dual gradient update comprising two modules, an internal update module and an external update module, implemented as shown in fig. 3.
In the internal update stage, the training loss $L_{T_i}(f_\theta)$ on each task $T_i$ is first calculated using the sample-set data $S_i$, and the local parameter $\theta$ of each task $T_i$ is updated along the direction of gradient descent, as shown in the following formula:

$$\theta_i' = \theta - \alpha \nabla_{\theta} L_{T_i}\!\left(f_\theta\right)$$

where $\alpha$ is the learning rate shared between different tasks in the internal update step and $L_{T_i}(f_\theta)$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta$. Through this loss, a gradient update is applied to the initial parameters $\theta$ of the internal model corresponding to task $T_i$, yielding a weakly supervised model with updated parameters $\theta_i'$ that is biased toward, and detects well, the specific attack of the corresponding task.
In the external update stage, a weighted gradient update mechanism is adopted to minimize the deviation of each specific task from the initial model. Specifically, a gradient-update weight $w_i$ is set for each task $T_i$; the goal of the weight update is to set $w_i$ to the optimal value that minimizes the objective at the next iteration t. Learning the weights automatically keeps the model away from local optima, which alleviates overfitting and makes convergence more stable. The weight update is performed as shown in the following formula:

$$w_i^{t+1} = w_i^{t} - \eta \, \nabla_{w_i} L^{t}$$

where $L^{t}$ denotes the total loss value after one iteration, $\eta$ denotes the weight learning rate, and t denotes the iteration number.
Furthermore, these weights must satisfy the normalization condition

$$\sum_{i=1}^{k} w_i = 1$$

so the obtained weights are further normalized, as shown in the following formula:

$$w_i \leftarrow \frac{w_i}{\sum_{j=1}^{k} w_j}$$

where k is the number of tasks.
Then the locally updated parameters $\theta_i'$ are obtained through training on the query set, the loss $L_{T_i}(f_{\theta_i'})$ is obtained with the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameter $\theta$ of the global network is updated, as shown in the following formula:

$$\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i} w_i \, L_{T_i}\!\left(f_{\theta_i'}\right)$$

where $\beta$ denotes the learning rate of the external update. After multiple iterations, the value of the loss function keeps decreasing, the network model gradually converges, and a trained model $f_{\theta^{*}}$ is finally obtained.
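The dual gradient update can be sketched as follows in PyTorch (torch.func.functional_call requires PyTorch 2.x). The inner and outer updates follow the formulas above; the weight update is a reconstruction of the weighted-gradient mechanism (a gradient step on the weighted total loss with respect to each $w_i$, followed by renormalization), and plain gradient steps replace the Adam optimizer used in the experiments for the sake of clarity.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def task_loss(model, params, x, y):
    # binary cross-entropy of the model evaluated under a given parameter dict
    logits = functional_call(model, params, (x,)).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, y)

def meta_train_step(model, tasks, w, alpha, beta, eta):
    """One dual-gradient iteration. `tasks` is a list of ((xs, ys), (xq, yq))
    pairs; `w` is a tensor holding one outer-update weight per task."""
    params = dict(model.named_parameters())
    names, theta = list(params), list(params.values())
    q_losses = []
    for (xs, ys), (xq, yq) in tasks:
        # internal update: theta_i' = theta - alpha * grad L_Ti(f_theta) on S_i
        g = torch.autograd.grad(task_loss(model, params, xs, ys),
                                theta, create_graph=True)
        fast = {n: p - alpha * gi for n, p, gi in zip(names, theta, g)}
        # loss of the adapted model f_{theta_i'} on the query set Q_i
        q_losses.append(task_loss(model, fast, xq, yq))
    q_losses = torch.stack(q_losses)
    outer = (w * q_losses).sum()          # weighted total loss of the batch
    meta_g = torch.autograd.grad(outer, theta)
    with torch.no_grad():
        for p, gi in zip(theta, meta_g):
            p -= beta * gi                # external update of the global theta
        # weight update: d(outer)/dw_i equals the task loss L_i, so step the
        # weights down that gradient, then renormalize to sum to one
        w -= eta * q_losses.detach()
        w.clamp_(min=1e-8)
        w /= w.sum()
    return outer.detach()
```

The weights can be initialized uniformly, e.g. `w = torch.full((len(tasks),), 1.0 / len(tasks))`, so the first iteration reduces to the standard averaged MAML outer update.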
In the meta-test stage, m tasks are randomly sampled in order to verify the generalization ability of the model while avoiding accidental results.
Specifically, the sample data set comprises a meta-training set Dmeta-train, consisting of a sample set and a query set, and a meta-test set Dmeta-test, consisting of a support set and a test set. After training of the classification model is completed, the meta-test stage is started; it comprises a fine-tuning stage and a verification stage.
The fine-tuning stage comprises: when the model needs to adapt to a new specific task, the pre-trained model parameters $\theta^{*}$ and the sample data on the support set are used to fine-tune the model parameters. The aim of fine-tuning is to preserve the detection performance of the model on an attack type it has never seen by executing only a few iteration steps on a small number of samples, so that the model adapts quickly to the new task. The update is implemented as shown in the following formula,

$$\theta' = \theta^{*} - \alpha \nabla_{\theta^{*}} L_{T_i}\!\left(f_{\theta^{*}}; P_i\right)$$

where $P_i$ denotes the support set of the $i$-th task, $\alpha$ is the learning rate shared between different tasks in the internal update step, and $L_{T_i}(f_{\theta^{*}})$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta^{*}$.
The verification stage comprises: after the fine-tuning stage, a new model $f_{\theta'}$ parameterized by the group of parameters $\theta'$ is obtained; on the test set, the new model $f_{\theta'}$ is evaluated and the results are averaged to avoid chance outcomes.
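A sketch of this meta-test stage under the same assumptions as the earlier code: the pre-trained parameters $\theta^{*}$ are fine-tuned on the support set for a few gradient steps, and the adapted model is then scored on the test set. The number of fine-tuning steps (5 here) is an assumption, as the text only says a few iteration steps are executed.

```python
import copy
import torch
import torch.nn.functional as F

def fine_tune_and_eval(model, support, test, alpha, steps=5):
    """Meta-test for one task: adapt theta* on the support set P_i for a few
    gradient steps, then score the adapted model f_theta' on the test set T_i."""
    adapted = copy.deepcopy(model)          # keep theta* intact for the next task
    opt = torch.optim.SGD(adapted.parameters(), lr=alpha)
    xs, ys = support
    for _ in range(steps):                  # fine-tuning stage
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(adapted(xs).squeeze(-1), ys)
        loss.backward()
        opt.step()
    adapted.eval()                          # verification stage
    xt, yt = test
    with torch.no_grad():
        preds = (torch.sigmoid(adapted(xt).squeeze(-1)) > 0.5).float()
        return (preds == yt).float().mean().item()

# averaged over the m meta-test tasks to avoid chance results:
# acc = sum(fine_tune_and_eval(model, P, T, alpha) for P, T in test_tasks) / m
```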
The above-described method of the present application is specifically evaluated by experiments as follows.
The first part provides the experimental setup, hyper-parameters, performance indicators, and simulation environment. The second part is the experimental performance evaluation and analysis: the proposed method (abbreviated MCCML) is compared with reference methods, the effectiveness of each component is demonstrated through ablation experiments, and the experimental results are analyzed in detail to verify the performance of the method. The code in the experiments is implemented with the PyTorch framework.
Specifically, the experimental setup and the hyper-parameters include the following.
The optimal hyper-parameters used by the model, obtained through empirical rules and extensive experiments, are listed in Table 2. The external update performs a global optimization of the model, so the experiments set the value of β larger than α. In the training stage, the number of attack samples K in each task is set to 5; however, to avoid chance results in the model-test stage, the number of attack samples H per task is set to 15. In addition, once forward propagation is complete, the back-propagation procedure of small-sample training is similar to that of traditional supervised learning. Since the small-sample anomaly-detection task is formulated as a supervised two-class problem, there is no data-imbalance issue, and the loss function used during training is the binary cross-entropy. To train the proposed model better, the experiments update the network parameters with the Adam optimization method, based on stochastic gradient descent (SGD).
Table 2 Hyper-parameter settings (the table is reproduced as an image in the original publication)
Existing public data sets are generated manually in particular environments containing many normal and abnormal samples and are not suited to the small-sample problem. For small-sample learning in network intrusion detection, the task sets must be reconstructed according to the attack-type labels. Therefore, a small portion of samples is extracted from the existing public data set CICIDS2017 as the data source, packaged into tasks, and reassembled into several task sets containing the normal and specific attack samples required by the experiments. Finally, the five most typical attacks in the CICIDS2017 data set (DDoS, BruteForce, PortScan, Bot, and Web) are selected for the experiments. In addition, data preprocessing is a necessary step before model training, so preprocessing operations are applied to the data. As shown in Table 3, there are 5 groups of experiments; each group selects one attack to simulate detection of a truly unknown sample and three of the remaining four attack types for training, so each group comprises 4 parallel experiments. Each group of experiments is repeated several times and the average is taken as the final evaluation result, making the model evaluation as accurate as possible.
Table 3 Experimental grouping (the table is reproduced as an image in the original publication)
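Since the preprocessing steps are not spelled out, the following is a hedged sketch of one plausible preparation of a CICIDS2017 slice: filtering to BENIGN plus one attack class, removing infinite and missing values, min-max scaling, and binary label mapping. The column name "Label" and the use of scikit-learn are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess_cicids(df: pd.DataFrame, attack: str):
    """Keep BENIGN plus one attack class, drop inf/NaN rows, min-max scale the
    numeric features, and map labels to 0 (normal) / 1 (attack)."""
    df = df[df["Label"].isin(["BENIGN", attack])]
    feats = (df.drop(columns=["Label"])
               .select_dtypes(include=[np.number])      # numeric features only
               .replace([np.inf, -np.inf], np.nan)
               .dropna(axis=0))
    y = (df.loc[feats.index, "Label"] == attack).astype(int).to_numpy()
    X = MinMaxScaler().fit_transform(feats)              # scale to [0, 1]
    return X, y
```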
The following are experimental performance evaluations and analyses.
The performance of the proposed MAML-based method for detecting new attacks is verified below. The number of iterations is set by observing the variation of the training loss. Fig. 4 shows the loss curve of the model over 100 iterations. As the figure shows, with continued training of the neural network the loss function converges quickly in the early iterations and stays at a relatively stable level after 60 iterations, with slight oscillations. The number of iterations (episodes) is therefore set to 100.
To evaluate the performance of the proposed MCCML method and its fitting and generalization ability, it is compared with widely used reference classifiers, including traditional machine learning algorithms: K-Nearest Neighbors (KNN) and Random Forest (RF); and ensemble learning algorithms: AdaBoost, Bagging, and Gradient Boosting Decision Tree (GBDT). The reference methods also include classical deep learning algorithms for comparison: MLP and multi-channel CNN (the same base network structure as in MCCML, trained with the traditional supervised learning procedure). All the above methods are evaluated on the same reference data set to achieve a fair comparison of detection performance on the new task.
Table 4 lists the performance of the proposed method and the reference methods in identifying each unknown attack category, including accuracy, recall, and the F1 index; the bold entries are the best detection results for each tested attack category. The last three columns of Table 4 can be viewed as a set of ablation experiments demonstrating, by comparison, the effectiveness of the three components: the multi-channel CNN, the meta-learning framework, and the weighted gradient update. From Table 4 it can be seen that: (1) compared with the fully connected method, the multi-channel convolution improves every index by 3% on average; (2) compared with the traditional network training mode, the meta-learning training for small-sample learning proposed here improves overall performance by 6-7%; (3) in small-sample scenarios some shallow learning methods even outperform deep learning, because deep learning depends on large sample sets and too little training data causes overfitting and poor performance; (4) the average gradient-update rule of MAML can bias the initial model too strongly toward certain existing tasks and hinder adaptation to new tasks, whereas weighting the gradient updates makes the model more general and reduces over-specialization on particular tasks. In summary, compared with traditional machine learning and deep neural networks, the proposed MCCML method delivers a better detection effect: it is generally superior to the reference methods on all indexes, and even its worst detection result is comparable to the best result among the reference methods.
Table 4 Detection performance of the proposed and reference methods (the table is reproduced as an image in the original publication)
To highlight the training efficiency of the proposed model, fig. 5 compares the running time per iteration of the different models. The experimental results show that the computation speed of the proposed method is clearly higher than that of pure deep learning methods. Time consumption is one of the weaknesses of deep learning; training with the meta-learning idea achieves faster detection and higher performance. The running time of each iteration of the proposed method is 0.652 s, comparable to the training efficiency of machine learning. Since small-sample learning is a relatively new topic in the field of network intrusion detection, little related work is available for comparison and no reference sample set suitable for testing exists. Therefore, the CICIDS2017 open-source data set is used to reconstruct a detection task set dedicated to small-sample learning, and several related studies that use the CICIDS2017 data set are selected for a benchmark comparison. Classifying abnormal traffic as normal is far more harmful than classifying normal traffic as abnormal, so the comparison focuses on recall, the metric of greatest interest for network intrusion prevention systems. Table 5 compares the proposed MCCML algorithm with the Siamese, AE-CGAN-RF, and ANID methods.
Table 5 Recall comparison with Siamese, AE-CGAN-RF, and ANID (the table is reproduced as an image in the original publication)
It should be noted that not all reference models use the same data set size. Neither AE-CGAN-RF nor ANID is a small-sample detection method; both require a large number of samples for training. The experimental results show that the MCCML method achieves competitive performance on new tasks containing unknown attacks, with a high detection rate on new attack samples, 95.22% on average, outperforming all other reference detection methods. In addition, compared with the similar small-sample method Siamese, MAML outperforms the Siamese network model in the field of network anomaly detection.
On the basis, an embodiment of the present application further provides an intrusion detection identification method, including:
acquiring intrusion data to be identified; calling the intrusion detection model obtained with the above training method; and processing the intrusion data to be identified to obtain a processing result.
On the basis, an embodiment of the present application further provides an intrusion detection and identification device, including:
a data acquisition module used for acquiring intrusion data to be identified, and a data processing module used for calling an intrusion detection model obtained with the above training method and processing the intrusion data to be identified to obtain a processing result.
On the basis, the embodiment of the present application further provides a computer storage medium, where at least one executable instruction is stored in the computer storage medium, and the executable instruction causes a processor to execute the operation corresponding to the method.
On the basis of the foregoing, an embodiment of the present application further provides a computer apparatus, including: a processor, a memory, a communication interface, and a communication bus, through which the processor, the memory, and the communication interface communicate with one another; the memory stores at least one executable instruction that causes the processor to perform the operations corresponding to the method.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for training an intrusion detection model, comprising:
acquiring a sample data set,
establishing a classification model, and
training the classification model with a MAML-based meta-training method.
2. The method of claim 1, wherein the classification model is a multi-channel CNN model.
3. The method of claim 2, wherein the multi-channel CNN model comprises:
an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer, a LeakyReLU activation function, a two-dimensional max-pooling layer, and a Dropout layer,
a splicing layer for concatenating the local features extracted from the plurality of different channels into a new feature vector,
and a fully connected layer and an output layer arranged in turn after the splicing layer.
4. The method for training an intrusion detection model according to claim 3, wherein the probability distribution of the label y at the output layer is calculated by a Softmax activation function.
5. The method for training an intrusion detection model according to claim 1, wherein the sample data set comprises a meta-training set Dmeta-train and a meta-test set Dmeta-test, the meta-training set Dmeta-train comprising a sample set and a query set and the meta-test set Dmeta-test comprising a support set and a test set,
after the training of the classification model is finished, a meta-test stage is entered, the meta-test stage comprising a fine-tuning stage and a verification stage,
the fine-tuning stage comprises: when the model needs to adapt to a new specific task, the pre-trained model parameters $\theta^{*}$ and the sample data on the support set are used to fine-tune the model parameters, as shown in the following formula:

$$\theta' = \theta^{*} - \alpha \nabla_{\theta^{*}} L_{T_i}\!\left(f_{\theta^{*}}; P_i\right)$$

where $P_i$ denotes the support set of the $i$-th task, $\alpha$ is the learning rate shared between the different tasks in the internal update step, and $L_{T_i}(f_{\theta^{*}})$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta^{*}$,
the verification stage comprises: after the fine-tuning stage, a new model $f_{\theta'}$ parameterized by the group of parameters $\theta'$ is obtained, and the new model $f_{\theta'}$ is evaluated on the test set, the results being averaged to avoid chance outcomes.
6. The method for training an intrusion detection model according to claim 1, wherein training the classification model with the MAML-based meta-training method specifically comprises: training based on a dual gradient update, including an internal update and an external update,
in the internal update stage, the training loss $L_{T_i}(f_\theta)$ on each task $T_i$ is first calculated using the sample-set data $S_i$, and the local parameter $\theta$ of each task $T_i$ is updated along the direction of gradient descent, as shown in the following formula:

$$\theta_i' = \theta - \alpha \nabla_{\theta} L_{T_i}\!\left(f_\theta\right)$$

where $\alpha$ is the learning rate shared between different tasks in the internal update step and $L_{T_i}(f_\theta)$ denotes the training loss on task $T_i$ of the model with initial parameters $\theta$; a gradient update is applied through this loss to the initial parameters $\theta$ of the internal model corresponding to task $T_i$, yielding a weakly supervised model with updated, task-preferring parameters $\theta_i'$,
in the external update stage, a weighted gradient update mechanism is adopted to minimize the deviation of each specific task from the initial model; specifically, a gradient-update weight $w_i$ is set for each task $T_i$, and the weights are updated as follows:

$$w_i^{t+1} = w_i^{t} - \eta \, \nabla_{w_i} L^{t}$$

where $L^{t}$ denotes the total loss after one iteration, $\eta$ denotes the weight learning rate, and $t$ denotes the iteration number,
furthermore, these weights must satisfy the normalization condition $\sum_{i=1}^{k} w_i = 1$, so the obtained weights are further normalized, as shown in the following formula:

$$w_i \leftarrow \frac{w_i}{\sum_{j=1}^{k} w_j}$$

then the locally updated parameters $\theta_i'$ are obtained through training on the query set, the loss $L_{T_i}(f_{\theta_i'})$ is obtained with the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameter $\theta$ of the global network is updated, as shown in the following formula:

$$\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i} w_i \, L_{T_i}\!\left(f_{\theta_i'}\right)$$

where $\beta$ denotes the learning rate of the external update,
after multiple iterations, the value of the loss function keeps decreasing, the network model gradually converges, and a trained model $f_{\theta^{*}}$ is finally obtained.
7. An intrusion detection identification method, comprising:
acquiring intrusion data to be identified;
invoking an intrusion detection model obtained with the method for training an intrusion detection model according to any one of claims 1 to 6, and processing the intrusion data to be identified to obtain a processing result.
8. An intrusion detection recognition device, comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling an intrusion detection model obtained by adopting the training method of the intrusion detection model according to any one of claims 1 to 6, and processing the intrusion data to be identified so as to obtain a processing result.
9. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the method of any one of claims 1 to 7.
10. A computer device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus, and the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the method according to any one of claims 1 to 7.
CN202211546247.4A (filed 2022-12-05) Training method, recognition method and device for intrusion detection model; granted as CN115563610B (Active)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211546247.4A 2022-12-05 2022-12-05 Training method, recognition method and device for intrusion detection model (granted as CN115563610B)

Publications (2)

Publication Number Publication Date
CN115563610A (application publication) 2023-01-03
CN115563610B (granted publication) 2023-05-30

Family

ID=84770287

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365659A (en) * 2019-06-26 2019-10-22 浙江大学 A kind of building method of network invasion monitoring data set under small sample scene
CN110808945A (en) * 2019-09-11 2020-02-18 浙江大学 Network intrusion detection method in small sample scene based on meta-learning
CN113037730A (en) * 2021-02-27 2021-06-25 中国人民解放军战略支援部队信息工程大学 Network encryption traffic classification method and system based on multi-feature learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618353A (en) * 2022-10-21 2023-01-17 北京珞安科技有限责任公司 Identification system and method for industrial production safety
CN115618353B (en) * 2022-10-21 2024-01-23 北京珞安科技有限责任公司 Industrial production safety identification system and method
CN116389175A (en) * 2023-06-07 2023-07-04 鹏城实验室 Flow data detection method, training method, system, equipment and medium
CN116389175B (en) * 2023-06-07 2023-08-22 鹏城实验室 Flow data detection method, training method, system, equipment and medium
CN116821907A (en) * 2023-06-29 2023-09-29 哈尔滨工业大学 Drop-MAML-based small sample learning intrusion detection method
CN116821907B (en) * 2023-06-29 2024-02-02 哈尔滨工业大学 Drop-MAML-based small sample learning intrusion detection method

Also Published As

Publication number Publication date
CN115563610B (en) 2023-05-30


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant