CN115563610A - Method and device for training and identifying intrusion detection model - Google Patents
Method and device for training and identifying intrusion detection model
- Publication number
- CN115563610A CN115563610A CN202211546247.4A CN202211546247A CN115563610A CN 115563610 A CN115563610 A CN 115563610A CN 202211546247 A CN202211546247 A CN 202211546247A CN 115563610 A CN115563610 A CN 115563610A
- Authority
- CN
- China
- Prior art keywords
- model
- training
- layer
- task
- meta
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000012549 training Methods 0.000 title claims abstract description 82
- 238000000034 method Methods 0.000 title claims abstract description 72
- 238000001514 detection method Methods 0.000 title claims abstract description 55
- 238000013145 classification model Methods 0.000 claims abstract description 19
- 230000004913 activation Effects 0.000 claims abstract description 9
- 238000011176 pooling Methods 0.000 claims abstract description 6
- 238000012360 testing method Methods 0.000 claims description 34
- 238000012545 processing Methods 0.000 claims description 17
- 230000006870 function Effects 0.000 claims description 14
- 238000004891 communication Methods 0.000 claims description 10
- 238000011156 evaluation Methods 0.000 claims description 7
- 238000010606 normalization Methods 0.000 claims description 6
- 238000012795 verification Methods 0.000 claims description 6
- 230000009977 dual effect Effects 0.000 claims description 3
- 230000007246 mechanism Effects 0.000 claims description 3
- 238000013527 convolutional neural network Methods 0.000 abstract description 14
- 238000013528 artificial neural network Methods 0.000 abstract description 6
- 239000000523 sample Substances 0.000 description 41
- 238000002474 experimental method Methods 0.000 description 16
- 230000000875 corresponding effect Effects 0.000 description 9
- 238000013135 deep learning Methods 0.000 description 6
- 230000008569 process Effects 0.000 description 6
- 238000004422 calculation algorithm Methods 0.000 description 5
- 238000005070 sampling Methods 0.000 description 5
- 230000002159 abnormal effect Effects 0.000 description 3
- 238000010801 machine learning Methods 0.000 description 3
- 238000005457 optimization Methods 0.000 description 3
- 238000007430 reference method Methods 0.000 description 3
- 238000002679 ablation Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 2
- 210000002569 neuron Anatomy 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 238000007637 random forest analysis Methods 0.000 description 2
- 230000005856 abnormality Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 230000002860 competitive effect Effects 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 238000003066 decision tree Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000001066 destructive effect Effects 0.000 description 1
- 230000001627 detrimental effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000010355 oscillation Effects 0.000 description 1
- 230000002265 prevention Effects 0.000 description 1
- 239000013074 reference sample Substances 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a training method, an identification method and a device for an intrusion detection model. The method specifically comprises the following steps: acquiring a sample data set, establishing a classification model, and training the classification model by a meta-training method based on MAML (Model-Agnostic Meta-Learning). The classification model is a multi-channel CNN model. The multi-channel CNN model comprises an input layer and a plurality of channels, where each channel defines a Block; each Block comprises a two-dimensional convolutional layer with a LeakyReLU activation function, a 2-dimensional max pooling layer and a Dropout layer. The model further comprises a splicing layer, which concatenates the local features extracted from the different channels into a new feature vector; a fully connected layer and an output layer follow the splicing layer in sequence. This detection method, based on a deep neural network and the meta-learning training idea, can well alleviate the problem that a model cannot be trained when attack sample data are insufficient.
Description
Technical Field
The invention relates to the field of intrusion detection, in particular to a training method, an identification method and a device of an intrusion detection model.
Background
For certain specific types of attacks, most deep learning methods can accurately identify the cyber attack types they were trained on, provided that massive data and sufficient computational resources are available. However, the current internet environment keeps changing, and new attack modes emerge endlessly. For example, a zero-day attack exploits a security vulnerability for which no patch exists, as soon as the vulnerability is discovered, to mount a highly destructive cyber attack on a system or software application. A deep model must be retrained to detect a new attack, which requires many samples and is very time-consuming. Yet security agencies often cannot obtain enough attack instances in a short time to supply model training. This leads to the problem that the model cannot be trained because the number of samples is insufficient.
Disclosure of Invention
In view of the above, it is necessary to provide a method for training an intrusion detection model that solves the existing problems. The method alleviates the problem that a model cannot be trained when attack sample data are insufficient: it trains a classifier with good generalization ability from limited samples and realizes quick learning and detection of new attack samples.
A method for training an intrusion detection model, comprising:
acquiring a sample data set,
establishing a classification model,
and training the classification model by a meta-training method based on Model-Agnostic Meta-Learning (MAML).
The detection method based on the deep neural network and the meta-learning training thought can well solve the problem that a model cannot be trained due to insufficient attack sample data.
In one embodiment, the classification model is a multi-channel CNN model.
In one embodiment, the multi-channel CNN model includes:
an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer, an activation function selection LeakyReLU, a 2-dimensional max pooling layer, and a Dropout layer,
the system also comprises a splicing layer which is used for connecting the local features extracted from a plurality of different channels to form a new feature vector,
and a full connecting layer and an output layer are sequentially arranged behind the splicing layer.
In one embodiment, the probability distribution of the label y in the output layer is calculated by a Softmax activation function.
In one embodiment, the sample data set comprises a meta-training set Dmeta-train, which comprises a sample set and a query set, and a meta-test set Dmeta-test, which comprises a support set and a test set.

After the classification model is trained, a meta-test stage is entered; the meta-test stage comprises a fine-tuning stage and a verification stage.

The fine-tuning stage comprises: when the model needs to adapt to a new specific task, fine-tuning the model parameters using the pre-trained model parameters $\theta^{*}$ and the sample data on the support set, as shown in the following formula:

$$\theta_i' = \theta^{*} - \alpha \nabla_{\theta^{*}} L_{T_i}(f_{\theta^{*}}; P_i)$$

where $P_i$ denotes the support set of the ith task, $\alpha$ is the learning rate shared between different tasks in the internal update step, and $L_{T_i}(f_{\theta^{*}})$ represents the training loss value on task $T_i$ of the model with initial parameters $\theta^{*}$.

The verification stage comprises: after the fine-tuning stage, a new model $f_{\theta_i'}$ parameterized by $\theta_i'$ is obtained; on the test set, the new model $f_{\theta_i'}$ is evaluated and the results are averaged to avoid chance outcomes.
In one embodiment, training the classification model by the MAML-based meta-training method specifically comprises: training based on dual gradient updates, including an internal update and an external update.

In the internal update phase, the training loss value on each task $T_i$ is first calculated using the sample set data $S_i$, and the local parameters $\theta$ of each task $T_i$ are updated along the direction of gradient descent:

$$\theta_i' = \theta - \alpha \nabla_{\theta} L_{T_i}(f_{\theta})$$

where $\alpha$ is the learning rate shared between different tasks in the internal update step and $L_{T_i}(f_{\theta})$ represents the training loss value on task $T_i$ of the model with initial parameters $\theta$. Through this loss value, a gradient update is applied to the initial parameters $\theta$ of the internal model corresponding to task $T_i$, yielding a weakly supervised model with updated, task-preferring parameters $\theta_i'$.

In the external update phase, a weighted gradient-update mechanism is adopted to minimize the deviation of each specific task from the initial model. Specifically, a gradient-update weight $w_i$ is set for each task $T_i$, and the weights are updated as follows:

$$w_i^{t+1} = w_i^{t} - \eta \nabla_{w_i} L_{total}^{t}$$

where $L_{total}^{t} = \sum_i w_i^{t} L_{T_i}(f_{\theta_i'})$ represents the total loss value after one iteration, $\eta$ represents the weighted learning rate, and $t$ represents the number of iterations.

The updated weights are then further normalized, as shown in the following formula:

$$w_i \leftarrow \frac{w_i}{\sum_{j=1}^{n} w_j}$$

Then, the locally updated parameters $\theta_i'$ are obtained through training, the loss value $L_{T_i}(f_{\theta_i'})$ is obtained using the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameters $\theta$ of the global network are updated:

$$\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{T_i} w_i\, L_{T_i}(f_{\theta_i'})$$

where $\beta$ is the learning rate of the external update. After repeated iterations, the value of the loss function continually decreases, the network model gradually converges, and finally a trained model $f_{\theta^{*}}$ is obtained.
An intrusion detection identification method comprising:
acquiring intrusion data to be identified;
and calling the intrusion detection model obtained by adopting the training method of the intrusion detection model, and processing the intrusion data to be identified to obtain a processing result.
An intrusion detection identification device comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling the intrusion detection model obtained by adopting the training method of the intrusion detection model and processing the intrusion data to be identified so as to obtain a processing result.
A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to execute operations corresponding to the method.
A computer apparatus, comprising: the processor, the memory, the communication interface and the communication bus are used for completing mutual communication, the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the corresponding operation of the method.
Drawings
Fig. 1 is a flowchart of a method for training an intrusion detection model according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a multi-channel CNN model of an embodiment of the present application.
Fig. 3 is a flowchart of a meta-training phase of MAML-based network anomaly detection according to an embodiment of the present application.
FIG. 4 is a Loss plot of a model during training for an embodiment of the present application.
FIG. 5 is a comparison of run times for different models.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In view of the above, it is necessary to provide a method for training an intrusion detection model to solve the existing problems. The method can better solve the problem that the model cannot be trained due to insufficient attack sample data. The method trains a classifier with good generalization capability by using limited samples, and realizes rapid learning and detection of new attack samples.
As shown in fig. 1, an embodiment of the present application provides a method for training an intrusion detection model, the method comprising: acquiring a sample data set, establishing a classification model, and training the classification model by a MAML-based meta-training method.
In one embodiment, the classification model is a multi-channel CNN model, which the present application optimizes. Specifically, as shown in fig. 2, the multi-channel CNN model comprises an input layer and a plurality of channels. Each channel defines a Block, and each Block comprises a two-dimensional convolutional layer (Conv2D), a LeakyReLU activation function, a 2-dimensional max pooling layer (MaxPooling2D), and a Dropout layer with the dropout rate set to 0.2. The model further comprises a splicing layer, which concatenates the local features extracted from the different channels into a new feature vector; a fully connected layer and an output layer follow the splicing layer in sequence. In fig. 2 there are two fully connected layers, FC(32,8) and FC(1), where the parameters in parentheses are the output dimensions, and the splicing layer is the concatenation (Concatenate) block in fig. 2.
The optimized multi-channel CNN model of the present application is described in detail below by way of example.
First, assume that sample x is a one-dimensional vector containing d features, defined as follows:

$$x = (c_1, c_2, \ldots, c_d)$$

where $c_i$ represents the ith feature of the sample. To fit the input convention of two-dimensional convolution, the dimensions of every sample are reshaped to 1 × d × 1, representing the height, the width, and the number of channels, respectively.

A Block is defined for each channel, and each Block comprises a convolutional layer with a different convolution kernel size. The network contains several parallel Blocks; the input data is fed into each of them, feature detection is performed at different positions of the input, and local features are extracted from the different spatial channels of the multi-channel vector. Based on experimental study, three parallel convolutional layers are used here, with the window sizes of the convolution kernels set to 1 × 3, 1 × 4 and 1 × 5 and a stride of 1 × 1, as shown in the following equation:

$$h_j = \sigma\!\left(w_j \ast x + b_j\right), \quad j = 1, 2, 3$$

where d is the dimension of the input x, the $c_i$ are its features, and $w_j$ and $b_j$ represent the weight matrix and the bias of the jth channel's convolution operation, respectively; $k_j$ denotes the convolution kernel size. σ is the LeakyReLU activation function, which accelerates learning convergence by mapping nonlinearity into the data. Unlike ReLU, LeakyReLU assigns a non-zero slope to all negative values, which helps avoid overfitting and solves the dying-ReLU problem in which some neurons of the network are never updated. After the three independent convolution operations, a max pooling layer is applied to the output of each convolutional layer to reduce the complexity of the network:

$$p_j = \operatorname{maxpool}(h_j)$$
By down-sampling the feature map of the previous layer, the max pooling layer filters out weakly correlated features and passes the most strongly correlated information to the next layer, thereby effectively reducing overfitting.
Next, in the splicing layer, the local features extracted from the three different channels are concatenated to form a new feature vector:

$$v = F\!\left(C(p_1, p_2, p_3)\right)$$

where C denotes the Concatenation operation and F denotes the Flatten operation, which adjusts the data dimension to one dimension to fit the input of the fully connected layer. The fully connected part combines the extracted features to make the final decision. It comprises two hidden layers with 32 and 8 neurons respectively, using the LeakyReLU activation function to enhance the learning capacity of the network, so that the model can learn from the feature-map space globally. The probability distribution of the label y in the output layer is calculated by the Softmax activation function:

$$P(y_i \mid x) = \frac{e^{y_i}}{\sum_{k=1}^{K} e^{y_k}}$$

where $y_i$ represents the ith output label value. In the experimental setting, K = 2. The parameter settings of the network model are detailed in table 1.
Table 1 parameter setting table
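As a concrete reading of this architecture, the following is a minimal PyTorch sketch. The kernel widths, dropout rate, and FC(32,8)/FC(1) head follow the text; the filter count, class name, and the single-logit binary output are illustrative assumptions (the exact Table 1 parameters are not reproduced in this text).

```python
import torch
import torch.nn as nn

class MultiChannelCNN(nn.Module):
    """Three parallel Conv2D Blocks with different kernel widths, spliced together."""

    def __init__(self, n_filters: int = 16, kernel_widths=(3, 4, 5)):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, n_filters, kernel_size=(1, k), stride=(1, 1)),  # Conv2D
                nn.LeakyReLU(),                                              # activation
                nn.MaxPool2d(kernel_size=(1, 2)),                            # MaxPooling2D
                nn.Dropout(p=0.2),                                           # dropout rate 0.2
                nn.Flatten(),
            )
            for k in kernel_widths
        ])
        self.head = nn.Sequential(
            nn.LazyLinear(32), nn.LeakyReLU(),   # FC(32); input size inferred on first call
            nn.Linear(32, 8), nn.LeakyReLU(),    # FC(8)
            nn.Linear(8, 1),                     # FC(1): one logit for the binary task
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d) feature vectors, reshaped to 1 x d x 1 as described in the text
        x = x.view(x.size(0), 1, 1, -1)
        # splicing layer: concatenate the local features from all channels
        spliced = torch.cat([block(x) for block in self.blocks], dim=1)
        return self.head(spliced)

# usage sketch (78 is a placeholder feature count):
# logits = MultiChannelCNN()(torch.randn(4, 78))
```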
The multi-channel CNN model is trained in a way designed specifically for small-sample learning, which differs from traditional supervised learning. Instead of simply dividing the entire data set into a training set and a test set, a meta-training set containing multiple tasks is generated from the source data set, so that each task includes a sample set and a query set that model the support set and test set of the meta-test set. The following shows how small-sample tasks are generated from the raw data set.
Given a data set comprising normal samples and N attack-type samples:

$$D = \{(x_i, y_i)\}, \quad y_i \in \{0, 1, \ldots, N\}$$

where 0 represents the normal type and the other values represent N different types of attacks, the data set is divided into N + 1 subsets by label category:

$$D_t = \{(x_i, y_i) \in D \mid y_i = t\}, \quad t = 0, 1, \ldots, N$$

where $D_t$ is the set of all samples $(x_i, y_i)$ with $y_i = t$. Meta-learning requires the neural network to handle tasks it has never seen, so an attack category must be selected to simulate the small-sample scenario of a new attack in real life. For convenience of description, attack category N is selected as the new network attack category and is excluded from the training process; the remaining N − 1 attack classes are known attacks used for training.

First, an attack sample set $D_r$ is randomly selected as the source of attack data in the task set, where $r \in \{1, \ldots, N-1\}$. Next, K samples each are randomly drawn from the normal sample set $D_0$ and the attack sample set $D_r$ to form the sample set of the task:

$$r = \mathrm{Random}(1, N-1), \qquad S_i = \mathrm{Sample}(D_0, K) \cup \mathrm{Sample}(D_r, K)$$

where $\mathrm{Random}$ generates a random value and $\mathrm{Sample}(D, K)$ randomly draws K samples from data set D. The query set $Q_i$ is sampled in the same way as the sample set, except that it contains H normal samples and H samples of attack class r:

$$Q_i = \mathrm{Sample}(D_0, H) \cup \mathrm{Sample}(D_r, H)$$

Here $S_i$ and $Q_i$ represent the sample set and query set of each task respectively, and it is ensured that they contain no duplicate samples, i.e. $S_i \cap Q_i = \varnothing$. Each generated task thus includes 2K samples for training and 2H samples for validation. This process is repeated n times to construct n task sets for training. The task-sampling steps of the meta-test set are the same as those of the meta-training set — represented by a support set P and a test set T respectively — except that the attack samples are selected from the specific held-out subset $D_N$, repeated m times. Therefore, n + m tasks are generated in total: n tasks serve as the meta-training set and m tasks as the meta-test set. Across the tasks, the sample sets contain 2K × n samples, the query sets 2H × n samples, the support sets 2K × m samples, and the test sets 2H × m samples. A code sketch of this construction follows below.
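The task construction just described can be sketched in a few lines of Python; the function names and the subsets mapping are illustrative assumptions, not taken from the patent.

```python
import random

def make_task(subsets, attack_class, K=5, H=15):
    """Build one task: a 2K-sample training set and a 2H-sample query set.

    subsets[t] is the list of samples with label t; label 0 is the normal class.
    Drawing K + H items in one call guarantees S_i and Q_i share no samples.
    """
    normal = random.sample(subsets[0], K + H)
    attack = random.sample(subsets[attack_class], K + H)
    sample_set = normal[:K] + attack[:K]   # S_i: 2K samples for training
    query_set = normal[K:] + attack[K:]    # Q_i: 2H samples for validation
    return sample_set, query_set

def make_task_sets(subsets, known_attacks, new_attack, n, m):
    # n meta-training tasks over the known attacks,
    # m meta-test tasks drawing attacks only from the held-out class
    meta_train = [make_task(subsets, random.choice(known_attacks)) for _ in range(n)]
    meta_test = [make_task(subsets, new_attack) for _ in range(m)]
    return meta_train, meta_test
```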
After the task sets are generated, they are input into the optimized multi-channel CNN network for training. Unlike the training mode of traditional supervised learning, small-sample classification with the MAML framework requires two stages: a meta-training phase and a meta-test phase. The basic idea is to learn, from randomly initialized parameters θ and the distribution of specific tasks, an initialization of the multi-channel model that does not necessarily perform best on the different classes of data provided during the meta-training phase, but that can quickly adapt to a new task containing an unknown attack.
In the meta-training phase, training is based on dual gradient updates and comprises two modules: the internal update module and the external update module, as shown in fig. 3.
In the internal update phase, the training loss value on each task $T_i$ is first calculated using the sample set data $S_i$, and the local parameters $\theta$ of each task $T_i$ are updated along the direction of gradient descent:

$$\theta_i' = \theta - \alpha \nabla_{\theta} L_{T_i}(f_{\theta})$$

where $\alpha$ is the learning rate shared between different tasks in the internal update step and $L_{T_i}(f_{\theta})$ represents the training loss value on task $T_i$ of the model with initial parameters $\theta$. Through this loss value, a gradient update is applied to the initial parameters $\theta$ of the internal model corresponding to task $T_i$, yielding a weakly supervised model with updated, task-preferring parameters $\theta_i'$ that detects the specific attack in the corresponding task well.
In the external update phase, a weighted gradient-update mechanism is adopted to minimize the deviation of each specific task from the initial model. Specifically, a gradient-update weight $w_i$ is set for each task $T_i$; the goal of the weight update is to set $w_i$ to the value that minimizes the objective in the next iteration t. Learning the weights automatically keeps the model out of local optima, which mitigates overfitting and makes convergence more stable. The weight update is as follows:

$$w_i^{t+1} = w_i^{t} - \eta \nabla_{w_i} L_{total}^{t}$$

where $L_{total}^{t} = \sum_i w_i^{t} L_{T_i}(f_{\theta_i'})$ represents the total loss value after one iteration, $\eta$ represents the weighted learning rate, and $t$ represents the number of iterations.
The updated weights are then further normalized, as shown in the following formula:

$$w_i \leftarrow \frac{w_i}{\sum_{j=1}^{n} w_j}$$
Then, the locally updated parameters $\theta_i'$ are obtained through training, the loss value $L_{T_i}(f_{\theta_i'})$ is obtained using the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameters $\theta$ of the global network are updated:

$$\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{T_i} w_i\, L_{T_i}(f_{\theta_i'})$$

where $\beta$ is the learning rate of the external update. After multiple iterations, the value of the loss function continually decreases, the network model gradually converges, and finally a trained model $f_{\theta^{*}}$ is obtained.
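Putting the two update levels together, one meta-training iteration can be condensed into the following PyTorch sketch. It assumes the model and task sampler sketched earlier (the model's lazy layers must be materialized by one forward pass beforehand), uses `torch.func.functional_call` (PyTorch ≥ 2.0) to evaluate the query loss under the adapted parameters $\theta_i'$, and exploits the fact that for $L_{total} = \sum_i w_i L_i$ the gradient $\nabla_{w_i} L_{total}$ is simply $L_i$. The plain gradient step on θ stands in for the Adam update used in the experiments, and the learning rates are placeholders that merely respect β > α.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def maml_step(model, tasks, weights, alpha=1e-3, beta=1e-2, eta=1e-3):
    """One dual-gradient iteration: per-task inner adaptation, weighted outer update.

    tasks   : list of (sample_x, sample_y, query_x, query_y) tensors
    weights : 1-D tensor of per-task weights w_i, updated in place
    """
    names = [n for n, _ in model.named_parameters()]
    task_losses = []
    for sx, sy, qx, qy in tasks:
        # internal update: theta'_i = theta - alpha * grad of the sample-set loss
        inner_loss = F.binary_cross_entropy_with_logits(model(sx).squeeze(-1), sy)
        grads = torch.autograd.grad(inner_loss, model.parameters(), create_graph=True)
        adapted = {n: p - alpha * g
                   for n, p, g in zip(names, model.parameters(), grads)}
        # query-set loss L_i evaluated under the adapted parameters theta'_i
        q_logits = functional_call(model, adapted, (qx,))
        task_losses.append(F.binary_cross_entropy_with_logits(q_logits.squeeze(-1), qy))
    with torch.no_grad():
        # weight update: dL_total/dw_i = L_i, followed by the normalization step
        weights -= eta * torch.stack([l.detach() for l in task_losses])
        weights /= weights.sum()
    # external update: the weighted total loss drives the global parameters theta
    total_loss = sum(w * l for w, l in zip(weights, task_losses))
    meta_grads = torch.autograd.grad(total_loss, model.parameters())
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= beta * g
    return total_loss.item()
```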
In the meta-test stage, m tasks are randomly sampled to verify the generalization capability of the model and to avoid chance results.
Specifically, the sample data set comprises a meta-training set Dmeta-train and a meta-test set Dmeta-test; the meta-training set Dmeta-train comprises a sample set and a query set, and the meta-test set Dmeta-test comprises a support set and a test set. After training of the classification model is completed, the meta-test stage begins, comprising a fine-tuning stage and a verification stage.
The fine-tuning stage comprises: when the model needs to adapt to a new specific task, the pre-trained model parameters $\theta^{*}$ and the sample data on the support set are used to fine-tune the model parameters. The purpose of fine-tuning is to secure the model's detection performance on an attack type it has never seen by executing a few iteration steps on a small number of samples of that type, so that the model adapts quickly to the new task. The concrete implementation is as follows:

$$\theta_i' = \theta^{*} - \alpha \nabla_{\theta^{*}} L_{T_i}(f_{\theta^{*}}; P_i)$$

where $P_i$ denotes the support set of the ith task, $\alpha$ is the learning rate shared between different tasks in the internal update step, and $L_{T_i}(f_{\theta^{*}})$ represents the training loss value on task $T_i$ of the model with initial parameters $\theta^{*}$.
The verification stage comprises: after the fine-tuning stage, a new model $f_{\theta_i'}$ parameterized by $\theta_i'$ is obtained. The new model $f_{\theta_i'}$ is evaluated on the test set, and the results are averaged to avoid chance outcomes.
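The meta-test phase reduces to a few gradient steps per support set followed by an averaged evaluation. A minimal sketch, assuming tasks arrive as (support_x, support_y, test_x, test_y) tensors; the step count and the plain-accuracy metric are illustrative assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def meta_test(model, tasks, alpha=1e-3, steps=5):
    """Fine-tune theta* on each support set, evaluate on the test set, average."""
    scores = []
    for support_x, support_y, test_x, test_y in tasks:
        clone = copy.deepcopy(model)                 # start each task from theta*
        opt = torch.optim.SGD(clone.parameters(), lr=alpha)
        for _ in range(steps):                       # a few fine-tuning iterations
            opt.zero_grad()
            F.binary_cross_entropy_with_logits(
                clone(support_x).squeeze(-1), support_y).backward()
            opt.step()
        clone.eval()
        with torch.no_grad():                        # evaluate the adapted model
            preds = (torch.sigmoid(clone(test_x).squeeze(-1)) > 0.5).float()
            scores.append((preds == test_y).float().mean().item())
    return sum(scores) / len(scores)                 # average over the m tasks
```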
The above-described method of the present application is specifically evaluated by experiments as follows.
The first part covers the experimental setup and hyper-parameters, performance indicators, and simulation environment. The second part is the experimental performance evaluation and analysis: the method, abbreviated MCCML, is compared with reference methods, the effectiveness of each component is demonstrated through ablation experiments, and the experimental results are analyzed in detail to verify the performance of the method. The code implementation framework used in the experiments is PyTorch.
Specifically, the experimental setup and the hyper-parameters include the following.
The optimal hyper-parameters used by the model of the present application, found through empirical rules and extensive experiments, are listed in table 2. The external update performs a global optimization of the model, so the experiments set the value of β larger than the value of α. In the training phase, the number of attack samples K in each task is set to 5; however, to avoid chance results in the model testing phase, the number of attack samples H per task is set to 15 there. In addition, after forward propagation completes, the back-propagation process of small-sample training is similar to that of traditional supervised learning. Since the small-sample anomaly-detection task is formulated as a supervised binary classification problem, there is no data-imbalance problem, and the loss function used during training is the binary cross-entropy function. To train the proposed model better, the experiments update the network parameters with the Adam optimization method, a variant of stochastic gradient descent (SGD).
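The stated loss and optimizer choices reduce to a few lines; a sketch assuming the MultiChannelCNN from the earlier example, with placeholder learning rates that merely respect β > α (the actual Table 2 values are not reproduced in this text).

```python
import torch

model = MultiChannelCNN()                        # from the earlier sketch
_ = model(torch.randn(2, 78))                    # materialize the lazy FC layer once
loss_fn = torch.nn.BCEWithLogitsLoss()           # binary cross-entropy on the logit
alpha, beta = 1e-3, 1e-2                         # placeholder rates, beta > alpha
meta_optimizer = torch.optim.Adam(model.parameters(), lr=beta)  # SGD-family optimizer
```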
TABLE 2 Superparameter settings
Existing public data sets are generated manually in particular environments, contain many normal and abnormal samples, and are not suitable for small-sample problems. For small-sample learning in network intrusion detection, the task sets must be reconstructed according to the attack-type labels. Therefore, a small portion of samples is extracted from the existing public data set CICIDS2017 as the data source, packaged into tasks, and reconstructed into several task sets containing the normal and specific attack samples required by the experiments. Finally, the five most typical attacks in the CICIDS2017 data set (DDoS, BruteForce, PortScan, Bot and Web) are selected for the experiments. In addition, data preprocessing is a necessary step before training the model, so preprocessing operations are performed on the data. As shown in table 3, there are 5 groups of experiments; each group selects one attack to simulate detection of a genuinely unknown sample and three of the remaining four attack types for training, so each group comprises 4 parallel experiments. Each group of experiments is repeated multiple times and the average taken as the final evaluation result, so that the model evaluation is as accurate as possible.
TABLE 3 Experimental grouping
The following are experimental performance evaluations and analyses.
The performance of the proposed MAML-based new-attack intrusion detection method is verified below. The number of iterations is set by observing the variation of the training loss. Fig. 4 shows the loss plot of the model over 100 iterations. As the figure shows, with continued training of the neural network, the loss function converges quickly in the early iterations and remains at a relatively stable level after 60 iterations, with slight oscillations. The number of iterations (Episode) is therefore set to 100.
In order to evaluate the performance of the proposed method MCCML and its fitting and generalization ability, it is compared with widely used reference classifiers, including traditional machine learning algorithms — K-Nearest Neighbor (KNN) and Random Forest (RF) — and ensemble learning algorithms: AdaBoost, Bagging, and Gradient Boosting Decision Tree (GBDT). In addition, the benchmark methods include classical deep learning algorithms for experimental comparison: MLP and multi-channel CNN (the same base network structure as in MCCML, trained with the traditional supervised learning method). All the above models are tested on the same reference data set to achieve a fair comparison of detection performance on the new task.
Table 4 lists the performance of the proposed method and the benchmark methods in identifying various unknown attack categories, including accuracy, recall, and the F1 index; the bold entries are the best detection results for each tested attack category. The last three columns in table 4 can be viewed as a set of ablation experiments: comparative experiments on the three components — the multi-channel CNN, the meta-learning framework, and the weighted gradient update — demonstrate the effectiveness of each component in the model. Table 4 shows that: (1) compared with the fully-connected-layer method, the multi-channel convolution method improves each index by 3% on average; (2) compared with the traditional network-model training mode, the meta-learning training for small-sample learning proposed in the present application improves the overall performance by 6-7%; (3) in small-sample scenarios, some shallow learning methods even outperform deep learning, because deep learning depends on large sample sets and too little training data causes overfitting and poor performance; (4) the average gradient-update rule of MAML may cause the initial model to be biased too strongly toward certain existing tasks and fail to adapt to new tasks, whereas weighting the gradient updates makes the model more general and reduces over-specialization to particular tasks. In summary, compared with traditional machine learning or deep neural networks, the proposed MCCML method provides a better detection effect: it is generally superior to the reference methods on all indexes, and its worst detection result is comparable to the best result among the reference methods.
TABLE 4
To highlight the training efficiency of the proposed model, fig. 5 provides a per-iteration runtime comparison of the different models. The experimental results show that the computation speed of the proposed method is significantly higher than that of pure deep learning methods. Time consumption is one of the weaknesses of deep learning; training with the meta-learning idea achieves faster detection and higher performance. The running time of each iteration of the method is 0.652 s, which is also comparable to the training efficiency of machine learning. Since small-sample learning is a relatively new topic in the field of network intrusion detection, little related work is available for comparison and there is no reference sample set suitable for testing. Therefore, the CICIDS2017 open-source data set is used to reconstruct a detection task set dedicated to small-sample learning, and several related studies that use the CICIDS2017 data set are selected for a benchmark comparison experiment. Classifying abnormal traffic as normal is far more harmful than classifying normal traffic as abnormal, so recall is the metric of greatest interest for network intrusion prevention systems; the present application therefore compares the proposed MCCML algorithm with the Siamese, AE-CGAN-RF and ANID methods on recall, as shown in table 5.
TABLE 5
It should be noted that not all reference models use the same data set size. Neither AE-CGAN-RF nor ANID is a small-sample detection method; both require a large number of samples for training. The experimental results show that the MCCML method achieves competitive performance on new tasks containing unknown attacks, with a high detection rate on new attack samples — 95.22% on average — outperforming all the other reference detection methods. In addition, compared with the similar small-sample method Siamese, MAML outperforms the Siamese network model in the field of network anomaly detection.
On the basis, an embodiment of the present application further provides an intrusion detection identification method, including:
acquiring intrusion data to be identified; and calling the intrusion detection model obtained by adopting the training method of the intrusion detection model, and processing the intrusion data to be identified to obtain a processing result.
On the basis, an embodiment of the present application further provides an intrusion detection and identification device, including:
the data acquisition module is used for acquiring intrusion data to be identified, and the data processing module is used for calling an intrusion detection model obtained by a training method of the intrusion detection model and processing the intrusion data to be identified so as to obtain a processing result.
On the basis, the embodiment of the present application further provides a computer storage medium, where at least one executable instruction is stored in the computer storage medium, and the executable instruction causes a processor to execute the operation corresponding to the method.
On the basis of the foregoing, an embodiment of the present application further provides a computer apparatus, including: the processor, the memory, the communication interface and the communication bus are used for completing mutual communication, the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the corresponding operation of the method.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A method for training an intrusion detection model, comprising:
acquiring a sample data set,
establishing a classification model,
and training the classification model by a MAML-based meta-training method.
2. The method of claim 1, wherein the classification model is a multi-channel CNN model.
3. The method of claim 2, wherein the multi-channel CNN model comprises:
an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolutional layer, an activation function selection LeakyReLU, a 2-dimensional max pooling layer, and a Dropout layer,
the system also comprises a splicing layer which is used for connecting the local features extracted from a plurality of different channels to form a new feature vector,
and a full connecting layer and an output layer are sequentially arranged behind the splicing layer.
4. The method of claim 3, wherein the probability distribution of the label y in the output layer is calculated by a Softmax activation function.
5. The method of claim 1, wherein the sample data set comprises a meta-training set Dmeta-train and a meta-test set Dmeta-test, the meta-training set Dmeta-train comprising a sample set and a query set, and the meta-test set Dmeta-test comprising a support set and a test set,

after the training of the classification model is finished, a meta-test stage is entered, the meta-test stage comprising a fine-tuning stage and a verification stage,

the fine-tuning stage comprises: when the model needs to adapt to a new specific task, fine-tuning the model parameters using the pre-trained model parameters $\theta^{*}$ and the sample data on the support set, as shown in the following formula:

$$\theta_i' = \theta^{*} - \alpha \nabla_{\theta^{*}} L_{T_i}(f_{\theta^{*}}; P_i)$$

where $P_i$ denotes the support set of the ith task, $\alpha$ is the learning rate shared between the different tasks in the internal update step, and $L_{T_i}(f_{\theta^{*}})$ represents the training loss value on task $T_i$ of the model with initial parameters $\theta^{*}$,

the verification stage comprises: after the fine-tuning stage, evaluating the new model $f_{\theta_i'}$ parameterized by $\theta_i'$ on the test set and averaging the results to avoid chance outcomes.
6. The method for training the intrusion detection model according to claim 1, wherein training the classification model by the MAML-based meta-training method specifically comprises: training based on dual gradient updates, including an internal update and an external update,

in the internal update phase, the training loss value on each task $T_i$ is first calculated using the sample set data $S_i$, and the local parameters $\theta$ of each task $T_i$ are updated along the direction of gradient descent according to

$$\theta_i' = \theta - \alpha \nabla_{\theta} L_{T_i}(f_{\theta})$$

where $\alpha$ is the learning rate shared between different tasks in the internal update step and $L_{T_i}(f_{\theta})$ represents the training loss value on task $T_i$ of the model with initial parameters $\theta$; a gradient update is applied through this loss value to the initial parameters $\theta$ of the internal model corresponding to task $T_i$, thereby obtaining a weakly supervised model with updated, task-preferring parameters $\theta_i'$,

in the external update phase, a weighted gradient-update mechanism is adopted to minimize the deviation of each specific task from the initial model; specifically, a gradient-update weight $w_i$ is set for each task $T_i$, and the weights are updated as

$$w_i^{t+1} = w_i^{t} - \eta \nabla_{w_i} L_{total}^{t}$$

wherein $L_{total}^{t} = \sum_i w_i^{t} L_{T_i}(f_{\theta_i'})$ represents the total loss value after one iteration, $\eta$ represents the weighted learning rate, and $t$ represents the number of iterations,

the obtained weights are then further normalized, as shown in the following formula:

$$w_i \leftarrow \frac{w_i}{\sum_{j=1}^{n} w_j}$$

then, the locally updated parameters $\theta_i'$ are obtained through training on the query set, the loss value $L_{T_i}(f_{\theta_i'})$ is obtained using the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameters $\theta$ of the global network are updated:

$$\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{T_i} w_i\, L_{T_i}(f_{\theta_i'})$$

where $\beta$ represents the learning rate of the external update.
7. An intrusion detection identification method, comprising:
acquiring intrusion data to be identified;
invoking an intrusion detection model obtained by using a training method of the intrusion detection model according to any one of claims 1-6, and processing the intrusion data to be identified to obtain a processing result.
8. An intrusion detection recognition device, comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling an intrusion detection model obtained by adopting the training method of the intrusion detection model according to any one of claims 1 to 6, and processing the intrusion data to be identified so as to obtain a processing result.
9. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the method of any one of claims 1 to 7.
10. A computer device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus, and the memory is used for storing at least one executable instruction which enables the processor to execute the corresponding operation of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211546247.4A CN115563610B (en) | 2022-12-05 | 2022-12-05 | Training method, recognition method and device for intrusion detection model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211546247.4A CN115563610B (en) | 2022-12-05 | 2022-12-05 | Training method, recognition method and device for intrusion detection model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115563610A true CN115563610A (en) | 2023-01-03 |
CN115563610B CN115563610B (en) | 2023-05-30 |
Family
ID=84770287
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211546247.4A Active CN115563610B (en) | 2022-12-05 | 2022-12-05 | Training method, recognition method and device for intrusion detection model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115563610B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115618353A (en) * | 2022-10-21 | 2023-01-17 | 北京珞安科技有限责任公司 | Identification system and method for industrial production safety |
CN116389175A (en) * | 2023-06-07 | 2023-07-04 | 鹏城实验室 | Flow data detection method, training method, system, equipment and medium |
CN116821907A (en) * | 2023-06-29 | 2023-09-29 | 哈尔滨工业大学 | Drop-MAML-based small sample learning intrusion detection method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110365659A (en) * | 2019-06-26 | 2019-10-22 | 浙江大学 | A kind of building method of network invasion monitoring data set under small sample scene |
CN110808945A (en) * | 2019-09-11 | 2020-02-18 | 浙江大学 | Network intrusion detection method in small sample scene based on meta-learning |
CN113037730A (en) * | 2021-02-27 | 2021-06-25 | 中国人民解放军战略支援部队信息工程大学 | Network encryption traffic classification method and system based on multi-feature learning |
-
2022
- 2022-12-05 CN CN202211546247.4A patent/CN115563610B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110365659A (en) * | 2019-06-26 | 2019-10-22 | 浙江大学 | A kind of building method of network invasion monitoring data set under small sample scene |
CN110808945A (en) * | 2019-09-11 | 2020-02-18 | 浙江大学 | Network intrusion detection method in small sample scene based on meta-learning |
CN113037730A (en) * | 2021-02-27 | 2021-06-25 | 中国人民解放军战略支援部队信息工程大学 | Network encryption traffic classification method and system based on multi-feature learning |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115618353A (en) * | 2022-10-21 | 2023-01-17 | 北京珞安科技有限责任公司 | Identification system and method for industrial production safety |
CN115618353B (en) * | 2022-10-21 | 2024-01-23 | 北京珞安科技有限责任公司 | Industrial production safety identification system and method |
CN116389175A (en) * | 2023-06-07 | 2023-07-04 | 鹏城实验室 | Flow data detection method, training method, system, equipment and medium |
CN116389175B (en) * | 2023-06-07 | 2023-08-22 | 鹏城实验室 | Flow data detection method, training method, system, equipment and medium |
CN116821907A (en) * | 2023-06-29 | 2023-09-29 | 哈尔滨工业大学 | Drop-MAML-based small sample learning intrusion detection method |
CN116821907B (en) * | 2023-06-29 | 2024-02-02 | 哈尔滨工业大学 | Drop-MAML-based small sample learning intrusion detection method |
Also Published As
Publication number | Publication date |
---|---|
CN115563610B (en) | 2023-05-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115563610A (en) | Method and device for training and identifying intrusion detection model | |
CN111783442A (en) | Intrusion detection method, device, server and storage medium | |
CN115331732B (en) | Gene phenotype training and predicting method and device based on graph neural network | |
Usman et al. | Filter-based multi-objective feature selection using NSGA III and cuckoo optimization algorithm | |
CN113541985B (en) | Internet of things fault diagnosis method, model training method and related devices | |
CN113591962B (en) | Network attack sample generation method and device | |
CN114511330B (en) | Ether house Pompe fraudster detection method and system based on improved CNN-RF | |
Usman et al. | Design and implementation of a system for comparative analysis of learning architectures for Churn prediction | |
Lehavi et al. | Feature reduction method comparison towards explainability and efficiency in cybersecurity intrusion detection systems | |
US20240054369A1 (en) | Ai-based selection using cascaded model explanations | |
US20050278352A1 (en) | Using affinity measures with supervised classifiers | |
CN116756662A (en) | Yield prediction method and system for optimizing random forest based on Harris eagle algorithm | |
Kuchipudi et al. | Android Malware Detection using Ensemble Learning | |
KR102405799B1 (en) | Method and system for providing continuous adaptive learning over time for real time attack detection in cyberspace | |
Rathod et al. | Model comparison and multiclass implementation analysis on the unsw nb15 dataset | |
Xiao et al. | Identification of IoT Devices Using A Multiple Transformers Single Estimator (MTSE) Learning Pipeline | |
CN117792737B (en) | Network intrusion detection method, device, electronic equipment and storage medium | |
Amarnath et al. | Metaheuristic approach for efficient feature selection: A data classification perspective | |
Ali | A New Intrusion Detection Strategy Based on Combined Feature Selection Methodology and Machine Learning Technique. | |
CN116015787B (en) | Network intrusion detection method based on mixed continuous variable component sub-neural network | |
CN117745423B (en) | Abnormal account identification method | |
CN113162914B (en) | Intrusion detection method and system based on Taylor neural network | |
CN113076695B (en) | Ionosphere high-dimensional data feature selection method based on improved BBA algorithm | |
CN109905340B (en) | Feature optimization function selection method and device and electronic equipment | |
Chiu et al. | Graph Sparsifications using Neural Network Assisted Monte Carlo Tree Search |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |