CN114065933A - An Unknown Threat Detection Method Based on Artificial Immune Thought

Info

Publication number: CN114065933A
Application number: CN202111420523.8A
Authority: CN
Inventors: 彭海朋; 陈冠华; 李丽香; 黄京泽; 孙婧瑜
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-11-26
Filing date: 2021-11-26
Publication date: 2022-02-18
Anticipated expiration: 2041-11-26
Also published as: CN114065933B

Abstract

The invention discloses an unknown threat detection method based on artificial immune thought, which can effectively detect known and unknown threats. The pre-classification module introduces a convolutional neural network to handle the pre-collection of the self set and the pre-classification of network traffic during detection. The negative selection module introduces a gene library to address the random generation of initial detectors and specific immunity against unknown threats, and introduces hierarchical clustering to improve detector training efficiency. The clone mutation module addresses overlapping detection among high-affinity detectors by introducing a detector optimization algorithm based on a genetic algorithm; at the same time, an LRU-based memory detector extinction mechanism is introduced, which releases storage space and improves detection efficiency. The mRNA vaccination module introduces an mRNA vaccine algorithm based on feature importance ranking, decomposes each detected unknown threat according to gene importance, injects the result into the gene library, and generates corresponding detectors to achieve specific immunity to the unknown threat and its variants.

Description

Unknown threat detection method based on artificial immunity thought

Technical Field

The invention relates to the technical field of intrusion detection, in particular to an unknown threat detection method based on an artificial immunity thought.

Background

The biological immune system is characterized by diversity, tolerance, self-organization, self-adaptation and similar properties. Artificial immunity is a mathematical model that borrows concepts from the biological immune system. A cell model is constructed by defining the shape space, self and nonself, and the affinity calculation. A negative selection algorithm simulates the maturation of immune cells in the bone marrow: immune cells that match self are eliminated, the remaining immune cells reach maturity, and the mature immune cells perform threat detection.

A traditional artificial-immunity-based model works as follows. First, a Self set is constructed manually. Then immature detectors are randomly generated and enter a negative selection process for tolerance. Tolerance is the process of matching a detector against Self: when the affinity reaches the matching threshold, the immature detector dies; otherwise it evolves into a mature detector and participates in detection.

Traditional intrusion detection models based on artificial immunity have the following disadvantages:

First, the self-adaptation problem of the Self set. The Self set is usually very large; the collected normal samples are only a small subset of it and do not necessarily represent the real Self set. If the two sets differ greatly in size, scaling errors arise and the final result deviates considerably. In addition, in practical applications the elements of the Self set often change over time. Traditional Self/Nonself models usually separate the two sets manually, which is risky because an element of today's Self set may belong to tomorrow's Nonself set; such a Self set lacks dynamic coverage, produces more false positives, and makes the accuracy of the antibody set hard to guarantee. Finally, most conventional Self sets are built from static behavior information, whereas in a real environment the elements change over time, so manually fixing the elements of a Self set is undesirable. The inability to update the Self set adaptively is therefore one of the disadvantages of the conventional model.

Second, the detector generation efficiency problem. In the negative selection algorithm proposed by Forrest et al., detector generation is very inefficient. Candidate detectors mature through the negative selection process. Let $N$ be the size of the self set to be trained, $P$ the probability that an antigen and an antibody match, and $P_f$ the failure rate (the probability that an antigen is not matched by any antibody). The number of candidate detectors is then $N_c = -\ln(P_f) / \bigl(P(1-P)^{N}\bigr)$, and the time complexity of the algorithm is $O(N_c \cdot N)$. The number of candidate detectors grows exponentially as the size of the training set increases, and the time cost of the detection phase is also high. In the real-valued negative selection algorithm, when the detector radius is constant, the volume of the hypersphere decreases as the dimension increases, and when the dimension exceeds 20 the volume is close to 0, which is one reason detectors perform poorly in high-dimensional real-valued spaces. Furthermore, increasing the dimension also increases both time complexity and space complexity.
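The growth described above can be checked numerically. The following is a minimal Python sketch that evaluates the candidate-detector estimate and the standard hypersphere volume formula. The values P = 0.01 and P_f = 0.1 and the function names are illustrative assumptions, not figures from the source.

```python
import math


def candidate_detectors(n_self: int, p_match: float, p_fail: float) -> float:
    """Estimate N_c = -ln(P_f) / (P * (1 - P)^N) for a self set of size N."""
    return -math.log(p_fail) / (p_match * (1.0 - p_match) ** n_self)


def hypersphere_volume(dim: int, radius: float = 1.0) -> float:
    """Volume of a dim-dimensional ball: pi^(d/2) / Gamma(d/2 + 1) * r^d."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


if __name__ == "__main__":
    # Assumed matching probability and failure rate, for illustration only.
    for n in (100, 200, 400):
        print(f"N={n:4d}  N_c={candidate_detectors(n, 0.01, 0.1):.1f}")
    # Unit-radius volume shrinks toward zero in high dimensions.
    for d in (5, 10, 20, 30):
        print(f"dim={d:2d}  volume={hypersphere_volume(d):.6f}")
```

With a unit radius the volume at dimension 20 is already below 0.03, which is consistent with the observation in the text that fixed-radius detectors cover very little of a high-dimensional space.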

Third, the problem of large-area overlap in the detectors' recognition space. When frequently evolved high-affinity detectors are selected, many of them may overlap over large areas of the recognition space in the next generation of the detector population. The recognition space of the whole population then becomes relatively small and may shrink to only a subset of the nonself set.

Fourth, the specific immunity problem for unknown threats. When a traditional model finds an unknown threat, it only records the threat information and generates a corresponding detector; no countermeasure targeted at the threat is provided. When the threat mutates, the whole process has to be repeated, so the reaction is slow and the time cost is high.

Disclosure of Invention

In view of the above problems, the invention provides an unknown threat detection method based on an artificial immunity thought. It solves the self-adaptation problem of the Self set by introducing a pre-classification module, solves the detector generation efficiency problem by hierarchically clustering the Self set in advance, solves the large-area overlap of the detectors' recognition space by introducing a detector optimization algorithm based on a genetic algorithm, and solves the specific immunity problem for unknown threats by introducing an mRNA vaccination module.

In order to achieve the above purpose, the invention provides the following technical scheme:

An unknown threat detection method based on an artificial immunity thought, wherein the detection model comprises a pre-classification module, a negative selection module, a clonal variation module and an mRNA vaccination module. The pre-classification module introduces a convolutional neural network. The negative selection module introduces a gene library and hierarchical clustering. The clonal variation module introduces a detector optimization algorithm based on a genetic algorithm, which controls the affinity of a detector through the detector concentration, and also introduces an LRU-based memory detector extinction mechanism. The mRNA vaccination module introduces an mRNA vaccine algorithm based on feature importance ranking, decomposes the detected unknown threats according to gene importance, injects them into the gene library, and generates corresponding detectors.

Further, the unknown threat detection method based on the artificial immunity thought comprises a training phase and a detection phase, wherein,

a training stage: training a convolutional neural network by using a labeled data set to enable the convolutional neural network to have initial classification capability, constructing a gene library by using an initial nonself set and generating an initial detector, and participating in negative selection by using an initial self set to generate a first generation mature detector set;

a detection stage: inputting network flow, extracting and coding features, and entering the following steps:

S1, matching against the memory detector group: if detected, the traffic is a threat that has already been recorded; if not, going to S2;

S2, classifying the traffic with the convolutional neural network of the pre-classification module, and then matching it against the mature detector group: if it is not matched and the classification is positive, it enters the self set to participate in the next negative selection process; if it is matched and the classification is negative, it is determined to be an unknown threat and proceeds to S3; if the matching result is inconsistent with the classification result, it enters a manual review process, in which the data are labeled correctly and added to the training set of the neural network for training;

S3, sending the discovered unknown threat to the mRNA vaccination module, extracting feature importance with a random forest algorithm, decomposing the key features and storing them in the gene library, generating mature and memory detectors capable of detecting similar attacks through negative selection and clonal variation, adding the generated mature detectors to the mature detector group, and adding the memory detectors to the memory detector group;

S4, updating the mature detectors and the memory detectors using an LRU-based detector extinction algorithm.
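The decision logic of steps S1 and S2 can be sketched as a small routing function. This is an illustrative Python sketch only: the names `Route` and `route_sample` are hypothetical, and it assumes that a "positive" classification means the convolutional neural network regards the traffic as normal, which is how the text pairs it with entry into the self set.

```python
from enum import Enum


class Route(Enum):
    KNOWN_THREAT = "known threat (memory detector hit)"
    ADD_TO_SELF = "normal traffic, added to the self set"
    UNKNOWN_THREAT = "unknown threat, sent to mRNA vaccination (S3)"
    MANUAL_REVIEW = "detectors and classifier disagree, manual review"


def route_sample(memory_hit: bool, mature_hit: bool, cnn_says_normal: bool) -> Route:
    """Route one encoded traffic sample through steps S1 and S2."""
    # S1: a memory detector match means the threat was recorded before.
    if memory_hit:
        return Route.KNOWN_THREAT
    # S2: combine mature-detector matching with the CNN pre-classification.
    if not mature_hit and cnn_says_normal:
        return Route.ADD_TO_SELF
    if mature_hit and not cnn_says_normal:
        return Route.UNKNOWN_THREAT
    # The two signals are inconsistent, so a human labels the sample.
    return Route.MANUAL_REVIEW


if __name__ == "__main__":
    print(route_sample(False, True, False).value)
```

Keeping the routing separate from the detectors and the classifier makes the four outcomes easy to test in isolation.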

Further, in step S2, a convolutional neural network is trained in advance using the correctly labeled data set; the convolutional neural network uses 2 convolutional layers, 2 pooling layers, a fully connected layer, and a classification layer.

Further, in step S2, the step of classifying the convolutional neural network is:

S201, collecting data, making classification labels, and dividing the data into a training set and a data set;

S202, designing a convolutional neural network: the convolutional neural network comprises 2 3×3 convolutional layers, 2 pooling layers, 1 fully connected layer and 1 classification layer, wherein the pooling layers use max pooling and the fully connected layer has 256 neurons;

S203, training the model of S202 using the training set of S201, adjusting the training parameters according to the test results, and obtaining an optimal model;

S204, dividing the data set into a self set and a nonself set using the optimal model of S203.
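To make the layer layout of S202 concrete, the following Python sketch traces tensor shapes and counts parameters for a network with two 3×3 convolutions, two max-pooling layers, a 256-neuron fully connected layer and a classification layer. The source does not give the input size, channel counts, padding, pooling window or number of classes, so the 32×32 single-channel input, the 16 and 32 channels, the padding of 1, the 2×2 pooling and the 2 output classes are all assumptions made for illustration.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Output length of a max-pooling layer along one spatial axis."""
    return (size - kernel) // stride + 1


def trace_cnn(height, width, in_ch=1, channels=(16, 32), fc_units=256, n_classes=2):
    """Return (layer list, parameter count) for conv-pool-conv-pool-fc-classifier."""
    layers, params = [], 0
    h, w, c = height, width, in_ch
    for out_ch in channels:
        h, w = conv_out(h), conv_out(w)
        params += (3 * 3 * c + 1) * out_ch      # weights plus one bias per filter
        c = out_ch
        layers.append(("conv3x3", c, h, w))
        h, w = pool_out(h), pool_out(w)
        layers.append(("maxpool2x2", c, h, w))
    flat = c * h * w
    params += (flat + 1) * fc_units             # fully connected layer, 256 neurons
    layers.append(("fc", fc_units, 1, 1))
    params += (fc_units + 1) * n_classes        # classification layer
    layers.append(("classifier", n_classes, 1, 1))
    return layers, params


if __name__ == "__main__":
    layers, params = trace_cnn(32, 32)
    for layer in layers:
        print(layer)
    print("parameters:", params)
```

Under these assumptions almost all parameters sit in the fully connected layer, so the input encoding size is the main driver of model size.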

Further, the negative selection in step S3 includes the steps of:

S301, constructing a self set from an existing data set or from data collected by the pre-classification module, and performing hierarchical clustering on the self set to obtain N cluster centers;

S302, randomly generating N initial detectors from the gene library, wherein the initial detectors of the same batch are defined as {string, R, tag, fitness, age}; the radius R is determined by the minimum distance d from the previous generation of initial detectors to self, d being the Euclidean distance between two strings; fitness is updated whenever the detector matches in the detection stage; tag is a label recording the state of the detector, with the values immature, mature and memory; age records the generation of the detector, and an initial immature detector has age = 0;

S303, for each detector: matching the detector against the cluster centers in turn, calculating the distance from the detector to each cluster center, and judging whether the cluster center lies within the radius; if the distance is smaller than the radius, the detector dies;

S304, when the negative selection process of all detectors is completed, recording the minimum distance r from this batch of detectors to the self set for use by the next generation, adding the surviving detectors to the mature detector set, and updating tag = mature and age = age + 1.
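Steps S301 to S304 can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: detectors are treated as real-valued vectors, clustering is a naive centroid-linkage agglomerative procedure, a single radius is shared by the batch, and the next-generation radius is measured against the cluster centers rather than the full self set. The function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def cluster_centers(self_set, n_centers):
    """S301: merge the two closest clusters until n_centers remain."""
    clusters = [[p] for p in self_set]
    while len(clusters) > n_centers:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def negative_selection(candidates, centers, radius):
    """S303 and S304: drop detectors that cover a center, mature the rest."""
    mature = []
    for genes in candidates:
        # A detector dies if any cluster center lies inside its radius.
        if all(dist(genes, c) >= radius for c in centers):
            mature.append({"genes": genes, "R": radius, "tag": "mature",
                           "fitness": 0, "age": 1})
    # Minimum distance of the surviving batch, kept for the next generation.
    next_radius = min((dist(d["genes"], c) for d in mature for c in centers),
                      default=radius)
    return mature, next_radius


if __name__ == "__main__":
    self_set = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers = cluster_centers(self_set, 2)
    candidates = [(0.5, 0.5), (5.0, 5.0), (10.0, 10.5)]
    mature, next_radius = negative_selection(candidates, centers, radius=2.0)
    print(centers, [d["genes"] for d in mature], round(next_radius, 3))
```

Each candidate is compared with the cluster centers only, so the tolerance cost scales with the number of centers instead of the size of the self set.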

Further, the cloning variation in step S3 includes the steps of:

S311, sending the set of parent detectors that detected the threat to the clonal variation module;

S312, selecting the detectors in order of decreasing affinity, and setting the initial concentration of each detector to N = 1;

S313, denoting the selected detector as A and the mutated detector as a, defining a mutation-count threshold x, and randomly selecting a mutation operator and mutating; if the distance between A and a is larger than the radius R_A of A and fitness(a) > fitness(A), the concentration is updated as N = N + 1 and detector a is generated, wherein R_a = R_A, fitness(a) = fitness(A), age(a) is initialized to 1, and age(A) = age(A) + 1;

S314, adding A and a to the memory detector set and the mature detector set.
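A rough Python sketch of the acceptance rule in S313 is given below. The source does not specify the mutation operators or how fitness is evaluated for a freshly mutated detector, so the operator list and the `fitness_fn` callable are hypothetical placeholders supplied by the caller, and detectors are again assumed to be real-valued vectors.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def clone_and_mutate(parent, operators, fitness_fn, max_mutations, rng):
    """Mutate detector A up to max_mutations times; return mutant a or None."""
    for _ in range(max_mutations):
        operator = rng.choice(operators)        # randomly selected mutation operator
        genes = operator(parent["genes"])
        far_enough = dist(parent["genes"], genes) > parent["R"]
        fitter = fitness_fn(genes) > fitness_fn(parent["genes"])
        if far_enough and fitter:
            parent["concentration"] += 1        # N = N + 1
            parent["age"] += 1                  # age(A) = age(A) + 1
            # The mutant inherits radius and fitness from A and starts at age 1.
            return {"genes": genes, "R": parent["R"],
                    "fitness": parent["fitness"], "age": 1, "concentration": 1}
    return None


if __name__ == "__main__":
    threat = (3.0, 0.0)                         # assumed encoded threat sample
    parent = {"genes": (0.0, 0.0), "R": 0.5, "fitness": 4,
              "age": 2, "concentration": 1}
    shift = lambda g: (g[0] + 1.0, g[1])        # toy mutation operator
    child = clone_and_mutate(parent, [shift], lambda g: -dist(g, threat),
                             max_mutations=5, rng=random.Random(0))
    print(child, parent["concentration"])
```

Requiring the mutant to lie outside the parent's radius is what keeps the new detector from simply duplicating the region the parent already covers.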

Further, the mRNA vaccination in step S3 includes the steps of:

S321, when an unknown threat is found, sending the unknown threat to the mRNA vaccination module;

S322, constructing a data set and outputting feature importance using a random forest algorithm;

S323, setting a threshold N, taking the features whose summed importance contribution reaches the threshold N, and adding them to the gene library;

S324, randomly generating detectors carrying the important genes;

S325, entering the negative selection step;

S326, performing clonal variation to generate the corresponding detectors.
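The selection in S323 can be illustrated with a short Python sketch. It assumes the feature importances have already been produced by some random forest implementation and are passed in as a dictionary, and it interprets the threshold N as a fraction of the total importance. The feature names and both function names are hypothetical.

```python
def select_key_features(importances: dict, threshold: float) -> list:
    """Take top-ranked features until their summed importance reaches the threshold."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(importances.values())
    selected, cumulative = [], 0.0
    for name, score in ranked:
        if cumulative >= threshold * total:
            break
        selected.append(name)
        cumulative += score
    return selected


def inject_into_gene_library(gene_library: dict, threat: dict, features: list) -> None:
    """Store the threat's values for the key features as new genes."""
    for name in features:
        gene_library.setdefault(name, set()).add(threat[name])


if __name__ == "__main__":
    # Assumed importances, as a random forest might report them.
    importances = {"dst_port": 0.4, "duration": 0.3, "src_bytes": 0.2, "flag": 0.1}
    threat = {"dst_port": 4444, "duration": 12, "src_bytes": 900, "flag": "S0"}
    key = select_key_features(importances, threshold=0.8)
    library = {}
    inject_into_gene_library(library, threat, key)
    print(key, library)
```

Detectors generated afterwards can then draw their genes from the enlarged library, which is what biases them toward the newly seen threat and its variants.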

Further, the update in step S4 is performed as follows: with a match-count threshold N and an age threshold A, any detector whose age is greater than or equal to A and whose number of matches has not reached N dies.
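The extinction rule above reduces to a single filter. The following Python sketch assumes each detector is a dictionary with `age` and `matches` fields; these field names and the function name are illustrative.

```python
def prune_detectors(detectors, age_threshold, match_threshold):
    """Keep detectors unless they are old (age >= A) and rarely matched (matches < N)."""
    return [d for d in detectors
            if not (d["age"] >= age_threshold and d["matches"] < match_threshold)]


if __name__ == "__main__":
    population = [
        {"id": "young-idle", "age": 1, "matches": 0},
        {"id": "old-idle", "age": 9, "matches": 1},
        {"id": "old-busy", "age": 9, "matches": 7},
    ]
    survivors = prune_detectors(population, age_threshold=5, match_threshold=3)
    print([d["id"] for d in survivors])
```

Young detectors are spared regardless of their match count, which gives newly generated detectors time to prove useful before they can be removed.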

Compared with the prior art, the invention has the beneficial effects that:

The invention provides an unknown threat detection method based on an artificial immunity thought, in which the detection model is divided into four modules: a pre-classification module, a negative selection module, a clonal variation module and an mRNA vaccination module. The pre-classification module introduces a convolutional neural network to handle the pre-collection of the self set for the negative selection module and the pre-classification of network traffic during detection. The negative selection module introduces a gene library to address the random generation of initial detectors and, for the mRNA vaccination module, specific immunity against unknown threats; it also introduces hierarchical clustering to improve detector training efficiency. The clonal variation module addresses overlapping detection among high-affinity detectors by introducing a detector optimization algorithm based on a genetic algorithm, which controls the affinity of each detector through its concentration, so that frequently mutated detectors gradually stop being selected and the detection range of the detectors does not collapse into a subset of the nonself set. Meanwhile, an LRU-based memory detector extinction mechanism is introduced, which releases storage space and improves detection efficiency. The mRNA vaccination module introduces an mRNA vaccine algorithm based on feature importance ranking, decomposes the detected unknown threats according to gene importance, injects them into the gene library, and generates corresponding detectors to realize specific immunity to the unknown threats and their variants. The four modules act together to detect known and unknown threats effectively.

Drawings

In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.

Fig. 1 is a flowchart of an unknown threat detection method based on an artificial immunity thought according to an embodiment of the present invention.

Fig. 2 is a flowchart of a pre-classification module according to an embodiment of the present invention.

Fig. 3 is a flowchart of a negative selection module according to an embodiment of the present invention.

Fig. 4 is a flowchart of a clonal variation module according to an embodiment of the present invention.

Fig. 5 is a flowchart of an mRNA vaccination module according to an embodiment of the present invention.

Detailed Description

For a better understanding of the present solution, the method of the present invention is described in detail below with reference to the accompanying drawings.

In the unknown threat detection method based on the artificial immunity thought provided by the embodiment of the invention, the overall model is shown in Fig. 1 and is divided into four modules: a pre-classification module, a negative selection module, a clonal variation module and an mRNA vaccination module. The method comprises a training stage and a detection stage, as set out in the disclosure above.

Each module is described separately below.

With respect to the pre-classification module:

In step S2, as shown in Fig. 2, the convolutional neural network classification follows steps S201 to S204 above.

Regarding the negative selection module:

In step S3, as shown in Fig. 3, the negative selection follows steps S301 to S304 above.

Regarding clonal variation modules:

In step S3, as shown in Fig. 4, the clonal variation follows steps S311 to S314 above.

For the mRNA vaccination module:

In step S3, as shown in Fig. 5, the mRNA vaccination follows steps S321 to S326 above.

Compared with the prior art, the method of the invention has the following beneficial effects:

1. Pre-classification module

The pre-classification module effectively solves the self-adaptation problem of the Self set, giving the Self set dynamic coverage. Specifically, the module has two functions: generating the initial self set and nonself set, and classifying network traffic so that the self set is continuously and dynamically expanded.

2. Detector training optimization algorithm based on hierarchical clustering

Traditional artificial-immunity-based unknown threat detection models rarely adopt any strategy to reduce the cost of distance calculation: the distance from each candidate detector to the self set must be calculated, which reduces efficiency. The invention introduces a detector training optimization algorithm based on hierarchical clustering, so that the dynamic tolerance training of the detectors is cluster-based: the cluster centers of the self set participate in immune tolerance training. Because the cluster centers are far fewer than the elements of the self set and are relatively stable, detector training is much more efficient than in the traditional method.

3. Genetic algorithm-based detector model clonal variation optimization mechanism

When frequently evolved high-affinity detectors are selected, many of them may overlap over large areas of the recognition space in the next generation of the detector population, which leaves the whole population with a relatively small recognition space. The invention introduces a clonal variation optimization mechanism for the detector model, based on a genetic algorithm, that uses concentration to weaken the effect of affinity: after a certain number of generations, a frequently evolved high-affinity detector has a greatly reduced chance of being selected and is eventually no longer selected. Likewise, because of its low concentration, a newly added high-affinity detector has a much higher chance of evolving than a detector of equal affinity but high concentration. Controlling detector evolution through concentration thus preserves the diversity of the detector population to a certain extent and prevents its recognition space from evolving into a subset of the nonself set.
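One simple way to picture concentration weakening affinity is to rank detectors by affinity divided by concentration. The source does not give the actual scoring formula, so this ratio, the field names and the function names in the Python sketch below are assumptions chosen only to illustrate the effect described above.

```python
def pick_parent(detectors):
    """Select the detector with the highest concentration-adjusted affinity."""
    return max(detectors, key=lambda d: d["affinity"] / d["concentration"])


def run_rounds(detectors, rounds):
    """Repeatedly select a parent; each selection raises its concentration."""
    history = []
    for _ in range(rounds):
        chosen = pick_parent(detectors)
        chosen["concentration"] += 1
        history.append(chosen["name"])
    return history


if __name__ == "__main__":
    population = [
        {"name": "A", "affinity": 0.9, "concentration": 1},
        {"name": "B", "affinity": 0.6, "concentration": 1},
    ]
    print(run_rounds(population, 3))
```

In this toy run the higher-affinity detector is chosen first, but its growing concentration lets the lower-affinity detector take the next round, so selection alternates instead of being monopolized.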

4. LRU-based detector extinction algorithm

As more network traffic is inspected, the mature detector set and the memory detector set keep growing. Unlimited growth not only requires a large amount of storage space but also sacrifices query efficiency. Introducing an LRU-based detector extinction algorithm keeps the sizes of the mature detector set and the memory detector set under control, reduces storage cost, and improves efficiency.

5. mRNA vaccination mechanism based on feature importance ranking

When a known or unknown viral threat appears in the real world, the most effective countermeasure is vaccination with an mRNA vaccine for targeted immunization. Vaccination introduces an inactivated antigen into the human body and induces the immune system to produce the corresponding antibodies and memory cells. Accordingly, in immunity-based unknown threat detection, when the system finds an unknown threat, the useful information of the threat is fully exploited: injecting an mRNA vaccine generates specific antibodies and thereby realizes specific immunity to the antigen.

In conclusion, the invention provides an unknown threat detection method based on the artificial immunity thought that effectively detects known and unknown threats. The pre-classification module introduces a convolutional neural network to handle the pre-collection of the self set and the pre-classification of network traffic during detection. The negative selection module introduces a gene library to address the random generation of initial detectors and specific immunity against unknown threats, and introduces hierarchical clustering to improve detector training efficiency. The clonal variation module addresses overlapping detection among high-affinity detectors by introducing a detector optimization algorithm based on a genetic algorithm; meanwhile, an LRU-based memory detector extinction mechanism is introduced, which releases storage space and improves detection efficiency. The mRNA vaccination module introduces an mRNA vaccine algorithm based on feature importance ranking, decomposes the detected unknown threats according to gene importance, injects them into the gene library, and generates corresponding detectors to realize specific immunity to the unknown threats and their variants.

The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may be modified, or some of their technical features may be replaced by equivalents, and that such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An unknown threat detection method based on artificial immune thought, characterized in that the detection model comprises a pre-classification module, a negative selection module, a clone mutation module and an mRNA vaccination module, wherein the pre-classification module introduces a convolutional neural network; the negative selection module introduces a gene library and hierarchical clustering; the clone mutation module introduces a detector optimization algorithm based on a genetic algorithm, which controls the affinity of the detector through the detector concentration, and also introduces an LRU-based memory detector extinction mechanism; and the mRNA vaccination module introduces an mRNA vaccine algorithm based on feature importance ranking, decomposes the detected unknown threats according to gene importance, injects them into the gene library, and generates corresponding detectors.

2. The unknown threat detection method based on artificial immune thought according to claim 1, characterized in that it comprises a training phase and a detection phase, wherein,

Training phase: use the labeled dataset to train the convolutional neural network so that it has initial classification ability, use the initial nonself set to construct the gene library and generate the initial detectors, and use the initial self set to participate in negative selection and generate the first-generation mature detector set;

Detection stage: input network traffic, feature extraction and encoding, and enter the following steps:

S1. Match with the memory detector group. If detected, it means that it is a threat that has been recorded; if not detected, enter S2;

S2. Use the pre-classification module to classify the traffic with the convolutional neural network, and then match it against the mature detector population. If it does not match and the classification is positive, it enters the self set to participate in the next negative selection process; if it matches and the classification is negative, it is determined to be an unknown threat and enters S3; if the matching result is inconsistent with the classification result, it enters the manual review process, in which the data are labeled correctly and added to the training set of the neural network;

S3. The discovered unknown threats enter the mRNA vaccination module, use the random forest algorithm to extract the feature importance, decompose and inject the key features into the gene bank, and generate mature and memory detectors that can detect similar attacks through negative selection and clonal mutation. The mature detector joins the mature detector group, and the memory detector joins the memory detector group;

S4. Use the LRU-based detector demise algorithm to update the mature detector and the memory detector.

3. The unknown threat detection method based on artificial immunity thought according to claim 2, characterized in that, in step S2, a convolutional neural network is trained in advance using the correctly labeled data set, the convolutional neural network having 2 convolutional layers, 2 pooling layers, a fully connected layer, and a classification layer.

4. The unknown threat detection method based on artificial immunity thought according to claim 2, is characterized in that, in step S2, the step of convolutional neural network classification is:

S201, collect data and make classification labels, and divide it into two parts: training set and data set;

S202. Design a convolutional neural network: the convolutional neural network includes 2 3×3 convolutional layers, 2 pooling layers, 1 fully connected layer and 1 classification layer. The pooling layers use max pooling, and the fully connected layer has 256 neurons;

S203, use the training set in S201 to train the model in S202, adjust the training parameters according to the test results and obtain the best model;

S204. Use the best model of S203 to divide the dataset into a self set and a nonself set.

5. The unknown threat detection method based on artificial immune thought according to claim 2, characterized in that the step of negative selection in step S3 is:

S301. Use an existing data set or a pre-classification module to collect corresponding data to construct a self set, and perform hierarchical clustering on the self set to obtain N cluster centers;

S302, randomly generate N initial detectors from the gene library, the initial detectors of the same batch being defined as {string, R, tag, fitness, age}, where the radius R is determined by the minimum distance d between the previous generation of initial detectors and self, d being the Euclidean distance between two strings; fitness is updated when the detector matches in the detection stage; tag is the label used to record the state of the detector, with the values immature, mature, and memory; age is used to record the generation of the detector, and an initial immature detector has age = 0;

S303, for each detector: the detector matches the cluster center in turn, calculates the distance from the detector to the cluster center and determines whether the cluster center is within the radius, and if the distance is less than the radius, the detector dies;

S304. When the negative selection process of all detectors is completed, record the minimum distance r from this batch of detectors to the self set for the next generation to use, then add the surviving detectors to the mature detector set and update tag = mature, age = age + 1.

6. The unknown threat detection method based on artificial immunity thought according to claim 5, is characterized in that, the step of cloning mutation in step S3 is:

S311, sending a series of parent detector sets that have detected the threat into the clone mutation module;

S312, selecting the detector with the greatest affinity in turn, and setting the initial concentration of each detector as N=1;

S313. Denote the selected detector as A and the mutated detector as a, define the mutation-count threshold x, randomly select a mutation operator and mutate; if the distance between A and a is greater than the radius R_A of A and fitness(a) > fitness(A), update the concentration as N = N + 1 and generate detector a, where R_a = R_A, fitness(a) = fitness(A), age(a) is initialized to 1, and age(A) = age(A) + 1;

S314. Add A and a to the memory detector set and the mature detector set.

7. The unknown threat detection method based on artificial immunity thought according to claim 6, characterized in that the step of mRNA vaccination in step S3 is:

S321. When an unknown threat is found, send the unknown threat into the mRNA vaccination module;

S322. Construct a dataset and use a random forest algorithm to output feature importance;

S323, set the threshold value as N, take the feature whose sum of contribution of the feature value reaches the set threshold value N and add it to the gene bank;

S324. Randomly generate detectors with important genes;

S325, enter the negative selection step;

S326, clone and mutate to generate the corresponding detector.

8. The method for detecting unknown threats based on artificial immune thinking according to claim 7, wherein the update in step S4 is: setting a match-count threshold N and an age threshold A, any detector whose age is greater than or equal to A and whose number of matches has not reached N dies.