CN113162919A - Intrusion detection method based on network abnormal flow identification - Google Patents
Intrusion detection method based on network abnormal flow identification
- Publication number
- CN113162919A (application number CN202110330430.XA)
- Authority
- CN
- China
- Prior art keywords
- neural network
- intrusion detection
- network
- universe
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides an intrusion detection method based on network abnormal traffic identification. It addresses the problem that the rapid growth in the number of smart-device users in recent years has greatly increased network traffic: the sheer volume of data degrades the performance of intrusion detection systems, as does the redundant and irrelevant information contained in the traffic. First, an improved cuckoo search algorithm performs feature selection, screening the most accurate and effective features out of the original data set as an optimal feature subset. Next, the selected features serve as the input of an evolutionary neural network; to overcome the parameter limitations of the artificial neural network and avoid becoming trapped in local minima, a multi-verse optimization algorithm trains the artificial neural network to obtain the best classification performance. Finally, the test data set is fed into the trained artificial neural network to predict and evaluate abnormal traffic detection, yielding an abnormal-traffic intrusion detection model based on feature selection and an evolutionary neural network.
Description
Technical Field
The invention relates to improving the performance of intrusion detection systems, and in particular to a method for detecting abnormal-traffic intrusions.
Background
In recent years the number of smart-device users has grown rapidly, greatly increasing network traffic and bringing security problems such as a wide range of known and unknown network attacks. An intrusion detection system (IDS) is one of the best means of detecting attacks, because it comprises software or hardware that can track, evaluate and detect internal and external activity. However, the sheer volume of data degrades IDS performance, and the redundant and irrelevant information in the traffic is a further cause of poor performance. Improving the performance of intrusion detection systems has therefore become an urgent problem.
Anomalous-traffic intrusion detection has long attracted researchers' attention, and several approaches have been proposed to improve IDS performance, including feature selection methods and optimization techniques. Feature selection prunes the high-dimensional data in a large data set by keeping only the most relevant features, which avoids the curse of dimensionality when building an IDS model. The selected features form a subset of the original features; they simplify model construction and shorten execution time by reducing training time. Optimization algorithms can solve complex problems by simple means; they mainly comprise swarm-based, evolutionary and trajectory-based algorithms, and they improve model performance by tuning model parameters or improving the model function. When processing large volumes of traffic data, however, these methods sometimes run short of computing resources, so their processing efficiency is low.
Disclosure of Invention
The invention provides an intrusion detection method based on network abnormal traffic identification. It addresses the problem that the rapid growth in the number of smart-device users in recent years has greatly increased network traffic: the sheer volume of data degrades the performance of intrusion detection systems, as does the redundant and irrelevant information contained in the traffic. First, an improved cuckoo search algorithm performs feature selection, screening the most accurate and effective features out of the original data set as an optimal feature subset. Next, the selected features serve as the input of an evolutionary neural network; to overcome the parameter limitations of the artificial neural network and avoid becoming trapped in local minima, a multi-verse optimization algorithm trains the artificial neural network to obtain the best classification performance. Finally, the test data set is fed into the trained artificial neural network to predict and evaluate abnormal traffic detection, yielding an efficient anomaly-based intrusion detection model built on feature selection and an evolutionary neural network.
The purpose of the invention is achieved by the following technical solution: an intrusion detection method based on network abnormal traffic identification, comprising the following steps:
S1: an improved cuckoo search algorithm (CSA) selects the most accurate and effective features from the original data set to obtain an optimal feature subset.
S2: the selected features serve as the input of the evolutionary neural network; to overcome the parameter limitations of the artificial neural network (ANN) and avoid becoming trapped in local minima, a multi-verse optimization algorithm (MVO) trains the ANN to obtain the best classification performance.
S3: the test data set is fed into the trained artificial neural network, abnormal traffic detection is predicted and evaluated, and the anomalous data in the network traffic is obtained.
Further, S1 specifically includes:
S11: the solutions are ranked using the improved cuckoo search algorithm, and the current best result is identified during post-processing and visualization of the results.
S12: the data set x is divided into C clusters according to the characteristics of the fuzzified data, the nests are built, the best one among them is selected, and the eggs are placed in the nests.
Further, S12 specifically includes:
the nests are changed randomly by mutation while the best solution is retained, and the best nest is reported according to the error value and the classification accuracy. A genetic-algorithm mutation operator is used to enlarge the search space and diversify the solutions. To this end, mutation is combined with the improved cuckoo search algorithm in two steps: random selection of nests and random selection of eggs.
Further, S2 specifically includes:
S21: the selected features are sent to the ANN module as training features for the input dimensions. The module is designed as a multilayer perceptron (MLP) with a single input layer, one hidden layer and an output layer. An artificial neural network consists of highly interconnected parallel processing elements and maps multiple inputs to a set of desired outputs. ANN-MLP classification is supervised learning with known class labels. The result of the summation function is passed through a transfer (activation) function.
S22: training is performed by passing the weights to the MVO module. The MVO algorithm is built on three cosmological concepts: white holes, black holes and wormholes, which form part of a universe. These three concepts are the idea behind the MVO algorithm, which simulates the dynamics of, and interactions between, universes through black holes, white holes and wormholes. During optimization the universes are ranked by inflation rate, and roulette-wheel selection picks one of them to hold the white hole.
S23: a final fitness value, measured on the training data set, is returned for each individual. The optimization process begins by creating and initializing parameters such as the population size and the upper and lower bounds. The universes are then randomly initialized within the upper and lower bounds, and the fitness value of each universe is computed as its inflation rate. In each iteration, objects tend to migrate through white and black holes from universes with high inflation rates to universes with low inflation rates. In addition, objects in every universe move randomly toward the best universe through wormholes. When the run completes, an optimal universe has been formed.
Further, an abnormal intrusion detection model based on feature selection and the evolutionary neural network is established, wherein S3 specifically includes:
S31: inputs from the test data set are fed into the trained ANN;
S32: based on the expected output, the ANN test procedure takes the closest match to a target class as the estimated output, thereby identifying anomalous data in the network traffic.
Further, S31 specifically includes:
the goal of optimizing the neural network with the training method is to find its synaptic weights and to reduce the mean square error (MSE), which serves as the cost function of the network. The MVO algorithm starts the optimization by generating a random population of solutions, in which each universe is one individual. The training method aims to find the correct weight values, reduce the error and achieve the highest classification accuracy.
Further, S32 specifically includes:
based on the expected output, the ANN test procedure takes the closest match to a target class as the estimated output. Every individual in the MVO is a vector containing the connection weights between the ANN layers, and the number of objects in each individual is calculated. In the proposed MVO training algorithm the MSE is the main cost function, and the anomalous data in the network traffic is obtained from the calculation result.
The intrusion detection method based on network abnormal traffic identification addresses the low efficiency of traditional intrusion detection methods when faced with large volumes of traffic data. Analysis of feature selection and neural network evolution shows that both can improve the performance of an intrusion detection system, and the two are combined to build an efficient anomaly-based intrusion detection system. Experimental results show that the method greatly improves the detection rate, false alarm rate and execution time for abnormal traffic, can process massive continuous traffic-monitoring data effectively, and has broad applicability.
Drawings
FIG. 1 is a flow chart of a method for detecting traffic anomaly intrusion;
FIG. 2 is a flow chart of an improved cuckoo search algorithm;
FIG. 3 is a simple architecture diagram of an artificial neural network;
FIG. 4 is a comparison graph of recognition accuracy for different algorithms;
FIG. 5 is a comparison graph of time efficiency for different algorithms.
Detailed Description
Referring to FIG. 1, the intrusion detection method based on network abnormal traffic identification is described in detail and includes the following steps:
1) The optimal features are separated from the data set.
(1.1) CSA, the improved cuckoo search algorithm, mimics the natural behavior of certain cuckoo species that are obligate brood parasites, i.e. that lay their eggs in the nests of host birds. Researchers define CSA through three rules so that it can be implemented as a computer algorithm:
a) the best nests, which hold high-quality eggs, are carried over to the next generation;
b) the number of available host nests is fixed, and the host bird discovers an egg laid by a cuckoo with probability Pa ∈ (0,1);
c) when this happens, the host either removes the egg or abandons the nest and builds a new one.
Under these rules, CSA is implemented so that a single egg in a nest represents a candidate solution. A cuckoo may therefore lay only one egg in a nest, although in general each nest may hold several eggs, i.e. several solutions. CSA generates new and potentially better solutions to replace poor solutions in the current population. The quality of a solution is evaluated by the objective function of the problem, which is to be maximized. The parameter Pa is called the switching probability; it determines when the worst host nests are replaced by new, randomly generated nests. This parameter balances the two parts of the CSA process, exploration and exploitation: excessive exploitation may lead to premature convergence, while excessive exploration may slow convergence. When a new solution is generated for cuckoo i, a Lévy flight is performed using equation (1).
where α > 0 is the step size, which is set according to the scale of the problem; in most cases α = 1 can be used. The symbol ⊕ denotes entry-wise multiplication. A Lévy flight is essentially a random walk whose step lengths are drawn from the heavy-tailed Lévy distribution given in equation (2), which has infinite mean and infinite variance.
$$\mathrm{Levy} \sim u = t^{-\lambda} \tag{2}$$
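Equation (1) is not reproduced in this text. As a minimal sketch, the update can be assumed to take the standard cuckoo search form x_i(t+1) = x_i(t) + α ⊕ Lévy(λ). The step generator below uses Mantegna's algorithm, which is a common way to draw heavy-tailed steps and is an assumption here, not something the description specifies. The names `levy_step`, `levy_flight` and the default `beta=1.5` are hypothetical.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step length with Mantegna's algorithm (assumed here)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_flight(position, alpha=1.0, beta=1.5, rng=random):
    """Assumed form of equation (1): new x = x + alpha * Levy step, entry by entry."""
    return [x + alpha * levy_step(beta, rng) for x in position]
```

Most steps drawn this way are short, with occasional long jumps, which is what lets the search leave a local region without giving up local refinement.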
These three rules define the key steps of the CSA technique as used in the efficient anomaly-based intrusion detection method built on feature selection and the evolutionary neural network.
(1.2) Because CSA is used to generate new and potentially better solutions, the objective function plays an important role in evaluating solutions and replacing them with other existing ones. The fitness function (evaluation function) expresses how close a given solution is to the optimal solution of the problem. Fuzzy C-means (FCM) clustering is used as the objective function here because of its advantages. According to FCM, a data set x is divided into C clusters according to the characteristics of the data, and the data must first be fuzzified. The fuzziness is defined by the membership function μ, given by equation (3):
A widely used objective function for fuzzy c-means clustering is the weighted within-group sum of squared errors, which defines a constrained optimization problem as described in equation (4):
where $m$ ($1 \le m < \infty$) is an arbitrary real number greater than 1, $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, $x_i$ is the $i$-th data record, $c_j$ is the center of cluster $j$, and $\lVert x_i - c_j \rVert$ is any norm expressing the similarity between a measured data point and the center. Fuzzy partitioning is carried out by repeatedly optimizing the objective function, updating the memberships $u_{ij}$ and the cluster centers $c_j$ with equations (5) and (6), respectively:
The goal of cluster optimization is to find the optimal cluster centroids through an iterative process. Using FCM as the objective function of the cuckoo search fitness evaluation improves the location of the optimal centroids in the search result and is used to evaluate the fitness of the nests in the randomly generated population. First, the cuckoo algorithm selects several features from the full feature set. Then, the suitability of each selection is determined with the FCM algorithm. The application of FCM within cuckoo search is shown in FIG. 2.
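Equations (4) to (6) are not reproduced in this text. The sketch below assumes the textbook fuzzy c-means forms: the objective is the sum over i and j of u_ij^m · ‖x_i − c_j‖², the membership update is u_ij = 1 / Σ_k (‖x_i − c_j‖ / ‖x_i − c_k‖)^(2/(m−1)), and each center c_j is the u_ij^m-weighted mean of the records. The function names and the Euclidean norm are assumptions made for illustration.

```python
import math


def fcm_memberships(data, centers, m=2.0):
    """u[i][j]: degree to which record i belongs to cluster j (assumed eq. 5)."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in data:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # A record lying exactly on a center belongs fully to that cluster.
            hit = dists.index(0.0)
            u.append([1.0 if j == hit else 0.0 for j in range(len(centers))])
        else:
            u.append([1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists])
    return u


def fcm_centers(data, u, m=2.0):
    """c_j: membership-weighted mean of the records (assumed eq. 6)."""
    dim = len(data[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, data)) / total for d in range(dim)]
        )
    return centers


def fcm_objective(data, centers, u, m=2.0):
    """Weighted within-group sum of squared errors (assumed eq. 4)."""
    return sum(
        (u[i][j] ** m) * math.dist(x, c) ** 2
        for i, x in enumerate(data)
        for j, c in enumerate(centers)
    )
```

Alternating `fcm_memberships` and `fcm_centers` lowers `fcm_objective` step by step, so the final objective value can serve as the error that the cuckoo search fitness evaluation compares between candidate feature subsets.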
(1.3) Because the CS algorithm converges quickly in some cases, it explores only a small part of the search space. The method should therefore enlarge the search space and reach the optimal solution while still guaranteeing convergence.
Thus, in the present invention, a genetic-algorithm mutation operator is used to enlarge the search space and diversify the solutions. To this end, mutation is combined with the cuckoo search algorithm in two steps:
a) random selection of nests;
b) random selection of eggs.
Conventional CS considers only one egg per nest at a time and uses a Lévy flight. Mutation is defined as "altering one or more gene values on a chromosome from its initial state in a genetic algorithm", which may add entirely new gene values to the gene pool. Mutation is also described as an important component of genetic search because it helps to prevent the population from stagnating at a local optimum. As described above, a cuckoo's egg mimics the eggs of the host bird. To reflect this behavior, the algorithm includes a mutation operator that models the mutation of cuckoo egg genes in order to improve their productivity. With this strategy, high-quality eggs are retained and low-quality eggs are rejected. During mutation, a randomly selected new cuckoo egg replaces the old one when it is better, which ensures that the best candidate solution is carried into the next generation. To enrich the diversity of the population, nests are selected at random for mutation as shown in equation (7).
$$M_i = X_i + r\,(X_{best} - X_{worst}) \tag{7}$$
where $X_i$ is the position of the $i$-th nest, $X_{best}$ and $X_{worst}$ are the best and worst individuals in the current generation, and $r$ is a random number in the range 0 to 1. Mutation combines the best and worst individuals to maintain population diversity. If the mutated individual is better, it replaces the current individual, producing a new cuckoo population $x = (x_1, \dots, x_d)^T$.
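A minimal sketch of the mutation step of equation (7) follows. It assumes that nests are lists of real numbers and that a lower fitness value is better; both are illustrative choices, and `mutate_nest` is a hypothetical name.

```python
import random


def mutate_nest(nests, fitness, index, rng=random):
    """Apply M_i = X_i + r * (X_best - X_worst) and keep the better of the two."""
    ranked = sorted(nests, key=fitness)  # assumption: lower fitness value is better
    best, worst = ranked[0], ranked[-1]
    r = rng.random()  # r in [0, 1)
    current = nests[index]
    mutant = [x + r * (b - w) for x, b, w in zip(current, best, worst)]
    # Greedy replacement: the mutant survives only if it improves on the current nest.
    return mutant if fitness(mutant) < fitness(current) else current
```

Because the replacement is greedy, the step never makes the selected nest worse, which matches the statement that the mutated individual replaces the current one only when it is better.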
For performance evaluation, the accuracy obtained with the selected attributes is measured. The improved cuckoo search algorithm based on mutation and fuzzy c-means overcomes the shortcomings of traditional cuckoo search and evaluates feature quality well in feature subset selection. The feature selection process starts by creating and initializing parameters such as the number of generations, the number of nests and Pa, and a cuckoo is generated randomly by Lévy flight. The error function is then computed using fuzzy c-means clustering. Next, the nests are built and the best one among them is selected, and eggs are placed in the prepared nests. Pa is evaluated and the nests are changed randomly by mutation. The best solution is retained and the best nest is reported according to the error value and the classification accuracy. The selected features are then used as input to the classification stage, which is performed by the MVO-ANN.
2) Feature training on the input dimensions based on MVO-ANN, as shown in FIG. 3.
(2.1) The selected features are sent to the ANN module as training features for the input dimensions.
(2.1.1) The module is designed as a multilayer perceptron (MLP) with a single input layer, one hidden layer and an output layer. Because anomaly-based detection classifies the data as either attack or normal, the well-known MLP with binary classification and one hidden layer is selected.
(2.1.2) The artificial neural network consists of highly interconnected parallel processing elements and maps multiple inputs to a set of desired outputs. ANN-MLP classification is supervised learning with known class labels. The MLP is a feedforward artificial neural network comprising at least three layers: input, hidden and output. Each input is multiplied by its connection weight in the network, and the products are added together by a weighted summation function.
(2.1.3) The result of the summation function is passed through a transfer (activation) function. FIG. 3 shows a simple structure of the artificial neural network. The summation is computed from the products of the inputs and their weights plus an additive bias, as shown in equation (8).
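Equation (8) is not reproduced in this text. The sketch below assumes the usual neuron model, a weighted sum of the inputs plus a bias passed through an activation function. The sigmoid activation and the function names are assumptions, since the description does not name a specific transfer function.

```python
import math


def sigmoid(z):
    """Numerically stable logistic activation (the choice of sigmoid is an assumption)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(s)


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer and a single output neuron, as in a binary attack/normal MLP."""
    hidden = [neuron_output(inputs, w, b) for w, b in zip(hidden_weights, hidden_biases)]
    return neuron_output(hidden, output_weights, output_bias)
```

An output above a chosen threshold (0.5 would be a natural but assumed choice) could then be read as "attack" and anything below as "normal".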
Back propagation (BP) is a widely used algorithm for training an ANN in supervised mode. However, the structure of an artificial neural network involves numerous parameters, which is a major difficulty when implementing such systems. In particular, a BP-ANN may fall into local minima, which harms its ability to fit the ANN structure properly. Therefore, instead of back propagation, the multi-verse optimizer (MVO) is used to adjust the ANN weights and minimize the error function.
(2.2) Training is performed by passing the weights to the MVO module.
(2.2.1) The MVO algorithm is built on three cosmological concepts: white holes, black holes and wormholes, which form part of a universe. These three concepts are the idea behind the MVO algorithm, which simulates the dynamics of, and interactions between, universes through black holes, white holes and wormholes. In MVO, the inflation rate corresponds to the fitness, and the term "time" corresponds to the iteration. The following rules apply throughout the optimization process:
First, the higher the inflation rate of a universe, the more likely it is to contain a white hole.
Second, the lower the inflation rate, the more likely the universe is to contain a black hole.
Third, objects in any universe may move randomly toward the best universe through wormholes.
(2.2.2) When objects are exchanged between universes, a universe with a higher inflation rate sends objects to universes with lower inflation rates. A universe with a lower inflation rate thus acquires additional objects from better universes, which moves it toward a stable state and a higher inflation rate. During optimization the universes are ranked by inflation rate, and roulette-wheel selection picks one of them to hold the white hole.
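Roulette-wheel selection, which the description uses to pick the white-hole universe, can be sketched as follows. The weights are assumed to be non-negative scores in which a larger value marks a better universe; how the inflation rates are turned into such scores is not specified in the text, and `roulette_wheel` is a hypothetical name.

```python
import random


def roulette_wheel(weights, rng=random):
    """Return an index with probability proportional to its weight."""
    total = sum(weights)
    pick = rng.random() * total
    running = 0.0
    for index, w in enumerate(weights):
        running += w
        if pick < running:
            return index
    return len(weights) - 1  # guard against floating-point round-off
```

With weights `[1, 3]`, index 1 is returned about three times as often as index 0, so better universes donate objects more often without the worse ones being excluded entirely.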
The travelling distance rate (TDR) and the wormhole existence probability (WEP) are the two main coefficients, given by equations (9) and (10), respectively.
where p is the exploitation factor, l is the current iteration and L is the maximum number of iterations. WEP and TDR are updated at every iteration to achieve greater accuracy in exploitation (local search) around the best universe obtained so far.
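Equations (9) and (10) are not reproduced in this text. The sketch below assumes the forms commonly given for MVO: WEP = min + l · (max − min) / L and TDR = 1 − l^(1/p) / L^(1/p). The default values 0.2, 1.0 and p = 6 are assumptions taken from common usage, not from this description.

```python
def wep(iteration, max_iter, wep_min=0.2, wep_max=1.0):
    """Wormhole existence probability, assumed to grow linearly with the iteration."""
    return wep_min + iteration * (wep_max - wep_min) / max_iter


def tdr(iteration, max_iter, p=6.0):
    """Travelling distance rate; p is the exploitation factor."""
    return 1.0 - iteration ** (1.0 / p) / max_iter ** (1.0 / p)
```

Under these assumed forms WEP rises linearly while the TDR value falls toward zero, so wormhole jumps around the best universe become more frequent and progressively shorter as the run proceeds.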
(2.2.3) The general steps of the MVO algorithm are as follows:
a) Initialize the MVO parameters, including the lower bound lb and the upper bound ub.
b) Create a set of random universes within lb and ub.
c) Calculate the inflation rate (fitness) of each universe.
d) Calculate the WEP value.
e) Exchange objects between universes.
f) Move the objects in each universe toward the best universe.
g) If the end condition is not satisfied, go to step c).
h) Return the best universe found so far.
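Steps a) to h) can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the patented implementation: it minimizes a cost (lower cost stands for a better inflation rate), uses the rank of a universe as its chance of receiving objects, and assumes the commonly cited WEP and TDR forms with the constants 0.2, 1.0 and p = 6. The name `mvo_minimize` and all defaults are hypothetical.

```python
import random


def mvo_minimize(cost, dim, lb, ub, n_universes=20, max_iter=100, seed=0):
    """Minimise `cost` over [lb, ub]^dim with a simplified multi-verse optimizer."""
    rng = random.Random(seed)
    # a), b): random universes inside the bounds
    universes = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_universes)]
    best, best_cost = None, float("inf")

    def evaluate(population):
        nonlocal best, best_cost
        costs = [cost(u) for u in population]
        i = min(range(len(population)), key=costs.__getitem__)
        if costs[i] < best_cost:
            best, best_cost = list(population[i]), costs[i]
        return costs

    for it in range(1, max_iter + 1):
        costs = evaluate(universes)  # c) fitness of every universe
        order = sorted(range(n_universes), key=costs.__getitem__)
        wep = 0.2 + it * (1.0 - 0.2) / max_iter  # d) assumed WEP form
        tdr = 1.0 - it ** (1 / 6) / max_iter ** (1 / 6)  # assumed TDR form
        worst_cost = costs[order[-1]]
        donor_weights = [worst_cost - c + 1e-12 for c in costs]  # low cost, high weight
        next_universes = []
        for rank, i in enumerate(order):
            u = list(universes[i])
            for j in range(dim):
                # e) worse-ranked universes receive objects from a white hole more often
                if rng.random() < rank / n_universes:
                    donor = rng.choices(range(n_universes), weights=donor_weights)[0]
                    u[j] = universes[donor][j]
                # f) wormhole: move the object next to the best universe
                if rng.random() < wep:
                    jump = tdr * ((ub - lb) * rng.random() + lb)
                    u[j] = best[j] + jump if rng.random() < 0.5 else best[j] - jump
                    u[j] = min(max(u[j], lb), ub)
            next_universes.append(u)
        universes = next_universes  # g) loop until max_iter is reached
    evaluate(universes)
    return best, best_cost  # h) best universe found so far
```

For ANN training, `cost` would be the MSE of the network decoded from a universe, and `dim` the number of weights and biases in that network.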
(2.3) A final fitness value, measured on the training data set, is returned for each individual. The optimization process begins by creating and initializing parameters such as the population size and the upper and lower bounds. The universes are then randomly initialized within the upper and lower bounds, and the fitness value of each universe is computed as its inflation rate. In each iteration, objects tend to migrate through white and black holes from universes with high inflation rates to universes with low inflation rates. In addition, objects in every universe move randomly toward the best universe through wormholes. When the run completes, an optimal universe has been formed.
3) An abnormal intrusion detection model based on feature selection and the evolutionary neural network is established.
(3.1) Inputs from the test data set are fed into the trained ANN.
(3.1.1) The goal of optimizing the neural network with the training method is to find its synaptic weights and to reduce the MSE, which serves as the cost function of the network. The MVO algorithm starts the optimization by generating a random population of solutions, in which each universe is one individual.
(3.1.2) The scale of the problem determines the size of a solution; the representation and design of the MVO individuals are important considerations in ANN training.
(3.1.3) Each individual in ANN training encodes all the weights and biases of the ANN structure. The training method aims to find the correct values, reduce the error and achieve the highest classification accuracy.
(3.2) Based on the expected output, the ANN test procedure takes the closest match to a target class as the estimated output.
(3.2.1) Every individual in the MVO is a vector containing the connection weights between the ANN layers. The number of objects in each individual is calculated as shown in equation (11):
$$\mathrm{Indv}_{nbr} = (n \times m) + (2 \times m) + 1 \tag{11}$$
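The text does not define n and m. A reading consistent with the single-hidden-layer, single-output MLP described earlier is that n is the number of input features and m the number of hidden neurons: n × m input-to-hidden weights, m hidden biases, m hidden-to-output weights and one output bias. The sketch below uses that assumption, and the hidden-layer size in the comment is hypothetical.

```python
def individual_length(n_inputs, n_hidden):
    """Number of objects in one MVO individual, following equation (11)."""
    input_to_hidden = n_inputs * n_hidden  # n * m weights
    hidden_terms = 2 * n_hidden  # m hidden biases + m hidden-to-output weights
    output_bias = 1
    return input_to_hidden + hidden_terms + output_bias


# With 22 selected features and a hypothetical hidden layer of 10 neurons:
# 22 * 10 + 2 * 10 + 1 = 241 objects per individual.
```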
(3.2.2) In the proposed MVO training algorithm, the MSE is the main cost function. The MSE is calculated using equation (12):
where input denotes the actual network traffic data input, output denotes the approximate (estimated) output value, and $T_n$ is the number of samples collected in the traffic data set.
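Equation (12) is not reproduced in this text. The sketch below assumes the usual mean square error, the average of the squared differences between the expected label and the network output over all samples, and shows how one flat MVO individual might be decoded into MLP weights before the error is computed. The vector layout, the sigmoid activation and all function names are assumptions.

```python
import math


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode(vector, n_inputs, n_hidden):
    """Split a flat individual into weights and biases (layout is an assumption)."""
    k = n_inputs * n_hidden
    w_hidden = [vector[i * n_inputs:(i + 1) * n_inputs] for i in range(n_hidden)]
    b_hidden = vector[k:k + n_hidden]
    w_out = vector[k + n_hidden:k + 2 * n_hidden]
    b_out = vector[k + 2 * n_hidden]
    return w_hidden, b_hidden, w_out, b_out


def predict(vector, x, n_hidden):
    """Forward pass of the decoded single-hidden-layer network for one sample."""
    w_hidden, b_hidden, w_out, b_out = decode(vector, len(x), n_hidden)
    hidden = [
        _sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(w_hidden, b_hidden)
    ]
    return _sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def mse_cost(vector, samples, labels, n_hidden):
    """Mean square error of the decoded network over the data set."""
    errors = [(y - predict(vector, x, n_hidden)) ** 2 for x, y in zip(samples, labels)]
    return sum(errors) / len(errors)
```

A function such as `mse_cost` is what an optimizer would call for each universe, so that a lower error corresponds to a better inflation rate.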
The inventors studied and analyzed an intrusion detection method based on network abnormal traffic identification (CSA & MVO-ANN). First, an optimal feature subset is selected using an improved cuckoo search algorithm (CSA); the subset is then input into the MVO-ANN model for training; finally, abnormal traffic information is identified using the resulting detection model. Verification was carried out on the well-known NSL-KDD data set: 22 of its 41 features were selected as the optimal feature subset and input into the MVO-ANN model to evaluate the performance of the anomaly intrusion detection system. To demonstrate the effectiveness of the CSA & MVO-ANN method, its detection of abnormal traffic was compared, on data sets of different proportions, with that of Random Forest (RF), the LM-BP algorithm, the Support Vector Machine (SVM), and Pigeon-Inspired Optimization (PIO), measuring detection accuracy and time efficiency respectively. The comparison of identification accuracy across the algorithms is shown in FIG. 4, and the comparison of time efficiency is shown in FIG. 5. The results show that the method outperforms the above algorithms, giving better and more stable results in both detection accuracy and execution time.
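The feature-selection stage summarized above can be pictured as a search over binary masks of the 41 NSL-KDD features. The following sketch illustrates only the mutation idea stated in claim 3 (random selection of a nest, then random selection of an egg). The representation is an assumption: each nest is taken to be one candidate feature mask, each egg one bit of that mask, and mutation a bit flip. The function names are hypothetical, and the masks are random placeholders, not the 22 features actually selected.

```python
import numpy as np

def mutate_nests(nests, n_mutations, rng):
    """Return a copy of the nests with n_mutations random bit flips.

    nests: integer array of shape (n_nests, n_features) holding 0/1 masks.
    """
    mutated = nests.copy()
    n_nests, n_features = mutated.shape
    for _ in range(n_mutations):
        nest = rng.integers(n_nests)       # step 1: pick a nest at random
        egg = rng.integers(n_features)     # step 2: pick an egg (feature bit)
        mutated[nest, egg] = 1 - mutated[nest, egg]
    return mutated

def select_features(X, mask):
    """Keep only the columns of X whose mask bit is 1."""
    return X[:, np.asarray(mask).astype(bool)]

# Placeholder population: 10 nests over 41 features.
rng = np.random.default_rng(0)
nests = rng.integers(0, 2, size=(10, 41))
mutated = mutate_nests(nests, n_mutations=1, rng=rng)
```

In a complete method, each mutated mask would be scored by the classification error obtained on the selected columns, and the best nests would be retained.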
Claims (10)
1. An intrusion detection method based on network abnormal traffic identification, characterized by comprising the following steps:
S1, performing feature selection using an improved cuckoo search algorithm, and screening the most accurate and effective features from the original data set to obtain a set of optimal features;
S2, taking the selected features as the input of an evolutionary neural network and, in order to overcome the parameter limitations of the artificial neural network and avoid becoming trapped in local minima, training the artificial neural network with a multi-verse optimization algorithm to obtain the optimal classification effect;
S3, inputting the test data set into the trained artificial neural network, predicting and evaluating abnormal traffic detection, and obtaining the abnormal data information in the network traffic.
2. The method according to claim 1, wherein the S1 specifically includes:
S11, ranking the solutions using the cuckoo search algorithm, and finding the current best result in post-processing and visualization;
S12, dividing the data set x into C clusters according to the characteristics of the fuzzy data in the data set, building the nests, selecting the best of the C clusters, and placing the eggs in the nests.
3. The intrusion detection method based on the network abnormal traffic identification according to claim 2, wherein the S12 specifically includes:
the best solution is retained while nests are randomly changed by mutation, and the best nests are reported according to their error values and classification accuracy. Genetic-algorithm mutation operators are used to widen the search space and produce more diverse solutions. To this end, mutation is combined with the cuckoo search algorithm in two steps: random selection of nests and random selection of eggs.
4. The intrusion detection method based on network abnormal traffic identification according to any one of claims 1 to 3, wherein the S2 specifically includes:
S21, sending the selected features to the ANN module as training features, which determine the dimension of the input information;
S22, performing the training by passing the weights to a Multi-Verse Optimizer (MVO) module;
S23, returning a final fitness value for each individual, the final fitness value being measured on the training data set.
5. The intrusion detection method based on the network abnormal traffic identification according to claim 4, wherein the S21 specifically includes:
the module is designed as a multilayer perceptron (MLP) whose architecture has a single input layer, a hidden layer, and an output layer. An artificial neural network comprises highly interconnected parallel processing components and maps multiple inputs to a set of desired outputs. ANN-MLP classification is supervised learning with known class labels. The result of the summation function is passed through a transfer (activation) function.
6. The intrusion detection method based on the network abnormal traffic identification according to claim 5, wherein the S22 specifically includes:
the MVO algorithm is built on three cosmological concepts: white holes, black holes, and wormholes, which form part of the universe. These three concepts constitute the idea behind the MVO algorithm, which simulates the movement and interaction of universes through black holes, white holes, and wormholes. During the optimization process, the universes are ranked according to their inflation rates, and a roulette wheel is used to select the universe that acts as a white hole.
7. The intrusion detection method based on the network abnormal traffic identification according to claim 6, wherein the S23 specifically includes:
the optimization process begins by creating and initializing parameters such as the population size and the upper and lower bounds. The universes are then randomly initialized within the upper and lower bounds. The fitness value of each universe is calculated and represents its inflation rate. Then, in each iteration, objects in universes with high inflation rates tend to migrate through white-hole and black-hole tunnels to universes with low inflation rates. In addition, objects in any universe may randomly move toward the best universe through wormholes. Finally, the optimal universe is obtained when the run completes.
8. The intrusion detection method based on network abnormal traffic identification according to any one of claims 1 to 7, wherein an anomaly intrusion detection model based on feature selection and the evolutionary neural network is established, and the S3 specifically includes:
S31, inputting samples from the test data set into the trained ANN;
S32, based on the expected output, taking the target class that most closely matches in the ANN test procedure as the estimated output, so as to identify the anomalous data information in the network traffic.
9. The method according to claim 8, wherein the S31 specifically includes:
the goal of training the neural network is to find the synaptic weights of the neural network that reduce the MSE, which serves as the cost function of the neural network. The MVO algorithm starts the optimization process by generating a random solution population in which each universe is treated as one individual. The training method aims to find the values that reduce the error and give the maximum classification accuracy.
10. The method according to claim 9, wherein the S32 specifically includes:
based on the expected output, the ANN test procedure takes the target class that most closely matches as the estimated output; every individual in the MVO is a vector containing the connection weights between the ANN layers, and the number of objects in each individual is calculated; in the proposed MVO training algorithm, the MSE is mainly used as the cost function; and the abnormal data information in the network traffic is obtained from the calculation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110330430.XA CN113162919A (en) | 2021-03-22 | 2021-03-22 | Intrusion detection method based on network abnormal flow identification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113162919A (en) | 2021-07-23 |
Family
ID=76885812
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110330430.XA Pending CN113162919A (en) | 2021-03-22 | 2021-03-22 | Intrusion detection method based on network abnormal flow identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113162919A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113660273A (en) * | 2021-08-18 | 2021-11-16 | 国家电网公司东北分部 | Intrusion detection method and device based on deep learning under super-fusion framework |
CN115174170A (en) * | 2022-06-23 | 2022-10-11 | 东北电力大学 | VPN encrypted flow identification method based on ensemble learning |
CN115174170B (en) * | 2022-06-23 | 2023-05-09 | 东北电力大学 | VPN encryption flow identification method based on ensemble learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||