CN112351033A - Deep learning intrusion detection method based on double-population genetic algorithm in industrial control network - Google Patents


Publication number
CN112351033A
Authority
CN
China
Legal status
Granted
Application number
CN202011228033.3A
Other languages
Chinese (zh)
Other versions
CN112351033B (en)
Inventor
刘学君
张小妮
王昊
晏勇
沙芸
曹雪莹
李凯丽
孔祥旻
陈建萍
王文晖
Current Assignee
Beijing Institute of Petrochemical Technology
Original Assignee
Beijing Institute of Petrochemical Technology
Application filed by Beijing Institute of Petrochemical Technology
Priority to CN202011228033.3A
Publication of CN112351033A
Application granted
Publication of CN112351033B
Current legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to a deep learning intrusion detection method based on a double-population genetic algorithm in an industrial control network. The method predicts whether intrusion behavior exists in the industrial control network by means of a newly constructed industrial control network intrusion detection model. The model combines a double-population genetic algorithm, an annealing algorithm, a selection strategy based on population communication, a hash dictionary storage strategy and an elite strategy, integrating the functions of these algorithms and optimization strategies to obtain an improved deep neural network model.

Description

Deep learning intrusion detection method based on double-population genetic algorithm in industrial control network
Technical Field
The invention relates to the technical field of signal detection, in particular to a deep learning intrusion detection method based on a double-population genetic algorithm in an industrial control network.
Background
Network security threats to industrial control systems in industries such as the petrochemical industry are becoming increasingly severe, and traditional network intrusion detection algorithms are difficult to apply directly to industrial control systems. In recent years, deep learning, with advantages such as high recognition rates and strong adaptability, has found good application prospects in industrial control network security. However, the parameters of deep neural networks are usually chosen empirically, which cannot meet the demands of complex industrial control intrusion detection.
Industrial control networks differ from the conventional Internet, so network intrusion detection algorithms developed for the Internet are difficult to apply directly to industrial control scenarios. In an industrial control network, any abnormal intrusion behavior directly affects real field control and decision-making, so intrusion detection must achieve both a low missed-detection (false negative) rate and a low false-alarm (false positive) rate. Building a rule-based or model-based intrusion detection method requires a deep understanding of the specific system, and such methods are not portable, so an accurate and effective model is difficult to obtain. Deep learning methods for intrusion detection in industrial control networks have produced some research results, but constructing such models involves many hyper-parameters, and their detection performance is difficult to exploit fully. At present, the hyper-parameters of intrusion detection algorithms built on deep neural networks are usually chosen empirically, so the missed-detection rate and the false-alarm rate cannot be optimized simultaneously, and the requirements of real scenarios cannot be met.
Disclosure of Invention
In view of this, the present invention provides a deep learning intrusion detection method based on a dual population genetic algorithm in an industrial control network to overcome the defects of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme: a deep learning intrusion detection method based on a double-population genetic algorithm in an industrial control network comprises the following steps:
reading data;
preprocessing the data;
constructing a novel industrial control network intrusion detection model by using an improved double-population genetic algorithm;
and predicting whether the industrial control network has intrusion behavior by using the novel industrial control network intrusion detection model so as to obtain a prediction result.
Optionally, the preprocessing the data includes:
selecting data characteristics to determine a data set;
dividing a training set, a verification set and a test set for a data set;
carrying out Min-Max normalization or Z-Score normalization processing on the divided data set;
and carrying out One-Hot coding on the labels of the various processed data sets.
Optionally, constructing a novel industrial control network intrusion detection model includes:
determining an optimal solution by adopting an improved double-population genetic algorithm;
and putting the optimal solution into a deep neural network model to obtain a novel industrial control network intrusion detection model.
Optionally, the determining an optimal solution by using an improved double-population genetic algorithm includes:
randomly generating an initial population;
dividing the initial population into two sub-populations;
selecting the elite individuals of each of the two sub-populations according to the elitist strategy, and removing the elite individuals from their respective sub-populations;
dividing each of the two sub-populations, with its elite individuals removed, into communication individuals and a population for selection;
carrying out selection operation, cross operation and mutation operation on the two sub-populations;
and combining the elite individuals, the communication individuals and the mutated individuals of the two sub-populations respectively.
Optionally, the selecting operation performed on the two sub-populations includes:
applying a tournament selection strategy to the population for selection in the first sub-population, and a roulette selection strategy to the population for selection in the second sub-population.
Optionally, the crossover operation includes:
combining the communication individuals of the first sub-population and the communication individuals of the second sub-population into one population;
randomly crossing all individuals in the combined population, with the crossover rate set to 1;
and evenly dividing the population obtained after crossover into two parts.
Optionally, the mutation operation includes: applying an annealing algorithm to the genetic algorithm to vary the crossover rate and the mutation rate;
specifically, the algorithm starts with a higher mutation rate and crossover rate, and then gradually decreases both as the number of iterations increases.
Optionally, the determining an optimal solution by using an improved double-population genetic algorithm further includes:
calculating the fitness value of individuals in the population;
the calculating of the fitness value of the individuals in the population specifically comprises:
and putting each individual in the population into the deep neural network model, and calculating the AUC of the model to be used as the fitness value of the population individual.
Optionally, the determining an optimal solution by using an improved double-population genetic algorithm further includes:
looking up each individual of the population in a fitness hash table;
judging whether the fitness value of the population individual exists in the fitness hash table;
if so, judging whether the current iteration has reached the maximum number of iterations;
if the current iteration has reached the maximum number of iterations, ending the iteration to obtain the highest fitness value, the individual corresponding to the highest fitness value being the optimal solution;
if the current iteration has not reached the maximum number of iterations, using the combined population as the initial population of the next generation and performing the division, selection, crossover and mutation operations again until the maximum number of iterations is reached;
and if the fitness value of the population individual does not exist in the fitness hash table, putting the individual into the deep neural network model and calculating its fitness value.
The invention also provides a controller for executing the deep learning intrusion detection method based on the double-population genetic algorithm in the industrial control network.
By adopting the technical scheme, the intrusion detection method realizes the prediction of whether the intrusion behavior exists in the industrial control network by adopting a novel industrial control network intrusion detection model constructed by an improved double-population genetic algorithm. The model combines a double-population genetic algorithm, an annealing algorithm, a selection strategy based on population communication, a Hash dictionary storage strategy and an elite strategy, organically integrates the functions of various algorithms and optimization strategies, and further obtains an improved deep neural network model (a novel industrial control network intrusion detection model).
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of an embodiment of a deep learning intrusion detection method based on a dual population genetic algorithm in an industrial control network according to the present invention;
FIG. 2 is a schematic flow chart of a deep learning intrusion detection method based on a dual population genetic algorithm in an industrial control network according to a second embodiment of the present invention;
FIG. 3 is a schematic flow diagram of a dual population genetic algorithm;
FIG. 4 is a schematic diagram of a dual population partitioning process;
FIG. 5 is a schematic diagram of population 1 and population 2 communication and population evolution processes;
FIG. 6 is a schematic diagram of population crossing;
FIG. 7 is a schematic illustration of population merging;
FIG. 8 is a line graph of experimental results of a conventional genetic algorithm and a dual population genetic algorithm.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Fig. 1 is a schematic flow chart provided by an embodiment of a deep learning intrusion detection method based on a dual population genetic algorithm in an industrial control network according to the present invention.
As shown in fig. 1, the deep learning intrusion detection method based on dual population genetic algorithms in an industrial control network according to this embodiment includes:
S11: reading data;
S12: preprocessing the data;
further, the preprocessing the data includes:
selecting data characteristics to determine a data set;
dividing a training set, a verification set and a test set for a data set;
carrying out Min-Max normalization or Z-Score normalization processing on the divided data set;
and carrying out One-Hot coding on the labels of the various processed data sets.
S13: constructing a novel industrial control network intrusion detection model by using an improved double-population genetic algorithm;
further, constructing a novel industrial control network intrusion detection model includes:
determining an optimal solution by adopting an improved double-population genetic algorithm;
and putting the optimal solution into a deep neural network model to obtain a novel industrial control network intrusion detection model.
S14: and predicting whether the industrial control network has intrusion behavior by using the novel industrial control network intrusion detection model so as to obtain a prediction result.
The intrusion detection method of this embodiment predicts whether intrusion behavior exists in the industrial control network by means of the constructed novel industrial control network intrusion detection model. The model combines a double-population genetic algorithm, an annealing algorithm, a selection strategy based on population communication, a hash dictionary storage strategy and an elite strategy, integrating the functions of these algorithms and optimization strategies to obtain an improved deep neural network model.
Fig. 2 is a schematic flow chart provided by a second embodiment of the deep learning intrusion detection method based on the double population genetic algorithm in the industrial control network according to the present invention.
As shown in fig. 2, the method for detecting deep learning intrusion based on dual population genetic algorithms in an industrial control network according to this embodiment includes:
S201: reading data;
S202: preprocessing the data;
S203: randomly generating an initial population;
S204: putting each individual in the population into a deep neural network model, and calculating the AUC of the model as the fitness value of that individual;
S205: dividing the population into two sub-populations;
S206: selecting the elite individuals of each of the two sub-populations according to the elitist strategy, and removing the elite individuals from their respective sub-populations;
S207: dividing each of the two sub-populations, with its elite individuals removed, into communication individuals and a population for selection, and performing selection on the two sub-populations with the selection strategy based on population communication;
S208: performing the population crossover operation;
S209: updating the crossover rate;
S210: performing the population mutation operation;
S211: updating the mutation rate;
S212: combining, for each of the two sub-populations, the elite individuals, the communication individuals and the mutated individuals;
S213: looking up each individual of the population in a fitness hash table;
S214: judging whether the fitness value of the population individual exists in the fitness hash table;
S215: if so, judging whether the current iteration has reached the maximum number of iterations;
S216: if the current iteration has reached the maximum number of iterations, ending the iteration to obtain the highest fitness value, the individual corresponding to the highest fitness value being the optimal solution;
otherwise, using the merged population as the initial population of the next generation and returning to step S205 to perform the division, selection, crossover and mutation operations again until the maximum number of iterations is reached;
S217: if the fitness value of the population individual does not exist in the fitness hash table, putting the individual into a deep neural network model, calculating its fitness value, judging whether the current iteration has reached the maximum number of iterations, and executing step S216;
S218: putting the optimal solution into a deep neural network model to obtain the novel industrial control network intrusion detection model;
S219: predicting whether the industrial control network has intrusion behavior by using the novel industrial control network intrusion detection model to obtain a prediction result.
When the method of this embodiment is performed, the first step is the data reading and preprocessing module. First, data features are selected in a simple way: features whose values are constant, or features that have little influence on the data according to their practical meaning, are deleted to save algorithm overhead. Second, the data set is divided into a training set, a validation set and a test set. Then Min-Max normalization or Z-Score normalization is applied to the divided data sets, with the choice between the two determined by the population individual. Finally, the labels of the divided data sets are One-Hot encoded.
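The preprocessing step can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the 60/20/20 split ratios, fitting each scaler on the training set only and then applying it to every split, and all function names are assumptions.

```python
import random


def drop_constant_features(rows):
    """Remove feature columns whose value never changes (they carry no information)."""
    keep = [j for j in range(len(rows[0])) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows]


def split_dataset(rows, train=0.6, val=0.2, seed=0):
    """Shuffle, then split into training, validation and test sets (ratios are assumptions)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train, n_val = int(len(rows) * train), int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def min_max_scaler(train_column):
    """Fit Min-Max scaling on training values; returns a function to apply to any split."""
    lo, hi = min(train_column), max(train_column)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return lambda column: [(v - lo) / span for v in column]


def z_score_scaler(train_column):
    """Fit Z-Score standardization (population standard deviation) on training values."""
    n = len(train_column)
    mean = sum(train_column) / n
    std = (sum((v - mean) ** 2 for v in train_column) / n) ** 0.5 or 1.0
    return lambda column: [(v - mean) / std for v in column]


def one_hot(labels, classes=None):
    """Encode class labels as One-Hot vectors in a fixed (sorted) class order."""
    classes = sorted(set(labels)) if classes is None else classes
    index = {c: i for i, c in enumerate(classes)}
    return [[1 if index[y] == i else 0 for i in range(len(classes))] for y in labels]
```

Fitting the scaler on the training split only keeps validation and test statistics from leaking into training; if Min-Max versus Z-Score is encoded as a gene, the decoded gene would simply pick which scaler to fit.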
The second step is the deep neural network module. The deep neural network is used in two places: to calculate the fitness values of the individuals in the genetic algorithm population, and to train, validate and test the final model once the final parameters have been obtained. In the first application, calculating the fitness value of an individual, the deep neural network uses the error backpropagation algorithm: for each training sample, an input example is presented to the input neurons, and the signal is propagated forward layer by layer until an output is produced; the error of the output layer is then calculated and propagated back to the hidden-layer neurons; finally, the weights and biases are adjusted according to these errors. The process is repeated until a termination condition is reached.
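A minimal pure-Python sketch of the error backpropagation loop just described, assuming a single hidden layer of sigmoid units, squared-error loss, per-sample updates and a fixed number of epochs as the termination condition. The class name `TinyMLP`, the learning rate and the layer sizes are hypothetical; in the method itself the network's hyper-parameters would be taken from the decoded chromosome.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer of sigmoid units and one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def forward(self, x):
        # Forward pass: the signal is propagated layer by layer to the output.
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
             for ws, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target):
        h, y = self.forward(x)
        # Output-layer error term for squared error with a sigmoid output.
        delta_out = (y - target) * y * (1.0 - y)
        # Backpropagate the error to each hidden neuron (before w2 is updated).
        delta_hidden = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        # Adjust weights and biases against the gradient.
        for j, hj in enumerate(h):
            self.w2[j] -= self.lr * delta_out * hj
        self.b2 -= self.lr * delta_out
        for j, dj in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * dj * xi
            self.b1[j] -= self.lr * dj

    def loss(self, data):
        """Mean squared error over (input, target) pairs."""
        return sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


def train(model, data, epochs):
    """Repeat per-sample backpropagation for a fixed number of epochs."""
    for _ in range(epochs):
        for x, target in data:
            model.train_step(x, target)
    return model
```

For example, training `TinyMLP(2, 3)` on the four input patterns of logical OR for a few thousand epochs should drive `loss` well below its initial value; a real fitness evaluation would instead train on the preprocessed training set and score the validation set.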
The third step is the dual-population genetic algorithm module, which is an important and innovative part of the model. Although the genetic algorithm performs well, a single-population selection scheme still leaves considerable room for improvement. A dual-population genetic algorithm framework is therefore proposed and improved with several optimization algorithms and strategies, so that the search remains fast and the optimal solution remains accurate even when the solution space is large.
The fourth step is the second application of the deep neural network: the optimal solution is put into the deep neural network model, the model is trained, validated and used for prediction on the data, and the resulting indexes of the intrusion detection model are obtained, analyzed and evaluated.
The key point is that the intrusion detection model based on the deep neural network is automatically and efficiently constructed by adopting the genetic algorithm, the quality of the genetic algorithm directly determines the efficiency and the accuracy of the model, and the dual-population genetic algorithm is explained in detail below.
The genetic algorithm is therefore innovated and optimized as follows. First, the traditional genetic algorithm is replaced by a dual-population genetic algorithm, which enriches the population individuals by increasing the number of populations. In addition, a selection strategy based on population exchange, an elite strategy and hash-based fitness storage are used as optimization strategies, and a simulated annealing algorithm is used as an optimization algorithm; all of them are combined into the dual-population genetic algorithm, yielding a new comprehensive dual-population genetic algorithm framework that replaces the conventional algorithm.
The dual-population genetic algorithm starts by generating an initial set of chromosomes, typically at random. It then divides the population into two sub-populations, each of which gradually evolves toward the optimal solution through operators modeled on natural evolution, such as selection, crossover and mutation. During evolution, the solutions produced are evaluated with the fitness function. The algorithm terminates when the number of generations reaches a set value or a satisfactory fitness level is reached; the implementation flow is shown in FIG. 3.
In the dual-population genetic algorithm framework, performing selection, crossover and mutation separately in two populations enhances the population diversity and global search capability of the genetic algorithm. In the traditional dual-population genetic algorithm, however, the operators applied in the two populations overlap heavily in what they do; to avoid this problem, a selection strategy based on population communication is adopted.
The selection strategy based on population exchange is an effective innovation for the dual-population algorithm. Integrating it into the dual-population algorithm remedies a weakness of the traditional dual-population genetic algorithm, namely its single selection strategy, and increases both the probability of producing high-quality individuals and the average fitness of the whole population. Some individuals of one population may enter the other population and breed offspring there, bringing new genes into it. Because of the uncertainty of these genes, this can strongly affect the receiving population, i.e., the population that receives new individuals may come to have better genes than either of the original two populations. Before the strategy based on population exchange is introduced, two prerequisite selection methods need to be described: roulette selection and tournament selection.
Roulette selection calculates, from each individual's fitness value, the probability that the individual appears in the offspring, and randomly selects individuals according to these probabilities to form the offspring population; when a maximization problem is solved, the fitness values can therefore be used directly as selection weights. Tournament selection draws a certain number of individuals from the population each time, selects the best of them for the offspring population, and repeats this until the new population reaches the original population size. Building on these, this embodiment introduces an improved method that combines the roulette and tournament algorithms, namely the selection strategy based on population exchange: one part of the individuals is selected by roulette and the other part by tournament, and on this basis the two populations communicate with each other, i.e., a small number of individuals move into the other population. This scheme is applied in every generation until the iterations end.
The key point of the model is an optimized dual-population genetic algorithm that obtains the optimal parameters for constructing the DNN model. The traditional genetic algorithm is replaced by the dual-population genetic algorithm, which is optimized with the selection strategy based on population communication, and hash storage of fitness values, the simulated annealing algorithm and elitism are integrated into one framework so that the overall model performs better. The implementation of the optimized dual-population algorithm is described in detail below.
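The two classic selection operators can be sketched as below. Roulette selection uses Python's `random.Random.choices` with the fitness values as weights, which realizes fitness-proportionate sampling with replacement; the tournament size of 3, the function names and the convention that `fitness` is a list aligned with `population` are assumptions.

```python
import random


def roulette_select(population, fitness, k, rng):
    """Fitness-proportionate (roulette-wheel) selection with replacement.

    Each pick chooses an individual with probability fitness / total fitness,
    so fitness values must be non-negative (true for AUC) and not all zero.
    """
    return rng.choices(population, weights=fitness, k=k)


def tournament_select(population, fitness, k, rng, tournament_size=3):
    """Tournament selection: repeatedly draw `tournament_size` distinct individuals
    at random and keep the fittest, until `k` individuals have been chosen."""
    chosen = []
    for _ in range(k):
        contenders = rng.sample(range(len(population)), tournament_size)
        winner = max(contenders, key=lambda i: fitness[i])
        chosen.append(population[winner])
    return chosen
```

In the population-exchange strategy, population 1 would call `tournament_select` on its selection pool and population 2 would call `roulette_select` on its own.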
(1) The dual-population genetic algorithm begins by randomly generating N individuals whose chromosomes represent potential solutions. Each chromosome is a 58-bit binary string representing one possible combination of the relevant parameters, and each parameter can be regarded as a gene in the chromosome, as shown in Table 1.
Table 1 Chromosome encoding (table image not available)
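A sketch of step (1), random initialization and decoding of 58-bit chromosomes. Because Table 1 is not reproduced, the gene names and bit widths in `GENE_LAYOUT` are hypothetical and chosen only so that they sum to 58; mapping each integer gene to an actual hyper-parameter value (for example a learning-rate range) is likewise left out.

```python
import random

# HYPOTHETICAL gene layout: Table 1 (the real encoding) is not available, so these
# names and widths are illustrative and only chosen to add up to 58 bits.
GENE_LAYOUT = [
    ("n_hidden_layers", 2),
    ("units_layer_1", 8),
    ("units_layer_2", 8),
    ("units_layer_3", 8),
    ("units_layer_4", 8),
    ("activation", 2),
    ("optimizer", 2),
    ("learning_rate", 10),
    ("batch_size", 6),
    ("dropout", 3),
    ("normalization", 1),  # e.g. 0 = Min-Max, 1 = Z-Score (assumption)
]
CHROMOSOME_BITS = sum(bits for _, bits in GENE_LAYOUT)


def random_population(n, rng):
    """Randomly generate n chromosomes as binary strings (hashable, so usable as dict keys)."""
    return ["".join(rng.choice("01") for _ in range(CHROMOSOME_BITS)) for _ in range(n)]


def decode(chromosome):
    """Split a chromosome into its genes and read each gene as an unsigned integer."""
    if len(chromosome) != CHROMOSOME_BITS:
        raise ValueError(f"expected {CHROMOSOME_BITS} bits, got {len(chromosome)}")
    genes, pos = {}, 0
    for name, bits in GENE_LAYOUT:
        genes[name] = int(chromosome[pos:pos + bits], 2)
        pos += bits
    return genes
```

Storing chromosomes as strings rather than lists keeps them immutable and hashable, which matters for the fitness hash table used later.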
(2) The initial population is divided according to the dual-population idea and combined with elitism. The idea of elitism is that the best solution found at some intermediate step may be lost when crossover and mutation create a new generation. Therefore, when a new generation is generated, the current best solutions are copied into it unchanged, and each subsequent generation proceeds in the same way. Elitism can greatly speed up the computation because it prevents excellent solutions that have already been found from being lost.
The initial population is divided into two sub-populations, 1 and 2, and elitism is applied at this point: the several best-performing individuals in sub-populations 1 and 2 are selected as elite individuals and removed from their sub-populations. Sub-population 1 with its elite individuals removed is recorded as population 1, and sub-population 2 with its elite individuals removed is recorded as population 2; the process is shown in FIG. 4. The elite individuals are stored temporarily for later use, and populations 1 and 2 undergo the subsequent operations.
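The division into two sub-populations and the removal of elite individuals might look like this. Splitting into equal halves, the `n_elite` parameter and the use of a dict keyed by chromosome for fitness are assumptions; the text only specifies that several best-performing individuals are set aside.

```python
import random


def split_population(population, rng):
    """Shuffle the initial population and divide it into two equal sub-populations."""
    shuffled = list(population)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def extract_elites(sub_population, fitness, n_elite):
    """Return (elites, remainder): the n_elite fittest individuals, kept aside
    unchanged, and the rest of the sub-population, which goes on to selection."""
    ranked = sorted(sub_population, key=lambda ind: fitness[ind], reverse=True)
    return ranked[:n_elite], ranked[n_elite:]
```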
(3) The selection strategy based on population exchange is implemented. It combines a roulette selection strategy, a tournament selection strategy and a crossover strategy for exchanged individuals, and performs better than roulette selection alone. Roulette selection, tournament selection and the population-communication-based selection method have been described above and are not repeated here. Population 1 uses the tournament selection strategy and population 2 uses the roulette selection strategy, as shown in FIG. 5.
Next, the individuals cross_1 and cross_2 used for population communication are crossed. All of them are first combined into one population, which is shuffled and randomly crossed with the crossover rate set to 1, so that every individual takes part in crossover. The crossed population is then divided evenly into two parts for the subsequent population merging. This process is shown in FIG. 6.
(4) The simulated annealing algorithm is applied to the crossover and mutation of the genetic algorithm. The algorithm starts with a high mutation rate and crossover rate, which then decrease gradually as the algorithm iterates. The initially high rates force the genetic algorithm to search a larger space and so avoid being trapped in locally optimal solutions. In this embodiment, a temperature variable Temp is introduced, and the decrease is controlled by a cooling coefficient CoolingRate: at the end of each iteration of the genetic algorithm, the temperature is lowered slightly, which reduces the crossover and mutation rates used in the next iteration. The crossover operator uses single-point crossover, i.e., only one crossover point is used in each crossover, and the mutation operator uses bit flipping.
(5) After mutation is finished, a merging operation is performed.
In each generation of the dual populations, the hash table is accessed, stored into and queried after the initial population is generated and after each round of selection, crossover and mutation. For each individual, if its fitness value exists in the hash table, the value is taken from the table; otherwise, the chromosome is used to create an instance of the deep-neural-network-based intrusion detection algorithm, whose fitness is computed and stored in the table. Through continuous evolution with the genetic operations described above, the genetic algorithm tends toward the global optimum, and when the optimization condition is met, the best chromosome is selected as the final result. For each sub-population, the elite individuals, the individuals for communication and the individuals produced by inheritance are merged into the next-generation sub-population; FIG. 7 shows how the new sub-population 1 is obtained, and the new sub-population 2 is obtained in the same way. When the set number of generations is reached, the dual-population algorithm ends. The optimal solution is then put into the deep neural network model to obtain the novel industrial control network intrusion detection model, which is finally used to predict whether intrusion behavior exists in the industrial control network, giving a prediction result.
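A sketch of the communication step: each population sets aside a few communication individuals, and cross_1 and cross_2 are merged, shuffled, crossed pairwise with a crossover rate of 1, and split evenly. Single-point crossover follows step (4); how an odd, unpaired individual is handled is not stated, so passing it through unchanged is an assumption, as are the function names.

```python
import random


def split_for_communication(population, n_comm, rng):
    """Randomly set aside n_comm communication individuals; the rest go to selection."""
    pool = list(population)
    rng.shuffle(pool)
    return pool[:n_comm], pool[n_comm:]


def single_point_crossover(a, b, rng):
    """Swap the tails of two equal-length bit strings after one random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def cross_communication(cross_1, cross_2, rng):
    """Merge both populations' communication individuals, shuffle, cross every
    pair (crossover rate 1), and split the result evenly into two halves."""
    merged = list(cross_1) + list(cross_2)
    rng.shuffle(merged)
    offspring = []
    for i in range(0, len(merged) - 1, 2):
        offspring.extend(single_point_crossover(merged[i], merged[i + 1], rng))
    if len(merged) % 2:
        offspring.append(merged[-1])  # an unpaired individual is passed on unchanged
    half = len(offspring) // 2
    return offspring[:half], offspring[half:]
```

Each half is then merged back into one of the two sub-populations, which is how genes from one population reach the other.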
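A sketch of step (4). `Temp` and `CoolingRate` come from the text; the geometric cooling rule (multiplying the temperature by `1 - cooling_rate`), the linear mapping from temperature to rates, the rate ranges and the pairing of individuals are assumptions for illustration.

```python
import random


class AnnealedRates:
    """Crossover and mutation rates driven by a temperature that is cooled
    geometrically after every generation (an assumed schedule)."""

    def __init__(self, temp=1.0, cooling_rate=0.05,
                 crossover=(0.5, 0.9), mutation=(0.01, 0.1)):
        self.temp0 = self.temp = temp          # Temp in the text
        self.cooling_rate = cooling_rate       # CoolingRate in the text
        self.crossover_range = crossover       # (final, initial) crossover rate
        self.mutation_range = mutation         # (final, initial) mutation rate

    def _scaled(self, low_high):
        low, high = low_high
        # Linear interpolation: high rate while hot, approaching low as Temp -> 0.
        return low + (high - low) * self.temp / self.temp0

    @property
    def crossover_rate(self):
        return self._scaled(self.crossover_range)

    @property
    def mutation_rate(self):
        return self._scaled(self.mutation_range)

    def cool(self):
        """Call at the end of each GA iteration."""
        self.temp *= 1.0 - self.cooling_rate


def single_point_crossover(a, b, rng):
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def crossover_population(population, rate, rng):
    """Pair up individuals in order; each pair is crossed with probability `rate`."""
    out = list(population)
    for i in range(0, len(out) - 1, 2):
        if rng.random() < rate:
            out[i], out[i + 1] = single_point_crossover(out[i], out[i + 1], rng)
    return out


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )
```

Each generation would read `crossover_rate` and `mutation_rate`, apply the two operators, and then call `cool()`, matching the update steps S209 and S211.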
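The fitness hash table and the merge step can be sketched with a Python dict, which is itself a hash table. The `evaluate` callback stands in for building and training the DNN from a chromosome and returning its validation AUC; the class and function names are hypothetical.

```python
class FitnessCache:
    """Hash table mapping chromosome -> fitness, so that an individual that
    reappears in a later generation is not retrained."""

    def __init__(self, evaluate):
        self.evaluate = evaluate  # e.g. build/train a DNN from the chromosome, return validation AUC
        self.table = {}
        self.evaluations = 0      # number of expensive evaluations actually run

    def fitness(self, chromosome):
        if chromosome not in self.table:  # miss: evaluate once and store
            self.table[chromosome] = self.evaluate(chromosome)
            self.evaluations += 1
        return self.table[chromosome]     # hit: reuse the stored value


def merge_next_generation(elites, communication, offspring):
    """Next-generation sub-population = elites + communication individuals + offspring."""
    return list(elites) + list(communication) + list(offspring)


def best_individual(populations, cache):
    """Fittest chromosome across all sub-populations (the optimal solution at the end)."""
    return max((ind for pop in populations for ind in pop), key=cache.fitness)
```

Because an individual copied unchanged into the next generation (an elite, for example) hits the table, its expensive DNN training is not repeated.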
To demonstrate the advantage of the method of this embodiment over a conventional genetic algorithm, the following experimental results are provided.
Experimental metrics: the AUC is used as the fitness function of the genetic algorithm. AUC is a performance metric commonly used for network intrusion detection algorithms. It reflects the ability of a detection algorithm to avoid misclassifying network packets and strikes a good balance between the missed-alarm rate and the false-alarm rate. For evaluating the final model, accuracy, precision, recall, detection rate (DR), F-Score, TPR, FPR and related metrics are also used.
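The metrics above can be computed from labels and model scores as sketched below. AUC is computed here as the probability that a randomly chosen attack sample scores higher than a randomly chosen normal sample, with ties counted as one half. The other metrics follow common confusion-matrix definitions. Treating DR as recall (TPR) and FAR as FPR is an assumption based on common usage, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # Pairwise (Mann-Whitney) AUC: label 1 = attack, label 0 = normal. O(P*N).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def confusion_metrics(labels: Sequence[int], preds: Sequence[int]) -> Dict[str, float]:
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # = TPR; commonly reported as detection rate (DR)
    return {
        "ACC": _ratio(tp + tn, len(pairs)),
        "Precision": precision,
        "Recall": recall,
        "F-Score": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        "FPR": _ratio(fp, fp + tn),  # commonly reported as false alarm rate (FAR)
        "TNR": _ratio(tn, tn + fp),
        "FNR": _ratio(fn, fn + tp),
    }
```

AUC is threshold-free, which makes it a convenient fitness value. The confusion-matrix metrics depend on the decision threshold applied to the scores.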
Western data set: network transactions between a remote terminal unit and a master control unit in a SCADA natural gas pipeline at Mississippi State University. The data set was collected with a novel framework that simulates real attacks and operator activity on a natural gas pipeline, and its features fall into three categories: network information, payload information, and labels. The CICIDS2017 data set contains benign traffic and recent common attacks, resembling real-world data. It also includes the results of network traffic analysis with CICFlowMeter, with flows labeled by timestamp, source and destination IP addresses, source and destination ports, protocol, and attack.
The experimental results of the conventional genetic algorithm and the dual-population genetic algorithm are shown in fig. 8. In the figure, the solid line shows the results for the Mississippi State University natural gas data on the conventional genetic algorithm framework. The dotted line shows the results for the same data on the integrated framework using the dual-population genetic algorithm. As fig. 8 shows, for the Western data set the curve of the conventional genetic algorithm rises steadily, reaching a mean AUC of about 0.9496 at the maximum of 10 generations. The curve of the dual-population genetic algorithm rises quickly, reaching a mean AUC of about 0.9594 at the same maximum of 10 generations.
After the 2014 Mississippi State University natural gas experimental data set and the CICIDS2017 data set are processed by the integrated dual-population genetic algorithm framework, the optimal parameters for constructing the deep neural network model are obtained. These parameters are used to construct the final deep neural network, and the experimental results are listed in Table 2. The results include AUC, ACC, DR, FAR, Precision, Recall, F-Score, TNR and FNR, and the analysis shows that the model performs well on all of these metrics.
TABLE 2 Final test results for each model (table image not reproduced)
The dual-population genetic algorithm of this embodiment injects new individuals into a population that is gradually becoming homogeneous. These newly injected individuals are produced by crossing excellent individuals that have already survived some evolutionary selection, so the new population converges toward the optimal solution more efficiently. Among its optimization measures, this embodiment introduces a selection strategy based on population exchange that combines roulette selection and tournament selection:

- Roulette selection converges slowly. High-quality individuals are retained with high probability but can still be discarded with small probability, so it easily falls into a local optimum.
- Tournament selection converges quickly and is also prone to local optima, but its overall performance is better than that of roulette selection.

By combining the two, the population-exchange selection strategy keeps the advantages of both algorithms and converges at a moderate speed, although it still has a small probability of being trapped in a local optimum. Simulated annealing plays a major role in addressing this:

- In the early stage, selection, crossover and mutation are performed at high rates, producing more new individuals and increasing the probability of finding a better solution.
- In the later stage, when the overall quality of the population is higher, the crossover and mutation rates are reduced, which avoids generating redundant low-quality individuals.

Meanwhile, the elitism strategy always keeps the optimal solution of each generation, ensuring that the best individual is propagated. The DNN anomaly detection model based on the improved dual-population genetic algorithm framework (i.e., the novel industrial control network intrusion detection model) thereby helps shorten iteration time and improve detection accuracy.
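Claim 5 applies tournament selection in the first sub-population and roulette selection in the second. Both operators can be sketched with the Python standard library as follows. The tournament size of 3 and the function names are assumptions, not values given in the description.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def roulette_select(pop: Sequence[T], fitness: Sequence[float], k: int,
                    rng: random.Random) -> List[T]:
    # Fitness-proportionate selection with replacement; fitness values must be
    # non-negative (AUC values in [0, 1] satisfy this).
    return rng.choices(pop, weights=fitness, k=k)


def tournament_select(pop: Sequence[T], fitness: Sequence[float], k: int,
                      rng: random.Random, size: int = 3) -> List[T]:
    # Run k tournaments; each draws `size` distinct contestants and keeps the fittest.
    chosen: List[T] = []
    for _ in range(k):
        contestants = rng.sample(range(len(pop)), size)
        winner = max(contestants, key=lambda i: fitness[i])
        chosen.append(pop[winner])
    return chosen
```

With AUC fitness values that are close together (for example 0.94 to 0.96), roulette weights are nearly uniform, which is consistent with its slow convergence. Tournament selection depends only on the ranking of fitness values, so it keeps strong selection pressure.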
The invention also provides a controller, which is used for executing the deep learning intrusion detection method based on the double-population genetic algorithm in the industrial control network shown in fig. 1 or fig. 2.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A deep learning intrusion detection method based on a double-population genetic algorithm in an industrial control network is characterized by comprising the following steps:
reading data;
preprocessing the data;
constructing a novel industrial control network intrusion detection model by using an improved double-population genetic algorithm;
and predicting whether the industrial control network has intrusion behavior by using the novel industrial control network intrusion detection model so as to obtain a prediction result.
2. The deep learning intrusion detection method according to claim 1, wherein the preprocessing the data comprises:
selecting data features to determine a data set;
dividing the data set into a training set, a validation set and a test set;
carrying out Min-Max normalization or Z-Score normalization on the divided data sets;
and carrying out One-Hot encoding on the labels of each processed data set.
3. The deep learning intrusion detection method according to claim 1 or 2, wherein the building of the novel industrial control network intrusion detection model comprises:
determining an optimal solution by adopting an improved double-population genetic algorithm;
and putting the optimal solution into a deep neural network model to obtain a novel industrial control network intrusion detection model.
4. The deep learning intrusion detection method according to claim 3, wherein the determining an optimal solution using an improved two-population genetic algorithm comprises:
randomly generating an initial population;
dividing the initial population into two sub-populations;
selecting the elite individuals of each of the two sub-populations by an elitism strategy, and removing the elite individuals from the respective sub-populations;
dividing each of the two sub-populations, with the elite individuals removed, into communication individuals and a population for selection;
carrying out a selection operation, a crossover operation and a mutation operation on the two sub-populations;
and, for each of the two sub-populations, combining its elite individuals, communication individuals and mutated individuals.
5. The deep learning intrusion detection method according to claim 4, wherein the selecting operation for the two sub-populations comprises:
applying a tournament selection strategy to the population for selection in the first sub-population, and applying a roulette selection strategy to the population for selection in the second sub-population.
6. The deep learning intrusion detection method according to claim 5, wherein the crossover operation comprises:
combining the communication individuals in the first sub-population and the communication individuals in the second sub-population into one population;
randomly crossing all individuals in the combined population, with the crossover rate set to 1;
and evenly dividing the population obtained after crossover into two parts.
7. The method according to claim 6, wherein the mutation operation comprises: applying a simulated annealing algorithm to the genetic algorithm to vary the crossover rate and the mutation rate;
specifically, the algorithm starts with a higher mutation rate and crossover rate, and then gradually decreases the mutation rate and crossover rate as the number of iterations increases.
8. The deep learning intrusion detection method according to claim 4, wherein the determining an optimal solution using an improved two-population genetic algorithm further comprises:
calculating the fitness value of individuals in the population;
the calculating of the fitness value of the individuals in the population specifically comprises:
and putting each individual in the population into the deep neural network model, and calculating the AUC of the model to be used as the fitness value of the population individual.
9. The deep learning intrusion detection method according to claim 8, wherein the determining an optimal solution using an improved two-population genetic algorithm further comprises:
looking up each individual in the population in a fitness hash table;
judging whether the fitness value of the individual exists in the fitness hash table;
if so, judging whether the current iteration has reached the maximum number of iterations;
if the current iteration has reached the maximum number of iterations, ending the iteration to obtain the highest fitness value, the individual corresponding to the highest fitness value being the optimal solution;
if the current iteration has not reached the maximum number of iterations, using the combined population as the initial population of the next generation and repeating the division, selection, crossover and mutation operations until the maximum number of iterations is reached;
and if the fitness value of the individual does not exist in the fitness hash table, putting the individual into the deep neural network model and calculating its fitness value.
10. A controller configured to perform the deep learning intrusion detection method based on the dual population genetic algorithm in the industrial control network according to any one of claims 1 to 9.
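The preprocessing steps of claim 2 (Min-Max or Z-Score normalization and One-Hot encoding of labels) can be sketched per column as follows. The function names are hypothetical. In practice the minimum, maximum, mean and standard deviation would be computed on the training split only and then reused for the validation and test splits. The sketch normalizes a single column for clarity.

```python
from typing import Hashable, List, Sequence, Tuple


def min_max(column: Sequence[float]) -> List[float]:
    # Rescale to [0, 1]; a constant column maps to all zeros.
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in column]


def z_score(column: Sequence[float]) -> List[float]:
    # Zero mean, unit (population) standard deviation; a constant column maps to zeros.
    n = len(column)
    mean = sum(column) / n
    std = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    return [0.0 if std == 0 else (x - mean) / std for x in column]


def one_hot(labels: Sequence[Hashable]) -> Tuple[List[Hashable], List[List[int]]]:
    # Map each label to a 0/1 vector; classes are sorted for a stable column order.
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    rows = [[1 if index[y] == i else 0 for i in range(len(classes))] for y in labels]
    return classes, rows
```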
CN202011228033.3A 2020-11-06 2020-11-06 Deep learning intrusion detection method based on double-population genetic algorithm in industrial control network Active CN112351033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011228033.3A CN112351033B (en) 2020-11-06 2020-11-06 Deep learning intrusion detection method based on double-population genetic algorithm in industrial control network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011228033.3A CN112351033B (en) 2020-11-06 2020-11-06 Deep learning intrusion detection method based on double-population genetic algorithm in industrial control network

Publications (2)

Publication Number Publication Date
CN112351033A 2021-02-09
CN112351033B 2022-09-13

Family

ID=74429603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011228033.3A Active CN112351033B (en) 2020-11-06 2020-11-06 Deep learning intrusion detection method based on double-population genetic algorithm in industrial control network

Country Status (1)

Country Link
CN (1) CN112351033B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605793A (en) * 2013-12-04 2014-02-26 西安电子科技大学 Heterogeneous social network community detection method based on genetic algorithm
US20170339187A1 (en) * 2016-05-19 2017-11-23 Nec Europe Ltd. Intrusion detection and prevention system and method for generating detection rules and taking countermeasures
CN109688154A (en) * 2019-01-08 2019-04-26 上海海事大学 A kind of Internet Intrusion Detection Model method for building up and network inbreak detection method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
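何昌武 (He Changwu): "Research and Application of a Dual-Population Hybrid Genetic Algorithm", China Master's Theses Full-text Database, Information Science and Technology *
朱建军 (Zhu Jianjun) et al.: "RST-SVM Intrusion Detection Method for Abnormal Behavior in Industrial Control Networks", Journal of Electronic Measurement and Instrumentation *
袁琴琴 (Yuan Qinqin) et al.: "Network Intrusion Detection Based on a Combination of an Improved Ant Colony Algorithm and a Genetic Algorithm", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) *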

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095433A (en) * 2021-04-27 2021-07-09 北京石油化工学院 Method for training intrusion detection network structure model
CN113095433B (en) * 2021-04-27 2023-06-23 北京石油化工学院 Training method for intrusion detection network structure model
CN113128655A (en) * 2021-05-07 2021-07-16 北京石油化工学院 Multi-population genetic algorithm-based industrial control intrusion detection classifier parameter selection method
CN113128655B (en) * 2021-05-07 2024-02-02 北京石油化工学院 Industrial control intrusion detection classifier parameter selection method based on multiple swarm genetic algorithms
CN113344071A (en) * 2021-06-02 2021-09-03 沈阳航空航天大学 Intrusion detection algorithm based on depth strategy gradient
CN113344071B (en) * 2021-06-02 2024-01-26 新疆能源翱翔星云科技有限公司 Intrusion detection algorithm based on depth strategy gradient
CN113591078A (en) * 2021-08-03 2021-11-02 暨南大学 Industrial control intrusion detection system and method based on convolutional neural network architecture optimization
CN113591078B (en) * 2021-08-03 2024-06-07 暨南大学 Industrial control intrusion detection system and method based on convolutional neural network architecture optimization
CN114422262A (en) * 2022-02-21 2022-04-29 上海应用技术大学 Industrial control network intrusion detection model construction method based on automatic machine learning

Also Published As

Publication number Publication date
CN112351033B (en) 2022-09-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Liu Xuejun; Wang Wenhui; Zhang Xiaoni; Wang Hao; Yan Yong; Sha Yun; Cao Xueying; Li Kaili; Kong Xiangmin; Chen Jianping

Inventor before: Liu Xuejun; Wang Wenhui; Zhang Xiaoni; Wang Hao; Yan Yong; Sha Yun; Cao Xueying; Li Kaili; Kong Xiangmin; Chen Jianping