CN112532633A - Industrial network firewall rule generation method and device based on machine learning - Google Patents

Info

Publication number
CN112532633A
Authority
CN
China
Prior art keywords
rule
industrial network
industrial
data packets
firewall
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011375118.4A
Other languages
Chinese (zh)
Other versions
CN112532633B (en)
Inventor
吴宣够
张昊
沈浩
樊旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Technology AHUT
Original Assignee
Anhui University of Technology AHUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Technology AHUT
Priority to CN202011375118.4A
Publication of CN112532633A
Application granted
Publication of CN112532633B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0263 Rule management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method and a device for generating industrial network firewall rules based on machine learning, relating to the technical field of industrial network security. The method comprises the following steps: 1) configuring a firewall into a gateway and clearing its rule files; 2) running the gateway with no filtering rules, so that all packets pass, and classifying the entries of the log file according to quintuple information, industrial protocol and industrial network function; 3) counting the data packet information in the log file that shares the same quintuple information, industrial protocol and industrial network function to generate feature vectors; 4) assigning labels to the feature vectors; 5) training a rule generation model on the labeled feature vectors; 6) obtaining a rule file; 7) dynamically adjusting the rule priority according to the data packet information passing through the different rules in a fixed time period. With only a small amount of manual intervention, the method generates the corresponding firewall rules through the machine learning model to obtain the rule file, and dynamically adjusts the priority of the rules in different time periods through a clustering method, so that resources are allocated reasonably.

Description

Industrial network firewall rule generation method and device based on machine learning
Technical Field
The invention relates to the technical field of industrial network security, in particular to a method and a device for generating an industrial network firewall rule based on machine learning.
Background
With the rapid development of the internet, the deep integration of new technologies with manufacturing is driving a major transformation in production modes, industrial forms, business models and economic growth points. As the integration of the internet and industry continues to deepen, a large amount of manufacturing infrastructure increasingly depends on the network; once the network is attacked, huge economic losses can result, environmental disasters and casualties become more likely, and public life and national security are endangered. The security of industrial networks has therefore become a matter of major national concern.
Although the traditional firewall achieves good results in ordinary data filtering, in an industrial network, where different industrial protocols are used, a traditional firewall based on quintuple information cannot ensure security. An industrial firewall can therefore be realized by parsing the industrial protocols, adding them to the filtering content of an existing firewall, and formulating different filtering rules for different industrial protocols. However, because the content to be supervised is large and complex, rule planning that relies on manpower increasingly runs into problems. How to reduce the human resources required and formulate rules according to the actual network conditions has become a growing concern in the firewall field.
Disclosure of Invention
The invention aims to provide a method and a device for generating industrial network firewall rules based on machine learning, which can generate industrial firewall rules adapted to the actual industrial network with only a small amount of manual effort, and which adaptively adjust rule priorities by means of traffic analysis.
In order to achieve the above purpose, the invention provides the following technical scheme: a method for generating industrial network firewall rules based on machine learning comprises the following steps:
(1) configuring a firewall into a gateway, and clearing all rule files contained in the gateway;
(2) entering a learning mode: the gateway is configured with a single rule that passes all industrial network data packets, collects all industrial network data packets passing through it within a set time interval, and stores the information of the gateway's log file for that interval, classified by quintuple information, industrial protocol and industrial network function;
(3) feature preprocessing: aggregating the log entries in the log file that share the same quintuple information, industrial protocol and industrial network function, calculating for each such group the number of industrial network data packets and their average arrival time, and generating a group of first feature vectors containing the quintuple information, the industrial protocol, the industrial network function, the average arrival time of the data packets and the number of industrial network data packets; the first feature vector is denoted as $a$, $a = [IP_s, IP_d, PT_s, PT_d, Protol_T, Protol_A, U, T, N]$, wherein $IP_s$ is the source IP in the quintuple information, $IP_d$ is the destination IP in the quintuple information, $PT_s$ is the source port in the quintuple information, $PT_d$ is the destination port in the quintuple information, $Protol_T$ is the transport layer protocol in the quintuple information, $Protol_A$ is the industrial network protocol, $U$ is the industrial network function, $T$ is the average arrival time of the data packets, and $N$ is the number of data packets;
(4) label assignment: randomly selecting a plurality of first feature vectors from the first feature vector group and assigning labels to them, wherein a label is the processing mode applied to the data packets in the firewall rules, and the processing modes comprise passing, alarming and discarding;
(5) model learning: training on the plurality of labeled first feature vectors as a model training set to obtain a rule generation model;
(6) rule file: taking each first feature vector in the first feature vector group that has not been assigned a label as the input of the rule generation model to generate its label; the first feature vectors in the first feature vector group, together with their corresponding labels, form the rule file of the firewall;
(7) rule priority adjustment: collecting the number of industrial network data packets passing through the different firewall rules and the average arrival time of the data packets within a preset time interval, and dynamically adjusting the rule priority of the firewall.
Further, the specific process of model learning in step (5) is as follows: using the random forest (RF) algorithm, training samples are drawn from the training set with replacement for each decision tree, and machine learning is performed to obtain the rule generation model.
Further, the specific process of model learning in step (5) is as follows: using a logistic regression algorithm, the input feature vector information is fitted to a logistic function to estimate the probability of each label, and training is then performed with a deep learning algorithm to obtain the rule generation model.
Further, the specific process of adjusting the rule priority of the firewall in step (7) is as follows:
acquiring flow information of any rule in a firewall within a preset time interval;
counting the number of industrial network data packets processed by any rule and the average arrival time of the industrial network data packets under the rule according to the flow information to generate a second feature vector S;
and clustering by taking the second feature vector S as input, and classifying according to a preset priority order to form the rule priority of the firewall.
Further, step (6) also includes rule integration, namely merging the firewall rules in the rule file that have rule consistency.
Further, the second feature vector S is reset to zero before the dynamic adjustment of the rule priority in each preset time interval.
The invention also discloses a device for generating the industrial network firewall rules based on machine learning, which comprises: a processor for executing the following program modules stored in memory;
the first configuration module is used for configuring the firewall into the gateway and clearing all rule files contained in the gateway;
the second configuration module is used for configuring a piece of rule information passing through all the industrial network data packets for the gateway;
the first collection module is used for collecting all industrial network data packets which pass through the gateway within a set time interval;
the classification module is used for classifying and storing the information of the log files of the gateway in a set time interval according to quintuple information, an industrial protocol and an industrial network function;
the preprocessing module is used for preprocessing log information with the same quintuple information, industrial protocol and industrial network functions in the log file; the preprocessing operation is to calculate the number of industrial network data packets with the same quintuple information, industrial protocols and industrial network functions and the average arrival time of the industrial network data packets, and generate a group of first feature vectors containing the quintuple information, the industrial protocols, the industrial network functions, the average arrival time of the data packets and the number of the industrial network data packets;
the label endowing module is used for endowing labels to a plurality of first characteristic vectors in a first characteristic vector group selected randomly, wherein the labels are processing modes of the data packets in the firewall rules, and the processing modes comprise passing, alarming and discarding;
the training module is used for training according to a model training set to obtain a rule generation model, wherein the model training set is a plurality of first feature vectors after the labels are formulated;
the rule generating module is used for generating a label of any first feature vector which is not endowed with a label in a first feature vector group according to a rule generating model, wherein any first feature vector which is not endowed with a label in the first feature vector group is input into the rule generating model; any one first feature vector in the first feature vector group and a corresponding label form a rule file of the firewall;
the second collection module is used for collecting the number of industrial network data packets passing through different firewall rules and the average arrival time of the data packets;
and the priority adjusting module is used for dynamically adjusting the rule priority of the firewall within a preset time interval according to the data packet information collected by the second collecting module.
Further, the priority adjustment module comprises:
the acquisition unit is used for acquiring the flow information of any rule in the firewall within a preset time interval;
the generating unit is used for generating a second feature vector S according to the flow information and the data packet information collected by the second collecting module;
and the clustering unit is used for performing clustering processing by taking the second characteristic vector S as input, and classifying according to a preset priority order to form the rule priority of the firewall.
The invention also discloses a non-transitory computer readable storage medium, which stores computer instructions for causing the computer to execute the method for generating the industrial network firewall rules based on machine learning.
According to the technical scheme, the method and the device for generating the industrial network firewall rules based on machine learning have the following beneficial effects:
The invention provides a method and a device for generating industrial network firewall rules based on machine learning, relating to the technical field of industrial network security. The method comprises the following steps: 1) configuring a firewall into a gateway, and clearing all rule files contained in it; 2) classifying all industrial network data packets within a period of time according to the quintuple information, industrial protocol and industrial network function in the generated log file; 3) performing feature preprocessing on the collected log file, counting the number and arrival time of the data packets that share the same quintuple information, industrial protocol and industrial network function, and generating a group of first feature vectors containing the quintuple information, the industrial protocol, the industrial network function, the average arrival time and the number of data packets; 4) assigning labels to the first feature vectors; 5) training a rule generation model on the labeled feature vectors; 6) obtaining a rule file; 7) collecting the number of data packets passing through the different rules and their average arrival time in a fixed time period, and dynamically adjusting the rule priority. The method generates the corresponding firewall rules through the machine learning model and, with only a small amount of manual intervention, produces the firewall rule file that best fits the current industrial network, thereby greatly reducing labor cost. Through a clustering method, the priority of the rules is adaptively adjusted in different time periods, so that resources are allocated reasonably for the industrial network in each time period and the resource overhead is reduced.
The adaptive priority adjustment has the following advantages: when the number of rules is large and the firewall matches rules from top to bottom in rule-file order, matching takes a long time, and configuring rule priorities makes the firewall process packets more efficiently; in addition, the rule priority adjusts itself according to the actual condition of the network: if, over a period of time, a rule matches a large number of data packets with a short average arrival time, its priority is raised, while a rule with few packets and a long average arrival time has its priority lowered, which improves the network processing efficiency of the industrial network firewall and meets the requirement of real-time network protection.
It should be understood that all combinations of the foregoing concepts and additional concepts described in greater detail below can be considered as part of the inventive subject matter of this disclosure unless such concepts are mutually inconsistent.
The foregoing and other aspects, embodiments and features of the present teachings can be more fully understood from the following description taken in conjunction with the accompanying drawings. Additional aspects of the present invention, such as features and/or advantages of exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of specific embodiments in accordance with the teachings of the present invention.
Drawings
The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of rule generation model acquisition in the present invention;
FIG. 2 is a flow chart of the operation of the industrial network firewall in the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention. Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
The use of "first," "second," and similar terms in the description and claims of the present application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. Similarly, the singular forms "a," "an," or "the" do not denote a limitation of quantity, but rather denote the presence of at least one, unless the context clearly dictates otherwise. The terms "comprises," "comprising," or the like, mean that the elements or items listed before "comprises" or "comprising" encompass the features, integers, steps, operations, elements, and/or components listed after "comprising" or "comprising," and do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the prior art, although the traditional firewall based on quintuple information filters ordinary data well, it cannot ensure the security of an industrial network, because different industrial protocols are in use; and when different filtering rules are adopted for different industrial protocols, the amount of content to be supervised becomes so large and complex that the rules cannot be formulated by manpower alone. The invention therefore aims to provide a method and a device for generating industrial network firewall rules based on machine learning, which reduce the investment of human resources and formulate rules according to the actual network conditions.
The invention combines an industrial firewall with machine learning to address the problem that generating the defense rules of an industrial firewall depends too heavily on manpower. It takes the actual network content as model input, generates a rule file specific to that content, and adaptively adjusts the priority item of each rule according to the actual traffic. It thereby provides a machine-learning-based method for generating industrial network firewall rules, builds a rule self-learning industrial firewall according to the method, and in turn provides a means of protecting the security of the industrial network.
The following describes a rule self-learning industrial firewall according to the present invention with reference to the embodiments shown in the drawings.
With reference to the embodiments shown in fig. 1 and fig. 2, the method for generating the firewall rules of the industrial network based on machine learning includes the following steps:
(1) configuring a firewall into a gateway, and clearing all rule files contained in the gateway;
(2) entering a learning mode: the gateway is configured with a single rule that passes all industrial network data packets, collects all industrial network data packets passing through it within a set time interval, and stores the information of the gateway's log file for that interval, classified by quintuple information, industrial protocol and industrial network function;
(3) feature preprocessing: aggregating the log entries in the log file that share the same quintuple information, industrial protocol and industrial network function, calculating for each such group the number of industrial network data packets and their average arrival time, and generating a group of first feature vectors containing the quintuple information, the industrial protocol, the industrial network function, the average arrival time of the data packets and the number of industrial network data packets;
(4) label assignment: randomly selecting a plurality of first feature vectors from the first feature vector group and assigning labels to them, wherein a label is the processing mode applied to the data packets in the firewall rules, and the processing modes comprise passing, alarming and discarding; the labels are assigned to the first feature vectors manually;
(5) model learning: training on the plurality of labeled first feature vectors as a model training set to obtain a rule generation model;
(6) rule file: taking each first feature vector in the first feature vector group that has not been assigned a label as the input of the rule generation model to generate its label; the first feature vectors in the first feature vector group, together with their corresponding labels, form the rule file of the firewall;
(7) rule priority adjustment: collecting the number of industrial network data packets passing through the different firewall rules and the average arrival time of the data packets within a preset time interval, and dynamically adjusting the rule priority of the firewall.
In step (3), the important information required from the firewall log is retained according to the syntax requirements of the firewall rules. Specifically, statistical analysis is performed on the information in the log file, mainly on log entries that share the same quintuple information, industrial protocol and industrial network function: the number of industrial network data packets in each such group and their average arrival time are calculated, and a first feature vector is generated for each group, giving the first feature vector group. Each first feature vector is denoted as $a$, $a = [IP_s, IP_d, PT_s, PT_d, Protol_T, Protol_A, U, T, N]$, wherein $IP_s$ is the source IP in the quintuple information, $IP_d$ is the destination IP in the quintuple information, $PT_s$ is the source port in the quintuple information, $PT_d$ is the destination port in the quintuple information, $Protol_T$ is the transport layer protocol in the quintuple information, $Protol_A$ is the industrial network protocol, $U$ is the industrial network function, $T$ is the average arrival time of the data packets, and $N$ is the number of data packets.
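The aggregation in step (3) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `LogRecord` schema and its field names are hypothetical, and "average arrival time" is read here as the mean gap between consecutive packets of a group, which is an assumption the text does not confirm.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    # Hypothetical log schema; the source does not specify field names.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    transport: str    # Protol_T, e.g. "TCP"
    industrial: str   # Protol_A, e.g. "Modbus"
    function: str     # U, e.g. "read_coils"
    timestamp: float  # packet arrival time in seconds


def build_feature_vectors(records):
    """Return one vector [IP_s, IP_d, PT_s, PT_d, Protol_T, Protol_A, U, T, N]
    per group of records sharing quintuple, industrial protocol and function."""
    groups = defaultdict(list)
    for record in records:
        groups[record[:7]].append(record.timestamp)  # key = first seven fields

    vectors = []
    for key, times in groups.items():
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        # Assumption: T is the mean inter-arrival gap; 0.0 for a single packet.
        avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
        vectors.append(list(key) + [avg_gap, len(times)])
    return vectors


sample = [
    LogRecord("10.0.0.2", "10.0.0.9", 40000, 502, "TCP", "Modbus", "read_coils", 0.0),
    LogRecord("10.0.0.2", "10.0.0.9", 40000, 502, "TCP", "Modbus", "read_coils", 1.0),
    LogRecord("10.0.0.2", "10.0.0.9", 40000, 502, "TCP", "Modbus", "read_coils", 3.0),
    LogRecord("10.0.0.3", "10.0.0.9", 40001, 502, "TCP", "Modbus", "write_coil", 2.0),
]
vectors = build_feature_vectors(sample)
```

Grouping on the first seven fields keeps the key identical to the first seven components of $a$, so each vector is simply the key followed by $T$ and $N$.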
In step (4), labels are assigned to the first feature vectors manually; a label is the firewall's processing behavior for the industrial data packets (passing, alarming or discarding). The labeled vectors are used as the model training set, and the relevant information in the model training set is mined to generate the rule generation model.
The specific process of model learning in step (5) is as follows: using the random forest (RF) algorithm, training samples are drawn from the training set with replacement for each decision tree, and machine learning is performed to obtain the rule generation model. As a supervised learning method, the random forest is trained with the RF algorithm, and for the classification problem the final prediction function uses voting, yielding a labeling model for the initial processing of the different first feature vectors, namely the rule generation model. In some embodiments, a logistic regression algorithm is also used for model learning: the input feature vector information is fitted to a logistic function to estimate the probability of each label; a deep learning algorithm is then used to learn a relatively complex multilayer neural network, which can be trained effectively on high-dimensional data and whose hidden layers between input and output can model the internal structure of the data, to obtain the rule generation model.
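The two ingredients named above, sampling with replacement for each tree and a majority vote at prediction time, can be illustrated with a small standard-library sketch. It is deliberately simplified and is not the patent's model: one-level stumps on a single randomly chosen categorical feature stand in for full decision trees, and all function names and the toy training set are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_stump(samples, labels, rng):
    """Fit a one-level stump: pick one feature at random and map each observed
    value of it to the most frequent label among the given samples."""
    feature = rng.randrange(len(samples[0]))
    counts = defaultdict(Counter)
    for x, y in zip(samples, labels):
        counts[x[feature]][y] += 1
    table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    default = Counter(labels).most_common(1)[0][0]  # used for unseen values
    return feature, table, default


def train_forest(samples, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(samples) indices with replacement.
        idx = [rng.randrange(len(samples)) for _ in samples]
        forest.append(
            train_stump([samples[i] for i in idx], [labels[i] for i in idx], rng)
        )
    return forest


def predict(forest, x):
    """Return the label that receives the most votes across all stumps."""
    votes = Counter(
        table.get(x[feature], default) for feature, table, default in forest
    )
    return votes.most_common(1)[0][0]


# Toy labeled vectors reduced to (industrial protocol, function); hypothetical data.
train_x = [("Modbus", "read_coils"), ("Modbus", "read_coils"),
           ("DNP3", "cold_restart"), ("DNP3", "cold_restart")]
train_y = ["pass", "pass", "drop", "drop"]
forest = train_forest(train_x, train_y)
label = predict(forest, ("Modbus", "read_coils"))
```

In practice a library implementation of random forests with full decision trees would replace the stumps; the sketch only shows where bootstrapping and voting enter.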
The specific process of adjusting the rule priority of the firewall in step (7) is as follows: acquiring the flow information of each rule in the firewall within a preset time interval; counting, from the flow information, the number of industrial network data packets processed by each rule and the average arrival time of the industrial network data packets under that rule to generate a second feature vector S; and clustering with the second feature vector S as input and classifying according to a preset priority order, generally into 5 classes, to form the rule priority of the firewall. The priority of the firewall rules is adjusted in real time according to the industrial protocols and industrial network functions of the network and is readjusted at each set time interval; to avoid influence from the priority of the previous time interval, the second feature vector S is reset to zero before the dynamic adjustment in each preset time interval. When the rule file contains many rules, matching them from top to bottom in rule-file order takes a long time, and configuring rule priorities makes the firewall more efficient; moreover, the rule priority adjusts itself according to the actual condition of the network: if, over a period of time, a rule matches a large number of data packets with a short average arrival time, its priority is raised, while a rule with few packets and a long average arrival time has its priority lowered, which improves the network processing efficiency of the industrial network firewall and meets the requirement of real-time network protection.
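A possible shape for this clustering step is sketched below. The source only says "clustering", so the use of k-means, the min-max normalisation, the deterministic seeding and the scalar score (normalised packet count minus normalised average arrival time) are all assumptions made for illustration; `assign_priorities` is a hypothetical name.

```python
def assign_priorities(stats, k=5, max_iter=100):
    """stats maps rule id -> (packet count N, average arrival time T).
    Returns rule id -> priority class, where 1 is the highest priority."""
    if not stats:
        return {}
    ids = list(stats)
    k = min(k, len(ids))

    def normalise(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid dividing by zero when all values match
        return [(v - lo) / span for v in values]

    points = list(zip(normalise([stats[i][0] for i in ids]),
                      normalise([stats[i][1] for i in ids])))

    def score(p):
        # Assumption: many packets and a short arrival time rank higher.
        return p[0] - p[1]

    # Deterministic seeding: spread initial centroids over the score ordering.
    ordered = sorted(points, key=score, reverse=True)
    centroids = [ordered[j * (len(ordered) - 1) // max(k - 1, 1)] for j in range(k)]

    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))

    ranking = sorted(range(k), key=lambda c: score(centroids[c]), reverse=True)
    priority_of = {cluster: rank + 1 for rank, cluster in enumerate(ranking)}
    return {rule: priority_of[a] for rule, a in zip(ids, assignment)}


# Per-interval counters; a fresh dict each interval mirrors the zeroing of S.
interval_stats = {"rule_a": (1000, 0.01), "rule_b": (950, 0.02),
                  "rule_c": (10, 5.0), "rule_d": (5, 6.0)}
priorities = assign_priorities(interval_stats, k=2)
```

Building a new `interval_stats` dictionary at the start of every interval is one simple way to realise the reset of S described above, so that earlier traffic does not bias the current ranking.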
Because the number of industrial network data packets is large, the rule file obtained in step (6) contains a large number of rules. To avoid matching all rule entries one by one while the program runs, in addition to setting rule priorities, the firewall rules in the rule file that have rule consistency can be merged to improve network processing efficiency, that is, the first feature vector information is merged using Python data processing methods. Rule consistency means that the rules in the rule file apply the same processing mode for certain features, such as IP, industrial protocol and industrial function, so they can be merged directly without regard to the other features. For example, if all data packets arriving at a certain IP pass, the source IP, the source port, the destination port, the industrial protocol and the function can be set to any. As another example, if the industrial network data packets of function 1, function 2, function 3 and function 4 under industrial protocol A all have the same label, such as passing, the rules that involve this industrial protocol and these functions can be merged directly with the same processing mode, without considering the quintuple information, ports or function contents.
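The first merging example (all packets arriving at one IP receive the same treatment) might look like the sketch below. The dictionary-based rule representation, the field names and `merge_by_destination` are hypothetical; grouping additionally by transport protocol is an assumption, made because the text does not list the transport protocol among the fields set to any.

```python
from collections import defaultdict

ANY = "any"
# Fields the text says can be set to "any" when a destination IP is consistent.
WILDCARDED = ("src_ip", "src_port", "dst_port", "industrial", "function")


def merge_by_destination(rules):
    """Collapse rules that share destination IP and transport protocol and all
    carry the same action into one wildcard rule; leave other rules untouched."""
    groups = defaultdict(list)
    for rule in rules:
        groups[(rule["dst_ip"], rule["transport"])].append(rule)

    merged = []
    for group in groups.values():
        actions = {rule["action"] for rule in group}
        if len(group) > 1 and len(actions) == 1:
            wildcard = dict(group[0])  # copy, so the input rules stay unchanged
            for field in WILDCARDED:
                wildcard[field] = ANY
            merged.append(wildcard)
        else:
            merged.extend(group)  # inconsistent or single rules are kept as-is
    return merged


def make_rule(src_ip, dst_ip, function, action):
    # Helper for the example only; ports and protocols are fixed for brevity.
    return {"src_ip": src_ip, "dst_ip": dst_ip, "src_port": 40000, "dst_port": 502,
            "transport": "TCP", "industrial": "Modbus", "function": function,
            "action": action}


rules = [
    make_rule("10.0.0.2", "10.0.0.9", "read_coils", "pass"),
    make_rule("10.0.0.3", "10.0.0.9", "write_coil", "pass"),
    make_rule("10.0.0.2", "10.0.0.10", "read_coils", "pass"),
    make_rule("10.0.0.3", "10.0.0.10", "write_coil", "drop"),
]
merged_rules = merge_by_destination(rules)
```

One point worth keeping in mind with any such merge: the resulting wildcard rule also matches traffic that was never observed during the learning interval, so the merged action applies more broadly than the rules it replaces.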
Another embodiment of the present invention provides a device for generating industrial network firewall rules based on machine learning. The device includes a processor and a memory, generates firewall rules using the above method for generating industrial network firewall rules based on machine learning, and applies traffic analysis to adaptively adjust rule priorities, reducing labor cost and resource overhead.
For example, the method for generating industrial network firewall rules based on machine learning according to the invention can be divided into a plurality of modules that are stored in the memory and executed by the processor to carry out the invention. The modules or units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing how the method is executed in the device. For example, the method may be divided into a first configuration module, a second configuration module, a first collection module, a classification module, a preprocessing module, a label assignment module, a training module, a rule generation module, a second collection module, and a priority adjustment module, whose specific functions are as follows:
the first configuration module is used for configuring the firewall into the gateway and clearing all rule files contained in the gateway;
the second configuration module is used for configuring the gateway with a rule that passes all industrial network data packets;
the first collection module is used for collecting all industrial network data packets that pass through the gateway within a set time interval;
the classification module is used for classifying and storing the gateway's log file information for the set time interval according to quintuple information, industrial protocol and industrial network function;
the preprocessing module is used for preprocessing log entries in the log file that have the same quintuple information, industrial protocol and industrial network function; the preprocessing operation calculates the number of industrial network data packets sharing the same quintuple information, industrial protocol and industrial network function, together with their average arrival time, and generates a group of first feature vectors containing the quintuple information, the industrial protocol, the industrial network function, the average arrival time of the data packets and the number of industrial network data packets;
the label assignment module is used for assigning labels to a plurality of randomly selected first feature vectors in the first feature vector group, wherein a label is the processing mode applied to the data packets in a firewall rule, and the processing modes comprise pass, alarm and discard;
the training module is used for training on a model training set to obtain a rule generation model, wherein the model training set consists of the first feature vectors that have been assigned labels;
the rule generation module is used for generating, with the rule generation model, a label for each first feature vector in the first feature vector group that has not been assigned a label, each such first feature vector being input into the rule generation model; each first feature vector in the first feature vector group, together with its corresponding label, forms the rule file of the firewall;
the second collection module is used for collecting the number of industrial network data packets processed by the different firewall rules and the average arrival time of those data packets;
and the priority adjustment module is used for dynamically adjusting the rule priority of the firewall within a preset time interval according to the data packet information collected by the second collection module.
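The work of the classification and preprocessing modules can be illustrated with a short Python sketch. It assumes a hypothetical log layout of `(timestamp, src_ip, dst_ip, src_port, dst_port, transport, protocol, function)` and interprets "average arrival time" as the mean gap between consecutive packets in a group; the source does not define the term precisely, so this interpretation is an assumption.

```python
from collections import defaultdict

def build_feature_vectors(log_entries):
    """Group entries by (quintuple, protocol, function) and compute T and N."""
    arrivals = defaultdict(list)
    for ts, *key in log_entries:
        arrivals[tuple(key)].append(ts)

    vectors = []
    for key, times in arrivals.items():
        times.sort()
        n = len(times)
        # Assumption: T is the mean inter-arrival gap; 0.0 for a lone packet.
        t = (times[-1] - times[0]) / (n - 1) if n > 1 else 0.0
        # Layout: [IPs, IPd, PTs, PTd, transport, protocol, function, T, N]
        vectors.append([*key, t, n])
    return vectors

# Hypothetical log entries, timestamps in seconds.
log = [
    (0.0, "10.0.0.1", "10.0.0.9", 1024, 502, "tcp", "modbus", 3),
    (1.0, "10.0.0.2", "10.0.0.9", 1025, 502, "tcp", "modbus", 16),
    (2.0, "10.0.0.1", "10.0.0.9", 1024, 502, "tcp", "modbus", 3),
    (4.0, "10.0.0.1", "10.0.0.9", 1024, 502, "tcp", "modbus", 3),
]
vectors = build_feature_vectors(log)
```

Each resulting list is one first feature vector; a subset of them would then be labeled by hand as pass, alarm or discard and used as the model training set.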
To set the priority with which each rule in the rule file is applied in the gateway, the priority adjustment module includes:
an acquisition unit, used for acquiring the flow information of any rule in the firewall within a preset time interval;
a generating unit, used for generating a second feature vector S according to the flow information and the data packet information collected by the second collection module;
and a clustering unit, used for performing clustering with the second feature vector S as input and classifying the clusters according to a preset priority order to form the rule priority of the firewall.
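The clustering step can be sketched as follows. The source does not name a clustering algorithm, so this example assumes a one-dimensional k-means over per-rule packet counts only, with the busiest cluster given priority 1; the function names, the choice of k = 3 and the sample counts are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain 1-D k-means with a deterministic start; k must be at least 2."""
    lo, hi = min(values), max(values)
    # Spread the initial centroids evenly across the observed range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def assign_priorities(packet_counts, k=3):
    """Map rule id -> priority, where 1 means the rule is matched first."""
    centroids = kmeans_1d(list(packet_counts.values()), k)
    # Assumption: the cluster with the highest traffic gets priority 1.
    order = sorted(range(k), key=lambda i: centroids[i], reverse=True)
    rank = {cluster: p for p, cluster in enumerate(order, start=1)}
    return {
        rule: rank[min(range(k), key=lambda i: abs(n - centroids[i]))]
        for rule, n in packet_counts.items()
    }

# Hypothetical packet counts per rule for one time interval.
counts = {"r1": 1000, "r2": 950, "r3": 500, "r4": 10, "r5": 5}
priorities = assign_priorities(counts)
```

A fuller version would cluster on both components of S, the packet count and the average arrival time, after scaling them to comparable ranges.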
The device for generating industrial network firewall rules based on machine learning disclosed in this embodiment can be a computing device such as a desktop computer, a notebook computer, a handheld computer or a cloud server. The device can include, but is not limited to, a processor and a memory.
The processor may be a central processing unit of a computer, but may also be another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the device for generating industrial network firewall rules based on machine learning, and connects the various parts of the entire device through various interfaces and lines.
The memory, as a non-transitory computer-readable storage medium, can be used for storing non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the method for generating industrial network firewall rules based on machine learning in the embodiment of the present invention. By running the non-transitory software programs, instructions and modules stored in the memory, the processor executes various functional applications and data processing, thereby implementing the method for generating industrial network firewall rules based on machine learning described in the above method embodiment.
The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by the processor, and the like. Further, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may optionally include memory located remotely from the processor, which may be connected to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
When the method for generating industrial network firewall rules based on machine learning is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the method according to the above embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and executed by a processor to implement the steps and results of the above method embodiments. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk or a solid-state disk; the storage medium may also comprise a combination of memories of the kinds described above.
The invention uses deep packet inspection in the industrial firewall to collect the quintuple and application-layer industrial protocol information required by firewall rules, then uses the random forest technique of machine learning to train a rule generation model, and generates firewall rules for the specific industrial network at hand. The advantage of this rule self-learning industrial firewall is that a machine learning method produces the firewall rule file that best fits the current industrial network with only a small amount of manual intervention, which greatly reduces labor cost. In addition, a clustering method adaptively adjusts rule priorities across different time periods, so that resources are allocated reasonably for the industrial network in each period and resource overhead is reduced.
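The bootstrap-and-vote idea behind the random forest step can be sketched in pure Python. This is a deliberately simplified stand-in, not a full random forest: it bags one-level decision stumps over categorical features only, whereas a real random forest grows deeper trees and also subsamples features at each split. The feature layout `(function, protocol)`, the label mapping and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels):
    """One-level tree: pick the single feature whose values best predict the label."""
    best = None
    for f in range(len(rows[0])):
        by_value = {}
        for row, label in zip(rows, labels):
            by_value.setdefault(row[f], []).append(label)
        table = {v: majority(ls) for v, ls in by_value.items()}
        errors = sum(table[row[f]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, table)
    _, feature, table = best
    # The default label covers feature values missing from this sample.
    return feature, table, majority(labels)

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) indices with replacement for each tree.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [table.get(row[f], default) for f, table, default in forest]
    return majority(votes)

# Hypothetical labeled training set: the function alone decides the label.
FUNCTION_LABEL = {"read": "pass", "write": "alarm", "stop": "discard"}
rows = [(fn, proto) for proto in ("modbus", "s7") for fn in FUNCTION_LABEL] * 2
labels = [FUNCTION_LABEL[fn] for fn, _ in rows]
forest = fit_forest(rows, labels)
```

Unlabeled first feature vectors would be passed to `predict` to obtain their processing mode. The numeric features T and N are left out here because stumps over categorical values cannot split on them.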
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be determined by the appended claims.

Claims (10)

1. A method for generating industrial network firewall rules based on machine learning is characterized by comprising the following steps:
(1) configuring a firewall into a gateway, and cleaning all rule files contained in the gateway;
(2) entering a learning mode: the gateway is configured with a rule that passes all industrial network data packets, all industrial network data packets passing within a set time interval are collected, and the gateway's log file information for the set time interval is classified and stored according to quintuple information, industrial protocol and industrial network function;
(3) feature preprocessing: aggregating the log entries in the log file that have the same quintuple information, industrial protocol and industrial network function, calculating the number of industrial network data packets sharing the same quintuple information, industrial protocol and industrial network function together with their average arrival time, and generating a group of first feature vectors containing the quintuple information, the industrial protocol, the industrial network function, the average arrival time of the data packets and the number of industrial network data packets;
(4) label assignment: randomly selecting a plurality of first feature vectors in the first feature vector group and assigning labels to them, wherein a label is the processing mode applied to the data packets in a firewall rule, and the processing modes comprise pass, alarm and discard;
(5) model learning: training with the labeled first feature vectors as a model training set to obtain a rule generation model;
(6) rule file generation: taking each first feature vector in the first feature vector group that has not been assigned a label as the input of the rule generation model to generate its label; each first feature vector in the first feature vector group, together with its corresponding label, forms the rule file of the firewall;
(7) rule priority adjustment: collecting the number of industrial network data packets passing through different firewall rules and the average arrival time of the data packets in a preset time interval, and dynamically adjusting the rule priority of the firewall.
2. The method for generating industrial network firewall rules based on machine learning according to claim 1, wherein the first feature vector in step (3) is denoted as $A$, $A = [IP_s, IP_d, PT_s, PT_d, Protol_T, Protol_A, U, T, N]$;
wherein $IP_s$ is the source IP in the quintuple information, $IP_d$ is the destination IP in the quintuple information, $PT_s$ is the source port in the quintuple information, $PT_d$ is the destination port in the quintuple information, $Protol_T$ is the transport layer protocol in the quintuple information, $Protol_A$ is the industrial network protocol, $U$ is the industrial network function, $T$ is the average arrival time of the data packets, and $N$ is the number of data packets.
3. The method for generating industrial network firewall rules based on machine learning according to claim 1, wherein the specific process of model learning in the step (5) is as follows: using a random forest (RF) algorithm, drawing training samples from the training set with replacement for each decision tree, and performing machine learning to obtain the rule generation model.
4. The method for generating industrial network firewall rules based on machine learning according to claim 1, wherein the specific process of model learning in the step (5) is as follows: using a logistic regression algorithm to fit the input feature vector information to a logistic function so as to estimate the probability of any label, and then training with a deep learning algorithm to obtain the rule generation model.
5. The method for generating industrial network firewall rules based on machine learning according to claim 1, wherein the specific process of adjusting the rule priority of the firewall in the step (7) is as follows:
acquiring flow information of any rule in a firewall within a preset time interval;
counting the number of industrial network data packets processed by any rule and the average arrival time of the industrial network data packets under the rule according to the flow information to generate a second feature vector S;
and clustering by taking the second feature vector S as input, and classifying according to a preset priority order to form the rule priority of the firewall.
6. The method for generating industrial network firewall rules based on machine learning according to claim 1, wherein the step (6) further comprises rule integration, and the rule integration merges the firewall rules in the rule file that have rule consistency.
7. The method for generating industrial network firewall rules based on machine learning according to claim 5, wherein the second feature vector S is reset to zero before the rule priority is dynamically adjusted for any preset time interval.
8. An apparatus for generating industrial network firewall rules based on machine learning, the apparatus comprising a processor for executing the following program modules stored in a memory:
the first configuration module is used for configuring the firewall into the gateway and clearing all rule files contained in the gateway;
the second configuration module is used for configuring the gateway with a rule that passes all industrial network data packets;
the first collection module is used for collecting all industrial network data packets that pass through the gateway within a set time interval;
the classification module is used for classifying and storing the gateway's log file information for the set time interval according to quintuple information, industrial protocol and industrial network function;
the preprocessing module is used for preprocessing log entries in the log file that have the same quintuple information, industrial protocol and industrial network function; the preprocessing operation calculates the number of industrial network data packets sharing the same quintuple information, industrial protocol and industrial network function, together with their average arrival time, and generates a group of first feature vectors containing the quintuple information, the industrial protocol, the industrial network function, the average arrival time of the data packets and the number of industrial network data packets;
the label assignment module is used for assigning labels to a plurality of randomly selected first feature vectors in the first feature vector group, wherein a label is the processing mode applied to the data packets in a firewall rule, and the processing modes comprise pass, alarm and discard;
the training module is used for training on a model training set to obtain a rule generation model, wherein the model training set consists of the first feature vectors that have been assigned labels;
the rule generation module is used for generating, with the rule generation model, a label for each first feature vector in the first feature vector group that has not been assigned a label, each such first feature vector being input into the rule generation model; each first feature vector in the first feature vector group, together with its corresponding label, forms the rule file of the firewall;
the second collection module is used for collecting the number of industrial network data packets processed by the different firewall rules and the average arrival time of those data packets;
and the priority adjustment module is used for dynamically adjusting the rule priority of the firewall within a preset time interval according to the data packet information collected by the second collection module.
9. The apparatus of claim 8, wherein the priority adjustment module comprises:
an acquisition unit, used for acquiring the flow information of any rule in the firewall within a preset time interval;
a generating unit, used for generating a second feature vector S according to the flow information and the data packet information collected by the second collection module;
and a clustering unit, used for performing clustering with the second feature vector S as input and classifying the clusters according to a preset priority order to form the rule priority of the firewall.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method for generating machine learning based industrial network firewall rules according to any one of claims 1 to 7.
CN202011375118.4A 2020-11-30 2020-11-30 Industrial network firewall rule generation method and device based on machine learning Active CN112532633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011375118.4A CN112532633B (en) 2020-11-30 2020-11-30 Industrial network firewall rule generation method and device based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011375118.4A CN112532633B (en) 2020-11-30 2020-11-30 Industrial network firewall rule generation method and device based on machine learning

Publications (2)

Publication Number Publication Date
CN112532633A true CN112532633A (en) 2021-03-19
CN112532633B CN112532633B (en) 2022-04-12

Family

ID=74995256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011375118.4A Active CN112532633B (en) 2020-11-30 2020-11-30 Industrial network firewall rule generation method and device based on machine learning

Country Status (1)

Country Link
CN (1) CN112532633B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7966659B1 (en) * 2006-04-18 2011-06-21 Rockwell Automation Technologies, Inc. Distributed learn mode for configuring a firewall, security authority, intrusion detection/prevention devices, and the like
CN108429753A (en) * 2018-03-16 2018-08-21 重庆邮电大学 A kind of matched industrial network DDoS intrusion detection methods of swift nature
US20200329011A1 (en) * 2019-04-10 2020-10-15 Google Llc Firewall rules intelligence

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WENLI SHANG等: "《Research on self-learning method on generation and optimization of industrial firewall rules》", 《IEEE》 *
尚文利等: "基于哈希算法的工业防火墙规则自学习方法", 《计算机工程与设计》 *
张昊: "计算机网络数据包捕获技术浅析", 《合肥学院学报(自然科学版)》 *
雷艳晴等: "工业防火墙规则自学习算法设计", 《计算机工程与设计》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113452685A (en) * 2021-06-22 2021-09-28 上海明略人工智能(集团)有限公司 Recognition rule processing method and system, storage medium and electronic equipment
CN113452685B (en) * 2021-06-22 2024-04-09 上海明略人工智能(集团)有限公司 Processing method, system, storage medium and electronic equipment for recognition rule
CN113507454A (en) * 2021-06-23 2021-10-15 北京惠而特科技有限公司 Industrial firewall strategy automatic generation and deployment method based on flow analysis
KR20230099381A (en) * 2021-12-27 2023-07-04 주식회사 엘로이큐브 Heterogeneous firewall policy optimization apparatus, system having the same, and heterogeneous firewall policy optimization method using the same
KR102649649B1 (en) * 2021-12-27 2024-03-21 주식회사 엘로이큐브 Heterogeneous firewall policy optimization apparatus, system having the same, and heterogeneous firewall policy optimization method using the same
WO2023249763A1 (en) * 2022-06-20 2023-12-28 Microsoft Technology Licensing, Llc Firewall rule and data flow analysis and modification
CN115865670A (en) * 2023-02-27 2023-03-28 灵长智能科技(杭州)有限公司 Method and device for adjusting concurrency performance of WEB security gateway based on kernel tuning
CN115865670B (en) * 2023-02-27 2023-06-16 灵长智能科技(杭州)有限公司 Method and device for adjusting concurrency performance of WEB security gateway based on kernel tuning

Also Published As

Publication number Publication date
CN112532633B (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN112532633B (en) Industrial network firewall rule generation method and device based on machine learning
US20210124983A1 (en) Device and method for anomaly detection on an input stream of events
CN112671757B (en) Encryption flow protocol identification method and device based on automatic machine learning
Cao et al. CNN-based intelligent safety surveillance in green IoT applications
Groleat et al. Hardware acceleration of SVM-based traffic classification on FPGA
Nazarenko et al. Features of application of machine learning methods for classification of network traffic (features, advantages, disadvantages)
CN111970400B (en) Crank call identification method and device
CN110287316A (en) A kind of Alarm Classification method, apparatus, electronic equipment and storage medium
US20170230279A1 (en) Classification with a switch
CN114553475A (en) Network attack detection method based on network flow attribute directed topology
CN110147389A (en) Account number treating method and apparatus, storage medium and electronic device
CN110460662A (en) The processing method and system of internet of things data
TWI752486B (en) Training method, feature extraction method, device and electronic device
CN112766511A (en) Method, apparatus and computer program product for model adaptation
CN112884121A (en) Traffic identification method based on generation of confrontation deep convolutional network
Wang et al. Fcnn: An efficient intrusion detection method based on raw network traffic
Chen et al. A novel semi-supervised learning method for Internet application identification
CN110390315A (en) A kind of image processing method and device
CN109815736A (en) A kind of database desensitization method, device and desensitization equipment
CN110610099A (en) Financial risk intelligent early warning and wind control system based on FPGA hardware acceleration
Suresh et al. AI based intrusion detection system using self-adaptive energy efficient BAT algorithm for software defined IoT networks
CN116127400B (en) Sensitive data identification system, method and storage medium based on heterogeneous computation
CN112383488A (en) Content identification method suitable for encrypted and non-encrypted data streams
CN111352820A (en) Method, equipment and device for predicting and monitoring running state of high-performance application
CN103227730A (en) Method and system for analyzing large log

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant