Industrial communication anomaly detection method based on dual similarity measurement
Technical Field
The invention relates to the technical field of industrial control system network security, and in particular to an industrial communication anomaly detection method based on dual similarity measurement.
Background
The information security risks facing industrial control systems in China are currently prominent, and the situation is severe. According to reports from the Industrial Control Systems Cyber Emergency Response Team (ICS-CERT) under the U.S. Department of Homeland Security, information security incidents involving industrial control systems have increased step by step in recent years, with industries such as energy and manufacturing accounting for the largest share. In particular, the recent integration of the Internet with industrial control systems has broken the inherent isolation of industrial systems, and information security problems are increasingly exposed.
An industrial control system is a business process management and control system composed of automatic control components and process control components that collect and monitor real-time data; it ensures the automatic operation of industrial infrastructure together with process control and monitoring. Unlike traditional network and information systems, most industrial control systems must account for factors such as the application environment and control management during development and design, and efficiency and real-time performance are considered first. When these systems were originally built, only functional safety was addressed and design for information security was lacking, so industrial control systems generally lack effective security defenses and measures for keeping data communication confidential. In addition, information security for an industrial control system must first guarantee the availability and reliability of all system components, so traditional IT security technologies such as firewalls and antivirus software do not suit the characteristics of industrial control systems and cannot be applied to them directly.
To this end, researchers have set out to develop information security technologies adapted to the characteristics of industrial control systems, typically including industrial firewalls, industrial network isolation gateways, industrial software whitelisting, and industrial intrusion detection. Industrial intrusion detection comprises two parts: feature (signature) detection and anomaly detection. Anomaly detection discovers abnormal behavior by matching observed behavior against normal behavior. It can effectively detect unknown attacks without knowing their features in advance and without interfering with the real-time performance and availability of the industrial control system, and it is therefore widely endorsed by researchers. Current anomaly detection methods for industrial control systems fall into three categories: statistics-based methods, knowledge-based methods, and machine-learning-based methods. The machine-learning-based methods include clustering, neural networks, Bayesian algorithms, genetic algorithms, fuzzy logic, support vector machines, and the like. In general, these methods start from the characteristics of industrial communication behavior, use unsupervised or semi-supervised means to collect and analyze communication data in the industrial control network, construct a model of normal communication behavior, and judge whether an anomaly has occurred by calculating the deviation from that model.
Existing industrial anomaly detection methods usually provide detection capability for only one aspect of industrial network communication. For example, many statistics-based methods use the CUSUM algorithm to calculate abnormal change points in industrial communication traffic, while machine-learning-based methods discover anomalies from changes in a particular industrial activity (such as changes in function codes). These methods therefore lack comprehensive consideration of all industrial communication features, their anomaly detection capability is limited, and the resulting anomaly detection engines are also one-sided in application.
Disclosure of Invention
The invention aims to provide an industrial communication anomaly detection method based on dual similarity measurement. The method analyzes communication data in an industrial control network, extracts industrial communication behavior features according to the industrial communication interaction mode and the industrial protocol, constructs a behavior feature tree from these features, and measures intra-tree similarity and inter-tree similarity separately so as to discover abnormal communication in the industrial control network. By applying the two similarity measurement algorithms, within a tree and between trees, the method improves anomaly detection capability effectively and comprehensively, discovers known and unknown attacks in industrial network communication in real time, and protects the security of industrial systems, networks, and equipment.
To achieve this purpose, the invention adopts the following technical scheme: an industrial communication anomaly detection method based on dual similarity measurement, characterized by comprising the following steps:
1) classification and selection of industrial communication behavior features: dividing industrial communication data into different message samples according to equal time intervals, and extracting industrial communication behavior features according to the industrial communication protocol specification and the industrial communication interaction mode to form a feature space;
2) constructing an industrial behavior feature tree: respectively constructing a main branch, a secondary branch and a leaf node of the industrial behavior feature tree according to the feature space of each message sample, so that each message sample is represented by one industrial behavior feature tree;
3) real-time anomaly discrimination by dual similarity measurement: performing the dual similarity measurement calculation on the industrial behavior feature tree of each message sample, comparing the calculation results with an intra-tree measurement threshold and an inter-tree measurement threshold respectively, judging whether an anomaly has occurred, and giving an alarm.
In the step 1), the industrial communication behavior characteristics are divided into two types: general network behavior characteristics, industrial protocol semantic characteristics.
The general network behavior characteristics describe the characteristics of the message samples when they are transmitted in the network, including: packet rate, average packet size, IP to port mapping, round trip delay for one access.
The industrial protocol semantic features are proprietary features extracted according to industrial protocol syntax and protocol specifications, and comprise function codes, coil or register addresses and coil or register field values.
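As a minimal sketch of step 1), assuming each captured packet has already been reduced to a hypothetical `(timestamp, size)` pair, the following Python groups packets into equal time intervals and computes two of the general network behavior features named above. The function names and the tuple layout are illustrative assumptions and are not taken from the method itself.

```python
from collections import defaultdict


def split_into_samples(packets, interval):
    """Group (timestamp_seconds, size_bytes) packets into equal time windows.

    Windows that contain no packets are simply absent from the result
    (an assumption of this sketch).
    """
    windows = defaultdict(list)
    for timestamp, size in packets:
        windows[int(timestamp // interval)].append((timestamp, size))
    return [windows[index] for index in sorted(windows)]


def general_features(sample, interval):
    """Compute packet rate and average packet size for one message sample."""
    sizes = [size for _, size in sample]
    return {
        "packet_rate": len(sizes) / interval,          # packets per second
        "avg_packet_size": sum(sizes) / len(sizes),    # bytes
    }


# Hypothetical capture: five packets over roughly three seconds.
packets = [(0.1, 66), (0.4, 78), (1.2, 66), (1.9, 66), (2.5, 90)]
samples = split_into_samples(packets, interval=1.0)
features = [general_features(sample, 1.0) for sample in samples]
```

The remaining features (IP-to-port mapping, round-trip delay, and the protocol semantic features) would need the parsed packet headers and payloads, which this sketch leaves out.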
In the step 2), the construction process of the industrial behavior feature tree is as follows:
2.1) creating a root and a trunk of the industrial behavior feature tree;
2.2) respectively creating two main branches on the tree trunk according to the two industrial communication behavior characteristics;
2.3) creating a secondary branch on each main branch for all the characteristics belonging to the main branch, such as creating a secondary branch representing the packet rate on the main branch representing the general network behavior characteristics;
2.4) on each secondary branch, taking each value of the corresponding feature as a leaf node.
In the step 3), the real-time anomaly discrimination by dual similarity measurement performs two calculations:
3.1) the intra-tree similarity measure measures between different features within the industrial behavior feature tree of a single message sample;
3.2) the inter-tree similarity measure measures between the industrial behavior feature trees of different message samples.
The intra-tree similarity measurement adopts the Minkowski distance as its measurement algorithm; the inter-tree similarity measurement adopts cosine similarity as its measurement algorithm.
The intra-tree similarity measurement adopts the Minkowski distance as its measurement algorithm, and the calculation formula is as follows:
$$d(P, Q) = \left( \sum_{i=1}^{N} \lvert p_i - q_i \rvert^{v} \right)^{1/v}$$
wherein $P = (p_1, p_2, \ldots, p_N)$ and $Q = (q_1, q_2, \ldots, q_N)$ are the two vectors being compared, and $v$ is a variable parameter that is adjusted according to actual conditions.
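The Minkowski distance named above can be sketched in a few lines of Python. This is an illustrative helper, not the patented implementation; the function name and the default `v = 2` are assumptions. With `v = 1` it reduces to the Manhattan distance and with `v = 2` to the Euclidean distance.

```python
def minkowski_distance(p, q, v=2.0):
    """Minkowski distance of order v between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("P and Q must have the same length N")
    if v <= 0:
        raise ValueError("v must be positive")
    return sum(abs(a - b) ** v for a, b in zip(p, q)) ** (1.0 / v)


p = [0.0, 0.0]
q = [3.0, 4.0]
print(minkowski_distance(p, q, v=1))  # 7.0 (Manhattan)
print(minkowski_distance(p, q, v=2))  # 5.0 (Euclidean)
```

A larger `v` gives more weight to the single largest coordinate difference, which is one reason the parameter may be tuned per deployment.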
The inter-tree similarity measurement adopts cosine similarity as its measurement algorithm, and the calculation formula is as follows:
$$\cos(x, y) = \frac{\sum_{k} x_k y_k}{\sqrt{\sum_{k} x_k^{2}} \, \sqrt{\sum_{k} y_k^{2}}}$$
wherein $x_k$ and $y_k$ respectively represent feature values of the same kind in different industrial behavior feature trees.
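One way to compare feature values of the same kind in two trees is sketched below in Python. It assumes, purely for illustration, that a feature tree is stored as a nested dictionary `{main_branch: {secondary_branch: [leaf values]}}` and that a leaf missing from one tree counts as 0.0. Neither the representation nor the helper names come from the method itself.

```python
import math


def flatten(tree):
    """Map (main branch, secondary branch, leaf index) to the leaf value."""
    return {
        (main, sub, index): value
        for main, subs in tree.items()
        for sub, leaves in subs.items()
        for index, value in enumerate(leaves)
    }


def cosine_similarity(tree_x, tree_y):
    """Cosine similarity between two trees, aligned on identical leaf keys."""
    x, y = flatten(tree_x), flatten(tree_y)
    keys = sorted(set(x) | set(y))
    xs = [x.get(key, 0.0) for key in keys]   # missing leaf treated as 0.0
    ys = [y.get(key, 0.0) for key in keys]
    dot = sum(a * b for a, b in zip(xs, ys))
    norm = math.sqrt(sum(a * a for a in xs)) * math.sqrt(sum(b * b for b in ys))
    return dot / norm if norm else 0.0


tree_a = {"general": {"packet_rate": [3.0]}, "semantic": {"function_code": [4.0]}}
tree_b = {"general": {"packet_rate": [3.0]}, "semantic": {"function_code": [4.0]}}
tree_c = {"general": {"packet_rate": [0.0]}, "semantic": {"function_code": [4.0]}}
```

Aligning on keys keeps each $x_k$ paired with the $y_k$ of the same feature, as the definition above requires.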
In the step 3), the intra-tree measurement threshold and the inter-tree measurement threshold are reference values obtained by applying the dual similarity measurement calculation to industrial communication data.
The beneficial effects created by the invention are as follows:
1. Compared with the prior art, the invention provides an industrial communication anomaly detection method based on dual similarity measurement that considers not only the general network behavior characteristics of an industrial control network but also the industrial protocol semantic characteristics. By constructing an industrial behavior feature tree, it makes feature detection more comprehensive and thereby greatly improves anomaly detection capability.
2. The method adopts two algorithms, intra-tree similarity measurement and inter-tree similarity measurement. The intra-tree similarity measurement measures between different features within the industrial behavior feature tree of a single message sample, and the inter-tree similarity measurement measures between the industrial behavior feature trees of different message samples. Together, the two measurements can effectively detect industrial communication anomalies caused by malicious attack or misoperation.
3. The method is a third-party bypass monitoring and analysis method. It is mainly deployed at a mirror port of an industrial switch and does not participate in the production and manufacturing process of the industrial control system, so it does not interfere with the real-time performance and availability of industrial control.
4. The method can not only identify, detect, and raise alarms on intrusion behaviors and unauthorized behaviors that have previously appeared in industrial networks, but can also detect unknown industrial network attacks, and is therefore suited to the concealed and unpredictable nature of such attacks.
Description of the Drawings
FIG. 1 is a schematic diagram of an embodiment of application deployment of the method in an industrial control network based on Modbus/TCP.
FIG. 2 is a schematic diagram of a basic model of the method of the present invention.
FIG. 3 is a schematic diagram of the main implementation process of the real-time anomaly detection in the method of the present invention.
FIG. 4 is a schematic diagram of an industrial behavior feature tree construction process of the method of the present invention.
Detailed Description
An industrial communication anomaly detection method based on dual similarity measurement comprises the following steps:
1) classification and selection of industrial communication behavior features: the industrial communication data are divided into different message samples according to equal time intervals, and the industrial communication behavior features are extracted according to the industrial communication protocol specification and the industrial communication interaction mode to form a feature space.
The industrial communication behavior characteristics are divided into two types: general network behavior characteristics, industrial protocol semantic characteristics.
The general network behavior characteristics describe the characteristics of message samples when transmitted in the network, including: packet rate, average packet size, IP to port mapping, round trip delay for one access.
The industrial protocol semantic features are proprietary features extracted according to industrial protocol syntax and protocol specifications, and comprise function codes, coil or register addresses and coil or register field values.
2) Constructing an industrial behavior feature tree: a main branch, a secondary branch, and a leaf node of the industrial behavior feature tree are constructed according to the feature space of each message sample, so that each message sample is represented by one industrial behavior feature tree.
The construction process of the industrial behavior feature tree is as follows:
2.1) creating a root and a trunk of the industrial behavior feature tree;
2.2) respectively creating two main branches on the tree trunk according to the two industrial communication behavior characteristics;
2.3) creating a secondary branch on each main branch for all the characteristics belonging to the main branch, such as creating a secondary branch representing the packet rate on the main branch representing the general network behavior characteristics;
2.4) on each secondary branch, taking each value of the corresponding feature as a leaf node.
3) Real-time anomaly discrimination by dual similarity measurement: the dual similarity measurement calculation is performed on the industrial behavior feature tree of each message sample, the calculation results are compared with an intra-tree measurement threshold and an inter-tree measurement threshold respectively, and whether an anomaly has occurred is judged and an alarm is given.
In the step 3), the real-time anomaly discrimination by dual similarity measurement performs two calculations:
3.1) the intra-tree similarity measurement measures between different features within the industrial behavior feature tree of a single message sample; it adopts the Minkowski distance as its measurement algorithm, and the calculation formula is as follows:
$$d(P, Q) = \left( \sum_{i=1}^{N} \lvert p_i - q_i \rvert^{v} \right)^{1/v}$$
wherein $P = (p_1, p_2, \ldots, p_N)$ and $Q = (q_1, q_2, \ldots, q_N)$ are the two vectors being compared, and $v$ is a variable parameter that is adjusted according to actual conditions.
3.2) the inter-tree similarity measurement measures between the industrial behavior feature trees of different message samples; it adopts cosine similarity as its measurement algorithm, and the calculation formula is as follows:
$$\cos(x, y) = \frac{\sum_{k} x_k y_k}{\sqrt{\sum_{k} x_k^{2}} \, \sqrt{\sum_{k} y_k^{2}}}$$
wherein $x_k$ and $y_k$ respectively represent feature values of the same kind in different industrial behavior feature trees.
In the step 3), the intra-tree measurement threshold and the inter-tree measurement threshold are reference values obtained by applying the dual similarity measurement calculation to industrial communication data.
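The text does not specify how the two thresholds are derived from the calculated values. As one hedged possibility, the Python sketch below takes the largest intra-tree distance and the smallest inter-tree similarity observed on reference traffic and widens each by a small margin. The rule, the `margin` parameter, and the function name are all assumptions made for illustration, and the sketch further assumes non-negative similarity values (as obtained from features normalized to a non-negative range).

```python
def learn_thresholds(intra_distances, inter_similarities, margin=0.05):
    """Derive (intra_threshold, inter_threshold) from reference measurements.

    Assumed rule: tolerate distances up to the observed maximum plus a margin,
    and similarities down to the observed minimum minus a margin.
    """
    if not intra_distances or not inter_similarities:
        raise ValueError("reference measurements must not be empty")
    intra_threshold = max(intra_distances) * (1.0 + margin)
    inter_threshold = min(inter_similarities) * (1.0 - margin)
    return intra_threshold, inter_threshold


# Hypothetical measurements computed on reference message samples.
intra_threshold, inter_threshold = learn_thresholds(
    intra_distances=[1.0, 1.5, 2.0],
    inter_similarities=[0.8, 0.9, 1.0],
    margin=0.0,
)
```

Other rules, such as a percentile or a mean plus a multiple of the standard deviation, would fit the same interface.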
Example 1: the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method of the invention belongs to the field of information security detection and protection for industrial control systems. FIG. 1 shows a schematic diagram of an embodiment in which the method is deployed in a Modbus/TCP-based industrial control network. As shown in the figure, the method serves as a third-party monitoring method deployed on a mirror port of an industrial switch. The industrial switch carries the Modbus/TCP communication between a workstation (such as an operator station or an engineer station) and a main controller (such as a PLC or a DCS controller), and copies all Modbus/TCP control communication data to its mirror port. A detection device applying the method captures the communication data at the mirror port in real time and parses and inspects it, so that intrusion behaviors, unauthorized behaviors, or misoperations mixed into the normal process operation of the industrial control system are discovered and an alarm is given. In this embodiment, the method first captures the Modbus/TCP communication data stream between the workstation (the Modbus/TCP master station) and the main controller (the Modbus/TCP slave station). Through deep parsing and feature extraction, it obtains the general network behavior characteristics of the stream (including the packet rate of Modbus/TCP data packets, the average data packet size, the mapping information from an IP address to port 503, the round-trip delay of each control request, and so on) and the industrial protocol semantic characteristics (including the function code, the coil address, and the corresponding switching value of a control request). It then constructs an industrial behavior feature tree from these characteristics and performs anomaly detection with the dual similarity measurement algorithm.
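To make the semantic features of this embodiment concrete, the following Python sketch extracts the function code, coil address, and switching value from a single Modbus/TCP request. It relies on the frame layout from the public Modbus specification (a 7-byte MBAP header followed by the function code, with Write Single Coil using function code 0x05 and 0xFF00 for ON). The function name and the returned dictionary keys are hypothetical, and only this one function code is decoded.

```python
import struct


def parse_modbus_tcp(frame: bytes) -> dict:
    """Extract protocol semantic features from one Modbus/TCP request frame."""
    if len(frame) < 8:
        raise ValueError("frame too short for MBAP header plus function code")
    # MBAP header: transaction id, protocol id, length, unit id; then function code.
    transaction_id, protocol_id, _length, unit_id, function_code = struct.unpack(
        ">HHHBB", frame[:8]
    )
    if protocol_id != 0:
        raise ValueError("protocol identifier must be 0 for Modbus")
    fields = {
        "transaction_id": transaction_id,
        "unit_id": unit_id,
        "function_code": function_code,
    }
    if function_code == 0x05 and len(frame) >= 12:  # Write Single Coil
        address, value = struct.unpack(">HH", frame[8:12])
        fields["coil_address"] = address
        fields["coil_on"] = value == 0xFF00  # 0xFF00 = ON, 0x0000 = OFF
    return fields


# Hypothetical request: write coil 172 ON, unit 1, transaction 1.
frame = bytes.fromhex("000100000006010500acff00")
fields = parse_modbus_tcp(frame)
```

A fuller parser would also decode register reads and writes, which carry the register addresses and field values mentioned among the semantic features.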
The invention provides an industrial communication anomaly detection method based on dual similarity measurement. FIG. 2 shows the basic model of the method. The model comprises three parts: initialization preprocessing, construction of the dual similarity measurement anomaly detection model, and real-time anomaly detection.
In the initialization preprocessing part, each industrial control communication protocol has its own communication interaction mode determined by its protocol specification, and this specificity is often closely associated with time. Therefore, when the captured communication data is analyzed, the communication data within each equal time interval is taken as one message sample, and each message sample is parsed using deep packet parsing.
In the model construction part, the features of the message samples are first extracted and classified into general network behavior characteristics and industrial protocol semantic characteristics, which together form the feature space of industrial communication behavior. An industrial behavior feature tree is established from the feature space, so that each message sample is described by one industrial behavior feature tree. All features of each industrial behavior feature tree are then normalized, and the intra-tree measurement threshold and the inter-tree measurement threshold are learned by calculation with the dual similarity measurement mechanism. The intra-tree similarity measurement measures between different features within the industrial behavior feature tree of a single message sample; the inter-tree similarity measurement measures between the industrial behavior feature trees of different message samples.
In the real-time anomaly detection part, FIG. 3 shows the main execution process. Data transmitted in the industrial communication network is captured online in real time, features are selected and extracted from the data, the corresponding industrial behavior feature tree is constructed, and the dual similarity measurement calculation is performed. The results are compared with the intra-tree and inter-tree measurement thresholds respectively to judge whether an anomaly has occurred and to give an alarm. During discrimination, the intra-tree similarity measurement is calculated first. If the result does not satisfy the intra-tree measurement threshold, an anomaly is declared and an alarm is given. If it does satisfy the threshold, the inter-tree similarity measurement is calculated, and if that result does not satisfy the inter-tree measurement threshold, an anomaly is declared and an alarm is given.
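The two-stage decision described above can be sketched as follows in Python. The sketch assumes that a result fails the intra-tree threshold when the distance exceeds it, and fails the inter-tree threshold when the similarity falls below it; the text does not state these directions explicitly. The function name and the use of a callable for the second stage are illustrative choices.

```python
from typing import Callable, Tuple


def is_anomalous(
    intra_distance: float,
    inter_similarity: Callable[[], float],
    intra_threshold: float,
    inter_threshold: float,
) -> Tuple[bool, str]:
    """Apply the intra-tree check first and the inter-tree check only if it passes."""
    if intra_distance > intra_threshold:
        return True, "intra-tree"          # alarm without computing stage two
    if inter_similarity() < inter_threshold:
        return True, "inter-tree"
    return False, "normal"


# Hypothetical thresholds and measurements.
print(is_anomalous(0.9, lambda: 0.99, intra_threshold=0.5, inter_threshold=0.9))
print(is_anomalous(0.2, lambda: 0.50, intra_threshold=0.5, inter_threshold=0.9))
print(is_anomalous(0.2, lambda: 0.95, intra_threshold=0.5, inter_threshold=0.9))
```

Passing the inter-tree measurement as a callable mirrors the described order of evaluation: the second calculation runs only when the first one passes.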
In the dual similarity measurement mechanism, the intra-tree similarity measurement adopts the Minkowski distance as its measurement algorithm, and the calculation formula is as follows:
$$d(P, Q) = \left( \sum_{i=1}^{N} \lvert p_i - q_i \rvert^{v} \right)^{1/v}$$
wherein $P = (p_1, p_2, \ldots, p_N)$ and $Q = (q_1, q_2, \ldots, q_N)$ are the two vectors being compared, and $v$ is a variable parameter that can be adjusted according to actual conditions.
The inter-tree similarity measurement adopts cosine similarity as its measurement algorithm, and the calculation formula is as follows:
$$\cos(x, y) = \frac{\sum_{k} x_k y_k}{\sqrt{\sum_{k} x_k^{2}} \, \sqrt{\sum_{k} y_k^{2}}}$$
wherein $x_k$ and $y_k$ respectively represent feature values of the same kind in different industrial behavior feature trees.
FIG. 4 shows a schematic diagram of the specific construction process of the industrial behavior feature tree in the method of the present invention. A main branch, a secondary branch, and a leaf node of the industrial behavior feature tree are constructed according to the feature space of each message sample, so that each message sample is represented by one industrial behavior feature tree. The main implementation process is as follows:
step one: creating a root and a trunk of an industrial behavior feature tree;
step two: respectively creating two main branches on a tree trunk, wherein one main branch represents general network behavior characteristics, and the other main branch represents industrial protocol semantic characteristics;
step three: analyzing the message sample by adopting technologies such as deep packet analysis and the like, acquiring all industrial communication behavior characteristics in the message sample, creating a corresponding secondary branch on the main branch for each characteristic belonging to general network behavior characteristics, and simultaneously creating a corresponding secondary branch on the main branch for each characteristic belonging to industrial protocol semantic characteristics;
step four: creating leaf nodes on each secondary branch, wherein each leaf node represents a characteristic value, and all characteristic values belonging to the same characteristic form all leaf nodes on the secondary branch;
step five: judging whether all the characteristics and characteristic values in the message sample have corresponding secondary branches and leaf nodes on the industrial behavior characteristic tree or not, and if so, completing construction of the industrial behavior characteristic tree; if not, the third step to the fifth step are repeatedly executed.
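Steps one to five above can be sketched in Python as follows, assuming a nested-dictionary tree and a feature space of the form `{main_branch: {feature_name: [values]}}`. The branch labels, function names, and data layout are hypothetical and chosen only for illustration.

```python
GENERAL = "general_network_behavior"
SEMANTIC = "industrial_protocol_semantics"


def is_complete(tree, feature_space):
    """Step five: every feature and value has a secondary branch and leaves."""
    return all(
        tree[main].get(name) == list(values)
        for main, features in feature_space.items()
        for name, values in features.items()
    )


def build_feature_tree(feature_space):
    """Build one industrial behavior feature tree for one message sample."""
    # Steps one and two: root/trunk holding the two main branches.
    tree = {GENERAL: {}, SEMANTIC: {}}
    while not is_complete(tree, feature_space):
        for main, features in feature_space.items():
            for name, values in features.items():
                # Step three: one secondary branch per feature.
                # Step four: one leaf node per feature value.
                tree[main][name] = list(values)
    return tree


# Hypothetical feature space for a single message sample.
feature_space = {
    GENERAL: {"packet_rate": [2.0], "avg_packet_size": [72.0]},
    SEMANTIC: {"function_code": [5], "coil_address": [172], "coil_value": [1]},
}
tree = build_feature_tree(feature_space)
```

In this sketch a single pass fills the whole tree, so the loop exits after one iteration. The explicit completeness check is kept to mirror step five.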