CN116055071A - Industrial control network threat information generation system and method based on hidden network traffic mining - Google Patents

Info

Publication number
CN116055071A
Authority
CN
China
Prior art keywords
industrial control
flow
traffic
threat information
attack
Legal status
Pending
Application number
CN202111251627.0A
Other languages
Chinese (zh)
Inventor
张长河
耿童童
Current Assignee
Beijing Weida Information Technology Co ltd
Original Assignee
Beijing Weida Information Technology Co ltd
Application filed by Beijing Weida Information Technology Co ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides an industrial control network threat information generation system and method based on darknet traffic mining. It addresses two shortcomings of traditional threat information generation: it cannot accurately construct threat information for the industrial control security domain, and it cannot effectively analyze darknet traffic. The basic idea is as follows: first, capture darknet traffic and identify the activity segments of attacks on industrial control networks; next, design and implement a deep packet parsing algorithm that parses the packets of the identified segments to detect their payload patterns and attack intents; then, classify the attack sources, attack modes and attack intents of the different attack segments with a deep learning method; finally, generate threat information for industrial control networks. The invention uses darknet traffic to capture threat information for the industrial control domain in real time, makes it practical to discover potential threats both inside and outside industrial control systems, and gives a comprehensive, multi-dimensional view of their network security situation.

Description

Industrial control network threat information generation system and method based on hidden network traffic mining
Technical Field
The invention relates to the field of network security, in particular to an industrial control network threat information generation system and method based on hidden network traffic mining.
Background
A darknet is a hidden network that ordinary users cannot find or access by conventional means; logging in requires specific software, configuration or authorization. Because the darknet is anonymous and concealed, attacks carried in darknet traffic are one of the main sources of attacks on industrial control systems. An industrial control system (ICS) is an integrated combination of communication networks and physical equipment. It is an essential foundation of automated production and underpins the operation of critical national infrastructure such as electric power, coal, petrochemicals and rail transit. As industrialization and informatization have become deeply integrated in recent years, intelligent sensing devices have been widely connected to industrial control systems for task dispatch, status monitoring, device health management and similar functions. Because these smart sensing devices often lack effective security software, however, they introduce significant cyber-security threats even as they make industrial control systems more capable. Cyber attacks have become frequent, and the industrial control systems that support national production and people's livelihoods have become preferred targets for hackers. A compromised industrial control system can cause great economic and safety losses to the country and to society, so research on ICS security protection is of great practical significance for safeguarding social stability, protecting residents' lives and property, and building a healthy network environment.
At present, typical industrial control network security protection methods fall roughly into two types: those based on a static signature mechanism and those based on cyber threat information. Passive protection based on static signatures deploys security devices (such as firewalls, intrusion detection systems and intrusion prevention systems) that match malicious fingerprints to identify and block intrusions into the industrial control system. Such methods can only recognize attacks described in a predefined malicious signature library and cannot defend against novel, unknown industrial control network attacks. Protection based on cyber threat information instead uses machine learning, deep learning and related techniques to extract suspicious indicator entities or signatures from captured security-related data such as text descriptions, logs and traffic, and actively acquires threat information to protect the industrial control system in real time. To reduce the impact of cyber threats on industrial control systems, many ICS protection methods and patents based on cyber threat information have been proposed.
The invention patent with application number CN201710849672.3 discloses a threat-information-based network security analysis system and method for industrial control systems. The method comprises the following steps: acquire the IP (Internet Protocol) addresses that have accessed the industrial control system and the domain names bound to them; judge whether the discovered IP addresses and related domain names are malicious; group the accessing IPs according to their binding relationships with the related domain names; analyze each discovered IP group in terms of temporal characteristics, spatial characteristics, maliciousness and so on; and, from the discovered IP groups and their characteristics, train a decision tree model that judges whether an IP group is malicious, then evaluate the model. All IP groups related to the industrial control system are found from the accessing IPs, and the access relationships of the industrial control system are analyzed in full through a visualization page. The steps are executed periodically at preset times to continuously improve accuracy and coverage and to update the maliciousness and security analysis results in the threat information library.
The invention patent with publication number CN107391598A discloses a method and system for automatically generating threat information. The method comprises: acquiring unstructured threat description text and structured data related to industrial control system security; extracting industrial control threat entities and relations from the unstructured text using published entity recognition and relation extraction techniques, and from the structured data based on statistical results; storing the entities and relations extracted from both types of data as graph data, thereby fusing multi-source heterogeneous industrial control threat entities and relations; embedding the industrial control threat entities with a published graph embedding method; and finally applying a machine learning algorithm to the resulting embeddings to analyze industrial control threat information automatically.
Analysis of the published methods shows that current network threat information generation methods have the following main defects:
(1) They lack the ability to analyze darknet traffic. Darknet traffic is a main source of industrial control network attacks, but because it is hard to capture, heavily disguised and difficult to parse, most current methods cannot effectively extract threat information about industrial control network attacks from it;
(2) Most current threat information extraction methods simply take the source IP address from traffic packets as threat information and do not analyze or extract deeper features such as attack fragments, attack payload features and attack intents.
Taking into account the advantages and disadvantages of existing network threat information generation methods, the invention provides an industrial control network threat information generation system and method based on darknet traffic mining.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an industrial control network threat information generation system and method based on darknet traffic mining. It remedies the inability of traditional threat information generation methods to construct accurate threat information for the industrial control security domain and their inability to analyze darknet traffic effectively. The basic idea is as follows: first, capture darknet traffic and identify the activity segments of industrial control network attacks; next, design and implement a deep packet parsing algorithm that parses the packets of the identified segments to detect their payload patterns and attack intents; then, classify the attack sources, attack modes and attack intents of the different attack segments with a deep learning method; finally, generate threat information for industrial control networks. The invention can use darknet traffic to capture threat information for the industrial control domain in real time, discover potential threats inside and outside industrial control systems in practice, and give a comprehensive, multi-dimensional view of their security situation.
To achieve the above purpose, the invention provides the following technical solution:
An industrial control network threat information generation system based on darknet traffic mining is characterized by comprising an industrial control traffic filtering unit, an industrial control traffic aggregation unit, a traffic similarity evaluation unit, an industrial control traffic clustering unit and an industrial control threat information signature generation unit. The filtering unit applies an industrial control traffic filtering algorithm based on character feature matching to remove traffic unrelated to industrial control system attacks from the captured darknet traffic. The aggregation unit aggregates industrial control traffic packets whose headers contain the same source IP address and the same destination port, and represents each aggregate as an embedding produced by a published text embedding algorithm. The similarity evaluation unit embeds the nodes of the traffic graph into a low-dimensional vector space and judges the similarity of any two traffic nodes by the Euclidean distance between their embeddings. The clustering unit uses the K-means clustering algorithm to group the nodes of the traffic graph into cluster groups based on traffic node similarity. The signature generation unit applies an autoencoder-based packet parsing algorithm to the packets in suspicious industrial control attack traffic groups, automatically identifies industrial control attack activity fragments, learns the corresponding payload patterns and attack intents, constructs industrial control threat information signatures from the learned attack activity fragments, payload patterns and attack intents, and adds the signatures to an industrial control threat information signature database in real time.
Further, to filter packets unrelated to industrial control attacks out of the captured darknet traffic, the industrial control traffic filtering unit maintains and updates an industrial control traffic payload feature library in real time and applies an industrial control traffic extraction algorithm based on character feature matching. It judges whether each traffic item belongs to the industrial control domain by computing the relative Hamming distance between the traffic and the known industrial control payload features, and extracts the traffic that carries industrial control payload features from the captured darknet traffic as the data source for generating industrial control network threat information.
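The text does not define the relative Hamming distance precisely. The following is a minimal Python sketch, assuming it is computed bit-wise between the first n bytes of a payload p and an n-byte library feature f, i.e. d_rel(p, f) = (number of set bits in p[0:n] XOR f) / (8n), and that a packet is kept when its best match is within a threshold θ. The function names, the packet dictionary layout, the default θ = 0.1 and the example byte pattern are illustrative assumptions, not the patent's implementation.

```python
from typing import Dict, List


def relative_hamming(payload: bytes, feature: bytes) -> float:
    """Fraction of differing bits between the payload prefix and a feature pattern."""
    if not feature:
        raise ValueError("feature pattern must be non-empty")
    if len(payload) < len(feature):
        return 1.0  # too short to carry this feature: treat as maximally distant
    # zip stops at len(feature), so only the payload prefix is compared
    diff_bits = sum(bin(p ^ f).count("1") for p, f in zip(payload, feature))
    return diff_bits / (8 * len(feature))


def filter_ics_traffic(packets: List[dict], feature_library: Dict[str, bytes],
                       max_distance: float = 0.1) -> List[dict]:
    """Keep packets whose payload lies within max_distance of some known ICS feature.

    Assumes each packet is a dict with a bytes "payload" entry (hypothetical layout).
    """
    kept = []
    for pkt in packets:
        scores = {name: relative_hamming(pkt["payload"], feat)
                  for name, feat in feature_library.items()}
        best = min(scores, key=scores.get)
        if scores[best] <= max_distance:
            kept.append({**pkt, "ics_feature": best})  # record which feature matched
    return kept
```

For an 8-byte feature, a payload that differs in a single bit scores 1/64 ≈ 0.016 and is kept, while an HTTP request line scores far above 0.1 and is dropped. Payloads shorter than a feature are treated as non-matching for that feature.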
Further, the industrial control traffic aggregation unit aggregates traffic packets by source IP address and destination port value: packets whose headers contain the same source IP address and the same destination port are treated together as a single text, which is embedded with the word2vec text embedding algorithm. The results are stored in a dictionary structure whose keys hold the embedding features and whose values hold the number and the serial numbers of the aggregated packets.
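A minimal sketch of the aggregation and dictionary layout described above. To stay dependency-free it replaces word2vec with a hypothetical feature-hashing embedding (`hashed_embedding`); a trained word2vec model, for example gensim's `Word2Vec`, could be substituted. Tokenizing each header field as `field=value` and rounding embeddings into hashable tuple keys are likewise assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def hashed_embedding(tokens: List[str], dim: int = 16) -> List[float]:
    """Stand-in for word2vec: hash each token into one of `dim` buckets, then L2-normalise."""
    vec = [0.0] * dim
    for tok in tokens:
        # crc32 is deterministic across runs, unlike Python's salted str hash
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec


def aggregate_traffic(packets: List[dict], dim: int = 16) -> Dict[Tuple[float, ...], dict]:
    """Group packets by (src_ip, dst_port) and index each group by its embedding."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[(pkt["src_ip"], pkt["dst_port"])].append(pkt)

    table: Dict[Tuple[float, ...], dict] = {}
    for group_key, pkts in groups.items():
        # All header fields of the group form one "document" of field=value tokens.
        tokens = [f"{k}={pkt[k]}" for pkt in pkts for k in sorted(pkt) if k != "id"]
        key = tuple(round(x, 6) for x in hashed_embedding(tokens, dim))
        # Dictionary key = embedding; value = packet count and serial numbers.
        entry = table.setdefault(key, {"count": 0, "packet_ids": [], "groups": []})
        entry["count"] += len(pkts)
        entry["packet_ids"].extend(pkt["id"] for pkt in pkts)
        entry["groups"].append(group_key)
    return table
```

Because the dictionary is keyed by the embedding, as the text specifies, two groups that happen to produce identical embeddings share one entry; keeping the `(src_ip, dst_port)` group keys in the value makes such merges visible.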
Further, the traffic similarity evaluation unit first uses a graph structure to model the interactions among the source IP, source port, destination IP, destination port and communication protocol of industrial control traffic. It then embeds the traffic nodes of the graph into a low-dimensional vector space using algorithms including, but not limited to, GCN (Graph Convolutional Networks), and judges the similarity of any two traffic nodes by the Euclidean distance between their embeddings.
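The text leaves the graph schema and the GCN configuration open. The sketch below assumes one node per flow record plus one node per distinct (field, value) entity, with an edge from each flow to its five entities, and applies a single GCN propagation step H = D̃^(-1/2) (A + I) D̃^(-1/2) X, where D̃ is the degree matrix of A + I, with one-hot node features X = I and no learned weight matrix. A trained GCN would also multiply by learned weights W to reach a genuinely low-dimensional space; here the embedding size equals the node count. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port", "proto")


def build_traffic_graph(flows: Sequence[dict]) -> Tuple[Dict[tuple, int], List[Tuple[int, int]]]:
    """One node per flow and per distinct (field, value) entity; edges link a flow to its entities."""
    index: Dict[tuple, int] = {}
    edges: List[Tuple[int, int]] = []
    for i, flow in enumerate(flows):
        f = index.setdefault(("flow", i), len(index))
        for name in FIELDS:
            e = index.setdefault((name, flow[name]), len(index))
            edges.append((f, e))
    return index, edges


def gcn_embed(n: int, edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One weight-free GCN step with one-hot features: rows of D^-1/2 (A + I) D^-1/2."""
    adj = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1.0
    for i in range(n):
        adj[i][i] = 1.0  # self-loops (the "+ I" term)
    deg = [sum(row) for row in adj]
    # With X = I the product A_hat @ X is A_hat itself, so row i is node i's embedding.
    return [[adj[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def flow_distances(flows: Sequence[dict]) -> List[List[float]]:
    """Pairwise Euclidean distances between the flow-node embeddings."""
    index, edges = build_traffic_graph(flows)
    emb = gcn_embed(len(index), edges)
    ids = [index[("flow", i)] for i in range(len(flows))]
    return [[euclidean(emb[a], emb[b]) for b in ids] for a in ids]
```

Flows that share their source IP, destination IP, destination port and protocol end up closer to each other than to an unrelated flow, which is the property the clustering step relies on.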
Further, the industrial control traffic clustering unit uses the K-means clustering algorithm to cluster the nodes of the traffic graph into cluster groups based on traffic node similarity, computes the Euclidean distance between each cluster group and a pre-labelled normal traffic group, and marks every cluster group whose distance exceeds a set threshold as a suspicious industrial control attack traffic group.
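A minimal, dependency-free sketch of this step, assuming (the text does not say) that the distance between a cluster group and the normal group is measured between their centroids, so that cluster C_k is marked suspicious when ||μ_k - μ_N|| > τ, where μ_k is the cluster centroid, μ_N the centroid of the pre-labelled normal traffic and τ the set threshold. Lloyd's K-means with a deterministic farthest-first initialization is used for reproducibility; the patent does not specify the initialization or the number of clusters, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100):
    """Lloyd's K-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: recompute centroids (keep the old one if a cluster empties).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(centroid(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def mark_suspicious(points, k: int, normal_points, threshold: float) -> Tuple[List[int], Dict[int, bool]]:
    """Flag clusters whose centroid lies farther than `threshold` from the normal-traffic centroid."""
    labels, centroids = kmeans(points, k)
    normal_centre = centroid(normal_points)
    flags = {c: euclidean(centroids[c], normal_centre) > threshold for c in range(k)}
    return labels, flags
```

In the full pipeline the input points would be the traffic-node embeddings from the similarity evaluation unit; τ and k would need tuning on labelled data.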
Further, the industrial control threat information signature generation unit uses an autoencoder-based packet parsing algorithm to analyze the packets in each suspicious industrial control attack traffic group, automatically identifies industrial control attack activity fragments and learns the corresponding payload patterns, constructs industrial control threat information signatures from the learned attack activity fragments and payload patterns, and adds the signatures to the industrial control threat information signature database in real time.
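The autoencoder architecture, its input features and its training data are not specified. One hedged reading, sketched below, is to train a small autoencoder on feature vectors of reference (normal) packets and to flag packets in a suspicious group whose reconstruction error is well above anything seen in training, treating them as candidate attack activity fragments. The one-unit linear autoencoder, the threshold rule (three times the worst reference error, with a floor) and all names are assumptions chosen to keep the example small.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def train_linear_autoencoder(samples: List[Vector], lr: float = 0.05, epochs: int = 500,
                             init: float = 0.1) -> Tuple[List[float], List[float]]:
    """Train x -> z = w.x -> x_hat = z * v (one linear hidden unit) by per-sample SGD."""
    dim = len(samples[0])
    w = [init] * dim  # encoder weights
    v = [init] * dim  # decoder weights
    for _ in range(epochs):
        for x in samples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            err = [xi - vi * z for xi, vi in zip(x, v)]   # x - x_hat
            back = sum(e * vi for e, vi in zip(err, v))   # err . v, using the old decoder
            # Gradients of L = sum((x - x_hat)^2):
            #   dL/dv_j = -2 * err_j * z,   dL/dw_k = -2 * (err . v) * x_k
            v = [vi + lr * 2 * e * z for vi, e in zip(v, err)]
            w = [wi + lr * 2 * back * xi for wi, xi in zip(w, x)]
    return w, v


def reconstruction_error(x: Vector, w: Vector, v: Vector) -> float:
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sum((xi - vi * z) ** 2 for xi, vi in zip(x, v))


def flag_attack_fragments(reference: List[Vector], candidates: List[Vector],
                          factor: float = 3.0, floor: float = 1e-3) -> List[int]:
    """Indices of candidate packets reconstructed much worse than the reference packets."""
    w, v = train_linear_autoencoder(reference)
    worst = max(reconstruction_error(x, w, v) for x in reference)
    threshold = max(factor * worst, floor)  # floor guards against a near-zero threshold
    return [i for i, x in enumerate(candidates) if reconstruction_error(x, w, v) > threshold]
```

In practice the input vectors might be byte-frequency histograms of packet payloads, and a deeper non-linear autoencoder would be trained with a deep learning framework; the thresholding logic stays the same.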
The invention further discloses an industrial control network threat information generation method based on darknet traffic mining, characterized by comprising the following steps:
Step (1), industrial control traffic filtering: to filter packets unrelated to industrial control attacks out of the captured darknet traffic, an industrial control traffic payload feature library is maintained and updated in real time, and an industrial control traffic extraction algorithm based on character feature matching judges whether each traffic item belongs to the industrial control domain by computing its relative Hamming distance to the known industrial control payload features; traffic carrying industrial control payload features is extracted from the captured darknet traffic as the data source for generating industrial control network threat information;
Step (2), industrial control traffic aggregation: based on the industrial control traffic extracted in step (1), packets are aggregated by source IP address and destination port value; packets whose headers contain the same source IP address and the same destination port are treated as a single text and embedded with the word2vec text embedding algorithm, and the results are stored in a dictionary structure whose keys hold the embedding features and whose values hold the number and the serial numbers of the aggregated packets;
Step (3), traffic similarity evaluation: the traffic similarity evaluation unit first uses a graph structure to model the interactions among the source IP, source port, destination IP, destination port and communication protocol of industrial control traffic; it then embeds the traffic nodes of the graph into a low-dimensional vector space with a GCN algorithm, judges the similarity of any two traffic entities by the Euclidean distance between their embeddings, and uses the resulting similarity as the metric for industrial control traffic clustering;
Step (4), industrial control traffic clustering: using the traffic node similarity obtained in step (3), the industrial control traffic clustering unit clusters the nodes of the traffic graph with the K-means algorithm into cluster groups, computes the Euclidean distance between each cluster group and a pre-labelled normal traffic group, and marks every cluster group whose distance exceeds a set threshold as a suspicious industrial control attack traffic group;
Step (5), industrial control threat information signature generation: an autoencoder-based packet parsing algorithm analyzes the packets in the suspicious industrial control attack traffic groups from step (4), automatically identifies industrial control attack activity fragments and learns the corresponding payload patterns; industrial control threat information signatures are constructed from the learned attack activity fragments, payload patterns and attack intents and added to the industrial control threat information signature database in real time, so that industrial control network threat information based on darknet traffic mining is generated continuously and incrementally.
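Step (5) ends with signatures being added to a database incrementally. The sketch below shows one possible record layout and an in-memory store, assuming a signature is identified by its payload pattern and attack intent, so that repeat sightings in newly captured darknet traffic merge into the existing entry instead of duplicating it. The field names, the intent label and the SHA-256-based identifier are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ThreatSignature:
    payload_pattern: str                 # learned payload pattern, e.g. as a hex string
    attack_intent: str                   # hypothetical label such as "reconnaissance"
    source_ips: List[str] = field(default_factory=list)
    fragment_packet_ids: List[int] = field(default_factory=list)

    def signature_id(self) -> str:
        """Stable id from pattern and intent only, so repeat sightings collide on purpose."""
        canon = json.dumps({"intent": self.attack_intent, "pattern": self.payload_pattern},
                           sort_keys=True)
        return hashlib.sha256(canon.encode("utf-8")).hexdigest()[:16]


class SignatureDatabase:
    """In-memory stand-in for the signature database, supporting incremental updates."""

    def __init__(self) -> None:
        self._records: Dict[str, ThreatSignature] = {}

    def add(self, sig: ThreatSignature) -> bool:
        """Insert a new signature (returns True) or merge a repeat sighting (returns False)."""
        sid = sig.signature_id()
        existing = self._records.get(sid)
        if existing is None:
            self._records[sid] = sig
            return True
        # Same pattern and intent seen again: extend the evidence, keep one record.
        existing.source_ips = sorted(set(existing.source_ips) | set(sig.source_ips))
        existing.fragment_packet_ids.extend(sig.fragment_packet_ids)
        return False

    def get(self, sid: str) -> Optional[ThreatSignature]:
        return self._records.get(sid)

    def __len__(self) -> int:
        return len(self._records)
```

Returning whether a signature was new lets the caller report only genuinely new threat information on each incremental run.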
The beneficial effects of the invention are as follows:
1) The invention overcomes the inability of traditional network threat information to make efficient use of darknet traffic data. It provides a threat information generation system and method for industrial control network security that integrates deep packet analysis, deep learning, graph theory and related techniques, and analyzes and extracts, in real time, the traffic payload features, attack fragment pattern features and attack intent features of industrial control network attacks from darknet traffic packets.
2) The invention provides a deep packet analysis algorithm that identifies and detects the activity fragments of industrial network attacks in darknet traffic, which localizes the range of attack traffic and greatly reduces the resources consumed by analyzing and mining the full traffic. It also combines a document vectorization algorithm with an iterative clustering algorithm to vectorize the payloads of industrial control attack fragments, characterizing each attack fragment as a low-dimensional vector.
3) The invention provides a dynamically extensible method for generating industrial control network threat information: it processes newly captured darknet traffic, extracts the latest attack traffic payload features, attack fragment pattern features and similar features in real time, constructs the corresponding threat information signatures, and thereby dynamically extends and updates the industrial control network threat information library in real time.
4) Practice with a prototype system shows that the invention can effectively detect and extract attack activity fragments targeting industrial control networks in darknet traffic, analyze the origins, participants, payload patterns and attack intents of those fragments, generate accurate and detailed threat information for industrial control networks, and effectively perceive the network security situation of industrial control systems. The scheme is easy to deploy in existing networks, simple to operate, and safe and reliable, with clear economic and social benefits and broad prospects for market adoption.
Drawings
FIG. 1 is a general block diagram of an industrial control network threat information generation system and method based on darknet traffic mining according to the present invention;
Detailed Description
The invention is described in detail below with reference to the accompanying drawings so that those skilled in the art can understand it more clearly; the scope of the invention is not limited thereto.
As described in the Background, typical industrial control network security protection methods are based either on a static signature mechanism or on cyber threat information. Signature-based protection deploys devices such as firewalls, intrusion detection systems and intrusion prevention systems that match malicious fingerprints, and can therefore only passively recognize attacks already in a predefined signature library; it cannot defend against novel, unknown industrial control network attacks. Protection based on cyber threat information uses machine learning, deep learning and related techniques to extract suspicious indicator entities or signatures from captured security-related data such as text descriptions, logs and traffic, and actively acquires threat information to protect the industrial control system in real time. Taking the advantages and disadvantages of existing network threat information generation methods into account, the invention provides an industrial control network threat information generation system and method based on darknet traffic mining.
The technical principle of the invention is explained first. The basic idea is as follows: first, capture darknet traffic and identify the activity segments of industrial control network attacks; next, design and implement a deep packet parsing algorithm that parses the packets of the identified segments to detect their payload patterns and attack intents; then, classify the attack sources, attack modes and attack intents of the different attack segments with a deep learning method; finally, generate threat information for industrial control networks. In this way darknet traffic is used to capture threat information for the industrial control domain in real time, to discover potential threats inside and outside industrial control systems in practice, and to give a comprehensive, multi-dimensional view of their network security situation.
As shown in FIG. 1, the industrial control network threat information generation system based on darknet traffic mining disclosed by the invention works through the following steps:
1) An industrial control traffic payload feature library is maintained and updated in real time, and an industrial control traffic extraction algorithm based on character feature matching judges whether each traffic item belongs to the industrial control domain by computing its relative Hamming distance to the known industrial control payload features; traffic carrying industrial control payload features is extracted from the captured darknet traffic as the data source for generating industrial control network threat information;
2) Based on the industrial control traffic extracted in step 1), packets are aggregated by source IP address and destination port value: packets whose headers contain the same source IP address and the same destination port are treated as a single text and represented with the word2vec text embedding algorithm, and the results are stored in a dictionary structure whose keys hold the embedding features and whose values hold the number and the serial numbers of the aggregated packets;
3) A graph structure models the interactions among the source IP, source port, destination IP, destination port and communication protocol of industrial control traffic; the traffic nodes of the graph are then embedded into a low-dimensional vector space using methods including, but not limited to, GCN (Graph Convolutional Networks), the similarity of any two traffic entities is judged by the Euclidean distance between their embeddings, and the resulting similarity serves as the metric for industrial control traffic clustering;
4) Using the traffic node similarity obtained in step 3), the nodes of the traffic graph are clustered with the K-means algorithm into cluster groups; the Euclidean distance between each cluster group and a pre-labelled normal traffic group is computed, and every cluster group whose distance exceeds a set threshold is marked as a suspicious industrial control attack traffic group;
5) An autoencoder-based packet parsing algorithm analyzes the packets in the suspicious industrial control attack traffic groups from step 4), automatically identifies industrial control attack activity fragments and learns the corresponding payload patterns and attack intents; industrial control threat information signatures are constructed from the learned attack activity fragments, payload patterns and attack intents and added to the industrial control threat information signature database in real time, so that industrial control network threat information based on darknet traffic mining is generated continuously and incrementally.
The invention overcomes the inability of traditional network threat information generation methods to make efficient use of darknet traffic data. It designs and implements a threat information generation system and method oriented to industrial control network security that integrates deep packet analysis, deep learning and graph theory to analyze darknet packets and extract, for attacks on industrial control networks, payload features, attack fragment pattern features and attack intent features. The invention can also process newly captured darknet traffic, extract the latest attack payload features, attack fragment pattern features and related features in real time, and build the corresponding threat information signatures, so that the industrial control network threat information library is expanded and updated dynamically in real time. The deep packet analysis algorithm it implements identifies and detects activity fragments of industrial network attacks within darknet traffic, which localizes the attack traffic effectively and greatly reduces the resource cost of analyzing and mining the full traffic. A document vectorization algorithm is combined with an iterative clustering algorithm to vectorize the payloads of industrial control attack fragments, characterizing each attack fragment as a low-dimensional vector. In addition, use of a prototype system has shown in practice that the invention can effectively detect and extract attack activity fragments targeting industrial control networks in darknet traffic, analyze their origins, participants, attack payload patterns and attack intents, generate accurate and detailed threat information about industrial control networks, and support awareness of the network security situation of industrial control systems. The scheme is easy to deploy in existing networks, simple to operate, safe and reliable, and has clear economic and social benefits and broad prospects for market adoption.
The structural principle and working process of the system and method for generating industrial control network threat information based on darknet traffic mining are described below with reference to the accompanying drawings, by way of the following preferred embodiments.
PREFERRED EMBODIMENTS FOR CARRYING OUT THE INVENTION
As shown in Fig. 1, in a preferred embodiment, the industrial control network threat information generation system based on darknet traffic mining comprises an industrial control traffic filtering unit, an industrial control traffic aggregation unit, a traffic similarity evaluation unit, an industrial control traffic clustering unit and an industrial control threat information signature generation unit.
The industrial control traffic filtering unit removes packets unrelated to industrial control attacks from the captured darknet traffic. To do so, it maintains and updates an industrial control traffic payload feature library in real time and applies an industrial control traffic extraction algorithm based on character feature matching: each flow is classified as industrial control traffic or not by comparing its relative Hamming distance to the known industrial control payload features, and the flows carrying those features are kept as the data source for generating industrial control network threat information.
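As an illustration only, the relative Hamming distance test performed by the filtering unit can be sketched in Python as below. The sketch assumes that a flow is represented by its raw payload bytes, that the relative Hamming distance is the fraction of mismatching bytes between a library pattern and an equally long window of the payload, and that a flow counts as industrial control traffic when its best window lies within a threshold (0.25 here) of any library entry. The library `ICS_SIGNATURES`, the threshold and all function names are hypothetical; the two entries are simple Modbus/TCP and DNP3 byte patterns chosen for the example, not a vetted feature library.

```python
from typing import Dict, List, Tuple

# Hypothetical payload feature library: name -> byte pattern.
ICS_SIGNATURES: Dict[str, bytes] = {
    # Modbus/TCP: protocol id 0x0000, length 0x0006, unit id 0x01, function code 0x03
    "modbus_tcp_read_holding": bytes.fromhex("000000060103"),
    # DNP3 link-layer start bytes
    "dnp3_start": bytes.fromhex("0564"),
}


def relative_hamming(a: bytes, b: bytes) -> float:
    """Fraction of byte positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def min_distance_to_signature(payload: bytes, signature: bytes) -> float:
    """Smallest relative Hamming distance between the signature and any
    equally long window of the payload; 1.0 if the payload is too short."""
    n = len(signature)
    if len(payload) < n:
        return 1.0
    return min(
        relative_hamming(payload[i:i + n], signature)
        for i in range(len(payload) - n + 1)
    )


def is_ics_payload(payload: bytes, library: Dict[str, bytes], threshold: float = 0.25) -> bool:
    """True if the payload is close enough to at least one library entry."""
    return any(min_distance_to_signature(payload, sig) <= threshold for sig in library.values())


def filter_ics_packets(packets: List[Tuple[str, bytes]], library: Dict[str, bytes],
                       threshold: float = 0.25) -> List[Tuple[str, bytes]]:
    """Keep only (packet_id, payload) pairs that look like industrial control traffic."""
    return [(pid, payload) for pid, payload in packets
            if is_ics_payload(payload, library, threshold)]
```

For example, a Modbus/TCP request such as `bytes.fromhex("1a2b000000060103006b0003")` (read 3 holding registers at address 0x006b) is kept, while an HTTP request line is dropped. Very short patterns such as the two DNP3 start bytes match by chance fairly often, so a practical library would likely favor longer patterns or weight matches by pattern length.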
The industrial control traffic aggregation unit aggregates packets by source IP address and destination port value: packets whose headers share the same source IP address and the same destination port are treated together as one text document, which is embedded with the word2vec text embedding algorithm. The results are stored in a dictionary whose keys hold the packet embedding features and whose values hold the count and the identifiers of the aggregated packets.
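The aggregation step can be sketched as follows. Two substitutions keep the example within the Python standard library, and both are assumptions rather than the patented design: word2vec is replaced by a feature-hashing (hashed bag-of-tokens) vector, and each packet header is reduced to a few `field=value` tokens. With gensim installed, one could instead train `gensim.models.Word2Vec` on the token lists and average the word vectors of each document. Following the description, the dictionary key is the (rounded) embedding and the value records the packet count and packet identifiers; the field names, `dim=16` and the function names are hypothetical.

```python
import hashlib
import math
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Packet = Dict[str, Any]  # hypothetical fields: id, src_ip, src_port, dst_ip, dst_port, proto, length
Embedding = Tuple[float, ...]


def header_tokens(pkt: Packet) -> List[str]:
    """Turn the header fields that are not used for grouping into text tokens."""
    return [f"sport={pkt['src_port']}", f"dip={pkt['dst_ip']}",
            f"proto={pkt['proto']}", f"len={pkt['length']}"]


def hashed_embedding(tokens: List[str], dim: int = 16) -> Embedding:
    """Feature-hashing stand-in for word2vec: count tokens into `dim` buckets
    using a stable hash, then L2-normalize."""
    vec = [0.0] * dim
    for tok in tokens:
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return tuple(round(v / norm, 6) for v in vec)


def aggregate_flows(packets: List[Packet], dim: int = 16) -> Dict[Embedding, Dict[str, Any]]:
    """Group packets by (source IP, destination port), treat each group as one
    document, and index the groups by their embedding."""
    groups: Dict[Tuple[Any, Any], List[Packet]] = defaultdict(list)
    for pkt in packets:
        groups[(pkt["src_ip"], pkt["dst_port"])].append(pkt)

    table: Dict[Embedding, Dict[str, Any]] = {}
    for group in groups.values():
        document = [tok for pkt in group for tok in header_tokens(pkt)]
        key = hashed_embedding(document, dim)
        # Identical embeddings from different groups are merged under one key.
        entry = table.setdefault(key, {"count": 0, "packet_ids": []})
        entry["count"] += len(group)
        entry["packet_ids"].extend(pkt["id"] for pkt in group)
    return table
```

Keying the dictionary by the embedding, as the description specifies, means that two groups with identical embeddings share one entry; keying by `(src_ip, dst_port)` and storing the embedding in the value would avoid that merge.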
The traffic similarity evaluation unit first uses a graph structure to model the interaction relations among the source IP, source port, destination IP, destination port and communication protocol of the industrial control traffic. It then embeds the traffic nodes of the graph into a low-dimensional vector space with an embedding algorithm and judges the similarity between any two traffic nodes by their Euclidean distance in that space.
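The graph construction and embedding can be sketched as follows, assuming that every distinct source IP, source port, destination IP, destination port and protocol value becomes a node and that the five entities of each flow are connected pairwise. Instead of a trained GCN, the sketch applies an untrained GCN-style propagation (closer to the simplified SGC variant): random Gaussian node features, equivalent to identity features times a random projection, are multiplied `layers` times by the normalized adjacency matrix with self-loops, D^{-1/2}(A + I)D^{-1/2}, with no learned weights and no nonlinearity. A production system would more likely train a GCN layer, for example PyTorch Geometric's `GCNConv`. The node naming scheme, `dim`, `layers` and the function names are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import Any, Dict, List, Set, Tuple

FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port", "proto")


def build_flow_graph(flows: List[Dict[str, Any]]) -> Tuple[Dict[str, int], Set[Tuple[int, int]]]:
    """One node per distinct field value (e.g. 'dst_port:502'); the five
    entities of each flow are connected pairwise."""
    nodes: Dict[str, int] = {}
    edges: Set[Tuple[int, int]] = set()
    for flow in flows:
        ids = [nodes.setdefault(f"{field}:{flow[field]}", len(nodes)) for field in FIELDS]
        for a, b in combinations(ids, 2):
            edges.add((min(a, b), max(a, b)))
    return nodes, edges


def normalized_adjacency(n: int, edges: Set[Tuple[int, int]]) -> List[List[float]]:
    """D^{-1/2} (A + I) D^{-1/2}, the propagation matrix used by GCN."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def gcn_embed(nodes: Dict[str, int], edges: Set[Tuple[int, int]], dim: int = 8,
              layers: int = 2, seed: int = 0) -> Dict[str, List[float]]:
    """Untrained GCN-style embedding: random initial features propagated
    `layers` times with the normalized adjacency (no weights, no nonlinearity)."""
    n = len(nodes)
    a_hat = normalized_adjacency(n, edges)
    rng = random.Random(seed)
    h = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(layers):
        h = matmul(a_hat, h)
    return {name: h[idx] for name, idx in nodes.items()}


def euclidean(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two embeddings (smaller means more similar)."""
    return math.dist(u, v)
```

Nodes with identical neighborhoods, such as a source IP and the only source port it uses, receive identical embeddings and therefore a Euclidean distance of zero; entities that share fewer neighbors drift apart.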
The industrial control traffic clustering unit uses the traffic node similarity to cluster the nodes of the traffic graph with the K-means algorithm into cluster groups, computes the Euclidean distance between each cluster group and a pre-labeled normal traffic group, and marks every cluster group whose distance exceeds a set threshold as a suspicious industrial control attack traffic group.
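The clustering and thresholding step can be sketched with plain Lloyd's K-means over the node embeddings. The sketch assumes that the distance between a cluster group and the normal traffic group means the Euclidean distance between the cluster centroid and the centroid of the pre-labeled normal embeddings; the threshold, `k`, the seed and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's K-means with Euclidean distance and random initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return labels, centroids


def flag_suspicious_clusters(labels: List[int], centroids: List[List[float]],
                             normal_reference: List[Point], threshold: float) -> List[int]:
    """Non-empty clusters whose centroid lies farther than `threshold` from the
    centroid of the pre-labeled normal traffic embeddings."""
    normal_center = centroid(normal_reference)
    return [c for c in sorted(set(labels))
            if math.dist(centroids[c], normal_center) > threshold]
```

In practice the threshold would have to be calibrated against the spread of the normal traffic embeddings; a fixed value is used here only to keep the sketch short.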
The industrial control threat information signature generation unit applies an autoencoder-based packet analysis algorithm to the packets in the suspicious industrial control attack traffic groups, automatically identifies industrial control attack activity fragments and learns their corresponding payload patterns, builds industrial control threat information signatures from the learned attack activity fragments, attack payload patterns and attack intents, and adds those signatures to the industrial control threat information signature database in real time.
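The signature generation step can be sketched as follows, under several assumptions that go beyond the text. Each packet is assumed to already carry a numeric feature vector (`features`, hypothetical). The autoencoder is a tiny linear one with a single hidden unit, trained with stochastic gradient descent on normal traffic, so that packets it reconstructs poorly are treated as attack activity fragments. Payloads are assumed to be Modbus/TCP, so the function code sits at byte offset 7 after the 7-byte MBAP header, and intent is read from a hypothetical function-code table (`MODBUS_INTENT`). The payload pattern is the common prefix of the fragments after skipping the 2-byte Modbus transaction identifier. None of these choices comes from the source; they only show how fragments, payload patterns and intents can be combined into one signature record.

```python
import random
from typing import Any, Dict, List, Sequence

# Hypothetical intent labels keyed by Modbus function code.
MODBUS_INTENT = {
    0x01: "reconnaissance/read", 0x02: "reconnaissance/read",
    0x03: "reconnaissance/read", 0x04: "reconnaissance/read",
    0x05: "manipulation/write", 0x06: "manipulation/write",
    0x0F: "manipulation/write", 0x10: "manipulation/write",
}


class LinearAutoencoder:
    """Linear autoencoder x -> z = E x -> x_hat = D z, trained by SGD on mean squared error."""

    def __init__(self, dim: int, hidden: int = 1, lr: float = 0.1, seed: int = 0):
        rng = random.Random(seed)
        self.enc = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.dec = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]
        self.lr = lr

    def _forward(self, x: Sequence[float]):
        z = [sum(w * xi for w, xi in zip(row, x)) for row in self.enc]
        x_hat = [sum(w * zj for w, zj in zip(row, z)) for row in self.dec]
        return z, x_hat

    def error(self, x: Sequence[float]) -> float:
        """Mean squared reconstruction error for one feature vector."""
        _, x_hat = self._forward(x)
        return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

    def fit(self, data: List[Sequence[float]], epochs: int = 500) -> "LinearAutoencoder":
        d = len(data[0])
        for _ in range(epochs):
            for x in data:
                z, x_hat = self._forward(x)
                g_out = [2.0 * (a - b) / d for a, b in zip(x_hat, x)]  # dLoss/dx_hat
                # Gradient with respect to z, computed before the decoder is updated.
                g_z = [sum(g_out[i] * self.dec[i][j] for i in range(d)) for j in range(len(z))]
                for i in range(d):
                    for j in range(len(z)):
                        self.dec[i][j] -= self.lr * g_out[i] * z[j]
                for j in range(len(z)):
                    for k in range(d):
                        self.enc[j][k] -= self.lr * g_z[j] * x[k]
        return self


def extract_fragments(ae: LinearAutoencoder, packets: List[Dict[str, Any]],
                      threshold: float) -> List[Dict[str, Any]]:
    """Packets the autoencoder reconstructs poorly are treated as attack fragments."""
    return [p for p in packets if ae.error(p["features"]) > threshold]


def common_prefix(items: List[bytes]) -> bytes:
    """Longest byte prefix shared by all items."""
    if not items:
        return b""
    first, n = items[0], min(len(b) for b in items)
    i = 0
    while i < n and all(b[i] == first[i] for b in items):
        i += 1
    return first[:i]


def build_signature(fragments: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine fragments into one signature record (Modbus/TCP payloads assumed)."""
    codes = sorted({f["payload"][7] for f in fragments if len(f["payload"]) > 7})
    return {
        "sources": sorted({f["src_ip"] for f in fragments}),
        "dst_ports": sorted({f["dst_port"] for f in fragments}),
        # Skip the 2-byte transaction id, which changes from request to request.
        "payload_pattern": common_prefix([f["payload"][2:] for f in fragments]).hex(),
        "function_codes": codes,
        "intents": sorted({MODBUS_INTENT.get(c, "unknown") for c in codes}),
    }
```

A deep, nonlinear autoencoder over richer packet features would be the more realistic choice; the linear single-unit version is used here only because it trains reliably in a few lines of pure Python.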
The invention also discloses an industrial control network threat information generation method based on darknet traffic mining, comprising the following steps:
Step (1), industrial control traffic filtering: to remove packets unrelated to industrial control attacks from captured darknet traffic, an industrial control traffic payload feature library is maintained and updated in real time, and an industrial control traffic extraction algorithm based on character feature matching classifies each flow as industrial control traffic or not by comparing its relative Hamming distance to the known industrial control payload features; the flows carrying those features are filtered out of the captured darknet traffic as the data source for generating industrial control network threat information;
Step (2), industrial control traffic aggregation: based on the industrial control traffic extracted in step (1), packets are aggregated by source IP address and destination port value; packets whose headers share the same source IP address and the same destination port are treated together as one text document, which is embedded with the word2vec text embedding algorithm, and the results are stored in a dictionary whose keys hold the packet embedding features and whose values hold the count and the identifiers of the aggregated packets;
Step (3), traffic similarity evaluation: the traffic similarity evaluation unit first uses a graph structure to model the interaction relations among the source IP, source port, destination IP, destination port and communication protocol of the industrial control traffic; it then embeds the traffic nodes of the graph into a low-dimensional vector space with the GCN algorithm, judges the similarity between any two traffic entities by the Euclidean distance between their embedded nodes, and uses this similarity as the metric for industrial control traffic clustering;
Step (4), industrial control traffic clustering: the industrial control traffic clustering unit uses the traffic node similarity obtained in step (3) to cluster the nodes of the traffic graph with the K-means algorithm into cluster groups, computes the Euclidean distance between each cluster group and a pre-labeled normal traffic group, and marks every cluster group whose distance exceeds a set threshold as a suspicious industrial control attack traffic group;
Step (5), industrial control threat information signature generation: an autoencoder-based packet analysis algorithm analyzes the packets in the suspicious industrial control attack traffic groups from step (4), automatically identifies industrial control attack activity fragments and learns their corresponding payload patterns; industrial control threat information signatures are built from the learned attack activity fragments, attack payload features and attack intent patterns and added to the industrial control threat information signature database in real time, so that industrial control network threat information is generated incrementally and continuously from darknet traffic mining.
The foregoing description of the preferred embodiments is merely illustrative, and the technical solution of the invention is not limited to it. Any known modification made by those skilled in the art on the basis of the main technical concept of the invention falls within its technical scope, which is defined by the claims.

Claims (7)

1. An industrial control network threat information generation system based on darknet traffic mining, characterized by comprising an industrial control traffic filtering unit, an industrial control traffic aggregation unit, a traffic similarity evaluation unit, an industrial control traffic clustering unit and an industrial control threat information signature generation unit.
2. The industrial control network threat information generation system based on darknet traffic mining according to claim 1, wherein the industrial control traffic filtering unit maintains and updates an industrial control traffic payload feature library in real time, applies an industrial control traffic filtering algorithm based on character feature matching, and filters industrial control traffic carrying industrial control payload features out of the captured darknet traffic as the data source for generating industrial control network threat information.
3. The industrial control network threat information generation system based on darknet traffic mining according to claim 1, wherein the industrial control traffic aggregation unit aggregates packets by source IP (Internet Protocol) address and destination port value, aggregates the packets whose headers contain the same source IP address and the same destination port, embeds them with the word2vec text embedding algorithm, and stores the results in a dictionary structure, wherein the dictionary keys store the packet embedding features and the dictionary values store the count and the identifiers of the aggregated packets.
4. The industrial control network threat information generation system based on darknet traffic mining according to claim 1, wherein the traffic similarity evaluation unit first uses a graph structure to model the interaction relations among the source IP, source port, destination IP, destination port and communication protocol of the industrial control traffic, then embeds the traffic nodes of the graph into a low-dimensional vector space using algorithms including, but not limited to, GCN (Graph Convolutional Networks), and determines the similarity between any two traffic nodes by their Euclidean distance in the low-dimensional space.
5. The industrial control network threat information generation system based on darknet traffic mining according to claim 1, wherein the industrial control traffic clustering unit uses the traffic node similarity according to claim 4 to cluster the nodes of the traffic graph into cluster groups with the K-means clustering algorithm, then computes the Euclidean distance between each cluster group and the pre-labeled normal traffic group, and marks every cluster group whose distance exceeds a set threshold as a suspicious industrial control attack traffic group.
6. The industrial control network threat information generation system based on darknet traffic mining according to claim 1, wherein the industrial control threat information signature generation unit analyzes the packets in the suspicious industrial control attack traffic group according to claim 5 with a packet analysis algorithm including, but not limited to, one based on an autoencoder, automatically identifies industrial control attack activity fragments from them and learns their corresponding payload patterns and attack intents, constructs industrial control threat information signatures from the learned attack activity fragments, attack payload patterns and attack intents, and adds the signatures to an industrial control threat information signature database in real time.
7. An industrial control network threat information generation method based on darknet traffic mining, characterized by comprising the following steps:
Step (1), industrial control traffic filtering: to remove packets unrelated to industrial control attacks from captured darknet traffic, an industrial control traffic payload feature library is maintained and updated in real time, and an industrial control traffic extraction algorithm based on character feature matching classifies each flow as industrial control traffic or not by comparing its relative Hamming distance to the known industrial control payload features; the flows carrying those features are filtered out of the captured darknet traffic as the data source for generating industrial control network threat information;
Step (2), industrial control traffic aggregation: based on the industrial control traffic extracted in step (1), packets are aggregated by source IP address and destination port value; packets whose headers share the same source IP address and the same destination port are treated together as one text document, which is embedded with the word2vec text embedding algorithm, and the results are stored in a dictionary whose keys hold the packet embedding features and whose values hold the count and the identifiers of the aggregated packets;
Step (3), traffic similarity evaluation: the traffic similarity evaluation unit first uses a graph structure to model the interaction relations among the source IP, source port, destination IP, destination port and communication protocol of the industrial control traffic; it then embeds the traffic nodes of the graph into a low-dimensional vector space with an algorithm including, but not limited to, GCN (Graph Convolutional Network), judges the similarity between any two traffic entities by the Euclidean distance between their embedded nodes, and uses this similarity as the metric for industrial control traffic clustering;
Step (4), industrial control traffic clustering: the industrial control traffic clustering unit uses the traffic node similarity obtained in step (3) to cluster the nodes of the traffic graph with the K-means algorithm into cluster groups, computes the Euclidean distance between each cluster group and a pre-labeled normal traffic group, and marks every cluster group whose distance exceeds a set threshold as a suspicious industrial control attack traffic group;
Step (5), industrial control threat information signature generation: an autoencoder-based packet analysis algorithm analyzes the packets in the suspicious industrial control attack traffic groups from step (4), automatically identifies industrial control attack activity fragments, and learns their corresponding payload patterns and attack intents; industrial control threat information signatures are built from the learned attack activity fragments, attack payload patterns and attack intents and added to the industrial control threat information signature database in real time, so that industrial control network threat information is generated incrementally and continuously from darknet traffic mining.
CN202111251627.0A 2021-10-27 2021-10-27 Industrial control network threat information generation system and method based on hidden network traffic mining Pending CN116055071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111251627.0A CN116055071A (en) 2021-10-27 2021-10-27 Industrial control network threat information generation system and method based on hidden network traffic mining


Publications (1)

Publication Number Publication Date
CN116055071A true CN116055071A (en) 2023-05-02

Family

ID=86118672


Country Status (1)

Country Link
CN (1) CN116055071A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117955745A (en) * 2024-03-26 2024-04-30 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Network attack homology analysis method integrating network flow characteristics and threat information


Similar Documents

Publication Publication Date Title
CN113556354B (en) Industrial Internet security threat detection method and system based on flow analysis
CN110233849B (en) Method and system for analyzing network security situation
Jianliang et al. The application on intrusion detection based on k-means cluster algorithm
CN112738015B (en) Multi-step attack detection method based on interpretable convolutional neural network CNN and graph detection
Nelms et al. {ExecScent}: Mining for New {C&C} Domains in Live Networks with Adaptive Control Protocol Templates
CN115296924B (en) Network attack prediction method and device based on knowledge graph
CN105577679A (en) Method for detecting anomaly traffic based on feature selection and density peak clustering
Shang et al. Research on industrial control anomaly detection based on FCM and SVM
CN110191103A (en) A kind of DGA domain name detection classification method
CN111131260B (en) Mass network malicious domain name identification and classification method and system
CN104660594A (en) Method for identifying virtual malicious nodes and virtual malicious node network in social networks
CN112671701B (en) Vehicle-mounted terminal intrusion detection method based on vehicle-mounted network abnormal behavior feature driving
KR20150091775A (en) Method and System of Network Traffic Analysis for Anomalous Behavior Detection
CN107360152A (en) A kind of Web based on semantic analysis threatens sensory perceptual system
CN107172022A (en) APT threat detection method and system based on intrusion feature
CN113420802B (en) Alarm data fusion method based on improved spectral clustering
CN110768946A (en) Industrial control network intrusion detection system and method based on bloom filter
CN111709034A (en) Machine learning-based industrial control environment intelligent safety detection system and method
Xu et al. [Retracted] DDoS Detection Using a Cloud‐Edge Collaboration Method Based on Entropy‐Measuring SOM and KD‐Tree in SDN
CN114168968A (en) Vulnerability mining method based on Internet of things equipment fingerprints
Yang et al. Naruto: DNS covert channels detection based on stacking model
CN116055071A (en) Industrial control network threat information generation system and method based on hidden network traffic mining
Yang et al. Detecting DNS covert channels using stacking model
Mohamed et al. Alert correlation using a novel clustering approach
Qi et al. Construction and application of machine learning model in network intrusion detection

Legal Events

Date Code Title Description
PB01 Publication
CB03 Change of inventor or designer information

Inventor after: Zhang Changhe

Inventor before: Zhang Changhe

Inventor before: Geng Tongtong

CB03 Change of inventor or designer information