CN114640611A - Unknown heterogeneous industrial protocol detection and identification method, system, equipment and medium - Google Patents

Info

Publication number
CN114640611A
Authority
CN
China
Prior art keywords
message
protocol
unknown
industrial
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210229892.7A
Other languages
Chinese (zh)
Inventor
沈玉龙
蒋梓恒
祝幸辉
赵双睿
何吉
程珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210229892.7A
Publication of CN114640611A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/18: Protocol analysers

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Communication Control (AREA)

Abstract

The invention belongs to the technical field of the industrial Internet and discloses a method, system, equipment and medium for detecting and identifying unknown heterogeneous industrial protocols. The method comprises: receiving a mixed input of known and unknown industrial protocol messages; removing invalid messages in a preprocessing module and routing the remaining messages, according to their coding format, to the feature dimension reduction module for that coding; performing feature extraction and feature dimension reduction on the message set in the feature dimension reduction module; separating known protocol messages from unknown protocol messages in the known industrial protocol message screening module, using a screening algorithm trained on the known protocol messages in the database, and submitting the unknown protocol messages to the unknown industrial protocol message identification module; and identifying the screened messages in the unknown industrial protocol message identification module. The invention simplifies the adaptation process for industrial Internet access protocols and improves the access efficiency of large-scale industrial equipment.

Description

Unknown heterogeneous industrial protocol detection and identification method, system, equipment and medium
Technical Field
The invention belongs to the technical field of industrial Internet, and particularly relates to an unknown heterogeneous industrial protocol detection and identification method, system, equipment and medium.
Background
In promoting the industrial Internet across industry, the problem of connecting multi-source equipment and heterogeneous protocols is an obstacle to further industrialization and commercialization. Domestic industry is large in scale and was built over a long time span; industrial PLCs of many brands are mixed on domestic factory production lines, industrial equipment specifications are not unified, and industrial protocol formats are highly varied. As a result, the OPC access solution used abroad, which is based on a distributed control system and a matching host computer, cannot be adopted, and moving industrial equipment to the cloud becomes much more difficult.
Detecting and identifying unknown protocols has long been an important and difficult problem in network security and in Internet of Things device access, and much research has been carried out at home and abroad. An unknown protocol is one whose format, length, features and flow are all unknown, so most researchers approach detection and identification algorithms through format, length, flow, features and similar aspects. The methods adopted in the literature fall mainly into supervised and unsupervised learning: supervised algorithms include deep neural networks and convolutional neural networks, while unsupervised algorithms mainly include K-means clustering and DBSCAN clustering. However, because of the complexity of the industrial field environment and the multi-source heterogeneity of industrial protocols, these methods and algorithms cannot solve the problems encountered in industrial Internet access.
From the above analysis, the problems and defects of the prior art are as follows: the access environment at industrial sites is complex, industrial protocols are multi-source and heterogeneous, and existing unknown protocol detection techniques do not meet the access requirements of the industrial Internet, which greatly increases the difficulty of moving industrial equipment to the cloud.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a method, a system, equipment and a medium for detecting and identifying unknown heterogeneous industrial protocols.
The invention is realized as follows. An unknown heterogeneous industrial protocol detection and identification method comprises the following steps:
after a mixed input of known and unknown industrial protocol messages is received, the preprocessing module removes invalid messages and routes the remaining messages, according to message coding format, to the feature dimension reduction module for the corresponding coding; the feature dimension reduction module performs feature extraction and feature dimension reduction on the message set; the known industrial protocol message screening module separates known protocol messages from unknown protocol messages, using a screening algorithm trained on the known protocol messages in the database, and submits the unknown protocol messages to the unknown industrial protocol message identification module; and the unknown industrial protocol message identification module identifies the screened messages, removes interference messages, and classifies and files the identified unknown industrial protocol messages, thereby converting the unknown protocol into a known protocol.
Further, the specific process of the unknown heterogeneous industrial protocol detection and identification method is as follows:
Step one: after the protocol messages uploaded by the physical layer industrial data acquisition terminal reach the middleware, the preprocessing module removes invalid messages and distributes the remaining messages, according to message coding type, to the feature dimension reduction module for the corresponding coding. This step eliminates invalid data and classifies the input messages by coding characteristics, which improves the identification accuracy of subsequent steps.
Step two: the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages. This step extracts the effective features of the messages and compresses the message data volume, which improves the identification performance of subsequent steps.
Step three: the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set according to the known protocol message set in the protocol training database, and submits the unknown industrial protocol message data set to the unknown industrial protocol message identification module. This step provides an improved DBSCAN algorithm that simplifies the DBSCAN identification process and improves identification speed and accuracy.
Step four: the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class. This step identifies the unknown industrial protocol families and, by storing them in the protocol training database, completes the conversion from unknown protocol to known protocol.
Further, in step one, the specific process by which the preprocessing module removes invalid messages and distributes the remaining messages to the feature dimension reduction module for the corresponding coding is as follows:
first, the length of the message stack is checked; based on the common characteristics of industrial fieldbus protocols, messages with a message stack depth of less than 10 are treated as invalid and removed; then, the message data set is divided into a binary-coded message data set and an ASCII-coded message data set according to the coding range of each data bit of the message at the data link layer.
Further, in step two, the specific process by which the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages is as follows:
principal component analysis (PCA) is used to perform feature extraction and feature dimension reduction on the binary-coded and ASCII-coded message data sets submitted by the preprocessing module, extracting the feature message data set whose dimensions discriminate most strongly.
Further, in step three, the specific process by which the known industrial protocol message screening module divides the mixed message data set into the known industrial protocol message data set and the unknown industrial protocol message data set, according to the known protocol message set in the protocol training database, is as follows:
known industrial protocol messages are screened according to the features of binary-coded and ASCII-coded industrial protocols, and the algorithm training data set is stored in the protocol training database;
the known industrial protocol screening algorithm is an improvement on the DBSCAN algorithm: it is trained on the feature messages in the protocol training database and then screens and identifies the input dimension-reduced message data set.
Further, in step four, the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class; the identification is performed with the DBSCAN algorithm.
Another object of the present invention is to provide an unknown heterogeneous industrial protocol detection and identification system for implementing the unknown heterogeneous industrial protocol detection and identification method, the unknown heterogeneous industrial protocol detection and identification system including:
a preprocessing module, used for receiving the original message data set uploaded by the industrial data acquisition terminal, removing invalid messages according to message coding format and message type, and dividing the message data set by message coding;
a feature dimension reduction module, used for performing feature extraction and feature dimension reduction on the message data set sent by the preprocessing module, in support of known industrial protocol screening and unknown industrial protocol identification;
a known industrial protocol message screening module, used for receiving the message data set processed by the feature dimension reduction module and screening out unknown protocol messages with an improved DBSCAN algorithm, according to the feature point properties of the message samples in the protocol training database;
an unknown industrial protocol message identification module, used for clustering the unknown industrial protocol feature points into industrial protocol families with the DBSCAN algorithm, storing the messages of each identified protocol family in the protocol training database, and thereby converting the unknown industrial protocol into a known industrial protocol.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
step one: after the protocol messages uploaded by the physical layer industrial data acquisition terminal reach the middleware, the preprocessing module removes invalid messages and distributes the remaining messages, according to message coding type, to the feature dimension reduction module for the corresponding coding;
step two: the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages;
step three: the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set according to the known protocol message set in the protocol training database, and submits the unknown industrial protocol message data set to the unknown industrial protocol message identification module;
step four: the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
step one: after the protocol messages uploaded by the physical layer industrial data acquisition terminal reach the middleware, the preprocessing module removes invalid messages and distributes the remaining messages, according to message coding type, to the feature dimension reduction module for the corresponding coding;
step two: the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages;
step three: the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set according to the known protocol message set in the protocol training database, and submits the unknown industrial protocol message data set to the unknown industrial protocol message identification module;
step four: the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class.
Another object of the present invention is to provide an information data processing terminal, which is configured to execute the unknown heterogeneous industrial protocol detection and identification method.
In view of the above technical solutions and the technical problems solved, the advantages and positive effects of the claimed technical solutions are as follows:
First, with respect to the technical problems in the prior art and the difficulty of solving them, and taking into account the results and data obtained during research and development, the technical problems solved by the invention and the resulting technical effects are described as follows. Addressing practical problems in industrial Internet access at the data link layer, the invention aims to simplify the adaptation process for industrial Internet access protocols and improve the access efficiency of large-scale industrial equipment. The identification accuracy reaches 94% for binary-coded unknown industrial protocols and 95% for ASCII-coded unknown industrial protocols.
Second, viewing the technical solution as a whole or from a product perspective, its technical effects and advantages are as follows:
for the application scenarios of device access and protocol adaptation in the existing industrial Internet, the invention identifies and detects unknown industrial protocols and converts them into known industrial protocols, with the aim of simplifying device access and protocol adaptation and improving the industrial Internet's network and cloud access system.
Third, as supplementary evidence of the inventiveness of the claims, the following aspects are important:
(1) Expected income and commercial value after the technical solution is commercialized:
the invention addresses the industrial protocol access problem that is widespread in the domestic industrial Internet business field and provides technical support for large-scale commercial access of industrial Internet equipment.
(2) The technical solution fills a technical gap in the industry at home and abroad:
unknown protocol detection technology at home and abroad is mostly used in computer networks and the Internet of Things; the invention fills the gap in applying this technology to the industrial Internet.
(3) The technical solution solves a technical problem that people have long wanted to solve but have not succeeded in solving:
the invention solves the problem of large-scale adaptive access for industrial multi-source equipment and heterogeneous protocols, removes an obstacle to the strategic advancement of the industrial Internet, and accelerates the intelligent transformation and upgrading of industry in China.
(4) The technical solution overcomes a technical prejudice:
the invention combines algorithm theory from the research field with the practical production requirements of the industrial field, and thereby overcomes that technical prejudice.
Drawings
Fig. 1 is a schematic structural diagram of an unknown heterogeneous industrial protocol detection and identification system according to an embodiment of the present invention.
Fig. 2 is a flowchart of an unknown heterogeneous industrial protocol detection and identification method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an unknown heterogeneous industrial protocol detection and identification process according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a preprocessing process provided by an embodiment of the present invention.
Fig. 5 is a schematic diagram of a feature dimension reduction process provided by an embodiment of the present invention.
Fig. 6 is a schematic diagram of the known industrial protocol message screening process according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of the unknown industrial protocol message identification process according to an embodiment of the present invention.
Fig. 8 is a diagram of a feature dimension reduction result of a binary coded message by a principal component analysis method according to an embodiment of the present invention.
Fig. 9 is a diagram of a feature dimension reduction result of an ASCII encoded message by principal component analysis according to an embodiment of the present invention.
In the figure: 1. a preprocessing module; 2. a feature dimension reduction module; 3. a known industrial protocol message screening module; 4. an unknown industrial protocol message identification module; 5. an industrial data acquisition terminal; 6. a protocol training database.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
I. Explanatory embodiment. This section expands on the claims so that those skilled in the art can fully understand how the invention is implemented.
As shown in fig. 1, an unknown heterogeneous industrial protocol detection and identification system provided in an embodiment of the present invention includes:
The preprocessing module 1 is used for receiving the original message data set uploaded by the industrial data acquisition terminal 5, removing invalid messages according to message coding format and message type, and dividing the message data set by message coding.
The feature dimension reduction module 2 is used for performing feature extraction and feature dimension reduction on the message data set sent by the preprocessing module, in support of known industrial protocol screening and unknown industrial protocol identification.
The known industrial protocol message screening module 3 is used for receiving the message data set processed by the feature dimension reduction module and screening out unknown protocol messages with an improved DBSCAN algorithm, according to the feature point properties of the message samples in the protocol training database.
The unknown industrial protocol message identification module 4 clusters the unknown industrial protocol feature points into industrial protocol families with the DBSCAN algorithm, stores the messages of each identified protocol family in the protocol training database 6, and thereby converts the unknown industrial protocol into a known industrial protocol.
The preprocessing module 1 provided by the embodiment of the present invention provides an invalid message elimination and message coding identification mechanism, as shown in Table 1 and Fig. 4. The main function of the preprocessing module 1 is to process the original message data set and generate message data sets divided by message coding.
Table 1. Invalid message elimination and message coding identification
Module name | Input | Output | Remarks
Preprocessing module | Original message data set | Coded message data set |
The preprocessing module 1 receives an original message data set uploaded by an industrial data acquisition terminal 5, eliminates invalid messages according to message coding formats and message types, and divides the message data set according to message codes.
The feature dimension reduction module 2 provided by the embodiment of the invention performs feature extraction and feature dimension reduction on the message data set sent by the preprocessing module 1, in support of known industrial protocol screening and unknown industrial protocol identification. The feature dimension reduction module 2 extracts features from the message data set with a principal component analysis algorithm and reduces the feature dimensions of the message data set, as shown in Table 2 and Fig. 5.
Table 2. Message data set feature dimension reduction
Module name | Input | Output | Remarks
Feature dimension reduction module | Message data set | Dimension-reduced message data set |
The known industrial protocol message screening module 3 provided by the embodiment of the invention screens and separates the known industrial protocol message and the unknown industrial protocol message, and provides support for the unknown industrial protocol message identification module 4. Screening and classifying the known industrial protocol messages and the unknown industrial protocol messages in the input message data set according to the message samples in the protocol training database, and submitting the known industrial protocol messages to the upper application of the corresponding protocol; and the unknown industrial protocol message is submitted to the unknown industrial protocol message identification module 4.
The main functions of the known industrial protocol message screening module 3 are to receive the message data set processed by the feature dimension reduction module 2 and to screen out unknown protocol messages with an improved DBSCAN algorithm, according to the feature point properties of the message samples in the protocol training database, as shown in Table 3 and Fig. 6.
Table 3. Known industrial protocol message screening
[Table 3 is reproduced as an image in the original publication: Figure BDA0003537957070000081]
The unknown industrial protocol message identification module 4 provided by the embodiment of the invention receives the unknown industrial protocol message data set screened out by the known industrial protocol message screening module 3. Its main functions are to identify the protocol families in the unknown industrial protocol message data set, remove interference messages, and store the identified messages in the protocol training database, as shown in Table 4.
Table 4. Unknown industrial protocol message identification
[Table 4 is reproduced as an image in the original publication: Figure BDA0003537957070000082]
The unknown industrial protocol message identification module 4 utilizes the DBSCAN algorithm to cluster and identify the unknown industrial protocol feature points into an industrial protocol family. According to the identified industrial protocol family, the messages of different protocol families are stored in the protocol training database, and the unknown industrial protocol is converted into the known industrial protocol, as shown in fig. 7.
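The filing step described above can be sketched as follows. This is only an illustration under stated assumptions: the protocol training database is modelled as an in-memory dictionary, cluster labels follow the common convention that -1 marks noise (interference messages), and the function name `file_clusters` and the family naming scheme are hypothetical, not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

NOISE = -1  # assumed label for interference messages


def file_clusters(
    messages: Sequence[bytes],
    labels: Sequence[int],
    database: Dict[str, List[bytes]],
    prefix: str = "family",  # hypothetical naming scheme for new families
) -> List[str]:
    """Store each identified cluster as a new protocol family; drop noise."""
    grouped: Dict[int, List[bytes]] = defaultdict(list)
    for msg, label in zip(messages, labels):
        if label != NOISE:
            grouped[label].append(msg)
    added: List[str] = []
    for label in sorted(grouped):
        name = f"{prefix}-{len(database) + 1}"
        database[name] = grouped[label]  # the family is now "known"
        added.append(name)
    return added
```

Once a family is stored this way, later runs of the screening module can treat its messages as known protocol samples, which is the sense in which an unknown protocol is converted into a known one.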
As shown in fig. 2, the method for detecting and identifying an unknown heterogeneous industrial protocol provided in the embodiment of the present invention includes:
S101: after the protocol messages uploaded by the physical layer industrial data acquisition terminal reach the middleware, the preprocessing module removes invalid messages and distributes the remaining messages, according to message coding type, to the feature dimension reduction module for the corresponding coding.
S102: the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages.
S103: the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set according to the known protocol message set in the protocol training database, and submits the unknown industrial protocol message data set to the unknown industrial protocol message identification module.
S104: the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class.
In S101 provided in the embodiment of the present invention, the specific process by which the preprocessing module removes invalid messages and distributes the remaining messages to the feature dimension reduction module for the corresponding coding is as follows:
First, the length of the message stack is checked; based on the common characteristics of industrial fieldbus protocols, messages with a message stack depth of less than 10 are treated as invalid and removed.
Then, the message data set is divided into a binary-coded message data set and an ASCII-coded message data set according to the coding range of each data bit of the message at the data link layer.
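A minimal sketch of this preprocessing step is given below. It rests on two assumptions that the text does not confirm: the "message stack depth" threshold of 10 is interpreted as a message length of 10 bytes, and a message is treated as ASCII-coded when every byte is printable ASCII or a tab, carriage return or line feed. The names `preprocess` and `is_ascii_encoded` are illustrative only.

```python
from typing import Iterable, List, Tuple

MIN_LENGTH = 10  # threshold from the text; the unit (bytes) is an assumption


def is_ascii_encoded(message: bytes) -> bool:
    """Assumed rule: every byte is printable ASCII or TAB/LF/CR."""
    return all(0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D) for b in message)


def preprocess(messages: Iterable[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Drop short messages, then split the rest into (binary, ascii) sets."""
    binary_set: List[bytes] = []
    ascii_set: List[bytes] = []
    for msg in messages:
        if len(msg) < MIN_LENGTH:
            continue  # invalid message: shorter than the threshold
        if is_ascii_encoded(msg):
            ascii_set.append(msg)
        else:
            binary_set.append(msg)
    return binary_set, ascii_set
```

Each of the two returned data sets would then be passed to the feature dimension reduction module for its coding.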
In S102 provided in the embodiment of the present invention, the specific process by which the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages is as follows:
Principal component analysis (PCA) is used to perform feature extraction and feature dimension reduction on the binary-coded and ASCII-coded message data sets submitted by the preprocessing module, extracting the feature message data set whose dimensions discriminate most strongly. The dimension reduction results for common industrial protocols are shown in Fig. 8 and Fig. 9.
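To make the dimension reduction step concrete, here is a small standard-library sketch of PCA using power iteration with deflation. The featurisation in `to_vector` (the first 16 byte values, zero-padded) is a hypothetical choice, since the text does not say how a message is turned into a numeric vector; in practice a numerical library would normally be used instead of this hand-written loop.

```python
import math
from typing import List, Optional, Sequence


def to_vector(message: bytes, width: int = 16) -> List[float]:
    """Hypothetical featurisation: first `width` byte values, zero-padded."""
    return [float(b) for b in message[:width].ljust(width, b"\x00")]


def _top_component(cov: List[List[float]], iters: int) -> Optional[List[float]]:
    """Dominant eigenvector of cov by power iteration (None if no variance)."""
    d = len(cov)
    v = [float(j + 1) for j in range(d)]  # fixed, non-uniform start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return None
        v = [x / norm for x in w]
    return v


def pca(rows: Sequence[Sequence[float]], k: int = 2, iters: int = 200) -> List[List[float]]:
    """Project rows onto (at most) their top-k principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / max(n - 1, 1) for b in range(d)]
           for a in range(d)]
    components: List[List[float]] = []
    for _ in range(k):
        v = _top_component(cov, iters)
        if v is None:
            break  # remaining directions carry no variance
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflate: remove the variance explained by this component.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]
```

A call such as `pca([to_vector(m) for m in binary_set], k=2)` would yield two-dimensional feature points of the kind the later clustering steps operate on, where `binary_set` stands for any list of byte-string messages.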
In S103 provided in the embodiment of the present invention, the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set, according to the known protocol message set in the protocol training database, as follows:
Known industrial protocol messages are screened according to the industrial protocol features of the binary and ASCII encodings, and the training data set of the algorithm is stored in the protocol training database.
The known industrial protocol screening algorithm is an improvement on the DBSCAN algorithm: the algorithm is trained with the feature messages in the protocol training database and then screens and identifies the input dimension-reduced message data set. The screening process comprises the following steps:
The dimension-reduced message data set is input into the algorithm, and a hit judgment is made on each input point according to the given neighborhood distance and minimum number of neighborhood points.
If a point in the input data set is a core point, the feature point is judged to belong to a known protocol and the corresponding message is determined to be a known-protocol message; if a point in the input data set has no neighborhood points, it is determined to be an unknown-protocol message.
If a point in the input data set is a boundary point, the other feature points in its neighborhood are traversed: if a core point of a known protocol exists among them, the corresponding message is determined to be a known-protocol message; likewise, if a core point of the data set exists among them and that core point belongs to a known protocol, the corresponding message is also determined to be a known-protocol message.
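The screening rules above can be sketched as follows. This is one possible reading, not the patent's exact algorithm: it assumes that neighborhoods are measured against the known-protocol feature points from the training database, that a point counts itself when the training set is tested for core points, and that a boundary point with no known core point nearby is treated as unknown (a case the text does not spell out). The names `screen_known`, `eps` and `min_pts` are illustrative.

```python
from math import dist


def neighbors(point, points, eps):
    """Return every point in `points` within distance `eps` of `point`."""
    return [q for q in points if dist(point, q) <= eps]


def screen_known(inputs, known, eps, min_pts):
    """Label each input feature point as 'known' or 'unknown'.

    `inputs` and `known` are lists of coordinate tuples (dimension-reduced features).
    """
    # Training points that are core points within the known-protocol set.
    known_core = {q for q in known if len(neighbors(q, known, eps)) >= min_pts}
    labels = []
    for p in inputs:
        near = neighbors(p, known, eps)
        if len(near) >= min_pts:
            # Core point: dense known-protocol neighborhood, so a known protocol.
            labels.append("known")
        elif any(q in known_core for q in near):
            # Boundary point with a known-protocol core point in its neighborhood.
            labels.append("known")
        else:
            # No neighbors at all, or (by assumption) no known core point nearby.
            labels.append("unknown")
    return labels
```

For example, with a tight cluster of five training points around the origin, `eps=0.2` and `min_pts=4`, an input inside the cluster is a core hit, an input just outside it is accepted through a neighboring core point, and a distant input is passed on as unknown.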
In S104 provided in the embodiment of the present invention, the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class, as follows:
Identification is performed with the DBSCAN algorithm. First, a core point satisfying the neighborhood distance and the number of neighborhood points is determined. Each point in the neighborhood of that core point is then traversed to determine whether it is itself a core point under the current neighborhood distance and number of neighborhood points; if it is, it is added to the result set. A point that does not satisfy the current neighborhood distance and number of neighborhood points is a boundary point. When a cluster is surrounded by boundary points, the search for that cluster is complete, because no further points lie within the neighborhood distance. A new random point is then selected and the process is repeated to identify the next cluster.
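A minimal sketch of standard DBSCAN, written to mirror the description above, is given below. It assumes Euclidean distance on the dimension-reduced feature points and visits points in index order rather than at random; points that end up in no cluster are labeled `-1`, which is taken here to correspond to the interference messages that are removed. Parameter names are illustrative.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks points in no cluster."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a boundary point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # boundary point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = region(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point: keep expanding the cluster
        cluster += 1
    return labels
```

Each resulting cluster label would then stand for one candidate protocol family to be filed in the protocol training database.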
II. Application embodiment. To demonstrate the inventiveness and technical value of the technical solution of the present invention, this part gives an application example of the claimed technical solution on specific products or related technologies. The method is applied to an industrial Internet platform with independent intellectual property rights, where it serves as processing middleware for the data acquisition terminals of physical-layer industrial hardware devices. At the data link layer it performs known/unknown protocol message screening and identification, and known/unknown protocol conversion, on the collected heterogeneous industrial protocol messages, providing data support for the platform's protocol adaptation and identification middleware and its message parsing middleware.
The computer device provided by the embodiment of the present invention comprises a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps: step one, after the protocol messages uploaded by the physical-layer industrial data acquisition terminal reach the middleware, the preprocessing module removes invalid messages and distributes the remaining messages to the feature dimension reduction module of the corresponding encoding according to the message encoding type; step two, the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages; step three, the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set according to the known protocol message set in the protocol training database, and submits the unknown industrial protocol message data set to the unknown industrial protocol message identification module; and step four, the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class.
The computer-readable storage medium provided by the embodiment of the present invention stores a computer program which, when executed by a processor, causes the processor to perform the following steps: step one, after the protocol messages uploaded by the physical-layer industrial data acquisition terminal reach the middleware, the preprocessing module removes invalid messages and distributes the remaining messages to the feature dimension reduction module of the corresponding encoding according to the message encoding type; step two, the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages; step three, the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set according to the known protocol message set in the protocol training database, and submits the unknown industrial protocol message data set to the unknown industrial protocol message identification module; and step four, the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class.
The embodiment of the invention provides an information data processing terminal, which is used for executing the unknown heterogeneous industrial protocol detection and identification method.
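How the four steps chain together can be sketched as a small driver. Everything here is an assumption made for illustration: the stage functions are passed in as placeholders, the training database is modeled as a nested dictionary (encoding, then protocol family, then feature points), the identification stage is assumed to return `-1` for interference messages, and the family naming scheme is hypothetical.

```python
def run_pipeline(raw_messages, preprocess, reduce_dims, screen_known,
                 identify_unknown, training_db):
    """Run steps one to four and return how many messages were filed per new family.

    preprocess(raw)            -> {encoding: [message, ...]} with invalid messages removed
    reduce_dims(messages)      -> [feature, ...], one per message
    screen_known(feats, known) -> [bool, ...], True where a feature hits a known protocol
    identify_unknown(feats)    -> [label, ...], -1 for interference messages
    """
    filed = {}
    # Step one: remove invalid messages and split by encoding.
    for encoding, messages in preprocess(raw_messages).items():
        if not messages:
            continue
        # Step two: feature extraction and dimension reduction.
        features = reduce_dims(messages)
        # Step three: separate known-protocol from unknown-protocol messages.
        families = training_db.setdefault(encoding, {})
        known_points = [f for pts in families.values() for f in pts]
        is_known = screen_known(features, known_points)
        unknown = [f for f, hit in zip(features, is_known) if not hit]
        # Step four: cluster the unknown messages and file each cluster by class.
        for feature, label in zip(unknown, identify_unknown(unknown)):
            if label == -1:
                continue  # interference message, discarded
            name = f"unknown-{encoding}-{label}"  # hypothetical family name
            families.setdefault(name, []).append(feature)
            filed[name] = filed.get(name, 0) + 1
    return filed
```

Because newly identified families are written back into `training_db`, messages of the same family are screened as known on later runs, which is the sense in which an unknown protocol becomes a known one. A real system would also need family names that stay unique across runs.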
III. Evidence of the effects of the embodiments. The embodiments of the present invention achieved positive effects during research, development and use, and offer significant advantages over the prior art; these are described below with reference to data and figures from the testing process.
The feature extraction results of the feature dimension reduction module for binary-coded and ASCII-coded protocol messages are shown in Fig. 8 and Fig. 9. The binary-coded feature dimension reduction module was verified with 1,000 messages for each binary-coded protocol, 12,000 messages from 12 protocols in total; in the result shown in Fig. 8, the multi-dimensional protocol features are clearly extracted onto two dimensions. The ASCII-coded feature dimension reduction module was verified with 1,000 messages for each ASCII-coded protocol, 10,000 messages from 10 protocols in total; in the result shown in Fig. 9, the multi-dimensional protocol features are likewise clearly extracted onto two dimensions.
The following table gives the verification results of the binary-coded known industrial protocol message screening module. Ten groups of binary-coded mixed protocol message data sets, dimension-reduced by the feature dimension reduction module, were used for testing, with 1,000 messages per group and 10,000 messages in total; the average known-message identification rate is 95.39%.
Figure BDA0003537957070000121
The following table gives the verification results of the ASCII-coded known industrial protocol message screening module. Ten groups of ASCII-coded mixed protocol message data sets, dimension-reduced by the feature dimension reduction module, were used for testing, with 1,000 messages per group and 10,000 messages in total; the average known-message identification rate is 94.14%.
Figure BDA0003537957070000122
The following table gives the verification results of the binary-coded unknown industrial protocol message identification module. Ten groups of binary-coded mixed protocol message data sets, dimension-reduced by the feature dimension reduction module, were used for testing, with 1,000 messages per group and 10,000 messages in total. The results compare the identification accuracy of common clustering algorithms, namely the DBSCAN algorithm, the K-means algorithm and the means algorithm, on binary-coded unknown industrial protocol messages. The average identification rate of the DBSCAN algorithm used in the present invention is 95.20%, higher than the 87.75% of the K-means algorithm and the 82.93% of the means algorithm.
Figure BDA0003537957070000131
The following table gives the verification results of the ASCII-coded unknown industrial protocol message identification module. Ten groups of ASCII-coded mixed protocol message data sets, dimension-reduced by the feature dimension reduction module, were used for testing, with 1,000 messages per group and 10,000 messages in total. The results compare the identification accuracy of common clustering algorithms, namely the DBSCAN algorithm, the K-means algorithm and the means algorithm, on ASCII-coded unknown industrial protocol messages. The average identification rate of the DBSCAN algorithm used in the present invention is 95.76%, higher than the 88.34% of the K-means algorithm and the 91.84% of the means algorithm.
Figure BDA0003537957070000132
It should be noted that the embodiments of the present invention can be implemented in hardware, software, or a combination of the two. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD-ROM or DVD-ROM, on programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of such hardware circuits and software, e.g., firmware.
The above are only specific embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any modification, equivalent replacement or improvement made by those skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. An unknown heterogeneous industrial protocol detection and identification method, characterized in that the method comprises: after a mixed message input of known and unknown industrial protocols is received, a preprocessing module removes invalid messages and distributes the remaining messages to the feature dimension reduction module of the corresponding encoding according to the message encoding format; the feature dimension reduction module performs feature extraction and feature dimension reduction on the message set; a known industrial protocol message screening module trains its algorithm on the known protocol messages in the database and screens the input, separating known-protocol messages from unknown-protocol messages and submitting the unknown-protocol messages to an unknown industrial protocol message identification module; and the unknown industrial protocol message identification module identifies the screened messages, removes interference messages, and classifies and files the identified unknown industrial protocol messages, thereby converting unknown protocols into known protocols.
2. The unknown heterogeneous industrial protocol detection and identification method according to claim 1, characterized in that the method specifically comprises:
step one, after the protocol messages uploaded by the physical-layer industrial data acquisition terminal reach the middleware, the preprocessing module removes invalid messages and distributes the remaining messages to the feature dimension reduction module of the corresponding encoding according to the message encoding type;
step two, the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages;
step three, the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set according to the known protocol message set in the protocol training database, and submits the unknown industrial protocol message data set to the unknown industrial protocol message identification module;
and step four, the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class.
3. The unknown heterogeneous industrial protocol detection and identification method according to claim 2, characterized in that, in step one, removing invalid messages through the preprocessing module and distributing the remaining messages to the feature dimension reduction module of the corresponding encoding according to the message encoding type specifically comprises:
first, the message stack length is checked and, based on the common characteristics of industrial fieldbus protocols, messages whose message stack depth is less than 10 are regarded as invalid and removed; then, the message data set is divided into a binary-coded message data set and an ASCII-coded message data set according to the encoding range of each data bit of the message at the data link layer.
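The preprocessing rule can be sketched as below. Two interpretations are assumed and are not confirmed by the text: the "message stack depth" is taken to be the message length in bytes, and a message is treated as ASCII-coded when every byte falls in the printable ASCII range or is a carriage return or line feed. The function name and the threshold parameter are illustrative.

```python
def preprocess(messages, min_len=10):
    """Drop short messages and split the rest into (binary, ascii) lists of bytes."""
    binary, ascii_coded = [], []
    for msg in messages:
        if len(msg) < min_len:
            continue  # assumed reading of "stack depth less than 10": invalid
        # Assumed ASCII test: printable characters plus CR and LF only.
        if all(0x20 <= b <= 0x7E or b in (0x0D, 0x0A) for b in msg):
            ascii_coded.append(msg)
        else:
            binary.append(msg)
    return binary, ascii_coded
```

A frame made only of printable characters and a trailing CR LF would land in the ASCII set, while a frame containing bytes such as `0x01` or `0xC4` would land in the binary set.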
4. The unknown heterogeneous industrial protocol detection and identification method according to claim 2, characterized in that, in step two, the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages as follows:
principal component analysis (PCA) is applied to the binary-coded message data set and the ASCII-coded message data set submitted by the preprocessing module to perform feature extraction and feature dimension reduction, extracting the feature message data set whose dimensions discriminate most strongly between protocols.
5. The unknown heterogeneous industrial protocol detection and identification method according to claim 2, characterized in that, in step three, the known industrial protocol message screening module divides the mixed message data set into the known industrial protocol message data set and the unknown industrial protocol message data set, according to the known protocol message set in the protocol training database, as follows:
known industrial protocol messages are screened according to the industrial protocol features of the binary and ASCII encodings, and the training data set of the algorithm is stored in the protocol training database; the known industrial protocol screening algorithm is an improvement on the DBSCAN algorithm, which is trained with the feature messages in the protocol training database and then screens and identifies the input dimension-reduced message data set.
6. The unknown heterogeneous industrial protocol detection and identification method according to claim 2, characterized in that, in step four, the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class, wherein the identification is performed with the DBSCAN algorithm.
7. An unknown heterogeneous industrial protocol detection and identification system for implementing the unknown heterogeneous industrial protocol detection and identification method according to any one of claims 1 to 6, characterized in that the system comprises:
a preprocessing module, used for receiving the original message data set uploaded by the industrial data acquisition terminal, removing invalid messages according to message encoding format and message type, and dividing the message data set according to message encoding;
a feature dimension reduction module, used for providing a feature dimension reduction function for the message data set, performing feature extraction and feature dimension reduction on the message data set sent by the preprocessing module, and supporting the screening of known industrial protocols and the identification of unknown industrial protocols;
a known industrial protocol message screening module, used for receiving the message data set processed by the feature dimension reduction module and screening out unknown-protocol messages with an improved DBSCAN algorithm, according to the feature point properties of the message samples in the protocol training database;
an unknown industrial protocol message identification module, used for identifying clusters of unknown industrial protocol feature points as industrial protocol families with the DBSCAN algorithm, storing the messages of different protocol families in the protocol training database according to the identified industrial protocol families, and thereby converting unknown industrial protocols into known industrial protocols.
8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
step one, after the protocol messages uploaded by the physical-layer industrial data acquisition terminal reach the middleware, the preprocessing module removes invalid messages and distributes the remaining messages to the feature dimension reduction module of the corresponding encoding according to the message encoding type;
step two, the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages;
step three, the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set according to the known protocol message set in the protocol training database, and submits the unknown industrial protocol message data set to the unknown industrial protocol message identification module;
and step four, the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
step one, after the protocol messages uploaded by the physical-layer industrial data acquisition terminal reach the middleware, the preprocessing module removes invalid messages and distributes the remaining messages to the feature dimension reduction module of the corresponding encoding according to the message encoding type;
step two, the feature dimension reduction module performs feature extraction and feature dimension reduction on the messages;
step three, the known industrial protocol message screening module divides the mixed message data set into a known industrial protocol message data set and an unknown industrial protocol message data set according to the known protocol message set in the protocol training database, and submits the unknown industrial protocol message data set to the unknown industrial protocol message identification module;
and step four, the unknown industrial protocol message identification module identifies the unknown industrial protocol message data set, removes interference messages, and stores the identified unknown industrial protocol message data set in the protocol training database by class.
10. An information data processing terminal, characterized in that the information data processing terminal is used for executing the unknown heterogeneous industrial protocol detection and identification method according to any one of claims 1-6.
CN202210229892.7A 2022-03-09 2022-03-09 Unknown heterogeneous industrial protocol detection and identification method, system, equipment and medium Pending CN114640611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210229892.7A CN114640611A (en) 2022-03-09 2022-03-09 Unknown heterogeneous industrial protocol detection and identification method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN114640611A true CN114640611A (en) 2022-06-17

Family

ID=81948081

Country Status (1)

Country Link
CN (1) CN114640611A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination