CN115134433A - A method, system, device and storage medium for semantic analysis of industrial control protocol - Google Patents

A method, system, device and storage medium for semantic analysis of industrial control protocol

Info

Publication number
CN115134433A
Authority
CN
China
Prior art keywords
protocol
field
data
data stream
industrial control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210723745.5A
Other languages
Chinese (zh)
Other versions
CN115134433B (en)
Inventor
李勇
田晓芸
郝怡
贾江凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Digital Technology Holdings Co ltd
State Grid E Commerce Technology Co Ltd
Original Assignee
State Grid Digital Technology Holdings Co ltd
State Grid E Commerce Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Digital Technology Holdings Co ltd, State Grid E Commerce Technology Co Ltd
Priority to CN202210723745.5A
Publication of CN115134433A
Application granted
Publication of CN115134433B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/18: Multiprotocol handlers, e.g. single devices capable of handling multiple protocols
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22: Parsing or analysis of headers
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Communication Control (AREA)

Abstract

Embodiments of the invention provide a semantic parsing method, system, device and storage medium for industrial control protocols. The method comprises: identifying each data stream in the bus protocol streams with a preset multi-pattern matching algorithm; determining the protocol type of a data stream that meets a preset protocol header format requirement as an industrial Ethernet protocol type, and the protocol type of a data stream that does not meet it as a fieldbus protocol type; dividing each data stream into fields using the protocol format corresponding to its protocol type, and semantically parsing each field according to that protocol format to obtain a semantic parsing result for each field; and obtaining an industrial human-machine interface, recognizing it to obtain the industrial control data of each display area, and determining the positions and meanings of the area identification field and the variable data field in each data stream based on the industrial control data and the industrial control data fields. The invention improves the parsing accuracy and parsing efficiency for industrial control protocols.

Description

A method, system, device and storage medium for semantic analysis of industrial control protocol

Technical Field

The invention relates to the technical field of information processing, and in particular to a semantic parsing method, system, device and storage medium for industrial control protocols.

Background

With the development of information technology, the Industrial Internet, as the infrastructure for the digital, networked and intelligent transformation of industry, is widely used in the field of industrial control. However, the industrial control protocols used in the Industrial Internet are far more numerous and far more complex than traditional Ethernet protocols. As a result, when the prior art parses industrial control protocols with methods designed for traditional Ethernet protocols, both parsing accuracy and parsing efficiency are low. Low parsing accuracy and efficiency in turn create a risk of data transmission delay in the Industrial Internet, which affects the operational reliability of the devices connected to it. How to improve the parsing accuracy and parsing efficiency for industrial control protocols has therefore become an urgent problem.

Summary of the Invention

The purpose of the embodiments of the invention is to provide a semantic parsing method, system, device and storage medium for industrial control protocols, so as to improve the parsing accuracy and parsing efficiency for industrial control protocols. The specific technical solutions are as follows:

A semantic parsing method for an industrial control protocol, the semantic parsing method comprising:

Using a preset multi-pattern matching algorithm and according to a preset protocol header format requirement, identifying each data stream in the bus protocol streams, determining the protocol type of a data stream that meets the preset protocol header format requirement as an industrial Ethernet protocol type, and determining the protocol type of a data stream that does not meet the preset protocol header format requirement as a fieldbus protocol type.

Using the protocol format corresponding to the protocol type, dividing each data stream into fields, and semantically parsing each field of each data stream according to the protocol format to obtain a semantic parsing result for each field.

Obtaining an industrial human-machine interface, recognizing the industrial human-machine interface to obtain the industrial control data of each display area in it, and determining, based on the industrial control data and the industrial control data fields, the positions and meanings of the area identification field and the variable data field in each data stream, wherein an industrial control data field is a field whose data type in the semantic parsing result is industrial control data.

Optionally, the step of identifying each data stream in the bus protocol streams using the preset multi-pattern matching algorithm and according to the preset protocol header format requirement, determining the protocol type of a data stream that meets the requirement as an industrial Ethernet protocol type, and determining the protocol type of a data stream that does not meet the requirement as a fieldbus protocol type, comprises:

For each data stream in the bus protocol streams:

Using the preset multi-pattern matching algorithm to determine whether the data stream contains protocol header data; if so, determining, as the protocol type of the data stream, the industrial Ethernet protocol type in the preset protocol header format requirement that matches the protocol header data of the data stream.

If the data stream does not contain protocol header data, determining the protocol type of the data stream as a fieldbus protocol type.

Optionally, where the protocol type of a data stream is the industrial Ethernet protocol type, the step of dividing each data stream into fields using the protocol format corresponding to the protocol type, and semantically parsing each field of each data stream according to the protocol format to obtain a semantic parsing result for each field, comprises:

For each data stream whose protocol type is the industrial Ethernet protocol type:

Using a preset string segmentation algorithm, calculating the information entropy of each byte in the data stream and the mutual information between adjacent bytes, and determining the split points of the data stream according to the information entropy and the mutual information.

Dividing the data stream into a plurality of fields at the split points.

Obtaining, according to the Ethernet protocol stream identifier in the protocol type of the data stream, the protocol format that matches the Ethernet protocol stream identifier.

For each field: using a preset reverse parsing algorithm, determining a first semantic parsing result of the field according to the protocol format that matches the Ethernet protocol stream identifier, the first semantic parsing result including the semantics and data type of the field.
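
The entropy and mutual-information segmentation step can be illustrated with a minimal Python sketch. The text does not state how the statistics are sampled or which decision rule selects the split points, so everything below is an assumption made for illustration: the statistics are taken per byte offset across several messages of the same stream, adjacent constant bytes (zero entropy) are kept together, and a split point is placed where a constant byte meets a varying byte or where the mutual information of adjacent offsets falls below a hypothetical threshold `mi_threshold`. The function names are likewise hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of observed symbols."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned observation lists."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def split_points(messages, mi_threshold=0.5, eps=1e-9):
    """Return byte offsets at which a new field is assumed to start.

    Assumed rule (not taken from the patent text): adjacent constant
    offsets stay together; a boundary is placed between a constant and a
    varying offset, or between varying offsets with low mutual information.
    """
    length = min(len(m) for m in messages)
    columns = [[m[i] for m in messages] for i in range(length)]
    h = [entropy(col) for col in columns]
    points = []
    for i in range(length - 1):
        const_a, const_b = h[i] < eps, h[i + 1] < eps
        if const_a and const_b:
            continue  # both bytes never change: same constant field
        if const_a != const_b:
            points.append(i + 1)
        elif mutual_information(columns[i], columns[i + 1]) < mi_threshold:
            points.append(i + 1)
    return points


def split_fields(message, points):
    """Cut one message into fields at the given split points."""
    bounds = [0, *points, len(message)]
    return [message[a:b] for a, b in zip(bounds, bounds[1:])]


# Made-up sample messages: 2 constant bytes, 2 correlated bytes, 1 independent byte.
SAMPLE_MESSAGES = [
    bytes([0xAA, 0x55, 0x01, 0x10, 0x00]),
    bytes([0xAA, 0x55, 0x02, 0x20, 0x00]),
    bytes([0xAA, 0x55, 0x01, 0x10, 0x01]),
    bytes([0xAA, 0x55, 0x02, 0x20, 0x01]),
]
```

With the made-up sample messages, `split_points(SAMPLE_MESSAGES)` places boundaries after the constant prefix and after the two correlated bytes. A real deployment would need far more samples for the estimates to be meaningful.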

Optionally, where the protocol type of a data stream is the fieldbus protocol type, the method further comprises:

For each data stream whose protocol type is the fieldbus protocol type:

Determining the bus protocol stream identifier of the data stream according to the byte length of the data stream, and obtaining the protocol format that matches the bus protocol stream identifier.

Using a preset byte semantic inference algorithm, dividing the data stream into a control command field, a protocol data field and a terminator field according to the protocol format that matches the bus protocol stream identifier.

Using the preset byte semantic inference algorithm, determining the subfields of the protocol data field according to the protocol format that matches the bus protocol stream identifier, and semantically parsing the control command field, each subfield and the terminator field to obtain a second semantic parsing result for each field and each subfield, the second semantic parsing result including the semantics and data type of each field or subfield.
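
As a rough illustration of the fieldbus branch, the following Python sketch selects a format from the frame's byte length and then cuts the frame into a control command field, protocol data subfields and a terminator field. The format table, field names and lengths are hypothetical placeholders, not values from any real fieldbus specification, and a fixed layout lookup stands in for the preset byte semantic inference algorithm, which the text does not detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusFormat:
    name: str             # bus protocol stream identifier (hypothetical)
    command_len: int      # bytes in the control command field
    terminator_len: int   # bytes in the terminator field
    subfields: tuple      # (name, length) pairs covering the protocol data field


# Hypothetical formats keyed by total frame length in bytes.
FORMATS_BY_LENGTH = {
    8: BusFormat("bus_A", 1, 1, (("address", 2), ("value", 4))),
    6: BusFormat("bus_B", 2, 1, (("register", 1), ("value", 2))),
}


def parse_fieldbus(frame: bytes) -> dict:
    """Split a fieldbus frame according to the format chosen by its length."""
    fmt = FORMATS_BY_LENGTH.get(len(frame))
    if fmt is None:
        raise ValueError(f"no format registered for {len(frame)}-byte frames")
    data_end = len(frame) - fmt.terminator_len
    data = frame[fmt.command_len:data_end]
    subfields, offset = {}, 0
    for name, size in fmt.subfields:
        subfields[name] = data[offset:offset + size]
        offset += size
    return {
        "protocol": fmt.name,
        "control_command": frame[:fmt.command_len],
        "protocol_data": subfields,
        "terminator": frame[data_end:],
    }
```

Keying the table on length mirrors the statement that the bus protocol stream identifier is determined from the byte length; two protocols that share a frame length would need an additional discriminator.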

Optionally, the step of obtaining an industrial human-machine interface, recognizing the industrial human-machine interface to obtain the industrial control data of each display area in it, and determining the positions and meanings of the area identification field and the variable data field in each data stream based on the industrial control data and the industrial control data fields, comprises:

Using a preset image recognition algorithm to obtain the industrial control data of each display area in the industrial human-machine interface.

For the industrial control data of each display area:

Obtaining a target data stream according to the industrial control data of the display area, wherein the target data stream is a data stream containing a field that matches the data encoding of the industrial control data of the display area.

Using a preset sequence alignment algorithm, aligning the constant data sequence in the industrial control data of the display area with the industrial control data fields of the target data stream, determining the industrial control data field that matches as the area identification field, and determining the meaning of the constant data sequence recognized by the preset image recognition algorithm as the meaning of the area identification field.

Using the preset sequence alignment algorithm, aligning the non-constant data sequence in the industrial control data of the display area with the industrial control data fields of the target data stream, determining the industrial control data field that matches as the variable data field, and determining the meaning of the non-constant data sequence recognized by the preset image recognition algorithm as the meaning of the variable data field.
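
The matching of display-area data against stream fields can be sketched as follows. This is a simplified illustration under several assumptions that do not come from the text: the image recognition step has already produced a constant label and a changing numeric reading for one display area at several moments, labels are encoded as ASCII, readings as unsigned 16-bit big-endian integers, and exact byte equality stands in for the preset sequence alignment algorithm. All names are hypothetical.

```python
import struct


def encode_value(value: int) -> bytes:
    """Assumed wire encoding of a displayed reading: unsigned 16-bit big-endian."""
    return struct.pack(">H", value)


def locate_fields(snapshots):
    """Find the area identification and variable data field positions.

    snapshots: list of (label, value, fields) tuples, where label and value
    were read from one display area and fields is the already segmented
    message (list of bytes objects) captured at the same moment.
    Returns (area_field_index, variable_field_index); either may be None.
    """
    n_fields = min(len(fields) for _, _, fields in snapshots)
    area_idx = variable_idx = None
    for i in range(n_fields):
        # Constant data: the same label bytes appear at index i every time.
        if all(fields[i] == label.encode("ascii") for label, _, fields in snapshots):
            area_idx = i
        # Non-constant data: index i tracks the displayed reading over time.
        elif all(fields[i] == encode_value(value) for _, value, fields in snapshots):
            variable_idx = i
    return area_idx, variable_idx


# Made-up observations of one display area labelled "T1".
SNAPSHOTS = [
    ("T1", 300, [b"\x01", b"T1", b"\x01\x2c"]),
    ("T1", 305, [b"\x01", b"T1", b"\x01\x31"]),
]
```

Once the positions are known, the label text recognized on screen supplies the meaning of the area identification field, and the meaning recognized for the reading supplies the meaning of the variable data field.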

A semantic parsing system for an industrial control protocol, the semantic parsing system comprising:

A protocol type determination unit, configured to identify each data stream in the bus protocol streams using a preset multi-pattern matching algorithm and according to a preset protocol header format requirement, to determine the protocol type of a data stream that meets the preset protocol header format requirement as an industrial Ethernet protocol type, and to determine the protocol type of a data stream that does not meet the preset protocol header format requirement as a fieldbus protocol type.

A field semantic parsing unit, configured to divide each data stream into fields using the protocol format corresponding to the protocol type, and to semantically parse each field of each data stream according to the protocol format to obtain a semantic parsing result for each field.

A key field determination unit, configured to obtain an industrial human-machine interface, recognize the industrial human-machine interface to obtain the industrial control data of each display area in it, and determine, based on the industrial control data and the industrial control data fields, the positions and meanings of the area identification field and the variable data field in each data stream, wherein an industrial control data field is a field whose data type in the semantic parsing result is industrial control data.

Optionally, the protocol type determination unit is configured to:

For each data stream in the bus protocol streams:

Determine whether the data stream contains protocol header data; if so, determine, as the protocol type of the data stream, the industrial Ethernet protocol type in the preset protocol header format requirement that matches the protocol header data of the data stream.

If the data stream does not contain protocol header data, determine the protocol type of the data stream as a fieldbus protocol type.

Optionally, where the protocol type of a data stream is the industrial Ethernet protocol type, the field semantic parsing unit is configured to:

For each data stream whose protocol type is the industrial Ethernet protocol type:

Using a preset string segmentation algorithm, calculate the information entropy of each byte in the data stream and the mutual information between adjacent bytes, and determine the split points of the data stream according to the information entropy and the mutual information.

Divide the data stream into a plurality of fields at the split points.

Obtain, according to the Ethernet protocol stream identifier in the protocol type of the data stream, the protocol format that matches the Ethernet protocol stream identifier.

For each field: using a preset reverse parsing algorithm, determine a first semantic parsing result of the field according to the protocol format that matches the Ethernet protocol stream identifier, the first semantic parsing result including the semantics and data type of the field.

Optionally, where the protocol type of a data stream is the fieldbus protocol type, the field semantic parsing unit is further configured to:

For each data stream whose protocol type is the fieldbus protocol type:

Determine the bus protocol stream identifier of the data stream according to the byte length of the data stream, and obtain the protocol format that matches the bus protocol stream identifier.

Using a preset byte semantic inference algorithm, divide the data stream into a control command field, a protocol data field and a terminator field according to the protocol format that matches the bus protocol stream identifier.

Using the preset byte semantic inference algorithm, determine the subfields of the protocol data field according to the protocol format that matches the bus protocol stream identifier, and semantically parse the control command field, each subfield and the terminator field to obtain a second semantic parsing result for each field and each subfield, the second semantic parsing result including the semantics and data type of each field or subfield.

Optionally, the key field determination unit is configured to:

Use a preset image recognition algorithm to obtain the industrial control data of each display area in the industrial human-machine interface.

For the industrial control data of each display area:

Obtain a target data stream according to the industrial control data of the display area, wherein the target data stream is a data stream containing a field that matches the data encoding of the industrial control data of the display area.

Using a preset sequence alignment algorithm, align the constant data sequence in the industrial control data of the display area with the industrial control data fields of the target data stream, determine the industrial control data field that matches as the area identification field, and determine the meaning of the constant data sequence recognized by the preset image recognition algorithm as the meaning of the area identification field.

Using the preset sequence alignment algorithm, align the non-constant data sequence in the industrial control data of the display area with the industrial control data fields of the target data stream, determine the industrial control data field that matches as the variable data field, and determine the meaning of the non-constant data sequence recognized by the preset image recognition algorithm as the meaning of the variable data field.

A semantic parsing device for an industrial control protocol, the semantic parsing device comprising:

A processor;

A memory for storing instructions executable by the processor.

Wherein the processor is configured to execute the instructions to implement the semantic parsing method for an industrial control protocol according to any one of the above.

A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a semantic parsing device for an industrial control protocol, enable the semantic parsing device to perform the semantic parsing method for an industrial control protocol according to any one of the above.

The semantic parsing method, system, device and storage medium for industrial control protocols provided by the embodiments of the invention introduce a preset multi-pattern matching algorithm and set the protocol header format requirement based on the structure and type of the industrial control protocol encapsulation header. This makes it possible to efficiently identify, from a large number of bus protocol streams, the data streams that meet the preset protocol header format requirement, and to accurately determine the protocol type of each data stream. In addition, the protocol format that generated a data stream is determined from its protocol type, and the data stream is divided into fields and semantically parsed according to that protocol format, so that the semantics of each field in the same data stream are parsed accurately. Finally, by obtaining the industrial control data of each display area and comparing its values with the industrial control data fields in the data stream, the invention, unlike the prior art, accurately parses fields whose meaning and position in the data stream are otherwise unclear. The invention thus achieves its purpose of improving the parsing accuracy and parsing efficiency for industrial control protocols.

Of course, a product or method implementing the invention does not necessarily need to achieve all of the above advantages at the same time.

Description of the Drawings

To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a semantic parsing method for an industrial control protocol provided by an embodiment of the invention;

FIG. 2 is a schematic diagram of the data structure of a preset multi-pattern matching algorithm provided by an optional embodiment of the invention;

FIG. 3 is a schematic diagram of field division and semantic parsing of an industrial Ethernet protocol type data stream provided by another optional embodiment of the invention;

FIG. 4 is a schematic diagram of an industrial human-machine interface provided by another optional embodiment of the invention;

FIG. 5 is a block diagram of a system provided by another optional embodiment of the invention;

FIG. 6 is a block diagram of a device provided by another optional embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.

An embodiment of the invention provides a semantic parsing method for an industrial control protocol. As shown in FIG. 1, the semantic parsing method comprises:

S101. Using a preset multi-pattern matching algorithm and according to a preset protocol header format requirement, identify each data stream in the bus protocol streams, determine the protocol type of a data stream that meets the preset protocol header format requirement as an industrial Ethernet protocol type, and determine the protocol type of a data stream that does not meet the preset protocol header format requirement as a fieldbus protocol type.

Optionally, in an optional embodiment of the invention, the preset multi-pattern matching algorithm may be an algorithm constructed from the protocol header formats of the industrial Ethernet protocols on the basis of a multi-pattern matching algorithm. Industrial control protocols carry more data, and at a higher transmission rate, than ordinary communication protocols, and the protocol formats of different industrial control protocols differ considerably. As a result, in large-scale data transmission scenarios the prior art cannot identify the different industrial control protocols accurately and efficiently. Multi-pattern matching algorithms, by contrast, were proposed for the problem of matching many keywords at large scale. By introducing the preset multi-pattern matching algorithm, the invention can therefore accurately and efficiently identify, from a large number of bus protocol streams, the data streams that meet the preset protocol header format requirement.
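
To make the idea of multi-pattern matching concrete, the following Python sketch uses the Aho-Corasick automaton, a well-known multi-pattern matching algorithm that scans the input once regardless of how many patterns are registered. The text above does not say which algorithm the invention uses, so Aho-Corasick is a stand-in chosen for illustration; the header signatures and protocol labels are made-up byte patterns, and the sketch simplifies by accepting a signature anywhere in the stream rather than only at a header position.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton from a {label: bytes} mapping."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # labels recognised when a state is reached
    for label, pattern in patterns.items():
        state = 0
        for b in pattern:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(label)
    # Breadth-first pass to compute failure links (depth-1 states fail to root).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def search(data, automaton):
    """Return (end_index, label) for every pattern occurrence in data."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        hits.extend((i, label) for label in out[state])
    return hits


def classify(stream, automaton):
    """Label a stream by its first header hit, else treat it as fieldbus."""
    hits = search(stream, automaton)
    return hits[0][1] if hits else "fieldbus"


# Hypothetical header signatures, for illustration only.
HEADER_SIGNATURES = {
    "ethernet_proto_A": b"\xAA\x01",
    "ethernet_proto_B": b"\x01\xBB\xCC",
}
AUTOMATON = build_automaton(HEADER_SIGNATURES)
```

A stream in which no registered signature occurs falls through to the fieldbus branch, which mirrors the rule that streams without protocol header data are treated as fieldbus traffic.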

Optionally, in another optional embodiment of the invention, the fieldbus protocol is an industrial control protocol used to transmit control signals between machines and equipment in a factory. The industrial Ethernet protocol is a communication protocol used to transmit other types of data in addition to control signals. Compared with a data stream of the fieldbus protocol type, a data stream of the industrial Ethernet protocol type has an additional industrial control protocol encapsulation header in its data structure. By setting the protocol header format requirement based on the structure and type of this encapsulation header, the invention can therefore accurately identify data streams of different protocol types.

Optionally, in another optional embodiment of the invention, the fieldbus protocol type may be any of several specific types, for example the Controller Area Network (CAN) protocol or the Process Field Bus (Profibus) protocol.

Optionally, in another optional embodiment of the invention, the industrial Ethernet protocol type may be any of several specific types, for example the Transmission Control Protocol (TCP) or the Ethernet Industrial Protocol (EtherNet/IP).

Optionally, in another optional embodiment of the invention, in a practical application scenario, the protocol type of each data stream may be determined by determining the specific protocol type of the data stream and then generating a corresponding electronic tag according to that protocol type.

S102. Using the protocol format corresponding to the protocol type, divide each data stream into fields, and perform semantic parsing on each field of each data stream according to the protocol format to obtain a semantic parsing result for each field.

Here, the protocol format refers to the frame format specified by each protocol. In data streams generated by different industrial control protocols, fields at different positions carry different meanings. For data streams generated by the same industrial control protocol, the string length varies with the amount of data carried, but the protocol distinguishes strings representing different data types by placing a delimiter in front of them. Therefore, determining the protocol format of a data stream from its protocol type, and then dividing the data stream into fields and parsing their semantics according to that format, achieves accurate parsing of the semantics of each field within the same data stream.

Optionally, in an optional embodiment of the present invention, the content of the semantic parsing result may be the meaning of the field, the data type of the data represented by the field, and the like.

S103. Obtain an industrial human-machine interface, recognize the industrial human-machine interface to obtain the industrial control data of each display area in it, and determine, based on the industrial control data and the industrial control data fields, the positions and meanings of the area identification field and the variable data field in each data stream, where an industrial control data field is a field whose data type in the semantic parsing result is industrial control data.

Here, the industrial human-machine interface (Industrial Human Machine Interface, Industrial HMI) is an interface for human-machine interaction and control, generated by HMI configuration software used in the field of industrial control.

Optionally, in an optional embodiment of the present invention, recognizing the industrial human-machine interface and obtaining the industrial control data of each display area may be implemented by using a preset image recognition algorithm to collect the image data in the industrial human-machine interface region by region.

Optionally, in another optional embodiment of the present invention, the image content displayed on the industrial human-machine interface consists of important industrial control data, such as operating states or operating parameters, that affect the reliability of equipment operation. Because this data changes frequently, the industrial control protocol does not indicate the specific meaning and position of the corresponding fields in the data stream; only the values appear in those fields. Existing parsing methods can only parse fields whose meaning is defined in the industrial control protocol. The present invention therefore obtains the industrial control data of each display area and compares its values with the industrial control data fields in the data stream, so that, compared with the prior art, it achieves accurate parsing of fields whose meaning and position in the data stream are not explicit.

The present invention introduces a preset multi-pattern matching algorithm and sets the protocol header format requirements based on the structure and type of the industrial control protocol encapsulation header. This makes it possible to efficiently identify, from a large number of bus protocol streams, the data streams that meet the preset protocol header format requirements, and to accurately determine the protocol type of each data stream. In addition, determining the protocol format of a data stream from its protocol type, and dividing the data stream into fields and parsing their semantics according to that format, achieves accurate parsing of the semantics of each field within the same data stream. Finally, by obtaining the industrial control data of each display area and comparing its values with the industrial control data fields in the data stream, the present invention, compared with the prior art, achieves accurate parsing of fields whose meaning and position in the data stream are not explicit. The present invention thus achieves its purpose of improving the accuracy and efficiency of parsing industrial control protocols.

Optionally, using the preset multi-pattern matching algorithm to identify each data stream from the bus protocol stream according to the preset protocol header format requirements, determining the protocol type of a data stream that meets the preset protocol header format requirements as the industrial Ethernet protocol type, and determining the protocol type of a data stream that does not meet the preset protocol header format requirements as the fieldbus protocol type, includes:

For each data stream in the bus protocol stream:

Using the preset multi-pattern matching algorithm, judge whether the data stream contains protocol header data; if so, determine the industrial Ethernet protocol type that, in the preset protocol header format requirements, matches the protocol header data of the data stream as the protocol type of the data stream.

If the data stream does not contain protocol header data, determine the protocol type of the data stream as the fieldbus protocol type.

It should be noted that, in actual application scenarios, there are multiple ways of using the preset multi-pattern matching algorithm to judge whether a data stream contains protocol header data and to determine its protocol type. One is provided here as an example:

For convenience of description, this example uses the multi-pattern matching AC (Aho-Corasick) algorithm as the preset multi-pattern matching algorithm; its data structure is shown in FIG. 2.

Suppose the data streams currently to be identified are Q1, Q2, Q3, Q4 and Q5. The string representation of data stream Q1 is: 00 18 20. The string representation of data stream Q2 is: 00 20 30. The string representation of data stream Q3 is: 04 20 30. The string representation of data stream Q4 is: 00 20 26. The string representation of data stream Q5 is: 13 27 46.

Referring to FIG. 2, the process of using the multi-pattern matching AC algorithm to identify the headers of Q1, Q2, Q3, Q4 and Q5 is as follows:

The first character of Q1, Q2 and Q4 is "00" in each case, and the first character of Q3 is "04". The first characters of Q1, Q2, Q3 and Q4 all have a match in the multi-pattern matching AC algorithm, so the second characters of Q1, Q2, Q3 and Q4 can be examined. The first type code of Q1, Q2 and Q4 is determined to be "1", and the first type code of Q3 is determined to be "6".

It should be noted that data such as "00-1", "20-2" and "30-3" shown in FIG. 2 denote the specific values of the characters at different positions in the data streams. For example, "00-1" denotes that the value of the first character is 00, and "30-3" denotes that the value of the third character is 30. This is not described further here.

It should also be noted that the number inside each circle in FIG. 2 is the type code corresponding to that character position. The type code can be used to determine the specific industrial Ethernet protocol type of a data stream. The Root inside the circle in the figure is a virtual root node, used to determine the protocol type of each data stream.

The first character of data stream Q5 has no match in the multi-pattern matching AC algorithm. The algorithm therefore judges that data stream Q5 does not contain protocol header data, and the protocol type of data stream Q5 is determined as the fieldbus protocol type.

It should be noted that there are multiple specific fieldbus protocol types. The multi-pattern matching AC algorithm can therefore only determine that the protocol type of data stream Q5 is the fieldbus protocol type; it cannot determine which specific fieldbus protocol data stream Q5 belongs to.

When the second characters of Q1, Q2, Q3 and Q4 are examined, the second character of Q1 is "18", and the second character of Q2, Q3 and Q4 is "20" in each case. However, the first character of Q2 and Q4 differs from the first character of Q3. Therefore, the second type code of Q1 is determined to be "2", the second type code of Q2 and Q4 is determined to be "4", and the second type code of Q3 is determined to be "7".

When the third characters of Q1, Q2, Q3 and Q4 are examined, the third character of Q1 is "20", the third character of both Q2 and Q3 is "30", and the third character of Q4 is "26". Therefore, the third type code of Q1 is determined to be "3", that of Q2 to be "5", that of Q3 to be "8", and that of Q4 to be "9".

Because the characters at every position of Q1, Q2, Q3 and Q4 have a match in the multi-pattern matching AC algorithm, the protocol type of Q1, Q2, Q3 and Q4 is determined to be the industrial Ethernet protocol type. The composite type code of each data stream can then be determined from its first to third type codes: the composite type code of Q1 is "123", that of Q2 is "145", that of Q3 is "678", and that of Q4 is "149". The specific industrial Ethernet protocol type of each data stream is determined from its composite type code.
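The header identification walk-through for Q1 to Q5 can be sketched as follows. This is a minimal illustration only, not the patented implementation: it uses a plain prefix trie rather than a full Aho-Corasick automaton (failure links are omitted because the header is assumed to start at the first byte of the stream), and the names `HeaderTrie`, `add`, `classify` and `protocol_family` are hypothetical. Node ids are assigned in insertion order, which for the four example headers happens to reproduce the type codes described for FIG. 2.

```python
from typing import Dict, List, Optional


class HeaderTrie:
    """Prefix trie whose node ids play the role of the per-position type codes."""

    def __init__(self) -> None:
        # children[node_id] maps a byte value to a child node id; node 0 is Root.
        self.children: List[Dict[int, int]] = [{}]

    def add(self, header: bytes) -> None:
        """Register one known industrial Ethernet header pattern."""
        node = 0
        for byte in header:
            if byte not in self.children[node]:
                self.children.append({})
                self.children[node][byte] = len(self.children) - 1
            node = self.children[node][byte]

    def classify(self, stream: bytes, header_len: int = 3) -> Optional[str]:
        """Return the composite type code, or None if the header does not match."""
        node: Optional[int] = 0
        codes: List[str] = []
        for byte in stream[:header_len]:
            node = self.children[node].get(byte)
            if node is None:
                return None  # no matching header: treated as fieldbus
            codes.append(str(node))
        return "".join(codes) if len(codes) == header_len else None


# Headers taken from the example streams Q1 to Q4 (assumed to be hex byte values).
trie = HeaderTrie()
for pattern in ("00 18 20", "00 20 30", "04 20 30", "00 20 26"):
    trie.add(bytes.fromhex(pattern))


def protocol_family(stream: bytes) -> str:
    """Industrial Ethernet if a header matches, otherwise fieldbus."""
    return "industrial_ethernet" if trie.classify(stream) else "fieldbus"
```

With this setup, `trie.classify(bytes.fromhex("00 20 26"))` yields the composite code for Q4, while a stream such as Q5 returns `None` and falls back to the fieldbus family. Concatenating node ids into a string is only unambiguous while every id is a single digit, as in this example.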

Optionally, when the protocol type of a data stream is the industrial Ethernet protocol type, using the protocol format corresponding to the protocol type to divide each data stream into fields, and performing semantic parsing on each field of each data stream according to the protocol format to obtain a semantic parsing result for each field, includes:

For each data stream whose protocol type is the industrial Ethernet protocol type:

Using a preset string segmentation algorithm, calculate the information entropy of each byte in the data stream and the mutual information between adjacent bytes, and determine the split points of the data stream according to the information entropy and the mutual information.

Divide the data stream into multiple fields according to the split points.

According to the Ethernet protocol stream identifier in the protocol type of the data stream, obtain the protocol format that matches the Ethernet protocol stream identifier.

For each field: using a preset reverse parsing algorithm, determine the first semantic parsing result of the field according to the protocol format that matches the Ethernet protocol stream identifier, where the first semantic parsing result includes the semantics and data type of the field.

Optionally, in an optional embodiment of the present invention, the field division and semantic parsing of a data stream of the industrial Ethernet protocol type may be implemented as follows:

Referring to FIG. 3, d_1 to d_n are the byte positions of the data stream. Using a preset information entropy algorithm, the information entropy H(d_j) of the byte at each byte position is calculated. Using a preset mutual information calculation algorithm, the mutual information MIS(d_{j-1}, d_j) between adjacent bytes is calculated from the information entropy of the two adjacent bytes. It is then judged whether the mutual information between adjacent bytes is less than a preset segmentation threshold; if not, a split point g needs to be set between those two adjacent bytes. For example, suppose that MIS(d_{j-1}, d_j) in FIG. 3 is not less than the preset segmentation threshold, and that the mutual information of all other adjacent byte pairs, such as MIS(d_1, d_2) and MIS(d_{n-1}, d_n), is less than the preset segmentation threshold. Then a label whose content is split point g is added at d_{j-1} and d_j, and a label whose content is continuation l is added at the other nodes.

Optionally, the preset string segmentation algorithm may be a combined algorithm composed of the preset information entropy algorithm and the preset mutual information calculation algorithm.

After the data stream is split according to these labels, two fields are obtained: f_1 and f_2.

Since the specific industrial Ethernet protocol type of the data stream has already been determined by the above steps, the protocol format of that protocol type is retrieved according to the Ethernet protocol stream identifier. This protocol format should likewise consist of two fields. Using the preset reverse parsing algorithm, semantic parsing is then performed on fields f_1 and f_2 according to the semantics and data type of each field in the protocol format. For example, suppose that in the protocol format the semantics of the first field is device identifier with data type text data, and the semantics of the second field is device parameter with data type dynamic data. Then the semantics and data type of the first field are determined as the first semantic parsing result of field f_1, and the semantics and data type of the second field are determined as the first semantic parsing result of field f_2.

Optionally, in an optional embodiment of the present invention, the Ethernet protocol stream identifier may be a label used to determine the specific protocol type.

Those skilled in the art will understand that the preset information entropy algorithm and the preset mutual information calculation algorithm can be formulated according to the concepts and formulas for information entropy and mutual information in information theory. The present invention does not further limit or describe the specific construction of these two algorithms.
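The entropy and mutual information segmentation described above can be sketched as follows. This is a rough illustration under stated assumptions, not the preset algorithms themselves: the per-position distributions are estimated from several sample messages of the same protocol (the text does not say how the distribution for a byte position is obtained), mutual information is computed with the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y), and a split point is placed wherever the value is not less than the threshold, following the rule as stated in the text. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(symbols: Sequence) -> float:
    """Shannon entropy in bits of the empirical distribution of `symbols`."""
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(symbols).values())


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned byte columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def split_points(samples: List[bytes], threshold: float) -> List[int]:
    """Indices j where a split point g is placed between byte j-1 and byte j."""
    length = min(len(s) for s in samples)
    # columns[j] holds the byte observed at position j in every sample message.
    columns = [[s[j] for s in samples] for j in range(length)]
    return [j for j in range(1, length)
            if mutual_information(columns[j - 1], columns[j]) >= threshold]


def split_fields(stream: bytes, points: List[int]) -> List[bytes]:
    """Cut one data stream into fields at the given split points."""
    bounds = [0, *points, len(stream)]
    return [stream[a:b] for a, b in zip(bounds, bounds[1:])]


# Made-up sample messages: byte 0 is constant, bytes 1 and 2 vary together.
samples = [bytes([0x01, 0x10, 0xA0]), bytes([0x01, 0x11, 0xA1]),
           bytes([0x01, 0x12, 0xA2]), bytes([0x01, 0x13, 0xA3])]
points = split_points(samples, threshold=1.0)
fields = split_fields(samples[0], points)  # two fields, like f_1 and f_2
```

The threshold of 1.0 bit and the sample messages are made up for the illustration; in practice the threshold would be the preset segmentation threshold of the embodiment.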

Optionally, when the protocol type of a data stream is the fieldbus protocol type, the method further includes:

For each data stream whose protocol type is the fieldbus protocol type:

Determine the bus protocol stream identifier of the data stream according to its byte length, and obtain the protocol format that matches the bus protocol stream identifier.

Using a preset byte semantic inference algorithm, divide the data stream into a control command field, a protocol data field and a terminator field according to the protocol format that matches the bus protocol stream identifier.

Using the preset byte semantic inference algorithm, determine the subfields of the protocol data field according to the protocol format that matches the bus protocol stream identifier, and perform semantic parsing on the control command field, each subfield and the terminator field to obtain a second semantic parsing result for each field and each subfield, where the second semantic parsing result includes the semantics and data type of the field or subfield.

Optionally, in another optional embodiment of the present invention, the structure of a data stream of the fieldbus protocol type is generally "start symbol + control command field + protocol data field + terminator field", and the length of each field is relatively fixed. The specific fieldbus protocol type of the data stream can therefore be determined from its byte length.

Those skilled in the art will understand that the preset byte semantic inference algorithm can be built on existing Java code that extracts strings based on a set byte length. The present invention does not further limit or describe the construction of the preset byte semantic inference algorithm.
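The length-based handling of fieldbus frames described above can be sketched as follows. This is only an illustration of fixed-length slicing, written in Python rather than the Java mentioned in the text; the format table, the stream identifiers `bus-A` and `bus-B`, the field lengths and the sample frame are all hypothetical and do not correspond to any real fieldbus protocol.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FrameFormat:
    stream_id: str                        # hypothetical bus protocol stream identifier
    layout: Tuple[Tuple[str, int], ...]   # (field name, length in bytes), in frame order


# Hypothetical protocol formats, keyed by total frame length in bytes.
FORMATS_BY_LENGTH: Dict[int, FrameFormat] = {
    8: FrameFormat("bus-A", (("start", 1), ("control_command", 1),
                             ("protocol_data", 5), ("terminator", 1))),
    12: FrameFormat("bus-B", (("start", 1), ("control_command", 2),
                              ("protocol_data", 8), ("terminator", 1))),
}


def parse_fieldbus_frame(frame: bytes) -> Tuple[str, Dict[str, bytes]]:
    """Pick a format by byte length, then slice the frame into its fields."""
    fmt = FORMATS_BY_LENGTH.get(len(frame))
    if fmt is None:
        raise ValueError(f"no known fieldbus format for length {len(frame)}")
    fields: Dict[str, bytes] = {}
    offset = 0
    for name, size in fmt.layout:
        fields[name] = frame[offset:offset + size]
        offset += size
    return fmt.stream_id, fields


# Made-up 8-byte frame: start, control command, five data bytes, terminator.
stream_id, fields = parse_fieldbus_frame(bytes.fromhex("68 01 0a 0b 0c 0d 0e 16"))
```

Subfields of the protocol data field could be sliced in the same way with a nested layout; that step is left out here to keep the sketch short.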

Optionally, obtaining the industrial human-machine interface, recognizing the industrial human-machine interface to obtain the industrial control data of each display area in it, and determining, based on the industrial control data and the industrial control data fields, the positions and meanings of the area identification field and the variable data field in each data stream, includes:

Using a preset image recognition algorithm, obtain the industrial control data of each display area in the industrial human-machine interface.

For the industrial control data of each display area:

Obtain a target data stream according to the industrial control data of the display area, where the target data stream is a data stream containing a field that matches the data encoding of the industrial control data of the display area.

Using a preset sequence alignment algorithm, align the constant data sequence in the industrial control data of the display area with the industrial control data fields of the target data stream, determine the industrial control data field that matches as the area identification field, and determine the meaning of the constant data sequence, as recognized by the preset image recognition algorithm, as the meaning of the area identification field.

Using the preset sequence alignment algorithm, align the non-constant data sequence in the industrial control data of the display area with the industrial control data fields of the target data stream, determine the industrial control data field that matches as the variable data field, and determine the meaning of the non-constant data sequence, as recognized by the preset image recognition algorithm, as the meaning of the variable data field.

It should be noted that, in practical applications, there are multiple ways of determining the positions and meanings of the area identification field and the variable data field in each data stream based on the industrial control data and the industrial control data fields. One is provided here as an example:

FIG. 4 shows an industrial human-machine interface 401 for monitoring the internal operating state of a boiler. Using a preset image recognition algorithm, image recognition is performed on the industrial human-machine interface 401 to obtain multiple display areas and their industrial control data. Area 402 is the boiler body, and area 403 is the pressure valve.

For convenience of description, suppose that target data stream A of area 402 and target data stream B of area 403 have already been obtained. Target data stream A contains two fields: the meaning of field Jia (甲) is device type, and the meaning of field Yi (乙) is device number. Target data stream B contains four fields: the meaning of field Bing (丙) is instrument category, the meaning of field Ding (丁) is value 1, the meaning of field Wu (戊) is value 2, and the meaning of field Ji (己) is value 3.

For area 402, the constant data sequence consists of the device type "boiler" and the device number "No. 1 boiler", and its field sequence corresponds to the two fields of target data stream A respectively. According to this constant data sequence, the meaning of field Jia of target data stream A is determined as boiler equipment, and the meaning of field Yi is determined as boiler number.

For area 403, the industrial control data obtained by image recognition includes the instrument category "pressure valve" and the furnace pressure value "11 MPa". The pressure valve is constant data, whereas the furnace pressure value is fluctuating, non-constant data. After the above alignment, the meaning of field Bing in target data stream B is therefore determined as pressure valve, and the meaning of fields Ding to Ji in target data stream B is determined as furnace pressure. Fields Ding to Ji are located after field Bing.
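The comparison between HMI readings and stream fields in the boiler example can be sketched as follows. This is a simplified illustration, not the preset sequence alignment algorithm: it assumes that the image recognition output and the stream field values have already been decoded into comparable strings and are aligned in time (one HMI reading per captured message), and it uses exact equality of the value series instead of a real alignment. The function name `label_fields` and the labels `area_id` and `variable_data` are hypothetical.

```python
from typing import Dict, List, Tuple


def label_fields(hmi_readings: List[Dict[str, str]],
                 stream_fields: List[List[str]]) -> Dict[int, Tuple[str, str]]:
    """Map field index -> (field kind, meaning recognized on the HMI).

    hmi_readings[t] holds the recognized values of one display area at time t,
    keyed by their recognized meaning; stream_fields[t] holds the decoded field
    values of the target data stream captured at the same time t.
    """
    labels: Dict[int, Tuple[str, str]] = {}
    for meaning in hmi_readings[0]:
        series = [reading[meaning] for reading in hmi_readings]
        # A value that never changes is constant data; otherwise it is variable.
        kind = "area_id" if len(set(series)) == 1 else "variable_data"
        for idx in range(len(stream_fields[0])):
            if [obs[idx] for obs in stream_fields] == series:
                labels[idx] = (kind, meaning)
    return labels


# Made-up observations for area 403: a fixed instrument name and a changing pressure.
hmi = [{"instrument": "pressure valve", "furnace pressure": "11"},
       {"instrument": "pressure valve", "furnace pressure": "12"},
       {"instrument": "pressure valve", "furnace pressure": "13"}]
stream_b = [["pressure valve", "11"],
            ["pressure valve", "12"],
            ["pressure valve", "13"]]
labels = label_fields(hmi, stream_b)
```

In this toy input the first field is labeled as the area identification field and the second as a variable data field; a real implementation would need a tolerance for recognition errors and for timing skew between the screen and the captured traffic.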

Corresponding to the above method embodiments, the present invention further provides a semantic parsing system for an industrial control protocol. As shown in FIG. 5, the semantic parsing system includes:

A protocol type determination unit 501, configured to use a preset multi-pattern matching algorithm to identify each data stream from the bus protocol stream according to preset protocol header format requirements, determine the protocol type of a data stream that meets the preset protocol header format requirements as the industrial Ethernet protocol type, and determine the protocol type of a data stream that does not meet the preset protocol header format requirements as the fieldbus protocol type.

A field semantic parsing unit 502, configured to use the protocol format corresponding to the protocol type to divide each data stream into fields, and to perform semantic parsing on each field of each data stream according to the protocol format to obtain a semantic parsing result for each field.

A key field determination unit 503, configured to obtain an industrial human-machine interface, recognize the industrial human-machine interface to obtain the industrial control data of each display area in it, and determine, based on the industrial control data and the industrial control data fields, the positions and meanings of the area identification field and the variable data field in each data stream, where an industrial control data field is a field whose data type in the semantic parsing result is industrial control data.

Optionally, the protocol type determination unit 501 is configured to:

For each data stream in the bus protocol stream:

Using the preset multi-pattern matching algorithm, judge whether the data stream contains protocol header data; if so, determine the industrial Ethernet protocol type that, in the preset protocol header format requirements, matches the protocol header data of the data stream as the protocol type of the data stream.

If the data stream does not contain protocol header data, determine the protocol type of the data stream as the fieldbus protocol type.

Optionally, when the protocol type of a data stream is the industrial Ethernet protocol type, the field semantic parsing unit 502 is configured to:

For each data stream whose protocol type is the industrial Ethernet protocol type:

Using a preset string segmentation algorithm, calculate the information entropy of each byte in the data stream and the mutual information between adjacent bytes, and determine the split points of the data stream according to the information entropy and the mutual information.

Divide the data stream into multiple fields according to the split points.

According to the Ethernet protocol stream identifier in the protocol type of the data stream, obtain the protocol format that matches the Ethernet protocol stream identifier.

For each field: using a preset reverse parsing algorithm, determine the first semantic parsing result of the field according to the protocol format that matches the Ethernet protocol stream identifier, where the first semantic parsing result includes the semantics and data type of the field.

Optionally, when the protocol type of a data stream is the fieldbus protocol type, the field semantic parsing unit 502 is further configured to:

For each data stream whose protocol type is the fieldbus protocol type:

Determine the bus protocol stream identifier of the data stream according to its byte length, and obtain the protocol format that matches the bus protocol stream identifier.

Using a preset byte semantic inference algorithm, divide the data stream into a control command field, a protocol data field and a terminator field according to the protocol format that matches the bus protocol stream identifier.

Using the preset byte semantic inference algorithm, determine the subfields of the protocol data field according to the protocol format that matches the bus protocol stream identifier, and perform semantic parsing on the control command field, each subfield and the terminator field to obtain a second semantic parsing result for each field and each subfield, where the second semantic parsing result includes the semantics and data type of the field or subfield.

Optionally, the key field determination unit 503 is configured to:

Using a preset image recognition algorithm, obtain the industrial control data of each display area in the industrial human-machine interface.

For the industrial control data of each display area:

Acquire a target data stream according to the industrial control data of the display area, where the target data stream is a data stream containing a field that matches the data encoding of the industrial control data of the display area.

Using a preset sequence alignment algorithm, align the constant data sequence in the industrial control data of the display area against the industrial control data fields of the target data stream, determine the field that matches in the alignment to be the area identification field, and take the meaning of the constant data sequence recognized by the preset image recognition algorithm as the meaning of the area identification field.

Using the preset sequence alignment algorithm, align the non-constant data sequence in the industrial control data of the display area against the industrial control data fields of the target data stream, determine the field that matches in the alignment to be the variable data field, and take the meaning of the non-constant data sequence recognized by the preset image recognition algorithm as the meaning of the variable data field.
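The alignment of interface data against an industrial control data field can be illustrated with a minimal sketch. It assumes the image recognition step has already produced, for one display area, a constant sequence and a non-constant sequence together with their recognized meanings, and that both have been encoded into bytes. Python's `difflib.SequenceMatcher` stands in for the unspecified preset sequence alignment algorithm; the function names `locate` and `label_fields` and the sample values are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]


def locate(sequence: bytes, field: bytes) -> Optional[Span]:
    """Return the (start, end) byte offsets of `sequence` inside `field`, or None."""
    if not sequence:
        return None
    matcher = SequenceMatcher(None, sequence, field, autojunk=False)
    match = matcher.find_longest_match(0, len(sequence), 0, len(field))
    # Accept only a complete match of the recognized sequence.
    if match.size == len(sequence):
        return match.b, match.b + match.size
    return None


def label_fields(
    constant: Tuple[str, bytes],
    variable: Tuple[str, bytes],
    field: bytes,
) -> Dict[Span, Tuple[str, str]]:
    """Map byte spans of `field` to (field kind, meaning from the interface)."""
    labels: Dict[Span, Tuple[str, str]] = {}
    const_meaning, const_bytes = constant
    var_meaning, var_bytes = variable

    span = locate(const_bytes, field)
    if span is not None:
        labels[span] = ("area identification field", const_meaning)

    span = locate(var_bytes, field)
    if span is not None:
        labels[span] = ("variable data field", var_meaning)
    return labels


# Example: the interface shows the label "TK01" and the reading 300, and the
# reading is assumed to be encoded as a 16-bit big-endian integer on the wire.
field = b"TK01" + (300).to_bytes(2, "big") + b"\x00"
result = label_fields(("tank id", b"TK01"), ("liquid level", (300).to_bytes(2, "big")), field)
```

Requiring an exact, full-length match keeps the sketch simple; a tolerant alignment that accepts partial matches would need a scoring threshold, which the text does not specify.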

An embodiment of the present invention further provides a semantic parsing device for an industrial control protocol. As shown in FIG. 6, the semantic parsing device includes:

a processor 601;

a memory 602 for storing instructions executable by the processor 601.

The processor 601 is configured to execute the instructions so as to implement any of the semantic parsing methods for an industrial control protocol shown in FIG. 1.

An embodiment of the present invention further provides a computer-readable storage medium. When the instructions in the computer-readable storage medium are executed by the processor of the semantic parsing device for an industrial control protocol, the semantic parsing device is enabled to perform any of the semantic parsing methods for an industrial control protocol shown in FIG. 1.

The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip. Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. It should also be noted that the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The embodiments in this specification are described in a related manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding description of the method embodiments.

The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.

Claims (10)

1. A semantic parsing method for an industrial control protocol, characterized in that the semantic parsing method comprises:

using a preset multi-pattern matching algorithm, identifying each data stream from a bus protocol stream according to preset protocol header format requirements, determining the protocol type of a data stream that satisfies the preset protocol header format requirements to be an industrial Ethernet protocol type, and determining the protocol type of a data stream that does not satisfy the preset protocol header format requirements to be a fieldbus protocol type;

dividing each data stream into fields using a protocol format corresponding to the protocol type, and performing semantic parsing on each field of each data stream according to the protocol format to obtain a semantic parsing result for each field;

obtaining an industrial human-machine interface, recognizing the industrial human-machine interface to obtain industrial control data of each display area in the industrial human-machine interface, and determining, based on the industrial control data and industrial control data fields, the positions and meanings of an area identification field and a variable data field in each data stream, wherein an industrial control data field is a field whose data type in the semantic parsing result is industrial control data.

2. The method according to claim 1, characterized in that the step of using a preset multi-pattern matching algorithm to identify each data stream from a bus protocol stream according to preset protocol header format requirements, determining the protocol type of a data stream that satisfies the preset protocol header format requirements to be an industrial Ethernet protocol type, and determining the protocol type of a data stream that does not satisfy the preset protocol header format requirements to be a fieldbus protocol type comprises:

for each data stream in the bus protocol stream:

using the preset multi-pattern matching algorithm to judge whether the data stream contains protocol header data, and if so, determining the industrial Ethernet protocol type that, in the preset protocol header format requirements, matches the protocol header data of the data stream to be the protocol type of the data stream;

where the data stream does not contain the protocol header data, determining the protocol type of the data stream to be the fieldbus protocol type.

3. The method according to claim 2, characterized in that, where the protocol type of the data stream is the industrial Ethernet protocol type, the step of dividing each data stream into fields using a protocol format corresponding to the protocol type, and performing semantic parsing on each field of each data stream according to the protocol format to obtain a semantic parsing result for each field comprises:

for each data stream whose protocol type is the industrial Ethernet protocol type:

using a preset string segmentation algorithm, calculating the information entropy of each byte in the data stream and the mutual information between adjacent bytes, and determining the split points of the data stream according to the information entropy and the mutual information;

dividing the data stream into a plurality of fields according to the split points;

obtaining, according to the Ethernet protocol flow identifier in the protocol type of the data stream, the protocol format matching the Ethernet protocol flow identifier;

for each field: using a preset reverse parsing algorithm, determining a first semantic parsing result of the field according to the protocol format matching the Ethernet protocol flow identifier, the first semantic parsing result including the semantics and data type of the field.

4. The method according to claim 3, characterized in that, where the protocol type of the data stream is the fieldbus protocol type, the method further comprises:

for each data stream whose protocol type is the fieldbus protocol type:

determining the bus protocol flow identifier of the data stream according to the byte length of the data stream, and obtaining the protocol format matching the bus protocol flow identifier;

using a preset byte semantic inference algorithm, dividing the data stream into a control command field, a protocol data field, and a terminator field according to the protocol format matching the bus protocol flow identifier;

using the preset byte semantic inference algorithm, determining each subfield of the protocol data field according to the protocol format matching the bus protocol flow identifier, and performing semantic parsing on the control command field, each subfield, and the terminator field to obtain a second semantic parsing result for each field and each subfield, the second semantic parsing result including the semantics and data type of each field or subfield.

5. The method according to claim 1, characterized in that the step of obtaining an industrial human-machine interface, recognizing the industrial human-machine interface to obtain industrial control data of each display area in the industrial human-machine interface, and determining, based on the industrial control data and industrial control data fields, the positions and meanings of an area identification field and a variable data field in each data stream comprises:

using a preset image recognition algorithm to obtain the industrial control data of each display area in the industrial human-machine interface;

for the industrial control data of each display area:

acquiring a target data stream according to the industrial control data of the display area, wherein the target data stream is a data stream containing a field that matches the data encoding of the industrial control data of the display area;

using a preset sequence alignment algorithm, aligning the constant data sequence in the industrial control data of the display area against the industrial control data fields of the target data stream, determining the field that matches in the alignment to be the area identification field, and taking the meaning of the constant data sequence recognized by the preset image recognition algorithm as the meaning of the area identification field;

using the preset sequence alignment algorithm, aligning the non-constant data sequence in the industrial control data of the display area against the industrial control data fields of the target data stream, determining the field that matches in the alignment to be the variable data field, and taking the meaning of the non-constant data sequence recognized by the preset image recognition algorithm as the meaning of the variable data field.

6. A semantic parsing system for an industrial control protocol, characterized in that the semantic parsing system comprises:

a protocol type determination unit, configured to use a preset multi-pattern matching algorithm to identify each data stream from a bus protocol stream according to preset protocol header format requirements, determine the protocol type of a data stream that satisfies the preset protocol header format requirements to be an industrial Ethernet protocol type, and determine the protocol type of a data stream that does not satisfy the preset protocol header format requirements to be a fieldbus protocol type;

a field semantic parsing unit, configured to divide each data stream into fields using a protocol format corresponding to the protocol type, and perform semantic parsing on each field of each data stream according to the protocol format to obtain a semantic parsing result for each field;

a key field determination unit, configured to obtain an industrial human-machine interface, recognize the industrial human-machine interface to obtain industrial control data of each display area in the industrial human-machine interface, and determine, based on the industrial control data and industrial control data fields, the positions and meanings of an area identification field and a variable data field in each data stream, wherein an industrial control data field is a field whose data type in the semantic parsing result is industrial control data.

7. The semantic parsing system according to claim 6, characterized in that the protocol type determination unit is configured to:

for each data stream in the bus protocol stream:

judge whether the data stream contains protocol header data, and if so, determine the industrial Ethernet protocol type that, in the preset protocol header format requirements, matches the protocol header data of the data stream to be the protocol type of the data stream;

where the data stream does not contain the protocol header data, determine the protocol type of the data stream to be the fieldbus protocol type.

8. The semantic parsing system according to claim 6, characterized in that the key field determination unit is configured to:

use a preset image recognition algorithm to obtain the industrial control data of each display area in the industrial human-machine interface;

for the industrial control data of each display area:

acquire a target data stream according to the industrial control data of the display area, wherein the target data stream is a data stream containing a field that matches the data encoding of the industrial control data of the display area;

using a preset sequence alignment algorithm, align the constant data sequence in the industrial control data of the display area against the industrial control data fields of the target data stream, determine the field that matches in the alignment to be the area identification field, and take the meaning of the constant data sequence recognized by the preset image recognition algorithm as the meaning of the area identification field;

using the preset sequence alignment algorithm, align the non-constant data sequence in the industrial control data of the display area against the industrial control data fields of the target data stream, determine the field that matches in the alignment to be the variable data field, and take the meaning of the non-constant data sequence recognized by the preset image recognition algorithm as the meaning of the variable data field.

9. A semantic parsing device for an industrial control protocol, characterized in that the semantic parsing device comprises:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions so as to implement the semantic parsing method for an industrial control protocol according to any one of claims 1 to 5.

10. A computer-readable storage medium, characterized in that, when the instructions in the computer-readable storage medium are executed by a processor of a semantic parsing device for an industrial control protocol, the semantic parsing device is enabled to perform the semantic parsing method for an industrial control protocol according to any one of claims 1 to 5.
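As an illustration of the segmentation step recited in claim 3 (information entropy of each byte plus mutual information between adjacent bytes), a minimal sketch follows. It is not the patented algorithm: it assumes that statistics are taken per byte position over several captured messages of the same flow, and the split rule (cut where adjacent positions share little mutual information, unless both are constant) and the threshold value are hypothetical choices.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(values: Sequence) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    total = len(values)
    counts = Counter(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned byte columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def split_points(messages: List[bytes], mi_threshold: float = 0.5) -> List[int]:
    """Return byte offsets at which a new field is assumed to start."""
    length = min(len(m) for m in messages)
    # columns[i] holds the byte observed at position i in every message.
    columns = [[m[i] for m in messages] for i in range(length)]
    points = []
    for i in range(length - 1):
        left, right = columns[i], columns[i + 1]
        both_constant = entropy(left) < 1e-9 and entropy(right) < 1e-9
        # Adjacent constant bytes are kept together (for example a magic number).
        if both_constant:
            continue
        if mutual_information(left, right) < mi_threshold:
            points.append(i + 1)
    return points


def split_fields(message: bytes, points: List[int]) -> List[bytes]:
    """Cut one message at the given split points."""
    bounds = [0, *points, len(message)]
    return [message[a:b] for a, b in zip(bounds, bounds[1:])]


# Four sample messages: a constant 2-byte header, two bytes that vary
# together, and a final byte that varies independently of them.
samples = [
    b"\xaa\x55\x00\x00\x05",
    b"\xaa\x55\x00\x00\x06",
    b"\xaa\x55\x01\x01\x05",
    b"\xaa\x55\x01\x01\x06",
]
points = split_points(samples)
fields = split_fields(samples[0], points)
```

With so few samples the estimates are coarse; in practice many more messages would be needed before the entropy and mutual information values become reliable.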
CN202210723745.5A 2022-06-24 2022-06-24 A semantic parsing method, system, device and storage medium for industrial control protocol Active CN115134433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210723745.5A CN115134433B (en) 2022-06-24 2022-06-24 A semantic parsing method, system, device and storage medium for industrial control protocol

Publications (2)

Publication Number Publication Date
CN115134433A (en) 2022-09-30
CN115134433B (en) 2024-03-29

Family

ID=83379282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210723745.5A Active CN115134433B (en) 2022-06-24 2022-06-24 A semantic parsing method, system, device and storage medium for industrial control protocol

Country Status (1)

Country Link
CN (1) CN115134433B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116112409A (en) * 2023-02-17 2023-05-12 上海致景信息科技有限公司 Industrial equipment protocol automatic analysis method, system, medium and computer
CN119105384A (en) * 2024-11-05 2024-12-10 浙江国利网安科技有限公司 Automatic industrial control protocol reverse engineering system and method based on global voting expert algorithm

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101035111A (en) * 2007-04-13 2007-09-12 北京启明星辰信息技术有限公司 Intelligent protocol parsing method and device
CN103188267A (en) * 2013-03-27 2013-07-03 中国科学院声学研究所 Protocol analyzing method based on DFA (Deterministic Finite Automaton)
CN103595729A (en) * 2013-11-25 2014-02-19 北京锐安科技有限公司 Protocol analysis method and device
CN109547409A (en) * 2018-10-19 2019-03-29 中国电力科学研究院有限公司 A kind of method and system for being parsed to industrial network transport protocol
WO2020143226A1 (en) * 2019-01-07 2020-07-16 浙江大学 Industrial control system intrusion detection method based on integrated learning
CN111585832A (en) * 2020-04-01 2020-08-25 浙江树人学院(浙江树人大学) A Reverse Analysis Method of Industrial Control Protocol Based on Semantic Pre-mining
CN111723579A (en) * 2020-06-17 2020-09-29 国家计算机网络与信息安全管理中心 Industrial control protocol field and semantic reverse inference method
CN114553983A (en) * 2022-03-03 2022-05-27 沈阳化工大学 An efficient industrial control protocol analysis method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程必成; 刘仁辉; 赵云飞; 许凤凯: "非标工业控制协议格式逆向方法研究" [Research on format reverse-engineering methods for non-standard industrial control protocols], 电子技术应用, no. 04, 6 April 2018 (2018-04-06) *

Also Published As

Publication number Publication date
CN115134433B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
US10936645B2 (en) Method and apparatus for generating to-be-played multimedia content
CN112800427B (en) Webshell detection method and device, electronic equipment and storage medium
CN115134433B (en) A semantic parsing method, system, device and storage medium for industrial control protocol
WO2020233360A1 (en) Method and device for generating product evaluation model
CN109460220A (en) The predefined code generating method of message, device, electronic equipment and storage medium
CN111104214B (en) Workflow application method and device
CN112085087A (en) Method and device for generating business rules, computer equipment and storage medium
CN116033048B (en) Multi-protocol analysis method of Internet of things, electronic equipment and storage medium
CN118916499A (en) Query method integrating AI large model and knowledge graph
CN103200203B (en) Based on the semantic class protocol format estimating method performing track
CN117787216A (en) Training method and device of format conversion model, electronic equipment and storage medium
US12222977B2 (en) Method of processing multimedia data, device and medium
CN116192527A (en) Attack traffic detection rule generation method, device, equipment and storage medium
JP7509886B2 (en) Method and apparatus for pushing subscription data in the internet of things, and devices and storage media thereof
CN115334179A (en) Unknown protocol reverse analysis method based on named entity recognition
EP1710718B1 (en) Systems and methods for performing streaming checks on data format for UDTs
CN114201756A (en) Vulnerability detection method and related device for intelligent contract code segment
CN112883088B (en) Data processing method, device, equipment and storage medium
CN119071204A (en) A method, device, electronic device and medium for analyzing multiple power communication protocols
CN117375958A (en) Web application system identification method and device and readable storage medium
CN114443476B (en) Code review method and device
CN117150001A (en) A log parsing method, device, equipment and storage medium
CN115412274A (en) Attack tracing method and related data processing and association display method and device
CN114492324A (en) Component data statistics method and device
CN115033688A (en) Method, device, equipment and storage medium for identifying alarm event type

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant