CN111723181A - Industrial control protocol reverse analysis method based on active learning - Google Patents

Industrial control protocol reverse analysis method based on active learning

Info

Publication number
CN111723181A
Authority
CN
China
Prior art keywords
message
protocol
industrial control
active learning
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010553659.5A
Other languages
Chinese (zh)
Inventor
张晓明
何跃鹰
孙中豪
张嘉玮
曹可建
王占丰
马玮骏
毛传奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Lexbell Information Technology Co ltd
National Computer Network and Information Security Management Center
Original Assignee
Nanjing Lexbell Information Technology Co ltd
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-06-17
Filing date
2020-06-17
Publication date
2020-09-29
Application filed by Nanjing Lexbell Information Technology Co ltd, National Computer Network and Information Security Management Center
Priority to CN202010553659.5A
Publication of CN111723181A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Communication Control (AREA)

Abstract

The invention discloses an industrial control protocol reverse analysis method based on active learning, comprising the steps of importing, preliminary analysis, mutation, matching, and merging. A sample of pcap messages of the industrial control protocol is first analyzed to obtain part of the protocol's message formats and state machine. The result is then used to carry out interactive active learning with an industrial personal computer, so that new messages are continuously obtained and the lexicon and grammar of the protocol are inferred more accurately and completely. The reverse analysis uses the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking and helps ensure the accuracy of the analysis result. During active learning, each response message is matched against the protocol formats in the preliminary analysis result to judge whether it conforms to a known format, and the matching is repeated as required, which significantly improves the accuracy and coverage of the reverse analysis of the industrial control protocol.

Description

Industrial control protocol reverse analysis method based on active learning
Technical Field
The invention relates to the technical field of protocol format analysis, in particular to an industrial control protocol reverse analysis method based on active learning.
Background
An industrial control system (ICS) is an automatic control system composed of computer equipment and industrial process control components. Such systems are widely used in industries such as electric power, water treatment, oil and natural gas, chemicals, transportation, and manufacturing.
Reverse analysis of unknown industrial control protocols mainly relies on the analysis of network traffic. This approach is general: communication samples of the industrial control protocol only need to be imported into an analysis system as pcap files, and the formats and state machines of the protocol can then be obtained by reverse analysis.
Disclosure of Invention
The invention provides an industrial control protocol reverse analysis method based on active learning. It addresses the problem, noted in the background, that with the traditional approach the samples of an industrial control protocol often fail to cover all message formats and state machines of the protocol, so that the analysis result is inaccurate and incomplete.
To achieve this purpose, the invention provides the following technical solution: an industrial control protocol reverse analysis method based on active learning, comprising the following steps:
S1, importing: import the message data from the pcap file and load all of it into a message data set OriginalSet;
S2, preliminary analysis: perform algorithmic reverse analysis on the messages in the message data set OriginalSet to obtain a preliminary industrial control protocol format and state machine;
S3, mutation: based on the preliminary analysis result, mutate the function code field in the protocol format to generate new messages;
S4, matching: through interactive active learning, match each response message against the protocol formats in the preliminary analysis result, screen out the messages that do not match any existing protocol format, and add them to a message data set NewSet;
S5, merging: perform reverse analysis on the actively learned messages and merge the result with the preliminary analysis result to obtain a complete analysis result.
Preferably, in step S1, the operating environment consists of a PC with an Intel-Windows architecture and an industrial personal computer running the server-side program of the industrial control protocol; the sample data set is in pcap format and is obtained by packet capture with the Wireshark tool.
Preferably, in step S2, reverse analysis is performed on the messages in the message data set OriginalSet using the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking.
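As an illustration of the alignment step, the following is a minimal Python sketch of Needleman-Wunsch global alignment applied to two messages treated as byte sequences. The scoring values (match = 1, mismatch = -1, gap = -1) and the function name needleman_wunsch are assumptions chosen for the example; the text does not state the scoring scheme actually used.

```python
from typing import List, Optional, Tuple

Aligned = List[Optional[int]]  # byte values, with None marking a gap


def needleman_wunsch(a: bytes, b: bytes, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> Tuple[int, Aligned, Aligned]:
    """Globally align two byte strings and return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Similarity score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Backtrack from the bottom-right cell to recover one optimal alignment.
    out_a: Aligned = []
    out_b: Aligned = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    out_a.reverse()
    out_b.reverse()
    return score[n][m], out_a, out_b
```

Aligning a six-byte and a five-byte message that differ by one missing byte yields a single gap in the shorter one, which is the kind of evidence a format-inference step can use to spot optional or variable-length fields.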
Preferably, in step S4, the new messages are used to perform interactive active learning with the industrial personal computer so that new messages are continuously obtained. The specific steps are:
a. send the newly generated message to the industrial personal computer and receive its response message;
b. match the response message against the protocol formats in the preliminary analysis result and judge whether it matches any of them; if so, go to step d, otherwise go to step c;
c. add the response message of the industrial personal computer to the set NewSet;
d. judge whether the active learning process is finished; if so, end the active learning, otherwise return to step a.
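Steps a to d can be sketched as a simple loop. This is only an illustration under stated assumptions: the names active_learning, exchange, and matches_known_format are hypothetical, the device is abstracted as a callable that returns a response (or None when there is no reply), and the termination test in step d is assumed to be "all probes sent or a round limit reached", since the text does not define when active learning is finished.

```python
from typing import Callable, Iterable, List, Optional


def active_learning(probes: Iterable[bytes],
                    exchange: Callable[[bytes], Optional[bytes]],
                    matches_known_format: Callable[[bytes], bool],
                    max_rounds: Optional[int] = None) -> List[bytes]:
    """Collect responses that do not fit any format from the preliminary analysis."""
    new_set: List[bytes] = []
    for round_no, probe in enumerate(probes):
        # Step d (assumed criterion): stop once the round limit is reached.
        if max_rounds is not None and round_no >= max_rounds:
            break
        # Step a: send the mutated message and receive the response.
        response = exchange(probe)
        if response is None:
            continue  # no reply, nothing to learn from this probe
        # Step b: compare the response with the known protocol formats.
        if not matches_known_format(response):
            # Step c: an unknown format goes into NewSet.
            new_set.append(response)
    return new_set
```

In practice exchange would wrap a socket connection to the industrial personal computer; passing it in as a callable keeps the loop testable against a simulated device.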
Preferably, in step S5, after the active learning, the messages in the message data set NewSet are reverse-analyzed again with the Needleman-Wunsch sequence alignment algorithm to obtain a new industrial control protocol format and state machine, and the result is merged with the preliminary analysis result.
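The merge in step S5 can be pictured as a union of two analysis results. The sketch below assumes a hypothetical representation in which each inferred format is a hashable signature and the state machine is a set of (state, message type, next state) transitions; the text does not specify how results are stored, so AnalysisResult and merge_results are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Transition = Tuple[str, str, str]  # (state, message_type, next_state)


@dataclass
class AnalysisResult:
    formats: Set[str] = field(default_factory=set)             # format signatures
    transitions: Set[Transition] = field(default_factory=set)  # state machine edges


def merge_results(preliminary: AnalysisResult, learned: AnalysisResult) -> AnalysisResult:
    """Combine the preliminary result with the result learned from NewSet."""
    return AnalysisResult(
        formats=preliminary.formats | learned.formats,
        transitions=preliminary.transitions | learned.transitions,
    )
```

Because both parts are sets, formats or transitions found in both passes appear only once in the merged result.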
Compared with the prior art, the invention has the following beneficial effects. The method is soundly structured and convenient to use. By preliminarily analyzing pcap message samples of the industrial control protocol, it obtains part of the protocol's message formats and state machine; it then uses this result to carry out interactive active learning with the industrial personal computer and continuously obtain new messages, so that the lexicon and grammar of the protocol are inferred more accurately and completely. The reverse analysis uses the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking and helps ensure the accuracy of the analysis result. In addition, during active learning each response message is matched against the protocol formats in the preliminary analysis result to judge whether it conforms to a known format, and the matching is repeated as required, which significantly improves the accuracy and coverage of the reverse analysis of the industrial control protocol.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
In the drawings:
FIG. 1 is a schematic block diagram of the active-learning-based industrial control protocol reverse analysis method of the present invention;
FIG. 2 is a schematic block diagram of the active learning and matching process of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example: as shown in FIG. 1, an industrial control protocol reverse analysis method based on active learning includes the following steps:
S1, importing: import the message data from the pcap file and load all of it into a message data set OriginalSet;
S2, preliminary analysis: perform algorithmic reverse analysis on the messages in the message data set OriginalSet to obtain a preliminary industrial control protocol format and state machine;
S3, mutation: based on the preliminary analysis result, mutate the function code field in the protocol format to generate new messages;
S4, matching: through interactive active learning, match each response message against the protocol formats in the preliminary analysis result, screen out the messages that do not match any existing protocol format, and add them to a message data set NewSet;
S5, merging: perform reverse analysis on the actively learned messages and merge the result with the preliminary analysis result to obtain a complete analysis result.
Further, in step S1, the operating environment consists of a PC with an Intel-Windows architecture and an industrial personal computer running the server-side program of the industrial control protocol; the sample data set is in pcap format and is obtained by packet capture with the Wireshark tool. In this embodiment, the PC hardware is an eight-core Core CPU with a clock frequency of 2.5 GHz or higher, at least 4 GB of memory, and a 500 GB hard disk, running the Windows 10 operating system; the industrial personal computer hardware is an eight-core Core CPU with a clock frequency of 2 GHz or higher, at least 2 GB of memory, and a 100 GB hard disk, running the Windows 10 operating system.
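The import in step S1 can be illustrated with a small reader for the classic libpcap file layout (a 24-byte global header followed by 16-byte per-packet record headers). This is a sketch under assumptions: load_pcap is a hypothetical name, only the classic pcap format is handled (not pcapng, which recent Wireshark versions write by default), and the function returns whole captured frames, so stripping link, network, and transport headers to reach the protocol payload is left out.

```python
import struct
from typing import List


def load_pcap(data: bytes) -> List[bytes]:
    """Return the captured frames of a classic pcap file as OriginalSet."""
    if len(data) < 24:
        raise ValueError("too short to contain a pcap global header")
    magic = data[:4]
    # The magic number 0xa1b2c3d4 (or 0xa1b23c4d for nanosecond
    # timestamps) reveals the byte order used by the writer.
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    original_set: List[bytes] = []
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        if offset + incl_len > len(data):
            break  # truncated final record
        original_set.append(data[offset:offset + incl_len])
        offset += incl_len
    return original_set
```

A capture file would typically be read with open(path, "rb").read() and passed to this function; libraries such as Scapy offer more complete parsing if a dependency is acceptable.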
Further, in step S2, reverse analysis is performed on the messages in the message data set OriginalSet using the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking.
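The text does not detail how fields are derived once messages have been aligned. One common approach, shown here purely as an assumption, is to classify each aligned column as static (same byte in every message), variable (differing bytes), or optional (a gap in at least one message), and then merge adjacent columns of the same kind into fields. The name infer_fields and the use of None for gaps are illustrative choices.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple

Field = Tuple[str, int, int]  # (kind, start_column, length)


def infer_fields(aligned: Sequence[Sequence[Optional[int]]]) -> List[Field]:
    """Derive a coarse field layout from equally long aligned messages."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("aligned messages must all have the same length")

    kinds: List[str] = []
    for col in range(width):
        values = {row[col] for row in aligned}
        if None in values:
            kinds.append("optional")   # some message has a gap here
        elif len(values) == 1:
            kinds.append("static")     # identical byte in every message
        else:
            kinds.append("variable")   # byte differs between messages

    # Merge runs of identical column kinds into fields.
    fields: List[Field] = []
    start = 0
    for kind, run in groupby(kinds):
        length = len(list(run))
        fields.append((kind, start, length))
        start += length
    return fields
```

For three aligned messages that share their first byte and differ in the second, the result begins with a one-byte static field followed by a one-byte variable field, which is a plausible candidate for a function code.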
As shown in FIG. 2, in step S4, the new messages are used to perform interactive active learning with the industrial personal computer so that new messages are continuously obtained. The specific steps are:
a. send the newly generated message to the industrial personal computer and receive its response message;
b. match the response message against the protocol formats in the preliminary analysis result and judge whether it matches any of them; if so, go to step d, otherwise go to step c;
c. add the response message of the industrial personal computer to the set NewSet;
d. judge whether the active learning process is finished; if so, end the active learning, otherwise return to step a.
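The matching test in step b depends on how a protocol format is represented, which the text leaves open. The sketch below assumes a deliberately simple, hypothetical representation: a format is a mapping from byte offsets to the constant values expected there, plus a minimum message length. The names matches_format and matches_any are illustrative.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical format description: (constant bytes by offset, minimum length).
Format = Tuple[Dict[int, int], int]


def matches_format(message: bytes, static_bytes: Dict[int, int], min_length: int = 0) -> bool:
    """Check that a message is long enough and carries every expected constant byte."""
    if len(message) < min_length:
        return False
    return all(offset < len(message) and message[offset] == value
               for offset, value in static_bytes.items())


def matches_any(message: bytes, formats: Iterable[Format]) -> bool:
    """Step b: does the response fit at least one known protocol format?"""
    return any(matches_format(message, static, min_len) for static, min_len in formats)
```

A response for which matches_any returns False is the kind of message that step c adds to NewSet.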
By adopting an active learning method, the tester actively communicates with the industrial personal computer and obtains new message formats and state machines by mutating message contents, thereby refining and completing the analysis result of the industrial control protocol.
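The mutation of message contents can be sketched as enumerating function code values that were not observed in the original capture. The example assumes a single-byte function code at a known offset (as is the case in Modbus, for instance); the offset, the width, and the name mutate_function_code are assumptions for illustration, since the text only states that the function code field is mutated.

```python
from typing import Iterator, Set


def mutate_function_code(template: bytes, offset: int, seen_codes: Set[int]) -> Iterator[bytes]:
    """Yield copies of a captured message with every unseen one-byte function code."""
    if not 0 <= offset < len(template):
        raise ValueError("function code offset lies outside the message")
    for code in range(256):
        if code in seen_codes:
            continue  # already covered by the captured samples
        # Replace only the function code byte and keep the rest unchanged.
        yield template[:offset] + bytes([code]) + template[offset + 1:]
```

Each generated message keeps the original length and differs from the template only in the function code byte, so the responses reveal how the device reacts to codes missing from the sample.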
Further, in step S5, after the active learning, the messages in the message data set NewSet are reverse-analyzed again with the Needleman-Wunsch sequence alignment algorithm to obtain a new industrial control protocol format and state machine. The result is merged with the preliminary analysis result to obtain a complete analysis result, all analysis results are saved, and the analysis ends.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. An industrial control protocol reverse analysis method based on active learning, characterized by comprising the following steps:
S1, importing: import the message data from the pcap file and load all of it into a message data set OriginalSet;
S2, preliminary analysis: perform algorithmic reverse analysis on the messages in the message data set OriginalSet to obtain a preliminary industrial control protocol format and state machine;
S3, mutation: based on the preliminary analysis result, mutate the function code field in the protocol format to generate new messages;
S4, matching: through interactive active learning, match each response message against the protocol formats in the preliminary analysis result, screen out the messages that do not match any existing protocol format, and add them to a message data set NewSet;
S5, merging: perform reverse analysis on the actively learned messages and merge the result with the preliminary analysis result to obtain a complete analysis result.
2. The industrial control protocol reverse analysis method based on active learning according to claim 1, characterized in that: in step S1, the operating environment consists of a PC with an Intel-Windows architecture and an industrial personal computer running the server-side program of the industrial control protocol, and the sample data set is in pcap format and is obtained by packet capture with the Wireshark tool.
3. The industrial control protocol reverse analysis method based on active learning according to claim 1, characterized in that: in step S2, reverse analysis is performed on the messages in the message data set OriginalSet using the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking.
4. The industrial control protocol reverse analysis method based on active learning according to claim 1, characterized in that: in step S4, the new messages are used to perform interactive active learning with the industrial personal computer so that new messages are continuously obtained, the specific steps being:
a. send the newly generated message to the industrial personal computer and receive its response message;
b. match the response message against the protocol formats in the preliminary analysis result and judge whether it matches any of them; if so, go to step d, otherwise go to step c;
c. add the response message of the industrial personal computer to the set NewSet;
d. judge whether the active learning process is finished; if so, end the active learning, otherwise return to step a.
5. The industrial control protocol reverse analysis method based on active learning according to claim 1, characterized in that: in step S5, after the active learning, the messages in the message data set NewSet are reverse-analyzed again with the Needleman-Wunsch sequence alignment algorithm to obtain a new industrial control protocol format and state machine, and the result is merged with the preliminary analysis result.
CN202010553659.5A 2020-06-17 2020-06-17 Industrial control protocol reverse analysis method based on active learning Pending CN111723181A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010553659.5A CN111723181A (en) 2020-06-17 2020-06-17 Industrial control protocol reverse analysis method based on active learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010553659.5A CN111723181A (en) 2020-06-17 2020-06-17 Industrial control protocol reverse analysis method based on active learning

Publications (1)

Publication Number Publication Date
CN111723181A (en) 2020-09-29

Family

ID=72567209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010553659.5A Pending CN111723181A (en) 2020-06-17 2020-06-17 Industrial control protocol reverse analysis method based on active learning

Country Status (1)

Country Link
CN (1) CN111723181A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103297427A (en) * 2013-05-21 2013-09-11 中国科学院信息工程研究所 Unknown network protocol identification method and system
CN105847249A (en) * 2016-03-22 2016-08-10 英赛克科技(北京)有限公司 Safety protection system and method for Modbus network
CN106326119A (en) * 2016-08-19 2017-01-11 北京匡恩网络科技有限责任公司 Method and device for generating test case
CN109462590A (en) * 2018-11-15 2019-03-12 成都网域复兴科技有限公司 A kind of unknown protocol conversed analysis method based on fuzz testing
CN110213130A (en) * 2019-06-03 2019-09-06 南京莱克贝尔信息技术有限公司 A kind of industry control protocol format analysis method based on iteration optimization

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
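张钊;温巧燕;唐文;: "A survey of research on protocol specification mining", 计算机工程与应用, no. 09, pages 1 - 9 *
王珂: "Research on network security protection schemes for industrial control systems based on Classified Protection 2.0", 《电子技术与软件工程》, no. 181, pages 255 - 256 *
费远鹏;陈剑云;马书研;: "Implementation of an AC sampling measurement system based on the Modbus protocol", 微计算机信息, no. 23, pages 21 - 23 *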

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112422515A (en) * 2020-10-27 2021-02-26 锐捷网络股份有限公司 Protocol vulnerability testing method and device and storage medium
CN112422515B (en) * 2020-10-27 2023-03-21 锐捷网络股份有限公司 Protocol vulnerability testing method and device and storage medium
CN113132366A (en) * 2021-04-07 2021-07-16 深圳市奇虎智能科技有限公司 Method, system, storage medium and computer device for interactive protocol reversal
CN113535731A (en) * 2021-07-21 2021-10-22 北京威努特技术有限公司 Heuristic message state interactive self-learning method and device
CN113535731B (en) * 2021-07-21 2024-04-16 北京威努特技术有限公司 Heuristic-based message state interaction self-learning method and device
CN115065623A (en) * 2022-08-15 2022-09-16 国家计算机网络与信息安全管理中心江苏分中心 Active and passive combined reverse analysis method for private industrial control protocol

Similar Documents

Publication Publication Date Title
CN111723181A (en) Industrial control protocol reverse analysis method based on active learning
US20210209410A1 (en) Method and apparatus for classification of wafer defect patterns as well as storage medium and electronic device
CN107122594B (en) New energy vehicle battery health prediction method and system
CN108600195B (en) Rapid industrial control protocol format reverse inference method based on incremental learning
CN110727437B (en) Code optimization item acquisition method and device, storage medium and electronic equipment
WO2021174812A1 (en) Data cleaning method and apparatus for profile, and medium and electronic device
CN110162518B (en) Data grouping method, device, electronic equipment and storage medium
CN111723579A (en) Industrial control protocol field and semantic reverse inference method
CN116112271B (en) Session data processing method, electronic equipment and storage medium
CN111178701B (en) Risk control method and device based on feature derivation technology and electronic equipment
CN111563172A (en) Academic hotspot trend prediction method and device based on dynamic knowledge graph construction
CN117648931A (en) Code examination method, device, electronic equipment and medium
CN112242136B (en) Improving test coverage of session models
WO2024140909A1 (en) Matching model training method and apparatus, device, and medium
CN110782128A (en) User occupation label generation method and device and electronic equipment
CN118132049A (en) Online collaborative programming processing method, device and storage medium
CN109766260B (en) Method, device, electronic equipment and storage medium for configuring test action
Wang et al. Natural is the best: Model-agnostic code simplification for pre-trained large language models
CN112749082B (en) Test case generation method and system based on DE-TH algorithm
CN110083807B (en) Contract modification influence automatic prediction method, device, medium and electronic equipment
Nguyen et al. Software Engineering and AI for Data Quality in Cyber-Physical Systems/Internet of Things-SEA4DQ'22 Report
CN112860671A (en) Production factor data abnormity diagnosis method and device
CN113849540B (en) Fault prediction model training and predicting method and device, electronic equipment and medium
Zimmermann et al. All that Glitters Is not Gold: Type‐I Error Controlled Variable Selection from Clinical Trial Data
CN111881128B (en) Big data regression verification method and big data regression verification device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination