CN111723181A - Industrial control protocol reverse analysis method based on active learning - Google Patents
- Publication number
- CN111723181A (application CN202010553659.5A)
- Authority
- CN
- China
- Prior art keywords
- message
- protocol
- industrial control
- active learning
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Communication Control (AREA)
Abstract
The invention discloses an industrial control protocol reverse analysis method based on active learning, comprising the steps of importing, preliminary analysis, mutation, matching and merging. A sample of pcap messages of the industrial control protocol is first analyzed to obtain part of the protocol's message formats and state machine. This result is then used for interactive active learning with an industrial personal computer, so that new messages are continuously obtained and the lexicon and grammar of the protocol are inferred more accurately and completely. The reverse analysis uses the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking and so helps ensure the accuracy of the analysis result. During active learning, each response message is matched against the protocol formats in the preliminary analysis result to judge whether it fits one of them, and the matching is repeated as required, which improves the accuracy and coverage of the reverse analysis of the industrial control protocol.
Description
Technical Field
The invention relates to the technical field of protocol format analysis, in particular to an industrial control protocol reverse analysis method based on active learning.
Background
An industrial control system (ICS) is an automatic control system composed of computer equipment and industrial process control components. It is widely used in industries such as electric power, water treatment, petroleum and natural gas, chemicals, transportation and manufacturing.
Reverse analysis of unknown industrial control protocols mainly relies on methods based on network traffic. The approach is general: communication samples of the industrial control protocol only need to be imported into an analysis system as pcap files, and the formats and state machine of the protocol can then be obtained through reverse analysis.
Disclosure of Invention
The invention provides an industrial control protocol reverse analysis method based on active learning. It addresses the problem raised in the background: in the traditional processing method, the samples of an industrial control protocol often fail to cover all of the protocol's message formats and state machine, so the analysis result is inaccurate and incomplete.
To achieve this purpose, the invention provides the following technical scheme: an industrial control protocol reverse analysis method based on active learning, comprising the following steps:
S1, importing: import the message data in the pcap file and load all of it into a message data set OriginalSet;
S2, preliminary analysis: reverse-analyze the messages in the message data set OriginalSet with the algorithm to obtain a preliminary industrial control protocol format and state machine;
S3, mutation: based on the preliminary analysis result, mutate the function code field in the protocol format to generate new messages;
S4, matching: through interactive active learning, match each response message against the protocol formats in the preliminary analysis result, screen out the messages that match no existing protocol format, and add them to a message data set NewSet;
S5, merging: reverse-analyze the actively learned messages and merge the result with the preliminary analysis result to obtain a complete analysis result.
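Of the five steps, the mutation in S3 is not elaborated further in the text, so a minimal sketch may help. It assumes that the function code is a single byte at a known offset and that mutation means trying every one-byte value not already seen in the sample; the helper name `mutate_function_code` and both assumptions are illustrative and are not taken from the patent.

```python
from typing import Iterable, List


def mutate_function_code(template: bytes, offset: int,
                         seen_codes: Iterable[int]) -> List[bytes]:
    """Return copies of `template` whose byte at `offset` is replaced by
    every one-byte value that was not observed in the captured sample.

    Assumption: the function code field is exactly one byte wide.
    """
    if not 0 <= offset < len(template):
        raise ValueError("offset lies outside the message")
    seen = set(seen_codes)
    mutated = []
    for code in range(256):
        if code in seen:
            continue  # already covered by the preliminary analysis
        message = bytearray(template)
        message[offset] = code
        mutated.append(bytes(message))
    return mutated
```

Each returned message differs from the template only in the assumed function code byte, so any change in the device's response can be attributed to that field.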
Preferably, in step S1, the operating environment comprises a PC with an Intel-Windows architecture and an industrial personal computer running the server-side program of the industrial control protocol; the sample data set is in pcap format and is obtained by packet capture with the Wireshark tool.
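As an illustration of the import in step S1, the sketch below reads a classic pcap capture with the Python standard library only and collects the raw records into a list playing the role of OriginalSet. The function name `load_original_set` is hypothetical. The sketch assumes the classic microsecond pcap layout (24-byte global header, 16-byte record headers); pcapng and nanosecond captures are rejected, and stripping the link-layer, IP and transport headers to reach the protocol payload is left out.

```python
import struct
from typing import BinaryIO, List


def load_original_set(stream: BinaryIO) -> List[bytes]:
    """Read every captured record from a classic pcap stream."""
    header = stream.read(24)  # pcap global header
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written in little-endian byte order
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written in big-endian byte order
    else:
        raise ValueError("not a classic microsecond pcap file")

    original_set: List[bytes] = []
    while True:
        record = stream.read(16)  # ts_sec, ts_usec, incl_len, orig_len
        if len(record) < 16:
            break  # end of file
        _sec, _usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break  # truncated final record
        original_set.append(data)
    return original_set
```

A capture saved by Wireshark in pcap format could be loaded with `load_original_set(open(path, "rb"))`, where `path` is whatever file the capture was saved to.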
Preferably, in step S2, the messages in the message data set OriginalSet are reverse-analyzed with the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking.
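The two phases named here, similarity scoring and optimal backtracking, correspond to filling the Needleman-Wunsch score matrix and then tracing one best path back through it. The sketch below aligns two messages byte by byte. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; the patent does not state which values it uses.

```python
from typing import List, Optional, Tuple

Aligned = List[Optional[int]]  # byte values, with None marking a gap


def needleman_wunsch(a: bytes, b: bytes, match: int = 1,
                     mismatch: int = -1, gap: int = -1
                     ) -> Tuple[int, Aligned, Aligned]:
    """Globally align two messages; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # Similarity scoring: score[i][j] is the best score for a[:i] vs b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Optimal backtracking: walk from the bottom-right cell to the origin.
    out_a: Aligned = []
    out_b: Aligned = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)  # gap in b
            i -= 1
        else:
            out_a.append(None)  # gap in a
            out_b.append(b[j - 1])
            j -= 1
    out_a.reverse()
    out_b.reverse()
    return score[n][m], out_a, out_b
```

Aligning `bytes([1, 3, 0, 16])` with `bytes([1, 3, 16])` places a gap opposite the third byte of the longer message, which is the kind of evidence a format inference step can use to separate fixed bytes from optional or variable ones.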
Preferably, in step S4, the new messages are used for interactive active learning with the industrial personal computer so that new messages are continuously obtained. The specific steps are:
a. send a newly generated message to the industrial personal computer and receive its response message;
b. match the response message against the protocol formats in the preliminary analysis result and judge whether it matches one of them; if so, go to step d, otherwise go to step c;
c. add the response message of the industrial personal computer to the set NewSet, then continue with step d;
d. judge whether the active learning process is finished; if so, end the active learning, otherwise return to step a.
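Steps a to d can be sketched as a simple loop. The transport and the format matcher are passed in as callables so that the sketch stays self-contained; the names `active_learning`, `send_and_receive` and `matches_known_format` are hypothetical. The patent does not say when the process counts as finished, so the sketch assumes it ends once every mutated message has been sent, and it assumes a missing reply is reported as `None`.

```python
from typing import Callable, Iterable, List, Optional


def active_learning(mutated: Iterable[bytes],
                    send_and_receive: Callable[[bytes], Optional[bytes]],
                    matches_known_format: Callable[[bytes], bool]
                    ) -> List[bytes]:
    """Collect responses that fit no known format (the set NewSet)."""
    new_set: List[bytes] = []
    for request in mutated:  # step d: assumed to finish when requests run out
        response = send_and_receive(request)  # step a
        if response is None:
            continue  # assumed convention: no reply from the device
        if not matches_known_format(response):  # step b
            new_set.append(response)  # step c
    return new_set
```

Injecting the transport also makes the loop easy to exercise against a recorded or simulated device before it is pointed at real equipment.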
Preferably, in step S5, after the active learning, the messages in the message data set NewSet are reverse-analyzed with the Needleman-Wunsch sequence alignment algorithm again to obtain a new industrial control protocol format and state machine, and this result is merged with the preliminary analysis result.
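The patent does not describe how analysis results are represented or merged, so the following is only one possible reading. It assumes a result consists of a list of format templates (tuples of byte values, with `None` for a variable byte) and a set of state machine transitions, and that merging means taking the union of both while keeping the preliminary formats first. The names `AnalysisResult` and `merge_results` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

FormatTemplate = Tuple[Optional[int], ...]  # None marks a variable byte
Transition = Tuple[str, str, str]           # (state, message type, next state)


@dataclass
class AnalysisResult:
    formats: List[FormatTemplate] = field(default_factory=list)
    transitions: Set[Transition] = field(default_factory=set)


def merge_results(preliminary: AnalysisResult,
                  learned: AnalysisResult) -> AnalysisResult:
    """Union of both results; neither input is modified."""
    merged_formats = list(preliminary.formats)
    for fmt in learned.formats:
        if fmt not in merged_formats:  # skip formats already known
            merged_formats.append(fmt)
    return AnalysisResult(merged_formats,
                          preliminary.transitions | learned.transitions)
```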
Compared with the prior art, the invention has the following beneficial effects. The method is scientifically and reasonably structured and safe and convenient to use. By preliminarily analyzing the pcap message samples of the industrial control protocol, it obtains part of the protocol's message formats and state machine. It then uses this result for interactive active learning with the industrial personal computer and continuously obtains new messages, so that the lexical and grammatical structure of the protocol is inferred more accurately and completely. The reverse analysis uses the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking and helps ensure the accuracy of the analysis result. In addition, during active learning each response message is matched against the protocol formats in the preliminary analysis result to judge whether it fits one of them, and the matching is repeated as required, which improves the accuracy and coverage of the reverse analysis of the industrial control protocol.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
In the drawings:
FIG. 1 is a schematic block diagram of the active-learning-based industrial control protocol reverse analysis method of the present invention;
FIG. 2 is a schematic block diagram of the matching performed during active learning in the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example: as shown in FIG. 1, an industrial control protocol reverse analysis method based on active learning includes the following steps:
S1, importing: import the message data in the pcap file and load all of it into a message data set OriginalSet;
S2, preliminary analysis: reverse-analyze the messages in the message data set OriginalSet with the algorithm to obtain a preliminary industrial control protocol format and state machine;
S3, mutation: based on the preliminary analysis result, mutate the function code field in the protocol format to generate new messages;
S4, matching: through interactive active learning, match each response message against the protocol formats in the preliminary analysis result, screen out the messages that match no existing protocol format, and add them to a message data set NewSet;
S5, merging: reverse-analyze the actively learned messages and merge the result with the preliminary analysis result to obtain a complete analysis result.
Further, in step S1, the operating environment comprises a PC with an Intel-Windows architecture and an industrial personal computer running the server-side program of the industrial control protocol; the sample data set is in pcap format and is obtained by packet capture with the Wireshark tool. In this embodiment, the PC hardware has an eight-core Core CPU with a main frequency of 2.5GHz or above, at least 4GB of memory and a 500GB hard disk, and runs the Windows 10 operating system. The industrial personal computer hardware has an eight-core Core CPU of 2.5GHz or above, at least 2GB of memory and a 100GB hard disk, and also runs the Windows 10 operating system.
Further, in step S2, the messages in the message data set OriginalSet are reverse-analyzed with the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking.
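The patent does not spell out how a format is read off an alignment. One common reading, shown here purely as an assumption, is that positions where two aligned messages carry the same byte are treated as constant, while positions that differ or face a gap are treated as variable. The helpers `infer_template` and `group_fields` are hypothetical, and they take already aligned sequences as input (byte values, with `None` for a gap), such as a global sequence alignment would produce.

```python
from typing import List, Optional, Sequence, Tuple

Template = List[Optional[int]]  # None marks a variable position


def infer_template(aligned_a: Sequence[Optional[int]],
                   aligned_b: Sequence[Optional[int]]) -> Template:
    """Keep a byte where both aligned messages agree, else mark it variable."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [x if x is not None and x == y else None
            for x, y in zip(aligned_a, aligned_b)]


def group_fields(template: Template) -> List[Tuple[str, int, int]]:
    """Collapse runs into (kind, start, length); kind is 'const' or 'var'."""
    fields: List[Tuple[str, int, int]] = []
    for pos, value in enumerate(template):
        kind = "var" if value is None else "const"
        if fields and fields[-1][0] == kind:
            k, start, length = fields[-1]
            fields[-1] = (k, start, length + 1)  # extend the current run
        else:
            fields.append((kind, pos, 1))        # start a new run
    return fields
```

The positions reported by `group_fields` are alignment columns, so they coincide with byte offsets only for messages that contain no gaps.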
As shown in FIG. 2, in step S4, the new messages are used for interactive active learning with the industrial personal computer so that new messages are continuously obtained. The specific steps are:
a. send a newly generated message to the industrial personal computer and receive its response message;
b. match the response message against the protocol formats in the preliminary analysis result and judge whether it matches one of them; if so, go to step d, otherwise go to step c;
c. add the response message of the industrial personal computer to the set NewSet, then continue with step d;
d. judge whether the active learning process is finished; if so, end the active learning, otherwise return to step a.
By adopting an active learning method, the tester actively communicates with the industrial personal computer, and a new message format and a state machine are obtained by utilizing the variation of message contents, so that the optimization and the perfection of the analysis result of the industrial control protocol are realized.
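The matching test in step b decides which responses count as new. A minimal sketch follows, assuming that a known format is stored as a template of byte values in which `None` acts as a one-byte wildcard, and that a message must have the same length as the template to match. Both assumptions and the names `matches_format` and `matches_any` are illustrative; the patent leaves the matching rule open.

```python
from typing import Iterable, Optional, Sequence


def matches_format(message: bytes,
                   template: Sequence[Optional[int]]) -> bool:
    """True if every constant byte of the template equals the message byte."""
    if len(message) != len(template):
        return False  # assumption: formats are fixed-length
    return all(t is None or t == b for b, t in zip(message, template))


def matches_any(message: bytes,
                templates: Iterable[Sequence[Optional[int]]]) -> bool:
    """True if the message fits at least one known format."""
    return any(matches_format(message, t) for t in templates)
```

A response for which `matches_any` returns `False` is the kind of message that step c adds to NewSet.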
Further, in step S5, after the active learning, the messages in the message data set NewSet are reverse-analyzed with the Needleman-Wunsch sequence alignment algorithm again to obtain a new industrial control protocol format and state machine. This result is merged with the preliminary analysis result to obtain a complete analysis result, all analysis results are saved, and the analysis ends.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (5)
1. An industrial control protocol reverse analysis method based on active learning, characterized by comprising the following steps:
S1, importing: import the message data in the pcap file and load all of it into a message data set OriginalSet;
S2, preliminary analysis: reverse-analyze the messages in the message data set OriginalSet with the algorithm to obtain a preliminary industrial control protocol format and state machine;
S3, mutation: based on the preliminary analysis result, mutate the function code field in the protocol format to generate new messages;
S4, matching: through interactive active learning, match each response message against the protocol formats in the preliminary analysis result, screen out the messages that match no existing protocol format, and add them to a message data set NewSet;
S5, merging: reverse-analyze the actively learned messages and merge the result with the preliminary analysis result to obtain a complete analysis result.
2. The industrial control protocol reverse analysis method based on active learning according to claim 1, characterized in that: in step S1, the operating environment comprises a PC with an Intel-Windows architecture and an industrial personal computer running the server-side program of the industrial control protocol, and the sample data set is in pcap format and is obtained by packet capture with the Wireshark tool.
3. The industrial control protocol reverse analysis method based on active learning according to claim 1, characterized in that: in step S2, the messages in the message data set OriginalSet are reverse-analyzed with the Needleman-Wunsch sequence alignment algorithm, which infers the format and state machine of the protocol through similarity scoring and optimal backtracking.
4. The industrial control protocol reverse analysis method based on active learning according to claim 1, characterized in that: in step S4, the new messages are used for interactive active learning with the industrial personal computer so that new messages are continuously obtained, specifically including:
a. sending a newly generated message to the industrial personal computer and receiving its response message;
b. matching the response message against the protocol formats in the preliminary analysis result and judging whether it matches one of them; if so, performing step d, otherwise performing step c;
c. adding the response message of the industrial personal computer to the set NewSet;
d. judging whether the active learning process is finished; if so, ending the active learning, otherwise returning to step a.
5. The industrial control protocol reverse analysis method based on active learning according to claim 1, characterized in that: in step S5, after the active learning, the messages in the message data set NewSet are reverse-analyzed with the Needleman-Wunsch sequence alignment algorithm again to obtain a new industrial control protocol format and state machine, and this result is merged with the preliminary analysis result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010553659.5A CN111723181A (en) | 2020-06-17 | 2020-06-17 | Industrial control protocol reverse analysis method based on active learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111723181A (en) | 2020-09-29 |
Family
ID=72567209
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010553659.5A (Pending) CN111723181A (en) | 2020-06-17 | 2020-06-17 | Industrial control protocol reverse analysis method based on active learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111723181A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103297427A (en) * | 2013-05-21 | 2013-09-11 | 中国科学院信息工程研究所 | Unknown network protocol identification method and system |
CN105847249A (en) * | 2016-03-22 | 2016-08-10 | 英赛克科技(北京)有限公司 | Safety protection system and method for Modbus network |
CN106326119A (en) * | 2016-08-19 | 2017-01-11 | 北京匡恩网络科技有限责任公司 | Method and device for generating test case |
CN109462590A (en) * | 2018-11-15 | 2019-03-12 | 成都网域复兴科技有限公司 | A kind of unknown protocol conversed analysis method based on fuzz testing |
CN110213130A (en) * | 2019-06-03 | 2019-09-06 | 南京莱克贝尔信息技术有限公司 | A kind of industry control protocol format analysis method based on iteration optimization |
Non-Patent Citations (3)
Title |
---|
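Zhang Zhao; Wen Qiaoyan; Tang Wen: "A survey of protocol specification mining" (协议规范挖掘研究综述), Computer Engineering and Applications (计算机工程与应用), no. 09, pages 1 - 9 * |
Wang Ke: "Research on a network security protection scheme for industrial control systems based on Classified Protection 2.0" (基于等保2.0的工控系统网络安全防护技术方案研究), Electronic Technology & Software Engineering (《电子技术与软件工程》), no. 181, pages 255 - 256 * |
Fei Yuanpeng; Chen Jianyun; Ma Shuyan: "Implementation of an AC sampling measurement system based on the Modbus protocol" (基于Modbus协议的交流采样测量系统的实现), Microcomputer Information (微计算机信息), no. 23, pages 21 - 23 * |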
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112422515A (en) * | 2020-10-27 | 2021-02-26 | 锐捷网络股份有限公司 | Protocol vulnerability testing method and device and storage medium |
CN112422515B (en) * | 2020-10-27 | 2023-03-21 | 锐捷网络股份有限公司 | Protocol vulnerability testing method and device and storage medium |
CN113132366A (en) * | 2021-04-07 | 2021-07-16 | 深圳市奇虎智能科技有限公司 | Method, system, storage medium and computer device for interactive protocol reversal |
CN113535731A (en) * | 2021-07-21 | 2021-10-22 | 北京威努特技术有限公司 | Heuristic message state interactive self-learning method and device |
CN113535731B (en) * | 2021-07-21 | 2024-04-16 | 北京威努特技术有限公司 | Heuristic-based message state interaction self-learning method and device |
CN115065623A (en) * | 2022-08-15 | 2022-09-16 | 国家计算机网络与信息安全管理中心江苏分中心 | Active and passive combined reverse analysis method for private industrial control protocol |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111723181A (en) | Industrial control protocol reverse analysis method based on active learning | |
US20210209410A1 (en) | Method and apparatus for classification of wafer defect patterns as well as storage medium and electronic device | |
CN107122594B (en) | New energy vehicle battery health prediction method and system | |
CN108600195B (en) | Rapid industrial control protocol format reverse inference method based on incremental learning | |
CN110727437B (en) | Code optimization item acquisition method and device, storage medium and electronic equipment | |
WO2021174812A1 (en) | Data cleaning method and apparatus for profile, and medium and electronic device | |
CN110162518B (en) | Data grouping method, device, electronic equipment and storage medium | |
CN111723579A (en) | Industrial control protocol field and semantic reverse inference method | |
CN116112271B (en) | Session data processing method, electronic equipment and storage medium | |
CN111178701B (en) | Risk control method and device based on feature derivation technology and electronic equipment | |
CN111563172A (en) | Academic hotspot trend prediction method and device based on dynamic knowledge graph construction | |
CN117648931A (en) | Code examination method, device, electronic equipment and medium | |
CN112242136B (en) | Improving test coverage of session models | |
WO2024140909A1 (en) | Matching model training method and apparatus, device, and medium | |
CN110782128A (en) | User occupation label generation method and device and electronic equipment | |
CN118132049A (en) | Online collaborative programming processing method, device and storage medium | |
CN109766260B (en) | Method, device, electronic equipment and storage medium for configuring test action | |
Wang et al. | Natural is the best: Model-agnostic code simplification for pre-trained large language models | |
CN112749082B (en) | Test case generation method and system based on DE-TH algorithm | |
CN110083807B (en) | Contract modification influence automatic prediction method, device, medium and electronic equipment | |
Nguyen et al. | Software Engineering and AI for Data Quality in Cyber-Physical Systems/Internet of Things-SEA4DQ'22 Report | |
CN112860671A (en) | Production factor data abnormity diagnosis method and device | |
CN113849540B (en) | Fault prediction model training and predicting method and device, electronic equipment and medium | |
Zimmermann et al. | All that Glitters Is not Gold: Type‐I Error Controlled Variable Selection from Clinical Trial Data | |
CN111881128B (en) | Big data regression verification method and big data regression verification device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||