CN110661778A - Method and system for testing industrial control network protocol based on reverse analysis fuzzy - Google Patents

Method and system for testing industrial control network protocol based on reverse analysis fuzzy

Publication number
CN110661778A
Authority
CN
China
Prior art keywords
industrial control
control network
sequence
network protocol
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910750014.8A
Other languages
Chinese (zh)
Inventor
王海翔
缪思薇
周亮
朱朝阳
孙辰军
杨波
余文豪
朱亚运
韩丽芳
应欢
张晓娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
State Grid Hebei Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Gansu Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
State Grid Hebei Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Gansu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI, State Grid Hebei Electric Power Co Ltd, Information and Telecommunication Branch of State Grid Gansu Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201910750014.8A priority Critical patent/CN110661778A/en
Publication of CN110661778A publication Critical patent/CN110661778A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433Vulnerability analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method and a system for fuzz testing an industrial control network protocol based on reverse analysis, belonging to the technical field of network security. The method comprises the following steps: performing reverse analysis on an industrial control network protocol to generate a rule tree, preprocessing the rule tree and screening out a plurality of subsequences of a candidate key sequence; generating a subsequence set W of the key sequence; forming a protocol state machine; and using the protocol state machine to mutate the nodes of the rule tree according to the rules of the rule tree, thereby generating fuzz test cases, and fuzz testing the target industrial control network protocol. The invention solves the problem that traditional fuzz testing methods cannot perform vulnerability mining on proprietary industrial control protocols, and optimizes the test case generation process.

Description

Method and system for testing industrial control network protocol based on reverse analysis fuzzy
Technical Field
The invention relates to the technical field of network security, in particular to a method and a system for testing an industrial control network protocol based on reverse analysis and fuzzy testing.
Background
With the full integration of industrial networks, the Internet and the Internet of Things, potential network security hazards have increased greatly. Designers of industrial control systems have traditionally considered only the availability of the system and neglected its security design. Over the years, the resulting security problems have come into public view. In recent years, frequent industrial control security incidents, such as the Stuxnet virus and the Ukraine power outage, have caused serious harm to society and the public. At present, many core equipment technologies depend on imports, which carries the risk of device backdoors, so vulnerability mining has become a necessary and difficult task.
Traditional vulnerability mining methods mainly comprise reverse analysis and fuzz testing. Reverse analysis can analyze the code of a program in depth and achieves high code coverage, but it requires manual participation and suffers from path explosion, a high false-positive rate and a high false-negative rate. Compared with traditional information systems, the environment of an industrial control system is relatively closed, its protocols are numerous and varied, and most protocol documents are not public, which makes reverse analysis more difficult. Fuzz testing (fuzzing) has therefore become an effective vulnerability mining method.
Fuzz testing is a method of discovering vulnerabilities by supplying unexpected inputs to a target system in a "black box" manner and monitoring the target system for anomalous results. It offers a high degree of automation, high efficiency, no need for source code, reproducibility and a low false-positive rate. However, for some industrial control protocols, especially vendor-proprietary protocols such as the Siemens S7 and PPI protocols, the Omron FINS protocol, the GE SRTP protocol and the WDBRPC protocol, the protocol structure and session process are difficult to obtain, so fuzz testing gives unsatisfactory results: test cases are poorly targeted, vulnerability mining is hard to carry out, code coverage is low, test cases are sent blindly, and the number of test cases explodes. This poses a great challenge to vulnerability discovery in industrial control systems.
Devarajan, a researcher at TippingPoint (USA), developed modules for the industrial control protocols ICCP (including TPKT and COTP modules), Modbus, DNP3 and others for the fuzz testing framework Sulley, and presented them at the Black Hat conference in 2007. These modules are used to discover unauthorized command execution vulnerabilities, unauthorized data transmission vulnerabilities, denial-of-service vulnerabilities caused by buffer overflows, and similar flaws in ICS network protocols.
However, because industrial control protocols are numerous and mostly proprietary, the Sulley framework does not cover proprietary industrial control protocols. For example, some industrial control firmware runs on embedded systems such as VxWorks and cedinux, and most of these systems have been modified.
In the 2004 PI project, Marshall Beddoe et al. first proposed extracting protocol formats from network traffic. The basic idea is to apply the progressive multiple sequence alignment algorithms of bioinformatics to protocol messages, which laid the research foundation for protocol analysis based on network traffic. The improved algorithm Discoverer (2007) uses message sequence analysis to extract message formats completely, but its results lack semantic analysis of the protocol. Influenced by the PI project, later researchers proposed adding an N-gram language model and a hidden Markov model to obtain protocol state transition information.
Because these methods are based on large-scale network traffic and progressive multiple sequence alignment of protocol messages, they can analyze protocol semantics well but cannot obtain a complete protocol format: the N-gram algorithm uses a sliding window L of size N to divide the protocol message into subsequences, and as the message is divided, more noise is mixed into the key sequences of the protocol, so the protocol reverse-analysis results are not ideal. In addition, these methods do not consider the value ranges and constraints of protocol fields, and the large number of invalid operations makes protocol analysis inefficient.
Disclosure of Invention
In view of the above problems, the invention provides a method for fuzz testing an industrial control network protocol based on reverse analysis, which comprises the following steps:
performing reverse analysis on an industrial control network protocol to generate a rule tree, wherein each node of the rule tree records the number of occurrences of a subsequence of a key sequence of an industrial control network protocol message, preprocessing the rule tree, and screening out a plurality of subsequences of a candidate key sequence;
assigning a weight value to each subsequence in a plurality of subsequences of the candidate key sequence based on the probability value of the subsequence, sorting the subsequences of the candidate key sequence according to different weight values, selecting the subsequence in the candidate key sequence ranked before a preset percentage as the subsequence of the key sequence of the industrial control network protocol, and generating a subsequence set W of the key sequence;
constructing an industrial control network protocol format epsilon machine and acquiring a message key type sequence by taking a subsequence set W of a key sequence as input, performing Markov model minimum prediction according to the message key type sequence and the industrial control network protocol format epsilon machine, acquiring a finite state automaton, substituting the message key type sequence into the finite state automaton, and forming a protocol state machine;
and using the protocol state machine to mutate the nodes of the rule tree according to the rules of the rule tree to generate fuzz test cases, and performing fuzz testing on the target industrial control network protocol.
Optionally, the reverse analysis of the industrial control network protocol to generate the rule tree specifically includes:
capturing an industrial control network protocol message, dividing a key sequence of the captured industrial control network protocol message through a sliding window L to obtain a plurality of subsequences of the key sequence, traversing the plurality of subsequences of the captured key sequence through the sliding window L, and generating a rule tree.
Optionally, the depth of the rule tree is n + 1, where n is greater than or equal to 1.
Optionally, preprocessing the rule tree and screening out a plurality of subsequences of the candidate key sequence specifically includes: counting the access frequency of each key-sequence subsequence in the rule tree, deleting the sequences whose access frequency is lower than a threshold T, counting the total count of each node, determining the possible boundaries, and extracting the subsequences of the candidate key sequences.
Optionally, the rule tree is generated according to a preset letter sequence.
Optionally, the constructing an industrial control network protocol format epsilon machine and obtaining a message key type sequence specifically include:
capturing an industrial control protocol session, extracting key message types in a subsequence set W of a key sequence, inputting the key message types into a hidden Markov model, forming the industrial control protocol session into a message key type sequence, and generating an industrial control network protocol format epsilon machine.
The invention also provides a system for fuzz testing an industrial control network protocol based on reverse analysis, which comprises:
a tree generation module, configured to perform reverse analysis on an industrial control network protocol to generate a rule tree, wherein each node of the rule tree records the number of occurrences of a subsequence of a key sequence of an industrial control network protocol message, and to preprocess the rule tree and screen out a plurality of subsequences of a candidate key sequence;
a screening module, configured to assign a weight value to each subsequence in the plurality of subsequences of the candidate key sequence based on the probability value of the subsequence, sort the subsequences of the candidate key sequence by weight value, select the subsequences ranked within a preset top percentage as the subsequences of the key sequence of the industrial control network protocol, and generate a subsequence set W of the key sequence;
a state machine construction module, configured to take the subsequence set W of the key sequence as input to construct an industrial control network protocol format epsilon machine and obtain a message key type sequence, perform Markov model minimum prediction according to the message key type sequence and the industrial control network protocol format epsilon machine to obtain a finite state automaton, and substitute the message key type sequence into the finite state automaton to form a protocol state machine;
and a test module, configured to use the protocol state machine to mutate the nodes of the rule tree according to the rules of the rule tree to generate fuzz test cases, and to perform fuzz testing on the target industrial control network protocol.
Optionally, the reverse analysis of the industrial control network protocol to generate the rule tree specifically includes:
capturing an industrial control network protocol message, dividing a key sequence of the captured industrial control network protocol message through a sliding window L to obtain a plurality of subsequences of the key sequence, traversing the plurality of subsequences of the captured key sequence through the sliding window L, and generating a rule tree.
Optionally, the depth of the rule tree is n + 1, where n is greater than or equal to 1.
Optionally, preprocessing the rule tree and screening out a plurality of subsequences of the candidate key sequence specifically includes: counting the access frequency of each key-sequence subsequence in the rule tree, deleting the sequences whose access frequency is lower than a threshold T, counting the total count of each node, determining the possible boundaries, and extracting the subsequences of the candidate key sequences.
Optionally, the rule tree is generated according to a preset letter sequence.
Optionally, the constructing an industrial control network protocol format epsilon machine and obtaining a message key type sequence specifically include:
capturing an industrial control protocol session, extracting key message types in a subsequence set W of a key sequence, inputting the key message types into a hidden Markov model, forming the industrial control protocol session into a message key type sequence, and generating an industrial control network protocol format epsilon machine.
The invention combines protocol reverse analysis with fuzz testing to address the problems that arise when fuzz testing lacks protocol analysis: test cases are poorly targeted, blindly sent test cases are rejected by the target device, and large numbers of test cases and resources are wasted. During reverse protocol analysis, a protocol grammar model is first constructed from the protocol data and a syntax tree is built using the reverse analysis; the syntax elements, i.e. the nodes, are then mutated using the rules of the model to generate fuzz test cases, with which the target is fuzz tested.
The invention provides a method for generating a specific fuzzy test case aiming at an industrial control special protocol by adopting a mode of combining reverse analysis and fuzzy test. The problem that the traditional fuzzy test method cannot carry out vulnerability mining on the industrial control proprietary protocol is solved, and the generation process of the test case is optimized.
Drawings
FIG. 1 is a flow chart of a method for testing industrial control network protocol based on reverse analysis fuzzy test according to the present invention;
FIG. 2 is a syntax tree diagram of a method for reverse-analysis-based fuzzy test of industrial control network protocol according to the present invention;
FIG. 3 is a system structure diagram of an industrial control network protocol based on reverse analysis fuzzy test according to the present invention.
Detailed Description
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting of the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
The invention provides a method for fuzz testing an industrial control network protocol based on reverse analysis, which, as shown in figure 1, comprises the following steps:
performing reverse analysis on an industrial control network protocol to generate a rule tree, wherein each node of the rule tree records the number of occurrences of a subsequence of a key sequence of an industrial control network protocol message, preprocessing the rule tree, and screening out a plurality of subsequences of a candidate key sequence;
the method specifically comprises the following steps:
First, a protocol syntax tree is constructed based on the effective counting algorithm (VCA). Using a sliding window L, the VCA algorithm selects the most probable protocol message boundary positions and segments the message.
To implement a sliding window L of size n, a search tree structure is used to store the combinations of characters that may appear in the data stream.
As shown in fig. 2, a search tree is generated from an arbitrarily chosen letter sequence "abacbcabacad".
The depth of the rule tree and the search tree is n + 1;
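The search tree described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Node`, `build_trie`, `depth` and `count_of` are hypothetical, and the choice to also count the shorter windows at the end of the stream is an assumption, since fig. 2 is not reproduced here.

```python
class Node:
    """One node of the search tree: an occurrence count plus child nodes."""
    def __init__(self):
        self.count = 0
        self.children = {}


def build_trie(stream, n):
    """Slide a window of size n over the stream and count every window prefix."""
    root = Node()
    for start in range(len(stream)):
        node = root
        # Windows near the end of the stream are shorter than n (assumption).
        for symbol in stream[start:start + n]:
            node = node.children.setdefault(symbol, Node())
            node.count += 1
    return root


def depth(node):
    """Depth of the tree, counting the root as level 1."""
    return 1 + max((depth(child) for child in node.children.values()), default=0)


def count_of(root, seq):
    """Number of times seq was seen as a window prefix (0 if never seen)."""
    node = root
    for symbol in seq:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.count


root = build_trie("abacbcabacad", 3)
print(depth(root), count_of(root, "a"), count_of(root, "aba"))
```

With n = 3 the tree has the root plus three symbol levels, which matches the stated depth of n + 1.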
The character stream is segmented according to two criteria. The first is that a subsequence appears in the character stream as a whole and is stored as a unit. To quantify this, the concept of internal entropy (Internal Entropy) from information theory is introduced, denoted $H_I$, as shown in formula (1).
$$H_I(\omega) = -\log P(\omega) \tag{1}$$
where $H_I$ is the internal entropy, $\omega$ is the subsequence, and $P(\omega)$ is the probability that the subsequence appears as a whole. The smaller the value of $H_I$, the greater the probability that the subsequence appears as a whole, and the smaller the uncertainty.
The second is the boundary entropy (Boundary Entropy): $H_B(\omega)$ is the boundary entropy of $\omega$. If the bytes following the subsequence $\omega$ change frequently, the position after $\omega$ is considered the boundary of the subsequence. The expression is shown in formula (2).
[Formula (2): image not reproduced in this text]
So that entropy values of subsequences of different lengths can be calculated on a common scale, the two expressions are normalized and denoted $E_I$ and $E_B$, as shown in formulas (3) and (4).
[Formulas (3) and (4): images not reproduced in this text]
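As a rough illustration of the two entropy measures, the sketch below estimates them from n-gram counts. Formula (2) is not reproduced in this text, so the boundary entropy here is assumed to be the usual conditional entropy of the symbol that follows ω; the estimate of P(ω) as a relative frequency among windows of the same length is likewise an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(stream, max_len):
    """Count every substring of length 1..max_len in the stream."""
    counts = Counter()
    for start in range(len(stream)):
        for length in range(1, max_len + 1):
            if start + length <= len(stream):
                counts[stream[start:start + length]] += 1
    return counts


def internal_entropy(counts, omega, stream_len):
    """H_I(omega) = -log P(omega), as in formula (1).

    P(omega) is estimated as the relative frequency of omega among all
    windows of the same length (assumption)."""
    windows = stream_len - len(omega) + 1
    if counts[omega] == 0:
        return math.inf
    return -math.log(counts[omega] / windows)


def boundary_entropy(counts, omega, alphabet):
    """Assumed form: H_B(omega) = -sum_c P(c | omega) * log P(c | omega)."""
    followers = [counts[omega + c] for c in alphabet]
    total = sum(followers)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log(k / total) for k in followers if k)


stream = "abacbcabacad"
counts = ngram_counts(stream, 3)
print(internal_entropy(counts, "a", len(stream)))
print(boundary_entropy(counts, "ab", "abcd"), boundary_entropy(counts, "a", "abcd"))
```

In this stream "ab" is always followed by "a", so its boundary entropy is zero, while "a" is followed by three different symbols and therefore scores higher, which is the behavior the text describes for a likely boundary.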
The effective counting method is divided into two stages. The first stage is the counting stage, in which the positions within a sliding window of size n that may be boundaries are counted, as shown in formulas (5) and (6).
[Formulas (5) and (6): images not reproduced in this text]
Here the symbols of formulas (5) and (6) (image not reproduced) denote the internal and boundary counts added at positions i, j ∈ [0, N], and V(x) denotes the total count at point x, calculated as shown in formula (7).
[Formula (7): image not reproduced in this text]
The indicator function in formula (7) takes the value 1 if y = x and 0 otherwise.
The second stage is the statistical stage. A point x is judged to be a boundary only if both of the following conditions hold:
a) the total count at x is greater than that of its adjacent points;
b) the total count at x is greater than a set threshold T.
When the algorithm finishes, the possible boundaries of the protocol message character sequence are obtained, and the character sequence is segmented according to its semantics. However, the VCA algorithm is prone to node space explosion when processing protocol messages.
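The two conditions of the statistical stage can be expressed directly in code. The sketch below assumes the total counts V(x) have already been computed and are given as a list indexed by position; the function name `select_boundaries` and the treatment of the first and last positions (which have only one neighbor) are assumptions.

```python
def select_boundaries(votes, threshold):
    """Return positions whose total count is a strict local maximum
    (condition a) and also exceeds the threshold T (condition b)."""
    boundaries = []
    for x, v in enumerate(votes):
        # Missing neighbors at the ends are treated as minus infinity (assumption).
        left = votes[x - 1] if x > 0 else float("-inf")
        right = votes[x + 1] if x + 1 < len(votes) else float("-inf")
        if v > left and v > right and v > threshold:
            boundaries.append(x)
    return boundaries


votes = [0, 3, 1, 0, 4, 4, 1, 2]
print(select_boundaries(votes, 2))
```

Note that the two adjacent positions with count 4 are both rejected, because condition a) is read here as a strict inequality; a tie-breaking rule would be a design choice outside what the text specifies.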
Observation shows that node usage exhibits locality: most nodes are used infrequently and only a few are used frequently. Capturing and storing all nodes would require a huge amount of storage, yet most nodes have no practical significance, so only the high-frequency subset needs attention.
The invention uses the LCA (Lossy Counting Algorithm) to prune the syntax tree. The algorithm organizes the input into blocks and counts the elements of each block; in each pruning period the count of every element in the block is decremented, and an element is deleted once its count falls to a set threshold T. The maximum tolerated error e is the most critical parameter of the LCA algorithm.
In general, a lower tolerated error increases the pressure on memory resources, while a higher one can cause erroneous pruning, i.e. deletion of a key sequence. The invention therefore proposes an improved effective counting algorithm, IVCA, which adds a feedback adjustment step: the occurrence frequency of sequences that do not appear in the syntax tree is set to e/2, so that all subsequences are fully considered during counting. The first part of the IVCA pseudocode defines the abstract data types of the tree and its nodes and declares the variables, as shown in Table (1):
Table (1)
[Table (1): pseudocode image not reproduced in this text]
where P is the traffic data, the total number of bytes processed by the LCA algorithm during pruning; L is the sliding window; and T is the threshold set in the statistical stage of the algorithm.
The second part of the algorithm counts the access frequency of each subsequence, performs pruning (deleting the sequences whose frequency is lower than the threshold T), constructs the protocol syntax tree, completes the feedback adjustment, counts the total count of each node, determines the possible boundaries and extracts the key sequences, as shown in Table (2):
Table (2)
[Table (2): pseudocode image not reproduced in this text]
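Since the IVCA pseudocode tables are not reproduced in this text, the following is only a sketch of the pruning idea, not the patent's algorithm. It uses the textbook form of lossy counting (bucket width ⌈1/e⌉, pruning at each bucket boundary) and adds the feedback step described above by reporting a frequency of e/2 for items that are no longer tracked. The class name `LossyCounter` and its methods are hypothetical.

```python
import math


class LossyCounter:
    """Textbook lossy counting with an e/2 default for untracked items."""

    def __init__(self, e):
        self.e = e                      # maximum tolerated error
        self.width = math.ceil(1 / e)   # bucket (block) width
        self.n = 0                      # items processed so far
        self.entries = {}               # item -> [count, max undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        # Prune at the end of each bucket: drop low-frequency items.
        if self.n % self.width == 0:
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def frequency(self, item):
        """Estimated relative frequency; untracked items get e/2
        (the feedback adjustment, as interpreted here)."""
        if item in self.entries:
            return self.entries[item][0] / self.n
        return self.e / 2


counter = LossyCounter(e=0.25)
for symbol in "aabaacada":
    counter.add(symbol)
print(sorted(counter.entries), counter.frequency("a"), counter.frequency("b"))
```

The trade-off mentioned in the text is visible in the parameter: a smaller e gives wider buckets and keeps more entries in memory, while a larger e prunes more aggressively and risks dropping a genuine key sequence.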
However, the IVCA algorithm cannot produce the final key sequences: it generates only candidate key sequences, and these subsequences cannot uniquely distinguish the protocol. To select the key sequences of the protocol, the TF-IDF method is adopted.
Assigning a weight value to each subsequence in the multiple subsequences of the candidate key sequence based on the probability value of the subsequence, sorting the multiple subsequences of the candidate key sequence according to different weight values, selecting subsequences of the candidate key sequence ranked before a predetermined percentage, for example, the top 10% to 20% of the candidate key sequence as subsequences of the key sequence of the industrial control network protocol, and generating a subsequence set W of the key sequence;
the method specifically comprises the following steps:
TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical weighting technique widely used in data mining, and can serve as the criterion for ranking key sequences. TF-IDF is calculated as shown in formula (8).
TF-IDF(x)=TF(x)*IDF(x) (8)
where TF is the term frequency, i.e. the number of times the subsequence occurs, and IDF is the inverse document frequency: if a subsequence occurs in many sequences, its IDF value is lower. IDF is calculated as shown in formula (9).
[Formula (9): not reproduced in this text]
where N is the total number of messages in the message sequence and N(x) is the number of messages that contain the subsequence x.
TF-IDF takes into account both the term frequency and how distinctive a subsequence is. For example, in a typical protocol message, header information such as the source and destination addresses occurs frequently but has little practical significance for deriving the message format. The TF-IDF technique is therefore suitable for extracting the key sequences of industrial control protocol messages.
The subsequences in the candidate key sequences are sorted by weight, and the top-ranked subsequences are selected as the real key sequences of the protocol.
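A minimal sketch of this ranking step follows. Formula (9) is not reproduced in this text, so the common form IDF(x) = log(N / N(x)) is assumed; the function name `tf_idf_rank`, the sample messages and the `top_fraction` parameter are all hypothetical.

```python
import math


def tf_idf_rank(messages, candidates, top_fraction=0.2):
    """Score each candidate subsequence by TF * IDF and keep the top fraction.

    Assumes IDF(x) = log(N / N(x)), where N is the number of messages and
    N(x) the number of messages containing x."""
    n_messages = len(messages)
    scores = {}
    for seq in candidates:
        tf = sum(m.count(seq) for m in messages)            # term frequency
        containing = sum(1 for m in messages if seq in m)   # N(x)
        if containing == 0:
            continue
        scores[seq] = tf * math.log(n_messages / containing)
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:keep], scores


# Hypothetical captured messages: a shared 2-byte header plus a command word.
messages = [
    b"\x01\x02READ\x10",
    b"\x01\x02WRITE\x20",
    b"\x01\x02READ\x11",
    b"\x01\x02STOP",
]
candidates = [b"\x01\x02", b"READ", b"WRITE", b"STOP"]
top, scores = tf_idf_rank(messages, candidates, top_fraction=0.5)
print(top, scores[b"\x01\x02"])
```

The shared header appears in every message, so its IDF and therefore its score is zero and it drops out of the selection, which mirrors the header example in the text.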
Constructing an industrial control network protocol format epsilon machine and acquiring a message key type sequence by taking a subsequence set W of a key sequence as input, performing Markov model minimum prediction according to the message key type sequence and the industrial control network protocol format epsilon machine, acquiring a finite state automaton, substituting the message key type sequence into the finite state automaton, and forming a protocol state machine;
the method specifically comprises the following steps:
The TF-IDF algorithm yields the subsequence set W of key sequences, formed by the top-k ranked subsequences. For fuzz test case generation, the invention proposes a causal-state splitting reconstruction algorithm for the ε machine based on a hidden Markov model.
W is the input to the hidden Markov model.
A hidden Markov model (HMM) is a statistical model in which the hidden parameters of a stochastic process are determined from the observable parameters alone. An industrial control protocol can be regarded as a stochastic process, and the character string sequences in protocol messages can be regarded as occurring with certain probabilities, so the session process can be regarded as a Markov process. The ε machine is a special case of an HMM that can make optimal predictions for the Markov process.
The ε machine generally consists of a state transition matrix A and a causal function ε, where the causal function is a mapping from a past to a set of pasts. It is defined as shown in formula (10).
[Formula (10): image not reproduced in this text]
The symbols in formula (10) (images not reproduced) denote, respectively, the past state of the random process, the future state of the process, the last L characters of the past state, and the first L characters of the future state.
There are two methods for constructing the ε machine. The first is the causal-state splitting reconstruction algorithm, which can recover the implicit information in the original sequence after reconstruction; the second is the state-merging ε machine inference algorithm. By comparison, the causal-state splitting reconstruction algorithm has the advantage in reconstruction performance and convergence rate. Its idea is to partition a complex system into a finite set of causal states. The pseudocode of the causal-state splitting algorithm is shown in Table (3) and Table (4), where W is the set of extracted key sequences, used to construct the alphabet of the HMM; the subsequence symbol (image not reproduced) denotes the subsequences extracted from W; $L_{max}$ is the maximum length of the sequences used for causal-state estimation; and α is the confidence level used in constructing the protocol ε machine.
The first part of the algorithm starts from a null hypothesis that all events of the process belong to the same state. It then predicts the distribution at the next time step and tests it with the K-S (Kolmogorov-Smirnov) test. If the test passes, the event is added to the existing state with that distribution; if an event fails the test, it is split off to form a new state.
Table (3)
[Table (3): pseudocode not reproduced in this text]
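The splitting decision described above can be illustrated with a small two-sample K-S check on next-symbol counts. This is a sketch under assumptions, not the pseudocode of Table (3): the function names are hypothetical, the alphabet is given a fixed order so that a cumulative distribution is defined, and the critical value uses the standard large-sample approximation c(α)·sqrt((n+m)/(n·m)) with c(α) = sqrt(-ln(α/2)/2).

```python
import math


def ks_statistic(counts_a, counts_b, alphabet):
    """Largest gap between the two empirical CDFs over an ordered alphabet."""
    n_a = sum(counts_a.get(s, 0) for s in alphabet)
    n_b = sum(counts_b.get(s, 0) for s in alphabet)
    cdf_a = cdf_b = 0.0
    d = 0.0
    for s in alphabet:
        cdf_a += counts_a.get(s, 0) / n_a
        cdf_b += counts_b.get(s, 0) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    return d, n_a, n_b


def same_state(counts_a, counts_b, alphabet, alpha=0.05):
    """True if the next-symbol distributions are not significantly different,
    i.e. the history may stay in the existing state; False means split."""
    d, n_a, n_b = ks_statistic(counts_a, counts_b, alphabet)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt(
        (n_a + n_b) / (n_a * n_b)
    )
    return d <= critical


# Hypothetical next-symbol counts observed after three different histories.
state = {"x": 50, "y": 50}
history_1 = {"x": 48, "y": 52}
history_2 = {"x": 95, "y": 5}
print(same_state(state, history_1, "xy"), same_state(state, history_2, "xy"))
```

A smaller α makes the test harder to fail, so fewer states are split off; a larger α produces more states that the second part of the algorithm then has to clean up.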
The number of resulting states is usually quite large; the second part of the algorithm removes most of the transient states and retains the recurrent states, finally yielding the message format ε machine. Protocol sessions are captured, the key message types are extracted, and each protocol session is turned into a message key type sequence. The pseudocode is shown in Table (4):
Table (4)
[Table (4): pseudocode not reproduced in this text]
Markov model minimum prediction is then performed according to the message key type sequence and the protocol format ε machine. This yields a deterministic finite automaton (DFA); substituting the protocol state transitions into the DFA forms the protocol state machine.
The protocol state machine mutates the syntax elements, i.e. the nodes, using the rules of the syntax tree model. Test cases are generated automatically using grammar-driven fuzz test case generation.
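To make the idea of state-machine-guided test generation concrete, here is a deliberately simplified sketch. It is not the patent's construction: the state is taken to be simply the last message type seen, the transitions are learned directly from observed sessions, and the message templates, field names and mutation values are all hypothetical.

```python
import random


def build_state_machine(sessions):
    """Learn allowed transitions from observed message-type sequences.
    The state is simply the previous message type (simplifying assumption)."""
    transitions = {}
    for session in sessions:
        state = "START"
        for msg_type in session:
            transitions.setdefault(state, set()).add(msg_type)
            state = msg_type
    return transitions


def mutate(fields, rng):
    """Replace one field of a message template with a boundary-style value."""
    mutated = dict(fields)
    name = rng.choice(sorted(mutated))
    mutated[name] = rng.choice([b"", b"\x00", b"\xff" * 4, mutated[name] * 8])
    return name, mutated


def generate_case(transitions, templates, path_len, rng):
    """Walk a valid path through the state machine, then mutate the last message."""
    state, path = "START", []
    for _ in range(path_len):
        options = sorted(transitions.get(state, ()))
        if not options:
            break
        state = rng.choice(options)
        path.append(state)
    name, mutated = mutate(templates[path[-1]], rng)
    return path, name, mutated


sessions = [["CONNECT", "READ", "CLOSE"], ["CONNECT", "WRITE", "CLOSE"]]
templates = {
    "CONNECT": {"version": b"\x01"},
    "READ": {"address": b"\x00\x10", "length": b"\x04"},
    "WRITE": {"address": b"\x00\x10", "data": b"\xaa\xbb"},
    "CLOSE": {"reason": b"\x00"},
}
transitions = build_state_machine(sessions)
path, name, mutated = generate_case(transitions, templates, 2, random.Random(0))
print(path, name, mutated)
```

Because every generated case first follows a valid message sequence, the mutated message arrives in a state where the target is expected to accept that message type, which is the motivation the text gives for avoiding blindly sent test cases.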
The present invention further provides a system 200 for fuzz testing an industrial control network protocol based on reverse analysis, as shown in fig. 3, including:
the tree generation module 201, which performs reverse analysis on the industrial control network protocol to generate a rule tree, specifically including:
capturing an industrial control network protocol message, dividing a key sequence of the captured industrial control network protocol message through a sliding window L to obtain a plurality of subsequences of the key sequence, traversing the plurality of subsequences of the captured key sequence through the sliding window L again to generate a rule tree, wherein the depth of the rule tree is n +1, n is greater than or equal to 1, and the rule tree is generated according to a preset letter sequence.
Each node of the rule tree records the number of occurrences of a subsequence of the industrial control network protocol message key sequence; the rule tree is preprocessed and a plurality of subsequences of candidate key sequences are screened out.
Preprocessing the rule tree and screening out a plurality of subsequences of the candidate key sequence specifically comprises: counting the access frequency of the subsequences of each key sequence in the rule tree, deleting the sequences whose access frequency is lower than a threshold T, counting the total count of each node, determining the boundaries that may exist, and extracting the subsequences of the candidate key sequences.
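The tree construction and pruning steps above can be sketched in Python as a count trie. This is an illustration under assumptions: messages are treated as raw byte strings, the window is slid one byte at a time, and the function names are hypothetical. The boundary determination step is not covered.

```python
def build_rule_tree(messages, window):
    """Slide a window over every message and count each subsequence
    (up to `window` bytes long) in a trie. With the root included,
    the tree has depth window + 1."""
    root = {"count": 0, "children": {}}
    for msg in messages:
        for start in range(len(msg)):
            node = root
            for byte in msg[start:start + window]:
                node = node["children"].setdefault(byte, {"count": 0, "children": {}})
                node["count"] += 1
    return root

def prune(node, threshold):
    """Delete every subtree whose visit count is lower than the threshold T."""
    node["children"] = {
        byte: prune(child, threshold)
        for byte, child in node["children"].items()
        if child["count"] >= threshold
    }
    return node

def candidates(node, prefix=b""):
    """List the surviving subsequences together with their counts."""
    found = {}
    for byte, child in node["children"].items():
        seq = prefix + bytes([byte])
        found[seq] = child["count"]
        found.update(candidates(child, seq))
    return found
```

Because a child can never be visited more often than its parent, pruning a node also discards all longer subsequences that start with it, so the threshold T only needs to be checked once per branch.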
The screening module 202 assigns a weight value to each of the subsequences of the candidate key sequence based on the probability value of that subsequence, sorts the subsequences by weight value, selects the top-ranked 10%-20% of them as the subsequences of the key sequence of the industrial control network protocol, and generates a subsequence set W of the key sequence.
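The screening step can be sketched in a few lines of Python. The sketch assumes that the weight of a subsequence is simply its relative frequency among the candidates and that ties are broken by the subsequence itself; the function name and the default fraction of 15% (a value inside the 10%-20% range) are illustrative choices.

```python
import math

def select_key_subsequences(counts, top_fraction=0.15):
    """counts: mapping from candidate subsequence to occurrence count.
    Returns the top-ranked fraction of candidates as the set W (as a list)."""
    total = sum(counts.values())
    # Weight = empirical probability of the subsequence (an assumption).
    weights = {seq: count / total for seq, count in counts.items()}
    ranked = sorted(weights, key=lambda seq: (-weights[seq], seq))
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:keep]
```

Rounding up and keeping at least one entry avoids an empty set W when there are only a few candidates.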
The construction state machine module 203 takes the subsequence set W of the key sequence as input, constructs an industrial control network protocol format epsilon machine and obtains a message key type sequence, performs Markov model minimum prediction according to the message key type sequence and the industrial control network protocol format epsilon machine to obtain a finite state automaton, and substitutes the message key type sequence into the finite state automaton to form a protocol state machine.
Constructing the industrial control network protocol format epsilon machine and obtaining the message key type sequence specifically comprises:
Capturing an industrial control protocol session, extracting the key message types in the subsequence set W of the key sequence, inputting the key message types into a hidden Markov model, forming the industrial control protocol session into a message key type sequence, and generating the industrial control network protocol format epsilon machine.
The test module 204 uses the protocol state machine to mutate the rule tree nodes according to the rules of the rule tree, generates fuzzy test cases, and performs fuzzy testing on the target industrial control network protocol.
The invention combines protocol reverse analysis with fuzzy testing to solve the following problems: because fuzzy testing lacks protocol analysis, test cases are poorly targeted; blindly sent test cases are rejected by the target equipment; and a large number of test cases and resources are wasted. During protocol reverse analysis, a protocol grammar model is first constructed from the protocol data and a syntax tree is built with the help of the reverse analysis; the syntax elements, i.e., the nodes, are then mutated using the rules of the model to generate fuzzy test cases, with which the target is fuzzy tested.
The invention provides a method that combines reverse analysis with fuzzy testing to generate targeted fuzzy test cases for proprietary industrial control protocols. It solves the problem that traditional fuzzy testing methods cannot perform vulnerability mining on proprietary industrial control protocols, and it optimizes the test case generation process.

Claims (12)

1. A method for reverse-analysis-based fuzzy testing of industrial control network protocols, the method comprising:
performing reverse analysis on an industrial control network protocol to generate a rule tree, wherein nodes of the rule tree are the times of occurrence of a plurality of subsequences of a key sequence of an industrial control network protocol message, preprocessing the rule tree, and screening out a plurality of subsequences of a candidate key sequence;
assigning a weight value to each subsequence in the plurality of subsequences of the candidate key sequence based on the probability value of the subsequence, sorting the subsequences of the candidate key sequence by weight value, selecting the subsequences ranked within a preset top percentage as the subsequences of the key sequence of the industrial control network protocol, and generating a subsequence set W of the key sequence;
constructing an industrial control network protocol format epsilon machine and acquiring a message key type sequence by taking a subsequence set W of a key sequence as input, performing Markov model minimum prediction according to the message key type sequence and the industrial control network protocol format epsilon machine, acquiring a finite state automaton, substituting the message key type sequence into the finite state automaton, and forming a protocol state machine;
and mutating the rule tree nodes by using the protocol state machine according to the rule tree rules to generate a fuzzy test case, and carrying out a fuzzy test on the target industrial control network protocol.
2. The method according to claim 1, wherein the inverse analysis of the industrial control network protocol to generate the rule tree specifically includes:
capturing an industrial control network protocol message, dividing a key sequence of the captured industrial control network protocol message through a sliding window L to obtain a plurality of subsequences of the key sequence, traversing the plurality of subsequences of the captured key sequence through the sliding window L, and generating a rule tree.
3. The method of claim 2, wherein the rule tree has a depth of n +1, and n is greater than or equal to 1.
4. The method of claim 1, wherein the preprocessing the rule tree and screening out a plurality of subsequences of the candidate key sequence comprises: counting the access frequency of the subsequences of each key sequence of the rule tree, deleting the sequences with the access frequency lower than a threshold value T, counting the total count of each node, determining the possibly existing boundary, and extracting the subsequences of the candidate key sequences.
5. The method of claim 1, wherein the rule tree is generated according to a predetermined letter sequence.
6. The method according to claim 1, wherein the constructing an industrial control network protocol format epsilon machine and the obtaining a message key type sequence specifically comprise:
capturing an industrial control protocol session, extracting key message types in a subsequence set W of a key sequence, inputting the key message types into a hidden Markov model, forming the industrial control protocol session into a message key type sequence, and generating an industrial control network protocol format epsilon machine.
7. A system for reverse-analysis-based fuzzy testing of industrial control network protocols, the system comprising:
a spanning tree module, used for performing reverse analysis on an industrial control network protocol to generate a rule tree, wherein nodes of the rule tree are the times of occurrence of a plurality of subsequences of a key sequence of an industrial control network protocol message, and for preprocessing the rule tree and screening out a plurality of subsequences of a candidate key sequence;
a screening module, used for assigning a weight value to each subsequence in the plurality of subsequences of the candidate key sequence based on the probability value of the subsequence, sorting the subsequences of the candidate key sequence by weight value, selecting the subsequences ranked within a preset top percentage as the subsequences of the key sequence of the industrial control network protocol, and generating a subsequence set W of the key sequence;
a construction state machine module, used for constructing an industrial control network protocol format epsilon machine and obtaining a message key type sequence by taking the subsequence set W of the key sequence as input, performing Markov model minimum prediction according to the message key type sequence and the industrial control network protocol format epsilon machine to obtain a finite state automaton, and substituting the message key type sequence into the finite state automaton to form a protocol state machine;
and a test module, used for mutating the rule tree nodes by using the protocol state machine according to the rule tree rules to generate a fuzzy test case, and carrying out a fuzzy test on the target industrial control network protocol.
8. The system of claim 7, wherein the inverse analysis of the industrial control network protocol to generate the rule tree specifically comprises:
capturing an industrial control network protocol message, dividing a key sequence of the captured industrial control network protocol message through a sliding window L to obtain a plurality of subsequences of the key sequence, traversing the plurality of subsequences of the captured key sequence through the sliding window L, and generating a rule tree.
9. The system of claim 7, wherein the rule tree has a depth of n + 1, n being greater than or equal to 1.
10. The system of claim 7, wherein the preprocessing the rule tree and screening out a plurality of subsequences of the candidate key sequence comprises: counting the access frequency of the subsequences of each key sequence of the rule tree, deleting the sequences with the access frequency lower than a threshold value T, counting the total count of each node, determining the possibly existing boundary, and extracting the subsequences of the candidate key sequences.
11. The system of claim 7, wherein the rule tree is generated according to a predetermined letter sequence.
12. The system according to claim 7, wherein the constructing an industrial control network protocol format epsilon machine and the obtaining a message key type sequence specifically comprise:
capturing an industrial control protocol session, extracting key message types in a subsequence set W of a key sequence, inputting the key message types into a hidden Markov model, forming the industrial control protocol session into a message key type sequence, and generating an industrial control network protocol format epsilon machine.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910750014.8A CN110661778A (en) 2019-08-14 2019-08-14 Method and system for testing industrial control network protocol based on reverse analysis fuzzy

Publications (1)

Publication Number Publication Date
CN110661778A true CN110661778A (en) 2020-01-07

Family

ID=69037480

Country Status (1)

Country Link
CN (1) CN110661778A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104168288A (en) * 2014-08-27 2014-11-26 中国科学院软件研究所 Automatic vulnerability discovery system and method based on protocol reverse parsing
CN104796240A (en) * 2015-04-30 2015-07-22 北京理工大学 Fuzz testing system for stateful network protocol
CN105245403A (en) * 2015-10-27 2016-01-13 国网智能电网研究院 Power-grid industrial control protocol vulnerability mining system and method based on fuzzy test
CN105763392A (en) * 2016-02-19 2016-07-13 中国人民解放军理工大学 Industrial control protocol fuzzing test method based on protocol state
US9432394B1 (en) * 2015-03-16 2016-08-30 Ixia Methods, systems, and computer readable media for converging on network protocol stack vulnerabilities using fuzzing variables, vulnerability ratings and progressive convergence
CN107241226A (en) * 2017-06-29 2017-10-10 北京工业大学 Fuzz testing method based on industry control proprietary protocol

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王海翔等: "基于逆向分析的工控协议模糊测试方法", 《电力信息与通信技术》 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112039196A (en) * 2020-04-22 2020-12-04 广东电网有限责任公司 Power monitoring system private protocol analysis method based on protocol reverse engineering
CN111830928B (en) * 2020-06-08 2021-07-30 杭州电子科技大学 Fuzzy test method for industrial control equipment firmware
CN111830928A (en) * 2020-06-08 2020-10-27 杭州电子科技大学 Fuzzy test method for industrial control equipment firmware
CN112019403A (en) * 2020-08-24 2020-12-01 杭州弈鸽科技有限责任公司 Cross-platform automatic mining method and system for message protocol state machine of Internet of things
CN112019403B (en) * 2020-08-24 2021-10-01 杭州弈鸽科技有限责任公司 Cross-platform automatic mining method and system for message protocol state machine of Internet of things
CN112153030A (en) * 2020-09-15 2020-12-29 杭州弈鸽科技有限责任公司 Internet of things protocol security automatic analysis method and system based on formal verification
CN112153030B (en) * 2020-09-15 2022-04-12 杭州弈鸽科技有限责任公司 Internet of things protocol security automatic analysis method and system based on formal verification
CN112312590A (en) * 2020-10-10 2021-02-02 腾讯科技(深圳)有限公司 Equipment communication protocol identification method and device
CN113535731A (en) * 2021-07-21 2021-10-22 北京威努特技术有限公司 Heuristic message state interactive self-learning method and device
CN113535731B (en) * 2021-07-21 2024-04-16 北京威努特技术有限公司 Heuristic-based message state interaction self-learning method and device
CN114189382A (en) * 2021-12-10 2022-03-15 中国电子科技集团公司第十五研究所 Fuzzy test-based automatic analysis vulnerability mining device for network protocol
CN114189382B (en) * 2021-12-10 2023-03-07 中国电子科技集团公司第十五研究所 Fuzzy test-based automatic analysis vulnerability mining device for network protocol
CN114501458A (en) * 2022-01-27 2022-05-13 重庆邮电大学 WIA-PA protocol fuzz test data generation method based on extended finite-state machine
CN115242424A (en) * 2022-05-31 2022-10-25 东南大学 Private network protocol classification method based on state machine subgraph isomorphic matching
CN115174276B (en) * 2022-09-07 2022-12-30 国网江西省电力有限公司电力科学研究院 Competitive industrial control system vulnerability mining method and system
CN115174276A (en) * 2022-09-07 2022-10-11 国网江西省电力有限公司电力科学研究院 Vulnerability mining method and system for competitive industrial control system
CN116614421A (en) * 2023-05-24 2023-08-18 岭东核电有限公司 S5 protocol robustness testing method and device
CN116614421B (en) * 2023-05-24 2024-02-06 岭东核电有限公司 S5 protocol robustness testing method and device
CN116663019A (en) * 2023-07-06 2023-08-29 华中科技大学 Source code vulnerability detection method, device and system
CN116663019B (en) * 2023-07-06 2023-10-24 华中科技大学 Source code vulnerability detection method, device and system
CN117290856A (en) * 2023-11-14 2023-12-26 广州红海云计算股份有限公司 Intelligent test management system based on software automation test technology
CN117290856B (en) * 2023-11-14 2024-02-23 广州红海云计算股份有限公司 Intelligent test management system based on software automation test technology
CN117435506A (en) * 2023-12-15 2024-01-23 中兴通讯股份有限公司 Fuzzy test method, electronic device and computer readable storage medium
CN117435506B (en) * 2023-12-15 2024-04-16 中兴通讯股份有限公司 Fuzzy test method, electronic device and computer readable storage medium
CN117472787A (en) * 2023-12-27 2024-01-30 山东泽鹿安全技术有限公司 Test case generation method, device, medium and equipment for vehicle-mounted computer fuzzy test
CN117472787B (en) * 2023-12-27 2024-03-15 山东泽鹿安全技术有限公司 Test case generation method, device, medium and equipment for vehicle-mounted computer fuzzy test

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200107