CN117235745B

CN117235745B - Deep learning-based industrial control vulnerability mining method, system, equipment and storage medium

Info

Publication number: CN117235745B
Application number: CN202311517554.4A
Authority: CN
Inventors: 马振肖; 梁玉平; 梁玉龙; 梁国春
Original assignee: Beijing Dongfang Sentai Technology Development Co ltd
Current assignee: Beijing Dongfang Sentai Technology Development Co ltd
Priority date: 2023-11-15
Filing date: 2023-11-15
Publication date: 2024-05-10
Anticipated expiration: 2043-11-15
Also published as: CN117235745A

Abstract

The invention relates to the technical field of industrial control systems, and in particular to a deep learning-based industrial control vulnerability mining method, system, equipment and storage medium for mining vulnerabilities in the communication protocol between an industrial control host and an industrial control terminal. Communication message data are collected in both normal and attacked states to generate a message sample data set, and a deep recurrent neural network performs semantic analysis on the data set to extract semantic feature vectors of the positive samples and negative samples. A policy gradient method based on reinforcement learning then generates a plurality of test cases from the semantic feature vectors, and the test cases are sent to the program to be tested to trigger an abnormality in it. The test cases, their timestamps and the corresponding communication message data are recorded so that an abnormality can be analyzed and located when it occurs. By combining deep learning and reinforcement learning, the invention automatically generates high-quality test cases and improves the efficiency and effectiveness of industrial control vulnerability mining.

Description

Deep learning-based industrial control vulnerability mining method, system, equipment and storage medium

Technical Field

The invention relates to the technical field of industrial control systems, and in particular to a deep learning-based industrial control vulnerability mining method, system, equipment and storage medium.

Background

An industrial control system (Industrial Control System, ICS) refers to a computer system used to monitor and control industrial processes, such as automation systems in the power, petroleum, chemical, manufacturing industries, etc. Industrial control systems are typically composed of an industrial control host (e.g., a programmable logic controller PLC), an industrial control terminal (e.g., a sensor, an actuator, etc.), a human-computer interface (e.g., an operator station), etc., and exchange data via a specific communication protocol.

With the development of the industrial internet, industrial control systems are increasingly connected to external networks, which brings higher security risks. If a malicious attacker exploits a vulnerability in the communication protocol to invade or disrupt an industrial control system, the production process may become abnormal, equipment may be damaged, and personnel may even be injured or killed. Timely discovery and repair of communication protocol vulnerabilities in industrial control systems is therefore an important task for ensuring industrial safety.

At present, common methods for mining vulnerabilities in industrial control system communication protocols fall into two main categories: static analysis and dynamic analysis. Static analysis finds possible logic errors or defects by analyzing the protocol implementation code or specification documents. Dynamic analysis triggers and discovers potential anomalies or errors by observing and testing the behavior of the protocol at run time.

The advantage of the static analysis method is that it can cover all functions and logic of the protocol, but there are also some drawbacks such as: (1) The source code or specification document of the protocol needs to be acquired, and the information is often kept secret or incomplete; (2) Precise modeling and analysis of the syntax and semantics of the protocol is required, and these processes tend to be complex and time consuming; (3) A large number of false positives or false negatives may be generated, reducing analysis efficiency and accuracy.

The advantage of the dynamic analysis method is that the run-time behavior of the protocol can be tested and verified directly. Its main drawback is that it requires effective and efficient test cases. In the prior art, test cases are often generated randomly, so a large number of them are invalid, which wastes testing compute and reduces vulnerability mining efficiency. In addition, randomly generated test cases cannot cover all functions and logic of the protocol, so some vulnerabilities are missed.

Therefore, the existing industrial control system communication protocol vulnerability mining method still has some defects, and further improvement and optimization are needed.

The information disclosed in the background section of the application is only for enhancement of understanding of the general background of the application and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The embodiment of the invention provides a deep learning industrial control vulnerability discovery method, which is used for discovering vulnerabilities of a communication protocol between an industrial control host and an industrial control terminal and comprises the following steps:

Collecting communication message data between an industrial control host and an industrial control terminal, marking the communication message data between the industrial control host and the industrial control terminal in a normal working state as a positive sample, marking the communication message data between the industrial control host and the industrial control terminal in an attacked state as a negative sample, and generating a message sample data set;

Inputting the message sample data set into a semantic analysis network, the semantic analysis network learning the positive samples and negative samples in the message sample data set by using a deep recurrent neural network, so as to extract semantic feature vectors of the positive samples and the negative samples and generate a feature vector set;

Inputting the feature vector set into a generation policy network, the generation policy network generating a plurality of test cases from the semantic feature vectors by using a policy gradient method based on reinforcement learning, to form a test case queue;

Sequentially sending the test cases in the test case queue to a program to be tested of the industrial control host to trigger the abnormality of the program to be tested;

When the test case is sent to the program to be tested of the industrial control host, the test case and the corresponding timestamp are recorded in a message storage file, and when an abnormality occurs, the message storage file is called out to extract the test case with the abnormality and the corresponding communication message data.

In some embodiments of the present invention, the "collecting communication message data between an industrial control host and an industrial control terminal, marking the communication message data between the industrial control host and the industrial control terminal in a normal working state as a positive sample, marking the communication message data between the industrial control host and the industrial control terminal in an attacked state as a negative sample, and generating a message sample data set" specifically includes:

Selecting a target industrial control protocol and determining communication parameters;

According to the selected target industrial control protocol and the communication parameters, the network sniffing tool or the switch port mirror function is used for collecting the communication message data between the industrial control host and the industrial control terminal in real time;

filtering and analyzing the collected communication message data according to the specification and the characteristics of the target industrial control protocol, and extracting effective message content and fields;

Dividing message data into a positive sample and a negative sample according to the running state of an industrial control host, wherein the positive sample refers to communication message data between the industrial control host and an industrial control terminal in a normal working state and reflects normal functions and logic of a protocol, the negative sample refers to communication message data between the industrial control host and the industrial control terminal in an attacked state and reflects abnormal functions and logic of the protocol, and the attacked state is realized in a mode of artificial interference, malicious injection or simulated attack;

And storing the positive samples and the negative samples in a file or a database according to a certain format and labels to form a message sample data set.

In some embodiments of the invention, the method further comprises:

Performing static instrumentation compiling on the program to be tested for vulnerability mining to track the edge coverage rate of the test case on the code of the program to be tested;

During operation, dynamically recording the edge coverage rate of each test case, and outputting the edge coverage rate after the execution of the program to be tested is finished;

And selecting a plurality of high-quality cases from all the test cases according to the edge coverage rate, performing mutation processing on the high-quality cases to generate a plurality of new test cases, adding the new test cases into the test case queue, sending the new test cases to a program to be tested of the industrial control host, and triggering the program to be tested to be abnormal.
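As an illustrative sketch only, edge coverage can be thought of as the fraction of control-flow edges (pairs of consecutively executed basic blocks) that a test case exercises. The block identifiers, the `ALL_EDGES` set and the function names below are hypothetical; in practice the trace would come from the instrumented program to be tested rather than from a hand-written list.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def edges_of_trace(block_trace: Iterable[int]) -> Set[Edge]:
    """Return the set of (previous block, current block) edges in one execution trace."""
    edges: Set[Edge] = set()
    prev = None
    for block in block_trace:
        if prev is not None:
            edges.add((prev, block))
        prev = block
    return edges


def edge_coverage(block_trace: Iterable[int], all_edges: Set[Edge]) -> float:
    """Fraction of the known edges that one trace exercises."""
    if not all_edges:
        return 0.0
    return len(edges_of_trace(block_trace) & all_edges) / len(all_edges)


# Hypothetical control-flow graph with four edges (an assumption for the demo).
ALL_EDGES = {(0, 1), (1, 2), (1, 3), (2, 3)}
trace = [0, 1, 2, 3]            # basic blocks hit by one test case
coverage = edge_coverage(trace, ALL_EDGES)
```

Here the trace covers three of the four known edges, so `coverage` is 0.75.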

In some embodiments of the invention, the high quality use case comprises:

Among all the test cases, the test cases whose edge coverage rate ranks in the top k%; or alternatively

Among all test cases, the test cases with the edge coverage rate larger than the threshold value; or alternatively

Test cases capable of covering new paths in the program to be tested; or alternatively

A test case for causing the program to be tested to generate new state transition; or alternatively

And the test case causing the program to be tested to crash or hang up.
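The selection criteria above can be sketched as follows. This is a minimal illustration under stated assumptions: the `CaseResult` record and its flags are hypothetical names, and the sketch takes the union of whichever criteria are enabled, whereas the text presents them as alternatives.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseResult:
    case_id: str
    edge_coverage: float            # fraction of edges covered, 0.0 to 1.0
    new_path: bool = False          # covered a path not seen before
    new_state: bool = False         # caused a new state transition
    crashed_or_hung: bool = False   # crashed or hung the program to be tested


def select_high_quality(results: List[CaseResult],
                        top_k_percent: Optional[float] = None,
                        threshold: Optional[float] = None) -> List[CaseResult]:
    """Return the cases that satisfy any enabled criterion, in input order."""
    chosen = set()
    if top_k_percent is not None and results:
        count = max(1, math.ceil(len(results) * top_k_percent / 100))
        ranked = sorted(results, key=lambda r: r.edge_coverage, reverse=True)
        chosen.update(r.case_id for r in ranked[:count])
    for r in results:
        if threshold is not None and r.edge_coverage > threshold:
            chosen.add(r.case_id)
        if r.new_path or r.new_state or r.crashed_or_hung:
            chosen.add(r.case_id)
    return [r for r in results if r.case_id in chosen]


results = [
    CaseResult("a", 0.9),
    CaseResult("b", 0.5),
    CaseResult("c", 0.1, crashed_or_hung=True),
    CaseResult("d", 0.3),
]
picked = [r.case_id for r in select_high_quality(results, top_k_percent=25)]
```

With four cases and k = 25, one case is kept for coverage ("a"), and "c" is kept because it crashed the target.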

In some embodiments of the present invention, the mutation processing specifically includes:

randomly selecting 2 high-quality cases from the high-quality cases, randomly selecting a field, and splicing the first half section of one high-quality case with the second half section of the other high-quality case by taking the field as a boundary line to generate an intermediate test case;

One or more fields of the intermediate test case are randomly selected, and a value which is both random and suitable for the fields is generated for the fields to replace the original value of the fields.
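The two mutation steps (splicing two high-quality cases at a field boundary, then replacing one or more fields with random but valid values) might look like the sketch below. The field names and the value ranges in `FIELD_RANGES` are assumptions chosen for illustration; a real implementation would take them from the target protocol's specification.

```python
import random
from typing import Dict, List

FIELDS = ["function_code", "register_address", "data_length", "data_value"]

# Assumed valid ranges per field (hypothetical, for illustration only).
FIELD_RANGES = {
    "function_code": (1, 127),
    "register_address": (0, 0xFFFF),
    "data_length": (1, 125),
    "data_value": (0, 0xFFFF),
}


def splice(parent_a: Dict[str, int], parent_b: Dict[str, int],
           rng: random.Random) -> Dict[str, int]:
    """Join the first part of parent_a with the second part of parent_b."""
    boundary = rng.randrange(1, len(FIELDS))   # boundary lies between two fields
    return {name: (parent_a if i < boundary else parent_b)[name]
            for i, name in enumerate(FIELDS)}


def mutate_fields(case: Dict[str, int], rng: random.Random,
                  n_fields: int = 1) -> Dict[str, int]:
    """Replace n_fields randomly chosen fields with random in-range values."""
    mutated = dict(case)
    for name in rng.sample(FIELDS, n_fields):
        low, high = FIELD_RANGES[name]
        mutated[name] = rng.randint(low, high)
    return mutated


def mutate(high_quality_cases: List[Dict[str, int]],
           rng: random.Random) -> Dict[str, int]:
    """Pick two distinct parents, splice them, then mutate one field."""
    parent_a, parent_b = rng.sample(high_quality_cases, 2)
    return mutate_fields(splice(parent_a, parent_b, rng), rng)


pool = [
    {"function_code": 3, "register_address": 0, "data_length": 2, "data_value": 0},
    {"function_code": 6, "register_address": 16, "data_length": 1, "data_value": 255},
    {"function_code": 15, "register_address": 32, "data_length": 8, "data_value": 170},
]
child = mutate(pool, random.Random(0))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when a mutated case later needs to be regenerated for analysis.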

In some embodiments of the invention, further comprising:

Filling the missing feature values in the semantic feature vectors to form complete semantic feature vectors;

When the missing feature value is a categorical variable, the missing position is filled with the attribute value that occurs most frequently for that categorical variable across all sample data;

When the missing characteristic value is a numerical variable, the missing characteristic value is complemented according to the following formula:

$$y = y_1 + \frac{(y_2 - y_1)(x - x_1)}{x_2 - x_1}$$

Wherein $y$ is the missing feature value, $y_1$ is the previous recorded feature value of the numerical variable corresponding to the missing feature value, $x_1$ is the row number where the feature value $y_1$ is located, $y_2$ is the next recorded feature value of the numerical variable corresponding to the missing feature value, $x_2$ is the row number where the feature value $y_2$ is located, and $x$ is the row number where the missing feature value is located.
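A minimal sketch of the two filling rules follows. It assumes that the numerical rule is linear interpolation between the previous and next recorded values by row number, which is what the variable definitions suggest. The function names are hypothetical, and `None` stands for a missing value.

```python
from collections import Counter
from typing import List, Optional


def fill_categorical(column: List[Optional[str]]) -> List[str]:
    """Fill missing entries with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def fill_numeric(column: List[Optional[float]]) -> List[Optional[float]]:
    """Fill missing entries by linear interpolation over row numbers."""
    filled = list(column)
    for x, value in enumerate(column):
        if value is not None:
            continue
        # Row numbers of the nearest recorded values before and after row x.
        x1 = next((i for i in range(x - 1, -1, -1) if column[i] is not None), None)
        x2 = next((i for i in range(x + 1, len(column)) if column[i] is not None), None)
        if x1 is None or x2 is None:
            continue   # no neighbour on one side; left unfilled in this sketch
        y1, y2 = column[x1], column[x2]
        filled[x] = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    return filled


codes = fill_categorical(["03", "06", None, "03"])
values = fill_numeric([10.0, None, None, 40.0])
```

For the numeric column, rows 1 and 2 lie one third and two thirds of the way between 10.0 and 40.0, so they are filled with 20.0 and 30.0.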

In some embodiments of the invention, further comprising:

inputting the high-quality use cases into a semantic analysis network, and extracting semantic feature vectors of the high-quality use cases;

Adding the semantic feature vector of the high-quality use case into the feature vector set to generate a new feature vector set;

inputting the new feature vector set into the generation policy network to generate a new test case;

and adding a new test case into the test case queue to send the new test case to a program to be tested of the industrial control host, and triggering the abnormality of the program to be tested.

The invention also provides a deep learning-based industrial control vulnerability mining system for mining vulnerabilities in the communication protocol between an industrial control host and an industrial control terminal, the system comprising the following modules:

The sample data acquisition module is used for acquiring communication message data between the industrial control host and the industrial control terminal, marking the communication message data between the industrial control host and the industrial control terminal in a normal working state as a positive sample, marking the communication message data between the industrial control host and the industrial control terminal in an attacked state as a negative sample, and generating a message sample data set;

The feature vector set generation module is used for inputting the message sample data set into a semantic analysis network, the semantic analysis network learning the positive samples and negative samples in the message sample data set by using a deep recurrent neural network, so as to extract semantic feature vectors of the positive samples and the negative samples and generate a feature vector set;

The test case queue generation module is used for inputting the feature vector set into a generation policy network, the generation policy network generating a plurality of test cases from the semantic feature vectors by using a policy gradient method based on reinforcement learning, to form a test case queue;

The program testing module to be tested is used for sequentially sending the test cases in the test case queue to the program to be tested of the industrial control host so as to trigger the abnormality of the program to be tested;

And the data access module is used for recording the test case and the corresponding timestamp thereof in the message storage file when the test case is sent to the program to be tested of the industrial control host, and calling out the message storage file when the abnormality occurs so as to extract the test case and the corresponding communication message data when the abnormality occurs.

The invention also provides deep learning-based industrial control vulnerability mining equipment for mining vulnerabilities in the communication protocol between an industrial control host and an industrial control terminal, the equipment comprising:

A processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to invoke the instructions stored in the memory to perform the method according to any of the embodiments described above.

The invention also provides a storage medium having stored thereon computer program instructions which when executed by a processor implement the method of any of the embodiments described above.

The deep learning-based industrial control vulnerability mining method, system, equipment and storage medium provided by the invention are used for mining vulnerabilities in the communication protocol between an industrial control host and an industrial control terminal. Communication message data are collected in both normal and attacked states to generate a message sample data set, and a deep recurrent neural network performs semantic analysis on the data set to extract semantic feature vectors of the positive samples and negative samples. A policy gradient method based on reinforcement learning then generates a plurality of test cases from the semantic feature vectors, and the test cases are sent to the program to be tested to trigger an abnormality in it. The test cases, their timestamps and the corresponding communication message data are recorded so that an abnormality can be analyzed and located when it occurs. By combining deep learning and reinforcement learning, the invention automatically generates high-quality test cases and improves the efficiency and effectiveness of industrial control vulnerability mining.

Drawings

FIG. 1 schematically illustrates a flow chart of a deep learning-based industrial control vulnerability mining method according to an embodiment of the present invention;

FIG. 2 schematically illustrates a block diagram of a deep learning-based industrial control vulnerability mining system according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.

It should be understood that, in the various embodiments of the present invention, the sequence numbers of the processes do not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present invention.

It should be understood that in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present invention, "plurality" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "Comprising A, B and C" and "comprising A, B, C" mean that all three of A, B and C are comprised; "comprising A, B or C" means that one of A, B and C is comprised; "comprising A, B and/or C" means that any one, any two, or all three of A, B and C are comprised.

It should be understood that in the present invention, "B corresponding to A", "A corresponding to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A matching B means that the similarity between A and B is greater than or equal to a preset threshold.

As used herein, the term "if" may be interpreted as "when" or "upon" or "in response to a determination" or "in response to a detection", depending on the context.

The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

Fig. 1 schematically illustrates a flow chart of a deep learning industrial control vulnerability discovery method according to an embodiment of the present invention, which is used for vulnerability discovery of a communication protocol between an industrial control host and an industrial control terminal, as shown in fig. 1, and the method includes:

Step S101, collecting communication message data between an industrial control host and an industrial control terminal, marking the communication message data between the industrial control host and the industrial control terminal in a normal working state as a positive sample, marking the communication message data between the industrial control host and the industrial control terminal in an attacked state as a negative sample, and generating a message sample data set.

Step S102, inputting the message sample data set into a semantic analysis network, wherein the semantic analysis network learns the positive samples and negative samples in the message sample data set by using a deep recurrent neural network, so as to extract semantic feature vectors of the positive samples and the negative samples and generate a feature vector set.

Step S103, inputting the feature vector set into a generation policy network, wherein the generation policy network generates a plurality of test cases from the semantic feature vectors by using a policy gradient method based on reinforcement learning, to form a test case queue.

Step S104, the test cases in the test case queue are sequentially sent to the program to be tested of the industrial control host, so as to trigger the program to be tested to be abnormal.

Step S105, when the test case is sent to the program to be tested of the industrial control host, the test case and the timestamp corresponding to the test case are recorded in the message storage file, and when an abnormality occurs, the message storage file is called out to extract the test case and the corresponding communication message data with the abnormality.

The communication message data collected in step S101 may be actual or simulated industrial control system communication data obtained by packet capture, simulation or other means. The positive and negative samples in step S101 may be labeled manually or automatically. The message sample data set in step S101 may be generated by mixing the positive samples and negative samples according to a certain ratio or rule, or by storing them separately.

The deep recurrent neural network used in step S102 may be a long short-term memory (LSTM) network, a gated recurrent unit (GRU) network, or another type of recurrent neural network. The semantic analysis network used in step S102 may be single-layer or multi-layer. The semantic feature vectors extracted in step S102 may be of fixed length or variable length.

The reinforcement learning used in step S103 refers to a machine learning method that learns an optimal behavior policy by interacting with the environment. The policy gradient method used in step S103 refers to a reinforcement learning algorithm that optimizes the policy parameters by gradient ascent. A test case generated in step S103 may be a whole communication message or part of one. The test case queue formed in step S103 may be ordered or unordered, may be of fixed or variable length, and may be sent once or multiple times.

The test cases in step S104 may be sent to the industrial control host over a network, a serial port or another channel. Whether the program to be tested has become abnormal in step S104 may be determined by monitoring the running state, logs, alarms or other indicators of the industrial control host. The abnormality triggered in step S104 may be a program crash, a deadlock, a memory leak, a performance degradation, a functional error, or another type of abnormality.
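To make the gradient-ascent idea concrete, the following toy sketch applies a REINFORCE-style update to a softmax policy over a handful of candidate function codes. Everything specific here is an assumption made for illustration: the action set, the mock `reward_fn` (which pretends the target misbehaves on function code 08H), the learning rate and the moving-average baseline. The source does not specify the reward signal or the network architecture, so this is not the patented generation network.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(actions: List[int], reward_fn: Callable[[int], float],
              episodes: int = 500, lr: float = 0.1, seed: int = 0) -> List[float]:
    """Train a softmax policy by gradient ascent on expected reward."""
    rng = random.Random(seed)
    theta = [0.0] * len(actions)   # policy parameters, one logit per action
    baseline = 0.0                 # moving average of rewards (variance reduction)
    for _ in range(episodes):
        probs = softmax(theta)
        idx = rng.choices(range(len(actions)), weights=probs)[0]
        reward = reward_fn(actions[idx])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for j in range(len(theta)):
            # d/d theta_j of log pi(idx) for a softmax policy.
            grad_log_pi = (1.0 if j == idx else 0.0) - probs[j]
            theta[j] += lr * advantage * grad_log_pi
    return softmax(theta)


ACTIONS = [0x03, 0x06, 0x08, 0x0F]                       # candidate function codes
mock_reward = lambda fc: 1.0 if fc == 0x08 else 0.0      # hypothetical anomaly oracle
final_probs = reinforce(ACTIONS, mock_reward)
```

After training, the policy assigns its highest probability to the rewarded action, so sampling test cases from it concentrates effort on inputs that previously triggered abnormal behavior.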

The test case and its timestamp in step S105 may be recorded by writing them to the message storage file in a text, binary or other format. The message storage file in step S105 may be called out by reading, parsing or otherwise obtaining its content. The abnormal test case and the corresponding communication message data in step S105 may be extracted by comparing timestamps, searching for keywords, or otherwise locating the test case and communication message data at the time the abnormality occurred.

In the deep learning-based industrial control vulnerability mining method, communication message data are collected in both normal and attacked states to generate a message sample data set, and a deep recurrent neural network performs semantic analysis on the data set to extract semantic feature vectors of the positive samples and negative samples. A policy gradient method based on reinforcement learning then generates a plurality of test cases from the semantic feature vectors, and the test cases are sent to the program to be tested to trigger an abnormality in it. The test cases, their timestamps and the corresponding communication message data are recorded so that an abnormality can be analyzed and located when it occurs. Deep learning and reinforcement learning are thus used to generate high-quality test cases automatically and to improve the efficiency and effectiveness of industrial control vulnerability mining.
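One possible shape for the message storage file is a JSON Lines log with one record per sent test case, searched by timestamp when an abnormality is observed. The file layout, the function names and the one-second look-back window below are assumptions made for this sketch, not details given in the source.

```python
import json
import os
import tempfile
import time
from typing import List, Optional


def record_case(path: str, test_case: bytes,
                timestamp: Optional[float] = None) -> None:
    """Append one test case and its send time to the message storage file."""
    if timestamp is None:
        timestamp = time.time()
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"timestamp": timestamp, "case": test_case.hex()}) + "\n")


def find_cases_near(path: str, anomaly_time: float,
                    window: float = 1.0) -> List[bytes]:
    """Return the test cases sent within `window` seconds before the anomaly."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if anomaly_time - window <= rec["timestamp"] <= anomaly_time:
                hits.append(bytes.fromhex(rec["case"]))
    return hits


# Demo with fixed timestamps so the result is reproducible.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "messages.jsonl")
    record_case(log_path, bytes.fromhex("0103"), 100.0)
    record_case(log_path, bytes.fromhex("0106"), 100.5)
    record_case(log_path, bytes.fromhex("010f"), 102.0)
    suspects = find_cases_near(log_path, anomaly_time=100.6, window=1.0)
```

Appending one line per case means the log survives a crash of the fuzzing process itself, and the look-back window accounts for abnormalities that surface slightly after the triggering message.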

In some embodiments of the present invention, the step S101 of collecting communication message data between an industrial control host and an industrial control terminal, marking the communication message data between the industrial control host and the industrial control terminal in a normal working state as a positive sample, marking the communication message data between the industrial control host and the industrial control terminal in an attacked state as a negative sample, and generating a message sample data set specifically includes:

substep S1011, a target industrial control protocol is selected and communication parameters are determined.

In this sub-step, a suitable target industrial control communication protocol, such as Modbus, DNP3 or IEC 60870-5-104, is selected according to the characteristics and actual requirements of the industrial control host, and the corresponding communication parameters, such as baud rate, parity bit, stop bits and data bits, are determined.

For example, if the program to be tested is a PLC program based on the Modbus protocol, the Modbus protocol may be selected as the target industrial control communication protocol, and its communication parameters may be determined to be 9600 baud, no parity bit, 1 stop bit, 8 data bits, and the like.

Substep S1012, according to the selected target industrial control protocol and the communication parameters, collecting the communication message data between the industrial control host and the industrial control terminal in real time by using a network sniffing tool or the port mirroring function of a switch.

In the substep, a proper network sniffing tool or a switch port mirroring function is used to collect communication message data between the industrial control host and the industrial control terminal in real time, and the collected message data is stored in a file or a database.

For example, if the target industrial control communication protocol is the Modbus protocol, a network sniffing tool such as Wireshark may be used to set the filtering condition as the Modbus protocol, designate corresponding communication parameters, collect communication message data between the industrial control host and the industrial control terminal in real time, and store the collected message data in the pcap file.

Substep S1013, filtering and parsing the collected communication message data according to the specification and characteristics of the target industrial control protocol, and extracting the valid message content and fields.

In the substep, the collected communication message data is filtered and analyzed according to the specification and the characteristics of the target industrial control protocol, irrelevant or invalid message data is removed, and effective message contents and fields are extracted.

For example, if the target industrial control communication protocol is the Modbus protocol, the collected message data may be filtered and parsed according to the specification and characteristics of the Modbus protocol, so as to remove non-Modbus message data as well as erroneous or incomplete Modbus message data, and to extract the valid message content and fields, such as function codes, register addresses, data lengths and data values.
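A hedged sketch of this filtering and field extraction is shown below, assuming Modbus RTU framing (unit address, function code, two 16-bit big-endian fields, then a CRC-16 transmitted low byte first). It handles only the fixed-length 8-byte request layout used by function codes such as 03H and 06H; the helper names are hypothetical, and frames that fail the length or CRC check are discarded as invalid.

```python
import struct
from typing import Dict, Optional


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_request(unit: int, function_code: int, address: int, value: int) -> bytes:
    """Build an 8-byte RTU request (helper used here to create demo frames)."""
    body = struct.pack(">BBHH", unit, function_code, address, value)
    return body + struct.pack("<H", crc16_modbus(body))


def parse_request(frame: bytes) -> Optional[Dict[str, int]]:
    """Return the extracted fields, or None for an incomplete or corrupt frame."""
    if len(frame) != 8:
        return None
    body = frame[:-2]
    (received_crc,) = struct.unpack("<H", frame[-2:])
    if crc16_modbus(body) != received_crc:
        return None
    unit, function_code, address, value = struct.unpack(">BBHH", body)
    return {"unit": unit, "function_code": function_code,
            "register_address": address, "value": value}


frame = build_request(unit=1, function_code=0x03, address=0x0000, value=10)
parsed = parse_request(frame)
corrupted = frame[:-1] + bytes([frame[-1] ^ 0xFF])   # flip bits in the CRC
```

For function code 03H the last 16-bit field is the number of registers to read, while for 06H it is the value to write, so a fuller parser would interpret `value` per function code.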

In sub-step S1014, according to the running state of the industrial control host, the message data is divided into a positive sample and a negative sample, the positive sample refers to the communication message data between the industrial control host and the industrial control terminal in the normal working state and reflects the normal function and logic of the protocol, the negative sample refers to the communication message data between the industrial control host and the industrial control terminal in the attacked state and reflects the abnormal function and logic of the protocol, and the attacked state is realized by means of artificial interference, malicious injection or simulated attack.

In this sub-step, the message data is divided into positive and negative samples according to the running state of the industrial control host, and a corresponding label is added to each message data. The positive sample refers to communication message data between the industrial control host and the industrial control terminal in a normal working state, and reflects normal functions and logic of a protocol, such as a read-write register, diagnostic equipment, synchronous time and the like. The negative sample refers to communication message data between the industrial control host and the industrial control terminal in an attacked state, and reflects abnormal functions and logic of a protocol, such as modifying a register, falsifying equipment, falsifying time and the like. The attacked state is realized by means of artificial interference, malicious injection or simulated attack, such as using a network attack tool, writing malicious code or using a simulator, etc.

For example, if the target industrial control communication protocol is the Modbus protocol, the message data may be divided into a positive sample and a negative sample according to the operation state of the industrial control host, and a corresponding tag is added to each message data. The positive sample is communication message data between the industrial control host and the industrial control terminal in a normal working state, and reflects normal functions and logics of a protocol, such as reading and writing a holding register (function code 03H or 06H), reading diagnosis information (function code 08H), writing a plurality of coils (function code 0 FH) and the like. The negative sample refers to communication message data between the industrial control host and the industrial control terminal in an attacked state, and reflects abnormal functions and logic of a protocol, such as modifying a holding register (function code 06H) to an illegal value, forging diagnosis information (function code 08H) to an error code, writing a plurality of coils (function code 0 FH) to a random value, and the like. The attacked state is realized by means of artificial interference, malicious injection or simulation attack, such as using Metasploit and other network attack tools, writing Python and other malicious codes, using ModbusPal and other simulators, and the like.

Sub-step S1015, storing the positive samples and the negative samples, together with their labels, in a file or database in a given format to form a message sample data set.

In this sub-step, the positive and negative samples are stored, together with their labels, in a file or database in a given format to form a message sample data set. The format may be a common format such as CSV, JSON or XML, or a custom format. The label may be binary (0 or 1), multi-class (A, B, C, etc.) or of another type.

For example, if the target industrial control communication protocol is the Modbus protocol, the positive samples and the negative samples may be stored in a file in CSV format to form a message sample data set. Each row represents one message and each column represents a field or a label. The first column is the label, where 0 denotes a positive sample and 1 denotes a negative sample. The second column is the function code, the third column is the register address, the fourth column is the data length, and the fifth column is the data value.
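
As a minimal sketch of the CSV layout described above, the following Python code writes and reads back a small labelled data set. The column names, the helper functions `write_samples` and `read_samples`, and the sample values are illustrative assumptions; the description fixes only the column order and the meaning of the label.

```python
import csv
import io

# Hypothetical column names; the description only fixes the column order.
COLUMNS = ["label", "function_code", "register_address", "data_length", "data_value"]


def write_samples(rows, stream):
    """Write labelled message rows (0 = positive, 1 = negative) as CSV."""
    writer = csv.writer(stream)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


def read_samples(stream):
    """Read the CSV back as a list of dicts with integer values."""
    return [{k: int(v) for k, v in row.items()} for row in csv.DictReader(stream)]


# Illustrative values only, not taken from the description.
samples = [
    (0, 0x03, 0x0000, 2, 0),      # normal read of a holding register
    (0, 0x06, 0x0001, 2, 100),    # normal write of a holding register
    (1, 0x06, 0x0001, 2, 65535),  # write captured in the attacked state
]

buffer = io.StringIO()            # stands in for a file on disk
write_samples(samples, buffer)
buffer.seek(0)
loaded = read_samples(buffer)
```

An in-memory buffer is used here only so that the sketch is self-contained; a real data set would be written to a file or database as the sub-step states.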

The following is a specific example (the example table is not reproduced in this text).

In some embodiments of the present invention, specifically, step S102 may include the following sub-steps:

Sub-step S1021, converting the message sample data set into a numerical matrix.

In this sub-step, the message sample data set is converted into a numerical matrix so that it can be input into the semantic analysis network for computation. Specifically, for each message, the values of its fields and label are converted into corresponding numerical values and concatenated into a one-dimensional vector. The one-dimensional vectors of all messages are then arranged row by row into a two-dimensional matrix.

For example, if the target industrial control protocol is the Modbus protocol, the message sample data set may be converted into a numerical matrix as follows. For each message, the values of fields such as the function code, register address, data length and data value are converted into decimal numbers, separated by commas, and concatenated with the label into a one-dimensional vector. The one-dimensional vectors of all messages are then arranged row by row into a two-dimensional matrix. (The example matrix is not reproduced in this text.)
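
The conversion described above might be sketched as follows. The field names, the `H` suffix convention for hexadecimal input and the two sample messages are assumptions made for illustration only.

```python
# Hypothetical field order: label first, then the four message fields.
FIELD_ORDER = ["label", "function_code", "register_address", "data_length", "data_value"]


def parse_value(text):
    """Convert a field written as decimal ("10") or as hex with an H suffix ("0FH")."""
    text = str(text).strip()
    if text.upper().endswith("H"):
        return int(text[:-1], 16)
    return int(text, 10)


def to_vector(message):
    """Concatenate the numeric values of one message into a one-dimensional vector."""
    return [parse_value(message[name]) for name in FIELD_ORDER]


def to_matrix(messages):
    """Arrange the per-message vectors row by row into a two-dimensional matrix."""
    return [to_vector(message) for message in messages]


# Illustrative messages, not taken from the description.
messages = [
    {"label": "0", "function_code": "03H", "register_address": "0000H",
     "data_length": "2", "data_value": "0"},
    {"label": "1", "function_code": "0FH", "register_address": "0010H",
     "data_length": "2", "data_value": "FFH"},
]
matrix = to_matrix(messages)
```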

Sub-step S1022, inputting the numerical matrix into a semantic analysis network, wherein the semantic analysis network learns the positive samples and negative samples in the message sample data set by using a deep recurrent neural network, so as to extract semantic feature vectors of the positive samples and the negative samples.

In this sub-step, the numerical matrix is input into the semantic analysis network, which learns the positive and negative samples in the message sample data set by using a deep recurrent neural network and thereby extracts semantic feature vectors for the positive and negative samples.

The semantic analysis network mainly comprises the following layers:

Input layer: for receiving each row of the numerical matrix as an input sequence;

Embedding layer: for converting each value in the input sequence into a low-dimensional dense vector for subsequent computation;

Recurrent layer: for encoding the input sequence by using a deep recurrent neural network, capturing the temporal characteristics and semantic information of the input sequence;

Output layer: for outputting the last hidden state of the recurrent layer as the semantic feature vector of the input sequence.

The deep recurrent neural network may be a long short-term memory network (Long Short-Term Memory, LSTM), a gated recurrent unit network (Gated Recurrent Unit, GRU), or another type of recurrent neural network. The deep recurrent neural network may be single-layer or multi-layer. The semantic feature vector may be of fixed length or variable length.

For example, if the target industrial control protocol is the Modbus protocol, the numerical matrix may be input into a semantic analysis network that learns positive and negative samples in a message sample dataset using a two-layer LSTM, thereby extracting semantic feature vectors for the positive and negative samples.
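
The layer structure described above (input, embedding, recurrent and output layers, here with a two-layer LSTM) might be sketched in plain Python as follows. This is only a forward pass with randomly initialised, untrained weights; the class names, dimensions, vocabulary size and the folding of field values into the vocabulary range are all assumptions, and training is omitted entirely.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMLayer:
    """One LSTM layer with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                      for _ in range(hidden_size)] for g in "ifgo"}
        self.b = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, name, xh, activation):
        return [activation(sum(w * v for w, v in zip(row, xh)) + bias)
                for row, bias in zip(self.w[name], self.b[name])]

    def forward(self, sequence):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            xh = x + h  # concatenate current input and previous hidden state
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            g = self._gate("g", xh, math.tanh)
            o = self._gate("o", xh, sigmoid)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class SemanticAnalysisNetwork:
    """Embedding layer followed by two stacked LSTM layers (forward pass only)."""

    def __init__(self, vocab_size=256, embed_dim=8, hidden_size=16, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.embedding = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                          for _ in range(vocab_size)]
        self.layers = [LSTMLayer(embed_dim, hidden_size, rng),
                       LSTMLayer(hidden_size, hidden_size, rng)]

    def encode(self, row):
        # Embedding lookup; values are folded into the vocabulary (a simplification).
        sequence = [self.embedding[value % self.vocab_size] for value in row]
        for layer in self.layers:
            sequence = layer.forward(sequence)
        return sequence[-1]  # last hidden state of the top layer


network = SemanticAnalysisNetwork()
feature_vector = network.encode([3, 0, 2, 0])  # one row of the numerical matrix
```

The returned list plays the role of the semantic feature vector; in practice the weights would first be learned from the labelled positive and negative samples, typically with an established deep learning framework rather than hand-written loops.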

Sub-step S1023, storing the semantic feature vectors of the positive samples and the negative samples in a file or database according to their corresponding labels, to generate a feature vector set.

In this sub-step, the semantic feature vectors of the positive and negative samples are stored in a file or database according to the corresponding labels, generating a feature vector set. The feature vector set may be used for subsequent test case generation or other purposes.

For example, if the target industrial control protocol is the Modbus protocol, the semantic feature vectors of the positive and negative samples may be stored in a file in CSV format to form a feature vector set. Each row represents one semantic feature vector and each column represents one feature value, except that the first column is the label, where 0 denotes a positive sample and 1 denotes a negative sample.

In some embodiments of the present invention, specifically, step S103 may include the following sub-steps:

Sub-step S1031, inputting the feature vector set into a generation strategy network, wherein the generation strategy network generates a plurality of test cases according to the semantic feature vectors by using a strategy gradient method based on reinforcement learning.

In this sub-step, the feature vector set is input to a generation policy network that generates a plurality of test cases from the semantic feature vectors using a reinforcement learning based policy gradient method.

The generation strategy network mainly comprises the following layers:

Input layer: for receiving each row of the feature vector set as an input state;

Hidden layer: for applying a nonlinear transformation to the input state and extracting its high-order features;

Output layer: for outputting the result of the hidden layer as an action probability distribution, which represents the probabilities of selecting the different actions.

Here, an action is an operation that either modifies the input state or keeps it unchanged. For example, if the target industrial control protocol is the Modbus protocol, an action may modify the value of a field such as the function code, register address, data length or data value, or leave the values of these fields unchanged.

The generation strategy network optimizes its parameters by using a strategy gradient method based on reinforcement learning, so that it can select optimal or near-optimal actions according to the input state and thus generate test cases that are more likely to trigger an abnormality in the program to be tested. Specifically, the strategy gradient method comprises the following steps:

initializing the parameters of the generation strategy network;

for each training round, performing the following steps:

randomly selecting an input state from the feature vector set;

randomly selecting an action according to the action probability distribution output by the generation strategy network, and modifying the input state accordingly to obtain a test case;

sending the test case to the program to be tested and observing the feedback signal;

calculating a reward value for the test case according to the feedback signal, wherein the reward value is positive if the feedback signal indicates that the program to be tested is abnormal, and negative otherwise;

calculating the gradient value of the test case according to the reward value and the action probability distribution, and accumulating the gradient value into a gradient cache;

for each training period, performing the following steps:

updating the parameters of the generation strategy network according to the gradient values in the gradient cache;

emptying the gradient cache.
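
The training procedure above corresponds to a REINFORCE-style strategy gradient update with a gradient cache. The following sketch makes several simplifying assumptions that are not part of the description: the input state is taken to be the message field vector itself, the strategy network is a single linear softmax layer without a hidden layer, each modifying action overwrites one field with the hypothetical boundary value 0xFFFF, and `program_under_test` is a stand-in that reports an abnormality whenever the data length exceeds 125.

```python
import math
import random

SCALE = [255.0, 65535.0, 125.0, 65535.0]  # normalisers for the four fields
N_ACTIONS = len(SCALE) + 1                # action 0 keeps the state unchanged


def program_under_test(case):
    """Stand-in target: reports an abnormality when data_length exceeds 125."""
    return case[2] > 125


def features(state):
    return [v / s for v, s in zip(state, SCALE)] + [1.0]  # trailing 1.0 is a bias input


def action_probs(weights, feats):
    logits = [sum(w * f for w, f in zip(row, feats)) for row in weights]
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_action(state, action):
    case = list(state)
    if action > 0:
        case[action - 1] = 0xFFFF  # hypothetical boundary-value modification
    return case


def train(states, rounds=1000, period=20, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_feats = len(SCALE) + 1
    weights = [[0.0] * n_feats for _ in range(N_ACTIONS)]  # initialise parameters
    cache = [[0.0] * n_feats for _ in range(N_ACTIONS)]    # gradient cache
    for step in range(1, rounds + 1):
        state = rng.choice(states)
        feats = features(state)
        probs = action_probs(weights, feats)
        action = rng.choices(range(N_ACTIONS), weights=probs)[0]
        abnormal = program_under_test(apply_action(state, action))
        reward = 1.0 if abnormal else -1.0
        for a in range(N_ACTIONS):
            # d log pi(action | state) / d logit_a = 1[a == action] - probs[a]
            g = reward * ((1.0 if a == action else 0.0) - probs[a])
            for j, f in enumerate(feats):
                cache[a][j] += g * f
        if step % period == 0:  # end of a training period
            for a in range(N_ACTIONS):
                for j in range(n_feats):
                    weights[a][j] += lr * cache[a][j] / period  # gradient ascent
                    cache[a][j] = 0.0                           # empty the cache
    return weights


states = [[3, 0, 2, 0], [6, 1, 2, 100], [15, 16, 8, 255]]
weights = train(states)
probs = action_probs(weights, features(states[0]))
best_action = max(range(N_ACTIONS), key=probs.__getitem__)
```

In this toy setting only the action that overwrites the data length triggers the stand-in abnormality, so the learned distribution concentrates on that action. A real deployment would replace `program_under_test` with the feedback observed from the industrial control host.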

Sub-step S1032, storing the plurality of test cases generated by the generation strategy network in a file or database in a given order to form a test case queue.

In this sub-step, the plurality of test cases generated by the generation strategy network are stored in a file or database in a given order to form a test case queue. The test cases may be ordered by reward value, action probability, generation time or another criterion. The test case queue may be used for subsequent test execution or other purposes.
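
One possible ordering, by reward value, could be sketched with a heap as shown below. The class name `CaseQueue` and the choice of reward as the sole ordering criterion are assumptions; the description equally allows ordering by action probability or generation time.

```python
import heapq
import itertools


class CaseQueue:
    """Pops test cases in descending reward order; ties keep insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, case: bytes, reward: float):
        # heapq is a min-heap, so the reward is negated to pop the largest first.
        heapq.heappush(self._heap, (-reward, next(self._counter), case))

    def pop(self) -> bytes:
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = CaseQueue()
queue.push(b"\x01\x03\x00\x00", reward=-1.0)
queue.push(b"\x01\x06\xff\xff", reward=1.0)
queue.push(b"\x01\x0f\xff\xff", reward=1.0)
order = [queue.pop() for _ in range(len(queue))]
```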

In some embodiments of the invention, the method further comprises:

Performing static instrumentation compilation on the program to be tested that is subject to vulnerability mining, so as to track the edge coverage of the test cases over the code of the program to be tested.

In this step, in order to evaluate the degree to which the test cases cover the code of the program to be tested, static instrumentation compilation is performed on the program to be tested. Static instrumentation compilation means inserting additional statements or functions into the code of the program to be tested during compilation, in order to record or output the code edges (i.e., the transitions between two basic blocks) traversed when a test case is executed. The static instrumentation compilation may be performed with existing tools or methods, for example a compiler or framework such as LLVM, GCC or Pin.

Dynamically recording the edge coverage of each test case during operation, and outputting the edge coverage after execution of the program to be tested is finished.

In this step, in order to obtain the code coverage of each test case over the program to be tested, the edge coverage of each test case is dynamically recorded during operation. The edge coverage is the ratio of the number of code edges traversed when the test case is executed to the total number of code edges. The dynamic recording may be performed with existing tools or methods, for example by using a storage medium or structure such as a file, a database or memory to store the code edge identifiers output when each test case is executed.
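
The edge coverage ratio defined above could be tracked per test case roughly as follows. The recorder class, the representation of an edge as a pair of basic-block names and the sample control-flow graph are assumptions for illustration; in practice the edge identifiers would come from the instrumented program.

```python
from collections import defaultdict


class EdgeCoverageRecorder:
    """Collects the distinct code edges hit by each test case."""

    def __init__(self, all_edges):
        self.all_edges = set(all_edges)
        self.hits = defaultdict(set)

    def record(self, case_id, src_block, dst_block):
        edge = (src_block, dst_block)
        if edge in self.all_edges:  # ignore identifiers outside the known graph
            self.hits[case_id].add(edge)

    def coverage(self, case_id):
        """Edges traversed by the test case divided by the total number of edges."""
        return len(self.hits[case_id]) / len(self.all_edges)


# Hypothetical control-flow graph with four edges.
recorder = EdgeCoverageRecorder([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
recorder.record("tc1", "A", "B")
recorder.record("tc1", "B", "D")
recorder.record("tc1", "A", "B")  # repeated edges are counted once
coverage_tc1 = recorder.coverage("tc1")
```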

In this step, in order to improve test efficiency and effectiveness, a plurality of high-quality cases are selected from all the test cases according to the edge coverage, and mutation processing is performed on the high-quality cases to generate a plurality of new test cases. A high-quality case is a test case that has a high edge coverage or that can trigger an abnormality of the program to be tested. Mutation processing means applying random or rule-based modifications to a test case, such as changing the values of certain fields or parameters, adding or deleting certain fields or parameters, or shuffling the order of certain fields or parameters. The mutation processing may be performed with existing tools or methods, for example fuzzing tools or frameworks such as AFL, Radamsa or Peach.

In some embodiments of the invention, the high quality use case comprises:

A test case that causes the program to be tested to crash or hang.

In order to improve test efficiency and effectiveness, a plurality of high-quality cases are selected from all the test cases according to the edge coverage, and mutation processing is performed on the high-quality cases to generate a plurality of new test cases. In addition to being selected according to the edge coverage, high-quality cases may be selected according to whether a test case covers a new path in the program to be tested, causes the program to be tested to undergo a new state transition, or causes the program to be tested to crash or hang. The specific value of k%, i.e., the proportion of test cases selected as high-quality cases, may be chosen according to practical conditions such as the available computing power and the testing time constraints, and is not limited herein.
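
A selection rule of this kind might be sketched as follows, assuming that k% denotes the top fraction of test cases ranked by edge coverage and that cases which crashed the program are always kept. Both the dictionary layout and the helper name `select_high_quality` are illustrative assumptions.

```python
import math


def select_high_quality(cases, k_percent):
    """Keep the top k% of cases by edge coverage, plus any case that crashed the target.

    Each case is a dict with the keys 'id', 'coverage' (float) and 'crashed' (bool).
    """
    ranked = sorted(cases, key=lambda c: c["coverage"], reverse=True)
    top_n = max(1, math.ceil(len(ranked) * k_percent / 100))
    chosen = ranked[:top_n]
    chosen += [c for c in ranked[top_n:] if c["crashed"]]
    return chosen


# Illustrative records, not taken from the description.
cases = [
    {"id": "tc1", "coverage": 0.10, "crashed": True},
    {"id": "tc2", "coverage": 0.50, "crashed": False},
    {"id": "tc3", "coverage": 0.90, "crashed": False},
    {"id": "tc4", "coverage": 0.30, "crashed": False},
    {"id": "tc5", "coverage": 0.20, "crashed": False},
]
selected_ids = [c["id"] for c in select_high_quality(cases, k_percent=40)]
```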

Randomly selecting two high-quality cases from the high-quality cases, randomly selecting a field, and, taking the field as the boundary, splicing the first half of one high-quality case with the second half of the other high-quality case to generate an intermediate test case.

The function of this step is to generate new test cases by combining existing high-quality cases, so as to expand the range and number of test cases. The specific implementation is as follows:

firstly, two different high-quality cases are randomly selected from the high-quality cases, wherein each high-quality case is message data conforming to the industrial control protocol specification and consists of a plurality of fields, and each field has a fixed or variable length.

Then, a field is randomly selected from each of the two high quality use cases as a boundary for concatenation. If the length of the field is variable, a length is randomly determined within its range of values.

Then, the part of the first high-quality case from its start position up to and including the splicing boundary is concatenated with the part of the second high-quality case from just after the splicing boundary to its end position, so that new message data is generated as an intermediate test case.

Finally, a checksum is calculated for the intermediate test case and the corresponding field value is updated, so that the intermediate test case conforms to the industrial control protocol specification.
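
For a Modbus RTU style frame, the splicing and checksum update above could be sketched as follows. The field layout, the assumption that both cases share the same field order, and the helper names are illustrative; the CRC routine is the standard CRC-16/MODBUS calculation, appended low byte first.

```python
import random

# Hypothetical fixed-length field layout shared by both high-quality cases.
FIELD_ORDER = ["unit_id", "function_code", "register_address", "data_value"]
FIELD_SIZES = {"unit_id": 1, "function_code": 1, "register_address": 2, "data_value": 2}


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def splice(case_a, case_b, rng):
    """Take fields up to and including a random boundary from case_a, the rest from case_b."""
    boundary = rng.randrange(len(FIELD_ORDER) - 1)  # last index would copy case_a whole
    return {name: (case_a if index <= boundary else case_b)[name]
            for index, name in enumerate(FIELD_ORDER)}


def serialize(case) -> bytes:
    """Encode the fields big-endian and append the recomputed checksum."""
    body = b"".join(case[name].to_bytes(FIELD_SIZES[name], "big") for name in FIELD_ORDER)
    return body + crc16_modbus(body).to_bytes(2, "little")


case_a = {"unit_id": 1, "function_code": 0x06, "register_address": 0x0001, "data_value": 100}
case_b = {"unit_id": 2, "function_code": 0x03, "register_address": 0x0010, "data_value": 65535}
child = splice(case_a, case_b, random.Random(0))
frame = serialize(child)
```

Recomputing the checksum after splicing keeps the intermediate test case from being rejected at the framing level before it reaches the protocol logic under test.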

The function of this step is to generate new test cases using randomness and adaptability to increase variability and effectiveness of the test cases. The specific implementation mode is as follows:

First, one or more fields are randomly selected from the intermediate test cases as targets of mutation. If the length of the field is variable, a length is randomly determined within its range of values.

Then, a value that is both random and appropriate for the selected field is generated as the mutation result. The generated value should satisfy the following conditions:

it differs from the original value, so as to increase variability;

it lies within the value range of the field, so as to conform to the industrial control protocol specification;

it is similar to the original value to some extent, so as to preserve the logic of the industrial control protocol.

Then, the original value of the selected field is replaced by the generated value, and new message data is generated and used as a variation test case.

Finally, a checksum is calculated for the mutated test case and the corresponding field value is updated, so that the mutated test case conforms to the industrial control protocol specification.
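
The three conditions on the mutated value (different from the original, within the field's range, and close to the original) might be enforced as in the sketch below. The `max_delta` bound used to express "similar to the original value" is an assumption; the description does not define a similarity measure.

```python
import random


def mutate_field(value, low, high, rng, max_delta=8):
    """Return a value in [low, high] that differs from `value` by 1 to max_delta."""
    lower = max(low, value - max_delta)
    upper = min(high, value + max_delta)
    candidates = [v for v in range(lower, upper + 1) if v != value]
    if not candidates:
        raise ValueError("field has no alternative value within its range")
    return rng.choice(candidates)


rng = random.Random(0)
# Illustrative: mutate a 16-bit register address near 0x0010.
mutated_address = mutate_field(0x0010, 0x0000, 0xFFFF, rng)
# At the lower bound of the range only larger values remain available.
mutated_at_bound = mutate_field(0, 0, 0xFFFF, rng)
```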

The mutation processing is a link in the deep learning-based industrial control protocol vulnerability mining method, and aims to generate new test cases according to the existing high-quality cases so as to increase the diversity and coverage rate of the test cases and further improve the vulnerability mining efficiency and effect.

In some embodiments of the invention, the method further comprises:

(The text of this step is not reproduced in this text.)

The function of this step is to handle possible missing data in the semantic feature vectors, so that they fully reflect the meaning and regularities of the message data. The missing feature values in the semantic feature vectors are filled in to form more complete semantic feature vectors, thereby increasing the number of samples available for the semantic analysis network to learn from.

In some embodiments of the invention, the method further comprises:

Firstly, semantic analysis is performed on the high-quality cases by using the trained semantic analysis network, so as to obtain the intrinsic meaning and regularities of the high-quality cases. Secondly, the original feature vector set is expanded with the semantic feature vectors of the high-quality cases, thereby increasing the diversity and coverage of the feature vector set. The semantic feature vectors of the high-quality cases may be merged with the original feature vector set to form a new feature vector set. The new feature vector set contains more semantic information about the message data and provides a broader basis for test case generation. The new feature vector set may also be de-duplicated and sorted to avoid duplication and redundancy. De-duplication means deleting identical or similar feature vectors from the feature vector set; sorting means ordering the feature vectors by importance or priority, so as to improve the efficiency and effectiveness of test case generation. Thirdly, new test cases are generated from the new feature vector set by using the trained generation strategy network, for use in subsequent vulnerability mining. Finally, fuzz testing is performed on the program to be tested of the industrial control host by using the new test cases, so as to discover potential vulnerabilities of the target communication protocol.
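
The merging, de-duplication and sorting of the feature vector set could be sketched as below. Treating two vectors as "similar" when they agree after rounding, and passing the priority as a caller-supplied function, are assumptions; the description does not specify either criterion.

```python
def merge_feature_sets(original, extra, priority=None, ndigits=6):
    """Merge two lists of (label, vector) pairs, dropping duplicates after rounding."""
    seen = set()
    merged = []
    for label, vector in list(original) + list(extra):
        key = (label, tuple(round(x, ndigits) for x in vector))
        if key not in seen:
            seen.add(key)
            merged.append((label, list(vector)))
    if priority is not None:
        merged.sort(key=priority, reverse=True)  # highest priority first, stable
    return merged


# Illustrative vectors, not taken from the description.
original_set = [(0, [0.1, 0.2]), (1, [0.9, 0.8])]
high_quality_set = [(1, [0.9, 0.8]), (1, [0.5, 0.5])]
new_set = merge_feature_sets(original_set, high_quality_set,
                             priority=lambda item: item[0])
```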

According to the deep learning industrial control vulnerability mining method, communication message data in the normal and attacked states are collected to generate a message sample data set, semantic analysis is performed on the message sample data set by using a deep recurrent neural network, and semantic feature vectors of the positive samples and negative samples are extracted. Then, a plurality of test cases are generated from the semantic feature vectors by using a strategy gradient method based on reinforcement learning and are sent to the program to be tested so as to trigger an abnormality of the program to be tested. Meanwhile, the test cases, their corresponding time stamps and the communication message data are recorded, so that analysis and localization can be performed when an abnormality occurs. The invention can thus make effective use of deep learning and reinforcement learning to automatically generate high-quality test cases and improve the efficiency and effectiveness of industrial control vulnerability mining. In addition, during vulnerability mining, selecting high-quality cases and performing mutation processing on them increases the number and quality of the test cases, and the high-quality cases can be used to optimize the semantic feature vectors, thereby improving the quality of the generated test cases.

Fig. 2 schematically illustrates a block diagram of a deep learning industrial control vulnerability mining system for vulnerability mining of a communication protocol between an industrial control host and an industrial control terminal. The deep learning industrial control vulnerability mining system includes:

The sample data acquisition module 201 is configured to acquire communication message data between an industrial control host and an industrial control terminal, mark the communication message data between the industrial control host and the industrial control terminal in a normal working state as a positive sample, mark the communication message data between the industrial control host and the industrial control terminal in an attacked state as a negative sample, and generate a message sample data set;

The feature vector set generating module 202 is configured to input the message sample data set into a semantic analysis network, where the semantic analysis network learns the positive samples and negative samples in the message sample data set by using a deep recurrent neural network, so as to extract semantic feature vectors of the positive samples and the negative samples and generate a feature vector set;

The test case queue generating module 203 is configured to input the feature vector set into a generating policy network, where the generating policy network generates a plurality of test cases according to the semantic feature vector by using a policy gradient method based on reinforcement learning, so as to form a test case queue;

The to-be-tested program testing module 204 is configured to send the test cases in the test case queue in sequence to the program to be tested of the industrial control host, so as to trigger an abnormality of the program to be tested;

The data access module 205 is configured to record each test case and its corresponding time stamp in a message storage file when the test case is sent to the program to be tested of the industrial control host, and to retrieve the message storage file when an abnormality occurs, so as to extract the test case and the corresponding communication message data at the time of the abnormality.

The collected communication message data can be actual or simulated industrial control system communication data obtained through packet capturing, simulation or other modes. The method of labeling positive and negative samples may be manual or automatic. The method of generating the message sample data set may be to mix or store the positive and negative samples in proportion or in a regular manner.

The deep recurrent neural network may be a long short-term memory network (Long Short-Term Memory, LSTM), a gated recurrent unit network (Gated Recurrent Unit, GRU), or another type of recurrent neural network. The semantic analysis network may be single-layer or multi-layer. The extracted semantic feature vectors may be of fixed length or variable length.

Reinforcement learning is a machine learning method that learns an optimal behavior strategy by interacting with the environment. The strategy gradient method is a reinforcement learning algorithm that optimizes the strategy parameters by gradient ascent. A test case may be complete or partial communication message data. The test case queue may be ordered or unordered, may be of fixed or variable length, and may be sent once or multiple times.

The test cases may be sent to the industrial control host over a network, a serial port or another communication channel. Whether an abnormality of the program to be tested has been triggered may be judged by monitoring the running state, logs, alarms or other indicators of the industrial control host. The triggered abnormality may be a program crash, a deadlock, a memory leak, a performance degradation, a functional error or another type of abnormality.

The test cases and their corresponding time stamps may be recorded by writing them into a message storage file in text, binary or another format. The message storage file may be retrieved by reading it, parsing it or obtaining its content in another way. The test case and the corresponding communication message data at the time of an abnormality may be located by comparing time stamps, searching for keywords or other means.
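
Recording test cases with time stamps and later locating the cases sent shortly before an abnormality could look like the following sketch. The JSON-lines format, the helper names and the fixed look-back window are assumptions; the description allows any text or binary format and any localization method.

```python
import io
import json


def record_case(stream, timestamp, case: bytes):
    """Append one test case and its time stamp as a JSON line."""
    stream.write(json.dumps({"ts": timestamp, "case": case.hex()}) + "\n")


def cases_before(stream, anomaly_ts, window=1.0):
    """Return the test cases sent within `window` seconds before the abnormality."""
    stream.seek(0)
    entries = [json.loads(line) for line in stream if line.strip()]
    return [bytes.fromhex(entry["case"]) for entry in entries
            if anomaly_ts - window <= entry["ts"] <= anomaly_ts]


log = io.StringIO()  # stands in for the message storage file
record_case(log, 10.0, b"\x01\x03")
record_case(log, 10.4, b"\x01\x06")
record_case(log, 12.0, b"\x01\x0f")
suspects = cases_before(log, anomaly_ts=10.5, window=1.0)
```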

According to the deep learning industrial control vulnerability mining system, a message sample data set is generated by collecting communication message data in the normal and attacked states, semantic analysis is performed on the message sample data set by using a deep recurrent neural network, and semantic feature vectors of the positive samples and negative samples are extracted. Then, a plurality of test cases are generated from the semantic feature vectors by using a strategy gradient method based on reinforcement learning and are sent to the program to be tested so as to trigger an abnormality of the program to be tested. Meanwhile, the test cases, their corresponding time stamps and the communication message data are recorded, so that analysis and localization can be performed when an abnormality occurs. Deep learning and reinforcement learning can thus be used effectively to automatically generate high-quality test cases and improve the efficiency and effectiveness of industrial control vulnerability mining.

A target industrial control protocol is selected and communication parameters are determined.

In this step, a suitable target industrial control communication protocol, such as Modbus, DNP3, IEC 60870-5-104, etc., is selected according to the characteristics and actual requirements of the industrial control host, and corresponding communication parameters, such as baud rate, check bits, stop bits, data bits, etc., are determined.

According to the selected target industrial control protocol and the communication parameters, a network sniffing tool or the port mirroring function of a switch is used to collect the communication message data between the industrial control host and the industrial control terminal in real time.

In the step, communication message data between the industrial control host and the industrial control terminal are acquired in real time by using a proper network sniffing tool or using a port mirror function of the switch, and the acquired message data are stored in a file or a database.

And filtering and analyzing the collected communication message data according to the specification and the characteristics of the target industrial control protocol, and extracting effective message content and fields.

In the step, the collected communication message data is filtered and analyzed according to the specification and the characteristics of the target industrial control protocol, irrelevant or invalid message data is removed, and effective message contents and fields are extracted.
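
For Modbus carried over TCP, the filtering and field extraction described above might be sketched as follows. The use of Modbus TCP framing (rather than a serial variant), the helper names and the sample frames are assumptions made for illustration.

```python
import struct

# MBAP header (transaction id, protocol id, length, unit id) plus the function code.
HEADER = struct.Struct(">HHHBB")


def parse_modbus_tcp(frame: bytes):
    """Return the fields of a Modbus TCP frame, or None if the frame is invalid."""
    if len(frame) < HEADER.size:
        return None
    transaction_id, protocol_id, length, unit_id, function_code = HEADER.unpack_from(frame)
    # The protocol id is 0 for Modbus; the length field counts the bytes that follow it.
    if protocol_id != 0 or length != len(frame) - 6:
        return None
    return {
        "transaction_id": transaction_id,
        "unit_id": unit_id,
        "function_code": function_code,
        "data": frame[HEADER.size:],
    }


def filter_and_parse(frames):
    """Drop irrelevant or invalid frames and keep the parsed fields of the rest."""
    return [parsed for parsed in map(parse_modbus_tcp, frames) if parsed is not None]


captured = [
    bytes.fromhex("000100000006010300000002"),  # read holding registers request
    bytes.fromhex("000100010006010300000002"),  # wrong protocol id, filtered out
    b"\x00\x01",                                # truncated frame, filtered out
]
valid_messages = filter_and_parse(captured)
```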

According to the running state of the industrial control host, the message data are divided into a positive sample and a negative sample, wherein the positive sample refers to communication message data between the industrial control host and the industrial control terminal in a normal working state and reflects normal functions and logic of a protocol, the negative sample refers to communication message data between the industrial control host and the industrial control terminal in an attacked state and reflects abnormal functions and logic of the protocol, and the attacked state is realized in a mode of artificial interference, malicious injection or simulated attack.

In this step, the message data are divided into positive samples and negative samples according to the running state of the industrial control host, and a corresponding label is added to each piece of message data. A positive sample is communication message data exchanged between the industrial control host and the industrial control terminal in a normal working state; it reflects the normal functions and logic of the protocol, such as reading and writing registers, diagnosing equipment and synchronizing time. A negative sample is communication message data exchanged between the industrial control host and the industrial control terminal in an attacked state; it reflects abnormal functions and logic of the protocol, such as tampering with registers, spoofing equipment and falsifying time. The attacked state is produced by manual interference, malicious injection or simulated attack, for example by using a network attack tool, writing malicious code or using a simulator.

In this step, the positive samples and the negative samples are stored, together with their labels, in a file or database in a given format to form a message sample data set. The format may be a common format such as CSV, JSON or XML, or a custom format. The label may be binary (0 or 1), multi-class (A, B, C, etc.) or of another type.

The following is a specific example (the example table is not reproduced in this text).

In some embodiments of the present invention, specifically, "inputting the message sample data set into a semantic analysis network, the semantic analysis network learns positive samples and negative samples in the message sample data set using a deep recurrent neural network, thereby extracting semantic feature vectors of the positive samples and the negative samples, and generating a feature vector set" may include the following sub-steps:

Converting the message sample data set into a numerical matrix.

Inputting the numerical matrix into a semantic analysis network, wherein the semantic analysis network learns the positive samples and negative samples in the message sample data set by using a deep recurrent neural network, so as to extract semantic feature vectors of the positive samples and the negative samples.

The semantic analysis network mainly comprises the following layers: an input layer, an embedding layer, a recurrent layer and an output layer, as described above for the method.

Storing the semantic feature vectors of the positive samples and the negative samples in a file or database according to their corresponding labels, to generate a feature vector set.

In some embodiments of the present invention, specifically, "inputting the feature vector set into a generating policy network, where the generating policy network generates a plurality of test cases according to the semantic feature vector by using a policy gradient method based on reinforcement learning, and forms a test case queue" may include the following substeps:

Inputting the feature vector set into a generating strategy network, wherein the generating strategy network generates a plurality of test cases according to the semantic feature vector by using a strategy gradient method based on reinforcement learning.

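The generation strategy network mainly comprises the following layers: an input layer, a hidden layer and an output layer, as described above for the method. Its parameters are optimized by the strategy gradient method, which comprises the following steps:

initializing the parameters of the generation strategy network;

for each training round, performing the following steps:

randomly selecting an input state from the feature vector set;

randomly selecting an action according to the action probability distribution output by the generation strategy network, and modifying the input state accordingly to obtain a test case;

sending the test case to the program to be tested and observing the feedback signal;

calculating a reward value for the test case according to the feedback signal;

calculating the gradient value of the test case according to the reward value and the action probability distribution, and accumulating the gradient value into a gradient cache;

for each training period, performing the following steps:

updating the parameters of the generation strategy network according to the gradient values in the gradient cache;

emptying the gradient cache.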

And storing the plurality of test cases generated by the generation strategy network in a file or a database according to a certain sequence to form a test case queue.

In some embodiments of the invention, the system further comprises:

The instrumentation compiling module is configured to perform static instrumentation compilation on the program to be tested that is subject to vulnerability mining, so as to track the edge coverage of the test cases over the code of the program to be tested.

In order to evaluate the degree to which the test cases cover the code of the program to be tested, static instrumentation compilation is performed on the program to be tested. Static instrumentation compilation means inserting additional statements or functions into the code of the program to be tested during compilation, in order to record or output the code edges (i.e., the transitions between two basic blocks) traversed when a test case is executed. The static instrumentation compilation may be performed with existing tools or methods, for example a compiler or framework such as LLVM, GCC or Pin.

In order to obtain the code coverage condition of each test case to the program to be tested, the edge coverage rate of each test case needs to be dynamically recorded during the operation. The edge coverage is the ratio of the number of code edges passed by the test case when it is executed to the total number of code edges. The dynamic recording may be performed using existing tools or methods, for example, using storage media or structures such as files, databases, memory, etc. to store the code edge identifiers that are output when each test case is executed.

An edge coverage tracking module, configured to dynamically record the edge coverage of each test case at run time and to output the edge coverage after execution of the program to be tested ends.

In order to improve test efficiency and effectiveness, a plurality of high-quality cases need to be selected from all the test cases according to the edge coverage, and mutation processing is performed on the high-quality cases to generate a plurality of new test cases. A high-quality case is a test case that has high edge coverage or that can trigger an abnormality of the program to be tested. Mutation processing refers to random or rule-based modification of a test case, such as changing the values of certain fields or parameters, adding or deleting certain fields or parameters, or shuffling the order of certain fields or parameters. The mutation processing may be performed using existing tools or methods, for example fuzzing tools or frameworks such as AFL, Radamsa, or Peach.
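The four kinds of modification named above (changing a value, adding a field, deleting a field, shuffling the order) can be sketched as follows. The representation of a test case as a list of `(name, value)` pairs, the field names, the value range and the function `mutate` are assumptions made for illustration; they are not taken from AFL, Radamsa, or Peach.

```python
import random

OPS = ["change", "add", "delete", "shuffle"]
MAX_VALUE = 0xFFFF  # assumed upper bound for a field value


def mutate(fields, rng, op=None):
    """Return a mutated copy of `fields`, a list of (name, value) pairs."""
    out = list(fields)
    if op is None:
        op = rng.choice(OPS)
    if op == "change" and out:
        i = rng.randrange(len(out))
        name, old = out[i]
        new = old
        while new == old:                 # the new value must differ from the old one
            new = rng.randrange(MAX_VALUE + 1)
        out[i] = (name, new)
    elif op == "add":
        position = rng.randrange(len(out) + 1)
        out.insert(position, ("extra", rng.randrange(MAX_VALUE + 1)))
    elif op == "delete" and len(out) > 1:  # keep at least one field
        del out[rng.randrange(len(out))]
    elif op == "shuffle":
        rng.shuffle(out)
    return out


rng = random.Random(1)
# Hypothetical field names loosely modeled on a register-read request.
base = [("function_code", 3), ("address", 16), ("quantity", 2)]
mutants = [mutate(base, rng) for _ in range(5)]
```

The input list is never modified in place, so one high-quality case can serve as the seed for many mutants.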

A high-quality case selection module, configured to select a plurality of high-quality cases from all the test cases according to the edge coverage, perform mutation processing on the high-quality cases to generate a plurality of new test cases, add the new test cases to the test case queue, and send them to the program to be tested on the industrial control host so as to trigger an abnormality of the program to be tested.

In some embodiments of the invention, the high quality use case comprises:

A test case that causes the program to be tested to crash or hang.
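A minimal selection sketch follows. It keeps a case when it crashed or hung the program to be tested, or when it covered at least one edge that no earlier case had covered; the second criterion is an assumption modeled on the "new path" criterion of claim 1. The result dictionaries and the function `select_high_quality` are hypothetical.

```python
def select_high_quality(results):
    """Pick cases that crash, hang, or reach edges not seen before.

    `results` is a list of dicts with the keys 'case', 'edges' (a set),
    'crashed' and 'hung', given in execution order.
    """
    seen_edges = set()
    selected = []
    for result in results:
        new_edges = result["edges"] - seen_edges
        if result["crashed"] or result["hung"] or new_edges:
            selected.append(result["case"])
        seen_edges |= result["edges"]
    return selected


results = [
    {"case": "a", "edges": {1, 2}, "crashed": False, "hung": False},
    {"case": "b", "edges": {1},    "crashed": False, "hung": False},
    {"case": "c", "edges": {2},    "crashed": True,  "hung": False},
    {"case": "d", "edges": {2, 3}, "crashed": False, "hung": False},
]
high_quality = select_high_quality(results)
```

Case `b` is dropped because it neither adds coverage nor triggers an abnormality, while case `c` is kept for its crash even though it adds no new edge.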

This method generates new test cases by combining existing high-quality cases, so as to expand the range and number of test cases. The specific implementation is as follows:

This method generates new test cases by means of randomness and adaptability, so as to increase the variability and effectiveness of the test cases. The specific implementation is as follows:

Different from the original value to increase variability;

In some embodiments of the invention, the system further comprises:

A feature value module, configured to fill the missing feature values in the semantic feature vectors to form complete semantic feature vectors;

This module handles possible missing data in the semantic feature vectors, so that the vectors fully reflect the meaning and regularities of the message data. Filling the missing feature values yields more complete semantic feature vectors, which increases the number of samples available for the semantic analysis network to learn from.
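The filling step can be sketched as follows. Per claim 1, a missing categorical value is filled with the most frequent value of that variable. The numeric rule is assumed here to be linear interpolation between the previous and the next recorded values, using their row numbers, which is what the symbol definitions in claim 1 suggest; the formula itself did not survive in this text. The function names and the use of `None` for a missing value are hypothetical.

```python
from collections import Counter


def fill_categorical(column):
    """Replace None with the most frequent known value of the column."""
    known = [v for v in column if v is not None]
    if not known:
        return list(column)
    mode = Counter(known).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def fill_numeric(column):
    """Replace None by linear interpolation over row numbers (assumed rule)."""
    out = list(column)
    known_rows = [i for i, v in enumerate(column) if v is not None]
    for x, value in enumerate(column):
        if value is not None:
            continue
        x1 = max((k for k in known_rows if k < x), default=None)
        x2 = min((k for k in known_rows if k > x), default=None)
        if x1 is None or x2 is None:
            continue  # no neighbor on one side: leave the gap unfilled
        y1, y2 = column[x1], column[x2]
        out[x] = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    return out


protocols = fill_categorical(["tcp", None, "udp", "tcp"])
lengths = fill_numeric([1.0, None, None, 4.0])
```

Gaps at the very start or end of a column have only one neighbor, so the sketch leaves them unfilled rather than extrapolating.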

In some embodiments of the invention, the system includes a test optimization module for:

An embodiment of the invention also provides a deep-learning-based industrial control vulnerability mining device, comprising:

a processor;

a memory for storing processor-executable instructions;

The present invention also provides a storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method according to any of the above embodiments.

The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.

The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer disk, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

Computer program instructions for carrying out operations of the present invention may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), is personalized with state information of the computer readable program instructions, and the electronic circuitry can execute the computer readable program instructions, thereby implementing aspects of the present invention.

Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Note that all features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features. Where the terms "further", "preferably", "still further" or "more preferably" are used, they introduce a brief description of another embodiment based on the foregoing embodiment; the content following such a term, combined with the foregoing embodiment, constitutes the complete other embodiment. Several such "further", "preferably", "still further" or "more preferably" arrangements following the same embodiment may be combined arbitrarily to form yet another embodiment.

It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the embodiments, and the embodiments of the invention may be varied or modified without departing from those principles.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A deep-learning-based industrial control vulnerability mining system for vulnerability mining of a communication protocol between an industrial control host and an industrial control terminal, characterized by comprising:

a sample data acquisition module, configured to acquire communication message data between the industrial control host and the industrial control terminal, mark the communication message data between the industrial control host and the industrial control terminal in a normal working state as positive samples, mark the communication message data between the industrial control host and the industrial control terminal in an attacked state as negative samples, and generate a message sample data set, wherein: a target industrial control protocol is selected and communication parameters are determined; according to the selected target industrial control protocol and the communication parameters, the communication message data between the industrial control host and the industrial control terminal is collected in real time using a network sniffing tool or a switch port mirroring function; the collected communication message data is filtered and parsed according to the specification and characteristics of the target industrial control protocol, and valid message content and fields are extracted; the message data is divided into positive samples and negative samples according to the running state of the industrial control host, wherein a positive sample is communication message data between the industrial control host and the industrial control terminal in a normal working state and reflects the normal functions and logic of the protocol, a negative sample is communication message data between the industrial control host and the industrial control terminal in an attacked state and reflects abnormal functions and logic of the protocol, and the attacked state is produced by manual interference, malicious injection or simulated attack; and the positive samples and the negative samples are stored in a file or a database according to a certain format and with labels to form the message sample data set;

a feature vector set generation module, configured to input the message sample data set into a semantic analysis network, wherein the semantic analysis network learns the positive samples and the negative samples in the message sample data set using a deep recurrent neural network, so as to extract semantic feature vectors of the positive samples and the negative samples and generate a feature vector set, and wherein missing feature values in the semantic feature vectors are filled to form complete semantic feature vectors; when the missing feature value is a categorical variable, the missing position is filled with the attribute value that occurs most frequently for that categorical variable across all sample data; when the missing feature value is a numerical variable, the missing feature value is filled according to the following formula:

$$y = y_1 + \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1),$$

wherein $y$ is the missing feature value, $y_1$ is the previous recorded feature value of the numerical variable corresponding to the missing feature value, $x_1$ is the row number where the feature value $y_1$ is located, $y_2$ is the next recorded feature value of the numerical variable corresponding to the missing feature value, $x_2$ is the row number where the feature value $y_2$ is located, and $x$ is the row number where the missing feature value is located;

a test case queue generation module, configured to input the feature vector set into a generation strategy network, wherein the generation strategy network generates a plurality of test cases from the semantic feature vectors using a strategy gradient method based on reinforcement learning to form a test case queue, the plurality of test cases are stored in a file or a database in a certain order to form the test case queue, the order is determined by the reward value, the action probability or the generation time of the test cases, and the test case queue is used for subsequent test execution;

The generation strategy network optimizes parameters by using a strategy gradient method based on reinforcement learning, and selects optimal or suboptimal actions according to input states, so as to generate test cases more likely to trigger abnormality of a program to be tested;

The strategy gradient method comprises the following steps:

initializing parameters of a strategy network;

for each training round, the following steps are performed:

Randomly selecting an input state from the feature vector set;

for each training period, the following steps are performed:

Clearing the gradient cache;

a testing module for the program to be tested, configured to: send the test cases in the test case queue in sequence to the program to be tested on the industrial control host so as to trigger an abnormality of the program to be tested; perform static instrumentation compilation on the program to be tested that is the target of vulnerability mining, so as to track the edge coverage of the test cases over the code of the program to be tested; dynamically record the edge coverage of each test case at run time, and output the edge coverage after execution of the program to be tested ends; select, according to the edge coverage, a plurality of high-quality cases from all the test cases, perform mutation processing on the high-quality cases to generate a plurality of new test cases, add the new test cases to the test case queue, and send them to the program to be tested on the industrial control host so as to trigger an abnormality of the program to be tested; wherein a high-quality case is selected according to whether the test case can cover a new path in the program to be tested, can cause the program to be tested to undergo a new state transition, or can cause the program to be tested to crash or hang;

The high quality use case comprises:

A test case that causes the program under test to crash or hang up;

The mutation treatment specifically comprises the following steps:

randomly selecting one or more fields of the intermediate test case, generating a value which is both random and suitable for the fields, and replacing the original value of the fields;

a test optimization module, configured to: input the high-quality cases into the semantic analysis network and extract semantic feature vectors of the high-quality cases; add the semantic feature vectors of the high-quality cases to the feature vector set to generate a new feature vector set; input the new feature vector set into the generation strategy network to generate new test cases; and add the new test cases to the test case queue to be sent to the program to be tested on the industrial control host, so as to trigger an abnormality of the program to be tested;