CN115687114A

CN115687114A - SCADA software vulnerability mining method based on reinforcement learning

Info

Publication number: CN115687114A
Application number: CN202211332080.1A
Authority: CN
Inventors: 查奇文; 王聪; 马莉雅; 岳洋; 徐绍航; 赵佳宾; 钮艳; 刘权; 王宁; 殷荣超; 于成丽
Original assignee: China Industrial Internet Research Institute
Current assignee: China Industrial Internet Research Institute
Priority date: 2022-10-28
Filing date: 2022-10-28
Publication date: 2023-02-03

Abstract

The invention provides a reinforcement learning-based vulnerability mining method for SCADA software, using techniques from the field of network security. The method comprises two main parts: reinforcement learning-based test case generation and a software testing environment. The test case generation part comprises two modules: a state analysis network and an information evaluation network. The software testing environment part comprises a Python system interface and an AFL agent module. The resulting vulnerability mining method for SCADA software is self-learning, achieves a high coverage rate, and has multiple adjustable algorithm parameters.

Description

SCADA software vulnerability mining method based on reinforcement learning

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a reinforcement learning-based vulnerability mining method for SCADA (supervisory control and data acquisition) software.

Background

With the rapid development of the industrial internet, conventional industry is becoming closely integrated with advanced internet technology. Compared with the relatively closed and secure environment of conventional industrial systems, industrial networks are now exposed to the open internet, which greatly increases the risk of an industrial system being attacked. In recent years, many security incidents have occurred in which SCADA systems were attacked, resulting in serious economic losses and security risks. Discovering vulnerabilities in advance through vulnerability mining is an effective way to prevent such attacks.

Traditional SCADA software vulnerability discovery approaches fall into two broad categories: static analysis and dynamic analysis. They differ in whether the program must be executed. Static methods find vulnerabilities by inspecting the source code directly, without running the program. Mainstream static analysis methods are based on abstract syntax trees or control flow graphs, with tools such as Flawfinder, Fortify, and Coverity. Dynamic methods detect vulnerabilities while the program runs, so they can be applied to binary code as well as source code. Mainstream dynamic analysis comprises three methods: symbolic execution, dynamic taint analysis, and fuzz testing. Compared with static methods, dynamic methods scale better, have a lower false alarm rate, and are more widely used.

At present, mainstream vulnerability detection methods still rely on experts to formulate rules and lack a self-learning model, which limits the diversity and coverage of the test samples.

Purpose of the invention

As described above, in order to establish a self-learning vulnerability mining model and improve the diversity and coverage of detection cases, the vulnerability detection process is modeled as a Markov decision process, and a vulnerability mining method based on reinforcement learning is provided. The reinforcement learning agent receives the software running state and formulates a test case according to the policy network, and the SCADA software returns excitation information (that is, the reward signal) in response to the test case. The method has the following advantages:

1. The model is self-learning. The reinforcement learning method improves its case generation strategy through the excitation obtained from interaction with the environment, without expert-supervised cases or case generation rules.

2. The method has a high coverage rate. It adaptively adjusts case generation according to feedback from the test environment, and achieves higher coverage across different test environments.

3. The algorithm has multiple adjustable parameters, so it can be tuned and configured for specific tasks and problems, and it is highly portable.

Disclosure of Invention

Therefore, the invention first provides a reinforcement learning-based SCADA software vulnerability mining method. The method iteratively generates targeted test cases, mainly through reinforcement learning, tests the SCADA system software in a software testing environment, and dynamically adjusts the test case generation strategy according to the collected test feedback, thereby mining software vulnerabilities efficiently.

The test case generation part based on reinforcement learning comprises two modules: a state analysis network and an information evaluation network.

The software testing environment part comprises a Python system interface and an AFL agent module.

The reinforcement learning test case generation part receives the state information of the running software through the Python system interface. The state analysis network analyzes and encodes the state information and outputs all possible test case rules. After receiving these rules, the information evaluation network evaluates and selects among them in combination with the current state information of the software, finally obtaining a series of case rules that are most likely to cause software errors; these are passed to the software testing environment part through the Python system interface. After receiving the test case rules output by the reinforcement learning network, the Python system interface generates test cases and sends them to the AFL agent module. The AFL agent module adjusts the test cases by tracking code coverage and the relationships among test cases, and returns the test results and the running state of the software to the reinforcement learning network through the Python system interface. The reinforcement learning network in turn learns iteratively from this feedback and adjusts the case rules it generates.

The state analysis network is implemented as follows. The state analysis network receives the software state $S$ output by the software test environment, which includes the test structure information $T$ of the software, where $T = \langle G, E \rangle$. Here $G = \{i_1, i_2, i_3, \ldots, i_n\}$ denotes the structural information output by the software, with $i_n$ representing an input node, and $E = \{(i_2, i_3), (i_m, i_n), \ldots\}$, where $(i_m, i_n)$ represents an edge between the input nodes $i_m$ and $i_n$. The state analysis network takes the software state $S$ and the software input structure information $T$ as input, encodes them, and predicts the edges of the graph formed by the software input structure. The result is the software test rule $A = \{i_1 \to i_4, i_1 \to i_7, i_7 \to i_1, \ldots\}$. The calculation process is shown in formula 1:

The information evaluation network is implemented as follows. For each rule in A generated by the state analysis network, the information evaluation network analyzes the rule in combination with the current state of the software, then calculates and outputs a score. The score represents the probability that the current rule will cause a software error. To improve the quality of the test cases, the rules with the higher probabilities are selected for output. At the same time, the excitation information r from the software test is used to update the weights of the evaluation network itself, that is, to adjust how the generated rules are evaluated. The score is calculated as shown in equation 2:

$$\mathrm{score} = W \cdot \mathrm{concat}[A, S] + b \tag{2}$$

where W, b are learnable parameters.
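Equation 2 can be illustrated with a minimal sketch. It assumes that each rule and the software state have already been encoded as numeric vectors and that W is a single weight row; the function names, the toy encodings, and the weight values are hypothetical. The 0.5 selection threshold is the one given in the detailed description.

```python
def score(rule_vec, state_vec, weights, bias):
    """Equation 2: score = W * concat[A, S] + b, with W a single weight row."""
    features = list(rule_vec) + list(state_vec)   # concat[A, S]
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def select_rules(encoded_rules, state_vec, weights, bias, threshold=0.5):
    """Keep the rules whose score exceeds the threshold."""
    return [name for name, vec in encoded_rules.items()
            if score(vec, state_vec, weights, bias) > threshold]


# Hypothetical encodings: two candidate rules and a one-dimensional state.
encoded = {"i1->i4": [1.0, 0.0], "i1->i7": [0.0, 1.0]}
state = [0.5]
W = [0.6, 0.1, 0.2]
b = 0.0
selected = select_rules(encoded, state, W, b)
```

With these toy values only the rule i1->i4 scores above the threshold, so it alone would be passed on to test case generation.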

The Python system interface generates cases from rules as follows. The Python system call interface in the system test environment is mainly responsible for passing information between the reinforcement learning network and the software under test. It receives the test case rules output by the reinforcement learning network and then generates test cases in batches according to a given case template, where the template specifies input boundaries, type conditions, and the like. The system call interface is also responsible for receiving the running state of the software as the current software running state, calculating from it the excitation information r produced by the current test case, and transmitting r to the reinforcement learning network so that the network can update its strategy. The loss is first calculated according to equation (3), and the parameters of the network are then updated by gradient descent.

$$\mathrm{loss} = \left(Q(A, S) - r\right)^2 \tag{3}$$

where Q denotes the reinforcement learning network with its learnable parameters, evaluated on the rule A and the state S.
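The update driven by equation 3 can be sketched as a single gradient descent step. The source does not specify the form of Q, so this example assumes a hypothetical linear Q over an encoded feature vector for concat[A, S]; the learning rate and all names are illustrative only.

```python
def q_value(params, features):
    """Hypothetical linear Q(A, S): dot product of weights and features plus a bias."""
    weights, bias = params
    return sum(w * x for w, x in zip(weights, features)) + bias


def loss(params, features, reward):
    """Equation 3: squared difference between Q(A, S) and the excitation r."""
    return (q_value(params, features) - reward) ** 2


def gradient_step(params, features, reward, lr=0.1):
    """One gradient descent update on the loss of equation 3."""
    weights, bias = params
    error = q_value(params, features) - reward      # Q(A, S) - r
    # d(loss)/d(w_k) = 2 * error * x_k and d(loss)/d(bias) = 2 * error
    new_weights = [w - lr * 2 * error * x for w, x in zip(weights, features)]
    new_bias = bias - lr * 2 * error
    return new_weights, new_bias


params = ([0.0, 0.0], 0.0)
features = [1.0, 0.5]      # assumed encoding of concat[A, S]
reward = 1.0               # excitation information r
before = loss(params, features, reward)
params = gradient_step(params, features, reward)
after = loss(params, features, reward)
```

One step moves Q(A, S) toward r, so the loss on the same sample drops from 1.0 to about 0.3025 in this toy setting.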

The AFL agent module is implemented as follows. For each test case E input to the software, the module counts the running time and the number of code lines executed by the software under that case, records the running state of the software under the test case, and transmits this information to the reinforcement learning network through the Python system interface.

The technical effects to be realized by the invention are as follows:

1. The model is self-learning. The reinforcement learning method improves its case generation strategy through the excitation obtained from interaction with the environment, without expert-supervised cases or case generation rules.

2. The method has a high coverage rate. It adaptively adjusts case generation according to feedback from the test environment, and achieves higher coverage across different test environments.

3. The algorithm has multiple adjustable parameters, so it can be tuned and configured for specific tasks and problems, and it is highly portable.

Drawings

FIG. 1 shows the reinforcement learning-based vulnerability mining framework;

FIG. 2 shows software test rule generation;

FIG. 3 shows software test rule evaluation.

Detailed Description

The following is a preferred embodiment of the present invention and is further described with reference to the accompanying drawings, but the present invention is not limited to this embodiment.

The invention provides a reinforcement learning-based SCADA software vulnerability mining method which can effectively test the safety and robustness of the existing industrial SCADA software.

This embodiment targets security vulnerability mining for industrial SCADA software and uses reinforcement learning to perform deep analysis and computation on any industrial SCADA software under test. Specifically, the reinforcement learning-based method takes the real-time state information of the software as input, analyzes it, and automatically generates, through continuous iteration, a large number of test cases that are targeted, highly usable, and comprehensive in coverage. These test cases are used to test the security and robustness of the SCADA software comprehensively, and the test case generation strategy is dynamically adjusted according to the test results. Each test case is imported into the SCADA software and executed, and the execution result is compared with the expected result. If they differ, the SCADA software has a vulnerability, and the corresponding vulnerability type and trigger condition are reported.
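The comparison of actual and expected results can be sketched as a small test oracle. This is an illustration under assumptions, not the patent's implementation: `check_case`, the finding fields, and the toy target `toy_scale` are hypothetical names, and treating an unhandled exception as a finding is a choice made for this example.

```python
def check_case(run_software, case, expected):
    """Compare actual and expected results; return a finding on mismatch, else None."""
    try:
        actual = run_software(case)
    except Exception as exc:             # an unhandled exception is itself a finding
        return {"type": type(exc).__name__, "trigger": case}
    if actual != expected:
        return {"type": "unexpected_result", "trigger": case, "actual": actual}
    return None


def toy_scale(raw):
    """Hypothetical target: scale a raw reading, failing when the input is zero."""
    return 100 // raw


ok = check_case(toy_scale, 4, 25)         # result matches, no finding
finding = check_case(toy_scale, 0, 0)     # division by zero is reported
```

The returned dictionary pairs a vulnerability type with the input that triggered it, mirroring the type and trigger condition that the method reports.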

The method combines the state information of the software structure, generates different test cases through the analysis of the software input structure, tests the safety of the SCADA software, and simultaneously feeds back the test result to a learning algorithm to generate the next test case and test iteratively.

The method mainly comprises two parts, namely test case generation based on reinforcement learning and a software test environment.

The test case generation part based on reinforcement learning comprises two modules: a state analysis network and an information evaluation network.

State analysis network: the state analysis network receives the output of the system interface of the software testing environment, wherein the output contains the state information of the software to be tested, and the module analyzes and codes the received software state information to generate a series of test case rules.

Information evaluation network: the information evaluation network receives the test case information generated by the state analysis network, and selects the test case rule which is most effective to the current software to output by combining the current state information of the software.

The software test environment part comprises a Python system interface and an AFL agent.

Python system interface: this module is mainly responsible for batch generation of software test cases and for exchanging software state. It receives the test case rules from the reinforcement learning model and generates test cases in batches according to those rules. It also collects the software state output by the AFL module and passes it to the reinforcement learning network for training.

AFL agent: by comparing the input test cases with the code coverage they achieve, the agent automatically adjusts the combination of test cases and increases the probability of finding vulnerabilities.

The reinforcement learning-based software testing method tests the security of software comprehensively and efficiently, mines software bugs automatically, and improves the efficiency of software testers.

The technical framework adopted by the invention is shown in FIG. 1. First, the reinforcement learning network receives state information generated while the software runs, through the Python system interface. The state analysis network analyzes and encodes this state information and outputs all possible test case rules. After receiving these rules, the information evaluation network evaluates and selects among them in combination with the current state information of the software, finally obtaining a series of case rules that are most likely to cause software errors; these are passed to the test environment through the Python system interface. After receiving the test case rules output by the reinforcement learning network, the Python system interface generates test cases in batches and sends them to the AFL agent module. The AFL agent module automatically adjusts the test cases by tracking code coverage and the relationships among test cases, and returns the test results and the running state of the software to the reinforcement learning network through the Python system interface. The reinforcement learning network in turn learns from this feedback and iteratively adjusts the case rules it generates.
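The feedback loop of this framework can be sketched end to end with stand-ins for every component. All names here are hypothetical: `run_target` stands in for the SCADA software, `generate_case` for the Python system interface, and a simple epsilon-greedy running average replaces the state analysis and information evaluation networks, so this shows only the control flow, not the patented networks.

```python
import random


def run_target(test_case):
    """Stand-in for the software under test: 'crashes' on oversized input."""
    crashed = len(test_case) > 8
    return {"crashed": crashed, "coverage": min(len(test_case), 10)}


def generate_case(rule, rng):
    """Stand-in for the Python system interface: turn a rule into one test case."""
    return "A" * rng.randint(1, rule["max_len"])


def fuzz_loop(rules, iterations, seed=0):
    """Iterate: choose a rule, generate a case, run it, feed the excitation back."""
    rng = random.Random(seed)
    value = {name: 0.0 for name in rules}          # running estimate per rule
    crashes = []
    for _ in range(iterations):
        # Mostly pick the best-scoring rule, sometimes explore another one.
        if rng.random() < 0.2:
            name = rng.choice(sorted(rules))
        else:
            name = max(sorted(rules), key=lambda n: value[n])
        case = generate_case(rules[name], rng)
        result = run_target(case)
        reward = 1.0 if result["crashed"] else 0.0
        value[name] += 0.5 * (reward - value[name])  # move estimate toward reward
        if result["crashed"]:
            crashes.append(case)
    return value, crashes


rules = {"short_input": {"max_len": 4}, "long_input": {"max_len": 16}}
value, crashes = fuzz_loop(rules, iterations=200)
```

Because the short-input rule can never produce a crashing case here, its estimate stays at zero, and the loop concentrates on the rule that can.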

State analysis network

The test cases in the present invention are generated by reinforcement learning, as shown in FIG. 2. First, the state analysis network receives the software state $S$ output by the software test environment, which includes the test structure information $T$ of the software, represented as a graph $T = \langle G, E \rangle$. Here $G = \{i_1, i_2, i_3, \ldots, i_n\}$ denotes the structural information output by the software, with $i_n$ representing an input node, and $E = \{(i_2, i_3), (i_m, i_n), \ldots\}$, where $(i_m, i_n)$ represents an edge between the input nodes $i_m$ and $i_n$. The state analysis network takes the software state $S$ and the software input structure information $T$ as input, encodes them, and then predicts the edges of the graph formed by the software input structure. The result is the software test rule $A = \{i_1 \to i_4, i_1 \to i_7, i_7 \to i_1, \ldots\}$.
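As a minimal sketch of the data structures involved, the input structure T can be held as a node set and an edge set, and a rule A as a collection of directed edges. The names `InputStructure` and `candidate_rules` are hypothetical, and enumerating every ordered node pair merely stands in for the state analysis network's edge prediction, which the source does not detail.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class InputStructure:
    """Test structure T = <G, E>: input nodes G and edges E between them."""
    nodes: frozenset
    edges: frozenset


def candidate_rules(structure):
    """Enumerate all directed node pairs as candidate rules (placeholder for the network)."""
    return list(permutations(sorted(structure.nodes), 2))


# Three input nodes i1, i2, i3 with one known edge (i2, i3).
T = InputStructure(nodes=frozenset({1, 2, 3}), edges=frozenset({(2, 3)}))
rules = candidate_rules(T)
```

Each pair (m, n) corresponds to a directed rule from node m to node n, so the candidate set can contain both a rule and its reverse, as in the example rule set in the text.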

The calculation process is shown in formula 1:

Information evaluation network

For each rule in A generated by the state analysis network, the goal of the information evaluation network is to analyze the rule in combination with the current state of the software, and then to calculate and output a score. The score represents the probability that the current rule (an edge in the graph) will cause a software error. To improve the quality of the test cases, the rules with a higher probability (greater than 0.5) are selected for output. In addition, the excitation information r from the software test is used to update the weights of the evaluation network itself, that is, to adjust how the generated rules are evaluated.

The score is calculated as shown in equation 2:

$$\mathrm{score} = W \cdot \mathrm{concat}[A, S] + b \tag{2}$$

Generation of batched test cases

The Python system call interface in the system test environment is mainly responsible for passing information between the reinforcement learning network and the software under test. It receives the test case rules output by the reinforcement learning network and then generates test cases in batches according to a given case template, where the template specifies input boundaries, type conditions, and the like. The system call interface is also responsible for receiving the running state of the software as the current software running state, calculating from it the excitation information r produced by the current test case, and transmitting r to the reinforcement learning network so that the network can update its strategy.
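Template-driven batch generation can be sketched as follows. The template layout, the field names `register` and `tag_name`, and the choice to sample boundary values more often are all assumptions made for illustration; the source says only that the template contains input boundaries and type conditions.

```python
import random


def generate_batch(template, rule, batch_size, seed=0):
    """Generate test cases for the fields named by a rule, within template bounds."""
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        case = {}
        for field in rule:
            spec = template[field]
            low, high = spec["bounds"]
            if spec["type"] is int:
                # Sample either boundary value or a random value inside the range.
                case[field] = rng.choice([low, high, rng.randint(low, high)])
            else:
                case[field] = "x" * rng.randint(low, high)   # string length in bounds
        batch.append(case)
    return batch


# Hypothetical case template: type condition and input boundary per field.
template = {
    "register": {"type": int, "bounds": (0, 65535)},
    "tag_name": {"type": str, "bounds": (1, 32)},
}
rule = ("register", "tag_name")      # fields linked by one selected rule
batch = generate_batch(template, rule, batch_size=5)
```

Every generated case respects the template, so failures observed afterwards can be attributed to the software rather than to malformed test input.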

The loss is first calculated according to equation (3), and the parameters of the network are then updated by gradient descent.

$$\mathrm{loss} = \left(Q(A, S) - r\right)^2 \tag{3}$$

AFL agent

Specifically, for each test case E input to the software, the module counts the running time and the number of code lines executed by the software under that case, records the running state of the software under the test case, and transmits this information to the reinforcement learning network through the Python system interface.

To make full use of the given test cases, the module randomly combines the code execution behavior of different test cases, so that the information in the current test cases is exploited as fully as possible for vulnerability mining.
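The bookkeeping and recombination described for the AFL agent can be sketched without calling AFL itself. Everything here is an assumption for illustration: `toy_target` is a hypothetical instrumented target that reports which lines it executed, and byte-level splicing is one plausible reading of "randomly combines", not a detail confirmed by the source.

```python
import random
import time


def run_and_record(target, case):
    """Run one test case and record runtime, covered lines and the resulting state."""
    start = time.perf_counter()
    covered_lines, state = target(case)
    return {
        "case": case,
        "runtime": time.perf_counter() - start,
        "lines_covered": len(covered_lines),
        "state": state,
    }


def combine_cases(cases, rng):
    """Splice two randomly chosen cases at a random cut point."""
    first, second = rng.sample(cases, 2)
    cut = rng.randint(0, min(len(first), len(second)))
    return first[:cut] + second[cut:]


def toy_target(case):
    """Hypothetical instrumented target: reports executed line numbers and a state."""
    lines = {1, 2}
    if case.startswith(b"\x01"):
        lines |= {3, 4}
    return lines, "ok"


record = run_and_record(toy_target, b"\x01\x10")
spliced = combine_cases([b"\x01\x10\x00", b"\x02\x20\x00\x00"], random.Random(0))
```

The record dictionary carries the three quantities named in the text (running time, executed lines, running state), which is what would be returned to the reinforcement learning network.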

Claims

1. A reinforcement learning-based SCADA software vulnerability mining method, characterized in that: the method comprises two parts, namely a reinforcement learning-based test case generation part and a software testing environment, and the method finally gives the corresponding vulnerability types and trigger conditions;

the test case generation part based on reinforcement learning comprises two modules: a state analysis network and an information evaluation network;

2. The reinforcement learning-based SCADA software vulnerability mining method according to claim 1, characterized in that: the state analysis network is implemented as follows: the state analysis network receives the software state $S$ output by the software test environment, which includes the test structure information $T$ of the software, where $T = \langle G, E \rangle$; $G = \{i_1, i_2, i_3, \ldots, i_n\}$ denotes the structural information output by the software, with $i_n$ representing an input node; $E = \{(i_2, i_3), (i_m, i_n), \ldots\}$, where $(i_m, i_n)$ represents an edge between the input nodes $i_m$ and $i_n$; the state analysis network takes the software state $S$ and the software input structure information $T$ as input, encodes them, and predicts the edges of the graph formed by the software input structure, the result being the software test rule $A = \{i_1 \to i_4, i_1 \to i_7, i_7 \to i_1, \ldots\}$; the calculation process is shown in the formula:

3. The reinforcement learning-based SCADA software vulnerability mining method according to claim 2, characterized in that: the information evaluation network is implemented as follows: for each rule in A generated by the state analysis network, the rule is analyzed in combination with the current state of the software, and a score is then calculated and output, wherein the score represents the probability that the current rule will cause a software error; to improve the quality of the test cases, the rules with the higher probabilities are selected for output, and at the same time the excitation information r from the software test is used to update the weights of the evaluation network itself, that is, to adjust how the generated rules are evaluated, the score being calculated as shown in the formula:

$$\mathrm{score} = W \cdot \mathrm{concat}[A, S] + b$$

where W, b are learnable parameters.

4. The reinforcement learning-based SCADA software vulnerability mining method according to claim 3, characterized in that: the Python system interface generates cases from rules as follows: the Python system call interface in the system test environment is mainly responsible for passing information between the reinforcement learning network and the software under test; it receives the test case rules output by the reinforcement learning network and then generates test cases in batches according to a given case template, wherein the case template specifies input boundaries, type conditions, and the like; the system call interface is also responsible for receiving the running state of the software as the current software running state, calculating from it the excitation information r produced by the current test case, and transmitting r to the reinforcement learning network for updating the strategy of the reinforcement learning network; the loss is first calculated according to the following formula, and the parameters of the network are then updated by gradient descent,

$$\mathrm{loss} = \left(Q(A, S) - r\right)^2$$

where Q denotes the reinforcement learning network with its learnable parameters, evaluated on the rule A and the state S.

5. The reinforcement learning-based SCADA software vulnerability mining method according to claim 4, characterized in that: the AFL agent module is implemented as follows: for each test case E input to the software, the running time and the number of code lines executed by the software under that case are counted, the running state of the software under the test case is recorded at the same time, and this information is transmitted to the reinforcement learning network through the Python system interface.