CN114064471A

CN114064471A - Ethernet/IP protocol fuzz testing method based on a generative adversarial network

Info

Publication number: CN114064471A
Application number: CN202111330268.8A
Authority: CN
Inventors: 龚丽; 张益�; 刘翱; 肖宇洋; 张红莉
Original assignee: Second Research Institute of CAAC
Current assignee: Second Research Institute of CAAC
Priority date: 2021-11-11
Filing date: 2021-11-11
Publication date: 2022-02-18

Abstract

The application belongs to the technical field of Ethernet/IP protocol testing and vulnerability mining, and particularly relates to an Ethernet/IP protocol fuzz testing method based on a generative adversarial network (GAN). The method comprises the following steps: constructing an Ethernet/IP protocol sample library; preprocessing the communication data packets in the sample library; constructing training sets of corresponding categories by several clustering-based construction methods; training generative adversarial networks; generating fuzz test sample data; performing Ethernet/IP protocol fuzz testing to form an abnormal test case set; and fuzz testing the abnormal test cases again to confirm which are genuine abnormal test cases and which vulnerabilities they cause. In this method the generative adversarial network learns autonomously and generates high-quality fuzz test cases, which reduces the manual effort spent on protocol analysis and test case construction and makes fuzz testing efficient and intelligent.

Description

Ethernet/IP protocol fuzz testing method based on a generative adversarial network

Technical Field

The application belongs to the technical field of Ethernet/IP protocol testing and vulnerability mining, and particularly relates to an Ethernet/IP protocol fuzz testing method based on a generative adversarial network.

Background

Industrial control systems are widely deployed in critical infrastructure sectors such as industry, energy, electric power, water conservancy and transportation. Communication protocols carry the operation instructions within these systems, so their importance is self-evident. Effectively mining the latent vulnerabilities of these protocols and improving their resistance to attack is therefore essential to the security of industrial control systems.

Industrial control systems use many kinds of communication protocols. The Ethernet/IP protocol is widely used in industrial control systems in sectors such as transportation and energy, and secure Ethernet/IP communication is a cornerstone of the safe operation of these systems.

Fuzz testing discovers potential bugs by feeding a target program a large number of unexpected inputs and monitoring it for abnormal behavior. The core of the technique is constructing effective test data, that is, unexpected inputs. Current test data construction methods fall into two classes: 1) mutation-based construction, in which existing input data are mutated in a targeted way and the mutated data are fed to the industrial control system to trigger vulnerabilities; and 2) generation-based construction, in which a test data format is defined from known protocol rules and test data are generated from scratch. Existing fuzz testing methods for industrial control protocols still have shortcomings: 1) most test cases depend on manual analysis and design, which is time-consuming and labor-intensive, and their quality depends entirely on human prior knowledge, so their pass rate and path coverage vary widely; 2) current test case generation relies mainly on public vulnerability databases, which limits the discovery of new vulnerabilities, and these databases (including CNVD, CNNVD and CVE) contain very few Ethernet/IP protocol vulnerabilities.

Disclosure of Invention

To solve at least one technical problem in the prior art, the application provides an Ethernet/IP protocol fuzz testing method based on a generative adversarial network.

The application discloses an Ethernet/IP protocol fuzz testing method based on a generative adversarial network, comprising the following steps:

step one, constructing an Ethernet/IP protocol sample library;

step two, processing the communication data packets in the Ethernet/IP protocol sample library to obtain decimal (base-10) Ethernet/IP protocol sample data;

step three, constructing training sets of corresponding types from the Ethernet/IP protocol sample data processed in step two, using several different clustering-based construction methods;

step four, training a generative adversarial network on each type of training set separately;

step five, generating fuzz test sample data, wherein the fuzz test sample data comprise a plurality of test cases;

step six, feeding all test cases in the fuzz test sample data into an industrial control system one by one to perform Ethernet/IP protocol fuzz testing; if a test case causes the industrial control system to operate abnormally, it is recorded as an abnormal test case, finally forming an abnormal test case set;

step seven, feeding all abnormal test cases in the abnormal test case set into the industrial control system one by one again to perform Ethernet/IP protocol fuzz testing, and manually checking whether the abnormal operation caused by each abnormal test case is a false alarm. If it is a false alarm, the test case is deleted from the abnormal test case set; if the abnormal test case causes the abnormality again, it is confirmed as an abnormal test case, and the abnormal test case and the vulnerability it causes are recorded.

According to at least one embodiment of the present application, the Ethernet/IP protocol fuzz testing method further comprises:

step eight, copying the data of all abnormal test cases finally recorded in step seven, randomly applying data mutation operations such as value negation or random value substitution, and returning the mutated abnormal test case data to steps four and five, thereby generating fuzz test sample data that are more likely to trigger Ethernet/IP protocol vulnerabilities.

According to at least one embodiment of the present application, in step one, Ethernet/IP protocol communication data packets are collected manually from the industrial control system with the Wireshark packet capture tool to obtain the Ethernet/IP protocol sample library.

According to at least one embodiment of the present application, in step two, the communication data packets in the Ethernet/IP protocol sample library are parsed, the headers of the other protocols are removed, the hexadecimal Ethernet/IP protocol sample data are extracted automatically, and these hexadecimal data are then converted element by element into decimal Ethernet/IP protocol sample data.

According to at least one embodiment of the present application, in step three, the several different clustering-based construction methods comprise:

construction by message length classification: dividing the Ethernet/IP protocol sample data into different training sets according to message length, and performing data augmentation on the smaller training sets;

construction by function code classification: constructing a training set for each Ethernet/IP protocol function code type, and aligning and padding the data within each training set;

mixed construction: using all Ethernet/IP protocol sample data as a single training data set.

According to at least one embodiment of the present application, in the construction method by message length classification, data augmentation comprises:

copying the data in the smaller training sets, and then applying data mutation operations such as value negation, value inversion or random value substitution to the copied data.

According to at least one embodiment of the present application, in the construction method by function code classification, there are 8 function code types, and the data in each training set are aligned and padded by the following relation (1):

$$
S_1 =
\begin{cases}
S_0, & \operatorname{len}(S_0) = \mathrm{MaxLen} \\
S_0 \,\Vert\, \underbrace{x\,x\cdots x}_{n}, & \operatorname{len}(S_0) < \mathrm{MaxLen}
\end{cases}
\qquad n = \mathrm{MaxLen} - \operatorname{len}(S_0) \tag{1}
$$

wherein S₀ is any original message, len() is the length function, MaxLen is the maximum message length in the current training set, x is the padding character with x = 'PAD', n is the padding length, and ∥ denotes concatenation;

when len(S₀) = MaxLen, the original data need no padding, and S₀ is kept in the training set unchanged;

when len(S₀) < MaxLen, the original data need padding with length n = MaxLen - len(S₀); the original data S₀ are removed from the training set and the padded message S₁ is added to it.

According to at least one embodiment of the present application, in step four, a long short-term memory (LSTM) network is selected as the generator model network, a convolutional neural network (CNN) is selected as the discriminator model network, and Dropout is used to avoid model overfitting, wherein the generative adversarial network training comprises:

step 4.1, initializing network parameters of an LSTM generator and a CNN discriminator;

step 4.2, feeding the training set data into the input layer of the LSTM generator, which passes them to the embedding layer and hidden layers, so that the generator preliminarily learns the feature distribution of the training data and its weight parameters are updated;

step 4.3, generating negative sample data with the LSTM generator obtained in step 4.2, taking real data as positive sample data, and training a binary-classification CNN discriminator on these data so that it preliminarily learns to distinguish generated data from real data, updating the weight parameters of the CNN discriminator;

step 4.4, adversarially training the LSTM generator and the CNN discriminator in alternating cycles.

According to at least one embodiment of the present application, in step 4.4, the cyclic adversarial training of the LSTM generator comprises:

step 4.51, iteratively training the LSTM generator and generating sequence data;

step 4.52, feeding the generated sequence data into the CNN discriminator to obtain the corresponding reward;

step 4.53, passing the reward back to the LSTM generator with a policy gradient algorithm to update the weight parameters of the LSTM generator;

the cyclic adversarial training of the CNN discriminator comprises:

step 4.61, iteratively training the CNN discriminator on the negative sample data generated by the updated LSTM generator together with the real sample data set;

step 4.62, updating the weight parameters until the model converges.

According to at least one embodiment of the present application, generating the fuzz test sample data in step five comprises:

step 5.1, during the cyclic adversarial training of the LSTM generator and the CNN discriminator in step 4.4, saving the LSTM generator model once every 5 training epochs;

step 5.2, ordering the generator models saved during one complete adversarial training run by save time, and dividing them into early, middle and late models in the ratio 1:7:2;

step 5.3, extracting Ethernet/IP protocol data generated by the generator models of the three stages in the same ratio, as the fuzz test sample data.

The application has at least the following beneficial technical effects:

1) in the Ethernet/IP protocol fuzz testing method based on a generative adversarial network, the generative adversarial network learns autonomously and generates high-quality fuzz test cases, which reduces the manual effort of protocol analysis and test case construction in fuzz testing and makes fuzz testing efficient and intelligent;

2) several different clustering strategies are used to classify and construct the protocol samples, so that the generative model produces more diverse and more protocol-conformant test cases; this increases path coverage, helps trigger more latent Ethernet/IP protocol vulnerabilities, and improves the security and attack resistance of industrial control systems;

3) a method of extracting fuzz test sample data in proportion is provided. The saved generator models are divided into early, middle and late models in chronological order, and generated fuzz test samples are extracted in a specific ratio, with only a small share taken from the early and late models. This balances the pass rate and the randomness of the test cases in fuzz testing;

4) the test cases generated by the Ethernet/IP protocol fuzz testing method based on a generative adversarial network have a higher pass rate and vulnerability detection rate.

Drawings

FIG. 1 is a flow chart of the Ethernet/IP protocol fuzz testing method based on a generative adversarial network.

Detailed Description

To make the objectives, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described in more detail below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present application, and should not be construed as limiting it.

As shown in FIG. 1, the present application discloses an Ethernet/IP protocol fuzz testing method based on a generative adversarial network, comprising the following steps:

Step one (constructing the Ethernet/IP protocol sample library): construct an Ethernet/IP protocol sample library.

Specifically, in this step Ethernet/IP protocol communication data packets are collected manually and captured comprehensively from the industrial control system with the Wireshark packet capture tool, so as to obtain a complete Ethernet/IP protocol sample library.

Step two (sample data preprocessing): process the communication data packets in the Ethernet/IP protocol sample library to obtain decimal Ethernet/IP protocol sample data.

Specifically, in this step the communication data packets in the sample library are parsed, the headers of the other protocols are removed, the hexadecimal Ethernet/IP protocol sample data are extracted automatically, and a base conversion turns the hexadecimal data into decimal data element by element.
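As a rough illustration of step two, the Python sketch below strips the lower-layer headers from one captured frame and converts the remaining Ethernet/IP bytes to hexadecimal and then to decimal. It is not the applicant's script. It assumes each packet is available as a raw Ethernet II frame carrying IPv4 (optionally with one 802.1Q tag), that Ethernet/IP traffic is recognized by TCP or UDP port 44818, and that the element-by-element conversion maps each byte (two hex digits) to one integer from 0 to 255. The function names are hypothetical.

```python
import struct

ENIP_PORT = 44818  # registered EtherNet/IP encapsulation port (explicit messaging)


def extract_enip_payload(frame):
    """Strip Ethernet II, IPv4 and TCP/UDP headers; return the Ethernet/IP bytes or None."""
    if len(frame) < 14:
        return None
    offset = 12
    ethertype = struct.unpack_from("!H", frame, offset)[0]
    if ethertype == 0x8100:  # skip one 802.1Q VLAN tag
        offset += 4
        ethertype = struct.unpack_from("!H", frame, offset)[0]
    if ethertype != 0x0800:  # this sketch handles IPv4 only
        return None
    ip = frame[offset + 2:]
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    total_len = struct.unpack_from("!H", ip, 2)[0]
    proto = ip[9]
    l4 = ip[ihl:total_len]  # slicing to total_len drops Ethernet trailer padding
    if proto == 6:  # TCP: header length comes from the data-offset field
        header_len = (l4[12] >> 4) * 4
    elif proto == 17:  # UDP: fixed 8-byte header
        header_len = 8
    else:
        return None
    src_port, dst_port = struct.unpack_from("!HH", l4, 0)
    if ENIP_PORT not in (src_port, dst_port):
        return None
    payload = l4[header_len:]
    return payload or None  # bare TCP acknowledgements carry no Ethernet/IP data


def to_hex(payload):
    """Hexadecimal sample data, as shown in Wireshark's byte pane."""
    return payload.hex()


def hex_to_decimal(hex_str):
    """Decimal sample data: one integer (0-255) per byte, an assumed granularity."""
    return [int(hex_str[i:i + 2], 16) for i in range(0, len(hex_str), 2)]
```

Frames on other ports, or frames without payload, return None and are simply skipped when building the sample data.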

Step three (constructing the training sets): construct training sets of corresponding types from the Ethernet/IP protocol sample data processed in step two, using several different clustering-based construction methods.

Specifically, this step uses the following three construction methods:

1) Construction by message length classification: the preprocessed protocol data are divided into different training sets according to message length, and the smaller training sets are augmented to increase their weight in training. This construction preserves randomness in the data while keeping the generated test cases similar to real messages, which increases the pass rate of the test cases.

The data augmentation works as follows: the data in a smaller training set are copied, and data mutation operations such as value negation, value inversion or random value substitution are applied to the copies, thereby augmenting the small data set.

2) Construction by function code classification: a training set is constructed for each Ethernet/IP protocol function code type, and the data within each training set are aligned and padded. This construction method can fully test the robustness of each function of the Ethernet/IP protocol.

In this embodiment, reserved codes are excluded, which leaves 8 Ethernet/IP function codes in total, as shown in Table 1 below; 8 different training sets are therefore constructed, one per function code:

TABLE 1 Ethernet/IP function codes

Code	Name	Function
0x0000	NOP	No operation (TCP open probing)
0x0004	ListServices	List services
0x0063	ListIdentity	List identity
0x0064	ListInterfaces	List interfaces
0x0065	RegisterSession	Register session
0x0066	UnRegisterSession	Unregister (terminate) session
0x006F	SendRRData	Send request/reply data (encapsulated packets)
0x0070	SendUnitData	Send unit data

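Further, in this step the data in each training set are aligned and padded by the following relation (1); that is, each original message S₀ is aligned and padded according to relation (1) to obtain the new message S₁:

$$
S_1 =
\begin{cases}
S_0, & \operatorname{len}(S_0) = \mathrm{MaxLen} \\
S_0 \,\Vert\, \underbrace{x\,x\cdots x}_{n}, & \operatorname{len}(S_0) < \mathrm{MaxLen}
\end{cases}
\qquad n = \mathrm{MaxLen} - \operatorname{len}(S_0) \tag{1}
$$

where len() is the length function, MaxLen is the maximum message length in the current training set, x = 'PAD' is the padding character, n is the padding length, and ∥ denotes concatenation.

3) Mixed construction: all Ethernet/IP protocol sample data are used as one training data set. This makes full use of the randomness of fuzz testing and gives the generated test cases higher coverage.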
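The three construction methods can be sketched in Python as follows. This is illustrative only, not the applicant's code. Samples are assumed to be decimal token lists as produced in step two. The function code is read as the little-endian 16-bit command field at the start of the Ethernet/IP encapsulation header. The `min_size` threshold that defines a "smaller" training set is a hypothetical parameter. Only two of the mutation operations are modeled: value negation as the byte complement, and random value substitution.

```python
import random
from collections import defaultdict

PAD = "PAD"  # padding character x of relation (1)
# The eight Ethernet/IP function (command) codes of Table 1
ENIP_COMMANDS = (0x0000, 0x0004, 0x0063, 0x0064, 0x0065, 0x0066, 0x006F, 0x0070)


def command_of(tokens):
    """Command code: little-endian 16-bit value in the first two header bytes."""
    return tokens[0] | (tokens[1] << 8)


def mutate(tokens, rng):
    """Copy a sample and mutate one random position (assumed operators)."""
    out = list(tokens)
    i = rng.randrange(len(out))
    if rng.random() < 0.5:
        out[i] = 255 - out[i]  # value negation: bitwise complement of the byte
    else:
        out[i] = rng.randrange(256)  # random value substitution
    return out


def build_by_length(samples, min_size, rng):
    """Method 1: one training set per message length; small sets are augmented."""
    groups = defaultdict(list)
    for s in samples:
        groups[len(s)].append(list(s))
    for members in groups.values():
        originals = list(members)
        while len(members) < min_size:
            members.append(mutate(rng.choice(originals), rng))
    return dict(groups)


def pad_group(group):
    """Relation (1): pad every message to the MaxLen of its own training set."""
    max_len = max(len(s) for s in group)
    return [list(s) + [PAD] * (max_len - len(s)) for s in group]


def build_by_command(samples):
    """Method 2: one training set per function code, aligned with pad_group()."""
    groups = defaultdict(list)
    for s in samples:
        code = command_of(s)
        if code in ENIP_COMMANDS:
            groups[code].append(s)
    return {code: pad_group(g) for code, g in groups.items()}


def build_mixed(samples):
    """Method 3: all samples in a single training set."""
    return [list(s) for s in samples]
```

In `pad_group`, messages already at MaxLen are kept unchanged, and shorter ones receive n = MaxLen - len(S₀) 'PAD' tokens, matching the two cases described for relation (1).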

Step four (generative adversarial network training): train a generative adversarial network on each type of training set separately.

Specifically, because the Ethernet/IP industrial control protocol has a fixed frame format and strong sequential structure, this step selects a long short-term memory (LSTM) network as the generator model network and a convolutional neural network (CNN) as the discriminator model network, and uses Dropout to avoid model overfitting.

In addition, the LSTM generator constructed in the application comprises two hidden layers and an output layer. Each hidden layer has 64 hidden units, and the output layer is fully connected with softmax as its activation function. The CNN discriminator consists, in order, of an Embedding layer, a convolution layer, a pooling layer, a Dropout layer and an output layer, the output layer being fully connected with sigmoid as its activation function.

Further, the generative adversarial network training in this step comprises the following steps:

step 4.1: initializing network parameters of an LSTM generator and a CNN discriminator;

step 4.2 (pre-training the LSTM generator): feeding the training set data into the input layer of the LSTM generator, which passes them to the embedding layer and hidden layers, so that the generator preliminarily learns the feature distribution of the training data and its weight parameters are updated;

step 4.3 (pre-training the CNN discriminator): generating negative sample data with the LSTM generator obtained in step 4.2, taking real data as positive sample data, and training a binary-classification CNN discriminator on these data so that it preliminarily learns to distinguish generated data from real data, updating the weight parameters of the CNN discriminator;

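step 4.4: adversarially training the LSTM generator and the CNN discriminator in alternating cycles.

Furthermore, the cyclic adversarial training of the LSTM generator in step 4.4 comprises the following steps:

step 4.51, iteratively training the LSTM generator and generating sequence data;

step 4.52, feeding the generated sequence data into the CNN discriminator to obtain the corresponding reward;

step 4.53, passing the reward back to the LSTM generator with a policy gradient algorithm to update the weight parameters of the LSTM generator;

correspondingly, the cyclic adversarial training of the CNN discriminator comprises the following steps:

step 4.61, iteratively training the CNN discriminator on the negative sample data generated by the updated LSTM generator together with the real sample data set;

step 4.62, updating the weight parameters until the model converges.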
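Step 4.53 passes the discriminator's reward to the generator through a policy-gradient (REINFORCE) update. The pure-Python sketch below shows only that update rule, under heavy simplifications that are assumptions rather than the application's design. The LSTM is replaced by a hypothetical `ToyGenerator` that holds an independent categorical distribution per position. The reward is a single scalar per complete sequence, so the SeqGAN-style Monte Carlo rollouts for per-token rewards are omitted. A moving-average baseline reduces variance. `toy_reward` is a hypothetical stand-in for the trained CNN discriminator's output probability.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyGenerator:
    """Hypothetical stand-in for the LSTM generator: one categorical per position."""

    def __init__(self, seq_len, vocab_size):
        self.logits = [[0.0] * vocab_size for _ in range(seq_len)]

    def sample(self, rng):
        vocab = range(len(self.logits[0]))
        return [rng.choices(vocab, weights=softmax(row))[0] for row in self.logits]

    def policy_gradient_step(self, seq, advantage, lr):
        # d log p(token) / d logit_k = 1[k == token] - p_k (softmax log-likelihood)
        for row, token in zip(self.logits, seq):
            probs = softmax(row)
            for k, p in enumerate(probs):
                row[k] += lr * advantage * ((1.0 if k == token else 0.0) - p)


def train_generator(gen, reward_fn, steps, lr=0.3, seed=0):
    """Sample a sequence, score it, and push the reward back into the generator."""
    rng = random.Random(seed)
    baseline = 0.0
    for _ in range(steps):
        seq = gen.sample(rng)  # 4.51: generate sequence data
        reward = reward_fn(seq)  # 4.52: reward from the (stand-in) discriminator
        gen.policy_gradient_step(seq, reward - baseline, lr)  # 4.53: policy gradient
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    return gen


# Hypothetical discriminator stand-in: fraction of positions matching a "real" message
REAL = [1, 5, 3]


def toy_reward(seq):
    return sum(a == b for a, b in zip(seq, REAL)) / len(REAL)
```

After a few thousand updates, the toy generator concentrates its probability mass on the tokens that the reward function scores as real. This is the effect the discriminator reward is meant to have on the LSTM generator.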

Step five (fuzz test sample data generation): generate fuzz test sample data comprising a plurality of test cases.

It should be noted that the generative adversarial network training in step four yields LSTM generators at different stages of training. If test cases were generated only by the last, fully trained LSTM generator, they would closely resemble real Ethernet/IP communication data and would therefore have a high pass rate. This, however, runs against the idea of fuzz testing, which is to feed in large amounts of unexpected data to trigger bugs: test cases from a well-trained LSTM generator cannot fully exploit the unexpectedness and randomness of fuzz testing. Fuzz testing therefore has to balance the pass rate of test cases against their randomness.

Therefore, in this step, generating the fuzz test sample data comprises the following steps:

step 5.1, during the cyclic adversarial training of the LSTM generator and the CNN discriminator in step 4.4, saving the LSTM generator model once every 5 training epochs;

step 5.2, ordering the generator models saved during one complete adversarial training run by save time, and dividing them into early, middle and late models in the ratio 1:7:2;

step 5.3, extracting Ethernet/IP protocol data generated by the generator models of the three stages in the same ratio, as the fuzz test sample data.

In summary, the generator models saved at different times are trained to different degrees, so the test cases they generate differ from real communication data to different degrees. As a result, the test case set as a whole retains good diversity and randomness.

In addition, the early models have been trained for only a few epochs (the first is saved after just 5) and fit the data poorly. To protect the pass rate, only 1 part in 10 of the final fuzz test samples is therefore extracted from the Ethernet/IP protocol data generated by the early models. The late models, after many rounds of iterative training, generate test cases that closely resemble real Ethernet/IP communication data, so to preserve randomness only 2 parts in 10 are extracted from their output. The remaining 7 parts come from the Ethernet/IP protocol data generated by the middle-stage models. Generating fuzz test sample data in this proportion balances the pass rate and the randomness of the test cases in fuzz testing.
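The proportional extraction in steps 5.1 to 5.3 can be sketched in Python as below. Several assumptions apply. Checkpoints are represented by identifiers already sorted by save time (here the epoch numbers 5, 10, 15, ..., since a model is saved every 5 epochs). `generate` is a hypothetical callback that asks one saved generator for one test case. When a count is not divisible by 10, the 1:7:2 shares are rounded with the largest-remainder rule, which is our choice and is not stated in the application.

```python
import random


def split_by_ratio(n, ratios=(1, 7, 2)):
    """Split n items into integer parts proportional to ratios (largest remainder)."""
    total = sum(ratios)
    counts = [n * r // total for r in ratios]
    rems = [n * r % total for r in ratios]
    order = sorted(range(len(ratios)), key=lambda i: (-rems[i], i))
    for i in order[: n - sum(counts)]:
        counts[i] += 1
    return counts


def stage_checkpoints(checkpoints, ratios=(1, 7, 2)):
    """Step 5.2: split checkpoints (sorted by save time) into early/middle/late stages."""
    counts = split_by_ratio(len(checkpoints), ratios)
    if 0 in counts:
        raise ValueError("too few saved generators to fill every stage")
    stages, start = [], 0
    for c in counts:
        stages.append(checkpoints[start:start + c])
        start += c
    return stages


def draw_fuzz_samples(checkpoints, generate, total, rng, ratios=(1, 7, 2)):
    """Step 5.3: draw `total` test cases from the three stages in the same ratio."""
    samples = []
    stages = stage_checkpoints(checkpoints, ratios)
    for stage, quota in zip(stages, split_by_ratio(total, ratios)):
        for _ in range(quota):
            samples.append(generate(rng.choice(stage), rng))  # random model within stage
    return samples
```

For example, with 20 saved generators (epochs 5 to 100) and 100 requested test cases, the stages hold 2, 14 and 4 generators, and they contribute 10, 70 and 20 test cases respectively.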

Step six (Ethernet/IP protocol fuzz testing): feed all test cases in the fuzz test sample data into the industrial control system one by one to perform Ethernet/IP protocol fuzz testing. If a test case causes the industrial control system to operate abnormally, record it as an abnormal test case; this finally forms an abnormal test case set.

Specifically, in this step the Ethernet/IP protocol fuzz testing module is encapsulated in a Python script. The generated decimal fuzz test sample data are first converted back into hexadecimal fuzz test samples, which are then sent to the industrial control system with socket.send(). The operating state of the industrial control system is monitored with the ping command, and its response is received with socket.recv(). The status code of the returned packet is then parsed. If the status code is not 0x0000 (success), the fuzz test sample and its status code are recorded. The pass rate of the test samples is then calculated to evaluate the quality of the generative model. If a fuzz test case causes abnormal operation of the industrial control system (including downtime, communication interruption, program crash or logic errors), or the parsed response contains information such as Error or Exception, the test case is recorded as an abnormal test case, finally forming an abnormal test case set.
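The send, receive and classify loop of step six might look like the Python sketch below. It is not the applicant's script, and its function names are hypothetical. The sketch makes these assumptions:

- each test case is a decimal token list, and any 'PAD' tokens are dropped before sending;
- the reply begins with the standard 24-byte little-endian Ethernet/IP encapsulation header, whose status field is 0x0000 on success;
- any socket error or empty reply marks a candidate abnormal case.

It uses `socket.sendall()` rather than `send()` so that the whole test case is written, and it does not show the ping-based liveness monitoring.

```python
import socket
import struct

# Ethernet/IP encapsulation header: 24 bytes, little-endian
# (command, length, session handle, status, sender context, options)
ENCAP_HEADER = struct.Struct("<HHII8sI")
ERROR_MARKERS = (b"Error", b"Exception")


def connect(host, port=44818, timeout=2.0):
    """Open a TCP connection to the device under test (hypothetical helper)."""
    return socket.create_connection((host, port), timeout=timeout)


def decimal_to_bytes(tokens):
    """Turn decimal tokens (0-255) back into raw bytes, dropping 'PAD' tokens."""
    return bytes(t for t in tokens if isinstance(t, int))


def encap_status(reply):
    """Status field of the encapsulation header, or None if the reply is too short."""
    if len(reply) < ENCAP_HEADER.size:
        return None
    return ENCAP_HEADER.unpack_from(reply)[3]


def run_case(sock, tokens, timeout=2.0):
    """Send one test case; classify the reply as 'pass', 'rejected' or 'abnormal'."""
    sock.settimeout(timeout)
    try:
        sock.sendall(decimal_to_bytes(tokens))
        reply = sock.recv(4096)
    except OSError:  # timeout, connection reset, broken pipe
        return "abnormal", None
    status = encap_status(reply)
    if status is None or any(marker in reply for marker in ERROR_MARKERS):
        return "abnormal", status  # empty/short reply or error text in the response
    return ("pass" if status == 0 else "rejected"), status


def pass_rate(results):
    """Share of (label, status) results whose reply carried status 0x0000 (success)."""
    if not results:
        return 0.0
    return sum(label == "pass" for label, _ in results) / len(results)
```

Cases labeled 'rejected' lower the pass rate used to judge the generative model. Cases labeled 'abnormal' are candidates for the abnormal test case set that step seven re-verifies.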

Step seven (vulnerability verification): feed all abnormal test cases in the abnormal test case set into the industrial control system one by one again to perform Ethernet/IP protocol fuzz testing, and manually check whether the abnormal operation caused by each abnormal test case is a false alarm. If it is a false alarm, delete the test case from the abnormal test case set. If the abnormal test case causes the industrial control system to operate abnormally again, confirm it as an abnormal test case and record the abnormal test case and the vulnerability it causes.

Further, the Ethernet/IP protocol fuzz testing method based on a generative adversarial network may further comprise the following step:

Step eight (model retraining): copy the data of all abnormal test cases finally recorded in step seven, randomly apply data mutation operations such as value negation or random value substitution, and return the mutated abnormal test case data to steps four and five. The mutated abnormal test case data serve as the training data set of the generative adversarial network in step four. The model parameters are adjusted and the generative adversarial network model is retrained on the new data set, so that step five generates fuzz test sample data that are more likely to trigger Ethernet/IP protocol vulnerabilities.

The first pass through step four uses the normal, preprocessed and clustered Ethernet/IP protocol data as the training data set. The retraining process differs in that it uses the copied and mutated abnormal test case data as the training data set of step four.

The first pass through step five also differs. There, to balance the pass rate and randomness of the test cases, the generator models ordered by save time are sampled in proportion and fuzz test sample data are generated in the same proportion. In the retraining process, by contrast, only the LSTM generator model saved after the last training epoch is used to generate the retrained test samples. This lets the model fully learn the characteristics of the abnormal test cases and generate fuzz test sample data closer to them.

In summary, the application uses a generative adversarial network to learn autonomously and generate high-quality fuzz test cases. This effectively reduces the manual effort of protocol analysis and test case construction in fuzz testing and makes fuzz testing efficient and intelligent. Different clustering strategies are introduced to increase the diversity of the generated test cases and the path coverage, which helps trigger more latent Ethernet/IP protocol vulnerabilities and improves the security and attack resistance of industrial control systems. In addition, extracting fuzz test samples in proportion addresses the balance between the pass rate and the randomness of test cases in fuzz testing.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

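1. An Ethernet/IP protocol fuzz testing method based on a generative adversarial network, characterized by comprising the following steps:

step one, constructing an Ethernet/IP protocol sample library;

step two, processing the communication data packets in the Ethernet/IP protocol sample library to obtain decimal Ethernet/IP protocol sample data;

step three, constructing training sets of corresponding types from the Ethernet/IP protocol sample data processed in step two, using a plurality of clustering-based construction methods;

step four, training a generative adversarial network on each type of training set separately;

step five, generating fuzz test sample data, wherein the fuzz test sample data comprise a plurality of test cases;

step six, feeding all test cases in the fuzz test sample data into an industrial control system one by one to perform Ethernet/IP protocol fuzz testing; if a test case causes the industrial control system to operate abnormally, recording it as an abnormal test case, finally forming an abnormal test case set;

step seven, feeding all abnormal test cases in the abnormal test case set into the industrial control system one by one again to perform Ethernet/IP protocol fuzz testing, and manually checking whether the abnormal operation caused by each abnormal test case is a false alarm; if so, deleting the test case from the abnormal test case set; if the abnormal test case causes the abnormality again, confirming it as an abnormal test case and recording the abnormal test case and the vulnerability it causes.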

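2. The Ethernet/IP protocol fuzz testing method of claim 1, further comprising:

step eight, copying the data of all abnormal test cases finally recorded in step seven, randomly applying data mutation operations such as value negation or random value substitution, and returning the mutated abnormal test case data to steps four and five, thereby generating fuzz test sample data that are more likely to trigger Ethernet/IP protocol vulnerabilities.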

3. The Ethernet/IP protocol fuzz testing method according to claim 1 or 2, wherein in step one, Ethernet/IP protocol communication data packets are collected manually from the industrial control system with the Wireshark packet capture tool to obtain the Ethernet/IP protocol sample library.

4. The Ethernet/IP protocol fuzz testing method according to claim 1 or 2, wherein in step two, the communication data packets in the Ethernet/IP protocol sample library are parsed, the headers of the other protocols are removed, the hexadecimal Ethernet/IP protocol sample data are extracted automatically, and these hexadecimal data are then converted element by element into decimal Ethernet/IP protocol sample data.

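5. The Ethernet/IP protocol fuzz testing method according to claim 1 or 2, wherein in step three, the plurality of different clustering-based construction methods comprise:

construction by message length classification: dividing the Ethernet/IP protocol sample data into different training sets according to message length, and performing data augmentation on the smaller training sets;

construction by function code classification: constructing a training set for each Ethernet/IP protocol function code type, and aligning and padding the data within each training set;

mixed construction: using all Ethernet/IP protocol sample data as a single training data set.

6. The Ethernet/IP protocol fuzz testing method according to claim 5, wherein in the construction method by message length classification, data augmentation comprises:

copying the data in the smaller training sets, and then applying data mutation operations such as value negation, value inversion or random value substitution to the copied data.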

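7. The Ethernet/IP protocol fuzz testing method of claim 5, wherein in the construction method by function code classification, there are 8 function code types, and the data in each training set are aligned and padded by the following relation (1):

$$
S_1 =
\begin{cases}
S_0, & \operatorname{len}(S_0) = \mathrm{MaxLen} \\
S_0 \,\Vert\, \underbrace{x\,x\cdots x}_{n}, & \operatorname{len}(S_0) < \mathrm{MaxLen}
\end{cases}
\qquad n = \mathrm{MaxLen} - \operatorname{len}(S_0) \tag{1}
$$

wherein S₀ is any original message, len() is the length function, MaxLen is the maximum message length in the current training set, x is the padding character with x = 'PAD', n is the padding length, and ∥ denotes concatenation; when len(S₀) = MaxLen, S₀ is kept in the training set unchanged; when len(S₀) < MaxLen, S₀ is removed from the training set and the padded message S₁ is added to it.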

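8. The Ethernet/IP protocol fuzz testing method according to claim 1 or 2, wherein in step four, a long short-term memory (LSTM) network is selected as the generator model network, a convolutional neural network (CNN) is selected as the discriminator model network, and Dropout is used to avoid model overfitting, wherein the generative adversarial network training comprises:

step 4.1, initializing the network parameters of the LSTM generator and the CNN discriminator;

step 4.2, feeding the training set data into the input layer of the LSTM generator, which passes them to the embedding layer and hidden layers, so that the generator preliminarily learns the feature distribution of the training data and its weight parameters are updated;

step 4.3, generating negative sample data with the LSTM generator obtained in step 4.2, taking real data as positive sample data, and training a binary-classification CNN discriminator on these data so that it preliminarily learns to distinguish generated data from real data, updating the weight parameters of the CNN discriminator;

step 4.4, adversarially training the LSTM generator and the CNN discriminator in alternating cycles.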

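9. The Ethernet/IP protocol fuzz testing method of claim 8, wherein in step 4.4, the cyclic adversarial training of the LSTM generator comprises:

step 4.51, iteratively training the LSTM generator and generating sequence data;

step 4.52, feeding the generated sequence data into the CNN discriminator to obtain the corresponding reward;

step 4.53, passing the reward back to the LSTM generator with a policy gradient algorithm to update the weight parameters of the LSTM generator;

the cyclic adversarial training of the CNN discriminator comprises:

step 4.61, iteratively training the CNN discriminator on the negative sample data generated by the updated LSTM generator together with the real sample data set;

step 4.62, updating the weight parameters until the model converges.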

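10. The Ethernet/IP protocol fuzz testing method according to claim 9, wherein generating the fuzz test sample data in step five comprises:

step 5.1, during the cyclic adversarial training of the LSTM generator and the CNN discriminator in step 4.4, saving the LSTM generator model once every 5 training epochs;

step 5.2, ordering the generator models saved during one complete adversarial training run by save time, and dividing them into early, middle and late models in the ratio 1:7:2;

step 5.3, extracting Ethernet/IP protocol data generated by the generator models of the three stages in the same ratio, as the fuzz test sample data.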