Disclosure of Invention
The invention provides a fuzz test case generation system based on a coupled autoencoder. By training the autoencoder, the system generates test cases that meet different test requirements, improves the acceptance rate of the test cases and the bug triggering rate, and, through continuous training and optimization, reduces manual effort and improves test efficiency.
To achieve this purpose, the invention provides the following scheme:
a fuzz test case generation system based on a coupled autoencoder, comprising: a sample set construction module, a coupled autoencoder module and a fuzz testing module;
the sample set construction module is used for generating a data training set; the data training set comprises a normal data training set and a malformed data training set; the normal data training set consists of normal data messages; the malformed data training set comprises an abnormal message set and a vulnerability information set;
the coupled autoencoder module is used for generating test cases through a coupled autoencoder based on the data training set;
the fuzz testing module is used for testing the test cases, detecting whether a test case triggers an abnormality, adding the abnormal test cases to the malformed data training set, and thereby optimizing the data training set.
Preferably, the coupled autoencoder module comprises a first autoencoder and a second autoencoder;
the first autoencoder and the second autoencoder each comprise an encoding unit and a decoding unit;
the encoding unit of the first autoencoder is used for compressing the data of the normal data training set to generate an internal representation of the normal data, and the decoding unit of the first autoencoder is used for restoring the internal representation of the normal data; the encoding unit of the second autoencoder is used for compressing the data of the malformed data training set to generate an internal representation of the malformed data, and the decoding unit of the second autoencoder is used for restoring the internal representation of the malformed data.
Preferably, the first autoencoder and the second autoencoder form the coupled autoencoder module by weight tying.
Preferably, the encoding unit and the decoding unit are both built from cascaded neural networks.
Preferably, the fuzz testing module further comprises an adaptation module and an anomaly analysis module;
the adaptation module is used for detecting and judging whether the test case meets preset conditions;
the anomaly analysis module is used for judging the abnormality type of the test case, adding the abnormal test case to the malformed data training set and optimizing the data training set.
The invention also discloses a fuzz test case generation method based on the coupled autoencoder, which comprises the following steps:
S1, generating the data training set through the sample set construction module;
S2, generating the test case through the coupled autoencoder module based on the data training set;
S3, detecting and judging the test cases through the fuzz testing module, and adding the abnormal test cases to the malformed data training set for optimizing the data training set.
Preferably, S2 further includes:
S2.1, tying and constraining the weights of the first autoencoder and the second autoencoder to construct a joint data distribution containing different attributes;
S2.2, compressing the data in the normal data training set through the encoding unit of the first autoencoder to generate the internal representation of the normal data;
S2.3, decoding and restoring the internal representation of the normal data generated in S2.2 through the decoding unit of the second autoencoder to generate the test case.
Preferably, S3 further includes:
S3.1, detecting whether the test case meets a preset condition through the adaptation module;
S3.2, judging the abnormality type of the test case through the anomaly analysis module;
S3.3, adding the abnormal test case to the malformed data training set through the anomaly analysis module for optimizing the data training set.
The invention has the following beneficial effects:
Addressing the problems of traditional autoencoder-based test methods, namely insufficiently malformed test cases, a low vulnerability triggering rate and low test efficiency, the invention provides a fuzz test case generation system and method based on a coupled autoencoder. The fuzz test cases are generated through learning and training of the coupled autoencoder, so that they satisfy the syntax requirements of the data protocol format while carrying the characteristics of abnormal or vulnerability data, which improves both the acceptance rate of the test cases and the vulnerability triggering rate. By continuously optimizing the data training set, the applicability of the test cases is continuously improved, test efficiency is improved, and manual effort is reduced. The method can be applied to a variety of industrial control network protocols and offers a feasible technical approach for different application fields.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The present embodiment takes the Modbus protocol as an example. Modbus is an application-layer messaging protocol that is widely used for communication among millions of automation devices. The protocol has a simple, open format and provides request/response communication between devices on different types of buses or networks, with the services offered specified by function codes. Modbus also provides messaging services over TCP/IP, connecting Modbus/TCP clients to server devices on a TCP/IP network. The Modbus protocol defines a simple Protocol Data Unit (PDU) that is independent of the underlying communication layers. Modbus data can be transmitted and identified on a TCP/IP network by prepending a dedicated Modbus Application Protocol header (MBAP header). The MBAP header is generated by the client and comprises a transaction identifier, a protocol identifier, a length field giving the number of following bytes, and a unit identifier. The PDU of a Modbus/TCP message comprises a function code and a data segment. The function code is an important component of the Modbus message and indicates the action to be performed. Function codes are divided into public function codes, user-defined function codes and reserved function codes. When the server receives normal data, the function code in its reply is the same as the function code of the request; when it receives abnormal data, the server replies with an exception function code. The data segment is either a request data segment or a response data segment. The request data segment contains the additional information needed to perform the operation defined by the function code, such as discrete and register addresses, the number of items to be processed and the number of actual data bytes in the field; the request data segment may also be empty.
When normal data is received, the response data segment carries the data requested; when abnormal data is received, the response data segment carries the exception code corresponding to the abnormal data, for example illegal function, illegal data value or slave device failure.
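The message layout described above can be illustrated with a short Python sketch. The helper names `build_modbus_tcp_frame` and `parse_modbus_tcp_frame` are hypothetical and are not part of the embodiment; the field widths follow the standard MBAP header (2-byte transaction identifier, 2-byte protocol identifier, 2-byte length, 1-byte unit identifier).

```python
import struct

def build_modbus_tcp_frame(transaction_id, unit_id, function_code, data=b""):
    """Assemble an MBAP header followed by the PDU (function code + data segment)."""
    pdu = bytes([function_code]) + data
    # The length field counts the bytes that follow it: unit identifier (1) + PDU.
    length = 1 + len(pdu)
    # Big-endian: transaction id (2 bytes), protocol id (2 bytes, 0 for Modbus),
    # length (2 bytes), unit identifier (1 byte).
    mbap = struct.pack(">HHHB", transaction_id, 0, length, unit_id)
    return mbap + pdu

def parse_modbus_tcp_frame(frame):
    """Split a frame into its MBAP fields, function code and data segment."""
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "length": length,
        "unit_id": unit_id,
        "function_code": frame[7],
        "data": frame[8:],
    }

# Example request: function code 0x03 with a start address of 0 and a quantity of 2.
request = build_modbus_tcp_frame(1, 1, 0x03, struct.pack(">HH", 0, 2))
fields = parse_modbus_tcp_frame(request)
```

A fuzz test case that keeps this header layout intact while altering the function code or data segment is more likely to pass the protocol parser of the device under test.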
As shown in FIG. 1, the present invention provides a fuzz test case generation system based on a coupled autoencoder, comprising: a sample set construction module, a coupled autoencoder module and a fuzz testing module.
The sample set construction module is used for generating the data training set. For each test object, normal industrial control network protocol messages that conform to the object's protocol are used as the normal data training set; on this basis, vulnerability data is added and expanded until it is sufficient to serve as the malformed data training set, which comprises an abnormal message set and a vulnerability information set capable of causing abnormal responses in industrial control equipment.
In this embodiment, abnormal/vulnerability data is added as input to the coupled autoencoder in addition to the normal data training set, so a large amount of both normal data and vulnerability data is required as training data. In practice, normal Modbus/TCP messages are easy to capture with network packet analysis software such as Wireshark and TShark. Vulnerability data, however, is very difficult to obtain, and the abnormal data that can be constructed from known vulnerabilities or network attack code is limited and insufficient for training the coupled autoencoder adequately; the abnormal or vulnerability data in the malformed data training set therefore needs to be expanded. In this embodiment, vulnerability data of known industrial control systems and illegal data that does not conform to the protocol format are used as the abnormal message set; vulnerabilities published in open vulnerability databases, vulnerabilities reproduced from public literature at home and abroad, and the like are used as the vulnerability data set; both are merged into the malformed data training set through data fusion to form the malformed database, which serves as the data samples for model training.
Further, the sample sets are divided into two types: the normal Modbus/TCP protocol message training set $M_{normal}$ and the abnormal/vulnerability training set $M_{abnormal}$. The abnormal/vulnerability training set $M_{abnormal}$ is in turn divided into two categories, the exception data set $M_{exception}$ and the vulnerability data set $M_{vulnerability}$, defined as follows:
Exception data set $M_{exception}$: the messages that return an exception code, among the test cases generated by this embodiment and by other traditional methods, are collected as the exception data set.
Vulnerability data set $M_{vulnerability}$: vulnerabilities published in open vulnerability databases, vulnerabilities reproduced from public literature at home and abroad, vulnerabilities found by other traditional vulnerability mining methods, and the like are used as the vulnerability data set.
After data set construction, two data sets are obtained for training: the normal Modbus/TCP protocol message data set $M_{normal}$, obtained with a network packet analysis tool, and the abnormal/vulnerability training set $M_{abnormal}$, composed of the exception data set $M_{exception}$ and the vulnerability data set $M_{vulnerability}$.
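The composition of the two training sets can be sketched as follows. This is only an illustration: the function names, the de-duplication step and the fixed-length byte encoding in `to_vector` are assumptions made for the sketch and are not specified by the embodiment.

```python
def dedupe(messages):
    """Drop repeated messages while keeping first-seen order (an assumption)."""
    seen = set()
    unique = []
    for msg in messages:
        if msg not in seen:
            seen.add(msg)
            unique.append(msg)
    return unique

def build_training_sets(normal_msgs, exception_msgs, vulnerability_msgs):
    """Return (M_normal, M_abnormal), where M_abnormal merges the exception
    data set and the vulnerability data set."""
    m_normal = dedupe(normal_msgs)
    m_abnormal = dedupe(list(exception_msgs) + list(vulnerability_msgs))
    return m_normal, m_abnormal

def to_vector(message, length):
    """Hypothetical encoding: pad or truncate the raw bytes to a fixed length
    and scale each byte to [0, 1] so it can be fed to a neural network."""
    padded = message[:length] + bytes(max(0, length - len(message)))
    return [b / 255.0 for b in padded]

m_normal, m_abnormal = build_training_sets(
    [b"\x01", b"\x01"], [b"\x02"], [b"\x03", b"\x02"]
)
```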
The coupled autoencoder module is used for generating test cases through a coupled autoencoder based on the data training set.
The encoding unit of an autoencoder extracts the features of the training data, and the feature vectors are combined to obtain the feature space of the training samples. In this embodiment, two types of feature spaces are obtained according to the type of sample data used: the first is the feature space of the normal data training set, and the second is the feature space of the abnormal/vulnerability data training set.
In this embodiment, the coupled autoencoder is composed of a pair of autoencoders, a first autoencoder and a second autoencoder, each consisting of an encoding unit and a decoding unit. By tying and constraining the weights of the two autoencoders, a joint data distribution is constructed that contains different attributes, namely the correct protocol syntax and the abnormal message or vulnerability information, so that the generated test cases have both the correct syntactic features of the protocol and the features of abnormal messages or vulnerabilities.
In this embodiment, each autoencoder is responsible for generating data in one domain. The encoding unit learns a representation of the input data and compresses the input into a compact internal representation; the parameters that produce the internal representation are shared, which allows the joint distribution of the different domains to be learned. The decoding unit converts the internal representation generated by the encoding unit into output data. The encoding unit and the decoding unit are generally built from cascaded artificial neural networks trained by semi-supervised or unsupervised learning.
The coupled autoencoder module is responsible for the design and training of the coupled autoencoder. The loss function of the coupled autoencoder includes both the difference between the generated data and the data in the normal data training set and the difference between the generated data and the data in the abnormal/vulnerability data training set. In this embodiment, by tying the weights of the two autoencoders, the coupled autoencoder learns the joint distribution of the normal data training set and the abnormal/vulnerability data training set over repeated training, so that it can generate test cases that both conform to the industrial control network protocol format and carry abnormal/vulnerability characteristics.
As shown in FIG. 2, in this embodiment the coupled autoencoder consists of a pair of autoencoders $AE_1$ and $AE_2$. Each autoencoder consists of an encoding unit and a decoding unit, both built from cascaded neural networks. Training the coupled autoencoder means training the encoding units $E_1$ and $E_2$ and the decoding units $D_1$ and $D_2$. The trained encoding unit $E_1$ compresses data in the normal sample set into the internal representation $code_1$, and the trained decoding unit $D_2$ restores the internal representation $code_2$, obtained by compressing data of the abnormal/vulnerability sample set, into an abnormal/vulnerability message. The trained encoding unit $E_1$ and decoding unit $D_2$ are then combined, that is, the internal representation of the normal sample set is decoded with the decoding unit of the abnormal/vulnerability sample set, which yields test cases that have both the normal Modbus/TCP protocol format and abnormal/vulnerability characteristics.
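The cross-domain decoding step can be sketched in plain Python. The sketch assumes single linear layers and hand-picked weights purely for illustration; the embodiment uses cascaded neural networks with trained weights, and the class `AutoEncoder` and function `generate_test_case` are hypothetical names. Weight tying is modelled here by letting both autoencoders hold the same encoder parameters.

```python
def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

class AutoEncoder:
    def __init__(self, enc_w, enc_b, dec_w, dec_b):
        self.enc_w, self.enc_b = enc_w, enc_b
        self.dec_w, self.dec_b = dec_w, dec_b

    def encode(self, x):
        return linear(self.enc_w, self.enc_b, x)

    def decode(self, code):
        return linear(self.dec_w, self.dec_b, code)

def generate_test_case(ae_normal, ae_abnormal, x_normal):
    """Encode a normal sample with E1, then decode the code with D2."""
    code_1 = ae_normal.encode(x_normal)
    return ae_abnormal.decode(code_1)

# Illustrative (untrained) parameters; the encoder is shared to mimic weight tying.
shared_enc_w, shared_enc_b = [[0.5, 0.5]], [0.0]
ae_1 = AutoEncoder(shared_enc_w, shared_enc_b, [[1.0], [1.0]], [0.0, 0.0])
ae_2 = AutoEncoder(shared_enc_w, shared_enc_b, [[1.0], [2.0]], [0.0, 1.0])

test_case = generate_test_case(ae_1, ae_2, [2.0, 4.0])
```

In a real system the output vector would still have to be mapped back to message bytes before it is sent to the device under test.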
In this embodiment, the loss function of the coupled autoencoder network is composed of two parts: one part is the difference between the data reconstructed from the normal Modbus/TCP protocol message training set and the original normal messages, and the other part is the difference between the data reconstructed from the abnormal/vulnerability training set and the original abnormal/vulnerability data, that is:
$$\text{Loss} = \sum \lVert X_1 - D_1(E_1(X_1)) \rVert^{2} + \sum \lVert X_2 - D_2(E_2(X_2)) \rVert^{2}$$
where $X_1$ and $X_2$ denote data from the normal data set and the abnormal/vulnerability data set, respectively.
The goal of coupled autoencoder training is to find the parameters that minimize the loss function, namely:
$$(E_1^{*}, E_2^{*}, D_1^{*}, D_2^{*}) = \arg\min_{E_1, E_2, D_1, D_2} \text{Loss}$$
The coupled autoencoder is designed according to this loss function, and the weights of the encoding units $E_1$ and $E_2$ are tied together.
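The two-part reconstruction loss can be written as a small Python function. This is a sketch under stated assumptions: the norm is taken to be the squared Euclidean distance, the encoders and decoders are passed in as plain callables, and the name `coupled_loss` is hypothetical.

```python
def squared_error(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def coupled_loss(normal_batch, abnormal_batch, e1, d1, e2, d2):
    """Sum of the reconstruction errors of both autoencoders."""
    loss_normal = sum(squared_error(x, d1(e1(x))) for x in normal_batch)
    loss_abnormal = sum(squared_error(x, d2(e2(x))) for x in abnormal_batch)
    return loss_normal + loss_abnormal

# Toy check: a perfect reconstruction contributes nothing to the loss,
# while a decoder that always outputs zeros contributes ||x||^2.
identity = lambda v: list(v)
zeros = lambda v: [0.0] * len(v)

perfect = coupled_loss([[1.0, 2.0]], [[3.0, 4.0]], identity, identity, identity, identity)
degraded = coupled_loss([[1.0, 2.0]], [[3.0, 4.0]], identity, identity, identity, zeros)
```

Training then amounts to adjusting the encoder and decoder parameters, with the encoder weights tied, until this value stops decreasing.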
Because the coupled autoencoder belongs to semi-supervised or unsupervised learning, weight tying allows it to learn from the constructed data sets a joint data distribution over different attributes, namely the correct protocol syntax and the abnormal message or vulnerability information, so that the generated test cases have both a legal protocol format and abnormal characteristics. The overall aim of training the coupled autoencoder is to optimize the loss function step by step through adversarial learning (learning the characteristics of data with different attributes) and thereby obtain the joint data distribution; the trained coupled autoencoder is then used to generate Modbus/TCP fuzz test cases, which are sent to the fuzz testing module.
The fuzz testing module is used for testing the test cases, detecting whether a test case triggers an abnormality, adding the abnormal test cases to the malformed data training set, and thereby optimizing the data training set.
In this embodiment, the adaptation module of the fuzz testing module includes a network protocol adapter module and an industrial control network protocol module, and is used for sending the test cases to the device under test for detection and discrimination. The anomaly analysis module of the fuzz testing module detects whether an anomaly occurs and judges whether the anomaly constitutes a vulnerability; it also adds the abnormal messages from each test round to the malformed data training set for the next round of model training.
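One possible shape of the anomaly analysis step is sketched below. The verdict labels and the functions `analyze_response` and `update_malformed_set` are hypothetical; the sketch relies only on the standard Modbus convention that an exception response sets the high bit of the function code and carries a one-byte exception code. Treating a missing reply as a candidate crash is an assumption of the sketch.

```python
EXCEPTION_CODES = {
    0x01: "illegal function",
    0x02: "illegal data address",
    0x03: "illegal data value",
    0x04: "slave device failure",
}

def analyze_response(request, response):
    """Classify the device's reaction to one test case.

    Both arguments are full Modbus/TCP frames (7-byte MBAP header + PDU);
    response is None when the device did not answer.
    """
    if response is None or len(response) < 8:
        return "no_response"  # possible crash or hang, worth manual review
    request_fc = request[7] if len(request) > 7 else None
    response_fc = response[7]
    if response_fc & 0x80:
        code = response[8] if len(response) > 8 else None
        return "exception:" + EXCEPTION_CODES.get(code, "unknown")
    if response_fc == request_fc:
        return "normal"
    return "unexpected"

def update_malformed_set(malformed_set, request, verdict):
    """Add test cases that triggered abnormal behaviour to the malformed set."""
    if verdict != "normal" and request not in malformed_set:
        malformed_set.append(request)
    return malformed_set

request = bytes.fromhex("000100000006010300000002")
exception_reply = bytes.fromhex("000100000003018302")
normal_reply = bytes.fromhex("00010000000701030400000000")

malformed = update_malformed_set([], request, analyze_response(request, exception_reply))
```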
As shown in FIG. 3, the present invention further provides a fuzz test case generation method based on a coupled autoencoder, which includes the following steps:
S1, generating the data training set through the sample set construction module;
In this embodiment, abnormal/vulnerability data is added as input to the coupled autoencoder in addition to the normal data training set, and normal Modbus/TCP messages are captured with network packet analysis software such as Wireshark and TShark to serve as the normal data training set. Vulnerability data of known industrial control systems and illegal data that does not conform to the protocol format are used as the abnormal message set; vulnerabilities published in open vulnerability databases, vulnerabilities reproduced from public literature at home and abroad, and the like are used as the vulnerability data set; both are merged into the malformed data training set through data fusion to form the malformed database, which serves as the data samples for model training.
S2, generating the test case through the coupled autoencoder module based on the data training set;
S2.1, tying and constraining the weights of the first autoencoder and the second autoencoder to construct a joint data distribution containing different attributes;
S2.2, compressing the data in the normal data training set through the encoding unit of the first autoencoder to generate the internal representation of the normal data;
S2.3, decoding and restoring the internal representation of the normal data generated in S2.2 through the decoding unit of the second autoencoder to generate the test case;
S3, detecting and judging the test cases through the fuzz testing module, and adding the abnormal test cases to the malformed data training set for optimizing the data training set.
S3.1, detecting whether the test case meets a preset condition through the adaptation module;
S3.2, judging the abnormality type of the test case through the anomaly analysis module;
S3.3, adding the abnormal test case to the malformed data training set through the anomaly analysis module for optimizing the data training set.
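Steps S1 to S3 form a feedback loop, which can be outlined as below. All four callables (`train`, `generate`, `send_to_device`, `is_abnormal`) are hypothetical placeholders for the modules described above, and the stubs at the end exist only to make the sketch executable; they do not model a real device.

```python
def fuzzing_round(m_normal, m_abnormal, train, generate, send_to_device, is_abnormal):
    """Run one round: train on both sets, generate and send test cases,
    and feed the abnormal ones back into the malformed training set."""
    model = train(m_normal, m_abnormal)       # S2.1: train the coupled model
    findings = []
    for sample in m_normal:
        case = generate(model, sample)        # S2.2 and S2.3: E1 then D2
        response = send_to_device(case)       # S3.1: deliver via the adapter
        if is_abnormal(case, response):       # S3.2: anomaly analysis
            findings.append(case)
            if case not in m_abnormal:        # S3.3: extend the malformed set
                m_abnormal.append(case)
    return findings

# Stub components for illustration only.
m_normal = [b"\x00", b"\x01"]
m_abnormal = []
findings = fuzzing_round(
    m_normal,
    m_abnormal,
    train=lambda normal, abnormal: None,
    generate=lambda model, sample: sample + b"\xff",
    send_to_device=lambda case: None if case[0] == 0 else b"ok",
    is_abnormal=lambda case, response: response is None,
)
```

Calling `fuzzing_round` repeatedly with the same `m_abnormal` list mirrors the continuous optimization of the training set described in S3.3.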
The above-described embodiments merely illustrate preferred embodiments of the present invention and do not limit its scope. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from the spirit of the invention shall fall within the protection scope defined by the claims.