CN111770078B - Active learning method and device for cyber-physical system and attack discovery method and device - Google Patents

Active learning method and device for cyber-physical system and attack discovery method and device

Info

Publication number
CN111770078B
CN111770078B (application CN202010591068.7A)
Authority
CN
China
Prior art keywords
feature vector
sequence
obtaining
objective function
value
Prior art date
Legal status
Active
Application number
CN202010591068.7A
Other languages
Chinese (zh)
Other versions
CN111770078A (en)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee
Anhui Xinxin Science And Technology Innovation Information Technology Co ltd
Xi'an Xinxin Zhixing Technology Co ltd
Original Assignee
Xi'an Xinxin Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xi'an Xinxin Information Technology Co ltd filed Critical Xi'an Xinxin Information Technology Co ltd
Priority to CN202010591068.7A priority Critical patent/CN111770078B/en
Publication of CN111770078A publication Critical patent/CN111770078A/en
Application granted granted Critical
Publication of CN111770078B publication Critical patent/CN111770078B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L 63/1416: Event detection, e.g. attack signature detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211: Selection of the most significant subset of features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an active learning method and device for a cyber-physical system and a method and device for discovering attacks. The active learning method comprises: obtaining a first feature vector; flipping elements at preset bit positions in the first feature vector to obtain a second feature vector; based on a pre-trained first model, obtaining a first predicted value after a preset time according to the second feature vector; splicing n1 different second feature vectors to obtain a first feature vector sequence, and splicing n1 different absolute differences to obtain a first difference sequence; applying roulette-wheel selection to the first difference sequence to select a second feature vector from the first feature vector sequence as a third feature vector; and, based on the pre-trained first model, obtaining a second model according to the third feature vector and a second actual value, after the preset time, corresponding to the third feature vector. The second model obtained by the invention can be used to actively identify attacks, thereby improving the defense capability of the CPS.

Description

Active learning method and device for cyber-physical system and attack discovery method and device
Technical Field
The invention belongs to the technical field of cyber-physical systems, and particularly relates to an active learning method and device for a cyber-physical system and a method and device for discovering attacks.
Background
A cyber-physical system (CPS) is a multi-dimensional complex system that integrates computation, networking, and the physical environment, and is often used to automate critical public infrastructure. Real-time perception, dynamic control, and information services for large engineering systems are realized through the organic integration and deep cooperation of 3C (Computing, Communication, and Control) technologies. A CPS realizes the integrated design of computation, communication, and physical processes, makes systems more reliable, efficient, and cooperative in real time, and has important and broad application prospects.
In view of the potential impact of network attacks on CPS, ensuring the security of CPS has become more important than ever before. However, the different time scales, patterns, and process interactions in a CPS pose a huge challenge and have led to a diversity of possible countermeasures. In recent years, several different research directions have emerged for detecting and protecting against CPS attacks. Popular methods include anomaly detection, which analyzes data logs (e.g., data from the historian) for suspicious events or patterns; digital fingerprinting, which checks sensors for spoofing by monitoring time- and frequency-domain characteristics of sensor and process noise; and invariant-based checking, which continuously monitors conditions on the system's processes and components. These techniques are complementary to, and go beyond, the built-in validation procedures installed on the CPS, which typically focus on simpler and more localized properties of the system. Currently, Morgan uses active learning to reduce the training time of a streaming data classifier, and Zhao and Hoi also use active learning to reduce the training time of a malicious URL (Uniform Resource Locator) classifier.
However, existing active learning methods all target classification rather than regression; they cannot actively identify attacks and therefore cannot improve the defense capability of the CPS.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides an active learning method and apparatus for a cyber-physical system, and a method and apparatus for discovering an attack. The technical problem to be solved by the invention is realized by the following technical scheme:
a method for active learning of cyber-physical systems, comprising:
acquiring a first feature vector, wherein the first feature vector comprises a plurality of first payloads;
flipping elements at preset bit positions in the first feature vector to obtain a second feature vector;
based on the first model which is trained in advance, obtaining a first predicted value after a preset time according to the second feature vector;
splicing n1 different second feature vectors to obtain a first feature vector sequence, and splicing n1 different absolute differences to obtain a first difference sequence, wherein an absolute difference is the absolute value of the difference between the first actual value and the first predicted value after the preset time;
using roulette selection on the first difference sequence to select a second feature vector from the first feature vector sequence as a third feature vector;
and obtaining a second model according to the third feature vector and a second actual value after the preset time corresponding to the third feature vector based on the pre-trained first model.
In one embodiment of the present invention, obtaining a first feature vector comprises:
sniffing a plurality of first data packets at a specific time point;
and connecting the first effective loads of the first data packets according to a first preset sequence to obtain a first characteristic vector.
In an embodiment of the present invention, the training method of the pre-trained first model includes:
sniffing a number of second packets at a particular point in time;
connecting the second effective loads of all the second data packets according to a second preset sequence to obtain a second characteristic vector sequence;
obtaining a first actual value sequence after a preset time according to the second effective load;
and inputting the second feature vector sequence and the first actual value sequence into an untrained first model to train the first model, so as to obtain the first model which is trained in advance.
In one embodiment of the invention, the first model comprises one of a linear model and a gradient boosting decision tree.
In an embodiment of the present invention, obtaining a second model according to the third feature vector and a second actual value corresponding to the third feature vector based on a first model that is trained in advance includes:
splicing a plurality of different third feature vectors according to a third preset sequence to obtain a third feature vector sequence;
correspondingly obtaining a plurality of second actual values after the preset time according to the plurality of different third feature vectors, and splicing the plurality of second actual values according to the third preset sequence to obtain a second actual value sequence;
and inputting the third feature vector sequence and the second actual value sequence into the pre-trained first model to obtain a second model.
One embodiment of the present invention provides an active learning apparatus for cyber-physical systems, including:
the device comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining a first feature vector which comprises a plurality of first effective loads;
the first turning module is used for turning elements with preset digits in the first feature vector to obtain a second feature vector;
the first predicted value generation module is used for obtaining a first predicted value after preset time according to the second feature vector based on a first model which is trained in advance;
a splicing module, for splicing n1 different second feature vectors to obtain a first feature vector sequence and splicing n1 different absolute differences to obtain a first difference sequence, wherein the absolute difference is the absolute value of the difference between the first actual value and the first predicted value after the preset time;
a selection module for using roulette selection for the first difference sequence to select a second feature vector from the first feature vector sequence as a third feature vector;
and the optimization module is used for obtaining a second model according to the third feature vector and a second actual value after the preset time corresponding to the third feature vector based on the pre-trained first model.
One embodiment of the present invention provides a method for discovery attack of a cyber-physical system, including:
step 3.1, obtaining a fourth feature vector, wherein the fourth feature vector comprises a plurality of third payloads;
step 3.2, obtaining a position sequence according to the importance degree of each element in the fourth feature vector, wherein the position sequence is used for storing the position sequence number of each bit in the fourth feature vector according to the importance degree, and the total number of bits of the fourth feature vector is N;
step 3.3, extracting the first K elements in the position sequence to form a first sequence set, wherein K is less than or equal to N;
step 3.4, a second sequence set is obtained according to the power set of the first sequence set, the second sequence set comprises a plurality of preset sets each containing n bit positions, and the elements in the fourth feature vector corresponding to a preset set have not yet been subjected to a bit flipping operation;
step 3.5, flipping elements at corresponding positions in the fourth feature vector according to the preset set to obtain a fifth feature vector;
step 3.6, inputting the fifth feature vector into the second model according to any one of the embodiments to obtain a second predicted value;
step 3.7, obtaining a target function value according to the second predicted value;
step 3.8, obtaining an undetermined feature vector according to the objective function value and the objective function threshold, wherein the undetermined feature vector is the currently most aggressive feature vector;
and 3.9, judging whether K is equal to N or not; if K is not equal to N, circularly executing steps 3.3 to 3.8, updating K to K+1 on each execution, until K is equal to N, so as to obtain the final undetermined feature vector.
In an embodiment of the present invention, obtaining an objective function value according to the second predicted value includes:
and obtaining the objective function value according to the second predicted value, the upper limit of the safety threshold and the lower limit of the safety threshold based on an objective function calculation formula.
In an embodiment of the present invention, obtaining the undetermined feature vector according to the objective function value and the objective function threshold includes:
and comparing the objective function value with an objective function threshold; if the objective function value is greater than the objective function threshold, taking the objective function value as the new objective function threshold and taking the fifth feature vector corresponding to the objective function value as the undetermined feature vector; if the objective function value is less than or equal to the objective function threshold, taking the fifth feature vector corresponding to the objective function threshold as the undetermined feature vector, wherein the initial value of the objective function threshold is 0.
One embodiment of the present invention provides an apparatus for discovery attack of a cyber-physical system, including:
the second obtaining module is used for obtaining a fourth feature vector, and the fourth feature vector comprises a plurality of third payloads;
a position sequence generating module, configured to obtain a position sequence according to importance degrees of elements in the fourth feature vector, where the position sequence is used to store position sequence numbers of bits in the fourth feature vector according to the importance degrees, where the total number of bits of the fourth feature vector is N;
the first sequence set generation module is used for extracting the first K elements in the position sequence to form a first sequence set, wherein K is less than or equal to N;
a second sequence set generating module, configured to obtain a second sequence set according to the power set of the first sequence set, where the second sequence set includes a plurality of preset sets each containing n bit positions, and the elements in the fourth feature vector corresponding to a preset set have not yet been subjected to a bit flipping operation;
the second turning module is used for turning elements at corresponding positions in the fourth feature vector according to the preset set to obtain a fifth feature vector;
a second predicted value generation module, configured to input the fifth feature vector into the second model obtained by the active learning method of any one of the above embodiments, to obtain a second predicted value;
the objective function value generating module is used for obtaining an objective function value according to the second predicted value;
the feature vector generating module is used for obtaining an undetermined feature vector according to the objective function value and the objective function threshold, wherein the undetermined feature vector is the currently most aggressive feature vector;
and the judging module is used for judging whether K is equal to N or not; if K is not equal to N, updating K to K+1 and controlling the first sequence set generation module to extract the first K elements of the position sequence according to the updated value of K to form a first sequence set, until K is equal to N, so as to obtain the final undetermined feature vector.
The invention has the beneficial effects that:
the second model obtained by the present invention, which belongs to a regression model, is capable of predicting future readings (e.g., sensors) from network packets and using the model to guide a search for payload operations (i.e., bit flipping) that are most likely to cause the CPS to enter an unsafe state, can be used to proactively identify attacks, which can improve the CPS's defense.
Drawings
FIG. 1 is a flow chart diagram of an active learning method for CPS according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an active learning apparatus for CPS according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for CPS attack discovery provided by an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for CPS discovery attack according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of an active learning method for CPS according to an embodiment of the present invention. The embodiment of the invention provides an active learning method for a network physical system, which comprises the following steps:
step 1.1, a first feature vector is obtained, wherein the first feature vector comprises a plurality of first effective loads.
Step 1.11, sniffing a plurality of first data packets at a specific time point.
Specifically, in this embodiment the first data packets are, for example, packets sniffed from a sensor using Raspberry Pis, and the specific time point is a time point selected according to need. From each sniffed data packet its payload can be extracted, and the real value corresponding to that payload, for example the actual reading of the sensor, can be obtained. There is one model for each sensor, so all first data packets come from one sensor.
For a clearer understanding of this embodiment, the CPS is taken to be SWaT (Secure Water Treatment platform), a scaled-down version of a real-world water purification plant that produces 5 gallons of drinking water per minute. SWaT is a complex multi-stage system involving chemical processes such as ultrafiltration, dechlorination, and reverse osmosis. SWaT communication is organized into a layered network hierarchy, with the lowest-level packets consisting of 16 different types of data, giving up to 2^2752 different payload combinations in total. Despite this huge search space, active fuzzing is effective at discovering packet-based flow, pressure, and over/underflow attacks, achieving coverage comparable to an established benchmark and to LSTM (Long Short-Term Memory network)-based fuzzers while requiring far less time, data, and expertise. Furthermore, active fuzzing targets the lowest level of the SWaT network hierarchy, so its operations can directly affect the actuators and bypass the logic checks of the system controllers.
The network of the SWaT testbed is organized into a layered hierarchy conforming to the ISA99 standard, providing different levels of segmentation and flow control. The upper layers of the hierarchy, layer 3 and layer 2, are responsible for operations management (e.g., the historian) and remote supervision (e.g., touch screens, engineering workstations), respectively. Layer 1 is a star network connecting the PLCs (Programmable Logic Controllers) and implements the Common Industrial Protocol (CIP) over EtherNet/IP (Internet Protocol). Finally, the lowest level of the hierarchy is layer 0, a ring network implementing EtherNet/IP that connects the individual PLCs to their associated sensors and actuators.
For SWaT, packets may be extracted from layer 0 of the network hierarchy, i.e., the packets exchanged between the PLCs and remote IO (Input/Output) devices. By targeting the lowest layer of the network (layer 0), it can be ensured that the performed operations directly change the state of the actuators. Thus, a network bridge may be used to physically connect several Raspberry Pis to the SWaT PLCs and sniff packets; in practice, an attacker may sniff packets in other ways, for example over a wireless connection when one is enabled. Since layer 0 implements the EtherNet/IP protocol, Tcpdump, a powerful network data collection and analysis tool on Linux, can be used to capture packets, and Scapy (a packet manipulation tool) can be used to further extract their contents.
Thus, the current sensor reading can be obtained by querying the SWaT historian. It is assumed that the historical data are trustworthy, i.e., the system is not simultaneously being attacked by another entity and is operating normally. For SWaT, the first data packets are, for example, packets collected from layer 0. They come from four stages, each of which has 4 types of sniffable data packets; the four stages are the supply and storage stage, the pre-treatment stage, the ultrafiltration and backwash stage, and the dechlorination stage, whose sensors are vulnerable, and the 4 packet types have lengths of 70, 92, 82 (source IP 192.168.0.S0), and 82 (source IP 192.168.0.S2), respectively. Across these four stages there are 16 different types of packets in total. To construct the feature vectors for training, the first packet of each type collected at a particular time point is selected as a first data packet.
Step 1.12, connecting the first payloads of the first data packets according to a first preset sequence to obtain a first feature vector.
Specifically, the first preset order may be a fixed arrangement of the first payloads in a certain order; the specific ordering of the first payloads is not limited in this embodiment. All first data packets are thus connected together in a fixed, specific order to obtain the first feature vector.
For example, for 16 different types of data packets at SWaT layer 0, their payloads can be concatenated together in a fixed order to obtain a feature vector comprising a 2752 bit sequence.
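For illustration, the following is a minimal Python sketch of this concatenation step, assuming the per-type payloads have already been captured (e.g., with Tcpdump/Scapy) and are available as raw bytes; the packet-type keys and payload values shown are hypothetical.

```python
from typing import Dict, List

def payload_to_bits(payload: bytes) -> List[int]:
    """Expand raw payload bytes into a list of bits (most significant bit first)."""
    return [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]

def build_feature_vector(payloads_by_type: Dict[str, bytes],
                         type_order: List[str]) -> List[int]:
    """Concatenate the payloads of each packet type in a fixed order
    and return the resulting bit sequence (the first feature vector)."""
    bits: List[int] = []
    for packet_type in type_order:
        bits.extend(payload_to_bits(payloads_by_type[packet_type]))
    return bits

# Hypothetical example: two packet types with short payloads.
payloads = {"stage1_len70": bytes([0x12, 0x34]), "stage2_len92": bytes([0xAB])}
order = ["stage1_len70", "stage2_len92"]
feature_vector = build_feature_vector(payloads, order)
print(len(feature_vector))  # 24 bits in this toy example (2752 for SWaT layer 0)
```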
And step 1.2, flipping elements at preset bit positions in the first feature vector to obtain a second feature vector.
Specifically, this embodiment guides the search for the payload operation most likely to cause the CPS to enter an unsafe state by means of bit flipping, so the elements at a preset number of bit positions in the first feature vector are flipped, and the flipped vector is the second feature vector. The preset number of bits may be set according to the specific feature vector; for example, it may be 1 to 4 bits, 5 bits, 10 bits, and so on.
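A minimal sketch of the bit flipping operation described above; the positions to flip are passed in explicitly and are hypothetical.

```python
from typing import Iterable, List

def flip_bits(feature_vector: List[int], positions: Iterable[int]) -> List[int]:
    """Return a copy of the feature vector with the bits at the given
    positions flipped (0 -> 1, 1 -> 0); all other bits are left unchanged."""
    flipped = list(feature_vector)
    for pos in positions:
        flipped[pos] ^= 1
    return flipped

# Example: flip a preset number of bits (here 3) of a toy 8-bit vector.
second_feature_vector = flip_bits([0, 1, 0, 0, 1, 1, 0, 1], positions=[0, 2, 5])
```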
And step 1.3, based on the first model which is trained in advance, obtaining a first predicted value after preset time according to the second feature vector.
Specifically, the training process of the first model is described first. The training method of the pre-trained first model includes:
step 1.31, sniffing a plurality of second data packets at a specific time point.
Specifically, the first data packet and the second data packet are both from the same sensor, and the sniffing manner is as shown in step 1.11, which is not described herein again.
And step 1.32, connecting the second payloads of all the second data packets according to a second preset sequence to obtain a second feature vector sequence.
Specifically, the second preset order may be a fixed arrangement of the second payloads in a certain order; the specific ordering of the second payloads is not limited in this embodiment. All second data packets are thus connected together in a fixed, specific order to obtain the second feature vector sequence.
And step 1.33, obtaining a first actual value sequence after a preset time according to the second payloads.
Specifically, for each second payload the actual value after a certain time can be obtained, for example by querying the corresponding sensor; thus, for the entire second feature vector sequence, a first actual value sequence is obtained after the preset time.
For example, for SWaT, a history of sensor values after a fixed period of time (i.e., a preset time) has elapsed may be queried, such as 5 seconds for flow and pressure sensors and 30 seconds for a tank level sensor because the tank level sensor changes state more slowly.
And step 1.34, inputting the second characteristic vector sequence and the first actual value sequence into an untrained first model to train the first model, so as to obtain the first model which is trained in advance.
Specifically, an untrained first model is selected, and the second feature vector sequence and the first actual value sequence are input into it. When the correlation coefficient r exceeds 0.3, the first model is considered to have started to converge, and the pre-trained first model is obtained. Here r is an index of the nature and closeness of the relation between two variables; the r score is the percentage of variation explained by the model and reflects the correlation between the predicted sensor value and its future actual value. For example, 230 minutes of data packets and sensor data may be collected, with the 180 minutes of data packets and sensor data used as the training set and the remaining 50 minutes used as the test set; the first model is first trained on the 180 minutes of data, and its r score is then computed on the test set.
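The following sketch illustrates this pre-training step with scikit-learn, assuming X_train/y_train and X_test/y_test hold the second feature vector sequence and the first actual value sequence split as described; the gradient boosting regressor is one of the two model choices mentioned below, and r is computed as the Pearson correlation between predictions and actual future readings. The toy data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def pretrain_first_model(X_train, y_train, X_test, y_test):
    """Fit the first model on normal traffic and report the correlation
    coefficient r between predicted and actual future sensor values."""
    model = GradientBoostingRegressor()          # or sklearn.linear_model.LinearRegression()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    r = np.corrcoef(predictions, y_test)[0, 1]   # Pearson correlation coefficient
    return model, r

# Hypothetical toy data: 200 feature vectors of 32 bits each.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 32))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)   # stand-in for future sensor readings
model, r = pretrain_first_model(X[:150], y[:150], X[150:], y[150:])
print("r =", r)  # the model is considered to be converging once r > 0.3
```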
Therefore, the first predicted value after the preset time can be obtained by inputting the second feature vector into the first model trained in advance. For example, for flow and pressure sensors, the pre-trained first models may predict their future values, and for tank level sensors, the pre-trained first models may predict their degree of change.
In this embodiment, each sensor corresponds to one first model, and the purpose of the first model is to enable active learning to converge. All data packets used to train the first model are normal data packets. Given a data packet payload as input, the pre-trained first model outputs the predicted sensor value after the preset time (i.e., a reading at a future time), so as to predict the future sensor reading. To achieve this, many system-specific decisions need to be made, for example the type of data packet used to train the model and a fixed time period suitable for the process involved (some processes change physical state faster than others). For these reasons, the first model is preferably a linear model or a gradient boosting decision tree (GBDT); both can be integrated with an existing active learning framework for regression, which is a key reason for selecting these two models.
Step 1.4, splicing n1 different second feature vectors to obtain a first feature vector sequence, and splicing n1 different absolute differences to obtain a first difference sequence, where an absolute difference is the absolute value of the difference between the first actual value and the first predicted value after the preset time.
Specifically, repeating steps 1.2 to 1.3 yields a plurality of second feature vectors; for example, repeating steps 1.2 to 1.3 yields n1 different second feature vectors, and splicing these n1 different second feature vectors gives the first feature vector sequence, where the different second feature vectors are obtained by flipping elements at different preset bit positions. In addition, each second feature vector corresponds to an actual value (i.e., a first actual value) after the preset time, so each first predicted value is subtracted from the corresponding first actual value and the absolute value of the difference is taken to obtain an absolute difference. All obtained absolute differences are spliced in the same order as the splicing order of the first feature vector sequence to obtain the first difference sequence; that is, the a-th second feature vector in the first feature vector sequence corresponds to the a-th absolute difference in the first difference sequence, where a is a positive integer.
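A sketch of step 1.4 under the same assumptions: n1 candidate second feature vectors are generated by flipping different preset bit positions, and each candidate's absolute difference between its actual value and the model's prediction is recorded. The observe_actual_value callable stands in for spoofing the packet and querying the historian, and is hypothetical; flip_bits is the helper from the earlier sketch.

```python
import numpy as np

def generate_candidates_and_differences(first_feature_vector, model,
                                        flip_position_sets, observe_actual_value):
    """Build the first feature vector sequence and the first difference sequence.

    flip_position_sets : list of bit-position lists, one per candidate (n1 in total)
    observe_actual_value : callable returning the actual sensor value after the
                           preset time for a given candidate (hypothetical stand-in)
    """
    candidates, abs_differences = [], []
    for positions in flip_position_sets:
        candidate = flip_bits(first_feature_vector, positions)   # second feature vector
        predicted = model.predict(np.array([candidate]))[0]       # first predicted value
        actual = observe_actual_value(candidate)                  # first actual value
        candidates.append(candidate)
        abs_differences.append(abs(actual - predicted))
    return candidates, abs_differences
```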
And step 1.5, a roulette selection method is used for the first difference value sequence to select a second feature vector from the first feature vector sequence as a third feature vector.
Specifically, the first feature vector sequence of this embodiment includes a plurality of second feature vectors, and one feature vector needs to be selected from them. This embodiment uses a roulette-wheel selection method, which assigns each second feature vector a selection probability according to its fitness; here the absolute differences in the first difference sequence are used as the fitness values. The roulette-wheel method generates a random number in the interval between 0 and the sum of the fitness values of the second feature vectors, then traverses the second feature vectors, accumulating their fitness values, until the cumulative fitness exceeds the random number; the second feature vector reached at that point is returned as the selected feature vector, i.e., the third feature vector. The probability of the i-th second feature vector being selected is

p_i = f_i / (f_1 + f_2 + … + f_n1),

where f_i is the fitness of the i-th second feature vector, i ranges from 1 to n1, and n1 is the total number of second feature vectors in the first feature vector sequence.
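A minimal sketch of the roulette-wheel selection just described, using the absolute differences as fitness values; the candidate vectors and fitness values in the example are hypothetical.

```python
import random

def roulette_select(candidates, fitness_values):
    """Select one candidate with probability proportional to its fitness
    (here, the absolute difference between actual and predicted values)."""
    total = sum(fitness_values)
    if total == 0:                      # degenerate case: all fitnesses are zero
        return random.choice(candidates)
    threshold = random.uniform(0, total)
    cumulative = 0.0
    for candidate, fitness in zip(candidates, fitness_values):
        cumulative += fitness
        if cumulative >= threshold:
            return candidate
    return candidates[-1]               # guard against floating-point rounding

# Example: the third feature vector is drawn from the candidate pool.
third_feature_vector = roulette_select(
    candidates=[[0, 1, 0], [1, 1, 0], [0, 0, 1]],
    fitness_values=[0.2, 1.5, 0.7],
)
```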
And step 1.6, based on the first model which is trained in advance, obtaining a second model according to the third feature vector and a second actual value after the preset time corresponding to the third feature vector.
Specifically, in this embodiment the pre-trained first model is trained further according to the third feature vector and the second actual value, after the preset time, corresponding to the third feature vector; the second model is obtained after this training, thereby realizing the active learning process of the first model.
Step 1.6 of this embodiment may specifically include step 1.61 to step 1.63, where:
and step 1.61, splicing a plurality of different third eigenvectors according to a third preset sequence to obtain a third eigenvector sequence.
Specifically, the step 1.4 is repeatedly executed to obtain a plurality of third feature vectors, and all the third feature vectors are spliced according to a certain sequence (i.e., a third preset sequence), so as to obtain a third feature vector sequence, where the third preset sequence may be, for example, a sequence of obtaining the third feature vectors.
Step 1.62, correspondingly obtaining a plurality of second actual values after preset time according to a plurality of different third eigenvectors, and splicing the plurality of second actual values according to a third preset sequence to obtain a second actual value sequence.
Specifically, each third eigenvector corresponds to an actual value (i.e., a second actual value) after a preset time, so that a second actual value sequence can be obtained after all the second actual values are spliced according to a third preset sequence.
And step 1.63, inputting the third feature vector sequence and the second actual value sequence into the first model which is trained in advance to obtain a second model.
Specifically, the obtained third feature vector sequence and the second actual value sequence are input into the first model trained in advance until the third feature vector sequence and the second actual value sequence converge, so that the second model subjected to active learning can be obtained.
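A sketch of steps 1.61 to 1.63 under the same assumptions: the selected third feature vectors and their observed second actual values are spliced onto the pre-training data and the model is fitted again to obtain the second model. Whether retraining refits from scratch or continues incrementally depends on the chosen model; the simple scikit-learn refit shown here is one option.

```python
import numpy as np

def refine_to_second_model(model, X_pretrain, y_pretrain,
                           third_feature_vectors, second_actual_values):
    """Splice the actively selected samples onto the pre-training data and
    refit the model, yielding the second (actively learned) model."""
    X = np.vstack([X_pretrain, np.array(third_feature_vectors)])
    y = np.concatenate([y_pretrain, np.array(second_actual_values)])
    model.fit(X, y)   # simple refit; incremental/warm-start updates are also possible
    return model
```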
After the pre-training phase of the first model is completed, a model is available that makes reasonable predictions about normal data packets in the CPS network. However, the attacks required to test the CPS do not necessarily consist of normal packets. Therefore, a larger sample set is needed to train the model further, but because of the overhead of the system and the huge search space, it cannot train blindly, e.g. for SWaT there can be 2^2752 potential feature vector combinations.
Thus, the present embodiment further trains the model using (online) active learning, a supervised machine learning approach that iteratively improves the current model. Active learning can exponentially reduce the amount of training data required: it reduces the amount of additional data needed by sampling examples that are estimated, to some extent, to modify the current model the most. In this embodiment, the construction of new packets is guided by flipping bits of existing packets (which is more conservative than constructing payload feature vectors from scratch, but minimizes the likelihood that a packet will be rejected). Once a new feature vector is sampled, its constituent packets can be spoofed in the CPS network to observe their effect on the sensor readings, and the model is retrained accordingly.
Although active learning for classification problems has been well studied, active learning frameworks for regression are limited, and the assumptions made by some existing frameworks (e.g., Gaussian distributions) are not suitable for the application in this embodiment. However, the Expected Model Change Maximization (EMCM) method proposed by Cai et al. avoids this assumption and is applicable to CPS. This framework samples new examples that are estimated to change the model the most, where the change is measured through the gradient of the model itself in the case of a linear model, or of a linear approximation of a GBDT built on "super features" extracted from its trees.
Therefore, for the above reasons this embodiment proposes Expected Behavior Change Maximization (EBCM). Rather than sampling the examples estimated to change the model the most, EBCM attempts to identify examples that explore behaviors differing as much as possible from how the system currently behaves. For example, if the reading of a sensor is currently predicted to increase, EBCM will attempt to identify examples that would cause the reading to decrease as much as possible. The intuition behind this approach is that exploring different behaviors in a particular environment provides more information; the approach also checks whether unusual packets that are predicted to cause some behavior actually do cause that behavior.
The second model obtained by this embodiment is a regression model that can predict future readings (e.g., of sensors) from network packets. It can be used to guide a search for the payload operations (i.e., bit flips) most likely to cause the CPS to enter an unsafe state, and can therefore actively identify attacks and improve the defense capability of the CPS.
The embodiment uses an active learning method to construct a regression model in the CPS system for the first time, rather than constructing the model for classification.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of an active learning apparatus for a cyber-physical system according to an embodiment of the present invention. The present embodiment further provides an active learning device based on the first embodiment, where the active learning device includes:
the first obtaining module is used for obtaining a first feature vector, and the first feature vector comprises a plurality of first payloads;
the first flipping module is used for flipping elements at preset bit positions in the first feature vector to obtain a second feature vector;
the first predicted value generation module is used for obtaining a first predicted value after preset time according to the second feature vector based on a first model which is trained in advance;
a splicing module, for splicing n1 different second feature vectors to obtain a first feature vector sequence and splicing n1 different absolute differences to obtain a first difference sequence, wherein the absolute difference is the absolute value of the difference between a first actual value and a first predicted value after a preset time;
a selection module for using roulette selection for the first difference sequence to select a second feature vector from the first feature vector sequence as a third feature vector;
and the optimization module is used for obtaining a second model according to the third feature vector and a second actual value after the preset time corresponding to the third feature vector based on the first model trained in advance.
In one embodiment of the present invention, the first obtaining module is configured to sniff a number of first data packets at a specific time point, and to connect the first payloads of the first data packets according to a first preset sequence to obtain the first feature vector.
In one embodiment of the present invention, the training method of the pre-trained first model includes: sniffing a number of second data packets at a specific time point; connecting the second payloads of all the second data packets according to a second preset sequence to obtain a second feature vector sequence; obtaining a first actual value sequence after a preset time according to the second payloads; and inputting the second feature vector sequence and the first actual value sequence into an untrained first model to train the first model, so as to obtain the pre-trained first model.
In one embodiment of the invention, the first model comprises one of a linear model and a gradient boosting decision tree.
In one embodiment of the present invention, obtaining a second model according to a third feature vector and a second actual value corresponding to the third feature vector, based on the pre-trained first model, includes: splicing a plurality of different third feature vectors according to a third preset sequence to obtain a third feature vector sequence; correspondingly obtaining a plurality of second actual values after the preset time according to the plurality of different third feature vectors, and splicing the plurality of second actual values according to the third preset sequence to obtain a second actual value sequence; and inputting the third feature vector sequence and the second actual value sequence into the pre-trained first model to obtain the second model.
The active learning apparatus provided in the embodiment of the present invention may implement the method embodiment of the first embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
EXAMPLE III
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for CPS to discover attacks according to an embodiment of the present invention. On the basis of the above embodiments, the present invention provides a method for discovering attacks on a network physical system, where the method includes:
and 3.1, acquiring a fourth feature vector, wherein the fourth feature vector comprises a plurality of third effective loads.
Specifically, the method for obtaining the fourth feature vector in this embodiment is the same as the method for obtaining the first feature vector in the first embodiment, and is not repeated herein.
And 3.2, obtaining a position sequence according to the importance degree of each element in the fourth feature vector, wherein the position sequence is used for storing the position serial number of each bit in the fourth feature vector according to the importance degree, and the total number of the bits of the fourth feature vector is N.
Specifically, the position sequence Φ may, for example, be arranged from the most important position to the least important position, with position indices ranging from 0 to N-1; Φ stores positions in the fourth feature vector. For example, if the position sequence is {3, 2, 5, 0, ...}, the first element 3 denotes the bit with index 3 in the fourth feature vector, which is the most important position, and 3 is the position of that bit in the fourth feature vector.
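A sketch of step 3.2, assuming the second model is a fitted scikit-learn GradientBoostingRegressor exposing feature_importances_; bit positions are ranked from most to least important to form the position sequence Φ.

```python
import numpy as np

def importance_position_sequence(model) -> list:
    """Return bit positions sorted from most to least important,
    based on the fitted model's per-feature importance scores."""
    importances = model.feature_importances_      # one score per bit position
    return list(np.argsort(importances)[::-1])    # indices in descending importance

# Example: phi = importance_position_sequence(second_model); phi[:K] gives the first sequence set.
```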
And 3.3, extracting the first K elements in the position sequence to form a first sequence set, wherein K is less than or equal to N.
In particular, Φ_K denotes the first sequence set, which stores the first K elements of the position sequence Φ.
And 3.4, obtaining a second sequence set according to the power set of the first sequence set, wherein the second sequence set comprises a plurality of preset sets each containing n bit positions, and the elements of the fourth feature vector corresponding to a preset set have not yet been subjected to a bit flipping operation.
Specifically, the power set is the family of all subsets of the original set (including the full set and the empty set); the power set of the first sequence set is P(Φ_K) = { C | C ⊆ Φ_K }. From it, the subsets containing exactly n bit positions are selected; each such subset is a preset set, denoted C. The position indices stored in a preset set correspond to positions in the fourth feature vector that have not yet been subjected to a bit flipping operation, and all preset sets together form the second sequence set.
In addition, a set Done is maintained in this embodiment: once a preset set C has been processed through the above steps, it is stored in Done, so that sets already contained in Done are not executed again the next time the bit flipping step is performed.
And 3.5, flipping elements at corresponding positions in the fourth feature vector according to the preset set to obtain a fifth feature vector.
Specifically, for each obtained preset set, the element corresponding to the position sequence number in the fourth feature vector is correspondingly flipped according to the position sequence number stored in the preset set, so as to obtain a fifth feature vector after bit flipping.
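A sketch of steps 3.3 to 3.5: the size-n subsets are drawn from the first K positions of Φ with itertools.combinations (equivalent to selecting the size-n members of the power set), sets already in Done are skipped, and each remaining set is used to flip the fourth feature vector. flip_bits is the helper from the earlier sketch.

```python
from itertools import combinations

def candidate_flip_vectors(fourth_feature_vector, phi, K, n, done):
    """Yield (preset_set, fifth_feature_vector) pairs for every size-n subset
    of the K most important positions that has not been processed yet."""
    first_sequence_set = phi[:K]                        # first K elements of the position sequence
    for preset in combinations(first_sequence_set, n):  # size-n subsets of the power set
        preset = frozenset(preset)
        if preset in done:                              # skip sets already handled
            continue
        done.add(preset)
        yield preset, flip_bits(fourth_feature_vector, preset)
```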
And 3.6, inputting the fifth feature vector into the second model in the first embodiment to obtain a second predicted value.
And 3.7, obtaining the objective function value according to the second predicted value.
Specifically, the present embodiment generates the candidate feature vector by flipping a fixed number of bits in the fourth feature vector, and then expands the search range to other relatively less important bits. When different candidates are generated, they are evaluated according to a simple objective function value that is maximized when the predicted sensor state is closer to the edge of its operating range.
Further, step 3.7 may comprise: obtaining the objective function value according to the second predicted value, the upper limit of the safety threshold, and the lower limit of the safety threshold, based on an objective function calculation formula. The formula is maximized as the predicted value approaches the edge of the safe operating range, and may, for example, take the following normalized form:

f(vs) = |vs - (Hs + Ls) / 2| / ds,

wherein vs represents the second predicted value, Hs represents the upper limit of the safety threshold, and Ls represents the lower limit of the safety threshold, and where ds represents half the width of the safe range:

ds = (Hs - Ls) / 2.
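A sketch of the objective computation, assuming the normalized-distance form given above; the safety limits in the example are hypothetical.

```python
def objective_value(predicted, high_limit, low_limit):
    """Objective grows as the predicted sensor value approaches (or crosses)
    the edge of the safe range [low_limit, high_limit]."""
    half_range = (high_limit - low_limit) / 2.0     # ds
    midpoint = (high_limit + low_limit) / 2.0
    return abs(predicted - midpoint) / half_range   # reaches 1 at the safety limits

# Example: a predicted tank level of 950 mm with safety limits [250, 1000] scores about 0.87.
print(objective_value(950.0, high_limit=1000.0, low_limit=250.0))
```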
and 3.8, obtaining the undetermined characteristic vector according to the objective function value and the objective function threshold, wherein the undetermined characteristic vector is the current most attacked characteristic vector.
Specifically, the magnitude of the objective function value and the magnitude of the objective function threshold are judged, if the objective function value is greater than the objective function threshold, the objective function value is used as a new objective function threshold, a fifth feature vector corresponding to the objective function value is used as an undetermined feature vector, if the objective function value is less than or equal to the objective function threshold, the fifth feature vector corresponding to the objective function threshold is used as the undetermined feature vector, and the initial value of the objective function threshold is 0.
In this embodiment, the initial value of the objective function threshold is set to 0. After the objective function value is obtained, the objective function threshold is updated by comparing the objective function value with the objective function threshold: when the objective function value is greater than the threshold, the fifth feature vector corresponding to the objective function value is taken as the undetermined feature vector; when the threshold is greater than or equal to the objective function value, the fifth feature vector corresponding to the threshold is kept as the undetermined feature vector. In this way the currently most aggressive feature vector can be determined.
And 3.9, judging whether K is equal to N; if K is not equal to N, steps 3.3 to 3.8 are executed in a loop, updating K to K+1 on each iteration, until K is equal to N, so as to obtain the final undetermined feature vector.
Specifically, after step 3.8 the relationship between the current value of K and N needs to be determined. If the current value of K is not equal to N, bit flipping operations have not yet been considered over all elements of the fourth feature vector, so the value of K in step 3.3 is updated to K+1 and steps 3.3 to 3.8 are executed in a loop; only the value of K in step 3.3 needs to be updated on each iteration. This continues until bit flipping operations have been performed over all elements of the fourth feature vector, and the undetermined feature vector finally obtained in step 3.8 is the vector most likely to cause the CPS to enter an unsafe state.
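Putting steps 3.3 to 3.9 together, the following sketch searches for the undetermined feature vector that maximizes the objective; it builds on the illustrative helpers above (candidate_flip_vectors, objective_value), and second_model is assumed to expose a scikit-learn-style predict method.

```python
def discover_attack(fourth_feature_vector, phi, n, second_model, high_limit, low_limit):
    """Search for the candidate vector most likely to push the CPS into an
    unsafe state by flipping n of the most important bits, widening K step by step."""
    N = len(phi)
    done = set()
    best_threshold = 0.0                   # initial objective function threshold
    pending_vector = fourth_feature_vector
    for K in range(n, N + 1):              # expand from the most important bits outward
        for _, fifth_vector in candidate_flip_vectors(fourth_feature_vector, phi, K, n, done):
            predicted = second_model.predict([fifth_vector])[0]     # second predicted value
            score = objective_value(predicted, high_limit, low_limit)
            if score > best_threshold:     # keep the currently most aggressive candidate
                best_threshold = score
                pending_vector = fifth_vector
    return pending_vector, best_threshold
```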
The present embodiment searches for candidate attacks by flipping the significant bits of the payload in the data packet, and uses the model obtained by active learning to identify which attacks will drive the system into an unsafe state.
The attack discovery method of this embodiment is an active fuzzing method. Fuzzing is a popular technique for automatically testing a system's defenses by providing invalid, unexpected, or random inputs and monitoring its responses; the method here is a fully automated way of finding a test suite of packet-level cyber attacks against the CPS.
In this embodiment, it is assumed that an attacker can intercept, analyze, and manipulate packets of the CPS, such as the packets exchanged over the layer 0 network of SWaT, without needing to know the meaning of their payloads in advance. Critically, this embodiment assumes that an attacker can observe real values during the operation of the testbed, i.e., can observe (or query) the effect of packet manipulations during active learning. While this embodiment does not assume that an attacker has access to a large number of data sets, it does assume that the attacker is able to observe values such as sensor readings for several consecutive minutes at a time, in order to carry out pre-training and allow active learning to converge. Without this assumption, the attacker cannot judge whether an attack has succeeded.
The active fuzzy method of the present embodiment is a black box method for automatically discovering packet level network attacks on the CPS. By iteratively constructing a regression model with active learning methods, the huge search space and resource cost problems are overcome and a new algorithm (EBCM) is proposed to guide this process by finding the most different behaviors.
Unlike existing fuzzing methods, active fuzzing is applied directly at the network packet level of a real and complex CPS, so no abstraction of the environment in a modeling language is required. In addition to achieving coverage comparable to an established benchmark and to LSTM-based fuzzers, it also significantly reduces time and data costs.
Example four
Referring to fig. 4, fig. 4 is a schematic structural diagram of an apparatus for CPS attack discovery according to an embodiment of the present invention. The present embodiment further provides, on the basis of the third embodiment, an apparatus for discovering an attack on a network physical system, where the apparatus includes:
the second obtaining module is used for obtaining a fourth feature vector, and the fourth feature vector comprises a plurality of third payloads;
the position sequence generating module is used for obtaining a position sequence according to the importance degree of each element in the fourth feature vector, and the position sequence is used for storing the position sequence number of each bit in the fourth feature vector according to the importance degree, wherein the total number of bits of the fourth feature vector is N;
the first sequence set generation module is used for extracting the first K elements in the position sequence to form a first sequence set, wherein K is less than or equal to N;
the second sequence set generating module is used for obtaining a second sequence set according to the power set of the first sequence set, the second sequence set comprises a plurality of preset sets each containing n bit positions, and the elements in the fourth feature vector corresponding to a preset set have not yet been subjected to a bit flipping operation;
the second flipping module is used for flipping elements at corresponding positions in the fourth feature vector according to the preset set to obtain a fifth feature vector;
a second predicted value generation module, configured to input the fifth feature vector into the second model obtained by the active learning method of any one of the above embodiments, so as to obtain a second predicted value;
the objective function value generating module is used for obtaining an objective function value according to the second predicted value;
the characteristic vector generating module is used for obtaining undetermined characteristic vectors according to the objective function values and the objective function threshold, and the undetermined characteristic vectors are the current most aggressive characteristic vectors;
and the judging module is used for judging whether K is equal to N or not; if K is not equal to N, updating K to K+1 and controlling the first sequence set generating module to extract the first K elements of the position sequence according to the updated value of K to form the first sequence set, until K is equal to N, so as to obtain the final undetermined feature vector.
In an embodiment of the present invention, the objective function value generating module is specifically configured to obtain the objective function value according to the second predicted value, the upper limit of the safety threshold, and the lower limit of the safety threshold, based on an objective function calculation formula.
In an embodiment of the present invention, the feature vector generation module is specifically configured to determine the sizes of the objective function value and the objective function threshold, if the objective function value is greater than the objective function threshold, use the objective function value as a new objective function threshold, and use a fifth feature vector corresponding to the objective function value as an undetermined feature vector, if the objective function value is less than or equal to the objective function threshold, use the fifth feature vector corresponding to the objective function threshold as an undetermined feature vector, where an initial value of the objective function threshold is 0.
The device for discovering attacks provided by the embodiment of the present invention can execute the method embodiment of the third embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device), or computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "module" or "system. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. A computer program stored/distributed on a suitable medium supplied together with or as part of other hardware, may also take other distributed forms, such as via the Internet or other wired or wireless telecommunication systems.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples described in this specification can be combined and combined by those skilled in the art.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments, and the invention is not intended to be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all such deductions and substitutions shall be considered as falling within the protection scope of the invention.

Claims (10)

1. An active learning method for cyber-physical systems, comprising:
obtaining a first feature vector, wherein the first feature vector comprises a plurality of first payloads;
flipping elements at preset bit positions in the first feature vector to obtain a second feature vector;
obtaining, based on a pre-trained first model, a first predicted value after a preset time according to the second feature vector;
splicing n1 different second feature vectors to obtain a first feature vector sequence, and splicing n1 different absolute differences to obtain a first difference sequence, wherein each absolute difference is the absolute value of the difference between the first actual value and the first predicted value after the preset time, n1 is the total number of second feature vectors in the first feature vector sequence, and n1 ≥ 2;
selecting a second feature vector from the first feature vector sequence as a third feature vector by roulette-wheel selection, using the absolute differences in the first difference sequence as the fitness;
and obtaining, based on the pre-trained first model, a second model according to the third feature vector and a second actual value after the preset time corresponding to the third feature vector.
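A minimal sketch of the roulette-wheel selection step in claim 1, written in Python. It assumes the second feature vectors and their absolute prediction errors are already available as parallel lists; the helper name and the use of the standard `random` module are illustrative choices, not taken from the patent.

```python
import random

def roulette_select(second_feature_vectors, abs_differences):
    """Fitness-proportionate selection: a feature vector is picked with
    probability proportional to its absolute prediction error, so samples the
    first model predicts poorly are more likely to be chosen for labelling."""
    total = sum(abs_differences)
    if total == 0:
        return random.choice(second_feature_vectors)  # all errors zero: pick uniformly
    r = random.uniform(0.0, total)
    cumulative = 0.0
    for vector, fitness in zip(second_feature_vectors, abs_differences):
        cumulative += fitness
        if cumulative >= r:
            return vector
    return second_feature_vectors[-1]  # guard against floating-point round-off
```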
2. The active learning method of claim 1, wherein obtaining a first feature vector comprises:
sniffing a plurality of first data packets at a specific time point;
and concatenating the first payloads of the first data packets in a first preset order to obtain the first feature vector.
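Claim 2 only states that payloads are concatenated in a preset order; one plausible realisation, assuming the sniffed payloads are available as byte strings and that the feature vector is kept at bit granularity so that the later bit-flipping steps apply directly, is sketched below.

```python
def payloads_to_feature_vector(payloads):
    """Concatenate packet payloads in the given (first preset) order and expand
    the result into a list of bits, most significant bit first."""
    data = b"".join(payloads)
    bits = []
    for byte in data:
        bits.extend((byte >> i) & 1 for i in range(7, -1, -1))
    return bits

# Example: two one-byte payloads become a 16-bit feature vector.
# payloads_to_feature_vector([b"\x01", b"\x80"]) -> [0,0,0,0,0,0,0,1, 1,0,0,0,0,0,0,0]
```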
3. The active learning method of claim 1, wherein the training method of the pre-trained first model comprises:
sniffing a plurality of second data packets at a specific time point;
concatenating the second payloads of all the second data packets in a second preset order to obtain a second feature vector sequence;
obtaining a first actual value sequence after the preset time according to the second payloads;
and inputting the second feature vector sequence and the first actual value sequence into an untrained first model to train the first model, so as to obtain the first model which is trained in advance.
4. The active learning method of any one of claims 1 to 3, wherein the first model comprises one of a linear model and a gradient boosted decision tree.
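Claims 3 and 4 together describe pre-training the first model on the sniffed feature vector sequence and the corresponding actual values, with the model being a linear model or a gradient boosted decision tree. A sketch of that step using scikit-learn is shown below; the choice of `GradientBoostingRegressor` with default hyperparameters is an assumption made only for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def pretrain_first_model(second_feature_vector_sequence, first_actual_value_sequence):
    """Fit the first model: inputs are the feature vectors built from sniffed
    payloads, targets are the actual values observed after the preset time."""
    X = np.asarray(second_feature_vector_sequence, dtype=float)
    y = np.asarray(first_actual_value_sequence, dtype=float)
    model = GradientBoostingRegressor()  # claim 4 also allows a linear model
    model.fit(X, y)
    return model
```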
5. The active learning method according to claim 1, wherein obtaining a second model according to the third feature vector and a second actual value corresponding to the third feature vector based on a first model trained in advance comprises:
splicing a plurality of different third feature vectors in a third preset order to obtain a third feature vector sequence;
obtaining a plurality of second actual values after the preset time corresponding to the plurality of different third feature vectors, and splicing the plurality of second actual values in the third preset order to obtain a second actual value sequence;
and inputting the third feature vector sequence and the second actual value sequence into the pre-trained first model to obtain a second model.
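How the pre-trained first model is turned into the second model in claim 5 is not detailed beyond feeding it the selected samples; the sketch below assumes an incrementally trainable linear first model (scikit-learn's `SGDRegressor` updated with `partial_fit`), which is one possible realisation rather than the patented one.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

def refine_to_second_model(first_model: SGDRegressor,
                           third_feature_vector_sequence,
                           second_actual_value_sequence):
    """Continue training the pre-trained first model on the actively selected
    (third) feature vectors and their actual values, yielding the second model."""
    X = np.asarray(third_feature_vector_sequence, dtype=float)
    y = np.asarray(second_actual_value_sequence, dtype=float)
    first_model.partial_fit(X, y)  # one incremental pass over the new samples
    return first_model
```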
6. An active learning apparatus for cyber-physical systems, comprising:
a first obtaining module, configured to obtain a first feature vector, wherein the first feature vector comprises a plurality of first payloads;
a first flipping module, configured to flip elements at preset bit positions in the first feature vector to obtain a second feature vector;
a first predicted value generation module, configured to obtain, based on a pre-trained first model, a first predicted value after a preset time according to the second feature vector;
a splicing module, configured to splice n1 different second feature vectors to obtain a first feature vector sequence, and to splice n1 different absolute differences to obtain a first difference sequence, wherein each absolute difference is the absolute value of the difference between the first actual value and the first predicted value after the preset time, n1 is the total number of second feature vectors in the first feature vector sequence, and n1 ≥ 2;
a selection module, configured to select a second feature vector from the first feature vector sequence as a third feature vector by roulette-wheel selection, using the absolute differences in the first difference sequence as the fitness;
and an optimization module, configured to obtain, based on the pre-trained first model, a second model according to the third feature vector and a second actual value after the preset time corresponding to the third feature vector.
7. A method for discovering attacks on a cyber-physical system, comprising:
step 3.1, obtaining a fourth feature vector, wherein the fourth feature vector comprises a plurality of third payloads;
step 3.2, obtaining a position sequence according to the importance of each element in the fourth feature vector, wherein the position sequence stores the position index of each bit of the fourth feature vector in order of importance, and the total number of bits of the fourth feature vector is N;
step 3.3, extracting the first K elements in the position sequence to form a first sequence set, wherein K is less than or equal to N;
step 3.4, obtaining a second sequence set according to the power set of the first sequence set, wherein the second sequence set comprises a plurality of preset sets of n positions each, and the elements of the fourth feature vector corresponding to a preset set have not yet undergone a bit-flipping operation;
step 3.5, flipping the elements at the corresponding positions of the fourth feature vector according to the preset set to obtain a fifth feature vector;
step 3.6, inputting the fifth feature vector into the second model of any one of claims 1 to 5 to obtain a second predicted value;
step 3.7, based on the objective function calculation formula, obtaining an objective function value according to the second predicted value;
step 3.8, obtaining an undetermined feature vector according to the objective function value and the objective function threshold, wherein the undetermined feature vector is the feature vector that currently constitutes the strongest attack;
and step 3.9, judging whether K equals N; if K does not equal N, executing step 3.3 to step 3.8 in a loop, updating K to K+1 on each pass, until K equals N, so as to obtain the final undetermined feature vector, wherein N is the last element.
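Steps 3.1 to 3.9 amount to a greedy search over bit-flip combinations drawn from the most important positions. The Python sketch below illustrates that loop; the importance ranking and the objective function are passed in as callables because their concrete forms are not given here, and all helper names are assumptions. Note that enumerating the full power set grows exponentially with K, so in practice K would be kept small or the enumeration truncated.

```python
from itertools import chain, combinations

def flip(bits, positions):
    """Return a copy of the bit vector with the given positions flipped."""
    flipped = list(bits)
    for p in positions:
        flipped[p] ^= 1
    return flipped

def discover_attack(fourth_vector, ranked_positions, second_model, objective_value):
    """Greedy search (steps 3.3-3.9): grow K over the importance-ranked positions,
    enumerate the power set of the top-K positions, flip those bits, score the
    resulting fifth feature vectors, and keep the best one as the pending vector."""
    N = len(ranked_positions)
    best_score, pending = 0.0, list(fourth_vector)   # objective threshold starts at 0
    for K in range(1, N + 1):
        top_k = ranked_positions[:K]
        subsets = chain.from_iterable(combinations(top_k, n) for n in range(1, K + 1))
        for subset in subsets:
            candidate = flip(fourth_vector, subset)  # fifth feature vector
            predicted = second_model.predict([candidate])[0]
            score = objective_value(predicted)
            if score > best_score:
                best_score, pending = score, candidate
    return pending
```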
8. The method of discovering attacks according to claim 7, wherein obtaining an objective function value according to the second predicted value comprises:
and obtaining the objective function value according to the second predicted value, the upper limit of the safety threshold and the lower limit of the safety threshold based on an objective function calculation formula.
9. The method of discovering attacks according to claim 7, wherein obtaining an undetermined feature vector according to the objective function value and an objective function threshold comprises:
comparing the objective function value with the objective function threshold; if the objective function value is greater than the objective function threshold, using the objective function value as the new objective function threshold and the fifth feature vector corresponding to the objective function value as the undetermined feature vector; if the objective function value is less than or equal to the objective function threshold, using the fifth feature vector corresponding to the objective function threshold as the undetermined feature vector, wherein the initial value of the objective function threshold is 0.
10. An apparatus for discovering attacks on cyber-physical systems, comprising:
a second obtaining module, configured to obtain a fourth feature vector, where the fourth feature vector includes a plurality of third payloads;
a position sequence generating module, configured to obtain a position sequence according to the importance of each element in the fourth feature vector, wherein the position sequence stores the position index of each bit of the fourth feature vector in order of importance, and the total number of bits of the fourth feature vector is N;
a first sequence set generation module, configured to extract the first K elements in the position sequence to form a first sequence set, wherein K is less than or equal to N;
a second sequence set generation module, configured to obtain a second sequence set according to the power set of the first sequence set, wherein the second sequence set comprises a plurality of preset sets of n positions each, and the elements of the fourth feature vector corresponding to each preset set have not yet undergone a bit-flipping operation;
a second flipping module, configured to flip the elements at the corresponding positions of the fourth feature vector according to the preset set to obtain a fifth feature vector;
a second predicted value generation module, configured to input the fifth feature vector to the second model according to any one of claims 1 to 5 to obtain a second predicted value;
an objective function value generating module, configured to obtain an objective function value according to the second predicted value, based on an objective function calculation formula;
a feature vector generating module, configured to obtain an undetermined feature vector according to the objective function value and the objective function threshold, wherein the undetermined feature vector is the feature vector that currently constitutes the strongest attack;
and a judging module, configured to judge whether K equals N; if not, to update K to K+1 and control the first sequence set generation module to extract the first K elements of the position sequence according to the updated value of K to form a new first sequence set, until K equals N, so as to obtain the final undetermined feature vector, wherein N is the last element.
CN202010591068.7A 2020-06-24 2020-06-24 Active learning method and device for network physical system and attack discovery method and device Active CN111770078B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010591068.7A CN111770078B (en) 2020-06-24 2020-06-24 Active learning method and device for network physical system and attack discovery method and device

Publications (2)

Publication Number Publication Date
CN111770078A CN111770078A (en) 2020-10-13
CN111770078B true CN111770078B (en) 2022-07-12

Family

ID=72722011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010591068.7A Active CN111770078B (en) 2020-06-24 2020-06-24 Active learning method and device for network physical system and attack discovery method and device

Country Status (1)

Country Link
CN (1) CN111770078B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109361648A (en) * 2018-08-31 2019-02-19 中国科学院信息工程研究所 The detection method and device of the concealed attack of industrial control system
CN110348622A (en) * 2019-07-02 2019-10-18 创新奇智(成都)科技有限公司 A kind of Time Series Forecasting Methods based on machine learning, system and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6384590B2 (en) * 2015-03-26 2018-09-05 日本電気株式会社 Learning model generation system, method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address
Address after: Room 533, 5th Floor, Building A3A4, Phase I, Zhong'an Chuanggu Science and Technology Park, No. 900 Wangjiang West Road, High tech Zone, Hefei City, Anhui Province, 230031
Patentee after: Anhui Xinxin Science and Technology Innovation Information Technology Co.,Ltd.
Country or region after: China
Address before: 11/F, Building B2, Yunhuigu, 156 Tiangu 8th Road, Software New Town, Yuhua Street Office, High tech Zone, Xi'an, Shaanxi 710000
Patentee before: Xi'an Xinxin Information Technology Co.,Ltd.
Country or region before: China
TR01 Transfer of patent right
Effective date of registration: 20240415
Address after: 710000, Room 201, Building B2, Yunhuigu, Software New City, High tech Zone, Xi'an City, Shaanxi Province
Patentee after: Xi'an Xinxin Zhixing Technology Co.,Ltd.
Country or region after: China
Address before: Room 533, 5th Floor, Building A3A4, Phase I, Zhong'an Chuanggu Science and Technology Park, No. 900 Wangjiang West Road, High tech Zone, Hefei City, Anhui Province, 230031
Patentee before: Anhui Xinxin Science and Technology Innovation Information Technology Co.,Ltd.
Country or region before: China