CN114863226A

CN114863226A - Network physical system intrusion detection method

Info

Publication number: CN114863226A
Application number: CN202210446927.2A
Authority: CN
Inventors: 王振东; 李泽煜; 陈潇潇; 杨书新; 王俊岭; 李大海
Original assignee: Jiangxi University of Science and Technology
Current assignee: Jiangxi University of Science and Technology
Priority date: 2022-04-26
Filing date: 2022-04-26
Publication date: 2022-08-05

Abstract

A network physical system intrusion detection method, comprising: preprocessing an intrusion detection data set, wherein the preprocessing includes digitizing character-type data, normalizing the data, and handling data imbalance; selecting the optimal feature subset of the preprocessed intrusion detection data set using a binary grey wolf optimization algorithm; pre-training a teacher network model on the selected optimal feature subset; training the intrusion detection model by initializing the parameters of the intrusion detection model and determining the structure of the student network model, inputting two groups of network traffic of different categories into the intrusion detection model for training based on the optimal feature subset, and adjusting the error during K-fold cross training according to the knowledge distillation loss until the student network model converges; and testing the intrusion detection model to obtain a classification result for each data record. The invention achieves lightweight, real-time, unsupervised intrusion detection for the Internet of Things, reduces excessive dependence on labels, and improves generalization capability.

Description

Network physical system intrusion detection method

Technical Field

The invention belongs to the technical field of industrial networks, and particularly relates to a network physical system intrusion detection method.

Background

A Cyber Physical System (CPS) is a mechanism controlled or monitored by computer-based algorithms, in which the entire system is tightly integrated with a network; the term generally refers to a large-scale, geographically dispersed, complex and heterogeneous Internet of Things. In recent years, the development and deployment of various types of network physical systems have grown exponentially and have had a great influence on many aspects of daily life, such as power grids, transportation systems, healthcare equipment and household appliances. Many such systems are deployed in critical infrastructure, life support devices, or other places of vital importance to daily life.

However, the diversity of CPS applications deployed across networks in the Internet of Things makes them vulnerable to cyber and physical attacks at different levels of the system, particularly with respect to message transmission in smart manufacturing processes. This introduces safety hazards into CPS applications, which can cause a program to go out of control and injure the people who rely on it. Industrial CPS places great importance on communication and network capability: it acquires state data of physical-world objects in real time through networks and interfaces and sends the data to a server, and the server processes the received data and returns the result to the physical terminal equipment, which changes its behavior accordingly. Typically, an attacker intrudes into the CAN bus network and hijacks the data sent to the server, thereby compromising the equipment of the industrial CPS. A supervisory control and data acquisition (SCADA) system monitors and collects the signals generated across the network (such as vibration, temperature, and TX and RX packet data), and a Deep Learning (DL) based anomaly detection module is deployed there to identify anomalies.

An Intrusion Detection System (IDS) can detect intrusion behaviors that other security mechanisms cannot prevent, and plays an important role in protecting the CPS as a second line of defense. According to the data source, intrusion detection systems can be divided into host-based intrusion detection and network-based intrusion detection. Host-based intrusion detection monitors only hosts, must be installed on each host, cannot observe network traffic, and cannot analyze network-related behavior information. Network-based intrusion detection observes and analyzes real-time network traffic and monitors multiple hosts; it collects packet information and inspects packet contents in order to detect intrusion behaviors in the network. Modern artificial intelligence technology, including intelligent sensing and intelligent control, is widely used for behavior monitoring in intelligent manufacturing. However, detecting abnormal traffic in industrial CPS still presents several challenges. First, the hybrid network physical environment built on cloud infrastructure is a large and complex distributed system, in which a large number of industrial data streams (e.g., instructions, accelerometer readings, video and images) are generated by various physical systems and sensors. Second, abnormal events occur in the real world with low probability, which results in a lack of well-labeled data for model training. Moreover, monitoring data may be missing for various reasons, such as sensor failure or data transmission errors, which makes data acquisition and model training more difficult and anomaly detection harder to implement. Furthermore, nodes in an Internet of Things network are mostly deployed on resource-constrained devices, e.g., devices with limited power, computing, communication and storage capabilities. To reduce the damage caused by malicious attacks on industrial CPS, accurate and timely real-time anomaly detection is generally required, so that the data streams obtained and transmitted by distributed nodes at different levels of the system can be monitored as a whole.

In summary, compressing the size of the intrusion detection model without reducing its efficiency, while also improving its generalization ability, is of practical significance.

Disclosure of Invention

Therefore, the invention provides an Internet of Things intrusion detection method based on self-supervised learning and self-knowledge distillation, which achieves lightweight, real-time, unsupervised intrusion detection for the Internet of Things, reduces excessive dependence on labels, and improves generalization capability.

In order to achieve the above purpose, the invention provides the following technical scheme: a network physical system intrusion detection method comprises the following steps:

(1) carrying out data preprocessing on the intrusion detection data set, wherein the data preprocessing comprises character type data digitization processing, data normalization processing and data unbalance processing;

(2) selecting the optimal feature subset of the preprocessed intrusion detection data set through a binary grey wolf optimization algorithm;

(3) pre-training the teacher network model according to the selected optimal feature subset;

(4) training a KD-TCNN intrusion detection model:

(41) initializing KD-TCNN intrusion detection parameters and determining the structure of a student network model;

(42) inputting two groups of network flows of different categories into the KD-TCNN intrusion detection model for training based on the optimal feature subset;

(43) adjusting the error of the K-fold cross training process according to the knowledge distillation loss until the student network model converges;

(5) testing the KD-TCNN intrusion detection model by inputting the preprocessed test data set into the student network to obtain a classification result for each data record.

As a preferred scheme of the network physical system intrusion detection method, in step (1), the intrusion detection data set comprises the NSL-KDD data set, and the digitization of character-type data converts the character-type features of the NSL-KDD data set into numerical data.

As a preferred scheme of the intrusion detection method of the cyber-physical system, in step (1), the data are normalized according to their actual distribution, using the following normalization formula:

where $x_i$ is the i-th feature value in the original data, and the remaining symbols denote, in order, the minimum value of the i-th feature, the maximum value of the i-th feature, and the normalized result.

As a preferred scheme of the intrusion detection method of the cyber-physical system, in step (2), the best solution is named α, the second and third best solutions are named β and δ respectively, the remaining candidate solutions are denoted ω, and the grey wolf optimization algorithm comprises:

a prey-encircling stage: establishing a mathematical model of the encircling behavior;

a hunting stage: the hunt is guided by α, and β and δ may also participate; the remaining ω wolves update their positions according to the positions of the best search agents;

a prey-attacking stage: simulating the approach to the prey by linearly updating the parameter α in each iteration;

a feature subset evaluation stage: a convolutional neural network is used as the learning algorithm and a fitness function is used to evaluate each grey wolf position; the feature subset with the lowest fitness value is selected, achieving feature selection and dimension reduction and yielding the optimal feature subset.

As a preferred scheme of the intrusion detection method of the cyber-physical system, in step (42), a knowledge distillation framework based on a triplet convolutional neural network is adopted for training the KD-TCNN intrusion detection model.

As a preferred scheme of the intrusion detection method of the network physical system, in step (42), three losses are considered in the design of the loss function: the triplet loss $L_{triplet}$ based on the distances between the anchor sample and the positive and negative samples, the cross-entropy loss $L_{hard}$ between the student network output and the labels, and the KL divergence loss $L_{soft}$ between the teacher and student networks.

As a preferred scheme of the intrusion detection method of the network physical system, in order to constrain the difference between the probability distribution output by the student network model and the real labels, the cross-entropy loss $L_{hard}$ between the student network model output and the real labels is used as part of the model loss function.

As a preferred scheme of the intrusion detection method of the network physical system, coefficients are added to the loss terms to adjust the contribution of each loss to the overall loss function, and the loss function L of the model is defined as follows:

$$L = L_{KD} + \theta L_{triplet}$$

where θ is the balance coefficient that controls the relative weight of the knowledge distillation loss and the triplet loss during model training, $L_{KD}$ is the knowledge distillation loss, and $L_{triplet}$ is the triplet loss based on the distances between the anchor sample and the positive and negative samples.

As a preferred scheme of the intrusion detection method of the network physical system, the knowledge distillation framework based on the triplet convolutional neural network adopts depthwise separable convolution.

As a preferred scheme of the intrusion detection method of the cyber-physical system, in step (43), the K-fold cross training process includes:

(431) defining a model and a learning rate, and dividing a data set into a training data set and a testing data set;

(432) dividing the training data set into K parts, taking one part as a verification set, and taking the other K-1 parts as a training set;

(433) defining a gradient optimizer with a decaying learning rate, training the model on the K-1 training parts, and evaluating the model on the remaining part;

(434) repeating step (433) K times to obtain the optimal model, and obtaining the performance indexes of the optimal model on the test data set.

The invention has the following advantages: data preprocessing is performed on the intrusion detection data set, including digitization of character-type data, data normalization and data imbalance processing; the optimal feature subset of the preprocessed intrusion detection data set is selected through a binary grey wolf optimization algorithm; the teacher network model is pre-trained on the selected optimal feature subset; the KD-TCNN intrusion detection model is trained by initializing the KD-TCNN intrusion detection parameters and determining the structure of the student network model, inputting two groups of network traffic of different categories into the KD-TCNN intrusion detection model for training based on the optimal feature subset, and adjusting the error of the K-fold cross training process according to the knowledge distillation loss until the student network model converges; and the KD-TCNN intrusion detection model is tested by inputting the preprocessed test data set into the student network to obtain a classification result for each data record. The invention uses knowledge distillation to make the output of the student model as close as possible to that of the teacher model, so that the student network learns the inter-class information contained in the teacher network, can process and analyze large-scale data in real time, and has fewer parameters. The difference between the outputs of the teacher network model and the student network model is reduced, which improves the performance of the student model. By adopting depthwise separable convolution, the invention further reduces the parameter count and the amount of computation of the model, so that the intrusion detection model can be deployed on nodes with limited computing capability in an Internet of Things network, and the intrusion detection time is reduced to achieve real-time detection. Verification results show that the method outperforms traditional deep learning models in parameter count and other performance indexes.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary and that other implementation drawings may be derived from the provided drawings by those of ordinary skill in the art without inventive effort.

Fig. 1 is a schematic flowchart of an intrusion detection method for a cyber-physical system according to an embodiment of the present invention;

Fig. 2 is a flowchart of feature selection by the binary grey wolf optimization algorithm in the intrusion detection method for the cyber physical system according to the embodiment of the present invention;

Fig. 3 shows the knowledge distillation framework based on a triplet convolutional neural network in the intrusion detection method for a cyber physical system according to an embodiment of the present invention.

Detailed Description

The present invention is described below by way of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from this disclosure. The described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.

With reference to fig. 1, the intrusion detection method for a network physical system provided by the present invention includes:

(4) training a KD-TCNN intrusion detection model:

In this embodiment, a binary grey wolf optimization algorithm is used to select the optimal feature subset. The grey wolf optimization algorithm is a swarm intelligence algorithm that simulates the hunting behavior of grey wolves: hunting tasks such as encircling, hunting and attacking are assigned to groups of grey wolves of different ranks according to their social hierarchy, and the pack completes the hunt cooperatively, thereby realizing a global optimization process.

To model the social hierarchy of the wolves in the grey wolf optimization algorithm, the present invention names the best solution α, the second and third best solutions β and δ respectively, and denotes the remaining candidate solutions ω. In the grey wolf optimization algorithm, the hunting process is guided by the three wolves α, β and δ, and the ω wolves follow these three. The specific steps of the grey wolf optimization algorithm are as follows:

Step 1. Encircling-prey stage:

Before the wolf pack can catch the prey, it must first encircle it. The encircling behavior is modeled by the following equations:

where t is the iteration number, $\vec{A}$ and $\vec{C}$ are coefficient vectors, $\vec{x}_p$ is the position of the prey, $\vec{x}$ is the position of the grey wolf, α decreases linearly from 2 to 0 over the iterations, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors in the range [0, 1].

Step 2. Hunting stage:

Hunting is usually guided by the α wolf, and the β and δ wolves may also participate. It is assumed that α, β and δ have better knowledge of the potential location of the prey, so the other wolves (including the ω wolves) are required to update their positions according to the positions of the best search agents. The position update formula is as follows:

where $\vec{x}_\alpha$, $\vec{x}_\beta$ and $\vec{x}_\delta$ are the three best solutions in the population at a given iteration t; $\vec{A}_1$, $\vec{A}_2$ and $\vec{A}_3$ are defined by formula (3); $\vec{D}_\alpha$, $\vec{D}_\beta$ and $\vec{D}_\delta$ are defined by equations (9) to (11), respectively; and $\vec{C}_1$, $\vec{C}_2$ and $\vec{C}_3$ are defined by formula (4).

Step 3. Attacking-prey stage:

When the prey stops moving, the wolves complete the hunt by attacking. To model the approach to the prey, the parameter α (here the convergence parameter of the algorithm, not the α wolf) is updated linearly in each iteration according to equation (12), decreasing from 2 to 0:

$$\alpha = 2 - t\,\frac{2}{\mathit{MaxIter}} \tag{12}$$

where t is the current iteration number and MaxIter is the maximum number of iterations allowed for the optimization.
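As a rough illustration of Steps 1 to 3, the sketch below implements one iteration of the continuous grey wolf optimizer in plain Python. It assumes the commonly published form of the update (coefficients A = 2a·r1 − a and C = 2·r2, distance D = |C·x_leader − x|, and the new position taken as the mean of the three leader-guided moves), which may differ in detail from the formulas of this embodiment. The names `convergence_parameter` and `gwo_step` are hypothetical.

```python
import random


def convergence_parameter(t, max_iter):
    """Equation (12): the parameter (written alpha in the text) falls linearly from 2 to 0."""
    return 2 - t * (2 / max_iter)


def gwo_step(wolves, fitness, t, max_iter, rng=random):
    """One iteration of a continuous grey wolf optimizer (assumed standard form).

    wolves  : list of positions, each a list of floats
    fitness : function mapping a position to a value to be minimized
    """
    # The three lowest-fitness wolves act as the alpha, beta and delta leaders.
    leaders = sorted(wolves, key=fitness)[:3]
    a = convergence_parameter(t, max_iter)
    new_wolves = []
    for x in wolves:
        new_x = []
        for d in range(len(x)):
            moves = []
            for leader in leaders:
                r1, r2 = rng.random(), rng.random()
                A = 2 * a * r1 - a                 # assumed form of formula (3)
                C = 2 * r2                         # assumed form of formula (4)
                D = abs(C * leader[d] - x[d])      # distance to this leader
                moves.append(leader[d] - A * D)    # move guided by this leader
            new_x.append(sum(moves) / 3)           # average of the three moves
        new_wolves.append(new_x)
    return new_wolves
```

Early in the run the parameter is close to 2, so |A| can exceed 1 and wolves may move away from the leaders (exploration); as it approaches 0 the pack contracts onto the average of the three leaders (exploitation).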

In the binary grey wolf optimization algorithm, the wolf position update is a function of three position vectors, namely $x_\alpha$, $x_\beta$ and $x_\delta$, which attract each wolf toward the three best solutions. In the binary grey wolf optimization algorithm, the solution pool is in binary form at any given time, with all solutions located at the corners of a hypercube. The present invention employs the second variant of the binary grey wolf optimization algorithm, bGWO2, in which only the updated grey wolf position vector is made binary. The grey wolf position update formula is shown in formula (13):

where rand is a random number drawn from a uniform distribution on [0, 1], $x_d^{t+1}$ is the updated binary position in dimension d at iteration t, and the sigmoid function is defined as follows:

The binary grey wolf optimization algorithm searches the feature space adaptively to find the optimal feature subset, that is, the feature subset with the highest classification performance and the fewest selected features. The fitness function used to evaluate grey wolf positions in the binary grey wolf optimization is shown in equation (15):

where P is the classification accuracy, L is the number of features in the selected feature subset, N is the total number of features, and α and β are the weights of the classification accuracy and of the number of selected features, respectively, with α ∈ [0, 1] and β = 1 − α.

The optimal feature subset of the network traffic data is selected on the training subset. In the feature subset evaluation stage, a convolutional neural network is used as the learning algorithm and formula (15) as the fitness function. The feature subset with the lowest fitness value is selected to achieve feature selection and dimension reduction, yielding the feature subset with the best classification performance. Fig. 2 shows the flowchart of the whole feature selection process of the binary grey wolf optimization algorithm.
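The binarization step and the fitness evaluation can be sketched as follows. This is only an illustration under stated assumptions: the sigmoid is taken in the steep form 1 / (1 + exp(−10(x − 0.5))) used in the published bGWO2 variant, and the fitness is written as α(1 − P) + β·L/N so that lower values are better, which matches the statement that the subset with the lowest fitness is selected. The default weight 0.99 and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Steep sigmoid centred at 0.5 (assumed form of the sigmoid in the text)."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))


def binary_position(x1, x2, x3, rng=random):
    """Binarize the mean of the three leader-guided positions, one bit per feature.

    A bit is set to 1 (feature selected) when the sigmoid of the averaged
    position is at least as large as a uniform random number in [0, 1].
    """
    return [
        1 if sigmoid((a + b + c) / 3.0) >= rng.random() else 0
        for a, b, c in zip(x1, x2, x3)
    ]


def fitness(accuracy, n_selected, n_total, alpha=0.99):
    """Assumed fitness: weighted error rate plus weighted fraction of kept features."""
    beta = 1.0 - alpha
    return alpha * (1.0 - accuracy) + beta * (n_selected / n_total)
```

In the method described above, `accuracy` would come from training and evaluating the convolutional neural network on the features selected by the binary position vector; that step is omitted here.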

Specifically, knowledge distillation is a common method of model compression. In a Teacher-Student framework, the feature representation (the "knowledge") learned by a complex teacher network with strong learning capacity is distilled and transferred to a student network with few parameters and weaker learning capacity, yielding a student network that is fast, capable and small. In addition, by making the output of the student model as close as possible to that of the teacher model, knowledge distillation allows the student network to learn the softer knowledge of the teacher network, which contains inter-class information that is not available in traditional one-hot encoding. The goal of knowledge distillation is to increase the similarity between the teacher and student models, while deep metric learning aims to reduce the distance between similar input samples and increase the distance between dissimilar ones. The ability of metric learning to reduce differences between similar inputs can therefore be used in knowledge distillation to reduce the difference between the outputs of the teacher model and the student model, thereby improving the performance of the student model. The Siamese neural network and the triplet neural network are two common architectures for metric learning. Because the Siamese neural network considers only the distance between two samples, it must fix a single definition of similarity between them. For example, two different men should be judged similar under the concept of gender but dissimilar under the concept of individual identity. It is difficult to express such multiple concepts in a Siamese neural network. The triplet neural network instead learns to make the anchor-positive distance smaller than the anchor-negative distance, so it can take several notions of similarity into account rather than depending on a single one. The invention therefore uses the triplet neural network from deep metric learning to reduce the difference between the outputs of the teacher model and the student model.

Referring to fig. 3, a knowledge distillation framework based on a triplet convolutional neural network is shown.

To train the KD-TCNN intrusion detection model, the network traffic sample $x_a$ is fed into the pre-trained teacher network, and its output is passed through the softmax output layer to compute the probability vector over the classes:

where T is the temperature, typically set to 1, which corresponds to the standard softmax activation function. Using a higher value of T produces a smoother probability distribution over the classes, also referred to as soft pseudo-labels.
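The effect of the temperature can be seen in a minimal sketch of temperature-scaled softmax, assuming the usual form p_i = exp(z_i / T) / Σ_j exp(z_j / T), where z denotes the network outputs before the softmax layer. The function name and the example logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Return class probabilities for the given logits at temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)                         # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


sharp = softmax_with_temperature([2.0, 1.0, 0.0], T=1.0)   # ordinary softmax
smooth = softmax_with_temperature([2.0, 1.0, 0.0], T=4.0)  # flatter soft labels
```

With T = 1 the largest class dominates; with T = 4 the probabilities move closer together, which exposes more of the teacher's inter-class information to the student.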

Network traffic sample $x_n$, which belongs to a different class from $x_a$, and network traffic sample $x_a$ are each fed into the student network, and their outputs are passed through the softmax output layer to compute the corresponding class probability vectors.

To ensure the prediction accuracy and a low false alarm rate for anomaly detection on industrial CPS data, three losses are considered in the design of the loss function: the triplet loss $L_{triplet}$ based on the distances between the anchor sample and the positive and negative samples, the cross-entropy loss $L_{hard}$ between the student network output and the labels, and the KL divergence loss $L_{soft}$ between the teacher and student networks.

For the same sample, the outputs of the teacher model and the student model are regarded as the anchor and the positive, respectively; the student model's output for a sample whose class differs from that of the positive sample is regarded as the negative. The triplet loss decreases the distance between the anchor and positive outputs and increases the distance between the anchor and negative outputs. The present invention incorporates this technique into knowledge distillation and defines the triplet loss of knowledge distillation as follows:

where m is the margin, a manually set hyperparameter, and Ω is the industrial CPS intrusion detection data set.
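A hedged sketch of a triplet loss in this setting is given below. It assumes the common hinge form max(d(anchor, positive) − d(anchor, negative) + m, 0) with Euclidean distance, which may differ from the exact definition used in this embodiment; the function names and the margin value are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for one (anchor, positive, negative) triple.

    anchor   : teacher output for sample x_a
    positive : student output for the same sample x_a
    negative : student output for a sample x_n of a different class
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)
```

The loss is zero once the student's output for the same sample is closer to the teacher's output than its output for the other-class sample by at least the margin; summing it over the data set gives the total triplet term.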

To bring the softmax output of the student model close to that of the teacher model, the invention uses the KL divergence between the softmax outputs of the two models as part of the training loss, and defines the KL divergence loss of the teacher-student network, $L_{soft}$, as follows:

where KL(p, q) is the KL divergence between the softmax output of the student model and the softmax output of the teacher model, and is calculated as follows:

To constrain the difference between the probability distribution output by the student network and the real labels, the invention uses the cross-entropy loss between the student network output and the real labels as part of the model loss function, and defines this cross-entropy loss, $L_{hard}$, as follows:

where $y_{i,k}$ indicates whether the i-th sample has label k; $p_{i,k}$ is the predicted probability that the i-th sample has label k; N is the total number of samples in the data set; and K is the total number of classes.
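The two quantities can be sketched as below, assuming the textbook definitions KL(p, q) = Σ_k p_k·log(p_k / q_k) and a cross-entropy averaged over the N samples, −(1/N)·Σ_i Σ_k y_{i,k}·log p_{i,k}. Which of the teacher and student distributions plays the role of p is an assumption left to the caller, and the small constant `eps` is added only to avoid log(0).

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(
        pk * math.log((pk + eps) / (qk + eps))
        for pk, qk in zip(p, q)
        if pk > 0
    )


def cross_entropy(labels_one_hot, probs, eps=1e-12):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    n = len(labels_one_hot)
    total = 0.0
    for y, p in zip(labels_one_hot, probs):
        total -= sum(yk * math.log(pk + eps) for yk, pk in zip(y, p))
    return total / n
```

For example, `kl_divergence(teacher_probs, student_probs)` is zero when the two distributions coincide and grows as the student's soft output drifts away from the teacher's.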

The knowledge distillation part of the loss, $L_{KD}$, is defined as follows:

$$L_{KD} = \alpha T^{2} L_{hard} + (1-\alpha) L_{soft} \tag{23}$$

where T is the temperature used above to soften the label distribution, and α is the weighting factor that balances $L_{hard}$ and $L_{soft}$, a manually set hyperparameter.

Because the model loss function consists of several parts, coefficients are added to the loss terms to adjust the contribution of each loss to the overall loss function. The loss function L of the model is therefore defined as follows:

$$L = L_{KD} + \theta L_{triplet} \tag{24}$$

where θ is the balance coefficient that controls the relative weight of the knowledge distillation loss and the triplet loss during model training. The invention adjusts the error of the training process according to this loss until the student model converges, and saves the optimal student model for subsequent testing.
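As a small worked example of equations (23) and (24), the helper functions below combine three already-computed loss values. The function names and the numeric values are hypothetical; the placement of the T² factor on the hard-label term follows equation (23) as written in the text.

```python
def distillation_loss(l_hard, l_soft, alpha, T):
    """Equation (23): L_KD = alpha * T^2 * L_hard + (1 - alpha) * L_soft."""
    return alpha * T ** 2 * l_hard + (1 - alpha) * l_soft


def total_loss(l_hard, l_soft, l_triplet, alpha=0.5, T=1.0, theta=1.0):
    """Equation (24): L = L_KD + theta * L_triplet."""
    return distillation_loss(l_hard, l_soft, alpha, T) + theta * l_triplet


# With L_hard = 1, L_soft = 2, alpha = 0.5 and T = 2:
#   L_KD = 0.5 * 4 * 1 + 0.5 * 2 = 3
# Adding a triplet loss of 4 weighted by theta = 0.25 gives L = 3 + 1 = 4.
example = total_loss(1.0, 2.0, 4.0, alpha=0.5, T=2.0, theta=0.25)
```

Raising θ makes the optimizer pay more attention to pulling the student's output toward the teacher's for the same sample and away from its output for other-class samples; lowering it emphasizes the label and soft-label terms.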

In this embodiment, the core idea of depthwise separable convolution is to decompose a complete convolution operation into two steps: a channel-by-channel convolution (Depthwise Convolution) and a point-by-point convolution (Pointwise Convolution).

In Depthwise Convolution, each convolution kernel is responsible for one channel, and each channel is convolved by only one kernel, so the number of feature map channels produced is exactly the same as the number of input channels. The number of parameters of Depthwise Convolution is therefore:

Number of parameters = kernel W × kernel H × number of input channels (25)

The amount of computation of Depthwise Convolution is:

Amount of computation = kernel W × kernel H × (image W − kernel W + 1) × (image H − kernel H + 1) × number of input channels (26)

The number of feature maps after Depthwise Convolution is the same as the number of channels of the input layer, so the number of feature maps cannot be expanded. Moreover, the convolution is performed independently on each channel of the input layer, so the feature information of different channels at the same spatial position is not effectively used. Pointwise Convolution is therefore required to combine these feature maps into new feature maps.

Pointwise Convolution is similar to a conventional convolution, with a kernel of size 1 × 1 × M, where M is the number of channels of the previous layer. This convolution forms a weighted combination of the feature maps from the previous step along the depth direction to generate new feature maps, so the number of parameters of Pointwise Convolution is:

Number of parameters = 1 × 1 × number of input channels × number of output channels (27)

The amount of computation of Pointwise Convolution is:

Amount of computation = 1 × 1 × feature map W × feature map H × number of input channels × number of output channels (28)

Decomposing the conventional convolution into these two steps greatly reduces the amount of computation and the number of parameters of the convolution layer. For example, assume that the input feature map size is 224 × 224 × 16, the output feature map size is 224 × 224 × 32, and the convolution kernel size is 3 × 3. With conventional convolution, the number of parameters is 3 × 3 × 16 × 32 = 4608 and the amount of computation is 3 × 3 × (224-2) × (224-2) × 16 × 32 ≈ 227 million. With depthwise separable convolution, the number of parameters is 3 × 3 × 16 + 1 × 1 × 16 × 32 = 656 and the amount of computation is 3 × 3 × (224-2) × (224-2) × 16 + 3 × 3 × 16 × 32 ≈ 7.1 million. Both the amount of computation and the number of parameters are significantly smaller than with conventional convolution, so depthwise separable convolution can be applied to the intrusion detection model, allowing the model to be deployed on nodes with limited computing capability in an Internet of Things network and greatly reducing the intrusion detection time.
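The counts in this example can be checked with a few lines of Python that apply equations (25) to (28) directly. The function names are hypothetical, and the sketch assumes no padding, so a 3 × 3 kernel on a 224 × 224 input yields a 222 × 222 map. Note that applying equation (28) to the pointwise step on that 222 × 222 map gives about 25.2 million operations, so by these formulas the separable total is about 32.3 million rather than the 7.1 million obtained when 3 × 3 × 16 × 32 is used as the second term; either way it remains several times smaller than the conventional figure.

```python
def conv_params(k_w, k_h, c_in, c_out):
    """Parameters of a conventional convolution layer (bias ignored)."""
    return k_w * k_h * c_in * c_out


def conv_ops(k_w, k_h, in_w, in_h, c_in, c_out):
    """Multiplications of a conventional convolution without padding."""
    return k_w * k_h * (in_w - k_w + 1) * (in_h - k_h + 1) * c_in * c_out


def depthwise_params(k_w, k_h, c_in):
    """Equation (25)."""
    return k_w * k_h * c_in


def depthwise_ops(k_w, k_h, in_w, in_h, c_in):
    """Equation (26)."""
    return k_w * k_h * (in_w - k_w + 1) * (in_h - k_h + 1) * c_in


def pointwise_params(c_in, c_out):
    """Equation (27)."""
    return 1 * 1 * c_in * c_out


def pointwise_ops(map_w, map_h, c_in, c_out):
    """Equation (28)."""
    return 1 * 1 * map_w * map_h * c_in * c_out


standard_params = conv_params(3, 3, 16, 32)                          # 4608
standard_ops = conv_ops(3, 3, 224, 224, 16, 32)                      # 227,100,672
separable_params = depthwise_params(3, 3, 16) + pointwise_params(16, 32)  # 656
separable_ops = depthwise_ops(3, 3, 224, 224, 16) + pointwise_ops(222, 222, 16, 32)
```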

In this embodiment, K-fold cross training is a neural network training scheme similar to K-fold cross validation. K-fold cross training divides the training data set into K equal parts; each subset serves in turn as the validation set, and the remaining K-1 subsets serve as the training set. The difference is that K-fold cross validation produces K models and uses the average validation accuracy of the K models as the performance index of the classifier, whereas K-fold cross training produces only one model, which is further optimized in each round on the basis of the previous round. Similar to the idea of pre-training, the model therefore has strong prior knowledge before each round of training, which allows it to converge faster and helps it avoid becoming trapped in a local optimum.

The specific steps of K-fold cross training are as follows: (431) defining a model and a learning rate, and dividing the data set into a training data set and a test data set; (432) dividing the training data set into K parts, taking one part as the validation set and the other K-1 parts as the training set; (433) defining a gradient optimizer with a decaying learning rate, training the model on the K-1 training parts, and evaluating the model on the remaining part; (434) repeating step (433) K times to obtain the optimal model, and obtaining the performance indexes of the optimal model on the test data set. The pseudocode for K-fold cross training is shown in Algorithm 1:
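Steps (432) to (434) can be illustrated with the following minimal sketch. It is not Algorithm 1 itself: the one-parameter model y = w·x, the plain SGD update, the per-fold decay factor and the function name are all hypothetical stand-ins chosen so that the example runs without any deep learning library. The point it illustrates is that a single set of weights is carried from fold to fold instead of being re-initialized.

```python
def k_fold_cross_training(samples, k=5, lr=0.1, decay=0.9, epochs=20):
    """Train ONE model across K folds; each fold continues from the previous weights.

    samples : list of (x, y) pairs forming the training data set
    Returns the best weight found and its validation loss.
    """
    folds = [samples[i::k] for i in range(k)]        # (432) split into K parts
    w = 0.0                                          # hypothetical model: y = w * x
    best_w, best_val = w, float("inf")
    for i in range(k):                               # (434) repeat K times
        val = folds[i]                               # one part for validation
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        for _ in range(epochs):                      # (433) train on the K-1 parts
            for x, y in train:
                w -= lr * 2 * (w * x - y) * x        # SGD step on squared error
        val_loss = sum((w * x - y) ** 2 for x, y in val) / len(val)
        if val_loss < best_val:                      # keep the best model so far
            best_w, best_val = w, val_loss
        lr *= decay                                  # learning-rate decay
    return best_w, best_val


data = [(i / 10, 3 * i / 10) for i in range(1, 11)]  # points on the line y = 3x
best_w, best_val = k_fold_cross_training(data, k=5)
```

In contrast to K-fold cross validation, `w` is never reset inside the loop, so each round starts from what the previous rounds have already learned.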

To verify the detection capability of the KD-TCNN intrusion detection model in a network-based industrial CPS intrusion detection system, the invention performs intrusion detection not only on the older intrusion detection data set NSL-KDD but also on the newer intrusion detection data set CIC IDS2017.

Because the input data set must conform to the input format of the convolutional neural network, the experimental data set needs to be preprocessed, and the preprocessing steps are as follows:

First, digitization of character-type data;

Taking the NSL-KDD data set as an example, the three features protocol, flag and service are of character type and need to be converted into numerical data. For example, the protocol feature takes three values, UDP, TCP and ICMP, which are encoded as 0, 1 and 2; the other features are processed similarly. After processing, each network traffic record has 41 dimensions. To conform to the input format of the convolutional neural network, each record then undergoes a reshape operation: the network traffic of the NSL-KDD data set is converted into an 8 × 8 grayscale image, and the network traffic of the CIC IDS2017 data set is converted into a 10 × 10 grayscale image.
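The encoding and reshape steps can be sketched as follows. The mapping of UDP, TCP and ICMP to 0, 1 and 2 follows the example in the text. The helper names are hypothetical, and zero padding from 41 values up to 64 is an assumption: the text does not state how a 41-dimensional record is fitted into an 8 × 8 grid.

```python
import numpy as np

# Order follows the example in the text: UDP -> 0, TCP -> 1, ICMP -> 2.
PROTOCOL_CODES = {"udp": 0, "tcp": 1, "icmp": 2}


def encode_categorical(values, codes):
    """Map character-type feature values to numeric codes."""
    return np.array([codes[v.lower()] for v in values], dtype=np.float32)


def to_grayscale(record, side):
    """Reshape a 1-D record into a side x side grid.

    Assumption: unused trailing cells are filled with zeros.
    """
    record = np.asarray(record, dtype=np.float32)
    if record.size > side * side:
        raise ValueError("record does not fit into the requested grid")
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: record.size] = record
    return padded.reshape(side, side)


protocol = encode_categorical(["TCP", "UDP", "ICMP"], PROTOCOL_CODES)
image = to_grayscale(np.arange(41), side=8)   # stand-in for a 41-dimensional record
print(protocol, image.shape)
```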

Secondly, data normalization processing;

To remove the effect of differing dimensions (units) among the features, the data after feature mapping need to be normalized so that the gradient always advances toward the minimum and convergence is accelerated. Min-Max normalization, a linear scaling method, is commonly used to preprocess data in machine learning. However, it has significant limitations because it depends on the minimum and maximum values of the samples. The present invention employs a new scaling method to handle cases where the value ranges of the features vary widely. Since the values of the features in the NSL-KDD and CIC IDS2017 data sets differ greatly, a hybrid data preprocessing approach is used. Based on the actual distribution of the data, the normalization preprocessing method is shown in equation (29).

where x_i is the i-th feature value in the original data, and the remaining symbols denote, respectively, the minimum of the i-th feature values, the maximum of the i-th feature values, and the normalized result.
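For reference, the plain Min-Max scaling named above as the linear baseline can be sketched as follows. This is the standard Min-Max formula, not the hybrid method of equation (29). The function name and the handling of constant columns are assumptions.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each feature column to [0, 1] using its own minimum and maximum."""
    X = np.asarray(X, dtype=np.float64)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # Assumption: a constant column maps to 0 instead of dividing by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span


scaled = min_max_normalize([[0, 10, 5], [5, 20, 5], [10, 30, 5]])
print(scaled)
```

Because the result depends only on the column minimum and maximum, a single extreme value compresses all other values of that feature toward one end of the range, which is the limitation noted above.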

Thirdly, data imbalance processing:

In an industrial CPS intrusion detection scenario, some malicious attack types account for only a small portion of all network traffic. For example, the NSL-KDD data set has a serious data imbalance problem: R2L and U2R attacks account for only 0.79% and 0.041% of the NSL-KDD training set, respectively. As a result, the classification model tends to be biased toward the majority classes, which leads to a high false alarm rate. To alleviate this problem, the SVMSMOTE algorithm is used to oversample the minority attack types. The oversampling is applied only to the training data set and leaves the data distribution of the test data set unchanged, so that the model does not depend excessively on generated data.
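The idea of oversampling only the training split can be illustrated with a simplified sketch. It uses plain SMOTE-style interpolation between minority samples and their nearest minority neighbours; it is not SVMSMOTE, which additionally uses a support vector machine to focus generation near the class boundary. The function name, parameters and toy data are assumptions.

```python
import numpy as np


def smote_like_oversample(X, y, minority_label, n_new, k=3, seed=0):
    """Append n_new synthetic minority samples built by linear interpolation."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(X_min) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dist)[1 : k + 1]     # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                           # position along the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    X_out = np.vstack([X, np.array(synthetic)])
    y_out = np.concatenate([y, np.full(n_new, minority_label)])
    return X_out, y_out


# Toy TRAINING split: four majority samples (0) and three minority samples (1).
X_train = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1])
X_bal, y_bal = smote_like_oversample(X_train, y_train, minority_label=1, n_new=4)
print(X_bal.shape, np.bincount(y_bal))
```

The test split is never passed to the function, so its class distribution stays as collected.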

Because network intrusion detection data are complex and the data sets are clearly imbalanced, accuracy alone cannot serve as the sole criterion for evaluating a model. Accuracy (ACC), weighted precision (WPrecision), weighted detection rate (WDR) and weighted F-measure (WFMeasure) are therefore used as the evaluation indexes of the intrusion detection model, and together they verify the accuracy and stability of the model.
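A minimal sketch of these four indexes follows. It assumes that "weighted" means each class's precision, detection rate (recall) and F-measure is weighted by that class's share of the true labels, as is common practice; the function name and the toy labels are illustrative.

```python
import numpy as np


def weighted_metrics(y_true, y_pred):
    """Return ACC and support-weighted precision, detection rate and F-measure."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    w_prec = w_dr = w_f = 0.0
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        n_pred = np.sum(y_pred == c)            # samples predicted as class c
        n_true = np.sum(y_true == c)            # samples that truly are class c
        prec = tp / n_pred if n_pred else 0.0
        dr = tp / n_true                        # detection rate (recall)
        f = 2 * prec * dr / (prec + dr) if (prec + dr) else 0.0
        weight = n_true / len(y_true)           # class share of the true labels
        w_prec += weight * prec
        w_dr += weight * dr
        w_f += weight * f
    return {
        "ACC": float(np.mean(y_true == y_pred)),
        "WPrecision": float(w_prec),
        "WDR": float(w_dr),
        "WFMeasure": float(w_f),
    }


scores = weighted_metrics([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2])
print(scores)
```

Under this weighting WDR always equals ACC, so the additional information comes from WPrecision and WFMeasure, which penalize a model that over-predicts the majority classes.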

To further verify the effectiveness of the knowledge distillation intrusion detection model based on the triplet convolutional neural network provided by the invention, ablation experiments are carried out on the KD-TCNN model. The KD-TCNN model uses four components: feature selection, deep metric learning, knowledge distillation and K-fold cross training. Ablation experiments on these four components are therefore carried out on the NSL-KDD data set, and the results are shown in Table 1. As the table shows, the Baseline student model has the lowest accuracy, 96.86%. After the feature selection operation is added, redundant features are eliminated and the accuracy rises to 96.88%. After knowledge distillation is introduced into the intrusion detection model, the accuracy is 0.39% higher than that of the Baseline model. After deep metric learning is introduced into the knowledge distillation framework, the difference between the outputs of the teacher model and the student model is reduced and the accuracy rises to 97.98%. After the model is trained in the K-fold cross training mode, the accuracy rises further to 98.44%, which differs from that of the teacher model by only 0.4%. These results demonstrate the effectiveness of the proposed knowledge distillation intrusion detection model based on the triplet convolutional neural network and of the K-fold cross training mode.

TABLE 1 NSL-KDD data set ablation experiment

In summary, the invention performs data preprocessing on the intrusion detection data set, including digitization of character-type data, data normalization and data imbalance processing; selects an optimal feature subset from the preprocessed intrusion detection data set through a binary grey wolf optimization algorithm; pre-trains the teacher network model on the selected optimal feature subset; trains the KD-TCNN intrusion detection model by initializing the KD-TCNN intrusion detection parameters and determining the structure of the student network model, inputting two groups of network flows of different categories into the KD-TCNN intrusion detection model for training based on the optimal feature subset, and adjusting the error of the K-fold cross training process according to the knowledge distillation loss until the student network model converges; and tests the KD-TCNN intrusion detection model by inputting the preprocessed test data set into the student network to obtain a classification result for each piece of data.
The invention adopts knowledge distillation to make the output of the student model as close as possible to that of the teacher model, so that the student network learns the inter-class information in the teacher network, can process and analyze large-scale data in real time, and has a reduced number of parameters. Deep metric learning reduces the difference between the output of the teacher network model and the output of the student network model, which improves the performance of the student model. The invention further reduces the number of parameters and the amount of computation of the model by adopting the depthwise separable convolution, so that the intrusion detection model can be deployed on nodes with limited computing capability in the Internet of Things network and the intrusion detection time is reduced to achieve real-time detection. The verification results show that the method is superior to traditional deep learning models in the number of parameters and in the other performance indexes.

Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims

1. A network physical system intrusion detection method is characterized by comprising the following steps:

(2) selecting an optimal feature subset from the preprocessed intrusion detection data set through a binary grey wolf optimization algorithm;

(4) training a KD-TCNN intrusion detection model:

2. The cyber physical system intrusion detection method according to claim 1, wherein in the step (1), the intrusion detection data set includes an NSL-KDD data set, and the character-type data digitization processing procedure converts the type of the character-type element in the NSL-KDD data set into numerical data.

3. The cyber physical system intrusion detection method according to claim 1, wherein in the step (1), the data normalization processing uses, according to the actual distribution of the data, the following normalization preprocessing formula:

wherein x_i is the i-th feature value in the original data, and the remaining symbols denote, respectively, the minimum of the i-th feature values, the maximum of the i-th feature values, and the normalized result.

4. The cyber physical system intrusion detection method according to claim 1, wherein in the step (2), the fittest solution is named α, the second and third best solutions are named β and δ, respectively, the remaining candidate solutions are regarded as ω, and the steps of the grey wolf optimization algorithm include:

and in the feature subset evaluation stage, a convolutional neural network is adopted as the learning algorithm and a fitness function is used to evaluate the positions of the wolves; the feature subset with the lowest fitness function value is selected for feature selection and dimension reduction to obtain the optimal feature subset.

5. The cyber physical system intrusion detection method according to claim 1, wherein in the step (42), the KD-TCNN intrusion detection model training employs a knowledge distillation framework based on a triplet convolutional neural network.

6. The cyber physical system intrusion detection method according to claim 5, wherein in the step (42), three kinds of losses are considered in the design of the loss function: the triplet loss L_triplet based on the distances between the anchor sample and the positive and negative samples, the cross-entropy loss L_hard between the student network output and the label, and the KL divergence loss L_soft between the teacher network and the student network.

7. The cyber physical system intrusion detection method according to claim 6, wherein, in order to constrain the difference between the probability distribution output by the student network model and the real label, the cross-entropy loss between the output of the student network model and the real label, denoted L_hard, is defined as a part of the model loss function.

8. The cyber physical system intrusion detection method according to claim 7, wherein a coefficient is added to the loss term to adjust the contribution of each loss to the overall loss function, and the loss function L of the model is defined as follows:

L = L_KD + θL_triplet

9. The cyber physical system intrusion detection method according to claim 8, wherein the knowledge distillation framework based on the triplet convolutional neural network employs a depthwise separable convolution.

10. The cyber physical system intrusion detection method according to claim 1, wherein in the step (43), the K-fold cross training process comprises: