CN114863226A - Network physical system intrusion detection method - Google Patents

Info

Publication number
CN114863226A
Authority
CN
China
Prior art keywords
intrusion detection
model
data
training
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210446927.2A
Other languages
Chinese (zh)
Inventor
王振东
李泽煜
陈潇潇
杨书新
王俊岭
李大海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Science and Technology
Original Assignee
Jiangxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi University of Science and Technology
Priority to CN202210446927.2A
Publication of CN114863226A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771 Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S40/00 Systems for electrical power generation, transmission, distribution or end-user application management characterised by the use of communication or information technologies, or communication or information technology specific aspects supporting them
    • Y04S40/20 Information technology specific aspects, e.g. CAD, simulation, modelling, system security

Abstract

A cyber-physical system intrusion detection method comprising: preprocessing an intrusion detection data set, where the preprocessing includes digitizing character-type data, normalizing the data, and handling data imbalance; selecting the optimal feature subset of the preprocessed data set with a binary grey wolf optimization algorithm; pre-training a teacher network model on the selected optimal feature subset; training the intrusion detection model by initializing its parameters and determining the structure of the student network model, inputting two groups of network flows of different categories into the intrusion detection model for training based on the optimal feature subset, and adjusting the error during K-fold cross training according to the knowledge distillation loss until the student network model converges; and testing the intrusion detection model to obtain a classification result for each data record. The invention provides lightweight, real-time, unsupervised intrusion detection for the Internet of Things, reduces excessive dependence on labels, and improves generalization capability.

Description

Network physical system intrusion detection method
Technical Field
The invention belongs to the technical field of industrial networks, and particularly relates to a network physical system intrusion detection method.
Background
A Cyber Physical System (CPS) is a mechanism controlled or monitored by computer algorithms and tightly integrated with a network; the term generally refers to a large-scale, geographically dispersed, complex and heterogeneous Internet of Things. In recent years, the development and deployment of cyber-physical systems have grown exponentially and now affect many aspects of daily life, such as power grids, transportation systems, healthcare equipment and household appliances. Many such systems are deployed in critical infrastructure, life-support devices, or other settings vital to daily life.
However, the diversity of CPS applications deployed across networks in the Internet of Things makes them vulnerable to cyber and physical attacks at different system levels, particularly during message transmission in smart manufacturing. This introduces safety hazards into CPS applications: a program may go out of control and harm the people who rely on it. Industrial CPS depend heavily on communication and network capacity. They acquire state data of physical-world objects in real time through networks and interfaces and send it to a server; the server processes the data and returns the results to the physical terminal equipment, which then makes the corresponding changes. Typically, an attacker intrudes into the CAN bus network and hijacks the data sent to the server, thereby compromising the equipment of the industrial CPS. A supervisory control and data acquisition (SCADA) system monitors and collects signals generated across the network (such as vibration, temperature, and TX and RX packet data), and a deep learning (DL) based anomaly detection module is deployed there to identify anomalies.
An intrusion detection system (IDS) can detect intrusions that other security mechanisms fail to prevent and, as a second line of defense, plays an important role in protecting the CPS. According to their data sources, intrusion detection systems are divided into host-based and network-based systems. Host-based intrusion detection monitors only individual hosts: it must be installed on each host, cannot observe network traffic, and cannot analyze network-related behavior. Network-based intrusion detection observes and analyzes real-time network traffic and monitors multiple hosts; it collects packet information and inspects packet contents in order to detect intrusions in the network. Modern artificial intelligence technology, including intelligent sensing and intelligent control, is widely used for behavior monitoring in intelligent manufacturing. However, detecting abnormal traffic in industrial CPS still presents several challenges. First, the hybrid cyber-physical environment built on cloud infrastructure is a large and complex distributed system, so its physical systems and sensors generate a large number of industrial data streams (e.g., instructions, accelerometer readings, video and images). Second, abnormal events occur in the real world with low probability, which leads to a shortage of well-labeled data for model training. Moreover, monitoring data may be missing for various reasons, such as sensor failures and data transmission errors, which further complicates data acquisition and model training and makes anomaly detection harder to implement. Finally, nodes in an Internet of Things network are mostly deployed on resource-constrained devices with limited power, computing, communication and storage capabilities.
In order to reduce the damage caused by malicious attacks in the industrial CPS, high-precision and timely real-time anomaly detection is generally required to facilitate overall performance monitoring of data streams obtained and transmitted based on distributed nodes at different levels across the system.
In summary, compressing the model without reducing the efficiency of the intrusion detection model, while also improving its generalization ability, is of practical significance.
Disclosure of Invention
Therefore, the invention provides an Internet of Things intrusion detection method based on self-supervised learning and self-knowledge distillation. It provides lightweight, real-time, unsupervised intrusion detection for the Internet of Things, reduces excessive dependence on labels, and improves generalization capability.
In order to achieve the above purpose, the invention provides the following technical scheme: a network physical system intrusion detection method comprises the following steps:
(1) carrying out data preprocessing on the intrusion detection data set, wherein the data preprocessing comprises character type data digitization processing, data normalization processing and data unbalance processing;
(2) selecting the optimal feature subset of the preprocessed intrusion detection data set through a binary grey wolf optimization algorithm;
(3) pre-training the teacher network model according to the selected optimal feature subset;
(4) training a KD-TCNN intrusion detection model:
(41) initializing KD-TCNN intrusion detection parameters and determining the structure of a student network model;
(42) inputting two groups of network flows of different categories into the KD-TCNN intrusion detection model for training based on the optimal feature subset;
(43) adjusting the error of the K-fold cross training process according to the knowledge distillation loss until the student network model converges;
(5) testing the KD-TCNN intrusion detection model by inputting the preprocessed test data set into the student network to obtain a classification result for each data record.
As a preferred scheme of the cyber-physical system intrusion detection method, in step (1) the intrusion detection data set comprises the NSL-KDD data set, and the character-type data digitization converts the character-type elements of the NSL-KDD data set into numerical data.
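The digitization step can be sketched as follows. The source does not say whether integer codes or one-hot vectors are used, so this hypothetical helper (`fit_label_encoder`) assumes plain integer codes, and the protocol names are illustrative sample values only.

```python
def fit_label_encoder(values):
    """Map each distinct string to an integer code (sorted for reproducibility)."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


# Illustrative character-type column, e.g. a protocol field.
protocols = ["tcp", "udp", "icmp", "tcp"]
mapping = fit_label_encoder(protocols)
encoded = [mapping[p] for p in protocols]
```

The mapping should be fitted on the training data and reused unchanged on the test data so that the same string always receives the same code.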
As a preferred scheme of the cyber-physical system intrusion detection method, in step (1) the data are normalized according to their actual distribution using the following formula:
$$x_i' = \frac{x_i - x_i^{\min}}{x_i^{\max} - x_i^{\min}}$$
where $x_i$ is the $i$-th feature value in the original data, $x_i^{\min}$ is the minimum of the $i$-th feature, $x_i^{\max}$ is the maximum of the $i$-th feature, and $x_i'$ is the normalized result.
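A minimal sketch of this normalization, assuming it is the usual per-feature min-max scaling to [0, 1]. The function name and the handling of constant columns are assumptions, not part of the source.

```python
def min_max_normalize(rows):
    """Scale each feature column of `rows` to [0, 1] using its own min and max."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, lo, hi in zip(row, mins, maxs):
            # A constant column has hi == lo; map it to 0.0 to avoid dividing by zero.
            scaled.append(0.0 if hi == lo else (value - lo) / (hi - lo))
        normalized.append(scaled)
    return normalized


sample = [[0.0, 10.0, 5.0], [5.0, 20.0, 5.0], [10.0, 40.0, 5.0]]
normalized_sample = min_max_normalize(sample)
```

In practice the minima and maxima would be computed on the training set and reused for the test set.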
As a preferred scheme of the cyber-physical system intrusion detection method, in step (2) the best solution is named α, the second and third best solutions are named β and δ respectively, the remaining candidate solutions are denoted ω, and the grey wolf optimization algorithm comprises the following stages:
Encircling stage: establishing a mathematical model of the encircling behavior;
Hunting stage: the hunt is guided by α, and β and δ may also participate; the remaining ω wolves update their positions according to the positions of the best search agents;
Attacking stage: simulating the approach to the prey by linearly updating the parameter a in each iteration;
Feature subset evaluation stage: a convolutional neural network is used as the learning algorithm, a fitness function evaluates the position of each grey wolf, and the feature subset with the lowest fitness value is selected, which performs feature selection and dimension reduction and yields the optimal feature subset.
As a preferred scheme of the cyber-physical system intrusion detection method, in step (42) a knowledge distillation framework based on a triplet convolutional neural network is adopted for KD-TCNN intrusion detection model training.
As a preferred scheme of the cyber-physical system intrusion detection method, in step (42) three losses are considered in the design of the loss function: the triplet loss $L_{triplet}$ based on the distances between the anchor sample and the positive and negative samples, the cross-entropy loss $L_{hard}$ between the student network output and the labels, and the KL divergence loss $L_{soft}$ between the teacher and student networks.
As a preferred scheme of the cyber-physical system intrusion detection method, in order to constrain the difference between the probability distribution output by the student network model and the real labels, the cross-entropy loss $L_{hard}$ between the student network output and the real labels is defined and used as part of the model loss function.
As a preferred scheme of the cyber-physical system intrusion detection method, coefficients are added to the loss terms to adjust the contribution of each loss to the overall loss function, and the loss function $L$ of the model is defined as follows:
$$L = L_{KD} + \theta L_{triplet}$$
where $\theta$ is the balance coefficient that weighs the knowledge distillation loss against the triplet loss during model training, $L_{KD}$ is the knowledge distillation loss, and $L_{triplet}$ is the triplet loss based on the distances between the anchor sample and the positive and negative samples.
As a preferred scheme of the cyber-physical system intrusion detection method, the knowledge distillation framework based on the triplet convolutional neural network adopts depthwise separable convolution.
As a preferred scheme of the cyber-physical system intrusion detection method, in step (43) the K-fold cross training process includes:
(431) defining the model and the learning rate, and dividing the data set into a training data set and a test data set;
(432) dividing the training data set into K parts, taking one part as the validation set and the other K-1 parts as the training set;
(433) defining a gradient optimizer whose learning rate follows a decay strategy, training the model on the K-1 training parts, and evaluating it on the remaining part;
(434) repeating step (433) K times to obtain the optimal model and its performance indices on the test data set.
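The fold rotation in steps (432) to (434) can be sketched as follows. The helper names are hypothetical, and the exponential decay schedule is an assumption, since the source only states that the learning rate follows a decay strategy.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each fold serves as the held-out part once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def decayed_learning_rate(base_lr, decay_rate, epoch):
    """Exponential decay (assumed schedule): base_lr * decay_rate ** epoch."""
    return base_lr * (decay_rate ** epoch)


folds = list(k_fold_indices(10, 5))
```

A real run would shuffle the training indices before splitting and keep the model with the best validation score across the K rounds.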
The invention has the following advantages. The intrusion detection data set is preprocessed, which includes character-type data digitization, data normalization and data imbalance processing; the optimal feature subset of the preprocessed data set is selected by a binary grey wolf optimization algorithm; the teacher network model is pre-trained on the selected optimal feature subset; the KD-TCNN intrusion detection model is trained by initializing its parameters and determining the structure of the student network model, inputting two groups of network flows of different categories into the model for training based on the optimal feature subset, and adjusting the error of the K-fold cross training process according to the knowledge distillation loss until the student network model converges; and the KD-TCNN intrusion detection model is tested by inputting the preprocessed test data set into the student network to obtain a classification result for each data record.
The invention uses knowledge distillation to bring the output of the student model as close as possible to that of the teacher model, so that the student network learns the inter-class information in the teacher network, can process and analyze large-scale data in real time, and has fewer parameters. Reducing the difference between the outputs of the teacher and student network models improves the performance of the student model. The invention further reduces the parameter count and computation of the model by adopting depthwise separable convolution, so that the intrusion detection model can be deployed on nodes with limited computing capability in the Internet of Things network and the detection time is short enough for real-time detection. Verification results show that the method outperforms traditional deep learning models in parameter count and other performance indices.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary and that other implementation drawings may be derived from the provided drawings by those of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic flowchart of an intrusion detection method for a cyber-physical system according to an embodiment of the present invention;
Fig. 2 is a flowchart of feature selection by the binary grey wolf optimization algorithm in the intrusion detection method for the cyber-physical system according to the embodiment of the present invention;
Fig. 3 is the knowledge distillation framework based on a triplet convolutional neural network in the intrusion detection method for the cyber-physical system according to an embodiment of the present invention.
Detailed Description
The present invention is described below by way of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. The described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
With reference to fig. 1, the intrusion detection method for a network physical system provided by the present invention includes:
(1) carrying out data preprocessing on the intrusion detection data set, wherein the data preprocessing comprises character type data digitization processing, data normalization processing and data unbalance processing;
(2) selecting the optimal feature subset of the preprocessed intrusion detection data set through a binary grey wolf optimization algorithm;
(3) pre-training the teacher network model according to the selected optimal feature subset;
(4) training a KD-TCNN intrusion detection model:
(41) initializing KD-TCNN intrusion detection parameters and determining the structure of a student network model;
(42) inputting two groups of network flows of different categories into the KD-TCNN intrusion detection model for training based on the optimal feature subset;
(43) adjusting the error of the K-fold cross training process according to the knowledge distillation loss until the student network model converges;
(5) testing the KD-TCNN intrusion detection model by inputting the preprocessed test data set into the student network to obtain a classification result for each data record.
In this embodiment, a binary grey wolf optimization algorithm is used to select the optimal feature subset. The grey wolf optimization algorithm is a swarm intelligence algorithm that simulates the hunting behavior of grey wolves: hunting tasks such as encircling, hunting and attacking are assigned to groups of grey wolves at different levels of the social hierarchy, and completing the hunt carries out the global optimization.
To model the social hierarchy of grey wolves in the algorithm, the present invention names the best solution α, the second and third best solutions β and δ respectively, and denotes the remaining candidate solutions ω. In the grey wolf optimization algorithm, the hunt is guided by the three wolves α, β and δ, and the ω wolves follow them. The specific steps of the algorithm are as follows:
Step 1. Encircling stage:
The grey wolf pack must first encircle the prey before capturing it. The encircling behavior is modeled by the following equations:
$$\vec{D} = \left|\vec{C}\cdot\vec{X}_p(t) - \vec{X}(t)\right| \tag{1}$$
$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A}\cdot\vec{D} \tag{2}$$
$$\vec{A} = 2a\cdot\vec{r}_1 - a \tag{3}$$
$$\vec{C} = 2\,\vec{r}_2 \tag{4}$$
where $t$ is the iteration number, $\vec{A}$ and $\vec{C}$ are coefficient vectors, $\vec{X}_p$ is the position of the prey, $\vec{X}$ is the position of the grey wolf, $a$ decreases linearly from 2 to 0 over the iterations, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors in $[0,1]$.
Step 2. Hunting stage:
Hunting is usually guided by the α wolf, and the β and δ wolves may also participate. Assuming that α, β and δ have better knowledge of the potential location of the prey, the other wolves (including the ω wolves) update their positions according to the positions of the best search agents. The position update formulas are as follows:
$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{5}$$
$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1\cdot\vec{D}_\alpha \tag{6}$$
$$\vec{X}_2 = \vec{X}_\beta - \vec{A}_2\cdot\vec{D}_\beta \tag{7}$$
$$\vec{X}_3 = \vec{X}_\delta - \vec{A}_3\cdot\vec{D}_\delta \tag{8}$$
where $\vec{X}_\alpha$, $\vec{X}_\beta$ and $\vec{X}_\delta$ are the three best solutions in the population at iteration $t$, and $\vec{A}_1$, $\vec{A}_2$ and $\vec{A}_3$ are defined by formula (3). $\vec{D}_\alpha$, $\vec{D}_\beta$ and $\vec{D}_\delta$ are defined by equations (9) to (11), respectively:
$$\vec{D}_\alpha = \left|\vec{C}_1\cdot\vec{X}_\alpha - \vec{X}\right| \tag{9}$$
$$\vec{D}_\beta = \left|\vec{C}_2\cdot\vec{X}_\beta - \vec{X}\right| \tag{10}$$
$$\vec{D}_\delta = \left|\vec{C}_3\cdot\vec{X}_\delta - \vec{X}\right| \tag{11}$$
where $\vec{C}_1$, $\vec{C}_2$ and $\vec{C}_3$ are defined by formula (4).
Step 3. Attacking stage:
When the prey stops moving, the grey wolves finish the hunt by attacking. To model the approach to the prey, the parameter $a$ is updated linearly in each iteration according to equation (12), decreasing from 2 to 0:
$$a = 2 - t\cdot\frac{2}{\mathrm{MaxIter}} \tag{12}$$
where $t$ is the current iteration number and MaxIter is the maximum number of iterations allowed for the optimization.
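A minimal sketch of one iteration covering the three stages above, assuming the standard continuous grey wolf optimizer update rules (coefficients A = 2a·r1 − a and C = 2·r2, with the new position taken as the mean of the three leader-guided moves). The function name, the per-dimension random draws and the toy objective are illustrative assumptions.

```python
import random


def gwo_step(wolves, fitness, t, max_iter, rng=random):
    """One continuous GWO iteration: rank the pack, then move every wolf."""
    ranked = sorted(wolves, key=fitness)       # lower fitness is better here
    leaders = ranked[:3]                       # alpha, beta, delta
    a = 2.0 - t * (2.0 / max_iter)             # linear decay from 2 to 0
    new_wolves = []
    for x in wolves:
        new_x = []
        for d in range(len(x)):
            moves = []
            for leader in leaders:
                A = 2.0 * a * rng.random() - a   # coefficient A
                C = 2.0 * rng.random()           # coefficient C
                D = abs(C * leader[d] - x[d])    # distance to this leader
                moves.append(leader[d] - A * D)  # move guided by this leader
            new_x.append(sum(moves) / 3.0)       # average of the three moves
        new_wolves.append(new_x)
    return new_wolves


# Toy pack in one dimension, minimizing x squared.
pack = [[0.0], [3.0], [6.0], [9.0]]
final_pack = gwo_step(pack, lambda x: x[0] ** 2, t=10, max_iter=10)
```

At the last iteration `a` reaches 0, so `A` is 0 and every wolf lands on the mean of the three leaders, which is why large `a` favors exploration and small `a` favors exploitation.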
In the binary grey wolf optimization algorithm, the position update of each wolf is a function of three position vectors, $x_\alpha$, $x_\beta$ and $x_\delta$, which attract each wolf toward the three best solutions. The solution pool is in binary form at any given time, with all solutions located at the corners of a hypercube. The present invention employs the second variant of the binary grey wolf optimization algorithm, bGWO2, in which only the updated grey wolf position vector is binarized. The position update is given by formula (13):
$$x_d^{t+1} = \begin{cases} 1, & \text{if } \mathrm{sigmoid}\left(\dfrac{x_1 + x_2 + x_3}{3}\right) \ge rand \\ 0, & \text{otherwise} \end{cases} \tag{13}$$
where $rand$ is a random number drawn uniformly from $[0,1]$, $x_d^{t+1}$ is the updated binary position in dimension $d$ at iteration $t$, and the sigmoid function is defined as follows:
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-10(x - 0.5)}} \tag{14}$$
The binary grey wolf optimization algorithm adaptively searches the feature space for the optimal feature subset, i.e. the subset with the highest classification performance and the fewest selected features. The fitness function used to evaluate grey wolf positions in binary grey wolf optimization is shown in equation (15):
Figure BDA0003617284650000084
where $P$ is the classification accuracy, $L$ is the number of elements in the selected feature subset, $N$ is the total number of features, and $\alpha$ and $\beta$ are the weights of the classification accuracy and of the number of selected features respectively, with $\alpha \in [0,1]$ and $\beta = 1 - \alpha$.
The optimal feature subset of the network traffic data is selected on the training subset. In the feature subset evaluation stage, a convolutional neural network is used as the learning algorithm and formula (15) as the fitness function. The feature subset with the lowest fitness value is selected, which performs feature selection and dimension reduction and yields the feature subset with the best classification effect. Fig. 2 shows the flowchart of the whole binary grey wolf optimization feature selection.
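The fitness formula itself is given only as an image in the source, so the sketch below assumes one common form in which lower is better: the weighted error rate (1 − P) plus the weighted fraction of selected features L/N. The default weight 0.99 and both function names are hypothetical.

```python
def subset_fitness(accuracy, n_selected, n_total, alpha=0.99):
    """Assumed fitness: alpha * error rate + beta * fraction of features kept."""
    beta = 1.0 - alpha
    return alpha * (1.0 - accuracy) + beta * (n_selected / n_total)


def select_features(row, mask):
    """Keep only the values whose mask bit is 1."""
    return [value for value, keep in zip(row, mask) if keep]


# NSL-KDD records have 41 features; compare two candidate subsets.
smaller_subset = subset_fitness(0.90, 20, 41)
larger_subset = subset_fitness(0.90, 30, 41)
reduced_row = select_features([5, 6, 7], [1, 0, 1])
```

Under this assumption, a subset with equal accuracy but fewer features always scores lower, which matches the stated goal of high classification performance with few selected features.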
Specifically, knowledge distillation is a common model compression method. Within a teacher-student framework, the feature representation ("knowledge") learned by a complex teacher network with strong learning capacity is distilled and transferred to a student network with few parameters and weaker learning capacity, yielding a student network that is fast, capable and small. In addition, making the output of the student model as close as possible to that of the teacher model lets the student network learn the softer knowledge in the teacher network, which contains inter-class information that traditional one-hot encoding does not provide. The goal of knowledge distillation is to increase the similarity between the teacher and student models, while deep metric learning aims to reduce the distance between similar input samples and increase the distance between dissimilar ones. The ability of metric learning to reduce differences between similar inputs can therefore be used in knowledge distillation to reduce the difference between the teacher and student outputs, thereby improving the performance of the student model. The Siamese neural network and the triplet neural network are two common architectures for metric learning. Because a Siamese neural network considers only the distance between two samples, it must commit to a single definition of similarity between them. For example, two different male persons should be judged similar with respect to gender but dissimilar with respect to individual identity.
Such multiple notions are difficult to express in a Siamese neural network. A triplet neural network instead learns to make the anchor-positive distance smaller than the anchor-negative distance, so it can accommodate several notions of similarity rather than depending on one. The invention therefore uses a triplet neural network from deep metric learning to reduce the difference between the outputs of the teacher model and the student model.
Referring to fig. 3, a knowledge distillation framework based on a triplet convolutional neural network is shown.
To train the KD-TCNN intrusion detection model, a network traffic sample $x_a$ is fed into the pre-trained teacher network, which outputs the logits $z^{t}_{a}$, and the softmax output layer computes the class probability vector $p^{t}_{a}$:
$$p^{t}_{a,i} = \frac{\exp\left(z^{t}_{a,i}/T\right)}{\sum_j \exp\left(z^{t}_{a,j}/T\right)}$$
where $T$ is the temperature. $T$ is typically set to 1, which corresponds to the standard softmax activation function; a higher $T$ produces a smoother probability distribution over the classes, also referred to as soft pseudo-labels.
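The effect of the temperature can be illustrated with a short sketch. The function name and the example logits are illustrative assumptions; subtracting the maximum is a standard numerical-stability step and does not change the result.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by the temperature T."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 1.0)   # ordinary softmax
soft = softmax_with_temperature(logits, 4.0)    # smoother distribution
```

With the higher temperature the largest probability shrinks and the smaller ones grow, which exposes how the teacher ranks the non-target classes.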
A network traffic sample $x_n$ of a different class from $x_a$, and the sample $x_a$ itself, are each fed into the student network, which outputs the logits $z^{s}_{n}$ and $z^{s}_{a}$, and the softmax output layer computes the class probability vectors $p^{s}_{n}$ and $p^{s}_{a}$:
$$p^{s}_{n,i} = \frac{\exp\left(z^{s}_{n,i}/T\right)}{\sum_j \exp\left(z^{s}_{n,j}/T\right)}$$
$$p^{s}_{a,i} = \frac{\exp\left(z^{s}_{a,i}/T\right)}{\sum_j \exp\left(z^{s}_{a,j}/T\right)}$$
in order to ensure the prediction accuracy and the false alarm rate of the abnormal detection of the industrial CPS data, three losses are considered in the design of the loss function, and the loss L of the triple based on the distance between the anchor sample and the positive and negative samples triplet Student network output and label cross entropy loss L hard KL divergence loss L with teacher-student network soft
For the same sample, the outputs of the teacher model and the student model are treated as the anchor and the positive, respectively. Similarly, student model outputs for samples whose class differs from that of the positive sample are treated as negatives. The triplet loss decreases the distance between the anchor and positive outputs and increases the distance between the anchor and negative outputs. The invention incorporates this technique into knowledge distillation and defines the triplet loss of knowledge distillation as follows:
Figure BDA0003617284650000101
where m is the margin, a manually set hyperparameter, and Ω is the industrial CPS intrusion detection data set.
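The role of the anchor, positive, and negative outputs can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the distance function is not recoverable from the text, so squared Euclidean distance is assumed, and the batch loss is assumed to be the mean over triplets. All function names and sample values are hypothetical.

```python
def squared_l2(u, v):
    """Squared Euclidean distance (an assumed choice of distance)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kd_triplet_loss(anchor, positive, negative, margin):
    """Hinge-style triplet loss on probability vectors.

    anchor:   teacher output for sample x_a
    positive: student output for the same sample x_a
    negative: student output for a sample x_n of a different class
    """
    return max(squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin, 0.0)

def batch_triplet_loss(triplets, margin):
    """Mean triplet loss over (anchor, positive, negative) tuples (assumed reduction)."""
    return sum(kd_triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)

teacher_a = [0.9, 0.1]  # hypothetical teacher probabilities for x_a
student_a = [0.8, 0.2]  # hypothetical student probabilities for x_a
student_n = [0.1, 0.9]  # hypothetical student probabilities for x_n

loss_small_margin = kd_triplet_loss(teacher_a, student_a, student_n, margin=1.0)
loss_large_margin = kd_triplet_loss(teacher_a, student_a, student_n, margin=1.5)
```

When the negative is already farther from the anchor than the positive by more than the margin, the loss is zero; a larger margin m keeps pushing the two distances apart.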
To bring the softmax output of the student model close to that of the teacher model, the invention uses the KL divergence between the softmax outputs of the two models as part of the training loss. The KL divergence loss $L_{soft}$ of the teacher-student network is defined as follows:
Figure BDA0003617284650000102
where KL(p, q) is the KL divergence between the softmax output of the student model and the softmax output of the teacher model, defined as follows:
Figure BDA0003617284650000103
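A minimal Python sketch of this loss term is given below. It assumes the standard definition KL(p, q) = Σ_k p_k log(p_k / q_k), with the teacher distribution as p and the student distribution as q, and it assumes that the loss is averaged over samples; the direction and the reduction are assumptions, since the formulas are only available as images.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p, q) = sum_k p_k * log(p_k / q_k); eps guards against log(0)."""
    return sum(pk * math.log((pk + eps) / (qk + eps)) for pk, qk in zip(p, q) if pk > 0)

def soft_loss(teacher_probs, student_probs):
    """Mean KL divergence over paired teacher/student outputs (assumed reduction)."""
    pairs = list(zip(teacher_probs, student_probs))
    return sum(kl_divergence(t, s) for t, s in pairs) / len(pairs)

teacher = [[0.7, 0.2, 0.1]]  # hypothetical softened teacher output
student = [[0.5, 0.3, 0.2]]  # hypothetical softened student output
l_soft = soft_loss(teacher, student)
```

The value is zero only when the two distributions coincide, so minimizing it pulls the student output toward the teacher output.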
To constrain the difference between the probability distribution output by the student network and the real labels, the invention uses the cross-entropy loss between the student network output and the real labels as part of the model loss function. This cross-entropy loss $L_{hard}$ is defined as follows:
Figure BDA0003617284650000104
where $y_{i,k}$ indicates whether the i-th sample has label k, $p_{i,k}$ is the predicted probability that the i-th sample has label k, N is the total number of samples in the data set, and K is the total number of classes.
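Using these symbols, the cross-entropy term can be sketched in Python as below. The sketch assumes the usual form, the negative sum of $y_{i,k} \log p_{i,k}$ over samples and classes, divided by N; the averaging over N is an assumption, and the example data are hypothetical.

```python
import math

def cross_entropy(y_onehot, probs, eps=1e-12):
    """Mean cross entropy between one-hot labels y[i][k] and predictions p[i][k]."""
    n = len(y_onehot)
    total = 0.0
    for y_i, p_i in zip(y_onehot, probs):
        for y_ik, p_ik in zip(y_i, p_i):
            total -= y_ik * math.log(p_ik + eps)  # eps guards against log(0)
    return total / n

labels = [[1, 0], [0, 1]]               # N = 2 samples, K = 2 classes
uniform = [[0.5, 0.5], [0.5, 0.5]]      # an uninformative student
confident = [[0.9, 0.1], [0.1, 0.9]]    # a student that is mostly right
l_hard_uniform = cross_entropy(labels, uniform)
l_hard_confident = cross_entropy(labels, confident)
```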
The knowledge distillation loss $L_{KD}$ of the invention is defined as follows:
$$L_{KD} = \alpha T^{2} L_{hard} + (1-\alpha) L_{soft} \tag{23}$$
where T is the temperature used above to soften the label distribution, and α is a manually set hyperparameter that weights $L_{hard}$ against $L_{soft}$.
Since the model loss function consists of several parts, a coefficient is added to the loss terms to adjust the contribution of each loss to the overall loss function. The loss function L of the model is defined as follows:
$$L = L_{KD} + \theta L_{triplet} \tag{24}$$
where θ is the balance coefficient between the knowledge distillation loss and the triplet loss during model training. The invention adjusts the training error according to this loss until the student model converges, and saves the best student model for the subsequent test experiments.
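Formulas (23) and (24) combine the three loss terms with the hyperparameters α, T, and θ. A minimal Python sketch of that combination follows; the function names and the numeric values are hypothetical and only illustrate the arithmetic.

```python
def kd_loss(l_hard, l_soft, alpha, T):
    """Formula (23): L_KD = alpha * T^2 * L_hard + (1 - alpha) * L_soft."""
    return alpha * T ** 2 * l_hard + (1.0 - alpha) * l_soft

def total_loss(l_hard, l_soft, l_triplet, alpha, T, theta):
    """Formula (24): L = L_KD + theta * L_triplet."""
    return kd_loss(l_hard, l_soft, alpha, T) + theta * l_triplet

# Hypothetical loss values and hyperparameters for one training step.
example_kd = kd_loss(l_hard=0.5, l_soft=0.2, alpha=0.5, T=2.0)
example_total = total_loss(l_hard=0.5, l_soft=0.2, l_triplet=0.1,
                           alpha=0.5, T=2.0, theta=0.3)
```

Setting θ to zero removes the metric-learning term and leaves plain knowledge distillation, which is one way to reproduce the corresponding ablation setting.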
In this embodiment, the core idea of the depthwise separable convolution is to decompose a complete convolution operation into two steps: a channel-by-channel convolution (Depthwise Convolution) and a point-by-point convolution (Pointwise Convolution).
In the Depthwise Convolution, each convolution kernel is responsible for exactly one channel, and each channel is convolved by exactly one kernel. The number of feature map channels produced is therefore identical to the number of input channels, and the parameter count of the Depthwise Convolution is:
$$\text{Params}_{DW} = W_{k} \times H_{k} \times C_{in} \tag{25}$$
The computation of the Depthwise Convolution is:
$$\text{Cost}_{DW} = W_{k} \times H_{k} \times (W_{img} - W_{k} + 1) \times (H_{img} - H_{k} + 1) \times C_{in} \tag{26}$$
where $W_{k}$ and $H_{k}$ are the width and height of the convolution kernel, $W_{img}$ and $H_{img}$ are the width and height of the input image, and $C_{in}$ is the number of input channels.
After the Depthwise Convolution, the number of feature maps equals the number of channels of the input layer, so the number of feature maps cannot be expanded. Moreover, the convolution is performed independently on each channel of the input layer, so the feature information of different channels at the same spatial position is not used effectively. A Pointwise Convolution is therefore required to combine these feature maps into new feature maps.
The Pointwise Convolution is similar to a conventional convolution, with a kernel of size 1 × 1 × M, where M is the number of channels of the previous layer. It forms a weighted combination of the feature maps from the previous step along the depth direction to generate new feature maps, so the parameter count of the Pointwise Convolution is:
$$\text{Params}_{PW} = 1 \times 1 \times C_{in} \times C_{out} \tag{27}$$
The computation of the Pointwise Convolution is:
$$\text{Cost}_{PW} = 1 \times 1 \times W_{f} \times H_{f} \times C_{in} \times C_{out} \tag{28}$$
where $W_{f}$ and $H_{f}$ are the width and height of the feature map and $C_{out}$ is the number of output channels.
Decomposing the conventional convolution into these two steps greatly reduces the computation and the number of parameters of the convolution layer. For example, assume an input feature map of size 224 × 224 × 16, an output feature map of size 224 × 224 × 32, and a 3 × 3 convolution kernel. A conventional convolution has 3 × 3 × 16 × 32 = 4608 parameters and a computation of 3 × 3 × (224 − 2) × (224 − 2) × 16 × 32 ≈ 2.3 × 10^8 (about 227 million). The depthwise separable convolution has 3 × 3 × 16 + 1 × 1 × 16 × 32 = 656 parameters and, by formulas (26) and (28), a computation of 3 × 3 × (224 − 2) × (224 − 2) × 16 + 1 × 1 × (224 − 2) × (224 − 2) × 16 × 32 ≈ 3.2 × 10^7 (about 32.3 million). Both the computation and the number of parameters are significantly smaller than those of the conventional convolution. The depthwise separable convolution can therefore be applied to the intrusion detection model so that the model can be deployed on nodes with limited computing capacity in an Internet of Things network, which greatly reduces the intrusion detection time.
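The counts in this example can be checked with a short Python sketch of formulas (25) to (28). The sketch assumes no padding and stride 1, so a 224-wide input and a 3-wide kernel give a 222-wide output, and it evaluates the pointwise step over that 222 × 222 feature map as formula (28) prescribes. The function names are hypothetical.

```python
def out_size(size, k):
    """Output width or height with no padding and stride 1 (assumed)."""
    return size - k + 1

def conv_params(k_w, k_h, c_in, c_out):
    """Parameters of a conventional convolution."""
    return k_w * k_h * c_in * c_out

def conv_cost(k_w, k_h, w, h, c_in, c_out):
    """Multiplications of a conventional convolution."""
    return k_w * k_h * out_size(w, k_w) * out_size(h, k_h) * c_in * c_out

def depthwise_params(k_w, k_h, c_in):
    """Formula (25)."""
    return k_w * k_h * c_in

def depthwise_cost(k_w, k_h, w, h, c_in):
    """Formula (26)."""
    return k_w * k_h * out_size(w, k_w) * out_size(h, k_h) * c_in

def pointwise_params(c_in, c_out):
    """Formula (27)."""
    return 1 * 1 * c_in * c_out

def pointwise_cost(w_f, h_f, c_in, c_out):
    """Formula (28)."""
    return 1 * 1 * w_f * h_f * c_in * c_out

W = H = 224
K = 3
C_IN, C_OUT = 16, 32

standard_params = conv_params(K, K, C_IN, C_OUT)
standard_cost = conv_cost(K, K, W, H, C_IN, C_OUT)
separable_params = depthwise_params(K, K, C_IN) + pointwise_params(C_IN, C_OUT)
separable_cost = (depthwise_cost(K, K, W, H, C_IN)
                  + pointwise_cost(out_size(W, K), out_size(H, K), C_IN, C_OUT))
```

Under these assumptions the separable layer needs roughly one seventh of the parameters and one seventh of the multiplications of the conventional layer.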
In this embodiment, K-fold cross training is a neural network training scheme similar to K-fold cross validation. It divides the training data set equally into K parts; each part in turn serves as the validation set, and the remaining K-1 parts serve as the training set. The difference is that K-fold cross validation produces K models and uses the average validation accuracy of the K models as the performance index of the classifier, whereas K-fold cross training produces only one model, which is further optimized in each round starting from the result of the previous round. Similar to the idea of pre-training, the model therefore has strong prior knowledge before each round, which lets it converge faster and helps it avoid falling into a local optimum.
The specific steps of K-fold cross training are: (431) define the model and the learning rate, and divide the data set into a training data set and a test data set; (432) divide the training data set into K parts, taking one part as the validation set and the other K-1 parts as the training set; (433) define a gradient optimizer with a decaying learning rate, train the model on the K-1 training parts, and test the model on the remaining part; (434) repeat step (433) K times to obtain the optimal model, and obtain the performance index of the optimal model on the test data set. The pseudocode for K-fold cross training is shown in Algorithm 1:
Figure BDA0003617284650000121
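Since Algorithm 1 is available only as an image, the steps above can be sketched in Python as follows. This is a minimal sketch, not the patented algorithm: a single-weight toy model stands in for the student network, the learning rate decays by a fixed factor per round, and the names `k_fold_cross_training`, `fit_round`, and `evaluate` are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def k_fold_cross_training(data, k, fit_round, evaluate, lr=0.5, decay=0.9):
    """Train ONE model over k rounds; fold i is the validation set of round i."""
    folds = k_fold_indices(len(data), k)
    model, best_model, best_score = None, None, float("-inf")
    for i in range(k):
        val = [data[j] for j in folds[i]]
        train = [data[j] for f in range(k) if f != i for j in folds[f]]
        model = fit_round(model, train, lr)  # continues from the previous round
        score = evaluate(model, val)
        if score > best_score:
            # A real network would need a copy of its weights here.
            best_model, best_score = model, score
        lr *= decay  # learning-rate decay between rounds
    return best_model, best_score

# Toy stand-in for the student network: one weight w fitted to y = 2x.
def fit_round(w, train, lr):
    w = 0.0 if w is None else w
    for x, y in train:
        w += lr * (y - w * x) * x  # one SGD step on squared error
    return w

def evaluate(w, val):
    return -sum((y - w * x) ** 2 for x, y in val) / len(val)  # negative MSE

data = [(i / 10, 2 * i / 10) for i in range(1, 11)]
best_w, best_score = k_fold_cross_training(data, k=5, fit_round=fit_round,
                                           evaluate=evaluate)
```

The key difference from K-fold cross validation is visible in the loop: `model` is never reset, so each round starts from the weights left by the previous one.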
To verify the detection capability of the KD-TCNN intrusion detection model in a network-based industrial CPS intrusion detection system, the invention evaluates intrusion detection on both the older NSL-KDD data set and the newer CIC IDS2017 data set.
Because the input data must conform to the input format of the convolutional neural network, the experimental data sets are preprocessed as follows:
First, digitization of character-type data:
Taking the NSL-KDD data set as an example, the three features protocol, flag, and service are of character type and must be converted into numerical data. For example, the protocol feature takes the three values UDP, TCP, and ICMP, which are encoded as 0, 1, and 2; the other features are processed similarly. After processing, each network traffic record has 41 dimensions. To conform to the input format of the convolutional neural network, each record is then reshaped: records of the NSL-KDD data set are converted into an 8 × 8 grayscale image format, and records of the CIC IDS2017 data set are converted into a 10 × 10 grayscale image format.
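The encoding and reshaping step can be sketched in Python as below. The text does not say how a 41-dimensional record fills an 8 × 8 image, so this sketch assumes zero-padding up to 64 values; that padding rule and the function names are assumptions made for illustration.

```python
def encode_categories(values, categories):
    """Map each string to its position in `categories`, e.g. UDP -> 0."""
    mapping = {name: i for i, name in enumerate(categories)}
    return [mapping[v] for v in values]

def to_square_image(features, side):
    """Zero-pad a flat feature vector (assumed) and reshape it to side x side."""
    if len(features) > side * side:
        raise ValueError("too many features for the requested image size")
    padded = list(features) + [0.0] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]

# Protocol values encoded as in the text: UDP, TCP, ICMP -> 0, 1, 2.
protocol_codes = encode_categories(["TCP", "UDP", "ICMP"], ["UDP", "TCP", "ICMP"])

# A hypothetical 41-dimensional NSL-KDD record reshaped to 8 x 8.
record = [0.5] * 41
image = to_square_image(record, side=8)
```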
Second, data normalization:
To remove the effect of differing units, the feature-mapped data must be normalized so that the gradient keeps moving toward the minimum and convergence is accelerated. Min-Max normalization is a linear scaling method commonly used to preprocess data in machine learning. However, because it depends on the minimum and maximum values of the samples, Min-Max normalization has significant limitations. The invention therefore adopts a new scaling method to handle features whose value ranges vary widely. Since the values of each feature in the NSL-KDD and CIC IDS2017 data sets differ greatly, a hybrid data preprocessing approach is used. Based on the actual distribution of the data, the normalization method is given by equation (29).
Figure BDA0003617284650000131
where $x_i$ is the i-th feature value in the original data,
Figure BDA0003617284650000132
is the minimum value of the i-th feature,
Figure BDA0003617284650000133
is the maximum value of the i-th feature, and
Figure BDA0003617284650000134
is the normalized result.
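Equation (29) is available only as an image, so the hybrid formula itself is not reproduced here. The Python sketch below shows only the plain Min-Max scaling that the paragraph uses as its point of comparison, to illustrate why its dependence on the minimum and maximum becomes a limitation when a feature has a very wide value range. The function name and the sample column are hypothetical.

```python
def min_max_scale(column):
    """Plain Min-Max scaling of one feature column to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: avoid division by zero
    return [(x - lo) / (hi - lo) for x in column]

# One hypothetical feature with a single very large value.
column = [0.0, 5.0, 10.0, 1000.0]
scaled = min_max_scale(column)
```

The single large value maps to 1 and compresses the other values into the interval from 0 to 0.01, which is the kind of distortion a scaling method for widely varying features has to avoid.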
Third, data imbalance processing:
In an industrial CPS intrusion detection scenario, some malicious attack types account for only a small portion of all network traffic. For example, the NSL-KDD data set has a serious data imbalance problem: R2L and U2R attacks account for only 0.79% and 0.041% of the NSL-KDD training set, respectively, so a classification model tends to be biased toward the majority classes, resulting in a high false alarm rate. To alleviate this problem, the SVMSMOTE algorithm is used to oversample the minority attack types. Oversampling is applied only to the training data set; the data distribution of the test data set is left unchanged, so that the model does not depend excessively on generated data.
Because network intrusion detection data are complex and the data sets are markedly imbalanced, accuracy alone cannot serve as the sole criterion for judging a model. The invention therefore uses accuracy (ACC), weighted precision (WPrecision), weighted detection rate (WDR), and weighted F-measure (WFMeasure) as evaluation indexes of the intrusion detection model, and uses them together to verify the accuracy and stability of the model.
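The four indexes can be sketched in Python as follows. The text does not define the weighting, so this sketch assumes the common convention in which each per-class metric is weighted by that class's share of the true labels, and it treats the detection rate as per-class recall; both are assumptions, as are the function name and the sample labels.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return (ACC, weighted precision, weighted DR, weighted F-measure)."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    w_prec = w_dr = w_f = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        dr = tp / count  # detection rate, i.e. recall of this class
        f = 2 * prec * dr / (prec + dr) if prec + dr else 0.0
        weight = count / n  # class share of the true labels
        w_prec += weight * prec
        w_dr += weight * dr
        w_f += weight * f
    return acc, w_prec, w_dr, w_f

y_true = ["normal", "normal", "normal", "dos"]
y_pred = ["normal", "normal", "dos", "dos"]
acc, w_prec, w_dr, w_f = weighted_metrics(y_true, y_pred)
```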
To further verify the effectiveness of the knowledge distillation intrusion detection model based on the triplet convolutional neural network proposed by the invention, an ablation experiment is carried out on the KD-TCNN model. The KD-TCNN model consists of four parts: feature selection, deep metric learning, knowledge distillation, and K-fold cross training. Ablation experiments on these four parts are carried out on the NSL-KDD data set, and the results are shown in Table 1. As the table shows, the Baseline student model has the lowest accuracy, 96.86%. Adding feature selection eliminates redundant features and raises the accuracy to 96.88%. Introducing knowledge distillation into the intrusion detection model then improves the accuracy by 0.39% over the Baseline model. Introducing deep metric learning into the knowledge distillation framework reduces the difference between the outputs of the teacher model and the student model and raises the accuracy to 97.98%. Finally, training the model with K-fold cross training raises the accuracy further to 98.44%, only 0.4% below the teacher model. These results demonstrate the effectiveness of the proposed knowledge distillation intrusion detection model based on the triplet convolutional neural network and of the K-fold cross training scheme.
TABLE 1 NSL-KDD data set ablation experiment
Figure BDA0003617284650000141
In summary, the invention preprocesses the intrusion detection data set, where the preprocessing comprises digitization of character-type data, data normalization, and data imbalance processing; selects an optimal feature subset from the preprocessed intrusion detection data set with a binary grey wolf optimization algorithm; pre-trains the teacher network model on the selected optimal feature subset; and trains the KD-TCNN intrusion detection model by initializing the KD-TCNN intrusion detection parameters and determining the structure of the student network model, inputting two groups of network traffic of different classes into the KD-TCNN intrusion detection model for training based on the optimal feature subset, and adjusting the error of the K-fold cross training process according to the knowledge distillation loss until the student network model converges. Finally, the KD-TCNN intrusion detection model is tested by inputting the preprocessed test data set into the student network to obtain a classification result for each record.
The invention uses knowledge distillation to bring the output of the student model as close as possible to that of the teacher model, so that the student network learns the inter-class information of the teacher network, can process and analyze large-scale data in real time, and has fewer parameters. Deep metric learning reduces the difference between the outputs of the teacher network model and the student network model, which improves the performance of the student model. The depthwise separable convolution further reduces the number of parameters and the computation of the model, so that the intrusion detection model can be deployed on nodes with limited computing capacity in an Internet of Things network and the intrusion detection time is reduced to achieve real-time detection. The verification results show that the method outperforms traditional deep learning models in parameter count and in the other performance indexes.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. A network physical system intrusion detection method is characterized by comprising the following steps:
(1) carrying out data preprocessing on the intrusion detection data set, wherein the data preprocessing comprises character type data digitization processing, data normalization processing and data unbalance processing;
(2) selecting an optimal feature subset from the preprocessed intrusion detection data set through a binary grey wolf optimization algorithm;
(3) pre-training the teacher network model according to the selected optimal feature subset;
(4) training a KD-TCNN intrusion detection model:
(41) initializing KD-TCNN intrusion detection parameters and determining the structure of a student network model;
(42) inputting two groups of network flows of different categories into the KD-TCNN intrusion detection model for training based on the optimal feature subset;
(43) adjusting the error of the K-fold cross training process according to the knowledge distillation loss until the student network model converges;
(5) testing the KD-TCNN intrusion detection model by inputting the preprocessed test data set into the student network to obtain a classification result for each piece of data.
2. The cyber physical system intrusion detection method according to claim 1, wherein in the step (1), the intrusion detection data set includes an NSL-KDD data set, and the character-type data digitization processing procedure converts the type of the character-type element in the NSL-KDD data set into numerical data.
3. The cyber physical system intrusion detection method according to claim 1, wherein in the step (1), the data normalization processing uses the following normalization formula, chosen according to the actual distribution of the data:
Figure FDA0003617284640000011
wherein $x_i$ is the i-th feature value in the original data,
Figure FDA0003617284640000012
is the minimum value of the i-th feature,
Figure FDA0003617284640000013
is the maximum value of the i-th feature, and
Figure FDA0003617284640000014
is the normalized result.
4. The cyber physical system intrusion detection method according to claim 1, wherein in the step (2), the fittest solution is named α, the second and third best solutions are named β and δ, respectively, the remaining candidate solutions are assumed to be ω, and the grey wolf optimization algorithm comprises:
a prey encircling stage: establishing a mathematical model of the encircling behavior;
a hunting stage: the hunt is guided by α, and β and δ may also participate; the remaining ω update their positions according to the positions of the best search agents;
a prey attacking stage: simulating the approach to the prey by linearly updating the parameter a in each iteration;
a feature subset evaluation stage: a convolutional neural network is used as the learning algorithm, a fitness function is used to evaluate the positions of the wolves, and the feature subset with the lowest fitness function value is selected for feature selection and dimension reduction, giving the optimal feature subset.
5. The cyber physical system intrusion detection method according to claim 1, wherein in the step (42), the KD-TCNN intrusion detection model training employs a knowledge distillation framework based on a triplet convolutional neural network.
6. The cyber physical system intrusion detection method according to claim 5, wherein in the step (42), three losses are considered in the design of the loss function: a triplet loss $L_{triplet}$ based on the distances between the anchor sample and the positive and negative samples, a cross-entropy loss $L_{hard}$ between the student network output and the labels, and a KL divergence loss $L_{soft}$ between the teacher and student networks.
7. The cyber physical system intrusion detection method according to claim 6, wherein, in order to constrain the difference between the probability distribution output by the student network model and the real label, the cross-entropy loss $L_{hard}$ between the output of the student network model and the real label is used as a part of the model loss function.
8. The cyber physical system intrusion detection method according to claim 7, wherein a coefficient is added to the loss terms to adjust the contribution of each loss to the overall loss function, and the loss function L of the model is defined as follows:
$$L = L_{KD} + \theta L_{triplet}$$
where θ is the balance coefficient between the knowledge distillation loss and the triplet loss during model training, $L_{KD}$ is the knowledge distillation loss, and $L_{triplet}$ is the triplet loss based on the distances between the anchor sample and the positive and negative samples.
9. The cyber physical system intrusion detection method according to claim 8, wherein the knowledge distillation framework based on the triplet convolutional neural network employs a depthwise separable convolution.
10. The cyber physical system intrusion detection method according to claim 1, wherein in the step (43), the K-fold cross training process comprises:
(431) defining a model and a learning rate, and dividing a data set into a training data set and a testing data set;
(432) dividing the training data set into K parts, taking one part as a verification set, and taking the other K-1 parts as a training set;
(433) defining a gradient optimizer with a decaying learning rate, training the model on the K-1 training parts, and testing the model on the remaining part;
(434) repeating the step (433) K times to obtain an optimal model, and obtaining the performance index of the optimal model on the test data set.
CN202210446927.2A 2022-04-26 2022-04-26 Network physical system intrusion detection method Pending CN114863226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210446927.2A CN114863226A (en) 2022-04-26 2022-04-26 Network physical system intrusion detection method


Publications (1)

Publication Number Publication Date
CN114863226A true CN114863226A (en) 2022-08-05

Family

ID=82633949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210446927.2A Pending CN114863226A (en) 2022-04-26 2022-04-26 Network physical system intrusion detection method

Country Status (1)

Country Link
CN (1) CN114863226A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116865887A (en) * 2023-07-06 2023-10-10 四川省广播电视科学技术研究所 Emotion classification broadcasting system and method based on knowledge distillation
CN116865887B (en) * 2023-07-06 2024-03-01 四川省广播电视科学技术研究所 Emotion classification broadcasting system and method based on knowledge distillation
CN116916318A (en) * 2023-07-19 2023-10-20 西华师范大学 Lightweight intrusion detection method based on separable convolution for Internet of things equipment
CN117726461A (en) * 2024-02-07 2024-03-19 湖南招采猫信息技术有限公司 Financial risk prediction method and system for electronic recruitment assistance

Similar Documents

Publication Publication Date Title
CN109492582B (en) Image recognition attack method based on algorithm adversarial attack
Haggag et al. Implementing a deep learning model for intrusion detection on apache spark platform
CN114863226A (en) Network physical system intrusion detection method
CN105488528B (en) Neural network image classification method based on improving expert inquiry method
CN111310814A (en) Method and device for training business prediction model by utilizing unbalanced positive and negative samples
CN111783442A (en) Intrusion detection method, device, server and storage medium
CN112165485A (en) Intelligent prediction method for large-scale network security situation
Ortet Lopes et al. Towards effective detection of recent DDoS attacks: A deep learning approach
CN110768971B (en) Confrontation sample rapid early warning method and system suitable for artificial intelligence system
CN112131578A (en) Method and device for training attack information prediction model, electronic equipment and storage medium
CN114417427A (en) Deep learning-oriented data sensitivity attribute desensitization system and method
CN111709022B (en) Hybrid alarm association method based on AP clustering and causal relationship
CN113660196A (en) Network traffic intrusion detection method and device based on deep learning
Shi et al. A framework of intrusion detection system based on Bayesian network in IoT
CN113239638A (en) Overdue risk prediction method for optimizing multi-core support vector machine based on dragonfly algorithm
Kalaivani et al. A Hybrid Deep Learning Intrusion Detection Model for Fog Computing Environment.
Benaddi et al. Adversarial attacks against iot networks using conditional gan based learning
CN115051864A (en) PCA-MF-WNN-based network security situation element extraction method and system
Ramadevi et al. Deep Learning Based Distributed Intrusion Detection in Secure Cyber Physical Systems.
Qiang et al. Network security based on DS evidence theory optimizing CS-BP neural network situation assessment
CN115909027B (en) Situation estimation method and device
He Identification and Processing of Network Abnormal Events Based on Network Intrusion Detection Algorithm.
CN116996272A (en) Network security situation prediction method based on improved sparrow search algorithm
Xu et al. Adversarial robustness in graph-based neural architecture search for edge ai transportation systems
CN116545679A (en) Industrial situation security basic framework and network attack behavior feature analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination