CN116743646A

CN116743646A - Tunnel network anomaly detection method based on domain self-adaptive depth self-encoder

Info

Publication number: CN116743646A
Application number: CN202311023612.8A
Authority: CN
Inventors: 李�浩; 李朋; 杨路; 陆艳铭; 陈志涛; 李孜; 胡皓; 马伟任
Original assignee: BROADVISION ENGINEERING CONSULTANTS
Current assignee: BROADVISION ENGINEERING CONSULTANTS
Priority date: 2023-08-15
Filing date: 2023-08-15
Publication date: 2023-09-12
Anticipated expiration: 2043-08-15
Also published as: CN116743646B

Abstract

The invention relates to a tunnel network anomaly detection method based on a domain self-adaptive depth self-encoder, and belongs to the technical field of tunnel network anomaly detection. The method comprises the steps of data acquisition and preprocessing, training and updating of an abnormal detection source domain model, dynamic threshold calculation of abnormal detection, abnormal data detection and the like. The invention can directly perform operations such as preprocessing, abnormality detection and the like on the tunnel network at the edge side, improves the processing speed of the monitoring system and effectively reduces the processing time delay. Meanwhile, based on the normal reference value of the current network state, the abnormal threshold range is reasonably set, and abnormal information missing report, false report and other conditions caused by fixed setting of the threshold are avoided.

Description

Tunnel network anomaly detection method based on domain self-adaptive depth self-encoder

Technical Field

The invention belongs to the technical field of tunnel network anomaly detection, and relates to a tunnel network anomaly detection method based on a domain self-adaptive depth self-encoder, in particular to an edge computing method that performs tunnel network anomaly detection based on the domain self-adaptive depth self-encoder.

Background

The electromechanical system in a tunnel is large and its equipment distribution is complex, so monitoring and managing the network state of the electromechanical equipment in an expressway tunnel is particularly important. At present, in a single expressway tunnel, an area controller is arranged every 500 meters to control peripheral field devices, and the control cabinets are interconnected through switches to form an optical fiber ring network in the two directions of the tunnel portal. The Ethernet switch must not only carry the high-bandwidth data of the video monitoring system, but also be configured into a redundant optical fiber ring network that connects the area controllers, which control equipment such as ventilation, illumination and traffic lights in the tunnel. As the number of industrial Ethernet devices increases, the structure of industrial Ethernet networks becomes increasingly complex. In practice, insufficient network topology awareness, network storms caused by misoperation, virus infection and similar problems have become important factors affecting the stability and reliability of the network. When a problem occurs in an industrial Ethernet, it tends to spread almost instantly to the whole network, so the affected range is large. In addition, the area controller in the tunnel is generally responsible for functions such as digital and analog input and output and serial port communication, but cannot effectively collect the running state information of network equipment such as switches. That is, although the tunnel network has been established, when the network traffic becomes abnormal the system can neither accurately locate the fault nor generate a corresponding record.

Therefore, the introduction of the edge computing architecture into each electromechanical system within the tunnel is of great research interest. The edge computing technology can directly process data at the tunnel electromechanical equipment end, avoid the transfer of cloud or other data centers, improve the response speed and reduce the requirement on the tunnel network bandwidth. The edge computing nodes are deployed in the tunnel environment, a large number of front-end devices in the environment are managed, the fault of the electromechanical devices in the operation process can be avoided by detecting the state of the tunnel network based on the edge computing, and the reliability and the intelligent level of the tunnel electromechanical system are improved.

However, for edge computing to play a better role in tunnel network monitoring, a reasonable edge computing method for tunnel network anomaly detection must be designed. Most anomaly detection methods used in practice still depend on manual inspection and analysis. Among theoretical methods, the first class is mathematical statistics: a statistical distribution is generally used as the standard for judging anomalies, statistical characteristics among samples are calculated, and a set threshold is applied to detect anomalies. The second class is based on classification models, but training such models requires a large amount of well-labeled data. The third class is distance-based methods, in which outlier samples are considered abnormal; such algorithms are not well suited to large-volume, high-dimensional data. Because the electromechanical devices in a tunnel are complicated and diverse, the collected network traffic has a high feature dimension and strong nonlinearity among data, so an effective anomaly detection model is difficult to establish. In addition, because the tunnel network operating environment changes dynamically over time, applying fixed monitoring indexes and a fixed anomaly detection model to all operating samples easily produces false detections. Therefore, how to overcome these defects of the prior art is a problem to be solved in the technical field of tunnel network anomaly detection.

Disclosure of Invention

The invention aims to solve the defects in the prior art and provides a tunnel network anomaly detection method based on a domain self-adaptive depth self-encoder.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

a tunnel network anomaly detection method based on a domain adaptive depth self-encoder comprises the following steps:

step 1: collecting historical network flow data of equipment in a tunnel electromechanical system equipment layer through edge computing nodes deployed in a tunnel, analyzing and obtaining corresponding network original data flow, and preprocessing data to obtain corresponding network flow characteristics, namely preprocessed network flow samples;

step 2: the collected historical network flow data is processed in the step 1 to obtain the network flow characteristics corresponding to each network flow data, and then the network flow characteristics are used as a source domain data set; training an anomaly detection source domain model based on a depth self-encoder algorithm by taking a source domain data set as a training set, and deploying the anomaly detection source domain model in a tunnel edge computing node after training is completed;

step 3: after the network flow data acquired in real time are subjected to the preprocessing mode in the step 1 to obtain the corresponding network flow characteristics, constructing a self-adaptive sliding window algorithm to obtain a target domain data set corresponding to the network flow; updating the anomaly detection source domain model obtained in the step 2 according to the corresponding target domain data set;

Step 4: calculating a dynamic threshold for anomaly detection;

step 5: inputting the network flow characteristics after preprocessing the network flow data to be detected, which are acquired in real time, into an updated anomaly detection source domain model, and calculating a reconstruction error of the anomaly detection source domain model;

step 6: detecting whether the network flow data to be detected, acquired in real time, is abnormal data according to the dynamic threshold obtained in step 4 and the reconstruction error obtained in step 5.

Further, preferably, in step 1, the system architecture for performing tunnel network anomaly detection by using an edge computing node includes a device layer, an edge computing layer, a network layer and a cloud platform layer; the equipment layer, the edge computing layer, the network layer and the cloud platform layer are sequentially connected; each equipment system in the equipment layer comprises a broadcast telephone system, a tunnel monitoring system, a tunnel ventilation lighting system, a tunnel area controller, a tunnel fire protection system, an information release system and a tunnel traffic signal system; the edge computing layer is an edge computing node deployed in the tunnel.

Further, preferably, in step 1, the data preprocessing method includes removing abnormal data, removing meaningless features and normalizing the data;

the network traffic characteristics include data flow duration, number of forward packets, number of reverse packets, total number of bytes of forward packets, total number of bytes of reverse packets, total number of bytes of forward substreams, and total number of bytes of reverse substreams.

Further, preferably, in step 2, the anomaly detection source domain model adopts a depth automatic encoder, including an encoder and a decoder;

the anomaly detection source domain model is provided with three layers of neural networks, namely an input layer, a hidden layer and an output layer, wherein the input is $X_S=\{x_1,x_2,\dots,x_M\}$; wherein $X_S$ represents the source domain data set and $x_i$ represents the $i$-th preprocessed network traffic sample;

the specific training method of the anomaly detection source domain model is as follows:

step 2.1: the encoder maps the source domain data $X_S$ through an activation function $f$ to obtain the hidden layer data:

$$H_S=\{h_1,h_2,\dots,h_M\};$$

wherein $H_S$ represents the hidden layer vectors and $h_i$ represents the hidden layer vector of the $i$-th preprocessed network traffic sample;

the encoding process is shown as formula (1):

$$h_i=f(W_1x_i+b_1) \tag{1}$$

wherein $W_1$ and $b_1$ represent the network weights and the bias vector of the encoder, respectively, and $f$ is the activation function, which in the present invention is a Sigmoid function;

step 2.2: the decoder converts the hidden layer data $H_S$ to the output layer through an activation function $g$ to obtain the output variable:

$$\hat{X}_S=\{\hat{x}_1,\hat{x}_2,\dots,\hat{x}_M\};$$

wherein $\hat{X}_S$ represents the reconstructed output variable and $\hat{x}_i$ represents the $i$-th reconstructed network traffic sample;

the input variable is reconstructed from the hidden layer vector $h_i$, and the decoding process is shown as formula (2):

$$\hat{x}_i=g(W_2h_i+b_2) \tag{2}$$

wherein $W_2$ and $b_2$ represent the network weights and the bias vector of the decoder, respectively, and $g$ is an activation function;

step 2.3: training the anomaly detection source domain model by using a gradient descent algorithm, and obtaining the optimal network parameters with minimization of the reconstruction error as the target; the objective function is shown in formula (3):

$$J_{DAE}(\theta)=\frac{1}{M}\sum_{i=1}^{M}\lVert x_i-\hat{x}_i\rVert^2 \tag{3}$$

wherein the network parameter set $\theta=\{W_1,b_1,W_2,b_2\}$ comprises the network weights of the encoder, the bias vector of the encoder, the network weights of the decoder and the bias vector of the decoder, respectively; $x_i$ and $\hat{x}_i$ respectively represent the $i$-th input and reconstructed output variables of the DAE network; $M$ represents the total number of preprocessed network traffic samples;

step 2.4: and saving the network parameters of the trained anomaly detection source domain model, and deploying the model at the tunnel edge computing node.

Further, preferably, the activation function $g$ of the decoder is a Sigmoid function.
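As an illustration only, steps 2.1 to 2.3 can be sketched in Python with NumPy. The sketch assumes a single hidden layer, Sigmoid activations for both the encoder and the decoder, the mean squared reconstruction error as the objective, and full-batch gradient descent. The function names, hidden size, learning rate, number of epochs and the synthetic data are hypothetical choices made for the example and are not specified by the invention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_features, n_hidden, seed=0):
    # Hypothetical initialisation: small random weights, zero biases.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_hidden, n_features)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_features, n_hidden)),
        "b2": np.zeros(n_features),
    }

def forward(params, X):
    H = sigmoid(X @ params["W1"].T + params["b1"])      # encoder output (hidden layer)
    X_hat = sigmoid(H @ params["W2"].T + params["b2"])  # decoder output (reconstruction)
    return H, X_hat

def reconstruction_loss(params, X):
    # Mean over samples of the squared reconstruction error.
    _, X_hat = forward(params, X)
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

def train_dae(X, n_hidden=4, lr=0.5, epochs=500, seed=0):
    params = init_params(X.shape[1], n_hidden, seed)
    M = X.shape[0]
    for _ in range(epochs):
        H, X_hat = forward(params, X)
        # Gradient of the loss w.r.t. the decoder pre-activation.
        d_out = 2.0 * (X_hat - X) * X_hat * (1.0 - X_hat) / M
        # Back-propagate to the encoder pre-activation (uses W2 before its update).
        d_hid = (d_out @ params["W2"]) * H * (1.0 - H)
        params["W2"] -= lr * d_out.T @ H
        params["b2"] -= lr * d_out.sum(axis=0)
        params["W1"] -= lr * d_hid.T @ X
        params["b1"] -= lr * d_hid.sum(axis=0)
    return params

# Synthetic stand-in for a normalised source domain data set (50 samples, 7 features).
rng = np.random.default_rng(1)
X_S = rng.uniform(0.0, 0.3, size=(50, 7))
initial_loss = reconstruction_loss(init_params(7, 4, seed=0), X_S)
theta = train_dae(X_S, n_hidden=4, seed=0)
final_loss = reconstruction_loss(theta, X_S)
```

In such a sketch the dictionary `theta` plays the role of the saved network parameters of step 2.4; how they are serialised and deployed on the edge node is left open here.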

Further, it is preferable that the specific method of step 3 is:

step 3.1: assume that at time $t$, the network flow data acquired by the edge computing node in real time are preprocessed as in step 1 to obtain the corresponding network flow characteristics, namely the preprocessed network traffic sample $x_t$, which is defined as the sample to be tested; a target domain data set $X_T$ is constructed for it; the specific method comprises the following steps:

step 3.1.1: taking the preprocessed network traffic sample $x_t$ at time $t$ as the right boundary of the sliding window, the window is expanded toward the preceding preprocessed network traffic samples, so that samples close to $x_t$ in time are included in the sliding window; the adaptive sliding window data set $W_l$ is expressed as:

$$W_l=\{x_{t-l},\dots,x_{t-1},x_t\};$$

wherein $W_l$ denotes the adaptive sliding window of length $l$, comprising the preprocessed network traffic samples from time $t-l$ to time $t$; $x_{t-l}$ is the preprocessed network traffic sample at the left boundary of the window, i.e. the sample reached when the adaptive sliding window has expanded forward from time $t$ by $l$ samples;

step 3.1.2: when the adaptive sliding window decides whether to expand to a preceding sample, assume that the sliding window has been expanded to time $t-l$, and the sample to be judged for inclusion in the window is the one at time $t-l-1$; first, the mean Euclidean distance between this sample and all samples inside the current window is calculated according to the similarity function shown below:

$$S(x_{t-l-1},W_l)=\frac{1}{n}\sum_{x_j\in W_l}ED(x_{t-l-1},x_j) \tag{4}$$

the ED is calculated as follows:

$$ED(x_{t-l-1},x_j)=\lVert x_{t-l-1}-x_j\rVert_2;$$

wherein $x_j$ represents any preprocessed network traffic sample from time $t-l$ to time $t$, $n$ is the number of preprocessed network traffic samples in the current window, and $x_{t-l-1}$ is the preceding preprocessed network traffic sample to be judged for inclusion in the window;

step 3.1.3: setting the boundary threshold $\delta$ of the adaptive sliding window according to the similarity function of step 3.1.2; if $S(x_{t-l-1},W_l)\le\delta$, the sliding window incorporates the preprocessed network traffic sample, i.e. the left boundary sample of the window becomes $x_{t-l-1}$; otherwise, the expansion stops and the left boundary sample is $x_{t-l}$;

let the final sliding window data set $W_l$ serve as the target domain data set of $x_t$; the target domain data set is expressed as:

$$X_T=W_l=\{x_{t-l},\dots,x_{t-1},x_t\};$$

since $W_l$ contains $n$ preprocessed network traffic samples in total, the target domain data set $X_T$ is expressed as:

$$X_T=\{x_1^T,x_2^T,\dots,x_n^T\};$$

wherein $x_j^T$ denotes the $j$-th preprocessed network traffic sample within the data set $X_T$, which contains $n$ preprocessed network traffic samples;
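The window construction of step 3.1 can be sketched as follows. This is a minimal NumPy illustration, assuming that a preceding sample is incorporated when its mean Euclidean distance to the samples already in the window does not exceed the boundary threshold; the direction and inclusiveness of that comparison, the function names and the toy data are assumptions made for the example.

```python
import numpy as np

def mean_euclidean_distance(candidate, window):
    """Mean Euclidean distance between one sample and every sample in the window."""
    return float(np.mean(np.linalg.norm(window - candidate, axis=1)))

def adaptive_window(samples, t, delta):
    """Return the target domain window whose right boundary is samples[t].

    samples: array of shape (T, d), time-ordered preprocessed traffic samples.
    delta:   boundary threshold on the mean distance (assumed rule: expand
             while the distance is <= delta).
    """
    samples = np.asarray(samples, dtype=float)
    left = t
    while left > 0:
        window = samples[left:t + 1]
        if mean_euclidean_distance(samples[left - 1], window) <= delta:
            left -= 1   # incorporate the preceding sample
        else:
            break       # stop expanding; left boundary is fixed
    return samples[left:t + 1]

# Toy one-feature series: the last three samples are close to each other.
series = np.array([[0.10], [0.90], [0.50], [0.52], [0.51]])
X_T = adaptive_window(series, t=4, delta=0.1)
```

With this toy series the window grows over the three most recent samples and stops at the sample with value 0.90, whose mean distance to the window is far above the threshold.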

step 3.2: utilizing the target domain data set $X_T$ to perform domain self-adaptive updating on the anomaly detection source domain model trained in step 2; the method comprises the following specific steps:

step 3.2.1: first, the source domain data set $X_S$ is input into the trained anomaly detection source domain model, and the hidden layer vectors $H_S=\{h_1,\dots,h_M\}$ of the source domain data are acquired through forward propagation of formula (1);

step 3.2.2: the target domain data $X_T$ are also input into the anomaly detection source domain model, and the hidden layer vectors $H_T=\{h_1^T,\dots,h_n^T\}$ of the target domain data are acquired through forward propagation of the following formula:

$$h_j^T=f(W_1x_j^T+b_1) \tag{5}$$

step 3.2.3: taking the maximum mean discrepancy (MMD) distance as an objective function, the calculation formula is shown as formula (6):

$$MMD(H_S,H_T)=\sup_{\lVert\phi\rVert_{\mathcal{H}}\le 1}\left(\frac{1}{M}\sum_{i=1}^{M}\phi(h_i)-\frac{1}{n}\sum_{j=1}^{n}\phi(h_j^T)\right);$$

$$MMD^2(H_S,H_T)=\frac{1}{M^2}\sum_{i=1}^{M}\sum_{i'=1}^{M}k(h_i,h_{i'})-\frac{2}{Mn}\sum_{i=1}^{M}\sum_{j=1}^{n}k(h_i,h_j^T)+\frac{1}{n^2}\sum_{j=1}^{n}\sum_{j'=1}^{n}k(h_j^T,h_{j'}^T) \tag{6}$$

wherein $M$ and $n$ are respectively the numbers of samples in $H_S$ and $H_T$; $\sup$ is the least upper bound (supremum), taken over functions $\phi$ in the unit ball of a reproducing kernel Hilbert space $\mathcal{H}$; $i$, $i'$, $j$ and $j'$ refer to any index in the respective data set, i.e. $h_i$ and $h_{i'}$ respectively denote the $i$-th and $i'$-th samples in $H_S$, and $h_j^T$ and $h_{j'}^T$ respectively denote the $j$-th and $j'$-th samples in $H_T$; $k(\cdot,\cdot)$ is a Gaussian kernel function, calculated as follows:

$$k(u,v)=\exp\left(-\frac{\lVert u-v\rVert^2}{2\sigma^2}\right) \tag{7}$$

wherein $\sigma$ represents the bandwidth parameter;

step 3.2.4: calculating the difference between the hidden layer vectors generated by the source domain data and the target domain data according to formulas (6) and (7) to construct the DADAE model; the target is to minimize the DADAE model objective function, which is as follows:

$$J_{DADAE}(\theta')=J_{DAE}+\lambda J_{MMD} \tag{8}$$

wherein $J_{DAE}$ is the loss function represented by formula (3); $J_{MMD}$ is the MMD distance loss function; the network parameter set $\theta'=\{W_1',b_1',W_2',b_2'\}$ comprises the network weights of the encoder, the bias vector of the encoder, the network weights of the decoder and the bias vector of the decoder after the domain adaptive update, respectively; $\lambda$ is a balance parameter;

step 3.2.5: and saving the trained network parameters of the new anomaly detection source domain model, and deploying the new anomaly detection source domain model at the tunnel edge computing node.
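The distance term of step 3.2.3 and the combined objective of step 3.2.4 can be sketched in NumPy as follows. The sketch assumes the biased empirical estimate of the squared maximum mean discrepancy and a Gaussian kernel of the form exp(-||u - v||^2 / (2 sigma^2)); the exact kernel parameterisation, the names `sigma` and `lam`, and the synthetic hidden vectors are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Pairwise Gaussian kernel values between the rows of A and the rows of B."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def mmd_squared(H_S, H_T, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two sets of hidden vectors."""
    return float(gaussian_kernel(H_S, H_S, sigma).mean()
                 - 2.0 * gaussian_kernel(H_S, H_T, sigma).mean()
                 + gaussian_kernel(H_T, H_T, sigma).mean())

def dadae_objective(recon_loss, H_S, H_T, lam=1.0, sigma=1.0):
    """Reconstruction loss plus the balance parameter times the MMD term."""
    return recon_loss + lam * mmd_squared(H_S, H_T, sigma)

# Synthetic hidden vectors: one target set identical to the source, one shifted.
rng = np.random.default_rng(0)
H_source = rng.normal(0.0, 1.0, size=(40, 3))
mmd_same = mmd_squared(H_source, H_source.copy())
mmd_shifted = mmd_squared(H_source, H_source + 3.0)
```

The estimate is zero when both sets coincide and grows as the target hidden vectors drift away from the source ones, which is the quantity the domain adaptive update tries to keep small while still reconstructing the input.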

Further, preferably, in step 4, a dynamic threshold for anomaly detection is calculated, the upper limit of the dynamic threshold being denoted $T_{up}$ and the lower limit being denoted $T_{low}$; the specific method comprises the following steps:

step 4.1: first, the target domain data set $X_T$ is passed again through the updated anomaly detection source domain model to obtain the output data set after the encoder and decoder, denoted $\hat{X}_T$; then, the reconstruction error of each piece of target domain data is calculated using the following formula:

$$e_j=\lVert x_j^T-\hat{x}_j^T\rVert^2,\quad j=1,\dots,n \tag{9}$$

wherein the reconstruction error set $E$ comprises $n$ elements, denoted $E=\{e_1,e_2,\dots,e_n\}$; $X_T=\{x_j^T\}$ and $\hat{X}_T=\{\hat{x}_j^T\}$ respectively denote the target domain data set containing $n$ network traffic samples and its reconstructed output data set;

step 4.2: the mean and standard deviation of $E$ are calculated as follows:

$$\mu_E=\frac{1}{n}\sum_{j=1}^{n}e_j \tag{10}$$

$$\sigma_E=\sqrt{\frac{1}{n}\sum_{j=1}^{n}(e_j-\mu_E)^2} \tag{11}$$

wherein $\mu_E$ represents the mean of $E$ and $\sigma_E$ is the standard deviation of $E$; the dynamic threshold range is:

$$T_{up}=\mu_E+k\sigma_E \tag{12}$$

$$T_{low}=\mu_E-k\sigma_E \tag{13}$$

wherein $k$ is the standard deviation coefficient.

Further, preferably, the standard deviation coefficient $k$ is 2.
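A minimal sketch of the dynamic threshold calculation of step 4 is given below, assuming that the reconstruction error of a sample is the squared Euclidean norm of the difference between the sample and its reconstruction, and that the population standard deviation (division by the number of samples) is used. Both choices, as well as the function names and the example values, are assumptions made for the illustration.

```python
import numpy as np

def reconstruction_errors(X, X_hat):
    """Per-sample squared reconstruction error."""
    diff = np.asarray(X, dtype=float) - np.asarray(X_hat, dtype=float)
    return np.sum(diff ** 2, axis=1)

def dynamic_threshold(errors, k=2.0):
    """Return (lower limit, upper limit) as mean -/+ k standard deviations."""
    errors = np.asarray(errors, dtype=float)
    mu = errors.mean()
    sd = errors.std()  # population standard deviation (assumption)
    return mu - k * sd, mu + k * sd

# Hypothetical reconstruction errors of three target domain samples.
window_errors = np.array([0.01, 0.02, 0.03])
t_low, t_up = dynamic_threshold(window_errors, k=2.0)
```

Because the thresholds are recomputed from the current target domain window, they follow the present network state rather than a value fixed at training time.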

Further, in step 5, preferably, the method for calculating the reconstruction error is as follows:

$$e_t=\lVert x_t-\hat{x}_t\rVert^2;$$

wherein $x_t$ is the preprocessed network traffic sample to be detected, and $\hat{x}_t$ is the reconstructed output obtained after the sample is encoded and then decoded by the updated anomaly detection source domain model.

Further, it is preferable that the detection method in step 6 is:

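when the reconstruction error $e_t$ of the sample to be detected satisfies $T_{low}\le e_t\le T_{up}$, where $T_{low}$ and $T_{up}$ are the lower and upper limits of the dynamic threshold, the sample is marked as normal; when $e_t>T_{up}$ or $e_t<T_{low}$, the sample is marked as abnormal.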
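The decision rule of step 6 can be written as a short Python function. The sketch assumes that a reconstruction error exactly on a threshold limit counts as normal; the function name and the example values are hypothetical.

```python
def classify_sample(error, t_low, t_up):
    """Mark a sample as 'normal' if its reconstruction error lies within
    [t_low, t_up] (limits included, by assumption), otherwise 'abnormal'."""
    return "normal" if t_low <= error <= t_up else "abnormal"

# Hypothetical thresholds and errors.
label_inside = classify_sample(0.020, t_low=0.004, t_up=0.036)
label_above = classify_sample(0.050, t_low=0.004, t_up=0.036)
label_below = classify_sample(0.001, t_low=0.004, t_up=0.036)
```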

In the present invention, the value of the boundary threshold $\delta$ of the adaptive sliding window can be selected according to actual conditions, and the invention does not limit the boundary threshold.

The technical problems to be solved by the invention are as follows: in the prior art, tunnel network anomalies are difficult to detect because the Ethernet equipment is complex and the sampling rate of network transmission traffic is high; to address this, edge computing nodes are introduced into the tunnel network and network traffic is processed directly at the equipment end. However, because the electromechanical devices in the tunnel are complicated and diverse, the collected network traffic has a high feature dimension and strong nonlinearity among data, and an effective anomaly detection model is difficult to establish. In addition, because the tunnel network operating environment changes dynamically and rapidly over time, a model that is never updated tends to detect progressively worse as time passes, and a fixed anomaly threshold and a fixed detection model are not robust for the tunnel network anomaly detection task.

In view of the above, the present invention proposes a tunnel network anomaly detection method based on a domain adaptive depth self-encoder (Domain Adaptive Deep Autoencoder, DADAE). The method introduces an edge computing architecture into the tunnel electromechanical system; the edge computing nodes collect the network traffic characteristics corresponding to different services and complete the anomaly detection task. To address the difficulty of constructing a tunnel network anomaly detection model, the invention introduces the idea of transfer learning and designs a domain self-adaptive depth self-encoder algorithm that updates the detection model in real time as the network state changes. From the perspective of transfer learning, a section of historical network traffic generated by the tunnel electromechanical system is regarded as the source domain. Because the data vary over time, the distribution of the network traffic to be detected, which is acquired in real time, does not fully match that of the historical traffic; because network traffic is strongly correlated across adjacent time periods, the samples within a certain time window before the sample to be detected are taken as its target domain. The invention aims to collect and process data directly at the tunnel edge and to use transfer learning to improve the adaptability of the anomaly detection model to time-varying traffic samples, thereby increasing the processing speed of the monitoring system, reducing the processing delay, and improving the robustness and accuracy of the anomaly detection model.

Specifically, firstly, aiming at the characteristics of high characteristic dimension and nonlinearity among data of a tunnel network, a source domain automatic encoder model is built by taking historical normal network flow as source domain data so as to initially build a nonlinear fitting relation of the tunnel network. And then, after the network sample to be detected, which is acquired by the edge node in real time, arrives, determining target domain data corresponding to the sample to be detected through a sliding window, and carrying out domain self-adaptive updating on the source domain self-encoder model by using the target domain data. And finally, inputting the sample to be detected into the updated model to calculate the reconstruction loss of the sample, and detecting whether the sample to be detected is an abnormal network flow sample or not by using the constructed abnormal detection module.

Compared with the prior art, the invention has the beneficial effects that:

(1) The tunnel network anomaly detection method of the domain self-adaptive depth self-encoder can directly perform operations such as preprocessing and anomaly detection on the tunnel network at the edge side, so that the processing speed of a monitoring system is improved, and the processing time delay is effectively reduced.

(2) Aiming at the characteristic of the dynamic change of the tunnel network environment along with time, the domain self-adaptive depth self-encoder algorithm provided by the invention can enable the anomaly detection algorithm to be self-adaptively matched with the network traffic sample to be detected, and improve the robustness and accuracy of the anomaly detection algorithm.

(3) The method for dynamically determining the abnormal threshold value provided by the invention has the advantages that the range of the abnormal threshold value is reasonably set based on the normal reference value of the current network state, and the situations of abnormal information missing report, false report and the like caused by fixed setting of the threshold value are avoided.

(4) By applying the method and the system in the expressway tunnel inner edge computing nodes, the robustness and the accuracy of network flow abnormality detection tasks can be improved, and the expressway tunnel operation management cost can be reduced.

Drawings

FIG. 1 is a layer architecture diagram of the present invention for domain-adaptive depth self-encoder based tunnel network anomaly detection;

FIG. 2 is a flow chart of a method of detecting tunnel network anomalies based on a domain adaptive depth self-encoder of the present invention;

fig. 3 is an algorithm diagram of a method for detecting tunnel network anomalies based on a domain-adaptive depth self-encoder according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples.

It will be appreciated by those skilled in the art that the following examples are illustrative of the present invention and should not be construed as limiting the scope of the invention. The specific techniques or conditions are not identified in the examples and are performed according to techniques or conditions described in the literature in this field or according to the product specifications. The materials or equipment used are conventional products available from commercial sources, not identified to the manufacturer.

Embodiment 1 is a tunnel network anomaly detection method based on a domain adaptive depth self-encoder, comprising steps 1 to 6 as set out in the Disclosure of Invention.

Embodiment 2 is a tunnel network anomaly detection method based on a domain adaptive depth self-encoder, comprising steps 1 to 6 as set out in the Disclosure of Invention, together with the following preferred features described there:

In the step 1, a system architecture for detecting tunnel network abnormality by utilizing an edge computing node comprises a device layer, an edge computing layer, a network layer and a cloud platform layer; the equipment layer, the edge computing layer, the network layer and the cloud platform layer are sequentially connected; each equipment system in the equipment layer comprises a broadcast telephone system, a tunnel monitoring system, a tunnel ventilation lighting system, a tunnel area controller, a tunnel fire protection system, an information release system and a tunnel traffic signal system; the edge computing layer is an edge computing node deployed in the tunnel.

In the step 1, the data preprocessing mode comprises abnormal data removal, meaningless characteristic removal and data normalization.

In the step 2, the anomaly detection source domain model adopts a depth automatic encoder, comprising an encoder and a decoder, and is trained according to steps 2.1 to 2.4 and formulas (1) to (3); the activation functions of the encoder and of the decoder are Sigmoid functions.

The step 3 is carried out according to steps 3.1 to 3.2.5 and formulas (4) to (8).

In step 4, the dynamic threshold for anomaly detection is calculated according to steps 4.1 to 4.2 and formulas (9) to (13), with a standard deviation coefficient of 2.

In step 5, the reconstruction error is calculated by the method described for step 5 in the Disclosure of Invention.

The detection method in the step 6 is the same as described in the Disclosure of Invention: a sample whose reconstruction error lies within the dynamic threshold range is marked as normal, and a sample whose reconstruction error is above the upper limit or below the lower limit is marked as abnormal.

Embodiment 3: As shown in fig. 1, to address the difficulty of identifying and accurately locating network traffic anomalies in a tunnel, the invention introduces an edge computing architecture and provides a tunnel network anomaly detection method based on a domain adaptive depth self-encoder. The edge computing architecture of the tunnel electromechanical system is divided into a device layer, an edge computing layer, a network layer and a cloud platform layer. The device layer mainly comprises tunnel sensing equipment and control equipment, such as a broadcast telephone system, a tunnel monitoring system, a tunnel ventilation lighting system, a tunnel area controller, a tunnel fire protection system, an information release system and a tunnel traffic signal system. To address the difficulty of tunnel network anomaly detection caused in the prior art by complex Ethernet equipment and a high network transmission traffic sampling rate, edge computing nodes are deployed in the tunnel to manage the large number of surrounding front-end devices and to collect and process data, including model training, domain self-adaptive updating and network anomaly detection. Because the electromechanical equipment in the tunnel is complex and diverse, the collected network traffic features are high-dimensional and strongly nonlinear, so an effective anomaly detection model is difficult to establish.

As shown in the flowchart of fig. 2 and the algorithm chart of fig. 3, taking the detection of network traffic anomalies of the tunnel monitoring system as an example, the method for detecting tunnel network anomalies based on the domain adaptive depth self-encoder according to the present embodiment specifically includes the following steps:

step 1: taking the network traffic anomaly detection of the tunnel monitoring system as an example, collecting historical network traffic data of the tunnel monitoring system through edge computing nodes deployed in a tunnel, analyzing and obtaining corresponding network original data streams by using an existing conventional mode, and preprocessing data to obtain network traffic characteristics corresponding to the service.

The data preprocessing comprises abnormal data removal, meaningless feature removal and data normalization. Abnormal data removal is applied to the collected network traffic of the monitoring system: normal historical data is retained for subsequent modeling through manual judgment. Meaningless feature removal discards features that carry no value for modeling, such as IP addresses, port numbers and timestamps, and converts the various feature data of the tunnel network into processable data. The network traffic features include basic features of the data flow, content features of the protocol connection, time-based traffic statistics features, and connection features. Optionally, several features including, but not limited to, data stream duration, number of forward packets, number of reverse packets, total number of bytes of forward packets, total number of bytes of reverse packets, total number of bytes of forward packet headers, total number of bytes of reverse packet headers, total number of bytes of forward substreams, and total number of bytes of reverse substreams are used for modeling and anomaly detection. Data normalization applies max-min normalization, taking the maximum and minimum values of the traffic features as references, so that all data values fall within the [0,1] interval.
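The max-min normalization described above can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the function name `min_max_normalize`, the sample values and the handling of constant features (mapped to 0) are assumptions introduced here.

```python
def min_max_normalize(samples):
    """Scale each feature column of `samples` into [0, 1].

    `samples` is a list of equal-length feature vectors (lists of floats).
    Returns the scaled samples plus the per-feature minima and maxima, so
    that later real-time samples can be scaled with the same references.
    """
    n_features = len(samples[0])
    mins = [min(row[j] for row in samples) for j in range(n_features)]
    maxs = [max(row[j] for row in samples) for j in range(n_features)]
    scaled = []
    for row in samples:
        scaled_row = []
        for j, value in enumerate(row):
            span = maxs[j] - mins[j]
            # Assumption of this sketch: a constant feature is mapped to 0.0
            # instead of dividing by zero.
            scaled_row.append((value - mins[j]) / span if span > 0 else 0.0)
        scaled.append(scaled_row)
    return scaled, mins, maxs


# Hypothetical flow records: [duration_s, forward_packets, forward_bytes]
flows = [[1.0, 10.0, 500.0], [3.0, 30.0, 1500.0], [2.0, 20.0, 1000.0]]
scaled_flows, feature_mins, feature_maxs = min_max_normalize(flows)
```

Returning the minima and maxima alongside the scaled data is a design choice of the sketch: it lets samples collected later at the edge node be scaled against the same reference values.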

Step 2: This step is the offline part. After the features of each traffic sample are obtained through the preprocessing of step 1, the collected historical network traffic data of the tunnel monitoring system is used as the source domain data set. Using the source domain data set as the training set, an anomaly detection source domain model is trained based on the Deep Auto Encoder (DAE) algorithm, and the trained DAE is deployed in a tunnel edge computing node. The DAE is an unsupervised neural network model comprising an encoder and a decoder: the encoder learns implicit features of the network traffic input data of the tunnel monitoring system, and the decoder reconstructs the input features from the learned implicit features. The principle of the DAE is that the output of the decoder restores the input as closely as possible. Assume that the source domain data set for training is represented as:

$$X^{S}=\{x_{1}^{S},x_{2}^{S},\dots,x_{M}^{S}\};$$

where $X^{S}$ represents the source domain data set and $x_{i}^{S}$ represents the $i$-th preprocessed network traffic sample.

The invention does not limit the number of hidden layers or the number of hidden-layer neurons of the DAE algorithm. As in the self-encoder structure diagram within the algorithm diagram of fig. 3, taking the network traffic of the tunnel monitoring system as an example, the DAE-based anomaly detection source domain model has a three-layer neural network, namely an input layer, a hidden layer and an output layer, and its input is the source domain data set. The specific training procedure of the DAE is as follows:

Step 2.1: The encoder maps the source domain data $X^{S}$, sample by sample, through an activation function $f(\cdot)$ to obtain the hidden layer data:

$$H^{S}=\{h_{1}^{S},h_{2}^{S},\dots,h_{M}^{S}\};$$

where $H^{S}$ represents the hidden layer vectors of the DAE and $h_{i}^{S}$ represents the hidden layer vector of the $i$-th network traffic sample;

the encoding process is shown as formula (1):

$$h_{i}^{S}=f\left(W_{1}x_{i}^{S}+b_{1}\right)\qquad(1)$$

where $W_{1}$ and $b_{1}$ represent the network weights and bias vector of the encoder, respectively, and $f(\cdot)$ is the activation function, which in the present invention is the Sigmoid function.

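Step 2.2: The decoder maps the hidden layer data back through the activation function to obtain the reconstructed output:

$$\hat{X}^{S}=\{\hat{x}_{1}^{S},\hat{x}_{2}^{S},\dots,\hat{x}_{M}^{S}\};$$

where $\hat{X}^{S}$ represents the reconstructed output variable and $\hat{x}_{i}^{S}$ represents the $i$-th reconstructed network traffic sample; in this step, $\hat{x}_{i}^{S}$ is reconstructed from the hidden layer vector $h_{i}^{S}$ of the corresponding input variable, and the decoding process is shown as formula (2):

$$\hat{x}_{i}^{S}=f\left(W_{2}h_{i}^{S}+b_{2}\right)\qquad(2)$$

where $W_{2}$ and $b_{2}$ represent the network weights and bias vector of the decoder, respectively, and $f(\cdot)$ is the activation function, which in the present invention is the Sigmoid function.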

Step 2.3: training the DAE by using a gradient descent algorithm, and obtaining optimal network parameters by minimizing a reconstruction error; the objective loss function required to be optimized in the training process is shown in the formula (3):

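$$J_{DAE}(\theta)=\frac{1}{M}\sum_{i=1}^{M}\left\lVert x_{i}^{S}-\hat{x}_{i}^{S}\right\rVert^{2}\qquad(3)$$

where the network parameter set $\theta=\{W_{1},b_{1},W_{2},b_{2}\}$ represents the network weights of the encoder, the bias vector of the encoder, the network weights of the decoder and the bias vector of the decoder, respectively; $x_{i}^{S}$ and $\hat{x}_{i}^{S}$ respectively represent the $i$-th input variable and the $i$-th reconstructed output variable of the DAE network; $M$ represents the total number of network traffic samples;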

Step 2.4: The trained DAE network parameters are saved, and the model is deployed at a tunnel edge computing node to serve as the source domain model for anomaly detection.
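The offline training of steps 2.1 to 2.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the class name `TinyDAE`, the hidden layer size, the learning rate, the per-sample gradient updates and the use of the mean squared reconstruction error as the loss are all choices made for this sketch, and the training samples are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyDAE:
    """Input -> hidden -> output autoencoder with Sigmoid activations."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        # Encoder: hidden vector h = f(W1 x + b1)
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        # Decoder: reconstruction x_hat = f(W2 h + b2)
        return [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
                for row, b in zip(self.W2, self.b2)]

    def loss(self, data):
        # Mean squared reconstruction error over the data set (assumed loss).
        total = 0.0
        for x in data:
            x_hat = self.decode(self.encode(x))
            total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
        return total / len(data)

    def train(self, data, epochs=200, lr=0.5):
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                x_hat = self.decode(h)
                # Backpropagate the squared error through both Sigmoid layers.
                d_out = [2 * (xh - xi) * xh * (1 - xh) for xh, xi in zip(x_hat, x)]
                d_hid = [hk * (1 - hk) * sum(d_out[j] * self.W2[j][k] for j in range(len(x)))
                         for k, hk in enumerate(h)]
                for j in range(len(x)):
                    for k in range(len(h)):
                        self.W2[j][k] -= lr * d_out[j] * h[k]
                    self.b2[j] -= lr * d_out[j]
                for k in range(len(h)):
                    for i in range(len(x)):
                        self.W1[k][i] -= lr * d_hid[k] * x[i]
                    self.b1[k] -= lr * d_hid[k]


# Hypothetical normalized source domain samples (three features each).
source_data = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.15, 0.15, 0.1], [0.1, 0.1, 0.2]]
dae = TinyDAE(n_in=3, n_hidden=2)
loss_before = dae.loss(source_data)
dae.train(source_data)
loss_after = dae.loss(source_data)
```

In practice a deep learning framework would likely replace the hand-written gradient code; the pure-Python form is used here only to keep the sketch self-contained.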

It should be noted that the DAE model trained on the source domain data in step 2 can already be used to detect anomalies in the network traffic generated in real time by the tunnel monitoring system. However, in the network environment of the terminal devices in the tunnel, network traffic changes dynamically over time, so a model that is never updated detects anomalies less robustly.

Step 3: This step is the online part, in which domain adaptive updating is performed on the DAE model deployed on the tunnel edge computing node. In this example, when the edge computing node acquires network traffic generated in real time by the tunnel monitoring system and anomaly detection is required, the DAE model established in step 2 is updated by the domain adaptive update strategy constructed by the invention; the resulting algorithm is defined as the domain adaptive deep autoencoder (Domain Adaptive Deep Autoencoder, DADAE). Specifically, the DADAE is updated as follows:

Step 3.1: Assume that at time $t$, the network traffic sample generated by the tunnel monitoring system and acquired in real time by the edge computing node is preprocessed as in step 1 to obtain the corresponding network traffic features. The preprocessed sample is denoted $x_{t}$; that is, $x_{t}$ is the preprocessed network traffic sample of the tunnel monitoring system on which anomaly detection is to be performed, defined in this example as the sample to be detected.

Domain adaptive updating uses target domain data to update the source domain model, so the updated model can match the target (i.e. $x_{t}$) only if the target domain data is highly similar to the target in data structure and characteristics. Exploiting the strong correlation of network traffic in adjacent time periods, the concept of a sliding window is introduced, and an adaptive sliding window algorithm is constructed to obtain the target domain data set corresponding to $x_{t}$. The specific steps are as follows:

Step 3.1.1: Taking the network traffic sample $x_{t}$ of the tunnel monitoring system at time $t$ as the right boundary of the sliding window, the window is expanded over the preceding preprocessed network traffic samples to find the network traffic that is adjacent in time sequence to the right boundary. The adaptive sliding window data set $X_{t}^{W}$ can then be expressed as:

$$X_{t}^{W}=\{x_{t-L},x_{t-L+1},\dots,x_{t-1}\};$$

where $X_{t}^{W}$ denotes the window of length $L$, comprising the network traffic samples from time $t-L$ to time $t-1$; it should be noted that these samples are all data judged normal by anomaly detection. $x_{t-L}$ is the preprocessed network traffic sample at the left boundary of the window, i.e. the adaptive sliding window expands $L$ network traffic samples toward earlier times from time $t$.

Step 3.1.2: When deciding whether the adaptive sliding window should absorb a preceding sample, assume that the window has already been expanded to time $t-l$, so that the sample to be judged is the one at time $t-l-1$. First, the mean Euclidean distance (Euclidean Distance, ED) between this sample and all samples inside the current window is calculated according to the similarity function shown below:

$$\bar{d}(x_{t-l-1})=\frac{1}{l}\sum_{j=t-l}^{t-1}ED\left(x_{t-l-1},x_{j}\right)\qquad(4)$$

The ED is calculated as follows:

$$ED\left(x_{t-l-1},x_{j}\right)=\left\lVert x_{t-l-1}-x_{j}\right\rVert_{2};$$

where $x_{j}$ represents any network traffic sample from time $t-l$ to time $t-1$, $l$ is the number of network traffic samples in the current window, and $x_{t-l-1}$ is the preceding sample for which it is to be judged whether it is incorporated into the window;

Step 3.1.3: A boundary threshold $\delta$ is set for the adaptive sliding window. If the mean distance computed by the similarity function of step 3.1.2 satisfies $\bar{d}(x_{t-l-1})\le\delta$, the sliding window incorporates the network traffic sample, i.e. the left boundary sample of the window becomes $x_{t-l-1}$; otherwise, expansion stops and the left boundary sample is $x_{t-l}$.

Through step 3.1, the network traffic in the resulting adaptive sliding window is strongly correlated with the sample to be detected $x_{t}$. The sliding window data set $X_{t}^{W}$ is therefore taken as the target domain data set of $x_{t}$, expressed as:

$$X^{T}=X_{t}^{W};$$

Since $X_{t}^{W}$ contains $L$ network traffic samples in total, the target domain data set $X^{T}$ can also be expressed as:

$$X^{T}=\{x_{1}^{T},x_{2}^{T},\dots,x_{L}^{T}\};$$

where $x_{j}^{T}$ denotes the $j$-th network traffic sample in the data set $X^{T}$ containing $L$ samples.
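The adaptive sliding window of step 3.1 can be sketched as follows. This is a hedged illustration only: the function names, the threshold value and the sample data are hypothetical, the history is assumed to contain only samples already judged normal, and comparing the first candidate against the sample to be detected (because the window is still empty at that point) is an assumption of the sketch.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def adaptive_window(history, x_t, threshold):
    """Grow a window toward earlier samples, starting next to x_t.

    `history` holds previously accepted (normal) samples, oldest first.
    A candidate joins when its mean Euclidean distance to the samples
    already in the window is at most `threshold`; expansion stops at the
    first candidate that fails the test.
    """
    window = []
    for candidate in reversed(history):
        # Assumption: while the window is empty, compare against x_t itself.
        members = window if window else [x_t]
        mean_dist = sum(euclidean(candidate, m) for m in members) / len(members)
        if mean_dist > threshold:
            break
        window.insert(0, candidate)  # keep chronological order
    return window


# Hypothetical normalized samples; the oldest one belongs to a different regime.
history = [[0.9, 0.9], [0.2, 0.2], [0.22, 0.2], [0.21, 0.19]]
x_t = [0.2, 0.21]
window = adaptive_window(history, x_t, threshold=0.1)
```

With these values the window keeps the three most recent samples and stops at the older, dissimilar one, so its length adapts to how far back the traffic still resembles the current state.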

Step 3.2: The target domain data set $X^{T}$ is used to perform domain adaptive updating on the DAE trained in step 2. Domain adaptation can be described simply as transferring knowledge between the source domain and the target domain on the basis of their similarity, so as to discover and reduce the differences between the two domains. As a result, in the dynamically changing network environment of the tunnel terminal devices, the constructed DADAE adaptively matches the network traffic sample to be detected, which improves the accuracy and robustness of anomaly detection. The specific steps are as follows:

Step 3.2.1: As shown in fig. 3, the source domain data set $X^{S}$ is first input into the trained DAE, and the hidden layer vectors $H^{S}$ of the source domain data are obtained through forward propagation of formula (1);

Step 3.2.2: The target domain data $X^{T}$ is likewise input into the DAE, and the hidden layer vectors $H^{T}$ of the target domain data are obtained by forward propagation of the following formula:

$$h_{j}^{T}=f\left(W_{1}x_{j}^{T}+b_{1}\right)\qquad(5)$$

Step 3.2.3: The maximum mean discrepancy (Maximum Mean Discrepancy, MMD) distance is introduced into the objective function of the DAE to measure the data difference between the source domain and the target domain. The MMD between the hidden layer vectors $H^{S}$ and $H^{T}$ is calculated as shown in formula (6):

$$MMD\left(H^{S},H^{T}\right)=\sup_{\lVert\phi\rVert_{\mathcal{H}}\le 1}\left(\frac{1}{M}\sum_{i=1}^{M}\phi\left(h_{i}^{S}\right)-\frac{1}{L}\sum_{j=1}^{L}\phi\left(h_{j}^{T}\right)\right);$$

$$MMD^{2}\left(H^{S},H^{T}\right)=\frac{1}{M^{2}}\sum_{i=1}^{M}\sum_{i'=1}^{M}k\left(h_{i}^{S},h_{i'}^{S}\right)-\frac{2}{ML}\sum_{i=1}^{M}\sum_{j=1}^{L}k\left(h_{i}^{S},h_{j}^{T}\right)+\frac{1}{L^{2}}\sum_{j=1}^{L}\sum_{j'=1}^{L}k\left(h_{j}^{T},h_{j'}^{T}\right)\qquad(6)$$

where $M$ and $L$ are the numbers of samples in $H^{S}$ and $H^{T}$ respectively, $\sup$ denotes the supremum (least upper bound), taken over functions $\phi$ in the unit ball of the reproducing kernel Hilbert space $\mathcal{H}$, and $i$, $i'$, $j$, $j'$ refer to any index in the respective data set. MMD measures the distance between two distributions in a reproducing kernel Hilbert space (Reproducing Kernel Hilbert Space) and is a kernel learning method; the smaller the MMD distance, the higher the similarity between the two data domains. $k(\cdot,\cdot)$ is the Gaussian kernel function, calculated as follows:

$$k(u,v)=\exp\left(-\frac{\lVert u-v\rVert^{2}}{2\sigma^{2}}\right)\qquad(7)$$

where $\sigma$ represents the bandwidth parameter, whose value is proportional to the width of the Gaussian kernel function and is taken as 1.
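The MMD distance with a Gaussian kernel can be sketched as follows. This is an illustrative sketch, not the implementation of the invention: it uses the standard biased empirical estimate of the squared MMD and the kernel form exp(-||u - v||^2 / (2 sigma^2)), both of which are assumptions here, and the hidden layer vectors are hypothetical values.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(hs, ht, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two sample sets."""
    m, l = len(hs), len(ht)
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in hs for b in hs) / (m * m)
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in ht for b in ht) / (l * l)
    k_st = sum(gaussian_kernel(a, b, sigma) for a in hs for b in ht) / (m * l)
    return k_ss + k_tt - 2.0 * k_st


# Hypothetical hidden layer vectors of source and target domain samples.
source_hidden = [[0.1, 0.2], [0.2, 0.1]]
target_hidden = [[0.8, 0.9], [0.9, 0.8]]
same_domain = mmd_squared(source_hidden, source_hidden)
cross_domain = mmd_squared(source_hidden, target_hidden)
```

Identical sample sets give a value of zero, while the shifted target set gives a clearly positive value, which is the behavior the domain adaptive update relies on when it penalizes the inter-domain distance.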

Step 3.2.4: The target domain data set $X^{T}$ is input directly into the DAE obtained through offline training in step 2, the MMD distance is introduced into the objective function of the DAE, and the difference between the hidden vectors generated from the source domain data and from the target domain data is calculated according to formulas (6) and (7), thereby constructing the DADAE model.

Because the network parameters have already been trained in the preceding steps, only a small number of iterations (i.e. fine-tuning of the network weights) are needed in this step to complete the domain adaptive update of the model. The DADAE objective function constructed by the invention, which is to be minimized, is as follows:

$$J_{DADAE}(\theta')=J_{DAE}(\theta')+\lambda\,MMD^{2}\left(H^{S},H^{T}\right)\qquad(8)$$

where the objective function consists of two loss terms, namely the objective function loss $J_{DAE}$ of the DAE and the MMD distance loss between the hidden vectors of the source domain data and the hidden vectors of the target domain data; the network parameter set $\theta'=\{W_{1}',b_{1}',W_{2}',b_{2}'\}$ represents the network weights of the encoder, the bias vector of the encoder, the network weights of the decoder and the bias vector of the decoder after the domain adaptive update, respectively; $\lambda$ is the balance parameter between the DAE loss and the inter-domain MMD distance loss, whose value is generally 0.5 and can be fine-tuned up or down during implementation; the invention does not restrict the value of $\lambda$. The training method is still the gradient descent algorithm.

It can be seen that the objective function constructed by the invention makes full use of the source domain model information and, through minimization, drives the updated network weights and biases toward the characteristics of the target domain data. To a certain extent, this solves the problem that a static model cannot adapt to the dynamically changing network environment of the tunnel electromechanical devices.

Step 4: A dynamic threshold for anomaly detection is calculated based on the target domain data set of the network traffic to be detected; the upper limit of the dynamic anomaly threshold is denoted $T_{up}$ and the lower limit is denoted $T_{low}$.

Because network traffic in the network environment of the tunnel electromechanical system changes dynamically over time, the state of normal traffic is also continuously updated along with related factors such as the network environment. Anomaly judgment for the tunnel monitoring system traffic acquired by the edge computing node should therefore be based on the normal reference value of the current network state. The target domain data set determined by the adaptive sliding window in step 3.1 is strongly correlated in time with the sample to be detected, so its network state is little affected by changes over time. The specific steps for determining the dynamic threshold range based on the target domain data of the network traffic to be detected are therefore as follows:

Step 4.1: First, the target domain data set $X^{T}$ is passed through the updated DADAE model again, and the output data set obtained after the encoder and decoder is denoted $\hat{X}^{T}$. Then, the reconstruction error of each piece of target domain data is calculated using the following formula:

$$e_{j}=\left\lVert x_{j}^{T}-\hat{x}_{j}^{T}\right\rVert^{2},\quad j=1,\dots,L\qquad(9)$$

where the reconstruction error set $E$ comprises $L$ elements and can be expressed as $E=\{e_{1},e_{2},\dots,e_{L}\}$; $X^{T}$ and $\hat{X}^{T}$ respectively denote the target domain data set containing $L$ network traffic samples and the reconstructed output data set.

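The mean and standard deviation of the reconstruction errors $e_{j}$ of the $L$ target domain samples are then calculated:

$$\mu_{E}=\frac{1}{L}\sum_{j=1}^{L}e_{j}\qquad(10)$$

$$\sigma_{E}=\sqrt{\frac{1}{L}\sum_{j=1}^{L}\left(e_{j}-\mu_{E}\right)^{2}}\qquad(11)$$

where $\mu_{E}$ represents the average value of the reconstruction errors and $\sigma_{E}$ is their standard deviation. The dynamic threshold range set by the present invention is:

$$T_{up}=\mu_{E}+k\,\sigma_{E}\qquad(12)$$

$$T_{low}=\mu_{E}-k\,\sigma_{E}\qquad(13)$$

where $k$ is the standard deviation coefficient; the present invention does not limit the value of $k$, for example $k$ may be 2.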
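The dynamic threshold calculation of step 4 can be sketched as follows. The sketch is illustrative only: the squared Euclidean norm as the reconstruction error, the population form of the standard deviation, the function name and the sample values are assumptions made here, and `reconstructed` stands in for the output of the updated model.

```python
import math


def dynamic_threshold(targets, reconstructions, k=2.0):
    """Return (lower, upper) limits derived from target domain errors.

    Each error is the squared Euclidean distance between a target domain
    sample and its reconstruction; the limits are mean -/+ k * std.
    """
    errors = [sum((a - b) ** 2 for a, b in zip(x, x_hat))
              for x, x_hat in zip(targets, reconstructions)]
    mean = sum(errors) / len(errors)
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    return mean - k * std, mean + k * std


# Hypothetical one-feature target domain samples and their reconstructions.
window = [[0.5], [0.5]]
reconstructed = [[0.6], [0.7]]
lower, upper = dynamic_threshold(window, reconstructed, k=2.0)
```

Because the limits are recomputed from the current target domain data for every sample to be detected, the threshold follows the current network state instead of staying fixed.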

Step 5: The sample to be detected $x_{t}$ is input into the DADAE model for inference, and its reconstruction error is calculated as follows:

$$e_{t}=\left\lVert x_{t}-\hat{x}_{t}\right\rVert^{2};$$

where $\hat{x}_{t}$ is the reconstructed output of the sample to be detected after DADAE encoding and decoding.

Step 6: According to the reconstruction error $e_{t}$ of the sample to be detected and the dynamic error threshold range, it is judged whether the real-time network traffic of the tunnel monitoring system to be detected is abnormal data. The judgment criteria are as follows:

when $T_{low}\le e_{t}\le T_{up}$, where $T_{low}$ and $T_{up}$ are the lower and upper limits of the dynamic anomaly threshold, the sample is marked as normal; when $e_{t}<T_{low}$ or $e_{t}>T_{up}$, it is marked as abnormal.

Step 7: After anomaly detection of the current sample to be detected is complete, when the edge computing node acquires new network traffic to be detected at the next moment, the above steps are repeated: data preprocessing, re-determination of the sliding window data set, domain adaptive updating of the DAE model with the target domain data, calculation of the dynamic threshold range, and detection of whether the network traffic to be detected is abnormal.
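The per-sample loop of steps 5 to 7 can be sketched as follows. This is a simplified illustration under several assumptions: `reconstruct` is a hypothetical stand-in for the encode/decode pass of the updated model, a fixed-size window of recent normal samples replaces the adaptive sliding window, the domain adaptive update is omitted, and the first two samples are assumed normal as a warm-up.

```python
import math


def squared_error(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def detect_stream(stream, window_size, reconstruct, k=2.0):
    """Label each incoming sample using limits derived from recent normals."""
    normals, labels = [], []
    for x in stream:
        window = normals[-window_size:]
        if len(window) < 2:
            normals.append(x)  # warm-up assumption: first samples are normal
            labels.append("normal")
            continue
        errors = [squared_error(w, reconstruct(w)) for w in window]
        mean = sum(errors) / len(errors)
        std = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
        err = squared_error(x, reconstruct(x))
        if mean - k * std <= err <= mean + k * std:
            labels.append("normal")
            normals.append(x)  # only normal samples feed later windows
        else:
            labels.append("abnormal")
    return labels


# Hypothetical stand-in model that always reconstructs a typical value.
def typical_reconstruction(x):
    return [0.2 for _ in x]


stream = [[0.2], [0.25], [0.22], [0.9]]
labels = detect_stream(stream, window_size=3, reconstruct=typical_reconstruction)
```

Keeping abnormal samples out of `normals` mirrors the requirement that the window contains only data already judged normal.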

The foregoing has shown and described the basic principles, principal features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. The tunnel network anomaly detection method based on the domain adaptive depth self-encoder is characterized by comprising the following steps of:

step 4: calculating a dynamic threshold for anomaly detection;

2. The method for detecting tunnel network anomalies based on the domain adaptive depth self-encoder according to claim 1, wherein in step 1, the system architecture for detecting tunnel network anomalies by using edge computing nodes comprises a device layer, an edge computing layer, a network layer and a cloud platform layer; the equipment layer, the edge computing layer, the network layer and the cloud platform layer are sequentially connected; each equipment system in the equipment layer comprises a broadcast telephone system, a tunnel monitoring system, a tunnel ventilation lighting system, a tunnel area controller, a tunnel fire protection system, an information release system and a tunnel traffic signal system; the edge computing layer is an edge computing node deployed in the tunnel.

3. The method for detecting the tunnel network anomaly based on the domain adaptive depth self-encoder according to claim 1, wherein in the step 1, the data preprocessing mode comprises removing anomaly data, removing nonsensical features and normalizing data;

4. The method for detecting the tunnel network anomaly based on the domain adaptive depth self-encoder according to claim 1, wherein in the step 2, the anomaly detection source domain model adopts a depth self-encoder, and comprises an encoder and a decoder;

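step 2.1: the encoder maps the source domain data $X^{S}$ through an activation function $f(\cdot)$ to obtain the hidden layer data:

$$H^{S}=\{h_{1}^{S},h_{2}^{S},\dots,h_{M}^{S}\};$$

the encoding process is shown as formula (1):

$$h_{i}^{S}=f\left(W_{1}x_{i}^{S}+b_{1}\right)\qquad(1)$$

$$\hat{X}^{S}=\{\hat{x}_{1}^{S},\hat{x}_{2}^{S},\dots,\hat{x}_{M}^{S}\};$$

$$\hat{x}_{i}^{S}=f\left(W_{2}h_{i}^{S}+b_{2}\right)\qquad(2)$$

$$J_{DAE}(\theta)=\frac{1}{M}\sum_{i=1}^{M}\left\lVert x_{i}^{S}-\hat{x}_{i}^{S}\right\rVert^{2}\qquad(3)$$

where the network parameter set $\theta=\{W_{1},b_{1},W_{2},b_{2}\}$ represents the network weights of the encoder, the bias vector of the encoder, the network weights of the decoder and the bias vector of the decoder, respectively; $x_{i}^{S}$ and $\hat{x}_{i}^{S}$ respectively represent the $i$-th input variable and the $i$-th reconstructed output variable of the DAE network; $M$ represents the total number of the preprocessed network traffic samples;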

5. The method for detecting tunnel network anomalies based on the domain adaptive depth self-encoder as recited in claim 4, wherein the activation function $f(\cdot)$ is a Sigmoid function.

6. The method for detecting tunnel network anomalies based on the domain adaptive depth self-encoder as claimed in claim 4, wherein the specific method of step 3 is as follows:

step 3.1: assume that at time $t$, the network traffic data collected by the edge computing node in real time is preprocessed as in step 1 to obtain the corresponding network traffic features, i.e. the preprocessed network traffic sample $x_{t}$, which is defined as the sample to be detected; a target domain data set $X^{T}$ is constructed for $x_{t}$; the specific method is as follows:

$$X_{t}^{W}=\{x_{t-L},x_{t-L+1},\dots,x_{t-1}\};$$

where $X_{t}^{W}$ denotes the adaptive sliding window data set of length $L$, comprising the preprocessed network traffic samples from time $t-L$ to time $t-1$; $x_{t-L}$ is the preprocessed network traffic sample at the left boundary of the window, i.e. the adaptive sliding window expands $L$ preprocessed network traffic samples toward earlier times from time $t$;

$$\bar{d}(x_{t-l-1})=\frac{1}{l}\sum_{j=t-l}^{t-1}ED\left(x_{t-l-1},x_{j}\right)\qquad(4)$$

the ED is calculated as follows:

$$ED\left(x_{t-l-1},x_{j}\right)=\left\lVert x_{t-l-1}-x_{j}\right\rVert_{2};$$

where $x_{j}$ represents any preprocessed network traffic sample from time $t-l$ to time $t-1$, $l$ is the number of preprocessed network traffic samples in the current window, and $x_{t-l-1}$ is the preceding preprocessed network traffic sample for which it is to be judged whether it is incorporated into the window;

step 3.1.3: a boundary threshold $\delta$ is set for the adaptive sliding window; according to the similarity function of step 3.1.2, if $\bar{d}(x_{t-l-1})\le\delta$, the sliding window incorporates the preprocessed network traffic sample, i.e. the left boundary preprocessed network traffic sample of the window is $x_{t-l-1}$; otherwise, expansion stops and the left boundary sample is $x_{t-l}$;

the sliding window data set $X_{t}^{W}$ is taken as the target domain data set of $x_{t}$, expressed as:

$$X^{T}=X_{t}^{W};$$

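$$h_{j}^{T}=f\left(W_{1}x_{j}^{T}+b_{1}\right)\qquad(5)$$

$$MMD\left(H^{S},H^{T}\right)=\sup_{\lVert\phi\rVert_{\mathcal{H}}\le 1}\left(\frac{1}{M}\sum_{i=1}^{M}\phi\left(h_{i}^{S}\right)-\frac{1}{L}\sum_{j=1}^{L}\phi\left(h_{j}^{T}\right)\right);$$

$$MMD^{2}\left(H^{S},H^{T}\right)=\frac{1}{M^{2}}\sum_{i=1}^{M}\sum_{i'=1}^{M}k\left(h_{i}^{S},h_{i'}^{S}\right)-\frac{2}{ML}\sum_{i=1}^{M}\sum_{j=1}^{L}k\left(h_{i}^{S},h_{j}^{T}\right)+\frac{1}{L^{2}}\sum_{j=1}^{L}\sum_{j'=1}^{L}k\left(h_{j}^{T},h_{j'}^{T}\right)\qquad(6)$$

where $M$ and $L$ are the numbers of samples in $H^{S}$ and $H^{T}$ respectively, $\sup$ denotes the supremum (least upper bound), and $i$, $i'$, $j$, $j'$ refer to any index in the data set, i.e. $h_{i}^{S}$ and $h_{i'}^{S}$ respectively denote the $i$-th and $i'$-th samples in $H^{S}$, and $h_{j}^{T}$ and $h_{j'}^{T}$ respectively denote the $j$-th and $j'$-th samples in $H^{T}$; $k(\cdot,\cdot)$ is the Gaussian kernel function, calculated as follows:

$$k(u,v)=\exp\left(-\frac{\lVert u-v\rVert^{2}}{2\sigma^{2}}\right)\qquad(7)$$

$$J_{DADAE}(\theta')=J_{DAE}(\theta')+\lambda\,MMD^{2}\left(H^{S},H^{T}\right)\qquad(8)$$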

7. The method for detecting tunnel network anomalies based on the domain adaptive depth self-encoder as claimed in claim 4, wherein in step 4, a dynamic threshold for anomaly detection is calculated, the upper limit of the dynamic anomaly threshold is denoted $T_{up}$ and the lower limit is denoted $T_{low}$; the specific method is as follows:

step 4.1: first, the target domain data set $X^{T}$ is passed again through the updated anomaly detection source domain model, and the output data set obtained after the encoder and decoder is denoted $\hat{X}^{T}$; then, the reconstruction error of each piece of target domain data is calculated using the following formula:

$$e_{j}=\left\lVert x_{j}^{T}-\hat{x}_{j}^{T}\right\rVert^{2},\quad j=1,\dots,L\qquad(9)$$

$$\mu_{E}=\frac{1}{L}\sum_{j=1}^{L}e_{j}\qquad(10)$$

$$\sigma_{E}=\sqrt{\frac{1}{L}\sum_{j=1}^{L}\left(e_{j}-\mu_{E}\right)^{2}}\qquad(11)$$

$$T_{up}=\mu_{E}+k\,\sigma_{E}\qquad(12)$$

$$T_{low}=\mu_{E}-k\,\sigma_{E}\qquad(13)$$

8. The method for detecting tunnel network anomalies based on the domain adaptive depth self-encoder of claim 7, wherein the standard deviation coefficient $k$ is 2.

9. The method for detecting tunnel network anomalies based on the domain adaptive depth self-encoder as claimed in claim 7, wherein in step 5, the calculation method of the reconstruction error is as follows:

$$e_{t}=\left\lVert x_{t}-\hat{x}_{t}\right\rVert^{2};$$

10. The method for detecting tunnel network anomalies based on the domain adaptive depth self-encoder as claimed in claim 8, wherein the detection method in step 6 is as follows:

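when the reconstruction error $e_{t}$ of the sample to be detected satisfies $T_{low}\le e_{t}\le T_{up}$, where $T_{low}$ and $T_{up}$ are the lower and upper limits of the dynamic anomaly threshold, the sample is marked as normal; when $e_{t}<T_{low}$ or $e_{t}>T_{up}$, it is marked as abnormal.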