CN117041972A - Channel-space-time attention self-coding based anomaly detection method for vehicle networking sensor - Google Patents
- Publication number: CN117041972A (application CN202311167968.9A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04W12/121 — Wireless intrusion detection systems [WIDS]; wireless intrusion prevention systems [WIPS]
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/048 — Activation functions
- H04W4/40 — Services specially adapted for vehicles, e.g. vehicle-to-pedestrians [V2P]
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a channel-spatial-temporal attention self-encoding anomaly detection method for Internet of Vehicles sensors. A memory-enhanced channel-spatial-temporal attention autoencoder network detects sensor spoofing attacks in an autonomous driving system, using the reconstruction error of a multi-sensor input time series to flag anomalies. The detection model consists of a memory-enhanced temporal attention block and an encoder and decoder built from PSE-Res2Net modules. A PSE-Res2Net module first applies a Res2Net module to generate a multi-scale feature map, strengthening the multi-dimensional representation capability of the neural network, and then uses a PSENet module to capture position-aware channel information and channel-sensitive spatial information through the interaction of channel attention and spatial attention. The memory-enhanced temporal attention module collects global sequence information of the sensor measurements to integrate the multi-scale features.
Description
Technical Field
The invention belongs to the technical field of anomaly detection for autonomous vehicles, relates to a channel-spatial-temporal attention self-encoding anomaly detection method for Internet of Vehicles sensors, and in particular relates to autoencoder-based anomaly detection combined with a memory-enhancement model.
Background
Autonomous vehicles (AVs) rely on various types of sensors to evaluate the driving environment and issue the necessary control commands; however, these sensors are susceptible to false-data-injection and spoofing attacks, and anomalies in sensor readings caused by malicious network attacks can have damaging consequences. Anomaly-detection strategies are applied in many settings, such as fault diagnosis and attack detection. Owing to the rapid development of wireless sensing and measurement technologies, autonomous-driving process data can be described as a set of time-series observations. Time-series anomaly detection exhibits spatio-temporal dependence, and classical statistical methods struggle to capture the long-term dynamic time-series model of a vehicle. Deep neural networks extract hierarchical and deep representations of the time-series model, and models that support future-state inference form a novel strategy for anomaly detection and security-event prevention.
Reconstruction-based methods learn a model that captures a low-dimensional latent space for a given time series and then reconstruct the data to approximate the original input. The autoencoder, comprising an encoder and a decoder, is the fundamental model of reconstruction learning; it can be trained in an unsupervised manner to learn important features and latent structures of the input data. As the sequence length increases, the number of hidden states passed from the encoder to the corresponding decoder states grows, causing long-term dependency problems; moreover, the most recent frames in a sequence are more informative than old ones. To address these problems, recent research introduces an attention mechanism between the encoder and the decoder, which helps select the relevant encoder hidden states across all time steps and improves the model's representation of multivariate time-series data.
In view of this, the invention proposes a novel channel-spatial-temporal attention-based autoencoder network to enhance the expressive capability of the network, addressing the problem that anomalous data generated by stealthy attacks can otherwise be fitted and reconstructed accurately by the encoder, and thereby reducing the false-alarm rate.
Disclosure of Invention
Aiming at the problems of existing reconstruction-based anomaly detection techniques, the invention provides a channel-spatial-temporal attention self-encoding anomaly detection method for Internet of Vehicles sensors that is robust and offers higher detection efficiency and accuracy.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
the method for detecting the anomaly of the vehicle networking sensor based on the channel-space-time attention self-encoder network comprises the following steps:
acquiring time series data of the running state of the automobile through a plurality of sensors arranged on the automobile, and preprocessing the data;
encoding the preprocessed data with a PSE-Res2Net-based encoder;
integrating multi-scale features over the output of the PSE-Res2Net-based encoder with a memory-enhanced temporal attention mechanism, tracking long-term dependencies, and generating global sequence features with cross-segment temporal dependence;
converting the global sequence features from the encoded low-dimensional representation back to the high-dimensional representation of the original input with a PSE-Res2Net-based decoder, computing the reconstruction error, and obtaining the detection result.
In a second aspect, the present invention provides an anomaly detection system for a vehicle networking sensor, comprising:
the data acquisition and preprocessing module acquires time series data of the running state of the automobile through a plurality of sensors arranged on the automobile and preprocesses the data;
the detection module realizes anomaly detection of the vehicle running state with a channel-spatial-temporal attention-based autoencoder model; this model comprises a PSE-Res2Net-based encoder, a PSE-Res2Net-based decoder and a memory-enhanced temporal attention module; the PSE-Res2Net-based encoder encodes the data output by the data acquisition and preprocessing module; the memory-enhanced temporal attention module integrates multi-scale features over the encoder output with a memory-enhanced temporal attention mechanism, tracks long-term dependencies and generates global sequence features with cross-segment temporal dependence; the PSE-Res2Net-based decoder converts the time-dependent global sequence features from the encoded low-dimensional representation to the high-dimensional representation of the original input, computes the reconstruction error and obtains the detection result.
Compared with the prior art, the invention has the following advantages:
First, the invention utilizes Res2Net modules to learn information from the fused multi-sensor inputs and to generate multi-scale feature maps, thereby enhancing the multi-dimensional representation capability of the neural network.
Second, the PSENet module extracts channel-spatial attention from the multi-scale feature map; position-aware channel information and channel-sensitive spatial information are captured through the interaction of the channel and spatial attention mechanisms, so that feature context information is fully exploited.
Third, to capture long-term dependencies across sequence segments, the invention designs a memory-enhanced temporal attention block to integrate multi-scale features and obtain global sequence information for sensor measurements.
In summary, the invention is used for detecting the anomaly of the sensor of the automatic driving automobile, and a channel-space-time attention mechanism is introduced to aim at enhancing the effective learning capability of the multi-sensor time sequence and increasing the reconstruction error of the anomaly data, thereby improving the efficiency of anomaly detection.
Drawings
FIG. 1 is a block diagram of the workflow of the method of the present invention;
FIG. 2 is a block diagram of a PSE-Res2Net module; (a) a Res2Net module, (b) a PSENet module;
FIG. 3 is a schematic diagram of a memory enhanced time attention module;
FIG. 4 is a statistical histogram of reconstruction errors;
fig. 5 (a) - (b) are graphs of the reconstruction effect of the test set velocity and acceleration features, respectively.
Detailed Description
The invention is further analyzed in connection with the following figures.
The invention provides a channel-spatial-temporal attention self-encoding anomaly detection method for Internet of Vehicles sensors. The adopted technical scheme is as follows: first, anomalous vehicle-sensor data are rejected with a statistical detection mechanism, and an autoencoder network based on channel-spatial-temporal attention then detects anomalous sensors. The autoencoder consists of a memory-enhanced temporal attention module and an encoder and decoder built from PSE-Res2Net modules; the memory-enhanced temporal attention module collects multi-scale features to integrate the global sequence information of the sensor estimates; the PSE-Res2Net module generates a multi-scale feature map with Res2Net, enhancing the multi-dimensional representation capability of the neural network, and then applies the PSENet module to capture the interaction of channel attention and spatial attention, obtaining position-aware channel information and channel-sensitive spatial information. As shown in Fig. 1, the specific steps are as follows:
step one: acquiring time-series data (e.g., position, speed, acceleration, steering angle) of a running state of an automobile by a plurality of sensors mounted on the automobile; preprocessing the data;
the preprocessing adopts abnormal data elimination;
the abnormal data rejection specifically comprises the steps of performing state estimation on the time series data fused by the multiple sensors by using a state estimator based on extended Kalman filtering (Extended Kalman Filter, EKF), and then calculating state residual errors and performing statistical analysis on the residual errors so as to reject the abnormal data.
Step two: the data from step one are encoded with the PSE-Res2Net-based encoder.
Step three: integrating multi-scale features on the features output by the encoder based on PSE-Res2Net by using a memory-enhanced time-attention mechanism, tracking long-term dependence, and generating global sequence features with time dependence in a span;
step four: and converting the global sequence characteristics with time dependence in the step three from the encoded low-dimensional representation to the high-dimensional representation of the original input by using a decoder based on PSE-Res2Net, calculating a reconstruction error and obtaining a detection result.
The method of the invention utilizes reconstruction errors based on the self-encoder to detect anomalies in the multi-sensor input time series.
In the first step, the data preprocessing is to perform state evaluation according to the sensing and fusion of the multi-sensor information so as to acquire accurate information of the surrounding environment and driving state of the vehicle, thereby eliminating abnormal data.
In step one, the EKF-based state estimator is modeled as:

$$\hat{x}_{k+1|k} = f(\hat{x}_{k|k}), \qquad P_{k+1|k} = H_k P_{k|k} H_k^{T} + Q_k$$
$$K_{k+1|k} = P_{k+1|k} S_{k+1|k}^{T}\left(S_{k+1|k} P_{k+1|k} S_{k+1|k}^{T} + R_{k+1}\right)^{-1}$$
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1|k}\left(y_{k+1} - \hat{y}_{k+1|k}\right)$$
$$P_{k+1|k+1} = \left(I - K_{k+1|k} S_{k+1|k}\right)P_{k+1|k}$$

where $\hat{x}_{k|k}$ and $\hat{x}_{k+1|k+1}$ are the state estimates at time steps $k$ and $k+1$, $P_{k+1|k}$ and $P_{k+1|k+1}$ are the error-covariance matrices before and after the measurement update, $S_k$ and $H_k$ are Jacobian matrices, $K_{k+1|k}$ is the Kalman gain, $S_{k+1|k}$ is the Jacobian matrix evaluated at the state prediction, $y_{k+1}$ and $\hat{y}_{k+1|k}$ are the measured and estimated system states, respectively, $Q_k$ and $R_{k+1}$ are the process- and measurement-noise covariances, and $I$ is the identity matrix.
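As an illustration of how the residual test in step one can work, the following numpy sketch runs one predict/update cycle of a Kalman filter on a 1-D constant-velocity model (linear, so the Jacobians are the model matrices themselves) and flags measurements whose residual exceeds a k-sigma bound; the model, the noise levels and the 3-sigma rule are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def ekf_step(x, P, y, dt=0.1, q=1e-3, r=0.25):
    """One predict/update cycle; returns state, covariance, residual, residual std."""
    H = np.array([[1.0, dt], [0.0, 1.0]])   # state-transition Jacobian (constant velocity)
    S = np.array([[1.0, 0.0]])              # measurement Jacobian (position observed)
    # Predict
    x_pred = H @ x
    P_pred = H @ P @ H.T + q * np.eye(2)
    # Update
    y_hat = (S @ x_pred).item()             # estimated measurement y_hat_{k+1|k}
    innov = y - y_hat                       # state residual
    s_cov = (S @ P_pred @ S.T).item() + r   # innovation covariance
    K = P_pred @ S.T / s_cov                # Kalman gain
    x_new = x_pred + (K * innov).ravel()
    P_new = (np.eye(2) - K @ S) @ P_pred
    return x_new, P_new, innov, np.sqrt(s_cov)

def is_outlier(innov, sigma, k=3.0):
    # Statistical test on the residual: reject readings beyond k standard deviations.
    return abs(innov) > k * sigma
```

A reading consistent with the prediction passes the test, while a grossly spoofed one is rejected before training data reach the autoencoder.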
In step two, the PSE-Res2Net-based encoder encodes the input sequence $X = [X_1, X_2, \ldots, X_T]$ into $Z = [Z_1, Z_2, \ldots, Z_{T_1}]$, where $T_1 = T/8$. The PSE-Res2Net-based encoder comprises four serially connected residual blocks, each composed of the Res2Net module and the PSENet module of Fig. 2, and the convolution layers of the encoder include max-pooling operations.
Each residual block acquires a multi-scale feature map using a Res2Net module, and then extracts channel and spatial attention through a PSENet module.
The Res2Net module of Fig. 2(a) operates as follows:

Feature sub-map splitting: the module receives the time-series data preprocessed in step one or the output $X = [X_1, X_2, \ldots, X_T]$ of the previous residual block, where $T$ denotes time. The input passes through a $1 \times 1$ convolution and is divided into four sub-feature maps $x_i \in \mathbb{R}^{H \times W \times C'}$ ($i \in \{1,2,3,4\}$, $C' = C/4$), where $H$, $W$ and $C$ denote the height, width and number of channels of the feature map, respectively.

Hierarchical convolution: the first sub-feature map $x_1$ is used directly as the split feature $x_1'$; the second sub-feature map $x_2$ passes through a $3 \times 3$ convolution to give the split feature $x_2'$; the third sub-feature map $x_3$ is summed with $x_2'$ and passed through a $3 \times 3$ convolution to give $x_3'$; the fourth sub-feature map $x_4$ is summed with $x_3'$ and passed through a $3 \times 3$ convolution to give $x_4'$. This yields all split features $x_i'$ ($i \in \{1,2,3,4\}$).

Feature concatenation: all split features $x_i'$ are concatenated and passed to a $1 \times 1$ convolution, yielding the output $X'$ of the Res2Net module.
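The split-transform-concatenate flow above can be sketched in numpy as follows. This is a shape-level illustration only: the learned $3 \times 3$ convolutions are replaced by a fixed box filter, the $1 \times 1$ convolutions are omitted, and $C = 4$ so each split holds a single channel.

```python
import numpy as np

def conv3x3(fm):
    """Stand-in for a learned 3x3 convolution: same-padding box filter."""
    padded = np.pad(fm, 1)
    out = np.zeros_like(fm)
    h, w = fm.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

def res2net_block(x):
    """Hierarchical split of a Res2Net unit; x has shape (H, W, 4)."""
    x1, x2, x3, x4 = (x[:, :, i] for i in range(4))
    y1 = x1                      # first split passes through unchanged
    y2 = conv3x3(x2)             # second split convolved
    y3 = conv3x3(x3 + y2)        # third split fused with previous output
    y4 = conv3x3(x4 + y3)        # fourth split fused with previous output
    return np.stack([y1, y2, y3, y4], axis=-1)   # concatenation (1x1 fusion omitted)
```

The growing receptive field of $y_2 \to y_4$ is what gives the block its multi-scale character.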
The PSENet module of Fig. 2(b) comprises a channel attention module and a spatial attention module. SENet is an attention-mechanism model used to enhance the expressive capacity of convolutional neural networks (CNNs). The channel attention module applies the squeeze-and-excitation operations of the SENet network to the received feature map X', focusing on the most significant local features of the feature map at different scales and thereby helping to infer fine-grained channel features. The spatial attention module selects important location features by weighting all spatial features. The PSENet module operates as follows:
Calculating channel attention: the channel attention module performs the squeeze operation along the channel axis by combining average pooling (GAP) and max pooling (GMP), then performs the excitation operation, summarizing the feature information of each channel into the channel-wise average-pooled feature $F^{c}_{avg}$ and max-pooled feature $F^{c}_{max}$. These are fed into two convolution layers to learn the inter-channel relations: the first convolution layer has output dimension $C/r$ ($r$ is the channel grouping factor, usually set to 16) and is followed by the nonlinear activation function ReLU; the second convolution layer has output dimension $C$. A sigmoid activation then adjusts the channel weights to give the channel attention $G_c(X')$:

$$G_c(X') = \sigma_s\!\left(W_2\,\sigma\!\left(W_1 F^{c}_{avg}\right) + W_2\,\sigma\!\left(W_1 F^{c}_{max}\right)\right)$$

where $W_1 \in \mathbb{R}^{C/r \times C}$ and $W_2 \in \mathbb{R}^{C \times C/r}$ are the weight parameters of the convolution layers, $\sigma$ is the nonlinear activation function ReLU, and $\sigma_s$ is the sigmoid function.
Calculating spatial attention: the spatial attention module selects important position features by weighting all spatial features to obtain the spatial attention $G_w(X')$:

$$G_w(X') = \mathrm{BN}\!\left(F_4^{3\times3}\!\left(F_3^{1\times1}\!\left(F_2^{3\times3}\!\left(F_1^{1\times1}(X')\right)\right)\right)\right)$$

where $F_1^{1\times1}$ denotes the first $1 \times 1$ convolution, $F_2^{3\times3}$ the second $3 \times 3$ convolution, $F_3^{1\times1}$ the third $1 \times 1$ convolution, $F_4^{3\times3}$ the fourth $3 \times 3$ convolution, and BN the batch-normalization operation.
Generating cross-dimensional spatial-channel attention: the channel attention $G_c(X')$ and the spatial attention $G_w(X')$ are combined with an element-wise summation operation to generate the cross-dimensional spatial-channel attention of the PSENet module, which is normalized with a sigmoid function and then multiplied with the original input $X'$ of the PSENet module to obtain the output feature $\hat{X}'$:

$$\hat{X}' = \sigma_s\!\left(G_c(X') \oplus G_w(X')\right) \otimes X'$$

where $\oplus$ denotes element-wise summation and $\otimes$ denotes the Kronecker product of the two matrices.
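A minimal numpy sketch of the gating described above, assuming a shared $C \to C/r \to C$ bottleneck (`w1`, `w2`) for the channel branch and a channel-mean map as a stand-in spatial descriptor; these choices are illustrative, not the patent's exact PSENet layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def psenet_attention(x, w1, w2):
    """x: (H, W, C) feature map; w1: (C, C//r), w2: (C//r, C) bottleneck weights."""
    gap = x.mean(axis=(0, 1))                 # squeeze: average-pooled channel feature
    gmp = x.max(axis=(0, 1))                  # squeeze: max-pooled channel feature
    relu = lambda z: np.maximum(z, 0.0)
    g_c = relu(gap @ w1) @ w2 + relu(gmp @ w1) @ w2   # excitation, one weight per channel
    g_w = x.mean(axis=2, keepdims=True)       # stand-in spatial descriptor, (H, W, 1)
    gate = sigmoid(g_c[None, None, :] + g_w)  # fuse by broadcast summation, then sigmoid
    return gate * x                           # reweight the original feature map
```

The sigmoid keeps every gate value in (0, 1), so the output is a softly reweighted copy of the input.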
The PSE-Res2Net encoder thus encodes the input sequence $X = [X_1, X_2, \ldots, X_T]$ as $Z = [Z_1, Z_2, \ldots, Z_{T_1}]$ with $T_1 = T/8$, where $Z_{T_1}$ denotes the feature at time $T_1$ after processing by the four serially connected residual blocks.
In step three, the memory-enhanced temporal attention mechanism of Fig. 3 includes a memory block responsible for storing and maintaining state information; a controller that performs read, write and update operations on the memory block; and a multi-head attention module that extracts context sequence information.
The memory block serves as a storage mechanism and consists of a fixed number of unordered memory items, each of which is a vector. The memory block of the feature map at the $n$-th training iteration is represented as a matrix $M^n \in \mathbb{R}^{d \times b}$, where $q$ denotes the height of the feature map, $d$ is the memory-block size, $b = l \times (T_1 - l + 1)$ is the dimension of a memory item, and $l$ is the subsequence length. The $p$-th memory item of the $n$-th training iteration is denoted $m_p^n \in \mathbb{R}^{b}$, representing the $p$-th memory item on the feature map.
The controller acts as a central processing unit, managing interactions between the memory blocks and the attention module. It controls the read-write operation of the memory block and the update of the memory item.
The input of the memory block is used to compute the read weights $w_p^{r,n}$ and write weights $w_p^{w,n}$ of the memory block. To minimize the reconstruction error of normal data, the memory items are updated to record prototype elements of the encoded normal data; the update formula of the $p$-th memory item is

$$m_p^{n+1} = \frac{m_p^{n} + \sum_{j} w_{p,j}^{w,n}\, \hat{z}_j^{\,n}}{\left\lVert m_p^{n} + \sum_{j} w_{p,j}^{w,n}\, \hat{z}_j^{\,n} \right\rVert_2}$$

where $\hat{z}_j^{\,n}$ is the $j$-th encoded input feature. Since some abnormal samples could still be memorized well through complex combinations of memory items, only memory items whose weight exceeds $1/S$ are selected to memorize the input feature vector, where $S$ is a constant greater than 1; if the weight of a memory item is less than $1/S$, it is forced to 0. The final read feature vector $r^n$ is the weighted sum of the memory items:

$$r^{n} = \sum_{p} w_p^{r,n}\, m_p^{n}$$
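The read path (addressing, hard shrinkage at the $1/S$ threshold, renormalization, weighted sum) can be sketched as follows; using cosine similarity with a softmax as the addressing measure is an assumption, a common choice for memory-augmented autoencoders rather than a detail fixed by the patent.

```python
import numpy as np

def memory_read(z, M, S=4):
    """z: query feature (d,); M: memory matrix (P, d) of P items.
    Softmax addressing, hard shrinkage at 1/S, then the weighted read."""
    sims = M @ z / (np.linalg.norm(M, axis=1) * np.linalg.norm(z) + 1e-8)
    w = np.exp(sims - sims.max())
    w = w / w.sum()                          # softmax read weights
    w = np.where(w > 1.0 / S, w, 0.0)        # force weights below 1/S to zero
    w = w / (w.sum() + 1e-8)                 # renormalize the surviving weights
    return w @ M, w                          # read vector r and its weights
```

Shrinkage is what prevents an anomaly from being reconstructed through a diffuse combination of many weakly-matching memory items.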
according to the classification method of the characteristic sequence, r is calculated n Divided into T 1 L+1 subsequences of length l, 0 < l < T 1 Wherein the jth subsequence is denoted r n-(j) 。
While the vehicle is running, state information from the recent past and near future is more valuable than information from distant times. To capture this dependency, the invention extracts the most influential context information. First, the feature sequence $\hat{Z}$ is divided into $T_1 - l + 1$ subsequences of length $l$, i.e. $\hat{Z} = \{\hat{Z}^{(1)}, \ldots, \hat{Z}^{(T_1 - l + 1)}\}$. The left context and right context of the $j$-th subsequence $\hat{Z}^{(j)}$ are denoted $C_L^{(j)}$ and $C_R^{(j)}$, where $j \in \{1, 2, \ldots, T_1 - l + 1\}$; the lengths of $C_L^{(j)}$ and $C_R^{(j)}$ are $L$ and $R$, respectively, and $L$ and $R$ are integer multiples of $l$. Furthermore, the average context $\bar{C}^{(j)}$ is the mean of the subsequence $\hat{Z}^{(j)}$. The subsequence features are then converted into the multi-head query $Q_i^{(j)}$, multi-head key $K_i^{(j)}$ and multi-head value $V_i^{(j)}$, and the multi-head attention mechanism is computed as:

$$\mathrm{head}_i^{(j)} = \mathrm{Attention}\!\left(Q_i^{(j)}, K_i^{(j)}, V_i^{(j)}\right) = \mathrm{softmax}\!\left(\frac{Q_i^{(j)} {K_i^{(j)}}^{T}}{\sqrt{d_k}}\right) V_i^{(j)}$$
$$O^{(j)} = \mathrm{Concat}\!\left(\mathrm{head}_1^{(j)}, \ldots, \mathrm{head}_g^{(j)}\right) W^{O}$$

where the weight matrices $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ are the query, key and value weights of a particular head ($i \in \{1, 2, \ldots, g\}$), representing $g$ subspaces of different dimensions, $W^{O}$ is a linear-transformation parameter, and $\mathrm{head}_i^{(j)}$ is the representation of the $j$-th subsequence computed by the $i$-th attention head. The Attention(·) function multiplies the interaction of the query $Q_i^{(j)}$ and key $K_i^{(j)}$ with the value $V_i^{(j)}$. The multi-head query is a linear transformation of the average context of the subsequence; the multi-head key and value are linear transformations of the connection between the $j$-th block $r^{n\text{-}(j)}$ of the feature sequence read by the memory module and its context sequences. $O^{(j)}$ denotes the $j$-th subsequence output by the multi-head attention module.
Finally, the $T_1 - l + 1$ subsequences output by the multi-head attention module undergo a batch-normalization operation and are fed to a max-pooling layer, yielding the global sequence features with temporal dependencies.
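The multi-head computation above reduces to the standard scaled dot-product form. The sketch below assumes a single query row (the average context) attending over a memory/context sequence, with square, illustrative weight matrices; head splitting is done by slicing the projected dimension.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, kv, Wq, Wk, Wv, Wo, g=2):
    """q: (1, d) average-context query; kv: (m, d) key/value sequence.
    Projects into g heads, applies scaled dot-product attention per head,
    concatenates and mixes with Wo. All weights are (d, d)."""
    d = q.shape[1]
    dk = d // g
    Q, K, V = q @ Wq, kv @ Wk, kv @ Wv
    heads = []
    for i in range(g):
        s = slice(i * dk, (i + 1) * dk)              # the i-th subspace
        att = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dk))
        heads.append(att @ V[:, s])
    return np.concatenate(heads, axis=-1) @ Wo       # O^(j)
```

Each head attends in its own $d/g$-dimensional subspace, matching the $g$ subspaces of different dimensions described above.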
In step four, the PSE-Res2Net-based decoder mirrors the structure of the encoder and consists of four serially connected residual blocks. Skip connections in these residual blocks facilitate the information flow, enabling the decoder to reconstruct the output from the encoded features. The PSE-Res2Net-based decoder adds up-sampling operations between the convolutions of the PSE-Res2Net-based encoder structure, enabling it to generate a high-resolution output corresponding to the input data.
During model training, consider a training data set $X_{Tr} = \{x_1, x_2, \ldots, x_n\}$ of $n$ samples, where each training sample $x_i$ consists of $T$ elements. The loss function comprises the memory-addressing loss and the reconstruction loss:

$$\mathcal{L}(\theta) = \mathbb{E}\!\left[\left\lVert x_i - D\!\left(E(x_i)\right)\right\rVert_2^2\right] + \gamma\,\mathbb{E}\!\left[F(\omega_i)\right] + \lambda \lVert \theta \rVert_2^2$$

where $\lVert \cdot \rVert_2$ denotes the $\ell_2$ norm, $\mathbb{E}$ the expectation, $F$ the addressing function, $\theta$ the model training parameters, $\gamma$ a weighting parameter determining the relative importance of the two loss terms, and $\lambda$ the optimal regularization parameter; $E(x_i)$ denotes the encoding of sample $x_i$, $\omega_i$ the memory-block weights, and $D(\cdot)$ decoding by the PSE-Res2Net-based decoder.
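A numerical sketch of this loss, assuming the addressing function $F$ is the entropy of the read weights (a common choice for memory-augmented autoencoders; the patent does not spell out $F$ here) and that $\theta$ is flattened into one parameter vector:

```python
import numpy as np

def detection_loss(x, x_hat, w_mem, theta, gamma=2e-4, lam=1e-4):
    """Reconstruction MSE + gamma * memory-addressing term + lam * ||theta||^2."""
    rec = np.mean((x - x_hat) ** 2)          # reconstruction loss
    w = np.clip(w_mem, 1e-12, 1.0)
    addr = -np.sum(w * np.log(w))            # assumed addressing function F: entropy
    reg = lam * np.sum(theta ** 2)           # L2 regularization on parameters
    return rec + gamma * addr + reg
```

The entropy term pushes the read weights toward a few dominant memory items, which is the compactness the addressing loss is meant to enforce.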
Model training and anomaly detection optimize the encoder based on PSE-Res2Net and the decoder based on PSE-Res2Net, so that the reconstruction error of normal data is minimized, and the reconstruction error of an anomaly sample is larger.
(1) Parameter initialization: λ, γ, the max-pooling and up-sampling rates, the convolution kernel sizes, the learning-rate decay and the weight decay;
(2) Acquire the multi-scale feature map: X' ← Res2Net(X);
(3) Encoding: Z ← Encoder(X');
(4) Memory enhancement: Ẑ ← MemTempAttn(Z);
(5) Decoding: X̂ ← Decoder(Ẑ);
(6) Loss-function training and model-parameter update: θ ← θ − η∇L(θ);
(7) Check for convergence; if converged, end training.
Examples:
the invention has been tested on 3 autopilot datasets: comma2k19, KITTI and CCSAD data sets. The experimental setup was equipped with a 12-core intel processor, two NVIDIA GeForce RTX 3090 GPUs and 64GB memory.
The following describes in detail the specific implementation steps of the present invention with reference to the accompanying drawings:
step one: and (3) data preparation, namely standardizing the acquired characteristics of the position, the speed, the acceleration, the steering angle and the like related to the driving state, generating continuous segments by applying a sliding window, and dividing a training set and a testing set.
Step two: parameter selection, the selection of training parameters will affect the performance of the detection model, two important parameters will be discussed: 1) Context length, 2) detection threshold τ.
Table 1: Influence of the left and right context lengths on the three data sets
Table 1 illustrates the effect of different context lengths on the detection accuracy of three data sets. When the context length is 0, no context is embedded, and accuracy is the lowest among the three data sets.
Fig. 4 shows the statistical histogram of the reconstruction errors. The parameters α and β of the probability density function obtained by maximum-likelihood estimation are 15 and 392, respectively; the mean and variance of the gamma distribution are μ = α/β and σ² = α/β², and the threshold τ is defined as τ = μ + ε.
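The threshold follows directly from the fitted parameters. In the sketch below the shape/rate values are those reported above (α = 15, β = 392); the margin ε is left open in the text, so the 3σ choice here is an illustrative assumption.

```python
import math

alpha, beta = 15.0, 392.0          # maximum-likelihood gamma parameters from the fit
mu = alpha / beta                  # mean of the fitted gamma distribution
sigma2 = alpha / beta ** 2         # variance of the fitted gamma distribution
epsilon = 3.0 * math.sqrt(sigma2)  # assumed margin choice for epsilon
tau = mu + epsilon                 # detection threshold tau = mu + epsilon

def flag_anomaly(err, tau=tau):
    """A sample is anomalous when its reconstruction error exceeds tau."""
    return err > tau
```

With these values τ is roughly 0.068, so typical normal-data errors near μ ≈ 0.038 pass while clearly larger errors are flagged.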
Step three: and (3) abnormality detection, wherein an automatic encoder is adopted to reconstruct a feature set, the encoder maps normal sample data into a convex set of an encoding space, abnormal data is outside the convex set, and a reconstruction error is utilized to detect abnormality of time series data input by a fusion multi-sensor.
FIG. 5 shows the reconstruction effect for the velocity and acceleration features of the test set. As the figure shows, the method of the invention reconstructs normal subsequences well, with a reconstruction error of the acceleration feature no larger than 0.1 m/s². For anomalous subsequences the reconstruction error is much larger, with the error of the velocity feature exceeding 6.3 km/h, which facilitates the detection of anomalous sensor data.
The abnormality detection model in step three of the present embodiment is described as follows:
Detector components: as shown in fig. 1, the detection model consists of three components: a PSE-Res2Net based encoder for input encoding and query generation, a memory-enhanced temporal attention module for long-term dependency tracking, and a PSE-Res2Net based decoder for sample reconstruction.
Reconstruction process: as shown in FIG. 2, the PSE-Res2Net module first obtains a multi-scale feature map with the Res2Net module and then extracts channel and spatial attention through the PSENet module. The multi-scale features are then integrated by the memory-enhanced temporal attention module shown in fig. 3 to obtain global sequence information, which is passed to the decoder for reconstruction.
Loss function: the loss function of the autoencoder consists of a reconstruction loss, which ensures the similarity between the reconstructed data and the original data, and a memory addressing loss, which encourages compactness of the memory module so that memory items remain similar to the original features, and also constrains the memory weights to avoid over-reconstruction of anomalies through complex combinations of memory items.
Detection process: anomalies are detected from the reconstruction error according to the selected threshold τ.
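The thresholding step can be sketched as follows. The per-window error metric (mean squared error) is an assumption, since the text only states that the reconstruction error is compared against τ:

```python
import numpy as np

def detect(x: np.ndarray, x_hat: np.ndarray, tau: float) -> np.ndarray:
    """Flag windows whose reconstruction error exceeds the threshold tau.

    `x` and `x_hat` have shape (N, window, F); one error per window.
    """
    errors = np.mean((x - x_hat) ** 2, axis=tuple(range(1, x.ndim)))
    return errors > tau  # True marks an anomalous window

x = np.zeros((4, 16, 3))
x_hat = x.copy()
x_hat[2] += 1.0              # inject one badly reconstructed window
flags = detect(x, x_hat, tau=0.5)
```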
The abnormal trajectory data in step three are simulated as follows: adversarial data are generated artificially with a perturbation scheme to simulate a stealthy attack. Specifically, m_l subsequences of the samples {X_j} are selected at random, and 2 m is subtracted from or added to the lateral position of X_j to simulate a sensor spoofing attack.
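A sketch of this perturbation scheme on a 1-D lateral-position trace; the segment count and length are hypothetical parameters introduced for illustration:

```python
import numpy as np

def spoof(trajectory: np.ndarray, n_segments: int, seg_len: int,
          offset: float = 2.0, rng=None) -> np.ndarray:
    """Shift randomly chosen subsequences of the lateral position by +/- 2 m."""
    rng = rng or np.random.default_rng(0)
    out = trajectory.copy()
    for _ in range(n_segments):
        start = int(rng.integers(0, len(out) - seg_len))
        sign = rng.choice([-1.0, 1.0])
        out[start:start + seg_len] += sign * offset  # lateral spoofing attack
    return out

clean = np.zeros(200)
attacked = spoof(clean, n_segments=3, seg_len=10)
```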
It is apparent that the above-described embodiment is merely illustrative of a method for detecting anomalies in autonomous-driving sensors and is not intended to limit the invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art, and any such modifications and variations remain within the scope of the invention.
Claims (10)
1. The method for detecting the anomaly of the vehicle networking sensor based on the channel-space-time attention self-coding is characterized by comprising the following steps of:
acquiring time series data of the running state of the automobile through a plurality of sensors arranged on the automobile, and preprocessing the data;
encoding the preprocessed data by using an encoder based on PSE-Res2 Net;
integrating multi-scale features over the output of the PSE-Res2Net based encoder with a memory-enhanced temporal attention mechanism, tracking long-term dependencies, and generating global sequence features with cross-span time dependence;
converting the time-dependent global sequence features from the encoded low-dimensional representation back to the high-dimensional representation of the original input using a PSE-Res2Net based decoder, calculating the reconstruction error, and obtaining the detection result.
2. The method of claim 1, wherein the preprocessing includes abnormal-data rejection; specifically, an EKF-based state estimator performs state estimation on the fused multi-sensor time series data, the state residual is then calculated and statistically analyzed, and abnormal data are rejected accordingly.
3. The method of claim 2, wherein the EKF-based state estimator is modeled as:
P_{k+1|k+1} = (I − K_{k+1|k} S_{k+1|k}) P_{k+1|k}  (1)
where x̂_{k|k} and x̂_{k+1|k+1} are the state estimates at time steps k and k+1, P_{k+1|k} and P_{k+1|k+1} are the error covariance matrices at time steps k and k+1, S_k and H_k are Jacobian matrices, K_{k+1|k} is the Kalman gain, S_{k+1|k} is the value of the Jacobian matrix at the predicted state, y_{k+1} and y_{k+1|k} are the measured and estimated system states respectively, and I is the identity matrix.
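A numeric sketch of the covariance update in equation (1). The gain K and Jacobian S would come from the EKF predict/update cycle; small fixed matrices are used here only to show the arithmetic:

```python
import numpy as np

P_pred = np.eye(2) * 4.0                  # predicted error covariance P_{k+1|k}
S = np.array([[1.0, 0.0]])                # measurement Jacobian (1x2)
R = np.array([[1.0]])                     # measurement noise covariance
# Kalman gain K = P S^T (S P S^T + R)^{-1}
K = P_pred @ S.T @ np.linalg.inv(S @ P_pred @ S.T + R)
# covariance update P_{k+1|k+1} = (I - K S) P_{k+1|k}
P_upd = (np.eye(2) - K @ S) @ P_pred
```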
4. The method according to claim 1, characterized in that the PSE-Res2Net based encoder encodes the input sequence X = [X_1, X_2, …, X_T] into a representation of length T_1, where T_1 = T/8; the PSE-Res2Net based encoder comprises four serially connected residual blocks, each of which comprises a Res2Net module and a PSENet module;
each residual block acquires a multi-scale feature map with its Res2Net module, and then extracts channel and spatial attention through its PSENet module.
5. The method according to claim 4, wherein the Res2Net module obtains the multi-scale feature map as follows:
Feature subgraph division: the time series data preprocessed in step one, or the output X = [X_1, X_2, …, X_T] of the previous residual block (T denotes time), is passed through a 1×1 convolution and divided into four sub-feature maps x_i, where H, W and C respectively denote the height, width and number of channels of the feature map;
Convolution operations: the first sub-feature map x_1 is used directly as the split feature x_1'; the second sub-feature map x_2 is passed through a 3×3 convolution to obtain the split feature x_2'; the third sub-feature map x_3 together with the split feature x_2' is passed through a 3×3 convolution to obtain the split feature x_3'; the fourth sub-feature map x_4 together with the split feature x_3' is passed through a 3×3 convolution to obtain the split feature x_4'; this yields all split features x_i' (i ∈ {1, 2, 3, 4}, C' = C/4);
Feature concatenation: all split features x_i' are concatenated and passed to a 1×1 convolution to produce the output X' of the Res2Net block.
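The hierarchical data flow above can be sketched as follows. A fixed averaging kernel stands in for the learned 3×3 convolutions, and the trailing 1×1 convolution is omitted; only the split-and-cascade structure follows the claim:

```python
import numpy as np

def conv3x3(x: np.ndarray) -> np.ndarray:
    """Stand-in 3x3 'convolution': a mean filter with zero padding.

    `x` has shape (C', H, W); a real block would use learned kernels.
    """
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += p[:, dy:dy + x.shape[1], dx:dx + x.shape[2]]
    return out / 9.0

def res2net_split(x: np.ndarray) -> np.ndarray:
    """Hierarchical multi-scale split of a (C, H, W) map, C divisible by 4."""
    x1, x2, x3, x4 = np.split(x, 4, axis=0)   # four C/4-channel subgraphs
    y1 = x1                                    # x_1 passed through directly
    y2 = conv3x3(x2)                           # x_2 -> x_2'
    y3 = conv3x3(x3 + y2)                      # x_3 fused with x_2'
    y4 = conv3x3(x4 + y3)                      # x_4 fused with x_3'
    return np.concatenate([y1, y2, y3, y4], axis=0)

feat = np.random.randn(8, 6, 6)
out = res2net_split(feat)
```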
6. The method of claim 4, wherein the PSENet module comprises a channel attention module and a spatial attention module; the channel attention module uses the squeeze and excitation operations of an SENet network at different scales to focus on the most salient local features in the received feature map X', which helps infer detailed channel features; the spatial attention module selects important location features by weighting all spatial features.
7. The method of claim 4, wherein the PSENet module operates as follows:
Calculating channel attention: the channel attention module performs a squeeze operation along the channel axis by combining average pooling and max pooling, followed by an excitation operation, aggregating the feature information of each channel to obtain the channel's average pooling feature and max pooling feature. These are then fed into two convolution layers to learn the channel relationships: the first convolution layer has output dimension C/r, where r denotes the channel grouping factor, and its output is passed through the nonlinear activation function ReLU into the second convolution layer, whose output dimension is C; the channel weights are then adjusted with a sigmoid activation function to obtain the channel attention G_c(X'), where the weight parameters of the two convolution layers are learned and σ is the nonlinear activation function ReLU;
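A runnable sketch of the squeeze-and-excitation channel attention described above; random weights stand in for the two learned convolution layers (C → C/r → C), so only the data flow follows the claim:

```python
import numpy as np

def channel_attention(x: np.ndarray, r: int = 4, rng=None) -> np.ndarray:
    """Channel attention on a (C, H, W) feature map, returning C weights."""
    rng = rng or np.random.default_rng(0)
    C = x.shape[0]
    w1 = rng.standard_normal((C // r, C)) * 0.1   # squeeze layer (C -> C/r)
    w2 = rng.standard_normal((C, C // r)) * 0.1   # excitation layer (C/r -> C)
    avg = x.mean(axis=(1, 2))                     # average-pooled descriptor
    mx = x.max(axis=(1, 2))                       # max-pooled descriptor
    relu = lambda v: np.maximum(v, 0.0)
    logits = w2 @ relu(w1 @ avg) + w2 @ relu(w1 @ mx)
    return 1.0 / (1.0 + np.exp(-logits))          # sigmoid channel weights

feat = np.random.default_rng(1).standard_normal((8, 5, 5))
weights = channel_attention(feat)
```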
Calculating spatial attention: the spatial attention module selects important location features by weighting all spatial features to obtain the spatial attention G_w(X'), computed with a sequence of convolutions, where F_1^{1×1} denotes the first 1×1 convolution operation, F_2^{3×3} the second 3×3 convolution operation, F_3^{3×3} the third 3×3 convolution operation, F_4^{1×1} the fourth 1×1 convolution operation, and BN denotes a batch normalization operation;
Generating cross-dimensional spatial-channel attention: the channel attention G_c(X') and the spatial attention G_w(X') are combined by an element-wise summation to generate the cross-dimensional spatial-channel attention of the PSENet module, which is normalized with a sigmoid function and then multiplied with the original PSENet input X' to obtain the output feature, where ⊗ denotes the Kronecker product of the two matrices.
8. The method of claim 7, wherein the memory-enhanced temporal attention mechanism comprises a memory block responsible for storing and maintaining state information; a controller for performing read, write and update operations on the memory block; and a multi-head attention module for extracting context sequence information;
the memory block serves as the storage mechanism and consists of a fixed number of unordered memory items, each of which is a vector; the memory block for the feature map of the n-th training iteration is represented as a matrix, where q denotes the height of the feature map, d is the memory block size, b = l×(T_1 − l + 1) is the dimension of a memory item, and l is the sequence length; the p-th memory item of the n-th training iteration represents the p-th memory item on the feature map;
the controller acts as a central processing unit and manages the interaction between the memory block and the attention module; it controls the read and write operations on the memory block and the updating of the memory items;
the input of the memory block is used to calculate its read weights and write weights; to minimize the reconstruction error on normal data, the memory items are updated to record prototype elements of the encoded normal data; a memory item is selected to store the input feature vector when its weight is greater than 1/S, S being a constant greater than 1; if the weight of a memory item is less than 1/S, it is forcibly set to 0;
the finally read feature vector r^n is the weighted sum of the memory items.
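The read step can be sketched as follows. Cosine-similarity addressing with a softmax is an assumption; the claim only specifies the hard-shrinkage rule (weights below 1/S forced to zero) and the weighted sum:

```python
import numpy as np

def memory_read(query: np.ndarray, memory: np.ndarray, S: float = 4.0):
    """Read from a memory block of P items (shape (P, b)) with one query (b,)."""
    sims = memory @ query / (np.linalg.norm(memory, axis=1)
                             * np.linalg.norm(query) + 1e-8)
    w = np.exp(sims) / np.exp(sims).sum()        # softmax addressing weights
    w = np.where(w >= 1.0 / S, w, 0.0)           # hard shrinkage: w < 1/S -> 0
    w = w / (w.sum() + 1e-8)                     # renormalize surviving weights
    return w @ memory                            # weighted sum of memory items

M = np.eye(4)                                    # 4 orthogonal memory items
r = memory_read(np.array([1.0, 0.0, 0.0, 0.0]), M)
```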
according to the feature-sequence partitioning method, r^n is divided into T_1 − l + 1 subsequences of length l, 0 < l < T_1, with the j-th subsequence denoted r^{n(j)};
the feature sequence X' is likewise divided into T_1 − l + 1 subsequences of length l; the left context and the right context of the j-th subsequence, j ∈ {1, 2, …, T_1 − l + 1}, have lengths L and R respectively, where L and R are integer multiples of l; furthermore, the average context is the mean value of the subsequence; the multi-head attention mechanism of the multi-head attention module then converts the subsequence features into multi-head queries, multi-head keys and multi-head values;
where the weight matrices are the query, key and value weights of a particular head (i ∈ {1, 2, …, g}) representing g subspaces of different dimensions, together with a linear transformation parameter; the representation of the j-th subsequence computed by the i-th attention head is obtained by the Attention(·) function, which multiplies the interaction of the query with the key by the value; the multi-head query is a linear transformation of the subsequence, its context sequence and the average context; the multi-head key and multi-head value are linear transformations of the connection between the j-th block of the feature sequence X', the subsequence read by the memory module, and its context sequences; the multi-head attention module outputs the representation of the j-th subsequence;
finally, the T_1 − l + 1 subsequence representations output by the multi-head attention module are batch-normalized and fed into a max-pooling layer to obtain the global sequence feature with time dependence.
9. The method of claim 1, wherein the loss function comprises a memory addressing loss l_mem and a reconstruction loss l_rec, combined as a γ-weighted sum,
where ||·||_2 denotes the l_2 norm, E denotes the expectation, F denotes the addressing function, θ is the set of model training parameters, γ is a weighting parameter that determines the importance of the two loss terms, and λ is a regularization parameter; E(x_i) denotes the encoding of sample x_i, ω_i denotes the memory block weights, and D(·) denotes decoding by the PSE-Res2Net based decoder.
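A sketch of such a combined objective under stated assumptions: the reconstruction term is the squared l_2 error, and an entropy penalty on the addressing weights stands in for l_mem (the claim does not fix its exact form, so this is illustrative only):

```python
import numpy as np

def autoencoder_loss(x, x_hat, weights, gamma: float = 0.01):
    """Reconstruction loss plus a memory-addressing compactness term."""
    l_rec = np.sum((x - x_hat) ** 2)        # squared l2 reconstruction error
    w = np.clip(weights, 1e-12, 1.0)
    l_mem = -np.sum(w * np.log(w))          # entropy of addressing weights
    return l_rec + gamma * l_mem            # gamma-weighted combination

x = np.ones(4)
loss = autoencoder_loss(x, x * 0.9, np.array([1.0, 0.0, 0.0]))
```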
10. An internet of vehicles sensor anomaly detection system for implementing the method of any one of claims 1-9, comprising:
the data acquisition and preprocessing module acquires time series data of the running state of the automobile through a plurality of sensors arranged on the automobile and preprocesses the data;
the detection module is used to detect anomalies in the automobile's driving state with a channel-spatial-temporal attention based self-encoder model; the channel-spatial-temporal attention based self-encoder model comprises a PSE-Res2Net based encoder, a PSE-Res2Net based decoder, and a memory-enhanced temporal attention module; the PSE-Res2Net based encoder is responsible for encoding the data output by the data acquisition and preprocessing module; the memory-enhanced temporal attention module is responsible for integrating multi-scale features over the encoder output with a memory-enhanced temporal attention mechanism, tracking long-term dependencies, and generating global sequence features with cross-span time dependence; the PSE-Res2Net based decoder is responsible for converting the time-dependent global sequence features from the encoded low-dimensional representation back to the high-dimensional representation of the original input, computing the reconstruction error, and obtaining the detection result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311167968.9A CN117041972A (en) | 2023-09-11 | 2023-09-11 | Channel-space-time attention self-coding based anomaly detection method for vehicle networking sensor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117041972A true CN117041972A (en) | 2023-11-10 |
Family
ID=88622896
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117544421A (en) * | 2024-01-08 | 2024-02-09 | 广州大学 | Network threat detection method, device, medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||