CN114936530A

CN114936530A - Multi-element air quality data missing value filling model based on TAM and construction method thereof

Info

Publication number: CN114936530A
Application number: CN202210714518.6A
Authority: CN
Inventors: 马思远; 宋伟; 任晟岐; 焦佳辉
Original assignee: Zhengzhou University
Current assignee: Zhengzhou University
Priority date: 2022-06-22
Filing date: 2022-06-22
Publication date: 2022-08-23

Abstract

The invention relates to a TAM-based missing value filling model for multivariate air quality data and a construction method thereof. The Triple-Views Layer predicts missing readings from three perspectives: a timestamp view, a feature view and a short-term historical data view. The Output Layer performs different calculations according to the numerical type of the missing reading. When the missing reading is a continuous value, the Output Layer assigns a weight to the predicted value of each view and computes their weighted sum as the final prediction result. When the missing reading is a discrete value, the Output Layer concatenates the outputs of the timestamp view and the feature view, obtains the prediction probabilities of the categories through linear mapping and normalization, and then derives the predicted value. The model can predict and fill missing readings of different numerical types, achieves a more accurate prediction, and offers a new approach for subsequent work on filling missing values in multivariate air quality data.

Description

Multi-element air quality data missing value filling model based on TAM and construction method thereof

Technical Field

The invention relates to a TAM-based missing value filling model for multivariate air quality data and a construction method thereof.

Background

Air quality data is multi-dimensional, geotagged time-series data that is time-sensitive, sequential and subject to seasonal periodicity. After many years of research and development, methods for handling missing data include deletion of missing data, mean value substitution (MEAN), previous value substitution, the linear regression model (LR), the multi-layer perceptron (MLP), the K-nearest neighbors model (KNN) and the recurrent neural network (RNN). In the big data era, data quality profoundly influences decision making and scientific development, and it also affects the field of machine learning; missing data processing therefore remains a field that many researchers continue to explore.

Deletion of missing data is an early way of handling missing data. It directly deletes every record that contains a missing reading, so a large amount of information is lost, the structure of the data is damaged, and in a data set with a high missing rate there may be no usable data left for analysis and processing.

When filling a missing value of a given attribute, the mean value substitution method fills in the mean of all observed values of that attribute, while the previous value substitution method fills in the reading at the timestamp before the missing value. Both methods ignore the variance of the data and the correlation between attributes, so their predictions are one-sided.
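As a minimal illustration of the two substitution methods described above, the following Python sketch fills the series of a single attribute. The function names and the sample PM2.5 values are hypothetical and are not part of the invention.

```python
from statistics import mean


def fill_with_mean(values):
    """Replace each missing reading (None) with the mean of the observed readings."""
    observed = [v for v in values if v is not None]
    avg = mean(observed)
    return [avg if v is None else v for v in values]


def fill_with_previous(values):
    """Replace each missing reading (None) with the most recent earlier reading."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None when no earlier reading exists
    return filled


# Hypothetical PM2.5 series with two missing readings.
pm25 = [10.0, None, 14.0, None, 18.0]
mean_filled = fill_with_mean(pm25)
previous_filled = fill_with_previous(pm25)
```

Both fills use only one attribute at a time, which is the one-sidedness noted above: no information from other attributes enters the estimate.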

The linear regression model is a feed-forward neural network composed of a single fully connected layer; the multi-layer perceptron extends the linear regression model with activation functions and additional fully connected layers. Compared with mean value substitution, the two methods retain the variance and covariance of the variables with missing data, but all of their estimates follow a single regression curve. They consider only the time-series information of a single feature, ignore the structure of the data matrix, and their estimates cannot represent intrinsic variation in the data.

The K-nearest neighbors model is a machine learning model with no parameters to learn. For a target sample containing missing values, KNN computes the "distance" between the target sample and the known samples according to a distance metric (such as Euclidean distance) and selects the K nearest samples to predict the missing values.

The recurrent neural network handles sequential problems, has a "memory" capability, and copes well with temporal correlation. However, on the one hand, RNN forward propagation proceeds sequentially and can hardly exploit the parallelism of a GPU, so processing long time series takes a large amount of time and is computationally inefficient; on the other hand, RNNs struggle with long-range dependence, and for long time series it is difficult to mine the information of all timestamps.

TAM is a deep learning model based on the attention mechanism (Attention). First, attention explores potential interrelations in data well; for multi-feature air quality data, it can extract internal relations from different angles, namely correlation between timestamps and correlation between features, and thus achieve a better prediction. Second, the attention mechanism handles long-range dependence well: it attends to the information of all timestamps, so distant timestamps can still be related to one another. Finally, the attention mechanism makes good use of the parallel processing capability of the GPU and is efficient on long time series.

In conclusion, the attention mechanism is well suited to the task of filling missing values in multivariate air quality data.

Nowadays, people pay more and more attention to air pollution because it constantly threatens human health and the sustainable development of society. More and more monitoring stations are therefore being set up in cities to continuously acquire air quality data, meteorological data and the like, which provide the data basis for analyzing pollution sources, identifying the main pollutants and predicting air quality. However, because monitoring equipment is subject to shutdown for maintenance, damage, communication errors and unexpected interruptions (such as power failure), the data collected by the sensors contains missing values. Missing data not only affects real-time monitoring of pollutant values but also interferes with data analysis and pollutant concentration prediction; valid data is essential for analyzing air quality and for preventing and controlling air pollution.

Air quality missing value filling (air quality data imputation), an important branch of the urban air quality prediction task, has significant research and application value and has attracted wide attention in the field of air quality data mining. Traditional data analysis methods (such as mean value substitution, MEAN, and previous value substitution) cannot meet the challenges of big data and are inefficient, whereas big data analysis techniques help mine the data in depth, extract the patterns and rules of air quality change, and apply them to the missing data filling task to obtain better results. Among the big-data-based methods, LR and MLP consider only the time-series information of a single feature and ignore the structure of the data matrix; KNN must compare against all known data when predicting each missing reading, which is time-consuming, and the algorithm depends on the chosen distance metric; RNN, although well suited to time-series prediction tasks, runs inefficiently and cannot solve the long-range dependence problem.

Disclosure of Invention

The invention provides a TAM-based multivariate air quality data missing value filling model and a construction method thereof, which are used for solving the technical problem that existing missing value filling models for urban multivariate air quality data have low prediction accuracy.

A construction method of a TAM-based multivariate air quality data missing value filling model comprises the following steps:

constructing a BatchNorm layer for normalizing multivariate air quality data;

constructing an Input Layer, wherein the Input Layer comprises a multivariate time-series position encoding and a fully connected linear layer, the position encoding is used for adding position information to the time series, and the fully connected linear layer is used for mapping input data to dense vectors;

constructing a Triple-Views Layer, wherein the Triple-Views Layer comprises a timestamp view predictor, a feature view predictor and a short-term historical data view predictor, the timestamp view predictor predicts from the time dimension, the feature view predictor predicts from the multivariate feature dimension, and the short-term historical data view predictor predicts from historical readings;

and constructing an Output Layer, wherein the Output Layer is divided into a Continuous Output Layer and a Discrete Output Layer according to the type of the missing data. For continuous missing data, the Continuous Output Layer first assigns and initializes weights and a bias for the timestamp view predictor, the feature view predictor and the short-term historical data view predictor, then maps the output matrices of the timestamp view predictor and the feature view predictor to predicted values, and finally computes the weighted sum of the predicted values according to the weights to obtain the final prediction result. For discrete missing data, the Discrete Output Layer first processes the output matrices of the timestamp view predictor and the feature view predictor into an output vector, then maps the output vector into the [0,1] interval so that its values sum to 1, and finally selects the node with the maximum probability as the output node.

In one embodiment, the constructing the BatchNorm layer comprises:

constructing a BatchNorm layer and normalizing the multivariate air quality data through a normalization function.

In one embodiment, a fully connected layer and a position encoding are arranged in the Input Layer.

In one embodiment, a multi-head attention mechanism, a ReLU activation function and Dropout (random deactivation) are arranged in the timestamp view predictor and the feature view predictor.

In one embodiment, the assigning and initializing weights and biases includes:

weights and a bias {w₁, w₂, w₃, b} are assigned and initialized for the views, with initial values {0.33, 0.33, 0.33, 0};

the mapping of the output matrices of the timestamp view predictor and the feature view predictor into predicted values comprises the following steps:

the obtained output matrices of the timestamp view predictor and the feature view predictor are Matrix_T and Matrix_F respectively, and each output matrix is mapped through one fully connected neural network layer to obtain a predicted value;

the weighted summation of the predicted values according to the weights comprises the following steps:

the obtained predicted values of the feature view predictor, the timestamp view predictor and the short-term historical data view predictor are pre_F, pre_T and pre_P respectively, and the predicted values of the three views are weighted and summed according to the following formula:

$$\mathrm{Output} = w_1 \cdot \mathrm{pre}_F + w_2 \cdot \mathrm{pre}_T + w_3 \cdot \mathrm{pre}_P + b$$

wherein Output is the final prediction result;

the processing of the output matrices of the timestamp view predictor and the feature view predictor into an output vector comprises:

the obtained output matrices of the timestamp view predictor and the feature view predictor are Matrix_T and Matrix_F respectively, the vector obtained by concatenating the output matrices is Concatenate(Matrix_T, Matrix_F), and this vector is then mapped through a fully connected layer to obtain the output vector;

the mapping of the output vector into the [0,1] interval comprises:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$

wherein z_i is the i-th value of the output vector, and C is the length of the output vector, i.e. the number of classes of the discrete data.

A TAM-based multivariate air quality data missing value filling model comprises:

a BatchNorm layer for normalizing the multivariate air quality data;

an Input Layer, wherein the Input Layer comprises a multivariate time-series position encoding and a fully connected linear layer, the position encoding is used for adding position information to the time series, and the fully connected linear layer is used for mapping input data to dense vectors;

a Triple-Views Layer, wherein the Triple-Views Layer comprises a timestamp view predictor, a feature view predictor and a short-term historical data view predictor, the timestamp view predictor predicts from the time dimension, the feature view predictor predicts from the multivariate feature dimension, and the short-term historical data view predictor predicts from historical readings;

and an Output Layer, wherein the Output Layer is divided into a Continuous Output Layer and a Discrete Output Layer according to the type of the missing data. For continuous missing data, the Continuous Output Layer first assigns and initializes weights and a bias for the timestamp view predictor, the feature view predictor and the short-term historical data view predictor, then maps the output matrices of the timestamp view predictor and the feature view predictor to predicted values, and finally computes the weighted sum of the predicted values according to the weights to obtain the final prediction result. For discrete missing data, the Discrete Output Layer first processes the output matrices of the timestamp view predictor and the feature view predictor into an output vector, then maps the output vector into the [0,1] interval so that its values sum to 1, and finally selects the node with the maximum probability as the output node.

In one embodiment, the constructing the BatchNorm layer comprises:

constructing a BatchNorm layer and normalizing the multivariate air quality data through a normalization function.

In one embodiment, a multi-head attention mechanism, a ReLU activation function and Dropout (random deactivation) are arranged in the timestamp view predictor and the feature view predictor.

In one embodiment, the assigning and initializing weights and biases includes:

mapping the output matrices of the timestamp view predictor and the feature view predictor into predicted values comprises the following steps:

the obtained predicted values of the feature view predictor, the timestamp view predictor and the short-term historical data view predictor are pre_F, pre_T and pre_P respectively, and the predicted values of the three views are weighted and summed according to the following formula:

$$\mathrm{Output} = w_1 \cdot \mathrm{pre}_F + w_2 \cdot \mathrm{pre}_T + w_3 \cdot \mathrm{pre}_P + b$$

wherein Output is the final prediction result;

the mapping of the output vector into the [0,1] interval comprises:

The invention provides a TAM-based multivariate air quality data missing value filling model and a construction method thereof. After each view produces its output matrix, the Output Layer is divided into a Continuous Output Layer and a Discrete Output Layer according to the data type of the missing values. For continuous missing data, the Continuous Output Layer first assigns and initializes weights and a bias for the timestamp view predictor, the feature view predictor and the short-term historical data view predictor, then maps the output matrices of the timestamp view predictor and the feature view predictor to predicted values, and finally computes the weighted sum of the predicted values according to the weights to obtain the final prediction result. For discrete missing data, the Discrete Output Layer first processes the output matrices of the timestamp view predictor and the feature view predictor into an output vector, then maps the output vector into the [0,1] interval so that its values sum to 1, and finally selects the node with the maximum probability as the output node. Compared with traditional models and other models based on big data technology, the multivariate air quality data missing value filling model provided by the invention can achieve higher prediction accuracy and offers a new approach for subsequent multivariate air quality data missing value filling tasks.

Drawings

FIG. 1 is a flow chart of the construction method of the TAM-based multivariate air quality data missing value filling model provided by the invention;

FIG. 2 is a flow chart of the specific data processing of the TAM-based multivariate air quality data missing value filling model provided by the invention;

FIG. 3 is a schematic overall structure diagram of the TAM-based multivariate air quality data missing value filling model provided by the invention;

FIG. 4 is a specific network structure diagram of the TAM-based multivariate air quality data missing value filling model provided by the invention.

Detailed Description

Embodiment of the construction method of the TAM-based multivariate air quality data missing value filling model:

This embodiment provides a TAM-based multivariate air quality data missing value filling model and a construction method thereof. The hardware that executes the construction method may be a desktop computer, a notebook computer, a server, or a smart mobile terminal (a tablet computer, a smartphone, etc.); this embodiment imposes no limitation.

As shown in fig. 1, the construction method includes:

Step 1: constructing a BatchNorm layer:

Normalization function:

$$\hat{x} = \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}}$$

wherein x is the input data, $\hat{x}$ is the normalized data, $\mu$ is the mean value of the input data, $\sigma^{2}$ is the variance of the input data, and $\epsilon$ is a very small value that prevents the denominator from being 0.
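A minimal sketch of this normalization, applied to the readings of a single attribute, may look as follows. The function name `batch_norm` and the default `eps = 1e-5` are assumptions made for illustration; any learnable scale and shift parameters that a BatchNorm layer may carry are omitted.

```python
import math


def batch_norm(column, eps=1e-5):
    """Normalize one attribute: subtract the mean, divide by sqrt(variance + eps)."""
    mu = sum(column) / len(column)
    var = sum((x - mu) ** 2 for x in column) / len(column)
    return [(x - mu) / math.sqrt(var + eps) for x in column]


# Hypothetical readings of one attribute over three timestamps.
normalized = batch_norm([1.0, 2.0, 3.0])
```

The small constant `eps` keeps the division defined even when every reading of the attribute is identical and the variance is 0.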

Step 2: constructing an Input Layer, wherein the Input Layer comprises a multivariate time-series position encoding and a fully connected linear layer, the position encoding is used for adding position information to the time series, and the fully connected linear layer is used for mapping input data to dense vectors:

The input data is a multi-dimensional time series of a given length. The time span is short, and each value represents only the reading of one attribute at one timestamp, so a fully connected layer is added to map this sparse vector to a high-dimensional dense vector.

Since the self-attention mechanism cannot identify the time position information of a multi-dimensional time series, position encoding is added to encode the time information.

Fully connected layer:

$$A = WX + B$$

Position encoding:

$$PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

wherein pos is the position of a feature in the multivariate air quality data, with value range [0, max_sequence_length], where max_sequence_length is the total number of features contained in the multivariate air quality data; i is the time-dimension index of the feature, with value range [0, embedding_dimension/2], where embedding_dimension is the dimension of a feature after the multi-dimensional features are mapped to dense vectors; and d_model is the value of embedding_dimension.
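The position encoding can be sketched as follows, assuming the sinusoidal form commonly used with attention models (sine on even dimensions, cosine on odd dimensions, base 10000). The base, the half-open index ranges and the requirement that `d_model` be even are assumptions of this sketch, not statements of the embodiment.

```python
import math


def positional_encoding(max_sequence_length, d_model):
    """Return a max_sequence_length x d_model table of position codes (d_model even)."""
    pe = [[0.0] * d_model for _ in range(max_sequence_length)]
    for pos in range(max_sequence_length):
        for i in range(d_model // 2):
            angle = pos / (10000 ** (2 * i / d_model))
            pe[pos][2 * i] = math.sin(angle)      # even dimension
            pe[pos][2 * i + 1] = math.cos(angle)  # odd dimension
    return pe


# Hypothetical sizes: 4 positions, dense vectors of dimension 6.
pe = positional_encoding(4, 6)
```

The table is added element-wise to the dense vectors produced by the fully connected layer, so that otherwise identical readings at different positions become distinguishable to the attention mechanism.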

Step 3: constructing a Triple-Views Layer, wherein the Triple-Views Layer comprises a timestamp view predictor, a feature view predictor and a short-term historical data view predictor, the timestamp view predictor predicts from the time dimension, the feature view predictor predicts from the multivariate feature dimension, and the short-term historical data view predictor predicts from historical readings:

First, multivariate air quality data is time-series data, and predicting its values at a future time depends on how it varies over time. Second, multivariate air quality data has multiple feature dimensions, the data of each feature dimension is a univariate time series, and potential relations exist between different features. Finally, the values of a time series tend to be similar to neighboring short-term data. This embodiment designs a Triple-Views Layer framework in which the predictors of the timestamp view and the feature view explicitly compute timestamp-related and feature-related correlations through an attention mechanism.

$$Q = X \cdot W^{Q}$$

$$K = X \cdot W^{K}$$

$$V = X \cdot W^{V}$$

wherein Q, K and V respectively represent the Query vector, Key vector and Value vector of each sequence.

The self-attention mechanism is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$

wherein d_k is the dimension of the Key vectors.

The three predictors of the Triple-Views Layer are as follows:

Timestamp View Predictor:

To focus on potential links between the data of different timestamps, the model computes a correlation weight matrix between all timestamps, where a larger weight represents a higher correlation. The multi-head attention mechanism performs the correlation calculation in different subspaces and concatenates the resulting attention outputs as the output vector. Multi-head attention is followed immediately by a ReLU nonlinear activation function and Dropout (random deactivation): ReLU adds nonlinearity to the neural network, which speeds up learning and strengthens the fitting capability of the model, and random deactivation reduces overfitting.

FIG. 2 shows the specific data processing flow of the timestamp view predictor. First, a linear layer maps the normalized multivariate air quality data to dense vectors; then the position encoding is added to the multivariate time series (dense vectors); then the timestamp Q, K and V matrices are generated for the correlation calculation, which yields the timestamp Attention matrix, that is, the output matrix of the timestamp view predictor.
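The correlation calculation of the timestamp view can be sketched with a single attention head. The sketch assumes scaled dot-product attention, omits the multiple heads and Dropout, and uses identity projection matrices and a tiny hypothetical input purely so that it runs; none of these values come from the embodiment.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head attention: rows of x are timestamps, columns are dense dims."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # timestamp-by-timestamp correlation scores
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def relu(matrix):
    return [[max(0.0, v) for v in row] for row in matrix]


# Hypothetical input: 3 timestamps, each already mapped to a 2-dim dense vector.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
matrix_t = relu(out)  # stands in for the output matrix of the timestamp view
```

Each row of `weights` sums to 1 and states how strongly one timestamp attends to every other timestamp, which is the correlation weight matrix described above.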

Feature View Predictor:

To focus on potential links between different features, the model computes a correlation weight matrix between all feature time series, where a larger weight represents a higher correlation. The multi-head attention mechanism performs the correlation calculation in different subspaces and concatenates the resulting attention outputs as the output vector. Multi-head attention is followed immediately by a ReLU nonlinear activation function and Dropout (random deactivation): ReLU adds nonlinearity to the neural network, which speeds up learning and strengthens the fitting capability of the model, and random deactivation reduces overfitting.

FIG. 2 shows the specific data processing flow of the feature view predictor. First, the normalized multivariate air quality data is transposed; then a linear layer maps the transposed data to dense vectors; then the position encoding is added to the multivariate time series (dense vectors); then the feature Q, K and V matrices are generated for the correlation calculation, which yields the feature Attention matrix, that is, the output matrix of the feature view predictor.
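The transposition step is what turns timestamp-wise attention into feature-wise attention. A minimal sketch, with hypothetical readings for two attributes (labelled PM2.5 and NO2 only for illustration):

```python
def transpose(matrix):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


# Rows are timestamps, columns are features (hypothetical PM2.5 and NO2 readings).
readings = [
    [35.0, 12.0],
    [40.0, 15.0],
    [38.0, 11.0],
]

# After transposition each row is the full time series of one feature, so an
# attention weight matrix computed over rows is feature-by-feature.
by_feature = transpose(readings)
```

With three timestamps and two features, attention over `readings` would produce a 3 x 3 weight matrix, while attention over `by_feature` would produce a 2 x 2 one.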

Short-term Historical Data View Predictor (Previous View Predictor):

Multivariate air quality data is time-series data, and the data of adjacent timestamps tends to be similar along the time dimension. To exploit this property, the short-term historical data view predictor takes, as its predicted value, the reading at the previous timestamp of the attribute (feature) in which the missing value (to be predicted) is located.

As shown in FIG. 2, the short-term historical data view predictor obtains its predicted value from the normalized multivariate air quality data by reading the previous timestamp of the attribute in which the missing value is located.
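A minimal sketch of this predictor follows. The function name, the row/column layout and the behaviour at the first timestamp are assumptions of the sketch; the embodiment does not say how a missing value at the first timestamp is handled.

```python
def previous_view_predict(data, t, feature):
    """Return the reading of `feature` at the timestamp before `t`.

    `data` is a list of rows (one per timestamp); each row holds one value per feature.
    """
    if t == 0:
        # Assumption: with no earlier timestamp there is nothing to return.
        raise ValueError("no earlier timestamp available")
    return data[t - 1][feature]


# Hypothetical normalized data: the reading of feature 0 at timestamp 2 is missing.
data = [
    [0.1, 0.5],
    [0.2, 0.6],
    [None, 0.7],
]
pre_p = previous_view_predict(data, 2, 0)
```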

Step 4: constructing an Output Layer, wherein the Output Layer is divided into a Continuous Output Layer and a Discrete Output Layer according to the type of the missing data. For continuous missing data, the Continuous Output Layer first assigns and initializes weights and a bias for the timestamp view predictor, the feature view predictor and the short-term historical data view predictor, then maps the output matrices of the timestamp view predictor and the feature view predictor to predicted values, and finally computes the weighted sum of the predicted values according to the weights to obtain the final prediction result. For discrete missing data, the Discrete Output Layer first processes the output matrices of the timestamp view predictor and the feature view predictor into an output vector, then maps the output vector into the [0,1] interval so that its values sum to 1, and finally selects the node with the maximum probability as the output node:

Air quality data contains both continuous and discrete missing data. Imputing continuous missing data is a prediction task, whereas imputing discrete missing data is a classification task, so different types of missing data should be treated differently when filled. Most big-data-based prediction models are suited to imputing continuous data but struggle to achieve good results when imputing discrete missing data. Models based on the attention mechanism handle this well: attention extracts data features from a multi-dimensional time series by attending to the interrelations among different dimensions, and is suitable for both classification and prediction tasks.

When the missing value is continuous data, two linear layers are set in the Continuous Output Layer to map the output matrices of the timestamp view predictor and the feature view predictor to the predicted values pre_T and pre_F respectively. The predicted value pre_P of the short-term historical data view predictor is then obtained, and the final predicted value is calculated by weighted summation according to the following formula:

$$\mathrm{Output} = w_1 \cdot \mathrm{pre}_F + w_2 \cdot \mathrm{pre}_T + w_3 \cdot \mathrm{pre}_P + b$$

wherein w₁, w₂, w₃ and b are initialized when the model is established and are learned and updated by gradient descent during model training, and Output is the final predicted value.
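The weighted summation and one gradient-descent update can be sketched as follows. The initial values {0.33, 0.33, 0.33, 0} are taken from the text; the squared-error loss, the learning rate and the sample predictions are assumptions made only so that the update can be shown.

```python
def continuous_output(pre_f, pre_t, pre_p, w=(0.33, 0.33, 0.33), b=0.0):
    """Output = w1*pre_F + w2*pre_T + w3*pre_P + b."""
    w1, w2, w3 = w
    return w1 * pre_f + w2 * pre_t + w3 * pre_p + b


def sgd_step(params, preds, target, lr=0.01):
    """One gradient-descent step on the squared error (assumed loss) for (w1, w2, w3, b)."""
    w1, w2, w3, b = params
    pre_f, pre_t, pre_p = preds
    err = continuous_output(pre_f, pre_t, pre_p, (w1, w2, w3), b) - target
    # d(err^2)/dw_i = 2 * err * pre_i and d(err^2)/db = 2 * err
    return (
        w1 - lr * 2 * err * pre_f,
        w2 - lr * 2 * err * pre_t,
        w3 - lr * 2 * err * pre_p,
        b - lr * 2 * err,
    )


# Hypothetical per-view predictions and true reading.
preds = (1.0, 2.0, 3.0)
target = 3.0
params = (0.33, 0.33, 0.33, 0.0)
before = abs(continuous_output(*preds) - target)
params = sgd_step(params, preds, target)
after = abs(continuous_output(*preds, params[:3], params[3]) - target)
```

Starting from equal weights means that no view is favoured before training; the update then shifts weight toward the views whose predictions reduce the error.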

When the missing value is discrete data, Matrix_T and Matrix_F are the output matrices of the timestamp view predictor and the feature view predictor respectively. The Discrete Output Layer first concatenates the output matrices to obtain the concatenated vector Concatenate(Matrix_T, Matrix_F), then maps the concatenated vector through a fully connected layer to an output vector whose dimension equals the number of classification categories (discrete data categories), and finally maps the output vector into the [0,1] interval through a Softmax function so that all output vector values sum to 1. The node with the maximum probability is selected as the output node. The Softmax function is as follows:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$

wherein z_i is the i-th value of the output vector, and C is the length of the output vector, i.e. the number of categories of the discrete data.
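The discrete branch can be sketched as concatenation, one fully connected layer, Softmax and selection of the most probable category. Flattening the two matrices before concatenation, and all weights and inputs below, are assumptions made for illustration.

```python
import math


def softmax(z):
    """Map a vector into [0, 1] so that its values sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def discrete_output(matrix_t, matrix_f, weight, bias):
    """Concatenate both view outputs, apply one linear layer, then Softmax and argmax."""
    # Assumption: both matrices are flattened row by row before concatenation.
    concat = [v for row in matrix_t for v in row] + [v for row in matrix_f for v in row]
    logits = [sum(w * x for w, x in zip(w_row, concat)) + b
              for w_row, b in zip(weight, bias)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs


# Hypothetical 1 x 2 view outputs and a linear layer for C = 3 categories.
matrix_t = [[1.0, 0.0]]
matrix_f = [[0.0, 2.0]]
weight = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
bias = [0.0, 0.0, 0.0]
category, probs = discrete_output(matrix_t, matrix_f, weight, bias)
```

Unlike the continuous branch, the short-term historical data view does not enter this calculation; only the timestamp view and the feature view are concatenated.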

The model provided by the invention uses the attention mechanism to discover potential interrelations in the data from different angles, namely correlation between timestamps and correlation between features, and also takes the short-term historical data of the time series into account, so that a better prediction is obtained; the attention mechanism also solves the long-range dependence problem. Compared with other models based on big data technology, the model achieves more accurate prediction on the multivariate air quality missing value filling task and offers a new approach for subsequent multivariate air quality missing data filling tasks.

Embodiment of the TAM-based multivariate air quality data missing value filling model:

This embodiment provides a TAM-based multivariate air quality data missing value filling model, which corresponds to the above construction method of the TAM-based multivariate air quality data missing value filling model. As shown in FIG. 3, the TAM-based multivariate air quality data missing value filling model includes:

a BatchNorm layer for normalizing the multivariate air quality data;

an Input Layer, wherein the Input Layer comprises a multivariate time-series position encoding and a fully connected linear layer, the position encoding is used for adding position information to the time series, and the fully connected linear layer is used for mapping input data to dense vectors;

For specific implementation manners of each processing layer, refer to the above embodiment of the method for constructing the multivariate air quality data missing value filling model based on the TAM, and details are not repeated.

Fig. 4 is a specific network structure diagram of a multi-component air quality data missing value filling model based on TAM.

The above-mentioned embodiments are merely illustrative of the technical solutions of the present invention in a specific embodiment, and any equivalent substitutions and modifications or partial substitutions of the present invention without departing from the spirit and scope of the present invention should be covered by the claims of the present invention.

Claims

1. A method for constructing a TAM-based multivariate air quality data missing value filling model, characterized by comprising the following steps:

constructing a BatchNorm layer for normalizing multivariate air quality data;

2. The method for constructing the TAM-based multivariate air quality data missing value filling model according to claim 1, wherein the constructing the embedding layer comprises the following steps:

3. The method for constructing the TAM-based multivariate air quality data missing value filling model according to claim 1, wherein a fully connected layer and a position encoding are arranged in the Input Layer.

4. The method for constructing the TAM-based multivariate air quality data missing value filling model according to claim 1, wherein a multi-head attention mechanism, a ReLU activation function and Dropout (random deactivation) are arranged in the timestamp view predictor and the characteristic view predictor.

5. The method for constructing the TAM-based multivariate air quality data missing value filling model according to claim 1, wherein the assigning and initializing weights and biases comprises:

$$\mathrm{Output} = w_1 \cdot \mathrm{pre}_F + w_2 \cdot \mathrm{pre}_T + w_3 \cdot \mathrm{pre}_P + b$$

wherein Output is the final prediction result;

the output matrices obtained from the timestamp view predictor and the characteristic view predictor are $\mathrm{Matrix}_T$ and $\mathrm{Matrix}_F$ respectively; the vector obtained by concatenating them is $\mathrm{Concatenate}(\mathrm{Matrix}_T, \mathrm{Matrix}_F)$, which is then mapped through a fully connected layer to obtain the output vector;

the mapping of the output vector into the [0,1] interval comprises:

6. A TAM-based multivariate air quality data missing value filling model, characterized by comprising:

a BatchNorm layer to normalize the multivariate air quality data;

and an Output Layer, which is divided into a Continuous Output Layer and a Discrete Output Layer according to the type of the missing data. For continuous missing data, the Continuous Output Layer first assigns and initializes weights and biases for the timestamp view predictor, the characteristic view predictor and the short-term historical data view predictor, then maps the output matrices of the timestamp view predictor and the characteristic view predictor to predicted values, and finally computes the weighted sum of the predicted values according to the weights to obtain the final prediction result. For discrete missing data, the Discrete Output Layer first processes the output matrices of the timestamp view predictor and the characteristic view predictor into an output vector, then maps the output vector into the [0,1] interval such that its values sum to 1, and finally selects the node with the maximum probability as the output node.
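The two branches of the Output Layer can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the per-view predicted values are taken as already-computed scalars, the predictor outputs are assumed to be flattened into lists before concatenation, the mapping into [0,1] with unit sum is assumed to be a softmax, and all function names, weights and inputs are hypothetical. The weighted sum follows the document's formula with its stated initial values {0.33, 0.33, 0.33, 0}.

```python
import math

def continuous_output(pre_f, pre_t, pre_p, w=(0.33, 0.33, 0.33), b=0.0):
    """Weighted sum of the three view predictions plus a bias."""
    return w[0] * pre_f + w[1] * pre_t + w[2] * pre_p + b

def softmax(vec):
    # Maps scores into [0, 1] so that they sum to 1 (assumed normalization).
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

def discrete_output(matrix_t, matrix_f, weight, bias):
    """Concatenate the two (flattened) view outputs, map linearly, normalize, take argmax."""
    concat = list(matrix_t) + list(matrix_f)
    logits = [sum(v * w for v, w in zip(concat, w_row)) + b
              for w_row, b in zip(weight, bias)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

# Continuous case: made-up predictions from the three views.
value = continuous_output(pre_f=30.0, pre_t=33.0, pre_p=36.0)

# Discrete case: made-up flattened view outputs and a 3-class linear mapping.
weight = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
bias = [0.0, 0.0, 0.0]
category, probs = discrete_output([0.2, 0.1], [0.7, 0.4], weight, bias)
```

In practice the weights and bias of the continuous branch would be learned starting from their initial values, and the linear mapping of the discrete branch would be trained rather than hand-written as it is here.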

7. The TAM-based multivariate air quality data missing value filling model according to claim 6, wherein the BatchNorm layer is used to normalize the multivariate air quality data.

8. The TAM-based multivariate air quality data missing value filling model according to claim 6, wherein a fully connected layer and a position encoding are arranged in the Input Layer.

9. The TAM-based multivariate air quality data missing value filling model according to claim 6, wherein a multi-head attention mechanism, a ReLU activation function and Dropout (random deactivation) are arranged in the timestamp view predictor and the characteristic view predictor.

10. The TAM-based multivariate air quality data missing value filling model according to claim 6, wherein the assigning and initializing weights and biases comprises:

weights and a bias $\{w_1, w_2, w_3, b\}$ are assigned to the views and initialized to $\{0.33, 0.33, 0.33, 0\}$;

$$\mathrm{Output} = w_1 \cdot \mathrm{pre}_F + w_2 \cdot \mathrm{pre}_T + w_3 \cdot \mathrm{pre}_P + b$$

wherein Output is the final prediction result;

the output matrices obtained from the timestamp view predictor and the characteristic view predictor are $\mathrm{Matrix}_T$ and $\mathrm{Matrix}_F$ respectively; the vector obtained by concatenating them is $\mathrm{Concatenate}(\mathrm{Matrix}_T, \mathrm{Matrix}_F)$, which is then mapped through a fully connected layer to obtain the output vector;

the mapping of the output vector into the [0,1] interval comprises: